Extracting data from a CSV file using a reference

Updated: 2024-10-26 16:24:11

I have a CSV file with several hundred organism IDs and a second CSV file with several thousand organism IDs plus additional characteristics (taxonomic information, abundances per sample, etc.).

I am trying to write a script that extracts information from the larger CSV file, using the smaller one as a reference. It should look at both files and, if an ID appears in both, write the entire row for that ID from the larger file into a new file.

So far I have written the following. The code does not raise an error, but I end up with a blank file and I don't know why. I am a graduate student who knows some simple coding, but I'm still very much a novice.

Thank you.

    import sys
    import csv
    import os.path

    SparCCnames=open(sys.argv[1],"rU")
    OTU_table=open(sys.argv[2],"rU")
    new_file=open(sys.argv[3],"w")

    Sparcc_OTUs=csv.writer(new_file)
    d=csv.DictReader(SparCCnames)
    ids=csv.DictReader(OTU_table)

    for record in ids:
        idstopull=record["OTUid"]
        if idstopull[0]=="OTUid":
            continue
        if idstopull[0] in d:
            new_id.writerow[idstopull[0]]

    SparCCnames.close()
    OTU_table.close()
    new_file.close()

Best answer

I'm not sure what you are trying to do in your code, but you could try this:

    import csv

    def csv_to_dict(csv_file_path):
        # Read a CSV file into a list of dicts, one per row (Python 2: binary mode)
        csv_file = open(csv_file_path, 'rb')
        csv_file.seek(0)
        # Guess the delimiter (tab, comma or semicolon) from the start of the file
        sniffdialect = csv.Sniffer().sniff(csv_file.read(10000), delimiters='\t,;')
        csv_file.seek(0)
        dict_reader = csv.DictReader(csv_file, dialect=sniffdialect)
        csv_file.seek(0)
        dict_data = []
        for record in dict_reader:
            dict_data.append(record)
        csv_file.close()
        return dict_data

    def dict_to_csv(csv_file_path, dict_data):
        # Write a list of dicts to a CSV file, using the first row's keys as headers
        csv_file = open(csv_file_path, 'wb')
        writer = csv.writer(csv_file, dialect='excel')
        headers = dict_data[0].keys()
        writer.writerow(headers)
        # headers must be the same as dat.keys()
        for dat in dict_data:
            line = []
            for field in headers:
                line.append(dat[field])
            writer.writerow(line)
        csv_file.close()

    if __name__ == "__main__":
        big_csv = csv_to_dict('/path/to/big_csv_file.csv')
        small_csv = csv_to_dict('/path/to/small_csv_file.csv')
        output = []
        # Keep every row of the big file whose 'id' also appears in the small file
        for s in small_csv:
            for b in big_csv:
                if s['id'] == b['id']:
                    output.append(b)
        if output:
            dict_to_csv('/path/to/output.csv', output)
        else:
            print "Nothing."

Hope this helps.
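
The blank output file described in the question can be reproduced in a few lines. The sketch below is written for Python 3 and uses small in-memory tables (hypothetical data, not the original files); only the names d, ids and idstopull are taken from the question. It suggests two likely causes: idstopull[0] is just the first character of the ID string, and `in d` on a csv.DictReader falls back to iterating over the reader, comparing that character against whole row dicts and exhausting the reader along the way. Since the test is never true, the line with the undefined name new_id is never reached, which would explain why no error appears.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two CSV files in the question.
reference_csv = "OTUid\nOTU_1\nOTU_3\n"
table_csv = "OTUid,abundance\nOTU_1,10\nOTU_2,20\nOTU_3,30\n"

d = csv.DictReader(io.StringIO(reference_csv))
ids = csv.DictReader(io.StringIO(table_csv))

matches = []
for record in ids:
    idstopull = record["OTUid"]
    # idstopull[0] is only the first character of the ID (here "O").
    # `in d` iterates over the reader and compares that character with
    # each row dict, so the test is never true.
    if idstopull[0] in d:
        matches.append(record)

# The first membership test already consumed every reference row.
remaining_reference_rows = list(d)

print(matches)                   # no rows were matched
print(remaining_reference_rows)  # the reference reader is exhausted
```

Collecting the reference IDs into a dictionary or set before looping over the larger file avoids both problems.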

Thank you everyone for your help. I played with things and consulted with an advisor, and finally got a working script. I am posting it in case it helps someone else in the future.

Thanks!

    import sys
    import csv

    input_file = csv.DictReader(open(sys.argv[1], "rU"))  # has all info
    ref_list = csv.DictReader(open(sys.argv[2], "rU"))  # reference list
    output_file = csv.DictWriter(
        open(sys.argv[3], "w"), input_file.fieldnames)  # to write output file with headers
    output_file.writeheader()  # write headers in output file

    white_list = {}  # create empty dictionary
    for record in ref_list:  # for every line in my reference list
        white_list[record["Sample_ID"]] = None  # store into the dictionary the ID's as keys

    for record in input_file:  # for every line in my input file
        record_id = record["Sample_ID"]  # store ID's into variable record_id
        if (record_id in white_list):  # if the ID is in the reference list
            output_file.writerow(record)  # write the entire row into a new file
        else:  # if it is not in my reference list
            continue  # ignore it and continue iterating through the file
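
For readers on Python 3, where the "rU" mode is no longer accepted in recent versions, the same idea might be sketched as below. This is an illustrative variant, not the script above: the function name filter_rows_by_reference and the id_column parameter are assumptions introduced here, a set replaces the dictionary with None values, and with blocks close the files.

```python
import csv


def filter_rows_by_reference(table_path, reference_path, output_path,
                             id_column="Sample_ID"):
    """Copy the rows of table_path whose ID also appears in reference_path.

    Returns the number of rows written. The function name and id_column
    are illustrative assumptions, not part of the original script.
    """
    # Collect the reference IDs into a set for fast membership tests.
    with open(reference_path, newline="") as ref_file:
        wanted_ids = {row[id_column] for row in csv.DictReader(ref_file)}

    written = 0
    with open(table_path, newline="") as table_file, \
            open(output_path, "w", newline="") as out_file:
        reader = csv.DictReader(table_file)
        # Reuse the input header so each matching row is copied whole.
        writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[id_column] in wanted_ids:
                writer.writerow(row)
                written += 1
    return written
```

Calling filter_rows_by_reference(sys.argv[1], sys.argv[2], sys.argv[3]) takes the three paths in the same order as the script above: the full table, the reference list, then the output file.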

Published: 2023-08-07 14:46:00
Source: https://www.elefans.com/category/jswz/34/1464904.html