I am working with the UCI data sets, and some of them contain "?" in certain lines. For example:

```
56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0
58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0
57.0,0.0,2.0,130.0,236.0,0.0,2.0,174.0,0.0,0.0,2.0,1.0,3.0,1
38.0,1.0,3.0,138.0,175.0,0.0,0.0,173.0,0.0,0.0,1.0,?,3.0,0
```

I first used numpy.loadtxt() to load the file and tried to delete the lines containing "?" with line.contains('?'), but got a type error.
Then I used pandas.read_csv, but I still found no easy way to delete all lines that contain the specific character "?".
Is there an easy way to clean the data? I need a file of float data without any "?" in it. Thanks~
Best answer
Create a small script that reads the file line by line and writes the "desirable" lines to a new file. Then operate on the cleaned data.
The way I would do it is this:

```python
import fileinput

# With no file arguments on the command line, fileinput.input() reads from stdin.
for line in fileinput.input():
    if '?' not in line:
        # Drop the trailing newline so print() does not add a second one.
        print(line.rstrip('\n'))
```

Then run it in bash as follows:

```
python script.py < dirty.txt > clean.txt
```

This uses stdin and stdout to process the file, with bash redirection supplying the input file and capturing the output in a new file.

An alternative pure Python solution:

```python
input_file = 'dirty.txt'
output_file = 'clean.txt'

# Copy every line that contains no '?' into the cleaned file.
with open(input_file) as dirty, open(output_file, 'w') as clean:
    for line in dirty:
        if '?' not in line:
            clean.write(line)
```
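Since the goal is float data, the filtering and the conversion can also be done in one pass. The following is only a minimal sketch using the standard library: the function name `load_float_rows` is hypothetical (not from the question), and it assumes the rows are comma-separated with "?" as the only non-numeric field. The sample uses two rows from the question.

```python
import csv
import io


def load_float_rows(lines):
    """Return rows as lists of floats, skipping any row with a '?' field.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    rows = []
    for fields in csv.reader(lines):
        # Skip blank lines and rows where at least one field is '?'.
        if not fields or any(field.strip() == "?" for field in fields):
            continue
        rows.append([float(field) for field in fields])
    return rows


# Two rows from the question; the second has a '?' and is dropped.
sample = io.StringIO(
    "56.0,1.0,2.0,130.0,221.0,0.0,2.0,163.0,0.0,0.0,1.0,0.0,7.0,0\n"
    "58.0,1.0,2.0,125.0,220.0,0.0,0.0,144.0,0.0,0.4,2.0,?,7.0,0\n"
)
rows = load_float_rows(sample)
print(len(rows), len(rows[0]))
```

For a real file, something like `with open('dirty.txt') as f: rows = load_float_rows(f)` should work the same way, and the resulting list of lists can be passed to `numpy.array` if an array is needed.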