I use the pandas.read_fwf() function in pandas 0.19.2 (Python) to read a file, fwf.txt, that has the following content:
    # Column1 Column2
          123     abc

          456     def

    #
    #

My code is as follows:
    import pandas as pd

    file_path = "fwf.txt"
    widths = [len("# Column1"), len(" Column2")]
    names = ["Column1", "Column2"]

    data = pd.read_fwf(filepath_or_buffer=file_path, widths=widths,
                       names=names, skip_blank_lines=True, comment="#")
The printed dataframe is like this:
       Column1 Column2
    0    123.0     abc
    1      NaN     NaN
    2    456.0     def
    3      NaN     NaN

It looks like the skip_blank_lines=True argument is ignored, as the dataframe contains rows of NaN values.
Which combination of pandas.read_fwf() arguments would ensure that blank lines are skipped?
Recommended answer

    import io
    import pandas as pd

    file_path = "fwf.txt"
    widths = [len("# Column1 "), len("Column2")]
    names = ["Column1", "Column2"]

    class FileLike(io.TextIOBase):
        def __init__(self, iterable):
            self.iterable = iterable

        def readline(self):
            return next(self.iterable)

    with open(file_path, 'r') as f:
        lines = (line for line in f if line.strip())
        data = pd.read_fwf(FileLike(lines), widths=widths, names=names,
                           comment='#')
        print(data)

prints
       Column1 Column2
    0      123     abc
    1      456     def

In the code above,

    with open(file_path, 'r') as f:
        lines = (line for line in f if line.strip())

defines a generator expression (i.e. an iterable) which yields the lines of the file with the blank lines removed.
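The filtering step can be tried on its own, without pandas. The following is a minimal sketch that uses an in-memory io.StringIO as a stand-in for the open file; the sample text is made up for illustration and is not the contents of fwf.txt.

```python
import io

# Stand-in for an open file; the contents are illustrative only.
sample = io.StringIO("first\n\n   \nsecond\n\nthird\n")

# Lazily yield only the lines that contain non-whitespace characters.
# line.strip() is an empty (falsy) string for empty or whitespace-only lines.
non_blank = (line for line in sample if line.strip())

kept = list(non_blank)
print(kept)  # ['first\n', 'second\n', 'third\n']
```

Note that lines consisting only of spaces are dropped as well, because strip() removes all surrounding whitespace before the truthiness test.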
The pd.read_fwf function can accept TextIOBase objects. You can subclass TextIOBase so that its readline method returns lines from an iterable (strictly, an iterator, since readline calls next() on it):
    class FileLike(io.TextIOBase):
        def __init__(self, iterable):
            self.iterable = iterable

        def readline(self):
            return next(self.iterable)

Putting these two together gives you a way to manipulate or modify the lines of a file before passing them to pd.read_fwf.
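The same idea of pre-processing lines can also be sketched with io.StringIO instead of a TextIOBase subclass. This is an alternative, not the method used in the answer: it joins the filtered lines into one in-memory buffer, so it suits files that fit in memory. The helper name read_fwf_without_blank_lines and the SAMPLE text (including its exact spacing) are assumptions made for illustration.

```python
import io

import pandas as pd

# Assumed layout of fwf.txt: the exact spacing is a guess that is consistent
# with column widths of 9 and 8 characters.
SAMPLE = (
    "# Column1 Column2\n"
    "      123     abc\n"
    "\n"
    "      456     def\n"
    "\n"
    "#\n"
    "#\n"
)


def read_fwf_without_blank_lines(lines, widths, names):
    """Hypothetical helper: drop blank lines, then parse with pd.read_fwf."""
    kept = "".join(line for line in lines if line.strip())
    return pd.read_fwf(io.StringIO(kept), widths=widths, names=names,
                       comment="#")


# Any iterable of lines works here, e.g. an open file object.
data = read_fwf_without_blank_lines(
    io.StringIO(SAMPLE), widths=[9, 8], names=["Column1", "Column2"]
)
print(data)
```

With a real file, the first argument would be the file object from open(file_path, 'r'). The generator-based FileLike approach avoids holding the whole filtered text in memory, which may matter for large files.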