How to skip blank lines with read_fwf in Pandas

Updated: 2024-10-26 19:38:27

This article describes how to skip blank lines when using read_fwf in Pandas.

Problem description

I use the pandas.read_fwf() function in Python pandas 0.19.2 to read a file, fwf.txt, that has the following content:

    # Column1 Column2
          123     abc

          456     def

    #
    #

My code is as follows:

    import pandas as pd

    file_path = "fwf.txt"
    widths = [len("# Column1"), len(" Column2")]
    names = ["Column1", "Column2"]

    data = pd.read_fwf(filepath_or_buffer=file_path, widths=widths,
                       names=names, skip_blank_lines=True, comment="#")

The printed dataframe looks like this:

       Column1 Column2
    0    123.0     abc
    1      NaN     NaN
    2    456.0     def
    3      NaN     NaN

It looks like the skip_blank_lines=True argument is ignored, because the dataframe still contains rows of NaNs where the blank lines were.

What combination of pandas.read_fwf() arguments would ensure that blank lines are skipped?

Recommended answer

    import io
    import pandas as pd

    file_path = "fwf.txt"
    widths = [len("# Column1 "), len("Column2")]
    names = ["Column1", "Column2"]

    class FileLike(io.TextIOBase):
        # Minimal file-like wrapper around an iterator of lines.
        def __init__(self, iterable):
            self.iterable = iterable

        def readline(self):
            return next(self.iterable)

    with open(file_path, 'r') as f:
        # Keep only the lines that contain non-whitespace characters.
        lines = (line for line in f if line.strip())
        data = pd.read_fwf(FileLike(lines), widths=widths, names=names,
                           comment='#')

    print(data)

This prints:

       Column1 Column2
    0      123     abc
    1      456     def

    with open(file_path, 'r') as f:
        lines = (line for line in f if line.strip())

This defines a generator expression (i.e. an iterable) that yields the lines of the file with the blank lines removed.
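To see the filter on its own, here is a minimal sketch that runs it against an in-memory buffer instead of a file on disk. The sample text is a hypothetical stand-in for fwf.txt (its exact spacing is assumed), and the helper name non_blank_lines is made up for illustration.

```python
import io

# Hypothetical stand-in for fwf.txt; the exact spacing is an assumption.
sample = (
    "# Column1 Column2\n"
    "      123     abc\n"
    "\n"
    "      456     def\n"
    "   \n"
    "#\n"
    "#\n"
)


def non_blank_lines(handle):
    """Lazily yield only the lines that contain non-whitespace characters."""
    return (line for line in handle if line.strip())


kept = list(non_blank_lines(io.StringIO(sample)))
for line in kept:
    print(repr(line))
```

Because line.strip() is an empty string for both "\n" and "   \n", lines made only of whitespace are dropped as well, while the comment lines are kept and left for the comment argument to handle.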

The pd.read_fwf function can accept TextIOBase objects. You can subclass TextIOBase so that its readline method returns lines from an iterable:

    class FileLike(io.TextIOBase):
        def __init__(self, iterable):
            self.iterable = iterable

        def readline(self):
            return next(self.iterable)
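As a side note, readline on a regular file returns an empty string at end of input rather than raising. The sketch below is a hypothetical variant of the class above (the name IterableFile is made up here, and this is not the answer's code) that follows that convention, so the object can also be iterated directly without pandas.

```python
import io


class IterableFile(io.TextIOBase):
    """Hypothetical variant of FileLike that returns '' at end of input."""

    def __init__(self, iterable):
        # iter() lets the wrapper accept lists as well as generators.
        self._lines = iter(iterable)

    def readable(self):
        return True

    def readline(self, size=-1):
        # size is accepted for signature compatibility but ignored here.
        # An empty string is the conventional end-of-file signal.
        return next(self._lines, "")


source = IterableFile(["first\n", "second\n"])
first = source.readline()   # one explicit readline call
rest = list(source)         # iteration keeps calling readline until ''
at_eof = source.readline()  # further calls keep returning ''
print(repr(first), rest, repr(at_eof))
```

Iteration works because the base class implements the iterator protocol on top of readline and stops when it gets an empty string.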

Putting these two together gives you a way to manipulate or modify the lines of a file before passing them to pd.read_fwf.
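If the file is small enough to hold in memory, a simpler variation on the same idea is to join the filtered lines into an io.StringIO buffer, which pd.read_fwf also accepts. This is a sketch of an alternative, not the approach from the answer above; the sample text is a hypothetical stand-in for fwf.txt with assumed spacing.

```python
import io

import pandas as pd

# Hypothetical stand-in for fwf.txt; the exact spacing is an assumption.
raw = io.StringIO(
    "# Column1 Column2\n"
    "      123     abc\n"
    "\n"
    "      456     def\n"
    "\n"
    "#\n"
    "#\n"
)

widths = [len("# Column1 "), len("Column2")]
names = ["Column1", "Column2"]

# Drop the blank lines first, then hand pandas an in-memory text buffer.
cleaned = io.StringIO("".join(line for line in raw if line.strip()))
data = pd.read_fwf(cleaned, widths=widths, names=names, comment="#")
print(data)
```

The trade-off is that the whole filtered file is materialized as one string, whereas the generator-based wrapper streams the file line by line.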
