I have a data frame, loaded from a .csv file using Data = pandas.read_csv
One of the columns of the data frame contains dates, such as '14/09/2015', and the data type is str.
I need to create a subset, for which I use:

    NewDataFrame = DataFrame['DatesColumn'][DataFrame['DatesColumn']==desired date]

But I have two main problems:
1. Since the dates are strings, I have tried to use a slice [-1], but I get the error KeyError: -1L. I tried to use this code to select 2014:

    NewDataFrame = DataFrame['DatesColumn'][DataFrame['DatesColumn'][-1]==4]

2. I have empty fields that have been imported as nan values. If I try to perform a for loop to transform the data, I get the error: TypeError: 'float' object has no attribute '__getitem__'
Q: How can I subset the data (or clean it) by year?
Many thanks.
Best answer
For the NaN values you can use fillna().

    # to fill NaNs with zeros
    noNans = withNans.fillna(0)

For the date issue, let the existing libraries handle the date strings instead of handling them yourself. In this case the read_csv() function can parse them for you. See the pandas documentation for read_csv().
Here's a little example:
Csv file:

    1,14/09/2016,dataa
    1,14/09/2015,dataa
    2,14/10/2014,dataa2

Code:

    import pandas as pd

    # parse column 1 as dates; dayfirst=True because the values are DD/MM/YYYY
    df = pd.read_csv("test.csv", header=None, parse_dates=[1], dayfirst=True)
    # compare against a Timestamp (newer pandas versions refuse datetime.date here)
    df[df[1] > pd.Timestamp.today()]

When this answer was written, that printed only

       0          1      2
    0  1 2016-09-14  dataa
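To get from parsed dates to the "subset by year" that the question asks for, something like the following might work. It is only a sketch under assumptions: the sample data, the column names id and value, and the variable names are made up for illustration, and DatesColumn is taken from the question. It also shows why the [-1] attempt failed: indexing a Series with [-1] looks up the label -1 instead of slicing each string, whereas the .str accessor slices element by element and passes NaN through without raising.

```python
import io

import pandas as pd

# Hypothetical sample data; the last row has an empty date, which read_csv imports as NaN.
csv_text = """id,DatesColumn,value
1,14/09/2016,dataa
1,14/09/2015,dataa
2,14/10/2014,dataa2
3,,dataa3
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Option 1: stay with strings. .str[-4:] takes the last four characters of
# each value; NaN stays NaN and compares as False, so no loop is needed.
by_string = raw[raw["DatesColumn"].str[-4:] == "2014"]

# Option 2: convert to real datetimes. errors="coerce" turns empty or
# unparseable cells into NaT instead of raising.
df = raw.copy()
df["DatesColumn"] = pd.to_datetime(
    df["DatesColumn"], format="%d/%m/%Y", errors="coerce"
)

# Optionally drop the rows whose date is missing.
clean = df.dropna(subset=["DatesColumn"])

# Select one year through the .dt accessor; NaT rows compare as False.
rows_2014 = df[df["DatesColumn"].dt.year == 2014]
print(rows_2014)
```

The datetime route is usually the more flexible of the two, since the same column can then be filtered by month, compared against other dates, or sorted chronologically.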