I have a table with 150,000 rows and 15 columns. The important columns for this example are COUNTRY, COSTCENTER and EXTENSION. I am reading the data from a CSV into a pandas DataFrame. All columns are of type object.
What I want to do is:
1. Search for a certain COUNTRY (e.g. "China").
2. Filter for the instances where the COSTCENTER is either 1000 or 2000, or where the EXTENSION starts with "862".
3. Once all filters have been applied, change the country name in COUNTRY to something new.
I had a solution, but I always got the chained-assignment warning:
df.COUNTRY[df.COUNTRY.str.match("China") & (df.COSTCENTER.str.match("1000") | df.COSTCENTER.str.match("2000"))] = 'China_new_name'
I cannot say I completely understood why this could cause problems, but I looked for an alternative. I tried lambda and apply, but I kept getting all sorts of errors.
My latest approach is:
filter_China = df.ix[(df["COUNTRY"]=="China") & ((df["COSTCENTER"]=="1000") | (df["COSTCENTER"]=="2000"))]
It seems to filter what I am looking for (I have not included the search on EXTENSION yet, as I first wanted this part to work).
But when I try to change a value based on my search criteria, I run into trouble:
df.ix[(df["COUNTRY"]=="China") & ((df["COSTCENTER"]=="1000") | (df["COSTCENTER"]=="2000")), df["COUNTRY"]] = "China_new_name"
I am getting this error:
raise KeyError('%s not in index' % objarr[mask])
What am I missing here? Is this the right approach, or do I need to take a totally different route?
Best answer
You need to read the section of the documentation on chained indexing and the SettingWithCopy warning. Do the selection and the assignment in a single .loc call, with the boolean mask as the row indexer and the column label "COUNTRY" as the column indexer:
df.loc[df.COUNTRY.str.match("China") & (df.COSTCENTER.str.match("1000") | df.COSTCENTER.str.match("2000")), "COUNTRY"] = 'China_new_name'
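The same pattern can be extended to the EXTENSION condition that the question left out. The following is only a sketch under some assumptions: the sample rows are made up for illustration, and the filter is read as "COUNTRY is China AND (COSTCENTER is 1000 or 2000 OR EXTENSION starts with 862)", which may not be exactly what the asker intended. It uses isin for exact matches, because str.match treats its argument as a regular expression anchored only at the start, so "1000" would also match "10001". It also shows why the last attempt in the question raised a KeyError: the column indexer there was df["COUNTRY"] (the column's values) instead of the label "COUNTRY".

```python
import pandas as pd

# Hypothetical sample data; every column holds strings (object dtype),
# as described in the question.
df = pd.DataFrame({
    "COUNTRY": ["China", "China", "China", "Germany", "China"],
    "COSTCENTER": ["1000", "3000", "3000", "1000", "2000"],
    "EXTENSION": ["8621234", "8625555", "4912345", "8620000", "1111111"],
})

# Build each condition separately so the combined mask stays readable.
is_china = df["COUNTRY"] == "China"
cost_match = df["COSTCENTER"].isin(["1000", "2000"])        # exact matches only
ext_match = df["EXTENSION"].str.startswith("862", na=False)  # missing values count as False

# Assumed reading of the requirement: China AND (cost center OR extension).
mask = is_china & (cost_match | ext_match)

# One .loc call: boolean mask for the rows, column *label* for the column.
# This avoids chained assignment, so no SettingWithCopy warning is triggered.
df.loc[mask, "COUNTRY"] = "China_new_name"

print(df)
```

Note that .ix was deprecated and later removed from pandas (it is gone as of pandas 1.0), so .loc is the indexer to use for label-based and boolean selection.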