I have a pandas dataframe df of the following shape: (763, 65)
I use the following code to create 4 new columns:
    df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

    def myFunc(row):
        #code to get some result from another dataframe
        return result1, result2, result3, result4
The shape of the dataframe which is returned in myFunc is (1, 4). The code runs into the following error:
ValueError: Shape of passed values is (763, 4), indices imply (763, 65)
I know that df has 65 columns and that the returned data from myFunc only has 4 columns. However, I only want to create the 4 new columns (that is, col1, col2, etc.), so in my opinion the code is correct when it only returns 4 columns in myFunc. What am I doing wrong?
Recommended answer
Demo:
    In [40]: df = pd.DataFrame({'a':[1,2,3]})

    In [41]: df
    Out[41]:
       a
    0  1
    1  2
    2  3

    In [42]: def myFunc(row):
        ...:     #code to get some result from another dataframe
        ...:     # NOTE: trick is to return pd.Series()
        ...:     return pd.Series([1,2,3,4]) * row['a']
        ...:

    In [44]: df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

    In [45]: df
    Out[45]:
       a  col1  col2  col3  col4
    0  1     1     2     3     4
    1  2     2     4     6     8
    2  3     3     6     9    12
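If myFunc has to keep returning a plain tuple, as in the question, a possible alternative is to ask apply to expand the tuple into columns with result_type='expand' (available in pandas 0.23 and later). This is a minimal sketch: my_func below is a hypothetical stand-in for the real lookup, and the toy frame is the one from the demo, not the asker's (763, 65) data.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

def my_func(row):
    # Hypothetical stand-in for the real lookup: returns a plain tuple
    a = row['a']
    return a, 2 * a, 3 * a, 4 * a

new_cols = ['col1', 'col2', 'col3', 'col4']

# result_type='expand' turns each returned tuple into one row of a
# 4-column DataFrame, which is then assigned to the four new columns
df[new_cols] = df.apply(my_func, axis=1, result_type='expand')
print(df)
```

This still calls the function once per row, so it has the same speed as the pd.Series version; it only avoids building a Series inside the function.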
Disclaimer: try to avoid using .apply(..., axis=1). It is a for loop under the hood, i.e. it is not vectorized, and it will run much slower than vectorized Pandas/NumPy ufuncs.
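For the toy computation in the demo (each new column is a multiple of column a), a vectorized version might look like the sketch below. It only illustrates the idea: the real myFunc looks values up in another dataframe, so the actual vectorized form would depend on that logic (often a merge or a map). The names multipliers and products are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
new_cols = ['col1', 'col2', 'col3', 'col4']

# Assumed toy rule from the demo: colN = N * a
multipliers = np.array([1, 2, 3, 4])

# np.outer builds the full (n_rows, 4) result in one call, no Python loop
products = pd.DataFrame(
    np.outer(df['a'].to_numpy(), multipliers),
    index=df.index,
    columns=new_cols,
)

# Attach the four new columns, aligned on the index
df = df.join(products)
print(df)
```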
PS: if you provide details of what you are trying to calculate in the myFunc function, we could try to find a vectorized solution...