I am trying to fill all the NaNs in a DataFrame containing multiple columns and several rows. I am using it to train a multivariate ML model, so I want to fill the NaNs in each column with that column's median. Just to test the median function I did this:
    training_df.loc[[0]] = np.nan  # Sets first row to NaN
    print(training_df.isnull().values.any())  # Prints True because we just inserted NaNs
    test = training_df.fillna(training_df.median())  # Fillna with median
    print(test.isnull().values.any())  # Check afterwards

But when I do this nothing happens; the last print still returns True. If I try calling fillna in place like this instead:

    training_df.fillna(training_df.median(), inplace=True)

Nothing happens either. If I do this:

    training_df = training_df.fillna(training_df.median(), inplace=True)

training_df becomes None. How can I solve this?
Best answer
As @thesilkworm suggested, convert your series to numeric first. Below is a minimal example:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan],
                       [5, 1, 2, 'hello'],
                       [1, 4, 3, 4],
                       [9, 8, 7, 6]], dtype=object)

    df = df.fillna(df.median())  # fails: the columns have object dtype

    # Convert every column to numeric; unparseable values such as 'hello' become NaN
    df[df.columns] = df[df.columns].apply(pd.to_numeric, errors='coerce')

    df = df.fillna(df.median())  # works
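To see why the conversion matters, here is a minimal sketch that inspects the dtypes, coerces the columns, and then checks that no NaNs remain. The frame and the names `numeric_df` and `filled` are made up for illustration and are not from the question. Since the question reports that assigning the result of an `inplace=True` call left `training_df` as None, the sketch simply uses the frame returned by `fillna` without `inplace`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: numbers stored as objects, plus one stray string.
df = pd.DataFrame(
    [[np.nan, np.nan, np.nan, np.nan],
     [5, 1, 2, "hello"],
     [1, 4, 3, 4],
     [9, 8, 7, 6]],
    dtype=object,
)

# Object dtype is the warning sign: these columns are not numeric yet.
print(df.dtypes)

# Coerce every column to numeric; values that cannot be parsed become NaN.
numeric_df = df.apply(pd.to_numeric, errors="coerce")

# Fill each column's NaNs with that column's median
# (NaNs are skipped when the median is computed).
filled = numeric_df.fillna(numeric_df.median())

print(filled)
print(filled.isnull().values.any())
```

If the final check still reports NaNs on real data, one likely cause is a column that is entirely NaN after coercion, because its median is NaN as well.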