Get first non-null value per row

Updated: 2024-10-14 04:30:38

I have a sample DataFrame, shown below. For each row, I want to check c1 first; if it is null, check c2, then c3, and so on. In this way, find the first non-null column and store its value in the result column.

```
ID   c1   c2   c3   c4   result
1    a    b    a         a
2         cc   dd   cc   cc
3         ee   ff   ee   ee
4              gg   gg   gg
```

This is what I am using for now, but I would like to know if there is a better method. (The column names do not follow any pattern; these are just samples.)

```python
import numpy as np

# Start from c1, then fill each remaining gap from the next column in turn
df["result"] = np.where(df["c1"].notnull(), df["c1"], None)
df["result"] = np.where(df["result"].notnull(), df["result"], df["c2"])
df["result"] = np.where(df["result"].notnull(), df["result"], df["c3"])
df["result"] = np.where(df["result"].notnull(), df["result"], df["c4"])
# Rows where every column is null get a placeholder
df["result"] = np.where(df["result"].notnull(), df["result"], "unknown")
```

This approach becomes unwieldy when there are many columns.
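The chained np.where calls can be collapsed into a loop over any ordered list of column names. This is a minimal sketch, assuming the value columns are passed in priority order; `first_non_null` is a hypothetical helper name, and it relies on `Series.fillna` accepting a Series, which fills each remaining gap from the next column by index alignment. The sample frame is rebuilt from the table above.

```python
import numpy as np
import pandas as pd


def first_non_null(df, cols, default="unknown"):
    """Hypothetical helper: first non-null value per row across `cols`, in order."""
    result = df[cols[0]]
    for col in cols[1:]:
        # Fill only the positions that are still null, using the next column
        result = result.fillna(df[col])
    # Rows where every listed column is null get the placeholder
    return result.fillna(default)


df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "c1": ["a", np.nan, np.nan, np.nan],
    "c2": ["b", "cc", "ee", np.nan],
    "c3": ["a", "dd", "ff", "gg"],
    "c4": [np.nan, "cc", "ee", "gg"],
})
df["result"] = first_non_null(df, ["c1", "c2", "c3", "c4"])
print(df["result"].tolist())  # ['a', 'cc', 'ee', 'gg']
```

Because the column list is an ordinary Python list, it works regardless of how the columns are named.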

Best answer

Back-fill the NaNs along each row with bfill(axis=1) first, then select the first column with iloc:

```python
df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
```
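To see why this works: bfill(axis=1) copies each non-null value leftward into the null cells before it, so after back-filling, the first column of each row holds that row's first non-null value. The sketch below rebuilds the sample frame (the NaN placement is read off the printed output in this answer) and prints the intermediate back-filled frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "c1": ["a", np.nan, np.nan, np.nan],
    "c2": ["b", "cc", "ee", np.nan],
    "c3": ["a", "dd", "ff", "gg"],
    "c4": [np.nan, "cc", "ee", "gg"],
})

# Step 1: within each row, replace every null with the next non-null value to its right
filled = df[["c1", "c2", "c3", "c4"]].bfill(axis=1)
print(filled)

# Step 2: the first column now holds each row's first non-null value;
# rows that were entirely null would stay null and become "unknown"
df["result"] = filled.iloc[:, 0].fillna("unknown")
print(df["result"].tolist())  # ['a', 'cc', 'ee', 'gg']
```

Note that trailing nulls (such as c4 in the first row) stay null after back-filling, since there is nothing to their right; that does not matter because only the first column is read.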

Or:

```python
df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
print(df)
```

```
   ID   c1   c2  c3   c4 result
0   1    a    b   a  NaN      a
1   2  NaN   cc  dd   cc     cc
2   3  NaN   ee  ff   ee     ee
3   4  NaN  NaN  gg   gg     gg
```
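df.iloc[:, 1:] relies on ID being the first column. Since the question says the column names follow no pattern, one alternative (an assumption about the layout, not part of the original answer) is to drop the non-value columns by name; DataFrame.drop keeps the remaining columns in their original left-to-right order. The sketch also adds a hypothetical all-null row (ID 5) to show the 'unknown' fallback.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "c1": ["a", np.nan, np.nan, np.nan, np.nan],
    "c2": ["b", "cc", "ee", np.nan, np.nan],
    "c3": ["a", "dd", "ff", "gg", np.nan],
    "c4": [np.nan, "cc", "ee", "gg", np.nan],  # row with ID 5 is entirely null (hypothetical)
})

# Every column except ID (and any previous result), in frame order;
# errors="ignore" lets this run whether or not "result" already exists
value_cols = df.drop(columns=["ID", "result"], errors="ignore")
df["result"] = value_cols.bfill(axis=1).iloc[:, 0].fillna("unknown")
print(df["result"].tolist())  # ['a', 'cc', 'ee', 'gg', 'unknown']
```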

Performance:

```
df = pd.concat([df] * 1000, ignore_index=True)

In [220]: %timeit df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.78 ms per loop

In [221]: %timeit df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.7 ms per loop

#jpp solution
In [222]: %%timeit
     ...: cols = df.iloc[:, 1:].T.apply(pd.Series.first_valid_index)
     ...:
     ...: df['result'] = [df.loc[i, cols[i]] for i in range(len(df.index))]
     ...:
1 loop, best of 3: 180 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ's solution
In [223]: %timeit df['result'] = df.stack().groupby(level=0).first()
1 loop, best of 3: 606 ms per loop
```
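Outside IPython, a comparable measurement can be taken with the standard-library timeit module. This is a sketch: it rebuilds the 4-row sample, tiles it 1000 times as in the session above, and times the bfill approach against a variant of the stack/groupby approach restricted to the value columns. Absolute numbers depend on the machine and pandas version, so no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "c1": ["a", np.nan, np.nan, np.nan],
    "c2": ["b", "cc", "ee", np.nan],
    "c3": ["a", "dd", "ff", "gg"],
    "c4": [np.nan, "cc", "ee", "gg"],
})
# Same enlargement as in the session above: 4,000 rows
df = pd.concat([base] * 1000, ignore_index=True)
cols = ["c1", "c2", "c3", "c4"]


def via_bfill():
    return df[cols].bfill(axis=1).iloc[:, 0].fillna("unknown")


def via_stack():
    # groupby(...).first() skips nulls, so each row label yields its first non-null value
    return df[cols].stack().groupby(level=0).first()


# Both approaches agree on this data (no row is entirely null)
assert via_bfill().tolist() == via_stack().tolist()

for fn in (via_bfill, via_stack):
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1000:.2f} ms per call")
```

Taking the minimum of several repeats, as %timeit does, reduces noise from other processes.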


Published: 2023-07-18 04:34:00
Link to this article: https://www.elefans.com/category/jswz/34/1154828.html