Transform Pandas dataframe columns to rows

Updated: 2024-10-10 06:20:30

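I have a large Pandas dataframe with 2002 columns and 258 rows, where each column represents a product and each row holds the sales of every product on a given day.

I would like to transform it into a Pandas dataframe with 2002 * 258 = 516,516 rows and 2 columns containing the product name and the sales amount. How can I do this efficiently in Python?

The following data can serve as an example:

import pandas as pd

d = {'Product 1': [1, 2], 'Product 2': [3, 4], 'Product 3': [1, 1]}
df = pd.DataFrame(data=d)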

Best answer

One option is stack, followed by reset_index twice with rename_axis in between:

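# stack -> Series indexed by (row, product); drop the row level,
# name the product level 'a', and turn the values into column 'b'
out = df.stack().reset_index(level=0, drop=True).rename_axis('a').reset_index(name='b')
print(out)
           a  b
0  Product 1  1
1  Product 2  3
2  Product 3  1
3  Product 1  2
4  Product 2  4
5  Product 3  1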
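A minimal sketch that wraps the stack chain in a function and checks it on random data of the question's size (258 days x 2002 products). The function name `columns_to_rows` and the output column names `product` and `sales` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def columns_to_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output row per (day, product) cell, via stack."""
    return (
        frame.stack()                       # Series indexed by (row label, column label)
        .reset_index(level=0, drop=True)    # drop the row label, keep the product name
        .rename_axis('product')             # name the remaining index level
        .reset_index(name='sales')          # move it into a column next to the values
    )


# Random integers with the question's shape: 258 rows x 2002 product columns.
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.integers(0, 100, size=(258, 2002))).add_prefix('Product ')
long = columns_to_rows(wide)

print(long.shape)  # (516516, 2)
print(long.head(3))
```

Because stack walks the frame row by row, the first 2002 output rows are the sales of every product on the first day.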

A faster alternative combines numpy.tile or numpy.repeat with numpy.ravel:

import numpy as np

# values read row by row, so the product names cycle
out = pd.DataFrame({'a': np.tile(df.columns, len(df)), 'b': df.values.ravel()})
print(out)
           a  b
0  Product 1  1
1  Product 2  3
2  Product 3  1
3  Product 1  2
4  Product 2  4
5  Product 3  1

# values read column by column, so each product name repeats
out = pd.DataFrame({'a': np.repeat(df.columns, len(df)), 'b': df.values.T.ravel()})
print(out)
           a  b
0  Product 1  1
1  Product 1  2
2  Product 2  3
3  Product 2  4
4  Product 3  1
5  Product 3  1
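Why the two NumPy variants pair the functions the way they do: `ravel()` on the value array reads it row by row (C order), so the product names have to cycle, which is what `np.tile` does; `.T.ravel()` reads column by column, so each name has to repeat `len(df)` times, which is what `np.repeat` does. The sketch below uses the example data (with `to_numpy()` in place of `.values`) to check that both produce the same (product, sales) pairs, only in a different row order. The variable names `by_day` and `by_product` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product 1': [1, 2], 'Product 2': [3, 4], 'Product 3': [1, 1]})

# Row-major: every product on day 0, then every product on day 1.
by_day = pd.DataFrame({'a': np.tile(df.columns, len(df)),
                       'b': df.to_numpy().ravel()})

# Column-major: every day of Product 1, then Product 2, then Product 3.
by_product = pd.DataFrame({'a': np.repeat(df.columns, len(df)),
                           'b': df.to_numpy().T.ravel()})

# Same (product, sales) pairs, different row order.
same_pairs = (sorted(zip(by_day['a'], by_day['b']))
              == sorted(zip(by_product['a'], by_product['b'])))
print(same_pairs)  # True
```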

Timings:

np.random.seed(145)

#[258 rows x 2002 columns]
df = pd.DataFrame(np.random.randint(100, size=(258,2002))).add_prefix('Product ')
#print (df)

In [112]: %timeit pd.DataFrame({'a':np.tile(df.columns, len(df)), 'b':df.values.ravel()})
100 loops, best of 3: 12.6 ms per loop

In [113]: %timeit pd.DataFrame({'a':np.repeat(df.columns, len(df)), 'b':df.values.T.ravel()})
100 loops, best of 3: 10.8 ms per loop

In [114]: %timeit df.reset_index().melt(id_vars='index', var_name='product', value_name='sales')
100 loops, best of 3: 18 ms per loop

In [115]: %timeit df.stack().reset_index(level=0, drop=True).rename_axis('a').reset_index(name='b')
10 loops, best of 3: 27.8 ms per loop

In [116]: %timeit df.unstack().swaplevel().sort_index()
10 loops, best of 3: 156 ms per loop
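`%timeit` is an IPython magic, so it is not available in a plain Python script. A rough equivalent using the standard-library `timeit` module, on the same seeded 258 x 2002 frame as the timings above, is sketched here. The candidate labels and the choice of 5 calls per repeat are arbitrary; absolute numbers depend on the machine and on the pandas/NumPy versions, so no expected output is given.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape and seed as the IPython timings: 258 rows x 2002 columns.
np.random.seed(145)
df = pd.DataFrame(np.random.randint(100, size=(258, 2002))).add_prefix('Product ')

candidates = {
    'tile + ravel': lambda: pd.DataFrame({'a': np.tile(df.columns, len(df)),
                                          'b': df.values.ravel()}),
    'repeat + T.ravel': lambda: pd.DataFrame({'a': np.repeat(df.columns, len(df)),
                                              'b': df.values.T.ravel()}),
    'melt': lambda: df.reset_index().melt(id_vars='index', var_name='product',
                                          value_name='sales'),
    'stack': lambda: (df.stack().reset_index(level=0, drop=True)
                      .rename_axis('a').reset_index(name='b')),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, converted to milliseconds per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    results[name] = best * 1000
    print(f'{name:>18}: {results[name]:.1f} ms per call')
```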

EDIT: if the index holds dates, they can be kept as a third column by tiling the index:

d = {'Product 1': [1, 2], 'Product 2': [3, 4], 'Product 3': [1, 1]}
df = pd.DataFrame(data=d, index=pd.date_range('2015-01-04', periods=2))
print(df)
            Product 1  Product 2  Product 3
2015-01-04          1          3          1
2015-01-05          2          4          1

df = pd.DataFrame({'a': np.repeat(df.columns, len(df)),
                   'b': np.tile(df.index, len(df.columns)),
                   'c': df.values.T.ravel()})
print(df)
           a          b  c
0  Product 1 2015-01-04  1
1  Product 1 2015-01-05  2
2  Product 2 2015-01-04  3
3  Product 2 2015-01-05  4
4  Product 3 2015-01-04  1
5  Product 3 2015-01-05  1
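As an alternative that is not part of the original answer, `DataFrame.melt` can produce the same three columns once the date index has been moved into a regular column. A minimal sketch, assuming the index should be named `date` and the output columns `product` and `sales`; melt unpivots one column at a time, so its row order matches the `np.repeat`/`np.tile` version.

```python
import numpy as np
import pandas as pd

d = {'Product 1': [1, 2], 'Product 2': [3, 4], 'Product 3': [1, 1]}
df = pd.DataFrame(data=d, index=pd.date_range('2015-01-04', periods=2))

# Name the index, move it into a column, then unpivot the product columns.
long = (df.rename_axis('date')
          .reset_index()
          .melt(id_vars='date', var_name='product', value_name='sales'))

# NumPy construction from the answer, for comparison (column-major order).
arr = pd.DataFrame({'a': np.repeat(df.columns, len(df)),
                    'b': np.tile(df.index, len(df.columns)),
                    'c': df.values.T.ravel()})

print(long)
```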

Published: 2023-08-07 18:29:00
Article link: https://www.elefans.com/category/jswz/34/1465297.html