Reshaping non-numeric values in a pandas DataFrame

I've searched Google for an answer but haven't had any luck. I need to reshape a pandas DataFrame so that non-numeric values (comp_url) appear as the "values" in a multi-index DataFrame. Below is a sample of the data:

    store_name   sku     comp    price   ship   comp_url
    CSE          A1025   compA   30.99   9.99   some url
    CSE          A1025   compB   30.99   9.99   some url
    CSE          A1025   compC   30.99   9.99   some url

I have several store_name values, so I need the result to look like this:

    SKU      CSE                          store_name2
             comp_url   price   ship      comp_url   price   ship
    A1025    some url   30.99   9.99      some url   30.99   9.99

Any ideas or guidance would be appreciated!

Best answer

Assuming each SKU/store_name combination is unique, here is a working example:

    # imports
    import pandas as pd

    # Create a sample DataFrame.
    cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
    records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
               ['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'],
               ['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'],
               ['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']]
    df = pd.DataFrame.from_records(records, columns=cols)

    # Move both 'sku' and 'store_name' to the row index; the combination
    # of these two columns provides a unique identifier for each row.
    df.set_index(['sku', 'store_name'], inplace=True)

    # Move 'store_name' from the row index to the column index. Each
    # unique value in the 'store_name' index gets its own set of columns.
    # In the MultiIndex, 'store_name' will be below the existing column
    # labels.
    df = df.unstack(1)

    # To get 'store_name' above the other column labels, we simply
    # reorder the levels in the MultiIndex and sort it.
    df.columns = df.columns.reorder_levels([1, 0])
    df.sort_index(axis=1, inplace=True)

    # Show the result.
    df
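As a rough illustration of what the reshaped frame allows, the sketch below rebuilds the same sample data and reads values back out by store. The names wide, csb_url, and csa_block are made up for this example, and the chained swaplevel call is just one alternative way to put 'store_name' on top; it is not part of the answer above.

```python
import pandas as pd

# Same sample data as in the answer above.
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'],
           ['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'],
           ['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']]
df = pd.DataFrame.from_records(records, columns=cols)

# Chained variant of the same reshape: unstack 'store_name', then
# swap the two column levels so the store name ends up on top.
wide = (df.set_index(['sku', 'store_name'])
          .unstack('store_name')
          .swaplevel(0, 1, axis=1)
          .sort_index(axis=1))

# A single cell is addressed by a row label plus a (store, field) tuple.
csb_url = wide.loc['A1026', ('CSB', 'comp_url')]

# Selecting on the top column level returns one store's block of columns.
csa_block = wide['CSA']

print(csb_url)
print(csa_block)
```

The non-numeric comp_url column travels with the numeric ones because unstack only moves labels and cells.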

This works because the sku/store_name label combination is unique. When we use unstack(), we are just moving labels and cells around. We are not doing any aggregation. If we were doing something that didn't have unique labels and required aggregation, pivot_table() would probably be a better option.
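For the non-unique case, here is a minimal sketch of the pivot_table() route. The duplicated CSA/A1025 row is invented for illustration, and aggfunc='first' is only one possible choice: it keeps the first value seen per sku/store_name pair, which works for strings such as comp_url but silently discards the other rows.

```python
import pandas as pd

cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
# Hypothetical data: CSA/A1025 appears twice, so the labels are not unique.
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSA', 'A1025', 'compB', 31.99, 9.99, 'some url2'],
           ['CSB', 'A1025', 'compC', 32.99, 9.99, 'some url3']]
dup_df = pd.DataFrame.from_records(records, columns=cols)

# unstack() refuses to reshape when the index contains duplicates.
try:
    dup_df.set_index(['sku', 'store_name']).unstack(1)
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table() aggregates the duplicates instead; 'first' also works on
# non-numeric columns, unlike the default mean.
wide = dup_df.pivot_table(index='sku', columns='store_name',
                          values=['comp_url', 'price', 'ship'],
                          aggfunc='first')

# Put 'store_name' above the value labels, as in the answer.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(unstack_failed)
print(wide)
```

Whether 'first' is acceptable depends on the data; if every competitor row must be kept, adding 'comp' to the index so the labels become unique again may be the safer option.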
