Pandas常用基本操作

编程入门 行业动态 更新时间:2024-10-09 18:24:28

Pandas<a href=https://www.elefans.com/category/jswz/34/1769776.html style=常用基本操作"/>

Pandas常用基本操作

Pandas基本操作—第一节

以下是实验所用的数据表,需要数据表练习的请留言

本文的所有试验均基于jupyter notebook

使用Pandas读取csv文件并显示

import pandas as pd
#读取csv数据文件
food_info = pd.read_csv('food_info.csv')
#查看数据类型   DataFrame数据类型是pandas的核心数据类型之一
print(type(food_info))
#查看数据
food_info[0:10]  #仅显示前10行
<class 'pandas.core.frame.DataFrame'>
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
01001BUTTER WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.051.36821.0213.043215.0
11002BUTTER WHIPPED WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.050.48923.4263.012219.0
21003BUTTER OIL ANHYDROUS0.248760.2899.480.000.000.00.00...3069.0840.02.801.873.08.661.92428.7323.694256.0
31004CHEESE BLUE42.4135321.4028.745.112.340.00.50...721.0198.00.250.521.02.418.6697.7780.80075.0
41005CHEESE BRICK41.1137123.2429.683.182.790.00.51...1080.0292.00.260.522.02.518.7648.5980.78494.0
51006CHEESE BRIE48.4233420.7527.682.700.450.00.45...592.0174.00.240.520.02.317.4108.0130.826100.0
61007CHEESE CAMEMBERT51.8030019.8024.263.680.460.00.46...820.0241.00.210.418.02.015.2597.0230.72472.0
71008CHEESE CARAWAY39.2837625.1829.203.283.060.0NaN...1054.0271.0NaNNaNNaNNaN18.5848.2750.83093.0
81009CHEESE CHEDDAR37.1040624.0433.823.711.330.00.28...994.0263.00.780.624.02.919.3688.4281.433102.0
91010CHEESE CHESHIRE37.6538723.3730.603.604.780.0NaN...985.0233.0NaNNaNNaNNaN19.4758.6710.870103.0

10 rows × 36 columns

查看读入的csv文件的每一列的数据类型

print(food_info.dtypes)  
#返回每一列的数据类型  最常见的三种int64  float64  object(即string)
NDB_No               int64
Shrt_Desc           object
Water_(g)          float64
Energ_Kcal           int64
Protein_(g)        float64
Lipid_Tot_(g)      float64
Ash_(g)            float64
Carbohydrt_(g)     float64
Fiber_TD_(g)       float64
Sugar_Tot_(g)      float64
Calcium_(mg)       float64
Iron_(mg)          float64
Magnesium_(mg)     float64
Phosphorus_(mg)    float64
Potassium_(mg)     float64
Sodium_(mg)        float64
Zinc_(mg)          float64
Copper_(mg)        float64
Manganese_(mg)     float64
Selenium_(mcg)     float64
Vit_C_(mg)         float64
Thiamin_(mg)       float64
Riboflavin_(mg)    float64
Niacin_(mg)        float64
Vit_B6_(mg)        float64
Vit_B12_(mcg)      float64
Vit_A_IU           float64
Vit_A_RAE          float64
Vit_E_(mg)         float64
Vit_D_mcg          float64
Vit_D_IU           float64
Vit_K_(mcg)        float64
FA_Sat_(g)         float64
FA_Mono_(g)        float64
FA_Poly_(g)        float64
Cholestrl_(mg)     float64
dtype: object

显示前表格的前五行

#显示表格
food_info.head() #默认只显示前5行
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
01001BUTTER WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.051.36821.0213.043215.0
11002BUTTER WHIPPED WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.050.48923.4263.012219.0
21003BUTTER OIL ANHYDROUS0.248760.2899.480.000.000.00.00...3069.0840.02.801.873.08.661.92428.7323.694256.0
31004CHEESE BLUE42.4135321.4028.745.112.340.00.50...721.0198.00.250.521.02.418.6697.7780.80075.0
41005CHEESE BRICK41.1137123.2429.683.182.790.00.51...1080.0292.00.260.522.02.518.7648.5980.78494.0

5 rows × 36 columns

显示前表格的前几行

#当需要显示别的数目的行数时可以给head传参
food_info.head(2)  #显示两行
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
01001BUTTER WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.051.36821.0213.043215.0
11002BUTTER WHIPPED WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.050.48923.4263.012219.0

2 rows × 36 columns

显示表格的最后五行

#可以查看开头几行,当然也能查看尾几行
food_info.tail()#默认5行
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
861383110MACKEREL SALTED43.0030518.5025.1013.400.000.00.0...157.047.02.3825.21006.07.87.1488.3206.21095.0
861490240SCALLOP (BAY&SEA) CKD STMD70.2511120.540.842.975.410.00.0...5.02.00.000.02.00.00.2180.0820.22241.0
861590480SYRUP CANE26.002690.000.000.8673.140.073.2...0.00.00.000.00.00.00.0000.0000.0000.0
861690560SNAIL RAW79.209016.101.401.302.000.00.0...100.030.05.000.00.00.10.3610.2590.25250.0
861793600TURTLE GREEN RAW78.508919.800.501.200.000.00.0...100.030.00.500.00.00.10.1270.0880.17050.0

5 rows × 36 columns

显示表格的最后几行

food_info.tail(2)#只显示最后两行
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
861690560SNAIL RAW79.29016.11.41.32.00.00.0...100.030.05.00.00.00.10.3610.2590.25250.0
861793600TURTLE GREEN RAW78.58919.80.51.20.00.00.0...100.030.00.50.00.00.10.1270.0880.17050.0

2 rows × 36 columns

查看csv文件的所有列名

#显示列名
print(food_info.columns,type(food_info.columns))
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)','Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)','Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)','Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)','Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)','Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)','Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg','Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)','Cholestrl_(mg)'],dtype='object') <class 'pandas.core.indexes.base.Index'>

查看当前读入的csv文件的数据结构

#查看当前数据的结构多少行多少列
print(food_info.shape)
(8618, 36)

对当前的数据文件进行索引

#pands读入的csv文件也可以进行索引和切片但是要通过一个loc方法
food_info.loc[0]  #只显示第一行也就是csv文件中的列名那一行
NDB_No                         1001
Shrt_Desc          BUTTER WITH SALT
Water_(g)                     15.87
Energ_Kcal                      717
Protein_(g)                    0.85
Lipid_Tot_(g)                 81.11
Ash_(g)                        2.11
Carbohydrt_(g)                 0.06
Fiber_TD_(g)                      0
Sugar_Tot_(g)                  0.06
Calcium_(mg)                     24
Iron_(mg)                      0.02
Magnesium_(mg)                    2
Phosphorus_(mg)                  24
Potassium_(mg)                   24
Sodium_(mg)                     643
Zinc_(mg)                      0.09
Copper_(mg)                       0
Manganese_(mg)                    0
Selenium_(mcg)                    1
Vit_C_(mg)                        0
Thiamin_(mg)                  0.005
Riboflavin_(mg)               0.034
Niacin_(mg)                   0.042
Vit_B6_(mg)                   0.003
Vit_B12_(mcg)                  0.17
Vit_A_IU                       2499
Vit_A_RAE                       684
Vit_E_(mg)                     2.32
Vit_D_mcg                       1.5
Vit_D_IU                         60
Vit_K_(mcg)                       7
FA_Sat_(g)                   51.368
FA_Mono_(g)                  21.021
FA_Poly_(g)                   3.043
Cholestrl_(mg)                  215
Name: 0, dtype: object

对当前的csv文件进行切片操作

#对csv文件数据的切片
food_info.loc[0:5]  #查看前5行与food.head显示的一致
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
01001BUTTER WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.051.36821.0213.043215.0
11002BUTTER WHIPPED WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.050.48923.4263.012219.0
21003BUTTER OIL ANHYDROUS0.248760.2899.480.000.000.00.00...3069.0840.02.801.873.08.661.92428.7323.694256.0
31004CHEESE BLUE42.4135321.4028.745.112.340.00.50...721.0198.00.250.521.02.418.6697.7780.80075.0
41005CHEESE BRICK41.1137123.2429.683.182.790.00.51...1080.0292.00.260.522.02.518.7648.5980.78494.0
51006CHEESE BRIE48.4233420.7527.682.700.450.00.45...592.0174.00.240.520.02.317.4108.0130.826100.0

6 rows × 36 columns

food_info.loc[0:9:2]  #类似Python中的切片,food_info.loc[开始,结束,步长],这里显示前10行的偶数行
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
01001BUTTER WITH SALT15.877170.8581.112.110.060.00.06...2499.0684.02.321.560.07.051.36821.0213.043215.0
21003BUTTER OIL ANHYDROUS0.248760.2899.480.000.000.00.00...3069.0840.02.801.873.08.661.92428.7323.694256.0
41005CHEESE BRICK41.1137123.2429.683.182.790.00.51...1080.0292.00.260.522.02.518.7648.5980.78494.0
61007CHEESE CAMEMBERT51.8030019.8024.263.680.460.00.46...820.0241.00.210.418.02.015.2597.0230.72472.0
81009CHEESE CHEDDAR37.1040624.0433.823.711.330.00.28...994.0263.00.780.624.02.919.3688.4281.433102.0

5 rows × 36 columns

通过列索引取数据

#通过索引可以按行取数据,当然也可以通过列索引来取整列的数据
food_info['NDB_No'][0:5]  #只显示NDB_No这一列的前5条数据
0    1001
1    1002
2    1003
3    1004
4    1005
Name: NDB_No, dtype: int64

获取所有以g结尾的列名

#现在有一个需求要获取所有以g结尾的列名
columns = food_info.columns #这里得到的columns并不是list而是<class 'pandas.core.indexes.base.Index'>可以使用tolist方法转为list类型
print(columns.tolist(),type(columns.tolist()))
print('*'*100)
g_columns = []
for this_column in columns:if this_column.endswith('(g)'):g_columns.append(this_column)
print(g_columns)
food_info[g_columns][0:5]  #所有以g结尾的有29列  只显示前5行
['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)', 'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)', 'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)', 'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)', 'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg', 'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)', 'Cholestrl_(mg)'] <class 'list'>
****************************************************************************************************
['Water_(g)', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)']
Water_(g)Protein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)
0315.870.8581.112.110.060.00.0651.36821.0213.043
1315.870.8581.112.110.060.00.0650.48923.4263.012
2300.240.2899.480.000.000.00.0061.92428.7323.694
3342.4121.4028.745.112.340.00.5018.6697.7780.800
4341.1123.2429.683.182.790.00.5118.7648.5980.784

Pandas对csv文件的基本数学运算

#针对数据的基本数学运算 +|-|×|÷  与常数运算每一个值与常数运算,列与列运算,列的对应位置运算
#将Iron_(mg)这一列的数据转换成以g为单位
(food_info['Iron_(mg)']/1000)[0:5]  #只显示前五条数据
0    0.00002
1    0.00016
2    0.00000
3    0.00031
4    0.00043
Name: Iron_(mg), dtype: float64
#将Protein_(g)这一列的值全部加100
food_info['Protein_(g)'] += 100
food_info['Protein_(g)'][0:5]  #只显示前五行
0    100.85
1    100.85
2    100.28
3    121.40
4    123.24
Name: Protein_(g), dtype: float64
#将Lipid_Tot_(g)这一列的值全部减去10
food_info['Lipid_Tot_(g)'] -= 10
food_info['Lipid_Tot_(g)'][0:5]  #只显示前五行
0    71.11
1    71.11
2    89.48
3    18.74
4    19.68
Name: Lipid_Tot_(g), dtype: float64
#将water_(g)与Protein_(g)相乘
new_lipid = food_info['Lipid_Tot_(g)'] * food_info['Carbohydrt_(g)']
new_lipid[0:5]
0     4.2666
1     4.2666
2     0.0000
3    43.8516
4    54.9072
dtype: float64

按某一列的值排序操作

food_info.sort_values('Water_(g)',inplace=True)  #inplace = True不生成新的dataframe
food_info.sort_values('Water_(g)',inplace=False)[0:5]#inplace = False,默认为false生成新的dataframe只显示前5行
NDB_NoShrt_DescWater_(g)Energ_KcalProtein_(g)Lipid_Tot_(g)Ash_(g)Carbohydrt_(g)Fiber_TD_(g)Sugar_Tot_(g)...Vit_A_IUVit_A_RAEVit_E_(mg)Vit_D_mcgVit_D_IUVit_K_(mcg)FA_Sat_(g)FA_Mono_(g)FA_Poly_(g)Cholestrl_(mg)
6764544SHORTENING HOUSEHOLD LARD&VEG OIL400.09000.090.00.00.00.00.0...0.00.01.00.00.021.540.344.410.956.0
6644520FAT MUTTON TALLOW400.09020.090.00.00.00.00.0...0.00.02.80.728.00.047.340.67.8102.0
6654528OIL WALNUT400.08840.090.00.00.00.00.0...0.00.00.40.00.015.09.122.863.30.0
6664529OIL ALMOND400.08840.090.00.00.00.00.0...0.00.039.20.00.07.08.269.917.40.0
6674530OIL APRICOT KERNEL400.08840.090.00.00.00.00.0...0.00.04.0NaNNaNNaN6.360.029.30.0

5 rows × 36 columns

按某一列的值索引值排序返回索引与列值

food_info['Water_(g)'].sort_index(ascending=False)[0:5]  #对某一列索引值的排序  只显示前五行
8617    478.50
8616    479.20
8615    426.00
8614    470.25
8613    443.00
Name: Water_(g), dtype: float64

更多推荐

Pandas常用基本操作

本文发布于:2024-02-26 23:28:30,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1704376.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:常用   操作   Pandas

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!