Spaceship Titanic

Updated: 2024-10-17 16:19:53

Classifying the Spaceship Titanic dataset with XGBClassifier

Dataset: Kaggle Spaceship Titanic

Score:

Data preprocessing

import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')
import missingno as msn  # missing-value visualization

Load the datasets

test = pd.read_csv('./test.csv')
sample = pd.read_csv('./sample_submission.csv')
train = pd.read_csv('./train.csv')

Inspect the data

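print(train.isnull().sum())
train.info()  # info() prints its summary directly and returns None
# Visualize missing values
msn.matrix(train)
print(test.isnull().sum())
test.info()
# Visualize missing values
msn.matrix(test)

# Define the scoring function: 20-fold cross-validated accuracy
from sklearn.model_selection import cross_val_score

def get_score(model, X, y):
    n = cross_val_score(model, X, y, scoring='accuracy', cv=20)
    return n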
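
The post defines get_score but never calls it. As a minimal sketch of how such a helper could be used, the example below scores a classifier on synthetic data. The data from make_classification and the LogisticRegression model are stand-ins chosen for illustration only; they are assumptions, not part of the original workflow.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def get_score(model, X, y):
    """Return the per-fold accuracies from 20-fold cross-validation."""
    return cross_val_score(model, X, y, scoring='accuracy', cv=20)


# Synthetic stand-in data (assumption: NOT the Kaggle dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)

# Any estimator with fit/predict works here; LogisticRegression is a placeholder
scores = get_score(LogisticRegression(max_iter=1000), X_demo, y_demo)

print(scores.shape)                # one accuracy value per fold
print(scores.mean(), scores.std()) # average accuracy and its spread
```

Reporting the mean together with the standard deviation gives a rough sense of how stable the accuracy is across folds.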

Fill missing values

fill_col = ['HomePlanet', 'CryoSleep', 'Cabin', 'Destination', 'Age', 'VIP',
            'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'Name']

# Fill missing categorical values with the mode and missing numeric values with the mean
for s in fill_col:
    if s in train.columns:
        if train[s].dtype == object:
            fill_none = train[s].value_counts().index[0]
        else:
            fill_none = np.mean(train[s])
        train[s] = train[s].fillna(fill_none)

# Map the boolean columns to 0/1 first, so the generic loop below does not re-encode them
train['CryoSleep'] = train["CryoSleep"].map({False: 0, True: 1})
train['VIP'] = train["VIP"].map({False: 0, True: 1})
train['Transported'] = train["Transported"].map({False: 0, True: 1})
# Encode the remaining categorical columns as integers
for s in train.columns:
    if train[s].dtype == object:
        df_ob = {label: idx for idx, label in enumerate(set(train[s]))}
        train[s] = train[s].map(df_ob)

# Test set: forward-fill missing values
test = test.ffill()
test['CryoSleep'] = test["CryoSleep"].map({False: 0, True: 1})
test['VIP'] = test["VIP"].map({False: 0, True: 1})
# Encode the remaining categorical columns as integers
# (the mapping is rebuilt from the test data, so a label's code here can differ from its code in train)
for s in test.columns:
    if test[s].dtype == object:
        df_ob = {label: idx for idx, label in enumerate(set(test[s]))}
        test[s] = test[s].map(df_ob)
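
The code above fills and encodes train and test independently, so the same category label may receive different integer codes in the two sets. One possible alternative, sketched here under assumptions, is to compute the fill values and label mappings on the training data only and reuse them for the test data. The two small frames, the names fill_values and code_maps, and the choice of -1 for labels unseen in training are all hypothetical and are not taken from the original post.

```python
import pandas as pd

# Tiny hypothetical frames standing in for train.csv / test.csv
train_demo = pd.DataFrame({
    'HomePlanet': ['Earth', 'Mars', None, 'Earth'],
    'Age': [20.0, None, 40.0, 30.0],
})
test_demo = pd.DataFrame({
    'HomePlanet': ['Mars', None, 'Europa'],
    'Age': [None, 50.0, 10.0],
})

fill_values = {}  # column -> value used to fill gaps (learned from train)
code_maps = {}    # column -> {label: integer code} (learned from train)

for col in train_demo.columns:
    is_categorical = not pd.api.types.is_numeric_dtype(train_demo[col])
    if is_categorical:
        fill_values[col] = train_demo[col].mode()[0]  # most frequent label
    else:
        fill_values[col] = train_demo[col].mean()     # mean of observed values
    train_demo[col] = train_demo[col].fillna(fill_values[col])
    if is_categorical:
        # sorted() makes the label -> code mapping deterministic
        labels = sorted(set(train_demo[col]))
        code_maps[col] = {label: idx for idx, label in enumerate(labels)}
        train_demo[col] = train_demo[col].map(code_maps[col])

for col in test_demo.columns:
    # Reuse the statistics and mappings learned from the training data
    test_demo[col] = test_demo[col].fillna(fill_values[col])
    if col in code_maps:
        # Labels that never appeared in train are coded as -1
        test_demo[col] = test_demo[col].map(code_maps[col]).fillna(-1).astype(int)

print(train_demo)
print(test_demo)
```

With this arrangement 'Mars' maps to the same code in both frames, and 'Europa', which appears only in the test frame, is flagged with -1 rather than silently receiving a code that collides with a training label.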

Model building

y = train['Transported']
X = train.drop(columns=['Transported'])

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=30)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

from bayes_opt import BayesianOptimization
from sklearn import metrics
from sklearn.model_selection import cross_val_predict, cross_validate
from xgboost import XGBClassifier

model = XGBClassifier(
    learning_rate=0.01,
    n_estimators=227,        # number of trees
    max_depth=4,             # maximum tree depth
    min_child_weight=1,      # minimum sum of instance weights in a leaf
    gamma=5,                 # minimum loss reduction required to make a split
    subsample=1.0,           # use all samples to build each tree
    colsample_bytree=0.76,   # fraction of features sampled for each tree
    scale_pos_weight=1,      # class-imbalance weighting (1 = no reweighting)
    random_state=27,         # random seed
    verbosity=0,
)
model.fit(X_train, y_train)

# The model was trained on scaled features, so scale the test set the same way before predicting
rf_grid_1_best = model.predict(scaler.transform(test))
sample['Transported'] = rf_grid_1_best.astype(bool)
sample.to_csv('submission1.csv', index=False)
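
The split above creates X_test and y_test, but the post never scores the model on them. A minimal sketch of a hold-out check before submitting might look like the following. It uses synthetic data and scikit-learn's GradientBoostingClassifier as a stand-in so that the example depends on scikit-learn only; both are assumptions for illustration. XGBClassifier exposes the same fit/predict interface, so it could be substituted for the stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: NOT the Kaggle dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=30)
X_tr, X_ho, y_tr, y_ho = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=30
)

# Fit the scaler on the training part only, then apply it to the hold-out part
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_ho = scaler.transform(X_ho)

# Stand-in gradient-boosting model; hyperparameters here are illustrative
clf = GradientBoostingClassifier(n_estimators=50, max_depth=4, random_state=27)
clf.fit(X_tr, y_tr)

# Accuracy on rows the model has not seen during training
holdout_pred = clf.predict(X_ho)
holdout_accuracy = accuracy_score(y_ho, holdout_pred)
print(f"hold-out accuracy: {holdout_accuracy:.3f}")
```

A hold-out score of this kind gives a local estimate to compare against the leaderboard score returned after submission.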

Submit the results to get the score

Published: 2024-03-13 15:31:56
Article link: https://www.elefans.com/category/jswz/34/1734290.html