Coogle Learning: LightGBM Task 3


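Task 3: Classification, Regression, and Ranking Tasks

Step 1: Learn the sklearn interface of LightGBM and import the classification, regression, and ranking models.
Step 2: Learn the native train interface of LightGBM.
Step 3: Binary classification
1. Use make_classification to create a binary classification dataset.
2. Train and predict with the sklearn interface.
3. Train and predict with the native train interface.
Step 4: Multiclass classification
1. Use make_classification to create a multiclass dataset.
2. Train and predict with the sklearn interface.
3. Train and predict with the native train interface.
Step 5: Regression
1. Use make_regression to create a regression dataset.
2. Train and predict with the sklearn interface.
3. Train and predict with the native train interface.

Steps 1 and 2 still use the iris dataset from Tasks 1 and 2; Steps 3, 4, and 5 use generated datasets.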

Step 1

import pandas as pd
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.datasets import make_regression
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the dataset
iris = datasets.load_iris()

# Split the raw data into training, test, and validation sets
train_data_all, test_data, train_y_all, test_y = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1, shuffle=True, stratify=iris.target)
train_data, val_data, train_y, val_y = train_test_split(
    train_data_all, train_y_all, test_size=0.2, random_state=1, shuffle=True, stratify=train_y_all)

The sklearn LGBMClassifier model

# sklearn LGBMClassifier model
params_sklearn = {
    'learning_rate':0.1,
    'max_bin':150,
    'num_leaves':32,    
    'max_depth':11,
    
    'reg_alpha':0.1,
    'reg_lambda':0.2,   
     
    'objective':'multiclass',
    'n_estimators':300,
    #'class_weight':weight
}

# Define the model
clf = lgb.LGBMClassifier(**params_sklearn)
# Train the model
clf.fit(train_data,train_y,early_stopping_rounds=10,eval_set=[(val_data,val_y)],verbose=10)
# Predict
y_pred = clf.predict(test_data)

[10]	valid_0's multi_logloss: 0.422237
[20]	valid_0's multi_logloss: 0.290667
[30]	valid_0's multi_logloss: 0.25248
[40]	valid_0's multi_logloss: 0.253147
[2 0 1 0 0 0 2 2 2 1 0 1 2 1 2 0 2 1 1 2 1 1 0 0 2 1 0 0 1 1]


D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\sklearn.py:726: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose' argument is deprecated and will be removed in a future release of LightGBM. "

# Compare with the ground truth
print('pred:',y_pred)
print('test:',test_y)

pred: [2 0 1 0 0 0 2 2 2 1 0 1 2 1 2 0 2 1 1 2 1 1 0 0 2 1 0 0 1 1]
test: [2 0 1 0 0 0 2 2 2 1 0 1 2 1 2 0 2 1 1 2 1 1 0 0 2 2 0 0 1 1]

Only one of the 30 predictions is wrong (29/30, about 96.7% accuracy), so the model is quite accurate on this test split.

The sklearn LGBMRegressor model

# No matching dataset has been loaded at this point, so the model is only defined, not trained
# sklearn LGBMRegressor model
params = {
        'num_leaves':54,
        'objective':'regression',
        'max_depth':18,
        'learning_rate':0.01,
        'boosting':'gbdt',
        'metric':'rmse',
        'lambda_l1':0.1
}
# Originally nthread=4, n_hobs=-1; n_hobs is a typo for n_jobs, and nthread is an alias of the same setting
reg = lgb.LGBMRegressor(**params, n_estimators=20000, n_jobs=4)

The sklearn LGBMRanker model

A commonly used model in recommender systems; see https://blog.csdn/wuzhongqiang/article/details/110521519

# sklearn LGBMRanker model
# For how to choose 'objective', see: https://github/xuetf/KDD_CUP_2020_Debiasing_Rush/issues/4

# Alternative keyword arguments, kept for reference (not used below):
# boosting_type='gbdt', num_leaves=31, reg_alpha=0.0, reg_lambda=1,
#         max_depth=-1, n_estimators=300, objective='binary',
#         subsample=0.7, colsample_bytree=0.7, subsample_freq=1,
#         learning_rate=0.01, min_child_weight=50, random_state=2018,
#         n_jobs=-1
params = {
        'num_leaves':54,
        'objective':'lambdarank',
        'max_depth':18,
        'learning_rate':0.01,
        'boosting':'gbdt',
        'metric':'rmse',  # 'ndcg' is the usual metric for lambdarank
        'lambda_l1':0.1
}
# Originally nthread=4, n_hobs=-1; n_hobs is a typo for n_jobs, and nthread is an alias of the same setting
rank = lgb.LGBMRanker(**params, n_estimators=20000, n_jobs=4)

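Step 2: Using LightGBM's native train interface

Note that with LightGBM's native train interface, the input data must first be wrapped in a Dataset before it is passed to the model.

# In LightGBM, lgb.train trains the model; the model parameters are passed in as a dict: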
params_naive={
    "learning_rate":0.1,    
    'max_bin':150,
    'num_leaves':32,
    "max_depth":11,

    "lambda_l1":0.1,
    "lambda_l2":0.2,

    "objective":"multiclass",    
    "num_class":3,
}


# Use the native interface
dtrain = lgb.Dataset(train_data,label=train_y)
dtest = lgb.Dataset(test_data,label=test_y)
dval = lgb.Dataset(val_data,label=val_y)

clf = lgb.train(params=params_naive,train_set=dtrain,valid_sets=[dtrain,dval],verbose_eval=10,early_stopping_rounds=10,num_boost_round=300)

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000053 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 78
[LightGBM] [Info] Number of data points in the train set: 96, number of used features: 4
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
... (the warning above is printed for every tree; repeats omitted)
Training until validation scores don't improve for 10 rounds
[10]	training's multi_logloss: 0.298963	valid_1's multi_logloss: 0.422237
[20]	training's multi_logloss: 0.122397	valid_1's multi_logloss: 0.290667
[30]	training's multi_logloss: 0.0638911	valid_1's multi_logloss: 0.25248
[40]	training's multi_logloss: 0.0380959	valid_1's multi_logloss: 0.253147
Early stopping, best iteration is:
[32]	training's multi_logloss: 0.057511	valid_1's multi_logloss: 0.24906


D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\engine.py:181: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\engine.py:239: UserWarning: 'verbose_eval' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose_eval' argument is deprecated and will be removed in a future release of LightGBM. "

# Predict
y_pred = clf.predict(test_data)
y_pred

array([[0.00854976, 0.01336626, 0.97808399],
       [0.9695985 , 0.02322582, 0.00717568],
       [0.1266051 , 0.84130907, 0.03208582],
       [0.98113984, 0.01159906, 0.0072611 ],
       [0.8987924 , 0.08175213, 0.01945547],
       [0.98113984, 0.01159906, 0.0072611 ],
       [0.01584902, 0.01109586, 0.97305512],
       [0.00854976, 0.01336626, 0.97808399],
       [0.01929656, 0.14943477, 0.83126867],
       [0.01583122, 0.80600597, 0.17816281],
       [0.8987924 , 0.08175213, 0.01945547],
       [0.01596415, 0.96674736, 0.01728849],
       [0.00854976, 0.01336626, 0.97808399],
       [0.0128839 , 0.97000256, 0.01711354],
       [0.00984623, 0.03916364, 0.95099013],
       [0.85542871, 0.12023735, 0.02433394],
       [0.00854976, 0.01336626, 0.97808399],
       [0.01230884, 0.97153018, 0.01616098],
       [0.1266051 , 0.84130907, 0.03208582],
       [0.02180155, 0.18923031, 0.78896814],
       [0.01236308, 0.97581151, 0.0118254 ],
       [0.01437657, 0.97083508, 0.01478835],
       [0.98113984, 0.01159906, 0.0072611 ],
       [0.9695985 , 0.02322582, 0.00717568],
       [0.00854976, 0.01336626, 0.97808399],
       [0.02056858, 0.9545397 , 0.02489172],
       [0.8987924 , 0.08175213, 0.01945547],
       [0.9695985 , 0.02322582, 0.00717568],
       [0.04605153, 0.9231903 , 0.03075817],
       [0.01224923, 0.96682535, 0.02092542]])
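Unlike the sklearn interface, the native booster returns one row of class probabilities per sample rather than class labels. Taking the index of the largest probability in each row recovers the label; the sketch below does this for the first three rows of the output above.

```python
import numpy as np

# First three probability rows copied from the output above
proba = np.array([
    [0.00854976, 0.01336626, 0.97808399],
    [0.9695985, 0.02322582, 0.00717568],
    [0.1266051, 0.84130907, 0.03208582],
])
labels = np.argmax(proba, axis=1)  # index of the largest probability in each row
print(labels)  # [2 0 1]
```

These labels agree with the first three entries of both `pred` and `test` printed in Step 1.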

Step 3: Binary classification task

Use make_classification to create a binary classification dataset

# Use make_classification to create a binary classification dataset
bi_class_data = make_classification(
                        n_samples=10000, n_features=20, n_informative=5, n_redundant=2,
                        n_repeated=0, n_classes=2, n_clusters_per_class=2, 
                        flip_y=0.4, class_sep=1.0, 
                        hypercube=True,shift=0.0, scale=1.0, 
                        shuffle=True, random_state=2022
                )
data = pd.DataFrame(bi_class_data[0])
label = pd.DataFrame(bi_class_data[1])

label.value_counts()

1    5055
0    4945
dtype: int64

from sklearn.model_selection import train_test_split

# Split the raw data into training, test, and validation sets
train_data_all,test_data,train_y_all,test_y = \
                train_test_split(data, label,test_size=0.2,random_state=1,shuffle=True,stratify=label)
train_data,val_data,train_y,val_y = \
                train_test_split(train_data_all, train_y_all,test_size=0.2,random_state=1,shuffle=True,stratify=train_y_all)

Train and predict with the sklearn interface

# sklearn LGBMClassifier model
params_sklearn = {
    'learning_rate':0.1,
    'max_bin':150,
    'num_leaves':32,    
    'max_depth':11,
    
    'reg_alpha':0.1,
    'reg_lambda':0.2,   
    'n_estimators':300,
}

# Define the model
clf = lgb.LGBMClassifier(**params_sklearn)
# Train the model
clf.fit(train_data,train_y,early_stopping_rounds=10,eval_set=[(val_data,val_y)],verbose=10)
# Predict
y_pred = clf.predict(test_data)
y_pred

[10]	valid_0's binary_logloss: 0.550373
[20]	valid_0's binary_logloss: 0.523356
[30]	valid_0's binary_logloss: 0.515842
[40]	valid_0's binary_logloss: 0.516377


D:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py:63: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(*args, **kwargs)
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\sklearn.py:726: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose' argument is deprecated and will be removed in a future release of LightGBM. "





array([1, 1, 1, ..., 0, 1, 0])

test_y

	0
8762	1
2447	1
9380	1
7209	1
8379	1
...	...
9013	0
1943	0
4228	0
6566	1
4712	1

2000 rows × 1 columns
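The DataConversionWarning in the output above appears because the labels are stored as a one-column DataFrame. Flattening them with `ravel()` gives the 1-D shape the warning asks for, and the same flattened array can be compared with the predictions. The values below are hypothetical stand-ins, not the real `test_y` and `y_pred`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the label DataFrame and the predicted labels
test_y = pd.DataFrame([1, 1, 0, 0, 1])
y_pred = np.array([1, 1, 1, 0, 1])

y_true = test_y.values.ravel()        # shape (5, 1) becomes (5,)
accuracy = (y_pred == y_true).mean()  # 4 of 5 correct
print(y_true.shape, accuracy)         # (5,) 0.8
```

Passing `train_y.values.ravel()` instead of `train_y` to `fit` should avoid the warning in the same way.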

Train and predict with the native train interface

# In LightGBM, lgb.train trains the model; the model parameters are passed in as a dict:
params_naive={
    "learning_rate":0.1,    
    'max_bin':150,
    'num_leaves':32,
    "max_depth":11,

    "lambda_l1":0.1,
    "lambda_l2":0.2,

    "objective":"multiclass",    # runs with num_class=2, though "binary" is the usual objective for two classes
    "num_class":2,
}


# Use the native interface
dtrain = lgb.Dataset(train_data,label=train_y)
dval = lgb.Dataset(val_data,label=val_y)

clf = lgb.train(params=params_naive,train_set=dtrain,valid_sets=[dtrain,dval],verbose_eval=10,early_stopping_rounds=10,num_boost_round=300)

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000498 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3000
[LightGBM] [Info] Number of data points in the train set: 6400, number of used features: 20
[LightGBM] [Info] Start training from score -0.704145
[LightGBM] [Info] Start training from score -0.682269
Training until validation scores don't improve for 10 rounds
[10]	training's multi_logloss: 0.533821	valid_1's multi_logloss: 0.549425
[20]	training's multi_logloss: 0.485421	valid_1's multi_logloss: 0.522554
[30]	training's multi_logloss: 0.454127	valid_1's multi_logloss: 0.516862
[40]	training's multi_logloss: 0.429026	valid_1's multi_logloss: 0.517027
Early stopping, best iteration is:
[32]	training's multi_logloss: 0.448519	valid_1's multi_logloss: 0.516316


D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\engine.py:181: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\engine.py:239: UserWarning: 'verbose_eval' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose_eval' argument is deprecated and will be removed in a future release of LightGBM. "

# 模型预测
y_pred = clf.predict(test_data)
y_pred

array([[0.24276812, 0.75723188],
       [0.17581516, 0.82418484],
       [0.23915198, 0.76084802],
       ...,
       [0.78755907, 0.21244093],
       [0.47517021, 0.52482979],
       [0.76431998, 0.23568002]])

Step 4: Multiclass classification task

Use make_classification to create a multiclass dataset

# Use make_classification to create a multiclass dataset
mul_class_data = make_classification(n_samples=10000, n_features=20, n_informative=4, n_redundant=2,
                        n_repeated=0, n_classes=5, n_clusters_per_class=2, weights=[0.05,0.1,0.1,0.5],
                        flip_y=0.4, class_sep=1.0, hypercube=True,shift=0.0, scale=1.0, 
                        shuffle=True, random_state=2018)
data = pd.DataFrame(mul_class_data[0])
label = pd.DataFrame(mul_class_data[1])

label.value_counts()

3    3789
4    2266
2    1418
1    1413
0    1114
dtype: int64
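The class counts above can be roughly explained from the arguments. Only four weights are given for five classes, so the last weight is inferred as the remainder, and `flip_y=0.4` reassigns about 40% of the labels at random. The arithmetic below is a sketch that assumes the reassigned labels are drawn uniformly over all classes, which is consistent with the observed counts.

```python
import numpy as np

n_samples, n_classes, flip_y = 10000, 5, 0.4
weights = np.array([0.05, 0.1, 0.1, 0.5])
weights = np.append(weights, 1.0 - weights.sum())  # inferred weight of the last class: 0.25

# A sample keeps its class with probability 1 - flip_y; otherwise it is given
# a class drawn uniformly from all n_classes (assumption stated above).
expected = n_samples * ((1 - flip_y) * weights + flip_y / n_classes)
print(np.round(expected).astype(int))  # [1100 1400 1400 3800 2300]
```

The observed counts (1114, 1413, 1418, 3789, 2266) are close to these expectations. With this much label noise, a high multi_logloss in the training runs that follow is to be expected.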

from sklearn.model_selection import train_test_split

# Split the raw data into training, test, and validation sets
train_data_all,test_data,train_y_all,test_y = \
                train_test_split(data, label,test_size=0.2,random_state=1,shuffle=True,stratify=label)
train_data,val_data,train_y,val_y = \
                train_test_split(train_data_all, train_y_all,test_size=0.2,random_state=1,shuffle=True,stratify=train_y_all)

Train and predict with the sklearn interface

# sklearn LGBMClassifier model
params_sklearn = {
    'learning_rate':0.1,
    'max_bin':150,
    'num_leaves':32,    
    'max_depth':11,
    
    'reg_alpha':0.1,
    'reg_lambda':0.2,   
     
    'objective':'multiclass',
    'n_estimators':300,
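    # The original run also passed 'n_class': 5, which is not a LightGBM parameter
    # (hence the "Unknown parameter: n_class" warning in the output);
    # LGBMClassifier infers the number of classes from the labels.
    #'class_weight':weight
}

# Define the model
clf = lgb.LGBMClassifier(**params_sklearn)
# Train the model
clf.fit(train_data,train_y,early_stopping_rounds=10,eval_set=[(val_data,val_y)],verbose=10)
# Predict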
y_pred = clf.predict(test_data)

[LightGBM] [Warning] Unknown parameter: n_class
[10]	valid_0's multi_logloss: 1.32287
[20]	valid_0's multi_logloss: 1.3157
[30]	valid_0's multi_logloss: 1.31711


D:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py:63: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(*args, **kwargs)
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\sklearn.py:726: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose' argument is deprecated and will be removed in a future release of LightGBM. "

y_pred

array([3, 3, 0, ..., 3, 3, 3])

test_y

	0
5261	1
3330	3
4423	0
5602	3
2454	0
...	...
993	2
3640	3
6746	0
4433	2
4354	3

2000 rows × 1 columns

Train and predict with the native train interface

# In LightGBM, lgb.train trains the model; the model parameters are passed in as a dict:
params_naive={
    "learning_rate":0.1,    
    'max_bin':150,
    'num_leaves':32,
    "max_depth":11,

    "lambda_l1":0.1,
    "lambda_l2":0.2,

    "objective":"multiclass",    
    "num_class":5
}


# Use the native interface
dtrain = lgb.Dataset(train_data,label=train_y)
dval = lgb.Dataset(val_data,label=val_y)

clf = lgb.train(params=params_naive,train_set=dtrain,valid_sets=[dtrain,dval],verbose_eval=10,early_stopping_rounds=10,num_boost_round=300)

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000684 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3000
[LightGBM] [Info] Number of data points in the train set: 6400, number of used features: 20
[LightGBM] [Info] Start training from score -2.194572
[LightGBM] [Info] Start training from score -1.956118
[LightGBM] [Info] Start training from score -1.953911
[LightGBM] [Info] Start training from score -0.970466
[LightGBM] [Info] Start training from score -1.484734
Training until validation scores don't improve for 10 rounds
[10]	training's multi_logloss: 1.16251	valid_1's multi_logloss: 1.32287
[20]	training's multi_logloss: 1.01795	valid_1's multi_logloss: 1.3157
[30]	training's multi_logloss: 0.907985	valid_1's multi_logloss: 1.31711
Early stopping, best iteration is:
[22]	training's multi_logloss: 0.992725	valid_1's multi_logloss: 1.31435


D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\engine.py:181: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
D:\ProgramData\Anaconda3\lib\site-packages\lightgbm\engine.py:239: UserWarning: 'verbose_eval' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose_eval' argument is deprecated and will be removed in a future release of LightGBM. "

# Predict
y_pred = clf.predict(test_data)
y_pred

array([[0.06573803, 0.24028913, 0.25260646, 0.28651964, 0.15484674],
       [0.12112858, 0.11112768, 0.11250295, 0.53089178, 0.12434901],
       [0.38740289, 0.15078851, 0.08953237, 0.19059779, 0.18167844],
       ...,
       [0.08005472, 0.06012911, 0.16478034, 0.56084711, 0.13418871],
       [0.10006992, 0.1113494 , 0.21564907, 0.43883464, 0.13409696],
       [0.06697997, 0.10993549, 0.08865547, 0.61475259, 0.11967648]])
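The multi_logloss reported in the training log is the average negative log of the probability assigned to the true class. As a worked illustration on three rows only, the sketch below takes the first three probability rows from the output above and the first three labels of `test_y` shown earlier (1, 3, 0).

```python
import numpy as np

# First three probability rows and the matching true labels, copied from the outputs above
proba = np.array([
    [0.06573803, 0.24028913, 0.25260646, 0.28651964, 0.15484674],
    [0.12112858, 0.11112768, 0.11250295, 0.53089178, 0.12434901],
    [0.38740289, 0.15078851, 0.08953237, 0.19059779, 0.18167844],
])
y_true = np.array([1, 3, 0])

y_hat = proba.argmax(axis=1)                    # predicted classes
p_true = proba[np.arange(len(y_true)), y_true]  # probability given to the true class
logloss = -np.log(p_true).mean()                # multi_logloss over these three rows
print(y_hat, logloss)
```

The predicted classes for these rows are 3, 3, 0, matching the start of the sklearn-interface predictions; the first row is misclassified because the true class 1 receives only about 0.24.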

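Step 5: Regression task

Use make_regression to create a regression dataset

n_samples: int, default=100
The number of samples.
n_features: int, default=100
The number of features.
n_informative: int, default=10
The number of informative features, i.e. the number of features used to build the linear model that generates the output.
n_targets: int, default=1
The number of regression targets, i.e. the dimension of the y output vector associated with a sample. By default, the output is a scalar.
bias: float, default=0.0
The bias term in the underlying linear model.
effective_rank: int, default=None
If not None: the approximate number of singular vectors required to explain most of the input data by linear combinations. Using this kind of singular spectrum in the input allows the generator to reproduce the correlations often observed in practice.
If None: the input set is well conditioned, centered, and Gaussian with unit variance.
tail_strength: float, default=0.5
The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None. When a float, it should be between 0 and 1.
noise: float, default=0.0
The standard deviation of the Gaussian noise applied to the output.
shuffle: bool, default=True
Shuffle the samples and the features.
coef: bool, default=False
If True, the coefficients of the underlying linear model are returned.
random_state: int, RandomState instance or None, default=None
Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls.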
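A few of these parameters can be checked directly. The sketch below uses a smaller sample count than the post and sets `coef=True` to inspect the underlying linear model; with `noise=0.0` and the default `bias=0.0`, the target should be an exact linear function of the features.

```python
import numpy as np
from sklearn.datasets import make_regression

X, y, coef = make_regression(n_samples=200, n_features=20, n_informative=4,
                             noise=0.0, coef=True, random_state=2022)
print(X.shape, y.shape)          # (200, 20) (200,)
print(np.count_nonzero(coef))    # only the 4 informative features have non-zero weights
print(np.allclose(X @ coef, y))  # y equals X times the coefficients when there is no noise
```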

# Use make_regression to create a regression dataset
reg_data = make_regression(
                        n_samples=10000, n_features=20, n_informative=4,
                        shuffle=True, random_state=2022)
data = pd.DataFrame(reg_data[0])
label = pd.DataFrame(reg_data[1])

# label.value_counts()

Train and predict with the sklearn interface

from sklearn.model_selection import train_test_split

# Split the raw data into training, test, and validation sets
train_data_all,test_data,train_y_all,test_y = \
                train_test_split(data, label,test_size=0.2,random_state=1,shuffle=True)
train_data,val_data,train_y,val_y = \
                train_test_split(train_data_all, train_y_all,test_size=0.2,random_state=1,shuffle=True)

params = {
            'num_leaves':54,
            'objective':'regression',
            'max_depth':18,
            'learning_rate':0.01,
            'boosting':'gbdt',
            'metric':'rmse',
            'lambda_l1':0.1
        }

# Originally nthread=4, n_hobs=-1; n_hobs is a typo for n_jobs, and nthread is an alias of the same setting
model = lgb.LGBMRegressor(**params, n_estimators=20000, n_jobs=4)

model.fit(train_data,train_y,
         eval_set=[(val_data, val_y)],
         eval_metric='rmse',
         verbose=400,early_stopping_rounds=200)

[400]	valid_0's rmse: 10.0853
[800]	valid_0's rmse: 7.47665
[1200]	valid_0's rmse: 7.39721
[1600]	valid_0's rmse: 7.37381
[2000]	valid_0's rmse: 7.3529
[2400]	valid_0's rmse: 7.34084





LGBMRegressor(boosting='gbdt', lambda_l1=0.1, learning_rate=0.01, max_depth=18,
              metric='rmse', n_estimators=20000, n_hobs=-1, nthread=4,
              num_leaves=54, objective='regression')

# Predict
y_pred = model.predict(test_data)

y_pred

array([-44.12178542, -66.2419684 ,  64.01033108, ..., 165.45905462,
       -92.42507017,  -3.32388838])

test_y

	0
9953	-54.013487
3850	-65.165979
4962	73.254042
3886	-10.609690
5437	-182.511371
...	...
3919	-121.955407
162	-5.252414
7903	174.163314
2242	-84.114293
2745	-6.554235

2000 rows × 1 columns
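The rmse metric in the training log is the square root of the mean squared difference between predictions and targets. As a small worked example, the sketch below applies the formula to the first three predictions and targets shown above; three rows are far too few to estimate the test error, so this only illustrates the calculation.

```python
import numpy as np

# First three predictions and targets copied from the outputs above
y_pred = np.array([-44.12178542, -66.2419684, 64.01033108])
y_true = np.array([-54.013487, -65.165979, 73.254042])

rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))  # root mean squared error
print(rmse)
```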

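Train and predict with the native train interface

# Build the datasets
lgb_train = lgb.Dataset(train_data,label=train_y)
lgb_val = lgb.Dataset(val_data,label=val_y)

# Parameters for training directly with lgb.train
# NOTE: 'binary' and 'binary_logloss' are classification settings, so the run logged below
# reports binary_logloss and predicts values in [0, 1] rather than regression targets.
# A regression fit would use 'objective': 'regression' and 'metric': 'rmse'.
params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

# generate feature names
# feature_name = ['feature_' + str(col) for col in range(num_feature)]

reg = lgb.train(params,
          lgb_train,
          num_boost_round=10,
          valid_sets=lgb_val,  # evaluate on the validation data
          # feature_name=feature_name,
          categorical_feature=[21])  # index 21 does not exist among the 20 features (0-19)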

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000764 seconds.
You can set `force_col_wise=true` to remove the overhead.
[1]	valid_0's binary_logloss: 0.668837
[2]	valid_0's binary_logloss: 0.653079
[3]	valid_0's binary_logloss: 0.631535
[4]	valid_0's binary_logloss: 0.600479
[5]	valid_0's binary_logloss: 0.572133
[6]	valid_0's binary_logloss: 0.545759
[7]	valid_0's binary_logloss: 0.535201
[8]	valid_0's binary_logloss: 0.519968
[9]	valid_0's binary_logloss: 0.510698
[10]	valid_0's binary_logloss: 0.489484

y_pre = reg.predict(test_data)

y_pre

array([0.3692487 , 0.41450911, 0.6040802 , ..., 0.67991224, 0.40550047,
       0.46091694])

