LightGBM (Light Gradient Boosting Machine) is Microsoft's open-source framework implementing the GBDT algorithm, with support for efficient parallel training. Its main advantages:
- Faster training speed
- Lower memory consumption
- Better accuracy
- Distributed support for processing massive datasets quickly
LightGBM is a gradient boosting framework that uses tree-based learning algorithms.
LightGBM grows trees leaf-wise (vertically), whereas most other algorithms grow level-wise (horizontally): instead of expanding a whole level of the tree at once, LightGBM picks the leaf whose split yields the largest loss reduction and grows it. For the same number of leaves, this leaf-wise strategy can reduce the loss more than a level-wise strategy.
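In practice this means the main complexity knob in LightGBM is num_leaves rather than max_depth. A minimal sketch of the commonly cited rule of thumb, num_leaves well below 2**max_depth (the specific values here are illustrative assumptions, not tuned results):
# Rule of thumb: for a leaf-wise tree roughly comparable to a
# level-wise tree of depth d, keep num_leaves < 2**d.
d = 7
params = {
    'max_depth': d,      # optional depth cap on leaf-wise growth
    'num_leaves': 70,    # well below 2**7 = 128, to limit overfitting
}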
Using lightgbm in its native form (import lightgbm as lgb)
import lightgbm as lgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the data
iris = load_iris()
data = iris.data
target = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)
print("Train data length:", len(X_train))
print("Test data length:", len(X_test))
# Convert to LightGBM Dataset format
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Parameters
params = {
    'task': 'train',
    'boosting_type': 'gbdt',      # boosting type
    'objective': 'regression',    # objective function
    'metric': {'l2', 'auc'},      # evaluation metrics
    'num_leaves': 31,             # number of leaves
    'learning_rate': 0.05,        # learning rate
    'feature_fraction': 0.9,      # fraction of features used to build each tree
    'bagging_fraction': 0.8,      # fraction of samples used to build each tree
    'bagging_freq': 5,            # k means perform bagging every k iterations
    'verbose': 1                  # <0: fatal only, =0: errors (warnings), >0: info
}
# Train the model (note: LightGBM >= 4.0 removed early_stopping_rounds; use the early_stopping callback instead, see below)
gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, early_stopping_rounds=5)
# Save the model
gbm.save_model('model.txt')
# Load the model
gbm = lgb.Booster(model_file='model.txt')
# Predict (using the best iteration found by early stopping)
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
# Evaluate the model
print('The rmse of prediction is:', mean_squared_error(y_test, y_pred) ** 0.5)
'''
Train data length: 120
Test data length: 30
[LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000019 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 89
[LightGBM] [Info] Number of data points in the train set: 120, number of used features: 4
[LightGBM] [Info] Start training from score 1.016667
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1] valid_0's auc: 0.9775 valid_0's l2: 0.548619
Training until validation scores don't improve for 5 rounds
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2] valid_0's auc: 1 valid_0's l2: 0.500157
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3] valid_0's auc: 1 valid_0's l2: 0.454786
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4] valid_0's auc: 1 valid_0's l2: 0.414112
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5] valid_0's auc: 1 valid_0's l2: 0.377665
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[6] valid_0's auc: 1 valid_0's l2: 0.346867
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[7] valid_0's auc: 1 valid_0's l2: 0.319188
Early stopping, best iteration is:
[2] valid_0's auc: 1 valid_0's l2: 0.500157
The rmse of prediction is: 0.7072175933903914
'''
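Note that LightGBM 4.0 removed the early_stopping_rounds keyword from lgb.train. A minimal sketch of the equivalent call for newer versions, reusing the same params, lgb_train, and lgb_eval defined above:
# Equivalent training call for LightGBM >= 4.0, using callbacks
gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=20,
    valid_sets=[lgb_eval],
    callbacks=[
        lgb.early_stopping(stopping_rounds=5),  # stop after 5 rounds without improvement
        lgb.log_evaluation(period=1),           # print evaluation results every round
    ],
)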
Using lightgbm through the sklearn interface (from lightgbm import LGBMRegressor)
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import joblib
# Load the data
iris = load_iris()
data = iris.data
target = iris.target
# Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)
# Train the model (as above, LightGBM >= 4.0 removed early_stopping_rounds from fit; pass the early_stopping callback via callbacks= instead)
gbm = LGBMRegressor(objective='regression', num_leaves=31, learning_rate=0.05, n_estimators=20)
gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], eval_metric='l1', early_stopping_rounds=5)
# Save the model
joblib.dump(gbm, 'loan_model.pkl')
# Load the model
gbm = joblib.load('loan_model.pkl')
# Predict (using the best iteration)
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration_)
# Evaluate the model
print('The rmse of prediction is:', mean_squared_error(y_test, y_pred) ** 0.5)
# Feature importances
print('Feature importances:', list(gbm.feature_importances_))
# Grid search for hyperparameter tuning
estimator = LGBMRegressor(num_leaves=31)
param_grid = {
'learning_rate': [0.01, 0.1, 1],
'n_estimators': [20, 40]
}
gbm = GridSearchCV(estimator, param_grid)
gbm.fit(X_train, y_train)
print('Best parameters found by grid search are:', gbm.best_params_)
'''
[1] valid_0's l1: 0.564611 valid_0's l2: 0.53568
Training until validation scores don't improve for 5 rounds
[2] valid_0's l1: 0.541868 valid_0's l2: 0.492686
[3] valid_0's l1: 0.520262 valid_0's l2: 0.45387
[4] valid_0's l1: 0.499592 valid_0's l2: 0.419784
[5] valid_0's l1: 0.475829 valid_0's l2: 0.383425
[6] valid_0's l1: 0.457481 valid_0's l2: 0.354883
[7] valid_0's l1: 0.436038 valid_0's l2: 0.324898
[8] valid_0's l1: 0.419327 valid_0's l2: 0.302255
[9] valid_0's l1: 0.399978 valid_0's l2: 0.27748
[10] valid_0's l1: 0.385154 valid_0's l2: 0.258424
[11] valid_0's l1: 0.37125 valid_0's l2: 0.240029
[12] valid_0's l1: 0.359304 valid_0's l2: 0.225339
[13] valid_0's l1: 0.344684 valid_0's l2: 0.208233
[14] valid_0's l1: 0.332142 valid_0's l2: 0.194488
[15] valid_0's l1: 0.320227 valid_0's l2: 0.182062
[16] valid_0's l1: 0.310099 valid_0's l2: 0.169595
[17] valid_0's l1: 0.30074 valid_0's l2: 0.16047
[18] valid_0's l1: 0.29047 valid_0's l2: 0.151185
[19] valid_0's l1: 0.280713 valid_0's l2: 0.142789
[20] valid_0's l1: 0.270687 valid_0's l2: 0.133844
Did not meet early stopping. Best iteration is:
[20] valid_0's l1: 0.270687 valid_0's l2: 0.133844
The rmse of prediction is: 0.36584694593602285
Feature importances: [9, 6, 44, 10]
Best parameters found by grid search are: {'learning_rate': 0.1, 'n_estimators': 40}
'''
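By default GridSearchCV refits the best configuration on the whole training set; a small sketch (a hypothetical continuation of the code above) of using that refit model directly:
# The refit best model is exposed as best_estimator_
best_gbm = gbm.best_estimator_
y_pred = best_gbm.predict(X_test)
print('Tuned rmse:', mean_squared_error(y_test, y_pred) ** 0.5)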
eval_metric (default: chosen according to the objective function):
rmse: root mean squared error
mae: mean absolute error
logloss: negative log-likelihood
error: binary classification error rate = #misclassified / #total. For prediction, a value > 0.5 is treated as the positive class, otherwise the negative class. error@t: a different decision threshold can be set via 't'
merror: multi-class error rate = #misclassified / #total
mlogloss: multi-class log loss
auc: area under the curve
map: mean average precision
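eval_metric also accepts a list, so several metrics can be tracked at once. A minimal sketch, reusing the sklearn-interface variables from the example above:
# Track several metrics at once on the validation set
gbm = LGBMRegressor(n_estimators=20)
gbm.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric=['rmse', 'mae'])          # 'mae' is LightGBM's alias for l1
print(gbm.evals_result_['valid_0'].keys())    # per-metric evaluation history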
Tuning tip 1: to improve accuracy, adjust num_leaves, max_depth, and learning_rate.
Tuning tip 2: to reduce overfitting, adjust max_bin and min_data_in_leaf.
Tuning tip 3: to reduce overfitting, add L1/L2 regularization.
Tuning tip 4: to reduce overfitting, use row (bagging) and column (feature) subsampling; a combined sketch of these controls follows below.
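A minimal sketch of these overfitting controls together in one native-API params dict (all values are illustrative assumptions and should be tuned per dataset):
# Illustrative anti-overfitting configuration (values are assumptions, not tuned results)
params = {
    'objective': 'regression',
    'num_leaves': 31,          # accuracy: more leaves = more model capacity
    'max_depth': 8,            # cap tree depth
    'learning_rate': 0.05,     # accuracy: a smaller rate usually needs more rounds
    'max_bin': 127,            # overfitting: fewer bins = coarser splits
    'min_data_in_leaf': 20,    # overfitting: require enough samples per leaf
    'lambda_l1': 0.1,          # overfitting: L1 regularization
    'lambda_l2': 0.1,          # overfitting: L2 regularization
    'bagging_fraction': 0.8,   # overfitting: row (data) subsampling
    'bagging_freq': 5,         # perform bagging every 5 iterations
    'feature_fraction': 0.8,   # overfitting: column (feature) subsampling
}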