SMAC Source Code Analysis

Updated: 2024-10-26 02:26:44


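Contents

Initialization and DOE
- Determining the number of initial samples
- DOE
- Wrapping the design vector as a Configuration
Surrogate model training and next-sample recommendation
- Converting the runhistory into tensors for model training
- Handling missing values caused by conditional parameters
- Building the surrogate model
- Acquisition function
- Local search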

As before, I use the official example: examples/SMAC4HPO_svm.py

Initialization and DOE

DOE: Design of Experiments

Determining the number of initial samples

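In this example self.init_budget = 12. The formula gives min(10 × 7, 0.25 × 50) = min(70, 12.5) = 12.5, and int() truncates that to 12. It is computed by: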

        self.init_budget = int(max(1, min(n_configs_x_params * n_params,(max_config_fracs * scenario.ta_run_limit))))

n_configs_x_params
Out[2]: 10
n_params
Out[3]: 7
max_config_fracs
Out[4]: 0.25
scenario.ta_run_limit
Out[5]: 50.0
scenario.ta_run_limit*0.25
Out[6]: 12.5

        n_configs_x_params: int
            how many configurations will be used at most in the initial design (X*D)
        max_config_fracs: float
            use at most X*budget in the initial design. Not active if a time limit is given.
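Below is a minimal sketch of the budget formula with the example's values plugged in. The default arguments 10 and 0.25 are simply the values printed above. The function name is hypothetical, and the time-limit case mentioned in the docstring is not modelled.

```python
def initial_design_budget(n_params, ta_run_limit, n_configs_x_params=10, max_config_fracs=0.25):
    """Number of initial-design configurations, following the formula quoted above.

    Defaults are the values printed in this example; the time-limit case
    mentioned in the docstring is not modelled here.
    """
    # At most X*D configurations, at most a fraction of the run budget, at least 1.
    return int(max(1, min(n_configs_x_params * n_params, max_config_fracs * ta_run_limit)))


print(initial_design_budget(n_params=7, ta_run_limit=50.0))  # 12
```

With few parameters and a large run limit, the X*D term dominates instead. For example, 2 parameters and 1000 runs give 20.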

DOE

smac.initial_design.sobol_design.SobolDesign#_select_configurations

sobol = sobol_seq.i4_sobol_generate(len(params) - constants, self.init_budget)

Let's see what this produces:

sobol
Out[6]: 
array([[0.5   , 0.5   , 0.5   , 0.5   , 0.5   , 0.5   , 0.5   ],
       [0.75  , 0.25  , 0.75  , 0.25  , 0.75  , 0.25  , 0.75  ],
       [0.25  , 0.75  , 0.25  , 0.75  , 0.25  , 0.75  , 0.25  ],
       [0.375 , 0.375 , 0.625 , 0.125 , 0.875 , 0.875 , 0.125 ],
       [0.875 , 0.875 , 0.125 , 0.625 , 0.375 , 0.375 , 0.625 ],
       [0.625 , 0.125 , 0.375 , 0.375 , 0.125 , 0.625 , 0.875 ],
       [0.125 , 0.625 , 0.875 , 0.875 , 0.625 , 0.125 , 0.375 ],
       [0.1875, 0.3125, 0.3125, 0.6875, 0.5625, 0.1875, 0.0625],
       [0.6875, 0.8125, 0.8125, 0.1875, 0.0625, 0.6875, 0.5625],
       [0.9375, 0.0625, 0.5625, 0.9375, 0.3125, 0.4375, 0.8125],
       [0.4375, 0.5625, 0.0625, 0.4375, 0.8125, 0.9375, 0.3125],
       [0.3125, 0.1875, 0.9375, 0.5625, 0.4375, 0.8125, 0.1875]])

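The first argument is the number of dimensions and the second is the number of samples (here self.init_budget = 12). Note that sobol_seq limits the number of dimensions to at most 40.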
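To see where these numbers come from, here is a pure-Python sketch of the Gray-code (Antonov-Saleev) Sobol construction, covering only the first two dimensions:

- Dimension 1 uses direction numbers 2^-k.
- Dimension 2 uses the primitive polynomial x + 1 with m_1 = 1.
- Like the output above, it starts at 0.5 rather than at the all-zero point.

This is an illustrative reimplementation, not sobol_seq's code. sobol_first_two_dims is a hypothetical name, and higher dimensions would need tabulated direction numbers.

```python
def sobol_first_two_dims(n):
    """Sketch: first two Sobol dimensions via the Gray-code construction.

    Direction numbers are kept as integers scaled by 2**BITS.
    Dim 1: m_k = 1 for all k, i.e. v_k = 2**-k.
    Dim 2: primitive polynomial x + 1, i.e. m_k = 2*m_{k-1} XOR m_{k-1}, m_1 = 1.
    """
    BITS = 30
    v1 = [1 << (BITS - k) for k in range(1, BITS + 1)]
    m = [1]
    for _ in range(BITS - 1):
        m.append((m[-1] << 1) ^ m[-1])
    v2 = [m[k - 1] << (BITS - k) for k in range(1, BITS + 1)]

    x = [0, 0]  # integer state; the all-zero point itself is never emitted
    points = []
    for i in range(n):
        # c = 0-based position of the rightmost zero bit of i
        c = 0
        while (i >> c) & 1:
            c += 1
        x[0] ^= v1[c]
        x[1] ^= v2[c]
        points.append((x[0] / 2**BITS, x[1] / 2**BITS))
    return points


for x1, x2 in sobol_first_two_dims(4):
    print(x1, x2)
```

For n = 12, the two coordinates reproduce the first two columns of the array printed above.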

Wrapping the design vector as a Configuration

sobol_seq.sobol_seq.i4_sobol_generate

cs
Out[11]: 
Configuration space object:
  Hyperparameters:
    C, Type: UniformFloat, Range: [0.001, 1000.0], Default: 1.0
    coef0, Type: UniformFloat, Range: [0.0, 10.0], Default: 0.0
    degree, Type: UniformInteger, Range: [1, 5], Default: 3
    gamma, Type: Categorical, Choices: {auto, value}, Default: auto
    gamma_value, Type: UniformFloat, Range: [0.0001, 8.0], Default: 1.0
    kernel, Type: Categorical, Choices: {linear, rbf, poly, sigmoid}, Default: poly
    shrinking, Type: Categorical, Choices: {true, false}, Default: true
  Conditions:
    coef0 | kernel in {'poly', 'sigmoid'}
    degree | kernel in {'poly'}
    gamma | kernel in {'rbf', 'poly', 'sigmoid'}
    gamma_value | gamma in {'value'}

cs.get_hyperparameters()
Out[17]: 
[C, Type: UniformFloat, Range: [0.001, 1000.0], Default: 1.0,
 kernel, Type: Categorical, Choices: {linear, rbf, poly, sigmoid}, Default: poly,
 shrinking, Type: Categorical, Choices: {true, false}, Default: true,
 coef0, Type: UniformFloat, Range: [0.0, 10.0], Default: 0.0,
 degree, Type: UniformInteger, Range: [1, 5], Default: 3,
 gamma, Type: Categorical, Choices: {auto, value}, Default: auto,
 gamma_value, Type: UniformFloat, Range: [0.0001, 8.0], Default: 1.0]

vector
Out[13]: array([0.5, 2. , 1. , 0.5, 0.5, 1. , 0.5])

conf
Out[15]: 
Configuration:
  C, Value: 500.0005
  coef0, Value: 5.0
  degree, Value: 3
  gamma, Value: 'value'
  gamma_value, Value: 4.00005
  kernel, Value: 'poly'
  shrinking, Value: 'false'

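From the output, the order of cs.get_hyperparameters() follows the condition hierarchy:

- The unconditional parameters come first (C, kernel, shrinking).
- Their children come next (coef0, degree, gamma).
- gamma_value comes last.

Within each level, the names appear in alphabetical order.

Categorical values are stored as indices starting from 0.

Numerical values are stored normalized to [0, 1].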
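To check these observations, here is a minimal sketch that turns a unit-cube design row into a vector and the vector back into values. It rests on three assumptions inferred from the numbers above, not taken from SMAC or ConfigSpace code:

- Categorical columns are mapped as int(u * n_choices).
- Floats are decoded linearly as lower + v * (upper - lower), with no log scale in this space.
- Integer parameters use a range widened by 0.49999 on each side.

The integer assumption also reproduces the 0.7000008 and 0.9000016 entries of the training matrix shown further down. SPACE, unit_row_to_vector, decode and snap_int are hypothetical names, and conditions are ignored.

```python
# Hypothetical, simplified description of the SVM example's space, listed in
# the order returned by cs.get_hyperparameters() above.
SPACE = [
    ("C", "float", (0.001, 1000.0)),
    ("kernel", "cat", ("linear", "rbf", "poly", "sigmoid")),
    ("shrinking", "cat", ("true", "false")),
    ("coef0", "float", (0.0, 10.0)),
    ("degree", "int", (1, 5)),
    ("gamma", "cat", ("auto", "value")),
    ("gamma_value", "float", (0.0001, 8.0)),
]


def unit_row_to_vector(row):
    """One Sobol row in [0, 1)^d -> vector; categoricals become choice indices."""
    return [float(int(u * len(spec))) if kind == "cat" else u
            for (_, kind, spec), u in zip(SPACE, row)]


def decode(vec):
    """Vector -> actual hyperparameter values (conditions ignored)."""
    conf = {}
    for (name, kind, spec), v in zip(SPACE, vec):
        if kind == "cat":
            conf[name] = spec[int(v)]
        elif kind == "float":
            lo, hi = spec
            conf[name] = lo + v * (hi - lo)
        else:
            # Assumed integer mapping: widen [lo, hi] by 0.49999 per side, then round.
            lo, hi = spec[0] - 0.49999, spec[1] + 0.49999
            conf[name] = round(lo + v * (hi - lo))
    return conf


def snap_int(v, lo, hi):
    """Assumed: round-trip an integer HP's unit value through its integer value."""
    a, b = lo - 0.49999, hi + 0.49999
    value = round(a + v * (b - a))
    return (value - a) / (b - a)


vec = unit_row_to_vector([0.5] * 7)
print(vec)
print(decode(vec))
```

The first Sobol row (all 0.5) yields the vector and the configuration printed above: C = 500.0005, kernel 'poly', shrinking 'false', and so on.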

Surrogate model training and next-sample recommendation

Converting the runhistory into tensors for model training

[self.runhistory.get_cost(config) for config in self.runhistory.get_all_configs()]
Out[18]: 
[0.07333333333333325,0.046666666666666634,0.6666666666666667,

smac.optimizer.smbo.SMBO#run

X, Y = self.rh2EPM.transform(self.runhistory)

smac.runhistory.runhistory2epm.RunHistory2EPM4LogScaledCost#transform_response_values

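The Y values are log-scaled.

The same file contains other classes. RunHistory2EPM4SqrtScaledCost, for example, first min-max scales the costs to [0, 1] and then takes the square root.

The SMAC paper states that log-scaling improves the model's performance (p. 7, reference 20).

Intuitively, both transforms are concave. They stretch out the small costs (the region near the best configurations) and compress the large ones. The log transform does this much more aggressively than the square root.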
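A small numeric illustration of this intuition uses the first three costs printed above (0.0733, 0.0467 and 0.6667). It measures what fraction of the transformed range is taken up by the gap between the two best costs. The transforms are applied to the raw costs without any min-max step, so this only illustrates concavity and is not SMAC's exact preprocessing. gap_fraction is a hypothetical helper.

```python
import math

# First three costs from the runhistory output above.
COSTS = [0.07333333333333325, 0.046666666666666634, 0.6666666666666667]


def gap_fraction(costs, transform):
    """Share of the transformed range occupied by the gap between the two lowest costs."""
    t = sorted(transform(c) for c in costs)  # transforms are increasing, so order is kept
    return (t[1] - t[0]) / (t[-1] - t[0])


raw = gap_fraction(COSTS, lambda c: c)
sqrt_scaled = gap_fraction(COSTS, math.sqrt)
log_scaled = gap_fraction(COSTS, math.log)
print(f"raw={raw:.3f} sqrt={sqrt_scaled:.3f} log={log_scaled:.3f}")
```

For these costs the fraction is roughly 0.04 on the raw scale, 0.09 after sqrt and 0.17 after log. The log transform therefore separates the two good configurations the most.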

Handling missing values caused by conditional parameters

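In AutoML's CASH problem (Combined Algorithm Selection and Hyperparameter optimization), describing the hyperparameter space raises three key issues:

- Heterogeneous types: the space contains both numerical and categorical parameters.
- Conditional parameters: whether a parameter is active depends on the value of another parameter (its parent).
- High dimensionality: auto-sklearn has 110 hyperparameters and Auto-WEKA has 768.

BOHB handles inactive parameters by imputing random values. TPE needs no imputation, because it models the densities along the tree-structured space.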

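Now that we understand how the X and Y for training the surrogate model are prepared, let's move on to recommending the next sample point.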

smac.optimizer.smbo.SMBO#choose_next

self.model.train(X, Y)

self.model
Out[2]: <smac.epm.rf_with_instances.RandomForestWithInstances at 0x7f786ce27940>

smac.epm.base_epm.AbstractEPM#train

return self._train(X, Y)

smac.epm.rf_with_instances.RandomForestWithInstances#_train

X = self._impute_inactive(X)

Before imputation:

X[:,:5]
Out[4]: 
array([[0.5      , 2.       , 1.       , 0.5      , 0.5      ],
       [0.75     , 1.       , 1.       ,       nan,       nan],
       [0.25     , 3.       , 0.       , 0.75     ,       nan],
       [0.375    , 1.       , 1.       ,       nan,       nan],
       [0.875    , 3.       , 0.       , 0.625    ,       nan],
       [0.625    , 0.       , 0.       ,       nan,       nan],
       [0.125    , 2.       , 1.       , 0.875    , 0.7000008],
       [0.1875   , 1.       , 0.       ,       nan,       nan],
       [0.6875   , 3.       , 1.       , 0.1875   ,       nan],
       [0.9375   , 0.       , 1.       ,       nan,       nan],
       [0.4375   , 2.       , 0.       , 0.4375   , 0.9000016],
       [0.3125   , 0.       , 1.       ,       nan,       nan]])

After imputation:

X[:,:5]
Out[3]: 
array([[ 0.5      ,  2.       ,  1.       ,  0.5      ,  0.5      ],
       [ 0.75     ,  1.       ,  1.       , -1.       , -1.       ],
       [ 0.25     ,  3.       ,  0.       ,  0.75     , -1.       ],
       [ 0.375    ,  1.       ,  1.       , -1.       , -1.       ],
       [ 0.875    ,  3.       ,  0.       ,  0.625    , -1.       ],
       [ 0.625    ,  0.       ,  0.       , -1.       , -1.       ],
       [ 0.125    ,  2.       ,  1.       ,  0.875    ,  0.7000008],
       [ 0.1875   ,  1.       ,  0.       , -1.       , -1.       ],
       [ 0.6875   ,  3.       ,  1.       ,  0.1875   , -1.       ],
       [ 0.9375   ,  0.       ,  1.       , -1.       , -1.       ],
       [ 0.4375   ,  2.       ,  0.       ,  0.4375   ,  0.9000016],
       [ 0.3125   ,  0.       ,  1.       , -1.       , -1.       ]])

Let's see how SMAC handles missing values (inactive parameters):

smac.epm.base_rf.BaseModel#_impute_inactive

Unlike BOHB, which fills missing values randomly, SMAC imputes fixed sentinel values:

                parents = self.configspace.get_parents_of(hp.name)
                if len(parents) == 0:
                    self.impute_values[idx] = None

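First it gets the parent nodes. The conditional configuration space is essentially a tree-structured space.
A node that has parents (i.e. is not a root) needs an impute value. A root has no parents, so its entry in the map is None.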

                else:
                    if isinstance(hp, CategoricalHyperparameter):
                        self.impute_values[idx] = len(hp.choices)
                    elif isinstance(hp, (UniformFloatHyperparameter, UniformIntegerHyperparameter)):
                        self.impute_values[idx] = -1
                    elif isinstance(hp, Constant):
                        self.impute_values[idx] = 1
                    else:
                        raise ValueError

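In short, every inactive entry is filled with a value that an active parameter can never take:

- CategoricalHyperparameter gets the out-of-range index len(hp.choices).
- UniformFloatHyperparameter and UniformIntegerHyperparameter get -1, whereas their normalized values lie in [0, 1].
- Constant gets 1. Constant parameters are presumably not meant to contribute to training anyway.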
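The rules above can be condensed into a small standard-library sketch. It picks a sentinel per hyperparameter and then replaces NaN entries column by column. It works on plain lists instead of NumPy arrays, and impute_value, impute_inactive and the kind strings are hypothetical names. It follows the branches quoted above rather than reproducing SMAC's code.

```python
import math


def impute_value(kind, has_parents, n_choices=None):
    """Sentinel for one hyperparameter, following the branches quoted above."""
    if not has_parents:
        return None                 # root parameters are always active
    if kind == "cat":
        return n_choices            # one past the last valid choice index
    if kind in ("float", "int"):
        return -1                   # outside the normalized [0, 1] range
    if kind == "const":
        return 1
    raise ValueError(kind)


def impute_inactive(X, impute_values):
    """Return a copy of X (list of rows) with NaN entries replaced column-wise."""
    out = []
    for row in X:
        new_row = list(row)
        for idx, fill in enumerate(impute_values):
            if fill is not None and math.isnan(new_row[idx]):
                new_row[idx] = float(fill)
        out.append(new_row)
    return out


# First five columns of the example: C, kernel, shrinking (roots), coef0, degree (children).
imp = [impute_value("float", False), impute_value("cat", False, 4),
       impute_value("cat", False, 2), impute_value("float", True), impute_value("int", True)]
nan = float("nan")
print(impute_inactive([[0.75, 1.0, 1.0, nan, nan], [0.25, 3.0, 0.0, 0.75, nan]], imp))
```

The two rows come out as [0.75, 1.0, 1.0, -1.0, -1.0] and [0.25, 3.0, 0.0, 0.75, -1.0], matching the "after imputation" matrix.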

Building the surrogate model

        if self.n_points_per_tree <= 0:
            self.rf_opts.num_data_points_per_tree = self.X.shape[0]
        else:
            self.rf_opts.num_data_points_per_tree = self.n_points_per_tree

self.n_points_per_tree
Out[4]: -1
self.X.shape[0]
Out[6]: 12

-1 means "all". Here each tree is given all the samples (12), although the features are subsampled (presumably about 83% of them).

        self.rf = regression.binary_rss_forest()
        self.rf.options = self.rf_opts
        data = self._init_data_container(self.X, self.y)
        self.rf.fit(data, rng=self.rng)
        return self

I could not follow this part yet. I plan to read the pyrfr source later and compare it with skopt's RF and ET (extra trees).

Acquisition function

Back to smac.optimizer.smbo.SMBO#choose_next:

self.acquisition_func.update(model=self.model, eta=incumbent_value, num_data=len(self.runhistory.data))
challengers = self.acq_optimizer.maximize(
    runhistory=self.runhistory,
    stats=self.stats,
    num_points=self.scenario.acq_opt_challengers,
    random_configuration_chooser=self.random_configuration_chooser
)

eta plays the role of y_opt in skopt (the incumbent's cost). update only refreshes member variables; the important part is maximize.

self.acq_optimizer
Out[7]: <smac.optimizer.ei_optimization.InterleavedLocalAndRandomSearch at 0x7fb1f2713cf8>

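The SMAC paper points out that earlier methods optimized the acquisition function with random search only. Reading skopt's source confirms this: with an RF model, skopt also maximizes the acquisition function by random sampling.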

Let's start with random search:

        next_configs_by_random_search_sorted = self.random_search._maximize(runhistory,stats,num_points,_sorted=True,)

smac.optimizer.ei_optimization.RandomSearch#_maximize

if _sorted:
    for i in range(len(rand_configs)):
        rand_configs[i].origin = 'Random Search (sorted)'
    return self._sort_configs_by_acq_value(rand_configs)

Sorting by acquisition value:

smac.optimizer.ei_optimization.AcquisitionFunctionMaximizer#_sort_configs_by_acq_value

acq_values = self.acquisition_function(configs)

self.acquisition_function
Out[8]: <smac.optimizer.acquisition.LogEI at 0x7fb1f27139e8>

Now to the LogEI computation:
smac.optimizer.acquisition.AbstractAcquisitionFunction#__call__

        X = convert_configurations_to_array(configurations)
        if len(X.shape) == 1:
            X = X[np.newaxis, :]
        acq = self._compute(X)

Next, the _compute function:
smac.optimizer.acquisition.LogEI#_compute

m, var_ = self.model.predict_marginalized_over_instances(X)

Let's see how the mean, and especially the variance, are computed:

smac.epm.rf_with_instances.RandomForestWithInstances#predict_marginalized_over_instances

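After a quick look, two things are worth noting:

- Predicting the variance is implemented natively in pyrfr.
- At prediction time, X contains NaN values.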
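pyrfr computes the variance natively. For intuition, one common way to form a random forest's predictive mean and variance is:

- Average the per-tree means.
- Add the spread between trees to the average within-leaf variance (the law of total variance).

This recipe is an assumption for illustration only; pyrfr's exact computation may differ. forest_mean_var and its inputs are hypothetical.

```python
from statistics import mean, pvariance


def forest_mean_var(tree_means, tree_vars=None):
    """Combine per-tree predictions for one input point.

    tree_means: each tree's prediction (e.g. the mean of its leaf).
    tree_vars: optional within-leaf variance per tree.
    Returns (mean, variance) via the law of total variance.
    """
    m = mean(tree_means)
    between = pvariance(tree_means, mu=m)           # spread between trees
    within = mean(tree_vars) if tree_vars else 0.0  # average noise inside leaves
    return m, between + within


print(forest_mean_var([1.0, 2.0, 3.0], [0.1, 0.2, 0.3]))
```

Without leaf variances, the variance reduces to the disagreement between trees. That disagreement is what lets a forest express uncertainty far from the training data.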

        def calculate_log_ei():
            # we expect that f_min is in log-space
            f_min = self.eta - self.par
            v = (f_min - m) / std
            return (np.exp(f_min) * norm.cdf(v)) - \
                (np.exp(0.5 * var_ + m) * norm.cdf(v - std))

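This log_ei computation differs substantially from skopt. The loss y is modeled on a log scale, while the expected improvement is computed for the original (exponentiated) cost.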
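To make the closed form concrete, the sketch below evaluates it with only the standard library. It checks the result against direct numerical integration of E[max(exp(f_min) - exp(Y), 0)] for Y ~ N(m, var). As the code comment says, it assumes that eta (and hence f_min) and the model mean m are in log space. The function names and the midpoint-rule check are illustrative, not SMAC's code.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_ei(m, var, eta, par=0.0):
    """EI in the original cost space when the model predicts log-cost ~ N(m, var).

    eta is the incumbent's log-cost; var must be positive.
    """
    std = math.sqrt(var)
    f_min = eta - par
    v = (f_min - m) / std
    return math.exp(f_min) * norm_cdf(v) - math.exp(0.5 * var + m) * norm_cdf(v - std)


def log_ei_numeric(m, var, eta, par=0.0, n=100_000):
    """Midpoint-rule estimate of E[max(exp(f_min) - exp(Y), 0)], Y ~ N(m, var)."""
    std = math.sqrt(var)
    f_min = eta - par
    lo = m - 12.0 * std  # the tail below this point is negligible
    if f_min <= lo:
        return 0.0
    h = (f_min - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        pdf = math.exp(-0.5 * ((y - m) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        total += (math.exp(f_min) - math.exp(y)) * pdf
    return total * h


print(log_ei(-1.0, 1.0, -0.5), log_ei_numeric(-1.0, 1.0, -0.5))
```

The two numbers agree to many decimal places. Increasing par lowers the value, since only larger improvements are counted.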

f_min = self.eta - self.par
Here the threshold par is subtracted to lower f_min slightly. As a result, only improvements of at least par over the incumbent count.

v = (f_min - m) / std
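This standardizes each sample's predictive distribution. For each sample, v is the standardized threshold for the event "its value falls below f_min", like the x-axis value in the figure below. The larger v is, the larger cdf(v), i.e. the more likely the sample is to fall below f_min. For the PI (probability of improvement) acquisition function, this probability would already be the answer. EI, however, computes an expectation rather than a probability.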

(np.exp(f_min) * norm.cdf(v))
The probability that the sample falls below f_min, multiplied by inverse_transform(f_min) (np.exp undoes the log scaling).

(np.exp(0.5 * var_ + m) * norm.cdf(v - std))
inverse_transform(mean + (1/2) × variance) × …

This formula hit a blind spot in my knowledge; I'll study it when I have time.
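The closed form can be derived as follows. The derivation assumes, as the code comment suggests, that the model predicts the log-cost Y at x as Gaussian N(m, σ²), with σ² = var_ and σ = std. It also assumes the improvement is measured in the original cost space. The second term is a lognormal partial expectation: e^{m+σ²/2} is the mean of a lognormal variable, which is where "mean + ½ variance" comes from.

```latex
% Y ~ N(m, \sigma^2),  f_{\min} = \eta - \mathrm{par},  v = (f_{\min} - m)/\sigma
\begin{aligned}
\mathrm{LogEI}(x)
  &= \mathbb{E}\big[\max\big(e^{f_{\min}} - e^{Y},\, 0\big)\big]
   = \int_{-\infty}^{f_{\min}} \big(e^{f_{\min}} - e^{y}\big)\,
     \frac{1}{\sigma}\,\phi\!\Big(\frac{y-m}{\sigma}\Big)\,dy \\
  &= e^{f_{\min}}\,\Phi(v)
   - \int_{-\infty}^{f_{\min}} e^{y}\,\frac{1}{\sigma}\,\phi\!\Big(\frac{y-m}{\sigma}\Big)\,dy .
\end{aligned}

% Completing the square in the exponent:
y - \frac{(y-m)^2}{2\sigma^2} = m + \frac{\sigma^2}{2} - \frac{(y-m-\sigma^2)^2}{2\sigma^2}
\;\Longrightarrow\;
e^{y}\,\frac{1}{\sigma}\,\phi\!\Big(\frac{y-m}{\sigma}\Big)
  = e^{m+\sigma^2/2}\,\frac{1}{\sigma}\,\phi\!\Big(\frac{y-m-\sigma^2}{\sigma}\Big)

% so the remaining integral is a shifted normal CDF:
\int_{-\infty}^{f_{\min}} e^{y}\,\frac{1}{\sigma}\,\phi\!\Big(\frac{y-m}{\sigma}\Big)\,dy
  = e^{m+\sigma^2/2}\,\Phi\!\Big(\frac{f_{\min}-m-\sigma^2}{\sigma}\Big)
  = e^{m+\sigma^2/2}\,\Phi(v-\sigma)

\mathrm{LogEI}(x) = e^{f_{\min}}\,\Phi(v) - e^{m+\sigma^2/2}\,\Phi(v-\sigma)
```

This matches calculate_log_ei term by term: np.exp(f_min) * norm.cdf(v) minus np.exp(0.5 * var_ + m) * norm.cdf(v - std).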

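Further reading: an introduction to the exploration and exploitation terms of the EI acquisition function, written the same way as in skopt.

Further reading: a detailed derivation of the EI acquisition function.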

Back to smac.optimizer.ei_optimization.AcquisitionFunctionMaximizer#_sort_configs_by_acq_value:

random = self.rng.rand(len(acq_values))
indices = np.lexsort((random.flatten(), acq_values.flatten()))
return [(acq_values[ind][0], configs[ind]) for ind in indices[::-1]]

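np.lexsort sorts by its last key first. acq_values is therefore the primary key, and the random numbers only break ties, so configurations with equal acquisition values are shuffled. The [::-1] then puts the highest acquisition value first.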
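A standard-library sketch of the same tie-breaking idea sorts indices by (acquisition value, random number) in descending order. This is what np.lexsort((random, acq_values))[::-1] amounts to. The function name and seed handling are illustrative.

```python
import random


def sort_by_acq(acq_values, seed=0):
    """Indices ordered by acquisition value (highest first), ties shuffled.

    Equivalent in spirit to np.lexsort((random, acq_values))[::-1]:
    acq_values is the primary key, the random numbers only break ties.
    """
    rng = random.Random(seed)
    tie_break = [rng.random() for _ in acq_values]
    return sorted(range(len(acq_values)),
                  key=lambda i: (acq_values[i], tie_break[i]),
                  reverse=True)


print(sort_by_acq([0.3, 0.7, 0.7, 0.1]))  # 1 and 2 first (random order), then 0, 3
```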

Local search

Let's look at the local search:

smac.optimizer.ei_optimization.LocalSearch#_maximize

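The paper describes this as well. It first samples random configurations and evaluates their acquisition values. It then runs a neighborhood search starting from the best random samples together with previously evaluated points.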

init_points = self._get_initial_points(num_points, runhistory, additional_start_points)

Let's see how it samples from the neighborhood:

smac.optimizer.ei_optimization.LocalSearch#_get_initial_points

configs_previous_runs_sorted = self._sort_configs_by_acq_value(configs_previous_runs)

This function first computes the acquisition values and then sorts by them, as analyzed above.

configs_previous_runs_sorted = [conf[1] for conf in configs_previous_runs_sorted[:num_points]]

Here num_points=10; the paper also mentions using 10 points.

conf_array = convert_configurations_to_array(configs_previous_runs)
costs = self.acquisition_function.model.predict_marginalized_over_instances(conf_array)[0]
random = self.rng.rand(len(costs))
indices = np.lexsort((random.flatten(), costs.flatten()))

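In addition, the previously evaluated configurations are ranked by the model's predicted cost (the mean returned by predict_marginalized_over_instances). The lowest cost comes first, again with random tie-breaking. The starting points therefore come from two separate rankings: one by acquisition value and one by predicted cost.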

            for cand in itertools.chain(
                    configs_previous_runs_sorted,
                    configs_previous_runs_sorted_by_cost,
                    additional_start_points,
            ):

This chains all the candidate lists together.
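Here is a minimal sketch of how the starting points appear to be assembled, based on the snippets above. It takes, in order:

- the top num_points previous configurations by acquisition value;
- the top num_points by predicted cost;
- any additional start points.

Skipping duplicates is an assumption of this sketch. initial_points, acq, pred_cost and extra are hypothetical names.

```python
import itertools
import random


def initial_points(configs, acq, pred_cost, num_points, extra=(), seed=0):
    """Assemble local-search start points from two rankings of previous configs."""
    rng = random.Random(seed)
    tie = [rng.random() for _ in configs]  # random tie-breakers
    idx = range(len(configs))
    # Highest acquisition value first.
    by_acq = sorted(idx, key=lambda i: (acq[i], tie[i]), reverse=True)[:num_points]
    # Lowest predicted cost first.
    by_cost = sorted(idx, key=lambda i: (pred_cost[i], tie[i]))[:num_points]
    out, seen = [], set()
    for cand in itertools.chain((configs[i] for i in by_acq),
                                (configs[i] for i in by_cost),
                                extra):
        if cand not in seen:  # assumed: duplicates are skipped
            out.append(cand)
            seen.add(cand)
    return out


print(initial_points(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.2],
                     [3.0, 2.0, 1.0, 0.5], 2, extra=["e"]))  # ['b', 'c', 'd', 'e']
```

"c" ranks well in both lists, which is why it only appears once.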

It turns out this step only collects the initial points; the neighborhood search itself has not happened yet.

Now the part that performs the neighborhood search:

smac.optimizer.ei_optimization.LocalSearch#_do_search

This function looks quite complex.

        for i, inc in enumerate(incumbents):
            neighborhood_iterators.append(get_one_exchange_neighbourhood(
                inc, seed=self.rng.randint(low=0, high=100000)))
            local_search_steps[i] += 1

The paper also mentions the one-exchange neighborhood (get_one_exchange_neighbourhood); it is worth a closer look.

from functools import partial

from ConfigSpace.util import get_one_exchange_neighbourhood
from smac.configspace.util import convert_configurations_to_array

get_one_exchange_neighbourhood = partial(get_one_exchange_neighbourhood, stdev=0.05, num_neighbors=8)

So this function actually comes from the ConfigSpace library; SMAC only fixes stdev=0.05 and num_neighbors=8 via partial.

get_one_exchange_neighbourhood returns an iterator.
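Here is a simplified sketch of the one-exchange idea, as read from the name and the stdev/num_neighbors parameters. Each neighbor changes exactly one hyperparameter of the incumbent:

- A categorical one is switched to each of its other choices.
- A numerical one (in the normalized [0, 1] space) is perturbed with Gaussian noise of standard deviation stdev, num_neighbors times.

This is an assumption-laden illustration, not ConfigSpace's implementation. The real function also deals with conditional parameters and other types, so its neighbor counts can differ. The sketch does not explain the 12 neighbors seen below. one_exchange_neighbourhood and the kinds encoding are hypothetical.

```python
import random


def one_exchange_neighbourhood(vector, kinds, num_neighbors=8, stdev=0.05, seed=0):
    """Yield neighbors of `vector` that differ in exactly one coordinate.

    kinds[i] is ("num",) for a numerical HP in [0, 1], or ("cat", n_choices)
    for a categorical HP stored as a choice index.
    """
    rng = random.Random(seed)
    order = list(range(len(vector)))
    rng.shuffle(order)  # visit hyperparameters in random order
    for i in order:
        if kinds[i][0] == "cat":
            for choice in range(kinds[i][1]):
                if choice != vector[i]:  # every other category
                    neighbor = list(vector)
                    neighbor[i] = choice
                    yield neighbor
        else:
            produced = 0
            while produced < num_neighbors:
                value = rng.gauss(vector[i], stdev)  # small Gaussian step
                if 0.0 <= value <= 1.0 and value != vector[i]:
                    neighbor = list(vector)
                    neighbor[i] = value
                    yield neighbor
                    produced += 1


it = one_exchange_neighbourhood([0.5, 0.3, 2.0], [("num",), ("num",), ("cat", 4)])
print(len(list(it)))  # 8 + 8 + 3 neighbors
print(len(list(it)))  # 0: the generator is already exhausted
```

Because it is a generator, calling list() on it consumes it. A second list() on the same object returns an empty list.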

len(list(neighborhood_iterators[0]))
Out[22]: 12
len(list(neighborhood_iterators[1]))
Out[23]: 12
len(list(neighborhood_iterators[-1]))
Out[24]: 12

num_neighbors is 8, yet 12 neighbors are returned here. This is worth investigating.

There is a lot more going on after this; I'll put it on hold for now.

I think I've covered most of it.

The DOE part deserves further study; RoBO's code could be read alongside it.


Published: 2024-03-07 05:18:24

Original link: https://www.elefans.com/category/jswz/34/1716943.html
