Ornstein-Uhlenbeck Action Noise
An OrnsteinUhlenbeckActionNoise object has the following numeric value
properties.
Property                    Description                                                                       Default Value
InitialAction               Initial value of action for noise model                                           0
Mean                        Noise model mean                                                                  0
MeanAttractionConstant      Constant specifying how quickly the noise model output is attracted to the mean   0.15
StandardDeviationDecayRate  Decay rate of the standard deviation                                              0
StandardDeviation           Noise model standard deviation                                                    0.3
StandardDeviationMin        Minimum standard deviation                                                        0
At each sample time step k, the noise value v(k) is
updated using the following formula, where Ts is the agent sample
time, and the initial value v(1) is defined by the InitialAction
parameter.
v(k+1) = v(k) + MeanAttractionConstant.*(Mean - v(k)).*Ts
+ StandardDeviation(k).*randn(size(Mean)).*sqrt(Ts)
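The update can be tried out in plain MATLAB without the toolbox. The following is a minimal sketch, assuming a scalar action, a hypothetical sample time Ts = 0.1, and the default property values; the names numSteps and v are illustrative and are not part of the OrnsteinUhlenbeckActionNoise object. The standard deviation is held constant here, which corresponds to a StandardDeviationDecayRate of 0.

```matlab
% Sketch of the Ornstein-Uhlenbeck noise update (illustrative, not toolbox code).
Ts = 0.1;                        % assumed agent sample time
numSteps = 500;                  % assumed number of steps to simulate

InitialAction = 0;               % default property values
Mean = 0;
MeanAttractionConstant = 0.15;
StandardDeviation = 0.3;         % constant, since the decay rate is taken as 0

v = zeros(1, numSteps);          % noise trajectory
v(1) = InitialAction;            % v(1) is set by InitialAction
for k = 1:numSteps-1
    % Pull toward Mean plus a Gaussian increment scaled by sqrt(Ts)
    v(k+1) = v(k) + MeanAttractionConstant.*(Mean - v(k)).*Ts ...
        + StandardDeviation.*randn(size(Mean)).*sqrt(Ts);
end
```

Plotting v against (0:numSteps-1)*Ts shows a random signal that wanders but keeps being pulled back toward Mean; a larger MeanAttractionConstant makes that pull stronger.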
At each sample time step, the standard deviation decays as shown in the following code.
decayedStandardDeviation = StandardDeviation(k).*(1 - StandardDeviationDecayRate);
StandardDeviation(k+1) = max(decayedStandardDeviation,StandardDeviationMin);
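As a rough illustration of the decay rule, the sketch below iterates it for a fixed number of steps. The decay rate, the minimum, and the step count are assumed values chosen only to make the lower bound visible; sd and kFloor are illustrative names.

```matlab
% Sketch of the standard deviation decay with a lower bound (assumed values).
StandardDeviationDecayRate = 0.01;   % assumed decay rate
StandardDeviationMin = 0.05;         % assumed minimum
numSteps = 300;                      % assumed number of steps

sd = zeros(1, numSteps);
sd(1) = 0.3;                         % default StandardDeviation
for k = 1:numSteps-1
    decayedStandardDeviation = sd(k).*(1 - StandardDeviationDecayRate);
    sd(k+1) = max(decayedStandardDeviation, StandardDeviationMin);
end

% Index of the first step at which the minimum is reached
kFloor = find(sd <= StandardDeviationMin, 1);
```

The sequence shrinks geometrically and then stays at StandardDeviationMin, so the noise never vanishes entirely when the minimum is greater than zero.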
You can calculate how many samples it takes for the standard deviation to halve using the
following formula, which holds for a decay rate between 0 and 1 and as long as
StandardDeviationMin is not reached first.
halflife = log(0.5)/log(1-StandardDeviationDecayRate);
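To see where the number comes from, the sketch below evaluates the formula for an assumed decay rate of 1e-3 and compares it with a direct iteration of the decay rule. The rate and the names sd0, sd, and n are illustrative, and the minimum standard deviation is taken to be low enough not to interfere.

```matlab
% Sketch: compare the half-life formula with a direct iteration (assumed rate).
StandardDeviationDecayRate = 1e-3;   % assumed decay rate
halflife = log(0.5)/log(1 - StandardDeviationDecayRate);   % about 692.8 samples

sd0 = 0.3;                           % default StandardDeviation
sd = sd0;
n = 0;                               % number of decay steps applied
while sd > 0.5*sd0
    sd = sd.*(1 - StandardDeviationDecayRate);
    n = n + 1;
end
% n is the first whole number of steps at or beyond the half-life
```

Because the formula returns a fractional sample count, the iteration stops at the next whole step, ceil(halflife).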
For continuous action signals, it is important to set the noise standard deviation
appropriately to encourage exploration. It is common to set
StandardDeviation*sqrt(Ts) to a value between 1% and 10% of
your action range.
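The guideline can be turned into a starting value by solving for StandardDeviation. The sketch below assumes hypothetical action limits of -2 and 2, a sample time of 0.1, and a 5% target; all of these, along with the names actionLowerLimit, actionUpperLimit, and fraction, are illustrative.

```matlab
% Sketch: choose StandardDeviation so that StandardDeviation*sqrt(Ts)
% is a given fraction of the action range (all values are assumptions).
actionLowerLimit = -2;               % hypothetical action limits
actionUpperLimit = 2;
Ts = 0.1;                            % assumed agent sample time
fraction = 0.05;                     % 5% target, inside the 1% to 10% guideline

actionRange = actionUpperLimit - actionLowerLimit;
StandardDeviation = fraction*actionRange/sqrt(Ts);   % about 0.632
```

The result is only a starting point for the StandardDeviation property; a smaller Ts calls for a larger StandardDeviation to keep the same per-step noise relative to the action range.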
If your agent converges on local optima too quickly, promote agent exploration by increasing
the amount of noise; that is, by increasing the standard deviation. Also, to increase
exploration, you can reduce the StandardDeviationDecayRate.