Gradient descent seems to fail

Updated: 2024-10-23 14:31:03
Problem description

I implemented a gradient descent algorithm to minimize a cost function, in order to obtain a hypothesis for determining whether an image is of good quality. I did this in Octave. The idea is loosely based on the algorithm from Andrew Ng's machine learning class.

So I have 880 values "y", containing values from 0.5 to ~12, and 880 values in "X" (from 50 to 300) that should predict the image's quality.

Sadly, the algorithm seems to fail: after some iterations the value of theta is so small that theta0 and theta1 become "NaN", and my linear regression curve has strange values...

Here is the code for the gradient descent algorithm (theta = zeros(2, 1); alpha = 0.01; iterations = 1500):

```matlab
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y); % number of training examples
  J_history = zeros(num_iters, 1);

  for iter = 1:num_iters
    tmp_j1 = 0;
    for i = 1:m,
      tmp_j1 = tmp_j1 + ((theta(1,1) + theta(2,1)*X(i,2)) - y(i));
    end

    tmp_j2 = 0;
    for i = 1:m,
      tmp_j2 = tmp_j2 + (((theta(1,1) + theta(2,1)*X(i,2)) - y(i)) * X(i,2));
    end

    tmp1 = theta(1,1) - (alpha * ((1/m) * tmp_j1))
    tmp2 = theta(2,1) - (alpha * ((1/m) * tmp_j2))
    theta(1,1) = tmp1
    theta(2,1) = tmp2

    % ============================================================
    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
  end
end
```

And here is the computation of the cost function:

```matlab
function J = computeCost(X, y, theta)
  % m = length(y); % number of training examples
  J = 0;
  tmp = 0;
  for i = 1:m,
    tmp = tmp + (theta(1,1) + theta(2,1)*X(i,2) - y(i))^2; % difference calculation
  end
  J = (1/(2*m)) * tmp
end
```

Solution

I think your computeCost function is wrong. I attended Ng's class last year and I have the following (vectorized) implementation:

```matlab
m = length(y);
J = 0;
predictions = X * theta;
sqrErrors = (predictions - y).^2;
J = 1/(2*m) * sum(sqrErrors);
```
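A quick sanity check of this vectorized cost on a tiny hand-made example (the values below are illustrative, not from the question's data):

```matlab
X = [1 1; 1 2; 1 3];           % first column is the intercept term
y = [1; 2; 3];
theta = [0; 1];                % hypothesis h(x) = x fits these points exactly
m = length(y);
predictions = X * theta;       % equals y exactly here
sqrErrors = (predictions - y).^2;
J = 1/(2*m) * sum(sqrErrors);  % J == 0 for a perfect fit
```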

The rest of the implementation seems fine to me, although you could vectorize it as well:

```matlab
theta_1 = theta(1) - alpha * (1/m) * sum((X*theta - y) .* X(:,1));
theta_2 = theta(2) - alpha * (1/m) * sum((X*theta - y) .* X(:,2));
```
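The two per-parameter updates above can also collapse into a single matrix expression, so the whole descent loop becomes a sketch like this (assuming X, y, theta, alpha and num_iters as defined in the question):

```matlab
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
  grad = (1/m) * (X' * (X*theta - y));   % gradient for both parameters at once
  theta = theta - alpha * grad;           % simultaneous update, no temporaries needed
  J_history(iter) = computeCost(X, y, theta);
end
```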

Afterwards, you correctly assign the temporary thetas (here called theta_1 and theta_2) back to the "real" theta.

In general, vectorizing instead of looping is more useful, and it is less annoying to read and debug.
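One common cause of the NaN behaviour described above (not addressed in the answer itself) is the unscaled feature range of 50 to 300 combined with alpha = 0.01, which lets the updates overshoot on every iteration. Mean normalization of the feature is the usual remedy; a minimal sketch, assuming X is the m×2 design matrix from the question:

```matlab
x = X(:,2);                   % raw feature values, roughly 50..300
mu = mean(x);
sigma = std(x);
X(:,2) = (x - mu) / sigma;    % scaled feature: roughly zero mean, unit variance
% Rerun gradientDescent on the scaled X; a prediction for a new raw value
% x_new must apply the same transform first: (x_new - mu) / sigma.
```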



Published: 2023-11-24 08:58:21
Source: https://www.elefans.com/category/jswz/34/1624559.html