In an attempt to further explore the keras-tf RNN capabilities and different parameters, I decided to solve a toy problem, as described below -
The idea behind it is that EWMA has a very clear and simple definition of how it uses the "history" of the sequence -
EWMA_t = (1 - alpha) * EWMA_(t-1) + alpha * x_t
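Unrolling the recurrence (a small derivation added here for clarity) shows why this is a function of the entire history: substituting the definition into itself gives

EWMA_t = alpha * x_t + alpha * (1 - alpha) * x_(t-1) + alpha * (1 - alpha)^2 * x_(t-2) + ...

so every past sample contributes, with a weight that decays geometrically by a factor of (1 - alpha) per step back in time.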
My assumption is that when looking at a simple RNN cell with a single neuron for the current input and a single neuron for the previous state, the (1 - alpha) part of the equation can directly be the weight of the previous hidden state, and the alpha part can be the weight of the current input, once the network is fully trained.
So, for example, for alpha = 0.2, I expect the weights of the network, once trained, to be:
Waa = [0.8] (weight parameter for the previous state)
Wxa = [0.2] (weight parameter for the current input)
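To make that hypothesis concrete before training anything, here is a small numpy sanity check (my addition, not from the original post) showing that a single linear recurrent unit with exactly those weights reproduces the EWMA:

import numpy as np

alpha = 0.2
signal = np.random.rand(100)

# Hand-set "trained" weights: the hypothesis above made explicit.
W_a = 1 - alpha  # weight on the previous hidden state (Waa)
W_x = alpha      # weight on the current input (Wxa)

h = np.mean(signal)  # run_avg() in the toy code below seeds the average the same way
rnn_out = []
for x in signal:
    h = W_a * h + W_x * x  # one linear RNN step == one EWMA update
    rnn_out.append(h)

# rnn_out should match run_avg(signal) from the toy code below, e.g.
# np.allclose(rnn_out, run_avg(signal)) -> True (for signals with no zeros or NaNs)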
I simulated the dataset and labels in a pretty straightforward way using numpy.
Currently I have implemented my own simple RNN with backpropagation. I used MSE for the loss and SGD, and it converges to the said parameters pretty fast. It works on a single input at a time.
I've tried different network configurations using keras and tensorflow, but none seem to hit the nail on the head. I am wondering what your best suggested way is to replicate the behavior of the toy RNN.
Here is my toy neural network -
import numpy as np

np.random.seed(1337)  # for reproducibility

def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)

X = np.random.rand(10000)
Y = run_avg(X)

def train(X, Y):
    W_a = np.random.rand()
    W_x = np.random.rand()
    b = np.random.rand()
    a = np.random.rand()  # the "hidden state"
    lr = 0.001
    for i in range(100):
        for x, y in zip(X, Y):
            y_hat = W_x * x + W_a * a + b
            L = (y - y_hat) ** 2
            # error-scaled gradients; the factor of -2 from d/dw (y - y_hat)^2
            # is absorbed into the learning rate and the '+' updates below
            dL_dW_a = (y - y_hat) * a
            dL_dW_x = (y - y_hat) * x
            dL_db = (y - y_hat) * 1
            W_a = W_a + dL_dW_a * lr
            W_x = W_x + dL_dW_x * lr
            b = b + dL_db * lr
            a = y_hat  # carry the prediction over as the next hidden state
        print("epoch ", str(i), " LOSS = ", L, " W_a = ", W_a, " W_x = ", W_x, " b = ", b)

train(X, Y)

A few remarks on the implementation, compared to the keras-tf SimpleRNN -
There is of course a lot more to be said about the nature of the EWMA algorithm, given that it holds information on the entire history of the sequence and not just a window. But to keep things short and to conclude: how would you go about predicting EWMA with a simple RNN, or any neural network for that matter?
How can I replicate the behavior of the toy neural network in keras?
Update: it seems the main problem preventing me from solving this was using "native" keras (import keras) rather than the tensorflow implementation (from tensorflow import keras). Posted a more specific question about it here.
Solution

The code for replicating the behavior of the toy neural network in keras is shown below:
from tensorflow import keras
from tensorflow.keras.models import Sequential
import numpy as np

np.random.seed(1337)  # for reproducibility

def run_avg(signal, alpha=0.2):
    avg_signal = []
    avg = np.mean(signal)
    for i, sample in enumerate(signal):
        if np.isnan(sample) or sample == 0:
            sample = avg
        avg = (1 - alpha) * avg + alpha * sample
        avg_signal.append(avg)
    return np.array(avg_signal)

def train():
    x = np.random.rand(3000)
    y = run_avg(x)
    x = np.reshape(x, (-1, 1, 1))
    y = np.reshape(y, (-1, 1))

    # SimpleRNN model
    model = Sequential()
    model.add(keras.layers.Dense(32, batch_input_shape=(1, 1, 1), dtype='float32'))
    model.add(keras.layers.SimpleRNN(1, stateful=True, activation=None, name='rnn_layer_1'))
    model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')
    model.summary()

    print(model.get_layer('rnn_layer_1').get_weights())
    model.fit(x=x, y=y, batch_size=1, epochs=10, shuffle=False)
    print(model.get_layer('rnn_layer_1').get_weights())

train()
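If the goal is to read alpha and (1 - alpha) directly off the trained weights, a stripped-down variant (my sketch, not part of the original answer) feeds the input straight into a single stateful, linear SimpleRNN unit, so that get_weights() on the layer returns a 1x1 input kernel (Wxa) and a 1x1 recurrent kernel (Waa):

from tensorflow import keras
from tensorflow.keras.models import Sequential
import numpy as np

# assumes run_avg() as defined in the solution code above
model = Sequential()
model.add(keras.layers.SimpleRNN(
    1,                            # one unit: one input weight, one recurrent weight
    batch_input_shape=(1, 1, 1),  # stateful RNNs need a fixed batch size
    stateful=True,                # carry the hidden state across samples, like the toy loop
    activation=None,              # linear cell, matching the EWMA recurrence
    name='rnn_layer_1'))
model.compile(optimizer=keras.optimizers.SGD(lr=0.1), loss='mse')

x = np.random.rand(3000)
y = run_avg(x)
model.fit(x=x.reshape(-1, 1, 1), y=y.reshape(-1, 1), batch_size=1, epochs=10, shuffle=False)

# if the hypothesis holds: kernel ~ [[0.2]] (Wxa), recurrent_kernel ~ [[0.8]] (Waa)
kernel, recurrent_kernel, bias = model.get_layer('rnn_layer_1').get_weights()
print(kernel, recurrent_kernel, bias)

Training with batch_size=1 and shuffle=False keeps the stateful hidden state aligned with the sequence order, mirroring the one-sample-at-a-time toy implementation.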