Stateful LSTM: When to Reset States?

Problem Description

Given X with dimensions (m samples, n sequences, k features), and y labels with dimensions (m samples, 0/1):

Suppose I want to train a stateful LSTM (going by the Keras definition, where stateful=True means that cell states are not reset between sequences per sample -- please correct me if I'm wrong!). Are states supposed to be reset on a per-epoch basis or a per-sample basis?

Example:

for e in range(epochs):
    for m in range(X.shape[0]):          # for each sample
        for n in range(X.shape[1]):      # for each sequence
            # train_on_batch for model...
        # model.reset_states()  (1) I believe this is 'stateful = False'?
    # model.reset_states()      (2) wouldn't this make more sense?
# model.reset_states()          (3) This is what I usually see...

In summary, I am not sure whether to reset states after each sequence or after each epoch (after all m samples in X have been trained).

Thank you very much for your advice.

Recommended Answer

If you use stateful=True, you would typically reset the state at the end of each epoch, or every couple of samples. If you want to reset the state after each sample, that would be equivalent to just using stateful=False.
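For the "every couple of samples" case, one way to do it is with a custom Keras callback, as in the minimal sketch below. This is my own illustration, not part of the original answer; the class name PeriodicStateReset and the reset interval are assumptions:

from keras.callbacks import Callback

class PeriodicStateReset(Callback):
    # Hypothetical helper (not a built-in Keras callback): resets the
    # model's cell states every `reset_every` batches.
    def __init__(self, reset_every):
        super(PeriodicStateReset, self).__init__()
        self.reset_every = reset_every

    def on_batch_end(self, batch, logs=None):
        # `batch` is 0-indexed, so this fires after every reset_every-th batch
        if (batch + 1) % self.reset_every == 0:
            self.model.reset_states()

With batch_size=1, "every couple of batches" and "every couple of samples" coincide, e.g. model.fit(X, y, batch_size=1, shuffle=False, callbacks=[PeriodicStateReset(reset_every=10)]).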

Regarding the loop you provided:

for e in range(epochs):
    for m in range(X.shape[0]):      # for each sample
        for n in range(X.shape[1]):  # for each sequence

Note that the dimensions of X are not exactly

(m samples, n sequences, k features)

The dimensions are actually

(batch size, number of timesteps, number of features)

Hence, you are not supposed to have the inner loop:

for n in range(X.shape[1])
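As a concrete illustration (the shapes here are my own arbitrary example, not from the answer), the whole dataset is a single 3D array, and the LSTM consumes each sample's timesteps internally rather than having you iterate over them in Python:

import numpy as np

# 32 samples (batch size), 10 timesteps per sequence, 4 features per timestep
X = np.zeros((32, 10, 4))
print(X.shape)  # (32, 10, 4)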

Now, regarding the loop

for m in range(X.shape[0])

Since Keras enumerates over the batches automatically, you don't have to implement this loop yourself either (unless you want to reset the states every couple of samples). So if you only want to reset at the end of each epoch, you need only the outer loop.

Here is an example of such an architecture (taken from this blog post):

from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size = 1
model = Sequential()
model.add(LSTM(16, batch_input_shape=(batch_size, X.shape[1], X.shape[2]),
               stateful=True))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

for i in range(300):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    # reset cell states at the end of each epoch
    model.reset_states()
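A design note on this snippet: shuffle=False is essential, since shuffling would change the sample order between batches and break the correspondence that the carried-over cell state relies on. Calling fit with epochs=1 inside an explicit Python loop is what creates a well-defined point at which to call reset_states() once per epoch.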
