What does the size of the GRU or LSTM cell in the TensorFlow seq2seq tutorial represent?

I'm working with the seq2seq model in the TensorFlow tutorials, and I'm having trouble understanding some of the details. One thing that confuses me is what the "size" of a cell represents. I think I have a high-level understanding of images like this one:

[encoder-decoder unrolling diagram from the tutorial, not reproduced here]

I believe this is showing that the output (final state) from the last step of the encoder is the input to the first step of the decoder. In this case, each box is the GRU or LSTM cell at a different time step in the sequence.
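
To check my understanding in code, here is a toy NumPy sketch of that unrolling (a plain tanh step stands in for a real GRU/LSTM, and all sizes are made up):

```python
import numpy as np

input_size, state_size = 3, 4          # toy sizes, not the tutorial's defaults
rng = np.random.default_rng(0)
W = rng.normal(size=(state_size + input_size, state_size))

def step(x, h):
    # one "box" in the unrolled picture: consume an input and the previous state
    return np.tanh(np.concatenate([h, x]) @ W)

h = np.zeros(state_size)
for x in rng.normal(size=(5, input_size)):   # 5 encoder time steps
    h = step(x, h)

decoder_h = h   # the decoder's first step starts from the encoder's final state
```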

I also think I understand, at a superficial level, diagrams like the ones in colah's blog post about LSTM and GRU cells. My understanding is that a "cell" is a neural network that feeds its output from one step back into itself, along with the new input, at the subsequent step. The gates control how much it "remembers" and "forgets."
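
In code, my mental model of one GRU step is something like the following NumPy sketch (gate equations as in colah's post, toy sizes):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, p):
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])              # update gate: how much to overwrite
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])              # reset gate: how much history to use
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate new state
    return (1 - z) * h + z * h_tilde                    # blend of "remember" and "forget"

input_size, state_size = 3, 4                           # toy sizes
rng = np.random.default_rng(0)
shapes = {"Wz": (input_size, state_size), "Uz": (state_size, state_size),
          "Wr": (input_size, state_size), "Ur": (state_size, state_size),
          "Wh": (input_size, state_size), "Uh": (state_size, state_size)}
p = {name: rng.normal(size=shape) for name, shape in shapes.items()}

h = gru_step(rng.normal(size=input_size), np.zeros(state_size), p)
print(h.shape)   # (4,) -- the cell's "size" is the length of h
```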

I think I am getting confused somewhere between this superficial, high-level understanding and the low-level details. It sounds like the "size" of a cell is the number of nodes in the sigmoid and tanh boxes. Is that correct? If so, how does that relate to the input size for the seq2seq model? For example, the default vocabulary size is 40,000 and the default cell size is 1024. How does the 40,000-element one-hot vocabulary vector for each step of the sequence get matched to the 1024-node internal cell size? Is that what the embedding wrapper does?

Most importantly, what effect would increasing or decreasing the size of the cell have? Would a larger cell be better at learning embeddings? Or at predicting outputs? Both?

Accepted answer

It sounds like the "size" of a cell is the number of nodes in the sigmoid and tanh boxes. Is that correct?

The size of the cell is the size of the RNN state vector h. In the case of an LSTM, it's also the size of c. It's not "the number of nodes" (I'm not sure what you mean by nodes).
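
Concretely, in TensorFlow 1.x (the API the tutorial uses), this size is the num_units argument of the cell constructor. A quick sketch, assuming TF 1.x:

```python
import tensorflow as tf  # assumes TensorFlow 1.x, as in the tutorial

gru = tf.nn.rnn_cell.GRUCell(num_units=1024)
lstm = tf.nn.rnn_cell.LSTMCell(num_units=1024)

print(gru.state_size)    # 1024 -- the size of h
print(lstm.state_size)   # LSTMStateTuple(c=1024, h=1024) -- both c and h have this size
```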

If so, how does that relate to the input size for the seq2seq model? For example, the default vocabulary size is 40,000 and the default cell size is 1024. How does the 40,000-element one-hot vocabulary vector for each step of the sequence get matched to the 1024-node internal cell size?

The input size for the model is independent of the state size. The two vectors (input and state) are concatenated and multiplied by a matrix of shape [state_size + input_size, state_size] to get the next state (simplified version).
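
Here is that simplified update in NumPy, with the tutorial's default state size (the input size is an illustrative value, not the tutorial's):

```python
import numpy as np

state_size = 1024   # the tutorial's default cell size
input_size = 512    # size of the embedded input; an assumed value for illustration

h = np.random.randn(state_size)    # previous state
x = np.random.randn(input_size)    # embedded input at this time step

W = np.random.randn(state_size + input_size, state_size) * 0.01
h_next = np.tanh(np.concatenate([h, x]) @ W)   # simplified; real GRU/LSTM cells add gates

print(h_next.shape)   # (1024,) -- always state_size, regardless of input_size
```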

Is that what the embedding wrapper does?

No. The embedding is the result of multiplying the one-hot input vector by a matrix of shape [vocab_size, input_size]; this happens before the state-update multiplication described above.
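
Conceptually, that multiplication is just a row lookup, which is what TensorFlow's embedding_lookup does in practice. A toy sketch (small sizes standing in for 40,000 and 1024):

```python
import numpy as np

vocab_size, input_size = 10, 4                 # stand-ins for 40,000 and 1024
E = np.random.randn(vocab_size, input_size)    # embedding matrix [vocab_size, input_size]

token_id = 7
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

embedded = one_hot @ E                         # what the math says
assert np.allclose(embedded, E[token_id])      # equivalent: just select row token_id
```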
