In regular Backpropagation Through Time (BPTT), the backward pass costs about as much as the forward pass, since both use each weight once per time step and per sequence. So, even for a sequence of length 1000, the full forward pass is done, followed by the full backward pass, and only then are the weights updated. In truncated BPTT, the weights are updated at every time step, and the backward pass is truncated to the most recent 30 or so time steps, or fewer [1]. Does this mean that with truncated BPTT the backward pass is about 30 times as expensive as the forward pass, or am I misunderstanding something?

[1] Haykin, Neural Networks: A Comprehensive Foundation.
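To make the arithmetic in the question concrete, here is a rough step-counting sketch. It assumes that one forward step and one backward step through the network each cost one unit, that hidden states are stored so nothing is recomputed, and that the cost of the weight update itself is ignored. The function names are hypothetical, chosen only for illustration.

```python
def full_bptt_cost(seq_len):
    """Unit-step counts (forward, backward) for full BPTT on one sequence.

    Assumption: each forward step and each backward step costs 1 unit.
    """
    forward = seq_len   # every time step is run forward once
    backward = seq_len  # the single backward pass visits every step once
    return forward, backward


def truncated_bptt_cost(seq_len, depth):
    """Unit-step counts (forward, backward) for truncated BPTT that
    updates the weights at every time step.

    At time t the backward pass goes back over the last min(t, depth)
    steps, so early steps have shorter backward passes.
    """
    forward = seq_len
    backward = sum(min(t, depth) for t in range(1, seq_len + 1))
    return forward, backward


print(full_bptt_cost(1000))           # (1000, 1000)
print(truncated_bptt_cost(1000, 30))  # (1000, 29565)
```

Under these assumptions the backward-to-forward ratio for a truncation depth of 30 comes out just under 30, approaching the truncation depth as the sequence gets longer.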