Bad size of gradInput in BiSequencerLM #418

Open
saztorralba opened this issue Jun 12, 2017 · 1 comment

@saztorralba

Hi

I'm trying to use BiSequencerLM to train a network on sequences of different lengths, but I'm running into an issue with the gradInput of the BiSequencerLM module. When a sequence is shorter than the previous one, self.gradInput in the function below keeps the number of elements of the previous sequence rather than of the current one (a minimal sketch that reproduces this follows the function).

function BiSequencerLM:updateGradInput(input, gradOutput)
   local nStep = #input

   self._mergeGradInput = self._merge:updateGradInput(self._mergeInput, gradOutput)
   self._fwdGradInput = self._fwd:updateGradInput(_.first(input, nStep - 1), _.last(self._mergeGradInput[1], nStep - 1))
   self._bwdGradInput = self._bwd:updateGradInput(_.last(input, nStep - 1), _.first(self._mergeGradInput[2], nStep - 1))

   -- add fwd rnn gradInputs to bwd rnn gradInputs
   for i=1,nStep do
      if i == 1 then
         self.gradInput[1] = self._fwdGradInput[1]
      elseif i == nStep then
         self.gradInput[nStep] = self._bwdGradInput[nStep-1]
      else
         self.gradInput[i] = nn.rnn.recursiveCopy(self.gradInput[i], self._fwdGradInput[i])
         nn.rnn.recursiveAdd(self.gradInput[i], self._bwdGradInput[i-1])
      end
   end
   return self.gradInput
end
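
Here's a minimal sketch of the kind of usage that triggers it for me (made-up module and sizes, untested as written, just to illustrate the stale length):

require 'rnn'

-- made-up sizes, for illustration only
local inputSize, hiddenSize = 4, 5
local brnn = nn.BiSequencerLM(nn.LSTM(inputSize, hiddenSize))

local function runSequence(seqLen)
   local inputs, gradOutputs = {}, {}
   for i = 1, seqLen do
      inputs[i] = torch.randn(inputSize)
   end
   local outputs = brnn:forward(inputs)
   for i = 1, seqLen do
      -- dummy gradients with the same shape as the outputs
      gradOutputs[i] = outputs[i]:clone():zero()
   end
   local gradInputs = brnn:backward(inputs, gradOutputs)
   print(seqLen, #gradInputs)  -- I expect these two numbers to match
end

runSequence(7)  -- prints 7   7
runSequence(5)  -- prints 5   7 : gradInput keeps the previous sequence's length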

I believe this is caused by self.gradInput not being recreated for the new sequence, so it keeps the length of the previous sequence. This causes an error when there are further modules to backpropagate through, because their gradOutput ends up with the wrong size (different from their input). The issue can be fixed by resetting gradInput to an empty table before the call to updateGradInput. I can do this by accessing the module from my code, but maybe it would be better to just add this line

self.gradInput={}

before the for loop (something equivalent would have to be done if working with Tensors instead of Tables).
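
In context, the patched function would read roughly like this (just the code quoted above with the one extra line; a sketch, not a tested patch):

function BiSequencerLM:updateGradInput(input, gradOutput)
   local nStep = #input

   self._mergeGradInput = self._merge:updateGradInput(self._mergeInput, gradOutput)
   self._fwdGradInput = self._fwd:updateGradInput(_.first(input, nStep - 1), _.last(self._mergeGradInput[1], nStep - 1))
   self._bwdGradInput = self._bwd:updateGradInput(_.last(input, nStep - 1), _.first(self._mergeGradInput[2], nStep - 1))

   -- start from an empty table so gradInput ends up with exactly nStep entries,
   -- even when the previous sequence was longer
   self.gradInput = {}

   -- add fwd rnn gradInputs to bwd rnn gradInputs
   for i=1,nStep do
      if i == 1 then
         self.gradInput[1] = self._fwdGradInput[1]
      elseif i == nStep then
         self.gradInput[nStep] = self._bwdGradInput[nStep-1]
      else
         self.gradInput[i] = nn.rnn.recursiveCopy(self.gradInput[i], self._fwdGradInput[i])
         nn.rnn.recursiveAdd(self.gradInput[i], self._bwdGradInput[i-1])
      end
   end
   return self.gradInput
end

The equivalent stop-gap from calling code is simply to do brnn.gradInput = {} (brnn being whatever variable holds the BiSequencerLM instance) right before calling backward, which is what I'm doing now.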

Or maybe this is expected behavior and I'm doing something wrong, in which case any advice is appreciated. Thanks!

@murthyrudra

Hi,
I'm facing the same issue when training a language model (sort of). Please suggest how I can get this resolved. Additionally, I'm using the optim package for optimization.
