
Unwritable object <userdata> at <?>.callback.self.resnet.DataLoader.threads.__gc__ #82

Open
alecwangcq opened this issue Oct 22, 2016 · 3 comments

Comments

@alecwangcq

Hi, I modified the code in fb.resnet.torch/dataloader.lua so that it reads data triplet by triplet, but I ran into a confusing error:

FATAL THREAD PANIC: (write) /home/haha/torch/install/share/lua/5.1/torch/File.lua:141: 
Unwritable object <userdata> at <?>.callback.self.resnet.DataLoader.threads.__gc__  

Below is my modified code:

function DataLoader:run()
   local threads = self.threads
   local size, batchSize = self.__size, self.batchSize
   local perm = torch.randperm(size)

   local tripletList = self:genTriplet()

   local idx, sample = 1, nil
   local function enqueue()
      while idx <= size and threads:acceptsjob() do
         local indices = perm:narrow(1, idx, math.min(batchSize, size - idx + 1))
         threads:addjob(
            function(indices, nCrops, tripletList)
               local sz = indices:size(1) * 3 -- 3x the previous size, since each index now yields a triplet
               local batch, imageSize
               local target = torch.IntTensor(sz)
               for i, idx in ipairs(indices:totable()) do

                  local idx_anchor = tripletList[idx][1]
                  local idx_positive = tripletList[idx][2]
                  local idx_negative = tripletList[idx][3]

                  local sample_anchor = _G.dataset:get(idx_anchor)   --get images
                  local sample_positive = _G.dataset:get(idx_positive)
                  local sample_negative = _G.dataset:get(idx_negative)


                  local input_anchor = _G.preprocess(sample_anchor.input)
                  local input_positive = _G.preprocess(sample_positive.input)
                  local input_negative = _G.preprocess(sample_negative.input)

                  if not batch then
                     imageSize = input_anchor:size():totable()
                     if nCrops > 1 then table.remove(imageSize, 1) end
                     batch = torch.FloatTensor(sz, nCrops, table.unpack(imageSize))
                  end
                  batch[(i-1)*2 + 1]:copy(input_anchor)
                  batch[(i-1)*2 + 2]:copy(input_positive)
                  batch[self.samples*self.blocks + i]:copy(input_negative)

                  target[(i-1)*2 + 1] = sample_anchor.target
                  target[(i-1)*2 + 2] = sample_positive.target
                  target[self.samples*self.blocks + i] = sample_negative.target

               end
               collectgarbage()
               return {
                  input = batch:view(sz * nCrops, table.unpack(imageSize)),
                  target = target,
               }
            end,
            function(_sample_)
              -- print ('WHAT????')
               sample = _sample_
            end,
            indices,
            self.nCrops,
            tripletList
         )
         idx = idx + batchSize
      end
   end

   local n = 0
   local function loop()
      enqueue()
      if not threads:hasjob() then
         return nil
      end
      threads:dojob()
      if threads:haserror() then
         threads:synchronize()
      end
      enqueue()
      n = n + 1
      return n, sample
   end

   return loop
end

Below is the original code:

function DataLoader:run()
   local threads = self.threads
   local size, batchSize = self.__size, self.batchSize
   local perm = torch.randperm(size)

   local idx, sample = 1, nil
   local function enqueue()
      while idx <= size and threads:acceptsjob() do
         local indices = perm:narrow(1, idx, math.min(batchSize, size - idx + 1))
         threads:addjob(
            function(indices, nCrops)
               local sz = indices:size(1)
               local batch, imageSize
               local target = torch.IntTensor(sz)
               for i, idx in ipairs(indices:totable()) do
                  local sample = _G.dataset:get(idx)
                  local input = _G.preprocess(sample.input)
                  if not batch then
                     imageSize = input:size():totable()
                     if nCrops > 1 then table.remove(imageSize, 1) end
                     batch = torch.FloatTensor(sz, nCrops, table.unpack(imageSize))
                  end
                  batch[i]:copy(input)
                  target[i] = sample.target
               end
               collectgarbage()
               return {
                  input = batch:view(sz * nCrops, table.unpack(imageSize)),
                  target = target,
               }
            end,
            function(_sample_)
               sample = _sample_
            end,
            indices,
            self.nCrops
         )
         idx = idx + batchSize
      end
   end

   local n = 0
   local function loop()
      enqueue()
      if not threads:hasjob() then
         return nil
      end
      threads:dojob()
      if threads:haserror() then
         threads:synchronize()
      end
      enqueue()
      n = n + 1
      return n, sample
   end

   return loop
end
@juesato

juesato commented Dec 3, 2016

This error message is saying that the DataLoader object can't be serialized. Because you reference `self` as an upvalue inside the job function, the whole object would have to be serialized and shipped to the worker thread. Serialization fails because the threads library doesn't know what type `self` refers to.

Relevant snippet:

         threads:addjob(
            function(indices, nCrops)
                  ...
                  batch[self.samples*self.blocks + i]:copy(input_negative)

I think you have two options here: (1) create the DataLoader object inside the thread, rather than trying to serialize an upvalue from the main thread, or (2) call require 'DataLoader' inside the thread before adding this job. To be clear, (2) means moving this code to another file.
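A common workaround in the spirit of (1) is to avoid capturing `self` at all: copy the plain fields the job needs into locals before calling `addjob`, so only serializable values cross the thread boundary. A minimal sketch against the code above (assuming `self.samples` and `self.blocks` are plain numbers; names taken from the reporter's snippet):

```lua
-- Sketch only: copy serializable fields out of `self` before addjob,
-- so the DataLoader (and its thread pool's __gc__) is never serialized.
local samples, blocks = self.samples, self.blocks  -- assumed plain numbers
threads:addjob(
   function(indices, nCrops, tripletList)
      -- `samples` and `blocks` are now number upvalues, which the
      -- threads library can serialize without touching the DataLoader
      local offset = samples * blocks
      -- ... build the batch, indexing negatives at batch[offset + i] ...
   end,
   function(_sample_) sample = _sample_ end,
   indices, self.nCrops, tripletList
)
```

`self.nCrops` is still fine as an argument here because it is evaluated on the main thread and passed by value, not captured as an upvalue.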

@mbcel

mbcel commented Dec 16, 2016

I have a similar issue. However, I already require all the files I think are necessary, and I still get this error.

Error:
FATAL THREAD PANIC: (addjob) /home/.../lua/5.1/torch/File.lua:141: Unwritable object <userdata> at <?>.callback.self.TrainManager.threadPool.__gc__

I think it's the threadPool object (= threads.Threads(...) ) in my trainManager object that somehow cannot be serialized.

My code:
I have a trainManager object (self) that contains all the objects needed for training, so I use it in the worker thread and afterwards pass the results back to the main thread.

I have a ThreadManager that initializes all the requires in the threads:

function ThreadManager:__init(options)
  self.threadPool = threads.Threads(
        options.threadNumber,
        function()
          require 'torch'
          local threads = require 'threads'
          require 'image'
          require 'dataManager'
          require 'trainManager'
        end,
        function(threadId)
          local seed = opt.manualSeed + threadId
          torch.manualSeed(seed)
          print("Start of new worker thread with id: " .. threadId)
        end)
end

I then pass the threadPool value to the TrainManager:

function TrainManager:__init(model, options, criterion, optimisationState, dataManager, threadPool)
  self.model = model
  self.options = options
  self.criterion = criterion
  self.optimisationState = optimisationState
  self.dataManager = dataManager
  self.threadPool = threadPool

  -- get model parameters
  local modelParameters, gradientParameters = model:getParameters()
  self.modelParameters = modelParameters
  self.gradientParameters = gradientParameters

  -- allocate gpu tensors
  self.batchInputs = torch.CudaTensor()
  self.batchLabels = torch.CudaTensor()
end

And here I add jobs to the worker thread where the code fails:

function TrainManager:test()
  local model = self.model
  local options = self.options
  local dataManager = self.dataManager
  local threadPool = self.threadPool

  cutorch.synchronize()

  --set dropouts to evaluation mode
  model:evaluate()


  local totalLoss = 0
  local setSize = dataManager:getValidationSize()
  for i=1,  math.ceil(setSize/options.batchSize) do
    -- builds up the testing queue
    local startIndex = (i - 1) * options.batchSize + 1
    local endIndex = math.min(i * options.batchSize , setSize)
    local batchNumber = i
    threadPool:addjob(
      function()
        -- runs on worker thread
        -- loads new test batch
        collectgarbage() -- free unused memory before allocating new batch
        local inputsCpu, labelsCpu =
                      dataManager:getValidationBatch(startIndex, endIndex)
        return self, inputsCpu, labelsCpu, batchNumber, totalLoss
      end,
      -- called on main thread after worker function finished
      function(self, inputsCpu, labelsCpu, batchNumber, totalLoss)
        -- TEST
      end
    )
  end
  ...
end
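For reference, the failing job above captures `self` (and therefore `threadPool`) as upvalues via `return self, ...`. A hedged sketch of the same job that keeps `self` on the main thread: the worker returns only plain data, and the end-callback, which runs on the main thread and is (as far as I know) not serialized, can still reach `self` directly:

```lua
-- Sketch only: return serializable data from the worker function and
-- keep `self` on the main thread. Upvalues here (dataManager,
-- startIndex, endIndex, batchNumber) are plain values or objects whose
-- classes were required in the thread init function.
threadPool:addjob(
  function()
    collectgarbage()
    local inputsCpu, labelsCpu =
        dataManager:getValidationBatch(startIndex, endIndex)
    -- do NOT return self: TrainManager holds threadPool, whose __gc__
    -- userdata cannot be written by torch serialization
    return inputsCpu, labelsCpu, batchNumber
  end,
  function(inputsCpu, labelsCpu, batchNumber)
    -- main thread: `self` can be used here directly as an upvalue,
    -- e.g. self.batchInputs:resize(inputsCpu:size()):copy(inputsCpu)
  end
)
```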

@jonathantompson
Contributor

jonathantompson commented Dec 17, 2016

+1 I am also seeing the same issue. It looks like the __gc__ handler is not being serialized properly?

Just to be clear: like marcel1991@, I am also correctly initializing all objects on thread startup. The problem seems to be that when an upvalue object's method is called within the thread lambda, for some reason the __gc__ method becomes non-serializable.

I've been trying to debug this for a few hours and I'm making no progress.
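The failure mode in all three reports above can, I believe, be reduced to a tiny repro: any object reachable from a job callback's upvalues gets written with torch serialization, and a threads.Threads pool carries userdata (its __gc__ handle) that cannot be written. A hypothetical sketch:

```lua
-- Hypothetical minimal repro (untested sketch): capturing an object
-- that holds a Threads pool as an upvalue of the job function forces
-- the pool, including its __gc__ userdata, through serialization.
local threads = require 'threads'
local obj = { pool = threads.Threads(2) }
obj.pool:addjob(
   function()
      -- referencing `obj` here makes it an upvalue of the callback,
      -- so addjob tries to serialize obj.pool and panics
      return obj.pool ~= nil
   end,
   function() end
)
-- expected: FATAL THREAD PANIC: (addjob) ... Unwritable object <userdata>
```

If this repro holds, the fix in every case is the same: keep pool-owning objects out of the job function's upvalues and pass only plain data.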
