Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unpickling #21

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Fix unpickling #21

wants to merge 3 commits into from

Conversation

LucasAdams
Copy link

UnicodeDecodeError would be thrown if the file is open()ed without byte mode or if no coding is passed to pickle.load.

UnicodeDecodeError would be thrown if the file is open()ed without byte mode or if no coding is passed to pickle.load.
@ebenolson
Copy link
Member

Thanks for the fix, but I think this won't work in 2.7. Do you know of a solution that will work in both?

@aleksaro
Copy link

This is how it is done for the MNIST example in the main library: mnist.py
With the model zoo in mind, It might be a good idea to have some IO functionality in Lasagne that handles this (possibly switch to HDF5 with h5py, though this might be overkill).

@f0k
Copy link
Member

f0k commented Sep 15, 2015

It might be a good idea to have some IO functionality in Lasagne that handles this

I think that's outside the scope of Lasagne, we'd basically be providing a subset of six then. A better idea would be to place a general "download and load MNIST" function in Lasagne/Recipes that can be imported (see #18).

possibly switch to HDF5 with h5py

The reason we're using a pickle is because the dataset is provided as a pickle. If we switch to HDF5, we need to host the download ourselves, and we'd create an unnecessary extra dependency just for the MNIST dataset. We could switch to using the raw data from http://yann.lecun.com/exdb/mnist/ which doesn't need to be unpickled, but it's spread over multiple files and it's a bit more complicated to read in, so we'd end up needing a central "download and load MNIST" function anyway.

/edit: Oh wait, I thought this was about MNIST, but that was somewhere else. For the model zoo, we could switch to np.savez / np.load to avoid the unpickling problem. I still don't think we should provide an unpickling wrapper in Lasagne just for Python2/Python3 compatibility...

@benanne
Copy link
Member

benanne commented Sep 15, 2015

I still don't think we should provide an unpickling wrapper in Lasagne just for Python2/Python3 compatibility...

Agreed, that doesn't belong in there.

@aleksaro
Copy link

My apologies, you are correct, I was talking about standardising (stored) model parameters that belong to the model zoo.

Yes, we are thinking about making the model zoo an importable package (#18), and I think a better standard for distributing the weights is important for that.

@ebenolson
Copy link
Member

The reason we're using a pickle is because the dataset is provided as a pickle. If we switch to HDF5, we need to host the download ourselves, and we'd create an unnecessary extra dependency just for the MNIST dataset. We could switch to using the raw data from http://yann.lecun.com/exdb/mnist/ which doesn't need to be unpickled, but it's spread over multiple files and it's a bit more complicated to read in, so we'd end up needing a central "download and load MNIST" function anyway.

I'm not sure I agree with this - we've already had one instance where the examples broke because the MNIST host was down for an extended period. It's also not a great security practice to direct people to download third-party pickles (or really distribute pickles at all). But this isn't really the place to discuss that :)

@f0k
Copy link
Member

f0k commented Sep 15, 2015

I'm not sure I agree with this - we've already had one instance where the examples broke because the MNIST host was down for an extended period. It's also not a great security practice to direct people to download third-party pickles (or really distribute pickles at all).

Sure, I'm all for switching to the original source -- I just meant it would make the "download and load MNIST" code more complicated and even less suited to duplicate between notebooks, so we should probably figure out #18 before doing so.

But this isn't really the place to discuss that :)

Too late :)

@LucasAdams
Copy link
Author

For WinPython 2.7, pickle.load(open(...,'rb')) is preferred:

>>> import pickle, sys
>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=10, releaselevel='final', serial=0)
>>> values = pickle.load(open('vgg19_normalized.pkl'))['param values']
TypeError: ord() expected a character, but string of length 0 found
>>> values = pickle.load(open('vgg19_normalized.pkl','rb'))['param values'] #success
>>> values = pickle.load(open('vgg19_normalized.pkl','rb'),encoding='Latin-1')['param values']
TypeError: load() got an unexpected keyword argument 'encoding'

For WinPython 3.4, pickle.load(open(...,'rb'),encoding='Latin-1') is prefered:

>>> import pickle, sys
>>> print(sys.version_info)
sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)
>>> values = pickle.load(open('vgg19_normalized.pkl'))['param values']
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3634: character maps to <undefined>
>>> values = pickle.load(open('vgg19_normalized.pkl','rb'))['param values']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbc in position 1: ordinal not in range(128)
>>> values = pickle.load(open('vgg19_normalized.pkl','rb'),encoding='Latin-1')['param values'] #success

Due to the cryptic error messages, I spent the better part of 5 hours re-downloading, taking apart the pkl file format and iterating through various locales and file encodings before I trialed the encoding= trick by dumb luck. I'm not saying that Lasagne should hold my hand but a commented line with the alternative loading format would give a pretty big hint on how to debug it.

Thanks everyone for contributing to Lasagne, its a great library.

@ebenolson
Copy link
Member

Yes, it's a very annoying problem that's caused me trouble before as well. Ultimately I think we'll want to move away from pickles, but I'm not sure what the best option is yet.

If you have a chance, could you amend your PR to something like this, so that it will work in both?

try:
    values = pickle.load(open('vgg19_normalized.pkl','rb'))['param values']
except UnicodeDecodeError:
    values = pickle.load(open('vgg19_normalized.pkl','rb'),encoding='Latin-1')['param values'] # python3

@f0k
Copy link
Member

f0k commented Sep 15, 2015

Looks good! Can you fix examples/styletransfer/Art Style Transfer.ipynb in the same way? And when you're done (and have committed everything you want to commit), could you squash your commits into one? (git reset --soft 1287712; git commit --amend --no-edit; git push --force)

@f0k f0k mentioned this pull request Sep 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants