Problem loading the Data #51

Open
dudany opened this issue Sep 9, 2020 · 1 comment

dudany commented Sep 9, 2020

Hi, I've been trying to run the full PyTorch model, but I ran into a problem with the data.
I downloaded all the data you provided and placed it in the designated path. Then, after running train.py, I ran the following commands and got the errors below. I hope you can help me with them:

!python repr_code.py --model JointEmbeder --reload_from 340000

NumExpr defaulting to 2 threads.
Constructing Model..
loading data...
tcmalloc: large alloc 1116725248 bytes == 0xe93ea000 @  0x7f87e6b8d1e7 0x7f87e46f35e1 0x7f87e475a420 0x7f87e47e7f87 0x50a7f5 0x50c1f4 0x507f24 0x509277 0x594b01 0x54a17f 0x5517c1 0x5a9eec 0x50a783 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x50b053 0x634dd2 0x634e87 0x63863f 0x6391e1 0x4b0dc0 0x7f87e678ab97 0x5b26fa
tcmalloc: large alloc 1365450752 bytes == 0x140276000 @  0x7f87e6b8d1e7 0x7f87e46f35e1 0x7f87e475a420 0x7f87e47e7f87 0x50a7f5 0x50c1f4 0x507f24 0x509277 0x594b01 0x54a17f 0x5517c1 0x5a9eec 0x50a783 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x50b053 0x634dd2 0x634e87 0x63863f 0x6391e1 0x4b0dc0 0x7f87e678ab97 0x5b26fa
16262602 entries
 12% 199/1627 [03:24<25:10,  1.06s/it]tcmalloc: large alloc 4096000000 bytes == 0x7f83218e0000 @  0x7f87e6b8d1e7 0x7f87e46f35e1 0x7f87e4757c78 0x7f87e4757d93 0x7f87e47f5ea8 0x7f87e47f6704 0x7f87e47f6852 0x567193 0x59fe1e 0x7f87e47434ed 0x50a47f 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x588e91 0x59fe1e 0x7f87e47434ed 0x50a47f 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24 0x509c50 0x50a64d 0x50c1f4 0x507f24
^C

!python search.py --model JointEmbeder --reload_from 340000

NumExpr defaulting to 2 threads.
Constructing Model..
Loading codebase (chunk size=2000000)..
Traceback (most recent call last):
  File "search.py", line 137, in <module>
    "inconsistent number of chunks, check whether the specified files for codebase and code vectors are correct!"    
AssertionError: inconsistent number of chunks, check whether the specified files for codebase and code vectors are correct!

guxd (Owner) commented Sep 11, 2020

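The first error seems to be the root cause. Your machine probably does not have enough memory to hold the temporary code vectors, so repr_code.py did not finish writing them, and search.py then presumably finds a different number of code-vector chunks than codebase chunks. Try reducing the chunk size, for example from 2,000,000 to 200,000.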
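
To see roughly why a smaller chunk size helps, here is a back-of-the-envelope sketch. It is not code from the repository: `chunk_count` and `chunk_bytes` are hypothetical helpers, and the vector width (512) and element size (4 bytes, i.e. float32) are assumptions chosen only because they happen to match the 4,096,000,000-byte allocation in the log above.

```python
# Hypothetical helpers for estimating chunking cost; not part of the repository.

def chunk_count(n_entries, chunk_size):
    """Number of chunks needed to cover n_entries (integer ceiling division)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-n_entries // chunk_size)

def chunk_bytes(chunk_size, vec_dim=512, bytes_per_value=4):
    """Memory for one full chunk of vectors.

    vec_dim=512 and bytes_per_value=4 (float32) are assumptions, not values
    confirmed in this thread.
    """
    return chunk_size * vec_dim * bytes_per_value

n_entries = 16262602  # "16262602 entries" from the repr_code.py log

for size in (2_000_000, 200_000):
    print(
        f"chunk size {size}: {chunk_count(n_entries, size)} chunks, "
        f"{chunk_bytes(size)} bytes per full chunk"
    )
```

Under those assumptions, one chunk of 2,000,000 vectors needs about 4.1 GB at once, while a chunk of 200,000 needs about 0.41 GB, at the cost of more chunk files. Since search.py reports its own chunk size when loading the codebase, the same value most likely has to be used in both repr_code.py and search.py, and the code vectors regenerated after changing it, or the chunk counts will still disagree.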
