Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded #198

Open
jialu-stellar-xia opened this issue Apr 12, 2021 · 1 comment

jialu-stellar-xia commented Apr 12, 2021

I hit this error when importing my data into the MALLET format for LDA. I tried enlarging the heap with MALLET_MEMORY=128G (my server also has 128G of memory), but it still fails.
My data contains 6,712,484 documents in a single .txt file, 3.07G in size.
I sampled 100 documents to test the import script and it works fine, but importing my entire dataset keeps failing with this error message.
Could you please help me figure out the problem? I really appreciate your help!
[Screenshot: error output, 2021-04-11 8:14 PM]
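
A likely pitfall here (a sketch, not from the thread itself): if MALLET_MEMORY is set in the shell without being exported, the bin/mallet launcher, which runs as a child process, never sees it and falls back to its small default heap. Assuming the stock launcher and placeholder file names:

```
# Export the variable so the child shell running bin/mallet inherits it:
export MALLET_MEMORY=100g
bin/mallet import-file --input docs.txt --output docs.mallet --keep-sequence

# Equivalently, set it for a single command only:
MALLET_MEMORY=100g bin/mallet import-file --input docs.txt --output docs.mallet --keep-sequence
```

Asking the JVM for the machine's full 128G can itself cause trouble, since the OS and other processes need headroom; a value somewhat below physical memory (e.g. 100g) is safer.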

mimno (Owner) commented Apr 12, 2021

The "bulk-load" function may be more efficient. But that size collection should definitely fit in 128G. I would suspect that the variable isn't being set in the right way for the shell script to find it.
