
getpapers FATAL ERROR when downloading 30k articles or more #64

Open
EmanuelFaria opened this issue Jun 17, 2020 · 9 comments

EmanuelFaria (Collaborator) commented Jun 17, 2020

Hi @petermr @deadlyvices

I got the error below last night while trying to download more than 30k articles in one run. When I limit the download to 29k or fewer (-k 29000), the problem goes away.

I found a discussion online about a fix for a similar-looking error here ... but even if it is relevant, I don't know how to implement it. :-/

HERE'S WHAT HAPPENS:

It seems to stall in the "Retrieving results" phase. I get incremental updates like these:

Retrieving results [==----------------------------] 97%
Retrieving results [==----------------------------] 98%

and then this error message:

<--- Last few GCs --->
[39301:0x110000000]   739506 ms: Mark-sweep 2056.4 (2064.7) -> 2055.5 (2064.7) MB, 835.1 / 0.0 ms  (+ 0.0 ms in 4 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 837 ms) (average mu = 0.082, current mu = 0.002) allocatio
[39301:0x110000000]   740780 ms: Mark-sweep 2056.5 (2064.7) -> 2055.7 (2064.9) MB, 1134.1 / 0.0 ms  (+ 0.0 ms in 15 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1273 ms) (average mu = 0.098, current mu = 0.109) alloca

<--- JS stacktrace --->

==== JS stack trace =========================================

0: ExitFrame [pc: 0x100950919]
1: StubFrame [pc: 0x1009519b3]

Security context: 0x1e91dd3c08d1
2: write [0x1e91f4b412d9] [/usr/local/lib/node_modules/getpapers/node_modules/xml2js/node_modules/sax/lib/sax.js:~965] [pc=0x33fd2f286764](this=0x1e91b044eec1 ,0x1e9221e00119 <Very long string[8244830]>)
3: /* anonymous */ [0x1e91c4bc7169] [/usr/local/lib/node_modules/getpapers/node_modu...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Writing Node.js report to file: report.20200415.123321.39301.0.001.json
Node.js report completed
1: 0x100080c68 node::Abort() [/usr/local/bin/node]
2: 0x100080dec node::errors::TryCatchScope::~TryCatchScope() [/usr/local/bin/node]
3: 0x100185167 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
4: 0x100185103 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
5: 0x10030b2f5 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
6: 0x10030c9c4 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/usr/local/bin/node]
7: 0x100309837 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
8: 0x1003077fd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
9: 0x100312fba v8::internal::Heap::AllocateRawWithLightRetry(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0x100313041 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
11: 0x1002e035b v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
12: 0x100618718 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
13: 0x100950919 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/usr/local/bin/node]
14: 0x1009519b3 Builtins_StringAdd_CheckNone [/usr/local/bin/node]
15: 0x33fd2f286764
Abort trap: 6

ziflex commented Jun 17, 2020

It seems that you have hit Node.js's (V8) default heap memory limit.

You can try running the app with the following flag to increase the limit:

node --max-old-space-size=8192 ./bin/getpapers.js OTHER_ARGS

Otherwise, there is either a memory leak or a design flaw: the data is not streamed, which would prevent this kind of error.
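A possible variation on this (a sketch, not something suggested in the thread): reasonably recent Node.js releases also accept V8 flags such as --max-old-space-size through the NODE_OPTIONS environment variable, which avoids having to locate getpapers.js by hand, assuming the global getpapers command is on your PATH. The query and limit below are placeholders:

NODE_OPTIONS="--max-old-space-size=8192" getpapers --query 'YOUR QUERY' -k 30000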

petermr (Owner) commented Jun 17, 2020 via email

EmanuelFaria (Collaborator, Author) commented Jun 22, 2020

> It seems that you have hit Node.js's (V8) default heap memory limit.
>
> You can try running the app with the following flag to increase the limit:
>
> node --max-old-space-size=8192 ./bin/getpapers.js OTHER_ARGS
>
> Otherwise, there is either a memory leak or a design flaw: the data is not streamed, which would prevent this kind of error.

@ziflex Thanks for the tip!

I tried the terminal command you posted, but it seems my getpapers.js wasn't where the command expected it. Here's what I got:

Last login: Sun Jun 21 18:37:21 on console
Mannys-MacBook-Pro:~ emanuelfaria$ node --max-old-space-size=8192 ./bin/getpapers.js OTHER_ARGS
internal/modules/cjs/loader.js:969
throw err;
^

Error: Cannot find module '/Users/emanuelfaria/bin/getpapers.js'
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:966:15)
at Function.Module._load (internal/modules/cjs/loader.js:842:27)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
at internal/main/run_main_module.js:17:47 {
code: 'MODULE_NOT_FOUND',
requireStack: []
}

So I searched my drive and found it, and updated the snippet like so:

node --max-old-space-size=8192 /usr/local/lib/node_modules/getpapers/bin/getpapers.js OTHER_ARGS

This is the response I received... any ideas?

error: No query given. You must provide the --query argument.

ziflex commented Jun 23, 2020

By OTHER_ARGS I meant any valid getpapers arguments :)
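To make that concrete, a full invocation might look like the line below (a sketch only: the query, output directory, and limit are placeholders, assuming the standard getpapers flags --query, -o, -x, and -k):

node --max-old-space-size=8192 /usr/local/lib/node_modules/getpapers/bin/getpapers.js --query 'coronavirus' -o covid_xml -x -k 30000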

petermr (Owner) commented Jun 23, 2020 via email

EmanuelFaria (Collaborator, Author) commented:
> By OTHER_ARGS I meant any valid getpapers arguments :)

Ooooohhhhhhh LOL. Thanks!

EmanuelFaria (Collaborator, Author) commented:
> do 1000 papers at a time.

@peter I don't know what curl or ferret means, but I think downloading 1000 at a time, rather than (what looks to me like) processing them 1000 at a time and then waiting until the end to download them all, is a good idea.
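One way to approximate "1000 at a time" without changing getpapers (a sketch, not something proposed in the thread) is to split the query itself into smaller slices, for example by publication date, so each run stays well under the heap limit. This assumes the default EuropePMC API, whose FIRST_PDATE field accepts date ranges; the query, dates, output directories, and limits below are placeholders:

node --max-old-space-size=8192 /usr/local/lib/node_modules/getpapers/bin/getpapers.js --query 'coronavirus AND FIRST_PDATE:[2020-01-01 TO 2020-03-31]' -o batch_q1 -x -k 10000
node --max-old-space-size=8192 /usr/local/lib/node_modules/getpapers/bin/getpapers.js --query 'coronavirus AND FIRST_PDATE:[2020-04-01 TO 2020-06-30]' -o batch_q2 -x -k 10000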

petermr (Owner) commented Jun 24, 2020 via email

EmanuelFaria (Collaborator, Author) commented:
OK. Thanks, Peter.
