In my use case, onLoadFinished reports fail at least 75% of the time. While I have not dug deep enough to understand why, by modifying https://github.com/johntitus/node-horseman/blob/master/lib/index.js#L291-L294 to not reject the promise on fail (commenting out the rejection line) I've been able to achieve my goals without any hindrance.
Question: Is it possible to make this rejection discretionary in the module?
Bonus: Why does horseman reject the promise when page.onLoadFinished fails?
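To make the request concrete, here is a minimal sketch of what a discretionary rejection could look like. It is not horseman's code: `rejectOnLoadFail`, `shouldRejectLoad`, and this standalone `checkStatus` are hypothetical names used only for illustration, and the default is assumed to keep the current behaviour (reject on anything other than success).

```javascript
// Hypothetical sketch: none of these names are part of node-horseman's API.
// `rejectOnLoadFail` is an assumed option name used only for illustration.
function shouldRejectLoad(status, options) {
  var opts = options || {};
  // Default to rejecting, so existing behaviour is unchanged unless opted out.
  var rejectOnLoadFail = opts.rejectOnLoadFail !== false;
  return status !== 'success' && rejectOnLoadFail;
}

// Stand-in for the status check that runs when a page load finishes.
function checkStatus(status, options) {
  if (shouldRejectLoad(status, options)) {
    return Promise.reject(new Error('Failed to load url'));
  }
  // Resolve with the status so callers can still inspect a tolerated failure.
  return Promise.resolve(status);
}
```

With something like this, a caller who passes `{ rejectOnLoadFail: false }` would still see the `fail` status through the `loadFinished` event but could decide what to do about it.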
The failure always results in the following error, triggered at seemingly random parts of the code each time:
horseman phantomjs onLoadFinished triggered fail 4 +343ms
Unhandled rejection Error: Failed to load url
at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
at Array.forEach (native)
at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
at emitNone (events.js:110:20)
at IncomingMessage.emit (events.js:207:7)
at endReadableNT (_stream_readable.js:1047:12)
at _combinedTickCallback (internal/process/next_tick.js:102:11)
at process._tickCallback (internal/process/next_tick.js:161:9)
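As a guess at why this shows up as "Unhandled rejection" at seemingly random points: a promise that is rejected inside an event callback, with no handler attached to that particular promise, is reported as unhandled regardless of what the main chain is doing at that moment. The sketch below shows only that general behaviour with plain Node promises. `onLoadFinishedLike` and `seen` are made-up names, and I have not confirmed that this is what happens inside horseman.

```javascript
// General promise behaviour only; this is not horseman code.
var seen = [];

// Node emits 'unhandledRejection' on `process` for rejections nobody handled.
process.on('unhandledRejection', function (err) {
  seen.push(err.message);
});

// Hypothetical stand-in for a load-finished callback that builds its own promise.
function onLoadFinishedLike(status) {
  return status === 'success'
    ? Promise.resolve(status)
    : Promise.reject(new Error('Failed to load url'));
}

// The caller's chain has a catch, but it is attached to a different promise.
var chain = Promise.resolve('step done').catch(function () {});

// The event fires from outside the chain and the returned promise is dropped,
// so the rejection has no handler and is reported as unhandled.
onLoadFinishedLike('fail');
```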
The full debug output, before modification of the lib/index.js file, is:
horseman using PhantomJS from phantomjs-prebuilt module +0ms
horseman .setup() creating phantom instance 1 +4ms
horseman .viewport() set 1300 900 +10ms
horseman phantom created +116ms
horseman phantom version 2.1.1 +13ms
horseman page created +8ms
horseman phantomjs onLoadFinished triggered success NaN +10ms
horseman injected jQuery +20ms
horseman .on() error set. +3ms
horseman .on() resourceError set. +1ms
horseman .on() loadFinished set. +0ms
horseman .on() urlChanged set. +0ms
horseman .userAgent() set Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 +11ms
Opening Trip Advisor...
horseman .open() https://www.tripadvisor.com/Hotels +2ms
(!) Phantom Url Changed to : https://www.tripadvisor.com/Hotels
horseman phantomjs onLoadFinished triggered success 1 +2s
horseman injected jQuery +25ms
Opened Trip Advisor. Waiting for everything to load...
horseman .waitForNextPage() +8ms
horseman .waitForNextPage() completed successfully +52ms
horseman .wait() 21 +2ms
Searching for city New York...
horseman .waitForSelector() div[data-placement-name='masthead_search'] .search undefined +50ms
horseman .waitFor() elementPresent div[data-placement-name='masthead_search'] .search +0ms
horseman:verbose .waitFor() iteration elementPresent true 51 1 +85ms
horseman .waitFor() completed successfully +1ms
horseman .waitForSelector() complete +0ms
Clicking the search opening button...
horseman .click() div[data-placement-name='masthead_search'] .search +0ms
horseman .click() done +20ms
horseman .waitForNextPage() +0ms
horseman phantomjs onLoadFinished triggered success 2 +3s
horseman jQuery not injected - already exists on page +23ms
horseman .waitForNextPage() completed successfully +20ms
Entering Search Values...
horseman .clear() input#mainSearch +0ms
horseman .value() input#mainSearch +0ms
horseman .type() input#mainSearch hotels undefined +11ms
horseman .keyboardEvent() keypress h null +22ms
horseman .keyboardEvent() keypress o null +6ms
horseman .keyboardEvent() keypress t null +5ms
horseman .keyboardEvent() keypress e null +4ms
horseman .keyboardEvent() keypress l null +6ms
horseman .keyboardEvent() keypress s null +7ms
horseman .clear() input#GEO_SCOPED_SEARCH_INPUT +0ms
horseman .value() input#GEO_SCOPED_SEARCH_INPUT +0ms
horseman .type() input#GEO_SCOPED_SEARCH_INPUT New York undefined +6ms
horseman .keyboardEvent() keypress N null +32ms
horseman .keyboardEvent() keypress e null +8ms
horseman .keyboardEvent() keypress w null +8ms
horseman .keyboardEvent() keypress null +8ms
horseman .keyboardEvent() keypress Y null +4ms
horseman .keyboardEvent() keypress o null +6ms
horseman .keyboardEvent() keypress r null +6ms
horseman .keyboardEvent() keypress k null +6ms
horseman .wait() 500 +0ms
horseman:verbose onConsoleMessage Facebook Pixel Warning: You are sending a non-standard event 'LogAttribution'. The preferred way to send events is using trackCustom. See https://www.facebookmarketingdevelopers.com/pixels/up#sec-custom for more information line: undefined in undefined 1 +325ms
horseman .waitForNextPage() +176ms
horseman phantomjs onLoadFinished triggered success 3 +330ms
horseman jQuery not injected - already exists on page +15ms
horseman .waitForNextPage() completed successfully +9ms
Selecting first search result...
horseman .exists() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +2ms
horseman .count() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
true
horseman .waitForSelector() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill undefined +25ms
horseman .waitFor() elementPresent .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
horseman:verbose .waitFor() iteration elementPresent true 51 1 +78ms
horseman .waitFor() completed successfully +0ms
horseman .waitForSelector() complete +1ms
horseman .exists() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
horseman .count() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
true
horseman .click() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +16ms
horseman .click() done +21ms
horseman .wait() 500 +0ms
horseman phantomjs onLoadFinished triggered fail 4 +343ms
Unhandled rejection Error: Failed to load url
at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
at Array.forEach (native)
at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
at emitNone (events.js:110:20)
at IncomingMessage.emit (events.js:207:7)
at endReadableNT (_stream_readable.js:1047:12)
at _combinedTickCallback (internal/process/next_tick.js:102:11)
at process._tickCallback (internal/process/next_tick.js:161:9)
Clicking final search button...
horseman .exists() #SEARCH_BUTTON_CONTENT .search +159ms
horseman .count() #SEARCH_BUTTON_CONTENT .search +0ms
Catching an error:
Error: Failed to load url
at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
at Array.forEach (native)
at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
at emitNone (events.js:110:20)
at IncomingMessage.emit (events.js:207:7)
at endReadableNT (_stream_readable.js:1047:12)
at _combinedTickCallback (internal/process/next_tick.js:102:11)
at process._tickCallback (internal/process/next_tick.js:161:9)
Trying again...
horseman .close(). +11ms
Unhandled rejection Error: Failed to load url
at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
at Array.forEach (native)
at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
at emitNone (events.js:110:20)
at IncomingMessage.emit (events.js:207:7)
at endReadableNT (_stream_readable.js:1047:12)
at _combinedTickCallback (internal/process/next_tick.js:102:11)
at process._tickCallback (internal/process/next_tick.js:161:9)
A code sample that reproduces the error follows:
// load horseman
var Horseman = require('node-horseman');
Horseman.registerAction("open_the_city", open_the_city)
var horseman = new Horseman({timeout : 10000, ignoreSSLErrors: true});
horseman.on('error', (err) => {
    //console.log(" regular error caught");
    //console.log(err);
})
horseman.on('resourceError', (err) => {
    //console.log(" resource error caught");
    //console.log(err);
})
horseman.on('loadFinished', (status) => {
    //console.log(" resource load: " + status);
    if(status == "fail"){
        console.log("Load finished status was fail. Weird.")
    }
})
horseman.on("urlChanged", (new_url)=>{
    console.log(" (!) Phantom Url Changed to : " + new_url);
})
var current_city = "New York"
horseman // the horseman instance created above waits at most 10 seconds for loading
    .viewport(1300, 900) // define a viewport
    .userAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36') // define a user agent
    .then(()=>{ // report that the page is about to open
        console.log("Opening Trip Advisor...");
        return horseman;
    })
    .open('https://www.tripadvisor.com/Hotels')
    .then(()=>{ // report that the page has opened
        console.log("Opened Trip Advisor. Waiting for everything to load...");
        return horseman;
    })
    .waitForNextPage({timeout: 15000}) // wait up to 15 seconds for first page
    // search for and open the city
    .open_the_city(current_city) // ----- throws {type: "no_matches"} error if the city has no autocomplete options
    .close()
function open_the_city(city_to_open){
    // define horseman object explicitly
    var horseman = this;
    var open_search_area_button_selector = "div[data-placement-name='masthead_search'] .search";
    var first_result_choice_selector = ".resultContainer.local .displayItem.result:eq(0) .map-pin-fill";
    var final_search_button_selector = "#SEARCH_BUTTON_CONTENT .search";
    /*
    element = document.querySelector("input#GEO_SCOPED_SEARCH_INPUT")
    element.addEventListener("focus", function(){
        console.log(document.querySelector(".results_panel .where_results").innerHTML)
    })
    element.addEventListener("click", function(){
        console.log(document.querySelector(".results_panel .where_results").innerHTML)
    })
    */
    return this
        .wait(21)
        .then(()=>{
            console.log("Searching for city " + city_to_open + "...");
            return horseman;
        })
        // click to open search area and wait to load
        .waitForSelector(open_search_area_button_selector)
        .then(()=>{
            console.log("Clicking the search opening button...");
            return horseman;
        })
        .click(open_search_area_button_selector)
        .waitForNextPage()
        // type in search values and wait until results load
        .then(()=>{
            console.log("Entering Search Values...");
            return horseman;
        })
        .clear('input#mainSearch')
        .type('input#mainSearch', "hotels")
        .clear('input#GEO_SCOPED_SEARCH_INPUT')
        .type('input#GEO_SCOPED_SEARCH_INPUT', city_to_open)
        .wait(500)
        .waitForNextPage()
        .then(()=>{
            console.log("Selecting first search result...");
            return horseman;
        })
        // select first autocomplete recommendation
        .exists(first_result_choice_selector)
        .log()
        .waitForSelector(first_result_choice_selector) // wait for autocomplete elements to appear
        .exists(first_result_choice_selector)
        .log()
        .click(first_result_choice_selector)
        .wait(500) // wait after clicking first choice
        // click search button
        .then(()=>{
            console.log("Clicking final search button...");
            return horseman;
        })
        .exists(final_search_button_selector)
        .log()
        .click(final_search_button_selector)
        .waitForNextPage()
        // new page opening
        .then(()=>{
            console.log("City page was opened. Waiting for new page to load...");
            return horseman;
        })
        .then((data)=>{
            console.log("Assigning error handler.");
            horseman.on('resourceError', (err) => {
                console.log("resource error caught");
                console.log(err);
            })
            return horseman;
        })
        //.waitForNextPage()
        .wait(5000)
        // snapshot and exit
        .screenshot("page.jpg")
        .then(()=>{
            console.log("Completed...");
            process.exit();
        })
        .catch((e)=>{
            console.log("Catching an error:")
            console.log(e);
            console.log("Trying again...");
        })
}