Current onLoadFinished fail handling is needlessly disruptive #296

Open
uladkasach opened this issue Jul 10, 2017 · 2 comments
uladkasach commented Jul 10, 2017

In my use case, onLoadFinished returns with fail at least 75% of the time. While I have not dug deep enough to understand why, by modifying https://github.com/johntitus/node-horseman/blob/master/lib/index.js#L291-L294 to not reject the promise on fail (commenting out the rejection line), I've been able to achieve my goals without any hindrance.

Question : Is it possible to make this rejection discretionary in the module?

Bonus : why does horseman reject the promise when page.onLoadFinished fails?
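
For illustration, here is a minimal sketch of what a discretionary rejection could look like. This is not node-horseman's actual code: the `rejectOnLoadFail` option and the `shouldRejectLoad` helper are hypothetical names, and `checkStatus` only borrows its name from the stack trace. It assumes the status is the string "success" or "fail", as in the debug output.

```javascript
// Hypothetical sketch; not taken from node-horseman's lib/index.js.

// Decide whether a load status should reject.
// `rejectOnLoadFail` is an assumed option name; it defaults to true so that
// the current behavior is kept unless the caller opts out.
function shouldRejectLoad(status, options) {
    var rejectOnLoadFail = !(options && options.rejectOnLoadFail === false);
    return status === 'fail' && rejectOnLoadFail;
}

// Stand-in for the status check: rejects only when the option allows it,
// otherwise resolves with the status so the caller can still inspect it.
function checkStatus(status, options) {
    if (shouldRejectLoad(status, options)) {
        return Promise.reject(new Error('Failed to load url'));
    }
    return Promise.resolve(status);
}
```

With this shape, `checkStatus('fail', { rejectOnLoadFail: false })` resolves with `'fail'` instead of rejecting, while calls that omit the option keep rejecting as they do today.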

The failure always results in the following error, triggered at seemingly random parts of the code each time:

  horseman phantomjs onLoadFinished triggered fail 4 +343ms
Unhandled rejection Error: Failed to load url
    at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
    at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
    at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
    at Array.forEach (native)
    at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
    at emitNone (events.js:110:20)
    at IncomingMessage.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1047:12)
    at _combinedTickCallback (internal/process/next_tick.js:102:11)
    at process._tickCallback (internal/process/next_tick.js:161:9)

The full debug output, before modification of the lib/index.js file, is

  horseman using PhantomJS from phantomjs-prebuilt module +0ms
  horseman .setup() creating phantom instance 1 +4ms
  horseman .viewport() set 1300 900 +10ms
  horseman phantom created +116ms
  horseman phantom version 2.1.1 +13ms
  horseman page created +8ms
  horseman phantomjs onLoadFinished triggered success NaN +10ms
  horseman injected jQuery +20ms
  horseman .on() error set. +3ms
  horseman .on() resourceError set. +1ms
  horseman .on() loadFinished set. +0ms
  horseman .on() urlChanged set. +0ms
  horseman .userAgent() set Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 +11ms
Opening Trip Advisor...
  horseman .open() https://www.tripadvisor.com/Hotels +2ms
 (!) Phantom Url Changed to : https://www.tripadvisor.com/Hotels
  horseman phantomjs onLoadFinished triggered success 1 +2s
  horseman injected jQuery +25ms
Opened Trip Advisor. Waiting for everything to load...
  horseman .waitForNextPage() +8ms
  horseman .waitForNextPage() completed successfully +52ms
  horseman .wait() 21 +2ms
Searching for city New York...
  horseman .waitForSelector() div[data-placement-name='masthead_search'] .search undefined +50ms
  horseman .waitFor() elementPresent div[data-placement-name='masthead_search'] .search +0ms
  horseman:verbose .waitFor() iteration elementPresent true 51 1 +85ms
  horseman .waitFor() completed successfully +1ms
  horseman .waitForSelector() complete +0ms
Clicking the search opening button...
  horseman .click() div[data-placement-name='masthead_search'] .search +0ms
  horseman .click() done +20ms
  horseman .waitForNextPage() +0ms
  horseman phantomjs onLoadFinished triggered success 2 +3s
  horseman jQuery not injected - already exists on page +23ms
  horseman .waitForNextPage() completed successfully +20ms
Entering Search Values...
  horseman .clear() input#mainSearch +0ms
  horseman .value() input#mainSearch  +0ms
  horseman .type() input#mainSearch hotels undefined +11ms
  horseman .keyboardEvent() keypress h null +22ms
  horseman .keyboardEvent() keypress o null +6ms
  horseman .keyboardEvent() keypress t null +5ms
  horseman .keyboardEvent() keypress e null +4ms
  horseman .keyboardEvent() keypress l null +6ms
  horseman .keyboardEvent() keypress s null +7ms
  horseman .clear() input#GEO_SCOPED_SEARCH_INPUT +0ms
  horseman .value() input#GEO_SCOPED_SEARCH_INPUT  +0ms
  horseman .type() input#GEO_SCOPED_SEARCH_INPUT New York undefined +6ms
  horseman .keyboardEvent() keypress N null +32ms
  horseman .keyboardEvent() keypress e null +8ms
  horseman .keyboardEvent() keypress w null +8ms
  horseman .keyboardEvent() keypress   null +8ms
  horseman .keyboardEvent() keypress Y null +4ms
  horseman .keyboardEvent() keypress o null +6ms
  horseman .keyboardEvent() keypress r null +6ms
  horseman .keyboardEvent() keypress k null +6ms
  horseman .wait() 500 +0ms
  horseman:verbose onConsoleMessage Facebook Pixel Warning: You are sending a non-standard event 'LogAttribution'. The preferred way to send events is using trackCustom. See https://www.facebookmarketingdevelopers.com/pixels/up#sec-custom for more information line: undefined in undefined 1 +325ms
  horseman .waitForNextPage() +176ms
  horseman phantomjs onLoadFinished triggered success 3 +330ms
  horseman jQuery not injected - already exists on page +15ms
  horseman .waitForNextPage() completed successfully +9ms
Selecting first search result...
  horseman .exists() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +2ms
  horseman .count() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
true
  horseman .waitForSelector() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill undefined +25ms
  horseman .waitFor() elementPresent .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
  horseman:verbose .waitFor() iteration elementPresent true 51 1 +78ms
  horseman .waitFor() completed successfully +0ms
  horseman .waitForSelector() complete +1ms
  horseman .exists() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
  horseman .count() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +0ms
true
  horseman .click() .resultContainer.local .displayItem.result:eq(0) .map-pin-fill +16ms
  horseman .click() done +21ms
  horseman .wait() 500 +0ms
  horseman phantomjs onLoadFinished triggered fail 4 +343ms
Unhandled rejection Error: Failed to load url
    at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
    at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
    at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
    at Array.forEach (native)
    at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
    at emitNone (events.js:110:20)
    at IncomingMessage.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1047:12)
    at _combinedTickCallback (internal/process/next_tick.js:102:11)
    at process._tickCallback (internal/process/next_tick.js:161:9)

Clicking final search button...
  horseman .exists() #SEARCH_BUTTON_CONTENT .search +159ms
  horseman .count() #SEARCH_BUTTON_CONTENT .search +0ms
Catching an error:
Error: Failed to load url
    at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
    at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
    at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
    at Array.forEach (native)
    at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
    at emitNone (events.js:110:20)
    at IncomingMessage.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1047:12)
    at _combinedTickCallback (internal/process/next_tick.js:102:11)
    at process._tickCallback (internal/process/next_tick.js:161:9)
Trying again...
  horseman .close(). +11ms
Unhandled rejection Error: Failed to load url
    at checkStatus (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:292:16)
From previous event:
    at Object.loadFinishedSetup [as onLoadFinished] (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-horseman/lib/index.js:290:43)
    at /var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
    at Array.forEach (native)
    at IncomingMessage.<anonymous> (/var/www/git/NLP/Hotel-Reviews-Scraper/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
    at emitNone (events.js:110:20)
    at IncomingMessage.emit (events.js:207:7)
    at endReadableNT (_stream_readable.js:1047:12)
    at _combinedTickCallback (internal/process/next_tick.js:102:11)
    at process._tickCallback (internal/process/next_tick.js:161:9)

A code sample that reproduces the error follows:

// load horseman
var Horseman = require('node-horseman');
Horseman.registerAction("open_the_city", open_the_city)

var horseman = new Horseman({timeout : 10000, ignoreSSLErrors: true});
horseman.on('error', (err) => {
    //console.log("  regular error caught");
    //console.log(err);
})
horseman.on('resourceError', (err) => {
    //console.log("  resource error caught");
    //console.log(err);
})
horseman.on('loadFinished', (status) => {
    //console.log("  resource load: " + status);
    if(status == "fail"){
        console.log("Load finished status was fail. Weird.")
    }
})
horseman.on("urlChanged", (new_url)=>{
    console.log(" (!) Phantom Url Changed to : " + new_url);
})

var current_city = "New York"


horseman // the instance created above, which waits at most 10 seconds for loading
    .viewport(1300, 900)         // define a viewport
    .userAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36') // define a user agent
    .then(()=>{ // output opening
        console.log("Opening Trip Advisor...");
        return horseman;
    })
    .open('https://www.tripadvisor.com/Hotels')  
    .then(()=>{ // output opening
        console.log("Opened Trip Advisor. Waiting for everything to load...");
        return horseman;
    })
    .waitForNextPage({timeout: 15000}) // wait up to 15 seconds for first page

    // search for and open the city
    .open_the_city(current_city) // ----- throws {type: "no_matches"} error if the city has no autocomplete options

    .close()


function open_the_city(city_to_open){
    // define horseman object explicitly
    var horseman = this;

    var open_search_area_button_selector = "div[data-placement-name='masthead_search'] .search";
    var first_result_choice_selector = ".resultContainer.local .displayItem.result:eq(0) .map-pin-fill";
    var final_search_button_selector = "#SEARCH_BUTTON_CONTENT .search";

    /*
    element = document.querySelector("input#GEO_SCOPED_SEARCH_INPUT")
    element.addEventListener("focus", function(){
        console.log(document.querySelector(".results_panel .where_results").innerHTML)

    })
    element.addEventListener("click", function(){
        console.log(document.querySelector(".results_panel .where_results").innerHTML)

    })
    */

    return this
        .wait(21)

        .then(()=>{
            console.log("Searching for city " + city_to_open + "...");
            return horseman;
        })

         // click to open search area and wait to load
        .waitForSelector(open_search_area_button_selector)
        .then(()=>{
            console.log("Clicking the search opening button...");
            return horseman;
        })
        .click(open_search_area_button_selector)
        .waitForNextPage()


        // type in search values and wait until results load
        .then(()=>{
            console.log("Entering Search Values...");
            return horseman;
        })
        .clear('input#mainSearch')
        .type('input#mainSearch', "hotels")
        .clear('input#GEO_SCOPED_SEARCH_INPUT')
        .type('input#GEO_SCOPED_SEARCH_INPUT', city_to_open)
        .wait(500)
        .waitForNextPage()

        .then(()=>{
            console.log("Selecting first search result...");
            return horseman;
        })
        // select first autocomplete recommendation

        .exists(first_result_choice_selector)
        .log()
        .waitForSelector(first_result_choice_selector) // wait for autocomplete elements to appear
        .exists(first_result_choice_selector)
        .log()
        .click(first_result_choice_selector)
        .wait(500) // wait after clicking first choice

        // click search button
        .then(()=>{
            console.log("Clicking final search button...");
            return horseman;
        })
        .exists(final_search_button_selector)
        .log()
        .click(final_search_button_selector)
        .waitForNextPage()


        // new page opening
        .then(()=>{
            console.log("City page was opened. Waiting for new page to load...");
            return horseman;
        })
        .then((data)=>{
            console.log("Assigning error handler.");
            horseman.on('resourceError', (err) => {
                console.log("resource error caught");
                console.log(err);
            })
            return horseman;
        })
        //.waitForNextPage()
        .wait(5000)


        // snapshot and exit
        .screenshot("page.jpg")
        .then(()=>{
            console.log("Completed...");
            process.exit();
        })
        .catch((e)=>{
            console.log("Catching an error:")
            console.log(e);
            console.log("Trying again...");
        })

}
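
As long as the rejection cannot be turned off in the module, one workaround is to filter the failure in the calling code, since the `.catch()` in the sample does receive the error (see "Catching an error:" in the debug output). The sketch below uses two hypothetical helpers, `isLoadFailure` and `tolerateLoadFailure`, that are not part of node-horseman; the only detail taken from the logs is the error message "Failed to load url".

```javascript
// Hypothetical helpers; neither is part of node-horseman.

// True only for the load failure shown in the debug output.
function isLoadFailure(err) {
    return err instanceof Error && err.message === 'Failed to load url';
}

// Wraps any promise: swallows the load failure and resolves with 'fail'
// instead; every other error is rethrown untouched.
function tolerateLoadFailure(promise) {
    return promise.catch(function (err) {
        if (isLoadFailure(err)) {
            return 'fail';
        }
        throw err;
    });
}
```

This has not been tested against horseman itself. The "Unhandled rejection" lines in the output suggest that some of the rejections are raised outside the chain being awaited, and a wrapper like this would not catch those.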
@marcelinhov2

any update?

@johntitus
Owner

Sorry, but both Alex and I have stopped supporting this project.
