You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This component is intended to handle all sources and execute the appropriate scraper, given the source. The idea is for this to be the frame under which all scrapers are invoked.
Proposed Solution
The scraper manager will be given a list of all links that need to be visited by the corresponding scraper and return a list where each element is a dictionary with [{source: url_of_source, result: content}]. this manager will interact closely to Angostura.
In: a list of sources to be scraped.
Result: a list with the content scrapped.
This assumes each source might have its own implementation of the scraper. For this it would be crucial that each scraper is implemented using an interface (following a contract, see #66), such interface shall include a scrape method. Abstract Classes in python
Questions to be answered:
how do we know the page has articles or posts of interests? - by the url?
is there a way we can guarantee the content is related?
In order to answer these questions, no nlp should be required. This is still part of the scraping component.
The text was updated successfully, but these errors were encountered:
Problem
This component is intended to handle all sources and execute the appropriate scraper, given the source. The idea is for this to be the frame under which all scrapers are invoked.
Proposed Solution
The scraper manager will be given a list of all links that need to be visited by the corresponding scraper and return a list where each element is a dictionary with [{source: url_of_source, result: content}]. this manager will interact closely to Angostura.
In: a list of sources to be scraped.
Result: a list with the content scrapped.
This assumes each source might have its own implementation of the scraper. For this it would be crucial that each scraper is implemented using an interface (following a contract, see #66), such interface shall include a scrape method.
Abstract Classes in python
Questions to be answered:
In order to answer these questions, no nlp should be required. This is still part of the scraping component.
The text was updated successfully, but these errors were encountered: