Automatically extract the full article text from a media webpage #17

digitaldutch · 2024-09-09T10:38:41Z

Currently, our spider uses the JSON-LD article tag to find the full text of a media article web page. The problem is that few media websites support this tag. Consequently, our volunteers have to manually copy and paste the text from the web page into our input field.
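For reference, here is a minimal Python sketch of the JSON-LD approach described above. It is not the spider's actual code: it assumes the "article tag" refers to the schema.org `articleBody` property inside a `<script type="application/ld+json">` block, and the names `JsonLdCollector`, `extract_article_body` and `ARTICLE_TYPES` are made up for illustration.

```python
import json
from html.parser import HTMLParser

# Assumption: these schema.org types are the ones worth checking for articleBody.
ARTICLE_TYPES = {"Article", "NewsArticle", "ReportageNewsArticle", "BlogPosting"}


class JsonLdCollector(HTMLParser):
    """Collects the raw contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _walk(node):
    """Yield every dict nested anywhere in the parsed JSON-LD (covers @graph lists)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def extract_article_body(html):
    """Return the first non-empty articleBody found in the page, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common in the wild; skip it
        for obj in _walk(data):
            types = obj.get("@type", [])
            if isinstance(types, str):
                types = [types]
            elif not isinstance(types, list):
                types = []
            types = [t for t in types if isinstance(t, str)]
            body = obj.get("articleBody")
            if ARTICLE_TYPES.intersection(types) and isinstance(body, str) and body.strip():
                return body.strip()
    return None
```

When `extract_article_body` returns `None`, the page either has no JSON-LD or omits `articleBody`, which is the case this issue is about.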

Any method (scraping, tags we do not use yet) that helps to automatically read the full text is welcome.

As roaddanger.org is multilingual, it would be nice if the full text extractor supported multiple languages.
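As a starting point for discussion, one possible scraping fallback is sketched below in Python. It is a rough heuristic, not a proposal for the final implementation: it collects `<p>` text inside `<article>` elements, falls back to all `<p>` elements when no `<article>` is present, and skips common page furniture. It uses no language-specific word lists, so it should behave the same for any language. The names `ParagraphCollector`, `extract_paragraph_text`, `SKIP_TAGS` and the `min_chars` threshold are illustrative assumptions.

```python
from html.parser import HTMLParser

# Assumption: text inside these elements is navigation or decoration, not article text.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form", "figure"}


class ParagraphCollector(HTMLParser):
    """Collects paragraph text, tracking whether each <p> sits inside an <article>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._article_depth = 0
        self._in_p = False
        self._buffer = []
        self.article_paragraphs = []
        self.all_paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "article":
            self._article_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            self.flush_paragraph()  # a previous <p> may have been left unclosed
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "article":
            self.flush_paragraph()
            self._article_depth = max(0, self._article_depth - 1)
        elif tag == "p":
            self.flush_paragraph()

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buffer.append(data)

    def flush_paragraph(self):
        """Store the buffered paragraph (whitespace collapsed) and reset the buffer."""
        text = " ".join("".join(self._buffer).split())
        if self._in_p and text:
            self.all_paragraphs.append(text)
            if self._article_depth > 0:
                self.article_paragraphs.append(text)
        self._in_p = False
        self._buffer = []


def extract_paragraph_text(html, min_chars=20):
    """Return paragraphs joined by blank lines, preferring those inside <article>."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    collector.flush_paragraph()
    paragraphs = collector.article_paragraphs or collector.all_paragraphs
    # Very short paragraphs are usually buttons or labels ("Share", "Read more").
    return "\n\n".join(p for p in paragraphs if len(p) >= min_chars)
```

The `min_chars` cut-off counts characters, so it may need a lower value for languages with denser scripts. Real pages will also need handling for paywalls, cookie banners and text rendered by JavaScript, none of which this sketch covers.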

digitaldutch changed the title ~~Automatically extract the full article text from a media page~~ Automatically extract the full article text from a media webpage Sep 9, 2024
