v0.6.0
We are pleased to announce version 0.6.0 of the ACHE Focused Crawler. Here we list the major changes since the last version.
New features, improvements and bug fixes:
- Implementation of the SeedFinder algorithm, which leverages search engine APIs to automatically create a large and diverse set of seed URLs to bootstrap the crawler.
- Added a flexible way to use different handlers for different types of links, which allows a different extractor for each content type, such as HTML, media files, and XML sitemaps.
- Support for the sitemap.xml protocol, which allows the crawler to automatically discover all links, along with some metadata, specified by webmasters.
- More bug fixes and code refactoring.
- More unit tests and integration tests (coverage raised to 42%).
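
To illustrate what the sitemap.xml protocol exposes to a crawler, here is a minimal sketch in Python. It is not ACHE's implementation, only an example of reading the links and optional metadata fields (`lastmod`, `changefreq`, `priority`) defined by the sitemaps.org protocol. The `parse_sitemap` function, the `SAMPLE_SITEMAP` document, and the `example.com` URLs are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample document; the example.com URLs are placeholders.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2017-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://example.com/about</loc>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry, with the link and its optional metadata."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default=None, namespaces=SITEMAP_NS)
        if not loc:
            # <loc> is the only required child; skip entries without it.
            continue
        entry = {"loc": loc.strip()}
        # The remaining fields are optional and stay None when absent.
        for field in ("lastmod", "changefreq", "priority"):
            entry[field] = url.findtext(
                "sm:" + field, default=None, namespaces=SITEMAP_NS
            )
        entries.append(entry)
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE_SITEMAP):
        print(entry)
```

A crawler could feed each `loc` value into its frontier and use fields such as `lastmod` as hints for revisit scheduling. Sitemap index files (`<sitemapindex>`), which point to further sitemaps, are not handled in this sketch.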