v0.6.0

aecio released this 09 Jul 02:38
· 737 commits to master since this release

We are pleased to announce version 0.6.0 of the ACHE Focused Crawler. The major changes since the last version are listed below.

New features, improvements and bug fixes:

  • Implementation of the SeedFinder algorithm, which leverages search engine APIs to automatically create a large and diverse set of seed URLs to bootstrap the crawler.
  • Added a flexible way to use different handlers for different types of links, which allows a different extractor for each content type, such as HTML, media files, and XML sitemaps.
  • Support for the sitemap.xml protocol, which allows the crawler to automatically discover all links, along with some metadata specified by webmasters.
  • More bug fixes and code refactoring.
  • More unit tests and integration tests (coverage raised to 42%).
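
To illustrate what sitemap support gives a crawler, here is a minimal Python sketch of reading a sitemap.xml document and collecting each link together with the optional metadata a webmaster may supply. It is not ACHE's implementation (ACHE is written in Java and its internals are not shown in these notes); the function name `parse_sitemap`, the `SAMPLE` document, and the example.com URLs are assumptions made for illustration. Only the namespace and the `loc`, `lastmod`, `changefreq`, and `priority` elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return one dict per <url> entry, with the link and its optional metadata.

    Hypothetical helper for illustration only; not part of ACHE.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            # <loc> is the only required child; skip entries without it.
            continue
        entries.append({
            "loc": loc,
            # The remaining fields are optional and are None when absent.
            "lastmod": url.findtext("sm:lastmod", namespaces=SITEMAP_NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=SITEMAP_NS),
            "priority": url.findtext("sm:priority", namespaces=SITEMAP_NS),
        })
    return entries


# Made-up sitemap used only to exercise the sketch.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2017-07-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://example.com/about</loc>
  </url>
</urlset>"""

if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE):
        print(entry["loc"], entry["lastmod"])
```

A crawler could feed the `loc` values into its frontier and use `lastmod` or `changefreq` as hints for revisit scheduling. Sitemap index files (a `sitemapindex` root listing other sitemaps) are not handled in this sketch.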