Commit
* Improved relative link handling
* New Group type, docs, tests that didn't make it pre namespace change
* Introduces allowed_classes filtering. Fixes encoding issues
* UnwrapLinks processor
* More comprehensive unwrap links but still WIP
* Add option for referer to fetcher
* Unescape slashes on json output
* Whitespace leave one space
* Latest group type
* namespace exception
* Support for uuidv3 on group item content and json output to be consumed as paragraphs in Drupal world
* Allow generic any name of output
* Optionally use Guzzle redirect info for speed
* Use Guzzle redirect
* Check a url exists in cache and report path
* Use Guzzle redirect info
* composer
* Group crawl by query string
* Track redirects on crawl in Guzzle
* Add mandatory support for field in group
* Build effective after redirect url lists
* Option to use effective url in fetcher if redirect
* Group crawled urls by regex
* PHP warnings
* More unicode fixing
* More options and features
* More unicode fixes
* Pass in whole object to callback
* Fix redirect check
* More unicode support
* More unicode support
* More unicode support
* Add method to return console io
* General group_uuid instead of paragraph
* Support for extra media attributes
* Use results from fetcher, remove JSON UTF8 error check
* New sub_fetch processor to fetch and process a URL. Nested Merls.
* Proper check for config and rename entity based on config
* Track what page media was on
* Support for a prebuilt alias map
* comment
* Generator for mappings
* spelling
* Array config holder for sub_fetch processor
* composer
* Moved uuid generation to standard MerlinUuid method.
* Unicode menu links
* process_file for xpath Type/Media
* Comment typo
* Better error reporting for SubFetch. WIP Still needs a bit more tidying up in the case the fetched thing wasn't TEXT/HTML.
* Use v4 ip resolve for Curl options, a lot faster
* Use v4 ip resolve for Curl options, a lot faster
* Resolve robots.txt ignore.
* Fixed ordered type to emit the field name as well.
* Updated error message to be more descriptive.
* Allowed to have dot in cache dir name.
* Fixing ordered.
* Allowed to group URLs by the value of a meta tag.
* Added a URL options flag to control content duplicates for redirects.
* Print url cache path from CLI exists lookup.
* Remove alpha UnwrapLinks type.
* MD rendering.
* Linting.
* Remove old unused functions.
* Fix existing tests.
* Linting.
* Remove old getMapping().
* Comment typo.
* Return original reset comment.
* phpcs
* Add cURL IP resolve method as option.
* Default address IP resolve to any/whatever.
* Update Fetcher Docs.
* Make some features of group optional.
* Update Group type tests and docs.
* Docs update.
* Pass same config object to Output as used in GenerateCommand
* Rename _redirected_from. Add curl ip resolve func.
* Use ip resolve func.
* Getter for multicurl object
* Separate build duplicates function
* Options for SubFetch.
* phpcs, typos
* Save sub fetch status error similar to normal fetch.
* Subfetch tests.
* Composer update.
* Typo and missing JSON files for subfetch test.
* sub_fetch processor docs.
* Add is_external flag to redirect info.
* Only add internal or non redirect links to queue when loading from cache.
* Only add redirect to effective url list if internal.
* Update browsershot for dependencies vulnerability.
* Minor package update. Moved from drupal-entity to drupal-media tags.
* Updated packages.
* MediaNullAttributeTest update.
* Update tests.
* Use puppeteer orb.
* Remove orb in favour of hardcoding.
* Apt-update.
* Update to non-stretch debian.
* Add the google signing key

Co-authored-by: Andrew Rowlands <[email protected]>
Co-authored-by: Stuart Rowlands <[email protected]>
Co-authored-by: Stuart Rowlands <[email protected]>
Co-authored-by: Suchi Garg <[email protected]>
Co-authored-by: Sonny Kieu <[email protected]>
Co-authored-by: Stuart Rowlands <[email protected]>