
Add text extraction for Solr indexing of FileAbstractionLayer-based files in TYPO3 CMS


beechit/solrfal-textextract


Text extraction for Apache Solr + TYPO3

This TYPO3 extension provides a hook/aspect that listens to the signal emitted by ext:solrfal during indexing and extracts the contents of known text file types.

It uses the pdftotext binary for this (when present on the machine) and falls back to the standalone Apache Tika JAR (when present on the system).

When processing PDF files, some additional checks determine whether the contents are encrypted. If they are, the extension falls back to Tika.
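The fallback order described above can be sketched roughly as follows. This is an illustrative Python sketch, not the extension's actual PHP code: the function names, the `tika_jar` parameter, and the use of `pdfinfo` to detect encryption are assumptions made for the example. The command lines themselves (`pdftotext <file> -`, `pdfinfo <file>`, and `java -jar <tika-app jar> --text <file>`) are the standard invocations of those tools.

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(
        list(cmd), capture_output=True, text=True, check=True
    ).stdout


def is_encrypted(pdf_path: str, runner: Callable[[Sequence[str]], str] = run) -> bool:
    """Assumption: detect encryption via pdfinfo's 'Encrypted:' line."""
    for line in runner(["pdfinfo", pdf_path]).splitlines():
        if line.startswith("Encrypted:"):
            return line.split(":", 1)[1].strip().startswith("yes")
    return False


def extract_text(
    path: str,
    tika_jar: Optional[str] = None,  # hypothetical: path to a Tika app JAR
    runner: Callable[[Sequence[str]], str] = run,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Try pdftotext first, then fall back to Tika; None if neither applies."""
    if path.lower().endswith(".pdf") and which("pdftotext") is not None:
        # Skip pdftotext for encrypted PDFs and let Tika try instead.
        encrypted = which("pdfinfo") is not None and is_encrypted(path, runner)
        if not encrypted:
            # "-" as the output file makes pdftotext write to stdout.
            return runner(["pdftotext", path, "-"])
    if tika_jar is not None and which("java") is not None:
        return runner(["java", "-jar", tika_jar, "--text", path])
    return None
```

Passing `runner` and `which` as parameters keeps the decision logic separate from the actual binaries, so the fallback order can be exercised on a machine without pdftotext or Tika installed.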
