Text extraction for Apache Solr + TYPO3

This TYPO3 extension provides a hook/aspect that listens to the signal emitted by ext:solrfal during indexing and extracts the contents of known text files.

It uses the pdftotext binary for this (when present on the machine) and falls back to the standalone Apache Tika jar (when present on the system).

There are some additional checks when processing PDF files to determine whether the contents are encrypted. If they are, the extension falls back to Tika.
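
The fallback order described above can be sketched roughly as follows. This is an illustration in Python, not the extension's actual PHP code. The function names (`extract_text`, `looks_encrypted`, `run_command`) are hypothetical, and the `/Encrypt` byte scan is an assumed heuristic, since the extension's real encryption check is not described here. The sketch relies only on the standard command-line forms `pdftotext <file> -` (write text to stdout) and `java -jar <tika-app.jar> --text <file>`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

# A runner takes a command line and returns the text the command printed.
Runner = Callable[[Sequence[str]], str]


def run_command(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout as text; raises on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def looks_encrypted(pdf_path: Path) -> bool:
    """Assumed heuristic: encrypted PDFs carry an /Encrypt entry in the trailer."""
    return b"/Encrypt" in pdf_path.read_bytes()


def extract_text(
    path,
    pdftotext_bin: Optional[str] = None,  # path to pdftotext, or None if absent
    tika_jar: Optional[str] = None,       # path to the Tika app jar, or None if absent
    runner: Runner = run_command,
) -> Optional[str]:
    """Try pdftotext for unencrypted PDFs, then fall back to Tika."""
    path = Path(path)
    is_pdf = path.suffix.lower() == ".pdf"

    # Preferred extractor: pdftotext, skipped for encrypted PDFs.
    if is_pdf and pdftotext_bin and not looks_encrypted(path):
        try:
            return runner([pdftotext_bin, str(path), "-"])
        except (OSError, subprocess.CalledProcessError):
            pass  # fall through to Tika

    # Fallback extractor: standalone Tika jar.
    if tika_jar:
        try:
            return runner(["java", "-jar", tika_jar, "--text", str(path)])
        except (OSError, subprocess.CalledProcessError):
            pass

    # Neither extractor was available or succeeded.
    return None
```

In this sketch a missing tool is represented by passing `None`, so a caller could resolve the binaries first (for example with `shutil.which("pdftotext")`) and hand over whatever was found. Injecting `runner` keeps the decision logic testable without either tool installed.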

beechit/solrfal-textextract
