forked from MIR-MU/MIaSMath

Adds math processing capabilities to Lucene or Solr

fsi-open/MIaSMath

MIaSMath – Math processing for Lucene / Solr

MIaSMath is a math processing plugin for Lucene or Solr.

Usage

To integrate MIaSMath, including its MathTokenizer, into a Solr instance:

  1. Copy the following libraries to the solr/lib directory:
  2. Configure MathTokenizer and a math field in schema.xml:
  • Set the MathTokenizer attribute subformulae to true for the analyzer of type index and to false for the analyzer of type query, as follows:

    <fieldType name="math" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="true"/> 
      </analyzer>
      <analyzer type="query">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="false"/> 
      </analyzer>
    </fieldType>
  • Declare a field for storing math as follows:

    <field name="math" type="math" indexed="true" stored="false" multiValued="true" />
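The subformulae switch can be pictured with a toy model. The Java sketch below assumes that with subformulae="true" the tokenizer emits one token per subformula (every subtree of the formula), and with subformulae="false" a single token for the whole formula. SubformulaeDemo, Node, and tokens are hypothetical names, and the prefix notation stands in for the MathML that the real MathTokenizer processes; this is an illustration, not MIaSMath's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not MIaSMath code) of the subformulae switch:
 * a formula is a tree, and each emitted token is one serialized subtree.
 */
public class SubformulaeDemo {

    /** A node of a formula tree: an operator with operands, or a leaf. */
    public record Node(String label, List<Node> children) {
        public static Node leaf(String label) {
            return new Node(label, List.of());
        }

        public static Node op(String label, Node... children) {
            return new Node(label, List.of(children));
        }

        /** Serializes the subtree in prefix form, e.g. "+(x,2)". */
        public String serialize() {
            if (children.isEmpty()) {
                return label;
            }
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                parts.add(child.serialize());
            }
            return label + "(" + String.join(",", parts) + ")";
        }
    }

    /**
     * subformulae=true (index side): every subtree becomes a token.
     * subformulae=false (query side): only the whole formula is emitted.
     */
    public static List<String> tokens(Node root, boolean subformulae) {
        List<String> out = new ArrayList<>();
        if (subformulae) {
            collect(root, out);
        } else {
            out.add(root.serialize());
        }
        return out;
    }

    /** Pre-order traversal: the node itself first, then its children. */
    private static void collect(Node node, List<String> out) {
        out.add(node.serialize());
        for (Node child : node.children()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // (x + 2) * y
        Node formula = Node.op("*",
                Node.op("+", Node.leaf("x"), Node.leaf("2")),
                Node.leaf("y"));
        System.out.println(tokens(formula, true));
        System.out.println(tokens(formula, false));
    }
}
```

Running main prints [*(+(x,2),y), +(x,2), x, 2, y] for the index side and [*(+(x,2),y)] for the query side, so in this model a query for x + 2 matches the indexed formula (x + 2) * y through its +(x,2) subformula token.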

That's it. You can now start your Solr instance and test MathTokenizer on the Analysis screen of the Solr Admin UI.
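The same kind of analysis can also be requested over HTTP from Solr's field analysis handler (/analysis/field), which reports the tokens produced by the index and query analyzers of a field type. The sketch below only builds such a request URL. The base URL http://localhost:8983/solr, the core name mycore, and the class MathAnalysisUrl are placeholders, and it assumes the field value is a Presentation MathML snippet.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds a request URL for Solr's field analysis handler. */
public class MathAnalysisUrl {

    /**
     * Builds an /analysis/field request that runs the index and query analyzers
     * of the "math" field type over the same MathML snippet.
     * solrBase and core are placeholders for your own instance.
     */
    public static URI build(String solrBase, String core, String mathml) {
        // Percent-encode the MathML so tags, quotes, and '+' survive in the query string.
        String value = URLEncoder.encode(mathml, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/" + core + "/analysis/field"
                + "?analysis.fieldtype=math"
                + "&analysis.fieldvalue=" + value
                + "&analysis.query=" + value
                + "&wt=json");
    }

    public static void main(String[] args) {
        String mathml = "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
                + "<mi>x</mi><mo>+</mo><mn>2</mn></math>";
        System.out.println(build("http://localhost:8983/solr", "mycore", mathml));
    }
}
```

Fetching the printed URL from a running instance should list the tokens MathTokenizer emits for the index side (analysis.fieldvalue) and the query side (analysis.query).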

Citing MIaSMath

Text

SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. In Matthew R. B. Hardy and Frank Wm. Tompa (eds.). Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011. pp. 57–60. ISBN 978-1-4503-0863-2. doi:10.1145/2034691.2034703.

BibTeX

@inproceedings{doi:10.1145:2034691.2034703,
     author = "Petr Sojka and Martin L\'{i}\v{s}ka",
      title = "{The Art of Mathematics Retrieval}",
  booktitle = "{Proceedings of the ACM Conference on Document Engineering,
  		DocEng 2011}",
  publisher = "{Association for Computing Machinery}",
    address = "{Mountain View, CA}",
       year = 2011,
      month = Sep,
       isbn = "978-1-4503-0863-2",
      pages = "57--60",
        url = {http://doi.acm.org/10.1145/2034691.2034703},
        doi = {10.1145/2034691.2034703},
   abstract = {The design and architecture of MIaS (Math Indexer and Searcher), 
	       a system for mathematics retrieval is presented, and design 
	       decisions are discussed. We argue for an approach based on 
	       Presentation MathML using a similarity of math subformulae. The 
	       system was implemented as a math-aware search engine based on the 
	       state-of-the-art system Apache Lucene. Scalability issues were 
	       checked against more than 400,000 arXiv documents with 158 
	       million mathematical formulae. Almost three billion MathML 
	       subformulae were indexed using a Solr-compatible Lucene.},
}
