Redesign of the ala-name-matching library

See here for documentation.

This is very much a work in progress. You have been warned.

The approach taken is to treat taxonomic information as a vector of evidence and to find the taxon that has the highest probability of matching all the supplied evidence. To do this, a Bayesian network is used to identify how the various pieces of taxonomic information interact, and a conditional probability graph is built from that network. Since handling every possible combination of pieces of evidence would result in a combinatorial explosion, the network is "compiled" into a graph of cause and effect, narrowing conditional probabilities to the smallest possible set of antecedents.
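As a rough illustration of "highest probability given the evidence", the sketch below scores two candidate taxa against a vector of evidence and picks the more probable one. It is not this library's API: the class `EvidenceMatchSketch`, the `Candidate` type, the field names, the taxon identifiers, and all probabilities are hypothetical. It also assumes each piece of evidence is conditionally independent given the taxon, which is a simpler model than the compiled network described above, where each node is conditioned on its own small set of antecedents.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EvidenceMatchSketch {

    // Hypothetical candidate: a prior probability plus, for each evidence
    // field, P(supplied value | this taxon). Not a class from the library.
    static final class Candidate {
        final String taxonId;
        final double prior;
        final Map<String, Double> likelihoods;

        Candidate(String taxonId, double prior, Map<String, Double> likelihoods) {
            this.taxonId = taxonId;
            this.prior = prior;
            this.likelihoods = likelihoods;
        }
    }

    // Assumed fallback probability for evidence a candidate has no entry for.
    static final double DEFAULT_LIKELIHOOD = 0.01;

    // Unnormalised posterior: prior times the product of per-field likelihoods.
    static double score(Candidate c, List<String> evidenceFields) {
        double p = c.prior;
        for (String field : evidenceFields) {
            p *= c.likelihoods.getOrDefault(field, DEFAULT_LIKELIHOOD);
        }
        return p;
    }

    // Normalise the scores so they sum to 1 across the candidate set.
    static Map<String, Double> posteriors(List<Candidate> candidates, List<String> evidenceFields) {
        Map<String, Double> result = new LinkedHashMap<>();
        double total = 0.0;
        for (Candidate c : candidates) {
            double s = score(c, evidenceFields);
            result.put(c.taxonId, s);
            total += s;
        }
        if (total > 0.0) {
            final double norm = total;
            result.replaceAll((id, s) -> s / norm);
        }
        return result;
    }

    // The "most likely" match is the candidate with the largest posterior,
    // or null when there are no candidates.
    static String bestMatch(List<Candidate> candidates, List<String> evidenceFields) {
        return posteriors(candidates, evidenceFields).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        // Both candidates fit the supplied name equally well; only the
        // supplied genus separates them.
        List<Candidate> candidates = List.of(
                new Candidate("taxon-A", 0.5, Map.of("scientificName", 0.9, "genus", 0.9)),
                new Candidate("taxon-B", 0.5, Map.of("scientificName", 0.9, "genus", 0.1)));
        List<String> evidence = List.of("scientificName", "genus");

        System.out.println(posteriors(candidates, evidence));
        System.out.println("Best match: " + bestMatch(candidates, evidence));
    }
}
```

With every field treated independently the arithmetic stays linear in the number of fields. A Bayesian network keeps a similar economy without the independence assumption, because the joint probability factorises into one conditional term per node given that node's parents.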

Test indexes

To use the test cases in the ala-linnaean module, you will need the corresponding Linnaean and vernacular name indexes. You can get:

These need to be unzipped into /data/lucene.

Every time the network definition changes, the indexes change, since the underlying inference model will have also changed. If you get exceptions or errors, you may need a new copy of the index.

Name Matching Libraries

Generic Libraries

These libraries are subject-matter agnostic and can be used to build matching systems for any domain that can be structured into a graph of cause and effect.

Bayesian Core: Core classes for describing observable properties and how they can be derived, stored and matched.
Bayesian Lucene: A storage implementation using the Lucene index and search system.
Bayesian Builder: Builder software that takes a Bayesian network and "compiles" it into a set of Java classes that implement the deductive framework specified by the network. Also includes a generic index builder that takes source data and builds a store that can be searched for matches.
Bayesian Maven Plugin: A Maven plugin that allows you to embed network building and compilation into your Maven build cycle.

Taxonomy-Specific Libraries

These libraries are oriented towards handling generic biological nomenclature, without insisting on a specific model. They draw extensively on Darwin Core and the Global Biodiversity Information Facility suite of tools and software.

Taxonomic Tools: Utility vocabularies and processing designed to handle biological taxonomy.
Taxonomic Tools Builder: Builder processing designed to complement the taxonomic tools.

ALA-Specific Libraries

Libraries that contain the ALA-specific implementation of taxonomy matching. There are three networks:

The Linnaean network models scientific names based on the Linnaean hierarchy.
The Vernacular network models vernacular (common) names.
The Location network models localities.
ALA Linnaean: The classes needed to implement and analyse the Linnaean and vernacular networks and match a search against candidates. This library allows a client to build a template of known information about a name, search an index built by the ALA builder library and return a "most likely" match to a specific taxon. In particular, it includes more sophisticated post-processing of results to take care of oddities such as parent-child synonyms, misapplied names, etc. This is the library that an application would use to implement name searching.
ALA Linnaean Builder: The classes needed to build name indexes for both networks. This also includes some useful tools for testing and analysis.
ALA Taxonomic Tools: Useful tools for analysis and comparison.
ALA Distribution: A packaged distribution of the name index builders and tools.

AtlasOfLivingAustralia/ala-name-matching-2
