Idyl NLP is a natural language processing (NLP) framework released under the business-friendly Apache License, version 2.0. The framework features core NLP capabilities such as language detection, sentence extraction, tokenization, and named-entity extraction.
Idyl NLP uses a combination of custom implementations and other open-source projects to perform its tasks. In some cases multiple implementations are available, so you can choose which one to use. Idyl NLP stands on the shoulders of giants to provide a capable, flexible, and powerful NLP framework.
If you are looking for commercially supported NLP microservices, look at the NLP Building Blocks. These applications are powered by Idyl NLP.
Visit the Idyl NLP home page at idylnlp.ai.
Refer to the sample projects for example implementations of the capabilities below. Some of the unit tests in this project also provide examples.
- Language Detection
- Sentence Extraction
- Tokenization
- Named-Entity Extraction (supports neural network models on CPU/GPU)
- Document Classification (supports neural network models on CPU/GPU)
All of these core capabilities except language detection can use custom-trained models, and Idyl NLP can both train and evaluate such models. Named-entity extraction and document classification support neural network models as well as maximum entropy and perceptron-based models.
- idylnlp-nifi provides Apache NiFi processors using Idyl NLP for NLP tasks.
- idylnlp-deeplearning4j allows for using Idyl NLP within DeepLearning4j projects.
- idylnlp-standford-core-nlp provides wrapper implementations to use Stanford Core NLP within Idyl NLP.
- Renku Language Detection Engine is an open source microservice that identifies the language of natural language text.
- Sonnet Tokenization Engine is an open source microservice for performing string tokenization.
- Prose Sentence Extraction Engine is an open source microservice for performing sentence extraction on natural language text.
- Idyl E3 Entity Extraction Engine is an open source microservice for performing named-entity extraction.
Release dependencies are available:
    <dependency>
        <groupId>ai.idylnlp</groupId>
        <artifactId>...</artifactId>
        <version>1.0.0</version>
    </dependency>
A quick example that builds a named-entity extraction pipeline and extracts person entities from English natural language text:
    // Build a pipeline for English text.
    NerPipelineBuilder builder = new NerPipeline.NerPipelineBuilder();
    NerPipeline pipeline = builder.build(LanguageCode.en);
    // Run the pipeline and print each extracted entity.
    EntityExtractionResponse response = pipeline.run("George Washington was president.");
    for (Entity e : response.getEntities()) {
        System.out.println(e.toString());
    }
This code outputs:
Text: George Washington; Confidence: 0.96; Type: person; Language Code: eng; Span: [0..2);
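The span `[0..2)` in this output reads as a half-open range: the start index is included and the end index is not. Since "George Washington" is the first two tokens of the sentence, the span appears to count tokens rather than characters. The sketch below illustrates that reading with naive whitespace tokenization. `SpanDemo` and `coveredText` are illustrative names, not part of Idyl NLP, and the tokenization Idyl NLP actually uses may differ.

```java
import java.util.Arrays;
import java.util.List;

public class SpanDemo {

    // Joins the tokens covered by the half-open span [start, end).
    // Assumption: the span indexes tokens, not characters.
    static String coveredText(List<String> tokens, int start, int end) {
        return String.join(" ", tokens.subList(start, end));
    }

    public static void main(String[] args) {
        // Naive whitespace tokenization, for illustration only.
        List<String> tokens =
                Arrays.asList("George Washington was president.".split("\\s+"));

        // [0..2) covers tokens 0 and 1; the end index is exclusive.
        System.out.println(coveredText(tokens, 0, 2)); // prints "George Washington"
    }
}
```

A half-open range makes the entity length simply `end - start`, and adjacent spans can share a boundary without overlapping.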
Idyl NLP requires Java 8. To build, run:
mvn clean install
Unit tests are included. Some tests require data that cannot be made publicly available at this time due to either size constraints or licensing. These tests are categorized as ExternalData and are skipped during a regular build. We run them in an in-house build job after each commit. We are working to find a suitable location to host the large test data to make it available to everyone.
There are also some tests categorized as HighMemoryUsage. These tests require a very large amount of memory to execute, so they are disabled during regular builds. We run them on a privately hosted build server.
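To illustrate how category-based skipping works in general, here is a minimal, self-contained sketch. It does not use Idyl NLP's build configuration or any test framework: the `Category` annotation, the `SampleTests` class, and `selectTests` are hypothetical stand-ins defined only for this example. Only the category names `ExternalData` and `HighMemoryUsage` come from the text above. A real build would more likely rely on the test framework's own category support.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CategoryFilterDemo {

    // Marker interfaces named after the two categories described above.
    interface ExternalData {}
    interface HighMemoryUsage {}

    // Hypothetical stand-in for a test framework's category annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Category {
        Class<?>[] value();
    }

    // Hypothetical tests, for illustration only.
    static class SampleTests {
        public void tokenizesSentence() {}

        @Category(ExternalData.class)
        public void evaluatesOnLargeCorpus() {}

        @Category(HighMemoryUsage.class)
        public void trainsLargeModel() {}
    }

    // Returns the sorted names of methods that carry none of the excluded categories.
    static List<String> selectTests(Class<?> testClass, Class<?>... excluded) {
        Set<Class<?>> excludedSet = new HashSet<>(Arrays.asList(excluded));
        List<String> selected = new ArrayList<>();
        for (Method method : testClass.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                continue; // ignore compiler- or tool-generated methods
            }
            Category category = method.getAnnotation(Category.class);
            boolean skip = false;
            if (category != null) {
                for (Class<?> c : category.value()) {
                    if (excludedSet.contains(c)) {
                        skip = true;
                    }
                }
            }
            if (!skip) {
                selected.add(method.getName());
            }
        }
        Collections.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        // A "regular build" excludes both categories.
        System.out.println(selectTests(SampleTests.class,
                ExternalData.class, HighMemoryUsage.class)); // prints [tokenizesSentence]
    }
}
```

Calling `selectTests(SampleTests.class)` with no excluded categories selects all three methods, which corresponds to the full run on a dedicated build server.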
Idyl NLP is available under the Apache License, version 2.0.
Copyright 2019 Mountain Fog, Inc.