Yawni is an API to Princeton University's WordNet®. WordNet is a graph of lexical and semantic relationships, and a potentially invaluable resource for injecting knowledge into applications. WordNet is probably the single most used NLP resource; many companies have it as their cornerstone. It embodies one of the most fundamental of all NLP problems: "word-sense disambiguation". The Yawni code library can be used to add lexical and semantic knowledge, primarily derived from WordNet, to your applications.
Yawni is written in the Java programming language.
The Yawni website is https://www.yawni.org/
Yawni currently consists of 3 main modules:
- api/ (Yawni WordNet API): a pure Java standalone object-oriented interface to the WordNet database of lexical and semantic relationships.
- data*/ (Yawni WordNet Data): jar file containing the Princeton WordNet 3.0 data files, and derivative files to support efficient, exhaustive access to this information.
- browser/ (Yawni WordNet Browser): a GUI browser of WordNet content using the Yawni API.
- Install JDK 8 (or greater) and Apache Maven 3.0.3 (or greater)
- Specify the following Apache Maven dependencies in your project:
  <dependency>
    <groupId>org.yawni</groupId>
    <artifactId>yawni-wordnet-api</artifactId>
    <version>2.0.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>org.yawni</groupId>
    <artifactId>yawni-wordnet-data30</artifactId>
    <version>2.0.0-SNAPSHOT</version>
  </dependency>
- Start using the Yawni API! All required resources are loaded on demand from the classpath (i.e., jars) and made accessible via a singleton:
  WordNetInterface wn = WordNet.getInstance();
Numerous unit tests that serve as great executable examples are included in api/src/test/java/org/yawni/. For a more complex example application, check out the browser/ sub-module.
WordNet consists of enough data to exceed the recommended capacity of Java Collections (e.g., java.util.SortedMap<String, X>), but not enough to justify a full relational database.
There are a lot of Java interfaces to WordNet already. Here are 8 of them, along with their URLs and software licenses.
- Dr. Dan Bikel / Stanford NLP WordNet; https://nlp.stanford.edu/nlp/javadoc/wn/doc/; “Academic User” license
- JAWS (Java API for WordNet Searching); https://github.com/jaytaylor/jaws; BSD-2-Clause license
- Jawbone; https://sites.google.com/site/mfwallace/jawbone; MIT license
- JWI (MIT Java Wordnet Interface); https://projects.csail.mit.edu/jwi/; non-commercial license
- Java WordNet Interface (javawn); https://sourceforge.net/projects/javawn/; GPL 2.0
- WordNet JNI Java Native Support (WNJN); http://wnjn.sourceforge.net/; GPL 2.0
- JWNL (Java WordNet Library); https://sourceforge.net/projects/jwordnet/; BSD
- extJWNL (Extended Java WordNet Library); https://sourceforge.net/projects/extjwnl/; BSD
Many of the pure Java ones (like Yawni) are actually derivatives of Oliver Steele's original JWordNet. In fact, Yawni is the “new” name of that original Java WordNet, JWordNet.
- commercial-grade implementation
- 🚀 very fast & small memory footprint 👣
- pure Java ☕ so it’s compatible with any JVM language! Scala, Clojure, Kotlin, …
- facilitates access to all aspects of WordNet data and algorithms, including the "Morphy" morphological processing (i.e., lemmatization / stemming) routines
- simple, intuitive, and well documented 📚 API
- all required resources load from jars by default making deployment a snap 💥
- all query results are immutable 🔒; safe for use in caches and for access by concurrent threads
- easy Apache Maven-based build with minimal dependencies
- extensive unit tests 🧪 provide peace of mind (and great examples!)
- includes refined GUI browser featuring
  - user-friendly 😊 🎛 🔍 & snappy 🚀
  - incremental find 🔍 (Ctrl+Shift+F / ⌘ ⇧ F)
  - no limits on search: Never see “Search too large. Narrow search and try again...” again!
  - comprehensive keyboard navigation ⌨ 🧭 support (arrows ⇦ ⇨ ⇧ ⇩, tab ↹, etc.)
  - multi-window 🪟🪟 support (Ctrl+N / ⌘ N)
  - cross-platform 🔀 including zero-install Java Web Start version
- commercial-friendly Apache license
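The immutability point in the feature list is what makes it reasonable to share query results across threads and keep them in a cache. The following is only a minimal sketch of that idea using the JDK alone: `ImmutableResultCacheSketch`, `SenseResult`, and the placeholder loader are hypothetical names invented for illustration and are not part of the Yawni API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative only: a thread-safe cache of immutable results (not Yawni code). */
final class ImmutableResultCacheSketch {

    /** Hypothetical immutable query result; all fields are final and never exposed mutably. */
    static final class SenseResult {
        private final String lemma;
        private final List<String> glosses;

        SenseResult(String lemma, List<String> glosses) {
            this.lemma = lemma;
            // Defensive copy wrapped as unmodifiable, so callers cannot alter cached data.
            this.glosses = Collections.unmodifiableList(new ArrayList<>(glosses));
        }

        String lemma() { return lemma; }
        List<String> glosses() { return glosses; }
    }

    private final ConcurrentHashMap<String, SenseResult> cache = new ConcurrentHashMap<>();
    private final Function<String, SenseResult> loader;

    ImmutableResultCacheSketch(Function<String, SenseResult> loader) {
        this.loader = loader;
    }

    /** Loads the result at most once per key; later callers get the same shared instance. */
    SenseResult get(String lemma) {
        return cache.computeIfAbsent(lemma, loader);
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder loader standing in for a real (expensive) lookup.
        ImmutableResultCacheSketch cache = new ImmutableResultCacheSketch(
            lemma -> new SenseResult(lemma, Arrays.asList("placeholder gloss for " + lemma)));

        // Two threads read the same cached, immutable result without any locking of their own.
        Runnable reader = () -> System.out.println(cache.get("dog").glosses());
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Because the result object cannot change after construction, the cache can hand the same instance to every caller without copying or synchronization beyond what `ConcurrentHashMap` already provides.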
- Extreme speed improvements: literally faster than the C version (benchmark source included)
  - Bloom filters used to avoid fruitless lookups (no loss in accuracy!)
  - re-implemented LRUCache using Google Guava's MapMaker
  - FileManager.CharStream and FileManager.NIOCharStream utilize in-memory and java.nio for maximum speed
- Major reduction in memory requirements
  - use of primitives where possible (hidden by API)
  - eliminated unused / unneeded fields
- Implemented Morphy stemming / lemmatization algorithms
- Completely rewritten GUI browser in Java Swing featuring
  - incremental find
  - no limits on search: Never see “Search too large. Narrow search and try again...” again!
- Support for WordNet 3.0 data files (and all older formats)
- Support for numerous optional and extended WordNet resources
  - 'sense tagged frequencies' (WordSense.getSensesTaggedFrequency())
  - 'lexicographer category' (Synset.getLexCategory())
  - 14 new 'morphosemantic' relations (RelationType.RelationTypeType.MORPHOSEMANTIC)
  - 'evocation' empirical ranks (WordSense.getCoreRank())
- Supports reading ALL data files from JAR file
- Many bug fixes
  - fixed broken RelationTypes
  - fixed Verb example sentences and generic frames (and made them directly accessible)
  - fixed iteration bugs and memory leaks
  - fixed various thread safety bugs
- Updated to leverage Java 1.6 and beyond
  - generics
  - use of Enum, EnumSet, and EnumMap where apropos
  - uses maximally configurable slf4j logging system
  - added LookaheadIterator (analogous to old LookaheadEnumeration)
    - changed to even better Google Guava AbstractIterator
- Growing suite of unit tests
- Automated all build infrastructure using Apache Maven
- New / changed API methods
  - renamed Word → WordSense, IndexWord → Word, Pointer → Relation, PointerType → RelationType, PointerTarget → RelationTarget
    - easier to understand, agrees with W3C proposal (https://www.w3.org/TR/wordnet-rdf/)
  - WordSense.getSenseNumber()
  - WordSense.getTaggedSenseCount()
  - WordSense.getAdjPosition()
  - WordSense.getVerbFrames()
  - Word.isCollocation()
  - Word.getRelationTypes()
  - Synset.getLexCategory()
  - RelationTarget.getSynset()
  - Word.getSenses() → Word.getSynsets()
  - Word.getWordSenses()
  - WordSense.getTargets() → WordSense.getRelationTargets()
  - DictionaryDatabase iteration methods are Iterables for ease of use (e.g., for loops)
  - all core classes implement Comparable<T>
  - all core classes implement Iterable<WordSense>
  - added iteration for all WordSenses and all Relations (and all of a certain RelationType)
  - added support for POS.ALL where apropos
  - all major classes are final
  - currently, no major classes are Serializable
  - removed RMI client / server capabilities - deemed overkill
  - removed applet - didn't justify its maintenance burden
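The change list mentions Bloom filters used to avoid fruitless lookups with no loss in accuracy. As a rough illustration of why accuracy is preserved, here is a minimal, self-contained Bloom filter sketch. It is not Yawni's implementation: the class name `BloomFilterSketch`, the FNV-1a based hashing, the bit and hash counts, and the `lookup` helper are all assumptions chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Illustrative Bloom filter; sizes and hash functions are arbitrary choices for this sketch. */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Seeded FNV-1a hash over the key's UTF-8 bytes. */
    private static int hash(String key, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int index(String key, int i) {
        int h1 = hash(key, 0);
        int h2 = hash(key, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Consults the filter first, so most absent keys never reach the (expensive) store. */
    static String lookup(BloomFilterSketch filter, Map<String, String> store, String key) {
        if (!filter.mightContain(key)) {
            return null; // fruitless lookup avoided
        }
        return store.get(key); // a false positive still ends in a correct null here
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1 << 16, 4);
        Map<String, String> store = new HashMap<>(); // stand-in for a slow on-disk index
        for (String lemma : new String[] {"dog", "cat", "bank"}) {
            filter.add(lemma);
            store.put(lemma, "entry for " + lemma);
        }
        System.out.println(lookup(filter, store, "dog"));          // present keys are always found
        System.out.println(lookup(filter, store, "no-such-word")); // null whether or not the filter skips
    }
}
```

The filter can report a false positive but never a false negative, so a "no" answer safely skips the real lookup, while a "yes" answer simply falls through to it. That is why the optimization costs nothing in accuracy.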