KNEWS: Knowledge Extraction With Semantics

A Learning by Reading pipeline of NLP and Entity Linking tools.

KNEWS is a composite tool that bridges semantic parsing (using C&C tools and Boxer or Semafor), word sense disambiguation (using UKB or Babelfy) and entity linking (using Babelfy or DBpedia Spotlight) to produce a unified, LOD-compliant abstract representation of meaning.

KNEWS can produce several kinds of output:

  1. Frame instances, based on the FrameBase scheme:
<http://framebase.org/ns/fi-Operate_vehicle_0059a98c-3870-49ed-87e1-f882e11a49f7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://framebase.org/ns/frame-Operate_vehicle-drive.v> .
<http://framebase.org/ns/fi-Operate_vehicle_0059a98c-3870-49ed-87e1-f882e11a49f7> <http://framebase.org/ns/fe-Driver> <http://dbpedia.org/resource/Robot> .
<http://framebase.org/ns/fi-Operate_vehicle_0059a98c-3870-49ed-87e1-f882e11a49f7> <http://framebase.org/ns/fe-Vehicle> <http://wordnet-rdf.princeton.edu/wn31/02961779-n> .
  2. Word-aligned semantics, based on lexicalized Discourse Representation Graphs:
<frameinstances>
  <frameinstance id="Operate_vehicle_9a3fa55e-4d97-406a-ab0d-cf681e277296" type="Operate_vehicle-drive.v" internalvariable="e1">
    <framelexicalization>k3:x1 is driving k3:x2</framelexicalization>
    <instancelexicalization>A robot is driving the car</instancelexicalization>
    <frameelements>
      <frameelement role="Driver" internalvariable="x1">
        <concept>http://dbpedia.org/resource/Robot</concept>
        <roleexicalization>A robot is driving x2</roleexicalization>
        <conceptlexicalization/>
      </frameelement>
      <frameelement role="Vehicle" internalvariable="x2">
        <concept>http://wordnet-rdf.princeton.edu/wn31/02961779-n</concept>
        <roleexicalization>x1 is driving the car</roleexicalization>
        <conceptlexicalization/>
      </frameelement>
    </frameelements>
  </frameinstance>
</frameinstances>
  3. First-order logic formulae with WordNet synsets and DBpedia ids as symbols:
fol(1,some(A,and(02961779-n(A),some(B,some(C,and(r1Theme(B,A),and(r1Agent(B,C),and(01934845-v(B),Robot(C))))))))).
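As a rough illustration, the frame-instance triples of output format 1 can be split into (subject, predicate, object) tuples with a few lines of Python. The helper below is a hypothetical sketch that assumes the simple one-triple-per-line layout shown above; for anything serious, a real RDF library such as rdflib would be the better choice.

```python
# Sketch: split KNEWS frame-instance output (format 1) into
# (subject, predicate, object) tuples. Assumes one triple per line,
# terminated by " ." -- not a full N-Triples parser.

def parse_triples(text):
    triples = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.endswith(" ."):
            continue  # skip anything that is not a triple line
        parts = line[:-2].split(None, 2)  # subject, predicate, object
        if len(parts) == 3:
            triples.append(tuple(p.strip("<>") for p in parts))
    return triples

sample = """\
<http://framebase.org/ns/fi-Operate_vehicle_0059> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://framebase.org/ns/frame-Operate_vehicle-drive.v> .
<http://framebase.org/ns/fi-Operate_vehicle_0059> <http://framebase.org/ns/fe-Driver> <http://dbpedia.org/resource/Robot> .
"""

for s, p, o in parse_triples(sample):
    print(p, "->", o)
```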

Online demo

A demo of KNEWS is now available at http://gingerbeard.alwaysdata.net/knews/.

Installation and configuration

After cloning the repository or otherwise downloading the KNEWS source code, you must install the prerequisite Python packages listed in the file requirements.txt. With pip, this is done with:

$ pip install -r requirements.txt

Semantic parsing configuration

KNEWS can work with either Semafor or the C&C tools/Boxer to perform semantic parsing. Semafor is used by default; to switch to Boxer, set the semantics->module value to boxer in the config/disambiguation.conf file.
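If you prefer to switch backends from code rather than by editing the file by hand, a sketch like the following works, assuming config/disambiguation.conf is an INI-style file with a [semantics] section holding a module key. The section and key names here mirror the semantics->module notation above and have not been verified against the shipped file.

```python
# Sketch: flip the semantic-parsing backend in an INI-style config file.
# Assumes a [semantics] section with a "module" key -- check the actual
# layout of config/disambiguation.conf before relying on this.
import configparser

def set_parser(path, module):
    """Set the semantic-parsing module ("semafor" or "boxer")."""
    config = configparser.ConfigParser()
    config.read(path)
    if not config.has_section("semantics"):
        config.add_section("semantics")
    config.set("semantics", "module", module)
    with open(path, "w") as f:
        config.write(f)
```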

Installation of Semafor

To install Semafor, run:

$ cd ext/
$ ./install_semafor.sh

Semafor is expected to run in server mode; server startup instructions can be found in the Semafor documentation.

To run it locally instead, open the config/semanticparsing.conf file and set the value of semafor->mode to local.

Installation of the C&C tools and Boxer

Alternatively, the C&C tools and Boxer can be used for semantic parsing. The C&C source code is included in the KNEWS repository (revision v2614), and a shell script is provided to automate compilation and installation. To install the C&C tools locally, run:

$ cd ext/
$ ./install_candc.sh

By default the script expects to run on Unix/Linux. To compile on other platforms, modify install_candc.sh accordingly. For example, on macOS you should change the line

ln -s Makefile.unix Makefile

to

ln -s Makefile.macosx Makefile

Please note: you will need a working installation of SWI-Prolog v6.6.x in order to compile Boxer correctly.

To test that the installation completed successfully, run (from the candc/ directory):

$ bin/candc --version
candc v2614 (unix build on 19 April 2016, 11:35:31)
$ bin/boxer --version
boxer v2614 (unix build on 19 April 2016, 11:35:31)

To use the SOAP client/server version of the C&C tools, run the server first with the following command line (from the candc/ directory):

$ bin/soap_server --server localhost:8888 --models models/boxer/ --candc-printer boxer
waiting for connections on localhost:8888

Next, you must configure how to run the C&C tools. Open the file config/semanticparsing.conf and select a value for boxer->mode:

  • online will access the online C&C API. This is the easiest solution, but it is impractical if KNEWS is used to parse a large amount of text.
  • local will use a local installation of the C&C tools (see above for instructions on how to get this running).
  • soap will use a local installation of the C&C tools with the SOAP-based client/server architecture, which is convenient for parsing many different files.
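Put together, a semanticparsing.conf that selects the SOAP mode might look roughly like this. The INI-style layout and the [boxer] section name are inferred from the boxer->mode notation above and should be checked against the shipped file:

```
; illustrative fragment of config/semanticparsing.conf (layout assumed)
[boxer]
mode = soap
```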

Configuration of the disambiguation tools

You must configure which module to use for word sense disambiguation and entity linking. Open the file config/disambiguation.conf and set a value for wsd->module:

  • babelfy uses the Babelfy online API. Note: a valid API key is needed. You must request it and write it in the config/babelfy.var.properties file.
  • ukb uses the UKB Word Sense Disambiguation system. A script is provided in the ext/ directory to download and install it.
  • lesk uses the Enhanced Lesk WSD algorithm proposed by P. Basile et al. A script is provided in the ext/ directory to download and install it.

You can also configure an entity linking module in the config/disambiguation.conf file:

  • babelfy uses the Babelfy online API. Note: a valid API key is needed. You must request it and write it in the config/babelfy.var.properties file.
  • spotlight uses the DBpedia Spotlight online API.
  • none makes KNEWS skip the entity linking step altogether.
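For example, a disambiguation.conf that uses UKB for word sense disambiguation and DBpedia Spotlight for entity linking might look roughly like this. The INI layout and the section names are guesses based on the wsd->module notation above, not verified against the shipped file:

```
; illustrative fragment of config/disambiguation.conf (layout assumed)
[wsd]
module = ukb

[el]
module = spotlight
```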

Test the installation

Run the pipeline on a single input file:

$ src/pipeline.py -i input.txt -o output.txt

or on a directory of input files:

$ src/pipeline.py -d input/ -o output.txt
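To drive the pipeline from other Python code, a thin subprocess wrapper is enough. The sketch below assumes src/pipeline.py exists as described above and that it accepts the -i/-o flags shown; the paths are placeholders.

```python
# Sketch: invoke the KNEWS pipeline as a subprocess and report its
# exit code. Paths are placeholders -- adjust to your checkout.
import subprocess
import sys

def run_knews(input_path, output_path, pipeline="src/pipeline.py"):
    """Run the pipeline on a single input file; return its exit code."""
    result = subprocess.run(
        [sys.executable, pipeline, "-i", input_path, "-o", output_path],
        capture_output=True, text=True,
    )
    return result.returncode
```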
