#53 - Update documentation for 2nd release
- adding documentation for variable detection module
maxxkia committed Sep 15, 2017
1 parent a5545a8 commit 969dce9
Showing 3 changed files with 123 additions and 1 deletion.
58 changes: 57 additions & 1 deletion ss-doc/src/main/asciidoc/user-guide/components.adoc
Expand Up @@ -276,4 +276,60 @@ FMeasure scores
Overall precision: 0.601852
Overall recall: 0.970149
Overall F-Measure: 0.742857
--------------------------------------
--------------------------------------

=== Variable mention detection
This module performs variable mention detection on your data: it detects whether a sentence from an article mentions
a variable from one of the social science studies. Each input document contains one or more sentences from an article.
The output indicates whether a variable is mentioned in the given sentences and, if so, also contains the id of the
mentioned variable.

==== Train and Test pipeline
This pipeline can be used for benchmarking various classifiers on a specific dataset. The pipeline requires the training
and test document files to be arranged in the following directory structure, with one subdirectory per variable id:

train/
├────variable-id-1/
├────────doc1.txt
├────────doc2.txt
├────────...
├────variable-id-2/
├────────doc1.txt
├────────...
test/
├────variable-id-1/
├────────doc1.txt
├────────doc2.txt
├────────...
├────variable-id-2/
├────────doc1.txt
└────────...
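
The layout above can be read as a labeled corpus in which the name of each subdirectory serves as the variable id of the
documents inside it. The following is a minimal sketch of that reading, written in Python purely for illustration. The
function name `load_labeled_documents` is hypothetical and is not part of this project or of DKPro.

```python
from pathlib import Path


def load_labeled_documents(root):
    """Return sorted (variable_id, file_path) pairs found under `root`.

    Assumes the layout described above: root/<variable-id>/<document>.txt.
    """
    pairs = []
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue  # ignore stray files next to the variable-id directories
        for doc in sorted(label_dir.glob("*.txt")):
            pairs.append((label_dir.name, doc))
    return pairs
```

Calling the function once on `train/` and once on `test/` is a quick way to confirm that both directories contain the
variable ids you expect before starting the pipeline.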

The pipeline combines several lexical features with features extracted using external resources such as
WordNet. Follow the instructions below to configure these resources correctly.

===== DKPRO_HOME environment variable
Before continuing, make sure that the environment variable `DKPRO_HOME` is set, either system-wide or
per project in the Eclipse run configuration (or that of your chosen IDE). The variable should point to a directory,
which may still be empty, that is intended to store the resources used by DKPro components.
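
As a quick sanity check of this step, the sketch below reads `DKPRO_HOME` and creates the directory if it does not exist
yet. It is an illustrative Python helper only; the name `resolve_dkpro_home` is hypothetical and DKPro itself does not
provide it.

```python
import os
from pathlib import Path


def resolve_dkpro_home(environ=os.environ):
    """Return the DKPRO_HOME directory as a Path, creating it if it is missing."""
    value = environ.get("DKPRO_HOME")
    if not value:
        raise RuntimeError("DKPRO_HOME is not set")
    home = Path(value)
    home.mkdir(parents=True, exist_ok=True)  # the directory may start out empty
    return home
```

Note that a variable set only in an IDE run configuration is not visible to a script started from a separate shell.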

===== Configuring WordNet
Download WordNet version 3.0 from https://wordnet.princeton.edu/wordnet/download/current-version/[here]
(download http://wordnetcode.princeton.edu/3.0/WordNet-3.0.tar.gz[tar-gzipped]
or http://wordnetcode.princeton.edu/3.0/WordNet-3.0.tar.bz2[tar-bzip2'ed]).

After the download has finished, extract the archive and copy the `dict/` directory to
`$DKPRO_HOME/LexSemResources/wordnet/`.

Download the WordNet properties file
`uc-tdm-socialsciences/ss-variable-detection/src/test/resources/installation/wordnet_properties.xml`
(https://github.com/openminted/uc-tdm-socialsciences/blob/master/ss-variable-detection/src/test/resources/installation/wordnet_properties.xml[download])
and place it under
`$DKPRO_HOME/LexSemResources/wordnet/`. Use the lowercase directory name `wordnet`, because `resources.xml` refers to
the properties file by that path. Adjust the `value` attribute of the `param` element named `dictionary_path` so that it
contains the absolute path to the `dict` directory.
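
The `dictionary_path` adjustment can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree`
module; the function name `set_dictionary_path` is hypothetical and serves only as an illustration. Be aware that
ElementTree discards XML comments when parsing, so the comments in the sample file are lost when it is rewritten this
way. Edit the file by hand if you want to keep them.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def set_dictionary_path(properties_file, dict_dir):
    """Point the `dictionary_path` param of an extJWNL properties file at `dict_dir`."""
    tree = ET.parse(str(properties_file))
    param = tree.getroot().find(".//param[@name='dictionary_path']")
    if param is None:
        raise ValueError("no <param name='dictionary_path'> element found")
    # Store an absolute path, as the instructions above require.
    param.set("value", str(Path(dict_dir).resolve()))
    tree.write(str(properties_file), encoding="UTF-8", xml_declaration=True)
```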

Create a directory named `de.tudarmstadt.ukp.dkpro.lexsemresource.core.ResourceFactory` under `$DKPRO_HOME`. Download the
resources file `uc-tdm-socialsciences/ss-variable-detection/src/test/resources/installation/resources.xml`
(https://github.com/openminted/uc-tdm-socialsciences/blob/master/ss-variable-detection/src/test/resources/installation/resources.xml[download])
and place it in this directory.
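
This last step amounts to creating one directory and copying one file into it. A minimal Python sketch, assuming
`resources.xml` has already been downloaded to a local path, is shown here; the function name `install_resources_file`
is hypothetical.

```python
import shutil
from pathlib import Path

# Directory name taken from the instructions above.
FACTORY_DIR = "de.tudarmstadt.ukp.dkpro.lexsemresource.core.ResourceFactory"


def install_resources_file(dkpro_home, downloaded_resources_xml):
    """Copy resources.xml into the ResourceFactory directory under DKPRO_HOME."""
    target_dir = Path(dkpro_home) / FACTORY_DIR
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "resources.xml"
    shutil.copyfile(str(downloaded_resources_xml), str(target))
    return target
```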
@@ -0,0 +1,10 @@
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
    <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"></bean>
    <bean id="wordnet-en" lazy-init="true" class="de.tudarmstadt.ukp.dkpro.lexsemresource.wordnet.WordNetResource">
        <constructor-arg value="${DKPRO_HOME}/LexSemResources/wordnet/wordnet_properties.xml"/>
    </bean>
    <bean id="wiktionary-en" lazy-init="true" class="de.tudarmstadt.ukp.dkpro.lexsemresource.wiktionary.WiktionaryResource">
        <constructor-arg value="ENGLISH"/>
        <constructor-arg value="${DKPRO_HOME}/LexSemResources/Wiktionary/jwktl_0.15.2_en20100403"/>
    </bean>
</beans>
@@ -0,0 +1,56 @@
<?xml version="1.0" encoding="UTF-8"?>

<!-- This is a sample extJWNL properties file which you can adapt. -->

<jwnl_properties language="en">
    <version publisher="Princeton" number="3.0" language="en"/>
    <dictionary class="net.sf.extjwnl.dictionary.FileBackedDictionary">
        <param name="morphological_processor" value="net.sf.extjwnl.dictionary.morph.DefaultMorphologicalProcessor">
            <param name="operations">
                <param value="net.sf.extjwnl.dictionary.morph.LookupExceptionsOperation"/>
                <param value="net.sf.extjwnl.dictionary.morph.DetachSuffixesOperation">
                    <param name="noun" value="|s=|ses=s|xes=x|zes=z|ches=ch|shes=sh|men=man|ies=y|"/>
                    <param name="verb" value="|s=|ies=y|es=e|es=|ed=e|ed=|ing=e|ing=|"/>
                    <param name="adjective" value="|er=|est=|er=e|est=e|"/>
                    <param name="operations">
                        <param value="net.sf.extjwnl.dictionary.morph.LookupIndexWordOperation"/>
                        <param value="net.sf.extjwnl.dictionary.morph.LookupExceptionsOperation"/>
                    </param>
                </param>
                <param value="net.sf.extjwnl.dictionary.morph.TokenizerOperation">
                    <param name="delimiters">
                        <param value=" "/>
                        <param value="-"/>
                    </param>
                    <param name="token_operations">
                        <param value="net.sf.extjwnl.dictionary.morph.LookupIndexWordOperation"/>
                        <param value="net.sf.extjwnl.dictionary.morph.LookupExceptionsOperation"/>
                        <param value="net.sf.extjwnl.dictionary.morph.DetachSuffixesOperation">
                            <param name="noun" value="|s=|ses=s|xes=x|zes=z|ches=ch|shes=sh|men=man|ies=y|"/>
                            <param name="verb" value="|s=|ies=y|es=e|es=|ed=e|ed=|ing=e|ing=|"/>
                            <param name="adjective" value="|er=|est=|er=e|est=e|"/>
                            <param name="operations">
                                <param value="net.sf.extjwnl.dictionary.morph.LookupIndexWordOperation"/>
                                <param value="net.sf.extjwnl.dictionary.morph.LookupExceptionsOperation"/>
                            </param>
                        </param>
                    </param>
                </param>
            </param>
        </param>
        <param name="dictionary_element_factory"
               value="net.sf.extjwnl.princeton.data.PrincetonWN17FileDictionaryElementFactory"/>
        <param name="file_manager" value="net.sf.extjwnl.dictionary.file_manager.FileManagerImpl">
            <param name="file_type" value="net.sf.extjwnl.princeton.file.PrincetonRandomAccessDictionaryFile">
                <!--<param name="write_princeton_header" value="true"/>-->
                <!--<param name="encoding" value="UTF-8"/>-->
            </param>
            <!--<param name="cache_use_count" value="true"/>-->

            <!-- Change the following path to point to your WordNet installation -->
            <param name="dictionary_path" value="/home/local/UKP/kiaeeha/DKPRO_HOME/LexSemResources/wordnet/dict"/>

        </param>
    </dictionary>
    <resource class="net.sf.extjwnl.princeton.PrincetonResource"/>
</jwnl_properties>
