Code by Thai-Hoang Pham, Xuan-Khoai Pham
The vnSRL system labels the semantic roles of the arguments of each predicate in a Vietnamese sentence. The software is written in Python 2.x.
vnSRL depends on six Python packages for scientific computing: NumPy, SciPy, Scikit-learn, Pandas, Pulp, and ETE2. They must be installed before you use vnSRL.
The simplest way to install them is with pip:
# pip install -U numpy scipy scikit-learn pandas pulp ete2
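To confirm that the dependencies are importable before running vnSRL, a quick check along the following lines may help. This is a minimal sketch and not part of vnSRL: the helper name `missing_packages` is hypothetical, and the import names are the usual ones for these packages (Scikit-learn is imported as `sklearn`).

```python
import importlib

# Usual import names for the packages installed above
# (assumption: scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "scipy", "sklearn", "pandas", "pulp", "ete2"]


def missing_packages(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Report which required packages, if any, still need to be installed.
print("Missing packages: %s" % (", ".join(missing_packages(REQUIRED)) or "none"))
```

If any names are reported, re-run the pip command for those packages.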
vnSRL expects its input data in Penn Treebank format. For details, see the sample file 100-sen.txt in the directory 'data/input'.
For the classification task, put your data in the directory 'data/input'.
The output file is written to the directory 'data/output'.
Run vnSRL with the following command:
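As a rough illustration of what Penn Treebank bracketing looks like, the sketch below parses one bracketed tree into nested lists and rejects unbalanced brackets. It is a generic example rather than vnSRL's own reader: the function name `parse_tree` is hypothetical, and the labels in the sample string are made up, so the tag set in 100-sen.txt may differ.

```python
import re


def parse_tree(text):
    """Parse bracketed trees into nested lists of the form [label, child, ...]."""
    # Tokens are "(", ")", or any run of characters without spaces or brackets.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]  # stack[0] collects the top-level trees
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new node
        elif tok == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced ')'")
            node = stack.pop()        # close the node and attach it to its parent
            stack[-1].append(node)
        else:
            stack[-1].append(tok)     # a label or a word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


# Illustrative sentence with made-up labels (not taken from the sample data).
trees = parse_tree("(S (NP (N Nam)) (VP (V ngu)))")
print(trees)
```

A check like this can catch malformed lines in an input file before a long run.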
$ python vnSRL.py <input> <ilp> <embedding> <output>
Positional arguments:
input
: input file name
ilp
: integer linear programming (ILP) post-processing (1 to enable it, 0 to disable it)
embedding
: word embedding file (skipgram or glove)
output
: output file name
For example, to label the file input.txt with ILP post-processing and glove word embeddings, and to write the result to output.txt, use the command:
$ python vnSRL.py input.txt 1 glove output.txt
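If you want to call vnSRL from another Python script, the command line can be assembled programmatically. The following is a minimal sketch under stated assumptions: `build_command` and `run_vnsrl` are hypothetical helpers, not part of vnSRL, and the check that the embedding is either skipgram or glove is inferred from the argument description above.

```python
import subprocess


def build_command(input_file, use_ilp, embedding, output_file):
    """Build the argument list for: python vnSRL.py <input> <ilp> <embedding> <output>."""
    # Assumption: only the two embeddings named in the argument list are accepted.
    if embedding not in ("skipgram", "glove"):
        raise ValueError("embedding must be 'skipgram' or 'glove'")
    ilp_flag = "1" if use_ilp else "0"
    return ["python", "vnSRL.py", input_file, ilp_flag, embedding, output_file]


def run_vnsrl(input_file, use_ilp, embedding, output_file):
    """Run vnSRL from the directory containing vnSRL.py and return its exit code."""
    return subprocess.call(build_command(input_file, use_ilp, embedding, output_file))


# Show the command that would be run for the example above.
print(" ".join(build_command("input.txt", True, "glove", "output.txt")))
```

Passing `use_ilp=False` produces the same command with 0 in place of 1.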
Note: In our experiments, ILP post-processing improves performance by about 0.4% but makes the run much slower (about 50x longer).
Thai-Hoang Pham < [email protected] >
FPT Technology Research Institute, FPT University