forked from ceteri/textrank
Open source Java implementation of the TextRank algorithm by Mihalcea
and Tarau. Note that this code only implements key phrase extraction
based on keyword co-occurrence, as described in section 3 of the
Mihalcea-Tarau paper. It does not yet implement the sentence extraction
described in section 4 of that paper.
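For orientation, here is a minimal sketch of the co-occurrence ranking idea. It is not the code in this repo: the class name, method names, window size and fixed iteration count are assumptions made for illustration, and it skips the part-of-speech filtering, stemming and WordNet steps. It links words that appear within a small window of each other, then iterates a PageRank-style score over the resulting undirected graph.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not a class from this project.
final class CooccurrenceRankSketch {

    /** Links two distinct words when they occur within `window` tokens of each other. */
    static Map<String, Set<String>> buildGraph(List<String> tokens, int window) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            String a = tokens.get(i);
            graph.computeIfAbsent(a, k -> new LinkedHashSet<>());
            int end = Math.min(i + window, tokens.size());
            for (int j = i + 1; j < end; j++) {
                String b = tokens.get(j);
                if (a.equals(b)) {
                    continue; // no self-loops
                }
                // Undirected edge: record it in both adjacency sets.
                graph.get(a).add(b);
                graph.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
            }
        }
        return graph;
    }

    /** Runs a fixed number of PageRank-style updates; every vertex starts at 1.0. */
    static Map<String, Double> rank(Map<String, Set<String>> graph,
                                    double damping, int iterations) {
        Map<String, Double> score = new LinkedHashMap<>();
        for (String v : graph.keySet()) {
            score.put(v, 1.0);
        }
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new LinkedHashMap<>();
            for (String v : graph.keySet()) {
                double sum = 0.0;
                for (String u : graph.get(v)) {
                    // Each neighbor shares its score equally among its own edges.
                    sum += score.get(u) / graph.get(u).size();
                }
                next.put(v, (1.0 - damping) + damping * sum);
            }
            score = next;
        }
        return score;
    }
}
```

Calling buildGraph(tokens, 2) followed by rank(graph, 0.85, 30) gives one score per distinct word. Words that co-occur with many other words end up with higher scores, and those are the candidates for key phrases.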
See also:
http://lit.csci.unt.edu/index.php/Graph-based_NLP
GitHub code repo:
http://github.com/ceteri/textrank/
GoogleGroups discussion:
http://groups.google.com/group/textrank-dev
Paco NATHAN
@pacoid
http://www.google.com/profiles/ceteri
NB: There is a known issue with the use of JWNL (Java libraries for
WordNet): if the graph size exceeds a particular threshold, low-level
Java I/O reads of the WordNet database on disk cause a Java thread to
block, even though JVM tools show no blocked threads.
A potential remedy is to dump WordNet, or at least the parts of it
used here, into some DBD structure with an in-memory cache.
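As a rough sketch of the in-memory cache part of that remedy only, a small LRU cache can sit in front of whatever performs the disk lookup. Everything here is an assumption for illustration: the class name, the loader function and the capacity are hypothetical, no JWNL call is shown, and it has not been confirmed that this avoids the blocking described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not a class from this project or from JWNL.
final class LookupCacheSketch<V> {
    private final Map<String, V> cache;
    private final Function<String, V> loader;

    /**
     * @param capacity maximum number of entries kept in memory
     * @param loader   stand-in for the slow lookup (for example a disk read)
     */
    LookupCacheSketch(final int capacity, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity; // evict once the cache is over capacity
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    synchronized V get(String key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

The get method is synchronized so that several threads can share one cache. The trade-off is that a slow loader call holds the lock, so a production version would likely want finer-grained locking.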
---------
To build:
mvn package
Simple Test:
java -jar target/textrank.jar -i src/test/resources/good.txt
---------
Usage: com.sharethis.textrank.TextRank [options]
Example of usage:
java -jar target/textrank.jar -i <input> -l <language> -r <resources> --log <log properties>
java -jar target/textrank.jar -i src/test/resources/good.txt
java -jar target/textrank.jar -i src/test/resources/good.txt -l en -r src/main/resources
Options:
  * -i, --input        The input document
    -l, --language     Language: en / es
                       Default: en
        --log          Logging properties file
                       Default: src/main/resources/log4j.properties
    -r, --resources    Path to resources
                       Default: src/main/resources
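The options and defaults listed above can be summarized in a small hand-rolled parser. This is a sketch, not the parser the tool actually uses: the class name and the map keys are hypothetical, and it assumes the `*` next to -i marks a required option.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the option handling used by textrank.jar.
final class OptionsSketch {

    /** Parses "-x value" pairs, filling in the defaults documented in the README. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("language", "en");
        opts.put("log", "src/main/resources/log4j.properties");
        opts.put("resources", "src/main/resources");

        for (int i = 0; i < args.length; i++) {
            String name;
            switch (args[i]) {
                case "-i": case "--input":     name = "input";     break;
                case "-l": case "--language":  name = "language";  break;
                case "--log":                  name = "log";       break;
                case "-r": case "--resources": name = "resources"; break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            opts.put(name, args[++i]); // consume the option's value
        }
        // Assumption: the "*" marker in the usage text means -i is required.
        if (!opts.containsKey("input")) {
            throw new IllegalArgumentException("The -i/--input option is required");
        }
        return opts;
    }
}
```

For example, parsing only "-i src/test/resources/good.txt" leaves language, log and resources at the defaults shown in the option list.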
---------
Sources for third-party JAR files:
commons-logging-1.1.1.jar
http://commons.apache.org/downloads/download_logging.cgi
commons-math-1.2.jar
http://commons.apache.org/downloads/download_math.cgi
log4j-1.2.15.jar
http://logging.apache.org/log4j/1.2/download.html
porterstemmer.jar
http://snowball.tartarus.org/download.php
opennlp-tools-1.3.0.jar
http://opennlp.sourceforge.net/
maxent-2.4.0.jar
https://sourceforge.net/projects/maxent/
sptoolkit.jar
http://text0.mib.man.ac.uk:8080/scottpiao/sent_detector
trove-2.0.2.jar
http://trove4j.sourceforge.net/
extjwnl
http://extjwnl.sourceforge.net/
jdom-1-1.jar
http://jdom.org/downloads/index.html