Cliner #8

Open
wants to merge 257 commits into
base: master
257 commits
71949e9
removed junked from file
Mar 13, 2015
295b4b1
made reading files more efficient
Mar 13, 2015
89cf092
fixed error, was inserting wrong data into tables
kwaco Mar 14, 2015
379cf4b
spacing issue corrected
wboag Mar 16, 2015
90ba175
Update README.rst
wboag Mar 16, 2015
599965e
Update README.rst
wboag Mar 16, 2015
3916ed4
Added structure for first unit test
Mar 16, 2015
9a399b1
Added doctest for is_date()
Mar 16, 2015
50aff8a
Added basic doctests to utilities.py
Mar 17, 2015
b565adc
made loading model object a lot faster
Mar 17, 2015
b491c78
Merge branch 'master' of github.com:text-machine-lab/CliNER
Mar 17, 2015
708c56c
created a method for loading pickled objects quickly
Mar 17, 2015
2b7219e
made cache loading faster
Mar 17, 2015
ad015f0
make loading pos tagger faster
Mar 18, 2015
8a3146f
modified where temp files are saved, fixed error in cli.py, fixed not…
Mar 19, 2015
46ad32d
reduced imported overhead
Mar 19, 2015
627b149
fixed potential race condition
Mar 20, 2015
c9681ba
Very simple files for docstring test cases
wboag Mar 20, 2015
789c99b
setup tests to diagnose system prereqs
wboag Mar 20, 2015
e91d908
Added section about prerequisite diagnosis
wboag Mar 20, 2015
2e833fb
Added section about prerequisite diagnosis
wboag Mar 20, 2015
01683b5
Added diagnosis coverage and patches broken demo
wboag Mar 20, 2015
68eff62
minor adjustments to readme
wboag Mar 20, 2015
e06dec8
Added many new tools for installation
wboag Mar 21, 2015
a60d927
removed old install.sh file
wboag Mar 21, 2015
81092b4
fixed installation bugs
wboag Mar 21, 2015
192b9dc
fixed naming issue in scripts
wboag Mar 21, 2015
d3dd715
made trie loading more efficient
Mar 22, 2015
e330d88
made usage of nltk stemmer more efficient
Mar 22, 2015
de9dd41
removed imports that were not used
Mar 22, 2015
145e28f
made tagger only load once during predict
Mar 22, 2015
b0ef04d
made loading pos tagger more efficient
Mar 22, 2015
9cc5c6a
added test file for word features
Mar 23, 2015
64314d4
fixed failing doctest
Mar 23, 2015
f8b589e
added more doctests and general documentation
Mar 23, 2015
905a1cb
added more doctests/documentation
Mar 25, 2015
72c5b5d
Merge branch 'install'
wboag Mar 26, 2015
6fe0e48
renaming install script (dev)
wboag Mar 26, 2015
f8656f7
more helpful error message in diagnose
Mar 26, 2015
ef0290c
updated directory path
Mar 26, 2015
550f37f
update install script
Mar 26, 2015
b113b0c
Fixed doctests, modified regexes
Mar 26, 2015
0977a22
added basic doctests
Mar 26, 2015
467ebe2
added test file for features
Mar 26, 2015
1dad6c2
added basic doctests
Mar 26, 2015
d10a4ef
added test file for read_config
Mar 26, 2015
3240c4f
added test file for sentence_features
Mar 26, 2015
2a03993
added structure for doctests
Mar 26, 2015
54f721f
removed unused imports
Mar 27, 2015
35db856
added web demo code for storage only. do not even try to use this.
Mar 27, 2015
dd80a99
moved where demo code was stored
Mar 27, 2015
ec07da9
removed demo.tar.gz file
Mar 27, 2015
b6fa844
concatenated tests into one file
Mar 30, 2015
9da5e22
removed excess test files
Mar 30, 2015
8619c2d
cleaned up some documentation
Mar 30, 2015
539dc17
added more doctests
Mar 30, 2015
fed2fd3
removed unnecessary doctests
Mar 30, 2015
37ffdcd
added doctests
Mar 31, 2015
128ace0
added more doctests
Mar 31, 2015
18c9d31
expanded on current doctests
Apr 1, 2015
853a818
fixed failing doctests
Apr 1, 2015
5e03a5a
Merge branch 'doctests'
tnaumann Apr 7, 2015
00b557b
fixed trailing text
wboag May 21, 2015
74ca1c2
removed misc text
wboag May 21, 2015
4f815a4
renamed prose_sentence to is_prose_sentence
wboag May 22, 2015
58cc504
tore whole file in Note object
wboag May 28, 2015
dc111c6
changed file permissions
wboag May 28, 2015
2a7e794
updated call to lineno_and_tokspan to provide context
wboag May 28, 2015
6002f8e
Update README.rst
wboag Jun 20, 2015
078b9a8
We don't need the demo stuff sitting around on master doing nothing
Jun 20, 2015
f7cc436
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
Jun 20, 2015
698638c
fixed install diagnostics for nltk dependency
Jun 20, 2015
fef46c1
renamed output dir from test_predictions/ to predictions/
Jun 20, 2015
3f4eef7
moved tmp_files_dir to new misc directory
Jun 20, 2015
891577e
We don't need our own temp files directory. just use /tmp
Jun 21, 2015
2308dea
fixed bug with opening tempfiles
Jun 22, 2015
10e8567
Adding vagrant deployment capabilities
Jul 27, 2015
ff95c91
vagrant up now successfully builds a virtual machine with cliner in w…
Jul 28, 2015
89dd752
Fixed some bugs with nltk pos tagger being loaded.
Jul 30, 2015
71b9dc3
Fixed problem with GENIA not building correctly. The demo script now …
Aug 2, 2015
9ffe904
word_features.py was modularized. Minor comment typos were fixed.
Aug 8, 2015
ec412c6
Modularized sentence_features and features. Features now contains three
Aug 11, 2015
df3aabd
Broke features out into their own function modules, and adjusted call…
c-cooper Aug 16, 2015
fa1a082
Modularized umls_features (yet to be tested)
Aug 22, 2015
77d3ad1
Merge branch 'modularization' of https://github.com/text-machine-lab/…
c-cooper Sep 6, 2015
a0073cc
Function caching using repoze.lru
Sep 19, 2015
8da4cc6
commiting my summer changes for merge with Connor & Renan
wboag Sep 29, 2015
cedcd16
removed extraneous files from cliner/
wboag Sep 29, 2015
7a3362c
adding tools and configuration globals
wboag Sep 29, 2015
62f3178
Added func_cache.py
Sep 29, 2015
93fc4bf
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
wboag Sep 29, 2015
29217ae
Merge branch 'modularization', remote-tracking branch 'origin'
Sep 29, 2015
458abc9
qaccidentally closing file twice
wboag Sep 29, 2015
9f3bd5e
Merge branch 'vagrant', remote-tracking branch 'origin'
Sep 29, 2015
caf04f5
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
wboag Sep 29, 2015
acb6576
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
wboag Sep 29, 2015
0c5310e
func_cache says which function it's reporting cache information on.
Oct 8, 2015
1cc7350
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
Oct 8, 2015
0bd82ab
remove default arguments from train and predict
wboag Oct 24, 2015
e44643a
adding python dependencies for pip to requirements.txt
wboag Oct 24, 2015
3e6f17d
minor adjustments to files (comments, spacing, etc)
wboag Oct 24, 2015
bf54277
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
wboag Oct 24, 2015
6403f56
fixed pos tagger loading in newest nltk version
kwaco Oct 25, 2015
3e7d4f9
Prints function name in ShowInfo
Oct 25, 2015
47802cf
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
Oct 25, 2015
8fb9107
Added function name to ShowInfo
Oct 25, 2015
6abc26c
Updated virtual-env command to explicitly use python 2.7
Nov 2, 2015
399e24f
default config file to not use genia
wboag Nov 7, 2015
7a531ec
fixed minor bug that checked path of where to serialize model
wboag Nov 7, 2015
01fb46d
fixed index assumption (apparently tokens are not always exactly one …
wboag Nov 7, 2015
01e0f3a
fixed genia problems
kwaco Nov 12, 2015
ba32fdf
fixed some character indices
wboag Nov 12, 2015
e4ae93f
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
wboag Nov 12, 2015
a52b266
refined condition for invalid BIO span, cliner now exits when invalid…
kwaco Nov 12, 2015
da0bd6a
added some fixes to reader. this will be cliner's first build release
kwaco Nov 12, 2015
ea258a9
updated readme to include instructors how to get umls tables
kwaco Nov 12, 2015
e21dc70
cleaned up installation dir a little
wboag Nov 15, 2015
983491d
Made README way nicer
wboag Nov 15, 2015
d8ed2a2
fixed i2b2 reader bug when duplicate line appears in txt file
wboag Nov 15, 2015
f696562
fixed demo script
wboag Nov 15, 2015
cbea6e6
allows config.txt to specify path to umls_tables
wboag Nov 15, 2015
25496f6
do not allow cliner to run without setting CLINER_DIR
wboag Nov 15, 2015
15a096c
format enabled-disabled modules display
wboag Nov 15, 2015
dab753d
turned off cache hit debug info verbosity
wboag Nov 15, 2015
96929c5
misc files for build
wboag Nov 15, 2015
61ac630
fixed issue with order in which dependencies were installed. fixed is…
kwaco Nov 16, 2015
c48f334
updated readme
kwaco Nov 16, 2015
0e30ab3
updated readme. indicated wrong umls table
kwaco Dec 2, 2015
cbc5fbb
Update README.rst
Dec 2, 2015
dada566
fixed some errors that occur on interrupts. fixed read config to be m…
kwaco Dec 2, 2015
be7fdcf
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
kwaco Dec 2, 2015
3373a5d
remove single quote chars from config path
kwaco Dec 2, 2015
68ddfad
updated readme, made read_config check if paths are actually correct
kwaco Dec 2, 2015
544f4a9
wrong arguments in umls_cache.py
kwaco Dec 2, 2015
92e9be4
added some error messages
kwaco Dec 2, 2015
7d26929
added error messages
kwaco Dec 2, 2015
32f8338
merged semeval fork into master
kwaco Jan 11, 2016
2aceee3
forgot to add java code from semeval merge
kwaco Jan 13, 2016
f188ce9
install steps were commented out
kwaco Jan 13, 2016
891a815
silver model (yay)
wboag Jan 19, 2016
ccea98d
fixed bugs that arose when merging branches
kwaco Feb 19, 2016
defe900
fixed another import issue
kwaco Feb 19, 2016
5abdb0b
added lines to download and compile stanford parser
kwaco Feb 19, 2016
af3d92f
PY4J is now an option within config.txt
kwaco Feb 19, 2016
f3bc317
Update README.rst
wboag Feb 26, 2016
8443e3f
added grammar features and infrastructure for word2vec features, shou…
kwaco Mar 4, 2016
d35eea0
Merge branch 'master' of https://github.com/text-machine-lab/CliNER
kwaco Mar 4, 2016
3402d07
fixed bugs with word2vec sequence features
kwaco Mar 8, 2016
8dbf863
added more features, changed read_config.py to not have to use enviro…
kwaco Mar 8, 2016
52197f0
fixed bug when a warning is outputted by genia
kwaco Aug 8, 2016
8a4cb20
fixing error
navd Oct 17, 2016
594f7f8
Merge pull request #3 from navd/patch-1
tnaumann Oct 18, 2016
eee3f8e
updated config
Jun 30, 2017
5ea4ab2
Removed CLINER_DIR, reorganized code base
Jul 7, 2017
b764a85
removed CLINER_DIR dependencies in features_dir and notes/
Jul 7, 2017
4c4c4cf
renamed used and removed unused argparse arguments in train.py
Jul 7, 2017
6b4f438
renamed argparse arguments in evaluate.py & format.py
Jul 7, 2017
50b0caa
renamed argparse in predict.py
Jul 7, 2017
d319421
replaced notes directory with documents.py
Jul 7, 2017
71d2fc5
removed vagrant files and setup.pu related dependencies
Jul 7, 2017
12a3e1e
Removed unmaintained features
Jul 7, 2017
ea67aa3
Renamed features_dir to feature_extraction
Jul 7, 2017
157a7ec
Removed currently unnecessary features in feature_extraction/
Jul 7, 2017
12b4129
trimmed and simplified train.py
Jul 7, 2017
dcd9ca7
removed unnecessary items in models
Jul 7, 2017
83e7871
removed unneeded files and fixed syntax error in train.py
Jul 7, 2017
4937d94
removed unnecessary directory
Jul 7, 2017
0e32f07
trimmed model.py so it doesnt crash on train
wboag Jul 7, 2017
26a99e0
fixed problems related to crashing runtime
Jul 10, 2017
673266e
fixedcliner prediction and evaluation
Jul 10, 2017
8e30ce3
fixed evaluation and prediction scripts
Jul 10, 2017
b2929f4
changed train from two-pass to one-pass
Jul 13, 2017
7962a2d
changed train from two-pass to one-pass
Jul 13, 2017
1fddb0b
Modified train() and created train_fit() in model.py
Jul 13, 2017
d91cba2
added use_lstm flag to train.py
Jul 13, 2017
c824b88
added generic_train() to model.py
Jul 13, 2017
42b547a
removed contents of notes dir, added new documents.py to replace the …
Jul 14, 2017
23599fe
Fixed train()
Jul 14, 2017
0c96a17
add logging info to training
wboag Jul 14, 2017
2430aea
Changed predict from 2-pass to 1-pass
Jul 19, 2017
4b49127
fixed crf_ml name clash
Jul 19, 2017
0068d07
fixed attribute error for generic_train() and generic_predict()
Jul 20, 2017
9ed7e7d
Fixed formatting
Jul 21, 2017
72c754a
added features logging
Aug 16, 2017
7f818fe
enabled umls feats
Sep 1, 2017
90dbe91
enabled genia feats
Sep 1, 2017
d154029
enabled umls feats
Sep 1, 2017
34d494a
removed debugging line
Sep 1, 2017
af6c191
Updated package requirements
Sep 8, 2017
5340382
Updated README
Sep 8, 2017
6a0b544
Removed random seeding for train
Sep 8, 2017
2c6ed8e
Removed debugging line
Sep 8, 2017
408eb23
Merge branch 'master' of https://github.com/simthyrearch/CliNER
Sep 18, 2017
d7793cd
Update README.rst
Sep 18, 2017
4aa84da
Merge branch 'master' of https://github.com/simthyrearch/CliNER
Sep 18, 2017
a52db11
Cleaned cliner evaluation --out flag, added new author
Sep 18, 2017
0bdd466
Updated information about lstm feature support
Sep 19, 2017
7e2a937
Removed unnecessary information files
Sep 19, 2017
daf63c3
Removed LICENSE
Sep 19, 2017
4b7d464
Update README.rst
Sep 19, 2017
b915cf0
Removed output flag related code
Sep 19, 2017
d71061a
Merge branch 'master' of https://github.com/simthyrearch/CliNER
Sep 19, 2017
de7b803
Deleting minor changes
Sep 19, 2017
d01092e
Update README.rst
Sep 19, 2017
b2a815b
Minor maintenance
Sep 19, 2017
b421be7
Merge branch 'master' of https://github.com/simthyrearch/CliNER
Sep 19, 2017
1e1ef97
Minor maintenance
Sep 20, 2017
64691ed
Minor maintenance
Sep 20, 2017
6787bf2
Created example data, provided i2b2 tokenization script
Sep 20, 2017
af6c2da
Update README.rst
Sep 23, 2017
21e2c85
Merge pull request #4 from simthyrearch/master
wboag Sep 24, 2017
bc6a7a2
Created link to wiki page
Sep 25, 2017
c5af26c
about to fold lstm into cliner
wboag Oct 12, 2017
78ebda2
make cliner python3 compatible. add validation data arguments
wboag Oct 13, 2017
e0d3b96
be able to predict on any optional test data during training (todo: h…
wboag Oct 13, 2017
7a1c89b
enabled pos & pos_context
Oct 27, 2017
9be4c4b
Update README.rst
Oct 27, 2017
0cb599d
Merge pull request #5 from text-machine-lab/simthyrearch-patch-1
Oct 27, 2017
ce023c1
Initial Update
elesideprojects Dec 2, 2017
9d3c3ef
Checking epch sced
elesideprojects Dec 2, 2017
3f7325b
Merge pull request #6 from elensergwork/LSTM
wboag Dec 3, 2017
331f07a
encode_pickle_error
Jan 29, 2018
d451778
fixed the python2 crf
wboag Feb 11, 2018
5f33917
bugfix for import
wboag Feb 12, 2018
98b77b9
updated silver model link
wboag Feb 12, 2018
29a071b
Update README.rst
wboag Feb 12, 2018
3407bfe
sometimes, fake words crash the nltk stemmer. catch the exceptions
Feb 18, 2018
196654c
Merge branch 'master' into master
wboag Feb 18, 2018
72f2dbc
Merge pull request #7 from erayon/master
wboag Feb 18, 2018
6e20c7d
print statement
wboag Feb 23, 2018
fe832e9
tweaked a few python 2/3 syntaxes
wboag Feb 23, 2018
f88f994
adding two python libs that are required when running UMLS (though th…
wboag Feb 23, 2018
ec828a8
Update README.rst
wboag Apr 8, 2018
7dc0410
made i2b2 2010 data edits more clear
wboag Apr 10, 2018
e5282d0
fixed genia feature extraction to work for python3
wboag Apr 11, 2018
5e1599f
Update README.rst
wboag Apr 11, 2018
441e3d2
changed tmp dir to data/tmp instead of assuming /tmp (which is now co…
Jul 17, 2018
91e8b1e
Fixing the tmp_file write fail #10
dsouzadaniel Jul 18, 2018
90b3875
Merge pull request #11 from dsouzadaniel/patch-1
wboag Jul 18, 2018
4fc765d
Fixed pickle encoding arguments for Python2 and Python3
Aug 16, 2018
5d86336
Update README.rst
Aug 16, 2018
4cad9a3
ensure that the model has a default encoding of latin
correlator Dec 17, 2018
70fd8e5
Merge pull request #16 from correlator/load_model_in_latin
wboag Dec 20, 2018
56758cd
Update README.rst
shuvoxcd01 Jul 21, 2020
ceea065
Update README.rst
shuvoxcd01 Jul 21, 2020
eb8868d
Update README.rst
shuvoxcd01 Jul 21, 2020
cf91b5e
Update README.rst
shuvoxcd01 Jul 21, 2020
312d57b
Merge pull request #29 from shuvoxcd01/fixes
wboag Jul 21, 2020
854dd83
Removes leftover debugging statement.
tnaumann Aug 14, 2020
3d84e17
Adds archive notice to README.rst
tnaumann Aug 14, 2020
16 changes: 0 additions & 16 deletions AUTHORS.rst

This file was deleted.

111 changes: 0 additions & 111 deletions CONTRIBUTING.rst

This file was deleted.

1 change: 0 additions & 1 deletion COPYRIGHT

This file was deleted.

9 changes: 0 additions & 9 deletions HISTORY.rst

This file was deleted.

201 changes: 0 additions & 201 deletions LICENSE

This file was deleted.

22 changes: 22 additions & 0 deletions LSTM_parameters.txt
@@ -0,0 +1,22 @@
token_pretrained_embedding_filepath vectors2.txt
load_all_pretrained_token_embeddings False
load_only_pretrained_token_embeddings False
tagging_format bio
use_character_lstm True
use_crf True
Use_LSTM True
use_features_before_final_lstm False
character_embedding_dimension 25
character_lstm_hidden_state_dimension 25
token_embedding_dimension 100
freeze_token_embeddings False
token_lstm_hidden_state_dimension 100
optimizer sgd
gradient_clipping_value 5.0
remap_unknown_tokens_to_unk True
learning_rate 0.005
check_for_lowercase True
check_for_digits_replaced_with_zeros True
model_folder ./models/NN_models/Test_November
conll_like_result_folder ./RESULTS/TEST_SAVER/NOVEMBER_DEBUG/
model_name model_00001.ckpt
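
Reviewer note: the added LSTM_parameters.txt appears to be a plain list of whitespace-separated key/value pairs, one per line. The following is a minimal sketch of how such a file could be read, assuming that layout holds for every line. The names `parse_parameters` and `_coerce` are hypothetical and are not taken from the CliNER code base; how the project actually loads this file is not shown in the diff.

```python
from typing import Dict, Union

Value = Union[bool, int, float, str]


def _coerce(raw: str) -> Value:
    """Convert a raw string to bool, int, or float when possible (hypothetical helper)."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    # Anything else (paths, names, formats) stays a string.
    return raw


def parse_parameters(text: str) -> Dict[str, Value]:
    """Parse 'key value' lines into a dict; blank lines are skipped."""
    params: Dict[str, Value] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace only, so values keep any inner spaces.
        parts = line.split(None, 1)
        key = parts[0]
        raw = parts[1].strip() if len(parts) > 1 else ""
        params[key] = _coerce(raw)
    return params


# A few lines copied from the diff above, used as sample input.
sample = """\
tagging_format bio
use_crf True
character_embedding_dimension 25
learning_rate 0.005
model_name model_00001.ckpt
"""

params = parse_parameters(sample)
```

Keys are kept exactly as written, so `Use_LSTM` (capitalized in the file) would be stored under that spelling and not normalized to lowercase.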
11 changes: 0 additions & 11 deletions MANIFEST.in

This file was deleted.