
load and export methods added #40

Open · wants to merge 40 commits into master

Commits (40)
32550e9
load and export methods added
May 24, 2020
f21245e
csv added
May 24, 2020
173d259
graph overview and format description
May 27, 2020
c093cee
graph pre-processed by Charlie Hoyt
May 28, 2020
49030be
pre-processed graph and snippet for dealing with INDRA added
May 28, 2020
952029d
csv updated
May 28, 2020
46dd828
pre-processed graph and snippet for dealing with INDRA added
May 28, 2020
67a3de5
Merge branch 'master' into immunology_kg
dnsosa May 31, 2020
3357437
added skeleton code
IlanaL1 Jun 9, 2020
0d50a48
initial commit; frauenhofer sentence extraction and cord19 article te…
kaleidoescape Jun 10, 2020
e369463
updated steps for snorkel heuristic labelling
IlanaL1 Jun 12, 2020
d428db5
processed covid19-annovated.csv to training data
IlanaL1 Jun 13, 2020
21d8ce1
feat(frauenhofer,-spacey): add initial spacy nlp pipeline with RE
kaleidoescape Jun 14, 2020
b32a460
feat(spacy): store the char start/end of entities as well
kaleidoescape Jun 15, 2020
30c3a3b
feat(requirements): add my environment requirements
kaleidoescape Jun 15, 2020
169ce33
fix(spacy): correctly add start_char/end_char and change start/end to…
kaleidoescape Jun 15, 2020
0f7e282
updated with new pybel
IlanaL1 Jun 16, 2020
e7813a6
updated with new pybel
IlanaL1 Jun 16, 2020
a59b140
feat(frauenhofer): add all sentences (not just 0th), convert pmcid to…
kaleidoescape Jun 18, 2020
d4187c0
Merge remote-tracking branch 'origin/immunology_kg' into kaleidoescap…
kaleidoescape Jun 18, 2020
ad9994d
feat(frauenhofer): extract seemingly matching entities
kaleidoescape Jun 21, 2020
bf2f65a
feat(frauenhofer): update matched entities format; add more comments …
kaleidoescape Jun 21, 2020
9304f04
feat(frauenhofer): calculate entries where source AND target were mat…
kaleidoescape Jun 22, 2020
53d6f20
added file
IlanaL1 Jun 25, 2020
d9494a4
feat(frauenhofer,spacify): add Entity.namespace; convert sent tokens …
kaleidoescape Jun 26, 2020
b410259
fix(spacify): fix dict output key
kaleidoescape Jun 26, 2020
f9c5c15
feat(spacify): only include scientific entities
kaleidoescape Jun 28, 2020
3f42df4
added thesaurus and EDA notebooks
IlanaL1 Jun 29, 2020
fe85d3b
Merge remote-tracking branch 'origin/immunology_kg' into kaleidoescap…
kaleidoescape Jun 30, 2020
a4e6d43
added files for generating toy indra covid dataset
IlanaL1 Jul 19, 2020
ec9f877
cleaned script
IlanaL1 Jul 22, 2020
d16ec0c
toy dataset
IlanaL1 Jul 22, 2020
1d843bd
filtered toy graph for high belief statements
IlanaL1 Jul 28, 2020
0d2cb8e
added evidence annotations with start/stop location evidence to indra
IlanaL1 Aug 9, 2020
a319b18
added evidence annotation text nlp to indra statements
IlanaL1 Aug 9, 2020
2e92a32
generated start and stop positions as separate columns
IlanaL1 Aug 19, 2020
44fe040
updated to include src and target start and stop positions
IlanaL1 Aug 19, 2020
53ae2b6
included dask placeholder code - not working yet
IlanaL1 Aug 20, 2020
d9586aa
adding indra_df pickle file
IlanaL1 Aug 20, 2020
092eb73
included dask placeholder code - not working yet
IlanaL1 Aug 20, 2020
1,106 changes: 1,106 additions & 0 deletions immunology_kg/notebooks/1.1_data_pre-processing.ipynb

Large diffs are not rendered by default.

57 changes: 52 additions & 5 deletions immunology_kg/notebooks/1.2_INDRA_baseline.ipynb
@@ -9,19 +9,66 @@
"__Goal:__ Evaluate accuracy of INDRA models\n",
"\n",
"__Method:__ Test of INDRA Covid-19 model:\n",
"1. Use entities from covid-19 dataset as search query to INDRA,\n",
"2. get INDRA statements,\n",
"3. convert them to BEL format,\n",
"4. compare with relations from covid-19 dataset, calculate accuracy\n",
"1. Convert the Fraunhofer COVID19 knowledge graph to INDRA statements,\n",
"2. Compare its relations with those returned by INDRA,\n",
"3. Calculate accuracy,\n",
"4. Run error analysis\n",
"\n",
"\n",
"__Data:__ covid-19-kg dataset, [Covid-19 model from INDRA](https://emmaa.indra.bio/dashboard/covid19?tab=model)\n",
"\n",
"__Tools:__ [INDRA](http://www.indra.bio/), [PyBEL](https://github.com/pybel/pybel)\n",
"__Tools:__ [INDRA](http://www.indra.bio/), [PyBEL](https://github.com/pybel/pybel), [pyobo](https://github.com/pyobo/pyobo)\n",
"\n",
"__Result:__ INDRA models accuracy, results of error analysis"
]
},
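The accuracy comparison described in steps 2–3 above can be sketched with plain Python. This is a hypothetical illustration, not code from the notebook: `relation_accuracy` and the toy relation triples are invented names, and real statements would be compared by INDRA statement hashes rather than string triples.

```python
# Hypothetical sketch of the accuracy step: compare relations extracted
# from the Fraunhofer graph against relations returned by INDRA.
# (subject, relation, object) triples and all names are illustrative.

def relation_accuracy(kg_relations, indra_relations):
    """Fraction of knowledge-graph relations also found in INDRA."""
    kg = set(kg_relations)
    if not kg:
        return 0.0
    matched = kg & set(indra_relations)
    return len(matched) / len(kg)

kg_rels = [
    ("IL6", "increases", "CRP"),
    ("ACE2", "binds", "Spike"),
    ("TNF", "increases", "IL6"),
]
indra_rels = [
    ("IL6", "increases", "CRP"),
    ("TNF", "increases", "IL6"),
]
print(relation_accuracy(kg_rels, indra_rels))  # 2 of 3 matched
```

Error analysis (step 4) would then inspect the unmatched set `set(kg_rels) - set(indra_rels)`.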
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pybel\n",
"import requests\n",
"from indra.sources import bel\n",
"from indra.util import batch_iter\n",
"from indra.sources import indra_db_rest"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# load graph pre-processed by Charlie Hoyt: https://github.com/CoronaWhy/bel4corona/tree/master/data/covid19kg\n",
"url = 'https://github.com/CoronaWhy/bel4corona/raw/master/data/covid19kg/covid19-fraunhofer-grounded.bel.nodelink.json'\n",
"res = requests.get(url)\n",
"graph = pybel.from_nodelink(res.json())"
]
},
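The cell above fetches a node-link JSON export and hands it to `pybel.from_nodelink`. A stdlib-only sketch of that layout (a toy payload with invented node and edge attributes, standing in for the ~4k-statement file at the URL above):

```python
import json

# Toy node-link payload in the general shape pybel.from_nodelink consumes:
# top-level "nodes" and "links" arrays, with links referencing node ids.
# Field names here are illustrative, not the exact PyBEL schema.
payload = json.loads("""
{
  "directed": true,
  "multigraph": true,
  "graph": {"name": "covid19kg-toy"},
  "nodes": [
    {"id": 0, "function": "Protein", "name": "ACE2"},
    {"id": 1, "function": "Protein", "name": "TMPRSS2"}
  ],
  "links": [
    {"source": 0, "target": 1, "relation": "increases", "key": 0}
  ]
}
""")

# Basic sanity check before constructing the graph
print(len(payload["nodes"]), len(payload["links"]))
```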
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# Process the PyBEL graph into INDRA Statements\n",
"pybel_proc = bel.process_pybel_graph(graph)\n",
"\n",
"# Note that of the ~4k statements in the PyBEL graph, only 831 are successfully\n",
"# converted to INDRA statements in large part because of issues with namespaces\n",
"covid_stmts = pybel_proc.statements\n",
"stmt_hashes = [s.get_hash() for s in covid_stmts]\n",
"\n",
"# Use the INDRA Database REST API client to search for corresponding evidences\n",
"# for 100 statements at a time\n",
"db_stmts = []\n",
"for hash_batch in batch_iter(stmt_hashes, 100):\n",
" idrp = indra_db_rest.get_statements_by_hash(hash_batch, ev_limit=1000)\n",
" db_stmts.extend(idrp.statements)"
]
}
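The REST query above walks the statement hashes 100 at a time via `indra.util.batch_iter`. The batching pattern itself can be sketched with the stdlib alone; `batched` below is an illustrative stand-in, not INDRA's implementation.

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

stmt_hashes = list(range(250))  # stand-in for INDRA statement hashes
batches = list(batched(stmt_hashes, 100))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch, not the full hash list, is what gets passed to the API client, keeping individual requests bounded.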
],
"metadata": {