Causal Relation Extraction and Identification using Conditional Random Fields. This project was carried out under the guidance of our faculty member, Mr. Tirthankar Dasgupta.
Link to the project presentation.
A causal relation is a relation between two events: a cause and an effect. The cause produces the effect, and the effect is the result of the cause.
Ex. “Hunger is the most common cause of crying in a young baby.” Here the cause is “hunger” and the effect is “crying”.
This work focuses on detecting and extracting causal relations from medical-domain text.
From the point of view of detecting Causal Relations, the following distinctions may be useful:
• Marked or unmarked: a causation is marked if there is a specific linguistic unit that signals the relation; unmarked otherwise. “I bought it because I read a good review” is marked; “Be careful. It’s unstable” isn’t.
• Ambiguity: if the marker always signals a causation, it is unambiguous (e.g. “because”). If it only sometimes signals a causation, it is ambiguous (e.g. “since”).
• Explicit or implicit: a causation is explicit if both arguments are present; implicit if one or both are missing. “She was thrown out of the hotel after she had run naked through its halls.” is explicit; “John killed Bob.” is implicit, since the effect, Bob’s death, is not explicitly stated.
We focus on marked and explicit causations.
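The marked/unmarked and ambiguous/unambiguous distinctions can be illustrated with a tiny marker lookup. This is only a sketch and not part of the project pipeline: the `CAUSAL_MARKERS` lexicon and the `find_markers` helper are hypothetical names, and the lexicon holds just the two markers mentioned above.

```python
import re

# Hypothetical lexicon containing only the two markers named in the text.
# The value is True when the marker is ambiguous (it does not always
# signal a causation).
CAUSAL_MARKERS = {"because": False, "since": True}


def find_markers(sentence):
    """Return (marker, is_ambiguous) pairs for every lexicon word found."""
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    return [(tok, CAUSAL_MARKERS[tok]) for tok in tokens if tok in CAUSAL_MARKERS]


# A marked sentence yields at least one marker; an unmarked one yields none.
print(find_markers("I bought it because I read a good review"))
print(find_markers("Be careful. It's unstable"))
```

A lookup like this can only flag candidate sentences. Deciding whether an ambiguous marker such as “since” is actually causal still needs context, which is what the sequence model is for.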
1. Data Preprocessing
2. Feature Selection and Extraction
3. Training Model
4. Testing Model Prediction Accuracy
- Extracting unique words
- POS tagging and term labelling (CC: cause, EE: effect, O: null, RR: relation, i.e. the causal link word)
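As a rough illustration of what one preprocessed sentence might look like, the sketch below stores the example sentence as (token, POS tag, label) triples and derives the unique-word list from it. The POS tags and the CC/EE/RR/O labels are hand-assigned here for illustration (in particular, treating “cause” as the RR link word is an assumption); the project's actual tagger output and annotation may differ.

```python
# One sentence as (token, POS tag, label) triples.
# Tags and labels are hand-assigned for illustration only.
sentence = [
    ("Hunger", "NN", "CC"),   # cause
    ("is", "VBZ", "O"),
    ("the", "DT", "O"),
    ("most", "RBS", "O"),
    ("common", "JJ", "O"),
    ("cause", "NN", "RR"),    # causal link word (assumed)
    ("of", "IN", "O"),
    ("crying", "VBG", "EE"),  # effect
    ("in", "IN", "O"),
    ("a", "DT", "O"),
    ("young", "JJ", "O"),
    ("baby", "NN", "O"),
    (".", ".", "O"),
]

tokens = [tok for tok, _, _ in sentence]
labels = [lab for _, _, lab in sentence]

# Unique words: a lowercased, sorted vocabulary of the sentence.
vocabulary = sorted({tok.lower() for tok in tokens})

print(list(zip(tokens, labels)))
print(vocabulary)
```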
- Word Case (upper/lower)
- Word POS
- Word title
- Type (Alphanumeric/Character)
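The four features above could be computed per token roughly as follows. This is a minimal sketch, assuming each sentence is a list of (token, POS tag) pairs; the function names `word2features` and `sent2features` and the feature keys are illustrative choices, not necessarily those used in the project.

```python
def word2features(sent, i):
    """Build a feature dict for the i-th (token, POS) pair of a sentence."""
    word, pos = sent[i]

    # Type feature: purely alphabetic, alphanumeric, or anything else.
    if word.isalpha():
        word_type = "alpha"
    elif word.isalnum():
        word_type = "alnum"
    else:
        word_type = "other"

    return {
        "is_upper": word.isupper(),  # word case: all upper
        "is_lower": word.islower(),  # word case: all lower
        "is_title": word.istitle(),  # word title: first letter capitalised
        "pos": pos,                  # word POS tag
        "type": word_type,           # alphanumeric / character
    }


def sent2features(sent):
    """Feature dicts for every token in the sentence."""
    return [word2features(sent, i) for i in range(len(sent))]


example = [("Hunger", "NN"), ("is", "VBZ"), ("B12", "NN"), (".", ".")]
for token, feats in zip(example, sent2features(example)):
    print(token[0], feats)
```

Each sentence becomes a list of dicts, which is the input shape sklearn-crfsuite expects for training.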
The statistical model is a CRF (Conditional Random Field) from the sklearn-crfsuite library. We trained the model on our preprocessed training dataset.
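A training step with sklearn-crfsuite might look like the sketch below. The hyperparameter values in `CRF_PARAMS` and the `train_crf` helper are assumptions chosen for illustration, not the settings used in this project. `X_train` is expected to be a list of sentences, each a list of feature dicts, and `y_train` the matching lists of CC/EE/RR/O labels.

```python
# Illustrative hyperparameters (assumed values, not the project's settings).
CRF_PARAMS = {
    "algorithm": "lbfgs",             # gradient-based training
    "c1": 0.1,                        # L1 regularisation strength
    "c2": 0.1,                        # L2 regularisation strength
    "max_iterations": 100,
    "all_possible_transitions": True,
}


def train_crf(X_train, y_train):
    """Fit a CRF on feature-dict sequences and their label sequences."""
    # Imported here so this module can be loaded even when the library
    # is not installed; requires `pip install sklearn-crfsuite`.
    import sklearn_crfsuite

    crf = sklearn_crfsuite.CRF(**CRF_PARAMS)
    crf.fit(X_train, y_train)
    return crf
```

After training, `crf.predict(X_test)` returns one predicted label sequence per test sentence.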
The model was evaluated on the test data in terms of precision, recall, and F1 score.
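For reference, token-level precision, recall, and F1 for a single label can be computed as in the sketch below. The `per_label_scores` helper is a hypothetical name, and the toy sequences are made up to show the arithmetic; they are not results from this project.

```python
def per_label_scores(y_true, y_pred, label):
    """Token-level precision, recall, and F1 for one label."""
    tp = fp = fn = 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        for t, p in zip(true_seq, pred_seq):
            if p == label and t == label:
                tp += 1   # predicted the label and it was correct
            elif p == label:
                fp += 1   # predicted the label but it was wrong
            elif t == label:
                fn += 1   # missed a true occurrence of the label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (made up): one sentence with five tokens.
y_true = [["CC", "O", "RR", "O", "EE"]]
y_pred = [["CC", "CC", "RR", "O", "O"]]

for label in ("CC", "EE", "RR"):
    print(label, per_label_scores(y_true, y_pred, label))
```

In the toy data, CC is predicted twice but is correct once, so its precision is 0.5 while its recall is 1.0; EE is never predicted, so all three of its scores are 0.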
To get more accurate results, deep neural sequence models such as bidirectional LSTMs could be used.
These models can reach high accuracy because of their deep feature extraction capabilities. Their main disadvantage is that they (LSTMs) require a very large amount of training data.
• University Of New Zealand
• Wikipedia
• Automatic Extraction of Causal Relations from Text using Linguistically Informed Deep Neural Networks
Special Thanks to Shivendra Pratap Singh for all his efforts and contributions.