
Annotation Data

Huan He edited this page Jul 18, 2022 · 6 revisions

To create the initial annotation data files (.xml), you can use the Converter tab in MedTator.

We adopt the annotation file format used by MAE (Multi-purpose Annotation Environment; Stubbs, 2011; Rim, 2016) to save annotations. Annotations are saved in an XML file whose structure follows the definitions in the schema file. The basic structure of the annotation XML file is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<TASK_NAME>
  <TEXT></TEXT>
  <TAGS></TAGS>
  <META></META>
</TASK_NAME>
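As a rough sketch, the skeleton above can be produced with Python's standard `xml.etree.ElementTree` module. The helper `make_skeleton` is hypothetical and not part of MedTator. Its output differs from MedTator's in one visible way: ElementTree escapes special characters in TEXT (for example, `&` becomes `&amp;`) instead of wrapping the text in a CDATA section, although both forms parse back to the same string.

```python
import xml.etree.ElementTree as ET


def make_skeleton(task_name: str, raw_text: str) -> str:
    """Return an annotation document that has no tags yet.

    Hypothetical helper for illustration; this page recommends MedTator's
    Converter tab for creating real annotation files.
    """
    root = ET.Element(task_name)  # root element is named after the task
    text_el = ET.SubElement(root, "TEXT")
    # ElementTree escapes &, < and > here instead of writing a CDATA section.
    text_el.text = raw_text
    ET.SubElement(root, "TAGS")  # filled in as tags are annotated
    ET.SubElement(root, "META")  # optional, usually empty
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8" ?>\n' + body


if __name__ == "__main__":
    print(make_skeleton("COVID_VAX_AE", "Patient reported pain & swelling."))
```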

The root element of the annotation XML file is named after the annotation task. It contains three child elements: TEXT, TAGS, and META.

  • The TEXT element contains the raw text of the document being annotated, taken from the original .txt file.
  • The TAGS element contains all annotated tags with their attribute values (e.g., id, spans, text). The child element names are defined by the schema: each concept name is used as an element name in the XML file.
  • The META element contains metadata about the annotation file, such as labels and other information. This element is optional and, in most cases, can be left empty.
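The three parts described in this list can be read back with a short sketch like the one below. It assumes the whole file fits in memory and uses only the standard library; `read_annotation` is a hypothetical helper, not a MedTator API. Because META is optional, a missing META element simply yields an empty list.

```python
import xml.etree.ElementTree as ET
from typing import Any


def read_annotation(xml_text: str) -> dict[str, Any]:
    """Split an annotation document into task name, raw text, tags, and meta.

    Hypothetical helper for illustration; it is not part of MedTator.
    """
    # Encode first so the UTF-8 declaration matches the bytes being parsed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    text_el = root.find("TEXT")
    tags_el = root.find("TAGS")
    meta_el = root.find("META")  # optional; usually empty
    return {
        # The root element is named after the annotation task.
        "task": root.tag,
        # A CDATA section comes back as plain text.
        "text": (text_el.text or "") if text_el is not None else "",
        # One (concept name, attributes) pair per annotated tag.
        "tags": [(t.tag, dict(t.attrib)) for t in tags_el] if tags_el is not None else [],
        "meta": [(m.tag, dict(m.attrib)) for m in meta_el] if meta_el is not None else [],
    }
```

To read a saved file, pass its contents in, e.g. `read_annotation(open(path, encoding="utf-8").read())`.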

For example, using the sample schema file, we annotate the text file pain_10.txt, which contains the following text:

A spontaneous report was received from a consumer concerning a 78 years old male patient, who received Moderna's COVID-19 vaccine (mRNA-1273) and experienced terrible pain on the left side of his upper body, it hurt so much, blood clot in his left and right lung and blood clots in right groin.

With this text file and the sample schema, we annotate three tags: an entity tag for the AE concept, an entity tag for the SVRT concept, and a link tag for the LK_AE_SVRT concept:

Sample Annotation

When the user saves the annotation, MedTator creates an annotation XML file like the following:

<?xml version="1.0" encoding="UTF-8" ?>
<COVID_VAX_AE>
<TEXT><![CDATA[A spontaneous report was received from a consumer concerning a 78 years old male patient, who received Moderna's COVID-19 vaccine (mRNA-1273) and experienced terrible pain on the left side of his upper body, it hurt so much, blood clot in his left and right lung and blood clots in right groin.]]></TEXT>
<TAGS>
<SVRT spans="158~166" text="terrible" id="S0" severity="NA" comment=""/>
<AE spans="167~171" text="pain" id="A0" certainty="positive" comment=""/>
<LK_AE_SVRT id="L0" link_AEID="A0" link_AEText="pain" link_SVRTID="S0" link_SVRTText="terrible" comment=""/>
</TAGS>
<META/>
</COVID_VAX_AE>
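In this sample, `spans` appears to hold zero-based character offsets into the TEXT content, written as `start~end` with an exclusive end: `text[158:166]` is "terrible" and `text[167:171]` is "pain". This reading is inferred from the sample rather than stated on this page. The sketch below uses it to check that each entity tag's `spans` reproduces its `text` attribute. The helpers `parse_spans` and `check_spans` are hypothetical, and the comma separator for multiple ranges is an assumption; tags with more than one range are skipped.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<COVID_VAX_AE>
<TEXT><![CDATA[A spontaneous report was received from a consumer concerning a 78 years old male patient, who received Moderna's COVID-19 vaccine (mRNA-1273) and experienced terrible pain on the left side of his upper body, it hurt so much, blood clot in his left and right lung and blood clots in right groin.]]></TEXT>
<TAGS>
<SVRT spans="158~166" text="terrible" id="S0" severity="NA" comment=""/>
<AE spans="167~171" text="pain" id="A0" certainty="positive" comment=""/>
<LK_AE_SVRT id="L0" link_AEID="A0" link_AEText="pain" link_SVRTID="S0" link_SVRTText="terrible" comment=""/>
</TAGS>
<META/>
</COVID_VAX_AE>
"""


def parse_spans(spans: str) -> list[tuple[int, int]]:
    """Turn "158~166" into [(158, 166)] (hypothetical helper).

    Assumption: multiple ranges, if present, are separated by commas.
    """
    if not spans:
        return []
    ranges = []
    for part in spans.split(","):
        start, end = part.split("~")
        ranges.append((int(start), int(end)))
    return ranges


def check_spans(xml_text: str) -> list[str]:
    """Return ids of single-range tags whose span does not match their text."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    text = root.findtext("TEXT", default="")
    tags_el = root.find("TAGS")
    if tags_el is None:
        return []
    mismatched = []
    for tag in tags_el:
        ranges = parse_spans(tag.get("spans", ""))
        # Link tags (no spans) and multi-range tags are not checked here.
        if len(ranges) != 1:
            continue
        start, end = ranges[0]  # end offset is exclusive
        if text[start:end] != tag.get("text"):
            mismatched.append(tag.get("id"))
    return mismatched


if __name__ == "__main__":
    print(check_spans(SAMPLE))  # prints []
```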

As shown in this sample, the content of the .txt file is saved in the TEXT element. The three annotated tags are saved as three elements: <AE>, <SVRT>, and <LK_AE_SVRT>. The attributes of each tag, such as id, spans, text, and severity, are saved as XML attributes of the corresponding element.
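The link tag refers to the entity tags by their `id` values (`link_AEID="A0"`, `link_SVRTID="S0"`). The sketch below resolves those references. It relies on a heuristic drawn only from this sample: any attribute whose name ends in `ID` (but not the tag's own `id`) and whose value matches an existing tag id is treated as a reference. Real schemas may name link attributes differently, and `resolve_links` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<COVID_VAX_AE>
<TEXT><![CDATA[A spontaneous report was received from a consumer concerning a 78 years old male patient, who received Moderna's COVID-19 vaccine (mRNA-1273) and experienced terrible pain on the left side of his upper body, it hurt so much, blood clot in his left and right lung and blood clots in right groin.]]></TEXT>
<TAGS>
<SVRT spans="158~166" text="terrible" id="S0" severity="NA" comment=""/>
<AE spans="167~171" text="pain" id="A0" certainty="positive" comment=""/>
<LK_AE_SVRT id="L0" link_AEID="A0" link_AEText="pain" link_SVRTID="S0" link_SVRTText="terrible" comment=""/>
</TAGS>
<META/>
</COVID_VAX_AE>
"""


def resolve_links(xml_text: str) -> list[dict[str, str]]:
    """Map each link tag's *ID attributes to "CONCEPT:text" of the target tag."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    tags_el = root.find("TAGS")
    tags = list(tags_el) if tags_el is not None else []
    by_id = {t.get("id"): t for t in tags}
    links = []
    for tag in tags:
        # Heuristic (assumption): attribute names ending in "ID" hold tag ids.
        # The tag's own "id" is lowercase, so endswith("ID") skips it.
        refs = {k: v for k, v in tag.attrib.items() if k.endswith("ID") and v in by_id}
        if not refs:
            continue  # entity tags have no references
        resolved = {"id": tag.get("id"), "type": tag.tag}
        for attr, target_id in refs.items():
            target = by_id[target_id]
            resolved[attr] = f"{target.tag}:{target.get('text')}"
        links.append(resolved)
    return links


if __name__ == "__main__":
    for link in resolve_links(SAMPLE):
        print(link)
```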
