
SBGN Creation and Validation

lknegendorf edited this page Jul 3, 2021 · 7 revisions

Web-search

We searched the web for an available, reusable SBGN map representing the Bachmann model. First, we consulted the model chosen by group 2 and the model belonging to the original publication on the BioModels database for SBGN maps or SBGN-ML files.

Since neither model supplied an SBGN map or an SBGN-ML file, we consulted the SBGN website to find databases and archives of SBGN maps.

Screenshot SBGN databases and collections

Figure 1: Screenshot list of SBGN databases and collections

All of the databases and collections listed on the SBGN website featured the chosen Process Description language. As we were interested in modifiable maps, we only queried databases that supported SBGN-ML export. The AsthmaMap, Metabolism Regulation Maps, MetaCrop and Rheumatoid Arthritis Map were omitted, since the original publication did not mention any connection to the focus of these databases and collections.

In the Atlas of Cancer Signalling Networks we checked the Cell Survival Map for the involved proteins JAK2, STAT5, EpoR, SOCS3 and CIS, but could not find any of these entities on the provided map.

The pathway section of PANTHER Pathway was searched for the terms Bachmann, JAK2, STAT5, EpoR, SOCS3 and CIS. This yielded only more general representations or other aspects of the JAK-STAT pathway, none of which included the dual feedback of interest.

The Reactome database yielded similar results for the search terms Bachmann, JAK2, STAT5, EpoR, SOCS3 and CIS. The search results included some maps highlighting other aspects of the JAK-STAT pathway, e.g. erythropoietin activating STAT5.

Next, we searched PathWhiz with the aforementioned keywords and once again found only maps featuring other aspects, such as the EPO Signaling Pathway.

Last, we queried Pathway Commons with the following terms: Bachmann, JAK2, STAT5, EpoR, SOCS3 and CIS. Digging through the various pathways, we could identify neither the map of the Bachmann model itself nor a satisfactory map to serve as a starting point for adding the key message of the Bachmann model. For example, the map 'EPO signaling pathway' featured important compartments and phosphorylation stages, but failed to convey the modulation of reactions and was rather confusing.
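For maps that can be downloaded as SBGN-ML, the manual check for the proteins of interest could in principle be scripted. The following is only a minimal sketch under assumptions, not the procedure we used: the function name `find_terms` and the sample map are hypothetical, and matching is a plain case-insensitive substring test on glyph labels, so short terms such as CIS may produce false positives on real maps.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature SBGN-ML map used only for illustration.
SAMPLE_MAP = """<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph id="g1" class="macromolecule"><label text="EPO"/></glyph>
    <glyph id="g2" class="macromolecule"><label text="EpoR"/></glyph>
    <glyph id="g3" class="macromolecule"><label text="pSTAT5"/></glyph>
  </map>
</sbgn>"""


def find_terms(xml_text, terms):
    """Return {term: [matching labels]} for every term found in a label."""
    root = ET.fromstring(xml_text)
    # Collect the text of all <label> elements, ignoring the XML namespace.
    labels = {
        elem.get("text", "")
        for elem in root.iter()
        if elem.tag.rsplit("}", 1)[-1] == "label"
    }
    hits = {}
    for term in terms:
        matches = sorted(l for l in labels if term.lower() in l.lower())
        if matches:
            hits[term] = matches
    return hits


hits = find_terms(SAMPLE_MAP, ["JAK2", "STAT5", "EpoR", "SOCS3", "CIS"])
print(hits)
```

Terms missing from the returned dictionary were not found in any label, which corresponds to the negative results reported above.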

Since the search did not provide any useful results, we tried to re-use the SBML representation of the Bachmann model researched by group 2 by importing it into several SBGN editors.

Screenshot Newt Editor Import

Figure 2: Screenshot SBML Import in Newt Editor

As shown above, the import into Newt Editor yielded a complex and confusing network. Among other drawbacks, there was no distinction between the different types of modulation. The nuclear compartment was not integrated into the cytoplasm, and the source for production and the sink for degradation were not associated with either of the compartments.

Next, we tried the import into VANTED using the SBGN-ED add-on, which produced an even worse starting point. As shown below, some of the problems were that certain macromolecules were displayed as unspecified entities and that process nodes were displayed as rounded rectangles. Correcting or specifying each node and each arc seemed error-prone.

Screenshot SBGN-ED Import

Figure 3: Screenshot SBML Import in VANTED Editor using SBGN-ED add-on

Since this step did not yield any satisfying results, the SBGN network was developed from scratch.

Drafting the SBGN network

The web tool Newt was used to develop the SBGN network representing the Bachmann model [1, 2]. To facilitate the process and to improve readability and reusability, the 10 tips published by Touré et al. were used as a guideline [3].

To create the first draft of the Bachmann model as an SBGN network, all necessary biological components were identified and added to the map. Then, all important reactions were represented by adding appropriate arcs.

While creating the diagram we encountered the problem that membrane-spanning macromolecules can only be assigned to one cell compartment, whereas their parts are actually located in different compartments. The current Level 1 specification of the Process Description language does not provide a solution and postpones this issue to a future specification level. However, the Level 1 specification describes three workarounds, each of which comes with a trade-off [4]. We decided to assign the receptor complexes to the cytoplasm compartment. Unfortunately, Newt did not allow us to place them on the compartment boundary.

Validation of SBGN and SBGN-ML

To check the validity and integrity of the drafted SBGN map, we first used the semantic validation feature of Newt, which is based on the LibSBGN JavaScript library [1, 5]. The validation feature declared the draft map valid. We exported the SBGN map in SBGN-ML 0.2 format. Remarkably, although it declared the SBGN map valid, Newt was not able to export the map in CellDesigner format and returned the exception that the conversion service was not available.

To ensure the interoperability of the exported SBGN-ML file, we aimed to import the file into three different tools supporting SBGN-ML (compare section on tools). The import of the Newt-exported SBGN-ML into VANTED with the SBGN-ED plugin failed on the first attempt, reporting that the file was not a valid SBGN file. The exception (cvc-datatype-valid.1.2.1) stated that a given string was not a valid value for 'NCName'. We found that the string in question was a glyph ID automatically assigned by Newt. We therefore hypothesized that the two platforms differ in how strictly they apply XML standards. As a workaround, we renamed the corresponding glyph ID in the SBGN-ML code to 'manualID1' using Notepad++ (compare Toolbox). Iteratively, we renamed 20 glyph IDs manually to 'manualIDn', after which the import into VANTED succeeded. The 20 glyph IDs that triggered the exception had in common that their first character was a digit, whereas our replacement IDs started with a letter. This is consistent with the XML definition of NCName, which does not allow a name to begin with a digit.
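The manual renaming described above could also be scripted. The sketch below is not the procedure we used (we edited the file by hand in Notepad++); it is a minimal illustration under stated assumptions: the function name `fix_ids`, the sample IDs and the list of reference attributes (`source`, `target`, `compartmentRef`) are our own choices, the NCName check is simplified to ASCII characters, and the script does not check whether a generated 'manualIDn' collides with an ID already present in the file.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only NCName check: a letter or underscore first,
# then letters, digits, '.', '-' or '_'.
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")

# Attributes assumed to carry an ID or a reference to one (assumption).
ID_ATTRS = ("id", "source", "target", "compartmentRef")

# Hypothetical miniature export; the IDs are invented for illustration.
SAMPLE = """<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph id="3f2a" class="macromolecule"><label text="JAK2"/></glyph>
    <glyph id="glyph7" class="process"/>
    <arc id="9b1c" class="consumption" source="3f2a" target="glyph7"/>
  </map>
</sbgn>"""


def fix_ids(root, prefix="manualID"):
    """Rename invalid IDs in place and return the old -> new mapping."""
    mapping = {}
    # First pass: collect every id that is not a valid NCName.
    for elem in root.iter():
        old = elem.get("id")
        if old is not None and not NCNAME.match(old) and old not in mapping:
            mapping[old] = f"{prefix}{len(mapping) + 1}"
    # Second pass: rewrite the ids and every attribute that refers to them.
    for elem in root.iter():
        for attr in ID_ATTRS:
            value = elem.get(attr)
            if value in mapping:
                elem.set(attr, mapping[value])
    return mapping


root = ET.fromstring(SAMPLE)
mapping = fix_ids(root)
print(mapping)
```

Unlike a plain search-and-replace on the glyph ID alone, this also updates arcs that point to a renamed glyph, which is what keeps the map connected after the change. To write the result back with the default namespace preserved, `ET.register_namespace("", "http://sbgn.org/libsbgn/0.2")` can be called before serializing.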

Next, the modified SBGN-ML was re-imported into Newt and the map was visually checked for correctness. As the SBGN map was correctly reproduced despite the manual changes to some glyph IDs, we exported the map as Scalable Vector Graphics and Portable Network Graphics files. Afterwards we imported the manually changed SBGN-ML into VANTED/SBGN-ED, Krayon for SBGN and SBGNViz. Apart from some minor errors that did not adversely affect the biologically correct representation, our SBGN map was reproducible and editable in three different tools using the SBGN-ML file created.

We noticed differences in the SBGN-ML code depending on the tool from which the SBGN-ML file was exported. In particular, Krayon uses a specific notation with format extensions. A more precise comparison of tool-specific characteristics of SBGN-ML would be valuable for the community, but was clearly beyond the scope of this work.
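A first, coarse step towards such a comparison could be to count the glyph and arc classes in each export and list where the counts differ. The following sketch is only an illustration under assumptions: the function names and both sample exports are hypothetical, XML namespaces are deliberately ignored so that files written with different SBGN-ML versions can be compared, and tool-specific extension elements are not inspected at all.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Two hypothetical exports of the "same" map from different tools.
EXPORT_A = """<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph id="a1" class="macromolecule"/>
    <glyph id="a2" class="macromolecule"/>
    <glyph id="a3" class="process"/>
    <arc id="a4" class="consumption" source="a1" target="a3"/>
  </map>
</sbgn>"""

EXPORT_B = """<sbgn xmlns="http://sbgn.org/libsbgn/0.3">
  <map language="process description">
    <glyph id="b1" class="macromolecule"/>
    <glyph id="b2" class="unspecified entity"/>
    <glyph id="b3" class="process"/>
    <arc id="b4" class="consumption" source="b1" target="b3"/>
  </map>
</sbgn>"""


def class_counts(xml_text):
    """Count (element name, class) pairs for all glyphs and arcs."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        name = elem.tag.rsplit("}", 1)[-1]  # strip the namespace part
        if name in ("glyph", "arc"):
            counts[(name, elem.get("class"))] += 1
    return counts


def count_differences(a, b):
    """Return {key: (count in a, count in b)} wherever the counts differ."""
    return {k: (a[k], b[k]) for k in a.keys() | b.keys() if a[k] != b[k]}


diff = count_differences(class_counts(EXPORT_A), class_counts(EXPORT_B))
print(diff)
```

An empty result only means that both files contain the same number of glyphs and arcs per class; differences in layout, IDs or extensions would need a more detailed comparison.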

Beautify

Although the created SBGN network had successfully passed validation with various tools (cf. above), its visual appeal was still limited.

BachmannPD_newt

Figure 4: SBGN map in Newt Editor after validation

We decided to keep the comprehensive structure of the SBGN map without further reduction of biological components or reactions, to enable readers to retrace the complex Ordinary Differential Equation (ODE) model. To make the SBGN map visually appealing and to improve readability, we manually enhanced it in the following steps:

  1. Removal of ports
  2. Rearrangement of proteins and compartments, creation of submaps where meaningful
  3. Horizontal and/or vertical alignment of entity pool nodes, process nodes and logical operators
  4. Adaptation of label sizes
  5. Redirection of connecting arcs

As pointed out by Touré et al. [3], the network design should be in line with the message and the scientific question it aims to communicate. After agreeing on this with the full team, we decided to use color to highlight the roles of CIS and SOCS3, the two transcriptional negative feedback regulators of the suppressor of cytokine signaling (SOCS) family.

BachmannPD_beautified

Figure 5: SBGN map beautified in Newt Editor

References

[1] Balci, H. et al. Newt: a comprehensive web-based tool for viewing, constructing and analyzing biological maps. Bioinformatics 37, 1475–1477 (2021). https://doi.org/10.1093/bioinformatics/btaa850

[2] Sari, M. et al. SBGNViz: A Tool for Visualization and Complexity Management of SBGN Process Description Maps. PLoS ONE 10, e0128985 (2015). https://doi.org/10.1371/journal.pone.0128985

[3] Touré, V., Le Novère, N., Waltemath, D. & Wolkenhauer, O. Quick tips for creating effective and impactful biological pathways using the Systems Biology Graphical Notation. PLoS Comput Biol 14, e1005740 (2018). https://doi.org/10.1371/journal.pcbi.1005740.g001

[4] Rougny, A. et al. Systems Biology Graphical Notation: Process Description language Level 1 Version 2.0. Journal of Integrative Bioinformatics 16, (2019). https://dx.doi.org/10.1515/jib-2019-0022

[5] van Iersel, M. P., Villéger, A. C., Czauderna, T. et al. Software support for SBGN maps: SBGN-ML and LibSBGN. Bioinformatics 28, 2016–2021 (2012). https://doi.org/10.1093/bioinformatics/bts270
