In short, OSDG-mapping builds an integrated ontology from the feature sets identified in previous research, and then matches the ontology items to the topics from Microsoft Academic.
Check out our paper on arXiv: OSDG -- Open-Source Approach to Classify Text Data by UN Sustainable Development Goals (SDGs).
OSDG takes relevant text features (such as ontology items, features from machine-learning models or extracted keywords) from the previous research, cleans them and merges them into a comprehensive, constantly-growing OSDG ontology. The ontology items are mapped to the ever-growing list of topics/Fields of Study in the Microsoft Academic Graph (MAG). By doing this, we:
- expand the ontology – acquire more key terms associated with the relevant MAG Topics, natively called Fields of Study (FOS);
- capture more nuanced relationships between individual terms and latent concepts.
OSDG-mapping integrates the existing research into a comprehensive approach, and does so in a way that avoids both the shortcomings of earlier individual approaches and the duplication of research efforts.
OSDG-mapping constructs the SDG-relevant FOS ontology, which is an important element of OSDG-tool.
Assigned labels from raw data sources are assembled in two steps:
- Assembling terms
  AssemblingTerms.py assembles terms from raw_data/0_add/ data sources.
  - Term label conflicts from sources 00_add_validated/ are ignored, meaning if term_1 is assigned to SDG_1 by source_1 and to SDG_2 by source_2 → term_1 is assigned to both.
  - Conflicts for term labels from 01_add_generated/ data sources are managed in two ways:
    - If the conflict is between validated and generated term label → generated term label is discarded while validated one remains.
    - If the conflict is between generated & generated → both are discarded.

  → produces InterimTerms.json
  { 'SDG_1': { 'term_1': ['source_1', 'source_2', ...], 'term_2': ['source_1', 'source_3', ...] ... } ... }
- Assembling OSDG Ontology
  AssemblingOntology.py assembles FOS from InterimTerms.json and 02_add_all_to_all/ data sources.
  - 2.1. Terms from InterimTerms.json are matched to MAG Fields of Study subset FOSMAP.json, which contains over 150 thousand fields.
  - 2.2. Matched FOS are added to the final ontology FOS-Ontology.json.
  - 2.3. 02_add_all_to_all/ FOS are added to the final ontology FOS-Ontology.json.
  - 2.4. Final ontology OSDG-Ontology.json is adjusted based on 1_replace/ and 2_remove/.

  → produces OSDG-Ontology.json
  { 'SDG_1': ['fos_id_1', 'fos_id_2', ...], 'SDG_2': ['fos_id_3', 'fos_id_4', ...] ... }
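
The two assembly steps above can be sketched in Python as follows. This is a minimal illustration, not the code of AssemblingTerms.py or AssemblingOntology.py. The function names, the (source, term, SDG) triple input format, the assumed FOSMAP layout (FOS id mapped to FOS name), and the exact lowercase name matching are all assumptions made for the example. A generated label that agrees with a validated one is treated here as non-conflicting, which the description does not spell out. Step 2.4 (adjustments from 1_replace/ and 2_remove/) is left out because the format of those inputs is not described.

```python
# Hypothetical sketch of the two assembly steps; names and formats are assumed.


def _add(interim, sdg, term, source):
    """Record that `source` assigns `term` to `sdg` (no duplicate sources)."""
    sources = interim.setdefault(sdg, {}).setdefault(term, [])
    if source not in sources:
        sources.append(source)


def assemble_terms(validated, generated):
    """Build {SDG: {term: [sources]}} from (source, term, sdg) triples."""
    interim = {}
    validated_labels = {}  # term -> set of SDGs given by validated sources

    # Validated sources: conflicts are ignored, so a term keeps every label.
    for source, term, sdg in validated:
        validated_labels.setdefault(term, set()).add(sdg)
        _add(interim, sdg, term, source)

    # Group generated labels per term so conflicts can be detected.
    generated_labels = {}  # term -> {sdg: [sources]}
    for source, term, sdg in generated:
        generated_labels.setdefault(term, {}).setdefault(sdg, []).append(source)

    for term, by_sdg in generated_labels.items():
        if term in validated_labels:
            # Validated vs generated: drop generated labels that disagree.
            for sdg, sources in by_sdg.items():
                if sdg in validated_labels[term]:
                    for source in sources:
                        _add(interim, sdg, term, source)
        elif len(by_sdg) == 1:
            # All generated sources agree on a single SDG.
            sdg, sources = next(iter(by_sdg.items()))
            for source in sources:
                _add(interim, sdg, term, source)
        # Otherwise generated vs generated conflict: discard every label.
    return interim


def assemble_ontology(interim_terms, fos_map, all_to_all=None):
    """Build {SDG: [fos_id, ...]} from interim terms and extra FOS ids.

    fos_map is assumed to look like {fos_id: fos_name}; terms are matched
    to FOS names by exact comparison after lowercasing and stripping.
    """
    name_to_id = {name.strip().lower(): fid for fid, name in fos_map.items()}
    ontology = {}
    for sdg, terms in interim_terms.items():
        matched = ontology.setdefault(sdg, set())
        for term in terms:
            fos_id = name_to_id.get(term.strip().lower())
            if fos_id is not None:
                matched.add(fos_id)
    # Add FOS that come directly from the all-to-all data sources.
    for sdg, fos_ids in (all_to_all or {}).items():
        ontology.setdefault(sdg, set()).update(fos_ids)
    return {sdg: sorted(ids) for sdg, ids in ontology.items()}
```

Keeping the source list per term, as InterimTerms.json does, makes it possible to trace each ontology entry back to the data sources that contributed it.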
The list of data sources used in the current version of the OSDG Tool is here. OSDG leverages the data from Microsoft Academic:
- Sinha, A., Shen, Z., Song, Y., Ma, H., Eide, D., Hsu, B.-J. & Wang, K. (2015). An Overview of Microsoft Academic Service (MAS) and Applications. Proceedings of the 24th International Conference on World Wide Web (pp. 243--246), Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee. ISBN: 978-1-4503-3473-0. doi:10.1145/2740908.27428398.
- Wang, K., Shen, Z., Huang, C., Wu, C., Eide, D., Dong, Y., Qian, J., Kanakia, A., Chen, A.C., & Rogahn, R. (2019). A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data, 2. doi:10.3389/FDATA.2019.00045