
Conversion of IPCC documents into semantic form


AmbrineH/semanticClimate

 
 


semanticClimate


Goals

  • to convert the IPCC documents from PDF into (a) HTML and (b) XML
  • to extract terms and explore their use and meaning
  • to link terms to Wikidata and create AMI-dictionaries
  • to create new structures for navigation, search and display
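As a rough illustration of the term-extraction and dictionary goals, the sketch below counts whole-word, case-insensitive occurrences of a seed list of terms in a chapter's text and writes the terms that occur into an AMI-style dictionary. The function names `count_terms` and `build_dictionary` are hypothetical, and the XML element and attribute names (`dictionary`/`title`, `entry`/`term`/`name`/`wikidataID`) are an assumption modeled on AMI dictionaries, not a validated schema. The `wikidataID` attributes are left empty because linking to Wikidata would be a separate step.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def count_terms(text: str, terms: list[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each seed term in text."""
    counts = Counter()
    for term in terms:
        # \b anchors stop "gas" matching inside "gases"; re.escape protects punctuation in terms
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def build_dictionary(title: str, counts: Counter, min_count: int = 1) -> ET.Element:
    """Build an AMI-style <dictionary> with one <entry> per term seen at least min_count times.

    ASSUMPTION: element/attribute names are modeled on AMI dictionaries, not checked against a schema.
    """
    root = ET.Element("dictionary", title=title)
    for term, n in sorted(counts.items()):  # alphabetical order for stable output
        if n >= min_count:
            # wikidataID left empty: Wikidata linking is a separate, later step
            ET.SubElement(root, "entry", term=term, name=term, wikidataID="")
    return root
```

In use, the `text` argument would be the contents of a chapter's `fulltext.txt` (see Strategy), and the result can be serialized with `ET.tostring(root, encoding="unicode")`. Counting against a hand-picked seed list keeps the sketch simple; real term discovery would need something richer than exact matching.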

Content

We will start with AR6 WGIII (the Working Group III contribution to the IPCC Sixth Assessment Report), then move on to the other WGs and perhaps look back at earlier reports as well.

Strategy

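Download the report components (e.g. one PDF per chapter), store them under a hierarchical naming scheme (e.g. ipcc/ar6/wg3/Chapter01/), and convert each one to text with pdf2txt.py (from pdfminer.six):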

semanticClimate pm286$ cd ipcc/ar6/wg3/
$ ls
Chapter01.pdf
$ mkdir Chapter01
$ cp Chapter01.pdf Chapter01/fulltext.pdf
$ cd Chapter01
$ pdf2txt.py -o fulltext.txt fulltext.pdf 
$ ls
fulltext.pdf	fulltext.txt
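The manual steps above can be scripted for every chapter at once. This is a minimal sketch, assuming the chapter PDFs sit in a single working-group directory and are named `ChapterNN.pdf` as in the example; the function names are hypothetical. The conversion calls `extract_text` from pdfminer.six, the library that provides `pdf2txt.py`, so its output should be close to, but is not guaranteed identical to, the command-line tool's.

```python
import shutil
from pathlib import Path


def organise_chapters(wg_dir: Path) -> list[Path]:
    """For each ChapterNN.pdf in wg_dir, create ChapterNN/ and copy the PDF to ChapterNN/fulltext.pdf.

    Mirrors the manual mkdir/cp steps and returns the per-chapter fulltext.pdf paths.
    """
    created = []
    for pdf in sorted(wg_dir.glob("Chapter*.pdf")):
        chapter_dir = wg_dir / pdf.stem      # e.g. ipcc/ar6/wg3/Chapter01
        chapter_dir.mkdir(exist_ok=True)     # safe to re-run
        target = chapter_dir / "fulltext.pdf"
        shutil.copyfile(pdf, target)         # copy, not move: keeps ChapterNN.pdf, like `cp`
        created.append(target)
    return created


def convert_to_text(pdf_path: Path) -> Path:
    """Write fulltext.txt next to fulltext.pdf using pdfminer.six."""
    # Imported here so the layout step works without pdfminer.six installed
    from pdfminer.high_level import extract_text

    txt_path = pdf_path.with_name("fulltext.txt")
    txt_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
    return txt_path
```

For example, `for pdf in organise_chapters(Path("ipcc/ar6/wg3")): convert_to_text(pdf)` would reproduce the transcript for each chapter, run from the repository root. Keeping the layout step separate from the conversion step makes it easy to swap in a different PDF converter later (e.g. one that emits HTML or XML).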


Languages

  • Python 100.0%