Rapidly transform vast amounts of TEI XML files using the power of Saxon and multiprocessing

Pantagrueliste/multi-saxon

multi-saxon

multi-saxon quickly converts large numbers of TEI XML files into plain text. It uses Saxonica's SaxonC-HE processor to run XSLT 2.0 and 3.0 transformations in parallel. This works around a limitation of lxml, which is fast but supports only XSLT 1.0.
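The parallel fan-out described above can be sketched roughly as follows. This is a minimal illustration, not multi-saxon's actual code: the function names `tei_to_text` and `convert_all`, the `executor_cls` parameter, and the `tei`/`txt` folder names are all hypothetical, and the per-file step is a standard-library stand-in (it simply concatenates text nodes) where the real tool would run a Saxon XSLT transformation.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
import xml.etree.ElementTree as ET


def tei_to_text(xml_path: str) -> tuple[str, str]:
    """Stand-in for the per-file XSLT step: return (file name, plain text)."""
    root = ET.parse(xml_path).getroot()
    # Join every text node, then collapse runs of whitespace.
    text = " ".join("".join(root.itertext()).split())
    return Path(xml_path).name, text


def convert_all(xml_paths, out_dir, max_workers=None,
                executor_cls=ProcessPoolExecutor):
    """Convert each input file in a worker and write one .txt per input."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs.
        for name, text in pool.map(tei_to_text, [str(p) for p in xml_paths]):
            target = out / (Path(name).stem + ".txt")
            target.write_text(text, encoding="utf-8")
            written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical folder names; the guard is needed on platforms
    # that start worker processes with "spawn".
    files = sorted(Path("tei").glob("*.xml"))
    if files:
        print(convert_all(files, "txt"))
```

Each file is independent of the others, so a process pool sidesteps the GIL and scales with the number of cores. Swapping the body of `tei_to_text` for a Saxon call would keep the rest of the structure unchanged.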

Features

  • Fast Transformations: Use the multiprocessing capabilities of your machine to transform many XML files simultaneously.
  • Saxon Integration: Process XML files with the Saxon processor (SaxonC-HE).
  • CSV Output: Generate CSV reports containing metadata about the processed TEI XML files.
  • Limited Logging Capabilities
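As an illustration of the CSV output feature, the sketch below reads a few fields from each TEI header and writes them to a CSV report. It is an assumption-laden example rather than the tool's own implementation: the function names `header_metadata` and `write_report` and the column set (`file`, `title`, `author`) are hypothetical, and the README does not say which metadata multi-saxon actually extracts.

```python
import csv
import xml.etree.ElementTree as ET

# TEI P5 namespace, bound to a prefix for the path expressions below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def header_metadata(xml_path):
    """Return a dict of (assumed) header fields for one TEI file."""
    root = ET.parse(xml_path).getroot()

    def first(path):
        # Text of the first matching element, whitespace-normalized,
        # or an empty string when the element is absent.
        node = root.find(path, TEI_NS)
        if node is None:
            return ""
        return " ".join("".join(node.itertext()).split())

    return {
        "file": str(xml_path),
        "title": first(".//tei:titleStmt/tei:title"),
        "author": first(".//tei:titleStmt/tei:author"),
    }


def write_report(xml_paths, csv_path):
    """Write one CSV row per TEI file and return the rows."""
    rows = [header_metadata(p) for p in xml_paths]
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "title", "author"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Missing elements produce empty cells instead of errors, which keeps a single malformed header from aborting a large batch.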

Limitations

  • multi-saxon is optimized for TEI P5 files. I do not plan on extending it to other frameworks.

Upcoming Features

  • A separate config.toml file to allow greater metadata customization.

Installation

  1. Ensure you have Python 3.x installed on your machine. If not, download and install Python.

  2. Clone this repository:

    git clone https://github.com/Pantagrueliste/multi-saxon.git
