With the 3.x versions, usfm-grammar is transitioning to a Tree-sitter-based grammar implementation, replacing the Ohm.js grammar used in the 2.x versions. This change improves performance, extensibility, and the handling of complex parsing scenarios.
Variants of USFM-Grammar
We now provide specialized variants of USFM-Grammar tailored for different environments:
- usfm-grammar for Python
- usfm-grammar for Node.js
- usfm-grammar-web for frontend JavaScript
- A command-line interface (CLI) integrated into the Python package
USFM-USX-USJ Format Support
Version 3.0.0 expands support across all three formats in the USFM ecosystem:
- Parses USFM, converts to other formats, and generates USFM from the other two formats
- Parses USX (XML), converts to other formats, and generates USX from the other two formats
- Parses USJ (JSON), converts to other formats, and generates USJ from the other two formats
- Exports to additional user-friendly formats such as CSV and BibleNLP.
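To give a feel for what a verse-level export such as CSV involves, here is a minimal Python sketch that walks a hand-written USJ-style dictionary and flattens it into (book, chapter, verse, text) rows. The helper names usj_to_verse_rows and rows_to_csv, the column headers, and the sample document are hypothetical illustrations, not usfm-grammar's API or its actual CSV layout; in practice the library's own export methods should be used.

```python
import csv
import io

# Hypothetical, simplified USJ-style document for illustration only.
# Real usfm-grammar output carries more fields (for example "version" and "sid").
SAMPLE_USJ = {
    "type": "USJ",
    "content": [
        {"type": "book", "marker": "id", "code": "GEN", "content": ["Sample"]},
        {"type": "chapter", "marker": "c", "number": "1"},
        {
            "type": "para",
            "marker": "p",
            "content": [
                {"type": "verse", "marker": "v", "number": "1"},
                "In the beginning God created the heavens and the earth.",
                {"type": "verse", "marker": "v", "number": "2"},
                "The earth was formless and empty.",
            ],
        },
    ],
}


def usj_to_verse_rows(usj):
    """Flatten a USJ tree into (book, chapter, verse, text) tuples.

    Hypothetical helper: it only tracks book/chapter/verse milestones and
    collects string content, ignoring all other marker semantics.
    """
    rows = []
    state = {"book": "", "chapter": "", "verse": None, "text": []}

    def flush():
        # Emit the verse collected so far (if any) and reset the text buffer.
        if state["verse"] is not None:
            text = " ".join(t.strip() for t in state["text"] if t.strip())
            rows.append((state["book"], state["chapter"], state["verse"], text))
        state["text"] = []

    def walk(node):
        if isinstance(node, str):
            if state["verse"] is not None:
                state["text"].append(node)
            return
        kind = node.get("type")
        if kind == "book":
            state["book"] = node.get("code", "")
            return  # skip the text on the \id line itself
        if kind == "chapter":
            flush()
            state["chapter"] = node.get("number", "")
            state["verse"] = None
        elif kind == "verse":
            flush()
            state["verse"] = node.get("number", "")
        for child in node.get("content", []):
            walk(child)

    walk(usj)
    flush()  # emit the final verse
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with an illustrative header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Book", "Chapter", "Verse", "Text"])
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_csv(usj_to_verse_rows(SAMPLE_USJ)))
```

Text is attached to whichever verse milestone precedes it, which is why a single flush on each chapter or verse boundary is enough to close off the previous row.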
Other Features
- Marker-Based Filtering: Simplify the cleanup and reformatting of marker-rich USFM files by specifying markers or marker types to include or exclude. This feature is centered on the USJ format.
- Error Reporting and Validation: When initializing a USFMParser with a USFM file, all errors in the file are reported in the USFMParser.errors field. A USJ input can also be validated against its JSON-Schema definition.
- Error Ignoring Option: An ignore_errors=True/False option is available for format conversion methods, allowing processing of imperfect input files wherever possible.
- Autofix Errors (Experimental): Automatically identifies and fixes common errors in USFM files to streamline processing.
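Marker-based filtering on USJ can be pictured as a tree walk. The sketch below assumes a simplified USJ dictionary and uses a hypothetical filter_usj helper (not part of usfm-grammar): markers listed in exclude are dropped together with their contents, while nodes whose markers are not listed in include are unwrapped so that their text survives. The library's real filtering options and marker-type groupings may behave differently.

```python
# Hypothetical, simplified USJ fragment with a footnote (\f ... \f*).
SAMPLE_USJ = {
    "type": "USJ",
    "content": [
        {"type": "chapter", "marker": "c", "number": "1"},
        {
            "type": "para",
            "marker": "p",
            "content": [
                {"type": "verse", "marker": "v", "number": "1"},
                "In the beginning",
                {
                    "type": "note",
                    "marker": "f",
                    "caller": "+",
                    "content": [
                        {"type": "char", "marker": "ft", "content": ["Or, at first"]}
                    ],
                },
                " God created the heavens and the earth.",
            ],
        },
    ],
}


def filter_usj(usj, include=None, exclude=None):
    """Return a filtered copy of a USJ document (hypothetical sketch).

    - Nodes whose marker is in `exclude` are dropped with their whole subtree.
    - If `include` is given, nodes whose marker is not in it are unwrapped:
      the node disappears but its (filtered) children are kept in its place.
    - Plain text strings are always kept.
    The input document is not modified.
    """
    exclude = set(exclude or ())
    include = set(include) if include is not None else None

    def filter_children(children):
        result = []
        for child in children:
            result.extend(filter_node(child))
        return result

    def filter_node(node):
        # Returns a list so that an unwrapped node can splice in its children.
        if isinstance(node, str):
            return [node]
        marker = node.get("marker")
        if marker in exclude:
            return []
        kept_children = filter_children(node.get("content", []))
        if include is not None and marker not in include:
            return kept_children
        new_node = {k: v for k, v in node.items() if k != "content"}
        if "content" in node:
            new_node["content"] = kept_children
        return [new_node]

    root = {k: v for k, v in usj.items() if k != "content"}
    root["content"] = filter_children(usj.get("content", []))
    return root


# Keep only chapter/verse milestones and text, dropping footnotes entirely.
print(filter_usj(SAMPLE_USJ, include=["c", "v"], exclude=["f"]))
```

Unwrapping rather than dropping non-included nodes is a design choice that keeps verse text even when paragraph markers are filtered out; pairing it with exclude keeps footnote text out of the result.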
Standards and Testing
This release is tested against the comprehensive test suite and follows the standards recommended by the USFM/X Technical Committee, ensuring robust validation and compatibility with the approved file formats.
Independent Grammar Implementations
For developers working directly with syntax trees, we offer grammar implementations as standalone packages for improved performance:
- tree-sitter-usfm3 for Python
- tree-sitter-usfm3 for Node.js
- WASM build for front-end applications
Breaking Changes
- The JSON output schema used in the 2.x versions has been completely replaced with the officially supported USJ format for better compatibility and adherence to standards.
- The APIs in the 2.x Node.js library have been redesigned to support new features and ensure cross-platform consistency.