Merge pull request #286 from kavitharaju/test-for-stable
Preps for stable release
Showing 16 changed files with 177 additions and 45 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,42 @@ | ||
# Release Notes | ||
|
||
## Towards Version 3.0.0 | ||
## 3.0.0 | ||
With the 3.x versions, we are transitioning to a [Tree-Sitter](https://tree-sitter.github.io/tree-sitter/) based grammar implementation for usfm-grammar, replacing the [Ohm.js](https://ohmjs.org/) grammar used in the 2.x versions. This upgrade enhances performance, extensibility, and support for complex parsing scenarios. | ||
|
||
1. `tree-sitter-usfm` on NPM | ||
**v3.0.0-alpha.3** | ||
A grammar modelling the USFM language and a parser that can generate a syntax tree using tree-sitter. It has been tested, via the Python module, against the USFM/X committee's test suite to check pass or fail on parsing. | ||
#### Variants of USFM-Grammar | ||
We now provide specialized variants of USFM-Grammar tailored for different environments: | ||
* [usfm-grammar](https://pypi.org/project/usfm-grammar/) for Python | ||
* [usfm-grammar](https://www.npmjs.com/package/usfm-grammar) for Node.js | ||
* [usfm-grammar-web](https://www.npmjs.com/package/usfm-grammar-web) for frontend JavaScript and | ||
* A command-line interface (CLI) integrated into the Python package. | ||
|
||
2. `usfm-grammar` on PyPi | ||
**v3.0.0-alpha.5** | ||
A Python parser for USFM that uses the `tree-sitter-usfm` grammar implementation. The parser can convert USFM to other formats such as JSON, CSV, and USX. It can also extract specific contents from a USFM file, such as just the verses or just the notes. The JSON output structure has been updated, conversion to USX has been implemented, and the behaviour of the filter in the API has been altered. Testing of these features is in progress. | ||
#### Independent Grammar Implementations | ||
For developers working directly with syntax trees, we offer grammar implementations as standalone packages for improved performance: | ||
* [tree-sitter-usfm3](https://pypi.org/project/tree-sitter-usfm3/) for Python, | ||
* [tree-sitter-usfm3](https://www.npmjs.com/package/tree-sitter-usfm3) for Node.js, | ||
* [WASM build](https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter-usfm.wasm) for front-end applications. | ||
|
||
<!-- 3. `language-usfm` on https://atom.io/packages/ | ||
For syntax highlighting and code folding on Atom. | ||
--> | ||
#### USFM-USX-USJ Format Support | ||
Version 3.0.0 expands support across all three formats in the [USFM ecosystem](https://docs.usfm.bible/usfm/3.1/index.html): | ||
|
||
* Parse USFM, convert to other formats, and generate USFM from the other two formats. | ||
* Parse USX (XML), convert to other formats, and generate USX from the other two formats. | ||
* Parse USJ (JSON), convert to other formats, and generate USJ from the other two formats. | ||
* Export to additional user-friendly formats such as CSV and BibleNLP. | ||
|
||
|
||
#### Other Features | ||
|
||
* *Marker-Based Filtering*: Simplify the cleanup and reformatting of marker-rich USFM files by specifying markers or marker types to include or exclude. This feature is centered on the USJ format. | ||
* *Error Reporting and Validation*: When initializing a USFMParser with a USFM file, all errors in the file are reported in the USFMParser.errors field. A USJ input can also be validated against its JSON-Schema definition. | ||
* *Error Ignoring Option*: An `ignore_errors=True/False` option is available for format conversion methods, allowing processing of imperfect input files wherever possible. | ||
* *Autofix Errors (Experimental)*: Automatically identifies and fixes common errors in USFM files to streamline processing. | ||
|
||
#### Standards and Testing | ||
|
||
This release adheres to the comprehensive test suite and standards recommended by the USFM/X Technical Committee, ensuring robust validation and compatibility with approved file formats. | ||
|
||
#### Breaking Changes | ||
|
||
* The JSON output schema used in the 2.x versions has been completely replaced with the officially supported USJ format for better compatibility and adherence to standards. | ||
* The APIs in the 2.x Node.js library have been re-designed to support new features and ensure cross-platform consistency. |
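
To make the marker-based filtering described in these notes more concrete, here is a minimal sketch in plain Python. It is not the library's `Filter` API: the function name `filter_usj`, the `exclude_markers` parameter, and the sample document (including its `"version"` value) are hypothetical and only illustrate the idea of dropping nodes from a USJ-like tree by marker.

```python
def filter_usj(node, exclude_markers):
    """Return a copy of a USJ-like node without children whose marker is excluded.

    Hypothetical helper for illustration; not part of usfm-grammar.
    """
    # Copy every key except "content", which is rebuilt below.
    filtered = {key: value for key, value in node.items() if key != "content"}
    if "content" in node:
        kept = []
        for item in node["content"]:
            if isinstance(item, str):
                # Plain text is always kept.
                kept.append(item)
            elif item.get("marker") not in exclude_markers:
                # Keep the child and filter its own content recursively.
                kept.append(filter_usj(item, exclude_markers))
        filtered["content"] = kept
    return filtered


# Illustrative USJ-like document (values are assumptions, not real output).
sample = {
    "type": "USJ",
    "version": "3.1",
    "content": [
        {
            "type": "para",
            "marker": "p",
            "content": [
                {"type": "verse", "marker": "v", "number": "1"},
                "In the beginning",
                {"type": "note", "marker": "f", "caller": "+", "content": ["A footnote"]},
            ],
        }
    ],
}

without_notes = filter_usj(sample, {"f"})
print(without_notes["content"][0]["content"])
```

The sketch builds a new tree rather than mutating the input, so the original USJ stays available for other conversions.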
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta" | |
|
||
[project] | ||
name = "usfm-grammar" | ||
version = "3.0.0-beta.17" | ||
version = "3.0.0" | ||
description = "Python parser for USFM files, based on tree-sitter-usfm3" | ||
readme = "README.md" | ||
authors = [{ name = "BCS Team", email = "[email protected]" }] | ||
|
@@ -23,7 +23,7 @@ classifiers = [ | |
keywords = ["usfm", "parser", "grammar", "tree-sitter"] | ||
dependencies = [ | ||
'tree-sitter==0.22.3; python_version >= "3.9"', | ||
'tree-sitter-usfm3==3.0.0-beta.17; python_version >="3.8"', | ||
'tree-sitter-usfm3==3.0.0; python_version >="3.8"', | ||
'lxml==5.2.2; python_version >= "3.5"', | ||
'jsonschema==4.23.0; python_version>= "3.8"' | ||
] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
tree-sitter==0.22.3 | ||
tree-sitter-usfm3==3.0.0-beta.17 | ||
tree-sitter-usfm3==3.0.0 | ||
lxml==5.2.2 | ||
jsonschema==4.23.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,4 +9,4 @@ | |
|
||
Validator = validator.Validator | ||
|
||
__version__ = "3.0.0-beta.17" | ||
__version__ = "3.0.0" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
'''JSON schema definition included within the package code''' | ||
|
||
usj_schema = { | ||
"$schema": "http://json-schema.org/draft-07/schema", | ||
"$id": "https://github.com/usfm-bible/tcdocs/blob/main/grammar/usj.js", | ||
"title": "Unified Scripture JSON", | ||
"description": "The JSON variant of USFM and USX data models", | ||
"type": "object", | ||
"$defs": { | ||
"markerObject": { | ||
"type": "object", | ||
"properties": { | ||
"type": { | ||
"description": "The kind/category of node or element this is, "+\ | ||
"corresponding to the USFM marker and USX node", | ||
"type": "string" | ||
}, | ||
"marker": { | ||
"description": "The corresponding marker in USFM or style in USX", | ||
"type": "string" | ||
}, | ||
"content": { | ||
"type": "array", | ||
"items": { | ||
"anyOf":[ | ||
{"type": "string"}, | ||
{"$ref": "#/$defs/markerObject"} | ||
] | ||
} | ||
}, | ||
"sid": { | ||
"description": "Indicates the Book-chapter-verse value in the paragraph based structure", | ||
"type": "string" | ||
}, | ||
"number": { | ||
"description": "Chapter number or verse number", | ||
"type": "string" | ||
}, | ||
"code": { | ||
"description": "The 3-letter book code in id element", | ||
"pattern": "^[0-9A-Z]{3}$", | ||
"type": "string" | ||
}, | ||
"altnumber": { | ||
"description": "Alternate chapter number or verse number", | ||
"type": "string" | ||
}, | ||
"pubnumber": { | ||
"description": "Published character of chapter or verse", | ||
"type": "string" | ||
}, | ||
"caller": { | ||
"description": "Caller character for footnotes and cross-refs", | ||
"type": "string" | ||
}, | ||
"align": { | ||
"description": "Alignment of table cells", | ||
"type": "string" | ||
}, | ||
"category": { | ||
"description": "Category of extended study bible sections", | ||
"type": "string" | ||
} | ||
}, | ||
"required": ["type"] | ||
} | ||
}, | ||
"properties": { | ||
"type": { | ||
"description": "The kind of node/element/marker this is", | ||
"type": "string" | ||
}, | ||
"version": { | ||
"description": "The USJ spec version", | ||
"type": "string" | ||
}, | ||
"content": { | ||
"description": "The JSON representation of scripture contents from USFM/USX", | ||
"type": "array", | ||
"items":{ | ||
"anyOf":[ | ||
{"type": "string"}, | ||
{"$ref": "#/$defs/markerObject"} | ||
] | ||
} | ||
} | ||
}, | ||
"required": ["type", "version", "content"] | ||
} |
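
The shape this schema requires can be illustrated with a hand-written check in plain Python. This is only a sketch of the structural rules visible above (root requires `type`, `version`, and `content`; each nested marker object requires a string `type`; content items are strings or marker objects). The names `check_usj` and `_check_content` and the sample document are hypothetical, and the sketch does not replace validating against the schema with a JSON Schema validator.

```python
def check_usj(doc):
    """Return a list of problems found in a USJ-like dict (empty list means none found)."""
    if not isinstance(doc, dict):
        return ["root: expected an object"]
    problems = []
    # The root object requires "type", "version" and "content".
    for key in ("type", "version", "content"):
        if key not in doc:
            problems.append(f"root: missing required key '{key}'")
    # "type" and "version" are declared as strings in the schema.
    for key in ("type", "version"):
        if key in doc and not isinstance(doc[key], str):
            problems.append(f"root: '{key}' must be a string")
    _check_content(doc.get("content", []), "root", problems)
    return problems


def _check_content(content, path, problems):
    """Check that content is a list of strings or marker objects, recursively."""
    if not isinstance(content, list):
        problems.append(f"{path}: 'content' must be an array")
        return
    for index, item in enumerate(content):
        item_path = f"{path}.content[{index}]"
        if isinstance(item, str):
            continue
        if not isinstance(item, dict):
            problems.append(f"{item_path}: expected a string or an object")
            continue
        # A marker object requires only "type", which must be a string.
        if not isinstance(item.get("type"), str):
            problems.append(f"{item_path}: missing or non-string 'type'")
        if "content" in item:
            _check_content(item["content"], item_path, problems)


# Illustrative document; the field values are assumptions, not real parser output.
sample = {
    "type": "USJ",
    "version": "3.1",
    "content": [
        {"type": "book", "marker": "id", "code": "GEN", "content": []},
        {"type": "chapter", "marker": "c", "number": "1", "sid": "GEN 1"},
        {
            "type": "para",
            "marker": "p",
            "content": [
                {"type": "verse", "marker": "v", "number": "1", "sid": "GEN 1:1"},
                "In the beginning..",
            ],
        },
    ],
}

print(check_usj(sample))
print(check_usj({"type": "USJ"}))
```

Collecting problems into a list, rather than raising on the first one, mirrors the way the release notes describe errors being reported together.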
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,10 +23,10 @@ function App() { | |
|
||
useEffect(() => { | ||
const initParser = async () => { | ||
await USFMParser.init("https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter.wasm"); | ||
await Validator.init("https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter.wasm"); | ||
await USFMParser.init("https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter.wasm"); | ||
await Validator.init("https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter.wasm"); | ||
|
||
}; | ||
initParser(); | ||
|
@@ -60,13 +60,13 @@ It can be used directly in the HTML script tag too. Please ensure its dependenci | |
|
||
```html | ||
<script type="module"> | ||
import { USFMParser, Filter, Validator } from 'https://cdn.jsdelivr.net/npm/[email protected]-beta.17/dist/bundle.mjs'; | ||
import { USFMParser, Filter, Validator } from 'https://cdn.jsdelivr.net/npm/[email protected]/dist/bundle.mjs'; | ||
console.log('Hello world'); | ||
(async () => { | ||
await USFMParser.init("https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter.wasm"); | ||
await Validator.init("https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]-beta.17/tree-sitter.wasm"); | ||
await USFMParser.init("https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter.wasm"); | ||
await Validator.init("https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter-usfm.wasm", | ||
"https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter.wasm"); | ||
const usfmParser = new USFMParser('\\id GEN\n\\c 1\n\\p\n\\v 1 In the beginning..\\v 2 more text') | ||
const output = usfmParser.toUSJ() | ||
console.log({ output }) | ||
|
@@ -132,11 +132,13 @@ Bible NLP format consists of two `txt` files: the first, with verse texts, one p | |
const output = usfmParser.toBibleNlpFormat() | ||
//const output = usfmParser.toBibleNlpFormat(true) //ignore_errors | ||
|
||
const textLines = output.text.join('\n'); | ||
fs.writeFileSync('bibleNLP.txt', textLines, { encoding: 'utf-8' }); | ||
output.text.forEach(txt => { | ||
console.log(txt); | ||
}); | ||
|
||
const refLines = output.vref.join('\n'); | ||
fs.writeFileSync('vref.txt', refLines, { encoding: 'utf-8' }); | ||
output.vref.forEach(ref => { | ||
console.log(ref); | ||
}); | ||
``` | ||
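
For readers who still want the two `txt` files that the earlier version of this example wrote, the same idea can be sketched in Python. This assumes an output object with parallel `text` and `vref` lists, as in the snippet above. The helper name `format_bible_nlp`, the sample values, and the `GEN 1:1` reference format are hypothetical.

```python
def format_bible_nlp(output):
    """Join the `text` and `vref` lists into two newline-separated strings."""
    text_lines = output["text"]
    vref_lines = output["vref"]
    # Assumption: the lists are parallel, one reference per verse text.
    if len(text_lines) != len(vref_lines):
        raise ValueError("text and vref must have the same number of lines")
    return "\n".join(text_lines), "\n".join(vref_lines)


# Illustrative data only; not real parser output.
sample_output = {
    "text": ["In the beginning..", "more text"],
    "vref": ["GEN 1:1", "GEN 1:2"],
}

text_blob, vref_blob = format_bible_nlp(sample_output)
print(text_blob)
print(vref_blob)
```

Each returned string can then be written to its own file, for example `bibleNLP.txt` and `vref.txt`, with `pathlib.Path.write_text(..., encoding="utf-8")`.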
|
||
### Table/List format | ||
|