Merge pull request #286 from kavitharaju/test-for-stable
Preps for stable release
joelthe1 authored Dec 12, 2024
2 parents 5f065c6 + b12389e commit 82a5270
Showing 16 changed files with 177 additions and 45 deletions.
4 changes: 2 additions & 2 deletions .bumpversion.cfg
@@ -1,8 +1,8 @@
[bumpversion]
current_version = 3.0.0-beta.17
current_version = 3.0.0
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\-(?P<release>\w+).(?P<num>\d+)
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(?:\-(?P<release>\w+)\.(?P<num>\d+))?
serialize =
{major}.{minor}.{patch}-{release}.{num}
{major}.{minor}.{patch}
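
The widened `parse` pattern makes the pre-release suffix optional, so a plain `3.0.0` now parses alongside strings like `3.0.0-beta.17`. A quick way to see the difference is the Python sketch below. The two patterns are copied from the hunk above; the helper name `parse_version` and the use of `re.fullmatch` are assumptions for illustration, and bumpversion's own matching logic may differ.

```python
import re

# Patterns copied from the .bumpversion.cfg hunk above.
OLD_PARSE = r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\-(?P<release>\w+).(?P<num>\d+)"
NEW_PARSE = r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(?:\-(?P<release>\w+)\.(?P<num>\d+))?"


def parse_version(pattern, version):
    """Hypothetical helper: return the named groups, or None if there is no full match."""
    match = re.fullmatch(pattern, version)
    return match.groupdict() if match else None


stable = parse_version(NEW_PARSE, "3.0.0")
beta = parse_version(NEW_PARSE, "3.0.0-beta.17")
old_on_stable = parse_version(OLD_PARSE, "3.0.0")

print(stable)         # release and num are None for a stable version
print(beta)           # release and num are filled for a pre-release
print(old_on_stable)  # the old pattern requires the suffix, so no match
```

The optional non-capturing group is what lets the second `serialize` entry, `{major}.{minor}.{patch}`, round-trip.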
47 changes: 37 additions & 10 deletions docs/Releases.md
@@ -1,15 +1,42 @@
# Release Notes

## Towards Version 3.0.0
## 3.0.0
With the 3.x versions, we are transitioning to a [Tree-Sitter](https://tree-sitter.github.io/tree-sitter/) based grammar implementation for usfm-grammar, replacing the [Ohm.js](https://ohmjs.org/) grammar used in the 2.x versions. This upgrade enhances performance, extensibility, and support for complex parsing scenarios.

1. `tree-sitter-usfm` on NPM
**v3.0.0-alpha.3**
A grammar modelling the USFM language and a parser that can generate a syntax-tree using tree-sitter. Has been tested against USFM/X committee's testsuite for ensuring pass or fail on pasring, via the python module.
#### Variants of USFM-Grammar
We now provide specialized variants of USFM-Grammar tailored for different environments:
* [usfm-grammar](https://pypi.org/project/usfm-grammar/) for Python
* [usfm-grammar](https://www.npmjs.com/package/usfm-grammar) for Node.js
* [usfm-grammar-web](https://www.npmjs.com/package/usfm-grammar-web) for frontend JavaScript and
* A command-line interface (CLI) integrated into the Python package.

2. `usfm-grammar` on PyPi
**v3.0.0-alpha.5**
A python parser for USFM that uses the `tree-sitter-usfm` grammar implementation. The parser is capable of converting the USFM to other formats like JSON, CSV, USX etc. It can also be used to extract specific contents from the USFM file like just the verses or just the notes. JSON output structure has been updated. Also conversion to USX implemented. Behaviour of filter in the API has be altered. Testing of these features are in progress.
#### Independent Grammar Implementations
For developers working directly with syntax trees, we offer grammar implementations as standalone packages for improved performance:
* [tree-sitter-usfm3](https://pypi.org/project/tree-sitter-usfm3/) for Python,
* [tree-sitter-usfm3](https://www.npmjs.com/package/tree-sitter-usfm3) for Node.js,
* [WASM build](https://cdn.jsdelivr.net/npm/[email protected]/tree-sitter-usfm.wasm) for front-end applications.

<!-- 3. `language-usfm` on https://atom.io/packages/
For syntax highlighting and code folding on Atom.
-->
#### USFM-USX-USJ Format Support
Version 3.0.0 expands support across all three formats in the [USFM ecosystem](https://docs.usfm.bible/usfm/3.1/index.html):

* Parse USFM, convert to other formats, and generate USFM from the other two formats.
* Parse USX (XML), convert to other formats, and generate USX from the other two formats.
* Parse USJ (JSON), convert to other formats, and generate USJ from the other two formats.
* Export to additional user-friendly formats such as CSV and BibleNLP.


#### Other Features

* *Marker-Based Filtering*: Simplify the cleanup and reformatting of marker-rich USFM files by specifying markers or marker types to include or exclude. This feature is centered on the USJ format.
* *Error Reporting and Validation*: When initializing a USFMParser with a USFM file, all errors in the file are reported in the USFMParser.errors field. A USJ input can also be validated against its JSON-Schema definition.
* *Error Ignoring Option*: An `ignore_errors=True/False` option is available for format conversion methods, allowing processing of imperfect input files wherever possible.
* *Autofix Errors (Experimental)*: Automatically identifies and fixes common errors in USFM files to streamline processing.
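
To make the marker-based filtering idea concrete, here is a minimal Python sketch that drops chosen markers from a USJ-like structure. It is not usfm-grammar's implementation: the helper `filter_usj` is hypothetical, the sample document is made up, and only the field names (`type`, `marker`, `content`, `number`, `code`, `caller`) come from the USJ schema added in this commit.

```python
def filter_usj(node, exclude_markers):
    """Return a copy of a USJ-like node without the excluded markers.

    Hypothetical helper for illustration; not usfm-grammar's implementation.
    """
    if isinstance(node, str):
        return node  # plain text is kept as-is
    filtered = {key: value for key, value in node.items() if key != "content"}
    if "content" in node:
        filtered["content"] = [
            filter_usj(child, exclude_markers)
            for child in node["content"]
            if isinstance(child, str) or child.get("marker") not in exclude_markers
        ]
    return filtered


# Made-up sample document; the version string is a placeholder.
usj = {
    "type": "USJ",
    "version": "3.1",
    "content": [
        {"type": "book", "marker": "id", "code": "GEN", "content": []},
        {"type": "chapter", "marker": "c", "number": "1"},
        {
            "type": "para",
            "marker": "p",
            "content": [
                {"type": "verse", "marker": "v", "number": "1"},
                "In the beginning",
                {"type": "note", "marker": "f", "caller": "+", "content": ["a footnote"]},
            ],
        },
    ],
}

cleaned = filter_usj(usj, exclude_markers={"f"})
print(cleaned["content"][2]["content"])
```

The sketch builds a new structure instead of mutating the input, so the original USJ stays available for other conversions.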

#### Standards and Testing

This release adheres to the comprehensive test suite and standards recommended by the USFM/X Technical Committee, ensuring robust validation and compatibility with approved file formats.

#### Breaking Changes

* The JSON output schema used in the 2.x versions has been completely replaced with the officially supported USJ format for better compatibility and adherence to standards.
* The APIs in the 2.x Node.js library have been re-designed to support new features and ensure cross-platform consistency.
5 changes: 3 additions & 2 deletions node-usfm-parser/README.md
@@ -15,7 +15,7 @@ npm install usfm-grammar
### Importing, parsing USFM, checking errors

```javascript
const {USFMParser} = require('usfm-grammar');
const {USFMParser, Filter} = require('usfm-grammar');

const USFM = '\\id GEN\n\\c 1\n\\p\n\\v 1 In the begining..\\v 2 some more text'
const usfmParser = new USFMParser(USFM);
@@ -65,6 +65,7 @@ console.log(usfmGen);
Bible NLP format consists of two `txt` files: the first, with verse texts, one per line and the second, with corresponding references. The API generates a JSON with two fields, `text` and `vref`, each containing an array of strings.

```javascript
const fs = require('fs');

const output = usfmParser.toBibleNlpFormat()
//const output = usfmParser.toBibleNlpFormat(true) //ignore_errors
@@ -82,7 +83,7 @@ fs.writeFileSync('vref.txt', refLines, { encoding: 'utf-8' });
const listOutput = usfmParser.toList();
/* const listOutput = usfmParser.toList(
Filter.NOTES, //exclude
["id", "c", "v"] //include
["id", "c", "v"], //include
true, //ignore errors
true //combine texts
)*/
4 changes: 2 additions & 2 deletions node-usfm-parser/package.json
@@ -1,6 +1,6 @@
{
"name": "usfm-grammar",
"version": "3.0.0-beta.17",
"version": "3.0.0",
"description": "Uses the tree-sitter-usfm3 parser to convert USFM files to other formats such as USJ, USX, and CSV, and converts them back to USFM",
"main": "./dist/cjs/index.cjs",
"module": "./dist/es/index.mjs",
@@ -28,7 +28,7 @@
"dependencies": {
"ajv": "^8.17.1",
"tree-sitter": "0.21.1",
"tree-sitter-usfm3": "3.0.0-beta.17",
"tree-sitter-usfm3": "3.0.0",
"xmldom": "^0.6.0",
"xpath": "^0.0.34"
},
18 changes: 17 additions & 1 deletion py-usfm-parser/README.md
@@ -115,7 +115,7 @@ print(my_parser2.usfm)

##### To remove unwanted markers from USFM
```python
from usfm_grammar import USFMParser, Filter, USFMGenerator
from usfm_grammar import USFMParser, Filter

my_parser = USFMParser(input_usfm_str)
usj_obj = my_parser.to_usj(include_markers=Filter.BCV+Filter.TEXT)
@@ -151,6 +151,22 @@ with open(test_xml_file, 'r', encoding='utf-8') as usx_file:
# print(my_parser.to_list())
```

#### Experimental Validation and Autofix

For USFM:
```python
from usfm_grammar import Validator

wrong_USFM = "\\id GEN\n\\c 1\n\\v 1 test verse"
checker = Validator()
resp = checker.is_valid_usfm(wrong_USFM)  # True or False
print(checker.message)  # List of errors, if present

edited_USFM = checker.auto_fix_usfm(wrong_USFM)
print(checker.message)  # Report on the autofix attempt
```


### From CLI

```
4 changes: 2 additions & 2 deletions py-usfm-parser/pyproject.toml
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "usfm-grammar"
version = "3.0.0-beta.17"
version = "3.0.0"
description = "Python parser for USFM files, based on tree-sitter-usfm3"
readme = "README.md"
authors = [{ name = "BCS Team", email = "[email protected]" }]
@@ -23,7 +23,7 @@ classifiers = [
keywords = ["usfm", "parser", "grammar", "tree-sitter"]
dependencies = [
'tree-sitter==0.22.3; python_version >= "3.9"',
'tree-sitter-usfm3==3.0.0-beta.17; python_version >="3.8"',
'tree-sitter-usfm3==3.0.0; python_version >="3.8"',
'lxml==5.2.2; python_version >= "3.5"',
'jsonschema==4.23.0; python_version>= "3.8"'
]
2 changes: 1 addition & 1 deletion py-usfm-parser/requirements.txt
@@ -1,4 +1,4 @@
tree-sitter==0.22.3
tree-sitter-usfm3==3.0.0-beta.17
tree-sitter-usfm3==3.0.0
lxml==5.2.2
jsonschema==4.23.0
2 changes: 1 addition & 1 deletion py-usfm-parser/setup.py
@@ -7,7 +7,7 @@ def has_ext_modules(self):

setup(
name="usfm-grammar", # Required
version="3.0.0-beta.17", # Required
version="3.0.0", # Required
python_requires=">=3.10",
# install_requires=["tree-sitter==0.22.3",
# "tree-sitter-usfm3==3.0.0-beta.7",
2 changes: 1 addition & 1 deletion py-usfm-parser/src/usfm_grammar/__init__.py
@@ -9,4 +9,4 @@

Validator = validator.Validator

__version__ = "3.0.0-beta.17"
__version__ = "3.0.0"
89 changes: 89 additions & 0 deletions py-usfm-parser/src/usfm_grammar/schema.py
@@ -0,0 +1,89 @@
'''JSON schema definition included within the package code'''

usj_schema = {
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://github.com/usfm-bible/tcdocs/blob/main/grammar/usj.js",
"title": "Unified Scripture JSON",
"description": "The JSON variant of USFM and USX data models",
"type": "object",
"$defs": {
"markerObject": {
"type": "object",
"properties": {
"type": {
"description": "The kind/category of node or element this is, "+\
"corresponding to the USFM marker and USX node",
"type": "string"
},
"marker": {
"description": "The corresponding marker in USFM or style in USX",
"type": "string"
},
"content": {
"type": "array",
"items": {
"anyOf":[
{"type": "string"},
{"$ref": "#/$defs/markerObject"}
]
}
},
"sid": {
"description": "Indicates the Book-chapter-verse value in the paragraph based structure",
"type": "string"
},
"number": {
"description": "Chapter number or verse number",
"type": "string"
},
"code": {
"description": "The 3-letter book code in id element",
"pattern": "^[0-9A-Z]{3}$",
"type": "string"
},
"altnumber": {
"description": "Alternate chapter number or verse number",
"type": "string"
},
"pubnumber": {
"description": "Published character of chapter or verse",
"type": "string"
},
"caller": {
"description": "Caller character for footnotes and cross-refs",
"type": "string"
},
"align": {
"description": "Alignment of table cells",
"type": "string"
},
"category": {
"description": "Category of extended study bible sections",
"type": "string"
}
},
"required": ["type"]
}
},
"properties": {
"type": {
"description": "The kind of node/element/marker this is",
"type": "string"
},
"version": {
"description": "The USJ spec version",
"type": "string"
},
"content": {
"description": "The JSON representation of scripture contents from USFM/USX",
"type": "array",
"items":{
"anyOf":[
{"type": "string"},
{"$ref": "#/$defs/markerObject"}
]
}
}
},
"required": ["type", "version", "content"]
}
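
As a rough illustration of what this schema requires, the Python sketch below checks only the `required` keys and the shape of `content`: `type`, `version`, and `content` at the root, `type` on every nested marker object, and strings or marker objects as content items. It is a stdlib-only stand-in with a hypothetical helper name (`usj_problems`) and made-up sample data; the package itself validates with `jsonschema`'s `Draft7Validator`, as `validator.py` shows.

```python
def usj_problems(node, path="$", is_root=True):
    """Collect violations of the schema's `required` and content-shape rules.

    Hypothetical stdlib-only helper; the package itself uses jsonschema.
    """
    if not isinstance(node, dict):
        return [f"{path}: expected an object"]
    problems = []
    required = ("type", "version", "content") if is_root else ("type",)
    for key in required:
        if key not in node:
            problems.append(f"{path}: missing required '{key}'")
    content = node.get("content", [])
    if not isinstance(content, list):
        problems.append(f"{path}.content: expected an array")
        return problems
    for index, child in enumerate(content):
        child_path = f"{path}.content[{index}]"
        if isinstance(child, str):
            continue  # plain text is allowed alongside marker objects
        if isinstance(child, dict):
            problems.extend(usj_problems(child, child_path, is_root=False))
        else:
            problems.append(f"{child_path}: expected a string or a marker object")
    return problems


# Made-up samples; the version string is a placeholder.
good = {
    "type": "USJ",
    "version": "3.1",
    "content": [{"type": "book", "marker": "id", "code": "GEN"}, "text"],
}
bad = {"type": "USJ", "content": [{"marker": "p", "content": ["text"]}]}

print(usj_problems(good))
print(usj_problems(bad))
```

A full validator also enforces property types and the `code` pattern, which this sketch deliberately leaves out.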
7 changes: 2 additions & 5 deletions py-usfm-parser/src/usfm_grammar/validator.py
@@ -1,25 +1,22 @@
'''Check the formats of USFM and USJ. Also tries to fix common errors in USFM'''

import re
import json
import jsonschema

import tree_sitter_usfm3 as tsusfm
from tree_sitter import Language, Parser

from usfm_grammar.usfm_parser import error_query
from usfm_grammar.schema import usj_schema

class Validator:
'''Check validity of USJ and USFM. Also auto fix USFM'''
def __init__(self, tree_sitter_usfm=tsusfm, usj_schema_path='../schemas/usj.js'):
def __init__(self, tree_sitter_usfm=tsusfm):
'''constructor'''
usfm_language = Language(tree_sitter_usfm.language())
self.usfm_parser = Parser(usfm_language)
self.usfm_errors = []

usj_schema = None
with open(usj_schema_path, 'r', encoding='utf-8') as json_file:
usj_schema = json.load(json_file)
self.usj_validator = jsonschema.validators.Draft7Validator(schema=usj_schema)

self.message = ""
4 changes: 2 additions & 2 deletions tree-sitter-usfm3/package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tree-sitter-usfm3/package.json
@@ -1,6 +1,6 @@
{
"name": "tree-sitter-usfm3",
"version": "3.0.0-beta.17",
"version": "3.0.0",
"description": "Grammar representation and parser for USFM language using tree-sitter",
"main": "bindings/node",
"types": "bindings/node",
2 changes: 1 addition & 1 deletion tree-sitter-usfm3/pyproject.toml
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "tree-sitter-usfm3"
description = "Usfm3 grammar for tree-sitter"
version = "3.0.0-beta.17"
version = "3.0.0"
keywords = ["incremental", "parsing", "tree-sitter", "usfm3"]
classifiers = [
"Intended Audience :: Developers",
28 changes: 15 additions & 13 deletions web-usfm-parser/README.md
@@ -23,10 +23,10 @@ function App() {

useEffect(() => {
const initParser = async () => {
await USFMParser.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter.wasm");
await Validator.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter.wasm");
await USFMParser.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter.wasm");
await Validator.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter.wasm");

};
initParser();
@@ -60,13 +60,13 @@ It can be used directly in the HTML script tag too. Please ensure its dependenci

```html
<script type="module">
import { USFMParser, Filter, Validator } from 'https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/dist/bundle.mjs';
import { USFMParser, Filter, Validator } from 'https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/dist/bundle.mjs';
console.log('Hello world');
(async () => {
await USFMParser.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter.wasm");
await Validator.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0-beta.17/tree-sitter.wasm");
await USFMParser.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter.wasm");
await Validator.init("https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter-usfm.wasm",
"https://cdn.jsdelivr.net/npm/usfm-grammar-web@3.0.0/tree-sitter.wasm");
const usfmParser = new USFMParser('\\id GEN\n\\c 1\n\\p\n\\v 1 In the begining..\\v 2 more text')
const output = usfmParser.toUSJ()
console.log({ output })
@@ -132,11 +132,13 @@ Bible NLP format consists of two `txt` files: the first, with verse texts, one p
const output = usfmParser.toBibleNlpFormat()
//const output = usfmParser.toBibleNlpFormat(true) //ignore_errors

const textLines = output.text.join('\n');
fs.writeFileSync('bibleNLP.txt', textLines, { encoding: 'utf-8' });
output.text.forEach(txt => {
console.log(txt);
});

const refLines = output.vref.join('\n');
fs.writeFileSync('vref.txt', refLines, { encoding: 'utf-8' });
output.vref.forEach(ref => {
console.log(ref);
});
```
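
For readers working in Python, the same two-part output can be turned into line-aligned text with a small helper. This is a sketch under stated assumptions: the helper name `bible_nlp_lines` is hypothetical, the sample data (including the `GEN 1:1` reference style) is made up, and the equal-length check assumes `text` and `vref` correspond one-to-one, as the description of the format suggests.

```python
def bible_nlp_lines(output):
    """Join the `text` and `vref` arrays into two line-aligned strings.

    Hypothetical helper; assumes one reference per verse text.
    """
    if len(output["text"]) != len(output["vref"]):
        raise ValueError("text and vref must have the same number of entries")
    return "\n".join(output["text"]), "\n".join(output["vref"])


# Made-up sample shaped like the JSON with `text` and `vref` fields.
sample = {
    "text": ["In the beginning..", "some more text"],
    "vref": ["GEN 1:1", "GEN 1:2"],
}

text_lines, ref_lines = bible_nlp_lines(sample)
for ref, text in zip(ref_lines.split("\n"), text_lines.split("\n")):
    print(ref, "->", text)
```

Writing `text_lines` and `ref_lines` to two files gives the pair of `txt` files that make up the Bible NLP format.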

### Table/List format
2 changes: 1 addition & 1 deletion web-usfm-parser/package.json
@@ -1,6 +1,6 @@
{
"name": "usfm-grammar-web",
"version": "3.0.0-beta.17",
"version": "3.0.0",
"description": "Uses the tree-sitter-usfm3 parser to convert USFM files to other formats such as USJ, USX, and CSV, and converts them back to USFM.",
"type": "module",
"module": "dist/bundle.mjs",
