Version 3 #278

Merged
merged 136 commits into from
Nov 21, 2024
kavitharaju
Collaborator

@kavitharaju kavitharaju commented Nov 18, 2024

Work from Oct 2021 to Nov 2024

kavitharaju and others added 30 commits October 22, 2021 14:07
* initial survey and development plan

* experiment building a grammar for a simpler USFM like format

* build a simple python program to read grammar's parse output and convert it to a JSON format

* initial commit for usfm grammar: include CFG and tests for header markers

* remove auto-generated files from git repo

* remove commented grammar rules

* include rem marker in headers

* include sts and restore markers to headers like rem

* add rules for introduction markers

* write rules for chapter contents: titles and chapterMeta

* write rules for verse and related markers

* add rules for chapter contents: paragraphs

* test and fix rules

* test and fix chapter and verse rules, except for 1 case

* add rule to allow mte markers inside chapter like other titles

* test and fix verse & chapter test case

* test and fix section heading markers

* test and fix paragraphs, except the cases with quotes

* get all paragraph tests fixed

* add rules for chapter contents: poetry markers

* test and fix poetry markers, except one case

* write a minimal python parser to convert output tree to JSON

* prettify the json output by removing repeated inner objects

* test and fix the missed poetry test case
* add rules for list markers

* add rules for table markers

* try running v3 tests on gitactions

* trial 2: running v3 tests on gitactions

* trial 3: running v3 tests on gitactions

* add rules for footnotes

* add rules for cross-reference

* implement character markers and their nesting

* implement pagebreak marker
* add rules for milestone

* test and fix milestone rules

* move ie marker to end of intro

* implement rules for znamespaces

* implement esb rules

* remove 'Marker' from tag names

* remove text and change bookIdentification to book in AST

* bundle all chapters together in the AST

* start work on converting AST to JSON

* Revert "remove text and change bookIdentification to book in AST"

This reverts commit 2196b02.

* use hidden text for verse and book-description and visible text for other rules

* keep text field in verseText

* change test ASTs as per the changes in bookcode and description

* Revert "bundle all chapters together in the AST"

This reverts commit 97a434a.

* retain all raw text and spaces in json form as well

* add rule in grammar to retain marker numbers

* setup python ENV for parser module

* update test ASTs with numbered markers

* implement AST to scripture-filtered-json conversion

* design Class and APIs for python USFM parser

* document API usages in a jupyter notebook

* Include line and location of error in error list

* include components title, paragraph and poetry in the AST to be able to filter based on these in parsers

* include separate script for re-building grammar after updates

* implement filter with paragraph, its queries and toTable conversion

* edit sample input with paragraph break after list end

* fix error in toTable method of Notes

* draft an architecture.md

* remove initial experiment files

* ignore pycache and jupyter notebook files

* fix ASTs of znamespace, esb and cat with paragraph and title nodes

* undo adding unnecessary files to repo

* undo adding wasm files to repo
* update architecture.md

* implement granular rules for attributes

* Implement USX export: id, c, v, paragraphs

* Make Tag node part of AST in numbered markers

* note down the question on USX

* update test cases after including tag node in AST

* Implement USX conversion for all para type markers, notes, char, nested and attributed markers

* map the markers to their default attributes in USX conversion

* Fix types and add links in noted down questions

* Figure out how to use USX's rnc grammar for validation

* update grammar and tests to include table cell tags in AST

* Implement USX conversion for tables

* expose milestone and znamespace Tag nodes in AST

* Implement USX conversion for milestones

* include category value in the AST and update tests

* implement USX conversion for sidebars, cat and fig

* implement USX conversion for optbreak(\b)

* bug fixes in usx conversion

* update notes on grammar

* update note caller rule to allow sequence of any characters
* switch to python3.10

* change Python API names
 - Enum Format.ST = "syntax-tree"
 - Class data member USFMParser.syntax_tree
 - Class data member USFMParser.USFM_bytes
 - Class member function USFMParser.to_syntax_tree()
 - Class member function USFMParser.to_dict()
 - Class member function USFMParser.to_list()
 - Class member function USFMParser.to_markdown()
 - Class member function USFMParser.to_usx()

* use match-case in place of if-else when useful

* update the API guide jupyter notebook with new names

* use lxml library instead of xml

* keep class members all in lowercase: usfm, usfm_bytes
kavitharaju and others added 24 commits July 2, 2024 13:47
Bump version: 3.0.0-beta.8 → 3.0.0-beta.9
* Pull in latest changes of Test Suite from tcdocs

* Update tests readme

* Name the versionNumber node in \usfm

* Allow \fig within footnote

* Make attribute in rb mandatory

* Allow \cat within footnote

* Allow cat within crossrefs too

* New tag ref and allow \xt and \+xt in footnotes

* Allow multiple imt blocks in same file

* Override 3 tests for parsing

* Change version number to 3.1 in USX and USJ

* Add loc as default attrib of ref in USX/J generation code

* Add ref node handling and its default attrib in python module

* Add ref to the list of char markers in conversion code

* Handle difference in space and line handling in tcdocs and our usj

* Bring \fig as noteText content and not note content

* Strip all text values in testsuite and generated USJ before comparison

* Separate footnoteText and crossrefText to allow \xt in one and not in the other

* Exclude the \usfm marker when creating USJ

* \fv is within footnoteText, not within footnote

* List issue USXs in test script to exclude them

* Include the valid testcases back as per new rules in 3.1

* Type of b is not optbreak, but para

* Type of b is not optbreak, but para

* Allow attributes with and without link- prefix

* Add space before closing marker in usfm generation

* Handle list of wrong USXs in test script

* Add \esbe in generating USFM from USX

* Fix imports in tests

* Allow char marker nesting without +

* Include tests with default markers in milestone

* Undo allowing char within char without +

* exclude two failing tests for now

* exclude two failing tests for now

* fix linting issues

* fix requirements file
* Allow any 3 letter code as bookcode

* Support nested markers without +

* Make closing mandatory for \fv

* Introduce 'key' attribute in \k
* initial

* using base64 grammar to fix the missing magic number error while loading wasm

* Use published tree-sitter-usfm3 library (#254)

* update python builds for macOS intel and apple silicon

* Pin dependency versions to latest working

* Bump version: 3.0.0-beta.5 → 3.0.0-beta.6

* Bump version of tree-sitter-usfm3 as well

* Revert "update python builds for macOS intel and apple silicon"

* Pin dependency versions in pyproject.toml as well

* Bump version: 3.0.0-beta.6 → 3.0.0-beta.7

* Bump version in tree-sitter module as well

* Upgrade the tree-sitter and tree-sitter-cli node libraries

* Regenerate grammar, binaries and other files using newer version

* Configure setup.py and .toml files for new python tree-sitter-usfm3 grammar package

* Update the python parser module to use published grammar library instead of binary .so file

* Change the installation and packaging config of python usfm-grammar to not use binary but the published python package for grammar

* Update documentations(dev notes)

* Keep bump version configs common for py-parser module and tree-sitter-usfm3 module

* Git actions, trial #1

* Save changes in setup.py

* Fix issues in bumpversion.cfg

* Bump version: 3.0.0-beta.7 → 3.0.0-alpha.8

* Gitactions trials

* Try publish to test.pypi

* Try publish to test.pypi

* Try publish to test.pypi

* pin tree-sitter-usfm3 version in pyproject.toml, but remove it when running on github actions

* Bump version: 3.0.0-alpha.8 → 3.0.0-alpha.9

* Add changes to PyPi publish workflow

* Bump version: 3.0.0-alpha.9 → 3.0.0-alpha.10

* Try to fix bumpversion

* Bump version: 3.0.0-alpha.10 → 3.0.0-alpha.11

* Bump version: 3.0.0-alpha.11 → 3.0.0-beta.8

* Change the build process in npm publish

* Bump version: 3.0.0-beta.8 → 3.0.0-beta.9

* Update pypi_publish.yml

* Bump version: 3.0.0-beta.9 → 3.0.0-beta.10

* Test suite update from tcdocs (#253)

* Try using web-tree-sitter and wasm file

* Test publish alpha.3

* More Grammar changes as per USFM/X 3.1 (#255)

* Try fixing the issue with init()

* Include the grammar wasm in the npm package

* try publish locally

* Use babel for generating commonjs distribution from the esm code base

* Remove the src/grammar folder with grammar.base64 array and use wasm

* Fix the basic test scripts to use files in dist/ folder generated after npm run build command

* Add notes on how to build and locally publish the js module

* Add js-module files in bumpversion config

* Update the usage in README

* Use named export instead of default for USFMParser

* revert to parcel build instead of webpack

* Test publish alpha.4 using web-tree-sitter, parcel, wasms etc in cjs and esm formats. Tested in node, and NextJs

---------

Co-authored-by: Chris Vaughn <[email protected]>

* React support with separate usfm-grammar-web (#256)

* Bundle tree-sitter.js instead of keeping web-tree-sitter dependency

* Fix the parser init to use input wasm path

* Keep a separate usfm-grammar-web

* keep a separate usfm-grammar for node w/o using wasm files

* Test publish another alpha.8 version of usfm-grammar at npm

* Test node n web (#257)

* Setup mocha and a sanity testcase in node-usfm-grammar

* Pass USFM at object creation not to function. Not throw errors immediately, allow fromUsj parser creation

* Add 7 tests to check object initialization and error handling

* Add test for parsing usfm and validating pass/fail

* Add tests for parsing pass or fail check of testsuite samples

* Override \s5 fails and milestone w/o \* fails

* Increase buffer size for large file parsing

* Add a toSyntaxTree() method like in python

* Undo overriding \s5 w/o \p fails and milestone w/o \* fails

* Find and report MISSING nodes in tree as errors

* Override samples with \s5 w/o space to fail and wrap ups test for parsing check

* Rename usfmToUsj() to toUSJ() to be similar to the python API

* change USJ version to 3.1 in generation code

* Start testing the successful conversion of USFM to USJ for positive testsuite samples

* Increase the timeout threshold for mocha test case runs as USJ conversion takes time

* Extract \cat value correctly

* Use toUSJ() in previous test case too

* Start test by comparing generated USJ to testsuite reference

* Handle difference in space, line and 'lemma' handling in tcdocs and usfm-grammar before comparing USJs

* Exclude the \usfm markers node from USJ

* Keep b as an empty paragraph

* Implement \ref node

* Handle default attribute handling for k, ref, xt etc

* Include table cell processing in USJ generation

* Exclude USJ file with \ref not present in USFM from comparison test

* Replace ~ with space (which is not done in python)

* Fix undeclared var issue in para object conversion

* Exclude 2 USJs from comparison: an \lit issue and a lemma issue from USX

* Fix issue in adding ca and cp values in chapter node

* Avoid adding \+ for fv marker when regenerating USFM

* Convert altnumber and pubnumber back to usfm markers in USFM generation

* Handle ref object not having marker field

* Test USFM round tripping via USJ and cross check presence of all input markers in output USFM also

* update API changes in Readme

* Fix the example in Readme.md

* S5 support (#259)

* Handle error reporting and use of ignore-errors in node module properly

* Prepare for alpha.9 release of node and web modules for \s5 support with ignore errors

* Make API changes and do error handling in web, similar to python and node packages

* Change documentations corresponding to web module usage and development

* Update dev notes regarding node and web modules

* Test node n web (#260)

* Tests in web:Get the basic tests for APIs, error reports to work as in node module

* Tests in web:Get the parsing tests to work as in node module

* Tests in web:Copy the usj tests from node module

* Web: keep version as 3.1

* Web: extract \cat's value correctly

* Web: include notes in USJ that was missed

* Web: exclude \usfm node when making USJ, as version is added by default

* Web: treat \b as para not optbreak

* Web: process ref nodes and changes from link-href to href

* Web: check for table cells and add them in USJ

* Web: fix issue with ca and cp in USJ generation

* Web: replace ~ with space in USJ generation

* Web: handle ref node lacking marker field in USJ while USFM generation

* Web: handle ca cp va vp in usfmGeneration from usj

* Web testing: All the 770 tests in node module running successfully in web too

* Correct the version tag of usfm-grammar-web

* Tests/python node web (#261)

* Copy the list of overriding tests from node to python

* Fix issue in python test script where space in lemma is handled for comparison

* Node: Add tests to check for all markers in generated USJ

* Web: Add tests to check for all markers in generated USJ

* Update the USJ schema with proper schema name

* Node: Add tests to validate the generated USJs against JSON schema

* Web: Add tests to validate the generated USJs against JSON schema

* Node: Fix issues in include and exclude markers options

* Node: Add tests for exclude and include markers

* Web: Fix issues in include and exclude markers options

* Web: Add tests for include and exclude marker options

* Web: Fix the 'table index is out of bounds' error when running all tests at once

* Python: Report MISSING nodes as errors

* Gitactions for js tests: Trial #1

* Gitactions for js tests: Trial #2

* Gitactions for js tests: Trial #3

* Gitactions for js tests: Trial #4

* Pylint error fix

* Gitactions: Let tests run on all PRs

* USFM toList() Implementation (#262)

* Node: Implement toList method like we have in python

* Node: Add tests for list conversion

* Web: Implement toList method like we have in python and node

* Web: Add tests for list conversion

* Remove old trials for web version from node module

* USX implementations in node and web modules (#263)

* Node: Start with toUSX() with xmldom library

* Node: Start with USXGenerator class, constructor and Id node

* Node: Implement chapter, verse, text etc methods in USX Generation

* Node: Implement content paragraph node in USX Generation

* Node: Implement Notes conversion to USX

* Node: Implement char nodes and attributes conversion to USX

* Node: Implement esb, cat, ref etc and generic parastyle markers in USX generation

* Node: Implement milestone and table nodes in USX generation

* Node: Make verse nodes empty and not carrying the text in USX generation

* Node: Return xmldom element instead of string after USX generation

* Node: Fix issue of not adding node to xml tree before processing children

* Node: Add verse end node at chapter end

* Node: More minor fixes in USX generation

* Node: Use @xmldom/xmldom instead of xmldom and xml2js

* Node: Add tests for errorless usfm-usx conversion checks

* Node: Switch back to xmldom for speed

* Node: exclude usfm(version) node in USX

* Node: Fix issue with numbered markers

* Node: Fix marker usage instead of style

* Node: Fix issues of pi style value

* Node: Fix the similar issue with numbered marker fiun in USJ generation

* Node: Keep ref marker not as char in USX generation

* Node: tests for checking all markers in generated USX

* Node: Minor fix in ref handling

* Node: Allow tests to run in parallel

* Web: Replicate toUSX() implementation as in Node

* Web: Add tests for toUSX() conversion as in Node

* Node: Fix issue in error handling

* Node: Implement USFM Generation from USX and fromUsx initialization of parser

* Node: Add tests for roundtripping USFM via USX

* Web: Implement USFM Generation from USX and fromUsx initialization of parser

* Web: Add tests for roundtripping USFM via USX

---------

Co-authored-by: kavitharaju <[email protected]>
Co-authored-by: Chris Vaughn <[email protected]>
* Allow empty strings and trailing and preceding spaces in text in USJ

* Allow empty strings and trailing and preceding spaces in text in USX also

* Allow empty strings and trailing and preceding spaces in text in USX also

* Handle whitespace only strings in test cases when comparing with test suite samples

* Fix typo in web tests

* Specify tree-sitter-usfm3 version correctly in node. Fix #266
* Python: Add a Validator class with validate_usfm() and validate_usj() methods

* Python: Try implementing auto fix for common issues

* Add some test cases for auto fix

* Implement validation and autofix in node

* Add tests for autofix in node

* Fix an issue in empty attribute case in python module

* Implement validation and autofix in web

* Add tests for autofix in web module

* Fix issue in USJ error reporting in node and web. Also remove print statements

* Handle pylint issues

* Add tests for USJ validation in node, web and python
* Add jsonschema dependency in python package

* Bump version: 3.0.0-beta.11 → 3.0.0-beta.12

* Add a check for USJ objects and raise error if not dict or doesn't have 'type'

* Test for USJ in node

* Add tests for invalid USJ case

* Check for invalid USJ in web
* Add the jsonschema dependency in pyproject.toml file as well

* Bump version: 3.0.0-beta.12 → 3.0.0-beta.13
* Fix pyproject.toml

* Bump version: 3.0.0-beta.13 → 3.0.0-beta.14
* Update documentation on running tests and publishing

* Node: Update version number and package description

* Node: Include USJ schema in src/utils

* Node: Update error message with usage of 'IgnoreErrors=true'

* Node: Add documentation on usage similar to python module and add examples for autofix and validation

* Web : Prepare module for alpha.10 release

* Readme formatting in python module

* Prepare for automatic bumpversion in node and web modules

* Bump version trial #2

* Bump version trial #3

* Bump version trial #4

* Bump version: 3.0.0-beta.14 → 3.0.0-beta.15

* Automate node and web package publishing trial 1

* Automate node and web package publishing trial 2

* Automate node and web package publishing trial 3

* Automate node and web package publishing trial 4
* Attempt to fix gitactions for node publishing

* Bump version: 3.0.0-beta.15 → 3.0.0-beta.16
@kavitharaju kavitharaju requested a review from joelthe1 November 19, 2024 04:50
@joelthe1 joelthe1 merged commit fd59749 into master Nov 21, 2024
10 checks passed
@joelthe1
Collaborator

This is a rewrite of usfm-grammar using Tree-sitter, with significant updates to functionality.

4 participants