Another pass over operator data
mathics_scanner/characters.py:
* Clarify "associativity" field. Check validity of field in tests.
* Remove "actual-precedence" field

Change name:
characters.json -> character-tables.json to match what it is in Mathics3

Makefile: force mathics_scanner/data/operators.json build more often by
including it as a dependency of "build".
rocky committed Nov 27, 2024
1 parent 106d81d commit 14b346f
Showing 15 changed files with 73 additions and 534 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -26,7 +26,7 @@ Test/
_Copies_/
_Database_/
build/
-/mathics_scanner/data/characters.json
+/mathics_scanner/data/character-tables.json
/mathics_scanner/data/operators.json
dist/
tmp
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -13,7 +13,7 @@ repos:
exclude: ChangeLog-spell-corrected.diff
- id: trailing-whitespace
- id: check-json
-exclude: mathics_scanner/data/characters.json
+exclude: mathics_scanner/data/character-tables.json
- repo: https://github.com/pycqa/isort
rev: 5.13.2
hooks:
12 changes: 6 additions & 6 deletions Makefile
@@ -22,7 +22,7 @@ PIP_INSTALL_OPTS ?=
#: Default target - same as "develop"
all: develop

-mathics_scanner/data/characters.json: mathics_scanner/data/named-characters.yml
+mathics_scanner/data/character-tables.json: mathics_scanner/data/named-characters.yml
$(PIP) install -r requirements-dev.txt
$(PYTHON) mathics_scanner/generate/build_tables.py

@@ -31,11 +31,11 @@ mathics_scanner/data/operators.json: mathics_scanner/data/operators.yml
$(PYTHON) mathics_scanner/generate/build_operator_tables.py

#: build everything needed to install
-build: mathics_scanner/data/characters.json
+build: mathics_scanner/data/characters.json mathics_scanner/data/operators.json
$(PYTHON) ./setup.py build

#: Set up to run from the source tree
-develop: mathics_scanner/data/characters.json mathics_scanner/data/operators.json
+develop: mathics_scanner/data/character-tables.json mathics_scanner/data/operators.json
$(PIP) install -e .$(PIP_INSTALL_OPTS)

#: Build distribution
@@ -56,16 +56,16 @@ check: pytest
test: check

#: Build Sphinx HTML documentation
-doc: mathics_scanner/data/characters.json
+doc: mathics_scanner/data/character-tables.json
make -C docs html

#: Remove derived files
clean:
@find . -name *.pyc -type f -delete; \
-$(RM) -f mathics_scanner/data/characters.json mathics_scanner/data/operators.json || true
+$(RM) -f mathics_scanner/data/character-tables.json mathics_scanner/data/operators.json || true

#: Run py.test tests. Use environment variable "o" for pytest options
-pytest: mathics_scanner/data/characters.json
+pytest: mathics_scanner/data/character-tables.json
$(PYTHON) -m pytest test $o

#: Print to stdout a GNU Readline inputrc without Unicode
2 changes: 1 addition & 1 deletion admin-tools/make-tables.sh
@@ -6,5 +6,5 @@ mydir=$(dirname $bs)
PYTHON=${PYTHON:-python}

cd $mydir/../mathics_scanner/data
-$PYTHON ../generate/build_tables.py -o characters.json
+$PYTHON ../generate/build_tables.py -o character-tables.json
$PYTHON ../generate/build_operator_tables.py -o operators.json
19 changes: 9 additions & 10 deletions docs/source/implementation.rst
@@ -19,10 +19,10 @@ class) whose names are preceded by ``t_``, such as in the following example: ::
# Some logic goes here...
pass

-A tokenization rule is supposed to take a regular expression match (the
-``match`` parameter of type ``re.Match``) and convert it to an appropriate
-token, which is then returned by the method. The rule is also responsible for
-updating the internal state of the tokeniser, such as incrementing the ``pos``
+A tokenization rule is supposed to take a regular expression match (the
+``match`` parameter of type ``re.Match``) and convert it to an appropriate
+token, which is then returned by the method. The rule is also responsible for
+updating the internal state of the tokeniser, such as incrementing the ``pos``
counter.

A rule is always expected to receive sane input. In other words, deciding which
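The rule described in the excerpt above can be sketched in Python. The class and method names below are illustrative assumptions, not the library's actual API: a ``t_*`` method receives an ``re.Match``, returns a token, and advances the tokenizer's ``pos`` counter itself.

```python
import re


class Token:
    """Minimal token record (hypothetical, for illustration only)."""

    def __init__(self, tag: str, text: str, pos: int):
        self.tag, self.text, self.pos = tag, text, pos


class TokenizerSketch:
    """Sketch of a tokenizer with one ``t_*`` rule (hypothetical names)."""

    def __init__(self, source: str):
        self.source = source
        self.pos = 0  # internal state that every rule must advance

    def t_Number(self, match: re.Match) -> Token:
        # Convert the regular-expression match into a token ...
        token = Token("Number", match.group(0), self.pos)
        # ... and update the tokenizer's internal state, as the
        # documentation says each rule is responsible for doing.
        self.pos = match.end()
        return token
```

The rule assumes sane input, matching the documented contract: deciding whether the match is valid happens before the rule is invoked.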
@@ -67,7 +67,7 @@ field in the YAML table is set to ``true``.
The conversion routines ``replace_wl_with_plain_text`` and
``replace_unicode_with_wl`` use this information to convert between Wolfram's
internal format and standard Unicode, but it should be noted that the
-conversion scheme is more complex than a simple lookup in the YAML table.
+conversion scheme is more complex than a simple lookup in the YAML table.

The Conversion Scheme
---------------------
@@ -97,7 +97,7 @@ ASCII is the following:
replaced by it's Unicode equivalent.
- If a character doesn't have a Unicode equivalent or any of the characters of
it's Unicode equivalent isn't a valid character then the character is
-replaced by it's fully qualified name.
+replaced by it's fully qualified name.

The ``replace_unicode_with_wl`` function converts text from standard Unicode to
Wolfram's internal representation. The algorithm for converting from standard
@@ -123,9 +123,9 @@ tests showed that storing the tables as JSON and using `ujson
way to access them. However, this is merely an implementation detail and
consumers of this library should not rely on this assumption.

-The conversion tables are stored in the ``data/characters.json`` file, along
+The conversion tables are stored in the ``data/character-tables.json`` file, along
side other complementary information used internally by the library.
-``data/characters.json`` holds three conversion tables:
+``data/character-tables.json`` holds three conversion tables:

- The ``wl-to-unicode`` table, which stores the precompiled results of the
Wolfram-to-Unicode conversion algorithm. ``wl-to-unicode`` is used for lookup
@@ -140,7 +140,7 @@ side other complementary information used internally by the library.
when ``replace_unicode_with_wl`` is called.

The precompiled translation tables, as well as the rest of data stored in
-``data/characters.json``, is generated from the YAML table with the
+``data/character-tables.json``, is generated from the YAML table with the
``mathics_scanner.generate.build_tables.compile_tables`` function.

Note that multiple entries in the YAML table are redundant in the following
Expand All @@ -155,4 +155,3 @@ precompiled conversion tables. Such optimization makes the tables smaller and
easier to load. This implies that not all named characters that have a Unicode
equivalent are included in the precompiled translation tables (the ones that
are not included are the ones where no conversion is needed).
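The per-character fallback behavior the documentation describes (use the precompiled Unicode equivalent when one exists, otherwise fall back to the fully qualified ``\[Name]`` form, otherwise leave the character alone) can be sketched as follows. The function name and the shapes of the two mapping arguments are assumptions for illustration, not the library's actual data layout.

```python
def wl_char_to_plain_text(char: str, wl_to_unicode: dict, wl_to_name: dict) -> str:
    """Sketch of the documented conversion fallback (hypothetical helper).

    ``wl_to_unicode`` maps internal WL characters to their Unicode
    equivalents; ``wl_to_name`` maps them to their named-character names.
    """
    if char in wl_to_unicode:
        # Precompiled equivalent from the wl-to-unicode table.
        return wl_to_unicode[char]
    if char in wl_to_name:
        # No usable Unicode equivalent: fall back to the fully
        # qualified named-character form, e.g. "\[DifferentialD]".
        return "\\[" + wl_to_name[char] + "]"
    # No entry needed: the character converts to itself, which is why
    # such redundant entries are omitted from the precompiled tables.
    return char
```

The final branch mirrors the optimization noted above: characters whose conversion is the identity are deliberately absent from the precompiled tables.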

2 changes: 1 addition & 1 deletion mathics_scanner/characters.py
@@ -24,7 +24,7 @@ def get_srcdir() -> str:
ROOT_DIR = get_srcdir()

# Load the conversion tables from disk
-characters_path = osp.join(ROOT_DIR, "data", "characters.json")
+characters_path = osp.join(ROOT_DIR, "data", "character-tables.json")
if os.path.exists(characters_path):
with open(characters_path, "r") as f:
_data = ujson.load(f)
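The guarded load shown in the hunk above can be sketched as a self-contained helper. The function name is hypothetical (the module loads at import time rather than through a helper), and the stdlib-``json`` fallback is an assumption for portability; the library itself uses ``ujson`` for speed.

```python
import json
import os.path as osp

try:
    import ujson as fast_json  # preferred: the docs cite benchmarks favoring ujson
except ImportError:
    fast_json = json  # stdlib json's load() is a drop-in replacement here


def load_character_tables(root_dir: str) -> dict:
    """Hypothetical helper: load data/character-tables.json if present."""
    path = osp.join(root_dir, "data", "character-tables.json")
    if not osp.exists(path):
        # Mirrors the os.path.exists guard in the snippet above.
        return {}
    with open(path, "r") as f:
        return fast_json.load(f)
```

Returning an empty dict when the file is missing is a design choice for this sketch; the real module simply skips loading when the generated table is absent.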
2 changes: 1 addition & 1 deletion mathics_scanner/data/README.rst
@@ -11,4 +11,4 @@ Python programs (via ujson).
Json output is not formatted in any way to facilitate loaded. To see
json output formated use a JSON formatter like ``jq``:

-cat characters.json | jq
+cat character-tables.json | jq
10 changes: 9 additions & 1 deletion mathics_scanner/data/named-characters.yml
@@ -59,7 +59,7 @@
# unicode names that we check against. So if the character
# or unicode symbol is not in that, don't use it here.
#
-# wl-reference: HTML link to the Worlfram Langauge & System document for character.
+# wl-reference: HTML link to the Wolfram Langauge & System document for character.
#
# wl-unicode: The unicode code point used by Mathics internally to represent
# the named character. If it is the same as unicode-equivalent
@@ -2115,6 +2115,10 @@ Digamma:
wl-unicode: "\u03DD"
wl-unicode-name: GREEK SMALL LETTER DIGAMMA

+# The WL symbol displays with a round dot at the left endpoint.
+# The unicode equivalent shows omits this
+# When there is a tag over the edge, WL uses a bold variant
+# of the symbol.
DirectedEdge:
amslatex: "\\rightarrow"
esc-alias: de
@@ -10172,6 +10176,10 @@ UnderParenthesis:
wl-unicode: "\uFE36"
wl-unicode-name: PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS

+# The WL symbol displays with a round dot at each endpoint.
+# The unicode equivalent shows arrows at each endpoint.
+# When there is a tag over the edge, WL uses a bold variant
+# of the symbol.
UndirectedEdge:
ascii: "<->"
esc-alias: ue
85 changes: 0 additions & 85 deletions mathics_scanner/data/operators-intro.yml

This file was deleted.

