Another pass over operator data
mathics_scanner/characters.py:
* Clarify "associativity" field. Check validity of field in tests.
* Remove "actual-precedence" field

Change name:
characters.json -> character-tables.json to match what it is in Mathics3

Makefile: force mathics_scanner/data/operators.json build more often by
including it as a dependency of "build".
rocky committed Nov 27, 2024
1 parent 106d81d commit 14b346f
Showing 15 changed files with 73 additions and 534 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -26,7 +26,7 @@ Test/
_Copies_/
_Database_/
build/
-/mathics_scanner/data/characters.json
+/mathics_scanner/data/character-tables.json
/mathics_scanner/data/operators.json
dist/
tmp
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -13,7 +13,7 @@ repos:
exclude: ChangeLog-spell-corrected.diff
- id: trailing-whitespace
- id: check-json
-exclude: mathics_scanner/data/characters.json
+exclude: mathics_scanner/data/character-tables.json
- repo: https://github.com/pycqa/isort
rev: 5.13.2
hooks:
12 changes: 6 additions & 6 deletions Makefile
@@ -22,7 +22,7 @@ PIP_INSTALL_OPTS ?=
#: Default target - same as "develop"
all: develop

-mathics_scanner/data/characters.json: mathics_scanner/data/named-characters.yml
+mathics_scanner/data/character-tables.json: mathics_scanner/data/named-characters.yml
$(PIP) install -r requirements-dev.txt
$(PYTHON) mathics_scanner/generate/build_tables.py

@@ -31,11 +31,11 @@ mathics_scanner/data/operators.json: mathics_scanner/data/operators.yml
$(PYTHON) mathics_scanner/generate/build_operator_tables.py

#: build everything needed to install
-build: mathics_scanner/data/characters.json
+build: mathics_scanner/data/characters.json mathics_scanner/data/operators.json
$(PYTHON) ./setup.py build

#: Set up to run from the source tree
-develop: mathics_scanner/data/characters.json mathics_scanner/data/operators.json
+develop: mathics_scanner/data/character-tables.json mathics_scanner/data/operators.json
$(PIP) install -e .$(PIP_INSTALL_OPTS)

#: Build distribution
@@ -56,16 +56,16 @@ check: pytest
test: check

#: Build Sphinx HTML documentation
-doc: mathics_scanner/data/characters.json
+doc: mathics_scanner/data/character-tables.json
make -C docs html

#: Remove derived files
clean:
@find . -name *.pyc -type f -delete; \
-$(RM) -f mathics_scanner/data/characters.json mathics_scanner/data/operators.json || true
+$(RM) -f mathics_scanner/data/character-tables.json mathics_scanner/data/operators.json || true

#: Run py.test tests. Use environment variable "o" for pytest options
-pytest: mathics_scanner/data/characters.json
+pytest: mathics_scanner/data/character-tables.json
$(PYTHON) -m pytest test $o

#: Print to stdout a GNU Readline inputrc without Unicode
2 changes: 1 addition & 1 deletion admin-tools/make-tables.sh
@@ -6,5 +6,5 @@ mydir=$(dirname $bs)
PYTHON=${PYTHON:-python}

cd $mydir/../mathics_scanner/data
-$PYTHON ../generate/build_tables.py -o characters.json
+$PYTHON ../generate/build_tables.py -o character-tables.json
$PYTHON ../generate/build_operator_tables.py -o operators.json
19 changes: 9 additions & 10 deletions docs/source/implementation.rst
@@ -19,10 +19,10 @@ class) whose names are preceded by ``t_``, such as in the following example: ::
# Some logic goes here...
pass

-A tokenization rule is supposed to take a regular expression match (the
-``match`` parameter of type ``re.Match``) and convert it to an appropriate
-token, which is then returned by the method. The rule is also responsible for
-updating the internal state of the tokeniser, such as incrementing the ``pos``
+A tokenization rule is supposed to take a regular expression match (the
+``match`` parameter of type ``re.Match``) and convert it to an appropriate
+token, which is then returned by the method. The rule is also responsible for
+updating the internal state of the tokeniser, such as incrementing the ``pos``
counter.

A rule is always expected to receive sane input. In other words, deciding which
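The rule described in the excerpt above can be sketched in Python. The class and method names below are illustrative assumptions, not the library's actual API: a ``t_*`` method receives an ``re.Match``, returns a token, and advances the tokenizer's ``pos`` counter itself.

```python
import re


class Token:
    """Minimal token record (hypothetical, for illustration only)."""

    def __init__(self, tag: str, text: str, pos: int):
        self.tag, self.text, self.pos = tag, text, pos


class TokenizerSketch:
    """Sketch of a tokenizer with one ``t_*`` rule (hypothetical names)."""

    def __init__(self, source: str):
        self.source = source
        self.pos = 0  # internal state that every rule must advance

    def t_Number(self, match: re.Match) -> Token:
        # Convert the regular-expression match into a token ...
        token = Token("Number", match.group(0), self.pos)
        # ... and update the tokenizer's internal state, as the
        # documentation says each rule is responsible for doing.
        self.pos = match.end()
        return token
```

The rule assumes sane input, matching the documented contract: deciding whether the match is valid happens before the rule is invoked.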
@@ -67,7 +67,7 @@ field in the YAML table is set to ``true``.
The conversion routines ``replace_wl_with_plain_text`` and
``replace_unicode_with_wl`` use this information to convert between Wolfram's
internal format and standard Unicode, but it should be noted that the
-conversion scheme is more complex than a simple lookup in the YAML table.
+conversion scheme is more complex than a simple lookup in the YAML table.

The Conversion Scheme
---------------------
@@ -97,7 +97,7 @@ ASCII is the following:
replaced by it's Unicode equivalent.
- If a character doesn't have a Unicode equivalent or any of the characters of
it's Unicode equivalent isn't a valid character then the character is
-replaced by it's fully qualified name.
+replaced by it's fully qualified name.

The ``replace_unicode_with_wl`` function converts text from standard Unicode to
Wolfram's internal representation. The algorithm for converting from standard
@@ -123,9 +123,9 @@ tests showed that storing the tables as JSON and using `ujson
way to access them. However, this is merely an implementation detail and
consumers of this library should not rely on this assumption.

-The conversion tables are stored in the ``data/characters.json`` file, along
+The conversion tables are stored in the ``data/character-tables.json`` file, along
side other complementary information used internally by the library.
-``data/characters.json`` holds three conversion tables:
+``data/character-tables.json`` holds three conversion tables:

- The ``wl-to-unicode`` table, which stores the precompiled results of the
Wolfram-to-Unicode conversion algorithm. ``wl-to-unicode`` is used for lookup
@@ -140,7 +140,7 @@ side other complementary information used internally by the library.
when ``replace_unicode_with_wl`` is called.

The precompiled translation tables, as well as the rest of data stored in
-``data/characters.json``, is generated from the YAML table with the
+``data/character-tables.json``, is generated from the YAML table with the
``mathics_scanner.generate.build_tables.compile_tables`` function.

Note that multiple entries in the YAML table are redundant in the following
Expand All @@ -155,4 +155,3 @@ precompiled conversion tables. Such optimization makes the tables smaller and
easier to load. This implies that not all named characters that have a Unicode
equivalent are included in the precompiled translation tables (the ones that
are not included are the ones where no conversion is needed).
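The per-character fallback behavior the documentation describes (use the precompiled Unicode equivalent when one exists, otherwise fall back to the fully qualified ``\[Name]`` form, otherwise leave the character alone) can be sketched as follows. The function name and the shapes of the two mapping arguments are assumptions for illustration, not the library's actual data layout.

```python
def wl_char_to_plain_text(char: str, wl_to_unicode: dict, wl_to_name: dict) -> str:
    """Sketch of the documented conversion fallback (hypothetical helper).

    ``wl_to_unicode`` maps internal WL characters to their Unicode
    equivalents; ``wl_to_name`` maps them to their named-character names.
    """
    if char in wl_to_unicode:
        # Precompiled equivalent from the wl-to-unicode table.
        return wl_to_unicode[char]
    if char in wl_to_name:
        # No usable Unicode equivalent: fall back to the fully
        # qualified named-character form, e.g. "\[DifferentialD]".
        return "\\[" + wl_to_name[char] + "]"
    # No entry needed: the character converts to itself, which is why
    # such redundant entries are omitted from the precompiled tables.
    return char
```

The final branch mirrors the optimization noted above: characters whose conversion is the identity are deliberately absent from the precompiled tables.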

2 changes: 1 addition & 1 deletion mathics_scanner/characters.py
@@ -24,7 +24,7 @@ def get_srcdir() -> str:
ROOT_DIR = get_srcdir()

# Load the conversion tables from disk
-characters_path = osp.join(ROOT_DIR, "data", "characters.json")
+characters_path = osp.join(ROOT_DIR, "data", "character-tables.json")
if os.path.exists(characters_path):
with open(characters_path, "r") as f:
_data = ujson.load(f)
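The guarded load shown in the hunk above can be sketched as a self-contained helper. The function name is hypothetical (the module loads at import time rather than through a helper), and the stdlib-``json`` fallback is an assumption for portability; the library itself uses ``ujson`` for speed.

```python
import json
import os.path as osp

try:
    import ujson as fast_json  # preferred: the docs cite benchmarks favoring ujson
except ImportError:
    fast_json = json  # stdlib json's load() is a drop-in replacement here


def load_character_tables(root_dir: str) -> dict:
    """Hypothetical helper: load data/character-tables.json if present."""
    path = osp.join(root_dir, "data", "character-tables.json")
    if not osp.exists(path):
        # Mirrors the os.path.exists guard in the snippet above.
        return {}
    with open(path, "r") as f:
        return fast_json.load(f)
```

Returning an empty dict when the file is missing is a design choice for this sketch; the real module simply skips loading when the generated table is absent.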
2 changes: 1 addition & 1 deletion mathics_scanner/data/README.rst
@@ -11,4 +11,4 @@ Python programs (via ujson).
Json output is not formatted in any way to facilitate loaded. To see
json output formated use a JSON formatter like ``jq``:

-cat characters.json | jq
+cat character-tables.json | jq
10 changes: 9 additions & 1 deletion mathics_scanner/data/named-characters.yml
@@ -59,7 +59,7 @@
# unicode names that we check against. So if the character
# or unicode symbol is not in that, don't use it here.
#
-# wl-reference: HTML link to the Worlfram Langauge & System document for character.
+# wl-reference: HTML link to the Wolfram Langauge & System document for character.
#
# wl-unicode: The unicode code point used by Mathics internally to represent
# the named character. If it is the same as unicode-equivalent
@@ -2115,6 +2115,10 @@ Digamma:
wl-unicode: "\u03DD"
wl-unicode-name: GREEK SMALL LETTER DIGAMMA

+# The WL symbol displays with a round dot at the left endpoint.
+# The unicode equivalent shows omits this
+# When there is a tag over the edge, WL uses a bold variant
+# of the symbol.
DirectedEdge:
amslatex: "\\rightarrow"
esc-alias: de
@@ -10172,6 +10176,10 @@ UnderParenthesis:
wl-unicode: "\uFE36"
wl-unicode-name: PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS

+# The WL symbol displays with a round dot at each endpoint.
+# The unicode equivalent shows arrows at each endpoint.
+# When there is a tag over the edge, WL uses a bold variant
+# of the symbol.
UndirectedEdge:
ascii: "<->"
esc-alias: ue
85 changes: 0 additions & 85 deletions mathics_scanner/data/operators-intro.yml

This file was deleted.

