Commit

Merge pull request #278 from xxyzz/dl
Fix `dl` HTML tags can't have other HTML children bug
xxyzz authored Apr 28, 2024
2 parents 21a9316 + cd70bac commit 10bfffb
Showing 3 changed files with 20 additions and 11 deletions.
12 changes: 2 additions & 10 deletions README.md
@@ -53,22 +53,14 @@ cd wikitextprocessor
 python -m venv .venv
 source .venv/bin/activate
 python -m pip install -U pip
-python -m pip install --use-pep517 .
+python -m pip install -e .
 ```
 
-Alternatively, you can install from pypi.org:
-
-```
-python -m pip install wikitextprocessor
-```
-
-If you are installing wiktextract from source, you also need to install wikitextprocessor from source separately; otherwise, a newer wiktextract version will be installed alongside an older pypi version of wikitextprocessor, which will not work out.
-
 ### Running tests
 
 This package includes tests written using the `unittest` framework.
 The test dependencies can be installed with command
-`python -m pip install --use-pep517 -e ".[dev]"`.
+`python -m pip install -e .[dev]`.
 
 To run the tests, use the following command in the top-level directory:
 
2 changes: 1 addition & 1 deletion src/wikitextprocessor/wikihtml.py
@@ -60,7 +60,7 @@
     "del": {"parents": ["phrasing"], "content": ["phrasing"]},
     "dfn": {"parents": ["phrasing"], "content": ["phrasing"]},
     "div": {"parents": ["flow", "dl"], "content": ["flow"]},
-    "dl": {"parents": ["flow"], "content": []},
+    "dl": {"parents": ["flow"], "content": ["flow"]},
     "dt": {
         "parents": ["dl", "div"],
         "close-next": ["dd", "dt"],
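
Note on the change above: a minimal sketch of why an empty `content` list for `dl` would reject a `span` child while `["flow"]` accepts it. This is a toy model, not wikitextprocessor's actual parser logic. The `CATEGORIES` mapping, the `make_table` and `may_contain` helpers, and the `dd` and `span` entries are all assumptions made up for illustration; only the two `dl` entries are taken from the diff.

```python
# Toy model of a tag content table. NOT the library's real implementation.
# CATEGORIES is a simplified, assumed category membership.
CATEGORIES = {
    "flow": {"dl", "div", "span"},
    "phrasing": {"span"},
}


def make_table(dl_content):
    """Build a tiny table; only the `dl` entry mirrors the diff."""
    return {
        "dl": {"parents": ["flow"], "content": dl_content},
        # The `dd` and `span` entries below are assumed for this example.
        "dd": {"parents": ["dl", "div"], "content": ["flow"]},
        "span": {"parents": ["phrasing"], "content": ["phrasing"]},
    }


def may_contain(table, parent, child):
    """Return True if `child` may nest directly inside `parent`."""
    # A child that names the parent tag explicitly is accepted.
    if parent in table[child]["parents"]:
        return True
    # Otherwise the child must belong to a category the parent accepts.
    return any(
        child in CATEGORIES.get(category, set())
        for category in table[parent]["content"]
    )


old_table = make_table([])        # entry before this commit
new_table = make_table(["flow"])  # entry after this commit
```

Under these assumptions, `may_contain(old_table, "dl", "span")` is false and `may_contain(new_table, "dl", "span")` is true, while `dd` is accepted in both cases because it lists `dl` among its parents.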
17 changes: 17 additions & 0 deletions tests/test_parser.py
@@ -2903,6 +2903,23 @@ def test_html_end_tag_slash_after_attr(self):
         self.assertEqual(root.children[2], "\n")
         self.assertEqual(root.children[3].kind, NodeKind.LIST)
 
+    def test_zh_x_html(self):
+        # https://zh.wiktionary.org/wiki/大家
+        # https://zh.wiktionary.org/wiki/Template:Zh-x
+        self.ctx.start_page("大家")
+        root = self.ctx.parse(
+            """<dl class="zhusex"><span lang="zh-Hant" class="Hant">example text</span><dd>translation text</dd></dl>"""  # noqa: E501
+        )
+        span_text = ""
+        dd_text = ""
+        for dl_tag in root.find_html("dl"):
+            for span_tag in dl_tag.find_html("span"):
+                span_text = span_tag.children[0]
+            for dd_tag in dl_tag.find_html("dd"):
+                dd_text = dd_tag.children[0]
+        self.assertEqual(span_text, "example text")
+        self.assertEqual(dd_text, "translation text")
+
 
 # XXX implement <nowiki/> marking for links, templates
 # - https://en.wikipedia.org/wiki/Help:Wikitext#Nowiki
