Release [1.3.0] - 2022-11-28 · CUNY-CL/wikipron

Under `data/`

Added

Big scrape for 2022. (#464)
Added the --fresh flag to data/scrape/scrape.py to facilitate running the big scrape in batches. (#464)
Added the --exclude flag for excluding one or more languages in data/scrape/scrape.py. (#460)
Added data/src/normalize.py. (#356)
Updated README.md. (#360)
Added data/cg/tsv/geo.tsv. (#367)
Added data/morphology. (#369)
Added SIGMORPHON 2021 morphology data. (#375)
Added data/cg/tsv/jpn_hira.tsv. (#384)
Enforced final newlines. (#387)
Added all UniMorph languages to morphology. (#393)
Added data/covering_grammar/tsv/fre_latn_phonemic.tsv. (#398)
Added data/covering_grammar/lib/make_test_file.py. (#396, #399)
Added Komi-Zyrian (kpv). (#400)
Added Makasar (mak). (#415, #419)
Added Zou (zom). (#421)
Added Wiyot (wiy). (#422)
Added Sidamo (sid). (#423)
Added Central Atlas Tamazight (tzm). (#429)
Added Chibcha (chb). (#430)
Added Kashmiri (kas). (#431)
Added Malayalam (mal). (#434)
Added Dhivehi (div). (#437)
Added Akkadian (akk). (#441)
Added Central Nahuatl (nhn). (#443)
Added Etruscan (ett). (#444)
Added Gujarati (guj). (#445)
Added Kannada (kan). (#446)
Added Karelian (krl). (#447)
Added Romagnol (rgn). (#448)
Added Southern Yukaghir (yux). (#449)
Added Urak Lawoi' (urk). (#451)
Added Hausa (hau). (#452)
Added Kashubian (csb). (#453)
Added Tabaru (tby). (#455)
Added West Makian (mqs). (#457)
Added Amharic (amh). (#458)
Added Livvi (olo). (#459)
Added Kalmyk (xal). (#472)
Added Ternate (tft). (#473)
Added Abkhaz (abk). (#474)
Added Farefare (gur). (#475)
Added Iban (iba). (#476)
Added Laz (lzz). (#477)

Changed

Switched to ISO 639-3 language codes. (#468)
Updated scraped data in preparation for the SIGMORPHON 2022 shared task:
swe nno ger dut ita rum ukr bel tgl ceb ben asm per pus tha lwl. (#461)
Made scripts under data/frequencies/ and data/morphology/ more flexible,
especially for the purposes of preparing data for a shared task. (#461)
Fixed the --restriction flag for specifying multiple languages in data/scrape/scrape.py. (#460)
Added covering grammar coverage error log and specified error_type in error_analysis.py. (#424)
Added error log writing in error_analysis.py. (#420)
Added new columns in summary tables. (#365)
Fixed broken paths in data/src/generate_phones_summary.py and in
data/phones/HOWTO.md. (#352)
Added Atong (India) (aot). (#353)
Added Egyptian Arabic (arz). (#354)
Added Lolopo (ycl). (#355)
Fixed Unicode normalization in data/phones/slv_phonemic.phones and
re-scraped Slovenian data. (#356)
Updated data/phones/HOWTO.md to include instructions on applying
NFC Unicode normalization. (#357)
Updated data/src/normalize.py to be more efficient. (#358)
Fixed inaccuracies in data/phones/geo_phonemic.phones. (#367)
Fixed typo in data/cg/tsv/geo.tsv and added missing character. (#370)
Morphology URLs are now provided as a list. (#376)
Configured and scraped Yamphu (ybi). (#380)
Configured and scraped Khumi Chin (cnk). (#381)
Made summary generation in common_characters.py optional. (#382)
Fixed phone counting in data/src/generate_phones_summary.py. (#390, #392)
Reorganized scraping scripts under data/scrape. (#394)
Reorganized .phones files and related scripts under data/phones. (#395)
Reorganized CG files and related scripts under data/covering_grammar. (#395)
Reorganized data/phones/phones/fre_phonemic.phones. (#398)
Removed data/src/. (#401)
Renamed TSV files and phonelists to use the terms "broad"/"narrow" instead
of "phonemic"/"phonetic". (#389, #402, #405)
Fixed typo in README.md. (#407)
Fixed column ordering of the test file read by the script in
data/covering_grammar/lib/error_analysis.py. (#411)
Fixed common character collection in common_characters.py. (#419)
Fixed scraping test for blt. (#436)
Changed URLs to point at CUNY-CL repo, where applicable. (#438)
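
Several entries above concern NFC Unicode normalization of the phone lists and scraped data (#356, #357). As a minimal sketch of what that normalization does, and not a reproduction of the removed data/src/normalize.py, the following uses Python's standard `unicodedata` module; the helper name `to_nfc` is hypothetical.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition).

    Hypothetical helper for illustration; not taken from the wikipron codebase.
    """
    return unicodedata.normalize("NFC", text)


# "e" followed by U+0301 COMBINING ACUTE ACCENT is two code points...
decomposed = "e\u0301"
# ...which NFC composes into the single code point U+00E9.
composed = to_nfc(decomposed)
print(len(decomposed), len(composed))  # prints: 2 1
```

Normalizing both the phone lists and the scraped pronunciations to the same form means that visually identical segments compare equal when they are counted or matched.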

Under `wikipron/` and elsewhere

Added

Added ckb in languagecodes.py. (#464)
Added support for Python 3.10. (#462)
Added test of phones list generation in test_data/test_summary.py. (#363)
Added Min Nan extraction function. (#397)
Added Tai Dam extraction function, configuration and initial scrape. (#435)
Added test of casefold value for languages in data/scrape/lib/languages.json. (#442)
Added support for Python 3.11. (#479)
Added checks for the Python source distribution and wheel on CI. (#479)
Turned on tests for Windows on CI. (#479)

Removed

Dropped support for Python 3.6. (#462)
Dropped support for Python 3.7. (#479)

Changed

Switched to ISO 639-3 language codes. (#468)
Converted setup.py to pyproject.toml. (#479)
