docs: add errata, and documentation for BCP47 parser
thunderpoot committed Jul 7, 2024
1 parent 277efca commit 2ed611a
Showing 5 changed files with 76 additions and 26 deletions.
57 changes: 56 additions & 1 deletion README.md
@@ -9,7 +9,7 @@

`isogloss` is a Python–based command–line tool designed for looking up language details based on [ISO 639](https://www.iso.org/iso-639-language-code) codes and IETF ([BCP-47](https://www.rfc-editor.org/info/bcp47)) language tags. It provides comprehensive information about languages, including their names, native names, and additional details associated with each code or tag.

There is also a [web–based version here](https://thunderpoot.github.io/isogloss).
There is also a [web–based version here](https://thunderpoot.github.io/isogloss). The [BCP47 parser](https://thunderpoot.github.io/isogloss/bcp-index.html) has some known issues, documented below in the "Errata" section.

Elsewhere, [the word isogloss](https://en.wikipedia.org/wiki/Isogloss) means a boundary line on a map denoting the regional use of a particular linguistic characteristic, but in this case it just seemed to fit.

@@ -224,6 +224,61 @@ $ isogloss/isogloss.py -i ar-ajp-apc-apd-Arab-CV-arevela-g-231243-r-sdarre-x-pri
- `data/script_codes.json`: Contains script code data in JSON format used for the BCP47 lookup.
- `data/deprecated-639-3.csv`: Contains deprecated ISO 639-3 codes in CSV format, for quick reference.

## Errata

There are known issues with the BCP47 parser in the web interface. It validates input with regular expressions, and the examples below show what it accepts, what it rejects, and where parsing goes wrong.

### Examples of valid tags:

- `en`

- `fr-CA`

- `i-klingon`

- `az-Arab-IR`

- `sr-Cyrl-RS`

- `zh-cmn-Hans`

- `ja-JP-x-tokyo`

- `uz-Cyrl-UZ-1992`

- `bo-Tibt-x-dialect`

- `zh-cmn-Hans-CN-x-private1`

- `hy-Latn-IT-arevela-x-test`


### Examples of invalid tags (malformed):

- `en-GB-oed-x-private`

- `de-CH-1901-co-phonebk-sc-gothic-x-bavaria`

(and more)
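
For reference, the accept/reject behaviour above can be approximated with a single well-formedness check. The following is a minimal sketch based on the grammar in RFC 5646, not the regular expressions the web interface actually uses; the names `IRREGULAR`, `PRIVATE_ONLY`, `LANGTAG` and `isWellFormed` are hypothetical. It only checks the shape of a tag and does not look subtags up in the registry or reject duplicate variants.

```javascript
// Sketch only: hypothetical names, not code from docs/bcp-app.js.

// Irregular grandfathered tags listed in RFC 5646 (compared in lower case).
const IRREGULAR = new Set([
  'en-gb-oed', 'i-ami', 'i-bnn', 'i-default', 'i-enochian', 'i-hak',
  'i-klingon', 'i-lux', 'i-mingo', 'i-navajo', 'i-pwn', 'i-tao',
  'i-tay', 'i-tsu', 'sgn-be-fr', 'sgn-be-nl', 'sgn-ch-de',
]);

// A tag made only of private-use subtags, e.g. "x-whatever".
const PRIVATE_ONLY = /^x(?:-[a-z0-9]{1,8})+$/i;

// language[-extlang]*[-script][-region][-variant]*[-extension]*[-privateuse]
const LANGTAG = new RegExp(
  '^(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})' + // language, up to 3 extlangs
  '(?:-[a-z]{4})?' +                               // script
  '(?:-(?:[a-z]{2}|[0-9]{3}))?' +                  // region
  '(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*' +     // variants
  '(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*' +         // extensions (singleton + subtags)
  '(?:-x(?:-[a-z0-9]{1,8})+)?$',                   // private use
  'i'
);

function isWellFormed(tag) {
  return IRREGULAR.has(tag.toLowerCase()) ||
    PRIVATE_ONLY.test(tag) ||
    LANGTAG.test(tag);
}
```

Under this sketch, `isWellFormed('hy-Latn-IT-arevela-x-test')` is `true`, while `isWellFormed('en-GB-oed-x-private')` is `false` because `oed` is too short to be a variant and the full string is not itself a grandfathered tag.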

### Examples of inputs that reveal parsing bugs:

- `ca-valencia-nedis`
(Highlighted input section is missing "valencia")

- `en-US-u-islamcal`
(Variant "u" and Extension "islamcal", Extension section says "u - islamcal")

- `es-419-fonipa`
(Extended languages blank)

- `de-Latf-1901`
(Region undefined)

- `sl-rozaj`
(rozaj is coloured differently in the result container to how it is in the highlighted input section)


## Contributing

Contributions, issues, and feature requests are welcome!
39 changes: 16 additions & 23 deletions docs/bcp-app.js
@@ -78,8 +78,8 @@ function parseIetfTag(tag) {
extLangs: [],
script: null,
region: null,
variant: null,
extension: null,
variants: [],
extensions: [],
private: null
};
let idx = 1;
@@ -103,26 +103,17 @@ function parseIetfTag(tag) {
}

// Variants and extensions
const variantList = [];
const extensionList = [];
while (idx < components.length) {
if (components[idx].startsWith('x')) {
result.private = components.slice(idx).join('-');
break;
} else if (components[idx].startsWith('r') || components[idx].startsWith('g')) {
extensionList.push(components[idx]);
} else if (/^[a-wy-z0-9]$/.test(components[idx])) {
result.extensions.push(components[idx] + '-' + components[idx + 1]);
idx += 2;
} else {
if (!result.variant) {
result.variant = components[idx];
} else {
extensionList.push(components[idx]);
}
result.variants.push(components[idx]);
idx += 1;
}
idx += 1;
}

if (extensionList.length > 0) {
result.extension = extensionList.join('-');
}

return result;
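
The new loop in this hunk pairs each singleton with exactly one following subtag and treats any subtag that starts with `x` as the beginning of private use. As a rough sketch of a stricter split, assuming the leading language, script and region subtags have already been consumed, the tail could be handled as below. `splitTail` is a hypothetical helper and is not part of `docs/bcp-app.js`.

```javascript
// Sketch only: hypothetical helper, not code from docs/bcp-app.js.
// Takes the subtags that follow language/script/region and sorts them
// into variants, extensions and private use.
function splitTail(components) {
  const result = { variants: [], extensions: [], private: null };
  let idx = 0;
  while (idx < components.length) {
    const part = components[idx].toLowerCase();
    if (part === 'x') {
      // Private use: the "x" singleton plus everything after it.
      result.private = components.slice(idx).join('-');
      break;
    }
    if (/^[a-wy-z0-9]$/.test(part)) {
      // Extension: a singleton followed by one or more 2-8 character subtags.
      let end = idx + 1;
      while (end < components.length && /^[a-z0-9]{2,8}$/i.test(components[end])) {
        end += 1;
      }
      if (end > idx + 1) {
        result.extensions.push(components.slice(idx, end).join('-'));
      }
      // A singleton with no subtags is silently skipped in this sketch.
      idx = end;
      continue;
    }
    // Anything else is treated as a variant.
    result.variants.push(components[idx]);
    idx += 1;
  }
  return result;
}
```

For example, `splitTail(['1901', 'u', 'co', 'phonebk', 'x', 'bavaria'])` yields one variant (`1901`), one extension (`u-co-phonebk`) and the private-use part `x-bavaria`. Matching `x` exactly rather than with `startsWith` keeps a variant that merely begins with the letter x from being read as private use.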
@@ -258,14 +249,16 @@ function processTag() {
if (parsedTag.region) {
html += `<div class="section region"><strong>Region:</strong> ${regionData[parsedTag.region]}</div>`;
}
// Process Variant
if (parsedTag.variant) {
html += `<div class="section variant"><strong>Variant:</strong> ${parsedTag.variant}</div>`;
// Process Variants
if (parsedTag.variants.length > 0) {
const variantsHtml = parsedTag.variants.map(variant => `<li>${variant}</li>`).join('');
html += `<div class="section variant"><strong>Variants:</strong> <ul>${variantsHtml}</ul></div>`;
}

// Process Extension
if (parsedTag.extension) {
html += `<div class="section extension"><strong>Extension:</strong> ${parsedTag.extension}</div>`;
// Process Extensions
if (parsedTag.extensions.length > 0) {
const extensionsHtml = parsedTag.extensions.map(extension => `<li>${extension}</li>`).join('');
html += `<div class="section extension"><strong>Extensions:</strong> <ul>${extensionsHtml}</ul></div>`;
}

// Process Private Use
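
The variant and extension blocks in this hunk build the same `<div>`/`<ul>` markup and interpolate subtags straight into the HTML string. One possible way to share that logic is sketched below; `escapeHtml` and `renderListSection` are hypothetical helpers, not part of `docs/bcp-app.js`, and the escaping step is an added assumption in case the subtags come from unvalidated user input.

```javascript
// Sketch only: hypothetical helpers, not code from docs/bcp-app.js.

// Escapes the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds one result section as a list; returns '' when there is nothing to show.
function renderListSection(cssClass, label, items) {
  if (items.length === 0) return '';
  const itemsHtml = items.map(item => `<li>${escapeHtml(item)}</li>`).join('');
  return `<div class="section ${cssClass}"><strong>${label}:</strong> <ul>${itemsHtml}</ul></div>`;
}
```

With these helpers, the two blocks would reduce to something like `html += renderListSection('variant', 'Variants', parsedTag.variants);` and the equivalent call for extensions.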
@@ -292,7 +285,7 @@ async function fetchLanguages() {

function loadPage() {
fetchLanguages();
// fetchLatestCommit();
fetchLatestCommit();
}

window.onload = loadPage;
1 change: 1 addition & 0 deletions docs/bcp-index.html
@@ -20,6 +20,7 @@ <h2>BCP47 Parser</h2>
<span><a rel="nofollow" href="https://github.com/thunderpoot/isogloss?tab=MIT-1-ov-file#readme" target="_blank">Licence</a></span> |
<span><a rel="nofollow" href="https://www.iso.org/iso-639-language-code" target="_blank">ISO</a></span> |
<span><a rel="nofollow" href="https://sil.org/" target="_blank">SIL</a></span> |
<span><a rel="nofollow" href="./" target="_blank">639</a></span> |
<span id="dark-mode-toggle" class="dark-mode-toggle" onclick="toggleDarkMode()">Dark</span>
</div>
</footer>
4 changes: 2 additions & 2 deletions docs/bcp-style.css
@@ -232,8 +232,8 @@ button:hover {
gap: 5px;
}

.warning { color: #b44545; }
body.dark-mode .warning { color: #e87474; }
.warning { color: #b49f45; }
body.dark-mode .warning { color: #e8d974; }

.primary-language {
background-color: var(--primary-color);
1 change: 1 addition & 0 deletions docs/index.html
@@ -65,6 +65,7 @@
<span><a rel="nofollow" href="https://github.com/thunderpoot/isogloss?tab=MIT-1-ov-file#readme" target="_blank">Licence</a></span> |
<span><a rel="nofollow" href="https://www.iso.org/iso-639-language-code" target="_blank">ISO</a></span> |
<span><a rel="nofollow" href="https://sil.org/" target="_blank">SIL</a></span> |
<span><a rel="nofollow" href="./bcp-index.html" target="_blank">BCP47</a></span> |
<span id="dark-mode-toggle" class="dark-mode-toggle">Dark</span>
</div>
</footer>