-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Liblouis table for Dutch #6
Comments
@dkager See branch "dkager_dutch" |
Thanks. Should we update nl-BE or do a new set of tests for nl-NL? Would be nice to keep them in sync, or to have just one set of tests because these tables are almost identical. |
Try extending nl_BE. If all the tests in there are valid for the Netherlands (Dedicon) rename it to nl. Otherwise make a new test file for nl_NL and we can get them in sync again later. |
I've reorganized the tests somewhat and added examples from the official documentation. There still isn't a huge lot of data. I'm not sure what else can be added, these tests cover most of the corner cases. I still have to actually run these, but one omission I can spot right away is the special braille notation for minutes and seconds, i.e.: |
These are things that liblouis has difficulty with, because it knows little about the context, except for the actual text you give it (and typeform info). You could make a rule that finds occurrences of the pattern "digit-digit-apos-digit-digit-apos-apos" and translate it in a special way. For such simple heuristics this approach works, but if you need to do advanced text analysis, or if you want to mark up minutes and seconds in XML, you need something besides liblouis. |
I don't think we get a lot of books where this shows up, so not very high priority just yet. |
The new data looks very good. It's not a problem if it's not a lot, if it covers everything it covers everything. I noticed you deleted some tests. In almost all cases you replaced them with an equivalent test but in one or two cases you didn't (e.g. "SLAAT DE VLAM IN DE PAN..."; I know it's kind of the same as "MEER DAN DRIE WOORDEN..." but not completely because the last uppercase word doesn't end with a period). I want to avoid touching other people's tests as much as possible so if there's no good reason to delete a test I think we should leave it. I'm gonna add back the ascii-output because it's useful for readability. I can do that with a script. |
Can you also re-add the all-capitals sentence? I'm not a big fan of the ascii-input because it's redundant and to me (as a braille reader) the symbols don't make much sense. Could you wait with the ascii-input until I've finished the test data? It makes editing somewhat easier for me as I don't know the ASCII code liblouis uses. Also, I noticed that other harness tests (fi_harness.txt) don't have ASCII? |
I know, that's why I added the Unicode alongside with the ASCII. The Unicode is used by the test harness, the ASCII is for making it readable for sighted people (documentation purpose). I'll wait until you've finished though, then I only have to run the conversion script once. |
So we need support for this that is similar to how emphasized text is handled? I'll look at the Greek capital letters. They are used in maths so it is important that they work. |
Yes, that support is already implemented by Michael Gray, but his patch isn't merged into master yet. See liblouis#50.
I think the nl-NL table does Greek capitals different then the nl-BE table. |
Forgot to bring this up re: commit 1a92520. Ignoring unknown keys in the JSON file sounds like a good idea, but I think it has to be an option. Right now if you want to "test the tests" for syntax errors you can do so. If you ignore unknown keys, you can't. So maybe add an option to the harness runner so that we can have both output and ascii-output keys. |
Okay sounds reasonable. Maybe call it |
Not entirely sure why it's failing in the first place since no explicit format validation is performed except for checking the input is valid JSON. Something to check next week. :) |
The validation happens implicitly when constructing a |
The nl-NL table has a few issues that should be easy to fix:
Furthermore, the emphasis tests are failing. For one of them it looks like expected == received, yet the test fails with a braille difference error. I hope I'm not overlooking something! There are also some accented letters that aren't in the standard. Need to figure out what to do with those. See eb4ef4d. What's the best way to proceed after I fixed nl-NL? Rename the test harness to nl-NL-g1_harness.txt and associate it with the nl-NL table? |
Don't attach too much importance to the emphasis tests. Emphasis in liblouis is currently broken. For that one test that seems to have the correct output, the expected output had some blank braille patterns instead of regular spaces. I've corrected it. For things that are not in the standard such as those accented letters maybe you could make a separate table? Are Greek letters in the standard by the way? Copy the table you want to edit to nl and rename the test to nl. We should probably name the table g0 instead of g1 (see liblouis#16 (comment)). |
Should we do xfails for those?
The standard describes the first three letters in an example. This matches what nl-NL does, except for the letter beta which was missing dot 1 (will push a fix in a moment). The definitions in nl-NL are the same as in the example, but I'll check if Dedicon is maybe using something else.
It's also not really contracted (.ctb versus .utb). Also, is the patch for the MORE THAN THREE WORDS in capitals going to be included in the next release? We now have an xfail for this, but it's important to resolve the problem to get proper Dutch braille output. |
Should we do xfails for emphasis tests: yes. The extension should probably be changed to .utb, yes. The patch which fixes emphasis and adds support for indicating phrases of capitalized words will not be in the June release. We've targeted September. |
September is okay, as long as it’s in there before delivering the system. :) |
It's in the branch https://github.com/MikeGray-APH/liblouis/commits/mrg_ueb_update, in case you already want to try it out. |
This is pretty much done now. What remains:
|
About the FIXMEs in nl-NL-chardefs.cti: they refer to the standard "World Braille Usage (3rd edition)": https://cdn.rawgit.com/liblouis/braille-specs/master/world-braille-usage-third-edition.pdf |
By the way I put those FIXMEs in there. They're mostly about differences between NL and BE that need to be sorted out. |
There are now quite a few tests that are failing (see dkager_dutch_ueb). For next week: find out why, and fix it while not breaking the nocaps table. |
The Dutch braille standard 2005 does not specify that the plus sign doesn't cancel a capitalized word. But the Dutch liblouis table does treat it that way. There is a test: Is there any additional documentation that specifies this or is it an organization-specific choice? Maybe we should discuss this in the process of unifying the Belgium and Netherlands tables. |
Yes I probably based my implementation on the test that I got, and didn't check the documentation. |
Related issues:
The text was updated successfully, but these errors were encountered: