Liblouis table for Norwegian #4
@usama49 has been creating tests for Norwegian based on examples in the Norwegian braille documentation, and I've converted them to the liblouis test format here: There are still some tests missing. The tests do not succeed yet. We will talk to Lars Bjørndal about the edits we've made to the tables to make sure they're ok. Note that we've included some markup in some of the input strings such as We've also gone through the table and updated it here: https://github.com/josteinaj/liblouis/blob/nlb-norwegian-table-corrections/tables/no-no-g0.utb
|
Great job! The tests look very thorough. Good idea to coordinate with Lars. Because your changes are backed by the official standard you included, I'm tempted to give you the benefit of the doubt. Still, it's good to talk to Lars, because some differences may come from different interpretations of the standard.
Where to test <em> and <strong> depends on the level at which we are going to implement it. It could be done in liblouis; in that case you have to include a "typeform" argument in the tests (see e.g. http://snaekobbi.github.io/liblouis-table-spec/#lenitalphrase, '1' means bold, '2' means italic, ...). It could also be done on a higher level, i.e. in a translator that makes use of liblouis but does some additional things, usually implemented in Java or XSLT. In order to support <em> and <strong> in the German translation, Christian currently does some extra processing in XSLT before and after the liblouis translation. He does that because the emphasis support was insufficient for German and also buggy at the time he wrote his translator. But there has been a lot of work on emphasis in liblouis lately, so maybe we could try that for Norwegian.
The idea for this repository was that all of us could work together on branches without having to give everybody commit access to liblouis/liblouis. I suggest we first push your branch here, and then when it's ready make a PR to liblouis/liblouis.
I'm not sure whether all tests should succeed. It's better to have a few tests that fail than no tests at all. I know I have some failing tests in my files. Ideally there should be some XFAIL flag on failing tests, but I don't think we have that right now. @egli WDYT? |
Interesting. I noticed @jukkae used typeform for some of the Finnish tests but I didn't know what it was. In Norwegian braille it's not strictly defined which sign to use for em, strong and underline, so we may want to choose which signs to use based on a pre-processing step. Hopefully we can avoid it by assigning separate signs for em/strong/underline, but I'll have to consult some braille experts on the matter. In Sweden they apparently have separate signs, but we don't.
Ok, I see. That makes sense. I'll move the branch.
Isn't xfail only for tests that should fail? How about making a test "pending"/"disabled" instead? |
I moved stuff out of my own liblouis fork and into this repository today: https://github.com/snaekobbi/liblouis/tree/nlb-norwegian-table-corrections https://github.com/snaekobbi/liblouis/tree/formal_braille_spec |
If you say "...we may want to choose which signs..." do you mean that for different documents the same sign may have a different meaning? Are you suggesting a pre-processing step that analyses the document and then decides which signs to use for which semantics? Which signs to use for which type of emphasis could possibly also be specified in CSS somehow, if that would be a requirement. XFAIL is for tests that you know are failing, e.g. because of some known bug, and for which you want to get notified when they suddenly pass. |
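To make the XFAIL semantics described above concrete, here is a minimal sketch using Python's standard `unittest` module rather than the liblouis harness. The `translate` function is a hypothetical stand-in, not a liblouis call. An expected failure does not break the run, and if the bug gets fixed the test shows up as an unexpected success.

```python
import unittest


def translate(text):
    # Hypothetical stand-in for a table-driven translator; not liblouis.
    return text.lower()


class TranslationTests(unittest.TestCase):
    def test_letters(self):
        self.assertEqual(translate("ABC"), "abc")

    @unittest.expectedFailure
    def test_digits_known_bug(self):
        # Known bug in the stand-in: digits get no number sign yet,
        # so this assertion fails and is recorded as an expected failure.
        self.assertEqual(translate("1"), "#a")


def run_suite():
    """Run the tests and return the unittest.TestResult for inspection."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TranslationTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Inspecting `result.expectedFailures` and `result.unexpectedSuccesses` after `run_suite()` gives the "notify me when it suddenly passes" behaviour, which a plain "pending"/"disabled" flag would not.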
Yes. Possibly. But hopefully not. I think we can avoid having to do so.
Right, that might work for us.
Ok, I didn't know that. |
We're close to done for transformation tests. Are there any examples of formatting tests? I expect those will not be part of the liblouis tests? |
No, formatting tests are not part of the liblouis tests because liblouis is only the translation engine (or, we only use it for translation). Tests for formatting are part of other tasks. |
I've updated the test file. 66 of 170 tests fail. |
@bertfrees unfortunately, for the sucky harness tests we do not have xfail afaik |
@josteinaj: I have studied the I have updated the documentation: http://snaekobbi.github.io/liblouis-table-spec/#litdigit |
Interesting. Thanks for looking into this! |
@usama49 @josteinaj In commit 52cfc50 I've added Has either of you been able to set up the development environment and run the test suite? Or are you willing to learn? I would like you guys to work on the liblouis table yourself as much as possible. Of course I will assist. |
I'm able to run We can go through the errors and see if we are able to fix them by modifying the table, but I expect we'll have to learn the table format in more depth than simple character substitutions, so I don't know how fast we'll be able to do it. I'd like to prioritize what little time I have on the DP2 script. |
You don't have to install the Python bindings if you run the test with I've managed to run the tests on Windows once but it was quite a pain and I think you'd be much better off using a VM with Linux. The main reason I wanted one of you to work on the table is that you can judge much better whether a test is failing because of the table or because of the test (e.g. a typo), and which tests deserve priority. The other reason is to pass on some of the workload. Of course I would help you with fixing specific things in the table, I wouldn't just let you struggle all by yourself. Another approach could be that I send you the test report and then you check the failing tests for correctness and send me a list of things I should fix, in order of importance. |
Sure, I understand; perfectly reasonable. We'll go through and try to fix as much as we can ourselves and let you know when we need help. Since I'm able to run the tests on my laptop we don't need you to send the reports.
If I run |
Try |
|
Thanks. I think the problem is that some rules are hardcoded in C (in compileTranslationTable.c), which means they are always defined first and can therefore not be overwritten:
You can however define in a display file how "a" and "1b" are to be represented. So one possible solution to problem 1 would be to include Problem 2 isn't a problem because nbsp characters in the braille output are supported. In fact the DP framework even requires that nbsp are preserved, so in the DP code I dynamically add the rule Problem 3 I will have a look at soon. |
@snaekobbi/nlb I have managed to fall back to the 6-dot definition if a character is not defined in the 8-dot table, but now I get this failure:
|
…table by splitting off the part of no-no-g0.utb that needs to be included in no-no-8dot.utb as new file no-no-chardefs6.uti see issue #4
@usama49 said:
When liblouis detects an email address, URL, file path etc., it does not only put dot pattern 126 at the start and 345 at the end, but it also switches to "computer braille mode" within that string. In computer braille mode, numbers are not preceded by the digit sign. Is this undesired behavior? P.S.: other translation rules such as contraction rules will also not work in these kinds of strings. Is this a problem too? |
@usama49 said:
Pre-processing that would mean that you would mark up the whole number "1 1/2" with e.g. Or I can just match cases like "1 1/2" with a context rule:
What do you think, would we get any false positive matches with this approach? And would we cover everything with this rule? I doubt this would properly back-translate, but I guess that's not a concern for us. |
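As a way to explore the false-positive question outside liblouis, here is a small Python sketch. It is not the context rule itself and does not use liblouis syntax; the regular expression is an assumption about what "cases like 1 1/2" means (a whole number, one space, then numerator/denominator).

```python
import re

# Whole number, a single space, then numerator/denominator, e.g. "1 1/2".
# The lookarounds keep the match from starting or ending inside a longer
# run of digits or slashes (such as a date like 1/2/2015).
MIXED_NUMBER = re.compile(r"(?<![\d/])(\d+) (\d+)/(\d+)(?![\d/])")


def find_mixed_numbers(text):
    """Return every substring that looks like a mixed number."""
    return [m.group(0) for m in MIXED_NUMBER.finditer(text)]
```

Running this over a corpus of real text would show how often a number that merely precedes a fraction (for example "room 12 3/4 full") is picked up, which is the kind of false positive asked about above.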
@bertfrees I pushed a fix for the "6-dot fallback"-test. However, there's an issue which I don't know whether or not we need to fix (or even can fix); the 6-dot capital letters do not have the @KariRudjord: for capital letters not defined in the 8-dot standard; do you think we need to prefix them with a |
@bertfrees your fix for mixed numbers in the table solves the test; thanks! I don't know if it would give any false positives. I've added the fix in any case. For e-mails, the examples provided to us for how this should be done indicate that there should be a number sign (
|
So basically, email addresses should translate like other text, except that they should have start and end indicators? Liblouis assumes that if start and end indicators are inserted, the text in between should actually have a different mode (computer braille), otherwise the indicators wouldn't indicate anything. I suppose we have to implement this outside of liblouis. If we do the same detection that liblouis does, and just insert the indicators, that should do it, right? |
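A rough sketch of what such a pre-processing step could look like, in Python. Everything here is an assumption for illustration: the e-mail pattern is a simple approximation and not the detection liblouis performs, and the two private-use characters are hypothetical placeholders that a later step would have to map to the start and end indicators (dots 126 and 345).

```python
import re

# Rough e-mail pattern; an assumption, not the detection liblouis performs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical placeholder characters for the start and end indicators.
START, END = "\uE000", "\uE001"


def mark_emails(text):
    """Wrap each e-mail-like token in START/END placeholders."""
    return EMAIL.sub(lambda m: START + m.group(0) + END, text)
```

The text between the placeholders is left untouched, so it would still go through the normal translation rules (including the number sign) instead of computer braille mode.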
My guess is that you should add dot 7 for capitals, so "Ờ" would become "4-1357". |
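Since there is no opcode for this, the derived definitions would have to be generated. A minimal Python sketch of that generation step follows; it assumes the lowercase pattern for "ờ" is 4-135 and that dot 7 goes on the last cell, which is what the "4-1357" example suggests. The helper names are made up for illustration.

```python
def add_dot7(pattern):
    """Add dot 7 to the last cell of a dot pattern such as "4-135"."""
    cells = pattern.split("-")
    if "7" not in cells[-1]:
        # Keep the dot numbers in ascending order, e.g. "135" -> "1357".
        cells[-1] = "".join(sorted(cells[-1] + "7"))
    return "-".join(cells)


def to_unicode_braille(pattern):
    """Render a dot pattern as Unicode braille (U+2800 block).

    Dot n corresponds to bit n-1 of the offset from U+2800.
    """
    return "".join(
        chr(0x2800 + sum(1 << (int(dot) - 1) for dot in cell))
        for cell in pattern.split("-")
    )
```

For example, `add_dot7("4-135")` gives `"4-1357"`, and `to_unicode_braille` can be used to eyeball the result.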
Although dot 4 also means "COMMERCIAL AT". |
You're right, that seems consistent with the rest of the table. Is it possible to "add a dot 7" with some opcode? |
No :) If I remember correctly @BueVest had this idea. In Danish there exists such a thing as 8-dot contracted braille, and it would be useful to be able to add dot 7 to contractions as well. You could also do this by making a |
Yeah, I thought that might be the solution. Not very difficult to make with search'n'replace, it just seems kind of unnecessary. As you pointed out, dot 4 already means |
OK, please ask her. If it's even possible to make sense of the 8-dot standard. |
TODO: maybe add http://www.punktskriftutvalget.no/nyheter/ny-punkttabell-for-jaws to liblouis/braille-specs? |
We think alike, I've committed it locally to snaekobbi/braille-spec :) I don't have push access though... |
Yes. I just got this confirmed from @KariRudjord .
Whatever is easiest. Is this a mod-nlb thing? Should we create an issue for it there? |
Yes it's a mod-nlb thing. Yes please create an issue, then I'll take care of it when I have time. (Of course you are always free to have a look as well, but I already have a good idea about how to do it, so maybe discuss first with me.) |
See liblouis#140 |
See email from Jamie on liblouis ML: Norwegian 8 dot tables |
@bertfrees I can have a look and split it into three tables in the nlb-norwegian-table-corrections branch of this repo. |
OK. Not sure about the middle one either. |
Main thing is that we delete the old one and replace it with a module that the other one can include. |
Have a look at this one: The diff looks long but there aren't really many changes. The old 8-dot table was identical to the new one (except two additions to the non-computer-braille list). There are now three tables: 8-dot computer braille, 8-dot braille, and 8-dot braille + fallback. Three tables is probably fine; everyone can choose which tables to expose themselves. |
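The "8-dot braille + fallback" idea can be pictured as a layered lookup. The sketch below is plain Python, not liblouis table syntax, and the character definitions are tiny hypothetical samples. It relies on the first definition winning, similar to how earlier rules take precedence in a table.

```python
from collections import ChainMap

# Hypothetical, tiny character definitions; the real tables are far larger.
EIGHT_DOT = {"a": "1", "A": "17"}
SIX_DOT = {"a": "1", "b": "12", "ờ": "4-135"}

# Lookups try the 8-dot definitions first and fall back to 6-dot.
WITH_FALLBACK = ChainMap(EIGHT_DOT, SIX_DOT)


def lookup(char, table):
    """Return the dot pattern for char, or None if it is undefined."""
    return table.get(char)
```

Keeping the fallback as a separate layer matches the three-table split: a consumer that wants strict 8-dot output uses `EIGHT_DOT` alone, while one that prefers some output over none uses `WITH_FALLBACK`.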
Perfect! |
It seems the commit with the table cleanup wasn't included in the 2.6.5-release of liblouis, so I created a new branch off liblouis/master with that commit and created a PR for it: liblouis#150 |
Yes I think that change must have been after the release. Thanks for the PR. |
PR: liblouis#144
See also: