Liblouis table for Norwegian #4

bertfrees · 2014-12-23T17:51:47Z

include some official braille code specification document
write tests based on the specification
validate the existing braille table for Norwegian
improve the existing table or write a new one from scratch
include in liblouis upstream (merge?)

See also:

Norwegian specifications and guidelines liblouis/liblouis#67

josteinaj · 2015-04-22T15:26:04Z

@usama49 has been creating tests for the Norwegian language based on examples in the Norwegian braille documentation, and I've converted them to the liblouis test format here:

https://github.com/josteinaj/liblouis/blob/nlb-norwegian-table-corrections/tests/harness/no_harness.txt

There are still some tests missing, and the tests do not pass yet. We will talk to Lars Bjørndal about the edits we've made to the tables to make sure they're OK.

Note that we've included some markup in some of the input strings, such as  and . I assume this is not the right way to do it, but we've used them as placeholders for now. How should we write such tests? Is this a formatting concern that we should write Dotify tests for instead?

We've also gone through the table and updated it here:

https://github.com/josteinaj/liblouis/blob/nlb-norwegian-table-corrections/tables/no-no-g0.utb

Should I make a PR from josteinaj/liblouis to some branch at snaekobbi/liblouis to move these changes over here? Or should I make a PR directly to the liblouis/liblouis master branch?
Should the tests succeed before I create a PR?

bertfrees · 2015-04-22T16:12:56Z

Great job! The tests look very thorough.

Good idea to coordinate with Lars. Because your changes are backed by the official standard you included, I'm tempted to give you the benefit of the doubt. But it's still good to talk to Lars, because some of the differences may stem from different interpretations of the standard.

Where to test and depends on the level at which we implement it. It could be done in liblouis, in which case you have to include a "typeform" argument in the tests (see e.g. http://snaekobbi.github.io/liblouis-table-spec/#lenitalphrase, '1' means bold, '2' means italic, ...). It could also be done at a higher level, i.e. in a translator that uses liblouis but does some additional processing, usually implemented in Java or XSLT. In order to support and in the German translation, Christian currently does some extra processing in XSLT before and after the liblouis translation. He does that because the emphasis support in liblouis was insufficient for German, and also buggy, at the time he wrote his translator. But there has been a lot of work on emphasis in liblouis lately, so maybe we could try that for Norwegian.
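A minimal Python sketch of the typeform idea, assuming the typeform is a string with one digit per input character ("0" for plain text) and using the codes quoted above ("1" bold, "2" italic). The helper `build_typeform` and its span format are hypothetical, not part of the liblouis test harness:

```python
# Hypothetical helper: derive a typeform string for a liblouis test input.
# Assumption: one digit per input character, "0" = no emphasis,
# "1" = bold, "2" = italic (the codes quoted in the comment above).
TYPEFORM_CODES = {"bold": "1", "italic": "2"}


def build_typeform(text, spans):
    """Return a typeform string with the same length as `text`.

    `spans` is a list of (start, end, kind) tuples, `end` exclusive,
    `kind` being one of the keys of TYPEFORM_CODES.
    """
    typeform = ["0"] * len(text)
    for start, end, kind in spans:
        if not 0 <= start <= end <= len(text):
            raise ValueError(f"span {start}:{end} is outside the input")
        for i in range(start, end):
            typeform[i] = TYPEFORM_CODES[kind]
    return "".join(typeform)


if __name__ == "__main__":
    # Mark "viktig" (characters 9 to 14) as italic.
    print(build_typeform("Dette er viktig.", [(9, 15, "italic")]))
    # prints 0000000002222220
```

Generating the string keeps it the same length as the input, which is tedious to maintain by hand for long test sentences.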

The idea for this repository was that all of us could work together on branches without having to give everybody commit access to liblouis/liblouis. I suggest we first push your branch here, and then when it's ready make a PR to liblouis/liblouis.

I'm not sure whether all tests should succeed. It's better to have a few tests that fail than no tests at all. I know I have some failing tests in my files. Ideally there should be some XFAIL flag on failing tests but I don't think we have that right now. @egli WDYT?

josteinaj · 2015-04-22T19:05:30Z

... "typeform" argument ...

Interesting. I noticed @jukkae used typeform for some of the Finnish tests but I didn't know what it was.

In Norwegian braille it's not strictly defined which sign to use for em, strong and underline, so we may want to choose which signs to use in a pre-processing step. Hopefully we can avoid that by assigning separate signs for em/strong/underline, but I'll have to consult some braille experts on the matter. In Sweden they apparently have separate signs, but we don't.

The idea for this repository ...

Ok, I see. That makes sense. I'll move the branch.

Ideally there should be some XFAIL flag on failing tests

Isn't xfail only for tests that should fail? How about making a test "pending"/"disabled" instead?

josteinaj · 2015-04-23T08:29:24Z

I moved stuff out of my own liblouis fork and into this repository today:

https://github.com/snaekobbi/liblouis/tree/nlb-norwegian-table-corrections

https://github.com/snaekobbi/liblouis/tree/formal_braille_spec

bertfrees · 2015-04-23T08:59:38Z

If you say "...we may want to choose which signs..." do you mean that the same sign may have a different meaning in different documents? Are you suggesting a pre-processing step that analyses the document and then decides which signs to use for which semantics?

Which signs to use for which type of emphasis could possibly also be specified in CSS somehow, if that would be a requirement.

XFAIL is for tests that you know are failing, e.g. because of a known bug, and for which you want to be notified when they suddenly pass.
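The XFAIL convention can be summarized as a small decision table. A minimal Python sketch, assuming each result is reported as a pair (marked xfail?, passed?); the names are illustrative and not taken from the liblouis harness, and treating an unexpected pass as a suite failure is just one possible policy (similar to "strict" xfail in some test frameworks):

```python
def classify(marked_xfail, passed):
    """Return PASS, FAIL, XFAIL or XPASS for one test result."""
    if marked_xfail:
        # Known failure: tolerated while it keeps failing, flagged when it
        # suddenly passes so that the xfail flag can be removed.
        return "XPASS" if passed else "XFAIL"
    return "PASS" if passed else "FAIL"


def suite_ok(results):
    """Strict policy: any FAIL or XPASS makes the whole suite fail.

    `results` is an iterable of (marked_xfail, passed) pairs.
    """
    return all(classify(x, p) in ("PASS", "XFAIL") for x, p in results)
```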

josteinaj · 2015-04-23T11:40:59Z

Are you suggesting a pre-processing step that analyses the document and then decides which signs to use for which semantics?

Yes. Possibly. But hopefully not. I think we can avoid having to do so.

could possibly also be specified in CSS

Right, that might work for us.

XFAIL (...)

Ok, I didn't know that.

josteinaj · 2015-04-23T11:46:53Z

write tests based on the specification

We're nearly done with the transformation tests. Are there any examples of formatting tests? I expect those will not be part of the liblouis tests?

bertfrees · 2015-04-23T12:31:39Z

No, formatting tests are not part of the liblouis tests because liblouis is only the translation engine (or, we only use it for translation). Tests for formatting are part of other tasks.

bertfrees · 2015-05-07T17:25:27Z

I've updated the test file. 66 of 170 tests fail.

egli · 2015-05-08T06:31:51Z

@bertfrees unfortunately we do not have xfail for the sucky harness tests afaik

bertfrees · 2015-05-11T16:57:19Z

@josteinaj: I have studied the litdigit opcode some more and it appears it has some special behavior for backward translation, as I already suspected. When a dot pattern that was defined as litdigit appears inside a number (starting with the numsign), the litdigit rule is used; otherwise it is ignored for back-translation. Also, there's something I had overlooked previously, namely that litdigit rules always take precedence over other character definition rules (digit, sign, letter, etc.).

I have updated the documentation: http://snaekobbi.github.io/liblouis-table-spec/#litdigit
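To make the described back-translation behavior concrete, here is a toy Python model. It is not the liblouis implementation; it assumes cells are written as dot-number strings, "0" is the blank cell, "3456" is the numsign, and the litdigit and letter patterns are the 6-dot a-j patterns listed in the swapcd rule later in this thread:

```python
# Toy model of litdigit back-translation; not liblouis's actual algorithm.
NUMSIGN = "3456"
BLANK = "0"
# litdigit patterns for 1234567890 (as in litdigits6Dots.uti, per the thread)
LITDIGITS = {"1": "1", "12": "2", "14": "3", "145": "4", "15": "5",
             "124": "6", "1245": "7", "125": "8", "24": "9", "245": "0"}
# The same dot patterns defined as letters a-j
LETTERS = {"1": "a", "12": "b", "14": "c", "145": "d", "15": "e",
           "124": "f", "1245": "g", "125": "h", "24": "i", "245": "j"}


def back_translate(cells):
    out = []
    in_number = False
    for cell in cells:
        if cell == NUMSIGN:
            in_number = True              # the numsign starts a number
        elif cell == BLANK:
            out.append(" ")               # a space ends the number
            in_number = False
        elif in_number and cell in LITDIGITS:
            out.append(LITDIGITS[cell])   # inside a number, litdigit wins
        else:
            in_number = False             # outside a number, litdigit is ignored
            out.append(LETTERS.get(cell, "?"))
    return "".join(out)
```

For example, `["3456", "1", "12"]` back-translates to "12", while `["1", "12"]` without the numsign gives "ab".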

josteinaj · 2015-05-12T09:59:48Z

Interesting. Thanks for looking into this!

bertfrees · 2015-05-21T13:41:29Z

@usama49 @josteinaj In commit 52cfc50 I've added xfail flags to all the failing tests (165 of 324). Now you can go through them one by one, find out why they are failing, and then either fix the table, fix/remove the test, or leave the xfail flag on the test with an explanation if it is reasonable to have those failures.

Has either of you been able to set up the development environment and run the test suite? Or are you willing to learn? I would like you guys to work on the liblouis table yourself as much as possible. Of course I will assist.

josteinaj · 2015-05-26T09:57:22Z

I'm able to run ./runHarness.py no_harness.txt (I think I had to install the liblouis Python bindings from the Ubuntu repository). @usama49 is using Windows; do you know if it is easy to run the tests there as well?

We can go through the errors and see if we are able to fix them by modifying the table, but I expect we'll have to learn the table format in more depth than simple character substitutions, so I don't know how fast we'll be able to do it. I'd like to prioritize what little time I have for the DP2 script.

bertfrees · 2015-05-26T11:02:40Z

You don't have to install the Python bindings if you run the test with make no_harness.txt.

I've managed to run the tests on Windows once but it was quite a pain and I think you'd be much better off using a VM with Linux.

The main reason I wanted one of you to work on the table is that you can judge much better whether a test is failing because of the table or because of the test (e.g. a typo), and which tests deserve priority. The other reason is to pass on some of the workload. Of course I would help you with fixing specific things in the table, I wouldn't just let you struggle all by yourself.

Another approach could be that I send you the test report and then you check the failing tests for correctness and send me a list of things I should fix, in order of importance.

josteinaj · 2015-05-26T11:22:08Z

Sure, I understand; perfectly reasonable. We'll go through and try to fix as much as we can ourselves and let you know when we need help.

Since I'm able to run the tests on my laptop, we don't need you to send the reports.

make no_harness.txt just gives me make: Nothing to be done for 'no_harness.txt'.

If I run ./configure from the top-level liblouis/ folder I just get configure: error: cannot find install-sh, install.sh, or shtool in build-aux "."/build-aux. I don't know if it matters, but that prevents me from building liblouis locally.

bertfrees · 2015-05-26T11:49:02Z

Try make clean, and then again ./autogen.sh, ./configure --enable-ucs4 and make. The UCS4 is needed to run the tests. make no_harness.txt should work if run from the directory tests/harness in the branch nlb-norwegian-table-corrections.

josteinaj · 2015-05-26T12:00:19Z

➜  liblouis git:(nlb-norwegian-table-corrections) make clean
make: *** No rule to make target 'clean'.  Stop.
➜  liblouis git:(nlb-norwegian-table-corrections) ./autogen.sh 
Cleaning autotools files...
Running autoreconf...
configure.ac:79: error: possibly undefined macro: AC_LIBTOOL_WIN32_DLL
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:80: error: possibly undefined macro: AC_PROG_LIBTOOL
autoreconf: /usr/bin/autoconf failed with exit status: 1
➜  liblouis git:(nlb-norwegian-table-corrections) ./configure --enable-ucs4
configure: error: cannot find install-sh, install.sh, or shtool in build-aux "."/build-aux
➜  liblouis git:(nlb-norwegian-table-corrections) make
make: *** No targets specified and no makefile found.  Stop.

bertfrees · 2015-08-13T13:46:10Z

Thanks. I think the problem is that some rules are hardcoded in C (in compileTranslationTable.c), which means they are always defined first and can therefore not be overwritten:

space \s 0
noback sign \x0000
space \x00a0 a
space \x001b 1b
space \xffff 123456789abcdef

You can however define in a display file how "a" and "1b" are to be represented. So one possible solution to problem 1 would be to include display \x288F 1b in no-no-8dot.utb. An alternative is to use a pass2 rule to convert 1b to 12348: pass2 @1b @12348. The latter is the cleanest I think.

Problem 2 isn't a problem because nbsp characters in the braille output are supported. In fact the DP framework even requires that nbsp are preserved, so in the DP code I dynamically add the rule display \x00a0 a to every table. (EDIT: it is a problem because you want to map to dot pattern 8. So I suggest to add the rule pass2 @a @8.)
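The dot patterns line up with the display rule mentioned above: in the Unicode braille block, dot n of a cell sets bit n-1 above U+2800, so dots 12348 are U+288F and dot 8 alone is U+2880. A minimal Python sketch that models the two suggested pass2 rules as one-cell substitutions (real pass2 rules are more general) and assumes "a" and "b" in these patterns are liblouis virtual dots; the function names are illustrative:

```python
# Minimal model of the two suggested pass2 rules as one-cell substitutions.
# Real liblouis pass2 rules can match longer sequences and contexts.
PASS2_SUBSTITUTIONS = {
    "1b": "12348",  # pass2 @1b @12348
    "a": "8",       # pass2 @a @8
}


def apply_pass2(cells):
    """Replace each cell that has a substitution; leave the rest untouched."""
    return [PASS2_SUBSTITUTIONS.get(cell, cell) for cell in cells]


def dots_to_unicode(dots):
    """Convert one cell such as "12348" to a Unicode braille character.

    Dot n sets bit n-1 above U+2800 and "0" is the blank cell. Dots other
    than 1-8 (assumed here to be virtual dots, as in "1b") have no Unicode
    equivalent, so they raise ValueError.
    """
    if dots == "0":
        return "\u2800"
    value = 0
    for dot in dots:
        if dot not in "12345678":
            raise ValueError(f"dot {dot!r} has no Unicode braille equivalent")
        value |= 1 << (int(dot) - 1)
    return chr(0x2800 + value)


if __name__ == "__main__":
    cells = apply_pass2(["1b", "a", "1"])
    print(cells, "".join(dots_to_unicode(c) for c in cells))
```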

Problem 3 I will have a look at soon.


bertfrees · 2015-09-04T14:52:11Z

@snaekobbi/nlb I have managed to fall back to the 6-dot definition if a character is not defined in the 8-dot table, but now I get this failure:

--- Braille Difference Failure: ../../tests/harness/no_harness_8dot.txt ---
comment:                  'If character not defined in 8-dot standard; follow 6-dot standard instead'
input:                    'Ờ'
expected:                 '⢳'
received:                 '⠈⠕'
--- end ---


bertfrees · 2015-09-04T15:08:38Z

@usama49 said:

Edb-sign: The problem is that the digit sign is missing from the e-mail address when the address contains digits. (see example)

comment: 'Edb-sign: Example 2'
input: Hun oppga [email protected] som sin adresse.
expected: ⠠⠓⠥⠝⠕⠏⠏⠛⠁⠣⠞⠕⠝⠑⠼⠃⠁⠈⠓⠕⠞⠍⠁⠊⠇⠄⠉⠕⠍⠜⠎⠕⠍⠎⠊⠝⠁⠙⠗⠑⠎⠎⠑⠄
received: ⠠⠓⠥⠝⠕⠏⠏⠛⠁⠣⠞⠕⠝⠑⢃⢁⠈⠓⠕⠞⠍⠁⠊⠇⠄⠉⠕⠍⠜⠎⠕⠍⠎⠊⠝⠁⠙⠗⠑⠎⠎⠑⠄

When liblouis detects an email-address, URL, file path etc, it does not only put dot pattern 126 at the start and 345 at the end, but it also switches to "computer braille mode" within that string. In computer braille mode, numbers are not preceded by the digit sign. Is this undesired behavior?

P.S.: other translation rules such as contraction rules will also not work in these kinds of strings. Is this a problem too?

bertfrees · 2015-09-04T17:34:41Z

@usama49 said:

Fractions and mixed numbers: The problem is that according to the standard there should be no space between the whole number and the fraction. See example:

comment: 'Fractions and mixed numbers: Example 4'
input: Han kjøpte 1 1/2-litersflasker brus.
expected: ⠠⠓⠁⠝ ⠅⠚⠪⠏⠞⠑ ⠼⠁⠼⠁⠌⠼⠃⠤⠇⠊⠞⠑⠗⠎⠋⠇⠁⠎⠅⠑⠗ ⠃⠗⠥⠎⠄
received: ⠠⠓⠁⠝ ⠅⠚⠪⠏⠞⠑ ⠼⠁ ⠼⠁⠌⠼⠃⠤⠇⠊⠞⠑⠗⠎⠋⠇⠁⠎⠅⠑⠗ ⠃⠗⠥⠎⠄

How should we solve this? Should we do some kind of pre-processing, or will you fix it?

Pre-processing would mean that you mark up the whole number "1 1/2" with e.g. class="mixed-num".

Or I can just match cases like "1 1/2" with a context rule:

# no space within mixed numbers like 1 1/2
# the swapcd rule is for compensating that the litdigit rule is not matched anymore
swapcd aslitdigit 1234567890 1,12,14,145,15,124,1245,125,24,245 # as defined in litdigits6Dots.uti
context [%aslitdigit$s.]$d."/"$d %aslitdigit

What do you think, would we get any false positive matches with this approach? And would we cover everything with this rule? I doubt this would properly back-translate, but I guess that's not a concern for us.
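One way to look for false positives is to scan a corpus for text the context rule would fire on. A rough Python regex approximation, assuming the rule matches a digit, at least one whitespace character, then one or more digits, "/" and a digit (whether the liblouis rule also matches zero spaces is not modelled); the helper name is hypothetical:

```python
import re

# Rough approximation of the context rule's test part:
# a digit, whitespace, digits, "/", a digit.
MIXED_NUMBER = re.compile(r"\d\s+\d+/\d")


def candidate_mixed_numbers(text):
    """Return every substring that the approximated rule would match."""
    return [m.group(0) for m in MIXED_NUMBER.finditer(text)]


if __name__ == "__main__":
    print(candidate_mixed_numbers("Han kjøpte 1 1/2-litersflasker brus."))
```

Matches that are not mixed numbers when reviewed by hand would be candidates for false positives.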

josteinaj · 2015-09-07T07:01:53Z

@bertfrees I pushed a fix for the "6-dot fallback" test. However, there's an issue which I don't know whether we need to fix (or even can fix): the 6-dot capital letters do not have the ⠠ prefix. In the 8-dot standard, though, ⠠ is used for "ACUTE ACCENT - ´".

@KariRudjord: for capital letters not defined in the 8-dot standard; do you think we need to prefix them with a ⠠?

josteinaj · 2015-09-07T10:29:50Z

@bertfrees your fix for mixed numbers in the table solves the test; thanks! I don't know if it would give any false positives. I've added the fix in any case.

For e-mails, the examples provided to us for how this should be done indicate that there should be a number sign (3456) when a number sequence starts, and a letter sign (typically 56 for lowercase; I don't think we need 6 for uppercase since e-mail addresses aren't case sensitive) when a letter sequence starts after a number sequence. So for [email protected] we need:

 1 a 1 @ a 1 1 y . c o m
^ ^ ^     ^   ^
| | |     |   |
| | 3456  |   |
| 56      |   56
3456      3456
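The placement rule in the diagram can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it only decides where the number sign (dots 3456, U+283C) and letter sign (dots 56, U+2830) go, leaves the other characters as print, adds a letter sign only directly after a digit (the examples don't cover other separators), and omits the start and end indicators. The function name is hypothetical:

```python
NUMSIGN = "\u283c"  # dots 3456
LETSIGN = "\u2830"  # dots 56


def mark_email(address):
    """Insert number and letter signs into an e-mail address (print chars kept)."""
    out = []
    prev_digit = False
    for ch in address:
        if ch in "0123456789":
            if not prev_digit:
                out.append(NUMSIGN)   # a digit sequence starts
            prev_digit = True
        else:
            if prev_digit and ch.isalpha():
                out.append(LETSIGN)   # a letter directly follows digits
            prev_digit = False
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # The character sequence spelled out in the diagram above.
    print(mark_email("1a1@a11y.com"))
```

The output places signs exactly at the five positions marked in the diagram; the letter after "@" gets none.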

bertfrees · 2015-09-07T11:01:09Z

So basically, email addresses should translate like other text, except that it should have start and end indicators? Liblouis assumes that if start and end indicators are inserted, the text in between should actually have a different mode (computer braille), otherwise the indicators wouldn't indicate anything.

I suppose we have to implement this outside of liblouis. If we do the same detection that liblouis does, and just insert the indicators, that should do it, right?

bertfrees · 2015-09-07T11:08:39Z

for capital letters not defined in the 8-dot standard; do you think we need to prefix them with a ⠠?

My guess is that you should add dot 7 for capitals, so "Ờ" would become "4-1357".

bertfrees · 2015-09-07T11:09:47Z

Although dot 4 also means "COMMERCIAL AT".

josteinaj · 2015-09-07T11:51:34Z

add dot 7

You're right, that seems consistent with the rest of the table. Is it possible to "add a dot 7" with some opcode?

bertfrees · 2015-09-07T12:08:27Z

No :)

If I remember correctly, @BueVest had this idea. In Danish there is such a thing as 8-dot contracted braille, and it would be useful to be able to add dot 7 to contractions as well.

You could also do this by making a no-no-latinLetterDef8Dots_diacritics.uti table (brute-force).

josteinaj · 2015-09-07T12:29:44Z

making a no-no-latinLetterDef8Dots_diacritics.uti

Yeah, I thought that might be the solution. It's not very difficult to make with search'n'replace, it just seems kind of unnecessary.
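A brute-force generator for such a table could be a few lines of Python instead of manual search'n'replace. The sketch assumes the 6-dot diacritic definitions are available as `lowercase` rules of the form `lowercase char dots` (e.g. `4-135`) and follows the suggestion of adding dot 7 to the letter cell, so `4-135` becomes `4-1357`. It does not address the dot 4 / COMMERCIAL AT conflict, and the function names are hypothetical:

```python
def add_dot7(dots):
    """Add dot 7 to the last cell of a pattern such as "4-135"."""
    cells = dots.split("-")
    last = cells[-1]
    if last == "0":
        last = "7"                          # blank cell becomes dot 7 alone
    elif "7" not in last:
        last = "".join(sorted(last + "7"))  # keep dots in ascending order
    cells[-1] = last
    return "-".join(cells)


def uppercase_rule(lowercase_rule):
    """Turn "lowercase ờ 4-135" into "uppercase Ờ 4-1357"."""
    opcode, char, dots = lowercase_rule.split()
    if opcode != "lowercase":
        raise ValueError(f"expected a lowercase rule, got {opcode!r}")
    return f"uppercase {char.upper()} {add_dot7(dots)}"


if __name__ == "__main__":
    print(uppercase_rule("lowercase \u1edd 4-135"))
```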

As you pointed out, dot 4 already means @, so to avoid using the dot 4 prefix for diacritics that are not already defined in the 8-dot standard I've tried to understand how the patterns for the diacritic characters in the Norwegian 8-dot standard have been chosen, but I just don't get it; they seem to have been picked quite randomly. One possibility would be to make a "diacritics fallback alphabet" based on the characters already defined in the 8-dot table, making sure there are no conflicts. I'll have to check with @KariRudjord first though, and I won't put this at the top of my TODO list. Maybe @usama49 could have a look at this.

bertfrees · 2015-09-07T12:48:37Z

OK, please ask her. If it's even possible to make sense of the 8-dot standard.

bertfrees · 2015-09-08T14:15:06Z

TODO: maybe add http://www.punktskriftutvalget.no/nyheter/ny-punkttabell-for-jaws to liblouis/braille-specs?

josteinaj · 2015-09-08T14:24:00Z

We think alike, I've committed it locally to snaekobbi/braille-spec :)

I don't have push access though...

josteinaj · 2015-09-08T15:13:29Z

So basically, email addresses should translate like other text, except that it should have start and end indicators? (...)

Yes. I just got this confirmed by @KariRudjord.

I suppose we have to implement this outside of liblouis. If we do the same detection that liblouis does, and just insert the indicators, that should do it, right?

Whatever is easiest. Is this a mod-nlb thing? Should we create an issue for it there?

bertfrees · 2015-09-08T15:25:37Z

Yes it's a mod-nlb thing. Yes please create an issue, then I'll take care of it when I have time. (Of course you are always free to have a look as well, but I already have a good idea about how to do it, so maybe discuss first with me.)

josteinaj · 2015-09-08T15:34:03Z

nlbdev/pipeline-mod-nlb#4

bertfrees · 2015-11-16T10:55:04Z

See liblouis#140

bertfrees · 2015-12-04T09:54:57Z

See email from Jamie on liblouis ML: Norwegian 8 dot tables

josteinaj · 2015-12-04T10:40:36Z

@bertfrees I can have a look and split it into three tables in the nlb-norwegian-table-corrections branch of this repo.

bertfrees · 2015-12-04T10:42:10Z

OK. Not sure about the middle one either.

bertfrees · 2015-12-04T10:45:11Z

Main thing is that we delete the old one and replace it with a module that the other one can include.

josteinaj · 2015-12-04T11:27:33Z

Have a look at this one:

4716447

The diff looks long but there aren't really many changes. The old 8-dot table was identical to the new one (except for two additions to the non-computer-braille list). There are now three tables: 8-dot computer braille, 8-dot braille, and 8-dot braille + fallback. Three tables is probably fine; everyone can choose which tables to expose themselves.

bertfrees · 2015-12-04T12:03:09Z

Perfect!

josteinaj · 2016-01-11T11:00:45Z

@bertfrees @usama49

It seems the commit with the table cleanup wasn't included in the 2.6.5-release of liblouis, so I created a new branch off liblouis/master with that commit and created a PR for it: liblouis#150

bertfrees · 2016-01-11T11:09:58Z

Yes I think that change must have been after the release. Thanks for the PR.

bertfrees added the 0 - Backlog label Dec 23, 2014

egli closed this as completed in af52d5c Feb 9, 2015

jukkae reopened this Feb 9, 2015

bertfrees added this to the Priority 1 milestone Mar 5, 2015

bertfrees modified the milestones: norwegian (1), (1) Mar 16, 2015

bertfrees mentioned this issue Apr 8, 2015

Support Norwegian braille code snaekobbi/pipeline-mod-braille#41


bertfrees added 1 - Next and removed 0 - Backlog labels Apr 8, 2015

josteinaj added 2 - In progress and removed 1 - Next labels May 5, 2015

bertfrees added a commit that referenced this issue Sep 4, 2015

Fix issues with \x001b and \x00a0 using pass2 rules

bafdc96

see #4

bertfrees added a commit that referenced this issue Sep 4, 2015

Fall back to 6-dot definition if a character is not defined in 8-dot table by splitting of the part of no-no-g0.utb that needs to be included in no-no-8dot.utb as new file no-no-chardefs6.uti see issue #4

716026b

bertfrees mentioned this issue Sep 4, 2015

[4.3:34] The system shall support the braille code used in Denmark, Finland, Norway, Sweden, Switzerland and the Netherlands for uncontracted braille. snaekobbi/sprints#124

