Liblouis table for Norwegian #4

Open
6 tasks done
Tracked by #41
bertfrees opened this issue Dec 23, 2014 · 54 comments

@bertfrees
Member

  • include some official braille code specification document
  • write tests based on the specification
  • validate the existing braille table for Norwegian
  • improve the existing table or write a new one from scratch
  • include in liblouis upstream (merge?)

PR: liblouis#144

See also:

@egli egli closed this as completed in af52d5c Feb 9, 2015
@jukkae jukkae reopened this Feb 9, 2015
@bertfrees bertfrees added this to the Priority 1 milestone Mar 5, 2015
@bertfrees bertfrees modified the milestones: norwegian (1), (1) Mar 16, 2015
@josteinaj

@usama49 has been creating tests for Norwegian based on the examples in the Norwegian braille documentation, and I've converted them to the liblouis test format here:

https://github.com/josteinaj/liblouis/blob/nlb-norwegian-table-corrections/tests/harness/no_harness.txt

There are still some tests missing, and the tests do not succeed yet. We will talk to Lars Bjørndal about the edits we've made to the tables to make sure they're OK.

Note that we've included markup such as <em> and <strong> in some of the input strings. I assume this is not the right way to do it, but we've used them as placeholders for now. How should we write such tests? Is this a formatting matter that we should write dotify-tests for instead?

We've also gone through the table and updated it here:

https://github.com/josteinaj/liblouis/blob/nlb-norwegian-table-corrections/tables/no-no-g0.utb

  • Should I make a PR from josteinaj/liblouis to some branch at snaekobbi/liblouis to move these changes over here? Or should I make a PR directly to the liblouis/liblouis master branch?
  • Should the tests succeed before I create a PR?

@bertfrees
Member Author

Great job! The tests look very thorough.

Good idea to coordinate with Lars. Because your changes are backed by the official standard you included, I'm tempted to give you the benefit of the doubt. Still, it's good to talk to Lars, because some of the differences may stem from different interpretations of the standard.

Where to test <em> and <strong> depends on the level at which we are going to implement it. It could be done in liblouis; in that case you have to include a "typeform" argument in the tests (see e.g. http://snaekobbi.github.io/liblouis-table-spec/#lenitalphrase, '1' means bold, '2' means italic, ...). It could also be done at a higher level, i.e. in a translator that makes use of liblouis but does some additional things, usually implemented in Java or XSLT. In order to support <em> and <strong> in the German translation, Christian currently does some extra processing in XSLT before and after the liblouis translation. He does that because the emphasis support was insufficient for German, and also buggy, at the time he wrote his translator. But there has been a lot of work on emphasis in liblouis lately, so maybe we could try that for Norwegian.

The idea for this repository was that all of us could work together on branches without having to give everybody commit access to liblouis/liblouis. I suggest we first push your branch here, and then when it's ready make a PR to liblouis/liblouis.

I'm not sure whether all tests should succeed. It's better to have a few tests that fail instead of no tests at all. I know I have some failing tests in my files. Ideally there should be some XFAIL flag on failing tests but I don't think we have that right now. @egli WDYT?
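To illustrate the "typeform" idea mentioned above, here is a minimal sketch that turns an input string with <em>/<strong> placeholders into plain text plus a per-character typeform string. The function name `split_markup` and the `TYPEFORM_CODES` mapping are hypothetical; the digit codes simply follow the comment above ('1' for bold, '2' for italic) and should be checked against the liblouis documentation before use.

```python
import re

# ASSUMPTION: digit codes as quoted in the comment above ('1' = bold, '2' = italic).
# They are not verified against liblouis; adjust them if the documentation differs.
TYPEFORM_CODES = {"strong": "1", "em": "2"}

_TAG = re.compile(r"(</?(?:em|strong)>)")


def split_markup(marked_up, codes=TYPEFORM_CODES):
    """Return (plain_text, typeform) for a string containing <em>/<strong> tags.

    The typeform string has one digit per character of the plain text;
    '0' means no emphasis. For nested tags the innermost one wins.
    """
    text, typeform, open_tags = [], [], []
    for token in _TAG.split(marked_up):
        is_tag = _TAG.fullmatch(token) is not None
        if is_tag and token.startswith("</"):
            if open_tags:
                open_tags.pop()          # closing tag: leave the emphasis
        elif is_tag:
            open_tags.append(token[1:-1])  # opening tag: remember its name
        else:
            code = codes[open_tags[-1]] if open_tags else "0"
            text.append(token)
            typeform.append(code * len(token))
    return "".join(text), "".join(typeform)
```

With a helper like this the markup stays in the test source for readability, while the harness entry would carry the plain text and the typeform string separately.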

@josteinaj

... "typeform" argument ...

Interesting. I noticed @jukkae used typeform for some of the Finnish tests but I didn't know what it was.

In Norwegian braille it's not strictly defined which sign to use for em, strong and underline, so we may want to choose which signs to use in a pre-processing step. Hopefully we can avoid that by assigning separate signs for em/strong/underline, but I'll have to consult some braille experts on the matter. In Sweden they apparently have separate signs, but we don't.

The idea for this repository ...

Ok, I see. That makes sense. I'll move the branch.

Ideally there should be some XFAIL flag on failing tests

Isn't xfail only for tests that should fail? How about making a test "pending"/"disabled" instead?

@josteinaj

I moved stuff out of my own liblouis fork and into this repository today:

https://github.com/snaekobbi/liblouis/tree/nlb-norwegian-table-corrections

https://github.com/snaekobbi/liblouis/tree/formal_braille_spec

@bertfrees
Member Author

When you say "...we may want to choose which signs...", do you mean that the same sign may have a different meaning in different documents? Are you suggesting a pre-processing step that analyses the document and then decides which signs to use for which semantics?

Which signs to use for which type of emphasis could possibly also be specified in CSS somehow, if that would be a requirement.

XFAIL is for tests that you know are failing, e.g. because of some known bug, and for which you want to be notified when they suddenly pass.
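The XFAIL semantics described above can be sketched as a small classification step. This is only an illustration of the idea, not the liblouis harness; the outcome names (PASS, FAIL, XFAIL, XPASS) are borrowed from common test-runner conventions and the function names are hypothetical.

```python
def classify(passed, xfail=False):
    """Classify a single test result.

    passed: True if the received braille equals the expected braille.
    xfail:  True if the test is flagged as a known, expected failure.
    """
    if xfail:
        # A flagged test that fails is expected; one that passes is news.
        return "XPASS" if passed else "XFAIL"
    return "PASS" if passed else "FAIL"


def needs_attention(outcome):
    """Only real failures and unexpected passes should be reported loudly."""
    return outcome in ("FAIL", "XPASS")
```

The point of the design is that a known failure stays quiet, while an unexpected pass is reported so that the flag can be removed.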

@josteinaj

Are you suggesting a pre-processing step that analyses the document and then decides which signs to use for which semantics?

Yes. Possibly. But hopefully not. I think we can avoid having to do so.

could possibly also be specified in CSS

Right, that might work for us.

XFAIL (...)

Ok, I didn't know that.

@josteinaj

write tests based on the specification

We're close to done for transformation tests. Are there any examples of formatting tests? I expect those will not be part of the liblouis tests?

@bertfrees
Member Author

No, formatting tests are not part of the liblouis tests because liblouis is only the translation engine (or, we only use it for translation). Tests for formatting are part of other tasks.

@bertfrees
Member Author

I've updated the test file. 66 of 170 tests fail.

@egli

egli commented May 8, 2015

@bertfrees unfortunately, for the sucky harness tests we do not have xfail afaik

@bertfrees
Member Author

@josteinaj: I have studied the litdigit opcode some more and it appears it has some special behavior for backward translation, as I already suspected. When a dot pattern that was defined as litdigit appears inside a number (starting with the numsign), the litdigit rule is used, otherwise it is ignored for back-translation. Also there's something I had overlooked previously, namely that litdigit rules always have precedence over other character definition rules (digit, sign, letter, etc.)

I have updated the documentation: http://snaekobbi.github.io/liblouis-table-spec/#litdigit
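The back-translation behaviour described above can be modelled with a toy example. This is a sketch of the described rule only, not liblouis's implementation: it assumes a number sign of dots 3456, a space cell written as "0", and the digit patterns that appear in the swapcd rule later in this thread. All names are hypothetical.

```python
NUMSIGN = "3456"   # assumed number sign
SPACE = "0"        # assumed notation for an empty cell

_PATTERNS = ["1", "12", "14", "145", "15", "124", "1245", "125", "24", "245"]
LITDIGITS = dict(zip(_PATTERNS, "1234567890"))   # used only inside a number
LETTERS = dict(zip(_PATTERNS, "abcdefghij"))     # used everywhere else


def back_translate(cells):
    """Back-translate a list of dot patterns with litdigit-like behaviour.

    After a number sign, cells are read as digits for as long as they are
    litdigit patterns; any other cell ends the number and is read normally.
    """
    out = []
    in_number = False
    for cell in cells:
        if cell == NUMSIGN:
            in_number = True
        elif in_number and cell in LITDIGITS:
            out.append(LITDIGITS[cell])
        else:
            in_number = False
            out.append(" " if cell == SPACE else LETTERS.get(cell, "?"))
    return "".join(out)
```

For example, the cells 1, 3456, 1, 12, 0, 12 read as "a12 b": the same patterns give letters outside the number and digits inside it.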

@josteinaj

Interesting. Thanks for looking into this!

@bertfrees
Member Author

@usama49 @josteinaj In commit 52cfc50 I've added xfail flags to all the failing tests (165 of 324). Now you can go through them one by one, find out why they are failing, and then either fix the table, fix/remove the test, or leave the xfail flag on the test with an explanation if it is reasonable to have those failures.

Has either of you been able to set up the development environment and run the test suite? Or are you willing to learn? I would like you guys to work on the liblouis table yourself as much as possible. Of course I will assist.

@josteinaj

I'm able to run ./runHarness.py no_harness.txt (I think I had to install a liblouis Python binding from the Ubuntu repository). @usama49 is using Windows; do you know if it is easy to run the tests there as well?

We can go through the errors and see if we are able to fix them by modifying the table, but I expect we'll have to learn the table format in more depth than simple character substitutions, so I don't know how fast we'll be able to do it. I'd like to prioritize what little time I have on the DP2 script.

@bertfrees
Member Author

You don't have to install the Python bindings if you run the test with make no_harness.txt.

I've managed to run the tests on Windows once but it was quite a pain, and I think you'd be much better off using a VM with Linux.

The main reason I wanted one of you to work on the table is that you can judge much better whether a test is failing because of the table or because of the test (e.g. a typo), and which tests deserve priority. The other reason is to pass on some of the workload. Of course I would help you with fixing specific things in the table, I wouldn't just let you struggle all by yourself.

Another approach could be that I send you the test report and then you check the failing tests for correctness and send me a list of things I should fix, in order of importance.

@josteinaj

Sure, I understand; perfectly reasonable. We'll go through and try to fix as much as we can ourselves and let you know when we need help.

Since I'm able to run the tests on my laptop, we don't need you to send the reports.

make no_harness.txt just gives me make: Nothing to be done for 'no_harness.txt'.

If I run ./configure from the top-level liblouis/-folder I just get configure: error: cannot find install-sh, install.sh, or shtool in build-aux "."/build-aux; don't know if it matters but that prevents me from building liblouis locally.

@bertfrees
Member Author

Try make clean, and then again ./autogen.sh, ./configure --enable-ucs4 and make. The UCS4 is needed to run the tests. make no_harness.txt should work if run from the directory tests/harness in the branch nlb-norwegian-table-corrections.
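The sequence of steps above can be written down as a small script. This is a sketch only: the commands are exactly the ones named in the comment, while the helper name `run_steps`, the repository path argument, and the choice to treat `make clean` as optional are assumptions. By default it only lists what it would run.

```python
import os
import subprocess

# (subdirectory, command, required) in the order given in the comment above.
# ASSUMPTION: "make clean" is treated as optional, since it fails harmlessly
# when no Makefile has been generated yet.
BUILD_STEPS = [
    (".", ["make", "clean"], False),
    (".", ["./autogen.sh"], True),
    (".", ["./configure", "--enable-ucs4"], True),   # UCS4 is needed for the tests
    (".", ["make"], True),
    ("tests/harness", ["make", "no_harness.txt"], True),
]


def run_steps(repo_root, steps=BUILD_STEPS, dry_run=True):
    """Return the planned (directory, command) pairs; run them if dry_run is False."""
    planned = []
    for subdir, cmd, required in steps:
        cwd = os.path.normpath(os.path.join(repo_root, subdir))
        planned.append((cwd, " ".join(cmd)))
        if not dry_run:
            # Stop at the first required step that fails.
            subprocess.run(cmd, cwd=cwd, check=required)
    return planned
```

Calling `run_steps("path/to/liblouis")` prints nothing and changes nothing; passing `dry_run=False` executes the steps in the checked-out branch.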

@josteinaj

➜  liblouis git:(nlb-norwegian-table-corrections) make clean
make: *** No rule to make target 'clean'.  Stop.
➜  liblouis git:(nlb-norwegian-table-corrections) ./autogen.sh 
Cleaning autotools files...
Running autoreconf...
configure.ac:79: error: possibly undefined macro: AC_LIBTOOL_WIN32_DLL
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:80: error: possibly undefined macro: AC_PROG_LIBTOOL
autoreconf: /usr/bin/autoconf failed with exit status: 1
➜  liblouis git:(nlb-norwegian-table-corrections) ./configure --enable-ucs4
configure: error: cannot find install-sh, install.sh, or shtool in build-aux "."/build-aux
➜  liblouis git:(nlb-norwegian-table-corrections) make
make: *** No targets specified and no makefile found.  Stop.

@bertfrees
Member Author

Thanks. I think the problem is that some rules are hardcoded in C (in compileTranslationTable.c), which means they are always defined first and can therefore not be overwritten:

space \s 0
noback sign \x0000
space \x00a0 a
space \x001b 1b
space \xffff 123456789abcdef

You can however define in a display file how "a" and "1b" are to be represented. So one possible solution to problem 1 would be to include display \x288F 1b in no-no-8dot.utb. An alternative is to use a pass2 rule to convert 1b to 12348: pass2 @1b @12348. The latter is the cleanest I think.

Problem 2 isn't a problem because nbsp characters in the braille output are supported. In fact the DP framework even requires that nbsp are preserved, so in the DP code I dynamically add the rule display \x00a0 a to every table. (EDIT: it is a problem because you want to map to dot pattern 8. So I suggest to add the rule pass2 @a @8.)

Problem 3 I will have a look at soon.

@bertfrees
Member Author

@snaekobbi/nlb I have managed to fall back to the 6-dot definition if a character is not defined in the 8-dot table, but now I get this failure:

--- Braille Difference Failure: ../../tests/harness/no_harness_8dot.txt ---
comment:                  'If character not defined in 8-dot standard; follow 6-dot standard instead'
input:                    'Ờ'
expected:                 '⢳'
received:                 '⠈⠕'
--- end ---

bertfrees added a commit that referenced this issue Sep 4, 2015
…table

by splitting of the part of no-no-g0.utb that needs to be included in
no-no-8dot.utb as new file no-no-chardefs6.uti

see issue #4
@bertfrees
Member Author

@usama49 said:

Edb-sign: The problem is that the digit sign is missing from the e-mail address when the address consists of letters and digits (see example).

comment: 'Edb-sign: Example 2'
input: Hun oppga [email protected] som sin adresse.
expected: ⠠⠓⠥⠝⠕⠏⠏⠛⠁⠣⠞⠕⠝⠑⠼⠃⠁⠈⠓⠕⠞⠍⠁⠊⠇⠄⠉⠕⠍⠜⠎⠕⠍⠎⠊⠝⠁⠙⠗⠑⠎⠎⠑⠄
received: ⠠⠓⠥⠝⠕⠏⠏⠛⠁⠣⠞⠕⠝⠑⢃⢁⠈⠓⠕⠞⠍⠁⠊⠇⠄⠉⠕⠍⠜⠎⠕⠍⠎⠊⠝⠁⠙⠗⠑⠎⠎⠑⠄

When liblouis detects an email-address, URL, file path etc, it does not only put dot pattern 126 at the start and 345 at the end, but it also switches to "computer braille mode" within that string. In computer braille mode, numbers are not preceded by the digit sign. Is this undesired behavior?

P.S.: other translation rules such as contraction rules will also not work in these kinds of strings. Is this a problem too?

@bertfrees
Member Author

@usama49 said:

Fractions and mixed numbers: The problem is that, according to the standard, there should be no space between the whole number and the fraction. See example:

comment: 'Fractions and mixed numbers: Example 4'
input: Han kjøpte 1 1/2-litersflasker brus.
expected: ⠠⠓⠁⠝ ⠅⠚⠪⠏⠞⠑ ⠼⠁⠼⠁⠌⠼⠃⠤⠇⠊⠞⠑⠗⠎⠋⠇⠁⠎⠅⠑⠗ ⠃⠗⠥⠎⠄
received: ⠠⠓⠁⠝ ⠅⠚⠪⠏⠞⠑ ⠼⠁ ⠼⠁⠌⠼⠃⠤⠇⠊⠞⠑⠗⠎⠋⠇⠁⠎⠅⠑⠗ ⠃⠗⠥⠎⠄

How should we solve this? Should we do some kind of pre-processing, or will you fix it?

Pre-processing would mean that you mark up the whole number "1 1/2" with e.g. class="mixed-num".

Or I can just match cases like "1 1/2" with a context rule:

# no space within mixed numbers like 1 1/2
# the swapcd rule is for compensating that the litdigit rule is not matched anymore
swapcd aslitdigit 1234567890 1,12,14,145,15,124,1245,125,24,245 # as defined in litdigits6Dots.uti
context [%aslitdigit$s.]$d."/"$d %aslitdigit

What do you think, would we get any false positive matches with this approach? And would we cover everything with this rule? I doubt this would properly back-translate, but I guess that's not a concern for us.
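One way to look into the false-positive question is to scan existing text for everything a rule of this shape would touch. The sketch below approximates the context rule with a regular expression (digits, one or more spaces, digits, a slash, digits); it is not equivalent to the liblouis rule, and the function name is hypothetical.

```python
import re

# Approximation of the pattern the context rule targets:
# a number, one or more spaces, then a fraction such as 1/2.
MIXED_NUMBER = re.compile(r"\d+ +\d+/\d+")


def find_mixed_number_candidates(text):
    """Return every substring that looks like a mixed number."""
    return MIXED_NUMBER.findall(text)
```

Running this over a corpus of real documents and reviewing the hits by hand would show whether sequences like a page number followed by an unrelated fraction occur often enough to matter.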

@josteinaj

@bertfrees I pushed a fix for the "6-dot fallback"-test. However, there's an issue which I don't know whether or not we need to fix (or even can fix): the 6-dot capital letters do not have the -prefix. In the 8-dot standard though, is used for "ACUTE ACCENT - ´".

@KariRudjord: for capital letters not defined in the 8-dot standard; do you think we need to prefix them with a ?

@josteinaj

@bertfrees your fix for mixed numbers in the table solves the test; thanks! I don't know if it would give any false positives. I've added the fix in any case.

For e-mail addresses, the examples we were given indicate that there should be a number sign (3456) when a number sequence starts, and a letter sign (typically 56 for lowercase; I don't think we need 6 for uppercase since e-mail addresses aren't case sensitive) when a letter sequence starts after a number sequence. So for [email protected] we need:

 1 a 1 @ a 1 1 y . c o m
^ ^ ^     ^   ^
| | |     |   |
| | 3456  |   |
| 56      |   56
3456      3456
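The rule in the diagram above can be sketched as follows: a number sign goes before each run of digits, and a letter sign goes before a letter that directly follows a digit. The function name and the token representation (dot numbers as strings mixed with the original characters) are hypothetical; the actual translation of each character to braille is left out.

```python
NUMBER_SIGN = "3456"
LETTER_SIGN = "56"


def mark_runs(address):
    """Insert number/letter sign tokens into an address, character by character."""
    out = []
    prev = ""
    for ch in address:
        if ch.isdigit() and not prev.isdigit():
            out.append(NUMBER_SIGN)   # a digit run starts here
        elif ch.isalpha() and prev.isdigit():
            out.append(LETTER_SIGN)   # a letter directly after a digit
        out.append(ch)
        prev = ch
    return out
```

Applied to the address spelled out in the diagram, this yields five inserted signs in the same positions as the five carets.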

@bertfrees
Member Author

So basically, email addresses should translate like other text, except that it should have start and end indicators? Liblouis assumes that if start and end indicators are inserted, the text in between should actually have a different mode (computer braille), otherwise the indicators wouldn't indicate anything.

I suppose we have to implement this outside of liblouis. If we do the same detection that liblouis does, and just insert the indicators, that should do it right?
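A minimal sketch of the "detect and just insert the indicators" idea, assuming a simple regular expression for e-mail addresses. This is not the detection liblouis performs, and the function name is hypothetical. The indicators are the dot patterns 126 and 345 mentioned earlier in the thread, written as Unicode braille characters; in a real pipeline they would be placed around the translated address rather than the source text.

```python
import re

# ASSUMPTION: a deliberately simple address pattern, not liblouis's own detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

START = "\u2823"  # braille dots 126
END = "\u281c"    # braille dots 345


def wrap_addresses(text, start=START, end=END):
    """Surround every detected e-mail address with start and end indicators."""
    return EMAIL.sub(lambda m: start + m.group(0) + end, text)
```

Because the text between the indicators is left untouched, it can still go through the normal translation rules, which is the behaviour asked for here.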

@bertfrees
Member Author

for capital letters not defined in the 8-dot standard; do you think we need to prefix them with a ?

My guess is that you should add dot 7 for capitals, so "Ờ" would become "4-1357".

@bertfrees
Member Author

Although dot 4 also means "COMMERCIAL AT".

@josteinaj

add dot 7

You're right, that seems consistent with the rest of the table. Is it possible to "add a dot 7" with some opcode?

@bertfrees
Member Author

No :)

If I remember correctly @BueVest had this idea. In Danish there exists such a thing as 8-dot contracted braille, and it would be useful to be able to add dot 7 to contractions as well.

You could also do this by making a no-no-latinLetterDef8Dots_diacritics.uti table (brute-force).
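For the brute-force approach, the dot patterns could be generated rather than edited by hand. The sketch below adds dot 7 to the last cell of a liblouis-style dot string and converts dot strings to Unicode braille for checking. Both function names are hypothetical, and only dots 1 to 8 are handled (no virtual dots and no empty cell).

```python
def add_dot7(dots):
    """Add dot 7 to the last cell of a pattern such as '4-135'."""
    cells = dots.split("-")
    last = cells[-1]
    if "7" not in last:
        last = "".join(sorted(last + "7"))
    return "-".join(cells[:-1] + [last])


def dots_to_unicode(dots):
    """Convert a pattern such as '4-135' to Unicode braille characters.

    Dot n corresponds to bit n-1 in the Unicode braille block (U+2800).
    """
    chars = []
    for cell in dots.split("-"):
        bits = sum(1 << (int(d) - 1) for d in cell)
        chars.append(chr(0x2800 + bits))
    return "".join(chars)
```

For the example discussed above, `add_dot7("4-135")` gives "4-1357", matching the pattern suggested for "Ờ".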

@josteinaj

making a no-no-latinLetterDef8Dots_diacritics.uti

Yeah, I thought that might be the solution. Not very difficult to make with search'n'replace, it just seems kind of unnecessary.

As you pointed out, dot 4 already means @. To avoid using the dot 4 prefix for diacritics that are not already defined in the 8-dot standard, I've tried to understand how the patterns for the diacritic characters in the Norwegian 8-dot standard have been chosen, but I just don't get it; they seem to have been picked quite randomly. One possibility would be to make a "diacritics fallback alphabet" based on the characters already defined in the 8-dot table, making sure there are no conflicts. I'll have to check with @KariRudjord first though, and I won't put this at the top of my TODO list. Maybe @usama49 could have a look at this.

@bertfrees
Member Author

OK, please ask her. If it's even possible to make sense of the 8-dot standard.

@bertfrees
Member Author

TODO: maybe add http://www.punktskriftutvalget.no/nyheter/ny-punkttabell-for-jaws to liblouis/braille-specs?

@josteinaj

We think alike, I've committed it locally to snaekobbi/braille-spec :)

I don't have push access though...

@josteinaj

So basically, email addresses should translate like other text, except that it should have start and end indicators? (...)

Yes. I just got this confirmed from @KariRudjord .

I suppose we have to implement this outside of liblouis. If we do the same detection that liblouis does, and just insert the indicators, that should do it right?

Whatever is easiest. Is this a mod-nlb thing? Should we create an issue for it there?

@bertfrees
Member Author

Yes it's a mod-nlb thing. Yes please create an issue, then I'll take care of it when I have time. (Of course you are always free to have a look as well, but I already have a good idea about how to do it, so maybe discuss first with me.)

@josteinaj

@bertfrees
Member Author

See liblouis#140

@bertfrees
Member Author

See email from Jamie on liblouis ML: Norwegian 8 dot tables

@josteinaj

@bertfrees I can have a look and split it into three tables in the nlb-norwegian-table-corrections branch of this repo.

@bertfrees
Member Author

OK. Not sure about the middle one either.

@bertfrees
Member Author

Main thing is that we delete the old one and replace it with a module that the other one can include.

@josteinaj

Have a look at this one:

4716447

The diff looks long but there aren't really many changes. The old 8-dot table was identical to the new one (except for two additions to the non-computer-braille list). There are now three tables: 8-dot computer braille, 8-dot braille, and 8-dot braille + fallback. Three tables is probably fine; everyone can choose for themselves which tables to expose.

@bertfrees
Member Author

Perfect!

@josteinaj

@bertfrees @usama49

It seems the commit with the table cleanup wasn't included in the 2.6.5-release of liblouis, so I created a new branch off liblouis/master with that commit and created a PR for it: liblouis#150

@bertfrees
Member Author

Yes I think that change must have been after the release. Thanks for the PR.
