question about the hyphenator #21

Closed
mtrevisan opened this issue Jul 3, 2018 · 2 comments
Labels
invalid (This doesn't seem right), question (Further information is requested)


mtrevisan commented Jul 3, 2018

Issue type:

  • Others, questions

I would like to ask whether there is any documentation on the hyphenation algorithm, particularly on the NEXTLEVEL keyword.
I don't understand how it is used. Does it divide the patterns into two groups, where the first is used to hyphenate non-compound words and the second to hyphenate compound words? And how can I tell whether a word is a compound? I've read https://github.com/hunspell/hyphen/blob/a7255913300734655691fc3e8ce20041d611fbdb/README.compound but I don't quite understand how things work.
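To make the question concrete, here is a minimal Python sketch of how a file in the README.compound layout could be split into pattern sets. It assumes only what the quoted example shows: the first line names the encoding, a `NOHYPHEN` line holds a comma-separated list, and each `NEXTLEVEL` line starts a new pattern set, the first set being the compound level. `PatternFile` and `parse_pattern_file` are hypothetical names, not Hyphen's API, and real dictionary files may contain keywords this sketch does not handle.

```python
from dataclasses import dataclass, field


@dataclass
class PatternFile:
    """Hypothetical in-memory view of a Hyphen-style pattern file."""
    encoding: str
    nohyphen: list[str] = field(default_factory=list)
    # levels[0] is the first (compound) level; each NEXTLEVEL opens another set.
    levels: list[list[str]] = field(default_factory=lambda: [[]])


def parse_pattern_file(text: str) -> PatternFile:
    # Ignore blank lines; the first remaining line names the encoding.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    result = PatternFile(encoding=lines[0])
    for line in lines[1:]:
        if line.startswith("NOHYPHEN"):
            # Comma-separated characters (or sequences) that get no extra hyphen.
            result.nohyphen = line.split(None, 1)[1].split(",")
        elif line == "NEXTLEVEL":
            # Patterns after this keyword belong to the next pattern set.
            result.levels.append([])
        else:
            result.levels[-1].append(line)
    return result


# The README.compound excerpt quoted in this issue (it stops at NEXTLEVEL).
sample = """ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL
"""
pf = parse_pattern_file(sample)
print(pf.encoding, pf.nohyphen, pf.levels)
```

Here `1-1` and `1'1` land in the first set and the second set is empty, because the excerpt does not show the ordinary patterns that would follow `NEXTLEVEL`.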

Where it says "Hyphen, apostrophe and other characters may be word boundary characters, but they don't need (extra) hyphenation. [...] Without explicite NEXTLEVEL declaration, Hyphen 2.8 uses the previous settings, plus in UTF-8 encoding, endash (U+2013) and typographical apostrophe (U+2019) are NOHYPHEN characters, too.", does that mean that the hyphen and apostrophe (plus the en dash and typographical apostrophe when no NEXTLEVEL keyword is present) define a break point by default, without checking any patterns?
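One way to read the quoted sentence, written as a hypothetical Python helper and assuming "the previous settings" means the `NOHYPHEN -,'` declaration from the README example: with an explicit `NEXTLEVEL`, only the declared `NOHYPHEN` list applies; without it, the hyphen and apostrophe apply, plus U+2013 and U+2019 in UTF-8. This models only the NOHYPHEN list, not whether Hyphen consults patterns at these characters (which is the open question here). `effective_nohyphen` is an illustrative name, and the behaviour should be checked against the Hyphen source.

```python
def effective_nohyphen(encoding: str, has_nextlevel: bool, declared: list[str]) -> list[str]:
    """Hypothetical reading of README.compound's NOHYPHEN defaults."""
    if has_nextlevel:
        # Explicit NEXTLEVEL: only what the file itself declares applies.
        return list(declared)
    # No NEXTLEVEL: assume the README example's "NOHYPHEN -,'" settings...
    merged = ["-", "'"]
    if encoding.upper() == "UTF-8":
        # ...plus en dash (U+2013) and typographical apostrophe (U+2019) in UTF-8.
        merged += ["\u2013", "\u2019"]
    # Assumption: anything the file declares itself is kept as well.
    merged += [item for item in declared if item not in merged]
    return merged


print(effective_nohyphen("UTF-8", False, []))
print(effective_nohyphen("ISO8859-1", True, ["-"]))
```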

Where it says:

"ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL

Description:
1-1 and 1'1 declare hyphen and apostrophe as word boundary characters
and NOHYPHEN with the comma separated character (or character sequence)
list forbid the (extra) hyphens at the hyphen and apostrophe characters."

What does "(extra)" mean here? If I leave out the NOHYPHEN -,' line, will an extra hyphen be inserted?
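Hyphen's pattern files use Liang's scheme (the one TeX uses): each digit in a pattern scores the gap where it sits, the highest score at each gap wins, and an odd score permits a break. The minimal Python sketch below applies that rule to show why `1-1` marks breaks on both sides of the hyphen in a word like `foo-bar`. It covers only standard Liang matching; how Hyphen then applies NOHYPHEN at those two positions is not modelled, and the function names and the `=` break marker are illustrative.

```python
def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a Liang pattern such as '1-1' or 'a1b' into its characters and
    the digit before each character and after the last one (0 if absent)."""
    chars, values = "", [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            chars += ch
            values.append(0)
    return chars, values


def hyphenation_values(word: str, patterns: list[str]) -> list[int]:
    """Score each gap between adjacent characters of `word`: the maximum
    digit any matching pattern puts there. Odd scores permit a break."""
    padded = "." + word + "."  # '.' marks the word edges, as in TeX patterns
    points = [0] * (len(padded) + 1)  # points[i]: gap just before padded[i]
    for pattern in patterns:
        chars, values = parse_pattern(pattern)
        start = padded.find(chars)
        while start != -1:
            for offset, value in enumerate(values):
                points[start + offset] = max(points[start + offset], value)
            start = padded.find(chars, start + 1)
    # Keep only the gaps strictly inside the word.
    return points[2:-2]


def show_breaks(word: str, values: list[int]) -> str:
    # Mark every permitted break with '=' (an illustrative marker only).
    out = word[0]
    for ch, value in zip(word[1:], values):
        out += ("=" if value % 2 else "") + ch
    return out


patterns = ["1-1", "1'1"]
for word in ["foo-bar", "l'amore"]:
    print(show_breaks(word, hyphenation_values(word, patterns)))  # foo=-=bar, then l='=amore
```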

Where it says:

"The algorithm is recursive: every word parts of a successful
first (compound) level hyphenation will be rehyphenated
by the same (first) pattern set.

Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts."

Does that mean that, when NEXTLEVEL is present, the algorithm applies the first pattern set twice and then uses the second set for the "sub-words" that were not split again on the second pass? Have I understood correctly?
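The reading described above can be written as a small recursive Python sketch under stated assumptions: a "level" is modelled as a function returning split indices, the first (compound) level is re-applied to every part it produces, and the second level is used only for parts the first level cannot split. `two_level_hyphenate`, `toy_first` and `toy_second` are hypothetical; the toy levels stand in for real pattern sets, and this is one interpretation of README.compound, not Hyphen's implementation.

```python
from typing import Callable

# Hypothetical interface: a "level" returns the indices where a word may be split.
Level = Callable[[str], list[int]]


def split_at(word: str, cuts: list[int]) -> list[str]:
    bounds = [0, *sorted(set(cuts)), len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


def two_level_hyphenate(word: str, first: Level, second: Level) -> list[str]:
    # Only cuts strictly inside the word count, so each recursion shrinks the word.
    cuts = [c for c in first(word) if 0 < c < len(word)]
    if not cuts:
        # The first (compound) level found nothing: fall back to the second level.
        return split_at(word, [c for c in second(word) if 0 < c < len(word)])
    parts = []
    for part in split_at(word, cuts):
        # Each part is rehyphenated by the same first-level pattern set.
        parts.extend(two_level_hyphenate(part, first, second))
    return parts


def toy_first(word: str) -> list[int]:
    # Toy stand-in for a compound level: break on both sides of every '-'.
    return [i for i in range(1, len(word)) if "-" in (word[i - 1], word[i])]


def toy_second(word: str) -> list[int]:
    # Toy stand-in for an ordinary level: a break after every second character.
    return list(range(2, len(word), 2))


print(two_level_hyphenate("foo-bar", toy_first, toy_second))  # ['fo', 'o', '-', 'ba', 'r']
```

Under this reading, a word is treated as a compound simply when the first-level patterns find a break in it.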

Thank you

Contributor

dimztimz commented Jul 3, 2018

You opened the issue in the wrong place: this repository is for spell checking. It seems you have already opened an issue in the right one: hunspell/hyphen#16.
Your best bet is to read the source code and make some sense of it.

dimztimz closed this as completed on Jul 3, 2018
Author

I was afraid of that answer... Thank you anyway!

dimztimz added the invalid and question labels on Feb 7, 2020