I would like to ask whether there is any documentation on the hyphenation algorithm, particularly on the NEXTLEVEL keyword.
I don't understand how it is used. Does it divide the patterns into two groups, where the first is used to hyphenate non-compound words and the second to hyphenate compound words? How can I know whether a word is a compound? I've read https://github.com/hunspell/hyphen/blob/a7255913300734655691fc3e8ce20041d611fbdb/README.compound but I don't quite understand what is going on.
Where it says "Hyphen, apostrophe and other characters may be word boundary characters, but they don't need (extra) hyphenation. [...] Without explicite NEXTLEVEL declaration, Hyphen 2.8 uses the previous settings, plus in UTF-8 encoding, endash (U+2013) and typographical apostrophe (U+2019) are NOHYPHEN characters, too.", does that mean that the hyphen and apostrophe (and additionally the en dash and typographical apostrophe if no NEXTLEVEL keyword is present) define a break point by default, without any patterns being checked?
Where it says
"ISO8859-1
NOHYPHEN -,'
1-1
1'1
NEXTLEVEL
Description:
1-1 and 1'1 declare hyphen and apostrophe as word boundary characters
and NOHYPHEN with the comma separated character (or character sequence)
list forbid the (extra) hyphens at the hyphen and apostrophe characters."
What does "(extra)" mean? If I leave out the NOHYPHEN -,' line, will there be an extra hyphen?
Where it says
"The algorithm is recursive: every word parts of a successful
first (compound) level hyphenation will be rehyphenated
by the same (first) pattern set.
Finally, when first level hyphenation is not possible, Hyphen uses
the second level hyphenation for the word or the word parts."
Does that mean that, if the NEXTLEVEL keyword is present, the algorithm scans the first pattern set twice and then uses the second set for the "sub-words" that were not split again the second time? Do I understand correctly?
Thank you
You opened the issue in the wrong place: this repository is for spell checking. It seems you have already opened an issue in the right one, hunspell/hyphen#16.
Your best bet is to read the source code and make some sense of it.
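Until someone confirms this against the source, here is a rough Python sketch of how the quoted README passages can be read. It is not the library's code: the function names, the toy pattern sets, and the `=` break marker are all made up for illustration, the left/right minimum lengths are ignored, and the NOHYPHEN step (dropping breaks directly before and after a listed string) is only one interpretation of "(extra)".

```python
def parse_pattern(pat):
    """Split a Liang-style pattern such as '1-1' into its letters and gap weights."""
    letters, weights = "", [0]
    for ch in pat:
        if ch.isdigit():
            weights[-1] = int(ch)      # weight of the gap before the next letter
        else:
            letters += ch
            weights.append(0)
    return letters, weights


def break_points(word, patterns):
    """Indices i where a break is allowed between word[i-1] and word[i]."""
    text = "." + word + "."            # dots mark the word edges
    scores = [0] * (len(text) + 1)     # scores[j] = gap before text[j]
    for pat in patterns:
        letters, weights = parse_pattern(pat)
        start = text.find(letters)
        while start != -1:
            for k, w in enumerate(weights):
                scores[start + k] = max(scores[start + k], w)
            start = text.find(letters, start + 1)
    # An odd score allows a break; text is shifted by one because of the dot.
    return [i for i in range(1, len(word)) if scores[i + 1] % 2 == 1]


def hyphenate(word, level1, level2):
    """Two-level hyphenation as the README describes it (assumed reading)."""
    cuts = break_points(word, level1)
    if not cuts:
        # First-level hyphenation is not possible: use the second level.
        return break_points(word, level2)
    result = []
    bounds = [0] + cuts + [len(word)]
    for begin, end in zip(bounds, bounds[1:]):
        if begin > 0:
            result.append(begin)
        # Each part is rehyphenated, starting again with the first level.
        result.extend(begin + c for c in hyphenate(word[begin:end], level1, level2))
    return result


def drop_nohyphen(word, cuts, nohyphen):
    """Remove breaks directly before or after any NOHYPHEN string (assumed)."""
    banned = set()
    for nh in nohyphen:
        p = word.find(nh)
        while p != -1:
            banned.update((p, p + len(nh)))
            p = word.find(nh, p + 1)
    return [c for c in cuts if c not in banned]


def render(word, cuts, mark="="):
    """Show break points with a visible marker."""
    pieces, prev = [], 0
    for c in cuts:
        pieces.append(word[prev:c])
        prev = c
    pieces.append(word[prev:])
    return mark.join(pieces)


LEVEL1 = ["1-1", "1'1"]    # patterns before NEXTLEVEL in the README example
LEVEL2 = ["o1d"]           # hypothetical ordinary pattern after NEXTLEVEL
word = "post-modern"
raw = hyphenate(word, LEVEL1, LEVEL2)
print(render(word, raw))                                   # post=-=mo=dern
print(render(word, drop_nohyphen(word, raw, ["-", "'"])))  # post-mo=dern
```

Under this reading, `1-1` only splits the word into parts at the hyphen, each part then falls through to the second-level patterns, and NOHYPHEN keeps the splits around the existing hyphen from being reported as "(extra)" break points.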