Subject: CVS commit: pkgsrc/textproc/hunspell
From: Benny Siegert
Date: 2018-11-16 14:02:20
Message id: 20181116130220.9CB94FB1F@cvs.NetBSD.org
Log Message:
Update hunspell to 1.7.0.
Bump ABI_DEPENDS in bl3.mk.
New features and bug fixes by Laszlo Nemeth, supported by FSF.hu Foundation:
• No more annoying suggestion delays, especially in languages with
compound word handling and complex morphology. Thanks to balanced
multi-level time limits, suggestions are now guaranteed within half
a second, even for longer misspellings, instead of taking seconds
(or dozens of seconds or more in extreme cases).
• Add SPELLML support for run-time dictionary extension with optional
affixation of user words. See the new "Grammar By" feature of the
language-specific user dictionaries in LibreOffice 6.0:
News: https://wiki.documentfoundation.org/ReleaseNotes/6.0#.E2.80.9CGrammar_By.E2.80.9D_spell_checking
Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo
Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I
• Improved, highly customizable suggestions at the level of dictionary words:
pronunciations and typical misspellings defined by the optional "ph:" fields
of dictionary words are used not only in n-gram suggestions, but also as
elements of the REP replacement list, where they get the highest priority
in normal suggestions and also give the best suggestions for short words.
More information: see "ph:" in man 5 hunspell.
• Handling multiple-word suggestions is much easier. As in a traditional
spelling dictionary, to get the correct suggestion "a lot" in first place
for the typical misspelling "alot", it's now enough to add the following
line to the dic(tionary) file:
a lot
• Limit compound overgeneration with dictionary-based word pairs:
it's now possible to filter out bad compound words by listing the
correct word pairs, separated by a space, in the dictionary, as in a
traditional spelling dictionary.
• Clean-up of suggestions:
□ no n-gram or compound word suggestions if a "good" suggestion
exists, i.e. an uppercase, REP, ph: or dictionary word pair suggestion
□ word pairs are always suggested if they exist in the dic file
□ word pairs have top priority in suggestions, and they are the
only suggestions if there is no other good suggestion
□ dictionary word pairs separated by a dash instead of a space
are also handled specially in two-word suggestions (depending on the
language)
• Limit bad suggestions with improved n-gram suggestion rules:
don't suggest capitalized dictionary words for lowercase
misspellings in n-gram suggestions, except
□ with PHONE usage, or
□ in the case of German, where not only proper
nouns are capitalized, or
□ if the capitalized word has a special pronunciation;
and don't suggest words whose length differs from that of the
misspelling by 5 or more characters.
• Extend the dotless i and dotted I rules to the Crimean Tatar language:
allow dotted I in the dictionary, and disable the bad capitalization of i.
• BREAK: extended the recursive word breaking algorithm to handle words,
or words with suffixes, that already contain word break characters.
For example, "e-mail" is a dictionary word with a word break character,
and in some languages it wasn't accepted in compounds before.
• FORBIDDENWORD precedes BREAK: it's now possible to forbid compound
forms recognized by BREAK word breaking by adding the bad compounds to
the dictionary with the FORBIDDENWORD flag.
• Lower limit for the "doubletwochars" suggestion algorithm:
syllable duplication is one of the typical misspellings recognized by
the Hunspell suggestion mechanism. In addition to the old pattern
ABABA -> ABA, for example nutrITITIon -> nutrITIon, the simpler
ABAB -> AB pattern is now also recognized in non-starting positions,
for example regretTETEd -> regretTEd.
• Lower limit for longswapchar and movechar: only character distances
of at most 4 are considered, to avoid slow and bad suggestions.
• Fix compound handling for the new Hungarian orthography reform.
• Allow suggestion search for prefix + two suffixes:
remove an artificial performance limit to get correct
suggestions for relatively simple misspellings in
Hungarian etc. when the word form contains a prefix
and both derivational and inflectional suffixes:
lefikszálása -> lefixálása
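The suggestion time limits in the first item can be pictured as two nested budgets: one for the whole suggestion call and a smaller slice for each suggestion algorithm. The sketch below is a hypothetical Python model of that idea, not Hunspell's C++ code; the function name, the `generators` interface and the limits (0.5 s overall, taken from the release note, and an assumed 0.1 s per algorithm) are illustrative.

```python
import time
from typing import Callable, Iterable, List


def suggest_with_budget(
    word: str,
    generators: List[Callable[[str], Iterable]],
    total_limit: float = 0.5,  # overall budget, as in the release note
    step_limit: float = 0.1,   # assumed per-algorithm slice
    clock: Callable[[], float] = time.monotonic,
) -> list:
    """Collect candidates from each generator until its slice or the total budget runs out."""
    results: list = []
    overall_deadline = clock() + total_limit
    for generate in generators:
        # Each algorithm gets its own slice, never extending past the overall deadline.
        step_deadline = min(overall_deadline, clock() + step_limit)
        for candidate in generate(word):
            if clock() > step_deadline:
                break  # this algorithm has used up its time slice
            if candidate not in results:
                results.append(candidate)
        if clock() > overall_deadline:
            break  # the whole suggestion call is out of time
    return results
```

The `clock` parameter is injectable so the budget logic can be tested deterministically; in this model the clock is only checked between candidates, so a single slow candidate can still overrun its slice.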
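The "ph:" item says that pronunciation and misspelling variants attached to dictionary words are promoted to high-priority replacements. A simplified Python model, assuming one dictionary entry per line in the form `word[/FLAGS] [ph:variant ...]` and treating each variant as a whole-word replacement (Hunspell's real REP handling also works on substrings; see man 5 hunspell). The dictionary line `Wednesday ph:wendsay` and both helper names are illustrative.

```python
from typing import Dict, Iterable, List


def parse_ph_fields(dic_lines: Iterable[str]) -> Dict[str, str]:
    """Map each ph: variant to the dictionary word that carries it."""
    rep: Dict[str, str] = {}
    for line in dic_lines:
        fields = line.split()
        if not fields:
            continue
        word = fields[0].split("/", 1)[0]  # drop affix flags such as "/S"
        for field in fields[1:]:
            if field.startswith("ph:"):
                rep[field[len("ph:"):]] = word
    return rep


def rank_suggestions(misspelling: str, rep: Dict[str, str], others: List[str]) -> List[str]:
    """Put the ph:-based replacement first, then the other candidates without duplicates."""
    ranked = [rep[misspelling]] if misspelling in rep else []
    for suggestion in others:
        if suggestion not in ranked:
            ranked.append(suggestion)
    return ranked
```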
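For the word-pair item, a dictionary line containing a space, such as `a lot`, is enough to produce the two-word suggestion. A minimal sketch of the lookup, assuming plain dictionary lines without flags or morphological fields; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def build_pair_index(dic_lines: Iterable[str]) -> Dict[str, str]:
    """Index multi-word dictionary entries by their spelling without spaces."""
    index: Dict[str, str] = {}
    for line in dic_lines:
        entry = " ".join(line.split())  # normalize runs of whitespace
        if " " in entry:
            index[entry.replace(" ", "")] = entry
    return index


def pair_suggestions(misspelling: str, index: Dict[str, str]) -> List[str]:
    """Return the matching word pair, which should be ranked first."""
    return [index[misspelling]] if misspelling in index else []
```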
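The compound-limiting item can be read as: if a compound splits into two dictionary words that are also listed as a space-separated pair, the one-word compound is rejected. The sketch below is an interpretation of the release note, not the library's algorithm: it models only two-part compounds, ignores COMPOUNDFLAG and the other compounding rules, and uses English words purely for illustration.

```python
from typing import List, Set, Tuple


def two_part_splits(word: str, words: Set[str]) -> List[Tuple[str, str]]:
    """All ways to split word into two dictionary words."""
    return [(word[:i], word[i:]) for i in range(1, len(word))
            if word[:i] in words and word[i:] in words]


def accept_compound(word: str, words: Set[str], pairs: Set[str]) -> bool:
    """Accept a two-part compound unless one of its splits is a listed word pair."""
    splits = two_part_splits(word, words)
    if not splits:
        return False
    # A listed pair such as "ice cream" marks the closed form "icecream" as bad.
    return not any(f"{left} {right}" in pairs for left, right in splits)
```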
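The suggestion clean-up rules amount to a priority order: word pairs first, then the other "good" suggestions (uppercase, REP, ph:), and n-gram or compound suggestions only as a fallback when no good suggestion exists. A hypothetical Python sketch of that ordering; the input lists stand in for the output of the individual suggestion algorithms, and the language-dependent dash handling is omitted.

```python
from typing import Iterable, List


def _append_unique(out: List[str], items: Iterable[str]) -> None:
    """Append items to out, skipping ones already present."""
    for item in items:
        if item not in out:
            out.append(item)


def clean_up(pair_sugs: Iterable[str], good_sugs: Iterable[str],
             ngram_sugs: Iterable[str], compound_sugs: Iterable[str]) -> List[str]:
    """Order suggestions: word pairs, then other good ones; fall back to n-gram/compound."""
    result: List[str] = []
    _append_unique(result, pair_sugs)  # word pairs always come first
    _append_unique(result, good_sugs)  # uppercase, REP and ph: suggestions
    if result:
        return result  # a good suggestion exists: skip n-gram and compound ones
    _append_unique(result, ngram_sugs)
    _append_unique(result, compound_sugs)
    return result
```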
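The n-gram restrictions can be expressed as a filter applied to each n-gram candidate. In this sketch the parameter names `uses_phone`, `lang` and `special_pron` are hypothetical stand-ins for a PHONE table being present, the dictionary language, and the set of words that carry a ph: field; the 5-character threshold is the one given in the release note.

```python
from typing import AbstractSet


def keep_ngram_candidate(
    misspelling: str,
    candidate: str,
    uses_phone: bool = False,
    lang: str = "en",
    special_pron: AbstractSet[str] = frozenset(),
    max_len_diff: int = 5,
) -> bool:
    """Return True if an n-gram candidate passes the filters described above."""
    # Drop candidates whose length differs by max_len_diff or more characters.
    if abs(len(candidate) - len(misspelling)) >= max_len_diff:
        return False
    # Capitalized candidates for lowercase misspellings: only in the listed exceptions.
    if misspelling.islower() and candidate[:1].isupper():
        return uses_phone or lang.startswith("de") or candidate in special_pron
    return True
```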
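Crimean Tatar, like Turkish and Azerbaijani, pairs dotless ı with I and dotted i with İ, so the default case mapping (i to I) is wrong for it. The sketch below shows only that general Turkic case mapping in Python; it is not Hunspell's implementation.

```python
# Turkic case pairs: i <-> İ (U+0130) and ı (U+0131) <-> I.
_UPPER = {"i": "\u0130", "\u0131": "I"}
_LOWER = {"I": "\u0131", "\u0130": "i"}


def turkic_upper(text: str) -> str:
    """Uppercase text using Turkic dotted/dotless i rules."""
    return "".join(_UPPER.get(ch, ch.upper()) for ch in text)


def turkic_lower(text: str) -> str:
    """Lowercase text using Turkic dotted/dotless i rules."""
    return "".join(_LOWER.get(ch, ch.lower()) for ch in text)
```

An explicit table is needed because Python's default `"\u0130".lower()` yields "i" followed by a combining dot (U+0307) rather than a plain "i".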
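The BREAK and FORBIDDENWORD items can be modeled together: try every break-character position (not only the first), check both sides recursively, and let a forbidden entry win before any breaking is attempted. A simplified sketch that supports only a plain "-" break string (Hunspell's BREAK patterns also allow ^ and $ anchors) and uses hypothetical set-based dictionaries:

```python
from typing import AbstractSet, Sequence


def check(word: str, dictionary: AbstractSet[str], forbidden: AbstractSet[str],
          breaks: Sequence[str] = ("-",)) -> bool:
    """Accept word if it is in the dictionary or splits at break strings into accepted parts."""
    if word in forbidden:
        return False  # FORBIDDENWORD takes precedence over BREAK
    if word in dictionary:
        return True
    for brk in breaks:
        pos = word.find(brk)
        while pos != -1:
            left, right = word[:pos], word[pos + len(brk):]
            # Trying every position lets a part like "e-mail" stay whole.
            if (left and right
                    and check(left, dictionary, forbidden, breaks)
                    and check(right, dictionary, forbidden, breaks)):
                return True
            pos = word.find(brk, pos + 1)
    return False
```

Splitting "e-mail-address" only at its first hyphen would leave "e" and "mail-address", which fails; trying later positions finds "e-mail" + "address".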
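The "doubletwochars" item can be illustrated by scanning for positions where a character equals the one two places earlier. This is a sketch based on our reading of the rule, not code copied from Hunspell's suggestion manager: a run of three such matches is the old ABABA pattern, and a run of two is the new ABAB pattern, accepted only when it does not start at the beginning of the word.

```python
from typing import List


def doubletwochars(word: str) -> List[str]:
    """Candidates that drop one duplicated two-character syllable."""
    candidates: List[str] = []
    state = 0  # length of the current run where word[i] == word[i - 2]
    for i in range(2, len(word)):
        if word[i] == word[i - 2]:
            state += 1
            # ABABA anywhere (old rule), or ABAB starting after position 0 (new rule).
            if state == 3 or (state == 2 and i >= 4):
                candidates.append(word[:i - 1] + word[i + 1:])
                state = 0
        else:
            state = 0
    return candidates
```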
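The longswapchar and movechar limit can be sketched as candidate generators with a `max_dist` parameter. Reading "max. 4-character distances" as "positions at most 4 apart" is an assumption; the exact bound in Hunspell's source may differ. Adjacent swaps and moves by one position are left out because they are separate suggestion methods.

```python
from typing import List


def longswapchar(word: str, max_dist: int = 4) -> List[str]:
    """Swap two non-adjacent characters that are at most max_dist apart."""
    out: List[str] = []
    for i in range(len(word)):
        for j in range(i + 2, min(len(word), i + max_dist + 1)):
            if word[i] != word[j]:
                chars = list(word)
                chars[i], chars[j] = chars[j], chars[i]
                out.append("".join(chars))
    return list(dict.fromkeys(out))  # drop duplicates, keep order


def movechar(word: str, max_dist: int = 4) -> List[str]:
    """Move one character by 2..max_dist positions forward or backward."""
    out: List[str] = []
    for i in range(len(word)):
        for j in range(max(0, i - max_dist), min(len(word), i + max_dist + 1)):
            if abs(j - i) >= 2:
                chars = list(word)
                chars.insert(j, chars.pop(i))
                out.append("".join(chars))
    return list(dict.fromkeys(out))  # drop duplicates, keep order
```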
Improvements for command-line Hunspell:
• Remove false alarms when checking OpenDocument (ODF)
documents by ignoring <text:span> elements. (LibreOffice
creates a lot of <text:span> elements even within words
during text re-editing, which before this fix often resulted
in a huge number of broken words.)
• List filenames when filtering multiple files on the command line:
Examples:
$ hunspell -l *.odt
a.odt: mispelling
b.odt: egzample
$ hunspell -l -G *.odt
a.odt: good
b.odt: words
• Dictionary search with option -D no longer waits for standard input
(fixed by Siva Mahadevan)
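The ODF item means that span markup inside a word must not split the word. A minimal Python sketch of the idea, assuming the document text is available as a content.xml string: remove `<text:span>` start tags (with any attributes) and `</text:span>` end tags while keeping their text, before tokenizing. Hunspell's own ODF handling is not reproduced here.

```python
import re

# Matches <text:span ...>, </text:span> and self-closing <text:span/> tags.
SPAN_TAG = re.compile(r"</?text:span\b[^>]*>")


def strip_spans(xml_text: str) -> str:
    """Remove text:span tags but keep the text they enclose."""
    return SPAN_TAG.sub("", xml_text)
```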
Other improvements:
• makealias dictionary compression: add option --minimize-diff
to reuse free positions of alias lists, creating minimal and
readable diffs for alias-compressed dictionaries stored in
revision control systems, such as the LibreOffice dictionaries.
• Brazilian-Portuguese translation by Rafael Fontenelle
• Catalan translation by robert dot buj at gmail
• Minor bug fixes by several contributors, see git log
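The --minimize-diff idea can be illustrated with a toy model of an alias (AF) table: flag sets that are still used keep their position, positions of unused flag sets are recycled for new ones, and only then does the table grow. This is a hypothetical Python sketch; it does not reproduce the actual makealias script.

```python
from typing import Iterable, List


def update_alias_table(old_table: List[str], used_flagsets: Iterable[str]) -> List[str]:
    """Reuse positions of unused flag sets so existing alias numbers stay stable."""
    table = list(old_table)
    used = list(dict.fromkeys(used_flagsets))  # dedupe, keep first-seen order
    free = [i for i, flags in enumerate(table) if flags not in used]
    for flags in used:
        if flags in table:
            continue  # already has an alias number: keep it
        if free:
            table[free.pop(0)] = flags  # recycle a free position
        else:
            table.append(flags)  # no free position left: grow the table
    # Free positions that were not recycled keep their old, now unused, entry.
    return table
```

In this model, alias numbers in the .aff file are 1-based, so position i corresponds to alias i + 1; keeping unused entries in place is what leaves the surrounding lines unchanged in a diff.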
Files: