./textproc/hunspell, Improved spellchecker

Branch: CURRENT, Version: 1.7.0nb4, Package name: hunspell-1.7.0nb4, Maintainer: ahoka

Hunspell is the default spell checker of the OpenOffice.org office suite
and the prospective spell checker of Mozilla Firefox and Thunderbird.

Main features:

* Unicode support.
* Conditional and multiple affixes for languages with rich morphology.
* Extended compound word support.
* Morphological analysis (in custom item and arrangement style).
* Hunspell is based on MySpell and also works with MySpell dictionaries.
* GPL/LGPL/MPL tri-license

Required to run:

Required to build:

Package options: wide-curses

Master sites:

SHA1: e42ea8342a191b9cd7da57d0d6ad4ae1566c5dcc
RMD160: 52c7dbf21f460a0b61ea7d0378ef314773887fde
Filesize: 470.855 KB
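
The SHA1 value listed above can be checked against a downloaded distfile. The following is a minimal Python sketch, not part of pkgsrc itself; the function names are hypothetical, and the path of the downloaded file is left to the caller because the listing does not name the distfile.

```python
import hashlib

# SHA1 value copied from the listing above.
EXPECTED_SHA1 = "e42ea8342a191b9cd7da57d0d6ad4ae1566c5dcc"


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """True if the file's SHA1 digest equals the expected hex string."""
    return sha1_of_file(path) == expected.lower()
```

Calling `matches_expected()` with the path of the fetched archive returns True only if the digest agrees with the value shown in this listing.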

CVS history:

   2021-05-03 11:46:59 by Thomas Klausner | Files touched by this commit (2) | Package updated
Log message:
hunspell: enable wide curses support by default

   2021-01-05 08:40:13 by Thomas Klausner | Files touched by this commit (1)
Log message:
hunspell: needs autopoint to build now
   2021-01-04 15:46:26 by Ryo ONODERA | Files touched by this commit (1)
Log message:
hunspell: Fix locally modified errors in pre-configure stage
   2020-08-31 20:13:29 by Thomas Klausner | Files touched by this commit (3631) | Package updated
Log message:
*: bump PKGREVISION for perl-5.32.
   2020-08-03 13:19:28 by Thomas Klausner | Files touched by this commit (3) | Package updated
Log message:
hunspell: fix CVE-2019-16707 using upstream patch

   2019-08-11 15:25:21 by Thomas Klausner | Files touched by this commit (3557) | Package updated
Log message:
Bump PKGREVISIONs for perl 5.30.0
   2018-11-16 14:02:20 by Benny Siegert | Files touched by this commit (5) | Package updated
Log message:
Update hunspell to 1.7.0.

Bump ABI_DEPENDS in bl3.mk.

New features and bug fixes by Laszlo Nemeth, supported by FSF.hu Foundation:

  • No more annoyingly long suggestion times, especially in languages with
    compound word handling and complex morphology. Thanks to balanced
    multi-level time limits, suggestions are now guaranteed within half
    a second, not seconds (or dozens of seconds or more in extreme
    cases), even for longer misspellings.

  • add SPELLML support for run-time dictionary extension with optional
    affixation of user words. See new "Grammar By" feature of
    language-specific user dictionaries of LibreOffice 6.0:

    News: https://wiki.documentfoundation.org/Rel … l_checking

    Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo

    Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I

  • Improved, highly customizable suggestions at the level of dictionary words:
    Pronunciations and typical misspellings defined by the optional "ph:" fields
    of dictionary words are used not only in n-gram suggestions, but also as
    elements of the REP replacement list, which gets the highest priority in
    normal suggestions. This also gives the best suggestions for short words.
    More information: see "ph:" in man 5 hunspell.

  • Handling multiple-word suggestions is much easier. As in a
    traditional spelling dictionary, for example, to get the correct suggestion
    "a lot" for the typical misspelling "alot" in first place, it is now
    enough to put the following line in the dic(tionary) file:

    a lot

  • Limit compound overgeneration by dictionary based word pairs:
    Now it's possible to filter bad compound words by listing
    the correct word pairs with space in the dictionary, as in a traditional
    spelling dictionary.

  • clean-up suggestion:

      □ no n-gram and compound word suggestions, if a "good" suggestion
        exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions

      □ word pairs are always suggested, if they exist in the dic file

      □ word pairs have top priority in suggestions, and
        these are the only suggestions if there is no other good suggestion.

      □ also dictionary word pairs separated by dash instead of space
        are handled specially in two-word suggestion (depending from the

  • limit bad suggestions by improved n-gram suggestion rules:

    don't suggest capitalized dictionary words for lower
    case misspellings in n-gram suggestions, except

      □ PHONE usage, or
      □ in the case of German, where not only proper
        nouns are capitalized, or
      □ the capitalized word has special pronunciation

    and don't suggest if the difference of lengths of misspellings and
    suggestions is 5 or more characters.

  • Extend dotless i and dotted I rules to Crimean Tatar language
    Allow dotted I in dictionary, and disable bad capitalization of i.

  • BREAK: extended recursive word breaking algorithm to handle words or
    words with suffixes when they already contain word break characters,
    for example, "e-mail" is a dictionary word with a word break character, and
    it wasn't accepted before in compounds in some languages.

  • FORBIDDENWORD precedes BREAK: Now it's possible to forbid compound
    forms recognized by BREAK word breaking by adding the bad compounds to
    the dictionary with FORBIDDENWORD flags.

  • lower limit for "doubletwochars" suggestion algorithm:
    one of the typical misspellings recognized by the Hunspell suggestion
    mechanism is syllable duplication. Alongside the old pattern
    ABABA -> ABA, for example nutrITITIon -> nutrITIon, now also the
    simpler ABAB -> AB pattern is recognized in non-starting position,
    for example, regretTETEd -> regretTEd.

  • lower limit for longswapchar and movechar: only distances of at most
    4 characters are recognized, to avoid slow and bad suggestions.

  • fix compound handling for new Hungarian orthography reform

  • Allow suggestion search for prefix + two suffixes:
    Remove artificial performance limit to get correct
    suggestions for relatively simple misspellings in
    Hungarian, etc., when the word form contains prefix
    and both derivative and inflectional suffixes, too:

    lefikszálása -> lefixálása
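
The "doubletwochars" item above describes two patterns, ABABA -> ABA and ABAB -> AB in non-starting position. The following Python sketch illustrates only those two patterns as stated in this log message; it is not Hunspell's actual implementation, and the function name `doubled_syllable_candidates` is hypothetical.

```python
def doubled_syllable_candidates(word):
    """Return candidates with one duplicated two-character unit removed.

    Illustration only, covering the two patterns named in the log:
      ABABA -> ABA  (accepted at any position in this sketch)
      ABAB  -> AB   (accepted only when not at the start of the word)
    """
    candidates = []
    n = len(word)
    for i in range(n - 3):
        a, b = word[i], word[i + 1]
        # Look for the two-character unit AB repeated immediately: ABAB.
        if word[i + 2] == a and word[i + 3] == b:
            followed_by_a = i + 4 < n and word[i + 4] == a  # ABABA
            if followed_by_a or i > 0:
                # Drop the second copy of AB.
                candidate = word[:i + 2] + word[i + 4:]
                if candidate not in candidates:
                    candidates.append(candidate)
    return candidates
```

With the examples from the log, "nutritition" yields "nutrition" and "regretteted" yields "regretted", while a bare ABAB at the very start of a word produces no candidate.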

Improvements for command-line Hunspell:

  • Remove false alarms when checking OpenDocument (ODF)
    documents by ignoring <text:span> elements. (LibreOffice
    creates a lot of <text:span> elements, also within words,
    during text re-editing, which often resulted in a huge number
    of broken words before this fix.)

  • List filenames when filtering multiple files on the command line:

    $ hunspell -l *.odt
    a.odt: mispelling
    b.odt: egzample

    $ hunspell -l -G *.odt
    a.odt: good
    b.odt: words

  • Dictionary search by option -D doesn't wait for the standard input
    (fixed by Siva Mahadevan)
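
The multi-file listing shown above prints one "filename: word" pair per line. The following Python sketch groups such output by file. It assumes the output format is exactly what the example in this log shows, and the function name `group_by_file` is hypothetical.

```python
from collections import defaultdict


def group_by_file(output):
    """Group 'filename: word' lines into {filename: [words]}.

    Assumes one 'filename: word' pair per line, as in the example in
    the log message; lines without that separator are skipped.
    """
    grouped = defaultdict(list)
    for line in output.splitlines():
        # Split on the last ': ' so a colon inside a filename is kept.
        filename, sep, word = line.rpartition(": ")
        if not sep:
            continue
        grouped[filename].append(word.strip())
    return dict(grouped)
```

The text passed in would typically be the captured standard output of a `hunspell -l` run over several files.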

Other improvements:

  • makealias dictionary compression: add option --minimize-diff
    to reuse free positions of alias lists to create minimal and
    readable diffs for alias compressed dictionaries stored in
    revision control systems, such as the dictionaries of LibreOffice.

  • Brazilian-Portuguese translation by Rafael Fontenelle

  • Catalan translation by robert dot buj at gmail

  • Minor bug fixes by several contributors, see git log
   2018-10-26 09:43:05 by Leonardo Taccari | Files touched by this commit (1)
Log message:
hunspell: Simplify distfile handling (NFC)

GITHUB_PROJECT by default is already PKGBASE, no need to reinitialize it.
Remove commented out WRKSRC while here.