./textproc/hunspell, Improved spellchecker

Branch: CURRENT, Version: 1.7.0, Package name: hunspell-1.7.0, Maintainer: ahoka

Hunspell is the default spell checker of the OpenOffice.org office suite
and the prospective spell checker of Mozilla Firefox and Thunderbird.

Main features:

* Unicode support.
* Conditional and multiple affixes for languages with rich morphology.
* Extended compound word support.
* Morphological analysis (in custom item and arrangement style).
* Hunspell is based on MySpell and works also with MySpell dictionaries.
* GPL/LGPL/MPL tri-license


Required to run:
[lang/perl5]

Required to build:
[pkgtools/cwrappers]

Master sites:

SHA1: e42ea8342a191b9cd7da57d0d6ad4ae1566c5dcc
RMD160: 52c7dbf21f460a0b61ea7d0378ef314773887fde
Filesize: 470.855 KB

CVS history:


   2018-11-16 14:02:20 by Benny Siegert | Files touched by this commit (5) | Package updated
Log message:
Update hunspell to 1.7.0.

Bump ABI_DEPENDS in bl3.mk.

New features and bug fixes by Laszlo Nemeth, supported by FSF.hu Foundation:

  • No more annoyingly long suggestion times, especially in languages with
    compound word handling and complex morphology. Balanced multi-level
    time limits now guarantee suggestions within half a second, even for
    longer misspellings, instead of seconds (or dozens of seconds or more
    in extreme cases).

  • add SPELLML support for run-time dictionary extension with optional
    affixation of user words. See new "Grammar By" feature of
    language-specific user dictionaries of LibreOffice 6.0:

    News: https://wiki.documentfoundation.org/Rel … l_checking

    Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo

    Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I

  • Improved, highly customizable suggestions at the level of dictionary words:
    Pronunciations and typical misspellings defined by the optional "ph:"
    fields of the dictionary words are used not only in n-gram suggestions,
    but also as elements of the REP replacement list, which gets the highest
    priority in normal suggestions. This also gives the best suggestions for
    short words.
    More information: see "ph:" in man 5 hunspell.

  • Handling multiple-word suggestions is much easier. As in a traditional
    spelling dictionary, to get the correct suggestion "a lot" in first
    place for the typical misspelling "alot", it is now enough to put the
    following line in the dic(tionary) file:

    a lot

  • Limit compound overgeneration by dictionary based word pairs:
    Now it's possible to filter bad compound words by listing
    the correct word pairs with space in the dictionary, as in a traditional
    spelling dictionary.

  • clean-up suggestion:

      □ no n-gram and compound word suggestions if a "good" suggestion
        exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions

      □ word pairs are always suggested, if they exist in the dic file

      □ word pairs have top priority in suggestions, and
        these are the only suggestions if there is no other good suggestion.

      □ dictionary word pairs separated by a dash instead of a space
        are also handled specially in two-word suggestions (depending on
        the language)

  • limit bad suggestions by improved n-gram suggestion rules:

    don't suggest capitalized dictionary words for lower
    case misspellings in n-gram suggestions, except

      □ PHONE usage, or
      □ in the case of German, where not only proper
        nouns are capitalized, or
      □ the capitalized word has special pronunciation

    and don't suggest a word if its length differs from that of the
    misspelling by 5 or more characters.

  • Extend dotless i and dotted I rules to the Crimean Tatar language:
    allow dotted I in the dictionary, and disable bad capitalization of i.

  • BREAK: extended recursive word breaking algorithm to handle words or
    words with suffixes when they already contain word break characters.
    For example, "e-mail" is a dictionary word with a word break character,
    and it wasn't accepted before in compounds in some languages.

  • FORBIDDENWORD precedes BREAK: Now it's possible to forbid compound
    forms recognized by BREAK word breaking by adding the bad compounds to
    the dictionary with FORBIDDENWORD flags.

  • lower limit for "doubletwochars" suggestion algorithm:
    one of the typical misspellings recognized by the Hunspell suggestion
    mechanism is syllable duplication. Alongside the old pattern
    ABABA -> ABA, for example nutrITITIon -> nutrITIon, the simpler
    ABAB -> AB pattern is now also recognized in non-starting position,
    for example, regretTETEd -> regretTEd.

  • lower limit for longswapchar and movechar: only distances of at most
    4 characters are recognized, to avoid slow and bad suggestions.

  • fix compound handling for the new Hungarian orthography reform

  • Allow suggestion search for prefix + two suffixes:
    Remove artificial performance limit to get correct
    suggestions for relatively simple misspellings in
    Hungarian, etc., when the word form contains prefix
    and both derivative and inflectional suffixes, too:

    lefikszálása -> lefixálása
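
The two syllable-duplication patterns named in the "doubletwochars" item
above can be illustrated with a short Python sketch. This is not Hunspell's
implementation: the function name is hypothetical, and the sketch only
generates candidate strings, whereas a real suggestion mechanism would also
check each candidate against the dictionary.

```python
def double_two_chars_candidates(word):
    """Return candidates with one copy of a doubled two-character unit removed.

    Illustrative only (hypothetical helper, not Hunspell code):
    - ABABA -> ABA is tried at any position,
    - ABAB  -> AB  is tried only in non-starting position.
    """
    candidates = []
    for i in range(len(word) - 3):
        # Look for a two-character unit that is immediately repeated.
        if word[i:i + 2] != word[i + 2:i + 4]:
            continue
        # ABABA: the repeated unit is followed by its first character again.
        is_ababa = i + 4 < len(word) and word[i + 4] == word[i]
        # The simpler ABAB pattern is skipped at the start of the word.
        if i == 0 and not is_ababa:
            continue
        candidate = word[:i] + word[i + 2:]
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


print(double_two_chars_candidates("nutritition"))
print(double_two_chars_candidates("regretteted"))
```

With the two examples from the log message, the sketch produces
"nutrition" and "regretted" as its only candidates.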

Improvements for command-line Hunspell:

  • Remove false alarms when checking OpenDocument (ODF)
    documents by ignoring <text:span> elements. (LibreOffice
    creates a lot of <text:span> elements, also within words,
    during text re-editing, which often resulted in a huge number
    of broken words before this fix.)

  • List filenames when filtering multiple files on the command line:

    Examples:

    $ hunspell -l *.odt
    a.odt: mispelling
    b.odt: egzample

    $ hunspell -l -G *.odt
    a.odt: good
    b.odt: words

  • Dictionary search by option -D doesn't wait for the standard input
    (fixed by Siva Mahadevan)
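
The "filename: word" lines shown in the examples above are easy to
post-process. The following Python sketch groups such lines by file. It
assumes the output has exactly the shape shown in the examples; the function
name is hypothetical, and the sketch works on a captured string rather than
running hunspell itself.

```python
from collections import defaultdict


def group_misspellings(output):
    """Group 'filename: word' lines by filename, keeping the reported order."""
    by_file = defaultdict(list)
    for line in output.splitlines():
        # Split on the last ": " so that a colon inside a filename is kept.
        filename, sep, word = line.rpartition(": ")
        if sep and filename and word:
            by_file[filename].append(word.strip())
    return dict(by_file)


# Sample text copied from the "hunspell -l *.odt" example above.
sample = "a.odt: mispelling\nb.odt: egzample\n"
print(group_misspellings(sample))
```

Lines without a "filename: " prefix, such as the plain word list printed
when only a single file is checked, are ignored by this sketch.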

Other improvements:

  • makealias dictionary compression: add option --minimize-diff
    to reuse free positions of alias lists to create minimal and
    readable diffs for alias compressed dictionaries stored in
    revision control systems, such as the dictionaries of LibreOffice.

  • Brazilian-Portuguese translation by Rafael Fontenelle

  • Catalan translation by robert dot buj at gmail

  • Minor bug fixes by several contributors, see git log
   2018-10-26 09:43:05 by Leonardo Taccari | Files touched by this commit (1)
Log message:
hunspell: Simplify distfile handling (NFC)

GITHUB_PROJECT by default is already PKGBASE, no need to reinitialize it.
Reuse PKGVERSION_NOREV for GITHUB_TAG.
Remove commented out WRKSRC while here.
   2018-10-23 13:45:34 by Benny Siegert | Files touched by this commit (10) | Package updated
Log message:
Update hunspell to 1.6.2.

1.6.2

Library changes: none. Same as 1.6.1.
Command line tool:
-   Added German translation
-   Fixed bug with wrong output encoding, not respecting system locale.

1.6.1

Library changes:
-   Performance improvements in suggest()
-   Fixes regressions for Hungarian related to compounding.
-   Fixes regressions for Korean related to ICONV.
Command line tool:
-   Added Tajik translation
-   Fix regarding searching of OOo dicts installed in user folder
Manpages:
-   Fix microsoft-cp1251 to cp1251. Dicts should not use the former.
-   Typos.

1.6.0

Changes in the library:
-   Performance improvement in ngsuggest(), suggestions should be faster.
-   Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
-   MAXWORDLEN can be set during build time with -D defines.
-   Fix crash when word with 102 consecutive X is spelled.
Changes in the command line tool:
-   -D shows all loaded dictionaries instead of only the first.
-   -D properly lists all available dictionaries on Windows.

1.5.4

Fixes bug related to the Hungarian dictionary and the command COMPOUNDSYLLABLE

1.5.3

Remove an unneeded #include header in the public hunspell.hxx

1.5.2

Fixes backward compatibility with 1.4 at API level. Now it should be complete.

1.5.1

-   A lot of stability fixes
-   Fixed compilation errors on various systems (Windows, FreeBSD)
-   Small performance improvement compared to 1.4.0
-   Added new API with C++ types (string, vector), yet full API backward
    compatibility with 1.4 is kept

1.4.1

Past begin() iterator decrement error

VS Debug build threw error on decrement past begin.

1.4.0

New release that strips out fixed length buffers from large parts of the library
Note: dictmgr.hxx header is dropped
   2018-10-19 19:57:42 by Benny Siegert | Files touched by this commit (4)
Log message:
Rename analyze, munch and unmunch tools.

These names are way too generic to go into bin/, and folks on the mailing
list agreed. Now they have a "hunspell-" prefix.

Bump revision.
   2018-08-22 11:48:07 by Thomas Klausner | Files touched by this commit (3558)
Log message:
Recursive bump for perl5-5.28.0
   2018-08-07 10:29:43 by Jonathan Perkin | Files touched by this commit (1)
Log message:
hunspell: Specify C++03.
   2018-05-24 00:06:50 by Thomas Klausner | Files touched by this commit (2) | Package updated
Log message:
hunspell: for wide character support, use ncursesw.

The configure script checks for the library name and accepts only ncursesw.

Bump PKGREVISION.
   2018-01-25 11:43:59 by Jonathan Perkin | Files touched by this commit (2)
Log message:
hunspell: Fix clang -Wreserved-user-defined-literal error.