./textproc/split-thai, Utilities and an emacs library to split UTF-8 Thai text into words



Branch: CURRENT, Version: 2.19, Package name: split-thai-2.19, Maintainer: scole

A collection of utilities to split Thai Unicode UTF-8 text at word
boundaries, also known as word tokenization or word breaking. The
utilities use Emacs, swath, Perl, and a C++ program built on the ICU
(International Components for Unicode) libraries. All of them use
dictionary-based word splitting.
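The description says every utility splits words by dictionary lookup but does not say which matching strategy each one applies. As a rough illustration only, here is a greedy longest-match splitter in Python. The function name `split_longest_match` and the toy dictionary are hypothetical, and this is not presented as the algorithm swath, ICU, or pthai.el actually implements.

```python
# Hypothetical illustration of greedy longest-match dictionary splitting.
# This is NOT the algorithm used by swath, ICU, or pthai.el.


def split_longest_match(text: str, dictionary: set[str]) -> list[str]:
    """Split text left to right, taking the longest dictionary word at each step.

    If no dictionary word starts at the current position, that single
    character is emitted as its own token so the loop always advances.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it.
        for end in range(min(len(text), i + max_len), i, -1):
            if text[i:end] in dictionary:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    toy_dictionary = {"ผม", "ไป", "ตลาด"}  # hypothetical toy dictionary
    print(split_longest_match("ผมไปตลาด", toy_dictionary))
    # prints ['ผม', 'ไป', 'ตลาด']
```

Greedy longest match is simple but can commit to a long word that leaves the rest of the string unsplittable, and its single-character fallback can detach a Thai vowel sign or tone mark from its consonant. Production word breakers generally use more careful strategies, so treat this only as a picture of what "dictionary-based" means.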

Also included are a merged dictionary file of Thai words, a Perl script
to grep Thai UTF-8 words, and an Emacs library that can split,
unsplit, spellcheck, and play audio for Thai words.
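The bundled Perl grep script is not shown here. The following Python sketch shows one way such a tool could work, assuming a "Thai word" match simply means a maximal run of code points from the Thai Unicode block (U+0E00 to U+0E7F). The names `thai_runs` and `grep_thai` are hypothetical, and because Thai is written without spaces between words, a run is usually a phrase that still needs dictionary-based splitting.

```python
import re
import sys

# Assumption: a "Thai run" is any maximal sequence of code points in the
# Thai Unicode block U+0E00..U+0E7F (this also covers Thai digits and ฿).
THAI_RUN = re.compile(r"[\u0E00-\u0E7F]+")


def thai_runs(line: str) -> list[str]:
    """Return every run of Thai characters in line, in order (hypothetical helper)."""
    return THAI_RUN.findall(line)


def grep_thai(lines):
    """Yield (line_number, line) for each line containing Thai text (hypothetical helper)."""
    for number, line in enumerate(lines, start=1):
        if THAI_RUN.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Usage: python grep_thai.py FILE...   (each file is read as UTF-8)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for number, line in grep_thai(handle):
                print(f"{path}:{number}:{line}")
```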





CVS history:


   2022-09-21 17:40:43 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.19
all changes for pthai.el
- add pthai-practice-words for practicing words in a dict file
- add pthai-soundfiles-all/playable/unplayable
- fix pthai-rwb to handle numbers not in dictionary
- fix parsing for thai strings with numbers and no spaces
- various small cleanups
   2022-08-11 17:55:56 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.18
all changes for pthai.el
- small clean-ups, add completing-read a few places
- for pthai-unknowns*, keep word same as found in buffer
   2022-07-19 17:21:51 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.17
all changes for pthai.el
- remove some unnecessary split methods and related customize settings
- simplify and optimize 'pthai-split-all, reducing redundant word splitting
- rename pthai-spell-correct-p -> pthai-spell-p, fix check for single char words
- clarify a few function descriptions
- replace 'remove with 'delete in a few places
- remove pthai-completing-read-mode and ido prompt customize option
- replace pthai-completing-read with pthai-completing-read-full/dynamic,
  allowing Thai word completion to be added to some functions
   2022-07-09 18:45:38 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.16
all changes for pthai.el
- add pthai-dictionary-read-region/line/buffer functions
- rename pthai-dictionary-read -> pthai-dictionary-read-file
- rename pthai-dictionary-readall -> pthai-dictionary-read-files
- rename pthai-word-count -> pthai-dictionary-word-count
- add option to exit spell check at point
- add progress meters to some word counting funcs
- add pthai-spell-correct-p to spell check without word splitting
- add pthai-unknowns/count/insert for extracting/counting unknown words
   2022-06-28 13:38:00 by Thomas Klausner | Files touched by this commit (3952)
Log message:
*: recursive bump for perl 5.36
   2022-04-18 21:12:27 by Adam Ciarcinski | Files touched by this commit (1798) | Package updated
Log message:
revbump for textproc/icu update
   2022-03-15 18:50:43 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.15
all changes for pthai.el
- add pthai-count-words ala count-words
- for lookups, indicate when thai word is in dictionary but undefined
- customize flag to skip download attempts for words not in dictionary
- add lists of consonants, vowels, and numbers and functions to display
   2021-12-08 17:07:18 by Adam Ciarcinski | Files touched by this commit (3063)
Log message:
revbump for icu and libffi
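The 2.19 entry mentions parsing Thai strings that contain numbers but no spaces, and the 2.15 entry adds lists of consonants, vowels, and numbers. The Python sketch below shows one way to express both ideas. The groupings (for example, counting ฤ and ฦ as vowels) and the helper names `classify`, `thai_digits_to_int`, and `separate_numbers` are assumptions for illustration, not pthai.el's actual lists or functions.

```python
import re

# Assumed groupings within the Thai Unicode block (U+0E00..U+0E7F).
# Illustrative only; pthai.el's own lists may be organized differently.
CONSONANTS = [chr(cp) for cp in range(0x0E01, 0x0E2F)
              if cp not in (0x0E24, 0x0E26)]           # ก .. ฮ, minus ฤ and ฦ
VOWELS = (["\u0E24", "\u0E26"]                          # ฤ, ฦ: often taught as vowels
          + [chr(cp) for cp in range(0x0E30, 0x0E3A)]   # ะ .. ู
          + [chr(cp) for cp in range(0x0E40, 0x0E46)])  # เ แ โ ใ ไ ๅ
TONE_MARKS = [chr(cp) for cp in range(0x0E48, 0x0E4C)]  # mai ek .. mai chattawa
THAI_DIGITS = [chr(cp) for cp in range(0x0E50, 0x0E5A)]  # ๐ .. ๙

# Map each character to its (assumed) class name.
_CLASS_OF = {ch: name
             for name, chars in (("consonant", CONSONANTS), ("vowel", VOWELS),
                                 ("tone", TONE_MARKS), ("digit", THAI_DIGITS))
             for ch in chars}


def classify(ch: str) -> str:
    """Return 'consonant', 'vowel', 'tone', 'digit', or 'other' (hypothetical helper)."""
    return _CLASS_OF.get(ch, "other")


def thai_digits_to_int(digits: str) -> int:
    """Convert a string of Thai digits such as '๑๒๓' to an int (hypothetical helper)."""
    value = 0
    for ch in digits:
        if ch not in THAI_DIGITS:
            raise ValueError(f"not a Thai digit: {ch!r}")
        value = value * 10 + (ord(ch) - 0x0E50)  # ๐ is U+0E50
    return value


# In Python str patterns \d matches any Unicode decimal digit, so this
# separates both ASCII and Thai digit runs from the surrounding text.
_DIGITS_OR_TEXT = re.compile(r"\d+|\D+")


def separate_numbers(text: str) -> list[str]:
    """Split text into alternating digit and non-digit runs (hypothetical helper)."""
    return _DIGITS_OR_TEXT.findall(text)
```

Separating digit runs before dictionary lookup means a number embedded in unspaced Thai text is passed through as one token instead of being split character by character, which is plausibly the kind of case the 2.19 fixes address.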