./textproc/split-thai, Utilities and an emacs library to split UTF-8 Thai text into words



Branch: CURRENT, Version: 2.28, Package name: split-thai-2.28, Maintainer: scole

A collection of utilities that split Thai UTF-8 text at word
boundaries, a task also known as word tokenization or word breaking.
The utilities use Emacs, swath, Perl, and a C++ program built on ICU.
All use dictionary-based word splitting.
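
As a rough illustration of what dictionary-based word splitting means,
here is a minimal Python sketch of greedy longest-match segmentation.
It is not the algorithm used by swath, ICU, or pthai.el, which are more
elaborate; the function name split_words and the three-word toy
dictionary are assumptions made up for this example.

```python
def split_words(text, dictionary):
    """Greedy longest-match segmentation (illustrative sketch only).

    At each position, take the longest dictionary word that starts
    there; if none matches, emit a single character and move on.
    """
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try candidates from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            # No dictionary word starts here: fall back to one character.
            j = i + 1
        words.append(text[i:j])
        i = j
    return words


# Hypothetical toy dictionary; the package ships a much larger merged one.
toy_words = ["ฉัน", "กิน", "ข้าว"]
sentence = "".join(toy_words)  # Thai is written without spaces between words
print(" ".join(split_words(sentence, set(toy_words))))
```

Greedy matching can pick the wrong boundary when one dictionary word is
a prefix of another, which is one reason real splitters weigh several
candidate segmentations instead of committing to the first match.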

Also included are a merged dictionary file of Thai words, a Perl script
to grep Thai UTF-8 words, and an Emacs library that can split,
unsplit, spellcheck, and play audio for Thai words.


Required to run:
[textproc/icu] [editors/emacs] [audio/sox] [lang/perl5] [textproc/swath]

Required to build:
[devel/libdatrie]

CVS history:


   2024-01-07 04:28:09 by Sean Cole | Files touched by this commit (2) | Package updated
Log message:
Update to 2.28
all changes for pthai.el
- add option for looking up, viewing words/missed words in pthai-practice-words
- add pthai-dictionary-clear
- bind pthai-complete-word to key like alt-tab for ispell
- for pthai-practice-* functions, don't require dictionary format
- simplify pthai-dictionary-read-file and its *Messages*
- update external dictionaries to latest builds
   2023-11-08 14:21:43 by Thomas Klausner | Files touched by this commit (2377)
Log message:
*: recursive bump for icu 74.1
   2023-07-24 03:55:52 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.27
all changes for pthai.el
- reduce some unnecessary splitting and joining of strings
- default to word at point for 'pthai-dictionary-find-word
- make 'pthai-dictionary-find-regexp work same whether interactive or not
- add prefix option for 'pthai-twt-words
- in 'pthai-unknown-words-toggle, display clearer messaging whether words
  toggled on or off, or no words found
- simplify code in a few places
   2023-06-12 03:35:16 by Sean Cole | Files touched by this commit (2) | Package updated
Log message:
Update to 2.26
all changes for pthai.el
- add seconds to pthai-say-time, simplify time parsing
- rename 'pthai-parse-hour-minute-second to 'pthai-parse-hms
- rename 'pthai-bounds-offset-of-split-string to 'pthai-bounds-offsets
- rename 'pthai-randomize to 'pthai-nrandomize, use faster in-place algorithm
- remove 'pthai-nth-set, pthai-nth-delete
- many changes related to 'thai-word-table including:
  * rename 'pthai-thai-break-words to 'pthai-twt-split
  * remove 'pthai-twt-update, change interface for 'pthai-twt-read
  * add 'pthai-twt-add/remove/clear for manipulating the 'thai-word-table
  * modification to 'pthai-dictionary will also be applied to 'thai-word-table
  * simplify pthai-dictionary-read-*, only have clear option for
    'pthai-dictionary-read-files
  * add custom variables pthai-twt-splitter-enable, pthai-twt-lock
   2023-06-06 14:42:56 by Taylor R Campbell | Files touched by this commit (1319)
Log message:
Mass-change BUILD_DEPENDS to TOOL_DEPENDS outside mk/.

Almost all uses, if not all of them, are wrong, according to the
semantics of BUILD_DEPENDS (packages built for target available for
use _by_ tools at build-time) and TOOL_DEPENDS (packages built for
host available for use _as_ tools at build-time).

No change to BUILD_DEPENDS as used correctly inside buildlink3.

As proposed on tech-pkg:
https://mail-index.netbsd.org/tech-pkg/2023/06/03/msg027632.html
   2023-05-07 02:28:01 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.25
all changes for pthai.el
- consider numbers as being in the dictionary
- various fixes to handle point being at a Thai character and not a word
- simplify and fix some lookup-at-point functions
- add option to pthai-practice-* to practice words without audio
- rename 'pthai-fetch-thailanguage to 'pthai-fetch-thai-language
- rename 'pthai-dictionary-grep to 'pthai-dictionary-find-word
- rename 'pthai-unknown-words-insert to 'pthai-unknown-words-extract
- add counting by letter function 'pthai-dictionary-word-counts
- many small cleanups
   2023-04-19 10:12:01 by Adam Ciarcinski | Files touched by this commit (2359) | Package updated
Log message:
revbump after textproc/icu update
   2022-12-16 19:27:00 by Sean Cole | Files touched by this commit (2)
Log message:
Update to 2.24
all changes for pthai.el
- use expand-file-name in a few places
- fix pthai-audio-display-definition plumbing
- use call-process* for pthai-mp3-play and pthai-split-command
- rename pthai-splitter-swath-word-length to pthai-splitter-max-swath-word-length
- restore pthai-thai-break-words splitter