textproc/split-thai
Utilities to split UTF-8 Thai text into words
Branch: pkgsrc-2020Q4
Version: 1.1nb1
Package name: split-thai-1.1nb1
Maintainer: pkgsrc-users

A collection of utilities to split Thai Unicode UTF-8 text at word
boundaries, a task also known as word tokenization or word breaking.
The utilities are built on Emacs, swath, Perl, and a C++ program that
uses the ICU (icu-project) library. All of them use dictionary-based
word splitting.
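As a rough illustration of what dictionary-based splitting means, the Python sketch below implements greedy longest matching. At each position it takes the longest dictionary word that starts there, and it gathers any characters that no word covers into a single "unknown" token. This is a toy under stated assumptions. It is not the algorithm used by swath, ICU, or the other tools in this package, and the function name `segment` and the three-word dictionary are hypothetical.

```python
from __future__ import annotations


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match word splitting (hypothetical illustration)."""
    max_len = max((len(w) for w in dictionary), default=0)
    tokens: list[str] = []
    unknown = ""  # run of characters not covered by any dictionary word
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking until a word is found.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            unknown += text[i]  # no word starts here; extend the unknown run
            i += 1
        else:
            if unknown:
                tokens.append(unknown)
                unknown = ""
            tokens.append(match)
            i += len(match)
    if unknown:
        tokens.append(unknown)
    return tokens


# Hypothetical three-word dictionary: "I", "eat", "rice".
words = {"ฉัน", "กิน", "ข้าว"}
print(segment("ฉันกินข้าว", words))
```

Greedy longest matching is simple, but it can choose a wrong split when a long dictionary word swallows the start of the next word. More careful segmenters compare alternative segmentations of the whole phrase instead of committing at each step.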
Also included are a merged dictionary file of Thai words and a Perl
script to grep Thai UTF-8 words.
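A comparable "grep for Thai" filter can be sketched in Python by matching characters in the Thai Unicode block (U+0E00 to U+0E7F). This sketch is not a port of the package's Perl script, and the names `THAI_RUN`, `thai_runs`, and `grep_thai` are hypothetical. Because Thai is written without spaces between words, a match is a run of unsegmented Thai text rather than a single word. The block also covers Thai digits and signs, so those match too.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator

# One or more characters from the Thai Unicode block (U+0E00..U+0E7F).
THAI_RUN = re.compile(r"[\u0E00-\u0E7F]+")


def thai_runs(line: str) -> list[str]:
    """Return every maximal run of Thai characters in the line."""
    return THAI_RUN.findall(line)


def grep_thai(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain at least one Thai character."""
    for line in lines:
        if THAI_RUN.search(line):
            yield line


print(thai_runs("hello ฉันกินข้าว 123"))
```

To filter a file the way grep would, pass a line iterator such as `sys.stdin` to `grep_thai` and print each yielded line with `end=""`.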
Version history:
- (2021-01-03) Package added to pkgsrc.se, version split-thai-1.1nb1 (created)