./textproc/split-thai, Utilities to split UTF-8 Thai text into words



Branch: pkgsrc-2020Q3, Version: 0.9, Package name: split-thai-0.9, Maintainer: pkgsrc-users

A collection of utilities that split UTF-8 encoded Thai text at word
boundaries, a task also known as word tokenization or word breaking.
The utilities are built on Emacs, swath, Perl, and a C++ program that
uses ICU. All of them use dictionary-based word splitting.
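
Dictionary-based splitting can be illustrated with a greedy longest-match
pass over the text. The sketch below is only an illustration of the general
technique, not the algorithm used by any of the packaged utilities; the
function name split_words and the three-word SAMPLE_DICTIONARY are
hypothetical and stand in for a real word list.

```python
def split_words(text, dictionary):
    """Greedy longest-match segmentation against a set of known words."""
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            # No entry starts here: emit a single character and move on.
            j = i + 1
        words.append(text[i:j])
        i = j
    return words


# Hypothetical sample entries, not taken from the packaged dictionary.
SAMPLE_DICTIONARY = {"ภาษา", "ไทย", "ง่าย"}

if __name__ == "__main__":
    print(" ".join(split_words("ภาษาไทยง่าย", SAMPLE_DICTIONARY)))
```

Greedy matching is simple but can choose a wrong split when a long entry
overlaps the start of the next word, which is one reason results tend to
differ between tools and dictionaries.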

Also included are a merged dictionary file of Thai words and a Perl
script to grep for Thai words in UTF-8 text.
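
As a rough illustration of matching Thai words in UTF-8 text, the sketch
below finds runs of characters from the Thai Unicode block (U+0E00 to
U+0E7F). It does not reproduce the packaged script, whose options and
behavior are not described here; the names THAI_RUN and thai_runs are
hypothetical.

```python
import re

# Thai Unicode block: U+0E00 through U+0E7F.
THAI_RUN = re.compile(r"[\u0E00-\u0E7F]+")


def thai_runs(data):
    """Return every run of Thai characters found in UTF-8 encoded bytes."""
    text = data.decode("utf-8")
    return THAI_RUN.findall(text)


if __name__ == "__main__":
    sample = "abc ไทย 123 ภาษา".encode("utf-8")
    for run in thai_runs(sample):
        print(run)
```

A run found this way is a stretch of Thai script, not necessarily a single
word, because Thai is written without spaces between words.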

