graphics/tesseract
Commercial quality open source OCR engine
Branch: pkgsrc-2008Q4
Version: 2.03
Package name: tesseract-2.03
Maintainer: pkgsrc-users

This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO
OUTPUT FORMATTING, and NO UI. It can only process an image of a
single column and create text from it. It can detect fixed pitch
vs proportional text. Having said that, in 1995, this engine was
in the top 3 in terms of character accuracy, and it compiles and
runs on both Linux and Windows. Another current limitation is that
it only recognizes English and its character set is only US-ASCII.
Training code IS included in the open source release, however, and
will be included in a future release.
Required to run: graphics/tiff
Required to build: devel/gmake
Master sites:
SHA1: b7859278ff98a8b64bf98b5a519688e1559cec57
RMD160: 7519e7f4d876444bd3264d599dbf423e22443311
Filesize: 1050.302 KB
Version history:
- (2009-01-06) Package added to pkgsrc.se, version tesseract-2.03 (created)
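
The SHA1 value listed above can be used to check a downloaded distfile before building. The following is a minimal sketch in Python, not part of pkgsrc itself: the function names `sha1_of_file` and `matches_sha1` are hypothetical, and the distfile name in the usage note is an assumption, since this page does not give it. RMD160 is left out because not every Python build provides it.

```python
import hashlib

# SHA1 value copied from the listing above.
EXPECTED_SHA1 = "b7859278ff98a8b64bf98b5a519688e1559cec57"


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha1(path, expected=EXPECTED_SHA1):
    """Compare a file's SHA1 with an expected hex digest, ignoring case."""
    return sha1_of_file(path) == expected.strip().lower()
```

Assuming the distfile was saved as `tesseract-2.03.tar.gz` (an assumed name), `matches_sha1("tesseract-2.03.tar.gz")` returns `True` only when the file's digest equals the SHA1 value listed on this page.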