2009-06-14 19:59:32 by Joerg Sonnenberger | Files touched by this commit (263) |
Log message:
Remove @dirrm entries from PLISTs
|
2008-10-30 23:12:59 by Thomas Klausner | Files touched by this commit (3) |
Log message:
Replace patch-ab with a post-extract rule. No change to the binary package,
just one file less in pkgsrc ;)
|
2008-05-30 15:06:27 by Thomas Klausner | Files touched by this commit (5) |
Log message:
Update to 2.03:
January 23 2008 - V2.02
Improvements to clustering, training and classifier.
Major internationalization improvements for large-character-set
languages, e.g. Kannada.
Removed some compiler warnings.
Added multipage tiff support for training and running.
Updated graphics output to talk to new java-based viewer.
Added ability to save n-best lists.
Added leptonica support for more file types.
Improved Init/End to make them safe.
Reduced memory use of dictionaries.
Added some new APIs to TessBaseAPI.
April 21 2008 - V2.02 (again)
Fixed namespace collisions with jpeg library (INT32).
Portability fixes for Windows for new code.
Updates to autoconf system for new code.
April 22 2008 - V2.03
Fixed crash introduced in 2.02.
Fixed lack of tessembedded.cpp in distribution.
Added test for leptonica header files and conditional test for lib.
|
2007-11-29 17:42:09 by Thomas Klausner | Files touched by this commit (3) |
Log message:
Update to 2.01:
August 27 2007 - V2.01
Fixed UTF8 input problems with box file reader.
Fixed various infinite loops and crashes in dawg code.
Removed include of config_auto.h from host.h.
Added automatic wctype encoding to unicharset_extractor.
Fixed dawg table too full error.
Removed svn files from tarball.
Added new functions to tessdll.
Increased the maximum length of a UTF-8 string in a classification result to 8.
|
2007-07-28 03:02:16 by Thomas Klausner | Files touched by this commit (7) |
Log message:
Update to 2.00, provided by Rumko on pkgsrc-users.
July 02 2007 - V2.00
Converted internal character handling to UTF8.
Trained with 6 languages.
Added unicharset_extractor, wordlist2dawg.
Added boxfile creation mode.
Added UNLV regression test capability.
Fixed problems with copyright and registered symbols.
Fixed extern "C" declarations problem.
|
2007-05-18 08:39:27 by Thomas Klausner | Files touched by this commit (9) |
Log message:
Initial import of tesseract-1.04b from pkgsrc-wip (packaged by heinz@
and myself):
This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO
OUTPUT FORMATTING, and NO UI. It can only process an image of a
single column and create text from it. It can detect fixed pitch
vs proportional text. Having said that, in 1995, this engine was
in the top 3 in terms of character accuracy, and it compiles and
runs on both Linux and Windows. Another current limitation is that
it only recognizes English and its character set is only US-ASCII.
Training code IS included in the open source release, however, and
will be included in a future release.
|