2018-11-12 04:53:16 by Ryo ONODERA | Files touched by this commit (1532) |
Log message:
Recursive revbump from harfbuzz-2.1.1
|
2018-11-03 10:13:07 by Adam Ciarcinski | Files touched by this commit (5) | |
Log message:
tesseract: updated to 4.0.0
V4.0.0:
New OCR engine
- Added a new OCR engine that uses a neural network system based on LSTMs, with \
major accuracy gains.
- This includes new training tools for the LSTM OCR engine. A new model can be \
trained from scratch or by fine tuning an existing model.
- Added trained data that includes LSTM models for 123 languages.
- Added optional accelerated code paths for the LSTM recognizer:
* Using OpenMP
* Using SIMD: AVX2 / AVX / SSE4.1
- Added a new parameter lstm_choice_mode that allows including alternative \
symbol choices in the hOCR output.
- The new LSTM engine still does not support all features from the old legacy \
engine (see missing features).
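
A minimal sketch of how the LSTM engine and lstm_choice_mode might be selected
from a script. It only builds the command line and runs nothing. The helper name
lstm_hocr_command, the file names, and the value 1 for lstm_choice_mode are
illustrative assumptions; "--oem 1" is assumed to select the LSTM-only engine.

```python
import shlex


def lstm_hocr_command(image, output_base, lang="eng", choice_mode=1):
    """Build an argv list for the tesseract CLI (hypothetical helper).

    Nothing is executed here; the caller decides whether to run it.
    """
    return [
        "tesseract", image, output_base,
        "-l", lang,
        "--oem", "1",  # assumption: OEM 1 means "LSTM engine only"
        "-c", f"lstm_choice_mode={choice_mode}",  # illustrative value
        "hocr",  # config file that requests hOCR output
    ]


if __name__ == "__main__":
    # Print the command as a shell-quoted string for inspection.
    print(shlex.join(lstm_hocr_command("page.png", "page")))
```

Passing the list to subprocess.run() would run it, provided tesseract and the
matching traineddata are installed.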
Other OCR engines
- The pattern matching OCR engine that was the primary OCR engine in previous \
versions is still available in this version.
- Removed the 'Cube' OCR engine from the codebase. It was used for Hindi and for \
Arabic. The new LSTM engine performs much better, thus the Cube engine was no \
longer needed.
Updated build system
- Tesseract now uses semantic versioning.
- Tesseract now requires Leptonica 1.74.0 or a higher version.
- For building Tesseract from source code, a compiler with good C++11 support \
is required. See here for a list of officially supported compilers.
- Added unit tests to the main repo. The unit tests require Git submodules and \
the code for training.
- Added an option to compile Tesseract without the code of the legacy OCR engine.
- Updated minimum required autoconf version to 2.63.
- Training tools dependencies - Updated minimum required versions: ICU 52.1, \
Pango 1.22.0.
- Reorganized Tesseract's source tree. Most sources are now below the src directory.
Bug fixes and enhancements
- Fixed many issues that triggered compiler warnings.
- Fixed many issues reported by Coverity Scan or LGTM.
- Fixes to trainingdata rendering.
- Fixed damage to binary images when processing PDFs.
- Don't trigger a deliberate segmentation fault for fatal errors in release code.
- Fixed some issues in OpenCL code. OpenCL now works for the legacy Tesseract \
OCR engine, but does not improve the performance. It is not implemented for the \
LSTM OCR engine.
- Improved multi-page TIFF handling.
- Improvements to PDF rendering.
- Added version information and improved help texts to the training tools.
- Added a faster version of log2().
- Documented in the tesseract man page the option to use an input text file that \
contains a list of images.
- Made 'osd' the default traineddata when psm 0 is requested (currently this \
feature is only implemented in the command line interface, but not in the API).
- Removed tessedit_pageseg_mode 1 from hocr, pdf, and tsv config files. The user \
should explicitly use --psm 1 if that is desired.
- The list of available languages and scripts is now sorted alphabetically.
- Changed the default of parameter unlv_tilde_crunching to false, because the \
previous default caused issues with unlv output in Tesseract 4.
- Removed obsolete code.
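
Two of the items above, the input text file that lists images and the now
explicit --psm 1, can be sketched together as follows. The helper names
write_image_list and batch_command and all file names are hypothetical; the
list file is assumed to hold one image name per line.

```python
import os
import tempfile


def write_image_list(image_paths, list_path):
    """Write one image path per line to list_path and return list_path."""
    with open(list_path, "w", encoding="utf-8") as fh:
        for path in image_paths:
            fh.write(f"{path}\n")
    return list_path


def batch_command(list_path, output_base, psm=1, config="pdf"):
    """Build a tesseract argv list that passes --psm explicitly.

    The hocr, pdf and tsv config files no longer set page segmentation
    mode 1, so the caller asks for it here (hypothetical helper).
    """
    return ["tesseract", list_path, output_base, "--psm", str(psm), config]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    listing = write_image_list(
        ["scan-001.png", "scan-002.png"],  # placeholder image names
        os.path.join(workdir, "pages.txt"),
    )
    print(batch_command(listing, "book"))
```

The command is only built, not executed, so the sketch runs without
tesseract installed.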
|
2018-07-20 05:34:33 by Ryo ONODERA | Files touched by this commit (705) |
Log message:
Recursive revbump from textproc/icu-62.1
|
2018-06-22 11:50:16 by Adam Ciarcinski | Files touched by this commit (2) | |
Log message:
tesseract: updated to 3.05.02
V3.05.02
* Fixed linking with Leptonica
* Fix build for Mingw-w64
* Fix Training error "Couldn't find a matching blob"
* Fix unterminated string
|
2018-06-11 17:01:49 by Filip Hajny | Files touched by this commit (2) | |
Log message:
graphics/tesseract: Revert update to data version 4.00. Using version 4 data \
with version 3 program is not supported. Fixes \
https://github.com/joyent/pkgsrc/issues/113.
|
2018-04-29 12:16:20 by Adam Ciarcinski | Files touched by this commit (2) |
Log message:
tesseract: added buildlink3; fixed COMMENT and HOMEPAGE
|
2018-04-16 16:35:28 by Thomas Klausner | Files touched by this commit (1284) |
Log message:
Recursive bump for new fribidi dependency in pango.
|
2018-04-14 09:34:46 by Adam Ciarcinski | Files touched by this commit (681) | |
Log message:
revbump after icu update
|
2018-03-12 12:18:01 by Thomas Klausner | Files touched by this commit (2155) |
Log message:
Recursive bumps for fontconfig and libzip dependency changes.
|
2018-01-25 12:30:34 by Adam Ciarcinski | Files touched by this commit (3) | |
Log message:
tesseract: updated tessdata to 4.00
|