2016-04-11 21:02:08 by Ryo ONODERA | Files touched by this commit (527) |
Log message: Recursive revbump from textproc/icu 57.1 |
2016-04-03 14:46:18 by Joerg Sonnenberger | Files touched by this commit (1) |
Log message: Needs pkg-config. |
2016-03-30 13:38:59 by Filip Hajny | Files touched by this commit (1) |
Log message: Make sure leptonica is detected properly |
2016-03-17 13:51:14 by Filip Hajny | Files touched by this commit (3) |
Log message: Update graphics/tesseract to 3.04.01. Move to new home at GitHub. Clean up.
2016-02-17 - V3.04.01
- Added OSD renderer for psm 0. Works for single page and multi-page images.
- Improve tesstrain.sh script.
- Simplify build and run of ScrollView.
- Improved PDF output for OS X Preview utility.
- INCOMPATIBLE fix to hOCR line height information - commit 134ebc3.
- Added option to build Tesseract without Cube OCR engine (-DNO_CUBE_BUILD).
- Enable OpenMP support.
- Many bug fixes.
2015-07-11 - V3.04.00
- Tesseract development is now done with Git and hosted at github.com (previously we used Subversion as a VCS and code.google.com for hosting).
- Tesseract now requires leptonica 1.71 or a higher version.
- Removed official support for VS 2008.
- Added support for 39 additional scripts/languages, including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid
- Major updates to training system as a result of extensive testing on 100 languages.
- New training data for over 100 languages.
- Improved performance with PIC compilation option.
- Significant change to invisible font system in PDF output to improve correctness and compatibility with external programs, particularly ghostscript.
- Improved font identification.
- Major change to improve layout analysis for heavily diacritic languages: Thai, Vietnamese, Kannada, Telugu etc.
- Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
- Major refactor to improve speed on difficult images, especially when running a heap checker.
- Moved params from global in page layout to tesseractclass.
- Improved single column layout analysis.
- Allow OCR output to multiple formats using tesseract command line executable.
- Fixed issues with mixed eng+ara scripts.
- Improved script consistency in numbers.
- Major refactor of control.cpp to enable line recognition.
- Added tesstrain.sh - a master training script.
- Added ability to text2image training tool to just list available fonts.
- Added ability to text2image to underline words.
- Improved efficiency of image processing for PDF output.
- Added parameter description for each parameter listed with 'print-parameters' command line option.
- Added font info to hOCR output.
- Enabled streaming input and output of multi-page documents.
- Many bug fixes.
2014-02-04 - V3.03(rc1)
- Added new training tool text2image to generate box/tif file pairs from text and truetype fonts.
- Added support for PDF output with searchable text.
- Removed entire IMAGE class and all code in image directory.
- Tesseract executable: support for output to stdout; limited support for one page images from stdin (especially on Windows).
- Added Renderer to API to allow document-level processing and output of document formats, like hOCR, PDF.
- Major refactor of word-level recognition, beam search, eliminating dead code.
- Refactored classifier to make it easier to add new ones.
- Generalized feature extractor to allow feature extraction from greyscale.
- Improved sub/superscript treatment.
- Improved baseline fit.
- Added set_unicharset_properties to training tools.
- Many bug fixes.
- More training source data included. |
2016-01-06 11:46:56 by Adam Ciarcinski | Files touched by this commit (87) |
Log message: Revbump after updating graphics/libwebp |
2015-11-03 22:34:36 by Alistair G. Crooks | Files touched by this commit (610) |
Log message: Add SHA512 digests for distfiles for graphics category
Problems found with existing digests:
    Package fotoxx distfile fotoxx-14.03.1.tar.gz
    ac2033f87de2c23941261f7c50160cddf872c110 [recorded]
    118e98a8cc0414676b3c4d37b8df407c28a1407c [calculated]
    Package ploticus-examples distfile ploticus-2.00/plnode200.tar.gz
    34274a03d0c41fae5690633663e3d4114b9d7a6d [recorded]
    da39a3ee5e6b4b0d3255bfef95601890afd80709 [calculated]
Problems found locating distfiles:
    Package AfterShotPro: missing distfile AfterShotPro-1.1.0.30/AfterShotPro_i386.deb
    Package pgraf: missing distfile pgraf-20010131.tar.gz
    Package qvplay: missing distfile qvplay-0.95.tar.gz
Otherwise, existing SHA1 digests verified and found to be the same on the machine holding the existing distfiles (morden). All existing SHA1 digests retained for now as an audit trail. |
2015-10-07 13:26:22 by Filip Hajny | Files touched by this commit (1) |
Log message: Network libs still needed, fix build on SunOS. |
2014-10-07 18:47:38 by Adam Ciarcinski | Files touched by this commit (442) |
Log message: Revbump after updating libwebp and icu |
2014-10-02 18:06:02 by Adam Ciarcinski | Files touched by this commit (4) |
Log message: Changes 3.02.02:
* Moved ResultIterator/PageIterator to ccmain.
* Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
* Added paragraph detection in layout analysis/post OCR.
* Fixed inconsistent xheight during training and over-chopping.
* Added simultaneous multi-language capability.
* Refactored top-level word recognition module.
* Added experimental equation detector.
* Improved handling of resolution from input images.
* Blamer module added for error analysis.
* Cleaned up externally used namespace by removing includes from baseapi.h.
* Removed dead memory management code.
* Tidied up constraints on control parameters.
* Added support for ShapeTable in classifier and training.
* Refactored class pruner.
* Fixed training leaks and randomness.
* Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding.
* Improved line detection and removal.
* Added fixed pitch chopper for CJK.
* Added UNICHARSET to WERD_CHOICE to make multi-language handling easier.
* Fixed problems with internally scaled images.
* Added page and bbox to string in tr files to identify source of training data better.
* Fixes to Hindi Shirorekha splitter.
* Added word bigram correction.
* Reduced stack memory consumption and eliminated some ugly typedefs.
* Added new uniform classifier API.
* Added new training error counter.
* Fixed endian bug in dawg reader.
* Many other fixes, including the way in which the chopper finds chops and messes with the outline while it does so. |
2014-09-23 21:07:06 by Jonathan Perkin | Files touched by this commit (1) |
Log message: SunOS needs -lsocket -lnsl. |