tesseract-ocr

tesseract-ocr is an OCR engine originally developed by Hewlett-Packard and now sponsored by Google. It is highly accurate and can read a binary, grayscale, or color image and output the recognized text.

Recent releases

  •  04 Nov 2012 01:50

Release Notes: This release adds a C API, a new Visual Studio 2008 solution, right-to-left/bidirectional support in the output iterators for Hebrew and Arabic, paragraph detection in layout analysis and post-OCR processing, simultaneous multi-language recognition, a refactored top-level word recognition module, an experimental equation detector, improved handling of input image resolution, and a blamer module for error analysis. It fixes inconsistent x-height during training and over-chopping. It also cleans up the externally used namespace by removing includes from baseapi.h.

  •  31 Oct 2011 19:28

Release Notes: This release adds thread safety, a recognizer for Arabic, PageIterator and ResultIterator, and more.

  •  02 Oct 2010 21:21

Release Notes: This release prepares the code for thread safety and adds a major new page layout analysis module, hOCR output, and many more languages. Most of the function header comments were documented with Doxygen. Leptonica was added for main image I/O and handling.

Recent comments

08 Mar 2011 10:57 Teiman Thumbs up

It's easy to use and has good recognition quality. I recommend it over other similar engines.
