PDF OCR is a simple drag-and-drop utility that converts PDFs and images into text documents. It uses OCR (optical character recognition), performed by the Tesseract engine, to extract the text of the PDF or image. This is particularly useful for PDFs and images that were created with the scan-to-PDF function of a scanner or photocopier. It currently supports over 20 languages for OCR.
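As a rough illustration of what driving the Tesseract engine looks like, the sketch below builds a command line for the standard `tesseract` executable, which takes an input image, an output base name, and an optional `-l` language code. It is not taken from PDF OCR itself; the helper names `build_tesseract_command` and `ocr_image` and the file names are hypothetical, and the sketch assumes `tesseract` is installed and on the PATH.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, language="eng"):
    """Return the argv list for one Tesseract run (hypothetical helper).

    Tesseract writes the recognized text to <output_base>.txt.
    """
    return ["tesseract", image_path, output_base, "-l", language]


def ocr_image(image_path, output_base, language="eng"):
    """Run Tesseract if it is installed; return the path of the text file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, language),
        check=True,  # raise CalledProcessError if Tesseract reports a failure
    )
    return output_base + ".txt"
```

For example, `ocr_image("scan.png", "scan", "deu")` would ask Tesseract to read `scan.png` as German text and write `scan.txt`, provided the German language data is installed.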
FormReturn is feature-rich, easy-to-use OMR (Optical Mark Recognition) software. It lets anyone design printable forms, distribute them, and capture and automatically grade or analyze handwritten multiple-choice responses. All you need is the FormReturn application, a printer, and a document scanner, and you can process hundreds of forms within minutes.
GNU Ocrad is an OCR (Optical Character Recognition) program and library based on a feature extraction method. It reads images in pbm (bitmap), pgm (greyscale), or ppm (color) formats and produces text in byte (8-bit) or UTF-8 formats. It also includes a layout analyzer that is able to separate the columns or blocks of text normally found on printed pages. Ocrad can be used as a stand-alone console application, or as a backend to other programs.
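To make the pbm input format concrete, here is a minimal sketch that serializes a small black-and-white image as a plain (ASCII, "P1") PBM file, the simplest of the three Netpbm formats mentioned above. The function name `to_plain_pbm` is hypothetical and not part of Ocrad; in PBM a 1 bit means black and a 0 bit means white.

```python
def to_plain_pbm(rows):
    """Serialize a list of pixel rows (truthy = black) as plain PBM text.

    Suitable for small test images only: the PBM specification asks that
    lines in the plain format stay within 70 characters, which this sketch
    does not enforce.
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")

    lines = ["P1", f"{width} {height}"]  # magic number, then dimensions
    for row in rows:
        lines.append(" ".join("1" if pixel else "0" for pixel in row))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as `glyph.pbm` gives an image in one of the formats the description says Ocrad reads.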
Mayan EDMS is a document management Web application with custom metadata indexing, file serving integration, and OCR capabilities. It features user-defined metadata fields, dynamic default values for metadata, lookup support for metadata, filesystem integration by means of metadata indexing directories, user-defined document UUID generation, local file or server-side staging file uploads, batch uploading of many documents with the same metadata, user-defined document checksum algorithms, previews for many image formats as well as PDF, document OCR and searching, automatic grouping of documents by metadata, permissions and roles support, multi-page document support, page transformations, distributed OCR processing, and support for multiple languages.
GlyphViewer is a desktop application that allows users to build translations from text in images and export them into different image formats or even HTML. Users can use OCR technology to identify text in images in many languages, including English, German, Chinese, Arabic, and Japanese. A unique feature of the application is its support for Ancient Egyptian hieroglyphs.
getxbook is a collection of tools to download books from websites. There are tools to download from Google Books' "book preview", Amazon's "look inside the book", and Barnes and Noble's "book viewer". There is an optional GUI written in Tcl/Tk, and some shell scripts using OCR to create plain text or searchable PDFs and DjVu files from the downloaded books.
Aspose.OCR for .NET is a character recognition component built to allow developers to add OCR functionality to their ASP.NET Web applications, Web services, and applications. It provides a simple set of classes for controlling character recognition tasks and supports BMP and TIFF images.
Character Recognition is an Android app that allows the user to take a photo (or use existing image files on the device) and then apply the Tesseract OCR engine to extract the text in the photo. It currently supports English text, and support for other languages will be added in the future.