15 projects tagged "OCR"

GNU Ocrad (updated 24 Mar 2014)

Pop 663.70
Vit 72.88

GNU Ocrad is an OCR (Optical Character Recognition) program and library based on a feature extraction method. It reads images in pbm (bitmap), pgm (greyscale), or ppm (color) formats and produces text in byte (8-bit) or UTF-8 formats. It also includes a layout analyzer that is able to separate the columns or blocks of text normally found on printed pages. Ocrad can be used as a stand-alone console application, or as a backend to other programs.
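The three input formats named above belong to the Netpbm family, whose files start with a two-character magic number (P1/P4 for pbm, P2/P5 for pgm, P3/P6 for ppm). As a minimal sketch, not part of Ocrad itself, the following shows how a caller might check that a file is one of those formats before passing it to an OCR backend. The names `netpbm_kind` and `make_plain_pbm` are hypothetical helpers introduced only for illustration.

```python
# Netpbm magic numbers: P1/P4 = PBM, P2/P5 = PGM, P3/P6 = PPM.
# "plain" variants store pixels as ASCII text, "raw" variants as binary.
NETPBM_MAGIC = {
    b"P1": ("pbm", "plain"),
    b"P4": ("pbm", "raw"),
    b"P2": ("pgm", "plain"),
    b"P5": ("pgm", "raw"),
    b"P3": ("ppm", "plain"),
    b"P6": ("ppm", "raw"),
}


def netpbm_kind(data: bytes):
    """Return (format, encoding) based on the magic number, or None."""
    return NETPBM_MAGIC.get(data[:2])


def make_plain_pbm(rows):
    """Build a plain (ASCII) PBM image from rows of 0/1 pixels (1 = black).

    Hypothetical helper, used here only to produce a tiny test image.
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    lines = ["P1", f"{width} {height}"]
    lines += [" ".join(str(pixel) for pixel in row) for row in rows]
    return ("\n".join(lines) + "\n").encode("ascii")
```

A caller could read the first two bytes of a file and convert it to one of these formats first if `netpbm_kind` returns `None`.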

OCRKit (updated 21 Feb 2014)

Pop 276.59
Vit 18.57

OCRKit uses OCR to recognize the text in a graphic, which is particularly useful for PDFs received via email or created by DTP or office applications, and for images obtained from a scanner, copier, or digital still camera.

Paperwork (updated 05 Mar 2014)

Pop 214.45
Vit 2.57

Paperwork is a GUI that makes paper documents easily searchable using OCR. The basic idea behind Paperwork is "scan & forget": you should be able to just scan a new document and forget about it until the day you need it again.

Pyocr (updated 20 Mar 2014, no download)

Pop 147.95
Vit 4.66

Pyocr is a simple Python wrapper for OCR engines (Tesseract, Cuneiform, etc.). It supports Python 2.7 and Python 3.x, and requires Pillow.

tesseract-ocr (updated 04 Nov 2012)

Pop 142.41
Vit 2.70

tesseract-ocr is an OCR engine originally developed by Hewlett-Packard and now sponsored by Google. It is highly accurate and reads binary, grayscale, or color images, producing text output.

getxbook (updated 14 Oct 2013, no download)

Pop 110.86
Vit 5.68

getxbook is a collection of tools to download books from websites. There are tools to download from Google Books' "book preview", Amazon's "look inside the book", and Barnes and Noble's "book viewer". There is an optional GUI written in Tcl/Tk, and some shell scripts using OCR to create plain text or searchable PDFs and DjVu files from the downloaded books.

OCRFeeder (updated 19 Dec 2011, no download)

Pop 110.20
Vit 1.47

OCRFeeder is a document layout analysis and optical character recognition application. It can automatically outline the contents of a document image, distinguish between graphics and text, and perform OCR on the text. It can export to several formats, the main one being ODT. OCRFeeder has a GTK+ graphical user interface that lets the user control the application and, for example, edit and correct the automatic recognition. It can also be used from the command line for automation.

Paperless Office (updated 03 Sep 2010)

Pop 86.36
Vit 1.00

Paperless Office is a document management and electronic filing system. It is similar to Paperport, but adds features such as automatic document classification, synchronization with your filing cabinet, date extraction, and semantic Web integration. It also applies natural language processing to extract todo lists from documents, detect spam, and classify urgency, and it provides planning, scheduling, and execution features. You can set due dates and interdependencies for documents and tasks, which gives it workflow support.

MALODOS (updated 19 Jun 2012)

Pop 73.76
Vit 3.63

MALODOS helps you scan, store, and easily retrieve all your personal documents. Its storage format is open and documented, so your document archive remains accessible even without MALODOS. The documents themselves are stored as standard PDF files, while their metadata (such as title, tags, and description) is stored in a separate SQLite database in an open format. With MALODOS, you can also manage existing files in PDF, JPEG, TIFF, and other formats, so you can still use the documents that you've already scanned. You can connect any external OCR program to enable full-text search.

FuzzyOcr (updated 23 Jan 2010)

Pop 47.33
Vit 1.01

FuzzyOcr is a plugin for SpamAssassin that can be used on image spam. It supports optical character recognition using different engines and settings; a fuzzy word matching algorithm applied to the OCR results; an image hashing system that learns the unique properties of known spam images; checks on the dimensions, size, and integrity of images; and content-type verification for the containing email message.
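
To illustrate why fuzzy matching helps with OCR results, here is a minimal sketch that compares each recognized word against a keyword list using a similarity ratio, so that misread or deliberately obfuscated words still match. This is not FuzzyOcr's actual algorithm: the function name `fuzzy_hits`, the use of Python's `difflib`, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_hits(ocr_text, keywords, threshold=0.8):
    """Return (keyword, ocr_word, score) for OCR words similar to a keyword.

    Hypothetical helper: the score is difflib's similarity ratio in [0, 1],
    and the threshold is an arbitrary illustrative value.
    """
    hits = []
    for word in ocr_text.lower().split():
        for keyword in keywords:
            score = SequenceMatcher(None, word, keyword.lower()).ratio()
            if score >= threshold:
                hits.append((keyword, word, round(score, 2)))
    return hits
```

For example, calling `fuzzy_hits("cheap v1agra here", ["viagra"])` would flag the word `v1agra`, where an exact string comparison would miss it.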
