App to OCR on Linux | Web3us LLC

Tesseract is an open source OCR engine, available under the Apache 2.0 license. It can be used directly or, for programmers, through an API. It supports a wide variety of languages. Tesseract doesn't have a built-in GUI, but several are available from the 3rdParty page.

gImageReader is a simple Gtk front-end to Tesseract. It is part of the standard repositories for Fedora 20. Features include:

- Automatic page layout detection
- User can manually define and adjust recognition regions
- Import images from disk, scanning devices, clipboard and screenshots
- Supports multipage PDF documents
- Recognized text displayed directly next to the image
- Basic editing of output text, including search/replace and removing line breaks
- Spellchecking for output text (if corresponding dictionary installed)
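
For readers who want to use Tesseract directly rather than through a GUI, the following is a minimal Python sketch that shells out to the `tesseract` command-line program. It assumes the `tesseract` executable and the English language data are installed. The helper names `build_tesseract_command` and `ocr_image` are made up for this illustration and are not part of Tesseract itself.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for a basic tesseract CLI call.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text.

    Returns None when the tesseract executable is not found on PATH.
    Raises subprocess.CalledProcessError if tesseract itself fails,
    for example because the image file does not exist.
    """
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `ocr_image("scan.png")` (a placeholder file name) would return the recognized text as a string. Keeping the command construction in its own function makes it easy to add further options later without touching the code that runs the process.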
