App to OCR on Linux
Tesseract is an open-source OCR engine, available under the Apache 2.0 license. It can be used directly from the command line or, by programmers, through an API, and it supports a wide variety of languages. Tesseract has no built-in GUI, but several third-party front-ends are listed on its 3rdParty page.

gImageReader is a simple GTK front-end to Tesseract. It is included in the standard Fedora 20 repositories. Its features include:

- Automatic page layout detection
- Manual definition and adjustment of recognition regions
- Import of images from disk, scanning devices, the clipboard and screenshots
- Support for multipage PDF documents
- Recognized text displayed directly next to the image
- Basic editing of the output text, including search/replace and removing line breaks
- Spellchecking of the output text (if the corresponding dictionary is installed)