HOCR (software)
In computer software, hOCR is free software for Hebrew optical character recognition. It is based on the libhocr Hebrew optical character recognition engine.
About the libhocr OCR Engine
libhocr is a GNU Hebrew optical character recognition engine. It is designed for old, yellow-stained Hebrew poetry and religious texts. libhocr includes an image-processing unit that removes yellow stains and corrects page images. It can analyze the complex page layouts common in old religious texts, such as Talmud pages. libhocr can also read Nikud (Hebrew vowel points), which is essential for optical character recognition of Hebrew poetry. libhocr can use the GTK toolkit to load images, so it can load PNG, JPEG, TIFF, BMP, PNM and any other image format supported by GTK. It can automatically correct stained, dark, bright and rotated images.
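The article does not describe how libhocr removes yellow stains. As a rough illustration of the general idea only, the sketch below uses a simple, generic technique (not libhocr's actual algorithm): yellowed paper reflects red and green light but absorbs blue, while dark ink absorbs all three, so judging darkness by the red and green channels alone lets stained paper read as background. The function name `binarize_stained_page` and the default threshold of 128 are hypothetical choices made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def binarize_stained_page(pixels: List[List[Pixel]], threshold: int = 128) -> List[List[int]]:
    """Hypothetical helper: return a bitmap where 1 marks ink and 0 marks background.

    Generic sketch, not libhocr's method. Yellow stains are dark only in the
    blue channel, so the blue channel is ignored when deciding whether a
    pixel is ink.
    """
    bitmap = []
    for row in pixels:
        out_row = []
        for r, g, _b in row:
            lightness = max(r, g)  # brightness as seen in red/green light only
            out_row.append(1 if lightness < threshold else 0)
        bitmap.append(out_row)
    return bitmap


# White paper, a yellow stain and black ink on one row of pixels.
page = [[(255, 255, 255), (230, 200, 60), (20, 20, 20)]]
print(binarize_stained_page(page))
```

Real stain removal would also need to handle uneven lighting and faded ink, for example with an adaptive rather than a fixed threshold.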
libhocr outputs the recognized text in UTF-8 encoding, either as plain text or in Google's hOCR HTML format for OCR output.
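To show roughly what hOCR output looks like, the sketch below builds a minimal hOCR document from recognized words and their bounding boxes. It uses the standard hOCR class names `ocr_page`, `ocr_line` and `ocrx_word`, with `bbox x0 y0 x1 y1` in the `title` attribute. The function `to_hocr` and its input layout are hypothetical and are not libhocr's API, and the exact markup libhocr emits may differ.

```python
from html import escape
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # x0, y0, x1, y1 in page pixels
Word = Tuple[str, BBox]           # recognized text and its bounding box
Line = Tuple[BBox, List[Word]]    # line bounding box and its words


def to_hocr(page_size: Tuple[int, int], lines: List[Line]) -> str:
    """Hypothetical helper: render recognized lines as a minimal hOCR page."""
    width, height = page_size
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>",
        "<meta name='ocr-capabilities' content='ocr_page ocr_line ocrx_word'>",
        "</head><body>",
        f"<div class='ocr_page' title='bbox 0 0 {width} {height}'>",
    ]
    for (lx0, ly0, lx1, ly1), words in lines:
        parts.append(f"<span class='ocr_line' title='bbox {lx0} {ly0} {lx1} {ly1}'>")
        for text, (x0, y0, x1, y1) in words:
            # Escape the recognized text so characters like '<' stay literal.
            parts.append(
                f"<span class='ocrx_word' title='bbox {x0} {y0} {x1} {y1}'>{escape(text)}</span>"
            )
        parts.append("</span>")
    parts.append("</div></body></html>")
    return "\n".join(parts)


doc = to_hocr((800, 1000), [((50, 40, 400, 80), [("שלום", (300, 40, 400, 80))])])
print(doc)
```

Because the Hebrew text is kept as a Python `str`, writing it with `open(path, "w", encoding="utf-8")` would produce UTF-8 output, matching the encoding described above.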
User interfaces
hOCR includes two user interfaces: a graphical user interface and a command-line tool.
- hocr-gtk is a graphical user interface built with GTK and Python. It is simple and easy to use. The interface was designed by Yuval Tanny.
hocr can process old, yellow-stained images and rotated texts.
hocr-gtk scanning an old yellow text
hocr can understand texts with Nikud.
hocr-gtk scanning poetry with nikud
- hocr is a command-line tool. It is more powerful and is designed for automating the OCR process.
hocr can be used for automation
