Pytesser

Python wrapper for the tesseract OCR engine. The module is based on OpenCV.

Information

Several Python modules for tesseract already exist, but none of them satisfied me. This one differs on the following point:

Installation

sudo apt-get install tesseract-ocr tesseract-ocr-all
sudo pip install opencv-python

How to use it?

There are two ways to use it: pass either a filename or an image directly. For a filename:

import pytesser
txt = pytesser.image_file_to_string("myimage.jpg")
# By default the language is eng and the page segmentation mode is auto

# To specify parameters:
txt = pytesser.image_file_to_string("myimage.jpg", "fra", pytesser.PSM_SINGLE_WORD)  # Analyse the image as a single French word

Or you can pass an OpenCV image directly:

import cv2
import pytesser

image = cv2.imread("myimage.jpg")  # load the file as an OpenCV image
txt = pytesser.image_to_string(image)
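
If you want a single entry point for both cases, the two calls above can be wrapped in a small helper. This is only a sketch: the name `ocr` and its `backend` parameter are hypothetical and not part of pytesser, and it assumes that both functions take the language and the page segmentation mode as optional positional arguments in that order, as in the filename example.

```python
def ocr(source, backend, lang="eng", psm=None):
    """Run OCR on a filename or on an already loaded image.

    `backend` is assumed to be the imported pytesser module (hypothetical
    parameter, passed in explicitly so the helper has no hard dependency).
    """
    # Assumption: (lang, psm) is the positional order accepted by both functions.
    # When psm is None it is left out, so the module's own default applies.
    args = (lang,) if psm is None else (lang, psm)

    if isinstance(source, str):
        # A string is treated as a path to an image file.
        return backend.image_file_to_string(source, *args)

    # Anything else is treated as an OpenCV image.
    return backend.image_to_string(source, *args)
```

With pytesser imported, a call such as ocr("myimage.jpg", pytesser, "fra", pytesser.PSM_SINGLE_WORD) would then go through image_file_to_string, while ocr(image, pytesser) would go through image_to_string.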