Problem occurred when using PyTesseract to recognize text from an image - python

I was trying to write an automatic login program for our school website, which requires recognizing the text in a captcha image. So I installed pytesseract with pip and ran the following program in PyCharm (the image is at /Users/macintosh/Documents/PythonOutputs/2.jpg):
import pytesseract
from PIL import Image
image = Image.open("/Users/macintosh/Documents/PythonOutputs/2.jpg")
text = pytesseract.image_to_string(image)
print(text)
But this error occurred:
Traceback (most recent call last):
File "/Users/macintosh/Library/Preferences/PyCharmCE2018.2/scratches/scratch_3.py", line 5, in <module>
text = pytesseract.image_to_string(image)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 294, in image_to_string
return run_and_get_output(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 202, in run_and_get_output
run_tesseract(**kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 178, in run_tesseract
raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
What's the problem?

Well, although your error message is not exactly crystal clear, I bet (judging from the steps you describe) that you haven't installed Tesseract itself.
The pytesseract documentation states:
Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine.
So you also need to install the actual program (Tesseract, that is) to do the job.
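One way to check this diagnosis from Python itself is to ask whether a `tesseract` executable is reachable on the PATH that your interpreter sees. The sketch below uses only the standard library (`shutil.which`); the helper name `find_tesseract` is my own invention, not part of pytesseract.

```python
import shutil
from typing import Optional


def find_tesseract(name: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if not found.

    shutil.which searches the PATH of the *current* Python process, which is
    roughly the lookup that happens when pytesseract starts the tesseract program.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found: install the Tesseract-OCR engine first.")
    else:
        print("Tesseract found at", path)
```

If this returns None, pytesseract's default `tesseract_cmd` of `'tesseract'` will most likely not be found either.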

Related

PyCharm can't find Tesseract on Mac

I'm starting out with some basic python tutorials with OpenCV, and the first tutorial uses Tesseract, Pytesseract, and OpenCV. I have Tesseract downloaded and pip installed, and I have Pytesseract and OpenCV downloaded, installed, and included in my PyCharm packages, so I think the problem is how I'm addressing the Tesseract file in my code, since I'm new to using a Mac.
(I'm using Python 3.8, but also have Python 2.7 installed, because I needed it to get to this point. Weirdly enough, up to this point, the code only ran without error if I had Python 2.7 installed, but had 3.8 as my PyCharm interpreter.)
When I put Tesseract into my terminal, it tells me that its path is simply 'Applications/tesseract'. But when I use this path in PyCharm, I get the error message below. If anyone could help me figure out how to handle this error, I would really appreciate it! (I'm new to everything computers, by the way. This is how I'm learning.)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/george/PycharmProjects/pythonProject2/main.py", line 7, in <module>
print(pytesseract.image_to_string(img))
File "/Users/george/Library/Python/3.8/lib/python/site-packages/pytesseract/pytesseract.py", line 370, in image_to_string
return {
File "/Users/george/Library/Python/3.8/lib/python/site-packages/pytesseract/pytesseract.py", line 373, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "/Users/george/Library/Python/3.8/lib/python/site-packages/pytesseract/pytesseract.py", line 282, in run_and_get_output
run_tesseract(**kwargs)
File "/Users/george/Library/Python/3.8/lib/python/site-packages/pytesseract/pytesseract.py", line 254, in run_tesseract
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: \Applications\tesseract is not installed or it's not in your PATH. See README file for more information.
I don't know what to look for in the README file, though.
Here is the code that kicked off the error message:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '\\Applications\\tesseract'
img = cv2.imread('im1.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print(pytesseract.image_to_string(img))
cv2.imshow('Result', img)
cv2.waitKey(0)
On Macs (and other Unix-like OSes), the path separator is the forward slash, not the backslash.
pytesseract.pytesseract.tesseract_cmd = '/Applications/tesseract'
should work better.
However, do you actually need to set that explicitly? If tesseract is on your PATH, the library will likely find it on its own.
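To see why the backslash version fails on macOS, compare how `pathlib` parses the two strings under POSIX rules: with backslashes, the whole string is a single relative file name. The sketch also tries a short list of fallback locations; those paths (the usual Homebrew prefixes on Intel and Apple Silicon Macs) and the helper name `resolve_tesseract_cmd` are assumptions, so run `which tesseract` in Terminal to get the real path on your machine.

```python
import os
import shutil
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Under POSIX rules a backslash is an ordinary character, not a separator.
bad = PurePosixPath("\\Applications\\tesseract")
good = PurePosixPath("/Applications/tesseract")
print(bad.parts)   # a single relative component
print(good.parts)  # ('/', 'Applications', 'tesseract')

# Assumed install locations (Homebrew on Intel / Apple Silicon); adjust as needed.
DEFAULT_CANDIDATES = ("/usr/local/bin/tesseract", "/opt/homebrew/bin/tesseract")


def resolve_tesseract_cmd(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Prefer whatever is on PATH; otherwise return the first executable candidate."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If it returns a path, you can assign it with `pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()`; if tesseract is already on PATH, the assignment is unnecessary.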

File tesseract.exe does not exist

I have installed the pytesseract library using
pip install pytesseract
When I tried to use the image_to_string function, it gave me a
FileNotFoundError: [WinError 2] The system cannot find the file specified
I googled it and found that I should change the following line in the pytesseract.py file:
tesseract_cmd = 'tesseract'
should become
tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract'
I searched but didn't find any tesseract.exe file in my Python folder. I then reinstalled the library, but the file still wasn't there. Finally, I replaced the line with:
tesseract_cmd = path_to_folder_that_contains_pytesseractEXE + 'pytesseract'
and my program threw:
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
What can I do to make my program work?
P.S. Here is my program code:
from pytesseract import image_to_string
from PIL import Image, ImageEnhance, ImageFilter
im = Image.open(r'C:\Users\Филипп\Desktop\ImageToText_Python\NoName.png')
print(im)
txt = image_to_string(im)
print(txt)
Full traceback of the first attempt:
File "C:/Users/user/Desktop/ImageToText.py", line 10, in <module>
text = pytesseract.image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Python\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Python\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Full traceback of the second attempt:
Traceback (most recent call last):
File "C:\Users\user\Desktop\ImageToText.py", line 6, in <module>
txt = image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 125, in image_to_string
raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
From the project's README:
try:
import Image
except ImportError:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
print(pytesseract.image_to_string(Image.open('test.png')))
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
So you have to make sure tesseract.exe is on your computer (for example, by installing Tesseract-OCR), and then either add its containing folder to your PATH environment variable or declare its location using the pytesseract.pytesseract.tesseract_cmd attribute.
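The edit described in the question (`path_to_folder_that_contains_tesseractEXE + 'tesseract'`) also silently drops the path separator unless the folder string already ends with one. A safer pattern, sketched below, joins the folder and the executable name with `pathlib` and checks that the file really exists before you hand it to pytesseract. The helper name `tesseract_cmd_from_folder` is hypothetical; the example folder is the one from the README comment above.

```python
import os
from pathlib import Path


def tesseract_cmd_from_folder(folder: str) -> str:
    """Join an install folder and the Tesseract executable name safely.

    Plain string concatenation (folder + 'tesseract') loses the separator
    when `folder` has no trailing slash; Path's '/' operator inserts it.
    """
    exe_name = "tesseract.exe" if os.name == "nt" else "tesseract"
    candidate = Path(folder) / exe_name
    if not candidate.is_file():
        raise FileNotFoundError(
            f"No {exe_name} in {folder!r}; is Tesseract-OCR installed there?"
        )
    return str(candidate)


# Hypothetical usage (check where your installer actually put Tesseract):
# import pytesseract
# pytesseract.pytesseract.tesseract_cmd = tesseract_cmd_from_folder(
#     r"C:\Program Files (x86)\Tesseract-OCR")
```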
For people in the same situation as me: here is a Tesseract-OCR downloader. After the download finishes, go to the installation folder you chose; there should be a file named tesseract.exe in it. Copy the path to this file and assign it to tesseract_cmd (in pytesseract.py, or via pytesseract.pytesseract.tesseract_cmd in your own script).
If you are using Windows, you have to install Tesseract-OCR from this link (3.05.01 is the stable version and supports foreign-language extraction). Then add the path where you installed the software to the PATH environment variable.
If you are using Ubuntu, type "sudo apt-get install tesseract-ocr" in a terminal.
Pytesseract is a Python wrapper that gives you access to this Tesseract-OCR software.
Note 1: If you want to extract text in foreign languages, you have to put the corresponding tessdata files in the installation path.
Note 2: Python 2 does not have good support for foreign-language extraction, so it is better to go with Python 3.
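Tesseract keeps one `<lang>.traineddata` file per language in its `tessdata` folder, and a `lang` value such as `'fra'` or `'eng+nld'` only works if every listed file is present. The standard-library sketch below lists what is installed and reports missing codes; the function names and the folder you pass in are assumptions, so point it at your own installation's `tessdata` directory.

```python
from pathlib import Path
from typing import List


def installed_languages(tessdata_dir: str) -> List[str]:
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.traineddata"))


def missing_languages(lang: str, tessdata_dir: str) -> List[str]:
    """Return the codes in a '+'-joined lang string (e.g. 'eng+nld') that are not installed."""
    available = set(installed_languages(tessdata_dir))
    return [code for code in lang.split("+") if code not in available]
```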

Pytesseract traceback error "file not found"

I'm trying to use pytesseract for OCR.
I have installed Google Tesseract 3.03
I have installed pytesseract 0.1.6
I am running Python 3.5.1
I am running Windows 8
Tesseract is also in my path (I can call it from anywhere in a normal CMD and it will return the help function)
And this is the code I'm trying to run:
try:
import Image
except ImportError:
from PIL import Image
import pytesseract
im=Image.open('C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/test.png')
print(pytesseract.image_to_string(im))
But it returns this error:
Traceback (most recent call last):
File "C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/main.py", line 8, in <module>
print(pytesseract.image_to_string(im))
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
config=config)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
stderr=subprocess.PIPE)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Process finished with exit code 1
I know that both tesseract and pytesseract work because if I run this from CMD:
python pytesseract.py -l eng+nld test.png
It does work, and returns me the characters as expected.
What am I doing wrong?
Thanks in advance!
Mats de Waard
I finally got it working. Everything seems to have been set up right, and I was calling everything correctly, but I needed to reboot Windows because until then Python could not find the files.
I forgot that windows debugging always starts with a reboot :P
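A plausible explanation for why the reboot helped: a program copies its environment variables when it starts, so a console or IDE that was already running keeps the old PATH after you edit it (restarting just that program is usually enough). As a sketch, you can also put the Tesseract folder on the PATH of the running Python process, because child processes such as the one pytesseract launches inherit `os.environ`. `prepend_to_path` and both folder values are hypothetical; substitute your real install folder.

```python
import os
import shutil

# Hypothetical install folders; replace with wherever tesseract(.exe) actually lives.
TESSERACT_DIR = (
    r"C:\Program Files (x86)\Tesseract-OCR" if os.name == "nt" else "/opt/tesseract/bin"
)


def prepend_to_path(folder: str) -> None:
    """Put `folder` at the front of this process's PATH (affects children started later)."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if folder not in parts:
        os.environ["PATH"] = os.pathsep.join([folder] + parts)


prepend_to_path(TESSERACT_DIR)
print(shutil.which("tesseract"))  # full path if some PATH folder contains tesseract
```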

Pytesseract No such file or directory error

First of all, I did everything mentioned in pytesseract-no such file or directory error, but it still doesn't work. Now I'm using the PyCharm IDE with the following code:
from PIL import Image
import pytesseract
import subprocess
im = Image.open('test.png')
im.show()
subprocess.call(['tesseract','test.png','out'])
print pytesseract.image_to_string(Image.open('test.png'))
im.show() opens the image successfully.
subprocess.call() with tesseract test.png out also extracts the text from the image,
but pytesseract.image_to_string() fails.
I don't get it. Why am I able to use tesseract in the shell but not in Python? In Python I can open the same image, but when it is passed to tesseract, the image can't be found.
Below you can see the error output.
File "/home/hamza-c/Schreibtisch/Android/JioShare/orc.py", line 7, in <module>
print pytesseract.image_to_string(Image.open('/home/hamza-c/Schreibtisch/Android/JioShare/test.png'))
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 162, in image_to_string
config=config)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 95, in run_tesseract
stderr=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1340, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I tested the code from your question, and it works fine. I was facing the same error:
No such file or directory
The problem was that the directory containing 'tesseract.exe' had not been added to the PATH environment variable. You should be able to run the command 'tesseract' in a command prompt.
If tesseract is not installed, you can download it from https://github.com/tesseract-ocr/tesseract/wiki; for Windows, use the third-party installer available here.
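The tracebacks in this question end inside `subprocess`, which is a hint that the missing "file" is the tesseract executable, not the image. The sketch below reproduces that with a deliberately nonexistent command name (chosen so it cannot exist); `missing_program` is a hypothetical helper. In Python 3 this surfaces as `FileNotFoundError`, while Python 2 raised a plain `OSError` with errno 2, as in the traceback above.

```python
import subprocess
from typing import Optional


def missing_program(cmd: str) -> Optional[str]:
    """Try to start `cmd`; return the program the OS could not find, or None if it started."""
    try:
        subprocess.run(
            [cmd, "--version"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError as exc:
        # [Errno 2] / [WinError 2]: the executable itself was not found.
        # On POSIX exc.filename names it; on Windows it may be None.
        return exc.filename or cmd
    return None


print(missing_program("no-such-ocr-binary-xyz"))
```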
Maybe you need to install tesseract. If your OS is CentOS, run:
yum install tesseract
I've used the following command and it worked for me:
brew install tesseract
I solved my own question.
im = Image.open('test.png')
print pytesseract.image_to_string(im)
It's still unclear to me why it works when a variable is passed, but not when I open the image directly inside the function call.

PIL with Centos 7 and Python 3.4

Does anyone have a working implementation of PIL with Centos 7 and Python 3.4?
I tried to install the tarball from here, but got a lot of errors when running "make" or "make test".
If anyone has a working implementation, could you please post the commands and configuration needed (e.g. the yum packages) to get it to work?
I also found this blog post but it didn't help.
Here is one error that I see when trying to resize a JPEG:
Traceback (most recent call last):
File "<console>", line 2, in <module>
File "/var/www/deploy/myproject/myproject-django/venv/lib/python3.4/site-packages/PIL/Image.py", line 1557, in resize
self.load()
File "/var/www/deploy/myproject/myproject-django/venv/lib/python3.4/site-packages/PIL/ImageFile.py", line 203, in load
d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
File "/var/www/deploy/myproject/myproject-django/venv/lib/python3.4/site-packages/PIL/Image.py", line 420, in _getdecoder
raise IOError("decoder %s not available" % decoder_name)
OSError: decoder jpeg not available
It looks like you're missing libjpeg. Make sure you install the dependencies first, and then reinstall Pillow. See here for details:
http://pillow.readthedocs.org/en/3.0.x/installation.html
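Before and after reinstalling, you can ask Pillow whether it was built with JPEG support. This sketch relies on `PIL.features.check_codec`, which exists in reasonably recent Pillow releases but may be absent from the very old versions that still supported Python 3.4; the helper `pillow_has_jpeg` is hypothetical and returns None whenever the check is unavailable.

```python
from typing import Optional


def pillow_has_jpeg() -> Optional[bool]:
    """True/False if Pillow reports JPEG codec support, None if it can't be checked."""
    try:
        from PIL import features
    except ImportError:
        return None  # Pillow (or its features module) is not installed
    check_codec = getattr(features, "check_codec", None)
    if check_codec is None:
        return None  # older Pillow without this helper
    return bool(check_codec("jpg"))


print(pillow_has_jpeg())
```

If this prints False, the Pillow build found no JPEG library at compile time, which matches the "decoder jpeg not available" error.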
