I have installed the pytesseract library using
pip install pytesseract
When I tried to use the image_to_string function, it gave me a
FileNotFoundError: [WinError 2] The system can not find the file specified
I googled it and found that I should edit the pytesseract.py file so that the line
tesseract_cmd = 'tesseract'
should become
tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract'
I searched but couldn't find any tesseract.exe file in my Python folder. I then reinstalled the library, but the file still wasn't there. Finally, I replaced the line with:
tesseract_cmd = path_to_folder_that_contains_pytesseractEXE + 'pytesseract'
and my program threw:
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
What can I do to make my program work?
P.S. Here is my program code:
from pytesseract import image_to_string
from PIL import Image, ImageEnhance, ImageFilter
im = Image.open(r'C:\Users\Филипп\Desktop\ImageToText_Python\NoName.png')
print(im)
txt = image_to_string(im)
print(txt)
Full traceback of the first attempt:
File "C:/Users/user/Desktop/ImageToText.py", line 10, in <module>
text = pytesseract.image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Python\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Python\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system can not find the file specified
Full traceback of the second attempt:
Traceback (most recent call last):
File "C:\Users\user\Desktop\ImageToText.py", line 6, in <module>
txt = image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 125, in image_to_string
raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
From the project's README:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
print(pytesseract.image_to_string(Image.open('test.png')))
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
So you have to make sure tesseract.exe is on your computer (for example, by installing Tesseract-OCR), and then either add its containing folder to your PATH environment variable or declare its location with the pytesseract.pytesseract.tesseract_cmd attribute.
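A minimal sketch of that advice, using only the standard library to locate the executable before handing it to pytesseract. The helper name find_tesseract and the fallback folder are illustrative assumptions, not part of pytesseract; adjust the folder to wherever Tesseract-OCR is actually installed.

```python
import os
import shutil


def find_tesseract(extra_dirs=(), search_path=None):
    """Return the full path of the tesseract executable, or None.

    Hypothetical helper: looks on PATH first (or on `search_path` if given),
    then in each folder listed in `extra_dirs`.
    """
    found = shutil.which("tesseract", path=search_path)
    if found:
        return found
    exe_name = "tesseract.exe" if os.name == "nt" else "tesseract"
    for folder in extra_dirs:
        candidate = os.path.join(folder, exe_name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Assumed install folder (taken from the README example); change it as needed.
cmd = find_tesseract(extra_dirs=[r"C:\Program Files (x86)\Tesseract-OCR"])
if cmd is None:
    print("tesseract executable not found; install Tesseract-OCR first")
else:
    try:
        import pytesseract
        pytesseract.pytesseract.tesseract_cmd = cmd
    except ImportError:
        print("pytesseract is not installed")
```

Checking PATH first keeps the script portable: the hard-coded folder is only used when the executable cannot be found any other way.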
For people in the same situation as me: here is a Tesseract-OCR downloader. After the installation finishes, go to the path you chose; there should be a file named tesseract.exe. Copy the full path to this file and use it as the value of tesseract_cmd in pytesseract.py.
If you are using Windows, you have to install tesseract-ocr from this link (3.05.01 is the stable version and supports foreign-language extraction), then add the path where you installed the software to the PATH environment variable.
If you are using Ubuntu, type "sudo apt-get install tesseract-ocr" in a terminal.
Pytesseract is a Python wrapper that gives you access to the tesseract-ocr software.
Note 1: If you want to extract foreign languages, you have to include the corresponding tessdata files in the installed path.
Note 2: Python 2 does not have good support for foreign-language extraction, so it is better to use Python 3.
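To check Note 1 before running OCR, something like the following could report which language files are absent. It is a sketch that assumes the language data is stored as <code>.traineddata files inside the tessdata folder; the function name missing_languages and the folder used below are hypothetical.

```python
import os


def missing_languages(tessdata_dir, lang):
    """Return the language codes in `lang` that have no .traineddata file.

    `lang` uses the same form as the tesseract -l option, e.g. "fra"
    or "eng+nld".
    """
    missing = []
    for code in lang.split("+"):
        data_file = os.path.join(tessdata_dir, code + ".traineddata")
        if not os.path.isfile(data_file):
            missing.append(code)
    return missing


# Assumed location; use the tessdata folder of your own installation.
tessdata = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
print(missing_languages(tessdata, "eng+fra"))
```

An empty list means every requested language file was found in that folder.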
I was trying to make an automatic login program for our school website, which requires recognizing text from a captcha code. So I installed pytesseract from pip, and ran the program in PyCharm: (the image is in the directory /Users/macintosh/Documents/PythonOutputs/2.jpg)
import pytesseract
from PIL import Image
image = Image.open("/Users/macintosh/Documents/PythonOutputs/2.jpg")
text = pytesseract.image_to_string(image)
print(text)
But this error occurred:
Traceback (most recent call last):
File "/Users/macintosh/Library/Preferences/PyCharmCE2018.2/scratches/scratch_3.py", line 5, in <module>
text = pytesseract.image_to_string(image)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 294, in image_to_string
return run_and_get_output(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 202, in run_and_get_output
run_tesseract(**kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 178, in run_tesseract
raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
What's the problem?
Well, although your error message is not exactly crystal clear, I bet (judging from your actions) that you haven't installed Tesseract itself.
The pytesseract documentation states that:
Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine.
So you also need to install the actual program (Tesseract itself) to do the job.
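A quick way to confirm whether the engine itself is installed is to ask it for its version from Python. This is only a sketch: the helper name tesseract_version is made up for illustration, and it assumes the executable is called tesseract and is reachable through PATH.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `cmd --version`, or None if the
    program cannot be started (for example, because it is not installed)."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some versions print to stderr
            universal_newlines=True,
        )
    except OSError:
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


version = tesseract_version()
if version is None:
    print("Tesseract itself is not installed or not on PATH")
else:
    print("Found:", version)
```

If this prints the "not installed" message, installing pytesseract with pip alone was not enough and the Tesseract program still has to be installed separately.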
Problem
My tesseract (tesserocr) is not found by the emacs python interpreter, but I am able to use tesseract on the terminal as well as in my Spyder installation. Emacs python interpreter is able to import pytesseract, but not find tesserocr. I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/eghx/agent18/project-gym/tests/thresholding.py", line 34, in image_to_string2
print(image_to_string(img_open))
File "/home/eghx/anaconda3/lib/python3.6/site-packages/pytesseract-0.1.7-py3.6.egg/pytesseract/pytesseract.py", line 122, in image_to_string
File "/home/eghx/anaconda3/lib/python3.6/site-packages/pytesseract-0.1.7-py3.6.egg/pytesseract/pytesseract.py", line 46, in run_tesseract
File "/home/eghx/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/home/eghx/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract': 'tesseract'
when I run
pytesseract.image_to_string(img)
However I don't get this error when I open EMACS from a terminal instead of the desktop. It appears that the path variable is inherited differently in the desktop version and terminal version of emacs. ODD!
Explanation
I have an Anaconda installation here: /path/to/anaconda3
I added this line to my init file to run this particular Python installation:
(setq python-shell-interpreter "/path/to/anaconda3/bin/python")
I installed both pytesseract and tesserocr using conda install
which tesseract gives:
/path/to/anaconda3/bin/tesseract
$ echo $PATH gives:
/path/to/anaconda3/bin:/usr/local/sbin:/usr/lo....
What I did
I copied the sys.path from the working Spyder IDE to the emacs python interpreter, and it still didn't work.
I looked around and found this but the top answer does not pertain to my case, as my $PATH variable contains the necessary path.
Can someone guide me? I am a noob. I have emacs 27 and ubuntu 16 and conda 4.5.0.
This is a possible duplicate of OSError: [Errno 2] No such file or directory using pytesser
Answer was found as per the 3rd point in the link, quoted below:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'path-to-tesseract-including-bin'
In my case,
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '/home/anaconda3/bin/tesseract'
This is only a temporary hack to get image_to_string to work, and the lines above have to be typed in every file.
Why having /home/anaconda3/bin in the $PATH variable is not enough to make it work is not known, so this looks like a temporary solution that may stay in place for a while.
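If the desktop-launched interpreter really does see a different PATH, as the question suggests, one alternative to hard-coding tesseract_cmd is to prepend the folder to PATH inside the Python process before the first OCR call. The tracebacks show pytesseract starting tesseract through subprocess, so a child process started afterwards should pick the change up. The helper name prepend_to_path is hypothetical, and this is a sketch rather than a confirmed fix for the Emacs case.

```python
import os


def prepend_to_path(folder, env=os.environ):
    """Put `folder` at the front of PATH in `env` unless it is already listed."""
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if folder not in parts:
        parts.insert(0, folder)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Demonstrated on a copy so the real environment is untouched here.
# In a real script, call prepend_to_path("/home/anaconda3/bin") once,
# before the first pytesseract call.
demo_env = {"PATH": os.pathsep.join(["/usr/local/sbin", "/usr/bin"])}
print(prepend_to_path("/home/anaconda3/bin", demo_env))
```

Printing os.environ["PATH"] from inside the Emacs interpreter is also a cheap way to see which PATH that process actually inherited.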
I'm trying to use pytesseract for OCR.
I have installed google tesseract 3.03
I have installed pytesseract 0.1.6
I am running Python 3.5.1
I am running Windows 8
Tesseract is also in my path (I can call it from anywhere in a normal CMD and it will return the help function)
And this is the code I try to execute:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
im=Image.open('C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/test.png')
print(pytesseract.image_to_string(im))
But it returns this error:
Traceback (most recent call last):
File "C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/main.py", line 8, in <module>
print(pytesseract.image_to_string(im))
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
config=config)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
stderr=subprocess.PIPE)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Process finished with exit code 1
I know that both tesseract and pytesseract work because if I run this from CMD:
python pytesseract.py -l eng+nld test.png
It does work, and returns me the characters as expected.
What am I doing wrong?
Thanks in advance!
Mats de Waard
I finally got it working. Seems like everything was set up right, and that I was calling everything correctly, but I needed to reboot Windows, because the files could not be found by Python.
I forgot that Windows debugging always starts with a reboot :P
First of all, I did everything mentioned here: pytesseract-no such file or directory error
It still doesn't work. I'm now using the PyCharm IDE with the following code:
from PIL import Image
import pytesseract
import subprocess
im = Image.open('test.png')
im.show()
subprocess.call(['tesseract','test.png','out'])
print pytesseract.image_to_string(Image.open('test.png'))
im.show() opens the image successfully.
subprocess.call() with tesseract test.png out also extracts the text from the image.
But pytesseract.image_to_string() fails.
I don't get it. Why am I able to use tesseract in the shell but not in Python? And in Python I can open the same image, but when it is used with tesseract the image can't be found.
Below you can see the error output.
File "/home/hamza-c/Schreibtisch/Android/JioShare/orc.py", line 7, in <module>
print pytesseract.image_to_string(Image.open('/home/hamza-c/Schreibtisch/Android/JioShare/test.png'))
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 162, in image_to_string
config=config)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 95, in run_tesseract
stderr=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1340, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I tested the code from your question and it works fine. I was facing the same error:
No such file or directory found
The problem was that the directory containing 'tesseract.exe' had not been added to the PATH environment variable. You should be able to run the command 'tesseract' in a command prompt.
If tesseract is not installed, you can download it from the tesseract wiki: https://github.com/tesseract-ocr/tesseract/wiki. For Windows, use the third-party installer available here.
You may need to install tesseract. If your OS is CentOS, run:
yum install tesseract
I've used the following command and it worked for me:
brew install tesseract
I solved my own question.
im = Image.open('test.png')
print pytesseract.image_to_string(im)
It's still unclear why it works when a reference is passed but not when I open the image directly inside the argument.
My code is straightforward:
import pytesseract
from PIL import Image
img = Image.open('C:/temp/foo.jpg')
img.load()
i = pytesseract.image_to_string(img)
and the error response I get back is:
Traceback (most recent call last):
File "img.py", line 6, in <module>
i = pytesseract.image_to_string(img)
File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 161, in image_to_string
File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 94, in run_tesseract
File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Any guidance would be fantastic.
Adding tesseract to my path variable helped:
C:\Program Files (x86)\Tesseract-OCR
But the code now crashes when trying to run the pytesseract piece.
Just hit the same error and decided to answer this question - it might help someone to save time...
First, make sure you have installed/copied Tesseract-OCR executables.
Windows can't find the tesseract executable in the directories listed in your PATH environment variable. So either make sure that the directory containing tesseract is in your PATH, or override the tesseract_cmd variable in your Python script as follows (substitute your own path):
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
Besides that, make sure that the TESSDATA_PREFIX Windows environment variable is set to the directory that contains the tessdata directory. For example:
TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR
if the tessdata location is: C:\Program Files (x86)\Tesseract-OCR\tessdata
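The same setting can be applied from inside the script instead of the Windows settings dialog. The sketch below follows the convention described in this answer (TESSDATA_PREFIX is the parent of tessdata); other Tesseract versions may expect the tessdata directory itself, so treat that as an assumption to verify for your version. The helper name set_tessdata_prefix is hypothetical.

```python
import os


def set_tessdata_prefix(install_dir, env=os.environ):
    """Point TESSDATA_PREFIX at `install_dir` if it contains a tessdata folder.

    Returns the path of the tessdata folder that was found.
    """
    tessdata = os.path.join(install_dir, "tessdata")
    if not os.path.isdir(tessdata):
        raise FileNotFoundError("no tessdata directory in " + install_dir)
    env["TESSDATA_PREFIX"] = install_dir
    return tessdata


# Assumed install folder from the example above; adjust to your system.
try:
    set_tessdata_prefix(r"C:\Program Files (x86)\Tesseract-OCR")
except FileNotFoundError as exc:
    print(exc)
```

Setting the variable in os.environ only affects the current Python process and the programs it starts, so it has to run before the first pytesseract call.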