First of all, I did everything mentioned here: pytesseract-no such file or directory error.
It still doesn't work. I'm now using the PyCharm IDE with the following code:
from PIL import Image
import pytesseract
import subprocess
im = Image.open('test.png')
im.show()
subprocess.call(['tesseract','test.png','out'])
print pytesseract.image_to_string(Image.open('test.png'))
im.show() opens the image successfully.
subprocess.call() with tesseract test.png out also extracts the text from the image.
But pytesseract.image_to_string() fails.
I don't get it. Why am I able to use tesseract in the shell but not from Python? In Python I can open the same image, but when it is passed to tesseract the image can't be found.
Below you can see the error output.
File "/home/hamza-c/Schreibtisch/Android/JioShare/orc.py", line 7, in <module>
print pytesseract.image_to_string(Image.open('/home/hamza-c/Schreibtisch/Android/JioShare/test.png'))
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 162, in image_to_string
config=config)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 95, in run_tesseract
stderr=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1340, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
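The traceback ends inside subprocess, not inside PIL: pytesseract runs tesseract as a child process, so `[Errno 2] No such file or directory` here most likely refers to the tesseract executable rather than to test.png. A minimal Python 3 sketch that reproduces the same errno by launching a deliberately non-existent program in the same way (the name `no-such-ocr-binary` is made up for the demo):

```python
import errno
import subprocess

def launch_error_code(program="no-such-ocr-binary"):
    """Start `program` as a child process and return the errno if it cannot be found.

    "no-such-ocr-binary" is a hypothetical name that should not exist on any PATH.
    """
    try:
        subprocess.call([program, "test.png", "out"])
    except OSError as exc:  # FileNotFoundError on Python 3
        return exc.errno
    return None

# ENOENT (errno 2): it is the *program* that was not found, not test.png.
print(launch_error_code() == errno.ENOENT)
```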
I tested the code in your question and it works fine. I was facing the same error:
No such file or directory found
The problem was that the directory containing 'tesseract.exe' had not been added to the PATH environment variable. You should be able to run the command 'tesseract' in a command prompt.
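A quick way to check this from the same interpreter that runs pytesseract is `shutil.which` (Python 3.3+), which searches the PATH of the current process and, on Windows, also tries the PATHEXT extensions, so `tesseract` matches `tesseract.exe`. A minimal sketch; the helper name `tesseract_on_path` is hypothetical:

```python
import shutil

def tesseract_on_path(name="tesseract"):
    """Return the full path this interpreter would run for `name`, or None if not on PATH."""
    return shutil.which(name)

location = tesseract_on_path()
if location is None:
    print("tesseract is not on this interpreter's PATH")
else:
    print("tesseract found at", location)
```

If this prints the "not on PATH" message inside the IDE but `tesseract` works in a terminal, the IDE was most likely started with a different PATH.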
If tesseract is not installed, you can download it from the tesseract wiki: https://github.com/tesseract-ocr/tesseract/wiki. For Windows, use the third-party installer available here.
Maybe you need to install tesseract. If your OS is CentOS, enter:
yum install tesseract
I've used the following command and it worked for me:
brew install tesseract
I solved it myself:
im = Image.open('test.png')
print pytesseract.image_to_string(im)
It's still unclear to me why it works when a variable holding the image is passed, but not when I open the image directly inside the call.
Related
Problem
My tesseract (tesserocr) is not found by the Emacs Python interpreter, but I am able to use tesseract in the terminal as well as in my Spyder installation. The Emacs Python interpreter is able to import pytesseract but cannot find tesserocr. I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/eghx/agent18/project-gym/tests/thresholding.py", line 34, in image_to_string2
print(image_to_string(img_open))
File "/home/eghx/anaconda3/lib/python3.6/site-packages/pytesseract-0.1.7-py3.6.egg/pytesseract/pytesseract.py", line 122, in image_to_string
File "/home/eghx/anaconda3/lib/python3.6/site-packages/pytesseract-0.1.7-py3.6.egg/pytesseract/pytesseract.py", line 46, in run_tesseract
File "/home/eghx/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/home/eghx/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract': 'tesseract'
when I run:
pytesseract.image_to_string(img)
However, I don't get this error when I open Emacs from a terminal instead of from the desktop. It appears that the PATH variable is inherited differently when Emacs is launched from the desktop and when it is launched from a terminal. Odd!
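To confirm the difference, the PATH each Emacs session hands to its inferior Python can be printed from inside that Python. One possible cause, stated as an assumption: conda usually adds its bin directory to PATH from ~/.bashrc, which a desktop-launched Emacs may never read. A minimal sketch to run in both sessions and compare; the helper names are hypothetical and `/path/to/anaconda3/bin` is the placeholder from the question:

```python
import os

def path_entries():
    """Split this process's PATH into its directories, in search order."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]

def dir_on_path(directory):
    """True if `directory` appears on this process's PATH (case/slash-normalised)."""
    target = os.path.normcase(os.path.abspath(directory))
    return any(os.path.normcase(os.path.abspath(p)) == target for p in path_entries())

if __name__ == "__main__":
    for entry in path_entries():
        print(entry)
    # Placeholder path from the question; substitute the real Anaconda bin folder.
    print("anaconda bin on PATH:", dir_on_path("/path/to/anaconda3/bin"))
```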
Explanation
I have an Anaconda installation here: /path/to/anaconda3
I have added this line to my init file to use this particular Python installation:
(setq python-shell-interpreter "/path/to/anaconda3/bin/python")
I installed both pytesseract and tesserocr using conda install.
which tesseract gives:
/path/to/anaconda3/bin/tesseract
$ echo $PATH gives:
/path/to/anaconda3/bin:/usr/local/sbin:/usr/lo....
What I did
I copied the sys.path from the working Spyder IDE to the Emacs Python interpreter, and it still didn't work.
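Copying sys.path cannot help here: sys.path only controls where Python looks for modules to import, while the tesseract binary is located through the PATH environment variable. A small POSIX-only sketch showing the difference with a throwaway executable (`fake-tesseract` is a hypothetical name created in a temporary folder):

```python
import os
import shutil
import stat
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create an empty executable file to stand in for a real binary.
    tool = os.path.join(tmp, "fake-tesseract")
    with open(tool, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

    sys.path.insert(0, tmp)                 # affects module imports only
    found_via_sys_path = shutil.which("fake-tesseract")
    sys.path.remove(tmp)

    old_path = os.environ.get("PATH", "")
    os.environ["PATH"] = tmp + os.pathsep + old_path  # affects executable lookup
    found_via_env_path = shutil.which("fake-tesseract")
    os.environ["PATH"] = old_path

print("found via sys.path:", found_via_sys_path)
print("found via PATH:", found_via_env_path)
```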
I looked around and found this, but the top answer does not pertain to my case, as my $PATH variable contains the necessary path.
Can someone guide me? I am a noob. I have Emacs 27, Ubuntu 16 and conda 4.5.0.
This is a possible duplicate of OSError: [Errno 2] No such file or directory using pytesser
The answer was found in the third point of the linked question, quoted below:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'path-to-tesseract-including-bin'
In my case,
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '/home/anaconda3/bin/tesseract'
This is only a temporary hack to get image_to_string to work, since the above has to be typed in every file.
Why having /home/anaconda3/bin in $PATH is not enough is still unknown. This seems to be a somewhat long-term temporary solution.
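One way to avoid hard-coding the path in every file is to resolve it once: try PATH first and only then fall back to known locations. A minimal sketch; the helper name and the fallback list are assumptions (the first fallback is the path from this answer):

```python
import os
import shutil

# Hypothetical fallback locations; replace with wherever tesseract really lives.
DEFAULT_CANDIDATES = (
    "/home/anaconda3/bin/tesseract",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)

def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a path to tesseract: PATH lookup first, then the listed fallbacks, else None."""
    found = shutil.which("tesseract")
    if found:
        return found
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

In a shared module (say a hypothetical ocr_config.py) you could then set `pytesseract.pytesseract.tesseract_cmd = find_tesseract() or "tesseract"` once and simply `import ocr_config` from each script, so the path lives in one place.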
I have installed the pytesseract library using
pip install pytesseract
When I tried to use the image_to_string function, it gave me a
FileNotFoundError: [WinError 2] The system cannot find the file specified
I googled it and found that I should change the following line in the pytesseract.py file:
tesseract_cmd = 'tesseract'
so that it becomes:
tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract'
I searched but didn't find any tesseract.exe file in my Python folder. I then reinstalled the library, but the file still wasn't there. Finally, I replaced the line with:
tesseract_cmd = path_to_folder_that_contains_pytesseractEXE + 'pytesseract'
and my program threw:
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
What can I do to make my program work?
P.S. Here is my program code:
from pytesseract import image_to_string
from PIL import Image, ImageEnhance, ImageFilter
im = Image.open(r'C:\Users\Филипп\Desktop\ImageToText_Python\NoName.png')
print(im)
txt = image_to_string(im)
print(txt)
Full traceback of the first attempt:
File "C:/Users/user/Desktop/ImageToText.py", line 10, in <module>
text = pytesseract.image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Python\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Python\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Full traceback of the second attempt:
Traceback (most recent call last):
File "C:\Users\user\Desktop\ImageToText.py", line 6, in <module>
txt = image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 125, in image_to_string
raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
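The second error text is pytesseract's own usage message, which suggests that tesseract_cmd ended up launching pytesseract itself instead of the Tesseract OCR binary. A sketch for sanity-checking whatever tesseract_cmd points at by running it with `--version` (the flag recent Tesseract releases accept; older builds may only accept `-v`). The helper name is hypothetical:

```python
import subprocess

def tesseract_version(cmd="tesseract"):
    """Run `cmd --version`; return its first output line, or None if cmd cannot be started."""
    try:
        result = subprocess.run([cmd, "--version"], stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, universal_newlines=True)
    except OSError:  # e.g. FileNotFoundError when cmd is not found
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""
```

A first line starting with "tesseract" suggests the right program; a pytesseract usage message or None means tesseract_cmd needs fixing.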
From the project's README:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
print(pytesseract.image_to_string(Image.open('test.png')))
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
So, you have to make sure tesseract.exe is on your computer (for example, by installing Tesseract-OCR), then add the containing folder to your PATH environment variable, or declare its location using the pytesseract.pytesseract.tesseract_cmd attribute.
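If you would rather not touch the system settings, a per-process alternative is to put the folder on PATH from Python before calling pytesseract; child processes started afterwards (such as the one pytesseract launches) inherit the change. A minimal sketch; the helper name and the install folder are assumptions:

```python
import os

def add_to_path(directory):
    """Prepend `directory` to PATH for this process (and children it starts), once."""
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + entries)
    return os.environ["PATH"]

if __name__ == "__main__":
    # Hypothetical install folder; use the folder that actually holds tesseract.exe.
    add_to_path(r"C:\Program Files (x86)\Tesseract-OCR")
```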
For people in the same situation as me: here is a tesseract-OCR downloader. After the download finishes, go to the path you chose; there should be a file named tesseract.exe. Copy the path to this file and paste it into the tesseract_cmd line of pytesseract.py.
If you are using Windows, you have to install tesseract-ocr from this link (3.05.01 is the stable version and supports foreign-language extraction), and add the path where you installed the software to the PATH environment variable.
If you are using Ubuntu, type "sudo apt-get install tesseract-ocr" in a terminal.
Pytesseract is a Python wrapper that helps you access this tesseract-ocr software.
Note 1: if you want to extract text in foreign languages, you have to include the corresponding tessdata files in the installation path.
Note 2: Python 2 does not have good support for foreign-language extraction, so better go with Python 3.
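Tesseract expects one `<code>.traineddata` file per language in its tessdata folder (for example `eng.traineddata`), and several languages can be combined with `+`, as in `eng+nld`. A minimal sketch that lists which requested languages are missing before calling pytesseract with `lang=`; the helper name and the folder you pass in are assumptions:

```python
import os

def missing_languages(tessdata_dir, langs):
    """Return the codes in a '+'-joined string (e.g. 'eng+nld') whose
    <code>.traineddata file is not present in tessdata_dir."""
    return [code for code in langs.split("+")
            if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata"))]
```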
I'm trying to use pytesseract for OCR.
I have installed google tesseract 3.03
I have installed pytesseract 0.1.6
I am running Python 3.5.1
I am running Windows 8
Tesseract is also in my path (I can call it from anywhere in a normal CMD and it will return the help function)
And this is the code I try to execute:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
im=Image.open('C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/test.png')
print(pytesseract.image_to_string(im))
But it returns this error:
Traceback (most recent call last):
File "C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/main.py", line 8, in <module>
print(pytesseract.image_to_string(im))
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
config=config)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
stderr=subprocess.PIPE)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Process finished with exit code 1
I know that both tesseract and pytesseract work because if I run this from CMD:
python pytesseract.py -l eng+nld test.png
it works and returns the characters as expected.
What am I doing wrong?
Thanks in advance!
Mats de Waard
I finally got it working. It seems everything was set up right and I was calling everything correctly, but I needed to reboot Windows because the files could not be found by Python.
I forgot that windows debugging always starts with a reboot :P
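A likely explanation for why a restart helped: a process receives a copy of its parent's environment when it starts, so an IDE that was already running keeps the old PATH even after the system setting changes. A minimal sketch demonstrating that a child process sees exactly the PATH its parent had at launch time; the helper name and the demo folder are hypothetical:

```python
import os
import subprocess
import sys

# Code run inside the child: is argv[1] one of the PATH entries it inherited?
CHILD_CODE = "import os, sys; print(sys.argv[1] in os.environ.get('PATH', '').split(os.pathsep))"

def child_sees_on_path(directory):
    """Start a fresh Python process and report whether `directory` is on the PATH it received."""
    out = subprocess.run([sys.executable, "-c", CHILD_CODE, directory],
                         stdout=subprocess.PIPE, universal_newlines=True).stdout
    return out.strip() == "True"

if __name__ == "__main__":
    demo_dir = os.path.join(os.sep, "hypothetical", "Tesseract-OCR")
    print(child_sees_on_path(demo_dir))  # False: not on the PATH we inherited
    os.environ["PATH"] = demo_dir + os.pathsep + os.environ.get("PATH", "")
    print(child_sees_on_path(demo_dir))  # True: children copy our current environment
```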
My code is straightforward:
import pytesseract
from PIL import Image
img = Image.open('C:/temp/foo.jpg')
img.load()
i = pytesseract.image_to_string(img)
and the error response I get back is:
Traceback (most recent call last):
File "img.py", line 6, in <module>
i = pytesseract.image_to_string(img)
File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 161, in image_to_string
File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 94, in run_tesseract
File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Any guidance would be fantastic.
Adding tesseract to my path variable helped:
C:\Program Files (x86)\Tesseract-OCR
But the code now crashes when trying to run the pytesseract piece.
I just hit the same error and decided to answer this question; it might help someone save time.
First, make sure you have installed/copied Tesseract-OCR executables.
Windows can't find the tesseract executable in the directories listed in your PATH environment variable. So either make sure that the directory containing tesseract is in your PATH variable, or overwrite the tesseract_cmd variable in your Python script as follows (substitute your own path):
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
Besides that, make sure that the TESSDATA_PREFIX Windows environment variable is set to the directory containing the tessdata directory. For example:
TESSDATA_PREFIX=C:\Program Files (x86)\Tesseract-OCR
if the tessdata location is: C:\Program Files (x86)\Tesseract-OCR\tessdata
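The same variable can also be set from Python for the current process only, as long as it happens before pytesseract starts tesseract. A minimal sketch following this answer's convention (the prefix is the folder that contains tessdata); the helper name is hypothetical, and newer Tesseract releases may instead expect the tessdata folder itself, so check the documentation for your version:

```python
import os

def set_tessdata_prefix(install_dir):
    """Point TESSDATA_PREFIX at install_dir, the folder that contains 'tessdata'."""
    tessdata = os.path.join(install_dir, "tessdata")
    if not os.path.isdir(tessdata):
        raise FileNotFoundError("no 'tessdata' folder inside %r" % install_dir)
    os.environ["TESSDATA_PREFIX"] = install_dir  # inherited by tesseract child processes
    return tessdata
```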
What I'm trying to do here is save the contents of a Tkinter Canvas as a .png image using PIL.
This is my save function ('graph' is the canvas).
def SaveAs():
    filename = tkFileDialog.asksaveasfilename(initialfile="Untitled Graph", parent=master)
    graph.postscript(file=filename+".eps")
    img = Image.open(filename+".eps")
    img.save(filename+".png", "png")
But I get this error:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\Tkinter.py", line 1410, in __call__
return self.func(*args)
File "C:\Users\Adam\Desktop\Graphing Calculator\Graphing Calculator.py", line 352, in SaveAs
img.save(filename+".png", "png")
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1406, in save
self.load()
File "C:\Python27\lib\site-packages\PIL\EpsImagePlugin.py", line 283, in load
self.im = Ghostscript(self.tile, self.size, self.fp)
File "C:\Python27\lib\site-packages\PIL\EpsImagePlugin.py", line 72, in Ghostscript
gs.write(s)
IOError: [Errno 32] Broken pipe
I'm running this on Windows 7, Python 2.7.1.
How do I make this work?
Oh, I just got the same error. I have solved it now.
Just do the following after installing PIL and Ghostscript:
1) Open C:\Python27\Lib\site-packages\PIL\EpsImagePlugin.py
2) Change the code near line 50 so that it looks like this:
# Build ghostscript command
command = ["gswin32c",
           "-q",                        # quiet mode
           "-g%dx%d" % size,            # set output geometry (pixels)
           "-dNOPAUSE -dSAFER",         # don't pause between pages, safe mode
           "-sDEVICE=ppmraw",           # ppm driver
           "-sOutputFile=%s" % file,    # output file
           "-"
           ]
Make sure that gswin32c.exe is in the PATH.
Good luck!
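To check which Ghostscript executable, if any, Python can see, the usual console executable names can be looked up on PATH. A minimal sketch; the helper name is hypothetical, and the name list (64-bit and 32-bit Windows console builds, then the Unix name) is an assumption about your installation:

```python
import shutil

# Common Ghostscript console executable names: Windows 64-bit, Windows 32-bit, Unix.
GHOSTSCRIPT_NAMES = ("gswin64c", "gswin32c", "gs")

def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return (name, full_path) for the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return name, path
    return None

print(find_ghostscript())
```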
It looks like the Ghostscript executable is erroring out and then closing the connection. Others have had this same problem on different OSes.
So, first I would recommend that you confirm that PIL is installed correctly (see the FAQ page for hints). Next, ensure that Ghostscript is installed and working. Lastly, ensure that Python can find Ghostscript, for example by running a PIL script that works elsewhere.
Also, here are some tips on catching the broken pipe error so your program can be more resilient, recognize the problem, and warn the end user. Hope that helps!
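A Python 3 sketch of that idea: wrap the save call, recognise a broken pipe by its errno (EPIPE, errno 32 as in the traceback above), and return a message instead of crashing, while re-raising any other error. The wrapper name and message text are hypothetical:

```python
import errno

def save_with_pipe_check(save, *args, **kwargs):
    """Call a save function (e.g. img.save) and turn a broken pipe into a message.

    Returns None on success, or a user-facing string if the external helper
    closed the pipe early (errno EPIPE). Any other OSError is re-raised.
    """
    try:
        save(*args, **kwargs)
    except OSError as exc:  # IOError is an alias of OSError on Python 3
        if exc.errno == errno.EPIPE:
            return ("Could not save the image: the Ghostscript helper stopped early. "
                    "Check that Ghostscript is installed and on PATH.")
        raise
    return None
```

In the SaveAs function this could be used as `message = save_with_pipe_check(img.save, filename + ".png", "png")`, showing `message` to the user (for example with `tkinter.messagebox.showerror` on Python 3) when it is not None.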
I have realized that while Python 2.7 has this EpsImagePlugin.py, Anaconda also has it. Unfortunately, Anaconda's file is an older version, and when you run your code from the Spyder environment, it picks up the EpsImagePlugin.py file from the Anaconda folder.
So I was getting a similar broken pipe error.
When I went into the Python 2.7 directory, opened a Python console and ran my code there, it ran just fine.
That is because the latest EpsImagePlugin.py file takes the Windows environment and the appropriate Ghostscript executables into consideration. Hope this helps.
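To see which interpreter is running and which copy of EpsImagePlugin.py it would load, you can ask the import system directly. A Python 3 sketch (on Python 2.7, importing `EpsImagePlugin` from PIL and printing its `__file__` gives the same information); the helper name is hypothetical:

```python
import importlib.util
import sys

def module_origin(name):
    """Return the file this interpreter would load `name` from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:  # raised when a parent package (e.g. PIL) is missing
        return None
    return spec.origin if spec is not None else None

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("EpsImagePlugin:", module_origin("PIL.EpsImagePlugin"))
```

Running this from Spyder and from the plain Python 2.7-era console side by side would show whether the two environments resolve to different files.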