I'm trying to use pytesseract for OCR.
I have installed google tesseract 3.03
I have installed pytesseract 0.1.6
I am running Python 3.5.1
I am running Windows 8
Tesseract is also in my path (I can call it from anywhere in a normal CMD and it will return the help function)
And this is the code I try to execute:
try:
import Image
except ImportError:
from PIL import Image
import pytesseract
im=Image.open('C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/test.png')
print(pytesseract.image_to_string(im))
But it returns this error:
Traceback (most recent call last):
File "C:/Users/NeusAap/Google Drive/School/Jaar 1/Periode 1/Programming/Miniproject/GarageProject/scripts/main.py", line 8, in <module>
print(pytesseract.image_to_string(im))
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
config=config)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
stderr=subprocess.PIPE)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Users\NeusAap\AppData\Local\Programs\Python\Python35-32\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Process finished with exit code 1
I know that both tesseract and pytesseract work because if I run this from CMD:
python pytesseract.py -l eng+nld test.png
It does work, and returns me the characters as expected.
What am I doing wrong?
Thanks in advance!
Mats de Waard
I finally got it working. Seems like everything was set up right, and that I was calling everything correctly, but I needed to reboot Windows, because the files could not be found by Python.
I forgot that windows debugging always starts with a reboot :P
Related
I'm working on a Mac and I'm trying to open an external program (Visual Studio Code) using the subprocess module in Python. The code binary file is in usr/local/bin, which is in the PATH variable. My code works fine when I run it on a shell using python3.8 main.py. However, when I run the program using IDLE, Python keeps giving me
Traceback (most recent call last):
File "/Users/mike/Desktop/main.py", line 33, in <module>
subprocess.Popen(['code'])
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'code'
My code is as simple as it can get:
import subprocess
subprocess.Popen(['code'])
Does anyone know what's causing the problem and how I can fix it?
Thanks everyone.
If there are any extension, can you try with that while opening the file. Also we can try with providing the absolute path once.
try to use the full path of visual studio code, instead of just the word 'code'
subprocess.Popen(['usr/local/code'])
I have installed exiftool (https://smarnach.github.io/pyexiftool/) and I am able to import the library, but I get the following error when trying to run the test data just to see if it works.
ERROR: test_get_metadata (__main__.TestExifTool)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Program Files\Python36\Lib\site-
packages\pyexiftool\test\test_exiftool.py", line 66, in test_get_metadata
with self.et:
File "C:\Program Files\Python36\lib\site-packages\exiftool.py", line 191, in __enter__
self.start()
File "C:\Program Files\Python36\lib\site-packages\exiftool.py", line 174, in start
stderr=devnull)
File "C:\Program Files\Python36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Program Files\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
I also did run the setup code that is in the exiftool folder and still no luck. I think it might be a library issue or path or the (init.py) file, but I've tried several ways and so I'm here to ask if anyone else has a solution or ideas for me to try and fix it.
I'm running Python 3.6.6 and have tried other versions.
(I can run exiftool in command line, but I have encoded BASE64 images that exiftool doesnt work in command line to fully decode.)
Thank you StarGeek! The problem was that I didn't have the exiftool command tool (the separate application of exiftool) in the right PATH env variable. Once I added the application to the PATH env variables i got it to work. Also in the python exiftool code at line 70 it says that you have to have it in the path or direct it to the executable, that I had missed. Thanks again!
Problem
My tesseract (tesserocr) is not found by the emacs python interpreter, but I am able to use tesseract on the terminal as well as in my Spyder installation. Emacs python interpreter is able to import pytesseract, but not find tesserocr. I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/eghx/agent18/project-gym/tests/thresholding.py", line 34, in image_to_string2
print(image_to_string(img_open))
File "/home/eghx/anaconda3/lib/python3.6/site-packages/pytesseract-0.1.7-py3.6.egg/pytesseract/pytesseract.py", line 122, in image_to_string
File "/home/eghx/anaconda3/lib/python3.6/site-packages/pytesseract-0.1.7-py3.6.egg/pytesseract/pytesseract.py", line 46, in run_tesseract
File "/home/eghx/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/home/eghx/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract': 'tesseract'
when I run
pytesseract.image_to_string(img)
However I don't get this error when I open EMACS from a terminal instead of the desktop. It appears that the path variable is inherited differently in the desktop version and terminal version of emacs. ODD!
Explanation
I have anaconda installation here:/path/to/anaconda3
I have added this line to my init file to run this particular python installation
(setq python-shell-interpreter "/path/to/anaconda3/bin/python")
I installed both pytesseract and tesserocr using conda install
which tesseract gives:
/path/to/anaconda3/bin/tesseract
$ echo $PATH gives:
/path/to/anaconda3/bin:/usr/local/sbin:/usr/lo....
What I did
I copied the sys.path from the working Spyder IDE to emacs python interpreter and still didn't work.
I looked around and found this but the top answer does not pertain to my case, as my $PATH variable contains the necessary path.
Can someone guide me? I am a noob. I have emacs 27 and ubuntu 16 and conda 4.5.0.
This is a possible duplicate of OSError: [Errno 2] No such file or directory using pytesser
Answer was found as per the 3rd point in the link, quoted below:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'path-to-tesseract-including-bin'
In my case,
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '/home/anaconda3/bin/tesseract'
This is only a temporary hack to get image_to_string to work, by typing the above in every file.
Why the $PATH variable having the /home/anaconda3/bin is not enough to get it to work sufficiently is not known. This seems to be a slighty long-term-temporary solution.
I have installed the pytesseract library using
pip install pytesseract
When I tried to use the image_to_text method, it gave me a
FileNotFoundError: [WinError 2] The system can not find the file specified
I googled it and found that I should change something in the pytesseract.py file and the line
tesseract_cmd = 'tesseract'
should become
tesseract_cmd = path_to_folder_that_contains_tesseractEXE + 'tesseract'
I searched and haven't found any tesseract.exe files in my Python folder, I then reinstalled the library, but the file still wasn't there. Finnally, I replaced the line by:
tesseract_cmd = path_to_folder_that_contains_pytesseractEXE + 'pytesseract'
and my program threw:
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
What can I do make my programm work?
P.S Here is my programm code :
from pytesseract import image_to_string
from PIL import Image, ImageEnhance, ImageFilter
im = Image.open(r'C:\Users\Филипп\Desktop\ImageToText_Python\NoName.png')
print(im)
txt = image_to_string(im)
print(txt)
Full Traceback of first attempt :
File "C:/Users/user/Desktop/ImageToText.py", line 10, in <module>
text = pytesseract.image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 122, in
image_to_string config=config)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 46, in
run_tesseract proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Python\lib\subprocess.py", line 947, in __init__ restore_signals, start_new_session)
File "C:\Python\lib\subprocess.py", line 1224, in _execute_child startupinfo)
FileNotFoundError: [WinError 2]The system can not find the file specified
Full Traceback of second attempt
Traceback (most recent call last):
File "C:\Users\user\Desktop\ImageToText.py", line 6, in <module> txt = image_to_string(im)
File "C:\Python\lib\site-packages\pytesseract\pytesseract.py", line 125, in image_to_string
raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
From project's README:
try:
import Image
except ImportError:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
print(pytesseract.image_to_string(Image.open('test.png')))
print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
So, you have to make sure tesseract.exe is on your computer (for example by installing Tesseract-OCR), then add the containing folder to your PATH environment variable, or declare it's location using pytesseract.pytesseract.tesseract_cmd attribute
For people in the same case as me: here is a tesseract-OCR downloader. After you finish the download, go to the path you've chosen, there should be a file named tesseract.exe, copy the path to this file and paste it into pytesseract.exe.
If you are using windows OS - you have to install tesseract-ocr from this link (3.05.01 is the stable version and supported for foreign language extraction). And add the path(where you installed the software) to the environment variable.
If you are using ubuntu OS - in terminal type "sudo apt-get install tesseract-ocr"
Pytesseract is python wrapper that helps you to access this tesseract-ocr software.
Note 1: if you want to extract foreign languages then you have to include tessdata files in the installed path.
Note 2: Python 2 will not have good support on foreign language extraction, so better go with python 3.
First of all I did everything mentioned here pytesseract-no such file or directory error
Still doesn't work. Now I'm using Pycharm IDE with following code:
from PIL import Image
import pytesseract
import subprocess
im = Image.open('test.png')
im.show()
subprocess.call(['tesseract','test.png','out'])
print pytesseract.image_to_string(Image.open('test.png'))
im.show() opens the image successfully.
subprocess.call() with tesseract test.png out also extracts the text
from the image..
but pytesseract.image_to_string() fails.
I don't get it. Why I am able to use tesseract in shell but not in python. And in python I can open same image but when used with tesseract Image can't be found.
Below you can see the error output.
File "/home/hamza-c/Schreibtisch/Android/JioShare/orc.py", line 7, in <module>
print pytesseract.image_to_string(Image.open('/home/hamza-c/Schreibtisch/Android/JioShare/test.png'))
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 162, in image_to_string
config=config)
File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 95, in run_tesseract
stderr=subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1340, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
I tested the code you mentioned in your question. It works fine. I was facing the same error
No such file or directory found
The problem was the directory containing 'tesseract.exe' was not added to the environment Variable. You should be able to run command 'tesseract' in command prompt.
if tesseract is not installed you can download it from tesseract
1: https://github.com/tesseract-ocr/tesseract/wiki and for windows use third party installer available here
maybe you need install tesseract ,if your os is centos, please enter
yum install tesseract
I've used the following command and it worked for me:
brew install tesseract
I solved my own question.
im = Image.open('test.png')
print pytesseract.image_to_string(im)
It's still unclear why it works when a reference is passed but not directly when I try to open image inside the parameter.