I tried to read the text inside an image using pytesseract and OpenCV in the PyCharm editor (Windows). The image is displayed, but reading the text raises an error.
Here is my code.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
image = cv2.imread('test2.png')
image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
height,width,_ =image.shape
print(height,width)
print(pytesseract.image_to_string(image)) # ****
cv2.imshow('Output', image)
cv2.waitKey(0)
When I run the line
print(pytesseract.image_to_string(image))
it raises the following error:
Traceback (most recent call last):
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\test detection.py", line 10, in <module>
print(pytesseract.image_to_string(image))
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 409, in image_to_string
return {
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 412, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 287, in run_and_get_output
run_tesseract(**kwargs)
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 263, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Program Files (x86)\\Tesseract-OCR/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I tried various fixes, but none of them worked. I changed the environment variable and downloaded the trained data again, but the error persists. How can I solve this?
I think your main problem is that TESSDATA_PREFIX is not set in your environment variables.
See this answer
Pytesseract: Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata
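If changing the system settings is inconvenient, the variable can also be set from inside the script before the first pytesseract call. This is only a minimal sketch: it assumes a Tesseract version whose error message asks for TESSDATA_PREFIX to be 'your "tessdata" directory' (as in the traceback above), so the variable points at the folder that directly contains eng.traineddata. The helper name `use_tessdata` and the example path are assumptions, not part of pytesseract.

```python
import os


def use_tessdata(tessdata_dir, lang="eng"):
    """Point TESSDATA_PREFIX at tessdata_dir if it holds <lang>.traineddata.

    Returns True when the variable was set, False when the file is missing.
    """
    traineddata = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.isfile(traineddata):
        return False
    # Processes started later from this script inherit the value.
    os.environ["TESSDATA_PREFIX"] = tessdata_dir
    return True


if __name__ == "__main__":
    # Assumed install location; adjust to wherever tessdata actually lives.
    ok = use_tessdata(r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
    print("TESSDATA_PREFIX set:", ok)
```

A False result means the trained data is not in the folder you expected, which is worth checking before changing anything else.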
Related
My code is as follows:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = 'B:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
img = Image.open("sample.png")
text = pytesseract.image_to_string(img, lang="eng")
print(text)
The error I get is:
Traceback (most recent call last):
File "C:/PY/tesseract test.py", line 11, in <module>
text = pytesseract.image_to_string(img, lang="eng")
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 346, in image_to_string
return {
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 349, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 260, in run_and_get_output
run_tesseract(**kwargs)
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 236, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I have tried searching for other solutions but cannot find anything that works.
I'm not familiar with tesseract in Python, but you may need to load the eng.traineddata binary in order to make it work. Add a TESSDATA_PREFIX to your environment variables and point it to the folder where the binary is located.
You may want to look at this answer, which looks similar to your case: pytesseract Failed loading language \'eng\'
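As an alternative to the environment variable, pytesseract's README describes passing `--tessdata-dir` through the `config` argument. The sketch below only builds that config string; the helper name `tessdata_config` and the example path are assumptions, and the conversion to forward slashes is a precaution I chose rather than something the library requires.

```python
from pathlib import PureWindowsPath


def tessdata_config(tessdata_dir):
    """Build a config string that tells tesseract where tessdata lives."""
    # Forward slashes sidestep backslash handling when the string is split.
    posix_style = PureWindowsPath(tessdata_dir).as_posix()
    return '--tessdata-dir "{}"'.format(posix_style)


# Assumed location of the folder that contains eng.traineddata.
config = tessdata_config(r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
print(config)

# Usage with pytesseract (not run here because it needs Tesseract installed):
# text = pytesseract.image_to_string(img, lang="eng", config=config)
```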
I fixed this issue by uninstalling tesseract and installing an older version (3.0.2). So far I haven't noticed any functionality loss. I'm personally just happy that it works.
Here is the code. When I run it, I get an error saying that a variable is not set in Windows, even though Tesseract is installed and the trained data is present. I looked at the solutions on Stack Overflow and GitHub; there are some ideas there, but none of them helped me.
image = Image.open(create_path)
print(pytesseract.image_to_string(image))
Error:
Traceback (most recent call last):
File "D:\Dev\Scripts\NewYorkCase__upwork__\new.py", line 60, in <module>
main(driver, main_url)
File "D:\Dev\Scripts\NewYorkCase__upwork__\new.py", line 53, in main
get_ready_captcha = captcha_symbols_recognized(get_path)
File "D:\Dev\Scripts\NewYorkCase__upwork__\new.py", line 40, in captcha_symbols_recognized
print(pytesseract.image_to_string(image))
File "C:\Users\PANDEMIC\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytesseract\pytesseract.py", line 294, in image_to_string
return run_and_get_output(*args)
File "C:\Users\PANDEMIC\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytesseract\pytesseract.py", line 202, in run_and_get_output
run_tesseract(**kwargs)
File "C:\Users\PANDEMIC\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytesseract\pytesseract.py", line 178, in run_tesseract
raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
img = printscreen_pil
img = img.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2)
img = img.convert('1')
img.save('temp.jpg')
text = pytesseract.image_to_string(Image.open('temp.jpg'))
I want to read the image and convert it to text, but I get the error "The system cannot find the file specified". I think it has to do with Python's working directory. I'm sorry if this is a stupid question, but I hope you can help me.
This is the complete error message:
Traceback (most recent call last):
File "C:\Users\pncor\Documents\pyprograms\bot.py", line 23, in <module>
text = pytesseract.image_to_string(Image.open('temp.jpg'))
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 990, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
The tesseract package does not seem to be installed on your system, or it is not found on your path. pytesseract runs the tesseract binary as a subprocess in order to perform the OCR.
Use the package manager on your OS to install it, or refer to the installation documentation. You are using Windows, so check this out.
Also, I don't think it is necessary to write the enhanced image to a file first; just pass it directly to pytesseract.image_to_string:
text = pytesseract.image_to_string(img)
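To confirm whether the binary is actually reachable, you can check before calling pytesseract. A minimal sketch using only the standard library; the function name `find_tesseract` and the two install locations are assumptions for illustration, so adjust them to your machine.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a path to the tesseract executable, or None if not found."""
    # First look on PATH, where a bare "tesseract" command would be found.
    found = shutil.which("tesseract")
    if found:
        return found
    # Fall back to explicit install locations supplied by the caller.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Assumed default install locations on Windows; adjust as needed.
    guesses = [
        r"C:\Program Files\Tesseract-OCR\tesseract.exe",
        r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    ]
    exe = find_tesseract(guesses)
    if exe is None:
        print("tesseract not found; install it or add it to PATH")
    else:
        print("tesseract found at", exe)
        # pytesseract.pytesseract.tesseract_cmd = exe
```

If this prints "not found", the FileNotFoundError comes from the missing executable, not from temp.jpg.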
I am using Python 2.7 with OpenCV and PIL on Windows 10.
I am trying to read back an image written with cv2.imwrite(), as in the following code:
x={{"Some Process On an Image "}}
#now saving the effects on real image
cv2.imwrite("temp.png", x)
Now I am trying to get the text from the same image with pytesseract, and I need to open the image with the PIL library in another class, because pytesseract doesn't deal with cv2 images:
from PIL import Image
import pytesseract
import cv2
if __name__ == '__main__':
url= Image.open("temp.png")
cv2.waitKey(200)
text=pytesseract.image_to_string(url)
print text
cv2.waitKey(0)
It gave me this error:
Traceback (most recent call last):
File "C:\Users\Elie MA\Documents\NetBeansProjects\tesser\src\new_module.py", line 9, in <module>
text=pytesseract.image_to_string(url)
File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 161, in image_to_string
File "build\bdist.win32\egg\pytesseract\pytesseract.py", line 94, in run_tesseract
File "c:\Python27\lib\subprocess.py", line 711, in __init__
errread, errwrite)
File "c:\Python27\lib\subprocess.py", line 948, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
Why can't I get the image?
Is it an access-rights problem or something similar?
It can be shown with this code:
url="temp.png"
cv2.imshow("temp_photo",url)
I want to know why PIL cannot open the image, given that the same image can be opened with the photo viewer.
Can anyone help me?
I am trying to execute the following code
from pytesser import *
import Image
i="C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg"
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
I get the following error
C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 322, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\debugger.py", line 655, in run
exec cmd in globals, locals
File "C:\Documents and Settings\Administrator\Desktop\attachments\ocr.py", line 1, in <module>
from pytesser import *
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1952, in open
fp = __builtin__.open(fp, "rb")
IOError: [Errno 2] No such file or directory: 'C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg'
Can someone please explain what I am doing wrong here?
I renamed the image file, moved the Python file and the images to a new folder, and moved that folder to the E drive.
Now the code is as follows:
from pytesser import *
import Image
import os
i=os.path.join("E:\\","ocr","a.jpg")
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
Now the error is as follows:
E:\ocr\a.jpg
Traceback (most recent call last):
File "or.py", line 8, in <module>
text = image_to_string(im)
File "C:\Python27\lib\pytesser.py", line 31, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "C:\Python27\lib\pytesser.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "C:\Python27\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 893, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
You need to install Tesseract first. Just installing pytesseract is not enough. Then edit the tesseract_cmd variable in pytesseract.py to point to the tesseract binary. For example, in my installation I set it to
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
The exception is pretty clear: the file either doesn't exist, or you lack sufficient permissions to access it. If neither is the case, please provide evidence (e.g. relevant dir commands with output, run as the same user).
your image path maybe?
i="C:\\Documents and Settings\\Administrator\\Desktop\\attachments\\R1PNDTCB.jpg"
try this:
import os
os.path.join("C:\\", "Documents and Settings", "Administrator")
you should get a string similar to the one in the previous line
Try this first:
os.path.expanduser('~/Desktop/attachments/R1PNDTCB.jpg')
It could be that the space in the 'Documents and Settings' is causing this problem.
EDIT:
Use os.path.join so it uses the correct directory separator.
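When a path looks right but still raises "No such file or directory", it can help to find out which part of it is actually missing. The following is a rough sketch, not a fix in itself; the function name `split_existing` is an assumption introduced here for illustration.

```python
import os


def split_existing(path):
    """Split path into (deepest existing ancestor, remaining missing part)."""
    path = os.path.abspath(path)
    if os.path.exists(path):
        return path, ""
    # Walk upward until something that exists is found.
    existing = path
    while not os.path.exists(existing):
        parent = os.path.dirname(existing)
        if parent == existing:  # reached the root without finding anything
            break
        existing = parent
    missing = os.path.relpath(path, existing)
    return existing, missing


if __name__ == "__main__":
    # os.path.join inserts the correct separator for the current OS.
    image_path = os.path.join("E:\\", "ocr", "a.jpg")
    found, missing = split_existing(image_path)
    print("exists up to:", found)
    print("missing part:", missing or "(nothing, the file exists)")
```

If the missing part is only the file name, compare it against the directory listing for a different extension or a stray space.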
Just add these two lines in your code
import os
os.chdir(r'C:\Python27\Lib\site-packages\pytesser')
before
from pytesser import *
If you are using pytesseract, you have to make sure that Tesseract-OCR is installed on your system. After that, set the path to the tesseract executable in your code, as below:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
You can download Tesseract-OCR from https://github.com/UB-Mannheim/tesseract/wiki