PyTesseract failing to load languages

My code is as follows:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = 'B:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
img = Image.open("sample.png")
text = pytesseract.image_to_string(img, lang="eng")
print(text)
The error I get is:
Traceback (most recent call last):
File "C:/PY/tesseract test.py", line 11, in <module>
text = pytesseract.image_to_string(img, lang="eng")
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 346, in image_to_string
return {
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 349, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 260, in run_and_get_output
run_tesseract(**kwargs)
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 236, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I have tried searching for other solutions but cannot find anything.

I'm not familiar with Tesseract in Python, but you may need to make the eng.traineddata file available in order to make it work. Add a TESSDATA_PREFIX entry to your environment variables and point it to the folder where that file is located.
You may want to look at this answer, which looks quite similar to your case: pytesseract Failed loading language 'eng'
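A minimal sketch of the environment-variable approach, assuming your traineddata files live in a tessdata folder (the install path in the comment is hypothetical; adjust it). As far as I know, Tesseract 4+ expects TESSDATA_PREFIX to be the tessdata folder itself, while some 3.0x builds expected its parent, so check which layout your version uses. Setting os.environ from Python works because the tesseract process that pytesseract starts inherits the parent's environment. configure_tessdata is a hypothetical helper, not part of pytesseract.

```python
import os
from pathlib import Path


def configure_tessdata(tessdata_dir, lang="eng"):
    """Check that tessdata_dir holds <lang>.traineddata, then point TESSDATA_PREFIX at it.

    Hypothetical helper. Assumes tessdata_dir is the folder that directly
    contains the .traineddata files (the Tesseract 4+ layout).
    """
    tessdata = Path(tessdata_dir)
    model = tessdata / f"{lang}.traineddata"
    if not model.is_file():
        # Fail early with a clearer message than Tesseract's own error.
        raise FileNotFoundError(f"{model} not found; is {tessdata} really the tessdata folder?")
    # Child processes (such as the tesseract.exe that pytesseract starts)
    # inherit os.environ, so this must run before image_to_string is called.
    os.environ["TESSDATA_PREFIX"] = str(tessdata)
    return model


# Hypothetical install location; adjust it to wherever your tessdata folder is:
# configure_tessdata(r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
```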

I fixed this issue by uninstalling tesseract and installing an older version (3.0.2). So far I haven't noticed any functionality loss. I'm personally just happy that it works.

Related

image path is valid but pyautogui.locateOnScreen is unable to read image file?

import cv2
import pyautogui
if __name__ == '__main__':
template = cv2.imread("Images/新增.png", 0)
pos = pyautogui.locateOnScreen("Images/新增.png")
cv2 can open the image just fine, and I checked the permissions: they are the same as for any other image on the computer. However, pyautogui returns the following error, saying the file is missing, has improper permissions, or is in an unsupported or invalid format.
[ WARN:0#1.236] global D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp (239) cv::findDecoder imread_('Images/新增.png'): can't open/read file: check file path/integrity
[ WARN:0#1.296] global D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp (239) cv::findDecoder imread_('Images/新增.png'): can't open/read file: check file path/integrity
Traceback (most recent call last):
File "E:\Python代码\HealthCheckRunner\HealthCheck.py", line 82, in <module>
pos = pyautogui.locateOnScreen("Images/新增.png")
File "F:\HealthCheckRunner\lib\site-packages\pyautogui\__init__.py", line 175, in wrapper
return wrappedFunction(*args, **kwargs)
File "F:\HealthCheckRunner\lib\site-packages\pyautogui\__init__.py", line 213, in locateOnScreen
return pyscreeze.locateOnScreen(*args, **kwargs)
File "F:\HealthCheckRunner\lib\site-packages\pyscreeze\__init__.py", line 373, in locateOnScreen
retVal = locate(image, screenshotIm, **kwargs)
File "F:\HealthCheckRunner\lib\site-packages\pyscreeze\__init__.py", line 353, in locate
points = tuple(locateAll(needleImage, haystackImage, **kwargs))
File "F:\HealthCheckRunner\lib\site-packages\pyscreeze\__init__.py", line 207, in _locateAll_opencv
needleImage = _load_cv2(needleImage, grayscale)
File "F:\HealthCheckRunner\lib\site-packages\pyscreeze\__init__.py", line 170, in _load_cv2
raise IOError("Failed to read %s because file is missing, "
OSError: Failed to read Images/新增.png because file is missing, has improper permissions, or is an unsupported or invalid format
Process finished with exit code 1
Using an absolute path returns the same error:
pos = pyautogui.locateOnScreen("E:/Python代码/HealthCheckRunner/Images/新增.png")
--------------------------
OSError: Failed to read E:/Python代码/HealthCheckRunner/Images/新增.png because file is missing, has improper permissions, or is an unsupported or invalid format

A relative path is not resolved relative to the file where imread() is called; it is resolved relative to the folder where the py command was run (the current working directory).
#Directories
.
╚ test
  ╠ visual1.py
  ╠ visual2.py
  ╚ data
    ╠ 1.jpg
    ╚ 2.jpg
#./test/visual1.py
import cv2
img_rgb = cv2.imread('data/2.jpg')
template = cv2.imread('data/1.jpg')
#./test/visual2.py
import cv2
img_rgb = cv2.imread('test/data/2.jpg')
template = cv2.imread('test/data/1.jpg')
If you are in the main project folder and run:
py test/visual1.py
you will get an error, but if you run:
py test/visual2.py
everything will be fine.
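To make a script independent of the folder it is started from, you can resolve paths against the script's own location instead of the working directory. A small pathlib sketch; script_relative is a hypothetical helper name, and the cv2 line in the comment is only illustrative:

```python
from pathlib import Path


def script_relative(relative_path, script_file):
    """Resolve relative_path against the folder containing script_file
    instead of the current working directory (hypothetical helper)."""
    return Path(script_file).resolve().parent / relative_path


# Inside test/visual1.py this could be used as:
#   img_rgb = cv2.imread(str(script_relative("data/2.jpg", __file__)))
# which finds test/data/2.jpg no matter which folder `py` is started from.
```

cv2.imread expects a string, hence the str() around the returned Path.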

I can't comment yet, so please bear with me. Try changing the name of the file. I think some characters cannot be recognized when the library converts the file name. Try that, comment with the result, and I will edit my post.

As the error states, did you try changing the permissions? Or try putting ./ in front of the folder name,
e.g.:
pos = pyautogui.locateOnScreen("./Images/新增.png")

Please try installing the opencv-python library.
This is the command to install it:
pip install opencv-python
I also tried changing the name of the file; sometimes it confuses the system.
I tried it and now it is working for me.

You are using Chinese characters in the path! Even if you change the file encoding to UTF-8, it will still report an error. You'd better rename it to an English name instead of using Chinese.
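Note that the absolute path in the question (E:/Python代码/HealthCheckRunner/...) also contains Chinese characters, so renaming only the image may not be enough. One workaround sketch, different from renaming: copy the image to a temporary file with an ASCII-only name and hand that path to pyautogui or cv2. This assumes the non-ASCII characters really are the cause, as the answers above suggest; ascii_copy is a hypothetical helper, and the caller is responsible for deleting the copy.

```python
import os
import shutil
import tempfile
from pathlib import Path


def ascii_copy(path):
    """Copy a file to a temporary file whose name contains only ASCII characters.

    Hypothetical helper; returns the new path and the caller should delete it.
    Note: the temp folder itself (tempfile.gettempdir()) could still contain
    non-ASCII characters, e.g. a non-ASCII Windows user name.
    """
    src = Path(path)
    # Keep the extension so image loaders can still tell the format.
    suffix = src.suffix if src.suffix.isascii() else ""
    fd, tmp = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # mkstemp opens the file; we only need its name
    shutil.copyfile(src, tmp)
    return tmp


# Hypothetical usage with the path from the question:
# tmp = ascii_copy("E:/Python代码/HealthCheckRunner/Images/新增.png")
# try:
#     pos = pyautogui.locateOnScreen(tmp)
# finally:
#     os.remove(tmp)
```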

Got error from Text recognition using Tesseract in pycharm(Windows) [duplicate]

This question already has answers here:
Pytesseract: Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata
I tried to read text inside an image using pytesseract and OpenCV in the PyCharm editor (Windows). It displays the image, but when it reads the text it shows an error.
Here is my code.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
image = cv2.imread('test2.png')
image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
height,width,_ =image.shape
print(height,width)
print(pytesseract.image_to_string(image)) # ****
cv2.imshow('Output', image)
cv2.waitKey(0)
When the code runs the line
print(pytesseract.image_to_string(image))
it shows the following error:
Traceback (most recent call last):
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\test detection.py", line 10, in <module>
print(pytesseract.image_to_string(image))
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 409, in image_to_string
return {
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 412, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 287, in run_and_get_output
run_tesseract(**kwargs)
File "D:\Last Update\Uni of Mora\L4S1\FYP\new python project\HTMLCodeGenerator1\venv\lib\site-packages\pytesseract\pytesseract.py", line 263, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Program Files (x86)\\Tesseract-OCR/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I have tried various things, but nothing works. I changed the environment variable and downloaded the trained data again, but it still doesn't work. How can I solve this?

I think your main problem is that TESSDATA_PREFIX is not set in your environment variables.
See this answer:
Pytesseract: Error opening data file \\Program Files (x86)\\Tesseract-OCR\\en.traineddata
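Before changing environment variables, it can help to check which languages a folder actually contains. The path in the error message above is Tesseract-OCR/eng.traineddata, so Tesseract was looking in the install folder itself rather than in a tessdata subfolder. A stdlib sketch; installed_languages is a hypothetical helper, and the tesseract --list-langs command reports similar information from the binary's point of view:

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir.

    Hypothetical helper: it only looks at file names, it does not check
    that the files are valid models.
    """
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


# Hypothetical usage; adjust the folder to your installation:
# print(installed_languages(r"C:\Program Files (x86)\Tesseract-OCR\tessdata"))
```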

raise "pytesseract.pytesseract.TesseractError: (3221225477, '')"

I got the following error when I tried to recognize Chinese words in a picture with Python. (By the way, I already had the "chi_sim.traineddata" file in the tessdata directory and had successfully recognized English sentences in a picture, so this error really confused me.)
C:\Users\Lenovo\AppData\Local\Programs\Python\Python37-32\python.exe E:/PKU1.3/python_math/set_for_recognition.py
Traceback (most recent call last):
File "E:/PKU1.3/python_math/set_for_recognition.py", line 5, in <module>
text=pytesseract.image_to_string(Image.open('climb_high.jpeg'),lang='chi_sim')
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytesseract\pytesseract.py", line 295, in image_to_string
return run_and_get_output(*args)
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytesseract\pytesseract.py", line 203, in run_and_get_output
run_tesseract(**kwargs)
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytesseract\pytesseract.py", line 179, in run_tesseract
raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (3221225477, '')

I think this problem is caused by the traineddata.
I used to develop an OCR project with Tesseract on Windows 7.
Now I have changed to Windows 10, and I get this problem.
However, I found this issue is related to the traineddata:
if I use the traineddata that I trained on Windows 7, it works fine without any error message.

Since the error code 3221225477 is 0xC0000005 (ACCESS_VIOLATION), Tesseract itself has crashed (from here), so switching to a different version of Tesseract may help you.
The problem occurs in 4.00 (beta) and 3.02; 3.05 is fine (I use Windows 7).
Hope this helps.
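The conversion is easy to verify: 0xC0000005 = 0xC0000000 + 5 = 3221225472 + 5 = 3221225477. The first value of TesseractError is the exit status of the tesseract process (status_code in the traceback above), so a small sketch like this can decode it. The table covers only a few well-known Windows NTSTATUS values and is not exhaustive; describe_exit_code is a hypothetical helper.

```python
# A few well-known Windows NTSTATUS crash codes (not an exhaustive list).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_exit_code(code):
    """Render a process exit code as hex and name it if it is a known NTSTATUS."""
    # Mask to 32 bits so a signed form (e.g. -1073741819) maps to the same code.
    unsigned = code & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(describe_exit_code(3221225477))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```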

I got this error because my UZN file extended beyond the image area. I tracked it down by patching pytesseract.py to print the command it runs (print(' '.join(cmd_args)) in run_tesseract()), which showed that Tesseract was throwing an assertion error.
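If you use a UZN file, a quick sanity check is to compare each zone with the image size before running Tesseract. This sketch assumes the usual UZN layout of one zone per line: whitespace-separated left, top, width and height, followed by a label. zones_outside_image is a hypothetical helper.

```python
def zones_outside_image(uzn_text, image_width, image_height):
    """Return (line number, left, top, width, height) for zones that leave the image.

    Hypothetical helper. Assumes each zone line is: left top width height [label...]
    """
    bad = []
    for lineno, line in enumerate(uzn_text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        left, top, width, height = (int(f) for f in fields[:4])
        if left < 0 or top < 0 or left + width > image_width or top + height > image_height:
            bad.append((lineno, left, top, width, height))
    return bad


# With PIL, the image size comes from Image.open(path).size, which is (width, height).
```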

Please try the code below:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'
tessdata_dir_config = '--tessdata-dir "C:/Program Files/Tesseract-OCR/tessdata"'
img = Image.open(r'images\Capture2.JPG')
text = pytesseract.image_to_string(img, config=tessdata_dir_config)
print(text)
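The double quotes in tessdata_dir_config matter because "Program Files" contains a space. As far as I know, pytesseract splits the config string shell-style (with shlex) before passing it to tesseract; treat that as an assumption. A small sketch that builds the string with forward slashes; tessdata_config is a hypothetical helper:

```python
import shlex


def tessdata_config(tessdata_dir):
    """Build a --tessdata-dir config string (hypothetical helper).

    Backslashes are turned into forward slashes, and the path is wrapped in
    double quotes because it may contain spaces ("Program Files").
    """
    path = str(tessdata_dir).replace("\\", "/")
    return f'--tessdata-dir "{path}"'


config = tessdata_config(r"C:\Program Files\Tesseract-OCR\tessdata")
# Shell-style splitting keeps the quoted path together as a single argument:
print(shlex.split(config))  # ['--tessdata-dir', 'C:/Program Files/Tesseract-OCR/tessdata']
```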

FileNotFoundError on python

img = printscreen_pil
img = img.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2)
img = img.convert('1')
img.save('temp.jpg')
text = pytesseract.image_to_string(Image.open('temp.jpg'))
I want to read the image in order to convert it to text, but I get the error "The system cannot find the file specified". I think it has to do with Python's working directory. I'm sorry if this is a stupid question, but I hope you can help me.
This is the complete error message:
Traceback (most recent call last):
File "C:\Users\pncor\Documents\pyprograms\bot.py", line 23, in <module>
text = pytesseract.image_to_string(Image.open('temp.jpg'))
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 990, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

The tesseract package does not seem to be installed on your system, or it is not found on your PATH. pytesseract runs the tesseract binary as a subprocess in order to perform the OCR.
Use your OS's package manager to install it, or refer to the installation documentation. You are using Windows, so check this out.
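A minimal sketch for locating the binary before pytesseract tries to start it, assuming you know one or more likely install locations (the path in the comment is only a common default, not guaranteed). shutil.which searches the directories on PATH; find_tesseract is a hypothetical helper.

```python
import os
import shutil


def find_tesseract(candidates=()):
    """Return a tesseract executable path, or None if none is found (hypothetical helper).

    Checks PATH first, then any explicit install locations passed in.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Hypothetical usage; the install folder below is only a common default:
# import pytesseract
# cmd = find_tesseract([r"C:\Program Files\Tesseract-OCR\tesseract.exe"])
# if cmd is None:
#     raise SystemExit("Install Tesseract-OCR first; pytesseract only wraps the binary.")
# pytesseract.pytesseract.tesseract_cmd = cmd
```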
Also, I don't think it is necessary to write the enhanced image to a file first; just pass it directly to pytesseract.image_to_string:
text = pytesseract.image_to_string(img)

Error in opening image file in PIL

I am trying to execute the following code:
from pytesser import *
import Image
i="C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg"
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
I get the following error:
C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 322, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\debugger.py", line 655, in run
exec cmd in globals, locals
File "C:\Documents and Settings\Administrator\Desktop\attachments\ocr.py", line 1, in <module>
from pytesser import *
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1952, in open
fp = __builtin__.open(fp, "rb")
IOError: [Errno 2] No such file or directory: 'C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg'
Can someone please explain what I am doing wrong here?
I renamed the image file, moved the Python file and the images to a new folder, and moved that folder to the E: drive.
Now the code is as follows:
from pytesser import *
import Image
import os
i=os.path.join("E:\\","ocr","a.jpg")
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
Now the error is as follows:
E:\ocr\a.jpg
Traceback (most recent call last):
File "or.py", line 8, in <module>
text = image_to_string(im)
File "C:\Python27\lib\pytesser.py", line 31, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "C:\Python27\lib\pytesser.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "C:\Python27\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 893, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

You need to install Tesseract first; just installing pytesseract is not enough. Then edit the tesseract_cmd variable in pytesseract.py to point to the tesseract binary. For example, in my installation I set it to:
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

The exception is pretty clear: the file either doesn't exist, or you lack sufficient permissions to access it. If neither is the case, please provide evidence (e.g. relevant dir commands with output, run as the same user).

Could it be your image path?
i="C:\\Documents and Settings\\Administrator\\Desktop\\attachments\\R1PNDTCB.jpg"
Try this:
import os
os.path.join("C:\\", "Documents and Settings", "Administrator")
You should get a string similar to the one in the previous line.

Try this first:
os.path.expanduser('~/Desktop/attachments/R1PNDTCB.jpg')
It could be that the space in 'Documents and Settings' is causing this problem.
EDIT:
Use os.path.join so it uses the correct directory separator.
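Putting those two suggestions together, here is a sketch that builds the path with os.path.expanduser and os.path.join and checks that it exists before opening it. It assumes the Desktop folder sits directly under the home directory, as in the Documents and Settings\Administrator layout from the question (redirected Desktop folders would differ); desktop_file is a hypothetical helper.

```python
import os


def desktop_file(*parts):
    """Build a path under the current user's Desktop folder (hypothetical helper).

    Assumes Desktop lives directly under the home directory. os.path.join
    inserts the right separator, so no backslashes need escaping by hand.
    """
    return os.path.join(os.path.expanduser("~"), "Desktop", *parts)


path = desktop_file("attachments", "R1PNDTCB.jpg")
print(path, os.path.exists(path))  # check the file is really there before opening it
```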

Just add these two lines to your code:
import os
os.chdir(r'C:\Python27\Lib\site-packages\pytesser')
before
from pytesser import *
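Changing the working directory globally also changes how every other relative path in the script is resolved. A scoped alternative, assuming pytesser finds tesseract relative to the current directory (which the chdir answer implies): a small context manager that restores the previous directory afterwards. working_directory is a hypothetical name; on Python 3.11+ contextlib.chdir does the same job, and this sketch also works on the Python 2.7 used in the question.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original directory even if the body raised.
        os.chdir(previous)


# Hypothetical usage with the old pytesser wrapper from the question:
# with working_directory(r"C:\Python27\Lib\site-packages\pytesser"):
#     text = image_to_string(im)
```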

If you are using pytesseract, you have to make sure that Tesseract-OCR is installed on your system. After that, you have to set the path to the tesseract executable in your code, as below:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
You can download Tesseract-OCR from https://github.com/UB-Mannheim/tesseract/wiki
