Pytesseract failed to load due to it being unable to find tesseract - python

While trying to install and use Tesseract on Windows 10 with Python (via pytesseract), I get this error:
File "C:\ProgramData\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_tesseract
raise TesseractError(status_code, get_errors(error_string))
TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I tried reinstalling Tesseract.
I have added C:\Program Files (x86)\Tesseract-OCR to the PATH environment variable.
I have set TESSDATA_PREFIX to C:\Program Files (x86)\Tesseract-OCR\tessdata.
I have verified that typing 'tesseract' in CMD works.
The code I use:
import cv2
import pytesseract
# Provide the path to the tesseract executable manually
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
# Define config parameters.
# '-l eng' for using the English language
# '--oem 1' for using LSTM OCR Engine
config = ('-l eng --oem 1 --psm 3')
# Read image from disk
im = cv2.imread("Serie1/NL83LHL9.JPG", cv2.IMREAD_COLOR)
# Run tesseract OCR on image
text = pytesseract.image_to_string(im, config=config)
# Print recognized text
print(text)
Results:
CMD > tesseract : shows the tesseract interface

solved by Dmitrii Z.
Indeed it looks a bit odd. One thing you can try is to add the tessdata path to your config:
config = r'--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata" -l eng --oem 1 --psm 3'

If you don't have the tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract'
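Both suggestions in this answer (an explicit executable path and an explicit tessdata directory) can be derived from one install directory. The following is only a minimal sketch, assuming the install location from the question. `tesseract_settings` is a hypothetical helper name, not part of pytesseract, and the pytesseract calls are left as comments so the snippet runs without it.

```python
from pathlib import PureWindowsPath

# Assumption: install directory taken from the question; adjust for your machine.
INSTALL_DIR = r"C:\Program Files (x86)\Tesseract-OCR"

def tesseract_settings(install_dir, lang="eng"):
    """Hypothetical helper: derive the executable path and config string."""
    root = PureWindowsPath(install_dir)
    exe = str(root / "tesseract.exe")
    tessdata = str(root / "tessdata")
    # Quote the directory because the path contains spaces.
    config = f'--tessdata-dir "{tessdata}" -l {lang} --oem 1 --psm 3'
    return exe, config

exe, config = tesseract_settings(INSTALL_DIR)
print(exe)
print(config)

# With pytesseract and OpenCV installed, these values would be used as:
#   pytesseract.pytesseract.tesseract_cmd = exe
#   text = pytesseract.image_to_string(im, config=config)
```

Keeping the directory in a single constant means the executable path and the tessdata path cannot drift apart when Tesseract is reinstalled somewhere else.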

Related

Converting Image Converter python program to Exe not executing properly

I am trying to convert my ImageConverter.py program to an executable. I use
python3.6.7
pyinstaller --onefile ImageConverter.py
The executable is created along with the __pycache__, build, and dist folders.
The executable file is in the dist folder. When I run it, a terminal pops up, runs the program, and says it completed. However, no JPEG files are created in the dist folder. I made sure my original .py file and my PNG files are in the dist folder with the exe. Is there something I'm missing? Are the converted JPEG files in another location?
The ImageConverter.py program uses the PIL package: it opens the PNG files, converts them to RGB, then saves them as JPEGs. The program works when run as usual in a terminal with python3, but not when run as the exe. Any help is appreciated, thanks!
ImageConverter.py:
from PIL import Image #Python Image Library - Image Processing
import glob
import os
import sys
application_path = os.path.dirname(sys.executable)
print(glob.glob("*.png"))
#Iterate through all the images
#Convert images to RGB
#Save Images
for files in glob.glob("*.png"):
    im = Image.open(files)
    rgb_im = im.convert("RGB")
    rgb_im.save(files.replace("png", "jpeg"), quality=95)
output_path = os.path.join(application_path, f'images')
I've tried running the executable multiple times. I put the PNG files and the .py file into the dist folder where the exe lives and ran it. I expected the PNG files to be converted to JPEG files, leaving both PNGs and JPEGs in the dist folder afterwards. However, the terminal printed:
PythonPractice/ImageProcessingPractice/dist/ImageConverter ; exit;
[]
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
[Process completed]
There are no converted JPEGs in the dist folder.
sys.executable is the location of the Python interpreter you are running, e.g. /usr/bin/python3 or C:\Program Files\Python\Scripts\python.exe, not the location of the script you are running.
Therefore, for a system-installed Python binary, os.path.dirname(sys.executable) will return something like /usr/bin or C:\Program Files\Python\Scripts, and os.path.join(application_path, f'images') will return something like /usr/bin/images or C:\Program Files\Python\Scripts\images.
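For a PyInstaller program built with --onefile, sys.executable is the path to the bundled executable itself. The temporary directory that gets cleaned up right after the program finishes is where the bundle is unpacked (exposed as sys._MEIPASS), so neither location is guaranteed to match the directory you ran the program from.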
It might make more sense to just use
output_path = 'images'
without any prefix, which should be relative to the current working directory (wherever you are when you run the script).
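The `[]` printed in the question's terminal output suggests that `glob.glob("*.png")` found nothing in the current working directory, which need not be the folder holding the exe. One possible way around that is to resolve the folder explicitly. This is a sketch under assumptions: `app_dir` and `plan_conversions` are hypothetical helper names, and the Pillow import is deferred so the planning part runs without it.

```python
import sys
from pathlib import Path

def app_dir():
    """Folder to scan: the exe's folder when frozen, else the working directory."""
    if getattr(sys, "frozen", False):
        # PyInstaller sets sys.frozen; sys.executable is then the bundled exe.
        return Path(sys.executable).resolve().parent
    return Path.cwd()

def plan_conversions(folder):
    """Pair each .png in `folder` with the .jpeg path it would be saved as."""
    folder = Path(folder)
    # with_suffix only touches the extension, whereas str.replace("png", "jpeg")
    # would also rewrite "png" anywhere else in the file name.
    return [(png, png.with_suffix(".jpeg")) for png in sorted(folder.glob("*.png"))]

def convert_all(folder):
    """Convert every planned pair; requires Pillow to be installed."""
    from PIL import Image  # imported lazily so the rest works without Pillow
    for src, dst in plan_conversions(folder):
        Image.open(src).convert("RGB").save(dst, quality=95)

# Typical use inside the script: convert_all(app_dir())
```

Printing `app_dir()` and the result of `plan_conversions(app_dir())` before converting is a quick way to confirm which folder the executable is actually looking at.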

UnidentifiedImageError: cannot identify image file '59.jpg'

!sudo apt install tesseract-ocr
!pip install pytesseract
from google.colab import files
file_uploaded = files.upload()
import pytesseract
import cv2
import os
from PIL import Image
from google.colab.patches import cv2_imshow
from google.colab import drive
from google.colab import files
image=cv2.imread('prac.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
filename = "{}.jpg".format(os.getpid())
cv2.imwrite(filename, gray)
text = pytesseract.image_to_string(Image.open(filename), lang = None)
os.remove(filename)
print(text)
cv2_imshow(image)
On the line starting with "text =", this code raises "UnidentifiedImageError: cannot identify image file '59.jpg'". But I don't even have an image named 59. What's wrong?
Your code works. But you are supposed to restart the Colab runtime once pytesseract and its dependencies are installed, as the pip install output (in red) told you:
You must restart the runtime in order to use newly installed versions.
followed by a RESTART RUNTIME button.
After restarting the runtime and running that cell again, you should get the expected result. You will probably also want to reorganize your notebook and split that code into (at least) three cells:
installing dependencies
importing modules
your image upload & OCR code
The "image named 59" is created by your own code: filename = "{}.jpg".format(os.getpid()) builds the name from the process ID (59 here), and cv2.imwrite(filename, gray) writes the grayscale copy of prac.jpg to it.
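To make the origin of the file name concrete, here is a small standard-library sketch. The first part reproduces the question's naming scheme; the second part is an assumed alternative (not from the answer) that lets `tempfile` pick a unique name. The OpenCV and PIL calls are left as comments so the snippet runs on its own.

```python
import os
import tempfile

# The question's naming scheme: the process ID becomes the file name,
# e.g. "59.jpg" when the notebook kernel's PID happens to be 59.
pid_name = "{}.jpg".format(os.getpid())
print(pid_name)

# Assumed alternative: let tempfile choose a unique name, then clean up.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp_path = tmp.name
try:
    # cv2.imwrite(tmp_path, gray) and Image.open(tmp_path) would go here.
    print("temporary file:", tmp_path)
finally:
    os.remove(tmp_path)
```

Other snippets on this page pass the OpenCV image array straight to `pytesseract.image_to_string`, which avoids the temporary file altogether.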

Reading text from image with Pytesseract gives bad path error

I'm trying to read text from an image with pytesseract. I'm using a Mac.
I have installed pytesseract with pip.
import cv2
import pytesseract
img = cv2.imread('slika1.png')
text = pytesseract.image_to_string(img)
print(text)
It gives me this error:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
When I do this:
import importlib.util
print(importlib.util.find_spec('pytesseract'))
It prints:
ModuleSpec(name='pytesseract', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f8a7837c160>, origin='/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract/__init__.py', submodule_search_locations=['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract'])
So what should I do, what am I doing wrong?
Is there any other way to read text from image?
Try opening the module source file (as admin) and editing the path to the Tesseract executable; set it to an absolute path if needed. It should be a constant near the top of the file.
Something like this (on Windows):
"C:\Program Files\Python36\Lib\site-packages\pytesseract\pytesseract.py"
Set the path: ...
pytesseract.tesseract_cmd = r"D:\OCR\tesseract.exe"
https://github.com/Twenkid/ComputerVision_Pyimagesearch_OpenCV_Dlib_OCR-Tesseract-DL/blob/master/OCR_Tesseract/ocr.py
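The error message says the tesseract binary itself is missing or not on PATH, and pip only installs the Python wrapper. Before editing any files, it may help to check where (or whether) the binary exists. This is a sketch, not a definitive fix: `find_tesseract` is a hypothetical helper, and the candidate paths are assumptions about common macOS install locations.

```python
import os
import shutil

# Assumption: common macOS install locations; adjust for your system.
CANDIDATES = ["/opt/homebrew/bin/tesseract", "/usr/local/bin/tesseract"]

def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract binary, or None if it cannot be found."""
    found = shutil.which("tesseract")  # searches the current PATH
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None

path = find_tesseract()
if path is None:
    print("tesseract binary not found; install it or add it to PATH")
else:
    print("tesseract found at", path)
    # With pytesseract installed:
    #   pytesseract.pytesseract.tesseract_cmd = path
```

Setting `tesseract_cmd` from your own script, as in the last comment, avoids modifying the installed package source.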

python executable that has dependency on dlib not working

Hello everyone. I have a Python script that depends on dlib (it does import dlib). I created an executable from it using PyInstaller, and it works fine on my machine, but on another machine it fails with ImportError: DLL load failed: A dynamic link library (DLL) initialization routine failed. After digging, the line where this occurs is the one importing dlib, which makes me think dlib is not being included properly in my executable. My dlib version is 19.18.0, and the other machine I am trying to run the exe on doesn't have Python installed. I need help. The error on the other machine:
F:\FaceRecogDemo\FaceRecogDemo\dist>recognizefaces.exe --debug --encodings ../encodings.pickle --image ../example1.jpg
Traceback (most recent call last):
File "D:\FaceRecogDemo\recognizefaces.py", line 2, in <module>
File "c:\programdata\anaconda3\envs\mywindowscv\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 627, in exec_module
File "D:\FaceRecogDemo\face_recognition\__init__.py", line 7, in <module>
File "c:\programdata\anaconda3\envs\mywindowscv\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 627, in exec_module
File "D:\FaceRecogDemo\face_recognition\api.py", line 4, in <module>
ImportError: DLL load failed: A dynamic link library (DLL) initialization routine failed.
[14720] Failed to execute script recognizefaces
My recognizefaces.py script
import face_recognition
import argparse
import pickle
import cv2
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--encodings", required=True,
    help="path to serialized db of facial encodings")
ap.add_argument("-i", "--image", required=True,
    help="path to input image")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
    help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(args["encodings"], "rb").read())
# load the input image and convert it from BGR to RGB
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# detect the (x, y)-coordinates of the bounding boxes corresponding
# to each face in the input image, then compute the facial embeddings
# for each face
print("[INFO] recognizing faces...")
boxes = face_recognition.face_locations(rgb,
    model=args["detection_method"])
encodings = face_recognition.face_encodings(rgb, boxes)
# initialize the list of names for each face detected
names = []
# loop over the facial embeddings
for encoding in encodings:
    # attempt to match each face in the input image to our known
    # encodings
    matches = face_recognition.compare_faces(data["encodings"],
        encoding)
    name = "Unknown"
    # check to see if we have found a match
    if True in matches:
        # find the indexes of all matched faces then initialize a
        # dictionary to count the total number of times each face
        # was matched
        matchedIdxs = [i for (i, b) in enumerate(matches) if b]
        counts = {}
        # loop over the matched indexes and maintain a count for
        # each recognized face
        for i in matchedIdxs:
            name = data["names"][i]
            counts[name] = counts.get(name, 0) + 1
        # determine the recognized face with the largest number of
        # votes (note: in the event of an unlikely tie Python will
        # select first entry in the dictionary)
        name = max(counts, key=counts.get)
    # update the list of names
    names.append(name)
print(names)
Both machines run Windows 10.
PyInstaller doesn't seem to be picking up dlib for some reason. When you build your binary on the command line, try explicitly adding dlib to the bundle with the following flag: pyinstaller --hidden-import dlib.
https://pyinstaller.readthedocs.io/en/stable/usage.html#what-to-bundle-where-to-search
I have resolved the issue. In my case it was caused by compiling dlib directly (I don't know why).
Here are the steps:
Create a new environment.
Install the other requirements of your project.
Make sure Visual Studio is installed on your development PC.
pip install cmake
pip install face_recognition
Now you can use dlib, since face_recognition installs dlib as well.
Now you can create the exe file. If the problem persists, copy the dlib folder from the site-packages folder into the folder where your exe file is placed.
In my case, it now works on two other versions of Windows where it was not working previously.
The problem might also be in the environment variables. Can you check %PATH% on the machine where you ran the file? Check whether multiple Python versions are installed and whether the right one is configured in PATH.
The problem might also be due to the Visual C++ redistributable. Check that both machines have the same version installed.
Try adding the OpenCV DLLs to the PATH variable and check again.
The problem is a missing python3.dll in the Anaconda distribution. You can download the Python binaries and extract the DLL from the zip archive. Put it in a folder on your PATH (e.g. C:\Users\MyName\Anaconda3) and the import should work.
Can't import cv2; "DLL load failed"
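Since the answers above point at several different causes, it can help to find out which import actually fails on the target machine. The following is a rough diagnostic sketch, not something from the answers: `try_import` is a hypothetical helper, and the module list is an assumption based on the question's script. It could be bundled with PyInstaller the same way and run on the second machine.

```python
import importlib
import platform
import sys

def try_import(name):
    """Import a module by name and report (ok, detail) instead of crashing."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # "DLL load failed" is raised as an ImportError on Windows.
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", "<built-in>")

print("Python", sys.version.split()[0], "on", platform.platform())
print("frozen:", getattr(sys, "frozen", False))

# Assumption: the third-party modules imported by the question's script.
for name in ("dlib", "cv2", "face_recognition"):
    ok, detail = try_import(name)
    print(f"{name:18} {'OK' if ok else 'FAILED'}  {detail}")
```

If dlib alone fails while cv2 loads, that points toward the dlib build or its runtime dependencies rather than toward PATH in general.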

FileNotFoundError: [WinError 2] The system cannot find the file specified while trying to use pytesseract

I am trying to write a Python script to extract text from an image, and I keep getting this error. The script is given below.
from PIL import Image
from pytesseract import image_to_string
print (image_to_string(Image.open('samp.png')))
print (image_to_string(Image.open('test-english.jpg'), lang='eng'))
Try the following steps; this worked for me.
1) Download Tesseract-OCR and install it under C:/Program Files
2) Write the following code, setting tesseract_cmd to the full path of your tesseract executable:
from PIL import Image
import pytesseract
# pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# i.e.
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
3) Now run this:
print(pytesseract.image_to_string(Image.open('D:/image_file.jpg')))
Hope that helps!
