Reading text from image with Pytesseract gives bad path error

I'm trying to read text from an image with pytesseract on a Mac.
I have installed pytesseract with pip.
import cv2
import pytesseract
img = cv2.imread('slika1.png')
text = pytesseract.image_to_string(img)
print(text)
It gives me this error:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
When I do this:
import importlib.util
print(importlib.util.find_spec('pytesseract'))
It prints:
ModuleSpec(name='pytesseract', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f8a7837c160>, origin='/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract/__init__.py', submodule_search_locations=['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract'])
What am I doing wrong, and what should I do?
Is there any other way to read text from an image?

Try opening the module source file (as admin) and editing the path to the Tesseract executable; use an absolute path if needed. It is a constant named tesseract_cmd near the top of the file.
On Windows the file is something like this:
"C:\Program Files\Python36\Lib\site-packages\pytesseract\pytesseract.py"
Set the path: ...
pytesseract.pytesseract.tesseract_cmd = r"D:\OCR\tesseract.exe"
https://github.com/Twenkid/ComputerVision_Pyimagesearch_OpenCV_Dlib_OCR-Tesseract-DL/blob/master/OCR_Tesseract/ocr.py
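Note that the error in the question is about the Tesseract program itself, not the Python wrapper: pip installs only pytesseract, and the tesseract binary has to be installed separately (on a Mac, for example, with Homebrew: brew install tesseract). Below is a minimal sketch of finding the binary from your own script instead of editing the module source. The helper names locate_tesseract and configure_pytesseract are made up for illustration, and the two candidate paths are only the usual Homebrew locations, so treat them as assumptions.

```python
import os
import shutil

# Assumed fallback locations: the usual Homebrew bin directories on
# Apple Silicon and Intel Macs. Adjust them for your own machine.
DEFAULT_CANDIDATES = (
    "/opt/homebrew/bin/tesseract",
    "/usr/local/bin/tesseract",
)


def locate_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a path to the tesseract binary, or None if none is found."""
    # First ask the OS to search PATH, as a shell would.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try the fallback locations one by one.
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the located binary (hypothetical helper)."""
    import pytesseract  # imported lazily so the lookup works without it

    path = locate_tesseract()
    if path is None:
        raise RuntimeError("tesseract binary not found; install it first")
    pytesseract.pytesseract.tesseract_cmd = path
    return path


print(locate_tesseract())
```

If this prints None, no amount of path editing will help until the Tesseract program itself is installed.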

Related

UnidentifiedImageError: cannot identify image file '59.jpg'

!sudo apt install tesseract-ocr
!pip install pytesseract
from google.colab import files
file_uploaded = files.upload()
import pytesseract
import cv2
import os
from PIL import Image
from google.colab.patches import cv2_imshow
from google.colab import drive
from google.colab import files
image=cv2.imread('prac.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
filename = "{}.jpg".format(os.getpid())
cv2.imwrite(filename, gray)
text = pytesseract.image_to_string(Image.open(filename), lang = None)
os.remove(filename)
print(text)
cv2_imshow(image)
The line starting with "text =" raises the error "UnidentifiedImageError: cannot identify image file '59.jpg'", but I don't even have an image named 59. What's wrong with this?

Your code works, but you need to restart the Colab runtime once pytesseract and its dependencies are installed, as the pip install output tells you (in red):
You must restart the runtime in order to use newly installed versions.
followed by a RESTART RUNTIME button.
After restarting the runtime and running that cell again you should get the expected result. You'd probably also want to reorganize your notebook a bit and split that code into (at least) 3 different cells:
installing dependencies
importing modules
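your image upload & OCR code
The image named 59 is created by your own code: filename = "{}.jpg".format(os.getpid()) names the temporary file after the current process ID (59 in your run), and cv2.imwrite(filename, gray) writes the grayscale image to it.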

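To see where a name like 59.jpg comes from in the Colab question above, here is a small sketch. The first part only reproduces the file naming from the question's code. The function ocr_grayscale is a hypothetical helper, not something from the question, that shows one way to skip the temporary file by passing the grayscale array straight to pytesseract.

```python
import os

# The question's code names its temporary file after the current process ID,
# so the number in the file name changes from one runtime to the next.
filename = "{}.jpg".format(os.getpid())
print(filename)


def ocr_grayscale(path):
    """OCR an image after converting it to grayscale, with no temporary file."""
    # Imported lazily so the naming demo above runs without these packages.
    import cv2
    import pytesseract

    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # image_to_string also accepts a NumPy array, not only a PIL image.
    return pytesseract.image_to_string(gray)
```
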
Pytesseract failed to load due to it being unable to find tesseract

While trying to install and use Tesseract on Windows 10 with Python using pytesseract, I get the error:
File "C:\ProgramData\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_tesseract
raise TesseractError(status_code, get_errors(error_string))
TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I tried reinstalling Tesseract.
I have added C:\Program Files (x86)\Tesseract-OCR to the PATH environment variable.
I have set TESSDATA_PREFIX to C:\Program Files (x86)\Tesseract-OCR\tessdata.
I have verified that typing 'tesseract' in CMD works.
The code I use:
import cv2
import pytesseract
# Provide the path to tesseract manually (comment this line out if tesseract is on your PATH)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
# Define config parameters.
# '-l eng' for using the English language
# '--oem 1' for using LSTM OCR Engine
config = ('-l eng --oem 1 --psm 3')
# Read image from disk
im = cv2.imread("Serie1/NL83LHL9.JPG", cv2.IMREAD_COLOR)
# Run tesseract OCR on image
text = pytesseract.image_to_string(im, config=config)
# Print recognized text
print(text)
Results:
Typing tesseract in CMD shows the tesseract interface.

solved by Dmitrii Z.
Indeed it looks a bit odd. One thing you can try is to add the tessdata path to your config:
config = r'--tessdata-dir "C:\Program Files (x86)\Tesseract-OCR\tessdata" -l eng --oem 1 --psm 3'
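A small sketch of building that config string and checking that the language file is really where you point Tesseract. The helper names build_config and has_language are made up for this example; only the --tessdata-dir, -l, --oem and --psm options come from the posts above.

```python
import os


def build_config(tessdata_dir, lang="eng", oem=1, psm=3):
    """Build a pytesseract config string that names the tessdata folder."""
    # The folder is quoted because it may contain spaces ("Program Files").
    return '--tessdata-dir "{}" -l {} --oem {} --psm {}'.format(
        tessdata_dir, lang, oem, psm
    )


def has_language(tessdata_dir, lang="eng"):
    """Return True if <lang>.traineddata exists in the given folder."""
    return os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))


tessdata = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
print(build_config(tessdata))
print(has_language(tessdata))
```

If has_language prints False for the folder you pass, Tesseract will fail with the same "Failed loading language" message no matter how the config is written.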

If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract'

Issues with running pytesseract in script

I'm trying to use pytesseract in my python script to read out a string of text in an image, but I keep getting errors. I'm now trying this code:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
# Include the above line if you don't have the tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
# Simple image to string
print(pytesseract.image_to_string(Image.open('IMG_9296.jpg')))
The IMG_9296.jpg file is located on my desktop: ~/Desktop. I already have tesseract and pytesseract installed, as typing tesseract into my command line comes back with information about it.
When I type pip install pytesseract into my command line, I get back:
Requirement already satisfied: pytesseract in /Library/Python/2.7/site-packages
Requirement already satisfied: Pillow in /Library/Python/2.7/site-packages (from pytesseract)
So I'm guessing that /Library/Python/2.7/site-packages is my path to pytesseract, so I tried putting that in for pytesseract.pytesseract.tesseract_cmd, but that didn't work. Either way I'm getting this error (I think from the import pytesseract line):
ValueError: Attempted relative import in non-package
Do I need to be entering in a different path, or move/copy pytesseract somewhere? I'm not really sure what's going on.

First check from your command prompt by entering:
pip show pytesseract
If it returns some info, the package is installed on your system; check the PIL (Pillow) module the same way.
Now to your main question:
pytesseract has to be able to find the Tesseract executable (the pytesseract.pytesseract.tesseract_cmd path), and it also has to be able to find the image. One thing you can do is change into the folder that contains tesseract.exe and give the image by its full path:
import pytesseract
import cv2
import os
os.chdir(r'full_path_of_the_folder_containing_tesseract.exe')
image = cv2.imread('full_path_of_your_image')
pytext = pytesseract.image_to_string(image)
print(pytext)
Let me know if it doesn't work.
Happy learning!
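Changing the working directory is not the only option. Here is a sketch that builds an absolute path to the image instead, assuming the file sits in the Desktop folder as the question says. The helper name desktop_path is made up for this example.

```python
import os


def desktop_path(filename):
    """Return an absolute path to a file in the current user's Desktop folder."""
    home = os.path.expanduser("~")
    return os.path.abspath(os.path.join(home, "Desktop", filename))


image_path = desktop_path("IMG_9296.jpg")
# Check that the file is really there before handing the path to
# Image.open() or cv2.imread().
print(image_path, os.path.isfile(image_path))
```

With an absolute image path, the script can be started from any folder.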

FileNotFoundError: [WinError 2] The system cannot find the file specified while trying to use pytesseract

I am trying to write a Python script to extract the text from an image and I keep getting this error. The script is given below.
from PIL import Image
from pytesseract import image_to_string
print (image_to_string(Image.open('samp.png')))
print (image_to_string(Image.open('test-english.jpg'), lang='eng'))

Try the following steps; this worked for me.
1) Download tesseract-OCR from here and install it in the location C:/Program Files
2) Write the following code
from PIL import Image
import pytesseract
# pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# i.e.
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
3) Now run this
print(pytesseract.image_to_string(Image.open('D:/image_file.jpg')))
Hope that helps!
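pytesseract runs the tesseract executable as a separate process, and [WinError 2] is what Windows reports when that executable cannot be found. A quick way to check this without any image at all is sketched below. The function name tesseract_is_runnable is made up for this example; it only relies on the standard library and on the tesseract --version command.

```python
import subprocess


def tesseract_is_runnable(cmd="tesseract"):
    """Return True if `cmd --version` can be started and exits with code 0."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # Covers FileNotFoundError (the WinError 2 case) and permission errors.
        return False
    return result.returncode == 0


# Try the bare command name first, then a full path if that fails.
print(tesseract_is_runnable())
print(tesseract_is_runnable(r"C:/Program Files/Tesseract-OCR/tesseract.exe"))
```

Whichever value makes the function return True is the one to assign to pytesseract.pytesseract.tesseract_cmd.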

Python error when importing image_to_string from tesseract

I recently used tesseract OCR with python and I kept getting an error when I was trying to import image_to_string from tesseract.
Code causing the problem:
# Perform OCR using tesseract-ocr library
from tesseract import image_to_string
image = Image.open('input-NEAREST.tif')
print image_to_string(image)
Error caused by above code:
Traceback (most recent call last):
File "./captcha.py", line 52, in <module>
from tesseract import image_to_string
ImportError: cannot import name image_to_string
I've verified that the tesseract module is installed:
digital_alchemy@roaming-gnome /home $ pydoc modules | grep 'tesseract'
Hdf5StubImagePlugin _tesseract gzip sipconfig
ORBit cairo mako tesseract
I believe that I've grabbed all the required packages but unfortunately I'm just stuck at this point. It appears that the function is not in the module.
Any help greatly appreciated.

Another possibility that seems to have worked for me is to modify pytesseract so that instead of import Image it has from PIL import Image
Code that works in PyCharm after modifying pytesseract:
from pytesseract import image_to_string
from PIL import Image
im = Image.open(r'C:\Users\<user>\Downloads\dashboard-test.jpeg')
print(im)
print(image_to_string(im))
I installed pytesseract via the package manager built into PyCharm.

Is your syntax correct for the module you have installed? That image_to_string function looks like it is from PyTesser, per the usage example on this page:
https://code.google.com/p/pytesser/
Your import looks like it is for python-tesseract which has a more complicated usage example listed:
https://code.google.com/p/python-tesseract/
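Since several differently named wrappers exist, it can help to check which module names are importable before deciding which usage example to follow. This is only a sketch: the helper name installed_modules is made up, and the list of names is taken from the posts on this page.

```python
import importlib.util


def installed_modules(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Module names mentioned in this thread; PIL is the import name of Pillow.
print(installed_modules(["pytesseract", "tesseract", "pytesser", "PIL"]))
```

A module that shows up as True still has to provide the function you import from it, which is exactly what went wrong in the question.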

On Windows, I followed the steps below:
pip3 install pytesseract
pip3 install pillow
Installation of tesseract-ocr is also required:
https://github.com/tesseract-ocr/tesseract/wiki
Otherwise you will get an error saying that Tesseract is not on your PATH.
Python code:
from PIL import Image
from pytesseract import image_to_string
print ( image_to_string(Image.open('test.tif'),lang='eng') )

What works for me:
After installing Tesseract from tesseract-ocr-setup-3.05.02-20180621.exe,
I add the line
pytesseract.pytesseract.tesseract_cmd="C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
and use the code from the answer above. This is all the code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd="C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
im=Image.open("C:\\Users\\<user>\\Desktop\\ro\\capt.png")
print(pytesseract.image_to_string(im,lang='eng'))
I am using Windows 10 with PyCharm Community Edition 2018.2.3 x64.
