Issues with running pytesseract in script - python

I'm trying to use pytesseract in my python script to read out a string of text in an image, but I keep getting errors. I'm now trying this code:
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
# Include the above line if you don't have the tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
# Simple image to string
print(pytesseract.image_to_string(Image.open('IMG_9296.jpg')))
The IMG_9296.jpg file is located on my desktop: ~/Desktop. I already have tesseract and pytesseract installed, as typing tesseract into my command line comes back with information about it.
When I type pip install pytesseract into my command line, I get back:
Requirement already satisfied: pytesseract in /Library/Python/2.7/site-packages
Requirement already satisfied: Pillow in /Library/Python/2.7/site-packages (from pytesseract)
So I'm guessing that /Library/Python/2.7/site-packages is my path to pytesseract, so I tried putting that in for pytesseract.pytesseract.tesseract_cmd, but that didn't work. Either way I'm getting this error (I think from the import pytesseract line):
ValueError: Attempted relative import in non-package
Do I need to be entering in a different path, or move/copy pytesseract somewhere? I'm not really sure what's going on.

First, check that pytesseract is installed by entering this in your command prompt (drop the leading ! outside a notebook):
!pip show pytesseract
If it returns some info, the package is on your system. Check the PIL (Pillow) module the same way.
Now, on to your main question:
pytesseract needs to be able to find the tesseract executable, and your script needs to be able to find the image. One way to do that is to change into the folder that contains the executable and pass the full path of the image, as below:
import pytesseract
import cv2
import os
# Placeholder: full path of the folder where tesseract.exe is located
os.chdir(r'full_path_of_folder_containing_tesseract.exe')
image = cv2.imread('full_path_of_your_image')
pytext = pytesseract.image_to_string(image)
print(pytext)
Do let me know in case it's not working.
Happy learning!
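As a rough illustration of the two paths involved (the tesseract program and the image), here is a minimal sketch using only the standard library. It assumes the image really is in the Desktop folder of the current user, as stated in the question; the pytesseract calls are left as comments because they need pytesseract and Pillow to be installed.

```python
import shutil
from pathlib import Path

# tesseract_cmd should name the tesseract program itself, not the folder where
# pip installed the Python package, so ask the OS where that program lives.
tesseract_path = shutil.which("tesseract")
print("tesseract executable:", tesseract_path)

# Build an absolute path to the image so the current directory does not matter.
image_path = Path.home() / "Desktop" / "IMG_9296.jpg"
print("image path:", image_path, "exists:", image_path.exists())

# With pytesseract and Pillow installed, the two values would be used like this:
# import pytesseract
# from PIL import Image
# if tesseract_path:
#     pytesseract.pytesseract.tesseract_cmd = tesseract_path
# print(pytesseract.image_to_string(Image.open(image_path)))
```

If shutil.which returns None, tesseract is not on the PATH seen by Python, and tesseract_cmd would have to be set to the full path of the executable by hand.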

Related

UnidentifiedImageError: cannot identify image file '59.jpg'

!sudo apt install tesseract-ocr
!pip install pytesseract
from google.colab import files
file_uploaded = files.upload()
import pytesseract
import cv2
import os
from PIL import Image
from google.colab.patches import cv2_imshow
from google.colab import drive
from google.colab import files
image=cv2.imread('prac.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
filename = "{}.jpg".format(os.getpid())
cv2.imwrite(filename, gray)
text = pytesseract.image_to_string(Image.open(filename), lang = None)
os.remove(filename)
print(text)
cv2_imshow(image)
In the line "text = ~~", this code causes the error "UnidentifiedImageError: cannot identify image file '59.jpg'", but I don't even have an image named 59. What's wrong with this?
Your code works. But you are supposed to restart Colab runtime once pytesseract and its dependencies are installed, as you were guided by pip install (in red):
You must restart the runtime in order to use newly installed versions.
followed by RESTART RUNTIME button.
After restarting runtime and running that cell again you should get expected result. Though you'd probably want to re-organize your Notebook a bit and split that code (at least) into 3 different cells:
installing dependencies
importing modules
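your image upload & OCR code
The image named "59.jpg" is created by filename = "{}.jpg".format(os.getpid()) followed by cv2.imwrite(filename, gray) in your code: os.getpid() returns the ID of the current process (59 in your case), and the grayscale image is written to a temporary file with that name.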

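For reference, the temporary file name in the question's code comes from the process ID, which is why a file called 59.jpg can appear. The sketch below reproduces that naming and shows one possible alternative using the standard tempfile module; make_temp_image_path is a hypothetical helper name, not something from the question.

```python
import os
import tempfile

# The question's code builds the name from the process ID,
# so a process ID of 59 gives "59.jpg".
pid_based_name = "{}.jpg".format(os.getpid())
print(pid_based_name)


def make_temp_image_path(suffix=".jpg"):
    """Create an empty temporary file and return its path (hypothetical helper)."""
    handle, path = tempfile.mkstemp(suffix=suffix)
    os.close(handle)  # only the path is needed; the image writer reopens the file
    return path


temp_path = make_temp_image_path()
print(temp_path)
os.remove(temp_path)  # clean up, as the question does with os.remove(filename)
```

A path from tempfile avoids clashing with files that already exist in the working directory, which a name based on the process ID cannot guarantee.
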
Reading text from image with Pytesseract gives bad path error

I'm trying to read text from an image with pytesseract. I'm using a Mac.
I have installed pytesseract with pip.
import cv2
import pytesseract
img = cv2.imread('slika1.png')
text = pytesseract.image_to_string(img)
print(text)
It gives me this error:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
when I do this:
import importlib.util
print(importlib.util.find_spec('pytesseract'))
It prints:
ModuleSpec(name='pytesseract', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f8a7837c160>, origin='/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract/__init__.py', submodule_search_locations=['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract'])
So what should I do, what am I doing wrong?
Is there any other way to read text from image?
Try opening the module source file (as admin) and editing the path to the Tesseract executable, setting it to an absolute path if needed. It should be a constant near the top of the file.
Something like this (on Windows):
"C:\Program Files\Python36\Lib\site-packages\pytesseract\pytesseract.py"
Set the path (this can also be done from your own script): ...
pytesseract.pytesseract.tesseract_cmd = r"D:\OCR\tesseract.exe"
https://github.com/Twenkid/ComputerVision_Pyimagesearch_OpenCV_Dlib_OCR-Tesseract-DL/blob/master/OCR_Tesseract/ocr.py
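Before editing any files, it may help to confirm that the tesseract program itself is installed, since pip only installs the Python wrapper. A minimal sketch, assuming Python 3.7 or newer; tesseract_version is a hypothetical helper name and not part of pytesseract.

```python
import shutil
import subprocess


def tesseract_version(command="tesseract"):
    """Return the first line of `<command> --version`, or None if it cannot be found."""
    executable = shutil.which(command)
    if executable is None:
        return None
    completed = subprocess.run(
        [executable, "--version"], capture_output=True, text=True
    )
    # Some builds print the version to stderr instead of stdout, so check both.
    output = (completed.stdout or completed.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
if version is None:
    print("tesseract was not found on PATH")
else:
    print("found:", version)
```

If this reports that tesseract was not found, installing the Tesseract program itself (on a Mac, commonly through a package manager such as Homebrew) is probably the missing step.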

Convert PDF to image using python (pdf2image module)

I am using Python with pdf2image module to convert a PDF to image.
My Code :
import numpy as np
from pdf2image import convert_from_path
pages = convert_from_path('file.pdf')
print(pages)
I don't know why I am getting these errors.
The error is due to the path! You should pass the absolute path, starting from C:\ (on Windows).
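One way to rule out a path problem is to resolve the path and check that the file exists before handing it to pdf2image. This is a minimal sketch using only the standard library; absolute_pdf_path is a hypothetical helper name, and the pdf2image call is left as a comment because it also needs pdf2image and its Poppler dependency to be installed.

```python
from pathlib import Path


def absolute_pdf_path(path_text):
    """Expand '~' and resolve a possibly relative path; raise if the file is missing."""
    path = Path(path_text).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError("PDF not found: {}".format(path))
    return path


try:
    pdf_path = absolute_pdf_path("file.pdf")
    print("Would convert:", pdf_path)
    # With pdf2image (and Poppler) installed:
    # from pdf2image import convert_from_path
    # pages = convert_from_path(str(pdf_path))
except FileNotFoundError as error:
    print(error)
```

If the file is found and the conversion still fails, the cause is likely something other than the path.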

FileNotFoundError: [WinError 2] The system cannot find the file specified while trying to use pytesseract

I am trying to write a Python script to extract the text from an image, and I keep getting this error. The script is given below.
from PIL import Image
from pytesseract import image_to_string
print (image_to_string(Image.open('samp.png')))
print (image_to_string(Image.open('test-english.jpg'), lang='eng'))
Try the following steps; this worked for me.
1) Download Tesseract-OCR from here and install it in the location C:/Program Files
2) Write the following code, setting tesseract_cmd to the full path of your tesseract executable:
import pytesseract
from PIL import Image
# pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>', i.e.
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
3) Now run this:
print(pytesseract.image_to_string(Image.open('D:/image_file.jpg')))
Hope that helps!
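If you are not sure which folder the installer used, a small sketch like the one below can pick the first location that actually exists. The two candidate paths are simply the ones mentioned in answers on this page, and first_existing is a hypothetical helper name; adjust both to your own setup.

```python
from pathlib import Path

# Both locations appear in answers on this page; they are assumptions about
# where the installer may have put tesseract.exe, not guaranteed defaults.
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def first_existing(paths):
    """Return the first path that points to an existing file, or None."""
    for candidate in paths:
        if Path(candidate).is_file():
            return candidate
    return None


tesseract_exe = first_existing(CANDIDATE_PATHS)
if tesseract_exe is None:
    print("No Tesseract executable found in the candidate locations")
else:
    print("Using", tesseract_exe)
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = tesseract_exe
```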

Python error when importing image_to_string from tesseract

I recently used tesseract OCR with python and I kept getting an error when I was trying to import image_to_string from tesseract.
Code causing the problem:
# Perform OCR using tesseract-ocr library
from tesseract import image_to_string
image = Image.open('input-NEAREST.tif')
print image_to_string(image)
Error caused by above code:
Traceback (most recent call last):
  File "./captcha.py", line 52, in <module>
    from tesseract import image_to_string
ImportError: cannot import name image_to_string
I've verified that the tesseract module is installed:
digital_alchemy@roaming-gnome /home $ pydoc modules | grep 'tesseract'
Hdf5StubImagePlugin _tesseract gzip sipconfig
ORBit cairo mako tesseract
I believe that I've grabbed all the required packages but unfortunately I'm just stuck at this point. It appears that the function is not in the module.
Any help greatly appreciated.
Another possibility that seems to have worked for me is to modify pytesseract so that instead of import Image it has from PIL import Image
Code that works in PyCharm after modifying pytesseract:
from pytesseract import image_to_string
from PIL import Image
im = Image.open(r'C:\Users\<user>\Downloads\dashboard-test.jpeg')
print(im)
print(image_to_string(im))
I installed pytesseract via the package manager built into PyCharm.
Is your syntax correct for the module you have installed? That image_to_string function looks like it is from PyTesser, per the usage example on this page:
https://code.google.com/p/pytesser/
Your import looks like it is for python-tesseract which has a more complicated usage example listed:
https://code.google.com/p/python-tesseract/
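To see which module Python actually picks up and whether it offers image_to_string, something like the following sketch may help. It only uses importlib from the standard library; describe_module is a hypothetical helper name, and the two module names checked are the ones discussed in this thread.

```python
import importlib
import importlib.util


def describe_module(name, attribute):
    """Report where a top-level module lives and whether it exposes an attribute."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "has_attribute": False}
    try:
        module = importlib.import_module(name)
    except ImportError:
        # The module exists on disk but could not be imported cleanly.
        return {"found": True, "origin": spec.origin, "has_attribute": False}
    return {
        "found": True,
        "origin": spec.origin,
        "has_attribute": hasattr(module, attribute),
    }


for candidate in ("tesseract", "pytesseract"):
    print(candidate, describe_module(candidate, "image_to_string"))
```

A module that is found but reports has_attribute as False would explain an error like "cannot import name image_to_string".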
On Windows, I followed the steps below:
pip3 install pytesseract
pip3 install pillow
Installing Tesseract-OCR itself is also required:
https://github.com/tesseract-ocr/tesseract/wiki
Otherwise you will get an error saying that tesseract is not in your PATH.
Python code:
from PIL import Image
from pytesseract import image_to_string
print ( image_to_string(Image.open('test.tif'),lang='eng') )
What works for me:
After installing Tesseract from tesseract-ocr-setup-3.05.02-20180621.exe, I add the line
pytesseract.pytesseract.tesseract_cmd="C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
and use the code from the answer above. This is all the code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd="C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"
im=Image.open("C:\\Users\\<user>\\Desktop\\ro\\capt.png")
print(pytesseract.image_to_string(im,lang='eng'))
I am using Windows 10 with PyCharm Community Edition 2018.2.3 x64.
