Why can't Google Colab identify an image file when using Tesseract? - python

!apt install tesseract-ocr
!pip install pytesseract
import pytesseract
from PIL import Image
# Open image file
image = Image.open("/content/book.PNG")
# Recognize text
text = pytesseract.image_to_string(image)
print(text)
I wrote this code in Google Colab and I am getting the error "cannot identify image file '/content/book.PNG'".
I expected the code to run and print the text.
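One possible cause, and this is only an assumption about the situation above, is that the file at that path is not a complete, valid image (for example an interrupted or partial upload). A minimal sketch of a quick check using only the standard library: it compares the first bytes of the file with the PNG signature. The helper name looks_like_png and the demo files are made up for illustration.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every valid PNG file

def looks_like_png(path):
    """Return True if the file exists, is non-empty, and starts with the PNG signature."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE

# Self-contained demo with throwaway files (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.PNG")
    bad = os.path.join(tmp, "bad.PNG")
    with open(good, "wb") as f:
        f.write(PNG_SIGNATURE + b"rest of file")
    with open(bad, "wb") as f:
        f.write(b"<html>not an image</html>")
    good_result = looks_like_png(good)
    bad_result = looks_like_png(bad)

print(good_result, bad_result)  # True False
```

In Colab the same helper could be called as looks_like_png("/content/book.PNG"). A file that passes this check can still be truncated further in, so a False result is more informative than a True one.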

Related

OCR in python on Google Colab, identify information in jpg

I'm trying to use an OCR function to identify the information in a .jpg (following this link) and the system shows:
SyntaxError: invalid character in identifier
!sudo apt install tesseract-ocr
!pip install pytesseract
import pytesseract
import shutil
import os
import random
try:
    from PIL import Image
except ImportError:
    import Image
from google.colab import files
uploaded = files.upload()
image_path_in_colab=‘Test.jpeg’
extractedInformation = pytesseract.image_to_string(Image.open('image_path_in_colab'), lang='eng')
print(extractedInformation)
I expect to extract the information in the .jpg.
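The SyntaxError most likely comes from the typographic quotes (‘ and ’) around Test.jpeg, which Python does not accept as string delimiters. A small sketch that checks this with the built-in compile(); the helper name compiles is made up for illustration.

```python
# The assignment as it appears in the question, with typographic quotes (U+2018 / U+2019)
curly_line = "image_path_in_colab=\u2018Test.jpeg\u2019"
# The same assignment with plain ASCII quotes
straight_line = "image_path_in_colab='Test.jpeg'"

def compiles(source):
    """Return True if the source string is valid Python syntax."""
    try:
        compile(source, "<cell>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(curly_line))     # False
print(compiles(straight_line))  # True
```

Note also that Image.open('image_path_in_colab') passes the literal string 'image_path_in_colab' rather than the variable, so once the quotes are fixed the call should probably be Image.open(image_path_in_colab).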

Problem with python script that takes image to text

When I run this script and then try to paste the text, it doesn't work.
import pytesseract
import pyperclip
import PIL.Image
# Open the image file
image = PIL.Image.open('generate_word_picture.png')
# Extract the text from the image using pytesseract
text = pytesseract.image_to_string(image)
# Copy the text to the clipboard
pyperclip.copy(text)
This is the image: [image]
I have these installed:
- pip install pytesseract
- pip install pyperclip
- https://github.com/UB-Mannheim/tesseract/wiki

UnidentifiedImageError: cannot identify image file '59.jpg'

!sudo apt install tesseract-ocr
!pip install pytesseract
from google.colab import files
file_uploaded = files.upload()
import pytesseract
import cv2
import os
from PIL import Image
from google.colab.patches import cv2_imshow
from google.colab import drive
from google.colab import files
image=cv2.imread('prac.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
filename = "{}.jpg".format(os.getpid())
cv2.imwrite(filename, gray)
text = pytesseract.image_to_string(Image.open(filename), lang = None)
os.remove(filename)
print(text)
cv2_imshow(image)
In the line starting with "text =", this code raises the error "UnidentifiedImageError: cannot identify image file '59.jpg'". But I don't even have an image named 59. What's wrong with this?
Your code works. But you are supposed to restart the Colab runtime once pytesseract and its dependencies are installed, as the pip install output tells you (in red):
You must restart the runtime in order to use newly installed versions.
followed by a RESTART RUNTIME button.
After restarting the runtime and running that cell again, you should get the expected result. You'd probably also want to reorganize your notebook a bit and split that code into (at least) 3 different cells:
installing dependencies
importing modules
your image upload & OCR code
The "image named 59" is created by your own code: filename = "{}.jpg".format(os.getpid()) builds the name from the process ID (59 here), and cv2.imwrite(filename, gray) writes the grayscale image to that file.

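To illustrate where a name like 59.jpg comes from, here is a minimal sketch of the naming scheme used in the question. It only builds the name and does not touch OpenCV or pytesseract; the actual number depends on the running process.

```python
import os

# Same naming scheme as in the question: the temporary file is named after
# the ID of the running Python process, so the name can change between runtimes.
pid = os.getpid()
filename = "{}.jpg".format(pid)

# For a process whose ID is 59, this prints 59.jpg
print(filename)
```

So the number in the error message is just the process ID of the notebook's Python process at the time the cell ran.
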
Reading text from image with Pytesseract gives bad path error

I'm trying to read text from an image with pytesseract. I'm using a Mac.
I have installed pytesseract with pip.
import cv2
import pytesseract
img = cv2.imread('slika1.png')
text = pytesseract.image_to_string(img)
print(text)
It gives me this error:
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
When I do this:
import importlib.util
print(importlib.util.find_spec('pytesseract'))
It prints:
ModuleSpec(name='pytesseract', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7f8a7837c160>, origin='/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract/__init__.py', submodule_search_locations=['/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pytesseract'])
So what should I do, what am I doing wrong?
Is there any other way to read text from image?
Try opening the module source file (as admin) and editing the path to the Tesseract executable; set it to an absolute path if needed. It should be a constant near the top of the file.
Something like this (on Windows):
"C:\Program Files\Python36\Lib\site-packages\pytesseract\pytesseract.py"
Set the path: ...
pytesseract.tesseract_cmd = r"D:\OCR\tesseract.exe"
https://github.com/Twenkid/ComputerVision_Pyimagesearch_OpenCV_Dlib_OCR-Tesseract-DL/blob/master/OCR_Tesseract/ocr.py
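Note that pip installs only the Python wrapper; the tesseract program itself is a separate install (on a Mac, Homebrew's brew install tesseract is one common route, assuming Homebrew is available). Rather than editing the module source, a script can also look the binary up on PATH. A rough sketch using only the standard library; the helper name find_tesseract is made up for illustration.

```python
import shutil

def find_tesseract(names=("tesseract",)):
    """Return the absolute path of the first executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

tesseract_path = find_tesseract()
if tesseract_path is None:
    print("tesseract not found on PATH; install it or set the path by hand")
else:
    print("tesseract found at", tesseract_path)
    # With pytesseract imported, the wrapper can be pointed at this binary:
    # pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

If the lookup returns None, the TesseractNotFoundError above is expected, because the wrapper has nothing to call.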

FileNotFoundError: [WinError 2] The system cannot find the file specified while trying to use pytesseract

I am trying to write a Python script to extract the text from an image and I keep getting this error. The script is given below.
from PIL import Image
from pytesseract import image_to_string
print (image_to_string(Image.open('samp.png')))
print (image_to_string(Image.open('test-english.jpg'), lang='eng'))
Try the following steps; this worked for me.
1) Download Tesseract-OCR from here and install it in the location C:/Program Files
2) Write the following code
import pytesseract
from PIL import Image
# pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# i.e.
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
3) Now run this
print(pytesseract.image_to_string(Image.open('D:/image_file.jpg')))
Hope that helps!
