NameError: name 'img_new' is not defined, how to fix? - python

Im trying to get pytesseract to work at identifying an image as single characters and not words.
Using code: This works, but only for detecting words not single characters in the image.
#importing modules
import pytesseract
from PIL import Image
# If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
#converting image to text
print(pytesseract.image_to_string(Image.open('C:\Program Files\Tesseract-OCR\image2.png')))
Attempting to view single characters Code:
#importing modules
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
#converting image to text
text = pytesseract.image_to_string(img_new, lang='eng', config='--psm 10')
print(pytesseract.image_to_string(Image.open('C:\Program Files\Tesseract-OCR\image2.png')))
I get error
text = pytesseract.image_to_string(img_new, lang='eng', config='--psm 10')
NameError: name 'img_new' is not defined

Related

I am trying to install "pytesseract" but it doesn't work

I am trying to install "pytesseract" but the code after that doesn't work
from PIL import Image
from pytesseract import pytesseract
#Define path to tessaract.exe
path_to_tesseract = r'C:\Python310\Scripts\pytesseract.exe'
#Define path to image
path_to_image = 'extract.png'
#Point tessaract_cmd to tessaract.exe
pytesseract.tesseract_cmd = path_to_tesseract
#Open image with PIL
img = Image.open(path_to_image)
#Extract text from image
text = pytesseract.image_to_string(img)
print(text)
when run the code it shows this error
(ModuleNotFoundError: No module named 'pytesseract')

Py: How to extract text with hashtag from image

I am trying to extract "#opentowork" text from LinkedIn account thumbnails but it does not seem to work.
the image is:
I have tried the simplest way:
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'
print(pytesseract.image_to_string(r'1633367686467.jpg'))
And using opencv
img = cv2.imread('1516535608688.jpg', 0)
# img = 255 - img #invert image
print(pytesseract.image_to_string(img, lang='eng', config = '--psm 12'))
But it returns nothing. I am not really experienced in text recognition. Is it possible to extract the text from such images?

blank output after solving the captcha using pytesseract?

I am trying to solve a captcha :
and run a script :
from PIL import Image
from pytesseract import pytesseract
path_to_tesseract = r"/usr/local/Cellar/tesseract/5.0.1/bin/tesseract"
image_path2 = r"captcha2.jpg"
img = Image.open(image_path2)
pytesseract.tesseract_cmd = path_to_tesseract
text = pytesseract.image_to_string(img)
print(text[:-1])
captchaText=text[:-1]
but output is blank and when I use the same script with the following captcha:
it works great.

"Unsupported image object", using Python

I am building a character identifier from an image using Tesseract and Python.
This is my code:
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img = cv2.imread("bigsleep.jpg")
text = pytesseract.image_to_string(img)
cv2.imshow("Img", img)
cv2.waitKey(0)
print(text)
I am getting the following error while executing this program
TypeError: Unsupported image object
Can anyone solve this issue

Tesseract OCR fails on TIFF files

I have a multiple page .tif file, I am trying to extract text from it using Tesseract OCR but I am getting this error
TypeError: Unsupported image object
Code
from PIL import Image
import pytesseract
img = Image.open('Group 1/1_CHE_MDC_1.tif')
text = pytesseract.image_to_string(img.seek(0)) # OCR on 1st Page
text = ' '.join(text.split())
print(text)
ERROR
Any idea why its happening
Image.seek does not have a return value so you're essentially running:
pytesseract.image_to_string(None)
Instead do:
img.seek(0)
text = pytesseract.image_to_string(img)
I had a same question and i have tried below code and it worked for me :-
import glob
import pytesseract
import os
os.chdir("Set your Tesseract-OCR .exe file path")
b = ''
for i in glob.glob('Fullpath of your image directory/*.tif'): <-- you can give *.jpg extension in case of jpg image
if glob.glob('*.tif'):
b = b + (pytesseract.image_to_string(i))
print(b)
Happy learning !

Categories