Blank output after solving a captcha using pytesseract? - Python

I am trying to solve a captcha:
and I run this script:
from PIL import Image
from pytesseract import pytesseract
path_to_tesseract = r"/usr/local/Cellar/tesseract/5.0.1/bin/tesseract"
image_path2 = r"captcha2.jpg"
img = Image.open(image_path2)
pytesseract.tesseract_cmd = path_to_tesseract
text = pytesseract.image_to_string(img)
print(text[:-1])
captchaText=text[:-1]
but the output is blank. When I use the same script with the following captcha:
it works fine.
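The script trims the result with text[:-1], which removes exactly one character. Assuming Tesseract ends its output with a newline followed by a form feed (recent versions commonly do), that slice leaves the newline behind. It also makes it hard to tell an empty result from a result that contains only whitespace. The sketch below compares the slice with str.strip(); the sample strings are hypothetical and only illustrate the difference.

```python
def clean_ocr_text(raw):
    """Strip surrounding whitespace, including a trailing newline and form feed."""
    return raw.strip()


# Hypothetical raw outputs, not taken from a real captcha.
samples = ["AB12C\n\x0c", "\x0c", ""]

for raw in samples:
    # repr() makes invisible characters such as '\n' and '\x0c' visible.
    print(repr(raw), "->", repr(raw[:-1]), "vs", repr(clean_ocr_text(raw)))
```

Printing repr(text) in the original script shows whether Tesseract returned nothing at all or only whitespace.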

Related

I am trying to install "pytesseract" but it doesn't work

I am trying to install "pytesseract" but the code after that doesn't work
from PIL import Image
from pytesseract import pytesseract
# Define the path to tesseract.exe
path_to_tesseract = r'C:\Python310\Scripts\pytesseract.exe'
# Define the path to the image
path_to_image = 'extract.png'
# Point tesseract_cmd to tesseract.exe
pytesseract.tesseract_cmd = path_to_tesseract
# Open the image with PIL
img = Image.open(path_to_image)
#Extract text from image
text = pytesseract.image_to_string(img)
print(text)
When I run the code, it shows this error:
ModuleNotFoundError: No module named 'pytesseract'
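A ModuleNotFoundError usually means the package is not installed for the interpreter that runs the script, which can differ from the one pip installed into. The following is a minimal sketch for checking this; module_available is a hypothetical helper name, and the install hint assumes pip is available for that interpreter.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


print("Interpreter:", sys.executable)
if module_available("pytesseract"):
    print("pytesseract is importable")
else:
    # Install into this exact interpreter rather than whichever pip is first on PATH.
    print(f'Try: "{sys.executable}" -m pip install pytesseract')
```

Note that pytesseract is only a Python wrapper. The tesseract_cmd setting should point at the Tesseract OCR executable itself, which is installed separately, rather than at a pytesseract script.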

TesseractNotFoundError: C:\\Program Files\\Tesseract-OCR\\tesseract (2).exe is not installed or it's not in your PATH. See README file for more info

I wrote the following code to extract text from an image in Google Colab:
from PIL import Image
from pytesseract import pytesseract
path_to_tesseract = r"C:\\Program Files\\Tesseract-OCR\\tesseract (2).exe"
image_path = r"/content/images/result_Page_1.jpg"
img = Image.open(image_path)
pytesseract.tesseract_cmd = path_to_tesseract
text = pytesseract.image_to_string(img)
print(text[:-1])
I am still getting the Tesseract-not-found error. I am using Google Colab on a Windows 10 laptop.
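Colab notebooks execute on a hosted Linux machine, not on the laptop running the browser, so a Windows path such as C:\Program Files\... does not exist there. A hedged sketch of looking up the executable on whatever machine actually runs the code is shown below. find_tesseract is a hypothetical helper, and it assumes Tesseract has already been installed in the runtime (on Colab, typically through the system package manager).

```python
import os
import shutil


def find_tesseract(extra_candidates=()):
    """Return the path of a tesseract executable, or None if none is found."""
    # First choice: whatever "tesseract" resolves to on PATH.
    found = shutil.which("tesseract")
    if found:
        return found
    # Fallback: explicit candidate paths supplied by the caller.
    for path in extra_candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("No tesseract executable found on this machine; install it in the runtime first.")
else:
    print("Use this for tesseract_cmd:", tesseract_path)
```

If the executable is on PATH, pytesseract generally finds it without tesseract_cmd being set at all.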

Tesseract doesn't recognize certain pictures. Python

Tesseract works fine with other pictures, but it does not recognize any text in this one.
Can someone please explain why?
import cv2
import pytesseract
import time
import random
from pynput.keyboard import Controller
keyboard = Controller() # Create the controller
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = cv2.imread("capture5.png")
#img = cv2.resize(img, (300, 300))
cv2.imshow("capture5", img)
text = pytesseract.image_to_string(img)
print(text)
cv2.waitKey(0)
cv2.destroyAllWindows()
I fixed my problem; all I needed to do was add this code to my script:
text = pytesseract.image_to_string(
    img,
    config="-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ --psm 10",
)

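The fix above combines two Tesseract options: tessedit_char_whitelist restricts the characters that may be recognized, and --psm selects the page segmentation mode (10 treats the image as a single character, 7 as a single line of text, 8 as a single word). A small sketch of building that config string follows; build_tesseract_config is a hypothetical helper, not part of pytesseract.

```python
import string


def build_tesseract_config(whitelist=string.ascii_uppercase, psm=10):
    """Build a config string restricting recognized characters and setting the PSM."""
    return f"-c tessedit_char_whitelist={whitelist} --psm {psm}"


config = build_tesseract_config()
print(config)
```

The result can be passed as the config argument of pytesseract.image_to_string. For an image that holds several characters on one line, psm=7 may be a better fit than 10.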
NameError: name 'img_new' is not defined, how to fix?

I'm trying to get pytesseract to read an image as single characters rather than as words.
The following code works, but it only detects words, not single characters, in the image:
#importing modules
import pytesseract
from PIL import Image
# If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
#converting image to text
print(pytesseract.image_to_string(Image.open(r'C:\Program Files\Tesseract-OCR\image2.png')))
My attempt at reading single characters:
#importing modules
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
#converting image to text
text = pytesseract.image_to_string(img_new, lang='eng', config='--psm 10')
print(pytesseract.image_to_string(Image.open(r'C:\Program Files\Tesseract-OCR\image2.png')))
I get this error:
text = pytesseract.image_to_string(img_new, lang='eng', config='--psm 10')
NameError: name 'img_new' is not defined
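The NameError arises because img_new is used before anything is assigned to it: the second snippet never opens the image under that name. A minimal sketch of one way to fix this is shown below. ocr_single_char is a hypothetical helper, and the OCR libraries are imported inside it so that the file check runs first.

```python
import os


def ocr_single_char(image_path):
    """Open the image, then OCR it as a single character (--psm 10)."""
    if not os.path.isfile(image_path):
        raise FileNotFoundError(image_path)
    # Imported here so the path check above runs even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    # Define img_new before it is used; this assignment is what the failing script lacks.
    img_new = Image.open(image_path)
    text = pytesseract.image_to_string(img_new, lang="eng", config="--psm 10")
    return text.strip()
```

With the path from the question, the call would be ocr_single_char(r'C:\Program Files\Tesseract-OCR\image2.png').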

Tesseract OCR fails on TIFF files

I have a multi-page .tif file and am trying to extract text from it using Tesseract OCR, but I get this error:
TypeError: Unsupported image object
Code
from PIL import Image
import pytesseract
img = Image.open('Group 1/1_CHE_MDC_1.tif')
text = pytesseract.image_to_string(img.seek(0)) # OCR on 1st Page
text = ' '.join(text.split())
print(text)
Any idea why this is happening?
Image.seek does not have a return value so you're essentially running:
pytesseract.image_to_string(None)
Instead do:
img.seek(0)
text = pytesseract.image_to_string(img)
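Since seek moves the image object to a frame in place, the same pattern extends to every page of a multi-page file. The sketch below assumes the image exposes n_frames, as Pillow's multi-frame TIFF images do. ocr_all_pages and its ocr parameter are hypothetical names; the parameter exists only so another OCR callable can be substituted.

```python
def ocr_all_pages(img, ocr=None):
    """Return a list with the whitespace-normalized OCR text of each page."""
    if ocr is None:
        # Default to pytesseract when no other OCR callable is supplied.
        import pytesseract
        ocr = pytesseract.image_to_string

    pages = []
    for index in range(getattr(img, "n_frames", 1)):
        # seek() returns None; it switches img itself to the requested frame.
        img.seek(index)
        pages.append(" ".join(ocr(img).split()))
    return pages
```

For the file in the question, the call would be ocr_all_pages(Image.open('Group 1/1_CHE_MDC_1.tif')).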
I had the same question; I tried the code below and it worked for me:
import glob
import os
import pytesseract
# Placeholder: change to the directory that contains the Tesseract-OCR executable
os.chdir("Set your Tesseract-OCR .exe file path")
b = ''
# Use *.jpg instead of *.tif for JPEG images
for i in glob.glob('Fullpath of your image directory/*.tif'):
    # image_to_string also accepts a file path
    b = b + pytesseract.image_to_string(i)
print(b)
Happy learning!
