I am trying to detect the text in images,
but it fails for reasons I can't identify.
import pytesseract as pt
from PIL import Image
import re
image = Image.open('sample.jpg')
custom_config = r'--oem 3 --psm 7 outputbase digits'
number = pt.image_to_string(image, config=custom_config)
print('Number: ', number)
Number: 0 50 100 200 250 # This is the output that I am getting.
Expected --> 0,0,0,0,0,1,0,8
OCR with Tesseract on crude/raw image inputs may not give you the expected result.
For the given image, a somewhat better result can be obtained with a grayscale conversion followed by a thresholding operation.
To perform the conversion and thresholding you can use ImageMagick as follows:
$ convert input_image.jpg -colorspace gray grayscale_image.jpg
$ convert grayscale_image.jpg -threshold 45% thresholded_image.jpg
$ convert thresholded_image.jpg -morphology Dilate Rectangle:4,3 dilated_binary.jpg
$ python run_tesseract.py
00000109
A more robust approach to OCR is to train the Tesseract engine, as discussed here.
Related
I am trying to detect some numbers with tesseract in python. Below you will find my starting image and what I can get it down to. Here is the code I used to get it there.
import pytesseract
import cv2
import numpy as np
pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\choll\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"
image = cv2.imread(r'64normalwart.png')
lower = np.array([254, 254, 254])
upper = np.array([255, 255, 255])
image = cv2.inRange(image, lower, upper)
image = cv2.bitwise_not(image)
# Uses a language that should work with Minecraft text; I have tried with and without it, no luck
text = pytesseract.image_to_string(image, lang='mc')
print(text)
cv2.imwrite("Wartthreshnew.jpg", image)
cv2.imshow("Image", image)
cv2.waitKey(0)
I end up with black numbers on a white background, which seems pretty good, but Tesseract still cannot detect the numbers. I also noticed the numbers are pretty jagged, but I don't know how to fix that. Does anyone have recommendations for how I could make Tesseract recognize these numbers?
Starting Image
What I end up with
Your problem is with the page segmentation mode. Tesseract segments every image in a different way. When you don't choose an appropriate PSM, it goes for mode 3, which is automatic and might not be suitable for your case. I've just tried your image and it works perfectly with PSM 6.
df = pytesseract.image_to_string(np.array(image),lang='eng', config='--psm 6')
These are all the PSMs available at the moment:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
Use pytesseract.image_to_string(img, config='--psm 8') or try different configs to see if the image gets recognized. Useful link: Pytesseract OCR multiple config options
I am currently facing a problem with pytesseract where the software is unable to detect a number in this image:
For some reason, pytesseract doesn't want to recognise digits in this image. Any suggestions? Here is my code:
import pytesseract
from PIL import ImageEnhance, ImageFilter, Image
img = r'/content/inv_thresh.png'
str = pytesseract.image_to_string(Image.open(img), lang='eng', \
config='--psm 8 --oem 3 -c tessedit_char_whitelist=0123456789')
It returns the string COTO.
Why do you specify --oem 3 (default, based on what is available)?
Which traineddata model do you use? Which Tesseract version?
Tesseract expects a clean image without artifacts to produce correct results, so you will need to preprocess the image better.
I got the following result with the tessdata_best model and a recent Tesseract (4.1/5.0alpha):
tesseract a9Uq4.png - --psm 8 --dpi 70
00308
I want to read the text from an image.
I use pytesseract in Python.
Here is my code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = Image.open(r'a.jpg')
image.resize((150, 50),Image.ANTIALIAS).save("pic.jpg")
image = Image.open("pic.jpg")
captcha = pytesseract.image_to_string(image).replace(" ", "").replace("-", "").replace("$", "")
image
However, it returns an empty string.
What is the correct way to do this?
Thanks.
I agree with @Jon Betts:
Tesseract is not very strong at OCR; it is only good in binary cases with the right settings,
and CAPTCHAs are meant to fool OCRs!
But if you really need to do it, you have to come up with a manual procedure for it.
I created the code below specifically for the type of CAPTCHA that you gave (but it is completely rigid and not generalized/optimized for all cases).
Pseudo code
apply a median blur
apply a threshold to keep only the blue colors (binary image output from this stage)
apply opening to remove small white pixels from the binary image
give the image to Tesseract with these options:
a limited whitelist of output characters
OEM 3: default engine, based on what is available
PSM 8: one word per image
Code
from PIL import Image
import pytesseract
import numpy as np
import cv2
img = cv2.imread('a.jpg')
img = cv2.medianBlur(img, 3)
# extract blue parts
img2 = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8)
cond = np.bitwise_and(img[:, :, 0] >= 100, img[:, :, 2] < 100)
img2[np.where(cond)] = 255
img = img2
# delete the noise
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
str1 = pytesseract.image_to_string(Image.fromarray(img),
    config='-c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyz0123456789 --oem 3 --psm 8')
cv2.imwrite("frame.png", img)
print(str1)
output
f2e4
image
To see the full options of Tesseract, run the command tesseract --help-extra or refer to this_link.
Tesseract is intended for performing OCR on text documents. In my experience it's good but a bit patchy even with very clean data.
In this case it appears you are trying to solve a CAPTCHA which is specifically designed to defeat OCR software. It's very likely you cannot use Tesseract to solve this issue, because:
It's not really designed for that
The scenario is adversarial:
The example is specifically designed to prevent what you are trying to do
If you could get it to work, the other party would likely change it to break again
If you want to proceed I would suggest:
Working on cleaning up the image before attempting to process it (can you get a nice readable black and white image?)
Training your own recognition network using a lot of examples
You can download a multi-channel, 16-bit png file from here (shown below). I have tried multiple Python packages for reading this multi-channel, 16-bit-per-channel image, but none of them work, and when they do, they transform the image (scaling etc.). I tried imageio, PIL.Image, scipy.ndimage.imread and a couple more. It seems that they can all read single-channel 16-bit pngs properly but convert the multi-channel images to 8 bits per channel. For instance, this is a GitHub issue that indicates imageio cannot read multi-channel 16-bit images. Another issue (here) for Pillow seems to say the same thing.
So I wonder, does anyone know how can I read multi-channel, 16-bit png files in Python without using OpenCV package properly? Feel free to offer solutions from other packages that I didn't mention anything about here.
Option 1 - Split into channels with ImageMagick
You could use ImageMagick (it is installed on most Linux distros and is available for macOS and Windows) at the command line.
For example, this will separate your 16-bit 3-channel PNG into its constituent channels that you can then process individually in Pillow:
magick input.png -separate channel-%d.png
Now there are 3 separate channels:
-rw-r--r-- 1 mark staff 2276 1 Apr 16:47 channel-2.png
-rw-r--r-- 1 mark staff 3389 1 Apr 16:47 channel-1.png
-rw-r--r-- 1 mark staff 2277 1 Apr 16:47 channel-0.png
and they are each 16-bit, single channel images that Pillow can open:
magick identify channel-*
Sample Output
channel-0.png PNG 600x600 600x600+0+0 16-bit Grayscale Gray 2277B 0.000u 0:00.000
channel-1.png PNG 600x600 600x600+0+0 16-bit Grayscale Gray 3389B 0.000u 0:00.000
channel-2.png PNG 600x600 600x600+0+0 16-bit Grayscale Gray 2276B 0.000u 0:00.000
If you are using ImageMagick v6, replace magick with convert and replace magick identify with plain identify.
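As a quick sanity check that Pillow keeps the 16-bit depth of a single-channel PNG, here is a sketch that builds a synthetic 16-bit image in place of the real channel-0.png:

```python
import numpy as np
from PIL import Image

# Synthetic 16-bit single-channel image standing in for channel-0.png
arr = np.linspace(0, 65535, 600 * 600, dtype=np.uint16).reshape(600, 600)
Image.fromarray(arr).save('channel-0.png')

# Re-open it and confirm the full 16-bit range survives the round trip
im = Image.open('channel-0.png')
data = np.array(im)
print(im.mode, data.dtype, int(data.max()))
```

Depending on the Pillow version, the mode may come back as I;16 or I, but the pixel values are preserved either way.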
Option 2 - Split into channels with NetPBM
As an alternative to ImageMagick, you could use the much lighter weight NetPBM tools to do the same thing:
pngtopam < rainbow.png | pamchannel - 0 -tupletype GRAYSCALE > channel-0.pam
pngtopam < rainbow.png | pamchannel - 1 -tupletype GRAYSCALE > channel-1.pam
pngtopam < rainbow.png | pamchannel - 2 -tupletype GRAYSCALE > channel-2.pam
Pillow can then open the PAM files.
Option 3 - Use PyVips
As an alternative, you could use the extremely fast, memory-efficient pyvips to process your images. Here is an example from the documentation that:
crops 100 pixels off each side
shrinks an image by 10% with bilinear interpolation
sharpens with a convolution and re-saves the image.
Here is the code:
#!/usr/local/bin/python3
import sys
import pyvips
im = pyvips.Image.new_from_file(sys.argv[1], access='sequential')
im = im.crop(100, 100, im.width - 200, im.height - 200)
im = im.reduce(1.0 / 0.9, 1.0 / 0.9, kernel='linear')
mask = pyvips.Image.new_from_array([[-1, -1, -1],
                                    [-1, 16, -1],
                                    [-1, -1, -1]], scale=8)
im = im.conv(mask, precision='integer')
im.write_to_file("result.png")
The result is 16-bit like the input image:
identify result.png
result.png PNG 360x360 360x360+0+0 16-bit sRGB 2900B 0.000u 0:00.000
As you can see it is still 16-bit, and trimming 100px off each side results in 600px becoming 400px and then the 10% reduction makes that into 360px.
Option 4 - Convert to TIFF and use PyLibTiff
A fourth option, if the number of files is an issue, might be to convert your images to TIFF with ImageMagick
convert input.png output.tif
and they retain their 16-bit resolution, and you process them with PyLibTiff as shown here.
Option 5 - Multi-image TIFF processed as ImageSequence
A fifth option could be to split your PNG files into their constituent channels and store them as a multi-image TIFF, i.e. with red as the first image in the sequence, green as the second and blue as the third. This means there is no increase in the number of files, and you can also store more than 3 channels per file; you mentioned 5 channels somewhere in your comments:
convert input.png -separate multiimage.tif
Check there are now 3 images, each 16-bit, but all in the same, single file:
identify multiimage.tif
multiimage.tif[0] TIFF 600x600 600x600+0+0 16-bit Grayscale Gray 10870B 0.000u 0:00.000
multiimage.tif[1] TIFF 600x600 600x600+0+0 16-bit Grayscale Gray 10870B 0.000u 0:00.000
multiimage.tif[2] TIFF 600x600 600x600+0+0 16-bit Grayscale Gray 10870B 0.000u 0:00.000
Then process them as an image sequence:
from PIL import Image, ImageSequence
im = Image.open("multiimage.tif")
index = 1
for frame in ImageSequence.Iterator(im):
    print(index)
    index = index + 1
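The round trip can be sketched end to end; the small synthetic 3-page 16-bit TIFF below stands in for the real multiimage.tif, since the original channels aren't available here:

```python
import numpy as np
from PIL import Image, ImageSequence

# Build a 3-page 16-bit TIFF, one constant-valued page per channel
pages = [Image.fromarray(np.full((8, 8), v, dtype=np.uint16))
         for v in (1000, 2000, 3000)]
pages[0].save('multiimage.tif', save_all=True, append_images=pages[1:])

# Read the channels back as an image sequence
im = Image.open('multiimage.tif')
channels = [np.array(frame) for frame in ImageSequence.Iterator(im)]
print(len(channels), channels[0].dtype)
```

Each frame comes back as a separate 16-bit array, so the per-channel values survive the round trip.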
I had the same problem and I found out that imageio can do the job:
img = imageio.imread('path/to/img', format='PNG-FI')
With this option you can read and write multi-channel 16-bit png images (by default imageio uses PNG-PIL as format for reading png files). This works for png images, but changing the format can probably help when dealing with other image types (here a full list of available imageio formats).
To use this format you may need to install the FreeImage plugin as shown in the documentation.
I am currently facing a problem with pytesseract where the software is unable to detect a number in this image:
https://i.stack.imgur.com/kmH2R.png
This is taken from a bigger image with threshold filter applied.
For some reason, pytesseract doesn't want to recognise the 6 in this image. Any suggestions? Here is my code:
image = #Insert raw image here. My code takes a screenshot.
image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
image = cv2.medianBlur(image, 3)
rel, gray = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY)
# If you want to use the image from above, start here.
image = Image.fromarray(image)
string = pytesseract.image_to_string(image)
print(string)
EDIT: With some further investigation, my code works fine with numbers containing 2 digits, but not with single digits.
Tesseract defaults to a mode that looks for large chunks of text (--psm 3, fully automatic page segmentation); to have it detect a single character you need to run it with the option --psm 10 (PSM_SINGLE_CHAR). However, due to the black spots in the corners of the image you provided, it detects them as random dashes and returns nothing in this mode, since it thinks there are multiple characters, so in this case you need to use --psm 8 (PSM_SINGLE_WORD):
string = pytesseract.image_to_string(image, config='--psm 8')
The output from this will include those random characters, so you will need to strip them after pytesseract runs, or improve your bounding box around the numbers to remove the noise. Also, if all of the characters being detected are numbers, you can add '-c tessedit_char_whitelist=0123456789' after '--psm 8' to improve the detection.
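Running Tesseract needs the binary installed, but the stripping step can be sketched on its own against a typical raw pytesseract result, which usually carries stray characters plus a trailing newline and form-feed; the sample string below is an assumption, not real OCR output:

```python
import re

def keep_digits(raw: str) -> str:
    """Keep only the digits from a raw OCR string."""
    return re.sub(r'\D', '', raw)

# Typical raw shape: stray dash/dot noise plus trailing newline + form-feed
sample = '- 6.\n\x0c'
print(keep_digits(sample))  # prints: 6
```

The same helper can be applied directly to the result of pytesseract.image_to_string.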
Some other minor tips to simplify your code: cv2.imread has an option to read the image as grayscale, so you don't need to run cvtColor afterwards; just do:
image = cv2.imread('/path/to/image/6.png', 0)
Also, you can create the PIL Image object within your call to pytesseract, so that line simplifies to:
string = pytesseract.image_to_string(Image.fromarray(img), config='--psm 8')
as long as you have 'from PIL import Image' at the top of your script.