Very simple code: I can read the image either from image1 as a PNG or from image2 as a JPG; it is the same image in two different formats.
I then want to filter the darker part to black and show the brighter part as white.
import matplotlib.image as mpimg
import cv2
import numpy as np

#image = mpimg.imread('image1.png')
image = mpimg.imread('image2.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
thresh = (180, 255)
binary = np.zeros_like(gray)
binary[(gray > thresh[0]) & (gray <= thresh[1])] = 1
Somehow, when I plot the binary of image1 it is all black, but image2 looks like what I intend to get.
The problem is most likely that matplotlib.image reads the PNG natively while the JPG read falls back to Pillow. The PNG read produces an array of floating-point values in the range 0.0 to 1.0, whereas the JPG read produces an array of bytes with values 0..255. As a result, your threshold operation turns the PNG into an all-black image: every float value is at most 1.0, so nothing ever exceeds the lower bound of 180.
See http://matplotlib.org/users/image_tutorial.html for more information.
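A minimal sketch of one possible fix (assuming the same imports as above): scale the float PNG data up to the 0..255 byte range before thresholding, so both formats go through the same code path.
import matplotlib.image as mpimg
import numpy as np

image = mpimg.imread('image1.png')
# PNG reads come back as floats in [0.0, 1.0]; scale them to bytes
if image.dtype != np.uint8:
    image = (image * 255).astype(np.uint8)
# ...then continue with cv2.cvtColor and the (180, 255) threshold as before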
I have an image that looks like this:
And this is the processed image
I have tried pretty much everything. I processed the image like this:
import cv2
import pytesseract

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
(h, w) = gray.shape[:2]
gray = cv2.resize(gray, (w*2, h*2))  # Up-sample by 2x
thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))  # was undefined in the snippet; size assumed
gray = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, rectKernel)
blur = cv2.GaussianBlur(gray, (1, 1), cv2.BORDER_DEFAULT)  # note: a 1x1 kernel blurs nothing
text = pytesseract.image_to_string(blur, config="--oem 1 --psm 6")
But Tesseract doesn't print out anything. I am using this version of Tesseract:
5.0.0-alpha.20201127
How do I improve its performance? It's highly unreliable.
Edit:
The answer below did a wonderful job on said image.
But when I apply this technique to an image like this one, I get the wrong output.
Why is that? They seem roughly the same.
The problem is that the characters are not in the center of the image.
Sometimes Tesseract has difficulty recognizing characters or digits if they are not centered.
Therefore my suggestions are:
Center the characters
Up-sample and convert to gray-scale
Centering the characters:
cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
50 is just the padding size; you can set it to any other value.
The background turns blue because of the value argument: OpenCV reads images in BGR order, so passing [255] is the same as [255, 0, 0], which sets the blue channel to full while leaving green and red at zero.
You can try with other values. For me it won't matter, since I'll convert it to gray-scale on the next step.
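For example, to get white padding instead (an illustrative variation; it makes no difference once you convert to gray-scale):
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255, 255, 255])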
Up-sampling and converting to gray-scale:
The same steps you have already done: the first three lines of your code.
Now when you run OCR, the result is:
MEHVISH MUQADDAS
Code:
import cv2
import pytesseract
# Load the image
img = cv2.imread("onf0D.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# OCR
txt = pytesseract.image_to_string(gry, config="--psm 6")
print(txt)
Read more: tesseract-improve-quality.
You don't need to apply threshold, GaussianBlur, or morphologyEx here.
The reasons are:
Simple thresholding is used to extract the features of an image, and the input image's features are already available.
You don't have to smooth the image, since there is no illumination effect on it.
You don't need to do segmentation, since the background is plain white.
Update-1
The second image requires pre-processing. However, applying a simple threshold won't work on this image. You need to remove the background using a binary mask, and then you can apply OCR.
Result of the binary-mask:
Now, if you apply OCR:
IRUM FEROZ
Code:
import cv2
import numpy as np
import pytesseract
# Load the image
img = cv2.imread("jCMft.jpg")
# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[255])
# Up-sample
img = cv2.resize(img, (0, 0), fx=2, fy=2)
# Convert to HSV color-space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Binary mask: keep dark (low-value) pixels, drop the bright background
msk = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([179, 255, 130]))
# OCR
txt = pytesseract.image_to_string(msk, config="--psm 6")
print(txt)
Q: How do I find the lower and upper bounds for the cv2.inRange method?
A: You can use a small interactive script; a sketch follows.
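The original script isn't included here; below is a minimal sketch of such a tool using OpenCV trackbars (window and trackbar names are illustrative). Move the sliders until only the text survives in the mask window, then read the bounds off the trackbars.
import cv2
import numpy as np

def nothing(x):
    pass

img = cv2.imread("jCMft.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

cv2.namedWindow("mask")
# Lower bounds start at 0, upper bounds start at their maximum
for name, maxval, start in [("H_lo", 179, 0), ("S_lo", 255, 0), ("V_lo", 255, 0),
                            ("H_hi", 179, 179), ("S_hi", 255, 255), ("V_hi", 255, 255)]:
    cv2.createTrackbar(name, "mask", start, maxval, nothing)

while True:
    lo = np.array([cv2.getTrackbarPos(n, "mask") for n in ("H_lo", "S_lo", "V_lo")])
    hi = np.array([cv2.getTrackbarPos(n, "mask") for n in ("H_hi", "S_hi", "V_hi")])
    cv2.imshow("mask", cv2.inRange(hsv, lo, hi))
    if cv2.waitKey(30) & 0xFF == 27:  # press Esc to quit
        break
cv2.destroyAllWindows()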
Q: What did you change in the second image?
A: First I converted the image to the HSV format instead of gray-scale, because I wanted to remove the background. If you experiment with adaptiveThreshold, you will see there are a lot of artifacts on the background that limit Tesseract's recognition. Then I used cv2.inRange to get a binary mask instead. Feeding the binary mask to Tesseract gave me the desired result.
I create and view a random image. Then this image is encoded as a JPG file with OpenCV. However, after decoding, the colors have changed slightly. This behavior does not occur when encoding to PNG. Can anyone explain why this happens? Is it a side effect of the JPEG compression? Am I doing something wrong? Code sample below to recreate this.
import cv2
import numpy as np
random_image = np.random.randint(255, size=(4,4,3), dtype=np.uint8)
cv2.imshow('Image', random_image)
cv2.waitKey()
_, img_encoded = cv2.imencode('.jpg', random_image)
img_string = img_encoded.tobytes()  # tostring() is deprecated
npimg = np.frombuffer(img_string, dtype=np.uint8)  # fromstring() is deprecated
img = cv2.imdecode(npimg, 1)
cv2.imshow('Image', img)
cv2.waitKey()
# Does not happen with png
_, img_encoded = cv2.imencode('.png', random_image)
img_string = img_encoded.tobytes()
npimg = np.frombuffer(img_string, dtype=np.uint8)
img = cv2.imdecode(npimg, 1)
cv2.imshow('Image', img)
cv2.waitKey()
Edited to add some 4x4 images.
Original:
JPG
PNG
Edited again with 512x512 images
Original 512x512
JPG 512x512
PNG 512x512
JPG is a lossy compression format. It works by modifying colours in a way that should be "unoffensive" to the eye. PNG is a lossless compression format, so you get exactly what you had after the encode/decode round trip.
You can control how much JPG will be free to modify the image by specifying the IMWRITE_JPEG_QUALITY parameter like this:
cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 90])
Higher values mean less compression and a result closer to the original. Note, however, that JPEG remains lossy even at 100 (chroma subsampling and rounding can still occur), so the decoded image is not guaranteed to be bit-identical to the original.
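To see the effect, here is a small illustrative sketch that measures the round-trip error at a few quality settings:
import cv2
import numpy as np

img = np.random.randint(255, size=(64, 64, 3), dtype=np.uint8)
for q in (50, 90, 100):
    _, enc = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, q])
    dec = cv2.imdecode(enc, 1)
    # Maximum per-pixel, per-channel difference introduced by the round trip
    print(q, np.abs(img.astype(int) - dec.astype(int)).max())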
I have a text detector which outputs polygon coordinates of detected text:
I am using the loop below to show what the detected text looks like with bounding boxes:
for i in range(0, num_box):
    pts = np.array(boxes[0][i], np.int32)
    pts = pts.reshape((-1, 1, 2))
    print(pts)
    print('\n')
    img2 = cv2.polylines(img, [pts], True, (0, 255, 0), 2)
return img2
Each pts stores all coordinates of a polygon, for one text box detection:
pts =
[[[509 457]]
[[555 457]]
[[555 475]]
[[509 475]]]
I would like to convert the area inside the bounding box described by pts to grayscale using:
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
However, I am not sure how I should provide the image argument in the gray_image call above, as I want to convert only the area described by pts to grayscale and not the entire image (img2). I want the rest of the image to be white.
From my understanding you want to convert the content of the bounding box to grayscale, and set the rest of the image to white (background).
Here would be my solution to achieve that:
import cv2
import numpy as np
# Some input image
image = cv2.imread('path/to/your/image.png')
# Some pts
pts = np.array([[60, 40], [340, 40], [340, 120], [60, 120]])
# Get extreme x, y coordinates from box
x1 = pts[0][0]
y1 = pts[0][1]
x2 = pts[1][0]
y2 = pts[2][1]
# Build output; initialize white background
image2 = 255 * np.ones(image.shape, np.uint8)
image2[y1:y2, x1:x2] = cv2.cvtColor(cv2.cvtColor(image[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY), cv2.COLOR_GRAY2BGR)
# Show bounding box in original image
cv2.polylines(image, [pts], True, (0, 255, 0), 2)
cv2.imshow('image', image)
cv2.imshow('image2', image2)
cv2.waitKey(0)
cv2.destroyAllWindows()
The main "trick" is to use OpenCV's cvtColor method twice just on the region of interest (ROI) of the image, first time converting BGR to grayscale, and then grayscale back to BGR. Accessing rectangular ROIs in "Python OpenCV images" is done by proper NumPy array indexing and slicing. Operations solely on these ROIs are supported by most OpenCV functions (Python API).
EDIT: If your final image is a plain grayscale image, the backwards conversion of course can be omitted!
These are some outputs I generated with my "standard image":
Hope that helps!
I am trying to use Tesseract OCR to convert an image to text. The image always has three letters, without rotation/skew, but randomly placed within a 90x50 png file.
By just cleaning and converting to black/white, Tesseract could not get the text in the image. After aligning the letters by hand in Paint, the OCR gives an exact match. It doesn't even need to be exactly aligned.
What I want is some tips on how to automate this alignment of the characters in the image prior to sending it to tesseract.
I am using python with tesseract and opencv.
Original image:
What I have done - turn black and white:
What I want to do - aligned by code:
You can use the following code to achieve this output. Some of the constants may need to be changed to fit your needs:
import cv2
import numpy as np

# Read the image as grayscale (resize so it is easier to see)
img = cv2.imread("/home/stephen/Desktop/letters.png", 0)
h, w = img.shape
img = cv2.resize(img, (w*5, h*5))

# Threshold the image and find the contours
_, thresh = cv2.threshold(img, 123, 255, cv2.THRESH_BINARY_INV)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Sort contours left-to-right (added: findContours does not guarantee order,
# and the letters must be pasted in their original order)
pairs = sorted(zip(contours, hierarchy[0]), key=lambda p: cv2.boundingRect(p[0])[0])

# Create a white background image to paste the letters on
bg = np.zeros((200, 200), np.uint8)
bg[:] = 255
left = 5

# Iterate through the contours
for contour, h in pairs:
    # Ignore inside parts (circle in a 'p' or 'b')
    if h[3] == -1:
        # Get the bounding rectangle
        x, y, w, h = cv2.boundingRect(contour)
        # Paste it onto the background
        bg[5:5+h, left:left+w] = img[y:y+h, x:x+w]
        left += (w + 5)

cv2.imshow('thresh', bg)
cv2.waitKey()
So basically I have a colored RGB image and I want to add a colored overlay over the RGB image without converting it to gray level.
For example, if I have a colored (RGB) image and I want to add a transparent blue color over an indexed region, like this:
img[200:350, 200:350] = [0, 0, 1] # Blue block
This question is a sibling question to this one:
Applying a coloured overlay to an image in either PIL or Imagemagik
The difference is the color space: the above question is for gray-level images rather than colored (RGB) ones.
from skimage import io, data
import numpy as np
img = data.astronaut()
Please use the above code to answer.
Here is the code in OpenCV:
import cv2

# Load the image
image = cv2.imread("2.jpg")
# I resized the images because they were too big
image = cv2.resize(image, (0, 0), fx=0.75, fy=0.75)
overlay = image.copy()
output = image.copy()
# Select the region that has to be overlaid
cv2.rectangle(overlay, (420, 205), (595, 385), (0, 255, 255), -1)
# Transparency parameter
alpha = 1
# Perform the image overlay (alpha-blend overlay onto output)
cv2.addWeighted(overlay, alpha, output, 1 - alpha, 0, output)
# Save the overlaid image
cv2.imwrite('Output' + str(alpha) + '.jpg', output)
cv2.waitKey(0)
cv2.destroyAllWindows()
Some results:
when alpha = 0.1
when alpha = 0.5
when alpha = 0.8
when alpha = 1.0 (the overlay is no longer transparent but opaque)
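For the skimage setup the question asked for, here is a minimal sketch of the same idea in pure NumPy (the 0.5 alpha and the rectangle coordinates are arbitrary, illustrative choices):
from skimage import data
import matplotlib.pyplot as plt

img = data.astronaut() / 255.0           # float RGB in [0, 1]
overlay = img.copy()
overlay[200:350, 200:350] = [0, 0, 1]    # blue block, as in the question
alpha = 0.5                              # blend factor (arbitrary choice)
blended = (1 - alpha) * img + alpha * overlay
plt.imshow(blended)
plt.show()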