With a uint8, 3-channel image and a uint8 binary mask, I have done the following in OpenCV and Python in order to change an object on a black background into an object on a transparent background:
# Separate image into its 3 channels
b, g, r = cv2.split(img)
# Merge channels back with mask (resulting in a 4-channel image)
imgBGRA = cv2.merge((b, g, r, mask))
However, when I try doing this with a uint16, 3-channel image and uint16 binary mask, the saved result is 4-channel, but the background is still black. (I saved it as a .tiff file and viewed it in Photoshop.)
How can I make the background transparent, keeping the output image uint16?
UPDATE
Seeing @Shamshirsaz.Navid's and @fmw42's comments, I tried
imgBGRA = cv2.cvtColor(imgBGR, cv2.COLOR_BGR2BGRA), then used NumPy to add the alpha channel from the mask: imgBGRA[:,:,3] = mask. (I hadn't tried this before, as I thought cvtColor operations required an 8-bit image.) Nonetheless, my results are the same.
I think the problem is my mask. When I run numpy.amin(mask), I get 0, and for numpy.amax(mask), I get 1. What should they be? I tried multiplying the mask by 255 prior to using the split/merge technique, but the background was still black. Then I tried mask*65535, but again the background was black.
I had tried to keep the scope of my initial post narrow. But it seems that my problem does lie somewhere in the larger scope of what I'm doing and how this uint16 mask gets created.
I'm using connectedComponentsWithStats (CC) to cut out the components of a uint16 image. CC requires an 8-bit mask, which I use as its input. But the cutout results need to come from my uint16 original, which has required some changes to the way I learned to use CC on uint8 images. Note that the per-component mask (which I eventually use to try to make the background transparent) is created as uint16. Here is the whittled-down version:
import cv2
import numpy as np

# img is the original image, dtype=uint16
# bin is the binary mask, dtype=uint8
# connectivity is 4 or 8 (defined elsewhere)
cc = cv2.connectedComponentsWithStats(bin, connectivity, cv2.CV_32S)
num_labels = cc[0]
labels = cc[1]
for i in range(1, num_labels):
    maskg = (labels == i).astype(np.uint16)  # with uint8: maskg = (labels == i).astype(np.uint8) * 255
    # NOTE: I don't understand why removing the `* 255` works; but after hours of experimenting,
    # it's the only way I could get the original to appear correctly when saving 'glyph'. For all
    # other methods I tried, the colors were off in some significant way -- either grayish blue,
    # whereas the object in my original is variations of brown, or else a pixelated rainbow of colors.
    glyph = img * maskg[..., np.newaxis]  # with uint8: glyph = cv2.bitwise_and(img, img, mask=maskg)
    b, g, r = cv2.split(glyph)
    glyphBGRA = cv2.merge((b, g, r, maskg))
Example (my real original image is huge and I am also not able to share it, so I put together this example):
img (original uint16 image)
bin (input uint8 mask)
maskg (uint16 component mask created within loop)
(this is a screenshot -- it shows up all black when uploaded directly)
glyph (img with maskg applied)
glyphBGRA (result of split and merge method trying to add transparency)
(this is also a screenshot -- this one showed up all white/blank when added directly)
I hope this added info provides sufficient context for my problem.
I checked your last comment. I think an example might be better. Your code is correct; the question is, how did you use it? I attached a picture and a mask to test with.
import sys,cv2
main = cv2.imread(sys.path[0]+'/main.png')
mask = cv2.imread(sys.path[0]+'/mask.png', cv2.IMREAD_GRAYSCALE)
mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)[1]
b, g, r = cv2.split(main)
bgra = cv2.merge((b, g, r, mask))
cv2.imwrite(sys.path[0]+'/out_split_merge.png',bgra)
Main:
Mask:
Output:
If you open the final output with an image editing software, you will notice that part of it is transparent.
Diagnosis: OpenCV is not able to save a TIFF with an alpha channel.
The following is from the OpenCV docs' entry for imwrite():
The function imwrite saves the image to the specified file. The image format is chosen based on the filename extension (see cv::imread for the list of extensions). In general, only 8-bit single-channel or 3-channel (with 'BGR' channel order) images can be saved using this function, with these exceptions:
16-bit unsigned (CV_16U) images can be saved in the case of PNG, JPEG 2000, and TIFF formats
32-bit float (CV_32F) images can be saved in PFM, TIFF, OpenEXR, and Radiance HDR formats; 3-channel (CV_32FC3) TIFF images will be saved using the LogLuv high dynamic range encoding (4 bytes per pixel)
PNG images with an alpha channel can be saved using this function. To do this, create 8-bit (or 16-bit) 4-channel image BGRA, where the alpha channel goes last. Fully transparent pixels should have alpha set to 0, fully opaque pixels should have alpha set to 255/65535 (see the code sample below).
How I got to this point:
I manually removed the background in Photoshop and saved it as a PNG file and as a TIFF file. (They both look like this:)
Then I ran:
import cv2
import numpy as np
png16 = cv2.imread('c:/users/scott/desktop/python2/teststack/png16.png', cv2.IMREAD_UNCHANGED)
tif16 = cv2.imread('c:/users/scott/desktop/python2/teststack/tif16.tiff', cv2.IMREAD_UNCHANGED)
print('png16:', png16.dtype, png16.shape)
b, g, r, a = cv2.split(png16)
mmin = np.amin(a)
mmax = np.amax(a)
print('png16-a channel:', a.dtype, a.shape, mmin, mmax)
pixvals = np.unique(a.flatten()) # get all unique pixel values in a
print('png16-a channel pixel values:', pixvals)
print('tif16:', tif16.dtype, tif16.shape)
b, g, r, a = cv2.split(tif16)
mmin = np.amin(a)
mmax = np.amax(a)
print('tif16-a channel:', a.dtype, a.shape, mmin, mmax)
pixvals = np.unique(a.flatten()) # get all unique pixel values in a
print('tif16-a channel pixel values:', pixvals)
png16copy = png16.copy()
tif16copy = tif16.copy()
cv2.imwrite('c:/users/scott/desktop/python2/teststack/png16copy.png', png16copy)
cv2.imwrite('c:/users/scott/desktop/python2/teststack/tif16copy.tiff', tif16copy)
The output is all as one should expect:
png16: uint16 (312, 494, 4)
png16-a channel: uint16 (312, 494) 0 65535
png16-a channel pixel values: [ 0 65535]
tif16: uint16 (312, 494, 4)
tif16-a channel: uint16 (312, 494) 0 65535
tif16-a channel pixel values: [ 0 65535]
Back in Photoshop, the png file looked like it did before:
But the tiff file did not.
Without alpha channel visible:
With alpha channel visible:
So I knew at this point that the problem was in the saving. I reread the OpenCV docs for imwrite and picked up on the logic: if it's not 8-bit single-channel or 3-channel, and it's not spelled out explicitly in the exceptions, it won't work.
I did some more searching and found something that does work. I installed tifffile and ran:
from tifffile import imsave
tif16copy2 = cv2.cvtColor(tif16copy, cv2.COLOR_BGRA2RGBA)
imsave('c:/users/scott/desktop/python2/teststack/tif16copy2.tiff', tif16copy2)
Here is the result in Photoshop:
I want to resize a PNG picture from 476x402 to 439x371. I used the resize method of PIL (Image) and of OpenCV; however, it loses some sharpness. After resizing, the picture becomes blurred.
How can I resize (shrink) an image without losing sharpness in Python?
from skimage import transform, data, io
from PIL import Image
import os
import cv2

infile = 'D:/files/script/org/test.png'
outfile = 'D:/files/script/out/test.png'

''' PIL '''
def fixed_size1(width, height):
    im = Image.open(infile)
    out = im.resize((width, height), Image.ANTIALIAS)
    out.save(outfile)

''' open cv '''
def fixed_size2(width, height):
    img_array = cv2.imread(infile)
    new_array = cv2.resize(img_array, (width, height), interpolation=cv2.INTER_CUBIC)
    cv2.imwrite(outfile, new_array)

def fixed_size3(width, height):
    img = io.imread(infile)
    dst = transform.resize(img, (439, 371))
    io.imsave(outfile, dst)

fixed_size2(371, 439)
src:476x402
resized:439x371
How can you pack 2000 pixels into a box that only holds 1800? You can't.
Putting the same amount of information (stored as pixels in your source image) into a smaller pixel area only works by
throwing away pixels (i.e. discarding single values, or cropping the image, which is not what you want to do), or by
blending neighbouring pixels into some kind of weighted average and replacing, say, 476 pixels with 439 slightly altered ones.
That is exactly what happens when resizing images. Some kind of algorithm (interpolation=cv2.INTER_CUBIC, others here) tweaks the pixel values to merge/average them so you do not lose too much information.
You can try a different algorithm, or you can apply further post-processing ("sharpening") to enhance the contrast again.
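As a sketch of that post-sharpening idea: a standard unsharp mask applied after shrinking with OpenCV (the output name, sigma, and weights are placeholders to tune):

import cv2

img = cv2.imread('D:/files/script/org/test.png')
resized = cv2.resize(img, (439, 371), interpolation=cv2.INTER_AREA)  # INTER_AREA is usually a good choice for shrinking

# Unsharp mask: subtract a blurred copy to boost edges
blurred = cv2.GaussianBlur(resized, (0, 0), sigmaX=2)
sharpened = cv2.addWeighted(resized, 1.5, blurred, -0.5, 0)

cv2.imwrite('D:/files/script/out/test_sharpened.png', sharpened)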
When storing the image, certain formats do "lossy" compression to minimize file size (JPG); others are lossless (PNG, TIFF, JPG2000, ...). Choosing a lossy image format can further blur your image.
See
Shrink/resize an image without interpolation
How can I sharpen an image in OpenCV?
I'm using the Google Vision API to extract the text from some pictures; however, I have been trying to improve the accuracy (confidence) of the results with no luck.
Every time I change the image from the original, I lose accuracy in detecting some characters.
I have isolated the issue to the use of multiple colors for different words: it can be seen that words in red, for example, have incorrect results more often than the other words.
Example:
Some variations of the image, in grayscale and black & white:
What ideas can I try to make this work better, specifically changing the colors of text to a uniform color or just black on a white background since most algorithms expect that?
Some ideas I have already tried (along with some thresholding):
from PIL import ImageOps, ImageEnhance

# im is the input PIL image
dimg = ImageOps.grayscale(im)
cimg = ImageOps.invert(dimg)
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)
sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)
I can only offer a butcher's solution, potentially a nightmare to maintain.
In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.
My prerequisites:
I knew exactly in which area of the screen the text was going to go.
I knew exactly which fonts and colors were going to be used.
the text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
I could not reliably detect text changes, so I could not average frames to reduce the interference.
What I did:
- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average colour to determine the text colour, then access the whole set of bitmaps generated from the characters of all available fonts in that colour. Then it would determine which bitmap was closest to the corresponding rectangle on the screen, and advance to the next position.
(Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)
In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
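A rough Python sketch of that per-position glyph matching (my original was a C program; glyph_bitmaps and the cell slicing here are hypothetical):

import cv2

def best_glyph(cell, glyph_bitmaps):
    # cell: grayscale uint8 region cut from the frame
    # glyph_bitmaps: dict mapping each character to its pre-rendered grayscale bitmap
    #                (rendered in the expected font and colour, no larger than cell)
    best_char, best_score = None, -1.0
    for ch, bmp in glyph_bitmaps.items():
        score = float(cv2.matchTemplate(cell, bmp, cv2.TM_CCOEFF_NORMED).max())
        if score > best_score:
            best_char, best_score = ch, score
    return best_char

# Usage sketch: walk left to right, advancing by the matched glyph's kerning width
# x = 0
# while x < line_width:
#     ch = best_glyph(frame_line[:, x:x + max_glyph_width], glyph_bitmaps)
#     x += glyph_widths[ch]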
You tried almost every standard step. I would advise you to try some of PIL's built-in filters, like the sharpness filter. Apply sharpness and contrast to the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together.
Or convert your image to YUV and then use just the Y channel for further processing.
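A minimal sketch of that enhance-then-binarise pipeline and of the Y-channel variant (the enhancement factors, the threshold, and the file name are guesses to tune):

from PIL import Image, ImageEnhance

im = Image.open('screenshot.png')  # hypothetical input

# Sharpen and boost contrast on the RGB image, then binarise
enhanced = ImageEnhance.Contrast(ImageEnhance.Sharpness(im).enhance(2.0)).enhance(1.5)
binarised = enhanced.convert('L').point(lambda p: 255 if p > 140 else 0)

# Y-channel variant: keep only luma for further processing
y_channel = im.convert('YCbCr').split()[0]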
Also, if you do not have a monochrome background, consider performing some background subtraction.
When detecting scanned text, Tesseract likes frames removed, so you can try to remove as much non-character space from the image as possible. (You might need to keep the picture size, though, so replace the removed areas with white.) Tesseract also likes straight lines, so some deskewing might be in order if your text is recorded at an angle. Tesseract also sometimes gives better results if you resize the image to twice its original size.
I suspect that Google Vision uses Tesseract, or portions of it, but I have no idea what other preprocessing it does for you. So some of my advice here might actually be implemented already, and doing it again would be unnecessary and repetitive.
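A small sketch of two of those tips, the 2x upscale and replacing non-text areas with white while keeping the picture size (the file name and bounding box here are hypothetical):

import cv2
import numpy as np

img = cv2.imread('snapshot.png')  # hypothetical input

# Resize to twice the original size; this often helps Tesseract
big = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# White out everything outside a known text region without changing the image size
x, y, w, h = 100, 50, 400, 60  # hypothetical text bounding box
cleaned = np.full_like(big, 255)
cleaned[y:y + h, x:x + w] = big[y:y + h, x:x + w]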
You will need to pre-process the image more than once, and use a bitwise_or operation to combine the results. To extract the colors, you could use
import cv2
import numpy as np

image = cv2.imread('input.png')  # hypothetical input image

boundaries = [  # BGR colorspace for opencv, *not* RGB
    ([15, 15, 100], [50, 60, 200]),    # red
    ([85, 30, 2], [220, 90, 50]),      # blue
    ([25, 145, 190], [65, 175, 250]),  # yellow
]

for (low, high) in boundaries:
    low = np.array(low, dtype="uint8")
    high = np.array(high, dtype="uint8")
    # find the colors within the specified boundaries and apply the mask
    mask = cv2.inRange(image, low, high)
    bitWise = cv2.bitwise_and(image, image, mask=mask)
    # now here is the image masked with the specific color boundary...
Once you have the masked image, you can do another bitwise_or operation on your to-be "final" image, essentially adding each mask to it.
This specific implementation requires OpenCV, but the same principle applies to other image packages.
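A minimal sketch of that combining step, assuming you collect each mask from the loop above into a list called masks:

import cv2
import numpy as np

# masks: list of single-channel uint8 masks, one per colour boundary (assumption)
combined = np.zeros_like(masks[0])
for m in masks:
    combined = cv2.bitwise_or(combined, m)

# keep only the pixels covered by any of the colour masks
final = cv2.bitwise_and(image, image, mask=combined)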
I need a little more context on this.
How many calls are you going to do to the Google Vision API? If you are doing this throughout a whole stream, you'd probably need to get a paid subscription.
What are you going to do with this data? How accurate does the OCR need to be?
Assuming you get this snapshot from someone else's Twitch stream, with the streamer's video compression and network connectivity in play, you're going to get a pretty blurry snapshot, so OCR is going to be pretty tough.
The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, one approach you could try:
Binarize the image to get the non-red text in white and background black as in your binarized image:
from PIL import Image

def binarize_image(im, threshold):
    """Binarize an image."""
    image = im.convert('L')  # convert image to monochrome
    bin_im = image.point(lambda p: p > threshold and 255)
    return bin_im

im = Image.open("game_text.JPG")
binarized = binarize_image(im, 100)
Extract only the red text values with a filter, then binarize it:
import cv2
from matplotlib import pyplot as plt
lower = [15, 15, 100]
upper = [50, 60, 200]
lower = np.array(lower, dtype = "uint8")
upper = np.array(upper, dtype = "uint8")
mask = cv2.inRange(im, lower, upper)
red_binarized = cv2.bitwise_and(im, im, mask = mask)
plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
plt.show()
However, even with this filtering, it still doesn't extract red well.
Add images obtained in (1.) and (2.).
combined_image = binarized + red_binarized
Do OCR on (3.)
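Note that (3.) as written adds a single-channel PIL image to a 3-channel OpenCV array; a sketch of one way to put both on a common single-channel footing before the OCR step:

import cv2
import numpy as np

# binarized: PIL result from (1.); red_binarized: BGR array from (2.)
non_red = np.array(binarized, dtype=np.uint8)
red_gray = cv2.cvtColor(red_binarized, cv2.COLOR_BGR2GRAY)
_, red_bin = cv2.threshold(red_gray, 1, 255, cv2.THRESH_BINARY)

# union of the two text masks, ready for OCR
combined_image = cv2.bitwise_or(non_red, red_bin)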
This is not a full solution, but it may lead to something better.
By converting your data from BGR (or RGB) to CIE-Lab, you can process a grayscale image formed as the weighted sum of the colour channels a* and b*.
This grayscale image will enhance the coloured regions of the text.
By adapting the threshold, you can segment the coloured words in your original image from this grayscale image, and get the other words by thresholding the L channel.
A bitwise operation should be enough to merge the two segmentation images.
If you can get an image with better contrast, a very last step could be a filling based on the contours.
For that, take a look at RETR_FLOODFILL in the function cv2.findContours.
Any other hole-filling function from another package may also fit that purpose.
Here is some code that shows the first part of my idea.
import cv2
import numpy as np
from matplotlib import pyplot as plt
I = cv2.UMat(cv2.imread('/home/smile/QSKN.png',cv2.IMREAD_ANYCOLOR))
Lab = cv2.cvtColor(I,cv2.COLOR_BGR2Lab)
L,a,b = cv2.split(Lab)
Ig = cv2.addWeighted(cv2.UMat(a),0.5,cv2.UMat(b),0.5,0,dtype=cv2.CV_32F)
Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
#k = np.ones((3,3),np.float32)
#k[2,2] = 0
#k*=-1
#
#Ig = cv2.filter2D(Ig,cv2.CV_32F,k)
#Ig = cv2.absdiff(Ig,0)
#Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
_, Ib = cv2.threshold(Ig,0.,255.,cv2.THRESH_OTSU)
_, Lb = cv2.threshold(cv2.UMat(L),0.,255.,cv2.THRESH_OTSU)
_, ax = plt.subplots(2,2)
ax[0,0].imshow(Ig.get(),cmap='gray')
ax[0,1].imshow(L,cmap='gray')
ax[1,0].imshow(Ib.get(),cmap='gray')
ax[1,1].imshow(Lb.get(),cmap='gray')
plt.show()
import numpy as np
from skimage.morphology import selem
from skimage.filters import rank, threshold_otsu
from skimage.util import img_as_float
from PIL import ImageGrab
import matplotlib.pyplot as plt

def preprocessing(image, strelem, s0=30, s1=30, p0=.3, p1=1.):
    image = rank.mean_bilateral(image, strelem, s0=s0, s1=s1)
    condition = (lambda x: x > threshold_otsu(x))(rank.maximum(image, strelem))
    normalize_image = rank.autolevel_percentile(image, strelem, p0=p0, p1=p1)
    return np.where(condition, normalize_image, 0)

# Grab image from clipboard
image = np.array(ImageGrab.grabclipboard())
sel = selem.disk(4)
a = sum([img_as_float(preprocessing(image[:, :, x], sel, p0=0.3)) for x in range(3)]) / 3

fig, ax = plt.subplots(1, 2, sharey=True, sharex=True)
ax[0].imshow(image)
ax[1].imshow(rank.autolevel_percentile(a, sel, p0=.4))
This is my code for clearing noise from text and creating uniform brightness for the characters.
With minor modifications, I used it to address your problem.
My job is to detect red particles in an image and get their sizes. I tried simple blob detection, but it works badly with a colour filter, and extracting red values using HSV also gave me poor results because the image has a small resolution (I work on a Raspberry Pi using a webcam).
Here is a sample picture:
Using the HSV colour space is perfectly fine. If you show the hue and saturation components of the image, you'll see that the red particles have a relatively large hue with a small saturation.
BTW, your image is rather large in resolution. I'm going to downsample for the purposes of fitting the images into the post as well as minimizing processing time. First let's load in your image, resize it down to 25% resolution, then extract out the HSV components:
import cv2
import numpy as np
im = cv2.imread('sample.png')
im_resize = cv2.resize(im, None, None, 0.25, 0.25)
out = cv2.cvtColor(im_resize, cv2.COLOR_BGR2HSV)
stacked = np.hstack([out[...,0], out[...,1]])
cv2.imshow("Hue & Saturation", stacked)
cv2.waitKey(0)
cv2.destroyAllWindows()
I'm also stacking the hue and saturation channels together into a single image so we can see what it looks like and displaying this to the screen.
We get this image:
The combination of a relatively large hue component with a low saturation component is unique in comparison to the rest of the image. Let's do some simple thresholding to extract out those components where we look for areas that have a hue component that is greater than one threshold and a saturation component that is smaller than another threshold:
hue_thresh = 100
saturation_thresh = 32
thresh = np.logical_and(out[...,0] > hue_thresh, out[...,1] < saturation_thresh)
cv2.imshow("Thresholded", 255*(thresh.astype(np.uint8)))
cv2.waitKey(0)
cv2.destroyAllWindows()
I set some tuned thresholds, then use numpy.logical_and to combine both conditions together. Because the image is now of type bool, and images must be of an unsigned or floating-point type to be displayed, we convert the image to uint8 and then multiply by 255.
We now get this image:
As you can see, we extract out the portions that are a reddish hue that is not common with the background. The thresholds will also need to be played around with, but it's fine for this particular example.
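To also get the particle sizes, one option is to run connected components on this thresholded mask and read the per-blob areas (continuing from the snippet above, so cv2 and numpy are already imported and thresh is the boolean mask):

mask8 = 255 * thresh.astype(np.uint8)
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask8, 8, cv2.CV_32S)

for i in range(1, num_labels):  # label 0 is the background
    area = stats[i, cv2.CC_STAT_AREA]  # particle size in pixels (at the 25% scale used above)
    print('particle', i, 'area:', area)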
I can only ever find examples in C/C++, and they never seem to map well to the OpenCV API. I'm loading video frames (both from files and from a webcam) and want to reduce them to 16 colors, but mapped to a 24-bit RGB color space (this is what my output requires: a giant LED display).
I read the data like this:
ret, frame = self._vid.read()
image = cv2.cvtColor(frame, cv2.COLOR_RGB2BGRA)
I did find the below python example, but cannot figure out how to map that to the type of output data I need:
import numpy as np
import cv2
img = cv2.imread('home.jpg')
Z = img.reshape((-1,3))
# convert to np.float32
Z = np.float32(Z)
# define criteria, number of clusters(K) and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret,label,center=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
# Now convert back into uint8, and make original image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))
cv2.imshow('res2',res2)
cv2.waitKey(0)
cv2.destroyAllWindows()
That obviously works for the OpenCV image viewer, but trying to do the same errors out in my output code, since I need an RGB or RGBA format. My output works like this:
for y in range(self.height):
    for x in range(self.width):
        self._led.set(x, y, tuple(image[y, x][0:3]))
Each color is represented as an (r,g,b) tuple.
Any thoughts on how to make this work?
I think the following could be faster than k-means, especially with k = 16 (a rough sketch follows the list).
Convert the color image to gray.
Contrast-stretch this gray image so that the resulting gray levels are between 0 and 255 (use normalize with NORM_MINMAX).
Calculate the histogram of this stretched gray image using 16 as the number of bins (calcHist).
Now you can modify these 16 values of the histogram. For example, you can sort and assign ranks (say 0 to 15), or assign 16 uniformly distributed values between 0 and 255 (I think this could give you a consistent output for a video).
Back-project this histogram onto the stretched gray image (calcBackProject).
Apply a color map to this back-projected image (you might want to scale the back-projected image before applying a colormap, using applyColorMap).
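A rough sketch of those steps, assuming a single BGR frame, the "uniformly distributed values" option, and an arbitrary colormap:

import cv2
import numpy as np

frame = cv2.imread('frame.png')  # hypothetical input frame

# 1-2. Gray + contrast stretch to 0..255
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)

# 3. Histogram of the stretched gray image with 16 bins (shape 16x1, float32)
hist = cv2.calcHist([gray], [0], None, [16], [0, 256])

# 4. Replace the 16 bin values with the desired output levels
#    (here: 16 uniformly distributed values between 0 and 255)
hist[:, 0] = np.linspace(0, 255, 16)

# 5. Back-projection then maps every pixel to the value stored in its bin,
#    giving a 16-level grayscale image
quantized = cv2.calcBackProject([gray], [0], hist, [0, 256], 1)

# 6. Colour the 16 levels
colored = cv2.applyColorMap(quantized, cv2.COLORMAP_JET)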
Tip for kmeans:
If you are using kmeans for video, you can use the cluster centers from the previous frame as the initial positions in kmeans for the current frame. That way, it'll take less time to converge, so kmeans in the subsequent frames will most probably run faster.
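The Python binding of cv2.kmeans takes initial labels rather than initial centers, so one way to reuse the previous frame's centers is to derive labels from them first. A sketch, with Z as in the question's code and prev_centers assumed to come from the previous call:

import cv2
import numpy as np

K = 16
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Assign every pixel to its nearest previous centroid to build the initial labeling
dists = np.linalg.norm(Z[:, None, :] - prev_centers[None, :, :], axis=2)
init_labels = np.argmin(dists, axis=1).astype(np.int32).reshape(-1, 1)

ret, label, center = cv2.kmeans(Z, K, init_labels, criteria, 1, cv2.KMEANS_USE_INITIAL_LABELS)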
You can speed up your processing by applying the k-means on a downscaled version of your image. This will give you the cluster centroids. You can then quantize each pixel of the original image by picking the closest centroid.
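A minimal sketch of that approach, reusing the names from the k-means snippet in the question (the 0.25 scale factor is arbitrary):

import cv2
import numpy as np

img = cv2.imread('home.jpg')
K = 16
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Run k-means on a downscaled copy to get the centroids cheaply
small = cv2.resize(img, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_AREA)
Z = np.float32(small.reshape((-1, 3)))
_, _, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Assign every pixel of the full-resolution image to its nearest centroid
# (for very large frames, process the pixels in chunks to limit memory use)
pixels = np.float32(img.reshape((-1, 3)))
dists = np.linalg.norm(pixels[:, None, :] - center[None, :, :], axis=2)
labels = np.argmin(dists, axis=1)
res2 = np.uint8(center)[labels].reshape(img.shape)  # 16-colour image, still 3-channel BGR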