Python 3.10,
opencv-python==4.5.4.60
I'm having a hard time understanding the color format of screenshots taken by MSS.
Here is the portion of my screen that I am screenshotting (with correct/expected colors, taken manually):
Here is the screenshot code I'm working with:
import mss
import numpy as np

with mss.mss() as sct:
    monitor = rect  # `rect` is defined elsewhere
    res = np.array(sct.grab(monitor))[:, :, :3]  # remove alpha
When I use cv2.imwrite() or cv2.imshow(), the screenshot looks as expected. If I attempt to create masks on this image based on [R, G, B] colors, it produces opposite results, implying that the default color space of MSS screenshots is BGRA.
If I do the following:
res = cv2.cvtColor(res, cv2.COLOR_BGRA2RGB)
cv2.imshow("res", res)
cv2.waitKey(0)
It produces this:
However, then I am able to perform RGB masking properly.
I may have just answered my own question, but is it correct to say that MSS screenshots are in BGRA format, and OpenCV expects BGR, which would explain why RGB images have reversed colors when displayed with cv2.imshow() or saved with cv2.imwrite()?
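To make that concrete, here is a minimal sketch (the monitor rectangle and the red range are made-up examples) of handling both display and RGB-based masking:

import mss
import numpy as np
import cv2

with mss.mss() as sct:
    monitor = {"top": 0, "left": 0, "width": 640, "height": 480}  # hypothetical rect
    frame = np.array(sct.grab(monitor))  # BGRA, which is what mss produces

bgr = frame[:, :, :3]  # dropping alpha leaves BGR, exactly what cv2.imshow()/imwrite() expect
cv2.imshow("bgr", bgr)

rgb = cv2.cvtColor(frame, cv2.COLOR_BGRA2RGB)  # convert only when you want to mask in RGB terms
lower = np.array([200, 0, 0], dtype="uint8")   # e.g. reddish pixels, expressed as [R, G, B]
upper = np.array([255, 80, 80], dtype="uint8")
mask = cv2.inRange(rgb, lower, upper)
cv2.waitKey(0)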
I'm using the Google Vision API to extract the text from some pictures; however, I have been trying to improve the accuracy (confidence) of the results with no luck.
Every time I change the image from the original, I lose accuracy in detecting some characters.
I have isolated the issue to the input having multiple colors for different words: it can be seen that words in red, for example, have incorrect results more often than the other words.
Example:
Some variations of the image, in grayscale and black & white:
What ideas can I try to make this work better? Specifically, changing the colors of the text to a uniform color, or just black on a white background, since most algorithms expect that?
Some ideas I have already tried, along with some thresholding:
from PIL import ImageEnhance, ImageOps

dimg = ImageOps.grayscale(im)
cimg = ImageOps.invert(dimg)
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)  # note: a factor of 1 returns the image unchanged
sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)  # same here; values above 1 actually sharpen
I can only offer a butcher's solution, potentially a nightmare to maintain.
In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.
My prerequisites:
I knew exactly in which area of the screen the text was going to go.
I knew exactly which fonts and colors were going to be used.
the text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
I could not reliably detect text changes, so I could not average frames to reduce the interference.
What I did:
- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average color to determine the text color, then access the whole set of bitmaps generated from characters in all available fonts in that color. Then it would determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next position.
(Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)
In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
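For illustration only, a Python sketch of that per-position template-matching idea might look like the following (the glyph store and the fixed kerning are hypothetical; the original was a bespoke C program):

import numpy as np

def best_match(cell, glyphs):
    """Return the character whose prerendered bitmap is closest to `cell`.

    cell:   HxW uint8 crop of the screen at the current pen position
    glyphs: dict of character -> HxW uint8 bitmap, prerendered offline in
            the detected color and font (hypothetical glyph store)
    """
    scores = {ch: np.abs(cell.astype(np.int16) - g.astype(np.int16)).sum()
              for ch, g in glyphs.items()}
    return min(scores, key=scores.get)

def read_line(row, kerning, glyphs):
    """Walk left to right in fixed kerning steps, matching one glyph per step."""
    h, w = next(iter(glyphs.values())).shape
    return "".join(best_match(row[:, x:x + w], glyphs)
                   for x in range(0, row.shape[1] - w + 1, kerning))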
You tried almost every standard step. I would advise you to try some of PIL's built-in filters, like the sharpness filter. Apply sharpness and contrast to the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together.
Or convert your image to YUV and then use just the Y channel for further processing.
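A rough sketch of both suggestions (the file name and the threshold of 128 are arbitrary assumptions):

import cv2
from PIL import Image

# binarise each colour channel separately, then recombine
im = Image.open('text.png').convert('RGB')
channels = [ch.point(lambda p: 255 if p > 128 else 0) for ch in im.split()]
recombined = Image.merge('RGB', channels)

# or: keep only the luma (Y) channel of a YUV conversion
bgr = cv2.imread('text.png')
y = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)[:, :, 0]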
Also, if you do not have a monochrome background, consider performing some background subtraction.
What tesseract likes when detecting scanned text is frames removed, so you can try to remove as much non-character space from the image as possible. (You might need to keep the picture size, though, so you should replace the removed areas with white.) Tesseract also likes straight lines, so some deskewing might be in order if your text is recorded at an angle. Tesseract also sometimes gives better results if you resize the image to twice its original size.
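The resize tip, for instance, is a one-liner with PIL (file name assumed):

from PIL import Image

im = Image.open('text.png')
im = im.resize((im.width * 2, im.height * 2), Image.LANCZOS)  # 2x often helps tesseract on small text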
I suspect that Google Vision uses tesseract, or portions of it, but I have no idea what other preprocessing it does for you. So some of my advice here might already be implemented, and doing it again would be unnecessary and repetitive.
You will need to pre-process the image more than once and use a bitwise_or operation to combine the results. To extract the colors, you could use:
import cv2
import numpy as np

# `image` is assumed to be a BGR array already loaded, e.g. with cv2.imread()
boundaries = [  # BGR colorspace for opencv, *not* RGB
    ([15, 15, 100], [50, 60, 200]),    # red
    ([85, 30, 2], [220, 90, 50]),      # blue
    ([25, 145, 190], [65, 175, 250]),  # yellow
]

final_mask = np.zeros(image.shape[:2], dtype="uint8")
for (low, high) in boundaries:
    low = np.array(low, dtype="uint8")
    high = np.array(high, dtype="uint8")
    # find the colors within the specified boundaries and apply the mask
    mask = cv2.inRange(image, low, high)
    final_mask = cv2.bitwise_or(final_mask, mask)  # combine the per-color masks
    bitWise = cv2.bitwise_and(image, image, mask=mask)
    # now here is the image masked with the specific color boundary...
Once you have the masked image, you can do another bitwise_or operation on your to-be "final" image, essentially adding this mask to it (that is what final_mask accumulates in the loop above).
This specific implementation uses OpenCV, but the same principle applies to other image packages.
I need a little more context on this.
How many calls are you going to do to the Google Vision API? If you are doing this throughout a whole stream, you'd probably need to get a paid subscription.
What are you going to do with this data? How accurate does the OCR need to be?
Assuming you get this snapshot from someone else's Twitch stream, and are dealing with the streamer's video compression and network connectivity, you're going to get a pretty blurry snapshot, so OCR is going to be pretty tough.
The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, here is one approach you could try:
Binarize the image to get the non-red text in white and the background in black, as in your binarized image:
from PIL import Image

def binarize_image(im, threshold):
    """Binarize an image."""
    image = im.convert('L')  # convert image to monochrome
    bin_im = image.point(lambda p: p > threshold and 255)
    return bin_im

im = Image.open("game_text.JPG")
binarized = binarize_image(im, 100)
Extract only the red text values with a filter, then binarize the result:
import cv2
import numpy as np
from matplotlib import pyplot as plt

im_bgr = cv2.cvtColor(np.array(im), cv2.COLOR_RGB2BGR)  # cv2 wants a BGR array, not a PIL image

lower = np.array([15, 15, 100], dtype="uint8")
upper = np.array([50, 60, 200], dtype="uint8")
mask = cv2.inRange(im_bgr, lower, upper)
red_binarized = cv2.bitwise_and(im_bgr, im_bgr, mask=mask)

plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
plt.show()
However, even with this filtering, it still doesn't extract red well.
Add the images obtained in (1.) and (2.):

# binarized is a PIL image and red_binarized a BGR array, so align the types;
# a bitwise OR avoids uint8 overflow where both images are white
combined_image = cv2.bitwise_or(np.array(binarized), cv2.cvtColor(red_binarized, cv2.COLOR_BGR2GRAY))
Do OCR on (3.)
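The OP is calling the Google Vision API, but as a quick local sanity check the same combined image can be fed to pytesseract (a stand-in here, not the OP's setup):

import pytesseract

print(pytesseract.image_to_string(combined_image))  # OCR the merged binarized text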
This is not a full solution, but it may lead to something better.
By converting your data from BGR (or RGB) to CIE Lab, you can process a grayscale image as the weighted sum of the colour channels a* and b*.
This grayscale image will enhance the colour regions of the text.
By adapting the threshold, you can segment the coloured words in your original image from this grayscale image, and get the other words by thresholding the L channel.
A bitwise and operator should be enough to merge the two segmentation images.
If you can get an image with better contrast, a very last step could be a filling based on the contours.
For that, take a look at the RETR_FLOODFILL mode of the function cv2.findContours.
Any other hole-filling function from another package may also fit that purpose.
Here is some code that shows the first part of my idea:
import cv2
import numpy as np
from matplotlib import pyplot as plt
I = cv2.UMat(cv2.imread('/home/smile/QSKN.png',cv2.IMREAD_ANYCOLOR))
Lab = cv2.cvtColor(I,cv2.COLOR_BGR2Lab)
L,a,b = cv2.split(Lab)
Ig = cv2.addWeighted(cv2.UMat(a),0.5,cv2.UMat(b),0.5,0,dtype=cv2.CV_32F)
Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
#k = np.ones((3,3),np.float32)
#k[2,2] = 0
#k*=-1
#
#Ig = cv2.filter2D(Ig,cv2.CV_32F,k)
#Ig = cv2.absdiff(Ig,0)
#Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
_, Ib = cv2.threshold(Ig,0.,255.,cv2.THRESH_OTSU)
_, Lb = cv2.threshold(cv2.UMat(L),0.,255.,cv2.THRESH_OTSU)
_, ax = plt.subplots(2,2)
ax[0,0].imshow(Ig.get(),cmap='gray')
ax[0,1].imshow(L,cmap='gray')
ax[1,0].imshow(Ib.get(),cmap='gray')
ax[1,1].imshow(Lb.get(),cmap='gray')
plt.show()
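And a rough sketch of the filling step mentioned above; RETR_FLOODFILL needs a CV_32SC1 input, so this variant simply fills external contours instead (OpenCV 4 return signature assumed):

# fill the character contours found in the binary image Ib from above
Ib_mat = Ib.get()  # back from UMat to a regular array
contours, _ = cv2.findContours(Ib_mat, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
filled = np.zeros_like(Ib_mat)
cv2.drawContours(filled, contours, -1, 255, thickness=cv2.FILLED)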
import numpy as np
from skimage.morphology import selem
from skimage.filters import rank, threshold_otsu
from skimage.util import img_as_float
from PIL import ImageGrab
import matplotlib.pyplot as plt

def preprocessing(image, strelem, s0=30, s1=30, p0=.3, p1=1.):
    # smooth while preserving edges, keep only regions that pass an Otsu
    # threshold of a local maximum filter, and normalize by percentiles
    image = rank.mean_bilateral(image, strelem, s0=s0, s1=s1)
    condition = (lambda x: x > threshold_otsu(x))(rank.maximum(image, strelem))
    normalize_image = rank.autolevel_percentile(image, strelem, p0=p0, p1=p1)
    return np.where(condition, normalize_image, 0)

# Grab image from clipboard
image = np.array(ImageGrab.grabclipboard())
sel = selem.disk(4)
a = sum([img_as_float(preprocessing(image[:, :, x], sel, p0=0.3)) for x in range(3)]) / 3

fig, ax = plt.subplots(1, 2, sharey=True, sharex=True)
ax[0].imshow(image)
ax[1].imshow(rank.autolevel_percentile(a, sel, p0=.4))
plt.show()
This is my code for clearing noise from text and giving the characters uniform brightness.
With minor modifications, I used it to solve your problem.
I am new to OpenCV, so please bear with me if my question seems silly to you.
I have a set of images that all have a transparent border on the left and right, like you can see below:
I want to erase these borders, so I thought about edge detection, which would be easy to do if I could turn these transparent borders white. In the docs I found that you can do this:
import cv2

img = cv2.imread("./Green/image-000.png", 1)  # flag 1 (IMREAD_COLOR) drops the alpha channel
cv2.imwrite('../image-000.png', img)
This erases the alpha channel of the PNG image, but the formerly transparent areas turn black.
Is there something similar that turns the borders white?
Or is there even a simpler method of erasing these borders?
You would make me really happy if you could help me!
PS: I use Python 2.7 and OpenCV 3.4
You should load the image with IMREAD_UNCHANGED, i.e.
import cv2 as cv
img = cv.imread("./Green/image-000.png", cv.IMREAD_UNCHANGED)
Then your image will have 4 channels (BGRA), and you can use the alpha channel as a mask to turn the corresponding parts white:
alpha_channel = img[:, :, 3]
_, mask = cv.threshold(alpha_channel, 254, 255, cv.THRESH_BINARY)  # binarize mask: 255 where fully opaque
color = img[:, :, :3]
# the inner not inverts the color only where the mask is set (zeros elsewhere);
# the outer not restores the color there and turns the unmasked area white
new_img = cv.bitwise_not(cv.bitwise_not(color, mask=mask))
I tested this code with a transparent PNG where the color channels were black and the information was in the transparency:
The nested bitwise_not is ugly, but it was the only way I found to make it work.
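An equivalent, perhaps more readable sketch (same assumptions about img as above) uses boolean indexing instead of the double inversion:

import cv2 as cv
import numpy as np

img = cv.imread("./Green/image-000.png", cv.IMREAD_UNCHANGED)
color = img[:, :, :3].copy()
color[img[:, :, 3] < 255] = (255, 255, 255)  # paint every non-opaque pixel white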
I have seen an image of a girl which is made up of multiple smaller images of her, so I want to achieve the same thing using a Python script. (I am completely new to image processing.)
I am using the PIL library for this script.
import sys, os
from PIL import Image

img = Image.open("DSC_0149.jpg")
pixels = img.load()
for i in range(img.size[0]):
    for j in range(img.size[1]):
        pixels[i, j] = (i, j, 100)  # I will change this to some pic image.
img.show()
First I am just trying to change the colour of the pixels while retaining the picture, but this code didn't work.
Can anyone guide me on how to achieve this?
Edit : I want to fill the picture with multiple pictures and yet RETAIN the original picture.
Something like this : http://www.photoshopessentials.com/photo-effects/photo-fill/ but in a much better way.
So first you need to edit each pixel to change the color:
If it is RGB:
img.putpixel((10,15),(r,g,b))
or, faster:
pixels[1, 1] = (r, g, b)
otherwise:
Is it possible to change the color of one individual pixel in Python?
After knowing how to edit each pixel, you have to create a small copy of your image with a resize, like this:
Copy Image:
# Not tested: make sure it's RGB
img = Image.new('RGB', (img.size[0], img.size[1]), "black")  # create a new black image
pixels = img.load()  # create the pixel map
for i in range(img.size[0]):  # for every pixel:
    for j in range(img.size[1]):
        pixels[i, j] = other_image_pixel[i, j]  # set the colour accordingly
https://opensource.com/life/15/2/resize-images-python
Apply a colour filter to each small image to match the colour of the area you will replace with that image.
The best way to understand the whole process is to take the time to read this code in the same language; it's around 200 lines:
https://github.com/codebox/mosaic
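To get a feel for the core idea before reading that repo, here is a minimal, hypothetical sketch: for every grid cell of the target, pick the tile whose average colour is closest to the cell's average colour:

import numpy as np
from PIL import Image

def build_mosaic(target, tiles, cell=16):
    """Replace each cell x cell block of `target` with the closest tile.

    tiles: list of PIL images, each already resized to cell x cell (assumed).
    """
    target = target.convert('RGB')
    w = (target.size[0] // cell) * cell
    h = (target.size[1] // cell) * cell
    out = Image.new('RGB', (w, h))
    tile_means = [np.array(t.convert('RGB')).reshape(-1, 3).mean(axis=0) for t in tiles]
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            block = np.array(target.crop((x, y, x + cell, y + cell)))
            mean = block.reshape(-1, 3).mean(axis=0)
            best = min(range(len(tiles)),
                       key=lambda i: np.sum((tile_means[i] - mean) ** 2))
            out.paste(tiles[best], (x, y))
    return out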
Hope it solves your problem.
Is there any way to make an image half transparent?
The pseudo code is something like this:
from PIL import Image
image = Image.open('image.png')
image = alpha(image, 0.5)
I googled it for a couple of hours but I can't find anything useful.
I realize this question is really old, but the current version of Pillow (v4.2.1) has a function called putalpha. It seems to work fine for me. I don't know if it will work in every situation where you need to change the alpha, but it does work: it sets the alpha value for every pixel in the image. It seems, though, that you can use a mask: http://www.leancrew.com/all-this/2013/11/transparency-with-pil/.
Use putalpha like this:
from PIL import Image

img = Image.open(image)  # `image` is your input path
img.putalpha(127)  # half alpha; the alpha argument must be an int
img.save(dest)  # `dest` is your output path
Could you do something like this?
from PIL import Image

image = Image.open('image.png')  # open image
image = image.convert("RGBA")  # convert to RGBA
rgba = image.getpixel((x, y))  # get the RGBA value at coordinates x, y
# getpixel returns a tuple, so build a new one with the alpha halved
# (or set it to a fixed value, e.g. 50)
rgba = rgba[:3] + (rgba[3] // 2,)
image.putpixel((x, y), rgba)  # put the modified RGBA values back at the same coordinates
Definitely not the most efficient way of doing things, but it might work. I wrote the code in the browser, so it might not be error-free, but hopefully it gives you an idea.
EDIT: Just noticed how old this question was. Leaving answer anyways for future help. :)
I put together Pecan's answer and cr333's question from this question:
Using PIL to make all white pixels transparent?
... and came up with this:
from PIL import Image

opacity_level = 170  # Opaque is 255, input between 0-255

img = Image.open('img1.png')
img = img.convert("RGBA")
datas = img.getdata()

newData = []
for item in datas:
    if item[0] == 0 and item[1] == 0 and item[2] == 0:
        newData.append((0, 0, 0, opacity_level))
    else:
        newData.append(item)

img.putdata(newData)
img.save("img2.png", "PNG")
In my case, I have text with black background and wanted only the background semi-transparent, in which case:
from PIL import Image

opacity_level = 170  # Opaque is 255, input between 0-255

img = Image.open('img1.png')
img = img.convert("RGBA")
datas = img.getdata()

newData = []
for item in datas:
    if item[0] == 0 and item[1] == 0 and item[2] == 0:
        newData.append((0, 0, 0, opacity_level))
    else:
        newData.append(item)

img.putdata(newData)
img.save("img2.png", "PNG")
I had an issue where black boxes were appearing around my image when applying putalpha().
This workaround (applying alpha in a copied layer) solved it for me.
from PIL import Image

with Image.open("file.png") as im:
    im2 = im.copy()
    im2.putalpha(180)
    im.paste(im2, im)
    im.save("file2.png")
Explanation:
putalpha modifies all pixels by setting their alpha value, so fully transparent pixels become only partially transparent. The code posted above first sets (putalpha) all pixels to semi-transparent in a copy, then copies (paste) all pixels back to the original image, using the original alpha values as a mask. This means that fully transparent pixels in the original image are skipped during the paste.
Credit: https://github.com/nulano # https://github.com/python-pillow/Pillow/issues/4687#issuecomment-643567573
I just did this myself... even though my code may be a little bit weird... it works fine, so I share it here. Hope it helps somebody. =)
The idea: making a pic transparent means lowering its alpha, which is the 4th element in each pixel tuple.
My basic code:
from PIL import Image

img = Image.open(image)  # `image` is your file path
img = img.convert('RGBA')  # you can make sure your pic is in the right mode by checking img.mode
data = img.getdata()  # you'll get a list of tuples

newData = []
for a in data:
    a = a[:3]  # you'll get your tuple shortened to RGB
    a = a + (100,)  # change the 100 to any transparency number you like between (0, 255)
    newData.append(a)

img.putdata(newData)  # you'll get your new img ready
img.save('filename.filetype')  # e.g. 'out.png'
I didn't find a command to do this job automatically, so I wrote this myself. Hope it helps again. XD
This method helps reduce the opacity of a logo with transparency before pasting it over an image:
# pip install Pillow
# PIL.__version__ is 9.3.0
from PIL import Image, ImageEnhance

im = Image.open('logo.png').convert('RGBA')
alpha = im.split()[3]  # the alpha band
alpha = ImageEnhance.Brightness(alpha).enhance(.5)  # scale every alpha value by 0.5
im.putalpha(alpha)
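The faded logo can then be pasted using itself as the mask, so its remaining transparency is respected (the background path and offset are made-up examples):

background = Image.open('background.png').convert('RGBA')
background.paste(im, (10, 10), im)  # third argument: use im's alpha as the paste mask
background.save('result.png')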