How to convert YUV_420_888 to BGR using OpenCV in Python?

I have three ndarrays: Y.shape is (307200,), U.shape is (153599,) and V.shape is (153599,). What is an efficient way to convert these to BGR using OpenCV in Python? The arrays are in YUV_420_888 format.
my_image is 640*640.
My code is:
import cv2
import numpy as np

Y = np.fromstring(Y, dtype=np.uint8)
U = np.fromstring(U, dtype=np.uint8)
V = np.fromstring(V, dtype=np.uint8)
Y = np.reshape(Y, (480,640))
U = np.reshape(U, (480,320))
V = np.reshape(V, (480,320))
YUV = np.append(Y, U)
YUV = np.append(YUV, V)
img = np.reshape(YUV, (960,640))
img = np.asarray(img, dtype=np.uint8)
img = cv2.cvtColor(img, cv2.COLOR_YUV2BGR_NV21)

Updated Answer
The information here tells me that an Android NV21 image is stored with all the Y (Luminance) values contiguously and sampled at the full resolution followed by the V and the U samples interleaved and stored at 1/4 the resolution (1/2 the height by 1/2 the width). I have created a dummy NV21 frame below and converted it into OpenCV BGR format and that confirms the layout and the way OpenCV interprets it too. All the code below works in order from top to bottom, so just remove the images and squidge all the lines up together to make a Python script:
#!/usr/bin/env python3
import cv2
import numpy as np
# Define width and height of image
w,h = 640,480
# Create black-white gradient from top to bottom in Y channel
f = lambda i, j: int((i*256)/h)
Y = np.fromfunction(np.vectorize(f), (h,w)).astype(np.uint8)
# DEBUG
cv2.imwrite('Y.jpg',Y)
That gives Y:
# Dimensions of subsampled U and V
UVwidth, UVheight = w//2, h//2
# U is a black-white gradient from left to right
f = lambda i, j: int((j*256)/UVwidth)
U = np.fromfunction(np.vectorize(f), (UVheight,UVwidth)).astype(np.uint8)
# DEBUG
cv2.imwrite('U.jpg',U)
That gives U:
# V is a white-black gradient from left to right
V = U[:,::-1]
# DEBUG
cv2.imwrite('V.jpg',V)
That gives V:
# Interleave U and V: V first for NV21, U first for NV12
U = np.ravel(U)
V = np.ravel(V)
UV = np.empty((U.size+V.size), dtype=np.uint8)
UV[0::2] = V
UV[1::2] = U
# Lay out Y plane, followed by UV
YUV = np.append(Y,UV).reshape((3*h)//2,w)
BGR = cv2.cvtColor(YUV.astype(np.uint8), cv2.COLOR_YUV2BGR_NV21)
cv2.imwrite('result.jpg',BGR)
Which gives this. Hopefully you can see how that is the correct RGB representation of the individual Y, U, and V components.
So, in summary, I believe a 4x2 image in NV21 format is stored with the full Y plane first, followed by interleaved VU, like this:
Y Y Y Y Y Y Y Y  V U V U
and a 4x2 NV12 image is stored with interleaved UV, like this:
Y Y Y Y Y Y Y Y  U V U V
and a 4x2 YUV420 image (Raspberry Pi) is stored fully planar:
Y Y Y Y Y Y Y Y  U U  V V
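Putting that back in terms of the question, here is a minimal sketch for assembling three separate planes into an NV21 buffer and converting it. The function name yuv_planes_to_bgr is just illustrative, and it assumes a full-resolution Y plane plus quarter-resolution U and V planes:
import cv2
import numpy as np

def yuv_planes_to_bgr(Y, U, V, w, h):
    # Interleave the chroma samples, V first for NV21
    UV = np.empty(U.size + V.size, dtype=np.uint8)
    UV[0::2] = V.ravel()
    UV[1::2] = U.ravel()
    # Stack the Y plane on top of the interleaved VU rows
    yuv = np.concatenate([Y.ravel(), UV]).reshape(h * 3 // 2, w)
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR_NV21)

# Dummy planes with the geometry discussed above
w, h = 640, 480
Y = np.full((h, w), 128, np.uint8)
U = np.full((h // 2, w // 2), 128, np.uint8)
V = np.full((h // 2, w // 2), 128, np.uint8)
BGR = yuv_planes_to_bgr(Y, U, V, w, h)  # shape (480, 640, 3)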
Original Answer
I don't have your data to test with and your question is missing some details, but I see no-one is answering you after 5 hours, so I'll try and get you started... no-one said answers have to be complete.
Firstly, I guess from your Y.shape(307200) that your image is 640x480 pixels, correct?
Secondly, your U.shape(153599) and V.shape(153599) look incorrect - with 4:2:0 subsampling, each chroma plane should be a quarter of Y.shape (76,800 samples), since U and V are each sampled down 2:1 in both directions. Being one byte short of half of Y suggests the two buffers are actually interleaved views of the same VU data, which is common with Android's YUV_420_888.
Once you have got that sorted out, I think you need to take your Y array and append the U array, then the V array, so you have one single contiguous array. You then need to pass that to cvtColor() with the code cv2.COLOR_YUV2BGR_NV21.
You may need to reshape your array before appending, something like im = Y.reshape(480,640).
I know when you use the C++ interface to OpenCV, you must set the height of the image to 1.5x the actual height (whilst leaving the width unchanged) - so you may need to try that too.
I can never remember all the constants OpenCV provides for image opening modes (like IMREAD_ANYDEPTH, IMREAD_GRAYSCALE) and for cvtColor(), so here's a handy way of finding them. I start ipython and, if I am looking for the Android NV21 constants, I do:
import cv2
[i for i in dir(cv2) if 'NV21' in i]
Out[29]:
['COLOR_YUV2BGRA_NV21',
'COLOR_YUV2BGR_NV21',
'COLOR_YUV2GRAY_NV21',
'COLOR_YUV2RGBA_NV21',
'COLOR_YUV2RGB_NV21']
So the constant you need is probably COLOR_YUV2BGR_NV21
The same technique works for parameters to imread():
items=[i for i in dir(cv2) if i.startswith('IMREAD')]
In [22]: items
['IMREAD_ANYCOLOR',
'IMREAD_ANYDEPTH',
'IMREAD_COLOR',
'IMREAD_GRAYSCALE',
'IMREAD_IGNORE_ORIENTATION',
'IMREAD_LOAD_GDAL',
'IMREAD_REDUCED_COLOR_2',
'IMREAD_REDUCED_COLOR_4',
'IMREAD_REDUCED_COLOR_8',
'IMREAD_REDUCED_GRAYSCALE_2',
'IMREAD_REDUCED_GRAYSCALE_4',
'IMREAD_REDUCED_GRAYSCALE_8',
'IMREAD_UNCHANGED']

Related

Implementing from scratch cv2.warpPerspective()

I was experimenting with the OpenCV function cv2.warpPerspective when I decided to code it from scratch to better understand its pipeline. Though I followed (hopefully) every theoretical step, it seems I am still missing something, and I am struggling to understand what. Could you please help me?
SRC image (left) and True DST Image (right)
Output of the cv2.warpPerspective overlapped on the True DST
# Invert the homography SRC->DST to DST->SRC
hinv = np.linalg.inv(h)
src = gray1
dst = np.zeros(gray2.shape)
h, w = src.shape
# Remap back and check the domain
for ox in range(h):
    for oy in range(w):
        # Backproject from DST to SRC
        xw, yw, w = hinv.dot(np.array([ox, oy, 1]).T)
        # cv2.INTER_NEAREST
        x, y = int(xw/w), int(yw/w)
        # Check if it falls in the src domain
        c1 = x >= 0 and y < h
        c2 = y >= 0 and y < w
        if c1 and c2:
            dst[x, y] = src[ox, oy]
cv2.imshow(dst + gray2//2)
Output of my code
PS: The output images are the overlapping of Estimated DST and the True DST to better highlight differences.
Your issue amounts to a typo: you mixed up the naming of your coordinates. The homography assumes (x, y, 1) order, which corresponds to (j, i, 1).
Just use (x, y, 1) in the calculation, and (xw, yw, w) in the result of that (then x, y = xw/w, yw/w). The w factor mirrors the math when it is formulated properly.
Avoid indexing into .shape - the indices don't "speak". Just do (height, width) = src.shape[:2] and use those.
I'd recommend fixing the naming scheme, or defining it up top in a comment. I'd stick with x, y instead of i, j, u, v, and extend those with prefixes/suffixes for the space they're in (src/dst/in/out). Perhaps ox, oy for iterating, xw, yw, w for the homography result (which turns into x, y via division), and ix, iy (integerized) for sampling the input. Then you can write dst[oy, ox] = src[iy, ix], as in the sketch below.
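Putting those renames together, a minimal sketch of the corrected loop. Here h_mat stands in for the homography the question calls h (so it no longer collides with the height), and the arrays are dummy stand-ins for gray1/gray2:
import numpy as np

# Stand-ins for the question's data
gray1 = np.random.randint(0, 255, (240, 320), dtype=np.uint8)  # SRC
gray2 = np.zeros_like(gray1)                                   # true DST
h_mat = np.array([[1.0,  0.02, 5.0],
                  [0.01, 1.0,  3.0],
                  [1e-5, 1e-5, 1.0]])                          # homography SRC->DST

hinv = np.linalg.inv(h_mat)          # DST -> SRC
height, width = gray1.shape[:2]      # don't reuse `h` for the shape
dst = np.zeros_like(gray2)

for oy in range(dst.shape[0]):       # output rows (y)
    for ox in range(dst.shape[1]):   # output columns (x)
        xw, yw, w = hinv @ np.array([ox, oy, 1.0])  # (x, y, 1) order
        ix, iy = int(xw / w), int(yw / w)           # nearest neighbour
        if 0 <= ix < width and 0 <= iy < height:
            dst[oy, ox] = gray1[iy, ix]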

Numpy: Efficient mapping of single values in np arrays

I have a 3D numpy array representing an image in HSV color space (shape = (h=1000, w=3000, 3)).
The last dimension of the image is [H, S, V]. I want to subtract 20 from the H channel of every pixel IF its H value is > 20, but leave S and V intact.
I wrote the following vectorized function:
def sub20(x):
    # x is an array in the format [H, S, V]
    H, S, V = x
    return np.uint8([H - 20, S, V])

vec = np.vectorize(sub20, otypes=[np.uint8], signature="(i)->(i)")
img2 = vec(img1)
This vectorised function accepts the last dimension of the image, [H, S, V], and outputs [H-20, S, V].
I don't know how to make it subtract 20 only if H is greater than 20. It also takes a minute to execute, and I want the script to handle a live webcam feed. Is there any way to make it faster?
Thanks
You can simply slice with a condition:
img1[:,:,0][img1[:,:,0]>=20] -= 20
Or also make use of np.where:
img1[:,:,0] = np.where(img1[:,:,0]>=20, img1[:,:,0]-20, img1[:,:,0])
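Either form runs as a single vectorised pass instead of a Python-level function call per pixel, which is where the one-minute runtime went. A rough timing sketch (the array shape is taken from the question; exact numbers depend on your machine):
import numpy as np
import timeit

img1 = np.random.randint(0, 256, (1000, 3000, 3), dtype=np.uint8)
t = timeit.timeit(
    lambda: np.where(img1[:,:,0] >= 20, img1[:,:,0] - 20, img1[:,:,0]),
    number=10)
print(t / 10, "seconds per frame")  # milliseconds rather than minutes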
Do you need to use the vectorize function at all?
Otherwise you could just use the following:
# if you want to make the change directly on the same image.
img1[:,:,0] -= 20
# if you want to leave img1 in the same state.
img2 = np.array(img1)
img2[:,:,0] = img1[:,:,0] - 20
Update (12:08 - 5.4.2020)
To make sure values never drop below 0, compute it in two steps, as Mercury mentioned. Note that on a uint8 image a plain subtraction wraps around instead of going negative, so widen the dtype first and clip (cv2.subtract would saturate at 0 for you as well):
# if you want to make the change directly on the same image.
h = img1[:,:,0].astype(np.int16) - 20
img1[:,:,0] = np.clip(h, 0, 255).astype(np.uint8)
# if you want to leave img1 in the same state.
img2 = np.array(img1)
h2 = img2[:,:,0].astype(np.int16) - 20
img2[:,:,0] = np.clip(h2, 0, 255).astype(np.uint8)
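Since the goal is a live feed, here is a minimal sketch of the conditional-slice approach inside a capture loop. The camera index and window name are placeholders:
import cv2

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h = hsv[:,:,0]            # view into the H channel
    h[h >= 20] -= 20          # vectorised conditional subtract
    out = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    cv2.imshow('result', out)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()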

How do I place a m*n array within a M*N array?

I'm trying to stitch together 30px*30px images into a 3000px*3000px image. If there is a better way than what I'm describing, could you please let me know.
I've created BigArr = np.zeros((3000,3000)). And I have my image arrays (which have dimensions 30px*30px). I want to place the first image so that it occupies all the space between BigArr[0][0] and BigArr[29][29].
Is there an easy way to do this? Is there an even easier way to do what I'm trying to do overall?
Edit: The second image should occupy [0][30] -> [29][59], etc.
Assuming you have exactly as many images as needed to completely fill your BigArr, and your images are stored in a list, you can build a nested list with the desired grid and use np.block to generate such an image:
from matplotlib import pyplot as plt
import numpy as np
# Image sizes (width x height)
img_w, img_h = (40, 30)
f_img_w, f_img_h = (200, 300)
# Number of rows / columns in final image
r = np.int32(f_img_h / img_h)
c = np.int32(f_img_w / img_w)
# Generate some images (stored in N-dimensional array)
imgs = np.ones((img_h, img_w, r * c), np.uint8) * 5 * (np.arange(r * c) + 1)
# Assumed starting point of procedure: All images stored in list
imgs_list = [imgs[:, :, i] for i in np.arange(imgs.shape[2])]
# Generate nested list with desired grid
imgs_nested_list = [[imgs_list[y * c + x] for x in np.arange(c)] for y in np.arange(r)]
# Generate desired image
big_arr = np.block(imgs_nested_list)
plt.figure(1)
plt.imshow(big_arr, vmin=0, vmax=255)
plt.show()
The output from the example then would look like this:
Hope that helps!
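If you'd rather not build the nested list, plain slice assignment into the preallocated array works too. A sketch under the question's geometry (the tiles list here is a dummy stand-in for your 30px*30px images, in row-major order):
import numpy as np

tile_h, tile_w = 30, 30
rows, cols = 100, 100                 # 3000 // 30 in each direction
BigArr = np.zeros((rows * tile_h, cols * tile_w))

# Dummy stand-ins for the real images
tiles = [np.full((tile_h, tile_w), k % 256) for k in range(rows * cols)]

for k, tile in enumerate(tiles):
    r, c = divmod(k, cols)
    BigArr[r*tile_h:(r+1)*tile_h, c*tile_w:(c+1)*tile_w] = tile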
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.8.1
Matplotlib: 3.2.0rc3
NumPy: 1.18.1
----------------------------------------

Numpy image slicing returning black patches/ wrong values

The end goal is to take an image and slice it up into samples that I save. The problem is that my slices randomly come back as black/incorrect patches. Below is a small sample program.
import scipy.ndimage as ndimage
import scipy.misc as misc
import numpy as np

image32 = misc.imread("work0.png")
patches = np.zeros((36, 8, 8))
for i in range(4):
    for j in range(4):
        patches[i*4 + j] = image32[i:i+8, j:j+8]
        misc.imsave("{0}{1}.png".format(i, j), patches[i*4 + j])
An example of my image would be:
The 8x8 patch at (0,0) yields:
Two things:
You are initializing your patch matrix with the wrong data type. By default, numpy makes the patches matrix np.float64, and if you save with that, you won't get the results you expect. Specifically, as Mr. F's answer explains, some scaling is performed on floating-point images: the minimum and maximum values of the image get scaled to black and white respectively, so if an image patch is completely uniform, its minimum and maximum are the same and it gets visualized as black. As such, the best thing is to respect the original image's data type, namely setting the dtype of your patches matrix to np.uint8.
Judging from your for loop indexing, you want to extract out 8 x 8 patches that are non-overlapping. This means that if you have a 32 x 32 image with 8 x 8 patches, you have 16 patches in total arranged in a 4 x 4 grid.
Therefore, you need to change the patches statement so that it has 16 in the first dimension, not 36. In addition, you'll have to adjust the way you're indexing into your image to extract out the 8 x 8 patches because right now, the patches are overlapping. Specifically, you want to make the image patch indexing go from 8*i to 8*(i+1) for the rows and 8*j to 8*(j+1) for the columns. If you substitute sample values of i and j yourself, you'll see that we get unique 8 x 8 patches for each grid in your image.
With both of the above things I noted, the modified code should be:
import scipy.ndimage as ndimage
import scipy.misc as misc
import numpy as np

image32 = misc.imread('work0.png')
patches = np.zeros((16, 8, 8), dtype=np.uint8) # Change

for i in range(4):
    for j in range(4):
        patches[i*4 + j] = image32[8*i:8*(i+1), 8*j:8*(j+1)] # Change
        misc.imsave("{0}{1}.png".format(i, j), patches[i*4 + j])
When I do this and take a look at the output images, I get what I expect.
To be absolutely sure, let's plot the segments using matplotlib. You've conveniently saved all of the patches in patches so it shouldn't be a problem showing what we need. However, I'll place some code in comments so that you can read in the images that were saved from disk with your above code so you can verify that it still works, regardless of looking at patches or the images on disk:
import matplotlib.pyplot as plt

plt.figure()
for i in range(4):
    for j in range(4):
        plt.subplot(4, 4, 4*i + j + 1)
        img = patches[4*i + j]
        # or you can do this:
        # img = misc.imread('{0}{1}.png'.format(i, j))
        img = np.dstack([img, img, img])
        plt.imshow(img)
plt.show()
The weird thing about matplotlib.pyplot.imshow is that a single-channel image (as in your case) with the same intensity all around gets visualized as black no matter what the colour map is, much like what we experienced with imsave. Therefore, I had to artificially make this an RGB image, with all of the channels the same, so it gets visualized as grayscale when we show the image.
We get:
According to this answer the issue is that imsave normalizes the data so that the computed minimum is defined as black (and, if there is a distinct maximum, that is defined as white).
This led me to go digging as to why the suggested use of uint8 did work to create the desired output. As it turns out, in the source there is a function called bytescale that gets called internally.
Actually, imsave itself is a very thin wrapper around toimage followed by save (from the image object). Inside of toimage if mode is None (which it is by default), that's when bytescale gets invoked.
It turns out that bytescale has an if statement that checks for the uint8 data type, and if the data is in that format, it returns the data unaltered. But if not, then the data is scaled according to a max and min transformation (where 0 and 255 are the default low and high pixel values to compare to).
This is the full snippet of code linked above:
if data.dtype == uint8:
    return data

if high < low:
    raise ValueError("`high` should be larger than `low`.")

if cmin is None:
    cmin = data.min()
if cmax is None:
    cmax = data.max()

cscale = cmax - cmin
if cscale < 0:
    raise ValueError("`cmax` should be larger than `cmin`.")
elif cscale == 0:
    cscale = 1

scale = float(high - low) / cscale
bytedata = (data * 1.0 - cmin) * scale + 0.4999
bytedata[bytedata > high] = high
bytedata[bytedata < 0] = 0
return cast[uint8](bytedata) + cast[uint8](low)
For the blocks of your data that are all 255, cscale will be 0, which will be checked for and changed to 1. Then the line
bytedata = (data * 1.0 - cmin) * scale + 0.4999
will result in the whole image block having the float value 0.4999, which the next chunk of code then turns into 0 when it is cast from float to uint8, for example:
In [102]: np.cast[np.uint8](0.4999)
Out[102]: array(0, dtype=uint8)
You can see in the body of bytescale that there are only two possible ways to return: either your data is type uint8 and it's returned as-is, or else it goes through this kind of silly scaling process. So in the end, it is indeed correct, and good practice, to be using uint8 for the pieces of your code that specifically load from or save to an image format via these functions.
So this cascade of stuff is why you were getting all zeros in the outputted image file and why the other suggestion of using dtype=np.uint8 actually helps you. It's not because you need to avoid floating point data for images, just because of this bizarre convention to check and scale data on the part of imsave.
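A tiny sketch to watch this bite, using the old scipy.misc API discussed above (it was removed in later SciPy releases): a uniform float patch comes out black after the rescale, while the same data as uint8 is saved untouched:
import numpy as np
import scipy.misc as misc  # old scipy; imsave/bytescale are gone in modern versions

patch = np.full((8, 8), 255.0)                 # uniform float patch
misc.imsave('float_patch.png', patch)          # rescaled -> all black
misc.imsave('uint8_patch.png', patch.astype(np.uint8))  # saved as-is -> white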

Numpy manipulating array of True values dependent on x/y index

So I have a large array (2048x2048), and I would like to do some element-wise operations that depend on where the elements are. I'm very confused about how to do this (I was told not to use for loops, and when I tried them anyway my IDE froze and everything ran really slowly).
Onto the question:
h = aperatureimage
h[:,:] = 0
indices = np.where(aperatureimage>1)
for True in h:
    h[index] = np.exp(1j*k*z)*np.exp(1j*k*(x**2+y**2)/(2*z))/(1j*wave*z)
So I have an index, which is (I'm assuming here) essentially a 'cropped' version of my larger aperatureimage array. *Note: aperatureimage is a grayscale image converted to an array; it has a shape or text on it, and I would like to find all the 'white' regions of the aperature and perform my operation on them.
How can I access the individual x/y values of index so that I can perform my exponential operation? When I try index[:,None], the program spits out 'ValueError: broadcast dimensions too large'. I also get 'array is not broadcastable to correct shape'. Any help would be appreciated!
One more clarification: x and y are the only values I would like to change (essentially the points in my array where there is white); z, k, and whatever else are defined previously.
EDIT:
I'm not sure the code I posted above is correct; it returns two empty arrays. When I do this, though:
index = (aperatureimage==1)
print len(index)
Actually, nothing I've done so far works correctly. I have a 2048x2048 image with a 128x128 white square in the middle of it. I would like to convert this image to an array, look through all the values and determine the index values (x,y) where the array is not black (I only have white/black, bilevel image didn't work for me). I would then like to take all the values (x,y) where the array is not 0, and multiply them by the h[index] value listed above.
I can post more information if necessary. If you can't tell, I'm stuck.
EDIT2: Here's some code that might help - I think I have the problem above solved (I can now access members of the array and perform operations on them). But for some reason the Fx values in my for loop never increase - it loops over Fy forever...
import sys, os
from scipy.signal import *
import numpy as np
import Image, ImageDraw, ImageFont, ImageOps, ImageEnhance, ImageColor

def createImage(aperature, type):
    imsize = aperature*8  # Add 0 padding to make it nice
    middle = imsize/2  # The middle (physical 0) of our image will be the imagesize/2
    im = Image.new("L", (imsize, imsize))  # Make a grayscale image with imsize*imsize pixels
    draw = ImageDraw.Draw(im)  # Create a new draw method
    box = ((middle-aperature/2, middle-aperature/2), (middle+aperature/2, middle+aperature/2))  # Bounding box for aperature
    if type == 'Rectangle':
        draw.rectangle(box, fill='white')  # Draw rectangle in the box and color it white
    del draw
    return im, middle

def Diffraction(aperaturediameter=1, type='Rectangle', z=2000000, wave=.001):
    # Constants
    deltaF = 1/8  # Image will be 8mm wide
    z = 1/3.
    wave = 0.001
    k = 2*pi/wave
    # Now let's get to work
    aperature = aperaturediameter * 128  # Aperaturediameter (in mm) to some pixels
    im, middle = createImage(aperature, type)  # Create an image depending on type of aperature
    aperaturearray = np.array(im)  # Turn image into numpy array
    # Fourier Transform of Aperature
    Ta = np.fft.fftshift(np.fft.fft2(aperaturearray))/(len(aperaturearray))
    # Transforming and calculating of Transfer Function Method
    H = aperaturearray.copy()  # Copy image so H (transfer function) has the same dimensions as aperaturearray
    H[:,:] = 0  # Set H to 0
    U = aperaturearray.copy()
    U[:,:] = 0
    index = np.nonzero(aperaturearray)  # Find nonzero elements of aperaturearray
    H[index[0], index[1]] = np.exp(1j*k*z)*np.exp(-1j*k*wave*z*((index[0]-middle)**2+(index[1]-middle)**2))  # Free space transfer for ap array
    Utfm = abs(np.fft.fftshift(np.fft.ifft2(Ta*H)))  # Compute intensity at distance z
    # Fourier Integral Method
    apindex = np.nonzero(aperaturearray)
    U[index[0], index[1]] = aperaturearray[index[0], index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
    Ufim = abs(np.fft.fftshift(np.fft.fft2(U))/len(U))
    # Save image
    fim = Image.fromarray(np.uint8(Ufim))
    fim.save("PATH\Fim.jpg")
    ftfm = Image.fromarray(np.uint8(Utfm))
    ftfm.save("PATH\FTFM.jpg")
    print "that may have worked..."
    return

if __name__ == '__main__':
    Diffraction()
You'll need numpy, scipy, and PIL to work with this code.
When I run this, it goes through the code, but there is no data in the output images (everything is black). Now I have a real problem here, as I don't entirely understand the math I'm doing (this is for HW), and I don't have a firm grasp on Python.
U[index[0],index[1]] = aperaturearray[index[0],index[1]] * np.exp(1j*k*((index[0]-middle)**2+(index[1]-middle)**2)/(2*z))
Should that line work for performing elementwise calculations on my array?
Could you perhaps post a minimal, yet complete, example? One that we can copy/paste and run ourselves?
In the meantime, in the first two lines of your current example:
h = aperatureimage
h[:,:] = 0
you set both 'aperatureimage' and 'h' to 0. That's probably not what you intended. You might want to consider:
h = aperatureimage.copy()
This generates a copy of aperatureimage, while your code simply points h at the same array as aperatureimage, so changing one changes the other.
Be aware, copying very large arrays might cost you more memory than you would prefer.
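A tiny demonstration of that aliasing (the names are illustrative):
import numpy as np

a = np.ones((2, 2))
h = a              # h and a are the same array
h[:,:] = 0
print(a)           # all zeros: a changed too

h2 = a.copy()      # independent copy; writing to h2 leaves a alone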
What I think you are trying to do is this:
import numpy as np
N = 2048
M = 64
a = np.zeros((N, N))
a[N//2-M:N//2+M, N//2-M:N//2+M] = 1
x,y = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
b = a.copy()
indices = np.where(a>0)
b[indices] = np.exp(x[indices]**2+y[indices]**2)
Or something similar. This, in any case, sets some values in 'b' based on the x/y coordinates where 'a' is bigger than 0. Try visualizing it with imshow. Good luck!
Concerning the edit
You should normalize your output so it fits in an 8-bit integer. Currently, one of your arrays has a maximum value much larger than 255 and the other has a maximum much smaller. Try this instead:
fim = Image.fromarray(np.uint8(255*Ufim/np.amax(Ufim)))
fim.save("PATH\Fim.jpg")
ftfm = Image.fromarray(np.uint8(255*Utfm/np.amax(Utfm)))
ftfm.save("PATH\FTFM.jpg")
Also consider np.zeros_like() instead of copying and clearing H and U.
Finally, I personally very much like working with ipython when developing something like this. If you put the code from your Diffraction function at the top level of your script (in place of the if __name__ == '__main__' block), you can access the variables directly from ipython. A quick command like np.amax(Utfm) would show you that there are indeed values != 0, and imshow() is always nice for looking at matrices.
