Create depth map image as 24-bit (Carla) - python

I have a depth map encoded in 24 bits (labeled "Original").
With the code below:
carla_img = cv.imread('carla_deep.png', flags=cv.IMREAD_COLOR)
carla_img = carla_img[:, :, :3]
carla_img = carla_img[:,:,::-1]  # BGR -> RGB
gray_depth = ((carla_img[:,:,0] + carla_img[:,:,1] * 256.0 + carla_img[:,:,2] * 256.0 * 256.0)/((256.0 * 256.0 * 256.0) - 1))
gray_depth = gray_depth * 1000
I am able to convert it as in the "Converted" image.
As shown here: https://carla.readthedocs.io/en/latest/ref_sensors/
How can I reverse this process (without using any large external libraries, at most OpenCV)? In Python I create a depth map with OpenCV, and I want to save the result in CARLA's 24-bit format.
This is how I create the depth map:
imgL = cv.imread('leftImg.png',0)
imgR = cv.imread('rightImg.png',0)
stereo = cv.StereoBM_create(numDisparities=128, blockSize=17)
disparity = stereo.compute(imgL,imgR)
CameraFOV = 120
Focus_length = width / (2 * math.tan(CameraFOV * math.pi / 360))  # width = image width in pixels
camerasBaseline = 0.3
depthMap = (camerasBaseline * Focus_length) / disparity
How can I save the obtained depth map in the same form as in the picture marked "Original"?

Docs say:
normalized = (R + G * 256 + B * 256 * 256) / (256 * 256 * 256 - 1)
in_meters = 1000 * normalized
So if you have a depth map in_meters, you do the reverse by rearranging the equations.
You need to make sure your depth map (from block matching) is in units of meters. Your calculations there look sensible, assuming your cameras have a baseline of 0.3 meters.
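One practical detail worth checking (an assumption about the block-matching step, not something stated in the question): OpenCV's StereoBM returns disparities as 16-bit fixed-point values scaled by 16, so something along these lines is usually needed before converting to depth:
import numpy as np

# StereoBM/StereoSGBM return 16 * disparity as int16 (fixed point)
disparity = stereo.compute(imgL, imgR).astype(np.float32) / 16.0
disparity[disparity <= 0] = np.nan  # mark invalid / zero-disparity pixels
depthMap = (camerasBaseline * Focus_length) / disparity  # meters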
First variant
Take the calculation apart, using division and modulo operations.
Various .astype calls are required to turn floats into integers, and wider integers into narrower integers (the usual assumption for pictures).
normalized = in_meters / 1000
BGR = (normalized * (2**24 - 1)).astype(np.uint32)  # the combined 24-bit value
BG, R = np.divmod(BGR, 2**8)   # R is the low byte
B, G = np.divmod(BG, 2**8)
carla_img = np.dstack([B, G, R]).astype(np.uint8)  # BGR order
Second variant
One could also do this with a view, reinterpreting the uint32 data as four uint8 values. This assumes a little endian system, which is a fair assumption but one needs to be aware of it.
...
reinterpreted = BGR.view(np.uint8) # lowest byte first, i.e. order is RGBx
reinterpreted.shape = BGR.shape + (4,) # np.view refuses to add a dimension
carla_img = reinterpreted[:,:,(2,1,0)] # select BGR
# this may require a .copy() to get data without holes (OpenCV may want this)
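Putting either variant together with a save step, a minimal end-to-end sketch (untested, assuming depthMap is already in meters and clipped to CARLA's 0-1000 m range; the output file name is made up):
import numpy as np
import cv2 as cv

def depth_meters_to_carla(depth_m):
    # encode a depth map in meters into CARLA's 24-bit BGR format
    normalized = np.clip(depth_m / 1000.0, 0.0, 1.0)
    value = (normalized * (2**24 - 1)).astype(np.uint32)
    bg, r = np.divmod(value, 2**8)   # r is the low byte
    b, g = np.divmod(bg, 2**8)
    return np.dstack([b, g, r]).astype(np.uint8)  # BGR, ready for cv.imwrite

carla_bgr = depth_meters_to_carla(depthMap)
cv.imwrite('carla_depth_out.png', carla_bgr)

# round-trip check: decode again and compare against the input
rgb = carla_bgr[:, :, ::-1].astype(np.float64)
meters = 1000.0 * (rgb[:, :, 0] + rgb[:, :, 1] * 256.0 + rgb[:, :, 2] * 256.0**2) / (256.0**3 - 1)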
Disclaimer
I could not test the code in this answer because you haven't provided usable data.

Related

Append an array to an image in python

I'm trying to create a piece of data for a CNN in tensorflow. The image corresponds to a state in my environment. I'd like to take the state (the array) and append it to the image for model input.
Some conditions are that the original structure of the image needs to stay intact, so when I convert the array back to an image, I can see the original image plus the appended array. The final array can have any number of rows, but the number of columns and channels cannot change.
So in other words, I'd like to reshape an array of up to max_length (say, 2,000,000) into a matrix of n rows, 1920 columns and 3 channels, padding with 0s if necessary, and then append that to an image of shape (1080, 1920, 3). I have a feeling my approach is more complicated than it needs to be.
My attempt was to convert the state and image to lists for faster array transformations, add them together sequentially, and then reshape them into a numpy array. I use padding to constrain the array to n rows x 1920 x 3.
The variable max_length is the max length of the padding.
input_space is the state or array I want to append to the image. Its length can be anywhere between 1 and max_length, which is why I use padding, so the state size doesn't change per loop.
The code under the aspect-ratio comment is used for additional padding, to ensure the array is constrained to the required dimensions.
def get_state(input_space, img):
    img = list(np.concatenate(img).flat)
    pad_length = (max_length - len(input_space))
    state = img + input_space + [0] * pad_length
    # get pad length for 1920 aspect ratio
    aspect_ratio = (len(state) / 3) / 1920
    remainder = aspect_ratio - np.fix(aspect_ratio)
    aspect_ratio = round(((1 - remainder) * 1920) * 3)
    # pad state to shape into 1920 x n rows and 3 channels
    state = state + [0] * aspect_ratio
    rows = round((len(state) / 3) / 1920)
    return np.reshape(state, (rows, 1920, 3)).T
The function successfully produces an array constrained to n rows, 1920 columns and 3 channels, except that when I preview the array as an image, the original image structure is lost.
Is there a better way to approach this? I should add that performance is important because the function runs in a loop.
OK, so I put together something that works. Instead of appending the image and state together first and then reshaping them into the desired shape, I reshape the state and then just append it to the image. Seems to work out.
def get_state(input_space, img):
    pad_length = (max_length - len(input_space))
    state = input_space + [0] * pad_length
    # get pad length for 1920 aspect ratio
    aspect_ratio = (len(state) / 3) / 1920
    remainder = aspect_ratio - np.fix(aspect_ratio)
    aspect_ratio = round(((1 - remainder) * 1920) * 3)
    # pad state to shape into 1920 x n rows and 3 channels
    state = state + [0] * aspect_ratio
    rows = round((len(state) / 3) / 1920)
    x = list(np.reshape(state, (3, 1920, rows)).T)
    state = list(img) + x
    return np.array(state)
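For completeness, a quick way to sanity-check the function (the sizes here are hypothetical; max_length is assumed to be a module-level constant, as described in the question):
import numpy as np

max_length = 100_000                              # hypothetical value for the demo
img = np.zeros((1080, 1920, 3), dtype=np.uint8)   # dummy image
input_space = [1.0] * 5000                        # dummy state, length <= max_length

out = get_state(input_space, img)
print(out.shape)  # (1098, 1920, 3): the original 1080 rows plus 18 rows of reshaped state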

Shift numpy array data to make it only grow

I have a numpy array named data_saw that contains thousands of float numbers.
When visualized, it looks like this
My task is to shift down each element before a gap by 180 (each gap here is around 180), so that I get a single continuously growing line without gaps.
I've ended up looping through the array and checking for a gap at every index (the last element is for comparison only and is not needed in further calculations, so it is deleted after the alignment):
for i in range(1, len(data_saw)):
    if data_saw[i - 1] > data_saw[i]:
        data_saw[:i] -= 180
data_saw = np.delete(data_saw, -1)
Trying to find out if there are more correct ways to do this with numpy array. Are there any?
np.diff will tell you when the gap is bigger than some threshold
mask = np.diff(data_saw) < -90
To make mask the same size as data_saw, prepend a zero, because the result of diff is always smaller than the input by one element. For what I have in mind, you'll also want to convert to an integer type:
offset = np.concatenate(([0], mask)).cumsum()
To normalize the data, just add 180 * offset plus some arbitrary bias:
data_fixed = data_saw + 180 * offset
To keep the last segment at its original values, subtract the final offset:
data_fixed = data_saw + 180 * (offset - offset[-1])
To keep the second segment as-is, subtract that segment's offset (which is 1, since it comes right after the first gap):
data_fixed = data_saw + 180 * (offset - 1)
You can use a similar method to adjust data not only with arbitrary numbers of gaps, but even arbitrary gap sizes above some threshold.
First, compute the indices corresponding to the original mask using np.flatnonzero:
delta = np.diff(data_saw)
indices = np.flatnonzero(delta < -90)
Now you can simply fill in the bad elements of delta, for example with the average of the two surrounding elements:
delta[indices] = 0.5 * (delta[indices - 1] + delta[indices + 1])
The fixed data is the cumulative sum (with a zero prepended):
data_fixed = np.concatenate(([0], delta)).cumsum() + data_saw[0]
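A minimal, self-contained sketch of the mask/offset approach on made-up data (the -90 threshold and the 180 step are taken from the answer above):
import numpy as np

# synthetic saw: three rising segments, each dropping by 180 at the gap
data_saw = np.concatenate([
    np.linspace(0, 100, 50),
    np.linspace(100, 200, 50) - 180,
    np.linspace(200, 300, 50) - 360,
])

mask = np.diff(data_saw) < -90                 # True at each downward gap
offset = np.concatenate(([0], mask)).cumsum()  # 0, 1, 2, ... per segment
data_fixed = data_saw + 180 * offset           # continuous, first segment unchanged
# to match the OP's convention (last segment unchanged), subtract the final offset
data_fixed_op = data_saw + 180 * (offset - offset[-1])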
Not sure if it is more correct, but this uses numpy's native methods.
import numpy as np
import matplotlib.pyplot as plt
array = np.arange(90)
array = np.concatenate([array[:30], array[30:] + 180, array[60:] + 2 * 180 + 30])
plt.plot(array)
adjusted = array - np.concatenate([[0], ((np.diff(array) >= 180).cumsum() * 180)])
plt.plot(adjusted)

Why does the output contain only 2 values but not the displacement for the entire image?

I have been stuck here for some time now. I cannot understand what I am doing wrong in calculating the displacement vectors along the x-axis and y-axis using the Lucas-Kanade method.
I implemented it as given in the above Wikipedia link. Here is what I have done:
import cv2
import numpy as np
img_a = cv2.imread("./images/1.png",0)
img_b = cv2.imread("./images/2.png",0)
# Calculate gradient along x and y axis
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/3.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/3.0)
# Calculate temporal difference between the 2 images
it = img_b - img_a
ix = ix.flatten()
iy = iy.flatten()
it = -it.flatten()
A = np.vstack((ix, iy)).T
atai = np.linalg.inv(np.dot(A.T,A))
atb = np.dot(A.T, it)
v = np.dot(np.dot(np.linalg.inv(np.dot(A.T,A)),A.T),it)
print(v)
This code runs without an error but it prints an array of 2 values! I had expected the v matrix to be of the same size as that of the image. Why does this happen? What am I doing incorrectly?
PS: I know there are methods directly available with OpenCV but I want to write this simple algorithm (as also given in the Wikipedia link shared above) myself.
To properly compute the Lucas–Kanade optical flow estimate you need to solve the system of two equations for every pixel, using information from its neighborhood, not for the image as a whole.
This is the recipe (notation refers to that used on the Wikipedia page):
Compute the image gradient (A) for the first image (ix, iy in the OP) using any method (Sobel is OK, I prefer Gaussian derivatives; note that it is important to apply the right scaling in Sobel: 1/8).
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize = 3, scale = 1.0/8.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize = 3, scale = 1.0/8.0)
Compute the structure tensor (ATWA): Axx = ix * ix, Axy = ix * iy, Ayy = iy * iy. Each of these three images must be smoothed with a Gaussian filter (this is the windowing). For example,
Axx = cv2.GaussianBlur(ix * ix, (0,0), 5)
Axy = cv2.GaussianBlur(ix * iy, (0,0), 5)
Ayy = cv2.GaussianBlur(iy * iy, (0,0), 5)
These three images together form the structure tensor, which is a 2x2 symmetric matrix at each pixel. For a pixel at (i,j), the matrix is:
| Axx(i,j) Axy(i,j) |
| Axy(i,j) Ayy(i,j) |
Compute the temporal gradient (b) by subtracting the two images (it in the OP).
it = img_b - img_a
Compute ATWb: Abx = ix * it, Aby = iy * it, and smooth these two images with the same Gaussian filter as above.
Abx = cv2.GaussianBlur(ix * it, (0,0), 5)
Aby = cv2.GaussianBlur(iy * it, (0,0), 5)
Compute the inverse of ATWA (a symmetric positive-definite matrix) and multiply by ATWb. Note that this inverse is of the 2x2 matrix at each pixel, not of the images as a whole. You can write this out as a set of simple arithmetic operations on the images Axx, Axy, Ayy, Abx and Aby.
The inverse of the matrix ATWA is given by:
| Ayy -Axy |
| -Axy Axx | / ( Axx*Ayy - Axy*Axy )
so you can write the solution as
norm = Axx*Ayy - Axy*Axy
vx = ( Ayy * Abx - Axy * Aby ) / norm
vy = ( Axx * Aby - Axy * Abx ) / norm
If the image is natural, it will have at least a tiny bit of noise, and norm will not have zeros. But for artificial images norm could have zeros, meaning you can't divide by it. Simply adding a small value to it will avoid division by zero errors: norm += 1e-6.
The size of the Gaussian filter is chosen as a compromise between precision and allowed motion speed: a larger filter will yield less precise results, but will work with larger shifts between images.
Typically, vx and vy are only evaluated where the two eigenvalues of the matrix ATWA are sufficiently large (if at least one is small, the result is inaccurate or possibly wrong).
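Putting the whole recipe together, an untested NumPy/OpenCV sketch (the file names, the sigma of 5 for the Gaussian window, and the Sobel scale of 1/8 are taken from the text above; treat it as an illustration rather than a reference implementation):
import cv2
import numpy as np

img_a = cv2.imread("./images/1.png", 0).astype(np.float64)
img_b = cv2.imread("./images/2.png", 0).astype(np.float64)

# image gradient of the first image (A)
ix = cv2.Sobel(img_a, cv2.CV_64F, 1, 0, ksize=3, scale=1.0/8.0)
iy = cv2.Sobel(img_a, cv2.CV_64F, 0, 1, ksize=3, scale=1.0/8.0)

# temporal gradient (b); the float conversion above avoids uint8 wrap-around
it = img_b - img_a

# windowed structure tensor (ATWA) and right-hand side (ATWb)
sigma = 5
Axx = cv2.GaussianBlur(ix * ix, (0, 0), sigma)
Axy = cv2.GaussianBlur(ix * iy, (0, 0), sigma)
Ayy = cv2.GaussianBlur(iy * iy, (0, 0), sigma)
Abx = cv2.GaussianBlur(ix * it, (0, 0), sigma)
Aby = cv2.GaussianBlur(iy * it, (0, 0), sigma)

# solve the 2x2 system at every pixel
norm = Axx * Ayy - Axy * Axy
norm += 1e-6                       # guard against division by zero in flat regions
vx = (Ayy * Abx - Axy * Aby) / norm
vy = (Axx * Aby - Axy * Abx) / norm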
Using DIPlib (disclosure: I'm an author) this is all very easy because it supports images with a matrix at each pixel. You would do this as follows:
import diplib as dip
img_a = dip.ImageRead("./images/1.png")
img_b = dip.ImageRead("./images/2.png")
A = dip.Gradient(img_a, [1.0])
b = img_b - img_a
ATA = dip.Gauss(A * dip.Transpose(A), [5.0])
ATb = dip.Gauss(A * b, [5.0])
v = dip.Inverse(ATA) * ATb

computing spectrograms of wav files & recorded sound (normalizing for volume)

I want to compare recorded audio with audio read from disk in a consistent way, but I'm running into problems with normalization for volume (otherwise amplitudes of spectrograms are different).
I also have never worked with signals, FFTs, or the WAV format before, so this is new, uncharted territory for me. I retrieve channels as lists of signed 16-bit ints sampled at 44100 Hz from both
on disk .wav files
recorded music playing from my laptop
and then I proceed through each one with a window of size 2^k and a certain amount of overlap. For each window, I do the following:
# calculate window variables
window_step_size = int(self.window_size * (1.0 - self.window_overlap_ratio)) + 1
last_frame = nframes - window_step_size  # nframes is total number of frames from audio source
num_windows, i = 0, 0  # calculate number of windows
while i <= last_frame:
    num_windows += 1
    i += window_step_size

# allocate memory and initialize counter
wi = 0  # index
nfft = 2 ** self.nextpowof2(self.window_size)  # size of FFT in 2^k
fft2D = np.zeros((nfft // 2 + 1, num_windows), dtype='c16')  # 2d array for storing results

# for each window
count = 0
times = np.zeros((1, num_windows))  # num_windows was calculated

while wi <= last_frame:
    # channel_samples is simply list of signed ints
    window_samples = channel_samples[wi : (wi + self.window_size)]
    window_samples = np.hamming(len(window_samples)) * window_samples

    # calculate and reformat [[[[ THIS IS WHERE I'M UNSURE ]]]]
    fft = 2 * np.fft.rfft(window_samples, n=nfft) / nfft
    fft[0] = 0  # apparently these are completely real and should not be used
    fft[nfft // 2] = 0
    fft = np.sqrt(np.square(fft) / np.mean(fft))  # use RMS of data
    fft2D[:, count] = 10 * np.log10(np.absolute(fft))

    # sec / frame * frames = secs
    # get midpt
    times[0, count] = self.dt * wi

    wi += window_step_size
    count += 1

# remove NaNs, infs
whereAreNaNs = np.isnan(fft2D)
fft2D[whereAreNaNs] = 0
whereAreInfs = np.isinf(fft2D)
fft2D[whereAreInfs] = 0

# find the spectrogram peaks
fft2D = fft2D.astype(np.float32)

# the get_2D_peaks() method discretizes the fft2D periodogram array and then
# finds peaks and filters out those peaks below the threshold supplied
#
# the `amp_xxxx` variables are used for discretizing amplitude and the
# times array above is used to discretize the time into buckets
local_maxima = self.get_2D_peaks(fft2D, self.amp_threshold, self.amp_max, self.amp_min, self.amp_step_size, times, self.dt)
In particular, the crazy stuff (to me at least) happens on the line with my comment [[[[ THIS IS WHERE I'M UNSURE ]]]].
Can anyone point me in the right direction or help me to generate this audio spectrogram while normalizing for volume correctly?
A quick look tells me that you forgot to use a window; one is necessary to calculate your spectrogram.
You need to apply a window (Hamming, Hann) to your window_samples:
np.hamming(len(window_samples)) * window_samples
Then you can calculate rfft.
Edit:
# calc magnitude from FFT
fftData = np.fft.fft(windowed)
# get magnitude (linear scale) of the first half of the values
Mag = np.abs(fftData[:Chunk // 2])
# if you want a log scale: R = 20 * np.log10(Mag)
plt.plot(Mag)

# calc RMS from FFT
RMS = np.sqrt(np.sum(np.abs(np.fft.fft(data) ** 2) / len(data)) / (len(data) / 2))
RMStoDb = 20 * np.log10(RMS)
PS: If you want to calculate RMS from the FFT, you can't use a window (Hann, Hamming), and this line makes no sense:
fft = np.sqrt(np.square(fft) / np.mean(fft)) # use RMS of data
One simple normalization can be done for each window:
window_samples = channel_samples[wi : (wi + self.window_size)]
# framMax = np.max(window_samples)
framMean = np.mean(window_samples)
Normalized = window_samples / framMean
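Pulling those pieces together, a rough sketch of one possible per-window pipeline (the function and parameter names are made up; normalizing by the frame's RMS instead of its mean is my own substitution, since a zero-centred audio frame has a mean close to zero):
import numpy as np

def spectrogram_column(window_samples, nfft):
    # one column of a volume-normalized spectrogram, in dB
    window_samples = np.asarray(window_samples, dtype=np.float64)
    # normalize for volume: divide by the frame's RMS
    rms = np.sqrt(np.mean(window_samples ** 2)) + 1e-12
    window_samples = window_samples / rms
    # window, then FFT
    windowed = np.hanning(len(window_samples)) * window_samples
    magnitude = np.abs(np.fft.rfft(windowed, n=nfft))
    return 20 * np.log10(magnitude + 1e-12)  # dB, guarded against log(0)

Each call would produce one column of the fft2D array in the loop from the question.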

Speed up this interpolation in python

I have an image processing problem I'm currently solving in python, using numpy and scipy. Briefly, I have an image that I want to apply many local contractions to. My prototype code is working, and the final images look great. However, processing time has become a serious bottleneck in our application. Can you help me speed up my image processing code?
I've tried to boil down our code to the 'cartoon' version below. Profiling suggests that I'm spending most of my time on interpolation. Are there obvious ways to speed up execution?
import cProfile, pstats
import numpy
from scipy.ndimage import interpolation
def get_centered_subimage(center_point, window_size, image):
    x, y = numpy.round(center_point).astype(int)
    xSl = slice(max(x - window_size - 1, 0), x + window_size + 2)
    ySl = slice(max(y - window_size - 1, 0), y + window_size + 2)
    subimage = image[xSl, ySl]
    interpolation.shift(
        subimage, shift=(x, y) - center_point, output=subimage)
    return subimage[1:-1, 1:-1]
"""In real life, this is experimental data"""
im = numpy.zeros((1000, 1000), dtype=float)
"""In real life, this mask is a non-zero pattern"""
window_radius = 10
mask = numpy.zeros((2*window_radius+1, 2*window_radius+1), dtype=float)
"""The x, y coordinates in the output image"""
new_grid_x = numpy.linspace(0, im.shape[0]-1, 2*im.shape[0])
new_grid_y = numpy.linspace(0, im.shape[1]-1, 2*im.shape[1])
"""The grid we'll end up interpolating onto"""
grid_step_x = new_grid_x[1] - new_grid_x[0]
grid_step_y = new_grid_y[1] - new_grid_y[0]
subgrid_radius = numpy.floor(
    (-1 + window_radius * 0.5 / grid_step_x,
     -1 + window_radius * 0.5 / grid_step_y)).astype(int)  # values are integral; ints keep later reshapes/slices valid
subgrid = (
    window_radius + 2 * grid_step_x * numpy.arange(
        -subgrid_radius[0], subgrid_radius[0] + 1),
    window_radius + 2 * grid_step_y * numpy.arange(
        -subgrid_radius[1], subgrid_radius[1] + 1))
subgrid_points = ((2*subgrid_radius[0] + 1) *
                  (2*subgrid_radius[1] + 1))
"""The coordinates of the set of spots we we want to contract. In real
life, this set is non-random:"""
numpy.random.seed(0)
num_points = 10000
center_points = numpy.random.random(2*num_points).reshape(num_points, 2)
center_points[:, 0] *= im.shape[0]
center_points[:, 1] *= im.shape[1]
"""The output image"""
final_image = numpy.zeros(
    (new_grid_x.shape[0], new_grid_y.shape[0]), dtype=float)
def profile_me():
    for m, cp in enumerate(center_points):
        """Take an image centered on each illumination point"""
        spot_image = get_centered_subimage(
            center_point=cp, window_size=window_radius, image=im)
        if spot_image.shape != (2*window_radius+1, 2*window_radius+1):
            continue  # Skip to the next spot
        """Mask the image"""
        masked_image = mask * spot_image
        """Resample the image"""
        nearest_grid_index = numpy.round(
            (cp - (new_grid_x[0], new_grid_y[0])) /
            (grid_step_x, grid_step_y)).astype(int)  # int so it can be used for slicing
        nearest_grid_point = (
            (new_grid_x[0], new_grid_y[0]) +
            (grid_step_x, grid_step_y) * nearest_grid_index)
        new_coordinates = numpy.meshgrid(
            subgrid[0] + 2 * (nearest_grid_point[0] - cp[0]),
            subgrid[1] + 2 * (nearest_grid_point[1] - cp[1]))
        resampled_image = interpolation.map_coordinates(
            masked_image,
            (new_coordinates[0].reshape(subgrid_points),
             new_coordinates[1].reshape(subgrid_points))
            ).reshape(2*subgrid_radius[1]+1,
                      2*subgrid_radius[0]+1).T
        """Add the recentered image back to the scan grid"""
        final_image[
            nearest_grid_index[0]-subgrid_radius[0]:
            nearest_grid_index[0]+subgrid_radius[0]+1,
            nearest_grid_index[1]-subgrid_radius[1]:
            nearest_grid_index[1]+subgrid_radius[1]+1,
        ] += resampled_image
cProfile.run('profile_me()', 'profile_results')
p = pstats.Stats('profile_results')
p.strip_dirs().sort_stats('cumulative').print_stats(10)
Vague explanation of what the code does:
We start with a pixellated 2D image, and a set of arbitrary (x, y) points in our image that don't generally fall on an integer grid. For each (x, y) point, I want to multiply the image by a small mask centered precisely on that point. Next we contract/expand the masked region by a finite amount, before finally adding this processed sub-image to a final image, which may not have the same pixel size as the original image. (Not my finest explanation. Ah well).
I'm pretty sure that, as you said, the bulk of the calculation time happens in interpolation.map_coordinates(…), which gets called once for every iteration on center_points, here 10,000 times. Generally, when working with the numpy/scipy stack, you want the repetitive task over a large array to happen in native NumPy/SciPy functions -- i.e. in a C loop over homogeneous data -- as opposed to explicitly in Python.
One strategy that might speed up the interpolation, but that will also increase the amount of memory used, is:
First, fetch all the subimages (here named masked_image) into a 3-dimensional array (window_radius x window_radius x center_points.size) -- see the sketch after this list.
Make a ufunc (read that, it's useful) that wraps the work that has to be done on each subimage, using numpy.frompyfunc, which should return another 3-dimensional array (subgrid_radius[0] x subgrid_radius[1] x center_points.size). In short, this creates a vectorized version of the Python function that can be broadcast element-wise over an array.
Build the final image by summing over the third dimension.
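A minimal sketch of the first step, gathering all masked subimages into one 3-D stack so the per-subimage work can then operate on a single array (it reuses the names and helper function from the question's script; spots that fall off the image edge are skipped, as in the original loop):
kept_points = []
subimage_stack = []
for cp in center_points:
    spot_image = get_centered_subimage(
        center_point=cp, window_size=window_radius, image=im)
    if spot_image.shape != (2 * window_radius + 1, 2 * window_radius + 1):
        continue  # skip spots too close to the border
    kept_points.append(cp)
    subimage_stack.append(mask * spot_image)

# shape: (2*window_radius+1, 2*window_radius+1, number of kept points)
subimage_stack = numpy.dstack(subimage_stack)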
Hope that gets you closer to your goals!
