I'm saving a binary OpenCV Mat to an HDF5 file.
In OpenCV, Mat data is stored in memory with the channel index varying fastest, then the x-coordinate, then the y-coordinate, so an address access looks like:
address = M.data + M.step[0]*y + M.step[1]*x + ch
where M.step[0] = NUM_X*NUM_CH and M.step[1] = NUM_CH.
The problem I'm experiencing is that MATLAB and Python interpret the data in the wrong way.
Although the dimensions of the read data are set correctly (channel, x, y), when I look at the data storage I see that, e.g., numpy reads the data backwards: first y is incremented, then x, and lastly the channel number. That means it assumes a planar configuration of the channel data, while it is actually interleaved. This results in images being displayed incorrectly.
Is there a way to tell numpy/MATLAB to change the data access without reordering the data?
Thanks in advance.
Edit:
I store everything in a rank-3 dataset in the HDF5 file, where dimension 1 is the channel, dimension 2 is the x-coordinate and dimension 3 is the y-coordinate.
If I read that dataset and process it with OpenCV in C++, the correct image is displayed. OpenCV in Python doesn't work because of error: (-206) Unrecognized or unsupported array type in function cvGetMat
I could solve the question in Python by changing the shape and strides of the array, which had been calculated in the wrong way:
If I had a 3*1280*720 uint8 image, with 3 being the channel count, 1280 the x-coordinate and 720 the y-coordinate, I would have to assign the shape so it looks like data.shape = (720, 1280, 3), and the strides would have to be changed to data.strides = (3*1280, 3, 1).
This link explains how numpy arrays work in memory:
Numpy doc
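As a minimal sketch of that fix with h5py (the file and dataset names here are assumptions; the layout is the 3x1280x720 uint8 example above):

import h5py
import numpy as np

# Hypothetical file/dataset names; the dataset reports shape (3, 1280, 720),
# but the underlying bytes are interleaved in (y, x, ch) order as OpenCV wrote them.
with h5py.File("data.h5", "r") as f:
    raw = f["image"][...]  # uint8, reported shape (3, 1280, 720)

# Reinterpret the same buffer as (height, width, channels) without copying;
# this is equivalent to setting data.shape and data.strides by hand.
img = raw.reshape(720, 1280, 3)
assert img.strides == (3 * 1280, 3, 1)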
Related
I'm in Python. I have a list of length 784 (which I extracted from a 28x28 image), and now I want to arrange it so I can use it in TensorFlow with a trained model. The problem is that the NN needs a (28, 28)-shaped array, but my current array has shape (784,).
I tried to use some for loops for this, but I've had no luck in creating a system to carry this out. Please help.
I figured out that I need to use this structure:
for i in range(res):
    for a in range(0, res):
        mnistFormat.append(grascalePix[a])  # mnistFormat is an initially empty list
        # and grascalePix holds my 784 grayscale pixels
but I can't figure out what should go in the range function of the for loop to make this work.
For example, let's say I have a sample 4x4 image's pixel list:
grayscalePix = [255,255,255,255,255,100,83,200,255,50,60,255,30,1,46,255]
This is a row-by-row representation, which means the first 4 elements are the first row.
I want to arrange them into a list of shape (4,4):
mnistFormat = [[255,255,255,255],[255,100,83,200],[255,50,60,255],[30,1,46,255]]
Just keep in mind this is a sample; the real data is 784 elements long, and I don't have much experience with numpy.
numpy might help you there very easily:
import numpy as np

mnistFormat = np.array(grascalePix).reshape((int(np.sqrt(len(grascalePix))), -1))
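Applied to the 4x4 sample from the question, that one-liner behaves like this (a quick demo using only the question's own data):

import numpy as np

grayscalePix = [255, 255, 255, 255, 255, 100, 83, 200,
                255, 50, 60, 255, 30, 1, 46, 255]

side = int(np.sqrt(len(grayscalePix)))  # 4 here, 28 for the real 784-element data
mnistFormat = np.array(grayscalePix).reshape((side, -1))
print(mnistFormat)
# [[255 255 255 255]
#  [255 100  83 200]
#  [255  50  60 255]
#  [ 30   1  46 255]]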
I have a 4-D numpy array, with the first dimension representing the number of images in a data set, the second and third being the (equal) width and height, and the 4th being the number of channels (3). For example let's say I have 4 color images that are 28*28, so my image data looks like this:
X = np.reshape(np.arange(4*28*28*3), (4,28,28,3))
I would like to select a random 16x16 (width x height) crop from each of the 4 images. Critically, I want the crop to be different per image, i.e. I want to generate 4 random (x_offset, y_offset) pairs. In the end I want access to an array of shape (4, 16, 16, 3).
If I were to write this in a for loop it would look something like this:
x = np.random.randint(0, 13, 4)  # 28 - 16 = 12 is the largest valid offset; randint's high end is exclusive
y = np.random.randint(0, 13, 4)
cropped_images = []
for i in range(X.shape[0]):
    cropped_image = X[i, x[i]:x[i]+16, y[i]:y[i]+16, :]
    cropped_images.append(cropped_image)  # add cropped image to a list
But I'd like to do it as efficiently as possible and I'm wondering if there's a way to do it with strides and fancy indexing. I've seen the answers to this question, but can't quite wrap my head around how I might combine something like stride_tricks with random starting points for the strides on the second and third (width and height) axes.
Leverage a strides-based method for efficient patch extraction
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows that are merely views into the input array and hence incur no extra memory overhead and are virtually free. We could of course use np.lib.stride_tricks.as_strided directly, but the setup work required is hard to manage, especially on arrays with higher dimensions. If scikit-image is not available, we can use its source code directly, which works standalone.
Explanation of the usage of view_as_windows
The idea with view_as_windows is that we feed the window_shape arg as a tuple whose length equals the number of dimensions of the input array whose sliding windows are needed. The axes along which we need to slide are fed their respective window lengths, and the rest are fed 1s. This creates an array of views with singleton dims, i.e. axes of length 1 corresponding to the 1s in the window_shape arg. For those axes we can index into the zeroth element to get a squeezed version of the sliding windows.
Thus, we would have a solution, like so -
# Get sliding windows
from skimage.util.shape import view_as_windows
w = view_as_windows(X, (1,16,16,1))[...,0,:,:,0]
# Index and get our specific windows
out = w[np.arange(X.shape[0]),x,y]
# If you need those in the same format as in the posted loopy code
out = out.transpose(0,2,3,1)
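As a quick sanity check (a sketch reusing the X, x and y definitions from the question), the strided result matches the loopy version exactly:

import numpy as np
from skimage.util.shape import view_as_windows

X = np.reshape(np.arange(4*28*28*3), (4, 28, 28, 3))
x = np.random.randint(0, 13, 4)  # 28 - 16 = 12 is the largest valid offset
y = np.random.randint(0, 13, 4)

w = view_as_windows(X, (1, 16, 16, 1))[..., 0, :, :, 0]
out = w[np.arange(X.shape[0]), x, y].transpose(0, 2, 3, 1)

expected = np.stack([X[i, x[i]:x[i]+16, y[i]:y[i]+16, :] for i in range(X.shape[0])])
assert out.shape == (4, 16, 16, 3)
assert np.array_equal(out, expected)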
I have an image I've read from file with shape (m,n,3) (i.e. it has 3 channels). I also have a matrix to convert the color space with dimensions (3,3). I've already arrived at a few different ways of applying this matrix to each vector in the image; for example,
np.einsum('ij,...j',transform,image)
appears to produce the same results as the following (far slower) implementation.
def convert(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Convert an image array to another colorspace."""
    dimensions = len(im.shape)
    axes = im.shape[:dimensions-1]
    # Create a new array (respecting mutability)
    new_ = np.empty(im.shape)
    for coordinate in np.ndindex(axes):
        pixel = im[coordinate]
        pixel_prime = transform @ pixel
        new_[coordinate] = pixel_prime
    return new_
However, I found that the following is even more efficient while testing on the example image with line_profiler.
np.moveaxis(np.tensordot(transform, image, axes=(-1, -1)), 0, 2)
The problem I'm having here is using just a np.tensordot, i.e. removing the need for np.moveaxis. I've spent a few hours attempting to find a solution (I'm guessing it resides in choosing the correct axes), so I thought I'd ask others for help.
You can do it concisely with tensordot if you make image the first argument:
np.tensordot(image, transform, axes=(-1, 1))
You can get better performance from einsum by using the argument optimize=True (requires numpy 1.12 or later):
np.einsum('ij,...j', transform, image, optimize=True)
Or (as Paul Panzer pointed out in a comment), you can simply use matrix multiplication:
image @ transform.T
They all take about the same time on my computer.
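A quick equivalence check of the three approaches (a sketch with random data, not the image from the post):

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
transform = rng.random((3, 3))

a = np.tensordot(image, transform, axes=(-1, 1))
b = np.einsum('ij,...j', transform, image, optimize=True)
c = image @ transform.T

assert np.allclose(a, b) and np.allclose(b, c)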
I am trying to add 2 different Mat objects in Python.
Both Mat objects are binary images (CV_8U). But because the matrices don't have the same size, I get an error when adding them.
I generate one of the Mat objects using numpy with the proper value for the channels, like this:
diagonal = np.zeros((height,width,1))
cv2.line(diagonal,(0,0),(height,width), (255))
The other Mat object comes from cv2.Canny:
canny_edge = cv2.Canny(input_image, min_thr, max_thr)
Addition code:
final = cv2.addWeighted(canny_edge,1.0,diagonal,1.0,0)
I get the following error when I try to add the 2 Mat objects:
error: (-5) When the input arrays in add/subtract/multiply/divide functions have different types, the output array type must be explicitly specified in function cv::arithm_op
I also tried removing the channels value from the numpy-generated matrix, but I got the same error.
So I tried to print the channels, but I got this:
height, width, channels = canny_edge.shape
ValueError: not enough values to unpack (expected 3, got 2)
EDIT: I am sorry, but the answer posted by Miki doesn't help me. The Mat object generated by cv2.Canny doesn't have channel information. I know it is a binary image, but OpenCV gets confused when it tries to add this Mat object's matrix to a Mat object which does have channel information.
Basically, use the canny_edge Mat object and convert it to a numpy array using:
canny_edge_nparr = np.asarray(canny_edge)
Then we can add a dimension to it (as cv2.Canny outputs a Mat object of shape (height, width)):
canny_edge_nparr = np.expand_dims(canny_edge_nparr, axis=2)
The shape now represents (height, width, channels). Now use numpy addition to add the arrays:
final = np.add(canny_edge_nparr, diagonal)
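Putting it together, a minimal sketch of the whole fix (the file name, thresholds and use of imread are placeholders standing in for the question's inputs). Creating diagonal as uint8 up front also avoids the original type-mismatch error, and note that cv2.line takes points in (x, y) order:

import cv2
import numpy as np

input_image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # placeholder input
height, width = input_image.shape
min_thr, max_thr = 100, 200  # placeholder thresholds

# Match dtypes up front: uint8, like the Canny output.
diagonal = np.zeros((height, width, 1), dtype=np.uint8)
cv2.line(diagonal, (0, 0), (width, height), 255)  # cv2 points are (x, y)

canny_edge = cv2.Canny(input_image, min_thr, max_thr)  # shape (height, width)
canny_edge_nparr = np.expand_dims(canny_edge, axis=2)  # shape (height, width, 1)

# np.add on uint8 wraps around at 256; cv2.add would saturate at 255 instead.
final = np.add(canny_edge_nparr, diagonal)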
Currently I have the following problem: I get an array of image data (a large vector). I know neither the size of the image (only that the dimensions are of the form 2^n x 2^m) nor the encoding of the image (JPEG, J2K, lossy 12-bit JPEG or similar). For one array I do know the encoding; it looks like:
[-1000, -888, -884, -883, -884, -886,...-850, -852, -854, -854]
Here I can simply reshape it into the form I want (in this case the side length is the square root of the vector length) and afterwards convert it into an image I can view with
pixel_values = numpy.asarray(pixel_values).reshape(512, 512)
pl2 = pylab.imshow(pixel_values, cmap=pylab.cm.bone)
But now I have another array:
[65534, 57344, 4, 0, 0, 0, 65534, 57344, 7652, 1, 20479, 20991, 10496, 0,...35286, 23076, 34407, 36383, 56252, 65370, 217]
Here I cannot use the square root or anything similar (I only know that the image dimensions are always of the form 2^n x 2^m), and I don't know how I can transform this data into a real image I can view. How can I find out the encoding of this data and the size in Python?
To determine the size of the image, I don't think there is a better approach than simple trial and error. First we determine the image sizes compatible with the form (2^n, 2^m):
import numpy as np

vect_len = len(pixel_values)
min_size = 256  # e.g. minimal size acceptable for one of the dimensions

npm = np.log2(vect_len)  # this is n + m
if not npm % 1:
    # n + m is an integer
    npm = int(npm)
    for n in range(1, npm):
        p = npm - n
        if 2**n < min_size or 2**p < min_size:
            continue
        print(2**n, 2**p)

# For vect_len = 2**19 this prints:
# 256 2048
# 512 1024
# 1024 512
# 2048 256
Then, for each of the possible image sizes, we reshape the pixel_values array and plot the result until it looks right. If it is a colour image, there would also be a third dimension of size 3 for RGB channels.
If you can visualise your image simply by reshaping the input vector, that means it contains the values for each pixel directly, and we don't care about the encoding of the image (it was already decoded). Indeed, JPEG stores discrete cosine transform (DCT) coefficients in the .jpeg file, J2K stores wavelet transform coefficients, etc. This is not something you want to get into; the approach is to use the appropriate library for each format.
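For the encoding itself, one common first step (a sketch; the byte signatures below are the standard magic numbers for each container, but the helper and variable names are made up) is to look at the leading bytes of the raw stream and dispatch to the right decoder:

import numpy as np

# Standard magic bytes of a few common image containers.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x00\x00\x00\x0cjP  ": "jp2 (JPEG 2000 container)",
    b"\xff\x4f\xff\x51": "raw JPEG 2000 codestream",
    b"\x89PNG\r\n\x1a\n": "png",
}

def guess_encoding(raw: bytes) -> str:
    for magic, name in SIGNATURES.items():
        if raw.startswith(magic):
            return name
    return "unknown (possibly already-decoded pixel data)"

# The second array from the question looks like 16-bit words,
# so view it as bytes before checking:
data = np.array([65534, 57344, 4, 0], dtype=np.uint16)
print(guess_encoding(data.tobytes()))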