Grouping a numpy array - python

I have a huge NumPy array of size 778. I would like to pair the elements, so I'm using the following code:
coordinates = coordinates.reshape(-1, 2,2)
However, if I use the following code instead, it works fine:
coordinates = coordinates[:len(coordinates)-1].reshape(-1, 2,2)
How can I do this in a proper way irrespective of the size?
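One possible approach, sketched under the assumption that dropping the leftover trailing elements is acceptable: flatten the array, trim it to the largest multiple of the group size, and then reshape. The helper name group_elements and its group_shape parameter are made up for this illustration.

```python
import numpy as np

def group_elements(coordinates, group_shape=(2, 2)):
    """Reshape into (-1, *group_shape), dropping trailing leftovers.

    Hypothetical helper: assumes leftover elements may be discarded.
    """
    flat = np.asarray(coordinates).reshape(-1)
    block = int(np.prod(group_shape))          # elements per group, 4 here
    usable = (flat.size // block) * block      # largest multiple of block
    return flat[:usable].reshape(-1, *group_shape)

# 778 elements: 776 are usable, the last 2 are dropped.
grouped = group_elements(np.arange(778))
# A 2-D input with an odd number of rows loses its last row.
grouped_2d = group_elements(np.arange(779 * 2).reshape(779, 2))
```

If the leftover elements must be kept, padding the array before the reshape would be the alternative; which one is right depends on what the data represents.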

Related

Find indices of each integer group in a labelled array

I have a labelled array obtained by using scikit-image's measure.label on a binary two-dimensional array. For argument's sake, it might look like this:
[
[1,1,0,0,2],
[1,1,1,0,2],
[1,0,0,0,0],
[0,0,0,3,3]
]
I want to get the indices of each group of labels. So in this case:
[
[(0,0),(0,1),(1,0),(1,1),(1,2),(2,0)],
[(0,4),(1,4)],
[(3,3),(3,4)]
]
I can do this using builtin Python like so (n and m are the dimensions of the array):
import itertools
import numpy as np
_dict = {}
# Visit every (row, column) pair and file it under its label
for coords in itertools.product(range(n), range(m)):
    _dict.setdefault(labelled_array[coords], []).append(coords)
blobs = [np.array(item) for item in _dict.values()]
This is very slow (about 10 times slower than the initial labelling of the binary array using measure.label!)
Scipy also has a function find_objects:
from scipy import ndimage
objs = ndimage.find_objects(labelled_array)
From what I can gather, though, this returns the bounding box for each group (object). I don't want the bounding box; I want the exact coordinates of each value in the group.
I have also tried using np.where for each integer in the number of labels. This is very slow.
It also seems to me that what I'm trying to do here is something like the minesweeper algorithm. I suspect there must be an efficient solution using numpy or scipy.
Is there an efficient way to obtain these coordinates?
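One way this could be vectorised, offered as a sketch rather than as an answer from the original thread: sort the flattened labels once with a stable argsort, split the sorted positions where the label changes, and convert the flat positions back to (row, column) pairs. The function name label_coordinates and the assumption that label 0 is background are mine.

```python
import numpy as np

labelled_array = np.array([
    [1, 1, 0, 0, 2],
    [1, 1, 1, 0, 2],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 3, 3],
])

def label_coordinates(labels, background=0):
    """Map each non-background label to an (N, ndim) array of coordinates."""
    flat = labels.ravel()
    # A stable sort keeps positions in row-major order within each label.
    order = np.argsort(flat, kind="stable")
    values, starts = np.unique(flat[order], return_index=True)
    groups = np.split(order, starts[1:])
    return {
        int(value): np.column_stack(np.unravel_index(group, labels.shape))
        for value, group in zip(values, groups)
        if value != background
    }

coords = label_coordinates(labelled_array)
```

The cost is dominated by a single sort over all pixels, instead of one pass over the array per label as with repeated np.where calls.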

Combining n dimensional arrays

I am in the process of converting some MATLAB code to Python. I am working with a 3D volume (h x w x d) represented as a NumPy array, and I am extracting smaller 3D patches from this volume using the function from SO here. So if I have a 32x32x32 array and extract 16x16x16 patches, I end up with a shape of (2, 2, 2, 16, 16, 16). After processing each patch, I would like to put it back into the shape h x w x d, basically reversing window_nd. What would be the idiomatic NumPy way to do this without looping over each dimension? Since I also need to work with 2D and 4D data, I would like to avoid creating a function for each dimension.
Normally, writing back to as_strided views is not advised because it can cause race conditions, but since you only made blocks, this should work:
original_shaped_array = windowed_array.transpose(0,3,1,4,2,5).reshape(32,32,32)
Additionally, if you never copied the windowed array and do your calculations in place, the data will be changed in the original array, because a windowed view is simply a new view into the same data. Don't do this if there is any overlap between the windows.
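The transpose-and-reshape idea can be generalised to any number of dimensions, which may help with the 2D and 4D cases. The following is a minimal sketch that assumes non-overlapping blocks laid out as (grid axes..., patch axes...), as in the (2, 2, 2, 16, 16, 16) example; the name unblock is hypothetical.

```python
import numpy as np

def unblock(blocks):
    """Merge an array of shape (g0, ..., gn, p0, ..., pn) back into
    shape (g0*p0, ..., gn*pn). Assumes non-overlapping blocks."""
    n = blocks.ndim // 2
    # Interleave grid and patch axes: (g0, p0, g1, p1, ...)
    order = [axis for pair in zip(range(n), range(n, 2 * n)) for axis in pair]
    out_shape = [blocks.shape[i] * blocks.shape[i + n] for i in range(n)]
    return blocks.transpose(order).reshape(out_shape)

# Round trip on a 32x32x32 volume split into 16x16x16 blocks.
volume = np.arange(32 ** 3).reshape(32, 32, 32)
blocks = volume.reshape(2, 16, 2, 16, 2, 16).transpose(0, 2, 4, 1, 3, 5)
restored = unblock(blocks)
```

Note that the reshape after the transpose generally has to copy the data, so the result is a new array rather than a view.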

What is the fastest way to read in an image to an array of tuples?

I am trying to assign provinces to an area for use in a game mod. I have two separate maps for area and provinces.
provinces file,
area file.
Currently I am reading in an image in Python and storing it in an array using PIL like this:
from PIL import Image
import numpy as np
land_prov_pic = Image.open(INPUT_FILES_DIR + land_prov_str)
land_prov_array = np.array(land_prov_pic)
image_size = land_prov_pic.size
for x in range(image_size[0]):
    if x % 100 == 0:
        print(x)
    for y in range(image_size[1]):
        land_prov_array[x][y] = land_prov_pic.getpixel((x,y))
Where you end up with land_prov_array[x][y] = (R,G,B)
However, this gets really slow, especially for large images. I tried reading it in using OpenCV like this:
import cv2
land_prov_array = cv2.imread(INPUT_FILES_DIR + land_prov_str)
land_prov_array = cv2.cvtColor(land_prov_array, cv2.COLOR_BGR2RGB) #Convert from BGR to RGB
But now land_prov_array[x][y] = [R G B] which is an ndarray and can't be inserted into a set. But it's way faster than the previous for loop. How do I convert [R G B] to (R,G,B) for every element in the array without for loops or, better yet, read it in that way?
EDIT: Added pictures, more description, and code blocks for readability.
It is best to convert the [R,G,B] array to tuple when you need it to be a tuple, rather than converting the whole image to this form. An array of tuples takes up a lot more memory, and will be a lot slower to process, than a numeric array.
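As a rough illustration of converting only when a tuple is needed, the sketch below keeps the image as a numeric array and builds tuples for a single pixel and for the set of distinct colours. The tiny image array is made-up sample data standing in for the loaded picture.

```python
import numpy as np

# Made-up 2x2 RGB image standing in for the loaded picture.
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[255, 0, 0], [0, 0, 255]]],
    dtype=np.uint8,
)

# Convert a single pixel to a hashable tuple only at the point of use.
pixel = tuple(int(c) for c in image[0, 1])

# Deduplicate in NumPy first, then convert the few remaining rows.
unique_rows = np.unique(image.reshape(-1, 3), axis=0)
unique_colours = {tuple(int(c) for c in row) for row in unique_rows}
```

This way the Python-level loop runs over the distinct colours only, not over every pixel.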
The answer by isCzech shows how to create a NumPy view over a 3D array that presents the data as if it were a 2D array of tuples. This might not require the additional memory of an actual array of tuples, but it is still a lot slower to process.
Most importantly, most NumPy functions (such as np.mean) and operators (such as +) cannot be applied to such an array. Thus, one is obliged to iterate over the array in Python code (or with an np.vectorize function), which is a lot less efficient than using NumPy functions and operators that work on the array as a whole.
For transformation from a 3D array (data3D) to a 2D array (data2D), I've used this approach:
import numpy as np
dt = np.dtype([('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
data2D = data3D.view(dtype=dt).squeeze()
The .view call changes the data type and still returns a 3D array, whose last dimension now has size 1; that dimension can then be removed by .squeeze. Alternatively, you can use .squeeze(axis=-1) to squeeze only the last dimension (in case some of your other dimensions have size 1 too).
Please note I've used uint8 ('u1') - your type may be different.
Trying to do this using a loop is very slow, indeed (compared to this approach at least).
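A small usage sketch of the structured view, using made-up sample data: it shows the resulting 2D shape and that individual elements convert to plain tuples that can go into a set. The np.ascontiguousarray call is my addition, on the assumption that the view needs the last axis to be contiguous in memory.

```python
import numpy as np

# Made-up 2x3 image with 3 uint8 channels.
data3D = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

dt = np.dtype([('x', 'u1'), ('y', 'u1'), ('z', 'u1')])
# ascontiguousarray is a no-op here; it guards against sliced inputs.
data2D = np.ascontiguousarray(data3D).view(dtype=dt).squeeze(axis=-1)

first_pixel = data2D[0, 0].item()            # a plain Python tuple
seen = {p.item() for p in data2D.ravel()}    # tuples are hashable
```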
Similar question here: Show a 2d numpy array where contents are tuples as an image

Efficiently Using Multiple Numpy Slices for Random Image Cropping

I have a 4-D numpy array, with the first dimension representing the number of images in a data set, the second and third being the (equal) width and height, and the 4th being the number of channels (3). For example let's say I have 4 color images that are 28*28, so my image data looks like this:
X = np.reshape(np.arange(4*28*28*3), (4,28,28,3))
I would like to select a random 16*16 width x height crop of each of the 4 images. Critically, I want the crop to be different per image, i.e. I want to generate 4 random (x_offset, y_offset) pairs. In the end I want access to an array of shape (4, 16, 16, 3).
If I were to write this in a for loop it would look something like this:
x = np.random.randint(0,12,4)
y = np.random.randint(0,12,4)
for i in range(X.shape[0]):
    cropped_image = X[i, x[i]:x[i]+16, y[i]:y[i]+16, :]
    #Add cropped image to a list or something
But I'd like to do it as efficiently as possible and I'm wondering if there's a way to do it with strides and fancy indexing. I've seen the answers to this question, but can't quite wrap my head around how I might combine something like stride_tricks with random starting points for the strides on the second and third (width and height) axes.
Leverage strided-based method for efficient patch extraction
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows. These are merely views into the input array, so they incur no extra memory overhead and are virtually free. We could use np.lib.stride_tricks.as_strided directly, but the setup work is hard to manage, especially for arrays with more dimensions. If scikit-image is not available, we can use its source code directly, since it works standalone.
Explanation on usage of view_as_windows
The idea with view_as_windows is that we pass window_shape as a tuple whose length equals the number of dimensions of the input array. The axes along which we need to slide get their respective window lengths, and the rest get 1s. This creates an array of views with singleton axes (length 1) corresponding to the 1s in window_shape. For those axes, we index the zeroth element to get a squeezed version of the sliding windows.
Thus, we would have a solution, like so -
# Get sliding windows
from skimage.util.shape import view_as_windows
w = view_as_windows(X, (1,16,16,1))[...,0,:,:,0]
# Index and get our specific windows
out = w[np.arange(X.shape[0]),x,y]
# If you need those in the same format as in the posted loopy code
out = out.transpose(0,2,3,1)
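If scikit-image is not installed, the same idea can be sketched with NumPy's own np.lib.stride_tricks.sliding_window_view, assuming NumPy 1.20 or newer. This is an alternative to the view_as_windows code above, not part of the original answer; the offsets are drawn from 0 to 12 inclusive, since 28 - 16 = 12 is the largest valid start.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.arange(4 * 28 * 28 * 3).reshape(4, 28, 28, 3)

rng = np.random.default_rng(0)
x = rng.integers(0, 13, size=4)   # valid row offsets: 0..12
y = rng.integers(0, 13, size=4)   # valid column offsets: 0..12

# Windows over axes 1 and 2; shape (4, 13, 13, 3, 16, 16), a read-only view.
w = sliding_window_view(X, (16, 16), axis=(1, 2))

# Pick one window per image, then move channels last: (4, 16, 16, 3).
out = w[np.arange(X.shape[0]), x, y].transpose(0, 2, 3, 1)
```

The fancy-indexing step copies only the selected windows, so the full set of windows is never materialised.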

matplotlib.pyplot.hist returns a histogram where all bins have the same value when I have varying data

I am trying to create a histogram in python using matplotlib.pyplot.hist.
I have an array of data that varies. However, when I run my code, the histogram is returned with the values in all bins equal to each other, or equal to zero, which is not correct.
The histogram should look like the line graph above it, with bins roughly the same height and in the same shape as the graph above.
The line graph above the histogram is there to illustrate what my data looks like and to show that my data does vary.
My data array is called spectrumnoise and is just a function I have created against an array x
x = np.arange(0.1, 20.1, 0.1)
The code I am using to create the histogram and the line graph above it is
import matplotlib.pyplot as mpl
mpl.plot(x,spectrumnoise)
mpl.hist(spectrumnoise,bins=50,histtype='step')
mpl.show()
I have also tried using
mpl.hist((x,spectrumnoise),bins=50,histtype='step')
I have also changed the number of bins countless times to see if that helps, and tried normalising the histogram, but nothing works.
Image of the output of the code can be seen here
The problem is that spectrumnoise is a list of arrays, not a numpy.ndarray. When you hand hist a list of arrays as its first argument, it treats each element as a separate dataset to plot. All the bins have the same height because each 'dataset' in the list has only one value in it!
From the hist docstring:
Multiple data can be provided via x as a list of datasets
of potentially different length ([x0, x1, ...]), or as
a 2-D ndarray in which each column is a dataset.
Try converting spectrumnoise to a 1D array:
mpl.hist(np.vstack(spectrumnoise).ravel(), 50)
As an aside, looking at your code there's absolutely no reason to convert your data to lists in the first place. What you ought to do is operate directly on slices in your array, e.g.:
data[20:40] += y1
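To check the flattening step without plotting, here is a small sketch that uses np.histogram in place of hist. The spectrumnoise list is a hypothetical stand-in (a sine sampled on x, stored as one-element arrays), since the asker's actual function is not shown.

```python
import numpy as np

x = np.arange(0.1, 20.1, 0.1)
# Hypothetical stand-in for the asker's data: a list of one-element arrays.
spectrumnoise = [np.array([np.sin(v)]) for v in x]

# Stack into a column and flatten to a true 1-D array.
flat = np.vstack(spectrumnoise).ravel()

# Same binning that hist would apply to a single dataset.
counts, edges = np.histogram(flat, bins=50)
```

With the data flattened, the counts vary from bin to bin, whereas the list of one-element arrays is treated as many single-value datasets.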
