Resample and resize numpy array - python

I would like to resample a numpy array as suggested in "Resampling a numpy array representing an image". However, that approach resamples by a single factor, i.e.
import numpy as np
import scipy.ndimage
x = np.arange(9).reshape(3,3)
print(scipy.ndimage.zoom(x, 2, order=1))
creates an array of shape (6, 6). How can I resample an array to its best approximation within a shape such as (4, 6), (6, 8) or (6, 10)?

Instead of passing a single number to the zoom parameter, give a sequence:
scipy.ndimage.zoom(x, zoom=(1.5, 2.), order=1)
#array([[0, 0, 1, 1, 2, 2],
# [2, 2, 3, 3, 4, 4],
# [4, 4, 5, 5, 6, 6],
# [6, 6, 7, 7, 8, 8]])
With the sequences (2., 2.75) and (2., 3.5) you will get output arrays with shapes (6, 8) and (6, 10), respectively.
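When only the target shape is known, the zoom factors can be derived from it rather than picked by hand. The following is a minimal sketch, assuming a hypothetical helper named zoom_factors (it is not part of SciPy) and that SciPy is installed for the final call:

```python
import numpy as np

def zoom_factors(in_shape, out_shape):
    """Per-axis zoom factors mapping in_shape onto out_shape (hypothetical helper)."""
    return tuple(o / i for i, o in zip(in_shape, out_shape))

x = np.arange(9).reshape(3, 3)
factors = zoom_factors(x.shape, (6, 10))  # one factor per axis

try:
    from scipy import ndimage
    resized = ndimage.zoom(x, zoom=factors, order=1)
except ImportError:
    resized = None  # SciPy not available; only the factors are computed

if resized is not None:
    print(resized.shape)
```

Because zoom rounds the product of input size and factor to get each output dimension, deriving the factor as target / input should land on the requested shape without trial and error.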

Related

What is the fill order of Numpy fromfile for a 2-D ndarray?

I'm trying to read a structured binary file using the numpy.fromfile() function. In my case, I have a numpy.dtype() which is used to define a user defined data type to use with np.fromfile().
I will reproduce the relevant part of the data structure here (the full structure is rather long):
('RawData', np.int32, (2, BlockSize))
This reads BlockSize*2 int32 values into the field RawData and produces a 2 x BlockSize matrix. This is where I am having trouble, because I want to replicate the behavior of Matlab's fread() function, in which the matrix is filled in column order. The documentation for NumPy's fromfile() doesn't mention the fill order (at least I couldn't find it).
It doesn't matter whether NumPy's fromfile() works like Matlab's fread(), but I have to know how fromfile() works in order to code accordingly.
Now, the question is, what is the fill order of a 2-D array in the NumPy fromfile() function when using a custom data type?
fromfile and tofile read/write flat, 1d, arrays:
In [204]: x = np.arange(1,11).astype('int32')
In [205]: x.tofile('data615')
fromfile returns a 1d array:
In [206]: np.fromfile('data615',np.int32)
Out[206]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=int32)
x.reshape(2,5).tofile(...) would save the same thing. tofile does not save dtype or shape information.
reshaped to 2d, the default order is 'C':
In [207]: np.fromfile('data615',np.int32).reshape(2,5)
Out[207]:
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]], dtype=int32)
but it can be changed to MATLAB like:
In [208]: np.fromfile('data615',np.int32).reshape(2,5, order='F')
Out[208]:
array([[ 1, 3, 5, 7, 9],
[ 2, 4, 6, 8, 10]], dtype=int32)
The underlying databuffer is the same, just a 1d array of bytes.
edit
The file could be read as a 2 integer structure:
In [249]: np.fromfile('data615','i4,i4')
Out[249]:
array([(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [250]: _['f0']
Out[250]: array([1, 3, 5, 7, 9], dtype=int32)
It's still a 1d array, but with numbers grouped by 2s.
Converting to complex:
In [252]: xx = np.fromfile('data615','i4,i4')
In [253]: xx['f0']+1j*xx['f1']
Out[253]: array([1. +2.j, 3. +4.j, 5. +6.j, 7. +8.j, 9.+10.j])
In [254]: _.dtype
Out[254]: dtype('complex128')
If the data had been saved as floats, we could load them as complex directly:
In [255]: x.astype(np.float32).tofile('data615f')
In [257]: xx = np.fromfile('data615f',np.complex64)
In [258]: xx
Out[258]: array([1. +2.j, 3. +4.j, 5. +6.j, 7. +8.j, 9.+10.j], dtype=complex64)
Another way to get the complex from the integer sequence:
In [261]: np.fromfile('data615', np.int32).reshape(5,2)
Out[261]:
array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10]], dtype=int32)
In [262]: xx = np.fromfile('data615', np.int32).reshape(5,2)
In [263]: xx[:,0]+1j*xx[:,1]
Out[263]: array([1. +2.j, 3. +4.j, 5. +6.j, 7. +8.j, 9.+10.j])
By default, when creating a new 2-d array, NumPy will use "C" ordering, which is row-major. That is the opposite of the order used by Matlab.
For example, if BlockSize is 4, and the raw data is
0 1 2 3 4 5 6 7
then the 2 x 4 array will be
[[0, 1, 2, 3],
[4, 5, 6, 7]]
With Matlab and that same raw data, the 2 x 4 array would be
[[0, 2, 4, 6],
[1, 3, 5, 7]]
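The same comparison can be made with the structured dtype from the question. This is a small sketch, assuming a block size of 4 and a temporary file created only for the demonstration (the file name and variable names are illustrative):

```python
import os
import tempfile
import numpy as np

block_size = 4  # stands in for BlockSize from the question
record = np.dtype([("RawData", np.int32, (2, block_size))])

# Write eight consecutive int32 values, then read them back as one record.
path = os.path.join(tempfile.mkdtemp(), "raw.bin")
np.arange(2 * block_size, dtype=np.int32).tofile(path)
data = np.fromfile(path, dtype=record)

c_filled = data["RawData"][0]  # sub-array field is filled row by row (C order)
# Re-interpret the same eight values column by column, as Matlab's fread would.
f_filled = c_filled.reshape(-1).reshape((2, block_size), order="F")

print(c_filled)
print(f_filled)
```

An alternative with the same effect would be to declare the field as (BlockSize, 2) and transpose it after reading.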

Unable to expand numpy array

I have an array containing information about images. It contains information about 21495 images in an array named 'shuffled'.
np.shape(shuffled) = (21495, 1)
np.shape(shuffled[0]) = (1,)
np.shape(shuffled[0][0]) = (128, 128, 3) # (These are the image dimensions, with 3 channels of RGB)
How do I convert this array to an array of shape (21495, 128, 128, 3) to feed to my model?
There are 2 ways that I can think of:
One is using the vstack() function of numpy, but it gets quite slow over time as the size of the array increases.
Another way (which I use) is to take an empty list, keep appending the image arrays to it using .append(), and finally convert that list to a numpy array.
Try
np.stack(shuffled[:,0])
stack, a form of concatenate, joins a list (or array) of arrays on a new initial dimension. We need to get rid of the size 1 dimension first.
In [23]: arr = np.empty((4,1),object)
In [24]: for i in range(4): arr[i,0] = np.arange(i,i+6).reshape(2,3)
In [25]: arr
Out[25]:
array([[array([[0, 1, 2],
[3, 4, 5]])],
[array([[1, 2, 3],
[4, 5, 6]])],
[array([[2, 3, 4],
[5, 6, 7]])],
[array([[3, 4, 5],
[6, 7, 8]])]], dtype=object)
In [26]: arr.shape
Out[26]: (4, 1)
In [27]: arr[0,0].shape
Out[27]: (2, 3)
In [28]: np.stack(arr[:,0])
Out[28]:
array([[[0, 1, 2],
[3, 4, 5]],
[[1, 2, 3],
[4, 5, 6]],
[[2, 3, 4],
[5, 6, 7]],
[[3, 4, 5],
[6, 7, 8]]])
In [29]: _.shape
Out[29]: (4, 2, 3)
But beware: if the subarrays differ in shape, say one or two are black-and-white rather than 3-channel, this won't work.
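To get a clearer error when the shapes do differ, they can be checked before stacking. A minimal sketch, assuming a hypothetical helper named stack_images and a small stand-in for the shuffled array:

```python
import numpy as np

def stack_images(column):
    """Stack an (n, 1) object array of equally shaped arrays (hypothetical helper)."""
    images = column[:, 0]                 # drop the size-1 dimension
    shapes = {im.shape for im in images}  # distinct shapes present
    if len(shapes) != 1:
        raise ValueError(f"cannot stack arrays of different shapes: {sorted(shapes)}")
    return np.stack(images)

# Small stand-in for 'shuffled': 4 "images" of shape (2, 3, 3).
shuffled = np.empty((4, 1), dtype=object)
for i in range(4):
    shuffled[i, 0] = np.full((2, 3, 3), i)

batch = stack_images(shuffled)
print(batch.shape)  # (4, 2, 3, 3)
```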

Create a 1D array of 1D arrays in Python

How can I create a 1D array of 1D array in python? That is, something like:
a = [array([0]) array([1]) array([2]) array([3])]
If I create a list of arrays and cast it, I obtain a matrix:
a = [array([1]), array([2])]
b = np.asarray(a)
then b.shape = (2,1), but if I reshape it:
c = b.reshape(2)
then c = array([1, 2]), which is an array of ints.
Is there any way to avoid this? It is worth noting that the inner arrays have shape (1,).
Ok, found. The solution is to create an empty array with dtype object and assign there a list of arrays.
a = [array([1]), array([2])]
b = np.empty(len(a), dtype=object)
b[:] = a
And now b = array([array([1]), array([2])], dtype=object)
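The same idea can be wrapped in a small function. This sketch assumes a hypothetical helper name, as_object_array, and fills the array element by element so that it does not depend on how NumPy interprets a sequence assigned to a slice:

```python
import numpy as np

def as_object_array(arrays):
    """Return a 1-D object array whose elements are the given arrays (hypothetical helper)."""
    out = np.empty(len(arrays), dtype=object)
    for i, arr in enumerate(arrays):
        out[i] = arr  # store the array itself, not its contents
    return out

b = as_object_array([np.array([1]), np.array([2])])
print(b.shape)     # (2,)
print(b[0].shape)  # (1,)
```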
Do you mean something like this:
ans = [np.array([i]) for i in range(4)]
print (ans)
Output
[array([0]), array([1]), array([2]), array([3])]
You can either have a matrix-like list, when the shape of the arrays is all the same:
matrix_like_list = np.array([np.arange(10) for i in range(3)])
>>> array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
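with shape (3, 10), or you can have an array of arrays (dtype=object) when the size of at least one array is different:
list_of_arrays = np.array([np.arange(np.random.randint(10)) for i in range(3)], dtype=object)
>>> array([array([0, 1, 2, 3, 4, 5]), array([0, 1, 2, 3, 4, 5, 6]),
array([0, 1, 2, 3, 4, 5, 6])], dtype=object)
The resulting object will have shape (3,). Recent NumPy versions raise an error for ragged input unless dtype=object is passed explicitly.
When all inner arrays have the same shape, np.array produces the matrix form. To get shape (3,) in that case, fill a preallocated object array, as in the answer above.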

How to use numpy as_strided (from np.lib.stride_tricks) correctly?

I'm trying to reshape a numpy array using numpy.lib.stride_tricks. This is the guide I'm following: https://stackoverflow.com/a/2487551/4909087
My use case is very similar, with the difference being that I need strides of 3.
Given this array:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
I'd like to get:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Here's what I tried:
import numpy as np
as_strided = np.lib.stride_tricks.as_strided
a = np.arange(1, 10)
as_strided(a, (len(a) - 2, 3), (3, 3))
array([[ 1, 2199023255552, 131072],
[ 2199023255552, 131072, 216172782113783808],
[ 131072, 216172782113783808, 12884901888],
[216172782113783808, 12884901888, 768],
[ 12884901888, 768, 1125899906842624],
[ 768, 1125899906842624, 67108864],
[ 1125899906842624, 67108864, 4]])
I was pretty sure I'd followed the example to a T, but evidently not. Where am I going wrong?
The accepted answer (and discussion) is good, but for the benefit of readers who don't want to run their own test case, I'll try to illustrate what's going on:
In [374]: a = np.arange(1,10)
In [375]: as_strided = np.lib.stride_tricks.as_strided
In [376]: a.shape
Out[376]: (9,)
In [377]: a.strides
Out[377]: (4,)
For a contiguous 1d array, strides is the size of the element, here 4 bytes (an int32; on platforms where the default integer is int64 it would be 8). To go from one element to the next it steps forward 4 bytes.
What the OP tried:
In [380]: as_strided(a, shape=(7,3), strides=(3,3))
Out[380]:
array([[ 1, 512, 196608],
[ 512, 196608, 67108864],
[ 196608, 67108864, 4],
[ 67108864, 4, 1280],
[ 4, 1280, 393216],
[ 1280, 393216, 117440512],
[ 393216, 117440512, 7]])
This is stepping by 3 bytes, crossing int32 boundaries, and giving mostly unintelligible numbers. It might make more sense if the dtype had been bytes or uint8.
Instead, using a.strides*2 (tuple replication), or (4,4), we get the desired array:
In [381]: as_strided(a, shape=(7,3), strides=(4,4))
Out[381]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Columns and rows both step one element, resulting in a 1 step moving window. We could have also set shape=(3,7), 3 windows 7 elements long.
In [382]: _.strides
Out[382]: (4, 4)
Changing strides to (8,4) steps 2 elements for each window.
In [383]: as_strided(a, shape=(7,3), strides=(8,4))
Out[383]:
array([[ 1, 2, 3],
[ 3, 4, 5],
[ 5, 6, 7],
[ 7, 8, 9],
[ 9, 25, -1316948568],
[-1316948568, 184787224, -1420192452],
[-1420192452, 0, 0]])
But the shape is off: the last rows show bytes past the end of the original data buffer. That could be dangerous (we don't know if those bytes belong to some other object or array). With this size of array we don't get a full set of 2 step windows.
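The number of complete windows for a given step can be computed before calling as_strided, so that the view never reaches past the buffer. A minimal sketch (the names window, step and rows are illustrative):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(1, 10)
window, step = 3, 2

# Number of windows that fit entirely inside the buffer.
rows = (len(a) - window) // step + 1  # (9 - 3) // 2 + 1 = 4
item = a.itemsize                     # 4 or 8 bytes, depending on the platform

windows = as_strided(a, shape=(rows, window), strides=(step * item, item))
print(windows)
```

Using a.itemsize instead of a literal 4 keeps the strides correct whether the default integer is int32 or int64.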
Now step 3 elements for each row (3*4, 4):
In [384]: as_strided(a, shape=(3,3), strides=(12,4))
Out[384]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [385]: a.reshape(3,3).strides
Out[385]: (12, 4)
This is the same shape and strides as a 3x3 reshape.
We can set negative stride values and 0 values. In fact, negative-step slicing along a dimension with a positive stride will give a negative stride, and broadcasting works by setting 0 strides:
In [399]: np.broadcast_to(a, (2,9))
Out[399]:
array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [400]: _.strides
Out[400]: (0, 4)
In [401]: a.reshape(3,3)[::-1,:]
Out[401]:
array([[7, 8, 9],
[4, 5, 6],
[1, 2, 3]])
In [402]: _.strides
Out[402]: (-12, 4)
However, negative strides require adjusting which element of the original array is the first element of the view, and as_strided has no parameter for that.
I have no idea why you think you need strides of 3. The strides must be the distance in bytes between one element of a and the next, which you can get using a.strides:
as_strided(a, (len(a) - 2, 3), a.strides*2)
I was trying to do a similar operation and ran into the same problem.
In your case, as stated in this comment, the problems were:
You were not taking into account the size of an element when stored in memory (int32 = 4 bytes, which can be checked using a.dtype.itemsize).
You didn't specify the row stride correctly: it is also 4 bytes in your case, since each window advances by only one element.
I wrote a function based on this answer that segments a given array using a window of n elements and a specified number of elements to overlap (the step between windows is window - overlap).
I share it here in case someone else needs it, since it took me a while to figure out how stride_tricks works:
import numpy as np

def window_signal(signal, window, overlap):
    """
    Windowing function for data segmentation.

    Parameters:
    ------------
    signal: ndarray
        The signal to segment (1-D and contiguous).
    window: int
        Window length, in samples.
    overlap: int
        Number of samples to overlap.

    Returns:
    --------
    ndarray
        A view of the signal array (not a copy) with shape (rows, window),
        where rows = (N-window)//(window-overlap) + 1
    """
    N = signal.reshape(-1).shape[0]
    if window == overlap:
        # Treat full overlap as non-overlapping, back-to-back windows.
        rows = N // window
        overlap = 0
    else:
        rows = (N - window) // (window - overlap) + 1
        miss = (N - window) % (window - overlap)
        if miss != 0:
            print('Windowing led to the loss of ', miss, ' samples.')
    item_size = signal.dtype.itemsize
    # Byte step between the starts of consecutive windows.
    strides = (window - overlap) * item_size
    return np.lib.stride_tricks.as_strided(signal, shape=(rows, window),
                                           strides=(strides, item_size))
The solution for this case is, according to your code:
as_strided(a, (len(a) - 2, 3), (4, 4))
Alternatively, using the function window_signal:
window_signal(a, 3, 2)
Both return as output the following array:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
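For readers on NumPy 1.20 or later, the same windows can be obtained without computing byte strides by hand. This is a short sketch using numpy.lib.stride_tricks.sliding_window_view, offered as an alternative to the as_strided calls above rather than as what the answers used:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1, 10)

windows = sliding_window_view(a, 3)  # shape (7, 3), a read-only view
every_other = windows[::2]           # windows that start 2 elements apart

print(windows)
print(every_other)
```

Since the function derives the shape and strides from the array itself, it cannot read past the end of the buffer the way a mis-sized as_strided call can.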

Losing dimensions of a numpy array

I have a numpy array that consists of lists, each containing more lists. I have been trying to figure out a smart and fast way to collapse the dimensions of these lists using numpy, but without any luck.
What I have looks like this:
>>> np.shape(projected)
(13,)
>>> for i in range(len(projected)):
...     print(np.shape(projected[i]))
(130, 3200)
(137, 3200)
.
.
(307, 3200)
(196, 3200)
What I am trying to get is a list that contains all the sub-lists and would be 130+137+..+307+196 long. I have tried using np.reshape() but it gives an error: ValueError: total size of new array must be unchanged
np.reshape(projected,(total_number_of_lists, 3200))
>> ValueError: total size of new array must be unchanged
I have been fiddling around with np.vstack but to no avail. Any help that does not contain a for loop and an .append() would be highly appreciated.
It seems you can just use np.concatenate along the first axis (axis=0), like so -
np.concatenate(projected,0)
Sample run -
In [226]: # Small random input list
...: projected = [[[3,4,1],[5,3,0]],
...: [[0,2,7],[8,2,8],[7,3,6],[1,9,0],[4,2,6]],
...: [[0,2,7],[8,2,8],[7,3,6]]]
In [227]: # Print nested lists shapes
...: for i in range(len(projected)):
...: print (np.shape(projected[i]))
...:
(2, 3)
(5, 3)
(3, 3)
In [228]: np.concatenate(projected,0)
Out[228]:
array([[3, 4, 1],
[5, 3, 0],
[0, 2, 7],
[8, 2, 8],
[7, 3, 6],
[1, 9, 0],
[4, 2, 6],
[0, 2, 7],
[8, 2, 8],
[7, 3, 6]])
In [232]: np.concatenate(projected,0).shape
Out[232]: (10, 3)
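The question describes projected as a numpy array of shape (13,) rather than a nested list. The sketch below assumes it is an object array of 2-D blocks sharing a column count, and also records where each block starts in the result; the names row_counts and starts are illustrative:

```python
import numpy as np

# Stand-in for 'projected': an object array of 2-D blocks with 3 columns each.
row_counts = [2, 5, 3]
projected = np.empty(len(row_counts), dtype=object)
for i, n in enumerate(row_counts):
    projected[i] = np.full((n, 3), i)

flat = np.concatenate(projected, axis=0)

# Start row of each original block inside 'flat', in case the grouping is needed later.
starts = np.cumsum([0] + row_counts[:-1])

print(flat.shape)  # (10, 3)
print(starts)      # [0 2 7]
```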
