Creating a non trivial view of numpy array - python

TL;DR:
I am looking for a way to get a non trivial, and in particular non contigous, view of a numpy ndarray.
E.g., given a 1D ndarray, x = np.array([1, 2, 3, 4]), is there a way to get a non trivial view of it, e.g. np.array([2, 4, 3, 1])?
Longer Version
The context of the question is the following: I have a 4D ndarray of shape (U, V, S, T) which I would like to reshape to a 2D ndarray of shape (U*S, V*T)in a non-trivial way, i.e. a simple np.reshape()does not do the trick as I have a more complex indexing scheme in mind, in which the reshaped array will not be contigous in memory. The arrays in my case are rather large and I would like to get a view and not a copy of the array.
Example
Given an array x(u, v, s, t)of shape (2, 2, 2, 2):
x = np.array([[[[1, 1], [1, 1]],[[2, 2], [2, 2]]],
[[[3, 3], [3, 3]], [[4, 4], [4, 4]]]])
I would like to get the view z(a, b) of the array:
np.array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
This corresponds to a indexing scheme of a = u * S + s and b = v * T + t, where in this case S = 2 = T.
What I have tried
Various approaches using np.reshape or even as_strided. Doing standard reshaping will not change the order of elements as they appear in the memory. I tried playing around with order='F' and transposing a bit but had no idea which gave me the correct result.
Since I know the indexing scheme, I tried to operate on the flattened view of the array using np.ravel(). My idea was to create an array of indices follwing the desired indexing scheme and apply it to the flattened array view, but unfortunately, fancy/advanced indexing gives a copy of the array, not a view.
Question
Is there any way to achieve the indexing view that I'm looking for?
In principle, I think this should be possible, as for example ndarray.sort() performs an in place non-trivial indexing of the array. On the other hand, this is probably implemented in C/C++, so it might even not be possible in pure Python?

Let's review the basics of an array - it has a flat data buffer, a shape, strides, and dtype. Those three attributes are used to view the elements of the data buffer in a particular way, whether it is a simple 1d sequence, 2d or higher dimensions.
A true view than use the same data buffer, but applies different shape, strides or dtype to it.
To get [2, 4, 3, 1] from [1,2,3,4] requires starting at 2, jumping forward 2, then skipping back to 1 and forward 2. That's not a regular pattern that can be represented by strides.
arr[1::2] gives the [2,4], and arr[0::2] gives the [1,3].
(U, V, S, T) to (U*S, V*T) requires a transpose to (U, S, V, T), followed by a reshape
arr.transpose(0,2,1,3).reshape(U*S, V*T)
That will require a copy, no way around that.
In [227]: arr = np.arange(2*3*4*5).reshape(2,3,4,5)
In [230]: arr1 = arr.transpose(0,2,1,3).reshape(2*4, 3*5)
In [231]: arr1.shape
Out[231]: (8, 15)
In [232]: arr1
Out[232]:
array([[ 0, 1, 2, 3, 4, 20, 21, 22, 23, 24, 40, 41, 42,
43, 44],
[ 5, 6, 7, 8, 9, 25, 26, 27, 28, 29, 45, 46, 47,
48, 49],
....)
Or with your x
In [234]: x1 = x.transpose(0,2,1,3).reshape(4,4)
In [235]: x1
Out[235]:
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
Notice that the elements are in a different order:
In [254]: x.ravel()
Out[254]: array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
In [255]: x1.ravel()
Out[255]: array([1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4])
ndarray.sort is in-place and changes the order of bytes in the data buffer. It is operating at a low level that we don't have access to. It isn't a view of the original array.

Related

Unable to expand numpy array

I have an array containing information about images. It contains information about 21495 images in an array named 'shuffled'.
np.shape(shuffled) = (21495, 1)
np.shape(shuffled[0]) = (1,)
np.shape(shuffled[0][0]) = (128, 128, 3) # (These are the image dimensions, with 3 channels of RGB)
How do I convert this array to an array of shape (21495, 128, 128, 3) to feed to my model?
There are 2 ways that I can think of:
One is using the vstack() fucntion of numpy, but it gets quite slow overtime when the size of array starts to increase.
Another way (which I use) is to take an empty list and keep appending the images array to that list using .append(), then finally convert that list to a numpy array.
Try
np.stack(shuffled[:,0])
stack, a form of concatenate, joins a list (or array) of arrays on a new initial dimension. We need to get get rid of the size 1 dimension first.
In [23]: arr = np.empty((4,1),object)
In [24]: for i in range(4): arr[i,0] = np.arange(i,i+6).reshape(2,3)
In [25]: arr
Out[25]:
array([[array([[0, 1, 2],
[3, 4, 5]])],
[array([[1, 2, 3],
[4, 5, 6]])],
[array([[2, 3, 4],
[5, 6, 7]])],
[array([[3, 4, 5],
[6, 7, 8]])]], dtype=object)
In [26]: arr.shape
Out[26]: (4, 1)
In [27]: arr[0,0].shape
Out[27]: (2, 3)
In [28]: np.stack(arr[:,0])
Out[28]:
array([[[0, 1, 2],
[3, 4, 5]],
[[1, 2, 3],
[4, 5, 6]],
[[2, 3, 4],
[5, 6, 7]],
[[3, 4, 5],
[6, 7, 8]]])
In [29]: _.shape
Out[29]: (4, 2, 3)
But beware, if the subarrays differ in shape, say one or two is b/w rather than 3 channel, this won't work.

Numpy.where used with list of values

I have a 2d and 1d array. I am looking to find the two rows that contain at least once the values from the 1d array as follows:
import numpy as np
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
results = []
for elem in B:
results.append(np.where(A==elem)[0])
This works and results in the following array:
[array([0, 5], dtype=int64),
array([0, 3], dtype=int64),
array([2, 4], dtype=int64),
array([0, 2], dtype=int64)]
But this is probably not the best way of proceeding. Following the answers given in this question (Search Numpy array with multiple values) I tried the following solutions:
out1 = np.where(np.in1d(A, B))
num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0
out2 = A[A == num_arr[idx]]
But these give me incorrect values:
In [36]: out1
Out[36]: (array([ 0, 1, 2, 6, 8, 9, 13, 17], dtype=int64),)
In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])
Thanks for your help
If you need to know whether each row of A contains ANY element of array B without interest in which particular element of B it is, the following script can be used:
input:
np.isin(A,B).sum(axis=1)>0
output:
array([ True, False, True, True, True, True])
Since you're dealing with a 2D array* you can use broadcasting to compare B with raveled version of A. This will give you the respective indices in a raveled shape. Then you can reverse the result and get the corresponding indices in original array using np.unravel_index.
In [50]: d = np.where(B[:, None] == A.ravel())[1]
In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
^
# expected result
* From documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance.
Also, Broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
Is something like this what you are looking for?
import numpy as np
from itertools import combinations
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
for i in combinations(A, 2):
if np.all(np.isin(B, np.hstack(i))):
print(i[0], ' ', i[1])
which prints the following:
[0 3 1] [2 7 3]
[0 3 1] [6 2 7]
note: this solution does NOT require the rows be consecutive. Please let me know if that is required.

how to roll two arrays of diffeent dimesnions into one dimensional array in python

I have two arrays (a,b) of different mXn dimensions
I need to know that how can I roll these two arrays into a single one dimensional array
I used np.flatten() for both a,b array and then rolled them into a single array but what i get is an array containg two one dimensional array(a,b)
a = np.array([[1,2,3,4],[3,4,5,6],[4,5,6,7]]) #3x4 array
b = np.array([ [1,2],[2,3],[3,4],[4,5],[5,6]]) #5x2 array
result = [a.flatten(),b.flatten()]
print(result)
[array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7]), array([1, 2, 2, 3, ... 5, 6])]
In matlab , I would do it like this :
res = [a(:);b(:)]
Also, how can I retrieve a and b back from the result?
Use ravel + concatenate:
>>> np.concatenate((a.ravel(), b.ravel()))
array([1, 2, 3, 4, 3, 4, 5, 6, 4, 5, 6, 7, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6])
ravel returns a 1D view of the arrays, and is a cheap operation. concatenate joins the views together, returning a new array.
As an aside, if you want to be able to retrieve these arrays back, you'll need to store their shapes in some variable.
i = a.shape
j = b.shape
res = np.concatenate((a.ravel(), b.ravel()))
Later, to retrieve a and b from res,
a = res[:np.prod(i)].reshape(i)
b = res[np.prod(i):].reshape(j)
a
array([[1, 2, 3, 4],
[3, 4, 5, 6],
[4, 5, 6, 7]])
b
array([[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6]])
How about changing the middle line to:
result = [a.flatten(),b.flatten()].flatten()
Or even more simply (if you know there's always exactly 2 arrays)
result = a.flatten() + b.flatten()

How to use numpy as_strided (from np.stride_tricks) correctly?

I'm trying to reshape a numpy array using numpy.strided_tricks. This is the guide I'm following: https://stackoverflow.com/a/2487551/4909087
My use case is very similar, with the difference being that I need strides of 3.
Given this array:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
I'd like to get:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Here's what I tried:
import numpy as np
as_strided = np.lib.stride_tricks.as_strided
a = np.arange(1, 10)
as_strided(a, (len(a) - 2, 3), (3, 3))
array([[ 1, 2199023255552, 131072],
[ 2199023255552, 131072, 216172782113783808],
[ 131072, 216172782113783808, 12884901888],
[216172782113783808, 12884901888, 768],
[ 12884901888, 768, 1125899906842624],
[ 768, 1125899906842624, 67108864],
[ 1125899906842624, 67108864, 4]])
I was pretty sure I'd followed the example to a T, but evidently not. Where am I going wrong?
The accepted answer (and discussion) is good, but for the benefit of readers who don't want to run their own test case, I'll try to illustrate what's going on:
In [374]: a = np.arange(1,10)
In [375]: as_strided = np.lib.stride_tricks.as_strided
In [376]: a.shape
Out[376]: (9,)
In [377]: a.strides
Out[377]: (4,)
For a contiguous 1d array, strides is the size of the element, here 4 bytes, an int32. To go from one element to the next it steps forward 4 bytes.
What the OP tried:
In [380]: as_strided(a, shape=(7,3), strides=(3,3))
Out[380]:
array([[ 1, 512, 196608],
[ 512, 196608, 67108864],
[ 196608, 67108864, 4],
[ 67108864, 4, 1280],
[ 4, 1280, 393216],
[ 1280, 393216, 117440512],
[ 393216, 117440512, 7]])
This is stepping by 3 bytes, crossing int32 boundaries, and giving mostly unintelligable numbers. If might make more sense if the dtype had been bytes or uint8.
Instead using a.strides*2 (tuple replication), or (4,4) we get the desired array:
In [381]: as_strided(a, shape=(7,3), strides=(4,4))
Out[381]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Columns and rows both step one element, resulting in a 1 step moving window. We could have also set shape=(3,7), 3 windows 7 elements long.
In [382]: _.strides
Out[382]: (4, 4)
Changing strides to (8,4) steps 2 elements for each window.
In [383]: as_strided(a, shape=(7,3), strides=(8,4))
Out[383]:
array([[ 1, 2, 3],
[ 3, 4, 5],
[ 5, 6, 7],
[ 7, 8, 9],
[ 9, 25, -1316948568],
[-1316948568, 184787224, -1420192452],
[-1420192452, 0, 0]])
But shape is off, showing us bytes off the end of the original databuffer. That could be dangerous (we don't know if those bytes belong to some other object or array). With this size of array we don't get a full set of 2 step windows.
Now step 3 elements for each row (3*4, 4):
In [384]: as_strided(a, shape=(3,3), strides=(12,4))
Out[384]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [385]: a.reshape(3,3).strides
Out[385]: (12, 4)
This is the same shape and strides as a 3x3 reshape.
We can set negative stride values and 0 values. In fact, negative-step slicing along a dimension with a positive stride will give a negative stride, and broadcasting works by setting 0 strides:
In [399]: np.broadcast_to(a, (2,9))
Out[399]:
array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [400]: _.strides
Out[400]: (0, 4)
In [401]: a.reshape(3,3)[::-1,:]
Out[401]:
array([[7, 8, 9],
[4, 5, 6],
[1, 2, 3]])
In [402]: _.strides
Out[402]: (-12, 4)
However, negative strides require adjusting which element of the original array is the first element of the view, and as_strided has no parameter for that.
I have no idea why you think you need strides of 3. You need strides the distance in bytes between one element of a and the next, which you can get using a.strides:
as_strided(a, (len(a) - 2, 3), a.strides*2)
I was trying to do a similar operation and run into the same problem.
In your case, as stated in this comment, the problems were:
You were not taking into account the size of your element when stored in memory (int32 = 4, which can be checked using a.dtype.itemsize).
You didn't specify appropriately the number of strides you had to skip, which in your case were also 4, as you were skipping only one element.
I made myself a function based on this answer, in which I compute the segmentation of a given array, using a window of n-elements and specifying the number of elements to overlap (given by window - number_of_elements_to_skip).
I share it here in case someone else needs it, since it took me a while to figure out how stride_tricks work:
def window_signal(signal, window, overlap):
"""
Windowing function for data segmentation.
Parameters:
------------
signal: ndarray
The signal to segment.
window: int
Window length, in samples.
overlap: int
Number of samples to overlap
Returns:
--------
nd-array
A copy of the signal array with shape (rows, window),
where row = (N-window)//(window-overlap) + 1
"""
N = signal.reshape(-1).shape[0]
if (window == overlap):
rows = N//window
overlap = 0
else:
rows = (N-window)//(window-overlap) + 1
miss = (N-window)%(window-overlap)
if(miss != 0):
print('Windowing led to the loss of ', miss, ' samples.')
item_size = signal.dtype.itemsize
strides = (window - overlap) * item_size
return np.lib.stride_tricks.as_strided(signal, shape=(rows, window),
strides=(strides, item_size))
The solution for this case is, according to your code:
as_strided(a, (len(a) - 2, 3), (4, 4))
Alternatively, using the function window_signal:
window_signal(a, 3, 2)
Both return as output the following array:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])

Losing dimensions of a numpy array

I have a numpy array that consists of lists each containing more lists. I have been trying to figure out a smart and fast way to collapse the dimensions of these list using numpy, but without any luck.
What I have looks like this:
>>> np.shape(projected)
(13,)
>>> for i in range(len(projected)):
print np.shape(projected[i])
(130, 3200)
(137, 3200)
.
.
(307, 3200)
(196, 3200)
What I am trying to get is a list that contains all the sub-lists and would be 130+137+..+307+196 long. I have tried using np.reshape() but it gives an error: ValueError: total size of new array must be unchanged
np.reshape(projected,(total_number_of_lists, 3200))
>> ValueError: total size of new array must be unchanged
I have been fiddling around with np.vstack but to no avail. Any help that does not contain a for loop and an .append() would be highly appreciated.
It seems you can just use np.concatenate along the first axis axis=0 like so -
np.concatenate(projected,0)
Sample run -
In [226]: # Small random input list
...: projected = [[[3,4,1],[5,3,0]],
...: [[0,2,7],[8,2,8],[7,3,6],[1,9,0],[4,2,6]],
...: [[0,2,7],[8,2,8],[7,3,6]]]
In [227]: # Print nested lists shapes
...: for i in range(len(projected)):
...: print (np.shape(projected[i]))
...:
(2, 3)
(5, 3)
(3, 3)
In [228]: np.concatenate(projected,0)
Out[228]:
array([[3, 4, 1],
[5, 3, 0],
[0, 2, 7],
[8, 2, 8],
[7, 3, 6],
[1, 9, 0],
[4, 2, 6],
[0, 2, 7],
[8, 2, 8],
[7, 3, 6]])
In [232]: np.concatenate(projected,0).shape
Out[232]: (10, 3)

Categories