Selecting some dimensions from a multi-dimensional array - python

I have a 4-D array, and I need to process all 1-D vectors from this array along a given dimension. This works well:
def myfun(arr4d,selected_dim): # selected_dim can only be 2 or 3
print arr4d.shape # (2, 5, 10, 10)
for i in xrange(arr4d.shape[0]):
for j in xrange(arr4d.shape[1]):
for k in xrange(arr4d.shape[selected_dim]):
if selected_dim==2:
arr=arr4d[i,j,k,:]
elif selected_dim==3:
arr=arr4d[i,j,:,k]
do_something(arr) # arr is 1-D and has 10 items
...but I believe there is some way to avoid the nested "if" part, and maybe also be more efficient? Like creating other views of this array before the loops and then iterating through these views?

One common way to handle this is to use np.rollaxis:
def myfun(arr4d, selected_dim): # selected_dim can only be 2 or 3
arr4d = np.rollaxis(arr4d, selected_dim)
print arr4d.shape # (10, 2, 5, 10)
for i in xrange(arr4d.shape[1]):
for j in xrange(arr4d.shape[2]):
for k in xrange(arr4d.shape[0]):
arr=arr4d[k, i, j, :]
do_something(arr) # arr is 1-D and has 10 items
Note that np.rollaxis should return a view so it doesn't actually copy the array.

Related

numpy sum of each array in a list of arrays of different size

Given a list of numpy arrays, each of different length, as that obtained by doing lst = np.array_split(arr, indices), how do I get the sum of every array in the list? (I know how to do it using list-comprehension but I was hoping there was a pure-numpy way to do it).
I thought that this would work:
np.apply_along_axis(lambda arr: arr.sum(), axis=0, arr=lst)
But it doesn't, instead it gives me this error which I don't understand:
ValueError: operands could not be broadcast together with shapes (0,) (12,)
NB: It's an array of sympy objects.
There's a faster way which avoids np.split, and utilizes np.reduceat. We create an ascending array of indices where you want to sum elements with np.append([0], np.cumsum(indices)[:-1]). For proper indexing we need to put a zero in front (and discard the last element, if it covers the full range of the original array.. otherwise just delete the [:-1] indexing). Then we use the np.add ufunc with np.reduceat:
import numpy as np
arr = np.arange(1, 11)
indices = np.array([2, 4, 4])
# this should split like this
# [1 2 | 3 4 5 6 | 7 8 9 10]
np.add.reduceat(arr, np.append([0], np.cumsum(indices)[:-1]))
# array([ 3, 18, 34])

how to slice 2D numpy array based on 2 1D arrays containing initial and final indexes

I have a 2D numpy array, let's say it has shape 4x10 (4 rows and 10 columns). I have 2 1D arrays that have the initial and final indexes, so they are both 20x1. For an example, let's say
initial = [1, 2, 4, 5]
final = [3, 6, 8, 6]
then I'd like to get
data[0,1:3]
data[1,2:6]
data[2,4:8]
data[3,5:6]
Of course, each of those arrays would have different size, so I'd like to store them in a list.
If I were to do it with a for loop, it'd look like this:
arrays = []
for i in range(4):
slice = data[i,initial[i]:final[i]]
arrays.append(slice)
Is there a more efficient way to do this? I'd rather avoid use a for loop, because my actual data is huge.
You can use numpy.split with flattened data (using numpy.ndarray.flatten) and modifying the slices:
sections = np.column_stack([initial, final]).flatten()
sections[::2] += np.arange(len(initial)) * data.shape[1]
sections[1::2] += sections[::2] - np.array(initial)
np.split(data.flatten(), sections)[1::2]

create a list of list of N-dimensional numpy arrays

I want to create a list of list of 2x2 numpy arrays
array([[0, 0],
[1, 1]])
for example I want to fill a list with 8 of these arrays.
x = []
for j in range(9):
for i in np.random.randint(2, size=(2, 2)):
x.append([i])
this gives me a 1x1 array
z = iter(x)
next(z)
[array([0, 1])]
what am I missing here ?
You missed that are iterating over a 2x2 array 9 times. Each iteration yields a row of the array which is what you see when you look at the first element - the first row of the first matrix. Not only that, you append this row within a list, so you actually have 18 lists with a single element. What you want to do is append the matrix directly, with no inner loop and definitely no additional [] around, or better yet:
x = [np.random.randint(2, size=(2, 2)) for _ in range(9)]

How to convert list of numpy arrays into single numpy array?

Suppose I have ;
LIST = [[array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5],[1,2,3,4,5])] # inner lists are numpy arrays
I try to convert;
array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5])
I am solving it by iteration on vstack right now but it is really slow for especially large LIST
What do you suggest for the best efficient way?
In general you can concatenate a whole sequence of arrays along any axis:
numpy.concatenate( LIST, axis=0 )
but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3x5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality.
As Jorge's answer points out, there is also the function stack, introduced in numpy 1.10:
numpy.stack( LIST, axis=0 )
This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n-element 1D array becomes a 1-by-n 2D array) before concatenating. It will only work if all the input arrays have the same shape.
vstack (or equivalently row_stack) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:
numpy.vstack( LIST )
This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ] (note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST.
There is also an analogous function column_stack and shortcut c_[...], for horizontal (column-wise) stacking, as well as an almost-analogous function hstack—although for some reason the latter is less flexible (it is stricter about input arrays' dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).
Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:
numpy.array( LIST )
...because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.
Starting in NumPy version 1.10, we have the method stack. It can stack arrays of any dimension (all equal):
# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]
# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True
M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
np.all(M == L) # == False (Don't Panic)
# This are all true
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True
Enjoy,
I checked some of the methods for speed performance and find that there is no difference!
The only difference is that using some methods you must carefully check dimension.
Timing:
|------------|----------------|-------------------|
| | shape (10000) | shape (1,10000) |
|------------|----------------|-------------------|
| np.concat | 0.18280 | 0.17960 |
|------------|----------------|-------------------|
| np.stack | 0.21501 | 0.16465 |
|------------|----------------|-------------------|
| np.vstack | 0.21501 | 0.17181 |
|------------|----------------|-------------------|
| np.array | 0.21656 | 0.16833 |
|------------|----------------|-------------------|
As you can see I tried 2 experiments - using np.random.rand(10000) and np.random.rand(1, 10000)
And if we use 2d arrays than np.stack and np.array create additional dimension - result.shape is (1,10000,10000) and (10000,1,10000) so they need additional actions to avoid this.
Code:
from time import perf_counter
from tqdm import tqdm_notebook
import numpy as np
l = []
for i in tqdm_notebook(range(10000)):
new_np = np.random.rand(10000)
l.append(new_np)
start = perf_counter()
stack = np.stack(l, axis=0 )
print(f'np.stack: {perf_counter() - start:.5f}')
start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')
start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')
start = perf_counter()
l = [el.reshape(1,-1) for el in l]
conc = np.concatenate(l, axis=0 )
print(f'np.concatenate: {perf_counter() - start:.5f}')
Other solution is to use the asarray function:
numpy.asarray( LIST )
I found a much more robust numpy function reshape.
Problem of stack and vstack is, that it fails for an empty list.
>>> LIST = [np.array([1, 2, 3, 4, 5]), np.array([1, 2, 3, 4, 5]),np.array([1,2,3,4,5])]
>>> s = np.vstack(LIST)
>>> s.shape
(3, 5)
>>> s = np.vstack([])
ValueError: need at least one array to concatenate
The alternative is reshape
>>> s = np.reshape(LIST, (len(LIST),5))
>>> s.shape
(3, 5)
>>> LIST = []
>>> s = np.reshape(LIST, (len(LIST),5))
>>> s.shape
(0,5)
Drawback is, you need to know the length/shape of your inner array

Numpy: 2D array access with 2D array of indices

I have two arrays, one is a matrix of index pairs,
a = array([[[0,0],[1,1]],[[2,0],[2,1]]], dtype=int)
and another which is a matrix of data to access at these indices
b = array([[1,2,3],[4,5,6],[7,8,9]])
and I want to able to use the indices of a to get the entries of b. Just doing:
>>> b[a]
does not work, as it gives one row of b for each entry in a, i.e.
array([[[[1,2,3],
[1,2,3]],
[[4,5,6],
[4,5,6]]],
[[[7,8,9],
[1,2,3]],
[[7,8,9],
[4,5,6]]]])
when I would like to use the index pair in the last axis of a to give the two indices of b:
array([[1,5],[7,8]])
Is there a clean way of doing this, or do I need to reshape b and combine the columns of a in a corresponding manner?
In my actual problem a has about 5 million entries, and b is 100-by-100, I'd like to avoid for loops.
Actually, this works:
b[a[:, :, 0],a[:, :, 1]]
Gives array([[1, 5],
[7, 8]]).
For this case, this works
tmp = a.reshape(-1,2)
b[tmp[:,0], tmp[:,1]]
A more general solution, whenever you want to use a 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, in order to access elements of another 2D array of shape (n,k), named B:
# array of index offsets to be added to each row of inds
offset = np.arange(0, inds.size, inds.shape[1])
# numpy.take(B, C) "flattens" arrays B and C and selects elements from B based on indices in C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
You can test this by using, e.g.:
B = 1/(np.arange(n*m).reshape(n,-1) + 1)
inds = np.random.randint(0,B.shape[1],(B.shape[0],B.shape[1]))

Categories