Iterate over columns of a NumPy array and elements of another one? - python

I am trying to replicate the behaviour of zip(a, b) in order to be able to loop simultaneously along two NumPy arrays. In particular, I have two arrays a and b:
a.shape=(n,m)
b.shape=(m,)
On every iteration, I would like to get a column of a and an element of b.
So far, I have tried the following:
for a_column, b_element in np.nditer([a, b]):
    print(a_column)
However, this prints the element a[0,0] rather than the column a[:,0], which is what I want.
How can I solve this?

You can still use zip on numpy arrays, because they are iterables.
In your case, you'd need to transpose a first, to make it an array of shape (m,n), i.e. an iterable of length m:
for a_column, b_element in zip(a.T, b):
    ...
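As a minimal sketch of the zip approach, assuming small made-up arrays (the values of a and b and the pairs list are only for illustration):

```python
import numpy as np

n, m = 3, 4
a = np.arange(n * m).reshape(n, m)   # shape (n, m)
b = np.arange(m) * 10                # shape (m,)

# a.T has shape (m, n), so iterating over it yields the columns of a.
pairs = []
for a_column, b_element in zip(a.T, b):
    pairs.append((a_column.copy(), int(b_element)))
    print(a_column, b_element)
```

Each a_column is a view into a, so it is copied here before being stored.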

Adapting my answer from "shallow iteration with nditer":
nditer and ndindex can be used to iterate over rows or columns by generating indexes.
In [19]: n,m=3,4
In [20]: a=np.arange(n*m).reshape(n,m)
In [21]: b=np.arange(m)
In [22]: it=np.nditer(b)
In [23]: for i in it: print(a[:,i], b[i])
[0 4 8] 0
[1 5 9] 1
[ 2  6 10] 2
[ 3  7 11] 3
In [24]: for i in np.ndindex(m): print(a[:,i], b[i])
[[0]
 [4]
 [8]] 0
[[1]
 [5]
 [9]] 1
[[ 2]
 [ 6]
 [10]] 2
[[ 3]
 [ 7]
 [11]] 3
ndindex uses an iterator like: it = np.nditer(b, flags=['multi_index']).
For iteration over a single dimension like this, for i in range(m): works just as well.
Also from the other thread, here's a trick using order to iterate without the indexes:
In [28]: for i,j in np.nditer([a,b],order='F',flags=['external_loop']):
    print(i, j)
[0 4 8] [0 0 0]
[1 5 9] [1 1 1]
[ 2  6 10] [2 2 2]
[ 3  7 11] [3 3 3]

Usually, because of NumPy's ability to broadcast arrays, it is not necessary to iterate over the columns of an array one-by-one. For example, if a has shape (n,m) and b has shape (m,) then you can add a+b and b will broadcast itself to shape (n, m) automatically.
Moreover, your calculation will complete much faster if it can be expressed through operations on the whole array, a, rather than through operations on pieces of a (such as on columns) using a Python for-loop.
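To illustrate the broadcasting point, here is a small sketch with made-up values (the names vectorized and looped are only for this example). It adds b to a once with broadcasting and once with an explicit loop over columns, and both give the same array:

```python
import numpy as np

n, m = 3, 4
a = np.arange(n * m).reshape(n, m)   # shape (n, m)
b = np.array([10, 20, 30, 40])       # shape (m,)

# b is broadcast along the first axis: b[j] is added to column j of a.
vectorized = a + b

# The same result built with an explicit Python loop over the columns.
looped = np.empty_like(a)
for j in range(m):
    looped[:, j] = a[:, j] + b[j]

print(vectorized)
print(np.array_equal(vectorized, looped))
```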
Having said that, the easiest way to loop through the columns of a is to iterate over the index:
for i in np.arange(b.shape[0]):
    a_column, b_element = a[:, i], b[i]
    print(a_column)

Related

Generalized version of np.roll

I have a 2D array
a = np.array([[0,1,2,3],[4,5,6,7]])
This is a 2x4 array. I need to shift the elements of each of the two rows (the sub-arrays along axis 0) by a different number of steps, say 1 for the first and 2 for the second, so that the output will be
np.array([[1,2,3,0],[6,7,4,5]])
This doesn't seem possible with np.roll; at least, I don't see any useful hint in the documentation. Is there another function that does this?
This is an attempt at a generalized version of numpy.roll.
import numpy as np
a = np.array([[0,1,2,3],[4,5,6,7]])
def roll(a, shifts, axis):
    assert a.shape[axis] == len(shifts)
    return np.stack([
        np.roll(np.take(a, i, axis), shifts[i]) for i in range(len(shifts))
    ], axis)
print(a)
print(roll(a, [-1, -2], 0))
print(roll(a, [1, 2, 1, 0], 1))
prints
[[0 1 2 3]
 [4 5 6 7]]
[[1 2 3 0]
 [6 7 4 5]]
[[4 1 6 3]
 [0 5 2 7]]
Here, the parameter a is a numpy.array, shifts is a sequence containing the shift amount for each slice along axis, and axis is the axis whose slices are shifted individually. Note, however, that this was only tested on two-dimensional arrays.
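For the two-dimensional, row-wise case, the Python-level loop can probably be avoided by building the column indices directly. The following is only a sketch under that assumption; roll_rows is a hypothetical helper name, and it relies on np.take_along_axis, which needs a reasonably recent NumPy:

```python
import numpy as np

def roll_rows(a, shifts):
    """Roll each row of a 2D array by its own shift (hypothetical helper)."""
    a = np.asarray(a)
    shifts = np.asarray(shifts)
    n_cols = a.shape[1]
    # np.roll semantics: out[i, j] = a[i, (j - shifts[i]) % n_cols]
    cols = (np.arange(n_cols)[None, :] - shifts[:, None]) % n_cols
    return np.take_along_axis(a, cols, axis=1)

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
rolled = roll_rows(a, [-1, -2])
print(rolled)
```

With the shifts [-1, -2] this should reproduce the array requested in the question.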

Get part of np array with parameters

I am using Python and NumPy with an n-dimensional array.
I want to select all elements with an index like
arr[a,b,:,c]
but I want to be able to choose the position of the slice with a parameter. For example:
# pos = 2
arr[a,b,:,c]
# pos = 1
arr[a,:,b,c]
I would move the axis of interest (at pos) to the front with numpy.moveaxis(array, pos, 0) and then simply slice with [:,a,b,c].
There is also numpy.take, but in your case you would still need to loop over each dimension a,b,c, so I think moveaxis is more convenient. Maybe there is an even more direct way to do this.
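A minimal sketch of the moveaxis idea, assuming scalar indices a, b, c and a made-up 4D array (slice_along is a hypothetical helper name, not a NumPy function):

```python
import numpy as np

def slice_along(arr, pos, fixed):
    """Keep axis `pos` whole and index every other axis with `fixed` (hypothetical helper)."""
    moved = np.moveaxis(arr, pos, 0)           # axis of interest is now axis 0
    return moved[(slice(None),) + tuple(fixed)]

arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
a, b, c = 1, 2, 3

print(slice_along(arr, 2, (a, b, c)))   # same values as arr[a, b, :, c]
print(slice_along(arr, 1, (a, b, c)))   # same values as arr[a, :, b, c]
```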
The idea of moving the slicing axis to one end is a good one. Various numpy functions use that idea.
In [171]: arr = np.ones((2,3,4,5),int)
In [172]: arr[0,0,:,0].shape
Out[172]: (4,)
In [173]: arr[0,:,0,0].shape
Out[173]: (3,)
Another idea is to build an indexing tuple:
In [176]: idx = (0,0,slice(None),0)
In [177]: arr[idx].shape
Out[177]: (4,)
In [178]: idx = (0,slice(None),0,0)
In [179]: arr[idx].shape
Out[179]: (3,)
To do this programmatically it may be easier to start with a list or array that can be modified, and then convert it to a tuple for indexing. Details will vary depending on how you prefer to specify the axis and variables.
If any of a,b,c are arrays (or lists), you may get some shape surprises, since it's a case of mixing advanced and basic indexing. But as long as they are scalars, that's not an issue.
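One way to build the tuple programmatically, sketched under the assumption that the fixed indices are scalars (index_with_slice is a hypothetical name):

```python
import numpy as np

def index_with_slice(arr, pos, fixed):
    """Index arr with `fixed` scalars, inserting a full slice at position `pos` (hypothetical helper)."""
    idx = list(fixed)             # start with a list, which can be modified
    idx.insert(pos, slice(None))  # slice(None) is what ':' means inside an index
    return arr[tuple(idx)]        # convert to a tuple before indexing

arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
a, b, c = 0, 1, 2

print(index_with_slice(arr, 2, (a, b, c)).shape)   # like arr[a, b, :, c]
print(index_with_slice(arr, 1, (a, b, c)).shape)   # like arr[a, :, b, c]
```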
You could np.transpose the array arr before you slice it, so that your axis of interest (i.e. the :) is moved "to the back". This way, you can rearrange arr so that you can always call arr[a,b,c].
Example with only a and b:
import numpy as np
a = 0
b = 2
target_axis = 1
# Generate some random data
arr = np.random.randint(10, size=[3, 3, 3], dtype=int)
print(arr)
#[[[0 8 2]
# [3 9 4]
# [0 3 6]]
#
# [[8 5 4]
# [9 8 5]
# [8 6 1]]
#
# [[2 2 5]
# [5 3 3]
# [9 1 8]]]
# Define transpose s.t. target_axis is the last axis
transposed_shape = np.arange(arr.ndim)
transposed_shape = np.delete(transposed_shape, target_axis)
transposed_shape = np.append(transposed_shape, target_axis)
print(transposed_shape)
#[0 2 1]
# Caution! These 0 and 2 above do not come from a or b.
# Instead they are the indices of the axes.
# Transpose arr
arr_T = np.transpose(arr, transposed_shape)
print(arr_T)
#[[[0 3 0]
# [8 9 3]
# [2 4 6]]
#
# [[8 9 8]
# [5 8 6]
# [4 5 1]]
#
# [[2 5 9]
# [2 3 1]
# [5 3 8]]]
print(arr_T[a,b])
#[2 4 6]

Slicing a 2D NumPy Array by all zero rows

This is essentially the 2D array equivalent of slicing a Python list into smaller lists at indexes that store a particular value. I'm running a program that extracts a large amount of data out of a CSV file and copies it into a 2D NumPy array. The basic format of these arrays is something like this:
[[ 0  8  9 10]
 [ 9  9  1  4]
 [ 0  0  0  0]
 [ 1  2  1  4]
 [ 0  0  0  0]
 [ 1  1  1  2]
 [39 23 10  1]]
I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:
[[ 0  8  9 10]
 [ 9  9  1  4]]
[[1 2 1 4]]
[[ 1  1  1  2]
 [39 23 10  1]]
I've thought about simply iterating down the array and checking whether each row is all zeros, but the data I'm handling is very large. I have potentially millions of rows of data in the text file, and I'm trying to find the most efficient approach rather than a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?
Assuming a is your array, you can use any to find the all-zero rows, remove them, and then use split to split at their indices:
# indices of the rows that are not all zero
idx = np.flatnonzero(a.any(1))
# indices of the all-zero rows
idx_zero = np.delete(np.arange(a.shape[0]), idx)
# select the not-all-zero rows and split at the positions of the all-zero rows
output = np.split(a[idx], idx_zero - np.arange(idx_zero.size))
output:
[array([[ 0,  8,  9, 10],
       [ 9,  9,  1,  4]]),
 array([[1, 2, 1, 4]]),
 array([[ 1,  1,  1,  2],
       [39, 23, 10,  1]])]
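One thing to watch for: if two all-zero rows are adjacent (or one sits at the very start or end), np.split returns empty pieces. A sketch that wraps the same idea in a function and drops those empty pieces follows; split_on_zero_rows and the sample data are made up for illustration:

```python
import numpy as np

def split_on_zero_rows(a):
    """Split a 2D array into blocks separated by all-zero rows (hypothetical helper)."""
    a = np.asarray(a)
    nonzero_rows = a.any(axis=1)            # True where a row has any nonzero entry
    idx = np.flatnonzero(nonzero_rows)      # rows to keep
    idx_zero = np.flatnonzero(~nonzero_rows)
    pieces = np.split(a[idx], idx_zero - np.arange(idx_zero.size))
    # Consecutive (or leading/trailing) zero rows produce empty pieces; drop them.
    return [p for p in pieces if p.shape[0] > 0]

data = np.array([[0, 8, 9, 10],
                 [9, 9, 1, 4],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0],
                 [1, 2, 1, 4],
                 [0, 0, 0, 0],
                 [1, 1, 1, 2],
                 [39, 23, 10, 1]])
blocks = split_on_zero_rows(data)
for block in blocks:
    print(block)
```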
You can use the np.all function to check for rows which are all zeros, and then index appropriately.
# assume `x` is your data
indices = np.all(x == 0, axis=1)
zeros = x[indices]
nonzeros = x[np.logical_not(indices)]
The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to operate. Here axis=1 means the reduction runs across the columns of each row, so you get back a boolean array of shape (x.shape[0],), with one entry per row, which can be used to directly index x.
Note that this will be much faster than a for-loop over the rows, especially for large arrays.

which numpy command could I use to subtract vectors with different dimensions many times?

I have to write this function:
where x is an array with shape [150,2] and c has shape [N,2] (let's suppose N=20). From each row of x, with components xi (i=1,2), I have to subtract the components of c in this way: ([x11-c11, x12-c12]) ... ([x11-cN1, x12-cN2]), for all 150 samples.
I've transformed them so that they have the same dimensions and I can subtract them, but the result of the function should be a vector. How can I write this in NumPy?
Thank you
OK, let's suppose x has shape (5,2) and c has shape (3,2).
This is what I obtained by transforming the dimensions of the two arrays. The problem is that I have to do this with a for loop, because the exp function should give me a vector as a result, so I have to obtain a sort of matrix divided into N blocks.
From what I understand of the issue, the problem seems to be in the way you are calculating the vector norm, not in the subtraction. Using your example, but calculating exp(-||x-c||), try:
x = np.linspace(8,17,10).reshape((5,2))
c = np.linspace(1,6,6).reshape((3,2))
sub = np.linalg.norm(x[:,None] - c, axis=-1)
np.exp(-sub)
array([[ 5.02000299e-05, 8.49325705e-04, 1.43695961e-02],
       [ 2.96711024e-06, 5.02000299e-05, 8.49325705e-04],
       [ 1.75373266e-07, 2.96711024e-06, 5.02000299e-05],
       [ 1.03655678e-08, 1.75373266e-07, 2.96711024e-06],
       [ 6.12664624e-10, 1.03655678e-08, 1.75373266e-07]])
np.exp(-sub).shape
(5, 3)
numpy.linalg.norm will try to return some kind of matrix norm across all the dimensions of its input unless you tell it explicitly which axis represents the vector components.
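To make the shapes in that computation explicit, here is a sketch using the same x and c as above. The intermediate names diff, dist, result and looped are only for illustration, and the double loop is there just to check the broadcast version:

```python
import numpy as np

x = np.linspace(8, 17, 10).reshape((5, 2))
c = np.linspace(1, 6, 6).reshape((3, 2))

# x[:, None] has shape (5, 1, 2); subtracting c (3, 2) broadcasts to (5, 3, 2),
# i.e. every row of x minus every row of c.
diff = x[:, None] - c
dist = np.linalg.norm(diff, axis=-1)   # norm over the component axis -> (5, 3)
result = np.exp(-dist)

# Equivalent explicit loops, for comparison only.
looped = np.empty((x.shape[0], c.shape[0]))
for i in range(x.shape[0]):
    for j in range(c.shape[0]):
        looped[i, j] = np.exp(-np.sqrt(np.sum((x[i] - c[j]) ** 2)))

print(diff.shape, dist.shape)
print(np.allclose(result, looped))
```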
If I understand correctly, try whether this gives the expected result; there is still the problem that the result has the same shape as x:
import numpy as np
x = np.arange(10).reshape(5,2)
c = np.arange(6).reshape(3,2)
c_col_sum = np.sum(c, axis=0)
for (h,k), value in np.ndenumerate(x):
    x[h,k] = c.shape[0] * x[h,k] - c_col_sum[k]
Initially x is:
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
And c is:
[[0 1]
 [2 3]
 [4 5]]
After the function x becomes:
[[-6 -6]
 [ 0  0]
 [ 6  6]
 [12 12]
 [18 18]]

Iterating through matrix in python using numpy

I want to generate a resultant matrix by iterating through 5 different matrices. First, I want to take the first value of each matrix, average these values, and store the result as the first value of the resultant matrix, and so on for the other positions. Can anyone tell me how to do this in Python using the NumPy library?
In general you want to avoid (potentially slow) Python-based looping and let NumPy do (faster) C-based looping (or no looping at all).
Removing explicit loops in this way is usually called (NumPy) vectorization, and it is usually very important for performance.
The following example creates 5 NumPy arrays of shape (3,3) and calculates a new array of the same shape containing the element-wise mean over the five inputs. Here the 2D arrays are interpreted as matrices; the dedicated matrix type also exists, but it is more or less deprecated and not used here, and most NumPy users should use arrays instead.
Code:
import numpy as np
a, b, c, d, e = [np.random.randint(0, 5, size=(3,3)) for i in range(5)]
stacked = np.stack((a, b, c, d, e), axis=0)
print(stacked.shape)
x = np.mean(stacked, axis=0)
print(a)
print(b)
print(c)
print(d)
print(e)
print(x)
Out:
(5, 3, 3)
[[0 0 0]
 [0 1 0]
 [2 4 0]]
[[4 2 0]
 [3 3 4]
 [0 4 0]]
[[3 4 0]
 [2 2 1]
 [0 0 4]]
[[3 1 2]
 [4 3 4]
 [2 0 3]]
[[3 4 2]
 [3 1 0]
 [1 0 0]]
[[ 2.6  2.2  0.8]
 [ 2.4  2.   1.8]
 [ 1.   1.6  1.4]]
If you still want to loop, you can just use a nested loop like:
for row in range(array.shape[0]):
    for col in range(array.shape[1]):
        cell_value = array[row, col]
        ...
given an array of 2 dimensions.
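For comparison, here is a sketch that computes the element-wise average both ways on deterministic made-up matrices (the names matrices, vectorized and looped are only for this example). The nested loop should give the same values as the stacked mean:

```python
import numpy as np

# Five made-up 3x3 matrices: the k-th one is arange(9).reshape(3, 3) + k.
matrices = [np.arange(9).reshape(3, 3) + k for k in range(5)]

# Vectorized: stack along a new first axis and average over it.
stacked = np.stack(matrices, axis=0)      # shape (5, 3, 3)
vectorized = stacked.mean(axis=0)

# Explicit nested loop over the cells, averaging the five values per cell.
looped = np.zeros((3, 3))
for row in range(3):
    for col in range(3):
        looped[row, col] = sum(m[row, col] for m in matrices) / len(matrices)

print(vectorized)
print(np.allclose(vectorized, looped))
```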
