Using numpy.argmax() on multidimensional arrays - python

I have a 4 dimensional array, i.e., data.shape = (20,30,33,288). I am finding the index of the closest array to n using
index = abs(data - n).argmin(axis = 1), so
index.shape = (20,33,288) with the indices varying.
I would like to use data[index] = "values" with values.shape = (20,33,288), but data[index] returns the error "index (8) out of range (0<=index<1) in dimension 0" or this operation takes a relatively long time to compute and returns a matrix with a shape that doesn't seem to make sense.
How do I return a array of correct values? i.e.,
data[index] = "values" with values.shape = (20,33,288)
This seems like a simple problem, is there a simple answer?
I would eventually like to find index2 = abs(data - n2).argmin(axis = 1), so I can perform an operation, say sum data at index to data at index2 without looping through the variables. Is this possible?
I am using python2.7 and numpy version 1.5.1.

You should be able to access the maximum values indexed by index using numpy.indices():
x, z, t = numpy.indices(index.shape)
data[x, index, z, t]

If I understood you correctly, this should work:
numpy.put(data, index, values)
I learned something new today, thanks.

Related

Appending numpy array of arrays

I am trying to append an array to another array but its appending them as if it was just one array. What I would like to have is have each array appended on its own index, (withoug having to use a list, i want to use np arrays) i.e
temp = np.array([])
for i in my_items
m = get_item_ids(i.color) #returns an array as [1,4,20,5,3] (always same number of items but diff ids
temp = np.append(temp, m, axis=0)
On the second iteration lets suppose i get [5,4,15,3,10]
then i would like to have temp as
array([1,4,20,5,3][5,4,15,3,10])
But instead i keep getting [1,4,20,5,3,5,4,15,3,10]
I am new to python but i am sure there is probably a way to concatenate in this way with numpy without using lists?
You have to reshape m in order to have two dimension with
m.reshape(-1, 1)
thus adding the second dimension. Then you could concatenate along axis=1.
np.concatenate(temp, m, axis=1)
List append is much better - faster and easier to use correctly.
temp = []
for i in my_items
m = get_item_ids(i.color) #returns an array as [1,4,20,5,3] (always same number of items but diff ids
temp = m
Look at the list to see what it created. Then make an array from that:
arr = np.array(temp)
# or `np.vstack(temp)

Numpy vectorize sum over indices

I have a list of indices (list(int)) and a list of summing indices (list(list(int)). Given a 2D numpy array, I need to find the sum over indices in the second list for each column and add them to the corresponding indices in the first column. Is there any way to vectorize this?
Here is the normal code:
indices = [1,0,2]
summing_indices = [[5,6,7],[6,7,8],[4,5]]
matrix = np.arange(9*3).reshape((9,3))
for c,i in enumerate(indices):
matrix[i,c] = matrix[summing_indices[i],c].sum()+matrix[i,c]
Here's an almost* vectorized approach using np.add.reduceat -
lens = np.array(map(len,summing_indices))
col = np.repeat(indices,lens)
row = np.concatenate(summing_indices)
vals = matrix[row,col]
addvals = np.add.reduceat(vals,np.append(0,lens.cumsum()[:-1]))
matrix[indices,np.arange(len(indices))] += addvals[indices.argsort()]
Please note that this has some setup overhead, so it would be best suited for 2D input arrays with a good number of columns as we are iterating along the columns.
*: Almost because of the use of map() at the start, but computationally that should be negligible.

setting null values in a numpy array

how do I null certain values in numpy array based on a condition?
I don't understand why I end up with 0 instead of null or empty values where the condition is not met... b is a numpy array populated with 0 and 1 values, c is another fully populated numpy array. All arrays are 71x71x166
a = np.empty(((71,71,166)))
d = np.empty(((71,71,166)))
for indexes, value in np.ndenumerate(b):
i,j,k = indexes
a[i,j,k] = np.where(b[i,j,k] == 1, c[i,j,k], d[i,j,k])
I want to end up with an array which only has values where the condition is met and is empty everywhere else but with out changing its shape
FULL ISSUE FOR CLARIFICATION as asked for:
I start with a float populated array with shape (71,71,166)
I make an int array based on a cutoff applied to the float array basically creating a number of bins, roughly marking out 10 areas within the array with 0 values in between
What I want to end up with is an array with shape (71,71,166) which has the average values in a particular array direction (assuming vertical direction, if you think of a 3D array as a 3D cube) of a certain "bin"...
so I was trying to loop through the "bins" b == 1, b == 2 etc, sampling the float where that condition is met but being null elsewhere so I can take the average, and then recombine into one array at the end of the loop....
Not sure if I'm making myself understood. I'm using the np.where and using the indexing as I keep getting errors when I try and do it without although it feels very inefficient.
Consider this example:
import numpy as np
data = np.random.random((4,3))
mask = np.random.random_integers(0,1,(4,3))
data[mask==0] = np.NaN
The data will be set to nan wherever the mask is 0. You can use any kind of condition you want, of course, or do something different for different values in b.
To erase everything except a specific bin, try the following:
c[b!=1] = np.NaN
So, to make a copy of everything in a specific bin:
a = np.copy(c)
a[b!=1] == np.NaN
To get the average of everything in a bin:
np.mean(c[b==1])
So perhaps this might do what you want (where bins is a list of bin values):
a = np.empty(c.shape)
a[b==0] = np.NaN
for bin in bins:
a[b==bin] = np.mean(c[b==bin])
np.empty sometimes fills the array with 0's; it's undefined what the contents of an empty() array is, so 0 is perfectly valid. For example, try this instead:
d = np.nan * np.empty((71, 71, 166)).
But consider using numpy's strength, and don't iterate over the array:
a = np.where(b, c, d)
(since b is 0 or 1, I've excluded the explicit comparison b == 1.)
You may even want to consider using a masked array instead:
a = np.ma.masked_where(b, c)
which seems to make more sense with respect to your question: "how do I null certain values in a numpy array based on a condition" (replace null with mask and you're done).

How to logically combine integer indices in numpy?

Does anyone know how to combine integer indices in numpy? Specifically, I've got the results of a few np.wheres and I would like to extract the elements that are common between them.
For context, I am trying to populate a large 3d array with the number of elements that are between boundary values of each cell, i.e. I have records of individual events including their time, latitude and longitude. I want to grid this into a 3D frequency matrix, where the dimensions are time, lat and lon.
I could loop round the array elements doing an np.where(timeCondition & latCondition & lonCondition), population with the length of the where result, but I figured this would be very inefficient as you would have to repeat a lot of the wheres.
What would be better is to just have a list of wheres for each of the cells in each dimension, and then loop through the logically combining them?
as #ali_m said, use bitwise and should be much faster, but to answer your question:
call ravel_multi_index() to convert the multi-dim index into 1-dim index.
call intersect1d() to get the index that in both condition.
call unravel_index() to convert the 1-dim index back to multi-dim index.
Here is the code:
import numpy as np
a = np.random.rand(10, 20, 30)
idx1 = np.where(a>0.2)
idx2 = np.where(a<0.4)
ridx1 = np.ravel_multi_index(idx1, a.shape)
ridx2 = np.ravel_multi_index(idx2, a.shape)
ridx = np.intersect1d(ridx1, ridx2)
idx = np.unravel_index(ridx, a.shape)
np.allclose(a[idx], a[(a>0.2) & (a<0.4)])
or you can use ridx directly:
a.ravel()[ridx]

Numpy signed maximum magnitude of cumsum along an axis

I have a numpy array a, a.shape=(17,90,144). I want to find the maximum magnitude of each column of cumsum(a, axis=0), but retaining the original sign. In other words, if for a given column a[:,j,i] the largest magnitude of cumsum corresponds to a negative value, I want to retain the minus sign.
The code np.amax(np.abs(a.cumsum(axis=0))) gets me the magnitude, but doesn't retain the sign. Using np.argmax instead will get me the indices I need, which I can then plug into the original cumsum array. But I can't find a good way to do so.
The following code works, but is dirty and really slow:
max_mag_signed = np.zeros((90,144))
indices = np.argmax(np.abs(a.cumsum(axis=0)), axis=0)
for j in range(90):
for i in range(144):
max_mag_signed[j,i] = a.cumsum(axis=0)[indices[j,i],j,i]
There must be a cleaner, faster way to do this. Any ideas?
I can't find any alternatives to argmax but at least you can fasten that with a more vectorized approach:
# store the cumsum, since it's used multiple times
cum_a = a.cumsum(axis=0)
# find the indices as before
indices = np.argmax(abs(cum_a), axis=0)
# construct the indices for the second and third dimensions
y, z = np.indices(indices.shape)
# get the values with np indexing
max_mag_signed = cum_a[indices, y, z]

Categories