Maybe this is a basic question about numpy, but I can't see how to do it. Let's say I have a 2D numpy array like this:
import numpy as np
arr = np.array([[ 0., 460., 166., 167., 123.],
[ 0., 0., 0., 0., 0.],
[ 0., 81., 0., 21., 0.],
[ 0., 128., 23., 0., 12.],
[ 0., 36., 0., 13., 0.]])
And I want the coordinates from the subarray
[[0., 21., 0.],
[23., 0., 12.],
[0., 13., 0.]]
I tried slicing my original array and then finding the coordinates with np.argwhere, like this:
newarr = np.argwhere(arr[2:, 2:] != 0)
#output
#[[0 1]
# [1 0]
# [1 2]
# [2 1]]
These are indeed the coordinates within the subarray, but I was expecting the coordinates corresponding to my original array. The desired output is:
[[2 3]
[3 2]
[3 4]
[4 3]]
If I use np.argwhere with my original array, I get a bunch of coordinates that I don't need, and I can't figure out how to get only the ones I want. Any help, or a pointer in the right direction, would be great. Thank you!
Assume the origin is at the top-left corner of the matrix, with the matrix itself placed in the fourth quadrant of Cartesian space: the horizontal axis holds the column indices, and the vertical axis, pointing downward, holds the row indices.
You will see that the whole sub-matrix has its origin shifted to coordinate (2, 2). The coordinates you get are relative to the sub-matrix, so to map them back to the original array, just add (2, 2) to every element:
>>> np.argwhere(arr[2:, 2:] != 0) + [2, 2]
array([[2, 3],
[3, 2],
[3, 4],
[4, 3]])
Other examples:
>>> col_shift, row_shift = 3, 2
>>> arr[row_shift:, col_shift:]
array([[21., 0.],
[ 0., 12.],
[13., 0.]])
>>> np.argwhere(arr[row_shift:, col_shift:] != 0) + [row_shift, col_shift]
array([[2, 3],
[3, 4],
[4, 3]])
For a sub-matrix that lies fully inside the array, you can also bound the columns and rows:
>>> col_shift, row_shift = 0, 1
>>> col_bound, row_bound = 4, 4
>>> arr[row_shift:row_bound, col_shift:col_bound]
array([[ 0., 0., 0., 0.],
[ 0., 81., 0., 21.],
[ 0., 128., 23., 0.]])
>>> np.argwhere(arr[row_shift:row_bound, col_shift:col_bound] != 0) + [row_shift, col_shift]
array([[2, 1],
[2, 3],
[3, 1],
[3, 2]])
You have moved two steps down the array and two steps to the right. All that remains is to add the number of steps taken along each axis back to the coordinates:
y = 2  # rows skipped from the top
x = 2  # columns skipped from the left
newarr = np.argwhere(arr[y:, x:] != 0)
rows = (newarr[:, 0] + y).reshape(-1, 1)  # shift row indices back
cols = (newarr[:, 1] + x).reshape(-1, 1)  # shift column indices back
print(np.concatenate((rows, cols), axis=1))
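If this comes up often, here is a minimal sketch of a reusable helper (the name find_in_window is my own, not from the answers above):

import numpy as np

def find_in_window(arr, row_shift, col_shift, row_bound=None, col_bound=None):
    # Coordinates of the nonzero entries inside a window, in arr's own frame.
    window = arr[row_shift:row_bound, col_shift:col_bound]
    return np.argwhere(window != 0) + [row_shift, col_shift]

For the array in the question, find_in_window(arr, 2, 2) reproduces the desired output, array([[2, 3], [3, 2], [3, 4], [4, 3]]).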
I think I've misunderstood something with indexing in numpy.
I have a 3D-numpy array of shape (dim_x, dim_y, dim_z) and I want to find the maximum along the third axis (dim_z), and set its value to 1 and all the others to zero.
The problem is that I end up with several 1 in the same row, even if values are different.
Here is the code:
>>> test = np.random.rand(2,3,2)
>>> test
array([[[ 0.13110146, 0.07138861],
[ 0.84444158, 0.35296986],
[ 0.97414498, 0.63728852]],
[[ 0.61301975, 0.02313646],
[ 0.14251848, 0.91090492],
[ 0.14217992, 0.41549218]]])
>>> result = np.zeros_like(test)
>>> result[:test.shape[0], np.arange(test.shape[1]), np.argmax(test, axis=2)]=1
>>> result
array([[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]],
[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]]])
I was expecting to end with :
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
Probably I'm missing something here. From what I've understood, 0:dim_x together with np.arange(dim_y) should produce dim_x × dim_y index pairs, and np.argmax(test, axis=2) has shape (dim_x, dim_y), so if the indexing is of the form [x, y, z], a pair [x, y] is not supposed to appear twice.
Could someone explain where I'm wrong? Thanks in advance.
What we are looking for
We get the argmax indices along the last axis -
idx = np.argmax(test, axis=2)
For the given sample data, we have idx :
array([[0, 0, 0],
[0, 1, 1]])
Now, idx covers the first and second axes, while getting those argmax indices.
To assign the corresponding ones in the output, we need to create range arrays for the first two axes covering the lengths along those and aligned according to the shape of idx. Now, idx is a 2D array of shape (m,n), where m = test.shape[0] and n = test.shape[1].
Thus, the range arrays for assignment into first two axes of output must be -
X = np.arange(test.shape[0])[:,None]
Y = np.arange(test.shape[1])
Notice, the extension of the first range array to 2D is needed to have it aligned against the rows of idx, while Y aligns against the cols of idx -
In [239]: X
Out[239]:
array([[0],
[1]])
In [240]: Y
Out[240]: array([0, 1, 2])
Schematically put -
idx :
          Y array
        ----------->
      x x x  |
      x x x  |  X array
             v
The fault in the original code
Your code was -
result[:test.shape[0], np.arange(test.shape[1]), ..
This is essentially :
result[:, np.arange(test.shape[1]), ...
So, you are selecting all elements along the first axis, instead of only the ones that correspond to the idx indices. In the process, you were selecting many more elements than required for the assignment, and hence you were seeing many more 1s than expected in the result array.
The correction
Thus, the only correction needed was indexing into the first axis with the range array and a working solution would be -
result[np.arange(test.shape[0])[:,None], np.arange(test.shape[1]), ...
The alternative(s)
Alternatively, using the range arrays created earlier with X and Y -
result[X,Y,idx] = 1
Another way to get X,Y would be with np.ogrid -
m,n = test.shape[:2]
X,Y = np.ogrid[:m,:n]
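Putting the pieces together, a minimal end-to-end sketch of the corrected assignment (using the np.ogrid variant from above):

import numpy as np

test = np.random.rand(2,3,2)
idx = np.argmax(test, axis=2)   # (2,3) argmax indices along the last axis
m,n = test.shape[:2]
X,Y = np.ogrid[:m,:n]           # broadcastable range arrays, shapes (2,1) and (1,3)
result = np.zeros_like(test)
result[X,Y,idx] = 1             # exactly one 1 per (row, col) pair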
I think there's a problem with mixing basic (slice) and advanced indexing. It's easier to see when selecting values from an array than with this assignment, but it can result in transposed axes. For a problem like this it is better to use advanced indexing all around, as provided by np.ix_:
In [24]: test = np.random.rand(2,3,2)
In [25]: idx=np.argmax(test,axis=2)
In [26]: idx
Out[26]:
array([[1, 0, 1],
[0, 1, 1]], dtype=int32)
with basic and advanced:
In [31]: res1 = np.zeros_like(test)
In [32]: res1[:, np.arange(test.shape[1]), idx]=1
In [33]: res1
Out[33]:
array([[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]],
[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]]])
with advanced:
In [35]: I,J = np.ix_(range(test.shape[0]), range(test.shape[1]))
In [36]: I
Out[36]:
array([[0],
[1]])
In [37]: J
Out[37]: array([[0, 1, 2]])
In [38]: res2 = np.zeros_like(test)
In [40]: res2[I, J , idx]=1
In [41]: res2
Out[41]:
array([[[ 0., 1.],
[ 1., 0.],
[ 0., 1.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
On further thought, the use of the slice for the 1st dimension is just wrong, if the goal is to set or find the 6 argmax values:
In [54]: test
Out[54]:
array([[[ 0.15288242, 0.36013289],
[ 0.90794601, 0.15265616],
[ 0.34014976, 0.53804266]],
[[ 0.97979479, 0.15898605],
[ 0.04933804, 0.89804999],
[ 0.10199319, 0.76170911]]])
In [55]: test[I, J, idx]
Out[55]:
array([[ 0.36013289, 0.90794601, 0.53804266],
[ 0.97979479, 0.89804999, 0.76170911]])
In [56]: test[:, J, idx]
Out[56]:
array([[[ 0.36013289, 0.90794601, 0.53804266],
[ 0.15288242, 0.15265616, 0.53804266]],
[[ 0.15898605, 0.04933804, 0.76170911],
[ 0.97979479, 0.89804999, 0.76170911]]])
With the slice it selects a (2,2,3) set of values from test (or res1), not the intended (2,3). There are 2 extra rows in each block.
Here is an easier way to do it:
>>> test == test.max(axis=2, keepdims=1)
array([[[ True, False],
[ True, False],
[ True, False]],
[[ True, False],
[False, True],
[False, True]]], dtype=bool)
...and if you really want that as floating-point 1.0 and 0.0, then convert it:
>>> (test==test.max(axis=2, keepdims=1)).astype(float)
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
Here is a way to do it with only one winner per row-column combo (i.e. no ties, as discussed in comments):
rowmesh, colmesh = np.meshgrid(range(test.shape[0]), range(test.shape[1]), indexing='ij')
maxloc = np.argmax(test, axis=2)
flatind = np.ravel_multi_index( [rowmesh, colmesh, maxloc ], test.shape )
result = np.zeros_like(test)
result.flat[flatind] = 1
UPDATE after reading hpaulj's answer:
rowmesh, colmesh = np.ix_(range(test.shape[0]), range(test.shape[1]))
is a more efficient, more NumPythonic alternative to my meshgrid call (the rest of the code stays the same).
The issue of why your approach fails is hard to explain, but here's one place where intuition could start: your slicing approach says "all rows, times all columns, times a certain sequence of layers". How many elements is that slice in total? By contrast, how many elements do you actually want to set to 1? It can be instructive to look at the test values selected by the slice you were trying to assign to:
>>> test[:, :, maxloc].shape
(2, 3, 2, 3) # oops! it's because maxloc itself is 2x3
>>> test[:, :, maxloc]
array([[[[ 0.13110146, 0.13110146, 0.13110146],
[ 0.13110146, 0.07138861, 0.07138861]],
[[ 0.84444158, 0.84444158, 0.84444158],
[ 0.84444158, 0.35296986, 0.35296986]],
[[ 0.97414498, 0.97414498, 0.97414498],
[ 0.97414498, 0.63728852, 0.63728852]]],
[[[ 0.61301975, 0.61301975, 0.61301975],
[ 0.61301975, 0.02313646, 0.02313646]],
[[ 0.14251848, 0.14251848, 0.14251848],
[ 0.14251848, 0.91090492, 0.91090492]],
[[ 0.14217992, 0.14217992, 0.14217992],
[ 0.14217992, 0.41549218, 0.41549218]]]]) # note the repetition, because in maxloc you're repeatedly asking for layer 0 sometimes, and sometimes repeatedly for layer 1
Given the following numpy arrays:
import numpy
a=numpy.array([[1,1,1],[1,1,1],[1,1,1]])
b=numpy.array([[2,2,2],[2,2,2],[2,2,2]])
c=numpy.array([[3,3,3],[3,3,3],[3,3,3]])
and this dictionary containing them all:
mydict={0:a,1:b,2:c}
What is the most efficient way of iterating through mydict so as to compute the average numpy array, which has (1+2+3)/3 = 2 as its values?
My attempt fails, as I am giving it too many values to unpack. It is also extremely inefficient, since it has O(n^3) time complexity:
aver=numpy.empty([a.shape[0],a.shape[1]])
for c,v in mydict.values():
    for i in range(0,a.shape[0]):
        for j in range(0,a.shape[1]):
            aver[i][j]=mydict[c][i][j] #<-too many values to unpack
The final result should be:
In[17]: aver
Out[17]:
array([[ 2., 2., 2.],
[ 2., 2., 2.],
[ 2., 2., 2.]])
EDIT
I am not looking for an average value for each numpy array. I am looking for an average value for each element of my collection of numpy arrays. This is a minimal example; the real thing I am working on has over 120,000 elements per array, and for the same position the values change from array to array.
I think you're making this harder than it needs to be. Either sum them and divide by the number of terms:
In [42]: v = mydict.values()
In [43]: sum(v) / len(v)
Out[43]:
array([[ 2., 2., 2.],
[ 2., 2., 2.],
[ 2., 2., 2.]])
Or stack them into one big array -- which it sounds like is the format they probably should have been in to start with -- and take the mean over the stacked axis:
In [44]: np.array(list(v)).mean(axis=0)
Out[44]:
array([[ 2., 2., 2.],
[ 2., 2., 2.],
[ 2., 2., 2.]])
You really shouldn't be using a dict of numpy.arrays. Just use a multi-dimensional array:
>>> bigarray = numpy.array([arr.tolist() for arr in mydict.values()])
>>> bigarray
array([[[1, 1, 1],
[1, 1, 1],
[1, 1, 1]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]],
[[3, 3, 3],
[3, 3, 3],
[3, 3, 3]]])
>>> bigarray.mean(axis=0)
array([[ 2., 2., 2.],
[ 2., 2., 2.],
[ 2., 2., 2.]])
You should modify your code to not even work with a dict. Especially not a dict with integer keys...
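For example, a minimal sketch (assuming all the arrays share one shape) that builds the stacked array directly and skips the dict entirely:

import numpy as np

arrays = [np.full((3, 3), v) for v in (1, 2, 3)]  # stand-ins for a, b, c
bigarray = np.stack(arrays)     # shape (3, 3, 3)
aver = bigarray.mean(axis=0)    # elementwise average; every value is 2.0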
In NumPy, how can you efficiently make a 1-D object into a 2-D object where the singleton dimension is inferred from the current object (i.e. a list should go to either a 1xlength or lengthx1 vector)?
# This comes from some other, unchangeable code that reads data files.
my_list = [1,2,3,4]
# What I want to do:
my_numpy_array[some_index,:] = numpy.asarray(my_list)
# The above doesn't work because of a broadcast error, so:
my_numpy_array[some_index,:] = numpy.reshape(numpy.asarray(my_list),(1,len(my_list)))
# How to do the above without the call to reshape?
# Is there a way to directly convert a list, or vector, that doesn't have a
# second dimension, into a 1 by length "array" (but really it's still a vector)?
In the most general case, the easiest way to add extra dimensions to an array is to index with the keyword None at the position where you want the extra dimension. For example:
my_array = numpy.array([1,2,3,4])
my_array[None, :] # shape 1x4
my_array[:, None] # shape 4x1
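As a side note (not part of the original answer, though guaranteed by NumPy itself): np.newaxis is simply an alias for None, so these spellings are interchangeable:

my_array[np.newaxis, :]  # identical to my_array[None, :], shape 1x4
my_array[:, np.newaxis]  # identical to my_array[:, None], shape 4x1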
Why not simply add square brackets?
>>> my_list
[1, 2, 3, 4]
>>> numpy.asarray([my_list])
array([[1, 2, 3, 4]])
>>> numpy.asarray([my_list]).shape
(1, 4)
.. wait, on second thought, why is your slice assignment failing? It shouldn't:
>>> my_list = [1,2,3,4]
>>> d = numpy.ones((3,4))
>>> d
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
>>> d[0,:] = my_list
>>> d[1,:] = numpy.asarray(my_list)
>>> d[2,:] = numpy.asarray([my_list])
>>> d
array([[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.]])
even:
>>> d[1,:] = (3*numpy.asarray(my_list)).T
>>> d
array([[ 1., 2., 3., 4.],
[ 3., 6., 9., 12.],
[ 1., 2., 3., 4.]])
import numpy as np
a = np.random.random(10)
sel = np.atleast_2d(a)  # note the spelling: atleast_2d; result has shape (1, 10)
What about expand_dims?
np.expand_dims(np.array([1,2,3,4]), 0)
has shape (1,4) while
np.expand_dims(np.array([1,2,3,4]), 1)
has shape (4,1).
You can always use dstack() to replicate your array:
import numpy
my_list = numpy.array([1,2,3,4])
my_list_2D = numpy.dstack((my_list, my_list))  # shape (1, 4, 2)
Assume three arrays in numpy:
a = np.zeros(5)
b = np.array([3,3,3,0,0])
c = np.array([1,5,10,50,100])
b can now be used as an index for a and c. For example:
In [142]: c[b]
Out[142]: array([50, 50, 50, 1, 1])
Is there any way to add up the values connected to the duplicate indexes with this kind of slicing? With
a[b] = c
Only the last values are stored:
array([ 100., 0., 0., 10., 0.])
I would like something like this:
a[b] += c
which would give
array([ 150., 0., 0., 16., 0.])
I'm mapping very large vectors onto 2D matrices and would really like to avoid loops...
The += operator for NumPy arrays simply doesn't work the way you are hoping, and I'm not aware of a way of making it work that way. As a work-around I suggest using numpy.bincount():
>>> numpy.bincount(b, c)
array([ 150., 0., 0., 16.])
Just append zeros as needed.
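Two hedged additions beyond the original answer: bincount accepts a minlength argument that does the zero-padding for you, and newer NumPy releases also provide numpy.add.at, an unbuffered in-place version of the intended a[b] += c:

>>> numpy.bincount(b, c, minlength=len(a))
array([ 150., 0., 0., 16., 0.])
>>> a = numpy.zeros(5)
>>> numpy.add.at(a, b, c)  # accumulates over duplicate indices, unlike a[b] += c
>>> a
array([ 150., 0., 0., 16., 0.])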
You could do something like:
def sum_unique(label, weight):
    # Sort the rows of label lexicographically so duplicates become adjacent.
    order = np.lexsort(label.T)
    label = label[order]
    weight = weight[order]
    # Mark the last row of each group of duplicates.
    unique = np.ones(len(label), 'bool')
    unique[:-1] = (label[1:] != label[:-1]).any(-1)
    # Cumulative sums at the group boundaries; differencing yields per-group totals.
    totals = weight.cumsum()
    totals = totals[unique]
    totals[1:] = totals[1:] - totals[:-1]
    return label[unique], totals
And use it like this:
In [110]: coord = np.random.randint(0, 3, (10, 2))
In [111]: coord
Out[111]:
array([[0, 2],
[0, 2],
[2, 1],
[1, 2],
[1, 0],
[0, 2],
[0, 0],
[2, 1],
[1, 2],
[1, 2]])
In [112]: weights = np.ones(10)
In [113]: uniq_coord, sums = sum_unique(coord, weights)
In [114]: uniq_coord
Out[114]:
array([[0, 0],
[1, 0],
[2, 1],
[0, 2],
[1, 2]])
In [115]: sums
Out[115]: array([ 1., 1., 2., 3., 3.])
In [116]: a = np.zeros((3,3))
In [117]: x, y = uniq_coord.T
In [118]: a[x, y] = sums
In [119]: a
Out[119]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])
I just thought of this, it might be easier:
In [120]: flat_coord = np.ravel_multi_index(coord.T, (3,3))
In [121]: sums = np.bincount(flat_coord, weights)
In [122]: a = np.zeros((3,3))
In [123]: a.flat[:len(sums)] = sums
In [124]: a
Out[124]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])