eigenvalue and eigenvectors in python vs matlab

I have noticed a difference in how MATLAB and NumPy calculate the eigenvalues and eigenvectors of a matrix: MATLAB returns real values, while NumPy returns complex eigenvalues and eigenvectors. For example:
for matrix:
A=
1 -3 3
3 -5 3
6 -6 4
Numpy:
w, v = np.linalg.eig(A)
w
array([ 4. +0.00000000e+00j, -2. +1.10465796e-15j, -2. -1.10465796e-15j])
v
array([[-0.40824829+0.j , 0.24400118-0.40702229j,
0.24400118+0.40702229j],
[-0.40824829+0.j , -0.41621909-0.40702229j,
-0.41621909+0.40702229j],
[-0.81649658+0.j , -0.66022027+0.j , -0.66022027-0.j ]])
Matlab:
[E, D] = eig(A)
E
-0.4082 -0.8103 0.1933
-0.4082 -0.3185 -0.5904
-0.8165 0.4918 -0.7836
D
4.0000 0 0
0 -2.0000 0
0 0 -2.0000
Is there a way of getting real eigenvalues in Python, as in MATLAB?

To get NumPy to return a diagonal array of real eigenvalues when the complex part is small, you could use
In [116]: np.real_if_close(np.diag(w))
Out[116]:
array([[ 4., 0., 0.],
[ 0., -2., 0.],
[ 0., 0., -2.]])
According to the Matlab docs,
[E, D] = eig(A) returns E and D which satisfy A*E = E*D:
I don't have Matlab, so I'll use Octave to check the result you posted:
octave:1> A = [[1, -3, 3],
[3, -5, 3],
[6, -6, 4]]
octave:6> E = [[ -0.4082, -0.8103, 0.1933],
[ -0.4082, -0.3185, -0.5904],
[ -0.8165, 0.4918, -0.7836]]
octave:25> D = [[4.0000, 0, 0],
[0, -2.0000, 0],
[0, 0, -2.0000]]
octave:29> abs(A*E - E*D)
ans =
3.0000e-04 0.0000e+00 3.0000e-04
3.0000e-04 2.2204e-16 3.0000e-04
0.0000e+00 4.4409e-16 6.0000e-04
The magnitude of the errors is mainly due to the values reported by Matlab being
displayed to a lower precision than the actual values Matlab holds in memory.
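The same check can be sketched in NumPy. The values of E and D below are copied from the 4-decimal display in the question, so the residual is expected to be on the order of 1e-4 rather than machine precision; the tolerance of 1e-3 is an assumed, deliberately loose choice.

```python
import numpy as np

A = np.array([[1, -3, 3],
              [3, -5, 3],
              [6, -6, 4]], dtype=float)

# Values copied from the rounded (4-decimal) display in the question.
E = np.array([[-0.4082, -0.8103,  0.1933],
              [-0.4082, -0.3185, -0.5904],
              [-0.8165,  0.4918, -0.7836]])
D = np.diag([4.0, -2.0, -2.0])

# Largest entry of |A*E - E*D|; nonzero only because E was rounded for display.
residual = np.abs(A @ E - E @ D).max()

# Compare with a tolerance that matches the display precision (assumed: 1e-3).
rounded_ok = np.allclose(A @ E, E @ D, atol=1e-3)
print(residual, rounded_ok)
```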
In NumPy, w, v = np.linalg.eig(A) returns w and v which satisfy
np.dot(A, v) = np.dot(v, np.diag(w)):
In [113]: w, v = np.linalg.eig(A)
In [135]: np.set_printoptions(formatter={'complex_kind': '{:+15.5f}'.format})
In [136]: v
Out[136]:
array([[-0.40825+0.00000j, +0.24400-0.40702j, +0.24400+0.40702j],
[-0.40825+0.00000j, -0.41622-0.40702j, -0.41622+0.40702j],
[-0.81650+0.00000j, -0.66022+0.00000j, -0.66022-0.00000j]])
In [116]: np.real_if_close(np.diag(w))
Out[116]:
array([[ 4., 0., 0.],
[ 0., -2., 0.],
[ 0., 0., -2.]])
In [112]: np.abs((np.dot(A, v) - np.dot(v, np.diag(w))))
Out[112]:
array([[4.44089210e-16, 3.72380123e-16, 3.72380123e-16],
[2.22044605e-16, 4.00296604e-16, 4.00296604e-16],
[8.88178420e-16, 1.36245817e-15, 1.36245817e-15]])
In [162]: np.abs((np.dot(A, v) - np.dot(v, np.diag(w)))).max()
Out[162]: 1.3624581677742195e-15
In [109]: np.isclose(np.dot(A, v), np.dot(v, np.diag(w))).all()
Out[109]: True
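If real eigenvectors are wanted as well, one possible sketch is to compute a real basis of each eigenspace directly. The helper name real_eigenspace and the tolerance 1e-9 are assumptions made for this example, not part of NumPy. Because -2 is a repeated eigenvalue, its eigenvectors are not unique: any basis of the two-dimensional eigenspace is valid, so the columns will generally not match MATLAB's E entry for entry.

```python
import numpy as np

A = np.array([[1., -3., 3.],
              [3., -5., 3.],
              [6., -6., 4.]])

w, v = np.linalg.eig(A)

# Treat the eigenvalues as real when every imaginary part is negligible.
# The tolerance 1e-9 is an arbitrary choice for this example.
if np.allclose(w.imag, 0.0, atol=1e-9):
    w = w.real

def real_eigenspace(A, lam, tol=1e-9):
    """Real orthonormal basis of the null space of (A - lam*I), via SVD.

    Hypothetical helper: rows of vh whose singular value is ~0 span the null space.
    """
    _, s, vh = np.linalg.svd(A - lam * np.eye(A.shape[0]))
    return vh[s < tol].T  # columns are eigenvectors for eigenvalue lam

E2 = real_eigenspace(A, -2.0)  # two real eigenvectors for the repeated eigenvalue
E4 = real_eigenspace(A, 4.0)   # one real eigenvector for eigenvalue 4
print(np.sort(w), E2.shape, E4.shape)
```

Both A @ E2 = -2 * E2 and A @ E4 = 4 * E4 should hold up to rounding error.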

Related

Getting coordinates from a numpy array

So maybe this is a basic question about numpy, but I can't see how to do it. Let's say I have a 2D numpy array like this
import numpy as np
arr = np.array([[ 0., 460., 166., 167., 123.],
[ 0., 0., 0., 0., 0.],
[ 0., 81., 0., 21., 0.],
[ 0., 128., 23., 0., 12.],
[ 0., 36., 0., 13., 0.]])
And I want the coordinates from the subarray
[[0., 21., 0.],
[23., 0., 12.],
[0., 13., 0.]]
I tried slicing my original array and then finding the coordinates using np.argwhere like this
newarr = np.argwhere(arr[2:, 2:] != 0)
#output
#[[0 1]
# [1 0]
# [1 2]
# [2 1]]
Which are indeed the coordinates from the subarray but I was expecting the coordinates corresponding to my original array, the desired output is:
[[2 3]
[3 2]
[3 4]
[4 3]]
If I use np.argwhere on my original array, I get a bunch of coordinates that I don't need, so I can't figure out how to get what I need. Any help, or a pointer in the right direction, would be great. Thank you!

Assume the origin is at the top-left corner of the matrix, with the matrix itself placed in the 4th quadrant of Cartesian space: the horizontal axis carries the column indices, and the vertical axis, pointing down, carries the row indices.
The whole sub-matrix has its origin shifted to the (2,2) coordinate. The coordinates you get are relative to the sub-matrix's own origin, so to map them back to the original array, just add (2,2) to every coordinate pair:
>>> np.argwhere(arr[2:, 2:] != 0) + [2, 2]
array([[2, 3],
[3, 2],
[3, 4],
[4, 3]])
For other examples:
>>> col_shift, row_shift = 3, 2
>>> arr[row_shift:, col_shift:]
array([[21., 0.],
[ 0., 12.],
[13., 0.]])
>>> np.argwhere(arr[row_shift:, col_shift:] != 0) + [row_shift, col_shift]
array([[2, 3],
[3, 4],
[4, 3]])
For a sub-matrix that lies fully inside the array, you can bound the rows and columns:
>>> col_shift, row_shift = 0, 1
>>> col_bound, row_bound = 4, 4
>>> arr[row_shift:row_bound, col_shift:col_bound]
array([[ 0., 0., 0., 0.],
[ 0., 81., 0., 21.],
[ 0., 128., 23., 0.]])
>>> np.argwhere(arr[row_shift:row_bound, col_shift:col_bound] != 0) + [row_shift, col_shift]
array([[2, 1],
[2, 3],
[3, 1],
[3, 2]])
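The offset idea above can be wrapped in a small function. This is only a sketch: nonzero_coords is a hypothetical helper name, and it assumes step-1 slices.

```python
import numpy as np

def nonzero_coords(arr, row_slice, col_slice):
    """Coordinates, in the full array, of nonzero entries inside arr[row_slice, col_slice].

    Hypothetical helper; assumes slices with step 1 (or None).
    """
    # slice.indices() resolves None and negative bounds to concrete start values.
    row_start = row_slice.indices(arr.shape[0])[0]
    col_start = col_slice.indices(arr.shape[1])[0]
    local = np.argwhere(arr[row_slice, col_slice] != 0)
    # Shift sub-array coordinates back by the slice offsets.
    return local + [row_start, col_start]

arr = np.array([[0., 460., 166., 167., 123.],
                [0.,   0.,   0.,   0.,   0.],
                [0.,  81.,   0.,  21.,   0.],
                [0., 128.,  23.,   0.,  12.],
                [0.,  36.,   0.,  13.,   0.]])

coords = nonzero_coords(arr, slice(2, None), slice(2, None))
inner = nonzero_coords(arr, slice(1, 4), slice(0, 4))
print(coords)
print(inner)
```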

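You have moved two steps down the array and two steps to the right. All that remains is to add the number of steps taken along each direction to the coordinates. Note that the first column of the argwhere output holds row indices (the Y offset) and the second holds column indices (the X offset):
y = 2  # row offset of the slice
x = 2  # column offset of the slice
newarr = np.argwhere(arr[y:, x:] != 0)
rows = newarr[:, 0] + y  # first column of argwhere output: row indices
cols = newarr[:, 1] + x  # second column of argwhere output: column indices
print(np.column_stack((rows, cols)))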

Understanding axes in NumPy

I was going through NumPy documentation, and am not able to understand one point. It mentions, for the example below, the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
How does the first dimension (axis) have a length of 2?
Edit:
The reason for my confusion is the below statement in the documentation.
The coordinates of a point in 3D space [1, 2, 1] is an array of rank
1, because it has one axis. That axis has a length of 3.
In the original 2D ndarray, I assumed that the number of lists identifies the rank/dimension, and I wrongly assumed that the length of each list denotes the length of each dimension (in that order). So, as per my understanding, the first dimension should be having a length of 3, since the length of the first list is 3.

In numpy, axis ordering follows zyx convention, instead of the usual (and maybe more intuitive) xyz.
Visually, it means that for a 2D array where the horizontal axis is x and the vertical axis is y:
x -->
y 0 1 2
| 0 [[1., 0., 0.],
V 1 [0., 1., 2.]]
The shape of this array is (2, 3) because it is ordered (y, x), with the first axis y of length 2.
And verifying this with slicing:
import numpy as np
a = np.array([[1, 0, 0], [0, 1, 2]], dtype=float)  # np.float was removed in NumPy 1.24
>>> a
Out[]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
>>> a[0, :] # Slice index 0 of first axis
Out[]: array([ 1., 0., 0.]) # Get values along second axis `x` of length 3
>>> a[:, 2] # Slice index 2 of second axis
Out[]: array([ 0., 2.]) # Get values along first axis `y` of length 2

You may be confusing that sentence with the example below it. Think of it like this: rank is the depth of list nesting in the array, and the length in your question is the number of items in the list at a given level.
I think they are trying to describe the definition of shape, which in this case is (2, 3).
I think the key sentence in that documentation is:
In NumPy dimensions are called axes. The number of axes is rank.

If you print the numpy array
print(np.array([[1., 0., 0.], [0., 1., 2.]]))
You'll get the following output
#col1 col2 col3
[[ 1. 0. 0.] # row 1
[ 0. 1. 2.]] # row 2
Think of it as a 2 by 3 matrix: 2 rows, 3 columns. It is a 2D array because it is a list of lists (the [[ at the start is a hint that it is 2D).
The 2d numpy array
np.array([[1., 0., 0., 6.], [0., 1., 2., 7.], [3., 4., 5., 8.]])
would print as
#col1 col2 col3 col4
[[1. 0. 0. 6.] # row 1
[0. 1. 2. 7.] # row 2
[3. 4. 5. 8.]] # row 3
This is a 3 by 4 2d array (3 rows, 4 columns)

The first dimension is the length of the array itself:
In [11]: a = np.array([[ 1., 0., 0.], [ 0., 1., 2.]])
In [12]: a
Out[12]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
In [13]: len(a) # "length of first dimension"
Out[13]: 2
The second is the length of each "row":
In [14]: [len(aa) for aa in a] # 3 is "length of second dimension"
Out[14]: [3, 3]
Many numpy functions take axis as an argument, for example you can sum over an axis:
In [15]: a.sum(axis=0)
Out[15]: array([ 1., 1., 2.])
In [16]: a.sum(axis=1)
Out[16]: array([ 1., 3.])
The thing to note is that you can have higher dimensional arrays:
In [21]: b = np.array([[[1., 0., 0.], [ 0., 1., 2.]]])
In [22]: b
Out[22]:
array([[[ 1., 0., 0.],
[ 0., 1., 2.]]])
In [23]: b.sum(axis=2)
Out[23]: array([[ 1., 3.]])
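One way to see what the axis argument does is to compare shapes: summing over an axis removes that axis from the shape. A minimal sketch using the same array b (the name shapes is just for illustration):

```python
import numpy as np

b = np.array([[[1., 0., 0.], [0., 1., 2.]]])  # shape (1, 2, 3)

# Summing over an axis collapses it, so that entry disappears from the shape.
shapes = {axis: b.sum(axis=axis).shape for axis in range(b.ndim)}
print(b.shape)  # (1, 2, 3)
print(shapes)   # {0: (2, 3), 1: (1, 3), 2: (1, 2)}
```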

Keep the following points in mind when considering Numpy axes:
Each sub-level of a list (or array) represents an axis. For example:
import numpy as np
a = np.array([1,2]) # 1 axis
b = np.array([[1,2],[3,4]]) # 2 axes
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # 3 axes
Axis labels correspond to the level of the sub-list they represent, starting with axis 0 for the outer most list.
To illustrate this, consider the following arrays of different shapes, each with 24 elements:
# 1D Array
a0 = np.array(
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
)
a0.shape # (24,) - here, the length along the 0-axis is 24
# 2D Array
a01 = np.array(
[
[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, 2.4],
[3.1, 3.2, 3.3, 3.4],
[4.1, 4.2, 4.3, 4.4],
[5.1, 5.2, 5.3, 5.4],
[6.1, 6.2, 6.3, 6.4]
]
)
a01.shape # (6, 4) - now, the length along the 0-axis is 6
# 3D Array
# each value encodes its position as three digits: (axis 0, axis 1, axis 2)
a012 = np.array(
[
[
[111, 112],
[121, 122],
[131, 132]
],
[
[211, 212],
[221, 222],
[231, 232]
],
[
[311, 312],
[321, 322],
[331, 332]
],
[
[411, 412],
[421, 422],
[431, 432]
]
]
)
a012.shape # (4, 3, 2) - and finally, the length along the 0-axis is 4
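The correspondence between nesting level and axis can be checked programmatically. In this sketch, nesting_lengths is a hypothetical helper that walks into the first element at each level and records its length; the result should equal the array's shape.

```python
import numpy as np

def nesting_lengths(arr):
    """Length of the (sub-)array found at each nesting level, outermost first."""
    lengths = []
    level = arr
    for _ in range(arr.ndim):
        lengths.append(len(level))  # length along the current axis
        level = level[0]            # step one level deeper
    return tuple(lengths)

a0 = np.arange(1, 25)       # 24 elements, 1 axis
a01 = a0.reshape(6, 4)      # same elements, 2 axes
a012 = a0.reshape(4, 3, 2)  # same elements, 3 axes

for arr in (a0, a01, a012):
    print(arr.ndim, arr.shape, nesting_lengths(arr))
```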

Numpy indexing set 1 to max value and zero's to all others

I think I've misunderstood something with indexing in numpy.
I have a 3D-numpy array of shape (dim_x, dim_y, dim_z) and I want to find the maximum along the third axis (dim_z), and set its value to 1 and all the others to zero.
The problem is that I end up with several 1 in the same row, even if values are different.
Here is the code :
>>> test = np.random.rand(2,3,2)
>>> test
array([[[ 0.13110146, 0.07138861],
[ 0.84444158, 0.35296986],
[ 0.97414498, 0.63728852]],
[[ 0.61301975, 0.02313646],
[ 0.14251848, 0.91090492],
[ 0.14217992, 0.41549218]]])
>>> result = np.zeros_like(test)
>>> result[:test.shape[0], np.arange(test.shape[1]), np.argmax(test, axis=2)]=1
>>> result
array([[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]],
[[ 1., 0.],
[ 1., 1.],
[ 1., 1.]]])
I was expecting to end with :
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
Probably I'm missing something here. From what I've understood, 0:dim_x, np.arange(dim_y) returns dim_x of dim_y tuples and np.argmax(test, axis=dim_z) has the shape (dim_x, dim_y) so if the indexing is of the form [x, y, z] a couple [x, y] is not supposed to appear twice.
Could someone explain where I'm wrong? Thanks in advance.

What we are looking for
We get the argmax indices along the last axis -
idx = np.argmax(test, axis=2)
For the given sample data, we have idx :
array([[0, 0, 0],
[0, 1, 1]])
Now, idx covers the first and second axes, while getting those argmax indices.
To assign the corresponding ones in the output, we need to create range arrays for the first two axes covering the lengths along those and aligned according to the shape of idx. Now, idx is a 2D array of shape (m,n), where m = test.shape[0] and n = test.shape[1].
Thus, the range arrays for assignment into first two axes of output must be -
X = np.arange(test.shape[0])[:,None]
Y = np.arange(test.shape[1])
Notice, the extension of the first range array to 2D is needed to have it aligned against the rows of idx and Y would align against the cols of idx -
In [239]: X
Out[239]:
array([[0],
[1]])
In [240]: Y
Out[240]: array([0, 1, 2])
Schematically put -
idx :
Y array
--------->
x x x | X array
x x x |
v
The fault in original code
Your code was -
result[:test.shape[0], np.arange(test.shape[1]), ..
This is essentially :
result[:, np.arange(test.shape[1]), ...
So, you are selecting all elements along the first axis, instead of only the ones that correspond to the idx indices. In the process, you were selecting many more elements than required for the assignment, and hence you were seeing many more 1s than expected in the result array.
The correction
Thus, the only correction needed was indexing into the first axis with the range array and a working solution would be -
result[np.arange(test.shape[0])[:,None], np.arange(test.shape[1]), ...
The alternative(s)
Alternatively, using the range arrays created earlier with X and Y -
result[X,Y,idx] = 1
Another way to get X,Y would be with np.ogrid -
m,n = test.shape[:2]
X,Y = np.ogrid[:m,:n]
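Putting the pieces together on the sample data from the question gives a complete, runnable sketch. The second variant uses np.put_along_axis (available since NumPy 1.15) as an alternative that was not part of the original answer.

```python
import numpy as np

test = np.array([[[0.13110146, 0.07138861],
                  [0.84444158, 0.35296986],
                  [0.97414498, 0.63728852]],
                 [[0.61301975, 0.02313646],
                  [0.14251848, 0.91090492],
                  [0.14217992, 0.41549218]]])

idx = np.argmax(test, axis=2)                     # shape (2, 3)
X, Y = np.ogrid[:test.shape[0], :test.shape[1]]   # shapes (2, 1) and (1, 3)

result = np.zeros_like(test)
result[X, Y, idx] = 1       # X, Y and idx broadcast to (2, 3): one element per (x, y)

# Alternative: let NumPy build the index grids; indices need the same ndim as the array.
result2 = np.zeros_like(test)
np.put_along_axis(result2, idx[..., None], 1, axis=2)

print(result)
```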

I think there's a problem with mixing basic (slice) and advanced indexing. It's easier to see when selecting values from an array than with this assignment, but it can result in transposed axes. For a problem like this it is better to use advanced indexing all around, as provided by ix_
In [24]: test = np.random.rand(2,3,2)
In [25]: idx=np.argmax(test,axis=2)
In [26]: idx
Out[26]:
array([[1, 0, 1],
[0, 1, 1]], dtype=int32)
with basic and advanced:
In [31]: res1 = np.zeros_like(test)
In [32]: res1[:, np.arange(test.shape[1]), idx]=1
In [33]: res1
Out[33]:
array([[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]],
[[ 1., 1.],
[ 1., 1.],
[ 0., 1.]]])
with advanced:
In [35]: I,J = np.ix_(range(test.shape[0]), range(test.shape[1]))
In [36]: I
Out[36]:
array([[0],
[1]])
In [37]: J
Out[37]: array([[0, 1, 2]])
In [38]: res2 = np.zeros_like(test)
In [40]: res2[I, J , idx]=1
In [41]: res2
Out[41]:
array([[[ 0., 1.],
[ 1., 0.],
[ 0., 1.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
On further thought, the use of the slice for the 1st dimension is just wrong if the goal is to set or find the 6 argmax values
In [54]: test
Out[54]:
array([[[ 0.15288242, 0.36013289],
[ 0.90794601, 0.15265616],
[ 0.34014976, 0.53804266]],
[[ 0.97979479, 0.15898605],
[ 0.04933804, 0.89804999],
[ 0.10199319, 0.76170911]]])
In [55]: test[I, J, idx]
Out[55]:
array([[ 0.36013289, 0.90794601, 0.53804266],
[ 0.97979479, 0.89804999, 0.76170911]])
In [56]: test[:, J, idx]
Out[56]:
array([[[ 0.36013289, 0.90794601, 0.53804266],
[ 0.15288242, 0.15265616, 0.53804266]],
[[ 0.15898605, 0.04933804, 0.76170911],
[ 0.97979479, 0.89804999, 0.76170911]]])
With the slice it selects a (2,2,3) set of values from test (or res), not the intended (2,3). There are 2 extra rows.
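The difference is easy to confirm by comparing result shapes. This sketch uses its own random data (seeded, so the values differ from those printed above); only the shapes and the comparison with max matter.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
test = rng.random((2, 3, 2))
idx = test.argmax(axis=2)        # shape (2, 3)

I, J = np.ix_(range(test.shape[0]), range(test.shape[1]))  # shapes (2, 1), (1, 3)

mixed = test[:, J, idx]   # slice + advanced: the slice keeps its own axis
full = test[I, J, idx]    # all advanced: indices broadcast together to (2, 3)

print(mixed.shape)  # (2, 2, 3)
print(full.shape)   # (2, 3)
```

Here full holds exactly the maxima along the last axis, i.e. the same values as test.max(axis=2).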

Here is an easier way to do it:
>>> test == test.max(axis=2, keepdims=1)
array([[[ True, False],
[ True, False],
[ True, False]],
[[ True, False],
[False, True],
[False, True]]], dtype=bool)
...and if you really want that as floating-point 1.0 and 0.0, then convert it:
>>> (test==test.max(axis=2, keepdims=1)).astype(float)
array([[[ 1., 0.],
[ 1., 0.],
[ 1., 0.]],
[[ 1., 0.],
[ 0., 1.],
[ 0., 1.]]])
Here is a way to do it with only one winner per row-column combo (i.e. no ties, as discussed in comments):
rowmesh, colmesh = np.meshgrid(range(test.shape[0]), range(test.shape[1]), indexing='ij')
maxloc = np.argmax(test, axis=2)
flatind = np.ravel_multi_index( [rowmesh, colmesh, maxloc ], test.shape )
result = np.zeros_like(test)
result.flat[flatind] = 1
UPDATE after reading hpaulj's answer:
rowmesh, colmesh = np.ix_(range(test.shape[0]), range(test.shape[1]))
is a more-efficient, more numpythonic, alternative to my meshgrid call (the rest of the code stays the same)
The issue of why your approach fails is hard to explain, but here's one place where intuition could start: your slicing approach says "all rows, times all columns, times a certain sequence of layers". How many elements is that slice in total? By contrast, how many elements do you actually want to set to 1? It can be instructive to look at the values you get when you view the corresponding test values of the slice you're trying to assign to:
>>> test[:, :, maxloc].shape
(2, 3, 2, 3) # oops! it's because maxloc itself is 2x3
>>> test[:, :, maxloc]
array([[[[ 0.13110146, 0.13110146, 0.13110146],
[ 0.13110146, 0.07138861, 0.07138861]],
[[ 0.84444158, 0.84444158, 0.84444158],
[ 0.84444158, 0.35296986, 0.35296986]],
[[ 0.97414498, 0.97414498, 0.97414498],
[ 0.97414498, 0.63728852, 0.63728852]]],
[[[ 0.61301975, 0.61301975, 0.61301975],
[ 0.61301975, 0.02313646, 0.02313646]],
[[ 0.14251848, 0.14251848, 0.14251848],
[ 0.14251848, 0.91090492, 0.91090492]],
[[ 0.14217992, 0.14217992, 0.14217992],
[ 0.14217992, 0.41549218, 0.41549218]]]]) # note the repetition, because in maxloc you're repeatedly asking for layer 0 sometimes, and sometimes repeatedly for layer 1

How to split an array based on minimum row value using vectorization

I am trying to figure out how to take the following for loop, which splits an array based on the index of the lowest value in each row, and vectorize it. I've looked at this link and have been trying to use the numpy.where function, but so far without success.
For example, if an array has n columns, then all the rows where col[0] has the lowest value are put in one array, all the rows where col[1] has the lowest value are put in another, etc.
Here's the code using a for loop.
import numpy
a = numpy.array([[ 0. 1. 3.]
[ 0. 1. 3.]
[ 0. 1. 3.]
[ 1. 0. 2.]
[ 1. 0. 2.]
[ 1. 0. 2.]
[ 3. 1. 0.]
[ 3. 1. 0.]
[ 3. 1. 0.]])
result_0 = []
result_1 = []
result_2 = []
for value in a:
    if value[0] <= value[1] and value[0] <= value[2]:
        result_0.append(value)
    elif value[1] <= value[0] and value[1] <= value[2]:
        result_1.append(value)
    else:
        result_2.append(value)
print(result_0)
>>[array([ 0. 1. 3.]), array([ 0. 1. 3.]), array([ 0. 1. 3.])]
print(result_1)
>>[array([ 1. 0. 2.]), array([ 1. 0. 2.]), array([ 1. 0. 2.])]
print(result_2)
>>[array([ 3. 1. 0.]), array([ 3. 1. 0.]), array([ 3. 1. 0.])]

First, use argsort to see where the lowest value in each row is:
>>> a.argsort(axis=1)
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[1, 0, 2],
[1, 0, 2],
[1, 0, 2],
[2, 1, 0],
[2, 1, 0],
[2, 1, 0]])
Note that the first column of this result holds, for each row, the index of the column with the smallest value.
Now you can build the results:
>>> sortidx = a.argsort(axis=1)
>>> [a[sortidx[:,0] == i] for i in range(a.shape[1])]
[array([[ 0., 1., 3.],
[ 0., 1., 3.],
[ 0., 1., 3.]]),
array([[ 1., 0., 2.],
[ 1., 0., 2.],
[ 1., 0., 2.]]),
array([[ 3., 1., 0.],
[ 3., 1., 0.],
[ 3., 1., 0.]])]
So it is done with only a single loop over the columns, which will give a huge speedup if the number of rows is much larger than the number of columns.
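Since only the position of the smallest value in each row is needed, a full sort is not required. A minimal alternative sketch uses argmin, which returns the first minimum on ties and therefore gives lower-numbered columns priority, like the if/elif chain in the question:

```python
import numpy as np

a = np.array([[0., 1., 3.],
              [0., 1., 3.],
              [0., 1., 3.],
              [1., 0., 2.],
              [1., 0., 2.],
              [1., 0., 2.],
              [3., 1., 0.],
              [3., 1., 0.],
              [3., 1., 0.]])

min_col = a.argmin(axis=1)  # column index of the smallest value in each row

# One boolean mask per column; each row lands in exactly one group.
groups = [a[min_col == i] for i in range(a.shape[1])]

for i, g in enumerate(groups):
    print("result_%d:" % i)
    print(g)
```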

This is not the best solution since it relies on simple python loops and is not very efficient when you start dealing with large data sets but it should get you started.
The point is to create an array of "buckets" which store the data based on the depth of the lengthiest element. Then enumerate each element in values, selecting the smallest one and saving its offset which is subsequently appended to the correct results "bucket", for each a. Finally we print this out in the last loop.
Solution using loops:
import numpy
import pprint
# random data set
a = numpy.array([[0, 1, 3],
[0, 1, 3],
[0, 1, 3],
[1, 0, 2],
[1, 0, 2],
[1, 0, 2],
[3, 1, 0],
[3, 1, 0],
[3, 1, 0]])
# create a list of results as big as the depth of elements in an entry
results = list()
for l in range(max(len(i) for i in a)):
    results.append(list())
# don't do the following because all the references to the lists will be the same and you get dups:
# results = [[]]*max(len(i) for i in a)
for value in a:
    res_offset, _val = min(enumerate(value), key=lambda x: x[1]) # get the offset and min value
    results[res_offset].append(value) # store the original Array obj in the correct "bucket"
# print for visualization
for c, r in enumerate(results):
    print("result_%s: %s" % (c, r))
Outputs:
result_0: [array([0, 1, 3]), array([0, 1, 3]), array([0, 1, 3])]
result_1: [array([1, 0, 2]), array([1, 0, 2]), array([1, 0, 2])]
result_2: [array([3, 1, 0]), array([3, 1, 0]), array([3, 1, 0])]

I found a much easier way to do this. I hope that I am interpreting the OP correctly.
My sense is that the OP wants to create a slice of the larger array based upon some set of conditions.
Note that the code above to create the array does not seem to work, at least in Python 3.5. I generated the array as follows.
a = np.array([0., 1., 3., 0., 1., 3., 0., 1., 3., 1., 0., 2., 1., 0., 2.,1., 0., 2.,3., 1., 0.,3., 1., 0.,3., 1., 0.]).reshape([9,3])
Next, I sliced the original array into smaller arrays. Numpy has builtins to help with this.
result_0 = a[np.logical_and(a[:,0] <= a[:,1],a[:,0] <= a[:,2])]
result_1 = a[np.logical_and(a[:,1] <= a[:,0],a[:,1] <= a[:,2])]
result_2 = a[np.logical_and(a[:,2] <= a[:,0],a[:,2] <= a[:,1])]
This will generate new numpy arrays that match the given conditions.
Note that if the user wants to convert each of these arrays into a list of row arrays, he/she can just enter the following code to obtain the result.
result_0 = [np.array(x) for x in result_0.tolist()]
result_1 = [np.array(x) for x in result_1.tolist()]
result_2 = [np.array(x) for x in result_2.tolist()]
This should generate the outcome requested in the OP.

Map arrays with duplicate indexes?

Assume three arrays in numpy:
a = np.zeros(5)
b = np.array([3,3,3,0,0])
c = np.array([1,5,10,50,100])
b can now be used as an index for a and c. For example:
In [142]: c[b]
Out[142]: array([50, 50, 50, 1, 1])
Is there any way to add up the values connected to the duplicate indexes with this kind of slicing? With
a[b] = c
Only the last values are stored:
array([ 100., 0., 0., 10., 0.])
I would like something like this:
a[b] += c
which would give
array([ 150., 0., 0., 16., 0.])
I'm mapping very large vectors onto 2D matrices and would really like to avoid loops...

The += operator for NumPy arrays simply doesn't work the way you are hoping, and I'm not aware of a way of making it work that way. As a workaround I suggest using numpy.bincount():
>>> numpy.bincount(b, c)
array([ 150., 0., 0., 16.])
Just append zeros as needed.
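Two variations may be worth knowing here, sketched below with the arrays from the question. The minlength argument of bincount pads the result so no zeros need to be appended by hand, and np.add.at performs an unbuffered in-place addition in which repeated indices accumulate. Neither call appears in the original answer.

```python
import numpy as np

b = np.array([3, 3, 3, 0, 0])
c = np.array([1, 5, 10, 50, 100])

# bincount with weights sums c per index; minlength pads to the length of a.
a_bincount = np.bincount(b, weights=c, minlength=5)

# np.add.at is unbuffered, so duplicate indices in b accumulate, unlike a[b] += c.
a_add_at = np.zeros(5)
np.add.at(a_add_at, b, c)

print(a_bincount)
print(a_add_at)
```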

You could do something like:
def sum_unique(label, weight):
    order = np.lexsort(label.T)
    label = label[order]
    weight = weight[order]
    unique = np.ones(len(label), 'bool')
    unique[:-1] = (label[1:] != label[:-1]).any(-1)
    totals = weight.cumsum()
    totals = totals[unique]
    totals[1:] = totals[1:] - totals[:-1]
    return label[unique], totals
And use it like this:
In [110]: coord = np.random.randint(0, 3, (10, 2))
In [111]: coord
Out[111]:
array([[0, 2],
[0, 2],
[2, 1],
[1, 2],
[1, 0],
[0, 2],
[0, 0],
[2, 1],
[1, 2],
[1, 2]])
In [112]: weights = np.ones(10)
In [113]: uniq_coord, sums = sum_unique(coord, weights)
In [114]: uniq_coord
Out[114]:
array([[0, 0],
[1, 0],
[2, 1],
[0, 2],
[1, 2]])
In [115]: sums
Out[115]: array([ 1., 1., 2., 3., 3.])
In [116]: a = np.zeros((3,3))
In [117]: x, y = uniq_coord.T
In [118]: a[x, y] = sums
In [119]: a
Out[119]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])
I just thought of this, it might be easier:
In [120]: flat_coord = np.ravel_multi_index(coord.T, (3,3))
In [121]: sums = np.bincount(flat_coord, weights)
In [122]: a = np.zeros((3,3))
In [123]: a.flat[:len(sums)] = sums
In [124]: a
Out[124]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])
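For the 2D case, the same accumulation can also be sketched with np.add.at and a tuple of index arrays, or with bincount plus minlength so the result can be reshaped directly. The coord values are copied from the session above; np.add.at is an alternative not used in the original answer.

```python
import numpy as np

coord = np.array([[0, 2], [0, 2], [2, 1], [1, 2], [1, 0],
                  [0, 2], [0, 0], [2, 1], [1, 2], [1, 2]])
weights = np.ones(10)

# Variant 1: unbuffered in-place add; duplicate (row, col) pairs accumulate.
a = np.zeros((3, 3))
np.add.at(a, (coord[:, 0], coord[:, 1]), weights)

# Variant 2: flatten the coordinates, sum per flat index, then reshape.
flat_coord = np.ravel_multi_index(coord.T, (3, 3))
a2 = np.bincount(flat_coord, weights=weights, minlength=9).reshape(3, 3)

print(a)
```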
