This question already has answers here:
How to take elements along a given axis, given by their indices?
(4 answers)
indexing a numpy array with indices from another array
(1 answer)
Closed 4 years ago.
Let's say I have d1, d2 and d3 as follows. t is a variable where I've combined my arrays, and m contains the indices of the smallest values.
>>> d1
array([[ 0.9850916 , 0.95004463, 1.35728604, 1.18554035],
[ 0.47624542, 0.45561795, 0.6231743 , 0.94746001],
[ 0.74008166, 0. , 1.59774065, 1.00423774],
[ 0.86173439, 0.70940862, 1.0601817 , 0.96112015],
[ 1.03413477, 0.64874991, 1.27488263, 0.80250053]])
>>> d2
array([[ 0.27301946, 0.38387185, 0.93215524, 0.98851404],
[ 0.17996978, 0. , 0.41283798, 0.15204035],
[ 0.10952115, 0.45561795, 0.5334015 , 0.75242805],
[ 0.4600214 , 0.74100962, 0.16743427, 0.36250385],
[ 0.60984208, 0.35161234, 0.44580535, 0.6713633 ]])
>>> d3
array([[ 0. , 0.19658541, 1.14605925, 1.18431945],
[ 0.10697428, 0.27301946, 0.45536417, 0.11922118],
[ 0.42153386, 0.9850916 , 0.28225364, 0.82765657],
[ 1.04940684, 1.63082272, 0.49987388, 0.38596938],
[ 0.21015723, 1.07007177, 0.22599987, 0.89288339]])
>>> t = np.array([d1, d2, d3])
>>> t
array([[[ 0.9850916 , 0.95004463, 1.35728604, 1.18554035],
[ 0.47624542, 0.45561795, 0.6231743 , 0.94746001],
[ 0.74008166, 0. , 1.59774065, 1.00423774],
[ 0.86173439, 0.70940862, 1.0601817 , 0.96112015],
[ 1.03413477, 0.64874991, 1.27488263, 0.80250053]],
[[ 0.27301946, 0.38387185, 0.93215524, 0.98851404],
[ 0.17996978, 0. , 0.41283798, 0.15204035],
[ 0.10952115, 0.45561795, 0.5334015 , 0.75242805],
[ 0.4600214 , 0.74100962, 0.16743427, 0.36250385],
[ 0.60984208, 0.35161234, 0.44580535, 0.6713633 ]],
[[ 0. , 0.19658541, 1.14605925, 1.18431945],
[ 0.10697428, 0.27301946, 0.45536417, 0.11922118],
[ 0.42153386, 0.9850916 , 0.28225364, 0.82765657],
[ 1.04940684, 1.63082272, 0.49987388, 0.38596938],
[ 0.21015723, 1.07007177, 0.22599987, 0.89288339]]])
>>> m = np.argmin(t, axis=0)
>>> m
array([[2, 2, 1, 1],
[2, 1, 1, 2],
[1, 0, 2, 1],
[1, 0, 1, 1],
[2, 1, 2, 1]])
From m and t, I want to extract the actual minimum values, as shown below. How do I do this, preferably efficiently?
array([ [ 0. , 0.19658541, 0.93215524, 0.98851404],
[ 0.10697428, 0. , 0.41283798, 0.11922118],
[ 0.10952115, 0. , 0.28225364, 0.75242805],
[ 0.4600214 , 0.70940862, 0.16743427, 0.36250385],
[ 0.21015723, 0.35161234, 0.22599987, 0.6713633 ]])
If the minimum itself is all you need, you can simply use np.min(t, axis=0).
If you want to use the index array m explicitly, you can use choose:
m.choose(t) # This will return the same thing.
It can also be written as
np.choose(m, t)
Which returns:
array([[0. , 0.19658541, 0.93215524, 0.98851404],
[0.10697428, 0. , 0.41283798, 0.11922118],
[0.10952115, 0. , 0.28225364, 0.75242805],
[0.4600214 , 0.70940862, 0.16743427, 0.36250385],
[0.21015723, 0.35161234, 0.22599987, 0.6713633 ]])
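Note that np.choose only supports a limited number of choice arrays (historically 32), so for larger stacks np.take_along_axis or plain advanced indexing is more robust. A minimal sketch with small made-up data standing in for t:

```python
import numpy as np

# small stand-in for the d1/d2/d3 stack in the question
t = np.array([[[0.9, 0.5], [0.7, 0.2]],
              [[0.3, 0.6], [0.1, 0.8]],
              [[0.4, 0.1], [0.9, 0.4]]])
m = np.argmin(t, axis=0)

# take_along_axis picks t[m[i, j], i, j] for every (i, j)
result = np.take_along_axis(t, m[None, ...], axis=0)[0]

# the same thing with plain advanced indexing
rows, cols = np.indices(m.shape)
assert np.array_equal(result, t[m, rows, cols])
assert np.array_equal(result, t.min(axis=0))
```

Both forms generalize to any index array m along axis 0, not just the argmin.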
Related
I have the following numpy arrays:
theta_array =
array([[ 1, 10],
[ 1, 11],
[ 1, 12],
[ 1, 13],
[ 1, 14],
[ 2, 10],
[ 2, 11],
[ 2, 12],
[ 2, 13],
[ 2, 14],
[ 3, 10],
[ 3, 11],
[ 3, 12],
[ 3, 13],
[ 3, 14],
[ 4, 10],
[ 4, 11],
[ 4, 12],
[ 4, 13],
[ 4, 14]])
XY_array =
array([[ 44.0394952 , 505.81099922],
[ 61.03882938, 515.97253226],
[ 26.69851841, 525.18083012],
[ 46.78487831, 533.42309602],
[ 45.77188401, 545.42988355],
[ 81.12969132, 554.78767379],
[ 54.178463 , 565.8716283 ],
[ 41.58952084, 574.76827133],
[ 85.24956815, 585.1355127 ],
[ 80.73726733, 595.49446033],
[ 22.70625059, 605.59017175],
[ 40.66810604, 615.26308629],
[ 47.16694695, 624.39222332],
[ 48.72499541, 633.19846364],
[ 50.68589921, 643.72334885],
[ 38.42731134, 654.68595883],
[ 47.39519707, 666.28232866],
[ 58.07767155, 673.9572227 ],
[ 72.11393347, 683.68307373],
[ 53.70872932, 694.65509894],
[ 82.08237952, 704.5868817 ],
[ 46.64069738, 715.18427515],
[ 40.46032478, 723.91308011],
[ 75.69090892, 733.69595658],
[120.61447884, 745.31322786],
[ 60.17764744, 754.89747186],
[ 87.15961973, 766.24040447],
[ 82.93872713, 773.01518252],
[ 93.56688906, 785.60640153],
[ 70.0474047 , 793.81792947],
[104.3613818 , 805.40234676],
[108.39253837, 814.75002114],
[ 78.97643673, 824.95386427],
[ 85.69096895, 834.44797862],
[ 53.07112931, 844.39555058],
[111.49875807, 855.660508 ],
[ 70.88824958, 865.53417489],
[ 79.55499469, 875.31303945],
[ 60.86941464, 885.85235946],
[101.06017712, 896.69986636],
[ 74.55823544, 905.87417231],
[113.24705653, 915.19350121],
[ 94.21920882, 925.87933273],
[ 63.26478103, 933.70804578],
[ 95.97827181, 945.76196917],
[ 80.48623318, 955.60422694],
[ 80.03451808, 964.39856485],
[ 73.86032436, 973.91032818],
[103.96923524, 984.24366761],
[ 93.20663129, 995.44618851]])
I am trying to combine both, so for each combination of theta_array I get all combinations from XY_array.
I am aware of this post, so I have done this:
np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)
But this generates:
array([[ 1. , 44.0394952 , 1. , 505.81099922],
[ 1. , 61.03882938, 1. , 515.97253226],
[ 1. , 26.69851841, 1. , 525.18083012],
...,
[ 14. , 73.86032436, 14. , 973.91032818],
[ 14. , 103.96923524, 14. , 984.24366761],
[ 14. , 93.20663129, 14. , 995.44618851]])
and the problem requires:
array([[ 1. , 1. , 44.0394952 , 505.81099922],
[ 1. , 1. , 61.03882938, 515.97253226],
[ 1. , 1. , 26.69851841, 525.18083012],
...,
[ 14. , 14. , 73.86032436, 973.91032818],
[ 14. , 14. , 103.96923524, 984.24366761],
[ 14. , 14. , 93.20663129, 995.44618851]])
What would be the way of doing this combination/aggregation in numpy?
EDIT:
There is a mistake in the above process: the combined arrays do not lead to the generation of that matrix. With separate vectors for each column, the actual solution to merge them is:
dataset = np.array(np.meshgrid(theta0_range, theta1_range, X)).T.reshape(-1,3)
And later the Y vector can be added as an additional column.
You can reorder the "columns" after using meshgrid with [:,[0,2,1,3]] and if you need to make the list dynamic because of a large number of columns, then you can see the end of my answer:
np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]
Output:
array([[ 1. , 1. , 44.0394952 , 505.81099922],
[ 1. , 1. , 61.03882938, 515.97253226],
[ 1. , 1. , 26.69851841, 525.18083012],
...,
[ 14. , 14. , 73.86032436, 973.91032818],
[ 14. , 14. , 103.96923524, 984.24366761],
[ 14. , 14. , 93.20663129, 995.44618851]])
If you have many columns you could dynamically create this list: [0,2,1,3] with list comprehension. For example:
n = new_arr.shape[1] * 2
lst = [x for x in range(n) if x % 2 == 0] + [x for x in range(n) if x % 2 == 1]
lst
[0, 2, 4, 6, 1, 3, 5, 7]
Then, you could rewrite to:
np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,lst]
You can use itertools.product:
from itertools import product
out = np.array([*product(theta_array, XY_array)])
out = out.reshape(out.shape[0],-1)
Output:
array([[ 1. , 10. , 44.0394952 , 505.81099922],
[ 1. , 10. , 61.03882938, 515.97253226],
[ 1. , 10. , 26.69851841, 525.18083012],
...,
[ 4. , 14. , 73.86032436, 973.91032818],
[ 4. , 14. , 103.96923524, 984.24366761],
[ 4. , 14. , 93.20663129, 995.44618851]])
That said, this looks very much like an XY-problem. What are you trying to do with this array?
Just as a side/complementary reference, here is a comparison of execution time for both solutions. For this specific operation, itertools takes about 10 times longer than its numpy equivalent.
%%time
for i in range(1000):
z = np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]
CPU times: user 299 ms, sys: 0 ns, total: 299 ms
Wall time: 328 ms
%%time
for i in range(1000):
z = np.array([*product(theta_array, XY_array)])
z = z.reshape(z.shape[0],-1)
CPU times: user 2.79 s, sys: 474 µs, total: 2.79 s
Wall time: 2.84 s
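For completeness, the row-wise Cartesian product can also be built with np.repeat and np.tile alone, which avoids both the meshgrid column reordering and itertools. A sketch with small stand-in arrays (not the asker's full data):

```python
import numpy as np

theta_array = np.array([[1, 10], [1, 11], [2, 10]])
XY_array = np.array([[44.0, 505.8], [61.0, 515.9]])

nt, nx = len(theta_array), len(XY_array)
# repeat each theta row once per XY row, tile the XY block once per theta row
out = np.hstack([np.repeat(theta_array, nx, axis=0),
                 np.tile(XY_array, (nt, 1))])
```

The rows come out in the same order as itertools.product: each theta row paired with every XY row in turn.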
This question already has answers here:
Numpy array loss of dimension when masking
(5 answers)
Closed 3 years ago.
The question sounds very basic. But when I try to use where or boolean conditions on numpy arrays, it always returns a flattened array.
I have the NumPy array
P = array([[ 0.49530662, 0.07901 , -0.19012371],
[ 0.1421513 , 0.48607405, -0.20315014],
[ 0.76467375, 0.16479826, -0.56598029],
[ 0.53530718, -0.21166188, -0.08773241]])
I want to extract the array of only negative values, but when I try
P[P<0]
array([-0.19012371, -0.41421612, -0.20315014, -0.56598029, -0.21166188,
-0.08773241, -0.09241335])
P[np.where(P<0)]
array([-0.19012371, -0.41421612, -0.20315014, -0.56598029, -0.21166188,
-0.08773241, -0.09241335])
I get a flattened array. How can I extract the array of the form
array([[ 0, 0, -0.19012371],
[ 0 , 0, -0.20315014],
[ 0, 0, -0.56598029],
[ 0, -0.21166188, -0.08773241]])
I do not wish to create a temp array and then use something like Temp[Temp>=0] = 0
Since your need is:
I want to "extract" the array of only negative values
You can use numpy.where() with your condition (checking for negative values), which preserves the dimensions of the array, as in the example below:
In [61]: np.where(P<0, P, 0)
Out[61]:
array([[ 0. , 0. , -0.19012371],
[ 0. , 0. , -0.20315014],
[ 0. , 0. , -0.56598029],
[ 0. , -0.21166188, -0.08773241]])
where P is your input array.
Another idea could be to use numpy.zeros_like() for initializing a same shape array and numpy.where() to gather the indices at which our condition satisfies.
# initialize our result array with zeros
In [106]: non_positives = np.zeros_like(P)
# gather the indices where our condition is obeyed
In [107]: idxs = np.where(P < 0)
# copy the negative values to correct indices
In [108]: non_positives[idxs] = P[idxs]
In [109]: non_positives
Out[109]:
array([[ 0. , 0. , -0.19012371],
[ 0. , 0. , -0.20315014],
[ 0. , 0. , -0.56598029],
[ 0. , -0.21166188, -0.08773241]])
Yet another idea would be to simply use the barebones numpy.clip() API, which would return a new array, if we omit the out= kwarg.
In [22]: np.clip(P, -np.inf, 0) # P.clip(-np.inf, 0)
Out[22]:
array([[ 0. , 0. , -0.19012371],
[ 0. , 0. , -0.20315014],
[ 0. , 0. , -0.56598029],
[ 0. , -0.21166188, -0.08773241]])
This should work: get the indexes of all elements that are greater than or equal to 0 and set them to 0; this preserves the dimensions. I got the idea from here: Replace all elements of Python NumPy Array that are greater than some value
Also note that this modifies the original array; I haven't used a temp array here.
import numpy as np
P = np.array([[ 0.49530662, 0.07901 , -0.19012371],
[ 0.1421513 , 0.48607405, -0.20315014],
[ 0.76467375, 0.16479826, -0.56598029],
[ 0.53530718, -0.21166188, -0.08773241]])
P[P >= 0] = 0
print(P)
The output will be
[[ 0. 0. -0.19012371]
[ 0. 0. -0.20315014]
[ 0. 0. -0.56598029]
[ 0. -0.21166188 -0.08773241]]
As noted below, this will modify the array; to preserve the original array, use np.where(P<0, P, 0) instead, as follows (thanks @kmario123):
import numpy as np
P = np.array([[ 0.49530662, 0.07901 , -0.19012371],
[ 0.1421513 , 0.48607405, -0.20315014],
[ 0.76467375, 0.16479826, -0.56598029],
[ 0.53530718, -0.21166188, -0.08773241]])
print( np.where(P<0, P, 0))
print(P)
The output will be
[[ 0. 0. -0.19012371]
[ 0. 0. -0.20315014]
[ 0. 0. -0.56598029]
[ 0. -0.21166188 -0.08773241]]
[[ 0.49530662 0.07901 -0.19012371]
[ 0.1421513 0.48607405 -0.20315014]
[ 0.76467375 0.16479826 -0.56598029]
[ 0.53530718 -0.21166188 -0.08773241]]
I have following numpy array
import numpy as np
np.random.seed(20)
np.random.rand(20).reshape(5, 4)
array([[ 0.5881308 , 0.89771373, 0.89153073, 0.81583748],
[ 0.03588959, 0.69175758, 0.37868094, 0.51851095],
[ 0.65795147, 0.19385022, 0.2723164 , 0.71860593],
[ 0.78300361, 0.85032764, 0.77524489, 0.03666431],
[ 0.11669374, 0.7512807 , 0.23921822, 0.25480601]])
For each column I would like to slice it in positions:
position_for_slicing=[0, 3, 4, 4]
So I will get following array:
array([[ 0.5881308 , 0.85032764, 0.23921822, 0.81583748],
[ 0.03588959, 0.7512807 , 0, 0],
[ 0.65795147, 0, 0, 0],
[ 0.78300361, 0, 0, 0],
[ 0.11669374, 0, 0, 0]])
Is there a fast way to do this? I know I can use a for loop over each column, but I was wondering if there is a more elegant way.
If "elegant" means "no loop" the following would qualify, but probably not under many other definitions (arr is your input array):
m, n = arr.shape
arrf = np.asanyarray(arr, order='F')
padded = np.r_[arrf, np.zeros_like(arrf)]
assert padded.flags['F_CONTIGUOUS']
expnd = np.lib.stride_tricks.as_strided(padded, (m, m+1, n), padded.strides[:1] + padded.strides)
expnd[:, [0,3,4,4], range(4)]
# array([[ 0.5881308 , 0.85032764, 0.23921822, 0.25480601],
# [ 0.03588959, 0.7512807 , 0. , 0. ],
# [ 0.65795147, 0. , 0. , 0. ],
# [ 0.78300361, 0. , 0. , 0. ],
# [ 0.11669374, 0. , 0. , 0. ]])
Please note that order='C' (with 'C_CONTIGUOUS' in the assertion) also works. My hunch is that 'F' could be a bit faster because the indexing then operates on contiguous slices.
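A shorter variant of the same padding idea, a sketch assuming the positions from the question: stack a block of zeros below the array, then read m rows per column starting at pos[j] in one advanced-indexing step:

```python
import numpy as np

rng = np.random.default_rng(20)
arr = rng.random((5, 4))
pos = np.array([0, 3, 4, 4])
m, n = arr.shape

# zeros below the data provide the padding that out-of-range rows fall into
padded = np.vstack([arr, np.zeros_like(arr)])
out = padded[np.arange(m)[:, None] + pos, np.arange(n)]

assert np.array_equal(out[:, 0], arr[:, 0])    # pos 0: column unchanged
assert np.array_equal(out[:2, 1], arr[3:, 1])  # pos 3: last two values move up
assert not out[2:, 1].any()                    # remainder is zero-padded
```

This trades the as_strided machinery for one extra copy of the array, which is usually an acceptable cost at these sizes.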
I want to add a vector as the first column of my 2D array which looks like :
[[ 1. 0. 0. nan]
[ 4. 4. 9.97 1. ]
[ 4. 4. 27.94 1. ]
[ 2. 1. 4.17 1. ]
[ 3. 2. 38.22 1. ]
[ 4. 4. 31.83 1. ]
[ 3. 4. 41.87 1. ]
[ 2. 1. 18.33 1. ]
[ 4. 4. 33.96 1. ]
[ 2. 1. 5.65 1. ]
[ 3. 3. 40.74 1. ]
[ 2. 1. 10.04 1. ]
[ 2. 2. 53.15 1. ]]
I want to add an array of 13 elements as the first column of the matrix. I tried np.column_stack and np.append, but they are for 1D vectors or don't work because I can't choose axis=1 and can only do np.append(peak_values, results)
I have a very simple option for you using numpy -
x = np.array( [[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.942777 , -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427767 ,-4.297677 ],
[ 3.9427767, -4.297677 ],
[ 3.9427772 ,-4.297677 ]])
b = np.arange(10).reshape(-1,1)  # shape (10, 1)
np.concatenate((b, x), axis=1)
Output-
array([[ 0. , 3.9427767, -4.297677 ],
[ 1. , 3.9427767, -4.297677 ],
[ 2. , 3.9427767, -4.297677 ],
[ 3. , 3.9427767, -4.297677 ],
[ 4. , 3.942777 , -4.297677 ],
[ 5. , 3.9427767, -4.297677 ],
[ 6. , 3.9427767, -4.297677 ],
[ 7. , 3.9427767, -4.297677 ],
[ 8. , 3.9427767, -4.297677 ],
[ 9. , 3.9427772, -4.297677 ]])
Building on this answer, you can use reshape(-1, 1) to transform the 1d array you'd like to prepend along axis 1 into a 2d array with a single column. At this point, the arrays differ in shape only along the second axis, and np.concatenate accepts the arguments:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> b = np.arange(3)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> b
array([0, 1, 2])
>>> b.reshape(-1, 1) # preview the reshaping...
array([[0],
[1],
[2]])
>>> np.concatenate((b.reshape(-1, 1), a), axis=1)
array([[ 0, 0, 1, 2, 3],
[ 1, 4, 5, 6, 7],
[ 2, 8, 9, 10, 11]])
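np.insert offers yet another one-liner for prepending a column; a sketch with the same toy arrays:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(3)

# insert b as a new column before index 0 along axis 1
out = np.insert(a, 0, b, axis=1)
# equivalently: np.column_stack((b, a)) or np.hstack((b[:, None], a))
```

All three forms produce the same (3, 5) result; column_stack is arguably the most readable when only one column is being prepended.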
For the simplest answer, you probably don't even need numpy.
Try the following:
new_array = []
new_array.append(your_array)
That's it.
I would suggest using Numpy. It will allow you to easily do what you want.
Here is an example of squaring the entire set. You can use something like nums[0].
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"
I have been trying to divide a python scipy sparse matrix by a vector sum of its rows. Here is my code
from scipy.sparse import bsr_matrix
sparse_mat = bsr_matrix((l_data, (l_row, l_col)), dtype=float)
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
However, it throws an error no matter how I try it
sparse_mat = sparse_mat / (sparse_mat.sum(axis = 1)[:,None])
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 381, in __div__
return self.__truediv__(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 427, in __truediv__
raise NotImplementedError
NotImplementedError
Anyone with an idea of where I am going wrong?
You can circumvent the problem by creating a sparse diagonal matrix from the reciprocals of your row sums and then multiplying it with your matrix. In the product the diagonal matrix goes left and your matrix goes right.
Example:
>>> a
array([[0, 9, 0, 0, 1, 0],
[2, 0, 5, 0, 0, 9],
[0, 2, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0],
[0, 9, 5, 3, 0, 7],
[1, 0, 0, 8, 9, 0]])
>>> b = sparse.bsr_matrix(a)
>>>
>>> c = sparse.diags(1/b.sum(axis=1).A.ravel())
>>> # on older scipy versions the offsets parameter (default 0)
... # is a required argument, thus
... # c = sparse.diags(1/b.sum(axis=1).A.ravel(), 0)
...
>>> a/a.sum(axis=1, keepdims=True)
array([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
[ 0. , 1. , 0. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.375 , 0.20833333, 0.125 , 0. , 0.29166667],
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
>>> (c @ b).todense() # on Python < 3.5 replace c @ b with c.dot(b)
matrix([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
[ 0. , 1. , 0. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0.375 , 0.20833333, 0.125 , 0. , 0.29166667],
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
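Condensed into a runnable sketch with a small made-up matrix, the whole left-multiplication looks like this:

```python
import numpy as np
from scipy import sparse

a = np.array([[0, 9, 0],
              [2, 0, 5],
              [0, 2, 0]], dtype=float)
b = sparse.bsr_matrix(a)

# diagonal matrix of reciprocal row sums, multiplied from the left
c = sparse.diags(1 / np.asarray(b.sum(axis=1)).ravel())
normalized = (c @ b).toarray()

assert np.allclose(normalized.sum(axis=1), 1.0)
```

Left-multiplying by a diagonal matrix scales rows; right-multiplying would scale columns instead, which is why the order matters here.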
Something funny is going on. I have no problem performing the element division. I wonder if it's a Py2 issue. I'm using Py3.
In [1022]: A=sparse.bsr_matrix([[2,4],[1,2]])
In [1023]: A
Out[1023]:
<2x2 sparse matrix of type '<class 'numpy.int32'>'
with 4 stored elements (blocksize = 2x2) in Block Sparse Row format>
In [1024]: A.A
Out[1024]:
array([[2, 4],
[1, 2]], dtype=int32)
In [1025]: A.sum(axis=1)
Out[1025]:
matrix([[6],
[3]], dtype=int32)
In [1026]: A/A.sum(axis=1)
Out[1026]:
matrix([[ 0.33333333, 0.66666667],
[ 0.33333333, 0.66666667]])
or to try the other example:
In [1027]: b=sparse.bsr_matrix([[0, 9, 0, 0, 1, 0],
...: [2, 0, 5, 0, 0, 9],
...: [0, 2, 0, 0, 0, 0],
...: [2, 0, 0, 0, 0, 0],
...: [0, 9, 5, 3, 0, 7],
...: [1, 0, 0, 8, 9, 0]])
In [1028]: b
Out[1028]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
with 14 stored elements (blocksize = 1x1) in Block Sparse Row format>
In [1029]: b.sum(axis=1)
Out[1029]:
matrix([[10],
[16],
[ 2],
[ 2],
[24],
[18]], dtype=int32)
In [1030]: b/b.sum(axis=1)
Out[1030]:
matrix([[ 0. , 0.9 , 0. , 0. , 0.1 , 0. ],
[ 0.125 , 0. , 0.3125 , 0. , 0. , 0.5625 ],
....
[ 0.05555556, 0. , 0. , 0.44444444, 0.5 , 0. ]])
The result of this sparse/dense division is also dense, whereas c*b (where c is the sparse diagonal) is sparse.
In [1039]: c*b
Out[1039]:
<6x6 sparse matrix of type '<class 'numpy.float64'>'
with 14 stored elements in Compressed Sparse Row format>
The sparse sum is a dense matrix. It is 2d, so there's no need to expand its dimensions. In fact, if I try that I get an error:
In [1031]: A/(A.sum(axis=1)[:,None])
....
ValueError: shape too large to be a matrix.
Per this message, to keep the matrix sparse, you access the data values and use the (nonzero) indices:
sums = np.asarray(A.sum(axis=1)).squeeze() # this is dense
A.data /= sums[A.nonzero()[0]]
To divide by the nonzero row mean instead of the sum, one can use:
nnz = A.getnnz(axis=1) # this is also dense
means = sums / nnz
A.data /= means[A.nonzero()[0]]
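A runnable sketch of the in-place approach, assuming a CSR matrix with no explicit zeros stored (so the row indices from A.nonzero() line up with A.data):

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[0., 9., 1.],
                                [2., 0., 6.],
                                [0., 2., 0.]]))

sums = np.asarray(A.sum(axis=1)).ravel()   # dense vector of row sums
A.data /= sums[A.nonzero()[0]]             # divide each stored value by its row sum

assert np.allclose(A.toarray().sum(axis=1), 1.0)
```

The matrix stays sparse throughout; only the stored values are rewritten, and the sparsity structure is untouched.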