Numpy increment array indexed array? [duplicate] - python

I am trying to efficiently update some elements of a numpy array A, using another array b to indicate the indexes of the elements of A to be updated. However, b can contain duplicates, which are ignored, whereas I would like them to be taken into account. I would like to avoid a for loop over b. To illustrate:
>>> A = np.arange(10).reshape(2,5)
>>> A[0, np.array([1,1,1,2])] += 1
>>> A
array([[0, 2, 3, 3, 4],
[5, 6, 7, 8, 9]])
whereas I would like the output to be:
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
Any ideas?

To correctly handle the duplicate indices, you'll need to use np.add.at instead of +=. Therefore to update the first row of A, the simplest way would probably be to do the following:
>>> np.add.at(A[0], [1,1,1,2], 1)
>>> A
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
The ufunc.at method is described in the NumPy documentation.
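As a minimal sketch of the difference, the two updates can be run side by side, each on a fresh copy of A. The variable names b, A_plus and A_at are only illustrative. The fancy-indexed += counts each distinct index once, while np.add.at accumulates every occurrence:

```python
import numpy as np

b = np.array([1, 1, 1, 2])  # index array with duplicates

# Fancy-indexed += : each distinct index is incremented only once.
A_plus = np.arange(10).reshape(2, 5)
A_plus[0, b] += 1

# np.add.at is unbuffered, so index 1 is incremented three times.
A_at = np.arange(10).reshape(2, 5)
np.add.at(A_at[0], b, 1)  # A_at[0] is a view, so A_at is updated in place

print(A_plus[0])  # [0 2 3 3 4]
print(A_at[0])    # [0 4 3 3 4]
```

Passing the view A_at[0] keeps the call one-dimensional, which matches the answer above.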

One approach is to use numpy.histogram to find out how many values there are at each index, then add the result to A:
A[0, :] += np.histogram(np.array([1,1,1,2]), bins=np.arange(A.shape[1]+1))[0]
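A closely related option, not mentioned in the answer above, is np.bincount, which counts integer occurrences directly without building bin edges. The sketch below assumes the indices are non-negative integers smaller than the row length:

```python
import numpy as np

A = np.arange(10).reshape(2, 5)
b = np.array([1, 1, 1, 2])

# Count occurrences of each column index; minlength pads the counts
# out to the row length so the shapes match.
counts = np.bincount(b, minlength=A.shape[1])
A[0] += counts

print(counts)  # [0 3 1 0 0]
print(A[0])    # [0 4 3 3 4]
```

This only covers adding a constant 1 per occurrence; for other increments np.bincount also accepts a weights argument, but then it returns floats.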

Related

Numpy Array: Slice several values at every step

I am trying to extract several values at once from an array but I can't seem to find a way to do it in a one-liner in Numpy.
Simply put, considering an array:
a = numpy.arange(10)
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
I would like to be able to extract, say, 2 values, skip the next 2, extract the 2 following values etc. This would result in:
array([0, 1, 4, 5, 8, 9])
This is an example but I am ideally looking for a way to extract x values and skip y others.
I thought this could be done with slicing, doing something like:
a[:2:2]
but it only returns 0, which is the expected behavior.
I know I could obtain the expected result by combining several slicing operations (similarly to Numpy Array Slicing) but I was wondering if I was not missing some numpy feature.
If you want to avoid creating copies and allocating new memory, you could use a sliding window view of two elements:
win = np.lib.stride_tricks.sliding_window_view(a, 2)
win is then:
array([[0, 1],
       [1, 2],
       [2, 3],
       [3, 4],
       [4, 5],
       [5, 6],
       [6, 7],
       [7, 8],
       [8, 9]])
And then only take every 4th window view:
win[::4].ravel()
array([0, 1, 4, 5, 8, 9])
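Or go directly with the more dangerous as_strided, but heed the warnings in the documentation. Strides are given in bytes, so they are derived from a.itemsize here (32 and 8 for 8-byte integers):
np.lib.stride_tricks.as_strided(a, shape=(3,2), strides=(4*a.itemsize, a.itemsize))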
You can use a modulo operator:
x = 2 # keep
y = 2 # skip
out = a[np.arange(a.shape[0])%(x+y)<x]
Output: array([0, 1, 4, 5, 8, 9])
Output with x = 2 ; y = 3:
array([0, 1, 5, 6])
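The two approaches above can be wrapped into small helpers for arbitrary x and y. This is only a sketch, and the function names keep_skip_mask and keep_skip_windows are made up for illustration. One difference worth noting: the modulo mask keeps a trailing partial group, while the window version returns only complete groups of x elements.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def keep_skip_mask(a, x, y):
    """Keep x elements, skip y, repeat; a trailing partial group is kept."""
    return a[np.arange(a.shape[0]) % (x + y) < x]

def keep_skip_windows(a, x, y):
    """Same pattern via windows; only complete groups of x are returned."""
    return sliding_window_view(a, x)[::x + y].ravel()

a = np.arange(10)
print(keep_skip_mask(a, 2, 2))     # [0 1 4 5 8 9]
print(keep_skip_windows(a, 2, 2))  # [0 1 4 5 8 9]

# With 9 elements the last group is incomplete, and the two differ.
a9 = np.arange(9)
print(keep_skip_mask(a9, 2, 2))     # [0 1 4 5 8]
print(keep_skip_windows(a9, 2, 2))  # [0 1 4 5]
```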

Apply same permutation for every row in a 2D numpy array

To permute a 1D array A I know that you can run the following code:
import numpy as np
A = np.random.permutation(A)
I have a 2D array and want to apply exactly the same permutation to every row of the array. Is there a way to tell NumPy to do that?
Generate random permutations for the number of columns in A and index into the columns of A, like so -
A[:,np.random.permutation(A.shape[1])]
Sample run -
In [100]: A
Out[100]:
array([[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]])
In [101]: A[:,np.random.permutation(A.shape[1])]
Out[101]:
array([[7, 5, 7, 4, 3],
[3, 5, 2, 0, 2],
[8, 4, 3, 8, 1]])
Actually you do not need to do this, from the documentation:
If x is a multi-dimensional array, it is only shuffled along its first
index.
So, taking Divakar's array:
a = np.array([
[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]
])
you can just do: np.random.permutation(a) and get something like:
array([[2, 5, 2, 0, 3],
[3, 5, 7, 4, 7],
[1, 4, 3, 8, 8]])
P.S. if you need to perform column permutations - just do np.random.permutation(a.T).T. Similar things apply to multi-dim arrays.
It depends on what you mean by every row.
If you want to permute all values (regardless of row and column), reshape your array to 1D, permute, then reshape back to 2D.
If you want to permute each row without moving elements between rows, you need to loop through one axis and call permutation:
for i in range(len(A)):
    A[i] = np.random.permutation(A[i])
It can probably be done more concisely, but this works.
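The three readings discussed in these answers can be put side by side. The sketch below uses the newer np.random.default_rng Generator API rather than the legacy np.random.permutation used above; the seed is there only to make the run repeatable, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the run repeatable
A = np.array([[3, 5, 7, 4, 7],
              [2, 5, 2, 0, 3],
              [1, 4, 3, 8, 8]])

# 1) Same column permutation applied to every row.
perm = rng.permutation(A.shape[1])
same_for_all_rows = A[:, perm]

# 2) Whole rows reordered; the contents of each row are untouched.
rows_shuffled = rng.permutation(A)

# 3) Each row shuffled independently, without a Python loop.
each_row_independent = rng.permuted(A, axis=1)

print(same_for_all_rows)
print(rows_shuffled)
print(each_row_independent)
```

Only the first variant applies exactly the same permutation to every row, which is what the question asks for.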

Numpy: Add rows from one matrix to another by index

In short I want to index into a matrix and add to each row.
In this example the first row (indexed by the 0) should get [1,1,1] added to it. Then the second row (indexed by the 1) should get [2, 2, 2] added to it. Finally the first row (indexed by the third 0) should get [3, 3, 3] added to it.
>>> a = np.array([np.array([1,2,3]), np.array([4,5,6])])
>>> a
array([[1, 2, 3],
[4, 5, 6]])
>>> a[np.array([0,1,0]), :] += np.array([np.array([1,1,1]), np.array([2,2,2]), np.array([3,3,3])])
Desired:
>>> a
array([[5, 6, 7],
[6, 7, 8]])
Actual:
>>> a
array([[4, 5, 6],
[6, 7, 8]])
Edit 2:
As per comments below the solution runs slowly. From a portion of the code where I'm just adding 0 to test the speed:
print(y.shape)
print(dW.shape)
np.add.at(dW, (y, slice(None)), 0)
Yields:
(49000,)
(10, 3073)
And takes about 21 seconds. Without the np.add.at line the rest of the code takes about 1 second.
This is a known behavior of NumPy, described in the documentation for ufunc.at:
For example, a[[0,0]] += 1 will only increment the first element once
because of buffering, whereas add.at(a, [0,0], 1) will increment the
first element twice.
NumPy solves the problem with np.add.at(). Example:
a = np.array([1,2,3])
np.add.at(a, [0,0], 4) # now a = array([9, 2, 3])
In this case we want this to work for a multidimensional array:
a = np.array([np.array([1,2,3]), np.array([4,5,6])])
np.add.at(a, ([0,1,0], slice(None)), np.array([[1,1,1],[2,2,2],[3,3,3]]))
The result is:
array([[5, 6, 7], [6, 7, 8]])
I guess you mistyped a 7 for a 6.
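Regarding the speed concern in Edit 2: when there are many indices but only a few distinct target rows (10 rows for 49000 indices in that example), one possible alternative is to loop over the distinct row indices and sum the matching update rows in one step. This is only a sketch, not something from the answer above; the helper name add_rows_grouped is hypothetical and it has not been benchmarked here.

```python
import numpy as np

def add_rows_grouped(target, row_idx, updates):
    """Add updates[i] to target[row_idx[i]], accumulating duplicates.

    Hypothetical helper: loops over the distinct row indices only.
    """
    row_idx = np.asarray(row_idx)
    for r in np.unique(row_idx):
        # Sum every update row aimed at row r, then add it once.
        target[r] += updates[row_idx == r].sum(axis=0)
    return target

a = np.array([[1, 2, 3], [4, 5, 6]])
updates = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
add_rows_grouped(a, [0, 1, 0], updates)
print(a)
```

On this small input it gives the same result as the np.add.at call above.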

merge arrays in python based on a similar value

I want to merge two arrays in python based on the first element in each column of each array.
For example,
A = ([[1, 2, 3],
[4, 5, 6],
[4, 6, 7],
[5, 7, 8],
[5, 9, 1]])
B = ([[1, .002],
[4, .005],
[5, .006]])
So that I get an array
C = ([[1, 2, 3, .002],
[4, 5, 6, .005],
[4, 6, 7, .005],
[5, 7, 8, .006],
[5, 9, 1, .006]])
For more clarity:
First column in A is 1, 4, 4, 5, 5 and
First column of B is 1, 4, 5
So that 1 in A matches up with 1 in B and gets .002
How would I do this in python? Any suggestions would be great.
Is it OK to modify A in place? If so:
d = dict((x[0],x[1:]) for x in B)
Now d is a dictionary in which the first column of B supplies the keys and the remaining columns supply the values.
for lst in A:
    if lst[0] in d:  # Is the first value something that we can extend?
        lst.extend(d[lst[0]])
print(A)
To do it out of place (inspired by the answer by Ashwini):
d = dict((x[0],x[1:]) for x in B)
C = [lst + d.get(lst[0],[]) for lst in A]
However, with this approach, you need to have lists in both A and B. If you have some lists and some tuples it'll fail (although it could be worked around if you needed to), but it will complicate the code slightly.
With either of these approaches, B can have an arbitrary number of columns.
As a side note on style: I would write the lists as:
A = [[1, 2, 3],
[4, 5, 6],
[4, 6, 7],
[5, 7, 8],
[5, 9, 1]]
Where I've dropped the parentheses. They make it look too much like you're putting a list in a tuple. Python's automatic line continuation happens inside parentheses (), square brackets [] or braces {}.
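For reference, the out-of-place dictionary approach from this answer can be run end to end on the question's data as follows. This is a minimal sketch assuming A and B are plain lists of lists; it uses a dict comprehension instead of dict(...) with a generator, which is equivalent.

```python
A = [[1, 2, 3],
     [4, 5, 6],
     [4, 6, 7],
     [5, 7, 8],
     [5, 9, 1]]
B = [[1, .002],
     [4, .005],
     [5, .006]]

# Map each key (first column of B) to the remaining columns of that row.
d = {row[0]: row[1:] for row in B}

# Rows of A whose key is missing from B are left unchanged.
C = [row + d.get(row[0], []) for row in A]

for row in C:
    print(row)
```

A itself is not modified, because row + ... builds a new list for each row.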
(This answer assumes these are just regular lists. If they’re NumPy arrays, you have more options.)
It looks like you want to use B as a lookup table to find values to add to each row of A.
I would start by making a dictionary out of the data in B. As it happens, B is already in just the right form to be passed to the dict() builtin:
B_dict = dict(B)
Then you just need to build C row by row.
For each row in A, row[0] is the first element, so B_dict[row[0]] is the value you want to add to the end of the row. Therefore row + [B_dict[row[0]]] is the row you want to add to C.
Here is a list comprehension that builds C from A and B_dict.
C = [row + [B_dict[row[0]]] for row in A]
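If A and B are NumPy arrays rather than lists, one possible approach (not part of the answer above) is to look up each key's position in B with np.searchsorted. The sketch assumes that B is sorted by its first column and that every key in A appears in B; neither is checked by the code.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [4, 6, 7],
              [5, 7, 8],
              [5, 9, 1]], dtype=float)
B = np.array([[1, .002],
              [4, .005],
              [5, .006]])

# Position in B of each key from A's first column
# (assumes B[:, 0] is sorted and contains every key).
pos = np.searchsorted(B[:, 0], A[:, 0])

# Append the matching value from B as a new last column.
C = np.column_stack([A, B[pos, 1]])
print(C)
```

A key that is missing from B would silently pick up a neighbouring value or raise an IndexError, so the dictionary version is safer when that can happen.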
You can convert B to a dictionary first, with the first element of each sublist as key and second one as value.
Then simply iterate over A and append the related value fetched from the dict.
In [114]: A = ([1, 2, 3],
               [4, 5, 6],
               [4, 6, 7],
               [5, 7, 8],
               [6, 9, 1])
In [115]: B = ([1, .002],
               [4, .005],
               [5, .006])
In [116]: dic = dict(B)
In [117]: [x + [dic[x[0]]] if x[0] in dic else x for x in A]
Out[117]:
[[1, 2, 3, 0.002],
 [4, 5, 6, 0.005],
 [4, 6, 7, 0.005],
 [5, 7, 8, 0.006],
 [6, 9, 1]]
Here is a solution using itertools.product() that prevents having to create a dictionary for B:
In [1]: from itertools import product
In [2]: [lst_a + lst_b[1:] for (lst_a, lst_b) in product(A, B) if lst_a[0] == lst_b[0]]
Out[2]:
[[1, 2, 3, 0.002],
[4, 5, 6, 0.005],
[4, 6, 7, 0.005],
[5, 7, 8, 0.006],
[5, 9, 1, 0.006]]
The naive, simple way:
for alist in A:
    for blist in B:
        if blist[0] == alist[0]:
            alist.extend(blist[1:])
            # alist.append(blist[1]) if B will only ever contain 2-tuples.
            break  # Remove this if you want to append more than one.
The downside here is that it's O(N^2) complexity. For most small data sets, that should be OK. If you're looking for something more comprehensive, you'll probably want to look at @mgilson's answer. Some comparison:
His response converts everything in B to a dict and performs list slicing on each element. If you have a lot of values in B, that could be expensive. This uses the existing lists (you're only looking at the first value, anyway).
Because he's using dicts, he gets O(1) lookup times (his answer also assumes that you're never going to append multiple values to the end of the values in A). That means overall, his algorithm will achieve O(N). You'll need to weigh whether the overhead of creating a dict is going to outweigh the cost of iterating over the values in B.
