I am trying to use array slicing to reverse part of a NumPy array. If my array is, for example,
a = np.array([1,2,3,4,5,6])
then I can get a slice b
b = a[::-1]
This is a view on the original array. What I would like is a view that is partially reversed, for example
1,4,3,2,5,6
I have run into performance problems with NumPy when I don't use it exactly the way it was designed to be used, so I would like to avoid "fancy" indexing if possible.
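If you don't like the off-by-one indices, reverse the slice itself and assign it back in place:
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[1:4][::-1]
>>> a
array([1, 4, 3, 2, 5, 6])
Otherwise, write the reversed slice directly with a negative step:
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[3:0:-1]
>>> a
array([1, 4, 3, 2, 5, 6])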
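To check that the reversed slice really is a view (so the assignments above never go through fancy indexing), here is a small sketch using np.shares_memory. The names rev and is_view are mine, not from the question. As far as I know, a basic slice has one constant stride per axis, so only the reversed part can be a view; the whole sequence 1,4,3,2,5,6 cannot be expressed as a single view of a.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Reversing a basic slice only changes the stride, so no data is copied.
rev = a[1:4][::-1]
is_view = np.shares_memory(a, rev)

# Writing through the view changes the original array: rev[0] is a[3].
rev[0] = 40
```

After this runs, a[3] holds 40, which shows that rev points into the same buffer as a.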
You can use a permutation matrix (arguably the most NumPy-flavored way to partially reverse an array). Note that the matrix has n × n entries, so this only suits small arrays.
import numpy as np
a = np.array([1,2,3,4,5,6])
new_order_for_index = [1,4,3,2,5,6]  # Careful: indices run from 1 to n!
# Permutation matrix: row i has a single 1 in the column of the element to pick
m = np.zeros((len(a), len(a)))
for index, new_index in enumerate(new_order_for_index):
    m[index, new_index - 1] = 1
print(np.dot(m, a))
# [1. 4. 3. 2. 5. 6.]
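For comparison, the same permutation can be written without the explicit loop, and also without any matrix at all. This is only a sketch: the names order, via_matrix and via_index are made up for illustration, and the second variant uses the integer ("fancy") indexing that the question wanted to avoid, so it is here to show the equivalence rather than as a recommendation.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
order = np.array([1, 4, 3, 2, 5, 6]) - 1  # convert 1-based positions to 0-based

# Permutation matrix: row i is the row of the identity matrix selected by order[i].
m = np.eye(len(a), dtype=a.dtype)[order]
via_matrix = m @ a

# The same permutation with an integer index array (no n x n matrix needed).
via_index = a[order]
```

Both results are a new array, not a view.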
I am very new to Python. I want to clearly understand the code below; can anyone help me?
Code:
import numpy as np
arr = np.array([[1, 2, 3, 4,99,11,22], [5, 6, 7, 8,43,54,22]])
for x in np.nditer(arr[0:,::4]):
    print(x)
My understanding:
1. This 2D array contains two 1D arrays.
2. np.nditer(arr[0:,::4]) yields values from every row, from the row at index 0 up to the last one; ::4 means that every 4th column is taken, so the printed values are 4 columns apart.
Question:
1. Is my understanding in point 2 above correct?
2. How can I get the index of each printed x? Because of the step of 4, e.g. [0:,::4], or any other step [0:,::x], I want to find out the exact index of the value being printed. But how?
Addressing your questions below
Yes, I think your understanding is correct. It might help to first print what arr[0:,::4] returns though:
iter_array = arr[0:,::4]
print(iter_array)
# [[ 1 99]
#  [ 5 43]]
The slicing keeps every 4th column of the original array (columns 0 and 4 here), for all rows. All nditer does is iterate through these values in order. (Quick FYI: arr[0:] and arr[:] are equivalent, since the starting point is 0 by default.)
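A quick way to convince yourself of both points is to compare the slices directly. This is just a sketch; same_slice, explicit and visited are names I made up, and I pass order="C" to nditer so that the row-by-row visiting order is explicit rather than left to the default.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 99, 11, 22], [5, 6, 7, 8, 43, 54, 22]])

# arr[0:, ::4] and arr[:, ::4] select exactly the same elements.
same_slice = np.array_equal(arr[0:, ::4], arr[:, ::4])

# With 7 columns and a step of 4, the slice keeps columns 0 and 4.
explicit = arr[:, [0, 4]]

# nditer just walks over the values of the already-sliced array.
visited = [int(x) for x in np.nditer(arr[:, ::4], order="C")]
```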
As you pointed out, to get the index for these you need to keep track of the slicing that you did, i.e. arr[0:, ::x]. Remember, nditer has nothing to do with how you sliced your array. I'm not sure how to best get the indices of your slicing, but this is what I came up with:
import numpy as np
ls = [
    [1, 2, 3, 4,99,11,22],
    [5, 6, 7, 8,43,54,22]
]
arr = np.array(ls)
inds = np.array([
    [(ctr1, ctr2) for ctr2, _ in enumerate(l)] for ctr1, l in enumerate(ls)
]) # same layout as arr, but each element holds its own (row, column) index
step = 4
iter_array = arr[0:,::step]
iter_inds = inds[0:,::step]
print(iter_array)
# [[ 1 99]
#  [ 5 43]]
print(iter_inds)
# [[[0 0]
#   [0 4]]
#
#  [[1 0]
#   [1 4]]]
All that I added here was an inds array. This array has elements equal to their own index. Then, when you slice both arrays in the same way, you get your indices. Hopefully this helps!
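If building a second array feels heavy, the original indices can also be computed from the slice parameters. The sketch below is one possible way, not necessarily the best one: it assumes the slice starts at column 0 and uses np.ndenumerate on the sliced array, then scales the column index by the step. The name original_positions is hypothetical.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 99, 11, 22], [5, 6, 7, 8, 43, 54, 22]])
step = 4
sliced = arr[:, ::step]

# ndenumerate yields ((row, col), value) for the sliced array; multiplying the
# column by the step maps it back to the column in the original array.
original_positions = [
    ((row, col * step), int(value))
    for (row, col), value in np.ndenumerate(sliced)
]
```

For a slice with a non-zero start, the mapping would be start + col * step instead.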
When I run the following code, Q prints the way it is supposed to (3 5 7 9, the sum of each number with the next one), but in the variable explorer it is a single integer. I want to get the result Q as an array, like
Q = [3, 5, 7, 9]
import numpy as np
A = [1, 2, 3, 4, 5]
for i in range(0,4):
    Q = np.array(A[i]+A[i+1])
    print(Q)
for i in range(0,4):
    Q = []
    Q.append(Q[i] + A[i]+A[i+1])
    print(Q)
This also doesn't work.
Currently you're just reassigning Q on each iteration, so the values are never added to any collection.
Instead, start with an empty list (or perhaps a NumPy array in your case) outside of your loop and append a value to it on each iteration.
Q is a NumPy array, but it's not what you're expecting!
It has no dimensions and holds only a single value:
>>> type(Q)
<class 'numpy.ndarray'>
>>> print(repr(Q))
array(9)
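To see the "no dimensions" part concretely, you can inspect the array's attributes. This sketch reproduces only what the last loop iteration assigns; ndim, shape and value are just local names.

```python
import numpy as np

A = [1, 2, 3, 4, 5]
Q = np.array(A[3] + A[4])  # what the last loop iteration (i = 3) assigns

ndim = Q.ndim     # number of dimensions of the array
shape = Q.shape   # an empty tuple for a zero-dimensional array
value = Q.item()  # the single Python value stored in it
```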
>>> import numpy as np
>>> A = [1, 2, 3, 4, 5]
>>> Q = np.array([], dtype=np.uint8)
>>> for i in range(4):
...     Q = np.append(Q, A[i]+A[i+1]) # reassign each time for np
...
>>> print(Q)
[3 5 7 9]
Note that np.append returns a new array, so the result has to be reassigned, while a normal Python list has an .append() method (which modifies the list in place and returns None rather than the list)
>>> l = ['a', 'b', 'c'] # start with a list of values
>>> l.append('d') # use the append method
>>> l # display resulting list
['a', 'b', 'c', 'd']
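The difference between the two kinds of append can be checked side by side. A minimal sketch, with made-up names q, q_new, items and returned:

```python
import numpy as np

q = np.array([3, 5])
q_new = np.append(q, 7)     # returns a new array; q itself is left untouched

items = [3, 5]
returned = items.append(7)  # modifies items in place and returns None
```

Because np.append copies the whole array on every call, growing an array this way inside a long loop tends to be slow; collecting into a list and converting once at the end is usually the cheaper pattern.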
If you're not forced to use a NumPy array to begin with, this can be done with a list comprehension.
The resulting list can also be made into a NumPy array afterwards:
>>> [(x + x + 1) for x in range(1, 5)]
[3, 5, 7, 9]
All together, with simplified math:
>>> np.array([x*2+3 for x in range(4)])
array([3, 5, 7, 9])
If you want to use Numpy, then use Numpy. Start with a Numpy array (one-dimensional, containing the values), which looks like this:
A = np.array([1, 2, 3, 4, 5])
(Yes, you initialize it from the list).
Or you can create that kind of patterned data using Numpy's built-in tool:
A = np.arange(1, 6) # it works similarly to the built-in `range` type,
# but it does create an actual array.
Now we can get the values to use on the left-hand and right-hand sides of the addition:
# You can slice one-dimensional Numpy arrays just like you would lists.
# With more dimensions, you can slice in each dimension.
X = A[:-1]
Y = A[1:]
And add the values together element-wise:
Q = X + Y # yes, really that simple!
And that last line is the reason you would use Numpy to solve a problem like this. Otherwise, just use a list comprehension:
A = list(range(1, 6)) # same as [1, 2, 3, 4, 5]
# Same slicing, but now we have to do more work for the addition,
# by explaining the process of pairing up the elements.
Q = [x + y for x, y in zip(A[:-1], A[1:])]
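Putting the two versions next to each other shows that they produce the same values, and that the NumPy result is a proper one-dimensional array rather than a single number. This is just a consolidated sketch of the snippets above; Q_numpy, Q_list and A_list are names I chose to keep the two apart.

```python
import numpy as np

# NumPy version: element-wise addition of two shifted slices.
A = np.arange(1, 6)
Q_numpy = A[:-1] + A[1:]

# Pure-Python version: pair up neighbours explicitly with zip.
A_list = list(range(1, 6))
Q_list = [x + y for x, y in zip(A_list[:-1], A_list[1:])]
```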
Is there a fast way to compare every element of an array against every element in a list of unique identifiers?
Using a for loop to loop through each of the unique values works but is way too slow to be usable. I have been searching for a vectorized solution but have not been successful. Any help would be greatly appreciated!
arrStart = []
startRavel = startInforce['pol_id'].ravel()
for policy in unique_policies:
    arrStart.append(np.argwhere(startRavel == policy))
Sample Input:
startRavel = [1,2,2,2,3,3]
unique_policies = [1,2,3]
Sample Output:
arrStart = [[0], [1,2,3],[4,5]]
The new array would have the same length as the unique values array but each element would be a list of all of the rows that match that unique value in the large array.
Here's a vectorized solution:
import numpy as np
startRavel = np.array([1,2,2,2,3,3])
unique_policies = np.array([1,2,3])
Sort startRavel using np.argsort.
ix = np.argsort(startRavel)
s_startRavel = startRavel[ix]
Use np.searchsorted to find the indices at which the values of unique_policies would be inserted into the sorted startRavel to maintain order:
s_ix = np.searchsorted(s_startRavel, unique_policies)
# array([0, 1, 4])
And then use np.split to split the array using the obtained indices. np.argsort is used again on s_ix to deal with non-sorted inputs:
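ix_r = np.argsort(s_ix)
ixs = np.split(ix, s_ix[ix_r][1:])
# The groups in ixs follow the sorted order of s_ix; np.argsort(ix_r) is the
# inverse permutation that restores the order of unique_policies.
# dtype=object is needed because the groups have different lengths.
np.array(ixs, dtype=object)[np.argsort(ix_r)]
# array([array([0]), array([1, 2, 3]), array([4, 5])], dtype=object)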
General solution:
Let's wrap it all up in a function:
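def ix_intersection(x, y):
    """
    Finds the indices where each unique
    value in x is found in y.
    Both x and y must be numpy arrays.

    Parameters
    ----------
    x: np.array
        Must contain unique values.
        Values in x are assumed to be in y.
    y: np.array

    Returns
    -------
    Array of arrays. Each array contains the indices where a
    value in x is found in y
    """
    ix_y = np.argsort(y)
    s = np.searchsorted(y[ix_y], x)
    ix_r = np.argsort(s)
    ixs = np.split(ix_y, s[ix_r][1:])
    # The groups in ixs follow the sorted order of s; np.argsort(ix_r) is the
    # inverse permutation that puts them back in the order of x.
    # dtype=object is needed because the groups can have different lengths.
    return np.array(ixs, dtype=object)[np.argsort(ix_r)]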
Other examples
Let's try with the following arrays:
startRavel = np.array([1,3,3,2,2,2])
unique_policies = np.array([1,2,3])
ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([1, 2])])
Another example, this time with non-sorted inputs:
startRavel = np.array([1,3,3,2,2,2,5])
unique_policies = np.array([1,2,5,3])
ix_intersection(unique_policies, startRavel)
# array([array([0]), array([3, 4, 5]), array([6]), array([1, 2])])
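As a cross-check, here is a variant of the same sort-then-search idea that I find easier to reason about. It is a sketch under my own assumptions, not the function above: group_indices is a hypothetical name, it calls np.searchsorted twice (side="left" and side="right") to get the start and end of each run, and it returns a plain list of index arrays. That way values of x that do not occur in y simply give an empty array, and x can be in any order.

```python
import numpy as np

def group_indices(x, y):
    """Hypothetical variant: for each value in x, the indices of y where it occurs."""
    order = np.argsort(y, kind="stable")  # stable keeps indices ascending per group
    sorted_y = y[order]
    starts = np.searchsorted(sorted_y, x, side="left")
    stops = np.searchsorted(sorted_y, x, side="right")
    return [order[lo:hi] for lo, hi in zip(starts, stops)]

startRavel = np.array([1, 3, 3, 2, 2, 2, 5])
unique_policies = np.array([3, 1, 2, 5, 9])  # 9 does not occur in startRavel

groups = group_indices(unique_policies, startRavel)

# Brute-force reference: one comparison pass over startRavel per policy.
expected = [np.flatnonzero(startRavel == p) for p in unique_policies]
```

The brute-force loop is the slow approach from the question; it is only used here to confirm that the vectorized grouping gives the same indices.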
Is there any way to get the indices of several elements in a NumPy array at once?
E.g.
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
I would like to find the index of each element of a in b, namely: [0,1,4].
I find the solution I am using a bit verbose:
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
c = np.zeros_like(a)
for i, aa in np.ndenumerate(a):
    c[i] = np.where(b == aa)[0]
print('c: {0}'.format(c))
Output:
c: [0 1 4]
You could use in1d and nonzero (or where for that matter):
>>> np.in1d(b, a).nonzero()[0]
array([0, 1, 4])
This works fine for your example arrays, but in general the array of returned indices does not honour the order of the values in a. This may be a problem depending on what you want to do next.
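To make the ordering caveat concrete, here is a small sketch with a that is not sorted the same way as b. It uses np.isin (the newer spelling of in1d) for the mask-based approach and compares it with a lookup that follows the order of a; the variable names are mine.

```python
import numpy as np

a = np.array([4, 1])
b = np.array([1, 2, 3, 10, 4])

# Mask-based: positions in b whose value occurs anywhere in a, in the order of b.
mask_based = np.flatnonzero(np.isin(b, a))

# Order-preserving: position in b of each element of a, in the order of a.
sorter = np.argsort(b)
in_order_of_a = sorter[np.searchsorted(b, a, sorter=sorter)]
```

Here the mask-based result starts with the position of 1, while the order-preserving result starts with the position of 4.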
In that case, a much better answer is the one @Jaime gives, using searchsorted:
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([0, 1, 4])
This returns the indices for values as they appear in a. For instance:
>>> a = np.array([1, 2, 4])
>>> b = np.array([4, 2, 3, 1])
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([3, 1, 0]) # the other method would return [0, 1, 3]
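One thing to watch with the searchsorted approach is that it assumes every value of a actually occurs in b; for a missing value it returns the position of some neighbouring element, or an out-of-range position. A possible guard, sketched under the assumption that -1 is an acceptable "not found" marker (that convention and the names found and result are mine):

```python
import numpy as np

a = np.array([1, 2, 4, 7])  # 7 is not in b
b = np.array([4, 2, 3, 1])

sorter = np.argsort(b)
pos = np.searchsorted(b, a, sorter=sorter)
pos = np.clip(pos, 0, len(b) - 1)  # keep positions valid for values larger than max(b)
idx = sorter[pos]

# A candidate index is only a real match if b holds the requested value there.
found = b[idx] == a
result = np.where(found, idx, -1)
```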
This is a simple one-liner using the numpy-indexed package (disclaimer: I am its author):
import numpy_indexed as npi
idx = npi.indices(b, a)
The implementation is fully vectorized, and it gives you control over the handling of missing values. Moreover, it works for nd-arrays as well (for instance, finding the indices of rows of a in b).
All of the solutions here recommend using a linear search. You can use np.argsort and np.searchsorted to speed things up dramatically for large arrays:
sorter = b.argsort()
i = sorter[np.searchsorted(b, a, sorter=sorter)]
For an order-agnostic solution, you can use np.flatnonzero with np.isin (v 1.13+). Note that b is the array being tested, so the result contains positions in b:
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
res = np.flatnonzero(np.isin(b, a)) # NumPy v1.13+
# res = np.flatnonzero(np.in1d(b, a)) # earlier versions
# array([0, 1, 4])
There are a bunch of approaches for getting the index of multiple items at once mentioned in passing in answers to this related question: "Is there a NumPy function to return the first index of something in an array?" The wide variety and creativity of the answers suggests there is no single best practice, so if your code above works and is easy to understand, I'd say keep it.
I personally found this approach to be both performant and easy to read: https://stackoverflow.com/a/23994923/3823857
Adapting it for your example:
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
indices = [b_list.index(x) for x in a]
vals_at_indices = b_array[indices]
I personally like adding a little bit of error handling in case a value in a does not exist in b.
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
b_set = set(b_list)
indices = [b_list.index(x) if x in b_set else np.nan for x in a]
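# Indexing only works if every value was found; a NaN entry in indices
# would raise an IndexError here.
vals_at_indices = b_array[indices]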
For my use case, it's pretty fast, since it relies on parts of Python that are fast (list comprehensions, .index(), sets, numpy indexing). Would still love to see something that's a NumPy equivalent to VLOOKUP, or even a Pandas merge. But this seems to work for now.
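If b is large, calling .index() once per element means one linear scan per lookup. A possible alternative, sketched here with plain Python only, is to build a value-to-position dictionary once and look each element up in it. The name first_index and the choice of -1 as the "missing" marker are my assumptions, not part of the approach above.

```python
import numpy as np

a = np.array([1, 2, 4, 7])  # 7 is not in b_list
b_list = [1, 2, 3, 10, 4]

# Map each value to the first position where it occurs in b_list.
first_index = {}
for i, value in enumerate(b_list):
    first_index.setdefault(value, i)

# Look up every element of a; -1 marks values that are absent from b_list.
indices = [first_index.get(int(x), -1) for x in a]
```

Using an integer marker instead of np.nan keeps the list usable for later filtering, although -1 must be removed before indexing, since NumPy would interpret it as "last element".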
array = numpy.array([1,2,3,4,5,6,7,8,9,10])
array[-1:3:1]
>> []
I want this array indexing to return something like this:
[10,1,2,3]
Use np.roll to:
Roll array elements along a given axis. Elements that roll beyond the last position are re-introduced at the first.
>>> np.roll(array, 1)[:4]
array([10, 1, 2, 3])
np.roll lets you wrap an array, which might be useful:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = np.roll(a,1)[0:4]
results in
>>> b
array([10, 1, 2, 3])
As one of the answers mentioned, rolling the array makes a copy of the whole array which can be memory consuming for large arrays. So just another way of doing this without converting to list is:
np.concatenate([array[-1:],array[:3]])
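The copy behaviour mentioned above can be checked with np.shares_memory. A small sketch (rolled, roll_is_copy and wrapped are illustrative names):

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# np.roll builds a new array of the same size as the input.
rolled = np.roll(array, 1)
roll_is_copy = not np.shares_memory(array, rolled)

# Concatenating two small slices only allocates the four output elements.
wrapped = np.concatenate([array[-1:], array[:3]])
```

Both give the same four values; the difference is only how much memory is allocated along the way.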
Use np.r_:
>>> import numpy as np
>>> arr = np.arange(1, 11)
>>> arr[np.r_[-1:3]]
array([10, 1, 2, 3])
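Negative indices like the ones np.r_ produces only help with wrapping at the front of the array. If the window can also run past the end, one option (my suggestion, not part of the answer above) is np.take with mode="wrap", which wraps out-of-range indices in both directions:

```python
import numpy as np

arr = np.arange(1, 11)

# Window starting one element before the beginning of the array.
wrapped_front = np.take(arr, np.arange(-1, 3), mode="wrap")

# Window running two elements past the end of the array.
wrapped_back = np.take(arr, np.arange(8, 12), mode="wrap")
```

Like integer-array indexing, this returns a copy of the selected elements, not a view.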
The simplest solution would be to convert the two slices to lists first, join them, and convert the result back to an array.
As such:
>>> numpy.array(list(array[-1:]) + list(array[:3]))
array([10, 1, 2, 3])
This way you can choose which indices to start and end at, without creating a duplicate of the entire array.