ndarray view into slices/indices - python

With an ndarray.view, one can do:
import numpy as np
a = np.arange(6)
b = a.view()
b[...] = [5, 5, 5, 5, 5, 5]
a and b are now both [5, 5, 5, 5, 5, 5].
Now, can I do the same with slicing/indexing? So that the view does not show the full array, but just a slice? Something like:
import numpy as np
a = np.arange(6)
idx = [0, 2, 4]
b = a[idx] # please just return a view into `a` here
b[...] = [5, 5, 5]
Now a is of course still [0, 1, 2, 3, 4, 5], but I'd like it to be [5, 1, 5, 3, 5, 5].
This would be very useful when mapping between different arrays.

As mentioned in the comments, we can get views when dealing with patterned strides for indexing.
Let's take a look at a few cases.
1) Case #1: Starting index = 0 and with a stride of 2:
In [129]: a = np.arange(6) # Input array
In [130]: idx = [0,2,4] # Simulating these indices for indexing
In [131]: b = a[::2] # Get view
In [132]: b[...] = [5, 5, 5] # Assign values
In [133]: a
Out[133]: array([5, 1, 5, 3, 5, 5]) # Verify
2) Case #2: Starting index = 1 and with a stride of 2:
In [134]: a = np.arange(6) # Input array
In [135]: idx = [1,3,5] # Simulating these indices for indexing
In [136]: b = a[1::2] # Get view
In [137]: b[...] = [5, 5, 5] # Assign values
In [138]: a
Out[138]: array([0, 5, 2, 5, 4, 5]) # Verify
This method is extensible to multi-dimensional arrays.
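To make the distinction explicit, here is a small sketch (variable names are only illustrative) showing that a slice shares memory with `a`, while indexing with a list produces a copy. If the indices have no regular stride, assigning through the index expression itself, `a[idx] = ...`, still writes into `a`. The last part shows the same slicing idea on a 2D array.

```python
import numpy as np

a = np.arange(6)
idx = [0, 2, 4]

view = a[::2]     # basic slicing: a view that shares memory with `a`
copy = a[idx]     # indexing with a list (fancy indexing): an independent copy
print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

# Assigning through the index expression writes into `a` directly.
a[idx] = [5, 5, 5]
print(a)  # [5 1 5 3 5 5]

# Strided slices are views in more than one dimension as well.
m = np.arange(12).reshape(3, 4)
sub = m[::2, 1::2]   # rows 0 and 2, columns 1 and 3
sub[...] = -1
print(m)
```

So for arbitrary index lists, keeping `idx` around and writing `a[idx] = values` is probably the closest thing to a "view".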

Related

Permute within a row in python

I have two arrays that are paired meaning that element 1 in both arrays needs to have the same index. I want to permute these elements. Currently, I tried np.random.permutation but that does not seem to get the right answer.
For example, if the two arrays are [1,2,3] and [4,5,6], one possible permutation would be [4,2,3] and [1,5,6].
You can stack your arrays and choose a random column for each row using choice.
Setup
a = np.array([1,2,3])
b = np.array([4,5,6])
v = np.column_stack((a,b))
# array([[1, 4],
# [2, 5],
# [3, 6]])
np.random.seed(1)
choices = np.random.choice(v.shape[1], v.shape[0])
# array([1, 1, 0])
Finally, to index:
v[np.arange(v.shape[0]), choices]
array([4, 5, 3])
a=np.array([1, 2, 3])
b=np.array([4, 5, 6])
random_arr=np.random.choice([0, 1], size=(len(a),)) # Generate a random array of 0s and 1s, let's say arr([0,0,1])
a1=random_arr*a + (1-random_arr)*b # arr([0,0,1])*arr([1,2,3]) + arr([1,1,0])*arr([4,5,6]) = arr([4, 5, 3])
b1=random_arr*b + (1-random_arr)*a # arr([0,0,1])*arr([4,5,6]) + arr([1,1,0])*arr([1,2,3]) = arr([1, 2, 6])
a=a1
b=b1
Run 1 of the code above:
a
Out[188]: array([4, 2, 6])
b
Out[189]: array([1, 5, 3])
Run 2:
a
Out[191]: array([4, 5, 3])
b
Out[192]: array([1, 2, 6])
You can use np.choose:
x = np.array([1,2,3])
y = np.array([4,5,6])
toss=np.random.randint(0,2,len(x))
print(np.choose(toss,[x,y]))
print(np.choose(toss,[y,x]))
# one possible output:
#[1 5 6]
#[4 2 3]
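The answers above all boil down to drawing one random bit per position and swapping the pair where the bit is set. A minimal sketch of that idea with np.where, assuming a seeded np.random.default_rng generator (the names swap, a_new and b_new are made up here):

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make runs repeatable
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# One coin flip per position: True means "swap this pair".
swap = rng.integers(0, 2, size=a.shape).astype(bool)

a_new = np.where(swap, b, a)   # take b where swapping, otherwise keep a
b_new = np.where(swap, a, b)   # the complementary choice

print(a_new, b_new)
```

Each column of the result still holds the same two values as before, only possibly exchanged, which is what keeps the arrays paired.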

Python: Creating list of subarrays

I have a massive array, but for illustration I am using an array of size 14. I have another list which contains 2, 3, 3, 6. How do I efficiently, without a for loop, create a list of new arrays such that:
import numpy as np
A = np.array([1,2,4,5,7,1,2,4,5,7,2,8,12,3]) # array with 1 axis
subArraysizes = np.array([2, 3, 3, 6]) # sums to number of elements in A
# desired result:
B = [[1, 2],
     [4, 5, 7],
     [1, 2, 4],
     [5, 7, 2, 8, 12, 3]]
i.e. select first 2 elements from A store it in B, select next 3 elements of A store it in B and so on in the order it appears in A.
You can use np.split -
B = np.split(A,subArraysizes.cumsum())[:-1]
Sample run -
In [75]: A
Out[75]: array([ 1, 2, 4, 5, 7, 1, 2, 4, 5, 7, 2, 8, 12, 3])
In [76]: subArraysizes
Out[76]: array([2, 3, 3, 6])
In [77]: np.split(A,subArraysizes.cumsum())[:-1]
Out[77]:
[array([1, 2]),
array([4, 5, 7]),
array([1, 2, 4]),
array([ 5, 7, 2, 8, 12, 3])]
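An equivalent variant, as a sketch: since the sizes sum to len(A), the last cumulative sum is the end of the array, so it can be dropped from the split points instead of dropping the trailing empty piece from the result.

```python
import numpy as np

A = np.array([1, 2, 4, 5, 7, 1, 2, 4, 5, 7, 2, 8, 12, 3])
subArraysizes = np.array([2, 3, 3, 6])

# Split points are the running totals, without the final one (== len(A)).
split_points = np.cumsum(subArraysizes)[:-1]   # [2 5 8]
B = np.split(A, split_points)

for piece in B:
    print(piece)
```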

How can I pad and/or truncate a vector to a specified length using numpy?

I have a couple of lists:
a = [1,2,3]
b = [1,2,3,4,5,6]
which are of variable length.
I want to return a vector of length five, such that if the input list length is < 5 then it will be padded with zeros on the right, and if it is > 5, then it will be truncated at the 5th element.
For example, input a would return np.array([1,2,3,0,0]), and input b would return np.array([1,2,3,4,5]).
I feel like I ought to be able to use np.pad, but I can't seem to follow the documentation.
This might be slow or fast, I am not sure, however it works for your purpose.
In [22]: pad = lambda a,i : a[0:i] if len(a) > i else a + [0] * (i-len(a))
In [23]: pad([1,2,3], 5)
Out[23]: [1, 2, 3, 0, 0]
In [24]: pad([1,2,3,4,5,6,7], 5)
Out[24]: [1, 2, 3, 4, 5]
np.pad is overkill, better for adding a border all around a 2d image than adding some zeros to a list.
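For completeness, np.pad can handle the padding half if the input is truncated first. A minimal sketch; the helper name pad_or_truncate is made up for illustration:

```python
import numpy as np

def pad_or_truncate(x, n=5):
    """Hypothetical helper: return a 1-D array with exactly n elements."""
    x = np.asarray(x)[:n]        # truncate if longer than n
    missing = n - x.shape[0]     # number of zeros still needed on the right
    # pad_width (0, missing): nothing on the left, `missing` zeros on the right
    return np.pad(x, (0, missing), mode="constant", constant_values=0)

print(pad_or_truncate([1, 2, 3]))            # [1 2 3 0 0]
print(pad_or_truncate([1, 2, 3, 4, 5, 6]))   # [1 2 3 4 5]
```

The pad_width argument is a (before, after) pair per axis, which is the part of the documentation that is easy to trip over.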
I like the zip_longest approach, especially if the inputs are lists and don't need to be arrays. It's probably the closest you'll find to code that operates on all the lists at once in compiled code.
a, b = zip(*itertools.zip_longest(a, b, fillvalue=0))
is a version that does not use np.array at all (saving some array overhead). In Python 2 the function is called itertools.izip_longest.
But by itself it does not truncate. It still needs something like [x[:5] for x in (a,b)].
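Putting the two steps together in Python 3 might look like the sketch below. Note the limitation: zip_longest only pads up to the longest input, so if every list is shorter than n the results will be shorter than n too.

```python
from itertools import zip_longest

a = [1, 2, 3]
b = [1, 2, 3, 4, 5, 6]
n = 5

# Pad every list with zeros to the length of the longest one ...
padded = zip(*zip_longest(a, b, fillvalue=0))
# ... then cut each result down to n elements.
a5, b5 = [list(x[:n]) for x in padded]

print(a5)  # [1, 2, 3, 0, 0]
print(b5)  # [1, 2, 3, 4, 5]
```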
Here's my variation on all_ms function, working with a simple list or 1d array:
def foo_1d(x, n=5):
    x = np.asarray(x)
    assert x.ndim==1
    s = np.min([x.shape[0], n])
    ret = np.zeros((n,), dtype=x.dtype)
    ret[:s] = x[:s]
    return ret
In [772]: [foo_1d(x) for x in [[1,2,3], [1,2,3,4,5], np.arange(10)[::-1]]]
Out[772]: [array([1, 2, 3, 0, 0]), array([1, 2, 3, 4, 5]), array([9, 8, 7, 6, 5])]
One way or other the numpy solutions do the same thing - construct a blank array of the desired shape, and then fill it with the relevant values from the original.
One other detail - when truncating the solution could, in theory, return a view instead of a copy. But that requires handling that case separately from a pad case.
If the desired output is a list of equal-length arrays, it may be worthwhile collecting them in a 2d array.
In [792]: def foo1(x, out):
     ...:     x = np.asarray(x)
     ...:     s = np.min((x.shape[0], out.shape[0]))
     ...:     out[:s] = x[:s]
In [794]: lists = [[1,2,3], [1,2,3,4,5], np.arange(10)[::-1], []]
In [795]: ret=np.zeros((len(lists),5),int)
In [796]: for i,xx in enumerate(lists):
     ...:     foo1(xx, ret[i,:])
In [797]: ret
Out[797]:
array([[1, 2, 3, 0, 0],
[1, 2, 3, 4, 5],
[9, 8, 7, 6, 5],
[0, 0, 0, 0, 0]])
Pure python version, where a is a python list (not a numpy array): a[:n] + [0,]*(n-len(a)).
For example:
In [42]: n = 5
In [43]: a = [1, 2, 3]
In [44]: a[:n] + [0,]*(n - len(a))
Out[44]: [1, 2, 3, 0, 0]
In [45]: a = [1, 2, 3, 4]
In [46]: a[:n] + [0,]*(n - len(a))
Out[46]: [1, 2, 3, 4, 0]
In [47]: a = [1, 2, 3, 4, 5]
In [48]: a[:n] + [0,]*(n - len(a))
Out[48]: [1, 2, 3, 4, 5]
In [49]: a = [1, 2, 3, 4, 5, 6]
In [50]: a[:n] + [0,]*(n - len(a))
Out[50]: [1, 2, 3, 4, 5]
Function using numpy:
In [121]: def tosize(a, n):
.....: a = np.asarray(a)
.....: x = np.zeros(n, dtype=a.dtype)
.....: m = min(n, len(a))
.....: x[:m] = a[:m]
.....: return x
.....:
In [122]: tosize([1, 2, 3], 5)
Out[122]: array([1, 2, 3, 0, 0])
In [123]: tosize([1, 2, 3, 4], 5)
Out[123]: array([1, 2, 3, 4, 0])
In [124]: tosize([1, 2, 3, 4, 5], 5)
Out[124]: array([1, 2, 3, 4, 5])
In [125]: tosize([1, 2, 3, 4, 5, 6], 5)
Out[125]: array([1, 2, 3, 4, 5])

Finding differences between all values in a List

I want to find the differences between all values in a numpy array and append it to a new list.
Example: a = [1,4,2,6]
result : newlist= [3,1,5,3,2,2,1,2,4,5,2,4]
i.e., for each value i of a, determine the difference between i and each of the other values in the list.
At this point I have been unable to find a solution.
You can do this:
a = [1,4,2,6]
newlist = [abs(i-j) for i in a for j in a if i != j]
Output:
print(newlist)
[3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4]
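One caveat with the comprehension above: `if i != j` compares values, not positions, so pairs of equal values at different positions are skipped as well. If the list can contain duplicates and those zero differences should be kept, comparing positions is a possible fix. A small sketch with a made-up input that contains a repeated value:

```python
a = [1, 4, 2, 1]   # note the repeated 1

# Compares values: also drops the (first 1, last 1) pairs.
by_value = [abs(x - y) for x in a for y in a if x != y]

# Compares positions: drops only the self-differences.
by_index = [abs(x - y)
            for i, x in enumerate(a)
            for j, y in enumerate(a)
            if i != j]

print(len(by_value))  # 10
print(len(by_index))  # 12
```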
I believe what you are trying to do is to calculate absolute differences between elements of the input list, but excluding the self-differences. So, with that idea, this could be one vectorized approach also known as array programming -
# Input list
a = [1,4,2,6]
# Convert input list to a numpy array
arr = np.array(a)
# Calculate absolute differences between each element
# against all elements to give us a 2D array
sub_arr = np.abs(arr[:,None] - arr)
# Get diagonal indices for the 2D array
N = arr.size
rem_idx = np.arange(N)*(N+1)
# Remove the diagonal elements for the final output
out = np.delete(sub_arr,rem_idx)
Sample run to show the outputs at each step -
In [60]: a
Out[60]: [1, 4, 2, 6]
In [61]: arr
Out[61]: array([1, 4, 2, 6])
In [62]: sub_arr
Out[62]:
array([[0, 3, 1, 5],
[3, 0, 2, 2],
[1, 2, 0, 4],
[5, 2, 4, 0]])
In [63]: rem_idx
Out[63]: array([ 0, 5, 10, 15])
In [64]: out
Out[64]: array([3, 1, 5, 3, 2, 2, 1, 2, 4, 5, 2, 4])
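As an alternative to computing flat diagonal indices for np.delete, the off-diagonal entries can be selected with a boolean mask. A sketch, assuming the same input as above (the name off_diag is only illustrative):

```python
import numpy as np

arr = np.array([1, 4, 2, 6])
sub_arr = np.abs(arr[:, None] - arr)   # all pairwise absolute differences

# True everywhere except on the diagonal (the self-differences).
off_diag = ~np.eye(arr.size, dtype=bool)
out = sub_arr[off_diag]

print(out)  # [3 1 5 3 2 2 1 2 4 5 2 4]
```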

NumPy selecting specific column index per row by using a list of indexes

I'm struggling to select the specific columns per row of a NumPy matrix.
Suppose I have the following matrix which I would call X:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
I also have a list of column indexes per every row which I would call Y:
[1, 0, 2]
I need to get the values:
[2]
[4]
[9]
Instead of a list with indexes Y, I can also produce a matrix with the same shape as X where every entry is a bool (or an int, 0 or 1) indicating whether this is the required column.
[0, 1, 0]
[1, 0, 0]
[0, 0, 1]
I know this can be done with iterating over the array and selecting the column values I need. However, this will be executed frequently on big arrays of data and that's why it has to run as fast as it can.
I was thus wondering if there is a better solution?
If you've got a boolean array you can do direct selection based on that like so:
>>> a = np.array([True, True, True, False, False])
>>> b = np.array([1,2,3,4,5])
>>> b[a]
array([1, 2, 3])
To go along with your initial example you could do the following:
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b = np.array([[False,True,False],[True,False,False],[False,False,True]])
>>> a[b]
array([2, 4, 9])
You can also add in an arange and do direct selection on that, though depending on how you're generating your boolean array and what your code looks like YMMV.
>>> a = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> a[np.arange(len(a)), [1,0,2]]
array([2, 4, 9])
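If the selector matrix holds ints (0/1) rather than bools, as the question allows, it has to be converted first, because an integer array is interpreted as indices rather than as a mask. A sketch of two ways to handle that, assuming exactly one 1 per row (variable names are illustrative):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
onehot = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # int, not bool

# Option 1: cast to bool and use it as a mask.
by_mask = X[onehot.astype(bool)]

# Option 2: recover the column index of the single 1 in each row.
cols = onehot.argmax(axis=1)
by_index = X[np.arange(X.shape[0]), cols]

print(by_mask)   # [2 4 9]
print(cols)      # [1 0 2]
print(by_index)  # [2 4 9]
```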
You can do something like this:
In [7]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [8]: lst = [1, 0, 2]
In [9]: a[np.arange(len(a)), lst]
Out[9]: array([2, 4, 9])
More on indexing multi-dimensional arrays: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays
Recent numpy versions have added a take_along_axis (and put_along_axis) that does this indexing cleanly.
In [101]: a = np.arange(1,10).reshape(3,3)
In [102]: b = np.array([1,0,2])
In [103]: np.take_along_axis(a, b[:,None], axis=1)
Out[103]:
array([[2],
[4],
[9]])
It operates in the same way as:
In [104]: a[np.arange(3), b]
Out[104]: array([2, 4, 9])
but with different axis handling. It's especially aimed at applying the results of argsort and argmax.
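To illustrate the argsort/argmax use case, here is a short sketch with a made-up array: the index arrays returned by those functions can be passed straight to take_along_axis, adding a trailing axis where the result has one dimension fewer.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

# argsort keeps the shape of `a`, so it can be used as-is.
order = np.argsort(a, axis=1)
sorted_a = np.take_along_axis(a, order, axis=1)

# argmax drops the axis, so add it back before indexing.
top = np.argmax(a, axis=1)
maxima = np.take_along_axis(a, top[:, None], axis=1).squeeze(axis=1)

print(sorted_a)  # [[1 2 3]
                 #  [7 8 9]]
print(maxima)    # [3 9]
```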
A simple way might look like:
In [1]: a = np.array([[1, 2, 3],
...: [4, 5, 6],
...: [7, 8, 9]])
In [2]: y = [1, 0, 2] #list of indices we want to select from matrix 'a'
range(a.shape[0]) yields the row indices 0, 1, 2 (np.arange(a.shape[0]) would give array([0, 1, 2]))
In [3]: a[range(a.shape[0]), y] #we're selecting y indices from every row
Out[3]: array([2, 4, 9])
You can do it by using iterator. Like this:
np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
Time:
N = 1000
X = np.zeros(shape=(N, N))
Y = np.arange(N)
##Aशwini चhaudhary
%timeit X[np.arange(len(X)), Y]
10000 loops, best of 3: 30.7 us per loop
#mine
%timeit np.fromiter((row[index] for row, index in zip(X, Y)), dtype=int)
1000 loops, best of 3: 1.15 ms per loop
#mine
%timeit np.diag(X.T[Y])
10 loops, best of 3: 20.8 ms per loop
Another clever way is to first transpose the array and index it thereafter. Finally, take the diagonal; it's always the right answer.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])
np.diag(X.T[Y])
Step by step:
Original arrays:
>>> X
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
>>> Y
array([1, 0, 2, 2])
Transpose to make it possible to index it right.
>>> X.T
array([[ 1, 4, 7, 10],
[ 2, 5, 8, 11],
[ 3, 6, 9, 12]])
Get rows in the Y order.
>>> X.T[Y]
array([[ 2, 5, 8, 11],
[ 1, 4, 7, 10],
[ 3, 6, 9, 12],
[ 3, 6, 9, 12]])
The diagonal should now become clear.
>>> np.diag(X.T[Y])
array([ 2, 4, 9, 12])
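A quick check, as a sketch, that the diagonal trick agrees with direct index-pair selection. It also shows why the timing above is so much worse: X.T[Y] builds a full len(Y) by len(X) intermediate array, of which only the diagonal is kept.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Y = np.array([1, 0, 2, 2])

intermediate = X.T[Y]                 # shape (len(Y), len(X))
via_diag = np.diag(intermediate)      # keeps only the diagonal entries
direct = X[np.arange(len(X)), Y]      # picks the same entries directly

print(intermediate.shape)  # (4, 4)
print(via_diag)            # [ 2  4  9 12]
print(direct)              # [ 2  4  9 12]
```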
The answer from hpaulj using take_along_axis should be the accepted one.
Here is a derived version with an N-dim index array:
>>> arr = np.arange(20).reshape((2,2,5))
>>> idx = np.array([[1,0],[2,4]])
>>> np.take_along_axis(arr, idx[...,None], axis=-1)
array([[[ 1],
[ 5]],
[[12],
[19]]])
Note that the selection operation is ignorant about the shapes. I used this to refine a possibly vector-valued argmax result from histogram by fitting parabolas:
def interpol(arr):
    i = np.argmax(arr, axis=-1)
    a = lambda Δ: np.squeeze(np.take_along_axis(arr, i[...,None]+Δ, axis=-1), axis=-1)
    frac = .5*(a(1) - a(-1)) / (2*a(0) - a(-1) - a(1)) # |frac| < 0.5
    return i + frac
Note the squeeze to remove the dimension of size 1 resulting in the same shape of i and frac, the integer and fractional part of the peak position.
I'm quite sure that it is possible to avoid the lambda, but would the interpolation formula still look nice?
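One way to drop the lambda is a small nested function, which arguably keeps the formula just as readable. The sketch below also runs it on made-up data: two exact parabolas sampled at integer positions, for which the three-point fit should recover the true peak. It assumes the maximum is not in the first or last bin, since the neighbours at i-1 and i+1 must exist.

```python
import numpy as np

def interpol(arr):
    i = np.argmax(arr, axis=-1)

    def at(offset):
        # Value at (peak index + offset) along the last axis.
        picked = np.take_along_axis(arr, i[..., None] + offset, axis=-1)
        return np.squeeze(picked, axis=-1)

    frac = 0.5 * (at(1) - at(-1)) / (2 * at(0) - at(-1) - at(1))
    return i + frac

x = np.arange(6)
true_peaks = np.array([2.3, 3.6])                    # made-up test values
curves = -(x[None, :] - true_peaks[:, None]) ** 2    # shape (2, 6)

print(interpol(curves))   # close to [2.3 3.6]
```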
