Numpy/Pytorch generate mask based on varying index values - python

I've been trying to do the following as a batch operation in numpy or torch (no looping). Is this possible?
Suppose I have:
indices: [[3],[2]] (2x1)
output: [[0,0,0,0,1], [0,0,0,1,1]] (2xfixed_num) where fixed_num is 5 here
Essentially, I want to make indices up to that index value 0 and the rest 1 for each element.

Ok, so I actually assume this is some sort of HW assignment - but maybe it's not, either way it was fun to do, here's a solution for your specific example, maybe you can generalize it to any shape array:
def fill_ones(arr, idxs):
x = np.where(np.arange(arr.shape[1]) <= idxs[0], 0, 1) # This is the important logic.
y = np.where(np.arange(arr.shape[1]) <= idxs[1], 0, 1)
return np.array([x, y])
So where the comment is located - we use a condition to assign 0 to all indices before some index value, and 1 after such value. This actually creates a new array as opposed to a mask that we can use to the original array - so maybe it's "dirtier".
Also, I suspect it's possible to generalize to arrays more than 2 dimensions, but the solution i'm imagining now uses a for-loop. Hope this helps!
Note: arr is just a numpy array of whatever shape you want the output to be and idxs is a tuple of what indices past you want to the array elements to turn into 1's - hope that is clear

Related

expand numpy array in n dimensions

I am trying to 'expand' an array (generate a new array with proportionally more elements in all dimensions). I have an array with known numbers (let's call it X) and I want to make it j times bigger (in each dimension).
So far I generated a new array of zeros with more elements, then I used broadcasting to insert the original numbers in the new array (at fixed intervals).
Finally, I used linspace to fill the gaps, but this part is actually not directly relevant to the question.
The code I used (for n=3) is:
import numpy as np
new_shape = (np.array(X.shape) - 1 ) * ratio + 1
new_array = np.zeros(shape=new_shape)
new_array[::ratio,::ratio,::ratio] = X
My problem is that this is not general, I would have to modify the third line based on ndim. Is there a way to use such broadcasting for any number of dimensions in my array?
Edit: to be more precise, the third line would have to be:
new_array[::ratio,::ratio] = X
if ndim=2
or
new_array[::ratio,::ratio,::ratio,::ratio] = X
if ndim=4
etc. etc. I want to avoid having to write code for each case of ndim
p.s. If there is a better tool to do the entire process (such as 'inner-padding' that I am not aware of, I will be happy to learn about it).
Thank you
array = array[..., np.newaxis] will add another dimension
This article might help
You can use slice notation -
slicer = tuple(slice(None,None,ratio) for i in range(X.ndim))
new_array[slicer] = X
Build the slicing tuple manually. ::ratio is equivalent to slice(None, None, ratio):
new_array[(slice(None, None, ratio),)*new_array.ndim] = ...

Access elements of a Matrix by a list of indices in Python to apply a max(val, 0.5) to each value without a for loop

I know how to access elements in a vector by indices doing:
test = numpy.array([1,2,3,4,5,6])
indices = list([1,3,5])
print(test[indices])
which gives the correct answer : [2 4 6]
But I am trying to do the same thing using a 2D matrix, something like:
currentGrid = numpy.array( [[0, 0.1],
[0.9, 0.9],
[0.1, 0.1]])
indices = list([(0,0),(1,1)])
print(currentGrid[indices])
this should display me "[0.0 0.9]" for the value at (0,0) and the one at (1,1) in the matrix. But instead it displays "[ 0.1 0.1]". Also if I try to use 3 indices with :
indices = list([(0,0),(1,1),(0,2)])
I now get the following error:
Traceback (most recent call last):
File "main.py", line 43, in <module>
print(currentGrid[indices])
IndexError: too many indices for array
I ultimately need to apply a simple max() operation on all the elements at these indices and need the fastest way to do that for optimization purposes.
What am I doing wrong ? How can I access specific elements in a matrix to do some operation on them in a very efficient way (not using list comprehension nor a loop).
The problem is the arrangement of the indices you're passing to the array. If your array is two-dimensional, your indices must be two lists, one containing the vertical indices and the other one the horizontal ones. For instance:
idx_i, idx_j = zip(*[(0, 0), (1, 1), (0, 2)])
print currentGrid[idx_j, idx_i]
# [0.0, 0.9, 0.1]
Note that the first element when indexing arrays is the last dimension, e.g.: (y, x). I assume you defined yours as (x, y) otherwise you'll get an IndexError
There are already some great answers to your problem. Here just a quick and dirty solution for your particular code:
for i in indices:
print(currentGrid[i[0],i[1]])
Edit:
If you do not want to use a for loop you need to do the following:
Assume you have 3 values of your 2D-matrix (with the dimensions x1 and x2 that you want to access. The values have the "coordinates"(indices) V1(x11|x21), V2(x12|x22), V3(x13|x23). Then, for each dimension of your matrix (2 in your case) you need to create a list with the indices for this dimension of your points. In this example, you would create one list with the x1 indices: [x11,x12,x13] and one list with the x2 indices of your points: [x21,x22,x23]. Then you combine these lists and use them as index for the matrix:
indices = [[x11,x12,x13],[x21,x22,x23]]
or how you write it:
indices = list([(x11,x12,x13),(x21,x22,x23)])
Now with the points that you used ((0,0),(1,1),(2,0)) - please note you need to use (2,0) instead of (0,2), because it would be out of range otherwise:
indices = list([(0,1,2),(0,1,0)])
print(currentGrid[indices])
This will give you 0, 0.9, 0.1. And on this list you can then apply the max() command if you like (just to consider your whole question):
maxValue = max(currentGrid[indices])
Edit2:
Here an example how you can transform your original index list to get it into the correct shape:
originalIndices = [(0,0),(1,1),(2,0)]
x1 = []
x2 = []
for i in originalIndices:
x1.append(i[0])
x2.append(i[1])
newIndices = [x1,x2]
print(currentGrid[newIndices])
Edit3:
I don't know if you can apply max(x,0.5) to a numpy array with using a loop. But you could use Pandas instead. You can cast your list into a pandas Series and then apply a lambda function:
import pandas as pd
maxValues = pd.Series(currentGrid[newIndices]).apply(lambda x: max(x,0.5))
This will give you a pandas array containing 0.5,0.9,0.5, which you can simply cast back to a list maxValues = list(maxValues).
Just one note: In the background you will always have some kind of loop running, also with this command. I doubt, that you will get much better performance by this. If you really want to boost performance, then use a for loop, together with numba (you simply need to add a decorator to your function) and execute it in parallel. Or you can use the multiprocessing library and the Pool function, see here. Just to give you some inspiration.
Edit4:
Accidentally I saw this page today, which allows to do exactly what you want with Numpy. The solution (considerin the newIndices vector from my Edit2) to your problem is:
maxfunction = numpy.vectorize(lambda i: max(i,0.5))
print(maxfunction(currentGrid[newIndices]))
2D indices have to be accessed like this:
print(currentGrid[indices[:,0], indices[:,1]])
The row indices and the column indices are to be passed separately as lists.

How to replace values in python numpy array in multiple dimensions using a mask?

I would like to replace values in a numpy array using boolean logic in dimensions >= 3. All the examples I can find are for 1d or 2d arrays such that you can say something like A[A>.5] = 1.
That I understand. I want to do something like this:
A[B==0,2 > 128] = 0. Basically, I have a 2D map (B) whose dimensions are the same as the first two in A. So everywhere B == 0 and the third channel (like B of RGB in an image) is greater than 128 I want to set the 3rd channel to zero, leaving the other two channels.
I can do this in a loop or doing a for value in array loop, but I was hoping someone could tell me if I could do it in a similar manner to A[A>.5] = 1.
I have tried A[B==0,2 > 128] = 0, but I get IndexError: in the future, 0-d boolean arrays will be interpreted as a valid boolean index and I can't really think of any other ways to write this.
One thing I forgot to mention - currently I can do A[B==0,2] = 0. I am just trying how to fit that extra conditional in.
A[(B == 0)*(A[..., 2] > 128), 2] = 0
Explanation:
B == 0 and A[..., 2] > 128 are the same dimensions, where [..., 2] specifies that slicing occurs in all axes in all dimensions until the last dimension, where only the axis indexed by 2 is sliced. This product of the two is False for any condition which doesn't fulfils your requirements.
2 is the third axis of A on the last dimension.

Subtracting a number from an array if condition is met python

I am facing a very basic problem in if condition in python.
The array current_img_list is of dimension (500L,1). If the number 82459 is in this array, I want to subtract it by 1.
index = np.random.randint(0,num_train-1,500)
# shape of current_img_list is (500L,1)
# values in index array is randomized
current_img_list = train_data['img_list'][index]
# shape of current_img_list is (500L,1)
if (any(current_img_list) == 82459):
i = np.where(current_img_list == 82459)
final_list = i-1
Explanation of variables - train_data is of type dict. It has 4 elements in it. img_list is one of the elements with size (215375L,1). The value of num_train is 215375 and size of index is (500L,1)
Firsly I don't know whether this loop is working or not. I tried all() function and numpy.where() function but to no success. Secondly, I can't think of a way of how to subtract 1 from 82459 directly from the index at which it is stored without affecting the rest of the values in this array.
Thanks
Looping over the array in Python will be much slower than letting numpy's vectorized operators do their thing:
import numpy as np
num_train = 90000 # for example
index = np.random.randint(0,num_train-1,500)
index[index == 82459] -= 1
current_img_list = np.array([1,2,3,82459,4,5,6])
i = np.where(current_img_list == 82459)
current_img_list[i] -= 1
I'm a little confused by whats trying to be achieved here, but I'll give it a go:
If you're trying to subtract 1 from anywhere in your array that is equal to 82459, then what maybe iterate through the array, with a for loop. Each time the current index is equal to 82459, just set the number at that index -= 1.
If you need more help, please post the rest of the relevant code so I can debug.

Python compute a specific inner product on vectors

Assume having two vectors with m x 6, n x 6
import numpy as np
a = np.random.random(m,6)
b = np.random.random(n,6)
using np.inner works as expected and yields
np.inner(a,b).shape
(m,n)
with every element being the scalar product of each combination. I now want to compute a special inner product (namely Plucker). Right now im using
def pluckerSide(a,b):
a0,a1,a2,a3,a4,a5 = a
b0,b1,b2,b3,b4,b5 = b
return a0*b4+a1*b5+a2*b3+a4*b0+a5*b1+a3*b2
with a,b sliced by a for loop. Which is way too slow. Any plans on vectorizing fail. Mostly broadcast errors due to wrong shapes. Cant get np.vectorize to work either.
Maybe someone can help here?
There seems to be an indexing based on some random indices for pairwise multiplication and summing on those two input arrays with function pluckerSide. So, I would list out those indices, index into the arrays with those and finally use matrix-multiplication with np.dot to perform the sum-reduction.
Thus, one approach would be like this -
a_idx = np.array([0,1,2,4,5,3])
b_idx = np.array([4,5,3,0,1,2])
out = a[a_idx].dot(b[b_idx])
If you are doing this in a loop across all rows of a and b and thus generating an output array of shape (m,n), we can vectorize that, like so -
out_all = a[:,a_idx].dot(b[:,b_idx].T)
To make things a bit easier, we can re-arrange a_idx such that it becomes range(6) and re-arrange b_idx with that pattern. So, we would have :
a_idx = np.array([0,1,2,3,4,5])
b_idx = np.array([4,5,3,2,0,1])
Thus, we can skip indexing into a and the solution would be simply -
a.dot(b[:,b_idx].T)

Categories