Vectorizing this for-loop in numpy - python

I was wondering how to vectorize this for loop. Given a 2x2x2 array x and a list y in which each element holds the i, j, and k indices of an element of x, I want to get x[i, j, k] for each element of y.
Given arrays x and y:
x = np.arange(8).reshape((2, 2, 2))
y = [[0, 1, 1], [1, 1, 0]]
I want to get:
x[0, 1, 1] = 3 and x[1, 1, 0] = 6
I tried:
print(x[y])
But it prints:
array([[2, 3],
[6, 7],
[4, 5]])
So I ended up doing:
for y_ in y:
    print(x[y_[0], y_[1], y_[2]])
Which works, but I can't help but think there is a better way.

Use the transposed y, i.e. zip(*y), as the index. For advanced indexing to work, the index must be a tuple with one sequence of indices per dimension:
x[tuple(zip(*y))]
# array([3, 6])
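To make the transposition explicit, here is a minimal sketch of the same lookup. The intermediate names i, j, and k are introduced only for illustration, and converting y to an array and unpacking its transpose is shown as an equivalent alternative to zip(*y).

```python
import numpy as np

x = np.arange(8).reshape((2, 2, 2))
y = [[0, 1, 1], [1, 1, 0]]

# One index sequence per dimension: i = (0, 1), j = (1, 1), k = (1, 0)
i, j, k = zip(*y)
result = x[i, j, k]

# Equivalent: transpose y as an array and pass its rows as a tuple
result_alt = x[tuple(np.asarray(y).T)]

print(result)      # [3 6]
print(result_alt)  # [3 6]
```

Each position across i, j, and k is paired up, so the first output element is x[0, 1, 1] and the second is x[1, 1, 0].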

Related

How to sum specific row values together in Sparse COO matrix to reshape matrix

I have a sparse coo matrix built in python using the scipy library. An example data set looks something like this:
>>> v.toarray()
array([[1, 0, 2, 4],
[0, 0, 3, 1],
[4, 5, 6, 9]])
I would like to add columns 0 and 2 together and columns 1 and 3 together, so that the shape changes from (3, 4) to (3, 2).
However, looking at the docs, the sum function doesn't seem to support slicing of any sort. The only approach I have thought of is to loop over the matrix as an array and use numpy to get the summed values, like so:
a_col = []
b_col = []
for x in range(len(v.toarray())):
    a_col.append(np.sum(v.toarray()[x, [0, 2]], axis=0))
    b_col.append(np.sum(v.toarray()[x, [1, 3]], axis=0))
Then use those values for a_col and b_col to create the matrix again.
But surely there should be a way to handle it with the sum method?
You can add the values with a simple loop and 2D slicing, and then take the columns you want:
v = np.array([[1, 0, 2, 4],
[0, 0, 3, 1],
[4, 5, 6, 9]])
for i in range(2):
    v[:, i] = v[:, i] + v[:, i+2]
print(v[:, :2])
Output
[[ 3 4]
[ 3 1]
[10 14]]
Alternatively, assuming csr holds your matrix in CSR format, you can use csr_matrix.dot with a special selection matrix to achieve the same:
csr = csr_matrix(csr.dot(np.array([[1,0,1,0],[0,1,0,1]]).T))
#csr.data
#[ 3, 4, 3, 1, 10, 14]
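The idea behind that selection matrix can be sketched as follows. For brevity this example uses a dense NumPy array rather than a sparse one, and the names S and summed are hypothetical. Each column of S marks which input columns are added into one output column.

```python
import numpy as np

v = np.array([[1, 0, 2, 4],
              [0, 0, 3, 1],
              [4, 5, 6, 9]])

# Output column 0 = input columns 0 + 2, output column 1 = input columns 1 + 3.
S = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1]]).T   # shape (4, 2)

summed = v @ S                   # shape (3, 2)
print(summed)
# [[ 3  4]
#  [ 3  1]
#  [10 14]]
```

Because the grouping is expressed as a matrix product, the same S can be passed to the sparse matrix's dot method, so the data never has to be converted to a dense array row by row.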

How does PyTorch Tensor.index_select() evaluate tensor output?

I am not able to understand how complex indexing - non contiguous indexing of a tensor works. Here is a sample code and its output
import torch
def describe(x):
print("Type: {}".format(x.type()))
print("Shape/size: {}".format(x.shape))
print("Values: \n{}".format(x))
indices = torch.LongTensor([0,2])
x = torch.arange(6).view(2,3)
describe(torch.index_select(x, dim=1, index=indices))
Returns output as
Type: torch.LongTensor Shape/size: torch.Size([2, 2]) Values:
tensor([[0, 2],
[3, 5]])
Can someone explain how did it arrive to this output tensor?
Thanks!
You are selecting the first (indices[0] is 0) and third (indices[1] is 2) columns of x, i.e. slices along the second axis (dim=1). Essentially, torch.index_select with dim=1 works the same as indexing directly on the second axis with x[:, indices].
>>> x
tensor([[0, 1, 2],
[3, 4, 5]])
So you are selecting the columns (since you're looking at dim=1 and not dim=0) whose indices are in indices. Imagine having a simple list [0, 2] as indices:
>>> indices = [0, 2]
>>> x[:, indices[0]] # same as x[:, 0]
tensor([0, 3])
>>> x[:, indices[1]] # same as x[:, 2]
tensor([2, 5])
So passing the indices as a torch.Tensor allows you to index on all elements of indices directly, i.e. columns 0 and 2, similar to how NumPy's indexing works.
>>> x[:, indices]
tensor([[0, 2],
[3, 5]])
Here's another example to help you see how it works, with x defined as x = torch.arange(9).view(3, 3), so we have 3 rows (dim=0) and 3 columns (dim=1).
>>> indices
tensor([0, 2]) # namely 'first' and 'third'
>>> x = torch.arange(9).view(3, 3)
>>> x
tensor([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> x.index_select(0, indices) # select first and third rows
tensor([[0, 1, 2],
[6, 7, 8]])
>>> x.index_select(1, indices) # select first and third columns
tensor([[0, 2],
[3, 5],
[6, 8]])
Note: torch.index_select(x, dim, indices) is equivalent to x.index_select(dim, indices)
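Since the behaviour mirrors NumPy's indexing, a NumPy analogue may help build intuition. This is a sketch using np.take with an axis argument as a stand-in for index_select; it is not PyTorch code, and the variable names are chosen only for illustration.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
indices = [0, 2]

# axis=1 picks whole columns, like index_select with dim=1
cols_take = np.take(x, indices, axis=1)
cols_index = x[:, indices]          # the same selection via direct indexing

# axis=0 picks whole rows, like index_select with dim=0
rows_take = np.take(x, [0], axis=0)

print(cols_take)   # [[0 2]
                   #  [3 5]]
print(rows_take)   # [[0 1 2]]
```

In both libraries the chosen axis keeps only the listed positions, while every other axis is left untouched.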

Is there a function in numpy where it will return a matrix of each row of vector raised from a power of 0 to integer

Suppose I have a vertical vector X = [[1],[2],[3]] and an integer pow = 2.
Is there a function in numpy that returns a matrix in which each row holds the corresponding element of X raised to the powers 0 through pow (pow = 2)?
The above example should return a matrix
[[1, 1, 1],
[1, 2, 4],
[1, 3, 9]]
I looked at numpy.power, but it returns an array.
The function you are looking for is
numpy.vander
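A minimal usage sketch, assuming a 1-D input (numpy.vander expects a 1-D array, so a column vector would need to be flattened first). By default the powers decrease from left to right, so increasing=True is needed to get the layout from the question; N is the number of columns, i.e. pow + 1.

```python
import numpy as np

x = np.array([1, 2, 3])
max_power = 2

# Columns are x**0, x**1, ..., x**max_power
powers = np.vander(x, N=max_power + 1, increasing=True)
print(powers)
# [[1 1 1]
#  [1 2 4]
#  [1 3 9]]
```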
I don't think there is a specific function, but you could use array broadcasting.
X = np.array([[1],[2],[3]])
p = np.arange(0, 2+1) # powers
X**p # row vs column vector broadcasts to 2D matrix
Following the rules of broadcasting, we could extend to 2D and then raise to power with a ranged array -
X[:,None]**np.arange(3) # Or np.power(X[:,None], np.arange(3))
Sample run -
In [7]: X = np.array([1,2,3])
In [8]: X[:,None]**np.arange(3)
Out[8]:
array([[1, 1, 1],
[1, 2, 4],
[1, 3, 9]])
If X is already extended, just raise to power -
In [24]: X = np.array([[1],[2],[3]])
In [25]: X**np.arange(3)
Out[25]:
array([[1, 1, 1],
[1, 2, 4],
[1, 3, 9]])
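If the maximum power is large, I would suggest you use a nested loop instead, since calculating each power from scratch is expensive. Build each power from the previous one, something like this (here x is one element of X and p is the maximum power):
arr = [1]  # x**0
for i in range(1, p + 1):
    arr.append(x * arr[-1])  # x**i = x * x**(i-1)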
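The running-product idea can also be applied to a whole vector at once, one column per power. The following is a sketch under that assumption; the function name power_matrix is hypothetical and not part of NumPy.

```python
import numpy as np

def power_matrix(x, p):
    """Return a matrix whose k-th column is x**k for k = 0..p."""
    x = np.asarray(x).ravel()
    out = np.ones((x.size, p + 1), dtype=x.dtype)
    for k in range(1, p + 1):
        # Reuse the previous column instead of recomputing x**k
        out[:, k] = out[:, k - 1] * x
    return out

result = power_matrix([1, 2, 3], 2)
print(result)
# [[1 1 1]
#  [1 2 4]
#  [1 3 9]]
```

Only the loop over the powers remains in Python; the loop over the elements of x is handled by the vectorized column multiplication.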

Create Numpy 2D Array with data from triplets of (x,y,value)

I have a lot of data in database in (x, y, value) triplet form.
I would like to be able to create dynamically a 2D numpy array from this data by setting value at the coords (x,y) of the array.
For instance if I have :
(0,0,8)
(0,1,5)
(0,2,3)
(1,0,4)
(1,1,0)
(1,2,0)
(2,0,1)
(2,1,2)
(2,2,5)
The resulting array should be :
Array([[8,5,3],[4,0,0],[1,2,5]])
I'm new to numpy, is there any method in numpy to do so ? If not, what approach would you advice to do this ?
Extending the answer from @MaxU, in case the coordinates are not ordered in a grid fashion (or in case some coordinates are missing), you can create your array as follows:
import numpy as np
a = np.array([(0,0,8),(0,1,5),(0,2,3),
(1,0,4),(1,1,0),(1,2,0),
(2,0,1),(2,1,2),(2,2,5)])
Here a represents your coordinates. It is an (N, 3) array, where N is the number of coordinates (it doesn't have to contain ALL the coordinates). The first column of a (a[:, 0]) contains the Y positions, while the second column (a[:, 1]) contains the X positions. Similarly, the last column (a[:, 2]) contains your values.
Then you can extract the maximum dimensions of your target array:
# Maximum Y and X coordinates
ymax = a[:, 0].max()
xmax = a[:, 1].max()
# Target array
target = np.zeros((ymax+1, xmax+1), a.dtype)
And finally, fill the array with data from your coordinates:
target[a[:, 0], a[:, 1]] = a[:, 2]
The line above sets values in target at a[:, 0] (all Y) and a[:, 1] (all X) locations to their corresponding a[:, 2] value (your value).
>>> target
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
Additionally, if you have missing coordinates, and you want to replace those missing values by some number, you can initialize the array as:
default_value = -1
target = np.full((ymax+1, xmax+1), default_value, a.dtype)
This way, the coordinates not present in your list will be filled with -1 in the target array.
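To see the effect of the default value, here is a small sketch with a deliberately incomplete and unordered set of triplets. The sample data and the names rows, cols, and vals are made up for illustration.

```python
import numpy as np

# Only three of the nine grid cells are present, in arbitrary order
a = np.array([(2, 1, 7), (0, 0, 8), (1, 2, 4)])
rows, cols, vals = a.T

default_value = -1
target = np.full((rows.max() + 1, cols.max() + 1), default_value, dtype=a.dtype)

# Advanced indexing assigns each value to its (row, col) position
target[rows, cols] = vals
print(target)
# [[ 8 -1 -1]
#  [-1 -1  4]
#  [-1  7 -1]]
```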
Why not use sparse matrices? (That is pretty much the format of your triplets.)
First split the triplets into rows, columns, and data using numpy.hsplit(). (Use numpy.squeeze() to convert the resulting 2d arrays to 1d arrays.) Here triplets is assumed to be the (N, 3) array of your data.
>>> row, col, data = [np.squeeze(splt) for splt
... in np.hsplit(triplets, triplets.shape[-1])]
Use the sparse matrix in coordinate format, and convert it to an array.
>>> from scipy.sparse import coo_matrix
>>> coo_matrix((data, (row, col))).toarray()
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
Is that what you want?
In [37]: a = np.array([(0,0,8)
....: ,(0,1,5)
....: ,(0,2,3)
....: ,(1,0,4)
....: ,(1,1,0)
....: ,(1,2,0)
....: ,(2,0,1)
....: ,(2,1,2)
....: ,(2,2,5)])
In [38]: a
Out[38]:
array([[0, 0, 8],
[0, 1, 5],
[0, 2, 3],
[1, 0, 4],
[1, 1, 0],
[1, 2, 0],
[2, 0, 1],
[2, 1, 2],
[2, 2, 5]])
In [39]: a[:, 2].reshape(3,len(a)//3)
Out[39]:
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
or a bit more flexible (after your comment):
In [48]: a[:, 2].reshape([int(len(a) ** .5)] * 2)
Out[48]:
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
Explanation:
this gives you the 3rd column (value):
In [42]: a[:, 2]
Out[42]: array([8, 5, 3, 4, 0, 0, 1, 2, 5])
In [49]: [int(len(a) ** .5)]
Out[49]: [3]
In [50]: [int(len(a) ** .5)] * 2
Out[50]: [3, 3]

what is numpy way of arranging numpy 2D array based on 2D numpy array of index?

import numpy as np
x = np.array([[1,2 ,3], [9,8,7]])
y = np.array([[2,1 ,0], [1,0,2]])
x[y]
Expected output:
array([[3,2,1], [8,9,7]])
If x and y were 1D arrays, then x[y] would work. So what is the numpy way or most pythonic or efficient way of doing this for 2D arrays?
You need to define the corresponding row indices.
One way is:
>>> x[np.arange(x.shape[0])[..., None], y]
array([[3, 2, 1],
[8, 9, 7]])
You can calculate the linear indices from y and then use those to extract specific elements from x, like so -
# Linear indices from y, using x's shape
lin_idx = y + np.arange(y.shape[0])[:,None]*x.shape[1]
# Use np.take to extract those indexed elements from x
out = np.take(x,lin_idx)
Sample run -
In [47]: x
Out[47]:
array([[1, 2, 3],
[9, 8, 7]])
In [48]: y
Out[48]:
array([[2, 1, 0],
[1, 0, 2]])
In [49]: lin_idx = y + np.arange(y.shape[0])[:,None]*x.shape[1]
In [50]: lin_idx # Compare this with y
Out[50]:
array([[2, 1, 0],
[4, 3, 5]])
In [51]: np.take(x,lin_idx)
Out[51]:
array([[3, 2, 1],
[8, 9, 7]])
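As a further option not mentioned in the answers above, NumPy 1.15 and later provide np.take_along_axis, which performs this kind of per-row lookup without building the row indices by hand. A minimal sketch:

```python
import numpy as np

x = np.array([[1, 2, 3], [9, 8, 7]])
y = np.array([[2, 1, 0], [1, 0, 2]])

# For every row r, pick x[r, y[r, c]] for each column c
out = np.take_along_axis(x, y, axis=1)
print(out)
# [[3 2 1]
#  [8 9 7]]
```

It yields the same result as x[np.arange(x.shape[0])[..., None], y] for this input.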
