How to vectorize increments in Python - python

I have a 2D array, and I have some numbers to add to some of its cells. I want to vectorize the operation in order to save time. The problem arises when I need to add several numbers to the same cell: in that case, the vectorized code only adds the last one.
'a' is my array, 'x' and 'y' are the coordinates of the cells I want to increment, and 'z' contains the numbers I want to add.
import numpy as np
a=np.zeros((4,4))
x=[1,2,1]
y=[0,1,0]
z=[2,3,1]
a[x,y]+=z
print(a)
As you can see, a[1,0] should be incremented twice: once by 2 and once by 1. So the expected array is:
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
but instead I get:
[[0. 0. 0. 0.]
[1. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
The problem would be easy to solve with a for loop, but I wonder if I can correctly vectorize this operation.

Use np.add.at for that:
import numpy as np
a = np.zeros((4,4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]
np.add.at(a, (x, y), z)
print(a)
# [[0. 0. 0. 0.]
# [3. 0. 0. 0.]
# [0. 3. 0. 0.]
# [0. 0. 0. 0.]]

When you do a[x, y] += z, the operation can be decomposed as:
a[1, 0], a[2, 1], a[1, 0] = [a[1, 0] + 2, a[2, 1] + 3, a[1, 0] + 1]
# Equivalent to (since a starts as all zeros):
a[1, 0] = 2
a[2, 1] = 3
a[1, 0] = 1
That's why it doesn't work: the right-hand side is evaluated once from the original values, so the two updates of a[1, 0] overwrite each other instead of accumulating.
But if you increment the array with an explicit loop over the indices, it works as expected (see the sketch below).
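For reference, here is a minimal sketch of that loop version, using the x, y, z lists from the question; duplicate indices accumulate correctly because each element is applied one at a time:
import numpy as np

a = np.zeros((4, 4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]

# Explicit loop: each (xi, yi) pair is updated in turn, so duplicates accumulate
for xi, yi, zi in zip(x, y, z):
    a[xi, yi] += zi

print(a)
# [[0. 0. 0. 0.]
#  [3. 0. 0. 0.]
#  [0. 3. 0. 0.]
#  [0. 0. 0. 0.]]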

You could create a 3x4x4 array, add each element of z into its own 4x4 slice along the first axis, and then sum the slices:
import numpy as np
x = [1,2,1]
y = [0,1,0]
z = [2,3,1]
a = np.zeros((3,4,4))
n = range(a.shape[0])
a[n,x,y] += z
print(sum(a))
which will result in
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]

Approach #1: Bincount-based method for performance
We can use np.bincount for efficient bin-based summation (an approach inspired by this post):
def accumulate_arr(x, y, z, out):
    # Get output array shape
    shp = out.shape
    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index((x, y), shp)
    # Or: lidx = np.asarray(x) * (np.asarray(y).max() + 1) + np.asarray(y)
    # Accumulate into out with IDs from lidx
    out += np.bincount(lidx, z, minlength=out.size).reshape(out.shape)
    return out
If your output array is zero-initialized, you can skip the in-place add: feed the output shape directly into the function and use the bincount result as the final output. A sketch of that variant follows the sample output below.
Output on given sample -
In [48]: accumulate_arr(x,y,z,a)
Out[48]:
array([[0., 0., 0., 0.],
[3., 0., 0., 0.],
[0., 3., 0., 0.],
[0., 0., 0., 0.]])
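Here is a minimal sketch of that zeros-initialized variant (the helper name accumulate_zeros is mine, not from the original answer):
import numpy as np

def accumulate_zeros(x, y, z, shape):
    # Linear indices into the flattened output
    lidx = np.ravel_multi_index((x, y), shape)
    # bincount sums the z values that share an index, then we reshape
    return np.bincount(lidx, z, minlength=np.prod(shape)).reshape(shape)

# accumulate_zeros(x, y, z, (4, 4)) gives the same 4x4 result as above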
Approach #2: Using sparse-matrix for memory-efficiency
In [54]: from scipy.sparse import coo_matrix
In [56]: coo_matrix((z,(x,y)), shape=(4,4)).toarray()
Out[56]:
array([[0, 0, 0, 0],
[3, 0, 0, 0],
[0, 3, 0, 0],
[0, 0, 0, 0]])
If you are okay with a sparse-matrix, skip the .toarray() part for a memory-efficient solution.
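If the output array already holds data, the dense sparse-matrix result can simply be added to it; a small sketch:
import numpy as np
from scipy.sparse import coo_matrix

a = np.zeros((4, 4))  # could be any pre-existing 4x4 array
a += coo_matrix((z, (x, y)), shape=(4, 4)).toarray()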

Related

cupy/numpy ignores duplicate indexes

When arrays are used as indices, cupy/numpy ignores duplicates.
Example:
import cupy as cp
matrix = cp.zeros((3, 3))
xi = cp.asarray([0, 1, 1, 2])
yi = cp.asarray([0, 1, 1, 2])
matrix[xi, yi] += 1
print(matrix.get())
Output:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
Desired output:
[[1. 0. 0.]
[0. 2. 0.]
[0. 0. 1.]]
The second (1, 1) index is ignored. How can the operation be applied for duplicate indices as well?
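No answer is shown here, but based on the np.add.at answer above, a sketch of the NumPy version of the fix would be (CuPy offers cupyx.scatter_add for the same purpose, though that is not verified here):
import numpy as np

matrix = np.zeros((3, 3))
xi = np.asarray([0, 1, 1, 2])
yi = np.asarray([0, 1, 1, 2])

# np.add.at applies the increment once per index, including duplicates
np.add.at(matrix, (xi, yi), 1)
print(matrix)
# [[1. 0. 0.]
#  [0. 2. 0.]
#  [0. 0. 1.]]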

How to fix IndexError from numpy?

import numpy as np
from scipy.io import mmread
from scipy import linalg

A = mmread('bcspwr02.mtx')
A = np.transpose(A) + A + np.identity(A.shape[0])
#A = np.array([[20, 18, 1], [2, 3, 1], [1, 2, 1]])

def get_b(A):
    n = A.shape[0]
    b = np.ones(n)
    return b

def Jacobi(A, b, numIter):
    n = A.shape[0]
    x = np.zeros(n)
    x0 = np.zeros(n)
    for numItr in range(numIter):
        print("Iteration " + str(numItr) + ": " + str(x))
        for i in range(len(A)):
            temp = 0
            for j in range(len(A)):
                if i != j:
                    temp = x0[j] * A[i][j]
            x[i] = float((b[i] - temp) / A[i][i])
        else:
            x0 = x.copy()

numIter = 4
Jacobi(A, get_b(A), numIter)
Result:
Iteration 0: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0.]
Traceback (most recent call last):
File "/Users/cxf/Desktop/test.py", line 36, in <module>
Jacobi(A, get_b(A), numIter)
File "/Users/cxf/Desktop/test.py", line 29, in Jacobi
temp = x0[j] * A[i][j]
File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/numpy/matrixlib/defmatrix.py", line 193, in __getitem__
out = N.ndarray.__getitem__(self, index)
IndexError: index 1 is out of bounds for axis 0 with size 1
What exactly does mmread return?
A = mmread('bcspwr02.mtx')
A =np.transpose(A)+A+np.identity(A.shape[0])
The docs say "Dense or sparse matrix depending on the matrix format in the Matrix Market file."
Let's experiment with a sparse matrix:
In [52]: A = sparse.coo_matrix([[1,0,1],[0,0,1],[0,1,0]])
In [53]: A
Out[53]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 4 stored elements in COOrdinate format>
In [54]: A.A
Out[54]:
array([[1, 0, 1],
[0, 0, 1],
[0, 1, 0]])
In [58]: A1 = np.transpose(A)+A+np.identity(A.shape[0])
In [59]: A1
Out[59]:
matrix([[3., 0., 1.],
[0., 1., 2.],
[1., 2., 1.]])
In [60]: A1[0]
Out[60]: matrix([[3., 0., 1.]]) # shape (1,3)
In [61]: A1[0][0]
Out[61]: matrix([[3., 0., 1.]]) # still (1,3)
In [62]: A1[0][1]
Traceback (most recent call last):
File "<ipython-input-62-c6007014201d>", line 1, in <module>
A1[0][1]
File "/usr/local/lib/python3.8/dist-packages/numpy/matrixlib/defmatrix.py", line 193, in __getitem__
out = N.ndarray.__getitem__(self, index)
IndexError: index 1 is out of bounds for axis 0 with size 1
If A is a coo matrix, the transpose expression creates an np.matrix. A1[i][j] indexing does not work the same as for a regular numpy array; instead you need to use the safer A1[i, j] syntax.
In [63]: A1[0,1]
Out[63]: 0.0
Note that the traceback tells me the error is in the defmatrix file. I should have read your traceback more carefully. The nature of the problem was hidden in plain sight!
Initial answer:
Evidently, in
x0[j] * A[i][j]
either j or i is too large. To see why, you have to look at how they are set and at the shapes of x0 and A.
Try to understand the error before asking how to fix it.
With the commented-out test matrix, A is (3,3), so n=3. Then x0 has shape (3,), and i and j iterate over range(3). With those shapes
x0[j] * A[i][j]
x0[j] * A[i,j] # better
should work.
But the error says one of the arrays has shape (1,?) or (1,).
You need to check the array shapes. Don't just assume the shapes are right; when there's an error, you must verify.
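A minimal sketch of the two possible fixes, continuing the OP's setup (illustrative only, not run against the original bcspwr02.mtx file): either index with A[i, j], or convert the np.matrix result to a plain ndarray so that A[i][j] behaves as expected.
A = mmread('bcspwr02.mtx')
A = np.transpose(A) + A + np.identity(A.shape[0])   # this is an np.matrix
A = np.asarray(A)                                    # now a plain 2-D ndarray

# Inside Jacobi, either style now works:
# temp = x0[j] * A[i, j]    # works on matrix and ndarray alike
# temp = x0[j] * A[i][j]    # works once A is a plain ndarray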

Returning list of arrays from a function having as argument a vector

I have a function such as:
def f(x):
    A = np.array([[0, 1], [0, -1/x]])
    return A
If I use a scalar, I obtain:
>>x=1
>>f(x)
array([[ 0., 1.],
[ 0., -1.]])
and if I use an array as an input, I will obtain:
>>x=np.linspace(1,3,3)
>>f(x)
array([[0, 1],
[0, array([-1. , -0.5 , -0.33333333])]], dtype=object)
Actually I would like to obtain a list of arrays, namely:
A = [A_1,A_2, ..., A_n]
Right now I do not care much whether it is an array of arrays or a list containing several arrays.
I know I can do that using a for loop over x, but I suspect there is another, possibly more efficient, way to do it.
So the output that I would like would be something like:
>>x=np.linspace(1,3,3)
>>r=f(x)
array([[[0, 1],[0,-1]],
[[0, 1],[0,-0.5]],
[[0, 1],[0,-0.33333]]])
>>r[0]
array([[0, 1],[0,-1]])
or something like
>>x=np.linspace(1,3,3)
>>r=f(x)
[array([[0, 1],[0,-1]]),
array([[0, 1],[0,-0.5]]),
array([[0, 1],[0,-0.33333]])]
>>r[0]
array([[0, 1],[0,-1]])
Thanks
In your function we could check the type of the given parameter. If x is an np.ndarray, we build the nested array we want; otherwise we return the output as before.
import numpy as np

def f(x):
    if isinstance(x, np.ndarray):
        v = -1/x
        A = np.array([[[0, 1], [0, i]] for i in v])
    else:
        A = np.array([[0, 1], [0, -1/x]])
    return A

x = np.linspace(1, 3, 3)
print(f(x))
Output:
[[[ 0. 1. ]
[ 0. -1. ]]
[[ 0. 1. ]
[ 0. -0.5 ]]
[[ 0. 1. ]
[ 0. -0.33333333]]]
You can do something like:
import numpy as np

def f(x):
    x = np.array([x]) if type(x) == float or type(x) == int else x
    A = np.stack([np.array([[0, 1], [0, -1/i]]) for i in x])
    return A
The first line deals with the case when x is an int or a float, since those are not iterable. Then:
f(1)
array([[[ 0., 1.],
[ 0., -1.]]])
f(np.linspace(1,3,3))
array([[[ 0. , 1. ],
[ 0. , -1. ]],
[[ 0. , 1. ],
[ 0. , -0.5 ]],
[[ 0. , 1. ],
[ 0. , -0.33333333]]])
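If the goal is to avoid the Python-level loop entirely, an alternative sketch is to build the output by broadcasting: only the [1, 1] entry actually depends on x, so it can be filled in one vectorized assignment (the use of np.atleast_1d and this layout are my own choices, not from the answers above):
import numpy as np

def f(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))  # handles scalars and arrays
    A = np.zeros((x.size, 2, 2))
    A[:, 0, 1] = 1          # constant entry
    A[:, 1, 1] = -1 / x     # the only x-dependent entry
    return A

print(f(np.linspace(1, 3, 3))[0])
# [[ 0.  1.]
#  [ 0. -1.]]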

How do I compute one hot encoding using tf.one_hot?

I'm trying to build a one-hot encoding of y_train from the MNIST dataset using TensorFlow. I can't understand how to do it.
# unique values 0 - 9
y_train = array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)
In keras we'll do something like
# this converts it into one hot encoding
one_hot_encoding = tf.keras.utils.to_categorical(y_train)
Whereas in tf.one_hot, what should my inputs to the indices and depth parameters be? After doing the one-hot encoding, how can I convert the 2-D tensor back to a numpy array?
I'm not familiar with Tensorflow but after some tests, this is what I've found:
tf.one_hot() takes indices and a depth. The indices are the values to actually convert to a one-hot encoding; depth is the number of positions (classes) used in the encoding.
For example, take the following code:
import tensorflow as tf

y = [1, 2, 3, 2, 1]
tf.keras.utils.to_categorical(y)
sess = tf.Session()
with sess.as_default():
    print(tf.one_hot(y, 2).eval())
    print(tf.one_hot(y, 4).eval())
    print(tf.one_hot(y, 6).eval())
tf.keras.utils.to_categorical(y) returns the following:
array([[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 0., 1., 0.],
[0., 1., 0., 0.]], dtype=float32)
In contrast, the tf.one_hot() options (2, 4, and 6) do the following:
[[0. 1.]
[0. 0.]
[0. 0.]
[0. 0.]
[0. 1.]]
[[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]
[0. 0. 1. 0.]
[0. 1. 0. 0.]]
[[0. 1. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0.]
[0. 0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0. 0.]
[0. 1. 0. 0. 0. 0.]]
As can be seen here, to mimic tf.keras.utils.to_categorical() using tf.one_hot(), the depth parameter should be the maximum value present in the array plus 1 (to account for 0). In this case, the maximum value is 3, so there are four possible values in the encoding: 0, 1, 2, and 3. As such, a depth of 4 is required to represent all of these values in the one-hot encoding.
As for conversion to numpy, as shown above, using a Tensorflow session, running eval() on a tensor converts it to a numpy array. For methods on doing this, refer to How can I convert a tensor into a numpy array in TensorFlow?.
I'm not familiar with Tensorflow but I hope this helps.
Note: for the purposes of MNIST, a depth of 10 should be sufficient.
I'd like to counter what @Andrew Fan has said. First, the above y label list does not start at index 0, which is what is required. Just look at the first column (i.e. index 0) in all of those examples: they're all empty. This will create a redundant class in the learning and may cause problems. One-hot encoding creates a simple vector with a 1 at that index position and zeros elsewhere. Therefore, your depth has to be the same as the number of classes, and you also have to start at index 0.
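For the MNIST case specifically, a minimal sketch (assuming TensorFlow 2.x with eager execution, where .numpy() converts a tensor back to a numpy array) might look like this:
import numpy as np
import tensorflow as tf

y_train = np.array([5, 0, 4, 5, 6, 8], dtype=np.uint8)  # short stand-in for the MNIST labels

# depth = number of classes; MNIST digits are 0-9, so depth=10
y_one_hot = tf.one_hot(y_train, depth=10)

# In TF 2.x eager mode, .numpy() gives back a plain numpy array
y_one_hot_np = y_one_hot.numpy()
print(y_one_hot_np.shape)  # (6, 10)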

How to initialise a Numpy array of numpy arrays

I have a numpy array D of dimensions 4x4
I want a new numpy array based on a user-defined value v.
If v=2, the new numpy array should be [D D].
If v=3, the new numpy array should be [D D D].
How do I initialise such a numpy array, given that numpy.zeros(v) doesn't allow me to place arrays as elements?
If I understand correctly, you want to take a 2D array and tile it v times in the first dimension? You can use np.repeat:
# a 2D array
D = np.arange(4).reshape(2, 2)
print(D)
# [[0 1]
#  [2 3]]
# tile it 3 times in the first dimension
x = np.repeat(D[None, :], 3, axis=0)
print(x.shape)
# (3, 2, 2)
print(x)
# [[[0 1]
#   [2 3]]
#  [[0 1]
#   [2 3]]
#  [[0 1]
#   [2 3]]]
If you wanted the output to be kept two-dimensional, i.e. (6, 2), you could omit the [None, :] indexing (see this page for more info on numpy's broadcasting rules).
print(np.repeat(D, 3, axis=0))
# [[0 1]
#  [0 1]
#  [0 1]
#  [2 3]
#  [2 3]
#  [2 3]]
Another alternative is np.tile, which behaves slightly differently in that it will always tile over the last dimension:
print(np.tile(D, 3))
# [[0 1 0 1 0 1]
#  [2 3 2 3 2 3]]
You can do that as follows:
import numpy as np
v = 3
x = np.array([np.zeros((4,4)) for _ in range(v)])
>>> print(x)
[[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]]
Here you go, see if this works for you.
import numpy as np
v = input('Enter: ')
To initialize the numpy array from user input (it can obviously be whatever shape you want here):
b = np.zeros(shape=(int(v),int(v)))
I know this isn't initializing a numpy array, but since you mentioned wanting an array like [D D] if v were 2, for example, I thought I'd throw this in as another option as well.
new_array = []
for x in range(0, int(v)):
    new_array.append(D)
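If you then want this list to behave like the (v, 4, 4) arrays in the other answers, converting it is a one-liner; a small sketch (assuming D is 4x4):
stacked = np.array(new_array)  # shape (v, 4, 4)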
