Create identity matrices with arbitrary shape in numpy - python

Is there a faster / built-in way to generate identity matrices with an arbitrary shape in the first dimensions and an identity in the last m dimensions?
import numpy as np
base_shape = (10, 11, 12)
n_dim = 4
# m = 2
frames2d = np.zeros(base_shape + (n_dim, n_dim))
for i in range(n_dim):
    frames2d[..., i, i] = 1
# m = 3
frames3d = np.zeros(base_shape + (n_dim, n_dim, n_dim))
for i in range(n_dim):
    frames3d[..., i, i, i] = 1

Approach #1
We can leverage np.einsum to get a diagonal view (inspired by this post) and assign 1s into it for our desired output. So, for say the m=3 case, after initializing with zeros, we can simply do -
diag_view = np.einsum('...iii->...i', frames3d)
diag_view[:] = 1
Generalizing to include those input params, it would be -
def ndeye_einsum(base_shape, n_dim, m):
    out = np.zeros(list(base_shape) + [n_dim]*m)
    diag_view = np.einsum('...' + 'i'*m + '->...i', out)
    diag_view[:] = 1
    return out
So, to reproduce those same arrays, it would be -
frames2d = ndeye_einsum(base_shape, n_dim, m=2)
frames3d = ndeye_einsum(base_shape, n_dim, m=3)
Approach #2
Again, from the same linked post, we can also reshape to 2D and assign into a step-sized slice along the columns, like so -
def ndeye_reshape(base_shape, n_dim, m):
    N = (n_dim**np.arange(m)).sum()
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out.reshape(-1, n_dim**m)[:, ::N] = 1
    return out
This again assigns through a view and hence should be about as efficient as approach #1.
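To see why the stride works: flattening the last m axes puts the diagonal element (i, ..., i) at flat offset i*N with N = 1 + n_dim + ... + n_dim**(m-1). A quick sanity check on a small m = 2 case (not part of the original answer):
out = np.zeros((2, 3, 3))
out.reshape(-1, 3**2)[:, ::4] = 1    # N = 1 + 3 = 4 for n_dim = 3, m = 2
assert np.array_equal(out, np.broadcast_to(np.eye(3), (2, 3, 3)))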
Approach #3
Another way would be to use integer-based indexing. For example, to assign into frames3d in one go, it would be -
I = np.arange(n_dim)
frames3d[..., I, I, I] = 1
Generalizing that becomes -
def ndeye_ellipsis_indexer(base_shape, n_dim, m):
    I = np.arange(n_dim)
    indexer = tuple([Ellipsis] + [I]*m)
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out[indexer] = 1
    return out
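All three helpers should produce identical arrays; a quick consistency check (not part of the original answer):
for m in (2, 3):
    outs = [f(base_shape, n_dim, m) for f in
            (ndeye_einsum, ndeye_reshape, ndeye_ellipsis_indexer)]
    assert all(np.array_equal(outs[0], o) for o in outs[1:])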
Extending to higher-dims with view
The dims along base_shape are simply replications of the elements from the last m dims. As such, we can create an m-dim identity array once and get the higher-dim output as a view into it with np.broadcast_to. This applies to all three approaches posted earlier. To demonstrate it on the einsum-based solution, we would have -
# Create the m-dim "trailing base" array, basically an m-dim identity array
def ndeye_einsum_trailingbase(n_dim, m):
    out = np.zeros([n_dim]*m)
    diag_view = np.einsum('i'*m + '->...i', out)
    diag_view[:] = 1
    return out

def ndeye_einsum_view(base_shape, n_dim, m):
    trail_base = ndeye_einsum_trailingbase(n_dim, m)
    return np.broadcast_to(trail_base, list(base_shape) + [n_dim]*m)
Thus, again we would have, e.g. -
frames3d = ndeye_einsum_view(base_shape, n_dim, m=3)
This would be a view into an m-dim array and hence efficient both in memory and performance.
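Note that np.broadcast_to returns a read-only view, so the result must be copied before it can be written to. A small check, assuming the helpers defined above:
frames3d_view = ndeye_einsum_view(base_shape, n_dim, m=3)
assert not frames3d_view.flags.writeable    # broadcast views are read-only
assert np.array_equal(frames3d_view, ndeye_einsum(base_shape, n_dim, m=3))
frames3d_writable = frames3d_view.copy()    # materialize when writes are needed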

One approach to getting an identity matrix along the last two dimensions of the array is to use np.broadcast_to, specifying the resulting shape the ndarray should have (this does not generalize to higher dimensions, since np.eye is inherently 2D):
base_shape = (10, 11, 12)
n_dim = 4
frame2d = np.broadcast_to(np.eye(n_dim), base_shape + (n_dim,)*2)
print(frame2d.shape)
# (10, 11, 12, 4, 4)
print(frame2d)
array([[[[[1., 0., 0., 0.],
          [0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 1.]],

         [[1., 0., 0., 0.],
          [0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 1.]],
         ...

Related

How to create a numpy array with an extra dimension depending on a where clause?

Problem
I need to create an array that, based on the position of each row's maximum value (the argmax), is filled with [1,0] at that position, while all other positions are filled with [0,1].
Example:
Given the vector a:
a.shape = (3,2)
a = np.array([[1,0],[1,2],[1,3]])
Return the vector b:
b.shape = (3,2,2)
b = np.array([[[1,0],[0,1]],[[0,1],[1,0]],[[0,1],[1,0]]])
c = np.argmax(a, axis=1)
b = np.empty(a.shape + (2,))
b[range(len(c)), c, :] = [1, 0]    # [1, 0] at the argmax position
b[range(len(c)), ~c, :] = [0, 1]   # ~c maps 0 -> -1 and 1 -> -2, i.e. the other column
b
>>> array([[[1., 0.],
            [0., 1.]],

           [[0., 1.],
            [1., 0.]],

           [[0., 1.],
            [1., 0.]]])
Note this only works in this example because the argmax can only be 0 or 1; if the second dimension of a is greater than 2, the ~c trick will not work.
I was able to create a function that returns the desired result, but it only works for two classes. It could be adapted for multiple classes (a sketch for the general case follows the code below):
a = np.array([[1,0],[1,2],[1,3]])

def create_dist_prob_target(arr):
    p_ = np.asarray(arr)
    a = np.expand_dims(np.where(p_ == np.amax(p_, axis=1)[:, None], 1, 0), axis=-1)
    b = np.expand_dims(np.where(p_ == np.amax(p_, axis=1)[:, None], 0, 1), axis=-1)
    return np.concatenate((a, b), axis=2)

b = create_dist_prob_target(a)
print(b)
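For reference, here is a sketch that generalizes to any number of columns by comparing each column index against the row-wise argmax (the helper name one_hot_pairs is mine, not from the original):
def one_hot_pairs(a):
    idx = a.argmax(axis=1)                        # position of each row's maximum
    mask = idx[:, None] == np.arange(a.shape[1])  # True only at the argmax slot
    return np.where(mask[..., None], [1.0, 0.0], [0.0, 1.0])

a = np.array([[1,0],[1,2],[1,3]])
print(one_hot_pairs(a))    # matches b above, and also works for wider a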

How to create a binary matrix with some given condition below:

For a given list of tuples L whose elements are taken from range(n), I want to create a binary matrix A of order n in the following way:
If (i,j) or (j,i) is in L, then A[i][j]=1; otherwise A[i][j]=0.
Let us consider the following example:
L=[(2,3),(0,1),(1,3),(2,0),(0,3)]
A=[[0]*4]*4
for i in range(4):
    for j in range(4):
        if (i,j) or (j,i) in L:
            A[i][j]=1
        else:
            A[i][j]=0
print A
This program does not give the correct result. Where is the logical mistake?
You should use a 3rd party library, numpy, for matrix calculations.
Python lists of lists are inefficient for numeric arrays.
import numpy as np

L = [(2,3),(0,1),(1,3),(2,0),(0,3)]
A = np.zeros((4, 4))
idx = np.r_[L].T           # shape (2, 5): first row holds the i's, second row the j's
A[idx[0], idx[1]] = 1
Result:
array([[ 0.,  1.,  0.,  1.],
       [ 0.,  0.,  0.,  1.],
       [ 1.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.]])
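Note that this sets only the (i, j) direction of each pair, which matches the result above. To also satisfy the (j, i) part of the question's condition, one could add the transposed assignment (an extra line, not in the original snippet):
A[idx[1], idx[0]] = 1    # mirror each pair to make A symmetric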
Related: Why NumPy instead of Python lists?
According to Aran-Fey's correction, the answer is below. There were two mistakes: the condition (i,j) or (j,i) in L parses as (i,j) or ((j,i) in L), and a non-empty tuple is always truthy, so the test was always True; and A=[[0]*4]*4 creates four references to one and the same row list, so assigning to one row changes all of them:
L=[(2,3),(0,1),(1,3),(2,0),(0,3)]
#A=[[0]*4]*4
A=[[0]*4 for _ in range(4)]
for i in range(4):
    for j in range(4):
        if (i,j) in L or (j,i) in L:
            A[i][j]=1
        else:
            A[i][j]=0
print A

python: broadcast sliced np.array assignment to any number of dimensions

I have np.arrays C, R and S of shapes (?, d), (?, n) and (?, d) respectively; where d<=n and the question mark represents any number of matching dimensions. Now I would like to do the following assignment (this is of course not proper python code, but it works if ? is just a single number):
for i in range(?):
    R[i][S[i]] = C[i]
That is: I want for each tuple i of indices (within the bounds specified by ?) to take the corresponding array R[i] in R and assign d many positions (the ones specified by S[i]) to be the values in the array C[i].
What is the pythonic way to do this?
Example:
setup
import numpy as np
m,n,d= 2,7,4
R=np.zeros((m,n))
C=np.arange(d*m).reshape((m,d))
S=np.array([[0,2,4,6],[3,4,5,6]])
this works:
for i in range(m):
    R[i][S[i]] = C[i]
this does not work:
R[S]=C
Your 2D example can be done as follows:
R[np.arange(m)[:, None], S] = C
# array([[ 0.,  0.,  1.,  0.,  2.,  0.,  3.],
#        [ 0.,  0.,  0.,  4.,  5.,  6.,  7.]])
The 3D case would be similar:
i, j, k = R.shape
i, j, k = np.ogrid[:i, :j, :k]    # ogrid takes slices, not bare integers
R[i, j, S] = C
In ND one could write:
idx = list(np.ogrid[tuple(map(slice, R.shape))])
idx[-1] = S
R[tuple(idx)] = C    # index with a tuple; indexing with a list is deprecated
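Wrapped up as a helper (a sketch; the name assign_last_axis is mine, not from the original answer), using the example data from the setup above:
def assign_last_axis(R, S, C):
    idx = list(np.ogrid[tuple(map(slice, R.shape))])
    idx[-1] = S          # replace the trailing open grid with the column indices
    R[tuple(idx)] = C

R = np.zeros((m, n))
assign_last_axis(R, S, C)    # same result as the explicit loop above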

What's the most efficient way to increment an array by a reference while broadcasting row to column in NumPy Python? Can it be vectorized?

I have this piece of code in Python
for i in range(len(ax)):
    for j in range(len(rx)):
        x = ax[i] + rx[j]
        y = ay[i] + ry[j]
        A[x,y] = A[x,y] + 1
where
A.shape = (N, M)
ax.shape = ay.shape = (L,)
rx.shape = ry.shape = (K,)
I wanted to vectorize this or otherwise make it more efficient, i.e. faster, and if possible more economical in memory consumption. Here, ax and ay are absolute coordinates into the array A, while rx and ry are relative offsets; so I'm updating the counter array A.
My table A can be 1000x1000, while ax, ay are 100x1 and rx, ry are 300x1. The whole thing sits inside an outer loop, so preferably the optimized code should not keep allocating arrays of A's size.
This question is related to the one I asked before, but it's not directly applicable to this situation due to the way increment works. Here's an example.
This code does exactly what I want:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
for i in range(len(ax)):
    for j in range(len(rx)):
        x = ax[i] + rx[j]
        y = ay[i] + ry[j]
        print(x,y)
        A[x,y] = A[x,y] + 1
A
array([[ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  3.,  0.,  0.,  0.],
       [ 0.,  2.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])
However, the following code doesn't work: with fancy indexing, the right-hand side A[x,y] + 1 is evaluated once up front, so duplicate (x, y) pairs produce only a single increment:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0])
ry = np.array([0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
A[x,y] = A[x,y] + 1
A
array([[ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])
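A minimal demonstration of this duplicate-index behaviour, and of how np.add.at avoids it:
v = np.zeros(3)
idx = np.array([1, 1, 1])
v[idx] = v[idx] + 1    # duplicates collapse: v[1] becomes 1, not 3
np.add.at(v, idx, 1)   # unbuffered: v[1] is incremented three more times
print(v)               # [0. 4. 0.]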
This solution gives the correct numbers, but it's probably not the fastest, because np.add.at() is unbuffered:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
np.add.at(A,[x,y],1)
A
Here's one leveraging broadcasting, getting linear indices, which are then fed to the very efficient np.bincount for binned summations -
m,n = 4,5 # shape of output array
X = ax[:,None] + rx
Y = ay[:,None] + ry
Aout = np.bincount((X*n + Y).ravel(), minlength=m*n).reshape(m,n)
Alternative one with np.flatnonzero -
idx = (X*n + Y).ravel()
idx.sort()
mask = np.r_[True, idx[1:] != idx[:-1], True]              # run boundaries in the sorted indices
A.ravel()[idx[mask[:-1]]] = np.diff(np.flatnonzero(mask))  # run lengths are the counts
If you are adding into A iteratively, replace = with += there at the last step.
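To convince yourself the bincount route matches the np.add.at reference, a quick check with the example data from the question:
ref = np.zeros((m, n))
np.add.at(ref, (X, Y), 1)
assert np.array_equal(Aout, ref)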

Defining Error of an Array with Two Indices

I get an error such as:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 42, in <module>
    G[alpha][n]=compute_G(x,n)
NameError: name 'G' is not defined
Here is my code:
N = 20
N_cor = 25
N_cf = 25
a = 0.5
eps = 1.4

def update(x):
    for j in range(0,N):
        old_x = x[j]
        old_Sj = S(j,x)
        x[j] = x[j] + random.uniform(-eps,eps)
        dS = S(j,x) - old_Sj
        if dS>0 and exp(-dS)<random.uniform(0,1):
            x[j] = old_x

def S(j,x):
    jp = (j+1)%N
    jm = (j-1)%N
    return a*x[j]**2/2 + x[j]*(x[j]-x[jp]-x[jm])/a

def compute_G(x,n):
    g = 0
    for j in range(0,N):
        g = g + x[j]*x[(j+n)%N]
    return g/N

#def MCaverage(x,G):
import random
from math import exp

x=[]
for j in range(0,N):
    x.append(0.0)
    print "x(%d)=%f"%(j,x[j])
for j in range(0,5*N_cor):
    update(x)
for alpha in range(0,N_cf):
    for j in range(0,N_cor):
        update(x)
    for i in range(0,N):
        print "x(%d)=%f"%(i,x[i])
    for n in range(0,N):
        G[alpha][n]=compute_G(x,n)
for n in range(0,N):
    avg_G = 0
    for alpha in range(0,N_cf):
        avg_G = avg_G + G[alpha][n]
    avg_G = avg_G / N_cf
    print "G(%d) = %f"%(n,avg_G)
When I define G, I get another error:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 43, in <module>
    G[alpha][n]=compute_G(x,n)
IndexError: list index out of range
Here is how I define G:
...
for alpha in range(0,N_cf):
    for j in range(0,N_cor):
        update(x)
    for n in range(0,N):
        G=[][]
        G[alpha][n]=compute_G(x,n)
...
What should I do to define an array with two indices, i.e. a two-dimensional matrix?
In Python a=[] defines a list, not an array. It certainly can be used to store a lot of elements all of the same numeric type, and one can define a mapping from two integers indexing a rectangular array to one list index. It's rather going against the grain, though. Hard to program and inefficiently stored, because lists are intended as ordered collections of objects which may be of arbitrary type.
What you probably need most is a pointer to where to start reading. Here it is: learn about NumPy (http://www.numpy.org/), a Python module for typical scientific calculations with arrays of (mostly) numeric data in which all the elements are of the same type. Here is a brief taster, for after you have installed numpy.
>>> import numpy as np # importing as np is conventional
>>> p = np.zeros( (6,4) ) # two dimensional, 24 elements in total
>>> for i in range(4): p[i,i]=1
>>> p
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
numpy arrays are efficient ways of manipulating as much data as you can fit into your computer's RAM.
Like Python's rarely-used array.array type, numpy stores its data in compact, typed memory buffers, and it supplies the support code you would otherwise have to write yourself. Not least, when your arrays have millions or billions of elements, you can't afford the inefficiency of inner loops over their indices in an interpreted language like Python. NumPy offers row-, column- and array-level operations whose underlying code is compiled and optimized, so it runs considerably faster.
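Applied to the code in the question, a minimal fix (a sketch, assuming the loop structure shown earlier) is to preallocate G with one row per configuration before the loops, instead of trying G=[][]:
import numpy as np
G = np.zeros((N_cf, N))    # N_cf rows (one per configuration), N columns
for alpha in range(0, N_cf):
    for j in range(0, N_cor):
        update(x)
    for n in range(0, N):
        G[alpha][n] = compute_G(x, n)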
