How to create a binary matrix with some given condition below: - python

For a given list of tuples L whose elements are taken from range(n), I want to create a binary matrix A of order n in the following way:
If (i,j) or (j,i) in L then A[i][j]=1 otherwise A[i][j]=0.
Let us consider the following example:
L=[(2,3),(0,1),(1,3),(2,0),(0,3)]
A=[[0]*4]*4
for i in range(4):
    for j in range(4):
        if (i,j) or (j,i) in L:
            A[i][j]=1
        else:
            A[i][j]=0
print A
This program does not give the correct result. Where is the logical mistake?

You should use a 3rd party library, numpy, for matrix calculations.
Python lists of lists are inefficient for numeric arrays.
import numpy as np
L = [(2,3),(0,1),(1,3),(2,0),(0,3)]
A = np.zeros((4, 4))
idx = np.r_[L].T
A[idx[0], idx[1]] = 1
Result:
array([[ 0., 1., 0., 1.],
[ 0., 0., 0., 1.],
[ 1., 0., 0., 1.],
[ 0., 0., 0., 0.]])
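Note that the snippet above sets only A[i][j] for each (i, j) in L, whereas the question also asks for A[j][i] to be set. A minimal sketch of the symmetric version, assuming an integer dtype is acceptable (the names n, rows and cols are introduced here for illustration):

```python
import numpy as np

L = [(2, 3), (0, 1), (1, 3), (2, 0), (0, 3)]
n = 4

A = np.zeros((n, n), dtype=int)

# Split the list of pairs into a row-index array and a column-index array.
rows, cols = np.array(L).T

# Set both (i, j) and (j, i) so the matrix is symmetric.
A[rows, cols] = 1
A[cols, rows] = 1

print(A)
```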
Related: Why NumPy instead of Python lists?

According to Aran-Fey's correction, the answer is:
L=[(2,3),(0,1),(1,3),(2,0),(0,3)]
#A=[[0]*4]*4  # wrong: repeats one row object four times
A=[[0]*4 for _ in range(4)]  # four independent rows
for i in range(4):
    for j in range(4):
        # each membership test needs its own "in L"
        if (i,j) in L or (j,i) in L:
            A[i][j]=1
        else:
            A[i][j]=0
print(A)
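The two mistakes in the original program can be reproduced in isolation. The following is only an illustrative sketch; the variable names (aliased, independent, always_true, intended) are made up for this example:

```python
# Pitfall 1: [[0]*4]*4 repeats a reference to ONE inner list four times.
aliased = [[0] * 4] * 4
aliased[0][0] = 1
print(aliased)      # every row shows the change: all rows are the same object

independent = [[0] * 4 for _ in range(4)]
independent[0][0] = 1
print(independent)  # only the first row changes

# Pitfall 2: "x or y in L" parses as "x or (y in L)".
# A non-empty tuple is truthy, so the condition holds for every (i, j).
L = [(2, 3)]
always_true = bool((0, 0) or (0, 0) in L)
intended = (0, 0) in L or (0, 0) in L
print(always_true, intended)
```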

Related

PyTorch add to tensor at indices with degenerate indices

This question may be seen as an extension to this one.
I have two 1D tensors, counts and idx. counts has length 20 and stores the number of occurrences of events that fall into one of 20 bins. idx is very long; each entry is an integer identifying which of the 20 events occurred, and each event can occur multiple times. I'd like a vectorized or very fast way to add the number of times event i occurs in idx to the i-th bucket in counts. Ideally, the solution would also work on batches of counts and idx tensors during a training loop.
My first thought was to simply use this strategy of indexing counts with idx:
counts = torch.zeros(5)
idx = torch.tensor([1,1,1,2,3])
counts[idx] += 1
But it did not work, with counts ending at
tensor([0., 1., 1., 1., 0.])
instead of the desired
tensor([0., 3., 1., 1., 0.])
What's the fastest way I can do this? My next best guess is
for i in range(20):
    counts[i] += (idx == i).sum()
Consider the following approach using torch.bincount, which counts the frequency of each value in a tensor of non-negative integers (its only constraint).
import torch
EVENT_TYPES = 20
counts = torch.zeros(EVENT_TYPES)
events = torch.tensor([1, 1, 1, 2, 3, 9])
batch_counts = torch.bincount(events, minlength=EVENT_TYPES)
print(counts + batch_counts)
Result:
tensor([0., 3., 1., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
You can do this for every batch without leaving torch tensors. The minlength argument of bincount controls the number of event types; here it is 20, as described in the question.
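For the batched case mentioned in the question, one option is Tensor.scatter_add_, which accumulates repeated indices instead of overwriting them. This is a sketch under the assumption that counts has shape (batch, bins) and idx has shape (batch, events); the sizes and values below are made up for illustration:

```python
import torch

NUM_BINS = 5  # illustrative; the question uses 20

# One row of counts and one row of event indices per batch element.
counts = torch.zeros(2, NUM_BINS)
idx = torch.tensor([[1, 1, 1, 2, 3],
                    [0, 0, 4, 4, 4]])

# For every batch row b and position j: counts[b, idx[b, j]] += 1.
# Repeated indices are accumulated rather than written once.
counts.scatter_add_(1, idx, torch.ones_like(idx, dtype=counts.dtype))

print(counts)
```

Calling this once per training step keeps a running total in counts without a Python loop over bins.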

Create identity matrices with arbitrary shape with numpy

Is there a faster / inbuilt way to generate identity matrices with arbitrary shape in the first dimensions and an identity in the last m dimensions?
import numpy as np
base_shape = (10, 11, 12)
n_dim = 4
# m = 2
frames2d = np.zeros(base_shape + (n_dim, n_dim))
for i in range(n_dim):
    frames2d[..., i, i] = 1
# m = 3
frames3d = np.zeros(base_shape + (n_dim, n_dim, n_dim))
for i in range(n_dim):
    frames3d[..., i, i, i] = 1
Approach #1
We can leverage np.einsum for a diagonal view inspired by this post and hence assign 1s there for our desired output. So, for say the m=3 case, after initializing with zeros, we can simply do -
diag_view = np.einsum('...iii->...i',frames3d)
diag_view[:] = 1
Generalizing to include those input params, it would be -
def ndeye_einsum(base_shape, n_dim, m):
    out = np.zeros(list(base_shape) + [n_dim]*m)
    diag_view = np.einsum('...'+'i'*m+'->...i',out)
    diag_view[:] = 1
    return out
So, to reproduce those same arrays, it would be -
frames2d = ndeye_einsum(base_shape, n_dim, m=2)
frames3d = ndeye_einsum(base_shape, n_dim, m=3)
Approach #2
Again, from the same linked post, we can also reshape to 2D and assign into step-sized sliced array along the cols, like so -
def ndeye_reshape(base_shape, n_dim, m):
    N = (n_dim**np.arange(m)).sum()
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out.reshape(-1,n_dim**m)[:,::N] = 1
    return out
This again works on a view and hence should be as efficient as approach #1.
Approach #3
Another way would be to use integer-based indexing. So, for example for assigning into frames3d in one-go, it would be -
I = np.arange(n_dim)
frames3d[..., I, I, I] = 1
Generalizing that becomes -
def ndeye_ellipsis_indexer(base_shape, n_dim, m):
    I = np.arange(n_dim)
    indexer = tuple([Ellipsis]+[I]*m)
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out[indexer] = 1
    return out
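As a sanity check, the three approaches can be compared against the explicit loop from the question. The sketch below restates each function compactly so that it runs on its own; ndeye_loop and the small base_shape are additions made only for this comparison:

```python
import numpy as np

def ndeye_loop(base_shape, n_dim, m):
    # Reference: the explicit loop from the question, generalized to any m.
    out = np.zeros(tuple(base_shape) + (n_dim,) * m)
    for i in range(n_dim):
        out[(Ellipsis,) + (i,) * m] = 1
    return out

def ndeye_einsum(base_shape, n_dim, m):
    out = np.zeros(tuple(base_shape) + (n_dim,) * m)
    np.einsum('...' + 'i' * m + '->...i', out)[:] = 1  # writable diagonal view
    return out

def ndeye_reshape(base_shape, n_dim, m):
    step = (n_dim ** np.arange(m)).sum()  # flat distance between diagonal entries
    out = np.zeros(tuple(base_shape) + (n_dim,) * m)
    out.reshape(-1, n_dim ** m)[:, ::step] = 1
    return out

def ndeye_ellipsis_indexer(base_shape, n_dim, m):
    I = np.arange(n_dim)
    out = np.zeros(tuple(base_shape) + (n_dim,) * m)
    out[(Ellipsis,) + (I,) * m] = 1
    return out

base_shape, n_dim = (2, 3), 4
candidates = (ndeye_einsum, ndeye_reshape, ndeye_ellipsis_indexer)
results = {}
for m in (2, 3):
    ref = ndeye_loop(base_shape, n_dim, m)
    results[m] = all(np.array_equal(ref, f(base_shape, n_dim, m)) for f in candidates)
print(results)
```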
Extending to higher-dims with view
The dims along base_shape are basically replications of elements from the last m dims. As such, we can get those higher dims as a higher-dim array view with np.broadcast_to. We basically create an m-dim identity array and then broadcast-view it into higher dims. This is applicable across all three approaches posted earlier. To demonstrate how to use it with the einsum-based solution, we would have -
# Create m-dim "trailing-base" array, basically a m-dim identity array
def ndeye_einsum_trailingbase(n_dim, m):
    out = np.zeros([n_dim]*m)
    diag_view = np.einsum('i'*m+'->...i',out)
    diag_view[:] = 1
    return out
def ndeye_einsum_view(base_shape, n_dim, m):
    trail_base = ndeye_einsum_trailingbase(n_dim, m)
    return np.broadcast_to(trail_base, list(base_shape) + [n_dim]*m)
Thus, again we would have, e.g. -
frames3d = ndeye_einsum_view(base_shape, n_dim, m=3)
This would be a view into an m-dim array and hence efficient in both memory and performance.
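One caveat worth illustrating: the array returned by np.broadcast_to is a read-only view whose leading axes all point at the same memory. The sketch below builds a small 3-dim identity with integer indexing (the names trail, view and writable are chosen here for illustration) and takes a copy where a writable array is needed:

```python
import numpy as np

base_shape, n_dim = (10, 11, 12), 4

# 3-dim "identity": ones where all three indices agree.
trail = np.zeros((n_dim,) * 3)
I = np.arange(n_dim)
trail[I, I, I] = 1

view = np.broadcast_to(trail, base_shape + trail.shape)

print(view.shape)            # (10, 11, 12, 4, 4, 4)
print(view.strides[:3])      # (0, 0, 0): the leading axes reuse the same memory
print(view.flags.writeable)  # False

# A copy materializes the full array and can be modified independently.
writable = view.copy()
writable[0, 0, 0, 0, 0, 0] = 5.0
```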
One way to get an identity matrix along the last two dimensions of the array is to use np.broadcast_to and specify the shape the resulting ndarray should have (this does not generalize to higher dimensions):
base_shape = (10, 11, 12)
n_dim = 4
frame2d = np.broadcast_to(np.eye(n_dim), base_shape+(n_dim,)*2)
print(frame2d.shape)
# (10, 11, 12, 4, 4)
print(frame2d)
array([[[[[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]],
[[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]],
...

What's the most efficient way to increment an array by a reference while broadcasting row to column in NumPy Python? Can it be vectorized?

I have this piece of code in Python
for i in range(len(ax)):
    for j in range(len(rx)):
        x = ax[i] + rx[j]
        y = ay[i] + ry[j]
        A[x,y] = A[x,y] + 1
where
A.shape = (N,M)
ax.shape = ay.shape = (L)
rx.shape = ry.shape = (K)
I want to vectorize it or otherwise make it faster and, if possible, more economical in memory. Here, ax and ay refer to the absolute positions of elements in array A, while rx and ry are relative coordinates. So I'm updating the counter array A.
My table A can be 1000x1000, while ax, ay are 100x1 and rx, ry are 300x1. The whole thing is inside a loop, so preferably the optimized code shouldn't keep creating big temporary tables of A's size.
This question is related to the one I asked before, but it's not directly applicable to this situation due to the way increment works. Here's an example.
This code does exactly what I want:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
for i in range(len(ax)):
    for j in range(len(rx)):
        x = ax[i] + rx[j]
        y = ay[i] + ry[j]
        print(x,y)
        A[x,y] = A[x,y] + 1
A
array([[ 0., 1., 0., 0., 0.],
[ 0., 3., 0., 0., 0.],
[ 0., 2., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
However, the following code doesn't work: with array indexing, the right-hand side A[x,y] + 1 is computed once before the assignment, so a position that appears several times in (x, y) is incremented only once:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0])
ry = np.array([0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
A[x,y] = A[x,y] + 1
A
array([[ 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
This solution produces the correct numbers, but it is probably not the fastest, because np.add.at() is unbuffered:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
np.add.at(A,(x,y),1)
A
Here's one leveraging broadcasting, getting linear indices, which are then fed to the very efficient np.bincount for binned summations -
m,n = 4,5 # shape of output array
X = ax[:,None] + rx
Y = ay[:,None] + ry
Aout = np.bincount((X*n + Y).ravel(), minlength=m*n).reshape(m,n)
An alternative with np.flatnonzero -
idx = (X*n + Y).ravel()
idx.sort()
mask = np.r_[True,idx[1:] != idx[:-1],True]
A.ravel()[idx[mask[:-1]]] = np.diff(np.flatnonzero(mask))
If you are adding into A iteratively, replace = with += there at the last step.
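To check that the vectorized variants agree with the original double loop, the small example from the question can be run through all three. This is a sketch using that example data; the names A_loop, A_at and A_bin are introduced here:

```python
import numpy as np

m, n = 4, 5  # shape of the counter array
ax = np.arange(1, 3)
ay = np.array([1, 1])
rx = np.array([-1, 0, 0])
ry = np.array([0, 0, 0])

# All absolute coordinates at once: shape (len(ax), len(rx)).
X = ax[:, None] + rx
Y = ay[:, None] + ry

# Reference: increment one position at a time.
A_loop = np.zeros((m, n))
for x, y in zip(X.ravel(), Y.ravel()):
    A_loop[x, y] += 1

# Unbuffered in-place add: repeated positions are counted every time.
A_at = np.zeros((m, n))
np.add.at(A_at, (X, Y), 1)

# Linear indices fed to bincount, then reshaped back to 2D.
A_bin = np.bincount((X * n + Y).ravel(), minlength=m * n).reshape(m, n)

print(A_bin)
```

Like the bincount answer itself, this assumes every computed coordinate is non-negative and inside the array.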

How to append numpy.array to other numpy.array?

I want to create a 2D numpy.array knowing at the beginning only its shape, i.e. shape=2. Then, in a for loop, I want to create one-dimensional numpy.arrays and add them to the main matrix of shape=2, so I'll get something like this:
matrix=
[numpy.array 1]
[numpy.array 2]
...
[numpy.array n]
How can I achieve that? I try to use:
matrix = np.empty(shape=2)
for i in np.arange(100):
    array = np.zeros(random_value)
    matrix = np.append(matrix, array)
But as a result of print(np.shape(matrix)), after loop, I get something like:
(some_number, )
How can I append each new array in the next row of the matrix? Thank you in advance.
I would suggest working with a list
matrix = []
for i in range(10):
    a = np.ones(2)
    matrix.append(a)
matrix = np.array(matrix)
A list does not have the downside of being copied in memory every time you use append, so you avoid the problem described by ali_m. At the end of your operation you just convert the list object into a numpy array.
I suspect the root of your problem is the meaning of 'shape' in np.empty(shape=2)
If I run a small version of your code
matrix = np.empty(shape=2)
for i in np.arange(3):
    array = np.zeros(3)
    matrix = np.append(matrix, array)
I get
array([ 9.57895902e-259, 1.51798693e-314, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000])
See those 2 odd numbers at the start? Those are produced by np.empty(shape=2). That matrix starts as a (2,) shaped array, not an empty 2d array. append just adds sets of 3 zeros to that, resulting in a (11,) array.
Now if you started with a 2d array with the right number of columns, and did concatenate on the 1st dimension, you would get a multirow array (rows only have meaning in 2d or larger).
mat=np.zeros((1,3))
for i in range(1,3):
    mat = np.concatenate([mat, np.ones((1,3))*i],axis=0)
produces:
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
A better way of doing an iterative construction like this is with list append
alist = []
for i in range(0,3):
    alist.append(np.ones((1,3))*i)
mat=np.vstack(alist)
alist is:
[array([[ 0., 0., 0.]]), array([[ 1., 1., 1.]]), array([[ 2., 2., 2.]])]
mat is
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
With vstack you can get by with np.ones((3,)), since it turns all of its inputs into 2d arrays.
append would work, but it also requires the axis=0 parameter, and 2 arrays. It gets misused, often by mistaken analogy to the list append. It is just another front end to concatenate. So I prefer not to use it.
Notice that other posters assumed your random value changed during the iteration. That would produce arrays of differing lengths. For 1d appending that would still produce the long 1d array. But a 2d append wouldn't work, because a 2d array can't be ragged.
mat = np.zeros((2,),int)
for i in range(4):
    mat=np.append(mat,np.ones((i,),int)*i)
# array([0, 0, 1, 2, 2, 3, 3, 3])
The function you are looking for is np.vstack
Here is a modified version of your example. It starts from an array with zero rows and two columns, because np.empty(shape=2) would contribute one extra row of uninitialized values:
import numpy as np
matrix = np.empty((0, 2))
for i in np.arange(3):
    array = np.zeros(2)
    matrix = np.vstack((matrix, array))
The result is
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
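When the number of rows and the row length are known up front, which is an assumption about the use case here, the matrix can also be preallocated and filled in place, avoiding repeated copies. The sketch below shows this next to the list-then-stack pattern from the answers above; n_rows, n_cols and the np.full rows are placeholders for whatever the loop actually computes:

```python
import numpy as np

n_rows, n_cols = 3, 2

# Option 1: preallocate the full matrix and fill it row by row.
matrix = np.empty((n_rows, n_cols))
for i in range(n_rows):
    matrix[i] = np.full(n_cols, i)  # stand-in for the row computed in the loop

# Option 2: collect rows in a list and stack once at the end.
rows = [np.full(n_cols, i) for i in range(n_rows)]
stacked = np.vstack(rows)

print(matrix.shape, stacked.shape)
```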

List as element of list of lists or multidimensional lists as a grid

I am trying to create a lat/lon grid that contains an array of found indices where two conditions are met for a lat/lon combination. This approach might be too complicated, but using a meshgrid or numpy broadcasting also failed. If there is a better approach, feel free to share your knowledge. :-)
Round lat/lon values to gridsize resolution of 1° but retain full length of array:
x = np.around(lon, decimals=0)
y = np.around(lat, decimals=0)
The arrays consist of longitude/latitude values from -180° to 180° and -82° to 82°; duplicates are possible.
Check for each combination of lat/lon how many measurements are available for 1°/1° grid point:
a = np.arange(-180,181)
b = np.arange(-82,83)
totalgrid = [ [ 0 for i in range(len(b)) ] for j in range(len(a)) ]
for d1 in range(len(a)):
    for d2 in range(len(b)):
        totalgrid[d1][d2]=np.where((x==a[d1])&(y==b[d2]))[0]
This method fails and returns only a list of lists with empty arrays. I can't figure out why it's not working properly.
Replacing the last line by:
totalgrid[d1][d2]=np.where((x==a[0])&(y==b[0]))[0]
returns all found indices from lon/lat that are present at -180°/-82°. Unfortunately it takes a while. Am I missing a for loop somewhere?!
The Problem in more detail:
@askewchan
Unfortunately this one does not solve my original problem.
As expected the result represents the groundtrack quite well.
But besides the fact that I need the total number of points for each grid point, I also need each single index of lat/lon combinations in the lat/lon array for further computations.
Let's assume I have an array
lat(100000L,), lon(100000L,) and a third one array(100000L,)
which corresponds to the measurement at each point. I need every index of all 1°/1° combinations in lat/lon, to check at this index in the array(100000L,) whether a condition is met. Now let's assume that the indices [10000,10001,10002,..,10025] of lat/lon are on the same grid point. For those indices I need to check whether array[10000,10001,10002,..,10025] meets a condition, i.e. np.where(array==0). With cts.nonzero() I only get the index in the histogram, but then all information about each point contributing to the value of the histogram is lost. Hopefully this explains my initial problem.
Not sure if I understand the goal here, but you want to count how many lat/lon pairs you have in each 1° section? This is what a histogram does:
lon = np.random.random(5000)*2*180 - 180
lat = np.random.random(5000)*2*82 - 82
a = np.arange(-180,181)
b = np.arange(-82,83)
np.histogram2d(lon, lat, (a,b))
#(array([[ 0., 0., 1., ..., 0., 0., 0.],
# [ 0., 2., 0., ..., 0., 0., 1.],
# [ 0., 0., 0., ..., 0., 1., 0.],
# ...,
# [ 0., 1., 0., ..., 0., 0., 0.],
# [ 0., 0., 0., ..., 0., 0., 0.],
# [ 0., 0., 0., ..., 0., 0., 0.]]),
The indices where you have a nonzero count (with cts being the first element returned by np.histogram2d) would be at:
cts.nonzero()
#(array([ 0, 0, 0, ..., 359, 359, 359]),
# array([ 2, 23, 25, ..., 126, 140, 155]))
You can plot it too:
from matplotlib import pyplot
cts, xs, ys = np.histogram2d(lon, lat, (a,b))
pyplot.imshow(cts, extent=(-82,82,-180,180))
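The histogram gives counts only. If the indices of the points in each cell are needed as well, as in the follow-up above, one possible approach is to give every point an integer cell id, sort by it, and split the sorted indices at the cell boundaries. This is a sketch, not the original answerer's method; it assumes rounding to the nearest degree as in the question, and all names (cell, order, cell_indices) are invented here:

```python
import numpy as np

rng = np.random.default_rng(0)
lon = rng.uniform(-180, 180, 5000)
lat = rng.uniform(-82, 82, 5000)

# Integer cell coordinates on the 1° grid (same rounding rule as np.around).
ix = np.rint(lon).astype(int) + 180  # 0 .. 360
iy = np.rint(lat).astype(int) + 82   # 0 .. 164
n_lat = 165
cell = ix * n_lat + iy               # one integer id per grid cell

# Sort point indices by cell id, then cut the sorted run at each new id.
order = np.argsort(cell, kind="stable")
occupied, first = np.unique(cell[order], return_index=True)
groups = np.split(order, first[1:])

# Map (d1, d2) grid position -> indices into lon/lat for that cell.
cell_indices = {divmod(int(c), n_lat): g for c, g in zip(occupied, groups)}

print(len(cell_indices), "occupied cells")
```

With a measurement array of the same length as lon and lat, the values for one cell would then be measurement[cell_indices[(d1, d2)]], where measurement is a hypothetical name for the third array.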
