How to generate a modified version of the identity matrix in Python

I want to generate a modified version of the identity matrix, call it C, such that Cii is zero up to some index i and 1 for the rest of the diagonal.
I can set each Cii to 0 by brute force, but that does not seem like a good approach.
Is there an efficient function I can use? This is hard to search for.
Example:
The original 3 x 3 identity matrix is
1 0 0
0 1 0
0 0 1
and I want to change it into:
0 0 0
0 1 0
0 0 1
Here i is 0: I want to set Ckk to 0 for every k in [0, i].

np.diag makes a 2d array from a 1d diagonal:
In [97]: np.diag((np.arange(6)>2).astype(int))
Out[97]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]])
This is basically the same as Paul Panzer's answer, but it generates the diagonal a different way. Similar speed.

Here is one possibility:
N = 5
k = 2
np.diag(np.bincount([k],None,N).cumsum())
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1]])
Update: fast solution:
out = np.zeros((N,N))
out.reshape(-1)[(N+1)*k::N+1] = 1
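To see why the strided assignment works, here is a minimal sketch that wraps it in a function. The name partial_identity and the parameter n_zero (the number of leading diagonal entries to zero out) are hypothetical, introduced only for illustration.

```python
import numpy as np

def partial_identity(n, n_zero):
    """Return an n x n identity whose first n_zero diagonal entries are 0."""
    out = np.zeros((n, n))
    # In the flattened array, diagonal element (r, r) sits at index r * (n + 1).
    # Starting at n_zero * (n + 1) and stepping by n + 1 visits the remaining
    # diagonal entries. reshape(-1) is a view here because out is contiguous.
    out.reshape(-1)[(n + 1) * n_zero::n + 1] = 1
    return out

c = partial_identity(5, 2)
print(c)
```

If n_zero equals n, the slice is empty and the result is simply all zeros.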

You can build an NxN identity matrix and assign zero to the top left KxK corner:
N,K = 10,3
im = np.identity(N)
im[:K,:K] = 0
print(im)
output:
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
This is 40% faster than hpaulj's answer, but not as fast as Paul Panzer's fast solution (which is 3x faster than this).
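Relative timings like these depend on the machine, the NumPy version, and the matrix size, so it may be worth measuring them yourself. The sketch below is one way to do that with timeit; the function names via_diag, via_block, and via_flat are hypothetical labels for the three approaches on this page.

```python
import timeit
import numpy as np

N, K = 10, 3

def via_diag():
    # Build the diagonal as a 1d array, then expand it with np.diag.
    return np.diag((np.arange(N) >= K).astype(float))

def via_block():
    # Start from the identity and zero the top-left K x K corner.
    im = np.identity(N)
    im[:K, :K] = 0
    return im

def via_flat():
    # Write ones directly onto the tail of the diagonal of a zero matrix.
    out = np.zeros((N, N))
    out.reshape(-1)[(N + 1) * K::N + 1] = 1
    return out

results = {f.__name__: timeit.timeit(f, number=10_000)
           for f in (via_diag, via_block, via_flat)}
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s for 10,000 calls")
```

All three functions return the same matrix, so only the measured times differ.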

Related

How do I create a binary matrix with a specific repeating pattern of 1s and 0s?

I want to efficiently build and print a matrix in Python whose rows follow a repeating pattern: one column holds 1 for three consecutive rows while every other column is 0, then the 1 moves to the next column, and so on for 1000 rows, as shown below:
100000
100000
100000
010000
010000
010000
001000
001000
001000
000100
000100
000100
000010
000010
000010
000001
000001
000001
100000
100000
100000
010000
010000
010000
...
First, you can create a diagonal matrix of size (6, 6) with only ones on the diagonal:
>>> arr = np.diag(np.ones(6))
Then, you can repeat each row of that matrix 3 times:
>>> arr = np.repeat(arr, repeats=3, axis=0)
>>> arr
[[1. 0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0. 0.]
[0. 0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 1.]
[0. 0. 0. 0. 0. 1.]]
Finally, use np.tile to tile this matrix the number of times you want. In your case, as you want 1000 rows, you can repeat the array 1000 // 18 + 1 = 56 times, and only keep the first 1000 rows.
>>> arr = np.tile(arr, (56, 1))[:1000]
Build an identity matrix, and then take out the matrix you need by generating the row indices (elegant but inefficient):
>>> np.eye(6, dtype=int)[np.arange(1000) // 3 % 6]
array([[1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
...,
[0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0]])
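As a rough check that the two answers agree, the sketch below wraps the repeat-and-tile steps in a function and compares the result with the index-based one-liner. The name striped_rows and its parameters are hypothetical; the ceiling division replaces the hand-computed 56.

```python
import numpy as np

def striped_rows(n_rows, n_cols=6, run=3):
    """Repeat each identity row `run` times, tile, and cut to n_rows rows."""
    block = np.repeat(np.eye(n_cols, dtype=int), run, axis=0)
    reps = -(-n_rows // len(block))  # ceiling division: enough tiles to cover n_rows
    return np.tile(block, (reps, 1))[:n_rows]

tiled = striped_rows(1000)
indexed = np.eye(6, dtype=int)[np.arange(1000) // 3 % 6]
print(tiled.shape, np.array_equal(tiled, indexed))
```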

Is there a way in Python to get a sub matrix as in Matlab?

For example, let's say that I have the following matrices in Matlab:
A = zeros(10)
B = ones(2,2)
I want to add the matrix A with B in specific positions of A that are stored like this:
locations = [1, 3]
I can do this:
A(locations, locations) = A(locations, locations) + B
So the job is done. In Python, I would like to do the same using NumPy arrays, like:
import numpy as np
A = np.zeros([10,10])
B = np.ones([2,2])
locations = np.array([0, 2]) #Because NumPy arrays are zero indexed
A[locations, locations] = A[locations, locations] + B
But I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: shape mismatch: value array of shape (2,2) could not be broadcast to indexing result of shape (2,)
Does anyone know how I can do this?
In [126]: A = np.zeros((5,5),int)
In [127]: B = np.arange(1,5).reshape(2,2)
In [128]: idx = np.array([0,2])
In NumPy, indexing with two 1d arrays selects a 'diagonal', here the points (0,0) and (2,2). In MATLAB you would have to use something like sub2ind to convert the 2d indices to 1d to get that.
In [129]: A[idx,idx]
Out[129]: array([0, 0])
To get a block (as MATLAB does) we have to take advantage of broadcasting:
In [130]: A[idx[:,None],idx]
Out[130]:
array([[0, 0],
[0, 0]])
In [131]: A[idx[:,None],idx]=B
In [132]: A
Out[132]:
array([[1, 0, 2, 0, 0],
[0, 0, 0, 0, 0],
[3, 0, 4, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
This, and other indexing details, is covered in
https://numpy.org/doc/stable/reference/arrays.indexing.html
https://numpy.org/doc/stable/user/basics.broadcasting.html
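For the addition in the original question, np.ix_ builds the same broadcastable pair of index arrays as idx[:,None], idx. The sketch below reuses the question's variable names; the name block is introduced here only for illustration.

```python
import numpy as np

A = np.zeros((10, 10))
B = np.ones((2, 2))
locations = np.array([0, 2])  # zero-based version of MATLAB's [1, 3]

# np.ix_ returns index arrays of shape (2, 1) and (1, 2), which broadcast
# to the 2 x 2 block of rows {0, 2} and columns {0, 2}.
block = np.ix_(locations, locations)
A[block] += B
print(A[:3, :3])
```

This in-place addition assumes the entries of locations are unique; with repeated indices, += on a fancy-indexed array does not accumulate.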
A = np.zeros([10,10])
B = np.ones([2,2])
print(B)
A[:2,:2]=B
print(A)
#output B
[[1. 1.]
[1. 1.]]
#output A
[[1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

Add 2-d array to 3-d array with constantly changing index fast

I'm trying to add a 2-d array to a 3-d array at a constantly changing index. I came up with the following code:
import numpy as np
a = np.zeros([8, 3, 5])
k = 0
for i in range(2):
    for j in range(4):
        a[k, i: i + 2, j: j + 2] += np.ones([2, 2], dtype=int)
        k += 1
print(a)
which gives exactly what I want:
[[[1. 1. 0. 0. 0.]
[1. 1. 0. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 1. 1. 0. 0.]
[0. 1. 1. 0. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 1. 1. 0.]
[0. 0. 1. 1. 0.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 1. 1.]
[0. 0. 0. 1. 1.]
[0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[1. 1. 0. 0. 0.]
[1. 1. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 1. 1. 0. 0.]
[0. 1. 1. 0. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 1. 1. 0.]
[0. 0. 1. 1. 0.]]
[[0. 0. 0. 0. 0.]
[0. 0. 0. 1. 1.]
[0. 0. 0. 1. 1.]]]
I would like it to be faster, so I created an array of indices and tried np.vectorize. But as the manual says, vectorize is not meant for performance. My goal is to run this over an array of shape (10^6, 15, 15), which means 10^6 iterations. I hope there is a cleaner solution that gets rid of all the for-loops.
This is my first time using Stack Overflow; any suggestions are appreciated.
Thank you.
An efficient solution uses numpy.lib.stride_tricks, which can "view" all the possible positions.
import numpy as np
from numpy.lib.stride_tricks import as_strided
N = 4  # tray size (square)
P = 3  # chunk size
R = N - P
tray = np.zeros((N, N), np.int32)
chunk = np.ones((P, P), np.int32)
tray[R:, R:] = chunk
tray = np.vstack((tray, tray))
# strides are given in bytes; 4 is the itemsize of int32
view = as_strided(tray, shape=(R+1, R+1, N, N), strides=(4*N, 4, 4*N, 4))
a_view = view.reshape(-1, N, N)
a_hard = a_view.copy()
Here is the result :
In [3]: a_view
Out[3]:
array([[[0, 0, 0, 0],
[0, 1, 1, 1],
[0, 1, 1, 1],
[0, 1, 1, 1]],
[[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0]],
[[0, 1, 1, 1],
[0, 1, 1, 1],
[0, 1, 1, 1],
[0, 0, 0, 0]],
[[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0],
[0, 0, 0, 0]]])
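view is just a view of the possible positions of a chunk on the tray. It doesn't cost any computation, and it just uses twice the tray space. Note that for the sizes above, view.reshape(-1, N, N) cannot merge the first two axes without copying, so a_view already holds its own copy of the data.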
a_hard is a hard copy, necessary if you need to modify it.
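A different technique from the strided view above, sketched here as an alternative rather than as the answer's method, is to vectorize the question's loop with broadcast integer indexing. It assumes the question's layout (8 slabs of shape 3 x 5, a 2 x 2 block of ones); the names n_i, n_j, size, offs, and reference are introduced only for illustration.

```python
import numpy as np

n_i, n_j, size = 2, 4, 2             # loop bounds and block size from the question
k = np.arange(n_i * n_j)             # slab index, 0..7
i, j = np.divmod(k, n_j)             # same (i, j) order as the nested loops
offs = np.arange(size)

a = np.zeros((n_i * n_j, 3, 5))
rows = (i[:, None] + offs)[:, :, None]   # shape (8, 2, 1): rows i..i+1 per slab
cols = (j[:, None] + offs)[:, None, :]   # shape (8, 1, 2): cols j..j+1 per slab
# The three index arrays broadcast to (8, 2, 2); every (k, row, col) triple
# is distinct, so the in-place += touches each target element exactly once.
a[k[:, None, None], rows, cols] += 1

# Reference result from the original loop, for comparison.
reference = np.zeros((n_i * n_j, 3, 5))
kk = 0
for ii in range(n_i):
    for jj in range(n_j):
        reference[kk, ii:ii + size, jj:jj + size] += 1
        kk += 1
print(np.array_equal(a, reference))
```

Unlike the strided view, this writes into an ordinary array, at the cost of materializing index arrays of the broadcast shape.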

Quickly fill large Numpy matrix from Pandas DataFrame

I have a DataFrame df holding the x coordinates, y coordinates, and values used to fill a NumPy matrix mat.
Example of a smaller df:
y  x  x  x  x  value  value  value  value
1  6  3  6  4    100     10    300     15
1  6  2  8  7     50    200     35     70
5  7  5  4  6      2     50     40    400
7  5  3  2  1    105     80     35     44
I want to fill mat = np.zeros(shape=(10,10)) so that each y is the column index, each x is the row index, and the entry is the value at the same position as x in the value block. For example:
col=1, row=6, value=100 ###
col=1, row=3, value=10
col=1, row=6, value=300 ###
col=1, row=4, value=15
col=1, row=6, value=50 ###
If more than one value goes into the same position (like the lines marked ###), take the average. Is there any way to go directly from Pandas to the matrix (or another quick way)?
What I can do now is call np.ravel on the selected columns of the dataframe to make 1D arrays and fill from those, but it is slow and highly redundant.
Construct row and column indices and assign with integer-array indexing.
val = df.values
j = val[:, 0].repeat(4)
i = val[:, 1: 5].ravel()
v = val[:, 5:].ravel()
mat = np.zeros(shape=(10,10), dtype=int)
mat[i, j] = v
mat
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 44, 0, 0],
[ 0, 200, 0, 0, 0, 0, 0, 35, 0, 0],
[ 0, 10, 0, 0, 0, 0, 0, 80, 0, 0],
[ 0, 15, 0, 0, 0, 40, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 50, 0, 105, 0, 0],
[ 0, 50, 0, 0, 0, 400, 0, 0, 0, 0],
[ 0, 70, 0, 0, 0, 2, 0, 0, 0, 0],
[ 0, 35, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
For averages
val = df.values
j = val[:, 0].repeat(4)
i = val[:, 1: 5].ravel()
v = val[:, 5:].ravel()
sums = np.bincount(i * 10 + j, v, 100)
cnts = np.bincount(i * 10 + j, minlength=100)
mask = cnts > 0
sums[mask] /= cnts[mask]
print(sums.reshape(10, 10))
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 44. 0. 0.]
[ 0. 200. 0. 0. 0. 0. 0. 35. 0. 0.]
[ 0. 10. 0. 0. 0. 0. 0. 80. 0. 0.]
[ 0. 15. 0. 0. 0. 40. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 50. 0. 105. 0. 0.]
[ 0. 150. 0. 0. 0. 400. 0. 0. 0. 0.]
[ 0. 70. 0. 0. 0. 2. 0. 0. 0. 0.]
[ 0. 35. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
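An alternative to flattening the indices for np.bincount is to accumulate directly into 2d arrays with np.add.at, which handles repeated index pairs. The sketch below uses a plain NumPy array as a stand-in for df.values (an assumption, so that it runs without pandas); the names rows, cols, vals, cnts, and avg are introduced for illustration.

```python
import numpy as np

# Stand-in for df.values from the example table: y, four x's, four values.
val = np.array([
    [1, 6, 3, 6, 4, 100, 10, 300, 15],
    [1, 6, 2, 8, 7, 50, 200, 35, 70],
    [5, 7, 5, 4, 6, 2, 50, 40, 400],
    [7, 5, 3, 2, 1, 105, 80, 35, 44],
])
cols = val[:, 0].repeat(4)           # each y is repeated once per x
rows = val[:, 1:5].ravel()
vals = val[:, 5:].ravel().astype(float)

sums = np.zeros((10, 10))
cnts = np.zeros((10, 10))
np.add.at(sums, (rows, cols), vals)  # unbuffered add: duplicates accumulate
np.add.at(cnts, (rows, cols), 1)

# Divide only where at least one value landed; other cells stay 0.
avg = np.divide(sums, cnts, out=np.zeros_like(sums), where=cnts > 0)
print(avg)
```

np.add.at is usually slower than np.bincount on large inputs, so this is mainly a readability trade-off.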

1D Numpy array does not get reshaped into a 2D array

columns = np.shape(lines)[0] # Gets x-axis dimension of array lines (to get numbers of columns)
lengths = np.zeros(shape=(2,1)) # Create a 2D array
# lengths = [[ 0.]
# [ 0.]]
lengths = np.arange(columns).reshape((columns)) # Makes array have the same number of columns as columns and fills it with elements going up from zero <--- This line seems to be turning it into a 1D array
Output after printing lengths array:
print(lengths)
[0 1 2]
Expected Output Example:
print(lengths)
[[0 1 2]] # Notice the double square bracket
As a result I cannot enter data into the 2D part of the array, because it no longer exists:
np.append(lengths, 65, axis=1)
AxisError: axis 1 is out of bounds for array of dimension 1
I want the array to be 2D so I can store "IDs" on the first row and values on the second (at a later point in the program). I'm also aware that I could add another row to the array instead of doing it at initialization. But I'd rather not do that since I heard that's inefficient and this program's success is highly dependent on performance.
Thank you.
Since you eventually want a 2d array with ids in one row and values in the second, I'd suggest starting with the right size
In [535]: arr = np.zeros((2,10),int)
In [536]: arr
Out[536]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [537]: arr[0,:]=np.arange(10)
In [538]: arr
Out[538]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Sure you could start with a 1 row array of ids, but adding that 2nd row at a later time requires making a new array anyways. np.append is just a variation on np.concatenate.
But to make a 2d array from arange I like:
In [539]: np.arange(10)[None,:]
Out[539]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
reshape also works, but has to be given the correct shape, e.g. (1,10).
In:
lengths = np.zeros(shape=(2,1)) # Create a 2D array
lengths = np.arange(columns).reshape((columns))
the 2nd lengths assignment replaces the first. You have to do an indexed assignment as I did with arr[0,:] to modify an existing array. lengths[0,:] = np.arange(10) wouldn't work because lengths only has 1 column, not 10. Assignments like this require correct pairing of dimensions.
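The options discussed above can be compared side by side in a small sketch. Here columns is set to 3 as a hypothetical stand-in for np.shape(lines)[0], and the names row_a, row_b, and table are introduced only for illustration.

```python
import numpy as np

columns = 3  # hypothetical stand-in for np.shape(lines)[0]

# Two ways to get a 1 x columns 2d array from arange.
row_a = np.arange(columns)[None, :]
row_b = np.arange(columns).reshape(1, -1)   # -1 lets NumPy infer the length

# Preallocate both rows up front, then fill the id row by indexed assignment.
table = np.zeros((2, columns), dtype=int)
table[0, :] = np.arange(columns)

print(row_a.shape, row_b.shape)
print(table)
```

Preallocating table avoids building a new array later when the second row of values is added.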
You don't need 2D data to fill a column of a 2D array; 1D data is enough.
You can put the data into the 0th row instead of the 0th column if you change how the array is organized in memory. That copies the data into contiguous memory (memory without gaps), which is faster.
Program:
import numpy as np
data = np.arange(12)
#method 1
buf = np.zeros((12, 6))
buf[:,0] = data
print(buf)
#method 2
buf = np.zeros((6, 12))
buf[0] = data
print(buf)
Result:
[[ 0. 0. 0. 0. 0. 0.]
[ 1. 0. 0. 0. 0. 0.]
[ 2. 0. 0. 0. 0. 0.]
[ 3. 0. 0. 0. 0. 0.]
[ 4. 0. 0. 0. 0. 0.]
[ 5. 0. 0. 0. 0. 0.]
[ 6. 0. 0. 0. 0. 0.]
[ 7. 0. 0. 0. 0. 0.]
[ 8. 0. 0. 0. 0. 0.]
[ 9. 0. 0. 0. 0. 0.]
[ 10. 0. 0. 0. 0. 0.]
[ 11. 0. 0. 0. 0. 0.]]
[[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
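The contiguity claim can be inspected through the array flags and strides. The sketch below assumes NumPy's default C (row-major) order and float64 data; the names buf_cols, buf_rows, col_view, and row_view are introduced for illustration.

```python
import numpy as np

buf_cols = np.zeros((12, 6))   # method 1 layout: data goes into column 0
buf_rows = np.zeros((6, 12))   # method 2 layout: data goes into row 0

col_view = buf_cols[:, 0]      # elements are 6 * 8 = 48 bytes apart
row_view = buf_rows[0]         # elements are adjacent, 8 bytes apart

print(col_view.flags["C_CONTIGUOUS"], col_view.strides)
print(row_view.flags["C_CONTIGUOUS"], row_view.strides)
```

Whether the contiguous write is measurably faster depends on the array size, so it is worth timing on real data.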
