This was the only question I found about standard basis vectors in numpy but it's not really related to my question.
I have a numpy array of integers and I want to determine the co-occurrence matrix, which stores, for each pair of rows, the number of columns in which they hold the same value. This question describes the problem in more detail.
I have a method of solving my problem but it doesn't scale well.
My question then is this:
Is it possible to store standard basis vectors in a numpy array in a memory efficient manner?
I want to be able to do the following:
Given an array
M = e1 e2 e1
    e1 e2 e2
    e3 e1 e3
    e2 e3 e3
where ei is the transposed i-th standard basis vector of the vector space (R3 in this case), perform matrix multiplication with the transpose of M, i.e. determine np.dot(M, M.T). To be clear, the matrix M above could be written as:
M = 1 0 0   0 1 0   1 0 0
    1 0 0   0 1 0   0 1 0
    0 0 1   1 0 0   0 0 1
    0 1 0   0 0 1   0 0 1
(extra spaces added for emphasis).
The issue with representing the matrix like this is that it isn't scalable in memory with the number of rows and dimension of the vector space.
EDIT: I should mention that the number of columns can increase as well. The memory complexity is D * R * C where D is the dimension of the vector space, R is the number of rows and C is the number of columns. In an average working example I have roughly D == 150, R == 2000 and C == 1000 though R can go up to 20,000 and C is unbounded (though 10,000 is a reasonable estimate).
The rules for standard basis vector multiplication are simple (ei * ei.T == 1, ei * ej.T == 0 if i != j) so I was wondering if it's possible to store these rules in a numpy array to save memory.
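For concreteness, here is a minimal sketch of the dense approach I'm describing (the np.eye construction is just one way to build it; codes holds the 1-based basis indices, and D, R, C are as in the edit above):
import numpy as np

codes = np.array([[1, 2, 1], [1, 2, 2], [3, 1, 3], [2, 3, 3]])
D = 3
R, C = codes.shape
# Dense one-hot representation: shape (R, C*D), i.e. D * R * C entries.
# (For large C, cast to a wider dtype before the dot to avoid uint8 overflow.)
M = np.eye(D, dtype=np.uint8)[codes - 1].reshape(R, C * D)
print(np.dot(M, M.T))
# [[3 2 0 0]
#  [2 3 0 0]
#  [0 0 3 1]
#  [0 0 1 3]]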
Let's encode the basis vectors with numbers: e1 -> 1, e2 -> 2, ... This allows very memory-efficient storage.
import numpy as np

M = np.array([[1, 2, 1], [1, 2, 2], [3, 1, 3], [2, 3, 3]], dtype=np.uint8)
# if there are more than 255 basis vectors, use uint16
Now we only need to implement a special dot product that works with these basis vectors. Basically we only replace the multiplication with a comparison:
def basis_dot(a, b):
    return np.sum(a[:, :, np.newaxis] == b[np.newaxis, :, :], axis=1)

print(basis_dot(M, M.T))
# [[3 2 0 0]
# [2 3 0 0]
# [0 0 3 1]
# [0 0 1 3]]
Let's verify the result:
M = np.array([[1, 0, 0, 0, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 1, 0, 0, 1, 0],
              [0, 0, 1, 1, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 1, 0, 0, 1]])
np.dot(M, M.T)
# array([[3, 2, 0, 0],
# [2, 3, 0, 0],
# [0, 0, 3, 1],
# [0, 0, 1, 3]])
A potential drawback of the approach is the large temporary array required in basis_dot: comparing an (R, C) array against its transpose broadcasts to a boolean array of shape (R, C, R). The memory requirement can be reduced by coding the loops explicitly, at the cost of performance (unless you use a JIT compiler).
# slower but more memory friendly
def basis_dot(a, b):
    out = np.empty((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            out[i, j] = np.sum(a[i, :] == b[:, j])
    return out
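If you want to stay vectorized but cap the memory use, one middle ground is to process the rows of a in blocks, so the temporary boolean array has shape block_size x C x R rather than R x C x R. A sketch (the helper name basis_dot_chunked and the block_size parameter are illustrative, not part of the answer above):
import numpy as np

def basis_dot_chunked(a, b, block_size=256):
    # same result as basis_dot, but the temporary comparison array is bounded
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.intp)
    for start in range(0, a.shape[0], block_size):
        block = a[start:start + block_size]
        out[start:start + block_size] = np.sum(
            block[:, :, np.newaxis] == b[np.newaxis, :, :], axis=1)
    return out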
So, my assumption based on your example is that you're actually working with a higher dimensionality than just 3. My other assumption is that you're not computing any basis vectors, just auto-generating the basis vectors of R^N. I'll set aside the question of exactly what you're trying to accomplish, and why you're storing vectors that you can easily auto-generate, for now.
If all of the above assumptions are accurate, then you can likely gain a lot by storing the data in a sparse format. This will only reduce storage if you have a preponderance of zeroes, but that seems like a reasonable assumption. scipy provides a number of sparse formats (see the scipy.sparse documentation); my best guess for you would be the coo_matrix class.
from scipy.sparse import coo_matrix
new_matrix = coo_matrix(<your_matrix>)
Then save the new matrix in your format of choice.
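To make this concrete, here is a sketch (my construction; the names codes, D, R, C are chosen to match the question) of how the one-hot matrix from the question could be assembled directly in COO format and the co-occurrence product computed sparsely:
import numpy as np
from scipy.sparse import coo_matrix

codes = np.array([[1, 2, 1], [1, 2, 2], [3, 1, 3], [2, 3, 3]])  # 1-based basis indices
R, C = codes.shape
D = 3
rows = np.repeat(np.arange(R), C)                # each entry contributes a single 1
cols = (np.arange(C) * D + (codes - 1)).ravel()  # block for column c, slot for its basis vector
data = np.ones(R * C, dtype=np.uint8)
M = coo_matrix((data, (rows, cols)), shape=(R, C * D)).tocsr()
print((M @ M.T).toarray())
# [[3 2 0 0]
#  [2 3 0 0]
#  [0 0 3 1]
#  [0 0 1 3]]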
I have a 2D NumPy array in Python where the second dimension is a subarray of 3 integers. For example:
[ [2, 3, 4], [9, 8, 7], ... [15, 14, 16] ]
For each subarray I want to replace the lowest number with a 1 and all other numbers with a 0. So the desired output from the above example would be:
[ [1, 0, 0], [0, 0, 1], ... [0, 1, 0] ]
This is a large array, so I want to exploit Numpy performance. I know about using conditions to operate on array elements, but how do I do this when the condition is dynamic? In this instance the condition needs to be something like:
newarray = (a == min(a)).astype(int)
But how do I do this across each subarray?
You can pass the axis parameter to min and keep the reduced dimension (keepdims=True) to get a 2D array of per-row minima; comparing a against it then gives True exactly at the minimum position of each subarray:
(a == a.min(1, keepdims=True)).astype(int)
#array([[1, 0, 0],
# [0, 0, 1],
# [0, 1, 0]])
How about this?
import numpy as np
a = np.random.random((4,3))
i = np.argmin(a, axis=-1)
out = np.zeros(a.shape, int)
out[np.arange(out.shape[0]), i] = 1
print(a)
print(out)
Sample output:
# [[ 0.58321885 0.18757452 0.92700724]
# [ 0.58082897 0.12929637 0.96686648]
# [ 0.26037634 0.55997658 0.29486454]
# [ 0.60398426 0.72253012 0.22812904]]
# [[0 1 0]
# [0 1 0]
# [1 0 0]
# [0 0 1]]
It appears to be marginally faster than the direct approach:
from timeit import timeit

def dense():
    return (a == a.min(1, keepdims=True)).astype(int)

def sparse():
    i = np.argmin(a, axis=-1)
    out = np.zeros(a.shape, int)
    out[np.arange(out.shape[0]), i] = 1
    return out

for shp in ((4, 3), (10000, 3), (100, 10), (100000, 1000)):
    a = np.random.random(shp)
    d = timeit(dense, number=40) / 40
    s = timeit(sparse, number=40) / 40
    print('shape, dense, sparse, ratio',
          '({:6d},{:6d}) {:9.6g} {:9.6g} {:9.6g}'.format(*shp, d, s, d / s))
Sample run:
# shape, dense, sparse, ratio ( 4, 3) 4.22172e-06 3.1274e-06 1.34992
# shape, dense, sparse, ratio ( 10000, 3) 0.000332396 0.000245348 1.35479
# shape, dense, sparse, ratio ( 100, 10) 9.8944e-06 5.63165e-06 1.75693
# shape, dense, sparse, ratio (100000, 1000) 0.344177 0.189913 1.81229
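One behavioural difference the benchmark doesn't show: when a row contains tied minima, the two approaches disagree. The comparison-based version marks every tied minimum, while argmin marks only the first occurrence. A small demonstration:
import numpy as np

a = np.array([[1.0, 1.0, 2.0]])                    # row with a tied minimum
print((a == a.min(1, keepdims=True)).astype(int))  # [[1 1 0]] -- all ties marked
i = np.argmin(a, axis=-1)
out = np.zeros(a.shape, int)
out[np.arange(out.shape[0]), i] = 1
print(out)                                         # [[1 0 0]] -- only the first minimum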
[a b c      ]
[  a b c    ]
[    a b c  ]
[      a b c]
Hello. For my economics course we are supposed to create an array that looks like the one above. The problem is that I am an economist, not a programmer. We are using numpy in Python. Our professor says college is not preparing us for the real world and wants us to learn programming (which is a good thing). We are not allowed to use any packages and must come up with original code. Does anybody out there have any idea how to make this matrix? I have spent hours trying code and browsing the internet looking for help and have been unsuccessful.
This kind of matrix is called a Toeplitz matrix or constant diagonal matrix. Knowing this leads you to scipy.linalg.toeplitz:
import scipy.linalg
scipy.linalg.toeplitz([1, 0, 0, 0], [1, 2, 3, 0, 0, 0])
=>
array([[1, 2, 3, 0, 0, 0],
[0, 1, 2, 3, 0, 0],
[0, 0, 1, 2, 3, 0],
[0, 0, 0, 1, 2, 3]])
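If you want to parameterize this for an arbitrary band and shape, a small wrapper along these lines should work (the helper name build_band is mine, not part of scipy):
import numpy as np
import scipy.linalg

def build_band(vals, rows, cols):
    # first column: vals[0] followed by zeros; first row: vals padded with zeros
    c = np.zeros(rows, dtype=np.asarray(vals).dtype)
    c[0] = vals[0]
    r = np.zeros(cols, dtype=c.dtype)
    r[:len(vals)] = vals
    return scipy.linalg.toeplitz(c, r)

print(build_band([1, 2, 3], 4, 6))  # same 4x6 array as above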
The method below fills one diagonal at a time:
import numpy as np

x = np.zeros((4, 6), dtype=int)
for i, v in enumerate((6, 7, 8)):
    np.fill_diagonal(x[:, i:], v)
array([[6, 7, 8, 0, 0, 0],
[0, 6, 7, 8, 0, 0],
[0, 0, 6, 7, 8, 0],
[0, 0, 0, 6, 7, 8]])
Or you could do the one-liner:
x = [6,7,8,0,0,0]
y = np.vstack([np.roll(x,i) for i in range(4)])
Personally, I prefer the first since it's easier to understand and probably faster since it doesn't build all the temporary 1D arrays.
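One caveat about the np.roll one-liner (my note, not part of the original answer): roll wraps values around rather than dropping them off the edge, so it only produces a clean band while the rightmost non-zero entry stays inside the row:
import numpy as np

x = [6, 7, 8, 0, 0, 0]
print(np.vstack([np.roll(x, i) for i in range(5)]))
# The fifth row comes out as [8 0 0 0 6 7] instead of [0 0 0 0 6 7].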
Edit:
Since a discussion of efficiency has come up, it might be worthwhile to run a test. I also included timings for the toeplitz method suggested by chthonicdaemon (although personally I interpreted the question as excluding that approach, since it uses a package rather than original code; speed isn't the point of the original question either).
import numpy as np
import timeit
import scipy.linalg as sl

vals = (6, 7, 8)

def a(m, n):
    x = np.zeros((m, m), dtype=int)
    for i, v in enumerate((6, 7, 8)):
        np.fill_diagonal(x[:, i:], v)

def b(m, n):
    x = np.zeros((n,))
    x[:3] = vals
    y = np.vstack([np.roll(x, i) for i in range(m)])

def c(m, n):
    x = np.zeros((n,))
    x[:3] = vals
    y = np.zeros((m,))
    y[0] = vals[0]
    r = sl.toeplitz(y, x)
    return r

m, n = 4, 6
print(timeit.timeit("a(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("b(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("c(m,n)", "from __main__ import np, c, sl, m, n", number=1000))

m, n = 1000, 1006
print(timeit.timeit("a(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("b(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("c(m,n)", "from __main__ import np, c, sl, m, n", number=100))
# which gives:
0.03525209 # fill_diagonal
0.07554483 # vstack
0.07058787 # toeplitz
0.18803215 # fill_diagonal
2.58780789 # vstack
1.57608604 # toeplitz
So the first method is about 2-3x faster for small arrays and 10-20x faster for larger arrays.
This is a simplified tridiagonal matrix, so it is essentially the same as this question:
def tridiag(a, b, c, k1=-1, k2=0, k3=1):
    return np.diag(a, k1) + np.diag(b, k2) + np.diag(c, k3)
a = [1, 1]; b = [2, 2, 2]; c = [3, 3]
A = tridiag(a, b, c)
print(A)
Result:
array([[2, 3, 0],
[1, 2, 3],
[0, 1, 2]])
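np.diag only builds square matrices; for the rectangular 4x6 layout in the question, the same idea can be expressed with np.eye and its k offset (a sketch, with 1, 2, 3 standing in for a, b, c):
import numpy as np

A = (1 * np.eye(4, 6, k=0, dtype=int)
     + 2 * np.eye(4, 6, k=1, dtype=int)
     + 3 * np.eye(4, 6, k=2, dtype=int))
print(A)
# [[1 2 3 0 0 0]
#  [0 1 2 3 0 0]
#  [0 0 1 2 3 0]
#  [0 0 0 1 2 3]]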
Something along the lines of
import numpy as np
def createArray(theinput, rotations):
    l = [theinput]
    for i in range(1, rotations):
        l.append(l[i-1][:])
        l[i].insert(0, l[i].pop())
    return np.array(l)
print(createArray([1,2,3,0,0,0],4))
"""
[[1 2 3 0 0 0]
[0 1 2 3 0 0]
[0 0 1 2 3 0]
[0 0 0 1 2 3]]
"""
If you care about efficiency, it is hard to beat this:
import numpy as np
def create_matrix(diags, n):
    diags = np.asarray(diags)
    m = np.zeros((n, n + len(diags) - 1), diags.dtype)
    s = m.strides
    v = np.lib.stride_tricks.as_strided(
        m,
        (len(diags), n),
        (s[1], sum(s)))
    v[:] = diags[:, None]
    return m

print(create_matrix(['a', 'b', 'c'], 8))
Might be a little over your head, but then again that's good inspiration ;)
Or even better: a solution with both O(n) storage and runtime requirements, unlike all the other solutions posted thus far, which are O(n^2):
import numpy as np

def create_matrix(diags, n):
    diags = np.asarray(diags)
    b = np.zeros(len(diags) + n * 2, diags.dtype)
    b[n:][:len(diags)] = diags
    s = b.strides[0]
    v = np.lib.stride_tricks.as_strided(
        b[n:],
        (n, n + len(diags) - 1),
        (-s, s))
    return v

print(create_matrix(np.arange(1, 4), 8))
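A caveat worth adding (my note): the O(n) version returns a strided view whose entries along each diagonal alias the same memory, so writing to one element silently changes others. Continuing from the snippet above, copy the view if you need an independent, writable matrix:
v = create_matrix(np.arange(1, 4), 8)
v[0, 0] = 99
print(v[1, 1])       # also 99: v[0, 0] and v[1, 1] point at the same element of the backing buffer
dense = np.array(v)  # materialize an ordinary, non-aliased copy before mutating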
This is an old question; however, some new input can always be useful.
I create tridiagonal matrices in Python using list comprehensions.
Say a matrix that is symmetric around "-2" and has a "1" on either side:
           -2  1  0
Tsym(3) =>  1 -2  1
            0  1 -2
This can be created using the following "one liner":
Tsym = lambda n: [[1 if (i+1 == j or i-1 == j) else -2 if j == i else 0 for i in range(n)] for j in range(n)]  # symmetric tridiagonal matrix (1, -2, 1)
A different case (which several of the other answers have solved perfectly well) is:
              1 2 3 0 0 0
Tgen(4,6) =>  0 1 2 3 0 0
              0 0 1 2 3 0
              0 0 0 1 2 3
It can be made using the one-liner shown below:
Tgen = lambda n, m: [[1 if i == j else 2 if i == j+1 else 3 if i == j+2 else 0 for i in range(m)] for j in range(n)]  # general tridiagonal matrix (1, 2, 3)
Feel free to modify to suit your specific needs. These matrices are very common when modelling physical systems and I hope this is useful to someone (other than me).
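For what it's worth, both lambdas return plain nested lists, so they can be fed straight into numpy if you want an array (a usage sketch, assuming the range-based versions above):
import numpy as np

print(np.array(Tsym(3)))
print(np.array(Tgen(4, 6)))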
Hello. Since your professor asked you not to import any external packages, while most answers here use numpy or scipy, you are better off using plain Python lists to create the 2D array (a list of lists) and then populating its diagonals with the items you want. Find the code below:
def create_matrix(rows=4, cols=6):
    mat = [[0 for col in range(cols)] for row in range(rows)]  # create a matrix of zeros of size (rows, cols)
    for row in range(len(mat)):        # number of sub-lists in the main list
        for col in range(len(mat[0])): # number of items per sub-list (all sub-lists have the same length)
            if row == col:
                mat[row][col] = "a"
            if col == row + 1:
                mat[row][col] = "b"
            if col == row + 2:
                mat[row][col] = "c"
    return mat
create_matrix(4, 6)
[['a', 'b', 'c', 0, 0, 0],
[0, 'a', 'b', 'c', 0, 0],
[0, 0, 'a', 'b', 'c', 0],
[0, 0, 0, 'a', 'b', 'c']]
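If you want the band values to be configurable rather than hard-coded, a slightly generalized sketch (the helper name create_band_matrix and the band parameter are mine) could look like this:
def create_band_matrix(rows=4, cols=6, band=("a", "b", "c")):
    mat = [[0] * cols for _ in range(rows)]  # rows x cols matrix of zeros
    for row in range(rows):
        for k, val in enumerate(band):       # place each band value k steps right of the diagonal
            if row + k < cols:
                mat[row][row + k] = val
    return mat

create_band_matrix(4, 6)  # same output as above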
Creating a Band Matrix
Check out the definition in the wiki:
https://en.wikipedia.org/wiki/Band_matrix
You can use this function to create band matrices: a diagonal matrix with offset=0, a tridiagonal matrix (the one you are asking about) with offset=1, or a pentadiagonal matrix with offset=2.
import numpy as np

def band(size=10, ones=False, low=0, high=100, offset=2):
    shape = (size, size)
    n_matrix = np.random.randint(low, high, shape) if not ones else np.ones(shape, dtype=int)
    n_matrix = np.triu(n_matrix, -1 * offset)
    n_matrix = np.tril(n_matrix, offset)
    return n_matrix
In your case you should use this
rand_tridiagonal = band(size=6,offset=1)
print(rand_tridiagonal)
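To see the band structure itself (without the random values), you can pass ones=True; the output I would expect from the function as written is:
print(band(size=6, ones=True, offset=1))
# [[1 1 0 0 0 0]
#  [1 1 1 0 0 0]
#  [0 1 1 1 0 0]
#  [0 0 1 1 1 0]
#  [0 0 0 1 1 1]
#  [0 0 0 0 1 1]]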