Creating a special matrix in numpy - python

[a b c      ]
[  a b c    ]
[    a b c  ]
[      a b c]
Hello,
For my economics course we are supposed to create an array that looks like this. The problem is that I am an economist, not a programmer. We are using numpy in Python. Our professor says college is not preparing us for the real world and wants us to learn programming (which is a good thing). We are not allowed to use any packages and must come up with original code. Does anybody out there have any idea how to make this matrix? I have spent hours trying code and browsing the internet looking for help and have been unsuccessful.

This kind of matrix is called a Toeplitz matrix or constant diagonal matrix. Knowing this leads you to scipy.linalg.toeplitz:
import scipy.linalg
scipy.linalg.toeplitz([1, 0, 0, 0], [1, 2, 3, 0, 0, 0])
=>
array([[1, 2, 3, 0, 0, 0],
[0, 1, 2, 3, 0, 0],
[0, 0, 1, 2, 3, 0],
[0, 0, 0, 1, 2, 3]])
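Since the assignment rules out extra packages, the same constant-diagonal structure can probably be built with NumPy alone. The sketch below is only an illustration: the helper name toeplitz_like is hypothetical, and it relies on nothing more than the fact that entry (i, j) of a Toeplitz matrix depends only on j - i.

```python
import numpy as np

def toeplitz_like(first_col, first_row):
    """Hypothetical helper: Toeplitz matrix from its first column and first row."""
    c = np.asarray(first_col)
    r = np.asarray(first_row)
    # One value per diagonal, ordered from the bottom-left diagonal to the
    # top-right one: c[m-1], ..., c[1], r[0], ..., r[n-1]
    diag_values = np.concatenate((c[:0:-1], r))
    i, j = np.indices((len(c), len(r)))
    # Entry (i, j) lies on diagonal j - i; shift by len(c) - 1 to index diag_values.
    return diag_values[j - i + len(c) - 1]

result = toeplitz_like([1, 0, 0, 0], [1, 2, 3, 0, 0, 0])
print(result)
```

As with scipy's version, the first element of the row argument is the one used for the main diagonal.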

The method below fills one diagonal at a time:
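import numpy as np
x = np.zeros((4, 6), dtype=int)  # np.int was removed in NumPy 1.24; the builtin int works everywhere
for i, v in enumerate((6, 7, 8)):
    np.fill_diagonal(x[:, i:], v)  # x[:, i:] is a view, so this fills the i-th superdiagonal of x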
array([[6, 7, 8, 0, 0, 0],
[0, 6, 7, 8, 0, 0],
[0, 0, 6, 7, 8, 0],
[0, 0, 0, 6, 7, 8]])
or you could do the one liner:
x = [6,7,8,0,0,0]
y = np.vstack([np.roll(x,i) for i in range(4)])
Personally, I prefer the first since it's easier to understand and probably faster since it doesn't build all the temporary 1D arrays.
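One more difference between the two approaches is worth a quick check: np.roll wraps around, so the one-liner only gives a clean band while the shifted values stay inside the row. The sketch below uses a 5 x 6 example (my own choice of size, not from the question) to compare the two.

```python
import numpy as np

x = [6, 7, 8, 0, 0, 0]

# Five rows with np.roll: in the last row the band runs off the right edge
# and the 8 wraps around to the first column.
rolled = np.vstack([np.roll(x, i) for i in range(5)])

# Five rows with fill_diagonal: values that would fall outside are simply dropped.
filled = np.zeros((5, 6), dtype=int)
for i, v in enumerate((6, 7, 8)):
    np.fill_diagonal(filled[:, i:], v)

print(rolled[4])
print(filled[4])
```

For the 4 x 6 matrix in the question both approaches agree, so this only matters if you change the number of rows.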
Edit:
Since a discussion of efficiency has come up, it might be worthwhile to run a test. I also included timing for the toeplitz method suggested by chthonicdaemon (although personally I interpreted the question to exclude this approach since it uses a package rather than original code, and speed isn't the point of the original question either).
import numpy as np
import timeit
import scipy.linalg as sl
vals = (6, 7, 8)  # band values; this definition was missing from the snippet
def a(m, n):
    # fill one diagonal at a time
    x = np.zeros((m, n), dtype=int)
    for i, v in enumerate(vals):
        np.fill_diagonal(x[:,i:], v)
    return x
def b(m, n):
    # stack rolled copies of the first row
    x = np.zeros((n,))
    x[:3] = vals
    y = np.vstack([np.roll(x,i) for i in range(m)])
    return y
def c(m, n):
    # scipy's toeplitz, built from the first column y and the first row x
    x = np.zeros((n,))
    x[:3] = vals
    y = np.zeros((m,))
    y[0] = vals[0]
    r = sl.toeplitz(y, x)
    return r
m, n = 4, 6
print(timeit.timeit("a(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("b(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("c(m,n)", "from __main__ import np, c, sl, m, n", number=1000))
m, n = 1000, 1006
print(timeit.timeit("a(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("b(m,n)", "from __main__ import np, a, b, m, n", number=1000))
print(timeit.timeit("c(m,n)", "from __main__ import np, c, sl, m, n", number=100))  # note: 100 runs, not 1000
# which gives:
0.03525209 # fill_diagonal
0.07554483 # vstack
0.07058787 # toeplitz
0.18803215 # fill_diagonal
2.58780789 # vstack
1.57608604 # toeplitz
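So the first method is about 2-3x faster for small arrays and 10-20x faster for larger arrays (note that the large toeplitz run above used number=100, so its total is not directly comparable with the other two).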

This is a simplified tridiagonal matrix, so it is essentially the same as the question of how to build a tridiagonal matrix:
import numpy as np
def tridiag(a, b, c, k1=-1, k2=0, k3=1):
    # a, b, c are the sub-, main and superdiagonal
    return np.diag(a, k1) + np.diag(b, k2) + np.diag(c, k3)
a = [1, 1]; b = [2, 2, 2]; c = [3, 3]
A = tridiag(a, b, c)
print(A)
Result:
array([[2, 3, 0],
[1, 2, 3],
[0, 1, 2]])
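Note that np.diag always produces a square matrix, while the matrix in the question is 4 x 6. A possible variation on the same "sum of diagonals" idea, sketched here with np.eye (which accepts separate row and column counts plus a diagonal offset k), would be the following; the names rows, cols and band_values are just for illustration.

```python
import numpy as np

rows, cols = 4, 6
band_values = (1, 2, 3)

# np.eye(rows, cols, k=i) has ones on the i-th superdiagonal of a rows x cols
# matrix, so scaling and summing them stacks the three diagonals.
A = sum(v * np.eye(rows, cols, k=i, dtype=int) for i, v in enumerate(band_values))
print(A)
```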

Something along the lines of
import numpy as np
def createArray(theinput, rotations):
    l = [theinput]
    for i in range(1, rotations):
        l.append(l[i-1][:])         # copy the previous row
        l[i].insert(0, l[i].pop())  # rotate the copy right by one position
    return np.array(l)
print(createArray([1,2,3,0,0,0],4))
"""
[[1 2 3 0 0 0]
[0 1 2 3 0 0]
[0 0 1 2 3 0]
[0 0 0 1 2 3]]
"""

If you care about efficiency, it is hard to beat this:
import numpy as np
def create_matrix(diags, n):
    diags = np.asarray(diags)
    m = np.zeros((n,n+len(diags)-1), diags.dtype)
    s = m.strides
    # v[k, i] aliases m[i, i+k], so row k of v is the k-th superdiagonal of m
    v = np.lib.stride_tricks.as_strided(
        m,
        (len(diags),n),
        (s[1],sum(s)))
    v[:] = diags[:,None]
    return m
print(create_matrix(['a','b','c'], 8))
Might be a little over your head, but then again that's good inspiration ;)
Or even better: a solution which has both O(n) storage and runtime requirements, rather than all the other solutions posted thus far, which are O(n^2)
import numpy as np
def create_matrix(diags, n):
    diags = np.asarray(diags)
    b = np.zeros(len(diags)+n*2, diags.dtype)
    b[n:][:len(diags)] = diags
    s = b.strides[0]
    # v[i, j] aliases b[n - i + j]: every row is a window into the same 1D buffer
    v = np.lib.stride_tricks.as_strided(
        b[n:],
        (n,n+len(diags)-1),
        (-s,s))
    return v
print(create_matrix(np.arange(1,4), 8))
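On recent NumPy versions (1.20 or later, as far as I know) a similar O(n)-storage view can be sketched without computing strides by hand, using sliding_window_view. This is an alternative I am swapping in, not the method above; the function name create_matrix_windows is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def create_matrix_windows(diags, n):
    """Hypothetical helper: n rows, each a window into one padded 1D buffer."""
    diags = np.asarray(diags)
    k = len(diags)
    pad = np.zeros(n - 1, diags.dtype)
    buf = np.concatenate((pad, diags, pad))        # zeros, band values, zeros
    windows = sliding_window_view(buf, n + k - 1)  # shape (n, n + k - 1)
    # Window 0 has the band at the far right, so reverse the row order.
    return windows[::-1]

m = create_matrix_windows(np.arange(1, 4), 4)
print(m)
```

The result is a read-only view, which is arguably a feature here: since all rows share one buffer, writing to one element would silently change a whole diagonal.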

This is an old question, however some new input can always be useful.
I create tridiagonal matrices in python using list comprehension.
Say a matrix that is symmetric around "-2" and has a "1" on either side:
           -2  1  0
Tsym(3) =>  1 -2  1
            0  1 -2
This can be created using the following "one liner" (range replaces the Python 2 xrange):
Tsym = lambda n: [ [ 1 if (i+1==j or i-1==j) else -2 if j==i else 0 for i in range(n) ] for j in range(n)] # Symmetric tridiagonal matrix (1,-2,1)
A different case (that several of the other people answering have solved perfectly fine) is:
             1 2 3 0 0 0
Tgen(4,6) => 0 1 2 3 0 0
             0 0 1 2 3 0
             0 0 0 1 2 3
It can be made using the one liner shown below.
Tgen = lambda n,m: [ [ 1 if i==j else 2 if i==j+1 else 3 if i==j+2 else 0 for i in range(m) ] for j in range(n)] # General tridiagonal matrix (1,2,3)
Feel free to modify to suit your specific needs. These matrices are very common when modelling physical systems and I hope this is useful to someone (other than me).

Hello, since your professor asked you not to import any external package, and most answers use numpy or scipy, you are better off using only Python lists to create a 2D array (a nested list) and then populating its diagonals with the items you wish. Find the code below:
def create_matrix(rows=4, cols=6):
    mat = [[0 for col in range(cols)] for row in range(rows)]  # create a matrix of zeros of size (rows, cols)
    for row in range(len(mat)):  # number of lists in the main list
        for col in range(len(mat[0])):  # number of items in sub-list 0; all sub-lists have the same length
            if row == col:
                mat[row][col] = "a"
            if col == row+1:
                mat[row][col] = "b"
            if col == row+2:
                mat[row][col] = "c"
    return mat
create_matrix(4, 6)
[['a', 'b', 'c', 0, 0, 0],
[0, 'a', 'b', 'c', 0, 0],
[0, 0, 'a', 'b', 'c', 0],
[0, 0, 0, 'a', 'b', 'c']]
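If the band might later hold more (or fewer) than three values, the chain of if statements can be replaced by a single index calculation. This is only a sketch of that idea in plain Python; the name banded_rows and the choice to derive the column count from the inputs are my own assumptions.

```python
def banded_rows(values, rows):
    """Hypothetical helper: nested list with `values` shifted one column per row."""
    cols = rows + len(values) - 1
    return [
        # Column c of row r holds values[c - r] when that offset is inside the band.
        [values[c - r] if 0 <= c - r < len(values) else 0 for c in range(cols)]
        for r in range(rows)
    ]

matrix = banded_rows(["a", "b", "c"], 4)
for row in matrix:
    print(row)
```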

Creating Band Matrix
Check out the definition on Wikipedia:
https://en.wikipedia.org/wiki/Band_matrix
You can use this function to create band matrices such as a diagonal matrix with offset=0, a tridiagonal matrix (the one you are asking about) with offset=1, or a pentadiagonal matrix with offset=2:
import numpy as np
def band(size=10, ones=False, low=0, high=100, offset=2):
    shape = (size, size)
    n_matrix = np.random.randint(low, high, shape) if not ones else np.ones(shape,dtype=int)
    n_matrix = np.triu(n_matrix, -1*offset)  # zero everything below the offset-th subdiagonal
    n_matrix = np.tril(n_matrix, offset)     # zero everything above the offset-th superdiagonal
    return n_matrix
In your case you should use this
rand_tridiagonal = band(size=6,offset=1)
print(rand_tridiagonal)
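The matrix in the original question is not square and has its band only on and above the main diagonal, so a variation with separate lower and upper bandwidths may be closer to what was asked. The following is a rough sketch along the same triu/tril lines; band_mask and its parameters are hypothetical names.

```python
import numpy as np

def band_mask(rows, cols, lower=0, upper=2):
    """Hypothetical helper: ones where -lower <= j - i <= upper, zeros elsewhere."""
    ones = np.ones((rows, cols), dtype=int)
    # tril(..., upper) keeps j - i <= upper; triu(..., -lower) keeps j - i >= -lower.
    return np.triu(np.tril(ones, upper), -lower)

mask = band_mask(4, 6)
print(mask)
```

Multiplying such a mask elementwise with any 4 x 6 array would keep only the entries inside the band.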

Related

I want to take the XOR of all the elements of 1 list with another. How do I do it? [duplicate]

This question already has answers here:
How do you get the logical xor of two variables in Python?
(28 answers)
Closed 3 years ago.
I have a bunch of lists in the form of say [0,0,1,0,1...], and I want to take the XOR of 2 lists and give the output as a list.
Like:
[ 0, 0, 1 ] XOR [ 0, 1, 0 ] -> [ 0, 1, 1 ]
res = []
tmp = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
        tmp = [i[index] ^ j[index] for index in range(len(i))]
        res.append(tmp)
The size of each of my lists / vectors is around 3500 elements, so I need something to save time, since this piece of code is taking more than 20 mins to run.
I have 3085 lists, each of which needs an XOR operation with 4089 other lists.
How do I do this without iterating through each list explicitly?
Use map:
import operator
answer = list(map(operator.xor, lst1, lst2))
or zip:
answer = [x ^ y for x,y in zip(lst1, lst2)]
If you need something faster, consider using NumPy instead of Python lists to hold your data.
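As a minimal sketch of what the NumPy route could look like for a single pair (lst1 and lst2 are the same placeholder names as above, and uint8 is my assumption for 0/1 data):

```python
import numpy as np

lst1 = [0, 0, 1]
lst2 = [0, 1, 0]

# Convert once, then XOR whole arrays elementwise with the ^ operator.
a = np.array(lst1, dtype=np.uint8)
b = np.array(lst2, dtype=np.uint8)
answer = (a ^ b).tolist()
print(answer)  # [0, 1, 1]
```

The conversion itself costs time, so the gain comes from converting all vectors once up front rather than inside the loop.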
Assuming a and b are the same size you can use the xor operation (i.e. ^) with simple list indexing:
a = [0, 0, 1]
b = [0, 1, 1]
c = [a[index] ^ b[index] for index in range(len(a))]
print(c) # [0, 1, 0]
or you can use zip with the xor:
a = [0, 0, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip(a, b)]
print(c) # [0, 1, 0]
zip will only go to the shortest list (if they are not the same size). If they are not the same size and you want to go to the longer list you can use zip_longest:
from itertools import zip_longest
a = [0, 0, 1, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip_longest(a, b, fillvalue=0)]
print(c) # [0, 1, 0, 1]
Using numpy you should see some performance gains. The function you need is bitwise_xor, like so:
import numpy as np
results = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
        results.append(np.bitwise_xor(i, j))
A proof of concept:
a = [1,0,0,1,1]
b = [1,1,0,0,1]
x = np.bitwise_xor(a,b)
print("a\tb\tres")
for i in range(len(a)):
    print("{}\t{}\t{}".format(a[i], b[i], x[i]))
output:
a b res
1 1 0
0 1 1
0 0 0
1 0 1
1 1 0
Edit
Note that if your arrays have the same size, you can simply do one operation and the bitwise_xor will still work, so:
a = [[1,1,0], [0,0,1]]
b = [[0,1,0], [1,0,1]]
res = np.bitwise_xor(a, b)
will still work, and you'll have:
res: [[1, 0, 0], [1, 0, 0]]
In your case, a workaround would possibly be:
results = []
n = len(Course_Specific_Vocabulary_Dict['Binary Vector'])
for a in Employee_Specific_Vocabulary_Dict['Binary Vector']:
    # Get an array of the same size as Course_Specific_Vocabulary_Dict['Binary Vector']
    repeated_a = np.repeat([a], n, axis=0)
    results.append(np.bitwise_xor(repeated_a, Course_Specific_Vocabulary_Dict['Binary Vector']))
However, I don't know if that would actually improve performance; that remains to be checked. It will certainly require some more memory.
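The explicit np.repeat is probably not needed, since broadcasting can do the same pairing without materialising the repeated copies. Memory is the real constraint: with the sizes from the question, the full result would hold 3085 x 4089 x 3500 values, roughly 44 GB even at one byte each, so the sketch below processes a chunk of rows at a time. The function name xor_all_pairs, the chunk parameter and the uint8 dtype are my assumptions.

```python
import numpy as np

def xor_all_pairs(employees, courses, chunk=256):
    """Yield (start, block) with block[i, j] = employees[start + i] XOR courses[j]."""
    E = np.asarray(employees, dtype=np.uint8)
    C = np.asarray(courses, dtype=np.uint8)
    for start in range(0, len(E), chunk):
        # (chunk, 1, L) ^ (1, n_courses, L) broadcasts to (chunk, n_courses, L)
        yield start, E[start:start + chunk, None, :] ^ C[None, :, :]

employees = [[0, 0, 1], [1, 1, 0]]
courses = [[0, 1, 0], [1, 1, 1], [0, 0, 0]]
blocks = list(xor_all_pairs(employees, courses, chunk=1))
for start, block in blocks:
    print(start, block.tolist())
```

Whether this beats the nested loop in practice would need to be measured on the real data.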

how to do this operation in numpy (chaining of tiling operation)?

I'm trying to generate a numpy array quickly, if possible without looping in Python.
I want to build a 1D index numpy array that would take this as an input:
[2,3] and this [2,4] and would return this
[0,1,0,1,0,1,2,0,1,2,0,1,2,0,1,2]
Explanation:
I iterate from 0 to 2 (so [0,1] array) and repeat it 2 times : [0,1,0,1]
Then I iterate from 0 to 3 (so [0,1,2] array) and repeat it 4 times : [0,1,2,0,1,2,0,1,2,0,1,2]
Then I flattened everything.
Is there a way to do this fully in numpy?
For now I'm building each table separately in numpy by using np.tile() and flattening everything afterwards but I feel like there is a more efficient way that would only translate to C functions calls and no python
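Here is a vectorized solution:
import numpy as np
def cycles(spec):
    steps = np.repeat(*spec)      # cycle length, repeated once per cycle
    ps = steps.cumsum()           # end position of each cycle
    psj = np.zeros(ps[-1], int)
    psj[ps[:-1]] = steps[:-1]     # at each new cycle start, jump back by the previous cycle's length
    return np.arange(ps[-1]) - psj.cumsum()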
Demo:
>>> cycles(((2,3),(2,4)))
array([0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
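Because the cumsum trick is not obvious at first sight, it may help to compare it against a straightforward np.tile version. The check below is a sketch; cycles_reference is a hypothetical name, and the second spec is an extra test case of my own.

```python
import numpy as np

def cycles(spec):
    steps = np.repeat(*spec)
    ps = steps.cumsum()
    psj = np.zeros(ps[-1], int)
    psj[ps[:-1]] = steps[:-1]
    return np.arange(ps[-1]) - psj.cumsum()

def cycles_reference(spec):
    """Slow but obvious version: tile each range, then concatenate."""
    lengths, repeats = spec
    return np.concatenate([np.tile(np.arange(n), r) for n, r in zip(lengths, repeats)])

for spec in (((2, 3), (2, 4)), ((3, 1), (1, 2))):
    print(np.array_equal(cycles(spec), cycles_reference(spec)))
```

Both specs assume every cycle length is at least 1; a zero-length cycle is not covered by this sketch.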
I am not entirely sure if this is what you want; here each tuple in the call to func() contains first the range and then the repeat.
import numpy
def func(tups):
    Arr = numpy.empty(numpy.sum([ele[0] * ele[1] for ele in tups]), dtype=int)
    i = 0
    for ele in tups:
        Arr[i:i + ele[0] * ele[1]] = numpy.tile(numpy.arange(ele[0]), ele[1])
        i += ele[0] * ele[1]
    return Arr
arr = func([(2, 3), (3, 4)])
print(arr)
# [0 1 0 1 0 1 0 1 2 0 1 2 0 1 2 0 1 2]

Creating multidimensional matrix in python using for loop

I'm slowly learning the differences between MATLAB and Python, and wanted to know how I could do the following, which was done in MATLAB, in Python instead:
Ak = zeros(3,3,N);
for t = 1:N
    Ak(:,:,t) = [
        a(t) 0 0;
        0 a(t) 0;
        0 0 a(t);
    ];
end
Where a(t) is just a vector with N elements. Any help would be great. Thanks!
You can use NumPy for matrix calculation. Here is an example.
import numpy as np
N = 256
a = np.arange(N)
Ak = np.zeros((3,3,N))
for t in range(N):
    Ak[:,:,t] = np.array([[a[t], 0, 0],
                          [0, a[t], 0],
                          [0, 0, a[t]]])
If you use Ak with different dimension order, like [N, 3, 3], you can simplify the code a little.
import numpy as np
N = 256
a = np.arange(N)
Ak = np.zeros((N,3,3))
for ak, _a in zip(Ak, a):
    ak[:, :] = np.array([[_a, 0, 0],
                         [0, _a, 0],
                         [0, 0, _a]])
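Since every slice is just a(t) times the 3 x 3 identity, the loop can likely be dropped altogether with broadcasting. A small sketch, using the same N and a as above (the names Ak_first and Ak_last are mine):

```python
import numpy as np

N = 256
a = np.arange(N)

# Shape (N, 3, 3): one scaled identity matrix per element of a.
Ak_first = a[:, None, None] * np.eye(3)

# Shape (3, 3, N), matching the MATLAB layout Ak(:,:,t).
Ak_last = np.eye(3)[:, :, None] * a

print(Ak_first.shape, Ak_last.shape)
```

Both results are float arrays because np.eye returns floats by default, just like np.zeros in the loop versions.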

Given distances and values array, return sorted filtered values in numpy

I am not sure what the title of this question should be. But lets say we have 2 arrays, values and distances.
values = np.array([[-1,-1,-1],
[1, 2, 0],
[-1,-1,-1]])
distances = np.array([[1,2,3],
[6,5,4],
[7,8,9]])
I would like to get the values that are non-negative, ordered by their corresponding distances in the distances array.
So with the example above, the non-negative values are [1,2,0] and their distances are [6,5,4]. Thus, sorting by the corresponding distance, I would like to have [0,2,1] as the answer.
My code is below. It works, but I would like a solution that uses only numpy. I'm sure that would be more efficient than this:
import numpy as np
import heapq
def get_sorted_values(seek_val, values, distances):
    r, c = np.where(values >= seek_val)
    di = distances[r, c]
    vals = values[r, c]
    print("di", di)
    print("vals", vals)
    if len(di) >= 1:
        heap = []
        for d, v in zip(di,vals):
            heapq.heappush(heap, (d,v))
        lists = []
        while heap:
            d, v = heapq.heappop(heap)
            lists.append(v)
        return lists
    else:
        ## NOTHING FOUND
        return None
Input:
seek_val = 0
values = np.array([[-1,-1,-1],
[1,2,0],
[-1,-1,-1]])
distances = np.array([[1,2,3],
[6,5,4],
[7,8,9]])
print("Ans:",get_sorted_values(seek_val, values, distances))
Output:
di [6 5 4]
vals [1 2 0]
Ans: [0, 2, 1]
"one liner":
values[np.where(values >= 0)][np.argsort(distances[np.where(values >= 0)])]
Out[981]: array([0, 2, 1])
Repeating np.where(values >= 0) is inefficient; if values is big, store the result in a variable:
v_indx = np.where(values >= 0)
values[v_indx][np.argsort(distances[v_indx])]
Try np.argsort
import numpy as np
values = np.array([[-1,-1,-1],
[ 1, 2, 0],
[-1,-1,-1]])
distances = np.array([[1, 2, 3],
[6, 5, 4],
[7, 8, 9]])
print(values[values >= 0])
# [1 2 0]
print(distances[values >= 0])
# [6 5 4]
print('Ans:', values[values >= 0][np.argsort(distances[values >= 0])])
# Ans: [0 2 1]
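Wrapped up as a function with the same seek_val parameter as the question's code, this might look like the sketch below. The name sorted_by_distance is hypothetical, and kind="stable" is my addition so that values with equal distances keep their original order.

```python
import numpy as np

def sorted_by_distance(values, distances, seek_val=0):
    mask = values >= seek_val                          # compute the boolean mask once
    order = np.argsort(distances[mask], kind="stable")
    return values[mask][order]

values = np.array([[-1, -1, -1],
                   [ 1,  2,  0],
                   [-1, -1, -1]])
distances = np.array([[1, 2, 3],
                      [6, 5, 4],
                      [7, 8, 9]])
print(sorted_by_distance(values, distances))
```

Unlike the original function, this returns an empty array rather than None when nothing matches.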

Replace values in subarray based upon dynamic condition in Numpy

I have a Python Numpy array that is a 2D array where the second dimension is a subarray of 3 elements of integers. For example:
[ [2, 3, 4], [9, 8, 7], ... [15, 14, 16] ]
For each subarray I want to replace the lowest number with a 1 and all other numbers with a 0. So the desired output from the above example would be:
[ [1, 0, 0], [0, 0, 1], ... [0, 1, 0] ]
This is a large array, so I want to exploit Numpy performance. I know about using conditions to operate on array elements, but how do I do this when the condition is dynamic? In this instance the condition needs to be something like:
newarray = (a == min(a)).astype(int)
But how do I do this across each subarray?
You can specify the axis parameter to compute the minimum of each row, and keep the dimension of the result so that it is a 2D array. Comparing a against that array then gives True at the minimum position of each subarray:
(a == a.min(1, keepdims=True)).astype(int)
#array([[1, 0, 0],
# [0, 0, 1],
# [0, 1, 0]])
How about this?
import numpy as np
a = np.random.random((4,3))
i = np.argmin(a, axis=-1)
out = np.zeros(a.shape, int)
out[np.arange(out.shape[0]), i] = 1
print(a)
print(out)
Sample output:
# [[ 0.58321885 0.18757452 0.92700724]
# [ 0.58082897 0.12929637 0.96686648]
# [ 0.26037634 0.55997658 0.29486454]
# [ 0.60398426 0.72253012 0.22812904]]
# [[0 1 0]
# [0 1 0]
# [1 0 0]
# [0 0 1]]
It appears to be marginally faster than the direct approach:
from timeit import timeit
def dense():
    return (a == a.min(1, keepdims=True)).astype(int)
def sparse():
    i = np.argmin(a, axis=-1)
    out = np.zeros(a.shape, int)
    out[np.arange(out.shape[0]), i] = 1
    return out
for shp in ((4,3), (10000,3), (100,10), (100000,1000)):
    a = np.random.random(shp)
    d = timeit(dense, number=40)/40
    s = timeit(sparse, number=40)/40
    print('shape, dense, sparse, ratio', '({:6d},{:6d}) {:9.6g} {:9.6g} {:9.6g}'.format(*shp, d, s, d/s))
Sample run:
# shape, dense, sparse, ratio ( 4, 3) 4.22172e-06 3.1274e-06 1.34992
# shape, dense, sparse, ratio ( 10000, 3) 0.000332396 0.000245348 1.35479
# shape, dense, sparse, ratio ( 100, 10) 9.8944e-06 5.63165e-06 1.75693
# shape, dense, sparse, ratio (100000, 1000) 0.344177 0.189913 1.81229
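Speed aside, the two approaches are not quite interchangeable when a row contains its minimum more than once, which can easily happen with integer data like in the question. A small sketch with a made-up array that has a tie in the first row:

```python
import numpy as np

a = np.array([[2, 2, 5],
              [9, 8, 7]])

# Comparison against the row minimum marks every position that ties for it.
dense = (a == a.min(1, keepdims=True)).astype(int)

# argmin returns only the first minimum, so exactly one position per row is marked.
i = np.argmin(a, axis=-1)
sparse = np.zeros(a.shape, int)
sparse[np.arange(a.shape[0]), i] = 1

print(dense)
print(sparse)
```

Which behaviour is wanted depends on whether exactly one 1 per row is a requirement.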
