Defining an array with two indices in Python - NameError

I get an error such as:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 42, in <module>
    G[alpha][n]=compute_G(x,n)
NameError: name 'G' is not defined
Here is my code:
N = 20
N_cor = 25
N_cf = 25
a = 0.5
eps = 1.4

def update(x):
    for j in range(0,N):
        old_x = x[j]
        old_Sj = S(j,x)
        x[j] = x[j] + random.uniform(-eps,eps)
        dS = S(j,x) - old_Sj
        if dS>0 and exp(-dS)<random.uniform(0,1):
            x[j] = old_x

def S(j,x):
    jp = (j+1)%N
    jm = (j-1)%N
    return a*x[j]**2/2 + x[j]*(x[j]-x[jp]-x[jm])/a

def compute_G(x,n):
    g = 0
    for j in range(0,N):
        g = g + x[j]*x[(j+n)%N]
    return g/N

#def MCaverage(x,G):
import random
from math import exp

x = []
for j in range(0,N):
    x.append(0.0)
    print "x(%d)=%f" % (j,x[j])

for j in range(0,5*N_cor):
    update(x)

for alpha in range(0,N_cf):
    for j in range(0,N_cor):
        update(x)
    for i in range(0,N):
        print "x(%d)=%f" % (i,x[i])
    for n in range(0,N):
        G[alpha][n] = compute_G(x,n)

for n in range(0,N):
    avg_G = 0
    for alpha in range(0,N_cf):
        avg_G = avg_G + G[alpha][n]
    avg_G = avg_G / N_cf
    print "G(%d) = %f" % (n,avg_G)
When I define G, I get another error:
Traceback (most recent call last):
  File "C:\Users\SONY\Desktop\deneme.py", line 43, in <module>
    G[alpha][n]=compute_G(x,n)
IndexError: list index out of range
Here is how I define G:
...
for alpha in range(0,N_cf):
    for j in range(0,N_cor):
        update(x)
    for n in range(0,N):
        G=[][]
        G[alpha][n]=compute_G(x,n)
...
What should I do to define an array with two indices, i.e. a two-dimensional matrix?

In Python, a=[] defines a list, not an array. It can certainly be used to store a lot of elements all of the same numeric type, and one can define a mapping from two integers indexing a rectangular array to a single list index. It's rather going against the grain, though: hard to program and inefficiently stored, because lists are intended as ordered collections of objects which may be of arbitrary type.
What you probably need most is a direction to where to start reading. Here it is. Learn about Numpy http://www.numpy.org/, which is a Python module for use in typical scientific calculations with arrays of (mostly) numeric data in which all the elements are of the same type. Here is a brief taster, after you have installed numpy.
>>> import numpy as np # importing as np is conventional
>>> p = np.zeros( (6,4) ) # two dimensional, 24 elements in total
>>> for i in range(4): p[i,i]=1
>>> p
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
numpy arrays are efficient ways of manipulating as much data as you can fit into your computer's RAM.
Underlying numpy is Python's array.array datatype, but it is rarely used on its own. numpy is the support code that you'll usually not want to write for yourself. Not least, because when your arrays are millions or billions of elements, you can't afford the inefficiency of inner loops over their indices in an interpreted language like Python. Numpy offers you row-, column- and array-level operations whose underlying code is compiled and optimized, so it runs considerably faster.
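For the G in your code specifically, a minimal sketch (using the N and N_cf values from your question) could look like this; a plain list of lists such as G = [[0.0]*N for _ in range(N_cf)] would also work, but a numpy array keeps the two-index assignment simple:
import numpy as np

N = 20      # values taken from the question
N_cf = 25

# One row per configuration alpha, one column per n
G = np.zeros((N_cf, N))

# Inside the loops from the question this assignment is now valid:
# G[alpha][n] = compute_G(x, n)    (or, more idiomatically, G[alpha, n] = ...)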

Related

How to initialize a torch tensor of matrices

Hello, I'm trying to create a tensor that holds N matrices of size n by n. I tried to initialize it with
Q=torch.zeros(N, (n,n))
but I get the following error:
zeros(): argument 'size' must be tuple of ints, but found element of type tuple at pos 2
Also, I want to fill it later with random matrices with integer values, and I will then make them semidefinite, so I thought of the following:
for i in range(0,N):
    Q[i]=torch.randint(0,10,(n,n))
Q = Q*Q.t()
Is it correct? Is there any other faster way with a built-in command?
N matrices of n x n size are equivalent to a three-dimensional tensor of shape [N, n, n]. You can do it like so:
import torch
N = 32
n = 10
tensor = torch.randint(0, 10, size=(N, n, n))
No need to fill it with zeros to begin with, you can create it directly.
You can also iterate over dimension 0, similar to what you did:
for i in range(0, N):
    tensor[i] = tensor[i] * tensor[i].T
See @Dishin H Goyani's answer for a faster approach with permutation.
Here you are supposed to pass N, n, n to get N matrices of n by n size, as @Szymon already explained in his answer:
Q = torch.randint(0, 10, size=(N, n, n))
For the later part you can use torch.Tensor.permute to transpose the inner matrices:
Q = Q * Q.permute(0, 2, 1)
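Putting the two lines above together, a minimal runnable sketch (the sizes and the final assert are just illustrative additions to check that each matrix comes out symmetric):
import torch

# Illustrative sizes, not from the question
N, n = 4, 3

# N random integer matrices of shape n x n, created in one call
Q = torch.randint(0, 10, size=(N, n, n))

# Element-wise product with the batch transpose makes each Q[i] symmetric
Q = Q * Q.permute(0, 2, 1)

# Each matrix now equals its own transpose
assert torch.equal(Q, Q.permute(0, 2, 1))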
Use torch.empty to create an uninitialized tensor (it's faster than torch.zeros):
Q = torch.empty(N, n, n)
Initialize it:
for i in range(0, N):
    Q[i] = torch.randint(0, 10, (n, n))
Then use .permute as @Dishin H Goyani has proposed.
You can use the * operator on iterables like tuples to unpack them as positional arguments.
Here is some sample code:
>>> import torch
>>> N = 2
>>> n = 3
>>> Q = torch.zeros(N, *(n, n))
>>> Q
tensor([[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])

Create identity matrices with arbitrary shape with numpy

Is there a faster / inbuilt way to generate identity matrices with arbitrary shape in the first dimensions and an identity in the last m dimensions?
import numpy as np

base_shape = (10, 11, 12)
n_dim = 4

# m = 2
frames2d = np.zeros(base_shape + (n_dim, n_dim))
for i in range(n_dim):
    frames2d[..., i, i] = 1

# m = 3
frames3d = np.zeros(base_shape + (n_dim, n_dim, n_dim))
for i in range(n_dim):
    frames3d[..., i, i, i] = 1
Approach #1
We can leverage np.einsum for a diagonal view inspired by this post and hence assign 1s there for our desired output. So, for say the m=3 case, after initializing with zeros, we can simply do -
diag_view = np.einsum('...iii->...i',frames3d)
diag_view[:] = 1
Generalizing to include those input params, it would be -
def ndeye_einsum(base_shape, n_dim, m):
    out = np.zeros(list(base_shape) + [n_dim]*m)
    diag_view = np.einsum('...'+'i'*m+'->...i', out)
    diag_view[:] = 1
    return out
So, to reproduce those same arrays, it would be -
frames2d = ndeye_einsum(base_shape, n_dim, m=2)
frames3d = ndeye_einsum(base_shape, n_dim, m=3)
Approach #2
Again, from the same linked post, we can also reshape to 2D and assign into step-sized sliced array along the cols, like so -
def ndeye_reshape(base_shape, n_dim, m):
    N = (n_dim**np.arange(m)).sum()
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out.reshape(-1, n_dim**m)[:, ::N] = 1
    return out
This again works on a view and hence should be equally efficient as approach #1.
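A quick sanity check, assuming the ndeye_einsum and ndeye_reshape functions above and the base_shape and n_dim values from the question:
out_einsum = ndeye_einsum(base_shape, n_dim, m=3)
out_reshape = ndeye_reshape(base_shape, n_dim, m=3)
print(np.array_equal(out_einsum, out_reshape))  # True: both build the same array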
Approach #3
Another way would be to use integer-based indexing. So, for example for assigning into frames3d in one-go, it would be -
I = np.arange(n_dim)
frames3d[..., I, I, I] = 1
Generalizing that becomes -
def ndeye_ellipsis_indexer(base_shape, n_dim, m):
    I = np.arange(n_dim)
    indexer = tuple([Ellipsis] + [I]*m)
    out = np.zeros(list(base_shape) + [n_dim]*m)
    out[indexer] = 1
    return out
Extending to higher-dims with view
The dims along base_shape are basically replications of elements from the last m dims. As such, we can get those higher dims as a higher-dim array view with np.broadcast_to. We will basically create an m-dim identity array and then broadcast-view it into the higher dims. This is applicable across all three approaches posted earlier. To demonstrate how to use it on the einsum-based solution, we would have -
# Create m-dim "trailing-base" array, basically a m-dim identity array
def ndeye_einsum_trailingbase(n_dim, m):
    out = np.zeros([n_dim]*m)
    diag_view = np.einsum('i'*m+'->...i', out)
    diag_view[:] = 1
    return out

def ndeye_einsum_view(base_shape, n_dim, m):
    trail_base = ndeye_einsum_trailingbase(n_dim, m)
    return np.broadcast_to(trail_base, list(base_shape) + [n_dim]*m)
Thus, again we would have, e.g. -
frames3d = ndeye_einsum_view(base_shape, n_dim, m=3)
This would be a view into an m-dim array and hence efficient both in memory and performance.
One approach to having an identity matrix along the last two dimensions of the array is to use np.broadcast_to and specify the resulting shape the ndarray should have (this does not generalize to higher dimensions):
base_shape = (10, 11, 12)
n_dim = 4

frame2d = np.broadcast_to(np.eye(n_dim), base_shape + (n_dim,)*2)
print(frame2d.shape)
# (10, 11, 12, 4, 4)
print(frame2d)
array([[[[[1., 0., 0., 0.],
          [0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 1.]],

         [[1., 0., 0., 0.],
          [0., 1., 0., 0.],
          [0., 0., 1., 0.],
          [0., 0., 0., 1.]],
         ...
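One caveat worth noting: np.broadcast_to returns a read-only view, so if you need to modify individual frames afterwards you have to copy the result first. A small sketch:
import numpy as np

base_shape = (10, 11, 12)
n_dim = 4

frame2d = np.broadcast_to(np.eye(n_dim), base_shape + (n_dim,) * 2)
print(frame2d.flags.writeable)       # False: the broadcast result is a read-only view

frame2d_writable = frame2d.copy()    # materializes the full array
frame2d_writable[0, 0, 0, 0, 1] = 5  # now individual entries can be changed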

Identity matrix stacking in NumPy

I need a 2n x n matrix in NumPy consisting of the n x n identity matrix and the negative n x n identity matrix stacked on top of one another.
This was my original solution, which works fine.
def id_stack(n):
    id_ = np.identity(n)
    return np.vstack((id_, -id_))

id_stack(3)
# array([[ 1.,  0.,  0.],
#        [ 0.,  1.,  0.],
#        [ 0.,  0.,  1.],
#        [-1., -0., -0.],
#        [-0., -1., -0.],
#        [-0., -0., -1.]])
Then I figured I could just set the diagonals instead and be faster like this, which also works.
def id_stack2(n):
    full = np.zeros((2*n, n))
    rng = np.arange(n)
    full[rng, rng] = 1
    full[rng + n, rng] = -1
    return full
I was wondering if there is an even faster way of accomplishing this, maybe using some kind of stride tricks?
As you probably noticed from your own examples, allocating one big buffer and setting elements in it is generally faster than allocating two smaller buffers and a big buffer to copy them into.
The neat thing about numpy is that you can get views to the same buffer without allocating a new array. For example:
output = np.zeros((2 * n, n))
A useful view in this case is
flat = output.ravel()
You can set every n + 1st element to 1, starting from the first, for a total of n elements in the flattened view, and similar for -1. This requires only a simple indexing operation on the raveled view:
flat[:n * n:n + 1] = 1
flat[n * n::n + 1] = -1
This avoids creating the full range arrays, and triggering advanced indexing semantics, which are more memory intensive as well.
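Putting that together, a minimal sketch (the name id_stack3 is just for illustration), checked against the original vstack-based version:
import numpy as np

def id_stack3(n):
    output = np.zeros((2 * n, n))
    flat = output.ravel()          # a view, no copy
    flat[:n * n:n + 1] = 1         # main diagonal of the top block
    flat[n * n::n + 1] = -1        # diagonal of the bottom block
    return output

# Sanity check against the vstack version
n = 3
expected = np.vstack((np.identity(n), -np.identity(n)))
assert np.array_equal(id_stack3(n), expected)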

How to create a binary matrix with some given condition below:

For a given list of tuples L whose elements are taken from range(n), I want to create a binary matrix A of order n in the following way:
If (i,j) or (j,i) is in L, then A[i][j]=1; otherwise A[i][j]=0.
Let us consider the following example:
L=[(2,3),(0,1),(1,3),(2,0),(0,3)]
A=[[0]*4]*4
for i in range(4):
    for j in range(4):
        if (i,j) or (j,i) in L:
            A[i][j]=1
        else:
            A[i][j]=0
print A
This program does not give the correct result. Where is the logical mistake?
You should use a 3rd party library, numpy, for matrix calculations.
Python lists of lists are inefficient for numeric arrays.
import numpy as np
L = [(2,3),(0,1),(1,3),(2,0),(0,3)]
A = np.zeros((4, 4))
idx = np.r_[L].T
A[idx[0], idx[1]] = 1
Result:
array([[ 0.,  1.,  0.,  1.],
       [ 0.,  0.,  0.,  1.],
       [ 1.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.]])
Related: Why NumPy instead of Python lists?
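If you also want the mirrored (j, i) entries set, as the "(i,j) or (j,i) in L" condition in the question suggests, one possible extension of the snippet above is:
import numpy as np

L = [(2,3),(0,1),(1,3),(2,0),(0,3)]
A = np.zeros((4, 4), dtype=int)
idx = np.r_[L].T
A[idx[0], idx[1]] = 1   # set the (i, j) entries
A[idx[1], idx[0]] = 1   # set the mirrored (j, i) entries, making A symmetric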
According to Aran-Fey's correction, the answer is:
L=[(2,3),(0,1),(1,3),(2,0),(0,3)]
#A=[[0]*4]*4
A=[[0]*4 for _ in range(4)]
for i in range(4):
    for j in range(4):
        if (i,j) in L or (j,i) in L:
            A[i][j]=1
        else:
            A[i][j]=0
print A

cosine similarity on large sparse matrix with numpy

The code below causes my system to run out of memory before it completes.
Can you suggest a more efficient means of computing the cosine similarity on a large matrix, such as the one below?
I would like to have the cosine similarity computed for each of the 65000 rows in my original matrix (mat) relative to all of the others so that the result is a 65000 x 65000 matrix where each element is the cosine similarity between two rows in the original matrix.
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
mat = np.random.rand(65000, 10)
sparse_mat = sparse.csr_matrix(mat)
similarities = cosine_similarity(sparse_mat)
After running that last line I always run out of memory and the program either freezes or crashes with a MemoryError. This occurs whether I run on my local machine with 8 GB of RAM or on a 64 GB EC2 instance.
Same problem here. I've got a big, non-sparse matrix. It fits in memory just fine, but cosine_similarity crashes for whatever unknown reason, probably because they copy the matrix one time too many somewhere. So I made it compare small batches of rows "on the left" instead of the entire matrix:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
def cosine_similarity_n_space(m1, m2, batch_size=100):
    assert m1.shape[1] == m2.shape[1]
    ret = np.ndarray((m1.shape[0], m2.shape[0]))
    for row_i in range(0, int(m1.shape[0] / batch_size) + 1):
        start = row_i * batch_size
        end = min([(row_i + 1) * batch_size, m1.shape[0]])
        if end <= start:
            break  # cause I'm too lazy to elegantly handle edge cases
        rows = m1[start: end]
        sim = cosine_similarity(rows, m2)  # rows is O(1) size
        ret[start: end] = sim
    return ret
No crashes for me; YMMV. Try different batch sizes to make it faster. I used to only compare 1 row at a time, and it took about 30X as long on my machine.
Stupid yet effective sanity check:
import random

while True:
    m = np.random.rand(random.randint(1, 100), random.randint(1, 100))
    n = np.random.rand(random.randint(1, 100), m.shape[1])
    assert np.allclose(cosine_similarity(m, n), cosine_similarity_n_space(m, n))
You're running out of memory because you're trying to store a 65000x65000 matrix. Note that the matrix you're constructing is not sparse at all. np.random.rand generates a random number between 0 and 1. So there aren't enough zeros for csr_matrix to actually compress your data. In fact, there are almost surely no zeros at all.
If you look closely at your MemoryError traceback, you can see that cosine_similarity tries to use the sparse dot product if possible:
MemoryError Traceback (most recent call last)
887 Y_normalized = normalize(Y, copy=True)
888
--> 889 K = safe_sparse_dot(X_normalized, Y_normalized.T, dense_output=dense_output)
890
891 return K
So the problem isn't with cosine_similarity, it's with your matrix. Try initializing an actual sparse matrix (with, say, only 1% of entries nonzero) like this:
>>> a = np.zeros((65000, 10))
>>> i = np.random.rand(a.size)
>>> a.flat[i < 0.01] = 1 # Select 1% of indices and set to 1
>>> a = sparse.csr_matrix(a)
Then, on a machine with 32GB RAM (8GB RAM was not enough for me), the following runs with no memory error:
>>> b = cosine_similarity(a)
>>> b
array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ...,
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
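As a side note, scipy can also build a random sparse matrix directly, which avoids allocating the dense 65000 x 10 array first; passing dense_output=False keeps the similarity result sparse as well. A small sketch:
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# ~1% of entries nonzero, built directly in CSR format
a = sparse.random(65000, 10, density=0.01, format='csr', random_state=0)

# Keep the result sparse instead of densifying the 65000 x 65000 matrix
b = cosine_similarity(a, dense_output=False)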
I would run it in chunks like this
from sklearn.metrics.pairwise import cosine_similarity

# Change chunk_size to control resource consumption and speed
# Higher chunk_size means more memory/RAM needed but also faster
chunk_size = 500
matrix_len = your_matrix.shape[0]  # Not sparse numpy.ndarray

def similarity_cosine_by_chunk(start, end):
    if end > matrix_len:
        end = matrix_len
    return cosine_similarity(X=your_matrix[start:end], Y=your_matrix)  # scikit-learn function

for chunk_start in xrange(0, matrix_len, chunk_size):
    cosine_similarity_chunk = similarity_cosine_by_chunk(chunk_start, chunk_start + chunk_size)
    # Handle cosine_similarity_chunk ( Write it to file_timestamp and close the file )
    # Do not open the same file again or you may end up with out of memory after few chunks
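A minimal sketch of that idea, saving each chunk to disk with np.save instead of keeping the full 65000 x 65000 result in memory (the file names and sizes are just illustrative):
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

your_matrix = np.random.rand(65000, 10)   # stand-in for the question's data
chunk_size = 500
matrix_len = your_matrix.shape[0]

for chunk_start in range(0, matrix_len, chunk_size):
    chunk_end = min(chunk_start + chunk_size, matrix_len)
    # chunk_size rows of the full similarity matrix at a time
    chunk = cosine_similarity(your_matrix[chunk_start:chunk_end], your_matrix)
    np.save("cos_sim_%d_%d.npy" % (chunk_start, chunk_end), chunk)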
