I need to calculate the mean of a specific part of a matrix that was generated with random numbers. My work so far:
import random as rd
import numpy as np
matriz = np.zeros([12, 12])
for i in range(0, 12):
    for j in range(0, 12):
        matriz[i, j] = rd.randint(0, 10)
Your problem is finding an algorithm that fits. There's clearly a structure to the "marked" section of your matrix, so the task is to identify that structure and then express it in code.
What I see is a pattern: in row 0 you take columns 1 to n-2, in row 1 you take columns 2 to n-3, etc. So basically, for each row you're collecting the entries whose column index lies in range(rowIndex+1, len(columns)-(rowIndex+1))
There may be some more elegant ways to achieve this, but I think this will work:
import random as rd
import numpy as np
l, w = 12, 12  # matrix has dimensions l=12, w=12
matriz = np.zeros([l, w])
for i in range(l):
    for j in range(w):
        matriz[i, j] = rd.randint(0, 10)
vals = []
for i in range(l // 2):  # rows 0 to 5; row 5 contributes no columns
    for j in range(i + 1, w - (i + 1)):
        vals.append(matriz[i, j])
# get the mean:
print('mean is {}'.format(sum(vals) / len(vals)))
Note this probably doesn't work for a non-square matrix.
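The same selection can also be written with a boolean mask instead of nested loops. This is only a sketch, assuming a square matrix and the pattern described above (row i keeps columns i+1 to n-i-2); the function names are made up for illustration.

```python
import numpy as np

def upper_wedge_mask(shape):
    """Boolean mask that is True where i < j < n_cols - 1 - i."""
    n_rows, n_cols = shape
    i, j = np.indices(shape)  # row and column index of every cell
    return (j > i) & (j < n_cols - 1 - i)

def upper_wedge_mean(m):
    """Mean of the entries selected by upper_wedge_mask."""
    return m[upper_wedge_mask(m.shape)].mean()

rng = np.random.default_rng(0)
matriz = rng.integers(0, 11, size=(12, 12))  # random integers 0..10, as in the question
print('mean is {}'.format(upper_wedge_mean(matriz)))
```

Rows in the lower half select nothing on their own, because the two conditions on j cannot both hold there, so no explicit limit on the row index is needed.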
My goal is to find the Top N vectors in a large 3D dask array (~100k rows per side or more would be nice) that are most cosine-similar to a target vector. So far I can only get the Top 1, and only for small values of n: n=500 takes over 2 hours. I'm doing something incorrectly with dask, but I'm not sure what. Also, is there a vectorized way to get the cosine similarity instead of the for-loop? In pure numpy I can get to n = ~6000 before I hit a MemoryError. A dtype of float16 is accurate enough and is an attempt to save space. If dask isn't the right tool, I'd be open to something else too.
import dask.array as da
import numpy as np
from numpy.linalg import norm
# create a 2d matrix of n rows, each of length vec_len; ideally n is quite large, >100,000
start = 1
step = 1
n = 5
vec_len = 10
shape = [n, vec_len]
end = np.prod(shape) * step + start
arr_2D = da.from_array(np.array(np.arange(start, end, step).reshape(shape), dtype=np.float16))
print(arr_2D.compute())
# sum each row with each other row using broadcasting, resulting in a 3D matrix
# each (i,j) location contains a vector that is the sum of the i-th and j-th original vectors
sums_3D = arr_2D[:, None] + arr_2D[None,:]
# make a target vector
target = np.array(range(vec_len,0,-1))
print('target:', target)
# brute force way to get cosine of each vector in 3D matrix with target vector
da_cos = da.empty(shape=(n,n), dtype=np.float16)
for i in range(n): # <----- is there a way to vectorize this for loop??
    print('row:', i)
    for j in range(i+1, n): # i+1: to get only upper triangle
        cur = sums_3D[i, j]
        cosine = np.dot(target,cur)/(norm(target)*norm(cur))
        da_cos[i,j] = cosine
print(da_cos.compute(), da_cos.dtype, da_cos.shape)
# Get top match <------ how would I get the Top N matches??
ar_max = da_cos.argmax().compute()
best_1, best_2 = np.unravel_index(ar_max, (n,n))
print(da_cos.max().compute(), best_1, best_2)
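One possible way to vectorize the loop and get the Top N is sketched below in plain NumPy. It relies on the identities (a_i + a_j)·t = a_i·t + a_j·t and |a_i + a_j|^2 = |a_i|^2 + |a_j|^2 + 2 a_i·a_j, so the 3D array of sums never has to be built. The function name and the choice of float64 are assumptions made for this example, and the n x n result must still fit in memory, so this does not by itself reach n = 100k.

```python
import numpy as np

def top_n_pair_cosines(arr_2d, target, top_n):
    """Return [(i, j, cosine)] for the top_n pairs i < j, best first."""
    a = np.asarray(arr_2d, dtype=np.float64)
    t = np.asarray(target, dtype=np.float64)
    n = a.shape[0]
    proj = a @ t                                  # a_i . t for every row
    numer = proj[:, None] + proj[None, :]         # (a_i + a_j) . t
    gram = a @ a.T                                # a_i . a_j
    sq = np.diag(gram)                            # |a_i|^2
    norm_sq = sq[:, None] + sq[None, :] + 2.0 * gram
    cos = numer / (np.linalg.norm(t) * np.sqrt(np.maximum(norm_sq, 0.0)))
    iu, ju = np.triu_indices(n, k=1)              # strict upper triangle only
    vals = cos[iu, ju]
    top_n = min(top_n, vals.size)
    if top_n <= 0:
        return []
    idx = np.argpartition(-vals, top_n - 1)[:top_n]  # unordered top_n
    idx = idx[np.argsort(-vals[idx])]                # order them, best first
    return [(int(iu[k]), int(ju[k]), float(vals[k])) for k in idx]

arr = np.arange(1, 51, dtype=np.float64).reshape(5, 10)
target = np.arange(10, 0, -1)
for i, j, c in top_n_pair_cosines(arr, target, 3):
    print(i, j, c)
```

np.argpartition avoids a full sort of all n*(n-1)/2 values; only the selected top_n entries are sorted afterwards.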
I am working on a problem which requires me to find all 6x6 (0,1) matrices with some given properties:
The sum of a row/column must be lower than 2.
The matrices are not symmetrical.
I am using this code:
import numpy as np
import itertools as it
n=6
li=[]
for i in it.product([0, 1], repeat = n**2):
    if (np.reshape(np.array(i), (n, n)).sum(axis=1) < 2).all() and (np.reshape(np.array(i), (n, n)).sum(axis=0)< 2).all() :
        if (np.transpose(np.reshape(np.array(i), (n, n))) != np.reshape(np.array(i), (n, n))).any():
            li.append(np.reshape(np.array(i), (n, n)))
The problem is that this method has to go through all 68719476736 (0,1) matrices. After this piece of code I still have to impose extra conditions.
Is there a faster algorithm to find this list of matrices?
Edit:
The problem I am working on is one to find unique adjacency matrices (graph theory) up to a certain equivalence class. For instance, in the 4x4 version of the problem I wanted to find all (0,1) matrices such that:
The sum in a row/column is lower than 2;
Are not symmetrical, i.e. A^T != A;
Also A^T != P^T A P, where P is a matrix representation of the dihedral group D8 (order 8) which is a subgroup of S4.
After this last step I get a certain number of matrices. If A relates to B through the relation B = P^T A P, then they represent the same matrix, so I then choose only one representative of each equivalence class.
In the 4x4 problem I go from 65536 to 3.
My estimate of the result after sorting through the first condition (sums) is 46080. In the 6x6 problem, the group of transformations P is of order 48.
Check your math: if every row/column sum is less than 2, each sum is 0 or 1, so every row (and every column) holds at most one non-zero element. Each row then has 7 options (a single 1 in one of the 6 columns, or all zeros), which gives at most 7^6 = 117649 candidate matrices before the column constraint is applied.
Roughly 100k matrices is easily doable by brute force, with additional filtering to remove vertical/horizontal flips and diagonal symmetries.
Here's a simple code that should get you started:
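import numpy as np
from itertools import product
n = 6
# each row picks a column 0..5 for its single 1, or the value n for "all zeros":
# 7**6 = 117649 candidates
for cols in product(range(n + 1), repeat=n):
    used = [j for j in cols if j != n]
    if len(used) != len(set(used)):  # two rows share a column: column sum would be 2
        continue
    m = np.zeros((n, n), dtype=int)  # start with an empty matrix
    for i, j in enumerate(cols):
        if j != n:
            m[i, j] = 1  # put `1` in the current (i)-th row, (j) pos
    # here you check `m` for symmetry and save it somewhere or not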
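The symmetry check left open above could be filled in along the following lines. This is a sketch only: the generator name is invented for the example, it covers just the first two conditions (row/column sums below 2 and A^T != A), and the reduction by the group of transformations P is not attempted.

```python
import numpy as np
from itertools import product

def nonsymmetric_partial_permutations(n):
    """Yield n x n 0/1 matrices with all row/column sums < 2 and A != A.T."""
    for cols in product(range(n + 1), repeat=n):  # value n means "all-zero row"
        used = [j for j in cols if j != n]
        if len(used) != len(set(used)):           # a column would hold two 1s
            continue
        m = np.zeros((n, n), dtype=int)
        for i, j in enumerate(cols):
            if j != n:
                m[i, j] = 1
        if not np.array_equal(m, m.T):            # keep only non-symmetric matrices
            yield m

count_4 = sum(1 for _ in nonsymmetric_partial_permutations(4))
print(count_4)
```

Using a generator keeps memory use flat, so the same function can be run with n = 6 and the extra equivalence-class filtering applied to each matrix as it is produced.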
I have the following code to create a random subset (of size examples) of a large set:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    return tmp
The problem is that my input is a large matrix, so input_set.shape = (n,m). However, sampling(input_set) is a list, while I want it to be a submatrix of size = (examples, m), not a list of length examples of vectors of size m.
I modified my code to do this:
def sampling(input_set):
    tmp = random.sample(input_set, examples)
    sample = input_set[0:examples]
    for i in range(examples):
        sample[i] = tmp[i]
    return sample
This works, but is there a more elegant/better way to accomplish what I am trying to do?
Use numpy as follows to create an n x m matrix (assuming input_set is a list):
import numpy as np
input_matrix = np.array(input_set).reshape(n,m)
OK, if I understand the question correctly, you just want to drop the last (n - k) rows, so:
sample = input_matrix[:k - n]
should do the job for you.
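If the rows should be picked at random rather than taken from the top, one option is to sample row indices and index the matrix with them, which returns a submatrix directly. A minimal sketch, assuming input_set can be converted to a 2D array; the function name and the seed parameter are illustrative additions.

```python
import numpy as np

def sample_rows(input_set, examples, seed=None):
    """Return a random (examples, m) submatrix of distinct rows."""
    rng = np.random.default_rng(seed)
    matrix = np.asarray(input_set)
    # draw `examples` distinct row indices out of the n available rows
    row_ids = rng.choice(matrix.shape[0], size=examples, replace=False)
    return matrix[row_ids]

input_matrix = np.arange(50).reshape(10, 5)
sample = sample_rows(input_matrix, 4, seed=0)
print(sample.shape)
```

Indexing with an integer array copies the selected rows, so the original matrix is left untouched.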
I don't know if you're still interested, but maybe you could do something like this:
# create a random 6x6 matrix with items from -10 to 9 (upper bound is exclusive)
import numpy as np
mat = np.random.randint(-10,10,(6,6))
print (mat)
# select a random int between 0 and 4 (upper bound is exclusive)
startIdx = np.random.randint(0,5)
print(startIdx)
# extract submatrix (smaller than 3x3 when startIdx is 4, because the slice is clipped at the edge)
print(mat[startIdx:startIdx+3,startIdx:startIdx+3])
So I need to write code that accomplishes the following:
Write a Python code that produces a variable op_table that is a numpy array with three
axes, i, j, and k. Define three arrays:
x_i ranges from 0 (included) to 9 (included) in steps of 1,
y_j ranges from 10 (included) to 11 (included) in 20 equal-size steps,
z_k ranges from 10 to 10^6 in five steps (i.e. with six entries total), where z_k = 10·z_{k−1}.
Then create the final array op_table that satisfies:
op_table[i,j,k] = sin(x_i)·y_j + z_k
My question lies in how to initially set the values. I've only seen numpy arrays created in manners such as np.array([1,2,3,4]) or np.arange(10). Also, how is this set up? Is the first column the x-axis, second the y-axis and so forth?
import numpy as np
import math
xi = np.linspace(0,9, num=10)
yj = np.linspace(10,11,20, endpoint=True)
zk = [10, 10**2, 10**3, 10**4, 10**5, 10**6]
op_table = np.random.rand(10,20,6)
for i in range (0,10):
    for j in range (0,20):
        for k in range (0,6):
            op_table[i,j,k] = math.sin(xi[i]) * yj[j] + zk[k]
Don't personally believe in spoon-feeding answers, but it looks like you've misinterpreted the problem. The problem doesn't actually require that you generate any matrix, except by solving the second equation. Numpy happens to have a very helpful function called linspace that does almost exactly this.
import numpy as np
xi = np.linspace(0, 9, 10)  # 10 points: 0, 1, ..., 9
yj = np.linspace(10, 11, 20)
Other than that, this seems to be a math problem, and this should get you 80% of the way to a solution. If you need help with the math, there's another stackexchange for that.
More np.linspace docs: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
Math stackexchange: https://math.stackexchange.com/
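For reference, the whole table can also be built without explicit loops by using broadcasting. The sketch below assumes that "20 equal-size steps" means 20 evenly spaced values from 10 to 11, which is how the question's own code reads it; axis 0 then corresponds to x_i, axis 1 to y_j and axis 2 to z_k.

```python
import numpy as np

xi = np.arange(10)                 # 0, 1, ..., 9
yj = np.linspace(10, 11, 20)       # 20 evenly spaced values, both ends included
zk = 10.0 ** np.arange(1, 7)       # 10, 100, ..., 10**6

# give each array its own axis so the shapes broadcast to (10, 20, 6)
op_table = (np.sin(xi)[:, None, None] * yj[None, :, None]
            + zk[None, None, :])
print(op_table.shape)
```

Each `None` inserts an axis of length 1, and NumPy stretches those axes to match the other operands when the arithmetic is evaluated.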
I use a function to calculate the similarity between a pair of documents and want to perform clustering using this similarity measure.
Code so Far
Sim=np.zeros((n, n)) # create a numpy array
i=0
j=0
for i in range(0,n):
    for j in range(i,n):
        if i==j:
            Sim[i][j]=1
        else:
            Sim[i][j]=simfunction(list_doc[i],list_doc[j]) # calculate similarity between documents i and j using simfunction
Sim=Sim+ Sim.T - np.diag(Sim.diagonal()) # complete the symmetric matrix
AggClusterDistObj=AgglomerativeClustering(n_clusters=num_cluster,linkage='average',affinity="precomputed")
Res_Labels=AggClusterDistObj.fit_predict(Sim)
My concern is that I used a similarity function here, and I think that according to the documentation it should be a dissimilarity matrix. How can I change it into a dissimilarity matrix?
Also, what would be a more efficient way to do this?
Please format your code correctly, as indentation matters in Python.
If possible, keep the code complete (you left out a import numpy as np).
Since range starts from zero by default, you can omit the 0 and write range(n).
Indexing in numpy works like [i, j, k, ...].
So instead of Sim[i][j] you actually want to write Sim[i, j], because otherwise you do two operations: first taking the entire row slice and then indexing the column. Here's another way to copy the elements of the upper triangle to the lower one:
Sim = np.identity(n) # diagonal with ones (100 percent similarity)
for i in range(n):
    for j in range(i+1, n): # +1 skips the diagonal
        Sim[i, j]= simfunction(list_doc[i], list_doc[j])
# Expand the matrix (copy triangle)
tril = np.tril_indices_from(Sim, -1) # lower triangle's indices (without diagonal)
# tril and triu indices are ordered differently, so mirror through the transpose
Sim[tril] = Sim.T[tril]
Assuming that your similarities really lie within the range (0, 1), you can then convert your similarity matrix into a distance matrix simply with
dm = 1 - Sim
This operation will be vectorized by numpy
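Putting the pieces together, a small end-to-end sketch could look like this. The jaccard function is only a toy stand-in for simfunction, and both function names are assumptions made for the example; the resulting matrix is the kind of precomputed distance input the clustering step in the question expects.

```python
import numpy as np

def jaccard(a, b):
    """Toy similarity in [0, 1]: shared words / all words (stand-in for simfunction)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def distance_matrix(docs, simfunction):
    """Symmetric distance matrix 1 - similarity, with zeros on the diagonal."""
    n = len(docs)
    sim = np.identity(n)                       # each document is fully similar to itself
    for i in range(n):
        for j in range(i + 1, n):              # call simfunction once per unordered pair
            sim[i, j] = sim[j, i] = simfunction(docs[i], docs[j])
    return 1.0 - sim

docs = ["a b c", "a b d", "x y z"]
dist = distance_matrix(docs, jaccard)
print(dist)
```

Writing both sim[i, j] and sim[j, i] inside the loop fills the matrix symmetrically in one pass, so no separate triangle copy is needed.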