Get indices of matrix from upper triangle - python

I have a symmetric matrix represented as a numpy array, like the following example:
[[ 1. 0.01735908 0.01628629 0.0183845 0.01678901 0.00990739 0.03326491 0.0167446 ]
[ 0.01735908 1. 0.0213712 0.02364181 0.02603567 0.01807505 0.0130358 0.0107082 ]
[ 0.01628629 0.0213712 1. 0.01293289 0.02041379 0.01791615 0.00991932 0.01632739]
[ 0.0183845 0.02364181 0.01293289 1. 0.02429031 0.01190878 0.02007371 0.01399866]
[ 0.01678901 0.02603567 0.02041379 0.02429031 1. 0.01496896 0.00924174 0.00698689]
[ 0.00990739 0.01807505 0.01791615 0.01190878 0.01496896 1. 0.0110924 0.01514519]
[ 0.03326491 0.0130358 0.00991932 0.02007371 0.00924174 0.0110924 1. 0.00808803]
[ 0.0167446 0.0107082 0.01632739 0.01399866 0.00698689 0.01514519 0.00808803 1. ]]
And I need to find the indices (row and column) of the greatest value without considering the diagonal. Since it is a symmetric matrix I just took the upper triangle of the matrix.
ind = np.triu_indices(M_size, 1)
And then the index of the max value
max_ind = np.argmax(H[ind])
However, max_ind is an index into the flat vector produced by taking the upper triangle with triu_indices. How do I know the row and column of the value I've just found?
The matrix could be any size but it's always symmetric. Do you know a better method to achieve the same?
Thank you

Couldn't you do this by using np.triu to return a copy of your matrix with all but the upper triangle zeroed, then just use np.argmax and np.unravel_index to get the row/column indices?
Example:
x = np.zeros((10,10))
x[3, 8] = 1
upper = np.triu(x, 1)
idx = np.argmax(upper)
row, col = np.unravel_index(idx, upper.shape)
The drawback of this method is that it creates a copy of the input matrix, but it should still be a lot quicker than looping over elements in Python. It also assumes that the maximum value in the upper triangle is > 0.
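For example, a minimal sketch assuming the question's symmetric matrix is stored in a variable H:
import numpy as np

upper = np.triu(H, 1)  # copy with the diagonal and lower triangle zeroed
row, col = np.unravel_index(np.argmax(upper), upper.shape)
# (row, col) is the position of the largest off-diagonal value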

You can use the value of max_ind as an index into the ind data
max_ind = np.argmax(H[ind])
Out: 23
ind[0][max_ind], ind[1][max_ind]
Out: (4, 6)
Validate this by looking for the maximum in the entire matrix (won't always work -- data-dependent):
np.unravel_index(np.argmax(H), H.shape)
Out: (4, 6)

There's probably a neater "numpy way" to do this, but this is what comes to mind first:
import operator

answer = None
biggest = 0
for r, row in enumerate(matrix):
    if r + 1 >= len(row):
        continue  # last row: nothing to the right of the diagonal
    i, elem = max(enumerate(row[r + 1:]), key=operator.itemgetter(1))
    if elem > biggest:
        biggest, answer = elem, (r, i + r + 1)  # offset the column index back to the full row

Checking for a Magic Square Python

I'm trying to make a function that checks whether or not a matrix is a magic square. I only need to check the vertical and horizontal (not diagonal). Sometimes it passes and sometimes it fails. I was hoping that someone could help me solve the issue. This is my code:
def magic_square(m):
    for i in range(len(m)):
        if len(m[i]) != len(m):
            return False
    return True
And this is the one that fails; it returns False:
m = [ [1,2,3,4]
, [5,6,7,8]
, [9,10,11,12]
, [13,14,15,16]
]
print(magic_square(m) == False)
The code you provide does not check whether a matrix is a magic square; it only checks whether a matrix (in this case, a list of lists) is square. After this check, you need to calculate the sums of each row, each column and each diagonal (although for some reason you said you don't need those) and check that all of them are equal.
If you are ok with using numpy then you can use
m = np.array(m)
len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1
Test cases:
m = [ [1,2,3,4]
, [5,6,7,8]
, [9,10,11,12]
, [13,14,15,16]
]
m = np.array(m)
print (len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1)
m = [ [2,7,6]
, [9,5,1]
, [4,3,8]
]
m = np.array(m)
print (len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1)
Output:
False
True
Meaning:
m.sum(axis=1) : Sum the numpy array along the rows
m.sum(axis=0) : Sum the numpy array along the columns
np.concatenate([m.sum(axis=1), m.sum(axis=0)]) : Combine both sets of sums (along rows and columns) into a single numpy array
np.unique(x) : Find the unique elements of the numpy array x
len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1 : Check that the number of unique elements among the row-wise and column-wise sums is 1, i.e. all row sums and column sums are the same.
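Note that neither snippet here checks the diagonal sums mentioned at the start; if those are needed as well, one possible sketch (the is_magic name is just illustrative) is:
import numpy as np

def is_magic(m):
    m = np.array(m)
    sums = np.concatenate([m.sum(axis=1),                # row sums
                           m.sum(axis=0),                # column sums
                           [np.trace(m),                 # main diagonal
                            np.trace(np.fliplr(m))]])    # anti-diagonal
    return len(np.unique(sums)) == 1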
This is not as clever as the numpy answer, but it works
def magic_square(m):
    # check size
    for i in range(len(m)):
        if len(m[i]) != len(m):
            return False
    # check row sums
    for r in m:
        if sum(r) != sum(m[0]):
            return False
    # check column sums
    cols = [[r[c] for r in m] for c in range(len(m[0]))]
    for c in cols:
        if sum(c) != sum(m[0]):
            return False
    return True
m = [ [1,2,3,4]
, [5,6,7,8]
, [9,10,11,12]
, [13,14,15,16]
]
print(magic_square(m)) # False
m = [ [8,11,14,1]
, [13,2,7,12]
, [3,16,9,6]
, [10,5,4,15]
]
print(magic_square(m)) # True

Create normally distributed columns, with different mean values

I have the following numpy matrix:
import numpy as np
import scipy.stats

R = np.matrix(np.ones([3,3]))
# Update R matrix based on sales statistics
for i in range(0, len(R)):
    for j in range(0, len(R)):
        R[j,i] = scipy.stats.norm(2, 1).pdf(i) * 100
print(R)
[[ 5.39909665 24.19707245 39.89422804]
[ 5.39909665 24.19707245 39.89422804]
[ 5.39909665 24.19707245 39.89422804]]
I would like to transform each column, mapping the row index (0, 1, 2) to the corresponding density value of a normal distribution with mean equal to, specifically, 5.39909665 for the first column, 24.19707245 for the second and 39.8942280 for the third, and standard deviation equal to 1.
Ultimately, creating a matrix as:
[[norm(5.39, 1).pdf(0), norm(24.197, 1).pdf(0), ...],
 [norm(5.39, 1).pdf(1), norm(24.197, 1).pdf(1), ...],
 [norm(5.39, 1).pdf(2), norm(24.197, 1).pdf(2), ...]]
How can I create the final matrix?
The pdf method works much like any numpy function, in the sense that you can pass in arrays of matching shapes, in combination with scalars. You can create R with something like:
ix = np.repeat(np.arange(3),3).reshape((3,3)) #row index, or ix.T for column index
R = scipy.stats.norm(2,1).pdf(ix.T)*100
>>array([[ 5.39909665, 24.19707245, 39.89422804],
[ 5.39909665, 24.19707245, 39.89422804],
[ 5.39909665, 24.19707245, 39.89422804]])
Following the same logic, if you want your [i,j] entry to be scipy.stats.norm(scipy.stats.norm(2,1).pdf(j) * 100, 1).pdf(i) (as in the matrix you show as the desired result), use:
scipy.stats.norm(scipy.stats.norm(2,1).pdf(ix.T) * 100, 1).pdf(ix)
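Since pdf broadcasts like any other numpy function, an equivalent sketch that avoids building the full index matrices (for the same 3x3 case) could be:
import numpy as np
from scipy import stats

i = np.arange(3)[:, None]               # row indices as a column vector
j = np.arange(3)[None, :]               # column indices as a row vector
means = stats.norm(2, 1).pdf(j) * 100   # one mean per column
R = stats.norm(means, 1).pdf(i)         # broadcasts to the full 3x3 matrix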

Speed up NumPy loop on 2D arrays - removes rows that are similar

I am trying to significantly speed up the following code but to no avail. The code takes in a 2D array and removes rows of the array that, when compared to other rows in the array, are too similar. Please see below code and comments.
as0 = a.shape[0]
for i in range(as0):
    a2s0 = a.shape[0]  # shape may change after each iteration
    if i > (a2s0 - 1):
        break
    # takes the difference between all rows in the array by iterating over each
    # row, then sums the absolutes. The condition finally gives a boolean
    # array output - similarity condition of 0.01
    t = np.sum(np.absolute(a[i,:] - a), axis=1) < 0.01
    # retains the indices that are too similar and then deletes the
    # necessary row
    inddel = np.where(t)[0]
    inddel = [k for k in inddel if k != i]
    a = np.delete(a, inddel, 0)
I was wondering if vectorization was possible but I'm not too familiar with it. Any assistance would be greatly appreciated.
Edit:
if i >= (a2s0 - 1):  # added equality sign
    break
# Now this only calculates over rows that have not been compared.
t = np.sum(np.absolute(a[i,:] - a[np.arange(i+1,a2s0),:]), axis=1) > 0.01
t = np.concatenate((np.ones(i+1, dtype=bool), t))
a = a[t, :]
Approach #1 : Broadcasting
Here's one vectorized approach making use of broadcasting upon extending a to 3D and then performing those computations across all iterations in a vectorized manner -
mask = (np.absolute(a[:,None] - a)).sum(2) < 0.01
a_out = a[~np.triu(mask,1).any(0)]
Approach #2 : Using pdist('cityblock')
For large arrays, we would run into memory issues with the previous approach. So, as another method, we could make use of pdist's 'cityblock' metric, which computes the Manhattan distances, and then identify the corresponding row/col in the squareform distance matrix without actually computing that form, by using searchsorted instead for an efficient solution.
Here's the implementation -
from scipy.spatial.distance import pdist

thresh = 0.01  # similarity threshold from the question
n = a.shape[0]
dists = pdist(a, 'cityblock')
idx = np.flatnonzero(dists < thresh)
sep_idx = np.arange(n-1,0,-1).cumsum()
rm_idx = np.unique(np.searchsorted(sep_idx,idx,'right'))
a_out = np.delete(a,rm_idx,axis=0)
Benchmarking
Approaches -
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.metrics.pairwise import manhattan_distances

# Approach #2 from this post
def remove_similar_rows(a, thresh=0.01):
    n = a.shape[0]
    dists = pdist(a, 'cityblock')
    idx = np.flatnonzero(dists < thresh)
    sep_idx = np.arange(n-1,0,-1).cumsum()
    rm_idx = np.unique(np.searchsorted(sep_idx,idx,'right'))
    return np.delete(a,rm_idx,axis=0)

# @John Zwinck's soln
def pairwise_manhattan_distances(a, thresh=0.01):
    d = manhattan_distances(a)
    return a[~np.any(np.tril(d < thresh, -1), axis=0)]
Timings -
In [209]: a = np.random.randint(0,9,(4000,30))
# Let's set 100 rows randomly as dups
In [210]: idx0 = np.random.choice(4000,size=100, replace=0)
In [211]: idx1 = np.random.choice(4000,size=100, replace=0)
In [217]: a[idx0] = a[idx1]
In [238]: %timeit pairwise_manhattan_distances(a, thresh=0.01)
1 loops, best of 3: 225 ms per loop
In [239]: %timeit remove_similar_rows(a, thresh=0.01)
10 loops, best of 3: 100 ms per loop
Let's create some fake data:
np.random.seed(0)
a = np.random.random((4,3))
Now we have:
array([[ 0.5488135 , 0.71518937, 0.60276338],
[ 0.54488318, 0.4236548 , 0.64589411],
[ 0.43758721, 0.891773 , 0.96366276],
[ 0.38344152, 0.79172504, 0.52889492]])
Next, we want the sum of elementwise differences for all pairs of rows. We can use Manhattan Distance:
d = sklearn.metrics.pairwise.manhattan_distances(a)
Which gives:
array([[ 0. , 0.33859562, 0.64870931, 0.31577611],
[ 0.33859562, 0. , 0.89318282, 0.6465111 ],
[ 0.64870931, 0.89318282, 0. , 0.5889615 ],
[ 0.31577611, 0.6465111 , 0.5889615 , 0. ]])
Now you can apply a threshold, keeping only one triangle:
m = np.tril(d < 0.4, -1) # large threshold just for this example
And get a boolean mask:
array([[False, False, False, False],
[ True, False, False, False],
[False, False, False, False],
[ True, False, False, False]], dtype=bool)
Which tells you that row 0 is "too similar" to both row 1 and row 3. Now you can remove rows from the original matrix where any element of the mask is True:
a[~np.any(m, axis=0)] # axis can be either 0 or 1 - design choice
Which gives you:
array([[ 0.54488318, 0.4236548 , 0.64589411],
[ 0.43758721, 0.891773 , 0.96366276],
[ 0.38344152, 0.79172504, 0.52889492]])
Putting it all together:
d = sklearn.metrics.pairwise.manhattan_distances(a)
a = a[~np.any(np.tril(d < 0.4, -1), axis=0)]
First the line:
t = np.sum(np.absolute(a[i,:] - a), axis=1)<0.01
is taking the sum of the absolute value of the difference between a single line and the whole array every time. This is probably not what you need; try instead taking the differences between the current line and the lines later in the array. You have already compared all of the preceding lines with the current one, so why do it again?
Also, deleting lines from the array is an expensive, slow operation, so you will probably find it quicker to check all of the lines and then delete the near duplicates. You could also skip checking any lines that are already slated for deletion, as you know that they will be removed.
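A minimal sketch along those lines, comparing each row only with the rows after it and deleting the near duplicates once at the end (the function name is just illustrative):
import numpy as np

def remove_similar_rows_loop(a, thresh=0.01):
    keep = np.ones(a.shape[0], dtype=bool)
    for i in range(a.shape[0]):
        if not keep[i]:
            continue  # already slated for removal, no need to compare
        later = np.arange(i + 1, a.shape[0])
        dists = np.sum(np.abs(a[i] - a[later]), axis=1)
        keep[later[dists < thresh]] = False
    return a[keep]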

Filling empty list with zero vector using numpy

So here is the thing.
I have a list of lists and some of the lists are empty... but if a list is not empty, all the lists have a fixed length... a length which I don't know.
For example
for feature in features:
    print(len(feature))
This will print either 0 or length k.
I am trying to write a function
def generate_matrix(features):
    # return matrix
which returns a numpy matrix, with empty rows replaced by zero vectors.
Is there a way to do this efficiently?
I mean, I don't want to loop through to find the shape of the matrix and then loop through again to generate each row (with either np.array(feature) or np.zeros(shape)).
Edit: I didn't realize that all of the non-empty features were the same length. If that is the case then you can just use the length of the first non-empty one. I added a function that does that.
import numpy as np

f0 = [0,1,2]
f1 = []
f2 = [4,5,6]
features = [f0, f1, f2]

def get_nonempty_len(features):
    """
    Returns the length of the first non-empty element
    of features.
    """
    for f in features:
        if len(f) > 0:
            return len(f)
    return 0

def generate_matrix(features):
    rows = len(features)
    cols = get_nonempty_len(features)
    m = np.zeros((rows, cols))
    for i, f in enumerate(features):
        m[i, :len(f)] = f
    return m

print(generate_matrix(features))
Output looks like:
[[ 0. 1. 2.]
[ 0. 0. 0.]
[ 4. 5. 6.]]
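A more compact sketch under the same assumption, re-using get_nonempty_len from above:
k = get_nonempty_len(features)
matrix = np.array([f if f else [0.0] * k for f in features])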

Numpy mixing arrays with multiple index arrays

I have a 3d mesh with points and the locations of the points are
in an array that looks like this:
mesh_vectors = np.array([[-0.85758871, 0.8965745 , -0.1427767 ],
[-0.23945311, 1.00544977, 1.45797086],
[-0.57341832, -1.07448494, -0.11827722],
[ 0.05894491, -0.97208506, 1.47583127],
[-0.71402085, -0.08872638, -0.12916484],
[-0.09181146, 1.01235461, 0.47418442],
[-0.09025362, 0.01668115, 1.46690106],
[ 0.19773833, -0.95349348, 0.49089319],
[ 0.05055711, 0.02909645, 0.48503664]])
I have two indexing arrays:
idx1 = np.array([4, 2, 1, 6, 5, 0, 1, 5])
idx2 = np.array([6, 3, 0, 4, 7, 2, 3, 7])
these translations correspond to the index arrays:
translate_1 = np.array([[ 0.00323021,  0.00047712, -0.00422925],
                        [ 0.00153422,  0.00022654, -0.00203258],
                        [ 0.00273207,  0.00039626,  0.00038201],
                        [ 0.0052439 ,  0.00075993,  0.00068843],
                        [-0.00414245, -0.00053918,  0.00543974],
                        [-0.00681844, -0.00084955,  0.00894626],
                        [ 0.        ,  0.        ,  0.        ],
                        [-0.00672519, -0.00099897, -0.00090189]])
translate_2 = np.array([[ 0.00523871,  0.00079512,  0.00068814],
                        [ 0.00251901,  0.00038234,  0.00033379],
                        [ 0.00169134,  0.00021078, -0.00218737],
                        [ 0.00324106,  0.00040338, -0.00422859],
                        [-0.00413547, -0.00058669,  0.00544016],
                        [-0.00681223, -0.0008921 ,  0.00894669],
                        [ 0.        ,  0.        ,  0.        ],
                        [-0.00672553, -0.00099677, -0.00090191]])
they are currently added to the mesh like this:
mesh_vectors[idx1] += translate_1
mesh_vectors[idx2] += translate_2
The trouble is, what I really need to add isn't the translations
but the mean of the translations where multiple translations are
applied to the same mesh point. The indexing arrays can have indices occurring with a variety of different frequencies. They could be [2,2,2,3,4,5] and [1,2,1,1,5,4], though they will always be the same size. I'm trying to do this with numpy for speed, but I have the option of using loops at the start to generate the indexing arrays if needed.
Thanks in advance!
This works:
scaled_tr1 = translate_1 / np.bincount(idx1)[idx1,None]
np.add.at(mesh_vectors, idx1, scaled_tr1)
Note that the use of np.add.at instead of fancy indexing is required:
ufunc.at(a, indices, b=None)
Performs unbuffered in place operation on operand a for elements specified by indices. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once. For example, a[[0,0]] += 1 will only increment the first element once because of buffering, whereas add.at(a, [0,0], 1) will increment the first element twice.
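For completeness, a sketch of the same pattern applied to the second index array, plus one possible way (an assumption about what the mean should cover) to average over both index arrays combined:
import numpy as np

# mean taken within each index array, as above
scaled_tr2 = translate_2 / np.bincount(idx2)[idx2, None]
np.add.at(mesh_vectors, idx2, scaled_tr2)

# alternatively, mean taken over both index arrays together
idx_all = np.concatenate([idx1, idx2])
tr_all = np.vstack([translate_1, translate_2])
counts = np.bincount(idx_all, minlength=mesh_vectors.shape[0])
np.add.at(mesh_vectors, idx_all, tr_all / counts[idx_all, None])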
