Matrix match in Python

How can I find the best "match" for small matrix in big matrix?
For example:
small=[[1,2,3],
[4,5,6],
[7,8,9]]
big=[[2,4,2,3,5],
[6,0,1,9,0],
[2,8,2,1,0],
[7,7,4,2,1]]
The match is defined as the sum of the absolute differences between the overlapping numbers. For example, the match at position (1,1) is what you get when number 5 from small (its centre) is placed on number 0 of big, i.e. the centre of small sits at coordinates (1,1) of big.
The match value in position (1,1) is:
m(1,1)=|2−1|+|4−2|+|2−3|+|6−4|+|0−5|+|1−6|+|2−7|+|8−8|+|2−9|=28
The goal is to find the lowest possible difference between those matrices.
The small matrix always has an odd number of rows and columns, so it's easy to find its centre.
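The definition above can be written as a small helper that places the centre of small on a given cell of big. This is a sketch: `match_at` is a hypothetical name, and it assumes small has odd dimensions and fits entirely inside big at that centre (no padding). It reports centre coordinates, so the best match comes out as (2, 1) rather than as the top-left offset (1, 0) that the answers below use.

```python
def match_at(small, big, r, c):
    """Sum of |big - small| with the centre of small placed on big[r][c]."""
    h, w = len(small) // 2, len(small[0]) // 2  # half-height and half-width of small
    return sum(
        abs(big[r + i][c + j] - small[h + i][w + j])
        for i in range(-h, h + 1)
        for j in range(-w, w + 1)
    )

small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
big = [[2, 4, 2, 3, 5], [6, 0, 1, 9, 0], [2, 8, 2, 1, 0], [7, 7, 4, 2, 1]]

print(match_at(small, big, 1, 1))  # 28, as in the worked example

# try every centre where small still fits inside big
h, w = len(small) // 2, len(small[0]) // 2
best = min(
    (match_at(small, big, r, c), (r, c))
    for r in range(h, len(big) - h)
    for c in range(w, len(big[0]) - w)
)
print(best)  # (24, (2, 1))
```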

You can iterate through the viable rows and columns and zip the slices of big with small to calculate the sum of differences, and use min to find the minimum among the differences:
from itertools import islice
min(
    (
        sum(
            sum(abs(x - y) for x, y in zip(a, b))
            for a, b in zip(
                (
                    islice(r, col, col + len(small[0]))
                    for r in islice(big, row, row + len(small))
                ),
                small
            )
        ),
        (row, col)
    )
    for row in range(len(big) - len(small) + 1)
    for col in range(len(big[0]) - len(small[0]) + 1)
)
or in one line:
min((sum(sum(abs(x - y) for x, y in zip(a, b)) for a, b in zip((islice(r, col, col + len(small[0])) for r in islice(big, row, row + len(small))), small)), (row, col)) for row in range(len(big) - len(small) + 1) for col in range(len(big[0]) - len(small[0]) + 1))
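This returns: (24, (1, 0))
Here (1, 0) is the (row, column) offset of the top-left corner of the best window; in the question's terms, the centre of small then sits at (2, 1) of big.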

Done by hand:
small=[[1,2,3],
[4,5,6],
[7,8,9]]
big=[[2,4,2,3,5],
[6,0,1,9,0],
[2,8,2,1,0],
[7,7,4,2,1]]
# collect all the sums
summs = []
# k and j are the offset into big
for k in range(len(big)-len(small)+1):
    # add inner list for one row
    summs.append([])
    for j in range(len(big[0])-len(small[0])+1):
        s = 0
        for row in range(len(small)):
            for col in range(len(small[0])):
                s += abs(big[k+row][j+col]-small[row][col])
        # add to the inner list
        summs[-1].append(s)
print(summs)
Output:
[[28, 29, 38], [24, 31, 39]]
If you are only interested in the coordinates in the bigger matrix, store tuples of (rowoffset, coloffset, sum) instead of nesting lists inside lists. That way you can use min() with a key:
summs = []
for k in range(len(big)-len(small)+1):
    for j in range(len(big[0])-len(small[0])+1):
        s = 0
        for row in range(len(small)):
            for col in range(len(small[0])):
                s += abs(big[k+row][j+col]-small[row][col])
        summs.append((k, j, s))  # row, col, sum
print("Min value for bigger matrix at ", min(summs, key=lambda x: x[2]))
Output:
Min value for bigger matrix at (1, 0, 24)
If there were ties ("draws"), this would return only the one with the smallest row offset (and, within that row, the smallest column offset).
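If you need every tied position rather than only the first one, one option is to compute the minimum first and then filter the same (row offset, col offset, sum) tuples. A sketch; `best_positions` is a hypothetical name:

```python
def best_positions(small, big):
    """Return (minimum sum, list of every (row, col) offset that reaches it)."""
    h, w = len(small), len(small[0])
    summs = [
        (k, j, sum(abs(big[k + r][j + c] - small[r][c])
                   for r in range(h) for c in range(w)))
        for k in range(len(big) - h + 1)
        for j in range(len(big[0]) - w + 1)
    ]
    best = min(s for _, _, s in summs)
    # keep every tied position instead of only the first one
    return best, [(k, j) for k, j, s in summs if s == best]

small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
big = [[2, 4, 2, 3, 5], [6, 0, 1, 9, 0], [2, 8, 2, 1, 0], [7, 7, 4, 2, 1]]
print(best_positions(small, big))  # (24, [(1, 0)])
```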

Another possible solution would be this one, which returns the minimum difference and the coordinates of the centre of the best-matching window in the big matrix (X = column, Y = row):
small=[[1,2,3],
[4,5,6],
[7,8,9]]
big=[[2,4,2,3,5],
[6,0,1,9,0],
[2,8,2,1,0],
[7,7,4,2,1]]
def difference(small, matrix):
    # sum of absolute element-wise differences
    return sum(abs(small[i][j] - matrix[i][j])
               for i in range(len(small)) for j in range(len(small[0])))
def getSubmatrices(big, smallRows, smallCols):
    # collect every window of big that has the same shape as small
    submatrices = []
    for i in range(len(big[0]) - smallCols + 1):      # column offset
        for j in range(len(big) - smallRows + 1):     # row offset
            tempMatrix = [big[j+k][i:i+smallCols] for k in range(smallRows)]
            # store the centre of the window: x (column) first, then y (row)
            submatrices.append([i + smallCols // 2, j + smallRows // 2, tempMatrix])
    return submatrices
def minDiff(small, big):
    submatrices = getSubmatrices(big, len(small), len(small[0]))
    diffs = [(x, y, difference(small, submatrix)) for x, y, submatrix in submatrices]
    return min(diffs, key=lambda elem: elem[2])
x, y, diff = minDiff(small, big)
print("Minimum difference: ", diff)
print("X = ", x)
print("Y = ", y)
Output:
Minimum difference: 24
X = 1
Y = 2

I would use numpy to help with this.
To start I would convert the arrays to numpy arrays
import numpy as np
small = np.array([[1,2,3], [4,5,6], [7,8,9]])
big = np.array([[2,4,2,3,5], [6,0,1,9,0], [2,8,2,1,0], [7,7,4,2,1]])
then I would initialize an array to store the results of the test (optional: a dictionary as well)
result_shape = np.array(big.shape) - np.array(small.shape) + 1
results = np.zeros((result_shape[0], result_shape[1]))
result_dict = {}
Then iterate over the positions in which the small matrix can be placed over the large matrix and calculate the difference:
h, w = small.shape
insert = np.zeros(big.shape)
for i in range(results.shape[0]):
    for j in range(results.shape[1]):
        insert[i:i + h, j:j + w] = small
        diff = np.sum(np.abs(big - insert)[i:i + h, j:j + w])
        results[i, j] = diff
        # Optional dictionary, keyed by the (row, col) offset
        result_dict[(i, j)] = diff
Then you can print(results) and obtain:
[[ 28. 29. 38.]
[ 24. 31. 39.]]
and/or, because the position of the small matrix over the big matrix is stored in the keys of the dictionary, you can get the position where the difference is smallest directly from the keys:
pos_min = list(min(result_dict, key=result_dict.get))
and if you print(pos_min), you obtain:
[1, 0]
Then, if you need the index for anything else, you can use pos_min directly. Hope this helps!
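A possible vectorized variant avoids the Python loops by building a view of every window at once. This is a sketch assuming NumPy 1.20 or newer, where `numpy.lib.stride_tricks.sliding_window_view` is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

small = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
big = np.array([[2, 4, 2, 3, 5], [6, 0, 1, 9, 0], [2, 8, 2, 1, 0], [7, 7, 4, 2, 1]])

# windows[i, j] is the small.shape block of big whose top-left corner is (i, j)
windows = sliding_window_view(big, small.shape)
# broadcast small against every window and sum |difference| over the last two axes
results = np.abs(windows - small).sum(axis=(2, 3))
# (row, col) offset of the smallest difference
pos_min = np.unravel_index(np.argmin(results), results.shape)
print(results)
print(pos_min, results[pos_min])
```

The window view itself does not copy data, but `windows - small` allocates an array of shape (rows, cols, h, w), so for very large inputs the explicit loop may use less memory.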

Related

How can I make my output group all numbers together?

So I'm trying to find out how to group similar numbers into different lists. I tried looking at some sources like (Grouping / clustering numbers in Python),
but all of them require importing itertools and using itertools.groupby, which I don't want, because I don't want to use built-in functions.
Here is my code so far.
def n_length_combo(lst, n):
    if n == 0:
        return [[]]
    l = []
    for i in range(0, len(lst)):
        m = lst[i]
        remLst = lst[i + 1:]
        for p in n_length_combo(remLst, n - 1):
            l.append([m] + p)
    return l
print(n_length_combo(lst=[1,1,76,45,45,4,5,99,105],n=3))
Edit: n (an int) is the number of groups allowed for a single list, so if n is 3, the numbers are grouped as (x,...), (x,...), (x,...); if n = 2, they are grouped as (x,...), (x,...).
However, my code prints all possible combinations of n elements from the list, and it doesn't group the numbers together. What I want is, for instance, if the input is
[10,12,45,47,91,98,99]
and if n = 2, the output would be
[10,12,45,47] [91,98,99]
and if n = 3, the output would be
[10,12] [45,47] [91,98,99]
What changes to my code should I make?
Assuming n is the number of groups/partitions you want:
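import math
def partition(nums, n):
    partitions = [[] for _ in range(n)]
    min_, max_ = min(nums), max(nums)
    r = max_ - min_  # range of the numbers
    s = max(1, math.ceil(r / n))  # size of each bucket/partition (at least 1, so equal numbers don't divide by zero)
    for num in nums:
        # clamp to the last bucket so that num == max_ can't index past the end
        p = min(int((num - min_) // s), n - 1)
        partitions[p].append(num)
    return partitions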
nums = [10,12,45,47,91,98,99]
print(partition(nums, 2))
print(partition(nums, 3))
prints:
[[10, 12, 45, 47], [91, 98, 99]]
[[10, 12], [45, 47], [91, 98, 99]]
You are trying to convert a 1d array into a 2d array. Forgive the badly named variables, but the general idea is as follows. First we work out how many elements each inner list (row count) of the 2d array needs, given the length of the 1d array and the desired number of cols; if this does not divide evenly, we add one. Then there is an outer loop over the cols and, inside it, an inner loop over the rows. We map the current position (r, c) of the 2d array to an index into the 1d array: if that index is out of bounds, we put 0 (or None, or -1, or do nothing at all); otherwise we copy the value from the 1d array. Strictly speaking, the inner loop builds a 1d list, which is appended to lst2 when that loop finishes.
def transform2d(lst, cols):
    size = len(lst)
    rows = size // cols
    if cols * rows < size:
        rows += 1
    lst2 = []
    for c in range(cols):
        a2 = []
        for r in range(rows):
            i = c * rows + r  # map position (r, c) to an index into the 1d list
            if i < size:
                a2.append(lst[i])
            else:
                a2.append(0)  # default value
        lst2.append(a2)
    return lst2
i = [10,12,45,47,91,98,99]
r = transform2d(i, 2)
print(r)
r = transform2d(i, 3)
print(r)
For n = 2 this prints [[10, 12, 45, 47], [91, 98, 99, 0]], which is the output you specified except for the 0 padding the extra element; the padding can be dropped by removing the else branch. Note that this splits the list by position into equal-sized chunks, not by value, so for n = 3 it prints [[10, 12, 45], [47, 91, 98], [99, 0, 0]].
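A different technique (not what either answer uses) is to sort the numbers and cut at the n - 1 largest gaps between neighbouring values, so that close values always stay together wherever fixed bucket boundaries would fall. A sketch using only built-ins such as sorted; `group_by_gaps` is a hypothetical name:

```python
def group_by_gaps(nums, n):
    """Split the sorted numbers into (at most) n groups at the n - 1 widest gaps."""
    nums = sorted(nums)
    if n <= 1 or len(nums) < 2:
        return [nums]
    # gap i lies between nums[i] and nums[i + 1]; widest gaps first
    gaps = sorted(range(len(nums) - 1),
                  key=lambda i: nums[i + 1] - nums[i],
                  reverse=True)
    cuts = sorted(gaps[:n - 1])  # cut positions, left to right
    groups, start = [], 0
    for i in cuts:
        groups.append(nums[start:i + 1])
        start = i + 1
    groups.append(nums[start:])
    return groups

nums = [10, 12, 45, 47, 91, 98, 99]
print(group_by_gaps(nums, 2))  # [[10, 12, 45, 47], [91, 98, 99]]
print(group_by_gaps(nums, 3))  # [[10, 12], [45, 47], [91, 98, 99]]
```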

conditional sum over a range python

I have created X as follows:
num_locations = 2
X = []
for n in range(num_locations):
    X.append([0 for j in range(num_locations)])
Now I want to sum these X[n][m] values for the cases where n != m, so that the result is
X[0][1]+X[1][0]
Is there a way to do that with the sum formulation?
sum(X[n][m] for n in range(num_locations) for m in range(num_locations))
This is effectively taking the sum of the non-diagonal elements of your 2D array. One option using Numpy could simply be to subtract the sum of the main diagonal (np.trace) from the sum of the entire array.
num_locations = 2
X= [[1,2],[2,1]]
import numpy as np
s = np.sum(X) - np.trace(X)
print(s)
Outputs:
4
You can simply use enumerate
>>> sum(o for i, a in enumerate(X) for j, o in enumerate(a) if i!=j)
0
where i and j are the row (1st dim) and column (2nd dim) indices respectively. The result is 0 here only because X from the question contains only zeros.
This should work:
sum([sum(row) - (row[i] if i < len(row) else 0) for i, row in enumerate(X)])
It runs over every row of the 2d array, sums it, and then subtracts the i-th cell (the diagonal element, if it exists) so that it is not included in the sum.
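For completeness, the sum formulation from the question works as well once it gets an `if n != m` filter. A minimal sketch, reusing the example values from the NumPy answer:

```python
num_locations = 2
X = [[1, 2], [2, 1]]  # same example values as the NumPy answer

# the generator from the question, with the diagonal filtered out
s = sum(X[n][m]
        for n in range(num_locations)
        for m in range(num_locations)
        if n != m)
print(s)  # 4
```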

NumPy: Sparse outer product of n vectors (hyperbolic cross)

I'm trying to compute a certain subset of the full outer product of n vectors. The computation of the full outer product is described in this question.
Formally: Let v1,v2,...,vk be vectors of some length n, and K be a positive constant. I want a list containing all the products v1[i1]v2[i2]...vk[ik] for which i1*i2*...*ik <= K (indices start at one). Note: For example, if K = n ** k, the list would contain every combination.
My current approach is to create a hierarchical list of the indices fulfilling the condition above and then calculating the products recursively, which has the advantage of reusing some factors.
This implementation is a lot slower than the computation of the full outer product using NumPy (for same n and k). I want to achieve a better performance than the computation of the full product. I'm interested in larger values for k, and small K (this problem comes from function approximation with sparse bases, i.e. hyperbolic cross).
Does anyone know a more performant way to get this list? Maybe by using more NumPy or another algorithm? I will try a C implementation next.
Here is my current implementation:
import numpy as np
from functools import reduce
def get_cross_indices(n, k, K):
    """
    Assume k > 0.
    Returns a hierarchical list containing elements of type
    (i1, list) with
    - i1 being an index (zero based!)
    - list being again a list (possibly empty) with all indices i2, such
    that (i1+1) * (i2+1) * ... * (ik+1) <= K (going down the hierarchy)
    """
    if k == 1:
        num = min(n, K)
        return (num, [(x, []) for x in range(num)])
    else:
        indices = []
        nums = 0
        for i in range(min(n, K)):
            (num, tail) = get_cross_indices(n,
                                            k - 1, K // (i + 1))
            indices.append((i, tail))
            nums += num
        return (nums, indices)
def calc_cross_outer_product(vectors, result, factor, indices, pos):
    """
    Fills the result list recursively with all products
    vectors[0][i1] * ... * vectors[k-1][ik]
    such that i1,...,ik is a feasible index sequence
    from `indices` (they are in there hierarchically,
    also see `get_cross_indices`).
    """
    for (x, sub) in indices:
        if not sub:
            result[pos] = factor * vectors[0][x]
            pos += 1
        else:
            pos = calc_cross_outer_product(vectors[1:], result,
                                           factor * vectors[0][x], sub, pos)
    return pos
k = 3 # number of vectors
n = 4 # vector length
K = 3
# using random values here just for demonstration purposes
vectors = np.random.rand(k, n)
# get all indices which meet the condition
(count, indices) = get_cross_indices(n, k, K)
result = np.ones(count)
calc_cross_outer_product(vectors, result, 1, indices, 0)
## Equivalent version ##
alt_result = np.ones(count)
# create full outer products
outer_product = reduce(np.multiply, np.ix_(*vectors))
pos = 0
for inds in np.ndindex((n,)*k):
    # current index set is feasible?
    if np.prod(np.array(inds) + 1) <= K:
        # compute [ vectors[0][inds[0]],...,vectors[k-1][inds[k-1]] ]
        values = [vectors[d][i] for d, i in enumerate(inds)]
        alt_result[pos] = np.prod(values)
        pos += 1
To get a visual idea of the indices I'm interested in, here is a picture for k=3, K=n:
[Figure: the hyperbolic-cross index set for k=3, K=n (taken from this website)]
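One possible way to lean more on NumPy (a sketch, not benchmarked here) is to build the feasible index tuples level by level with a boolean mask, then gather every factor with a single fancy-indexing step and multiply along the last axis. `hyperbolic_cross_indices` and `sparse_outer_product` are hypothetical names; rows come out in lexicographic order, the same order as the recursive version above.

```python
import numpy as np

def hyperbolic_cross_indices(n, k, K):
    """Zero-based index tuples (i1, ..., ik) with (i1+1)*...*(ik+1) <= K.

    Rows are returned in lexicographic order.
    """
    idx = np.arange(min(n, K)).reshape(-1, 1)  # all feasible 1-tuples
    for _ in range(k - 1):
        partial = np.prod(idx + 1, axis=1)               # product so far, per row
        nxt = np.arange(n)                                # candidate next index
        ok = partial[:, None] * (nxt[None, :] + 1) <= K   # feasibility mask
        rows, cols = np.nonzero(ok)                       # row-major, so lexicographic
        idx = np.column_stack([idx[rows], nxt[cols]])
    return idx

def sparse_outer_product(vectors, K):
    """Products vectors[0][i1] * ... * vectors[k-1][ik] over the feasible indices."""
    vectors = np.asarray(vectors)
    k, n = vectors.shape
    idx = hyperbolic_cross_indices(n, k, K)
    # factors[r, d] is vectors[d, idx[r, d]]
    factors = vectors[np.arange(k), idx]
    return np.prod(factors, axis=1)

vectors = np.random.default_rng(0).random((3, 4))
print(sparse_outer_product(vectors, 3).shape)  # (7,)
```

Each level only materializes a (feasible tuples) x n mask, so the work grows with the number of feasible tuples rather than with n ** k; whether that beats the full outer product in practice depends on n, k and K.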

Find m "smallest" elements from m by n np.array

I have a two-dimensional ndarray of size m by n (m <= n) like the following:
a = [ [1, 2, 3],
[4, 5, 6] ]
Now I want to greedily find m "smallest" elements from the array, with the restriction that only one element may be chosen from each row and each column, each time choosing the global minimum of what remains. My code is as follows:
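import numpy as np
a = np.array(a)
m, n = a.shape
result = []
for k in range(m):
    index = np.argmin(a)
    i, j = divmod(index, n - k)  # after k deletions the array has n - k columns
    result.append(a[i][j])
    a = np.delete(np.delete(a, i, 0), j, 1)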
So I would get result = [1, 5]. Is there a better way to represent the input array a, and a faster algorithm to find these elements?
I just tried an alternative approach:
import numpy as np
import timeit
nmin = 2000 # number of the smallest values to find in a matrix with unique row and column indexes
nrows = 2000 # number of rows
ncols = 2000 # number of columns
print("Select {} smallest values from {} x {} matrix".format(nmin, nrows, ncols))
matrix = np.random.uniform(0, 1, size = nrows * ncols).reshape(nrows, ncols) # sample 2D array
#print(matrix)
# ALTERNATIVE: sort once and track-and-skip visited rows and columns
startedat = timeit.default_timer()
seenrows = set()
seencols = set()
order = (divmod(index, ncols) for index in np.argsort(matrix, None))
for iter in range(nmin):
    while True:
        try:
            current = next(order)
        except StopIteration:
            break
        if current[0] not in seenrows and current[1] not in seencols:
            #print(iter, current, matrix[current[0]][current[1]])
            seenrows.add(current[0])
            seencols.add(current[1])
            break
alternative = timeit.default_timer() - startedat
print("Alternative approach took: ", alternative)
# ORIGINAL: repeatedly find minimum and update matrix
startedat = timeit.default_timer()
for k in range(nmin):
    index = np.argmin(matrix)
    i, j = divmod(index, np.shape(matrix)[1])
    #print(k, (i, j), matrix[i][j])
    matrix = np.delete(np.delete(matrix, i, 0), j, 1)
    if matrix.size == 0: break
original = timeit.default_timer() - startedat
print(" Original approach took: ", original, "WINNER" if original < alternative else "TIE" if original == alternative else "LOSER")
With the following result:
Select 2 smallest values from 2000 x 2000 matrix
Alternative approach took: 0.737312265981
Original approach took: 0.0572765855289 WINNER
Select 20 smallest values from 2000 x 2000 matrix
Alternative approach took: 0.732718787079
Original approach took: 0.564769882057 WINNER
Select 200 smallest values from 2000 x 2000 matrix
Alternative approach took: 0.736015078962
Original approach took: 5.14679721535 LOSER
Select 2000 smallest values from 2000 x 2000 matrix
Alternative approach took: 6.46196502191
Original approach took: 19.2116744154 LOSER
Select 20000 smallest values from 2000 x 2000 matrix
Alternative approach took: 7.90157398272
Original approach took: 19.189003763 LOSER
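The sort-once idea can also be wrapped in a function that actually records the chosen values, so it can be checked against result = [1, 5] from the question. A sketch; `greedy_smallest` is a hypothetical name. It uses boolean arrays instead of sets and a stable sort, so ties are broken in row-major order, the same way repeated np.argmin calls break them:

```python
import numpy as np

def greedy_smallest(a):
    """Greedily pick the smallest remaining value, at most one per row and column."""
    a = np.asarray(a)
    m, n = a.shape
    used_rows = np.zeros(m, dtype=bool)
    used_cols = np.zeros(n, dtype=bool)
    # one stable sort of all cells; equal values keep their row-major order
    order = np.argsort(a, axis=None, kind="stable")
    rows, cols = np.unravel_index(order, a.shape)
    result = []
    for i, j in zip(rows, cols):
        if not used_rows[i] and not used_cols[j]:
            used_rows[i] = used_cols[j] = True
            result.append(a[i, j].item())
            if len(result) == min(m, n):
                break  # every row (or column) has been used
    return result

print(greedy_smallest([[1, 2, 3], [4, 5, 6]]))  # [1, 5]
```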

Quiver vector from high value toward low value

What I would like to plot is vectors pointing from high values to low values.
If the code starts from:
a = [[1, 8, 9, 10],[2, 15, 3, -1],[3,1,6,11],[13,15,5,-2]]
X,Y = np.meshgrid(np.arange(4), np.arange(4))
U = ?
V = ?
From this point, I need to set the U and V components of the vectors.
The magnitude at each point would be a[x][y]. I don't have much idea how to set U and V so that the arrows point from high to low values at each grid point.
Here's a solution (doesn't require numpy):
import itertools as it
a = [[1, 8, 9, 10],[2, 15, 3, -1],[3,1,6,11],[13,15,5,-2]]
rowSize = len(a[0])
maxVal = a[0][0]
maxIndex = 0
minVal = a[0][0]
minIndex = 0
for k, v in enumerate(it.chain(*a)): # Loop through a flattened list of the values in the array, and locate the indices of the max and min values.
    if v > maxVal:
        maxVal = v
        maxIndex = k
    if v < minVal:
        minVal = v
        minIndex = k
U = (minIndex % rowSize) - (maxIndex % rowSize)
V = (minIndex // rowSize) - (maxIndex // rowSize)
print(U, ",", V)
OUTPUT
2 , 2
Note that you haven't defined what behavior you want when there are two equal maximum values, as there are in your example. The code above uses the "first" (upper-leftmost) one as the true maximum, and ignores all others.
Explanation:
I flattened the list, which means I read the values the way you would read the words in a book: first the first row, then the second row, then the third row. Each value got a single index, like so:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
For example, a value in the second row and the third column would get an index of 6, since it's the 7th value if you read the array like a book.
At the end, when we've found the index of the max or min value, we need to get 2D coordinates from the 1D index. So, we can use the mod operator (%) to get the x-value.
For example, 6 % 4 = 2, so X = 2 (the 3rd column)
To get the Y value, we use the integer (floor) division operator (//).
For example, 6 // 4 = 1, so Y = 1 (the second row)
The formulas for U and V are simply taking the X and Y values for the max and min and subtracting them to get the vector coordinates, like so:
U = xMin - xMax
V = yMin - yMax
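The same index arithmetic can be written with Python's built-in divmod, which returns the floor quotient and the remainder together. A small sketch; `index_to_xy` and `quiver_vector` are hypothetical helper names, and 5 and 15 are the flattened positions of the first 15 and of -2 in the example array:

```python
def index_to_xy(index, row_size):
    """Convert a row-major 1D index to (x, y) = (column, row)."""
    y, x = divmod(index, row_size)  # (index // row_size, index % row_size)
    return x, y

def quiver_vector(max_index, min_index, row_size):
    """(U, V) pointing from the max cell to the min cell."""
    x_max, y_max = index_to_xy(max_index, row_size)
    x_min, y_min = index_to_xy(min_index, row_size)
    return x_min - x_max, y_min - y_max

print(index_to_xy(6, 4))        # (2, 1)
print(quiver_vector(5, 15, 4))  # (2, 2)
```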
If you're wondering, "why the heck didn't he just use the meshgrid solution I started with", there are two reasons: One, using a non-standard library like numpy is generally undesirable if there is an easy way to solve the problem without non-standard libraries, and two, if you ever need to deal with large arrays, generating a large meshgrid could become time/memory expensive.
Solution that picks shortest vector:
import itertools as it
a = [[1, 8, 9, 10],[2, 15, 3, -1],[3,1,6,11],[13,15,5,-2]]
rowSize = len(a[0])
values = sorted(enumerate(it.chain(*a)), key=lambda x:x[1]) # Pair each value with its 1D index, then sort the list.
minVal = values[0][1]
maxVal = values[-1][1]
maxIndices = map(lambda x:x[0], filter(lambda x:x[1]==maxVal, values)) # Get a list of all the indices that match the maximum value
minIndices = map(lambda x:x[0], filter(lambda x:x[1]==minVal, values)) # Get a list of all the indices that match the minimum value
def getVector(index1, index2, rowSize): # A function that translates a pair of 1D index values to a "quiver vector"
    return ((index1 % rowSize) - (index2 % rowSize), (index1 // rowSize) - (index2 // rowSize))
vectors = [getVector(k2, k1, rowSize) for k1, k2 in it.product(maxIndices, minIndices)] # produce a list of the vectors formed by all possible combinations of the 1D indices for maximum and minimum values
U, V = sorted(vectors, key=lambda x:(x[0]*x[0] + x[1]*x[1])**0.5)[0]
print(U, ",", V)
OUTPUT
2 , 0
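If you want one arrow per grid point rather than a single max-to-min vector, a different technique (not the one used above) is the negative gradient, which at every point points in the direction of steepest local decrease. A sketch with `numpy.gradient`, which returns the derivative along axis 0 (rows, i.e. Y) first and along axis 1 (columns, i.e. X) second:

```python
import numpy as np

a = [[1, 8, 9, 10], [2, 15, 3, -1], [3, 1, 6, 11], [13, 15, 5, -2]]
X, Y = np.meshgrid(np.arange(4), np.arange(4))

# derivatives along rows (Y) and columns (X); one-sided differences at the edges
dA_dy, dA_dx = np.gradient(np.asarray(a, dtype=float))

# the gradient points from low to high values, so negate it to point from high to low
U = -dA_dx
V = -dA_dy
```

U and V can then be passed to matplotlib as `plt.quiver(X, Y, U, V)`. Here X is the column index and Y the row index; if you overlay the arrows on `imshow` (which puts row 0 at the top by default), check the arrow direction, since passing `origin='lower'` to imshow keeps rows and Y increasing in the same direction.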
