algorithm to randomize a matrix with uniqueness constraint


I'm trying to develop an algorithm for randomizing an NxN matrix N times with the following constraint: any two values A and B may appear together in the same column at most once across all the resulting matrices. For example, a 3x3 matrix is randomized 3 times with the following result:
matrix #0
[0, 3, 6]
[1, 4, 7]
[2, 5, 8]
matrix #1
[0, 3, 6]
[7, 1, 4]
[5, 8, 2]
matrix #2
[0, 3, 6]
[4, 7, 1]
[8, 2, 5]
The pairing of any two numbers A and B in a given column, say 0 and 1 in column 0 of matrix #0, is unique across all the columns of all the resulting matrices. This condition must hold for every pair of values that share a column.
I developed what I believed was a solution with the following code:
#!/usr/bin/python
w,h = 5,5
matrix_list = []
def rotate(l,n):
    return l[-n:] + l[:-n]
def transpose(l):
    return list(map(list, zip(*l)))
matrix = [[x*w + y for x in range(w)] for y in range(h)]
#matrix = transpose(matrix)
for i in range(w):
    matrix_list.append(matrix[:])
    matrix = [rotate(matrix[n],n) for n in range(w)]
for m in matrix_list:
    for arr in m:
        print(arr)
    print('\n')
On each pass it simply shifts the values of each row n places, where n is the row's index in the matrix.
However, I found that the algorithm does not work whenever N is even and N > 2, as illustrated by the following partial output for a 4x4 matrix (the pairings of values in rows 0 and 2 are repeated):
(from matrix #0)
[0, 4, 8, 12]
[1, 5, 9, 13]
[2, 6, 10, 14]
[3, 7, 11, 15]
(from matrix #2)
[0, 4, 8, 12]
[9, 13, 1, 5]
[2, 6, 10, 14]
[11, 15, 3, 7]
I have tried all sorts of shifting and transposing methods and continue to come up empty. Any assistance in creating a solution for even-dimensioned matrices or a general solution covering both odd and even matrices would be much appreciated.
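The constraint described above can be checked mechanically, which makes it easier to test candidate schemes. The following is only a sketch: the function names column_pairs_unique and rotated_matrices are made up for illustration, and rotated_matrices is my reading of the rotation scheme in the question's code.

```python
from itertools import combinations


def column_pairs_unique(matrices):
    """Return True if no pair of values shares a column more than once."""
    seen = set()
    for matrix in matrices:
        for column in zip(*matrix):  # zip(*matrix) iterates over columns
            for pair in combinations(sorted(column), 2):
                if pair in seen:
                    return False
                seen.add(pair)
    return True


def rotated_matrices(n):
    """Reproduce the question's row-rotation scheme for an n x n matrix."""
    matrix = [[x * n + y for x in range(n)] for y in range(n)]
    result = []
    for _ in range(n):
        result.append(matrix)
        # Rotate row r to the right by r places (row 0 never moves).
        matrix = [row[-r:] + row[:-r] for r, row in enumerate(matrix)]
    return result


for n in (3, 4, 5):
    print(n, column_pairs_unique(rotated_matrices(n)))
```

For n = 3 and n = 5 the check passes, while for n = 4 it fails, matching the repeated pairings shown in the 4x4 output.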

Related

How to sum a numpy along the row axis by including only certain values per row according to variable length indices?

For the following array:
array = [
[1, 5, 6, 8, 10, 3],
[3, 2, 4, 9, 11, 7],
[8, 0, 9, 6, 23, 4]
]
How could we sum the elements (per row) as indicated by these indices:
indices = [
[2, 4, 5],
[1, 3],
[4]
]
that is to say that:
for the first row only the values on indices [2, 4, 5] will be considered when summing up -> (6 + 10 + 3)
for the second row only the values on indices [1, 3] will be considered when summing up -> (2 + 9)
and so on
Output:
array([19, 11, 23])
The output has the same shape as if we did array.sum(axis=1) but not every value is included. Instead, the participants of each row are determined by the indices array.
I have thought of creating a mask for that purpose, but I did not know how to pass the indices to it.

Try this:
import numpy as np
arr = np.array(array)
out = np.array([arr[idx, ind].sum() for idx, ind in enumerate(indices)])
out
Output: array([19, 11, 23])
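Since the question mentions a mask, here is one possible way to build it from the ragged index lists. This is a sketch rather than the answerer's method; the variable names rows, cols and mask are my own.

```python
import numpy as np

array = [
    [1, 5, 6, 8, 10, 3],
    [3, 2, 4, 9, 11, 7],
    [8, 0, 9, 6, 23, 4],
]
indices = [
    [2, 4, 5],
    [1, 3],
    [4],
]

arr = np.array(array)
# Row number for every entry in the flattened index lists: [0, 0, 0, 1, 1, 2]
rows = np.repeat(np.arange(len(indices)), [len(ind) for ind in indices])
# Matching column numbers: [2, 4, 5, 1, 3, 4]
cols = np.concatenate(indices)

mask = np.zeros(arr.shape, dtype=bool)
mask[rows, cols] = True

# Keep only the masked values, then sum along each row.
out = np.where(mask, arr, 0).sum(axis=1)
print(out)
```

This avoids the Python-level loop over rows, which may matter when the array has many rows.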

how can I shuffle node labels and get a new weight vector using NumPy in Python?

I am saving the edge weights of an undirected graph in a row vector. For instance, if I have a graph as pictured below
The vector that I create is [5, 3, 4, 1, 2, 7], ordered by node number in ascending order. Now, if I swap the node labels of nodes 1 and 4, I obtain the following graph:
In this scenario, the vector that I should have is [2, 7, 4, 1, 5, 3]. My question is: if I have an n by m NumPy array, where n is the number of graphs and m is the number of edges, how can I shuffle the node labels for each row and get the updated array efficiently?
Suppose I have a set of graphs consisting of four nodes as shown below. My intention is to randomly shuffle the node labels in each network and then get the updated weights in an array of the same size.
np.random.seed(2)
arr = np.random.randint(10, size=(5, 6))
arr
array([[8, 8, 6, 2, 8, 7],
[2, 1, 5, 4, 4, 5],
[7, 3, 6, 4, 3, 7],
[6, 1, 3, 5, 8, 4],
[6, 3, 9, 2, 0, 4]])

You can do it like this:
import numpy as np
def get_arr_from_edges(a):
    n = int(np.sqrt(len(a) * 2)) + 1
    mask = np.tri(n, dtype=bool, k=-1).T
    out = np.zeros((n, n))
    out[mask] = a
    out += out.T
    return out
def get_edges_from_arr(a):
    mask = np.tri(a.shape[0], dtype=bool, k=-1).T
    out = a[mask]
    return out
def swap_nodes(a, nodes):
    a[:, [nodes[0] - 1, nodes[1] - 1], :] = a[:, [nodes[1] - 1, nodes[0] - 1], :]
    a[:, :, [nodes[0] - 1, nodes[1] - 1]] = a[:, :, [nodes[1] - 1, nodes[0] - 1]]
    return a
arr = np.array([
    [8, 8, 6, 2, 8, 7],
    [2, 1, 5, 4, 4, 5],
    [7, 3, 6, 4, 3, 7],
    [6, 1, 3, 5, 8, 4],
    [6, 3, 9, 2, 0, 4],
])
nodes_to_swap = (1, 4)
# initialize node-arr
node_arrs = np.apply_along_axis(get_arr_from_edges, axis=1, arr=arr)
# swap nodes
node_arrs = swap_nodes(node_arrs, nodes_to_swap)
# return remapped edges
edges = np.array([get_edges_from_arr(node_arr) for node_arr in node_arrs])
print(edges)
Gives the following result:
[[8 7 6 2 8 8]
[4 5 5 4 2 1]
[3 7 6 4 7 3]
[8 4 3 5 6 1]
[0 4 9 2 6 3]]
The idea is to build a connection-matrix from the edges, where the edge-number is saved at the indices of the two nodes.
Then you just swap the columns and rows according to the nodes you want to swap. If you want this process to be random you could create random node pairs and call the function multiple times with these node pairs. This process is non-commutative, so if you want to swap multiple node-pairs then order matters!
After that you read out the remapped edges of the array with the swapped columns and rows (this is basically the inverse of the first step).
I am sure there are further optimizations possible using NumPy's vast functionality.
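If a full random relabelling is wanted instead of a sequence of pairwise swaps, one option is to apply a random permutation to the rows and columns of the connection matrix in a single step. The sketch below assumes complete graphs with edges listed in upper-triangle, row-major order, as in the question; permute_edge_weights is a hypothetical helper name, not part of NumPy.

```python
import numpy as np


def permute_edge_weights(weights, perm):
    """Relabel nodes so that new node i is old node perm[i] (0-based)."""
    weights = np.asarray(weights)
    m = weights.shape[-1]
    # Solve m = n * (n - 1) / 2 for the number of nodes n.
    n = int(round((1 + np.sqrt(1 + 8 * m)) / 2))
    upper = np.triu_indices(n, k=1)
    adj = np.zeros((n, n), dtype=weights.dtype)
    adj[upper] = weights
    adj = adj + adj.T  # symmetric connection matrix
    permuted = adj[np.ix_(perm, perm)]  # permute rows and columns together
    return permuted[upper]


# Swapping nodes 1 and 4 (0-based: 0 and 3) reproduces the question's example.
print(permute_edge_weights([5, 3, 4, 1, 2, 7], [3, 1, 2, 0]))

# One independent random permutation per graph.
rng = np.random.default_rng(0)
arr = np.array([
    [8, 8, 6, 2, 8, 7],
    [2, 1, 5, 4, 4, 5],
    [7, 3, 6, 4, 3, 7],
    [6, 1, 3, 5, 8, 4],
    [6, 3, 9, 2, 0, 4],
])
shuffled = np.array([permute_edge_weights(row, rng.permutation(4)) for row in arr])
print(shuffled)
```

Because a single permutation is applied in one go, the order-dependence of repeated swaps does not come into play.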

In a specific row of a numpy array, how to find column indexes of the top 3 largest values

I have an array X:
X = np.array([[4, 3, 5, 2],
[9, 6, 7, 3],
[8, 6, 7, 5],
[3, 4, 5, 3],
[5, 3, 2, 6]])
I want the indices of the top 3 greatest values in a row with index 1. The result of that would be :
[0,2,1]
I am relatively new to Python. I tried doing it with argsort, but am not able to do it for one specific row.

You can use argsort on axis=1 (by row) and then extract the last 3 indices for each row:
X.argsort(axis=1)[:,:-4:-1]
#[[2 0 1]
# [0 2 1]
# [0 2 1]
# [2 1 3]
# [3 0 1]]

X = np.array([[4, 3, 5, 2],
              [9, 6, 7, 3],
              [8, 6, 7, 5],
              [3, 4, 5, 3],
              [5, 3, 2, 6]])
# get top 3 values in the row with index 1
row_sorted = sorted(X[1], reverse=True)[0:3]
# Find the corresponding index of these top 3 values
indexes = [list(X[1]).index(i) for i in row_sorted]
output:
[0, 2, 1]

For sufficiently large arrays, np.argpartition will be the most efficient solution. It will place the last three elements of the sort indices in the right positions:
i = np.argpartition(X[1], [-3, -2, -1])[:-4:-1]
This behaves similarly to np.argsort except that only the selected indices are in the right place. All the other elements are only guaranteed to be in the correct side relative to each partition point, but not the exact position.
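As a quick check of the argpartition approach, the sketch below wraps it in a small helper (top_k_indices is a name I made up) and compares it with a full argsort on a larger array of distinct values.

```python
import numpy as np

X = np.array([[4, 3, 5, 2],
              [9, 6, 7, 3],
              [8, 6, 7, 5],
              [3, 4, 5, 3],
              [5, 3, 2, 6]])


def top_k_indices(row, k):
    """Indices of the k largest values in row, largest first."""
    kth = list(range(-k, 0))  # e.g. [-3, -2, -1] for k = 3
    # Only the last k positions are guaranteed to be fully sorted.
    return np.argpartition(row, kth)[:-k - 1:-1]


print(top_k_indices(X[1], 3))

# Compare against a full sort on a larger array with distinct values.
rng = np.random.default_rng(0)
values = rng.permutation(1000)
same = np.array_equal(top_k_indices(values, 5), np.argsort(values)[:-6:-1])
print(same)
```

With ties in the data the two methods may legitimately return different, equally valid, indices, which is why the comparison uses distinct values.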

How to build a N*(N+1) matrix with number in range of 1~N*N and totally distributed?

For a given number N, I want to generate a matrix that has N+1 rows, where each row has N columns and each column holds N numbers in the range [1, N^2]. The matrix must have this property: the N numbers in any column are totally distributed in every other row, that is, they fall into N different columns there.
Sorry, English is not my native language. I tried my best to describe the problem clearly; if you have a better description of this problem, please tell me.
For example, for N = 3 I can build a matrix that has 4 rows and 3 columns, with numbers in [1, 3^2]. The matrix is:
[1, 2, 3], [4, 5, 6], [7, 8, 9]
[1, 4, 7], [2, 5, 8], [3, 6, 9]
[1, 5, 9], [2, 6, 7], [3, 4, 8]
[1, 6, 8], [2, 4, 9], [3, 5, 7]
In this example, every row has 3 columns, every column has 3 numbers, and the 3 numbers of a column are distributed over 3 different columns in every other row. Take the 2nd column of the 2nd row ([2, 5, 8]) as an example: the three numbers 2, 5 and 8 are in different columns in every other row. No other column contains [2, 5], [5, 8] or [2, 8], but every column in the other rows contains exactly one of the three numbers.
[1, 2, 3], [4, 5, 6], [7, 8, 9]
[1, 4, 7], [2, 5, 8], [3, 6, 9]
[1, 5, 9], [2, 6, 7], [3, 4, 8]
[1, 6, 8], [2, 4, 9], [3, 5, 7]
I have found a fast way to build a matrix like this when N is a prime number.
But when N is not a prime number, I can only find an exhaustive method, and it's an O((N^(N-1)^N) algorithm. I can build a matrix in 5 seconds when N is 4, but it would take 328 days when N is 5.
This is what I build when N = 4:
[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]
[1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15], [4, 8, 12, 16]
[1, 6, 11, 16], [2, 5, 12, 15], [3, 8, 9, 14], [4, 7, 10, 13]
[1, 7, 12, 14], [2, 8, 11, 13], [3, 5, 10, 16], [4, 6, 9, 15]
[1, 8, 10, 15], [2, 7, 9, 16], [3, 6, 12, 13], [4, 5, 11, 14]
I want to find out how to build the matrix for N = 100 or other larger numbers. Can anyone solve this problem?
The following is how I build the matrix when N is a prime number, again using an example.
For example, N = 3:
The first row is always consecutive: [1,2,3],[4,5,6],[7,8,9]
For the following rows, I pick numbers from the first row with a different offset for each row.
The following is my code for generating the matrix when N is prime. But I think there must be another way to generate the matrix when N is not prime.
#!/bin/env python
def main(n):
    data = []
    for i in range(0, n):
        data.append([n*i + x for x in range(0, n)])
    print(data) # the first row
    offset = 0
    while offset < n:
        row = []
        for i in range(0, n):
            idx = i
            grid = []
            for j in range(0, n):
                grid.append(data[j][idx%n])
                idx += offset # every row I use a different step. It works for prime.
            row.append(grid)
        print(row)
        offset += 1
if __name__ == '__main__':
    main(7)

Short answer: this is a known and studied problem in the field of combinatorics, and (without wanting to discourage you) seems to be very hard to solve computationally. For N a prime or prime power it's easy to generate examples, once you know how. For N=6 or N=10, it's known that there are no solutions. For many other N (e.g., N=12, N=15, etc.), people have searched, but no-one knows whether there are solutions in general.
Longer answer: What you describe corresponds to a finite affine plane. This is a finite set of "points", together with a finite collection of "lines" (which for simplicity we can think of as being subsets of the set of points), satisfying the following axioms:
1. For any two points, there's a unique line that contains those two points.
2. Given any line L and any point P not on L, there's a unique line M that's parallel to L and goes through (i.e., contains) P. (Here M and L are considered parallel if they have no points in common - i.e., they don't intersect.)
3. There's a configuration of 4 points such that no three are collinear.
To make this correspond to your example: in the 3x3 case, your "points" are the numbers 1 through 9. Your "lines" are the "columns", and each row in your configuration gives a collection of mutually parallel lines.
Axiom 1 above roughly corresponds to your "totally distributed" property; axiom 2 is what lets you organise your "columns" into rows so that each row contains each number exactly once. Axiom 3 isn't all that interesting: it's a non-degeneracy condition designed to exclude degenerate configurations that are permitted under the first two axioms, but otherwise don't have much in common with the non-degenerate cases.
If you start searching, you'll find lots of results for finite projective planes, but fewer for finite affine planes. That's because any affine plane can easily be completed to a projective plane, by adding a line of points at infinity. Conversely, given a finite projective plane, you can remove a line and all the points on it to get an affine plane. So if you can create finite projective planes, you can create finite affine planes, and vice versa.
Here's an example of that completion process starting from the affine plane you have for N=3. You had:
[1, 2, 3], [4, 5, 6], [7, 8, 9]
[1, 4, 7], [2, 5, 8], [3, 6, 9]
[1, 5, 9], [2, 6, 7], [3, 4, 8]
[1, 6, 8], [2, 4, 9], [3, 5, 7]
We add four new points ("points at infinity") which we'll call "A", "B", "C" and "D". Every current line gets a new point (one of these points at infinity) added to it, and we also get one new line, consisting of exactly these new points at infinity. Note that any two lines which were previously parallel (i.e., were in the same row above) have been completed with the same point at infinity, so now we have a very concrete meaning for the oft-heard phrase "two parallel lines meet at infinity". The new structure looks like this:
[1, 2, 3, A], [4, 5, 6, A], [7, 8, 9, A]
[1, 4, 7, B], [2, 5, 8, B], [3, 6, 9, B]
[1, 5, 9, C], [2, 6, 7, C], [3, 4, 8, C]
[1, 6, 8, D], [2, 4, 9, D], [3, 5, 7, D]
[A, B, C, D]
So now we have 13 points, and 13 lines, such that for every pair of distinct points there's a unique line through those two points, and for every pair of distinct lines, those lines intersect in exactly one point. And this beautifully symmetric situation is pretty much exactly what a finite projective plane is (modulo another non-degeneracy condition). In this case, we've just described the (unique up to isomorphism) finite projective plane of order 3.
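The completion step can be written out in a few lines. This is just a sketch of the process described above, with the N=3 rows hard-coded; the function name complete_to_projective is my own.

```python
from itertools import combinations

# The affine plane for N = 3: each row is one class of mutually parallel lines.
affine_rows = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 4, 7], [2, 5, 8], [3, 6, 9]],
    [[1, 5, 9], [2, 6, 7], [3, 4, 8]],
    [[1, 6, 8], [2, 4, 9], [3, 5, 7]],
]


def complete_to_projective(rows):
    """Add one point at infinity per parallel class, plus the line at infinity."""
    points_at_infinity = [chr(ord("A") + i) for i in range(len(rows))]
    lines = [line + [inf]
             for row, inf in zip(rows, points_at_infinity)
             for line in row]
    lines.append(points_at_infinity)
    return lines


projective_lines = complete_to_projective(affine_rows)
for line in projective_lines:
    print(line)

# Every pair of distinct lines should now meet in exactly one point.
meets_once = all(len(set(a) & set(b)) == 1
                 for a, b in combinations(projective_lines, 2))
print(meets_once)
```

The final check confirms the "any two lines intersect in exactly one point" property for this example.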
Here are some known facts about finite projective planes of order n (where the n here corresponds exactly to your N):
- if n is a prime or prime power, there's a finite projective plane of order n; this can be created directly and simply from the finite field of order n, which is what your algorithm already does in the case where n is prime
- there are also finite projective planes that don't arise this way: so-called non-Desarguesian planes. There are three known non-Desarguesian planes of order 9, for example
- there are no finite projective planes of orders 6 or 10 (the latter was proved by a computer search that took ~2000 hours of supercomputer time in the late 1980s)
- it's unknown whether there's a finite projective plane of order 12 (though it's conjectured that there isn't)
- there's no known finite projective plane whose order is not either a prime or a prime power
- some orders (including n=14) can be ruled out directly by the Bruck-Ryser-Chowla theorem
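For the last point, the criterion is easy to test by hand or by machine. In its usual statement for projective planes (the Bruck-Ryser theorem), if n is congruent to 1 or 2 modulo 4 and n is not a sum of two integer squares, then no projective plane of order n exists. A small sketch, with function names of my own choosing:

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """True if n = a*a + b*b for some integers a, b >= 0."""
    return any(isqrt(n - a * a) ** 2 == n - a * a
               for a in range(isqrt(n) + 1))


def excluded_by_bruck_ryser(n):
    """True if the Bruck-Ryser criterion rules out a plane of order n."""
    return n % 4 in (1, 2) and not is_sum_of_two_squares(n)


print([n for n in range(2, 31) if excluded_by_bruck_ryser(n)])
```

Note that a False result says nothing either way: order 10 passes this test (10 = 9 + 1) and still has no plane, which is why it needed the computer search mentioned above.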
Here's some code that constructs the solution for N=4 directly as the affine plane over the finite field with four elements, which I'll call GF4. First we need an implementation of that field. The one below is perhaps unnecessarily non-obvious, and was chosen for the simplicity of the multiplication operation.
class GF4:
    """
    A quick and dirty implementation of the finite field
    (Galois field) of order 4. Elements are GF4(0), GF4(1),
    GF4(8), GF4(9). This representation was chosen for the
    simplicity of implementation of multiplication.
    """
    def __init__(self, bits):
        self.bits = bits
    def __add__(self, other):
        return GF4(self.bits ^ other.bits)
    __sub__ = __add__ # because we're in characteristic 2
    def __mul__(self, other):
        return GF4(self.bits * other.bits % 55 & 9)
    def __eq__(self, other):
        return self.bits == other.bits
    def __hash__(self):
        return hash(self.bits)
Now we construct the scalars over the field, then use those to create first the collection of all points in the plane (just pairs of scalars), then the collection of all lines in the plane (by enumerating pairs of points):
# Our scalars are all four elements of GF4.
scalars = list(map(GF4, [0, 1, 8, 9]))
# Points are simply pairs of scalars
points = [(x, y) for x in scalars for y in scalars]
# Every pair of nonequal points determines a line.
def line_through(p, q):
"""
Return a frozenset of all the points on the line through p and q.
"""
# We want our lines to be hashable, so use a frozenset.
return frozenset(
(p[0] + t*(q[0] - p[0]), p[1] + t*(q[1] - p[1]))
for t in scalars
)
# Create all lines; every line is created multiple times, so use
# a set to get unique lines.
lines = {line_through(p, q) for p in points for q in points if p != q}
Our points are currently pairs of objects of type GF4; to show the correspondence with your problem, we want to relabel those, replacing the points with integers 1 through 16:
relabel = dict(zip(points, range(1, 17)))
# Keep the lines hashable (tuples in a set) so they can be grouped below.
lines = {tuple(sorted(map(relabel.get, line))) for line in lines}
We can now print the lines one by one, but to get your rows, we also need to group lines into mutually parallel groups:
def parallel(l, m):
    """Return True if l and m are parallel, else False."""
    return not(set(l) & set(m))
rows = []
while lines:
    l = lines.pop()
    parallel_to_l = {m for m in lines if parallel(m, l)}
    lines -= parallel_to_l
    rows.append(sorted({l} | parallel_to_l))
And now we can print the results, sorting for friendliness:
for row in sorted(rows):
    print(row)
Here's the output; it's essentially identical to the output you computed.
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16)]
[(1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15), (4, 8, 12, 16)]
[(1, 6, 11, 16), (2, 5, 12, 15), (3, 8, 9, 14), (4, 7, 10, 13)]
[(1, 7, 12, 14), (2, 8, 11, 13), (3, 5, 10, 16), (4, 6, 9, 15)]
[(1, 8, 10, 15), (2, 7, 9, 16), (3, 6, 12, 13), (4, 5, 11, 14)]
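For prime N no custom field class is needed, because arithmetic modulo N already forms a field. The sketch below builds the rows as the vertical lines plus the lines y = slope*x + b (mod p); the function names and the 1-based labelling are my own choices, picked to line up with the N=3 example in the question.

```python
from itertools import combinations


def affine_plane_rows(p):
    """Rows of the affine plane over the integers mod p (p must be prime)."""
    def label(x, y):
        return x * p + y + 1  # number the points 1 .. p*p

    # First row: the vertical lines x = constant.
    rows = [[[label(x, y) for y in range(p)] for x in range(p)]]
    # One further row per slope: the lines y = slope*x + b (mod p).
    for slope in range(p):
        rows.append([[label(x, (slope * x + b) % p) for x in range(p)]
                     for b in range(p)])
    return rows


for row in affine_plane_rows(3):
    print(row)
```

For a prime power such as 4, 8 or 9 the same idea applies, but the modular arithmetic has to be replaced by arithmetic in the corresponding finite field, as done with GF4 above.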

Python 2D array sum enumeration

I'm trying to iterate through a 2D array getting the sum for each list inside the array. For example I have:
test = [[5, 3, 6], [2, 1, 3], [1, 1, 3], [2, 6, 6], [4, 5, 3], [3, 6, 2], [5, 5, 2], [4, 4, 4], [3, 5, 3], [1, 3, 4]]
I want to take the values of each smaller array, so for example 5+3+6 and 2+1+3 and put them into a new array. So I'm aiming for something like:
testSum = [14, 6, 5, 14...].
I'm having trouble properly enumerating through a 2D array. It seems to jump around. I know my code's not correct, but this is what I have so far:
k = 10
m = 3
testSum = []
#create array with 10 arrays of length 3
test = [[numpy.random.randint(1,7) for i in range(m)] for j in range(k)]
sum = 0
#go through each sub-array in test array
for array in test:
    #add sums of sub-arrays
    for i in array
        sum += test[array][i]
    testSum.append(sum)

You can do this in a more Pythonic way:
In [17]: print([sum(i) for i in test])
[14, 6, 5, 14, 12, 11, 12, 12, 11, 8]
or
In [19]: print(list(map(sum, test)))
[14, 6, 5, 14, 12, 11, 12, 12, 11, 8]

Since you're using Numpy, you should let Numpy handle the looping: it's much more efficient than using explicit Python loops.
import numpy as np
k = 10
m = 3
test = np.random.randint(1, 7, size=(k, m))
print(test)
print('- ' * 20)
testSum = np.sum(test, axis=1)
print(testSum)
typical output
[[2 5 1]
[1 5 5]
[6 5 3]
[1 1 1]
[2 5 6]
[4 2 5]
[3 3 1]
[6 4 6]
[2 5 1]
[6 5 2]]
- - - - - - - - - - - - - - - - - - - -
[ 8 11 14 3 13 11 7 16 8 13]
As for the code you posted, it has a few problems. The main one being that you need to set the sum variable to zero for each sub-list. BTW, you shouldn't use sum as a variable name because that shadows Python's built-in sum function.
Also, your array access is wrong. (And you shouldn't use array as a variable name either, since it's the name of a standard module).
for array in test:
    for i in array:
iterates over each list in test and then over each item in each of those lists, so i is already an item of an inner list. So in
sum += test[array][i]
you are attempting to index the test list with a list instead of an integer, and then you're trying to index the result of that with the current item in i.
(In other words, in Python, when you iterate over a container object in a for loop the loop variable takes on the values of the items in the container, not their indices. This may be confusing if you are coming from a language where the loop variable gets the indices of those items. If you want the indices you can use the built-in enumerate function to get the indices and items at the same time).
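To illustrate the point about enumerate, here is a small sketch using the first three sub-lists from the question; the variable names are my own.

```python
test = [[5, 3, 6], [2, 1, 3], [1, 1, 3]]

sums = []
for row_index, row in enumerate(test):
    total = 0
    for col_index, value in enumerate(row):
        # value is the item itself; the indices are only needed for lookups.
        assert test[row_index][col_index] == value
        total += value
    sums.append(total)

print(sums)
```

The indices are not actually needed to compute the sums; they are shown only to make the difference between items and indices visible.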
Here's a repaired version of your code.
import numpy as np
k = 10
m = 3
#create array with 10 arrays of length 3
test = [[np.random.randint(1,7) for i in range(m)] for j in range(k)]
print(test)
print()
testSum = []
#go through each sub-array in test array
for array in test:
    #add sums of sub-arrays
    asum = 0
    for i in array:
        asum += i
    testSum.append(asum)
print(testSum)
typical output
[[4, 5, 1], [3, 6, 6], [3, 4, 1], [2, 1, 1], [1, 6, 4], [3, 4, 4], [3, 2, 6], [6, 3, 2], [1, 3, 5], [5, 3, 3]]
[10, 15, 8, 4, 11, 11, 11, 11, 9, 11]
As I said earlier, it's much better to use Numpy arrays and let Numpy do the looping for you. However, if your program is only processing small lists there's no need to use Numpy: just use the functions in the standard random module to generate your random numbers and use the technique shown in Rahul K P's answer to calculate the sums: it's more compact and faster than using a Python loop.
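A minimal sketch of that standard-library variant might look like this. The seed is only there to make the run repeatable, and note that random.randint includes its upper bound, unlike numpy.random.randint.

```python
import random

k = 10
m = 3

random.seed(0)  # optional: makes the run repeatable
# random.randint(1, 6) includes both endpoints, like a die roll.
test = [[random.randint(1, 6) for _ in range(m)] for _ in range(k)]
test_sums = [sum(row) for row in test]

print(test)
print(test_sums)
```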

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
