Optimization problem of connection between nodes - python

I have a problem that I am struggling to solve efficiently. I have two sets of nodes, A_1, ..., A_m and B_1, ..., B_m. Each node from one set can be connected to a node from the other set with a given constant probability p. What I want is to keep the maximum number of links between set A and set B, under the condition that at most one link remains at each single node of A and B.
For example, if A_i is connected to both B_j and B_j', then we have to remove the link with either B_j or B_j'.
The successful link between A_i and B_j can be stored in a matrix where M_{ij} = 1 if there is a link between A_i and B_j, 0 if not.
This matrix can, for example, be simulated by the following code for a matrix of dimension 5 and a probability 0.7 of a successful link (in Python):
import numpy as np
m = 5
proba = 0.7
M = np.random.choice([0, 1], size=(m, m), p=[1 - proba, proba])
This can give for example the matrix:
M =
[[0 1 0 1 0]
[1 1 1 0 1]
[1 1 1 0 0]
[0 0 0 1 0]
[1 1 1 1 1]]
What I want is to implement the transformation of this matrix that satisfies the condition of at most one link per node while maximizing the number of links. This transformation converts M into M_2.
Here's the condition on M_2:
In the end, there are k <= m links between set A and set B, since each node of A can ultimately connect to only one node (or zero nodes) of set B.
In matrix terms, this is a transformation from M to M_2 where M_2_{ij} is equal either to M_{ij} or to 0. If M_2_{i0,j0} = 1, then M_2_{i,j0} = 0 for all i != i0 and M_2_{i0,j} = 0 for all j != j0. It means that there is at most one non-zero term per row and per column.
It is easy to remove the other terms of a row once one term has been kept. The hard part of what I want my code to do is to maximize the number of non-zero terms of the matrix M_2 while respecting the conditions on M_2.
I've managed, painfully, to do the "automatic reduction part": if you find a row with only one non-zero term, you remove the other non-zero terms of the associated column, and conversely. I've done it repeatedly, until the transformed matrix stays the same. (The code is at the end, and it is quite complicated to understand, because I think that I've not found the elegant way to do it...)
With this code there are some improvements:
M_2' =
[[0 1 0 0 0]
[1 0 1 0 1]
[1 0 1 0 0]
[0 0 0 1 0]
[1 0 1 0 1]]
There are fewer non-zero terms in this matrix compared to M, but it still does not respect the conditions.
What I want is to automatically do the final step to find the matrix M_2 which by hand should look like this:
M_2 =
[[0 1 0 0 0]
[1 0 0 0 0]
[0 0 1 0 0]
[0 0 0 1 0]
[0 0 0 0 1]]
But I have no idea how to do this final optimization automatically and efficiently... Does anyone have ideas on how to achieve that?
Thank you in advance.
Paul
The code for the straightforward transformation that gave me M_2' is the following (if you have any improvement for a more pythonic or efficient way to write it, I'd be happy to see it):
def remove_straightforward_terms(M):
    """Transform the matrix M by removing a maximum of non-zero terms."""
    # Indices of the non-zero (=1) terms:
    ind_i, ind_j = np.where(M == 1)
    # Remove automatically some terms:
    new_ind_i, new_ind_j = automatic_remove_straightforward(ind_i, ind_j)
    # Rebuild the matrix using these terms
    return build_matrix_non_zero(M, new_ind_i, new_ind_j)

def build_matrix_non_zero(M, ind_i, ind_j):
    """Rebuild the matrix using the indices of the non-zero terms."""
    M_2 = np.zeros(M.shape, dtype=int)
    for ind, val in np.ndenumerate(ind_i):
        M_2[ind_i[ind], ind_j[ind]] = 1
    return M_2
def automatic_remove_straightforward(ind_i, ind_j):
    """Recursively remove the terms from ind_i and ind_j."""
    ind_j_int_2 = []
    ind_i_int_2 = []
    while len(ind_i) != len(ind_i_int_2) and len(ind_j) != len(ind_j_int_2):
        if len(ind_i_int_2) != 0:
            # Necessary for entering the while loop...
            ind_i = ind_i_int_2
            ind_j = ind_j_int_2
        # If there is only one non-zero term for a given row, remove the other corresponding terms from the column.
        ind_i_int_2, ind_j_int_2 = remove_indices_straightforward(ind_i, ind_j)
        # If there is only one non-zero term for a given column, remove the other corresponding terms from the row.
        ind_j_int_2, ind_i_int_2 = remove_indices_straightforward(
            ind_j_int_2, ind_i_int_2)
    return ind_i, ind_j

def remove_indices_straightforward(ind_i, ind_j):
    """Remove the non-zero terms automatically.
    Let's consider i to be the column and j the row."""
    unique_i, counts_i = np.unique(ind_i, return_counts=True)
    # Record all the indices that will be removed (this corresponds to removing non-zero terms)
    remove_ind = []
    for ind, val_counts in np.ndenumerate(counts_i):
        if val_counts == 1:
            # If a term of ind_i appears only once (it is alone on its line).
            val_i = unique_i[ind]
            # We find its position in the list ind_i
            index_ind_i = np.where(ind_i == val_i)[0]
            # We find the corresponding row in the list ind_j
            val_j = ind_j[index_ind_i]
            # We find the indices of all the non-zero terms sharing the same row
            indices_ind_j = np.where(ind_j == val_j)[0]
            # We record all but the one that was found in the first place
            indices_ind_j = indices_ind_j[indices_ind_j != index_ind_i]
            # and put them in remove_ind
            remove_ind = np.concatenate((remove_ind, indices_ind_j))
    # We remove the indices that we don't want anymore
    new_ind_j = np.delete(ind_j, remove_ind)
    new_ind_i = np.delete(ind_i, remove_ind)
    return new_ind_i, new_ind_j
M_2 = remove_straightforward_terms(M)
EDIT:
Using the solution proposed by btilly, here is how to obtain the desired matrix:
import numpy as np
import networkx as nx
from networkx.algorithms import bipartite

m = 5
p = 0.7

def connecting(m, p):
    G_1 = nx.bipartite.random_graph(m, m, p)
    top_nodes = {n for n, d in G_1.nodes(data=True) if d['bipartite'] == 0}
    A = nx.bipartite.hopcroft_karp_matching(G_1, top_nodes)
    M = np.zeros((m, m), dtype=int)
    for node_1, node_2 in A.items():
        if node_1 < m:
            M[node_1, node_2 - m] = 1
    return M
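If you already have the matrix M and want to turn it into M_2 directly (rather than regenerating a random graph), the same matching step can be applied to it. Here is a minimal sketch along the same lines, assuming networkx and scipy are available; from_biadjacency_matrix builds the bipartite graph straight from M (the function name reduce_matrix is mine):

import numpy as np
import networkx as nx
from networkx.algorithms import bipartite
from scipy.sparse import csr_matrix

def reduce_matrix(M):
    """Sketch: keep a maximum set of links of the 0/1 matrix M,
    with at most one link per row and per column."""
    m = M.shape[0]
    # Row nodes are numbered 0..m-1, column nodes m..2m-1
    G = bipartite.from_biadjacency_matrix(csr_matrix(M))
    matching = bipartite.hopcroft_karp_matching(G, top_nodes=range(m))
    M_2 = np.zeros_like(M)
    for node_1, node_2 in matching.items():
        if node_1 < m:  # the matching lists each pair twice; keep row -> column only
            M_2[node_1, node_2 - m] = 1
    return M_2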

Related

Find maximum value and indices of a maximum without using max built in functions

Like the title says, I'm trying to find the max value and location of the argument(s) without using any variation of the built in max functions.
I was able to piece this together for a basic np.array, but I'm having difficulty translating it into a matrix ... I think because of how it is indexed.
Here's what I have for the np.array:
a = np.array((1,2,2,3,4,3,2,1,4,3))

def argmax(x):
    maximum = 0
    for i in range(len(x)):
        if x[i] > maximum: maximum = x[i]
    pos = np.argwhere(x == maximum)[0][0]
    return(print('The maximum value of the array is', maximum, 'and is located at index', pos))

argmax(a)
The maximum value of the array is 4 and is located at index 4.
I'm trying to create something similar for any size matrix without using built in max functions. Can someone help me with the function and help me understand the difference in indexing between a basic array and a matrix?
This works for 1d arrays and 2d arrays:
import numpy as np
import math

matrix = np.arange(20).reshape(4, 5)
print(matrix)

# Important
matrix = np.atleast_2d(matrix)
# set maximum to -inf
maximum = -math.inf
# Search maximum
for j in range(matrix.shape[1]):
    for i in range(matrix.shape[0]):
        maximum = matrix[i][j] if matrix[i][j] > maximum else maximum
# More than 1 maximum, take the first one?
pos = np.argwhere(matrix == maximum)[0]
print(
    f"The maximum value of the array is: {maximum}, located at: row {pos[0]}, column {pos[1]}"
)
Outputs:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
The maximum value of the array is: 19, located at: row 3, column 4
I'm assuming that you want to find the maxima along a given axis. Otherwise, do np.unravel_index(argmax(a.ravel()), a.shape).
First let's define a function that steps along the given dimension and keeps track of both the maxima and the indices at which they occur:
def argmax(a, axis):
    # index that selects one slice along the chosen axis
    index = [slice(None) for _ in range(a.ndim)]
    index[axis] = 0
    # trackers (copy, so updating val does not overwrite the input array)
    val = a[tuple(index)].copy()
    ind = np.zeros(val.shape, dtype=int)
    # loop
    for i in range(1, a.shape[axis]):
        index[axis] = i
        v = a[tuple(index)]
        mask = v > val
        val[mask] = v[mask]
        ind[mask] = i
    return ind
This returns the index along axis. If you want to get the other indices, do something like
all_indices = list(np.indices(a.shape))
all_indices[axis] = ind
all_indices = tuple(all_indices)
Or alternatively,
all_indices = [slice(None) for _ in range(a.ndim)]
all_indices[axis] = ind
all_indices = tuple(all_indices)
This function skips a couple of corner cases, like when a.shape[axis] == 0 and a.ndim == 0, but you can easily handle them with a simple preliminary test.
You can also special-case axis=None with a recursive call as shown in the beginning of the answer.
If you want to allow multiple axes simultaneously, swap them all to the end, and reshape them into a single axis. So a hybrid of the axis=None and normal processing.
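A quick, purely illustrative check of the sketch above against NumPy's built-in argmax (the sample array here is made up):

import numpy as np

a = np.array([[3, 7, 1],
              [9, 2, 8]])
print(argmax(a, axis=0))     # [1 0 1] -- row index of the maximum in each column
print(np.argmax(a, axis=0))  # same result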
Here is a way to do it for an array of ANY shape and any number of dimensions (it assumes values are non-negative, since you initialize maximum with 0, and it returns only the first occurrence of the maximum, as you did in your original code; of course you can easily change both):
def argmax(x):
    maximum = 0
    for i, v in enumerate(x.reshape(-1)):
        if v > maximum:
            maximum = v
            pos = i
    print('The maximum value of the array is', maximum, 'and is located at index', np.unravel_index(pos, x.shape))

argmax(a)
Example:
a = np.random.randint(0,10,(3,4))
#[[7 6 2 6]
# [7 2 0 5]
# [4 0 8 7]]
output:
The maximum value of the array is 8 and is located at index (2, 2)

How to find the largest submatrix whose values are sorted both row-wise and column-wise?

Given a matrix, for example:
[[2 5 3 8 3]
[1 4 6 8 4]
[3 6 7 9 5]
[1 3 6 4 2]
[2 6 4 3 1]]
...how to find the greatest sub-matrix (i.e. with most values) in which all rows are sorted and all columns are sorted?
In the above example the solution would be the sub-matrix at (1,0)-(2,3):
1 4 6 8
3 6 7 9
and its size is 8.
You could use recursion to get a maximised area that could fit below a given row segment, that itself has already been verified to be a non-decreasing value sequence. The found area will be guaranteed to stay within the column range of the given row segment, but could be more narrow and span several rows below the given row.
The area that will be returned can then be extended one row upwards, with the width that area already has. If the segment can not be wider, then we will have found the maximum area that can be made from a subsequence of this segment (or the full segment) combined with rows below it.
By filtering the best result from the results retrieved for all segments in all rows, we will have found the solution.
To avoid repetition of recursive calculations that had already been done for exactly the same segment, memoisation can be used (dynamic programming).
Here is the suggested code:
from collections import namedtuple

Area = namedtuple('Area', 'start_row_num start_col_num end_row_num end_col_num size')
EMPTY_AREA = Area(0, 0, 0, 0, 0)

def greatest_sub(matrix):
    memo = {}

    # Function that will be called recursively
    def greatest_downward_extension(row_num, start_col_num, end_col_num, depth=0):
        # Exit if the described segment has no width
        if end_col_num <= start_col_num:
            return EMPTY_AREA
        next_row_num = row_num + 1
        # Use memoisation:
        # Derive an ID (hash) from the segment's attributes for use as memoisation key
        segment_id = ((row_num * len(matrix[0]) + start_col_num)
                      * len(matrix[0]) + end_col_num)
        if segment_id in memo:
            return memo[segment_id]
        # This segment without additional rows is currently the best we have:
        best = Area(row_num, start_col_num, next_row_num, end_col_num,
                    end_col_num - start_col_num)
        if next_row_num >= len(matrix):
            return best
        next_row = matrix[next_row_num]
        row = matrix[row_num]
        prev_val = -float('inf')
        for col_num in range(start_col_num, end_col_num + 1):
            # Detect interruption in increasing series,
            # either vertically (1) or horizontally (0)
            status = (1 if col_num >= end_col_num or next_row[col_num] < row[col_num]
                      else (0 if next_row[col_num] < prev_val
                            else -1))
            if status >= 0:  # There is an interruption: stop segment
                # Find largest area below current row segment, within its column range
                result = greatest_downward_extension(next_row_num,
                                                     start_col_num, col_num)
                # Get column range of found area and add that range from the current row
                size = result.size + result.end_col_num - result.start_col_num
                if size > best.size:
                    best = Area(row_num, result.start_col_num,
                                result.end_row_num, result.end_col_num, size)
                if col_num >= end_col_num:
                    break
                # When the interruption was vertical, the next segment can only start
                # at the next column (status == 1)
                start_col_num = col_num + status
            prev_val = row[col_num]
        memo[segment_id] = best
        return best

    # For each row identify the segments with non-decreasing values
    best = EMPTY_AREA
    for row_num, row in enumerate(matrix):
        prev_val = -float('inf')
        start_col_num = 0
        for end_col_num in range(start_col_num, len(row) + 1):
            # When value decreases (or we reached the end of the row),
            # the segment ends here
            if end_col_num >= len(row) or row[end_col_num] < prev_val:
                # Find largest area below current row segment, within its column range
                result = greatest_downward_extension(row_num, start_col_num, end_col_num)
                if result.size > best.size:
                    best = result
                if end_col_num >= len(row):
                    break
                start_col_num = end_col_num
            prev_val = row[end_col_num]
    return best

# Sample call
matrix = [
    [2, 5, 3, 8, 3],
    [1, 4, 6, 8, 4],
    [3, 6, 7, 9, 5],
    [1, 3, 6, 4, 2],
    [2, 6, 4, 3, 1]]
result = greatest_sub(matrix)
print(result)
The output for the sample data will be:
Area(start_row_num=1, start_col_num=0, end_row_num=3, end_col_num=4, size=8)
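Since the Area fields are half-open row/column bounds, a small helper (my addition, not part of the answer above) can turn that result back into the actual sub-matrix from the question:

def extract(matrix, area):
    """Return the sub-matrix described by an Area namedtuple."""
    return [row[area.start_col_num:area.end_col_num]
            for row in matrix[area.start_row_num:area.end_row_num]]

print(extract(matrix, result))
# [[1, 4, 6, 8], [3, 6, 7, 9]]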
One approach, which it sounds like you have tried, would be using brute force recursion to check first the entire matrix, then smaller and smaller portions by area until you found one that works. It sounds like you already tried that, but you may get different results depending on whether you check from smallest to largest sections (in which case you would have to check every combination no matter what) or largest to smallest (in which case you would still end up checking a very large number of cases).
Another approach would be to create two matrices with the same dimensions as the original, where each slot in the matrix represents the gap between two numbers and the first slot in each row or column represents the gap above the first number in said row or column. You could fill the first matrix with ones and zeros to represent whether or not a matrix could be formed vertically (a 1 representation of a gap would mean that the number lower than the gap would be larger than the number above the gap) and the second with ones or zeros to represent a similar condition horizontally. You could use AND(a,b) (in other words a binary operation where only 1 1 maps to 1) for each value in the matrix to make a matrix that would essentially be AND(matrix1,matrix2), and then you could find the largest rectangle of ones in the matrix.
Example matrix (smaller for simplicity and convenience):
[ 1 2 5 ]
[ 4 9 2 ]
[ 3 6 4 ]
Vertical matrix: a one in location L means that the number in position L is greater than the number directly above L, or that L is the top of the column (with brackets signifying that the first row will always fit the vertical conditions).
{ 1 1 1 }
[ 1 1 0 ]
[ 0 0 1 ]
Horizontal matrix: a one in location L means that the number in position L is greater than the number directly left of L, or that L is the front of the row (leftmost point) (with braces again signifying that the first column will always fit the horizontal conditions).
{1} [ 1 1 ]
{1} [ 1 0 ]
{1} [ 1 0 ]
Vertical AND Horizontal (you could ignore the vertical-only and horizontal-only steps and do this at once: for each cell, put in a 0 if the number is larger than the number on its right or directly below it, otherwise put in a 1)
[ 1 1 1 ]
[ 1 1 0 ]
[ 0 0 0 ]
The largest rectangle would be represented by the largest rectangle of ones with the same indices as the original rectangle. It should be much easier to find the largest rectangle of ones.
Hope this helps! I know I did not explain this very clearly but the general idea should be helpful. It is very similar to the idea you presented about comparing all i and i-1 digits. Let me know if it would help for me to do this for the example matrix you gave.
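As a rough NumPy sketch of the idea above (using the convention from the vertical/horizontal examples: the first row or column is always 1, and a cell is 1 when it is greater than its upper or left neighbour), the three indicator matrices for the small example can be built like this:

import numpy as np

M = np.array([[1, 2, 5],
              [4, 9, 2],
              [3, 6, 4]])

vertical = np.ones_like(M)
vertical[1:, :] = (M[1:, :] > M[:-1, :]).astype(int)    # greater than the cell above

horizontal = np.ones_like(M)
horizontal[:, 1:] = (M[:, 1:] > M[:, :-1]).astype(int)  # greater than the cell to the left

combined = vertical & horizontal
print(combined)
# [[1 1 1]
#  [1 1 0]
#  [0 0 0]]
# The largest rectangle of ones in `combined` then points at the largest sorted sub-matrix.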

random flip m values from an array

I have an array of length n. I want to randomly choose m elements from it and flip their values. What is the most efficient way?
There are two cases: m = 1 is a special case that can be discussed separately, and m != 1.
My attempt is:
import numpy as np
n = 20
m = 5
#generate an array a
a = np.random.randint(0,2,n)*2-1
#random choose `m` element and flip it.
for i in np.random.randint(0,n,m):
    a[m]=-a[m]
Suppose m is in the tens and n is in the hundreds.
To make sure we are not flipping the same element twice or even more times, we can create unique indices in that length range with np.random.choice using its optional replace argument set as False. Then, simply indexing into the input array and flipping in one go should give us the desired output. Thus, we would have an implementation like so -
idx = np.random.choice(n,m,replace=False)
a[idx] = -a[idx]
Faster version: For a faster version of np.random.choice, I would suggest reading up on this post that explores using np.argpartition to simulate identical behavior.
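A possible sketch of that trick (my reading of it, not a drop-in replacement for np.random.choice): draw n random keys and take the positions of the m smallest ones, which form a uniformly random set of m distinct indices (assumes m < n):

idx = np.argpartition(np.random.rand(n), m)[:m]
a[idx] = -a[idx]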
You can make a random permutation of the array indices, take the first m of them and flip their values:
a[np.random.permutation(range(len(a)))[:m]]*=-1
Using a permutation guarantees you don't choose the same index twice.
You need to change the index of the array from m to i to actually change the value.
Result:
import numpy as np
n = 20
m = 5
#generate an array a
a = np.random.randint(0,2,n)*2-1
print(a)
#random choose `i` element and flip it.
for i in np.random.randint(0,n,m):
    a[i] = -a[i]
print(a)
My output:
[ 1 1 -1 -1 1 -1 -1 1 1 -1 -1 1 -1 1 1 1 1 -1 1 -1]
[ 1 1 -1 -1 -1 -1 1 1 1 -1 -1 1 -1 -1 1 -1 1 -1 -1 -1]

Count number of specific elements in between other elements in list

I am reading a data file. The rows contain consecutive numbers (steps), and sometimes between two steps there are one or more rows containing 0.
E.g:
1
0
2
0
3
4
5
0
0
0
6
0
How can I create a list that counts the number of 0s in between each step?
I want a list like this:
finalList = [1,1,0,0,3,1]
which represents the number of 0s each step contains, i.e: step 1 has 1 zero, step 2 has 1 zero, step 3 has 0 zeros, step 4 has 0 zeros, step 5 has 3 zeros and step 6 has 1 zero.
The following code should work if your data file looks exactly as you described (i.e. no numbers other than the increasing step numbers and zeroes).
cur = 0
res = []
first = True
with open("file.txt") as f:
    for line in f:
        if line.strip() == '0':
            cur += 1
        else:
            # Flush the zero count of the previous step (skip before the first step)
            if not first:
                res.append(cur)
            first = False
            cur = 0
# Flush the zeroes that follow the last step
res.append(cur)
a = [1,0,2,0,3,4,5,0,0,0,6,0]
finalList = []
count = 0
for i in xrange(len(a)):
    if i == 0 : continue
    if a[i] == 0 :
        count += 1
    else :
        finalList.append(count)
        count = 0
finalList.append(count)
Possibly overly clever solution using Python's included batteries:
from itertools import chain, groupby

with open("file.txt") as f:
    # Add extra zeroes after non-zero values so we see a group when no padding exists
    extrazeroes = chain.from_iterable((x, 0) if x else (x,) for x in map(int, f))
    # Count elements in group and subtract 1 if not first group to account for padding
    # The filter condition means we drop non-zero values cheaply
    zerocounts = [sum(1 for _ in g) - bool(gnum) for gnum, (k, g) in enumerate(groupby(extrazeroes)) if k == 0]
    # If leading zeroes (before first non-zero line) can't happen, simplify to:
    zerocounts = [sum(1 for _ in g) - 1 for k, g in groupby(extrazeroes) if k == 0]
Yes, it's a bit complicated (if you didn't care about including zeroes where there was no gap between two non-zero values it would be much simpler), but it's succinct and should be very fast. If you could omit the 0s in your counts, it would simplify to the much cleaner:
with open("file.txt") as f:
zerocounts = [sum(1 for _ in g) for k, g in groupby(map(int, f)) if k == 0]
For the record, I'd use the latter if it met requirements. The former should probably not go in production code. :-)
Note that depending on your use case, using groupby may be a good idea for your broader problem; in comments, you mention you're storing all the lines in the file (using f = f.readlines()), which implies you'll be accessing them, possibly based on the values stored in zerocounts. If you have some specific need to process each "step" based on the number of following zeroes, an adaptation of the code above might save you the memory overhead of slurping the file by lazily grouping and processing.
Note: To avoid slurping the whole file into memory, in Python 2, you'd want to add from future_builtins import map so map is a lazy generator function like it is in Py3, rather than loading the whole file and converting all of it to int up front. If you don't want to stomp map, importing and using itertools.imap over map for int conversion accomplishes the same goal.
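A quick sanity check of the groupby approach on the in-memory example list from the question (instead of reading a file):

from itertools import chain, groupby

data = [1, 0, 2, 0, 3, 4, 5, 0, 0, 0, 6, 0]
extrazeroes = chain.from_iterable((x, 0) if x else (x,) for x in data)
zerocounts = [sum(1 for _ in g) - bool(gnum)
              for gnum, (k, g) in enumerate(groupby(extrazeroes)) if k == 0]
print(zerocounts)  # [1, 1, 0, 0, 3, 1]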
I came up with this:
finalList = []
count = 0
step = None
for e in [1, 0, 2, 0, 3, 4, 5, 0, 0, 0, 6, 0]:
    if e > 0:
        if step:
            finalList.append(count)
        step = e
        count = 0
    else:
        count += 1
if step:
    finalList.append(count)
Alternative solution
# temp list (copy of l, with a sentinel step appended if the list doesn't already end with one)
_l = l if l[-1] > 0 else l + [max(l) + 1]
# _l.index(i) - _l.index(i - 1) - 1 = number of zeros between consecutive steps
[_l.index(i) - _l.index(i - 1) - 1 for i in range(2, max(_l) + 1)]
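For completeness, running it with the question's data assigned to l gives the expected result:

l = [1, 0, 2, 0, 3, 4, 5, 0, 0, 0, 6, 0]
_l = l if l[-1] > 0 else l + [max(l) + 1]  # append a sentinel step since l ends with a 0
print([_l.index(i) - _l.index(i - 1) - 1 for i in range(2, max(_l) + 1)])
# [1, 1, 0, 0, 3, 1]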

Check for pattern recursively

*Note: I refer to the thing as a matrix, but it is not; it is just a collection of 1's and zeros.
Suppose you have a matrix that is always square (n x n). Is it possible to determine if there exists a single column/row/diagonal such that each item is a 1?
Take the matrix below for instance (True):
1 0 0
1 1 0
0 0 1
Another example (True):
1 1 1
0 0 0
0 1 0
And finally one without a solution (False):
0 1 1
1 0 0
0 0 0
Notice how in the first example there is a diagonal filled with 1's. The rule is: either there is a solution or there is no solution. There can be any number of 1's or zeros within the matrix. All I really need to do is, given an (n x n) collection, determine whether there is a row/column/diagonal whose n elements are all 1.
If this is not possible with recursion, please let me know what the best and most efficient method is. Thanks a lot, I have been stuck on this for hours, so any help is appreciated (if you could post samples that would be great).
EDIT
This is one solution that I came up with, but it gets really complex after a while.
Take the first example I gave and string all the rows together so you get:
1 0 0, 1 1 0, 0 0 1
Then add zeros between the rows to get:
1 0 0 0, 1 1 0 0, 0 0 1 0
Now if you look closely, you will see that the distances between the 1's that form a solution are equal. I don't know how this can be implemented though.
In search for an elegant solution I came up with this:
class LineOfOnesChecker(object):
    _DIAG_INDICES = (lambda i: i, lambda i: -i - 1)

    def __init__(self, matrix):
        self._matrix = matrix
        self._len_range = range(len(self._matrix))

    def has_any(self):
        return self.has_row() or self.has_col() or self.has_diag()

    def has_row(self):
        return any(all(elem == 1 for elem in row)
                   for row in self._matrix)

    def has_col(self):
        return any(all(self._matrix[i][j] == 1 for i in self._len_range)
                   for j in self._len_range)

    def has_diag(self):
        return any(all(self._matrix[transf(i)][i] == 1 for i in self._len_range)
                   for transf in self._DIAG_INDICES)
Usage:
print LineOfOnesChecker(matrix).has_any()
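For example, with the first matrix from the question (written as a list of lists), this prints True because of the main diagonal:

matrix = [[1, 0, 0],
          [1, 1, 0],
          [0, 0, 1]]
print(LineOfOnesChecker(matrix).has_any())  # True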
You can take your sets of diagonal elements, row elements and column elements and do an 'AND' over each of them; if any of those AND operations results in True, then you have your valid pattern.
import sys

matrix = [[1,0,1],[1,0,1],[1,0,1]]
transpose = zip(*matrix)

diagonal1 = []
for n, elem in enumerate(matrix):
    diagonal1.append(elem[n])

# The anti-diagonal runs from the top-right to the bottom-left corner
diagonal2 = []
for n, elem in enumerate(matrix):
    diagonal2.append(elem[len(matrix) - n - 1])

for row in matrix:
    if reduce(lambda x, y: x and y, row):
        print True
        sys.exit()

for row in transpose:
    if reduce(lambda x, y: x and y, row):
        print True
        sys.exit()

if (reduce(lambda x, y: x and y, diagonal1) or reduce(lambda x, y: x and y, diagonal2)):
    print True
    sys.exit()
From what I understand of the problem, you just need to check whether any row, coloumn or diagonal consists entirely of '1's. This can be done very easily using all in Python, so I don't get why you want to do this recursively.
The more obvious solution (in my mind) is something like this:
#! /usr/bin/env python
boards = [
    ((0,1,0),(1,0,1),(0,1,0)),
    ((1,1,1),(0,0,0),(0,0,0)),
    ((0,0,0),(1,1,1),(0,0,0)),
    ((0,0,0),(0,0,0),(1,1,1)),
    ((1,0,0),(1,0,0),(1,0,0)),
    ((0,1,0),(0,1,0),(0,1,0)),
    ((0,0,1),(0,0,1),(0,0,1)),
    ((1,0,0),(0,1,0),(0,0,1)),
    ((0,0,1),(0,1,0),(1,0,0)),
    ((0,0,0),(0,0,0),(0,0,0))
]

def check(board):
    for row in board:
        if all(row):
            return True
    for col in xrange(len(board)):
        vector = [board[row][col] for row in xrange(len(board))]
        if all(vector):
            return True
    diag1 = [board[i][i] for i in xrange(len(board))]
    if all(diag1):
        return True
    # Anti-diagonal (top-right to bottom-left)
    diag2 = [board[i][len(board) - 1 - i] for i in xrange(len(board))]
    if all(diag2):
        return True
    return False

if __name__ == '__main__':
    for board in boards:
        if check(board):
            print "Yes"
        else:
            print "No"
