How to solve for unknowns in a 5x5 matrix in Python

Here is a 5x5 matrix, with all cells unknown, it looks something like this:
A1+B1+C1+D1+E1| 1
A2+B2+C2+D2+E2| 0
A3+B3+C3+D3+E3| 1
A4+B4+C4+D4+E4| 3
A5+B5+C5+D5+E5| 2
_______________
2 1 2 1 1
So, the summation of rows can be seen on the right, and the summation of columns can be seen on the bottom. Each cell can only be 0 or 1, and as an example here is the solution to the specific puzzle I have typed out above:
0+0+1+0+0| 1
0+0+0+0+0| 0
1+0+0+0+0| 1
1+1+0+0+1| 3
0+0+1+1+0| 2
____________
2 1 2 1 1
As you can see, summing the rows and columns gives the results on the right and bottom.
My question: How would you go about entering the original matrix with unknowns and having python iterate each cell with 0 or 1 until the puzzle is complete?

You don't really need a matrix -- just use vectors (tuples) of length 25. They can represent 5x5 matrices according to the following scheme:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
These are the indices of such tuples. Note that the row and column of an index can be obtained from the function divmod.
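For example, index 13 sits in row 2, column 3:
>>> divmod(13, 5)
(2, 3)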
You can use product from itertools to iterate over the 2**25 possible ways of filling in the matrix.
These ideas lead to the following code:
from itertools import product

# n x n matrices are represented by tuples of length n**2,
# in row-major order.

# The following function calculates row and column sums:
def all_sums(array, n):
    row_sums = [0] * n
    col_sums = [0] * n
    for i, x in enumerate(array):
        q, r = divmod(i, n)
        row_sums[q] += x
        col_sums[r] += x
    return row_sums, col_sums

# In what follows, row_sums and col_sums are lists of target values:
def solve_puzzle(row_sums, col_sums):
    n = len(row_sums)
    for p in product(range(2), repeat=n * n):
        if all_sums(p, n) == (row_sums, col_sums):
            return p
    return "no solution"

solution = solve_puzzle([1, 0, 1, 3, 2], [2, 1, 2, 1, 1])
for i in range(0, 25, 5):
    print(solution[i:i + 5])
Output:
(0, 0, 0, 0, 1)
(0, 0, 0, 0, 0)
(0, 0, 0, 1, 0)
(1, 1, 1, 0, 0)
(1, 0, 1, 0, 0)
In this case brute-force was feasible. If you go much beyond 5x5 it would no longer be feasible, and more sophisticated algorithms would be required.

This is a special case of an integer linear programming problem. The 0-1 special case of integer linear programming is unfortunately still NP-complete, though many algorithms exist, including heuristic ones. You can use an off-the-shelf library to do this for you.
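For instance, here is a minimal sketch using SciPy's milp solver (assuming SciPy >= 1.9 is available; the helper solve_puzzle_ilp and the constraint layout are my illustration, not part of the answer above):
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def solve_puzzle_ilp(row_sums, col_sums):
    n = len(row_sums)
    # One 0/1 variable per cell, row-major order (as in the answer above).
    A = np.zeros((2 * n, n * n))
    for i in range(n):
        A[i, i * n:(i + 1) * n] = 1   # row i sums over a contiguous slice
        A[n + i, i::n] = 1            # column i sums over a strided slice
    targets = np.concatenate([row_sums, col_sums])
    res = milp(c=np.zeros(n * n),     # zero objective: any feasible point will do
               constraints=LinearConstraint(A, targets, targets),
               integrality=np.ones(n * n),
               bounds=Bounds(0, 1))
    return res.x.round().astype(int).reshape(n, n) if res.success else None

print(solve_puzzle_ilp([1, 0, 1, 3, 2], [2, 1, 2, 1, 1]))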

Why does the last element reflect the number of non-negative solutions?

Please excuse my naivete as I don't have much programming experience. While googling something for an unrelated question, I stumbled upon this:
https://www.geeksforgeeks.org/find-number-of-solutions-of-a-linear-equation-of-n-variables/
I completely understand the first (extremely inefficient) bit of code. But the second:
def countSol(coeff, n, rhs):
    # Create and initialize a table
    # to store results of subproblems
    dp = [0 for i in range(rhs + 1)]
    dp[0] = 1

    # Fill table in bottom up manner
    for i in range(n):
        for j in range(coeff[i], rhs + 1):
            dp[j] += dp[j - coeff[i]]

    return dp[rhs]
confuses me. My question being: why does this second program count the number of non-negative integer solutions?
I have written out several examples, including the one given in the article, and I understand that it does indeed do this. And I understand how it is populating the list. But I don't understand exactly why this works.
Please excuse what must be, to some, an ignorant question. But I would quite like to understand the logic, as I think it rather clever that such a little snippet is able to answer a question as general as "How many non-negative integer solutions exist?" (for some general equation).
This algorithm is pretty cool and demonstrates the power of looking at a problem from a different perspective.
Let's take an example: 3x + 2y + z = 6, where LHS is the left-hand side and RHS is the right-hand side.
dp[k] will keep track of the number of unique ways to arrive at a RHS value of k by substituting non-negative integer values for LHS variables.
The i loop iterates over the variables in the LHS. The algorithm begins by setting all the variables to zero, so the only reachable k value is zero, hence
k      0  1  2  3  4  5  6
dp[k]  1  0  0  0  0  0  0
For i = 0, we update dp to reflect what happens if x is 1 or 2. We don't care about x > 2 because the solutions are all non-negative and 3x would exceed the RHS. The j loop is responsible for updating dp: dp[k] gets incremented by dp[k - 3] because we can arrive at RHS value k by adding one copy of the coefficient 3 to k - 3. The result is
k      0  1  2  3  4  5  6
dp[k]  1  0  0  1  0  0  1
Now the algorithm continues with i = 1, updating dp to reflect all possible RHS values where x is 0, 1, or 2 and y is 0, 1, 2, or 3. This time the j loop increments dp[k] by dp[k-2] because we can arrive at RHS value k by adding one copy of the coefficient 2 to k-2, resulting in
k      0  1  2  3  4  5  6
dp[k]  1  0  1  1  1  1  2
Finally, the algorithm incorporates z = 1, 2, 3, 4, 5, or 6, resulting in
k      0  1  2  3  4  5  6
dp[k]  1  1  2  3  4  5  7
In addition to computing the answer in pseudo-polynomial time, dp encodes the answer for every RHS <= the input right hand side.
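As a sanity check, here is a small brute-force cross-check of countSol on the example above (the helper brute is my addition, not from the article):
from itertools import product

def brute(coeff, rhs):
    # Enumerate every non-negative assignment of the variables and
    # count the ones that hit the target right-hand side.
    ranges = [range(rhs // c + 1) for c in coeff]
    return sum(sum(c * v for c, v in zip(coeff, vals)) == rhs
               for vals in product(*ranges))

print(countSol([3, 2, 1], 3, 6))  # 7
print(brute([3, 2, 1], 6))        # 7, matching dp[6] in the last table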

Count how many consecutive TRUEs on each row in a dataframe

I am trying to count how many consecutive TRUEs appear in each row, and I solved that part myself, but I need a solution for this part: if a row starts with FALSE, then the result must be 0. There is a sample dataset below. Can you recommend tips on how to solve this?
PS. my original question is at the link below.
how to find number of consecutive decreases(increases)
Sample data, .csv file
idx,Expected Results,M_1,M_2,M_3,M_4,M_5,M_6,M_7,M_8,M_9,M_10,M_11,M_12
1001,0,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1002,3,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
1003,1,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1004,4,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1005,0,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1006,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1007,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1008,1,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1009,0,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE
1010,1,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE
1011,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE
1013,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1014,1,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1015,1,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1016,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1017,2,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1018,0,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
After John's solution:
How can I count the TRUEs until I see the first FALSE?
result = df.where(df[0], 0)
idx,M_1,M_2,M_3,M_4,M_5,M_6,M_7,M_8,M_9,M_10,M_11,M_12
1001,0,0,0,0,0,0,0,0,0,0,0,0
1002,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
1003,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1004,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1005,0,0,0,0,0,0,0,0,0,0,0,0
1006,0,0,0,0,0,0,0,0,0,0,0,0
1007,0,0,0,0,0,0,0,0,0,0,0,0
1008,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1009,0,0,0,0,0,0,0,0,0,0,0,0
1010,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE
1011,0,0,0,0,0,0,0,0,0,0,0,0
1013,0,0,0,0,0,0,0,0,0,0,0,0
1014,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1015,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1016,0,0,0,0,0,0,0,0,0,0,0,0
1017,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
1018,0,0,0,0,0,0,0,0,0,0,0,0
You can use np.argmin. You needn't prefilter your df; it will handle rows starting with False correctly.
df.loc[:, 'M_1':'M_12'].values.argmin(1)
#array([0, 3, 1, 4, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 2, 0])
Note that this assumes there is at least one False in every row.
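If all-True rows can occur in your data, a minimal guard (my addition, not part of this answer) is to pad a False column before calling argmin, assuming the M_1..M_12 columns are already boolean:
import numpy as np

vals = df.loc[:, 'M_1':'M_12'].values
# Append a column of False so argmin always finds a False;
# an all-True row then correctly yields the full length 12.
padded = np.hstack([vals, np.zeros((len(vals), 1), dtype=bool)])
result = padded.argmin(axis=1)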
Another one-liner: np.logical_and.accumulate turns every value after the first False into False, so summing each row counts the leading run of Trues:
df.loc[:, 'M_1':'M_12'].apply(np.logical_and.accumulate, axis=1).sum(axis=1)
Reverse the values of columns M_1 to M_12 using negation '~', i.e. True to False and vice versa. Then do a cummax to isolate the first group of consecutive Trues (note: at this point True represents a False value and False represents a True value). Apply another negation to the result of cummax, and finally sum:
(~(~df.drop(columns=['idx'])).cummax(axis=1)).sum(axis=1)
Out[503]:
0 0
1 3
2 1
3 4
4 0
5 0
6 0
7 1
8 0
9 1
10 0
11 0
12 1
13 1
14 0
15 2
16 0
dtype: int64

How to determine row equality across several rows in a fully vectorized way?

I have a numpy.ndarray called grouping of size (S, N). Each row of grouping gives me the group labels of a sample of data. I run my algorithm S times and get new group labels in each iteration.
I want to determine how many times each sample of my data has the same group label as every other sample of my data across the S iterations in a fully vectorized way.
In a not-completely-vectorized way:
sim_matrix = np.zeros((N, N))
for s in range(S):
    sim_matrix += np.equal.outer(grouping[s, :], grouping[s, :])
One vectorized approach would be with broadcasting -
(grouping[:,None,:] == grouping[:,:,None]).sum(0)
For performance, we can use np.count_nonzero -
np.count_nonzero(grouping[:,None,:] == grouping[:,:,None],axis=0)
The sum of equal.outer is a cryptic way of calculating all-pairs similarity of columns:
sum_i sum_jk (A[i,j] == A[i,k]) is the same as
sum_jk sum_i (A[i,j] == A[i,k])
where sum_i loops over rows, sum_jk over all pairs of columns.
Comparing two vectors by counting the number of positions where they differ is called the Hamming distance. If we change == above to != and convert similarity to distance = nrows - similarity (most similar ⇔ distance 0), we get the problem: find the Hamming distance between all pairs of a bunch of vectors:
def allpairs_hamming(A, dtype=np.uint32):
    """ -> Hamming distances between all pairs of rows of A """
    nrow, ncol = A.shape
    allpair_dist = np.zeros([nrow, nrow], dtype=dtype)
    for j in range(nrow):
        for k in range(j + 1, nrow):
            allpair_dist[j, k] = allpair_dist[k, j] = (A[j] != A[k]).sum()  # row diff
    return allpair_dist
allpairs_hamming: 30.7 sec, 3 ns per cmp Nvec 2000 Veclen 5000 A 10m pairdist uint32 15m
Almost all the cpu time is in the row diff, not in the outer loop for j ... for k -- 3 ns per scalar compare, on a stock mac, isn't bad.
However memory caching is much faster if each row A[j] is in contiguous memory,
as for numpy C-order arrays.
Apart from that, whether you do "all pairs of rows" or "all pairs of columns"
doesn't matter, as long as you're clear.
(Is it possible to find "nearby" pairs in time and space < O(npairs), here O(20000^2) ? Afaik there are more methods than test cases.)
See also:
http://docs.scipy.org/doc/scipy/reference/spatial.distance.html (bug: hamming .mean not .sum)
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html
https://stats.stackexchange.com/search?q=[clustering]+pairwise
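Following the scipy link above, here is a short sketch of the same all-pairs similarity via pdist; note that the 'hamming' metric returns the fraction of differing entries (the .mean, as flagged above), hence the scaling by S. grouping is assumed to be the (S, N) array from the question:
import numpy as np
from scipy.spatial.distance import pdist, squareform

S, N = grouping.shape
# pdist('hamming') gives the mean disagreement per pair of columns;
# scale by S to get counts, subtract from S to turn distance into similarity.
dist = squareform(pdist(grouping.T, metric='hamming')) * S
sim_matrix = S - dist   # diagonal is S: every column fully agrees with itself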
You want to compare identical rows. One way to do that is to view each entire row as a single raw block of bytes:
import numpy as np
import pandas as pd

S, N = 12, 2
a = np.random.randint(0, 3, (S, N))  # 12 samples of two labels
#a
0 1
0 2 2
1 2 0
2 1 2
3 0 0
4 0 1
5 1 1
6 0 1
7 0 1
8 0 1
9 0 0
10 2 2
11 0 0
samples = np.ascontiguousarray(a).view(np.dtype((np.void, a.strides[0])))
samples.shape is then (S, 1).
You can now inventory your samples with np.unique, and use a Pandas DataFrame for a pretty report:
_, inds, invs = np.unique(samples, return_index=True, return_inverse=True)
df = pd.DataFrame(invs)
result = df.reset_index().groupby(0).index.apply(list).to_frame()
result['sample'] = [list(x) for x in a[inds]]
which gives:
          index  sample
0
0    [3, 9, 11]  [0, 0]
1  [4, 6, 7, 8]  [0, 1]
2           [5]  [1, 1]
3           [2]  [1, 2]
4           [1]  [2, 0]
5       [0, 10]  [2, 2]
This can be O(S ln S) if there are few matches between samples, whereas yours is O(N²S).

Bit wise Operation Logic

I have two equal-sized arrays (array1 and array2) of 0's and 1's. How do I get all the arrays whose bitwise union with array1 results in array2? For example, if array1 = [1, 1, 1] and array2 = [1, 1, 1], the output should be all eight arrays: [0, 0, 0], [1, 0, 0], ...., [1, 1, 1]. Are there efficient solutions, or is brute force the only way?
My try :
I tried to calculate the bitwise difference first, and if any bit is negative then return false (it is not possible to combine array1 with any array to get array2). If all bits are non-negative, then: if a bit in the difference is 0, it can be either 0 or 1 in the answer (this assumption is wrong, though, and fails for array1 = [0, 0], array2 = [0, 0]), and if a bit in the difference is 1, then the required array has to have 1 at that place to make it 1.
Here's how I would go about solving this problem:
First, let's think about this. You need to find all arrays of binary values that, when combined (via some operator) with a known binary value, yield a new binary value. Don't try to solve the problem yet. Assume you need to go from 00 to 11: how many possible answers are there? Assume you need to go from 11 to 11: how many possible answers are there? Can you do any better (in the worst case) than a brute-force approach? That'll give you a complexity bound.
With that rough bound in mind, tackle the parts of the question that are a bit curious. Drill down into the question a little more. What is the 'bitwise union operator'? Is it 'and'? Is it 'or'? Is it something more complicated? 'Bitwise union' sounds like B[i] = A[i] OR X[i], but anyone asking that question could mean something else.
Depending on the answer to questions 1 and 2, you have a lot to work with here. I can think of a few different options, but I think from here you can come up with an algorithm.
Once you have a solution, you need to think about "Can I do a better job here'? A lot of that goes back to the initial impressions about the problem and how they're constructed, and what/how much you think you can optimize.
Note: I will explain the following with an example input:
A = [0 0 1 0 1 1], B = [1 1 1 0 1 1]
Assuming you want to calculate X for the equation A OR X = B, let us see what the options are for each choice of bit in A and B:
A    OR X    = B
--------------------
0     0        0
0     1        1
1     N.A.     0
1     (0,1)    1
1. If any bit in A is 1 and its corresponding bit in B is 0, no solutions are possible: return an empty set.
2. If the corresponding bits in A and B are both 1, the corresponding bit in X does not matter.
Now, see that one solution for X is B itself (if condition #1, as stated above, is satisfied). Hence, let's construct a number start_num = B. This will be one solution, and the other solutions will be constructed from it.
start_num = B = [1 1 1 0 1 1]
The 'choice' bits are those where X can take any value, i.e. those positions where A=1 and B=1. Let us make another number choice = A AND B, so that choice = 1 denotes those positions. Also notice that, if there are k positions where choice = 1, the total number of solutions is 2^k.
choice = A AND B = [0 0 1 0 1 1] ,hence, k = 3
Store these 'choice' positions in an array (of length k), starting from the right (LSB = 0). Let us call this array pos_array.
pos_array = [0 1 3]
Notice that all the 'choice' bits in start_num are set to 1. Hence, all the other solutions will have some (1 <= p <= k) of these bits set to 0. Now that we know which bits are to be changed, we need to make these solutions in an efficient manner.
This can be done by generating the solutions in an order where consecutive solutions differ at exactly one position, which makes each new solution cheap to compute. For example, if we have two 'choice' bits, the following shows the difference between simply running through all combinations in decreasing arithmetic order and going through them in a 1-bit-change order:
1-bit-toggle order           decreasing order
----------------------       ----------------------
1 1  // start                1 1  // start
1 0  // toggle bit 0         1 0  // subtract 1
0 0  // toggle bit 1         0 1  // subtract 1
0 1  // toggle bit 0         0 0  // subtract 1
(We want to exploit the speed of bitwise operations, hence we will use the 1-bit-toggle order).
Now, we will build each solution: (This is not actual C code, just an explanation)
addToSet(start_num); // add the initial solution to the set
for (i = 1; i < 2^k; i++)
{
    // find the position of the lowest set bit of i
    pos = 0;
    count = i;
    while ((count & 1) == 0)
    {
        count = count >> 1;
        pos++;
    }
    toggle(start_num[pos_array[pos]]); // update start_num by toggling the desired bit
    addToSet(start_num);               // add the updated vector to the set
}
If this code is run on the above example, the following toggle statements will be executed:
toggle(start_num[0])
toggle(start_num[1])
toggle(start_num[0])
toggle(start_num[3])
toggle(start_num[0])
toggle(start_num[1])
toggle(start_num[0])
, which will result in the following additions:
addToSet([1 1 1 0 1 0])
addToSet([1 1 1 0 0 0])
addToSet([1 1 1 0 0 1])
addToSet([1 1 0 0 0 1])
addToSet([1 1 0 0 0 0])
addToSet([1 1 0 0 1 0])
addToSet([1 1 0 0 1 1])
, which, in addition to the already-present initial solution [1 1 1 0 1 1], completes the set.
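For reference, here is a minimal runnable Python sketch of this approach (my translation of the pseudocode above, using the lowest-set-bit trick to find the toggle position):
def or_solutions(A, B):
    n = len(A)
    if any(a == 1 and b == 0 for a, b in zip(A, B)):
        return []                        # condition #1: no solutions possible
    start = list(B)                      # B itself is always one solution
    # 'choice' positions: indices where both A and B are 1
    free = [i for i in range(n) if A[i] == 1 and B[i] == 1]
    solutions = [tuple(start)]
    for i in range(1, 2 ** len(free)):
        pos = (i & -i).bit_length() - 1  # index of the lowest set bit of i
        start[free[pos]] ^= 1            # toggle one bit (1-bit-toggle order)
        solutions.append(tuple(start))
    return solutions

print(len(or_solutions([0, 0, 1, 0, 1, 1], [1, 1, 1, 0, 1, 1])))  # 8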
NOTE: I am not an expert in bitwise operations, among other things. I think there are better ways to write the algorithm, making better use of bit-access pointers and bitwise binary operations (and I will be glad if someone can suggest improvements). What I am proposing with this solution is the general approach to this problem.
You can construct the digit options for each slot i by evaluating:
for d in (0, 1):
    if (array1[i] or d) == array2[i]:
        digits[i].append(d)
Then you just need to iterate over i.
The objective is to construct a list of lists: [[0,1],[1],[0,1]] showing the valid digits in each slot. Then you can use itertools.product() to construct all of the valid arrays:
arrays = list(itertools.product(*digits))
You can put all this together using list comprehensions and this would result in:
list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(array1, array2)]))
In action:
>>> import itertools as it
>>> a1, a2 = [1,1,1], [1,1,1]
>>> list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(a1, a2)]))
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
>>> a1, a2 = [1,0,0], [1,1,1]
>>> list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(a1, a2)]))
[(0, 1, 1), (1, 1, 1)]
>>> a1, a2 = [1,0,0], [0,1,1]
>>> list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(a1, a2)]))
[]

Pandas : determine mapping from unique rows to original dataframe

Given the following inputs:
In [18]: input
Out[18]:
   1  2   3  4
0  1  5   9  1
1  2  6  10  2
2  1  5   9  1
3  1  5   9  1
In [26]: df = input.drop_duplicates()
Out[26]:
   1  2   3  4
0  1  5   9  1
1  2  6  10  2
How would I go about getting an array that has the indices of the rows from the subset that are equivalent, eg:
resultant = [0, 1, 0, 0]
I.e. the '1' here states that row 1 of input equals row 1 of df. Since there are fewer unique rows than original rows, multiple values in resultant will point to the same row in df, i.e. (row[k] in input == row[k+N] in input) == (row[1] in df) could be a case.
I am looking for the actual row-number mapping from input to df.
While this example is trivial, in my case I have a ton of dropped duplicates that might map to a single index.
Why do I want this? I am training an autoencoder type system where the target sequence is non-unique.
One way would be to treat it as a groupby on all columns:
>> df.groupby(list(df.columns)).groups
{(1, 5, 9, 1): [0, 2, 3], (2, 6, 10, 2): [1]}
Another would be to sort and then compare, which is less efficient in theory but could very well be faster in some cases and is definitely easier to make more tolerant of error:
>>> ds = df.sort_values(list(df.columns))
>>> eqs = (ds != ds.shift()).all(axis=1).cumsum()
>>> ds.index.groupby(eqs)
{1: [0, 2, 3], 2: [1]}
This seems the right datastructure to me, but if you really do want an array with the group ids, that's easy too, e.g.
>>> eqs.sort_index() - 1
0 0
1 1
2 0
3 0
dtype: int64
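A third option (a sketch of mine, assuming a reasonably recent pandas): GroupBy.ngroup() labels each row of the original frame with its group id, in order of first appearance when sort=False, which is exactly the resultant asked for:
# group ids in order of first appearance match drop_duplicates order
resultant = input.groupby(list(input.columns), sort=False).ngroup().tolist()
# [0, 1, 0, 0]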
Don't have pandas installed on this computer, but I think you could use df.iterrows() like:
def find_matching_row(row, df_slimmed):
    for index, slimmed_row in df_slimmed.iterrows():
        if slimmed_row.equals(row[slimmed_row.index]):
            return index

def rows_mappings(df, df_slimmed):
    for _, row in df.iterrows():
        yield find_matching_row(row, df_slimmed)

list(rows_mappings(input, df))
This is if you are interested in generating the resultant list in your example; I don't quite follow the latter part of your reasoning.
