I have two equal-sized arrays (array1 and array2) of 0's and 1's. How do I get all the arrays whose bitwise union with array1 results in array2? For example, if array1 = [1, 1, 1] and array2 = [1, 1, 1], the output should be all eight arrays: [0, 0, 0], [1, 0, 0], ..., [1, 1, 1]. Is there an efficient solution, or is brute force the only way?
My try :
I tried to calculate the bitwise difference (array2 - array1) first. If any bit of the difference is negative, return false (no array can be combined with array1 to give array2). If all bits are non-negative, then: if a bit in the difference is 0, it can be replaced by either 0 or 1 (this assumption is wrong, though: it fails for array1 = [0,0], array2 = [0,0]); and if a bit in the difference is 1, the required array has to have 1 at that place to make the result 1.
Here's how I would go about solving this problem:
First, let's think about this. You need to find all arrays of binary values that, when combined (via some operator) with a known binary value, = a new binary value. Don't try to solve the problem yet. Assume you need to go from 00 to 11. How many possible answers are there? Assume you need to go from 11 to 11. How many possible answers are there? Can you do any better (in the worst case) than a brute force approach? That'll give you a complexity bound.
With that rough bound in mind, tackle the bits of the question that are a bit curious. Drill down onto the question a little bit more. What is the 'bitwise union operator'? Is it 'and'? Is it 'or'? Is it something more complicated? 'Bitwise Union' sounds like B[i] = A[i] OR X[i], but anyone asking that question could mean something else..
Depending on the answer to questions 1 and 2, you have a lot to work with here. I can think of a few different options, but I think from here you can come up with an algorithm.
Once you have a solution, you need to ask "Can I do a better job here?" A lot of that goes back to your initial impressions of the problem and how it is constructed, and what (and how much) you think you can optimize.
Note: I will explain the following with an example input:
A = [0 0 1 0 1 1], B = [1 1 1 0 1 1]
Assuming you want to calculate X for the equation A OR X = B, let us see what the options are for each combination of bits in A and B:
A    OR    X        =    B
----------------------------
0          0             0
0          1             1
1          N.A.          0
1          (0,1)         1
1. If any bit in A is 1, and its corresponding B bit is 0, there are no solutions possible. Return an empty set.
2. If the corresponding bits in A and B are 1, the corresponding bit in X does not matter.
Now, see that one solution for X is B itself (if condition #1, as stated above, is satisfied). Hence, let's construct a number start_num = B. This will be one solution, and the other solutions will be constructed from it.
start_num = B = [1 1 1 0 1 1]
The 'choice' bits are those where X can take any value, i.e. those positions where A=1 and B=1. Let us make another number choice = A AND B, so that choice = 1 denotes those positions. Also notice that, if there are k positions where choice = 1, the total number of solutions is 2^k.
choice = A AND B = [0 0 1 0 1 1] ,hence, k = 3
Store these 'choice' positions in an array (of length k), starting from the right (LSB = 0). Let us call this array pos_array.
pos_array = [0 1 3]
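The bookkeeping up to this point can be sketched in Python. This is only an illustration under my own assumptions: the function name `analyse` is hypothetical, the bit lists are read with the most significant bit first (as in the example above), and Python integers stand in for the bit vectors.

```python
def analyse(a_bits, b_bits):
    """Return (solvable, pos_array) for A OR X = B.

    Bit lists are given MSB first; positions in pos_array are counted
    from the right (LSB = 0), as in the text. Hypothetical helper.
    """
    a = int("".join(map(str, a_bits)), 2)
    b = int("".join(map(str, b_bits)), 2)
    # Condition #1: a bit set in A but clear in B makes the equation unsolvable.
    if a & ~b:
        return False, []
    choice = a & b  # 1 wherever X is free to be 0 or 1
    pos_array = [p for p in range(len(a_bits)) if (choice >> p) & 1]
    return True, pos_array


solvable, pos_array = analyse([0, 0, 1, 0, 1, 1], [1, 1, 1, 0, 1, 1])
print(solvable, pos_array, 2 ** len(pos_array))
```

For the example input this reports the three choice positions, so 2^3 = 8 solutions are expected.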
Notice that all the 'choice' bits in start_num are set to 1. Hence, all the other solutions will have some (1 <= p <= k) of these bits set to 0. Now that we know which bits are to be changed, we need to make these solutions in an efficient manner.
This can be done by making all solutions in an order where the difference between the previous solution and the current one is just at one position, hence making it efficient to calculate the solutions. For example, if we have two 'choice' bits, the following explains the difference between simply running through all combinations in an arithmetic progression, and going through them in a 1-bit-change order:
1-bit-toggle-order          decreasing order
----------------------      ----------------------
1 1 // start                1 1 // start
1 0 // toggle bit 0         1 0 // subtract 1
0 0 // toggle bit 1         0 1 // subtract 1
0 1 // toggle bit 0         0 0 // subtract 1
(We want to exploit the speed of bitwise operations, hence we will use the 1-bit-toggle order).
Now, we will build each solution (this is not actual C code, just an explanation; 2^k here means 2 to the power k):
addToSet(start_num); // add the initial solution to the set
for(i=1; i<2^k; i++)
{
    // pos = index of the lowest set bit of i, which is the bit to toggle
    pos = 0;
    count = i;
    while( (count & 1) == 0)
    {
        count = count>>1;
        pos++;
    }
    toggle(start_num[pos_array[pos]]); // update start_num by toggling the desired bit
    addToSet(start_num); // Add the updated vector to the set
}
If this code is run on the above example, the following toggle statements will be executed:
toggle(start_num[0])
toggle(start_num[1])
toggle(start_num[0])
toggle(start_num[3])
toggle(start_num[0])
toggle(start_num[1])
toggle(start_num[0])
, which will result in the following additions:
addToSet([1 1 1 0 1 0])
addToSet([1 1 1 0 0 0])
addToSet([1 1 1 0 0 1])
addToSet([1 1 0 0 0 1])
addToSet([1 1 0 0 0 0])
addToSet([1 1 0 0 1 0])
addToSet([1 1 0 0 1 1])
, which, in addition to the already-present initial solution [1 1 1 0 1 1], completes the set.
NOTE: I am not an expert in bitwise operations, besides other things. I think there are better ways to write the algorithm, making better use of bit-access pointers and bitwise binary operations (and will be glad if someone can suggest improvements). What I am proposing with this solution is the general approach to this problem.
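As a rough, runnable illustration of the 1-bit-toggle enumeration described in this answer, here is a Python sketch. The function name `solutions_gray` and the list-of-tuples return type are my own choices, not part of the answer; bit lists are MSB first and positions are counted from the right, matching the example.

```python
def solutions_gray(a_bits, b_bits):
    """Enumerate all X with A OR X == B, changing one bit per step."""
    n = len(a_bits)
    # No solution if some position has A = 1 but B = 0.
    if any(a == 1 and b == 0 for a, b in zip(a_bits, b_bits)):
        return []
    current = list(b_bits)  # start_num = B is always a solution
    # 'Choice' positions (A = 1 and B = 1), counted from the right.
    pos_array = [p for p in range(n)
                 if a_bits[n - 1 - p] == 1 and b_bits[n - 1 - p] == 1]
    found = [tuple(current)]
    for i in range(1, 1 << len(pos_array)):
        pos = (i & -i).bit_length() - 1   # index of the lowest set bit of i
        index = n - 1 - pos_array[pos]    # convert to a list index (MSB first)
        current[index] ^= 1               # toggle exactly one bit
        found.append(tuple(current))
    return found


for x in solutions_gray([0, 0, 1, 0, 1, 1], [1, 1, 1, 0, 1, 1]):
    print(x)
```

Each step touches a single bit, so producing the next solution costs constant work apart from copying it into the result.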
You can construct the digit options for each slot i by evaluating:
for d in (0, 1):
    if (array1[i] or d) == array2[i]:
        digits[i].append(d)
Then you just need to iterate over i.
The objective is to construct a list of lists: [[0,1],[1],[0,1]] showing the valid digits in each slot. Then you can use itertools.product() to construct all of the valid arrays:
arrays = list(itertools.product(*digits))
You can put all this together using list comprehensions and this would result in:
list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(array1, array2)]))
In action:
>>> import itertools as it
>>> a1, a2 = [1,1,1], [1,1,1]
>>> list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(a1, a2)]))
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
>>> a1, a2 = [1,0,0], [1,1,1]
>>> list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(a1, a2)]))
[(0, 1, 1), (1, 1, 1)]
>>> a1, a2 = [1,0,0], [0,1,1]
>>> list(it.product(*[[d for d in (0, 1) if (x or d) == y] for x, y in zip(a1, a2)]))
[]
Related
I am trying to check whether any numbers are duplicated in a 9x9 array, but I need to exclude all 0s, as they are the ones I will solve later. I would like to check for duplicates in the rows and columns, considering only the numbers 1 to 9 and ignoring every 0. An example input array would be:
[[1 0 0 7 0 0 0 0 0]
[0 3 2 0 0 0 0 0 0]
[0 0 0 6 0 0 0 0 0]
[0 8 0 0 0 2 0 7 0]
[5 0 7 0 0 1 0 0 0]
[0 0 0 0 0 3 6 1 0]
[7 0 0 0 0 0 2 0 9]
[0 0 0 0 5 0 0 0 0]
[3 0 0 0 0 4 0 0 5]]
Here is where I am currently with my code for this:
#Checking Columns
for c in range(9):
    line = (test[:,c])
    print(np.unique(line).shape == line.shape)
#Checking Rows
for r in range(9):
    line = (test[r,:])
    print(np.unique(line).shape == line.shape)
Then I would like to do the exact same for the 3x3 sub arrays in the 9x9 array. Again I need to somehow exclude the 0 from the check. Here is the code I currently have:
for r0 in range(3,9,3):
    for c0 in range(3,9,3):
        test1 = test[:r0,:c0]
        for r in range(3):
            line = (test1[r,:])
            print(np.unique(line).shape == line.shape)
        for c in range(3):
            line = (test1[:,c])
            print(np.unique(line).shape == line.shape)
I would truly appreciate assistance in this regard.
It sure sounds like you're trying to verify the input of a Sudoku board.
You can extract a box as:
for r0 in range(0, 9, 3):
    for c0 in range(0, 9, 3):
        box = test[r0:r0+3, c0:c0+3]  # slice the full 9x9 array
        # ... test that np.unique(box) has 9 elements ...
Note that this is only about how to extract the elements of the box. You still haven't done anything about removing the zeros, here or on the rows and columns.
Given a box/row/column, you then want something like:
nonzeros = [x for x in box.flatten() if x != 0]
assert len(nonzeros) == len(set(nonzeros))
There may be a more numpy-friendly way to do this, but this should be fast enough.
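Putting the box extraction and the non-zero check together, a minimal sketch might look like the following. The helper names `no_repeats` and `board_is_consistent` are hypothetical, and the code assumes the board is a 9x9 array-like with 0 marking empty cells.

```python
import numpy as np


def no_repeats(cells):
    """True if the non-zero entries of a row, column, or box are all distinct."""
    nonzeros = [int(x) for x in np.asarray(cells).flatten() if x != 0]
    return len(nonzeros) == len(set(nonzeros))


def board_is_consistent(test):
    """Check every row, column, and 3x3 box, ignoring zeros."""
    test = np.asarray(test)
    rows_ok = all(no_repeats(test[r, :]) for r in range(9))
    cols_ok = all(no_repeats(test[:, c]) for c in range(9))
    boxes_ok = all(no_repeats(test[r0:r0 + 3, c0:c0 + 3])
                   for r0 in range(0, 9, 3)
                   for c0 in range(0, 9, 3))
    return rows_ok and cols_ok and boxes_ok
```

Calling `board_is_consistent(test)` on the array from the question returns a single boolean instead of printing one result per line.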
Excluding zeros is fairly straightforward: mask the array.
test = np.array(test)
non_zero_mask = (test != 0)
At this point you can either check the whole matrix for uniqueness
np.unique(test[non_zero_mask])
or you can do it for individual rows/columns
non_zero_row_0 = test[0, non_zero_mask[0]]
unique_0 = np.unique(non_zero_row_0)
You can add the logic above into a loop to get the behavior you want
As for the 3x3 subarrays, you can loop through them as you did in your example.
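One way the masking logic could be wrapped in a loop is sketched below. The function name `duplicate_free_lines` and the choice to return two lists of booleans are assumptions for illustration only.

```python
import numpy as np


def duplicate_free_lines(test):
    """Return (rows_ok, cols_ok): one boolean per row and per column."""
    test = np.array(test)
    non_zero_mask = (test != 0)
    rows_ok = []
    for r in range(test.shape[0]):
        values = test[r, non_zero_mask[r]]        # non-zero entries of row r
        rows_ok.append(np.unique(values).size == values.size)
    cols_ok = []
    for c in range(test.shape[1]):
        values = test[non_zero_mask[:, c], c]     # non-zero entries of column c
        cols_ok.append(np.unique(values).size == values.size)
    return rows_ok, cols_ok
```

A line is duplicate-free exactly when removing repeats with `np.unique` does not shrink its non-zero entries.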
When you have a small collection of things (small being <=64 or 128, depending on architecture), you can turn it into a set using bits. So for example:
bits = ((2**board) >> 1).astype(np.uint16)
Notice that you have to use right shift after the fact rather than pre-subtracting 1 from board to cleanly handle zeros.
You can now compute three types of sets. Each set is the bitwise OR of bits in a particular arrangement. For this example, you can use sum just the same:
rows = bits.sum(axis=1)
cols = bits.sum(axis=0)
blocks = bits.reshape(3, 3, 3, 3).sum(axis=(1, 3))
Now all you have to do is compare the bit counts of each number to the number of non-zero elements. They will be equal if and only if there are no duplicates. Duplicates will cause the bit count to be smaller.
There are pretty efficient algorithms for counting bits, especially for something as small as a uint16. Here is an example: How to count the number of set bits in a 32-bit integer?. I've adapted it for the smaller size and numpy here:
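def count_bits16(arr):
    count = arr - ((arr >> 1) & 0x5555)                 # bit counts per 2-bit pair
    count = (count & 0x3333) + ((count >> 2) & 0x3333)  # bit counts per nibble
    count = (count + (count >> 4)) & 0x0F0F             # bit counts per byte
    return ((count * 0x0101) >> 8) & 0xFF               # add the two byte counts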
This is the count of unique elements for each of the configurations. You need to compare it to the number of non-zero elements. The following boolean will tell you if the board is valid:
np.array_equal(count_bits16(rows), np.count_nonzero(board, axis=1)) and \
np.array_equal(count_bits16(cols), np.count_nonzero(board, axis=0)) and \
np.array_equal(count_bits16(blocks), np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3)))
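For comparison, here is a self-contained sketch of the same bit-set idea that uses an explicit bitwise OR instead of `sum`, and a deliberately simple (slower) bit count. The names `popcount9` and `board_is_valid` are hypothetical; the board is assumed to be 9x9 with values 0 to 9.

```python
import numpy as np


def popcount9(x):
    """Count the set bits among the 9 lowest bits of each element."""
    x = np.asarray(x, dtype=np.uint16)
    return sum(((x >> k) & 1).astype(np.int64) for k in range(9))


def board_is_valid(board):
    """True if no row, column, or 3x3 block repeats a non-zero digit."""
    board = np.asarray(board)
    # Digit d (1..9) becomes bit d-1; an empty cell (0) becomes 0.
    bits = ((1 << board.astype(np.int64)) >> 1).astype(np.uint16)
    per_row = bits
    per_col = bits.T
    # Lay out each 3x3 block as one row of 9 cells.
    per_block = bits.reshape(3, 3, 3, 3).swapaxes(1, 2).reshape(9, 9)
    for groups in (per_row, per_col, per_block):
        used = np.bitwise_or.reduce(groups, axis=1)   # set of digits seen
        filled = np.count_nonzero(groups, axis=1)     # number of filled cells
        if not np.array_equal(popcount9(used), filled):
            return False
    return True
```

With OR, a repeated digit leaves the set unchanged, so the number of set bits falls below the number of filled cells.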
Here is a 5x5 matrix, with all cells unknown, it looks something like this:
A1+B1+C1+D1+E1| 1
A2+B2+C2+D2+E2| 0
A3+B3+C3+D3+E3| 1
A4+B4+C4+D4+E4| 3
A5+B5+C5+D5+E5| 2
_______________
2 1 2 1 1
So, the summation of rows can be seen on the right, and the summation of columns can be seen on the bottom. The solution can only be 0 or 1, and as an example here is the solution to the specific one I have typed out above:
0+0+1+0+0| 1
0+0+0+0+0| 0
1+0+0+0+0| 1
1+1+0+0+1| 3
0+0+1+1+0| 2
____________
2 1 2 1 1
As you can see, summing the rows and columns gives the results on the right and bottom.
My question: How would you go about entering the original matrix with unknowns and having python iterate each cell with 0 or 1 until the puzzle is complete?
You don't really need a matrix -- just use vectors (tuples) of length 25. They can represent 5x5 matrices according to the following scheme:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
These are the indices of such tuples. Note that the row and column of an index can be obtained from the function divmod.
You can use product from itertools to iterate over the 2**25 possible ways of filling in the matrix.
These ideas lead to the following code:
from itertools import product

# nxn matrices will be represented by tuples of length n**2,
# in row-major order

# the following function calculates row and column sums:
def all_sums(array, n):
    row_sums = [0]*n
    col_sums = [0]*n
    for i, x in enumerate(array):
        q, r = divmod(i, n)
        row_sums[q] += x
        col_sums[r] += x
    return row_sums, col_sums

# in what follows, row_sums, col_sums are lists of target values
def solve_puzzle(row_sums, col_sums):
    n = len(row_sums)
    for p in product(range(2), repeat=n*n):
        if all_sums(p, n) == (row_sums, col_sums):
            return p
    return "no solution"

solution = solve_puzzle([1,0,1,3,2], [2,1,2,1,1])
for i in range(0, 25, 5):
    print(solution[i:i+5])
Output:
(0, 0, 0, 0, 1)
(0, 0, 0, 0, 0)
(0, 0, 0, 1, 0)
(1, 1, 1, 0, 0)
(1, 0, 1, 0, 0)
In this case brute-force was feasible. If you go much beyond 5x5 it would no longer be feasible, and more sophisticated algorithms would be required.
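As one possible step beyond brute force, here is a sketch of a backtracking search with pruning. This is a different technique from the `product`-based code above, offered only as an illustration; the name `solve_by_backtracking` and the list-of-lists return value are my own choices.

```python
def solve_by_backtracking(row_sums, col_sums):
    """Fill an n x n 0/1 grid cell by cell, abandoning hopeless branches early."""
    n = len(row_sums)
    grid = [[0] * n for _ in range(n)]
    rows_left = list(row_sums)  # how much each row still needs
    cols_left = list(col_sums)  # how much each column still needs

    def place(i):
        if i == n * n:
            return True
        r, c = divmod(i, n)
        for value in (0, 1):
            new_row = rows_left[r] - value
            new_col = cols_left[c] - value
            # Prune: what is still owed must be non-negative and must fit
            # into the cells that remain in this row and this column.
            if not (0 <= new_row <= n - 1 - c and 0 <= new_col <= n - 1 - r):
                continue
            rows_left[r], cols_left[c] = new_row, new_col
            grid[r][c] = value
            if place(i + 1):
                return True
            rows_left[r] += value  # undo and try the next value
            cols_left[c] += value
        grid[r][c] = 0
        return False

    return grid if place(0) else None


solution = solve_by_backtracking([1, 0, 1, 3, 2], [2, 1, 2, 1, 1])
for row in solution:
    print(row)
```

The pruning test rejects a partial grid as soon as a row or column can no longer reach its target, so most of the 2**25 candidates are never visited. The worst case is still exponential.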
This is a special case of an integer linear programming problem. The special case of 0-1 integer linear programming is still unfortunately NP-complete, though there exist many algorithms including heuristic ones. You can use a built-in library to do this for you.
Good morning, I have a simple question about applying a different if statement to every element of a numpy array.
I have written a function that take as input a numpy array made of 12 elements, checks if the element is 0 or 1 and, if it's 1, acts on another array. The function is the following:
def symmetry_test(determinant):
    print(determinant)
    ag=np.array([1,1,1,1])
    bg=np.array([1,-1,-1,1])
    au=np.array([1,1,-1,-1])
    bu=np.array([1,-1,1,-1])
    representations=np.zeros((4,12))
    print(determinant[0])
    if int(determinant[0])==1:
        representations[:,0]=au
    print(determinant[1])
    if int(determinant[1])==1:
        representations[:,1]=au
    print(determinant[2])
    if determinant[2]==1:
        representations[:,2]=ag
    print(determinant[3])
    if determinant[3]==1:
        representations[:,3]=ag
    if determinant[4]==1:
        representations[:,4]=bg
    if determinant[5]==1:
        representations[:,5]=bg
    if determinant[6]==1:
        representations[:,6]=ag
    if determinant[7]==1:
        representations[:,7]=ag
    if determinant[8]==1:
        representations[:,1]=bu
    if determinant[9]==1:
        representations[:,9]=bu
    if determinant[10]==1:
        representations[:,10]=au
    if determinant[11]==1:
        representations[:,11]=au
    idx = np.argwhere(np.all(representations[..., :] == 0, axis=0))
    representations = np.delete(representations, idx, axis=1)
    return representations
The function takes determinant as input, which is a numpy array, generates an array called representations and fills it. I put print(determinant[0])and int(determinant[0]) in the definition to check if the function reads the array properly.
The problem is the following: if I give as input an array defined as test=np.array([1,1,1,1,1,1,0,0,0,0,0,0]) the function works fine and returns and array like
1 1 1 1 1 1
1 1 1 1 -1 -1
-1 -1 1 1 1 1
-1 -1 1 1 -1 -1
which is exactly what I want.
Now, if I give to the function the array test=np.array([1,1,1,1,0,0,0,0,1,1,0,0]) and use it as a=symmetry_test(test),the output is
1 1 1 1 1
1 -1 1 1 -1
-1 1 1 1 1
-1 -1 1 1 -1
(yes, it only has 5 columns)
instead of
1 1 1 1 1 1
1 1 1 1 -1 -1
-1 -1 1 1 -1 -1
-1 -1 1 1 1 1
Honestly I have no idea of the reason why it doesn't work and what puzzles me the most is the fact that for one array it works and for another it fails completely.
I tried to put the else condition
else:
    representations[:,0]=np.zeros(4)
after each if statement without success; I also tried to put determinant=np.asarray(determinant) at the beginning of the function but, also in this case, it didn't solve the problem.
Any suggestion will be greatly appreciated.
Thanks in advance and sorry for the easy question.
It's a bug in your code.
if determinant[8] == 1:
    representations[:, 1] = bu
Should be
if determinant[8] == 1:
    representations[:, 8] = bu
And if you want a more concise way of implementing that function, consider this -
def symmetry_test(determinant):
    ag = np.array([1, 1, 1, 1])
    bg = np.array([1, -1, -1, 1])
    au = np.array([1, 1, -1, -1])
    bu = np.array([1, -1, 1, -1])
    representations = np.array([au, au, ag, ag, bg, bg, ag, ag, bu, bu, au, au])
    # the builtin bool works as a dtype across NumPy versions
    determinant = np.array(determinant, dtype=bool)
    # select the flagged entries, then transpose so each one is a column as in the original
    return representations[determinant].T
Please excuse my naivete as I don't have much programming experience. While googling something for an unrelated question, I stumbled upon this:
https://www.geeksforgeeks.org/find-number-of-solutions-of-a-linear-equation-of-n-variables/
I completely understand the first (extremely inefficient) bit of code. But the second:
def countSol(coeff, n, rhs):
    # Create and initialize a table
    # to store results of subproblems
    dp = [0 for i in range(rhs + 1)]
    dp[0] = 1
    # Fill table in bottom up manner
    for i in range(n):
        for j in range(coeff[i], rhs + 1):
            dp[j] += dp[j - coeff[i]]
    return dp[rhs]
confuses me. My question being: why does this second program count the number of non-negative integer solutions?
I have written out several examples, including the one given in the article, and I understand that it does indeed do this. And I understand how it is populating the list. But I don't understand exactly why this works.
Please excuse what must be, to some, an ignorant question. But I would quite like to understand the logic, as I think it rather clever that such a little snip-it is able able to answer a question as general as "How many non negative integer solutions exist" (for some general equation).
This algorithm is pretty cool and demonstrates the power of looking for a solution from a different perspective.
Let's take an example: 3x + 2y + z = 6, where LHS is the left hand side and RHS is the right hand side.
dp[k] will keep track of the number of unique ways to arrive at a RHS value of k by substituting non-negative integer values for LHS variables.
The i loop iterates over the variables in the LHS. The algorithm begins with setting all the variables to zero. So, the only possible k value is zero, hence
k 0 1 2 3 4 5 6
dp[k] = 1 0 0 0 0 0 0
For i = 0, we will update dp to reflect what happens if x is 1 or 2. We don't care about x > 2 because the solutions are all non-negative and 3x would be too big. The j loop is responsible for updating dp and dp[k] gets incremented by dp[k - 3] because we can arrive at RHS value k by adding one copy of the coefficient 3 to k-3. The result is
k 0 1 2 3 4 5 6
dp[k] = 1 0 0 1 0 0 1
Now the algorithm continues with i = 1, updating dp to reflect all possible RHS values where x is 0, 1, or 2 and y is 0, 1, 2, or 3. This time the j loop increments dp[k] by dp[k-2] because we can arrive at RHS value k by adding one copy of the coefficient 2 to k-2, resulting in
k 0 1 2 3 4 5 6
dp[k] = 1 0 1 1 1 1 2
Finally, the algorithm incorporates z = 1, 2, 3, 4, 5, or 6, resulting in
k 0 1 2 3 4 5 6
dp[k] = 1 1 2 3 4 5 7
In addition to computing the answer in pseudo-polynomial time, dp encodes the answer for every RHS <= the input right hand side.
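To watch the table evolve, the walk-through above can be reproduced with a small instrumented version of the same loop, plus a brute-force cross-check. The names `count_solutions` and `count_brute_force` are mine, and both functions assume strictly positive integer coefficients.

```python
from itertools import product


def count_solutions(coeff, rhs, trace=False):
    """Same DP as countSol; optionally print dp after each variable is added."""
    dp = [0] * (rhs + 1)
    dp[0] = 1  # one way to reach 0: every variable is 0
    for c in coeff:
        for j in range(c, rhs + 1):
            dp[j] += dp[j - c]  # add one more copy of coefficient c
        if trace:
            print(f"after coefficient {c}: {dp}")
    return dp[rhs]


def count_brute_force(coeff, rhs):
    """Enumerate every candidate assignment and count the ones that hit rhs."""
    ranges = [range(rhs // c + 1) for c in coeff]
    return sum(1 for values in product(*ranges)
               if sum(c * v for c, v in zip(coeff, values)) == rhs)


print(count_solutions([3, 2, 1], 6, trace=True))
print(count_brute_force([3, 2, 1], 6))
```

The traced lists match the three tables shown above, and both functions agree on 7 solutions for 3x + 2y + z = 6.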
I have a numpy.ndarray called grouping of size (S, N). Each row of grouping gives me the group labels of a sample of data. I run my algorithm S times and get new group labels in each iteration.
I want to determine how many times each sample of my data has the same group label as every other sample of my data across the S iterations in a fully vectorized way.
In a not-completely-vectorized way:
sim_matrix = np.zeros((N, N))
for s in range(S):
    sim_matrix += np.equal.outer(grouping[s, :], grouping[s, :])
One vectorized approach would be with broadcasting -
(grouping[:,None,:] == grouping[:,:,None]).sum(0)
For performance, we can use np.count_nonzero -
np.count_nonzero(grouping[:,None,:] == grouping[:,:,None],axis=0)
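A quick way to convince yourself that the broadcasting expression matches the loop is to run both on a small random input. The function names and the test sizes below are arbitrary choices for illustration.

```python
import numpy as np


def sim_matrix_loop(grouping):
    """Reference version: accumulate one N x N comparison per iteration."""
    S, N = grouping.shape
    sim = np.zeros((N, N), dtype=np.int64)
    for s in range(S):
        sim += np.equal.outer(grouping[s, :], grouping[s, :])
    return sim


def sim_matrix_broadcast(grouping):
    """Vectorized version: build an (S, N, N) boolean array and count along S."""
    return np.count_nonzero(grouping[:, None, :] == grouping[:, :, None], axis=0)


rng = np.random.default_rng(0)
grouping = rng.integers(0, 3, size=(4, 5))  # S = 4 runs, N = 5 samples
print(np.array_equal(sim_matrix_loop(grouping), sim_matrix_broadcast(grouping)))
```

Note that the broadcast version materializes an S x N x N boolean array, so for large S and N the loop may be the only option that fits in memory.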
The sum of equal.outer is a cryptic way of calculating all-pairs similarity of columns:
sum_i sum_jk (A[i,j] == A[i,k]) is the same as
sum_jk sum_i (A[i,j] == A[i,k])
where sum_i loops over rows, sum_jk over all pairs of columns.
Comparing two vectors by counting the number of positions where they differ is called Hamming distance.
If we change == above to != and similarity to distance = nrows - similarity (most similar ⇔ distance 0), we get the problem: find the Hamming distance between all pairs of a bunch of vectors:
def allpairs_hamming( A, dtype=np.uint32 ):
    """ -> Hamming distances between all pairs of rows of A """
    nrow, ncol = A.shape
    allpair_dist = np.zeros( [nrow, nrow], dtype=dtype )
    for j in range(nrow):  # range, not xrange, on Python 3
        for k in range( j + 1, nrow ):
            allpair_dist[j,k] = allpair_dist[k,j] = (A[j] != A[k]).sum()  # row diff
    return allpair_dist
allpairs_hamming: 30.7 sec, 3 ns per cmp Nvec 2000 Veclen 5000 A 10m pairdist uint32 15m
Almost all the cpu time is in the row diff, not in the outer loop for j ... for k -- 3 ns per scalar compare, on a stock mac, isn't bad.
However memory caching is much faster if each row A[j] is in contiguous memory,
as for numpy C-order arrays.
Apart from that, whether you do "all pairs of rows" or "all pairs of columns"
doesn't matter, as long as you're clear.
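For small inputs, the all-pairs Hamming distance can also be written with broadcasting. The sketch below is not the code that was timed above; the function names are mine, and the loop version is repeated only to check the result.

```python
import numpy as np


def allpairs_hamming_loop(A):
    """Pairwise row distances with an explicit double loop."""
    nrow = A.shape[0]
    dist = np.zeros((nrow, nrow), dtype=np.int64)
    for j in range(nrow):
        for k in range(j + 1, nrow):
            dist[j, k] = dist[k, j] = (A[j] != A[k]).sum()
    return dist


def allpairs_hamming_broadcast(A):
    """Same result via an (nrow, nrow, ncol) boolean intermediate."""
    A = np.asarray(A)
    return (A[:, None, :] != A[None, :, :]).sum(axis=2)


rng = np.random.default_rng(0)
A = rng.integers(0, 3, size=(6, 10))
print(np.array_equal(allpairs_hamming_loop(A), allpairs_hamming_broadcast(A)))
```

The intermediate array has nrow * nrow * ncol entries, so at the sizes quoted above (2000 vectors of length 5000) it would not fit comfortably in memory and the loop is the safer choice.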
(Is it possible to find "nearby" pairs in time and space < O(npairs), here O(20000^2) ? Afaik there are more methods than test cases.)
See also:
http://docs.scipy.org/doc/scipy/reference/spatial.distance.html (bug: hamming .mean not .sum)
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html
https://stats.stackexchange.com/search?q=[clustering]+pairwise
You want to compare identical rows. One way to do that is to view each entire row as a single raw block of bytes:
S,N=12,2
a=np.random.randint(0,3,(S,N)) #12 samples of two labels.
#a
0 1
0 2 2
1 2 0
2 1 2
3 0 0
4 0 1
5 1 1
6 0 1
7 0 1
8 0 1
9 0 0
10 2 2
11 0 0
samples=np.ascontiguousarray(a).view(np.dtype((np.void,a.strides[0])))
samples.shape is then (S,1).
You can now inventory your samples with np.unique, and use pandas DataFrames for a pretty report:
_,inds,invs=np.unique(samples,return_index=True, return_inverse=True)
df=pd.DataFrame(invs)
result=df.reset_index().groupby(0).index.apply(list).to_frame()
result['sample']=[list(x) for x in a[inds]]
for
index samples
0
0 [3, 9, 11] [0, 0]
1 [4, 6, 7, 8] [0, 1]
2 [5] [1, 1]
3 [2] [1, 2]
4 [1] [2, 0]
5 [0, 10] [2, 2]
This can be O(S ln S) if there are few matches between samples, whereas yours is O(N²S).
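A similar inventory of identical rows can be sketched without the void view or pandas, using `np.unique` with `axis=0`. This is an alternative to the code above, not the same method; the function name `group_identical_rows` and the dictionary return type are assumptions for illustration.

```python
from collections import defaultdict

import numpy as np


def group_identical_rows(a):
    """Map each distinct row (as a tuple) to the list of row indices where it occurs."""
    a = np.asarray(a)
    uniq, inverse = np.unique(a, axis=0, return_inverse=True)
    groups = defaultdict(list)
    # np.ravel guards against NumPy versions that return a 2-D inverse here.
    for row_index, group_id in enumerate(np.ravel(inverse)):
        key = tuple(int(v) for v in uniq[group_id])
        groups[key].append(row_index)
    return dict(groups)


a = np.array([[2, 2], [2, 0], [1, 2], [0, 0], [0, 1], [1, 1],
              [0, 1], [0, 1], [0, 1], [0, 0], [2, 2], [0, 0]])
for sample, indices in sorted(group_identical_rows(a).items()):
    print(indices, list(sample))
```

On the twelve-row example above this reproduces the grouping shown in the table.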