Compress a long vector consisting of 0s and 1s

Say I have a vector of size [1 x 300] in which each element is 0 or 1. I may need to store many of these iteratively at run time. How can I represent it so that I can store them efficiently (in Python)?
I guess there are two ways to do it. The first is something like a bitmap (does Python even have this?).
The second approach I was thinking of is to store the positions of the 1s,
e.g. [0, 1, 1, 1] would be stored as [1, 2, 3].
Any ideas?

An alternative often used in raster filled shapes processing (where you typically have large uniform areas) is to store your data as spans, i.e. store just the length of each run of 0s or 1s (essentially, it's RLE with the item of each run implicit in the position). You can choose arbitrarily that the first value (and so, all even values) represents a run of 0s, while the second (and so, all odd values) a run of 1s. So, something like
0 0 0 0 0 1 1 0 0 0 1 1 1 1
becomes
5 2 3 4
Appending to such a structure is trivial:
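def append(l, value):
    # Value held by the last run: even indices are runs of 0s, odd indices runs of 1s.
    cur = (len(l) + 1) % 2
    if value == cur:
        # Same value as the last run: extend it.
        l[-1] += 1
    else:
        # Different value: start a new run of length 1.
        l.append(1)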
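A minimal sketch of the full round trip, assuming the convention described above (even positions hold runs of 0s, odd positions runs of 1s). The names encode_runs and decode_runs are made up for illustration; starting from [0] rather than an empty list is an assumption that lets a vector beginning with 1 be represented by a zero-length run of 0s.

```python
def encode_runs(bits):
    """Return run lengths; even indices count 0s, odd indices count 1s."""
    runs = [0]  # start with a (possibly empty) run of 0s
    for bit in bits:
        current = (len(runs) + 1) % 2  # value held by the last run
        if bit == current:
            runs[-1] += 1      # extend the current run
        else:
            runs.append(1)     # start a new run of the other value
    return runs


def decode_runs(runs):
    """Expand run lengths back into the list of 0s and 1s."""
    bits = []
    for index, length in enumerate(runs):
        bits.extend([index % 2] * length)
    return bits


vector = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
runs = encode_runs(vector)
print(runs)                         # [5, 2, 3, 4]
print(decode_runs(runs) == vector)  # True
```

Whether this beats a plain bitmap depends on the data: it only pays off when the vectors contain long uniform runs.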

Related

Moves to make all the elements in a array equal in Python

I came across a problem in which the input is as follows:
5
1 1 1 1 6
And the expected output is 4
Basically what we are trying to do is print the minimum number of moves it will require to make all the 5 values equal to each other. One move means reducing a location and incrementing another location. If it is not possible to make them all equal, we print -1.
I tried the below approach:
def solution(N, W):
    counter=0
    W.sort()
    k=W[int(N/2)]
    for i in range(0,N):
        counter=counter+abs(W[i]-k)
    if N%2==0:
        tempC=0
        k=W[int(N/2)-1]
        for i in range(0,N):
            tempC=tempC+abs(W[i]-k)
        counter=min(counter,tempC)
    return counter
and am getting 5 as the answer. Kindly share what your function to achieve this would be.

Let's see how the logic works with your input.
5 1 1 1 1 6
1. Take the list 1 2 1 3 3 as an example: if it can be equalized, the final list must look like 2 2 2 2 2. A move never changes the total, so sum(INITIAL_LIST) equals sum(FINAL_LIST). This is the first condition: the sum must be divisible by N. If it holds, equalizing is possible.
2. Some elements give up value and others receive it. A decrement of one index together with an increment of another counts as one step, so we only need to count the decrements, i.e. the elements that lose value until they equal the target (2 in the example). The total number of steps is therefore the sum of the amounts by which elements exceed the target.
Here I use NumPy's vectorized operations to keep the code short.
CODE :
import numpy as np
def solution(N, W):
    if sum(W)%N !=0:
        return -1
    eq = sum(W)//N
    step = np.array(W)-eq
    step_sum = np.sum(step[step>0])
    return step_sum
if __name__ == '__main__':
    var_, list_ = input().split(maxsplit=1)
    print(solution(int(var_), list(int(i) for i in list_.split())))
WITHOUT NUMPY :
Update :
    step = np.array(W)-eq
    step_sum = np.sum(step[step>0])
To :
    step = [i-eq for i in W]
    step_sum = sum(i for i in step if i>0)
OUTPUT :
5 1 1 1 1 6
4
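For reference, the two conditions above can be put together in a self-contained pure-Python sketch. The name min_moves is hypothetical, and returning -1 for an empty list is an assumption, since the problem statement does not cover that case.

```python
def min_moves(values):
    """Minimum number of decrement/increment moves to equalize values, or -1."""
    n = len(values)
    total = sum(values)
    # Assumption: an empty list is treated as "not possible".
    if n == 0 or total % n != 0:
        return -1
    target = total // n
    # Each move takes one unit from an element above the target,
    # so the answer is the total excess over the target.
    return sum(v - target for v in values if v > target)


print(min_moves([1, 1, 1, 1, 6]))  # 4
print(min_moves([1, 2]))           # -1
```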

Why do the Even Ns take longer than the Odd Ns?

I have some code here that solves the n queens problem using backtracking in python. When I run it, the odds always take much less time than the evens. This is especially evident when the n gets to around 20+. Does anyone know why this is?
import time
N = int(input("Enter N: "))
def display_solution(board):
    print('\n'.join(['\t'.join([str(cell) for cell in row]) for row in board]))
def safe(board, row, col):
    # Check the same row, to the left of (row, col).
    for i in range(col):
        if board[row][i] == 1:
            return False
    # Check the upper-left diagonal.
    for i, j in zip(range(row, -1, -1), range(col, -1, -1)):
        if board[i][j] == 1:
            return False
    # Check the lower-left diagonal.
    for i, j in zip(range(row, N, 1), range(col, -1, -1)):
        if board[i][j] == 1:
            return False
    return True
def solve_help(board, col):
    if col >= N:
        return True
    for i in range(N):
        if safe(board, i, col):
            board[i][col] = 1
            if solve_help(board, col + 1) is True:
                return True
            # Backtrack: remove the queen and try the next row.
            board[i][col] = 0
    return False
def solve():
    board = [[0 for x in range(N)] for y in range(N)]
    if solve_help(board, 0) is False:
        print("Solution does not exist")
        return False
    display_solution(board)
    return True
# time.clock() was removed in Python 3.8; time.perf_counter() replaces it here.
start = time.perf_counter()
solve()
end = time.perf_counter()
print(N, "\t", end-start)
I'm assuming it must have something to do with the diagonals being different for odds as opposed to evens. I'm also not sure if this is an issue with all backtracking algorithms for this problem, or just this specific code. Either way, thanks for the help.

The algorithm takes considerably more time when backtracking occurs in one of the first columns and a next row must be tried. Comparing odd-N boards with N-1 boards shows that the solution for the even board often needs more of this backtracking/try-next processing. For example, the top-left corner of the solution for N=19 looks like this:
1 0 0 0 0
0 0 0 1 0
0 1 0 0 0
0 0 0 0 1*
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
These 5 queens in the first five columns are found quickly, as they are the first that do not collide with the previous queens. And apparently the other queens can be placed without having to reconsider these first five queens.
For N=18 that same corner of the solution looks like this:
1 0 0 0 0
0 0 0 1 0
0 1 0 0 0
0 0 0 0 0-
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 1*
Note the position marked with a minus. This one looks promising (as it was for the 19-board): its investigation takes some considerable time before the conclusion is drawn that the other queens cannot be placed correctly. This early failure costs.
Thus the solution for the 19 board is found sooner than for the 18 board.
Note that the solution for 27 takes slightly more time than the one for 26, although this is not significant: the time complexity looks like O(2^n), so times for different board sizes are better compared on a logarithmic Y-axis, where "work" represents the number of times the function safe is executed.
Whether this algorithm always takes relatively more time for even boards (compared to the time needed for a N+1 board) is unclear, but for these few board sizes it seems related to the knight-jumps that are naturally formed by this algorithm, starting from the top-left corner. Note how this pattern works out perfectly for board sizes 5 and 7: the first spot where the next queen can sit without interfering with the previous positioned queens is always part of the solution. While for board sizes 4 and 6 there isn't even any solution with a queen in a corner, which is the starting point of this algorithm.
Alternatives
To take this question from a programmer's point of view, there is one remedy whereby the difference will (on average) evaporate: pick the rows in each column in a different (or even random) order. It turns out that taking the normal order is one of the less efficient ways to get a solution.
Shift Pick
A simple shift in this algorithm, where you do not consider the first two rows unless all other fail, already changes the stats considerably:
In solve_help change this:
    for i in range(N):
to:
    for i in range(N):
        i = (i + 2) % N
With this change the average performance improves: all measures of log(work)/n are below 1, except for n=6. But also: the peaks now occur more often for odd values of N.
Random pick
In solve_help, visit the rows in random order instead (this needs import random at the top of the script):
    for i in random.sample(range(N), N):
One such random run gave much better stats than the original order. Of course, you will get a bad stat now and then, because it is random. But on average it performs (much) better.
Other ways of non-random order could include col, and N//2 with differing coefficients, like i = (i + col*2 + N//2) % N, ...etc. But see final remark below.
Other remarks
I would apply the following optimisation: keep track of which rows, forward "diagonals" and backward "diagonals" are already taken. You can use list(s) or set(s) for that. Note that two cells are on the same forward diagonal if the sums of their coordinates are equal. Cells on the same backward diagonal have in common that the difference of their coordinates is equal. That way you don't have to scan for a "1" along these lines each time.
Also, board could be just a list of column numbers. There is no need to store all those zeroes. Keep that for the display only.
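A minimal sketch of these two remarks combined, assuming the same column-by-column backtracking order as the question's code. The name solve_queens and the set names are made up for illustration; the sets replace the scans done by safe, and the board is kept as a plain list of row numbers, one per column.

```python
def solve_queens(n):
    """Return a list where entry c is the row of the queen in column c, or None."""
    rows = set()       # rows already holding a queen
    diag_sum = set()   # forward diagonals, identified by row + col
    diag_diff = set()  # backward diagonals, identified by row - col
    placement = []     # placement[col] = row

    def place(col):
        if col == n:
            return True
        for row in range(n):
            if row in rows or (row + col) in diag_sum or (row - col) in diag_diff:
                continue  # attacked: constant-time check instead of a scan
            rows.add(row)
            diag_sum.add(row + col)
            diag_diff.add(row - col)
            placement.append(row)
            if place(col + 1):
                return True
            # Backtrack: undo the placement and try the next row.
            placement.pop()
            rows.discard(row)
            diag_sum.discard(row + col)
            diag_diff.discard(row - col)
        return False

    return placement if place(0) else None


print(solve_queens(4))  # [1, 3, 0, 2]
print(solve_queens(3))  # None
```

This only makes each safety check cheaper; the number of positions tried, and so the odd/even pattern discussed above, stays the same.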
Finally, there are simple ways to get a solution in linear time. See Wikipedia.

Iterating over a large list

I have a list in the range [465868129, 988379794] both inclusive. When I use the following code I get a Memory Error. What can I do?
r = [465868129, 988379794]
list = [x for x in xrange(r[0], r[1]+1)]

You could iterate over the xrange directly instead of creating a list.
for x in xrange(r[0], r[1] + 1):
    ...
But iterating over such a large range is a very, very slow way to find squares. The fact that you run out of memory should alert you that a different approach is needed.
A much better way would be to take the square roots of each end point and then iterate between the square roots. Each integer between the square roots, when squared, would give you one of the numbers you're searching for.
In fact, if you're clever enough, you can generate all the squares with a single list comprehension and avoid an explicit for loop entirely.
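One possible sketch of that idea, written for Python 3.8+ (it relies on math.isqrt, which does not exist in Python 2, where xrange comes from). The function name squares_in_range is made up for illustration, and a nonnegative lower bound is assumed.

```python
import math


def squares_in_range(low, high):
    """Return every perfect square s with low <= s <= high (low >= 0 assumed)."""
    first = math.isqrt(low)
    if first * first < low:
        first += 1               # smallest integer whose square is >= low
    last = math.isqrt(high)      # largest integer whose square is <= high
    return [k * k for k in range(first, last + 1)]


squares = squares_in_range(465868129, 988379794)
print(len(squares), squares[0], squares[-1])
```

The loop now runs over a few thousand square roots instead of roughly half a billion candidates.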

Unless you have a very good reason to store the list items in a list, iterate over the generator instead, that way Python won't need to allocate a lot of memory (causing your Memory Error) to create that list:
init, end = (465868129, 988379794)
items = xrange(init, end + 1)
for item in items:
    pass  # Do something with item
To count squares on an arbitrary range consider the following formula:
import math
# Truncate each root separately (e.g. 8 - 5 = 3), as in the examples below.
number_of_squares = (int(math.sqrt(end)) - int(math.sqrt(init))
                     + op(is_perfect_square(init), is_perfect_square(end)))
The is_perfect_square(n) is another problem on its own, so check this post if interested.
The op term adjusts the count depending on whether init, end, both, or neither are perfect squares. So we need a function with the following characteristics:
Both numbers are perfect squares: Eg: 25,64 => 8 - 5 = 3 (and there are 4 squares on that range). (it should sum 1 more)
End is a perfect square: Eg: 26,64 => 8 - 5 = 3 (There are 3 squares on that range). (it is correct => it should sum 0)
Init is a perfect square: Eg: 25,65 => 8 - 5 = 3 (There are 4 squares on that range). (it should sum 1 more)
None of the numbers are perfect squares: Eg: 26, 65 => 8 - 5 = 3 (There are 3 squares on that range) (it is correct => it should sum 0)
So we need an operator with the following characteristics, based on the past examples:
1 op 1 = 1 (Both numbers are perfect squares)
0 op 1 = 0 (End is a perfect square)
1 op 0 = 1 (Init is a perfect square)
0 op 0 = 0 (None of the numbers are perfect squares)
Note that the max function almost fulfils our needs, but it fails on the second case max(0,1) = 1 and it should be 0.
So it looks like the result depends only on the first operand: if it is 1, the result is 1; if it is 0, the result is 0.
So, it's easy to write the function with that in mind:
import math
number_of_squares = (int(math.sqrt(end)) - int(math.sqrt(init))
                     + int(is_perfect_square(init)))
Thanks to @kojiro, we have this approach (based on a similar idea), which is easier to read:
from math import sqrt, floor, ceil
number_of_squares = 1 + floor(sqrt(end)) - ceil(sqrt(init))
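The same count can be sketched with integer arithmetic only, which sidesteps floating-point rounding in sqrt for very large endpoints. This assumes Python 3.8+ for math.isqrt and nonnegative endpoints; count_squares is a hypothetical name. The four example ranges from above serve as a check.

```python
import math


def count_squares(init, end):
    """Count perfect squares s with init <= s <= end (nonnegative inputs assumed)."""
    if end < init:
        return 0
    first = math.isqrt(init)
    if first * first < init:
        first += 1  # integer version of ceil(sqrt(init))
    # math.isqrt(end) is the integer version of floor(sqrt(end)).
    return max(0, math.isqrt(end) - first + 1)


for init, end in [(25, 64), (26, 64), (25, 65), (26, 65)]:
    print(init, end, count_squares(init, end))
# 25 64 4
# 26 64 3
# 25 65 4
# 26 65 3
```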

2D Bit matrix with every possible combination

I need to create a Python generator which yields every possible combination of a 2D bit matrix.
The length of each dimension is variable.
So for a 2x2 matrix:
1.
00
00
2.
10
00
3.
11
00
....
x.
00
01
Larger dimensions (up to 200*1000) need to work too.
In the end, I will not need all of the combinations. I need only the ones where the sum in each line is 1. But I need all combinations where this is the case. I would filter them before yielding. Printing is not required.
I want to use this as filter masks to test all possible variations of a data set.
Generating variations like this must be a common problem. Maybe there is even a good library for Python?

Going through all possible values of a bit vector of a given size is exactly what a counter does. It's not evident from your question what order you want, but it looks much like a Gray counter. Example:
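from sys import stdout
w,h=2,2
# There are w*h bits in total, hence 2**(w*h) patterns.
for val in range(2**(w*h)):
    # Convert the counter value to its Gray code.
    gray=val^(val>>1)
    for y in range(h):
        for x in range(w):
            stdout.write('1' if gray & (1<<(w*y+x)) else '0')
        stdout.write('\n')
    stdout.write('\n')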
Note that the dimensions of the vector don't matter to the counter, only the size. Also, while this gives every static pattern, it does not cover all possible transitions.
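Since the question ultimately only needs the matrices in which every row sums to 1, filtering all 2**(w*h) patterns is not necessary: such a matrix is fully described by the column index of the single 1 in each row. A minimal sketch of a generator built on that observation follows; it is a different technique from the Gray counter above, and the name one_hot_row_matrices is made up for illustration.

```python
from itertools import product


def one_hot_row_matrices(width, height):
    """Yield every height x width 0/1 matrix whose rows each contain exactly one 1."""
    # One column index per row decides where that row's single 1 goes.
    for columns in product(range(width), repeat=height):
        yield [[1 if x == col else 0 for x in range(width)] for col in columns]


for matrix in one_hot_row_matrices(2, 2):
    print(matrix)
# [[1, 0], [1, 0]]
# [[1, 0], [0, 1]]
# [[0, 1], [1, 0]]
# [[0, 1], [0, 1]]
```

The generator is lazy, but there are still width**height such matrices, so exhausting it is only realistic for small sizes.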

This can be done using permutations from itertools in the following way.
import itertools
dim=2
dimension = dim*dim
data = [0 for i in range(0,dimension)] + [1 for i in range(0,dimension)]
count = 1
# set() removes the duplicates that permutations produces for repeated 0s and 1s.
for matrix in set(itertools.permutations(data,dimension)):
    print('\n',count,'.')
    for i in range(0,dimension,dim):
        print(' '.join(map(str,matrix[i:i+dim])))
    count+=1
P.S.: This works well for a 2x2 matrix but becomes time and memory consuming for larger ones. I would be glad if someone provided a less expensive algorithm for this.

You can generate every possibility of length 2 by using every number from 0 to 3 (there are 4 of them, 2 to the power of 2).
0 -> 00
1 -> 01
2 -> 10
3 -> 11
For the displaying part of a number as binary, bin function can be used.
Since you have 2x2 matrix, you need 2 numbers(i and j), each for a row. Then you can just convert these numbers to binary and print them.
for i in range(4):
    for j in range(4):
        row1 = bin(i)[2:].zfill(2)
        row2 = bin(j)[2:].zfill(2)
        print(row1, "\n", row2, "\n")
EDIT:
I have found zfill function which fills a string with zeros to make it fixed length.
>>> '1'.zfill(5)
'00001'
Another generic solution might be:
import re
dim1 = 2
dim2 = 2
n = dim1 * dim2
i = 0
limit = 2**n
while i < limit:
    # Split the n-bit string into rows of dim2 characters each.
    print('\n'.join(re.findall('.'*dim2, bin(i)[2:].zfill(n))), '\n')
    i += 1

You could do something like this for a 3x3 binary matrix:
for i in range(pow(2,9)):
    p = '{0:09b}'.format(i)
    print(p)
    x = []
    x.append([p[0],p[1],p[2]])
    x.append([p[3],p[4],p[5]])
    x.append([p[6],p[7],p[8]])
    # Use a separate index so the outer loop variable i is not overwritten.
    for r in range(3):
        x[r] = list(map(int, x[r]))

Getting the number of digits of nonnegative integers (Python) [duplicate]

The question asks:
<< BACKGROUND STORY:
Suppose we’re designing a point-of-sale and order-tracking system for a new burger
joint. It is a small joint and it only sells 4 options for combos: Classic Single
Combo (hamburger with one patty), Classic Double With Cheese Combo (2 patties),
and Classic Triple with Cheese Combo (3 patties), Avant-Garde Quadruple with
Guacamole Combo (4 patties). We shall encode these combos as 1, 2, 3, and 4
respectively. Each meal can be biggie sized to acquire a larger box of fries and
drink. A biggie sized combo is represented by 5, 6, 7, and 8 respectively, for the
combos 1, 2, 3, and 4 respectively. >>
Write an iterative function called order_size which takes an order and returns the number of combos in the order. For example, order_size(237) -> 3.
Whereby I should have
order_size(0) = 0
order_size(6) = 1
order_size(51) = 2
order_size(682) = 3
My code is:
def order_size(order):
    # Fill in your code here
    if order > 0:
        size = 0
        while order > 0:
            size += 1
            order = order // 10
            return size
        else:
            return 0
But I don't get the order // 10 portion. I'm guessing it's wrong but I can't think of any stuff to substitute that.

There is no need for an iterative function: you can measure the length of the number by converting it to a string (note that this gives 1 for 0, whereas the task expects order_size(0) = 0):
num = 127
order = len(str(num))
print(order) # prints 3
But if you really want to do it iteratively:
def order(num):
    res = 0
    while num > 0:
        # Integer division drops the last digit without going through floats.
        num = num // 10
        res += 1
    return res
print(order(127)) # prints 3

How about this:
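from math import log10
def order_size(order):
    if order <= 0: return 0
    # log10 is usually more accurate than log(order, 10), which can round
    # exact powers of 10 down; very large values may still be off by one.
    return int(log10(order)) + 1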
Some samples (left column order, right column order size):
0 0
5 1
10 2
15 2
20 2
100 3
893 3
10232 5

There are a couple errors in your suggested answer.
The else statement and both return statements should be indented one level less.
Your test cases indicate you are supposed to count the digits of nonnegative integers, not just positive ones (i.e. your algorithm must work on 0).
Here is my suggested alternative based on yours and the criteria of the task.
def order_size(order):
    # Fill in your code here
    if order >= 0:
        size = 0
        while order > 0:
            size += 1
            order = order // 10
        return size
    else:
        return 0
Notice that
By using an inclusive inequality in the if condition, I am letting 0 into the if block, as I would any other nonnegative number; for 0 the while loop is simply skipped and size stays 0.
By pushing the first return statement back, it executes after the while loop. Thus after the order is counted in the variable size, it is returned.
By pushing the else: back, it executes in the event that the if condition is not met (i.e. when the number passed to order_size(n) is negative).
By pushing the second return back, it is syntactically correct, and contained in the else block, as it should be.
Now that's taken care of, let me address this:
But I don't get the order // 10 portion.
In Python, // is the floor division (a.k.a. integer division) binary operator. It behaves this way in Python 2 as well as Python 3.
It effectively performs a standard division, then rounds down (towards negative infinity) to the nearest integer.
Here are some examples to help you out. Pay attention to the last one especially.
10 // 2      # Returns 5 since 10/2 = 5, rounded down is 5
2 // 2       # Returns 1 since 2/2 = 1, rounded down is 1
11 // 2      # Returns 5 since 11/2 = 5.5, rounded down is 5
4 // 10      # Returns 0 since 4/10 = 0.4, rounded down is 0
(-4) // 10   # Returns -1 since (-4)/10 = -0.4, rounded down is -1
For nonnegative numerator n, n // d can be seen as the number of times d fits into n whole.
So for a number like n = 1042, n // 10 would give you how many whole times 10 fits into 1042.
This is 104 (since 1042/10 = 104.2, and rounded down we have 104).
Notice how we've effectively knocked off a digit?
Let's have a look at your while loop.
while order > 0:
    size += 1
    order = order // 10
Every time a digit is "knocked off" order, the size counter is incremented, thus counting how many digits you can knock off before you hit your terminating step.
Termination occurs when you knock off the final (single) digit. For example, say you reduced order to 1 (from 1042), then 1 // 10 returns 0.
So once all the digits are "knocked off" and counted, your order will have a value of 0. The while loop will then terminate, and your size counter will be returned.
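To watch the digits being knocked off, here is a small illustrative sketch. The helper order_size_trace is hypothetical and not part of the assignment; it records every intermediate value of order, so the digit count is the number of steps taken.

```python
def order_size_trace(order):
    """Return the successive values of order as digits are knocked off."""
    steps = [order]
    while order > 0:
        order = order // 10  # drop the last digit
        steps.append(order)
    return steps


trace = order_size_trace(1042)
print(trace)           # [1042, 104, 10, 1, 0]
print(len(trace) - 1)  # 4, the number of digits knocked off
```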
Hope this helps!
Disclaimer: Perhaps this isn't what you want to hear, but many Universities consider copying code from the Internet and passing it off as your own to be plagiarism.
