Generate kth combination without generating/iterating previous

Given a set of items, for example:
[ 1, 2, 3, 4, 5, 6 ]
I'd like to generate all possible combinations of a certain length with repetition. The twist is I'd like to start at a predetermined combination (a sort of offset into the list of combinations).
For example, starting with this:
[ 1, 5, 6 ]
The first (next) combination would be:
[ 1, 6, 6 ]
I've had success using itertools.combinations_with_replacement() to generate the combinations, but the project this is for will require working with a set that generates way too many combinations - creating them all first and iterating to the correct point is not possible.
I've found this example for generating kth combination which doesn't seem to be working very well for me. This answer seemed another possibility, but I can't seem to port it from C to Python.
Here's my code so far using the kth combination example:
import operator as op
from functools import reduce
items = [ 1,2,3,4,5,6 ]
# https://stackoverflow.com/a/4941932/1167783
def nCr(n, r):
    r = min(r, n-r)
    if r == 0:
        return 1
    numer = reduce(op.mul, range(n, n-r, -1))
    denom = reduce(op.mul, range(1, r+1))
    return numer // denom
# https://stackoverflow.com/a/1776884/1167783
def kthCombination(k, l, r):
    if r == 0:
        return []
    elif len(l) == r:
        return l
    else:
        i = nCr(len(l)-1, r-1)
        if k < i:
            return l[0:1] + kthCombination(k, l[1:], r-1)
        else:
            return kthCombination(k-i, l[1:], r)
# get 1st combination of 3 values from list 'items'
print(kthCombination(1, items, 3))
# returns [ 1, 2, 4 ]
Any help would be great!

If you assume that all values in the array are digits in a base-n numbering system where n is the length of the array, the k-th combination will be the equivalent of k expressed in base-n.
If you want to start with a given combination (e.g. [1,6,5]) and continue from there, simply read this starting point as a number in base-n. You can then iterate through successive combinations by incrementing.
EDIT: Further explanation:
Let's start with the array. The array contains 6 values, so we are working in base-6. We will assume the index of each element in the array is the element's base-6 value.
Values in base-6 range from 0 to 5. This may get confusing because our example uses digits, but we could do this with combinations of anything. I will put 'quote' marks around the digits we are combining.
Given a combination ['1', '6', '5'], we first need to convert this to a base-6 value. '1' becomes 0, '6' becomes 5 and '5' becomes 4. Using their positions in the starting value as their powers in base-6, we get:
(0 * 6^0) + (5 * 6^1) + (4 * 6^2) = 174 (decimal)
If we want to know the next combination, we can add 1. If we want to know 20 combinations ahead, we add 20. We can also subtract to go backwards. Let's add 1 to 174 and convert it back to base-6:
175 (decimal) = (1 * 6^0) + (5 * 6^1) + (4 * 6^2) = 451 (base-6) = ['2', '6', '5'] (combination)
For more on number bases, see http://en.wikipedia.org/wiki/Radix and http://en.wikipedia.org/wiki/Base_%28exponentiation%29
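The conversion described above can be sketched in a few lines of Python. The function names combination_to_index and index_to_combination are made up for this illustration, and the sketch assumes the items are distinct. Note that this scheme steps through every ordered sequence of items (much like itertools.product, with the first position changing fastest), so it also visits sequences such as [2, 6, 5] that combinations_with_replacement would not produce.

```python
def combination_to_index(combo, items):
    # Treat each item's index in `items` as a base-n digit; position 0 is
    # the least significant digit, as in the worked example above.
    n = len(items)
    return sum(items.index(value) * n ** position
               for position, value in enumerate(combo))


def index_to_combination(k, items, length):
    # Peel off base-n digits of k, least significant first.
    n = len(items)
    combo = []
    for _ in range(length):
        k, digit = divmod(k, n)
        combo.append(items[digit])
    return combo


items = [1, 2, 3, 4, 5, 6]
start = combination_to_index([1, 6, 5], items)          # 174
following = index_to_combination(start + 1, items, 3)   # [2, 6, 5]
print(start, following)
```

Adding 20 instead of 1 to start jumps 20 sequences ahead without generating the ones in between.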

Instead of reinventing the wheel for the 37,289,423,987,239,489,826,364,653rd time (and that's counting only human beings), you can map the numbers. itertools will return the first combination [1,1,1], but you want [1,5,6]. Simply add [0,4,5] mod 6 to each position. You can also map back and forth between numbers, objects, and modulo, of course.
This works even if the number of elements in each position is different.
You will have more fun with what you started, though.
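For the exact behaviour asked about in the question (stepping from [1, 5, 6] to [1, 6, 6]), one possible sketch is a successor function that computes the next combination directly, in the same order that itertools.combinations_with_replacement uses. The name next_combination is invented here, and the sketch assumes that the items are distinct and that the starting combination is non-decreasing with respect to their order in items.

```python
def next_combination(combo, items):
    # Work with indices into `items` rather than the items themselves.
    idx = [items.index(value) for value in combo]
    last = len(items) - 1
    # Find the rightmost position that can still be increased.
    pos = len(idx) - 1
    while pos >= 0 and idx[pos] == last:
        pos -= 1
    if pos < 0:
        return None  # `combo` was the final combination
    # Increase it and reset everything to its right to the same value,
    # which keeps the combination non-decreasing.
    new_index = idx[pos] + 1
    idx[pos:] = [new_index] * (len(idx) - pos)
    return [items[i] for i in idx]


items = [1, 2, 3, 4, 5, 6]
print(next_combination([1, 5, 6], items))  # [1, 6, 6]
```

Calling it repeatedly walks forward from any starting point without building the earlier combinations.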

Related

Sum of two squares in Python

I have written a code based on the two pointer algorithm to find the sum of two squares. My problem is that I run into a memory error when running this code for an input n=55555**2 + 66666**2. I am wondering how to correct this memory error.
def sum_of_two_squares(n):
    look=tuple(range(n))
    i=0
    j = len(look)-1
    while i < j:
        x = (look[i])**2 + (look[j])**2
        if x == n:
            return (j,i)
        elif x < n:
            i += 1
        else:
            j -= 1
    return None
n=55555**2 + 66666**2
print(sum_of_two_squares(n))
The problem I'm trying to solve using the two pointer algorithm is:
return a tuple of two positive integers whose squares add up to n, or return None if the integer n cannot be so expressed as a sum of two squares. The returned tuple must present the larger of its two numbers first. Furthermore, if some integer can be expressed as a sum of two squares in several ways, return the breakdown that maximizes the larger number. For example, the integer 85 allows two such representations 7*7 + 6*6 and 9*9 + 2*2, of which this function must therefore return (9, 2).

You're creating a tuple of size 55555^2 + 66666^2 = 7530713581
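So even if each element of the tuple took only one byte, the tuple would take up 7.01 GiB. In CPython each tuple slot is an 8-byte pointer on a 64-bit build, so the real figure is several times higher.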
You'll need to either reduce the size of the tuple, or possibly make each element take up less space by specifying the type of each element: I would suggest looking into Numpy for the latter.
Specifically for this problem:
Why use a tuple at all?
You create the variable look which is just a list of integers:
look=tuple(range(n)) # = (0, 1, 2, ..., n-1)
Then you reference it, but never modify it. So: look[i] == i and look[j] == j.
So you're looking up numbers in a list of numbers. Why look them up? Why not just use i in place of look[i] and remove look altogether?

As others have pointed out, there's no need to use tuples at all.
One reasonably efficient way of solving this problem is to generate a series of integer square values (0, 1, 4, 9, etc...) and test whether or not subtracting these values from n leaves you with a value that is a perfect square.
You can generate a series of perfect squares efficiently by adding successive odd numbers together: 0 (+1) → 1 (+3) → 4 (+5) → 9 (etc.)
There are also various tricks you can use to test whether or not a number is a perfect square (for example, see the answers to this question), but — in Python, at least — it seems that simply testing the value of int(n**0.5) is faster than iterative methods such as a binary search.
def integer_sqrt(n):
    # If n is a perfect square, return its (integer) square
    # root. Otherwise return -1
    r = int(n**0.5)
    if r * r == n:
        return r
    return -1
def sum_of_two_squares(n):
    # If n can be expressed as the sum of two squared integers,
    # return these integers as a tuple. Otherwise return <None>
    # i: iterator variable
    # x: value of i**2
    # y: value we need to add to x to obtain (i+1)**2
    i, x, y = 0, 0, 1
    # If i**2 > n / 2, then we can stop searching
    max_x = n >> 1
    while x <= max_x:
        r = integer_sqrt(n-x)
        if r >= 0:
            return (i, r)
        i, x, y = i+1, x+y, y+2
    return None
This returns a solution to sum_of_two_squares(55555**2 + 66666**2) in a fraction of a second.
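A possible variant of the same idea, offered only as a sketch: int(n**0.5) goes through floating point, which can be off by one for very large n, whereas math.isqrt (Python 3.8+) is exact. The version below also starts from 1 and puts the larger number first, which is what the problem statement in the question asks for. Because it tries the smaller number in increasing order, the first hit is the breakdown with the largest possible larger number.

```python
from math import isqrt


def sum_of_two_squares(n):
    small = 1
    # The smaller number can never exceed sqrt(n / 2).
    while 2 * small * small <= n:
        rest = n - small * small
        large = isqrt(rest)      # exact integer square root
        if large * large == rest:
            return (large, small)
        small += 1
    return None


print(sum_of_two_squares(85))  # (9, 2)
```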

You do not need the ranges at all, and certainly do not need to convert them into tuples. They take a ridiculous amount of space, but you only need their current elements, numbers i and j. Also, as the friendly commenter suggested, you can start with sqrt(n) to improve the performance further.
def sum_of_two_squares(n):
    i = 1
    j = int(n ** (1/2))
    # i <= j (not i < j) so that cases like 2 = 1*1 + 1*1 are found
    while i <= j:
        x = i * i + j * j
        if x == n:
            return j, i
        if x < n:
            i += 1
        else:
            j -= 1
Bear in mind that the problem takes a very long time to be solved. Be patient. And no, NumPy won't help. There is nothing here to vectorize.

How to further optimize calculating all the cross sums?

I had some spare time yesterday and somehow thought about calculating cross sums.
My goal is to calculate all the sums up to a given number n. Don't ask why - it's just for fun and to learn stuff.
So for n = 11 I want my result to look something like this: [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
This is my code:
def dynamicCheckSumList(upperLimit):
    dynamicChecksumList = []
    for i in range(0, 10):
        dynamicChecksumList.append(i)
    for i in range(10, upperLimit+1):
        length = getIntegerPlaces(i)
        size = 10**(length-1)
        firstNumber = i // size
        ancestor = i-(firstNumber*size)
        newChecksum = firstNumber + dynamicChecksumList[ancestor]
        dynamicChecksumList.append(newChecksum)
    return dynamicChecksumList
At first I create my empty list and then populate the numbers 0-9 with their respective trivial sums.
Then I look at all numbers above 9 until the upper limit. Get their length. I then continue with finding out the first digit of the number. After that I calculate the number without that leading digit. For example: If my i is 5432 I will get 432. Since I already saved the cross sum for 432 I can just add that cross sum to my leading digit and I'm basically done.
import math
def getIntegerPlaces(theNumber):
    if theNumber <= 999999999999997:
        return int(math.log10(theNumber)) + 1
    else:
        counter = 15
        while theNumber >= 10**counter:
            counter += 1
        return counter
The second function is something I found here in a question from someone asking how to calculate the number of digits in a given number.
Is there any way in here (I guess there will be) to speed up things?
Also appreciated would be tips on how to save on memory. Just for fun I tried to set n to 1 billion. And my memory (16GB) kind of exploded ;)

def digitSums2(n):
    n = (n + 9) // 10 * 10  # round up to a multiple of 10
    result = bytearray(range(10))
    for decade in range(1, n//10):
        r_decade = result[decade]
        for digit in range(10):
            result.append(r_decade + digit)
    return result
There are two primary differences:
bytearray uses a single byte per calculated value, which saves a lot of memory. It only allows numbers up to 255, but that is sufficient for numbers that have fewer than 26 digits.
Peeling off the last digit is much easier than peeling off the first one.
This should be about as fast as possible in python. Be careful with printing results, since it can take more time than calculation itself (especially if you do in-memory copies).
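As a usage sketch of the same last-digit technique, the variant below trims the result so that it holds exactly the sums for 0 through the requested limit, matching the n = 11 example in the question (plus a leading 0 for the number 0). The name digit_sums and the trimming step are additions made for this illustration.

```python
def digit_sums(limit):
    # Round up so that every decade containing a number <= limit is filled.
    rounded = (limit + 10) // 10 * 10
    result = bytearray(range(10))
    for decade in range(1, rounded // 10):
        base = result[decade]  # digit sum of the number without its last digit
        result.extend(base + digit for digit in range(10))
    # Drop the surplus entries beyond `limit`.
    return result[:limit + 1]


print(list(digit_sums(11)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
```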

Finding maximum sum of occurrences of one element in two attempts from a list

Best explained by example. If a python list is -
[[0,1,2,0,4],
[0,1,2,0,2],
[1,0,0,0,1],
[1,0,0,1,0]]
I want to select two sub-lists which will yield the max sum of occurrences of zeros present - where sum is to be calculated as below
SUM = No. of zeros present in the first selected sub-list + No. of zeros present in the second selected sub-list which were not present in the first selected sub-list.
In this case, answer is 5. (First or second sub-list and the last sub-list). (Note that the third sub-list is not to be selected because it has zero present in 3rd index which is same as in first/second sub-list we have to select and it will amount to sum as 4 which will not be maximum if we consider the last sub-list)
What kind of algorithm is best suited if we were to apply it on a big input? Is there a way to do this in better than O(N^2) time?

Binary operations are fairly useful for this task:
Convert each sublist to a binary number, where a 0 is turned into a 1 bit, and other numbers are turned into a 0 bit.
For example, [0,1,2,0,4] would be turned into 10010, which is 18.
Eliminate duplicate numbers.
Pair up the remaining numbers and combine each pair with a binary OR.
Find the number with the most 1 bits.
The code:
lists = [[0,1,2,0,4],
         [0,1,2,0,2],
         [1,0,0,0,1],
         [1,0,0,1,0]]
import itertools
def to_binary(lst):
    num = ''.join('1' if n == 0 else '0' for n in lst)
    return int(num, 2)
def count_ones(num):
    return bin(num).count('1')
# Step 1 & 2: Convert to binary and remove duplicates
binary_numbers = {to_binary(lst) for lst in lists}
# Step 3: Create pairs
combinations = itertools.combinations(binary_numbers, 2)
# Step 3 & 4: Compute binary OR and count 1 bits
zeros = (count_ones(a | b) for a, b in combinations)
print(max(zeros))  # output: 5

The efficiency of the naive algorithm is O(n(n-1)*m) ~ O(n^2 m) where n is the number of lists and m is the length of each list. When n and m are comparable in magnitude, this equates to O(n^3).
It might be helpful to observe that naive matrix multiplication is also O(n^3). This might lead us to the following algorithm:
Write each list with only 1's and 0's, where a 1 indicates a non-zero entry.
Arrange these lists in a matrix A.
Compute the product M = A A^T.
Find the minimum element in M; its row and column correspond to the lists which produce the maximum number of non-overlapping zeros.
Here, step 3 is the limiting step of the algorithm. Asymptotically, depending on your matrix multiplication algorithm, you can achieve a complexity down to roughly O(n^2.4).
An example Python implementation would look like:
import numpy as np
lists = [[0,1,2,0,4],
         [0,1,2,0,2],
         [1,0,0,0,1],
         [1,0,0,1,0]]
filtered = list(set(tuple(1 if e else 0 for e in sub) for sub in lists))
A = np.array(filtered)
D = np.einsum('ik,jk->ij', A, A)
indices = np.unravel_index(np.argmin(D), D.shape)
# Prints the pair's indices in `filtered` (set order is arbitrary)
# followed by the zero count, which is 5 for this input
print(f'{indices}: {len(lists[0]) - D[indices]}')
Note that this algorithm on its own has the fundamental inefficiency that it calculates both the lower-triangular and upper-triangular halves of the dot product matrix. However, the numpy speed-up will probably more than offset this compared with the combinations approach. See the timing results below:
import itertools
import random
import numpy as np
def numpy_approach(lists):
    filtered = list(set(tuple(1 if e else 0 for e in sub) for sub in lists))
    A = np.array(filtered, dtype=bool).astype(int)
    D = np.einsum('ik,jk->ij', A, A)
    return len(lists[0]) - D.min()
def itertools_approach(lists):
    binary_numbers = {int(''.join('1' if n == 0 else '0' for n in lst), 2)
                      for lst in lists}
    combinations = itertools.combinations(binary_numbers, 2)
    zeros = (bin(a | b).count('1') for a, b in combinations)
    return max(zeros)
from time import time
N = 1000
lists = [[random.randint(0, 5) for _ in range(10)] for _ in range(100)]
for name, function in {
    'numpy approach': numpy_approach,
    'itertools approach': itertools_approach
}.items():
    start = time()
    for _ in range(N):
        function(lists)
    print(f'{name}: {time() - start}')
# numpy approach: 0.2698099613189697
# itertools approach: 0.9693171977996826

The algorithm should look something like this (with Haskell code as an example, so as not to make the process trivial for you in Python):
Turn each sublist into "Is zero" or "Isn't zero"
binList = map (map (\x -> if x==0 then 1 else 0)) bigList
Enumerate the list so you can keep indices
enumList = zip [0..] binList
Compare each sublist with its successive sublists
myCompare = concat . go
  where
    go [] = []
    go ((ix, xs):xss) = [((ix, iy), zipWith (.|.) xs ys) | (iy, ys) <- xss] : go xss
Calculate your maxes
best = maximumBy (compare `on` (sum . snd)) $ myCompare enumList
Pull out the indices
result = fst best

Beautiful sequence

A sequence of integers is beautiful if each element of this sequence is divisible by 4. You are given a sequence a1, a2, ..., an. In one step, you may choose any two elements of this sequence, remove them from the sequence and append their sum to the sequence. Compute the minimum number of steps necessary to make the given sequence beautiful, or print -1 if this is not possible.
for i in range(int(input())):
    n=int(input())
    arr=list(map(int,input().split()))
    if((sum(arr))%4)!=0:
        print(-1)
        continue
    else:
        counter=[]
        for i in range(n):
            if arr[i]%4!=0:
                counter.append(arr[i])
            else:
                continue
        x=sum(counter)
        while(x%4==0):
            x=x//4
        print(x)
My approach: if the sum of the array is not divisible by 4, the array cannot be made beautiful. Otherwise, I collect the elements that are not divisible by 4 into a list, take the sum of that list, and keep dividing it by 4 until the quotient is no longer divisible by 4. What am I doing wrong here?
Edit: I have a working script which works well:
for i in range(int(input())):
    n=int(input())
    arr=list(map(int,input().split()))
    count1=0
    count2=0
    count3=0
    summ=0
    for i in range(n):
        x=arr[i]%4
        summ+=x
        if x==1:
            count1+=1
        if x==2:
            count2+=1
        if x==3:
            count3+=1
    if (summ%4)!=0:
        print(-1)
        continue
    else:
        if count2==0 and count1!=0 and count3==0:
            tt=count1//4
            print(3*tt)
        if count2==0 and count1==0 and count3!=0:
            tt=count3//4
            print(3*tt)
        if count2%2==0 and count1==count3:
            print(count2//2+count1)
        flag1=min(count1,count3)
        flag2=abs(count1-count3)
        if count2%2==0 and count1!=count3:
            flag3=flag2//4
            flag4=flag3*3
            print(count2//2+ flag1+ flag4)
        if count2%2!=0 and count1!=count3:
            flag3=flag2-2
            flag4=flag3//4
            flag5=flag4*3
            print(((count2-1)//2)+flag1+flag5+2)

First some observations:
For the sake of 4-divisibility, we can replace all numbers by their division-by-4 remainder, so we only have to cope with values 0, 1, 2 and 3.
The ordering doesn't matter, counting the zeroes, ones, twos and threes is enough.
There are pairs immediately giving a sum divisible by 4: (1, 3) and (2, 2). Each existence of such a pair needs one step.
There are triples (1, 1, 2) and (3, 3, 2) needing two steps.
There are quadruples (1, 1, 1, 1) and (3, 3, 3, 3) needing three steps.
Algorithm:
Count the remainder-0 (can be omitted), remainder-1, remainder-2 and remainder-3 numbers.
If the total sum (from the counts) isn't divisible by 4, there's no solution.
For all the N-tuples described above, find how often they fit into the counts; add the resulting number of steps, subtract the numbers consumed from the counts.
Finally, the remainder-1, remainder-2 and remainder-3 counts should be zero.

Here is an O(N) implementation going pretty much in the direction suggested by Ralf Kleberhoff:
from collections import Counter
def beautify(seq):
    # only mod4 is interesting, count 1s, 2s, and 3s
    c = Counter(x % 4 for x in seq)
    c1, c2, c3 = c.get(1, 0), c.get(2, 0), c.get(3, 0)
    steps22, twos = divmod(c2, 2)  # you have either 0 or 1 2s left
    steps13, ones_or_threes = min(c1, c3), abs(c1 - c3)
    if not twos and not ones_or_threes % 4:
        # 3 steps for every quadruple of 1s or 3s
        return steps22 + steps13 + 3 * (ones_or_threes // 4)
    if twos and ones_or_threes % 4 == 2:
        # 2 extra steps to join the remaining 2 1s or 3s with the remaining 2
        return steps22 + steps13 + 3 * (ones_or_threes // 4) + 2
    return -1

I'm not entirely sure what your issue is, but perhaps you could change your approach to the problem. Your logic seems fine, but it seems that you're trying to do everything in one go; this problem would be much easier if you broke it down into pieces. It looks like it would fit a divide and conquer / recursive approach quite nicely. I also took the liberty of solving this problem myself, as it seems like a fun question to attempt.
Suggestions below:
First thing you could do is write a function that finds two numbers that has a sum divisible by k, and return them:
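def two_sum(numbers, k):
    n = len(numbers)
    for i in range(0, n):
        # skip numbers already divisible by k; pairing two of
        # them would only waste a step
        if numbers[i] % k == 0:
            continue
        for j in range(i+1, n):
            if (numbers[i] + numbers[j]) % k == 0:
                return numbers[i], numbers[j]
    return None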
Note that the above function is O(n^2); this could be made more efficient.
Secondly, you could write a recursive function that uses the above function, and has a base case where it stops recursing when all the numbers in the list are divisible by k, therefore the list has become "beautiful". Here is one way of doing this:
def rec_helper(numbers, k, count):
    if all(x % k == 0 for x in numbers):
        return count
    # probably safer to check if two_sum() is not None here
    first, second = two_sum(numbers, k)
    numbers.remove(first)
    numbers.remove(second)
    numbers.append(first + second)
    return rec_helper(numbers, k, count + 1)
Procedure of above code
Base case: if all the items in the list are currently divisible by k, return the current accumulated count.
Otherwise, obtain a pair of integers whose sum is divisible by k from two_sum()
remove() these two numbers from the list, and append() their sum to the end of the list.
Finally, call rec_helper() again with the modified list and with count incremented by one (count + 1). count here is the number of steps taken so far.
Lastly, you can now write a main calling function:
def beautiful_array(numbers, k):
    if sum(numbers) % k != 0:
        return -1
    return rec_helper(numbers, k, 0)
Which first checks that the sum() of the numbers in the list is divisible by k, before proceeding to calling rec_helper(). If it doesn't pass this test, the function simply returns -1, and the list cannot be made "beautiful".
Behavior of above code
>>> beautiful_array([1, 2, 3, 1, 2, 3, 8], 4)
3
>>> beautiful_array([1, 3, 2, 2, 4, 8], 4)
2
>>> beautiful_array([1, 5, 2, 2, 4, 8], 4)
-1
Note: The above code examples are just suggestions; you can follow or use them however you want to. They also don't handle the input(), since I believe the main issue in your code is the approach. I didn't want to create a whole new solution that handles your input as well. Please comment below if there is something wrong with the above code, or if you don't understand anything.

Number of multiples less than the max number

For the following problem on SingPath:
Given an input of a list of numbers and a high number,
return the number of multiples of each of
those numbers that are less than the maximum number.
For this case the list will contain a maximum of 3 numbers
that are all relatively prime to each
other.
Here is my code:
def countMultiples(l, max_num):
    counting_list = []
    for i in l:
        for j in range(1, max_num):
            if (i * j < max_num) and (i * j) not in counting_list:
                counting_list.append(i * j)
    return len(counting_list)
Although my algorithm works okay, it gets stuck when the maximum number is way too big
>>> countMultiples([3],30)
9 #WORKS GOOD
>>> countMultiples([3,5],100)
46 #WORKS GOOD
>>> countMultiples([13,25],100250)
Line 5: TimeLimitError: Program exceeded run time limit.
How to optimize this code?

3 and 5 have some same multiples, like 15.
You should remove those multiples, and you will get the right answer
Also you should check the inclusion exclusion principle https://en.wikipedia.org/wiki/Inclusion-exclusion_principle#Counting_integers
EDIT:
The problem can be solved in constant time. As previously linked, the solution is in the inclusion - exclusion principle.
Let's say you want the number of multiples of 3 that are less than 100. Since "less than 100" means "at most 99", this is floor(99/3) = 33. The same applies for 5: floor(99/5) = 19.
Now to get the multiples of 3 or 5 that are less than 100, you add the two counts and subtract the numbers that are multiples of both, because they were counted twice. In this case, those are the multiples of 15.
So the answer for multiples of 3 and 5 that are less than 100 is floor(99/3) + floor(99/5) - floor(99/15) = 33 + 19 - 6 = 46.
If you have more than 2 numbers, it gets a bit more complicated, but the same approach applies, for more check https://en.wikipedia.org/wiki/Inclusion-exclusion_principle#Counting_integers
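A minimal sketch of the inclusion-exclusion count for any number of divisors, assuming (as the problem statement says) that they are pairwise relatively prime, so the least common multiple of any subset is simply its product. The name count_multiples is chosen for this illustration, and math.prod requires Python 3.8 or later.

```python
from itertools import combinations
from math import prod


def count_multiples(divisors, max_num):
    limit = max_num - 1  # "less than max_num" means "at most max_num - 1"
    total = 0
    for size in range(1, len(divisors) + 1):
        # Add subsets of odd size, subtract subsets of even size.
        sign = 1 if size % 2 else -1
        for subset in combinations(divisors, size):
            total += sign * (limit // prod(subset))
    return total


print(count_multiples([3], 30))      # 9
print(count_multiples([3, 5], 100))  # 46
```

The work depends only on the number of divisors, not on max_num, so an input like ([13, 25], 100250) is handled immediately.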
EDIT2:
Also, the loop variant can be sped up.
Your current algorithm appends the multiples to a list and searches that list every time, which is very slow.
You should switch the inner and outer for loops. That way, for each number you check whether any of the divisors divides it.
Then just add a boolean variable which tells you whether any of your divisors divides the number, and count the times the variable is true.
So it would look like this:
def countMultiples(l, max_num):
    nums = 0
    for j in range(1, max_num):
        isMultiple = False
        for i in l:
            if (j % i == 0):
                isMultiple = True
        if (isMultiple == True):
            nums += 1
    return nums
print(countMultiples([13,25],100250))

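If the count is all you need, you'd be better off keeping a tally instead of measuring the length of a list at the end. The multiples already seen still have to be recorded so that common multiples such as 15 are not counted twice; a set keeps that membership test fast.
def countMultiples(l, max_num):
    count = 0
    seen = set()
    for i in l:
        for j in range(1, max_num):
            if (i * j < max_num) and (i * j) not in seen:
                seen.add(i * j)
                count += 1
    return count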
