Find the number of subsets that satisfy these two conditions? - python

def findNumber(N,A,B):
    return count
Count is the total number of subsets of the array [1,2,3,...,N] satisfying these conditions:
1. Every subset must be contiguous.
2. No subset may contain both A[i] and B[i] (order doesn't matter).
Example
N = 3, A=[2,1,3], B=[3,3,1]
All subsets = [1],[2],[3],[1,2],[2,3],[1,2,3]
Invalid subsets: [2,3], because A[0] and B[0] are both in it; [1,2,3], because it contains both A[1],B[1] and A[2],B[2].
So the count is 4.
I was able to figure out that the total number of contiguous subsets is N(N+1)/2, but I got stuck on how to satisfy condition 2.
I tried explaining it as best as I could; please ask for clarification if needed.
EDIT
def findallvalid(n,a,b):
    for w in range(1, n+1):
        for i in range(n-w+1):
            # only checks the first pair (a[0], b[0]) against the interval [i+1, i+w]
            if not (a[0] in range(i+1, i+w+1) and b[0] in range(i+1, i+w+1)):
                yield range(i+1, i+w+1)
I tried this code, but I don't know how to iterate over all values of a and b without making it very slow. It's already too slow for n > 10^2.
1<=n<=10^5
1<=len(A)<=10^6

I'm interested in how to approach this problem without generating the subsets. For example, I found that the total number of contiguous subsets is n(n+1)/2; I just want to know how to count the subsets to rule out.
That gave me an idea - indeed it is quite simple to compute the number of subsets ruled out by a single pair (A[i], B[i]). It is a little more challenging for multiple pairs, since the excluded subsets can overlap, so just subtracting a number for each pair won't work. What does work is to keep a set of the indexes of all N(N+1)/2 subsets and remove the indexes of the excluded subsets from it; at the end, the cardinality of the reduced index set is the wanted number of remaining subsets.
def findNumber(N, A, B):
    count = N*(N+1)//2
    powerset = set(range(count))  # set of enumeration of possible intervals
    for a, b in zip(A, B):
        if a > b: a, b = b, a  # let a be the lower number
        # when a and b are in a subset, they form a sub-subset of length "span"
        span = (b-a)+1
        start = 0  # index where the intervals of current length w begin
        for w in range(1, N+1):  # for all interval lengths w
            if span <= w:  # if a and b can be in an interval of length w
                first = 0 if b <= w else b-w  # index of first containment
                last = a  # index of last containment
                # remove the intervals containing a and b from the enumeration
                powerset -= set(range(start+first, start+last))
            start += N+1-w  # compute start index of next length w
    return len(powerset)  # number of remaining intervals
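For the example from the question, this returns the expected count:

print(findNumber(3, [2, 1, 3], [3, 3, 1]))  # 4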

I made some small modifications to your code. This approach is still really slow, because it iterates over the entire list, which can contain 10^5 items, and does a nested check for every subset, which pushes the total work up to around 10^10 operations - far too slow.
from collections import defaultdict

def findallvalid(N, A, B):
    a_in = defaultdict(list)
    b_in = defaultdict(list)
    for idx, a in enumerate(A):
        a_in[a].append(idx)
    for idx, b in enumerate(B):
        b_in[b].append(idx)

    def diff_elem_index(subset):
        indices = []
        for elem in subset:
            indices.extend(a_in[elem])
            indices.extend(b_in[elem])
        return len(set(indices)) == len(indices)

    for set_window in range(1, N+1):
        for start_idx in range(N - set_window + 1):
            sett = list(range(start_idx+1, start_idx + set_window + 1))
            if diff_elem_index(sett):
                yield sett
My best guess: since the code only needs to return a count, it can be solved mathematically.
The total number of contiguous subarrays of an N-sized list is N*(N+1)/2 (one more if you also count the empty subarray).
After that you need to deduct the count of subarrays that don't comply with the second condition, which can be worked out from lists A and B.
I think counting the excluded subarrays from lists A and B will be much more efficient than going through all subarrays from 1 to N.
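As a rough sketch of that idea (not code from the original answer; it assumes 1-based values as in the question): instead of counting the excluded intervals and subtracting, you can equivalently count the valid intervals directly. For each left endpoint l, an interval [l, r] stays valid exactly while r is below the smallest b of any pair (a, b) with l <= a < b, and a suffix minimum gives that bound in O(N + len(A)) time:

def count_valid(N, A, B):
    # earliest_bad[a] = smallest partner b (with a < b) appearing in any pair {a, b}
    earliest_bad = [N + 1] * (N + 2)
    for a, b in zip(A, B):
        if a > b:
            a, b = b, a
        earliest_bad[a] = min(earliest_bad[a], b)
    # limit[l] = smallest forbidden right endpoint over pairs whose lower value is >= l
    limit = [N + 1] * (N + 2)
    for l in range(N, 0, -1):
        limit[l] = min(earliest_bad[l], limit[l + 1])
    # an interval [l, r] is valid iff r < limit[l], so it contributes limit[l] - l choices of r
    return sum(limit[l] - l for l in range(1, N + 1))

print(count_valid(3, [2, 1, 3], [3, 3, 1]))  # 4, matching the example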

Related

most efficient way to iterate over a large array looking for a missing element in Python

I was trying an online test. The test asked me to write a function that, given a list of up to 100000 integers in the range 1 to 100000, would find the first missing integer.
For example, if the list is [1,4,5,2] the output should be 3.
I iterated over the list as follows:
def find_missing(num):
    for i in range(1, 100001):
        if i not in num:
            return i
The feedback I received is that the code is not efficient at handling big lists.
I am quite new and I could not find an answer; how can I iterate more efficiently?
The first improvement would be to make yours linear by using a set for the repeated membership test:
def find_missing(nums):
    s = set(nums)
    for i in range(1, 100001):
        if i not in s:
            return i
Given how C-optimized Python sorting is, you could also do something like:
def find_missing(nums):
    s = sorted(set(nums))
    return next(i for i, n in enumerate(s, 1) if i != n)
But both of these are fairly space inefficient as they create a new collection. You can avoid that with an in-place sort:
from itertools import groupby

def find_missing(nums):
    nums.sort()  # in-place
    return next(i for i, (k, _) in enumerate(groupby(nums), 1) if i != k)
For any range of numbers (with nums sorted, as above), the sum is given by Gauss's formula:
# sum of all numbers up to and including nums[-1] minus
# sum of all numbers up to but not including nums[-1]
expected = nums[-1] * (nums[-1] + 1) // 2 - nums[0] * (nums[0] - 1) // 2
If a number is missing, the actual sum will be
actual = sum(nums)
The difference is the missing number:
result = expected - actual
This computation is O(n), which is as efficient as you can get: expected is an O(1) computation, while actual has to actually add up the elements.
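Putting those pieces together, a minimal sketch (assuming the list is sorted first and exactly one number is missing from the range):

def find_missing(nums):
    nums.sort()  # in-place, as above
    expected = nums[-1] * (nums[-1] + 1) // 2 - nums[0] * (nums[0] - 1) // 2
    return expected - sum(nums)  # the difference is the missing number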
A somewhat slower but similar complexity approach would be to step along the sequence in lockstep with either a range or itertools.count:
for a, e in zip(nums, range(nums[0], len(nums) + nums[0])):
    if a != e:
        return e  # or break if not in a function
Notice the difference between a single comparison a != e, vs a linear containment check like e in nums, which has to iterate on average through half of nums to get the answer.
You can use Counter to count every occurrence in your list. The minimum number with occurrence 0 will be your output. For example:
from collections import Counter

def find_missing(your_list):
    count = Counter(your_list)
    keys = count.keys()  # every distinct element of the list
    main_list = list(range(1, 100001))  # the values from 1 to 100k
    missing_numbers = list(set(main_list) - set(keys))
    your_output = min(missing_numbers)
    return your_output

How to find number of ways that the integers 1,2,3 can add up to n?

Given a set of integers 1,2, and 3, find the number of ways that these can add up to n. (The order matters, i.e. say n is 5. 1+2+1+1 and 2+1+1+1 are two distinct solutions)
My solution involves splitting n into a list of 1s so if n = 5, A = [1,1,1,1,1]. And I will generate more sublists recursively from each list by adding adjacent numbers. So A will generate 4 more lists: [2,1,1,1], [1,2,1,1], [1,1,2,1],[1,1,1,2], and each of these lists will generate further sublists until it reaches a terminating case like [3,2] or [2,3]
Here is my proposed solution (in Python)
ways = []

def check_terminating(A, n):
    # check for terminating case
    for i in range(len(A)-1):
        if A[i] + A[i+1] <= 3:
            return False  # means we can still combine
    return True

def count_ways(n, A=[]):
    if A in ways:
        # already computed, so don't compute again
        return True
    if A not in ways:  # check for duplicates
        ways.append(A)  # global ways
    if check_terminating(A, n):
        return True  # end of the tree
    for i in range(len(A)-1):
        # for each index i,
        # combine with the next element and form a new list
        total = A[i] + A[i+1]
        print(total)
        if total <= 3:
            # form new list and compute
            newA = A[:i] + [total] + A[i+2:]
            count_ways(n, newA)  # recursive call

# main
n = 5
A = [1 for _ in range(n)]
count_ways(n, A)
print("No. of ways for n = {} is {}".format(n, len(ways)))
May I know if I'm on the right track, and if so, is there any way to make this code more efficient?
Please note that this is not a coin change problem. In coin change, the order of occurrence is not important; in my problem, 1+2+1+1 is different from 1+1+1+2, but in coin change both are the same. Please don't post coin change solutions for this question.
Edit: My code is working but I would like to know if there are better solutions. Thank you for all your help :)
The recurrence relation is F(n+3) = F(n+2) + F(n+1) + F(n) with F(0) = 1, F(-1) = F(-2) = 0. These are the tribonacci numbers (a variant of the Fibonacci numbers).
It's possible to write an easy O(n) solution:
def count_ways(n):
    a, b, c = 1, 0, 0
    for _ in xrange(n):
        a, b, c = a+b+c, a, b
    return a
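For the question's n = 5 this gives (under Python 2, where xrange exists):

print(count_ways(5))  # 13 ordered ways to write 5 as a sum of 1s, 2s and 3s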
It's harder, but possible to compute the result in relatively few arithmetic operations:
def count_ways(n):
    A = 3**(n+3)
    P = A**3 - A**2 - A - 1
    return pow(A, n+3, P) % A

for i in xrange(20):
    print i, count_ways(i)
The idea that you describe sounds right. It is easy to write a recursive function that produces the correct answer... slowly.
You can then make it faster by memoizing the answer. Just keep a dictionary of answers that you've already calculated. In your recursive function look at whether you have a precalculated answer. If so, return it. If not, calculate it, save that answer in the dictionary, then return the answer.
That version should run quickly.
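A minimal sketch of that memoized recursion (using functools.lru_cache instead of a hand-rolled dictionary; the recurrence is the same tribonacci one described above):

from functools import lru_cache

@lru_cache(maxsize=None)
def count_ways(n):
    # ordered ways to write n as a sum of 1s, 2s and 3s
    if n < 0:
        return 0
    if n == 0:
        return 1  # the empty sum
    return count_ways(n - 1) + count_ways(n - 2) + count_ways(n - 3)

print(count_ways(5))  # 13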
An O(n) method is possible:
def countways(n):
    A = [1, 1, 2]
    while len(A) <= n:
        A.append(A[-1] + A[-2] + A[-3])
    return A[n]
The idea is that we can work out how many ways there are of making a sum of n by considering each choice (1, 2, 3) for the last partition size.
e.g. to count choices for (1,1,1,1) consider:
choices for (1,1,1) followed by a 1
choices for (1,1) followed by a 2
choices for (1) followed by a 3
If you need the results (instead of just the count) you can adapt this approach as follows:
cache = {}

def countwaysb(n):
    if n < 0:
        return []
    if n == 0:
        return [[]]
    if n in cache:
        return cache[n]
    A = []
    for last in range(1, 4):
        for B in countwaysb(n - last):
            A.append(B + [last])
    cache[n] = A
    return A
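A quick cross-check of the two approaches (assuming both functions above are defined):

print(len(countwaysb(5)), countways(5))  # 13 13
print(countwaysb(4))  # the 7 compositions of 4: [1, 1, 1, 1], [2, 1, 1], ..., [1, 3]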

Efficient way to get every integer vectors summing to a given number [duplicate]

I've been working on some quick and dirty scripts for doing some of my chemistry homework, and one of them iterates through lists of a constant length where all the elements sum to a given constant. For each, I check if they meet some additional criteria and tack them on to another list.
I figured out a way to meet the sum criteria, but it looks horrendous, and I'm sure there's some type of teachable moment here:
# iterate through all 11-element lists where the elements sum to 8.
for a in range(8+1):
    for b in range(8-a+1):
        for c in range(8-a-b+1):
            for d in range(8-a-b-c+1):
                for e in range(8-a-b-c-d+1):
                    for f in range(8-a-b-c-d-e+1):
                        for g in range(8-a-b-c-d-e-f+1):
                            for h in range(8-a-b-c-d-e-f-g+1):
                                for i in range(8-a-b-c-d-e-f-g-h+1):
                                    for j in range(8-a-b-c-d-e-f-g-h-i+1):
                                        k = 8-(a+b+c+d+e+f+g+h+i+j)
                                        x = [a,b,c,d,e,f,g,h,i,j,k]
                                        # see if x works for what I want
Here's a recursive generator that yields the lists in lexicographic order. Leaving exact as True gives the requested result where every sum==limit; setting exact to False gives all lists with 0 <= sum <= limit. The recursion takes advantage of this option to produce the intermediate results.
def lists_with_sum(length, limit, exact=True):
    if length:
        for l in lists_with_sum(length - 1, limit, False):
            gap = limit - sum(l)
            for i in range(gap if exact else 0, gap + 1):
                yield l + [i]
    else:
        yield []
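A usage sketch for the question's case of 11-element lists summing to 8 (the count is the stars-and-bars value C(18, 10)):

lists = list(lists_with_sum(11, 8))
print(len(lists))  # 43758 == C(18, 10)
print(lists[0])    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8], the lexicographically first list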
Generic, recursive solution:
def get_lists_with_sum(length, my_sum):
    if my_sum == 0:
        return [[0 for _ in range(length)]]
    if not length:
        return [[]]
    elif length == 1:
        return [[my_sum]]
    else:
        lists = []
        for i in range(my_sum + 1):
            rest = my_sum - i
            sublists = get_lists_with_sum(length - 1, rest)
            for sl in sublists:
                sl.insert(0, i)
                lists.append(sl)
        return lists

print get_lists_with_sum(11, 8)

Iterate over lists with a particular sum

I would like to iterate over all lists of length n whose elements sum to 2. How can you do this efficiently? Here is a very inefficient method for n = 10. Ultimately I would like to do this for `n > 25`.
import itertools

n = 10
for L in itertools.product([-1,1], repeat=n):
    if sum(L) == 2:
        print L  # Do something with L
You can only have a sum of 2 if there are two more +1s than -1s, so for n == 24:
a_solution = [-1,]*11 + [1,]*13
Now you can just use itertools.permutations to get every permutation of this:
for L in itertools.permutations(a_solution): print L
it would probably be faster to use itertools.combinations to eliminate duplicates
import numpy

for indices in itertools.combinations(range(24), 11):
    a = numpy.ones(24)
    a[list(indices)] = -1
    print a
Note that for the sum to be 2, the list must have an even length.
One way is to stop your product recursion whenever the remaining elements can't make up the target sum.
Specifically, this method would take your itertools.product(...,repeat) apart into a recursive generator, which updates the target sum based on the value of the current list element, and checks to see if the resulting target is achievable before recursing further:
def generate_lists(target, n):
    if n <= 0:
        yield []
        return
    if target > n or target < -n:
        return
    for element in [-1, 1]:
        for lst in generate_lists(target - element, n - 1):
            yield lst + [element]
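A quick sanity check against the brute-force version from the question (210 is C(10, 6): six +1s and four -1s):

print(sum(1 for L in generate_lists(2, 10)))  # 210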

Pythonic way of checking if indefinite # of consec elements in list sum to given value

Having trouble figuring out a nice way to get this task done.
Say I have a list of triangular numbers up to 1000 -> [0, 1, 3, 6, 10, 15, ...] etc.
Given a number, I want to return the consecutive elements in that list that sum to that number.
i.e.
64 --> [15,21,28]
225 --> [105,120]
371 --> [36, 45, 55, 66, 78, 91]
if there's no consecutive numbers that add up to it, return an empty list.
882 --> [ ]
Note that the length of consecutive elements can be any number - 3,2,6 in the examples above.
The brute force way would be to iteratively check every possible consecutive run starting from each element (start at 0, look at the sum of [0,1], look at the sum of [0,1,3], etc. until the sum is greater than the target number). But that's probably O(n^2) or maybe worse. Any way to do it better?
UPDATE:
Ok, so a friend of mine figured out a solution that works in O(n) (I think) and is intuitively easy to follow. This might be similar (or the same) to Gabriel's answer, but it was just difficult for me to follow and I like that this solution is understandable even from a basic perspective. This is an interesting question, so I'll share her answer:
def findConsec(input1=7735):
    list1 = range(1, 1001)
    newlist = [reduce(lambda x, y: x+y, list1[0:i]) for i in list1]
    curr = 0
    end = 2
    num = sum(newlist[curr:end])
    while num != input1:
        if num < input1:
            num += newlist[end]
            end += 1
        elif num > input1:
            num -= newlist[curr]
            curr += 1
        if curr == end:
            return []
    if num == input1:
        return newlist[curr:end]
A 3-iteration max solution
Another solution is to start close to where the group would be in the list and walk forward from one position behind it. For any number in the triangular list vec, its value is defined by its index as:
vec[i] = sum(range(0, i+1))
Dividing the target sum by the length of the group gives the average of the group, which therefore lies within the group's range, even though it may not itself be an element of the list.
Therefore, you can set the starting point for finding a group of n numbers whose sum matches a value val to the integer part of that division; since that value may not be in the list, the starting position is the one that minimizes the difference to it.
# vec as np.ndarray -> the triangular or whatever-type series
# val as int -> sum of n elements you are looking for
# n as int -> number of elements to be summed
import numpy as np

def seq_index(vec, n, val):
    index0 = np.argmin(abs(vec-(val/n))) - n/2 - 1  # covers odd and even n values
    intsum = 0  # sum of which to keep track
    count = 0   # counter
    seq = []    # indices of vec that sum up to val
    while count <= 2:  # walking forward from the initial guess of where the group begins, or prior to it
        intsum = sum(vec[(index0+count):(index0+count+n)])
        if intsum == val:
            seq.append(range(index0+count, index0+count+n))
        count += 1
    return seq
# Example
vec = []
for i in range(0, 100):
    vec.append(sum(range(0, i)))  # build the triangular series from i = 0 (0) to i = 99 (whose sum equals 4851)
vec = np.array(vec)  # convert to numpy to make it easier to query ranges

# looking for a value that belongs to the interval 0-4851
indices = seq_index(vec, 3, 4)

# print indices
print indices[0]
print vec[indices]
print sum(vec[indices])
Returns
print indices[0] -> [1, 2, 3]
print vec[indices] -> [0 1 3]
print sum(vec[indices]) -> 4 (which we were looking for)
This seems like an algorithm question rather than a question about how to do it in Python.
Thinking backwards, I would copy the list and use it in a way similar to the Sieve of Eratosthenes: I would not consider the numbers that are greater than x. Then start from the greatest number and sum backwards. If the running total gets greater than x, subtract the greatest number (exclude it from the solution) and continue summing backwards.
This seems like the most efficient way to me, and it is actually O(n) - you never go back (or forward, in this backward algorithm), except when you subtract or remove the biggest element, which doesn't need access to the list again - just a temp variable.
To answer Dunes' question:
Yes, there is a reason - subtracting the next largest handles the no-solution case where the running sum grows too large. Going from the first element instead, hitting a no-solution would require accessing the list (or the temporary solution list) again to subtract a set of elements whose sum is greater than the next element to add. You risk increasing the complexity by accessing more elements.
To improve efficiency in the cases where an eventual solution is at the beginning of the sequence, you can search for the starting pair using binary search: once a pair of 2 elements smaller than x is found, sum the pair; if the sum is larger than x you go left, otherwise you go right. This search has logarithmic complexity in theory. In practice complexity is not what it is in theory and you can do whatever you like :)
You could pick the first three elements, sum them, and then keep subtracting the first of the three and adding the next element in the list, checking whether the sum adds up to the number you want. That would be O(n).
# vec as np.ndarray, "whatever" standing for the target sum
import numpy as np

itsum = sum(vec[0:3])  # the running sum of the current window of three elements
sequence = [range(0, 3)] if itsum == whatever else []  # index windows that add up to the target (creation)
for i in range(3, len(vec)):
    itsum -= vec[i-3]
    itsum += vec[i]
    if itsum == whatever:
        sequence.append(range(i-2, i+1))  # list of index windows that add up to the target
The solution you provide in the question isn't truly O(n) time complexity - the way you compute your triangle numbers makes the computation O(n^2). The list comprehension throws away the previous work that went into calculating the last triangle number; that is, tn[i] = tn[i-1] + i (where tn[i] is the i-th triangle number). Since you also store the triangle numbers in a list, your space complexity is not constant, but related to the size of the number you are looking for. Below is an identical algorithm, but with O(n) time complexity and O(1) space complexity (written for Python 3).
# for python 2, replace things like `highest = next(high)` with `highest = high.next()`
from itertools import count, takewhile, accumulate

def find(to_find):
    # next(low) == lowest number in total
    # next(high) == highest number not in total
    low = accumulate(count(1))   # generator of triangle numbers
    high = accumulate(count(1))
    total = highest = next(high)
    # highest = highest number in the sequence that sums to total
    # definitely can't find a solution if the highest number in the sum is greater than to_find
    while highest <= to_find:
        # found a solution
        if total == to_find:
            # keep taking numbers from the low iterator until we find the highest number in the sum
            return list(takewhile(lambda x: x <= highest, low))
        elif total < to_find:
            # add the next highest triangle number not in the sum
            highest = next(high)
            total += highest
        else:  # if total > to_find
            # subtract the lowest triangle number in the sum
            total -= next(low)
    return []
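For the examples from the question:

print(find(64))   # [15, 21, 28]
print(find(225))  # [105, 120]
print(find(882))  # []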
