Longest arithmetic progression with a hole

Longest arithmetic progression with a hole - python

The longest arithmetic progression subsequence problem is as follows. Given an array of integers A, devise an algorithm to find the longest arithmetic progression in it. In other words find a sequence i1 < i2 < … < ik, such that A[i1], A[i2], …, A[ik] form an arithmetic progression, and k is maximal. The following code solves the problem in O(n^2) time and space. (Modified from http://www.geeksforgeeks.org/length-of-the-longest-arithmatic-progression-in-a-sorted-array/ . )
#!/usr/bin/env python
import sys
def arithmetic(arr):
n = len(arr)
if (n<=2):
return n
llap = 2
L = [[0]*n for i in xrange(n)]
for i in xrange(n):
L[i][n-1] = 2
for j in xrange(n-2,0,-1):
i = j-1
k = j+1
while (i >=0 and k <= n-1):
if (arr[i] + arr[k] < 2*arr[j]):
k = k + 1
elif (arr[i] + arr[k] > 2*arr[j]):
L[i][j] = 2
i -= 1
else:
L[i][j] = L[j][k] + 1
llap = max(llap, L[i][j])
i = i - 1
k = j + 1
while (i >=0):
L[i][j] = 2
i -= 1
return llap
arr = [1,4,5,7,8,10]
print arithmetic(arr)
This outputs 4.
However I would like to be able to find arithmetic progressions where up to one value is missing. So if arr = [1,4,5,8,10,13] I would like it to report that there is a progression of length 5 with one value missing.
Can this be done efficiently?

Adapted from my answer to Longest equally-spaced subsequence. n is the length of A, and d is the range, i.e. the largest item minus the smallest item.
A = [1, 4, 5, 8, 10, 13] # in sorted order
Aset = set(A)
for d in range(1, 13):
already_seen = set()
for a in A:
if a not in already_seen:
b = a
count = 1
while b + d in Aset:
b += d
count += 1
already_seen.add(b)
# if there is a hole to jump over:
if b + 2 * d in Aset:
b += 2 * d
count += 1
while b + d in Aset:
b += d
count += 1
# don't record in already_seen here
print "found %d items in %d .. %d" % (count, a, b)
# collect here the largest 'count'
I believe that this solution is still O(n*d), simply with larger constants than looking without a hole, despite the two "while" loops inside the two nested "for" loops. Indeed, fix a value of d: then we are in the "a" loop that runs n times; but each of the inner two while loops run at most n times in total over all values of a, giving a complexity O(n+n+n) = O(n) again.
Like the original, this solution is adaptable to the case where you're not interested in the absolute best answer but only in subsequences with a relatively small step d: e.g. n might be 1'000'000, but you're only interested in subsequences of step at most 1'000. Then you can make the outer loop stop at 1'000.

Related

How can I count the number of ways to divide a string into N parts of any size?

I'm trying to count the number of ways you can divide a given string into three parts in Python.
Example: "bbbbb" can be divided into three parts 6 ways:
b|b|bbb
b|bb|bb
b|bbb|b
bb|b|bb
bb|bb|b
bbb|b|b
My first line of thinking was N choose K, where N = the string's length and K = the number of ways to split (3), but that only works for 3 and 4.
My next idea was to iterate through the string and count the number of spots the first third could be segmented and the number of spots the second third could be segmented, then multiply the two counts, but I'm having trouble implementing that, and I'm not even too sure if it'd work.
How can I count the ways to split a string into N parts?

Think of it in terms of the places of the splits as the elements you're choosing:
b ^ b ^ b ^ ... ^ b
^ is where you can split, and there are N - 1 places where you can split (N is the length of the string), and, if you want to split the string into M parts, you need to choose M - 1 split places, so it's N - 1 choose M - 1.
For you example, N = 5, M = 3. (N - 1 choose M - 1) = (4 choose 2) = 6.
An implementation:
import scipy.special
s = 'bbbbb'
n = len(s)
m = 3
res = scipy.special.comb(n - 1, m - 1, exact=True)
print(res)
Output:
6

I came up with a solution to find the number of ways to split a string in python and I think it is quite easier to understand and has a better time complexity
def slitStr(s):
i = 1
j= 2
count = 0
while i <= len(s)-2:
# a, b, c are the split strings
a = s[:i]
b = s[i:j]
c = s[j:]
#increase j till it gets to the end of the list
#each time j gets to the end of the list increment i
#set j to i + 1
if j<len(s):
j+= 1
if j==len(s):
i += 1
j = i+1
# you can increment count after each iteration
count += 1
You can customize the solution to fit your need. I hope this helps.

Hope this helps you too :
string = "ABCDE"
div = "|"
out = []
for i in range(len(string)):
temp1 = ''
if 1 < i < len(string):
temp1 += string[0:i-1] + div
for j in range(len(string) + 1):
temp2 = ""
if j > i:
temp2 += string[i-1:j-1] + div + string[j-1:]
out.append(temp1 + temp2)
print(out)
Result :
['A|B|CDE', 'A|BC|DE', 'A|BCD|E', 'AB|C|DE', 'AB|CD|E', 'ABC|D|E']

How to count the number of unique numbers in sorted array using Binary Search?

I am trying to count the number of unique numbers in a sorted array using binary search. I need to get the edge of the change from one number to the next to count. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
start = 0
end = len(x)-1
count =0
# This is the current number we are looking for
item = x[start]
while start <= end:
middle = (start + end)//2
if item == x[middle]:
start = middle+1
elif item < x[middle]:
end = middle -1
#when item item greater, change to next number
count+=1
# if the number
return count
unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit is negligent from o(n), what is my binary search missing? It's confusing when not looking for an actual item. How can I fix this?

Working code exploiting binary search (returns 3 for given example).
As discussed in comments, complexity is about O(k*log(n)) where k is number of unique items, so this approach works well when k is small compared with n, and might become worse than linear scan in case of k ~ n
def countuniquebs(A):
n = len(A)
t = A[0]
l = 1
count = 0
while l < n - 1:
r = n - 1
while l < r:
m = (r + l) // 2
if A[m] > t:
r = m
else:
l = m + 1
count += 1
if l < n:
t = A[l]
return count
print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))

I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
len = end-start
if len < 1:
return 0
if A[start] == A[end-1]:
return 1
if len < 3:
return 2
mid = start + len//2
return countUniques(A, start, mid+1) + countUniques(A, mid, end) - 1
A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A,0,len(A)))

Optimal Search Tree Using Python - Code Analysis

First of all, sorry about the naive question. But I couldn't find help elsewhere
I'm trying to create an Optimal Search Tree using Dynamic Programing in Python that receives two lists (a set of keys and a set of frequencies) and returns two answers:
1 - The smallest path cost.
2 - The generated tree for that smallest cost.
I basically need to create a tree organized by the most accessed items on top (most accessed item it's the root), and return the smallest path cost from that tree, by using the Dynamic Programming solution.
I've the following implemented code using Python:
def optimalSearchTree(keys, freq, n):
#Create an auxiliary 2D matrix to store results of subproblems
cost = [[0 for x in xrange(n)] for y in xrange(n)]
#For a single key, cost is equal to frequency of the key
#for i in xrange (0,n):
# cost[i][i] = freq[i]
# Now we need to consider chains of length 2, 3, ... .
# L is chain length.
for L in xrange (2,n):
for i in xrange(0,n-L+1):
j = i+L-1
cost[i][j] = sys.maxint
for r in xrange (i,j):
if (r > i):
c = cost[i][r-1] + sum(freq, i, j)
elif (r < j):
c = cost[r+1][j] + sum(freq, i, j)
elif (c < cost[i][j]):
cost[i][j] = c
return cost[0][n-1]
def sum(freq, i, j):
s = 0
k = i
for k in xrange (k,j):
s += freq[k]
return s
keys = [10,12,20]
freq = [34,8,50]
n=sys.getsizeof(keys)/sys.getsizeof(keys[0])
print(optimalSearchTree(keys, freq, n))
I'm trying to output the answer 1. The smallest cost for that tree should be 142 (the value stored on the Matrix Position [0][n-1], according to the Dynamic Programming solution). But unfortunately it's returning 0. I couldn't find any issues in that code. What's going wrong?

You have several very questionable statements in your code, definitely inspired by C/Java programming practices. For instance,
keys = [10,12,20]
freq = [34,8,50]
n=sys.getsizeof(keys)/sys.getsizeof(keys[0])
I think you think you calculate the number of items in the list. However, n is not 3:
sys.getsizeof(keys)/sys.getsizeof(keys[0])
3.142857142857143
What you need is this:
n = len(keys)
One more find: elif (r < j) is always True, because r is in the range between i (inclusive) and j (exclusive). The elif (c < cost[i][j]) condition is never checked. The matrix c is never updated in the loop - that's why you always end up with a 0.
Another suggestion: do not overwrite the built-in function sum(). Your namesake function calculates the sum of all items in a slice of a list:
sum(freq[i:j])

import sys
def optimalSearchTree(keys, freq):
#Create an auxiliary 2D matrix to store results of subproblems
n = len(keys)
cost = [[0 for x in range(n)] for y in range(n)]
storeRoot = [[0 for i in range(n)] for i in range(n)]
#For a single key, cost is equal to frequency of the key
for i in range (0,n):
cost[i][i] = freq[i]
# Now we need to consider chains of length 2, 3, ... .
# L is chain length.
for L in range (2,n+1):
for i in range(0,n-L+1):
j = i + L - 1
cost[i][j] = sys.maxsize
for r in range (i,j+1):
c = (cost[i][r-1] if r > i else 0)
c += (cost[r+1][j] if r < j else 0)
c += sum(freq[i:j+1])
if (c < cost[i][j]):
cost[i][j] = c
storeRoot[i][j] = r
return cost[0][n-1], storeRoot
if __name__ == "__main__" :
keys = [10,12,20]
freq = [34,8,50]
print(optimalSearchTree(keys, freq))

What will be the big-O notation for this code?

Will it be O(n) or greater?
n = length of list
a is a list of integers that is very long
final_count=0
while(n>1):
i=0
j=1
while(a[i+1]==a[i]):
i=i+1
j=j+1
if i==n-1:
break
for k in range(j):
a.pop(0)
final_count=final_count+j*(n-j)
n=n-j

The way I see it, your code would be O(n) if it wasn’t for the a.pop(0) part. Since lists are implemented as arrays in memory, removing an element at the top means that all elements in the array need to moved as well. So removing from a list is O(n). You do that in a loop over j and as far as I can tell, in the end, the sum of all js will be the same as n, so you are removing the item n times from the list, making this part quadratic (O(n²)).
You can avoid this though by not modifying your list, and just keeping track of the initial index. This not only removes the need for the pop, but also the loop over j, making the complexity calculation a bit more straight-forward:
final_count = 0
offset = 0
while n > 1:
i = 0
j = 1
while a[offset + i + 1] == a[offset + i]:
i += 1
j += 1
if i == n - 1:
break
offset += j
final_count += j * (n - j)
n = n - j
Btw. it’s not exactly clear to me why you keep track of j, since j = i + 1 at every time, so you can get rid of that, making the code a bit simpler. And at the very end j * (n - j) is exactly j * n if you first adjust n:
final_count = 0
offset = 0
while n > 1:
i = 0
while a[offset + i + 1] == a[offset + i]:
i += 1
if i == n - 1:
break
offset += i + 1
n = n - (i + 1)
final_count += (i + 1) * n
One final note: As you may notice, offset is now counting from zero to the length of your list, while n is doing the reverse, counting from the length of your list to zero. You could probably combine this, so you don’t need both.

My FB HackerCup code too slow of large inputs

I was solving the Find the min problem on facebook hackercup using python, my code works fine for sample inputs but for large inputs(10^9) it is taking hours to complete.
So, is it possible that the solution of that problem can't be computed within 6 minutes using python? Or may be my approaches are too bad?
Problem statement:
After sending smileys, John decided to play with arrays. Did you know that hackers enjoy playing with arrays? John has a zero-based index array, m, which contains n non-negative integers. However, only the first k values of the array are known to him, and he wants to figure out the rest.
John knows the following: for each index i, where k <= i < n, m[i] is the minimum non-negative integer which is not contained in the previous *k* values of m.
For example, if k = 3, n = 4 and the known values of m are [2, 3, 0], he can figure out that m[3] = 1.
John is very busy making the world more open and connected, as such, he doesn't have time to figure out the rest of the array. It is your task to help him.
Given the first k values of m, calculate the nth value of this array. (i.e. m[n - 1]).
Because the values of n and k can be very large, we use a pseudo-random number generator to calculate the first k values of m. Given positive integers a, b, c and r, the known values of m can be calculated as follows:
m[0] = a
m[i] = (b * m[i - 1] + c) % r, 0 < i < k
Input
The first line contains an integer T (T <= 20), the number of test
cases.
This is followed by T test cases, consisting of 2 lines each.
The first line of each test case contains 2 space separated integers,
n, k (1 <= k <= 10^5, k < n <= 10^9).
The second line of each test case contains 4 space separated integers
a, b, c, r (0 <= a, b, c <= 10^9, 1 <= r <= 10^9).
I tried two approaches but both failed to return results in 6 minutes, Here's my two approaches:
first:
import sys
cases=sys.stdin.readlines()
def func(line1,line2):
n,k=map(int,line1.split())
a,b,c,r =map(int,line2.split())
m=[None]*n #initialize the list
m[0]=a
for i in xrange(1,k): #set the first k values using the formula
m[i]= (b * m[i - 1] + c) % r
#print m
for j in range(0,n-k): #now set the value of m[k], m[k+1],.. upto m[n-1]
temp=set(m[j:k+j]) # create a set from the K values relative to current index
i=-1 #start at 0, lowest +ve integer
while True:
i+=1
if i not in temp: #if that +ve integer is not present in temp
m[k+j]=i
break
return m[-1]
for ind,case in enumerate(xrange(1,len(cases),2)):
ans=func(cases[case],cases[case+1])
print "Case #{0}: {1}".format(ind+1,ans)
Second:
import sys
cases=sys.stdin.readlines()
def func(line1,line2):
n,k=map(int,line1.split())
a,b,c,r =map(int,line2.split())
m=[None]*n #initialize
m[0]=a
for i in xrange(1,k): #same as above
m[i]= (b * m[i - 1] + c) % r
#instead of generating a set in each iteration , I used a
# dictionary this time.
#Now, if the count of an item is 0 then it
#means the item is not present in the previous K items
#and can be added as the min value
temp={}
for x in m[0:k]:
temp[x]=temp.get(x,0)+1
i=-1
while True:
i+=1
if i not in temp:
m[k]=i #set the value of m[k]
break
for j in range(1,n-k): #now set the values of m[k+1] to m[n-1]
i=-1
temp[m[j-1]] -= 1 #decrement it's value, as it is now out of K items
temp[m[k+j-1]]=temp.get(m[k+j-1],0)+1 # new item added to the current K-1 items
while True:
i+=1
if i not in temp or temp[i]==0: #if i not found in dict or it's val is 0
m[k+j]=i
break
return m[-1]
for ind,case in enumerate(xrange(1,len(cases),2)):
ans=func(cases[case],cases[case+1])
print "Case #{0}: {1}".format(ind+1,ans)
The last for-loop in second approach can also be written as :
for j in range(1,n-k):
i=-1
temp[m[j-1]] -= 1
if temp[m[j-1]]==0:
temp.pop(m[j-1]) #same as above but pop the key this time
temp[m[k+j-1]]=temp.get(m[k+j-1],0)+1
while True:
i+=1
if i not in temp:
m[k+j]=i
break
sample input :
5
97 39
34 37 656 97
186 75
68 16 539 186
137 49
48 17 461 137
98 59
6 30 524 98
46 18
7 11 9 46
output:
Case #1: 8
Case #2: 38
Case #3: 41
Case #4: 40
Case #5: 12
I already tried codereview, but no one replied there yet.

After at most k+1 steps, the last k+1 numbers in the array will be 0...k (in some order). Subsequently, the sequence is predictable: m[i] = m[i-k-1]. So the way to solve this problem is run your naive implementation for k+1 steps. Then you've got an array with 2k+1 elements (the first k were generated from the random sequence, and the other k+1 from iterating).
Now, the last k+1 elements are going to repeat infinitely. So you can just return the result for m[n] immediately: it's m[k + (n-k-1) % (k+1)].
Here's some code that implements it.
import collections
def initial_seq(k, a, b, c, r):
v = a
for _ in xrange(k):
yield v
v = (b * v + c) % r
def find_min(n, k, a, b, c, r):
m = [0] * (2 * k + 1)
for i, v in enumerate(initial_seq(k, a, b, c, r)):
m[i] = v
ks = range(k+1)
s = collections.Counter(m[:k])
for i in xrange(k, len(m)):
m[i] = next(j for j in ks if not s[j])
ks.remove(m[i])
s[m[i-k]] -= 1
return m[k + (n - k - 1) % (k + 1)]
print find_min(97, 39, 34, 37, 656, 97)
print find_min(186, 75, 68, 16, 539, 186)
print find_min(137, 49, 48, 17, 461, 137)
print find_min(1000000000, 100000, 48, 17, 461, 137)
The four cases run in 4 seconds on my machine, and the last case has the largest possible n.

Here is my O(k) solution, which is based on the same idea as above, but runs much faster.
import os, sys
f = open(sys.argv[1], 'r')
T = int(f.readline())
def next(ary, start):
j = start
l = len(ary)
ret = start - 1
while j < l and ary[j]:
ret = j
j += 1
return ret
for t in range(T):
n, k = map(int, f.readline().strip().split(' '))
a, b, c, r = map(int, f.readline().strip().split(' '))
m = [0] * (4 * k)
s = [0] * (k+1)
m[0] = a
if m[0] <= k:
s[m[0]] = 1
for i in xrange(1, k):
m[i] = (b * m[i-1] + c) % r
if m[i] < k+1:
s[m[i]] += 1
p = next(s, 0)
m[k] = p + 1
p = next(s, p+2)
for i in xrange(k+1, n):
if m[i-k-1] > p or s[m[i-k-1]] > 1:
m[i] = p + 1
if m[i-k-1] <= k:
s[m[i-k-1]] -= 1
s[m[i]] += 1
p = next(s, p+2)
else:
m[i] = m[i-k-1]
if p == k:
break
if p != k:
print 'Case #%d: %d' % (t+1, m[n-1])
else:
print 'Case #%d: %d' % (t+1, m[i-k + (n-i+k+k) % (k+1)])
The key point here is, m[i] will never exceeds k, and if we remember the consecutive numbers we can find in previous k numbers from 0 to p, then p will never reduce.
If number m[i-k-1] is larger than p, then it's obviously we should set m[i] to p+1, and p will increase at least 1.
If number m[i-k-1] is smaller or equal to p, then we should consider whether the same number exists in m[i-k:i], if not, m[i] should set equal to m[i-k-1], if yes, we should set m[i] to p+1 just as the "m[i-k-1]-larger-than-p" case.
Whenever p is equal to k, the loop begin, and the loop size is (k+1), so we can jump out of the calculation and print out the answer now.

I enhanced the performance through adding map.
import sys, os
import collections
def min(str1, str2):
para1 = str1.split()
para2 = str2.split()
n = int(para1[0])
k = int(para1[1])
a = int(para2[0])
b = int(para2[1])
c = int(para2[2])
r = int(para2[3])
m = [0] * (2*k+1)
m[0] = a
s = collections.Counter()
s[a] += 1
rs = {}
for i in range(k+1):
rs[i] = 1
for i in xrange(1,k):
v = (b * m[i - 1] + c) % r
m[i] = v
s[v] += 1
if v < k:
if v in rs:
rs[v] -= 1
if rs[v] == 0:
del rs[v]
for j in xrange(0,k+1):
for t in rs:
if not s[t]:
m[k+j] = t
if m[j] < k:
if m[j] in rs:
rs[m[j]] += 1
else:
rs[m[j]] = 0
rs[t] -= 1
if rs[t] == 0:
del rs[t]
s[t] = 1
break
s[m[j]] -= 1
return m[k + ((n-k-1)%(k+1))]
if __name__=='__main__':
lines = []
user_input = raw_input()
num = int(user_input)
for i in xrange(num):
input1 = raw_input()
input2 = raw_input()
print "Case #%s: %s"%(i+1, min(input1, input2))

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Longest arithmetic progression with a hole - python

Related

How can I count the number of ways to divide a string into N parts of any size?

How to count the number of unique numbers in sorted array using Binary Search?

Optimal Search Tree Using Python - Code Analysis

What will be the big-O notation for this code?

My FB HackerCup code too slow of large inputs

Categories

Resources