The problem:
There is a list s of unique positive integers in monotonically increasing order, for example s = [1,2,3,4,5,6,7,8,9,10].
The length of the list is n. The goal is to find the total number of ordered pairs (L, W), where L and W are in s, that satisfy L*W <= a.
The solution seems straightforward, yet I cannot figure out what I did wrong. This was an online assessment problem and I have already failed it, so I would just like to understand where my approach went wrong.
The approach is straightforward: for each possible L, find the upper bound of W in s using binary search. My code in Python:
def configurationCount(n, s, a):
    # n is the length of s.
    num_ways = 0
    # Consider every L.
    for i in range(n):
        L = s[i]
        goal = a // L
        # Binary search for this upper bound in s.
        i = 0
        j = len(s) - 1
        mid = (i + j) // 2
        exact_match = False
        valid = False
        cursor = -1
        while i <= j:
            if goal > s[mid]:
                i = mid + 1
                cursor = mid
                mid = (i + j) // 2
            elif goal < s[mid]:
                j = mid - 1
                cursor = mid
                mid = (i + j) // 2
            else:
                exact_match = True
                cursor = mid
                break
        if cursor in range(n) and L * s[cursor] <= a:
            valid = True
        if cursor in range(n) and valid:
            num_ways += (cursor + 1)
    return num_ways
This code gave the correct output for only one test case and failed the other 10. In one case where I could see that my output was wrong, the correct answer was more than 2000 while mine was about 1200, so I probably failed to count some pairs. Yet this approach seems so standard; what did I do wrong?
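For comparison, here is a minimal sketch of the same counting idea built on Python's bisect module rather than a hand-rolled binary search (my code, not the poster's). Since all values are positive integers, L*W <= a exactly when W <= a // L, and bisect_right reports how many elements of the sorted list s satisfy that bound:
from bisect import bisect_right

def configuration_count(s, a):
    # For each L, count the W in s with W <= a // L.
    return sum(bisect_right(s, a // L) for L in s)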
I have the following input:
a tolerance level T
Number of numbers N
N numbers
The task is to find the longest contiguous run within those N numbers such that all of them are within the tolerance level. More precisely, given the left and right bounds l and r of a subarray, for any two distinct elements a1 and a2 between the two bounds it must hold that |a1 - a2| <= T. How can I do this in an efficient way? My approach is:
def getLength(T, N, numbers):
    max_length = 1
    for i in range(0, N-1):
        start = numbers[i]
        numlist = [start]
        for j in range(i+1, N):
            end = numbers[j]
            numlist.append(end)
            if (max(numlist) - min(numlist)) > T:
                break
            if (j-i+1) > max_length:
                max_length = j-i+1
    return max_length
EDIT: To be clear, the code works as expected; it is just not efficient enough. I would like to do it more efficiently.
First of all, I'm not sure that your code does what you describe in your question. Secondly, it takes (much) less than a second to process 12,000 random numbers.
Regardless, it can be sped up by not calling min() and max() on the numlist every time a new element is appended to it. Instead you can just update the current minimum and maximum variables with a couple of if statements.
Here's code showing that being done, along with a simple framework I wrote for timing its performance:
def getLength(T, N, numbers):
    max_length = 1
    for i in range(N-1):
        start = numbers[i]
        numlist = [start]
        min_numlist = max_numlist = start  # Added variables.
        for j in range(i+1, N):
            end = numbers[j]
            numlist.append(end)
            # Inefficient - replaced.
            # if (max(numlist) - min(numlist)) > T:
            #     break
            # Update extremities.
            if end > max_numlist:
                max_numlist = end
            if end < min_numlist:
                min_numlist = end
            if max_numlist - min_numlist > T:
                break
            if j-i+1 > max_length:
                max_length = j-i+1
    return max_length
if __name__ == '__main__':
    import random
    import time

    random.seed(42)  # Use a hardcoded seed to get the same numbers on each run.
    T = 100
    N = 12000
    numbers = [random.randrange(1000) for _ in range(N)]
    starttime = time.time()
    max_length = getLength(T, N, numbers)
    stoptime = time.time()
    print('max length: {}'.format(max_length))
    print('processing {:,d} elements took {:.5f} secs'.format(N, stoptime-starttime))
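The above is still O(N²) in the worst case. If more speed is needed, a linear-time sliding window is possible: keep two monotonic deques of indices, one whose front is always the window maximum and one whose front is the window minimum, and advance the left edge whenever the spread exceeds T. Here is a sketch of that idea (my addition, not part of the answer above, using the same argument convention):
from collections import deque

def getLengthLinear(T, N, numbers):
    # Front of maxd indexes the window maximum; front of mind the window minimum.
    maxd, mind = deque(), deque()
    max_length = 0
    left = 0
    for right in range(N):
        x = numbers[right]
        while maxd and numbers[maxd[-1]] <= x:
            maxd.pop()
        maxd.append(right)
        while mind and numbers[mind[-1]] >= x:
            mind.pop()
        mind.append(right)
        # Shrink the window from the left until the spread is within tolerance.
        while numbers[maxd[0]] - numbers[mind[0]] > T:
            if maxd[0] == left:
                maxd.popleft()
            if mind[0] == left:
                mind.popleft()
            left += 1
        max_length = max(max_length, right - left + 1)
    return max_length
Each index enters and leaves each deque at most once, so the whole scan is O(N).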
I am trying to answer a question on an online judge in Python, but I am exceeding both the time limit and memory limit. The question is pretty much asking for the number of all paths from a start node to an end node. Full question specifications can be seen here.
This is my code:
import sys

lines = sys.stdin.read().strip().split('\n')
n = int(lines[0])
dict1 = {}
for i in xrange(1, n+1):
    dict1[i] = []
for i in xrange(1, len(lines) - 1):
    numbers = map(int, lines[i].split())
    num1 = numbers[0]
    num2 = numbers[1]
    dict1[num2].append(num1)

def pathfinder(start, graph, count):
    new = []
    if start == []:
        return count
    for i in start:
        numList = graph[i]
        for j in numList:
            if j == 1:
                count += 1
            else:
                new.append(j)
    return pathfinder(new, graph, count)

print pathfinder([n], dict1, 0)
What the code does is start at the end node and work its way up to the top by exploring all neighboring nodes. I made essentially a breadth-first-search algorithm, but it's taking up too much space and time. How can I improve this code to make it more efficient? Is my approach wrong, and how should I fix it?
Since the graph is acyclic, there is a topological ordering, which we can immediately see to be 1, 2, ..., n. So we can use dynamic programming the same way it is used to solve the longest path problem. In a list paths, the element paths[i] stores how many paths there are from 1 to i. The update is simple: for each edge (i, j), with i taken in topological order, we do paths[j] += paths[i].
from collections import defaultdict

graph = defaultdict(list)
n = int(input())
while True:
    tokens = input().split()
    a, b = int(tokens[0]), int(tokens[1])
    if a == 0:
        break
    graph[a].append(b)

paths = [0] * (n+1)
paths[1] = 1
for i in range(1, n+1):
    for j in graph[i]:
        paths[j] += paths[i]
print(paths[n])
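For example, saving the above as paths.py (name mine) and feeding it a small hypothetical graph with two routes from node 1 to node 4 (edges 1→2, 1→3, 2→4, 3→4, terminated by the usual "0 0" line):
$ echo "4
1 2
1 3
2 4
3 4
0 0" | python3 paths.py
2
The two counted paths are 1→2→4 and 1→3→4.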
Note that what you are implementing is not actually BFS, since you don't mark which vertices you've visited, which makes your start list grow out of proportion.
Try it with the graph
for i in range(1, n+1):
    dict1[i] = list(range(i-1, 0, -1))
If you print the size of start, you can see that the maximum value it reaches for a given n grows as binomial(n, floor(n/2)), which is ~2^n/sqrt(n). Note also that BFS is not what you want, since it is not possible to count the number of paths that way.
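A quick sanity check of that growth rate (my snippet; math.comb requires Python 3.8+):
from math import comb, sqrt

for n in (10, 20, 30):
    print(n, comb(n, n // 2), round(2**n / sqrt(n)))
The two computed columns track each other up to the constant factor sqrt(2/pi) ≈ 0.8.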
import sys
from collections import defaultdict

def build_matrix(filename, x):
    # A[i] stores the number of paths from node x to node i.
    # O(n) to build parents_of_node.
    parents_of_node = defaultdict(list)
    with open(filename) as infile:
        num_nodes = int(infile.readline())
        A = [0] * (num_nodes + 1)  # A[0] is a dummy variable. Not used.
        for line in infile:
            if line.strip() == "0 0":
                break
            u, v = map(int, line.strip().split())
            parents_of_node[v].append(u)
            # Initialize all direct descendants of x to 1.
            if u == x:
                A[v] = 1
    # Number of paths from x to i = sum(number of paths from x to each parent of i).
    for i in xrange(1, num_nodes + 1):  # O(n)
        A[i] += sum(A[p] for p in parents_of_node[i])  # O(max fan-in of graph), assuming O(1) for accessing dict.
    # Total time complexity to build A is O(n * (max fan-in of graph)).
    return A

def main():
    filename = sys.argv[1]
    x = 1  # Find number of paths from x
    y = 4  # to y.
    A = build_matrix(filename, x)
    print(A[y])

if __name__ == '__main__':
    main()
What you are doing is a DFS (not a BFS) in that code...
Here's a link to a good solution...
EDITED:
Use this approach instead...
http://www.geeksforgeeks.org/find-paths-given-source-destination/
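For reference, the linked approach is a recursive DFS that enumerates simple paths from a source to a destination. A minimal sketch of the same idea (my adaptation, not code from the link), counting the paths instead of printing them:
def count_paths(graph, u, dest, visited=None):
    # graph: dict mapping each node to a list of its successors.
    if visited is None:
        visited = set()
    if u == dest:
        return 1
    visited.add(u)
    total = 0
    for v in graph.get(u, []):
        if v not in visited:
            total += count_paths(graph, v, dest, visited)
    visited.remove(u)
    return total
Note that this explores every path individually, so it is still exponential on dense DAGs; the dynamic-programming answer above remains the efficient way to merely count them.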
After analyzing the fastest subset sum algorithm, which runs in 2^(n/2) time, I noticed a slight optimization that can be done. I'm not sure if it really counts as an optimization, and if it does, I'm wondering if it can be improved by recursion.
Basically, starting from the original algorithm: http://en.wikipedia.org/wiki/Subset_sum_problem (see the section titled "Exponential time algorithm")
it takes the list and splits it into two
then it generates the sorted power sets of both in 2^(n/2) time
then it does a linear search in both lists to see if one value from each list sums to x, using a clever trick
In my version with the optimization:
it takes the list and removes the last element, last
then it splits the remaining list in two
then it generates the sorted power sets of both in 2^((n-1)/2) time
then it does a linear search in both lists to see if one value from each list sums to x or x - last (at the same time, with the same running time), using a clever trick
If it finds either, then I know it worked. I tried using Python's time functions to test with lists of size 22, and my version apparently runs about twice as fast.
After running the below code, it shows
0.050999879837 <- the original algorithm
0.0250000953674 <- my algorithm
My logic for the recursion part is: well, if it works for a size-n list in 2^((n-1)/2) time, can we not repeat this again and again?
Does any of this make sense, or am I totally wrong?
Thanks
I created this Python code:
from math import log, ceil, floor
import helper  # my own code
from random import randint, uniform
import time

# Gets a list of unique random floats.
# s = how many random numbers
# l = smallest a float can be
# h = biggest a float can be
def getRandomList(s, l, h):
    lst = []
    while len(lst) != s:
        r = uniform(l, h)
        if not r in lst:
            lst.append(r)
    return lst

# This just generates the two sorted power-set lists that the 2^(n/2) algorithm makes.
# This is just a lazy way of doing it and its running time is far worse, but since
# it can be done in 2^(n/2) time, I just pretend it has that running time.
def getSortedPowerSets(lst):
    n = len(lst)
    l1 = lst[:n/2]
    l2 = lst[n/2:]
    xs = range(2**(n/2))
    ys1 = helper.getNums(l1, xs)
    ys2 = helper.getNums(l2, xs)
    return ys1, ys2
# This just checks, using the regular 2^(n/2) algorithm, whether two values
# sum to the specified value.
def checkListRegular(lst, x):
    lst1, lst2 = getSortedPowerSets(lst)
    left = 0
    right = len(lst2) - 1
    while left < len(lst1) and right >= 0:
        sum = lst1[left] + lst2[right]
        if sum < x:
            left += 1
        elif sum > x:
            right -= 1
        else:
            return True
    return False
# This is my improved version of the above.
def checkListSmaller(lst, x):
    last = lst.pop()
    x1, x2 = x, x - last
    return checkhelper(lst, x1, x2)

# This is the same as the function 'checkListRegular', but it checks 2 values
# at the same time.
def checkhelper(lst, x1, x2):
    lst1, lst2 = getSortedPowerSets(lst)
    left = [0, 0]
    right = [len(lst2)-1, len(lst2)-1]
    while 1:
        check = 0
        if left[0] < len(lst1) and right[0] >= 0:
            check += 1
            sum = lst1[left[0]] + lst2[right[0]]
            if sum < x1:
                left[0] += 1
            elif sum > x1:
                right[0] -= 1
            else:
                return True
        if left[1] < len(lst1) and right[1] >= 0:
            check += 1
            sum = lst1[left[1]] + lst2[right[1]]
            if sum < x2:
                left[1] += 1
            elif sum > x2:
                right[1] -= 1
            else:
                return True
        if check == 0:
            return False
n = 22
lst = getRandomList(n, 1, 3000)
startTime = time.time()
print checkListRegular(lst, -50) # -50 so it does worst case scenario
startTime2 = time.time()
print checkListSmaller(lst, -50) # -50 so it does worst case scenario
startTime3 = time.time()
print (startTime2 - startTime)
print (startTime3 - startTime2)
This is the helper library which I just use to generate the powerset list.
def dec_to_bin(x):
    return int(bin(x)[2:])

def getNums(lst, xs):
    sums = []
    n = len(lst)
    for i in xs:
        bin = str(dec_to_bin(i))
        bin = (n - len(bin)) * "0" + bin
        chosen_items = getList(bin, lst)
        sums.append(sum(chosen_items))
    sums.sort()
    return sums

def getList(binary, lst):
    s = []
    for i in range(len(binary)):
        if binary[i] == "1":
            s.append(float(lst[i]))
    return s
then it generates the sorted power sets of both in 2^((n-1)/2) time
OK, since now the list has one less element. However, this is not a big deal; it's just a constant-factor improvement of 2^(1/2), since 2^((n-1)/2) = 2^(n/2) / 2^(1/2)...
then it does a linear search in both lists to see if one value from each list sums to x or x - last (at the same time, with the same running time), using a clever trick
... and this improvement will go away, because now you do twice as many operations to check for both the x and x - last sums instead of only for x.
can we not repeat this again and again?
No, you can't, for the same reason you couldn't split the original algorithm again and again. The trick only works once, because as soon as you start looking for values in more than two lists, you can't use the sorting trick anymore.
I wrote code to arrange the numbers 1 to N, taken from user input, so that the sum of every pair of adjacent numbers is prime. Up to an input of 10 the code works fine; if I go beyond that, the system hangs. Please let me know how to optimize it.
Example input: 8
Answer should be: (1, 2, 3, 4, 7, 6, 5, 8)
Code as follows:
import itertools

x = raw_input("please enter a number")
range_x = range(int(x)+1)
del range_x[0]
result = list(itertools.permutations(range_x))

def prime(x):
    for i in xrange(1, x, 2):
        if i == 1:
            i = i + 1
        if x % i == 0 and i < x:
            return False
    else:
        return True

def is_prime(a):
    for i in xrange(len(a)):
        print a
        if i < len(a)-1:
            if prime(a[i] + a[i+1]):
                pass
            else:
                return False
        else:
            return True

for i in xrange(len(result)):
    if i < len(result)-1:
        if is_prime(result[i]):
            print 'result is:'
            print result[i]
            break
    else:
        print 'result is'
        print result[i-1]
For posterity ;-), here's one more based on finding a Hamiltonian path. It's Python 3 code. As written, it stops upon finding the first path, but can easily be changed to generate all paths. On my box, it finds a solution for all n in 1 through 900 inclusive in about one minute total. For n somewhat larger than 900, it exceeds the maximum recursion depth.
The prime generator (psieve()) is vast overkill for this particular problem, but I had it handy and didn't feel like writing another ;-)
The path finder (ham()) is a recursive backtracking search, using what's often (but not always) a very effective ordering heuristic: of all the vertices adjacent to the last vertex in the path so far, look first at those with the fewest remaining exits. For example, this is "the usual" heuristic applied to solving Knight's Tour problems. In that context, it often finds a tour with no backtracking needed at all. Your problem appears to be a little tougher than that.
def psieve():
    import itertools
    yield from (2, 3, 5, 7)
    D = {}
    ps = psieve()
    next(ps)
    p = next(ps)
    assert p == 3
    psq = p*p
    for i in itertools.count(9, 2):
        if i in D:      # composite
            step = D.pop(i)
        elif i < psq:   # prime
            yield i
            continue
        else:           # composite, = p*p
            assert i == psq
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step
def build_graph(n):
    primes = set()
    for p in psieve():
        if p > 2*n:
            break
        else:
            primes.add(p)
    np1 = n+1
    adj = [set() for i in range(np1)]
    for i in range(1, np1):
        for j in range(i+1, np1):
            if i+j in primes:
                adj[i].add(j)
                adj[j].add(i)
    return set(range(1, np1)), adj
def ham(nodes, adj):
    class EarlyExit(Exception):
        pass

    def inner(index):
        if index == n:
            raise EarlyExit
        avail = adj[result[index-1]] if index else nodes
        for i in sorted(avail, key=lambda j: len(adj[j])):
            # Remove vertex i from the graph. If this isolates
            # more than 1 vertex, no path is possible.
            result[index] = i
            nodes.remove(i)
            nisolated = 0
            for j in adj[i]:
                adj[j].remove(i)
                if not adj[j]:
                    nisolated += 1
                    if nisolated > 1:
                        break
            if nisolated < 2:
                inner(index + 1)
            nodes.add(i)
            for j in adj[i]:
                adj[j].add(i)

    n = len(nodes)
    result = [None] * n
    try:
        inner(0)
    except EarlyExit:
        return result

def solve(n):
    nodes, adj = build_graph(n)
    return ham(nodes, adj)
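Assuming the three functions above live in one module, a quick way to try it (my usage example):
if __name__ == '__main__':
    print(solve(8))  # prints one valid ordering of 1..8; the exact order may vary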
This answer is based on @Tim Peters' suggestion about Hamiltonian paths.
There are many possible solutions. To avoid excessive memory consumption for intermediate solutions, a random path can be generated. It also makes it easy to utilize multiple CPUs (each CPU generates its own paths in parallel).
import multiprocessing as mp
import sys

def main():
    number = int(sys.argv[1])
    # directed graph, vertices: 1..number (including ends)
    # there is an edge between i and j if (i+j) is prime
    vertices = range(1, number+1)
    G = {}  # vertex -> adjacent vertices
    is_prime = sieve_of_eratosthenes(2*number+1)
    for i in vertices:
        G[i] = []
        for j in vertices:
            if is_prime[i + j]:
                G[i].append(j)  # there is an edge from i to j in the graph

    # utilize multiple cpus
    q = mp.Queue()
    for _ in range(mp.cpu_count()):
        p = mp.Process(target=hamiltonian_random, args=[G, q])
        p.daemon = True  # do not survive the main process
        p.start()
    print(q.get())

if __name__ == "__main__":
    main()
where Sieve of Eratosthenes is:
def sieve_of_eratosthenes(limit):
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False  # zero and one are not primes
    for n in range(int(limit**.5 + .5)):
        if is_prime[n]:
            for composite in range(n*n, limit, n):
                is_prime[composite] = False
    return is_prime
and:
import random

def hamiltonian_random(graph, result_queue):
    """Build random paths until a Hamiltonian path is found."""
    vertices = list(graph.keys())
    while True:
        # Build a random path.
        path = [random.choice(vertices)]  # start with a random vertex
        while True:  # until the path can't be extended with a random adjacent vertex
            neighbours = graph[path[-1]]
            random.shuffle(neighbours)
            for adjacent_vertex in neighbours:
                if adjacent_vertex not in path:
                    path.append(adjacent_vertex)
                    break
            else:  # can't extend the path
                break
        # Check whether it is Hamiltonian.
        if len(path) == len(vertices):
            assert set(path) == set(vertices)
            result_queue.put(path)  # found a Hamiltonian path
            return
Example
$ python order-adjacent-prime-sum.py 20
Output
[19, 18, 13, 10, 1, 4, 9, 14, 5, 6, 17, 2, 15, 16, 7, 12, 11, 8, 3, 20]
The output is a random sequence that satisfies the conditions:
it is a permutation of the range from 1 to 20 (inclusive)
the sum of adjacent numbers is prime
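Both conditions are easy to check mechanically; for instance (my snippet, reusing sieve_of_eratosthenes from above):
path = [19, 18, 13, 10, 1, 4, 9, 14, 5, 6, 17, 2, 15, 16, 7, 12, 11, 8, 3, 20]
is_prime = sieve_of_eratosthenes(2*20 + 1)
assert sorted(path) == list(range(1, 21))
assert all(is_prime[a + b] for a, b in zip(path, path[1:]))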
Time performance
It takes around 10 seconds on average to get a result for n = 900; extrapolating the time as an exponential function, it should take around 20 seconds for n = 1000:
The image is generated using this code:
import numpy as np

figname = 'hamiltonian_random_noset-noseq-900-900'
Ns, Ts = np.loadtxt(figname+'.xy', unpack=True)

# Use polyfit to fit the data:
#   y = c * a**n
#   log y = log(c * a**n)
#   log Ts = log c + Ns * log a
coeffs = np.polyfit(Ns, np.log2(Ts), deg=1)
poly = np.poly1d(coeffs, variable='Ns')

# Use curve_fit to fit the data.
from scipy.optimize import curve_fit

def func(x, a, c):
    return c*a**x

popt, pcov = curve_fit(func, Ns, Ts)
aa, cc = popt
a, c = 2**coeffs

# Plot it.
import matplotlib.pyplot as plt

plt.figure()
plt.plot(Ns, np.log2(Ts), 'ko', label='time measurements')
plt.plot(Ns, np.polyval(poly, Ns), 'r-',
         label=r'$time = %.2g\times %.4g^N$' % (c, a))
plt.plot(Ns, np.log2(func(Ns, *popt)), 'b-',
         label=r'$time = %.2g\times %.4g^N$' % (cc, aa))
plt.xlabel('N')
plt.ylabel('log2(time in seconds)')
plt.legend(loc='upper left')
plt.show()
Fitted values:
>>> c*a**np.array([900, 1000])
array([ 11.37200806, 21.56029156])
>>> func([900, 1000], *popt)
array([ 14.1521409 , 22.62916398])
Recursive backtracking, to the rescue:
def is_prime(n):
    return all(n % i != 0 for i in range(2, n))

def order(numbers, current=[]):
    if not numbers:
        return current
    for i, n in enumerate(numbers):
        if current and not is_prime(n + current[-1]):
            continue
        result = order(numbers[:i] + numbers[i + 1:], current + [n])
        if result:
            return result
    return False

result = order(list(range(500)))
for i in range(len(result) - 1):
    assert is_prime(result[i] + result[i + 1])
You can force it to work for even larger lists by increasing the maximum recursion depth.
Here's my take on a solution. As Tim Peters pointed out, this is a Hamiltonian path problem.
So the first step is to generate the graph in some form.
Well, the zeroth step in this case is to generate prime numbers. I'm going to use a sieve, but whatever prime test is fine. We need primes up to 2 * n, since that is the largest sum of any two numbers.
m = 8
n = m + 1  # Just so I don't have to worry about zero indexes and random +/- 1's

primelen = 2 * m
prime = [True] * primelen
prime[0] = prime[1] = False
for i in range(4, primelen, 2):
    prime[i] = False
for i in range(3, primelen, 2):
    if not prime[i]:
        continue
    for j in range(i * i, primelen, i):
        prime[j] = False
OK, now we can test for primality with prime[i], and it's easy to make the graph edges: if I have a number i, what numbers can come next? I'll also make use of the fact that i and j must have opposite parity (otherwise their sum would be even).
pairs = [set(j for j in range(i%2+1, n, 2) if prime[i+j])
         for i in range(n)]
So here pairs[i] is a set object whose elements are integers j such that i+j is prime.
Now we need to walk the graph. This is really where the time-consuming part is, and all further optimizations will be done here.
chains = [
    ([], set(range(1, n)))
]
chains is going to keep track of the valid paths as we walk them. The first element in the tuple will be your result. The second element is all the unused numbers, or unvisited nodes. The idea is to take one chain out of the queue, take a step down the path and put it back.
while chains:
    chain, unused = chains.pop()
    if not chain:
        # We haven't even started; all unused are valid.
        valid_next = unused
    else:
        # We need numbers that are both unused and paired with the last node.
        # Using sets makes this easy.
        valid_next = unused & pairs[chain[-1]]
    for num in valid_next:
        # Take a step to the new node and add the new path back to chains.
        # Reminder: it's important not to mutate anything here; always make new objects.
        newchain = chain + [num]
        newunused = unused - set([num])
        chains.append((newchain, newunused))
        # Are we done?
        if not newunused:
            print newchain
            chains = False
            break
Notice that if there is no valid next step, the path is removed without a replacement.
This is really memory inefficient, but it runs in a reasonable time. The biggest performance bottleneck is walking the graph, so the next optimization would be popping and inserting paths in intelligent places to prioritize the most likely paths. It might be helpful to use a collections.deque or a different container for your chains in that case.
EDIT
Here is an example of how you can implement path priority. We will assign each path a score and keep the chains list sorted by this score. For a simple example, I will suggest that paths containing "harder to use" nodes are worth more: for each step on a path, the score will increase by n - len(valid_next). The modified code will look something like this.
import bisect

chains = ...
chains_score = [0]
while chains:
    chain, unused = chains.pop()
    score = chains_score.pop()
    ...
    for num in valid_next:
        newchain = chain + [num]
        newunused = unused - set([num])
        newscore = score + n - len(valid_next)
        index = bisect.bisect(chains_score, newscore)
        chains.insert(index, (newchain, newunused))
        chains_score.insert(index, newscore)
Remember that insertion is O(n), so the overhead of adding this can be rather large. It's worth doing some analysis on your scoring algorithm to keep the queue length len(chains) manageable.