I am trying to answer a question on an online judge in Python, but I am exceeding both the time limit and the memory limit. The question essentially asks for the number of all paths from a start node to an end node. Full question specifications can be seen here.
This is my code:
import sys

lines = sys.stdin.read().strip().split('\n')
n = int(lines[0])
dict1 = {}
for i in xrange(1, n+1):
    dict1[i] = []
for i in xrange(1, len(lines) - 1):
    numbers = map(int, lines[i].split())
    num1 = numbers[0]
    num2 = numbers[1]
    dict1[num2].append(num1)

def pathfinder(start, graph, count):
    new = []
    if start == []:
        return count
    for i in start:
        numList = graph[i]
        for j in numList:
            if j == 1:
                count += 1
            else:
                new.append(j)
    return pathfinder(new, graph, count)

print pathfinder([n], dict1, 0)
What the code does is start at the end node and work its way back up to the start by exploring all neighboring nodes. I essentially wrote a breadth-first-search-style algorithm, but it's taking up too much space and time. How can I improve this code to make it more efficient? Is my approach wrong, and how should I fix it?
Since the graph is acyclic, there is a topological ordering, which we can immediately see to be 1, 2, ..., n. So we can use dynamic programming the same way it is used to solve the longest path problem. In a list paths, the element paths[i] stores how many paths there are from 1 to i. The update is simple - for each edge (i,j), where i is taken in topological order, we do paths[j] += paths[i].
from collections import defaultdict

graph = defaultdict(list)
n = int(input())
while True:
    tokens = input().split()
    a, b = int(tokens[0]), int(tokens[1])
    if a == 0:
        break
    graph[a].append(b)

paths = [0] * (n+1)
paths[1] = 1
for i in range(1, n+1):
    for j in graph[i]:
        paths[j] += paths[i]
print(paths[n])
Note that what you are implementing is not actually BFS, since you don't mark which vertices you've visited, which makes your start list grow out of proportion.
Test it with the graph:
for i in range(1, n+1):
    dict1[i] = list(range(i-1, 0, -1))
If you print the size of start, you can see that the maximum value it reaches for a given n grows exactly as binomial(n, floor(n/2)), which is ~2^n/sqrt(n). Note also that BFS is not what you want, since it is not possible to count the number of paths that way.
import sys
from collections import defaultdict

def build_matrix(filename, x):
    # A[i] stores number of paths from node x to node i.
    # O(n) to build parents_of_node
    parents_of_node = defaultdict(list)
    with open(filename) as infile:
        num_nodes = int(infile.readline())
        A = [0] * (num_nodes + 1)   # A[0] is dummy variable. Not used.
        for line in infile:
            if line.strip() == "0 0":
                break
            u, v = map(int, line.strip().split())
            parents_of_node[v].append(u)
            # Initialize all direct descendants of x to 1
            if u == x:
                A[v] = 1
    # Number of paths from x to i = sum(number of paths from x to parent of i)
    for i in xrange(1, num_nodes + 1):                  # O(n)
        A[i] += sum(A[p] for p in parents_of_node[i])   # O(max fan-in of graph), assuming O(1) dict access.
    # Total time complexity to build A is O(n * (max fan-in of graph))
    return A

def main():
    filename = sys.argv[1]
    x = 1   # Find number of paths from x
    y = 4   # to y
    A = build_matrix(filename, x)
    print(A[y])

if __name__ == '__main__':
    main()
What you are doing is a DFS (not a BFS) in that code...
Here's a link to a good solution...
EDITED:
Use this approach instead...
http://www.geeksforgeeks.org/find-paths-given-source-destination/
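For reference, here is a minimal sketch of the linked recursive-DFS idea, adapted to counting paths rather than printing them (my adaptation, not the article's code, and it assumes the graph is acyclic; it would recurse forever on a cycle):
def count_paths(graph, u, dest):
    # Number of paths from u to dest = sum over u's outgoing neighbours.
    if u == dest:
        return 1
    return sum(count_paths(graph, v, dest) for v in graph.get(u, []))

# graph maps each node to the list of nodes it points to, e.g.:
g = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(count_paths(g, 1, 4))   # 2 paths: 1-2-4 and 1-3-4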
First of all, sorry about the naive question, but I couldn't find help elsewhere.
I'm trying to create an Optimal Search Tree using Dynamic Programming in Python that receives two lists (a set of keys and a set of frequencies) and returns two answers:
1 - The smallest path cost.
2 - The generated tree for that smallest cost.
I basically need to create a tree organized with the most accessed items on top (the most accessed item is the root), and return the smallest path cost of that tree using the Dynamic Programming solution.
I have implemented the following code in Python:
import sys

def optimalSearchTree(keys, freq, n):
    #Create an auxiliary 2D matrix to store results of subproblems
    cost = [[0 for x in xrange(n)] for y in xrange(n)]

    #For a single key, cost is equal to frequency of the key
    #for i in xrange (0,n):
    #    cost[i][i] = freq[i]

    # Now we need to consider chains of length 2, 3, ... .
    # L is chain length.
    for L in xrange (2,n):
        for i in xrange(0,n-L+1):
            j = i+L-1
            cost[i][j] = sys.maxint
            for r in xrange (i,j):
                if (r > i):
                    c = cost[i][r-1] + sum(freq, i, j)
                elif (r < j):
                    c = cost[r+1][j] + sum(freq, i, j)
                elif (c < cost[i][j]):
                    cost[i][j] = c
    return cost[0][n-1]

def sum(freq, i, j):
    s = 0
    k = i
    for k in xrange (k,j):
        s += freq[k]
    return s

keys = [10,12,20]
freq = [34,8,50]
n = sys.getsizeof(keys)/sys.getsizeof(keys[0])
print(optimalSearchTree(keys, freq, n))
I'm trying to output answer 1. The smallest cost for that tree should be 142 (the value stored at matrix position [0][n-1], according to the dynamic programming solution), but unfortunately it returns 0. I couldn't find any issue in the code. What's going wrong?
You have several very questionable statements in your code, definitely inspired by C/Java programming practices. For instance,
keys = [10,12,20]
freq = [34,8,50]
n=sys.getsizeof(keys)/sys.getsizeof(keys[0])
I think you intended to calculate the number of items in the list. However, n is not 3:
>>> sys.getsizeof(keys)/sys.getsizeof(keys[0])
3.142857142857143
What you need is this:
n = len(keys)
One more find: elif (r < j) is always True when it is reached, because r is in the range between i (inclusive) and j (exclusive). As a result, the elif (c < cost[i][j]) condition is never checked, and cost[i][j] is never updated in the loop - that's why you always end up with a 0.
Another suggestion: do not shadow the built-in function sum(). Your namesake function merely calculates the sum of all items in a slice of a list, which the built-in can do directly:
sum(freq[i:j])
import sys

def optimalSearchTree(keys, freq):
    #Create an auxiliary 2D matrix to store results of subproblems
    n = len(keys)
    cost = [[0 for x in range(n)] for y in range(n)]
    storeRoot = [[0 for i in range(n)] for i in range(n)]

    #For a single key, cost is equal to frequency of the key
    for i in range (0,n):
        cost[i][i] = freq[i]

    # Now we need to consider chains of length 2, 3, ... .
    # L is chain length.
    for L in range (2,n+1):
        for i in range(0,n-L+1):
            j = i + L - 1
            cost[i][j] = sys.maxsize
            for r in range (i,j+1):
                c = (cost[i][r-1] if r > i else 0)
                c += (cost[r+1][j] if r < j else 0)
                c += sum(freq[i:j+1])
                if (c < cost[i][j]):
                    cost[i][j] = c
                    storeRoot[i][j] = r
    return cost[0][n-1], storeRoot

if __name__ == "__main__" :
    keys = [10,12,20]
    freq = [34,8,50]
    print(optimalSearchTree(keys, freq))
Given an undirected graph consisting of N nodes (labelled 1 to N), where node S represents the start position and every edge between two nodes has a length of 6 units. Problem here.
It is required to calculate the shortest distance from start position (Node S) to all of the other nodes in the graph.
Solution: This clearly is an application of the Floyd-Warshall algorithm for minimum distances.
What I've tried: I have tried the code below; it passes 2 test cases but fails all the others. I am at my wits' end as to the sneaky bug. I just want a hint toward the solution. Hints about other ways to solve this with better complexity are welcome, but I am mainly looking for the sneaky bug in the current code.
def short_paths(cost, nodes):
    for i in range(1, nodes):
        for j in range(1, nodes):
            for k in range(1, nodes):
                if cost[i][j] > cost[i][k]+cost[k][j]:
                    cost[i][j] = cost[i][k]+cost[k][j]
    return cost

tests = int(input())
while tests:
    x = input().split(" ")
    nodes, edges = int(x[0]), int(x[1])
    #initialize everything with infinity
    dp = [[1<<31 for i in range(nodes+1)] for i in range(nodes+1)]
    #distance between self is 0
    for i in range(nodes+1):
        dp[i][i] = 0
    while edges:
        p = input().split(" ")
        x, y = int(p[0]), int(p[1])
        #undirected graph
        dp[x][y] = 6
        dp[y][x] = 6
        edges -= 1
    src = int(input())
    dp = short_paths(dp, nodes+1)
    result = []
    for i in range(1, nodes+1):
        if src != i:
            if dp[src][i] == 1<<31:
                result.append("-1")
            else:
                result.append(dp[src][i])
    print(" ".join(str(e) for e in result))
    tests -= 1
I think there is a problem in these lines:
for i in range(1, nodes):
    for j in range(1, nodes):
        for k in range(1, nodes):
You should iterate over k first in order for the result to be correct:
Try:
for k in range(1, nodes):
    for i in range(1, nodes):
        for j in range(1, nodes):
Since the DP reuses previous results, the order of iteration is crucial for correctness.
The way I remember the order is that after the k-th iteration, the algorithm has computed the shortest path from i to j using only intermediate nodes from positions 1 to k.
However, for this problem this O(N^3) approach will time out. A better approach is to perform a breadth-first search from the starting location, which has complexity O(N+M) instead.
import queue

def BFS(s):
    q = queue.Queue()
    q.put(s)
    visited[s] = True
    dist[s] = 0
    while not q.empty():
        u = q.get()
        for v in graph[u]:
            if not visited[v]:
                visited[v] = True
                q.put(v)
                dist[v] = dist[u] + 1

Q = int(input())
for _ in range(Q):
    n, m = map(int, input().split())
    graph = [[] for i in range(n)]
    visited = [False for i in range(n)]
    dist = [-1 for i in range(n)]
    for i in range(m):
        u, v = map(lambda x: int(x) - 1, input().split())
        graph[u].append(v)
        graph[v].append(u)
    s = int(input()) - 1
    BFS(s)
    for i in range(n):
        if i == s:
            continue
        print(dist[i]*6 if dist[i] != -1 else '-1', end=' ')
    print()
Just use a normal BFS.
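For reference, a minimal sketch of what that looks like with collections.deque (the dict-of-adjacency-lists representation and the 6-unit scaling are assumptions based on the problem statement above):
from collections import deque

def bfs_distances(graph, start):
    # dist[v] = number of edges on a shortest path from start to v; -1 if unreachable
    dist = {v: -1 for v in graph}
    dist[start] = 0
    q = deque([start])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Every edge weighs 6 units, so scale the edge counts afterwards:
# distances = {v: d*6 if d > 0 else d for v, d in bfs_distances(g, s).items()}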
I wrote code that takes user input and arranges the numbers so that the sum of adjacent numbers is prime. Up to an input of 10 the code works fine; beyond that the system hangs. Please let me know how to optimize it.
Example input: 8
The answer should be: (1, 2, 3, 4, 7, 6, 5, 8)
Code as follows....
import itertools

x = raw_input("please enter a number")
range_x = range(int(x)+1)
del range_x[0]
result = list(itertools.permutations(range_x))

def prime(x):
    for i in xrange(1,x,2):
        if i == 1:
            i = i+1
        if x%i==0 and i < x :
            return False
        else:
            return True

def is_prime(a):
    for i in xrange(len(a)):
        print a
        if i < len(a)-1:
            if prime(a[i]+a[i+1]):
                pass
            else:
                return False
        else:
            return True

for i in xrange(len(result)):
    if i < len(result)-1:
        if is_prime(result[i]):
            print 'result is:'
            print result[i]
            break
    else:
        print 'result is'
        print result[i-1]
For posterity ;-), here's one more based on finding a Hamiltonian path. It's Python 3 code. As written, it stops upon finding the first path, but it can easily be changed to generate all paths. On my box, it finds a solution for all n from 1 through 900 inclusive in about one minute total. For n somewhat larger than 900, it exceeds the maximum recursion depth.
The prime generator (psieve()) is vast overkill for this particular problem, but I had it handy and didn't feel like writing another ;-)
The path finder (ham()) is a recursive backtracking search, using what's often (but not always) a very effective ordering heuristic: of all the vertices adjacent to the last vertex in the path so far, look first at those with the fewest remaining exits. For example, this is "the usual" heuristic applied to solving Knight's Tour problems. In that context, it often finds a tour with no backtracking needed at all. Your problem appears to be a little tougher than that.
def psieve():
    import itertools
    yield from (2, 3, 5, 7)
    D = {}
    ps = psieve()
    next(ps)
    p = next(ps)
    assert p == 3
    psq = p*p
    for i in itertools.count(9, 2):
        if i in D:      # composite
            step = D.pop(i)
        elif i < psq:   # prime
            yield i
            continue
        else:           # composite, = p*p
            assert i == psq
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step

def build_graph(n):
    primes = set()
    for p in psieve():
        if p > 2*n:
            break
        else:
            primes.add(p)
    np1 = n+1
    adj = [set() for i in range(np1)]
    for i in range(1, np1):
        for j in range(i+1, np1):
            if i+j in primes:
                adj[i].add(j)
                adj[j].add(i)
    return set(range(1, np1)), adj

def ham(nodes, adj):
    class EarlyExit(Exception):
        pass

    def inner(index):
        if index == n:
            raise EarlyExit
        avail = adj[result[index-1]] if index else nodes
        for i in sorted(avail, key=lambda j: len(adj[j])):
            # Remove vertex i from the graph. If this isolates
            # more than 1 vertex, no path is possible.
            result[index] = i
            nodes.remove(i)
            nisolated = 0
            for j in adj[i]:
                adj[j].remove(i)
                if not adj[j]:
                    nisolated += 1
                    if nisolated > 1:
                        break
            if nisolated < 2:
                inner(index + 1)
            nodes.add(i)
            for j in adj[i]:
                adj[j].add(i)

    n = len(nodes)
    result = [None] * n
    try:
        inner(0)
    except EarlyExit:
        return result

def solve(n):
    nodes, adj = build_graph(n)
    return ham(nodes, adj)
This answer is based on @Tim Peters' suggestion about Hamiltonian paths.
There are many possible solutions. To avoid excessive memory consumption for intermediate solutions, a random path can be generated. This also makes it easy to utilize multiple CPUs (each CPU generates its own paths in parallel).
import multiprocessing as mp
import sys

def main():
    number = int(sys.argv[1])
    # directed graph, vertices: 1..number (including ends)
    # there is an edge between i and j if (i+j) is prime
    vertices = range(1, number+1)
    G = {}  # vertex -> adjacent vertices
    is_prime = sieve_of_eratosthenes(2*number+1)
    for i in vertices:
        G[i] = []
        for j in vertices:
            if is_prime[i + j]:
                G[i].append(j)  # there is an edge from i to j in the graph

    # utilize multiple cpus
    q = mp.Queue()
    for _ in range(mp.cpu_count()):
        p = mp.Process(target=hamiltonian_random, args=[G, q])
        p.daemon = True  # do not survive the main process
        p.start()
    print(q.get())

if __name__=="__main__":
    main()
where the Sieve of Eratosthenes is:
def sieve_of_eratosthenes(limit):
    is_prime = [True]*limit
    is_prime[0] = is_prime[1] = False  # zero and one are not primes
    for n in range(int(limit**.5 + .5)):
        if is_prime[n]:
            for composite in range(n*n, limit, n):
                is_prime[composite] = False
    return is_prime
and:
import random

def hamiltonian_random(graph, result_queue):
    """Build random paths until a Hamiltonian path is found."""
    vertices = list(graph.keys())
    while True:
        # build a random path
        path = [random.choice(vertices)]  # start with a random vertex
        while True:  # until path can be extended with a random adjacent vertex
            neighbours = graph[path[-1]]
            random.shuffle(neighbours)
            for adjacent_vertex in neighbours:
                if adjacent_vertex not in path:
                    path.append(adjacent_vertex)
                    break
            else:  # can't extend path
                break

        # check whether it is Hamiltonian
        if len(path) == len(vertices):
            assert set(path) == set(vertices)
            result_queue.put(path)  # found a Hamiltonian path
            return
Example
$ python order-adjacent-prime-sum.py 20
Output
[19, 18, 13, 10, 1, 4, 9, 14, 5, 6, 17, 2, 15, 16, 7, 12, 11, 8, 3, 20]
The output is a random sequence that satisfies the conditions:
it is a permutation of the range from 1 to 20 (inclusive)
the sum of adjacent numbers is prime
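Both conditions are easy to check directly; a small verification sketch, reusing sieve_of_eratosthenes() from above:
path = [19, 18, 13, 10, 1, 4, 9, 14, 5, 6, 17, 2, 15, 16, 7, 12, 11, 8, 3, 20]
is_prime = sieve_of_eratosthenes(2*20 + 1)
assert sorted(path) == list(range(1, 21))                    # permutation of 1..20
assert all(is_prime[a + b] for a, b in zip(path, path[1:]))  # adjacent sums are prime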
Time performance
It takes around 10 seconds on average to get a result for n = 900; extrapolating the time as an exponential function, it should take around 20 seconds for n = 1000:
The image is generated using this code:
import numpy as np

figname = 'hamiltonian_random_noset-noseq-900-900'
Ns, Ts = np.loadtxt(figname+'.xy', unpack=True)

# use polyfit to fit the data
# y = c*a**n
# log y = log(c * a**n)
# log Ts = log c + Ns * log a
coeffs = np.polyfit(Ns, np.log2(Ts), deg=1)
poly = np.poly1d(coeffs, variable='Ns')

# use curve_fit to fit the data
from scipy.optimize import curve_fit
def func(x, a, c):
    return c*a**x
popt, pcov = curve_fit(func, Ns, Ts)
aa, cc = popt
a, c = 2**coeffs

# plot it
import matplotlib.pyplot as plt
plt.figure()
plt.plot(Ns, np.log2(Ts), 'ko', label='time measurements')
plt.plot(Ns, np.polyval(poly, Ns), 'r-',
         label=r'$time = %.2g\times %.4g^N$' % (c, a))
plt.plot(Ns, np.log2(func(Ns, *popt)), 'b-',
         label=r'$time = %.2g\times %.4g^N$' % (cc, aa))
plt.xlabel('N')
plt.ylabel('log2(time in seconds)')
plt.legend(loc='upper left')
plt.show()
Fitted values:
>>> c*a**np.array([900, 1000])
array([ 11.37200806, 21.56029156])
>>> func([900, 1000], *popt)
array([ 14.1521409 , 22.62916398])
Recursive backtracking, to the rescue:
def is_prime(n):
    return all(n % i != 0 for i in range(2, n))

def order(numbers, current=[]):
    if not numbers:
        return current
    for i, n in enumerate(numbers):
        if current and not is_prime(n + current[-1]):
            continue
        result = order(numbers[:i] + numbers[i + 1:], current + [n])
        if result:
            return result
    return False

result = order(range(500))
for i in range(len(result) - 1):
    assert is_prime(result[i] + result[i + 1])
You can force it to work for even larger lists by increasing the maximum recursion depth.
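For example (a process-wide setting; the bound here is an illustrative choice, roughly one stack frame per list element plus headroom):
import sys
sys.setrecursionlimit(5000)   # default is usually 1000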
Here's my take on a solution. As Tim Peters pointed out, this is a Hamiltonian path problem.
So the first step is to generate the graph in some form.
Well, the zeroth step in this case is to generate prime numbers. I'm going to use a sieve, but whatever prime test is fine. We need primes up to 2 * n, since that is the largest sum of any two numbers.
m = 8
n = m + 1  # Just so I don't have to worry about zero indexes and random +/- 1's
primelen = 2 * m
prime = [True] * primelen
prime[0] = prime[1] = False
for i in range(4, primelen, 2):
    prime[i] = False
for i in range(3, primelen, 2):
    if not prime[i]:
        continue
    for j in range(i * i, primelen, i):
        prime[j] = False
OK, now we can test for primality with prime[i]. Now it's easy to make the graph edges: if I have a number i, which numbers can come next? I'll also make use of the fact that i and j must have opposite parity (otherwise i+j would be even and hence not prime).
pairs = [set(j for j in range(i%2+1, n, 2) if prime[i+j])
         for i in range(n)]
So here pairs[i] is a set object whose elements are integers j such that i+j is prime.
Now we need to walk the graph. This is really where the time-consuming part is, and all further optimizations will be done here.
chains = [
    ([], set(range(1, n)))
]
chains is going to keep track of the valid paths as we walk them. The first element in the tuple will be your result. The second element is all the unused numbers, or unvisited nodes. The idea is to take one chain out of the queue, take a step down the path and put it back.
while chains:
    chain, unused = chains.pop()
    if not chain:
        # we haven't even started, all unused are valid
        valid_next = unused
    else:
        # We need numbers that are both unused and paired with the last node
        # Using sets makes this easy
        valid_next = unused & pairs[chain[-1]]
    for num in valid_next:
        # Take a step to the new node and add the new path back to chains
        # Reminder, it's important not to mutate anything here, always make new objs
        newchain = chain + [num]
        newunused = unused - set([num])
        chains.append( (newchain, newunused) )
        # are we done?
        if not newunused:
            print newchain
            chains = False
Notice that if there is no valid next step, the path is removed without a replacement.
This is really memory inefficient, but runs in a reasonable time. The biggest performance bottleneck is walking the graph, so the next optimization would be popping and inserting paths in intelligent places to prioritize the most likely paths. It might be helpful to use a collections.deque or different container for your chains in that case.
EDIT
Here is an example of how you can implement your path priority. We will assign each path a score and keep the chains list sorted by this score. For a simple example, I will suggest that paths containing "harder to use" nodes are worth more. That is, for each step on a path, the score will increase by n - len(valid_next). The modified code will look something like this.
import bisect

chains = ...
chains_score = [0]
while chains:
    chain, unused = chains.pop()
    score = chains_score.pop()
    ...
    for num in valid_next:
        newchain = chain + [num]
        newunused = unused - set([num])
        newscore = score + n - len(valid_next)
        index = bisect.bisect(chains_score, newscore)
        chains.insert(index, (newchain, newunused))
        chains_score.insert(index, newscore)
Remember that list insertion is O(n), so the overhead of adding this can be rather large. It's worth doing some analysis on your scoring algorithm to keep the queue length len(chains) manageable.
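If those O(n) inserts do become the bottleneck, a binary heap is the usual replacement; here is a small self-contained sketch of the pattern using heapq (the names are illustrative, not part of the code above; note that heapq pops the smallest score first):
import heapq
import itertools

heap = []                      # entries are (score, tiebreak, payload)
tiebreak = itertools.count()   # keeps comparisons away from unorderable payloads

def push(score, payload):
    heapq.heappush(heap, (score, next(tiebreak), payload))

def pop():
    score, _, payload = heapq.heappop(heap)
    return score, payload

push(3, "expensive path")
push(1, "cheap path")
print(pop())   # (1, 'cheap path') -- O(log n) per push/pop instead of O(n) inserts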
This is a simple question that has been bothering me for a while now.
I am attempting to rewrite my code to be parallel, and in the process I need to split up a sum to be done on multiple nodes and then add those small sums together. The piece that I am working with is this:
def pia(n, i):
    k = 0
    lsum = 0
    while k < n:
        p = (n-k)
        ld = (8.0*k+i)
        ln = pow(16.0, p, ld)
        lsum += (ln/ld)
        k += 1
    return lsum
where n is the limit and i is an integer. Does anyone have some hints on how to split this up and get the same result in the end?
Edit: For those asking, I'm not using pow() but a custom version to do it efficiently with floating point:
def ssp(b, n, m):
    ssp = 1
    while n > 0:
        if n % 2 == 1:
            ssp = (b*ssp) % m
        b = (b**2) % m
        n = n // 2
    return ssp
Since the only variable that's used from one pass to the next is k, and k just increments by one each time, it's easy to split the calculation.
If you also pass k into pia, then you'll have definable starting and ending points, and you can split this up into as many pieces as you want and add all the results together at the end. So something like:
# instead of pia(20000, i), use pia(n, i, k) and run
result = pia(20000, i, 10000) + pia(10000, i, 0)
Also, since n is used to both set the limits and in the calculation directly, these two uses need to be split.
def pia(nlimit, ncalc, i, k):
    lsum = 0
    while k < nlimit:
        p = ncalc-k
        ld = 8.0*k+i
        ln = ssp(16., p, ld)
        lsum += ln/ld
        k += 1
    return lsum

if __name__=="__main__":
    i, ncalc = 5, 10
    print pia(10, ncalc, i, 0)
    print pia(5, ncalc, i, 0) + pia(10, ncalc, i, 5)
Looks like I found a way. In the sum, I had each node calculate a portion in round-robin fashion (e.g. node 1 calculates k=1, node 2 k=2, node 3 k=3, node 4 k=4, node 1 k=5, ...), then gathered the partial sums and added them. See the sketch below.
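A minimal sketch of that round-robin split (the multiprocessing pool and the function names here are illustrative assumptions; ssp() is the modular-exponentiation helper from the question):
import multiprocessing as mp

def pia_stride(n, i, start, stride):
    # Sum only the terms k = start, start+stride, start+2*stride, ...
    lsum = 0
    for k in range(start, n, stride):
        ld = 8.0*k + i
        lsum += ssp(16., n - k, ld) / ld
    return lsum

def pia_parallel(n, i, workers=4):
    # Worker w handles every workers-th term starting at k = w;
    # the partial sums are gathered and added at the end.
    pool = mp.Pool(workers)
    parts = [pool.apply_async(pia_stride, (n, i, w, workers)) for w in range(workers)]
    return sum(part.get() for part in parts)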
Count the longest sequence of heads and tails in 200 coin flips.
I did this - is there a niftier way to do it in python? (without being too obfuscated)
import random

def toss(n):
    count = [0,0]
    longest = [0,0]
    for i in xrange(n):
        coinface = random.randrange(2)
        count[coinface] += 1
        count[not coinface] = 0
        if count[coinface] > longest[coinface]:
            longest[coinface] = count[coinface]
        #print coinface, count, longest
    print "longest sequence heads %d, tails %d" % tuple(longest)

if __name__ == '__main__':
    toss(200)
see this for what prompted my playing
import itertools, random

def coins(num):
    lst = [random.randrange(2) for i in range(num)]
    lst = [(i, len(list(j))) for i, j in itertools.groupby(lst)]
    tails = max(j for i, j in lst if i)
    heads = max(j for i, j in lst if not i)
    return {1: tails, 0: heads}
import collections, itertools, random

def makesequence(choices=2, length=200):
    return [random.randrange(choices) for _ in itertools.repeat(None, length)]

def runlengths(sequence):
    runlength_by_item = collections.defaultdict(set)
    for key, group in itertools.groupby(sequence):
        runlength_by_item[key].add(sum(1 for _ in group))
    return dict((k, max(v)) for k, v in runlength_by_item.items())
As you'll notice, this is much more "decoupled" -- runlengths is a completely general way to determine the maximal run-lengths of different hashable items in any iterable (highly reusable if you need such run-lengths in a variety of different contexts), just as makesequence is a completely general way to make a list of random numbers given list length and number of choices for each random number. Putting these two together may not offer an optimal point-solution to a given, highly specific problem, but it will come close, and building up your little library of reusable "building blocks" will have much higher longer-term returns than just solving each specific problem by entirely dedicated code.
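For example, composing the two (the output varies run to run):
print(runlengths(makesequence()))   # e.g. {0: 8, 1: 6} -- longest run for each face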
You can use itertools, which is a much more Pythonic way to do this:
import itertools, random

def toss(n):
    rolls = [random.randrange(2) for i in xrange(n)]
    maximums = [0, 0]
    for which, grp in itertools.groupby(rolls):
        maximums[which] = max(len(list(grp)), maximums[which])
    print "Longest sequence of heads %d, tails %d" % tuple(maximums)
Another inefficient solution :-)
import random, re
s = ''.join(str(random.randrange(2)) for c in range(10))
print s
print max(re.findall(r'0+', s))
print max(re.findall(r'1+', s))
>>>
0011100100
00
111
>>>
>>> def toss(count):
...     result = []
...     for i in range(count):
...         result.append("HT"[random.randrange(0, 2)])
...     return ''.join(result)
...
>>> s = toss(200)
>>> h_max = max(len(x) for x in s.split("T"))
>>> t_max = max(len(x) for x in s.split("H"))
>>> print h_max, t_max
4 6
This isn't really pythonic so much as tortured, but here's a short version (with meaningless 1-character variable names, no less!)
import random
x = ''.join([chr(random.randrange(2)) for i in range(200)])
print max([len(s) for s in x.split(chr(0)) + x.split(chr(1))])
import random, itertools

def toss(n):
    faces = (random.randrange(2) for i in range(n))
    longest = [0, 0]
    for face, seq in itertools.groupby(faces):
        longest[face] = max(longest[face], len(list(seq)))
    print "longest sequence heads %d, tails %d" % tuple(longest)
It is probably an axiom that any code can be made more succinct. Yours looks perfectly pythonic, though.
Actually, on reflection perhaps there is no succinctness axiom like that. If succinct means "marked by compact precise expression without wasted words," and if by "words" we mean words of code and not of memory, then a single word program cannot be made more succinct (unless, perhaps, it is the "exit" program).
If pythonic means "of extraordinary size and power", then it seems antagonistic to succinctness unless we restrict our definition to power only. I'm not convinced your program resembles a prophetic oracle at all, although you might implement it as an ascii portrait of a particular prophetic oracle. It doesn't look like a snake, so there's room for improvement there too.
import random

def toss(n):
    '''
                 ___     ____________
        <<<((__O\   (__<>___<>__  \   ____
               \ \_(__<>___<>__)\O\_/O___>-< hiss
                \O__<>___<>___<>)\___/
    '''
    count = [0,0]
    longest = [0,0]
    for i in xrange(n):
        coinface = random.randrange(2)
        count[coinface] += 1
        count[not coinface] = 0
        if count[coinface] > longest[coinface]:
            longest[coinface] = count[coinface]
        #print coinface, count, longest
    print "longest sequence heads %d, tails %d" % tuple(longest)

if __name__ == '__main__':
    toss(200)
Nifty, huh?
String scanning algorithm
If you are looking for a fast algorithm, then you can use the algorithm I developed recently for an interview question that asked for the longest string of consecutive letters in a string. See blog entry here.
def search_longest_substring(s):
    """
    >>> search_longest_substring('AABBBBCBBBBACCDDDDDDAAABBBBCBBBBACCDDDDDDDAAABBBBCBBBBACCDDDDDDA')
    (7, 'D')
    """
    def find_left(s, midc, mid, left):
        for j in range(mid-1, left-1, -1):
            if s[j] != midc:
                return j + 1
        return left

    def find_right(s, midc, mid, right):
        for k in range(mid+1, right):
            if s[k] != midc:
                return k
        return right

    i, longest = 0, (0, '')
    while i < len(s):
        c = s[i]
        j = find_left(s, c, i, i-longest[0])
        k = find_right(s, c, i, len(s))
        if k-j > longest[0]:
            longest = (k-j, c)
        i = k + longest[0]
    return longest

if __name__ == '__main__':
    import random
    heads_or_tails = "".join(["HT"[random.randrange(0, 2)] for _ in range(20)])
    print search_longest_substring(heads_or_tails)
    print heads_or_tails
This algorithm is O(n) in worst case (all coin flips are identical) or O(n/m) in average case (where m is the length of the longest match). Feel free to correct me on this.
The code is not especially pythonic (i.e. it does not use list comprehensions or itertools or other stuff). It's in python and it's a good algorithm.
Micro-optimizations
For the micro-optimization crowd, here are changes that make this really scream in python 2.6 on a Windows Vista laptop:
def find_left(s, midc, mid, left):
    j = mid - 1
    while j >= 0:
        if s[j] != midc:
            return j + 1
        j -= 1
    return left

def find_right(s, midc, mid, right):
    k = mid+1
    while k < right:
        if s[k] != midc:
            return k
        k += 1
    return right
Timing results for 1000 iterations with timeit:
range: 2.670
xrange: 0.3268
while-loop: 0.255
Adding psyco import to the file:
try:
    import psyco
    psyco.full()
except ImportError:
    pass
0.011 on 1000 iterations with psyco and the while-loop. So with judicious micro-optimizations and importing psyco, the code runs 250-ish times faster.