So I have recursive code that computes the best alignment of two DNA strands, but the problem is that it runs very slowly (and it needs to stay recursive). Then I read on an MIT website that the results are additive, which would be great for me, but after thinking about it a bit I found what looks like a problem:
website: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-096-algorithms-for-computational-biology-spring-2005/lecture-notes/lecture5_newest.pdf
The MIT website says that for a given split (i, j):
first_strand(0, i) and second_strand(0,j) alignment
+
first_strand(i, len) and second_strand(j, len) alignment
equals
first_strand and second_strand alignment
but:
GTC GTAA
G with GTA alignment is G-- and GTA
TC with A alignment is TC and A-
result = G--TC and GTAA-
real best result = GTC- and GTAA
Can anyone explain what they mean on the MIT website? I'm probably getting it all wrong!
I assume you're talking about this link.
If so, read it very carefully hundreds of times ;-) It's "additive" given that you're only considering alignments where the split is fixed at a specific (i, j) pair.
In your supposed counterexample, you started by breaking the initial G off of GTC and the initial GTA off of GTAA. Then G-- against GTA is the best alignment of those two prefixes. Fine. Continuing with the same split, you still needed to align the remaining right-hand parts: TC with A. Also fine.
There is no claim that this is the best possible split. There's only the claim that it's the best possible alignment given that you're only considering that specific split.
It's one small step in the dynamic programming approach, which is the part you're missing. It remains to compute the best alignments across all possible splits.
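Here's a concrete way to see it, as a sketch of my own (not the MIT code; edit_distance is a helper I'm introducing just for this illustration). For every split point (i, j), an individual split (like your G / TC one) can score worse than the true optimum, but the minimum over all splits matches the whole-string score:
def edit_distance(a, b):
    # Classic dynamic program: prev[j] holds the cost of aligning the
    # current prefix of a with the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match/substitute
                           prev[j] + 1,               # gap in b
                           cur[j - 1] + 1))           # gap in a
        prev = cur
    return prev[-1]

a, b = "GTC", "GTAA"
whole = edit_distance(a, b)
best_over_splits = min(
    edit_distance(a[:i], b[:j]) + edit_distance(a[i:], b[j:])
    for i in range(len(a) + 1)
    for j in range(len(b) + 1))
print(whole, best_over_splits)  # both print 2: the scores agree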
Dynamic programming is tricky at first. You shouldn't expect to learn it from staring at telegraphic slides. Read a real textbook, or search the web for tutorials.
Speeding a recursive version
The comments indicate that the code for this "must" be recursive. Oh well ;-)
Caution: I just threw this together to illustrate a general procedure for speeding suitable recursive functions. It's barely been tested at all.
First an utterly naive recursive version:
def lev(a, b):
    if not a:
        return len(b)
    if not b:
        return len(a)
    return min(lev(a[:-1], b[:-1]) + (a[-1] != b[-1]),
               lev(a[:-1], b) + 1,
               lev(a, b[:-1]) + 1)
I'll be using "absd31-km" and "ldk3-1fjm" as arguments in all runs discussed here.
On my box, using Python 3, that simple function returns 7 after about 1.6 seconds. It's horribly slow.
The most obvious problem is the endlessly repeated string slicing. Each : in an index takes time proportional to the current length of the string being sliced. So the first refinement is to pass string indices instead. Since the code always works on a prefix of a string (slicing off the last character), we only need to pass the "end of string" indices:
def lev2(a, b):
    def inner(j1, j2):
        if j1 < 0:
            return j2 + 1
        if j2 < 0:
            return j1 + 1
        return min(inner(j1-1, j2-1) + (a[j1] != b[j2]),
                   inner(j1-1, j2) + 1,
                   inner(j1, j2-1) + 1)
    return inner(len(a)-1, len(b)-1)
Much better! This version returns 7 in "only" about 1.44 seconds. Still horridly slow, but better than the original. Its advantage would increase on longer strings, but who cares ;-)
We're almost done! The important thing to notice now is that the function passes the same arguments many times over the course of a run. We capture those in "a memo" to avoid all the redundant computation:
def lev3(a, b):
    memo = {}
    def inner(j1, j2):
        if j1 < 0:
            return j2 + 1
        if j2 < 0:
            return j1 + 1
        args = j1, j2
        if args in memo:
            return memo[args]
        result = min(inner(j1-1, j2-1) + (a[j1] != b[j2]),
                     inner(j1-1, j2) + 1,
                     inner(j1, j2-1) + 1)
        memo[args] = result
        return result
    return inner(len(a)-1, len(b)-1)
That version returns 7 in about 0.00026 seconds, over 5000 times faster than lev2 did it.
Now if you've studied the matrix-based algorithms, and squint a little, you'll see that lev3() effectively builds a 2-dimensional matrix mapping index pairs to results in its memo dictionary. They're really the same thing, except that the recursive version builds the matrix in a more convoluted way. On the other hand, the recursive version may be easier to understand and to reason about. Note that the slides you found called the memoization approach "top down" and the nested-loop matrix approach "bottom up". Those are nicely descriptive.
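For comparison, here's a sketch of what that bottom-up version might look like; it fills a table holding the same distances as lev3's memo dictionary, just with explicit loops. (In modern Python you could also get lev3's memo for free by decorating inner with functools.lru_cache.)
def lev_bottom_up(a, b):
    # table[j1][j2] = distance between a[:j1] and b[:j2]; the same
    # values lev3 memoizes, built row by row instead of by recursion.
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for j1 in range(m + 1):
        table[j1][0] = j1              # b exhausted: delete the rest of a
    for j2 in range(n + 1):
        table[0][j2] = j2              # a exhausted: insert the rest of b
    for j1 in range(1, m + 1):
        for j2 in range(1, n + 1):
            table[j1][j2] = min(table[j1 - 1][j2 - 1] + (a[j1 - 1] != b[j2 - 1]),
                                table[j1 - 1][j2] + 1,
                                table[j1][j2 - 1] + 1)
    return table[m][n]

print(lev_bottom_up("absd31-km", "ldk3-1fjm"))  # 7, same as lev3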
You haven't said anything about how your recursive function works, but if it suffers any similar kinds of recursive excess, you should be able to get similar speedups using similar techniques :-)
Related
So this is my line of code so far,
def Adder(i, j, k):
    if i <= j:
        for x in range(i, j+1):
            print(x**k)
    else:
        print(0)
What it's supposed to do is take inputs (i, j, k) and raise each number in the range [i, j] to the power of k. For example, Adder(3,6,2) would compute 3^2 + 4^2 + 5^2 + 6^2 and eventually output 86. I know how to get the function to output the list of numbers between i and j to the power of k, but I don't know how to make the function sum that output. So in the case of my given example, my output would be 9, 16, 25, 36.
Is it possible to make it so that under my if conditional I can generate an output that adds up the numbers in the range after they've been taken to the power of K?
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Question now Answered, thanks to everyone who responded so quickly!
You could use the built-in function sum():
def adder(i, j, k):
    if i <= j:
        print(sum(x**k for x in range(i, j+1)))
    else:
        print(0)
The documentation is here
I'm not sure if this is what you want but
if i <= j:
    total = 0   # "total" rather than "sum", to avoid shadowing the built-in sum
    for x in range(i, j+1):
        total = total + x**k  # total += x**k for simplicity
this will give you the sum of the powers
Looking at a few of the answers posted, they do a good job of giving you pythonic code for your solution, so I thought I could answer your specific questions:
How can I get my function to add together its output?
A perhaps reasonable way is to iteratively and incrementally perform your calculations and store your interim solutions in a variable. See if you can visualize this:
Let's say (i,j,k) = (3,7,2)
We want the output to be: 135 (i.e., the result of the calculation 3^2 + 4^2 + 5^2 + 6^2 + 7^2)
Use a variable, call it result and initialize it to be zero.
As your for loop kicks off with x = 3, perform x^2 and add it to result. So result now stores the interim result 9. Now the loop moves on to x = 4. Same as the first iteration, perform x^2 and add it to result. Now result is 25. You can now imagine that result, by the time x = 7, contains the answer to the calculation 3^2+4^2+5^2+6^2. Let the loop finish, and you will find that 7^2 is also added to result.
Once the loop is finished, print result to get the summed-up answer.
A thing to note:
Consider where in your code you need to set and initialize the result variable.
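If you want to check your visualization against running code, a minimal version of that running-total idea might look like this (and it gives away the note above: result is set up before the loop starts):
def adder(i, j, k):
    result = 0                  # initialize once, before the loop
    for x in range(i, j + 1):
        result += x ** k        # add this x^k to the running total
    return result

print(adder(3, 7, 2))  # 135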
If anyone can give me some advice I would really appreciate it! First week of any coding ever and I don't quite know how to ask this question so sorry for vagueness!
Perhaps a bit advanced for you, but helpful to be aware of, I think:
Alright, let's get some nuance added to this discussion. Since this is your first week, I wanted to jot down some things I had to learn which have helped greatly.
Iterative and Recursive Algorithms
First off, identify that the solution calls for an iterative type of algorithm, where the actual calculation is the same but is executed over different cumulative data.
In this example, if we were to represent the calculation as an operation called ADDER(i,j,k), then:
ADDER(3,7,2) = ADDER(3,6,2) + 7^2
ADDER(3,6,2) = ADDER(3,5,2) + 6^2
ADDER(3,5,2) = ADDER(3,4,2) + 5^2
ADDER(3,4,2) = ADDER(3,3,2) + 4^2
ADDER(3,3,2) = 0 + 3^2
Problems like these can be solved iteratively (using a loop, be it while or for) or recursively (where a function calls itself using a subset of the data). In your example, you can envision a function calling itself, and each time it is called it:
1. calculates the square of j, and
2. adds it to the value returned from calling itself with j decremented by 1,
3. until j < i, at which point it returns 0.
Once the limiting condition (Point 3) is reached, a bunch of additions that were queued up along the way are triggered.
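If you want to see that in code, here is a minimal recursive sketch of the ADDER operation described above (recursive_adder is my name for it):
def recursive_adder(i, j, k):
    if j < i:
        return 0                # Point 3: the limiting condition
    # Points 1 and 2: j's k-th power, plus the call with j decremented by 1
    return j ** k + recursive_adder(i, j - 1, k)

print(recursive_adder(3, 7, 2))  # 135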
Learn to Speak The Language before using Idioms
I may get down-voted for this, but you will encounter a lot of advice displaying pythonic idioms for standard solutions. The idiomatic solution for your example would be as follows:
def adder(i, j, k):
    return sum(x**k for x in range(i, j+1)) if i <= j else 0
But for a beginner this obscures a lot of the "science". It is far more rewarding to tread the simpler path as a beginner. Once you develop your own basic understanding of devising and implementing algorithms in python, then the idioms will make sense.
Just so you can lean into the above idiom, here's an explanation of what it does:
It calls the standard library function called sum, which can operate over a list as well as an iterator. We feed it as argument a generator expression, which does the job of the iterator by "drip feeding" the sum function with x^k values as it iterates over the range (i, j+1). In cases when N (which is j-i) is arbitrarily large, using a standard list can result in huge memory overhead and performance disadvantages. Using a generator expression avoids these issues, since it produces each value only when it is needed instead of building the whole list in memory first.
Of course it only does all this if i <= j else it will return 0.
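To make the memory point concrete: both calls below compute the same sum, but only the first materializes the whole million-element list before summing:
n = 10 ** 6
total_from_list = sum([x ** 2 for x in range(n)])  # builds the full list first
total_from_gen = sum(x ** 2 for x in range(n))     # produces one value at a time
print(total_from_list == total_from_gen)           # True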
Lastly, make mistakes and ask questions. The community is great and very helpful
Well, do not use print. It is easy to modify your function like this,
if i <= j:
    s = 0
    for x in range(i, j+1):
        s += x**k
    return s  # print(s) if you really want to
else:
    return 0
Usually functions do not print anything; instead they return values for their caller to either print or process further. For example, someone may want to find the value of Adder(3, 6, 2) + 1, but if your function returns nothing, the result cannot be used by the rest of the program. As a side note, do not capitalize function names; by convention, capitalized names are for classes.
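To make the Adder(3, 6, 2) + 1 point concrete, here is how a caller builds on the returned value, which a print-only version would not allow:
def adder(i, j, k):
    if i > j:
        return 0
    s = 0
    for x in range(i, j + 1):
        s += x ** k
    return s

print(adder(3, 6, 2) + 1)  # 87: the caller can keep computing with the result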
I am new to StackOverflow, and I am extremely new to Python.
My problem is this... I am needing to write a double-sum, as follows:
The motivation is that this is the angular correction to the gravitational potential used for the geoid.
I am having difficulty writing the sums. And please, before you say "Go to such-and-such a resource," or get impatient with me, this is the first time I have ever done coding/programming/whatever this is.
Is this a good place to use a "for" loop?
I have data for the two indices (n,m) and for the coefficients c_{nm} and s_{nm} in a .txt file. Each of those items is a column. When I say usecols, do I number them 0 through 3, or 1 through 4?
(the equation in question)
\begin{equation}
V(r, \phi, \lambda) = \sum_{n=2}^{360}\left(\frac{a}{r}\right)^{n}\sum_{m=0}^{n}\left[c_{nm}\cos(m\lambda) + s_{nm}\sin(m\lambda)\right]\sqrt{\frac{(n-m)!}{(n+m)!}(2n+1)(2-\delta_{m0})}\,P_{nm}(\sin\lambda)
\end{equation}
(2) Yes, a "for" loop is fine. As #jpmc26 notes, a generator expression is a good alternative to a "for" loop. IMO, you'll want to use numpy if efficiency is important to you.
(3) As #askewchan notes, "usecols" refers to an argument of genfromtxt; as specified in that documentation, column indexes start at 0, so you'll want to use 0 to 3.
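As a sketch of that (the file name and the exact column layout here are assumptions, not your actual data):
import numpy

# usecols counts from 0, so the four columns n, m, c_nm, s_nm of a
# whitespace-separated file are columns 0 through 3:
n, m, c, s = numpy.genfromtxt("coefficients.txt",
                              usecols=(0, 1, 2, 3), unpack=True)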
A naive implementation might be okay since the larger factorial is the denominator, but I wouldn't be surprised if you run into numerical issues. Here's something to get you started. Note that you'll need to define P() and a. I don't understand how "0 through 3" relates to c and s since their indexes range much further. I'm going to assume that each (and delta) has its own file of values.
import math
import numpy

c = numpy.genfromtxt("the_c_file.txt")
s = numpy.genfromtxt("the_s_file.txt")
delta = numpy.genfromtxt("the_delta_file.txt")

def V(r, phi, lam):
    ret = 0
    for n in xrange(2, 361):
        for m in xrange(0, n + 1):
            inner = c[n,m]*math.cos(m*lam) + s[n,m]*math.sin(m*lam)
            # float() keeps the factorial ratio from truncating to 0
            # under integer division
            inner *= math.sqrt(float(math.factorial(n-m))/math.factorial(n+m)*(2*n+1)*(2-delta[m,0]))
            inner *= P(n, m, math.sin(lam))
            ret += math.pow(a/r, n) * inner
    return ret
Make sure to write unittests to check the math. Note that "lambda" is a reserved word.
I have been working on this project for a couple months right now. The ultimate goal of this project is to evaluate an entire digital logic circuit similar to functional testing; just to give a scope of the problem. The topic I created here deals with the issue I'm having with performance of analyzing a boolean expression. For any gate inside a digital circuit, it has an output expression in terms of the global inputs. EX: ((A&B)|(C&D)^E). What I want to do with this expression is then calculate all possible outcomes and determine how much influence each input has on the outcome.
The fastest way that I have found was by building a truth table as a matrix and looking at certain rows (won't go into specifics of that algorithm as it's off-topic); the problem with that is that once the number of unique inputs goes above 26-27 (somewhere around that), the memory usage is well beyond 16GB (the max my computer has). You might say "Buy more RAM", but each increase in inputs by 1 doubles the memory usage. Some of the expressions I analyze are well over 200 unique inputs...
The method I use right now uses the compile method to take the expression as the string. Then I create an array with all of the inputs found from the compile method. Then I generate a list row by row of "True" and "False" randomly chosen from a sample of possible values (that way it will be equivalent to rows in a truth table if the sample size is the same size as the range and it will allow me to limit the sample size when things get too long to calculate). These values are then zipped with the input names and used to evaluate the expression. This will give the initial result, after that I go column by column in the random boolean list and flip the boolean then zip it with the inputs again and evaluate it again to determine if the result changed.
So my question is this: Is there a faster way? I have included the code that performs the work. I have tried regular expressions to find and replace but it is always slower (from what I've seen). Take into account that the inner for loop will run N times, where N is the number of unique inputs. The outer for loop I limit to run 2^15 times if N > 15. So this turns into eval being executed min(2^N, 2^15) * (1 + N) times...
As an update to clarify what I am asking exactly (Sorry for any confusion). The algorithm/logic for calculating what I need is not the issue. I am asking for an alternative to the python built-in 'eval' that will perform the same thing faster. (take a string in the format of a boolean expression, replace the variables in the string with the values in the dictionary and then evaluate the string).
#value is expression as string
comp = compile(value.strip(), '-', 'eval')
inputs = comp.co_names
control = [0]*len(inputs)
#Sequences of random boolean values to be used
random_list = gen_rand_bits(len(inputs))
for row in random_list:
    valuedict = dict(zip(inputs, row))
    answer = eval(comp, valuedict)
    for column in range(len(row)):
        row[column] = ~row[column]
        newvaluedict = dict(zip(inputs, row))
        newanswer = eval(comp, newvaluedict)
        row[column] = ~row[column]
        if answer != newanswer:
            control[column] = control[column] + 1
My question:
Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression?
OP answered:
That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python eval built-in to perform evaluating.
So, this seems to be a classic XY problem. You have an actual problem, which is to determine the relative influence of each variable within a boolean expression. You have attempted to solve this in a rather ineffective way, and now that you actually “feel” the inefficiency (in both memory usage and run time), you look for ways to improve your solution instead of looking for better ways to solve your original problem.
In any way, let’s first look at how you are trying to solve this. I’m not exactly sure what gen_rand_bits is supposed to do, so I can’t really take that into account. But still, you are essentially trying out every possible combination of variable assignments and checking whether flipping the value of a single variable changes the outcome of the formula. “Luckily”, these are just boolean variables, so you are “only” looking at 2^N possible combinations. This means you have exponential run time. Now, O(2^N) algorithms are in theory very, very bad, while in practice it’s often somewhat okay to use them (because most have an acceptable average case and execute fast enough). However, being an exhaustive algorithm, you actually have to look at every single combination and can’t shortcut. Plus, the compilation and value evaluation using Python’s eval are apparently not fast enough to make the inefficient algorithm acceptable.
So, we should look for a different solution. When looking at your solution, one might say that more efficient is not really possible, but when looking at the original problem, we can argue otherwise.
You essentially want to do things similar to what compilers do as static analysis. You want to look at the source code and analyze it just from there without having to actually evaluate that. As the language you are analyzing is highly restricted (being only a boolean expression with very few operators), this isn’t really that hard.
Code analysis usually works on the abstract syntax tree (or an augmented version of that). Python offers code analysis and abstract syntax tree generation with its ast module. We can use this to parse the expression and get the AST. Then based on the tree, we can analyze how relevant each part of an expression is for the whole.
Now, evaluating the relevance of each variable can get quite complicated, but you can do it all by analyzing the syntax tree. I will show you a simple evaluation that supports all boolean operators but will not further check the semantic influence of expressions:
import ast

class ExpressionEvaluator:
    def __init__(self, rawExpression):
        self.raw = rawExpression
        self.ast = ast.parse(rawExpression)

    def run(self):
        return self.evaluate(self.ast.body[0])

    def evaluate(self, expr):
        if isinstance(expr, ast.Expr):
            return self.evaluate(expr.value)
        elif isinstance(expr, ast.Name):
            return self.evaluateName(expr)
        elif isinstance(expr, ast.UnaryOp):
            if isinstance(expr.op, ast.Invert):
                return self.evaluateInvert(expr)
            else:
                raise Exception('Unknown unary operation {}'.format(expr.op))
        elif isinstance(expr, ast.BinOp):
            if isinstance(expr.op, ast.BitOr):
                return self.evaluateBitOr(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitAnd):
                return self.evaluateBitAnd(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitXor):
                return self.evaluateBitXor(expr.left, expr.right)
            else:
                raise Exception('Unknown binary operation {}'.format(expr.op))
        else:
            raise Exception('Unknown expression {}'.format(expr))

    def evaluateName(self, expr):
        return { expr.id: 1 }

    def evaluateInvert(self, expr):
        return self.evaluate(expr.operand)

    def evaluateBitOr(self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitAnd(self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitXor(self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def join(self, a, ratioA, b, ratioB):
        d = { k: v * ratioA for k, v in a.items() }
        for k, v in b.items():
            if k in d:
                d[k] += v * ratioB
            else:
                d[k] = v * ratioB
        return d

expr = '((A&B)|(C&D)^~E)'
ee = ExpressionEvaluator(expr)
print(ee.run())
# > {'A': 0.25, 'C': 0.125, 'B': 0.25, 'E': 0.25, 'D': 0.125}
This implementation will essentially generate a plain AST for the given expression and then recursively walk through the tree and evaluate the different operators. The big evaluate method just delegates the work to the type-specific methods below; it’s similar to what ast.NodeVisitor does, except that we return the analysis results from each node here. One could augment the nodes instead of returning the results, though.
In this case, the evaluation is just based on occurrence in the expression. I don’t explicitly check for semantic effects. So for an expression A | (A & B), I get {'A': 0.75, 'B': 0.25}, although one could argue that semantically B has no relevance at all to the result (making it {'A': 1} instead). This is however something I’ll leave for you. As of now, every binary operation is handled identically (each operand getting a relevance of 50%), but that can of course be adjusted to introduce some semantic rules.
In any way, it will not be necessary to actually test variable assignments.
Instead of reinventing the wheel and taking on the performance and security risks you are already facing, it is better to search for industry-ready, well-accepted libraries.
The Logic Module of sympy will do exactly what you want to achieve without resorting to evil ohh I meant eval. More importantly, since the boolean expression will no longer be a string, you don't have to care about parsing the expression, which generally turns out to be the bottleneck.
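A minimal sketch of what that could look like (assuming sympy is installed; how you turn this into your influence metric is still up to you):
from sympy import sympify, symbols
from sympy.logic.inference import satisfiable

# sympify parses &, |, ~, ^ directly into a boolean expression tree,
# so there is no hand-rolled string parsing and no eval.
expr = sympify("(A & B) | (C & D) ^ E")
print(expr.free_symbols)   # the unique inputs: {A, B, C, D, E}

# Evaluate under one concrete assignment:
A, B, C, D, E = symbols("A B C D E")
print(expr.subs({A: True, B: False, C: True, D: True, E: False}))

# Or ask for a satisfying assignment outright:
print(satisfiable(expr))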
You don't have to prepare a static table for computing this. Python is a dynamic language, thus it's able to interpret and run code by itself at runtime.
In your case, I would suggest a solution like this:
import random, re, time

#Step 1: Input your expression as a string
logic_exp = "A|B&(C|D)&E|(F|G|H&(I&J|K|(L&M|N&O|P|Q&R|S)&T)|U&V|W&X&Y)"

#Step 2: Retrieve all the variable names.
# You can design a rule for naming, and use regex to retrieve them.
# Here for example, I consider all the single-cap-letter names as variables.
name_regex = re.compile(r"[A-Z]")

#Step 3: Replace each variable with its value.
# You could get the value by reading files or keyboard input.
# Here for example I just use a random 0 or 1.
for name in name_regex.findall(logic_exp):
    logic_exp = logic_exp.replace(name, str(random.randrange(2)))

#Step 4: Replace the operators. Python uses 'and', 'or' instead of '&', '|'
logic_exp = logic_exp.replace("&", " and ")
logic_exp = logic_exp.replace("|", " or ")

#Step 5: Interpret the expression with eval(exp) and output its value.
print "expression =", logic_exp
print "expression output =", eval(logic_exp)
This would be very fast and take very little memory. For a test, I ran the example above with 25 input variables:
expression = 1 or 1 and (1 or 1) and 0 or (0 or 0 or 1 and (1 and 0 or 0 or (0 and 0 or 0 and 0 or 1 or 0 and 0 or 0) and 1) or 0 and 1 or 0 and 1 and 0)
expression output = 1
computing time: 0.000158071517944 seconds
According to your comment, I see that you are computing all the possible combinations instead of the output at given input values. If so, it becomes a typical NP-complete Boolean satisfiability problem. I don't think there's any algorithm that could solve it with complexity lower than O(2^N). I suggest you search with the keywords fast algorithm to solve SAT problem; you will find a lot of interesting things.
I'm having some troubles understanding this behaviour.
I'm measuring the execution time with the timeit-module and get the following results for 10000 cycles:
Merge : 1.22722930395
Bubble: 0.810706578175
Select: 0.469924766812
This is my code for MergeSort:
def mergeSort(array):
    if len(array) <= 1:
        return array
    else:
        left = array[:len(array)/2]
        right = array[len(array)/2:]
        return merge(mergeSort(left), mergeSort(right))

def merge(array1, array2):
    merged_array = []
    while len(array1) > 0 or len(array2) > 0:
        if array2 and not array1:
            merged_array.append(array2.pop(0))
        elif (array1 and not array2) or array1[0] < array2[0]:
            merged_array.append(array1.pop(0))
        else:
            merged_array.append(array2.pop(0))
    return merged_array
Edit:
I've changed the list operations to use pointers and my tests now work with a list of 1000 random numbers from 0-1000. (btw: I changed to only 10 cycles here)
result:
Merge : 0.0574434420723
Bubble: 1.74780097558
Select: 0.362952293025
This is my rewritten merge definition:
def merge(array1, array2):
    merged_array = []
    pointer1, pointer2 = 0, 0
    while pointer1 < len(array1) and pointer2 < len(array2):
        if array1[pointer1] < array2[pointer2]:
            merged_array.append(array1[pointer1])
            pointer1 += 1
        else:
            merged_array.append(array2[pointer2])
            pointer2 += 1
    while pointer1 < len(array1):
        merged_array.append(array1[pointer1])
        pointer1 += 1
    while pointer2 < len(array2):
        merged_array.append(array2[pointer2])
        pointer2 += 1
    return merged_array
seems to work pretty well now :)
list.pop(0) pops the first element and has to shift all remaining ones; this is an extra O(n) operation which should not happen.
Also, slicing a list object creates a copy:
left = array[:len(array)/2]
right = array[len(array)/2:]
Which means you're also using O(n * log(n)) memory instead of O(n).
I can't see BubbleSort, but I bet it works in-place, no wonder it's faster.
You need to rewrite it to work in-place. Instead of copying part of original list, pass starting and ending indexes.
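For instance, a sketch along those lines (still allocating one temporary per merge, but with no slice copies on the way down and no pop(0); the // division makes it run under both Python 2 and 3):
def merge_sort(array, lo=0, hi=None):
    # Sorts array[lo:hi] in place by passing indexes instead of slices.
    if hi is None:
        hi = len(array)
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    merge_sort(array, lo, mid)
    merge_sort(array, mid, hi)
    merged = []                   # one temporary per merge
    i, j = lo, mid
    while i < mid and j < hi:
        if array[i] <= array[j]:
            merged.append(array[i])
            i += 1
        else:
            merged.append(array[j])
            j += 1
    merged.extend(array[i:mid])   # leftovers from the left half
    merged.extend(array[j:hi])    # leftovers from the right half
    array[lo:hi] = merged         # write the merged run back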
For starters: I cannot reproduce your timing results, on 100 cycles and lists of size 10000. The exhaustive benchmark with timeit of all implementations discussed in this answer (including bubblesort and your original snippet) is posted as a gist here. I find the following results for the average duration of a single run:
Python's native (Tim)sort : 0.0144600081444
Bubblesort : 26.9620819092
(Your) Original Mergesort : 0.224888720512
Now, to make your function faster, you can do a few things.
Edit : Well, apparently, I was wrong on that one (thanks cwillu). Length computation takes O(1) in python. But removing useless computation everywhere still improves things a bit (Original Mergesort: 0.224888720512, no-length Mergesort: 0.195795390606):
def nolenmerge(array1, array2):
    merged_array = []
    while array1 or array2:
        if not array1:
            merged_array.append(array2.pop(0))
        elif (not array2) or array1[0] < array2[0]:
            merged_array.append(array1.pop(0))
        else:
            merged_array.append(array2.pop(0))
    return merged_array

def nolenmergeSort(array):
    n = len(array)
    if n <= 1:
        return array
    left = array[:n/2]
    right = array[n/2:]
    return nolenmerge(nolenmergeSort(left), nolenmergeSort(right))
Second, as suggested in this answer, pop(0) is linear. Rewrite your merge to pop() at the end:
def fastmerge(array1, array2):
    merged_array = []
    while array1 or array2:
        if not array1:
            merged_array.append(array2.pop())
        elif (not array2) or array1[-1] > array2[-1]:
            merged_array.append(array1.pop())
        else:
            merged_array.append(array2.pop())
    merged_array.reverse()
    return merged_array
This is again faster: no-len Mergesort: 0.195795390606, no-len Mergesort+fastmerge: 0.126505711079
Third - and this would only be useful as-is if you were using a language that does tail call optimization; without it, it's a bad idea - your call to merge sort is not tail-recursive: it calls both (mergeSort left) and (mergeSort right) recursively while there is remaining work in the call (the merge).
But you can make the merge sort tail-recursive by using CPS (this will run out of stack size for even modest lists if you don't do tco):
def cps_merge_sort(array):
    return cpsmergeSort(array, lambda x: x)

def cpsmergeSort(array, continuation):
    n = len(array)
    if n <= 1:
        return continuation(array)
    left = array[:n/2]
    right = array[n/2:]
    return cpsmergeSort(left, lambda leftR:
        cpsmergeSort(right, lambda rightR:
            continuation(fastmerge(leftR, rightR))))
Once this is done, you can do TCO by hand to defer the call stack management done by recursion to the while loop of a normal function (trampolining, explained e.g. here, trick originally due to Guy Steele). Trampolining and CPS work great together.
You write a thunking function, that "records" and delays application: it takes a function and its arguments, and returns a function that returns (that original function applied to those arguments).
thunk = lambda name, *args: lambda: name(*args)
You then write a trampoline that manages calls to thunks: it applies a thunk until the thunk returns a result (as opposed to another thunk)
def trampoline(bouncer):
    while callable(bouncer):
        bouncer = bouncer()
    return bouncer
Then all that's left is to "freeze" (thunk) all your recursive calls from the original CPS function, to let the trampoline unwrap them in proper sequence. Your function now returns a thunk, without recursion (and discarding its own frame), at every call:
def tco_cpsmergeSort(array, continuation):
    n = len(array)
    if n <= 1:
        return continuation(array)
    left = array[:n/2]
    right = array[n/2:]
    return thunk(tco_cpsmergeSort, left, lambda leftR:
        thunk(tco_cpsmergeSort, right, lambda rightR:
            continuation(fastmerge(leftR, rightR))))

mycpomergesort = lambda l: trampoline(tco_cpsmergeSort(l, lambda x: x))
Sadly this does not go that fast (recursive mergesort: 0.126505711079, this trampolined version: 0.170638551712). OK, I guess the stack blowup of the recursive merge sort algorithm is in fact modest: as soon as you get out of the leftmost path in the array-slicing recursion pattern, the algorithm starts returning (and removing frames). So for 10K-sized lists, you get a function stack depth of at most log_2(10 000) = 14... pretty modest.
You can do a slightly more involved stack-based TCO elimination along the lines of what this SO answer gives:
def leftcomb(l):
    maxn, leftcomb = len(l), []
    n = maxn/2
    while maxn > 1:
        leftcomb.append((l[n:maxn], False))
        maxn, n = n, n/2
    return l[:maxn], leftcomb

def tcomergesort(l):
    l, stack = leftcomb(l)
    while stack:  # l sorted, stack contains tagged slices
        i, ordered = stack.pop()
        if ordered:
            l = fastmerge(l, i)
        else:
            stack.append((l, True))  # store return call
            rsub, ssub = leftcomb(i)
            stack.extend(ssub)  # recurse
            l = rsub
    return l
But this goes only a tad faster (trampolined mergesort: 0.170638551712, this stack-based version: 0.144994809628). Apparently, the stack building that Python does at the recursive calls of our original merge sort is pretty inexpensive.
The final results? On my machine (Ubuntu natty's stock Python 2.7.1+), the average run timings (out of 100 runs - except for Bubblesort - on lists of size 10000 containing random integers between 0 and 10000000) are:
Python's native (Tim)sort : 0.0144600081444
Bubblesort : 26.9620819092
Original Mergesort : 0.224888720512
no-len Mergesort : 0.195795390606
no-len Mergesort + fastmerge : 0.126505711079
trampolined CPS Mergesort + fastmerge : 0.170638551712
stack-based mergesort + fastmerge: 0.144994809628
Your merge-sort has a big constant factor, you have to run it on large lists to see the asymptotic complexity benefit.
Umm.. 1,000 records?? You are still well within the polynomial coefficient dominance here.. If I have
selection-sort: 15 * n^2 (reads) + 5 * n^2 (swaps)
insertion-sort: 5 * n^2 (reads) + 15 * n^2 (swaps)
merge-sort: 200 * n * log(n) (reads) + 1000 * n * log(n) (merges)
You're going to be in a close race for a long while.. By the way, 2x faster in sorting is NOTHING. Try 100x slower. That's where the real differences are felt. Try "won't finish in my lifetime" algorithms (there are known regular expressions that take this long to match simple strings).
So try 1M or 1G records and let us know if you still think merge-sort isn't doing too well.
That being said..
There are lots of things causing this merge-sort to be expensive. First of all, nobody ever runs quick or merge sort on small scale data-structures.. Where you have if (len <= 1), people generally put:
if (len <= 16) : (use inline insertion-sort)
else: merge-sort
At EACH propagation level.
Since insertion-sort has a smaller coefficient cost at smaller sizes of n. Note that 50% of your work is done in this last mile.
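A sketch of that cutoff idea (16 is the conventional ballpark for the threshold, not a tuned value; the merge here is my own index-based variant, not your original):
def insertion_sort(array, lo, hi):
    # In-place insertion sort of the small run array[lo:hi].
    for i in range(lo + 1, hi):
        item = array[i]
        j = i - 1
        while j >= lo and array[j] > item:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = item

def hybrid_merge_sort(array, lo=0, hi=None):
    if hi is None:
        hi = len(array)
    if hi - lo <= 16:             # the "last mile": hand off to insertion sort
        insertion_sort(array, lo, hi)
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(array, lo, mid)
    hybrid_merge_sort(array, mid, hi)
    left, right = array[lo:mid], array[mid:hi]
    i = j = 0
    for k in range(lo, hi):       # merge the sorted halves back in place
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            array[k] = left[i]
            i += 1
        else:
            array[k] = right[j]
            j += 1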
Next, you are needlessly running array1.pop(0) instead of maintaining index-counters. If you're lucky, python is efficiently managing start-of-array offsets, but all else being equal, you're mutating the input parameters.
Also, you know the size of the target array during merge, so why copy-and-double the merged_array repeatedly.. Pre-allocate the size of the target array at the start of the function.. That'll save at least a dozen 'clones' per merge-level.
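A sketch of that pre-allocation idea (note that CPython's list.append already grows the list with amortized O(1) cost, so whether this wins in practice is something to measure):
def prealloc_merge(array1, array2):
    # The output size is known up front: allocate once, fill by index.
    merged = [None] * (len(array1) + len(array2))
    i = j = 0
    for k in range(len(merged)):
        if j >= len(array2) or (i < len(array1) and array1[i] <= array2[j]):
            merged[k] = array1[i]
            i += 1
        else:
            merged[k] = array2[j]
            j += 1
    return merged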
In general, merge-sort uses 2x the size of RAM.. Your algorithm is probably using 20x because of all the temporary merge buffers (hopefully python can free structures before recursion). It breaks elegance, but generally the best merge-sort algorithms make an immediate allocation of a merge buffer equal to the size of the source array, and you perform complex address arithmetic (or array-index + span-length) to just keep merging data-structures back and forth. It won't be as elegant as a simple recursive problem like this, but it's somewhat close.
In C-sorting, cache-coherence is your biggest enemy. You want hot data-structures so you maximize your cache. By allocating transient temp buffers (even if the memory manager is returning pointers to hot memory) you run the risk of making slow DRAM calls (pre-filling cache-lines for data you're about to over-write). This is one advantage insertion-sort, selection-sort and quick-sort have over merge-sort (when implemented as above).
Speaking of which, something like quick-sort is both naturally-elegant code, naturally efficient-code, and doesn't waste any memory (google it on wikipedia - they have a javascript implementation on which to base your code). Squeezing the last ounce of performance out of quick-sort is hard (especially in scripting languages, which is why they generally just use the C-api to do that part), and you have a worst-case of O(n^2). You can try and be clever by doing a combination bubble-sort/quick-sort to mitigate worst-case.
Happy coding.
EDIT: See Solving "Who owns the Zebra" programmatically? for a similar class of problem
There's a category of logic problem on the LSAT that goes like this:
Seven consecutive time slots for a broadcast, numbered in chronological order 1 through 7, will be filled by six song tapes (G, H, L, O, P, S) and exactly one news tape. Each tape is to be assigned to a different time slot, and no tape is longer than any other tape. The broadcast is subject to the following restrictions:
L must be played immediately before O.
The news tape must be played at some time after L.
There must be exactly two time slots between G and P, regardless of whether G comes before P or whether G comes after P.
I'm interested in generating a list of permutations that satisfy the conditions as a way of studying for the test and as a programming challenge. However, I'm not sure what class of permutation problem this is. I've generalized the type problem as follows:
Given an n-length array A:
How many ways can a set of n unique items be arranged within A? Eg. How many ways are there to rearrange ABCDEFG?
If the length of the set of unique items is less than the length of A, how many ways can the set be arranged within A if items in the set may occur more than once? Eg. ABCDEF => AABCDEF; ABBCDEF, etc.
How many ways can a set of unique items be arranged within A if the items of the set are subject to "blocking conditions"?
My thought is to encode the restrictions and then use something like Python's itertools to generate the permutations. Thoughts and suggestions are welcome.
This is easy to solve (a few lines of code) as an integer program. Using a tool like the GNU Linear Programming Kit, you specify your constraints in a declarative manner and let the solver come up with the best solution. Here's an example of a GLPK program.
You could code this using a general-purpose programming language like Python, but this is the type of thing you'll see in the first few chapters of an integer programming textbook. The most efficient algorithms have already been worked out by others.
EDIT: to answer Merjit's question:
Define:
a matrix Y where Y_(ij) = 1 if tape i is played before tape j, and 0 otherwise
a vector C, where C_i indicates the time slot when tape i is played (e.g. 1,2,3,4,5,6,7)
a large constant M (look up the term "big M" in an optimization textbook)
Minimize the sum of the vector C subject to the following constraints:
Y_(ij) != Y_(ji) // If i is before j, then j must not be before i
C_j < C_k + M*Y_(kj) // the time slot of j is greater than the time slot of k only if Y_(kj) = 1
C_O - C_L = 1 // L must be played immediately before O
C_N > C_L // news tape must be played at some time after L
|C_G - C_P| = 2 // You will need to manipulate this a bit to make it a linear constraint
That should get you most of the way there. You want to write up the above constraints in the MathProg language's syntax (as shown in the links), and make sure I haven't left out any constraints. Then run the GLPK solver on the constraints and see what it comes up with.
Okay, so the way I see it, there are two ways to approach this problem:
Go about writing a program that will approach this problem head first. This is going to be difficult.
But combinatorics teaches us that the easier way to do this is to generate all permutations and then discard the ones that don't satisfy your constraints.
I would go with number 2.
You can find all permutations of a given string or list by using this algorithm. Using this algorithm, you can get a list of all permutations. You can now apply a number of filters on this list by checking for the various constraints of the problem.
def L_before_O(s):
    # L immediately before O means O's slot is exactly one after L's
    return (s.index('O') - s.index('L') == 1)

def N_after_L(s):
    return (s.index('L') < s.index('N'))

def G_and_P(s):
    return (abs(s.index('G') - s.index('P')) == 2)

def all_perms(s):  # this is from the link
    if len(s) <= 1:
        yield s
    else:
        for perm in all_perms(s[1:]):
            for i in range(len(perm)+1):
                yield perm[:i] + s[0:1] + perm[i:]

def get_the_answer():
    permutations = [i for i in all_perms('GHLOPSN')]  # N is the news tape
    a = [i for i in permutations if L_before_O(i)]
    b = [i for i in a if N_after_L(i)]
    c = [i for i in b if G_and_P(i)]
    return c
I haven't tested this, but this is the general idea of how I would go about coding such a question.
Hope this helps