I'm currently working on some shift-scheduling simulations for a model taxicab company. The company operates 350 cabs, and all are in use on any given day. Drivers each work 5 twelve-hour shifts per week, and there are four overlapping shifts a day: 3:00-15:00, 15:00-3:00, 16:00-4:00, and 4:00-16:00. I developed it in Python originally because I needed to develop it rapidly, and I thought the performance would be acceptable. The original parameters required only two shifts a day (3:00-15:00 and 15:00-3:00), and while performance was not great, it was good enough for my uses. It could build a weekly schedule for the drivers in about 8 minutes, using a simple brute-force algorithm (evaluate all potential swaps to see whether the situation can be improved).
With the four overlapping shifts, performance is absolutely abysmal: it takes a little over an hour to produce a weekly schedule. I've done some profiling with cProfile, and it looks like the main culprits are two methods. One determines whether there is a conflict when placing a driver in a shift: it makes sure the driver is not already serving in a shift on the same day, or in the preceding or following shifts. With only two shifts a day, this was easy; one simply had to check whether the driver was already scheduled to work the shift directly before or after. With the four overlapping shifts, this has become more complicated. The second culprit is the method that determines whether a shift is a day or a night shift. Again, with the original two shifts this was as easy as checking whether the shift number was even or odd, with shift numbers beginning at 0: the first shift (shift 0) was designated a night shift, the next was day, and so on. Now the first two are night, the next two are day, and so forth. These methods call each other, so I will put their bodies below.
def in_conflict(shift, driver, shift_data):
    next_type = get_stype((shift + 1) % 28)
    my_type = get_stype(shift)
    nudge = abs(next_type - my_type)
    if (driver in shift_data[shift - 2 - nudge]
            or driver in shift_data[shift - 1 - nudge]
            or driver in shift_data[(shift + 1 - (nudge * 2)) % 28]
            or driver in shift_data[(shift + 2 - nudge) % 28]
            or driver in shift_data[(shift + 3 - nudge) % 28]):
        return True
    else:
        return False
Note that get_stype returns the type of the shift, with 0 indicating a night shift and 1 indicating a day shift.
In order to determine the shift type, I'm using this method:
def get_stype(k):
    if (k / 4.0) % 1.0 < 0.5:
        return 0
    else:
        return 1
And here's the relevant output from cProfile:
ncalls tottime percall cumtime percall
57662556 19.717 0.000 19.717 0.000 sim8.py:241(get_stype)
28065503 55.650 0.000 77.591 0.000 sim8.py:247(in_conflict)
Does anyone have any sagely advice or tips on how I might go about improving the performance of this script? Any help would be greatly appreciated!
Cheers,
Tim
EDIT: Sorry, I should have clarified that the data for each shift is stored as a set, i.e. shift_data[k] is a set.
EDIT 2:
Adding main loop, as per request below, along with other methods called. It's a bit of a mess, and I apologize for that.
def optimize_schedule(shift_data, driver_shifts, recheck):
    skip = set()
    if len(recheck) == 0:
        first_run = True
        recheck = []
        for i in xrange(28):
            recheck.append(set())
    else:
        first_run = False
    for i in xrange(28):
        if first_run:
            targets = shift_data[i]
        else:
            targets = recheck[i]
        for j in targets:
            o_score = eval_score = opt_eval_at_coord(shift_data, driver_shifts, i, j)
            my_type = get_stype(i)
            s_type_fwd = get_stype((i + 1) % 28)
            if my_type == s_type_fwd:
                search_direction = (i + 2) % 28
                end_direction = i
            else:
                search_direction = (i + 1) % 28
                end_direction = (i - 1) % 28
            while True:
                if end_direction == search_direction:
                    break
                for k in shift_data[search_direction]:
                    # (shift, driver) pairs are encoded as a single int for the skip set
                    coord = search_direction * 10000 + k
                    if coord in skip:
                        continue
                    if k in shift_data[i] or j in shift_data[search_direction]:
                        continue
                    if in_conflict(search_direction, j, shift_data) or in_conflict(i, k, shift_data):
                        continue
                    node_a_prev_score = o_score
                    node_b_prev_score = opt_eval_at_coord(shift_data, driver_shifts, search_direction, k)
                    if (node_a_prev_score == 1) and (node_b_prev_score == 1):
                        continue
                    a_type = my_type
                    b_type = get_stype(search_direction)
                    if node_a_prev_score == 1:
                        if (driver_shifts[j]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            continue
                    elif node_b_prev_score == 1:
                        if (driver_shifts[k]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            test_eval = 0
                    else:
                        if a_type == b_type:
                            test_eval = 0
                        else:
                            test_eval = 2
                    print 'eval_score: %f' % test_eval
                    if test_eval > eval_score:
                        cand_coords = [search_direction, k]
                        eval_score = test_eval
                        if test_eval == 2.0:
                            break
                else:
                    # for-else: no candidate broke out of the loop, so try the next shift
                    search_direction = (search_direction + 1) % 28
                    continue
                break
            if eval_score > o_score:
                print 'doing a swap: ',
                print cand_coords,
                shift_data[i].remove(j)
                shift_data[i].add(cand_coords[1])
                shift_data[cand_coords[0]].add(j)
                shift_data[cand_coords[0]].remove(cand_coords[1])
                if j in recheck[i]:
                    recheck[i].remove(j)
                if cand_coords[1] in recheck[cand_coords[0]]:
                    recheck[cand_coords[0]].remove(cand_coords[1])
                recheck[cand_coords[0]].add(j)
                recheck[i].add(cand_coords[1])
            else:
                coord = i * 10000 + j
                skip.add(coord)
    if first_run:
        shift_data = optimize_schedule(shift_data, driver_shifts, recheck)
    return shift_data
def opt_eval_at_coord(shift_data, driver_shifts, i, j):
    node = j
    if in_conflict(i, node, shift_data):
        return float('-inf')
    else:
        s_type = get_stype(i)
        d_pref = driver_shifts[node]['type']
        if (s_type == 0 and d_pref == 'night') or (s_type == 1 and d_pref == 'day') or (d_pref == 'any'):
            return 1
        else:
            return 0
There's nothing that would obviously slow these functions down, and indeed they aren't slow. They just get called a lot. You say you're using a brute force algorithm - can you write an algorithm that doesn't try every possible combination? Or is there a more efficient way of doing it, like storing the data by driver rather than by shift?
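For instance, a per-driver index kept alongside shift_data would turn each conflict test into a few lookups against that one driver's own shifts. A minimal sketch, assuming a driver_schedule dict (a name I'm introducing) that you keep in sync on every swap; the exact neighbourhood would still need the nudge arithmetic from in_conflict:
# driver_schedule[d] is the set of shift numbers driver d currently works
driver_schedule = {}

def in_conflict_by_driver(shift, driver):
    # conflict if any shift in the neighbourhood of `shift` is already
    # in this driver's schedule; the offsets here are illustrative only
    mine = driver_schedule.get(driver, set())
    return any((shift + off) % 28 in mine for off in (-2, -1, 0, 1, 2))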
Of course, if you need instant speedups, it might benefit from running under an alternative interpreter like PyPy, or from using Cython to compile the critical parts to C.
Hmm. Interesting and fun-looking problem. I will have to look at it more. For now, I have this to offer: Why are you introducing floats? I would do get_stype() as follows:
def get_stype(k):
    if k % 4 < 2:
        return 0
    return 1
It's not a massive speedup, but it's quicker (and simpler). Also, you don't have to do the mod 28 whenever you're feeding get_stype, because that is already taken care of by the mod 4 in get_stype.
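If get_stype still dominates the profile after that, a further (hypothetical) step is to replace the computation with a precomputed lookup table; STYPE is a name I'm introducing here:
# one entry per shift in the 28-shift week: night, night, day, day, repeating
STYPE = [0, 0, 1, 1] * 7

def get_stype(k):
    return STYPE[k % 28]
Indexing a flat list avoids the arithmetic entirely, and you could even inline STYPE[k % 28] at the hot call sites to save the function-call overhead itself.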
If there are significant improvements to be had, they will come in the form of a better algorithm. (I'm not saying that your algorithm is bad, or that there is any better one. I haven't really spent enough time looking at it. But if there isn't a better algorithm to be found, then further significant speed increases will have to come from using PyPy, Cython, Shed Skin, or rewriting in a different (faster) language altogether.)
I don't think your problem is the time it takes to run those two functions. Notice that the percall values for the functions are 0.000. This means that each time a function is invoked, it takes less than 1 millisecond.
I think your problem is the number of times the functions are called. A function call in python is expensive. For example, calling a function that does nothing 57,662,556 times takes 7.15 seconds on my machine:
>>> from timeit import Timer
>>> t = Timer("t()", setup="def t(): pass")
>>> t.timeit(57662556)
7.159075975418091
One thing I'd be curious about is the shift_data variable. Are the values lists or dicts?
driver in shift_data[shift-2-nudge]
The in will take O(N) time if it's a list but O(1) time if it's a dict.
EDIT: Since shift_data values are sets, that should be fine
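To see the gap for yourself, here's a quick illustrative comparison of list versus set membership with timeit (timings will vary by machine):
from timeit import Timer
setup = "data = list(range(1000)); s = set(data)"
print Timer("999 in data", setup).timeit(100000)  # list: linear scan on every test
print Timer("999 in s", setup).timeit(100000)     # set: constant-time hash lookup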
It seems to me that swapping between the two day shifts or between the two night shifts will never help: it won't change how well the drivers like the shifts, and it won't change how those shifts conflict with other shifts.
So I think you should be able to plan only two shifts initially, day and night, and only afterwards split the drivers assigned to each into the two actual overlapping shifts.
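The split itself can then be trivial. A minimal sketch, assuming you first build one combined pool per type (split_pool is a name I'm introducing):
import random

def split_pool(pool, seed=None):
    # split a combined day (or night) pool into its two overlapping
    # shifts, e.g. 3:00-15:00 and 4:00-16:00; any even split will do
    drivers = list(pool)
    random.Random(seed).shuffle(drivers)
    half = len(drivers) // 2
    return set(drivers[:half]), set(drivers[half:])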
Related
I am implementing the coin change problem in python in CS50's pset6. When I first tackled the problem, this was the algorithm I used:
import time

while True:
    try:
        totalChange = input('How much change do I owe you? ')
        totalChange = float(totalChange)  # check that it's a valid numeric value
        if totalChange < 0:
            print('Error: Please enter a positive numeric value')
            continue
        break
    except:
        print('Error: Please enter a positive numeric value')

start_time1 = time.time()
change1 = int(totalChange * 100)  # convert money into cents
n = 0
while change1 >= 25:
    change1 -= 25
    n += 1
while change1 >= 10:
    change1 -= 10
    n += 1
while change1 >= 5:
    change1 -= 5
    n += 1
while change1 >= 1:
    change1 -= 1
    n += 1
print(f'Method1: {n}')
print("--- %s seconds ---" % (time.time() - start_time1))
Having watched the lecture on dynamic programming, I wanted to implement it into this problem. This was my attempt:
while True:
    try:
        totalChange = input('How much change do I owe you? ')
        totalChange = float(totalChange)  # check that it's a valid numeric value
        if totalChange < 0:
            print('Error: Please enter a positive numeric value')
            continue
        break
    except:
        print('Error: Please enter a positive numeric value')

start_time2 = time.time()
change2 = int(totalChange * 100)
rowsCoins = [1, 5, 10, 25]
colsCoins = list(range(change2 + 1))
n = len(rowsCoins)
m = len(colsCoins)
matrix = [[i for i in range(m)] for j in range(n)]
for i in range(1, n):
    for j in range(1, m):
        if rowsCoins[i] == j:
            matrix[i][j] = 1
        elif rowsCoins[i] > j:
            matrix[i][j] = matrix[i-1][j]
        else:
            matrix[i][j] = min(matrix[i-1][j], 1 + matrix[i][j - rowsCoins[i]])
print(f'Method2: {matrix[-1][-1]}')
print("--- %s seconds ---" % (time.time() - start_time2))
When I run the program, it gives the correct answers, but it takes a much longer time.
How could I adjust the second code so that it correctly implements dynamic programming? Is my problem that I am starting the loops from the top left corner of the matrix instead of the bottom right?
What are the time complexities of the algorithms for each code that I wrote (as well as for a correct implementation of dynamic programming)? I suspect that the first code is O(n^4), the second O(n*m), and that a correct implementation of dynamic programming would be O(n). Am I correct to think this?
Any help for a better understanding of these algorithms is much appreciated.
I think both algorithms are basically O(n).
n in this case is the size of the number entered.
In the first algorithm, it's not O(n^4) as that would suggest you have 4 nested loops looping n times. Instead, you have 4 loops that run sequentially. If they didn't modify change1 at all, that would potentially be O(4n), which is the same as O(n).
In the second algorithm, your choice of variable names confuses things a little. n is a constant, and m is based on the size of the input, so is what would typically be called n. So, if we rename n to c and m to n, we get O(c*n) which, again, is the same as O(n).
The key point here is that for any particular n, an O(n) algorithm isn't necessarily faster than, say, an O(n^2) algorithm. Big O notation just describes how the amount of work done varies with the size of the input. What it does say is that as n gets bigger, the time taken by an O(n) algorithm will increase more slowly than the time taken by an O(n^2) algorithm, so for some large enough n, the algorithm with the lower complexity will be quicker.
How could I adjust the second code so that it correctly implements dynamic programming? Is my problem that I am starting the loops from the top left corner of the matrix instead of the bottom right?
IMHO, this problem is not well suited to dynamic programming, so it is hard to implement a correct DP for it. Check out this greedy solution, which should be the best approach: https://github.com/endiliey/cs50/blob/master/pset6/greedy.py
What are the time complexities of the algorithms for each code that I wrote (as well as for a correct implementation of dynamic programming)?
Basically both of your codes should be O(n), but that does not mean they take the same time; as you have said, the DP solution is much slower. That is because they have different constant factors. For example, 4n and 0.25n are both O(n), but the first does 16 times as much work.
The greedy solution should have a time complexity of O(1).
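In case the link goes away, a minimal sketch of that greedy idea for US coins: one divmod per denomination, so the work doesn't grow with the amount entered:
def greedy_change(cents):
    # take as many of each coin as possible, largest first;
    # the loop runs exactly 4 times regardless of the input
    coins = 0
    for denom in (25, 10, 5, 1):
        n, cents = divmod(cents, denom)
        coins += n
    return coins

print(greedy_change(41))  # 4: one quarter, one dime, one nickel, one penny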
So I'm trying to solve a challenge and have come across a dead end. My solution works when the list is small or medium, but when it has over 50,000 elements it just times out.
a = int(input().strip())
b = list(map(int, input().split()))
result = []
flag = []
for i in range(len(b)):
    temp = a - b[i]
    if temp >= 0 and temp in flag:
        if temp < b[i]:
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flag.remove(temp)
    else:
        flag.append(b[i])
    result.sort()
for i in result:
    print(i[0], i[1])
Where
a = 10
and b = [2, 4, 6, 8, 5].
The goal is to find any two elements of b whose sum matches a.
EDIT: Updated full code.
flag is a list, of potentially the same order of magnitude as b. So, when you do temp in flag that's a linear search: it has to check every value in flag to see if that value is == temp. So, that's 50000 comparisons. And you're doing that once per loop in a linear walk over b. So, your total time is quadratic: 50,000 * 50,000 = 2,500,000,000. (And flag.remove is also linear time.)
If you replace flag with a set, you can test it for membership (and remove from it) in constant time. So your total time drops from quadratic to linear, or 50,000 steps, which is a lot faster than 2 billion:
flagset = set(flag)
for i in range(len(b)):
    temp = a - b[i]
    if temp >= 0 and temp in flagset:
        if temp < b[i]:
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flagset.remove(temp)
    else:
        flagset.add(b[i])
flag = list(flagset)
If flag needs to retain duplicate values, then it's a multiset, not a set, which you can implement with collections.Counter:
import collections

flagset = collections.Counter(flag)
for i in range(len(b)):
    temp = a - b[i]
    if temp >= 0 and flagset[temp]:
        if temp < b[i]:
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flagset[temp] -= 1
    else:
        flagset[b[i]] += 1
flag = list(flagset.elements())
In your edited code, you’ve got another list that’s potentially of the same size, result, and you’re sorting that list every time through the loop.
Sorting takes log-linear time. Since you do it up to 50,000 times, that's around log(50,000) * 50,000 * 50,000, or roughly 40 billion steps.
If you needed to keep result in order throughout the operation, you'd want to use a logarithmic data structure, like a binary search tree or a skiplist, so you could insert each new element in the right place in logarithmic time, which would mean only around 800,000 steps in total.
But you don’t need it in order until the end. So, much more simply, just move the result.sort out of the loop and do it at the end.
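Putting both fixes together, a sketch of the whole loop with set membership and a single sort at the end (seen is a name I'm introducing for the set version of flag):
seen = set()
result = []
for x in b:
    temp = a - x
    if temp >= 0 and temp in seen:
        # store each pair in (smaller, larger) order
        result.append((temp, x) if temp < x else (x, temp))
        seen.remove(temp)
    else:
        seen.add(x)
result.sort()  # one O(n log n) sort after the loop, not one per iteration
for lo, hi in result:
    print(lo, hi)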
I'm working on a KNN classifier using Python, but I have some problems.
The following piece of code takes 7.5-9.0 seconds to complete, and I'll have to run it 60,000 times.
for fold in folds:
    for dot2 in fold:
        """
        distances[x][0] = Class of the dot2
        distances[x][1] = distance between dot1 and dot2
        """
        distances.append([dot2[0], calc_distance(dot1[1:], dot2[1:], method)])
The "folds" variable is a list with 10 folds that summed contain 60.000 inputs of images in the .csv format. The first value of each dot is the class it belongs to. All the values are in integer.
Is there a way to make this line run any faster ?
Here is the calc_distance function:
import math

def calc_distancia(dot1, dot2, distance):
    if distance == "manhanttan":
        total = 0
        # for each coord, take the absolute difference
        for x in range(0, len(dot1)):
            total = total + abs(dot1[x] - dot2[x])
        return total
    elif distance == "euclidiana":
        total = 0
        for x in range(0, len(dot1)):
            total = total + (dot1[x] - dot2[x])**2
        return math.sqrt(total)
    elif distance == "supremum":
        total = 0
        for x in range(0, len(dot1)):
            if abs(dot1[x] - dot2[x]) > total:
                total = abs(dot1[x] - dot2[x])
        return total
    elif distance == "cosseno":
        dist = 0
        p1_p2_mul = 0
        p1_sum = 0
        p2_sum = 0
        for x in range(0, len(dot1)):
            p1_p2_mul = p1_p2_mul + dot1[x] * dot2[x]
            p1_sum = p1_sum + dot1[x]**2
            p2_sum = p2_sum + dot2[x]**2
        p1_sum = math.sqrt(p1_sum)
        p2_sum = math.sqrt(p2_sum)
        quociente = p1_sum * p2_sum
        dist = p1_p2_mul / quociente
        return dist
EDIT:
Found a way to make it faster at least for the "manhanttan" method. Instead of:
if distance == "manhanttan":
total = 0
#for each coord, take the absolute difference
for x in range(0, len(dot1)):
total = total + abs(dot1[x] - dot2[x])
return total
I put
if distance == "manhanttan":
totalp1 = 0
totalp2 = 0
#for each coord, take the absolute difference
for x in range(0, len(dot1)):
totalp1 += dot1[x]
totalp2 += dot2[x]
return abs(totalp1-totalp2)
The abs() call is very heavy
There are many guides to "profiling python"; you should search for some, read them, and walk through the profiling process to ensure you know what parts of your work are taking the most time.
But if this is really the core of your work, it's a fair bet that that calc_distance is where the majority of the running time is being consumed.
Optimizing that deeply will probably require using NumPy accelerated math or a similar, lower-level approach.
As a quick and dirty approach requiring less invasive profiling and rewriting, try installing the PyPy implementation of Python and running under it. I have seen easy 2x or more accelerations compared to the standard (CPython) implementation.
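For example, the distance bodies above vectorize almost directly with NumPy. A sketch, assuming the dots are plain numeric sequences (spellings kept to match the original code):
import numpy as np

def calc_distancia_np(dot1, dot2, distance):
    # vectorized replacements for the pure-Python loops above
    d1 = np.asarray(dot1, dtype=float)
    d2 = np.asarray(dot2, dtype=float)
    if distance == "manhanttan":
        return np.abs(d1 - d2).sum()
    elif distance == "euclidiana":
        return np.sqrt(((d1 - d2) ** 2).sum())
    elif distance == "supremum":
        return np.abs(d1 - d2).max()
    elif distance == "cosseno":
        return d1.dot(d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))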
I'm confused. Did you try the profiler?
python -m cProfile myscript.py
It will show you where the bulk of the time is being consumed and provide hard data to work with. eg. refactor to reduce the number of calls, restructure the input data, substitute this function for that, etc.
https://docs.python.org/3/library/profile.html
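If the default output is hard to scan, cProfile can also sort it for you, e.g. by cumulative time:
python -m cProfile -s cumulative myscript.py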
In the first place, you should avoid using a single calc_distance function that performs a linear search in a list of strings on every call. Define independent distance functions and call the right one. As Lee Daniel Crocker suggested, don't use the slicing, just start your loop ranges at 1.
For the cosine distance, I would recommend to normalize all the dot vectors once for all. This way the distance computation reduces to a dot product.
These micro-optimizations can give you some speedup. But a bigger gain should be possible by switching to a better algorithm: the kNN classifier calls for a kD-tree, which will allow you to quickly remove a significant fraction of the points from consideration.
This is harder to implement (you'll have to adapt it slightly for the different distances; the cosine distance will make it tricky).
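A minimal sketch of the normalization idea in pure Python (normalize and cosine_normalized are names I'm introducing):
import math

def normalize(dot):
    # scale a vector to unit length once, up front; after this,
    # the cosine similarity of two dots is just their dot product
    norm = math.sqrt(sum(v * v for v in dot))
    return [v / float(norm) for v in dot]

def cosine_normalized(dot1, dot2):
    # assumes both arguments have already been passed through normalize()
    return sum(a * b for a, b in zip(dot1, dot2))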
I have this Python code to generate prime numbers. I added a little piece of code (between # Start progress code and # End progress code) to display the progress of the operation, but it slowed everything down.
#!/usr/bin/python
a = input("Enter a number: ")
f = open('data.log', 'w')
for x in range(2, a):
    p = 1
    # Start progress code
    s = (float(x) / float(a)) * 100
    print '\rProcessing ' + str(s) + '%',
    # End progress code
    for i in range(2, x - 1):
        c = x % i
        if c == 0:
            p = 0
            break
    if p != 0:
        f.write(str(x) + ", ")
print '\rData written to \'data.log\'. Press Enter to exit...'
raw_input()
My question is how to show the progress of the operation without slowing down the actual code/loop. Thanks in advance ;-)
To answer your question: I/O is very expensive, and printing out your progress will have a huge impact on performance. I would avoid printing if possible.
If you are concerned about speed, there is a very nice optimization you can use to greatly speed up your code.
For your inner for loop, instead of
for i in range(2, x - 1):
    c = x % i
    if c == 0:
        p = 0
        break
use
for i in range(2, int(x ** 0.5) + 1):
    c = x % i
    if c == 0:
        p = 0
        break
You only need to iterate from 2 up to the square root of the number you are primality testing.
You can use this optimization to offset any performance loss from printing your progress.
Your inner loop takes O(n) time, so lag on huge numbers is pretty normal. Also, you're converting x and a to float to perform the division; as they get bigger, that could slow your process down.
First, I hope this is a toy problem, because (on quick glance) it looks like the whole operation is O(n^2).
You probably want to put this at the top:
from __future__ import division # Make floating point division the default and enable the "//" integer division operator.
Typically for huge loops where each iteration is inexpensive, progress isn't output every iteration because it would take too long (as you say you are experiencing). Try outputting progress either a fixed number of times or with a fixed duration between outputs:
N_OUTPUTS = 100
OUTPUT_EVERY = max(1, (a - 2) // N_OUTPUTS)
...
# Start progress code
if x % OUTPUT_EVERY == 0:
    print '\rProcessing {:.0%}'.format(x / a),
# End progress code
Or if you want to go by time instead:
UPDATE_DT = 0.5
import time
t = time.time()
...
# Start progress code
if time.time() - t > UPDATE_DT:
    print '\rProcessing {:.0%}'.format(x / a),
    t = time.time()
# End progress code
That's going to be a little more expensive, but will guarantee that even as the inner loop slows down, you won't be left in the dark for more than one iteration or 0.5 seconds, whichever takes longer.
Hey. This example is pretty specific but I think it could apply to a broad range of functions.
It's taken from some online programming contest.
There is a game with a simple winning condition. Draw is not possible. Game cannot go on forever because every move takes you closer to the terminating condition. The function should, given a state, determine if the player who is to move now has a winning strategy.
In the example, the state is an integer. A player chooses a non-zero digit and subtracts it from the number: the new state is the new integer. The winner is the player who reaches zero.
I coded this:
from Memoize import Memoize

@Memoize
def Game(x):
    if x == 0: return True
    for digit in str(x):
        if digit != '0' and not Game(x - int(digit)):
            return True
    return False
I think it's clear how it works. I also realize that for this specific game there's probably a much smarter solution, but my question is general. However, this makes Python go crazy even for relatively small inputs. Is there any way to make this code work with a loop?
Thanks
This is what I mean by translating into a loop:
def fac(x):
    if x <= 1: return 1
    else: return x * fac(x - 1)

def fac_loop(x):
    result = 1
    for i in xrange(1, x + 1):
        result *= i
    return result

## don't try: fac(10000)
print fac_loop(10000) % 100  ## works
In general, it is only possible to convert recursive functions into loops when they are primitive-recursive; this basically means that they call themselves only once in the body. Your function calls itself multiple times. Such a function really needs a stack. It is possible to make the stack explicit, e.g. with lists. One reformulation of your algorithm using an explicit stack is
def Game(x):
    # x, str(x), position
    stack = [(x, str(x), 0)]
    # return value
    res = None
    while stack:
        if res is not None:
            # we have a return value
            if not res:
                stack.pop()
                res = True
                continue
            # res is True, continue to search
            res = None
        x, s, pos = stack.pop()
        if x == 0:
            res = True
            continue
        if pos == len(s):
            # end of loop, return False
            res = False
            continue
        stack.append((x, s, pos + 1))
        digit = s[pos]
        if digit == '0':
            continue
        x -= int(digit)
        # recurse, starting with position 0
        stack.append((x, str(x), 0))
    return res
Basically, you need to make each local variable an element of a stack frame; the local variables here are x, str(x), and the iteration counter of the loop. Doing return values is a bit tricky - I chose to set res to not-None if a function has just returned.
By "go crazy" I assume you mean:
>>> Game(10000)
# stuff skipped
RuntimeError: maximum recursion depth exceeded in cmp
You could start at the bottom instead -- a crude change would be:
# after defining Game()
for i in range(10000):
    Game(i)

# Now this will work:
print Game(10000)
This is because, if you start with a high number, you have to recurse a long way before you reach the bottom (0), so your memoization decorator doesn't help the way it should.
By starting from the bottom, you ensure that every recursive call hits the dictionary of results immediately. You probably use extra space, but you don't recurse far.
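Taken to its conclusion, the same bottom-up idea becomes a plain loop over a table, with no recursion at all. A sketch mirroring the original recurrence exactly (game_bottom_up is a name I'm introducing):
def game_bottom_up(n):
    # win[x] mirrors Game(x); filling from 0 upward guarantees every
    # win[x - digit] lookup is already computed when we need it
    win = {0: True}
    for x in range(1, n + 1):
        win[x] = any(not win[x - int(d)] for d in str(x) if d != '0')
    return win[n]

print game_bottom_up(10000)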
You can turn any recursive function into an iterative function by using a loop and a stack -- essentially running the call stack by hand. See this question or this question, for example, for some discussion. There may be a more elegant loop-based solution here, but it doesn't leap out to me.
Well, recursion is mostly about being able to execute some code without losing previous contexts and their order. In particular, function frames are pushed onto the call stack during recursion, which constrains recursion depth because stack size is limited. You can 'increase' your recursion depth by manually managing the required information for each recursive call, keeping a state stack in heap memory; the amount of available heap memory is usually much larger than the stack's. Think of good quicksort implementations: they eliminate the recursion into the larger side by creating an outer loop with ever-changing state variables (the lower/upper array boundaries and the pivot, in the quicksort example).
While I was typing this, Martin v. Löwis posted a good answer about converting recursive functions into loops.
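To make the quicksort remark concrete, here is a sketch of that pattern: recurse only into the smaller partition and loop on the larger one, so the stack depth stays logarithmic (partition here is a standard Lomuto helper, not code from the question):
def partition(a, lo, hi):
    # standard Lomuto partition around the last element
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort(a, lo=0, hi=None):
    # recurse into the smaller side, loop on the larger side,
    # keeping the call stack O(log n) deep even on bad input
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        if p - lo < hi - p:
            quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1

a = [5, 2, 9, 1, 5, 6]
quicksort(a)   # sorts in place
print a        # [1, 2, 5, 5, 6, 9]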
You could modify your recursive version a bit:
def Game(x):
    if x == 0: return True
    s = set(digit for digit in str(x) if digit != '0')
    return any(not Game(x - int(digit)) for digit in s)
This way, you don't examine digits multiple times. For example, if you are doing 111, you don't have to look at 110 three times.
I'm not sure if this counts as an iterative version of the original algorithm you presented, but here is a memoized iterative version:
import Queue

def Game2(x):
    memo = {}
    memo[0] = True
    calc_dep = {}
    must_calc = Queue.Queue()
    must_calc.put(x)
    while not must_calc.empty():
        n = must_calc.get()
        if n and n not in calc_dep:
            s = set(int(c) for c in str(n) if c != '0')
            elems = [n - digit for digit in s]
            calc_dep[n] = elems
            for new_elem in elems:
                if new_elem not in calc_dep:
                    must_calc.put(new_elem)
    for k in sorted(calc_dep.keys()):
        v = calc_dep[k]
        #print k, v
        memo[k] = any(not memo[i] for i in v)
    return memo[x]
It first calculates the set of numbers that x, the input, depends on. Then it calculates those numbers, starting at the bottom and going towards x.
The code is so fast because of the test for calc_dep. It avoids calculating multiple dependencies. As a result, it can do Game(10000) in under 400 milliseconds whereas the original takes -- I don't know how long. A long time.
Here are performance measurements:
Elapsed: 1000 0:00:00.029000
Elapsed: 2000 0:00:00.044000
Elapsed: 4000 0:00:00.086000
Elapsed: 8000 0:00:00.197000
Elapsed: 16000 0:00:00.461000
Elapsed: 32000 0:00:00.969000
Elapsed: 64000 0:00:01.774000
Elapsed: 128000 0:00:03.708000
Elapsed: 256000 0:00:07.951000
Elapsed: 512000 0:00:19.148000
Elapsed: 1024000 0:00:34.960000
Elapsed: 2048000 0:01:17.960000
Elapsed: 4096000 0:02:55.013000
It's reasonably zippy.