Benefit of converting Python method to C extension?

A relatively simple question:
If I convert a CPU-bound bottleneck method from Python to a C extension (implementing roughly the same algorithm), how much of a speedup should I expect?
What factors determine that?
UPDATE:
People seemed to be complaining about the lack of specifics. I was mostly trying to understand what factors would make a piece of Python code a good candidate for being rewritten in C (i.e., when would porting to C actually give you a speed boost if the original Python is CPU-bound).
For specifics, this is the piece of code I'm looking at. Basically it's a recursive method that takes two lists of lists (a list of "columns", where each column contains possible values that could go in that column; basically, a schema) and checks whether it's possible to make fewer than n (usually 1) change(s) (where a change might be to add a new value to a column, add a new column, remove a column, etc.) such that there's some sequence of values (one value from each column) you could construct out of either schema. It's very similar in spirit to calculating the edit distance between two strings. Here's the code:
def CheckMerge(self, schemai, schemaj, starti, startj, \
               changesLeft, path):
    # if starti == 0 and startj == 0:
    #     print '\n'
    #     print schemai.schema
    #     print ''
    #     print schemaj.schema
    if starti == len(schemai.schema) and startj == len(schemaj.schema):
        return (True, path)
    if starti < len(schemai.schema):
        icopy = schemai.schema[starti]
    else:
        icopy = []
    if startj < len(schemaj.schema):
        jcopy = schemaj.schema[startj]
    else:
        jcopy = []
    intersect = set(icopy).intersection(set(jcopy))
    intersect.discard('')
    if len(intersect) == 0:
        if starti < len(schemai.schema) and \
           ('' in schemai.schema[starti] or changesLeft > 0):
            if not '' in schemai.schema[starti]:
                changesLeft -= 1
            changesCopy = list(path)
            changesCopy.append('skipi')
            result,steps = self.CheckMerge(schemai, schemaj, starti+1, startj, \
                                           changesLeft, changesCopy)
            if result:
                return (result,steps)
            elif not '' in schemai.schema[starti]:
                changesLeft += 1
        if startj < len(schemaj.schema) and \
           ('' in schemaj.schema[startj] or changesLeft > 0):
            if not '' in schemaj.schema[startj]:
                changesLeft -= 1
            changesCopy = list(path)
            changesCopy.append('skipj')
            result,steps = self.CheckMerge(schemai, schemaj, starti, startj+1, \
                                           changesLeft, changesCopy)
            if result:
                return (result, steps)
            elif not '' in schemaj.schema[startj]:
                changesLeft += 1
        if changesLeft > 0:
            changesCopy = list(path)
            changesCopy.append('replace')
            changesLeft -= 1
            result,steps = self.CheckMerge(schemai, schemaj, starti+1, startj+1, \
                                           changesLeft, changesCopy)
            if result:
                return (result, steps)
        return (False, None)
    else:
        changesCopy = list(path)
        changesCopy.append('merge')
        result,steps = self.CheckMerge(schemai, schemaj, starti+1, startj+1, \
                                       changesLeft, changesCopy)
        if result:
            return (result, steps)
        else:
            return (False, None)

That depends entirely on your code.
If some part of your code is supported directly by the hardware (for example, computing a Hamming weight, doing AES encryption, calculating a CRC, or running vectorizable code), there are hardware instructions that speed it up, and you can access them from C code but not directly from pure Python code.

Python runs pretty fast, so you would need a distinct reason to convert a Python function to C, such as accessing hardware, which has already been mentioned. But here is another reason.
Python (CPython) suffers from the Global Interpreter Lock (GIL) problem. Threads cannot execute Python bytecode simultaneously, only one at a time. So you could put thread-specific code into a C extension, which can release the GIL while it is not touching Python objects.
In general, if you believe your Python code is slow and there is not a specific reason like the ones mentioned above, then you may need to adopt more Pythonic coding conventions, such as list comprehensions and other features found in Python and not in many other languages.
My final comment is not a reflection on your code sample. Instead, I am offering it as the general wisdom that I've learned from listening to a lot of Python presentations.

Related

Rewriting recursive algorithm to memoized algorithm

I have written the following recursive algorithm:
p = [2,3,2,1,4]
def fn(c,i):
    if(c < 0 or i < 0):
        return 0
    if(c == 0):
        return 1
    return fn(c,i-1)+fn(c-p[i-1],i-1)
It's a solution to a problem where you have c coins, and you have to find out how many ways you can spend your c coins on beers. There are n different beers, only one of each beer.
i denotes the i'th beer, with the price of p[i]; the prices are stored in array p.
The algorithm recursively calls itself, and if c == 0, it returns 1, as it has found a valid permutation. If c or i is less than 0, it returns 0 as it's not a valid permutation, as it exceeds the amount of coins available.
Now I need to rewrite the algorithm as a memoized algorithm. This is my first time trying this, so I'm a little confused about how to do it.
I've been trying different stuff; my latest try is the following code:
p = [2,3,2,1,4]
prev = np.empty([5, 5])
def fni(c,i):
    if(prev[c][i] != None):
        return prev[c][i]
    if(c < 0 or i < 0):
        prev[c][i] = 0
        return 0
    if(c == 0):
        prev[c][i] = 1
        return 1
    prev[c][i] = fni(c,i-1)+fni(c-p[i-1],i-1)
    return prev[c][i]
"Surprisingly" it doesn't work, and I'm sure it's completely wrong. My thought was to save the results of the recursive call in a 2D array of 5x5, check at the start whether the result is already saved in the array, and if it is, just return it.
I only provided my above attempt to show something, so don't take the code too seriously.
My prev array is all 0's, and should be values of null so just ignore that.
My task is actually only to solve it as pseudocode, but I thought it would be easier to write it as code to make sure that it would actually work, so pseudo code would help as well.
I hope I have provided enough information, else feel free to ask!
EDIT: I forgot to mention that I have 5 coins, and 5 different beers (one of each beer). So c = 5, and i = 5

First, np.empty() by default gives an array of uninitialized values, not Nones, as the documentation points out:
>>> np.empty([2, 2])
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]]) #uninitialized
Secondly, although this is more subjective, you should default to using dictionaries for memoization in Python. Arrays may be more efficient if you know you'll actually memoize most of the possible values, but it can be hard to tell that ahead of time. At the very least, make sure your array values are initialized. It's good that you're using numpy-- that will help you avoid the common beginner mistake of writing memo = [[0]*5]*5.
Thirdly, you should perform checks for 'out of bounds' or negative parameters (c < 0 or i < 0) before you use them to access an array as in prev[c][i] != None. Negative indices in Python could map you to a different memoized parameter's value.
Besides those details, your memoization code and strategy is sound.
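Putting those three points together, a dictionary-based version might look like the sketch below. It is written for Python 3; the name count_ways and the memo parameter are made up for illustration, and the base cases are copied from the question's fn. The bounds check comes before any lookup, so a negative c or i never reaches the cache.

```python
p = [2, 3, 2, 1, 4]  # beer prices from the question


def count_ways(c, i, memo=None):
    """Same recursion as fn(c, i) in the question, with results cached."""
    if memo is None:
        memo = {}  # fresh cache for each top-level call
    # Reject invalid arguments first, so they are never used as cache keys.
    if c < 0 or i < 0:
        return 0
    if c == 0:
        return 1
    if (c, i) not in memo:
        memo[(c, i)] = (count_ways(c, i - 1, memo)
                        + count_ways(c - p[i - 1], i - 1, memo))
    return memo[(c, i)]


print(count_ways(5, 5))
```

A dictionary only stores the (c, i) pairs that are actually reached, so there is no need to size or initialize a table up front.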

Python 3: Recursively find if number is even

I am writing a program that must find if a number is even or not. It needs to follow this template. I can get it to find if a number is even or not recursively

The key is that you need to return a boolean value:
def isEven(num):
    if (num <= 0):
        return (num == 0)
    return isEven(num-2)
For larger numbers though this quickly exceeds the default maximum recursion depth for Python. That can be remedied by calling sys.setrecursionlimit(n) where n is the number of recursive calls you want to allow. n in turn is limited by the platform you are on.

Try this, it works for integer values with 0 <= n <= sys.getrecursionlimit()-2:
def even(n):
    return True if n == 0 else odd(n - 1)
def odd(n):
    return False if n == 0 else even(n - 1)
It's a nice example of a pair of mutually recursive functions. Not the most efficient way to find the answer, of course - but nevertheless interesting from an academic point of view.

This template will help. You need to fill in the commented lines. The one you have in the question won't work - you aren't passing anything into isEven. This will only work if n >= 0, otherwise it will crash your program. Easy enough to fix if you ever need to deal with negative numbers.
def isEven(n):
    if n == 0:
        # Number is even
    elif n == 1:
        # Number is odd
    else:
        # Call the function again, but with a different n

Taking up wim's challenge to find a "different" way to do this: The prototypical recursive pattern is foo(cdr(x)), with a base case for the empty list… so let's write it around that:
def isEven(num):
    def isEvenLength(l):
        if not l:
            return True
        return not isEvenLength(l[1:])
    return isEvenLength(range(num))

A really dumb use case for recursion, but here is my version anyway
import random
def isEven(num):
    if random.random() < 0.5:
        # let's learn about recursion!
        return isEven(num)
    else:
        # let's be sane!
        return num % 2 == 0
disclaimer: if you submitted this you'd probably tick off the teacher and come across as a smartypants.

Advice on writing a solver for Vexed levels

Vexed is a popular puzzle game, with many versions available (some of them GPL free software). It is very suitable for small screen devices; versions are available for Android, iOS, etc. I discovered it on the PalmOS platform.
Just for fun, I'd like to write a solver that will solve Vexed levels.
Vexed is a block-sliding puzzle game. Here are the rules in a nutshell:
0) Each level is a grid of squares, bounded by an impassable border. In any level there will be some solid squares, which are impassable. There are some number of blocks of various colors; these could be resting on the bottom border, resting on solid squares, or resting on other blocks (of a different color). Most levels are 8x8 or smaller.
1) The only action you can take is to slide a block to the left or to the right. Each square traveled by a block counts as one move.
2) There is gravity. If, after you slide a block, it is no longer resting on a solid square or another block, it will fall until it comes to rest on another block, a solid square, or the bottom border. Note that you cannot ever lift it up again.
3) Any time two or more blocks of the same color touch, they disappear. Note that chains are possible: if a supporting block disappears, blocks that rested upon it will fall, which could lead to more blocks of the same color touching and thus disappearing.
4) The goal is to make all blocks disappear in the minimum number of moves. Each level has a "par score" which tells you the minimum number of moves. (In the original PalmOS game, the "par score" wasn't necessarily the minimum, but in the Android version I play these days it is the minimum.)
Here is the SourceForge project with the source for the PalmOS version of the game:
http://sourceforge.net/projects/vexed/
I'm an experienced software developer, but I haven't done really any work in AI sort of stuff (pathfinding, problem-solving, etc.) So I'm looking for advice to get me pointed in the right direction.
At the moment, I can see two basic strategies for me to pursue:
0) Just write a brute-force solver, probably in C for the speed, that cranks through every possible solution for every game and returns a list of all solutions, best one first. Would this be a reasonable approach, or would the total number of possible moves make this too slow? I don't think any levels exist larger than 10x10.
1) Learn some AI-ish algorithms, and apply them in a clever way to solve the problem, probably using Python.
Note that the source for PalmOS Vexed includes a solver. According to the author, "The solver uses A* with pruning heuristics to find solutions."
http://www.scottlu.com/Content/Vexed.html
So, one strategy I could pursue would be to study the A* algorithm and then study the C++ code for the existing solver and try to learn from that.
I'm going to tag this with Python and C tags, but if you think I should be using something else, make your sales pitch and I'll consider it!
Here is ASCII art of a level from "Variety 25 Pack"; level 48, "Dark Lord". I am able to solve most levels but this one has, well, vexed me. Par score for this level is 25 moves, but I have not yet solved it at all!
__________
|## g####|
|## # b##|
|## # p##|
|#g   ###|
|bp   ###|
|p# p g  |
==========
In this picture, the borders are underscores, vertical bars, and equals characters. Filled-in squares are '#'. Open spaces are space characters. Colored blocks are 'g' (green), 'b' (blue) and 'p' (purple).
By the way, I'll likely make the input file format to the solver be ASCII art of the levels, just like this but without the fussy line border characters.
Thanks for any advice!
EDIT:
I have accepted an answer. Thank you to the people who gave me answers.
This is a semi-brute-force solver. It isn't using A* but it is cutting short unprofitable branches of the tree.
It reads in a simple text file with the level data. A letter is a block, a '_' (underscore) is an open space, and a '#' is a filled-in space.
#!/usr/bin/env python
#
# Solve levels from the game Vexed.

from collections import Counter
import sys

level_blocks = set(chr(x) for x in range(ord('a'), ord('z')+1))
level_other = set(['_', '#'])
level_valid = set().union(level_blocks, level_other)

def prn_err(s='\n'):
    sys.stderr.write(s)
    sys.stderr.flush()

def validate_llc(llc):
    if len(llc) == 0:
        raise ValueError, "need at least one row of level data"
    w = len(llc[0])
    if w < 2:
        raise ValueError, "level data not wide enough"
    for i, row in enumerate(llc):
        if len(row) != w:
            s = "level data: row %d is different length than row 0"
            raise ValueError, s % i
        for j, ch in enumerate(row):
            if ch not in level_valid:
                s = "char '%c' at (%d, %d) is invalid" % (ch, i, j)
                raise ValueError, s

class Info(object):
    pass

info = Info()
info.width = 0
info.height = 0
info.spaces = set()
info.boom_blocks = set()
info.best_solution = 9999999999
info.title = "unknown"

class Level(object):
    """
    Hold the state of a level at a particular move. self.parent points
    to the previous state, from a previous move, so the solver builds a
    tree representing the moves being considered. When you reach a solution
    (a state where there are no more blocks) you can walk up the tree
    back to the root, and you have the chain of moves that leads to that
    solution."""
    def __init__(self, x):
        if isinstance(x, Level):
            self.blocks = dict(x.blocks)
            self.colors = dict(x.colors)
            self.parent = x
            self.s_move = ''
            self.rank = x.rank + 1
        else:
            if isinstance(x, basestring):
                # allow to init from a one-line "data" string
                # example: "___;___;r_r"
                x = x.split(';')
            # build llc: list of rows, each row a list of characters
            llc = [[ch for ch in row.strip()] for row in x]
            llc.reverse()
            info.width = len(llc[0])
            info.height = len(llc)
            validate_llc(llc)
            # Use llc data to figure out the level, and build self.blocks
            # and info.spaces. self.blocks is a dict mapping a coordinate
            # tuple to a block color; info.spaces is just a set of
            # coordinate tuples.
            self.blocks = {}
            for y in range(info.height):
                for x in range(info.width):
                    loc = (x, y)
                    c = llc[y][x]
                    if c == '_':
                        # it's a space
                        info.spaces.add(loc)
                    elif c in level_blocks:
                        # it's a block (and the block is in a space)
                        self.blocks[loc] = c
                        info.spaces.add(loc)
                    else:
                        # must be a solid square
                        assert(c == '#')
            # colors: map each block color onto a count of blocks.
            self.colors = Counter(self.blocks.values())
            # parent: points to the level instance that holds the state
            # previous to the state of this level instance.
            self.parent = None
            # s_move: a string used when printing out the moves of a solution
            self.s_move = 'initial state:'
            # rank: 0 == initial state, +1 for each move
            self.rank = 0
            self.validate()
            print "Solving:", info.title
            print
            sys.stdout.flush()
            if self._update():
                print "level wasn't stable! after updating:\n%s\n" % str(self)

    def lone_color(self):
        return any(count == 1 for count in self.colors.values())

    def is_solved(self):
        return sum(self.colors.values()) == 0

    def validate(self):
        if info.height == 0:
            raise ValueError, "need at least one row of level data"
        if info.width < 2:
            raise ValueError, "level data not wide enough"
        if self.lone_color():
            raise ValueError, "cannot have just one of any block color"
        for x, y in info.spaces:
            if not 0 <= x < info.width or not 0 <= y < info.height:
                raise ValueError, "Bad space coordinate: " + str((x, y))
        for x, y in self.blocks:
            if not 0 <= x < info.width or not 0 <= y < info.height:
                raise ValueError, "Bad block coordinate: " + str((x, y))
        if any(count < 0 for count in self.colors.values()):
            raise ValueError, "cannot have negative color count!"
        colors = Counter(self.blocks.values())
        for k0 in [key for key in self.colors if self.colors[key] == 0]:
            del(self.colors[k0]) # remove all keys whose value is 0
        if colors != self.colors:
            raise ValueError, "self.colors invalid!\n" + str(self.colors)

    def _look(self, loc):
        """
        return color at location 'loc', or '_' if empty, or '#' for a solid square.
        A bad loc does not raise an error; it just returns '#'.
        """
        if loc in self.blocks:
            return self.blocks[loc]
        elif loc in info.spaces:
            return '_'
        else:
            return '#'

    def _lookxy(self, x, y):
        loc = x, y
        return self._look(loc)

    def _board_mesg(self, mesg, loc):
        x, y = loc
        return "%s %c(%d,%d)" % (mesg, self._look(loc), x, y)

    def _blocked(self, x, y):
        return self._lookxy(x, y) != '_'

    def _s_row(self, y):
        return ''.join(self._lookxy(x, y) for x in xrange(info.width))

    def data(self, ch_join=';'):
        return ch_join.join(self._s_row(y)
                for y in xrange(info.height - 1, -1, -1))

    # make repr() actually print a representation
    def __repr__(self):
        return type(self).__name__ + "(%s)" % self.data()

    # make str() work
    def __str__(self):
        return self.data('\n')

    def _move_block(self, loc_new, loc_old):
        self.blocks[loc_new] = self.blocks[loc_old]
        del(self.blocks[loc_old])

    def _explode_block(self, loc):
        if loc in info.boom_blocks:
            return
        info.boom_blocks.add(loc)
        color = self.blocks[loc]
        self.colors[color] -= 1

    def _try_move(self, loc, d):
        x, y = loc
        if not d in ('<', '>'):
            raise ValueError, "d value '%c' invalid, must be '<' or '>'" % d
        if d == '<':
            x_m = (x - 1)
        else:
            x_m = (x + 1)
        y_m = y
        loc_m = (x_m, y_m)
        if self._blocked(x_m, y_m):
            return None # blocked, so can't move there
        # Not blocked. Let's try the move!
        # Make a duplicate level...
        m = Level(self)
        # ...try the move, and see if anything falls or explodes...
        m._move_block(loc_m, loc)
        m._update()
        if m.lone_color():
            # Whoops, we have only one block of some color. That means
            # no solution can be found by considering this board.
            return None
        # finish the update
        m.s_move = self._board_mesg("move:", loc) + ' ' + d
        m.parent = self
        return m

    def _falls(self, loc):
        x, y = loc
        # blocks fall if they can, and only explode when at rest.
        # gravity loop: block falls until it comes to rest
        if self._blocked(x, y - 1):
            return False # it is already at rest
        while not self._blocked(x, y - 1):
            # block is free to fall so fall one step
            y -= 1
        loc_m = (x, y)
        self._move_block(loc_m, loc)
        return True # block fell to new location

    def _explodes(self, loc):
        x, y = loc
        exploded = False
        color = self._look(loc)
        # look left, right, and down for blocks of the same color
        # (a match above is found when that block checks below itself)
        for e_loc in [(x-1, y), (x+1, y), (x, y-1)]:
            if e_loc in self.blocks and self.blocks[e_loc] == color:
                self._explode_block(e_loc)
                exploded = True
        if exploded:
            self._explode_block(loc)
        return exploded

    def _update(self):
        c = 0
        while True:
            # TRICKY: sum() works on functions that return a bool!
            # As you might expect, True sums as 1 and False as 0.
            f = sum(self._falls(loc) for loc in self.blocks)
            e = sum(self._explodes(loc) for loc in self.blocks)
            for loc in info.boom_blocks:
                del(self.blocks[loc])
            info.boom_blocks.clear()
            c += f + e
            if (f + e) == 0:
                # no blocks fell or exploded; board is stable, update is done
                break
        return c

    def print_moves(self):
        lst = [self]
        a = self
        while a.parent:
            a = a.parent
            lst.append(a)
        lst.reverse()
        for i, a in enumerate(lst):
            if i:
                print "Move %d of %d" % (i, len(lst) - 1)
            print a.s_move
            print a
            print

    def solve(self):
        c = 0
        seen = set()
        solutions = []
        seen.add(self.data())
        q = []
        if self.is_solved():
            solutions.append(self)
        else:
            q.append(self)
        while q:
            a = q.pop(0)
            # Show dots while solver is 'thinking' to give a progress
            # indicator. Dots are written to stderr so they will not be
            # captured if you redirect stdout to save the solution.
            c += 1
            if c % 100 == 0:
                prn_err('.')
            if a.rank > info.best_solution:
                # We cannot beat or even match the best solution.
                # No need to think any more about this possibility.
                # Just prune this whole branch of the solution tree!
                continue
            for loc in a.blocks:
                for d in ('<', '>'):
                    m = a._try_move(loc, d)
                    if not m or m.data() in seen:
                        continue
                    if m.is_solved():
                        if info.best_solution > a.rank:
                            print "\nnew best solution: %d moves" % m.rank
                            info.best_solution = a.rank
                        else:
                            print "\nfound another solution: %d moves" % m.rank
                        solutions.append(m)
                    else:
                        seen.add(m.data())
                        q.append(m)
        print
        print "Considered %d different board configurations." % c
        print
        solutions.sort(key=lambda a: a.rank)
        for n, a in enumerate(solutions):
            print "solution %d): %d moves" % (n, a.rank)
            a.print_moves()
        if not solutions:
            print "no solutions found!"

def load_vex_file(fname):
    with open(fname, "rt") as f:
        s = f.next().strip()
        if s != "Vexed level":
            raise ValueError, "%s: not a Vexed level file" % fname
        s = f.next().strip()
        if not s.startswith("title:"):
            raise ValueError, "%s: missing title" % fname
        info.title = s[6:].lstrip() # remove "title:"
        for s in f:
            if s.strip() == "--":
                break
        return Level(f)

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage vexed_solver <vexed_level_file.vex>"
        sys.exit(1)
    fname = sys.argv[1]
    level = load_vex_file(fname)
    level.solve()
Here is an example level file:
Vexed level
title: 25-48, "Dark Lord"
--
##_f####
##_#_c##
##_#_p##
#f___###
cp___###
p#_p_f__
On my computer, it solves "Dark Lord" in almost exactly 10 seconds, considering 14252 different board configurations. I wrote in Python 2.x instead of Python 3, because I want to try this with PyPy and see how fast it becomes.
Next I should work on applying A* to this. I guess I can make a metric like "better to move an orange block toward another orange block than away" and try to work that in. But I do want all the solutions to pop out, so maybe I'm done already. (If there are three solutions that are all the minimum number of moves, I want to see all three.)
I welcome comments on this Python program. I had fun writing it!
EDIT: I did try this with PyPy but I never updated this until now. On the computer I used with PyPy, the solver could solve the "Dark Lord" level in 10 seconds using CPython; that dropped to 4 seconds with PyPy. The cool part is that I could see the speedup as the JIT kicked in: this program prints dots as it is working, and under PyPy I can see the dots start out slower and then just accelerate. PyPy is nifty.

Studying Wikipedia may be better than studying the actual source code. A* is written out pretty clearly there. But that feels like cheating, doesn't it?
Like all good ideas, A* is actually pretty obvious in retrospect. It's fun to work through it, and there are a few nice insights along the way. Here's how you get to it:
Write the brute-force solver. You'll need much of what you write in the more advanced versions: a game state, and a description of getting from one state to another. You'll also end up removing duplicate states. You should have a queue of some sort for states to be considered, a set of states you've already done, and structure to hold the best solution(s) found so far. And a method that takes a state from the queue and generates a state's „neighbor“ states (ones reachable from it). That's the basic structure of classical AI algorithms. Note that you're technically „generating“ or „exploring“ a huge graph here.
After that, add a simple pruning algorithm: if a state has only one block of some color left, there's no need to consider it further. See if you can come up with other pruning algorithms (i.e. ones that mark a state as „unsolvable“). A good pruning algorithm will eliminate lots of pointless states, thus justifying the time it takes to run the pruning itself.
Then, introduce a heuristic score: rank each state with a number that tells you how „good“ the state looks – about how much more solving will it take. Make your queue a priority queue. This will allow you to consider the „best looking“ states first, so the program should come up with a solution faster. But, the first solution found may not actually be the best, so to be sure that you find the best one, you still need to run the whole program.
Store the minimum cost (number of moves) that you took to get to each state. Remember to update it if you find a better path. Take the states with the lowest sum of their cost and their heuristic score first; those will more likely lead to a better solution.
And here comes A*. You need to modify your heuristic function so that it doesn't overestimate the distance to the goal, i.e. it can be lower than the number of moves you will actually need, but not higher. Then, note that if you found a solution, its heuristic score will be 0. And, any state where the sum of its cost and heuristic is more than the cost of a solution can't lead to a better solution. So, you can prune that state. But since you're taking the states in order, once you hit that threshold you can just stop and return, since all other states in the queue would be pruned as well.
All that's left now is perfecting your heuristic: it can never overestimate, but the better the estimate it gives, the less time A* will take. Take care that the heuristic itself doesn't take too long to compute: you wouldn't want it to, say, generate the solution by brute force, even though that would give the perfect answer :)
Wikipedia has some more discussion and possible improvements, if you get this far. But the best improvements you can make at this point will likely come from improving the heuristic function.
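The steps above can be condensed into a short generic sketch (Python 3). It is not the Vexed solver: a_star, is_goal, neighbors and heuristic are hypothetical names, and the caller is assumed to supply hashable states and a heuristic that never overestimates. It keeps a priority queue ordered by cost plus heuristic, remembers the cheapest known cost to each state, and stops at the first goal state taken from the queue.

```python
import heapq
import itertools


def a_star(start, is_goal, neighbors, heuristic):
    """Return a cheapest list of states from start to a goal, or None.

    neighbors(state) yields (next_state, move_cost) pairs; heuristic(state)
    must never overestimate the remaining cost.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(heuristic(start), next(tie), 0, start)]
    best_cost = {start: 0}   # cheapest known cost to reach each state
    parent = {start: None}   # back-pointers for rebuilding the path

    while frontier:
        _, _, cost, state = heapq.heappop(frontier)
        if cost > best_cost[state]:
            continue  # stale queue entry: a cheaper route was found later
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt, move_cost in neighbors(state):
            new_cost = cost + move_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = state
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(nxt), next(tie), new_cost, nxt))
    return None


# Toy usage: walk on a 4x4 grid from (0, 0) to (3, 3), one unit per step.
GOAL = (3, 3)


def grid_neighbors(pos):
    x, y = pos
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < 4 and 0 <= ny < 4:
            yield (nx, ny), 1


def manhattan(pos):
    return abs(GOAL[0] - pos[0]) + abs(GOAL[1] - pos[1])


path = a_star((0, 0), lambda s: s == GOAL, grid_neighbors, manhattan)
print(len(path) - 1, "moves")
```

For Vexed, a state would be a board, neighbors would generate every legal slide followed by falling and exploding, and each slide costs one move. This sketch returns only one optimal solution; collecting every solution of minimum length would mean continuing until the queue's smallest cost-plus-heuristic value exceeds the best solution found.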

Maybe translate it into a classical planning problem (using PDDL syntax). Then you can try out some planners that are freely available.
E.g. try Fast Forward.

Efficient Shift Scheduling in Python

I'm currently working on some shift scheduling simulations for a model taxicab company. The company operates 350 cabs, and all are in use on any given day. Drivers each work 5 shifts of 12 hours each, and there are four overlapping shifts a day. There are shifts from 3:00-15:00, 15:00-3:00, 16:00-4:00, and 4:00-16:00. I developed it in Python originally, because of the need to rapidly develop it, and I thought that the performance would be acceptable. The original parameters only required two shifts a day (3:00-15:00, and 15:00-3:00), and while performance was not great, it was good enough for my uses. It could make a weekly schedule for the drivers in about 8 minutes, using a simple brute force algorithm (it evaluates all potential swaps to see if the situation can be improved).
With the four overlapping shifts, performance is absolutely abysmal. It takes a little over an hour to do a weekly schedule. I've done some profiling using cProfile, and it looks like the main culprits are two methods. One is a method to determine if there is a conflict when placing a driver in a shift. It makes sure that they are not serving in a shift on the same day, or serving in the preceding or following shifts. With only two shifts a day, this was easy. One simply had to determine if the driver was already scheduled to work in the shift directly before or after. With the four overlapping shifts, this has become more complicated. The second culprit is the method which determines whether the shift is a day or night shift. Again, with the original two shifts, this was as easy as determining if the shift number was even or odd, with shift numbers beginning at 0. The first shift (shift 0) was designated as a night shift, the next was day, and so on and so forth. Now the first two are night, the next two are day, etc. These methods call each other, so I will put their bodies below.
def conflict_exists(shift, driver, shift_data):
    next_type = get_stype((shift+1) % 28)
    my_type = get_stype(shift)
    nudge = abs(next_type - my_type)
    if driver in shift_data[shift-2-nudge] or driver in shift_data[shift-1-nudge] or driver in shift_data[(shift+1-(nudge*2)) % 28] or driver in shift_data[(shift+2-nudge) % 28] or driver in shift_data[(shift+3-nudge) % 28]:
        return True
    else:
        return False
Note that get_stype returns the type of the shift, with 0 indicating it is a night shift and 1 indicating it is a day shift.
In order to determine the shift type, I'm using this method:
def get_stype(k):
    if (k / 4.0) % 1.0 < 0.5:
        return 0
    else:
        return 1
And here's the relevant output from cProfile:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 57662556   19.717    0.000   19.717    0.000 sim8.py:241(get_stype)
 28065503   55.650    0.000   77.591    0.000 sim8.py:247(in_conflict)
Does anyone have any sagely advice or tips on how I might go about improving the performance of this script? Any help would be greatly appreciated!
Cheers,
Tim
EDIT: Sorry, I should have clarified that the data from each shift is stored as a set i.e. shift_data[k] is of the set data type.
EDIT 2:
Adding main loop, as per request below, along with other methods called. It's a bit of a mess, and I apologize for that.
def optimize_schedule(shift_data, driver_shifts, recheck):
    skip = set()
    if len(recheck) == 0:
        first_run = True
        recheck = []
        for i in xrange(28):
            recheck.append(set())
    else:
        first_run = False
    for i in xrange(28):
        if (first_run):
            targets = shift_data[i]
        else:
            targets = recheck[i]
        for j in targets:
            o_score = eval_score = opt_eval_at_coord(shift_data, driver_shifts, i, j)
            my_type = get_stype(i)
            s_type_fwd = get_stype((i+1) % 28)
            if (my_type == s_type_fwd):
                search_direction = (i + 2) % 28
                end_direction = i
            else:
                search_direction = (i + 1) % 28
                end_direction = (i - 1) % 28
            while True:
                if (end_direction == search_direction):
                    break
                for k in shift_data[search_direction]:
                    coord = search_direction * 10000 + k
                    if coord in skip:
                        continue
                    if k in shift_data[i] or j in shift_data[search_direction]:
                        continue
                    if in_conflict(search_direction, j, shift_data) or in_conflict(i, k, shift_data):
                        continue
                    node_a_prev_score = o_score
                    node_b_prev_score = opt_eval_at_coord(shift_data, driver_shifts, search_direction, k)
                    if (node_a_prev_score == 1) and (node_b_prev_score == 1):
                        continue
                    a_type = my_type
                    b_type = get_stype(search_direction)
                    if (node_a_prev_score == 1):
                        if (driver_shifts[j]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            continue
                    elif (node_b_prev_score == 1):
                        if (driver_shifts[k]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            test_eval = 0
                    else:
                        if (a_type == b_type):
                            test_eval = 0
                        else:
                            test_eval = 2
                    print 'eval_score: %f' % test_eval
                    if (test_eval > eval_score):
                        cand_coords = [search_direction, k]
                        eval_score = test_eval
                        if (test_eval == 2.0):
                            break
                else:
                    search_direction = (search_direction + 1) % 28
                    continue
                break
            if (eval_score > o_score):
                print 'doing a swap: ',
                print cand_coords,
                shift_data[i].remove(j)
                shift_data[i].add(cand_coords[1])
                shift_data[cand_coords[0]].add(j)
                shift_data[cand_coords[0]].remove(cand_coords[1])
                if j in recheck[i]:
                    recheck[i].remove(j)
                if cand_coords[1] in recheck[cand_coords[0]]:
                    recheck[cand_coords[0]].remove(cand_coords[1])
                recheck[cand_coords[0]].add(j)
                recheck[i].add(cand_coords[1])
            else:
                coord = i * 10000 + j
                skip.add(coord)
    if first_run:
        shift_data = optimize_schedule(shift_data, driver_shifts, recheck)
    return shift_data

def opt_eval_at_coord(shift_data, driver_shifts, i, j):
    node = j
    if in_conflict(i, node, shift_data):
        return float('-inf')
    else:
        s_type = get_stype(i)
        d_pref = driver_shifts[node]['type']
        if (s_type == 0 and d_pref == 'night') or (s_type == 1 and d_pref == 'day') or (d_pref == 'any'):
            return 1
        else:
            return 0

There's nothing that would obviously slow these functions down, and indeed they aren't slow. They just get called a lot. You say you're using a brute force algorithm - can you write an algorithm that doesn't try every possible combination? Or is there a more efficient way of doing it, like storing the data by driver rather than by shift?
Of course, if you need instant speedups, it might benefit from running in an interpreter like PyPy, or using Cython to convert critical parts to C.
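As a rough illustration of "storing the data by driver rather than by shift", the sketch below inverts a shift-indexed structure into a driver-indexed one. It assumes shift_data is a sequence of sets of driver ids indexed by shift number, which is a guess based on how the posted code uses it; index_by_driver and the toy data are hypothetical names, not part of the original program.

```python
def index_by_driver(shift_data):
    """Invert shift -> set of drivers into driver -> set of shift indexes."""
    by_driver = {}
    for shift, drivers in enumerate(shift_data):
        for driver in drivers:
            by_driver.setdefault(driver, set()).add(shift)
    return by_driver


# Toy data (assumption): three shifts, drivers identified by small integers.
toy_shift_data = [{1, 2}, {2, 3}, {1}]
by_driver = index_by_driver(toy_shift_data)
# by_driver[1] now holds every shift driver 1 works, so a conflict check
# only needs to look at that driver's own shifts.
```

Whether this helps depends on how in_conflict is written; it pays off only if conflicts are decided per driver.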

Hmm. Interesting and fun-looking problem. I will have to look at it more. For now, I have this to offer: Why are you introducing floats? I would do get_stype() as follows:
def get_stype(k):
    if k % 4 < 2:
        return 0
    return 1
It's not a massive speedup, but it's quicker (and simpler). Also, you don't have to do the mod 28 whenever you're feeding get_stype, because that is already taken care of by the mod 4 in get_stype.
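The claim that the mod 28 is redundant follows from 28 being a multiple of 4. A quick check, written as a sketch that runs on Python 2 and 3 (the range of k values tested is an arbitrary choice):

```python
def get_stype(k):
    if k % 4 < 2:
        return 0
    return 1


# Collect every k for which reducing mod 28 first would change the answer.
mismatches = [k for k in range(-56, 112)
              if get_stype(k % 28) != get_stype(k)]
```

An empty mismatches list means get_stype(search_direction) gives the same result whether or not the caller has already wrapped the index.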
If there are significant improvements to be had, they will come in the form of a better algorithm. (I'm not saying that your algorithm is bad, or that there is any better one. I haven't really spent enough time looking at it. But if there isn't a better algorithm to be found, then further significant speed increases will have to come from using PyPy, Cython, Shed Skin, or rewriting in a different (faster) language altogether.)

I don't think your problem is the time it takes to run those two functions. Notice that the percall value for both functions is 0.000. This means that each invocation takes less than 1 millisecond.
I think your problem is the number of times the functions are called. A function call in Python is expensive. For example, calling a function that does nothing 57,662,556 times takes 7.15 seconds on my machine:
>>> from timeit import Timer
>>> t = Timer("t()", setup="def t(): pass")
>>> t.timeit(57662556)
7.159075975418091
One thing I'd be curious about is the shift_data variable. Are the values lists or dicts?
driver in shift_data[shift-2-nudge]
The in will take O(N) time if it's a list but O(1) time if it's a dict.
EDIT: Since shift_data values are sets, that should be fine
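The list-versus-hash difference can be seen without timing noise by counting equality comparisons instead of measuring seconds. This is only a sketch: CountingKey and count_comparisons are hypothetical helpers, and a set is used in place of a dict because that is what shift_data holds.

```python
class CountingKey(object):
    """Hypothetical int wrapper that counts how often == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


def count_comparisons(container, probe):
    """Return (found, number of __eq__ calls) for `probe in container`."""
    CountingKey.comparisons = 0
    found = probe in container
    return found, CountingKey.comparisons


n = 1000
drivers = [CountingKey(v) for v in range(n)]
probe = CountingKey(n - 1)  # equal to the last element, but a distinct object

list_result = count_comparisons(drivers, probe)      # scans the whole list
set_result = count_comparisons(set(drivers), probe)  # jumps to one hash slot
```

The list lookup compares against every element until it finds a match, while the set lookup only compares against entries whose hash matches.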

It seems to me that swapping between the two day shifts or between the two night shifts will never help. It won't change how well the drivers like the shifts, and it won't change how those shifts conflict with other shifts.
So I think you should be able to plan only two shifts initially, day and night, and only afterwards split the drivers assigned to each into the two actual shifts.

KenKen puzzle addends: REDUX A (corrected) non-recursive algorithm

This question relates to those parts of the KenKen Latin Square puzzles which ask you to find all possible combinations of ncells numbers with values x such that 1 <= x <= maxval and x(1) + ... + x(ncells) = targetsum. Having tested several of the more promising answers, I'm going to award the answer-prize to Lennart Regebro, because:
his routine is as fast as mine (+-5%), and
he pointed out that my original routine had a bug somewhere, which led me to see what it was really trying to do. Thanks, Lennart.
chrispy contributed an algorithm that seems equivalent to Lennart's, but 5 hrs later, sooo, first to the wire gets it.
A remark: Alex Martelli's bare-bones recursive algorithm is an example of making every possible combination and throwing them all at a sieve and seeing which go through the holes. This approach takes 20+ times longer than Lennart's or mine. (Jack up the input to max_val = 100, n_cells = 5, target_sum = 250 and on my box it's 18 secs vs. 8+ mins.) Moral: Not generating every possible combination is good.
Another remark: Lennart's and my routines generate the same answers in the same order. Are they in fact the same algorithm seen from different angles? I don't know.
Something occurs to me. If you sort the answers, starting, say, with (8,8,2,1,1) and ending with (4,4,4,4,4) (what you get with max_val=8, n_cells=5, target_sum=20), the series forms kind of a "slowest descent", with the first ones being "hot" and the last one being "cold" and the greatest possible number of stages in between. Is this related to "informational entropy"? What's the proper metric for looking at it? Is there an algorithm that produces the combinations in descending (or ascending) order of heat? (This one doesn't, as far as I can see, although it's close over short stretches, looking at normalized std. dev.)
Here's the Python routine:
#!/usr/bin/env python
#filename: makeAddCombos.07.py -- stripped for StackOverflow
def initialize_combo( max_val, n_cells, target_sum):
    """returns combo
    Starting from left, fills combo to max_val or an intermediate value from 1 up.
    E.g.: Given max_val = 5, n_cells=4, target_sum = 11, creates [5,4,1,1].
    """
    combo = []
    #Put 1 in each cell.
    combo += [1] * n_cells
    need = target_sum - sum(combo)
    #Fill as many cells as possible to max_val.
    n_full_cells = need //(max_val - 1)
    top_up = max_val - 1
    for i in range( n_full_cells): combo[i] += top_up
    need = target_sum - sum(combo)
    # Then add the rest to next item.
    if need > 0:
        combo[n_full_cells] += need
    return combo
#def initialize_combo()
def scrunch_left( combo):
    """returns (new_combo,done)
    done Boolean; if True, ignore new_combo, all done;
    if False, new_combo is valid.
    Starts a new combo list. Scanning from right to left, looks for first
    element at least 2 greater than right-end element.
    If one is found, decrements it, then scrunches all available counts on its
    right up against its right-hand side. Returns the modified combo.
    If none found, (that is, either no step or single step of 1), process
    done.
    """
    new_combo = []
    right_end = combo[-1]
    length = len(combo)
    c_range = range(length-1, -1, -1)
    found_step_gt_1 = False
    for index in c_range:
        value = combo[index]
        if (value - right_end) > 1:
            found_step_gt_1 = True
            break
    if not found_step_gt_1:
        return ( new_combo,True)
    if index > 0:
        new_combo += combo[:index]
    ceil = combo[index] - 1
    new_combo += [ceil]
    new_combo += [1] * ((length - 1) - index)
    need = sum(combo[index:]) - sum(new_combo[index:])
    fill_height = ceil - 1
    ndivf = need // fill_height
    nmodf = need % fill_height
    if ndivf > 0:
        for j in range(index + 1, index + ndivf + 1):
            new_combo[j] += fill_height
    if nmodf > 0:
        new_combo[index + ndivf + 1] += nmodf
    return (new_combo, False)
#def scrunch_left()
def make_combos_n_cells_ge_two( combos, max_val, n_cells, target_sum):
    """
    Build combos, list of tuples of 2 or more addends.
    """
    combo = initialize_combo( max_val, n_cells, target_sum)
    combos.append( tuple( combo))
    while True:
        (combo, done) = scrunch_left( combo)
        if done:
            break
        else:
            combos.append( tuple( combo))
    return combos
#def make_combos_n_cells_ge_two()
if __name__ == '__main__':
    combos = []
    max_val = 8
    n_cells = 5
    target_sum = 20
    if n_cells == 1: combos.append( (target_sum,))
    else:
        combos = make_combos_n_cells_ge_two( combos, max_val, n_cells, target_sum)
    import pprint
    pprint.pprint( combos)
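Returning to the "heat" remark above: one way to put a number on it is the standard deviation of a combination divided by its mean. This is only a sketch under that assumption; the question says "normalized std. dev." without defining it, and heat is a hypothetical name.

```python
import math


def heat(combo):
    """Population standard deviation of combo divided by its mean (an assumed metric)."""
    mean = sum(combo) / float(len(combo))
    variance = sum((x - mean) ** 2 for x in combo) / len(combo)
    return math.sqrt(variance) / mean


hot = heat((8, 8, 2, 1, 1))   # most uneven combination in the example
cold = heat((4, 4, 4, 4, 4))  # perfectly even, so the metric is zero
```

Sorting a list of combos with key=heat would then show how closely the generated order follows this particular metric.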

Your algorithm seems pretty good at first blush, and I don't think OO or another language would improve the code. I can't say if recursion would have helped but I admire the non-recursive approach. I bet it was harder to get working and it's harder to read but it likely is more efficient and it's definitely quite clever. To be honest I didn't analyze the algorithm in detail but it certainly looks like something that took a long while to get working correctly. I bet there were lots of off-by-1 errors and weird edge cases you had to think through, eh?
Given all that, basically all I tried to do was pretty up your code as best I could by replacing the numerous C-isms with more idiomatic Python-isms. Often times what requires a loop in C can be done in one line in Python. Also I tried to rename things to follow Python naming conventions better and cleaned up the comments a bit. Hope I don't offend you with any of my changes. You can take what you want and leave the rest. :-)
Here are the notes I took as I worked:
Changed the code that initializes tmp to a bunch of 1's to the more idiomatic tmp = [1] * n_cells.
Changed for loop that sums up tmp_sum to idiomatic sum(tmp).
Then replaced all the loops with a tmp = <list> + <list> one-liner.
Moved raise doneException to init_tmp_new_ceiling and got rid of the succeeded flag.
The check in init_tmp_new_ceiling actually seems unnecessary. Removing it, the only raises left were in make_combos_n_cells, so I just changed those to regular returns and dropped doneException entirely.
Normalized mix of 4 spaces and 8 spaces for indentation.
Removed unnecessary parentheses around your if conditions.
tmp[p2] - tmp[p1] == 0 is the same thing as tmp[p2] == tmp[p1].
Changed while True: if new_ceiling_flag: break to while not new_ceiling_flag.
You don't need to initialize variables to 0 at the top of your functions.
Removed combos list and changed function to yield its tuples as they are generated.
Renamed tmp to combo.
Renamed new_ceiling_flag to ceiling_changed.
And here's the code for your perusal:
def initial_combo(ceiling=5, target_sum=13, num_cells=4):
    """
    Returns a list of possible addends, probably to be modified further.
    Starts a new combo list, then, starting from left, fills items to ceiling
    or intermediate between 1 and ceiling or just 1. E.g.:
    Given ceiling = 5, target_sum = 13, num_cells = 4: creates [5,5,2,1].
    """
    num_full_cells = (target_sum - num_cells) // (ceiling - 1)
    combo = [ceiling] * num_full_cells \
          + [1] * (num_cells - num_full_cells)
    if num_cells > num_full_cells:
        combo[num_full_cells] += target_sum - sum(combo)
    return combo
def all_combos(ceiling, target_sum, num_cells):
    # p0 points at the rightmost item and moves left under some conditions
    # p1 starts out at rightmost items and steps left
    # p2 starts out immediately to the left of p1 and steps left as p1 does
    # So, combo[p2] and combo[p1] always point at a pair of adjacent items.
    # d combo[p2] - combo[p1]; immediate difference
    # cd combo[p2] - combo[p0]; cumulative difference
    # The ceiling decreases by 1 each iteration.
    while True:
        combo = initial_combo(ceiling, target_sum, num_cells)
        yield tuple(combo)
        ceiling_changed = False
        # Generate all of the remaining combos with this ceiling.
        while not ceiling_changed:
            p2, p1, p0 = -2, -1, -1
            while combo[p2] == combo[p1] and abs(p2) <= num_cells:
                # 3,3,3,3
                if abs(p2) == num_cells:
                    return
                p2 -= 1
                p1 -= 1
                p0 -= 1
            cd = 0
            # slide_ptrs_left loop
            while abs(p2) <= num_cells:
                d = combo[p2] - combo[p1]
                cd += d
                # 5,5,3,3 or 5,5,4,3
                if cd > 1:
                    if abs(p2) < num_cells:
                        # 5,5,3,3 --> 5,4,4,3
                        if d > 1:
                            combo[p2] -= 1
                            combo[p1] += 1
                        # d == 1; 5,5,4,3 --> 5,4,4,4
                        else:
                            combo[p2] -= 1
                            combo[p0] += 1
                        yield tuple(combo)
                    # abs(p2) == num_cells; 5,4,4,3
                    else:
                        ceiling -= 1
                        ceiling_changed = True
                    # Resume at make_combo_same_ceiling while
                    # and follow branch.
                    break
                # 4,3,3,3 or 4,4,3,3
                elif cd == 1:
                    if abs(p2) == num_cells:
                        return
                    p1 -= 1
                    p2 -= 1
if __name__ == '__main__':
    print list(all_combos(ceiling=6, target_sum=12, num_cells=4))

First of all, I'd use variable names that mean something, so that the code becomes comprehensible. Then, once I understood the problem, it's clearly a recursive one: once you have chosen one number, finding the possible values for the rest of the squares is exactly the same problem, just with different values.
So I would do it like this:
from __future__ import division
from math import ceil
def make_combos(max_val,target_sum,n_cells):
    combos = []
    # The highest possible value of the next cell is whichever is
    # smaller: max_val, or the target_sum minus the number
    # of remaining cells (as you can't enter 0).
    highest = min(max_val, target_sum - n_cells + 1)
    # The lowest is the lowest number you can have that will add up to
    # target_sum if you multiply it with n_cells.
    lowest = int(ceil(target_sum/n_cells))
    for x in range(highest, lowest-1, -1):
        if n_cells == 1: # This is the last cell, no more recursion.
            combos.append((x,))
            break
        # Recurse to get the next cell:
        # Set the max to x (or we'll get duplicates like
        # (6,3,2,1) and (6,2,3,1), which is pointless).
        # Reduce the target_sum with x to keep the sum correct.
        # Reduce the number of cells with 1.
        for combo in make_combos(x, target_sum-x, n_cells-1):
            combos.append((x,)+combo)
    return combos
if __name__ == '__main__':
    import pprint
    # And by using pprint the output gets easier to read
    pprint.pprint(make_combos( 6,12,4))
I also notice that your solution still seems buggy. For the values max_val=8, target_sum=20 and n_cells=5 your code doesn't find the solution (8,6,4,1,1,), as an example. I'm not sure if that means I've missed a rule in this or not, but as I understand the rules that should be a valid option.
Here's a version using generators. It saves a couple of lines, and memory if the values are really big, but like recursion, generators can be tricky to "get".
from __future__ import division
from math import ceil
def make_combos(max_val,target_sum,n_cells):
    highest = min(max_val, target_sum - n_cells + 1)
    lowest = int(ceil(target_sum/n_cells))
    for x in xrange(highest, lowest-1, -1):
        if n_cells == 1:
            yield (x,)
            break
        for combo in make_combos(x, target_sum-x, n_cells-1):
            yield (x,)+combo
if __name__ == '__main__':
    import pprint
    pprint.pprint(list(make_combos( 6,12,4)))
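The point above about (8,6,4,1,1) can be checked directly against the rules as stated in the question (every value between 1 and max_val, exactly n_cells values, sum equal to target_sum). A minimal sketch; is_valid_combo is a hypothetical helper, not part of either routine:

```python
def is_valid_combo(combo, max_val, n_cells, target_sum):
    """True if combo satisfies the addend rules from the question."""
    return (len(combo) == n_cells
            and all(1 <= x <= max_val for x in combo)
            and sum(combo) == target_sum)


disputed_is_valid = is_valid_combo((8, 6, 4, 1, 1), max_val=8, n_cells=5, target_sum=20)
too_large_is_valid = is_valid_combo((9, 5, 4, 1, 1), max_val=8, n_cells=5, target_sum=20)
```

A routine that never emits a tuple for which this returns True is skipping part of the answer set.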

Here's the simplest recursive solution that I can think of to "find all possible combinations of n numbers with values x such that 1 <= x <= max_val and x(1) + ... + x(n) = target". I'm developing it from scratch. Here's a version without any optimization at all, just for simplicity:
def apcnx(n, max_val, target, xsofar=(), sumsofar=0):
    if n==0:
        if sumsofar==target:
            yield xsofar
        return
    if xsofar:
        minx = xsofar[-1] - 1
    else:
        minx = 0
    for x in xrange(minx, max_val):
        for xposs in apcnx(n-1, max_val, target, xsofar + (x+1,), sumsofar+x+1):
            yield xposs
for xs in apcnx(4, 6, 12):
    print xs
The base case n==0 (where we can't yield any more numbers) either yields the tuple so far if it satisfies the condition, or nothing, then finishes (returns).
If we're supposed to yield longer tuples than we've built so far, the if/else makes sure we only yield non-decreasing tuples, to avoid repetition (you did say "combination" rather than "permutation").
The for tries all possibilities for "this" item and loops over whatever the next-lower-down level of recursion is still able to yield.
The output I see is:
(1, 1, 4, 6)
(1, 1, 5, 5)
(1, 2, 3, 6)
(1, 2, 4, 5)
(1, 3, 3, 5)
(1, 3, 4, 4)
(2, 2, 2, 6)
(2, 2, 3, 5)
(2, 2, 4, 4)
(2, 3, 3, 4)
(3, 3, 3, 3)
which seems correct.
There are a bazillion possible optimizations, but, remember:
First make it work, then make it fast
I corresponded with Kent Beck to properly attribute this quote in "Python in a Nutshell", and he tells me he got it from his dad, whose job was actually unrelated to programming;-).
In this case, it seems to me that the key issue is understanding what's going on, and any optimization might interfere, so I'm going all out for "simple and understandable"; we can, if need be!, optimize the socks off it once the OP confirms they can understand what's going on in this sheer, unoptimized version!
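For readers who do want to see one of those optimizations, the sketch below adds a single pruning test to the same recursion: it abandons a branch as soon as the remaining cells cannot possibly reach the target. This is an illustration rather than the answer's own code; apcnx_pruned is a hypothetical name, and range is used so that it runs on Python 2 and 3.

```python
def apcnx_pruned(n, max_val, target, xsofar=(), sumsofar=0):
    """Yield non-decreasing n-tuples of values in 1..max_val that sum to target."""
    if n == 0:
        if sumsofar == target:
            yield xsofar
        return
    minx = xsofar[-1] if xsofar else 1
    remaining = target - sumsofar
    # Each of the n remaining cells is at least minx and at most max_val,
    # so sums outside [n * minx, n * max_val] can never be reached.
    if remaining < n * minx or remaining > n * max_val:
        return
    for x in range(minx, max_val + 1):
        for xposs in apcnx_pruned(n - 1, max_val, target,
                                  xsofar + (x,), sumsofar + x):
            yield xposs


pruned = list(apcnx_pruned(4, 6, 12))
```

The tuples come out in the same order as in the unoptimized version, because pruning only removes branches that would have yielded nothing.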

Sorry to say, your code is kind of long and not particularly readable. If you can try to summarize it somehow, maybe someone can help you write it more clearly.
As for the problem itself, my first thought would be to use recursion. (For all I know, you're already doing that. Sorry again for my inability to read your code.) Think of a way that you can reduce the problem to a smaller easier version of the same problem, repeatedly, until you have a trivial case with a very simple answer.
To be a bit more concrete, you have these three parameters, max_val, target_sum, and n_cells. Can you set one of those numbers to some particular value, in order to give you an extremely simple problem requiring no thought at all? Once you have that, can you reduce the slightly harder version of the problem to the already solved one?
EDIT: Here is my code. I don't like the way it does de-duplication. I'm sure there's a more Pythonic way. Also, it disallows using the same number twice in one combination. To undo this behavior, just take out the line if n not in numlist:. I'm not sure if this is completely correct, but it seems to work and is (IMHO) more readable. You could easily add memoization and that would probably speed it up quite a bit.
def get_combos(max_val, target, n_cells):
    if target <= 0:
        return []
    if n_cells == 1:
        if target > max_val:
            return []
        else:
            return [[target]]
    else:
        combos = []
        for n in range(1, max_val+1, 1):
            for numlist in get_combos(max_val, target-n, n_cells-1):
                if n not in numlist:
                    combos.append(numlist + [n])
        return combos
def deduplicate(combos):
    for numlist in combos:
        numlist.sort()
    answer = [tuple(numlist) for numlist in combos]
    return set(answer)
def kenken(max_val, target, n_cells):
    return deduplicate(get_combos(max_val, target, n_cells))
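The memoization mentioned above could look roughly like the sketch below. It is a variant rather than a drop-in change: it returns immutable tuples so results can be cached safely, it passes the chosen value down as the new maximum so no de-duplication step is needed, and it allows repeated numbers. combos_memo is a hypothetical name, and functools.lru_cache requires Python 3.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def combos_memo(max_val, target, n_cells):
    """Return a tuple of non-increasing n_cells-tuples in 1..max_val summing to target."""
    if n_cells < 1:
        return ()
    if n_cells == 1:
        return ((target,),) if 1 <= target <= max_val else ()
    # Limiting later cells to at most v produces each multiset exactly once.
    return tuple((v,) + rest
                 for v in range(1, max_val + 1)
                 for rest in combos_memo(v, target - v, n_cells - 1))


memo_result = combos_memo(6, 12, 4)
```

Sub-problems such as combos_memo(3, 6, 2) are computed once and reused, which is where the speedup would come from on larger inputs.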

First of all, I am learning Python myself so this solution won't be great but this is just an attempt at solving this. I have tried to solve it recursively and I think a recursive solution would be ideal for this kind of problem although THAT recursive solution might not be this one:
def GetFactors(maxVal, noOfCells, targetSum):
    l = []
    while(maxVal != 0):
        remCells = noOfCells - 1
        if(remCells > 2):
            retList = GetFactors(maxVal, remCells, targetSum - maxVal)
            #Append the returned List to the original List
            #But first, add the maxVal to the start of every elem of returned list.
            for i in retList:
                i.insert(0, maxVal)
            l.extend(retList)
        else:
            remTotal = targetSum - maxVal
            for i in range(1, remTotal/2 + 1):
                itemToInsert = remTotal - i
                if (i > maxVal or itemToInsert > maxVal):
                    continue
                l.append([maxVal, i, remTotal - i])
        maxVal -= 1
    return l
if __name__ == "__main__":
    l = GetFactors(5, 5, 15)
    print l

Here is a simple solution in C/C++:
const int max = 6;
int sol[N_CELLS];
void enum_solutions(int target, int n, int min) {
    if (target == 0 && n == 0)
        report_solution(); /* sol[0]..sol[N_CELLS-1] is a solution */
    if (target <= 0 || n == 0) return; /* nothing further to explore */
    for (int i = min; i <= max; i++) {
        sol[n - 1] = i; /* remember the value chosen for this cell */
        enum_solutions(target - i, n - 1, i);
    }
}
enum_solutions(12, 4, 1);

A little bit off-topic, but it might still help when programming KenKen.
I got good results using the DLX algorithm for solving Killer Sudoku (very similar to KenKen: it has cages, but only sums). It took less than a second for most problems, and it was implemented in MATLAB.
For reference, see this forum thread:
http://www.setbb.com/phpbb/viewtopic.php?t=1274&highlight=&mforum=sudoku
For Killer Sudoku itself, look at Wikipedia (I can't post a second hyperlink; damn spammers).

Here is a naive, but succinct, solution using generators:
import itertools
def descending(v):
    """Decide if a square contains values in descending order"""
    return list(reversed(v)) == sorted(v)
def latinSquares(max_val, target_sum, n_cells):
    """Return all descending n_cells-dimensional squares,
    no cell larger than max_val, sum equal to target_sum."""
    possibilities = itertools.product(range(1,max_val+1),repeat=n_cells)
    for square in possibilities:
        if descending(square) and sum(square) == target_sum:
            yield square
I could have optimized this code by directly enumerating the list of descending grids, but I find itertools.product much clearer for a first-pass solution. Finally, calling the function:
for m in latinSquares(6, 12, 4):
    print m
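The optimization alluded to above, enumerating only the descending tuples instead of filtering the full product, can be sketched with itertools.combinations_with_replacement. This is one possible reading of "directly enumerating", not the answer's own code; descending_squares is a hypothetical name.

```python
import itertools


def descending_squares(max_val, target_sum, n_cells):
    """Yield non-increasing n_cells-tuples of values in 1..max_val with the given sum."""
    # Feeding the values in descending order makes every emitted tuple
    # non-increasing, so no separate ordering test is needed.
    values = range(max_val, 0, -1)
    for square in itertools.combinations_with_replacement(values, n_cells):
        if sum(square) == target_sum:
            yield square


direct = list(descending_squares(6, 12, 4))
```

For max_val=6 and n_cells=4 this inspects 126 candidates instead of the 1296 that itertools.product generates, though it still filters by sum rather than pruning.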

And here is another recursive, generator-based solution, but this time using some simple math to calculate ranges at each step, avoiding needless recursion:
def latinSquares(max_val, target_sum, n_cells):
    if n_cells == 1:
        assert(max_val >= target_sum >= 1)
        return ((target_sum,),)
    else:
        lower_bound = max(-(-target_sum / n_cells), 1)
        upper_bound = min(max_val, target_sum - n_cells + 1)
        assert(lower_bound <= upper_bound)
        return ((v,) + w for v in xrange(upper_bound, lower_bound - 1, -1)
                         for w in latinSquares(v, target_sum - v, n_cells - 1))
This code will fail with an AssertionError if you supply parameters that are impossible to satisfy; this is a side-effect of my "correctness criterion" that we never do an unnecessary recursion. If you don't want that side-effect, remove the assertions.
Note the use of -(-x/y) to round up after division. There may be a more pythonic way to write that. Note also I'm using generator expressions instead of yield.
for m in latinSquares(6,12,4):
    print m
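On the -(-x/y) idiom: it relies on integer division rounding toward negative infinity, so negating before and after turns a floor into a ceiling. Written with // it behaves the same on Python 2 and 3. A small sketch, with ceil_div as a hypothetical helper name:

```python
def ceil_div(x, y):
    """Ceiling of x / y for positive y, using only integer arithmetic."""
    return -(-x // y)


samples = [(12, 4), (13, 4), (20, 5), (1, 3)]
results = [ceil_div(x, y) for x, y in samples]
```

Staying in integers avoids the float round-trip that int(ceil(x / y)) goes through.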
