Updating table fields

Updating table fields - python

There is an odoo system with a timesheet module (self-made) in it.
How it works: worker came — time of the arrival has written in the timesheet — everything's good.
But there is a problem: employees, responsible for making such records, are using different timing formats: some of them are using standart HH:MM (e.g. 10:30) and some of them are using tenths HH:T (e.g. 10.5, which means the same 10:30 or even 10.125 (10:08)), so I had to make a convertation function.
Job's done, it works, but I bet there is a way to optimize it. At least, the last part of it.
#api.one
def time_button (self):
def ftohhmm(a):
if a:
a = re.sub(',' , '.' , a)
if (re.search ('^\-?\d+((,|\.)\d+)?$',a) >= 0):
if float(a) <24:
a = float(a) * 60
minutes = a%60
hours = a/60
if int(round(minutes)) < 10:
return str(int(hours))+":0"+str(int(round(minutes)))
else:
return str(int(hours))+":"+str(int(round(minutes)))
return a
if self.format:
for i in self.ids_string:
i.hours1=ftohhmm(i.hours1)
i.hours2=ftohhmm(i.hours2)
i.hours3=ftohhmm(i.hours3)
i.hours4=ftohhmm(i.hours4)
i.hours5=ftohhmm(i.hours5)
i.hours6=ftohhmm(i.hours6)
i.hours7=ftohhmm(i.hours7)
i.hours8=ftohhmm(i.hours8)
i.hours9=ftohhmm(i.hours9)
i.hours10=ftohhmm(i.hours10)
i.hours11=ftohhmm(i.hours11)
i.hours12=ftohhmm(i.hours12)
i.hours13=ftohhmm(i.hours13)
i.hours14=ftohhmm(i.hours14)
i.hours15=ftohhmm(i.hours15)
i.hours16=ftohhmm(i.hours16)
i.hours17=ftohhmm(i.hours17)
i.hours18=ftohhmm(i.hours18)
i.hours19=ftohhmm(i.hours19)
i.hours20=ftohhmm(i.hours20)
i.hours21=ftohhmm(i.hours21)
i.hours22=ftohhmm(i.hours22)
i.hours23=ftohhmm(i.hours23)
i.hours24=ftohhmm(i.hours24)
i.hours25=ftohhmm(i.hours25)
i.hours26=ftohhmm(i.hours26)
i.hours27=ftohhmm(i.hours27)
i.hours28=ftohhmm(i.hours28)
i.hours29=ftohhmm(i.hours29)
i.hours30=ftohhmm(i.hours30)
i.hours31=ftohhmm(i.hours31)
Hours1-31 are the columns for every day. Rows are for workers. Cells at the intersections contain exact time when worker came.
Any advise of how to optimize it would be great. Thanks!

for i in self.ids_string:
for j in range(1, 32):
if hasattr(i, "hours%s" % j):
a = getattr(i, "hours%s" % j)
setattr(i, "hours%s" %j, ftohhmm(a))
maybe this answer is your need.

Related

make big process on graph with python parallelised

i'm working on graphs and big dataset of complex network's. i run SIR algorithm on them with ndlib library.
but each iteration takes something like 1Sec and it make code takes 10-12 h to complete .
i was wondering is there any way to make it parallelised ?
the code is like down bellow
this line of the code is core :
sir = model.infected_SIR_MODEL(it, infectionList, False)
is there any simple method to make it run on multi thread or parallelised ?
count = 500
for i in numpy.arange(1, count, 1):
for it in model.get_nodes():
sir = model.infected_SIR_MODEL(it, infectionList, False)
each iteration :
for u in self.graph.nodes():
u_status = self.status[u]
eventp = np.random.random_sample()
neighbors = self.graph.neighbors(u)
if isinstance(self.graph, nx.DiGraph):
neighbors = self.graph.predecessors(u)
if u_status == 0:
infected_neighbors = len([v for v in neighbors if self.status[v] == 1])
if eventp < self.BetaList[u] * infected_neighbors:
actual_status[u] = 1
elif u_status == 1:
if eventp < self.params['model']['gamma']:
actual_status[u] = 2

So, if the iterations are independent, then I don't see the point of iteration over count=500. Either way the multiprocessing library might be of interest to you.
I've prepared 2 stub solutions (i.e. alter to your exact needs).
The first expects that every input is static (the changes in solutions as far as I understand the OP's question raise from the random state generation inside each iteration). With the second, you can update the input data between iterations of i. I've not tried the code as I don't have the model so it might not work directly.
import multiprocessing as mp
# if everything is independent (eg. "infectionList" is static and does not change during the iterations)
def worker(model, infectionList):
sirs = []
for it in model.get_nodes():
sir = model.infected_SIR_MODEL(it, infectionList, False)
sirs.append(sir)
return sirs
count = 500
infectionList = []
model = "YOUR MODEL INSTANCE"
data = [(model, infectionList) for _ in range(1, count+1)]
with mp.Pool() as pool:
results = pool.starmap(worker, data)
The second proposed solution if "infectionList" or something else gets updated in each iteration of "i":
def worker2(model, it, infectionList):
sir = model.infected_SIR_MODEL(it, infectionList, False)
return sir
with mp.Pool() as pool:
for i in range(1, count+1):
data = [(model, it, infectionList) for it in model.get_nodes()]
results = pool.starmap(worker2, data)
# process results, update something go to next iteration....
Edit: Updated the answer to separate proposals more clearly.

Schedule times overwritten with Schedule module when called via array in Python

I'm attempting to build an alarm application but I'm struggling to get the 'schedule' module to function how I'd like it to. The problem is that I can't seem to schedule multiple alarms for one day while calling the day attribute via an array.
Example of how you'd normally schedule multiple times for one day:
schedule.every().sunday.at('17:25').do(job)
schedule.every().sunday.at('17:30').do(job)
schedule.every().sunday.at('17:35').do(job)
This works fine, but I really want to load times with a for loop so I don't have a giant if statement, and so that I can load times dynamically:
dayArray = [
schedule.every().sunday,
schedule.every().monday,
schedule.every().tuesday,
schedule.every().wednesday,
schedule.every().thursday,
schedule.every().friday,
schedule.every().saturday
]
for i in range(1, xlsxAlarmSheet.ncols):
for j in range(1, 8):
if(str(xlsxAlarmSheet.cell_value(j, i)) != '0'):
dayArray[j - 1].at(str(xlsxAlarmSheet.cell_value(j, i))[:2] + ':' + str(xlsxAlarmSheet.cell_value(j, i))[2:]).do(job)
The days are being loaded from an array and the times from an xlsx file via the XLRD module. The only problem is the alarms are overwriting each other somehow when I schedule multiple times for one day. If I schedule 3 times for Sunday with this method for example, only the third scheduled time fires off. I thought it must be because when I load the days into an array they are no longer unique somehow, so I tried doing a 2-dimensional array:
dayArray = [[
schedule.every().sunday,
schedule.every().monday,
schedule.every().tuesday,
schedule.every().wednesday,
schedule.every().thursday,
schedule.every().friday,
schedule.every().saturday
]] * (xlsxAlarmSheet.ncols - 1)
for i in range(1, xlsxAlarmSheet.ncols):
for j in range(1, 8):
if(str(xlsxAlarmSheet.cell_value(j, i)) != '0'):
dayArray[i - 1][j - 1].at(str(xlsxAlarmSheet.cell_value(j, i))[:2] + ':' + str(xlsxAlarmSheet.cell_value(j, i))[2:]).do(job)
With no luck... the times are still overwriting each other, any ideas?

Disclaimer: 0 Python experience, only JavaScript.... But...
Try to not call the function within the array of objects like that:
dayArray = [[
schedule.every().sunday,
...
Instead just have the name of the day (the only part which is varying)
dayArray = [[
'sunday', 'monday', ...
Then in the for each use that string name when you build the function
for each .... { schedule.every()[dayArray[i]].at(...).do(...) }
My random guess is that it's somehow getting called incorrectly when stored that way, just store the part that is different (the day name), since you can just call the rest of that function in the loop (since it's the same for all).
Hopefully that makes sense. No idea if it will work, just something to try. Good luck.

I think you may need to use an index to store your values. This link might help.
https://treyhunner.com/2016/04/how-to-loop-with-indexes-in-python/#What_if_we_need_indexes?

In another question I posted, I was originally trying to substitute the day attribute with a string, like Vig is suggesting with his answer to this post. This wasn't working for me, so I ended up storing objects in an array, as Prune suggested on my original question.
However, Prune also posted a link to an example (example 7) in which the entire call to schedule a time was stored in a string and then called via eval(), which seems to work.
So this is what I ended up doing:
dayArray = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']
dayTimeArray = []
for i in range(1, xlsxAlarmSheet.ncols):
for j in range(1, 8):
if(str(xlsxAlarmSheet.cell_value(j, i)) != '0'):
dayTimeArray.append(
"schedule.every().{}.at('{}').do(StartSubProcess)".format(
dayArray[j - 1],
str(xlsxAlarmSheet.cell_value(j, i))[:2] + ':' + str(xlsxAlarmSheet.cell_value(j ,i))[2:]
)
)
for i in range(0, len(dayTimeArray)):
eval(dayTimeArray[i])

Massively parallel search operation with Dask, Distributed

I created a demo problem when testing out auto-scalling Dask Distributed implementation on Kubernetes and AWS and I I'm not sure I'm tackling the problem correctly.
My scenario is given a md5 hash of a string (representing a password) find the original string. I hit three main problems.
A) the parameter space is massive and trying to create a dask bag with 2.8211099e+12 members caused memory issues (hence the 'explode' function you'll see in the sample code below).
B) Clean exit on early find. I think using take(1, npartitions=-1) will achieve this but I wasn't sure. Originally I raised an exception raise Exception("%s is your answer' % test_str) which worked but felt "dirty"
C) Given this is long running and sometimes workers or AWS boxes die, how would it be best to store progress?
Example code:
import distributed
import math
import dask.bag as db
import hashlib
import dask
import os
if os.environ.get('SCHED_URL', False):
sched_url = os.environ['SCHED_URL']
client = distributed.Client(sched_url)
versions = client.get_versions(True)
dask.set_options(get=client.get)
difficulty = 'easy'
settings = {
'hard': (hashlib.md5('welcome1'.encode('utf-8')).hexdigest(),'abcdefghijklmnopqrstuvwxyz1234567890', 8),
'mid-hard': (hashlib.md5('032abgh'.encode('utf-8')).hexdigest(),'abcdefghijklmnop1234567890', 7),
'mid': (hashlib.md5('b08acd'.encode('utf-8')).hexdigest(),'0123456789abcdef', 6),
'easy': (hashlib.md5('0812'.encode('utf-8')).hexdigest(),'0123456789', 4)
}
hashed_pw, keyspace, max_guess_length = settings[difficulty]
def is_pw(guess):
return hashlib.md5(guess.encode('utf-8')).hexdigest() == hashed_pw
def guess(n):
guess = ''
size = len(keyspace)
while n>0 :
n -= 1
guess += keyspace[n % size];
n = math.floor(n / size);
return guess
def make_exploder(num_partitions, max_val):
"""Creates a function that maps a int to a range based on the number maximum value aimed for
and the number of partitions that are expected.
Used in this code used with map and flattent to take a short list
i.e 1->1e6 to a large one 1->1e20 in dask rather than on the host machine."""
steps = math.ceil(max_val / num_partitions)
def explode(partition):
return range(partition * steps, partition * steps + steps)
return explode
max_val = len(keyspace) ** max_guess_length # How many possiable password permutation
partitions = math.floor(max_val / 100)
partitions = partitions if partitions < 100000 else 100000 # split in to a maximum of 10000 partitions. Too many partitions caused issues, memory I think.
exploder = make_exploder(partitions, max_val) # Sort of the opposite of a reduce. make_exploder(10, 100)(3) => [30, 31, ..., 39]. Expands the problem back in to the full problem space.
print("max val: %s, partitions:%s" % (max_val, partitions))
search = db.from_sequence(range(partitions), npartitions=partitions).map(exploder).flatten().filter(lambda i: i <= max_val).map(guess).filter(is_pw)
search.take(1,npartitions=-1)
I find 'easy' works well locally, 'mid-hard' works well on our 6 to 8 * m4.2xlarge AWS cluster. But so far haven't got hard to work.

A) the parameter space is massive and trying to create a dask bag with 2.8211099e+12 members caused memory issues (hence the 'explode' function you'll see in the sample code below).
This depends strongly on how you arrange your elements into a bag. If each element is in its own partition then yes, this will certainly kill everything. 1e12 partitions is very expensive. I recommend keeping the number of partitions in the thousands or tens of thousands.
B) Clean exit on early find. I think using take(1, npartitions=-1) will achieve this but I wasn't sure. Originally I raised an exception raise Exception("%s is your answer' % test_str) which worked but felt "dirty"
If you want this then I recommend not using dask.bag, but instead using the concurrent.futures interface and in particular the as_completed iterator.
C) Given this is long running and sometimes workers or AWS boxes die, how would it be best to store progress?
Dask should be resilient to this as long as you can guarantee that the scheduler survives. If you use the concurrent futures interface rather than dask bag then you can also track intermediate results on the client process.

depth-first algorithm in python does not work

I have some project which I decide to do in Python. In brief: I have list of lists. Each of them also have lists, sometimes one-element, sometimes more. It looks like this:
rules=[
[[1],[2],[3,4,5],[4],[5],[7]]
[[1],[8],[3,7,8],[3],[45],[12]]
[[31],[12],[43,24,57],[47],[2],[43]]
]
The point is to compare values from numpy array to values from this rules (elements of rules table). We are comparing some [x][y] point to first element (e.g. 1 in first element), then, if it is true, value [x-1][j] from array with second from list and so on. Five first comparisons must be true to change value of [x][y] point. I've wrote sth like this (main function is SimulateLoop, order are switched because simulate2 function was written after second one):
def simulate2(self, i, j, w, rule):
data = Data(rule)
if w.world[i][j] in data.c:
if w.world[i-1][j] in data.n:
if w.world[i][j+1] in data.e:
if w.world[i+1][j] in data.s:
if w.world[i][j-1] in data.w:
w.world[i][j] = data.cc[0]
else: return
else: return
else: return
else: return
else: return
def SimulateLoop(self,w):
for z in range(w.steps):
for i in range(2,w.x-1):
for j in range(2,w.y-1):
for rule in w.rules:
self.simulate2(i,j,w,rule)
Data class:
class Data:
def __init__(self, rule):
self.c = rule[0]
self.n = rule[1]
self.e = rule[2]
self.s = rule[3]
self.w = rule[4]
self.cc = rule[5]
NumPy array is a object from World class. Rules is list as described above, parsed by function obtained from another program (GPL License).
To be honest it seems to work fine, but it does not. I was trying other possibilities, without luck. It is working, interpreter doesn't return any errors, but somehow values in array changing wrong. Rules are good because it was provided by program from which I've obtained parser for it (GPL license).
Maybe it will be helpful - it is Perrier's Loop, modified Langton's loop (artificial life).
Will be very thankful for any help!
)

I am not familiar with Perrier's Loop, but if you code something like famous "game life" you would have done simple mistake: store the next generation in the same array thus corrupting it.
Normally you store the next generation in temporary array and do copy/swap after the sweep, like in this sketch:
def do_step_in_game_life(world):
next_gen = zeros(world.shape) # <<< Tmp array here
Nx, Ny = world.shape
for i in range(1, Nx-1):
for j in range(1, Ny-1):
neighbours = sum(world[i-1:i+2, j-1:j+2]) - world[i,j]
if neighbours < 3:
next_gen[i,j] = 0
elif ...
world[:,:] = next_gen[:,:] # <<< Saving computed next generation

Worker/Timeslot permutation/constraint filtering algorithm

Hope you can help me out with this guys. It's not help with work -- it's for a charity of very hard working volunteers, who could really use a less confusing/annoying timetable system than what they currently have.
If anyone knows of a good third-party app which (certainly) automate this, that would almost as good. Just... please don't suggest random timetabling stuff such as the ones for booking classrooms, as I don't think they can do this.
Thanks in advance for reading; I know it's a big post. I'm trying to do my best to document this clearly though, and to show that I've made efforts on my own.
Problem
I need a worker/timeslot scheduling algorithm which generates shifts for workers, which meets the following criteria:
Input Data
import datetime.datetime as dt
class DateRange:
def __init__(self, start, end):
self.start = start
self.end = end
class Shift:
def __init__(self, range, min, max):
self.range = range
self.min_workers = min
self.max_workers = max
tue_9th_10pm = dt(2009, 1, 9, 22, 0)
wed_10th_4am = dt(2009, 1, 10, 4, 0)
wed_10th_10am = dt(2009, 1, 10, 10, 0)
shift_1_times = Range(tue_9th_10pm, wed_10th_4am)
shift_2_times = Range(wed_10th_4am, wed_10th_10am)
shift_3_times = Range(wed_10th_10am, wed_10th_2pm)
shift_1 = Shift(shift_1_times, 2,3) # allows 3, requires 2, but only 2 available
shift_2 = Shift(shift_2_times, 2,2) # allows 2
shift_3 = Shift(shift_3_times, 2,3) # allows 3, requires 2, 3 available
shifts = ( shift_1, shift_2, shift_3 )
joe_avail = [ shift_1, shift_2 ]
bob_avail = [ shift_1, shift_3 ]
sam_avail = [ shift_2 ]
amy_avail = [ shift_2 ]
ned_avail = [ shift_2, shift_3 ]
max_avail = [ shift_3 ]
jim_avail = [ shift_3 ]
joe = Worker('joe', joe_avail)
bob = Worker('bob', bob_avail)
sam = Worker('sam', sam_avail)
ned = Worker('ned', ned_avail)
max = Worker('max', max_avail)
amy = Worker('amy', amy_avail)
jim = Worker('jim', jim_avail)
workers = ( joe, bob, sam, ned, max, amy, jim )
Processing
From above, shifts and workers are the two main input variables to process
Each shift has a minimum and maximum number of workers needed. Filling the minimum requirements for a shift is crucial to success, but if all else fails, a rota with gaps to be filled manually is better than "error" :) The main algorithmic issue is that there shouldn't be unnecessary gaps, when enough workers are available.
Ideally, the maximum number of workers for a shift would be filled, but this is the lowest priority relative to other constraints, so if anything has to give, it should be this.
Flexible constraints
These are a little flexible, and their boundaries can be pushed a little if a "perfect" solution can't be found. This flexibility should be a last resort though, rather than being exploited randomly. Ideally, the flexibility would be configurable with a "fudge_factor" variable, or similar.
There is a minimum time period
between two shifts. So, a worker
shouldn't be scheduled for two shifts
in the same day, for instance.
There are a maximum number of shifts a
worker can do in a given time period
(say, a month)
There are a maximum number of certain
shifts that can be done in a month
(say, overnight shifts)
Nice to have, but not necessary
If you can come up with an algorithm which does the above and includes any/all of these, I'll be seriously impressed and grateful. Even an add-on script to do these bits separately would be great too.
Overlapping shifts. For instance,
it would be good to be able to specify
a "front desk" shift and a "back office"
shift that both occur at the same time.
This could be done with separate invocations
of the program with different shift data,
except that the constraints about scheduling
people for multiple shifts in a given time
period would be missed.
Minimum reschedule time period for workers specifiable
on a per-worker (rather than global) basis. For instance,
if Joe is feeling overworked or is dealing with personal issues,
or is a beginner learning the ropes, we might want to schedule him
less often than other workers.
Some automated/random/fair way of selecting staff to fill minimum
shift numbers when no available workers fit.
Some way of handling sudden cancellations, and just filling the gaps
without rearranging other shifts.
Output Test
Probably, the algorithm should generate as many matching Solutions as possible, where each Solution looks like this:
class Solution:
def __init__(self, shifts_workers):
"""shifts_workers -- a dictionary of shift objects as keys, and a
a lists of workers filling the shift as values."""
assert isinstance(dict, shifts_workers)
self.shifts_workers = shifts_workers
Here's a test function for an individual solution, given the above data. I think this is right, but I'd appreciate some peer review on it too.
def check_solution(solution):
assert isinstance(Solution, solution)
def shift_check(shift, workers, workers_allowed):
assert isinstance(Shift, shift):
assert isinstance(list, workers):
assert isinstance(list, workers_allowed)
num_workers = len(workers)
assert num_workers >= shift.min_workers
assert num_workers <= shift.max_workers
for w in workers_allowed:
assert w in workers
shifts_workers = solution.shifts_workers
# all shifts should be covered
assert len(shifts_workers.keys()) == 3
assert shift1 in shifts_workers.keys()
assert shift2 in shifts_workers.keys()
assert shift3 in shifts_workers.keys()
# shift_1 should be covered by 2 people - joe, and bob
shift_check(shift_1, shifts_workers[shift_1], (joe, bob))
# shift_2 should be covered by 2 people - sam and amy
shift_check(shift_2, shifts_workers[shift_2], (sam, amy))
# shift_3 should be covered by 3 people - ned, max, and jim
shift_check(shift_3, shifts_workers[shift_3], (ned,max,jim))
Attempts
I've tried implementing this with a Genetic Algorithm, but can't seem to get it tuned quite right, so although the basic principle seems to work on single shifts, it can't solve even easy cases with a few shifts and a few workers.
My latest attempt is to generate every possible permutation as a solution, then whittle down the permutations that don't meet the constraints. This seems to work much more quickly, and has gotten me further, but I'm using python 2.6's itertools.product() to help generate the permutations, and I can't quite get it right. It wouldn't surprise me if there are many bugs as, honestly, the problem doesn't fit in my head that well :)
Currently my code for this is in two files: models.py and rota.py. models.py looks like:
# -*- coding: utf-8 -*-
class Shift:
def __init__(self, start_datetime, end_datetime, min_coverage, max_coverage):
self.start = start_datetime
self.end = end_datetime
self.duration = self.end - self.start
self.min_coverage = min_coverage
self.max_coverage = max_coverage
def __repr__(self):
return "<Shift %s--%s (%r<x<%r)" % (self.start, self.end, self.min_coverage, self.max_coverage)
class Duty:
def __init__(self, worker, shift, slot):
self.worker = worker
self.shift = shift
self.slot = slot
def __repr__(self):
return "<Duty worker=%r shift=%r slot=%d>" % (self.worker, self.shift, self.slot)
def dump(self, indent=4, depth=1):
ind = " " * (indent * depth)
print ind + "<Duty shift=%s slot=%s" % (self.shift, self.slot)
self.worker.dump(indent=indent, depth=depth+1)
print ind + ">"
class Avail:
def __init__(self, start_time, end_time):
self.start = start_time
self.end = end_time
def __repr__(self):
return "<%s to %s>" % (self.start, self.end)
class Worker:
def __init__(self, name, availabilities):
self.name = name
self.availabilities = availabilities
def __repr__(self):
return "<Worker %s Avail=%r>" % (self.name, self.availabilities)
def dump(self, indent=4, depth=1):
ind = " " * (indent * depth)
print ind + "<Worker %s" % self.name
for avail in self.availabilities:
print ind + " " * indent + repr(avail)
print ind + ">"
def available_for_shift(self, shift):
for a in self.availabilities:
if shift.start >= a.start and shift.end <= a.end:
return True
print "Worker %s not available for %r (Availability: %r)" % (self.name, shift, self.availabilities)
return False
class Solution:
def __init__(self, shifts):
self._shifts = list(shifts)
def __repr__(self):
return "<Solution: shifts=%r>" % self._shifts
def duties(self):
d = []
for s in self._shifts:
for x in s:
yield x
def shifts(self):
return list(set([ d.shift for d in self.duties() ]))
def dump_shift(self, s, indent=4, depth=1):
ind = " " * (indent * depth)
print ind + "<ShiftList"
for duty in s:
duty.dump(indent=indent, depth=depth+1)
print ind + ">"
def dump(self, indent=4, depth=1):
ind = " " * (indent * depth)
print ind + "<Solution"
for s in self._shifts:
self.dump_shift(s, indent=indent, depth=depth+1)
print ind + ">"
class Env:
def __init__(self, shifts, workers):
self.shifts = shifts
self.workers = workers
self.fittest = None
self.generation = 0
class DisplayContext:
def __init__(self, env):
self.env = env
def status(self, msg, *args):
raise NotImplementedError()
def cleanup(self):
pass
def update(self):
pass
and rota.py looks like:
#!/usr/bin/env python2.6
# -*- coding: utf-8 -*-
from datetime import datetime as dt
am2 = dt(2009, 10, 1, 2, 0)
am8 = dt(2009, 10, 1, 8, 0)
pm12 = dt(2009, 10, 1, 12, 0)
def duties_for_all_workers(shifts, workers):
from models import Duty
duties = []
# for all shifts
for shift in shifts:
# for all slots
for cov in range(shift.min_coverage, shift.max_coverage):
for slot in range(cov):
# for all workers
for worker in workers:
# generate a duty
duty = Duty(worker, shift, slot+1)
duties.append(duty)
return duties
def filter_duties_for_shift(duties, shift):
matching_duties = [ d for d in duties if d.shift == shift ]
for m in matching_duties:
yield m
def duty_permutations(shifts, duties):
from itertools import product
# build a list of shifts
shift_perms = []
for shift in shifts:
shift_duty_perms = []
for slot in range(shift.max_coverage):
slot_duties = [ d for d in duties if d.shift == shift and d.slot == (slot+1) ]
shift_duty_perms.append(slot_duties)
shift_perms.append(shift_duty_perms)
all_perms = ( shift_perms, shift_duty_perms )
# generate all possible duties for all shifts
perms = list(product(*shift_perms))
return perms
def solutions_for_duty_permutations(permutations):
from models import Solution
res = []
for duties in permutations:
sol = Solution(duties)
res.append(sol)
return res
def find_clashing_duties(duty, duties):
"""Find duties for the same worker that are too close together"""
from datetime import timedelta
one_day = timedelta(days=1)
one_day_before = duty.shift.start - one_day
one_day_after = duty.shift.end + one_day
for d in [ ds for ds in duties if ds.worker == duty.worker ]:
# skip the duty we're considering, as it can't clash with itself
if duty == d:
continue
clashes = False
# check if dates are too close to another shift
if d.shift.start >= one_day_before and d.shift.start <= one_day_after:
clashes = True
# check if slots collide with another shift
if d.slot == duty.slot:
clashes = True
if clashes:
yield d
def filter_unwanted_shifts(solutions):
from models import Solution
print "possibly unwanted:", solutions
new_solutions = []
new_duties = []
for sol in solutions:
for duty in sol.duties():
duty_ok = True
if not duty.worker.available_for_shift(duty.shift):
duty_ok = False
if duty_ok:
print "duty OK:"
duty.dump(depth=1)
new_duties.append(duty)
else:
print "duty **NOT** OK:"
duty.dump(depth=1)
shifts = set([ d.shift for d in new_duties ])
shift_lists = []
for s in shifts:
shift_duties = [ d for d in new_duties if d.shift == s ]
shift_lists.append(shift_duties)
new_solutions.append(Solution(shift_lists))
return new_solutions
def filter_clashing_duties(solutions):
new_solutions = []
for sol in solutions:
solution_ok = True
for duty in sol.duties():
num_clashing_duties = len(set(find_clashing_duties(duty, sol.duties())))
# check if many duties collide with this one (and thus we should delete this one
if num_clashing_duties > 0:
solution_ok = False
break
if solution_ok:
new_solutions.append(sol)
return new_solutions
def filter_incomplete_shifts(solutions):
new_solutions = []
shift_duty_count = {}
for sol in solutions:
solution_ok = True
for shift in set([ duty.shift for duty in sol.duties() ]):
shift_duties = [ d for d in sol.duties() if d.shift == shift ]
num_workers = len(set([ d.worker for d in shift_duties ]))
if num_workers < shift.min_coverage:
solution_ok = False
if solution_ok:
new_solutions.append(sol)
return new_solutions
def filter_solutions(solutions, workers):
# filter permutations ############################
# for each solution
solutions = filter_unwanted_shifts(solutions)
solutions = filter_clashing_duties(solutions)
solutions = filter_incomplete_shifts(solutions)
return solutions
def prioritise_solutions(solutions):
# TODO: not implemented!
return solutions
# prioritise solutions ############################
# for all solutions
# score according to number of staff on a duty
# score according to male/female staff
# score according to skill/background diversity
# score according to when staff last on shift
# sort all solutions by score
def solve_duties(shifts, duties, workers):
# ramify all possible duties #########################
perms = duty_permutations(shifts, duties)
solutions = solutions_for_duty_permutations(perms)
solutions = filter_solutions(solutions, workers)
solutions = prioritise_solutions(solutions)
return solutions
def load_shifts():
from models import Shift
shifts = [
Shift(am2, am8, 2, 3),
Shift(am8, pm12, 2, 3),
]
return shifts
def load_workers():
from models import Avail, Worker
joe_avail = ( Avail(am2, am8), )
sam_avail = ( Avail(am2, am8), )
ned_avail = ( Avail(am2, am8), )
bob_avail = ( Avail(am8, pm12), )
max_avail = ( Avail(am8, pm12), )
joe = Worker("joe", joe_avail)
sam = Worker("sam", sam_avail)
ned = Worker("ned", sam_avail)
bob = Worker("bob", bob_avail)
max = Worker("max", max_avail)
return (joe, sam, ned, bob, max)
def main():
import sys
shifts = load_shifts()
workers = load_workers()
duties = duties_for_all_workers(shifts, workers)
solutions = solve_duties(shifts, duties, workers)
if len(solutions) == 0:
print "Sorry, can't solve this. Perhaps you need more staff available, or"
print "simpler duty constraints?"
sys.exit(20)
else:
print "Solved. Solutions found:"
for sol in solutions:
sol.dump()
if __name__ == "__main__":
main()
Snipping the debugging output before the result, this currently gives:
Solved. Solutions found:
<Solution
<ShiftList
<Duty shift=<Shift 2009-10-01 02:00:00--2009-10-01 08:00:00 (2<x<3) slot=1
<Worker joe
<2009-10-01 02:00:00 to 2009-10-01 08:00:00>
>
>
<Duty shift=<Shift 2009-10-01 02:00:00--2009-10-01 08:00:00 (2<x<3) slot=1
<Worker sam
<2009-10-01 02:00:00 to 2009-10-01 08:00:00>
>
>
<Duty shift=<Shift 2009-10-01 02:00:00--2009-10-01 08:00:00 (2<x<3) slot=1
<Worker ned
<2009-10-01 02:00:00 to 2009-10-01 08:00:00>
>
>
>
<ShiftList
<Duty shift=<Shift 2009-10-01 08:00:00--2009-10-01 12:00:00 (2<x<3) slot=1
<Worker bob
<2009-10-01 08:00:00 to 2009-10-01 12:00:00>
>
>
<Duty shift=<Shift 2009-10-01 08:00:00--2009-10-01 12:00:00 (2<x<3) slot=1
<Worker max
<2009-10-01 08:00:00 to 2009-10-01 12:00:00>
>
>
>
>

I've tried implementing this with a Genetic Algorithm,
but can't seem to get it tuned quite right, so although
the basic principle seems to work on single shifts,
it can't solve even easy cases with a few shifts and a few workers.
In short, don't! Unless you have lots of experience with genetic algorithms, you won't get this right.
They are approximate methods that do not guarantee converging to a workable solution.
They work only if you can reasonably well establish the quality of your current solution (i.e. number of criteria not met).
Their quality critically depends on the quality of operators you use to combine/mutate previous solutions into new ones.
It is a tough thing to get right in small python program if you have close to zero experience with GA. If you have a small group of people exhaustive search is not that bad option. The problem is that it may work right for n people, will be slow for n+1 people and will be unbearably slow for n+2 and it may very well be that your n will end up as low as 10.
You are working on an NP-complete problem and there are no easy win solutions. If the fancy timetable scheduling problem of your choice does not work good enough, it is very unlikely you will have something better with your python script.
If you insist on doing this via your own code, it is much easier to get some results with min-max or simulated annealing.

Okay, I don't know about a particular algorithm, but here is what I would take into consideration.
Evaluation
Whatever the method you will need a function to evaluate how much your solution is satisfying the constraints. You may take the 'comparison' approach (no global score but a way to compare two solutions), but I would recommend evaluation.
What would be real good is if you could obtain a score for a shorter timespan, for example daily, it is really helpful with algorithms if you can 'predict' the range of the final score from a partial solution (eg, just the first 3 days out of 7). This way you can interrupt the computation based on this partial solution if it's already too low to meet your expectations.
Symmetry
It is likely that among those 200 people you have similar profiles: ie people sharing the same characteristics (availability, experience, willingness, ...). If you take two persons with the same profile, they are going to be interchangeable:
Solution 1: (shift 1: Joe)(shift 2: Jim) ...other workers only...
Solution 2: (shift 1: Jim)(shift 2: Joe) ...other workers only...
are actually the same solution from your point of view.
The good thing is that usually, you have less profiles than persons, which helps tremendously with the time spent in computation!
For example, imagine that you generated all the solutions based on Solution 1, then there is no need to compute anything based on Solution 2.
Iterative
Instead of generating the whole schedule at once, you may consider generating it incrementally (say 1 week at a time). The net gain is that the complexity for a week is reduced (there are less possibilities).
Then, once you have this week, you compute the second one, being careful of taking the first into account the first for your constraints of course.
The advantage is that you explicitly design you algorithm to take into account an already used solution, this way for the next schedule generation it will make sure not to make a person work 24hours straight!
Serialization
You should consider the serialization of your solution objects (pick up your choice, pickle is quite good for Python). You will need the previous schedule when generating a new one, and I bet you'd rather not enter it manually for the 200 people.
Exhaustive
Now, after all that, I would actually favor an exhaustive search since using symmetry and evaluation the possibilities might not be so numerous (the problem remains NP-complete though, there is no silver bullet).
You may be willing to try your hand at the Backtracking Algorithm.
Also, you should take a look at the following links which deal with similar kind of problems:
Python Sudoku Solver
Java Sudoku Solver using Knuth Dancing Links (uses the backtracking algorithm)
Both discuss the problems encountered during the implementation, so checking them out should help you.

I don't have an algorithm choice but I can relate some practical considerations.
Since the algorithm is dealing with cancellations, it has to run whenever a scheduling exception occurs to reschedule everyone.
Consider that some algorithms are not very linear and might radically reschedule everyone from that point forward. You probably want to avoid that, people like to know their schedules well in advance.
You can deal with some cancellations without rerunning the algorithm because it can pre-schedule the next available person or two.
It might not be possible or desirable to always generate the most optimal solution, but you can keep a running count of "less-than-optimal" events per worker, and always choose the worker with the lowest count when you have to assign another "bad choice". That's what people generally care about (several "bad" scheduling decisions frequently/unfairly).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.