The code simulates a registration desk with 2 personnel and 10 students, where registration takes 5 minutes and there is a 3-minute waiting period before the next student can start registering.
I'm new to simulation and I fail to understand the purpose/use/working of the code tagged as #line 1 and #line 2.
Also, why doesn't the for loop before #line 2 execute first?
import simpy
class Student(object):
    def __init__(self, env, reg):
        self.env = env
        self.reg = reg
        self.action = env.process(self.run())  #line 1

    def run(self):
        with self.reg.request() as r:
            yield r
            print("Start registration at %d" % self.env.now)
            yield self.env.timeout(5)
            print("Start waiting at %d" % self.env.now)
            yield self.env.timeout(3)
env = simpy.Environment()
reg = simpy.Resource(env, capacity=2)
student = Student(env, reg)
for i in range(9):
    env.process(student.run())
env.run(until=80)  #line 2
Line 2 is what actually starts the simulation, running it for 80 units of time. The rest of the code just sets up the simulation's starting state. Note that this env.run() is not the run() defined in the Student class.
Line 1 is more interesting. There are a couple of patterns for writing simulations. One pattern uses classes: when an object is instantiated it also starts its simulation process, so you do not need to call student.run() yourself (the __init__() method, which runs when you create the object, starts the process for you). However, if you use this pattern, then the loop before #line 2 should be changed from
student = Student(env, reg)
for i in range(9):
    env.process(student.run())

to

students = [Student(env, reg) for _ in range(10)]
This will create 10 students, each waiting to register. The advantage of creating 10 students is that each one can have individual state, such as a unique id number, and can collect statistics about itself. This is very useful in more complex simulations.
This is how this code should have been written.
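Concretely, a minimal runnable sketch of that class-based version (the id_num field is my addition, following the point above about unique ids):

import simpy

class Student(object):
    def __init__(self, env, reg, id_num):
        self.env = env
        self.reg = reg
        self.id_num = id_num
        self.action = env.process(self.run())  # starts this student's process on creation

    def run(self):
        with self.reg.request() as r:
            yield r
            print("Student %d starts registration at %d" % (self.id_num, self.env.now))
            yield self.env.timeout(5)
            print("Student %d starts waiting at %d" % (self.id_num, self.env.now))
            yield self.env.timeout(3)

env = simpy.Environment()
reg = simpy.Resource(env, capacity=2)
students = [Student(env, reg, i) for i in range(10)]
env.run(until=80)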
The other pattern is where you do call student.run() yourself. To do that here, you would comment out #line 1 and change the loop counter in the loop above #line 2 from 9 to 10. The disadvantage here is that all your registrations run through the same Student object and share its state. A more common way of using this pattern is to not use a class at all: just write a plain function and call it 10 times, passing env and reg directly to the function, as in the sketch below.
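A minimal sketch of that function-based pattern (the naming is mine, not from the original code):

import simpy

def student(env, reg, id_num):
    with reg.request() as r:
        yield r
        print("Student %d starts registration at %d" % (id_num, env.now))
        yield env.timeout(5)
        print("Student %d starts waiting at %d" % (id_num, env.now))
        yield env.timeout(3)

env = simpy.Environment()
reg = simpy.Resource(env, capacity=2)
for i in range(10):
    env.process(student(env, reg, i))
env.run(until=80)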
As written, what this code is really doing is creating one student, and that one student registers 10 times: once when it is created, and then 9 more times in the loop. While this may demonstrate how a resource is used, I agree the code can be a bit confusing.
I have a function for sending an email which is used in a Celery task. I need to make the code sleep for a second after every 20 runs of that email function. How can I make this happen?
@app.task(bind=True)
def send_email_reminder_due_date(self):
    send_email_from_template(
        subject_template_path='emails/0.txt',
        body_template_path='emails/1.html',
        template_data=template_data,  # defined elsewhere
        to_email_list=email_list,     # defined elsewhere
        fail_silently=False,
        content_subtype='html'
    )
The above code is a Celery periodic task which runs daily. I filter for today's date and send the email for all records dated today. If the number of records is more than 20, say, then for every 20 emails sent we need to make the code sleep for a second.
send_email_from_template is the function used for sending an email.
I may be missing a vital point here around what you can/can't do because you are using Celery, but I'll post this in case it's useful. You can track function state by assigning attributes to functions. In the code below I assign an attribute times_without_sleep to a dummy function and track its value to sleep every 20 calls.
import time

def my_function():
    if not hasattr(my_function, 'times_without_sleep'):
        my_function.times_without_sleep = 0
    print('Do the stuff')
    if my_function.times_without_sleep >= 20:
        print('Sleep')
        time.sleep(1)  # sleep for a second after every 20 calls
        my_function.times_without_sleep = 0
    else:
        my_function.times_without_sleep += 1

if __name__ == "__main__":
    for i in range(100):
        my_function()
You can also set the attribute value from outside the function if you need to, e.g. my_function.times_without_sleep = 0 at the end of a round of emails.
Couldn't you just do something like this with send_email_from_template? You can also make it a decorator, as in this answer, to avoid cluttering the function code.
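For illustration, a minimal decorator sketch of the same idea (the names sleep_every and send_email are mine, not from the question):

import time
from functools import wraps

def sleep_every(n, seconds=1):
    """Sleep for `seconds` after every `n` calls of the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            wrapper.calls += 1
            if wrapper.calls >= n:
                time.sleep(seconds)
                wrapper.calls = 0
            return result
        wrapper.calls = 0
        return wrapper
    return decorator

@sleep_every(20)
def send_email():
    print('sending email')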
I am trying to learn how to write a function that tests the probability of two people in a room having the same birthday.
The birthday paradox says that the probability that two people in a room will have the same birthday is more than half, provided n, the number of people in the room, is more than 23. This property is not really a paradox, but many people find it surprising. Design a Python program that can test this paradox by a series of experiments on randomly generated birthdays, which test this paradox for n = 5,10,15,20,... ,100.
Here is the code shown in my book.
import random

def test_birthday_paradox(num_people):
    birthdays = [random.randrange(0, 365) for _ in range(num_people)]
    birthday_set = set()
    for bday in birthdays:
        if bday in birthday_set: return True
        else: birthday_set.add(bday)
    return False

def paradox_stats(num_people=23, num_trials=100):
    num_successes = 0
    for _ in range(num_trials):
        if test_birthday_paradox(num_people): num_successes += 1
    return num_successes / num_trials
paradox_stats(31)
0.77
I can't understand the code from def paradox_stats to the end of code.
Can someone help me , please?
Guessing that paradox_state(31) is a mistake and you want to write paradox_stats(31):
def paradox_stats(num_people=23, num_trials=100): is the definition of the function; it takes two parameters, both optional because they have default values (23 people, 100 trials).
num_successes = 0 initializes the variable num_successes to zero.
for _ in range(num_trials):
    if test_birthday_paradox(num_people): num_successes += 1
return num_successes / num_trials
Here the code loops through a range from 0 to the number of trials, which the user can set when calling the function (remember it is an optional parameter).
In this loop the code uses the earlier function test_birthday_paradox (which, going by your question, you already understand) to check whether two people in the room share a birthday. Whenever that function returns True (two people share a birthday), the variable num_successes increases by one; that is what the += syntax does: num_successes += 1 is equivalent to num_successes = num_successes + 1.
Once the loop completes, paradox_stats returns the estimated probability: the number of successes divided by the number of trials.
Hope my answer can help you.
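To run the series of experiments the exercise asks for (n = 5, 10, 15, ..., 100), a small sketch built on the book's functions:

for n in range(5, 101, 5):
    print('n = %3d: estimated probability = %.2f' % (n, paradox_stats(n)))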
I am fairly new to Python, so kindly excuse any missing information. As part of the curriculum I was introduced to Python for quants/finance, and I am studying multiprocessing and trying to understand it better. I tried modifying the problem given, and now I am stuck.
Problem:
I have a function which gives me ticks, in ohlc format.
{'scrip_name':'ABC','timestamp':1504836192,'open':301.05,'high':303.80,'low':299.00,'close':301.10,'volume':100000}
every minute. I wish to do the following calculations concurrently and preferably append/insert the results into the same list:
Find the Moving Average of the last 5 close data
Find the Median of the last 5 open data
Save the tick data to a database.
so the expected data would look like
{'scrip_name': 'ABC', 'timestamp': 1504836192, 'open': 301.05, 'high': 303.80, 'low': 299.00, 'close': 301.10, 'volume': 100000, 'MA_5_close': 300.25, 'Median_5_open': 300.50}
Assuming that the data is going to a db, it's fairly easy to write a simple db-insert routine; I don't see that as a great challenge. I can spawn a process to execute an insert statement every minute.
How do I sync 3 different functions/processes (a function to insert into the db, a function to calculate the average, a function to calculate the median) while holding 5 ticks in memory to calculate the 5-period values, and push the results back into the dict/list?
This requirement is what challenges me in writing the multiprocessing routine. Can someone guide me? I don't want to use a pandas DataFrame.
==== REVISION/UPDATE ====
The reason why I don't want any solution based on pandas/numpy is that my objective is to understand the basics, not the nuances of a new library. Please don't mistake my need for understanding for arrogance or an unwillingness to take suggestions.
The advantage of having
p1 = Process(target=Median, args=(sourcelist,))
p2 = Process(target=Average, args=(sourcelist,))
p3 = Process(target=insertdb, args=(updatedlist,))
would help me understand how to scale processes based on the number of functions/algo components. But how should I make sure p1 & p2 are in sync, and that p3 executes only after p1 & p2 have finished?
Here is an example of how to use multiprocessing:
from multiprocessing import Pool, cpu_count
from functools import partial
from statistics import median

def db_func(ma, med):
    # placeholder: save the computed values to your database here
    print('saving', ma, med)

def backtest_strat(d, db_func):
    a = d.get('avg')
    db_func(sum(a) / len(a), median(d.get('median')))

if __name__ == '__main__':
    with Pool(cpu_count()) as p:
        bs = partial(backtest_strat, db_func=db_func)
        print(p.map(bs, [{'avg': [1, 2, 3, 4, 5], 'median': [1, 2, 3, 4, 5]}]))
Also see:
https://stackoverflow.com/a/24101655/2026508
Note that this will not speed anything up unless there are a lot of slices.
So, for the speed-up part:
def get_slices(data):
    for slice_ in data:
        # build and yield one 5-tick window per slice
        yield {'avg': [1, 2, 3, 4, 5], 'median': [1, 2, 3, 4, 5]}

p.map(bs, get_slices(data))
From what I understand, multiprocessing works by message passing via pickles, so when pool.map is called each worker process gets access to all three things: the two arrays and the db_func function. There are of course other ways to go about it, but hopefully this shows one of them.
Question: how should I make sure p1 & p2 are in sync while p3 executes after p1 & p2?
If you sync all processes, computing one task (p1, p2, p3) can't be faster than the slowest process; in the meantime, the other processes sit idle.
This is the classic "Producer-Consumer Problem".
A solution using a Queue serializes all data, so no explicit synchronization is required.
# Process-1
def producer(task_queue, data):
    task_queue.put(data)

# Process-2
def consumer(task_queue):
    data = task_queue.get()
    # process data
You want multiple consumer processes, plus one consumer process that gathers all the results.
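For illustration, a minimal runnable sketch of that Queue approach (the function and variable names are mine, and the two hard-coded 5-tick windows are stand-ins for real data):

import multiprocessing as mp
from statistics import median

def average(values):
    return sum(values) / len(values)

def worker(func, in_q, out_q):
    # apply func to each batch of ticks until the sentinel None arrives
    for batch in iter(in_q.get, None):
        out_q.put((func.__name__, func(batch)))

def gatherer(out_q, n_results):
    # a single process gathers all results, e.g. to insert them into a db
    for _ in range(n_results):
        print('result:', out_q.get())

if __name__ == '__main__':
    avg_q, med_q, out_q = mp.Queue(), mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(average, avg_q, out_q)),
             mp.Process(target=worker, args=(median, med_q, out_q)),
             mp.Process(target=gatherer, args=(out_q, 4))]
    for p in procs:
        p.start()
    for batch in ([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]):  # two 5-tick windows
        avg_q.put(batch)
        med_q.put(batch)
    avg_q.put(None)  # sentinels stop the workers
    med_q.put(None)
    for p in procs:
        p.join()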
If you don't want to use a Queue but sync primitives instead:
this example lets all processes run independently; only the Result process waits until it is notified.
The example uses an unbounded task buffer, tasks = mp.Manager().list(). Its size could be kept small if list entries of finished tasks were reused.
If you have some very fast algorithms, combine several of them into one process.
import multiprocessing as mp
from random import randrange

# Base class for all WORKERS
class Worker(mp.Process):
    tasks = mp.Manager().list()
    task_ready = mp.Condition()
    lock = mp.Lock()
    parties = mp.Manager().Value(int, 0)
    WORKERS = []

    @classmethod
    def start_all(cls, workers):
        cls.WORKERS = workers
        # every worker except the Result process contributes one result per task
        cls.parties.value = len(workers) - 1
        for w in workers:
            w.start()

    @classmethod
    def join_all(cls):
        # wait until all data is processed
        for w in cls.WORKERS:
            w.join()

    def get_task(self):
        # first task this worker has not contributed a result to yet
        for i, task in enumerate(Worker.tasks):
            if task is None: continue
            if self.__class__.__name__ not in task['result']:
                return (i, task['range'])
        return (None, None)

    # Main Process Loop
    def run(self):
        while True:
            # Get a Task for this WORKER
            idx, _range = self.get_task()
            if idx is None:
                break
            # Compute this _range with this worker's compute() method
            result = self.compute(_range)
            # Update Worker.tasks
            with Worker.lock:
                task = Worker.tasks[idx]
                task['result'][self.__class__.__name__] = result
                parties = len(task['result'])
                Worker.tasks[idx] = task
            # If this was the last result for the task, notify the Result process
            if parties == Worker.parties.value:
                with Worker.task_ready:
                    Worker.task_ready.notify()

class Result(Worker):
    def get_task(self):
        # only pick tasks whose results are complete
        for i, task in enumerate(Worker.tasks):
            if task is None: continue
            if len(task['result']) == Worker.parties.value:
                return (i, task['range'])
        return (None, None)

    # Main Process Loop
    def run(self):
        while True:
            with Worker.task_ready:
                # timeout so a missed notify cannot hang this process
                Worker.task_ready.wait(timeout=1)
            idx, _range = self.get_task()
            if idx is None:
                if len(Worker.tasks) and all(t is None for t in Worker.tasks):
                    break  # every tasks list entry has been consumed
                continue
            # process this task's results here, then
            # mark this tasks list entry as done for reuse
            Worker.tasks[idx] = None

class Average(Worker):
    def compute(self, _range):
        data = [DATA[i]['close'] for i in range(*_range)]
        return sum(data) / len(data)

class Median(Worker):
    def compute(self, _range):
        data = sorted(DATA[i]['open'] for i in range(*_range))
        return data[len(data) // 2]

if __name__ == '__main__':
    DATA = mp.Manager().list()
    # Example creates a task every 5 records
    for i in range(1, 16):
        DATA.append({'id': i, 'open': 300 + randrange(0, 5), 'close': 300 + randrange(-5, 5)})
        if i % 5 == 0:
            Worker.tasks.append({'range': (i - 5, i), 'result': {}})
    WORKERS = [Result(), Average(), Median()]
    Worker.start_all(WORKERS)
    Worker.join_all()
Tested with Python: 3.4.2
I'm using SimPy in Python to create a discrete event simulation that requires resources to be available based on a schedule input by the user, in my case via a csv file. The aim is to represent different numbers of the same resource (e.g. staff) being available at different times of day. As far as I can tell this isn't something available in base SimPy, the way resource priorities are.
I have managed to get this working and have included the code below to show how. However, I wanted to ask the community whether there is a better way to achieve this functionality in SimPy.
The code below works by requesting the resources at the start of each day for the times they are not supposed to be available, with a much higher priority to ensure the dummy processes get the resource. The resources are then released at the appropriate times for use by other events/processes. As I say, it works, but it seems wasteful, with a lot of dummy processes running just to enforce the true availability of resources. Any comments which would lead to improvements would be welcome.
So the csv looks like:

Number  time
0       23
50      22
100     17
50      10
20      8
5       6

where Number represents the number of staff that become available at the defined time. For example: there will be 5 staff from 6-8, 20 from 8-10, 50 from 10-17 and so on until the end of the day.
The code:
import csv
import simpy

# empty list ready to hold the input data from the csv
input_list = []

# a dummy process that "uses" staff until the end of the current day
def take_res():
    req = staff.request(priority=-100)  # request a staff resource at a very high priority
    yield req
    yield test_env.timeout(24 - test_env.now)

# a dummy process that "uses" staff for the time those staff should not
# be available to the real processes
def request_res(delay, avail_time):
    req = staff.request(priority=-100)  # request a staff resource at a very high priority
    yield req
    yield test_env.timeout(delay)
    yield staff.release(req)
    # pass the time it is available for
    yield test_env.timeout(avail_time)
    test_env.process(take_res())

# used to print current levels of resource usage
def print_usage():
    print('At time %0.2f %d res are in use' % (test_env.now, staff.count))
    yield test_env.timeout(0.5)
    test_env.process(print_usage())

# open the csv and read the data into a list
with open('staff_schedule.csv', mode="r") as infile:
    reader = csv.reader(infile)
    next(reader, None)  # ignore the header
    for row in reader:
        input_list.append(row[:2])

# calculate the time the current number of resources will be
# available for and append it to each row
for i, row in enumerate(input_list):
    if i == 0:
        row.append(24 - int(input_list[i][1]))
    else:
        row.append(int(input_list[i - 1][1]) - int(input_list[i][1]))

# convert the list to a tuple of tuples to prevent any accidental
# edits from this point on
staff_tuple = tuple(tuple(row) for row in input_list)
print(staff_tuple)

# define the environment and create the resources
test_env = simpy.Environment()
staff = simpy.PriorityResource(test_env, capacity=sum(int(l[0]) for l in staff_tuple))

# for each row in the tuple, run dummy processes to hold resources
# according to the schedule in the csv
for item in staff_tuple:
    print(item[0])
    for i in range(int(item[0])):
        test_env.process(request_res(int(item[1]), int(item[2])))

# run a process to print usage over time
test_env.process(print_usage())

# run for 25 hours - so 1 day
test_env.run(until=25)
I tried something else: I subclassed the resource machinery, adding only one method, and while I don't fully understand the SimPy source code, it seems to work properly. You can tell the resource to change its capacity at any point in your simulation.
from simpy.resources.resource import Request, Release
from simpy.core import BoundClass
from simpy.resources.base import BaseResource

class VariableResource(BaseResource):

    def __init__(self, env, capacity):
        super(VariableResource, self).__init__(env, capacity)
        self.users = []
        self.queue = self.put_queue

    @property
    def count(self):
        return len(self.users)

    request = BoundClass(Request)
    release = BoundClass(Release)

    def _do_put(self, event):
        if len(self.users) < self.capacity:
            self.users.append(event)
            event.usage_since = self._env.now
            event.succeed()

    def _do_get(self, event):
        try:
            self.users.remove(event.request)
        except ValueError:
            pass
        event.succeed()

    def _change_capacity(self, capacity):
        self._capacity = capacity
        # if the capacity grew, re-check queued requests (uses BaseResource's
        # internal _trigger_put hook; without this, waiting requests are only
        # re-checked on the next release)
        self._trigger_put(None)
I think this should work, but I'm not 100% confident about how the triggers work.
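A minimal usage sketch, assuming the VariableResource class above (the schedule values are made up):

import simpy

def capacity_schedule(env, res):
    # hypothetical schedule: 2 units until t=10, then 5 units
    yield env.timeout(10)
    res._change_capacity(5)

env = simpy.Environment()
res = VariableResource(env, capacity=2)
env.process(capacity_schedule(env, res))
env.run(until=20)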
I solved it by creating a Resource for each time window. Each arrival is processed in the function service, and each customer is assigned to a resource depending on the arrival time. In case a customer is still waiting in the queue when the time window ends and has to be re-assigned to the next window, it is removed from the current Resource and re-assigned to the next Resource. This is done by modifying the request as:
with self.Morning.request() as req1:
    yield req1 | self.env.timeout(self.durationMorning)
The code:
import simpy
import itertools

class Queue():
    def __init__(self, env, N_m, N_e):
        self.Arrival = {}
        self.StartService = {}
        self.FinishService = {}
        self.Morning = simpy.Resource(env, N_m)
        self.Evening = simpy.Resource(env, N_e)
        self.env = env
        self.durationMorning = 30

    # inter-arrival time: 1 during the morning, 2 afterwards
    def t_arrival(self, t):
        if t < self.durationMorning:
            return 1
        else:
            return 2

    def t_service(self):
        return 5

    def service(self, i):
        arrival_time = self.env.now
        if arrival_time == self.durationMorning:
            yield self.env.timeout(0.0001)
        # Add arrival
        self.Arrival[i] = arrival_time
        # Morning shift
        if self.env.now < self.durationMorning:
            with self.Morning.request() as req1:
                yield req1 | self.env.timeout(self.durationMorning)
                if self.env.now < self.durationMorning:
                    self.StartService[i] = self.env.now
                    yield self.env.timeout(self.t_service())
                    print(f'{i} arrived at {self.Arrival[i]} done at {self.env.now} by 1')
                    self.FinishService[i] = self.env.now
        # Evening shift
        if self.env.now >= self.durationMorning and i not in self.FinishService:
            with self.Evening.request() as req2:
                yield req2
                self.StartService[i] = self.env.now
                yield self.env.timeout(self.t_service())
                print(f'{i} arrived at {self.Arrival[i]} done at {self.env.now} by 2')
                self.FinishService[i] = self.env.now

    def arrivals(self):
        for i in itertools.count():
            self.env.process(self.service(i))
            t = self.t_arrival(self.env.now)
            yield self.env.timeout(t)

N_morning, N_evening = 2, 1  # example capacities; not shown in the original post
env = simpy.Environment()
system = Queue(env, N_morning, N_evening)
system.env.process(system.arrivals())
system.env.run(until=60)
0 arrived at 0 done at 5 by 1
1 arrived at 1 done at 6 by 1
2 arrived at 2 done at 10 by 1
3 arrived at 3 done at 11 by 1
4 arrived at 4 done at 15 by 1
5 arrived at 5 done at 16 by 1
6 arrived at 6 done at 20 by 1
7 arrived at 7 done at 21 by 1
8 arrived at 8 done at 25 by 1
9 arrived at 9 done at 26 by 1
10 arrived at 10 done at 30 by 1
11 arrived at 11 done at 31 by 1
12 arrived at 12 done at 35 by 2
13 arrived at 13 done at 40 by 2
14 arrived at 14 done at 45 by 2
15 arrived at 15 done at 50 by 2
16 arrived at 16 done at 55 by 2
SimPy related
Maybe you can use PreemptiveResource (see this example). With this, you would only need one blocker-process per resource as it can just "kick" less important processes.
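For illustration, a sketch of that blocker idea (hypothetical working hours, not the asker's csv; normal requests would use priority 0 and would need to handle simpy.Interrupt if preempted):

import simpy

def blocker(env, staff, avail_from, avail_until):
    # one blocker per staff unit: holds the unit whenever it should be unavailable
    while True:
        with staff.request(priority=-100) as req:  # high priority "kicks" normal users
            yield req
            yield env.timeout(avail_from)            # blocked from midnight to avail_from
        yield env.timeout(avail_until - avail_from)  # free for real use during working hours
        with staff.request(priority=-100) as req:
            yield req
            yield env.timeout(24 - avail_until)      # blocked until the end of the day

env = simpy.Environment()
staff = simpy.PreemptiveResource(env, capacity=5)
for _ in range(5):
    env.process(blocker(env, staff, avail_from=8, avail_until=17))
env.run(until=24)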
Python related
Document your code. What’s the purpose of take_res() and request_res()? (Why do both functions use priority=-100, anyway?)
Use better names. the_list or the_tuple is not very helpful.
Instead of the_list.append(row[0], row[1]) you can do the_list.append(row[:2]).
Why do you convert the list of lists into a tuple of tuples? As far as I can see there is no benefit, but it adds extra code and thus extra confusion and extra possibilities for programming errors.
You should leave the with open(file) block as soon as possible (after the first four lines, in your case). There’s no need to keep the file open longer than necessary and when you are done iterating over all lines, you no longer need it.
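For example, a sketch of that restructuring (reusing the question's file name):

import csv

with open('staff_schedule.csv') as infile:
    reader = csv.reader(infile)
    next(reader, None)  # skip the header
    input_list = [row[:2] for row in reader]
# the file is closed here; all further processing happens on input_list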
This is how I solved it for my application. It's not perfect but was the best I could do given my basic level of skill with Python and SimPy.
The result is that the correct number of advisers is available at the desired times.
First I define a store and set the capacity to be equal to the total number of adviser instances that will exist within the simulation.
self.adviser_store = simpy.FilterStore(self.env,
capacity=self.total_ad_instances)
The instances of the Adviser class required are created in an initialization step which for brevity I have not included. I actually use a JSON file to customize the individual adviser instances which are then placed in a list.
The run parameter in the class definition below is actually another class that contains all info related to the current run of the simulation - so for example it contains the start and end dates for the simulation. self.start_date therefore defines the date that adviser starts working. self.run.start_date is the start date for the simulation.
import datetime

class Adviser(object):
    def __init__(self, run, id_num, start_time, end_time, start_date, end_date, ad_type):
        self.env = run.env
        self.run = run
        self.id_num = id_num
        self.start_time = start_time
        self.end_time = end_time
        self.start_date = datetime.datetime.strptime(start_date, '%Y, %m, %d')
        self.end_date = datetime.datetime.strptime(end_date, '%Y, %m, %d')
        self.ad_type = ad_type
        self.avail = False
        self.run.env.process(self.set_availability())
So as you can see, creating the Adviser instance also starts the process that sets its availability. In the example below I've simplified it to set the same availability each day for a given date range. You could of course set different availabilities depending on date/day etc.
from simpy.util import start_delayed

def set_availability(self):
    # time in hours until the resource becomes (and stops being) available,
    # applied as delays below
    start_delay = self.start_time + (self.start_date - self.run.start_date).total_seconds() / 3600
    end_delay = self.end_time + (self.start_date - self.run.start_date).total_seconds() / 3600
    repeat = (self.end_date - self.start_date).days + 1  # how many days to repeat it for
    for i in range(repeat):
        start_delayed(self.run.env, self.add_to_store(), start_delay)
        start_delayed(self.run.env, self.remove_from_store(), end_delay)
        start_delay += 24
        end_delay += 24
    yield self.run.env.timeout(0)

def add_to_store(self):
    self.run.ad_avail.remove(self)    # take the adviser from a list
    self.run.adviser_store.put(self)  # and put it in the store
    yield self.run.env.timeout(0)

def remove_from_store(self):
    # get itself back out of the store
    current_ad = yield self.run.adviser_store.get(lambda item: item.id_num == self.id_num)
    self.run.ad_avail.append(current_ad)  # and put it back in the list
    yield self.run.env.timeout(0)
So essentially customers can only request advisers from the store, and the advisers will only be in the store at certain times. The rest of the time they are in the list attached to the current run of the simulation.
I think there is still a pitfall here: the adviser object may be in use when it is due to become unavailable. I haven't yet noticed whether this happens, or what the impact would be if it did.
Just registered so I could ask this question.
Right now I have this code that prevents a class from updating more than once every five minutes:
from datetime import datetime

now = datetime.now()
delta = now - myClass.last_updated_date
seconds = delta.seconds
if seconds > 300:
    update(myClass)
else:
    retrieveFromCache(myClass)
I'd like to modify it by allowing myClass to update twice per 5 minutes, instead of just once.
I was thinking of creating a list to store the last two times myClass was updated, and comparing against those in the if statement, but I fear my code will get convoluted and harder to read if I go that route.
Is there a simpler way to do this?
You could do it with a simple counter. The concept: get_update_count tracks how often the class has been updated.
if seconds > 300 or get_update_count(myClass) < 2:
    # and update the update count
    update(myClass)
else:
    # reset the update count
    retrieveFromCache(myClass)
I'm not sure how you uniquely identify myClass.
update_map = {}

def update(instance):
    # do the update, then increment the count
    update_map[instance] = update_map.get(instance, 0) + 1

def get_update_count(instance):
    return update_map.get(instance, 0)
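Alternatively, a small sketch of the timestamp-list idea from the question, using collections.deque so the code stays readable (the names are illustrative):

from collections import deque
from datetime import datetime, timedelta

last_updates = deque(maxlen=2)  # remembers the last two update times

def should_update():
    now = datetime.now()
    # allow an update if fewer than two happened in the last five minutes
    if len(last_updates) < 2 or now - last_updates[0] > timedelta(minutes=5):
        last_updates.append(now)
        return True
    return False

The original branch then becomes: if should_update(): update(myClass), else retrieveFromCache(myClass).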