How to avoid classmethod side effects when using Celery? - python

I am running a class-based app using Celery, but I am noticing that when two processes run simultaneously, certain classmethods in the class are not acting independently. Here is the task invocation:
import os
from PriceOptimization.celery import app
from .Tasks_Sim.sim import Sim, final_report

@app.task(name='Simulations.tasks.scoring')
def simulation(clients, deciles):
    s = Sim(**sim_params)
    market_by_year = s.control_flow(my_save_path)
    report = final_report(market_by_year)
    return report
Within my Sim app, I have a classmethod that creates ids for my instances as follows:
class Company:
    company_id = 0

    @classmethod
    def set_company_no(cls):
        cls.company_id += 1
        return cls.company_id - 1

    def __init__(self, companies, year):
        self._company_id = Company.set_company_no()
        self._company_year = year
Usually the first task completes successfully, but on the next invocation I get a list index out of range error, which suggests that my workers are not independent and that my company_id counter is not starting from zero again on the next invocation. How can I prevent this side effect and have each run proceed independently?
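To illustrate the behaviour outside Celery (a minimal sketch, not my actual code): the class attribute keeps counting up for as long as the interpreter process lives, which is presumably what a long-lived worker process does between task invocations:

class Company:
    company_id = 0

    @classmethod
    def set_company_no(cls):
        cls.company_id += 1
        return cls.company_id - 1

# two "task invocations" in the same process share the class attribute
first_run = [Company.set_company_no() for _ in range(3)]
second_run = [Company.set_company_no() for _ in range(3)]
print(first_run)   # [0, 1, 2]
print(second_run)  # [3, 4, 5] - the counter did not restart at 0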

For now, I have elected to make my process run sequentially using a redis lock:
import time
from settings import REDIS_INSTANCE

REDIS_LOCK_KEY = 'ABC'

@app.task(name='Simulations.tasks.scoring')
def simulation(clients, deciles):
    timeout = (60 * 5)
    have_lock = False
    my_lock = REDIS_INSTANCE.lock(REDIS_LOCK_KEY, timeout=timeout)
    while not have_lock:
        have_lock = my_lock.acquire(blocking=False)
        if have_lock:
            print('unique process commencing...')
            s = Sim(**sim_params)
            market_by_year = s.control_flow(my_save_path)
            report = final_report(market_by_year)
        else:
            print('waiting for lock to commence...')
            time.sleep(10)
    my_lock.release()
    return report
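An alternative I am considering (an untested sketch, reusing sim_params and my_save_path from above, and assuming each worker process runs one task at a time as in the default prefork pool) is either to reset the class-level counter at the start of every task, or to move the counter onto each Sim run instead of the class:

# --- Option 1: reset the class-level counter at the start of every task ---
@app.task(name='Simulations.tasks.scoring')
def simulation(clients, deciles):
    Company.company_id = 0          # start counting from zero on every invocation
    s = Sim(**sim_params)
    market_by_year = s.control_flow(my_save_path)
    return final_report(market_by_year)

# --- Option 2 (instead of the class attribute): a per-run counter ---
class Company:
    def __init__(self, companies, year, id_counter):
        # id_counter is an itertools.count() created by, and local to, one Sim run
        self._company_id = next(id_counter)
        self._company_year = year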

Related

How to allow a class's variables to be modified concurrently by multiple threads

I have a class (MyClass) which contains a queue (self.msg_queue) of actions that need to be run and I have multiple sources of input that can add tasks to the queue.
Right now I have three functions that I want to run concurrently:
MyClass.get_input_from_user()
Creates a window in tkinter that has the user fill out information and when the user presses submit it pushes that message onto the queue.
MyClass.get_input_from_server()
Checks the server for a message, reads the message, and then puts it onto the queue. This method uses functions from MyClass's parent class.
MyClass.execute_next_item_on_the_queue()
Pops a message off of the queue and then acts upon it. It is dependent on what the message is, but each message corresponds to some method in MyClass or its parent which gets run according to a big decision tree.
Process description:
After the class has joined the network, I have it spawn three threads (one for each of the above functions). Each threaded function adds items to the queue with the syntax "self.msg_queue.put(message)" and removes items from the queue with "self.msg_queue.get_nowait()".
Problem description:
The issue I am having is that it seems that each thread is modifying its own queue object (they are not sharing the queue, msg_queue, of the class of which they, the functions, are all members).
I am not familiar enough with multiprocessing to know which error messages matter; however, it states that it cannot pickle a weakref object (it gives no indication of which object is the weakref), and that within the queue.put() call the line self._sem.acquire(block, timeout) raises a '[WinError 5] Access is denied' error. Would it be safe to assume that this failure comes from the queue's reference not copying over properly?
[I am using Python 3.7.2 and the multiprocessing package's Process and Queue]
[I have seen multiple Q/As about having threads shuttle information between classes--create a master harness that generates a queue and then pass that queue as an argument to each thread. If the functions didn't have to use other functions from MyClass I could see adapting this strategy by having those functions take in a queue and use a local variable rather than class variables.]
[I am fairly confident that this error is not the result of passing my queue to the tkinter object as my unit tests on how my GUI modifies its caller's queue work fine]
Below is a minimal reproducible example for the queue's error:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self):
        while True:
            self.my_q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self):
        while True:
            self.counter = 0
            self.my_q.put(self.counter)
            time.sleep(1)

    def output_function(self):
        while True:
            try:
                var = self.my_q.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        process_A = Process(target=self.input_function_A)
        process_B = Process(target=self.input_function_B)
        process_C = Process(target=self.output_function)
        process_A.start()
        process_B.start()
        process_C.start()
        # without this it generates the WinError:
        # with this it still behaves as if the two input functions do not modify the queue
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
Indeed, these are not "threads", they are "processes". If you were using multithreading rather than multiprocessing, the self.my_q instance would be the same object, living at the same memory location, in every worker.
With multiprocessing, the original process (the one executing the "run" call) is forked or re-spawned, and any data in it is duplicated into each child, so each subprocess ends up seeing its own "Queue" instance, unrelated to the others.
The correct way to have the various processes share a multiprocessing.Queue object is to pass it as a parameter to the target methods. The simplest way to reorganize your code so that it works is:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self, q):
        while True:
            q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self, q):
        while True:
            self.counter = 0
            q.put(self.counter)
            time.sleep(1)

    def output_function(self, q):
        while True:
            try:
                var = q.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        # pass the one Queue instance explicitly to each child process
        process_A = Process(target=self.input_function_A, args=(self.my_q,))
        process_B = Process(target=self.input_function_B, args=(self.my_q,))
        process_C = Process(target=self.output_function, args=(self.my_q,))
        process_A.start()
        process_B.start()
        process_C.start()
        # keep the parent alive while the output process runs
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
As you can see, since your class is not actually sharing any data through the instance's attributes, this "class" design does not add much for your application beyond grouping the different workers in the same code block.
It would be possible to have a magic multiprocess class with an internal method that starts the worker methods and shares the Queue instance, so if you have a lot of these in a project there would be much less boilerplate.
Something along these lines:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MPWorkerBase:
    def __init__(self, *args, **kw):
        self.queue = None
        self.is_parent_process = False
        self.is_child_process = False
        self.processes = []
        # ensure this can be used as a collaborative mixin
        super().__init__(*args, **kw)

    def run(self):
        if self.is_parent_process or self.is_child_process:
            # workers already initialized
            return
        self.queue = Queue()
        processes = []
        cls = self.__class__
        for name in dir(cls):
            method = getattr(cls, name)
            if callable(method) and getattr(method, "_MP_worker", False):
                process = Process(target=self._start_worker, args=(self.queue, name))
                processes.append(process)
                process.start()
        # Setting these attributes only after the loop ensures the child
        # processes were created with the initial values for them.
        self.is_parent_process = True
        self.processes = processes

    def _start_worker(self, queue, method_name):
        # this method is called in a newly spawned process - attribute
        # changes here no longer reflect attributes on the
        # object in the initial process
        # overwrite queue in this process with the queue object sent over the wire:
        self.queue = queue
        self.is_child_process = True
        # call the worker method
        getattr(self, method_name)()

    def __del__(self):
        for process in self.processes:
            process.join()


def worker(func):
    """decorator to mark a method as a worker that should
    run in its own subprocess
    """
    func._MP_worker = True
    return func


class MyTest(MPWorkerBase):
    def __init__(self):
        super().__init__()
        self.counter = 0

    @worker
    def input_function_A(self):
        while True:
            self.queue.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    @worker
    def input_function_B(self):
        while True:
            self.counter = 0
            self.queue.put(self.counter)
            time.sleep(1)

    @worker
    def output_function(self):
        while True:
            try:
                var = self.queue.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)


if __name__ == '__main__':
    test = MyTest()
    test.run()
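Note that in both versions above each child process still works on its own copy of self.counter; only the Queue is shared. If the counter itself had to be shared as well, one option (a sketch, not part of the code above) is a multiprocessing.Value living in shared memory:

from multiprocessing import Process, Queue, Value
import time

def producer(q, counter):
    for _ in range(5):
        with counter.get_lock():     # Value carries its own lock for atomic updates
            counter.value += 1
            q.put(counter.value)
        time.sleep(0.1)

if __name__ == '__main__':
    q = Queue()
    counter = Value('i', 0)          # a shared integer, visible to all child processes
    workers = [Process(target=producer, args=(q, counter)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    while not q.empty():
        print(q.get_nowait())        # ten values drawn from one shared counter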

Update long process' progress via callback instead of polling

In my Python script I am triggering a long-running process (drive()) that is encapsulated in a class method:
car.py
import time

class Car(object):
    def __init__(self, sleep_time_in_seconds, miles_to_drive):
        self.sleep_time_in_seconds = sleep_time_in_seconds
        self.miles_to_drive = miles_to_drive

    def drive(self):
        for mile in range(self.miles_to_drive):
            print('driving mile #{}'.format(mile))
            time.sleep(self.sleep_time_in_seconds)
app.py
from car import Car

sleep_time = 2
total_miles = 5
car = Car(sleep_time_in_seconds=sleep_time, miles_to_drive=total_miles)
car.drive()

def print_driven_distance_in_percent(driven_miles):
    print("Driven distance: {}%".format(100 * driven_miles / total_miles))
In the main script app.py I'd like to know the progress of the drive() process. One way of solving this would be to create a loop that polls the current progress from the Car class. If the Car class inherited from Thread, polling seems to be the expected pattern, as far as I have googled...
I'm just curious whether it's possible to somehow notify the main script from within the Car class about the current progress.
I thought about maybe creating a wrapper class that I can pass as an argument to the Car class, so the car instance can then call the wrapper class's print_progress function.
Or is there a more pythonic way to notify the caller script on demand?
Thanks
EDIT:
Based on Artiom Kozyrev's answer - this is what I wanted to achieve:
import time
from threading import Thread
from queue import Queue

def ask_queue(q):
    """
    The function to control status of our status display thread
    q - Queue - need to show status of task
    """
    while True:
        x = q.get()  # take element from Queue
        if x == "STOP":
            break
        print("Process completed in {} percents".format(x))
    print("100% finished")

class MyClass:
    """My example class"""
    def __init__(self, name, status_queue):
        self.name = name
        self.status_queue = status_queue

    def my_run(self):
        """
        The function we would like to monitor
        """
        # th = Thread(target=MyClass.ask_queue, args=(self.status_queue,), )  # monitoring thread
        # th.start()  # start monitoring thread
        for i in range(100):  # start doing our main function we would like to monitor
            print("{} {}".format(self.name, i))
            if i % 5 == 0:  # every 5 steps show status of progress
                self.status_queue.put(i)  # send status to Queue
            time.sleep(0.1)
        self.status_queue.put("STOP")  # stop Queue
        # th.join()

if __name__ == "__main__":
    q = Queue()
    th = Thread(target=ask_queue, args=(q,), )  # monitoring thread
    th.start()  # start monitoring thread
    # tests
    x = MyClass("Maria", q)
    x.my_run()
    th.join()
Thanks to all!!
Thanks for the interesting question. Typically you do not need a separate thread to report status for a case like this; you can just print the status in the method you would like to monitor. But for training purposes you can solve the issue the following way; please follow the comments and feel free to ask:
import time
from threading import Thread
from queue import Queue

class MyClass:
    """My example class"""
    def __init__(self, name, status_queue):
        self.name = name
        self.status_queue = status_queue

    @staticmethod
    def ask_queue(q):
        """
        The function to control status of our status display thread
        q - Queue - need to show status of task
        """
        while True:
            x = q.get()  # take element from Queue
            if x == "STOP":
                break
            print("Process completed in {} percents".format(x))
        print("100% finished")

    def my_run(self):
        """
        The function we would like to monitor
        """
        th = Thread(target=MyClass.ask_queue, args=(self.status_queue,), )  # monitoring thread
        th.start()  # start monitoring thread
        for i in range(100):  # start doing our main function we would like to monitor
            print("{} {}".format(self.name, i))
            if i % 5 == 0:  # every 5 steps show status of progress
                self.status_queue.put(i)  # send status to Queue
            time.sleep(0.1)
        self.status_queue.put("STOP")  # stop Queue
        th.join()

if __name__ == "__main__":
    # tests
    x = MyClass("Maria", Queue())
    x.my_run()
    print("*" * 200)
    x.my_run()
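If no separate monitoring thread is needed at all, the wrapper/callback idea from the question is even simpler; a minimal sketch reusing the Car example from the question (progress_callback is an added, hypothetical parameter):

import time

class Car(object):
    def __init__(self, sleep_time_in_seconds, miles_to_drive, progress_callback=None):
        self.sleep_time_in_seconds = sleep_time_in_seconds
        self.miles_to_drive = miles_to_drive
        self.progress_callback = progress_callback

    def drive(self):
        for mile in range(self.miles_to_drive):
            print('driving mile #{}'.format(mile))
            time.sleep(self.sleep_time_in_seconds)
            if self.progress_callback is not None:
                self.progress_callback(mile + 1)   # notify the caller after every mile

total_miles = 5

def print_driven_distance_in_percent(driven_miles):
    print("Driven distance: {}%".format(100 * driven_miles / total_miles))

car = Car(sleep_time_in_seconds=2, miles_to_drive=total_miles,
          progress_callback=print_driven_distance_in_percent)
car.drive()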

Python Threading Flask

I'm creating a platform using Flask+Python where each logged-in user can add several social media accounts to be used for analysis. Each account added starts a new thread, which is then saved into a dict keyed by the account name (key = account name, value = the thread). HOWEVER, when I do a hard refresh/reload on the homepage, the dict that references all the threads gets reset and returns None, so I can no longer make modifications, like getting variable values from the threads or calling the kill function to end them. I've been looking everywhere on this site and I can't find the solution.
I've simplified my code:
class threads_manager(threading.Thread):
    def __init__(self):
        super(threads_manager, self).__init__()
        self.cancelled = False
        self.accounts = {}
        self.timeout = 30  # (added) countdown used in run(); roughly 1 minute at 2 s per step

    def is_loggedin(self, username):
        if username in self.accounts:
            return self.accounts[username].logged_in()

    def add_account(self, usname, uspass, PROXY):
        # AddAccount runs on a separate thread that is managed by this thread
        acc = AddAccount(usname, uspass, '')
        acc.start()
        self.accounts.update({usname: acc})

    def kill_account(self, username):
        if self.accounts[username] is not None:
            self.accounts[username].cancel()
            return True
        else:
            return False

    def run(self):
        # make sure that the thread kills itself after 1 min as I can't
        # get access to it later
        while not self.cancelled:
            if self.timeout > 0:
                time.sleep(2)
                self.timeout -= 1
            else:
                self.cancelled = True

# create new instance as global variable
manager = threads_manager()

@app.route('/addAccounts/')
def addAccounts():
    global manager
    # for displaying the accounts that are already stored in the server
    # shows the names of the running threads (ajax call)
    if request.args.get('command') == 'GETACCOUNTS':
        tnames = []
        for t in threading.enumerate():
            tnames.append(t.name)
        # prints the names as a flash/toast
        return jsonify({'success': tnames})
    if request.args.get('command') == 'ADDACCOUNT':
        # ajax call
        manager.add_account('username', 'password', 'proxy')
        return jsonify({'success': True})
    else:
        # when reloading the website
        return render_template('addAccounts.html')
Any help, ideas?
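One thing worth checking (an assumption, not a confirmed diagnosis): the Flask debug reloader and multi-worker deployments (gunicorn, uwsgi) each hold their own copy of module-level objects such as manager, so the dict of running threads can look empty from one request to the next if consecutive requests are served by different processes. A hypothetical diagnostic route to see which process and which manager instance handles each request:

import os
import threading

@app.route('/debugThreads/')
def debugThreads():
    # if pid or manager_id changes between requests, module-level state
    # is not shared between the processes serving those requests
    return jsonify({
        'pid': os.getpid(),
        'manager_id': id(manager),
        'threads': [t.name for t in threading.enumerate()],
    })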

Converting web.asynchronous code to gen.coroutine in tornado

I want to convert my current Tornado app from using @web.asynchronous to @gen.coroutine. My asynchronous callback is called when a particular variable change happens on an IOLoop iteration. The current example in the Tornado docs solves an I/O problem, but in my case it's a variable change that I am interested in. I want the coroutine to wake up on that change. My app looks like the code shown below.
Note: I can only use Python 2.
# A transaction is a DB change that can happen
# from another process
class Transaction:
    def __init__(self):
        self.status = 'INCOMPLETE'
        self.callback = None

# In this, I am checking the status of the DB
# before responding to the GET request
class MainHandler(web.RequestHandler):
    def initialize(self, app_reference):
        self.app_reference = app_reference

    @web.asynchronous
    def get(self):
        txn = Transaction()
        callback = functools.partial(self.do_something)
        txn.callback = callback
        self.app_reference.monitor_transaction(txn)

    def do_something(self):
        self.write("Finished GET request")
        self.finish()

# MyApp monitors a list of transactions and adds the callback
# 'transaction.callback' when a transaction's status changes to
# the COMPLETE state.
class MyApp(Application):
    def __init__(self, settings):
        self.settings = settings
        self._url_patterns = self._get_url_patterns()
        self.txn_list = []  # list of all transactions being monitored
        Application.__init__(self, self._url_patterns, **self.settings)
        IOLoop.current().add_callback(self.check_status)

    def monitor_transaction(self, txn):
        self.txn_list.append(txn)

    def check_status(self):
        count = 0
        for transaction in self.txn_list:
            transaction.status = is_transaction_complete()
            if transaction.status == 'COMPLETE':
                IOLoop.current().add_callback(transaction.callback)
                self.txn_list.pop(count)
            count += 1
        if len(self.txn_list):
            IOLoop.current().add_callback(self.check_status)

    # adds 'self' to url_patterns
    def _get_url_patterns(self):
        from urls import url_patterns
        modified_url_patterns = []
        for url in url_patterns:
            modified_url_patterns.append(url + ({'app_reference': self},))
        return modified_url_patterns
If I understand correctly, to write this using gen.coroutine, the get method should be modified as:
@gen.coroutine
def get(self):
    txn = Transaction()
    response = yield wake_up_when_transaction_completes()
    # respond to GET here
My issue is that I am not sure how to wake the coroutine only when the status changes, and I cannot use a loop, as it would block the Tornado thread. Basically, I want to notify the coroutine from the IOLoop iteration:
def check_status():
    for transaction in txn_list:
        if transaction.status == 'COMPLETE':
            NOTIFY_COROUTINE
Sounds like a job for the new tornado.locks! Released last week with Tornado 4.2:
http://tornado.readthedocs.org/en/latest/releases/v4.2.0.html#new-modules-tornado-locks-and-tornado-queues
Use an Event for this:
from tornado import locks, gen

event = locks.Event()

@gen.coroutine
def waiter():
    print("Waiting for event")
    yield event.wait()
    print("Done")

@gen.coroutine
def setter():
    print("About to set the event")
    event.set()
More info on the Event interface:
http://tornado.readthedocs.org/en/latest/locks.html#tornado.locks.Event
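Applied to the transaction example above, a rough sketch (assuming the rest of MyApp stays as in the question) is to give each Transaction its own Event, wait on it in the handler, and set it from check_status instead of scheduling a callback:

from tornado import gen, locks, web

class Transaction(object):
    def __init__(self):
        self.status = 'INCOMPLETE'
        self.event = locks.Event()

class MainHandler(web.RequestHandler):
    def initialize(self, app_reference):
        self.app_reference = app_reference

    @gen.coroutine
    def get(self):
        txn = Transaction()
        self.app_reference.monitor_transaction(txn)
        yield txn.event.wait()          # suspends only this coroutine, not the IOLoop
        self.write("Finished GET request")

# inside MyApp.check_status, instead of add_callback(transaction.callback):
#     if transaction.status == 'COMPLETE':
#         transaction.event.set()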

Shared state in multiprocessing Processes

Please consider this code:
import time
from multiprocessing import Process

class Host(object):
    def __init__(self):
        self.id = None

    def callback(self):
        print "self.id = %s" % self.id

    def bind(self, event_source):
        event_source.callback = self.callback

class Event(object):
    def __init__(self):
        self.callback = None

    def trigger(self):
        self.callback()

h = Host()
h.id = "A"
e = Event()
h.bind(e)
e.trigger()

def delayed_trigger(f, delay):
    time.sleep(delay)
    f()

p = Process(target = delayed_trigger, args = (e.trigger, 3,))
p.start()
h.id = "B"
e.trigger()
This gives as output:
self.id = A
self.id = B
self.id = A
However, I expected it to give
self.id = A
self.id = B
self.id = B
...because h.id had already been changed to "B" by the time the trigger method was called.
It seems that a copy of the host instance is created at the moment the separate Process is started, so changes to the original host do not influence that copy.
In my project (more elaborate, of course), the host instance fields are altered from time to time, and it is important that the events triggered by code running in a separate process have access to those changes.
multiprocessing runs your code in separate processes. Things are necessarily copied as they are sent to the child process, because sharing anything between processes requires shared memory or explicit communication.
In fact, if you peruse the module, you can see the amount of effort it takes to actually share anything between the processes after they diverge, either through explicit communication or through explicitly shared objects (which cover only a very limited subset of the language and have to be managed by a Manager).
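For completeness, those "explicitly shared objects" look roughly like this: a sketch (kept compatible with the Python 2 code in the question) where the mutable state lives in a Manager dict, so a change made after p.start() is visible in the child:

import time
from multiprocessing import Process, Manager

def delayed_trigger(shared, delay):
    time.sleep(delay)
    # reads go through the Manager's server process, so the value set by
    # the parent after start() is what gets printed here
    print("shared id = %s" % shared['id'])

if __name__ == '__main__':
    manager = Manager()
    shared = manager.dict()
    shared['id'] = "A"
    p = Process(target=delayed_trigger, args=(shared, 3))
    p.start()
    shared['id'] = "B"
    p.join()   # prints "shared id = B"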
