Is there a way to track whether a thread is running by using some kind of ID/reference to the thread object, so that I would know if the thread is really running at a specific time?
I have functionality that starts manufacturing processes in the application in threaded mode. If there are no server restarts and nothing goes wrong, everything completes normally in the background.
But if, for example, the server is restarted while any of those threads are running, the threads are killed, and the production of some order gets stuck in the running state, because that state is only changed after the thread completes.
I was thinking of a scheduler that would check those production orders; if it cannot find a related thread for a running production order, it assumes the thread is dead and the order has to be restarted.
But how can I track this properly?
I have this code:
from threading import Thread

from odoo import api, fields, sql_db  # Odoo-style imports; adjust to your version


def action_produce_threaded(self):
    thread = Thread(target=self.action_produce_thread)
    thread.start()
    return {'type': 'ir.actions.act_window_close'}


def action_produce_thread(self):
    """Threaded method to start the job in the background."""
    # Create a new database cursor.
    new_cr = sql_db.db_connect(self._cr.dbname).cursor()
    context = self._context
    job = None
    with api.Environment.manage():
        # Create a new environment on the newly created cursor.
        # Here we don't have a valid self.env, so we can safely
        # assign the new env to self.
        new_env = api.Environment(new_cr, self._uid, context)
        Jobs = self.with_env(new_env).env['mrp.job']
        try:
            # Create a new job and commit it.
            # This commit is required so we know the process has started.
            job = Jobs.create({
                'production_id': context.get('active_id'),
                'state': 'running',
                'begin_date': fields.Datetime.now(),
            })
            new_cr.commit()
            # Now call the base method `do_produce` on the new cursor.
            self.with_env(new_env).do_produce()
            # When the job is done, update state and end_date.
            job.write({
                'state': 'done',
                'end_date': fields.Datetime.now(),
            })
        except Exception as e:
            # Something went wrong: write the exception to the job record
            # and commit it so the failure is visible.
            # If the job record was never created, roll back all changes.
            if job:
                job.write({
                    'state': 'exception',
                    'exception': str(e),
                })
                new_cr.commit()
            else:
                new_cr.rollback()
        finally:
            # Commit any remaining changes and close the cursor.
            new_cr.commit()
            new_cr.close()
So now, at the point where the job is created, it can get stuck when something goes wrong. It will stay in the 'running' state, because it will never be updated in the database again.
Should I use some singleton class that tracks threads through their lifetime, so that a cron job run periodically could check it and decide which threads are really running and which were killed unexpectedly?
P.S. There is probably some good practice for doing this; if so, please advise.
I was able to solve this by writing a singleton class. It tracks live threads, and if the server is restarted, all references to live threads disappear and I won't find them anymore (only newly started ones will be found). So I will know for sure which threads are dead and can safely be restarted, and which cannot.
I don't know if this is the most optimal way to solve such a problem, but here it is:
class ObjectTracker(object):
    """Singleton class to track currently live objects."""

    class __ObjectTracker:
        objects = {}

        @classmethod
        def add_object(cls, resource, obj):
            """Add an object and the resource it belongs to into the class dict."""
            cls.objects[resource] = obj

        @classmethod
        def get_object(cls, resource):
            """Get an object using resource as identifier."""
            return cls.objects.get(resource)

        @classmethod
        def pop_object(cls, resource):
            """Pop an object if it exists."""
            return cls.objects.pop(resource, None)

    instance = None

    def __new__(cls):
        """Instantiate only once."""
        if not ObjectTracker.instance:
            ObjectTracker.instance = ObjectTracker.__ObjectTracker()
        return ObjectTracker.instance

    def __getattr__(self, name):
        """Return attribute from the singleton instance."""
        return getattr(self.instance, name)

    def __setattr__(self, name, value):
        """Set attribute on the singleton instance."""
        return setattr(self.instance, name, value)
P.S. This was used as the starting example for my singleton: http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Singleton.html#id4
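For reference, here is a hedged sketch of how the tracker could be wired in; the cron method name and its logic are assumptions, not part of the code above. The threaded method registers the running thread under the production order ID, and a periodically run cron method looks the thread up and checks is_alive() to spot 'running' jobs that were orphaned by a restart.

def action_produce_threaded(self):
    thread = Thread(target=self.action_produce_thread)
    # Register the thread under the production order ID before starting it.
    ObjectTracker().add_object(self._context.get('active_id'), thread)
    thread.start()
    return {'type': 'ir.actions.act_window_close'}


def _cron_check_stuck_jobs(self):
    """Hypothetical cron method on mrp.job: flag jobs whose thread is gone."""
    for job in self.search([('state', '=', 'running')]):
        thread = ObjectTracker().get_object(job.production_id.id)
        if thread is None or not thread.is_alive():
            # No live thread found (e.g. after a server restart), so the job
            # can be marked and its production safely restarted.
            job.write({
                'state': 'exception',
                'exception': 'No live thread found; server probably restarted.',
            })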
Related
My aim is to provide a web framework with access to a Pyro daemon that does its time-consuming work on first load. So far, I have managed to keep in memory (outside of the web app) a single instance of a class that takes care of the time-consuming loading at its initialization. I can also query it from my web app. The code for the daemon is:
import Pyro4


@Pyro4.expose
@Pyro4.behavior(instance_mode='single')
class Store(object):
    def __init__(self):
        self._store = ...  # the expensive loading

    def query_store(self, query):
        return ...  # Useful query tool to expose to the web framework.
                    # Not time consuming, provided self._store is loaded.


with Pyro4.Daemon() as daemon:
    uri = daemon.register(Store)
    with Pyro4.locateNS() as ns:
        ns.register('store', uri)
    daemon.requestLoop()
The issue I am having is that although a single instance is created, it is only created at the first proxy query from the web app. This is normal behavior according to the doc, but not what I want, since that first query is still slow because of the initialization of Store.
How can I make sure the instance is already created as soon as the daemon is started?
I was thinking of creating a proxy instance of Store in the code of the daemon, but this is tricky because the event loop must be running.
EDIT
It turns out that daemon.register() can accept either a class or an object, which could be a solution. This is, however, not recommended in the doc (link above), and that feature apparently only exists for backwards compatibility.
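For reference, a minimal sketch of that (not recommended) variant, reusing the Store class above: registering an already constructed object means the expensive __init__ runs before requestLoop() starts, so the first proxy call is no longer slow.

import Pyro4

with Pyro4.Daemon() as daemon:
    store = Store()                # expensive loading happens here, up front
    uri = daemon.register(store)   # register the pre-built object instead of the class
    with Pyro4.locateNS() as ns:
        ns.register('store', uri)
    daemon.requestLoop()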
Do whatever initialization you need outside of your Pyro code. Cache it somewhere. Use the instance_creator parameter of the @behavior decorator for maximum control over how and when an instance is created. You can even consider pre-creating server instances yourself and retrieving one from a pool if you so desire. Anyway, one possible way to do this is like so:
import Pyro4


def slow_initialization():
    print("initializing stuff...")
    import time
    time.sleep(4)
    print("stuff is initialized!")
    return {"initialized stuff": 42}


cached_initialized_stuff = slow_initialization()


def instance_creator(cls):
    print("(Pyro is asking for a server instance! Creating one!)")
    return cls(cached_initialized_stuff)


@Pyro4.behavior(instance_mode="percall", instance_creator=instance_creator)
class Server:
    def __init__(self, init_stuff):
        self.init_stuff = init_stuff

    @Pyro4.expose
    def work(self):
        print("server: init stuff is:", self.init_stuff)
        return self.init_stuff


Pyro4.Daemon.serveSimple({
    Server: "test.server"
})
But this complexity is not needed for your scenario: just initialize the thing (that takes a long time) and cache it somewhere. Instead of re-initializing it every time a new server object is created, just refer to the cached pre-initialized result. Something like this:
import Pyro4


def slow_initialization():
    print("initializing stuff...")
    import time
    time.sleep(4)
    print("stuff is initialized!")
    return {"initialized stuff": 42}


cached_initialized_stuff = slow_initialization()


@Pyro4.behavior(instance_mode="percall")
class Server:
    def __init__(self):
        self.init_stuff = cached_initialized_stuff

    @Pyro4.expose
    def work(self):
        print("server: init stuff is:", self.init_stuff)
        return self.init_stuff


Pyro4.Daemon.serveSimple({
    Server: "test.server"
})
I'm trying to cache a large resource file and share it among tasks using Celery 4.0.2.
Looking at the documentation, I found the part about caching resources in tasks:
http://docs.celeryproject.org/en/latest/userguide/tasks.html#instantiation
This can also be useful to cache resources, for example, a base Task class that caches a database connection:
from celery import Task


class DatabaseTask(Task):
    _db = None

    @property
    def db(self):
        if self._db is None:
            self._db = Database.connect()
        return self._db
In my case I have made similar changes to cache my big resource file. The object is shared among the tasks, but the memory used by the big resource stays cached in the task forever.
from celery import Task


class BigResourceTask(Task):
    _resource = None

    @property
    def resource(self):
        if self._resource is None:
            self._resource = load_big_resource()
        return self._resource
How can I free that memory or make it expire after the execution of all the related tasks?
Since you are creating _resource on demand after checking whether it exists, you can simply delete it whenever you want:
# complete all the related tasks first
# The property caches the resource on the task instance, so delete it there
# (big_resource_task stands for the registered task instance using this class).
del big_resource_task._resource   # free memory; lookup falls back to the class-level None
# do something else
r = big_resource_task.resource    # re-created on demand when needed again
I'm trying to create some celery tasks as classes, but am having some difficulty. The classes are:
class BaseCeleryTask(app.Task):

    def is_complete(self):
        """Default method for checking if the celery task has completed."""
        # Simply return result (since by default tasks return a boolean
        # indicating completion).
        try:
            return self.result
        except AttributeError:
            logger.error('Result not defined. Make sure task has run!')
            return False


class MacroReportTask(BaseCeleryTask):

    def run(self, params):
        """Override the default run method with the signal factory run."""
        # Hold on to the factory.
        process = MacroCountryReport(params)
        self.result = process.run()
        return self.result
but when I initialize the app and check app.tasks (or run the worker), the app doesn't seem to have the above tasks in its registry. Other function-based tasks (using the app.task() decorator) are registered fine.
I run the above task as:
process = MacroReportTask()
process.delay(params)
Celery worker errors with the following message:
Received unregistered task of type None.
I think the issue I'm having is: how do I add custom classes to the task registry as I do with regular function based tasks?
I ran into the exact same issue; it took hours to find the solution because I'm 90% sure it's a bug. In your task classes, try the following:
class BaseCeleryTask(app.Task):
    def __init__(self):
        self.name = "[modulename].BaseCeleryTask"


class MacroReportTask(app.Task):
    def __init__(self):
        self.name = "[modulename].MacroReportTask"
It seems registering it with the app still has a bug where the name isn't automatically configured. Let me know if that works.
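If setting the name alone doesn't do it, note that Celery 4 dropped the old metaclass-based auto-registration of Task subclasses, and as far as I know a class-based task can be registered explicitly with app.register_task(). A hedged sketch, assuming app is your Celery application from the code above:

# Sketch: explicit registration of a class-based task in Celery 4.x.
report_task = MacroReportTask()
app.register_task(report_task)   # it should now appear in app.tasks under its name

# Then call it through the registered instance:
report_task.delay(params)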
I want to run a function when instances of the Post model are committed. I want to run it any time they are committed, so I'd rather not explicitly call the function everywhere. How can I do this?
def notify_subscribers(post):
    """Send email to subscribers."""
    ...


post = Post("Hello World", "This is my first blog entry.")
session.commit()  # How to run notify_subscribers with post as argument
                  # as soon as post is committed successfully?

post.title = "Hello World!!1"
session.commit()  # Run notify_subscribers once again.
No matter which option you choose below, SQLAlchemy comes with a big warning about the after_commit event (which is when both approaches send the signal):
The Session is not in an active transaction when the after_commit() event is invoked, and therefore can not emit SQL.
If your callback needs to query or commit to the database, it may have unexpected issues. In this case, you could use a task queue such as Celery to execute this in a background thread (with a separate session). This is probably the right way to go anyway, since sending emails takes a long time and you don't want your view to wait to return while it's happening.
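For illustration, a hedged sketch of that hand-off; the Celery app (celery_app) and the task name are assumptions, not part of Flask-SQLAlchemy. The commit hook only enqueues a job with the post IDs, and the emails (plus any further queries) happen in a worker with its own session.

# Sketch only: assumes a configured Celery app called `celery_app` and the Post model.
@celery_app.task
def send_new_post_emails(post_ids):
    # Runs in a worker process with its own database session, so it may query freely.
    posts = Post.query.filter(Post.id.in_(post_ids)).all()
    for post in posts:
        ...  # build and send the email here

# Inside the commit callback, only enqueue the job (no SQL is emitted there):
#     send_new_post_emails.delay([post.id for post in new_posts])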
Flask-SQLAlchemy provides a signal you can listen to that sends all the insert/update/delete ops. It needs to be enabled by setting app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = True because tracking modifications is expensive and not needed in most cases.
Then listen for the signal:
from flask_sqlalchemy import models_committed


def notify_subscribers(app, changes):
    new_posts = [target for target, op in changes
                 if isinstance(target, Post) and op in ('insert', 'update')]
    # notify about the new and updated posts


models_committed.connect(notify_subscribers, app)
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = True
You can also implement this yourself (mostly by copying the code from Flask-SQLAlchemy). It's slightly tricky, because model changes occur on flush, not on commit, so you need to record all changes as flushes occur, then use them after the commit.
from sqlalchemy import event, inspect


class ModelChangeEvent(object):
    def __init__(self, session, *callbacks):
        self.model_changes = {}
        self.callbacks = list(callbacks)

        event.listen(session, 'before_flush', self.record_ops)
        event.listen(session, 'before_commit', self.record_ops)
        event.listen(session, 'after_commit', self.after_commit)
        event.listen(session, 'after_rollback', self.after_rollback)

    def record_ops(self, session, flush_context=None, instances=None):
        for targets, operation in ((session.new, 'insert'), (session.dirty, 'update'), (session.deleted, 'delete')):
            for target in targets:
                state = inspect(target)
                key = state.identity_key if state.has_identity else id(target)
                self.model_changes[key] = (target, operation)

    def after_commit(self, session):
        if self.model_changes:
            changes = list(self.model_changes.values())

            for callback in self.callbacks:
                callback(changes=changes)

            self.model_changes.clear()

    def after_rollback(self, session):
        self.model_changes.clear()


def notify_subscribers(changes):
    new_posts = [target for target, op in changes
                 if isinstance(target, Post) and op in ('insert', 'update')]
    # notify about new and updated posts


# pass all the callbacks (if you have more than notify_subscribers)
mce = ModelChangeEvent(db.session, notify_subscribers)
# or you can append more callbacks
mce.callbacks.append(my_other_callback)
I have a UUT class which instantiates Worker objects, and calls their do_stuff() method.
The Worker objects use a Provider object for two things:
Calling methods on the provider object to do some stuff
Getting notifications from the provider by subscribing a method to the provider's events
When a worker gets a notification, it processes it and notifies the UUT object, which in response can create more Worker objects.
I've already tested each class on its own, and I want to test UUT + Worker together. For that, I intend to mock out Provider.
import mock
import unittest

import resource  # the provider module that exposes `default_resource`


class Worker():
    def __init__(self, *args):
        # I'm going to patch 'resource.default_resource'
        self.resource = resource.default_resource
        self.resource.subscribe('on_spam', self._on_spam)

    def do_stuff(self):
        self.resource.do_stuff()

    def _on_spam(self, message):
        self._tell_uut_to_create_more_workers(message['num_of_new_workers_to_create'])


class UUT():
    def __init__(self, *args):
        self._workers = []

    def gen_worker_and_do_stuff(self, *args):
        worker = Worker(*args)
        self._workers.append(worker)
        worker.do_stuff()


class TestCase1(unittest.TestCase):
    @mock.patch('resource.default_resource', spec_set=resource.Resource)
    def test_1(self, mock_resource):
        uut = UUT()
        uut.gen_worker_and_do_stuff('Egg')  # <-- say I automagically grabbed the resulting Worker into self.workers
        self.workers[0]._on_spam({'num_of_new_workers_to_create': 5})  # <-- I also want to get hold of the newly-created workers
Is there a way to grab the worker objects generated by uut, without directly accessing the _workers list in uut (which is an implementation detail)?
I guess I can do it in Worker.__init__, where the worker subscribes to provider events, so I guess the question reduces to:
How do I extract the self of the callee when resource.default_resource.subscribe('on_spam', self._on_spam) is called?
As an application of the Dependency Inversion principle, I'd pass the Worker class as a dependency to UUT:
class UUT():
    def __init__(self, make_worker=Worker):
        self._workers = []
        self._make_worker = make_worker

    def gen_worker_and_connect(self, *args):
        worker = self._make_worker(*args)
        self._workers.append(worker)
        worker.connect()
Then provide anything you want from the test instead of Worker. That factory function can share the created objects with the test scope. Besides solving this particular problem, it also makes the dependency explicit and independent of the UUT implementation. And you would not need to mock the resource either, which would otherwise make the test depend on things unrelated to the class under test.
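For illustration, a minimal sketch of what that could look like in a test; the recording factory and the FakeWorker stand-in are assumptions, not code from above. The test passes its own factory to UUT, so it can reach every created worker without touching the private _workers list.

import unittest


class FakeWorker:
    """Stand-in for Worker so the test doesn't need the real resource."""
    def __init__(self, *args):
        self.args = args
        self.connected = False

    def connect(self):
        self.connected = True


class TestUUT(unittest.TestCase):
    def test_workers_are_created_and_connected(self):
        created = []

        def make_worker(*args):
            worker = FakeWorker(*args)
            created.append(worker)   # the test keeps its own reference
            return worker

        uut = UUT(make_worker=make_worker)
        uut.gen_worker_and_connect('Egg')

        self.assertEqual(len(created), 1)
        self.assertTrue(created[0].connected)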