How to set up a SQLAlchemy session in Celery tasks with no global variable

Summary: I want to use a sqlalchemy session in celery tasks without having a global variable containing that session.
I am using SQLAlchemy in a project with Celery tasks.
Currently, I have a global variable 'session' defined alongside my Celery app setup (celery.py), with a worker signal to configure it.
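from celery.signals import celeryd_init
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

session = scoped_session(sessionmaker())

@celeryd_init.connect
def configure_workers(sender=None, conf=None, **kwargs):
    # load the application configuration
    # db_uri = conf['db_uri']
    engine = create_engine(db_uri)
    session.configure(bind=engine)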
In the module defining the tasks, I simply import 'session' and use it. Tasks are defined with a custom class that closes the session after returning:
class DBTask(Task):
    def after_return(self, *args, **kwargs):
        session.remove()
That works well, however: when unit testing with CELERY_ALWAYS_EAGER=True, the session won't be configured. The only solution I've found so far is to mock that 'session' variable when running a task in a unit test:
with mock.patch('celerymodule.tasks.session', self.session):
    do_something.delay(...)
While it works, I don't want to do that.
Is there any way to set up a session that is not a global variable and that works both for normal asynchronous execution and without workers, with CELERY_ALWAYS_EAGER=True?

The answer was right under my nose in the official documentation about custom task classes.
I modified the custom task class that I use for tasks accessing the database:
class DBTask(Task):
    _session = None

    def after_return(self, *args, **kwargs):
        if self._session is not None:
            self._session.remove()

    @property
    def session(self):
        if self._session is None:
            _, self._session = _get_engine_session(self.conf['db_uri'],
                                                   verbose=False)
        return self._session
I define my tasks this way:
@app.task(base=DBTask, bind=True)
def do_stuff_with_db(self, conf, some_arg):
    self.conf = conf
    thing = self.session.query(Thing).filter_by(arg=some_arg).first()
That way, the SQLAlchemy session will only be created once for each celery worker process, and I don't need any global variable.
This solves the problem with my unit tests, since the SQLAlchemy session setup is now independent of the Celery workers.
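To make the lazy-property idea easier to see in isolation, here is a minimal library-free sketch. FakeSession and LazySessionTask are hypothetical stand-ins (real code would subclass celery.Task and build a scoped_session), so treat this as an illustration of the pattern rather than the answer's actual implementation.

```python
class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy scoped_session."""

    def __init__(self, db_uri):
        self.db_uri = db_uri
        self.removed = 0

    def remove(self):
        # The real scoped_session.remove() discards the current session;
        # this stand-in only counts how often it was called.
        self.removed += 1


class LazySessionTask:
    """Stand-in for a celery.Task subclass; not the real Celery class."""

    _session = None
    created = 0  # how many session objects were built in this process

    def __init__(self, db_uri):
        self.db_uri = db_uri

    @property
    def session(self):
        # Build the session on first access only, then reuse it.
        if self._session is None:
            type(self).created += 1
            self._session = FakeSession(self.db_uri)
        return self._session

    def after_return(self, *args, **kwargs):
        # Clean up only if a session was ever created.
        if self._session is not None:
            self._session.remove()


task = LazySessionTask("sqlite://")
first = task.session
task.after_return()
second = task.session
```

Because nothing is configured at import time or by a worker signal, the same object works whether the task body runs in a worker or inline in a test.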

Related

Overriding the Celery Task Class

I am trying to implement a setup where global variables are shared between two different Celery tasks. For that, I have subclassed Task and used properties. As per the Celery documentation, the base class is initialized when a new task is invoked. Is there an approach where we can reuse the object between tasks? Can we override the run() method of Task? If we override run(), how do we register the task with Celery 5.x? I tried serializing the object. Any alternative approach would be appreciated.
class handler(Task):
    def __init__(self):
        self.base_obj = ""

    @property
    def global_handler(self):
        return self.global_thread_handler

    @property
    def base_handler(self):
        return self.base_obj

@app.task(base=handler)
def test123():
    test123.base_handler = cls1()

@app.task(base=handler)
def test456():
    test456.base_handler.method()

Registering a task in Celery can be simply done using something like this:
# my_app/tasks.py
import celery
from my_app.celery import app

class MyTask(celery.Task):
    def run(self):
        [...]

MyTask = app.register_task(MyTask())
I think there's no way you can reuse the objects within tasks. Can someone correct me on this?

How to create a singleton object in Flask micro framework

I am creating a class for Producer which pushes messages to RabbitMQ. It makes use of pika module.
I would like to create a handler so that I have control over the number of connections that interact with RabbitMQ.
Is there a way to add this to app_context and refer to it later, or is there a way to use init_app to define this handler?
Any code snippet would be really helpful.

In Python, the singleton pattern is not needed in most cases, because a Python module is essentially a singleton. But you can use it anyway.
class Singleton(object):
    _instance = None

    def __init__(self):
        # Direct construction is not allowed; use Singleton.instance()
        raise RuntimeError('call instance()')

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls.__new__(cls)
            # more init operations here
        return cls._instance
To use a Flask (or any other web framework) app as a singleton, try something like this:
from flask import Flask

class AppContext(object):
    _app = None

    def __init__(self):
        # Direct construction is not allowed; use AppContext.app()
        raise RuntimeError('call app()')

    @classmethod
    def app(cls):
        if cls._app is None:
            cls._app = Flask(__name__)
            # more init operations here
        return cls._app

app = AppContext.app()  # can be called as many times as you want
Or inherit from the Flask class and make the subclass itself a singleton.
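For the producer in the question, the "a module is already a singleton" idea could be sketched with a cached module-level factory. Producer and get_producer are hypothetical names, and this stand-in only records messages instead of talking to RabbitMQ through pika, so it is an illustration under those assumptions rather than a working client.

```python
from functools import lru_cache


class Producer:
    """Hypothetical stand-in for a pika-based RabbitMQ producer."""

    def __init__(self, url):
        self.url = url
        self.sent = []

    def publish(self, message):
        # A real producer would push to RabbitMQ; this one only records.
        self.sent.append(message)


@lru_cache(maxsize=None)
def get_producer(url="amqp://localhost"):
    # lru_cache hands back the same Producer for the same arguments,
    # so each process ends up with one producer per distinct url.
    return Producer(url)


get_producer().publish("hello")
get_producer().publish("world")
```

Any module that imports get_producer shares the same instance within a process, which keeps the number of connections under control without a dedicated singleton class.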

In a Flask App, where should Celery be instantiated?

I have a Flask app, which is a very basic app with a POST handler and some DB insertions. The DB insertions are set as tasks using Celery. If I put my Celery instance creation and tasks definition in tasks.py file, and call the functions from my main.py file (which also has the Flask app creation), I get an out of context error. The tasks in the tasks.py file in turn call a DB class that does the DB insertions. How do I properly create the Celery instance and make sure it has the Flask context?
This is roughly what the structure looks like:
main.py = Flask app creation, routes handling and tasks.delay calls.
tasks.py = Celery instance creation and task definitions.
DB = Inserts.
I want everything to work in the same context.

The Flask docs suggest subclassing Celery's Task class and wrapping task execution in a Flask app context. So in tasks.py, if your Flask app instance is named app and your Celery instance is named celery, you would replace celery's Task attribute with the new subclass:
TaskBase = celery.Task

class ContextTask(TaskBase):
    abstract = True

    def __call__(self, *args, **kwargs):
        with app.app_context():
            return TaskBase.__call__(self, *args, **kwargs)

celery.Task = ContextTask

pushing celery task from flask view detach SQLAlchemy instances (DetachedInstanceError)

I am using SQLAlchemy models (derived from sqlalchemy.ext.declarative.declarative_base) together with Flask-SQLAlchemy.
When I try to run any Celery task (even an empty one)
@celery.task()
def empty_task():
    pass
in an ordinary Flask view
@blueprint.route(...)
def view():
    image = Image(...)
    db.session.add(image)
    db.session.flush()
    # this causes the later error
    empty_task()
    # now accessing attributes ends with DetachedInstanceError
    return jsonify({'name': image.name, ...})
I get
DetachedInstanceError: Instance <Image at 0x7f6d67e37b50> is not bound to a Session; attribute refresh operation cannot proceed
when I try to access the model after pushing the task. Without the task it works fine. How can I fix it?
Update:
Celery uses this task base:
TaskBase = celery.Task

class ContextTask(TaskBase):
    abstract = True

    def __call__(self, *args, **kwargs):
        with app.app_context():
            try:
                return TaskBase.__call__(self, *args, **kwargs)
            except Exception:
                sentry.captureException()
                raise

celery.Task = ContextTask

Ah, my mistake was in how I ran the task. It should be
empty_task.apply_async()
Calling it directly creates a new app context with a new session, which causes the old one to be closed.

Today I had the same issue while I was running my nose tests.
DetachedInstanceError: Instance <EdTests at 0x1071c4790> is not bound to a Session; attribute refresh operation cannot proceed
I am using Celery and Flask SQLAlchemy.
The issue appeared when I changed this in my testing settings:
CELERY_ALWAYS_EAGER = True
I found that when Celery tasks run synchronously, the db session is closed at the end of the task.
I solved my issue by following the Celery user guide, which recommends not enabling eager mode when testing tasks.

How to use Flask-SQLAlchemy in a Celery task

I recently switched to Celery 3.0. Before that I was using Flask-Celery to integrate Celery with Flask. Although it had many issues, such as hiding some powerful Celery functionality, it allowed me to use the full context of the Flask app, and especially Flask-SQLAlchemy.
In my background tasks I process data and use the SQLAlchemy ORM to store it. The maintainer of Flask-Celery has dropped support for the plugin. The plugin pickled the Flask instance in the task so I could have full access to SQLAlchemy.
I am trying to replicate this behavior in my tasks.py file but with no success. Do you have any hints on how to achieve this?

Update: We've since started using a better way to handle application teardown and set up on a per-task basis, based on the pattern described in the more recent flask documentation.
extensions.py
import flask
from flask_sqlalchemy import SQLAlchemy  # flask.ext.* imports were removed in Flask 1.0
from celery import Celery

class FlaskCelery(Celery):
    def __init__(self, *args, **kwargs):
        super(FlaskCelery, self).__init__(*args, **kwargs)
        self.patch_task()
        if 'app' in kwargs:
            self.init_app(kwargs['app'])

    def patch_task(self):
        TaskBase = self.Task
        _celery = self

        class ContextTask(TaskBase):
            abstract = True

            def __call__(self, *args, **kwargs):
                # Reuse an existing app context, otherwise push one for this call
                if flask.has_app_context():
                    return TaskBase.__call__(self, *args, **kwargs)
                else:
                    with _celery.app.app_context():
                        return TaskBase.__call__(self, *args, **kwargs)

        self.Task = ContextTask

    def init_app(self, app):
        self.app = app
        self.config_from_object(app.config)

celery = FlaskCelery()
db = SQLAlchemy()
app.py
from flask import Flask
from extensions import celery, db

def create_app():
    app = Flask(__name__)
    # configure/initialize all your extensions
    db.init_app(app)
    celery.init_app(app)
    return app
Once you've set up your app this way, you can run and use celery without having to explicitly run it from within an application context, as all your tasks will automatically be run in an application context if necessary, and you don't have to explicitly worry about post-task teardown, which is an important issue to manage (see other responses below).
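The "push a context only if necessary" logic in ContextTask can be tried out without Flask or Celery. In the sketch below, has_app_context, app_context and ContextTask are simplified stand-ins written for illustration (a counter plays the role of Flask's context stack), so it shows the control flow only, not the real APIs.

```python
from contextlib import contextmanager

_context_depth = 0  # stand-in for Flask's application-context stack
pushes = []         # names of calls that had to push their own context


def has_app_context():
    return _context_depth > 0


@contextmanager
def app_context():
    global _context_depth
    _context_depth += 1
    try:
        yield
    finally:
        # Always pop, even if the task body raised.
        _context_depth -= 1


class ContextTask:
    """Wraps a function so it always runs inside a context."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        if has_app_context():
            # Already inside a context (e.g. called from a view): reuse it.
            return self.func(*args, **kwargs)
        with app_context():
            pushes.append(self.func.__name__)
            return self.func(*args, **kwargs)


@ContextTask
def add(x, y):
    assert has_app_context()
    return x + y


outside = add(1, 2)       # no context yet, so one is pushed
with app_context():
    inside = add(3, 4)    # context already active, so it is reused
```

Reusing an active context instead of pushing a nested one is what keeps a directly called task from tearing down the caller's session when it returns.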
Troubleshooting
If you keep getting "with _celery.app.app_context(): AttributeError: 'FlaskCelery' object has no attribute 'app'", make sure to:
Keep the celery import at the app.py file level. Avoid:
app.py
from flask import Flask

def create_app():
    app = Flask(__name__)
    initialize_extensions(app)
    return app

def initialize_extensions(app):
    from extensions import celery, db  # DOOMED! Keep celery import at the FILE level
    db.init_app(app)
    celery.init_app(app)
Start your Celery workers BEFORE you run Flask, and use
celery worker -A app:celery -l info -f celery.log
Note the app:celery, i.e. loading from app.py.
You can still import from extensions to decorate tasks, i.e. from extensions import celery.
Old answer below, still works, but not as clean a solution
I prefer to run all of celery within the application context by creating a separate file that invokes celery.start() with the application's context. This means your tasks file doesn't have to be littered with context setup and teardowns. It also lends itself well to the flask 'application factory' pattern.
extensions.py
from flask_sqlalchemy import SQLAlchemy  # flask.ext.* imports were removed in Flask 1.0
from celery import Celery

db = SQLAlchemy()
celery = Celery()
tasks.py
from extensions import celery, db
from flask.globals import current_app
from celery.signals import task_postrun

@celery.task
def do_some_stuff():
    current_app.logger.info("I have the application context")
    # you can now use the db object from extensions

@task_postrun.connect
def close_session(*args, **kwargs):
    # Flask SQLAlchemy will automatically create new sessions for you from
    # a scoped session factory, given that we are maintaining the same app
    # context, this ensures tasks have a fresh session (e.g. session errors
    # won't propagate across tasks)
    db.session.remove()
app.py
from flask import Flask
from extensions import celery, db

def create_app():
    app = Flask(__name__)
    # configure/initialize all your extensions
    db.init_app(app)
    celery.config_from_object(app.config)
    return app
RunCelery.py
from app import create_app
from extensions import celery

app = create_app()

if __name__ == '__main__':
    with app.app_context():
        celery.start()

In your tasks.py file do the following:
from celery import Celery
from main import create_app

app = create_app()
celery = Celery(__name__)
celery.add_defaults(lambda: app.config)

@celery.task
def create_facet(project_id, **kwargs):
    with app.test_request_context():
        # your code
        pass

I used Paul Gibbs' answer with two differences. Instead of task_postrun I used worker_process_init. And instead of .remove() I used db.session.expire_all().
I'm not 100% sure, but from what I understand the way this works is when Celery creates a worker process, all inherited/shared db sessions will be expired, and SQLAlchemy will create new sessions on demand unique to that worker process.
So far it seems to have fixed my problem. With Paul's solution, when one worker finished and removed the session, another worker using the same session was still running its query, so db.session.remove() closed the connection while it was being used, giving me a "Lost connection to MySQL server during query" exception.
Thanks Paul for steering me in the right direction!
Never mind, that didn't work. I ended up adding an argument to my Flask app factory so that db.init_app(app) is skipped when Celery is the caller. Instead, the workers call it after Celery forks them. I now see several connections in my MySQL processlist.
from extensions import db
from celery.signals import worker_process_init
from flask import current_app

@worker_process_init.connect
def celery_worker_init_db(**_):
    db.init_app(current_app)

from flask import Flask
from werkzeug.utils import import_string
from celery.signals import worker_process_init, celeryd_init
from flask_celery import Celery
from src.app import config_from_env, create_app

celery = Celery()

def get_celery_conf():
    config = import_string('src.settings')
    config = {k: getattr(config, k) for k in dir(config) if k.isupper()}
    config['BROKER_URL'] = config['CELERY_BROKER_URL']
    return config

@celeryd_init.connect
def init_celeryd(conf=None, **kwargs):
    conf.update(get_celery_conf())

@worker_process_init.connect
def init_celery_flask_app(**kwargs):
    app = create_app()
    app.app_context().push()
Update the Celery config at celeryd init.
Use your Flask app factory to initialize all Flask extensions, including the SQLAlchemy extension.
By doing this, we are able to maintain a database connection per worker.
If you want to run your task under flask context, you can subclass Task.__call__:
class SmartTask(Task):
    abstract = True

    def __call__(self, *_args, **_kwargs):
        with self.app.flask_app.app_context():
            with self.app.flask_app.test_request_context():
                result = super(SmartTask, self).__call__(*_args, **_kwargs)
        return result

class SmartCelery(Celery):
    def init_app(self, app):
        super(SmartCelery, self).init_app(app)
        self.Task = SmartTask
