The problem is that when I try to call the main task from Django's lpr/views.py, the page shows the loading icon and that's it; nothing else happens. There is no output in the Django or Celery console. When I run the task from the Python shell it runs without a problem and saves the result in the db. I added the add task for test purposes, and when I run it, it returns an error because of the missing 'y' argument, which is expected. But what is up with the main task?
Here is my code, just in case.
Project structure:
Project
├── acpvs
│ ├── celery.py
│ ├── __init__.py
│ ├── settings.py
│ ├── urls.py
│ └── wsgi.py
├── db.sqlite3
├── lpr
│ ├── __init__.py
│ ├── tasks.py
│ ├── urls.py
│ └── views.py
└── manage.py
settings.py
import djcelery

INSTALLED_APPS = [
    ...
    'djcelery',
    'django_celery_results',
]

CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'django-db'
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_ALWAYS_EAGER = False

djcelery.setup_loader()
acpvs/__init__.py
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from acpvs.celery import app as celery_app
__all__ = ['celery_app']
acpvs/celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'acpvs.settings')
app = Celery('acpvs')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
lpr/tasks.py
from __future__ import absolute_import, unicode_literals
from celery import shared_task
from djcelery import celery

@shared_task
def add(x, y):
    return x + y

@shared_task
def main():
    ...
    args = {
        'imageName': imageName,
        'flag': True
    }
    return args
lpr/urls.py
from django.conf.urls import url
from . import views
urlpatterns = [
    url(r'^t/$', views.test_add),
    url(r'^t1/$', views.test_main),
]
lpr/views.py
from . import tasks
from django.http import HttpResponse
def test_add(request):
    result = tasks.add.delay()
    return HttpResponse(result.task_id)

def test_main(request):
    result = tasks.main.delay()
    return HttpResponse(result.task_id)
Update
It seems to me that there is still something wrong with how I have integrated Celery. When I remove .delay() from views.py it works, but of course not asynchronously and not using Celery.
delay() actually executes the task asynchronously. Please confirm whether it is updating the values in the db.
I think it will update the value in the db (if the main method does that), but it will not return the value, since the client only adds a message to the queue; the broker then delivers that message to a worker, which then performs the work of main.
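If you want to confirm that, you can inspect the AsyncResult that delay() returns instead of only echoing its id. A minimal sketch (state, task_id and get() are standard Celery AsyncResult attributes):

# lpr/views.py (sketch): poll the AsyncResult instead of only returning its id
from . import tasks
from django.http import HttpResponse

def test_main(request):
    result = tasks.main.delay()
    # state stays PENDING until a worker picks the task up;
    # result.get(timeout=10) would block until the result is stored
    return HttpResponse('task %s state: %s' % (result.task_id, result.state))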
So I got it working by removing all djcelery instances and upgrading Django from 1.11 to 2.0.3
By the way I'm using Celery 4.1.0
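For reference, a minimal sketch of what the settings end up looking like once djcelery is gone (django_celery_results stays, since it provides the django-db result backend; the values are the ones from the question):

# settings.py (sketch): Celery 4 config without djcelery
INSTALLED_APPS = [
    # ...
    'django_celery_results',
]

CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'django-db'
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
# no `import djcelery` and no djcelery.setup_loader(); those are Celery 3-era
# hooks and conflict with the Celery 4 app defined in acpvs/celery.py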
Related
Initial Notes: The project uses Blueprints and below are the file structure and extracts of the code used...
File Structure:
/app
├── flaskapp/
│ ├── posts/
│ │ ├── __init__.py
│ │ ├── forms.py
│ │ ├── routes.py
│ │ ├── utils.py
│ ├── users/
│ │ ├── __init__.py
│ │ ├── forms.py
│ │ ├── routes.py
│ │ ├── utils.py
│ ├── main/
│ │ ├── __init__.py
│ │ ├── crons.py
│ │ ├── routes.py
│ ├── templates/
│ │ ├── users.html
│ ├── __init__.py
│ ├── config.py
│ ├── models.py
├── run.py
posts/utils.py
# Function to get all posts from DB
def get_all_posts():
    post = Post.query.order_by(Post.id.asc())
    return post
users/routes.py
# Importing 'get_all_posts' function from 'posts/utils.py'
from flaskapp.posts.utils import get_all_posts

users = Blueprint('users', __name__)

# All Users Route + Related Posts
@posts.route("/posts", methods=['GET'])
@login_required
def all_users():
    users = User.query.order_by(User.id.asc())
    return render_template('users.html', USERS=users, POSTS=get_all_posts())
main/crons.py
# Importing 'get_all_posts' function from 'posts/utils.py'
from flaskapp.posts.utils import get_all_posts

# A function to be called using 'scheduler' from 'flaskapp/__init__.py' on launch
def list_expired_posts():
    posts = get_all_posts()
    for p in posts:
        if p.expired == 1:
            print(p.title)

scheduler = BackgroundScheduler()
scheduler.add_job(func=list_expired_posts, trigger="interval", seconds=60)
scheduler.start()

# Terminate Scheduler on APP termination
atexit.register(lambda: scheduler.shutdown())
flaskapp/__init__.py
# __init__.py Main
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_login import LoginManager
from flaskapp.config import Config

db = SQLAlchemy()
login_manager = LoginManager()
login_manager.login_view = 'users.login'
login_manager.login_message_category = 'warning'

# Create APP Function
def create_app(config_class=Config):
    app = Flask(__name__)
    # Import Configuration from Config File Class
    app.config.from_object(Config)
    db.init_app(app)
    login_manager.init_app(app)
    # Import Blueprint objects
    from flaskapp.posts.routes import posts
    from flaskapp.users.routes import users
    from flaskapp.main.routes import main
    # Register Blueprints
    app.register_blueprint(posts)
    app.register_blueprint(users)
    app.register_blueprint(main)
    # END
    return app

# Calling scheduler function 'list_expired_posts' FROM 'main/crons.py' as a
# scheduled job to be triggered on app initiation
from flaskapp.main.crons import list_expired_posts
list_expired_posts()
Explanation:
I have the function get_all_posts() located in posts/utils.py, which works fine when I import it and use it in another blueprint's routes.py file (e.g. users/routes.py) as shown above.
But I'm getting the below error when importing the same function in another blueprint (specifically main/crons.py) as shown above.
I'm trying to use the get_all_posts() function from posts/utils.py within list_expired_posts() in main/crons.py, and then call list_expired_posts() from flaskapp/__init__.py to trigger it on launch and keep executing it every 60 minutes.
Error:
RuntimeError: Working outside of application context.
This typically means that you attempted to use functionality that needed
the current application. To solve this, set up an application context
with app.app_context(). See the documentation for more information.
Conclusion Notes + Steps attempted:
I have already eliminated the Scheduler temporarily and tried working only with the function itself, not even calling it from flaskapp/__init__.py.
I have also tried moving the below code to the 'def create_app(config_class=Config)' section without any luck
from flaskapp.main.crons import list_expired_posts
list_expired_posts()
I have also tried creating a specific Blueprint for crons.py and registering it in my flaskapp/__init__.py, but still got the same result.
As a final outcome, I am trying to call the 'get_all_posts()' FROM 'posts/utils.py', then filter out the 'expired posts' using the 'list_expired_posts()' function FROM 'main/crons.py' and schedule it to print the title of the expired posts every 60 minutes.
Since I've eliminated the scheduler already to test out, I'm quite sure this is not a scheduler issue but some import mixup I'm not figuring out.
I am also aware that list_expired_posts() could instead become another function in posts/utils.py and be called directly from there using the scheduler, which I've also tried, but I keep getting the same error.
I also tried manually configuring the app's context as instructed in other posts but I keep getting the same error.
with app.app_context():
I'm not really a Python pro and I always try seeking multiple online resources prior to posting a question here, but it seems like I'm out of luck this time. Your help is truly appreciated.
There is no blueprint in crons.py, which is expected as it's about scheduling, not routing.
Not being in a route, you get the error that the application context is missing, because effectively everything not in a route does not have an application context.
You might want to look into Flask-APScheduler, which you'd initialise in a similar way to Flask-SQLAlchemy and then use in your crons.py, it should carry the application context transparently.
PS. In users/routes.py you define a users Blueprint, but then seem to use a posts blueprint in the route decorator.
OK, sorry for the delay, but here is a demo of using Flask-SQLAlchemy and Flask-APScheduler to access the app context from scheduled tasks.
from flask import Blueprint, Flask
from flask_apscheduler import APScheduler
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()
scheduler = APScheduler()
users = Blueprint("users", __name__)

class User(db.Model):
    id = db.Column(db.Integer(), primary_key=True)
    username = db.Column(db.String(100), unique=True, nullable=False)

    def __repr__(self):
        return "<Name %r>" % self.id

def get_all_users():
    return User.query.all()

@users.route("/all", methods=["GET"])
def handle_users_all():
    return get_all_users()

@scheduler.task("cron", id="job_get_all_users", second="*/10")
def scheduled_get_all_users():
    with scheduler.app.app_context():
        get_all_users()

def create_app():
    app = Flask(__name__)
    app.register_blueprint(users)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:////tmp/74975778.db"
    db.init_app(app)
    with app.app_context():
        db.create_all()
    app.config["SCHEDULER_API_ENABLED"] = True
    scheduler.init_app(app)
    scheduler.start()
    return app
NB: this does not do anything useful in the scheduled task (I leave that up to you), but it does execute the utility function get_all_users on schedule.
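To run the demo, something like this should work (assuming the snippet above is saved as app.py; the filename is my assumption):

# run.py (sketch)
from app import create_app

app = create_app()

if __name__ == '__main__':
    # the cron job above fires every 10 seconds once the app is running
    app.run(debug=True)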
I also don't think this is directly related to the scheduler. I think it works in your users/routes.py because it is inside a route, so it only gets called when you visit that route, and by then your app is fully created and has its context.
But in main/crons.py it is not inside a route, so I think it tries to run as soon as this import line in your flaskapp/__init__.py is reached:
from flaskapp.main.crons import list_expired_posts
Maybe you could try this: putting the scheduler part into a function
main/crons.py
# Importing 'get_all_posts' function from 'posts/utils.py'
from flaskapp.posts.utils import get_all_posts

# A function to be called using 'scheduler' from 'flaskapp/__init__.py' on launch
def list_expired_posts():
    posts = get_all_posts()
    for p in posts:
        if p.expired == 1:
            print(p.title)

def initiate_scheduler():
    scheduler = BackgroundScheduler()
    scheduler.add_job(func=list_expired_posts, trigger="interval", seconds=60)
    scheduler.start()
and then call that function in your run.py after you create the app
run.py
# first create the app
app = create_app()
# then start the scheduler
initiate_scheduler()

if __name__ == '__main__':
    app.run()

# if you visit '/posts' in a browser it will call get_all_posts()
# at this point in time, after all the other code has run
I have built a flask app that I have been starting from an if __name__ == '__main__': block, as I saw in a tutorial. When the time came to get the app to launch from wsgi for production use, I had to remove the port and host options from app.run(), and make changes to the structure of the launching code that I am not too sure about. I am now adding test cases, which adds more ways to launch and access the app (with app.test_client(), with app.test_request_context(), and who knows what else.) What is the right / recommended way to structure the code that creates and launches the application, so that it behaves correctly (and consistently) when launched stand-alone, from wsgi, and for testing?
My current structure is as follows:
def create_app():
    """
    Create and initialize our app. Does not call its run() method.
    """
    app = Flask(__name__)
    some_initialization(app, "config_file.json")
    return app

app = create_app()
...
# Services decorated with @app.route(...)
...
if __name__ == "__main__":
    # The options break wsgi, I had to use `run()`
    app.run(host="0.0.0.0", port=5555)
To be clear, I have already gotten wsgi and tests to work, so this question is not about how to do that; it is about the recommended way to organize the state-creating steps so that the result behaves as a module, the app object can be created as many times as necessary, service parameters like port and server can be set, etc. What should my code outline actually look like?
In addition to the launch flag issue, the current code creates an app object (once) as a side effect of importing; I could create more with create_app() but from mycode import app will retrieve the shared object... and I wonder about all those decorators that decorated the original object.
I have looked at the documentation, but the examples are simplified, and the full docs present so many alternative scenarios that I cannot figure out the code structure that the creators of Flask envisioned. I expect this is a simple and it must have a well-supported code pattern; so what is it?
Disclaimer: While this isn't the only structure for Flask, this one has best suited my needs and is inspired by the official Flask docs on using a Factory Pattern.
Project Structure
Following the structure from the Documentation
/home/user/Projects/flask-tutorial
├── flaskr/
│ ├── __init__.py
│ ├── db.py
│ ├── schema.sql
│ ├── auth.py
│ ├── blog.py
│ ├── templates/
│ │ ├── base.html
│ │ ├── auth/
│ │ │ ├── login.html
│ │ │ └── register.html
│ │ └── blog/
│ │ ├── create.html
│ │ ├── index.html
│ │ └── update.html
│ └── static/
│ └── style.css
├── tests/
│ ├── conftest.py
│ ├── data.sql
│ ├── test_factory.py
│ ├── test_db.py
│ ├── test_auth.py
│ └── test_blog.py
├── venv/
├── setup.py
└── MANIFEST.in
flaskr/, a Python package containing your application code and files.
flaskr will contain the factory to generate Flask app instances that can be used by WSGI servers and will work with tests, ORMs (for migrations), etc.
flaskr/__init__.py contains the factory method
The Factory
The factory is aimed at configuring and creating a Flask app. This means you need to pass all required configuration in one of the many ways accepted by Flask.
The dev Flask server expects the function create_app() to be present in the package's __init__.py file. But when using a production server like those listed in the docs, you can pass the name of the function to call.
A sample from the documentation:
# flaskr/__init__.py
import os
from flask import Flask

def create_app(test_config=None):
    # create and configure the app
    app = Flask(__name__, instance_relative_config=True)
    app.config.from_mapping(
        SECRET_KEY='dev',
        DATABASE=os.path.join(app.instance_path, 'flaskr.sqlite'),
    )
    if test_config is None:
        # load the instance config, if it exists, when not testing
        app.config.from_pyfile('config.py', silent=True)
    else:
        # load the test config if passed in
        app.config.from_mapping(test_config)
    # ensure the instance folder exists
    try:
        os.makedirs(app.instance_path)
    except OSError:
        pass

    # a simple page that says hello
    @app.route('/hello')
    def hello():
        return 'Hello, World!'

    return app
When running the dev Flask server, set environment variables as described:
$ export FLASK_APP=flaskr
$ export FLASK_ENV=development
$ flask run
The Routes
A Flask app may have multiple modules that require the app object to function, like @app.route as you mentioned in the comments. To handle this gracefully we can make use of Blueprints. These allow us to keep the routes in a different file and register them in create_app().
from flask import Blueprint, render_template, abort
from jinja2 import TemplateNotFound

simple_page = Blueprint('simple_page', __name__,
                        template_folder='templates')

@simple_page.route('/', defaults={'page': 'index'})
@simple_page.route('/<page>')
def show(page):
    try:
        return render_template(f'pages/{page}.html')
    except TemplateNotFound:
        abort(404)
and we can modify create_app() to register the blueprint as follows:
def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    # configure the app
    ...
    from yourapplication.simple_page import simple_page
    app.register_blueprint(simple_page)
    return app
You will need to import the blueprint locally to avoid circular imports. But this is not graceful when you have many blueprints, so we can create an init_blueprints(app) function in the blueprints package like this:
# flaskr/blueprints/__init__.py
from flaskr.blueprints.simple_page import simple_page

def init_blueprints(app):
    with app.app_context():
        app.register_blueprint(simple_page)
and modify create_app() as
from flaskr.blueprints import init_blueprints

def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    # configure the app
    ...
    init_blueprints(app)
    return app
This way your factory does not get cluttered with blueprints, you can handle the registration of blueprints inside the blueprints package as you choose, and circular imports are still avoided.
Other Extensions
Most common Flask extensions support the factory pattern, which allows you to create an extension object and then call obj.init_app(app) to initialize it with the Flask app. Taking Marshmallow here as an example, but it applies to all of them. Modify create_app() like so:
from flask_marshmallow import Marshmallow  # import added for completeness

ma = Marshmallow()

def create_app(test_config=None):
    app = Flask(__name__, instance_relative_config=True)
    # configure the app
    ...
    init_blueprints(app)
    ma.init_app(app)
    return app
You can now import ma from flaskr in whichever file it is required.
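For instance, a hypothetical schema module could then look like this (UserSchema and its fields are made up for illustration):

# flaskr/schemas.py (sketch): reusing the shared extension object
from flaskr import ma

class UserSchema(ma.Schema):
    class Meta:
        fields = ('id', 'username')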
Production Server
As mentioned initially, the production WSGI servers will call create_app() to create instances of Flask.
Using gunicorn as an example, but any supported WSGI server can be used:
$ gunicorn "flaskr:create_app()"
You can pass configurations as per gunicorn docs, and the same can be achieved within a script too.
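For example (the worker count and bind address here are illustrative; see the gunicorn docs for the full set of flags):
$ gunicorn -w 4 -b 0.0.0.0:8000 "flaskr:create_app()"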
What I did was:
class App:
    def __init__(self):
        # Various other initialization (e.g. logging, config, ...)
        ...
        self.webapp = self._start_webapp(self.app_name, self.app_port, self.log)

    def _start_webapp(self, app_name: str, app_port: Optional[int], log: logging.Logger):
        log.info('Running webapp...')
        webapp = Flask(app_name)
        # other flask related code
        ...
        webapp.run(debug=False, host='0.0.0.0', port=app_port)
        return webapp

if __name__ == '__main__':
    app = App()
This way you can add optional parameters to the init to override during tests or override via config change and even create additional types of endpoints in the same application, if you need.
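A sketch of what that parameterisation might look like (the argument names and defaults here are made up for illustration):

from typing import Optional

class App:
    # optional constructor arguments let tests or configs override defaults
    def __init__(self, app_name: str = 'myapp', app_port: Optional[int] = 5555):
        self.app_name = app_name
        self.app_port = app_port
        self.webapp = self._start_webapp(self.app_name, self.app_port)
    ...

app = App()                    # normal launch with defaults
test_app = App(app_port=5556)  # a test run on a different port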
I want to create a background task to update a record on a specific date. I'm using Django and Celery with RabbitMQ.
I've managed to get the task called when the model is saved with this dummy task function:
tasks.py
from __future__ import absolute_import
from celery import Celery
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

app = Celery('tasks', broker='amqp://localhost//')

@app.task(name='news.tasks.update_news_status')
def update_news_status(news_id):
    # (I pass the news id and return it, nothing complicated about it)
    return news_id
This task is called from the save() method in my models.py:
from django.db import models
from celery import current_app

class News(models.Model):
    (...)
    def save(self, *args, **kwargs):
        current_app.send_task('news.tasks.update_news_status', args=(self.id,))
        super(News, self).save(*args, **kwargs)
The thing is, I want to import my News model in tasks.py, but if I try it like this:
from .models import News
I get this error:
django.core.exceptions.ImproperlyConfigured: Requested setting
DEFAULT_INDEX_TABLESPACE, but settings are not configured. You must
either define the environment variable DJANGO_SETTINGS_MODULE or call
settings.configure() before accessing settings.
This is what my celery.py looks like:
from __future__ import absolute_import, unicode_literals
from celery import Celery
import os

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')

app = Celery('myapp')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
I have already tried this:
can't import django model into celery task
I have tried to make the import inside the task method Django and Celery, AppRegisteredNotReady exception
I have also tried this Celery - importing models in tasks.py
I also tried to create a utils.py and import it there, and it was not possible.
I ran into different errors, but in the end I'm not able to import any module in tasks.py.
There might be something wrong with my config, but I can't see the error; I followed the steps in the Celery docs: First Steps with Django.
Also, my project structure looks like this:
├── myapp
│   ├── __init__.py
│   ├── celery.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── news
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── tasks.py
│   ├── urls.py
│   ├── models.py
│   ├── views.py
├── manage.py
I'm executing the worker from the myapp directory like this:
celery -A news.tasks worker --loglevel=info
What am I missing here? Thanks in advance for your help!
EDIT
After making the changes suggested in the comments:
Add this to celery.py:
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
and make the import inside the method in tasks.py:
from __future__ import absolute_import
from celery import Celery
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

app = Celery('tasks', broker='amqp://localhost//')

@app.task(name='news.tasks.update_news_status')
def update_news_status(news_id):
    from .models import News
    return news_id
I get the following error:
[2018-07-20 12:24:29,337: ERROR/ForkPoolWorker-1] Task news.tasks.update_news_status[87f9ec92-c260-4ee9-a3bc-5f684c819f79] raised unexpected: ValueError('Attempted relative import in non-package',)
Traceback (most recent call last):
File "/Users/carla/Develop/App/backend/myapp-venv/lib/python2.7/site-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/Users/carla/Develop/App/backend/myapp-venv/lib/python2.7/site-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/Users/carla/Develop/App/backend/news/tasks.py", line 12, in update_news_status
from .models import News
ValueError: Attempted relative import in non-package
OK, so for anyone struggling with this... it turns out my celery.py wasn't reading env variables from the settings.
After a week and lots of research I realised that Celery is not a process of Django but a process running outside of it (duh), so when I tried to load the settings they were loaded, but then I wasn't able to access the env variables I have defined in my .env (I use the dotenv library). Celery was trying to look up the env variables in my .bash_profile (of course).
So in the end my solution was to create a helper module in the same directory where my celery.py is defined, called load_env.py with the following
from os.path import dirname, join
import dotenv

def load_env():
    """Get the path to the .env file and load it."""
    project_dir = dirname(dirname(__file__))
    dotenv.read_dotenv(join(project_dir, '.env'))
and then in my celery.py (note the last import and the first instruction):
from __future__ import absolute_import, unicode_literals
from celery import Celery
from django.conf import settings
import os
from .load_env import load_env

load_env()

# set the default Django settings module for the 'celery' program.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myapp.settings")

app = Celery('myapp')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('myapp.settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
After the call to load_env(), env variables are loaded and the celery worker has access to them. By doing this I am now able to access other modules from my tasks.py, which was my main problem.
Credits to these guys (Caktus Consulting Group) and their django-project-template, because if it wasn't for them I wouldn't have found the answer. Thanks.
Try something like this. It's working in Celery 3.1; the import should happen inside the save method, after the super() call:
from django.db import models

class News(models.Model):
    (...)
    def save(self, *args, **kwargs):
        (...)
        super(News, self).save(*args, **kwargs)
        from task import update_news_status
        update_news_status.apply_async((self.id,))  # apply_async or delay
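Since the goal is to update a record on a specific date, note that apply_async() also accepts an eta (an absolute datetime) or a countdown in seconds; a sketch (the one-day delay is just an example):

from datetime import datetime, timedelta

# run the task roughly one day from now
update_news_status.apply_async((self.id,), eta=datetime.utcnow() + timedelta(days=1))
# equivalent relative form:
# update_news_status.apply_async((self.id,), countdown=86400)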
Here is what I would do (Django 1.11 and Celery 4.2). You have a problem in your Celery config: you are trying to re-declare the Celery instance.
tasks.py
from myapp.celery import app  # contains what you need :)
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task(name='news.tasks.update_news_status')
def update_news_status(news_id):
    # (I pass the news id and return it, nothing complicated about it)
    return news_id
celery.py
from __future__ import absolute_import, unicode_literals
from celery import Celery
from django.conf import settings
import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myapp.settings")

app = Celery('myapp', backend='rpc://', broker=BROKER_URL)  # your config here
app.config_from_object('django.conf:settings', namespace='CELERY')  # change here
app.autodiscover_tasks()
models.py
from django.db import models

class News(models.Model):
    (...)
    def save(self, *args, **kwargs):
        super(News, self).save(*args, **kwargs)
        from news.tasks import update_news_status
        update_news_status.delay(self.id)  # change here
And launch it with celery -A myapp worker --loglevel=info, because your app is defined in myapp.celery, so the -A parameter needs to be the app where the conf is declared.
I am trying to set up a task to run every ten seconds using Celery Beat.
I am using:
Django==1.11.3
celery==4.1.0
django-celery-beat==1.1.1
django-celery-results==1.0.1
It is giving me the following error:
Received unregistered task of type 'operations.tasks.message'
I am new to Celery; I have tried numerous solutions and cannot seem to find one. I would appreciate the help.
settings.py
CELERY_BROKER_URL = 'pyamqp://guest@localhost//'
CELERY_RESULT_BACKEND = 'django-db'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Africa/Johannesburg'

CELERY_BEAT_SCHEDULE = {
    'message': {
        'task': 'operations.tasks.message',
        'schedule': 10.0,
    },
}
celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'nodiso.settings')

app = Celery('nodiso')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
__init__.py
from __future__ import absolute_import, unicode_literals
# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app
__all__ = ['celery_app']
tasks.py
from __future__ import absolute_import, unicode_literals
from celery import shared_task
from operations import models
from .celery import periodic_task

@task
def message():
    t = models.Celerytest.objects.create(Message='Hello World')
    t.save()
file structure

proj
├── proj
│   ├── __init__.py
│   ├── settings.py
│   └── celery.py
└── app
    └── tasks.py
Within my celery.py file I define app like this:
app = Celery(
    'your_celery_app_name',
    include=[
        'your_celery_app_name.module.task1',
        'your_celery_app_name.module.task2',
    ]
)

app.config_from_object('your_celery_app_name.celeryconfig')
My celeryconfig.py is where I define my beat schedules and other settings (I think this would be the same as your settings.py).
Below is probably not relevant (I'm not an expert on Python and how packages should be put together), but from my limited understanding your tasks should be a submodule of your celery app module. Take this with a pinch of salt, though.
My project structure looks more like this:
your_celery_app_name (dir)
    setup.py (file)
    your_celery_app_name (dir)
        __init__.py (file)
        celery.py (file)
        celeryconfig.py (file)
        module (dir)
            __init__.py (importing task1 and task2 from tasks)
            tasks.py (implementing task1 and task2)
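where module/__init__.py would just re-export the tasks, something like:

# your_celery_app_name/module/__init__.py (sketch of the re-export described above)
from your_celery_app_name.module.tasks import task1, task2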
I'm using celery (and django-celery) to allow a user to launch periodic scrapes through the Django admin. This is part of a larger project, but I've boiled the issue down to a minimal example.
Firstly, celery/celerybeat are running daemonized. If I instead run them with celery -A evofrontend worker -B -l info from my Django project dir then, weirdly, I get no issues.
When I run celery/celerybeat as daemons, however, I get a strange import error:
[2016-01-06 03:05:12,292: ERROR/MainProcess] Task evosched.tasks.scrapingTask[e18450ad-4dc3-47a0-b03d-4381a0e65c31] raised unexpected: ImportError('No module named myutils',)
Traceback (most recent call last):
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "evosched/tasks.py", line 35, in scrapingTask
cs = CrawlerScript('TestSpider', scrapy_settings)
File "evosched/tasks.py", line 13, in __init__
self.crawler = CrawlerProcess(scrapy_settings)
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 209, in __init__
super(CrawlerProcess, self).__init__(settings)
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 115, in __init__
self.spider_loader = _get_spider_loader(settings)
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 296, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 30, in from_settings
return cls(settings)
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 21, in __init__
for module in walk_modules(name):
File "/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
submod = import_module(fullpath)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "retail/spiders/Retail_spider.py", line 16, in <module>
ImportError: No module named myutils
i.e. the spider is having issues importing from the Django project app despite adding the relevant paths to sys.path and calling django.setup().
My hunch is that this may be caused by a "circular import" during initialization, but I'm not sure (see here for notes on the same error).
Celery daemon config
For completeness the celeryd and celerybeat configuration scripts are:
# /etc/default/celeryd
CELERYD_NODES="worker1"
CELERY_BIN="/home/lee/Desktop/pyco/evo-scraping-min/venv/bin/celery"
CELERY_APP="evofrontend"
DJANGO_SETTINGS_MODULE="evofrontend.settings"
CELERYD_CHDIR="/home/lee/Desktop/pyco/evo-scraping-min/evofrontend"
CELERYD_OPTS="--concurrency=1"
# Workers should run as an unprivileged user.
CELERYD_USER="lee"
CELERYD_GROUP="lee"
CELERY_CREATE_DIRS=1
and
# /etc/default/celerybeat
CELERY_BIN="/home/lee/Desktop/pyco/evo-scraping-min/venv/bin/celery"
CELERY_APP="evofrontend"
CELERYBEAT_CHDIR="/home/lee/Desktop/pyco/evo-scraping-min/evofrontend/"
# Django settings module
export DJANGO_SETTINGS_MODULE="evofrontend.settings"
They are largely based on the generic ones, with the Django settings thrown in and using the celery binary in my virtualenv rather than the system one.
I'm also using the init.d scripts, which are the generic ones.
Project structure
As for the project: it lives at /home/lee/Desktop/pyco/evo-scraping-min. All files under it have ownership lee:lee.
The dir contains both a Scrapy project (evo-retail) and a Django project (evofrontend), and the complete tree structure looks like:
├── evofrontend
│ ├── db.sqlite3
│ ├── evofrontend
│ │ ├── celery.py
│ │ ├── __init__.py
│ │ ├── settings.py
│ │ ├── urls.py
│ │ └── wsgi.py
│ ├── evosched
│ │ ├── __init__.py
│ │ ├── myutils.py
│ │ └── tasks.py
│ └── manage.py
└── evo-retail
└── retail
├── logs
├── retail
│ ├── __init__.py
│ ├── settings.py
│ └── spiders
│ ├── __init__.py
│ └── Retail_spider.py
└── scrapy.cfg
Django project relevant files
Now the relevant files: the evofrontend/evofrontend/celery.py looks like
# evofrontend/evofrontend/celery.py
from __future__ import absolute_import
import os
from celery import Celery
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'evofrontend.settings')
from django.conf import settings
app = Celery('evofrontend')
# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
The potentially relevant settings from the Django settings file, evofrontend/evofrontend/settings.py are
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir))

INSTALLED_APPS = (
    ...
    'djcelery',
    'evosched',
)

# Celery settings
BROKER_URL = 'amqp://guest:guest@localhost//'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Europe/London'
CELERYD_MAX_TASKS_PER_CHILD = 1  # Each worker is killed after one task; this prevents issues with the reactor not being restartable

# Use django-celery backend database
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'

# Set periodic task scheduler
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
The tasks.py in the scheduling app, evosched, looks like this (it just launches the Scrapy spider with the relevant settings after changing directory):
# evofrontend/evosched/tasks.py
from __future__ import absolute_import
from celery import shared_task
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)

import os
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from django.conf import settings as django_settings

class CrawlerScript(object):
    def __init__(self, spider, scrapy_settings):
        self.crawler = CrawlerProcess(scrapy_settings)
        self.spider = spider  # just a string

    def run(self, **kwargs):
        # Pass the kwargs (usually command line args) to the crawler
        self.crawler.crawl(self.spider, **kwargs)
        self.crawler.start()

@shared_task
def scrapingTask(**kwargs):
    logger.info("Start scrape...")
    # scrapy.cfg file here pointing to settings...
    base_dir = django_settings.BASE_DIR
    os.chdir(os.path.join(base_dir, '..', 'evo-retail/retail'))
    scrapy_settings = get_project_settings()
    # Run crawler
    cs = CrawlerScript('TestSpider', scrapy_settings)
    cs.run(**kwargs)
The evofrontend/evosched/myutils.py simply contains (in this minimal example):
# evofrontend/evosched/myutils.py
SCRAPY_XHR_HEADERS = 'SOMETHING'
Scrapy project relevant files
In the complete Scrapy project the settings file looks like
# evo-retail/retail/retail/settings.py
BOT_NAME = 'retail'
import os
PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
SPIDER_MODULES = ['retail.spiders']
NEWSPIDER_MODULE = 'retail.spiders'
and (in this minimal example) the spider is just:
# evo-retail/retail/retail/spiders/Retail_spider.py
from scrapy.conf import settings as scrapy_settings
from scrapy.spiders import Spider
from scrapy.http import Request
import sys
import django
import os
import posixpath

SCRAPY_BASE_DIR = scrapy_settings['PROJECT_ROOT']
DJANGO_DIR = posixpath.normpath(os.path.join(SCRAPY_BASE_DIR, '../../../', 'evofrontend'))
sys.path.insert(0, DJANGO_DIR)
os.environ.setdefault("DJANGO_SETTINGS_MODULE", 'evofrontend.settings')
django.setup()

from evosched.myutils import SCRAPY_XHR_HEADERS

class RetailSpider(Spider):
    name = "TestSpider"

    def start_requests(self):
        print SCRAPY_XHR_HEADERS
        yield Request(url='http://www.google.com', callback=self.parse)

    def parse(self, response):
        print response.url
        return []
EDIT:
I discovered through lots of trial and error that if the app I'm trying to import from is in my INSTALLED_APPS Django setting, then it fails with the import error, but if I remove the app from there then I no longer get the import error (e.g. removing evosched from INSTALLED_APPS makes the import in the spider go through fine...). Obviously not a solution, but it may be a clue.
EDIT 2
I put a print of sys.path immediately before the failing import in the spider; the result was:
/home/lee/Desktop/pyco/evo-scraping-min/evofrontend/../evo-retail/retail
/home/lee/Desktop/pyco/evo-scraping-min/venv/lib/python2.7
/home/lee/Desktop/pyco/evo-scraping-min/venv/lib/python2.7/plat-x86_64-linux-gnu
/home/lee/Desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-tk
/home/lee/Desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-old
/home/lee/Desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-dynload
/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/home/lee/Desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages
/home/lee/Desktop/pyco/evo-scraping-min/evofrontend
/home/lee/Desktop/pyco/evo-scraping-min/evo-retail/retail
EDIT 3
If I do import evosched and then print dir(evosched), I see "tasks", and if I choose to include such a file I can also see "models", so importing from models would actually be possible. I don't, however, see "myutils". Even from evosched import myutils fails, and it also fails if the statement is put in a function below rather than at the global level (I thought this might rule out a circular import issue...). The direct import evosched works... possibly import evosched.myutils will work. Not yet tried...
It seems the celery daemon is running using the system's Python and not the Python binary inside the virtualenv. You need to use
# Python interpreter from environment.
ENV_PYTHON="$CELERYD_CHDIR/env/bin/python"
as mentioned here, to tell celeryd to run using the Python inside the virtualenv.
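With the paths from the question, that would presumably be (assuming the generic init.d script picks up ENV_PYTHON from /etc/default/celeryd):

# /etc/default/celeryd
ENV_PYTHON="/home/lee/Desktop/pyco/evo-scraping-min/venv/bin/python"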