I have an application that needs to initialize Celery and other things (e.g., a database). I would like to have a .ini file that contains the application's configuration. This should be passed to the application at runtime.
development.ini:
[celery]
broker=amqp://localhost/
backend=amqp://localhost/
task.result.expires=3600
[database]
# database config
# ...
celeryconfig.py:
from celery import Celery
import ConfigParser

config = ConfigParser.RawConfigParser()
config.read(...)  # Pass this from the command line somehow

celery = Celery('myproject.celery',
                broker=config.get('celery', 'broker'),
                backend=config.get('celery', 'backend'),
                include=['myproject.tasks'])

# Optional configuration, see the application user guide.
celery.conf.update(
    CELERY_TASK_RESULT_EXPIRES=config.getint('celery', 'task.result.expires')
)

# Initialize database, etc.

if __name__ == '__main__':
    celery.start()
To start Celery, I call:
celery worker --app=myproject.celeryconfig -l info
Is there any way to pass in the config file without doing something ugly like setting an environment variable?
Alright, I took Jordan's advice and used an env variable. This is what I get in celeryconfig.py:
from celery import Celery
import os
import sys
import ConfigParser

CELERY_CONFIG = 'CELERY_CONFIG'

if CELERY_CONFIG not in os.environ:
    sys.stderr.write('Missing env variable "%s"\n\n' % CELERY_CONFIG)
    sys.exit(2)

configfile = os.environ['CELERY_CONFIG']

if not os.path.isfile(configfile):
    sys.stderr.write('Can\'t read file: "%s"\n\n' % configfile)
    sys.exit(2)

config = ConfigParser.RawConfigParser()
config.read(configfile)

celery = Celery('myproject.celery',
                broker=config.get('celery', 'broker'),
                backend=config.get('celery', 'backend'),
                include=['myproject.tasks'])

# Optional configuration, see the application user guide.
celery.conf.update(
    CELERY_TASK_RESULT_EXPIRES=config.getint('celery', 'task.result.expires'),
)

if __name__ == '__main__':
    celery.start()
To start Celery:
$ export CELERY_CONFIG=development.ini
$ celery worker --app=myproject.celeryconfig -l info
How is setting an environment variable ugly? You can set an environment variable alongside the current version of your application, derive it from the hostname, or have your build/deployment process overwrite the file: in development you copy development.ini over to settings.ini in a general location, and in production you copy production.ini over instead.
All of these options are quite common. Using a configuration management tool such as Chef or Puppet to put the file in place is a good option.
I am using Open Semantic Search (OSS) and I would like to monitor its processes using the Flower tool. The workers that Celery needs are described on the OSS website as follows:
The workers will do tasks like analysis and indexing of the queued files. The workers are implemented by etl/tasks.py and will be started automatically on boot by the service opensemanticsearch.
This tasks.py file looks as follows:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
#
# Queue tasks for batch processing and parallel processing
#
# Queue handler
from celery import Celery
import time  # needed for time.sleep() in the tasks below
# ETL connectors
from etl import ETL
from etl_delete import Delete
from etl_file import Connector_File
from etl_web import Connector_Web
from etl_rss import Connector_RSS
verbose = True
quiet = False
app = Celery('etl.tasks')
app.conf.CELERYD_MAX_TASKS_PER_CHILD = 1
etl_delete = Delete()
etl_web = Connector_Web()
etl_rss = Connector_RSS()
#
# Delete document with URI from index
#
@app.task(name='etl.delete')
def delete(uri):
    etl_delete.delete(uri=uri)
#
# Index a file
#
@app.task(name='etl.index_file')
def index_file(filename, wait=0, config=None):
    if wait:
        time.sleep(wait)
    etl_file = Connector_File()
    if config:
        etl_file.config = config
    etl_file.index(filename=filename)
#
# Index file directory
#
@app.task(name='etl.index_filedirectory')
def index_filedirectory(filename):
    from etl_filedirectory import Connector_Filedirectory
    connector_filedirectory = Connector_Filedirectory()
    result = connector_filedirectory.index(filename)
    return result
#
# Index a webpage
#
@app.task(name='etl.index_web')
def index_web(uri, wait=0, downloaded_file=False, downloaded_headers=[]):
    if wait:
        time.sleep(wait)
    result = etl_web.index(uri, downloaded_file=downloaded_file, downloaded_headers=downloaded_headers)
    return result
#
# Index full website
#
@app.task(name='etl.index_web_crawl')
def index_web_crawl(uri, crawler_type="PATH"):
    import etl_web_crawl
    result = etl_web_crawl.index(uri, crawler_type)
    return result
#
# Index webpages from sitemap
#
@app.task(name='etl.index_sitemap')
def index_sitemap(uri):
    from etl_sitemap import Connector_Sitemap
    connector_sitemap = Connector_Sitemap()
    result = connector_sitemap.index(uri)
    return result
#
# Index RSS Feed
#
@app.task(name='etl.index_rss')
def index_rss(uri):
    result = etl_rss.index(uri)
    return result
#
# Enrich with / run plugins
#
@app.task(name='etl.enrich')
def enrich(plugins, uri, wait=0):
    if wait:
        time.sleep(wait)
    etl = ETL()
    etl.read_configfile('/etc/opensemanticsearch/etl')
    etl.read_configfile('/etc/opensemanticsearch/enhancer-rdf')
    etl.config['plugins'] = plugins.split(',')
    filename = uri
    # strip the protocol prefix file:// if present
    if filename.startswith("file://"):
        filename = filename.replace("file://", '', 1)
    parameters = etl.config.copy()
    parameters['id'] = uri
    parameters['filename'] = filename
    parameters, data = etl.process(parameters=parameters, data={})
    return data
#
# Read command line arguments and start
#
# If running directly (not imported for its functions), start the worker
if __name__ == "__main__":
    from optparse import OptionParser

    parser = OptionParser("etl-tasks [options]")
    parser.add_option("-q", "--quiet", dest="quiet", action="store_true", default=False, help="Don't print status (filenames) while indexing")
    parser.add_option("-v", "--verbose", dest="verbose", action="store_true", default=False, help="Print debug messages")
    (options, args) = parser.parse_args()

    verbose = options.verbose
    etl_delete.verbose = options.verbose
    etl_web.verbose = options.verbose
    etl_rss.verbose = options.verbose

    quiet = options.quiet

    app.worker_main()
I read multiple tutorials about Celery, and from my understanding this line should do the job:
celery -A etl.tasks flower
but it doesn't. The result is the statement:
Error: Unable to load celery application. The module etl was not found.
Same for
celery -A etl.tasks worker --loglevel=debug
so Celery itself seems to be causing the trouble, not Flower. I also tried e.g. celery -A etl.index_filedirectory worker --loglevel=debug but with the same result.
What am I missing? Do I have to somehow tell Celery where to find etl.tasks? Online research doesn't really show a similar case; most of the "Module not found" errors seem to occur while importing stuff. So possibly it's a silly question, but I couldn't find a solution anywhere. I hope you guys can help me. Unfortunately, I won't be able to respond until Monday though, sorry in advance.
I got the same issue. I installed and configured my queue as follows, and it works.
Install RabbitMQ
MacOS
brew install rabbitmq
sudo vim ~/.bash_profile
In bash_profile add the following line:
PATH=$PATH:/usr/local/sbin
Then update bash_profile:
source ~/.bash_profile
Linux
sudo apt-get install rabbitmq-server
Configure RabbitMQ
Launch the queue:
sudo rabbitmq-server
In another Terminal, configure the queue:
sudo rabbitmqctl add_user myuser mypassword
sudo rabbitmqctl add_vhost myvhost
sudo rabbitmqctl set_user_tags myuser mytag
sudo rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"
Launch Celery
I would suggest going into the folder that contains task.py and using the following command:
celery -A task worker -l info -Q celery --concurrency 5
Beware that this error can mean one of two things:
The module is missing
The module exists but cannot be loaded, e.g. because it contains errors such as a SyntaxError.
To check that it's not the latter, run:
python -c "import <myModuleContainingTasksDotPyFile>"
In the context of this question:
python -c "import etl"
If it crashes, fix this first (unlike with Celery, you'll get a detailed error message).
The solutions above did not work for me.
I had the same issue and my problem was that in main celery.py (that was in SmartCalend folder) I had:
app = Celery('proj')
but instead I must type there:
app = Celery('SmartCalend')
where SmartCalend is the actual app name the celery.py belongs to (!): not any random word, but precisely the app name. That's mentioned nowhere except in the official docs.
Try export PYTHONPATH=<parent directory>, where parent directory is the folder where etl is. Run the Celery worker and see if it fixes your problem. This is probably one of the most common Celery "issues" (not really Celery, but Python in general). Alternatively, run the Celery worker from that folder.
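For example (the path below is just an illustration; use wherever your etl package actually lives):
$ export PYTHONPATH=/path/to/parent   # the directory that contains etl/
$ celery -A etl.tasks worker --loglevel=debug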
Answer for MacOS Catalina:
When you install celery with pip (pip install celery), python can import celery, but you are not able to launch celery from the terminal because the terminal does not know of the celery executable.
Add celery to the path to fix:
nano ~/.bash_profile
In the file add: export PATH="/Users/gavinbelson/Library/Python/2.7/bin:$PATH"
To save the file in the nano editor: ctrl+o, then enter, then ctrl+x
To update the terminal with your change type: source ~/.bash_profile
Now you should be able to type celery in the terminal window
---- Note: this is for the default python terminal command, which runs version 2.7. If you are using python3, you would need to alter the path variable accordingly.
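If you'd rather not edit PATH at all, invoking Celery through the interpreter should also work, since pip already installed the package somewhere Python can import it (replace proj with your app module, and python with whichever interpreter you installed Celery under):
python -m celery -A proj worker -l info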
I have a class that I instantiate in a request (it's an ML model that loads and takes a bit of time to configure on startup). The idea is to only do that once and have each request use the model for predictions. Does gunicorn instantiate the app every time?
That is, will the model retrain every time a new request comes in?
It sounds like you could benefit from the application preloading:
http://docs.gunicorn.org/en/stable/settings.html#preload-app
This will let you load app code before spinning off your workers.
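A minimal sketch, assuming a Flask app served by gunicorn (the module name app.py, the route, and load_model() below are all made up for illustration): load the model at import time, then start gunicorn with --preload so the load happens once in the master process before the workers fork.
# app.py -- illustrative only
from flask import Flask, jsonify

def load_model():
    # stand-in for your slow, one-time model setup
    return lambda x: x * 2

model = load_model()  # module level: runs at import time
app = Flask(__name__)

@app.route('/predict/<int:x>')
def predict(x):
    # every request reuses the already-loaded model
    return jsonify(result=model(x))
Run it with gunicorn --preload -w 4 app:app and the model is loaded once in the master; the forked workers inherit it, so nothing is re-instantiated per request.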
For those who are looking for how to share a variable between gunicorn workers without using Redis or Session, here is a good alternative with the awesome python dotenv:
The principle is to read and write shared variables from a file; that could be done with open(), but dotenv is perfect in this situation.
pip install python-dotenv
In app directory, create .env file:
├── .env
└── app.py
.env:
var1="value1"
var2="value2"
app.py: # flask app
from flask import Flask
import os
from dotenv import load_dotenv

app = Flask(__name__)

# define the path explicitly so setdotenv() can find the file
# even when the working directory differs
env_path = os.path.dirname(os.path.realpath(__file__)) + '/.env'
load_dotenv(dotenv_path=env_path)  # initial load

def getdotenv(env):
    try:
        load_dotenv(dotenv_path=env_path, override=True)  # re-read the file on every call
        val = os.getenv(env)
        return val
    except Exception:
        return None

def setdotenv(key, value):  # both strings
    if key:
        if not value:
            value = "''"
        # write the variable back to the .env file via the dotenv CLI
        cmd = 'dotenv -f ' + env_path + ' set -- ' + key + ' ' + value
        os.system(cmd)

@app.route('/get')
def index():
    var1 = getdotenv('var1')  # retrieve value of variable var1
    return var1 or ''  # avoid returning None if the key is missing

@app.route('/set')
def update():
    setdotenv('var2', 'newValue2')  # set variable var2='newValue2'
    return 'OK'
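To try it with the Flask dev server running locally (URLs assume the default port):
$ curl http://127.0.0.1:5000/set   # writes var2=newValue2 back to .env
$ curl http://127.0.0.1:5000/get   # re-reads .env and returns value1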
When I try to run the python manage.py changepassword command I get this error:
AssertionError: No api proxy found for service "taskqueue"
Here's what I have in my PYTHONPATH:
$ echo $PYTHONPATH
lib/:/usr/local/google_appengine
And my DJANGO_SETTINGS_MODULE points to the settings file that I use for GAE:
$ echo $DJANGO_SETTINGS_MODULE
settings.dev
There's some package for taskqueue in appengine api folder:
/usr/local/google_appengine/google/appengine/api/taskqueue$ ls
__init__.py __init__.pyc taskqueue.py taskqueue.pyc taskqueue_service_pb.py taskqueue_service_pb.pyc taskqueue_stub.py taskqueue_stub.pyc
What could I miss here?
I assume manage.py is executing SDK methods without starting a local dev_appserver. dev_appserver.py sets up stubs to emulate the services available once your application is deployed. When you are executing code locally and outside of the running app server, you need to initialize those stubs yourself.
The app engine docs have a section on testing that tells you how to initialize those stubs. It isn't the exact solution to your issue, but it can point you to the stubs you need to set up.
import unittest

from google.appengine.api import taskqueue
from google.appengine.ext import deferred
from google.appengine.ext import testbed

class TaskQueueTestCase(unittest.TestCase):
    def setUp(self):
        self.testbed = testbed.Testbed()
        self.testbed.activate()

        # root_path must be set to the location of queue.yaml.
        # Otherwise, only the 'default' queue will be available.
        self.testbed.init_taskqueue_stub(root_path='tests/resources')
        self.taskqueue_stub = self.testbed.get_stub(
            testbed.TASKQUEUE_SERVICE_NAME)

    def tearDown(self):
        self.testbed.deactivate()

    def testTaskAddedToQueue(self):
        taskqueue.Task(name='my_task', url='/url/of/my/task/').add()
        tasks = self.taskqueue_stub.get_filtered_tasks()
        assert len(tasks) == 1
        assert tasks[0].name == 'my_task'
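If the goal is a working management command rather than a test, the same testbed can be activated up front before the command runs; a rough sketch (where exactly to hook this in depends on your manage.py):
from google.appengine.ext import testbed

tb = testbed.Testbed()
tb.activate()
tb.init_taskqueue_stub()  # add init_datastore_v3_stub() etc. for other services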
I have written a Python package hwrt (see installation instructions if you want to try it) which serves a website when executed with
$ hwrt serve
2014-12-04 20:27:07,182 INFO * Running on http://127.0.0.1:5000/
2014-12-04 20:27:07,183 INFO * Restarting with reloader
I would like to let it run on http://www.pythonanywhere.com, but when I start it there I get
19:19 ~ $ hwrt serve
2014-12-04 19:19:59,282 INFO * Running on http://127.0.0.1:5000/
Traceback (most recent call last):
File "/home/MartinThoma/.local/bin/hwrt", line 108, in <module>
main(args)
File "/home/MartinThoma/.local/bin/hwrt", line 102, in main
serve.main()
File "/home/MartinThoma/.local/lib/python2.7/site-packages/hwrt/serve.py", line 95, in main
app.run()
File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 739, in run
run_simple(host, port, self, **options)
File "/usr/local/lib/python2.7/dist-packages/werkzeug/serving.py", line 613, in run_simple
test_socket.bind((hostname, port))
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
I only found this in the documentation:
Flask
never use app.run(), it will break your webapp. Just import the
app into your wsgi file...
By searching for wsgi file, I found mod_wsgi (Apache). However, I don't understand how I can adjust my current minimalistic Flask application to work with that. Currently, the script behind hwrt serve is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Start a webserver which can record the data and work as a classifier."""
import pkg_resources
from flask import Flask, request, render_template
from flask_bootstrap import Bootstrap
import os
import json
# hwrt modules
import hwrt
import hwrt.utils as utils
def show_results(results, n=10):
    """Show the TOP n results of a classification."""
    import nntoolkit
    classification = nntoolkit.evaluate.show_results(results, n)
    return "<pre>" + classification.replace("\n", "<br/>") + "</pre>"
# configuration
DEBUG = True
template_path = pkg_resources.resource_filename('hwrt', 'templates/')
# create our little application :)
app = Flask(__name__, template_folder=template_path)
Bootstrap(app)
app.config.from_object(__name__)
@app.route('/', methods=['POST', 'GET'])
def show_entries():
    heartbeat = request.args.get('heartbeat', '')
    return heartbeat
@app.route('/interactive', methods=['POST', 'GET'])
def interactive():
    if request.method == 'POST':
        raw_data_json = request.form['drawnJSON']
        # TODO: Check recording
        # TODO: Submit recorded json to database
        # Classify
        model_path = pkg_resources.resource_filename('hwrt', 'misc/')
        model = os.path.join(model_path, "model.tar")
        print(model)
        results = utils.evaluate_model_single_recording(model, raw_data_json)
        # Show classification page
        page = show_results(results, n=10)
        page += 'back'
        return page
    else:
        # Page where the user can enter a recording
        return render_template('canvas.html')
def get_json_result(results, n=10):
    s = []
    for res in results[:min(len(results), n)]:
        s.append({res['semantics']: res['probability']})
    return json.dumps(s)
@app.route('/worker', methods=['POST', 'GET'])
def worker():
    # Test with
    # wget --post-data 'classify=%5B%5B%7B%22x%22%3A334%2C%22y%22%3A407%2C%22time%22%3A1417704378719%7D%5D%5D' http://127.0.0.1:5000/worker
    if request.method == 'POST':
        raw_data_json = request.form['classify']
        # TODO: Check recording
        # TODO: Submit recorded json to database
        # Classify
        model_path = pkg_resources.resource_filename('hwrt', 'misc/')
        model = os.path.join(model_path, "model.tar")
        results = utils.evaluate_model_single_recording(model, raw_data_json)
        return get_json_result(results, n=10)
    else:
        # Page where the user can enter a recording
        return "Classification Worker (Version %s)" % hwrt.__version__
def get_parser():
    """Return the parser object for this script."""
    from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
    parser = ArgumentParser(description=__doc__,
                            formatter_class=ArgumentDefaultsHelpFormatter)
    return parser

def main():
    app.run()

if __name__ == '__main__':
    main()
OK, a somewhat roundabout answer to your question: it comes down to what mod_wsgi does to interface with your app. A typical flask app would look something like this:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Holy moly that tunnel was bright.. said Bit to NIC"

if __name__ == "__main__":
    app.run()
Unfortunately, Apache has no way to know what to do with this (though the app would run happily on its own). In order to get the app and Apache to play nice together, we're going to use something called mod_wsgi. What mod_wsgi does that's important to us is provide a known interface (a file type called wsgi) that wraps our application and initializes it so that we can serve it through Apache.
I'm going to assume you are using a python virtual environment, but if you aren't, you can omit the step that deals with this in the instructions below. If you're curious why virtual environments are so great, feel free to read about the python ecosystem.
Also, you can include an extra flag (assuming you are running wsgi as a daemon) to automatically reload the daemon whenever you touch or alter your wsgi file. This is quite useful during development and debugging, so I'll include it below.
Anyway, let's get started. I'll break this down into steps below.
Configuring Apache for mod_wsgi
Enable mod_wsgi in Apache:
sudo apt-get install libapache2-mod-wsgi
Edit your /etc/apache2/sites-available/<yoursite>.conf.
<VirtualHost interface:port>
    WSGIDaemonProcess yourapp user=someUser processes=2 threads=15
    WSGIProcessGroup yourapp

    # In this case / refers to whatever relative URL path hosts flask
    WSGIScriptAlias / /absolute/path/to/yourapp.wsgi

    <Directory /path/to/your/main/py/file/ >
        # Use good judgement here when server hardening; this assumes a dev env
        Order allow,deny
        Allow from all
        Require all granted
        # The below enables 'auto-reload' of WSGI
        WSGIScriptReloading On
    </Directory>

    # If you want to serve static files as well and bypass flask in those cases
    Alias /relative/url/to/static/content/ /absolute/path/to/static/root/directory/
    <Directory /absolute/path/to/static/root/directory/>
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
Create your yourapp.wsgi file and put it in the appropriate place. Be wary of file permissions!
#!/usr/bin/python
import sys
import logging
# Activate virtual environment.
# If you are not using venv, skip this.
# But you really should be using it!
activate_this = "/path/to/venv/bin/activate_this.py"
execfile(activate_this, dict(__file__=activate_this))
# Handle logging
logging.basicConfig(stream=sys.stderr)
sys.path.insert(0, "/path/to/your/main/py/file/")
from YourMainPyFileName import app as application
application.secret_key = "your_secret_key"
Reload Apache and troubleshoot problems. I set this up probably every few weeks for a different project or idea I have and... I usually have to fix one thing or another when doing it from scratch. Don't despair though! Flask has great documentation on this.
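On Debian/Ubuntu that reload step usually looks something like this (the site name is whatever you called your conf file):
$ sudo a2ensite yoursite
$ sudo service apache2 reload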
Once you've done all this you should be at a place where flask runs all on its own. The sample flask app above is the actual code I use to verify everything works whenever I set this up.
This was left here in case it's of some use, but is not really directly related to the question...
The answer here is to use x-send-file. This takes advantage of letting Apache do what it's good at (serving static content), while letting flask (or another python framework) do its work first. I do this often to let flask handle my auth layers in single-page web apps and have so far been happy with the results.
Doing so requires two things:
First, enable xsendfile on Apache2: sudo apt-get install libapache2-mod-xsendfile.
Second, alter your apache2 configuration to allow x-send-file headers:
Alter your conf file in /etc/apache2/sites-available/<yoursite>.conf and add...
XSendFile On
XSendFilePath /path/to/static/directory
This can be entered at the top level within the <VirtualHost></VirtualHost> tags.
Don't forget to restart Apache sudo service apache2 restart.
Finally, configure your flask app to use x-send-file in your app.py file:
app.use_x_sendfile = True
Note: this must be done after app initialization; it can also be set via the USE_X_SENDFILE configuration key (see the excerpt below).
Flask has documentation on this (excerpt below):
use_x_sendfile
Enable this if you want to use the X-Sendfile feature. Keep in mind that the server has to support this. This only affects files sent with the send_file() method.
New in version 0.2.
This attribute can also be configured from the config with the USE_X_SENDFILE configuration key. Defaults to False.
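As a quick sketch of what that looks like in practice (the route and file path here are made up), the flag only changes how send_file() delivers the bytes:
from flask import Flask, send_file

app = Flask(__name__)
app.use_x_sendfile = True  # hand the actual transfer off to Apache

@app.route('/download')
def download():
    # With use_x_sendfile enabled, Flask responds with an X-Sendfile
    # header instead of streaming the file itself; Apache's
    # XSendFilePath must cover this location.
    return send_file('/path/to/static/directory/report.pdf')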
I ran into a similar issue to the one @moose was having: getting connection refused and couldn't even telnet localhost 5000.
Turns out there's a ports.conf file; I had to add Listen 5000 to it.
Happy days.
/home/myuser/mysite-env/lib/python2.6/site-packages/celery/loaders/default.py:53:
NotConfigured: No celeryconfig.py module found! Please make sure it exists and is available to Python.
  NotConfigured)
I even defined it in my /etc/profile and also in my virtual environment's "activate". But it's not reading it.
In Celery 4.1 you can solve that problem with the following code (the easiest way):
import celeryconfig
from celery import Celery
app = Celery()
app.config_from_object(celeryconfig)
For example, a small celeryconfig.py:
BROKER_URL = 'pyamqp://'
CELERY_RESULT_BACKEND = 'redis://localhost'
CELERY_ROUTES = {'task_name': {'queue': 'queue_name_for_task'}}
Another very simple way:
from celery import Celery
app = Celery('tasks')
app.conf.update(
    result_expires=60,
    task_acks_late=True,
    broker_url='pyamqp://',
    result_backend='redis://localhost'
)
Or Using a configuration class/object:
from celery import Celery
app = Celery()
class Config:
    enable_utc = True
    timezone = 'Europe/London'
app.config_from_object(Config)
# or using the fully qualified name of the object:
# app.config_from_object('module:Config')
Or, as was mentioned above, by setting CELERY_CONFIG_MODULE:
import os
from celery import Celery
#: Set default configuration module name
os.environ.setdefault('CELERY_CONFIG_MODULE', 'celeryconfig')
app = Celery()
app.config_from_envvar('CELERY_CONFIG_MODULE')
Also see:
Create config: http://docs.celeryproject.org/en/latest/userguide/application.html#configuration
Configuration fields: http://docs.celeryproject.org/en/latest/userguide/configuration.html
I had a similar problem with my tasks module. A simple
# celery config is in a non-standard location
import os
os.environ['CELERY_CONFIG_MODULE'] = 'mypackage.celeryconfig'
in my package's __init__.py solved this problem.
Make sure you have celeryconfig.py in the same location where you are running 'celeryd', or otherwise make sure it is available on the Python path.
You can work around this with the environment... or use --config. Note that --config requires:
a path relative to CELERYD_CHDIR from /etc/default/celeryd
a python module name, not a filename.
The error message could probably use these two facts.
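For instance (the module name here is illustrative), the value is imported as a module rather than opened as a file:
celeryd --config=celeryconfig     # right: imports celeryconfig.py via CELERYD_CHDIR / the Python path
celeryd --config=celeryconfig.py  # wrong: a filename, not a module name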