I am new to Celery and SQS, and would like to use them to periodically check for messages stored in SQS and then fire a consumer. The consumer and Celery both live on EC2, while the messages are sent from GAE using the boto library.
Currently, I am confused about:
In the message body of creating_msg_gae.py, what task information should I put? I assume it would be the name of my Celery task?
In the message body of creating_msg_gae.py, is url the argument to be processed by my consumer (the function do_something_url(url) in tasks.py)?
I am currently running Celery with the command celery worker -A celery_c -l info; from the command-line output, it looks like Celery already polls SQS periodically. Do I need to create a PeriodicTask in Celery instead?
I really appreciate any suggestions to help me with this issue.
creating_msg_gae.py
import base64
import json

from boto import sqs

conn = sqs.connect_to_region("us-east-1",
                             aws_access_key_id='aaa',
                             aws_secret_access_key='bbb')
my_queue = conn.get_queue('uber_batch')
msg = {'properties': {'content_type': 'application/json',
                      'content_encoding': 'utf-8',
                      'body_encoding': 'base64',
                      'delivery_tag': None,
                      'delivery_info': {'exchange': None, 'routing_key': None}}}
body = {'id': 'theid',
        ########### Question 1 #######
        'task': 'what task name I should put here?',
        'url': ['my_s3_address']}
msg.update({'body': base64.encodestring(json.dumps(body))})
my_queue.write(my_queue.new_message(json.dumps(msg)))
My Celery file system looks like:
./ce_folder/
celery_c.py, celeryconfig.py, tasks.py, __init__.py
celeryconfig.py
import os
BROKER_BACKEND = "SQS"
AWS_ACCESS_KEY_ID = 'aaa'
AWS_SECRET_ACCESS_KEY = 'bbb'
os.environ.setdefault("AWS_ACCESS_KEY_ID", AWS_ACCESS_KEY_ID)
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", AWS_SECRET_ACCESS_KEY)
BROKER_URL = 'sqs://'
# All transport options must live in a single dict; reassigning
# BROKER_TRANSPORT_OPTIONS would keep only the last value.
BROKER_TRANSPORT_OPTIONS = {'region': 'us-east-1',
                            'visibility_timeout': 60,
                            'polling_interval': 30}
CELERY_DEFAULT_QUEUE = 'uber_batch'
CELERY_DEFAULT_EXCHANGE = CELERY_DEFAULT_QUEUE
CELERY_DEFAULT_EXCHANGE_TYPE = CELERY_DEFAULT_QUEUE
CELERY_DEFAULT_ROUTING_KEY = CELERY_DEFAULT_QUEUE
CELERY_QUEUES = {
    CELERY_DEFAULT_QUEUE: {
        'exchange': CELERY_DEFAULT_QUEUE,
        'binding_key': CELERY_DEFAULT_QUEUE,
    }
}
celery_c.py
from __future__ import absolute_import
from celery import Celery
app = Celery('uber')
app.config_from_object('celeryconfig')
if __name__ == '__main__':
app.start()
tasks.py
from __future__ import absolute_import
from celery_c import app
@app.task
def do_something_url(url):
    # ...download the file from url
    # ...do some calculations
    # ...upload the result file to S3 and return the result URL
    return result_url
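For reference, in a stock Celery deployment the 'task' field of a message is the registered task name, which defaults to the module path plus the function name (so 'tasks.do_something_url' for the task above), and the arguments travel in the standard 'args'/'kwargs' keys rather than a custom 'url' key. A minimal sketch of letting Celery build the SQS message itself, assuming the producer is able to run Celery (which may not be an option on GAE):

# Sketch only: enqueue through Celery instead of assembling the body by hand.
from celery import Celery

producer = Celery('uber', broker='sqs://')  # AWS credentials come from the environment
producer.conf.update(BROKER_TRANSPORT_OPTIONS={'region': 'us-east-1'})

# 'tasks.do_something_url' is the name Celery derives from tasks.py above;
# the S3 address becomes the task's url argument.
producer.send_task('tasks.do_something_url',
                   args=['my_s3_address'],
                   queue='uber_batch')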
Related
I'm new to Celery and was following tutorials to set it up with Docker.
I'm having an issue with sending and executing Celery tasks.
I have 4 Docker containers: one for the RabbitMQ server, one as the Celery producer, and 2 workers.
Celery tasks file:
"""
CELERY MAIN FILE
"""
from celery import Celery
from time import sleep
celery_obj = Celery()
celery_obj.config_from_object('celery_config') #config file we created in same folder
@celery_obj.task
def add(num1, num2):
    print("executing add function")
    sleep(5)
    return num1 + num2
My celery config file for Producer:
"""
CELERY CONFIGURATION FILE
"""
from kombu import Exchange, Queue
broker_url = "pyamqp://rabbitmq_user:123#172.17.0.2/res_opt_rabbitmq_vhost"
result_backend = 'rpc://'
#celery_result_backend = ""
celery_imports = ('res_opt_code.tasks')
task_queues = (
    Queue('worker_A_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_A_rabbitmq_queue'),
    Queue('worker_B_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_B_rabbitmq_queue'),
)
Config file for worker_A:
"""
CELERY CONFIGURATION FILE
"""
from kombu import Exchange, Queue
broker_url = "pyamqp://rabbitmq_user:123#172.17.0.2/res_opt_rabbitmq_vhost"
result_backend = 'rpc://'
#celery_result_backend = ""
celery_imports = ('worker_code.tasks')
task_queues = (
    Queue('worker_A_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_A_rabbitmq_queue'),
    Queue('worker_B_kombu_queue', Exchange('celery', type='direct'), routing_key='worker_B_rabbitmq_queue'),
)
Command for starting celery on producer:
celery -A tasks worker --loglevel=DEBUG -f log_file.txt
Command for starting Celery on the worker:
celery -A tasks worker -n celery_worker_A -Q worker_A_kombu_queue --loglevel=DEBUG
Function call from producer:
from tasks import add

add.apply_async([4, 4], routing_key='worker_A_rabbitmq_queue')

# I also tried executing the function locally, but there are no logs for it and it stays in PENDING:
add.delay(4, 4)
Could you please help me figure out what I'm doing wrong here?
In the logs I can see that worker_A is connected, but there are no logs for the function.
I tried further troubleshooting and changed the argument in apply_async from routing_key to queue, and it works with the queue argument.
I was following this tutorial:
https://www.youtube.com/watch?v=TM1a3m65zaA
old:
add.apply_async([4,4],routing_key='worker_A_rabbitmq_queue')
new:
add.apply_async([4,4],queue='worker_A_rabbitmq_queue')
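As a side note (not from the original post): instead of passing queue= on every apply_async call, the task can also be routed in the Celery config. A minimal sketch, assuming the worker keeps listening on worker_A_kombu_queue via -Q worker_A_kombu_queue:

# Sketch only: add a route for tasks.add in celery_config.py so that a plain
# add.delay(4, 4) already lands on worker A's queue.
task_routes = {
    'tasks.add': {'queue': 'worker_A_kombu_queue'},
}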
We want to use Celery to listen to an SQS queue and process each event as a task.
This is the celeryconfig.py file
from kombu import (
    Exchange,
    Queue,
)
broker_transport = 'sqs'
broker_transport_options = {'region': 'us-east-1'}
worker_concurrency = 10
accept_content = ['application/json']
result_serializer = 'json'
content_encoding = 'utf-8'
task_serializer = 'json'
worker_enable_remote_control = False
worker_send_task_events = True
result_backend = None
task_queues = (
    Queue('re.fifo', exchange=Exchange('consume', type='direct'), routing_key='consume'),
)
task_routes = {'consume': {'queue': 're.fifo'}}
And this is the celery.py file
from celery.utils.log import get_task_logger
from celery import Celery
app = Celery(__name__)
logger = get_task_logger(__name__)
@app.task(routing_key='consume', name="consume", bind=True, acks_late=True, ignore_result=True)
def consume(self, msg):
    print('Message received')
    logger.info('Message received')
    # DO SOMETHING WITH THE RECEIVED MESSAGE
    # print('this is the new message', msg)
    return True
We are pushing an event onto SQS using the AWS CLI:
aws --endpoint-url http://localhost:9324 sqs send-message --queue-url http://localhost:9324/queue/re.fifo --message-group-id owais --message-deduplication-id test18 --message-body {\"test\":\"test\"}
We are receiving the event on the Celery worker, but our consume task is not being called, and we want it to be called.
How can we have the consume task run for events coming from SQS? Any help would be appreciated.
The message you pushed to SQS using the AWS CLI won't be recognized by the Celery worker. You need to call consume.delay(msg) to push messages to SQS; then your worker will be able to recognize them.
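For illustration, a minimal producer-side sketch; the names come from the config and task above, and wiring it this way is my assumption rather than code from the original post:

# Sketch only: enqueue through Celery so the SQS message carries the task
# envelope the worker expects, instead of a raw JSON body.
from celery import Celery

producer = Celery(__name__)
producer.config_from_object('celeryconfig')  # same config module the worker uses

# 'consume' is the task name set in the @app.task decorator above.
producer.send_task('consume', args=[{'test': 'test'}], queue='re.fifo')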
I'm trying to connect a Django Rest Framework backend to RabbitMQ using Celery. I have a few micro-services that use RabbitMQ as a message bus between the backend and those services. When I have the backend call a task that pushes a message to the micro-services' message bus, the message is never put into the queue for one of those services to grab. The queues I declare in my tasks.py are being created in RabbitMQ but are not being used by any of the producers.
testcelery/celery.py
...
from celery import Celery
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "testcelery.settings")
app = Celery("testcelery")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()
testcelery/settings.py
...
INSTALLED_APPS = [
...
"backend_processing"
]
CELERY_BROKER_URL = "amqp://testuser#rabbitmq_host:5672"
CELERY_ACCEPT_CONTENT = ["application/json"]
CELERY_TASK_SERIALIZER = "json"
backend_processing/tasks.py
...
from celery import shared_task
from testcelery.celery import app as celery_app
from kombu import Exchange, Queue
test_queue = Queue("test_queue", Exchange("test"), "test", durable=True)

@shared_task
def run_processing(data, producer=None):
    with celery_app.producer_or_acquire(producer) as producer:
        producer.publish(data,
                         declare=[test_queue],
                         retry=True,
                         exchange=test_queue.exchange,
                         routing_key=test_queue.routing_key,
                         delivery_mode="persistent")
What do I need to modify in my configuration to allow Celery to publish to queues other than the ones given to Celery by settings.py?
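For context on the message-bus side (this is my own sketch, not code from the question), a plain kombu consumer bound to the same test_queue would look roughly like this, which can help verify whether the published messages ever reach the queue:

# Sketch only: a bare kombu consumer for the micro-service side of the bus.
from kombu import Connection, Exchange, Queue

test_queue = Queue("test_queue", Exchange("test"), "test", durable=True)

def handle_message(body, message):
    print("received:", body)
    message.ack()

# Broker URL copied from settings.py above.
with Connection("amqp://testuser@rabbitmq_host:5672") as conn:
    with conn.Consumer(test_queue, callbacks=[handle_message]):
        conn.drain_events(timeout=5)  # wait up to 5 seconds for a message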
I'd like to call generate_async_audio_service from a view and have it asynchronously generate audio files for the list of words using a threading pool and then commit them to a database.
I keep running into an error saying that I'm working outside of the application context, even though I'm creating new Polly and S3 instances each time.
How can I generate/upload multiple audio files at once?
from flask import current_app
from multiprocessing.pool import ThreadPool
from Server.database import db
import boto3
import io
import uuid


def upload_audio_file_to_s3(file):
    app = current_app._get_current_object()
    with app.app_context():
        s3 = boto3.client(service_name='s3',
                          aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
                          aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'))
        extension = file.filename.rsplit('.', 1)[1].lower()
        file.filename = f"{uuid.uuid4().hex}.{extension}"
        s3.upload_fileobj(file,
                          app.config.get('S3_BUCKET'),
                          f"{app.config.get('UPLOADED_AUDIO_FOLDER')}/{file.filename}",
                          ExtraArgs={"ACL": 'public-read', "ContentType": file.content_type})
        return file.filename


def generate_polly(voice_id, text):
    app = current_app._get_current_object()
    with app.app_context():
        polly_client = boto3.Session(
            aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
            aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'),
            region_name=app.config.get('AWS_REGION')).client('polly')
        response = polly_client.synthesize_speech(VoiceId=voice_id,
                                                  OutputFormat='mp3', Text=text)
        return response['AudioStream'].read()


def generate_polly_from_term(vocab_term, gender='m'):
    app = current_app._get_current_object()
    with app.app_context():
        audio = generate_polly('Celine', vocab_term.term)
        file = io.BytesIO(audio)
        file.filename = 'temp.mp3'
        file.content_type = 'mp3'
        return vocab_term.id, upload_audio_file_to_s3(file)


def generate_async_audio_service(terms):
    pool = ThreadPool(processes=12)
    results = pool.map(generate_polly_from_term, terms)
    # do something w/ results
This is not necessarily a fleshed-out answer, but rather than putting things into comments I'll put it here.
Celery is a task manager for Python. The reason you would want to use it is that if you have tasks pinging Flask, but they take longer to finish than the interval at which new tasks come in, certain tasks will be blocked and you won't get all of your results. To fix this, you hand the work off to another process. This goes like so:
1) Client sends a request to Flask to process audio files
2) The files land in Flask to be processed; Flask will send an asynchronous task to Celery.
3) Celery is notified of the task and stores its state in some sort of messaging system (RabbitMQ and Redis are the canonical examples)
4) Flask is now unburdened from that task and can receive more
5) Celery finishes the task, including the upload to your database
Celery and Flask are then two separate Python processes communicating with one another. That should satisfy your multithreaded approach. You can also retrieve the state of a task through Flask if you want the client to verify that the task was or was not completed. The route in your Flask app.py would look like:
@app.route('/my-route', methods=['POST'])
def process_audio():
    # Get your files and save to common temp storage
    save_my_files(target_dir, files)

    response = celery_app.send_task('celery_worker.files', args=[target_dir])

    return jsonify({'task_id': response.task_id})
Where celery_app comes from another module worker.py:
import os
from celery import Celery
env = os.environ
# This is for a rabbitMQ backend
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL', 'amqp://0.0.0.0:5672/0')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)
Then, your celery process would have a worker configured something like:
import os

from celery import Celery
from celery.signals import after_task_publish

env = os.environ
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')

# Set celery_app with name 'tasks' using the above broker and backend
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)

@celery_app.task(name='celery_worker.files')
def async_files(path):
    # Get file from path
    # Process
    # Upload to database
    # This is just if you want to return an actual result; you can fill this in with whatever
    return {'task_state': "FINISHED"}
This is relatively basic, but could serve as a starting point. I will say that some of Celery's behavior and setup is not always the most intuitive, but this will leave your flask app available to whoever wants to send files to it without blocking anything else.
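As mentioned above, Flask can also report a task's state back to the client. A minimal sketch of such a status route (the URL and payload shape are my assumptions, not part of the app above):

@app.route('/my-route/status/<task_id>', methods=['GET'])
def task_status(task_id):
    # Look the task up by the id returned from the POST route above.
    result = celery_app.AsyncResult(task_id)
    payload = {'task_id': task_id, 'state': result.state}
    if result.ready():
        # ready() is True for success and failure alike; get() re-raises on failure.
        payload['result'] = result.get(timeout=1)
    return jsonify(payload)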
Hopefully that's somewhat helpful
I'm running a Celery worker in Python with the celery module v3.1.25, and a Celery client in node.js with the node-celery npm package v0.2.7 (not the latest).
The Python Celery worker works fine when sending a job using a Python Celery client.
Problem: When using a node-celery client to send a task to the Celery backend, we get an error in the JS console:
(STDERR) Celery should be configured with json serializer
Python Celery worker is configured with:
app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             task_serializer='json',
             include=['proj.tasks'])
node-celery client is configured with:
var celery = require('node-celery')
var client = celery.createClient({
    CELERY_BROKER_URL: 'amqp://test:test@192.168.1.26:5672//',
    CELERY_RESULT_BACKEND: 'amqp',
    CELERY_TASK_SERIALIZER: "json"
});
client.on('connect', function() {
    console.log('connected');
    client.call('proj.tasks.getPriceEstimates', [start_latitude, start_longitude],
        function(result) {
            console.log('result: ', result);
            client.end();
        })
});
Is this a problem with the configuration on the Python Celery worker? Did we miss out on a configuration parameter which can change the return serialization format to json?
Update
Updated with the result_serializer and accept_content parameters as suggested by ChillarAnand:
from __future__ import absolute_import, unicode_literals
from celery import Celery
app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             task_serializer='json',
             result_serializer='json',
             accept_content=['application/json'],
             include=['proj.tasks'])
But the node.js Celery client still thinks that it's not in JSON, throwing the same error message.
It gives that error because the results were in the form of 'application/x-python-serialize'.
I checked this to be the case, as the RabbitMQ management console shows the results with content_type: application/x-python-serialize.
This forum post says that it is because the tasks were created before the configs were loaded.
Here are how my files are like:
proj/celery.py
from __future__ import absolute_import, unicode_literals
from celery import Celery
app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             task_serializer='json',
             result_serializer='json',
             accept_content=['application/json'],
             include=['proj.tasks'])
proj/tasks.py
from __future__ import absolute_import, unicode_literals
from .celery import app
@app.task
def myTask():
    ...
    return ...
Is there a better way to structure the code to ensure that the configs are loaded before the tasks?
When configuring a serializer, you should specify content type, task serializer and result serializer as well.
app = Celery(
    broker='amqp://guest@localhost//',
    backend='amqp://',
    include=['proj.tasks'],
    task_serializer='json',
    result_serializer='json',
    accept_content=['application/json'],
)
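If the worry is ordering (tasks being registered before the settings take effect), one variant is to apply the serializer settings on app.conf in proj/celery.py right after the app is created, before the worker imports proj.tasks. A minimal sketch, using the pre-4.0 setting names on the assumption that the worker stays on Celery 3.1.x:

# Sketch only: configure serializers on the app object before any task module
# is imported; the CELERY_* names are the old style matching celery 3.1.25.
from celery import Celery

app = Celery('tasks',
             broker='amqp://test:test@192.168.1.26:5672//',
             backend='amqp://',
             include=['proj.tasks'])

app.conf.update(
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    CELERY_ACCEPT_CONTENT=['application/json'],
)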