I have a Celery setup in my Python application, and the broker I am using is SQS. The messages that go to SQS come from a different application via Boto3's send_message() API. My confusion now is how to trigger Celery to pick up the messages from SQS and process them. There should be some task running in Celery that processes the messages from SQS, right? My requirement is similar to Celery Consumer SQS Messages.
As I understand it, Celery polls SQS until messages arrive there. Can anybody help me with this?
I call this task every 20 seconds:
@app.task(name='listen_to_sqs_telemetry')
def listen_to_sqs_telemetry():
    logger.info('start listen_to_telemetry')
    sqs = get_sqs_client()
    queue_url = 'https://sqs.us-east-2.amazonaws.com/xxx'
    logger.info('Using ' + queue_url)
    keep_going = True
    num = 0
    while keep_going:
        keep_going = False
        try:
            response = sqs.receive_message(
                QueueUrl=queue_url,
                AttributeNames=[
                    'SentTimestamp',
                ],
                MaxNumberOfMessages=5,
                MessageAttributeNames=[
                    'All'
                ],
                WaitTimeSeconds=20
            )
            # logger.info(response)
            if 'Messages' in response:
                keep_going = True
                for rec in response['Messages']:
                    # Process message
                    sqs.delete_message(
                        QueueUrl=queue_url,
                        ReceiptHandle=rec['ReceiptHandle']
                    )
                    num = num + 1
            else:
                pass
                # logger.info(response)
        except Exception as e:
            logger.error(str(e))
    logger.info('done with listen_to_sqs_telemetry')
    return "Processed {} message(s)".format(num)
If I understand you correctly, try running the worker as a daemon. Use a tool like supervisord to do it.
We want to use Celery to listen to an SQS queue and process the incoming events into tasks.
This is the celeryconfig.py file
from kombu import (
    Exchange,
    Queue
)
broker_transport = 'sqs'
broker_transport_options = {'region': 'us-east-1'}
worker_concurrency = 10
accept_content = ['application/json']
result_serializer = 'json'
content_encoding = 'utf-8'
task_serializer = 'json'
worker_enable_remote_control = False
worker_send_task_events = True
result_backend = None
task_queues = (
    Queue('re.fifo', exchange=Exchange('consume', type='direct'), routing_key='consume'),
)
task_routes = {'consume': {'queue': 're.fifo'}}
And this is the celery.py file
from celery.utils.log import get_task_logger
from celery import Celery

app = Celery(__name__)
logger = get_task_logger(__name__)


@app.task(routing_key='consume', name="consume", bind=True, acks_late=True, ignore_result=True)
def consume(self, msg):
    print('Message received')
    logger.info('Message received')
    # DO SOMETHING WITH THE RECEIVED MESSAGE
    # print('this is the new message', msg)
    return True
We are pushing events onto SQS using the AWS CLI:
aws --endpoint-url http://localhost:9324 sqs send-message --queue-url http://localhost:9324/queue/re.fifo --message-group-id owais --message-deduplication-id test18 --message-body {\"test\":\"test\"}
We are receiving the event on the Celery worker, but our consume task is not being called, and we want it to be called.
How can we call the consume task when an event comes in from SQS? Any help would be appreciated.
The message you pushed to SQS with the AWS CLI won't be recognized by the Celery worker. You need to call consume.delay(msg) to push messages to SQS; then your worker will be able to recognize them.
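For example, instead of aws sqs send-message, enqueue the payload through Celery so it carries the task metadata (task name, id, arguments) the worker looks for. A minimal sketch, assuming the celeryconfig.py shown above is importable:

from celery import Celery

app = Celery(__name__)
app.config_from_object('celeryconfig')

# wraps the payload in Celery's message protocol and routes it to the re.fifo queue
app.send_task('consume', args=[{'test': 'test'}], queue='re.fifo')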
I have a Docker container that fetches messages from a standard SQS queue. But most of the time, the code shows it received zero messages and exits, while the SQS console shows the messages under "Messages in flight", so the messages were received by some consumer.
This is my Docker entry point
ENV PYTHONPATH="$PYTHONPATH:/app"
ENTRYPOINT [ "python3" ]
CMD ["multi.py"]
This is the multi.py code
import multiprocessing as mp
import subprocess


def s():
    subprocess.call(['python3', 'script.py'])


n_process = min(mp.cpu_count(), 8)
process = []
for i in range(n_process):
    p = mp.Process(target=s)
    process.append(p)
    p.start()
for p in process:
    p.join()
This is the part of script.py that calls receive_messages
sqs = boto3.resource('sqs', region_name=REGION, aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
queue = sqs.get_queue_by_name(QueueName=QUEUE_NAME)


def main():
    while True:
        m = queue.receive_messages()
        for message in m:
            process_message(message)
            message.delete()
Also, the container works maybe 60% of the time, but I'm trying to figure out why it fails the rest of the time.
PS: Solved
This is from the boto3 docs
Short poll is the default behavior where a weighted random set of machines is sampled on a ReceiveMessage call. Thus, only the messages on the sampled machines are returned. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response. If this happens, repeat the request.
m = queue.receive_messages(WaitTimeSeconds=5)
This resolves the issue because when there are very few messages in SQS, a short poll is quite likely to return nothing.
You can read about short polling in the boto3 docs here:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sqs.html#SQS.Queue.receive_messages
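Applied to the loop in script.py above, the fix is just the extra parameter. A sketch that assumes the same queue object and process_message function from the question:

def main():
    while True:
        # long poll: wait up to 20 seconds for messages instead of sampling immediately
        for message in queue.receive_messages(WaitTimeSeconds=20, MaxNumberOfMessages=10):
            process_message(message)
            message.delete()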
I'm testing a script that runs binwalk on a file and then sends a Kafka message to let the sender know whether it completed or failed. It looks like this:
if __name__ == "__main__":
    # finds the path of this file
    scriptpath = os.path.dirname(inspect.getfile(inspect.currentframe()))
    print(scriptpath)

    # sets up kafka consumer on the binwalk topic and kafka producer for the bwsignature topic
    consumer = KafkaConsumer('binwalk', bootstrap_servers=['localhost:9092'])
    producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

    # watches the binwalk kafka topic
    for msg in consumer:
        # load the json
        job = json.loads(msg.value)
        # get the filepath of the .bin
        filepath = job["src"]
        print(0)
        try:
            # runs the script
            binwalkthedog(filepath, scriptpath)
            # send a receipt
            producer.send('bwsignature', b'accepted')
        except:
            producer.send('bwsignature', b'failed')
            pass
    producer.close()
    consumer.close()
If I send in a file that doesn't cause any errors in the binwalkthedog function, it works fine, but if I give it a file that doesn't exist, it prints a general error message and moves on to the next input, as it should. For some reason, though, producer.send('bwsignature', b'failed') doesn't send unless something creates a delay after the binwalkthedog call fails, like time.sleep(1) or a for loop that counts to a million.
Obviously I could keep that in place but it's really gross and I'm sure there's a better way to do this.
This is the temp script I'm using to send and receive a signal from the binwalkthedog module:
job = {
    'src': '/home/nick/Documents/summer-2021-intern-project/BinwalkModule/bo.bin',
    'id': 1
}
chomp = json.dumps(job).encode('ascii')

receipt = KafkaConsumer('bwsignature', bootstrap_servers=['localhost:9092'])
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])
future = producer.send('binwalk', chomp)
try:
    record_metadata = future.get(timeout=10)
except KafkaError:
    print("sucks")
    pass
print(record_metadata.topic)
print(record_metadata.partition)
print(record_metadata.offset)
producer.close()

for msg in receipt:
    print(msg.value)
    break
Kafka producers batch many records together to reduce the number of requests made to the server. If you want to force records to send, rather than introducing a blocking sleep call or calling get on the future, you should use producer.flush().
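Applied to the consumer loop above, that means flushing after each send, or at least after the failure path. A sketch that assumes the producer, binwalkthedog, filepath, and scriptpath from the question's script:

try:
    binwalkthedog(filepath, scriptpath)
    producer.send('bwsignature', b'accepted')
except:
    producer.send('bwsignature', b'failed')
finally:
    # block until buffered records have actually been handed to the broker
    producer.flush()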
I can use signals to log task execution time, but I would also like to log the time spent in the queue. Is this possible with signals? Which signals should I use?
Task events can be used to monitor and trigger actions based on what happens to a task. task-sent, task-received, task-started, task-succeeded, task-failed, task-rejected, task-revoked, and task-retried are the task events supported in Celery; for more details, refer to the Celery documentation on task events. To log the time a task waits in the queue, capture the time the task was received (added to the job queue) and the time it started, using the respective task event handlers. The difference between the two gives the time the job spent waiting in the queue. Below is a sample Python implementation.
from celery import Celery
from redis import Redis

redis = Redis(host='workerdb', port=6379, db=0)

taskId_startTime = {}
taskId_createTime = {}


def my_monitor():
    app = Celery('vwadaptor', broker='redis://workerdb:6379/0', backend='redis://workerdb:6379/0')

    state = app.events.State()

    def announce_task_received(event):
        state.event(event)
        task = state.tasks.get(event['uuid'])
        taskId_createTime[task.uuid] = task.timestamp

    def announce_task_started(event):
        state.event(event)
        task = state.tasks.get(event['uuid'])
        taskId_startTime[task.uuid] = task.timestamp

    def announce_task_succeeded(event):
        state.event(event)
        task = state.tasks.get(event['uuid'])
        print("wait time in queue", taskId_startTime[task.uuid] - taskId_createTime[task.uuid])

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-received': announce_task_received,
            'task-started': announce_task_started,
            'task-succeeded': announce_task_succeeded,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)


my_monitor()
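Note that the receiver above only sees task-received/task-started/task-succeeded events if the worker actually emits them, so task events have to be enabled on the worker (the -E flag, or the setting below). A minimal sketch, reusing the broker URL from the monitor:

from celery import Celery

app = Celery('vwadaptor', broker='redis://workerdb:6379/0', backend='redis://workerdb:6379/0')
# emit task events so an events receiver/monitor can consume them (same as starting the worker with -E)
app.conf.worker_send_task_events = True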
I'm trying to make a simple RPC server with SimpleXMLRPCServer and Celery. Basically, the idea is that a remote client (client.py) can call tasks via xmlrpc.client on the server (server.py), which includes functions registered as Celery tasks (runnable.py).
The problem is that when an RPC function is registered via register_function, I can call it directly by its name and it will be executed properly, but without using Celery. What I would like to achieve is to call it via name.delay() within client.py, so that it is executed by Celery but without locking the server thread. So server.py should act like a proxy and allow multiple clients to call the complete set of functions like:
for task in flow:
    job = globals()[task]
    job.delay("some arg")
    while True:
        if job.ready():
            break
I've tried using register_instance with allow_dotted_names=True, but I ran into an error:
xmlrpc.client.Fault: <Fault 1: "<class 'TypeError'>:cannot marshal <class '_thread.RLock'> objects">
Which led me to wonder whether it's even possible to do something like this.
Simplified code:
server.py
# ...runnable.py import
# ...rpc init

def register_tasks():
    for task in get_all_tasks():
        setattr(self, task, globals()[task])
        self.server.register_function(getattr(self, task), task)
runnable.py
app = Celery("tasks", backend="amqp", broker="amqp://")


@app.task()
def say_hello():
    return "hello there"


@app.task()
def say_goodbye():
    return "bye, bye"


def get_all_tasks():
    tasks = app.tasks
    runnable = []
    for t in tasks:
        if t.startswith("modules.runnable"):
            runnable.append(t.split(".")[-1])
    return runnable
Finally, client.py
s = xmlrpc.client.ServerProxy("http://127.0.0.1:8000")
print(s.say_hello())
I've come up with an idea that creates some extra wrappers for the Celery delay functions. These are registered so that the RPC client can call rpc.the_remote_task.delay(*args). This returns the Celery job ID; the client then asks whether the job is ready via rpc.ready(job_id) and gets the results with rpc.get(job_id). As of now there's an obvious security hole, since you can fetch results if you know the job ID, but still, it works fine.
Registering tasks (server.py)
def register_tasks():
    for task in get_all_tasks():
        exec("""def """ + task + """_runtime_task_delay(*args):
    return celery_wrapper(""" + task + """, "delay", *args)
setattr(self, task + "_delay", """ + task + """_runtime_task_delay)
""")
        f_delay = task + "_delay"
        self.server.register_function(getattr(self, f_delay), task + ".delay")

    def job_ready(jid):
        return celery_wrapper(None, "ready", jid)

    def job_get(jid):
        return celery_wrapper(None, "get", jid)

    setattr(self, "ready", job_ready)
    setattr(self, "get", job_get)
    self.server.register_function(job_ready, "ready")
    self.server.register_function(job_get, "get")
The wrapper (server.py)
def celery_wrapper(task, method, *args):
    if method == "delay":
        job = task.delay(*args)
        job_id = job.id
        return job_id
    elif method == "ready":
        res = app.AsyncResult(args[0])
        return res.ready()
    elif method == "get":
        res = app.AsyncResult(args[0])
        return res.get()
    else:
        return "0"
And the RPC call (client.py)
jid = s.the_remote_task.delay("arg1", "arg2")

is_running = True
while is_running:
    is_running = not s.ready(jid)
    if not is_running:
        print(s.get(jid))
    time.sleep(.01)