I am using Heroku to host a bot that I have been working on; the code works perfectly when I launch it locally. However, I updated the bot yesterday to add some new functionality, and now I am seeing this error when checking the logs:
2021-08-29T13:42:44.000000+00:00 app[api]: Build succeeded
2021-08-29T13:43:05.090490+00:00 heroku[worker.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 30 seconds of SIGTERM
2021-08-29T13:43:05.095578+00:00 heroku[worker.1]: Stopping remaining processes with SIGKILL
2021-08-29T13:43:05.174771+00:00 heroku[worker.1]: Process exited with status 137
2021-08-29T13:56:00.000000+00:00 app[api]: Build started by user ty.unsworth@gmail.com
2021-08-29T13:56:24.413396+00:00 app[api]: Deploy 5ff2d18b by user ty.unsworth@gmail.com
2021-08-29T13:56:24.413396+00:00 app[api]: Release v39 created by user ty.unsworth@gmail.com
2021-08-29T13:56:26.596593+00:00 heroku[worker.1]: Restarting
2021-08-29T13:56:26.610525+00:00 heroku[worker.1]: State changed from up to starting
2021-08-29T13:56:27.284912+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2021-08-29T13:56:29.770892+00:00 heroku[worker.1]: Starting process with command `python WagonCounterBot.py`
2021-08-29T13:56:30.497452+00:00 heroku[worker.1]: State changed from starting to up
2021-08-29T13:56:34.000000+00:00 app[api]: Build succeeded
2021-08-29T13:56:57.473450+00:00 heroku[worker.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 30 seconds of SIGTERM
2021-08-29T13:56:57.481783+00:00 heroku[worker.1]: Stopping remaining processes with SIGKILL
2021-08-29T13:56:57.535059+00:00 heroku[worker.1]: Process exited with status 137
I think this error may be caused by the new functionality I added:
@client.listen()
async def on_message(message):
    """
    Looks for when a member calls 'bhwagon' and, after 24 minutes, sends them a message
    :param message: the message sent by the user
    :return: a DM to the user letting them know their cooldown ended
    """
    channel = client.get_channel(int(WAGON_CHANNEL))  # sets the channel
    if message.content.startswith("bhwagon"):
        channel = message.channel
        await cool_down_ended(message)


async def cool_down_ended(message):
    """
    Sends the author of the message a personal DM 24 minutes after they type 'bhwagon' in the guild
    :param message: the message the author sent
    :return: a message to the author
    """
    time.sleep(1440)  # sets a timer for 24 minutes = 1440 seconds
    await message.author.send("Your wagon steal timer is up 🎩 time for another materials run!")
So I think I understand this error to mean that Heroku doesn't allow functions to delay themselves for more than 30 seconds, which conflicts with cool_down_ended(message), which delays for 24 minutes.
Would there be any easy way around this?
Don't use time.sleep in asynchronous code; it blocks the entire thread (and with it the event loop). Use await asyncio.sleep(delay) instead.
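For example, a minimal sketch of the cooldown coroutine rewritten with asyncio.sleep (reusing the message parameter and the 1440-second delay from the question):

import asyncio

async def cool_down_ended(message):
    # Yields control back to the event loop instead of blocking it for 24 minutes
    await asyncio.sleep(1440)  # 24 minutes = 1440 seconds
    await message.author.send("Your wagon steal timer is up 🎩 time for another materials run!")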
This is the code to handle SIGTERM from Heroku (you can add it before the client.run call):
import signal
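# When Heroku sends SIGTERM, schedule a clean client.close() so the process can exit within the 30-second window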
signal.signal(signal.SIGTERM, lambda *_: client.loop.create_task(client.close()))
Related
I have a Celery task that is scheduled to run after five minutes:
my_task.apply_async(countdown=5 * 60)
In case the worker is restarted, I need the task to be requeued, so I'm using the acks_late=True and reject_on_worker_lost=True options in the task decorator.
@shared_task(acks_late=True, reject_on_worker_lost=True)
def my_task():
    print('Running task...')
The Celery worker runs inside a Docker container, so I restart the worker after starting the task by running docker restart. The task does get requeued, but only after approximately 1 hour. I expected the task to still be executed at the ETA, or as soon as the worker came back up if the ETA had already passed. How can I configure the wait time for task requeuing?
There are no other tasks running simultaneously, and I'm running this task in a development environment under very light load, so I don't think it is a congestion issue.
Packages:
celery==4.2.2
redis==3.2.1
Example of celery logs without restart:
[2022-08-05 20:00:55,990: INFO/MainProcess] Received task: tasks.my_task[15844866-36c6-465e-874a-78f861837d3c] ETA:[2022-08-05 20:05:55.922479+00:00]
[2022-08-05 20:05:55,972: WARNING/ForkPoolWorker-1] Running task...
[2022-08-05 20:05:55,974: INFO/ForkPoolWorker-1] Task tasks.my_task[15844866-36c6-465e-874a-78f861837d3c] succeeded in 0.001996207982301712s: None
Example of celery logs with restart:
[2022-08-05 19:42:16,961: INFO/MainProcess] Received task: tasks.my_task[c103280b-7d62-4b6a-8311-57769df81c90] ETA:[2022-08-05 19:47:16.893205+00:00]
--- WORKER RESTART HERE ---
[2022-08-05 20:43:39,646: INFO/MainProcess] Received task: tasks.my_task[c103280b-7d62-4b6a-8311-57769df81c90] ETA:[2022-08-05 19:47:16.893205+00:00]
[2022-08-05 20:43:40,106: WARNING/ForkPoolWorker-1] Running task...
[2022-08-05 20:43:40,107: INFO/ForkPoolWorker-1] Task tasks.my_task[c103280b-7d62-4b6a-8311-57769df81c90] succeeded in 0.0011997036635875702s: None
Thanks in advance.
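A note on the likely mechanism, based on Celery's documented Redis transport behaviour: with acks_late, an unacknowledged task is only redelivered after the broker transport's visibility_timeout, which defaults to one hour for Redis, matching the delay in the logs above. A hedged sketch of lowering it (assuming a Celery app object named app):

# Redeliver unacknowledged tasks after 10 minutes instead of the 1-hour default.
# Keep this larger than your longest countdown/ETA, or ETA tasks may be delivered more than once.
app.conf.broker_transport_options = {'visibility_timeout': 10 * 60}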
I have a workload with 16 instances that can communicate with each other (verified by ping). Each of them runs a long-running task and was started like this:
nohup celery worker -A tasks.workers --loglevel=INFO --logfile=/dockerdata/log/celery.log --concurrency=7 >/dev/null 2>&1 &
However, after a while there are always a few celery instances that stop running; normally the log directory saves each day's logs, so I can see when one stops.
I checked the last day's logs for these instances and found the following information:
worker exited by signal SIGKILL
[2021-07-23 09:04:24,270: ERROR/MainProcess] Process 'ForkPoolWorker-19773' pid:2846586 exited with 'signal 9 (SIGKILL)'
[2021-07-23 09:04:24,281: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 79074.')
Traceback (most recent call last):
File "/data/anaconda3/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 79074.
missed heartbeat from...
[2021-07-30 10:24:26,815: INFO/MainProcess] missed heartbeat from celery@instance-1
I suspect that the celery workers stopping has something to do with the two messages above. Can anyone offer a solution to this problem?
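One hedged possibility: a pool process killed with signal 9 is frequently the kernel OOM killer, in which case recycling pool processes can bound per-child memory. A sketch under that assumption (assuming a Celery app object named app):

# Replace each pool process after a number of tasks, or once it grows past ~300 MB resident memory
app.conf.worker_max_tasks_per_child = 50
app.conf.worker_max_memory_per_child = 300000  # value is in KiB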
Scenario:
I created a shared_task in Celery for testing purposes [RabbitMQ as the broker for queuing messages]:
@app.task(bind=True, max_retries=5, base=MyTask)
def testing(self):
    try:
        raise smtplib.SMTPException
    except smtplib.SMTPException as exc:
        print 'This is it'
        self.retry(exc=exc, countdown=2)


# Overriding base class of Task
class MyTask(celery.Task):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        print "MyTask on failure world"
        pass
I called the task for testing by running testing.delay() 10 times after creating a worker. Then I quit the server by pressing Ctrl+C and deleted all those queues from the RabbitMQ server. Then I started the server again.
Server starting command: celery worker --app=my_app.settings -l DEBUG
Delete command of queue: rabbitmqadmin delete queue name=<queue_name>
Deleting workers command: ps auxww | grep 'celery worker' | awk '{print $2}' | xargs kill -9
Problem:
Since I have already deleted all queues from the RabbitMQ server, only fresh tasks should be received now. But I am still getting old tasks, and moreover, no new tasks are appearing in the list. What could be the actual cause of this?
What is happening is that your worker takes in more than one task at a time, unless you use the -Ofair flag when starting the worker.
https://medium.com/@taylorhughes/three-quick-tips-from-two-years-with-celery-c05ff9d7f9eb
So, even if you clear out your queue, your worker will still be running with the tasks it's already picked up, unless you kill the worker process itself.
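As a related knob, the number of messages a worker reserves ahead of time is controlled by the prefetch multiplier; a minimal sketch using the old-style setting name (matching the settings-module config used in this question):

# In the Celery settings module: reserve only one unacknowledged message per pool process,
# so purging the queue leaves far fewer already-prefetched tasks sitting in the worker
CELERYD_PREFETCH_MULTIPLIER = 1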
Edit to add
If you have a task running after restart, you need to revoke the task.
http://celery.readthedocs.io/en/latest/faq.html#can-i-cancel-the-execution-of-a-task
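For completeness, a hedged sketch of revoking a task by id (app and task_id are placeholders for your Celery app instance and the stuck task's UUID):

# Revoke the task; terminate=True also kills it if it is already executing
app.control.revoke(task_id, terminate=True)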
So I am using RabbitMQ + Celery to create a simple RPC architecture. I have one RabbitMQ message broker and one remote worker which runs the Celery daemon.
There is a third server which exposes a thin RESTful API. When it receives an HTTP request, it sends a task to the remote worker, waits for the result, and returns a response.
This works great most of the time. However, I have noticed that after a longer period of inactivity (say 5 minutes with no incoming requests), the Celery worker behaves strangely. The first 3 tasks received after such a period return this error:
exchange.declare: connection closed unexpectedly
After three failed tasks it works again. If there are no tasks for a longer period of time, the same thing happens again. Any idea?
My init script for the Celery worker:
# description "Celery worker using sync broker"
console log
start on runlevel [2345]
stop on runlevel [!2345]
setuid richard
setgid richard
script
    chdir /usr/local/myproject/myproject
    exec /usr/local/myproject/venv/bin/celery worker -n celery_worker_deamon.%h -A proj.sync_celery -Q sync_queue -l info --autoscale=10,3 --autoreload --purge
end script
respawn
My celery config:
# Synchronous blocking tasks
BROKER_URL_SYNC = 'amqp://guest:guest@localhost:5672//'
# Asynchronous non blocking tasks
BROKER_URL_ASYNC = 'amqp://guest:guest@localhost:5672//'
#: Only add pickle to this list if your broker is secured
#: from unwanted access (see userguide/security.html)
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'UTC'
CELERY_ENABLE_UTC = True
CELERY_BACKEND = 'amqp'
# http://docs.celeryproject.org/en/latest/userguide/tasks.html#disable-rate-limits-if-they-re-not-used
CELERY_DISABLE_RATE_LIMITS = True
# http://docs.celeryproject.org/en/latest/userguide/routing.html
CELERY_DEFAULT_QUEUE = 'sync_queue'
CELERY_DEFAULT_EXCHANGE = "tasks"
CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
CELERY_DEFAULT_ROUTING_KEY = "sync_task.default"
CELERY_QUEUES = {
    'sync_queue': {
        'binding_key': 'sync_task.#',
    },
    'async_queue': {
        'binding_key': 'async_task.#',
    },
}
Any ideas?
EDIT:
Ok, now it appears to happen randomly. I noticed this in RabbitMQ logs:
=WARNING REPORT==== 6-Jan-2014::17:31:54 ===
closing AMQP connection <0.295.0> (some_ip_address:36842 -> some_ip_address:5672):
connection_closed_abruptly
Is your RabbitMQ server or your Celery worker behind a load balancer by any chance? If yes, then the load balancer is closing the TCP connection after some period of inactivity. In which case, you will have to enable heartbeat from the client (worker) side. If you do, I would not recommend using the pure Python amqp lib for this. Instead, replace it with librabbitmq.
connection_closed_abruptly is logged when a client disconnects without following the proper AMQP shutdown protocol:
channel.close(...)
Request a channel close.
This method indicates that the sender wants to close the channel.
This may be due to internal conditions (e.g. a forced shut-down) or due to
an error handling a specific method, i.e. an exception.
When a close is due to an exception, the sender provides the class and method id of
the method which caused the exception.
After sending this method, any received methods except Close and Close-OK MUST be discarded. The response to receiving a Close after sending Close must be to send Close-Ok.
channel.close-ok():
Confirm a channel close.
This method confirms a Channel.Close method and tells the recipient
that it is safe to release resources for the channel.
A peer that detects a socket closure without having received a
Channel.Close-Ok handshake method SHOULD log the error.
Here is an issue about that.
Can you set your custom configuration for BROKER_HEARTBEAT and BROKER_HEARTBEAT_CHECKRATE and check again, for example:
BROKER_HEARTBEAT = 10
BROKER_HEARTBEAT_CHECKRATE = 2.0
I'm using imaplib2 (docs) to interact with an IMAP server.
I'm using the idle command, with a timeout and a callback.
The problem is, I don't see any way of telling if the callback was triggered by the timeout being reached, or if there was a change on the server that I need to check out.
I just get ('OK', ['IDLE terminated (Success)']) every time.
Here's the debug output for both cases:
Timedout:
15:43.94 MainThread server IDLE started, timeout in 5.00 secs
15:48.94 imap.gmail.com handler server IDLE timedout
15:48.94 imap.gmail.com handler server IDLE finished
15:48.94 imap.gmail.com writer > DONE\r\n
15:49.17 imap.gmail.com reader < DDDM6 OK IDLE terminated (Success)\r\n
15:49.17 imap.gmail.com handler _request_pop(DDDM6, ('OK', ['IDLE terminated (Success)']))
Something happened:
18:41.34 MainThread server IDLE started, timeout in 50.00 secs
19:01.35 imap.gmail.com reader < * 1 EXISTS\r\n
19:01.37 imap.gmail.com handler server IDLE finished
19:01.37 imap.gmail.com writer > DONE\r\n
19:01.59 imap.gmail.com reader < BFCN6 OK IDLE terminated (Success)\r\n
19:01.59 imap.gmail.com handler _request_pop(BFCN6, ('OK', ['IDLE terminated (Success)']))
What am I missing?
Does the functionality just not exist in imaplib2?
Piers Lauder (author of imaplib2) just answered this question on the imaplib2-devel mailing list. He said:
I think the way to test if an IDLE has timed out is to execute:
instance.response('IDLE')
which will return:
('IDLE', ['TIMEOUT'])
if the reason that the idle returned was a timeout, rather than
something else (such as ('IDLE', [None])).
I agree that this should be documented, so I'll fix the imaplib2.html
document
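Put into code, a small helper based on that reply (imap_conn and idle_timed_out are illustrative names, assuming an imaplib2 connection on which idle() has just returned):

def idle_timed_out(imap_conn):
    """Return True if the last IDLE ended because of the timeout rather than server activity."""
    typ, data = imap_conn.response('IDLE')
    return data == ['TIMEOUT']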
You'll have to manually check for new messages each time you get this response. You can store the UIDs of messages in a list and compare new UIDs against it on each callback. This way you can easily tell whether there are new messages or it was just a timeout.
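A rough sketch of that approach (seen_uids and check_for_new are illustrative; assumes an authenticated imaplib2 connection M with a mailbox selected):

seen_uids = set()

def check_for_new(M):
    # Fetch the UIDs currently in the mailbox and diff them against what we have already seen
    typ, data = M.uid('SEARCH', None, 'ALL')
    current = set(data[0].split()) if data and data[0] else set()
    new = current - seen_uids
    seen_uids.update(current)
    return new  # an empty set suggests the IDLE simply timed out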