Background worker process crashing - python

I'm creating a web app that takes a username as input and performs some tasks with it. I'm using Flask for the web part and Heroku to deploy it. Two main Python scripts run the app: app.py and task.py. Everything is fine in my app.py file, which holds the Flask code for the app, but my task.py file has an initialisation step:
from instabot import Bot
bot = Bot()
bot.login(username="myusername", password="mypassword")
I then declared both scripts in the Procfile like this:
worker: python task.py
web: gunicorn app:app
but after deploying, when I check the logs I'm getting
Logs:
2022-01-28T18:30:06.376795+00:00 heroku[worker.1]: Starting process with command `python task.py`
2022-01-28T18:30:07.081356+00:00 heroku[worker.1]: State changed from starting to up
2022-01-28T18:30:08.213687+00:00 heroku[worker.1]: Process exited with status 0
2022-01-28T18:30:08.274498+00:00 heroku[worker.1]: State changed from up to crashed
As you can see, it's crashing. I don't know what my mistake is; can you please help me figure it out?

The code in my worker is taken almost verbatim from the RQ documentation, with some changes so it also works in our local development environment:
import os

from rq import Queue, Connection

from tools import get_redis_store

# we need to use a different worker when we are in Heroku
if 'DYNO' in os.environ:
    from rq.worker import HerokuWorker as Worker
else:
    from rq import Worker

redis_url = os.getenv('REDIS_URL')
if not redis_url:
    raise RuntimeError('Set up Redis first.')

listen = ['default']

conn = get_redis_store()

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
In tools.py:
import logging
import os

import redis

logger = logging.getLogger(__name__)

_redis_store = None


def get_redis_store():
    '''
    Get a redis connection based on the url configured
    in the env variable REDIS_URL

    Returns
    -------
    redis.Redis
    '''
    global _redis_store
    if not _redis_store:
        redis_url = os.getenv('REDIS_URL')
        if redis_url:
            logger.debug('starting redis: %s' % redis_url)
            if redis_url.startswith('rediss://'):
                # redis 6 encrypted connection;
                # heroku needs ssl verification disabled
                _redis_store = redis.from_url(
                    redis_url, ssl_cert_reqs=None)
            else:
                _redis_store = redis.from_url(redis_url)
        else:
            logger.debug('redis not configured')
    return _redis_store
In the main app, to queue a task:
from rq import Queue

from tools import get_redis_store

redis_store = get_redis_store()
queue = Queue(connection=redis_store)

def _slow_task(param1, param2, paramn):
    # here I execute the slow task
    my_slow_code(param1, param2)

# when I need to execute the slow task
queue.enqueue(_slow_task, param1, param2)
There are multiple options for running Redis on Heroku; depending on which one you use, you may need to change the env variable REDIS_URL. The environment variable DYNO is checked to know whether the code is running on Heroku or on a developer machine.
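For local development the same code can simply fall back to a local Redis instance; a minimal sketch (the running_on_heroku helper is hypothetical, not part of the worker above):
import os

def running_on_heroku():
    # Heroku sets DYNO for every dyno; it is absent on a developer machine
    return 'DYNO' in os.environ

# fall back to a local Redis instance during development
redis_url = os.getenv('REDIS_URL', 'redis://localhost:6379')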

Related

Celery jobs not running on heroku (python/django app)

I have a Django app set up with some scheduled tasks. The app is deployed on Heroku with Redis. The task runs if invoked synchronously in the console, or locally when I also have Redis and Celery running. However, the scheduled jobs are not running on Heroku.
My task:
from celery import shared_task

@shared_task(name="send_emails")
def send_emails():
    .....
celery.py:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from celery.schedules import crontab
# set the default Django settings module for the 'celery' program.
# this is also used in manage.py
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')
# Get the base REDIS URL, default to redis' default
BASE_REDIS_URL = os.environ.get('REDIS_URL', 'redis://localhost:6379')
app = Celery('my_app')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
app.conf.broker_url = BASE_REDIS_URL
# this allows you to schedule items in the Django admin.
app.conf.beat_scheduler = 'django_celery_beat.schedulers.DatabaseScheduler'
# These are the scheduled jobs
app.conf.beat_schedule = {
    'send_emails_crontab': {
        'task': 'send_emails',
        'schedule': crontab(hour=9, minute=0),
        'args': (),
    },
}
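Since beat_scheduler points at django_celery_beat's DatabaseScheduler, the same schedule can also be stored as database rows, which is what scheduling through the Django admin does; a sketch, assuming django_celery_beat is installed and migrated:
from django_celery_beat.models import CrontabSchedule, PeriodicTask

# equivalent of the 'send_emails_crontab' entry above, stored in the database
schedule, _ = CrontabSchedule.objects.get_or_create(hour='9', minute='0')
PeriodicTask.objects.get_or_create(
    name='send_emails_crontab',
    task='send_emails',
    crontab=schedule,
)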
In Procfile:
worker: celery -A my_app worker --beat -S django -l info
I've spun up the worker with heroku ps:scale worker=1 -a my-app.
I can see the registered tasks under [tasks] in the worker logs.
However, the scheduled tasks are not running at their scheduled time. Calling send_emails.delay() in the production console does work.
How do I get the worker to stay alive and / or run the job at the scheduled time?
I have a workaround using a command and heroku scheduler. Just unsure if that's the best way to do it.
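Such a workaround is typically a management command that Heroku Scheduler invokes at the configured time; a sketch (the file location and the import of send_emails are assumptions, not taken from the question):
# my_app/management/commands/send_emails.py (hypothetical path)
from django.core.management.base import BaseCommand

from my_app.tasks import send_emails  # assumes the task lives in my_app/tasks.py


class Command(BaseCommand):
    help = 'Send the scheduled emails (invoked by Heroku Scheduler)'

    def handle(self, *args, **options):
        # call the task function synchronously; no Celery worker is needed
        send_emails()
Heroku Scheduler would then be configured to run python manage.py send_emails daily.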
If you're on the free plan, you should know that Heroku dynos sleep, and if your scheduled task becomes due while your dyno is sleeping, it won't run.
A few ideas:
Open a console and check the dyno's datetime. Dynos run in UTC by default (unless you set a TZ config var), so a schedule written in your local time may fire at an unexpected hour.
A free dyno sleeps after 30 minutes of inactivity and only has 450 free hours per month.
Try switching from Celery to APScheduler (a BlockingScheduler clock process); you need to add a clock.py script like this:
import os

from apscheduler.schedulers.blocking import BlockingScheduler

from myapp import myfunction

sched = BlockingScheduler()

# hour/minute of the daily run, taken from Heroku config vars
hour = int(os.environ.get("SEARCH_HOUR"))
minutes = int(os.environ.get("SEARCH_MINUTES"))


@sched.scheduled_job('cron', day_of_week='mon-sun', hour=hour, minute=minutes)
def scheduled_job():
    print('This job executes myfunction every day at', hour, ':', minutes)
    # my function
    myfunction()


sched.start()
In Procfile:
clock: python clock.py
and run:
heroku ps:scale clock=1 --app thenameapp
Regards.

Issues when using rpc:// as a backend for celery application (while amqp:// backend works)

I have a simple celery application with two tasks, a_func() and b_func().
After starting the celery worker, I call a_func.apply_async(), and a_func, when running on the worker, calls b_func.apply_async().
When using 'amqp://' as a backend everything is working well.
However, when using 'rpc://' as a backend, I am having problems.
I am trying to get the state and the return value of the tasks.
For the a_func() task, there is no problem. However for b_func() I am getting state = 'PENDING' forever, and get() is stuck forever.
I am using:
celery version 4.3.0.
rabbitmq version 3.5.7 as broker.
python 2.7.
ubuntu version 16.04 LTS.
Worker cmd:
celery -A celery_test worker --loglevel=info
Celery application (celery_test.py):
from celery import Celery

app = Celery('my_app',
             backend='rpc://',
             broker='pyamqp://guest@localhost/celery',
             include=['tasks'])
a_func and b_func tasks (tasks.py):
from celery_test import app


@app.task
def a_func():
    print "A"
    b_func.apply_async()
    return "A"


@app.task
def b_func():
    print "B"
    return "B"
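The states are presumably being checked from the calling side along these lines; a sketch, not taken from the question (b_task_id is just a placeholder for however the id of the inner task is obtained):
from celery.result import AsyncResult

from celery_test import app
from tasks import a_func

result_a = a_func.apply_async()
print(result_a.state)   # reaches SUCCESS once the worker has run a_func
print(result_a.get())   # "A"

# b_func is queued inside a_func, so its task id has to be passed back somehow
b_task_id = "00000000-0000-0000-0000-000000000000"
result_b = AsyncResult(b_task_id, app=app)
print(result_b.state)   # stays PENDING with the rpc:// backend
result_b.get()          # blocks forever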

Heroku database management from worker

Wondering if anyone can help me with this, or at least point me in the right direction.
I currently have a web and a worker process running. I need a task to run 24/7 while the dynos are online; its job is to access the database and remove records that have expired by checking each record's "expiry" value against the current timestamp.
My worker.py file:
import os

import redis
from rq import Worker, Queue, Connection

listen = ['high', 'default', 'low']

redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')

conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
This is as shown by the heroku documentation.
Then in my app.py:
from rq import Queue
from worker import conn
from datetime import datetime

q = Queue(connection=conn)


def myFunction():
    while True:
        for item in Users.query.all():
            if int(item.expiry) < (datetime.now().timestamp()):
                db.session.delete(item)
        db.session.commit()


if __name__ == "__main__":
    q.enqueue(myFunction)
    app.run()
My Procfile looks like so:
web: gunicorn app:app
worker: python worker.py
When I run this, expired records are not removed from the database. Is there any way I can solve this or diagnose the issue further?
The code that enqueues your task is inside the __name__ == "__main__" block, so it only runs when your script is executed directly, e.g. via python app.py. But you are running this on Heroku via the Procfile, which loads it as a module into gunicorn, so that code is never executed. You need to put it somewhere else.
Note, though, that I can't see any reason for using rq here at all. That's used for creating workers that dynamically run offline tasks when they are enqueued by your web processes. But you seem to want one function to run continuously; rq is irrelevant here, and you should just run that code directly via the Procfile.
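A minimal sketch of that approach, assuming app, db and Users are importable from app.py (the file name cleanup.py and the one-minute pause between sweeps are assumptions):
# cleanup.py -- run via a Procfile entry such as `worker: python cleanup.py`
import time
from datetime import datetime

from app import app, db, Users


def remove_expired():
    # database access needs an application context outside a request
    with app.app_context():
        for item in Users.query.all():
            if int(item.expiry) < datetime.now().timestamp():
                db.session.delete(item)
        db.session.commit()


if __name__ == '__main__':
    while True:
        remove_expired()
        time.sleep(60)  # pause between sweeps instead of spinning at 100% CPU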

Flask and Celery on Heroku: sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) SSL error: decryption failed or bad record mac

I'm trying to deploy a flask app on heroku that uses background tasks in Celery. I've implemented the application factory pattern so that the celery processes are not bound to any one instance of the flask app.
This works locally, and I have yet to see an error. But when deployed to heroku, the same results always occur: the celery task (I'm only using one) succeeds the first time it is run, but any subsequent celery calls to that task fail with sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) SSL error: decryption failed or bad record mac. If I restart the celery worker, the cycle continues.
There are multiple issues that show this same error, but none specify a proper solution. I initially believed implementing the application factory pattern would have prevented this error from manifesting, but it's not quite there.
In app/__init__.py I create the celery and db objects:
from celery import Celery
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

from config import config, Config  # the project's config module

celery = Celery(__name__, broker=Config.CELERY_BROKER_URL)
db = SQLAlchemy()

def create_app(config_name):
    app = Flask(__name__)
    app.config.from_object(config[config_name])
    db.init_app(app)
    return app
My flask_celery.py file creates the actual Flask app object:
import os
from app import celery, create_app
app = create_app(os.getenv('FLASK_CONFIG', 'default'))
app.app_context().push()
And I start celery with this command:
celery worker -A app.flask_celery.celery --loglevel=info
This is what the actual celery task looks like:
@celery.task()
def task_process_stuff(stuff_id):
    stuff = Stuff.query.get(stuff_id)
    stuff.processed = True
    db.session.add(stuff)
    db.session.commit()
    return stuff
Which is invoked by:
task_process_stuff.apply_async(args=[stuff.id], countdown=10)
Library Versions
Flask 0.12.2
SQLAlchemy 1.1.11
Flask-SQLAlchemy 2.2
Celery 4.0.2
The solution was to add db.engine.dispose() at the beginning of the task, disposing of all db connections before any work begins:
@celery.task()
def task_process_stuff(stuff_id):
    db.engine.dispose()
    stuff = Stuff.query.get(stuff_id)
    stuff.processed = True
    db.session.commit()
    return stuff
As I need this functionality across all of my tasks, I added it to task_prerun:
from celery.signals import task_prerun

@task_prerun.connect
def on_task_init(*args, **kwargs):
    db.engine.dispose()

Gunicorn: do not reload worker

Can I tell Gunicorn to fail when one of the workers fails to boot? I don't want Gunicorn to automatically handle and reload the worker for me; I want it to fail instead of trying to launch the worker again and again. Should I raise a specific exception in the master process, or send it some signal? Or can I provide a command-line argument when launching the master process?
I want to implement something like this logic in the worker:
if cond():
    sys.exit(1)
and then have all of Gunicorn stop, without relaunching this one worker.
So, the solution is to use Gunicorn hooks. There are a lot of them, but for this particular case you can use the worker_int hook.
An example of usage might be the following (a simplified version, launched with gunicorn app_module:app --config gunicorn_config.py); the content of gunicorn_config.py:
import sys

workers = 1
loglevel = 'debug'


def worker_int(worker):
    print('Exit because of worker failure')
    sys.exit(1)
And your worker code might be a simple Flask app, for example (content of app_module.py):
from flask import Flask

app = Flask(__name__)
Other useful hooks:
on_exit - before exiting gunicorn
pre_request - before a worker processes the request
on_starting - before the master process is initialized
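Those hooks live in the same config file and take the arguments Gunicorn documents for them; a minimal sketch:
# gunicorn_config.py (continued)

def on_starting(server):
    # runs in the master process, before it is initialized
    server.log.info("master is starting")

def pre_request(worker, req):
    # runs in the worker, just before it processes the request
    worker.log.debug("%s %s", req.method, req.path)

def on_exit(server):
    # runs just before Gunicorn exits
    server.log.info("shutting down")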
That's it!
