Heroku database management from worker

Heroku database management from worker - python

Wondering if anyone can help me with this or at least guide me the correct way.
I currently have a web and a worker process running. I need a task to run 24/7 while the dynos are online, it's job is to access the database and remove records that have expired by checking the "expiry" value for each record against the current timestamp.
My worker.py file:
import os
import redis
from rq import Worker, Queue, Connection
listen = ['high', 'default', 'low']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)
if __name__ == '__main__':
with Connection(conn):
worker = Worker(map(Queue, listen))
worker.work()
This is as shown by the heroku documentation.
Then in my app.py:
from rq import Queue
from worker import conn
from datetime import datetime
q = Queue(connection=conn)
def myFunction():
while True:
for item in Users.query.all():
if int(item.expiry) < (datetime.now().timestamp()):
db.session.delete(item)
db.session.commit()
If __name__ == “__main__”:
q.enqueue(myFunction)
app.run()
My profile looks like so:
web: gunicorn app:app
worker: python worker.py
When I run this, expired records are not removed from the database. Is there anyway I can solve this or diagnose the issue further?

The code that enqueues your task is inside the __name__ == “__main__” block, so it only runs when your script is run directly - eg via python app.py. But you are running this on Heroku via the procfile, which loads it as a module into gunicorn - so that code is never executed. You need to put it somewhere else.
Note though I can't see any reason for using rq here at all. That's used for creating workers that dynamically run offline tasks when enqueued by your web processes. But you seem to want one function to run continuously; rq is irrelevant here, you should just run that code directly via the procfile.

Related

APScheduler does not run scheduled tasks: Flask + uWSGI

I have an application on a Flask and uWSGI with a jobstore in a SQLite. I start the scheduler along with the application, and add new tasks through add_task when some url is visited.
I see that the tasks are saved correctly in the jobstore, I can view them through the API, but it does not execute at the appointed time.
A few important data:
uwsgi.ini
processes = 1
enable-threads = true
__init__.py
scheduler = APScheduler()
scheduler.init_app(app)
with app.app_context():
scheduler.start()
main.py
scheduler.add_job(
id='{}{}'.format(test.id, g.user.id),
func = pay_day,
args = [test.id, g.user.id],
trigger ='interval',
minutes=test.timer
)
in service.py
def pay_day(tid, uid):
with scheduler.app.app_context():
*some code here*
Interesting behavior: if you create a task by going to the URL and restart the application after that, the task will be executed. But if the application is running and one of the users creates a task by going to the URL, then this task will not be completed until the application is restarted.
I don't get any errors or exceptions, even in the scheduler logs.
I already have no idea how to make it work and what I did wrong. I need a hint.

uWSGI employs some tricks which disable the Global Interpreter Lock and with it, the use of threads which are vital to the operation of APScheduler. To fix this, you need to re-enable the GIL using the --enable-threads switch. See the uWSGI documentation for more details.
I know that you had enable-threads = true in uwsgi.ini, but try the to enable it using the command line.

Background worker process crashing

I'm creating a web app that takes username as input and perform some tasks with it. I'm using flask for the web and heroku to deploy it.I've two main python scripts these run the app. app.py and task.py. Everything is okey into my app.py file, where i used flask code for making the app but into my task.py file I've a
Initialisation step
from instabot import Bot
bot = Bot()
bot.login(username="myusername", password="mypassword")
now I deployed these script into the Procfile like
worker: python task.py
web: gunicorn app:app
but after deploying when i check it's log I'm getting
logs
2022-01-28T18:30:06.376795+00:00
heroku[worker.1]: Starting process
with command `python task.py`
2022-01-
28T18:30:07.081356+00:00
heroku[worker.1]: State changed
from starting to up
2022-01-
28T18:30:08.213687+00:00
heroku[worker.1]: Process exited
with status 0
2022-01-
28T18:30:08.274498+00:00
heroku[worker.1]: State changed
from up to crashed
as you can see it's getting crashed!! i don't know what's my mistake is can you please help me to figure out : (

The code in my worker is almost stolen from the redis-rq documentation, with some changes to make it work also in our local development enviroment:
import os
from rq import Queue, Connection
from tools import get_redis_store
# we need to use a different worker when we are in Heroku
if 'DYNO' in os.environ:
from rq.worker import HerokuWorker as Worker
else:
from rq import Worker
redis_url = os.getenv('REDIS_URL')
if not redis_url:
raise RuntimeError('Set up Redis first.')
listen = ['default']
conn = get_redis_store()
if __name__ == '__main__':
with Connection(conn):
worker = Worker(map(Queue, listen))
worker.work()
On tools.py
_redis_store = None
def get_redis_store():
'''
Get a connection pool to redis based on the url configured
on env variable REDIS_URL
Returns
-------
redis.ConnectionPool
'''
global _redis_store
if not _redis_store:
redis_url = os.getenv('REDIS_URL')
if redis_url:
logger.debug('starting redis: %s ' % redis_url)
if redis_url.startswith('rediss://'):
# redis 6 encripted connection
# heroku needs disable ssl verification
_redis_store = redis.from_url(
redis_url, ssl_cert_reqs=None)
else:
_redis_store = redis.from_url(redis_url)
else:
logger.debug('redis not configured')
return _redis_store
On the main app, to queue a task:
from rq import Queue
from tools import get_redis_store
redis_store = get_redis_store()
queue = Queue(connection=redis_store)
def _slow_task(param1, param2, paramn):
# here I execute the slow task
my_slow_code(param1, param2)
# when I need to execute the slow task
queue.enqueue(_slow_task, param1, param2)
There are multiple options to use redis on Heroku, depending on what you use, you could need to change the env variable REDIS_URL. The environment variable DYNO is checked to know if the code is running on Heroku or in a developer machine.

Long running script from flask endpoint

I've been pulling my hair out trying to figure this one out, hoping someone else has already encountered this and knows how to solve it :)
I'm trying to build a very simple Flask endpoint that just needs to call a long running, blocking php script (think while true {...}). I've tried a few different methods to async launch the script, but the problem is my browser never actually receives the response back, even though the code for generating the response after running the script is executed.
I've tried using both multiprocessing and threading, neither seem to work:
# multiprocessing attempt
#app.route('/endpoint')
def endpoint():
def worker():
subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)
p = multiprocessing.Process(target=worker)
print '111111'
p.start()
print '222222'
return json.dumps({
'success': True
})
# threading attempt
#app.route('/endpoint')
def endpoint():
def thread_func():
subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)
t = threading.Thread(target=thread_func)
print '111111'
t.start()
print '222222'
return json.dumps({
'success': True
})
In both scenarios I see the 111111 and 222222, yet my browser still hangs on the response from the endpoint. I've tried p.daemon = True for the process, as well as p.terminate() but no luck. I had hoped launching a script with nohup in a different shell and separate processs/thread would just work, but somehow Flask or uWSGI is impacted by it.
Update
Since this does work locally on my Mac when I start my Flask app directly with python app.py and hit it directly without going through my Nginx proxy and uWSGI, I'm starting to believe it may not be the code itself that is having issues. And because my Nginx just forwards the request to uWSGI, I believe it may possibly be something there that's causing it.
Here is my ini configuration for the domain for uWSGI, which I'm running in emperor mode:
[uwsgi]
protocol = uwsgi
max-requests = 5000
chmod-socket = 660
master = True
vacuum = True
enable-threads = True
auto-procname = True
procname-prefix = michael-
chdir = /srv/www/mysite.com
module = app
callable = app
socket = /tmp/mysite.com.sock

This kind of stuff is the actual and probably main use case for Python Celery (https://docs.celeryproject.org/). As a general rule, do not run long-running jobs that are CPU-bound in the wsgi process. It's tricky, it's inefficient, and most important thing, it's more complicated than setting up an async task in a celery worker. If you want to just prototype you can set the broker to memory and not using an external server, or run a single-threaded redis on the very same machine.
This way you can launch the task, call task.result() which is blocking, but it blocks in an IO-bound fashion, or even better you can just return immediately by retrieving the task_id and build a second endpoint /result?task_id=<task_id> that checks if result is available:
result = AsyncResult(task_id, app=app)
if result.state == "SUCCESS":
return result.get()
else:
return result.state # or do something else depending on the state
This way you have a non-blocking wsgi app that does what is best suited for: short time CPU-unbound calls that have IO calls at most with OS-level scheduling, then you can rely directly to the wsgi server workers|processes|threads or whatever you need to scale the API in whatever wsgi-server like uwsgi, gunicorn, etc. for the 99% of workloads as celery scales horizontally by increasing the number of worker processes.

This approach works for me, it calls the timeout command (sleep 10s) in the command line and lets it work in the background. It returns the response immediately.
#app.route('/endpoint1')
def endpoint1():
subprocess.Popen('timeout 10', shell=True)
return 'success1'
However, not testing on WSGI server, but just locally.

Would it be enough to use a background task? Then you only need to import threading e.g.
import threading
import ....
def endpoint():
"""My endpoint."""
try:
t = BackgroundTasks()
t.start()
except RuntimeError as exception:
return f"An error occurred during endpoint: {exception}", 400
return "successful.", 200
return "successfully started.", 200
class BackgroundTasks(threading.Thread):
def run(self,*args,**kwargs):
...do long running stuff

Gunicorn do not reload worker

Can I tell the Gunicorn to fail when one of the workers failed to boot? I don't want gunicorn to automatically handle and reload the worker for me, but I want it to fail not trying to launch worker again and again. Should I raise any specific exception to master process or some signal? Or I can provide a command line argument when launching master process?
I want to implement smth like this logic in worker:
if cond():
sys.exit(1)
and then all the gunicorn to stop without relaunching this one worker

So, the solution is to use Gunicorn hooks. There are a lot of them but for this particular case you can use worker_int hook.
Example of usage might be the following (simplified version, launched with gunicorn app_module:app --config gunicorn_config.py), content of gunicorn_config.py:
import sys
workers = 1
loglevel = 'debug'
def worker_int(worker):
print('Exit because of worker failure')
sys.exit(1)
And you worker code might be a simple Flask app for example (content of app_module.py:
from flask import Flask
app = Flask()
Other useful hooks:
on_exit - before exiting gunicorn
pre_request - before a worker processes the request
on_starting - before the master process is initialized
That's it!

Should two modules use the same redis connection? (I'm working with Flask)

I'm building a Flask app that uses a Redis Queue. The code for the worker is:
listen = ['default']
#redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)
if __name__ == '__main__':
with Connection(conn):
worker = Worker(list(map(Queue, listen)))
worker.work()
And another module, app.py contains the code to handle Flask routes. My question is, should app.py create a new Redis connection as:
q = Queue(connection= redis.from_url(redis_url))
q.enqueue_call(func=mailers.send_message, kwargs=request.json, result_ttl=86400)
Or should app.py use
import conn from worker
And use that connection?

I'd say use a new connection unless you really have a good reason not to (although I can't imagine such reason)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Heroku database management from worker - python

Related

APScheduler does not run scheduled tasks: Flask + uWSGI

Background worker process crashing

Long running script from flask endpoint

Gunicorn do not reload worker

Should two modules use the same redis connection? (I'm working with Flask)

Categories

Resources