subprocess in views.py does not work

subprocess in views.py does not work - python

I need a function that starts several beanstalk workers before starting to record some videos with different cameras. All of these work in beanstalk. As I need to start the workers before the video record, I want to do a subprocess, but this does not work. The most curious thing is that if I run the subprocess alone in a different python script outside of this function (in the shell), this works! This is my code (the one which is not working):
os.chdir(path_to_the_manage.py)
subprocess.call("python manage.py beanstalk_worker -w 4",shell=True)
phase = get_object_or_404(Phase, pk=int(phase_id))
cameras = Video.objects.filter(phase=phase)
###########################################################################
## BEANSTALK
###########################################################################
num_workers = 4
time_to_run = 86400
[...]
for camera in cameras:
arg = phase_id +' '+settings.PATH_ORIGIN_VIDEOS +' '+camera.name
beanstalk_client.call('video.startvlcserver', arg=arg, ttr=time_to_run)
I want to include the subprocess because it's annoying to me if I have to run the beanstalk workers on each video record I wanna do.
Thanks in advance.

I am not quite sure that subproccess.call is what you are looking for. I believe the issue is subproccess call is syncronous. It doesn't spawn a new process but calls the command within the context of the web request. This ties up resources and if the request times out or the user cancels, weird things could happen?
I have never used beanstalkd, but with celery (another job queue) the celeryd worker process is always running, waiting for jobs. This makes it easy to manage using supervisord. If you look at beanstalkd deployment, I wouldn't be suprised if they recommend doing the same thing. This should include starting your beanstalk workers outside of the context of a view.
From the command line
python manage.py beanstalk_worker -w 4
Once your beanstalkd workers are set up and running, you can send jobs to the queue using an async beanstalk api call, from your view
https://groups.google.com/forum/#!topic/django-users/Vyho8TFew2I

Related

How can I spawn a long running python process from a python script served by Apache running on ubuntu?

Everything I've found so far is good ONLY IF if it's not being used in a web environment.
I have a simple python 'admin tool' web app running in Apache, and I'm using the simple configuration in the vhost :
AddHandler cgi-script .py.
There's no need for high performance, there's no python framework. Users select certain criteria which is sent via a GET request to the server & a python script then gets called by apache and the report is compiled. Problem is that the report complexity is growing, and the length of time it takes to compile this report can be > 20 minutes so I want to spawn a new totally independent process to generate the report asynchronously, allowing the server to return quickly to the client.
My issue is that I've tried absolutely everything I can find to be able to do this, and everything fails in the same way; when I spawn a new process the parent script will not return to the client until the new process has completed.
All of these do exactly the same - they start a new process BUT the main script waits for the child process to finish before returning to the client. I want the main script to return straight back to the client, and leave the newly spawned process running to compile the report.
os.system("""
cd /the_directory
python -m web.wait &
""")
command = ['python', '/the_directory/wait.py &']
Popen(command, shell=True, start_new_session=True)
L = ['python', '/the_directory/wait.py']
os.spawnvp(os.P_NOWAIT, 'python', L)
L = ['test.sh']
os.spawnvpe(os.P_NOWAIT, '/the_directory/test.sh', L, os.environ)
where test.sh contains:
#!/bin/sh
/the_directory/wait.py &
jobs = []
for i in range(5):
p = multiprocessing.Process(target=wait.dostuff, )
jobs.append(p)
p.start()
I want the simplest possible solution to this - it's an admin tool maintained only by me for a very few users.
If there's really no way to accomplish this then I guess I will have to use cron running once a minute and use a simple queue, maybe with redis or something, but this is not ideal.
I used to do this kind of thing relatively easily in php, I can't believe it's not possible in python.

You could use Popen with shell=False
example:
Popen(["bash", "test.sh"], shell=False)

Not able to see the output messages of the tasks performed in celery-rabbitMQ combination

I am trying out celery and rabbitMQ combinations for asynchronous task scheduling, below is the sample program I tried on Pycharm IDE....Everything seems to be working but I am not able to see the return value from the tasks.
I am also monitoring the rabbitMQ management console , but still not able to see the return value from the tasks.
I dont understand where I am going wrong,this is my first attempt at celery and RabbitMQ
I have created a tasks.py file with 2 sample tasks(with proper decorators assigned) and returning a value for each task.
I have also started teh RabbitMQ server (using {rabbitmq-server start} command).
Then I have started the celery worker, command used : {celery -A tasks --loglevel=info}
Now, when I am trying to execute these tasks using delay() method, the command ({reverse.delay('andy')}) is running and I am getting , something like this, but I am not able to see the returned value.
from celery import Celery
app = Celery('tasks', broker= 'amqp://localhost//', backend='rpc://')
#app.task
def reverse(string):
return string[::-1]
#app.task
def add(x, y):
return x + y

Well, I have figured out the issue... It seams the latest versions of celery don't go well with windows. To fix this issue, I have installed 'eventlet' package and it takes care of the rest.
One thing to note is that, we need to start the celery worker using eventlet support, PFB the command:
celery -A <module_name> worker -loglevel=info -P eventlet

You can check the return value from task by using the result property. The syntax below is in python3 with type hinting so that you can follow along with what is going on.
import time
from celery.result import AsyncResult
result: AsyncResult = reverse.delay('andy')
task_id: str = result.id
while not result.ready():
time.sleep(1)
result = AsyncResult(task_id)
if result.successful():
return_value = result.result
n.b., the above is a naive example because in a production environment, you typically don't busy wait like we did here. Instead the client issues periodic polls checking whether the status is ready and, if not, returns control to some other portion of the code (like a UI with a spinner).

Celery worker stops after being idle for a few hours

I have a Flask app that uses WSGI. For a few tasks I'm planning to use Celery with RabbitMQ. But as the title says, I am facing an issue where the Celery tasks run for a few minutes and then after a long time of inactivity it just dies off.
Celery config:
CELERY_BROKER_URL='amqp://guest:guest#localhost:5672//'
BROKER_HEARTBEAT = 10
BROKER_HEARTBEAT_CHECKRATE = 2.0
BROKER_POOL_LIMIT = None
From this question, I added BROKER_HEARTBEAT and BROKER_HEARTBEAT_CHECKRATE.
I run the worker inside the venv with celery -A acmeapp.celery worker & to run it in the background. And while checking the status, for the first few minutes, it shows that one node is online and gives an OK response. But after a few hours of the app being idle, when I check the Celery status, it shows Error: No nodes replied within time constraint..
I am new to Celery and I don't know what to do now.

Your Celery worker might be trying to reconnect to the app until it reaches the retry limit. If that is the case, setting up this options in your config file will fix that problem.
BROKER_CONNECTION_RETRY = True
BROKER_CONNECTION_MAX_RETRIES = 0
The first line will make it retry whenever it fails, and the second one will disable the retry limit.
If that solution does not suit you enough, you can also try a high timeout (specified in seconds) for your app using this option:
BROKER_CONNECTION_TIMEOUT = 120
Hope it helps!

How to schedule a single-time event in Django in Heroku?

I have a question about the software design necessary to schedule an event that is going to be triggered once in the future in Heroku's distributed environment.
I believe it's better to write what I want to achieve, but I have certainly done my research and could not figure it out myself even after two hours of work.
Let's say in my views.py I have a function:
def after_6_hours():
print('6 hours passed.')
def create_game():
print('Game created')
# of course time will be error, but that's just an example
scheduler.do(after_6_hours, time=now + 6)
so what I want to achieve is to be able to run after_6_hours function exactly 6 hours after create_game has been invoked. Now, as you can see, this function is defined out of the usual clock.py or task.py or etc etc files.
Now, how can I have my whole application running in Heroku, all the time, and be able to add this job into the queue of this imaginary-for-now-scheduler library?
On a side note, I can't use Temporizer add-on of Heroku. The combination of APScheduler and Python rq looked promising, but examples are trivial, all scheduled on the same file within clock.py, and I just simply don't know how to tie everything together with the setup I have. Thanks in advance!

In Heroku you can have your Django application running in a Web Dyno, which will be responsible to serve your application and also to schedule the tasks.
For example (Please note that I did not test run the code):
Create after_hours.py, which will have the function you are going to schedule (note that we are going to use the same source code in worker too).
def after_6_hours():
print('6 hours passed.')
in your views.py using rq (note that rq alone is not enough in your situation as you have to schedule the task) and rq-scheduler:
from redis import Redis
from rq_scheduler import Scheduler
from datetime import timedelta
from after_hours import after_6_hours
def create_game():
print('Game created')
scheduler = Scheduler(connection=Redis()) # Get a scheduler for the "default" queue
scheduler.enqueue_in(timedelta(hours=6), after_6_hours) #schedules the job to run 6 hours later.
Calling create_game() should schedule after_6_hours() to run 6 hours later.
Hint: You can provision Redis in Heroku using Redis To Go add-on.
Next step is to run rqscheduler tool, which polls Redis every minute to see if there is any job to be executed at that time and places it in the queue(to which rq workers will be listening to).
Now, in a Worker Dyno create a file after_hours.py
def after_6_hours():
print('6 hours passed.')
#Better return something
And create another file worker.py:
import os
import redis
from rq import Worker, Queue, Connection
from after_hours import after_6_hours
listen = ['high', 'default', 'low'] # while scheduling the task in views.py we sent it to default
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)
if __name__ == '__main__':
with Connection(conn):
worker = Worker(map(Queue, listen))
worker.work()
and run this worker.py
python worker.py
That should run the scheduled task(afer_6_hours in this case) in Worker Dyno.
Please note that the key here is to make the same source code (after_hours.py in this case) available to worker too. The same is emphasized in rq docs
Make sure that the worker and the work generator share exactly the
same source code.
If it helps, there is a hint in the docs to deal with different code bases.
For cases where the web process doesn't have access to the source code
running in the worker (i.e. code base X invokes a delayed function
from code base Y), you can pass the function as a string reference,
too.
q = Queue('low', connection=redis_conn)
q.enqueue('my_package.my_module.my_func', 3, 4)
Hopefully rq-scheduler too respects this way of passing string instead of function object.
You can use any module/scheduling tool (Celery/RabbitMQ, APScheduler etc) as long as you understand this thing.

Web2Py - configure a scheduler

I have an application written in Web2Py that contains some modules. I need to call some functions out of a module on a periodic basis, say once daily. I have been trying to get a scheduler working for that purpose but am not sure how to get it working properly. I have referred to this and this to get started.
I have got a scheduler.py class in the models directory, which contains code like this:
from gluon.scheduler import Scheduler
from Module1 import Module1
def daily_task():
module1 = Module1()
module1.action1(arg1, arg2, arg3)
daily_task_scheduler = Scheduler(db, tasks=dict(my_daily_task=daily_task))
In default.py I have following code for the scheduler:
def daily_periodic_task():
daily_task_scheduler.queue_task('daily_running_task', repeats=0, period=60)
[for testing I am running it after 60 seconds, otherwise for daily I plan to use period=86400]
In my Module1.py class, I have this kind of code:
def action1(self, arg1, arg2, arg3):
for row in db().select(db.table1.ALL):
row.processed = 'processed'
row.update_record()
One of the issues I am facing is that I don't understand clearly how to make this scheduler work to automatically handle the execution of action1 on daily basis.
When I launch my application using syntax similar to: python web2py.py -K my_app it shows this in the console:
web2py Web Framework
Created by Massimo Di Pierro, Copyright 2007-2015
Version 2.11.2-stable+timestamp.2015.05.30.16.33.24
Database drivers available: sqlite3, imaplib, pyodbc, pymysql, pg8000
starting single-scheduler for "my_app"...
However, when I see the browser at:
http://127.0.0.1:8000/my_app/default/daily_periodic_task
I just see "None" as text displayed on the screen and I don't see any changes produced by the scheduled task in my database table.
While when I see the browser at:
http://127.0.0.1:8000/my_app/default/index
I get an error stating This web page is not available, basically indicating my application never got started.
When I start my application normally using python web2py.py my application loads fine but I don't see any changes produced by the scheduled task in my database table.
I am unable to figure out what I am doing wrong here and how to properly use the scheduler with Web2Py. Basically, I need to know how can I start my application normally alongwith the scheduled tasks properly running in background.
Any help in this regard would be highly appreciated.

Running python web2py.py starts the built-in web server, enabling web2py to respond to HTTP requests (i.e., serving web pages to a browser). This has nothing to do with the scheduler and will not result in any scheduled tasks being run.
To run scheduled tasks, you must start one or more background workers via:
python web2py.py -K myapp
The above does not start the built-in web server and therefore does not enable you to visit web pages. It simply starts a worker process that will be available to execute scheduled tasks.
Also, note that the above does not actually result in any tasks being scheduled. To schedule a task, you must insert a record in the db.scheduler_task table, which you can do via any of the usual methods of inserting records (including using appadmin) or programmatically via the scheduler.queue_task method (which is what you use in your daily_periodic_task action).
Note, you can simultaneously start the built-in web server and a scheduler worker process via:
python web2py.py -a yourpassword -K myapp -X
So, to schedule a daily task and have it actually executed, you need to (a) start a scheduler worker and (b) schedule the task. You can schedule the task by visiting your daily_periodic_task action, but note that you only need to visit that action once, as once the task has been scheduled, it remains in effect indefinitely (given that you have set repeats=0).
If the task does not appear to be working, it is possible there is something wrong with the task itself that is resulting in an error.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.