Server side python code runing continuosly per session - python

I have searched the forums for my question but im either searching for a thing naming it wrongly or the question is hard which i really doubt.
I am developing a web-app which would have an web-interface written in one of the MVC frameworks like django or even flask and allow user to login, will identify users session and allow to make some settings and also my app needs to run some python process(script which basically is a separate file) on the server on a per-session per-settings made by user basis. This process is quite long - can take even days to perform and shouldn't affect the execution and performance of MVC part of an app. Another issue is that this process should be run per user so the basic usage model of such app would be:
1. the user enters the site.
2. the user makes some settings which are mirrored to database.
3. the user pushes the launch button which executes some python script just for this user with the settings he has made.
4. the user is able to monitor some parameters of the script running based on some messages that the script itself generates.
I do understand that my question is related to the architecture of the app itself and i'm quite new to python and haven't had any experience of developing such complex application but I'm also quite eager to learn about it. I do understand the bricks from which my app should be built (like django or flask and the server-side script itself) but i know very little about how this elements should be glued together to create seamless environment. Please direct me to some articles related to this topic or recommend some similar threads or just give a clear high level explanation how such separate python processes could be triggered,run and monitored further on a per-user basis from controller part of MVC.

Celery is a great solution, but it can be overpowered for many setups. If you just need tasks to run periodically (once an hour, once a day, etc) then consider just using cron.
There's a lot less setup and it can get you quite far.

Celery is the perfect solution for you purposes.
Celery can easily run long tasks, but you have to write monitoring part. It's simple - you can use django-orm from a celery task.
Do not use django-celery or flask-celery applicattion - they are deprecated.

Related

How to do long running process in Django view?

I have a django web application and I have to create model for Machine Learning in a view.
It takes a long time so PythonAnyWhere does not allow it and it kills the process when it reaches 300 seconds. According to that, i want to ask two questions.
Without celery, django bg task or something else, my view which contains the long running process does not work in order. But when I use debugger, it works correctly. Probably, some lines of code try to work without waiting for each other without debugger. How can i fix this?
PythonAnyWhere does not support celery or other long running task packages. They suggest django-background-tasks but in its documentation, the usage is not explained clearly. So I could not integrate it. How can i integrate django-background-tasks?
Thank you.

Display progress of a long running Python task in Django

I currently have a typical Django structure set up for a project and one web application.
The web application is set up so that a user inputs some information, and this information is taken as the input to run a Python program.
This python program sometimes can take quite a while to finish (grabbing things from the web and doing some text mining scoring) - sometimes it can take multiple minutes to load.
On the command line, this program would periodically display where it was in the process (it'd first say how many things it found to score against, then it'd say where in the number of things found it is in the scoring process), which was very useful. However, when I moved this over to a Django set up, I no longer have this capability (at least, not in the same way since now this is sent to log files).
The way I set it up is that there is an input view, and then a results view. The results view takes the input and runs the Python program. It won't display the results until the entire program is run. So on the user side, the browser just sits there for sometimes minutes before the results are displayed. Obviously, this is not ideal.
Does anyone know of the best way to bring status information on a task to Django?
I've looked into Celery a little bit, but I think since I'm still a beginner in Django that I'm confusing myself with some of the documentation. For instance: even if the task is sent off asynchronously to a worker, how does the browser grab the current state of the program?? Also, consistent documentation seems to be lacking for celery on Django (I've seen people set up celery many different ways on their Django projects).
I would appreciate any input here, I've been stuck on this for a while now.
My first suggestion is to psychologically separate celery from django when you start to think of the two. They can run in the same environment, but celery is to asynchronous processes what django is to http requests.
Also remember that celery is unlike diango in that it requires other services to function; a message broker. So by using celery you will increase your architectural requirements.
To address you specific use case, you'll need a system to publish messages from each celery task to a message broker and your web client will need to subscribe to those messages.
There's a lot involved here, but the short version is that you can use Redis as your celery message broker as well as your pub/sub service to get messages back to the browser. You can then use e.g diango-redis-websockets to subscribe the browser to the task state messages in redis

Splitting a Django project

I have a django project with various apps, which are completely independent. I'd like to make them run each one in their own process, as some of them spawn background threads to precalculate periodically some data and now they are competing for the CPU (the machine has loads of cores, but you know, the GIL and such...)
So, is there an easy way to split automatically the project into different ones, or at least to make each app live in its own process?
You can always have different settings files, but that would be like having multiple projects and even multiple endpoints. With some effort you could configure a reverse proxy to forward to the right Django server, based on the request's path and so on, but I don't think that's what you want and it would be an ugly solution to your problem.
The solution to this is to move the heavy processing to a jobs queue. A lot of people and projects prefer Celery for this.
If that seems like overkill for some reason, you can always implement your own based on simple cron jobs. You can take a look at my small project that does this.
The simplest of the simple is probably to write a custom management command that observes given model (database table) for new entries and processes them. The model is written to by e.g. Django view and the management command is launched periodically from cron (e.g. every 5 minutes).
Example: user registers on the site, but the account creation is an expensive operation (allocating some space, pinging remote services etc.). Therefore you just write a new record to AccountRequest table (AccountRequest.objects.create(...)). Then, cron periodically launches your management script (./manage.py account_creator), which checks for new AccountRequest-s (AccountRequest.objects.filter(unprocessed=True)), does its job and marks those requests as processed.

Run a repeating task for a web app

This seems like a simple question, but I am having trouble finding the answer.
I am making a web app which would require the constant running of a task.
I'll use sites like Pingdom or Twitterfeed as an analogy. As you may know, Pingdom checks uptime, so is constantly checking websites to see if they are up and Twitterfeed checks RSS feeds to see if they;ve changed and then tweet that. I too need to run a simple script to cycle through URLs in a database and perform an action on them.
My question is: how should I implement this? I am familiar with cron, currently using it to do my server backups. Would this be the way to go?
I know how to make a Python script which runs indefinitely, starting back at the beginning with the next URL in the database when I'm done. Should I just run that on the server? How will I know it is always running and doesn't crash or something?
I hope this question makes sense and I hope I am not repeating someone else or anything.
Thank you,
Sam
Edit: To be clear, I need the task to run constantly. As in, check URL 1 in the database, check URl 2 in the database, check URL 3 and, when it reaches the last one, go right back to the beginning. Thanks!
If you need a repeatable running of the task which can be run from command line - that's what the cron is ideal for.
I don't see any demerits of this approach.
Update:
Okay, I saw the issue somewhat different. Now I see several solutions:
run the cron task at set intervals, let it process the data once per run, next time it will process the data on another run; use PIDs/Database/semaphores to avoid parallel processes;
update the processes that insert/update data in the database; let the information be processed when it is inserted/updated; c)
write a demon process which will reside in memory and check the data in real time.
cron would definitely be a way to go with this, as well as any other task scheduler you may prefer.
The main point is found in the title to your question:
Run a repeating task for a web app
The background task and the web application should be kept separate. They can share code, they can share access to a database, but they should be separate and discrete application contexts. (Consider them as separate UIs accessing the same back-end logic.)
The main reason for this is because web applications and background processes are architecturally very different and aren't meant to be mixed. Consider the structure of a web application being held within a web server (Apache, IIS, etc.). When is the application "running"? When it is "on"? It's not really a running task. It's a service waiting for input (requests) to handle and generate output (responses) and then go back to waiting.
Web applications are for responding to requests. Scheduled tasks or daemon jobs are for running repeated processes in the background. Keeping the two separate will make your management of the two a lot easier.

Backend processing for Django

I'm working on a turn-based web game that will perform all world updates (player orders, physics, scripted events, etc.) on the server. For now, I could simply update the world in a web request callback. Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
So what is the best way to separate the load from the web server, ideally in a way that could even be run on a separate machine?
A simple python module with infinite loop?
A distributed task in something like Celery?
Some sort of cross-platform Cron scheduler?
Some other fancy Django feature or third-party library that I don't know about?
I also want to minimize code duplication by using the same model layer. That probably means my service would need access to the Django model code, so that definitely determines how I architect the service.
I think Celery, which you mention in your question, is the way to go here. It will interface nicely with the rest of your setup, support your eventual aim of separating out the systems, and is compatible with Django.
I'd just write the backend to just use the Django database interface (look at the setup code in your manage.py), spawn it as its own process, and interface to it with Protocol Buffers. That route should move to a separate machine with little work. MPI may be an option, too.
Pipes, FIFOs, and most other IPC requires both processes to be on the same box.
Though I have to point out a flaw in your premise:
Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
If you run concurrent games, so long as you keep all the parts for a given game on the same server, this is a non-issue, unless there's a common resource needed by all games. Then the real issue becomes load balancing across the servers.

Categories