I have a Django web application and I have to train a Machine Learning model in a view.
This takes a long time, so PythonAnywhere does not allow it and kills the process when it reaches 300 seconds. Given that, I want to ask two questions.
Without Celery, django-background-tasks or something similar, my view containing the long-running process does not execute in order. But when I step through with a debugger, it works correctly. Apparently some lines of code run without waiting for each other when no debugger is attached. How can I fix this?
PythonAnywhere does not support Celery or other long-running task packages. They suggest django-background-tasks, but its documentation does not explain the usage clearly, so I could not integrate it. How can I integrate django-background-tasks?
Thank you.
I'm working on a Baseball Simulator app with Dash. It uses an SGD model to simulate gameplay between a lineup and a pitcher. The app (under construction) can be found here: https://capstone-baseball-simulator.herokuapp.com/ and the repo: https://github.com/c-fried/capstone_heroku
To summarize the question: I want to be able to run the lineup optimizer on the heroku server.
There are potentially two parts to this: 1. Running the actual function while avoiding timeout. & 2. Displaying the progress of the function as it's running.
There are several issues I'm having with solving this:
The function is expensive and cannot be completed before the 30-second timeout. (It takes several minutes to complete.)
For this, I attempted to follow these instructions (https://devcenter.heroku.com/articles/python-rq) by creating a worker.py (still in the repo), moving the function to the external .py...etc. I believe the problem was that the process was still taking too long and was therefore being terminated.
I'm (knowingly) using a global variable in the function which works when I run locally but does not work when deployed (for reasons I somewhat understand - workers don't share memory https://dash.plotly.com/sharing-data-between-callbacks)
I was using a global to be able to see live updates of what the function was doing as it ran. Again, it worked as a hack locally but doesn't work on the server. I don't know how else I can watch the progress of the function without some kind of global operation going on. I'd love a clever solution to this, but I can't think of one.
I'm not experienced with web apps, so thanks for the advice in advance.
A common approach to solving this problem is to:
Run the long calculation asynchronously, e.g. using a background service
On completion, put the result in a shared storage space, e.g. a redis cache or an S3 bucket
Check for updates using an Interval component or a Websocket component
I can recommend Celery for keeping track of the tasks.
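The three steps above can be sketched with stdlib stand-ins: a thread plays the role of the background worker (a Celery task in production), and a plain dict plays the role of the Redis cache that a Dash Interval callback would poll. The job name and store layout here are just illustrative assumptions, not the app's real API:

```python
import threading
import time

# Stand-in for Redis: a shared store the worker writes progress to.
progress_store = {}

def optimize_lineup(job_id, n_steps):
    """Long-running job; this would be a Celery task in a real deployment."""
    for step in range(1, n_steps + 1):
        time.sleep(0.01)  # pretend to do expensive work
        progress_store[job_id] = {"status": "running", "done": step, "total": n_steps}
    progress_store[job_id] = {"status": "complete", "done": n_steps, "total": n_steps}

# Kick off the job without blocking the request/callback thread.
worker = threading.Thread(target=optimize_lineup, args=("job-1", 5))
worker.start()

# What a Dash Interval callback would do on each tick: poll the store.
while progress_store.get("job-1", {}).get("status") != "complete":
    time.sleep(0.01)
worker.join()
print(progress_store["job-1"])  # {'status': 'complete', 'done': 5, 'total': 5}
```

Because the worker only ever writes to the shared store and the UI only ever reads from it, the two sides never block each other; swapping the dict for Redis (and the thread for a Celery task) keeps the same shape but works across processes.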
OK, so I'm working on a project that has two Heroku apps: one is the writer, which writes to my DB after scraping a site, and one is the reader, which consumes said DB.
The former is just a Python script with what is essentially a `while True` loop; it's actually a Twitter stream. I want this to run every x minutes, independent of what the reader is doing.
Now, running the script locally works fine, but I'm not sure how to get it working on Heroku. I've tried looking it up but could not find a solid answer. I read about background tasks, Redis queues, one-off dynos, etc., but I'm not sure what to use for my purpose. Some of my requirements are:
have the Python script keep logs of whatever I want.
in the future, I might want to add an admin panel for the writer, that will just show me stats of the script (and the logs). So hooking up this admin panel (flask) should be easy-ish and not break the script itself.
I would love any suggestions or pointers here.
I suggest writing the consumer as a server that waits around and then processes the stream at a timed interval. That is, you start it once and it runs forever, doing some processing every 10 minutes or so.
See: sched Python module, which handles scheduling events at certain times and running them.
Simpler: use Heroku's scheduler service.
This technique is simpler -- it's just straight-through code -- but can lead to problems if you have two of the same consumer running at the same time.
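The self-rescheduling server idea can be sketched with the stdlib sched module. A tiny interval and a fixed run count stand in for "every 10 minutes, forever", and `process_stream` is a placeholder for the real stream-processing work:

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
runs = []

def process_stream(interval, remaining):
    """One processing pass; re-schedules itself until `remaining` hits 0."""
    runs.append(time.time())
    if remaining > 1:
        scheduler.enter(interval, 1, process_stream, (interval, remaining - 1))

# In a real consumer the interval would be 600 (10 minutes) and it would
# re-schedule itself unconditionally; tiny values keep the example quick.
scheduler.enter(0.01, 1, process_stream, (0.01, 3))
scheduler.run()  # blocks until the event queue is empty
print(len(runs))  # 3
```

Because the function re-enters itself into the scheduler at the end of each pass, there is never more than one copy of the work in flight, which sidesteps the double-consumer problem mentioned below for straight-through cron-style code.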
I have searched the forums for my question, but either I'm using the wrong name for the thing I'm searching for, or the question is hard, which I really doubt.
I am developing a web app with a web interface written in one of the MVC frameworks like Django or even Flask. It will let users log in, identify each user's session, and allow them to make some settings. The app also needs to run a separate Python process (a script that is basically its own file) on the server, per session and per the settings made by the user. This process is quite long (it can take days to complete) and shouldn't affect the execution or performance of the MVC part of the app. Another point is that this process should run per user, so the basic usage model of such an app would be:
1. the user enters the site.
2. the user makes some settings which are mirrored to database.
3. the user pushes the launch button which executes some python script just for this user with the settings he has made.
4. the user is able to monitor some parameters of the script running based on some messages that the script itself generates.
I understand that my question relates to the architecture of the app itself. I'm quite new to Python and have no experience developing such a complex application, but I'm eager to learn. I understand the bricks from which my app should be built (like Django or Flask and the server-side script itself), but I know very little about how these elements should be glued together into a seamless environment. Please direct me to some articles on this topic, recommend some similar threads, or just give a clear high-level explanation of how such separate Python processes can be triggered, run, and monitored on a per-user basis from the controller part of MVC.
Celery is a great solution, but it can be overkill for many setups. If you just need tasks to run periodically (once an hour, once a day, etc.), then consider just using cron.
There's a lot less setup and it can get you quite far.
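As a sketch, a crontab entry that runs a script once an hour might look like the following (the script path and log path are hypothetical placeholders):

```
# m h dom mon dow command -- run at minute 0 of every hour
0 * * * * /usr/bin/python /home/app/run_task.py >> /var/log/run_task.log 2>&1
```

Redirecting stdout and stderr to a log file covers the "keep logs" requirement with no extra code in the script itself.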
Celery is the perfect solution for your purposes.
Celery can easily run long tasks, but you have to write the monitoring part yourself. That's simple: you can use the Django ORM from a Celery task.
Do not use the django-celery or flask-celery applications; they are deprecated.
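The monitoring idea can be sketched with stdlib sqlite3 standing in for the Django ORM. In a real project the task body would update a Django model (a hypothetical `TaskRun` with `status` and `progress` fields, say) via the ORM, and a view would read it back for the user; the table and field names here are illustrative assumptions:

```python
import sqlite3

# Stand-in for a Django model table such as TaskRun(status, progress).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task_run (id INTEGER PRIMARY KEY, status TEXT, progress INTEGER)")
db.execute("INSERT INTO task_run (status, progress) VALUES ('pending', 0)")

def long_task(run_id, steps):
    """Body of what would be a Celery task in a real project."""
    for i in range(1, steps + 1):
        # In Django: TaskRun.objects.filter(pk=run_id).update(status=..., progress=...)
        db.execute("UPDATE task_run SET status = 'running', progress = ? WHERE id = ?",
                   (100 * i // steps, run_id))
        db.commit()
    db.execute("UPDATE task_run SET status = 'done' WHERE id = ?", (run_id,))
    db.commit()

long_task(1, 4)  # in production this would be queued, e.g. long_task.delay(1, 4)
status, progress = db.execute(
    "SELECT status, progress FROM task_run WHERE id = 1").fetchone()
print(status, progress)  # done 100
```

Because the task commits after every step, the web view sees fresh progress on each request, which is exactly the per-user monitoring described in step 4 of the question.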
What on earth am I doing wrong?
I recently found an awesome Django template called django-skel. I started a project with it because it made it very easy to use Heroku with Django. It was all going great until I tried to get Celery working. No matter what I tried, I couldn't get my tasks to run. So I started a new bare-bones app just to see if I could get it working without any of my other craziness getting in the way.
This is my bare-bones app. I have it up and running on Heroku. The Django admin is working, and my databases are synced and migrated. I am using CloudAMQP Little Lemur for my RabbitMQ. I see the requests queued up in the RabbitMQ interface, but nothing happens. I queue up the tasks manually in the shell:
from herokutest.apps.otgcelery.tasks import add
result = add.delay(2,2)
I make sure that I have all 3 dynos running, and still nothing.
Also I have it working locally.
I am sure you'll have tons of questions, and I'm willing to answer them. Just please ask.
Thank you for everyone's help.
There were a couple of things I ended up doing wrong. The first was that I was importing the task incorrectly. All I had to do was:
from apps.otgcelery.tasks import add
result = add.delay(2,2)
Celery is very picky about how you import your tasks. The second issue was that the CloudAMQP free add-on does not work out of the box with django-skel. It limits your number of connections to three; each worker thread uses those connections up incredibly fast, and your tasks simply stop connecting. I fixed this in a couple of different ways. I tried another RabbitMQ add-on, BigWigs, and it worked great. However, because it was still in beta, I decided to try out Redis. That also worked great, and my tasks fire off as fast as I can call them.
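For connection-capped brokers like the CloudAMQP free tier, a commonly recommended tweak is to cap Celery's broker connection pool in settings.py (these are the old-style uppercase setting names from the Celery versions that django-skel targets; exact values are a suggestion, not gospel):

```python
# settings.py -- keep Celery within the broker's small connection limit
BROKER_POOL_LIMIT = 1           # hold at most one pooled broker connection
BROKER_CONNECTION_TIMEOUT = 30  # fail fast instead of hanging on connect
```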
Once again thank you everyone for your help.
This seems like a simple question, but I am having trouble finding the answer.
I am making a web app which would require the constant running of a task.
I'll use sites like Pingdom or Twitterfeed as an analogy. As you may know, Pingdom checks uptime, so it is constantly checking websites to see if they are up, and Twitterfeed checks RSS feeds to see if they've changed and then tweets the changes. I too need to run a simple script that cycles through URLs in a database and performs an action on them.
My question is: how should I implement this? I am familiar with cron, currently using it to do my server backups. Would this be the way to go?
I know how to make a Python script that runs indefinitely, starting back at the beginning with the next URL in the database when it's done. Should I just run that on the server? How will I know it is always running and hasn't crashed or something?
I hope this question makes sense and I hope I am not repeating someone else or anything.
Thank you,
Sam
Edit: To be clear, I need the task to run constantly. As in: check URL 1 in the database, check URL 2 in the database, check URL 3 and, when it reaches the last one, go right back to the beginning. Thanks!
If you need the task to run repeatedly and it can be run from the command line, that's what cron is ideal for.
I don't see any drawbacks to this approach.
Update:
Okay, I now see the issue somewhat differently. I see several solutions:
a) run the cron task at set intervals, letting it process one batch of data per run and pick up the next batch on the following run; use PIDs/database locks/semaphores to avoid parallel processes;
b) update the processes that insert/update data in the database, so the information is processed as it is inserted/updated;
c) write a daemon process that resides in memory and checks the data in real time.
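Option c) can be sketched as a simple resident loop over the URL table; `check_url` is a stand-in for whatever action is really performed on each URL, and the in-memory database and `max_cycles` cap exist only so the sketch terminates:

```python
import sqlite3
import time

# Stand-in URL table; in reality this is the app's existing database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE urls (url TEXT)")
db.executemany("INSERT INTO urls VALUES (?)",
               [("http://a.example",), ("http://b.example",), ("http://c.example",)])

checked = []

def check_url(url):
    """Stand-in for the real per-URL action (e.g. an uptime check)."""
    checked.append(url)

def run_forever(max_cycles=None):
    """Cycle through every URL, then start over; a real daemon passes None."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for (url,) in db.execute("SELECT url FROM urls"):
            check_url(url)
        cycles += 1
        time.sleep(0.01)  # small pause between full passes

run_forever(max_cycles=2)
print(len(checked))  # 6
```

To know it is "always running", such a daemon is normally put under a process supervisor (e.g. systemd or supervisord) that restarts it if it crashes, which addresses the questioner's worry directly.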
cron would definitely be a way to go with this, as well as any other task scheduler you may prefer.
The main point is found in the title to your question:
Run a repeating task for a web app
The background task and the web application should be kept separate. They can share code, they can share access to a database, but they should be separate and discrete application contexts. (Consider them as separate UIs accessing the same back-end logic.)
The main reason for this is because web applications and background processes are architecturally very different and aren't meant to be mixed. Consider the structure of a web application being held within a web server (Apache, IIS, etc.). When is the application "running"? When it is "on"? It's not really a running task. It's a service waiting for input (requests) to handle and generate output (responses) and then go back to waiting.
Web applications are for responding to requests. Scheduled tasks or daemon jobs are for running repeated processes in the background. Keeping the two separate will make your management of the two a lot easier.