I am using CherryPy to create an application that takes user input, manipulates that data, and essentially executes a long-running script. When all that is done, it displays a new page. My problem is that by the time my script finishes executing, the browser loses the connection and displays
"The page at myexample.com isn't working" or "No data received", even though the whole script doesn't take more than a minute to execute. Any leads on how to go about this would be appreciated.
CherryPy is a multi-threaded Python web server. Because of the Python GIL, you cannot run a long-running script while answering a request: it will make CherryPy unresponsive to any new user while your script is running.
You need to run your long-running script in a separate Python process. The best way to do this is with a queue manager like Celery or RQ.
Check this answer for a detailed example of how to do this with CherryPy.
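As a starting point, here is a minimal sketch of the pattern with Celery, assuming a local Redis broker; the task body and all names are placeholders:

    # tasks.py -- the worker side; start it with: celery -A tasks worker
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')  # assumes a local Redis broker

    @app.task
    def run_long_job(params):
        # ... the slow processing goes here ...
        return 'done'

    # server.py -- the CherryPy side enqueues the job and returns at once
    import cherrypy
    from tasks import run_long_job

    class App(object):
        @cherrypy.expose
        def submit(self, **params):
            result = run_long_job.delay(params)  # returns immediately; work happens in the worker
            return "Job %s queued" % result.id

    cherrypy.quickstart(App())

The request handler returns in milliseconds; the browser never has to wait on the slow job, and you can poll for the result later.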
Related
OK, so I'm working on an app that consists of two Heroku apps: one is the writer, which writes to my DB after scraping a site, and one is the reader, which consumes said DB.
The former is just a Python script with a kind of while 1 loop; it's actually a Twitter stream. I want this to run every x minutes, independent of what the reader is doing.
Now, running the script locally works fine, but I'm not sure how to get this working on Heroku. I've tried looking it up but could not find a solid answer. I read about background tasks, Redis Queue, one-off dynos, etc., but I'm not sure what to use for my purpose. Some of my requirements are:
have the Python script keep logs of whatever I want.
in the future, I might want to add an admin panel for the writer that will just show me stats of the script (and the logs). Hooking up this admin panel (Flask) should be easy-ish and not break the script itself.
I would love any suggestions or pointers here.
I suggest writing the consumer as a server that waits around and then processes the stream on a timed interval. That is, you start it once and it runs forever, doing some processing every 10 minutes or so.
See the sched Python module, which handles scheduling events at certain times and running them.
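A minimal sketch with sched, assuming a 10-minute interval; the body of process_stream is a placeholder:

    import sched, time

    scheduler = sched.scheduler(time.time, time.sleep)

    def process_stream():
        # ... scrape the site / pull from the Twitter stream here ...
        print("processing at " + time.ctime())
        scheduler.enter(600, 1, process_stream, ())  # re-schedule 10 minutes out

    scheduler.enter(600, 1, process_stream, ())
    scheduler.run()  # blocks forever, firing the task every 10 minutes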
Simpler: use Heroku's scheduler service.
This technique is simpler -- it's just straight-through code -- but can lead to problems if you have two of the same consumer running at the same time.
Before you go any further: I am currently working in a very restricted environment. Installing additional DLLs/EXEs and other admin-like activities are frustratingly difficult. I am fully aware that some of the methodology described in this post is far from best practice...
I would like to start a long-running background process that starts/stops with Apache. I have a CGI-enabled Python script that takes as input all of the parameters necessary to run a complex "job". It is not feasible to run this job in the CGI script itself, because a) CGI is already slow to begin with, and b) multiple simultaneous requests would definitely cause trouble. The CGI script will do nothing more than enter the parameters into a "jobs" database.
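For illustration, a minimal sketch of that CGI script, assuming a SQLite "jobs" table; the path and schema are placeholders:

    import cgi, sqlite3

    form = cgi.FieldStorage()
    conn = sqlite3.connect(r'C:\data\jobs.db')  # hypothetical path
    conn.execute('CREATE TABLE IF NOT EXISTS jobs (params TEXT, status TEXT)')
    conn.execute('INSERT INTO jobs (params, status) VALUES (?, ?)',
                 (form.getvalue('params'), 'pending'))
    conn.commit()
    conn.close()
    print('Content-Type: text/plain')
    print('')
    print('Job queued')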
Normally, I would set something up like MSMQ in conjunction with a Windows service. I would have a web service add a job to the queue, and the Windows service would poll the queue at some standard interval, processing jobs in sequence...
How could I accomplish the same in Apache? I can easily enough create a python script to serve as the background job processor. My questions are:
how do I start the process with Apache, leave it running alongside Apache, and stop it when Apache stops?
how can I monitor the process and make sure it stays alive along with Apache?
Any tips or insight welcome.
Note: the OS is Windows Server 2008.
Here's a pretty hacky solution for anyone looking to do something similar.
Set up a Windows scheduled task that does the background processing. Set it to run once a day or at whatever interval you want (it is irrelevant, as you'll see in the next steps).
In the Settings tab of the scheduled task, make sure the "Allow task to be run on demand" option is checked. Also, under the "If the task is already running..." text, make sure the "Do not start a new instance" option is selected.
Then, from the CGI script, it is possible to invoke the scheduled task from the command line (via the subprocess module); see here. With the options set above, if the task is already running, any subsequent on-demand runs are ignored.
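A minimal sketch of that invocation, assuming a scheduled task named "JobProcessor" (the name is hypothetical):

    import subprocess

    # fire an on-demand run; ignored if the task is already running (per the settings above)
    subprocess.call(['schtasks', '/Run', '/TN', 'JobProcessor'])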
This seems like a simple question, but I am having trouble finding the answer.
I am making a web app that requires a task to run constantly.
I'll use sites like Pingdom or Twitterfeed as an analogy. As you may know, Pingdom checks uptime, so it is constantly checking websites to see if they are up, and Twitterfeed checks RSS feeds to see if they've changed and then tweets the changes. I too need to run a simple script that cycles through URLs in a database and performs an action on them.
My question is: how should I implement this? I am familiar with cron, currently using it to do my server backups. Would this be the way to go?
I know how to make a Python script which runs indefinitely, starting back at the beginning with the next URL in the database when I'm done. Should I just run that on the server? How will I know it is always running and doesn't crash or something?
I hope this question makes sense and I hope I am not repeating someone else or anything.
Thank you,
Sam
Edit: To be clear, I need the task to run constantly. As in, check URL 1 in the database, check URL 2 in the database, check URL 3 and, when it reaches the last one, go right back to the beginning. Thanks!
If you need repeatable running of a task that can be run from the command line, that's exactly what cron is ideal for.
I don't see any drawbacks to this approach.
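For example, a crontab entry along these lines (the paths are hypothetical) would run the checker every five minutes:

    */5 * * * * /usr/bin/python /path/to/check_urls.py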
Update:
Okay, I saw the issue somewhat differently at first. Now I see several solutions:
a) run the cron task at set intervals, letting it process one batch of data per run and pick up the next batch on the following run; use PIDs/a database/semaphores to avoid parallel processes (see the sketch after this list);
b) update the processes that insert/update data in the database, so the information is processed at the moment it is inserted/updated;
c) write a daemon process that resides in memory and checks the data in real time.
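Here is a minimal PID-file sketch for option (a), so overlapping cron runs exit early; the path and the process_urls function are hypothetical:

    import os, sys

    PIDFILE = '/tmp/url_checker.pid'  # hypothetical path

    if os.path.exists(PIDFILE):
        sys.exit(0)  # a previous run is still going; skip this one

    with open(PIDFILE, 'w') as f:
        f.write(str(os.getpid()))
    try:
        process_urls()  # hypothetical: the per-run batch of URL checks
    finally:
        os.remove(PIDFILE)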
cron would definitely be a way to go with this, as well as any other task scheduler you may prefer.
The main point is found in the title to your question:
Run a repeating task for a web app
The background task and the web application should be kept separate. They can share code, they can share access to a database, but they should be separate and discrete application contexts. (Consider them as separate UIs accessing the same back-end logic.)
The main reason for this is because web applications and background processes are architecturally very different and aren't meant to be mixed. Consider the structure of a web application being held within a web server (Apache, IIS, etc.). When is the application "running"? When it is "on"? It's not really a running task. It's a service waiting for input (requests) to handle and generate output (responses) and then go back to waiting.
Web applications are for responding to requests. Scheduled tasks or daemon jobs are for running repeated processes in the background. Keeping the two separate will make your management of the two a lot easier.
Here is my scenario.
I have an Ajax call in my web site to find the elevation at a particular point. Once this point reaches a controller action in Ruby on Rails, I have to use Python on the command line to find the elevation.
The following sequence of commands in DOS does that for me:
python (starts a python session)
import arcpy (takes a lot of time)
function call (very fast).
Now if I put this into a script and run it, I do get the result, but it's very slow, because the import step takes a lot of time while the actual function takes less than a second.
As all this is supposed to happen behind an Ajax call on a RoR web site, such a large delay is unacceptable.
Question:
Is it possible in RoR to open a 'command line session' when the application loads and issue the first two commands, and then use this session every time a request comes into a controller's action, issuing the third command and returning its output?
If yes can someone please post some samples?
Thanks
Shaunak
What you are proposing could be possible if Rails were friendlier about forked processes. A cleaner and better solution would be to write a Python daemon that you can query, so that you don't incur the startup penalty. (This could be a web service, or a daemon you talk to over standard network sockets, or whatever.)
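A minimal sketch of the socket-daemon idea, paying the arcpy import cost once at startup; the port, the wire format, and find_elevation are placeholders:

    import socket
    import arcpy  # the slow import happens exactly once, when the daemon starts

    def find_elevation(point):
        # hypothetical: the fast arcpy-backed lookup for an "x,y" string
        return 0.0

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 9999))  # hypothetical port
    server.listen(5)

    while True:
        conn, _ = server.accept()
        point = conn.recv(1024).decode()                   # e.g. "x,y" sent by the Rails action
        conn.sendall(str(find_elevation(point)).encode())  # replies in well under a second
        conn.close()

The Rails controller then just opens a socket to 127.0.0.1:9999 per request, which is cheap, instead of paying the import on every call.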
I'm trying to understand threading better. If I create a program that allows people to upload a photo and then I spawn a new process to resize the image in a hundred ways (taking 5 seconds or longer), and the main program returns an HTML response page to the user saying "Thanks. You're done!", can the other process still be working at that point? Assume that I'm using the multiprocessing module, as opposed to GIL-bound threading or subprocess.
Considering a message or job queue is a good idea when you have background processing to do. This way you won't have to write your own code to handle job scheduling, priority, etc. You can also add more servers to the queue when the first one starts running out of capacity. There's a package called Celery that provides MQ access to Django apps.
In your case, you might create an animated 'throbber' on your page that periodically polls the server through Ajax. When the image processing is done, you can update the page.
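As to the immediate question: yes, a process started with multiprocessing keeps running after the response has been sent. A minimal sketch, where resize_all and handle_upload are hypothetical names:

    from multiprocessing import Process

    def resize_all(path):
        # ... generate the hundred resized variants here; takes 5+ seconds ...
        pass

    def handle_upload(path):
        Process(target=resize_all, args=(path,)).start()  # work continues in a separate process
        return "Thanks. You're done!"                     # the response can go out immediately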