Avoid Python CGI browser timeout - python

I have a Python CGI script that I use along with SQLAlchemy to get data from a database, process it, and return the result in JSON format to my webpage.
The problem is that this process takes about 2 minutes to complete, and the browser times out after 20 or 30 seconds of script execution.
Is there a way in Python (maybe a library?) or a design idea that would let the script run to completion?
Thanks!

You will have to adjust the timeout in the HTTP server's configuration (Apache, for example). The default should be more than 120 seconds, if I remember correctly.
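For instance, a minimal sketch of the relevant directive in httpd.conf, assuming Apache is the server (the value shown is an example, not a recommendation; Timeout governs how long Apache will wait on the script's output):

# httpd.conf (or the relevant vhost config), value in seconds
Timeout 300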

Related

Run python job every x minutes

I have a small Python script that basically connects to a SQL Server (Microsoft) database, gets users from there, and then syncs them to another MySQL database. Basically I'm just running queries to check if the user exists and, if not, adding that user to the MySQL database.
The script usually takes around 1 minute to sync. I need the script to do its work exactly once every 5 minutes (for example): one sync per 5 minutes.
What would be the best way to go about building this?
I have some test data for the users, but on the real site there are a lot more users, so I can't guarantee the script takes 1 minute to execute; it might even take 20 minutes. However, having an interval of, say, 15 minutes between executions would be ideal for the problem...
Update:
I have the connection params for the SQL Server (Windows) db, so I'm using a small Ubuntu server to sync between the two databases, which are located on different servers. So let's say db1 (Windows) and db2 (Linux) are the database servers; I'm using s1 (the Python server) with the pymssql and MySQL Python modules to sync.
Regards
I am not sure cron is right for the job. It seems to me that if you have it run every 15 minutes, but sometimes a sync takes 20 minutes, you could have multiple processes running at once that collide.
If the driving force is a constant wait time between the variable execution times, then you might need a continuously running process with a wait.
import time

def main():
    loopInt = 0
    while loopInt < 10000:
        synchDatabase()
        loopInt += 1
        print("call #" + str(loopInt))
        time.sleep(300)  # sleep 5 minutes

main()
(Obviously not continuous, but long-running.) You can change the while condition to True and it will be continuous (and comment out loopInt += 1).
Edited to add: please see the note in the comments about monitoring the process, as you don't want the script to hang or crash without you being aware of it.
You might want to use a system that handles queues, for example RabbitMQ, and use Celery as the python interface to implement it. With Celery, you can add tasks (like execution of a script) to a queue or run a schedule that'll perform a task after a given interval (just like cron).
Get started: http://celery.readthedocs.org/en/latest/
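For illustration, a minimal sketch of a periodic Celery task; the module name, task name, and broker URL are assumptions, and the schedule configuration key differs between Celery versions:

# tasks.py - a hedged sketch, not a drop-in solution
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def sync_users():
    # put the existing SQL Server -> MySQL sync logic here
    pass

# Celery 4+ style; older versions use the CELERYBEAT_SCHEDULE setting instead
app.conf.beat_schedule = {
    'sync-every-5-minutes': {
        'task': 'tasks.sync_users',
        'schedule': 300.0,  # seconds
    },
}

Run a worker with "celery -A tasks worker -c 1" (a single process, so syncs never overlap) and the scheduler with "celery -A tasks beat"; beat enqueues exactly one task per interval.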

Python watch-dog script : load url asynchronously

I have a simple Python script which checks a few URLs:
f = urllib2.urlopen(urllib2.Request(url))
As I have the socket timeout set to 5 seconds, it is sometimes bothersome to wait 5 sec * number of URLs for the results.
Is there any easy, standardized way to run those URL checks asynchronously without big overhead? The script must use standard Python components on a vanilla Ubuntu distribution (no additional installations).
Any ideas?
I wrote something called multibench a long time ago. I used it for almost the same thing you want to do here, which was to call multiple concurrent instances of wget and see how long it takes to complete. It is a crude load testing and performance monitoring tool. You will need to adapt this somewhat, because this runs the same command n times.
Install additional software. It's a waste of time to re-invent something just because of some packaging decisions made by someone else.
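That said, if staying with the standard library is a hard requirement, a minimal sketch using only threading (Python 2, matching the urllib2 usage in the question; the URL list is a placeholder) could look like this:

import socket
import threading
import urllib2

socket.setdefaulttimeout(5)

def check(url, results):
    try:
        urllib2.urlopen(urllib2.Request(url))
        results[url] = True
    except Exception:
        results[url] = False

urls = ['http://example.org/', 'http://example.com/']  # placeholder URLs
results = {}
threads = [threading.Thread(target=check, args=(u, results)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)

Since each check runs in its own thread, the total wait is roughly one timeout rather than timeout * number of URLs.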

GAE Backend fails to respond to start request

This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backend.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (HTTP timeout plus the 30s shutdown grace period, I assume). I figured this was a browser issue. So I repeated the same thing with a cron job. No dice. I switched to using a push queue and adding a targeted task, since on paper it looks like it would wait for 10 minutes. Same thing.
All 3 time out after that minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper Handler for the backend to do the work, but I don't exactly know how to write the Handler/webapp2 Route. Do I handle _ah/start/ or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being doing the following in the script itself:
models = MyModel.all()
for model in models:
    # Magic happens
    pass
I was basically taking for granted that the query would automatically batch my Query.all() over many entities, but it was dying at the 1000th entry or so. I originally wrote that the job was purely computational because I completely ignored the fact that the reads can fail.
The actual solution for solving the problem we wanted ended up being "Use the map-reduce library", since we were trying to look at each model for analysis.
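For context, a hedged sketch of what batching that read by hand could look like with query cursors on the old google.appengine.ext.db API (the mapreduce library remains the more robust route; MyModel is assumed to be the db.Model from the snippet above):

from google.appengine.ext import db  # MyModel is a db.Model defined elsewhere

BATCH = 500

def process_all_models():
    query = MyModel.all()
    results = query.fetch(BATCH)
    while results:
        for model in results:
            pass  # magic happens, one batch at a time
        cursor = query.cursor()  # resume point after this batch
        query = MyModel.all().with_cursor(cursor)
        results = query.fetch(BATCH)

This keeps each fetch well under the datastore limits instead of relying on a single iteration over Query.all().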

long-running python script

I have an application with the following parts:
client->nginx->uwsgi(python)
and some Python scripts can run for a long time (2-6 minutes). After the script executes I should return the content to the client, but the connection breaks with the error "gateway timeout 504". What can I use in my case to avoid this error?
So is your goal to reduce the run time of the scripts, or to not have them time out? Browsers are going to give up on a 6 minute request no matter what you try.
Perhaps try doing the work on the server, and then polling for progress with AJAX requests?
Or, if possible, try optimizing the scripts. For example, if you have some horribly slow SQL stuff going on, try cleaning that up.
Otherwise, without more information, a more specific answer is hard to give.
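To make the polling suggestion concrete, here is a hedged sketch of the start-then-poll pattern; every name in it (JOBS, start_job, poll_job, do_expensive_work) is illustrative, not part of the poster's code:

import threading
import time
import uuid

JOBS = {}  # job_id -> {"done": bool, "result": ...}

def do_expensive_work():
    time.sleep(5)  # stand-in for the real 2-6 minute computation
    return "the content the client ultimately wants"

def start_job():
    # The "start" endpoint calls this and returns the id to the browser immediately.
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"done": False, "result": None}
    def run():
        JOBS[job_id] = {"done": True, "result": do_expensive_work()}
    threading.Thread(target=run).start()
    return job_id

def poll_job(job_id):
    # The AJAX polling endpoint calls this every few seconds.
    return JOBS.get(job_id, {"done": False, "result": None})

The request that starts the job returns right away; the browser then polls until "done" is true, so no single HTTP request has to stay open for 6 minutes. (With multiple uwsgi worker processes you would need a shared store such as the database instead of an in-process dict.)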
I once set up a system where the "main page" contained an iframe which showed the output of the long-running program as text/plain. I think the handler for the iframe content was a Python CGI script which emitted all headers and then the program output line by line, under an Apache server.
I don't know whether this would work under your configuration.
This heavily depends on your server setup (i.e. how easy it is to push data back to the client), but is it possible, while running your lengthy application, to periodically send some “null” content (e.g. plain newlines, assuming your output is HTML) so that the browser thinks this is just a slow connection and not a stalled one?
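A hedged sketch of that trickle idea as a plain WSGI app (uwsgi can serve this; the names and sleep times are illustrative, and nginx's response buffering would likely need to be disabled for the trickled bytes to actually reach the browser):

import threading
import time

def long_job(result):
    time.sleep(180)  # stand-in for the 2-6 minute script
    result['body'] = '<p>done</p>'

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    result = {}
    worker = threading.Thread(target=long_job, args=(result,))
    worker.start()
    def body():
        while worker.is_alive():
            yield '\n'  # "null" content so the connection is never idle
            time.sleep(10)
        yield result['body']
    return body()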

Google App Engine for pseudo-cronjobs?

I am looking for a way to create pseudo-cronjobs, as I cannot use real cron jobs on UNIX.
Since Python scripts can run for an unlimited period, I thought Python would be a great solution.
On Google App Engine you can set up Python scripts and it's free. So I should use the App Engine.
The App Engine allows 160,000 external URL accesses (right?), so you should have 160000/31/24/60 ≈ 3.6 accesses per minute.
So my script would be:
import time
import urllib

while time.clock() < 86400:
    # execute pseudo-cronjob file and then wait 60 seconds
    content = urllib.urlopen('http://www.example.org/cronjob_file.php').read()
    time.sleep(60)
Unfortunately, I have no way to test the script, so my questions are:
1) Do you think this would work?
2) Is it allowed (Google TOS) to use the service for such an activity?
3) Is my calculation for the URL accesses per minute right?
Thanks in advance!
Maybe I'm misunderstanding you, but the cron config files will let you do this (without Python).
You can add something like this to your cron.yaml file:
cron:
- description: job that runs every minute
  url: /cronjobs/job1
  schedule: every minute
See Google's documentation for more info on scheduling.
Google has some limits on how long a task can run.
URLFetch calls made in the SDK now have a 5 second timeout.
They allow you to schedule up to 20 cron tasks in any given day.
Duplicate, see cron jobs on google appengine
Cron jobs are now officially supported on GAE:
http://code.google.com/appengine/docs/python/config/cron.html
You may want to clarify which way around you want to do it:
Do you want to use App Engine to RUN the job? I.e., the job runs on Google's server?
or
Do you want to use your OWN code on your server, and trigger it by using google app engine?
If it's the former: google does cron now. Use that :)
If it's the latter: you could use Google's cron to trigger your own, even if it's indirect (i.e., google-cron calls google-app-engine, which calls your-app); see the sketch after this answer.
If you can, spin up a thread to do the job, so your page returns immediately. Don't forget: if you call http://whatever/mypage.php and your browser dies (or in this case, Google kills your process for running too long), the PHP script usually still runs to the end; the output just goes nowhere.
Failing that, try to spin up a thread (not sure if you can do that in PHP though; I'm a C# guy new to PHP).
And if all else fails: get a better webhost! I pay $6/month or so for dreamhost.com, and I can run cron jobs on their servers - it's included. They do PHP, Rails et al. You could even ping me for a discount code :) (view profile for website etc)
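As mentioned above, here is a hedged sketch of the relay idea: a handler on App Engine (webapp, Python 2.5 era) that App Engine's cron hits on the /cronjobs/job1 URL from the cron.yaml example, and which in turn calls your own server. The target URL is a placeholder:

from google.appengine.api import urlfetch
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class TriggerJob(webapp.RequestHandler):
    def get(self):
        # Relay the cron tick to your own script (placeholder URL)
        urlfetch.fetch('http://www.example.org/cronjob_file.php', deadline=10)
        self.response.out.write('ok')

application = webapp.WSGIApplication([('/cronjobs/job1', TriggerJob)])

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()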
Do what Nic Wise said, or outsource the cronjob using a service like www.guardiano.pm, so you can have it call www.yoursite.com/myjob.php and every time that URL is called, whatever you want will be executed.
P.S. It's free.
P.P.S. It's my pet project and it's in beta.
