I am trying to get a grip on mod_wsgi by writing a simple WSGI application running under Apache on Linux. However, I notice that requests are not always served in the order in which they arrive, even if they come from the same client/browser.
If a page contains, for example, images A and B, A may be served as the answer to the request for B, so it is shown in the wrong place. I'm convinced I must be making a very trivial mistake, but I am unable to find out what.
I am aware that the WSGI callable must be reentrant, and by logging requests and responses, I see that indeed it is sometimes entered a second time before the first result is served. But surely when the browser asks for B it should not get A as result from a previous GET. Or am I missing something very fundamental about HTTP?
Global Apache directives:
LoadModule wsgi_module /home/sterlicht/modWsgi/mod_wsgi.so
Virtual host directives:
WSGIScriptAlias / /home/sterlicht/debug/app.py
Found out what was wrong.
My callable was not reentrant after all.
The code in the __call__ method of my application class used some instance data, which was overwritten by concurrent invocations.
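In outline, the bug looked like this (a reconstruction with made-up names, not the actual code):

import os

class Application(object):
    def __call__(self, environ, start_response):
        # Buggy pattern: instance attributes are shared between concurrent
        # requests, so a second request can overwrite them before the
        # first response has been built.
        #self.path = environ['PATH_INFO']
        #body = 'You requested: %s' % self.path

        # Fixed: per-request state lives in locals, which are private
        # to each invocation.
        path = environ['PATH_INFO']
        body = 'You requested: %s' % path
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body]

application = Application()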
I'm a newbie on App Engine, and I really don't know how to phrase this question, which sadly means I don't know what keywords to Google; I hope I get actual help rather than the bashing that a lot of people do.
I'm confused about the difference between App Engine's behaviour in production and on the local development server.
Background info:
Btw this is in Python
Initially I assumed that, when needed, an instance of the app or module would be created, and that instance would then serve multiple requests from different clients. Under this model, any initialization code would run only once.
But on the local development server, every time I add something new, especially in main.py, the server picks up the changes and runs them on a browser refresh. This made me think: wait, does it run the entire script over and over again on every request?
Question:
Does an instance/module run the entire code on every request or is this just an added behavior to the dev server to make development easier?
Both your assumptions - about behaviour in production and development - are wrong.
In production, GAE spins up instances as required. This may be in response to increased load, or the host may simply decide after a certain amount of time to recycle an instance by killing it and starting a new one. Initialization code will always be run whenever a new instance is started.
In development, you only get a single instance. However, the server watches your file system for changes. If it detects a change to the code itself, it will restart itself, and therefore re-run the initialization code. But if you don't make any code changes between requests, the existing process continues indefinitely, and init code will not be re-run.
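To make the distinction concrete, here is a minimal sketch in the webapp2 style (the handler and variable names are invented):

import time
import webapp2

# Module-level code: runs once per instance start, NOT once per request.
INSTANCE_STARTED = time.time()

class MainHandler(webapp2.RequestHandler):
    def get(self):
        # Handler code: runs on every request this instance serves.
        self.response.write('Instance started at %f' % INSTANCE_STARTED)

app = webapp2.WSGIApplication([('/', MainHandler)])

On the dev server, INSTANCE_STARTED only changes when you edit a file and the server restarts; in production it changes whenever GAE starts a fresh instance.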
Later note: the issues in the original posting below have been largely resolved.
Here's the background: For an introductory comp sci course, students develop html and server-side Python 2.7 scripts using a server provided by the instructors. That server is based on CGIHTTPRequestHandler, like the one at pointlessprogramming. When the students' html and scripts seem correct, they port those files to a remote, slow Apache server. Why support two servers? Well, the initial development using a local server has the benefit of reducing network issues and dependency on the remote, weak machine that is running Apache. Eventually porting to the Apache-running machine has the benefit of publishing their results for others to see.
For the local development to be most useful, the local server should closely resemble the Apache server. Currently there is an important difference: Apache requires that a script start its response with headers that include a content-type; if the script fails to provide such a header, Apache sends the client a 500 error ("Internal Server Error"), which is too generic to help the students, who cannot use the server logs. CGIHTTPRequestHandler imposes no similar requirement. So it is common for a student to write header-free scripts that work with the local server, but get the baffling 500 error after copying the files to the Apache server. It would be helpful to have a version of the local server that checks for a content-type header and gives a specific error if there is none.
I seek advice about creating such a server. I am new to Python and to writing servers. Here are the issues that occur to me, but any helpful advice would be appreciated.
Is a content-type header required by the CGI standard? If so, other people might benefit from an answer to the main question here. Also, if so, I despair of finding a way to disable Apache's requirement. Maybe the relevant part of the CGI RFC is section 6.3.1 (CGI Response, Content-Type): "If an entity body is returned, the script MUST supply a Content-Type field in the response."
To make a local server that checks for the content-type header, perhaps I should sub-class CGIHTTPServer.CGIHTTPRequestHandler, to override run_cgi() with a version that issues an error for a missing header. I am looking at CGIHTTPServer.py __version__ = "0.4", which was installed with Python 2.7.3. But run_cgi() does a lot of processing, so it is a little unappealing to copy all its code, just to add a couple calls to a header-checking routine. Is there a better way?
If the answer to (2) is something like "No, overriding run_cgi() is recommended," I anticipate writing a version that invokes the desired script, then checks the script's output for headers before that output is sent to the client. There are apparently two places in the existing run_cgi() where the script is invoked:
3a. When run_cgi() is executed on a non-Unix system, the script is executed using Python's subprocess module. As a result, the standard output from the script will be available as an in-memory string, which I can presumably check for headers before the call to self.wfile.write. Does this sound right?
3b. But when run_cgi() is executed on a *nix system, the script is executed by a forked process. I think the child's stdout will write directly to self.wfile (I'm a little hazy on this), so I see no opportunity for the code in run_cgi() to check the output. Ugh. Any suggestions?
If analyzing the script's output is recommended, is email.parser the standard way to recognize whether there is a content-type header? Is another standard module recommended instead?
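For illustration, the kind of check I have in mind, assuming the script's raw output has already been captured as a string (a sketch, not tested against Apache's exact behaviour):

import email.parser

def has_content_type(raw_output):
    # CGI headers end at the first blank line; parse only that part.
    header_block = raw_output.replace('\r\n', '\n').split('\n\n', 1)[0]
    msg = email.parser.Parser().parsestr(header_block, headersonly=True)
    return msg.get('Content-Type') is not None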
Is there a more appropriate forum for asking the main question ("How can a CGI server based on CGIHTTPRequestHandler require...")? It seems odd to ask if there is a better forum for asking programming questions than Stack Overflow, but I guess anything is possible.
Thanks for any help.
I'm running Apache, Django, and mod_wsgi. I also use some other software called SAS to do statistical analysis. Just to give you some context: my end goal is that when a client hits submit on a form written in Django, the appropriate SAS script is called (via a Python WSGI script), performs calculations on the server, and then the client is redirected to the output page.
I have a basic script called test5.py. It looks like this:
import os
import subprocess

def application(environ, start_response):
    status = '200 OK'
    output = 'Running External Program!'

    # Raw string, so the backslashes in the Windows path are not
    # treated as escape sequences.
    f = open(r"C:\Documents and Settings\eric\Desktop\out.txt", 'a')
    f.write('hi')
    f.close()

    # Three attempts at launching an external .exe, none of which worked:
    #os.system(r'start "C:\Program Files\SAS92\SASFoundation\9.2\sas.exe"')
    #subprocess.call([r'C:\Program Files\SAS92\SASFoundation\9.2\sas.exe'])
    #os.startfile(r'C:\Program Files\SAS92\SASFoundation\9.2\sas.exe')

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    #start_response('301 Redirect', [('Location', 'http://myserver/reports'),])
    start_response(status, response_headers)
    return [output]
So what happens is that the out.txt file does get created and has 'hi' written in it. That's quite cool. The first three commented lines were three attempts to have this same script also call sas.exe, which lives on the server. I'm just trying to get any .exe to work right now, so calling Paint or WordPad would be fine. Those lines, however, do not seem to execute in the WSGI context; if I just open the Python command line, I can get the .exes to execute just fine. Also, the last commented line, when enabled, does redirect properly. I'm not sure if I need to configure Apache to allow executables. Please forgive me if I'm using terms incorrectly; I am still quite new to all of this.
Thanks
Hi Paulo,
I was trying to look into your last comment, but I am a bit confused as to exactly what I am looking for or how to look for it. Here is some information that I have gathered. By the way, I am running on Windows XP and using Apache 2.2.
My Apache is installed for all users: in regedit the ServerRoot variable is under HKEY_LOCAL_MACHINE (http://httpd.apache.org/docs/2.2/platform/windows.html). I also believe SAS is installed for all users. I tested this by having my coworker sign in with her login, and I still had access; I'm not sure that is a sufficient test, though.
The log I get when I run the WSGI script is the following. I'm not sure if it matters that the process name is empty.
[Mon Aug 20 10:33:17 2012] [info] [client 10.60.8.71] mod_wsgi (pid=5980, process='', application='..com|/test5'): Reloading WSGI script 'C:/Sites/cprm/pyscripts/test5.wsgi'.
Also, I tried the .bat trick from the link in my earlier comment, to no avail. I made a simple batch file that just echoes 'hi' and placed it in the same directory where my WSGI scripts live, so I feel there should be no access problems there, but I may be mistaken. I also tried calling a simple Python script using subprocess, just as a test. Again, nothing happened.
Also just to show you, my httpd.conf file looks like such:
AllowOverride None
Options None
Order allow,deny
Allow from all
WSGIScriptAlias /test1 "C:/sites/cprm/pyscripts/test1.wsgi"
WSGIScriptAlias /test2 "C:/sites/cprm/pyscripts/test2.py"
WSGIScriptAlias /test3 C:/sites/cprm/pyscripts/test3.py
WSGIScriptAlias /test4 "C:/sites/cprm/pyscripts/test4.py"
WSGIScriptAlias /test5 "C:/sites/cprm/pyscripts/test5.wsgi"
WSGIScriptAlias / "C:/sites/cprm/wsgi.py"
Is this information helpful or not really? Also, am I looking for a specific environ variable or something?
Thanks again
For web applications that perform background calculations or other long-running tasks, IMHO it is best to queue the tasks for processing instead of calling an external process from a Django view and hanging everything until the task completes. This gives you better:
user experience (request returns instantly - use ajax to signal task status and present the download link once task completes)
security (background process can run under safer credentials)
scalability (tasks can be distributed among servers)
resilience (by default, many web servers will send an 'error 500' if your application fails to answer within 30 seconds or so)
For a background daemon processing all entries in the queue, there are several approaches depending on how big you want to scale (a minimal sketch follows the list):
a cron job
a daemon using supervisor (or your watchdog of choice)
an AMQP module like django-celery
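For the cron-job flavour, a minimal database-backed queue could look something like this (a sketch; the Task model, field names, and file layout are all hypothetical):

# models.py -- a simple Task model acting as the queue
from django.db import models

class Task(models.Model):
    script = models.CharField(max_length=255)  # path of the script to run
    done = models.BooleanField(default=False)

# views.py -- the view only enqueues; it never blocks on the external program
from django.http import HttpResponse

def submit(request):
    Task.objects.create(script=request.POST['script'])
    return HttpResponse('queued')

# worker.py -- run periodically from cron (or under supervisor)
import subprocess

for task in Task.objects.filter(done=False):
    subprocess.call(['sas.exe', task.script])  # runs as the cron user
    task.done = True
    task.save()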
[edit]
The process you start from a WSGI script will run under the same user that is running the web server. On Linux it is generally 'www-data' or 'nobody'; on Windows/IIS it is 'IUSR_MachineName' (or the authenticated user, if using IIS authentication). Check whether you can start the program using the same credentials your WSGI script runs under.
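As a quick diagnostic, a WSGI script along these lines (a minimal sketch) will report which account that is:

import getpass

def application(environ, start_response):
    # getpass.getuser() works on both Windows and Linux.
    user = getpass.getuser()
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['WSGI process is running as: %s' % user]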
I have a URL route in my web.py application that I want to run to catch all URLs that hit the server, but only after any static assets are served.
For example, if there is js/test.js in my static directory, the path http://a.com/js/test.js should return the file contents. But I also have my URL routing set up so that there is a regex that catches everything, like this:
urls = ('/.*', 'CatchAllHandler')
So this should run only if no static asset was discovered. A request for http://a.com/js/test.js should return the static file test.js, but a request for http://a.com/js/nope.js should route through the CatchAllHandler.
I've looked into writing my own StaticMiddleware for this, but it will only help if the order of web.py operations is changed. Currently the middleware is executed after the URL routes have been processed. I need the middleware to run first, and let the url routing clean up the requests that were not served static assets.
The one idea I have is to use the notfound() function as my catch all handler, but that may not be best.
The URL matching is a Python regex. You can test/play with Python regexes here.
That said, this should work for you:
('/(?!static)(.*)', 'CatchAllHandler')
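In context, a minimal sketch (the handler body is only a placeholder):

import web

urls = (
    '/(?!static)(.*)', 'CatchAllHandler',  # negative lookahead skips /static/...
)

class CatchAllHandler:
    def GET(self, path):
        return 'caught: ' + path

if __name__ == '__main__':
    app = web.application(urls, globals())
    app.run()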
I haven't played with web.py's middleware, but my understanding is that WSGI middleware runs before web.py gets to see the request/response. I would think that, provided your WSGI middleware is properly configured, it would just work.
*pouts* That sucks. There is the hook stuff, which makes it really easy; I've done that before, and it will see all the stuff first. Docs are here: http://webpy.org/cookbook/application_processors
But I guess, in regard to your other comment about wanting it to work regardless of URL: how would you know it's static content otherwise? I'm greatly confused. The EASIEST way, since in production you want some other web server running your web.py scripts, is to push all the static content into the web server. Then you can of course do whatever needs doing in the web server itself. This is exactly what happens with mod_wsgi and Apache, for instance (you change /static to point to a directory served directly by the web server).
Perhaps if you shared an actual example of what you need done, I could help you more. Otherwise I've given you now 3 different ways to handle the problem (excluding using WSGI middleware). How many more do you need? :P
I have Django running in Apache via mod_wsgi. I believe Django is caching my pages server-side, which is causing some of the functionality to not work correctly.
I have a countdown timer that works by getting the current server time, determining the remaining countdown time, and outputting that number to the HTML template. A javascript countdown timer then takes over and runs the countdown for the user.
The problem arises when the user refreshes the page, or navigates to a different page with the countdown timer. The timer appears to jump around to different times sporadically, usually going back to the same time over and over again on each refresh.
Using HTTPFox, the page is not being loaded from my browser cache, so it looks like either Django or Apache is caching the page. Is there any way to disable this functionality? I'm not going to have enough traffic to worry about caching the script output. Or am I completely wrong about why this is happening?
[Edit] From the posts below, it looks like caching is disabled in Django, which means it must be happening elsewhere, perhaps in Apache?
[Edit] I have a more thorough description of what is happening: For the first 7 (or so) requests made to the server, the pages are rendered by the script and returned, although each of those 7 pages seems to be cached as it shows up later. On the 8th request, the server serves up the first page. On the 9th request, it serves up the second page, and so on in a cycle. This lasts until I restart apache, when the process starts over again.
[Edit] I have configured mod_wsgi to run only one process at a time, which causes the timer to reset to the same value in every case. Interestingly, though, there's another component on my page that displays a random image on each request, using order_by('?'), and that does show a different image each time, which would indicate the caching is happening in Django and not in Apache.
[Edit] In light of the previous edit, I went back and reviewed the relevant views.py file, finding that the countdown start variable was being set globally in the module, outside of the view functions. Moving that setting inside the view functions resolved the problem. So it turned out not to be a caching issue after all. Thanks everyone for your help on this.
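For anyone who lands here with the same symptom, the pattern looked roughly like this (a reconstruction with invented names, not my actual code):

from datetime import datetime, timedelta
from django.http import HttpResponse

# Buggy: module-level state is evaluated once per process, at import time,
# so every request served by that process saw the same start value.
#COUNTDOWN_START = datetime.now()

def countdown(request):
    # Fixed: compute the start time per request, inside the view.
    countdown_start = datetime.now()
    remaining = (countdown_start + timedelta(hours=1)) - datetime.now()
    return HttpResponse('Seconds remaining: %d' % remaining.seconds)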
From my experience with mod_wsgi in Apache, it is highly unlikely that they are causing caching. A couple of things to try:
It is possible that you have some proxy server between your computer and the web server that is appropriately or inappropriately caching pages. Sometimes ISPs run proxy servers to reduce bandwidth outside their network. Can you please provide the HTTP headers for a page that is getting cached (Firebug can give these to you). Headers that I would specifically be interested in include Cache-Control, Expires, Last-Modified, and ETag.
Can you post the MIDDLEWARE_CLASSES from your settings.py file? It is possible that you have a middleware that performs caching for you.
Can you grep your code for the following items: "load cache", "django.core.cache", and "cache_page"? A grep -R "search" * will work.
Does the settings.py (or anything it imports like "from localsettings import *") include CACHE_BACKEND?
What happens when you restart Apache (e.g. sudo service apache2 restart)? If a restart clears the issue, then it might be Apache doing the caching (though it is possible that this would also clear out a locmem Django cache backend).
Did you specifically setup Django caching? From the docs it seems you would clearly know if Django was caching as it requires work beforehand to get it working. Specifically, you need to define where the cached files are saved.
http://docs.djangoproject.com/en/dev/topics/cache/
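For reference, in that era of Django the cache was opt-in via a settings.py entry along these lines (values are illustrative, not taken from the question):

CACHE_BACKEND = 'locmem://'
# or a file-based backend, which requires naming an existing cache directory:
#CACHE_BACKEND = 'file:///var/tmp/django_cache'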
Are you using a multiprocess configuration for Apache/mod_wsgi? If you are, that would account for why different responses can have different values for the timer: the point at which the timer is initialised will likely differ for each process handling requests. That is why it can jump around.
Have a read of:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Work out in what mode or configuration you are running Apache/mod_wsgi and perhaps post what that configuration is. Without knowing, there are too many unknowns.
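For example, pinning the application to a single daemon process with a handful of threads looks something like this (a sketch; the process-group name and paths are placeholders):

WSGIDaemonProcess myapp processes=1 threads=5
WSGIProcessGroup myapp
WSGIScriptAlias / /path/to/app.wsgi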
I just came across this:
Support for Automatic Reloading: to help deployment tools you can activate support for automatic reloading. Whenever something changes the .wsgi file, mod_wsgi will reload all the daemon processes for us. For that, just add the following directive to your Directory section:
WSGIScriptReloading On