How can I keep Twisted from sending tracebacks to the browser when an error occurs? It's been exposing file paths and code.
Just about every server framework that serves up something human readable will have an option to serve up tracebacks (for development) or not (for production).
In Twisted, it's called displayTracebacks, and it's documented as a member of twisted.web.server.Site. You configure it the same way(s) you configure anything else in Twisted. (The corresponding one-word option, e.g., for command-line flags, is notracebacks.)
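For example, a minimal sketch of turning it off directly on a Site (the resource class and port below are placeholders; the notracebacks flag mentioned above has the same effect for twistd-based setups):

from twisted.internet import reactor
from twisted.web import resource, server

class Root(resource.Resource):
    # Placeholder resource; substitute your real site root.
    isLeaf = True
    def render_GET(self, request):
        return b"hello"

site = server.Site(Root())
site.displayTracebacks = False   # send a generic error page instead of the traceback
reactor.listenTCP(8080, site)
reactor.run()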
If you're wondering why it defaults to True, you may want to read Glyph's rationale on the 2003 feature request that added this flag in the first place:
Nope, this should be turned on by default, since when you
are using defaults you are typically developing. System
administrators can make this a site-local default by adding
to sitecustomize.py or somesuch.
(As the 11-years-wiser version of Glyph points out in a comment, it's probably better to set it in the application than to use a site-local default. But the basic idea is the same. Set it to False at any level, and you'll get exactly the behavior you're after.)
Later note: the issues in the original posting below have been largely resolved.
Here's the background: For an introductory comp sci course, students develop html and server-side Python 2.7 scripts using a server provided by the instructors. That server is based on CGIHTTPRequestHandler, like the one at pointlessprogramming. When the students' html and scripts seem correct, they port those files to a remote, slow Apache server. Why support two servers? Well, the initial development using a local server has the benefit of reducing network issues and dependency on the remote, weak machine that is running Apache. Eventually porting to the Apache-running machine has the benefit of publishing their results for others to see.
For the local development to be most useful, the local server should closely resemble the Apache server. Currently there is an important difference: Apache requires that a script start its response with headers that include a content-type; if the script fails to provide such a header, Apache sends the client a 500 error ("Internal Server Error"), which is too generic to help the students, who cannot use the server logs. CGIHTTPRequestHandler imposes no similar requirement. So it is common for a student to write header-free scripts that work with the local server, but then get the baffling 500 error after copying the files to the Apache server. It would be helpful to have a version of the local server that checks for a content-type header and gives a clear error if there is none.
I seek advice about creating such a server. I am new to Python and to writing servers. Here are the issues that occur to me, but any helpful advice would be appreciated.
Is a content-type header required by the CGI standard? If so, other people might benefit from an answer to the main question here. Also, if so, I despair of finding a way to disable Apache's requirement. Maybe the relevant part of the CGI RFC is section 6.3.1 (CGI Response, Content-Type): "If an entity body is returned, the script MUST supply a Content-Type field in the response."
To make a local server that checks for the content-type header, perhaps I should sub-class CGIHTTPServer.CGIHTTPRequestHandler, to override run_cgi() with a version that issues an error for a missing header. I am looking at CGIHTTPServer.py __version__ = "0.4", which was installed with Python 2.7.3. But run_cgi() does a lot of processing, so it is a little unappealing to copy all its code, just to add a couple calls to a header-checking routine. Is there a better way?
If the answer to (2) is something like "No, overriding run_cgi() is recommended," I anticipate writing a version that invokes the desired script, then checks the script's output for headers before that output is sent to the client. There are apparently two places in the existing run_cgi() where the script is invoked:
3a. When run_cgi() is executed on a non-Unix system, the script is executed using Python's subprocess module. As a result, the standard output from the script will be available as an in-memory string, which I can presumably check for headers before the call to self.wfile.write. Does this sound right?
3b. But when run_cgi() is executed on a *nix system, the script is executed by a forked process. I think the child's stdout will write directly to self.wfile (I'm a little hazy on this), so I see no opportunity for the code in run_cgi() to check the output. Ugh. Any suggestions?
If analyzing the script's output is recommended, is email.parser the standard way to recognize whether there is a content-type header? Is another standard module recommended instead? (A rough sketch of this idea appears after these questions.)
Is there a more appropriate forum for asking the main question ("How can a CGI server based on CGIHTTPRequestHandler require...")? It seems odd to ask if there is a better forum for asking programming questions than Stack Overflow, but I guess anything is possible.
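For what it's worth, here is a minimal sketch of the header check from the email.parser question, assuming the script's raw output has already been captured as a string (the function name and sample strings are only illustrations):

import email.parser

def has_content_type(cgi_output):
    # CGI output is a header block, a blank line, then the body.
    header_block = cgi_output.split('\r\n\r\n', 1)[0].split('\n\n', 1)[0]
    headers = email.parser.Parser().parsestr(header_block, headersonly=True)
    return headers.get('Content-Type') is not None

print(has_content_type('Content-Type: text/html\r\n\r\n<html></html>'))  # True
print(has_content_type('<html>missing headers</html>'))                  # False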
Thanks for any help.
It seems obvious that Twisted would use the twisted.names API and not any blocking way to resolve host names.
However, digging in the source code, I have been unable to find the place where the name resolution occurs. Could someone point me to the relevant source code where host resolution happens (when doing a connectTCP, for example)?
I really need to be sure that connectTCP won't use blocking DNS resolution.
It seems obvious, doesn't it?
Unfortunately:
Name resolution is not always configured in the obvious way. You think you just have to read /etc/resolv.conf? Even in the specific case of Linux and DNS, you might have to look in an arbitrary number of files looking for name servers.
Name resolution is much more complex than just DNS. You have to do mDNS resolution, possibly look up some LDAP computer records, and then you have to honor local configuration dictating the ordering between these such as /etc/nsswitch.conf.
Name resolution is not exposed via a standard or useful non-blocking API. Even the glibc-specific getaddrinfo_a exposes its non-blockingness via SIGIO, not just a file descriptor you can watch. Which means that, like POSIX AIO, it's probably just a kernel thread behind your back anyway.
For these reasons, among others, Twisted defaults to using a resolver that just calls gethostbyname in a thread.
However, if you know that for your application it is appropriate to have DNS-only hostname resolution, and you'd like to use twisted.names rather than your platform resolver - in other words, if scale matters more to you than esoteric name-resolution use-cases - that is supported. You can install a resolver from twisted.names.client onto the reactor, appropriately configured for your application, and all future built-in name resolutions will be made with that resolver.
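A minimal sketch of doing that (assuming the default reactor and the platform's usual resolv.conf and hosts configuration):

from twisted.internet import reactor
from twisted.names import client

# Replace the default threaded gethostbyname resolver with twisted.names.
reactor.installResolver(client.createResolver())

# From here on, connectTCP('example.com', 80, factory) and other built-in
# name lookups go through the non-blocking DNS client.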
I'm not massively familiar with Twisted; I only recently started using it. It looks like it doesn't block, though, but only on platforms that support threading.
In twisted.internet.base, ReactorBase appears to do the resolving through its resolve method, which returns a Deferred from self.resolver.getHostByName.
self.resolver is an instance of BlockingResolver by default, which does block, but it looks like the resolver instance is replaced by a ThreadedResolver in the ReactorBase._initThreads method if the platform supports threading.
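As a quick illustration of that non-blocking path (assuming the default reactor), resolve hands back a Deferred instead of blocking:

from twisted.internet import reactor

def show(ip):
    print('resolved to %s' % ip)
    reactor.stop()

# Goes through self.resolver.getHostByName under the hood.
d = reactor.resolve('example.com')
d.addCallbacks(show, lambda failure: reactor.stop())
reactor.run()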
I am using the python-ldap module to (amongst other things) search for groups, and am running into the server's size limit and getting a SIZELIMIT_EXCEEDED exception. I have tried both synchronous and asynchronous searches and hit the problem both ways.
You are supposed to be able to work round this by setting a paging control on the search, but according to the python-ldap docs these controls are not yet implemented for search_ext(). Is there a way to do this in Python? If the python-ldap library does not support it, is there another Python library that does?
Here are some links related to paging in python-ldap.
Documentation: http://www.python-ldap.org/doc/html/ldap-controls.html#ldap.controls.SimplePagedResultsControl
Example code using paging: http://www.novell.com/coolsolutions/tip/18274.html
More example code: http://google-apps-for-your-domain-ldap-sync.googlecode.com/svn/trunk/ldap_ctxt.py
After some discussion on the python-ldap-dev mailing list, I can answer my own question.
Paged controls ARE supported by the python-ldap module, but the docs had not been updated for search_ext() to show that. The example linked by Gorgapor shows how to use ldap.controls.SimplePagedResultsControl to read the results in pages.
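A rough sketch of a paged search (assuming python-ldap 2.4's control API; older releases used a different constructor, and the server URI, bind DN, base DN and filter below are placeholders):

import ldap
from ldap.controls import SimplePagedResultsControl

PAGE_SIZE = 500  # must be at or below the server's size limit

conn = ldap.initialize('ldap://ldap.example.com')        # placeholder URI
conn.simple_bind_s('cn=admin,dc=example,dc=com', 'secret')

page_ctrl = SimplePagedResultsControl(True, size=PAGE_SIZE, cookie='')
entries = []
while True:
    msgid = conn.search_ext('dc=example,dc=com', ldap.SCOPE_SUBTREE,
                            '(objectClass=group)', serverctrls=[page_ctrl])
    rtype, rdata, rmsgid, serverctrls = conn.result3(msgid)
    entries.extend(rdata)
    # Pull the cookie for the next page out of the response controls.
    resp_ctrls = [c for c in serverctrls
                  if c.controlType == SimplePagedResultsControl.controlType]
    if not resp_ctrls or not resp_ctrls[0].cookie:
        break
    page_ctrl.cookie = resp_ctrls[0].cookie

conn.unbind_s()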
However, there is a gotcha. This will work with Microsoft Active Directory servers, but not with OpenLDAP servers (and possibly others, such as Sun's). The LDAP controls RFC is ambiguous as to whether paged controls should be allowed to override the server's sizelimit setting. On Active Directory servers they can by default, while on OpenLDAP they cannot, but I think there is a server setting that will allow them to.
So even if you implement the paged control, there is still no guarantee that it will get all the objects that you want. Sigh.
Also, paged controls are only available with LDAP v3, but I doubt there are many v2 servers still in use.
How can I tell for certain whether my application is running on the development server or not? I suppose I could check the value of settings.DEBUG and assume that if DEBUG is True then it's running on the development server, but I'd prefer to know for sure rather than rely on convention.
I put the following in my settings.py to distinguish between the standard dev server and production:
import sys
RUNNING_DEVSERVER = (len(sys.argv) > 1 and sys.argv[1] == 'runserver')
This also relies on convention, however.
(Amended per Daniel Magnusson's comment)
server = request.META.get('wsgi.file_wrapper', None)
if server is not None and server.__module__ == 'django.core.servers.basehttp':
    print('inside dev')
Of course, wsgi.file_wrapper might be set on META, and have a class from a module named django.core.servers.basehttp by extreme coincidence on another server environment, but I hope this will have you covered.
By the way, I discovered this by making a syntactically invalid template while running on the development server, and searching for interesting stuff in the Traceback and Request information sections, so I'm just editing my answer to corroborate Nate's ideas.
Usually this works:
import sys
if 'runserver' in sys.argv:
    # you use runserver
Typically I set a variable called environment and set it to "DEVELOPMENT", "STAGING" or "PRODUCTION". Within the settings file I can then add basic logic to change which settings are being used, based on environment.
EDIT: Additionally, you can simply use this logic to include different settings.py files that override the base settings. For example:
if environment == "DEBUG":
    from debugsettings import *
Relying on settings.DEBUG is the most elegant way AFAICS as it is also used in Django code base on occasion.
I suppose what you really want is a way to set that flag automatically, without needing to update it manually every time you upload the project to the production servers.
For that I check the path of settings.py (in settings.py) to determine what server the project is running on:
if __file__ == "path to settings.py in my development machine":
    DEBUG = True
elif __file__ in [paths of production servers]:
    DEBUG = False
else:
    raise WhereTheHellIsThisServedException()
Mind you, you might also prefer doing this check with environment variables, as @Soviut suggests. But as someone developing on Windows and serving on Linux, checking the file paths was simply easier than going with environment variables.
I came across this problem just now, and ended up writing a solution similar to Aryeh Leib Taurog's. My main difference is that I want to differentiate between production and dev environments when running the server, but also when running some one-off scripts for my app (which I run like DJANGO_SETTINGS_MODULE=settings python [the script]). In this case, simply looking at whether argv[1] == 'runserver' isn't enough. So what I came up with is to pass an extra command-line argument when I run the dev server, and also when I run my scripts, and just look for that argument in settings.py. So the code looks like this:
import sys

if '--in-development' in sys.argv:
    ## YES! we're in dev
    pass
else:
    ## Nope, this is prod
    pass
then, running the django server becomes
python manage.py runserver [whatever options you want] --in-development
and running my scripts is as easy as
DJANGO_SETTINGS_MODULE=settings python [myscript] --in-development
Just make sure the extra argument you pass along doesn't conflict with anything Django expects (in reality I use my app's name as part of the argument).
I think this is pretty decent, as it lets me control exactly when my server and scripts will behave as prod or dev, and I'm not relying on anyone else's conventions, other than my own.
EDIT: manage.py complains if you pass unrecognized options, so you need to change the code in settings.py to be something like
if sys.argv[0] == 'manage.py' or '--in-development' in sys.argv:
    # ...
    pass
Although this works, I recognize it's not the most elegant of solutions...
If you want to switch your settings files automatically depending on the runtime environment, you could just use something that differs in os.environ, e.g.
from os import environ
if environ.get('_', ''):
print "This is dev - not Apache mod_wsgi"
You can determine whether you're running under WSGI (mod_wsgi, gunicorn, waitress, etc.) vs. manage.py (runserver, test, migrate, etc.) or anything else:
import sys
WSGI = 'django.core.wsgi' in sys.modules
settings.DEBUG could be True while running under Apache or some other non-development server; it will still run. As far as I can tell, there is nothing in the run-time environment, short of examining the pid and comparing it to pids in the OS, that will give you this information.
I use:
import platform

DEV_SERVERS = [
    'mymachine.local',
]
DEVELOPMENT = platform.node() in DEV_SERVERS
which requires paying attention to what is returned by .node() on your machines. It's important that the default be non-development so that you don't accidentally expose sensitive development information.
You could also look into more complicated ways of uniquely identifying computers.
One difference between the development and deployment environment is going to be the server that it’s running on. What exactly is different will depend on your dev and deployment environments.
Knowing your own dev and deploy environments, the HTTP request variables could be used to distinguish between the two. Look at request variables like request.META.HTTP_HOST, request.META.SERVER_NAME and request.META.SERVER_PORT and compare them in the two environments.
I bet you’ll find something quite obvious that’s different and can be used to detect your development environment. Do the test in settings.py and set a variable that you can use elsewhere.
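For example, a rough per-request check (the host names and port below are only guesses at what a typical dev setup looks like; substitute whatever actually differs between your two environments):

def is_dev_request(request):
    # Hypothetical heuristic: the dev server usually answers on localhost:8000.
    return (request.META.get('SERVER_NAME') in ('localhost', '127.0.0.1')
            or request.META.get('SERVER_PORT') == '8000')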
Inspired by Aryeh's answer, the trick I devised for my own use is to just look for the name of my management script in sys.argv[0]:
USING_DEV_SERVER = "pulpdist/manage_site.py" in sys.argv[0]
(My use case is to automatically enable Django native authentication when running the test server - when running under Apache, even on development servers, all authentication for my current project is handled via Kerberos)
You could check request.META["SERVER_SOFTWARE"] value:
dev_servers = ["WSGIServer", "Werkzeug"]
if any(server in request.META["SERVER_SOFTWARE"] for server in dev_servers):
print("is local")
Simple: you can check for a path that exists only on the server. Something like:
import os
SERVER = os.path.exists('/var/www/your_project')
I am using Python and the Twisted framework to connect to an FTP site to perform various automated tasks. Our FTP server happens to be Pure-FTPd, if that's relevant.
When connecting and calling the list method on an FTPClient, the resulting FTPFileListProtocol's files collection does not contain any directories or file names that contain a space (' ').
Has anyone else seen this? Is the only solution to create a sub-class of FTPFileListProtocol and override its unknownLine method, parsing the file/directory names manually?
Firstly, if you're performing automated tasks on a retrieved FTP listing then you should probably be looking at NLST rather than LIST, as noted in RFC 959 section 4.1.3:
NAME LIST (NLST)
...
This command is intended to return information that
can be used by a program to further process the
files automatically.
The Twisted documentation for LIST says:
It can cope with most common file listing formats.
This makes me suspicious; I do not like solutions that "cope". LIST was intended for human consumption, not machine processing.
If your target server supports them then you should prefer MLST and MLSD as defined in RFC 3659 section 7:
7. Listings for Machine Processing (MLST and MLSD)
The MLST and MLSD commands are intended to standardize the file and
directory information returned by the server-FTP process. These
commands differ from the LIST command in that the format of the
replies is strictly defined although extensible.
However, these newer commands may not be available on your target server and I don't see them in Twisted. Therefore NLST is probably your best bet.
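If you do go the NLST route, a rough Twisted sketch might look like this (it assumes you already have a connected twisted.protocols.ftp.FTPClient instance named ftpClient; the buffering protocol is only an illustration):

from twisted.internet.protocol import Protocol

class NameListBuffer(Protocol):
    # Collects the raw NLST output arriving on the data connection.
    def __init__(self):
        self.chunks = []
    def dataReceived(self, data):
        self.chunks.append(data)

def fetch_names(ftpClient, path='.'):
    buf = NameListBuffer()
    d = ftpClient.nlst(path, buf)
    # When the transfer completes, split the buffered text into one name per line.
    d.addCallback(lambda ignored: ''.join(buf.chunks).splitlines())
    return d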
As to the nub of your problem, there are three likely causes:
The processing of the returned results is incorrect (Twisted may be at fault, as you suggest, or perhaps elsewhere)
The server is buggy and not sending a correct (complete) response
The wrong command is being sent (unlikely with straight NLST/LIST, but some servers react differently if arguments are supplied to these commands)
You can eliminate (2) and (3), and prove that the cause is (1), by looking at what is sent over the wire. If this option is not available to you as part of the Twisted API or the Pure-FTPd server logging configuration, then you may need to break out a network sniffer such as tcpdump, snoop or Wireshark (assuming you're allowed to do this in your environment). Note that you will need to trace not only the control connection (port 21) but also the data connection (since that carries the results of the LIST/NLST command). Wireshark is nice since it will perform the protocol-level analysis for you.
Good luck.
This is somewhat expected. FTPFileListProtocol isn't able to understand every FTP output, because, well, some are wacky. As explained in the docstring:
If you need different evil for a wacky FTP server, you can
override either C{fileLinePattern} or C{parseDirectoryLine()}.
In this case, it may be a bug: maybe you can improve fileLinePattern and make it understand filenames with spaces. If so, you're welcome to open a bug in the Twisted tracker.
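As a starting point, a hypothetical skeleton of that kind of override might look like this (it only hooks unknownLine, which the stock parser calls for lines it can't match; the class name and the print are placeholders):

from twisted.protocols.ftp import FTPFileListProtocol

class LenientFileListProtocol(FTPFileListProtocol):
    def unknownLine(self, line):
        # Called for listing lines the default fileLinePattern doesn't recognize;
        # parse the file/directory name out of such lines manually here.
        print('unparsed listing line: %r' % (line,))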