Odd Python WSGI behavior resulting in double iteration - python

The code below runs a simple WSGI app that increments both a local counter and a global counter. The purpose of this test is to make sure the app object is created fresh for each HTTP request, so the expected behavior is for the global counter to increment each request while the local counter should always be ‘1’, because it is only incremented once for each instantiation of the app class.
Actual result? Pretty much what I expected, but the surprise is that the global counter is getting incremented TWICE for each HTTP request after the first one. (The sequence is something like [1, 3, 5, 7, 9, ...].) I know that the __iter__() method is only being called once per app object instantiation because the local counter is always ‘1’. So what’s going on?
This may not matter as all I really care about is making sure the WSGI container always creates a new object for each request and uses it only once, but it is really odd. I would like to know why.
I do not like unexplained side effects. Can someone give me some insight?
gctr = 0
class app(object):
    html = '''<html><head><title>Simple WSGI App Class Test</title></head>
<body><h2>Simple WSGI App Class Test</h2>
<p>module counter = {}</p>
<p>global counter = {}</p>
</body></html>'''
    ctr = 0
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response
    def __iter__(self):
        self.start('200 OK', [('Content-Type', 'text/html')])
        global gctr
        gctr += 1
        self.ctr += 1
        yield self.html.format(self.ctr, gctr)
if __name__ == '__main__':
    # Using simple_server
    from wsgiref.simple_server import make_server
    srv = make_server('localhost', 8080, app)
    srv.serve_forever()

I expect this is your browser making another request alongside the main one, which is also getting captured by your WSGI app - probably for favicon.ico. Log self.environ['PATH_INFO'] to see the exact request.
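A minimal sketch of that logging suggestion, written for Python 3 (so the body is bytes) and driven without a real server. The names seen_paths, logging_app and fake_request are made up for this illustration, and the two paths simply imitate what a browser typically requests:

```python
from wsgiref.util import setup_testing_defaults

seen_paths = []  # hypothetical log of every path the app was asked for


def logging_app(environ, start_response):
    """Record PATH_INFO for each request, then answer with a tiny body."""
    seen_paths.append(environ["PATH_INFO"])
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def fake_request(app, path):
    """Call a WSGI app directly, the way a server would, without a socket."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    return b"".join(app(environ, lambda status, headers: None))


# A browser typically asks for the page and then for the site icon.
fake_request(logging_app, "/")
fake_request(logging_app, "/favicon.ico")
print(seen_paths)
```

If the log shows two paths per page load, the second increment comes from a second request and not from a second iteration of the same app object.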

Related

python- split() doesn't work inside __init__

I am writing code to serve an HTML file using WSGI. When I write a straightforward function, I get no error:
from wsgiref.simple_server import make_server
import os
...
...
def app(environ, start_response):
    path_info = environ["PATH_INFO"]
    resource = path_info.split("/")[1]  # I get no error here; the split works totally fine.
Now when I try to put the code inside a class, I get the error NoneType has no attribute split.
Perhaps the environ inside __init__ doesn't get initialised, and that's why the split fails. The following is the file in which my class Candy resides:
import os
class Candy:
    def __init__(self):
        #self.environ = environ
        #self.start = start_response
        self.status = "200 OK"
        self.headers = []
    def __call__(self , environ , start_response):
        self.environ = environ
        self.start = start_response
    #headers = []
    def content_type(path):
        if path.endswith(".css"):
            return "text/css"
        else:
            return "text/html"
    def app(self):
        path_info = self.environ["PATH_INFO"]
        resource = path_info.split("/")[1]
        #headers = []
        self.headers.append(("Content-Type", content_type(resource)))
        if not resource:
            resource = "login.html"
        resp_file = os.path.join("static", resource)
        try:
            with open(resp_file, "r") as f:
                resp_file = f.read()
        except Exception:
            self.start("404 Not Found", self.headers)
            return ["404 Not Found"]
        self.start("200 0K", self.headers)
        return [resp_file]
The following is the server.py file where I invoke make_server:
from wsgiref.simple_server import make_server
from candy import Candy
#from app import candy_request
candy_class = Candy()
httpd = make_server('localhost', 8000, candy_class.app)
print "Serving HTTP on port 8000..."
# Respond to requests until process is killed
httpd.serve_forever()
# Alternative: serve one request, then exit
#httpd.handle_request()
Any help? How can I get this error sorted out, and am I right in my assumption?
To explain what you're doing wrong here, let's start with a simple concept: what a WSGI application is.
A WSGI application is just a callable that receives a request environment and a callback function that starts the response (sends the status line and headers back to the user). This callable must then return one or more strings that constitute the response body.
In its simplest form, which you already have working, it's just
from wsgiref.simple_server import make_server
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return "hello, world"
make_server('localhost', 8000, app).serve_forever()
Whenever a request comes in, the app function gets called; it starts the response and returns a string (or it could return an iterable of multiple strings, e.g. ["hello, ", "world"]).
Now, if you want it to be a class, it works like this:
class MyApp(object):
    def __init__(self):
        pass
    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return "something"
app = MyApp()
make_server("localhost", 8000, app).serve_forever()
In this case, the callable is app, and what actually runs is the __call__ method of the MyApp class instance.
When a request comes in, app.__call__ gets called (__call__ is the magic method that makes your class instance callable), and otherwise it works exactly like the app function from the first example. The difference is that you have a class instance (with self), so you can do some pre-configuration in the __init__ method. Without anything to do in __init__, the class buys you nothing. A more realistic example would be this:
class MyApp(object):
    def __init__(self):
        self.favorite_color = "blue"
    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return "my favorite color is {}".format(self.favorite_color)
...
Then, there's another thing. Sometimes you want a streaming response, generated over time. Maybe it's big, or maybe it takes a while. That's why WSGI applications can return an iterable, rather than just a string.
import time
def app(environ, start_response):
    # Start the response, then produce the body one chunk at a time.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield "This was a triumph\n"
    time.sleep(1)
    yield "I'm making a note here\n"
    time.sleep(1)
    yield "HUGE SUCCESS\n"
make_server("localhost", 8000, app).serve_forever()
This function returns a generator that produces the text piece by piece. Your browser may not always show it that way, so try running curl http://localhost:8000/.
Now, the same with classes would be:
class MyApp(object):
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response
    def __iter__(self):
        self.start("200 OK", [("Content-Type", "text/plain")])
        yield "This was a triumph\n"
        time.sleep(1)
        yield "I'm making a note here\n"
        time.sleep(1)
        yield "HUGE SUCCESS\n"
make_server("localhost", 8000, MyApp).serve_forever()
Here, you pass MyApp (the class) as the application callable, which it is. When a request comes in, the class gets called, as if someone had written MyApp(environ, start_response) somewhere, so __init__ runs and creates an instance for this specific request. Then, as the instance is iterated, __iter__ starts to produce the response. After it's done, the instance is discarded.
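To make that lifecycle visible, here is a rough Python 3 sketch of what a server does with one request. serve_once is a made-up, heavily simplified stand-in for a real WSGI server, and the instances counter exists only for the demonstration:

```python
class MyApp(object):
    instances = 0  # hypothetical counter, only here to make the lifecycle visible

    def __init__(self, environ, start_response):
        MyApp.instances += 1
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        self.start("200 OK", [("Content-Type", "text/plain")])
        yield b"This was a triumph\n"
        yield b"HUGE SUCCESS\n"


def serve_once(application, environ):
    """Roughly what a WSGI server does with one request (greatly simplified)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    result = application(environ, start_response)  # for a class: runs __init__
    body = b"".join(result)                        # iterating runs __iter__
    return captured["status"], body


status, body = serve_once(MyApp, {"PATH_INFO": "/"})
```

Note that start_response is only called once iteration begins, which is why the sketch reads the captured status after joining the body.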
Basically, that's it. Classes here are only convenient closures that hold data. If you don't need them, don't use classes; use plain functions. Flat is better than nested.
Now, about your code.
What your code uses as the callable is Candy().app. This doesn't work because app isn't written to receive the environ and start_response arguments it will be passed. It should probably fail with a 500 error, saying something like app() takes 1 positional argument but 3 were given.
I assume the code in your question was modified after you got that NoneType has no attribute split issue, and that you had passed something to __init__ when creating candy_instance = Candy() while your __init__ still took 2 arguments (3 with self). I'm not even sure what exactly it was; it should have failed earlier.
Basically, you passed the wrong object to make_server, and your class was a mix of two different ideas.
I suggest you check my examples above (and read PEP 333), decide what you actually need, and structure your Candy class accordingly.
If you just need to return something on every request and you don't have persistent state, you don't need a class at all.
If you need persistent state (config or, maybe, a database connection), use a class instance with a __call__ method, and make that __call__ return the response.
If you need to respond in chunks, use either a generator function or a class with an __iter__ method. Or use a class with a __call__ that yields (just like a function).
Hope this helps.
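As one possible shape for the second option, here is a sketch of Candy restructured around __call__, written for Python 3 (bytes bodies). It is an illustration under assumptions and not the asker's final code: the static_dir parameter is invented here, and content_type is moved to module level so it can be called by name.

```python
import os


def content_type(path):
    """Pick a Content-Type from the file extension, as in the question."""
    return "text/css" if path.endswith(".css") else "text/html"


class Candy(object):
    def __init__(self, static_dir="static"):
        # Persistent state lives on the instance; static_dir is a hypothetical knob.
        self.static_dir = static_dir

    def __call__(self, environ, start_response):
        # environ arrives here, once per request, not in __init__.
        resource = environ["PATH_INFO"].split("/")[1] or "login.html"
        # Build the headers per request instead of appending to a shared list.
        headers = [("Content-Type", content_type(resource))]
        try:
            with open(os.path.join(self.static_dir, resource), "rb") as f:
                body = f.read()
        except IOError:
            start_response("404 Not Found", headers)
            return [b"404 Not Found"]
        start_response("200 OK", headers)
        return [body]
```

It would be handed to the server as an instance, e.g. make_server('localhost', 8000, Candy()), so that the server calls the instance for every request.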

Python Tornado updating shared data between requests

I have a Python Tornado app. The app contains request handlers, to which I am passing data like this (the code below is not complete and is just to illustrate what I want):
configs = {'some_data': 1, # etc.
}
class Application(tornado.web.Application):
    def __init__(self):
        handlers = [('/pageone', PageOneHandler, configs),
                    ('/pagetwo', PageTwoHandler, configs)]
        settings = dict(template_path='/templates',
                        static_path='/static', debug=False)
        tornado.web.Application.__init__(self, handlers, **settings)
# Run the instance
# ... code goes here ...
application = Application()
http_server = tornado.httpserver.HTTPServer(application)
# ... other code (bind to port, etc.)
# Callback function to update configs
some_time_period = 1000 # Once a second
tornado.ioloop.PeriodicCallback(update_configs, some_time_period).start()
tornado.ioloop.IOLoop.instance().start()
I want the update_configs function to update the configs variable defined above and have this change propagate through the handlers. For example (I know this doesn't work):
def update_configs():
    configs['some_data'] += 1
# Now, assuming PageOneHandler just prints out configs['some_data'], I'd expect
# the output to be: "1" on the first load, "2" if I load the page a second
# later, "4" if I load the page two seconds after that, etc.
The problem is, the configs variable is passed along to the handlers during creation in the constructor for the Application class. How can I update configs['some_data'] in the periodic callback function?
My actual use case for this mechanism is to refresh the data stored in the configs dictionary from the database every so often.
Is there an easy way to do this without fiddling around with application.handlers (which I have tried for the past hour or so)?
Well, the simplest thing would be to pass the entire config dict to the handlers, rather than just the individual values inside the dict. Because dicts are mutable, any change you make to the values in the dict would then propagate to all the handlers:
import tornado.web
import tornado.httpserver
import tornado.ioloop
configs = {'some_data': 1, # etc.
}
def update_configs():
    print("updating")
    configs['some_data'] += 1
class PageOneHandler(tornado.web.RequestHandler):
    def initialize(self, configs):
        self.configs = configs
    def get(self):
        self.write(str(self.configs) + "\n")
class PageTwoHandler(tornado.web.RequestHandler):
    def initialize(self, configs):
        self.configs = configs
    def get(self):
        self.write(str(self.configs) + "\n")
class Application(tornado.web.Application):
    def __init__(self):
        handlers = [('/pageone', PageOneHandler, {'configs' : configs}),
                    ('/pagetwo', PageTwoHandler, {'configs': configs})]
        settings = dict(template_path='/templates',
                        static_path='/static', debug=False)
        tornado.web.Application.__init__(self, handlers, **settings)
# Run the instance
application = Application()
http_server = tornado.httpserver.HTTPServer(application)
http_server.listen(8888)
# Callback function to update configs
some_time_period = 1000 # Once a second
tornado.ioloop.PeriodicCallback(update_configs, some_time_period).start()
tornado.ioloop.IOLoop.instance().start()
Output:
dan#dantop:~> curl localhost:8888/pageone
{'some_data': 2}
dan#dantop:~> curl localhost:8888/pageone
{'some_data': 3}
dan#dantop:~> curl localhost:8888/pagetwo
{'some_data': 4}
dan#dantop:~> curl localhost:8888/pageone
{'some_data': 4}
To me this approach makes the most sense; the data contained in configs doesn't really belong to any one instance of a RequestHandler. It's global state shared by all RequestHandlers, as well as your PeriodicCallback. So I don't think it makes sense to create X copies of that state and then try to keep all those copies in sync manually. Instead, just share the state across your whole process using either a custom object with class variables or a dict, as shown above.
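The mutability argument can be checked without Tornado at all. In this small sketch the two classes are hypothetical stand-ins for handlers: one keeps a reference to the dict, the other keeps only the value it saw at construction time.

```python
configs = {"some_data": 1}


class SharesDict(object):
    """Keeps a reference to the dict, like initialize(self, configs) above."""
    def __init__(self, configs):
        self.configs = configs

    def read(self):
        return self.configs["some_data"]


class CopiesValue(object):
    """Keeps only the value that was current at construction time."""
    def __init__(self, some_data):
        self.some_data = some_data

    def read(self):
        return self.some_data


shared = SharesDict(configs)
copied = CopiesValue(configs["some_data"])

configs["some_data"] += 1  # what update_configs() does

print(shared.read())  # 2
print(copied.read())  # 1
```

One caveat follows from the same reasoning: the sharing only holds while the dict is updated in place. Rebinding the name (configs = new_dict) inside the refresh function would leave the handlers looking at the old object, so a database refresh should use something like configs.update(new_values).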
Another strategy, in addition to what dano mentions above, is to attach the shared data to the Application object.
import tornado.httpserver
import tornado.ioloop
import tornado.web
class MyApplication(tornado.web.Application):
    def __init__(self):
        self.shared_attribute = foo  # foo is a placeholder for the object you want to share
        handlers = []  # your handlers here
        settings = dict()  # your application settings here
        super().__init__(handlers, **settings)
server = tornado.httpserver.HTTPServer(MyApplication())
server.listen(8888)
tornado.ioloop.IOLoop.instance().start()
Next you can access shared_attribute defined above in all your request handlers using self.application.shared_attribute.
You update it at one place and it immediately reflects in all your subsequent calls to the request handlers.

Use variables instead of session objects

Can someone please tell me the difference between
import web
urls = (
    "/count", "count",
    "/reset", "reset")
app = web.application(urls, locals())
store = web.session.DiskStore('sessions')
session = web.session.Session(app, store, initializer={'count': 0})
class count:
    def GET(self):
        session.count += 1
        return str(session.count)
class reset:
    def GET(self):
        session.kill()
        return ""
if __name__ == "__main__":
    app.run()
and
import web
urls = (
    "/count", "count",
    "/reset", "reset")
app = web.application(urls, locals())
class count:
    counting = 0
    def GET(self):
        count.counting += 1
        return str(count.counting)
class reset:
    def GET(self):
        count.counting = 0
        return ""
if __name__ == "__main__":
    app.run()
The output of both is exactly the same as far as I can tell. If there is no difference, then what is the advantage of using Session objects over variables like this?
I am pretty new to Python and going through Zed Shaw's Learn Python the Hard Way. I was on exercise 52 where he introduces sessions when this question popped into my head.
In the second instance, all browsers that connect to your application share a single counter. In the first instance, each browser counts according to its own session.
And, as kubked pointed out, the counter is persisted on disk in the first instance.
Reference: http://webpy.org/cookbook/sessions
As far as I can see in https://gitorious.org/lpthw-web/lpthw-web/source/b1ab4df58746c5d4e3dfb41e502a8192caec3ef1:web/session.py, a DiskStore session keeps the session in the file system. If the server crashes, you can run it again and the open sessions will still be stored. If you keep the data in a variable, it lives in RAM, so a server crash loses it.
Of course, keeping user data in a session will also be better when you decide to use threads.
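The shared-versus-per-browser difference can be imitated without web.py. This is only a model under assumptions: a plain dict stands in for the session store (the real DiskStore also writes each session to disk), and "alice" and "bob" are made-up session ids for two browsers.

```python
class SharedCounter(object):
    """Mimics the class-variable version: one counter for every visitor."""
    counting = 0

    @classmethod
    def hit(cls, session_id):
        cls.counting += 1  # session_id is ignored: everybody shares the value
        return cls.counting


class PerSessionCounter(object):
    """Mimics the session version: one counter per session id."""
    def __init__(self):
        self.sessions = {}  # stand-in for the session store

    def hit(self, session_id):
        self.sessions[session_id] = self.sessions.get(session_id, 0) + 1
        return self.sessions[session_id]


per_session = PerSessionCounter()
visits = ["alice", "bob", "alice"]  # two hypothetical browsers loading /count

shared_results = [SharedCounter.hit(v) for v in visits]
session_results = [per_session.hit(v) for v in visits]

print(shared_results)   # [1, 2, 3]
print(session_results)  # [1, 1, 2]
```

With a single browser the two versions are indistinguishable, which is presumably why the outputs looked the same in the question.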

Does web.py session / processor work when yield is used in handlers?

I have the following two handlers for a web.py setup:
class count1:
    def GET(self):
        s.session.count += 1
        return str(s.session.count)
class count2:
    def GET(self):
        s.session.count += 1
        yield str(s.session.count)
The app runs on web.py shipped cherrypy (app.run()) or gevent server.
urls = (
    "/count1", "count.count1",
    "/count2", "count.count2",
)
session = web.session.Session(app, web.session.DiskStore('sessions'), initializer={'count': 0})
s.session = session
app = web.application(urls, locals())
print "Main: setting count to 1"
from gevent.wsgi import WSGIServer
if __name__ == "__main__":
    usecherrypy = False
    if usecherrypy:
        app.run()
    else: # gevent wsgiserver
        wsgifunc = app.wsgifunc()
        server = WSGIServer(('0.0.0.0', 8080), wsgifunc, log=None)
        server.serve_forever()
Session works fine in the count1 case but not always in count2. The first time a /count2 page is loaded, the counter is increased once, but refreshing after that doesn't increase the counter in the session, i.e. the update to the session is never saved. What could be wrong here?
Webpy installed from pypi or latest from github behaves the same in this case.
After digging into the code, the actual reason seems to be this: when the handler uses yield, calling it only returns the generator object, and control then returns through all the enclosing processors (e.g. Session._processor, which calls _save in its finally block) before the handler body has run. Web.py makes sure that the generator is completely unrolled before returning the data to the client, but the unrolling happens after all processors have finished, which is completely different from the behavior of normal function handlers.
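The ordering problem can be reproduced in plain Python. This sketch is not web.py code: save_session_processor is a made-up stand-in for a processor with a finally block, and the events list just records what runs when.

```python
events = []


def save_session_processor(handler):
    """Stand-in for a processor that saves the session in a finally block."""
    try:
        return handler()
    finally:
        events.append("session saved")


def return_handler():
    events.append("counter incremented")
    return "1"


def yield_handler():
    events.append("counter incremented")
    yield "1"


save_session_processor(return_handler)
order_with_return = list(events)

del events[:]
result = save_session_processor(yield_handler)  # only creates the generator
"".join(result)                                 # the body runs here, after the save
order_with_yield = list(events)

print(order_with_return)  # ['counter incremented', 'session saved']
print(order_with_yield)   # ['session saved', 'counter incremented']
```

With the generator handler the save happens before the increment, so the change is made to a session that has already been written out.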
So the question is: are there any fixes or workarounds for this (apart from calling Session._save manually)?
Thanks in advance for any answers!
Maybe it happens because yield returns a generator and not a value.
Refs:
http://od-eon.com/blogs/calvin/python-yield-versus-return/
What does the "yield" keyword do in Python?

Error in Python2.7 takes exactly 2 arguments (1 given)

I am a beginner in Python. I wonder why it is throwing an error.
I am getting an error saying TypeError: client_session() takes exactly 2 arguments (1 given)
The client_session method returns the SecureCookie object.
I have this code here
from werkzeug.utils import cached_property
from werkzeug.contrib.securecookie import SecureCookie
from werkzeug.wrappers import BaseRequest, AcceptMixin, ETagRequestMixin,
class Request(BaseRequest):
    def client_session(self,SECRET_KEY1):
        data = self.cookies.get('session_data')
        print " SECRET_KEY " , SECRET_KEY1
        if not data:
            print "inside if data"
            cookie = SecureCookie({"SECRET_KEY": SECRET_KEY1},secret_key=SECRET_KEY1)
            cookie.serialize()
            return cookie
        print 'self.form[login.name] ', self.form['login.name']
        print 'data new' , data
        return SecureCookie.unserialize(data, SECRET_KEY1)
#and another
class Application(object):
    def __init__(self):
        self.SECRET_KEY = os.urandom(20)
    def dispatch_request(self, request):
        return self.application(request)
    def application(self,request):
        return request.client_session(self.SECRET_KEY).serialize()
    # This is our externally-callable WSGI entry point
    def __call__(self, environ, start_response):
        """Invoke our WSGI application callable object"""
        return self.wsgi_app(environ, start_response)
Usually, that means you're calling client_session as an unbound method, giving it only one argument. You should introspect a bit and look at what exactly the request you're using in the application() method is; maybe it is not what you expect it to be.
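As background on where the argument count in the message comes from, here is a small Python 3 sketch with a stripped-down, hypothetical Request class (no werkzeug involved). The count includes self, so "2 arguments (1 given)" means only the instance arrived and the key did not. Python 2.7 words the error as in the question, while Python 3 reports a missing positional argument instead.

```python
class Request(object):
    def client_session(self, secret_key):
        return "session for %s" % secret_key


request = Request()
bound_ok = request.client_session("s3cret")  # self is supplied implicitly

errors = []
for call in (
    lambda: request.client_session(),         # bound, but the key is missing
    lambda: Request.client_session(request),  # via the class, passing only the instance
):
    try:
        call()
    except TypeError as exc:
        errors.append(str(exc))

print(bound_ok)
print(errors)
```

Both failing calls hand the method exactly one argument (the instance), which is the situation the error message describes.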
To be sure what request is, you can always add a debug printout:
print "type: ", type(request)
print "methods: ", dir(request)
and I expect you'll see that request is the original Request class that werkzeug gives you...
Here, you're extending werkzeug's BaseRequest, and in application() you expect werkzeug to magically know about your own implementation of the BaseRequest class. But if you read the Zen of Python, you will know that "explicit is better than implicit", so Python never does things magically; you have to tell your library somehow that you made a change.
So after reading werkzeug's documentation, you can find out that this is actually the case:
The request object is created with the WSGI environment as first argument and will add itself to the WSGI environment as 'werkzeug.request' unless it’s created with populate_request set to False.
This may not be totally clear to people who don't know what werkzeug is and what the design logic behind it is.
But a simple Google lookup showed usage examples of BaseRequest:
http://werkzeug.pocoo.org/docs/exceptions/
https://github.com/mitsuhiko/werkzeug/blob/master/examples/i18nurls/application.py or
expand werkzeug useragent class
I only googled for "from werkzeug.wrappers import BaseRequest".
So now, you should be able to guess what's to be changed in your application. As you only gave a few parts of the application, I can't advise you exactly where/what to change.
