Why are CherryPy object attributes persistent between requests?

Why are CherryPy object attributes persistent between requests? - python

I was writing debugging methods for my CherryPy application. The code in question was (very) basically equivalent to this:
import cherrypy
class Page:
def index(self):
try:
self.body += 'okay'
except AttributeError:
self.body = 'okay'
return self.body
index.exposed = True
cherrypy.quickstart(Page(), config='root.conf')
I was surprised to notice that from request to request, the output of self.body grew. When I visited the page from one client, and then from another concurrently-open client, and then refreshed the browsers for both, the output was an ever-increasing string of "okay"s. In my debugging method, I was also recording user-specific information (i.e. session data) and that, too, showed up in both users' output.
I'm assuming that's because the python module is loaded into working memory instead of being re-run for every request.
My question is this: How does that work? How is it that self.debug is preserved from request to request, but cherrypy.session and cherrypy.response aren't?
And is there any way to set an object attribute that will only be used for the current request? I know I can overwrite self.body per every request, but it seems a little ad-hoc. Is there a standard or built-in way of doing it in CherryPy?
(second question moved to How does CherryPy caching work?)

synthesizerpatel's analysis is correct, but if you really want to store some data per request, then store it as an attribute on cherrypy.request, not in the session. The cherrypy.request and .response objects are new for each request, so there's no fear that any of their attributes will persist across requests. That is the canonical way to do it. Just make sure you're not overwriting any of cherrypy's internal attributes! cherrypy.request.body, for example, is already reserved for handing you, say, a POSTed JSON request body.
For all the details of exactly how the scoping works, the best source is the source code.

You hit the nail on the head with the observation that you're getting the same data from self.body because it's the same in memory of the Python process running CherryPy.
self.debug maintains 'state' for this reason, it's an attribute of the running server.
To set data for the current session, use cherrypy.session['fieldname'] = 'fieldvalue', to get data use cherrypy.session.get('fieldname').
You (the programmer) do not need to know the session ID, cherrypy.session handles that for you -- the session ID is automatically generated on the fly by cherrypy and is persisted by exchanging a cookie between the browser and server on subsequent query/response interactions.
If you don't specify a storage_type for cherrypy.session in your config, it'll be stored in memory (accessible to the server and you), but you can also store the session files on disk if you wish which might be a handy way for you to debug without having to write a bunch of code to dig out session IDs or key/pair values from the running server.
For more info check out http://www.cherrypy.org/wiki/CherryPySessions

Related

Session object in python

Does a Session object maintain the same TCP connection with a client?
In the code below, a request from the client is submitted to a handler, the handler creates a sessions object why does session["count"] on an object give a dictionary?
A response is then given back to the client, upon another request is the code re-executed?
So that another session object is created?
How does the session store the previous count information if it did not return a cookie to the client?
from appengine_utilities import sessions
class SubmitHandler(webapp.RequestHandler):
def get(self):
session = sessions.Session()
if "count" in session:
session["count"]=session["count"]+1
else:
session["count"]=1
template_values={'message':"You have clicked:"+str(session["count"])}
# render the page using the template engine
path = os.path.join(os.path.dirname(__file__),'index.html')
self.response.out.write(template.render(path,template_values))

You made several questions so let's go one by one:
Sessions are not related to TCP connections. A TCP connection is maintained when both client and server agreed upon that using the HTTP Header keep-alive. (Quoted from Pablo Santa Cruz in this answer).
Looking at the module session.py in line 1010 under __getitem__ definition I've found the following TODO: It's broke here, but I'm not sure why, it's returning a model object. Could be something along these lines, I haven't debug it myself.
From appengine_utilities documentation sessions are stored in Datastore and Memcache or kept entirely as cookies. The first option also involves sending a token to the client to identify it in subsequent requests. Choosing one or another depends on your actual settings or the default ones if you haven't configured your own. Default settings are defined to use the Datastore option.
About code re-execution you could check that yourself adding some logging code to count how many times is the function executed.
Something important, I have noticed that this library had it's latest update on 2nd of January 2016, so it has gone unmaintained for 4 years. It would be best if you change to an up to date library, for example the webapp2 session module. Furthermore, Python 2 is sunsetting by this year (1st January 2020) so you might consider switching to python 3 instead.
PD: I found the exact code you posted under this website. In case you took it from there consider next time to include a reference/citation to it's origin.

Can you hook pyramid into twisted, skipping the wsgi part?

Given that :
WSGI doesn't play very well with async.
Twisted ergonomics suck.
Pyramid is very clean and component oriented.
How could I use Pyramid and Twisted ?
I can imagine making a twisted protocol to get the raw HTML request. But then I can't see how to parse it into a pyramid request objects. All documented pyramid tools seems to expect some wsgi interface at some point.
I could use waitress code to parse the request and turn it into a WSGI env then pass the env to pyramid but that's a lot of work with many issues I'm sure I can't even imagine down the road.
I know twisted includes a WSGI server, but it implies synchronicity in the app code, which does not serve my purpose. I want to be able to use the request and response objects, renderers, routers, and others pyramid tools in a twisted asynchronous protocol, with an asynchronous, non blocking app code as well. Hence I won't want to use WSGI.
Twisted API is verbose, heavy and uninuitive compared to any other asynchronous toolkit you'll find in Python or even other languages. Hence the critic about its ergonomics. I can use it, but training newcomers in my team to do it has a high cost. I wish to lower it.
Indeed, it packs a lot of power that I want to use.
To elaborate on my needs, I'm building a tool using crossbar.io and cyclone to have a WAMP/HTTP framework a bit friendlier to my team that the current tools. But cyclone is not as complete as pyramid, and I was hoping pyramid components were decoupled enough that the WSGI paradigm was not enforced, so I could leverage the tremendous work they did on it. All I need is an entry point : somewhere to get the HTML, and parse it into a request objet, and somewhere to take a response object, and returns HTML to the client. I wish i don't have to write a protocol manually for this, http is tricky and I'm sure I'll get it wrong in many ways.
One precision : i don't wish to use the full pyramid framework, just some components here and there, such as rooting, cookie parsing, CSRF protection, etc. I won't use their view system for it assumes a synchronous API.
Looking at Pyramid, I can see that it expects the entire request be be parsed and turned into a request object. it also returns the response as an object as well. So a part of the problem, to hook twisted and pyramid together, is to :
get the http request text as one big chunk from twisted;
parse it into the request object somehow (couldn't find a simple function to do this, but if I can turn it into an WSGI environ + request object, pyramid can convert it to it's format).
get the pyramid response object and turn it into a generator of strings (an adaptor can be find since that's what WSGI does).
Send the response back with twisted from this generator of strings.
Alternatives can be to use something simpler than pyramid like werkzeug for the glue.

Twisted Web lets you interpret HTTP request bodies (regardless of content-type, HTML or otherwise) incrementally as they're received - but it doesn't make doing so very easy. There's a very old ticket that we never seem to make much progress on for improving this situation. Until it's resolved, there probably isn't a better answer than the one I'm about to give. This incremental HTTP request body delivery, I think, is what you're looking for here (because you said you expect requests to "be a big HTML chunk").
The hook for incremental request body handling is Request.handleContentChunk. You can see a complete demonstration of its use in my answer to Python server for streaming request body content.
This gives you the data as it arrives at the server. If you want to use Pyramid, you'll have to construct a Pyramid request that uses this data. Most of the initialization of the Pyramid request object should be straightforward (eg filling the environ dictionary with the request headers - you can take these from Request.requestHeaders). The slightly trickier part will be initializing the Pyramid request object's body - which is supposed to be a file-like object that provides synchronous access to the request body.
On the one hand, if you dispatch the request before the request body has been completely received then you avoid the cost of buffering the entire request body in memory. On the other hand, if you let application code begin to read the request body then you have to deal with the circumstance that it tries to read beyond the point in the data which has actually arrived at the server. This can be dealt with. The body file-like object is expected to present a blocking interface. All you have to do is block until the data is available.
Here's a brief (incomplete, not meant to actually work) sketch of what I mean:
# XXX Note: Queue is not actually thread-safe. Use a safer primitive.
from Queue import Queue
class Body(object):
def __init__(self):
self._buffer = Queue()
self._pending = b""
self._eof = False
def read(self, how_many):
if self._eof:
return b""
if self._pending == b"":
data = self._buffer.get()
if data is None:
self._eof = True
return b""
else:
self._pending = data
if self._pending is None:
result = self._pending[:how_many]
self._pending = self._pending[how_many:]
return result
def _add_data(self, data):
self._buffer.put(data)
You can create an instance of this type, initialize the Pyramid request object's body attribute with it, and then call _add_data on it in the Twisted Request class's handleContentChunk callback.
You could also implement this as an enhancement to Twisted's own WSGI server. For the sake of simplicity, Twisted's WSGI server does read the entire request body before dispatching the request to the WSGI application - but it doesn't have to. If this is the only problem with WSGI then it'd be better to improve the quality of the WSGI implementation and keep the interface rather than both implementing the improvement and stepping outside of the interface (tying you more closely to both Twisted and Pyramid - unnecessarily).
The second half of the problem, generating response bodies incrementally, shouldn't really be a problem. Twisted's WSGI container will write out response data as the WSGI application object yields it. Or if you use twisted.web.resource instead of the WSGI interface, you can call request.write as many times as you like, at any time you like (up until you call request.finish). The only trick is that if you want to do this you must return NOT_DONE_YET from the render method.

Flask: setting application and request-specific attributes?

I am writing an application which connects to a database. I want to create that db connection once, and then reuse that connection throughout the life of the application.
I also want to authenticate users. A user's auth will live for only the life of a request.
How can I differentiate between objects stored for the life of a flask app, versus specific to the request? Where would I store them so that all modules (and subsequent blueprints) have access to them?
Here is my sample app:
from flask import Flask, g
app = Flask(__name__)
#app.before_first_request
def setup_database(*args, **kwargs):
print 'before first request', g.__dict__
g.database = 'DATABASE'
print 'after first request', g.__dict__
#app.route('/')
def index():
print 'request start', g.__dict__
g.current_user = 'USER'
print 'request end', g.__dict__
return 'hello'
if __name__ == '__main__':
app.run(debug=True, port=6001)
When I run this (Flask 0.10.1) and navigate to http://localhost:6001/, here is what shows up in the console:
$ python app.py
* Running on http://127.0.0.1:6001/
* Restarting with reloader
before first request {}
after first request {'database': 'DATABASE'}
request start {'database': 'DATABASE'}
request end {'current_user': 'USER', 'database': 'DATABASE'}
127.0.0.1 - - [30/Sep/2013 11:36:40] "GET / HTTP/1.1" 200 -
request start {}
request end {'current_user': 'USER'}
127.0.0.1 - - [30/Sep/2013 11:36:41] "GET / HTTP/1.1" 200 -
That is, the first request is working as expected: flask.g is holding my database, and when the request starts, it also has my user's information.
However, upon my second request, flask.g is wiped clean! My database is nowhere to be found.
Now, I know that flask.g used to apply to the request only. But now that it is bound to the application (as of 0.10), I want to know how to bind variables to the entire application, rather than just a single request.
What am I missing?
edit: I'm specifically interested in MongoDB - and in my case, maintaining connections to multiple Mongo databases. Is my best bet to just create those connections in __init__.py and reuse those objects?

flask.g will only store things for the duration of a request. The documentation mentioned that the values are stored on the application context rather than the request, but that is more of an implementation issue: it doesn't change the fact that objects in flask.g are only available in the same thread, and during the lifetime of a single request.
For example, in the official tutorial section on database connections, the connection is made once at the beginning of the request, then terminated at the end of the request.
Of course, if you really wanted to, you could create the database connection once, store it in __init__.py, and reference it (as a global variable) as needed. However, you shouldn't do this: the connection could close or timeout, and you could not use the connection in multiple threads.
Since you didn't specify HOW you will be using Mongo in Python, I assume you will be using PyMongo, since that handles all of the connection pooling for you.
In this case, you would do something like this...
from flask import Flask
from pymongo import MongoClient
# This line of code does NOT create a connection
client = MongoClient()
app = Flask()
# This can be in __init__.py, or some other file that has imported the "client" attribute
#app.route('/'):
def index():
posts = client.database.posts.find()
You could, if you wish, do something like this...
from flask import Flask, g
from pymongo import MongoClient
# This line of code does NOT create a connection
client = MongoClient()
app = Flask()
#app.before_request
def before_request():
g.db = client.database
#app.route('/'):
def index():
posts = g.db.posts.find()
This really isn't all that different, however it can be helpful for logic that you want to perform on every request (such as setting g.db to a specific database depending on the user that is logged in).
Finally, you can realize that most of the work of setting up PyMongo with Flask is probably done for you in Flask-PyMongo.
Your other question deals with how you keep track of stuff specific to the user that is logged in. Well, in this case, you DO need to store some data that sticks around with the connection. flask.g is cleared at the end of the reuquest, so that's no good.
What you want to use is sessions. This is a place where you can store values that is (with the default implementation) stored in a cookie on the user's browser. Since the cookie will be passed along with every request the user's browser makes to your web site, you will have available the data you put in the session.
Keep in mind, though, that the session is NOT stored on the server. It is turned into a string that is passed back and forth to the user. Therefore, you can't store things like DB connections onto it. You would instead store identifiers (like user IDs).
Making sure that user authentication works is VERY hard to get right. The security concerns that you need to make sure of are amazingly complex. I would strongly recommend using something like Flask-Login to handle this for you. You can still use the session for storing other items as needed, or you can let Flask-Login handle determining the user ID and store the values you need in the database and retrieving them from the database in every request.
So, in summary, there are a few different ways to do what you want to do. Each have their usages.
Globals are good for items that are thread-safe (such as the PyMongo's MongoClient).
flask.g can be used for storing data in the lifetime of a request. With SQLAlchemy-based flask apps, a common thing to do is to ensure that all changes happen at once, at the end of a request using an after_request method. Using flask.g for something like this is very helpful.
The Flask session can be used to store simple data (strings and numbers, not connection objects) that can be used on subsequent requests that come from the same user. This is entirely dependent on using cookies, so at any point the user could delete the cookie and everything in the "session" will be lost. Therefore, you probably want to store much of your data in databases, with the session used to identify the data that relates to the user in the session.

"bound to the application" does not mean what you think it means. It means that g is bound to the currently running request. Quoth the docs:
Flask provides you with a special object that ensures it is only valid for the active request and that will return different values for each request.
It should be noted that Flask's tutorials specifically do not persist database objects, but that this is not normative for any application of substantial size. If you're really interested in diving down the rabbit hole, I suggest a database connection pooling tool. (such as this one, mentioned in the SO answer ref'd above)

I suggest you use session to manage user information. Sessions help you keep information b/w multiple requests and flask provides you a session framework already.
from flask import session
session['usename'] = 'xyz'
Look at the extension Flask-Login. It is well designed to handle user authentications.
For database, I suggest looking at Flask-SQLAlchemy extension. This takes care of initialization, pooling, teardowns etc. for you out of the box. All you need to do is define the database URI in a config and bind it to the application.
from flask.ext.sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
db = SQLAlchemy(app)

Class instance in Python Bottle application, is it shared between threads/processes?

I have created a class in a Bottle application which handles and stores URL information and is created each time a http request is made:
#route('/<fullurl:path>')
def page_req(fullurl=''):
urlData = urlReq(request.urlparts[1], fullurl)
urlData is the instance name and urlReq is the class name.
Obviously the urlData instance will contain information generated from one request. I'm just wondering what happens if another request comes in before the cycle of the first request has finished and sent its output. Will the second request change the data in urlData or will there be two separate processes each with their own version of urlData?
I've been reading the WSGI processes/threads information and the Bottle docs all afternoon and it's still not immediately clear. I have tried writing a small automated script fire multiple requests at the development server but it seems to hold excess requests off til one has finished. Hope I've been clear enough.

bottle.request is a thread-safe instance of LocalRequest(). If accessed from within a request callback, this instance always refers to the current request (even on a multithreaded server).
see http://bottlepy.org/docs/dev/api.html#bottle.request

Accessing Request from init.py in Pyramid

When setting up a Pyramid app and adding settings to the Configurator, I'm having issues understanding how to access information from request, like request.session and such. I'm completely new at using Pyramid and I've searched all over the place for information on this but found nothing.
What I want to do is access information in the request object when sending out exception emails on production. I can't access the request object, since it's not global in the __init__.py file when creating the app. This is what I've got now:
import logging
import logging.handlers
from logging import Formatter
config.include('pyramid_exclog')
logger = logging.getLogger()
gm = logging.handlers.SMTPHandler(('localhost', 25), 'email#email.com', ['email#email.com'], 'Error')
gm.setLevel(logging.ERROR)
logger.addHandler(gm)
This works fine, but I want to include information about the logged in user when sending out the exception emails, stored in session. How can I access that information from __init__.py?

Attempting to make request a global variable, or somehow store a pointer to "current" request globally (if that's what you're going to try with subscribing to NewRequest event) is not a terribly good idea - a Pyramid application can have more than one thread of execution, so more than one request can be active within a single process at the same time. So the approach may appear to work during development, when the application runs in a single thread mode and just one user accesses it, but produce really funny results when deployed to a production server.
Pyramid has pyramid.threadlocal.get_current_request() function which returns thread-local request variable, however, the docs state that:
This function should be used extremely sparingly, usually only in unit
testing code. it’s almost always usually a mistake to use
get_current_request outside a testing context because its usage makes
it possible to write code that can be neither easily tested nor
scripted.
which suggests that the whole approach is not "pyramidic" (same as pythonic, but for Pyramid :)
Possible other solutions include:
look at exlog.extra_info parameter which should include environ and params attributes of the request into the log message
registering exception views would allow completely custom processing of exceptions
Using WSGI middleware, such as WebError#error_catcher or Paste#error_catcher to send emails when an exception occurs
if you want to log not only exceptions but possibly other non-fatal information, maybe just writing a wrapper function would be enough:
if int(request.POST['donation_amount']) >= 1000000:
send_email("Wake up, we're rich!", authenticated_userid(request))

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.