Class instance in Python Bottle application, is it shared between threads/processes? - python

I have created a class in a Bottle application which handles and stores URL information. An instance is created each time an HTTP request is made:
@route('/<fullurl:path>')
def page_req(fullurl=''):
    urlData = urlReq(request.urlparts[1], fullurl)
urlData is the instance name and urlReq is the class name.
Obviously the urlData instance will contain information generated from one request. I'm just wondering what happens if another request comes in before the cycle of the first request has finished and sent its output. Will the second request change the data in urlData or will there be two separate processes each with their own version of urlData?
I've been reading the WSGI processes/threads information and the Bottle docs all afternoon and it's still not clear. I have tried writing a small automated script to fire multiple requests at the development server, but it seems to hold excess requests off until one has finished. Hope I've been clear enough.

bottle.request is a thread-safe instance of LocalRequest(). If accessed from within a request callback, this instance always refers to the current request (even on a multithreaded server).
see http://bottlepy.org/docs/dev/api.html#bottle.request
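As for urlData itself: it is a local variable of page_req, so every call to the handler builds its own instance, whichever thread runs it. The following is a minimal sketch of that behaviour using only the standard library. It does not use Bottle; UrlReq is a hypothetical stand-in for the question's urlReq class, and two threads stand in for two overlapping requests.

```python
import threading


class UrlReq:
    """Hypothetical stand-in for the urlReq class from the question."""

    def __init__(self, host, fullurl):
        self.host = host
        self.fullurl = fullurl


def page_req(host, fullurl, results, barrier):
    # url_data is local to this call, so each "request" gets its own object.
    url_data = UrlReq(host, fullurl)
    # Wait until both simulated requests are in flight at the same time.
    barrier.wait()
    results[fullurl] = url_data


def simulate_two_requests():
    results = {}
    barrier = threading.Barrier(2)
    threads = [
        threading.Thread(target=page_req,
                         args=("example.com", path, results, barrier))
        for path in ("a/b", "c/d")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = simulate_two_requests()
```

Each simulated request ends up with a distinct object holding its own URL. Data is only shared if it is stored somewhere that outlives the call, such as a module-level variable or a class attribute.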

Related

Session object in python

Does a Session object maintain the same TCP connection with a client?
In the code below, a request from the client is submitted to a handler, and the handler creates a session object. Why does session["count"] on an object give a dictionary?
A response is then given back to the client. On another request, is the code re-executed, so that another session object is created?
How does the session store the previous count information if it did not return a cookie to the client?
from appengine_utilities import sessions

class SubmitHandler(webapp.RequestHandler):
    def get(self):
        session = sessions.Session()
        if "count" in session:
            session["count"] = session["count"] + 1
        else:
            session["count"] = 1
        template_values = {'message': "You have clicked:" + str(session["count"])}
        # render the page using the template engine
        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, template_values))
You asked several questions, so let's go one by one:
Sessions are not related to TCP connections. A TCP connection is kept open when both client and server agree to it using the HTTP keep-alive header. (Quoted from Pablo Santa Cruz in this answer.)
Looking at the module session.py, at line 1010 under the __getitem__ definition, I found the following TODO: "It's broke here, but I'm not sure why, it's returning a model object." It could be something along these lines; I haven't debugged it myself.
According to the appengine_utilities documentation, sessions are either stored in Datastore and Memcache or kept entirely as cookies. The first option also involves sending a token to the client to identify it in subsequent requests. Which one is used depends on your settings, or on the defaults if you haven't configured your own. The default settings use the Datastore option.
About code re-execution, you could check that yourself by adding some logging code to count how many times the function is executed.
Something important: I noticed that this library had its latest update on 2nd of January 2016, so it has gone unmaintained for 4 years. It would be best to switch to an up-to-date library, for example the webapp2 session module. Furthermore, Python 2 is being sunset this year (1st January 2020), so you might consider switching to Python 3 instead.
PS: I found the exact code you posted on this website. If you took it from there, consider including a reference/citation to its origin next time.
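The logging suggestion above could look roughly like the sketch below. It does not use App Engine: SubmitHandler here is a hypothetical plain class standing in for the webapp.RequestHandler subclass, and the counter is kept as a class attribute so that it survives across handler objects within one process.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("submit_handler")


class SubmitHandler:
    """Hypothetical stand-in for the handler class in the question."""

    calls = 0  # class attribute, shared by every handler object in this process

    def get(self):
        SubmitHandler.calls += 1
        log.info("get() executed %d time(s)", SubmitHandler.calls)
        return SubmitHandler.calls


# A new handler object is built for each simulated request.
counts = [SubmitHandler().get() for _ in range(3)]
```

If the log shows the count rising with each request, the handler code is being re-executed every time. A counter like this only covers a single process, so it would restart if the server started a fresh one.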

Use a flask session inside a python thread

How can I update a flask session inside a python thread? The below code is throwing this error:
*** RuntimeError: working outside of request context
import threading
from flask import Flask, session

app = Flask(__name__)

def test(ses):
    ses['test'] = "test"

@app.route('/test', methods=['POST', 'GET'])
def mytest():
    t = threading.Thread(target=test, args=(session, ))
    t.start()
When you execute t.start(), you are creating an independent thread of execution which is not synchronized with the execution of the main thread in any way.
The Flask session object is only defined in the context of a particular HTTP request.
What does the variable session mean in the second thread (t)?
When t executes, there is no guarantee that the user request from the main thread still exists or is in a modifiable state. Perhaps the HTTP request has already been fully handled in the main thread.
Flask detects that you are trying to manipulate an object that is dependent on a particular context, and that your code is not running in that context. So it raises an exception.
There are a variety of approaches to synchronizing output from multiple threads into a single request context but... what are you actually trying to do here?
None of the documentation I've seen really elaborates why this isn't possible in this framework - it's as if they have never heard of the use case.
In a nutshell, the built in session uses the user's browser (the cookie) as storage for the session - this is not what I understand sessions to be, and oh boy the security issues - don't store any secrets in there - the session is basically JSON encoded, compressed then set as a cookie - at least it's signed, I guess.
Flask-Session mitigates the security issues by behaving more like sessions do in other frameworks - the cookie is just an opaque identifier meaningful only in the back end - but the value changes every time the session changes, requiring the cookie be sent to the browser again - a background thread won't have access to the request when it's been completed a long time ago, so all you have is a one way transfer of data - out of the session and into your background task.
Might I suggest the baggage claim pattern? Your initial request handling function designates some key in some shared storage - a file on disk, a row in a database identified by some key, an object key in an in memory cache - whatever - and puts that in the session, then passes the session to your background process which can inspect the session for the location to place the results. Your subsequent request handling functions can then check this location for the results.
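A minimal sketch of the baggage claim pattern described above, using only the standard library. Everything in it is illustrative: a plain dict stands in for the Flask session, an in-memory dict named claims stands in for the shared storage, and the function names are hypothetical. Only the claim key is handed to the background thread, since the session itself is tied to the request.

```python
import threading
import uuid

# Hypothetical shared storage; a database row, file, or cache in practice.
claims = {}
claims_lock = threading.Lock()


def background_task(claim_key):
    # The worker never touches the session; it only knows where to put results.
    result = "done"
    with claims_lock:
        claims[claim_key] = result


def start_request(session):
    # First request: pick a claim key, remember it in the session, start work.
    claim_key = uuid.uuid4().hex
    session["claim_key"] = claim_key
    worker = threading.Thread(target=background_task, args=(claim_key,))
    worker.start()
    return worker


def later_request(session):
    # Subsequent request: look up the result by the key kept in the session.
    with claims_lock:
        return claims.get(session.get("claim_key"))


session = {}  # plain dict standing in for the Flask session
worker = start_request(session)
worker.join()  # only so the example is deterministic
result = later_request(session)
```

In a real application the later request would not join the worker; it would simply find either nothing yet or the finished result at the claim location.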

Does Google App Engine run one instance of an app per one request? or for all requests?

Using google app engine:
# more code ahead not shown
application = webapp.WSGIApplication([('/', Home)],
                                     debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()
If two different users request the webpage on two different machines, will two individual instances of the server be invoked?
Or is just one instance of the server running all the time, handling all the requests?
What if one user opens the webpage twice in the same browser?
Edit:
According to the answers below, one instance may handle requests from different users in turn. Then consider the following fragment of code, taken from the example Google gave:
class User(db.Model):
    email = db.EmailProperty()
    nickname = db.StringProperty()
1. Are email and nickname here defined as class variables?
2. Do all the requests handled by the same server instance share the same variables, and could they thus interfere with each other by mistake? (Say, one user's email appears on another's page.)
PS. I know that I should read the manual and docs more, and I am doing so. However, answers from experienced programmers will really help me understand faster and more thoroughly. Thanks.
An instance can handle many requests over its lifetime. In the python runtime's threading model, each instance can only handle a single request at any given time. If 2 requests arrive at the same time they might be handled one after the other by a single instance, or a second instance might be spawned to handle the request.
EDIT:
In general, variables used by each request will be scoped to a RequestHandler instance's .get() or .post() method, and thus can't "leak" into other requests. You should be careful about using global variables in your scripts, as these will be cached in the instance and would be shared between requests. Don't use globals without knowing exactly why you want to (which is good advice for any application, for that matter), and you'll be fine.
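The difference between request-local variables and globals can be sketched as follows. This is a plain-Python illustration, not App Engine code: Home is a hypothetical class standing in for a RequestHandler subclass, and request_log is a module-level list playing the role of a global cached in the instance.

```python
# Module-level state: lives as long as the process and is seen by every request.
request_log = []


class Home:
    """Hypothetical handler standing in for a webapp.RequestHandler subclass."""

    def get(self, user_email):
        # Local variable: exists only for the duration of this call.
        greeting = "Hello, " + user_email
        # Global variable: every request handled by this process appends here.
        request_log.append(user_email)
        return greeting, list(request_log)


first = Home().get("alice@example.com")
second = Home().get("bob@example.com")
```

The second request's greeting contains only its own email, but the global list it sees already holds the first user's address. That is the kind of leak the advice about globals is meant to prevent.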
App Engine dynamically builds up and tears down instances based on request volume.
From the docs:
App Engine applications are powered by any number of instances at any given time, depending on the volume of requests received by your application. As requests for your application increase, so do the number of instances powering it.
Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load.
App Engine scales instances in reverse when request volumes decrease. In this way, App Engine ensures that all of your application's current instances are being used to optimal efficiency. This automatic scaling makes running App Engine so cost effective.
When an application is not being used at all, App Engine turns off its associated instances, but readily reloads them as soon as they are needed.

Keeping concurrency in web.py applications on mod_wsgi

Sorry if this makes no sense. Please comment if clarification is needed.
I'm writing a small file upload app in web.py which I am deploying using mod_wsgi + apache. I have been having a problem with my session management and would like clarification on how the threading works in web.py.
Essentially I embed a code in a hidden field of the html page I render when someone accesses my page. The file upload is then done via a standard POST request containing both the file and the code. Then I retrieve the progress of the file by updating it in the file upload POST method and grabbing it with a GET request to a different class. The 'session' (apologies for it being fairly naive) is stored in a session object like this:
class session:
    def __init__(self):
        self.progress = 0
        self.title = ""
        self.finished = False

    def advance(self):
        self.progress = self.progress + 1
The sessions are all kept in a global dictionary within my app script and then accessed with my code (from earlier) as the key.
For some reason my progress seems to stay at 0 and never increments. I've been debugging for a couple hours now and I've found that the two session objects referenced from the upload class and the progress class are not the same. The two codes, however, are (as far as I can tell) equal. This is driving me mad as it worked without any problems on the web.py test server on my local machine.
EDIT: After some research it seems that the dictionary may get copied for every request. I've tried putting the dictionary in another module and importing it, but this doesn't work. Is there some other way, short of using a database, to 'separate' the sessions dictionary?
Apache/mod_wsgi can run in multiprocess configurations, so possibly your requests aren't even being serviced by the same process. If each process in that multiprocess configuration is single-threaded, they never will be, because while the upload is occurring no other requests can be handled by that same process. Read:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Possibly you should use mod_wsgi daemon mode with a single multithreaded daemon process.
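With a single multithreaded process, the upload request and the progress request run as threads in the same interpreter and therefore see the same global dictionary. The sketch below illustrates that with plain threads rather than web.py or mod_wsgi. The names sessions_lock, upload and progress are hypothetical, and the lock is an added assumption to keep concurrent updates safe.

```python
import threading


class Session:
    def __init__(self):
        self.progress = 0
        self.title = ""
        self.finished = False

    def advance(self):
        self.progress += 1


# One dictionary per process, shared by every thread in that process.
sessions = {}
sessions_lock = threading.Lock()


def upload(code, chunks):
    # Stands in for the POST handler that receives the file.
    with sessions_lock:
        session = sessions.setdefault(code, Session())
    for _ in range(chunks):
        with sessions_lock:
            session.advance()
    with sessions_lock:
        session.finished = True


def progress(code):
    # Stands in for the GET handler that reports progress.
    with sessions_lock:
        session = sessions.get(code)
        return None if session is None else session.progress


uploader = threading.Thread(target=upload, args=("abc123", 5))
uploader.start()
uploader.join()
```

If the two handlers ran in separate processes instead, each process would have its own copy of sessions, which matches the symptom of progress staying at 0.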
From PEP 333, defining WSGI:
Servers that can run multiple requests in parallel, should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server
Check the documentation of your WSGI server.

Why are CherryPy object attributes persistent between requests?

I was writing debugging methods for my CherryPy application. The code in question was (very) basically equivalent to this:
import cherrypy

class Page:
    def index(self):
        try:
            self.body += 'okay'
        except AttributeError:
            self.body = 'okay'
        return self.body
    index.exposed = True

cherrypy.quickstart(Page(), config='root.conf')
I was surprised to notice that from request to request, the output of self.body grew. When I visited the page from one client, and then from another concurrently-open client, and then refreshed the browsers for both, the output was an ever-increasing string of "okay"s. In my debugging method, I was also recording user-specific information (i.e. session data) and that, too, showed up in both users' output.
I'm assuming that's because the python module is loaded into working memory instead of being re-run for every request.
My question is this: How does that work? How is it that self.debug is preserved from request to request, but cherrypy.session and cherrypy.response aren't?
And is there any way to set an object attribute that will only be used for the current request? I know I can overwrite self.body per every request, but it seems a little ad-hoc. Is there a standard or built-in way of doing it in CherryPy?
(second question moved to How does CherryPy caching work?)
synthesizerpatel's analysis is correct, but if you really want to store some data per request, then store it as an attribute on cherrypy.request, not in the session. The cherrypy.request and .response objects are new for each request, so there's no fear that any of their attributes will persist across requests. That is the canonical way to do it. Just make sure you're not overwriting any of cherrypy's internal attributes! cherrypy.request.body, for example, is already reserved for handing you, say, a POSTed JSON request body.
For all the details of exactly how the scoping works, the best source is the source code.
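The contrast between an attribute on the long-lived Page object and per-request storage can be sketched without CherryPy. In the example below, threading.local is only an analogy for a per-request object such as cherrypy.request; it is not how the answer says CherryPy implements it. The handle function and the lock are hypothetical additions for the illustration.

```python
import threading


class Page:
    def __init__(self):
        self.body = ""  # lives on the single Page object for the server's lifetime
        self.lock = threading.Lock()


# Stand-in for per-request storage; each thread sees its own attributes.
per_request = threading.local()


def handle(page, results, name):
    # Per-request data: starts fresh in every handling thread.
    per_request.body = "okay"
    # Data on the shared Page object: accumulates across requests.
    with page.lock:
        page.body += "okay"
    results[name] = per_request.body


page = Page()
results = {}
threads = [threading.Thread(target=handle, args=(page, results, n))
           for n in ("first", "second", "third")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After three simulated requests, the shared attribute has grown to three copies of "okay", while each request's own value is still a single "okay".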
You hit the nail on the head with the observation that you're getting the same data from self.body because it's the same object in the memory of the Python process running CherryPy.
self.debug maintains 'state' for this reason: it's an attribute of the running server.
To set data for the current session, use cherrypy.session['fieldname'] = 'fieldvalue', to get data use cherrypy.session.get('fieldname').
You (the programmer) do not need to know the session ID, cherrypy.session handles that for you -- the session ID is automatically generated on the fly by cherrypy and is persisted by exchanging a cookie between the browser and server on subsequent query/response interactions.
If you don't specify a storage_type for cherrypy.session in your config, it'll be stored in memory (accessible to the server and you), but you can also store the session files on disk if you wish, which might be a handy way to debug without having to write a bunch of code to dig out session IDs or key/value pairs from the running server.
For more info check out http://www.cherrypy.org/wiki/CherryPySessions
