Python bottle server-side caching

Python bottle server-side caching - python

Let's say we have an application based on Bottle like this:
from bottle import route, run, request, template, response
import time
def long_processing_task(i):
time.sleep(0.5) # here some more
return int(i)+2 # complicated processing in reality
#route('/')
def index():
i = request.params.get('id', '', type=str)
a = long_processing_task(i)
response.set_header("Cache-Control", "public, max-age=3600") # does not seem to work
return template('Hello {{a}}', a=a) # here in reality it's: template('index.html', a=a, b=b, ...) based on an external template file
run(port=80)
Obviously going to http://localhost/?id=1, http://localhost/?id=2, http://localhost/?id=3, etc.
takes at least 500 ms per page for the first loading.
How to make subsequent loading of these pages are faster?
More precisely, is there a way to have both:
client-side caching: if user A has visited http://localhost/?id=1 once, then if user A visits this page a second time, it will be faster
server-side caching: if user A has visited http://localhost/?id=1 once, then if user B visits this page later (for the first time for user B!), it will be faster too.
In other words: if 500 ms is spent to generate http://localhost/?id=1 for one user, it will be cached for all future users requesting the same page. (is there a name for that?)
?
Notes:
In my code response.set_header("Cache-Control", "public, max-age=3600") does not seem to work.
In this tutorial, it is mentioned about template caching:
Templates are cached in memory after compilation. Modifications made to the template files will have no affect until you clear the template cache. Call bottle.TEMPLATES.clear() to do so. Caching is disabled in debug mode.
but I think it's not related to caching of the final page ready to send to the client.
I already read Python Bottle and Cache-Control but this is related to static files.

Server Side
You want to avoid calling your long running task repeatedly. A naive solution that would work at small scale is to memoize long_processing_task:
from functools import lru_cache
#lru_cache(maxsize=1024)
def long_processing_task(i):
time.sleep(0.5) # here some more
return int(i)+2 # complicated processing in reality
More complex solutions (that scale better) involve setting up a reverse proxy (cache) in front of your web server.
Client Side
You'll want to use response headers to control how clients cache your responses. (See Cache-Control and Expires headers.) This is a broad topic, with many nuanced alternatives that are out of scope in an SO answer - for example, there are tradeoffs involved in asking clients to cache (they won't get updated results until their local cache expires).
An alternative to caching is to use conditional requests: use an ETag or Last-Modified header to return an HTTP 304 when the client has already received the latest version of the response.
Here's a helpful overview of the various header-based caching strategies.

Related

Disable Cached JSON response on frequent browser calls to django view

In my django app, I use views to call the django method I want to test under development.
When I call my view visiting the mapped url localhost:8000/do_something, twice, it'll return me the cached JSON response and won't process the requests again which destroys my usage of testing the code.
I'm aware that it's definitely not the best practice, but I'd just like to work with it, so following are the things I tried:
Clearing the browser cache(In chrome, IE and firefox, all three)
Restarting the server
It ultimately clears the cache in 2-3 restarts, and the view makes the method calls again instead of just returning the cached JSON response.
I'm sure it's a preference or some setting, would be glad if someone could resolve me with this issue?
Thanks.

I solved the problem by adding thee #never_cache django decorator to my views. e.g.
#never_cache
def do_somethig(request):
return JsonResponse({"Tested":"OK"})

Can you hook pyramid into twisted, skipping the wsgi part?

Given that :
WSGI doesn't play very well with async.
Twisted ergonomics suck.
Pyramid is very clean and component oriented.
How could I use Pyramid and Twisted ?
I can imagine making a twisted protocol to get the raw HTML request. But then I can't see how to parse it into a pyramid request objects. All documented pyramid tools seems to expect some wsgi interface at some point.
I could use waitress code to parse the request and turn it into a WSGI env then pass the env to pyramid but that's a lot of work with many issues I'm sure I can't even imagine down the road.
I know twisted includes a WSGI server, but it implies synchronicity in the app code, which does not serve my purpose. I want to be able to use the request and response objects, renderers, routers, and others pyramid tools in a twisted asynchronous protocol, with an asynchronous, non blocking app code as well. Hence I won't want to use WSGI.
Twisted API is verbose, heavy and uninuitive compared to any other asynchronous toolkit you'll find in Python or even other languages. Hence the critic about its ergonomics. I can use it, but training newcomers in my team to do it has a high cost. I wish to lower it.
Indeed, it packs a lot of power that I want to use.
To elaborate on my needs, I'm building a tool using crossbar.io and cyclone to have a WAMP/HTTP framework a bit friendlier to my team that the current tools. But cyclone is not as complete as pyramid, and I was hoping pyramid components were decoupled enough that the WSGI paradigm was not enforced, so I could leverage the tremendous work they did on it. All I need is an entry point : somewhere to get the HTML, and parse it into a request objet, and somewhere to take a response object, and returns HTML to the client. I wish i don't have to write a protocol manually for this, http is tricky and I'm sure I'll get it wrong in many ways.
One precision : i don't wish to use the full pyramid framework, just some components here and there, such as rooting, cookie parsing, CSRF protection, etc. I won't use their view system for it assumes a synchronous API.
Looking at Pyramid, I can see that it expects the entire request be be parsed and turned into a request object. it also returns the response as an object as well. So a part of the problem, to hook twisted and pyramid together, is to :
get the http request text as one big chunk from twisted;
parse it into the request object somehow (couldn't find a simple function to do this, but if I can turn it into an WSGI environ + request object, pyramid can convert it to it's format).
get the pyramid response object and turn it into a generator of strings (an adaptor can be find since that's what WSGI does).
Send the response back with twisted from this generator of strings.
Alternatives can be to use something simpler than pyramid like werkzeug for the glue.

Twisted Web lets you interpret HTTP request bodies (regardless of content-type, HTML or otherwise) incrementally as they're received - but it doesn't make doing so very easy. There's a very old ticket that we never seem to make much progress on for improving this situation. Until it's resolved, there probably isn't a better answer than the one I'm about to give. This incremental HTTP request body delivery, I think, is what you're looking for here (because you said you expect requests to "be a big HTML chunk").
The hook for incremental request body handling is Request.handleContentChunk. You can see a complete demonstration of its use in my answer to Python server for streaming request body content.
This gives you the data as it arrives at the server. If you want to use Pyramid, you'll have to construct a Pyramid request that uses this data. Most of the initialization of the Pyramid request object should be straightforward (eg filling the environ dictionary with the request headers - you can take these from Request.requestHeaders). The slightly trickier part will be initializing the Pyramid request object's body - which is supposed to be a file-like object that provides synchronous access to the request body.
On the one hand, if you dispatch the request before the request body has been completely received then you avoid the cost of buffering the entire request body in memory. On the other hand, if you let application code begin to read the request body then you have to deal with the circumstance that it tries to read beyond the point in the data which has actually arrived at the server. This can be dealt with. The body file-like object is expected to present a blocking interface. All you have to do is block until the data is available.
Here's a brief (incomplete, not meant to actually work) sketch of what I mean:
# XXX Note: Queue is not actually thread-safe. Use a safer primitive.
from Queue import Queue
class Body(object):
def __init__(self):
self._buffer = Queue()
self._pending = b""
self._eof = False
def read(self, how_many):
if self._eof:
return b""
if self._pending == b"":
data = self._buffer.get()
if data is None:
self._eof = True
return b""
else:
self._pending = data
if self._pending is None:
result = self._pending[:how_many]
self._pending = self._pending[how_many:]
return result
def _add_data(self, data):
self._buffer.put(data)
You can create an instance of this type, initialize the Pyramid request object's body attribute with it, and then call _add_data on it in the Twisted Request class's handleContentChunk callback.
You could also implement this as an enhancement to Twisted's own WSGI server. For the sake of simplicity, Twisted's WSGI server does read the entire request body before dispatching the request to the WSGI application - but it doesn't have to. If this is the only problem with WSGI then it'd be better to improve the quality of the WSGI implementation and keep the interface rather than both implementing the improvement and stepping outside of the interface (tying you more closely to both Twisted and Pyramid - unnecessarily).
The second half of the problem, generating response bodies incrementally, shouldn't really be a problem. Twisted's WSGI container will write out response data as the WSGI application object yields it. Or if you use twisted.web.resource instead of the WSGI interface, you can call request.write as many times as you like, at any time you like (up until you call request.finish). The only trick is that if you want to do this you must return NOT_DONE_YET from the render method.

Is it possible to dynamically update a rendered template in Flask, server-side?

I currently have a Flask web server that pulls data from a JSON API using the built-in requests object.
For example:
def get_data():
response = requests.get("http://myhost/jsonapi")
...
return response
#main.route("/", methods=["GET"])
def index():
return render_template("index.html", response=response)
The issue here is that naturally the GET method is only run once, the first time get_data is called. In order to refresh the data, I have to stop and restart the Flask wsgi server. I've tried wrapping various parts of the code in a while True / sleep loop but this prevents werkzeug from loading the page.
What is the most Pythonic way to dynamically GET the data I want without having to reload the page or restart the server?

You're discussing what are perhaps two different issues.
Let's assume the problem is you're calling the dynamic data source, get_data(), only once and keeping its (static) value in a global response. This one-time-call is not shown, but let's say it's somewhere in your code. Then, if you are willing to refresh the page (/) to get updates, you could then:
#main.route("/", methods=['GET'])
def index():
return render_template("index.html", response=get_data())
This would fetch fresh data on every page load.
Then toward the end of your question, you ask how to "GET the data I want without having to reload the page or restart the server." That is an entirely different issue. You will have to use AJAX or WebSocket requests in your code. There are quite a few tutorials about how to do this (e.g. this one) that you can find through Googling "Flask AJAX." But this will require an JavaScript AJAX call. I recommend finding examples of how this is done through searching "Flask AJAX jQuery" as jQuery will abstract and simplify what you need to do on the client side. Or, if you wish to use WebSockets for lower-latency connection between your web page, that is also possible; search for examples (e.g. like this one).

To add to Jonathan’s comment, you can use frameworks like stimulus or turbo links to do this dynamically, without having to write JavaScript in some cases as the frameworks do a lot of the heavy lifting. https://stimulus.hotwired.dev/handbook/origin

Django statelessness?

I'm just wondering if Django was designed to be a fully stateless framework?
It seems to encourage statelessness and external storage mechanisms (databases and caches) but I'm wondering if it is possible to store some things in the server's memory while my app is in develpoment and runs via manage.py runserver.

Sure it's possible. But if you are writing a web application you probably won't want to do that because of threading issues.

That depends on what you mean by "store things in the server's memory." It also depends on the type of data. If you can, you're better off storing "global data" in a database or in the file system somewhere. Unless it is needed every request it doesn't really make sense to store it in the Django instance itself. You'll need to implement some form of locking to prevent race conditions, but you'd need to worry about race conditions if you stored everything on the server object anyway.
Of course, if you're talking about user-by-user data, Django does support sessions. Or, and this is another perfectly good option if you're willing to make the user save the data, cookies.

The best way to maintain state in a django app on a per-user basis is request.session (see django sessions) which is a dictionary you can use to remember things about the current user.
For Application-wide state you should use the a persistent datastore (database or key/value store)
example view for sessions:
def my_view(request):
pages_viewed = request.session.get('pages_viewed', 1) + 1
request.session['pages_viewed'] = pages_viewed
...
If you wanted to maintain local variables on a per app-instance basis you can just define module level variables like so
# number of times my_view has been served since by this server
# instance since the last restart
served_since_restart = 0
def my_view(request):
served_since_restart += 1
...
If you wanted to maintain some server state across ALL app servers (like total number of pages viewed EVER) you should probably use a persistent key/value store like redis, memcachedb, or riak. There is a decent comparison of all these options here: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
You can do it with redis (via redis-py) like so (assuming your redis server is at "127.0.0.1" (localhost) and it's port 6379 (the default):
import redis
def my_view(request):
r = redis.Redis(host='127.0.0.1', port="6379")
served = r.get('pages_served_all_time', 0)
served += 1
r.set('pages_served_all_time', served)
...

There is LocMemCache cache backend that stores data in-process. You can use it with sessions (but with great care: this cache is not cross-process so you will have to use single process for deployment because it will not be guaranteed that subsequent requests will be handled by the same process otherwise). Global variables may also work (use threadlocals if they shouldn't be shared for all process threads; the warning about cross-process communication also applies here).
By the way, what's wrong with external storage? External storage provides easy cross-process data sharing and other features (like memory limiting algorithms for cache or persistance with databases).

Why are CherryPy object attributes persistent between requests?

I was writing debugging methods for my CherryPy application. The code in question was (very) basically equivalent to this:
import cherrypy
class Page:
def index(self):
try:
self.body += 'okay'
except AttributeError:
self.body = 'okay'
return self.body
index.exposed = True
cherrypy.quickstart(Page(), config='root.conf')
I was surprised to notice that from request to request, the output of self.body grew. When I visited the page from one client, and then from another concurrently-open client, and then refreshed the browsers for both, the output was an ever-increasing string of "okay"s. In my debugging method, I was also recording user-specific information (i.e. session data) and that, too, showed up in both users' output.
I'm assuming that's because the python module is loaded into working memory instead of being re-run for every request.
My question is this: How does that work? How is it that self.debug is preserved from request to request, but cherrypy.session and cherrypy.response aren't?
And is there any way to set an object attribute that will only be used for the current request? I know I can overwrite self.body per every request, but it seems a little ad-hoc. Is there a standard or built-in way of doing it in CherryPy?
(second question moved to How does CherryPy caching work?)

synthesizerpatel's analysis is correct, but if you really want to store some data per request, then store it as an attribute on cherrypy.request, not in the session. The cherrypy.request and .response objects are new for each request, so there's no fear that any of their attributes will persist across requests. That is the canonical way to do it. Just make sure you're not overwriting any of cherrypy's internal attributes! cherrypy.request.body, for example, is already reserved for handing you, say, a POSTed JSON request body.
For all the details of exactly how the scoping works, the best source is the source code.

You hit the nail on the head with the observation that you're getting the same data from self.body because it's the same in memory of the Python process running CherryPy.
self.debug maintains 'state' for this reason, it's an attribute of the running server.
To set data for the current session, use cherrypy.session['fieldname'] = 'fieldvalue', to get data use cherrypy.session.get('fieldname').
You (the programmer) do not need to know the session ID, cherrypy.session handles that for you -- the session ID is automatically generated on the fly by cherrypy and is persisted by exchanging a cookie between the browser and server on subsequent query/response interactions.
If you don't specify a storage_type for cherrypy.session in your config, it'll be stored in memory (accessible to the server and you), but you can also store the session files on disk if you wish which might be a handy way for you to debug without having to write a bunch of code to dig out session IDs or key/pair values from the running server.
For more info check out http://www.cherrypy.org/wiki/CherryPySessions

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.