Should I load the whole database at initialization in Flask web application? - python

I'm developing a web application using Flask. I have two approaches to return pages for a user's request:
Load the requested data from the database, then return it.
Load the whole database into a Python dictionary at initialization, and return the related page from it when requested (the whole database is not too big).
I'm curious which approach will have better performance.

Of course it will be faster to get data from a cache stored in memory. But you have to be sure that the amount of data won't get too large, and that you update your cache every time you update the database. Depending on your exact goal you may choose a Python dict, a cache server (like memcached), or something else, such as tries.
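As a minimal sketch of the load-everything approach (the SQLite pages table and all names here are made up for illustration), a module-level dict can be filled at startup and written through on every update so it never goes stale:

import sqlite3

# Assumes a pages(slug TEXT PRIMARY KEY, content TEXT) table.
conn = sqlite3.connect("app.db", check_same_thread=False)
page_cache = {}

def load_cache():
    # Fill the in-memory cache once at application startup.
    page_cache.clear()
    for slug, content in conn.execute("SELECT slug, content FROM pages"):
        page_cache[slug] = content

def update_page(slug, content):
    # Write through: update the database first, then the cache.
    conn.execute("INSERT OR REPLACE INTO pages (slug, content) VALUES (?, ?)",
                 (slug, content))
    conn.commit()
    page_cache[slug] = content  # keeps the cache consistent with the DB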
There's also a "middle" way. You can keep in memory not the whole records from the database, but just the mapping between the search params in a request and the ids of the matching records. When a user makes a request, you quickly look up the ids of the records needed and query the database by id, which is pretty fast.
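A sketch of that middle approach, under the same made-up schema assumptions (here a records table with a term column used as the search param):

import sqlite3

conn = sqlite3.connect("app.db", check_same_thread=False)
id_index = {}  # search term -> record id, kept in memory

def build_index():
    id_index.clear()
    for record_id, term in conn.execute("SELECT id, term FROM records"):
        id_index[term] = record_id

def lookup(term):
    # Resolve the id in memory, then run a cheap primary-key query.
    record_id = id_index.get(term)
    if record_id is None:
        return None
    row = conn.execute("SELECT payload FROM records WHERE id = ?",
                       (record_id,)).fetchone()
    return row[0] if row else None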

Related

Can we cache a value until it is updated in the database?

Is there any caching library for Python (or general technique) that can cache a database query result until the underlying tables of the query have been updated?
The cache should never output stale values. At the same time, the application should only need to query the database once for each change in the data.
I want to optimize a Flask app. I am facing this issue a lot with pages that have a list of objects that change infrequently. It is detrimental to present stale data, so a time-based cache cannot be used.
Right now there are hundreds of queries per hour due to multiple users accessing these pages. I would like to reduce that to the absolute minimum (i.e. only when there is an update to the data), and keep the data cached in-memory.
A possible approach would be to maintain last_updated timestamps for each table somewhere (possibly Redis) and check these before querying the database.
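A sketch of that timestamp idea using Redis (the version-counter scheme and key names are assumptions, not an established library API): every writer bumps a per-table counter, and readers only hit the database when the counter has moved.

import json
import redis

r = redis.Redis()  # assumes a local Redis server

def bump_version(table):
    # Call this from every code path that writes to `table`.
    r.incr("version:" + table)

def cached_query(table, run_query):
    # Serve cached rows unless `table` changed since they were stored.
    version = int(r.get("version:" + table) or 0)
    cached = r.get("cache:" + table)
    if cached is not None:
        payload = json.loads(cached)
        if payload["version"] == version:
            return payload["rows"]  # still fresh: no DB round-trip
    rows = run_query()  # hits the database once per change
    r.set("cache:" + table, json.dumps({"version": version, "rows": rows}))
    return rows

A write that races with cached_query at worst tags fresh rows with an old version, which just forces one extra recomputation on the next request; stale data is never served as fresh.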

Django: Database caching to store web service result

I have a method to load all ideas from the database. There are a few comments on each idea.
I store the ids of the users who commented on each idea in a separate table.
I have a web service which returns all the data related to a user id.
When I load a page, it takes time to fetch the information for all users via the web service.
I want to use database caching to store that web service response and reuse it later.
How can I achieve this, to reduce page load time?
When should I store the web service response in the cache table?
The Django documentation on the cache framework is pretty easy to follow and will show you exactly how to set up a database cache for pages and other things in your views, including how long you'd like the cache entries to live (TIMEOUT), as well as other arguments a cache can take.
Another way to speed up access to your DB information is to take advantage of CONN_MAX_AGE for your database in your settings.py, whose value can depend on how often the database needs to be accessed or how much traffic the site gets (as an example). This basically tells your DB connection how long to stay open, and can be found in the settings documentation.
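For reference, a minimal settings.py sketch combining both suggestions (the table name, database name, and timeout values are arbitrary):

# settings.py (excerpt) - database cache plus persistent DB connections.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "my_cache_table",  # create it with `manage.py createcachetable`
        "TIMEOUT": 300,                # seconds before a cache entry expires
    }
}

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "CONN_MAX_AGE": 60,            # keep connections open for 60 seconds
    }
}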
You can store information in a cache when something on the site occurs (such as a new comment) or when that particular request is made for a particular page. It can be entirely up to the needs of your project.
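Either trigger can be expressed in a few lines with Django's low-level cache API (call_user_web_service and the key scheme are hypothetical stand-ins for your own code):

from django.core.cache import cache

def get_user_info(user_id):
    # Fill the cache on the first request, then serve from it.
    key = "user_info:%s" % user_id
    info = cache.get(key)
    if info is None:
        info = call_user_web_service(user_id)  # hypothetical slow call
        cache.set(key, info, timeout=3600)
    return info

def on_new_comment(comment):
    # When something on the site changes, drop the stale entry.
    cache.delete("user_info:%s" % comment.user_id)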

Storing queryset after fetching it once

I am new to Django and web development.
I am building a website with a database of considerable size.
A large amount of data has to be shown on many pages, and a lot of it is repeated; I mean I need to show the same data on many pages.
Is it a good idea to query the database for the data on every GET request? It takes many seconds to get the data every time I refresh the page or request another page that shows the same data.
Is there a way to fetch the data once, store it somewhere, display it on every page, and only refetch it when the data is updated?
I thought about the session, but I found that it is limited to 5 MB, which is too small for my data.
Any suggestions?
Thank you.
Django's cache - as mentioned by Leistungsabfall - can help, but like most cache systems it has some drawbacks too if you use it naively for this kind of problem (long queries/computations): when the cache expires, the next request will have to recompute the whole thing - which might take some time, during which every new request will trigger a recomputation... Also, proper cache invalidation can be really tricky.
Actually there's no one-size-fits-all answer to your question; the right solution is often a mix of different solutions (code optimisation, caching, denormalisation etc), based on your actual data, how often it changes, how many visitors you have, how critical it is to have up-to-date data etc. But the very first steps would be to:
check the code fetching the data and find out if there are possible optimisations at this level using QuerySet features (.select_related() / .prefetch_related(), .values() and/or .values_list(), annotations etc) to avoid issues like the "n+1 queries" problem, fetching whole records and building whole model instances when you only need a single field's value, doing computations at the Python level when they could be done at the database level etc (see the sketch after this list)
check your db schema's indexes - well-chosen indexes can vastly improve performance, badly used ones can vastly degrade it...
and of course use the right tools (db query logging, Python's profiler etc) to make sure you identify the real issues.
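To illustrate the QuerySet points above, a few before/after lines for a hypothetical Idea model with a ForeignKey to User (field and related names are assumptions):

# n+1 problem: one query for the ideas, then one more per idea for its author.
for idea in Idea.objects.all():
    print(idea.author.username)

# Fixed: a single JOIN fetches the authors along with the ideas.
for idea in Idea.objects.select_related("author"):
    print(idea.author.username)

# Only one field needed? Skip building full model instances.
titles = Idea.objects.values_list("title", flat=True)

# Count at the database level instead of in Python
# (assumes a related_name of "comments").
from django.db.models import Count
comment_counts = Idea.objects.annotate(n_comments=Count("comments"))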

Python SQLAlchemy Multiple Queries

I'm not sure about the best way to approach this. Say I've got a "widget" table with fields 'id', 'name', 'size', 'color', with 10,000 rows.
When I load a webpage I will often need to look up hundreds of widgets (by id) and return one or more of the associated fields.
Once I have a database session established, is it best practice to do something like:
thiswidget = session.query(Widget).filter(Widget.id == X).first()
each time I need a piece of data, or should I grab all the data up front once, like this:
widgetsdict = {}
for widget in session.query(Widget):
    widgetsdict[widget.id] = (widget.name, widget.size, widget.color)
Then each time I need to look something up, just do:
thiswidget = widgetsdict[X]
The first method is far simpler, but is it a good idea to keep asking the database over and over?
You should employ caching to prevent hitting the database too many times.
Redis and memcached are typically used for this purpose. They both run as separate server processes - usually on the same machine as your application - which can be called to save and retrieve data. You will need to set up the server and the relevant Python client library.
The code you write in Python should do the following:
Check the cache for a key
If None is returned, query the db
Store the result in the cache, with a reasonable expiry
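Those three steps map directly onto a cache-aside helper; here is a sketch with redis-py, reusing the Widget model from the question (the key scheme and the 5-minute expiry are arbitrary choices):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_widget(widget_id, session):
    key = "widget:%s" % widget_id
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round-trip
    widget = session.query(Widget).filter(Widget.id == widget_id).first()
    if widget is None:
        return None
    data = {"name": widget.name, "size": widget.size, "color": widget.color}
    r.setex(key, 300, json.dumps(data))  # expire after 5 minutes
    return data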
Caching Software
redis
memcached

Persistent object with Django?

So I have a site that, on a per-user basis, is expected to query a very large database and let the user flip through the results. Due to the number of entries returned, I run the query once (which takes some time...), store the result in a global, and let folks iterate through the results (or download them) as they want.
Of course, this isn't scalable, as the globals are shared across sessions. What is the correct way to do this in Django? I looked at session management, but I kept running into the "xyz is not JSON serializable" issue. Should I look into how to do this correctly using sessions, or is there another preferred way?
If the user is flipping through the results, you probably don't want to pull back and render any more than you have to. Most SQL dialects have TOP and LIMIT clauses that will let you pull back a limited range of results, as long as your data is ordered consistently. Django's Pagination classes are a nice abstraction of this on top of Django Model classes: https://docs.djangoproject.com/en/dev/topics/pagination/
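A sketch of that pagination pattern in a view (Entry and the page size are hypothetical; this is the classic try/except form from the linked docs):

from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.shortcuts import render

def results_view(request):
    queryset = Entry.objects.order_by("id")  # consistent ordering matters
    paginator = Paginator(queryset, 50)      # 50 rows per page
    try:
        page = paginator.page(request.GET.get("page"))
    except PageNotAnInteger:
        page = paginator.page(1)
    except EmptyPage:
        page = paginator.page(paginator.num_pages)
    # Only this page's rows are fetched - the ORM adds LIMIT/OFFSET.
    return render(request, "results.html", {"page": page})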
I would be careful of storing large amounts of data in user sessions, as it won't scale as your number of users grows, and user sessions can stay around for a while after the user has left the site. If you're set on this option, make sure you read about clearing the expired sessions. Django doesn't do it for you:
https://docs.djangoproject.com/en/1.7/topics/http/sessions/#clearing-the-session-store
