I am working on a Django project and I want to reduce database request overhead.
So I am trying Django's cache framework (which requires Memcached).
vi /etc/sysconfig/memcached
PORT="11211"
USER="memcached"
MAXCONN="1024"
CACHESIZE="256"         # increased from the default 64 MB
OPTIONS="-l 127.0.0.1"  # added the IP address to listen on
Then I changed the project settings, adding the following to settings.py:
CACHE_BACKEND='memcached://localhost:11211'
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
(Note: restart memcached afterwards with /etc/init.d/memcached restart.)
The project is working and it reduces the database request overhead, but it brings a certain issue:
I lose my session after a short while, so I need to log in to the application again. How can I handle this? I only want to store session details in the cache.
You are using it correctly, but keep in mind that if you restart memcached, you will lose all your existing sessions. That's to be expected.
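If you need sessions to survive a memcached restart, one option (a sketch, not the only fix) is the write-through cached_db session backend, which uses the cache for speed but also persists each session in the database:

# settings.py -- hedged sketch: cache-backed sessions that also survive a cache restart.
# Reads hit memcached first and fall back to the database, so restarting memcached
# only costs a little speed instead of logging everyone out.
SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'

This requires django.contrib.sessions to be installed and migrated so the django_session table exists.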
I have a Django REST API endpoint. It receives a JSON payload, e.g.:
{ "data" : [0,1,2,3] }
This is decoded in a views.py function and generates a new database object like so (pseudo code):
newobj = MyObj(col0=0, col1=1, col2=2, col3=3)
newobj.save()
In tests, it is 20x faster to build a list of 1000 new objects and then do a single bulk create:
MyObj.objects.bulk_create(newobjs, 1000)
So, the question is: how do I save individual POSTs somewhere in Django, ready for a batch write once we have 1000 of them?
You can cache it with Memcached or with Redis, for example.
But you will need to write some kind of service that checks how many new items are in the cache and, if there are more than e.g. 1000, inserts them.
So:
POSTs populate a cache.
A service pulls new items from the cache and inserts them into the persistent database (a rough sketch of such a service follows below).
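As an illustration of what that service could look like, here is a minimal sketch; the cache key, batch size, and model/app names are assumptions for the example, not part of the question:

# Hedged sketch: drain buffered payloads from the Django cache and bulk-insert them.
from django.core.cache import cache
from myapp.models import MyObj  # assumed app/model names

BATCH_SIZE = 1000
BUFFER_KEY = "pending_rows"  # hypothetical cache key the API view appends to

def flush_pending_rows():
    # Each buffered entry is assumed to be a [col0, col1, col2, col3] list.
    rows = cache.get(BUFFER_KEY) or []
    if len(rows) < BATCH_SIZE:
        return 0
    # Reset the buffer first; note this read/reset is not atomic, so a real
    # service would need a lock or an atomic queue to avoid losing rows.
    cache.set(BUFFER_KEY, [], None)
    MyObj.objects.bulk_create(
        [MyObj(col0=r[0], col1=r[1], col2=r[2], col3=r[3]) for r in rows],
        batch_size=BATCH_SIZE,
    )
    return len(rows)

Such a function could be run periodically by cron, Celery beat, or a management command loop.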
Do you really need it?
What will happen if the data already exists? If the data is corrupted? How will the user know about this?
save individual POSTs somewhere in Django ready for batch writes when we have 1000 of them
You can:
use Django's cache framework,
maintain a CSV file using Python's csv module, or,
since you probably want to maintain the order of the posts, use the persist-queue package.
But as Victor mentioned as well, why? Why are you so concerned about the speed of SQL inserts, which are pretty fast anyway?
Of course, bulk_create is much faster because it takes a single network call to your DB server and adds all the rows in a single SQL transaction, but it only makes sense when you actually have a bunch of data to be added together. In the end, you must save the data somewhere, which is going to take some processing time one way or another.
Because there are many disadvantages to your approach:
you risk losing the data
you will not be able to enforce UNIQUE or any other constraints on your table.
your users won't get instant feedback on creating a post.
you cannot show or access the posts in a useful way if they are not stored in your primary DB.
EDIT
Use a fast cache like Redis to maintain a list of the entries. In your api_view you can call cache.get to get the current list, append the object to it, and then call cache.set to update it. After this, add a check so that whenever len(list) >= 1000 you call bulk_create. You might also want to consider using Elasticsearch for such an enormous amount of data.
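A minimal sketch of that api_view approach, assuming Django REST framework is in use and using a hypothetical buffer key (the view name, key, and column mapping are assumptions):

# Hedged sketch: buffer posted rows in the cache, bulk-insert once 1000 accumulate.
from django.core.cache import cache
from rest_framework.decorators import api_view
from rest_framework.response import Response
from myapp.models import MyObj  # assumed app/model names

@api_view(["POST"])
def create_obj(request):
    buffered = cache.get("myobj_buffer") or []
    buffered.append(request.data["data"])  # the [0,1,2,3] list from the payload
    if len(buffered) >= 1000:
        cache.set("myobj_buffer", [], None)
        MyObj.objects.bulk_create(
            [MyObj(col0=d[0], col1=d[1], col2=d[2], col3=d[3]) for d in buffered]
        )
    else:
        cache.set("myobj_buffer", buffered, None)
    return Response({"buffered": len(buffered)})

As with the earlier sketch, the get/append/set sequence is not atomic, so concurrent requests could drop rows; that is one reason the accepted approach below switches to Redis lists.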
Thanks for the above responses; the answer includes some of what was suggested, but is a superset, so here's a summary.
This is really about creating a FIFO. memcached turns out to be unsuitable (after trying it) because only Redis has a list type that makes this possible, explained nicely here.
Also note that Django's built-in cache framework does not expose the Redis list API calls.
So we need a new docker-compose.yml entry to add redis:
  redis:
    image: redis
    ports:
      - 6379:6379/tcp
    networks:
      - app-network
Then in views.py we add (note the use of Redis RPUSH):
import json
import os

import redis
from django.http import HttpResponse
...
redis_host = os.environ['REDIS_HOST']
redis_port = 6379
redis_password = ""
r = redis.StrictRedis(host=redis_host, port=redis_port, password=redis_password, decode_responses=True)
...
def write_post_to_redis(request):
    payload = json.loads(request.body)
    r.rpush("df", json.dumps(payload))
    return HttpResponse(status=204)  # a Django view must return a response
So this pushes the received payload into the Redis in-memory cache. We now need to read (or pop) it and write it to the Postgres database, so we need a process that wakes up every n seconds and checks. For this we use django-background-tasks. First, install it with:
pipenv install django-background-tasks
And add to the installed apps of the settings.py
INSTALLED_APPS = [
    ...
    'background_task',
]
Then run a migrate to add the background task tables:
python manage.py migrate
Now in views.py, add:
from background_task import background
from background_task.models import CompletedTask
And add the function that writes the cached data to the Postgres database. Note the decorator, which states it should run in the background every 5 seconds, and the use of Redis LPOP.
@background(schedule=5)
def write_cached_samples():
    ...
    payload = json.loads(r.lpop('df'))
    # now do your write of payload to postgres
    ...
    # delete the completed tasks or we'll have a big db leak
    CompletedTask.objects.all().delete()
In order to start the process up, add the following to the base of urls.py:
write_cached_samples(repeat=10, repeat_until=None)
Finally, because the background task needs a separate process, we duplicate the Django docker container in docker-compose.yml but replace the ASGI server run command with the background-process run command.
  django_bg:
    image: my_django
    command: >
      sh -c "python manage.py process_tasks"
    ...
In summary, we add two new docker containers: one for the Redis in-memory cache, and one to run the Django background tasks. We use the Redis list RPUSH and LPOP commands to create a FIFO, with the API receive pushing and a background task popping.
There was a small issue where nginx was hooking up to the wrong Django container, rectified by stopping and restarting the background container; it appears to be an issue where Docker's network routing is initialised incorrectly.
Next I am replacing the Django HTTP API endpoint with a Go one to see how much of a speed-up we get, as the Daphne ASGI server is hitting max CPU at only 100 requests per second.
My Django-based application performs the following steps:
1. Reads millions of records from an XLS file and puts the data in the cache:
cache.set( (str(run)+"trn_data"),files['inp_data'],3600 )
2. The data stored in the cache is optimized and stored in the cache again:
cache.set( (str(run)+"tier"),tier,3600 )
3. The data is fetched from the cache and put into the database:
tier = cache.get( (str(runid)+"tier"))
The issue I am facing is that everything works fine on the Django dev server, but as soon as I host the application on Apache (mod_wsgi), the cache is shown as empty (at step 3).
P.S. I am using the local-memory cache in my application.
The local memory cache is per-process. See the warning in the docs (emphasis mine):
Note that each process will have its own private cache instance, which means no cross-process caching is possible. This obviously also means the local memory cache isn’t particularly memory-efficient, so it’s probably not a good choice for production environments. It’s nice for development.
For production, I recommend you use a different cache backend, for example Memcached.
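For example, a cross-process cache configuration in settings.py might look like the sketch below; the location is an assumption, and the backend class name varies by Django version (newer releases use PyMemcacheCache instead):

# settings.py -- hedged sketch: a shared Memcached cache visible to every mod_wsgi process
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

Because the cache now lives in an external memcached daemon, the data set in step 1 is visible to whichever Apache worker process handles step 3.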
I am working on scaling out a webapp and providing some database redundancy for protection against failures and to keep the servers up when updates are needed. The app is still in development, so I have chosen a simple multi-master redundancy with two separate database servers to try and achieve this. Each server will have the Django code and host its own database, and the databases should be as closely mirrored as possible (updated within a few seconds).
I am trying to figure out how to set up the multi-master (master-master) replication between databases with Django and MySQL. There is a lot of documentation about setting it up with MySQL only (using various configurations), but I cannot find any for making this work from the Django side of things.
From what I understand, I need to approach this by adding two database entries in the Django settings (one for each master) and then writing a database router that specifies which database to read from and which to write to. In this scenario, both databases should accept both reads and writes, and writes/updates should be mirrored over to the other database. The logic in the router could simply use a round-robin technique to decide which database to use. From there, further configuration to set up the actual replication would be done through MySQL configuration.
Does this approach sound correct, and does anyone have any experience with getting this to work?
Your idea of the router is great! I would add that you need to automatically detect whether a database is slow or down. You can detect that from the response time and from connection/read/write errors. If this happens, you exclude that database from your round-robin list for a while, trying to reconnect to it every now and then to detect whether it is alive again.
In other words, the round-robin list grows and shrinks dynamically depending on the health status of your database machines.
Another important point is that, luckily, you don't need to maintain this round-robin list in common across all the web servers. Each web server can store its own copy of the round-robin list and its own state of inclusion and exclusion of databases in that list. This is simply because a database server may be reachable from one web server but unreachable from another due to local network problems.
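A minimal sketch of what such a round-robin router could look like, without the health-check logic and with hypothetical database alias names ("master1" and "master2"):

# Hedged sketch of a round-robin database router; alias names are assumptions.
import itertools

class RoundRobinRouter:
    """Alternate reads and writes between two mirrored master databases."""

    def __init__(self):
        self._cycle = itertools.cycle(["master1", "master2"])

    def db_for_read(self, model, **hints):
        return next(self._cycle)

    def db_for_write(self, model, **hints):
        return next(self._cycle)

    def allow_relation(self, obj1, obj2, **hints):
        # Both databases hold the same mirrored data, so relations are allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return True

It would be registered via DATABASE_ROUTERS = ['path.to.RoundRobinRouter'] in settings.py; dropping an unhealthy alias from the cycle, as described above, is left out of the sketch.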
I am new to Django. I wrote a basic application. When I test it, every small change I make to the Python code logs me out of localhost.
This happens when I use this cache backend:
'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
but does not when I use this one:
'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
Is there a way that I can continue using locmem but not get logged out?
I'm guessing that your SESSION_ENGINE setting is set to cache, and that you're using the development server.
If so, then the behavior you're seeing makes perfect sense. When you change your Python code, the development server automatically restarts, losing all the data in memory. Since that includes the cache, which includes the session information, you lose that too, forcing everyone to login all over again.
The documentation mentions this:
Warning
You should only use cache-based sessions if you’re using the Memcached cache backend. The local-memory cache backend doesn’t retain data long enough to be a good choice.
Since you want to keep the LocMemCache, you should use a different session backend. A simple approach might be the cookie-based backend, but check the documentation to see all your options.
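For instance, switching to the signed-cookie session backend while keeping LocMemCache for everything else is a one-setting change (a sketch, shown with the cache block from the question):

# settings.py -- hedged sketch: sessions stored in signed cookies, so a dev-server
# restart no longer logs anyone out (the session data lives in the browser cookie).
SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}

The database-backed session engine (the Django default) would work just as well here; the cookie backend simply avoids any server-side session storage.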
I'm working on a Django 1.2.3 project, and I'm finding that the admin session seems to time out extremely early, about a minute after logging in, even while I'm actively using it.
Initially, I had these settings:
SESSION_COOKIE_AGE=1800
SESSION_EXPIRE_AT_BROWSER_CLOSE=True
I thought the problem might be that my session storage was misconfigured, so I tried configuring sessions to be stored in local memory by adding:
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
CACHE_BACKEND = 'locmem://'
However, the problem still occurs. Is there something else that would cause admin sessions to time out early even when the user is active?
Caching sessions in locmem:// means that you lose the session whenever the Python process restarts. If you're running under the dev server, that is any time you save a file. In a production environment, that will vary based on your infrastructure: mod_wsgi in Apache, for example, will restart Python after a certain number of requests (which is highly configurable). If you have multiple Python processes configured, you'll lose your session whenever your request goes to a different process.
What's more, if you have multiple servers in a production environment, locmem:// will only ever refer to a single process on a single server.
In other words, don't use locmem:// for session storage.