Can you store a constant global object in Django?

I am working on a small webpage that uses geospatial data. I did my initial analysis in Python using GeoPandas and Shapely, and I am attempting to build a webpage from it. The problem is that when using Django, I can't seem to find a way to keep the shapefile stored as a constant object. Each time a request needs to operate on the shapefile, I have to load the data from source. This takes something like 6 seconds, while a standard DataFrame deep copy (df.copy()) takes fractions of a second. Is there a way I can store a DataFrame in Django that can be accessed and deep copied by the views without re-reading the shapefile?

Due to the nature of Django, global variables do not really work that well. I solved this problem in two different ways. The first was to just use Django sessions: this way, the object you want to store globally only needs to be loaded once per session on your website. The second and more efficient option is to use a cache server, either Redis or Memcached. This lets you store and fetch your objects very quickly across all of your sessions and will improve performance the most.
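A minimal sketch of the cache approach, assuming Django's cache framework is configured with a Redis or Memcached backend in settings.py (the shapefile path and cache key here are made up for illustration):

import geopandas as gpd
from django.core.cache import cache

SHAPEFILE_PATH = "data/regions.shp"  # hypothetical path

def get_shape_data():
    # Fall back to the slow read from disk only on a cache miss.
    gdf = cache.get("shape_data")
    if gdf is None:
        gdf = gpd.read_file(SHAPEFILE_PATH)
        cache.set("shape_data", gdf, timeout=None)  # no expiry
    return gdf.copy()  # hand each view its own deep copy

Note that the cache backend pickles and unpickles the frame on every round trip, so measure before committing to this for very large objects.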

Related

FastAPI workers with shared data across functions called in an aggregate function

I have a FastAPI app with a route that can dynamically (more or less) call the functions that I need to call. That route also loads the data needed for those functions to run. The data that is loaded and passed to each function is the same, and the functions then pick the data they need to use.
This system works well enough for my liking for a one-off call to that route. The issue comes when calling it from an aggregate route which hits this endpoint 8 times, re-loading the data on every request. The time to load the data adds up!
A working solution was to have a dict that acted as a sort of session object, either returning the already-loaded data or loading it if it wasn't there yet. While it works, I'm not exactly confident in the approach, seeing as it would still have to re-load the data for every worker.
I've tried the same approach as above but utilising Redis. The issue is that the data I need to load is Pandas DataFrames, so there's a lot of pickling and unpickling of large data sets, which is very time intensive and lands me back at square one.
I'm pretty new to FastAPI and uvicorn, so I'm a bit stuck as to where to go next or how to solve this issue.
Any help/advice would be greatly appreciated, thanks!
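For reference, the per-process dict cache described above looks roughly like this (a minimal sketch; load_data is a hypothetical stand-in for the real, slow loading step):

import pandas as pd

_data_cache: dict[str, pd.DataFrame] = {}

def load_data(key: str) -> pd.DataFrame:
    # Hypothetical loader standing in for the real (slow) data source.
    return pd.read_csv(f"{key}.csv")

def get_data(key: str) -> pd.DataFrame:
    # Load on first access, then serve from the in-process cache.
    # Each uvicorn worker process keeps its own copy of this cache.
    if key not in _data_cache:
        _data_cache[key] = load_data(key)
    return _data_cache[key]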

Using python sessions to pass variable from one function to another function

I have a couple of quick questions:
* When I used to code in Java, we used to reduce the usage of session variables, as they slowed the engine and occupied quite some space. In Python/Django, when I was trying to access one variable in two functions, I have seen that request.session['variable_name'] is used to solve this. Is there any other way to achieve what I wanted, or is request.session the only way? If request.session is the only approach, will sessions slow down the engine? (I apologize if it's a lame question.)
* I have a list whose values have to be saved in a db table, so the list has to be iterated, the model instantiated, and finally saved. If the list is iterated (say 100 times), it makes a db call 100 times. To avoid that, this is what I am doing:
with transaction.atomic():
    for lcc in list_course_content:
        print(lcc)
        c = Course_Content(TITLE=lcc, COURSE_ID_id=crse.id)
        c.save()
Am I on the right path, or is there a better approach?
You say that you used to reduce the usage of session variables in Java, but you don't say how you did it. If it worked there, it would also work in Python.
Anyway, to be able to use a variable across different requests, you have to store that variable somewhere. The language doesn't matter. In Django you can set the session backend, which can be based on in-memory storage, files, or the database, so it's your choice.
Of course, you can also store the variable without using sessions.
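For example, the session backend is chosen with the SESSION_ENGINE setting; these backends all ship with django.contrib.sessions:

# settings.py -- pick one SESSION_ENGINE
SESSION_ENGINE = "django.contrib.sessions.backends.db"          # database-backed (the default)
# SESSION_ENGINE = "django.contrib.sessions.backends.file"      # file-backed
# SESSION_ENGINE = "django.contrib.sessions.backends.cache"     # cache-backed (e.g. Memcached)
# SESSION_ENGINE = "django.contrib.sessions.backends.cached_db" # cache with database write-through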

Persistence of a large number of objects

I have some code that scrapes data from a website, extracts certain key information, and stores it in an object. I create a couple hundred of these objects each day, each from a unique URL. This is working quite well; however, I'm inexperienced in what options are available to me in Python for persistence and what would be best suited for my needs.
Currently I am using pickle. To do so, I keep all of these webpage objects in a list, appending new ones as they are created, then saving that list to a pickle (and reloading it whenever the list is to be updated). However, as I'm at the order of some GB of data, I'm finding pickle to be somewhat slow. It's not unworkable, but I'm wondering if there is a better-suited alternative. I don't really want to break apart the structure of my objects and store them in an SQL-type database, as it's important for me to keep the methods and the data together as a single object.
Shelve is one option I've been looking into, as my impression is that I then wouldn't have to unpickle and re-pickle all the old entries (just the most recent day that needs to be updated), but I am unsure whether this is how shelve works, and how fast it is.
So, to avoid rambling on, my question is: what is the preferred persistence method for storing a large number of objects (all of the same type), to keep read/write speed up as the collection grows?
Martijn's suggestion could be one of the alternatives.
You may consider storing the pickled objects directly in an SQLite database, which you can still manage from the Python standard library.
Use pickle.dumps()/pickle.loads() (wrapping the raw bytes in sqlite3.Binary) to convert between the database column and the Python object.
You didn't mention the size of each object you are pickling now; I guess it should stay well within SQLite's limits.
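A minimal sketch of that approach, assuming each scraped-page object is picklable and keyed by its URL (the table and column names are made up):

import pickle
import sqlite3

conn = sqlite3.connect("pages.db")
conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, obj BLOB)")

def save_page(url, page_obj):
    # Serialize the object and upsert it under its URL.
    blob = sqlite3.Binary(pickle.dumps(page_obj, protocol=pickle.HIGHEST_PROTOCOL))
    conn.execute("INSERT OR REPLACE INTO pages (url, obj) VALUES (?, ?)", (url, blob))
    conn.commit()

def load_page(url):
    # Fetch and deserialize a single object without touching the rest.
    row = conn.execute("SELECT obj FROM pages WHERE url = ?", (url,)).fetchone()
    return pickle.loads(row[0]) if row else None

Unlike one monolithic pickle, this reads and writes one object at a time, so the cost of an update doesn't grow with the size of the whole collection.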

Using NumPy in Pyramid

I'd like to perform some array calculations using NumPy for a view callable in Pyramid. The array I'm using is quite large (3500x3500), so I'm wondering where the best place to load it is for repeated use.
Right now my application is a single page and I am using a single view callable.
The array will be loaded from disk and will not change.
If the array is something that can be shared between threads, then you can store it in the registry at application startup (config.registry['my_big_array'] = ??). If it cannot be shared, then I'd suggest using a queuing system with workers that always have the data loaded, probably in another process. You could hack this by making the value in the registry a threadlocal and then storing a new array in the variable if one is not there already, but then you will have a copy of the array per thread, and that's really not a great idea for something that large.
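A minimal sketch of the registry approach (the .npy path and the main() wiring are assumptions, not from the question):

import numpy as np
from pyramid.config import Configurator

def main(global_config, **settings):
    config = Configurator(settings=settings)
    # Load once at startup; Pyramid's registry is dict-like, so the shared,
    # read-only array is reachable from any view via request.registry.
    config.registry["my_big_array"] = np.load("data/big_array.npy")
    return config.make_wsgi_app()

# In a view callable:
def my_view(request):
    arr = request.registry["my_big_array"]
    return {"total": float(arr.sum())}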
I would just load it in the obvious place in the code, where you need to use it (in your view, I guess?) and see if you have performance problems. It's better to work with actual numbers than try to guess what's going to be a problem. You'll usually be surprised by the reality.
If you do see performance problems, assuming you don't need a copy for each of multiple threads, try just loading it in the global scope after your imports. If that doesn't work, try moving it into its own module and importing that. If that still doesn't help... I don't know what then.
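A minimal sketch of the separate-module approach (the file and array names are made up):

# big_array.py -- module-level load: Python caches imported modules,
# so the read from disk happens once per process, not once per request.
import numpy as np

DATA = np.load("data/big_array.npy")  # the 3500x3500 array

Any view can then do from big_array import DATA; read-only NumPy operations on the shared array are safe across threads.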

Is there a django idiom to store app-related variables in the DB?

I'm quite new to Django, and moved to it from Drupal.
In Drupal it is possible to define module-level variables (read "application" for Django), which are stored in the DB in one of Drupal's "core tables". The idiom is something like:
variable_set('mymodule_variablename', $value);
variable_get('mymodule_variablename', $default_value);
variable_del('mymodule_variablename');
The idea is that it wouldn't make sense to have each module (app) instantiate a whole "module table" just to store one value, so the core provides a common one to be shared across modules.
To the best of my newbie understanding of Django, it lacks such functionality, but since this is a common pattern, I thought I'd turn to the SO community to check whether there is a typical/standard/idiomatic way Django devs solve this problem.
(BTW: the value is not a constant that I could put in a settings file. It's a value that should be refreshed daily, and should be read at each request).
There are apps to achieve this, but I'd like to recommend django-modeldict from Disqus. As its brief description says:
ModelDict is a very efficient way to store things like settings in your database. The entire model is transformed into a dictionary (lazily) as well as stored in your cache. It's invalidated only when it needs to be (both in process and based on CACHE_BACKEND).
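Usage is roughly as follows (a sketch based on the project's README; Setting is a model you define yourself with key and value fields):

from modeldict import ModelDict

# Hypothetical backing model:
# class Setting(models.Model):
#     key = models.CharField(max_length=64, unique=True)
#     value = models.CharField(max_length=255)

settings = ModelDict(Setting, key="key", value="value", instances=False)
settings["banner_text"] = "Hello world"  # writes through to the DB and cache
settings["banner_text"]                  # reads from the lazily built dict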
Data that is not static is stored in a model. If you need to share data or functions between apps, I have seen the convention of making a shared app, something like 'common', which would house shared models or utility functions.
In the Django projects I have seen, the data is usually specific. The data you are storing should be in a model representative of that data; I would rather have an explicit model/object representing my data than a generic object that houses vastly different data.
If you are only defining one or two variables that change daily, perhaps a key/value store like Memcached would work for you?
Another +1 for ModelDict. Another potential, similar solution is Django Constance:
https://github.com/jazzband/django-constance
It's meant to store app config parameters in the database, and has the advantage that it exposes a nice admin backend to edit them (with the right permissions), handles default values, and also has caching, etc.
EDIT:
In case it's not clear from the documentation (which it isn't), you can set settings in the usual 'Pythonic way'. I.e., to set a setting to a value, you do:
from constance import config
config.variable_name = value
