I have a 200 MB CSV file containing rows in which a key term is matched against a list of strings in the second column:
term_x | ["term_1","term_2"]
term_y | ["term_1","term_2"]
term_z | ["term_1","term_2"]
My Django app is not configured to use any complex memory caching (Redis, Memcached); in practice, I want to pass a term to the database table and retrieve the corresponding list value. Because of the table's size, however, retrieving the list from the correct row takes around half a second, on top of the other actions performed while loading the page.
Is it possible in Django to "pre-cache" this table upon server startup, i.e. add all of those values to the cache with the first column as the key? I have attempted something similar by overriding the ready method in my apps.py to load the database table into the cache on startup, but I get null values when I look up a term I know is in the table:
from django.apps import AppConfig
from django.core.cache import cache
import pandas as pd

class MyAppConfig(AppConfig):
    name = 'test_display'

    def ready(self):
        print("Loading RS Lookup cache...")
        # setup database connection....
        cache_df = pd.read_sql_query("Select * from table_to_cache", engine)
        print("Table loaded")
        for index, row in cache_df.iterrows():
            cache.set(row['term'], row['list_of_terms'], None)
        print("RS Table loaded")
My __init__.py in the same Django app has only one line:
default_app_config = 'test_display.apps.MyAppConfig'
Check that the following holds:
In your project settings you either did not configure caching at all, or you configured the local-memory cache as described in the documentation.
You only use the default cache (from django.core.cache import cache) or correctly handle cache names.
Make sure your code in .ready() actually stores the values you are trying to read later. You can check with one of the following:
assert "my_term" in cache, "The term is not cached!"
or
from django.core.cache.backends import locmem
print(locmem._caches)
# now check what you have inside using your very own eyes and patience
As for the following:
Is it possible in Django to "pre-cache" ... ?
Your solution uses AppConfig.ready(), which is generally a very good place for activities your server should perform only once per instance. At least I am not aware of a better solution.
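For reference, an explicit local-memory cache configuration in settings.py could look like the following (the LOCATION string is illustrative). Keep in mind that LocMemCache is per-process, so .ready() only populates the cache of the worker process it runs in:

```python
# settings.py -- explicit local-memory cache. Django falls back to LocMemCache
# when CACHES is absent, but declaring it makes the backend choice visible.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "rs-lookup",  # illustrative name; any unique string works
    }
}
```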
Related
I have a Python app split across different files. One of them, models.py, contains, among PyQt5 table models, several maps referenced from several PyQt5 form files:
# first lines:
agents_id_map = \
{agent.name:agent.id for agent in db.session.query(db.Agent, db.Agent.id)}
# ....
# ~2,000 more lines like this
I want to keep this kind of map centralized in a single place. I'm also using SQLAlchemy; the Agent class is defined in db.py. I use these maps to fill in the foreign key on another object, say an invoice, like:
invoice = db.Invoice()
# Here is a reference
invoice.agent_id = models.agents_id_map[agent_combo.currentText()]
db.session.add(invoice)
db.session.commit()
The problem is that the models.py module gets cached, so several parts of the application access stale data. If running instance A of the app creates a new agent while running instance B wants to create a new invoice, B won't see the agent created by A unless it restarts. The same happens within a single running instance: a user creates an agent and then wants to create an invoice. My candidate solutions are:
Reload the module, to get the whole code executed again, but this could be very expensive.
Isolate the code that builds those maps in another file, say maps.py, which would be cheaper to reload, and refactor all the code that references it.
Is there a solution that would allow me to touch only the code building those maps and the rest of the application remains ignorant of the change, and every time the map is referenced from another module or even the same, the code gets executed, effectively re-building maps with fresh data?
Is there a solution that would allow me to touch only the code building those maps and the rest of the application remains ignorant of the change, and every time the map is referenced from another module or even the same, the code gets executed, effectively re-building maps with fresh data?
Certainly: put your maps inside a function, or even better, a class.
If I understand this problem correctly, you have stateful data (maps) which need regenerating under some condition (every time they are accessed? Or just every time the db is updated?). I would do something like this:
class Mappings:
    def __init__(self, db):
        self._db = db
        ...  # do any initial db stuff you need to here

    def id_map(self, thing):
        db_thing = getattr(self._db, thing.title())
        return {x.name: x.id for x in self._db.session.query(db_thing, db_thing.id)}

    def other_property_map(self, prop):
        ...  # etc

mappings = Mappings(db)
mappings.id_map("agent")
This assumes that the mapping example you've given is your major use-case, but this model could easily be adapted for almost any other mapping you might want.
You would write a method for every kind of 'mapping' you need, and each would return the desired dictionary. Note that I've assumed you set up the db elsewhere and pass a fully initialised db access object to the class, which is probably what you want: this class is just about encapsulating mapper state, not re-inventing your ORM.
Caching
I have not provided any caching. But if you have complete control over the db, it is easy enough to run a hook before any db commit that checks whether you've touched a particular model and, if so, marks its maps as needing a rebuild. Something like this:
class DbAccess(Mappings):
    def __init__(self, db, models):
        super().__init__(db)
        self._cached_map = {model: {} for model in models}

    def db_update(self, model: str, params: dict):
        if model in self._cached_map:
            self._cached_map[model] = {}  # wipe cache
        self._db.update_with_model(model, params)  # dummy fn

    def id_map(self, thing: str):
        try:
            return self._cached_map[thing]["id"]
        except KeyError:
            self._cached_map[thing]["id"] = super().id_map(thing)
            return self._cached_map[thing]["id"]
I don't really think DbAccess should inherit from Mappings: put it all in one class, or have a DB class and a Mappings mixin and inherit from both. I just didn't want to write everything out again.
I've not written any real db access routines (hence my dummy fn), as I don't know how you're doing it (though clearly with an ORM). The basic idea is to handle the caching yourself: store each mapping as it is built, and delete all the stored mappings whenever you commit a transaction involving the model in question (so the cache is rebuilt as needed).
Aside
Note that if you really do have 2,000 lines of manually declared mappings of the form thing.name: thing.id, you should generate them at runtime anyway. Declarative style is all well and good, but writing out 2,000 permutations of the same thing isn't declarative, it's just time-consuming, and it's a job a simple loop could do for you at startup by putting the data in RAM.
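As a sketch of that loop (using a namedtuple stand-in for the ORM rows, since the real db.Agent query isn't shown here):

```python
from collections import namedtuple

# Stand-in for the rows returned by db.session.query(db.Agent, db.Agent.id)
Agent = namedtuple("Agent", ["name", "id"])
rows = [Agent("alice", 1), Agent("bob", 2)]

# One comprehension replaces thousands of hand-written "name: id" lines.
agents_id_map = {agent.name: agent.id for agent in rows}
```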
I am trying to use a column's value as a radio button's choices, using the code below.
forms.py
# retrieving data from the database and assigning it to the diction list
diction = polls_datum.objects.values_list('poll_choices', flat=True)

# initializing list and dictionary
OPTIONS1 = {}
OPTIONS = []

# creating the dictionary with 0 to the number of options given in the list
for i in range(len(diction)):
    OPTIONS1[i] = diction[i]

# creating tuples from the dictionary above
# OPTIONS = zip(OPTIONS1.keys(), OPTIONS1.values())
for i in OPTIONS1:
    k = (i, OPTIONS1[i])
    OPTIONS.append(k)

class polls_form(forms.ModelForm):
    options = forms.ChoiceField(choices=OPTIONS, widget=forms.RadioSelect())

    class Meta:
        model = polls_model
        fields = ['options']
Using a form, I save the choices in a field (poll_choices); when I try to display them on the index page, the changes are not reflected until a server restart.
Can someone help with this, please?
Of course "it is not reflecting until a server restart". That's obvious once you remember that Django server processes are long-running (it's not like PHP, where each script is executed afresh on each request), and that top-level code (code at a module's top level, not inside a function) is executed only once per process, when the module is first imported. As a general rule: don't do ANY db query at a module's top level or at the top level of a class statement. At best you'll get stale data; at worst you'll crash your server process (if you query before everything has been properly set up by Django, or query against a schema change before the migration has been applied).
The possible solutions are either to wait until the form's initialisation to set up your field's choices, or to pass a callable as the formfield's choices option; cf. https://docs.djangoproject.com/en/2.1/ref/forms/fields/#django.forms.ChoiceField.choices
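The "callable" option can be sketched without a running Django project; the point is simply that a zero-argument callable is re-evaluated whenever the field needs its choices, so they track the current data. The names below are placeholders for the real queryset:

```python
# current_polls stands in for the database table; in the real form the
# callable would run the ORM query instead of reading this list.
current_polls = ["red", "blue"]

def poll_choices():
    # In the real form this body would be:
    #   return list(enumerate(polls_datum.objects.values_list('poll_choices', flat=True)))
    return list(enumerate(current_polls))

first = poll_choices()
current_polls.append("green")  # simulates another request adding a row
second = poll_choices()        # fresh data, no server restart needed
```

Passing poll_choices (not poll_choices()) as the choices argument gives Django the function itself, deferring the query until the form is used.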
Also, the way you're building your choices list is needlessly complicated; you could do it as a one-liner:
OPTIONS = list(enumerate(polls_datum.objects.values_list('poll_choices', flat=True)))
but it's also very brittle: you're relying on the current db content and ordering for the choice value, when you should use the polls_datum's pk instead (which is guaranteed to be stable).
And finally: since you're working with what seems to be a related model, you may want to use a ModelChoiceField instead.
For future reference:
What version of Django are you using?
Have you read up on the documentation of ModelForms? https://docs.djangoproject.com/en/2.1/topics/forms/modelforms/
I'm not sure what you're trying to do with diction to dictionary to tuple; I think you could skip a step there, and your future self will thank you for it.
Try to follow some tutorials and understand why certain steps are taken. I can see from your code that you're rather new to coding or Python and that there's room for improvement. I'm not trying to talk you down, but to push you in the direction of becoming a better developer ;-)
REAL ANSWER:
That being said, I think the solution is to write the loading of the data somewhere in your form model, rather than 'loose' in forms.py. See bruno's answer for more information on this.
If you want to reload the data on each request that loads the form, you should create a function that gets called every time the form is loaded (for example in the form's __init__ function).
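The per-instantiation idea is plain Python: move the data fetch from class-definition time into __init__. A minimal sketch (fetch_choices and DATA are stand-ins for the ORM query and the table):

```python
DATA = ["yes", "no"]  # stands in for the database table

def fetch_choices():
    # stands in for the ORM query
    return list(enumerate(DATA))

class PollsForm:
    # class-level state is built once per process: the bug in the question
    STALE_CHOICES = fetch_choices()

    def __init__(self):
        # per-instance state is rebuilt on every instantiation: the fix
        self.choices = fetch_choices()

a = PollsForm()
DATA.append("maybe")   # simulates new data arriving
b = PollsForm()        # sees the new entry; STALE_CHOICES does not
```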
I came across a requirement where I need to save some settings (key, value pairs) in the database, so I created a model with key and value fields. Now I want to access these values. The first thing that came to mind is to get the value from the database. Here is an example:
class Settings(models.Model):
    key = models.CharField(max_length=50)
    value = models.CharField(max_length=100)

    @classmethod
    def get_key_value(cls, key):
        obj = cls.objects.filter(key=key)
        if obj.exists():
            return obj.first().value
        return None

    class Meta:
        db_table = u'app_settings'
I think it is not a good idea to hit the database every time, so I want to save the whole list in a global variable or the session.
How can I store the data in a global?
I don't know whether this is a good approach; please suggest a better way to do it.
Why do you need to save the values in the database when you can access the data in settings.py directly?
from django.conf import settings
print(settings.SOME_VALUE)
So there is no need to save them, I believe.
You need to have a centralized data store that will act as the most reliable source, your database in this case.
You can set the corresponding key/value pairs in your settings file and use them by simply importing the settings.
But keep in mind that on every server restart, the settings file's key/value pairs are reset to their default values.
However, you can also write a function that runs on each server restart and fetches the corresponding database values to populate the settings file values.
In settings file:
keys = get_keys()
key_setter.py:
def get_keys():
    ...  # write the function as per your requirements
Now you may simply import "keys" and use them in your views.
When making any change to the database values, make sure your server restarts; your database must be the only source of truth. You can also add a post-save method to the model that updates the settings file value.
I had a similar requirement where I had to populate allowed_hosts in the settings file dynamically. I achieved this with a function that fetched certain values from the database; it runs on each server restart, and I also added a post-save method that modifies the settings file's allowed_hosts variable.
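The post-save refresh idea can be sketched without Django machinery. In a real project the hook would be a signals.post_save receiver on the Settings model; here save() calls the hook directly so the flow is visible (all names are illustrative):

```python
class SettingsStore:
    def __init__(self):
        self._rows = {}   # stands in for the database table
        self.cached = {}  # stands in for module-level "globals" built from it

    def save(self, key, value):
        self._rows[key] = value
        self._refresh()   # the "post save" hook: keep the cache in sync

    def _refresh(self):
        # rebuild the cached view from the source of truth
        self.cached = dict(self._rows)

store = SettingsStore()
store.save("SITE_NAME", "demo")
```

With this shape there is never a window where the cached values disagree with the database, so no server restart is needed.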
There's something I'm struggling to understand about SQLAlchemy from its documentation and tutorials.
I see how to autoload classes from a DB table, and I see how to design a class and, from it (declaratively or using mapper()), create a table that is added to the DB.
My question is how does one write code that both creates the table (e.g. on first run) and then reuses it?
I don't want to have to create the database with one tool or one piece of code and have separate code to use the database.
Thanks in advance,
Peter
create_all() does not do anything if a table already exists, so just call it as soon as you set up your engine or connection.
(Note that if you change your table schema, create_all() will not update it! So you still need "another program" to do that.)
This is the usual pattern:
from sqlalchemy import create_engine

def createEngine(metadata, dsn, **args):
    engine = create_engine(dsn, **args)
    metadata.create_all(engine)
    return engine

def doStuff(engine):
    res = engine.execute('select * from mytable')
    # etc etc

def main():
    engine = createEngine(metadata, 'sqlite:///:memory:')
    doStuff(engine)

if __name__ == '__main__':
    main()
I think you're perhaps overthinking the situation. If you want to create the database afresh, you normally just call Base.metadata.create_all() or an equivalent; if you don't want to do that, you don't call it.
You could try calling it every time and handling the exception if it goes wrong, assuming that the database is already set up.
Or you could try querying for a certain table and if that fails, call create_all() to put everything in place.
Every other part of your app should work in the same way whether you perform the db creation or not.
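One way to do the "query for a certain table" check is with SQLAlchemy's inspector (Inspector.has_table is available in SQLAlchemy 1.4+). A self-contained sketch against an in-memory SQLite database, with illustrative table and column names:

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, inspect

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
Table("mytable", metadata, Column("id", Integer, primary_key=True))

# First run: the table is missing, so create the whole schema.
if not inspect(engine).has_table("mytable"):
    metadata.create_all(engine)

# Subsequent runs (or the rest of the app) can rely on the table existing.
exists_after = inspect(engine).has_table("mytable")
```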
I am writing a script that requires interacting with several databases (not concurrently). To facilitate this, I am maintaining the db-related information (connections etc.) in a dictionary. As an aside, I am using SQLAlchemy for all interaction with the db; I don't know whether that is relevant to this question or not.
I have a function to set up the pool. It looks somewhat like this:
def setupPool():
    global pooled_objects
    for name in NAMES:
        engine = create_engine("postgresql+psycopg2://postgres:pwd@localhost/%s" % name)
        metadata = MetaData(engine)
        conn = engine.connect()
        tbl = Table('my_table', metadata, autoload=True)
        info = {'db_connection': conn, 'table': tbl}
        pooled_objects[name] = info
I am not sure whether there are any gotchas in the code above, since I am reusing the same variable names and it's not clear (to me at least) how the underlying references to the resources (connections) are handled. For example, will creating another engine (for a different db) and assigning it to the engine variable cause the previous instance to be collected by the GC (since no other code is using that reference yet while the pool is still being set up)?
In short, is the code above OK? And if not, why not; that is, how may I fix it with respect to the issues mentioned above?
The code you have is perfectly good.
Just because you reuse the same variable name does not mean you are overriding (or freeing) the object previously assigned to it; in fact, you can think of the names as temporary labels for your objects.
Since you store the final objects in the global dictionary pooled_objects, the GC is not going to free them until your program exits or you explicitly delete them from there.