Hook every variable reference and execute code - python

I have a Python app split across different files. One of them, models.py, contains, among PyQt5 table models, several maps that are referenced from several PyQt5 form files:
# first lines:
agents_id_map = \
    {agent.name: agent.id for agent in db.session.query(db.Agent)}
# ....
# ... about 2,000 more lines like this
I want to keep these kinds of maps centralized in a single place. I'm also using SQLAlchemy; the Agent class is defined in a db.py file. I use these maps to fill in the foreign key on another object, say an invoice, like:
invoice = db.Invoice()
# Here is a reference
invoice.agent_id = models.agents_id_map[agent_combo.currentText()]
db.session.add(invoice)
db.session.commit()
The problem is that the models.py module gets cached, so several parts of the application access stale data. If one running instance A of the app creates a new agent and another running instance B wants to create a new invoice, B won't see the agent created by A unless the app is restarted. The same happens within a single running instance: a user creates an agent and then wants to create an invoice. My candidate solutions are:
Reload the module to get the whole file executed again, but this could be very expensive.
Isolate the code that builds those maps in another file, say maps.py, which would be cheaper to reload, and update all the code that references it through refactoring.
Is there a solution that would let me touch only the code that builds those maps, leave the rest of the application ignorant of the change, and have the code re-executed every time a map is referenced from another module (or even the same one), effectively rebuilding the maps with fresh data?

Certainly: put your maps inside a function, or even better, a class.
If I understand the problem correctly, you have stateful data (the maps) which needs regenerating under some condition (every time it is accessed? Or just every time the db is updated?). I would do something like this:
class Mappings:
    def __init__(self, db):
        self._db = db
        ...  # do any initial db setup you need to here

    def id_map(self, thing):
        db_thing = getattr(self._db, thing.title())
        return {x.name: x.id for x in self._db.session.query(db_thing)}

    def other_property_map(self, prop):
        ...  # etc.

mappings = Mappings(db)
mappings.id_map("agent")
This assumes that the mapping example you've given is your major use-case, but this model could easily be adapted for almost any other mapping you might want.
You would write a method for every kind of 'mapping' you need, and it would return the desired dictionary. Note that I've assumed you handle setting up the db elsewhere and pass a fully initialised db access object to the class, which is probably what you want to do: this class is about encapsulating mapper state, not re-inventing your ORM.
Caching
I have not provided any caching. But if you have complete control over the db, it is easy enough to run a hook before any db commit that checks whether you've touched a particular model, and mark that model's maps as needing a rebuild. Something like this:
class DbAccess(Mappings):
    def __init__(self, db, models):
        super().__init__(db)
        self._cached_map = {model: {} for model in models}

    def db_update(self, model: str, params: dict):
        if model in self._cached_map:
            self._cached_map[model] = {}  # wipe the cache for this model
        self._db.update_with_model(model, params)  # dummy fn

    def id_map(self, thing: str):
        try:
            return self._cached_map[thing]["id"]
        except KeyError:
            self._cached_map[thing]["id"] = super().id_map(thing)
            return self._cached_map[thing]["id"]
I don't really think DbAccess should inherit from Mappings: put it all in one class, or have a DB class and a Mappings mixin and inherit from both. I just didn't want to write everything out again.
I've not written any real db access routines (hence the dummy fn), as I don't know how you're doing it (clearly with an ORM, though). The basic idea is to handle the caching yourself: store each mapping as it is built, and delete the stored mappings whenever you commit a transaction involving the model in question (thus rebuilding the cache as needed).
Aside
Note that if you really do have 2,000 lines of manually declared mappings of the form thing.name: thing.id, you really should generate them at runtime anyhow. Declarative is all very well and good, but writing out 2,000 permutations of the same thing isn't declarative, it's just time-consuming, and it does a job that a simple loop loading the data into RAM at startup could do for you.
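For instance, a minimal sketch of that startup loop, where mapped_classes is a hypothetical list of the SQLAlchemy models you want maps for:
# Build every name -> id map in one pass at startup.
mapped_classes = [db.Agent, db.Customer, db.Product]  # hypothetical list

id_maps = {
    cls.__name__.lower(): {obj.name: obj.id for obj in db.session.query(cls)}
    for cls in mapped_classes
}

# e.g. id_maps["agent"] replaces the hand-written agents_id_map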

Related

Is it possible to transform one asset into another asset using ops in dagster?

From what I found here, it is possible to use ops and graphs to generate assets.
However, I would like to use an asset as an input for an op. I am exploring it for the following use case:
I fetch a list of country metadata from an external API and store it in my resource:
@dagster.asset
def country_metadata_asset() -> List[Dict]:
    ...
I use this asset to define some downstream assets, for example:
@dagster.asset
def country_names_asset(country_metadata_asset) -> List[str]:
    ...
I would like to use this asset to call another data source to retrieve and validate data, and then write it to my resource. It returns a huge number of rows, so I need to process them in batches, and I thought that a graph with ops would be a better fit. I tried something like this:
@dagster.op(out=dagster.DynamicOut())
def load_country_names(country_names_asset):
    for country_index, country_name in enumerate(country_names_asset):
        yield dagster.DynamicOutput(
            country_name, mapping_key=f"{country_index}_{country_name}"
        )

@dagster.graph()
def update_data_graph():
    country_names = load_country_names()
    country_names.map(retrieve_and_process_data)

@dagster.job()
def run_update_job():
    update_data_graph()
It seems that my approach does not work, and I am not sure if it is conceptually correct. My questions are:
How do I tell Dagster that the input for load_country_names is an asset? Should I manually materialise it inside the op?
How do I efficiently write the augmented data returned from retrieve_and_process_data into my resource? It is not possible to keep all the data in memory, so I thought to implement it using a custom IOManager, but I am not sure how.
It seems to me like the augmented data that's returned from retrieve_and_process_data can (at least in theory) be represented by an asset.
So we can start from the standpoint that we'd like to create some asset that takes in country_names_asset, as well as the source data asset (the thing that has a bunch of rows in it, which we can call big_country_data_asset for now). I think this models the underlying relationships a bit better, independent of how we're actually implementing things.
The question then is how to write the computation function for this asset in a way that doesn't require loading the entire contents of big_country_data_asset into memory at any point in time. While it's possible that you could do this with a dynamic graph, which you then wrap in a call to AssetsDefinition.from_graph, I think there's an easier approach.
Dagster allows you to circumvent the IOManager machinery both when reading an asset as input and when writing an asset as output. In essence, when you set an AssetKey as a non_argument_dep, this tells Dagster that there is some asset upstream of the asset you're defining, but that it will be loaded within the body of the asset function (rather than by Dagster's IOManager machinery).
Similarly, if you set the output type of the function to None, this tells Dagster that the asset you're defining will be persisted by the logic inside of the function, rather than by an IOManager.
Using both these concepts, we can write an asset which at no point needs to have the entire big_country_data_asset loaded.
from dagster import AssetKey, asset

@asset(non_argument_deps={AssetKey("big_country_data_asset")})
def processed_country_data_asset(country_names_asset) -> None:
    for name in country_names_asset:
        # assuming this function actually stores data somewhere,
        # and intrinsically knows how to read from big_country_data_asset
        retrieve_and_process_data(name)
IOManagers are a very flexible concept, however, and it is possible to replicate this same batching behavior while using them (it's just a bit more convoluted). You'd need to do something like create a SourceAsset(key="big_country_data_asset", io_manager_def=my_custom_io_manager), where my_custom_io_manager has a somewhat unusual load_input function which itself returns a function:
from dagster import IOManager

class MyCustomIOManager(IOManager):
    def handle_output(self, context, obj):
        ...  # output side not needed for the input-side trick

    def load_input(self, context):
        def _fn(country_name):
            # however you actually get these rows
            rows = query_source_data_for_name(country_name)
            return rows
        return _fn
Then, you could define your asset like:
@asset
def processed_country_data_asset(
    country_names_asset, big_country_data_asset
) -> None:
    for name in country_names_asset:
        # big_country_data_asset has been loaded as a function
        rows = big_country_data_asset(name)
        process_data(rows)
You can also handle writing the output of this function in an IOManager using a similar-looking trick: https://github.com/dagster-io/dagster/discussions/9772
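For reference, a rough sketch of what that output-side trick could look like; the batch size and the write_batch_to_store helper are hypothetical, and the real pattern is described in the linked discussion:
from itertools import islice

from dagster import IOManager

class BatchWritingIOManager(IOManager):
    """Streams an asset's rows out in fixed-size batches instead of
    holding the whole result in memory."""

    def handle_output(self, context, obj):
        # obj is assumed to be an iterator/generator of rows.
        rows = iter(obj)
        while True:
            batch = list(islice(rows, 1000))
            if not batch:
                break
            write_batch_to_store(batch)  # hypothetical writer

    def load_input(self, context):
        raise NotImplementedError("only the output side is sketched here")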

Solution needed to a scenario

I am trying to use a column's value as a radio button's choices, using the code below.
Forms.py
# retrieving data from database and assigning it to the diction list
diction = polls_datum.objects.values_list('poll_choices', flat=True)

# initializing list and dictionary
OPTIONS1 = {}
OPTIONS = []

# creating the dictionary with keys 0 to the number of options in the list
for i in range(len(diction)):
    OPTIONS1[i] = diction[i]

# creating tuples from the dictionary above
# OPTIONS = zip(OPTIONS1.keys(), OPTIONS1.values())
for i in OPTIONS1:
    k = (i, OPTIONS1[i])
    OPTIONS.append(k)

class polls_form(forms.ModelForm):
    options = forms.ChoiceField(choices=OPTIONS, widget=forms.RadioSelect())

    class Meta:
        model = polls_model
        fields = ['options']
Using a form, I save the choices to a field (poll_choices); but when I display them on the index page, new values don't show up until a server restart.
Can someone help with this, please?
of course "it is not reflecting until a server restart" - that's obvious when you remember that django server processes are long-running processes (it's not like PHP where each script is executed afresh on each request), and that top-level code (code that's at the module's top-level, not in a function) is only executed once per process when the module is first imported. As a general rule: don't do ANY db query at a module's top-level or at the top-level of a class statement - at best you'll get stale data, at worse it will crash your server process (if you're doing query before everything has been properly setup by django, or if you're doing query based on a schema update before the migration has been applied).
The possible solutions are either to wait until the form's initialisation to set up your field's choices, or to pass a callable as the formfield's choices option, cf https://docs.djangoproject.com/en/2.1/ref/forms/fields/#django.forms.ChoiceField.choices
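A rough sketch of the callable option, keeping the names from the question, so the query is re-run instead of frozen at import time:
def get_poll_options():
    # evaluated afresh each time the choices are needed
    return list(enumerate(polls_datum.objects.values_list('poll_choices', flat=True)))

class polls_form(forms.ModelForm):
    options = forms.ChoiceField(choices=get_poll_options, widget=forms.RadioSelect)

    class Meta:
        model = polls_model
        fields = ['options']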
Also, the way you're building your choices list is needlessly complicated; you could do it as a one-liner:
OPTIONS = list(enumerate(polls_datum.objects.values_list('poll_choices', flat=True)))
But it's also very brittle: you're relying on the current db content and ordering for the choice value, when you should use the polls_datum's pk instead (which is guaranteed to be stable).
And finally: since you're working with what seems to be a related model, you may want to use a ModelChoiceField instead.
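A sketch of that, assuming polls_datum is the related model (the default label is str(obj), so you may want to override label_from_instance or the model's __str__ to display poll_choices):
class polls_form(forms.ModelForm):
    options = forms.ModelChoiceField(
        queryset=polls_datum.objects.all(),  # evaluated per form instance
        widget=forms.RadioSelect,
    )

    class Meta:
        model = polls_model
        fields = ['options']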
For future reference:
What version of Django are you using?
Have you read up on the documentation of ModelForms? https://docs.djangoproject.com/en/2.1/topics/forms/modelforms/
I'm not sure what you're trying to do going from diction to a dictionary to tuples; I think you could skip a step there, and your future self will thank you for it.
Try to follow some tutorials and understand why certain steps are being taken. I can see from your code that you're rather new to coding or Python and there's room for improvement. Not trying to talk you down, but I'm trying to push you into the direction of becoming a better developer ;-)
REAL ANSWER:
That being said, I think the solution is to load the data somewhere in your form class rather than 'loose' in forms.py. See bruno's answer for more information on this.
If you want to reload the data on each request that loads the form, you should create a function that gets called every time the form is loaded (for example in the form's __init__ function).
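A minimal sketch of that __init__ approach, again reusing the question's names:
class polls_form(forms.ModelForm):
    options = forms.ChoiceField(choices=[], widget=forms.RadioSelect)

    def __init__(self, *args, **kwargs):
        super(polls_form, self).__init__(*args, **kwargs)
        # rebuilt on every form instantiation, so new poll choices appear
        self.fields['options'].choices = list(
            enumerate(polls_datum.objects.values_list('poll_choices', flat=True))
        )

    class Meta:
        model = polls_model
        fields = ['options']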

Is it possible to create a Django model w/ variable functions as an attribute (passed in either via __init__ or some method)?

I'm creating an app that will generate math problems. They're specific problems where some parameters can be altered, and each problem is different and requires a different method to solve (all of which will be implemented programmatically).
For example:
models.py
import random
from django.db import models

class Problem(models.Model):
    unformattedText = models.TextField()

    def __init__(self, unformattedText, genFunction, *args, **kwargs):
        super(Problem, self).__init__(*args, **kwargs)
        self.unformattedText = unformattedText
        self.genFunction = genFunction

    def genQAPair(self):
        return self.genFunction(self.unformattedText)
views.py
def genP1(text):
    num_1 = random.randrange(0, 100)
    num_2 = random.randrange(0, 100)
    text = text.format(num_1, num_2)
    return {'question': text, 'answer': num_1 - num_2}

def genP2(text, lim=4):
    num_1 = random.randrange(0, lim)
    text = text.format(num_1)
    return {'question': text, 'answer': num_1 * 40}

p1 = Problem(
    unformattedText='Sally has {} apples. Frank takes {}. How many apples does Sally have?',
    genFunction=genP1,
)
p1.save()

p2 = Problem(
    unformattedText='John jumps {} feet into the air. How long does it take for him to age?',
    genFunction=genP2,
)
p2.save()
When I try this, the function isn't actually saved; Django just saves the integer 1. When I instantiate the model the function is there as intended, but apparently only 1 is saved to the database.
Bonus question: I'm actually beginning to question whether or not I even need Django models for this. I'm using Django because it's super easy to get everything onto a webpage. Is there a better way to do this? (Maybe store the text of each problem in a JSON file and the generating functions in some separate script.)
The persistence layer for a Django application is the database, and the database schema is specified by your model definitions. In this case you've only defined a single field in your model, unformattedText; you haven't specified any storage for the corresponding function. Your self.genFunction = genFunction is just creating an attribute on an object in memory; it won't be persisted.
There are various possible ways to store the function. You could store it as raw text; you could store it as a pickle blob; you could store the function path and name (e.g. "my.path.to.problems.genP1"); or do something else. In any case, you'll need to create a database field for that information.
Here is a rough outline of an example solution using the function path:
models.py
class Problem(models.Model):
    unformattedText = models.TextField()
    genPath = models.TextField()
views.py
import importlib
from django.shortcuts import render

def problem_view(request, problem_id):
    problem = Problem.objects.get(id=problem_id)
    gen_path, gen_name = problem.genPath.rsplit(".", 1)
    gen_module = importlib.import_module(gen_path)
    gen_function = getattr(gen_module, gen_name)
    context = gen_function(problem.unformattedText)
    return render(request, 'template.html', context)
Only you can determine if you need to use a database at all. If you only have a few fixed questions then you could just stuff everything into a Python file and be done with it. But there are advantages to using Django's models, including the ability to use the admin.
There are a couple of options, depending on the actual task. I've ordered them from the safest to the most dangerous (but most flexible):
1. Store function identifiers:
You can store genP1 and genP2 as 'genP1' and 'genP2', i.e. by name (or you can use any other unique identifier).
Pros:
You can validate user input and execute trusted code only, because in this case you control almost everything.
You can easily debug your functions, because they are part of your system.
Cons:
You need to define all your functions in the code. That means that if you want to add a new function, you need to redeploy your application.
If you are storing function names, you need to manually import the module (or package) containing the functions and call them.
If you are storing identifiers, you need to define a mapping {identifier: path to actual function} (see the sketch below).
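For illustration, the identifier mapping for option 1 might look like this (reusing names from the question and the genPath field from the answer above; the registry itself is hypothetical):
# problems.py - the trusted generator functions live in code
GENERATORS = {
    'genP1': genP1,
    'genP2': genP2,
}

def generate_qa(problem):
    # problem.genPath is assumed to hold an identifier such as 'genP1'
    gen_function = GENERATORS[problem.genPath]
    return gen_function(problem.unformattedText)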
2. Use a DSL
You can write your own DSL (or use an existing one).
Pros:
You can add new functions at runtime without redeploying the application.
You can control which code users can execute.
You can see source code for your functions.
Cons:
It is hard to write a safe and flexible DSL, especially if you want to call Python code from it.
It is hard to debug huge functions.
3. Serialize them
You can serialize functions using pickle (though see the caveat and sketch below).
Pros:
You can add new functions at runtime without redeploying the application.
Easier than writing your own DSL.
Cons:
Unsafe: you must not execute untrusted code. If you allow users to create their own functions, serialization is not your way; define (or use) a safe DSL instead.
It might be impossible to show source python code for the serialized function. For more information: How can I get the source code of a Python function?
It is hard to debug huge functions.
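One caveat worth spelling out: pickling a plain top-level function only stores a reference to it (module and qualified name), not its code, so the function must still exist in your codebase at load time. A tiny sketch:
import pickle

def genP1(text):
    return {'question': text, 'answer': 42}

blob = pickle.dumps(genP1)      # stores a reference, not the code
restored = pickle.loads(blob)   # works only because genP1 is importable
assert restored is genP1

# Serializing the function body itself would need a third-party library
# such as dill (an assumption about what fits your deployment).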
4. Just store actual Python code
Just store the Python source code in the DB as a string (a sketch follows the pros and cons below).
Pros:
You can add new functions at runtime without redeploying the application.
You can see the source code without any additional processing.
Easier than writing your own DSL.
Cons:
Unsafe: you must not execute untrusted code. If you allow users to create their own functions, storing source code is not your way; define (or use) a safe DSL instead.
It is hard to debug huge functions.
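And for completeness, a deliberately minimal sketch of option 4; the source_code field and the generate entry point are assumptions, and again, never run untrusted input this way:
# problem.source_code is assumed to be a TextField whose contents
# define a function named generate(text).
namespace = {}
exec(problem.source_code, namespace)
gen_function = namespace['generate']
qa_pair = gen_function(problem.unformattedText)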

python load from shelve - can I retain the variable name?

I'm teaching myself how to write a basic game in Python (text based, not using pygame). (Note: I haven't actually gotten to the "game" part per se, because I wanted to make sure I have the basic core structure figured out first.)
I'm at the point where I'm trying to figure out how I might implement a save/load scenario, so a game session could persist beyond a single running of the program. A bit of searching suggests pickling or shelving as the best solutions.
My test scenario is saving and loading a single instance of a class. Specifically, I have a class called Characters(), and (for testing's sake) a single instance of that class assigned to a variable called pc. Instances of the Characters class have an attribute called name, originally set to "DEFAULT" but updated from user input at the initial setup of a new game. For example:
class Characters(object):
    def __init__(self):
        self.name = "DEFAULT"

pc = Characters()
pc.name = "Bob"
I also have (or will have) a large number of functions that refer to various instances using the variables they are assigned to. A made-up, simplified example:
def print_name(character):
    print character.name

def run():
    print_name(pc)

run()
I plan to have a save function that will pack up the pc instance (among other info) with their current info (ex: with the updated name). I also will have a load function that would allow a user to play a saved game instead of starting a new one. From what I read, the load could work something like this:
# assuming info was saved to a file called "save1"
# assuming the pc instance was shelved with "pc" as the key
import shelve
mysave = shelve.open("save1")
pc = mysave["pc"]
My question is: is there a way for the shelve load to "remember" the variable name associated with the instance, and automatically do that pc = mysave["pc"] step? Or a way for me to store that variable name as a string (e.g. as the key) and somehow use that string to create the variable with the correct name (pc)?
I will need to "save" a LOT of instances, and can automate that process with a loop, but I don't know how to automate the unloading to specific variable names. Do I really have to re-assign each one individually and explicitly? I need to assign the instances back to the appropriate variable names because I have a bunch of core functions that refer to specific instances by variable name (like the example above).
Ideas? Is this possible, or is there an entirely different solution that I'm not seeing?
Thanks!
~ribs
Sure, it's possible to do something like that. Since a shelf itself is like a dictionary, just save all the character instances in a real dictionary inside it, using their variable's name as the key. For example:
class Character(object):
    def __init__(self, name="DEFAULT"):
        self.name = name

pc = Character("Bob")

def print_name(character):
    print character.name

def run():
    print_name(pc)

run()

import shelve
mysave = shelve.open("save1")
# save all Character instances without the default name
mysave["all characters"] = {varname: value for varname, value in
                            globals().iteritems() if
                            isinstance(value, Character) and
                            value.name != "DEFAULT"}
mysave.close()

del pc

mysave = shelve.open("save1")
globals().update(mysave["all characters"])
mysave.close()

run()
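As a design note, an alternative that avoids touching globals() is to keep every character in one registry dict and shelve that; a sketch in the same Python 2 style:
characters = {"pc": Character("Bob")}  # one registry instead of loose variables

mysave = shelve.open("save1")
mysave["all characters"] = {key: value for key, value in
                            characters.iteritems()
                            if value.name != "DEFAULT"}
mysave.close()

mysave = shelve.open("save1")
characters = dict(mysave["all characters"])
mysave.close()

print_name(characters["pc"])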

Python - caching a property to avoid future calculations

In the following example, cached_attr is used to get or set an attribute on a model instance when a database-expensive property (related_spam in the example) is called. In the example, I use cached_spam to save queries. I put print statements at the points where values are set and gotten so that I could test it out. I tested it in a view by passing an Egg instance into the view and using {{ egg.cached_spam }}, as well as other methods on the Egg model that call cached_spam themselves. The shell output in Django's development server showed that the attribute cache was missed several times, and successfully hit several times. It seems to be inconsistent: with the same data, when I made small changes (as little as changing a print statement's string) and refreshed, different numbers of misses and hits occurred. How and why is this happening? Is this code incorrect or highly problematic?
class Egg(models.Model):
    # ... fields

    @property
    def related_spam(self):
        # Each time this property is called the database is queried (expected).
        return Spam.objects.filter(egg=self).all()  # Spam has foreign key to Egg.

    @property
    def cached_spam(self):
        # This should call self.related_spam the first time, and then return
        # cached results every time after that.
        return self.cached_attr('related_spam')

    def cached_attr(self, attr):
        """This method (normally attached via an abstract base class, but put
        directly on the model for this example) attempts to return a cached
        version of a requested attribute, and calls the actual attribute when
        the cached version isn't available."""
        try:
            value = getattr(self, '_p_cache_{0}'.format(attr))
            print('GETTING - {0}'.format(value))
        except AttributeError:
            value = getattr(self, attr)
            print('SETTING - {0}'.format(value))
            setattr(self, '_p_cache_{0}'.format(attr), value)
        return value
Nothing wrong with your code, as far as it goes. The problem probably isn't there, but in how you use that code.
The main thing to realise is that model instances don't have identity. That means that if you instantiate an Egg object somewhere, and a different one somewhere else, even if they refer to the same underlying database row they won't share internal state. So calling cached_attr on one won't cause the cache to be populated in the other.
For example, assuming you have a RelatedObject class with a ForeignKey to Egg:
my_first_egg = Egg.objects.get(pk=1)
my_related_object = RelatedObject.objects.get(egg__pk=1)
my_second_egg = my_related_object.egg
Here my_first_egg and my_second_egg both refer to the database row with pk 1, but they are not the same object:
>>> my_first_egg.pk == my_second_egg.pk
True
>>> my_first_egg is my_second_egg
False
So, filling the cache on my_first_egg doesn't fill it on my_second_egg.
And, of course, objects won't persist across requests (unless they're specifically made global, which is horrible), so the cache won't persist either.
HTTP servers that scale are shared-nothing; you can't rely on anything being a singleton. To share state, you need to connect to a special-purpose service.
Django's caching support is appropriate for your use case. It isn't necessarily a global singleton either; if you use locmem://, it will be process-local, which could be the more efficient choice.
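A hedged sketch of what that could look like with Django's cache framework; the key scheme and timeout here are assumptions:
from django.core.cache import cache

class Egg(models.Model):
    # ... fields

    @property
    def cached_spam(self):
        key = 'egg:{0}:related_spam'.format(self.pk)
        spam = cache.get(key)
        if spam is None:
            # list() forces the queryset so a stable value is cached
            spam = list(Spam.objects.filter(egg=self))
            cache.set(key, spam, timeout=300)
        return spam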
