"Annotating" querysets with model function returns - python

Basically I want to do something similar to annotating a queryset, but with the return value of a model method attached to each result.
Currently I have something like:
objs = WebSvc.objects.all().order_by('content_type', 'id')
for o in objs:
    o.state = o.cast().get_state()
where get_state() is a function in the model that calls an external web service. I don't want to go down the road of caching the values. I was just hoping for a more succinct way of doing this.

One way to do this, using python properties:
class WebSvc(models.Model):
    ...
    def _get_state(self):
        return self.cast().get_state()
    state = property(_get_state)
Advantages: will only run when the property is needed.
Possible disadvantage: when you access the property multiple times, the web service will be called anew each time (this may be the behaviour you want, but I doubt it). You can cache the result using memoization.
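For instance, a minimal sketch of the memoized variant, assuming Python 3.8+ for functools.cached_property (django.utils.functional.cached_property behaves the same on older versions):

from functools import cached_property

class WebSvc(models.Model):
    ...
    @cached_property
    def state(self):
        # The web service is hit on first access only; the result is
        # then stored on the instance and reused.
        return self.cast().get_state()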
Another way: just override __init__:
class WebSvc(models.Model):
    ...
    def __init__(self, *args, **kwargs):
        super(WebSvc, self).__init__(*args, **kwargs)
        self.state = self.cast().get_state()
Advantages: Will only be computed once per instance without need for memoization.
Possible disadvantage: will be calculated for each instantiated object.
In most typical Django cases, however, you will only read an object's properties once, and you will probably not instantiate objects on which you never use .state. So in these cases the two approaches perform more or less the same.

Related

Parallelize independent class attributes setting

The part of my code that I need to parallelize is something like this:
for ClassInstance in ClassInstancesList:
    ClassInstance.set_attributes(arguments)
The method set_attributes has no return value; it just sets attributes on the class instance.
I tried using multiprocessing and concurrent.futures, but both of them make a copy of the class instance, which is not what I want.
The fixes that I saw (returning self, returning all the attributes and using another method to set them, or using multiprocessing.Value) would either make copies of a large number of lists of lists or force me to change the methods in my class in a way that makes it very difficult to read (set_attributes actually calls various methods set_attribute_A, set_attribute_B, etc.).
In my case the threads can be completely independent.
EDIT: Here is my attempt at a minimal reproducible example:
class Object:
    def __init__(self, initial_attributes):
        self.attributes1 = initial_attributes

    def update(self, attributes):
        self.attributes1.append(attributes)

    def set_attributes2(self, args):
        # Computations based on attributes1 and args; in the real code many
        # other similar private methods are called.
        self._set_attribute(args)

def detect_and_fill_Objects(args):
    ObjectList = detect(args)  # other function which initializes instances and updates them
    # At this point the instances only have attributes1 set.
    # The following loop is the one I want to parallelize, the one that
    # sets attributes2.
    for obj in ObjectList:
        obj.set_attributes2(args)
When I ran the code using multiprocessing there was a great speed-up, but all the computations were lost because they were done on copies of the instances and not on the instances themselves. I therefore believe a decent speed-up could be obtained if the updates happened in place.
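A hedged sketch of one way to keep the updates in place: multiprocessing pickles each instance into the worker process, so mutations land on copies, whereas a thread pool shares memory and mutates the original objects. This only yields a speed-up if set_attributes2 releases the GIL (I/O-bound work, NumPy calls, etc.); the names mirror the example above, and detect is the question's own helper.

from concurrent.futures import ThreadPoolExecutor

def detect_and_fill_Objects_threaded(args):
    ObjectList = detect(args)  # same helper as in the example above
    # Threads share the interpreter's memory, so each worker mutates
    # the real instance rather than a pickled copy.
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(obj.set_attributes2, args) for obj in ObjectList]
        for future in futures:
            future.result()  # re-raises any exception from a worker
    return ObjectList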

Cleanest way to reset a class attribute

I have a class that looks like the following
class A:
    communicate = set()

    def __init__(self):
        pass
    ...
    def _some_func(self):
        ...some logic...
        self.communicate.add(some_var)
The communicate variable is shared among the instances of the class. I use it to provide a convenient way for instances of this class to communicate with one another (they need some mild orchestration and I don't want to force the calling object to serve as an intermediary for this communication). However, I realized this causes problems when I run my tests. Since the Python interpreter is the same throughout all the tests, I won't get a "fresh" A class for each test, and as such the communicate set will be the aggregate of everything any test adds to it (in normal usage this is exactly what I want, but in testing I don't want interactions between my tests).

Furthermore, down the line this will also cause problems in my code execution if I want to loop over my whole process multiple times, because I won't have a way of resetting this class variable.
I know I can fix this issue where it occurs by having the creator of the A objects do something like
A.communicate = set()
before it creates and uses any instances of A. However, I don't really love this because it forces my caller to know some details about the communication pathways of the A objects, and I don't want that coupling. Is there a better way for me to reset the communicate class variable? Perhaps some method I could call on the class instead of an instance, like A.new_batch(), that would perform this resetting? Or is there a better way I'm not familiar with?
Edit:
I added a class method like
class A:
    communicate = set()

    def __init__(self):
        pass
    ...
    @classmethod
    def new_batch(cls):
        cls.communicate = set()

    def _some_func(self):
        ...some logic...
        self.communicate.add(some_var)
and this works with the caller running A.new_batch(). Is this the way it should be constructed and called, or is there a better practice here?
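That construction is reasonable. For the test-isolation part specifically, one common pattern (sketched here with pytest, assuming that is your test runner) is an autouse fixture, so no test can forget the reset:

import pytest

@pytest.fixture(autouse=True)
def fresh_communicate():
    # Reset the shared set before each test so state cannot
    # leak from one test into the next.
    A.new_batch()
    yield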

Best place to put a "saveAll" method/function

I have a Model class that is structured as follows:
class Item(models.Model):
    place = models.CharField(max_length=40)
    item_id = models.IntegerField()
    # etc.

    @classmethod
    def from_obj(cls, obj):
        i = cls()
        # populate i from the json data, which needs a lot of
        # translations applied before saving the data
        return i
So, from the above I have a way to create an Item instance via the from_obj classmethod. Now I want to process a json array that contains about 100 objects to create those 100 Item instances. Where should this be put? The function/method would look something like this:
def save_all():
    objects = requests.get(...).json()
    for obj in objects:
        item = Item.from_obj(obj)
        item.save()
Should this be a staticmethod within Item? A plain function outside it? Or should another class be created to 'manage' the Item? What is usually considered best practice for a pattern like this?
If it was pure Python, I'd say either use a plain function or make it a classmethod.
Now this is django's ORM, and the convention for "table-level" operations here is to put them in a custom ModelManager class.
EDIT: Except that in your example there's an outgoing HTTP request, which suggests it might be better as a plain function - it's closer to a view or management command than to a ModelManager method.
Why does Python encourage one way while Django encourages another?
Python itself doesn't "encourage" anything special; a plain function or a classmethod are just the obvious choices - it doesn't make sense to make this an instance method (you don't need an instance here) nor a staticmethod (since you do use the class).
Django's ORM is, well, an ORM - a framework that tries to provide a Python representation of the (rather formal and specialized) domain of relational databases. Separating row-level operations (model class methods) from table-level operations (ModelManager classes) makes sense in this context.
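For illustration, a minimal sketch of that convention (the manager name and method name here are made up):

class ItemManager(models.Manager):
    def create_from_api(self):
        # Table-level operation: fetch the payload and persist all rows.
        objects = requests.get(...).json()
        return self.bulk_create([self.model.from_obj(obj) for obj in objects])

class Item(models.Model):
    ...
    objects = ItemManager()

# Usage: Item.objects.create_from_api()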
The bulk_create method is ideal for this, you can add an additional classmethod that calls the API and bulk creates all the objects
@classmethod
def bulk_create_from_api(cls):
    objects = requests.get(...).json()
    cls.objects.bulk_create([cls.from_obj(obj) for obj in objects])
Used like this:
Item.bulk_create_from_api()
Please note: bulk_create does not send any signals like pre_save or post_save

How do I fix bi-directional dependencies between ItemManager and ItemValidator classes?

I have a bi-directional relationship between two classes, ItemManager and ItemValidator, where an ItemManager has a list of ItemValidators but each ItemValidator also takes an instance of the ItemManager it belongs to, so that it can use methods from the ItemManager.
Is this bad practice? If so, what would be a better way to do it?
class ItemValidator:
    def __init__(self, item_manager):
        self.item_manager = item_manager

    def run(self, new_item):
        raise NotImplementedError()

class ItemValidatorImpl(ItemValidator):
    def run(self, new_item):
        # Here is the issue: the validator needs methods from the ItemManager
        existing_items = self.item_manager.list_items()
        # ... validate the new item ...

class ItemManager:
    validator_classes = [ItemValidatorImpl]

    def run_validations(self, new_item):
        for validator_class in self.validator_classes:
            validator = validator_class(self)
            validator.run(new_item)

    def list_items(self):  # Method used by some validator implementations
        pass
Having a bi-directional relationship is OK. It's not OK if it hurts you in some way.
You have to consider whether you are violating the Single Responsibility Principle by putting the Validators inside the Manager, given that they need to call methods on the Manager to do their work. This means that some methods you call on the Manager will require validation while others are only used to query data for the validation. What happens if a Validator calls a method on the Manager that itself requires validation and you end up with infinite recursion?
Breaking the Manager into smaller objects may be a better approach. You can still have the Manager and the Validators depend on the new objects, while removing the Validators' dependency on the Manager.
You can divide methods into two categories: Commands and Queries. Commands perform mutations and Queries don't. Validation should not mutate any state, so you will probably only need Queries.
You can add another object that is designed to do only queries and pass it to the validators.
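A hedged sketch of that idea; the ItemQueries name is invented for illustration:

class ItemQueries:
    """Read-only access to the data the validators need."""
    def list_items(self):
        ...

class ItemValidatorImpl:
    def __init__(self, queries):
        self.queries = queries  # depends only on the query object, not on the Manager

    def run(self, new_item):
        existing_items = self.queries.list_items()
        # ... validate the new item ...

class ItemManager:
    def __init__(self):
        queries = ItemQueries()
        self.validators = [ItemValidatorImpl(queries)]

    def run_validations(self, new_item):
        for validator in self.validators:
            validator.run(new_item)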
If you are building a tree structure you will probably have a parent node containing a collection of children and a child node holding a reference to its parent. In that case the bi-directional relationship is fine.
Sometimes it's difficult to redesign your solution when you are used to having one Manager class doing all the work, and breaking it into smaller objects can be hard. "Manager" is a broad term and we overuse it: we stick everything related to something (like Order) into one object, and that can get us into trouble.

Need to combine classes from two python modules into a single class

I need to combine Classes from two separate Python modules (which are similar in purpose but with different Methods) into a single Class so that the Methods can be accessed from the same object in a natural way both in code and for automatic documentation generation.
I am currently accomplishing the former but not the latter with the following code (this is not verbatim, as I can't share my actual source, but there's nothing different here that would impact the conversation).
Basically, I am creating the new class via a function that subclasses the two parent Classes and returns an instance of the combined Class.
def combine(argone, argtwo):
    """
    Combine Classes
    """
    _combined_arg = "some_string_%s_%s" % (argone, argtwo)
    _temp = type('Temp', (ModuleOne, ModuleTwo), dict())
    # The two classes have an identical constructor signature in their
    # __init__() methods, so we can call it with our combined arg.
    self = _temp(_combined_arg)
    # Return the object we've instantiated off of the combined class
    return self
This method works fine for producing an object that lets me call Methods from either of the original Classes, but my IDE can't auto-complete Method names nor can documentation generators (like pdoc) produce any documentation beyond our combine() function.
This process is necessary because we are generating code off of other code (descriptive, I know, sorry!) and it isn't practical to combine them upstream (i.e., by hand).
Any ideas?
Thank you in advance!!!
ADDENDUM:
What I can say about what we are doing here is that we're just combining client Methods generated off of REST API endpoints that happen to be split into two, non-overlapping, namespaces for practical reasons that aren't important to this discussion. So that's why simply dropping the methods from ModuleTwo into ModuleOne would be all that needs doing.
If there are suggestions on an automatable and clean way to do this before shipping either module, I am definitely open to hearing them. Not having to do this work would be far preferable. Thanks!
There is no need for combine to define a new class every time it is called.
class CombinedAPI(APIOne, APITwo):
    @classmethod
    def combine(cls, arg_one, arg_two):
        arg = "some_string_%s_%s" % (arg_one, arg_two)
        return cls(arg)
obj = CombinedAPI.combine(foo, bar)
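Because CombinedAPI is now an ordinary statically defined class, IDEs and documentation generators such as pdoc can follow the normal method resolution order to find the inherited methods, which addresses the auto-completion and documentation problem from the question.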
