Names of instances and loading objects from a database - python

For example, I have the following class structure.
class Company(object):
    Companycount = 0
    _registry = {}
    def __init__(self, name):
        Company.Companycount += 1
        self._registry[Company.Companycount] = [self]
        self.name = name
k = Company("a firm")
b = Company("another firm")
Whenever I need the objects I can access them by using
Company._registry
which gives out a dictionary of all instances.
Do I need meaningful variable names for my objects, given that the name of the company is stored as an attribute and I can iterate over Company._registry?
When loading the data from the database, does it matter what the name of the instance variable (here k and b) is? Or can I just use arbitrary names?

Both your Company._registry and the names k and b are just references to your actual instances. Neither plays any role in what you'd store in the database.
Python's object model has all objects living on a big heap, and your code interacts with the objects via such references. You can make as many references as you like, and objects are automatically deleted when there are no references left. See the excellent Facts and myths about Python names and values article by Ned Batchelder.
You need to decide for yourself whether the Company._registry structure needs to be keyed by name or not. Iterating over a list is slow if you already know the name of the company you want to access, whereas a dictionary keyed by name gives you direct access.
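As a minimal sketch of the dictionary option, the registry could be keyed by company name instead of by a counter. This assumes company names are unique, which the question does not state; the lookup at the end is only for illustration.

```python
class Company(object):
    # Maps company name -> instance (assumption: names are unique).
    _registry = {}

    def __init__(self, name):
        self.name = name
        Company._registry[name] = self


# No variable names are needed; the registry holds the references.
Company("a firm")
Company("another firm")

# Direct lookup by name instead of scanning every instance.
firm = Company._registry["a firm"]
names = sorted(Company._registry)
```

With this layout the variables k and b from the question become optional, since every instance stays reachable through the registry.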
If you are going to use an ORM, then you don't really need that structure anyway. Leave it to the ORM to help you find your objects, or give you a sequence of all objects to iterate over. I recommend using SQLAlchemy for this.

The variable name doesn't matter, but if you are going to create a lot of objects you will still want to pick names that are reasonable in some way.

Related

Using slots and "constants" in Python

I'm working with some LDAP data in python (I'm not great at Python) and trying to organize a class object to hold the LDAP variables. Since it's LDAP data, the end result will be many copies of the same data structure (per user) collected in an iterable list.
I started with hard-coded attribute names for the __slots__, which seemed to be working, but as things progressed I realized those LDAP attributes ought to be immutable constants of some sort to minimize hard-coded text and typos. I assigned variables to the __slots__ attributes, but it seems this is not such a workable plan:
AttributeError: 'LDAP_USER' object has no attribute 'ATTR_GIVEN_NAME'
Now that I think about it, I'm not actually creating immutable "constants" with the ATTR_ definitions so those values could theoretically be changed during runtime. I can see why Python might be having a problem with this design.
What is a better way to reduce the usage of hard coded text in the code while maintaining a class object which can be instantiated?
ATTR_DN = 'dn'
ATTR_GIVEN_NAME = 'givenName'
ATTR_SN = 'sn'
ATTR_EMP_TYPE = 'employeeType'
class LDAP_USER (object):
    __slots__ = ATTR_GIVEN_NAME, ATTR_DN, ATTR_SN, ATTR_EMP_TYPE
user = LDAP_USER()
user.ATTR_GIVEN_NAME = "milton"
user.ATTR_SN = "waddams"
user.ATTR_EMP_TYPE = "collating"
print ("user is " + user.ATTR_GIVEN_NAME)
With __slots__ defined as [ATTR_GIVEN_NAME, ATTR_DN], the attributes should be referenced using user.givenName and user.dn, since those are the string values in __slots__.
If you want to actually reference the attribute as user.ATTR_GIVEN_NAME then that should be the value in the __slots__ array. You can then add a mapping routine to convert the object attributes to LDAP fields when performing LDAP operations with the object.
Referencing a property not in __slots__ will generate an error, so typos will be caught at runtime.
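A small sketch of the behaviour described above, using a trimmed-down version of the question's class (the name LdapUser and the two-attribute slot list are only for illustration). The slot names are the string values of the constants, so attribute access goes through those values, and getattr/setattr let the constants be used without retyping the strings.

```python
ATTR_GIVEN_NAME = 'givenName'
ATTR_SN = 'sn'


class LdapUser(object):
    # The slots are named by the *values* of the constants.
    __slots__ = (ATTR_GIVEN_NAME, ATTR_SN)


user = LdapUser()

# Direct access uses the string value stored in __slots__ ...
user.givenName = "milton"
# ... while setattr/getattr let the constant stand in for the literal.
setattr(user, ATTR_SN, "waddams")
surname = getattr(user, ATTR_SN)

# An attribute name that is not in __slots__ is rejected.
try:
    user.ATTR_GIVEN_NAME = "milton"
    rejected = False
except AttributeError:
    rejected = True
```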

Why shouldn't one dynamically generate variable names in python?

Right now I am learning Python and struggling with a few concepts of OOP, one of them being how difficult it is (to me) to dynamically initialize class instances and assign them to dynamically generated variable names, and why I keep reading that I shouldn't do that in the first place.
In most threads with a similar direction, the answer seems to be that it is un-Pythonic to do that.
For example: "generating variable names on fly in python".
Could someone please elaborate?
Take the typical OOP learning case:
LOE = ["graham", "eric", "terry_G", "terry_J", "john", "carol"]
class Employee():
    def __init__(self, name, job="comedian"):
        self.name = name
        self.job = job
Why is it better to do this:
employees = []
for name in LOE:
    emp = Employee(name)
    employees.append(emp)
and then
for emp in employees:
    if emp.name == "eric":
        print(emp.job)
instead of this
for name in LOE:
    globals()[name] = Employee(name)
and
print(eric.job)
Thanks!
If you dynamically generate variable names, you don't know what names exist, and you can't use them in code.
globals()[some_unknown_name] = Foo()
Well, now what? You can't safely do this:
eric.bar()
Because you don't know whether eric exists. You'll end up having to test for eric's existence using dictionaries/lists anyway:
if 'eric' in globals(): ...
So just store your objects in a dictionary or list to begin with:
people = {}
people['eric'] = Foo()
This way you can also safely iterate one data structure to access all your grouped objects without needing to sort them from other global variables.
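Applied to the Employee example from the question, a dictionary keyed by name might look like the sketch below. The name "michael" is a made-up key used only to show a safe lookup of something that does not exist.

```python
class Employee(object):
    def __init__(self, name, job="comedian"):
        self.name = name
        self.job = job


LOE = ["graham", "eric", "terry_G", "terry_J", "john", "carol"]

# One dictionary holds every employee, keyed by name.
employees = {name: Employee(name) for name in LOE}

# Direct lookup, no loop and no globals() needed.
eric_job = employees["eric"].job

# .get() returns None for an unknown name instead of raising NameError.
missing = employees.get("michael")

# Iterating touches only employees, never unrelated global variables.
all_jobs = [emp.job for emp in employees.values()]
```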
globals() gives you a dict which you can put names into. But you can equally make your own dict and put the names there.
So it comes down to the idea of "namespaces," that is the concept of isolating similar things into separate data structures.
You should do this:
employees = {}
employees['alice'] = ...
employees['bob'] = ...
employees['chuck'] = ...
Now if you have another part of your program where you describe parts of a drill, you can do this:
drill['chuck'] = ...
And you won't have a name collision with Chuck the person. If everything were global, you would have a problem. Chuck could even lose his job.

ZODB multiple object references

I'm developing an application that is used to fill in some huge forms. There are several projects that a form can belong to. The form also has two sections that can be filled in many times, objectives and activities, so a form can have many objectives and activities defined.
I have a class to represent the projects, another for the form, and two simple classes to represent the objectives and activities. Project has a list of forms, and Form has a list of activities and objectives.
class Project(persistent.Persistent):
    forms = PersistentList()
    ...
class Form(persistent.Persistent):
    objectives = PersistentList()
    activities = PersistentList()
    ...
My question is this. I'm planning on storing this data in ZODB like this:
db['projects'] = OOBTree()
db['forms'] = OOBTree()
db['activities'] = OOBTree()
db['objectives'] = OOBTree()
project = Project(...)  # fill data with some parameters
form = Form(...)  # fill data with some parameters
objective1 = Objective(...)
objective2 = Objective(...)
activity1 = Activity(...)
activity2 = Activity(...)
form.addObjective(objective1)
form.addObjective(objective2)
form.addActivity(activity1)
form.addActivity(activity2)
project.addForm(form)
db['projects']['projectID'] = project
db['forms']['formID'] = form
db['activities']['activityID'] = activity1
db['activities']['activityID'] = activity2
db['objectives']['objectiveID'] = objective1
db['objectives']['objectiveID'] = objective2
transaction.commit()
I know that when storing the project, the list of forms gets persisted as well, and so do the corresponding lists of objectives and activities from the form.
But what happens in the case of the other OOBTrees, 'forms', 'activities' and 'objectives'?
I'm doing this to make it easier to traverse or look up individual forms/objectives/activities. But I'm not sure whether ZODB will cache those objects and persist them only once when saving the project, keeping a reference to that object, so that when any of them is modified, all references see the update.
Meaning that when doing db['forms']['formID'] = form, the 'forms' OOBTree will point to the same object as the one reachable from the 'projects' OOBTree, and the same object is not persisted twice.
Is that the way it works? Or will I get duplicated persisted objects that are all independent instances?
I know that there's repoze.catalog to handle indexing and so on, but I don't need that much, just the ability to access a form without having to iterate over projects.
Thanks!
Yes, as long as the target objects you are storing have classes that subclass persistent.Persistent somewhere in their inheritance, any references to the same object will point to exactly the same (persistent) object. You should not expect duplication as you have described this.
The short version of the long explanation: ZODB uses special pickling techniques. When serializing the source (referencing) object, it sees that the reference is to a persistent object, and instead of storing that object again, it stores a tuple of the class dotted name and the internal OID of the target object.
Caveat: this only works within the same object database. You should not have cross-database references in your application.
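The idea of storing a reference instead of a copy can be illustrated with the standard library's pickle module, whose persistent_id / persistent_load hooks support this style of serialization. This is only a toy sketch, not ZODB's actual serializer: the Form class, the integer oid, the in-memory store dictionary, and the dump/load helpers are all assumptions made for the example.

```python
import io
import pickle


class Form(object):
    def __init__(self, oid, title):
        self.oid = oid      # stand-in for a database-assigned object id
        self.title = title


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a (class name, oid) reference instead of the Form's state.
        if isinstance(obj, Form):
            return ("Form", obj.oid)
        return None  # everything else is pickled normally


class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store

    def persistent_load(self, pid):
        # Resolve the reference back to the single stored object.
        _kind, oid = pid
        return self.store[oid]


store = {1: Form(1, "intake")}        # toy stand-in for the object database
project = {"forms": [store[1]]}       # one container referencing the form
index = {"formID": store[1]}          # a second container, same form


def dump(obj):
    buf = io.BytesIO()
    RefPickler(buf).dump(obj)
    return buf.getvalue()


def load(data):
    return RefUnpickler(io.BytesIO(data), store).load()


loaded_project = load(dump(project))
loaded_index = load(dump(index))
same_object = loaded_project["forms"][0] is loaded_index["formID"]
```

Both containers are serialized with only a reference to the form, so after loading they resolve to the same object rather than to two independent copies.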

Python: Retrieve items from a set

In general, Python sets don't seem to be designed for retrieving items by key. That's obviously what dictionaries are for. But is there any way that, given a key, you can retrieve an instance from a set which is equal to the key?
Again, I know this is exactly what dictionaries are for, but as far as I can see, there are legitimate reasons to want to do this with a set. Suppose you have a class defined something like:
class Person:
    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age
Now, suppose I am going to be creating a large number of Person objects, and each time I create a Person object I need to make sure it is not a duplicate of a previous Person object. A Person is considered a duplicate of another Person if they have the same firstname, regardless of other instance variables. So naturally the obvious thing to do is insert all Person objects into a set, and define a __hash__ and __eq__ method so that Person objects are compared by their firstname.
An alternate option would be to create a dictionary of Person objects, and use a separately created firstname string as the key. The drawback here is that I'd be duplicating the firstname string. This isn't really a problem in most cases, but what if I have 10,000,000 Person objects? The redundant string storage could really start adding up in terms of memory usage.
But if two Person objects compare equal, I need to be able to retrieve the original object so that the additional instance variables (aside from firstname) can be merged in a way required by the business logic. Which brings me back to my problem: I need some way to retrieve instances from a set.
Is there any way to do this? Or is using a dictionary the only real option here?
I'd definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key won't copy it -- the dictionary will simply use the same object. I doubt a dictionary will use significantly more memory than a set.
To actually save memory, add a __slots__ attribute to your classes. This will prevent each of your 10,000,000 instances from having a __dict__ attribute, which will save much more memory than the potential overhead of a dict over a set.
Edit: Some numbers to back my claims. I defined a stupid example class storing pairs of random strings:
import random
def rand_str():
    # Random lowercase string of 3 to 15 characters.
    return str.join("", (chr(random.randrange(97, 123))
                         for i in range(random.randrange(3, 16))))
class A(object):
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()
    def __hash__(self):
        return hash(self.x)
    def __eq__(self, other):
        return self.x == other.x
The amount of memory used by a set of 1,000,000 instances of this class
random.seed(42)
s = set(A() for i in xrange(1000000))
is on my machine 240 MB. If I add
__slots__ = ("x", "y")
to the class, this goes down to 112 MB. If I store the same data in a dictionary
def key_value():
    a = A()
    return a.x, a
random.seed(42)
d = dict(key_value() for i in xrange(1000000))
this uses 249 MB without __slots__ and 121 MB with __slots__.
Yes, you can do this: A set can be iterated over. But note that this is an O(n) operation as opposed to the O(1) operation of the dict.
So, you have to trade off speed versus memory. This is a classic. I personally would optimize for speed here (i.e. use the dictionary), since memory won't run short so quickly with only 10,000,000 objects, and using dictionaries is really easy.
As for additional memory consumption for the firstname string: Since strings are immutable in Python, assigning the firstname attribute as a key will not create a new string, but just copy the reference.
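A minimal sketch of the dictionary approach, combined with __slots__ as suggested earlier in the thread. The merge rule (keep the larger age) and the helper name add_person are made up for the example, since the question does not say what the business logic is.

```python
class Person(object):
    __slots__ = ("firstname", "lastname", "age")

    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age


people = {}  # firstname -> first Person seen with that firstname


def add_person(person):
    # setdefault returns the existing entry, or stores and returns the new one.
    existing = people.setdefault(person.firstname, person)
    if existing is not person:
        # Hypothetical merge rule, for illustration only.
        existing.age = max(existing.age, person.age)
    return existing


first = add_person(Person("Ada", "Lovelace", 36))
second = add_person(Person("Ada", "Byron", 40))

# The dictionary key is the very same string object as the attribute,
# so the name is not stored twice.
key = next(iter(people))
key_is_shared = key is first.firstname
```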
I think you'll have the answer here:
Moving Beyond Factories in Python

App Engine, Cross reference between two entities

I would like to have two types of entities referring to each other, but Python doesn't yet know the name of the second entity class in the body of the first. How should I code this?
class Business(db.Model):
    bus_contact_info_ = db.ReferenceProperty(reference_class=Business_Info)
class Business_Info (db.Model):
    my_business_ = db.ReferenceProperty(reference_class=Business)
If you advise using a reference in only one of them and using the implicitly created property (which is a query object) in the other, then I question the CPU quota penalty of using a query versus directly using get() on a key.
Please advise how to write this code in Python.
Queries are a little slower, and so they do use a bit more resources. ReferenceProperty does not require reference_class. So you could always define Business like:
class Business(db.Model):
    bus_contact_info_ = db.ReferenceProperty()
There may also be better options for your data structure. Check out the modelling relationships article for some ideas.
Is this a one-to-one mapping? If this is a one-to-one mapping, you may be better off denormalizing your data.
Does it ever change? If not (and it is one-to-one), perhaps you could use entity groups and structure your data so that you could just directly use the keys / key names. You might be able to do this by making BusinessInfo a child of Business, then always use 'i' as the key_name. For example:
business = Business().put()
business_info = BusinessInfo(key_name='i', parent=business).put()
# Get business_info from business:
business_info = db.get(db.Key.from_path('BusinessInfo', 'i', parent=business))
# Get business from business_info:
business = db.get(business_info.parent())
