Python caching attributes in object with __slots__


I am trying to cache a computationally expensive property in a class defined with the __slots__ attribute.
Any idea how to store the cache for later use? Of course, the usual way of storing a dictionary in instance._cache would not work without __dict__ being defined. For several reasons I do not want to add a '_cache' string to __slots__.
I was wondering whether this is one of the rare use cases for global. Any thoughts or examples on this matter?

There is no magic possible there: you want to store a value, so you need a place to store your value.
You can't just decide "I won't have an extra entry in my __slots__ because it is not elegant". You don't need to call it _cached:
give it whatever name you want, but these cached values are something you want to exist in each of the object's instances, and therefore you need an attribute.
You can cache in a global (module-level) dictionary whose keys are id(self), but that would be a major headache to keep synchronized when instances are deleted (and a freed id can be reused by a new object). The same is true for a class-level dictionary, with the further downside of it still being visible on the instance.
TL;DR: the "one obvious way to do it" is to have a shadow attribute, starting with "_", to hold the value you want cached, and to declare it in __slots__. (If you use a _cached dictionary per instance, you lose the main advantage of __slots__, which is exactly not needing one dictionary per instance.)
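A minimal sketch of that shadow-attribute approach, assuming a hypothetical Foo whose expensive value is derived from slots a and b (the names and the formula are illustrative only). Reading an unset slot raises AttributeError, which doubles as the "not cached yet" signal. functools.cached_property is not an option here, because it stores its result in the instance __dict__.

```python
class Foo:
    # "_expensive" is the shadow slot holding the cached value;
    # "expensive" (below) is the public, computed property.
    __slots__ = ("a", "b", "_expensive")

    def __init__(self, a, b):
        self.a = a
        self.b = b
        # _expensive is deliberately left unset until first access.

    @property
    def expensive(self):
        try:
            # Cache hit: the slot has already been filled.
            return self._expensive
        except AttributeError:
            # Cache miss: compute once and store the result in the slot.
            self._expensive = self._compute_expensive()
            return self._expensive

    def _compute_expensive(self):
        # Hypothetical stand-in for the real, costly computation.
        return self.a ** self.b


f = Foo(2, 10)
print(f.expensive)  # 1024, computed on first access
print(f.expensive)  # 1024, read back from the _expensive slot
```

To invalidate the cache (for example after changing a or b), delete the shadow slot with del f._expensive; the next access recomputes it.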

You don't quite need a global; you can store the cache as a class attribute and still define the expensive property as a property.
class Foo(object):
    __slots__ = ('a', 'b', 'c')
    expensive_cache = {}

    @property
    def expensive(self):
        if self not in self.expensive_cache:
            self.expensive_cache[self] = self._compute_expensive()
        return self.expensive_cache[self]

    def _compute_expensive(self):
        print("Computing expensive property for {}".format(self))
        return 3

f = Foo()
g = Foo()
print(f.expensive)
print("===")
print(f.expensive)
print("===")
print(g.expensive)
If you run this code, you can see that _compute_expensive is run only once, the first time you access expensive for each distinct object.
$ python3 tmp.py
Computing expensive property for <__main__.Foo object at 0x102861188>
3
===
3
===
Computing expensive property for <__main__.Foo object at 0x1028611c8>
3
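One caveat with the class-level dictionary above: it holds a strong reference to every instance used as a key, so those instances are never garbage collected. A sketch of a variant, assuming you are willing to add '__weakref__' to __slots__ (which the asker may dislike for the same reasons as '_cache'), uses weakref.WeakKeyDictionary so that entries vanish together with their instances. Without '__weakref__' in __slots__, creating the weak reference raises TypeError.

```python
import weakref


class Foo:
    # "__weakref__" in __slots__ is what makes instances weakly referenceable.
    __slots__ = ("a", "b", "c", "__weakref__")

    # Class-level cache keyed by weak references: an entry is dropped
    # automatically once its instance is garbage collected.
    expensive_cache = weakref.WeakKeyDictionary()

    @property
    def expensive(self):
        try:
            return self.expensive_cache[self]
        except KeyError:
            value = self.expensive_cache[self] = self._compute_expensive()
            return value

    def _compute_expensive(self):
        return 3


f = Foo()
g = Foo()
print(f.expensive, g.expensive)  # 3 3
print(len(Foo.expensive_cache))  # 2
del g
# On CPython, g is freed immediately, so its cache entry is gone.
print(len(Foo.expensive_cache))  # 1
```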

Something like the Borg pattern can help.
You can alter the state of your instance in the __init__ or __new__ methods.

Related


If someone writes a class in Python and does not define their own __repr__() method, a default one is provided for them. Suppose we want to write a function that behaves the same as, or similarly to, the default __repr__(), regardless of whether the class overrides __repr__() or not. How might we do it?
class DemoClass:
    def __init__(self):
        self.var = 4

    def __repr__(self):
        return str(self.var)

def true_repr(x):
    # [magic happens here]
    s = "I'm not implemented yet"
    return s

obj = DemoClass()
print(obj.__repr__())
print(true_repr(obj))
Desired Output:
print(obj.__repr__()) prints 4, but print(true_repr(obj)) prints something like:
<__main__.DemoClass object at 0x0000000009F26588>

You can use object.__repr__(obj). This works because the default repr behavior is defined in object.__repr__.
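A short sketch applying this to the DemoClass from the question:

```python
class DemoClass:
    def __init__(self):
        self.var = 4

    def __repr__(self):
        return str(self.var)


def true_repr(x):
    # object.__repr__ is the default implementation every class inherits;
    # calling it explicitly bypasses any override on type(x).
    return object.__repr__(x)


obj = DemoClass()
print(repr(obj))       # 4
print(true_repr(obj))  # e.g. <__main__.DemoClass object at 0x...>
```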

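Note: the best answer is probably just to use object.__repr__ directly, as the others have pointed out. But one could implement roughly the same functionality as follows (unlike object.__repr__, this version also prints the builtins. prefix for built-in types such as int):
>>> def true_repr(x):
...     type_ = type(x)
...     module = type_.__module__
...     qualname = type_.__qualname__
...     return f"<{module}.{qualname} object at {hex(id(x))}>"
...
So, given a class A whose __repr__ has been overridden:
>>> class A:
...     def __repr__(self):
...         return "hahahahaha"
...
>>> A()
hahahahaha
>>> true_repr(A())
'<__main__.A object at 0x106549208>'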

Typically we can use object.__repr__ for that, but this will give the object repr for every item, so:
>>> object.__repr__(4)
'<int object at 0xa6dd20>'
An int is an object too; object.__repr__ simply ignores the fact that int overrides __repr__.
If you want to go up only one level of overriding, we can use super(..):
>>> super(type(4), 4).__repr__() # going up one level
'<int object at 0xa6dd20>'
For an int that again means we will print <int object at ...>, but if we, for instance, subclass int, then it will use the __repr__ of int again, like:
class special_int(int):
    def __repr__(self):
        return 'Special int'
Then it will look like:
>>> s = special_int(4)
>>> super(type(s), s).__repr__()
'4'
What we do here is create a proxy object with super(..). super walks the method resolution order (MRO) of the object, starting after the given class, and finds the first class that defines the method. With single inheritance, that is the closest parent that overrides the function, but if multiple inheritance is involved, this is more tricky. We thus select the __repr__ of that parent and call it.
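To illustrate the MRO walk with a hypothetical three-level hierarchy (Base, Middle and Leaf are made-up names): the intermediate class does not override __repr__, so super(type(x), x) skips it and lands on the nearest class that does.

```python
class Base:
    def __repr__(self):
        return "Base repr"


class Middle(Base):
    pass  # does not override __repr__


class Leaf(Middle):
    def __repr__(self):
        return "Leaf repr"


leaf = Leaf()
# The search starts after Leaf in Leaf.__mro__ (Leaf, Middle, Base, object),
# skips Middle because it defines no __repr__, and finds Base.__repr__.
print(repr(leaf))                          # Leaf repr
print(super(type(leaf), leaf).__repr__())  # Base repr
```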
This is also a rather unusual application of super, since usually the class argument (here type(s)) is fixed and does not depend on the type of s itself; otherwise, if a parent's method also called super(type(self), self), it would keep finding itself and recurse infinitely.
But usually it is a bad idea to break overriding anyway. The reason a programmer overrides a function is to change its behavior. Not respecting this can sometimes yield useful functions, but frequently it means the code's contracts are no longer satisfied. For example, a programmer who overrides __eq__ will also override __hash__; if you use the hash of another class together with the real __eq__, things will start breaking.
Calling magic methods directly is also frequently seen as an antipattern, so you had better avoid that as well.

Python: Assigning custom attributes on objects [duplicate]

So, I was playing around with Python while answering this question, and I discovered that this is not valid:
o = object()
o.attr = 'hello'
due to an AttributeError: 'object' object has no attribute 'attr'. However, with any class inherited from object, it is valid:
class Sub(object):
    pass

s = Sub()
s.attr = 'hello'
Printing s.attr displays 'hello' as expected. Why is this the case? What in the Python language specification specifies that you can't assign attributes to vanilla objects?
For other workarounds, see How can I create an object and add attributes to it?.

To support arbitrary attribute assignment, an object needs a __dict__: a dict associated with the object, where arbitrary attributes can be stored. Otherwise, there's nowhere to put new attributes.
An instance of object does not carry around a __dict__ -- if it did, besides the horrible circular dependence problem (since dict, like most everything else, inherits from object;-), this would saddle every object in Python with a dict, which would mean an overhead of many bytes per object that currently doesn't have or need a dict (essentially, all objects that don't have arbitrarily assignable attributes don't have or need a dict).
For example, using the excellent pympler project (available from PyPI), we can do some measurements...:
>>> from pympler import asizeof
>>> asizeof.asizeof({})
144
>>> asizeof.asizeof(23)
16
You wouldn't want every int to take up 144 bytes instead of just 16, right?-)
Now, when you make a class (inheriting from whatever), things change...:
>>> class dint(int): pass
...
>>> asizeof.asizeof(dint(23))
184
...the __dict__ is now added (plus, a little more overhead) -- so a dint instance can have arbitrary attributes, but you pay quite a space cost for that flexibility.
So what if you wanted ints with just one extra attribute foobar...? It's a rare need, but Python does offer a special mechanism for the purpose...
>>> class fint(int):
... __slots__ = 'foobar',
... def __init__(self, x): self.foobar=x+100
...
>>> asizeof.asizeof(fint(23))
80
...not quite as tiny as an int, mind you! (or even the two ints, one the self and one the self.foobar -- the second one can be reassigned), but surely much better than a dint.
When the class has the __slots__ special attribute (a sequence of strings), then the class statement (more precisely, the default metaclass, type) does not equip every instance of that class with a __dict__ (and therefore the ability to have arbitrary attributes), just a finite, rigid set of "slots" (basically places which can each hold one reference to some object) with the given names.
In exchange for the lost flexibility, you save a lot of bytes per instance (probably meaningful only if you have zillions of instances gallivanting around, but there are use cases for that).
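The three cases discussed above (a bare object, an ordinary subclass, and a class with __slots__) can be checked directly. This sketch reuses Sub from the question; Slotted is a made-up name.

```python
class Sub:
    pass


class Slotted:
    __slots__ = ("foobar",)


o, s, t = object(), Sub(), Slotted()

print(hasattr(o, "__dict__"))  # False: nowhere to put new attributes
print(hasattr(s, "__dict__"))  # True: ordinary subclass instances get a dict
print(hasattr(t, "__dict__"))  # False: __slots__ suppresses the dict

s.attr = "hello"  # fine, stored in s.__dict__
t.foobar = 123    # fine, stored in the "foobar" slot

for target in (o, t):
    try:
        target.attr = "hello"
    except AttributeError:
        print(type(target).__name__, "rejects new attribute 'attr'")
```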

As other answerers have said, an object does not have a __dict__. object is the base class of all types, including int or str. Thus whatever is provided by object will be a burden to them as well. Even something as simple as an optional __dict__ would need an extra pointer for each value; this would waste additional 4-8 bytes of memory for each object in the system, for a very limited utility.
Instead of creating an instance of a dummy class, in Python 3.3+ you can (and should) use types.SimpleNamespace for this.
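A minimal example of types.SimpleNamespace used as an attribute bag:

```python
from types import SimpleNamespace

ns = SimpleNamespace(x=1)  # initial attributes can be passed as keywords
ns.attr = "hello"          # arbitrary new attributes are allowed
print(ns.x, ns.attr)       # 1 hello
```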

It is simply due to optimization.
Dicts are relatively large.
>>> import sys
>>> sys.getsizeof((lambda:1).__dict__)
140
Most (maybe all) classes that are defined in C do not have a dict for optimization.
If you look at the source code you will see that there are many checks to see if the object has a dict or not.

So, investigating my own question, I discovered this about the Python language: you can inherit from things like int, and you see the same behaviour:
>>> class MyInt(int):
...     pass
...
>>> x = MyInt()
>>> print(x)
0
>>> x.hello = 4
>>> print(x.hello)
4
>>> x = x + 1
>>> print(x)
1
>>> print(x.hello)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'int' object has no attribute 'hello'
I assume the error at the end is because the add function returns an int, so I'd have to override functions like __add__ and such in order to retain my custom attributes. But this all now makes sense to me (I think), when I think of "object" like "int".
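A sketch of that idea, assuming the desired policy is "copy all instance attributes onto the result" (one of several reasonable choices). Only __add__ is shown; 1 + x goes through __radd__, and every other arithmetic operator would need the same treatment.

```python
class MyInt(int):
    def __add__(self, other):
        # int.__add__ returns a plain int (or NotImplemented for foreign types).
        result = int.__add__(self, other)
        if result is NotImplemented:
            return result
        # Wrap the result back into MyInt and carry the custom attributes over.
        new = type(self)(result)
        new.__dict__.update(self.__dict__)
        return new


x = MyInt()
x.hello = 4
x = x + 1
print(x, x.hello)  # 1 4
```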

https://docs.python.org/3/library/functions.html#object :
Note: object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.

It's because object is a "type", not a class. In general, all classes that are defined in C extensions (like all the built in datatypes, and stuff like numpy arrays) do not allow addition of arbitrary attributes.

This is (IMO) one of the fundamental limitations with Python - you can't re-open classes. I believe the actual problem, though, is caused by the fact that classes implemented in C can't be modified at runtime... subclasses can, but not the base classes.

How to get all instances of a certain class in python?

Someone asked a similar question: Printing all instances of a class.
While I am less concerned about printing them, I'd rather know how many instances are currently "live".
The reason for this instance capture is more like setting up a scheduled job: every hour, check these "live" unprocessed instances and enrich the data. After that, either a flag in the instance is set or the instance is deleted.
Torsten Marek's answer in Printing all instances of a class uses weakrefs and needs a call to the base class constructor for every class of this type. Is it possible to automate this? Or can we get all instances with some other method?

You can either track it on your own (see the other answers) or ask the garbage collector:
import gc
class Foo(object):
    pass
foo1, foo2 = Foo(), Foo()
foocount = sum(1 for o in gc.get_referrers(Foo) if o.__class__ is Foo)
This can be kinda slow if you have a lot of objects, but it's generally not too bad, and it has the advantage of being something you can easily use with someone else's code.
Note: Used o.__class__ rather than type(o) so it works with old-style classes.

If you only want this to work for CPython, and your definition of "live" can be a little lax, there's another way to do this that may be useful for debugging/introspection purposes:
>>> import gc
>>> class Foo(object): pass
>>> spam, eggs = Foo(), Foo()
>>> foos = [obj for obj in gc.get_objects() if isinstance(obj, Foo)]
>>> foos
[<__main__.Foo at 0x1153f0190>, <__main__.Foo at 0x1153f0210>]
>>> del spam
>>> foos = [obj for obj in gc.get_objects() if isinstance(obj, Foo)]
>>> foos
[<__main__.Foo at 0x1153f0190>, <__main__.Foo at 0x1153f0210>]
>>> del foos
>>> foos = [obj for obj in gc.get_objects() if isinstance(obj, Foo)]
>>> foos
[<__main__.Foo at 0x1153f0190>]
Note that deleting spam didn't actually make it non-live, because we've still got a reference to the same object in foos. And reassigning foos didn't help, because apparently the call to get_objects happened before the old list was released. But eventually it went away once we stopped referring to it.
And the only way around this problem is to use weakrefs.
Of course this will be horribly slow in a large system, with or without weakrefs.

Sure, store the count in a class attribute:
class CountedMixin(object):
    count = 0

    def __init__(self, *args, **kwargs):
        type(self).count += 1
        super().__init__(*args, **kwargs)

    def __del__(self):
        type(self).count -= 1
        try:
            super().__del__()
        except AttributeError:
            pass
You could make this slightly more magical with a decorator or a metaclass than with a base class, or simpler if it can be a bit less general (I've attempted to make this fit in anywhere in any reasonable multiple-inheritance hierarchy, which you usually don't need to worry about…), but basically, this is all there is to it.
If you want to have the instances themselves (or, better, weakrefs to them), rather than just a count of them, just replace count=0 with instances=set(), then do instances.add(self) instead of count += 1, etc. (Again, though, you probably want a weakref to self, rather than self.)
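A sketch of that weakref variant using weakref.WeakSet. It also addresses the question's wish not to rely on subclasses calling the base constructor: registration happens in __new__, which subclasses rarely override. TrackedMixin, Job and live_instances are hypothetical names, and the instances must be hashable and weak-referenceable (so no __slots__ without '__weakref__').

```python
import weakref


class TrackedMixin:
    # WeakSet holds weak references only, so tracking never keeps an instance alive.
    _instances = weakref.WeakSet()

    def __new__(cls, *args, **kwargs):
        # Registering here means subclass __init__ methods need not call
        # super().__init__(); they only need to leave __new__ alone.
        instance = super().__new__(cls)
        TrackedMixin._instances.add(instance)
        return instance

    @classmethod
    def live_instances(cls):
        # Report only instances of cls (including its subclasses).
        return [obj for obj in TrackedMixin._instances if isinstance(obj, cls)]


class Job(TrackedMixin):
    def __init__(self, name):
        self.name = name  # note: no super().__init__() call


a, b = Job("a"), Job("b")
print(len(Job.live_instances()))  # 2
del a
# On CPython, a is freed immediately and drops out of the WeakSet.
print([job.name for job in Job.live_instances()])  # ['b']
```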

I cannot comment on kindall's answer, so I am writing my comment as an answer:
The solution with gc.get_referrers(<ClassName>) does not work with inherited classes in Python 3: gc.get_referrers(<ClassName>) does not return any instances of a class that inherits from <ClassName>.
Instead you need to use gc.get_objects(), which is much slower, since it returns a full list of objects. But in unit tests, where you simply want to ensure your objects get deleted after the test (no circular references), it should be sufficient and fast enough.
Also do not forget to call gc.collect() before checking the number of your instances, to ensure all unreferenced instances are really deleted.
I also saw an issue with weak references, which are also counted this way. The problem with weak references is that the referenced object might not exist any more, so isinstance(Instance, Class) might fail with an error about a weakly referenced object that no longer exists.
Here is a simple code example:
import gc
import weakref

def getInstances(Class):
    gc.collect()  # make sure unreferenced instances are really gone
    Number = 0
    InstanceList = gc.get_objects()
    for Instance in InstanceList:
        # Skip weak proxies: isinstance() on a dead proxy raises ReferenceError.
        if type(Instance) in weakref.ProxyTypes:
            continue
        if isinstance(Instance, Class):
            Number += 1
    return Number

How does attribute resolution work in Python?

Consider the following code:
class A(object):
    def do(self):
        print(self.z)

class B(A):
    def __init__(self, y):
        self.z = y

b = B(3)
b.do()
Why does this work? When executing b = B(3), attribute z is set. When b.do() is called, Python's MRO finds the do function in class A. But why is it able to access an attribute defined in a subclass?
Is there a use case for this functionality? I would love an example.

It works in a pretty simple way: when a statement is executed that sets an attribute, it is set. When a statement is executed that reads an attribute, it is read. When you write code that reads an attribute, Python does not try to guess whether the attribute will exist when that code is executed; it just waits until the code actually is executed, and if at that time the attribute doesn't exist, then you'll get an exception.
By default, you can always set any attribute on an instance of a user-defined class; classes don't normally define lists of "allowed" attributes that could be set (although you can make that happen too), they just actually set attributes. Of course, you can only read attributes that exist, but again, what matters is whether they exist when you actually try to read them. So it doesn't matter if an attribute exists when you define a function that tries to read it; it only matters when (or if) you actually call that function.
In your example, it doesn't matter that there are two classes, because there is only one instance. Since you only create one instance and call methods on one instance, the self in both methods is the same object. First __init__ is run and it sets the attribute on self. Then do is run and it reads the attribute from the same self. That's all there is to it. It doesn't matter where the attribute is set; once it is set on the instance, it can be accessed from anywhere: code in a superclass, subclass, other class, or not in any class.

Since new attributes can be added to any object at any time, attribute resolution happens at execution time, not compile time. Consider this example which may be a bit more instructive, derived from yours:
class A(object):
    def do(self):
        print(self.z)  # references an attribute which we haven't "declared" in an __init__()

# make a new A
aa = A()
# this next line will error, as you would expect, because aa doesn't have a self.z
aa.do()
# but we can make it work now by simply doing
aa.z = -42
aa.do()
The first call will squawk at you, but the second will print -42 as expected.
Python objects are just dictionaries. :)

When retrieving an attribute from an object (objectname.attrname), Python follows these steps:
1. If attrname is a special (i.e. Python-provided) attribute for objectname, return it.
2. Check objectname.__class__.__dict__ for attrname. If it exists and is a data descriptor, return the descriptor result. Search all bases of objectname.__class__ for the same case.
3. Check objectname.__dict__ for attrname, and return it if found. If objectname is a class, search its bases too. If it is a class and a descriptor exists in it or its bases, return the descriptor result.
4. Check objectname.__class__.__dict__ for attrname. If it exists and is a non-data descriptor, return the descriptor result. If it exists and is not a descriptor, just return it. If it exists and is a data descriptor, we shouldn't be here, because we would have returned at step 2. Search all bases of objectname.__class__ for the same case.
5. Raise AttributeError.
Source: Understanding __get__ and __set__ and Python descriptors
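A small sketch of the precedence described in these steps: a data descriptor on the class beats the instance __dict__, while the instance __dict__ beats a non-data descriptor or a plain class attribute. DataDesc, NonDataDesc and C are hypothetical names.

```python
class DataDesc:
    # Defines __get__ and __set__, so it is a data descriptor.
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    # Defines only __get__, so it is a non-data descriptor.
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class C:
    data = DataDesc()
    nondata = NonDataDesc()
    plain = "class attribute"


c = C()
# Write straight into the instance dict to shadow all three class attributes.
c.__dict__["data"] = "from instance dict"
c.__dict__["nondata"] = "from instance dict"
c.__dict__["plain"] = "from instance dict"

print(c.data)     # from data descriptor: it wins over the instance dict
print(c.nondata)  # from instance dict: it wins over a non-data descriptor
print(c.plain)    # from instance dict: it wins over a plain class attribute
```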

Since you instantiated a B object, B.__init__ was invoked and added an attribute z. This attribute is now present in the object. It's not some weird overloaded magical shared local variable of B methods that somehow becomes inaccessible to code written elsewhere. There's no such thing. Neither does self become a different object when it's passed to a superclass' method (how's polymorphism supposed to work if that happens?).
There's also no such thing as a declaration that A objects have no such attribute (try o = A(); o.z = whatever), and neither is self in do required to be an instance of A[1]. In fact, there are no declarations at all. It's all "go ahead and try it"; that's kind of the definition of a dynamic language (not just dynamic typing).
That object's z attribute is present "everywhere", all the time[2], regardless of the "context" from which it is accessed. It never matters where code is defined for the resolution process, or for several other behaviors[3]. For the same reason, you can access a list's methods despite not writing C code in listobject.c ;-) And no, methods aren't special. They are just objects too (instances of the type function, as it happens) and are involved in exactly the same lookup sequence.
[1] This is a slight lie; in Python 2, A.do would be an "unbound method" object, which in fact raises an error if the first argument doesn't satisfy isinstance(<first arg>, A).
[2] Until it's removed with del or one of its function equivalents (delattr and friends).
[3] Well, there's name mangling, and in theory, code could inspect the stack, and thereby the caller code object, and thereby the location of its source code.

Python objects - avoiding creation of attribute with unknown name

Wishing to avoid a situation like this:
>>> class Point:
...     x = 0
...     y = 0
...
>>> a = Point()
>>> a.X = 4  # whoops, typo creates new attribute capital x
I created the following object to be used as a superclass:
class StrictObject(object):
    def __setattr__(self, item, value):
        if item in dir(self):
            object.__setattr__(self, item, value)
        else:
            raise AttributeError("Attribute " + item + " does not exist.")
While this seems to work, the python documentation says of dir():
Note: Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases. For example, metaclass attributes are not in the result list when the argument is a class.
Is there a better way to check if an object has an attribute?

Much better ways.
The most common way is "we're all consenting adults". That means you don't do any checking and leave it up to the user. Any checking you do makes the code less flexible in its use.
But if you really want to do this, there is __slots__, available by default in Python 3.x and for new-style classes in Python 2.x:
By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.
The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
Without a __dict__ variable, instances cannot be assigned new variables not listed in the __slots__ definition. Attempts to assign to an unlisted variable name raises AttributeError. If dynamic assignment of new variables is desired, then add '__dict__' to the sequence of strings in the __slots__ declaration.
For example:
class Point(object):
    __slots__ = ("x", "y")

point = Point()
point.x = 5  # OK
point.y = 1  # OK
point.X = 4  # AttributeError is raised
And finally, the proper way to check if an object has a certain attribute is not to use dir, but to use the built-in function hasattr(object, name).
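If you still want the question's StrictObject, a sketch that swaps dir() for hasattr() might look like this. Note the assumptions: every allowed attribute must already exist (here as a class-level default, as in the question's Point), and hasattr() runs the normal lookup, so properties and __getattr__ are invoked.

```python
class StrictObject:
    def __setattr__(self, name, value):
        # hasattr() looks the name up the normal way (instance, class, bases),
        # so only names that already exist somewhere can be assigned.
        if not hasattr(self, name):
            raise AttributeError(f"Attribute {name} does not exist.")
        object.__setattr__(self, name, value)


class Point(StrictObject):
    x = 0
    y = 0


a = Point()
a.x = 4  # OK: x exists as a class attribute
try:
    a.X = 4  # typo
except AttributeError as exc:
    print(exc)  # Attribute X does not exist.
```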

I don't think it's a good idea to write code to prevent such errors. These "static" checks should be the job of your IDE. Pylint will warn you about assigning attributes outside of __init__ thus preventing typo errors. It also shows many other problems and potential problems and it can easily be used from PyDev.

In such a situation you should look at what the Python standard library may offer you. Did you consider namedtuple?
from collections import namedtuple
Point = namedtuple("Point", "x, y")
a = Point(1, 3)
print(a.x, a.y)
Because Point is now immutable, your problem just can't happen, but the drawback is naturally that you can't, e.g., just add 1 to a.x; you have to create a completely new instance:
x, y = a
b = Point(x + 1, y)
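namedtuple also provides _replace(), which builds the new instance for you instead of unpacking and re-packing the fields:

```python
from collections import namedtuple

Point = namedtuple("Point", "x, y")
a = Point(1, 3)
# _replace returns a new Point with the given fields changed; a is untouched.
b = a._replace(x=a.x + 1)
print(a, b)  # Point(x=1, y=3) Point(x=2, y=3)
```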
