When did CPython's `repr` start handling recursive data structures?

This MWE in the past would cause a stack overflow, as x references y that references x:
class Ref:
    def __init__(self, name):
        self.name = name
        self.value = None

    def __repr__(self):
        if self.value is None:
            return self.name
        return f"{self.name}={self.value!r}"

if __name__ == '__main__':
    x, y = Ref("x"), Ref("y")
    x.value = (1, y)
    y.value = (2, x)
    print(x)
    print(y)
But as I test it with CPython 3.10.4, it works out of the box!
x=(1, y=(2, x=(...)))
y=(2, x=(1, y=(...)))
I can't find when this behavior changed. I see several questions as recent as 2020 asking how to handle mutually- or self-recursive data structures. I also found the reprlib standard-library module, which produces similar output, so I suspect some core developer decided to use it by default.
Note: I also tested it with __str__, and it works there too, so it's not specific to repr().

It actually never really did, and still doesn't as of today (version 3.12.0 alpha 0).
The case you show is the simplest possible one: a recursive repr involving instances of the same class. In such a case, it's pretty easy for the interpreter to detect that the repr is going to cause infinite recursion and therefore stop and produce ... instead: it just needs to check whether the .__repr__() method for the current class is asking for the .__repr__() of an instance of the same class.
This has been supported since Python 1.5.1 (1998!) as can be seen in Misc/HISTORY:
========================================
==> Release 1.5.1 (October 31, 1998) <==
========================================
[...]
- No longer a core dump when attempting to print (or repr(), or str())
a list or dictionary that contains an instance of itself; instead, the
recursive entry is printed as [...] or {...}. See Py_ReprEnter() and
Py_ReprLeave() below. Comparisons of such objects still go beserk,
since this requires a different kind of fix; fortunately, this is a
less common scenario in practice.
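The guard is easy to see with a built-in container that contains itself:
lst = [1, 2]
lst.append(lst)   # the list now contains itself
print(lst)        # [1, 2, [...]] -- the recursive entry becomes "..."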
Any slightly more complex case will still cause trouble even on the latest CPython version:
class A:
    def __init__(self):
        self.value = None

    def __repr__(self):
        return f"{self.value!r}"

class B:
    def __init__(self):
        self.value = None

    def __repr__(self):
        return f"{self.value!r}"

a, b = A(), B()
a.value = b
b.value = a
print(a)
# RecursionError: maximum recursion depth exceeded while getting the repr of an object
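If you want the cross-class case to degrade gracefully instead of raising, the reprlib module mentioned in the question provides a decorator for exactly this purpose; a minimal sketch applied to the classes above:
from reprlib import recursive_repr

class A:
    def __init__(self):
        self.value = None

    @recursive_repr()  # re-entrant calls return '...' instead of recursing
    def __repr__(self):
        return f"{self.value!r}"

class B:
    def __init__(self):
        self.value = None

    @recursive_repr()
    def __repr__(self):
        return f"{self.value!r}"

a, b = A(), B()
a.value = b
b.value = a
print(a)
# prints: ...  (the cycle is cut where it re-enters a's repr)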

Related

DRY principle in Python __init__ method

In this class definition, every parameter occurs three times, which seems to violate the DRY (don't repeat yourself) principle:
class Foo:
    def __init__(self, a=1, b=2.0, c=(3, 4, 5)):
        self.a = int(a)
        self.b = float(b)
        self.c = list(c)
DRY could be applied like this (Python 3):
class Foo:
    def __init__(self, **kwargs):
        defaults = dict(a=1, b=2.0, c=[3, 4, 5])
        for k, v in defaults.items():
            setattr(self, k, type(v)(kwargs[k]) if k in kwargs else v)
        # ...detect illegal keywords here...
However, this breaks IDE autocomplete (tried Spyder and Elpy) and pylint will complain if I try to access the attributes later on.
Is there a clean way to handle this?
Edit: The example has three parameters, but I find myself dealing with this when there are 15 parameters, where I only rarely need to override the defaults; often with more complicated types, where I would need to do
if not isinstance(kwargs['x'], SomeClass):
    raise TypeError('x: must be SomeClass')
self.x = kwargs['x']
for each of them. Moreover, I can't use mutables as default values for keyword arguments.
Principles like DRY are important, but it's worth keeping the rationale for such a principle in mind before blindly applying it -- arguably the biggest advantage of DRY code is increased maintainability: you only have to modify the code in one place, and you don't risk the subtle bugs that occur when code is modified in one place and not another. DRY can be antithetical to other common principles like YAGNI and KISS, and choosing the correct balance for your application is important.
In particular, DRY often applies to default values, application logic, and other things that could cause bugs if changed in one place and not another. IMO variable names don't fit in the same way since refactoring the code to change every occurrence of Foo's instance variable of a won't actually break anything by not changing the name in the initializer as well.
With that in mind, we have a simple test for your code. Are these variables likely to change together, or is the initializer for Foo a layer of abstraction that allows a refactoring of the inputs independently of the class's instance variables?
Change Together: I rather like @chepner's answer, and I'd take it one step further. If your class is anything more than a data transfer object, you can use @chepner's solution as a way to logically group related pieces of data (which admittedly could be unnecessary in your situation; without some context it's difficult to choose an optimal way to introduce such an idea), e.g.
from dataclasses import dataclass

@dataclass
class MyData:
    a: int
    b: float
    c: list

class Foo:
    def __init__(self, my_data):
        self.wrapped = my_data
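Construction then wraps the grouped data; a quick usage sketch:
data = MyData(a=1, b=2.0, c=[3, 4, 5])
foo = Foo(data)
print(foo.wrapped.b)  # 2.0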
Change Separately: Then just leave it alone, or KISS as they say.
As a preface, your code
class Foo:
    def __init__(self, a=1, b=2.0, c=(3, 4, 5)):
        self.a = int(a)
        self.b = float(b)
        self.c = list(c)
is, as mentioned in several comments, fine as it is. Code is read far more than it is written, and aside from needing to be careful to avoid typos in the names when first defining this, the intent is perfectly clear. (Though see the end of the answer regarding the default value of c.)
If you are using Python 3.7 or later, you can use a data class to reduce the number of references you make to each variable.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Foo:
    a: int = 1
    b: float = 2.0
    c: List[int] = field(default_factory=lambda: [3, 4, 5])
This doesn't prevent you from violating the type hints (Foo("1") will happily set a = "1" instead of a = 1 rather than raising an error), but it's typically the responsibility of the caller to provide arguments of the correct type. If you really want to enforce the types at run time, you can add a __post_init__ method:
def __post_init__(self):
    self.a = int(self.a)
    self.b = float(self.b)
    self.c = list(self.c)
But if you do that, you may as well go back to your original hand-coded __init__ method.
As an aside, the standard idiom for mutable default arguments is
def __init__(self, a=1, b=2.0, c=None):
    ...
    if c is None:
        c = [3, 4, 5]
Your approach has two problems:
It requires that list be run on every instantiation, rather than letting the compiler hard-code [3, 4, 5].
If you were type-hinting the arguments to __init__, your default value wouldn't match the intended type. You'd have to write something like
def __init__(self, a: int = 1, b: float = 2.0, c: Union[List[int], Tuple[int, int, int]] = (3, 4, 5)):
A default value of None automatically causes a "promotion" of the type to a corresponding optional type. The following are equivalent:
def __init__(self, a: int = 1, b: float = 2.0, c: List[int] = None):
def __init__(self, a: int = 1, b: float = 2.0, c: Optional[List[int]] = None):
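As a refresher on why the None idiom exists at all: default values are evaluated once, at function definition time, so a mutable default is shared across every call:
def append_to(item, target=[]):  # one shared list, created when the def runs
    target.append(item)
    return target

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list, not a fresh one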

python set() membership and hashable objects

I wanted to store instances of a class in a set, so I could use the set methods to find intersections, etc. My class has a __hash__() method, along with __eq__ and __lt__, and is decorated with functools.total_ordering.
When I create two sets, each containing the same two objects, and do a set_a.difference(set_b), I get a result with a single object, and I have no idea why. I was expecting an empty set, or at the least 2 objects, which would indicate a complete failure in my understanding of how sets work. But one?
for a in set_a:
    print(a, a.__hash__())
for b in set_b:
    print(b, b.__hash__(), b in set_a)
(<foo>, -5267863171333807568)
(<bar>, -8020339072063373731)
(<foo>, -5267863171333807568, False)
(<bar>, -8020339072063373731, True)
Why is the <foo> object in set_b not considered to be in set_a? What other properties does an object require in order to be considered a member of a set? And why is bar considered to be a part of set_a, but not foo?
edit: updating with some more info. I figured that simply showing that the two objects' hash() results were the same meant that they were indeed the same, so I guess that's where my mistake probably comes from.
from functools import total_ordering

@total_ordering
class Thing(object):
    def __init__(self, i):
        self.i = i

    def __eq__(self, other):
        return self.i == other.i

    def __lt__(self, other):
        return self.i < other.i

    def __repr__(self):
        return "<Thing {}>".format(self.i)

    def __hash__(self):
        return hash(self.i)
I figured it out thanks to some of the questions in the comments: the problem was that I had believed the hash function ultimately decides whether two objects are the same. __eq__ also needs to match, which it always did in my tests and in my attempts to create a minimal example here.
However, when pulling data from a DB in prod, a certain float was being rounded down, and thus the x == y comparison was failing in prod. Argh.
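A small sketch of that failure mode (Item and the constant hash are illustrative, the constant just forces every instance into the same bucket): set membership requires the hashes and __eq__ to agree, so matching hashes alone prove nothing:
class Item:
    def __init__(self, i):
        self.i = i

    def __hash__(self):
        return 42  # identical hash for every instance

    def __eq__(self, other):
        return self.i == other.i

a = Item(2.0)
b = Item(1.9999999)        # e.g. a float rounded down by the DB
print(hash(a) == hash(b))  # True
print(a == b)              # False
print(b in {a})            # False -- membership needs both to match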

Can I override a class function without creating a new class in Python?

I'm making a game in pygame and I have made an 'abstract' class whose sole job is to store the sprites for a given level (with the intent of keeping these level objects in a list, to facilitate moving the player from one level to another).
Alright, so to the question: can I do the equivalent of this in Python (code courtesy of Java)?
Object object = new Object() {
    public void overriddenFunction() {
        // new functionality
    }
};
If so, then when I build the levels in the game I would simply have to override the constructor (or the class/instance method responsible for building the level) with the information on where the sprites go, because making a new class for every level in the game isn't a very elegant answer. Alternatively, I would have to make methods within the level class that build the level once a level object is instantiated, placing the sprites as needed.
So, before one of the more staunch developers goes on about how anti-Python this might be (I've read enough of this site to get that vibe from Python experts), just tell me if it's doable.
Yes, you can!
class Foo:
    def do_other(self):
        print('other!')

    def do_foo(self):
        print('foo!')

def do_baz():
    print('baz!')

def do_bar(self):
    print('bar!')

# Class-wide impact
Foo.do_foo = do_bar

f = Foo()
g = Foo()

# Instance-wide impact
g.do_other = do_baz

f.do_foo()    # prints "bar!"
f.do_other()  # prints "other!"
g.do_foo()    # prints "bar!"
g.do_other()  # prints "baz!"
So, before one of the more staunch developers goes on about how anti-Python this might be
Overwriting functions in this fashion (if you have a good reason to do so) seems reasonably pythonic to me. One reason you might need it is a dynamic feature to which static inheritance doesn't or can't apply.
The case against might be found in the Zen of Python:
Beautiful is better than ugly.
Readability counts.
If the implementation is hard to explain, it's a bad idea.
Yes, it's doable. Here, I use functools.partial to get the implied self argument into a regular (non-class-method) function:
import functools
import random

class WackyCount(object):
    "it's a counter, but it has one wacky method"

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __str__(self):
        return '%s = %d' % (self.name, self.value)

    def incr(self):
        self.value += 1

    def decr(self):
        self.value -= 1

    def wacky_incr(self):
        self.value += random.randint(5, 9)

# although x is a regular wacky counter...
x = WackyCount('spam', 1)

# it increments like crazy:
def spam_incr(self):
    self.value *= 2

x.incr = functools.partial(spam_incr, x)

print(x)
x.incr()
print(x)
x.incr()
print(x)
x.incr()
print(x)
and:
$ python2.7 wacky.py
spam = 1
spam = 2
spam = 4
spam = 8
$ python3.2 wacky.py
spam = 1
spam = 2
spam = 4
spam = 8
Edit to add note: this is a per-instance override. It takes advantage of Python's attribute look-up sequence: if x is an instance of class K, then x.attrname starts by looking in x's instance dictionary for the attribute. If it's not found there, the next lookup is in K. All the normal class functions are actually K.func. So if you want to replace the class function dynamically, use @Brian Cane's answer instead.
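As an aside, types.MethodType from the standard library gives the same per-instance binding a bit more directly than functools.partial; a sketch with illustrative names (Counter and double_incr are not from the question):
import types

class Counter:
    def __init__(self, value=1):
        self.value = value

    def incr(self):
        self.value += 1

def double_incr(self):
    self.value *= 2

c = Counter()
c.incr = types.MethodType(double_incr, c)  # bound method stored in c's dict
c.incr()
c.incr()
print(c.value)  # 4 -- the instance attribute shadows Counter.incr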
I'd suggest using a different class, via inheritance, for each level.
But you might get some mileage out of copy.deepcopy() and monkey patching, if you're really married to treating Python like Java.

Python - __eq__ method not being called

I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __key(self):
        return (self.a, self.b, self.c)

    def __eq__(self, other):
        return self.__key() == other.__key()

    def __hash__(self):
        return hash(self.__key())

class Bar(Foo):
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1,2,3,4,5)
bar_2 = Bar(1,2,3,10,11)
summary = set((bar_1,))
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and likewise bar_equiv.e should equal 5, but they do not. Like I mentioned, it looks like the _CaptureEq.__eq__ method does not get called when the statement bar_2 in summary is executed.
Is there some reason why the _CaptureEq.__eq__ method is not being called? Hopefully this is not too obscure of a question.
Brandon's answer is informative, but incorrect. There are actually two problems, one with
the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
    'Object wrapper that remembers "other" for successful equality tests.'
    def __init__(self, obj):
        self.obj = obj
        self.match = obj

    # If running on Python 3, this will be a new-style class, and
    # new-style classes must delegate hash explicitly in order to populate
    # the underlying special method slot correctly.
    # On Python 2, it will be an old-style class, so the explicit delegation
    # isn't needed (__getattr__ will cover it), but it also won't do any harm.
    def __hash__(self):
        return hash(self.obj)

    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other
        return result

    def __getattr__(self, name):  # support anything else needed by __contains__
        return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __key(self):
        return (self.a, self.b, self.c)

    def __eq__(self, other):
        if not isinstance(other, Foo):
            # Don't recognise "other", so let *it* decide if we're equal
            return NotImplemented
        return self.__key() == other.__key()

    def __hash__(self):
        return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
    print '__eq__: I am', self.d, self.e, 'and he is', other.d, other.e
    return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks in the dictionary and returns the object already in the dictionary if the key is present else inserts the new object at the key.
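A minimal sketch of that dictionary-based approach (get_or_insert is a hypothetical helper, keyed on the attributes that define equality for the Foo/Bar classes above):
def get_or_insert(registry, obj):
    key = (obj.a, obj.b, obj.c)  # the same key that __eq__/__hash__ use
    return registry.setdefault(key, obj)

registry = {}
bar_1 = Bar(1, 2, 3, 4, 5)
bar_2 = Bar(1, 2, 3, 10, 11)
assert get_or_insert(registry, bar_1) is bar_1  # new key: inserted
assert get_or_insert(registry, bar_2) is bar_1  # equivalent object returned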
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
There are two issues here. The first is that:
t = _CaptureEq(item)
if t in container:
    return t.match
return default
Doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to believe that providing an __getattr__ will solve this - it won't, since Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups - this is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
    return hash(self.obj)
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. sets are based on hashtables, and only do an equality test if there's more than one item in a hash bucket. You can't (and don't want to) force items that hash differently into the same bucket, since that's all an implementation detail of set. The easiest way around this issue, and to neatly sidestep the first one, is to use a list instead:
summary = [bar_1]
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
assert(bar_equiv is bar_1)

sharing a string between two objects

I want two objects to share a single string object. How do I pass the string object from the first to the second such that any changes applied by one will be visible to the other? I am guessing that I would have to wrap the string in a sort of buffer object and do all sorts of complexity to get it to work.
However, I have a tendency to overthink problems, so undoubtedly there is an easier way. Or maybe sharing the string is the wrong way to go? Keep in mind that I want both objects to be able to edit the string. Any ideas?
Here is an example of a solution I could use:
class Buffer(object):
    def __init__(self):
        self.data = ""

    def assign(self, value):
        self.data = str(value)

    def __getattr__(self, name):
        return getattr(self.data, name)

class Descriptor(object):
    def __get__(self, instance, owner):
        return instance._buffer.data

    def __set__(self, instance, value):
        if not hasattr(instance, "_buffer"):
            if isinstance(value, Buffer):
                instance._buffer = value
                return
            instance._buffer = Buffer()
        instance._buffer.assign(value)

class First(object):
    data = Descriptor()

    def __init__(self, data):
        self.data = data

    def read(self, size=-1):
        if size < 0:
            size = len(self.data)
        data = self.data[:size]
        self.data = self.data[size:]
        return data

class Second(object):
    data = Descriptor()

    def __init__(self, data):
        self.data = data

    def add(self, newdata):
        self.data += newdata

    def reset(self):
        self.data = ""

    def spawn(self):
        return First(self._buffer)

s = Second("stuff")
f = s.spawn()
f.data == s.data
# True
f.read(2)
# "st"
f.data
# "uff"
f.data == s.data
# True
s.data
# "uff"
s._buffer == f._buffer
# True
Again, this seems like absolute overkill for what seems like a simple problem. As well, it requires the Buffer class, a descriptor, and the _buffer attribute that the descriptor imposes on instances.
An alternative is to put one of the objects in charge of the string and then have it expose an interface for making changes to the string. Simpler, but not quite the same effect.
I want two objects to share a single string object.
They will, if you simply pass the string -- Python doesn't copy unless you tell it to copy.
How do I pass the string object from the first to the second such that any changes applied by one will be visible to the other?
There can never be any change made to a string object (it's immutable!), so your requirement is trivially met (since a false precondition implies anything).
I am guessing that I would have to wrap the string in a sort of buffer object and do all sorts of complexity to get it to work.
You could use (assuming this is Python 2 and you want a string of bytes) an array.array with a typecode of c. Arrays are mutable, so you can indeed alter them (with mutating methods -- and some operators, which are a special case of methods since they invoke special methods on the object). They don't have the myriad non-mutating methods of strings, so, if you need those, you'll indeed need a simple wrapper (delegating said methods to the str(...) of the array that the wrapper also holds).
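To make that concrete: the 'c' typecode only exists on Python 2, so on Python 3 the closest equivalent of such a mutable byte buffer is bytearray. A short sketch (Holder is just an illustrative wrapper, not from the question):
buf = bytearray(b"stuff")

class Holder:
    def __init__(self, buf):
        self.buf = buf

x, y = Holder(buf), Holder(buf)  # both hold the same mutable object
x.buf[:2] = b"ST"                # mutate in place through one holder...
print(y.buf)                     # bytearray(b'STuff') -- ...the other sees it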
It doesn't seem there should be any special complexity, unless of course you want to do something truly weird as you seem to, given your example code (have an assignment, i.e. a rebinding of a name, magically affect a different name -- that has absolutely nothing to do with whatever object was previously bound to the name you're rebinding, nor does it change that object in any way -- the only object it "changes" is the one holding the attribute, so it's obvious that you need descriptors or other magic on said object).
You appear to come from some language where variables (and particularly strings) are "containers of data" (like C, Fortran, or C++). In Python (like, say, in Java), names (the preferred way to call what others call "variables") always just refer to objects, they don't contain anything except exactly such a reference. Some objects can be changed, some can't, but that has absolutely nothing to do with the assignment statement (see note 1) (which doesn't change objects: it rebinds names).
(note 1): except of course that rebinding an attribute or item does alter the object that "contains" that item or attribute -- objects can and do contain, it's names that don't.
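The distinction is easy to demonstrate: mutation through one name is visible through every name bound to the same object, while rebinding a name affects only that name:
a = [1, 2]
b = a          # both names refer to one list object
b.append(3)    # mutation: the shared object changes
print(a)       # [1, 2, 3]
b = [9]        # rebinding: b now refers to a different object
print(a)       # [1, 2, 3] -- unaffected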
Just put your value to be shared in a list, and assign the list to both objects.
class A(object):
    def __init__(self, strcontainer):
        self.strcontainer = strcontainer

    def upcase(self):
        self.strcontainer[0] = self.strcontainer[0].upper()

    def __str__(self):
        return self.strcontainer[0]

# create a string, inside a shareable list
shared = ['Hello, World!']
x = A(shared)
y = A(shared)

# both objects have the same list
print id(x.strcontainer)
print id(y.strcontainer)

# change value in x
x.upcase()

# show how value is changed in both x and y
print str(x)
print str(y)
Prints:
10534024
10534024
HELLO, WORLD!
HELLO, WORLD!
I am not a great expert in Python, but I think that if you declare a variable in a module and add a getter/setter to the module for that variable, you will be able to share it this way.
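A minimal sketch of that suggestion (shared_state, get_text, and set_text are hypothetical names):
# shared_state.py -- module-level holder for the shared value
_text = ""

def get_text():
    return _text

def set_text(value):
    global _text
    _text = value
Both objects would then import shared_state and go through get_text()/set_text(), so each always observes the other's latest assignment.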
