Python str subclass represents a value which is not a real string

I am a novice in Python, working on extending an older module. Until now it had a function that returned a str (the output of a blocking shell command). Now I need that function to also be able to return an object, so that later operations can be performed on it (checking the output of a non-blocking shell command). So the function now returns an instance of my class, which I subclassed from str for backward compatibility. The problem is that when such an object is passed to os.path.isdir, it always returns False, even when the string is a valid path:
import os
class ShellWrap(str):
    def __new__(cls, dummy_str_value, process_handle):
        return str.__new__(cls, "")
    def __init__(self, dummy_str_value, process_handle):
        self._ph = process_handle
        self._output_str = ""
    def wait_for_output(self):
        # for simplicity just do
        self._output_str = "/Users"
    def __str__(self):
        return str(self._output_str)
    def __repr__(self):
        return str(self._output_str)
    def __eq__(self, other):
        if isinstance(other, str):
            return other == str(self._output_str)
        else:
            # super() is already bound to self, so pass only `other`
            return super().__eq__(other)
>>> obj = ShellWrap("", None)
>>> obj.wait_for_output()
>>> print(type(obj))
<class '__main__.ShellWrap'>
>>> print(ShellWrap.__mro__)
(<class '__main__.ShellWrap'>, <class 'str'>, <class 'object'>)
>>> print(type(obj._output_str))
<class 'str'>
>>> print(obj)
/Users
>>> print(obj._output_str)
/Users
>>> obj == "/Users"
True
The one that puzzles me is:
>>> print(os.path.isdir(obj))
False
>>> print(os.path.isdir("/Users"))
True
I tried adding PathLike inheritance and implementing one more dunder, but to no avail:
class ShellWrap(str, PathLike):
    ....
    def __fspath__(self):
        return self._output_str
It seems there is one more dunder that I failed to implement. But which?
I do see something strange in the debugger, however. When I put a watch on obj, it says it is of class str, but the value shown by the debugger is without quotes (unlike other 'pure' strs).
Adding quotes manually to the string in the debugger makes it work, but I guess editing a string creates a new object, this time a pure str.
What am I missing?
Edit: after realizing (see the accepted answer) that what I was trying to do is impossible, I decided to challenge the decision to subclass str. So now my class does not inherit from anything. It just implements __str__, __repr__ and __fspath__, and this seems to be enough! Apparently, as long as the str inheritance is there, it takes precedence: the dunders don't get called, and libraries go straight to the underlying C storage of the str value.

Consider the source of os.path.isdir. When you pass in obj, you’re probably triggering that value error because the string you want to evaluate is an attribute of your string subclass, not the string the subclass is supposed to represent. You’ll have to muck around a bit more in the source for str to find the right member to override.
Edit: one possible way around this is to use __init__ dynamically. That is, do everything you need to render the path string in __new__, and before you return the instance from that method, set output_str as an attribute on it. Then, in your __init__, call super().__init__ with self.output_str as the only argument.

What you're trying to do is impossible.
C code working with a string accesses the actual string data managed by the str class, not the methods you're writing. It doesn't care that you attached another string to your object as an attribute, or that you overrode a bunch of methods. It's closer to str.__whatever__(your_obj) than your_obj.__whatever__(), although it doesn't go through method calls at all.
In this case, the relevant C code is the os.stat call that os.path.isdir delegates to, but almost anything that uses strings is going to use something written in C that accesses the str data directly at some point.
You want your object's data to be mutable - wait_for_output is mutative - but you cannot mutate the parts of your object inherited from str, and that's the data that matters.
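A minimal sketch of the point above, and of the non-str alternative the asker ended up with. The class names FakePath and OutputHolder are made up for illustration, and os.getcwd() is only a stand-in for any directory that exists:

```python
import os

class FakePath(str):
    """str subclass whose inherited str data is always empty (hypothetical)."""
    def __new__(cls, shown_value):
        return str.__new__(cls, "")      # this is the data C code will read
    def __init__(self, shown_value):
        self._output_str = shown_value   # just an extra attribute
    def __str__(self):
        return self._output_str

class OutputHolder:
    """Plain class, no str inheritance, implementing the path protocol (hypothetical)."""
    def __init__(self):
        self._output_str = ""
    def wait_for_output(self):
        self._output_str = os.getcwd()   # stand-in for real command output
    def __str__(self):
        return self._output_str
    def __fspath__(self):
        return self._output_str

existing_dir = os.getcwd()
fake = FakePath(existing_dir)
holder = OutputHolder()
holder.wait_for_output()

print(str(fake) == existing_dir)   # the __str__ override is used by str()
print(len(fake))                   # but the inherited str data is empty
print(os.path.isdir(fake))         # os functions read that empty str data
print(os.path.isdir(holder))       # for non-str objects, __fspath__ is consulted
```

Because OutputHolder is not a str, os functions have no string data to fall back on and must call __fspath__, which can return whatever the object holds at that moment.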

Related

how deque of python print all items [duplicate]

If someone writes a class in Python and does not define their own __repr__() method, a default one is provided for them. Suppose we want to write a function with the same, or similar, behavior as the default __repr__(), and we want it to behave that way even if the class overrides __repr__(). How might we do it?
class DemoClass:
    def __init__(self):
        self.var = 4
    def __repr__(self):
        return str(self.var)
def true_repr(x):
    # [magic happens here]
    s = "I'm not implemented yet"
    return s
obj = DemoClass()
print(obj.__repr__())
print(true_repr(obj))
Desired Output:
print(obj.__repr__()) prints 4, but print(true_repr(obj)) prints something like:
<__main__.DemoClass object at 0x0000000009F26588>
You can use object.__repr__(obj). This works because the default repr behavior is defined in object.__repr__.
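As a small sketch, here is the question's DemoClass with true_repr filled in that way. The exact address in the output varies from run to run, so none is shown:

```python
class DemoClass:
    def __init__(self):
        self.var = 4
    def __repr__(self):
        return str(self.var)

def true_repr(x):
    # Call the base implementation explicitly, skipping any override
    return object.__repr__(x)

obj = DemoClass()
print(repr(obj))       # uses DemoClass.__repr__
print(true_repr(obj))  # uses the default object.__repr__
```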
Note, the best answer is probably just to use object.__repr__ directly, as the others have pointed out. But one could implement that same functionality roughly as:
>>> def true_repr(x):
...     type_ = type(x)
...     module = type_.__module__
...     qualname = type_.__qualname__
...     return f"<{module}.{qualname} object at {hex(id(x))}>"
...
So....
>>> A()
hahahahaha
>>> true_repr(A())
'<__main__.A object at 0x106549208>'
>>>
Typically we can use object.__repr__ for that, but this applies the bare object repr to every item, including built-ins, so:
>>> object.__repr__(4)
'<int object at 0xa6dd20>'
This is because an int is an object, but with __repr__ overridden.
If you want to go up just one level of overriding, we can use super(..):
>>> super(type(4), 4).__repr__() # going up one level
'<int object at 0xa6dd20>'
For an int this again means that we get <int object at ...>, but if we subclass int, for instance, then it uses the __repr__ of int instead:
class special_int(int):
    def __repr__(self):
        return 'Special int'
Then it will look like:
>>> s = special_int(4)
>>> super(type(s), s).__repr__()
'4'
What we do here is create a proxy object with super(..). super walks the method resolution order (MRO) of the object and looks for the first class after type(s) that defines the method. With single inheritance, that is the closest parent that defines it; with multiple inheritance, things get trickier. We thus select the __repr__ of that class and call it.
This is also a rather unusual use of super: normally the class argument (here type(s)) is fixed and does not depend on the type of s itself, since otherwise nested super(..) calls of this kind could result in infinite recursion.
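The risk described above can be reproduced with a small sketch. The classes Base, Middle and Leaf are hypothetical; the only point is that super(type(self), self) inside a method stops working once the class is subclassed:

```python
class Base:
    def greet(self):
        return "base"

class Middle(Base):
    def greet(self):
        # Fragile: type(self) is the runtime class, not necessarily Middle
        return "middle+" + super(type(self), self).greet()

class Leaf(Middle):
    pass

print(Middle().greet())  # fine: super(Middle, self) resolves to Base.greet

try:
    Leaf().greet()       # super(Leaf, self) resolves to Middle.greet again
    recursed = False
except RecursionError:
    recursed = True
print(recursed)
```

Calling super(type(s), s) from outside the class, as in the answer, is safe because it is a single call rather than a chain of methods each doing the same thing.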
But it is usually a bad idea to bypass overriding anyway. A programmer overrides a method in order to change its behavior. Not respecting that can occasionally produce something useful, but often it means the code's contracts are no longer satisfied. For example, a programmer who overrides __eq__ will usually also override __hash__; if you use the hash of another class together with the real __eq__, things will start breaking.
Calling magic methods directly is also frequently seen as an antipattern, so it is better to avoid that as well.

Can a Python class's __str__() return one of two strings?

I have a class for which I want to be able to print either a short string representation of an object or a longer string representation. Ideally, __str__() would accept a flag that chooses which string to return, and print() would accept that flag as well and use the correct version of __str__() accordingly, but nothing like that seems to exist.
I know that I can include print_short() and print_long() methods inside my class to choose the correct string, but this doesn't seem Pythonic, and violates the Python 3 change by which print() is a function. This would also bypass the use of __str__(), which again, seems unPythonic.
What's the most Pythonic way of handling this? Solutions involving __repr__() won't work, since I'm already using __repr__() as intended, to unambiguously represent the object itself.
The job of __str__ is to provide "the" string representation of an object, whatever representation you decide is most useful.
If you want to control the formatting of an object, override __format__.
class MyClass:
    def __format__(self, spec):
        ...
If you have code like
s = MyClass()
print("{:r}".format(s))
s.__format__ receives everything after the colon (in this case r) as its spec parameter; it is then entirely up to the definition of __format__ how it uses the spec in deciding what string value to return. You could do something like the following
class MyClass:
    def __format__(self, spec):
        if spec == 's':
            return self._short_str()
        elif spec == 'l':
            return self._long_str()
        else:
            # This includes both no spec whatsoever, which is
            # conventionally expected to behave like __str__
            # and an unrecognized specification, which is just ignored.
            return str(self)
    def _long_str(self):
        return "three"
    def _short_str(self):
        return "3"
    def __str__(self):
        return "III"
>>> x = MyClass()
>>> str(x)
'III'
>>> "{}".format(x)
'III'
>>> "{:whatever}".format(x)
'III'
>>> "{:s}".format(x)
'3'
>>> "{:l}".format(x)
'three'
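The same __format__ hook is also used by the built-in format() and by f-strings, which gets close to the "print with a flag" the question asks for. A short sketch, with the helper methods inlined to keep it compact:

```python
class MyClass:
    def __format__(self, spec):
        if spec == 's':
            return "3"        # short form
        elif spec == 'l':
            return "three"    # long form
        return str(self)      # empty or unknown spec falls back to __str__
    def __str__(self):
        return "III"

x = MyClass()
short = format(x, 's')   # built-in format() passes the spec through
long_ = f"{x:l}"         # f-strings do the same with the text after the colon
default = f"{x}"         # no spec at all: __format__ receives ''
print(short, long_, default)
```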

How does this specific section of code work?

def add_info_extractor(self, ie):
    """Add an InfoExtractor object to the end of the list."""
    self._ies.append(ie)
    if not isinstance(ie, type):
        self._ies_instances[ie.ie_key()] = ie
        ie.set_downloader(self)
def get_info_extractor(self, ie_key):
    """
    Get an instance of an IE with name ie_key, it will try to get one from
    the _ies list, if there's no instance it will create a new one and add
    it to the extractor list.
    """
    ie = self._ies_instances.get(ie_key)
    if ie is None:
        ie = get_info_extractor(ie_key)()
        self.add_info_extractor(ie)
    return ie
The following is taken from a popular Python repo, youtube-dl. In an effort to become a better programmer I came across this section, and I'm having trouble understanding it.
Particularly the last method, and how it does not enter infinite recursion if the ie_key is not found in the list.
As well as the isinstance comparison in the first method.
I understand the normal usage is something like isinstance('hello', str), but how can type() be a type? Moreover, what's the point of comparing an ie object to type?
This certainly could cause infinite recursion. No updates seem to happen to self._ies_instances in between recursive calls, and as recursion is dependent on this case, it will continue.
Maybe this is a bug, but the code has never had a situation when ie_key is not in the dictionary?
As for your confusion with type, it's a result of Python Metaclasses (a great read). type acts both as a "function" to return the type of an object as well as a class to create a new type (when called with more arguments).
One reason you may want to check whether something is an instance of type is to see if it is a class rather than an ordinary instance:
>>> isinstance(1, type)
False
>>> isinstance("", type)
False
>>> isinstance({}, type)
False
>>> isinstance((), type)
False
>>> type(object) == type
True
>>> isinstance(object, type)
True
>>> isinstance(object(), type)
False
>>> class a(): pass
...
>>> isinstance(a, type)
False
>>> isinstance(a(), type)
False
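As object is the 'base for all new style classes' (docs), it is itself a class, that is, an instance of type (as shown above). The False for class a above is Python 2 old-style class behavior; in Python 3 every class is an instance of type, so isinstance(a, type) is True.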
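A short Python 3 sketch of what the isinstance(ie, type) check in add_info_extractor distinguishes. Extractor and describe are hypothetical names used only for illustration:

```python
class Extractor:
    pass

def describe(ie):
    # A class is an instance of type; an ordinary object is not
    return "class" if isinstance(ie, type) else "instance"

print(describe(Extractor))    # the class itself
print(describe(Extractor()))  # an object created from it
print(isinstance(type, type)) # type is its own metaclass
```

So the check lets the list hold either extractor classes or already-created extractor objects, and only the latter are registered in _ies_instances.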
I believe the reason this avoids infinite recursion is that it never actually recurses at all! Look closely:
def get_info_extractor(self, ie_key):
    ...
    ie = get_info_extractor(ie_key)()
Note that the get_info_extractor whose definition we're reading is a method, and it calls a non-method function that just so happens to also be named get_info_extractor, and so it's not calling itself, and so there's no recursion.
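The name lookup can be checked with a stripped-down sketch. Everything here (the _REGISTRY dict, DummyIE, Downloader) is a hypothetical stand-in, not youtube-dl's actual code; it only mirrors the shape of a method that calls a module-level function of the same name:

```python
class DummyIE:
    def ie_key(self):
        return "dummy"

_REGISTRY = {"dummy": DummyIE}

def get_info_extractor(ie_key):
    """Module-level function: returns the extractor class for a key."""
    return _REGISTRY[ie_key]

class Downloader:
    def __init__(self):
        self._ies_instances = {}

    def get_info_extractor(self, ie_key):
        ie = self._ies_instances.get(ie_key)
        if ie is None:
            # A bare name inside a method is looked up in local and then
            # module scope, never in the class body, so this calls the
            # module-level function above, not this method.
            ie = get_info_extractor(ie_key)()
            self._ies_instances[ie.ie_key()] = ie
        return ie

d = Downloader()
first = d.get_info_extractor("dummy")
second = d.get_info_extractor("dummy")
print(first is second)  # the cached instance is reused
```

To call the method recursively, the code would have had to write self.get_info_extractor(ie_key).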

Treating function as an object in Python

I am not sure if the title will match the question I am about to ask but please feel free to update it if you know a better title which can help everyone.
So let's say we have the following definition:
>>> def helloFunction():
...     name = "Hello World"
...
When I type in the following code, it returns an empty dictionary:
>>> helloFunction.__dict__
{}
I am not sure if this is how it should be but let's continue. Interestingly, I can do the following:
>>> helloFunction.hello = "world"
>>> helloFunction.__dict__
{'hello': 'world'}
and when I type in the following code, it tells me helloFunction is indeed a function.
>>> type(helloFunction)
<type 'function'>
I am coming from C# and this behavior is a little odd to me. Why does Python work like this? Is a function an object? How should I interpret this situation? And where would I need this type of functionality?
Update
While I was composing this question, I realized __class__ is defined on helloFunction.
>>> helloFunction.__class__
<type 'function'>
So it seems like function is indeed a class type?
PEP 232 added "function attributes" to the language. You can read it if you want all the official reasoning. The reality of the situation boils down to this sentence in the intro:
func_doc has the interesting property that there is special syntax in function (and method) definitions for implicitly setting the attribute. This convenience has been exploited over and over again, overloading docstrings with additional semantics.
Emphasis mine. People were using the __doc__ attribute to smuggle all sorts of function metadata; it seemed more natural to provide a real place to do that.
As for some more specific questions:
Is a function an object?
Oh yes. Functions are first-class objects in python. You can pass references to them as arguments to other functions all you like. Were they not first-class, I couldn't do this:
def increment(x):
    return x+1
map(increment,[1,2,3]) # python2 `map`, for brevity
Out[3]: [2, 3, 4]
And also where would I need this type of functionality?
You generally don't. Not often. But it can be useful when you want to store metadata about a function.
Say I wanted to wrap a function with a decorator that records how many times it's been called. That's easy since we can plop that info into the function's __dict__.
def count_calls(func):
    def _inner(*args, **kwargs):
        before = getattr(func, 'times_called', 0)
        func.times_called = before + 1
        print('func has now been called {} times'.format(func.times_called))
        return func(*args, **kwargs)
    return _inner
@count_calls
def add(x,y):
    return x+y
add(3,4)
func has now been called 1 times
Out[7]: 7
add(2,3)
func has now been called 2 times
Out[8]: 5
A function is an object and, like most objects in Python, it has a dictionary. One usage example I've seen in the wild is with the web framework CherryPy, where it's used to indicate which methods are exposed to web access:
import cherrypy
class HelloWorld(object):
    def index(self):
        return "Hello World!"
    index.exposed = True
When a path is accessed, the dispatcher can check that the corresponding handler method has its exposed attribute set to True and respond to it, allowing for both accessible and private methods to be safely defined on the controller.
Another use I've seen was a decorator that counted the number of times a function was called:
def call_counter(fn):
    fn.count = 0
    def _fn(*args, **kwargs):
        fn.count += 1
        return fn(*args, **kwargs)
    return _fn
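One thing to watch with this pattern: the counter lives on the original function, while the decorated name refers to the wrapper, so reading the count through the decorated name fails. A variant sketch that stores the counter on the wrapper instead (functools.wraps is standard library; the rest follows the answer's shape):

```python
import functools

def call_counter(fn):
    @functools.wraps(fn)            # keep fn's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.count += 1          # the attribute lives on the wrapper itself
        return fn(*args, **kwargs)
    wrapper.count = 0
    return wrapper

@call_counter
def add(x, y):
    return x + y

add(1, 2)
add(3, 4)
print(add.count)     # readable through the decorated name
print(add.__name__)  # preserved by functools.wraps
```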
A partial quote from Learning Python (Mark Lutz):
Like everything else in Python, functions are just objects; they are recorded explicitly in memory at program execution time. In fact, besides calls, functions allow arbitrary attributes to be attached to record information for later use.
def func(): ...        # Create function object
func()                 # Call object
func.attr = value      # Attach attributes

Python: dereferencing weakproxy

Is there any way to get the original object from a weakproxy pointing to it? E.g. is there an inverse of weakref.proxy()?
A simplified example (Python 2.7):
import weakref
class C(object):
    def __init__(self, other):
        self.other = weakref.proxy(other)
class Other(object):
    pass
others = [Other() for i in xrange(3)]
my_list = [C(others[i % len(others)]) for i in xrange(10)]
I need to get the list of unique other members from my_list. The way I prefer for such tasks is to use a set:
unique_others = {x.other for x in my_list}
Unfortunately this throws TypeError: unhashable type: 'weakproxy'
I have managed to solve the specific problem in an imperative way (slow and dirty):
unique_others = []
for x in my_list:
    if x.other in unique_others:
        continue
    unique_others.append(x.other)
but the general problem stated in the title remains.
What if I have only my_list under control, the others are buried in some lib, and someone may delete them at any time, and I want to prevent the deletion by collecting non-weak refs in a list?
Or I may want to get the repr() of the object itself, not <weakproxy at xx to Other at xx>.
I guess there should be something like weakref.unproxy that I'm not aware of.
I know this is an old question but I was looking for an answer recently and came up with something. Like others said, there is no documented way to do it and looking at the implementation of weakproxy type confirms that there is no standard way to achieve this.
My solution uses the fact that all Python objects have a set of standard methods (like __repr__) and that bound method objects contain a reference to the instance (in __self__ attribute).
Therefore, by dereferencing the proxy to get the method object, we can get a strong reference to the proxied object from the method object.
Example:
>>> def func():
...     pass
...
>>> weakfunc = weakref.proxy(func)
>>> f = weakfunc.__repr__.__self__
>>> f is func
True
Another nice thing is that it will work for strong references as well:
>>> func.__repr__.__self__ is func
True
So there's no need for type checks if either a proxy or a strong reference could be expected.
Edit:
I just noticed that this doesn't work for proxies of classes. This is not universal then.
Basically there is something like weakref.unproxy, but it's just named weakref.ref(x)().
The proxy object is only there for delegation and the implementation is rather shaky...
The == function doesn't work as you would expect it:
>>> weakref.proxy(object) == object
False
>>> weakref.proxy(object) == weakref.proxy(object)
True
>>> weakref.proxy(object).__eq__(object)
True
However, I see that you don't want to call weakref.ref objects all the time. A good working proxy with dereference support would be nice.
But at the moment, this is just not possible. If you look into the Python built-in source code, you see that you need something like PyWeakref_GetObject, but there is no call to this function at all (and it raises a PyErr_BadInternalCall if the argument is wrong, so it seems to be an internal function). PyWeakref_GET_OBJECT is used much more, but there is no function in weakref.py that is able to do that.
So, sorry to disappoint you, but weakref.proxy is just not what most people would want for their use cases. You can, however, make your own proxy implementation. It isn't too hard. Just use weakref.ref internally and override __getattr__, __repr__, etc.
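A minimal sketch of such a home-made proxy, written for Python 3. The class name DerefProxy and its unproxy method are hypothetical, not part of the weakref module; only weakref.ref is a real API here:

```python
import weakref

class DerefProxy:
    """Hypothetical proxy built on weakref.ref, with an explicit unproxy()."""
    def __init__(self, target):
        self._ref = weakref.ref(target)

    def unproxy(self):
        target = self._ref()        # calling a ref returns the object or None
        if target is None:
            raise ReferenceError("weakly-referenced object no longer exists")
        return target

    def __getattr__(self, name):
        # Runs only when normal lookup fails, so _ref and unproxy are
        # normally found directly; the guard avoids recursion if _ref
        # has not been set yet.
        if name == "_ref":
            raise AttributeError(name)
        return getattr(self.unproxy(), name)

    def __repr__(self):
        return repr(self.unproxy())

class Other:
    def __init__(self, name):
        self.name = name

others = [Other("a"), Other("b"), Other("c")]
proxies = [DerefProxy(others[i % len(others)]) for i in range(10)]

# Strong references to the real objects, which are hashable as usual
unique_others = {p.unproxy() for p in proxies}
print(len(unique_others))
print(proxies[0].name)
```

Unlike weakref.proxy, this sketch forwards only attribute access and repr; operators and other special methods would each need their own forwarding method.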
On a little sidenote on how PyCharm is able to produce the normal repr output (Because you mentioned that in a comment):
>>> class A(): pass
>>> a = A()
>>> weakref.proxy(a)
<weakproxy at 0x7fcf7885d470 to A at 0x1410990>
>>> weakref.proxy(a).__repr__()
'<__main__.A object at 0x1410990>'
>>> type( weakref.proxy(a))
<type 'weakproxy'>
As you can see, calling the original __repr__ can really help!
weakref.ref is hashable whereas weakref.proxy is not. The API doesn't say anything about how you can actually get a handle on the object a proxy points to. With weakref.ref, it's easy: you can just call it. As such, you can roll your own proxy-like class. Here's a very basic attempt:
import weakref
class C(object):
    def __init__(self, obj):
        self.object = weakref.ref(obj)
    def __getattr__(self, key):
        # __getattr__ runs only when normal lookup fails. object defines
        # __getattribute__ (not __getattr__), so use that to reach our own attributes.
        if key == "object": return object.__getattribute__(self, "object")
        elif key == "__init__": return object.__getattribute__(self, "__init__")
        else:
            obj = object.__getattribute__(self, "object")()  # Dereference the weakref
            return getattr(obj, key)
class Other(object):
    pass
others = [Other() for i in range(3)]
my_list = [C(others[i % len(others)]) for i in range(10)]
unique_list = {x.object for x in my_list}
Of course, now unique_list contains refs, not proxies, which is fundamentally different...
I know that this is an old question, but I've been bitten by it (so, there's no real 'unproxy' in the standard library) and wanted to share my solution...
The way I solved it to get the real instance was just creating a property which returned it (although I suggest using weakref.ref instead of a weakref.proxy as code should really check if it's still alive before accessing it instead of having to remember to catch an exception whenever any attribute is accessed).
Anyways, if you still must use a proxy, the code to get the real instance is:
import weakref
class MyClass(object):
    @property
    def real_self(self):
        return self
instance = MyClass()
proxied = weakref.proxy(instance)
assert proxied.real_self is instance
