How does this specific section of code work? - python

def add_info_extractor(self, ie):
    """Add an InfoExtractor object to the end of the list."""
    self._ies.append(ie)
    if not isinstance(ie, type):
        self._ies_instances[ie.ie_key()] = ie
        ie.set_downloader(self)

def get_info_extractor(self, ie_key):
    """
    Get an instance of an IE with name ie_key, it will try to get one from
    the _ies list, if there's no instance it will create a new one and add
    it to the extractor list.
    """
    ie = self._ies_instances.get(ie_key)
    if ie is None:
        ie = get_info_extractor(ie_key)()
        self.add_info_extractor(ie)
    return ie
The following is taken from a popular Python repo, youtube-dl. In an effort to become a better programmer I came across this section and I'm having trouble understanding it.
Particularly the last method and how it does not enter infinite recursion if the ie_key is not found in the list.
As well as the isinstance comparison in the first method.
I understand the normal implementation is something to the effect of: isinstance('hello', str), but how can type() be a type? Moreover, what's the point of comparing an ie object to type?

This certainly could cause infinite recursion. No updates seem to happen to self._ies_instances in between recursive calls, and as recursion is dependent on this case, it will continue.
Maybe this is a bug, but the code has never had a situation when ie_key is not in the dictionary?
As for your confusion with type, it's a result of Python Metaclasses (a great read). type acts both as a "function" to return the type of an object as well as a class to create a new type (when called with more arguments).
One reason you may want to check whether something is an instance of type is to see if it is a class (in the Python 2 session below, a new-style class) rather than an ordinary instance:
>>> isinstance(1, type)
False
>>> isinstance("", type)
False
>>> isinstance({}, type)
False
>>> isinstance((), type)
False
>>> type(object) == type
True
>>> isinstance(object, type)
True
>>> isinstance(object(), type)
False
>>> class a(): pass
...
>>> isinstance(a, type)
False
>>> isinstance(a(), type)
False
As object is the 'base for all new style classes' (docs), it is itself an instance of type (as shown above).
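For readers on Python 3, a short sketch of the same checks may help. The transcript above matches Python 2, where `class a(): pass` creates an old-style class; in Python 3 every class is an instance of type, so `isinstance(a, type)` would be True there. The names A, Meta, B and is_metaclass below are made up for illustration.

```python
class A:
    pass


class Meta(type):
    """A custom metaclass: a class whose instances are themselves classes."""


class B(metaclass=Meta):
    pass


# Classes are instances of type; ordinary instances are not.
class_checks = [isinstance(A, type), isinstance(object, type), isinstance(B, type)]
instance_checks = [isinstance(A(), type), isinstance(1, type), isinstance("", type)]


def is_metaclass(obj):
    # A metaclass is a class that is also a *subclass* of type.
    return isinstance(obj, type) and issubclass(obj, type)


print(class_checks)     # [True, True, True]
print(instance_checks)  # [False, False, False]
print(is_metaclass(Meta), is_metaclass(type), is_metaclass(A))  # True True False
```

So `isinstance(x, type)` answers "is x a class?", while "is x a metaclass?" needs the extra issubclass check.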

I believe the reason this avoids infinite recursion is that it never actually recurses at all! Look closely:
def get_info_extractor(self, ie_key):
    ...
    ie = get_info_extractor(ie_key)()
Note that the get_info_extractor whose definition we're reading is a method, and it calls a non-method function that just so happens to also be named get_info_extractor, and so it's not calling itself, and so there's no recursion.
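To make the name resolution concrete, here is a minimal sketch, not the actual youtube-dl code. DemoIE, _REGISTRY and Downloader are hypothetical stand-ins; the two methods are copied from the question, assuming both statements after the isinstance check belong inside the if.

```python
class DemoIE:
    """Hypothetical stand-in for a real InfoExtractor class."""

    def __init__(self):
        self.downloader = None

    @classmethod
    def ie_key(cls):
        return "Demo"

    def set_downloader(self, downloader):
        self.downloader = downloader


_REGISTRY = {"Demo": DemoIE}  # hypothetical lookup table


def get_info_extractor(ie_key):
    """Module-level function: returns the extractor *class* for a key."""
    return _REGISTRY[ie_key]


class Downloader:
    def __init__(self):
        self._ies = []
        self._ies_instances = {}

    def add_info_extractor(self, ie):
        self._ies.append(ie)
        if not isinstance(ie, type):  # skip classes, cache only instances
            self._ies_instances[ie.ie_key()] = ie
            ie.set_downloader(self)

    def get_info_extractor(self, ie_key):
        ie = self._ies_instances.get(ie_key)
        if ie is None:
            # A bare name inside a method is looked up in the module globals,
            # never on the class, so this is the function above, not this method.
            ie = get_info_extractor(ie_key)()
            self.add_info_extractor(ie)
        return ie


downloader = Downloader()
downloader.add_info_extractor(DemoIE)           # a class: listed but not cached
first = downloader.get_info_extractor("Demo")   # instance created and cached
second = downloader.get_info_extractor("Demo")  # served from the cache
print(first is second)  # True
```

A recursive call would have to be spelled self.get_info_extractor(ie_key). The sketch also shows what the isinstance(ie, type) check is for: the list may hold extractor classes as well as instances, and only instances have a downloader attached.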

Related

Why is `x[i]` not equivalent to `x.__getitem__(i)`?

From the documentation:
x[i] is roughly equivalent to type(x).__getitem__(x, i).
What is the benefit of the above rather than having a seemingly simpler x.__getitem__(i)?
EDIT: Why is Python behaving this way?
As a downside of the standard behavior let me show this sample code where I was surprised to find the last assertion fails while second to last one (calling __getitem__ directly) passes.
def poww_bar(base):
    class Bar():
        def __getitem__(self, x):
            return lambda: base**x
    return Bar()

def poww_foo(base):
    class Foo():
        pass
    f = Foo()
    f.__getitem__ = lambda x: lambda: base ** x
    return f

pow_bar2 = poww_bar(2)
pow_foo2 = poww_foo(2)
assert pow_bar2.__getitem__(3)() == 8  # OK
assert pow_bar2[3]() == 8  # OK
assert pow_foo2.__getitem__(3)() == 8  # OK
assert pow_foo2[3]() == 8  # TypeError: 'Foo' object is not subscriptable
Methods are class attributes, not instance attributes.
There is no instance attribute named __getitem__ associated with pow_bar2. So lookup proceeds to checking the class for an attribute by that name, and it succeeds in finding Bar.__getitem__.
But the process doesn't end there. pow_bar2.__getitem__(i) is not equivalent to Bar.__getitem__(i), because Python first checks if the attribute lookup produces an object that implements the descriptor protocol. Since Bar.__getitem__ is an instance of function, it does implement the descriptor protocol.
The next step is then to return not the function itself, but the result of Bar.__dict__['__getitem__'].__get__(pow_bar2, Bar). (I'm switching to the use of Bar.__dict__ to emphasize that we do not get into an infinite loop of triggering the descriptor protocol.) This is an instance of method, which is itself a callable that passes its own arguments, along with pow_bar2, as arguments to the original function.
Thus, pow_bar2.__getitem__(i) is equivalent to Bar.__dict__['__getitem__'].__get__(pow_bar2, Bar)(i), which is roughly equivalent to Bar.__dict__['__getitem__'](pow_bar2, i).
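The chain of equivalences above can be checked directly. This is a small sketch with a simplified Bar (it returns x * 2 instead of a lambda, purely to keep the output readable):

```python
class Bar:
    def __getitem__(self, x):
        return x * 2


bar = Bar()

# The plain function stored on the class, fetched without triggering descriptors.
raw_function = Bar.__dict__["__getitem__"]

# The descriptor protocol turns it into a method bound to this instance.
bound_method = raw_function.__get__(bar, Bar)

results = [
    bar[3],                         # subscript syntax
    bar.__getitem__(3),             # explicit attribute lookup, then call
    bound_method(3),                # manually bound method
    raw_function(bar, 3),           # plain function, instance passed by hand
    type(bar).__getitem__(bar, 3),  # the form quoted from the documentation
]
print(results)  # [6, 6, 6, 6, 6]
print(bound_method.__self__ is bar, bound_method.__func__ is raw_function)  # True True
```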
But really, pow_bar2[i] is just shorter and more easily recognizable (due to decades of established support for this syntax in other languages) than pow_bar2.__getitem__(i). __getitem__ is what makes the use of [] extendable to other classes, rather than limiting it to built-in types.
The descriptor protocol is not just a one-shot feature that makes instance-method behavior seem more complicated than necessary. It also determines how class methods, static methods, and properties work, and can further be used to customize attribute behavior in other ways.
It could just be an optimization. A class function will only have one reference in the class definition. An object function will have a reference in every object. So the __getitem__ method was specified to be a class function, so they didn't need to waste time looking in the object definitions for it.
This is all speculation of course.
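Whatever the motivation, the behaviour itself is easy to verify. In this sketch (Foo and the lambdas are illustrative only), an instance attribute named __getitem__ is visible to explicit attribute access but ignored by the subscript syntax, which only consults the type:

```python
class Foo:
    pass


foo = Foo()
foo.__getitem__ = lambda x: x * 2  # lives in the instance __dict__ only

explicit = foo.__getitem__(3)  # ordinary attribute lookup finds the lambda

try:
    foo[3]  # implicit lookup goes to type(foo) and finds nothing
    subscript_worked = True
except TypeError:
    subscript_worked = False

# Defining the method on the class makes the subscript syntax work.
Foo.__getitem__ = lambda self, x: x * 10
via_type = foo[3]

print(explicit, subscript_worked, via_type)  # 6 False 30
```

Note that after the class-level assignment, foo.__getitem__(3) still returns 6, because the instance attribute shadows the class function for explicit lookups, while foo[3] uses the class version.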

how deque of python print all items [duplicate]

If someone writes a class in python, and fails to specify their own __repr__() method, then a default one is provided for them. However, suppose we want to write a function which has the same, or similar, behavior to the default __repr__(). However, we want this function to have the behavior of the default __repr__() method even if the actual __repr__() for the class was overloaded. That is, suppose we want to write a function which has the same behavior as a default __repr__() regardless of whether someone overloaded the __repr__() method or not. How might we do it?
class DemoClass:
    def __init__(self):
        self.var = 4
    def __repr__(self):
        return str(self.var)

def true_repr(x):
    # [magic happens here]
    s = "I'm not implemented yet"
    return s

obj = DemoClass()
print(obj.__repr__())
print(true_repr(obj))
Desired Output:
print(obj.__repr__()) prints 4, but print(true_repr(obj)) prints something like:
<__main__.DemoClass object at 0x0000000009F26588>
You can use object.__repr__(obj). This works because the default repr behavior is defined in object.__repr__.
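Plugged into the question's skeleton, that suggestion looks roughly like this (a minimal sketch; the address printed will differ on every run):

```python
class DemoClass:
    def __init__(self):
        self.var = 4

    def __repr__(self):
        return str(self.var)


def true_repr(x):
    # Call the implementation defined on object, bypassing any override.
    return object.__repr__(x)


obj = DemoClass()
print(repr(obj))       # 4
print(true_repr(obj))  # something like <__main__.DemoClass object at 0x...>
```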
Note, the best answer is probably just to use object.__repr__ directly, as the others have pointed out. But one could implement that same functionality roughly as:
>>> def true_repr(x):
...     type_ = type(x)
...     module = type_.__module__
...     qualname = type_.__qualname__
...     return f"<{module}.{qualname} object at {hex(id(x))}>"
...
So....
>>> A()
hahahahaha
>>> true_repr(A())
'<__main__.A object at 0x106549208>'
>>>
Typically we can use object.__repr__ for that, but this will use the object repr for every item, so:
>>> object.__repr__(4)
'<int object at 0xa6dd20>'
Since an int is an object, but with the __repr__ overridden.
If you want to go up one level of overwriting, we can use super(..):
>>> super(type(4), 4).__repr__() # going up one level
'<int object at 0xa6dd20>'
For an int that thus again means that we will print <int object at ...>, but if we would for instance subclass the int, then it would use the __repr__ of int again, like:
class special_int(int):
    def __repr__(self):
        return 'Special int'
Then it will look like:
>>> s = special_int(4)
>>> super(type(s), s).__repr__()
'4'
What we do here is create a proxy object with super(..). Super will walk the method resolution order (MRO) of the object and will try to find the first function (from a superclass of s) that has overridden the function. If we use single inheritance, that is the closest parent that overrides the function, but if there is some multiple inheritance involved, then this is more tricky. We thus select the __repr__ of that parent, and call that function.
This is also a rather weird application of super since usually the class (here type(s)) is a fixed one, and does not depend on the type of s itself, since otherwise multiple such super(..) calls would result in an infinite loop.
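The infinite-loop risk mentioned above can be reproduced with a small sketch. All class names here are made up for illustration: a method that calls super(type(self), self) works until someone subclasses its class, because type(self) then names the subclass and the lookup lands on the same method again.

```python
class Base:
    def describe(self):
        return "Base"


class BrokenMid(Base):
    def describe(self):
        # type(self) is the *runtime* class, not necessarily BrokenMid.
        return "Mid>" + super(type(self), self).describe()


class BrokenLeaf(BrokenMid):
    pass


class Mid(Base):
    def describe(self):
        # Zero-argument super() is tied to the class where the method is defined.
        return "Mid>" + super().describe()


class Leaf(Mid):
    pass


direct = BrokenMid().describe()  # fine: type(self) happens to be BrokenMid

try:
    BrokenLeaf().describe()  # super(BrokenLeaf, self) finds BrokenMid.describe again
    recursed = False
except RecursionError:
    recursed = True

fixed = Leaf().describe()
print(direct, recursed, fixed)  # Mid>Base True Mid>Base
```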
But usually it is a bad idea to break overriding anyway. The reason a programmer overrides a function is to change the behavior. Not respecting this can of course sometimes result in some useful functions, but frequently it will mean that the code contracts are no longer satisfied. For example, if a programmer overrides __eq__, they will also override __hash__; if you use the hash of another class together with the real __eq__, things will start breaking.
Calling magic methods directly is also frequently seen as an antipattern, so you had better avoid that as well.

Python str subclass represents a value which is not a real string

I am a novice in python. Working on extending an older module. So far it had a function that returned str (output of a blocking shell command). Now I need that function to also be able to return an object so later operations can be done on it (checking output for a non-blocking shell command). So the function now returns an instance of my class which I subclassed from str for backward compatibility. The problem is, however, when such an object is passed to os.path.isdir - it always returns False, even with the string being a valid path
import os

class ShellWrap(str):
    def __new__(cls, dummy_str_value, process_handle):
        return str.__new__(cls, "")
    def __init__(self, dummy_str_value, process_handle):
        self._ph = process_handle
        self._output_str = ""
    def wait_for_output(self):
        # for simplicity just do
        self._output_str = "/Users"
    def __str__(self):
        return str(self._output_str)
    def __repr__(self):
        return str(self._output_str)
    def __eq__(self, other):
        if isinstance(other, str):
            return other == str(self._output_str)
        else:
            # super() is already bound to self, so pass only other
            return super().__eq__(other)
>>> obj = ShellWrap("", None)
>>> obj.wait_for_output()
>>> print(type(obj))
<class '__main__.ShellWrap'>
>>> print(ShellWrap.__mro__)
(<class '__main__.ShellWrap'>, <class 'str'>, <class 'object'>)
>>> print(type(obj._output_str))
<class 'str'>
>>> print(obj)
/Users
>>> print(obj._output_str)
/Users
>>> obj == "/Users"
True
The one that puzzles me is:
>>> print(os.path.isdir(obj))
False
>>> print(os.path.isdir("/Users"))
True
I tried to add PathLike inheritance and implement one more dunder, but to no avail:
class ShellWrap(str, PathLike):
    ....
    def __fspath__(self):
        return self._output_str
It seems there is one more dunder that I failed to implement. But which?
I do see, however, something strange in the debugger. When I put a watch on obj - it says it is of a class str but the value is shown by the debugger is without the quotes (unlike other 'pure' strs).
Adding quotes manually to the string in the debugger - makes it work but I guess editing a string probably creates a new object, this time pure str.
What do I miss?
Edit: after realizing (see the accepted answer) that what I try to do is impossible, I decided to challenge the decision of having to subclass str. So now my class does not inherit anything. It just implements __str__, __repr__ and __fspath__ and this seems to be enough! Apparently as long as the str inheritance is there - it gets precedence, the dunders don't get called and it tricks some libraries to go fetch the underlying C storage of the str value
Consider the source of os.path.isdir. When you pass in obj, you’re probably triggering that value error because the string you want to evaluate is an attribute of your string subclass, not the string the subclass is supposed to represent. You’ll have to muck around a bit more in the source for str to find the right member to override.
Edit: one possible way around this is to use __init__ dynamically. That is, get everything you need done to render the path string in __new__, and before you return the class in that method, set output_str as an attribute. Now in your __init__, call super().__init__ with self.output_str as the only argument.
What you're trying to do is impossible.
C code working with a string accesses the actual string data managed by the str class, not the methods you're writing. It doesn't care that you attached another string to your object as an attribute, or that you overrode a bunch of methods. It's closer to str.__whatever__(your_obj) than your_obj.__whatever__(), although it doesn't go through method calls at all.
In this case, the relevant C code is the os.stat call that os.path.isdir delegates to, but almost anything that uses strings is going to use something written in C that accesses the str data directly at some point.
You want your object's data to be mutable - wait_for_output is mutative - but you cannot mutate the parts of your object inherited from str, and that's the data that matters.
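A small sketch of the difference, under some assumptions: StrWrap and PathWrap are made-up names, and tempfile.gettempdir() is used only to get a directory that exists on the machine running the code. The str subclass carries an empty string as its real data, so os.path.isdir sees "" no matter which methods are overridden; the plain class has no str data, so os falls back to __fspath__, as the question's edit describes.

```python
import os
import tempfile

real_dir = tempfile.gettempdir()  # some directory that exists


class StrWrap(str):
    """Subclass of str whose underlying str data is always empty."""

    def __new__(cls, path):
        return str.__new__(cls, "")

    def __init__(self, path):
        self._path = path

    def __str__(self):
        return self._path

    def __fspath__(self):  # ignored: str instances are used as-is
        return self._path


class PathWrap:
    """Not a str at all; os functions consult __fspath__ instead."""

    def __init__(self, path):
        self._path = path

    def __str__(self):
        return self._path

    def __fspath__(self):
        return self._path


str_result = os.path.isdir(StrWrap(real_dir))
path_result = os.path.isdir(PathWrap(real_dir))
print(len(StrWrap(real_dir)))   # 0: the data C code actually reads
print(str_result, path_result)  # False True
```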

Can someone explain the Python "function" concept in this context to me?

How are you supposed to access the 10 in this? I've been informed we're returning a function in this function, but how does this make sense?
function([1, 2, 3, 4])(10)
I'm assuming a lot based on the limited information you've provided in your question.
But it looks like you are trying to understand a functional closure. Here's a totally contrived example:
def function(a):
    def inner(b):
        return sum(a) == b
    return inner
>>> function([1,2,3,4])(10)
True
>>> eq = function([1,2,3,4])
>>> eq(10)
True
>>> eq(11)
False
In your expression function([1, 2, 3, 4])(10), there are two calls, one with the argument [1, 2, 3, 4] and the other with the argument 10. For this to work, function must be a callable that returns a callable. Python relies heavily on objects having types which define their behaviour, and callability is one of those behaviours, recursively defined by objects having a __call__ method (which is a type of callable). Because of this dynamic behaviour, we can't tell from the expression what type function is.
We can provide examples that would make the expression valid, though. For instance:
function = lambda x: x.__contains__
This creates an anonymous function using a lambda expression, which is a callable. That function returns a bound method (assuming its argument has the __contains__ method) which in turn is callable, and the expression would evaluate to False.
class function:
    def __init__(self, a):
        "Method called during object initialization"
        # Note that the return value doesn't come from this method.
        # self is created before it is called and returned after.
    def __call__(self, b):
        "Method called when the object is called"
        return "Well, the first one wasn't quite a function."
This makes a class named function, and classes are callable, which is how we instantiate them. So the first call became an object instantiation and the second call calls an object. In this example, we don't actually have a function, though we do have two methods that are called within the two calls.
AChampion's example uses two normal function definitions, one of which occurs inside the other, creating a closure over the value of a from that call. That is a more traditional approach, though we can still muddy the waters using mutable values:
def function(a):
    def inner(b):
        return sum(a) == b
    return inner
>>> l = [1,2,3,4]
>>> eq = function(l)
>>> eq(10)
True
>>> eq(15)
False
>>> l.append(5)
>>> eq(15)
True
>>> eq(10)
False
We see here that this isn't a pure function in the mathematical sense, as its value is affected by other state than its arguments. We frequently try to avoid such side effects, or at least expose them by prominently displaying the state container, such as in method calls.
Lastly, depending on the context, the expression could fail in a variety of ways including NameError if function simply isn't defined, or TypeError if one of the calls was attempted on a non-callable object. It's still syntactically correct Python, and both of those exceptions are possible to handle, although doing so is likely a bit of a perversion. An example might be a spreadsheet program in which the cell formulae are Python expressions; you'd evaluate them with specific namespaces (globals), and catch any error to account for mistyped formulae.
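The spreadsheet idea can be sketched as follows. This is an illustration only: evaluate_formula and the "#ERROR" marker are hypothetical, and eval on untrusted input is unsafe, so a real program would need far more care.

```python
def function(a):
    def inner(b):
        return sum(a) == b
    return inner


def evaluate_formula(source, namespace):
    """Evaluate a cell formula with the given names as its globals."""
    try:
        # Copy the namespace: eval inserts __builtins__ into the dict it receives.
        return eval(source, dict(namespace))
    except (NameError, TypeError) as exc:
        return "#ERROR: " + type(exc).__name__


formula = "function([1, 2, 3, 4])(10)"

ok = evaluate_formula(formula, {"function": function})
missing = evaluate_formula(formula, {})                      # name not defined
not_callable = evaluate_formula(formula, {"function": len})  # len(...) returns an int

print(ok)            # True
print(missing)       # #ERROR: NameError
print(not_callable)  # #ERROR: TypeError
```

In the last case the first call succeeds (len returns 4) and it is the second call, 4(10), that raises the TypeError.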

Python: dereferencing weakproxy

Is there any way to get the original object from a weakproxy pointed to it? eg is there the inverse to weakref.proxy()?
A simplified example(python2.7):
import weakref

class C(object):
    def __init__(self, other):
        self.other = weakref.proxy(other)

class Other(object):
    pass

others = [Other() for i in xrange(3)]
my_list = [C(others[i % len(others)]) for i in xrange(10)]
I need to get the list of unique other members from my_list. The way I prefer for such tasks
is to use set:
unique_others = {x.other for x in my_list}
Unfortunately this throws TypeError: unhashable type: 'weakproxy'
I have managed to solve the specific problem in an imperative way(slow and dirty):
unique_others = []
for x in my_list:
    if x.other in unique_others:
        continue
    unique_others.append(x.other)
but the general problem noted in the caption is still active.
What if I have only my_list under control and others are buried in some lib and someone may delete them at any time, and I want to prevent the deletion by collecting nonweak refs in a list?
Or I may want to get the repr() of the object itself, not <weakproxy at xx to Other at xx>
I guess there should be something like weakref.unproxy I'm not aware about.
I know this is an old question but I was looking for an answer recently and came up with something. Like others said, there is no documented way to do it and looking at the implementation of weakproxy type confirms that there is no standard way to achieve this.
My solution uses the fact that all Python objects have a set of standard methods (like __repr__) and that bound method objects contain a reference to the instance (in __self__ attribute).
Therefore, by dereferencing the proxy to get the method object, we can get a strong reference to the proxied object from the method object.
Example:
>>> def func():
...     pass
...
>>> weakfunc = weakref.proxy(func)
>>> f = weakfunc.__repr__.__self__
>>> f is func
True
Another nice thing is that it will work for strong references as well:
>>> func.__repr__.__self__ is func
True
So there's no need for type checks if either a proxy or a strong reference could be expected.
Edit:
I just noticed that this doesn't work for proxies of classes. This is not universal then.
Basically there is something like weakref.unproxy, but it's just named weakref.ref(x)().
The proxy object is only there for delegation and the implementation is rather shaky...
The == function doesn't work as you would expect it:
>>> weakref.proxy(object) == object
False
>>> weakref.proxy(object) == weakref.proxy(object)
True
>>> weakref.proxy(object).__eq__(object)
True
However, I see that you don't want to call weakref.ref objects all the time. A good working proxy with dereference support would be nice.
But at the moment, this is just not possible. If you look into python builtin source code you see, that you need something like PyWeakref_GetObject, but there is just no call to this method at all (And: it raises a PyErr_BadInternalCall if the argument is wrong, so it seems to be an internal function). PyWeakref_GET_OBJECT is used much more, but there is no method in weakref.py that could be able to do that.
So, sorry to disappoint you, but weakref.proxy is just not what most people would want for their use cases. You can, however, make your own proxy implementation. It isn't too hard. Just use weakref.ref internally and override __getattr__, __repr__, etc.
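As a rough sketch of that suggestion (written for Python 3; DerefProxy and unproxy are made-up names, not part of the weakref module), a proxy built on weakref.ref can expose the dereference step that weakref.proxy lacks:

```python
import weakref


class DerefProxy:
    """Hypothetical proxy that delegates attribute access and can be unwrapped."""

    def __init__(self, target):
        self._ref = weakref.ref(target)

    def unproxy(self):
        """Return a strong reference to the target, or raise if it is gone."""
        target = self._ref()
        if target is None:
            raise ReferenceError("weakly-referenced object no longer exists")
        return target

    def __getattr__(self, name):
        # Only called when normal lookup fails, so _ref and unproxy never get here.
        return getattr(self.unproxy(), name)

    def __repr__(self):
        return repr(self.unproxy())


class Other:
    pass


others = [Other() for _ in range(3)]
others[0].value = 42

proxies = [DerefProxy(others[i % len(others)]) for i in range(10)]

# Strong references are hashable, so a set works for de-duplication.
unique_others = {p.unproxy() for p in proxies}

print(proxies[0].value)                    # 42
print(proxies[0].unproxy() is others[0])   # True
print(len(unique_others))                  # 3
```

Only __getattr__ and __repr__ are forwarded here; operators and other special methods would each need their own forwarding method, since Python looks those up on the type.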
On a little sidenote on how PyCharm is able to produce the normal repr output (Because you mentioned that in a comment):
>>> class A(): pass
>>> a = A()
>>> weakref.proxy(a)
<weakproxy at 0x7fcf7885d470 to A at 0x1410990>
>>> weakref.proxy(a).__repr__()
'<__main__.A object at 0x1410990>'
>>> type( weakref.proxy(a))
<type 'weakproxy'>
As you can see, calling the original __repr__ can really help!
weakref.ref is hashable whereas weakref.proxy is not. The API doesn't say anything about how you can actually get a handle on the object a proxy points to. With weakref.ref, it's easy: you can just call it. As such, you can roll your own proxy-like class... Here's a very basic attempt:
import weakref

class C(object):
    def __init__(self, obj):
        self.object = weakref.ref(obj)
    def __getattr__(self, key):
        # object defines __getattribute__, not __getattr__
        if key == "object":
            return object.__getattribute__(self, "object")
        elif key == "__init__":
            return object.__getattribute__(self, "__init__")
        else:
            obj = object.__getattribute__(self, "object")()  # Dereference the weakref
            return getattr(obj, key)

class Other(object):
    pass

others = [Other() for i in range(3)]
my_list = [C(others[i % len(others)]) for i in range(10)]
unique_list = {x.object for x in my_list}
Of course, now unique_list contains refs, not proxys which is fundamentally different...
I know that this is an old question, but I've been bitten by it (so, there's no real 'unproxy' in the standard library) and wanted to share my solution...
The way I solved it to get the real instance was just creating a property which returned it (although I suggest using weakref.ref instead of a weakref.proxy as code should really check if it's still alive before accessing it instead of having to remember to catch an exception whenever any attribute is accessed).
Anyways, if you still must use a proxy, the code to get the real instance is:
import weakref

class MyClass(object):
    @property
    def real_self(self):
        return self

instance = MyClass()
proxied = weakref.proxy(instance)
assert proxied.real_self is instance
