Why does Python allow mentioning a method without calling it? - python

I had some trouble finding my error: I had written
myfile.close
instead of
myfile.close()
I am surprised and somewhat unhappy that Python did not object; how come? BTW, the file was NOT closed.
(Python 2.7 on Ubuntu)

In Python, methods are first-class objects, so you can write something like this:
my_close = myfile.close
my_close()
Since expressions don't have to be assigned to some variable
2 + 3
a simple
myfile.close
is valid, too.
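A minimal sketch of the difference between mentioning and calling the method. It uses io.StringIO as a stand-in for a real file (an assumption, so the example runs anywhere) and Python 3 syntax:

```python
import io

# io.StringIO stands in for a real file here (an assumption for portability).
myfile = io.StringIO("some data")

myfile.close                  # evaluates to a bound method, then discards it
still_open = not myfile.closed

closer = myfile.close         # keep a reference to the bound method
closer()                      # the parentheses are what actually call it
now_closed = myfile.closed

print(still_open, now_closed)
```

The bare myfile.close line is a legal expression statement, which is why the interpreter stays silent about it.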

That is because myfile.close returns a method.
If you do print(myfile.close) you will get an output like:
<built-in method close of file object at 0x7fb672e0f540>

Python's file object has a built-in close() method for closing the file.
In your case, myfile.close returns a reference to the method object, but the method is never called.
This example may help you understand this:
>>> def test():
...     print('test print')
...
>>> a = test
>>> print(a)
<function test at 0x7f6ff1f6ab90>
>>>

myfile.close is a member function of a file-like object. Your code just 'mentions' this function, but doesn't call it (this is why the file wasn't closed).
It's the same as doing this:
a = 5
a # OK but does nothing
def test():
    return 256
test # also not an error, but useless in this case
Now, you may say, why even allow this if it's totally useless? Well, not quite. With this you can pass functions as arguments to other functions, for example.
def MyMap(function, iterable):
    for x in iterable:
        yield function(x)
print(list(MyMap(str, range(5))))

fp.close is a method of the file object.
>>> fp.close
<built-in method close of file object at 0xb749c1d8>
fp.close() is a call to that method; the () is what performs the call.
>>> fp.close()
The following demo shows this in more detail.
We define a test function and then evaluate both test and test() to see the difference.
>>> def test():
...     return 1
...
>>> test
<function test at 0xb742bb54>
>>> test()
1

Your code does not throw an error because you don't have to assign a name to the result of an expression (edit: or statement). Python allows this because it could be that you only need the expression/statement for some sort of side effect but not the actual return value.
This is often seen when advancing an iterator without assigning a name to the yielded object, i.e. just having a line
next(myiter, None)
in the code.
If you wanted expressions with no side effects like
1 + 1
or
myfile.close
to throw an error if there is no name assigned to the result (and therefore the expression is useless), Python would have to inspect every such expression for possible side effects. Compared to the current behavior of just allowing these expressions, this seems needlessly expensive (and would also be very complex, if not impossible, for every expression).
Consider how dynamic Python is. You could override what the + operator does to certain types at runtime.
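To illustrate that last point, here is a small sketch with a hypothetical Tally class (not from the question) whose + operator has a side effect, so a bare t + 1 statement is not actually useless:

```python
class Tally:
    """Hypothetical class whose + operator records every operand."""

    def __init__(self):
        self.seen = []

    def __add__(self, other):
        self.seen.append(other)   # side effect hidden behind '+'
        return self


t = Tally()
t + 1          # looks like a no-op expression statement, but mutates t
t + 2
print(t.seen)
```

Since the interpreter cannot know in general what __add__ will do until it runs, rejecting "useless" expression statements up front would not be reliable.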

Related

What happens when you invoke a function that contains yield?

I read here the following example:
>>> def double_inputs():
...     while True: # Line 1
...         x = yield # Line 2
...         yield x * 2 # Line 3
...
>>> gen = double_inputs()
>>> next(gen) # Run up to the first yield
>>> gen.send(10) # goes into 'x' variable
If I understand the above correctly, it seems to imply that Python actually waits until next(gen) to "run up to" Line 2 in the body of the function. Put another way, the interpreter would not start executing the body of the function until we call next.
Is that actually correct?
To my knowledge, Python does not do AOT compilation, and it doesn't "look ahead" much except for parsing the code and making sure it's valid Python. Is this correct?
If the above are true, how would Python know when I invoke double_inputs() that it needs to wait until I call next(gen) before it even enters the loop while True?
Correct. Calling double_inputs never executes any of the code in its body; it simply returns a generator object. The presence of the yield expression in the body, discovered when the def statement is compiled, marks the function as a generator function: def still creates a function object, but calling it returns a generator object instead of running the body.
A function that contains yield is a generator function.
When you call gen = double_inputs(), you get a generator instance as the result. You consume this generator by calling next.
So for your first question, it is true: nothing runs until the first call to next, which executes Line 1 and pauses at the yield on Line 2. Line 3 runs only after the subsequent send.
For your second question, I don't exactly get your point. When you define the function, Python already knows what you are defining; it doesn't need to look ahead when running it.
For your third question, the key is the yield keyword.
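The order of execution can be made visible with a small sketch. The events list and the extra append calls are additions for illustration only; the rest mirrors the example from the question:

```python
events = []   # hypothetical log used only to make execution order visible


def double_inputs():
    events.append("body started")
    while True:
        x = yield
        events.append("received %r" % (x,))
        yield x * 2


gen = double_inputs()
after_call = list(events)      # nothing in the body has run yet

next(gen)                      # runs up to the first yield
after_next = list(events)

result = gen.send(10)          # resumes, binds x = 10, runs to the second yield
print(after_call, after_next, result)
```

The log stays empty after the call itself and only gains its first entry once next(gen) is executed.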
A generator function is de jure a function, but de facto it is an iterator, i.e. a class (with __next__(), __iter__(), and some other methods implemented).
In other words, it is a class disguised as a function.
This means that “calling” this function in reality creates an instance of that class, which explains why the “called function” initially does nothing. This is the answer to your 3rd question.
The answer to your 1st question is, surprisingly, no.
Instances always wait for their methods to be called, and the __next__() method (indirectly invoked by calling the built-in next() function) is not the only method of generators. Another method is .send(), and you may use gen.send(None) instead of your next(gen).
The answer to your 2nd question is no. The Python interpreter by no means “looks ahead”, and there are no exceptions, including your
... except for parsing the code and making sure it's valid Python.
Or the answer to this question is yes, if you mean “parsing only up to the next command”. ;-)
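A short sketch of the gen.send(None) point, in Python 3. The use of inspect.getgeneratorstate is an addition here, included only to show that the generator has not started before the first send:

```python
import inspect


def double_inputs():
    while True:
        x = yield
        yield x * 2


gen = double_inputs()
state_before = inspect.getgeneratorstate(gen)   # created, body not yet entered

gen.send(None)                                  # same effect as next(gen) here
state_after = inspect.getgeneratorstate(gen)    # paused at the first yield

doubled = gen.send(21)
print(state_before, state_after, doubled)
```

Note that the first send must pass None; sending any other value to a generator that has not started raises a TypeError.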

Is there a string `s` such that eval(repr(s)) leads to arbitrary code execution?

I found similar code somewhere:
USER_CONTROLLED = 'a'
open("settings.py", "w").write("USER_CONTROLLED = %s" % eval(repr(a)))
And in another file:
import settings
x = settings.USER_CONTROLLED * [0]
Is this a security vulnerability?
In contrast to what you were told on IRC, there definitely is an x that makes eval(repr(x)) dangerous, so saying it just like that without any restrictions is wrong too.
Imagine a custom object that implements __repr__ differently. The documentation says on __repr__ that it “should look like a valid Python expression that could be used to recreate an object with the same value”. But there is simply nothing that can possibly enforce this guideline.
So instead, we could create a class that has a custom __repr__ that returns a string which when evaluated runs arbitrary code. For example:
class MyObj:
    def __repr__(self):
        return "__import__('urllib.request').request.urlopen('http://example.com').read()"
Calling repr() on an object of that type shows that it returns a string that can surely be evaluated:
>>> repr(MyObj())
"__import__('urllib.request').request.urlopen('http://example.com').read()"
Here, that would just involve making a request to example.com. But as you can see, we can import arbitrary modules here and run code with them. And that code can have any kind of side effects. So it’s definitely dangerous.
If we however limit that x to known types of which we know what calling repr() on them will do, then we can indeed say when it’s impossible to run arbitrary code with it. For example, if that x is a string, then the implementation of unicode_repr makes sure that everything is properly escaped and that evaluating the repr() of that object will always return a proper string (which even equals x) without any side effects.
So we should check for the type before evaluating it:
if type(a) is not str:
    raise Exception('Only strings are allowed!')
something = eval(repr(a))
Note that we do not use isinstance here to do an inheritance-aware type check. Because I could absolutely make MyObj above inherit from str:
>>> x = MyObj()
>>> isinstance(x, str)
True
>>> type(x)
<class '__main__.MyObj'>
So you should really test against concrete types here.
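A runnable sketch of why the concrete type check matters. Sneaky is a hypothetical str subclass, and its __repr__ returns a harmless expression as a stand-in for attacker-chosen code:

```python
class Sneaky(str):
    """Hypothetical str subclass whose repr is code, not a string literal."""

    def __repr__(self):
        # Harmless stand-in for an attacker-chosen expression.
        return "'pay' + 'load'"


x = Sneaky("hello")

passes_isinstance = isinstance(x, str)      # inheritance-aware check
passes_exact_type = type(x) is str          # concrete type check
evaluated = eval(repr(x))                   # runs the expression from __repr__

print(passes_isinstance, passes_exact_type, evaluated)
```

The isinstance check lets the object through, and eval(repr(x)) then evaluates whatever expression the subclass chose to return instead of reproducing the string.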
Note that for strings, there is actually no reason to call eval(repr(x)) because as mentioned above, this will result in x itself. So you could just assign x directly.
Coming to your actual use case however, you do have a very big security problem here. You want to create a variable assignment and store that code in a Python file to be later run by an actual Python interpreter. So you should absolutely make sure that the right side of the assignment is not arbitrary code but actually the repr of a string:
>>> a = 'runMaliciousCode()'
>>> "USER_CONTROLLED = %s" % eval(repr(a))
'USER_CONTROLLED = runMaliciousCode()'
>>> "USER_CONTROLLED = %s" % repr(a)
"USER_CONTROLLED = 'runMaliciousCode()'"
As you can see, evaluating the repr() will put the actual content on the right side of the assignment (since it’s equivalent to "…" % a). But that can then lead to malicious code running when you import that file. So you should really just insert the repr of the string there, and completely forget about using eval altogether.
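Putting the advice together, a hedged sketch of writing the settings line. The settings_line helper is hypothetical, and reading the value back with ast.literal_eval is an addition that is not part of the answer above; it is shown only as one way to load a literal without executing code:

```python
import ast


def settings_line(value):
    """Return a settings.py line for a plain str (hypothetical helper)."""
    if type(value) is not str:
        raise TypeError("Only plain strings are allowed")
    return "USER_CONTROLLED = %s" % repr(value)


line = settings_line("runMaliciousCode()")
print(line)

# Reading the value back without executing anything:
literal = line.split(" = ", 1)[1]
restored = ast.literal_eval(literal)
print(restored)
```

Here the right-hand side is always a quoted string literal, so importing the generated file assigns a string rather than calling anything.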

What is the difference between using parenthesis and not using parenthesis in a method in Python

d=dict(a=1)
What is the difference between the following two?
d.clear
d.clear()
Why can't the first clear the dictionary?
Using parentheses calls the function, whereas not using them creates a reference to that function.
See below:
>>> def t():
...     return "Hi"
...
>>> a = t
>>> a
<function t at 0x01BECA70>
>>> a = t()
>>> a
'Hi'
>>>
Here is a good link to explain further: http://docs.python.org/2/tutorial/controlflow.html (scroll down to the "defining functions" part).
The first one doesn't actually call the function. In Python, you can use functions as values, so you can assign a function to a new variable like this:
def timesTen(n):
    return n * 10
fn = timesTen
then you can call it later:
print(fn(5)) # 50
Functions are just values that happen to have a certain property (that you can call them).
With () you execute the function. Without () you get a reference to the function, which you can assign to a variable and execute later under the new name.
new_d = d.clear # assign
new_d() # execute
BTW: in Ruby and Perl you can call function without parenthesis.
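Applied to the dictionary from the question, a minimal sketch (the variable names other than d are illustrative):

```python
d = dict(a=1)

d.clear                 # attribute lookup only; the dictionary is untouched
size_after_mention = len(d)

clear_d = d.clear       # a bound method that remembers d
clear_d()               # calling it empties d
size_after_call = len(d)

print(size_after_mention, size_after_call)
```

The bound method keeps a reference to d, which is why calling clear_d() later still affects the original dictionary.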

In Python, how can I use an attribute as both an attribute and a method/callable?

This is kind of an experiment. I'm interested in an API that supports both of these syntaxes:
obj.thing
--> returns default value
obj.thing(2, 'a')
--> returns value derived from *args and **kwargs
"thing" is the same object in both cases; I'd like the calling of thing to be optional (or implicit, if there are not () after it).
I tried overriding __repr__, but that's just the visual representation of the object itself, and what is actually returned is an instance of the containing object (here, 'obj'). So, no good.
I'm thinking that there would be an attribute set on an object that was a callable (don't care if it's an instance, a def, or just __call__ on the object) that has enough default values:
class CallableDefault(object):
    def __call__(self, num=3, letter="z"):
        return letter * num
class DumbObject(object):
    foo = CallableDefault()
obj = DumbObject()
so, ideally, evaluating obj.foo alone would return "zzz", but one could also call obj.foo(7, 'a') and get 'aaaaaaa'.
I'm thinking decorators might be the way to do this, but I'm not great with decorators. One could override the getattr() call on the containing class, but that would mean that it has to be in a containing class that supports this feature.
What you describe could work, but notice that now the value of the attribute is constrained to be an object of your CallableDefault class. This probably won't be very useful.
I strongly suggest that you don't try to do this. For one thing, you're spending a lot of time trying to trick Python into doing something it doesn't want to do. For another, the users of your API will be confused because it acts differently from all the other Python code they've ever seen.
Write a Python API that works naturally in Python.
What happens when you do either
obj.thing
or
obj.thing(2, 'a')
is Python goes looking for thing on obj; once it has thing it either returns it (first case above), or calls it with the parameters (second case) -- the critical point being that the call does not happen until after the attribute is retrieved -- and the containing class has no way of knowing if the thing it returns will be called or not.
You could add a __call__ method to every type you might use this way, but that way lies madness.
Update
Well, as long as you're comfortable with insanity, you could try something like this:
class CallableStr(str):
    def __call__(self, num, letter):
        return num*letter
class CallableInt(int):
    def __call__(self, num, pow):
        return num ** pow
class Tester(object):
    wierd = CallableStr('zzz')
    big = CallableInt(3)
t = Tester()
print repr(t.wierd)
print repr(t.wierd(7, 'a'))
print repr(t.big)
print repr(t.big(2, 16))
One nice thing about this magic object is that it becomes normal as soon as you use it in a calculation (or call):
print type(t.big), type(t.big + 3), t.big + 3
print type(t.big), type(t.big(2, 3) + 9), t.big(2, 3) + 9
which results in
<class '__main__.CallableInt'> <type 'int'> 6
<class '__main__.CallableInt'> <type 'int'> 17

How to check if an object is a generator object in python?

In python, how do I check if an object is a generator object?
Trying this -
>>> type(myobject, generator)
gives the error -
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'generator' is not defined
(I know I can check if the object has a next method for it to be a generator, but I want some way using which I can determine the type of any object, not just generators.)
You can use GeneratorType from types:
>>> import types
>>> types.GeneratorType
<class 'generator'>
>>> gen = (i for i in range(10))
>>> isinstance(gen, types.GeneratorType)
True
Do you mean generator functions? Use inspect.isgeneratorfunction.
EDIT:
If you want to check for a generator object, you can use inspect.isgenerator, as pointed out by JAB in his comment.
I think it is important to distinguish between generator functions and generators (the result of calling a generator function):
>>> def generator_function():
...     yield 1
...     yield 2
...
>>> import inspect
>>> inspect.isgeneratorfunction(generator_function)
True
Calling generator_function won't yield a normal result; it won't even execute any code in the function itself. The result is a special object called a generator:
>>> generator = generator_function()
>>> generator
<generator object generator_function at 0x10b3f2b90>
So it is not a generator function, but a generator:
>>> inspect.isgeneratorfunction(generator)
False
>>> import types
>>> isinstance(generator, types.GeneratorType)
True
And a generator function is not a generator:
>>> isinstance(generator_function, types.GeneratorType)
False
Just for reference, the function body is actually executed when the generator is consumed, e.g.:
>>> list(generator)
[1, 2]
See also In python is there a way to check if a function is a "generator function" before calling it?
The inspect.isgenerator function is fine if you want to check for pure generators (i.e. objects of class "generator"). However, it will return False if you check, for example, an izip iterable. An alternative way of checking for a generalised generator is to use this function:
def isgenerator(iterable):
    return hasattr(iterable, '__iter__') and not hasattr(iterable, '__len__')
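A sketch of how this heuristic behaves on a few objects in Python 3, next to an isinstance check against collections.abc.Iterator. The ABC comparison is an addition for contrast, not part of the answer above, and Countdown is a hypothetical class:

```python
from collections.abc import Iterator


def isgenerator(iterable):
    # Heuristic from the answer above, reproduced unchanged.
    return hasattr(iterable, '__iter__') and not hasattr(iterable, '__len__')


class Countdown:
    """Hypothetical iterable that is not itself an iterator."""

    def __iter__(self):
        return iter([3, 2, 1])


gen = (i for i in range(3))
zipped = zip("ab", "cd")        # Python 3 counterpart of itertools.izip
items = [1, 2, 3]

heuristic = [isgenerator(obj) for obj in (gen, zipped, items)]
abc_check = [isinstance(obj, Iterator) for obj in (gen, zipped, items)]
differs = (isgenerator(Countdown()), isinstance(Countdown(), Iterator))

print(heuristic, abc_check, differs)
```

The two checks agree on generators, zip objects and lists, but the heuristic also accepts any iterable that merely lacks __len__, such as Countdown, which has no __next__ method.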
You could use Iterator or, more specifically, Generator from the typing module.
from typing import Generator, Iterator
g = (i for i in range(1_000_000))
print(type(g))
print(isinstance(g, Generator))
print(isinstance(g, Iterator))
result:
<class 'generator'>
True
True
(I know it's an old post.) There is no need to import a module, you can declare an object for comparison at the beginning of the program:
gentyp= type(1 for i in "")
...
type(myobject) == gentyp
>>> import inspect
>>>
>>> def foo():
...     yield 'foo'
...
>>> print inspect.isgeneratorfunction(foo)
True
I know I can check if the object has a next method for it to be a generator, but I want some way using which I can determine the type of any object, not just generators.
Don't do this. It's simply a very, very bad idea.
Instead, do this:
try:
    # Attempt to see if you have an iterable object.
    for i in some_thing_which_may_be_a_generator:
        # The real work on `i`
        pass
except TypeError:
    # some_thing_which_may_be_a_generator isn't actually a generator
    # do something else
    pass
In the unlikely event that the body of the for loop also has TypeErrors, there are several choices: (1) define a function to limit the scope of the errors, or (2) use a nested try block.
Or (3) something like this to distinguish all of these TypeErrors which are floating around.
try:
    # Attempt to see if you have an iterable object.
    # In the case of a generator or iterator iter simply
    # returns the value it was passed.
    iterator = iter(some_thing_which_may_be_a_generator)
except TypeError:
    # some_thing_which_may_be_a_generator isn't actually a generator
    # do something else
    pass
else:
    for i in iterator:
        # the real work on `i`
        pass
Or (4) fix the other parts of your application to provide generators appropriately. That's often simpler than all of this.
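As a concrete instance of option (3), here is a small runnable sketch. The total_or_value helper and its "sum or return as-is" behaviour are made up for illustration:

```python
def total_or_value(maybe_iterable):
    """Sum the items if iterable, else return the object itself (hypothetical helper)."""
    try:
        iterator = iter(maybe_iterable)
    except TypeError:
        # Not iterable at all: take the fallback path.
        return maybe_iterable
    else:
        # TypeErrors raised while doing the real work are not swallowed here.
        return sum(iterator)


print(total_or_value(x * x for x in range(4)))   # generator
print(total_or_value([1, 2, 3]))                 # list
print(total_or_value(42))                        # not iterable
```

Only the iter() call sits inside the try block, so a TypeError from the real work in the else branch propagates normally instead of being mistaken for "not iterable".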
If you are using the Tornado web server or something similar, you might have found that server methods are actually generators and not plain methods. This makes it difficult to call other methods, because yield does not work inside the called method, and therefore you need to start managing pools of chained generator objects. A simple way to manage pools of chained generators is to create a helper function such as
def chainPool(*arg):
    for f in arg:
        if hasattr(f, "__iter__"):
            for e in f:
                yield e
        else:
            yield f
Now writing chained generators such as
[x for x in chainPool(chainPool(1,2),3,4,chainPool(5,chainPool(6)))]
Produces output
[1, 2, 3, 4, 5, 6]
Which is probably what you want if you're looking to use generators as a thread alternative or similar.
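One caveat worth sketching: in Python 3, strings also have __iter__, so a check based only on hasattr(f, "__iter__") would split them into characters. The variant below, chain_pool, is a hypothetical adaptation that keeps str and bytes whole; that exclusion is an assumption about the desired behaviour, not part of the answer above:

```python
def chain_pool(*args):
    """Variant of chainPool above; the str/bytes exclusion is an added assumption."""
    for item in args:
        if hasattr(item, "__iter__") and not isinstance(item, (str, bytes)):
            for element in item:
                yield element
        else:
            yield item


flat = list(chain_pool(chain_pool(1, 2), "ab", [3, 4], chain_pool(5, chain_pool(6))))
print(flat)
```

Nested calls still flatten because each inner chain_pool(...) is itself a generator, which the outer call iterates over.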
It's a rather old question, but I was looking for a similar solution myself for async generators, so you may find this helpful.
Based on utdemir's reply:
import types
isinstance(async_generator(), types.AsyncGeneratorType)
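A self-contained version of that check, as a sketch for Python 3.6 or later. The ticker function is a hypothetical async generator function, and the inspect calls are shown as an alternative to the types-based check:

```python
import inspect
import types


async def ticker():
    """Hypothetical async generator function used only for the check."""
    yield 1


agen = ticker()   # creating the object does not need a running event loop

is_async_gen = isinstance(agen, types.AsyncGeneratorType)
also_via_inspect = inspect.isasyncgen(agen)
function_check = inspect.isasyncgenfunction(ticker)
is_plain_gen = isinstance(agen, types.GeneratorType)

print(is_async_gen, also_via_inspect, function_check, is_plain_gen)
```

As with ordinary generators, the function and the object it returns are checked separately, and an async generator is not an instance of types.GeneratorType.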
