Here's what I mean:
>>> class Foo:
...     pass
...
>>> foo = Foo()
>>> setattr(foo, "#%#$%", 10)
>>> foo.#%#$%
SyntaxError: invalid syntax
>>> getattr(foo, "#%#$%")
10
>>> foo.__dict__
{'#%#$%': 10}
I looked it up and it has been brought up twice on the issue tracker for Python 2:
https://bugs.python.org/issue14029
https://bugs.python.org/issue25205
And once for Python 3:
https://bugs.python.org/issue35105
They insist it isn't a bug. Yet this behavior is quite obviously not intended; it's not documented in any version. What is the explanation for this? It seems like something that can be ignored easily, but that feels like sweeping it under the rug. So, is there any reason behind setattr's behavior, or is it just a benign idiosyncrasy of Python?
A bug is something that happens when it's not supposed to happen, i.e., when there's some method of communication forbidding it. If there's no documentation stating this shouldn't happen then (at worst) it's an idiosyncrasy, not a bug.
There appears to be nothing in the Python documentation forbidding attribute names that are not usable with the dot notation (which is, after all, just syntactic sugar), like foo.#%#$%. The only mention is an example of where they are equivalent, specifically:
For example, setattr(x, 'foobar', 123) is equivalent to x.foobar = 123.
The only restriction appears to be whether the class itself allows it:
The function assigns the value to the attribute, provided the object allows it.
In a more formal sense, the dot notation is specified here:
6.3.1. Attribute references
An attribute reference is a primary followed by a period and a name: attributeref ::= primary "." identifier.
The primary must evaluate to an object of a type that supports attribute references, which most objects do. This object is then asked to produce the attribute whose name is the identifier. This production can be customized by overriding the __getattr__() method.
Note the identifier in that syntax; it has limits above and beyond those of actual attribute names, as per here, and PEP 3131 is a more detailed look at what is allowed (it was the PEP that moved identifiers into the non-ASCII world).
Since the limits of identifiers are more restrictive than what is allowed in strings, it makes sense that the getattr/setattr attribute names could be a superset of the ones allowed in dot notation.
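As a quick illustration (a small sketch using str.isidentifier(), available in Python 3), you can check whether a given string would be legal after the dot:

print("foobar".isidentifier())   # True  -> usable as x.foobar
print("#%#$%".isidentifier())    # False -> only reachable via getattr/setattr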
Related
I am trying to learn Python and am a bit confused about a script I am playing with. I am using Python to launch scapy. There are some conditional statements that test for certain values. My confusion is centered around how the values are checked. I hope I am using the terms attributes and methods appropriately. I am still trying to figure out the built-in features vs. what is included with scapy. I've been using PowerShell mainly for the last few years, so it's hard to switch gears :)
tcp_connect_scan_resp = sr1(IP(dst=dst_ip)/TCP(sport=src_port,dport=dst_port,flags="S"),timeout=10)
if(str(type(tcp_connect_scan_resp))=="<type 'NoneType'>"):
    print "Closed"
elif(tcp_connect_scan_resp.haslayer(TCP)):
    if(tcp_connect_scan_resp.getlayer(TCP).flags == 0x12):
The first conditional statement appears to check for the attribute 'type'. Why would they use the Python built-in str() and type() functions in this case? If I just use type(), it pulls the same value.
The second and third conditional statements appear to be using methods built into scapy. What is the logic for including the brackets () on the outside of the statements? Again, if I run them manually, I get the proper value.
As for the second question: the parentheses around the expression of an if statement are simply unnecessary and bad style.
The first statement warrants a more detailed explanation:
if(str(type(tcp_connect_scan_resp))=="<type 'NoneType'>"):
This checks whether the string representation of the type of tcp_connect_scan_resp is equal to the string "<type 'NoneType'>". This is a bad form of type checking, used in a bad way. There are situations where type checking may be necessary, but generally you should try to avoid it in Python (see duck typing). If you must, use isinstance().
In the case of the Python builtin type None, the idiomatic way is to just write
if foo is None
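Applied to the code from the question, a rough sketch could look like this (it assumes the scapy calls behave as shown there; the placeholder values and the "Open" label for a SYN/ACK response are just for illustration):

from scapy.all import sr1, IP, TCP

dst_ip, src_port, dst_port = "192.0.2.1", 12345, 80   # placeholder values

# sr1() returns None when no answer arrived before the timeout
resp = sr1(IP(dst=dst_ip)/TCP(sport=src_port, dport=dst_port, flags="S"), timeout=10)

if resp is None:
    print("Closed")
elif resp.haslayer(TCP):
    if resp.getlayer(TCP).flags == 0x12:   # 0x12 = SYN/ACK
        print("Open")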
Now, the reason you got the "same result" by using type() yourself is that if you enter something in an interactive Python shell, the interpreter represents the value for you (by calling __repr__()). Except for basic types that have literal notations, like integers, strings, or sequences, the representation of an object isn't necessarily the same as its value (or what you would type in to recreate that same object).
So, when you do
>>> foo = type(42)
>>> foo
<type 'int'>
the interpreter prints '<type 'int'>', but the result of the call is actually int, the built-in type for integers:
>>> type(42) == int
True
>>> type(42) == "<type 'int'>"
False
Also, consider this:
Libraries or tools written to help with a specific field of expertise are often written by experts in those fields - not necessarily experts in Python. In my opinion, you often see this in scientific libraries (matplotlib and numpy for example). This doesn't mean they're bad libraries, but they often aren't a good inspiration for Pythonic style.
Never check a type by comparing str(type(obj)) == 'ClassName'.
You should use isinstance(obj, Class), or for None you just write if obj is None.
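A small sketch (the class names are made up) of why isinstance() and is are preferred:

class Animal:
    pass

class Dog(Animal):
    pass

d = Dog()
print(isinstance(d, Animal))     # True: isinstance() respects inheritance
print(type(d) == Animal)         # False: exact type comparison ignores subclasses
print(str(type(d)) == "Dog")     # False and fragile: never compare type strings

x = None
print(x is None)                 # True: idiomatic None check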
I have a framework with some C-like language. Now I'm re-writing that framework and the language is being replaced with Python.
I need to find appropriate Python replacement for the following code construction:
SomeFunction(&arg1)
What this does is a C-style pass-by-reference so the variable can be changed inside the function call.
My ideas:
just return the value like v = SomeFunction(arg1)
is not so good, because my generic function can have a lot of arguments, like SomeFunction(1,2,'qqq','vvv',.... and many more),
and I want to give the user the ability to get the value she wants.
Return the collection of all the arguments, no matter whether they have changed or not, like: resulting_list = SomeFunction(1,2,'qqq','vvv',.... and many more); interesting_value = resulting_list[3]
This can be improved by giving names to the values and returning a dictionary: interesting_value = resulting_list['magic_value1']
It's not good because we have constructions like
DoALotOfStaff( [SomeFunction1(1,2,3,&arg1,'qq',val2),
                SomeFunction2(1,&arg2,v1),
                AnotherFunction(),
                ...
               ], flags1, my_var,... )
And I wouldn't like to load the user with a list of lists of variables, with names or indexes she (the user) should know. The kind-of-references would be very useful here ...
Final Response
I compiled all the answers with my own ideas and was able to produce the solution. It works.
Usage
SomeFunction(1,12, get.interesting_value)
AnotherFunction(1, get.the_val, 'qq')
Explanation
Anything prefixed with get. is a kind-of reference, and its value will be filled in by the function. There is no need to define the value beforehand.
Limitation - currently I support only numbers and strings, but these are sufficient for my use-case.
Implementation
wrote a Getter class which overrides __getattribute__ and produces any variable on demand
all newly created variables have a pointer to their container Getter and support a method set(self, value)
when set() is called, it checks whether the value is an int or a string and creates an object inheriting from int or str accordingly, but with the addition of the same set() method. With this new object we replace our instance in the Getter container
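For reference, a rough sketch of the approach described above. The Getter name, the get. usage, and set() come from the description; everything else (using __getattr__ instead of __getattribute__, the zero placeholder, the helper names _make_ref and _set) is an assumption filled in for illustration:

class Getter:
    # Container that produces placeholder "references" on demand: get.some_name
    def __getattr__(self, name):
        # Called only when the name is not yet in the instance dict:
        # create an int-based placeholder with value 0 and remember it.
        ref = _make_ref(int, 0, self, name)
        object.__setattr__(self, name, ref)
        return ref

def _set(self, value):
    # Replace this placeholder in its container with a new int- or str-based object.
    base = int if isinstance(value, int) else str
    new = _make_ref(base, value, self._container, self._name)
    object.__setattr__(self._container, self._name, new)

def _make_ref(base, value, container, name):
    # Build an int/str subclass instance that knows how to replace itself.
    cls = type("Ref", (base,), {"set": _set})
    obj = cls(value)
    obj._container = container
    obj._name = name
    return obj

get = Getter()

def SomeFunction(a, b, out):
    # The function fills in the reference-like argument via its set() method.
    out.set(a + b)

SomeFunction(1, 12, get.interesting_value)
print(get.interesting_value)   # 13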
Thank you everybody. I will mark as "answer" the response which led me on my way, but all of you helped me somehow.
I would say that your best, cleanest, bet would be to construct an object containing the values to be passed and/or modified - this single object can be passed, (and will automatically be passed by reference), in as a single parameter and the members can be modified to return the new values.
This will simplify the code enormously and you can cope with optional parameters, defaults, etc., cleanly.
>>> class C:
...     def __init__(self):
...         self.a = 1
...         self.b = 2
...
>>> c = C()
>>> def f(o):
...     o.a = 23
...
>>> f(c)
>>> c
<__main__.C instance at 0x7f6952c013f8>
>>> c.a
23
>>>
Note
I am sure that you could extend this idea to have a class of parameter that carries immutable and mutable data into your function with fixed member names, plus storing the names of the parameters actually passed, and then, on return, map the mutable values back onto the caller's parameter names. This technique could then be wrapped into a decorator.
I have to say that it sounds like a lot of work compared to re-factoring your existing code to a more object oriented design.
This is how Python works already:
def func(arg):
    arg += ['bar']

arg = ['foo']
func(arg)
print arg
Here, the change to arg automatically propagates back to the caller.
For this to work, you have to be careful to modify the arguments in place instead of re-binding them to new objects. Consider the following:
def func(arg):
    arg = arg + ['bar']

arg = ['foo']
func(arg)
print arg
Here, func rebinds arg to refer to a brand new list and the caller's arg remains unchanged.
Python doesn't come with this sort of thing built in. You could make your own class which provides this behavior, but it will only support a slightly more awkward syntax where the caller would construct an instance of that class (equivalent to a pointer in C) before calling your functions. It's probably not worth it. I'd return a "named tuple" (look it up) instead--I'm not sure any of the other ways are really better, and some of them are more complex.
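For instance, a minimal sketch of the named tuple idea (the function and field names are made up):

from collections import namedtuple

# One named bundle of results instead of mutating the caller's variables.
Result = namedtuple("Result", ["status", "interesting_value", "message"])

def some_function(a, b, text):
    return Result(status=True, interesting_value=a + b, message=text.upper())

res = some_function(1, 12, "qq")
print(res.interesting_value)   # 13
print(res.status)              # True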
There is a major inconsistency here. The drawbacks you're describing against the proposed solutions are related to such subtle rules of good design that your question becomes invalid. The whole problem lies in the fact that your function violates the Single Responsibility Principle and other guidelines related to it (a function shouldn't have more than 2-3 arguments, etc.). There is really no smart compromise here:
either you accept one of the proposed solutions (i.e. Steve Barnes's answer concerning your own wrappers or John Zwinck's answer concerning usage of named tuples) and refrain from focusing on good design subtleties (as your whole design is bad anyway at the moment)
or you fix the design. Then your current problem will disappear as you won't have the God Objects/Functions (the name of the function in your example - DoALotOfStuff really speaks for itself) to deal with anymore.
I'm trying to figure out the proper name for these properties which are written using underscores, so that I can read about them and understand them more. They seem to generally be lower level things, more advanced stuff for really explicit behavior.
What terminology is used for these underscore properties/methods?
"Magic Methods". You can learn more about them here: http://docs.python.org/2/reference/datamodel.html#basic-customization
Important ones are:
__init__(): Constructor for a class
__str__() (or __unicode__()): verbose name of the object, used whenever string conversion is needed (e.g. when calling print my_object)
I'd say those are the ones you'll need in the beginning.
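For example, a minimal sketch of those two (the class is made up):

class Point:
    def __init__(self, x, y):          # runs when Point(1, 2) is constructed
        self.x = x
        self.y = y

    def __str__(self):                 # used whenever string conversion is needed
        return "Point(%d, %d)" % (self.x, self.y)

p = Point(1, 2)
print(p)       # Point(1, 2)
print(str(p))  # Point(1, 2)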
"Magic methods" is a term often used for those that are methods. "Double-underscore" is also sometimes used.
PEP 8 describes them as "magic".
Dunder. e.g. __init__ can be referred to as "dunder init". See this alias.
(In regards to Python 3.2)
I'm trying to make a statement along the lines of:
In Python, an object is...
According to the doc (http://docs.python.org/py3k/reference/datamodel.html#objects-values-and-types):
Every object has an identity, a type and a value
But where do attributes fall into that? If I do something like a = 3; print(a.__class__) I get <class 'int'> I assume that is the type of the object a references, meaning that "type" is an "attribute" of an object. So in that sense we can say a sufficient set of "things" an object has would be its identity, value and attributes. However, looking through the attributes of a using dir(a), I do not see anything resembling identity (even though I know the id() function will tell me that information).
So my question is are any of the following minimal statements to sufficiently describe the notion of a Python object?
In Python an object has attributes, of which always include an identity, type and value.
In Python an object has an identity and attributes, of which always include its type and value.
In Python an object has an identity, value and attributes, of which always include its type, among other things.
If not, could someone give me a definition that conveys the relationships attributes, identity, type and value for an object?
(I would prefer number 1 to be true. :P)
While you can access the type of an object through an attribute, its type isn't just an attribute -- the type defines how the object was created before it had any attributes at all. By that fact alone none of those statements is sufficient to describe a Python object.
I'd say it this way:
In Python, everything is an object.
An object is a block of information. It has a type, which defines its creation and how it interacts with other objects; an identity, which differentiates it from all other objects; and a value, which is the information in the block. Attributes are other objects associated with a given object, including the object that is its type.
You should then give some examples of things people might not expect to be objects, like functions.
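For instance, a small sketch showing a function behaving like any other object:

def greet(name):
    """Say hello."""
    return "Hello, " + name

# A function is an object too: it has a type, an identity, and attributes.
print(type(greet))       # <class 'function'>
print(id(greet))         # some integer identifying this particular object
print(greet.__doc__)     # Say hello.

alias = greet            # it can be bound to a name like any other value
print(alias("world"))    # Hello, world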
A paragraph on "What is an object" can be found in Dive Into Python:
Everything in Python is an object, and almost everything has attributes and methods. All functions have a built-in attribute __doc__, which returns the doc string defined in the function's source code. The sys module is an object which has (among other things) an attribute called path. And so forth.
Still, this begs the question. What is an object? Different programming languages define “object” in different ways. In some, it means that all objects must have attributes and methods; in others, it means that all objects are subclassable. In Python, the definition is looser; some objects have neither attributes nor methods (more on this in Chapter 3), and not all objects are subclassable (more on this in Chapter 5). But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function (more on this in Chapter 4).
I think you are getting your terminology mixed up.
identity
The identity of an object is just a value that uniquely identifies that object during its lifetime. So, id(1) is guaranteed to be different from id(2).
type
The type of an object tells you something more; it tells you what operations you can use with that object, and what possible values can be stored in that object. There are two kinds of types: built-in and user-defined. Built-in types are int, string, etc. User-defined are the classes you define yourself.
value
This is what is stored inside the object.
attributes
These are additional variables that you can get to from your existing object.
Examples:
>>> s = "foo" # s is a new variable
>>> id(s) # it has a unique identifier
4299579536
>>> type(s) # it has a type ("str")
<class 'str'>
>>> s # it has a value ("foo")
'foo'
>>> s.__class__ # it has an attribute called __class__ that has the following value:
<class 'str'>
So all three of your statements could be correct, but #3 sounds "most correct" to me. I have also heard
In Python, an object is a dict.
Which makes sense to me, but might not make sense to everyone.
But where do attributes fall into that?
(Some combination of) a particular object's attributes record and determine its value.
All three of value, type and identity are abstract concepts that can only really be understood by example. Consider:
>>> a = (1, 2)
>>> b = (1, 2)
>>> c = a
>>> d = (3, 4)
>>> e = 2
You no doubt understand that a, b, and c are all equal, but only a and c are the same object. That is, all three have the same value, and a and c have the same identity. d is the same type of object as a (a tuple), but has a different value; e is a completely different type (an int).
The value of e is trivial to understand for a human (since int literals are spelled with well known symbols), but Python keeps track of several attributes for it:
>>> e.numerator
2
>>> e.denominator
1
>>> e.real
2
>>> e.imag
0
These together determine the value - although being an int, it is guaranteed that the denominator is 1 and the imaginary part is 0 (and, indeed, all built-in Rationals have a zero imaginary part). So, the value of an int is its numerator.
But "value" is something abstract - it makes sense for humans, the computer has to distil it down a bit more. Similarly with type - "what type of thing is this?" is something humans understand directly; computers don't, but Python uses classes to implement it - hence,
>>> type('')
<class 'str'>
>>> type(2)
<class 'int'>
You ask "what type of thing is this?", and it gives you back its class. The type is stored as an attribute, but only because Python considers an object's type to be (a potential) part of its value. Identity isn't part of the value (although in some cases, they can be equivalent), and so it isn't recorded in an attribute - its stored in the internal machinery that lets Python find the object you want when you say a.
It seems a bit weird that Python requires you to explicitly pass self as the first argument to all class functions. Are there other languages that require something similar?
By explicit, do you mean "explicitly passed as an argument to each class function"?
If so, then Python is the only one I know off-hand.
Most OO languages support this or self in some form, but most of them let you define class functions without always defining self as the first argument.
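For reference, a minimal Python sketch of what "explicitly passed" means here (the class is made up):

class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):       # self must be declared explicitly in the signature...
        self.count += 1

c = Counter()
c.increment()                  # ...but is supplied implicitly at the call site
Counter.increment(c)           # the equivalent fully explicit call
print(c.count)                 # 2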
Depending on your point of view, Lua. To quote the reference: "A call v:name(args) is syntactic sugar for v.name(v,args), except that v is evaluated only once." You can also define methods using either notation. So you could say that Lua has an optional explicit self.
The programming language Oberon-2 has an explicitly named but not explicitly passed 'this' or 'self' argument for member functions of classes (known as type-bound procedures in Oberon terminology).
The following example is an Insert method on a type Text, where the identifier 't' is specified to bind to the explicit 'this' or 'self' reference.
PROCEDURE (t: Text) Insert (string: ARRAY OF CHAR; pos: LONGINT);
BEGIN ...
END Insert;
More details on Object Orientation in Oberon are here.
F# (presumably from its OCaml heritage) requires an explicit name for all self-references, though the name can be any arbitrary identifier, e.g.
override x.BeforeAnalysis() =
    base.BeforeAnalysis()
    DoWithLock x.AddReference
Here we're defining an overriding member function BeforeAnalysis which calls another member function AddReference. The identifier x here is arbitrary, but is required in both the declaration and any reference to members of the "this"/"self" instance.
Modula-3 does. Which is not too surprising since Python's class mechanism is a mixture of the ones found in Modula-3 and C++.
Any object-oriented language has a notion of this or self within member functions.
Clojure isn't an OOP language but does use explicit self parameters in some circumstances: most notably when you implement a protocol, and the "self" argument (you can name it anything you like) is the first parameter to a protocol method. This argument is then used for polymorphic dispatch to determine the right function implementation, e.g.:
(defprotocol MyProtocol
  (foo [this that]))

(extend-protocol MyProtocol String
  (foo [this that]
    (str this " and " that)))

(extend-protocol MyProtocol Long
  (foo [this that]
    (* this that)))

(foo "Cat" "Dog")
=> "Cat and Dog"

(foo 10 20)
=> 200
Also, the first parameter to a function is often used by convention to mean the object that is being acted upon, e.g. the following code to append to a vector:
(conj [1 2 3] 4)
=> [1 2 3 4]
Many object-oriented languages, if not all of them, have this concept. For example, C++ supports this instead of self, but you don't have to pass it; it is passed implicitly. Hope that helps ;)