I was reading the documentation for attrs. It says:
Please note that true immutability is impossible in Python
I am wondering what the reason for that is. Why can't someone have an immutable list in Python when it is possible in C++? What is the main difference here?
TLDR; "True Immutability" is only possible on an impervious stone tablet, but it's counter-productive to the discussion of mutability, and why it is used / is important. It's not worth being technically correct at the expense of being practically wrong.
This is a bad argument of semantics. Python allows re-defining variable names with different types in an otherwise strongly typed language, which is where some of the confusion comes from, but to be clear the object a variable name refers to can very much be properly immutable.
Take for instance a tuple with a few numbers in it:
>>> tup_A = (1,2,3)
It is not possible to change the values of any of the objects in the tuple:
>>> tup_A[0] = 10
TypeError: 'tuple' object does not support item assignment
It is possible to overwrite the variable name tup_A with some other value, but then it will be a different object entirely even if it was related to the original. For example a slice of a tuple creates an entirely new object rather than a view of the original:
>>> id(tup_A)
2887473131072
>>> tup_A = tup_A[:1]
>>> id(tup_A)
2887473037616
I believe the article mentioned may also be referring to the possibility of creating custom immutable types (classes). This is also a bad argument, because there are plenty of mechanisms to enforce immutability. In particular, the tools for customizing attribute access and the property function can be used to great effect here. Once these are used to implement immutability, one would have to intentionally break the class to mutate data that was not meant to be mutated. This is of course possible because Python is primarily distributed as source code, but the same could theoretically be said of the Python C API. Tuples don't have to be immutable if you rewrite Python, but that is so far beyond the point that it's fair to say it's just wrong.
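As a minimal sketch of the mechanisms mentioned above (customized attribute access plus property), here is a hypothetical Point class; the class name and its fields are illustrative only and do not come from attrs or any other library. It also shows what "intentionally breaking the class" looks like.

```python
class Point:
    """A small value object whose attributes cannot be rebound after creation."""

    __slots__ = ("_x", "_y")

    def __init__(self, x, y):
        # Bypass our own __setattr__ guard during construction only.
        object.__setattr__(self, "_x", x)
        object.__setattr__(self, "_y", y)

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__} is immutable")

    def __delattr__(self, name):
        raise AttributeError(f"{type(self).__name__} is immutable")


p = Point(1, 2)

# An accidental assignment produces an error instead of a silent bug.
try:
    p.x = 10
except AttributeError as exc:
    error = str(exc)

# Going around the guard on purpose still works: this is the
# "intentionally break the class" case, not an accident.
object.__setattr__(p, "_x", 10)
```

The guard catches slip-ups; it is not meant to stop someone who deliberately reaches for object.__setattr__.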
Immutability is a tool with a specific purpose. It is a good idea to use it whenever possible so an accidental slip-up will produce an error message rather than a silent bug. If you encounter errors like this, you should never ask "how can I mutate this value which was intended to be immutable?", but rather ask "why is this value not meant to be mutated, and how am I intended to utilize it?"
P.S. You could probably even mutate a tuple without editing cpython using the ctypes library by getting the actual memory locations of the objects contained within it, and overwriting the pointers, but this would break lots of things (like garbage collection ref counting). Don't do this. It's another one of those "so far beyond the point" things.
Other languages like C++ treat variables as storage containers, whereas Python treats them as references to objects in memory. Lists can be modified in place, i.e. at the same memory location, but we also have tuples, whose values can't be updated in place.
I don't know exactly why Python treats variables this way, but I think it is necessary for dynamic typing. The statement "true immutability isn't possible" may refer to this dynamic typing feature. You can search online to learn more.
Please let me know, if this is what you wanted.
True immutability is impossible if your memory is mutable.
Think of immutability as checks, but not a hard guarantee that everything will stay the same.
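To illustrate "checks, but not a hard guarantee", here is a small sketch using a frozen dataclass from the standard library as a stand-in (an assumption on my part: the question was about attrs, whose frozen classes behave similarly in spirit). Config and its fields are made-up names.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Config:
    name: str
    tags: list


cfg = Config("demo", ["a"])

# The check: ordinary attribute assignment is rejected.
try:
    cfg.name = "other"
    blocked = False
except FrozenInstanceError:
    blocked = True

# Not a guarantee: the check does not reach inside contained objects...
cfg.tags.append("b")

# ...and it can be bypassed deliberately.
object.__setattr__(cfg, "name", "other")
```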
Question
Suppose that I have implemented two Python types using the C extension API and that the types are identical (same data layouts/C struct) with the exception of their names and a few methods. Assuming that all methods respect the data layout, can you safely change the type of an object from one of these types into the other in a C function?
Notably, as of Python 3.9, there appears to be a function Py_SET_TYPE, but the documentation is not clear as to whether/when this is safe to do. I'm interested in knowing both how to use this function safely and whether types can be safely changed prior to version 3.9.
Motivation
I'm writing a Python C extension to implement a Persistent Hash Array Mapped Trie (PHAMT); in case it's useful, the source code is here (as of writing, it is at this commit). A feature I would like to add is the ability to create a Transient Hash Array Mapped Trie (THAMT) from a PHAMT. THAMTs can be created from PHAMTs in O(1) time and can be mutated in-place efficiently. Critically, THAMTs have the exact same underlying C data-structure as PHAMTs—the only real difference between a PHAMT and a THAMT is a few methods encapsulated by their Python types. This common structure allows one to very efficiently turn a THAMT back into a PHAMT once one has finished performing a set of edits. (This pattern typically reduces the number of memory allocations when performing a large number of updates to a PHAMT).
A very convenient way to implement the conversion from THAMT to PHAMT would be to simply change the type pointers of the THAMT objects from the THAMT type to the PHAMT type. I am confident that I can write code that safely navigates this change, but I can imagine that doing so might, for example, break the Python garbage collector.
(To be clear: the motivation is just context as to how the question arose. I'm not looking for help implementing the structures described in the Motivation, I'm looking for an answer to the Question, above.)
The supported way
It is officially possible to change an object's type in Python, as long as the memory layouts are compatible... but this is mostly limited to types not implemented in C. With some restrictions, it is possible to do
# Python attribute assignment, not C struct member assignment
obj.__class__ = some_new_class
to change an object's class, with one of the restrictions being that both the old and new classes must be "heap types", which all classes implemented in Python are and most classes implemented in C are not. (types.ModuleType and subclasses of that type are also specifically permitted, despite types.ModuleType not being a heap type. See the source for exact restrictions.)
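A rough pure-Python sketch of the supported route follows. Transient and Persistent are hypothetical stand-ins with the same instance layout, loosely mirroring the THAMT/PHAMT idea from the question; they are not the extension's real types.

```python
class Transient:
    """Mutable flavour: add() edits in place."""

    def __init__(self, items):
        self.items = list(items)

    def add(self, item):
        self.items.append(item)
        return self


class Persistent:
    """Immutable-style flavour: add() returns a new object."""

    def __init__(self, items):
        self.items = list(items)

    def add(self, item):
        return Persistent(self.items + [item])


t = Transient([1, 2])
t.add(3)                  # in-place edit

# Both classes are heap types with compatible layouts, so this is allowed.
t.__class__ = Persistent
u = t.add(4)              # now dispatches to Persistent.add

# A non-heap built-in type is rejected by the __class__ setter.
try:
    (1).__class__ = Persistent
    int_error = None
except TypeError as exc:
    int_error = exc
```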
If you want to create a heap type from C, you can, but the interface is pretty different from the normal way of defining Python types from C. Plus, for __class__ assignment to work, you have to not set the Py_TPFLAGS_IMMUTABLETYPE flag, and that means that people will be able to monkey-patch your classes in ways you might not like (or maybe you see that as an upside).
If you want to go that route, I suggest looking at the CPython 3.10 _functools module source code for an example. (They set the Py_TPFLAGS_IMMUTABLETYPE flag, which you'll have to make sure not to do.)
The unsupported way
There was an attempt at one point to allow __class__ assignment for non-heap types, as long as the memory layouts worked. It got abandoned because it caused problems with some built-in immutable types, where the interpreter likes to reuse instances. For example, allowing (1).__class__ = SomethingElse would have caused a lot of problems. You can read more in the big comment in the source code for the __class__ setter. (The comment is slightly out of date, particularly regarding the Py_TPFLAGS_IMMUTABLETYPE flag, which was added after the comment was written.)
As far as I know, this was the only problem, and I don't think any more problems have been added since then. The interpreter isn't going to aggressively reuse instances of your classes, so as long as you're not doing anything like that, and the memory layouts are compatible, I think changing the type of your objects should work for now, even for non-heap-types. However, it is not officially supported, so even if I'm right about this working for now, there's no guarantee it'll keep working.
Py_SET_TYPE only sets an object's type pointer. It doesn't do any refcount fixing that might be needed. It's a very low-level operation. If neither the old class nor the new class are heap types, no extra refcount fixing is needed, but if the old class is a heap type, you will have to decref the old class, and if the new class is a heap type, you will have to incref the new class.
If you need to decref the old class, make sure to do it after changing the object's class and possibly incref'ing the new class.
According to the language reference, chapter 3 "Data model" (see here):
An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself). Like its identity, an object’s type is also unchangeable.[1]
which, to my mind states that the type must never change, and changing it would be illegal as it would break the language specification. The footnote however states that
[1] It is possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly.
I don't know of any method to change the type of an object from within python itself, so the "possible" may indeed refer to the CPython function.
As far as I can see a PyObject is defined internally as a
struct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;
};
So the reference counting should still work. On the other hand you will segfault the interpreter if you set the type to something that is not a PyTypeObject, or if the pointer is free()d, so the usual caveats.
Apart from that I agree that the specification is a little ambiguous, but the question of "legality" may not have a good answer. The long and short of it seems to me to be "do not change types unless you know what you are doing, and if you are not hacking on CPython itself you do not know what you are doing".
Edit: The Py_SET_TYPE function was added in Python 3.9 based on this commit. Apparently, people used to just set the type using
Py_TYPE(obj) = typeobj;
So the inclusion (without being formally announced, as far as I can see) is more akin to adding a convenience function.
Pandas defines a FrozenList object, described in its documentation as a
[c]ontainer that doesn't allow setting item but because it's technically non-hashable, will be used for lookups, appropriately, etc.
I don't understand what benefit there is in having an immutable-but-non-hashable container. Could someone give me an example where the hashability of a standard Python tuple would cause problems?
The answer to this similar question does not give any reason for why one would want an immutable-but-non-hashable container. In particular, it does not explain why a Python tuple's hashability makes it unsuitable for whatever task the FrozenLists are being used for. Therefore,
THIS QUESTION IS NOT A DUPLICATE.
This class is indeed a bit confusing, because the discussion on that question indicates that it is mutable via pandas internals, but I don't see how that ability is actually used. Being unhashable has no specific benefit. The benefit is that the object is mutable (although, again, as that question states, it is mutable only via the C internals, not via Python code). If an object is mutable, it cannot be hashable. (Or rather, it cannot have its hash value depend on its mutable state; for an object like a list/tuple that has no real state except its contents, there's no sensible way to make it hashable if it's mutable.)
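A short sketch of why a hash that depends on mutable contents is a problem. HashableList is a hypothetical class written only for this demonstration; it is not how pandas implements FrozenList.

```python
class HashableList(list):
    """Hypothetical: a list whose hash depends on its current contents."""

    def __hash__(self):
        return hash(tuple(self))


key = HashableList([1, 2])
table = {key: "value"}
found_before = HashableList([1, 2]) in table

# Mutating the key after insertion changes its hash and its equality.
key.append(3)

# The entry is still stored, but neither the old nor the new contents
# can be used to find it through an equal key.
stale_lookup = HashableList([1, 2]) in table
fresh_lookup = HashableList([1, 2, 3]) in table
entries = len(table)
```

This is the reason built-in lists refuse to be hashed at all, while tuples, which cannot change, are hashable.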
In addition, there is this comment in the source code:
# Sidenote: This has to be of type list, otherwise it messes up PyTables
# typechecks
As the other question also says, there are other reasons to use a custom class instead of tuple. For instance, FrozenList has a custom __repr__.
Basically, it does not appear that the class was written because someone thought "I need an immutable but nonhashable container". Rather, it appears it was written because someone thought "I need a class that is mutable, but only secretly, and I need to be able to give it custom methods, and I need it to be a subclass of list to avoid breaking this other library." Some of the comments on that other question/answer suggest that nonhashability per se may never have been a necessary criterion, and even if it was, it may no longer be necessary due to other changes in how pandas works. I didn't write the class, so I can't be sure, but it seems to me that the comment you quoted in your question is misleading about what the real impetus was for making such a class.
Without explicit (type) declarations I struggle to figure out how things work. Are there some good rules of thumb or tips that you have for reading Python code better? Thanks!
Despite the first impression this question gives, I think it is a really intelligent one, because it reveals that you are half-aware of something that should interest any Python developer, but that I find very neglected in general, and in explanations in particular, if not misunderstood.
I mean that, IMO, the foundation of Python is terrifically quaint and intelligent: it's the data model on which it has been conceived.
In Python's data model there are no variables in the sense of "chunks of memory whose contents can change", contrary to other languages: we don't manage this precise kind of variable in Python.
More precisely, everything is an object in Python, and every object is named and designated by an identifier, but neither the object nor the identifier is a 'variable' in that sense.
That doesn't mean that there are no little boxes (the so-called variables of other languages), temporarily hosting values that go in and out of them, in the depths of the implementation.
.
Say an object is designated by the identifier XYA2.
Personally I use this appearance of letters to designate any identifier. An identifier is nothing else than a word written in a code. It is what appears in a code.
Note that this appearance of letters is the one used by this stackoverflow.com site to represent a code sample inside text, by clicking on the button {}. That's easy to remember.
Now, the object whose name is XYA2 is a real thing, a concrete set of bits lying in the memory of the computer to represent the desired conceptual value that it stands for.
This set is defined in C language in which Python is implemented.
Personally, I bold the letters when I want to designate an object.
Then the object named XYA2 is, for me, referred to by XYA2
The identifier is XYA2
It is linked to an underlying and inaccessible pointer that points to the object.
This link is done by means of the symbol table. You will see very few references or allusions to symbol table in general, here on stackoverflow or elsewhere. However it's very important, I think.
The pointer linked to the identifier XYA2 points to the object XYA2
So, XYA2 is directly linked to the pointer and indirectly linked to the object.
Instead of saying "indirectly linked", we say "assigned". An object and its identifier are reciprocally assigned one to the other, but the medium of this link is the underlying pointer.
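A small sketch of this identifier / pointer / object picture, using only built-in lists. The names a and b are arbitrary.

```python
a = [1, 2, 3]
b = a                 # a second identifier linked to the same object
b.append(4)           # the object changes; both identifiers still reach it
same_object = a is b

# Rebinding: a new list object is built and only the identifier a is
# re-linked to it. The object that b designates is untouched.
a = a + [5]
still_same = a is b
```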
.
And now, something important.
Strictly speaking, a variable is a "chunk of memory whose content can change".
I personally make an effort never to use the word 'variable' in any sense other than this one.
The problem is that, because of the use of the word 'variable' in mathematics, this word is very often used indiscriminately, and thrown to the four winds, by many developers (not all), even when it isn't justified.
Thereby, it is commonly used by nearly everybody to designate the names, aka the identifiers, in code. But this practice is horribly confusing. It should be carefully avoided.
That said, an object in Python is not only an instance of some class, it is above all a concrete set of bits; a set which IS NOT, as far as I know, a variable in the sense of "chunk of memory whose content can change".
Hence my opinion that there aren't variables in Python, since the only entities we can access and manipulate are identifiers and objects.
However, the processes under the hood in an executed Python program use quantities of pointers that are, as far as I know, real variables in the strict sense of this word.
So, in a sense, it could be said that my affirmation "There are no variables in Python" is false.
It's a matter of point of view.
As a developer in Python, conceptually speaking, I don't manage variables. When I think about an algorithm, I don't think at the level of pointers, even if I know they exist and that it's very important to know they exist. Since I am working not at the level of variables but at the level of Python's data model, I don't see why I should accept that there are variables in a Python program. There are variables at the low level of the machine, and Python is a very high-level language.
.
Why did I write all this ?
1)
because the nature of Python's data model has many consequences that can't be understood if this data model isn't known. Among these consequences, some are interesting because they give incredible possibilities, others are traps (a well-known example: modifying a nested element in a shallow copy of a list also modifies it in the original list). That's why it's of the first importance to learn about this data model.
For that, I recommend that you read these parts of the documentation:
3.1 of objects-values-and-types
4.1 of naming-and-binding
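The copied-list trap mentioned in point 1 can be sketched as follows, using only the standard library. The variable names are arbitrary.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = list(original)          # new outer list, same inner objects
deep = copy.deepcopy(original)    # new outer list and new inner objects

# Modifying a nested element through the shallow copy...
shallow[0].append(99)

# ...is visible through the original, because both outer lists hold
# references to the very same inner list object.
shared_inner = shallow[0] is original[0]
```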
.
2)
To justify my answer to your perplexity: don't struggle about what happens under the hood:
there's a garbage collector, a reference counter, wagons of underlying dictionary-like entities, a thunderous ballet of values in the secret of the underlying pointers, many verifications made by the interpreter... When something doesn't fit well, a warning is given in the form of an exception message.
Python has all the machinery under control
The only concern you must have is to think about the algorithm you want to achieve, and for that, knowing the data model is essential.
Welcome to the Python universe
Warning
I don't consider myself a very skilled Python developer; I'm just an amateur who had a lot of problems before understanding some essential things about Python.
All of the above description is my personal view of Python's data model. If any point in it is incorrect, I will be happy to learn more about it, provided the teaching comes with a developed argument.
But I underline the fact that this vision of things allows me to understand and answer a lot of tough problems, and to achieve some of the tricky mechanisms that Python is capable of. So not everything in the above description can be false.
You should take a look at the PEP 8 documentation. This describes Python formatting and style.
Read up on duck typing. One of the purposes of duck typing is that you shouldn't be thinking too much about the type of something anyway. What really concerns you is that the variable can be used the way that you want it.
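A minimal duck-typing sketch: count_lines is a made-up helper that only cares that its argument can be iterated line by line, not what type it is.

```python
import io


def count_lines(source):
    """Count lines in anything that yields one line per iteration."""
    return sum(1 for _ in source)


# A list of strings and a file-like object are unrelated types,
# but both support the one operation the function needs.
from_list = count_lines(["a\n", "b\n"])
from_file_like = count_lines(io.StringIO("a\nb\nc\n"))
```

When reading such code, the useful question is "what operations does this function perform on the argument?" rather than "what is its declared type?".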
In Python, you don't need a type declaration because the name you assign is just a pointer to an object, and furthermore it can change at any time.
a = None
a = 1+5
a = my_function() # calls my function and assigns the return object to a
a = my_function # Assigns the function itself to a. You could actually pass it as a parameter
a = MyClass() # Creates a new instance (running the class's __init__()) and assigns that instance to a
a = MyClass # Assigns the class itself to a.
This is all valid Python. You could run this sequentially, although changing the type like this is frowned upon unless it's totally clear why.
If you know C++11, it is similar to the auto type.
The variable's type is decided on the basis of what is assigned to it.
Note: I'm not talking about preventing the rebinding of a variable. I'm talking about preventing the modification of the memory that the variable refers to, and of any memory that can be reached from there by following the nested containers.
I have a large data structure, and I want to expose it to other modules, on a read-only basis. The only way to do that in Python is to deep-copy the particular pieces I'd like to expose - prohibitively expensive in my case.
I am sure this is a very common problem, and it seems like a constant reference would be the perfect solution. But I must be missing something. Perhaps constant references are hard to implement in Python. Perhaps they don't quite do what I think they do.
Any insights would be appreciated.
While the answers are helpful, I haven't seen a single reason why const would be either hard to implement or unworkable in Python. I guess "un-Pythonic" would also count as a valid reason, but is it really? Python does do scrambling of private instance variables (starting with __) to avoid accidental bugs, and const doesn't seem to be that different in spirit.
EDIT: I just offered a very modest bounty. I am looking for a bit more detail about why Python ended up without const. I suspect the reason is that it's really hard to implement to work perfectly; I would like to understand why it's so hard.
It's the same as with private methods: as consenting adults authors of code should agree on an interface without need of force. Because really really enforcing the contract is hard, and doing it the half-assed way leads to hackish code in abundance.
Use get-only descriptors, and state clearly in your documentation that this data is meant to be read-only. After all, a determined coder could probably find a way to use your code in ways you didn't think of anyway.
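A minimal sketch of a get-only descriptor using the built-in property. Dataset and its rows attribute are hypothetical names chosen for the example.

```python
class Dataset:
    def __init__(self, rows):
        # Store an immutable snapshot so callers cannot edit it in place.
        self._rows = tuple(rows)

    @property
    def rows(self):
        """Read-only: a getter is defined, but no setter."""
        return self._rows


ds = Dataset([1, 2, 3])

try:
    ds.rows = []
    rebind_blocked = False
except AttributeError:
    rebind_blocked = True
```

Nothing stops a caller from assigning ds._rows directly; the leading underscore and the documentation carry the rest of the contract.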
In PEP 351, Barry Warsaw proposed a protocol for "freezing" any mutable data structure, analogous to the way that frozenset makes an immutable set. Frozen data structures would be hashable and so capable of being used as keys in dictionaries.
The proposal was discussed on python-dev, with Raymond Hettinger's criticism the most detailed.
It's not quite what you're after, but it's the closest I can find, and should give you some idea of the thinking of the Python developers on this subject.
There are many design questions about any language, the answer to most of which is "just because". It's pretty clear that constants like this would go against the ideology of Python.
You can make a read-only class attribute, though, using descriptors. It's not trivial, but it's not very hard. The way it works is that you can make properties (things that look like attributes but call a method on access) using the property decorator; if you make a getter but not a setter property then you will get a read-only attribute. The reason for the metaclass programming is that since __init__ receives a fully-formed instance of the class, you actually can't set the attributes to what you want at this stage! Instead, you have to set them on creation of the class, which means you need a metaclass.
Code from this recipe:
# simple read-only attributes with metaclass programming (Python 3 syntax)

# method factory for an attribute get method
def getmethod(attrname):
    def _getmethod(self):
        return self.__readonly__[attrname]
    return _getmethod

class metaClass(type):
    def __new__(cls, classname, bases, classdict):
        readonly = classdict.get('__readonly__', {})
        # turn every name listed in __readonly__ into a getter-only property
        for name, default in readonly.items():
            classdict[name] = property(getmethod(name))
        return type.__new__(cls, classname, bases, classdict)

class ROClass(metaclass=metaClass):
    __readonly__ = {'a': 1, 'b': 'text'}

if __name__ == '__main__':
    def test1():
        t = ROClass()
        print(t.a)
        print(t.b)

    def test2():
        t = ROClass()
        t.a = 2  # raises AttributeError: the property has no setter

    test1()
While one programmer writing code is a consenting adult, two programmers working on the same code seldom are. More so if they value not the beauty of the code but their deadlines or research funds.
For such adults there is some type safety, provided by Enthought's Traits.
You could look into Constant and ReadOnly traits.
For some additional thoughts, there is a similar question posed about Java here:
Why is there no Constant feature in Java?
When asking why Python has decided against constant references, I think it's helpful to think of how they would be implemented in the language. Should Python have some sort of special declaration, const, to create variable references that can't be changed? Why not allow variables to be declared a float/int/whatever then... these would surely help prevent programming bugs as well. While we're at it, adding class and method modifiers like protected/private/public/etc. would help enforce compile-time checking against illegal uses of these classes. ...pretty soon, we've lost the beauty, simplicity, and elegance that is Python, and we're writing code in some sort of bastard child of C++/Java.
Python also currently passes everything by reference. This would be some sort of special pass-by-reference-but-flag-it-to-prevent-modification... a pretty special case (and as the Zen of Python indicates, just "un-Pythonic").
As mentioned before, without actually changing the language, this type of behaviour can be implemented via classes & descriptors. It may not prevent modification from a determined hacker, but we are consenting adults. Python didn't necessarily decide against providing this as an included module ("batteries included") - there was just never enough demand for it.
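As a rough sketch of what "implemented via classes" could look like for the read-only exposure described in the question, here is a hypothetical ReadOnlyView wrapper. It is an illustration under my own assumptions (only dict and list containers, only item access), not an established recipe, and it makes no copy of the underlying data.

```python
class ReadOnlyView:
    """Hypothetical wrapper: forwards reads and offers no mutating methods."""

    __slots__ = ("_target",)

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getitem__(self, key):
        value = self._target[key]
        # Wrap nested containers lazily, so nothing is deep-copied.
        if isinstance(value, (dict, list)):
            return ReadOnlyView(value)
        return value

    def __len__(self):
        return len(self._target)

    def __setattr__(self, name, value):
        raise AttributeError("read-only view")


data = {"users": [{"name": "ann"}], "version": 1}
view = ReadOnlyView(data)

name = view["users"][0]["name"]

# Item assignment is not defined on the view, so this fails.
try:
    view["users"][0]["name"] = "bob"
    write_blocked = False
except TypeError:
    write_blocked = True

# A determined caller can still reach the real object.
view._target["version"] = 2
```

The last line is the "consenting adults" part: the wrapper prevents accidents, not intent.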
My karate instructor is fond of saying, "a block is a lock is a throw is a blow." What he means is this: When we come to a technique in a form, although it might seem to look like a block, a little creativity and examination shows that it can also be seen as some kind of joint lock, or some kind of throw, or some kind of blow.
So it is with the way the django template syntax uses the dot (".") character. It perceives it first as a dictionary lookup, but it will also treat it as a class attribute, a method, or list index - in that order. The assumption seems to be that, one way or another, we are looking for a piece of knowledge. Whatever means may be employed to store that knowledge, we'll treat it in such a way as to get it into the template.
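The lookup order described above can be imitated in plain Python roughly as follows. This resolve function is a hypothetical sketch written for this discussion; it is not Django's implementation and ignores many details of the real template engine.

```python
def resolve(obj, name):
    """Rough imitation of a template-style dot lookup."""
    # 1. dictionary-style lookup
    try:
        return obj[name]
    except (TypeError, KeyError, IndexError):
        pass
    # 2. attribute lookup, calling the result if it is a method
    try:
        value = getattr(obj, name)
    except AttributeError:
        pass
    else:
        return value() if callable(value) else value
    # 3. list-index lookup
    try:
        return obj[int(name)]
    except (TypeError, ValueError, KeyError, IndexError) as exc:
        raise LookupError(f"cannot resolve {name!r}") from exc


class Spam:
    eggs = 2


from_dict = resolve({"eggs": 1}, "eggs")
from_attribute = resolve(Spam(), "eggs")
from_method = resolve("abc", "upper")
from_index = resolve([10, 20], "1")
```

Note how much guessing the function has to do on every lookup, and that a dict key silently wins over an attribute of the same name.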
Why doesn't python do the same? If there's a case where I might have assigned a dictionary term spam['eggs'], but know for sure that spam has an attribute eggs, why not let me just write spam.eggs and sort it out the way django templates do?
Otherwise, I have to except an AttributeError and add three additional lines of code.
I'm particularly interested in the philosophy that drives this setup. Is it regarded as part of strong typing?
Django templates and Python are two unrelated languages. They also have different target audiences.
In Django templates, the target audience is designers, who probably don't want to learn four different ways of doing roughly the same thing (a dictionary lookup). Thus there is a single syntax in Django templates that performs the lookup in several possible ways.
Python has quite a different audience. Developers actually make use of the many different ways of doing similar things, and overload each with distinct meaning. When one fails it should fail, because that is what the developer means for it to do.
JUST MY correct OPINION's opinion is indeed correct. I can't say why Guido did it this way but I can say why I'm glad that he did.
I can look at code and know right away whether some expression is accessing the 'b' key in a dict-like object a, the 'b' attribute on the object a, a method being called on a, or the b index into the sequence a.
Python doesn't have to try all of the above options every time there is an attribute lookup. Imagine if every time one indexed into a list, Python had to try three other options first. List intensive programs would drag. Python is slow enough!
It means that when I'm writing code, I have to know what I'm doing. I can't just toss objects around and hope that I'll get the information somewhere somehow. I have to know that I want to lookup a key, access an attribute, index a list or call a method. I like it that way because it helps me think clearly about the code that I'm writing. I know what the identifiers are referencing and what attributes and methods I'm expecting the object of those references to support.
Of course Guido van Rossum might have just flipped a coin for all I know (he probably didn't), so you would have to ask him yourself if you really want to know.
As for your comment about having to surround these things with try blocks, it probably means that you're not writing very robust code. Generally, you want your code to expect to get some piece of information from a dict-like object, list-like object or a regular object. You should know which way it's going to do it and let anything else raise an exception.
The exception to this is that it's OK to conflate attribute access and method calls using the property decorator and more general descriptors. This is only good if the method doesn't take arguments.
The different methods of accessing attributes do different things. If you have a function foo, the two lines of code
a = foo
a = foo()
do two very different things. Without distinct syntax to reference and call functions there would be no way for Python to know whether the variable should be a reference to foo or the result of running foo. The () syntax removes the ambiguity.
Lists and dictionaries are two very different data structures. One of the things that determine which one is appropriate in a given situation is how its contents can be accessed (key Vs index). Having separate syntax for both of them reinforces the notion that these two things are not the same and neither one is always appropriate.
It makes sense for these distinctions to be ignored in a template language: the person writing the HTML doesn't care, and the template language doesn't have function pointers, so it knows you don't want one. Programmers who write the Python that drives the template, however, do care about these distinctions.
In addition to the points already posted, consider this. Python uses special member variables and functions to provide metadata about the object. Both the interpreter and programmers make heavy use of these. For example, both dicts and lists have a __len__ member function. Now, if a dict's data were accessed by using the . operator, a potential ambiguity arises if the dict has a key called __len__. You could special-case these, but many objects have a __dict__ attribute which is a mapping of member names and values. If that object happened to be a container, which also defined a __len__ attribute, you would end up with an utter mess.
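The ambiguity can be seen with a small experiment. DotDict is a hypothetical dict subclass that answers attribute access with its keys; it is written only to show the clash and is not a recommended pattern.

```python
class DotDict(dict):
    """Hypothetical dict that falls back to its keys on attribute access."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


d = DotDict(eggs=1, items="my items", __len__="my length")

plain_key = d.eggs            # no clash: falls through to the key

# Keys that collide with real attributes are unreachable through the dot:
# the method wins, and the stored value needs bracket syntax anyway.
items_attr = d.items          # the dict.items method, not "my items"
len_attr = d.__len__          # the special method, not "my length"
```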
Problems like this would end up turning Python into a mishmash of special cases that the programmer would have to constantly be aware of. This would detract from the reason why many people use Python in the first place, i.e., its elegant simplicity.
Now, consider that new users often shadow built-ins (if the code in SO questions is any indication) and having something like this starts to look like a really bad idea, since it would exacerbate the problem many-fold.
In addition to the responses above, it's not practical to merge dictionary lookup and object lookup in general because of the restrictions on object members.
What if your key has whitespace? What if it's an int, or a frozenset, etc.? Dot notation can't account for these discrepancies, so while it's an acceptable tradeoff for a templating language, it's unacceptable for a general-purpose programming language like Python.
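A quick way to see which keys could even be written after a dot is sketched below. The helper usable_after_dot is a made-up name; it relies only on str.isidentifier and the standard keyword module.

```python
import keyword


def usable_after_dot(key):
    """True only if `obj.<key>` would be syntactically valid Python."""
    return (
        isinstance(key, str)
        and key.isidentifier()
        and not keyword.iskeyword(key)
    )


keys = ["eggs", "spam eggs", 42, frozenset({"a"}), "class"]
results = [usable_after_dot(k) for k in keys]
```

All five values are perfectly good dictionary keys, but only the first could be reached with dot notation.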