What is the use case of immutable types/objects like tuple in python.
Tuple('hello')
('h','i')
Where we can use the not changeable sequences.
One common use case is the list of (unnamed) arguments to a function.
In [1]: def foo(*args):
...: print(type(args))
...:
In [2]: foo(1,2,3)
<class 'tuple'>
Technically, tuples are semantically different to lists.
When you have a list, you have something that is... a list. Of items of some sort. And therefore can have items added or removed to it.
A tuple, on the other hand, is a set of values in a given order. It just happens to be one value that is made up of more than one value. A composite value.
For example. Say you have a point. X, Y. You could have a class called Point, but that class would have a dictionary to store its attributes. A point is only two values which are, most of the time, used together. You don't need the flexibility or the cost of a dictionary for storing named attributes, you can use a tuple instead.
myPoint = 70, 2
Points are always X and Y. Always 2 values. They are not lists of numbers. They are two values in which the order of a value matters.
Another example of tuple usage. A function that creates links from a list of tuples. The tuples must be the href and then the label of the link. Fixed order. Order that has meaning.
def make_links(*tuples):
return "".join('%s' % t for t in tuples)
make_links(
("//google.com", "Google"),
("//stackoveflow.com", "Stack Overflow")
)
So the reason tuples don't change is because they are supposed to be one single value. You can only assign the whole thing at once.
Here is a good resource that describes the difference between tuples and lists, and the reasons for using each: https://mail.python.org/pipermail/tutor/2001-September/008888.html
The main reason outlined in that link is that tuples are immutable and less extensive than say, lists. This makes them useful only in certain situations, but if those situations can be identified, tuples take up much less resources.
Immutable objects will make life simpler in many cases. They are especially applicable for value types, where objects don't have an identity so they can be easily replaced. And they can make concurrent programming way safer and cleaner (most of the notoriously hard to find concurrency bugs are ultimately caused by mutable state shared between threads). However, for large and/or complex objects, creating a new copy of the object for every single change can be very costly and/or tedious. And for objects with a distinct identity, changing an existing objects is much more simple and intuitive than creating a new, modified copy of it.
Related
I have a condition that compares one object to several others, like so:
if 'a' in ('a','b','c','e'):
The sequence was created for this purpose and doesn't exist anywhere else in the function. What are the pros and cons to grouping it as a tuple, list, or set, given that they all seem to work the same and the list is short? Which would be idiomatic?
Use a set until you have good reason not to. (And then use a list.)
I would consider a set to be more idiomatic. It conveys the meaning more clearly, since order doesn't matter, only membership.
And to be clear, a set is a collection but not a "sequence type" (even though it's iterable), because it's semantically "unordered".
Why not use a set?
Sets may only contain hashable types. And, this is important, they will raise a TypeError instead of simply returning False when you ask if an unhashable type is in the set. If you might get an unhashable object on either side of the in operator, you're out of luck. Sometimes you can use hashable elements instead (like frozenset instead of set or tuple instead of list), sometimes you can't.
But tuples and lists don't have to hash their elements.
Why a list over a tuple?
The main advantage of a list that they avoid a syntactic quirk for tuples of one element. Say you have ('foo', 'bar') and later decide to remove the 'bar'. Then you have ('foo'). Oops, see what I did there? It was actually supposed to be ('foo',). It's easy to forget the comma. And the in check still works for strings like ('foo'), since in checks for substrings. This can subtly change the meaning of your program. 'oo' is in ('foo'), but not in ('foo',).
A one-item list like ['foo'] doesn't have that problem. [And as
user2357112 pointed out, a constant list is going to get compiled to a tuple anyway.]
Note that a one-item set, like {'a'} doesn't have that problem either. An empty {} is a dict instead, but that's not going to cause any issues with an in check because it's also an empty collection.
But you should arguably be using == instead of in when comparing against only one element.
That's it for clarity. Now for the micro-optimizations. Early optimization is the root of all evil. Don't optimize at the expense of readability before it's actually necessary.
A set lookup is faster if it's not too small, since a tuple's elements have to be checked one-by-one which (on average) grows with the size of the tuple, while a set is backed by a hashtable (like a dict), which has a small constant overhead. If the distribution of cases isn't uniform, this means that the order of elements in the tuple matters a lot. Putting the more common cases first in the tuple will make the checks much faster than the reverse, on average.
How small does the collection have to be to for the set's constant overhead to matter? Profile and see. Performance can vary based on a lot of factors. It's not just the number of elements, but how long an equality check takes, and where they're located in memory, etc.
A tuple should have a slightly smaller overhead both in memory and construction time than the other collections. But the construction overhead doesn't really matter if the compiler can make it load as a saved constant value. (This can happen when all the elements are themselves constant at compile time. You can use the dis module to confirm this is happening.)
I understand the differences between mutable and immutable objects in Python. I have read many posts discussing the differences. However, I have not read anything regarding WHY integers are immutable objects.
Does there exist a reason for this? Or is the answer "that's just how it is"?
Edit: I am getting prompted to 'differentiate' this question from other questions as it seems to be a previously asked question. However, I believe what I'm asking is more of a philosophical Python question rather than a technical Python question.
It appears that 'primitive' objects in Python (i.e. strings, booleans, numbers, etc.) are immutable. I've also noticed that derived data types that are made up of primitives (i.e. dicts, lists, classes) are mutable.
Is that where the line is drawn whether or not an object is mutable? Primitive vs derived?
Making integers mutable would be very counter-intuitive to the way we are used to working with them.
Consider this code fragment:
a = 1 # assign 1 to a
b = a+2 # assign 3 to b, leave a at 1
After these assignments are executed we expect a to have the value 1 and b to have the value 3. The addition operation is creating a new integer value from the integer stored in a and an instance of the integer 2.
If the addition operation just took the integer at a and just mutated it then both a and b would have the value 3.
So we expect arithmetic operations to create new values for their results - not to mutate their input parameters.
However, there are cases where mutating a data structure is more convenient and more efficient. Let's suppose for the moment that list.append(x) did not modify list but returned a new copy of list with x appended.
Then a function like this:
def foo():
nums = []
for x in range(0,10):
nums.append(x)
return nums
would just return the empty list. (Remember - here nums.append(x) doesn't alter nums - it returns a new list with x appended. But this new list isn't saved anywhere.)
We would have to write the foo routine like this:
def foo():
nums = []
for x in range(0,10):
nums = nums.append(x)
return nums
(This, in fact, is very similar to the situation with Python strings up until about 2.6 or perhaps 2.5.)
Moreover, every time we assign nums = nums.append(x) we would be copying a list that is increasing in size resulting in quadratic behavior.
For those reasons we make lists mutable objects.
A consequence to making lists mutable is that after these statements:
a = [1,2,3]
b = a
a.append(4)
the list b has changed to [1,2,3,4]. This is something that we live with even though it still trips us up now and then.
What are the design decisions to make numbers immutable in Python?
There are several reasons for immutability, let's see first what are the reasons for immutability?
1- Memory
Saves memory. If it's well known that an object is immutable, it can be easily copied creating a new reference to the same object.
Performance. Python can allocate space for an immutable object at creation time, and the storage requirements are fixed and unchanging.
2- Fast execution.
It doesn't have to copy each part of the object, only a simple reference.
Easy to be compared, comparing equality by reference is faster than comparing values.
3- Security:
In Multi-threading apps Different threads can interact with data contained inside the immutable objects, without to worry about data consistency.
The internal state of your program will be consistent even if you have exceptions.
Classes should be immutable unless there's a very good reason to make them mutable....If a class cannot be made immutable, limit its mutability as much as possible
4- Ease to use
Is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways.
Immutable objects are easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce.
5- Keys must be immutable. Which means you can use strings, numbers or tuples as dictionary key. This is something that you want to use.
The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.
Going back to the integers:
Security (3), Easy to use (4) and capacity of using numbers as keys in dictionaries (5) are reasons for taken the decision of making numbers immutable.
Has fixed memory requirements since creation time (1).
All in Python is an object, the numbers (like strings) are "elemental" objects. No amount of activity will change the value 8 to anything else, and no amount of activity will change the string “eight” to anything else. This is because a decision in the design too.
What does this mean?
The only types of values not acceptable as dictionary keys are values containing lists or dictionaries or other mutable types that are compared by value rather than by object identity, the reason being that the efficient implementation of dictionaries requires a key’s hash value to remain constant.
I think even for tuples, comparison will happen by value.
The problem with a mutable object as a key is that when we use a dictionary, we rarely want to check identity. For example, when we use a dictionary like this:
a = "bob"
test = {a: 30}
print(test["bob"])
We expect it to work - the second string "bob" may not be the same as a, but it is the same value, which is what we care about. This works as any two strings that equate will have the same hash, meaning that the dict (implemented as a hashmap) can find those strings very efficiently.
The issue comes into play when we have a list as a key, imagine this case:
a = ["bob"]
test = {a: 30}
print(test[["bob"]])
We can't do this any more - the comparison won't work as the hash of a list is not based on it's value, but rather the instance of the list (aka (id(a) != id(["bob"))).
Python has the choice of making the list's hash change (undermining the efficiency of a hashmap) or simply comparing on identity (which is useless in most cases). Python disallows these specific mutable keys to avoid subtle but common bugs where people expect the values to be equated on value, rather than identity.
The documentation mixes together two different things: mutability, and value-comparable. Let's separate them out.
Immutable objects that compare by identity are fine. The identity can
never change, for any object.
Immutable objects that compare by value are fine. The value can never
change for an immutable object. This includes tuples.
Mutable objects that compare by identity are fine. The identity can
never change, for any object.
Mutable objects that compare by value are not acceptable. The value
can change for a mutable object, which would make the dictionary
invalid.
Meanwhile, your wording isn't quite the same as Mapping Types (4.10 in Python 3.3 or 5.8 in Python 2.7, both of which say:
A dictionary’s keys are almost arbitrary values. Values that are not hashable, that is, values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys.
Anyway, the key point here is that the rule is "not hashable"; "mutable types (that are compared by value rather than by object identity)" is just to explain things a little further. It isn't strictly true that comparing by object identity and hashing by object identity are always the same (the only thing that's required is that if id is equal, the hash is equal).
The part about "efficient implementation of dictionaries" from the version you posted just adds to the confusion (which is probably why it's not in the reference documentation). Even if someone came up with an efficient way to deal with storing lists as dict keys tomorrow, the language doesn't allow it.
A hash is way of calculating an unique code for an object, this code always the same for the same object. hash('test') for example is 2314058222102390712, so is a = 'test'; hash(a) = 2314058222102390712.
Internally a dictionary value is searched by the hash, not by the variable you specify. A list is mutable, a hash for a list, if it where defined, would be changing whenever the list changes. Therefore python's design does not hash lists. Lists therefore can not be used as dictionary keys.
Tuples are immutable, therefore tubles have hashes e.G. hash((1,2)) = 3713081631934410656. one could compare whether a tuple a is equal to the tuple (1,2) by comparing the hash, rather than the value. This would be more efficient as we have to compare only one value instead of two.
As a way to get used to python, I am trying to translate some of my code to python from Autohotkey_L.
I am immediately running into tons of choices for collection objects. Can you help me figure out a built in type or a 3rd party contributed type that has as much as possible, the functionality of the AutoHotkey_L object type and its methods.
AutoHotkey_L Objects have features of a python dict, list, and a class instance.
I understand that there are tradeoffs for space and speed, but I am just interested in functionality rather than optimization issues.
Don't write Python as <another-language>. Write Python as Python.
The data structure should be chosen just to have the minimal ability you need to use.
list — an ordered sequence of elements, with 1 flexible end.
collections.deque — an ordered sequence of elements, with 2 flexible ends (e.g. a queue).
set / frozenset — an unordered sequence of unique elements.
collections.Counter — an unordered sequence of non-unique elements.
dict — an unordered key-value relationship.
collections.OrderedDict — an ordered key-value relationship.
bytes / bytearray — a list of bytes.
array.array — a homogeneous list of primitive types.
Looking at the interface of Object,
dict would be the most suitable for finding a value by key
collections.OrderedDict would be the most suitable for the push/pop stuff.
when you need MinIndex / MaxIndex, where a sorted key-value relationship (e.g. red black tree) is required. There's no such type in the standard library, but there are 3rd party implementations.
It would be impossible to recommend a particular class without knowing how you intend on using it. If you are using this particular object as an ordered sequence where elements can be repeated, then you should use a list; if you are looking up values by their key, then use a dictionary. You will get very different algorithmic runtime complexity with the different data types. It really does not take that much time to determine when to use which type.... I suggest you give it some further consideration.
If you really can't decide, though, here is a possibility:
class AutoHotKeyObject(object):
def __init__(self):
self.list_value = []
self.dict_value = {}
def getDict(self):
return self.dict_value
def getList(self):
return self.list_value
With the above, you could use both the list and dictionary features, like so:
obj = AutoHotKeyObject()
obj.getList().append(1)
obj.getList().append(2)
obj.getList().append(3)
print obj.getList() # Prints [1, 2, 3]
obj.getDict()['a'] = 1
obj.getDict()['b'] = 2
print obj.getDict() # Prints {'a':1, 'b':2}
I've read several python tutorials (Dive Into Python, for one), and the language reference on Python.org - I don't see why the language needs tuples.
Tuples have no methods compared to a list or set, and if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
Immutability?
Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.
In C/C++ if I allocate a pointer and point to some valid memory, I don't care where the address is located as long as it's not null before I use it.
Whenever I reference that variable, I don't need to know if the pointer is still pointing to the original address or not. I just check for null and use it (or not).
In Python, when I allocate a string (or tuple) assign it to x, then modify the string, why do I care if it's the original object? As long as the variable points to my data, that's all that matters.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
x still references the data I want, why does anyone need to care if its id is the same or different?
immutable objects can allow substantial optimization; this is presumably why strings are also immutable in Java, developed quite separately but about the same time as Python, and just about everything is immutable in truly-functional languages.
in Python in particular, only immutables can be hashable (and, therefore, members of sets, or keys in dictionaries). Again, this afford optimization, but far more than just "substantial" (designing decent hash tables storing completely mutable objects is a nightmare -- either you take copies of everything as soon as you hash it, or the nightmare of checking whether the object's hash has changed since you last took a reference to it rears its ugly head).
Example of optimization issue:
$ python -mtimeit '["fee", "fie", "fo", "fum"]'
1000000 loops, best of 3: 0.432 usec per loop
$ python -mtimeit '("fee", "fie", "fo", "fum")'
10000000 loops, best of 3: 0.0563 usec per loop
None of the answers above point out the real issue of tuples vs lists, which many new to Python seem to not fully understand.
Tuples and lists serve different purposes. Lists store homogenous data. You can and should have a list like this:
["Bob", "Joe", "John", "Sam"]
The reason that is a correct use of lists is because those are all homogenous types of data, specifically, people's names. But take a list like this:
["Billy", "Bob", "Joe", 42]
That list is one person's full name, and their age. That isn't one type of data. The correct way to store that information is either in a tuple, or in an object. Lets say we have a few :
[("Billy", "Bob", "Joe", 42), ("Robert", "", "Smith", 31)]
The immutability and mutability of Tuples and Lists is not the main difference. A list is a list of the same kind of items: files, names, objects. Tuples are a grouping of different types of objects. They have different uses, and many Python coders abuse lists for what tuples are meant for.
Please don't.
Edit:
I think this blog post explains why I think this better than I did:
Understanding tuples vs. lists in Python - E-Scribe
if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
In this particular case, there probably isn't a point. This is a non-issue, because this isn't one of the cases where you'd consider using a tuple.
As you point out, tuples are immutable. The reasons for having immutable types apply to tuples:
copy efficiency: rather than copying an immutable object, you can alias it (bind a variable to a reference)
comparison efficiency: when you're using copy-by-reference, you can compare two variables by comparing location, rather than content
interning: you need to store at most one copy of any immutable value
there's no need to synchronize access to immutable objects in concurrent code
const correctness: some values shouldn't be allowed to change. This (to me) is the main reason for immutable types.
Note that a particular Python implementation may not make use of all of the above features.
Dictionary keys must be immutable, otherwise changing the properties of a key-object can invalidate invariants of the underlying data structure. Tuples can thus potentially be used as keys. This is a consequence of const correctness.
See also "Introducing tuples", from Dive Into Python.
Sometimes we like to use objects as dictionary keys
For what it's worth, tuples recently (2.6+) grew index() and count() methods
I've always found having two completely separate types for the same basic data structure (arrays) to be an awkward design, but not a real problem in practice. (Every language has its warts, Python included, but this isn't an important one.)
Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.
These are different things. Mutability isn't related to the place it's stored in memory; it means the stuff it points to can't change.
Python objects can't change location after they're created, mutable or not. (More accurately, the value of id() can't change--same thing, in practice.) The internal storage of mutable objects can change, but that's a hidden implementation detail.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
This isn't modifying ("mutating") the variable; it's creating a new variable with the same name, and discarding the old one. Compare to a mutating operation:
>>> a = [1,2,3]
>>> id(a)
3084599212L
>>> a[1] = 5
>>> a
[1, 5, 3]
>>> id(a)
3084599212L
As others have pointed out, this allows using arrays as keys to dictionaries, and other data structures that need immutability.
Note that keys for dictionaries do not have to be completely immutable. Only the part of it used as a key needs to be immutable; for some uses, this is an important distinction. For example, you could have a class representing a user, which compares equality and a hash by the unique username. You could then hang other mutable data on the class--"user is logged in", etc. Since this doesn't affect equality or the hash, it's possible and perfectly valid to use this as a key in a dictionary. This isn't too commonly needed in Python; I just point it out since several people have claimed that keys need to be "immutable", which is only partially correct. I've used this many times with C++ maps and sets, though.
As gnibbler offered in a comment, Guido had an opinion that is not fully accepted/appreciated: “lists are for homogeneous data, tuples are for heterogeneous data”. Of course, many of the opposers interpreted this as meaning that all elements of a list should be of the same type.
I like to see it differently, not unlike others also have in the past:
blue= 0, 0, 255
alist= ["red", "green", blue]
Note that I consider alist to be homogeneous, even if type(alist[1]) != type(alist[2]).
If I can change the order of the elements and I won't have issues in my code (apart from assumptions, e.g. “it should be sorted”), then a list should be used. If not (like in the tuple blue above), then I should use a tuple.
They are important since they guarantee the caller that the object they pass won't be mutated.
If you do this:
a = [1,1,1]
doWork(a)
The caller has no guarantee of the value of a after the call.
However,
a = (1,1,1)
doWorK(a)
Now you as the caller or as a reader of this code know that a is the same.
You could always for this scenario make a copy of the list and pass that but now you are wasting cycles instead of using a language construct that makes more semantic sense.
you can see here for some discussion on this
Your question (and follow-up comments) focus on whether the id() changes during an assignment. Focusing on this follow-on effect of the difference between immutable object replacement and mutable object modification rather than the difference itself is perhaps not the best approach.
Before we continue, make sure that the behavior demonstrated below is what you expect from Python.
>>> a1 = [1]
>>> a2 = a1
>>> print a2[0]
1
>>> a1[0] = 2
>>> print a2[0]
2
In this case, the contents of a2 was changed, even though only a1 had a new value assigned. Contrast to the following:
>>> a1 = (1,)
>>> a2 = a1
>>> print a2[0]
1
>>> a1 = (2,)
>>> print a2[0]
1
In this latter case, we replaced the entire list, rather than updating its contents. With immutable types such as tuples, this is the only behavior allowed.
Why does this matter? Let's say you have a dict:
>>> t1 = (1,2)
>>> d1 = { t1 : 'three' }
>>> print d1
{(1,2): 'three'}
>>> t1[0] = 0 ## results in a TypeError, as tuples cannot be modified
>>> t1 = (2,3) ## creates a new tuple, does not modify the old one
>>> print d1 ## as seen here, the dict is still intact
{(1,2): 'three'}
Using a tuple, the dictionary is safe from having its keys changed "out from under it" to items which hash to a different value. This is critical to allow efficient implementation.