Why is Python "append" not behaving as expected? - python

I have this code, which I'm using to transform a dictionary with a list inside its structure into a list of dictionaries, adding new columns in a flat structure for each item of the internal list. This is my code:
origin = {
"a":1,
"b":2,
"m":[
{"c":3},
{"c":4}
]
}
# separating the "flat" part of the structure
flat = dict()
for o in origin.keys():
    if not isinstance(origin[o], list):
        flat[o] = origin[o]
lines = list()
# starts receiving the 'flat' value, once the new lines will receive the same flat values.
new_line = flat
# getting the "non-flat" values and creating new dictionaries using the flat structure
for i in origin["m"]:
    k = list(i.keys())[0]
    v = list(i.values())[0]
    new_line[k] = v
    print(f"NEW_LINE: {str(new_line)}")
    lines.append(new_line)
print(f"LINES:\n{str(lines)}")
I was expecting this:
NEW_LINE: {'a': 1, 'b': 2, 'c': 3}
NEW_LINE: {'a': 1, 'b': 2, 'c': 4}
LINES:
[{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2, 'c': 4}]
But I'm getting this:
NEW_LINE: {'a': 1, 'b': 2, 'c': 3}
NEW_LINE: {'a': 1, 'b': 2, 'c': 4}
LINES:
[{'a': 1, 'b': 2, 'c': 4}, {'a': 1, 'b': 2, 'c': 4}]
Why?

Append a copy of the dictionary, lines.append(new_line.copy()); otherwise every element of lines refers to the same dict object, so the last assignment to new_line[k] shows up in all of them.
Note that copy() makes a shallow copy; for nested objects you'll need copy.deepcopy().
The docs describe the difference between the two:
The difference between shallow and deep copying is only relevant for
compound objects (objects that contain other objects, like lists or
class instances):
A shallow copy constructs a new compound object and then (to the
extent possible) inserts references into it to the objects found in
the original.
A deep copy constructs a new compound object and then, recursively,
inserts copies into it of the objects found in the original.

Related

Python Concatenated list of dictionaries modifies all instances of dictionary inside the list if one of the dictionary is updated

I have a simple scenario where Python (3.7, tested also 3.5) does not seem to behave as I expect.
putting it simply:
a = [{"c":1, "d":2}]
a
[{'c': 1, 'd': 2}]
b = a + a
b
[{'c': 1, 'd': 2}, {'c': 1, 'd': 2}]
b[0]
{'c': 1, 'd': 2}
b[0]['c'] = 3
b
[{'c': 3, 'd': 2}, {'c': 3, 'd': 2}]
Changing the value of an entry in the first dictionary in b, also updates the corresponding entry in the 2nd dictionary.
I have tried b = a.copy() + a.copy() but got the same result.
Does anyone know a way around it?
You should use deepcopy.
copy() returns only a shallow copy. Since your dictionary is inside the list, copy() creates a new list, but the element inside it still references the same dictionary.
shallow copy would work if you had this case:
a = {"c":1, "d":2}
b = [a.copy(), a.copy()]
But in your case you need to use deepcopy
from copy import deepcopy
b = deepcopy(a) + deepcopy(a)
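A short sketch comparing both variants side by side (the names shared and independent are only illustrative):

```python
from copy import deepcopy

a = [{"c": 1, "d": 2}]

shared = a + a                           # both elements are the same dict object
independent = deepcopy(a) + deepcopy(a)  # two separate dict objects

shared[0]["c"] = 3
independent[0]["c"] = 3

print(shared)                            # [{'c': 3, 'd': 2}, {'c': 3, 'd': 2}]
print(independent)                       # [{'c': 3, 'd': 2}, {'c': 1, 'd': 2}]
print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False
```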

Is there any use of dictionaries which reference itself in its value?

Example:
>>> x = {'a' : 3, 'b' : 5, 'c' : 6}
>>> x['d'] = x
>>> x
{'a': 3, 'b': 5, 'c': 6, 'd': {...}}
>>> x['d']
{'a': 3, 'b': 5, 'c': 6, 'd': {...}}
>>> x['d']['d']
{'a': 3, 'b': 5, 'c': 6, 'd': {...}}
>>> x['d']['d']['d']
{'a': 3, 'b': 5, 'c': 6, 'd': {...}}
I guess it is infinitely looped since it is referencing itself. I just wanted to know if there is any use case for such dictionaries in real world? If yes, any examples?
No, actually, I don't think that's a very useful thing. If anything, my opinion leans the other way: it's hard enough to have objects copied in all the right spots in Python.
One obvious possibility would be to subclass dict and create (or interface) an RPG like Nethack inside the python CLI that way. You could even add some sort of UI to it. Basically, dictionaries and other mapping types are "useful enough" in python without recursion, though.
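For what it's worth, the nesting in the question is not an infinitely large structure: every level is the very same object, which a small identity check can show.

```python
x = {"a": 3, "b": 5, "c": 6}
x["d"] = x  # the value stored under 'd' is a reference to x itself

print(x["d"] is x)            # True
print(x["d"]["d"]["d"] is x)  # True
print(len(x))                 # 4
print(repr(x))                # {'a': 3, 'b': 5, 'c': 6, 'd': {...}}
```

The {...} in the output is how repr() marks a reference back to a dict that is already being printed.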

Python command dict(zip()) changes the order of a [duplicate]

This question already has answers here:
Order of keys in dictionary
(3 answers)
Closed 7 years ago.
I have two lists which I'm mapping into a dictionary.
The two lists are-
a = ['a','b','c','d'] and b = [1,2,3,4] .
When I run the command
>>> d = dict(zip(a,b))
>>> d
I get
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
whereas the expected value is {'a': 1, 'b': 2, 'c': 3, 'd': 4}
Why this change in the order of the keys?
There is no inherent "obvious" order in the keys of a dict. Admittedly, the docs only spell it out for CPython, but also note
If items(), keys(), values(), iteritems(), iterkeys(), and
itervalues() are called with no intervening modifications to the
dictionary, the lists will directly correspond.
which says by omission that otherwise they might change.
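(Note that there is an order, but it is determined by the hashes of the keys, so it's not as easy as "a before b". In particular, since hash randomization became the default in Python 3.3, the order of string keys is liable to change with each new run of the interpreter.)
Before Python 3.7, a dictionary has no guaranteed order (from Python 3.7 on, dictionaries preserve insertion order). In either case, order does not affect equality: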
{'a': 1, 'b': 2, 'c': 3, 'd': 4} == {'a': 1, 'c': 3, 'b': 2, 'd': 4}
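A small sketch, assuming Python 3.7 or later for the plain dict; on older interpreters only the collections.OrderedDict part is guaranteed to keep the order of the zipped pairs:

```python
from collections import OrderedDict

a = ["a", "b", "c", "d"]
b = [1, 2, 3, 4]

d = dict(zip(a, b))
# Equality ignores key order on every Python version.
print(d == {"a": 1, "c": 3, "b": 2, "d": 4})  # True

# OrderedDict remembers insertion order regardless of the Python version.
od = OrderedDict(zip(a, b))
print(list(od))  # ['a', 'b', 'c', 'd']

# Assumption: Python 3.7+, where a plain dict also preserves insertion order.
print(list(d))   # ['a', 'b', 'c', 'd']
```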

how to use a sentinel list in a comprehension?

I have a list
In [4]: a = [1, 2, 3, 3, 2, 4]
from which I would like to remove duplicates via a comprehension using a sentinel list (see below why):
In [8]: [x if x not in seen else seen.append(x) for x in a]
Out[8]: [1, 2, 3, 3, 2, 4]
It seems that seen is not taken into account (neither updated nor checked). Why is that?
As for the reason why using a convoluted method: The list I have is of the form
[{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
and I want to remove duplicates based on the value of a specific key (b in the case above), to leave [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}] (I do not care which dict is removed). The idea would be to build a sentinel list with the values of b and keep only the dicts without b equal to any element in that sentinel list.
Since x is not in seen, you are never adding it to seen either; the else branch is not executed when x not in seen is true.
However, you are using a conditional expression; it always produces a value; either x or the result of seen.append() (which is None), so you are not filtering, you are mapping here.
If you wanted to filter, move the test to an if section after the for loop:
seen = set()
[x for x in a if not (x in seen or seen.add(x))]
Since you were using seen.append() I presume you were using a list; I switched you to a set() instead, as membership tests are way faster using a set.
So x is excluded only if a) x in seen is true (so we have already seen it), or b) seen.add(x) returned a true value (it never does: it returns None, which is not true). Yes, this works, if only a little convoluted.
Demo:
>>> a = [1, 2, 3, 3, 2, 4]
>>> seen = set()
>>> [x for x in a if not (x in seen or seen.add(x))]
[1, 2, 3, 4]
>>> seen
set([1, 2, 3, 4])
Applying this to your specific problem:
>>> a = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> seen = set()
>>> [entry for entry in a if not (entry['b'] in seen or seen.add(entry['b']))]
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
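To make the mapping-versus-filtering distinction concrete, here is a small sketch; seen is assumed to start as an empty list, as the question implies, and the other names are only illustrative:

```python
a = [1, 2, 3, 3, 2, 4]

# The question's comprehension: the else branch never runs, so seen stays
# empty and every x is passed through unchanged.
seen = []
mapped = [x if x not in seen else seen.append(x) for x in a]
print(mapped)  # [1, 2, 3, 3, 2, 4]
print(seen)    # []

# A conditional expression always yields one output per input (mapping).
with_placeholders = [x if x % 2 else None for x in a]
print(with_placeholders)  # [1, None, 3, 3, None, None]

# A trailing if clause drops elements (filtering).
odds = [x for x in a if x % 2]
print(odds)  # [1, 3, 3]
```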
You never execute the else part of the if, because you do not update when you match the first time. You could do this:
[seen.append(x) or x for x in lst if x not in seen]
This way the or returns its last operand, x, after executing the update via append (which always returns None, so the or goes on to evaluate the next operand).
Maybe you can use the fact that dict keys form a set for this. A dict comprehension keeps the last item it sees for each key, so iterating over reversed(lst) keeps the first item of the original list for each value of 'b' (drop reversed to keep the last one instead):
>>> lst = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> filtered = {item['b']: item for item in reversed(lst)}
>>> filtered.values()
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
This uses the value of 'b' as the key, so only a single element can be mapped to each value of 'b', which effectively creates a set over 'b'.
Note: in Python versions where dicts are unordered, this returns the values in arbitrary order. To fix that nicely for big datasets, I'd create another mapping, from each object to its index in the original list (O(n)), and use that mapping as the sort key for the final result (O(n*log(n))). That's beyond the scope of this answer.
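One possible way to keep the original order without a separate sort, assuming Python 3.7 or later (where dicts preserve insertion order), is to walk the list forwards and use setdefault so the first item seen for each value of 'b' wins. The name first_by_b is only illustrative:

```python
lst = [{"a": 3, "b": 4}, {"a": 10, "b": 4}, {"a": 5, "b": 5}]

first_by_b = {}
for item in lst:
    # setdefault stores item only if this value of 'b' is not a key yet.
    first_by_b.setdefault(item["b"], item)

result = list(first_by_b.values())
print(result)  # [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
```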
I'm always queasy making use of operator precedence as execution flow control. I feel that the below is marginally more explicit and palatable, although it does carry the additional cost of tuple creation.
b_values = set()
[(item, b_values.add(item['b']))[0] for item in original_list
if item['b'] not in b_values]
But really when you're maintaining/updating some sort of state, I think the best format is the simple for-loop:
output_list = []
b_values = set()
for item in original_list:
    if item['b'] not in b_values:
        output_list.append(item)
        b_values.add(item['b'])
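The same loop can be wrapped in a small helper so it is runnable on its own. This is a sketch: unique_by is a hypothetical name, and it assumes the values under the chosen key are hashable.

```python
def unique_by(items, key):
    """Return the items whose items[key] value has not been seen before."""
    seen_values = set()
    output = []
    for item in items:
        value = item[key]
        if value not in seen_values:
            output.append(item)
            seen_values.add(value)
    return output


original_list = [{"a": 3, "b": 4}, {"a": 10, "b": 4}, {"a": 5, "b": 5}]
print(unique_by(original_list, "b"))
# [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
```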

Understanding dict.copy() - shallow or deep?

While reading up the documentation for dict.copy(), it says that it makes a shallow copy of the dictionary. Same goes for the book I am following (Beazley's Python Reference), which says:
The m.copy() method makes a shallow
copy of the items contained in a
mapping object and places them in a
new mapping object.
Consider this:
>>> original = dict(a=1, b=2)
>>> new = original.copy()
>>> new.update({'c': 3})
>>> original
{'a': 1, 'b': 2}
>>> new
{'a': 1, 'c': 3, 'b': 2}
So I assumed this would update the value of original (and add 'c': 3) also since I was doing a shallow copy. Like if you do it for a list:
>>> original = [1, 2, 3]
>>> new = original
>>> new.append(4)
>>> new, original
([1, 2, 3, 4], [1, 2, 3, 4])
This works as expected.
Since both are shallow copies, why is it that dict.copy() doesn't work as I expect it to? Or is my understanding of shallow vs deep copying flawed?
A "shallow copy" means the contents of the dictionary are not copied by value; the new dictionary just holds new references to the same objects.
>>> a = {1: [1,2,3]}
>>> b = a.copy()
>>> a, b
({1: [1, 2, 3]}, {1: [1, 2, 3]})
>>> a[1].append(4)
>>> a, b
({1: [1, 2, 3, 4]}, {1: [1, 2, 3, 4]})
In contrast, a deep copy will copy all contents by value.
>>> import copy
>>> c = copy.deepcopy(a)
>>> a, c
({1: [1, 2, 3, 4]}, {1: [1, 2, 3, 4]})
>>> a[1].append(5)
>>> a, c
({1: [1, 2, 3, 4, 5]}, {1: [1, 2, 3, 4]})
So:
b = a: Reference assignment; a and b point to the same object.
b = a.copy(): Shallow copy; a and b become two separate objects, but their contents still share the same references.
b = copy.deepcopy(a): Deep copy; the structure and contents of a and b become completely independent.
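The three cases can be checked with the is operator, which compares object identity. A minimal sketch (the variable names are only illustrative):

```python
import copy

a = {1: [1, 2, 3]}

alias = a                # reference assignment
shallow = a.copy()       # shallow copy
deep = copy.deepcopy(a)  # deep copy

print(alias is a)          # True:  same dict object
print(shallow is a)        # False: a new dict ...
print(shallow[1] is a[1])  # True:  ... that shares the inner list
print(deep is a)           # False: a new dict ...
print(deep[1] is a[1])     # False: ... with its own copy of the inner list
```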
Take this example:
original = dict(a=1, b=2, c=dict(d=4, e=5))
new = original.copy()
Now let's change a value in the 'shallow' (first) level:
new['a'] = 10
# new = {'a': 10, 'b': 2, 'c': {'d': 4, 'e': 5}}
# original = {'a': 1, 'b': 2, 'c': {'d': 4, 'e': 5}}
# no change in original: new['a'] = 10 rebinds the key 'a' in new only
Now let's change a value one level deeper:
new['c']['d'] = 40
# new = {'a': 10, 'b': 2, 'c': {'d': 40, 'e': 5}}
# original = {'a': 1, 'b': 2, 'c': {'d': 40, 'e': 5}}
# new['c'] points to the same mutable dictionary as original['c'], so the change is visible in both
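If the nested dictionary should stay independent as well, copy.deepcopy can be used instead of .copy(). A sketch on the same data:

```python
import copy

original = dict(a=1, b=2, c=dict(d=4, e=5))
new = copy.deepcopy(original)

new["c"]["d"] = 40

print(new)       # {'a': 1, 'b': 2, 'c': {'d': 40, 'e': 5}}
print(original)  # {'a': 1, 'b': 2, 'c': {'d': 4, 'e': 5}}
```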
It's not a matter of deep copy or shallow copy, none of what you're doing is deep copy.
Here:
>>> new = original
you're creating a new reference to the list/dict referenced by original.
while here:
>>> new = original.copy()
>>> # or
>>> new = list(original) # dict(original)
you're creating a new list/dict which is filled with a copy of the references of objects contained in the original container.
Adding to kennytm's answer: when you make a shallow copy with parent.copy(), a new dictionary is created with the same keys, but the values are not copied; they are referenced. If you add a new value to parent_copy, it won't affect parent, because parent_copy is a new dictionary, not a reference.
parent = {1: [1,2,3]}
parent_copy = parent.copy()
parent_reference = parent
print(id(parent), id(parent_copy), id(parent_reference))
#140690938288400 140690938290536 140690938288400
print(id(parent[1]), id(parent_copy[1]), id(parent_reference[1]))
#140690938137128 140690938137128 140690938137128
parent_copy[1].append(4)
parent_copy[2] = ['new']
print(parent, parent_copy, parent_reference)
#{1: [1, 2, 3, 4]} {1: [1, 2, 3, 4], 2: ['new']} {1: [1, 2, 3, 4]}
The id values of parent[1] and parent_copy[1] are identical, which implies that the list [1,2,3] referenced by parent[1] and parent_copy[1] is one object, stored at id 140690938137128.
But the ids of parent and parent_copy are different, which implies
they are different dictionaries, and parent_copy is a new dictionary whose values are references to the values of parent.
"new" and "original" are different dicts, that's why you can update just one of them.. The items are shallow-copied, not the dict itself.
In your second part, you should use new = original.copy()
.copy and = are different things.
Contents are shallow copied.
So if the original dict contains a list or another dictionary, modifying one of them in the original or in its shallow copy will modify it (the list or the dict) in the other.
