I am working with large nested dictionaries, and am trying to delete nested subdictionaries. I am wondering why the following behavior occurs.
When I set a reference to dictionary d (called ref), then I change ref and print d, it shows an updated version of d with the third element added.
input:
d={"a":1,"b":2}
ref=d
ref["c"]=3
print(d)
output:
{'a': 1, 'b': 2, 'c': 3}
Given this behavior, I expected to be able to delete the dictionary by deleting the reference:
input:
d={"a":1,"b":2}
ref=d
del ref
print(d)
output:
{'a': 1, 'b': 2}
I am wondering if there is a way to delete the original object when I delete the reference (meaning that the output of the second program would be an error because d was deleted).
del doesn't actually handle any de-allocation of memory, it merely unbinds a value from a name, and then decrements the reference count of that object by one. There is no way to systematically unbind all names from an object given a single reference.
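An object is not reclaimed until its reference count drops to 0 (in CPython this normally happens right away, while objects caught in reference cycles are handled later by the cyclic garbage collector). You can see an object's reference count by using the sys.getrefcount function (the reported count is typically one higher than you might expect, because of the temporary reference held by the function's own argument).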
We can demonstrate del in practice using this method and the __del__ method (which is called only when the reference count for the object is decremented to 0):
>>> import sys
>>> # print something when refcount == 0 and object is about to be collected
>>> class Deleted:
...     def __del__(self):
...         print("actually deleted")
...
>>> a = Deleted()
>>> # just a
>>> sys.getrefcount(a) - 1
1
>>> b = a
>>> # a and b
>>> sys.getrefcount(a) - 1
2
>>> del a
>>> # now it's just b
>>> sys.getrefcount(b) - 1
1
>>> del b
actually deleted
If you're curious to read more about how all of this works internally, check out the C API documentation on the internal calls for reference counting, and check out the gc module, which is the high level python interface for introspecting the garbage collection sub-system.
Given your specific problem, since you are working with dictionaries which are mutable types, you could just clear the dictionary:
>>> a = {"a": 1}
>>> b = a
>>> # will clear the dict that both a and b are referencing
>>> b.clear()
>>> a
{}
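Note that the slice syntax del a[:] is only available for lists (and other mutable sequences). Dictionaries do not support slicing, so del a[:] on a dict raises a TypeError; use clear() instead.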
The del statement behaves differently depending on what is being deleted. Paraphrasing slightly:
Deletion of a name removes the binding of that name from the local or global namespace
That is the second case presented. You've got two references to the same object. The name ref has been deleted, but the name d still exists and points to the same object as it always did.
However, attributes, subscriptions, and slicings have different behaviour:
Deletion of attribute references, subscriptions and slicings is passed to the primary object involved
That is more like the first case - deleting an element from either name will be reflected in the other:
input:
d = {"a":1, "b":2}
ref = d
del ref["a"]
print(d)
output:
{'b': 2}
So, wrapping the references inside a dictionary (or other container), will allow deletion by any reference.
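As a minimal sketch of how this applies to nested dictionaries (the key names here are made up for illustration), deleting a sub-dictionary through its parent container is visible through every name bound to that container:

```python
# Hypothetical nested data; "keep" and "drop" are illustrative key names.
data = {"keep": {"x": 1}, "drop": {"y": 2}}
alias = data                      # second name for the same outer dict

# Deleting by subscription is passed to the dict object itself,
# so it does not matter which name we use.
del alias["drop"]

# pop() with a default avoids a KeyError if the key is already gone.
removed = alias.pop("drop", None)

print(data)      # {'keep': {'x': 1}}
print(removed)   # None
```

If another variable still refers to the removed sub-dictionary, that object stays alive; only the entry in the parent is gone.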
I was testing out the predefined __dict__ attribute on a function, and I got a result I didn't expect. Consider the following code:
>>> def func():
...     x = 7
...     a = 0
...     print(func.__dict__)
...
>>> func()
{}
After all, I did define two variables in the namespace of the function. So why aren't these names appearing in the __dict__ attribute of the function?
A function's __dict__ holds attributes, not local variables. Local variables are specific to each execution of a function, not to the function itself, so you can't get local variable values by inspecting a function.
If you assigned an attribute:
func.blah = 3
that would show up in the __dict__.
(A __dict__ doesn't hold all attributes - there are other ways to create attributes, which is why __dict__ itself doesn't show up in the __dict__, for example.)
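To make the distinction concrete, here is a small sketch. It assumes CPython, where the names of a function's local variables can be read from its code object via __code__.co_varnames; the values only exist while a call is running.

```python
def func():
    x = 7
    a = 0
    return x + a

func.blah = 3  # an attribute stored on the function object itself

# Attributes live in the function's __dict__ ...
attribute_names = sorted(func.__dict__)

# ... while the *names* of locals are recorded on the code object.
local_names = func.__code__.co_varnames

print(attribute_names)  # ['blah']
print(local_names)      # ('x', 'a')
```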
The __dict__ attribute of a function object stores attributes assigned to the function object.
>>> def foo():
...     a = 2
...     return a
...
>>> foo.bar = 12
>>> foo.__dict__
{'bar': 12}
Attributes of the function object are not related to the local variables that exist during the function call. The former are unique (there is one __dict__ per function object); the latter are not (a function may be called multiple times, each call with its own separate local variables).
>>> def nfoo(n: int):
...     print(f'level {n} before:', locals())
...     if n > 0:
...         nfoo(n - 1)
...     print(f'level {n} after: ', locals())
...
>>> nfoo(2)
level 2 before: {'n': 2}
level 1 before: {'n': 1}
level 0 before: {'n': 0}
level 0 after:  {'n': 0}
level 1 after:  {'n': 1}
level 2 after:  {'n': 2}
Note how the locals of each level exist at the same time but hold separate values for the same name.
__dict__ is there to support function attributes; it is neither a namespace nor a symbol table for the function body's scope. You can read PEP 232 for further information regarding function attributes.
Suppose I have the following module:
blah.py
a = 1
someDict = {'a' : 1, 'b': 2, 'c' : 3}
In the next python session I get the following:
>>> from blah import a, someDict
>>> a
1
>>> someDict
{'a': 1, 'b': 2, 'c': 3}
>>> a = 100
>>> someDict['a'] = 100
>>> del a, someDict
>>> from blah import a, someDict
>>> a
1
>>> someDict['a']
100
>>> import blah
>>> blah.someDict['a']
100
It appears that when I modify an object that I imported from another module, and then re-import that object, it recovers its original value expressed in the module. But this doesn't apply to values in a dictionary. If I want to recover the original value of someDict after making any modification, I have to close the current python session and open a new one. I find that this is even true if I merely called a function that modifies the dict elements.
Why does this happen? And is there some way I can re-import the dictionary with its original value without starting a new python session?
Because you denamespaced the dict (with from x import y syntax), you need to do this as a two-step process (three including the necessary imports):
Do import importlib, blah to gain access to the reload function, and the actual module to call it on
Run importlib.reload(blah) to throw away the module cache of blah, and reread it afresh from disk (the fresh version is stored in the cache, so future imports related to blah see the new version)
Run from blah import a, someDict again to pull the refreshed contents of blah
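The steps above can be sketched end to end as follows. To keep the example self-contained it writes a throwaway module to a temporary directory; the module name blah_demo and the temporary path are assumptions made for this illustration, not part of the original setup.

```python
import importlib
import os
import sys
import tempfile

# Create a small module on disk (blah_demo is a made-up name for this sketch).
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "blah_demo.py"), "w") as fh:
    fh.write("a = 1\nsomeDict = {'a': 1, 'b': 2, 'c': 3}\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the new file is discoverable

import blah_demo
from blah_demo import someDict

someDict['a'] = 100                      # mutates the dict shared with the module
mutated_value = blah_demo.someDict['a']  # the module sees the change too

importlib.reload(blah_demo)              # re-executes the module's source
from blah_demo import someDict           # rebind the local name to the fresh dict
reloaded_value = someDict['a']

print(mutated_value, reloaded_value)     # 100 1
```

Any other name that was bound to the old dict before the reload still refers to the old, mutated object; only names rebound after the reload see the fresh one.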
The reason you didn't see a problem with a is that after doing from blah import a, a wasn't special; __main__.a was just another alias to blah.a, but since a = 100 rebinds a to a completely new int anyway (and since ints are immutable, even a += 100 would actually perform a rebinding), you never changed blah.a (you'd have to explicitly do import blah; blah.a = 100 to have that happen).
someDict was a problem because, like a, __main__.someDict and blah.someDict end up as aliases of the same dict, but this time you mutate that dict instead of rebinding __main__.someDict itself. If you want to avoid mutating blah's values in the first place, make sure the first modification to someDict rebinds it to a fresh dict, rather than modifying the one it's sharing with blah, e.g. instead of:
someDict['a'] = 100
do:
someDict = {**someDict, 'a': 100}
to make a fresh dict with a copy of blah.someDict, but with the value of 'a' in it replaced with a new value.
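One caveat worth sketching: {**someDict, ...} makes a shallow copy, so nested containers are still shared with the original. The dictionary contents below are invented for illustration; copy.deepcopy from the standard library is one way to get a fully independent copy.

```python
import copy

# Hypothetical stand-in for a module-level dict with a nested value.
original = {'a': 1, 'nested': {'x': 1}}

# Shallow copy with one key replaced: top-level keys are independent ...
shallow = {**original, 'a': 100}
# ... but the nested dict is the same object in both.
shallow['nested']['x'] = 99

# A deep copy duplicates nested containers as well.
deep = copy.deepcopy(original)
deep['nested']['x'] = 0

print(original)  # {'a': 1, 'nested': {'x': 99}}
print(deep)      # {'a': 1, 'nested': {'x': 0}}
```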
I understand that in python every thing, be it a number, string, dict or anything is an object. The variable name simply points to the object in the memory. Now according to this question,
>> a_dict = b_dict = c_dict = {}
This creates an empty dictionary and all the variables point to this dict object. So, changing any one would be reflected in the other variables.
>> a_dict["key"] = "value" #say
>> print a_dict
>> print b_dict
>> print c_dict
would give
{'key': 'value'}
{'key': 'value'}
{'key': 'value'}
I had understood the concept of variables pointing to objects, so this seems fair enough.
Now, even though it might seem weird to ask since it's such a basic statement, why does this happen?
>> a = b = c = 1
>> a += 1
>> print a, b, c
2 1 1 # and not 2 2 2
First part of question: Why isn't the same concept applied here ?
Actually this doubt came up when I was trying to search for a solution for this:
>> a_dict = {}
>> some_var = "old_value"
>> a_dict['key'] = some_var
>> some_var = "new_value"
>> print a_dict
{'key': 'old_value'} # and not {'key': 'new_value'}
This seemed counter-intuitive since I had always assumed that I am telling the dictionary to hold the variable, and changing the object that the variable was pointing to would obviously reflect in the dictionary. But this seems to me as if the value is being copied, not referenced. This was the second thing I didn't understand.
Moving on, i tried something else
>> class some_class(object):
..     def __init__(self):
..         self.var = "old_value"
>> some_object = some_class()
>> a_dict = {}
>> a_dict['key'] = some_object
>> some_object.var = "new_value"
>> print a_dict['key'].var
new_value # even though this was what I wanted and expected, it conflicts with the output in the previous code
Now, over here, obviously it was being referenced. These contradictions have left me squawking at the unpredictable nature of Python, even though I still love it, owing to the fact I don't know any other language well enough :p . I'd always imagined that assignment leads to a reference to the object, but these 2 cases are conflicting. So this is my final doubt. I understand that it might be one of those Python gotchas. Please educate me.
You're wrestling with 2 different things here. The first is the idea of mutability vs. immutability. In python, str, int, tuple are some of the builtin immutable types compared to list, dict (and others) which are mutable types. immutable objects are ones which cannot be changed once they are created. So, in your example:
a = b = c = 1
After that line, a, b and c all refer to the same integer in memory (you can check by printing their respective ids and noting that they are the same). However, when you do:
a += 1
a now refers to a new (different) integer at a different memory location. Note that as a convention, += should return a new instance of something if the type is immutable. If the type is mutable, it should change the object in place and return it. I explain some of the more gory detail in this answer.
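A short sketch of that convention, using the built-in int and list types (variable names are arbitrary): after +=, the int name is rebound to a different object, while the list is changed in place and stays shared.

```python
# Immutable: += produces a new object and rebinds only the name `a`.
a = b = 1
a += 1
ints_still_shared = a is b      # False: a is now 2, b is still 1

# Mutable: += extends the existing list in place.
x = y = [1]
x += [2]
lists_still_shared = x is y     # True: y sees the change as well

print(a, b, ints_still_shared)   # 2 1 False
print(x, y, lists_still_shared)  # [1, 2] [1, 2] True
```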
For the second part, you're trying to figure out how python's identifiers work. The way that I think of it is this... when you write a statement:
name = something
The right hand side is evaluated into some object (an integer, string, ...). That object is then given the name on the left hand side [1]. When a name is on the right hand side, the corresponding object is automatically "looked up" and substituted for the name in the calculation. Note that in this framework, assignment doesn't care if anything had that name before -- it simply overwrites the old value with the new one. Objects which were previously constructed using that name don't see any changes either. They've already been created, and they keep references to the objects themselves, not to the names. So:
a = "foo" # `a` is the name of the string "foo"
b = {"bar": a} # evaluate the new dictionary and name it `b`. `a` is looked up and returns "foo" in this calculation
a = "bar" # give the object "bar" the name `a` irrespective of what previously had that name
[1] I'm glossing over a few details here for simplicity -- e.g. what happens when you assign to a list element: lst[idx] = some_value * some_other_value.
This is because += can be interpreted as a = a + 1, which rebinds the variable a to the value a + 1, that is, 2.
Similarly, some_var = "new_value" rebinds the variable and the object is not changed, so the key, value pair in the dictionary still points to that object.
In your last example, you are not rebinding, but mutating the object, so the value is changed in the dictionary.
class test:
    def __init__(self):
        self.see=0
        self.dic={"1":self.see}
examine=test()
examine.see+=1
print examine.dic["1"]
print examine.see
this has as a result 0 and 1 and it makes no sense why.
print id(examine.dic["1"])
print id(examine.see)
they also have different memory addresses
However, if you use the same example with a list in see instead of an integer, you get the expected output.
Any explanations?
This gives the expected output:
class test:
    def __init__(self):
        self.see=[0]
        self.dic={"1":self.see}
examine=test()
examine.see[0]+=1
print examine.dic["1"][0]
print examine.see[0]
Short answer:
Arrays/lists are mutable whereas integers/ints are not.
Lists are mutable (they can be changed in place). When you change a list, the same object gets updated (the id doesn't change, because a new object is not needed).
Integers are immutable. This means that to change the value of something, you have to create a new object, which will have a different id. Strings work the same way, and you would have had the same "problem" if you had set self.see = 'a' and then done examine.see += ' b'
>>> a = 'a'
>>> id(a)
3075861968L
>>> z = a
>>> id(z)
3075861968L
>>> a += ' b'
>>> id(a)
3075385776L
>>> id(z)
3075861968L
>>> z
'a'
>>> a
'a b'
In Python, names point to values, and values are managed by Python. The id() function returns a unique identifier of the value, not of the name.
Any number of names can point to the same value. This means, you can have multiple names that are all linked to the same id.
When you first create your class object, the name see is pointing to an integer object whose value is 0. Then, when you create the dictionary dic, the "1" key is pointing to the same object that see was pointing to, which is 0.
Since 0 (an object of type int) is immutable, it cannot be changed in place. Whenever you "update" it, a new object is created and the name is rebound to that new object; this is why the return value of id() changes.
Python is smart enough to know that there are some other names pointing to the "old" value, and so it keeps that around in memory.
However, now you have two objects; and the dictionary is still pointing to the "old" one, and see is now pointing to the new one.
When you use a list, Python doesn't need to create a new object because it can modify a list without destroying it; because lists are mutable. Now when you create a list and point two names to it, both the names are pointing to the same object. When you update this object (by adding a value, or deleting a value or changing its value) the same object is updated - and so everything pointing to it will get the "updated" value.
examine.dic["1"] and examine.see are indeed separate references, even though the former initially refers to the same object as the latter.
With your case of using an array, you're not changing the value of examine.see: you're instead changing examine.see[0], which is changing the content of the array it points to (which is aliased to examine.dic["1"]).
When you do self.dic={"1":self.see}, the dict value is set to the value of self.see at that moment. When you later do examine.see += 1, you set examine.see to a new value. This has no effect on the dict because the dict was set to the value of self.see; it does not know to "keep watching" the name self.see to see if it is pointing to a different value.
If you set self.see to a list, and then do examine.see += [1], you are not setting examine.see to a new value, but are changing the existing value. This will be visible in the dict, because, again, the dict is set to the value, and that value can change.
The thing is that sometimes a += b sets a to a new value, and sometimes it changes the existing value. Which one happens depends on the type of a; you need to know what examine.see is to know what examine.see += something does.
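The two cases can be put side by side in a small sketch. The class below is a parameterized variant of the one in the question (taking the initial value as an argument is an assumption made for this illustration):

```python
class Test:
    def __init__(self, see):
        self.see = see
        # The dict stores a reference to the object `see` refers to right now.
        self.dic = {"1": self.see}

# int: += rebinds the attribute to a new int; the dict keeps the old one.
int_case = Test(0)
int_case.see += 1

# list: += mutates the existing list; the dict refers to that same list.
list_case = Test([0])
list_case.see += [1]

print(int_case.see, int_case.dic["1"])     # 1 0
print(list_case.see, list_case.dic["1"])   # [0, 1] [0, 1]
```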
Others have addressed the mutability/boxing question. What you seem to be asking for is late binding. This is possible, but a little counterintuitive and there's probably a better solution to your underlying problem… if we knew what it was.
class test:
    @property
    def dic(self):
        # refresh the entry from the current value of self.see on every access
        self._dic.update({'1': self.see})
        return self._dic
    def __init__(self):
        self.see = 0
        self._dic = {}
>>> ex=test()
>>> ex.see
0
>>> ex.see+=1
>>> ex.see
1
>>> ex.dic
{'1': 1}
>>> ex.see+=1
>>> ex.dic
{'1': 2}
In fact, in this contrived example it's even a little dangerous, because by returning self._dic you let the consumer modify the dict directly. But that's OK, because you don't need to do this in real life. If you want the value of self.see, just get the value of self.see.
In fact, it looks like this is what you want:
class test:
    _see = 0
    @property
    def see(self):
        # each access increments the counter and returns the new value
        self._see+=1
        return self._see
or, you know, just itertools.count() :P
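For reference, a minimal sketch of the itertools.count() alternative; starting at 1 is a choice made here so that it mirrors the property above, which returns 1 on first access.

```python
import itertools

counter = itertools.count(1)   # yields 1, 2, 3, ... on successive next() calls

first = next(counter)
second = next(counter)

print(first, second)  # 1 2
```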
This solution worked for me. Feel free to use it.
class integer:
    def __init__(self, integer):
        self.value=integer
    def plus(self):
        self.value=self.value+1
    def output(self):
        return self.value
The solution replaces the immutable type int with a mutable wrapper class, so the dictionary and the attribute can both refer to the same object.
Furthermore, you can make changes to the wrapper object and those changes are visible through whatever the dictionary points to. It behaves somewhat like a pointer to a data structure.
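A usage sketch of this wrapper idea follows. The class is restated (renamed Integer here) so the snippet runs on its own, and the dictionary key is arbitrary.

```python
class Integer:
    """Mutable box around an int, mirroring the wrapper class above."""
    def __init__(self, value):
        self.value = value
    def plus(self):
        self.value = self.value + 1
    def output(self):
        return self.value

see = Integer(0)
dic = {"1": see}     # the dict holds a reference to the same wrapper object

see.plus()           # mutate the wrapper in place; no rebinding happens

print(dic["1"].output(), see.output())  # 1 1
```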
Today I learned that Python caches the expression {}, and replaces it with a new empty dict when it's assigned to a variable:
print id({})
# 40357936
print id({})
# 40357936
x = {}
print id(x)
# 40357936
print id({})
# 40356432
I haven't looked at the source code, but I have an idea as to how this might be implemented. (Maybe when the reference count to the global {} is incremented, the global {} gets replaced.)
But consider this bit:
def f(x):
    x['a'] = 1
    print(id(x), x)
print(id(x))
# 34076544
f({})
# (34076544, {'a': 1})
print(id({}), {})
# (34076544, {})
print(id({}))
# 34076544
f modifies the global dict without causing it to be replaced, and it prints out the modified dict. But outside of f, despite the id being the same, the global dict is now empty!
What is happening??
It's not being cached -- if you don't assign the result of {} anywhere, its reference count is 0 and it's cleaned up right away. It just happened that the next one you allocated reused the memory from the old one. When you assign it to x you keep it alive, and then the next one has a different address.
In your function example, once f returns there are no remaining references to your dict, so it gets cleaned up too, and the same thing applies.
Python isn't doing any caching here. There are two possibilities when id() gives the same return value at different points in a program:
id() was called on the same object twice
The first object that id() was called on was garbage collected before the second object was created, and the second object was created in the same memory location as the original
In this case, it was the second one. This means that even though print id({}); print id({}) may print the same value twice, each call is on a distinct object.
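A small sketch of the reliable half of this: as long as two objects are alive at the same time, their ids are guaranteed to differ. Whether a freed object's id gets reused is an implementation detail, so the snippet deliberately does not assert anything about that.

```python
first = {}
second = {}

# Both dicts are alive simultaneously, so their ids must be distinct.
distinct_while_alive = id(first) != id(second)

# Same object via two names: same id.
alias = first
same_object = id(alias) == id(first)

print(distinct_while_alive, same_object)  # True True
```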