Dictionary (same value, different key) - python

Newbie Alert:
I'm new to Python. When I add values to a dict and then print the whole dictionary, I find the same value stored under every key.
Seems like a pointer issue?
Here's a snippet when using the event-based XML parser (SAX):
Basically, with every end element of "row", I'm storing the element by its key: self.Id, where self is the element.
def endElement(self, name):
    if name == "row":
        self.mapping[self.Id] = self
        print("Storing...: " + self.DisplayName + " at Id: " + self.Id)

You'll get the value self for every single entry in self.mapping, of course, since that's the only value you ever store there. Did you rather mean to take a copy/snapshot of self or some of its attributes at that point, then have self change before it gets stored again...?
Edit: as the OP has clarified (in comments) that they do indeed need to take a copy:
import copy
...
self.mapping[self.Id] = copy.copy(self)
or use copy.deepcopy(self) if self has, among its attributes, dictionaries, lists, etc. that need to be recursively copied. That would of course include self.mapping, leading to rather peculiar results: if the normal, shallow copy.copy is not sufficient, it's probably worth adding the special method __deepcopy__ to self's class to customize deep copying, to avoid the explosion of copies of copies of copies of ... that would normally result ;-).
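For what it's worth, here is a minimal sketch of what customizing deep copying could look like. The Handler class and its tags attribute are made up for illustration (they are not from the question); the only real API used is copy.deepcopy and the __deepcopy__ hook. The idea is to copy the per-row state recursively while sharing self.mapping instead of copying it.

```python
import copy


class Handler(object):
    """Hypothetical stand-in for the asker's SAX handler."""

    def __init__(self):
        self.Id = None
        self.tags = []      # per-row state that should be copied
        self.mapping = {}   # the store itself, which should NOT be copied

    def __deepcopy__(self, memo):
        clone = Handler.__new__(Handler)
        memo[id(self)] = clone
        clone.Id = self.Id
        clone.tags = copy.deepcopy(self.tags, memo)
        clone.mapping = self.mapping  # shared on purpose, no recursion
        return clone


handler = Handler()
handler.Id = "1"
handler.tags.append("x")
snapshot = copy.deepcopy(handler)
handler.mapping[handler.Id] = snapshot

handler.tags.append("y")   # later changes do not reach the snapshot
print(snapshot.tags)       # ['x']
```

Sharing the mapping is a design choice in this sketch; whether it suits the real handler depends on attributes the question doesn't show.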

If I understand what you're saying, then this is probably expected behaviour. When you make an assignment in Python, you're just assigning the reference (sort of like a pointer). When you do:
self.mapping[self.Id] = self
then future changes to self will be reflected in the value for that mapping you just did. Python does not "copy" objects (unless you specifically write code to do so), it only assigns references.
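A small sketch of the difference, assuming a simplified handler (the Row class, end_row method, and the two dict names are invented for the demo, not taken from the question):

```python
import copy


class Row(object):
    """Hypothetical, simplified version of the handler in the question."""

    def __init__(self):
        self.Id = None
        self.DisplayName = None
        self.by_reference = {}
        self.by_copy = {}

    def end_row(self):
        # Stores a reference to the very same object every time.
        self.by_reference[self.Id] = self
        # Stores a shallow snapshot of the attributes as they are right now.
        self.by_copy[self.Id] = copy.copy(self)


handler = Row()
for row_id, name in [("1", "Alice"), ("2", "Bob")]:
    handler.Id = row_id
    handler.DisplayName = name
    handler.end_row()

names_by_reference = [handler.by_reference[k].DisplayName for k in ("1", "2")]
names_by_copy = [handler.by_copy[k].DisplayName for k in ("1", "2")]
print(names_by_reference)  # ['Bob', 'Bob']
print(names_by_copy)       # ['Alice', 'Bob']
```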


Why does Python return None on list.reverse()?

Was solving an algorithms problem and had to reverse a list.
When done, this is what my code looked like:
def construct_path_using_dict(previous_nodes, end_node):
    constructed_path = []
    current_node = end_node
    while current_node:
        constructed_path.append(current_node)
        current_node = previous_nodes[current_node]
    constructed_path = reverse(constructed_path)
    return constructed_path
But, along the way, I tried return constructed_path.reverse() and I realized it wasn't returning a list...
Why was it made this way?
Shouldn't it make sense that I should be able to return a reversed list directly, without first doing list.reverse() or list = reverse(list) ?
What I'm about to write was already said here, but I'll write it anyway because I think it will perhaps add some clarity.
You're asking why the reverse method doesn't return a (reference to the) result, and instead modifies the list in-place. In the official python tutorial, it says this on the matter:
You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.
In other words (or at least, this is the way I think about it), Python tries to mutate in place wherever possible (that is, when dealing with a mutable data structure), and when it mutates in place, it doesn't also return a reference to the list, because then it would appear that it is returning a new list, when it is really returning the old list.
To be clear, this is only true for object methods, not functions that take a list, for example, because the function has no way of knowing whether or not it can mutate the iterable that was passed in. Are you passing a list or a tuple? The function has no way of knowing, unlike an object method.
list.reverse reverses in place, modifying the list it was called on. Generally, Python methods that operate in place don’t return what they operated on to avoid confusion over whether the returned value is a copy.
You can reverse and return the original list:
constructed_path.reverse()
return constructed_path
Or return a reverse iterator over the original list, which isn’t a list but doesn’t involve creating a second list just as big as the first:
return reversed(constructed_path)
Or return a new list containing the reversed elements of the original list:
return constructed_path[::-1]
# equivalent: return list(reversed(constructed_path))
If you’re not concerned about performance, just pick the option you find most readable.
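To make the three options concrete, here is a short sketch on a throwaway list (the variable names are just for illustration):

```python
path = [1, 2, 3]

result = path.reverse()     # reverses in place and returns None
print(result)               # None
print(path)                 # [3, 2, 1]

iterator = reversed(path)   # lazy iterator; path itself is untouched
from_iterator = list(iterator)
print(from_iterator)        # [1, 2, 3]

sliced = path[::-1]         # brand-new list with the elements reversed
print(sliced)               # [1, 2, 3]
print(sliced is path)       # False
```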
methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.
PyDocs 5.1
As I understand it, you can see the distinction quickly by comparing what happens when you mutate a list (mutable), e.g. with list.reverse(), and when you mutate a list that's an element of a tuple (immutable), while calling
id(list)
id(tuple_with_list)
before and after the mutations. Having mutations of mutable data types return None is part of allowing them to be changed, expanded, or pointed to by multiple references without reallocating memory.
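One way to run that experiment, as a rough sketch (items and holder are made-up names). In both cases the identity reported by id() is unchanged, which is the point: the object is mutated, not replaced.

```python
items = [1, 2, 3]
holder = (items, "label")   # a tuple that contains the list

id_list_before = id(items)
id_tuple_before = id(holder)

items.reverse()             # in-place mutation of the list
holder[0].append(0)         # mutating the list through the tuple

same_list = id(items) == id_list_before
same_tuple = id(holder) == id_tuple_before
print(same_list, same_tuple)  # True True
print(holder[0])              # [3, 2, 1, 0]
```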

`dict.popitem` with filtered data and a conditional flag

I am implementing something that uses a dictionary to store data. In addition to the normal data, it also stores some internal data, all prefixed with _. However, I want to isolate the user of the library from this data, since they are normally not concerned with it. Additionally, I need to set a modified flag in my class to track whether the data was modified.
For all interface functions this worked nicely, here are two examples, one with and one without modification. Note that in this case, I do not hide internal data, because it is intentionally demanded as a key:
def __getitem__(self, key):
    return self._data[key]
def __setitem__(self, key, value):
    self.modified = True
    self._data[key] = value
On some functions, e.g. __iter__, I filter out everything that starts with _ before I yield the data.
But a single function causes real problems here: popitem. In its normal behaviour it would just withdraw an arbitrary item and return it while deleting it from the dict. However, here comes the problem: without deep internal knowledge, I don't know beforehand which item will be returned. But I know that popitem follows the same rules as items and keys. So I came up with an implementation:
keys = self._data.keys()
for k in keys:
    if k.startswith("_"):
        continue
    v = self._data.pop(k)
    self.modified = True
    return k, v
else:
    raise KeyError('popitem(): dictionary is empty')
This implementation works. But it feels too unpythonic and not at all dynamic or clean. I also struggled with the idea of raising the exception like this: {}.popitem(), which looks totally insane but would at least give me a dynamic way (e.g. if the exception message or type ever changes, I don't have to adjust).
What I am now after is a cleaner and less crazy way to solve this problem. There would be a way of removing the internal data from the dict, but I'd only take this road as a last resort. So do you have any recipes or ideas for this?
Give your objects two dict attributes: self._data and self._internal_data. Then forward all the dict methods to self._data, and you won't have to filter out anything.
edit: Okay, I missed the "last resort" bit at the end. But I suspect that managing two dicts will be far easier than "fixing" every single dict method and operator. :)
Subclass dict rather than wrapping a dictionary. You'll need to implement a lot less stuff.
Store your "internal data" as attributes on the object, not in the dictionary. This way they are easy to get to if you need them, but won't appear in ordinary iteration. If at some point you need to combine them, do that with x = dict(self); x.update(self.__dict__) to create a new dictionary having both sets of values.
If you do want to store your internal data as a dictionary, embed that one. Implement __missing__ on your main object so you can grab items from the internal dictionary if they're not found in the main one.
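A rough sketch of the "subclass dict, keep internal data as attributes" idea. TrackedDict and its internal attribute are invented names; only __setitem__ and popitem are wrapped here, and other mutators (update, pop, setdefault, clear, del) would need the same treatment in a real class.

```python
class TrackedDict(dict):
    """Hypothetical dict subclass that records whether it was modified."""

    def __init__(self, *args, **kwargs):
        super(TrackedDict, self).__init__(*args, **kwargs)
        self.modified = False
        self.internal = {}   # internal data lives outside the dict contents

    def __setitem__(self, key, value):
        self.modified = True
        super(TrackedDict, self).__setitem__(key, value)

    def popitem(self):
        # dict.popitem raises the standard KeyError when empty,
        # so no hand-written error message is needed.
        item = super(TrackedDict, self).popitem()
        self.modified = True
        return item


d = TrackedDict(a=1)
d.internal["_version"] = 3
visible_keys = list(d)        # internal data never shows up in iteration
popped = d.popitem()
print(visible_keys, popped, d.modified)  # ['a'] ('a', 1) True
```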
Well, the logic's correct, you could reduce it to something like:
self._data.pop(next((key for key in self._data if not key.startswith('_')), 'popitem(): dictionary is empty'))
So, find the next key in self._data that doesn't start with _, otherwise default it to a key that isn't going to match any of the other keys in the dictionary so that when the pop fails, you automatically get the KeyError thrown (with your "error message")
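Note that dict.pop returns only the value, so the one-liner above does not give back a (key, value) pair the way popitem does, and it does not set the modified flag. A sketch that keeps the same next() trick but returns the pair (the Store class is a made-up wrapper standing in for the asker's class):

```python
class Store(object):
    """Hypothetical wrapper around a dict with '_'-prefixed internal keys."""

    def __init__(self, data):
        self._data = dict(data)
        self.modified = False

    def popitem(self):
        missing = 'popitem(): dictionary is empty'
        # First public key, or a sentinel string that is assumed
        # not to be a real key in the dictionary.
        key = next((k for k in self._data if not k.startswith('_')), missing)
        value = self._data.pop(key)   # raises KeyError if nothing public is left
        self.modified = True
        return key, value


store = Store({'_meta': 1, 'a': 2})
first = store.popitem()
print(first)            # ('a', 2)
print(store.modified)   # True
```

Calling store.popitem() again raises KeyError, and '_meta' stays in the underlying dict.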

Use python dict to lookup mutable objects

I have a bunch of File objects, and a bunch of Folder objects. Each folder has a list of files. Now, sometimes I'd like to look up which folder a certain file is in. I don't want to traverse all folders and files, so I create a lookup dict file -> folder.
folder = Folder()
myfile = File()
folder_lookup = {}
# This is pseudocode, I don't actually reach into the Folder
# object, but have an appropriate method
folder.files.append(myfile)
folder_lookup[myfile] = folder
Now, the problem is, the files are mutable objects. My application is built around that fact. I change properties on them, and the GUI is notified and updated accordingly. Of course you can't put mutable objects in dicts. So what I tried first is to generate a hash based on the current content, basically:
def __hash__(self):
    return hash((self.title, ...))
This didn't work of course, because when the object's contents changed its hash (and thus its identity) changed, and everything got messed up. What I need is an object that keeps its identity, although its contents change. I tried various things, like making __hash__ return id(self), overriding __eq__, and so on, but never found a satisfying solution. One complication is that the whole construction should be picklable, so that means I'd have to store id on creation, since it could change when pickling, I guess.
So I basically want to use the identity of an object (not its state) to quickly look up data related to the object. I've actually found a really nice pythonic workaround for my problem, which I might post shortly, but I'd like to see if someone else comes up with a solution.
I felt dirty writing this. Just put folder as an attribute on the file.
class dodgy(list):
    def __init__(self, title):
        self.title = title
        super(dodgy, self).__init__()
        # a per-instance class object whose hash never changes
        self.store = type("store", (object,), {"blanket" : self})
    def __hash__(self):
        return hash(self.store)
innocent_d = {}
dodge_1 = dodgy("dodge_1")
dodge_2 = dodgy("dodge_2")
innocent_d[dodge_1] = dodge_1.title
innocent_d[dodge_2] = dodge_2.title
print(innocent_d[dodge_1])
dodge_1.extend(range(5))
dodge_1.title = "oh no"
print(innocent_d[dodge_1])
OK, everybody noticed the extremely obvious workaround (that took me some days to come up with): just put an attribute on File that tells you which folder it is in. (Don't worry, that is also what I did.)
But, it turns out that I was working under wrong assumptions. You are not supposed to use mutable objects as keys, but that doesn't mean you can't (diabolic laughter)! The default implementation of __hash__ returns a unique value, probably derived from the object's address, that remains constant in time. And the default __eq__ follows the same notion of object identity.
So you can put mutable objects in a dict, and they work as expected (if you expect equality based on instance, not on value).
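A quick sketch of that behaviour, with bare-bones File and Folder classes assumed for the demo (the real classes aren't shown in the question). Neither class defines __hash__ or __eq__, so the defaults apply.

```python
class File(object):
    def __init__(self, title):
        self.title = title


class Folder(object):
    def __init__(self, name):
        self.name = name
        self.files = []


folder = Folder("docs")
myfile = File("draft")
twin = File("draft")          # same content, different instance

folder.files.append(myfile)
folder_lookup = {myfile: folder}

myfile.title = "final"        # mutate the key object after insertion
found = folder_lookup[myfile]
print(found is folder)        # True: lookup goes by identity, not content
print(twin in folder_lookup)  # False: equal content is not enough
```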
See also: I'm able to use a mutable object as a dictionary key in python. Is this not disallowed?
I was having problems because I was pickling/unpickling the objects, which of course changed the hashes. One could generate a unique ID in the constructor, and use that for equality and deriving a hash to overcome this.
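Something along these lines might work for the "unique ID in the constructor" idea. This is only a sketch: the _uid attribute name and the use of uuid4 are my assumptions, and copy.deepcopy is used here as a stand-in for a pickle round trip, since both rebuild the object at a new address while carrying its attributes over.

```python
import copy
import uuid


class File(object):
    def __init__(self, title):
        self.title = title
        self._uid = uuid.uuid4().hex   # assigned once, travels with the object

    def __eq__(self, other):
        return isinstance(other, File) and self._uid == other._uid

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self._uid)


original = File("draft")
lookup = {original: "folder A"}

clone = copy.deepcopy(original)   # new object, same _uid
clone.title = "final"             # content changes do not affect the hash

print(clone is original)          # False
print(lookup[clone])              # folder A
```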
(For the curious, as to why such a "lookup based on instance identity" dict might be necessary: I've been experimenting with a kind of "object database". You have pure Python objects, put them in lists/containers, and can define indexes on attributes for faster lookup, complex queries and so on. For foreign keys (1:n relationships) I can just use containers, but for the backlink I have to come up with something clever if I don't want to modify the objects on the n side.)

memory management with objects and lists in python

I am trying to understand how exactly assignment operators, constructors and parameters passed in functions work in python specifically with lists and objects. I have a class with a list as a parameter. I want to initialize it to an empty list and then want to populate it using the constructor. I am not quite sure how to do it.
Let's say my class is --
class A:
    List = []  # Point 1
    def __init1__(self, begin=[]):  # Point 2
        for item in begin:
            self.List.append(item)
    def __init2__(self, begin):  # Point 3
        List = begin
    def __init3__(self, begin=[]):  # Point 4
        List = list()
        for item in begin:
            self.List.append(item)
listObj = A()
del(listObj)
b = listObj
I have the following questions. It will be awesome if someone could clarify what happens in each case --
1. Is declaring an empty list like in Point 1 valid? What is created? A variable pointing to NULL?
2. Which of Point 2 and Point 3 are valid constructors? In Point 3 I am guessing that a new copy of the list passed in (begin) is not made and instead the variable List will be pointing to the pointer "begin". Is a new copy of the list made if I use the constructor as in Point 2?
3. What happens when I delete the object using del? Is the list deleted as well or do I have to call del on the List before calling del on the containing object? I know Python uses GC but if I am concerned about cleaning unused memory even before GC kicks in is it worth it?
4. Also assigning an object of type A to another only makes the second one point to the first, right? If so how do I do a deep copy? Is there a feature to overload operators? I know Python is probably much simpler than this and hence the question.
EDIT:
5. I just realized that using Point 2 and Point 3 does not make a difference. The items from the list begin are only copied by reference and a new copy is not made. To do that I have to create a new list using list(). This makes sense after I see it I guess.
Thanks!
In order:
1. Using this form is simply syntactic sugar for calling the list constructor, i.e. you are creating a new (empty) list. It is bound to the class itself (like a static field) and is the same object for all instances.
2. Apart from the constructor name, which must always be __init__, both are valid forms, but they mean different things.
The first constructor can be called with or without a list as argument. If it is called without arguments, the empty list passed as default is used (this empty list is created once during class definition, and not once per constructor call), so no items are added to the static list.
The second must be called with a list argument, or Python will complain with an error. But by using List without the self. prefix, as you are doing, it just creates a new local variable named List, accessible only within the constructor, and leaves the static A.List variable unchanged.
3. Deleting only unlinks a reference to the object, without actually deleting anything. Once all references are removed, however, the garbage collector is free to reclaim the memory as needed.
It is usually a bad idea to try to control the garbage collector. Instead, just make sure you don't hold references to objects you no longer need and let it do its work.
4. Assigning an object to a variable only creates a new reference to the same object, yes. To create a deep copy, use the functions in the copy module (such as copy.deepcopy) or write your own.
Operator overloading (use with care, it can make things more confusing instead of clearer if misused) can be done by overriding some special methods in the class definition.
About your edit: as I pointed out above, when writing List = list() inside the constructor without the self. (or better, since the variable is static, A.) prefix, you are just creating a new local variable, not overriding the one you defined in the class body.
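Points 3 and 4 can be illustrated with a small sketch. The Box class is invented for the example; copy.deepcopy and the __add__ special method are the standard mechanisms for deep copies and operator overloading.

```python
import copy


class Box(object):
    """Hypothetical container used only for this illustration."""

    def __init__(self, items=None):
        self.items = list(items) if items is not None else []

    def __add__(self, other):
        # Overloads the + operator for Box objects.
        return Box(self.items + other.items)


a = Box([1, 2])
b = a                  # a second reference to the same object
del a                  # removes the name 'a' only; the object lives on
print(b.items)         # [1, 2]

c = copy.deepcopy(b)   # independent copy, including the inner list
c.items.append(3)
print(b.items)         # [1, 2]
print(c.items)         # [1, 2, 3]

d = b + c              # calls Box.__add__
print(d.items)         # [1, 2, 1, 2, 3]
```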
For reference, the usual way to handle a list as default argument is by using a None placeholder:
class A(object):
    def __init__(self, arg=None):
        self.startvalue = list(arg) if arg is not None else list()
        # making a defensive copy of arg to keep the original intact
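To see why this matters, here is a sketch contrasting the shared class-level list and the create-once default argument with the None placeholder. The class and function names (Shared, Separate, append_to) are made up for the demonstration.

```python
class Shared(object):
    items = []                       # one list, stored on the class

    def __init__(self, begin=[]):    # the default list is created only once
        for item in begin:
            self.items.append(item)  # appends to the class-level list


class Separate(object):
    def __init__(self, begin=None):
        # Each instance gets its own list (a copy of begin, if given).
        self.items = list(begin) if begin is not None else []


s1 = Shared([1])
s2 = Shared([2])
print(s1.items is s2.items)   # True
print(s1.items)               # [1, 2]

p1 = Separate([1])
p2 = Separate([2])
print(p1.items, p2.items)     # [1] [2]


def append_to(value, bucket=[]):
    # 'bucket' is the same list object on every call that omits it.
    bucket.append(value)
    return bucket


first = append_to(1)
second = append_to(2)
print(first is second, first)  # True [1, 2]
```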
As an aside, do take a look at the python tutorial. It is very well written and easy to follow and understand.
"It will be awesome if someone could clarify what happens in each case": isn't that the purpose of the dis module?
http://docs.python.org/2/library/dis.html

cache in python function

This appeared as some test question.
If you consider this function which uses a cache argument as the 1st argument
def f(cache, key, val):
    cache[key] = val
    # insert some insanely complicated operation on the cache
    print(cache)
and now create a dictionary and use the function like so:
c = {}
f(c,"one",1)
f(c,"two",2)
this seems to work as expected (i.e. adding to the c dictionary), but is it actually passing that reference or is it doing some inefficient copy?
The dictionary passed to cache is not copied. As long as the cache variable is not rebound inside the function, it stays the same object, and modifications to the dictionary it refers to will affect the dictionary outside.
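A small sketch of the difference between mutating the passed-in dictionary and rebinding the parameter name (mutate and rebind are made-up function names):

```python
def mutate(cache, key, val):
    cache[key] = val        # changes the caller's dictionary


def rebind(cache, key, val):
    cache = {key: val}      # rebinds the local name only
    return cache


c = {}
mutate(c, "one", 1)
other = rebind(c, "two", 2)

print(c)       # {'one': 1}
print(other)   # {'two': 2}
```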
There is not even any need to return cache in this case (and indeed the sample code does not).
It might be better if f was a method on a dictionary-like object, to make this more conceptually clear.
If you use the id() function (built-in, does not need to be imported) you can get a unique identifier for any object. You can use that to confirm that you are really and truly dealing with the same object and not any sort of copy.
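For instance, a variant of f that returns id(cache) lets you compare it with the id seen by the caller. This is just a sketch; returning the id is added purely for the check.

```python
def f(cache, key, val):
    cache[key] = val
    return id(cache)   # identity of the object as seen inside the function


c = {}
outer_id = id(c)
inner_id_one = f(c, "one", 1)
inner_id_two = f(c, "two", 2)

print(outer_id == inner_id_one == inner_id_two)  # True: no copy was made
print(c == {"one": 1, "two": 2})                 # True
```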
