This appeared as some test question.
Consider this function, which takes a cache as its first argument:
def f(cache, key, val):
    cache[key] = val
    # insert some insanely complicated operation on the cache
    print(cache)
Now create a dictionary and use the function like so:
c = {}
f(c,"one",1)
f(c,"two",2)
This seems to work as expected (i.e. it adds to the c dictionary), but is it actually passing a reference, or is it making some inefficient copy?
The dictionary passed to cache is not copied. As long as the cache variable is not rebound inside the function, it stays the same object, and modifications to the dictionary it refers to will affect the dictionary outside.
There is not even any need to return cache in this case (and indeed the sample code does not).
It might be better if f were a method on a dictionary-like object, to make this conceptually clearer.
If you use the id() function (built-in, does not need to be imported) you can get a unique identifier for any object. You can use that to confirm that you are really and truly dealing with the same object and not any sort of copy.
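A small sketch of that check, based on f from the question. The seen_ids list is a hypothetical helper added only to record which object the function actually received on each call:

```python
seen_ids = []  # hypothetical helper: identities of the objects f receives

def f(cache, key, val):
    cache[key] = val
    seen_ids.append(id(cache))  # record the identity of the object passed in

c = {}
f(c, "one", 1)
f(c, "two", 2)

print(c)                                  # {'one': 1, 'two': 2}
print(all(i == id(c) for i in seen_ids))  # True: no copy was ever made
```

Every recorded id equals id(c), so both calls operated on the caller's dictionary itself.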
Is there a simple/elegant way to pass an object to a function in a way that does not let it be modified from inside that function? As an example, for a passed array, it should not be possible to perform
array[2]=1
and change the outer version of the array. That was an example, but I am looking for a solution that does not only work for arrays.
I have considered passing a copy, but this feels a little inelegant: it requires constant use of the copy module, and the constant computational effort of copying the object(s).
It is not an option to make the object somehow unmodifiable as a whole, because I do want it to be modified outside of the given function.
The short answer is no. There will always be a way to somehow access the object and update it.
That said, there are ways to discourage mutation of the object. Here are a few.
Pass an immutable object
If some data should not be updated past some point in the execution of your program, you should make it an immutable object.
my_data = []
populate_my_data(my_data) # this part of the code mutates 'my_data'
my_final_data = tuple(my_data)
Since my_final_data is a tuple, it cannot be mutated. Be aware that mutable objects contained in my_final_data can still be mutated.
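Both points can be seen in a short sketch; the literal list below is hypothetical sample data standing in for whatever populate_my_data would produce:

```python
my_data = [1, 2, [3, 4]]        # hypothetical stand-in for the populated list
my_final_data = tuple(my_data)

try:
    my_final_data[0] = 99       # tuples do not support item assignment
except TypeError as exc:
    print("blocked:", exc)

# The tuple itself is immutable, but a mutable element inside it is not.
my_final_data[2].append(5)
print(my_final_data)            # (1, 2, [3, 4, 5])
```

Note that the inner list is the same object in my_data and my_final_data, so the append is visible through both.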
Create a view on the object
Instead of passing the object itself, you could provide a view on the object. A view is some instance which provides ways to read your object, but not update it.
Here is how you could define a simple view on a list.
class ListView:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, item):
        return self._data[item]
The above provides an interface for reading from the list with __getitem__, but none to update it.
my_list = [1, 2, 3]
my_list_view = ListView(my_list)
print(my_list_view[0]) # 1
my_list_view[0] = None # TypeError: 'ListView' object does not support item assignment
Once again, it is not completely impossible to mutate your data: one could access it through the field my_list_view._data. However, this design makes it reasonably clear that the list should not be mutated at that point.
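For dictionaries, the standard library already provides this kind of view: types.MappingProxyType (Python 3.3+) wraps a dict in a read-only proxy without copying it. A minimal sketch; the names settings and show are only illustrative:

```python
from types import MappingProxyType

settings = {"debug": False}
read_only = MappingProxyType(settings)  # a read-only view, not a copy

def show(view):
    print(view["debug"])
    try:
        view["debug"] = True            # the proxy rejects item assignment
    except TypeError:
        print("cannot modify through the view")

show(read_only)
settings["debug"] = True                # the owner can still mutate the dict...
print(read_only["debug"])               # ...and the view reflects it: True
```

As with ListView, mutable values stored inside the mapping can still be mutated through the proxy; only the mapping itself is protected.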
In C++, you can pass a parameter by const reference, which prevents the function from altering the given object:
#include <string>

std::string concatenate(const std::string& a, const std::string& b)
{
    return a + b;
}
However, Python has no equivalent of this usage. To handle the task, pass the function a copy made with the copy module.
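A minimal sketch of the copying approach; process is a hypothetical function name. copy.deepcopy is used so that nested objects are copied too, at the cost of copying the whole structure on every call:

```python
import copy

def process(data):
    # Work on a private deep copy so the caller's object is never touched.
    data = copy.deepcopy(data)
    data[2] = 1
    return data

array = [0, 0, [0]]
result = process(array)
print(array)   # [0, 0, [0]]  (unchanged)
print(result)  # [0, 0, 1]
```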
Assume we have an object a and we want to modify data that is structured like this:
a.substructure1.subsubstructure1.name_of_the_data1
and this
a.substructure2.subsubstructure2.name_of_the_data2
To access this structure, we call an external method get_the_data_shortcut(a), which is heavily parameterized (for example, the parameter subsstructure specifies which substructure to return). This seems very redundant, but there are sensible defaults for all these parameters. Also, this function will return another branch of data if the default branch is not available.
How do I modify the data returned by get_the_data_shortcut(a)?
b = get_the_data_shortcut(a)
b = b + 1
Then get_the_data_shortcut(a) is unchanged because, well, Python is not Java.
Do I need a setter? This is mostly not my code; it was written by people who write Pythonic code, and I am trying to keep up with their standards.
As you discovered, making b refer to a different object won't modify the a object (or its substructures). If you want to do this, you will need a method similar to your get_the_data_shortcut(a), namely a
set_the_data_shortcut(a, newvalue)
Alternatively, you could have a method that returns the substructure the value is stored in, and manipulate that:
# returns a.substructure2.subsubstructure2
# or a.substructure1.subsubstructure1 based on the value of kind
substruct = get_the_substructure(a, kind)
substruct.name_of_the_data1 += 1
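Both options can be sketched with a hypothetical stand-in for a built from types.SimpleNamespace. The kind parameter and both function bodies are assumptions for illustration, not the real library's code:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the real object `a`.
a = SimpleNamespace(
    substructure1=SimpleNamespace(subsubstructure1=SimpleNamespace(name_of_the_data1=1)),
    substructure2=SimpleNamespace(subsubstructure2=SimpleNamespace(name_of_the_data2=2)),
)

def get_the_substructure(a, kind=1):
    # Return the object that *holds* the value, so callers can mutate it.
    if kind == 1:
        return a.substructure1.subsubstructure1
    return a.substructure2.subsubstructure2

def set_the_data_shortcut(a, newvalue, kind=1):
    # Setter counterpart: write the attribute on the holding object.
    name = "name_of_the_data1" if kind == 1 else "name_of_the_data2"
    setattr(get_the_substructure(a, kind), name, newvalue)

set_the_data_shortcut(a, 10)                       # option 1: a setter
get_the_substructure(a, 2).name_of_the_data2 += 1  # option 2: mutate the holder

print(a.substructure1.subsubstructure1.name_of_the_data1)  # 10
print(a.substructure2.subsubstructure2.name_of_the_data2)  # 3
```

Both work for the same reason: the assignment targets an attribute of an object that a still refers to, instead of rebinding a local name.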
Python uses reference types, just like Java.
However, when you do
b = b + 1
you are not updating the object you have. Instead, you are creating a new object and assigning it to the variable b.
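id() makes the difference between rebinding a name and mutating an object visible. The values below are arbitrary:

```python
x = 5
b = x
print(id(b) == id(x))  # True: both names refer to the same int object

b = b + 1              # builds a new int and rebinds b to it
print(b, x)            # 6 5
print(id(b) == id(x))  # False: x still refers to the original object

items = [1]
alias = items
alias += [2]           # += on a list mutates it in place...
print(items)           # [1, 2]  ...so every reference sees the change
```

Integers are immutable, so b + 1 can only produce a new object; a list's += modifies the existing one.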
If you want to update the value of b in the data structure, you should follow your suggestion and write a setter for the data structure.
Suppose I want Perl-like autovivification in Python, i.e.:
>>> d = Autovivifier()
>>> d['nested']['key']['value'] = 10
>>> d
{'nested': {'key': {'value': 10}}}
There are a couple of dominant ways to do that:
Use a recursive default dict
Use a __missing__ hook to return the nested structure
OK -- easy.
Now suppose I want to return a default value from a dict with a missing key. Once again, there are a few ways to do that:
For a non-nested path, you can use a __missing__ hook
try/except block wrapping the access to potentially missing key path
Use {}.get(key, default), which does not easily work with a nested dict: there is no version of autoviv.get(['nested']['key']['no key of this value'], default)
The two goals seem to be in irreconcilable conflict (based on my trying to work this out for the last couple of hours).
Here is the question:
Suppose I want to have an Autovivifying dict that 1) creates the nested structure for d['arbitrary']['nested']['path']; AND 2) returns a default value from a non-existing arbitrary nesting without wrapping that in try/except?
Here are the issues:
The call d['nested']['key']['no key of this value'] is equivalent to (d['nested'])['key']['no key of this value']. Overriding __getitem__ does not work without returning an object that ALSO overrides __getitem__.
Both methods for creating an Autovivifier will create a dict entry if you test that path for existence. That is, I do not want if d['p1']['sp2']['etc.'] to create that whole path just because it is tested with the if.
How can I provide a dict in Python that will:
Create an access path of the type d['p1']['p2'][etc]=val (autovivification);
NOT create that same path if you test for existence;
Return a default value (like {}.get(key, default)) without wrapping in try/except
I do not need the FULL set of dict operations. Really only d['nested']['key']['value']=val, and d['nested']['key']['no key of this value'] being equal to a default value. I would prefer that testing d['nested']['key']['no key of this value'] does not create it, but would accept that.
To create a recursive tree of dictionaries, use defaultdict with a trick:
from collections import defaultdict
tree = lambda: defaultdict(tree)
Then you can create your x with x = tree().
above from #BrenBarn -- defaultdict of defaultdict, nested
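A short usage sketch of that trick, written with def instead of an assigned lambda (behaviour is the same). It also shows the drawback raised in the question: merely reading a missing path creates it.

```python
from collections import defaultdict

def tree():
    # Each missing key produces another tree, so arbitrary paths can be assigned.
    return defaultdict(tree)

x = tree()
x['nested']['key']['value'] = 10
print(x['nested']['key']['value'])  # 10

_ = x['p1']['sp2']                  # only a read...
print('p1' in x)                    # True: ...but the path now exists
```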
Don't do this. It could be solved much more easily by just writing a class that has the operations you want, and even in Perl it's not a universally praised feature.
But, well, it is possible, with a custom autoviv class. You'd need a __getitem__ that returns an empty autoviv dict but doesn't store it. The new autoviv dict would remember the autoviv dict and key that created it, then insert itself into its parent only when a "real" value is stored in it.
Since an empty dict tests as falsey, you could then test for existence Perl-style, without ever actually creating the intermediate dicts.
But I'm not going to write the code out, because I'm pretty sure this is a terrible idea.
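For illustration only, here is a minimal sketch of the lazy approach described above. The class name LazyAutoviv is hypothetical, and the sketch ignores edge cases: for example, two detached children created for the same missing key do not merge, and dict.update bypasses the attach logic.

```python
class LazyAutoviv(dict):
    """Missing keys return an empty, detached child that remembers its parent
    and key, and inserts itself into the parent only when a value is stored."""

    def __init__(self, parent=None, key=None):
        super().__init__()
        self._parent = parent
        self._key = key

    def __missing__(self, key):
        # Hand back a detached child; nothing is stored in self yet.
        return LazyAutoviv(parent=self, key=key)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._attach()

    def _attach(self):
        # Storing self in the parent calls the parent's __setitem__,
        # which attaches the parent in turn, all the way up to the root.
        if self._parent is not None and self._key not in self._parent:
            self._parent[self._key] = self

d = LazyAutoviv()
d['nested']['key']['value'] = 10
print(d)                     # {'nested': {'key': {'value': 10}}}
print(bool(d['p1']['sp2']))  # False: an empty child tests as falsey
print('p1' in d)             # False: testing did not create the path
```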
While it does not precisely match the dictionary protocol in Python, you could achieve reasonable results by implementing your own autovivification class that receives the whole path at once in __getitem__: v["a", "b"] passes the tuple ("a", "b") as a single key argument. Something like (2.x):
class ExampleVivifier(object):
    """ Small example class to show how to use varargs in __getitem__. """
    def __getitem__(self, *args):
        print(args)
Example usage would be:
>>> v = ExampleVivifier()
>>> v["nested", "dictionary", "path"]
(('nested', 'dictionary', 'path'),)
You can fill in the blanks to see how you can achieve your desired behaviour here.
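One way the blanks might be filled in, as a sketch only (written for Python 3; the class name PathDict and the default parameter are hypothetical). Writes create intermediate levels, while reads walk the path without creating anything and fall back to the default:

```python
class PathDict:
    """Nested storage addressed by tuple keys, e.g. d['a', 'b', 'c']."""

    def __init__(self, default=None):
        self._root = {}
        self._default = default

    @staticmethod
    def _as_path(key):
        # d['a', 'b'] passes the tuple ('a', 'b') as a single key argument.
        return key if isinstance(key, tuple) else (key,)

    def __setitem__(self, key, value):
        path = self._as_path(key)
        node = self._root
        for part in path[:-1]:
            node = node.setdefault(part, {})  # autovivify intermediate levels
        node[path[-1]] = value

    def __getitem__(self, key):
        node = self._root
        for part in self._as_path(key):
            if not isinstance(node, dict) or part not in node:
                return self._default          # reading never creates anything
            node = node[part]
        return node

v = PathDict(default=0)
v['nested', 'key', 'value'] = 10
print(v['nested', 'key', 'value'])    # 10
print(v['nested', 'key', 'missing'])  # 0
```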
I am trying to understand how exactly assignment operators, constructors and parameters passed in functions work in python specifically with lists and objects. I have a class with a list as a parameter. I want to initialize it to an empty list and then want to populate it using the constructor. I am not quite sure how to do it.
Let's say my class is:
class A:
    List = []  # Point 1

    def __init1__(self, begin=[]):  # Point 2
        for item in begin:
            self.List.append(item)

    def __init2__(self, begin):  # Point 3
        List = begin

    def __init3__(self, begin=[]):  # Point 4
        List = list()
        for item in begin:
            self.List.append(item)

listObj = A()
del(listObj)
b = listObj
I have the following questions. It will be awesome if someone could clarify what happens in each case --
Is declaring an empty list like in Point 1 valid? What is created? A variable pointing to NULL?
Which of Point 2 and Point 3 are valid constructors? In Point 3 I am guessing that a new copy of the list passed in (begin) is not made and instead the variable List will be pointing to the pointer "begin". Is a new copy of the list made if I use the constructor as in Point 2?
What happens when I delete the object using del? Is the list deleted as well or do I have to call del on the List before calling del on the containing object? I know Python uses GC but if I am concerned about cleaning unused memory even before GC kicks in is it worth it?
Also, assigning an object of type A to another variable only makes the second one point to the first, right? If so, how do I do a deep copy? Is there a feature to overload operators? I know Python is probably much simpler than this, hence the question.
EDIT:
5. I just realized that using Point 2 and Point 3 does not make a difference. The items from the list begin are only copied by reference and a new copy is not made. To do that I have to create a new list using list(). This makes sense after I see it I guess.
Thanks!
In order:
Using this form is simply syntactic sugar for calling the list constructor, i.e. you are creating a new (empty) list, not a variable pointing to NULL. It is bound to the class itself (a class attribute, similar to a static field) and is shared by all instances.
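A quick sketch of that sharing, with a stripped-down version of the class from the question:

```python
class A:
    List = []                  # class attribute: one list shared by every instance

first, second = A(), A()
first.List.append("x")         # mutates the shared class-level list
print(second.List)             # ['x']
print(first.List is A.List)    # True
```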
Apart from the constructor name, which must always be __init__, both are valid forms, but they mean different things.
The first constructor can be called with a list as argument or without. If it is called without arguments, the empty list passed as default is used within (this empty list is created once during class definition, and not once per constructor call), so no items are added to the static list.
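The "created once" behaviour of a default list is easiest to see outside a class. A standalone sketch with a hypothetical function add_item:

```python
def add_item(item, bucket=[]):  # the default list is created once, when def runs
    bucket.append(item)
    return bucket

print(add_item(1))  # [1]
print(add_item(2))  # [1, 2]  (the same default list is reused)
```

This is why the None placeholder pattern shown further down is the usual way to handle a list default.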
The second must be called with a list argument, or Python will raise an error. But because you assign without the self. prefix, it just creates a new local variable named List, accessible only within the constructor, and leaves the static A.List variable unchanged.
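A sketch contrasting the two kinds of assignment; self.items is a hypothetical attribute name used only for comparison:

```python
class A:
    List = []

    def __init__(self, begin):
        List = begin          # local name only; discarded when __init__ returns
        self.items = begin    # instance attribute; refers to the same list object

data = [1, 2]
obj = A(data)
print(A.List)                 # []  (class attribute untouched)
print(obj.items is data)      # True: a reference was stored, no copy was made
```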
Deleting with del only unlinks a reference to the object, without actually deleting the object itself. Once all references are removed, however, the garbage collector is free to reclaim the memory as needed.
It is usually a bad idea to try to control the garbage collector. Instead, just make sure you don't hold references to objects you no longer need, and let it do its work.
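A minimal illustration that del removes a name, not the object:

```python
a = [1, 2, 3]
b = a        # a second reference to the same list
del a        # removes the name a only
print(b)     # [1, 2, 3]: the list survives while b still refers to it
```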
Assigning an object to a variable only creates a new reference to the same object, yes. To create a deep copy, use copy.deepcopy() from the copy module (copy.copy() makes a shallow copy), or write your own copying code.
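A sketch comparing plain assignment with copy.deepcopy; the items attribute is hypothetical:

```python
import copy

class A:
    def __init__(self):
        self.items = [[1, 2]]

x = A()
alias = x                # same object, no copy
deep = copy.deepcopy(x)  # new object with recursively copied contents

x.items[0].append(3)
print(alias.items)       # [[1, 2, 3]]
print(deep.items)        # [[1, 2]]
```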
Operator overloading (use it with care: it can make things more confusing instead of clearer if misused) is done by defining special methods, such as __add__ or __eq__, in the class definition.
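A minimal sketch of operator overloading with a hypothetical Vector class:

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Called for `self + other`.
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called for `self == other`.
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

print(Vector(1, 2) + Vector(3, 4))  # Vector(4, 6)
```

Defining __eq__ without __hash__ makes instances unhashable, which is usually acceptable for mutable value types like this one.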
About your edit: as I pointed out above, when writing List = list() inside the constructor without the self. prefix (or better, since the variable is static, the A. prefix), you are just creating a new local variable holding an empty list, not overriding the one you defined in the class body.
For reference, the usual way to handle a list as default argument is by using a None placeholder:
class A(object):
    def __init__(self, arg=None):
        self.startvalue = list(arg) if arg is not None else list()
        # making a defensive copy of arg to keep the original intact
As an aside, do take a look at the Python tutorial. It is very well written and easy to follow and understand.
Regarding "It will be awesome if someone could clarify what happens in each case": isn't that the purpose of the dis module?
http://docs.python.org/2/library/dis.html
Newbie Alert:
I'm new to Python. When I add values to a dict and then print the whole dictionary, I find that every key maps to the same value.
Seems like a pointer issue?
Here's a snippet when using the event-based XML parser (SAX):
Basically, at every end element "row", I store the element under its key self.Id, where self is the element.
def endElement(self, name):
    if name == "row":
        self.mapping[self.Id] = self
        print("Storing...: " + self.DisplayName + " at Id: " + self.Id)
You'll get the value self for every single entry in self.mapping, of course, since that's the only value you ever store there. Did you rather mean to take a copy/snapshot of self or some of its attributes at that point, then have self change before it gets stored again...?
Edit: as the OP has clarified (in comments) that they do indeed need to take a copy:
import copy
...
self.mapping[self.Id] = copy.copy(self)
or use copy.deepcopy(self) if self has, among its attributes, dictionaries, lists, etc. that need to be recursively copied (that would of course include self.mapping, leading to rather peculiar results; if the normal, shallow copy.copy is not sufficient, it's probably worth adding a __deepcopy__ method to self's class to customize deep copying, and so avoid the explosion of copies of copies of copies of ... that would normally result ;-).
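A self-contained sketch of the shallow-copy fix. RowHandler is a hypothetical stand-in for the SAX handler, and the loop only simulates the parser setting Id and DisplayName before each end element:

```python
import copy

class RowHandler:
    # Hypothetical stand-in for the handler in the question.
    def __init__(self):
        self.mapping = {}
        self.Id = None
        self.DisplayName = None

    def endElement(self, name):
        if name == "row":
            # Shallow snapshot: attribute values as they are right now.
            self.mapping[self.Id] = copy.copy(self)

h = RowHandler()
for row_id, display in [("1", "alice"), ("2", "bob")]:
    h.Id, h.DisplayName = row_id, display  # simulates the parser filling fields
    h.endElement("row")

print({k: v.DisplayName for k, v in h.mapping.items()})  # {'1': 'alice', '2': 'bob'}
```

Each snapshot still shares the same mapping dict with h, because copy.copy does not copy attribute values; that is harmless here, and is exactly what a deep copy would change.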
If I understand what you're saying, then this is probably expected behaviour. When you make an assignment in Python, you're just assigning the reference (sort of like a pointer). When you do:
self.mapping[self.Id] = self
then future changes to self will be reflected in the value for that mapping you just did. Python does not "copy" objects (unless you specifically write code to do so), it only assigns references.