Is there a simple/elegant way to pass an object to a function in a way that does not let it be modified from inside that function? As an example, for a passed array, it should not be possible to perform
array[2]=1
and change the outer version of the array. That was an example, but I am looking for a solution that does not only work for arrays.
I have considered passing a copy, but this feels a little inelegant as it requires constant use of the copy library, and requires the constant computational effort of copying the object(s).
It is not an option to make the object somehow unmodifiable as a whole, because I do want it to be modified outside of the given function.
The short answer is no. There will always be a way to somehow access the object and update it.
There are, however, ways to discourage mutation of the object. Here are a few.
Pass an immutable object
If some data should not be updated past some point in the execution of your program, you should make it an immutable object.
my_data = []
populate_my_data(my_data) # this part of the code mutates 'my_data'
my_final_data = tuple(my_data)
Since my_final_data is a tuple, it cannot be mutated. Be aware that mutable objects contained in my_final_data can still be mutated.
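To make the caveat about nested mutable objects concrete, here is a minimal sketch. The names inner and frozen are made up for the illustration.

```python
inner = [1, 2]
frozen = (inner, "label")

# Rebinding an element of the tuple is rejected...
try:
    frozen[0] = [9, 9]
    reassigned = True
except TypeError:
    reassigned = False

# ...but the list stored inside the tuple can still be changed in place.
frozen[0].append(3)

print(reassigned)  # False
print(inner)       # [1, 2, 3]
```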
Create a view on the object
Instead of passing the object itself, you could provide a view on the object. A view is some instance which provides ways to read your object, but not update it.
Here is how you could define a simple view on a list.
class ListView:
    def __init__(self, data):
        self._data = data
    def __getitem__(self, item):
        return self._data[item]
The above provides an interface for reading from the list with __getitem__, but none to update it.
my_list = [1, 2, 3]
my_list_view = ListView(my_list)
print(my_list_view[0]) # 1
my_list_view[0] = None # TypeError: 'ListView' object does not support item assignment
Once again, it is not completely impossible to mutate your data: one could still access it through the field my_list_view._data. However, this design makes it reasonably clear that the list should not be mutated at that point.
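For dictionaries, the standard library already provides a read-only view of this kind: types.MappingProxyType (Python 3.3 and later). The sketch below assumes Python 3; settings and show are illustrative names only.

```python
from types import MappingProxyType

def show(read_only):
    # Reading works just like with a normal dict.
    value = read_only["debug"]
    try:
        read_only["debug"] = True   # writing through the proxy fails
    except TypeError:
        pass
    return value

settings = {"debug": False}
result = show(MappingProxyType(settings))

settings["debug"] = True            # the owner can still modify the dict
view = MappingProxyType(settings)
print(result, view["debug"])        # False True
```

As with ListView, the proxy reflects later changes made by the owner, since it wraps the dictionary rather than copying it.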
In C++, a parameter can be passed by const reference, which prevents the function from altering the given object:
string concatenate (const string& a, const string& b)
{
    return a+b;
}
However, Python has no equivalent of this. Passing a copy of the object (for example via the copy module) is the straightforward way to protect the original.
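A minimal sketch of the copy approach, assuming a function that mutates its argument; normalize, original and work are hypothetical names.

```python
import copy

def normalize(values):
    # This function mutates its argument in place.
    values.sort()
    values[0] = 0

original = [3, 1, 2]
work = copy.deepcopy(original)   # the function only ever sees the copy
normalize(work)

print(original)  # [3, 1, 2]
print(work)      # [0, 2, 3]
```

copy.deepcopy also copies nested objects; for a flat list of immutable values, copy.copy or list(original) would be enough and cheaper.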
Related
When writing a function which accepts a mutable object that could be changed, is it necessary to return this object to the caller?
By necessary I mean...
Is there a specific PEP guideline around this?
If not, what is most common in the world of Python programming?
A little bit of code:
def foo(args):
    args['a'] = 'new-value'
    args['b'] = args['b'] + 1
    # is there a need for a 'return args' ?
args = {'a': 'old-value', 'b': 99}
foo(args) # is there a need for args = foo(args)
print(args['a'], args['b']) # outputs new-value 100
"Explicit is better than implicit." makes me think I should make the potential for args to change very explicit in the main body, so that one does not have to look into the function to see if args might be changed...
This is not covered by any PEP, and it's really up to style of the author. Generally in API design though, methods that mutate arguments won't return anything so you don't forget you're mutating things. Be very careful with this kind of design.
In terms of what is more commonplace in Python, there are some examples where the object passed is altered. These tend to be methods of the object to be amended (e.g. list.append, where list is a type), but most functions tend to leave the object passed untouched and return a new one (e.g. string.strip, where string is the Python 2 module string).
This of course also brings up str.strip which is also a method of a str type object which returns a new object.
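The convention can be seen in the built-ins themselves. A short sketch using only standard built-ins; the variable names are illustrative.

```python
numbers = [3, 1, 2]

# In-place mutation: list.sort() changes the list and returns None.
returned = numbers.sort()

# New object: sorted() and str.strip() leave their input untouched.
text = "  padded  "
stripped = text.strip()
descending = sorted(numbers, reverse=True)

print(returned, numbers)   # None [1, 2, 3]
print(repr(text))          # '  padded  '
print(repr(stripped))      # 'padded'
print(descending)          # [3, 2, 1]
```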
I have a framework with some C-like language. Now I'm re-writing that framework and the language is being replaced with Python.
I need to find appropriate Python replacement for the following code construction:
SomeFunction(&arg1)
What this does is a C-style pass-by-reference so the variable can be changed inside the function call.
My ideas:
just return the value like v = SomeFunction(arg1)
is not so good, because my generic function can have a lot of arguments like SomeFunction(1,2,'qqq','vvv',.... and many more)
and I want to give the user ability to get the value she wants.
Return the collection of all the arguments no matter have they changed or not, like: resulting_list = SomeFunction(1,2,'qqq','vvv',.... and many more) interesting_value = resulting_list[3]
this can be improved by giving names to the values and returning dictionary interesting_value = resulting_list['magic_value1']
It's not good because we have constructions like
DoALotOfStaff( [SomeFunction1(1,2,3,&arg1,'qq',val2),
SomeFunction2(1,&arg2,v1),
AnotherFunction(),
...
], flags1, my_var,... )
And I wouldn't like to load the user with list of list of variables, with names or indexes she(the user) should know. The kind-of-references would be very useful here ...
Final Response
I compiled all the answers with my own ideas and was able to produce the solution. It works.
Usage
SomeFunction(1,12, get.interesting_value)
AnotherFunction(1, get.the_val, 'qq')
Explanation
Anything prefixed with get. is a kind-of reference, and its value will be filled in by the function. There is no need to define the value beforehand.
Limitation - currently I support only numbers and strings, but these are sufficient for my use-case.
Implementation
Wrote a Getter class which overrides __getattribute__ and produces any variable on demand.
All newly created variables have a pointer to their container Getter and support a set(self, value) method.
When set() is called, it checks whether the value is an int or a string and creates an object inheriting from int or str accordingly, with the same set() method added. This new object then replaces our instance in the Getter container.
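The original implementation is not shown, so the following is only a simplified sketch of the idea, not the author's code. It uses __getattr__ rather than __getattribute__, stores the result in a plain value attribute instead of subclassing int or str, and the names Ref and some_function are hypothetical.

```python
class Ref(object):
    """Hypothetical placeholder that a function fills in via set()."""
    def __init__(self, container, name):
        self._container = container   # back-pointer to the owning Getter
        self._name = name
        self.value = None

    def set(self, value):
        self.value = value


class Getter(object):
    """Creates a Ref on first attribute access and reuses it afterwards."""
    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # i.e. the first time a given name is requested.
        ref = Ref(self, name)
        setattr(self, name, ref)
        return ref


def some_function(a, b, out):
    out.set(a + b)   # "returns" a value through the reference


get = Getter()
some_function(1, 12, get.interesting_value)
print(get.interesting_value.value)  # 13
```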
Thank you everybody. I will mark as "answer" the response which led me on my way, but all of you helped me somehow.
I would say that your best, cleanest, bet would be to construct an object containing the values to be passed and/or modified - this single object can be passed, (and will automatically be passed by reference), in as a single parameter and the members can be modified to return the new values.
This will simplify the code enormously and you can cope with optional parameters, defaults, etc., cleanly.
>>> class C:
...     def __init__(self):
...         self.a = 1
...         self.b = 2
...
>>> c = C()
>>> def f(o):
...     o.a = 23
...
>>> f(c)
>>> c
<__main__.C instance at 0x7f6952c013f8>
>>> c.a
23
>>>
Note
I am sure that you could extend this idea to have a class of parameter that carried immutable and mutable data into your function with fixed member names plus storing the names of the parameters actually passed then on return map the mutable values back into the caller parameter name. This technique could then be wrapped into a decorator.
I have to say that it sounds like a lot of work compared to re-factoring your existing code to a more object oriented design.
This is how Python works already:
def func(arg):
    arg += ['bar']  # in-place extension of the caller's list
arg = ['foo']
func(arg)
print(arg)  # ['foo', 'bar']
Here, the change to arg automatically propagates back to the caller.
For this to work, you have to be careful to modify the arguments in place instead of re-binding them to new objects. Consider the following:
def func(arg):
    arg = arg + ['bar']  # builds a new list and rebinds the local name only
arg = ['foo']
func(arg)
print(arg)  # ['foo']
Here, func rebinds arg to refer to a brand new list and the caller's arg remains unchanged.
Python doesn't come with this sort of thing built in. You could make your own class which provides this behavior, but it will only support a slightly more awkward syntax where the caller would construct an instance of that class (equivalent to a pointer in C) before calling your functions. It's probably not worth it. I'd return a "named tuple" (look it up) instead--I'm not sure any of the other ways are really better, and some of them are more complex.
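A minimal sketch of the named-tuple suggestion using collections.namedtuple. The Result type, its field names and some_function are made up for the example.

```python
from collections import namedtuple

# Hypothetical result type; the field names are illustrative only.
Result = namedtuple("Result", ["status", "magic_value1"])

def some_function(a, b):
    return Result(status="ok", magic_value1=a + b)

res = some_function(1, 2)
print(res.magic_value1)   # 3

# A named tuple still unpacks and indexes like a plain tuple.
status, value = res
print(status, res[1])     # ok 3
```

The caller picks the value it wants by name, without needing to know its position in the result.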
There is a major inconsistency here. The drawbacks you're describing against the proposed solutions are related to such subtle rules of good design, that your question becomes invalid. The whole problem lies in the fact that your function violates the Single Responsibility Principle and other guidelines related to it (function shouldn't have more than 2-3 arguments, etc.). There is really no smart compromise here:
either you accept one of the proposed solutions (i.e. Steve Barnes's answer concerning your own wrappers or John Zwinck's answer concerning usage of named tuples) and refrain from focusing on good design subtleties (as your whole design is bad anyway at the moment)
or you fix the design. Then your current problem will disappear as you won't have the God Objects/Functions (the name of the function in your example - DoALotOfStuff really speaks for itself) to deal with anymore.
I'm trying to create a class that populates a list of structured data items along with some methods for populating the list from files and IO devices.
I'm having a problem with my method that fills out a new data structure and appends it to a list. It's set-up as a coroutine that fills up a temporary structure with data from the (yield) function. When it's done it appends the data to the list (e.g. self.list.append(newdata)). My problem is that this append happens by reference and I can't figure out how to initialize newdata to new memoryspace. What winds up happening is I have a list of data all pointing to the same data structure (e.g. "myclass.list[n] is myclass.list[m]" always yields TRUE). Can anyone tell me how to make this work?
If I were writing in C++, I would just need to do "newdata = new * mydatastructure;" after each loop iteration... I just can't figure out how to do this in python.... am I way off course here?
In C++, new roughly amounts to a malloc(sizeof(mydatastructure)) followed by a constructor call (or something like that, it's been a while). It allocates the appropriate amount of memory on the heap for your what-have-you, and the constructor then initializes that memory.
Python takes care of this for you. Technically, there is a similar routine in Python, called __new__, which controls the allocation. But, you rarely need to override this on your objects.
The initializer for Python objects is called __init__. When you call the class, __new__ is actually called first to create the object, and __init__ is then run on it. So, when you construct objects in Python, you are automatically allocating new memory for them, and each one is different. As Benjamin pointed out, the constructor syntax (foo = Foo()) is the way you invoke __init__ without actually typing __init__().
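The order of the two calls can be observed with a small sketch. Foo and the calls list are illustrative names; __new__ is overridden here only to record when it runs.

```python
calls = []

class Foo(object):
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")              # allocation step
        return super(Foo, cls).__new__(cls)

    def __init__(self, value):
        calls.append("__init__")             # initialization step
        self.value = value

a = Foo(1)
b = Foo(1)
print(calls)    # ['__new__', '__init__', '__new__', '__init__']
print(a is b)   # False
```

Each call to Foo(...) yields a distinct object, even when the arguments are identical.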
Your problem lies elsewhere in your code, unfortunately.
By the way, if you really want to be sure that two variables reference the same object, you can use the id() function to get each object's identity (in CPython, its memory address). The is operator compares these identities, in contrast to the == operator, which uses the __eq__ method of objects to compare their values.
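A short sketch of the difference between identity and equality; the variable names are illustrative.

```python
a = [1, 2, 3]
b = a            # second name for the same list
c = list(a)      # new list with equal contents

print(a is b, id(a) == id(b))   # True True
print(a == c, a is c)           # True False
```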
My problem is that this append happens by reference and I can't figure out how to initialize newdata to new memoryspace.
If you're trying to append objects into a list by value, you might want to use something like copy.copy or copy.deepcopy to make sure what is being appended is copied.
>>> # The Problem
>>> class ComplexObject:
...     def __init__(self, herp, derp):
...         self.herp = herp
...         self.derp = derp
...
>>> obj = ComplexObject(1, 2)
>>> list = []
>>> list.append(obj)
>>> obj.derp = 5
>>> list[0].derp
5
>>> # obj and list[0] are the same thing in memory
>>> obj
<__main__.ComplexObject instance at 0x0000000002243D48>
>>> list[0]
<__main__.ComplexObject instance at 0x0000000002243D48>
>>> # The solution
>>> from copy import deepcopy
>>> list = []
>>> obj = ComplexObject(1,2)
>>> list.append(deepcopy(obj))
>>> obj.derp = 5
>>> list[0].derp
2
>>> obj
<__main__.ComplexObject instance at 0x0000000002243D48>
>>> list[0]
<__main__.ComplexObject instance at 0x000000000224ED88>
This is my attempt at actually solving your problem from your description without seeing any code. If you're more interested in allocation/constructors in Python, refer to another answer.
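Without the original code this is only a guess, but one common cause of the symptom described is creating the temporary structure once and reusing it on every iteration. The sketch below contrasts that with creating a fresh object per item; collect_shared and collect_fresh are hypothetical names, and plain dicts stand in for the data structure.

```python
def collect_shared(rows):
    record = {}                  # created once, reused for every row
    out = []
    for row in rows:
        record["value"] = row
        out.append(record)       # appends the same dict each time
    return out

def collect_fresh(rows):
    out = []
    for row in rows:
        record = {"value": row}  # a new dict per iteration
        out.append(record)
    return out

shared = collect_shared([1, 2, 3])
fresh = collect_fresh([1, 2, 3])
print(shared[0] is shared[1])          # True
print([r["value"] for r in shared])    # [3, 3, 3]
print([r["value"] for r in fresh])     # [1, 2, 3]
```

If this is the cause, constructing the object inside the loop avoids the need for copy.deepcopy altogether.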
I am trying to understand how exactly assignment operators, constructors and parameters passed in functions work in python specifically with lists and objects. I have a class with a list as a parameter. I want to initialize it to an empty list and then want to populate it using the constructor. I am not quite sure how to do it.
Lets say my class is --
class A:
    List = [] # Point 1
    def __init1__(self, begin=[]): # Point 2
        for item in begin:
            self.List.append(item)
    def __init2__(self, begin): # Point 3
        List = begin
    def __init3__(self, begin=[]): # Point 4
        List = list()
        for item in begin:
            self.List.append(item)
listObj = A()
del(listObj)
b = listObj
I have the following questions. It will be awesome if someone could clarify what happens in each case --
1. Is declaring an empty list like in Point 1 valid? What is created? A variable pointing to NULL?
2. Which of Point 2 and Point 3 are valid constructors? In Point 3 I am guessing that a new copy of the list passed in (begin) is not made and instead the variable List will be pointing to the pointer "begin". Is a new copy of the list made if I use the constructor as in Point 2?
3. What happens when I delete the object using del? Is the list deleted as well or do I have to call del on the List before calling del on the containing object? I know Python uses GC but if I am concerned about cleaning unused memory even before GC kicks in is it worth it?
4. Also assigning an object of type A to another only makes the second one point to the first right? If so how do I do a deep copy? Is there a feature to overload operators? I know python is probably much simpler than this and hence the question.
EDIT:
5. I just realized that using Point 2 and Point 3 does not make a difference. The items from the list begin are only copied by reference and a new copy is not made. To do that I have to create a new list using list(). This makes sense after I see it I guess.
Thanks!
In order:
1. Using this form is simply syntactic sugar for calling the list constructor, i.e. you are creating a new (empty) list. It is bound to the class itself (it is a static field) and is shared by all instances.
2. Apart from the constructor name, which must always be __init__, both are valid forms, but they mean different things.
The first constructor can be called with a list as argument or without. If it is called without arguments, the empty list passed as default is used within (this empty list is created once during class definition, and not once per constructor call), so no items are added to the static list.
The second must be called with a list parameter, or Python will complain with an error. However, used without the self. prefix as you are doing, it just creates a new local variable named List, accessible only within the constructor, and leaves the static A.List variable unchanged.
3. Deleting will only unlink a reference to the object, without actually deleting anything. Once all references are removed, however, the garbage collector is free to clear the memory as needed.
It is usually a bad idea to try to control the garbage collector. Instead, just make sure you don't hold references to objects you no longer need and let it do its work.
4. Assigning a variable with an object will only create a new reference to the same object, yes. To create a deep copy use the related functions or write your own.
Operator overloading (use with care, it can make things more confusing instead of clearer if misused) can be done by overriding some special methods in the class definition.
About your edit: as I pointed out above, when writing List=list() inside the constructor, without the self. (or better, since the variable is static, A.) prefix, you are just creating a new local variable, and not overriding the one you defined in the class body.
For reference, the usual way to handle a list as default argument is by using a None placeholder:
class A(object):
    def __init__(self, arg=None):
        self.startvalue = list(arg) if arg is not None else list()
        # making a defensive copy of arg to keep the original intact
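To see the practical difference, the sketch below contrasts a class-level list like the one in the question with the None-placeholder pattern. The class names Shared and PerInstance are made up for the comparison.

```python
class Shared(object):
    items = []                      # class attribute: one list for all instances

    def __init__(self, begin=[]):   # default list is created once, at definition
        for item in begin:
            self.items.append(item)

class PerInstance(object):
    def __init__(self, begin=None):
        # a fresh list per instance, copied from the argument if given
        self.items = list(begin) if begin is not None else []

s1, s2 = Shared([1]), Shared([2])
p1, p2 = PerInstance([1]), PerInstance([2])
print(s1.items, s1.items is s2.items)   # [1, 2] True
print(p1.items, p2.items)               # [1] [2]
```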
As an aside, do take a look at the python tutorial. It is very well written and easy to follow and understand.
"It will be awesome if someone could clarify what happens in each case" isn't that the purpose of the dis module ?
http://docs.python.org/2/library/dis.html
This appeared as some test question.
If you consider this function which uses a cache argument as the 1st argument
def f(cache, key, val):
    cache[key] = val
    # insert some insanely complicated operation on the cache
    print(cache)
and now create a dictionary and use the function like so:
c = {}
f(c,"one",1)
f(c,"two",2)
this seems to work as expected (i.e adding to the c dictionary), but is it actually passing that reference or is it doing some inefficient copy ?
The dictionary passed to cache is not copied. As long as the cache variable is not rebound inside the function, it stays the same object, and modifications to the dictionary it refers to will affect the dictionary outside.
There is not even any need to return cache in this case (and indeed the sample code does not).
It might be better if f was a method on a dictionary-like object, to make this more conceptually clear.
If you use the id() function (built-in, does not need to be imported) you can get a unique identifier for any object. You can use that to confirm that you are really and truly dealing with the same object and not any sort of copy.
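A minimal sketch of that check, based on the function from the question. Returning id(cache) is added here only so the caller can compare identities; it is not part of the original function.

```python
def f(cache, key, val):
    cache[key] = val
    return id(cache)   # returned only so the caller can compare identities

c = {}
same = f(c, "one", 1) == id(c)
f(c, "two", 2)

print(same)   # True: the function received the very same dict, not a copy
print(c)      # {'one': 1, 'two': 2}
```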