what is the python equivalent of "new" in C++ - python

I'm trying to create a class that populates a list of structured data items along with some methods for populating the list from files and IO devices.
I'm having a problem with my method that fills out a new data structure and appends it to a list. It's set-up as a coroutine that fills up a temporary structure with data from the (yield) function. When it's done it appends the data to the list (e.g. self.list.append(newdata)). My problem is that this append happens by reference and I can't figure out how to initialize newdata to new memoryspace. What winds up happening is I have a list of data all pointing to the same data structure (e.g. "myclass.list[n] is myclass.list[m]" always yields TRUE). Can anyone tell me how to make this work?
If I were writing in C++, I would just need to do "newdata = new * mydatastructure;" after each loop iteration... I just can't figure out how to do this in python.... am I way off course here?

In C++, new is roughly equivalent to mydatastructure* p = (mydatastructure*) malloc(sizeof(mydatastructure)); followed by a call to the constructor (or something like that, it's been a while). It allocates the appropriate amount of memory on the heap for your what-have-you and, because the constructor runs, initializes that memory.
Python takes care of this for you. Technically, there is a similar routine in Python, called __new__, which controls the allocation. But, you rarely need to override this on your objects.
The constructor for Python objects is called __init__. When you construct an object with foo = Foo(), __new__ is called first to create the instance, and __init__ is then called to initialize it. So, when you construct objects in Python, you are automatically allocating new memory for them, and each one is different. As Benjamin pointed out, the constructor syntax (foo = Foo()) is the way you call __init__ without actually typing __init__().
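To make that order of events concrete, here is a minimal sketch. The class Foo and its calls list are made up for illustration (they are not from the question); the class simply records which special method runs when it is called.

```python
class Foo:
    calls = []  # hypothetical log of the order in which the special methods run

    def __new__(cls, *args, **kwargs):
        Foo.calls.append("__new__")
        return super().__new__(cls)  # allocate the new, empty instance

    def __init__(self, value):
        Foo.calls.append("__init__")
        self.value = value  # initialise the freshly allocated instance


a = Foo(1)
b = Foo(1)
print(Foo.calls)  # ['__new__', '__init__', '__new__', '__init__']
print(a is b)     # False: every call to Foo() yields a distinct object
```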
Your problem lies elsewhere in your code, unfortunately.
By the way, if you really want to be sure that two variables reference the same object, you can use the id() function to get each object's identity (in CPython, its memory address). The is operator compares these identities, in contrast to the == operator, which uses the __eq__ method of objects to compare their values.
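Without seeing the code this is only a guess at its shape, but a coroutine of the kind described could look like the sketch below. Record, Collector and reader are hypothetical names. The point is that the new object is created inside the loop, so each append stores a reference to a different object.

```python
class Record:
    """Hypothetical stand-in for the question's data structure."""
    def __init__(self):
        self.fields = []


class Collector:
    def __init__(self):
        self.items = []

    def reader(self, fields_per_record):
        """Coroutine: receives values via send() and groups them into records."""
        while True:
            newdata = Record()          # a brand-new object on every iteration
            for _ in range(fields_per_record):
                newdata.fields.append((yield))
            self.items.append(newdata)  # stores a reference to this iteration's object


collector = Collector()
feed = collector.reader(2)
next(feed)                    # prime the coroutine up to the first yield
for value in [1, 2, 3, 4]:
    feed.send(value)

print([r.fields for r in collector.items])       # [[1, 2], [3, 4]]
print(collector.items[0] is collector.items[1])  # False
```

If newdata = Record() were moved above the while loop, every entry in the list would be the same object, which matches the symptom in the question.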

My problem is that this append happens by reference and I can't figure out how to initialize newdata to new memoryspace.
If you're trying to append objects into a list by value, you might want to use something like copy.copy or copy.deepcopy to make sure what is being appended is copied.
>>> # The Problem
>>> class ComplexObject:
...     def __init__(self, herp, derp):
...         self.herp = herp
...         self.derp = derp
...
>>> obj = ComplexObject(1, 2)
>>> list = []
>>> list.append(obj)
>>> obj.derp = 5
>>> list[0].derp
5
>>> # obj and list[0] are the same thing in memory
>>> obj
<__main__.ComplexObject instance at 0x0000000002243D48>
>>> list[0]
<__main__.ComplexObject instance at 0x0000000002243D48>
>>> # The solution
>>> from copy import deepcopy
>>> list = []
>>> obj = ComplexObject(1,2)
>>> list.append(deepcopy(obj))
>>> obj.derp = 5
>>> list[0].derp
2
>>> obj
<__main__.ComplexObject instance at 0x0000000002243D48>
>>> list[0]
<__main__.ComplexObject instance at 0x000000000224ED88>
This is my attempt at actually solving your problem from your description without seeing any code. If you're more interested in allocation/constructors in Python, refer to another answer.

Related

Updating a list in Python: Why is the scope of my for-loop within a function apparently global?

I'm an absolute Python newbie and I have some trouble with the following function. I hope you can help me. Thank you very much for your help in advance!
I have created a list of zip-files in a directory via a list-comprehension:
zips_in_folder = [file for file in os.listdir(my_path) if file.endswith('.zip')]
I then wanted to define a function that replaces a certain character at a certain index in every element of the list with "-":
print(zips_in_folder)
def replacer_zip_names(r_index, replacer, zips_in_folder=zips_in_folder):
    for index, element in enumerate(zips_in_folder):
        x = list(element)
        x[r_index] = replacer
        zips_in_folder[index]=''.join(x)
replacer_zip_names(5,"-")
print(zips_in_folder)
Output:
['12345#6', '22345#6']
['12345-6', '22345-6']
The function worked, but what I cannot wrap my head around is: why does my function update the actual list "zips_in_folder"? I thought the "zips_in_folder" list within the function would only be a "shadow" of the actual list outside the function. Is the scope of the for-loop global instead of local in this case?
In other functions I wrote the scope of the variables was always local...
I was searching for an answer for hours now, I hope my question isn't too obvious!
Thanks again!
Best
Felix
This is a rather intermediate topic. In one line: Python is pass-by-object-reference.
What this means
zips_in_folder is an object. An object has a reference (think of it like an address) that points to its location in memory. To access an object, you need to use its reference.
Now, here's the key part:
For objects, Python passes their reference by value.
This means that a copy of the object's reference is created, but the new reference still points to the same location in memory.
As a consequence, if you use the copy of the reference to mutate the object, then the original object is modified.
In your function, zips_in_folder is a variable storing a new copy of the reference.
The following line is using the new copy to access the original object:
zips_in_folder[index]=''.join(x)
However, if you reassign the variable that stores the reference, nothing happens to the object or to the caller's reference: you have only rebound the local variable that held the copy of the reference; you have not modified the original object. Meaning that:
def reassign(a):
    a = []
a = [1,0]
reassign(a)
print(a) # output: [1,0]
A simple way to think about it is that lists are mutable, this means that the following will be true:
a = [1, 2, 3]
b = a # a, b are referring to the same object
a[1] = 20 # b now is [1, 20, 3]
That is because lists are objects in python, not primitive variables, so the function changes the "original" list i.e. it doesn't make a local copy of it.
The same is true for any class, user-defined or otherwise: a function manipulating an object will not make a copy of the object, it will change the "original" object passed to it.
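If you know C++ or another lower-level language, it is similar to passing a pointer by value: changes made to the object through the parameter are visible to the caller, but reassigning the parameter itself is not.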
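If you would rather leave the caller's list alone, one option is to build and return a new list instead of assigning into the one that was passed in. This is just a sketch. The function name replaced_names is made up, and the sample file names are the ones from the question's output.

```python
def replaced_names(names, r_index, replacer):
    """Return a new list; the list passed in is left untouched."""
    result = []
    for name in names:
        chars = list(name)
        chars[r_index] = replacer
        result.append("".join(chars))
    return result


zips_in_folder = ["12345#6", "22345#6"]
renamed = replaced_names(zips_in_folder, 5, "-")
print(zips_in_folder)  # ['12345#6', '22345#6']
print(renamed)         # ['12345-6', '22345-6']
```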

How to make my Class' objects copied when called upon (when iterated for example)

I need my class to create a new copy of the object when it is referenced.
For example:
obj1 = MyClass(1,2)
obj2 = obj1
obj2.first_att = 5
>>> print obj1
(5,2)
I want obj1 to remain unlinked to obj2
You should copy the object in cases like this.
from copy import copy
obj1 = MyClass(1,2)
obj2 = copy(obj1)
obj2.first_att = 5
Or use deepcopy if your class is complicated and has lots of references.
Python isn't making a copy of ints or floats every time you assign them to a different variable. The thing is that they're not mutable; there's nothing you could do to an int that would change it, so however many variables you assign it to, you won't ever get any surprising behaviour.
With obj2.first_att = 5, you're explicitly modifying an attribute of an object. There's no analogous operation for an int. It is expected that this operation modifies the object, and that this change will be visible to anyone else who holds a reference to that object.
You should not try to work around that behaviour in any way, as that would break a lot of expectations and cause bugs or surprising behaviour in itself. It is good that making a copy of an object is an explicit action you need to take. Get used to it.
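For what it's worth, here is a rough sketch of the difference between copy and deepcopy. MyClass below is a guess at the asker's class, with assumed attribute names first_att and second_att, and a list as the second attribute so that the difference shows up.

```python
from copy import copy, deepcopy


class MyClass:
    # Hypothetical version of the asker's class; attribute names are assumptions.
    def __init__(self, first_att, second_att):
        self.first_att = first_att
        self.second_att = second_att


obj1 = MyClass(1, [2, 3])

shallow = copy(obj1)      # new object, but its attributes refer to the same values
deep = deepcopy(obj1)     # new object with recursively copied attributes

shallow.first_att = 5         # rebinding an attribute on the copy: obj1 is unaffected
shallow.second_att.append(4)  # mutating the shared list: visible through obj1

print(obj1.first_att)     # 1
print(obj1.second_att)    # [2, 3, 4]
print(deep.second_att)    # [2, 3]
```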

Without pointers, can I pass references as arguments in Python? [duplicate]

This question already has answers here:
How do I pass a variable by reference?
(39 answers)
Closed 8 years ago.
Since Python doesn't have pointers, I am wondering how I can pass a reference to an object through to a function instead of copying the entire object. This is a very contrived example, but say I am writing a function like this:
def some_function(x):
    c = x/2 + 47
    return c
y = 4
z = 12
print some_function(y)
print some_function(z)
From my understanding, when I call some_function(y), Python allocates new space to store the argument value, then erases this data once the function has returned c and it's no longer needed. Since I am not actually altering the argument within some_function, how can I simply reference y from within the function instead of copying y when I pass it through? In this case it doesn't matter much, but if y was very large (say a giant matrix), copying it could eat up some significant time and space.
Your understanding is, unfortunately, completely wrong. Python does not copy the value, nor does it allocate space for a new one. It passes a value which is itself a reference to the object. If you modify that object (rather than rebinding its name), then the original will be modified.
Edit
I wish you would stop worrying about memory allocation: Python is not C++, almost all of the time you don't need to think about memory.
It's easier to demonstrate rebinding via the use of something like a list:
def my_func(foo):
    foo.append(3) # now the source list also has the number 3
    foo = [3] # we've re-bound 'foo' to something else, severing the relationship
    foo.append(4) # the source list is unaffected
    return foo
original = [1, 2]
new = my_func(original)
print original # [1, 2, 3]
print new # [3, 4]
It might help if you think in terms of names rather than variables: inside the function, the name "foo" starts off being a reference to the original list, but then we change that name to point to a new, different list.
Python parameters are always "references".
The way parameters work in Python, and the way they are explained in the docs, can be confusing and misleading to newcomers to the language, especially if you have a background in other languages that allow you to choose between "pass by value" and "pass by reference".
In CPython terms, a "reference" is essentially a pointer to an object; the object itself carries the reference count and type information that the runtime needs. Every variable and every parameter is a "reference".
So, internally, Python passes a "pointer" for each argument. You can easily see this in this example:
>>> def f(L):
...     L.append(3)
...
>>> X = []
>>> f(X)
>>> X
[3]
The variable X points to a list, and the parameter L is a copy of the "pointer" of the list, and not a copy of the list itself.
Take care to note that this is not the same as "pass-by-reference" as in C++ with the & qualifier, or Pascal with the var qualifier: rebinding the parameter inside the function does not affect the caller's variable.
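A quick way to see the difference from C++ references is to try writing a swap function. The sketch below uses two made-up helpers, swap and swap_contents: rebinding the parameters does nothing for the caller, while mutating a shared list does.

```python
def swap(a, b):
    # Rebinds the local names only; the caller's variables are untouched.
    a, b = b, a


def swap_contents(pair):
    # Mutates the list object that the caller also refers to.
    pair[0], pair[1] = pair[1], pair[0]


x, y = 1, 2
swap(x, y)
print(x, y)    # 1 2

values = [1, 2]
swap_contents(values)
print(values)  # [2, 1]
```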

memory management with objects and lists in python

I am trying to understand how exactly assignment operators, constructors and parameters passed in functions work in python specifically with lists and objects. I have a class with a list as a parameter. I want to initialize it to an empty list and then want to populate it using the constructor. I am not quite sure how to do it.
Lets say my class is --
class A:
    List = [] # Point 1
    def __init1__(self, begin=[]): # Point 2
        for item in begin:
            self.List.append(item)
    def __init2__(self, begin): # Point 3
        List = begin
    def __init3__(self, begin=[]): # Point 4
        List = list()
        for item in begin:
            self.List.append(item)
listObj = A()
del(listObj)
b = listObj
I have the following questions. It will be awesome if someone could clarify what happens in each case --
1. Is declaring an empty list like in Point 1 valid? What is created? A variable pointing to NULL?
2. Which of Point 2 and Point 3 are valid constructors? In Point 3 I am guessing that a new copy of the list passed in (begin) is not made and instead the variable List will be pointing to the pointer "begin". Is a new copy of the list made if I use the constructor as in Point 2?
3. What happens when I delete the object using del? Is the list deleted as well or do I have to call del on the List before calling del on the containing object? I know Python uses GC but if I am concerned about cleaning unused memory even before GC kicks in is it worth it?
4. Also, assigning an object of type A to another only makes the second one point to the first, right? If so, how do I do a deep copy? Is there a feature to overload operators? I know Python is probably much simpler than this and hence the question.
EDIT:
5. I just realized that using Point 2 and Point 3 does not make a difference. The items from the list begin are only copied by reference and a new copy is not made. To do that I have to create a new list using list(). This makes sense after I see it I guess.
Thanks!
In order:
1. Using this form is simply syntactic sugar for calling the list constructor, i.e. you are creating a new (empty) list. This will be bound to the class itself (it is effectively a static field) and will be the same for all instances.
2. Apart from the constructor name, which must always be __init__, both are valid forms, but they mean different things.
The first constructor can be called with a list as argument or without. If it is called without arguments, the empty list passed as default is used (this empty list is created once, when the function is defined, and not once per constructor call), so no items are added to the static list.
The second must be called with a list parameter, or Python will complain with an error. But because it is used without the self. prefix, as you are doing, it just creates a new local variable named List, accessible only within the constructor, and leaves the static A.List variable unchanged.
3. Deleting will only unlink a reference to the object, without actually deleting anything. Once all references are removed, however, the garbage collector is free to clear the memory as needed.
It is usually a bad idea to try to control the garbage collector. Instead, just make sure you don't hold references to objects you no longer need and let it do its work.
4. Assigning a variable with an object will only create a new reference to the same object, yes. To create a deep copy use the related functions or write your own.
Operator overloading (use with care, it can make things more confusing instead of clearer if misused) can be done by overriding some special methods in the class definition.
About your edit: as I pointed out above, when writing List = list() inside the constructor without the self. (or better, since the variable is static, A.) prefix, you are just creating a local variable bound to a new empty list, and not replacing the one you defined in the class body.
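The effect of a class-level list is easy to check for yourself. The two classes below, Shared and PerInstance, are made up for illustration: the first appends to the list stored on the class, the second gives each instance its own list.

```python
class Shared:
    items = []                  # one list object, stored on the class

    def __init__(self, begin=()):
        for item in begin:
            self.items.append(item)   # mutates the class-level list


class PerInstance:
    def __init__(self, begin=()):
        self.items = list(begin)      # a new list for every instance


s1 = Shared([1])
s2 = Shared([2])
print(s1.items, s2.items)        # [1, 2] [1, 2]
print(s1.items is Shared.items)  # True

p1 = PerInstance([1])
p2 = PerInstance([2])
print(p1.items, p2.items)        # [1] [2]
```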
For reference, the usual way to handle a list as default argument is by using a None placeholder:
class A(object):
    def __init__(self, arg=None):
        self.startvalue = list(arg) if arg is not None else list()
        # making a defensive copy of arg to keep the original intact
As an aside, do take a look at the python tutorial. It is very well written and easy to follow and understand.
"It will be awesome if someone could clarify what happens in each case" isn't that the purpose of the dis module ?
http://docs.python.org/2/library/dis.html

Assignment of objects and fundamental types [duplicate]

This question already has answers here:
Why variable = object doesn't work like variable = number
(10 answers)
Closed 4 years ago.
There is this code:
# assignment behaviour for integer
a = b = 0
print a, b # prints 0 0
a = 4
print a, b # prints 4 0 - different!
# assignment behaviour for class object
class Klasa:
    def __init__(self, num):
        self.num = num
a = Klasa(2)
b = a
print a.num, b.num # prints 2 2
a.num = 3
print a.num, b.num # prints 3 3 - the same!
Questions:
1. Why does the assignment operator work differently for fundamental types and class objects (for fundamental types it copies by value, for class objects it copies by reference)?
2. How do I copy class objects only by value?
3. How do I make references for fundamental types, like int& b = a in C++?
This is a stumbling block for many Python users. The object reference semantics are different from what C programmers are used to.
Let's take the first case. When you say a = b = 0, a new int object is created with value 0 and two references to it are created (one is a and another is b). These two variables point to the same object (the integer which we created). Now, we run a = 4. A new int object of value 4 is created and a is made to point to that. This means, that the number of references to 4 is one and the number of references to 0 has been reduced by one.
Compare this with a = 4 in C where the area of memory which a "points" to is written to. a = b = 4 in C means that 4 is written to two pieces of memory - one for a and another for b.
Now the second case: a = Klasa(2) creates an object of type Klasa, increments its reference count by one and makes a point to it. b = a simply takes what a points to, makes b point to the same thing and increments the reference count of that thing by one. It's the same as what would happen if you did a = b = Klasa(2). Printing a.num and b.num gives the same result, since you're dereferencing the same object and printing an attribute value. You can use the id builtin function to see that the object is the same (id(a) and id(b) will return the same identifier). Now, you change the object by assigning a value to one of its attributes. Since a and b point to the same object, you'd expect the change in value to be visible when the object is accessed via a or b. And that's exactly how it is.
Now, for the answers to your questions.
1. The assignment operator doesn't work differently for these two. All it does is add a reference to the RValue and makes the LValue point to it. It's always "by reference" (although this term makes more sense in the context of parameter passing than simple assignments).
2. If you want copies of objects, use the copy module.
3. As I said in point 1, when you do an assignment, you always shift references. Copying is never done unless you ask for it.
Quoting from Data Model
Objects are Python’s abstraction for data. All data in a Python
program is represented by objects or by relations between objects. (In
a sense, and in conformance to Von Neumann’s model of a “stored
program computer,” code is also represented by objects.)
From Python's point of view, the notion of a fundamental data type is quite different from that in C/C++; such types mainly serve to map C/C++ data types to Python. So let's leave them out of the discussion for the time being and consider the fact that all data are objects and are instances of some class. Every object has an ID (somewhat like an address), a value, and a type.
Assignment copies the reference, not the object. For example:
>>> x=20
>>> y=x
>>> id(x)==id(y)
True
>>>
The only way to have a new instance is by creating one.
>>> x=3
>>> id(x)==id(y)
False
>>> x==y
False
This may sound complicated at first instance but to simplify a bit, Python made some types immutable. For example you can't change a string. You have to slice it and create a new string object.
Copying by reference often gives unexpected results. For example,
x=[[0]*8]*8 may give you the feeling that it creates a two-dimensional list of 0s. In fact it creates a list of eight references to the same inner list object [0]*8. So assigning to x[1][1] ends up changing every row at the same time.
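Here is a smaller version of the same effect, using 3 rows instead of 8 to keep the output short, next to the usual workaround of building each row separately with a list comprehension. The names shared and independent are just illustrative.

```python
shared = [[0] * 3] * 3                     # three references to one inner list
shared[1][1] = 9
print(shared)                  # [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
print(shared[0] is shared[2])  # True

independent = [[0] * 3 for _ in range(3)]  # a new inner list per row
independent[1][1] = 9
print(independent)             # [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
```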
The copy module provides a function called deepcopy that creates a fully independent copy of the object rather than a shallow one. This is useful when you intend to have two distinct objects and manipulate them separately, just as you intended in your second example.
To extend your example
>>> import copy
>>> class Klasa:
...     def __init__(self, num):
...         self.num = num
...
>>> a = Klasa(2)
>>> b = copy.deepcopy(a)
>>> print a.num, b.num # prints 2 2
2 2
>>> a.num = 3
>>> print a.num, b.num # prints 3 2 - different!
3 2
It doesn't work differently. In your first example, you changed a so that a and b reference different objects. In your second example, you did not, so a and b still reference the same object.
Integers, by the way, are immutable. You can't modify their value. All you can do is make a new integer and rebind your reference. (like you did in your first example)
Suppose you and I have a common friend. If I decide that I no longer like her, she is still your friend. On the other hand, if I give her a gift, your friend received a gift.
Assignment doesn't copy anything in Python, and "copy by reference" is somewhere between awkward and meaningless (as you actually point out in one of your comments). Assignment causes a variable to begin referring to a value. There aren't separate "fundamental types" in Python; while some of them are built-in, int is still a class.
In both cases, assignment causes the variable to refer to whatever it is that the right-hand-side evaluates to. The behaviour you're seeing is exactly what you should expect in that environment, per the metaphor. Whether your "friend" is an int or a Klasa, assigning to an attribute is fundamentally different from reassigning the variable to a completely other instance, with the correspondingly different behaviour.
The only real difference is that the int doesn't happen to have any attributes you can assign to. (That's the part where the implementation actually has to do a little magic to restrict you.)
You are confusing two different concepts of a "reference". The C++ T& is a magical thing that, when assigned to, updates the referred-to object in-place, and not the reference itself; that can never be "reseated" once the reference is initialized. This is useful in a language where most things are values. In Python, everything is a reference to begin with. The Pythonic reference is more like an always-valid, never-null, not-usable-for-arithmetic, automatically-dereferenced pointer. Assignment causes the reference to start referring to a different thing completely. You can't "update the referred-to object in-place" by replacing it wholesale, because Python's objects just don't work like that. You can, of course, update its internal state by playing with its attributes (if there are any accessible ones), but those attributes are, themselves, also all references.
