Is there a name for this kind of variable stomping bug? - python

I solved a bug recently where (in Python code) a dictionary variable was initialized outside the loop, then modified and stored in another dictionary within the loop. The expectation was that a deep copy of the dictionary was being stored each time, but really the same object was being passed in over and over. The end result was that the output held the same dictionary repeated many times, instead of a unique dictionary for each iteration of the loop.
Something like this:
d = []
a = {"key": "value"}
for x in range(5):
    a["key2"] = "value" + str(x)
    d.append({"results": a})
Whereas the proper behavior is something more like this:
d = []
for x in range(5):
    a = {"key": "value", "key2": "value" + str(x)}
    d.append({"results": a})
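To see the difference between the two snippets above, here is a small sketch that wraps each one in a function (the names build_buggy and build_fixed are made up for illustration) and checks object identity with `is`:

```python
def build_buggy():
    d = []
    a = {"key": "value"}              # created once, outside the loop
    for x in range(5):
        a["key2"] = "value" + str(x)  # mutates that single dict every time
        d.append({"results": a})      # every entry stores a reference to it
    return d


def build_fixed():
    d = []
    for x in range(5):
        a = {"key": "value", "key2": "value" + str(x)}  # new dict per iteration
        d.append({"results": a})
    return d


buggy = build_buggy()
fixed = build_fixed()

print([e["results"]["key2"] for e in buggy])  # ['value4', 'value4', 'value4', 'value4', 'value4']
print([e["results"]["key2"] for e in fixed])  # ['value0', 'value1', 'value2', 'value3', 'value4']
print(all(e["results"] is buggy[0]["results"] for e in buggy))  # True: one shared object
```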
When writing a changelog message for this fix, I wondered whether there is a proper term for this kind of bug. The best I could come up with was "variable stomping", but I believe there's a more descriptive one.

I would use a line like:
Create a new dictionary at each iteration to prevent mutating template
This makes it clear what the original issue is and how it's solved. As for naming, the word is mutating!
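If the shared dict is meant to act as a template of default keys, one option (a sketch, not the only fix) is to copy it on every iteration. Here `template` is a hypothetical name. `copy.deepcopy` also copies nested mutable values, while a plain `template.copy()` would be enough when, as here, the values are just strings:

```python
import copy

template = {"key": "value"}  # hypothetical template holding the shared defaults

d = []
for x in range(5):
    a = copy.deepcopy(template)   # fresh, independent dict each iteration
    a["key2"] = "value" + str(x)  # mutating the copy leaves template untouched
    d.append({"results": a})

print(template)         # {'key': 'value'}
print(d[0]["results"])  # {'key': 'value', 'key2': 'value0'}
```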

Related

List of empty variables in Python

Is there any possibility of creating a list of variables/names* that have not been defined yet, and then looping through the list at a later stage to define them?
Like this:
varList = [varA, varB, varC]
for var in varList:
    var = 0
print(varList)
>>>[0, 0, 0]
The reason I'm asking is because I have a project where I could hypothetically batch fill 40+ variables/names* this way by looping through a Pandas series*. Unfortunately Python doesn't seem to allow undefined variables in a list.
Does anyone have a creative workaround?
EDIT: Since you asked for the specific problem, here goes:
I have a Pandas series that looks like this (excuse the Swedish):
print(Elanv)
>>>
Förb. KVV PTP 5653,021978
Förb. KVV Skogsflis 0
Förb. KVV Återvinningsflis 337,1416119
Förb. KVV Eo1 6,1
Förb. HVC Återvinningsflis 1848
Name: Elanv, dtype: object
I want to store each value in this array to a set of new variables/names*, the names of which I want to control. For example, I want the new variable/name* containing the first value to be called "förbKVVptp", the second one "förbKVVsflis", and so forth.
The "normal" option is to assign each variable manually, like this:
förbKVVptp, förbKVVsflis, förbKVVåflis = Elanv.iloc[0], Elanv.iloc[1], Elanv.iloc[2] ....
But that creates a not so nice looking long bunch of code just to name variables/names*. Instead I thought I could do something like this (obviously with all the variables/names*, not just the first three) which looks and feels cleaner:
varList = [förbKVVptp, förbKVVsflis, förbKVVåflis]
for i, var in enumerate(varList): var = Elanv.iloc[i]
print(varList)
>>>[5653,021978, 0, 337,1416119]
Obviously this becomes pointless if I have to write the name of my new variables/names* twice (first to define them, then to put them inside the varList) so that was why I asked.
You cannot create uninitialized variables in Python. Python doesn't really have variables; it has names referring to values. An uninitialized variable would be a name that doesn't refer to a value, so basically just a string:
varList = ['förbKVVptp', 'förbKVVsflis', 'förbKVVåflis']
You can turn these strings into variables by associating them with a value. One of the ways to do that is via globals():
for i, varname in enumerate(varList):
    globals()[varname] = Elanv.iloc[i]
However, dynamically creating variables like this is often a code smell. Consider storing the values in a dictionary or list instead:
my_vars_dict = {
'förbKVVptp': Elanv.iloc[0],
'förbKVVsflis': Elanv.iloc[1],
'förbKVVåflis': Elanv.iloc[2]
}
my_vars_list = [Elanv.iloc[0], Elanv.iloc[1], Elanv.iloc[2]]
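Building on the dictionary suggestion, the names can be paired with the values without writing out every index, for example with zip. This sketch uses a plain list as a stand-in for the Elanv series, holding the values exactly as printed in the question (as strings, since the series has dtype object). Iterating over a pandas Series yields its values, so zip(names, Elanv) should behave the same way. The last two names are hypothetical:

```python
# Stand-in for the Elanv series: the values as printed in the question.
elanv_values = ["5653,021978", "0", "337,1416119", "6,1", "1848"]

# The first three names come from the question; the last two are hypothetical.
names = ["förbKVVptp", "förbKVVsflis", "förbKVVåflis", "förbKVVeo1", "förbHVCåflis"]

# Pair each name with the value at the same position.
# Note that zip silently stops at the shorter of the two inputs.
my_vars_dict = dict(zip(names, elanv_values))

print(my_vars_dict["förbKVVsflis"])  # 0
```

Each name then only has to be written once, which was the asker's main complaint about listing the variables twice.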
See also: How do I create a variable number of variables?
The answer to your question is that you cannot have undefined variables in a list.
My solution is specific to this part of your problem: "The reason I'm asking is that I have a project where I could hypothetically batch fill over 100 arrays this way by looping through a Pandas array."
The solution below prefills the list with None, and then you can change the values in the list.
Code:
varList = [None]*3
for i in range(len(varList)):
    varList[i] = 0
print(varList)
Output:
[0, 0, 0]
One thing in your example that won't do what you expect is how you are trying to modify the list:
for var in varList:
    var = 0
When you do var = 0, it won't change the list, nor the values of varA, varB, varC (if they were defined).
Similarly, the following won't change the list. It will just rebind var to a new value.
var = mylist[0]
var = 1
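This can be checked with the `is` operator, which tests object identity. A small sketch using the same mylist and var names:

```python
mylist = [None, None, None]

var = mylist[0]
print(var is mylist[0])  # True: both refer to the same object
var = 1                  # rebinds the name var; the list slot is not touched
print(var is mylist[0])  # False
print(mylist)            # [None, None, None]
```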
To change the list, you need to assign to an indexed item of the list:
mylist = [None, None, None]
for i in range(len(mylist)):
    mylist[i] = 0
print(mylist)
Note that creating a list with empty slots before assigning the values is inefficient and not Pythonic. A better way would be to just iterate through the source values and append them to a list, or, even better, use a list comprehension.
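A minimal sketch of the two alternatives mentioned above, assuming some hypothetical source data and a made-up transformation (doubling each value):

```python
source_values = [1, 2, 3]  # hypothetical source data

# Option 1: start with an empty list and append as you go (no placeholder slots).
appended = []
for value in source_values:
    appended.append(value * 2)

# Option 2: a list comprehension builds the same list in one expression.
comprehended = [value * 2 for value in source_values]

print(appended)      # [2, 4, 6]
print(comprehended)  # [2, 4, 6]
```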

define a dict of variables in a for loop : hello = var('hello')

var = {'hello': 'world', 'good': 'day', 'see': 'you'}
Function:
def func(key):
    return newfunc(var[key])
I would like to get something like this: hello = func('hello') = newfunc('world').
varlist = list(var.keys())
for i, variab in enumerate(varlist):
    varname = variab
    variab = func(varname)
The problem is that, in the end, the variables are not defined, because the name variab is simply overwritten when the next iteration starts. Is there another way to write a for loop that defines all the variables in the dict?
I know I can keep writing hello = func('hello') and other variables every line but I would like to know if another method is possible.
You may find this article to be a worthwhile read: http://stupidpythonideas.blogspot.com/2013/05/why-you-dont-want-to-dynamically-create.html/
The short answer to the problem is that when you do:
variab = func(varname)
You aren't defining a new variable; you are just rebinding the existing name variab to a new value. The name itself never changes. To create a variable whose name is only known at runtime, you would use the syntax:
globals()[variablename] = variablevalue
And while this is possible, it raises the question: why? There is pretty much no need to create variables dynamically in this way, and there's a reason why you don't generally see this pattern in programming. The solution? Use a data structure to solve the problem properly.
The article suggests dictionaries, but depending on your data structure you can use classes as well. It depends on the problem you are trying to accomplish.
If you must use dynamically created global variables, I would strongly recommend getting past the new-to-Python stage first. Again, the current code patterns and data structures exist for a reason, and I would discourage deliberately avoiding them in favor of a workaround-style solution.
Dynamically creating variables can be done, but it is not wise. Maintenance is a nightmare. It would be better to store the functions in a dictionary with the key value being what the dynamically created variable would have been. This should give you an idea of how it can be done:
#!/usr/bin/python
h = {'hello': 'world', 'good': 'day', 'see': 'you' }
def func(v):
    def newfunc():
        return v
    return newfunc
for k,v in h.items():
    h[k] = func(v)
a = h['hello']()
b = h['good']()
c = h['see']()
print("a = {}".format(a))
print("b = {}".format(b))
print("c = {}".format(c))
First of all, are those values callable functions or just string values?
If they are callable functions, something like:
a = {'hello': hello, 'world': world}
It is simple and straightforward:
A = {'hello': hello, 'world': world}
def foo(var):
    callback = A.get(var, None)
    # Check the value and raise an error
    # if it is not callable.
    if not callable(callback):
        raise TypeError("%r does not map to a callable" % var)
    return callback
foo('hello')
You can put the variable, fn pairs in a dict.
Also some comments:
you don't use the index i in the for loop so there is no point in using enumerate.
there is no point renaming variab to varname. If you want to use this name then just use it from the beginning.
you can iterate the dict_keys so there is no need for the varlist = list(var.keys()) line, you can just use for variab in var.keys()...
... actually you don't even need the var.keys(). for key in dictionary iterates through the keys of the dictionary, so you can just use for variab in var.
So something like this would work:
fn_dict = {}
for varname in var:
    fn_dict[varname] = func(varname)
At the end of the loop you will have the fn_dict populated with the key, function pairs you want.
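The same loop can be written as a dict comprehension. In this sketch newfunc is a hypothetical placeholder (the question never shows the real one) that just upper-cases its argument; func is the question's function:

```python
var = {'hello': 'world', 'good': 'day', 'see': 'you'}


def newfunc(value):
    # Hypothetical placeholder; the question does not show the real newfunc.
    return value.upper()


def func(key):
    return newfunc(var[key])


# One entry per key: 'hello' -> newfunc('world'), and so on.
results = {name: func(name) for name in var}

print(results['hello'])  # WORLD
```

Lookups such as results['hello'] then take the place of the hello, good and see variables the question wanted to create.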

Updating dictionary with randint performing unexpectedly

I'm trying to run a simple program that calls random.randint() in a loop to update a dictionary value, but it seems to be working incorrectly: it always seems to generate the same value.
The program so far is given below. I'm trying to create a uniformly distributed population, but I'm unsure why this isn't working.
import random
__author__ = 'navin'
namelist={
"person1":{"age":23,"region":1},
"person2":{"age":24,"region":2},
"person3":{"age":25,"region":0}
}
def testfunction():
    default_val={"age":23,"region":1}
    for i in xrange(100):
        namelist[i]=default_val
    for index in namelist:
        x = random.randint(0, 2)
        namelist[index]['region']=x
    print namelist
if __name__ == "__main__" :
    testfunction()
I'm expecting the 103 people to be roughly uniformly distributed across region 0-2, but I'm getting everyone in region 0.
Any idea why this is happening? Have I incorrectly used randint?
It is because all 100 dictionary entries created in the for loop refer not only to equal values, but to the same object. Thus there are only 4 distinct dictionaries among the values: the 3 created initially and the fourth one that you add 100 times under keys 0-99.
This can be demonstrated with the id() function, which returns a distinct integer for each distinct live object:
from collections import Counter
...
ids = [ id(i) for i in namelist.values() ]
print Counter(ids)
results in:
Counter({139830514626640: 100, 139830514505160: 1,
139830514504880: 1, 139830514505440: 1})
To get distinct dictionaries, you need to copy the default value:
namelist[i] = default_val.copy()
Or create a new dictionary on each loop iteration:
namelist[i] = {"age": 23, "region": 1}
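For reference, here is a sketch of the question's program rewritten for Python 3 (range instead of xrange, print as a function) with the fix applied. The fixed seed and the Counter tally at the end are additions for illustration only:

```python
import random
from collections import Counter

namelist = {
    "person1": {"age": 23, "region": 1},
    "person2": {"age": 24, "region": 2},
    "person3": {"age": 25, "region": 0},
}


def testfunction():
    for i in range(100):
        # A new dict literal on every iteration gives each entry its own object.
        namelist[i] = {"age": 23, "region": 1}
    for index in namelist:
        # randint includes both endpoints, so this yields 0, 1 or 2.
        namelist[index]["region"] = random.randint(0, 2)


if __name__ == "__main__":
    random.seed(0)  # fixed seed only so that runs are repeatable
    testfunction()
    print(Counter(person["region"] for person in namelist.values()))
```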
default_val={"age":23,"region":1}
for i in xrange(100):
    namelist[i]=default_val
This doesn't mean "set every entry to a dictionary with these particular age and region values". This means "set every entry to this particular dictionary object".
for index in namelist:
    x = random.randint(0, 2)
    namelist[index]['region']=x
Since every object in namelist is really the same dictionary, all modifications in this loop happen to the same dictionary, and the last value of x wipes the others.
Evaluating a dict literal creates a new dict; assignment does not. If you want to make a new dictionary each time, put the dict literal in the loop:
for i in xrange(100):
    namelist[i]={"age":23,"region":1}
I wanted to add this as a comment, but the link is too long. As others have said, you have just shared a reference to the same dictionary. If you want to see a visualisation, you can check it out on Python Tutor; it should help you grok what's happening.

Python - Updating value in one dictionary is updating value in all dictionaries

I have a list of dictionaries called lod. All dictionaries have the same keys but different values. I am trying to update one specific value in the list of values for the same key in all the dictionaries.
I am attempting to do it with the following for loop:
for i in range(len(lod)):
    a=lod[i][key][:]
    a[p]=a[p]+lov[i]
    lod[i][key]=a
What's happening is that each dictionary is getting updated len(lod) times: lod[0][key][p] is supposed to have lov[0] added to it, but instead it is getting lov[0]+lov[1]+... added to it.
What am I doing wrong?
Here is how I declared the list of dicts:
lod = [{} for _ in range(len(dataul))]
for j in range(len(dataul)):
    for i in datakl:
        rrdict[str.split(i,',')[0]]=list(str.split(i,',')[1:len(str.split(i,','))])
    lod[j]=rrdict
The problem is in how you created the list of dictionaries. You probably did something like this:
list_of_dicts = [{}] * 20
That's actually the same dict 20 times. Try doing something like this:
list_of_dicts = [{} for _ in range(20)]
Without seeing how you actually created it, this is only an example solution to an example problem.
To know for sure, print this:
[id(x) for x in list_of_dicts]
If you created it with the * 20 method, the id is the same for each dict. With the list comprehension method, each id is unique.
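A short sketch of that check, using lists of 3 instead of 20 to keep the output readable:

```python
shared = [{}] * 3                  # one dict object, referenced three times
separate = [{} for _ in range(3)]  # three distinct dict objects

shared[0]["x"] = 1
separate[0]["x"] = 1

print(shared)    # [{'x': 1}, {'x': 1}, {'x': 1}]
print(separate)  # [{'x': 1}, {}, {}]
print(len({id(d) for d in shared}), len({id(d) for d in separate}))  # 1 3
```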
This is where the trouble starts: lod[j] = rrdict. lod itself is created properly with different dictionaries. Unfortunately, afterwards every reference to the original dictionaries in the list gets overwritten with a reference to rrdict, so in the end the list contains only references to one single dictionary. Here is a more Pythonic and readable way to solve your problem:
lod = [{} for _ in range(len(dataul))]
for rrdict in lod:
    for line in datakl:
        splt = line.split(',')
        rrdict[splt[0]] = splt[1:]
You created the list of dictionaries correctly, as per the other answer.
However, when you update the individual dictionaries, you overwrite every entry in the list with the same rrdict object.
Removing noise from your code snippet:
lod = [{} for _ in range(whatever)]
for j in range(whatever):
    # rrdict = lod[j]  # Uncomment this as a possible fix.
    for i in range(whatever):
        rrdict[somekey] = somevalue
    lod[j] = rrdict
Assignment on the last line throws away the empty dict that was in lod[j] and inserts a reference to the object represented by rrdict.
Not sure what your code does, but see the commented-out line; it might be the fix you are looking for.
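Putting the pieces together, here is a sketch that builds lod with independent dictionaries and then runs the question's update loop unchanged. The sample datakl, lov, key and p values are hypothetical, and the values are converted to int only so that a[p] + lov[i] works on numbers:

```python
# Hypothetical stand-ins for the question's data.
datakl = ["a,1,2", "b,3,4"]  # comma-separated lines: key, then values
lov = [10, 20, 30]           # one increment per dictionary
key, p = "a", 0              # which key and which position to update

# Build one independent dict per entry; each gets its own fresh value lists.
lod = [{} for _ in range(len(lov))]
for rrdict in lod:
    for line in datakl:
        splt = line.split(",")
        rrdict[splt[0]] = [int(v) for v in splt[1:]]

# The question's update loop, which now touches each dict exactly once.
for i in range(len(lod)):
    a = lod[i][key][:]
    a[p] = a[p] + lov[i]
    lod[i][key] = a

print([d[key][p] for d in lod])  # [11, 21, 31]
```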

getting Python variable name in runtime

This is different from retrieving variable/object name at run time.
2G_Functions={'2G_1':2G_f1,'2G_2':2G_f2}
3G_Functions={'3G_1':3G_f1,'3G_2':3G_f2}
myFunctionMap=[2G_Functions,3G_Functions]
for i in myFunctionMap:
    print i.??? "\n"
    for j in i:
        print str(j)
I want the output look like below.
2G_Functions:
2G_1
2G_2
3G_Functions:
3G_1
3G_2
How can I get the name of the dictionary variable in my code? I don't know which one I am calling in the loop, so I can't know its name beforehand.
Despite the pessimism of the other answers, in this particular case you actually can do what you're asking for if there are no other names assigned to the objects identified by G2_Functions and G3_Functions (I took the liberty of fixing your names, which are not valid Python identifiers as given). That being said, this is a terrible, terrible, terrible idea and you should not do it, because it will eventually break and you'll be sad. So don't do it. Ever.
The following is analogous to what you're trying to do:
alpha = {'a': 1, 'b': 2}
beta = {'c': 2, 'd': 4}
gamma = [alpha, beta]
listOfDefinedLocals = list(locals().iteritems())
for x, y in listOfDefinedLocals:
if y is gamma[0]: print "gamma[0] was originally named " + x
if y is gamma[1]: print "gamma[1] was originally named " + x
This will output:
gamma[1] was originally named beta
gamma[0] was originally named alpha
I accept no responsibility for what you do with this information. It's pretty much guaranteed to fail exactly when you need it. I'm not kidding.
You can't. The myFunctionMap list contains the objects, not the names attached to them two lines above. BTW, calling a list variable "map" isn't good practice; maps are usually dictionaries.
You can't start a variable name with a digit, so 2G_Functions and 3G_Functions won't work.
You can sidestep the problem by creating a dictionary with appropriate names
e.g.
myFunctionMap = {
"2G_Functions" : { ... },
"3G_Functions" : { ... },
}
for (name, functions) in myFunctionMap.iteritems():
    print name
    for func in functions.keys():
        print func
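A Python 3 sketch of the same dictionary approach, using the question's names as string keys (strings may start with a digit even though identifiers may not). The functions g2_f1 and friends are hypothetical stand-ins for 2G_f1 etc., and the output order relies on dicts preserving insertion order, which is guaranteed from Python 3.7:

```python
# Hypothetical stand-ins for 2G_f1, 2G_f2, 3G_f1, 3G_f2 (valid identifiers).
def g2_f1():
    return "2G_f1"


def g2_f2():
    return "2G_f2"


def g3_f1():
    return "3G_f1"


def g3_f2():
    return "3G_f2"


# The outer keys carry the names you want to print.
my_function_map = {
    "2G_Functions": {"2G_1": g2_f1, "2G_2": g2_f2},
    "3G_Functions": {"3G_1": g3_f1, "3G_2": g3_f2},
}

lines = []
for name, functions in my_function_map.items():
    lines.append(name + ":")
    for func_name in functions:  # iterating a dict yields its keys
        lines.append(func_name)

print("\n".join(lines))
```

This prints the layout the question asked for, without needing to recover any variable name at runtime.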
In short, you can't.
In the longer version: it is sort of possible if you poke deep into, I think, the gc module (for the general case), or use locals() and globals(). But it's likely a better idea to simply define the list like this:
myFunctionMap = [ ("someName", someName), … ]
for name, map in myFunctionMap:
    print name
    …
Try making your list of dicts a list of strings instead:
d2G_Functions={'2G_1':"2G_f1",'2G_2':"2G_f2"}
d3G_Functions={'3G_1':"3G_f1",'3G_2':"3G_f2"}
myFunctions=["2G_Functions","3G_Functions"]
for dict_name in myFunctions:
    print dict_name
    the_dict = eval("d"+dict_name)
    for j in the_dict:
        print str(j)
(I changed the name of your original variables since python identifiers cannot begin with a digit)
