How does this for-loop within this dictionary work exactly? - python

Currently I'm learning a Python text sentiment module via an online course, and the lecturer did not explain in enough detail how this piece of code works. I tried searching for each piece of the code individually to piece together how he did it, but it makes no sense to me.
So how does this code work? Why is there a for loop within dictionary braces?
What is the logic behind x before the for y in emotion_dict.values() then for x in y at the end?
What is the purpose behind emotion_dict=emotion_dict within the parentheses? Wouldn't just emotion_dict do?
def emotion_analyzer(text,emotion_dict=emotion_dict):
    #Set up the result dictionary
    emotions = {x for y in emotion_dict.values() for x in y}
    emotion_count = dict()
    for emotion in emotions:
        emotion_count[emotion] = 0
    #Analyze the text and normalize by total number of words
    total_words = len(text.split())
    for word in text.split():
        if emotion_dict.get(word):
            for emotion in emotion_dict.get(word):
                emotion_count[emotion] += 1/len(text.split())
    return emotion_count

1 and 2
The line emotions = {x for y in emotion_dict.values() for x in y} uses a set comprehension. It builds a set, not a dictionary (though dictionary comprehensions also exist and look somewhat similar). It is shorthand notation for
emotions = set() # Empty set
# Loop over all values (not keys) in the pre-existing dictionary emotion_dict
for y in emotion_dict.values():
    # The values y are some kind of container.
    # Loop over each element in these containers.
    for x in y:
        # Add x to the set
        emotions.add(x)
The x right after the { in the original set comprehension signifies which value to store in the set. In all, emotions is just a set (with no repeats) of all elements within all containers within the dictionary emotion_dict. Try printing out emotion_dict and emotions and compare.
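To see the equivalence concretely, here is a minimal sketch. The contents of emotion_dict below are made up for illustration, since the course's real lexicon is not shown in the question.

```python
# Hypothetical lexicon: each word maps to a list of emotions (made-up data).
emotion_dict = {
    "happy": ["joy"],
    "awful": ["anger", "sadness"],
    "gloomy": ["sadness"],
}

# The set comprehension from the question.
emotions = {x for y in emotion_dict.values() for x in y}

# The equivalent nested loops.
emotions_loop = set()
for y in emotion_dict.values():
    for x in y:
        emotions_loop.add(x)

print(emotions == emotions_loop)  # True
print(sorted(emotions))           # ['anger', 'joy', 'sadness']
```

Note that "sadness" appears under two words but only once in the set.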
3
In the function definition,
def emotion_analyzer(text, emotion_dict=emotion_dict):
the emotion_dict=emotion_dict declares a default argument: if you do not pass anything as the second argument, the local parameter named emotion_dict is bound to the object that the global variable of the same name referred to when the function was defined.
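A small sketch of how such a default behaves. The function lookup_emotions and the dictionary contents are hypothetical and only serve the illustration.

```python
emotion_dict = {"happy": ["joy"]}  # hypothetical global lexicon

def lookup_emotions(word, emotion_dict=emotion_dict):
    # Uses the default dictionary unless the caller passes another one.
    return emotion_dict.get(word, [])

default_result = lookup_emotions("happy")
custom_result = lookup_emotions("happy", {"happy": ["delight"]})

# Rebinding the global name afterwards does not change the default,
# because the default object was captured when the def statement ran.
emotion_dict = {"happy": ["changed"]}
after_rebind = lookup_emotions("happy")

print(default_result)  # ['joy']
print(custom_result)   # ['delight']
print(after_rebind)    # ['joy']
```

So writing just emotion_dict in the parameter list would make the second argument mandatory, while the default makes it optional.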

Related

Concatenate strings to form a variable name

I have a number of variables in my python script containing integers e.g.,
lab19 = 100-50 #50
lab20 = 200-20 #180
I have a while loop that loops through an incrementing counter calling a function each time. The function needs to pass the variable, but the 19 and 20 parts of the variable name come from the counter.
I have tried this,
y = 1
while y < 21:
    bundleRun('lab' + str(y))
    y += 1
but that is passing the literal string 'lab1' value to the function.
How do I get my code to pass the variable value (e.g., 50) to my function based on the counter?
Thanks
I think you should store your data in a list or dict so it can be accessed by index or by key. But if you really need to access a local variable by name you can use locals()
y = 1
while y < 21:
    a = locals()['lab' + str(y)] # value stored in variable a
    y += 1
Whilst you can do this, you shouldn't.[^1] Dynamically created variables in python are a code smell (unlike e.g. TeX, where this kind of thing is routine).
Instead, store everything in a collection:
results = {"lab19": 50, "lab20": 180}
for lab, result in results.items():
    bundleRun(result)
Note that your while loop could be better written as:
for y in range(1,21):
    ...
If you really do need to do this, the other answer with locals() is the way to go.
[^1] but it could be worse: you're only trying to access them. Perhaps you have to do this---but if you have control over the variables the pythonic way is to use a collection.
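As a rough sketch of the collection approach combined with the counter from the question: the values can be keyed by lab number, so no variable names need to be built from strings. Here bundle_run is a hypothetical stand-in for the question's bundleRun, whose real behaviour is not shown.

```python
def bundle_run(value):
    # Hypothetical stand-in for bundleRun; it just doubles its input.
    return value * 2

# Keyed by lab number instead of variables named lab19, lab20, ...
labs = {19: 100 - 50, 20: 200 - 20}

results = {}
for y in range(1, 21):
    if y in labs:  # skip counters that have no lab value
        results[y] = bundle_run(labs[y])

print(results)  # {19: 100, 20: 360}
```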

Dictionary creation code. What is going on here most likely?

I am looking at this code:
DICT_IDS = dict(x.split('::')
                for x in object.method()
                ['ids_comma_separated'].split(','))
DICT_ATTRS = dict(x.split('::')
                  for x in object.method()
                  ['comma_separated_key_value_pairs'].split(','))
So each constant will ultimately refer to a dictionary, but what is going on inside the constructors?
Does this occur first:
x.split('::')
for x in object.method()
So x must be a string that is split on the ::? right?
EDIT
Oh....
for x in object.method()
['ids_comma_separated'].split(',')
is executed first. x is probably another dictionary that we key into using ids_comma_separated whose value is a string that needs to be split on the , like "cat,dog, mouse" into a list. So x is going to be a list?
It is just parsing values like this into a dict:
'ids_comma_separated': "somekey::somevalue,anotherkey::anothervalue"
from a method (object.method()) that returns a dictionary:
class object:
    @staticmethod
    def method():
        return {
            'ids_comma_separated': "somekey::somevalue,anotherkey::anothervalue"
        }
DICT_IDS = dict(x.split('::')
                for x in object.method()
                ['ids_comma_separated'].split(','))
DICT_IDS
# {'somekey': 'somevalue', 'anotherkey': 'anothervalue'}
The part inside the dict() is a generator expression, but the line breaks make it a little hard to see that:
(x.split('::') for x in object.method()['ids_comma_separated'].split(','))
In each iteration x is a string such as "somekey::somevalue", which is split once more on '::' to give a [key, value] pair.
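The same parsing can be spelled out step by step. This is only a sketch using the sample string from this answer; the variable names raw, pairs and parsed are introduced here for illustration.

```python
raw = "somekey::somevalue,anotherkey::anothervalue"

# Step 1: split on commas to get one "key::value" string per entry.
entries = raw.split(",")

# Step 2: split each entry on '::' to get a two-element [key, value] list.
pairs = [x.split("::") for x in entries]
print(pairs)  # [['somekey', 'somevalue'], ['anotherkey', 'anothervalue']]

# Step 3: dict() accepts any iterable of two-element sequences.
parsed = dict(pairs)
print(parsed)  # {'somekey': 'somevalue', 'anotherkey': 'anothervalue'}
```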

Python references to references in python

I have a function that takes given initial conditions for a set of variables and puts the result into another global variable. For example, let's say two of these variables are x and y. Note that x and y must be global variables (because it is too messy/inconvenient to be passing large amounts of references between many functions).
x = 1
y = 2
def myFunction():
    global x,y,solution
    print(x)
    < some code that evaluates using a while loop >
    solution = <the result from many iterations of the while loop>
I want to see how the result changes given a change in the initial condition of x and y (and other variables). For flexibility and scalability, I want to do something like this:
varSet = {'genericName0':x, 'genericName1':y} # Dict contains all variables that I wish to alter initial conditions for
R = list(range(10))
for r in R:
    varSet['genericName0'] = r #This doesn't work the way I want...
    myFunction()
Such that the 'print' line in 'myFunction' outputs the values 0,1,2,...,9 on successive calls.
So basically I'm asking how do you map a key to a value, where the value isn't a standard data type (like an int) but is instead a reference to another value? And having done that, how do you reference that value?
If it's not possible to do it the way I intend: What is the best way to change the value of any given variable by changing the name (of the variable that you wish to set) only?
I'm using Python 3.4, so would prefer a solution that works for Python 3.
EDIT: Fixed up minor syntax problems.
EDIT2: I think maybe a clearer way to ask my question is this:
Consider that you have two dictionaries, one which contains round objects and the other contains fruit. Members of one dictionary can also belong to the other (apples are fruit and round). Now consider that you have the key 'apple' in both dictionaries, and the value refers to the number of apples. When updating the number of apples in one set, you want this number to also transfer to the round objects dictionary, under the key 'apple' without manually updating the dictionary yourself. What's the most pythonic way to handle this?
Instead of making x and y global variables with a separate dictionary to refer to them, make the dictionary directly contain "x" and "y" as keys.
varSet = {'x': 1, 'y': 2}
Then, in your code, whenever you want to refer to these parameters, use varSet['x'] and varSet['y']. When you want to update them use varSet['x'] = newValue and so on. This way the dictionary will always be "up to date" and you don't need to store references to anything.
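A minimal sketch of that idea applied to the parameter sweep from the question. The body of my_function is a hypothetical placeholder (the real computation is elided in the question); the point is only that it reads its inputs from the dictionary.

```python
def my_function(params):
    # Hypothetical stand-in for myFunction: reads its initial
    # conditions from the dictionary instead of from globals.
    return params["x"] + params["y"]

var_set = {"x": 1, "y": 2}

solutions = []
for r in range(10):
    var_set["x"] = r  # the function sees the new value on the next call
    solutions.append(my_function(var_set))

print(solutions)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Which key to vary can itself be a string held in a variable, which covers the "change the variable by name" part of the question.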
We are going to take the fruit example given in your second edit:
def set_round_val(fruit_dict,round_dict):
    fruit_set = set(fruit_dict)
    round_set = set(round_dict)
    common_set = fruit_set.intersection(round_set) # get common keys
    for key in common_set:
        round_dict[key] = fruit_dict[key] # set modified value in round_dict
    return round_dict
fruit_dict = {'apple':34,'orange':30,'mango':20}
round_dict = {'bamboo':10,'apple':34,'orange':20} # values can even be same as fruit_dict
for r in range(1,10):
    fruit_dict['apple'] = r
    round_dict = set_round_val(fruit_dict,round_dict)
    print(round_dict)
Hope this helps.
From what I've gathered from the responses from @BrenBarn and @ebarr, this is the best way to go about the problem (and to directly answer EDIT2).
Create a class which encapsulates the common variable:
class Count:
    def __init__(self,value):
        self.value = value
Create instances of that class:
import Count # assumes the class above is saved in a module named Count.py
no_of_apples = Count.Count(1)
no_of_tennis_balls = Count.Count(5)
no_of_bananas = Count.Count(7)
Create dictionaries with the common variable in both of them:
round = {'tennis_ball':no_of_tennis_balls,'apple':no_of_apples}
fruit = {'banana':no_of_bananas,'apple':no_of_apples}
print(round['apple'].value) #prints 1
fruit['apple'].value = 2
print(round['apple'].value) #prints 2

Updating dictionary with randint performing unexpectedly

I'm trying to run a simple program in which I'm trying to run random.randint() in a loop to update a dictionary value but it seems to be working incorrectly. It always seems to be generating the same value.
The program so far is given below. I'm trying to create a uniformly distributed population, but I'm unsure why this isn't working.
import random
__author__ = 'navin'
namelist={
    "person1":{"age":23,"region":1},
    "person2":{"age":24,"region":2},
    "person3":{"age":25,"region":0}
}
def testfunction():
    default_val={"age":23,"region":1}
    for i in xrange(100):
        namelist[i]=default_val
    for index in namelist:
        x = random.randint(0, 2)
        namelist[index]['region']=x
    print namelist
if __name__ == "__main__" :
    testfunction()
I'm expecting the 103 people to be roughly uniformly distributed across region 0-2, but I'm getting everyone in region 0.
Any idea why this is happening? Have I incorrectly used randint?
It is because all 100 dictionary entries created in the for loop refer not only to the same value, but to the same object. Thus there are only 4 distinct dictionaries among the values: the 3 created initially and a fourth one that you add 100 times under keys 0-99.
This can be demonstrated with the id() function, which returns a distinct integer for each distinct object:
from collections import Counter
...
ids = [ id(i) for i in namelist.values() ]
print Counter(ids)
results in:
Counter({139830514626640: 100, 139830514505160: 1,
139830514504880: 1, 139830514505440: 1})
To get distinct dictionaries, you need to copy the default value:
namelist[i] = default_val.copy()
Or create a new dictionary on each loop
namelist[i] = {"age": 23, "region": 1}
default_val={"age":23,"region":1}
for i in xrange(100):
    namelist[i]=default_val
This doesn't mean "set every entry to a dictionary with these particular age and region values". This means "set every entry to this particular dictionary object".
for index in namelist:
    x = random.randint(0, 2)
    namelist[index]['region']=x
Since every entry added by the previous loop is really the same dictionary, all modifications to those entries happen to that one dictionary, and the last value of x assigned to it overwrites the others.
Evaluating a dict literal creates a new dict; assignment does not. If you want to make a new dictionary each time, put the dict literal in the loop:
for i in xrange(100):
    namelist[i]={"age":23,"region":1}
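The difference between sharing one dictionary and storing copies can be checked directly. This is a small Python 3 sketch with only three entries; the names shared and copied are introduced here for illustration.

```python
default_val = {"age": 23, "region": 1}

# Every key refers to the very same dictionary object.
shared = {i: default_val for i in range(3)}

# Every key gets its own independent (shallow) copy.
copied = {i: default_val.copy() for i in range(3)}

shared[0]["region"] = 2
copied[0]["region"] = 2

print(shared[1]["region"])     # 2, the change shows up under every key
print(copied[1]["region"])     # 1, the other copies are unaffected
print(shared[0] is shared[1])  # True
print(copied[0] is copied[1])  # False
```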
I wanted to add this as a comment, but the link is too long. As others have said, you have only shared a reference to the same dictionary. If you want to see a visualisation, you can run the code on Python Tutor; it should help you grok what's happening.

Python large list manipulation

I have python list like below:
DEMO_LIST = [
[{'unweighted_criket_data': [-46.14554728131345, 2.997789122813151, -23.66171024766996]},
{'weighted_criket_index_input': [-6.275794430258629, 0.4076993207025885, -3.2179925936831144]},
{'manual_weighted_cricket_data': [-11.536386820328362, 0.7494472807032877, -5.91542756191749]},
{'average_weighted_cricket_data': [-8.906090625293496, 0.5785733007029381, -4.566710077800302]}],
[{'unweighted_football_data': [-7.586729834820534, 3.9521665714843675, 5.702038461085529]},
{'weighted_football_data': [-3.512655913521907, 1.8298531225972623, 2.6400438074826]},
{'manual_weighted_football_data': [-1.8966824587051334, 0.9880416428710919, 1.4255096152713822]},
{'average_weighted_football_data': [-2.70466918611352, 1.4089473827341772, 2.0327767113769912]}],
[{'unweighted_rugby_data': [199.99999999999915, 53.91020408163265, -199.9999999999995]},
{'weighted_rugby_data': [3.3999999999999857, 0.9164734693877551, -3.3999999999999915]},
{'manual_rugby_data': [49.99999999999979, 13.477551020408162, -49.99999999999987]},
{'average_weighted_rugby_data': [26.699999999999886, 7.197012244897959, -26.699999999999932]}],
[{'unweighted_swimming_data': [2.1979283454982053, 14.079951031527246, -2.7585499298828777]},
{'weighted_swimming_data': [0.8462024130168091, 5.42078114713799, -1.062041723004908]},
{'manual_weighted_swimming_data': [0.5494820863745513, 3.5199877578818115, -0.6896374824707194]},
{'average_weighted_swimming_data': [0.6978422496956802, 4.470384452509901, -0.8758396027378137]}]]
I want to manipulate the list items and do some basic math operations, such as taking one data type across all the lists (for example, the first element of every unweighted data list) and summing the values.
Currently I am doing it like this.
The current solution is a very basic one. I want to do it in such a way that if the list grows, the results are still calculated automatically. Right now there are four lists, but there could be 5 or 8. The final result should be the sum of all the first elements of the unweighted values, for example:
Right now I am doing result_u1/4,result_u2/4,result_u3/4
I want it like result_u0/4,result_u1/4.......result_n4/4 # n is the number of lists inside the demo list
Any idea how I can do that?
(sorry for the beginner question)
You can implement your own list subclass that adds each new item's value to a running summary in append and subtracts it again in remove:
class MyList(list):
    def __init__(self):
        self.summary = 0
        list.__init__(self)
    def append(self, item):
        # items are dictionaries, so look the value up by key
        self.summary += item['sample_value']
        list.append(self, item)
    def remove(self, item):
        self.summary -= item['sample_value']
        list.remove(self, item)
And a simple usage:
my_list = MyList()
print my_list.summary # Outputs 0
my_list.append({'sample_value': 10})
print my_list.summary # Outputs 10
In Python, whenever you start counting how many there are of something inside an iterable (a string, a list, a set, a collection of any of these) in order to loop over it, it's a sign that your code can be revised.
Code that works for 3 of something should work for 300, 3000 and 3 million of the same thing without any changes.
In your case, your logic is - "For every X inside DEMO_LIST, do something"
This translated into Python is:
for i in DEMO_LIST:
    # do something with i
This snippet will run through any size of DEMO_LIST, and each time i is one of the items inside DEMO_LIST. In your case it is a list that contains your dictionaries.
Further expanding on that, you can say:
for i in DEMO_LIST:
    for k in i:
        # now k is each dictionary in each list that is inside the outer DEMO_LIST
Expanding this to do a practical example; a sum of all unweighted_criket_data:
all_unweighted_cricket_data = []
for i in DEMO_LIST:
    for k in i:
        # the key is spelled 'criket' in DEMO_LIST, so use that spelling
        if 'unweighted_criket_data' in k:
            for data in k['unweighted_criket_data']:
                all_unweighted_cricket_data.append(data)
sum_of_data = sum(all_unweighted_cricket_data)
There are various "shortcuts" to do the same, but you can appreciate those once you understand the "expanded" version of what the shortcut is trying to do.
Remember there is nothing wrong with writing it out the 'long way' especially when you are not sure of the best way to do something. Once you are comfortable with the logic, then you can use shortcuts like list comprehensions.
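As one possible "shortcut" version, here is a Python 3 sketch that averages the unweighted values position by position for any number of inner lists. The data is a made-up, shortened stand-in for DEMO_LIST, and selecting entries by the "unweighted" key prefix is an assumption about how the keys are named.

```python
# Made-up, shortened stand-in for DEMO_LIST (two sports, three values each).
demo_list = [
    [{"unweighted_a_data": [1.0, 2.0, 3.0]}, {"weighted_a_data": [0.1, 0.2, 0.3]}],
    [{"unweighted_b_data": [3.0, 4.0, 5.0]}, {"weighted_b_data": [0.3, 0.4, 0.5]}],
]

# Collect every list of values whose key starts with "unweighted".
unweighted_rows = [
    values
    for sport in demo_list
    for entry in sport
    for key, values in entry.items()
    if key.startswith("unweighted")
]

# zip(*rows) groups all first elements, all second elements, and so on.
averages = [sum(column) / len(unweighted_rows) for column in zip(*unweighted_rows)]

print(averages)  # [2.0, 3.0, 4.0]
```

Adding a fifth or eighth sport to demo_list changes nothing in the code; the divisor follows the number of rows found.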
Start by replacing this:
for i in range(0,len(data_list)-1):
    result_u1+=data_list[i][0].values()[0][0]
    result_u2+=data_list[i][0].values()[0][1]
    result_u3+=data_list[i][0].values()[0][2]
print "UNWEIGHTED",result_u1/4,result_u2/4,result_u3/4
With this:
sz = len(data_list[0][0].values()[0]) # number of values per list, taken from the first sport
result_u = [0] * sz
for i in range(0,len(data_list)-1):
    for j in range(0,sz):
        result_u[j] += data_list[i][0].values()[0][j]
print "UNWEIGHTED", [x/len(data_list) for x in result_u]
Apply similar changes elsewhere. This assumes that your data really is "rectangular", that is to say every corresponding inner list has the same number of values.
A slightly more "Pythonic"[*] version of:
for j in range(0,sz):
    result_u[j] += data_list[i][0].values()[0][j]
is:
for j, dataval in enumerate(data_list[i][0].values()[0]):
    result_u[j] += dataval
There are some problems with your code, though:
values()[0] might give you any of the values in the dictionary, since dictionaries are unordered. Maybe it happens to give you the unweighted data, maybe not.
I'm confused why you're looping on the range 0 to len(data_list)-1: if you want to include all the sports you need 0 to len(data_list), because the second parameter to range, the upper limit, is excluded.
You could perhaps consider reformatting your data more like this:
DEMO_LIST = {
    'cricket' : {
        'unweighted' : [1,2,3],
        'weighted' : [4,5,6],
        'manual' : [7,8,9],
        'average' : [10,11,12],
    },
    'rugby' : ...
}
Once you have the same keys in each sport's dictionary, you can replace values()[0] with ['unweighted'], so you'll always get the right dictionary entry. And once you have a whole lot of dictionaries all with the same keys, you can replace them with a class or a named tuple, to define/enforce that those are the values that must always be present:
import collections
Sport = collections.namedtuple('Sport', 'unweighted weighted manual average')
DEMO_LIST = {
    'cricket' : Sport(
        unweighted = [1,2,3],
        weighted = [4,5,6],
        manual = [7,8,9],
        average = [10,11,12],
    ),
    'rugby' : ...
}
Now you can replace ['unweighted'] with .unweighted.
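With that layout, the per-position averages across sports might be computed as in the Python 3 sketch below. The rugby values are made up for illustration (they are elided above), and the name sports is used here in place of DEMO_LIST.

```python
import collections

Sport = collections.namedtuple('Sport', 'unweighted weighted manual average')

sports = {
    'cricket': Sport(unweighted=[1, 2, 3], weighted=[4, 5, 6],
                     manual=[7, 8, 9], average=[10, 11, 12]),
    # Made-up values, only for illustration.
    'rugby': Sport(unweighted=[3, 4, 5], weighted=[6, 7, 8],
                   manual=[9, 10, 11], average=[12, 13, 14]),
}

# Average the unweighted values position by position over all sports.
unweighted_means = [
    sum(column) / len(sports)
    for column in zip(*(sport.unweighted for sport in sports.values()))
]

print(unweighted_means)  # [2.0, 3.0, 4.0]
```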
[*] The word "Pythonic" officially means something like, "done in the style of a Python programmer, taking advantage of any useful Python features to produce the best idiomatic Python code". In practice it usually means "I prefer this, and I'm a Python programmer, therefore this is the correct way to write Python". It's an argument by authority if you're Guido van Rossum, or by appeal to nebulous authority if you're not. In almost all circumstances it can be replaced with "good IMO" without changing the sense of the sentence ;-)
