Doing calculations while creating a List (in Python) - python

I'm getting data from an API and storing it in a Python dictionary (and then in a list of dictionaries).
I need to do calculations (max, sum, divisions...) on the dictionary data to create extra data to add to the same dictionary/list.
My current code looks like this:
stream = whatever (whatever, whatever)
keywords = []
for batch in stream:
    for row in batch.results:
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords = {}
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
This doesn't work, it gives UnboundLocalError (local variable 'data_keywords' referenced before assignment). I've tried different options and got different errors.
data_keywords["etv"] is what I want to calculate ("max_clicks", "weighted_clicks" and data_keywords["keywords_weighted"] are intermediate calculations for that)
The main problem is that I need to calculate max and sum for all values inside the dictionary, then do a calculation using that max and sum for each value and then store the results in the dictionary itself.
So I don't know where to put the code to do the calculations (before the dictionary, inside the dictionary, after the dictionary or a mix)
I guess it should be possible, but I'm a Python/programming newbie and can't figure this out.
It's probably not relevant, but in case you are wondering, I'm trying to create a weighted sort (https://moz.com/blog/build-your-own-weighted-sort). And I can't use models/database to store data.
Thanks!
EDIT: Some extra info, in case it helps understand better what I need: The results that the keywords list gives without the calculations is something like this:
[{'keywords_text': 'whatever', 'keywords_clicks': 5, 'keywords_conversion_rate': 6.3}, {'keywords_text': 'whatever2', 'keywords_clicks': 50, 'keywords_conversion_rate': 2.3}, {'keywords_text': 'whatever3', 'keywords_clicks': 20, 'keywords_conversion_rate': 2.0}]
I want basically to add to this keywords list a new key/value of 'etv': 8.5 or whatever for each keyword. That etv should come from the formula that I put on my code (data_keywords["etv"] = ...) but maybe it needs changes to work in Python.
The info in this "original" keywords list comes directly from the API (I don't have that data stored anywhere), and it works perfectly if I just request the info and store it in that list. The problems start when I introduce the calculations (especially using sum and max inside a loop, I guess).

The UnboundLocalError is because you are trying to access data_keywords["keywords_clicks"] before you have declared data_keywords or set the value for "keywords_clicks".
Also, I think you need to be clearer about what data structure you are trying to create. You mention "a list of dictionaries" which I don't see. Maybe you are trying to create a dictionary of lists, but it looks like you overwrite the dictionary values each time you go through your loop.

adding my response as an answer, as I do not have enough reputation to comment
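To get rid of the assignment error, just move the line data_keywords = {} above max_clicks = max(data_keywords["keywords_clicks"]).
Here you are trying to access a local variable before it has been assigned. The code in this case is trying to access a global variable which doesn't seem to exist. Note that this only removes the UnboundLocalError: the max and sum lines will still fail with a KeyError, because the dictionary is empty at that point.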
stream = whatever (whatever, whatever)
keywords = []
for batch in stream:
    for row in batch.results:
        data_keywords = {}
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
More on that here

You can't refer to elements of the dictionary before you create it. Move those variable assignments down to after you assign the dictionary elements.
for batch in stream:
    for row in batch.results:
        data_keywords = {}
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
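Note that max() and sum() need values from all the rows, not a single number, so the aggregates can only be computed once every row has been collected. Below is a minimal two-pass sketch of that idea. It assumes Python 3 and that the rows have already been reduced to plain dicts like the sample list in the question; the add_etv helper name is hypothetical and the API/stream calls are left out.

```python
def add_etv(keywords):
    """Add an 'etv' key to each keyword dict, in place, and return the list."""
    if not keywords:
        return keywords

    # Pass 1: aggregates over ALL rows.
    clicks = [kw["keywords_clicks"] for kw in keywords]
    max_clicks = max(clicks)
    total_clicks = sum(clicks)
    total_weighted = sum(
        kw["keywords_clicks"] * kw["keywords_conversion_rate"] for kw in keywords
    )
    # Guard against division by zero when no keyword has any clicks.
    weighted_rate = total_weighted / total_clicks if total_clicks else 0.0

    # Pass 2: per-row calculation using the aggregates.
    for kw in keywords:
        share = kw["keywords_clicks"] / max_clicks if max_clicks else 0.0
        kw["etv"] = share * kw["keywords_conversion_rate"] + (1 - share) * weighted_rate
    return keywords


keywords = [
    {"keywords_text": "whatever", "keywords_clicks": 5, "keywords_conversion_rate": 6.3},
    {"keywords_text": "whatever2", "keywords_clicks": 50, "keywords_conversion_rate": 2.3},
    {"keywords_text": "whatever3", "keywords_clicks": 20, "keywords_conversion_rate": 2.0},
]
add_etv(keywords)
for kw in keywords:
    print(kw["keywords_text"], round(kw["etv"], 3))
```

In the original loop, this would mean appending the raw values to keywords first and calling something like add_etv(keywords) after both for loops have finished.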

Related

Python Refactoring - Changing Variable Type and Value Within a Loop

I'm working on automating some word and PDF documents that need to be updated on a certain cadence.
The way I'm doing this is using dictionaries that replace variables within word documents.
My code works but because my area is not tech savvy I'm using an excel file so people can replace the values in that file whenever they need to update the documents.
I was also successful on pulling the dictionary key and values from excel but I'm trying to refactor this code which is repetitive. Here is an excerpt with 2 of the 7 dictionaries I'm creating:
dic = pd.read_excel('test.xlsx',"AD")
AD = dict(zip(dic.Key,dic.Value))
dic = pd.read_excel('test.xlsx',"RSM")
RSM = dict(zip(dic.Key,dic.Value))
I'm trying to refactor this so I can run it all within a single loop and trying something like this:
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = [AD, RSM]
for item in groups:
    dic = pd.read_excel('test.xlsx',item)
    item = dict(zip(dic.Key,dic.Value))
So I'm basically first using the variable as a string to call the excel tab within the read_excel method and then I want to replace that same variable to become the output dictionary.
When I print item within the loop I do get the correct dictionaries but I'm not able to output a variable that stores each dictionary that the loop creates.
Any help would be appreciated.
Thanks!
You're almost there, you can just have a dictionary of dictionaries:
import pandas as pd
groups = ['AD', 'RSM']
dicts = {}
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    dicts[item] = dict(zip(dic.Key, dic.Value))
Now you can just access them like this:
print(dicts['AD']['some key'])
The values of a dictionary can be anything, including other dictionaries. Keys of dictionaries can be many things as well, as long as they're hashable, and strings are a common choice of course - and the names of your groups are just that.
Also note that I removed the variables named AD and RSM. You don't really achieve anything by having variables that are named after the string value they are assigned. It only serves to be able to leave off the quotes where you use the values, but it creates an additional indirection that serves no purpose.
If you don't even need the list of groups, but just want groups to be the actual dictionaries:
import pandas as pd
groups = {}
for item in ['AD', 'RSM']:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value))
The problem is that you assign the result to the item variable and not to an entry in the list.
A simple fix would be to use a dictionary instead of a list to save the result, e.g.
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = {AD: None, RSM: None}
for item in groups.keys():
    dic = pd.read_excel('test.xlsx',item)
    groups[item] = dict(zip(dic.Key,dic.Value))
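To see why the original loop had no lasting effect, here is a small sketch that leaves pandas and the Excel file out entirely (the fake_sheets data is made up for illustration). Assigning to the loop variable only rebinds that name, while assigning to results[name] actually stores the value.

```python
# Stand-in for the Excel tabs: sheet name -> list of (key, value) rows.
fake_sheets = {
    "AD": [("title", "Director"), ("region", "West")],
    "RSM": [("title", "Manager"), ("region", "East")],
}

groups = ["AD", "RSM"]

# Rebinding the loop variable: `groups` is untouched afterwards.
for item in groups:
    item = dict(fake_sheets[item])

# Storing under a key: every dictionary is kept.
results = {}
for name in groups:
    results[name] = dict(fake_sheets[name])

print(groups)                   # ['AD', 'RSM']
print(results["RSM"]["title"])  # Manager
```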
My suggestion would be to use an overall dictionary to track your work and also to save the results there. I refactored your code slightly to this:
import pandas as pd
groups = dict.fromkeys(('AD', 'RSM')) # setup main dict containing dicts
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value)) # store individual dict
There's no need for your global constants that are used only once, so I removed those. I also added some spaces to help your code conform with PEP 8, the standard Python style guide.
Now you can access each dictionary as you like, for example, groups['AD'].

Defining max/min stats from nested JSON data in python

I'd like to preface this with I'm new to python (and not traditionally a programmer) so all suggestions to clean up any of the code (even not related to the below problem) are entirely welcome. I've been stuck on this for a couple of days now so I figured I'd give it a shot here.
I have a script that calls a RESTful API via the requests package, parses the returned JSON data via the json package, assigns the data to variables and then writes to a CSV file via the csv package (which is later written to an Excel file by a VBA script). This all seems to work fine, but currently it's writing individual data points to Excel. While I'd like to keep doing that, I'd also like to calculate summary statistics for that data (min, max, average, standard deviation, etc.) in Python before writing to a separate CSV output file.
I would imagine the correct way to do this is to write those saved variables (initially from JSON) to a nested dictionary/list and then use max/min functions etc. on the correct list, but I'm having trouble constructing the nested dictionaries dynamically.
To be clear, each data point in the jData['ProductActivity'] node is a separate transaction. I'm trying to build a dictionary that logically looks like:
[prodsize1]:
'bid':
[bid1]
[bid2]
'ask':
[ask1]
[ask2]
'trade':
[trade1]
[trade2]
[prodsize2]:
'bid':
[bid1]
[bid2]
'ask':
[ask1]
[ask2]
'trade':
[trade1]
[trade2]
Where [prodSize] key and [bid][trade] & [ask] value lists are all being added dynamically off of the jData.
code:
state_dict = {"trade": "3", "bid": "2", "ask": "1"}
market_activity = {}
bid_list = []
ask_list = []
trade_list = []
def get_hist(side):
    state = state_dict.get(side)
    jData = json.loads(myResponse.content)
    page_length = len(jData['ProductActivity'])
    for i in range (0, page_length):
        chainId = jData['ProductActivity'][i]['chainId']
        skuUuid = jData['ProductActivity'][i]['skuUuid']
        createdAt = jData['ProductActivity'][i]['createdAt']
        prodSize = float(jData['ProductActivity'][i]['prodSize'])
        amount = float(jData['ProductActivity'][i]['amount'])
        localAmount = float(jData['ProductActivity'][i]['localAmount'])
        localCurrency = jData['ProductActivity'][i]['localCurrency']
        productId = jData['ProductActivity'][i]['productId']
        customerId = jData['ProductActivity'][i]['customerId']
        if "frequency" in jData['ProductActivity'][i]:
            frequency = jData['ProductActivity'][i]['frequency']
        else:
            frequency = 1
        csv_writer.writerow([chainId, skuUuid, createdAt, styleId, name, target_product, side, prodSize, amount, localAmount,
                             frequency, localCurrency, productId, customerId])
        if side == 'bid':
            market_activity[prodSize] = {'bid': bid_list.append(amount)}
        elif side == 'ask':
            market_activity[prodSize] = {'ask': ask_list.append(amount)}
        elif side == 'trade':
            market_activity[prodSize] = {'trade': trade_list.append(amount)}
myResponse.raise_for_status()
get_hist(side="trade")
get_hist(side="bid")
get_hist(side="ask")
available_sizes = []
for key in market_activity.keys():
    available_sizes.append(key)
summary_stats={'max_bid': '','min_ask': '','avg_trade': ''}
def generate_summary_stats:
    for size in available_shoe_sizes:
        summary_stats[size].update(max(market_info[size]['bid']))
        summary_stats[size].update(min(market_info[size]['ask']))
        #add in rest of stats
generate_summary_stats()
data_to_file.close()
I think I may need to add new keys separately and then append the lists stored as values. I also fear that the way I have it written will write over 'state' (bid, ask, trade) values instead of add to each list.
It's difficult to understand what you're expecting to get without an example. Can you post a sample of what your JSON data looks like, and how you ultimately want the market_activity dictionary to look? As for overwriting, notice that market_activity[prodSize] = {'bid': bid_list.append(amount)} for example tries to assign a new value to market_activity[prodSize] on each run. Maybe you want something more like market_activity[prodSize]['bid'].append(amount) here. Though you'll have to set the initial {'bid': []} empty list before you can "append" anything to it.
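The initialization mentioned above can be handled with dict.setdefault, which creates the missing entry on first use. A rough sketch, assuming each transaction has already been reduced to a (side, prodSize, amount) tuple; the sample values are invented and the API call is left out:

```python
# Invented sample transactions: (side, prodSize, amount).
transactions = [
    ("bid", 9.5, 100.0),
    ("bid", 9.5, 120.0),
    ("ask", 9.5, 150.0),
    ("trade", 10.0, 130.0),
]

market_activity = {}
for side, prod_size, amount in transactions:
    # Create the per-size dict and the per-side list on first use,
    # then append instead of overwriting.
    sides = market_activity.setdefault(prod_size, {})
    sides.setdefault(side, []).append(amount)

print(market_activity)
```

This also avoids the separate bid_list, ask_list and trade_list globals, since each size keeps its own lists.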
Also, since you asked for some general Python coding suggestions:
market_activity.keys() already gives you an iterable of the keys (a list in Python 2, a view in Python 3), so you should be able to replace for size in available_shoe_sizes: with for size in market_activity:, and drop the available_shoe_sizes list completely.
Since "side" is the only input variable in your get_hist() function, you can just call the function as get_hist("trade"), for example. In general, positional arguments just have to be passed in the same order the parameters are defined.
I'm not sure where market_info[] comes from. Is that supposed to be market_activity?
In general, definitions (functions) should be put near the top of your code, instead of being defined in between other commands you're running.
Likewise, generate_summary_stats should take its inputs as parameters, instead of relying on globals. Then return the value you want, which is probably the dictionary summary_stats. So something more like this:
def generate_summary_stats(sizes):
    summary_stats = {'max_bid': '', 'min_ask': '', 'avg_trade': ''}
    for size in sizes:
        ## Here's the loop where you'd insert your code
        ## for updating summary_stats properly.
        pass
    return summary_stats
# Getting the stats, outside of your function.
stats = generate_summary_stats(market_activity)
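One possible way to fill in that loop is sketched below using the standard statistics module. It assumes market_activity maps each size to a dict of 'bid'/'ask'/'trade' lists; the sample data and the per-size layout of the result are assumptions for illustration, not the asker's actual data.

```python
from statistics import mean


def generate_summary_stats(market_activity):
    """Return {size: {'max_bid': ..., 'min_ask': ..., 'avg_trade': ...}}."""
    summary = {}
    for size, sides in market_activity.items():
        bids = sides.get("bid", [])
        asks = sides.get("ask", [])
        trades = sides.get("trade", [])
        summary[size] = {
            # None marks a statistic with no data behind it.
            "max_bid": max(bids) if bids else None,
            "min_ask": min(asks) if asks else None,
            "avg_trade": mean(trades) if trades else None,
        }
    return summary


sample = {
    9.5: {"bid": [100.0, 120.0], "ask": [150.0, 140.0], "trade": [130.0, 134.0]},
    10.0: {"bid": [90.0]},
}
stats = generate_summary_stats(sample)
print(stats)
```

Keeping one small dict per size makes it straightforward to write one CSV row per size afterwards.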

How can I rename a dictionary within a program?

I ask the user of my program to input the number of datasets he/she wants to investigate, e.g. three datasets. Accordingly, I should then create three dictionaries (dataset_1, dataset_2, and dataset_3) to hold the values for the various parameters. Since I do not know beforehand the number of datasets the user wants to investigate, I have to create and name the dictionaries within the program.
Apparently, Python does not let me do that. I could not rename the dictionary once it has been created.
I have tried using os.rename("oldname", "newname"), but that only works if I have a file stored on my computer hard disk. I could not get it to work with an object that lives only within my program.
number_sets = input('Input the number of datasets to investigate:')
for dataset in range(number_sets):
    init_dict = {}
    # create dictionary name for the particular dataset
    dict_name = ''.join(['dataset_', str(dataset+1)])
    # change the dictionary's name
    # HOW CAN I CHANGE THE DICTIONARY'S NAME FROM "INIT_DICT"
    # TO "DATASET_1", WHICH IS THE STRING RESULT FOR DICT_NAME?
I would like to have in the end
dataset_1 = {}
dataset_2 = {}
and so on.
You don't (need to). Keep a list of data sets.
datasets = []
for i in range(number_sets):
    init_dict = {}
    ...
    datasets.append(init_dict)
Then you have datasets[0], datasets[1], etc., rather than dataset_1, dataset_2, etc.
Inside the loop, init_dict is set to a brand new empty dictionary at the top of each iteration, without affecting the dicts added to datasets on previous iterations.
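A quick sketch of that point (the build_datasets name and the 'index' key are made up for illustration): because {} is evaluated anew on each pass, every element of the list is a separate dictionary.

```python
def build_datasets(number_sets):
    """Return a list with one independent dict per dataset."""
    datasets = []
    for i in range(number_sets):
        init_dict = {}              # a new dict object on every iteration
        init_dict["index"] = i + 1  # illustrative content only
        datasets.append(init_dict)
    return datasets


datasets = build_datasets(3)
datasets[0]["note"] = "only in the first dataset"

print(datasets[0])
print(datasets[1])
```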
If you want to create variables like that, you could use globals():
number_sets = 2
for dataset in range(number_sets):
    dict_name = ''.join(['dataset_', str(dataset+1)])
    globals()[dict_name] = {}
print(dataset_1)
print(dataset_2)
However, this is not good practice and should be avoided. If you need to keep several similar variables, the best thing to do is to create a list.
You can use a single dict and then add all the data sets into it as a dictionary:
all_datasets = {}
for i in range(number_sets):
    all_datasets['dataset_'+str(i+1)] = {}
And then you can access the data by using:
all_datasets['dataset_1']
This question gets asked many times in many different variants (this is one of the more prominent ones, for example). The answer is always the same:
It is not easily possible, and most of the time not a good idea, to create Python variable names from strings.
The easier, more approachable, safer and more usable way is to just use another dictionary. One of the cool things about dictionaries: any hashable object can be a key, and any object can be a value. So the possibilities are nearly endless. In your code, this can be done easily with a dict comprehension:
number_sets = int(input('Input the number of datasets to investigate:')) # also notice that you have to add int() here
data = {''.join(['dataset_', str(dataset + 1)]): {} for dataset in range(number_sets)}
print(data)
>>> 5
{'dataset_1': {}, 'dataset_2': {}, 'dataset_3': {}, 'dataset_4': {}, 'dataset_5': {}}
Afterwards, these dictionaries can be easily accessed via data[name_of_dataset]. That's how it should be done.

Create many empty dictionary in Python

I'm trying to create many dictionaries in a for loop in Python 2.7. I have a list as follows:
sections = ['main', 'errdict', 'excdict']
I want to access these variables, and create new dictionaries with the variable names. I could only access the list sections and store an empty dictionary in the list but not in the respective variables.
for i in enumerate(sections):
    sections[i] = dict()
The point of this question is that I'm going to obtain the list sections from a .ini file, so its contents will vary. I can create an array of dictionaries, but that doesn't work well with the further function requirements. Hence my question.
Robin Spiess answered your question beautifully.
I just want to add the one-liner way:
section_dict = {sec : {} for sec in sections}
For maintaining the order of insertion, you'll need an OrderedDict:
from collections import OrderedDict
section_dict = OrderedDict((sec, {}) for sec in sections)
To clear dictionaries
If the variables in your list are already dictionaries use:
for var in sections:
    var.clear()
Note that here var = {} does not work, see Difference between dict.clear() and assigning {} in Python.
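The difference can be seen in a short sketch (the dictionary contents are invented): assigning {} to the loop variable only rebinds the name var, while clear() empties the objects that main and errdict refer to.

```python
main = {"a": 1}
errdict = {"b": 2}
sections = [main, errdict]

# Rebinding the loop variable leaves the original dicts untouched.
for var in sections:
    var = {}
rebind_result = (dict(main), dict(errdict))  # snapshot after rebinding

# clear() empties the very objects that `main` and `errdict` refer to.
for var in sections:
    var.clear()

print(rebind_result)
print(main, errdict)
```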
To create new dictionaries
As long as you only have a handful of dicts, the best way is probably the easiest one:
main = {} #same meaning as main = dict() but slightly faster
errdict = {}
excdict = {}
sections = [main,errdict,excdict]
The variables need to be declared first before you can put them in a list.
For more dicts I support @dslack's answer in the comments (all credit to him):
sections = [dict() for _ in range(numberOfDictsYouWant)]
If you want to be able to access the dictionaries by name, the easiest way is to make a dictionary of dictionaries:
sectionsdict = {}
for var in sections:
    sectionsdict[var] = {}
You might also be interested in: Using a string variable as a variable name

Python references to references in python

I have a function that takes given initial conditions for a set of variables and puts the result into another global variable. For example, let's say two of these variables are x and y. Note that x and y must be global variables (because it is too messy/inconvenient to pass large numbers of references between many functions).
x = 1
y = 2
def myFunction():
    global x,y,solution
    print(x)
    < some code that evaluates using a while loop >
    solution = <the result from many iterations of the while loop>
I want to see how the result changes given a change in the initial condition of x and y (and other variables). For flexibility and scalability, I want to do something like this:
varSet = {'genericName0':x, 'genericName1':y} # Dict contains all variables that I wish to alter initial conditions for
R = list(range(10))
for r in R:
    varSet['genericName0'] = r #This doesn't work the way I want...
    myFunction()
Such that the 'print' line in 'myFunction' outputs the values 0,1,2,...,9 on successive calls.
So basically I'm asking how do you map a key to a value, where the value isn't a standard data type (like an int) but is instead a reference to another value? And having done that, how do you reference that value?
If it's not possible to do it the way I intend: What is the best way to change the value of any given variable by changing the name (of the variable that you wish to set) only?
I'm using Python 3.4, so would prefer a solution that works for Python 3.
EDIT: Fixed up minor syntax problems.
EDIT2: I think maybe a clearer way to ask my question is this:
Consider that you have two dictionaries, one which contains round objects and the other contains fruit. Members of one dictionary can also belong to the other (apples are fruit and round). Now consider that you have the key 'apple' in both dictionaries, and the value refers to the number of apples. When updating the number of apples in one set, you want this number to also transfer to the round objects dictionary, under the key 'apple' without manually updating the dictionary yourself. What's the most pythonic way to handle this?
Instead of making x and y global variables with a separate dictionary to refer to them, make the dictionary directly contain "x" and "y" as keys.
varSet = {'x': 1, 'y': 2}
Then, in your code, whenever you want to refer to these parameters, use varSet['x'] and varSet['y']. When you want to update them use varSet['x'] = newValue and so on. This way the dictionary will always be "up to date" and you don't need to store references to anything.
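A minimal sketch of this approach, with a placeholder standing in for the real while-loop computation (the my_function body and the var_set name are assumptions for illustration). Passing the dictionary in as an argument also removes the need for global statements.

```python
var_set = {"x": 1, "y": 2}


def my_function(params):
    """Placeholder for the real computation; reads its inputs from the dict."""
    return params["x"] + params["y"]


solutions = []
for r in range(10):
    var_set["x"] = r  # the function sees the new value on the next call
    solutions.append(my_function(var_set))

print(solutions)
```

To vary a different parameter, only the key used in the loop needs to change, for example var_set["y"] = r.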
We are going to take the fruit example given in your second edit:
def set_round_val(fruit_dict, round_dict):
    fruit_set = set(fruit_dict)
    round_set = set(round_dict)
    common_set = fruit_set.intersection(round_set) # keys present in both dicts
    for key in common_set:
        round_dict[key] = fruit_dict[key] # copy the modified value into round_dict
    return round_dict
fruit_dict = {'apple':34,'orange':30,'mango':20}
round_dict = {'bamboo':10,'apple':34,'orange':20} # values can even be same as fruit_dict
for r in range(1,10):
    fruit_dict['apple'] = r
    round_dict = set_round_val(fruit_dict,round_dict)
    print(round_dict)
Hope this helps.
From what I've gathered from the responses from @BrenBarn and @ebarr, this is the best way to go about the problem (and directly answer EDIT2).
Create a class which encapsulates the common variable:
class Count:
    def __init__(self, value):
        self.value = value
Create the instance of that class:
import Count
no_of_apples = Count.Count(1)
no_of_tennis_balls = Count.Count(5)
no_of_bananas = Count.Count(7)
Create dictionaries with the common variable in both of them:
round = {'tennis_ball':no_of_tennis_balls,'apple':no_of_apples}
fruit = {'banana':no_of_bananas,'apple':no_of_apples}
print(round['apple'].value) #prints 1
fruit['apple'].value = 2
print(round['apple'].value) #prints 2
