Defining max/min stats from nested JSON data in python - python

I'd like to preface this with I'm new to python (and not traditionally a programmer) so all suggestions to clean up any of the code (even not related to the below problem) are entirely welcome. I've been stuck on this for a couple of days now so I figured I'd give it a shot here.
I have a script that calls a RESTful API via the requests package, parses the returned JSON via the json package, assigns the data to variables, and then writes it to a CSV file via the csv package (which a VBA script later writes to an Excel file). This all seems to work fine, but currently it's writing individual data points to Excel, and while I'd like to keep doing that, I'd also like to calculate summary statistics for that data (min, max, average, standard deviation, etc.) in Python before writing to a separate CSV output file.
I would imagine the correct way to do this is to write those saved variables (initially from JSON) to a nested dictionary/list and then use max/min functions etc. on the correct list, but I'm having trouble constructing the nested dictionaries dynamically.
To be clear, each data point in the jData['ProductActivity'] node is a separate transaction. I'm trying to build a dictionary that logically looks like:
[prodsize1]:
    'bid':
        [bid1]
        [bid2]
    'ask':
        [ask1]
        [ask2]
    'trade':
        [trade1]
        [trade2]
[prodsize2]:
    'bid':
        [bid1]
        [bid2]
    'ask':
        [ask1]
        [ask2]
    'trade':
        [trade1]
        [trade2]
Here the [prodSize] keys and the 'bid', 'ask' and 'trade' value lists are all added dynamically from jData.
Code:
state_dict = {"trade": "3", "bid": "2", "ask": "1"}
market_activity = {}
bid_list = []
ask_list = []
trade_list = []

def get_hist(side):
    state = state_dict.get(side)
    jData = json.loads(myResponse.content)
    page_length = len(jData['ProductActivity'])
    for i in range (0, page_length):
        chainId = jData['ProductActivity'][i]['chainId']
        skuUuid = jData['ProductActivity'][i]['skuUuid']
        createdAt = jData['ProductActivity'][i]['createdAt']
        prodSize = float(jData['ProductActivity'][i]['prodSize'])
        amount = float(jData['ProductActivity'][i]['amount'])
        localAmount = float(jData['ProductActivity'][i]['localAmount'])
        localCurrency = jData['ProductActivity'][i]['localCurrency']
        productId = jData['ProductActivity'][i]['productId']
        customerId = jData['ProductActivity'][i]['customerId']
        if "frequency" in jData['ProductActivity'][i]:
            frequency = jData['ProductActivity'][i]['frequency']
        else:
            frequency = 1
        csv_writer.writerow([chainId, skuUuid, createdAt, styleId, name, target_product, side, prodSize, amount, localAmount,
                             frequency, localCurrency, productId, customerId])
        if side == 'bid':
            market_activity[prodSize] = {'bid': bid_list.append(amount)}
        elif side == 'ask':
            market_activity[prodSize] = {'ask': ask_list.append(amount)}
        elif side == 'trade':
            market_activity[prodSize] = {'trade': trade_list.append(amount)}
    myResponse.raise_for_status()

get_hist(side="trade")
get_hist(side="bid")
get_hist(side="ask")

available_sizes = []
for key in market_activity.keys():
    available_sizes.append(key)

summary_stats={'max_bid': '','min_ask': '','avg_trade': ''}

def generate_summary_stats:
    for size in available_shoe_sizes:
        summary_stats[size].update(max(market_info[size]['bid']))
        summary_stats[size].update(min(market_info[size]['ask']))
        #add in rest of stats

generate_summary_stats()
data_to_file.close()
I think I may need to add new keys separately and then append the lists stored as values. I also fear that the way I have it written will write over 'state' (bid, ask, trade) values instead of add to each list.

It's difficult to understand what you're expecting to get without an example. Can you post a sample of what your JSON data looks like, and how you ultimately want the market_activity dictionary to look? As for overwriting, notice that market_activity[prodSize] = {'bid': bid_list.append(amount)} for example tries to assign a new value to market_activity[prodSize] on each run. Maybe you want something more like market_activity[prodSize]['bid'].append(amount) here. Though you'll have to set the initial {'bid': []} empty list before you can "append" anything to it.
Also, since you asked for some general Python coding suggestions:
Iterating over a dictionary yields its keys, so you can replace for size in available_shoe_sizes: with for size in market_activity: and drop the extra list of sizes entirely. (That list is built under the name available_sizes, so the loop over available_shoe_sizes would raise a NameError anyway.)
Since "side" is the only input variable in your get_hist() function, you can just call the function as get_hist("trade"), for example. In general, the order of the arguments you pass in just has to match the order in which the parameters are defined.
I'm not sure where market_info[] comes from. Is that supposed to be market_activity?
In general, definitions (functions) should be put near the top of your code, instead of being defined in between other commands you're running.
Likewise, generate_summary_stats should have the parameters passed through it, instead of having them be global. Then return the value you want, which is probably the dictionary summary_stats. So something more like this:
def generate_summary_stats(sizes):
    summary_stats = {'max_bid': '', 'min_ask': '', 'avg_trade': ''}
    for size in sizes:
        # Here's the loop where you'd insert your code
        # for updating summary_stats properly.
        pass
    return summary_stats

# Getting the stats, outside of your function.
stats = generate_summary_stats(market_activity)
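To make the "set the empty list first, then append" idea concrete, here is a minimal sketch of one way to build the nested structure and compute a few stats. It assumes the transactions have already been reduced to (side, prodSize, amount) tuples; the sample values and the helper name build_market_activity are made up for illustration and are not part of the original script.

```python
from collections import defaultdict
from statistics import mean


def build_market_activity(transactions):
    """Group amounts by product size, then by side ('bid', 'ask', 'trade')."""
    # A missing size gets a new inner dict; a missing side gets a new empty list.
    activity = defaultdict(lambda: defaultdict(list))
    for side, prod_size, amount in transactions:
        activity[prod_size][side].append(amount)
    return activity


def generate_summary_stats(activity):
    """Return {size: {'max_bid': ..., 'min_ask': ..., 'avg_trade': ...}}."""
    stats = {}
    for size, sides in activity.items():
        # .get() does not create missing keys, so absent sides stay absent.
        bids = sides.get('bid', [])
        asks = sides.get('ask', [])
        trades = sides.get('trade', [])
        stats[size] = {
            'max_bid': max(bids) if bids else None,
            'min_ask': min(asks) if asks else None,
            'avg_trade': mean(trades) if trades else None,
        }
    return stats


# Made-up sample data: (side, prodSize, amount)
sample = [
    ('bid', 9.5, 100.0),
    ('bid', 9.5, 120.0),
    ('ask', 9.5, 150.0),
    ('trade', 9.5, 130.0),
    ('trade', 9.5, 140.0),
    ('bid', 10.0, 90.0),
]
market_activity = build_market_activity(sample)
summary_stats = generate_summary_stats(market_activity)
```

In the original loop, the equivalent change would be to append to market_activity[prodSize][side] instead of assigning a new dictionary on every pass. The statistics module also offers stdev if a standard deviation is needed.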

Related

Doing calculations while creating a List (in Python)

I'm getting data from an API and storing it in a Python dictionary (and then in a list of dictionaries).
I need to do calculations (max, sum, divisions...) on the dictionary data to create extra data to add to the same dictionary/list.
My current code looks like this:
stream = whatever (whatever, whatever)
keywords = []
for batch in stream:
    for row in batch.results:
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords = {}
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
This doesn't work, it gives UnboundLocalError (local variable 'data_keywords' referenced before assignment). I've tried different options and got different errors.
data_keywords["etv"] is what I want to calculate ("max_clicks", "weighted_clicks" and data_keywords["keywords_weighted"] are intermediate calculations for that)
The main problem is that I need to calculate max and sum for all values inside the dictionary, then do a calculation using that max and sum for each value and then store the results in the dictionary itself.
So I don't know where to put the code to do the calculations (before the dictionary, inside the dictionary, after the dictionary or a mix)
I guess it should be possible, but I'm a Python/programming newbie and can't figure this out.
It's probably not relevant, but in case you are wondering, I'm trying to create a weighted sort (https://moz.com/blog/build-your-own-weighted-sort). And I can't use models/database to store data.
Thanks!
EDIT: Some extra info, in case it helps understand better what I need: The results that the keywords list gives without the calculations is something like this:
[{'keywords_text': 'whatever', 'keywords_clicks': 5, 'keywords_conversion_rate': 6.3}, {'keywords_text': 'whatever2', 'keywords_clicks': 50, 'keywords_conversion_rate': 2.3}, {'keywords_text': 'whatever3', 'keywords_clicks': 20, 'keywords_conversion_rate': 2.0}]
I want basically to add to this keywords list a new key/value of 'etv': 8.5 or whatever for each keyword. That etv should come from the formula that I put on my code (data_keywords["etv"] = ...) but maybe it needs changes to work in Python.
The info in this "original" keywords list comes directly from the API (I don't have that data stored anywhere), and it works perfectly if I just request the info and store it in that list. The problems come when I introduce the calculations (especially using sum and max inside a loop, I guess).
The UnboundLocalError is because you are trying to access data_keywords["keywords_clicks"] before you have declared data_keywords or set the value for "keywords_clicks".
Also, I think you need to be clearer about what data structure you are trying to create. You mention "a list of dictionaries" which I don't see. Maybe you are trying to create a dictionary of lists, but it looks like you overwrite the dictionary values each time you go through your loop.
adding my response as an answer, as I do not have enough reputation to comment
To get rid of the assignment error, just move the line data_keywords = {} above max_clicks = max(data_keywords["keywords_clicks"]).
Here you are trying to access a local variable before it has been assigned. The code in this case is trying to access a global variable which doesn't seem to exist.
stream = whatever (whatever, whatever)
keywords = []
for batch in stream:
    for row in batch.results:
        data_keywords = {}
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
You can't refer to elements of the dictionary before you create it. Move those variable assignments down to after you assign the dictionary elements.
for batch in stream:
    for row in batch.results:
        data_keywords = {}
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
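Note that max() and sum() need the values from every row, while data_keywords["keywords_clicks"] inside the loop is a single number, so the totals cannot be computed until the whole list exists. A minimal two-pass sketch of that idea follows: collect the rows first, then compute the max and the sums over the list, then add 'etv' to each dictionary. The helper name add_etv is hypothetical, the sample list is the one from the question's EDIT, and the sketch assumes a non-empty list with at least one click.

```python
def add_etv(keywords):
    """Add an 'etv' entry to each keyword dict using totals over the whole list."""
    # First pass: aggregates that need every row.
    max_clicks = max(k["keywords_clicks"] for k in keywords)
    total_clicks = sum(k["keywords_clicks"] for k in keywords)
    total_weighted = sum(
        k["keywords_clicks"] * k["keywords_conversion_rate"] for k in keywords
    )
    weighted_rate = total_weighted / total_clicks

    # Second pass: per-row value computed from the aggregates.
    for k in keywords:
        share = k["keywords_clicks"] / max_clicks
        k["etv"] = share * k["keywords_conversion_rate"] + (1 - share) * weighted_rate
    return keywords


# Sample rows taken from the question's EDIT.
keywords = [
    {'keywords_text': 'whatever', 'keywords_clicks': 5, 'keywords_conversion_rate': 6.3},
    {'keywords_text': 'whatever2', 'keywords_clicks': 50, 'keywords_conversion_rate': 2.3},
    {'keywords_text': 'whatever3', 'keywords_clicks': 20, 'keywords_conversion_rate': 2.0},
]
add_etv(keywords)
```

With the API, the first loop over stream would only fill keywords with the text, clicks and conversion rate, and add_etv(keywords) would run once after that loop finishes.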

Create nested python dictionary

I am using the Python code below to extract some values from an Excel spreadsheet and then push them to an HTML page for further processing. I would like to modify the code so that I can add additional values against each task. Any help would be appreciated.
The code below currently prints the following:
{'line items': {'AMS Upgrade': '30667', 'BMS works': '35722'}}
How can I revise it so that I can add 2 more values against each task (i.e. AMS Upgrade and BMS works)
and get something like this (note the structure below could be wrong)?
{'line items': {'AMS Upgrade': {'30667','100%', '25799'}},{'BMS works': {'10667','10%', '3572'}} }
Code:
book = xlrd.open_workbook("Example - supporting doc.xls")
first_sheet = book.sheet_by_index(-1)
nested_dict = {}
nested_dict["line items"] = {}
for i in range(21,175):
    Line_items = first_sheet.row_slice(rowx=i, start_colx=2, end_colx=8)
    if str(Line_items[0].value) and str(Line_items[1].value):
        if not Line_items[5].value ==0 :
            nested_dict["line items"].update({str(Line_items[0].value) : str(Line_items[1].value)})
print nested_dict
print json.dumps(nested_dict)
*** as requested see excel extract below
In Python, each key of a dict can only be associated with a single value. However that single value can be a dict, list, set, etc that holds many values.
You will need to decide the type to use for the value associated with the 'AMS Upgrade' key, if you want it to hold multiple values like '30667','10%', '222'.
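Note: what you have written, {'30667','100%', '25799'}, is a set literal in Python. A set is unordered and cannot be serialized by json.dumps, so a list or a dict is usually a better fit here.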
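As a minimal sketch of one such choice, each task can map to its own inner dict. The rows list below is made-up data standing in for the spreadsheet cells, and the field names "amount", "percent" and "value" are hypothetical labels, since the question does not say what the extra columns mean.

```python
import json

# Made-up rows standing in for the spreadsheet cells:
# (task, amount, percent, value)
rows = [
    ("AMS Upgrade", "30667", "100%", "25799"),
    ("BMS works", "10667", "10%", "3572"),
]

nested_dict = {"line items": {}}
for task, amount, percent, value in rows:
    # One inner dict per task keeps every value labelled and JSON-friendly.
    nested_dict["line items"][task] = {
        "amount": amount,
        "percent": percent,
        "value": value,
    }

print(json.dumps(nested_dict))
```

A list such as [amount, percent, value] would also work if the position of each value is enough to identify it.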

How to iterate and extract data from this specific JSON file example

I'm trying to extract data from a JSON file with Python.
Mainly, I want to pull out the date and time from the "Technicals" section, to put that in one column of a dataframe, as well as pulling the "AKG" number and putting that in the 2nd col of the dataframe. Yes, I've looked at similar questions, but this issue is different. Thanks for your help.
A quick-and-dirty example of the JSON file is below:
{ 'Meta Data': { '1: etc'
                 '2: etc'},
  'Technicals': { '2017-05-04 12:00': { 'AKG': '64.8645'},
                  '2017-05-04 12:30': { 'AKG': '65.7834'},
                  '2017-05-04 13:00': { 'AKG': '63.2348'}}}
As you can see, and what's stumping me, is while the date stays the same the time advances. 'AKG' never changes, but the number does. Some of the relevant code I've been using is below. I can extract the date and time, but I can't seem to reach the AKG numbers. Note, I don't need the "AKG", just the number.
I'll mention: I'm creating a DataFrame because this will be easier to work with when creating plots with the data...right? I'm open to an array of lists et al, or anything easier, if that will ultimately help me with the plots.
akg_time = []
akg_akg = []
technicals = akg_data['Technicals'] #akg_data is the entire json file
for item in technicals: #this works
    akg_time.append(item)
for item in technicals: #this not so much
    symbol = item.get('AKG')
    akg_akg.append(symbol)
pp.pprint(akg_akg)
error: 'str' object has no attribute 'get'
You've almost got it. You don't even need the second loop. You can append the akg value in the first one itself:
for key in technicals: # renaming to key because that is a clearer name
    akg_time.append(key)
    akg_akg.append(technicals[key]['AKG'])
Your error is because you believe item (or key) is a dict. It is not. It is just a string, one of the keys of the technicals dictionary, so you'd actually need to use symbols = technicals[key].get('AKG').
Coldspeed's answer is right, but when you have a dictionary you can also loop through keys and values together, like this:
Python 3
for key, value in technicals.items():
    akg_time.append(key)
    akg_akg.append(value["AKG"])
Python 2
for key, value in technicals.iteritems():
    akg_time.append(key)
    akg_akg.append(value["AKG"])
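Since the goal is plotting, it may help to convert the strings while extracting them. The following is a minimal sketch using only the standard library; it assumes the timestamps always follow the 'YYYY-MM-DD HH:MM' pattern shown in the sample, and akg_data here is a hand-typed copy of that sample rather than a loaded file.

```python
from datetime import datetime

# Hand-typed copy of the sample from the question.
akg_data = {
    'Technicals': {
        '2017-05-04 12:00': {'AKG': '64.8645'},
        '2017-05-04 12:30': {'AKG': '65.7834'},
        '2017-05-04 13:00': {'AKG': '63.2348'},
    }
}

# Parse each timestamp and number, then sort chronologically.
rows = sorted(
    (datetime.strptime(stamp, '%Y-%m-%d %H:%M'), float(entry['AKG']))
    for stamp, entry in akg_data['Technicals'].items()
)
akg_time = [stamp for stamp, _ in rows]
akg_akg = [value for _, value in rows]
```

The two lists can then be handed to a plotting library directly, or to pandas with something like pd.DataFrame({'time': akg_time, 'AKG': akg_akg}) if a DataFrame is preferred.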

Add missing dictionary key/value via raw_input

import collections
header_dict = {'account number':'ACCOUNT_name','accountID':'ACCOUNT_name','name':'client','first name':'client','tax id':'tin'}
#header_dict = collections.defaultdict(lambda: 'tin') # attempted use of defaultdict...destroys my dictionary
given_header = ['account number','name','tax id']#,'tax identification number']#,'social security number'
#given_header = ['account number','name','tax identification number']...non working header layout
fileLayout = [header_dict[ting] for ting in given_header if ting] #create if else..if ting exists, add to list...else if not in list, add to dictionary
def getLayout(ting):
    global given_header
    global fileLayout
    return given_header[fileLayout.index(ting)]
print getLayout('ACCOUNT_name')
print getLayout('client')
print getLayout('tin')
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
I am working with many files of random, mixed up layouts/column orders. I have a set template for my db table of 'ACCOUNT_name','client','tin' that I want the files to be ordered in. I have created a dictionary of the possible header/column names I might find in other files as keys and my set header names as values. So, for example, if I wanted to see where to put the column 'account number' from one of my given files, I would type header_dict['account number'].
This would give me the corresponding column from my template, 'ACCOUNT_name'. This works great...I also added another feature. Instead of having to type 'account number'..I made a list comprehension that looks up each value by key.
The list I just created with the 'fileLayout' list comprehension essentially transforms my given file's header into my desired names: ['ACCOUNT_name','client','tin']
That makes life a lot easier...I know that I want to look up 'ACCOUNT_name', or 'client'. Next I run a function 'getLayout' that returns the index of the desired columns I am searching...So if I want to see where my desired column 'ACCOUNT_name' is in the file, I just run the function which is called like this...
getLayout('ACCOUNT_name')
Now at this point, I can easily print the columns to my order...with:
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
The above code gives me [('account number'),('name'),('tax id')], which is exactly what I want...
But what if there is a new header I am not used to? Let's use the same example code above but change the list 'given_header' to this:
given_header = ['account number','name','tax identification number']
I get a key error: KeyError: 'tax identification number'. I know I can use defaultdict, but when I try to use it with the set value 'tin', I end up overwriting my entire dictionary. What I would ultimately like to do is this:
I would like to create an else within my list comprehension that lets me enter dictionary entries via standard input if they don't exist. In other words, since 'tax identification number' does not exist as a key, add it to my dict and give it the value 'tin' via raw_input. Has anyone ever done or tried anything like this? If you have any suggestions, I am all ears. I'm struggling with this issue.
The way I would want to go about this is in the list comprehension..
fileLayout = [header_dict[ting] for ting in given_header if ting else raw_input('add missing key value pair to dictionary')] # or do something of the sort.
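A conditional inside the comprehension cannot also assign into the dictionary, so one option is to move the lookup into a small helper. This is a minimal sketch under a few assumptions: it is written for Python 3, where raw_input is called input; the helper name resolve_header and its ask parameter are hypothetical; and fake_prompt stands in for a person typing 'tin' so the example runs unattended.

```python
def resolve_header(name, header_dict, ask=input):
    """Return the template column for name, asking for a mapping if it is unknown."""
    if name not in header_dict:
        # Store the answer so the same header is not asked about twice.
        header_dict[name] = ask("Template column for %r: " % name)
    return header_dict[name]


header_dict = {
    'account number': 'ACCOUNT_name',
    'accountID': 'ACCOUNT_name',
    'name': 'client',
    'first name': 'client',
    'tax id': 'tin',
}
given_header = ['account number', 'name', 'tax identification number']


def fake_prompt(message):
    """Stand-in for input(); always answers 'tin'."""
    return 'tin'


file_layout = [
    resolve_header(header, header_dict, ask=fake_prompt)
    for header in given_header
    if header
]
```

Dropping the ask=fake_prompt argument makes the helper fall back to the real input() prompt.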

How to create a dictionary based on variable value in Python

I am trying to create a dictionary where the name comes from a variable.
Here is the situation since maybe there is a better way:
I'm using an API to get attributes of "objects" (Name, Description, X, Y, Z, etc.). I want to store this information in a way that keeps the data grouped by "object".
In order to get this info, the API iterates through all the "objects".
My idea was that if the object name is one of the ones I want to "capture", I create a dictionary with that name, like so:
ObjectName = {'Description': VarDescrption, 'X': VarX.. etc}
(Where I write "Var...", that would be the value of that attribute passed by the API.)
Since I know the list of names ahead of time, I CAN use a really long if tree, but I am looking for something easier to code (and extensible without adding too much code).
Here is code I have:
def py_cell_object():
    #object counter - unrelated to question
    addtototal()
    #is this an object I want?
    if aw.aw_string (239)[:5] == "TDT3_":
        #If yes, make a dictionary with the object description as the name of the dictionary.
        vars()[aw.aw_string (239)]={'X': aw.aw_int (232), 'Y': aw.aw_int (233), 'Z': aw.aw_int (234), 'No': aw.aw_int (231)}
        #print back result to test
        for key in aw.aw_string (239):
            print 'key=%s, value=%s' % (key, aw.aw_string (239)[key])
Here are the first two lines of code, to show what "aw" is:
from ctypes import *
aw = CDLL("aw")
To explain what the numbers in the API calls are:
231 AW_OBJECT_NUMBER,
232 AW_OBJECT_X,
233 AW_OBJECT_Y,
234 AW_OBJECT_Z,
239 AW_OBJECT_DESCRIPTION,
231-234 are integers and 239 is a string
I deduce that you are using the Active Worlds SDK. It would save time to mention that in the first place in future questions.
I guess your goal is to create a top-level dictionary, where each key is the object description. Each value is another dictionary, storing many of the attributes of that object.
I took a quick look at the AW SDK documentation on the wiki and I don't see a way to ask the SDK for a list of attribute names, IDs, and types. So you will have to hard-code that information in your program somehow. Unless you need it elsewhere, it's simplest to just hard-code it where you create the dictionary, which is what you are already doing. To print it back out, just print the attribute dictionary's repr. I would probably format your method more like this:
def py_cell_object():
    #object counter - unrelated to question
    addtototal()
    description = aw.aw_string(239)
    if description.startswith("TDT3_"):
        vars()[description] = {
            'DESCRIPTION': description,
            'X': aw.aw_int(232),
            'Y': aw.aw_int(233),
            'Z': aw.aw_int(234),
            'NUMBER': aw.aw_int(231),
            # ... etc for remaining attributes
        }
        print repr(vars()[description])
Some would argue that you should make named constants for the numbers 232, 233, 234, etc., but I see little reason to do that unless you need them in multiple places, or unless it's easy to generate them automatically from the SDK (for example, by parsing a .h file).
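The top-level dictionary described above can also be an ordinary dict, which avoids writing into vars(). Here is a minimal sketch of that variant: FakeAW is a made-up stand-in for the SDK handle so the example runs without the library, and record_object and the sample attribute values are hypothetical.

```python
class FakeAW:
    """Stand-in for the SDK handle; the real aw_string/aw_int come from the CDLL."""

    def __init__(self, attrs):
        self._attrs = attrs

    def aw_string(self, attr_id):
        return self._attrs[attr_id]

    def aw_int(self, attr_id):
        return self._attrs[attr_id]


objects = {}  # top-level dictionary: description -> attribute dictionary


def record_object(aw, objects):
    """Store the current object's attributes if its description starts with TDT3_."""
    description = aw.aw_string(239)
    if description.startswith("TDT3_"):
        objects[description] = {
            'X': aw.aw_int(232),
            'Y': aw.aw_int(233),
            'Z': aw.aw_int(234),
            'NUMBER': aw.aw_int(231),
        }


# Made-up objects: one that matches the prefix and one that does not.
record_object(FakeAW({239: "TDT3_door", 231: 7, 232: 100, 233: 0, 234: -250}), objects)
record_object(FakeAW({239: "tree", 231: 8, 232: 1, 233: 2, 234: 3}), objects)
```

Looking up an object later is then just objects["TDT3_door"]["X"], and iterating over objects.items() lists everything that was captured.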
If the variables are defined in the local scope, it's as simple as:
obj_names = {}
while True:
    varname = read_name()
    if not varname: break
    obj_names[varname] = locals()[varname]
This is actual code I am using in my production environment; I hope it helps.
cveDict = {}
# strVul is a Python list holding the vulnerabilities belonging to a report
report = Report.objects.get(pk=report_id)
vul = Vulnerability.objects.filter(report_id=report_id)
strVul = map(str, vul)
# fill up the python dict, += 1 if cvetype already exists
for cve in strVul:
    i = Cve.objects.get(id=cve)
    if i.vul_cvetype in cveDict.keys():
        cveDict[i.vul_cvetype] += 1
    else:
        cveDict[i.vul_cvetype] = 1
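The same "add 1 if the key exists, else start at 1" pattern is available in the standard library as collections.Counter. A minimal sketch follows, with a made-up list of strings standing in for the i.vul_cvetype values fetched from the database.

```python
from collections import Counter

# Made-up CVE types standing in for the i.vul_cvetype values.
cve_types = ["XSS", "SQLi", "XSS", "CSRF", "XSS"]

# Counter tallies each distinct value; missing keys count as 0.
cve_counts = Counter(cve_types)
```

A Counter behaves like a dict, so cve_counts["XSS"] reads a count and cve_counts.most_common() returns the types ordered by frequency.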
