Insert JSON in Array Python - python

I have a dict created in a for loop in Python dict = {year:{month:{day:[title]}}} where year, month, day, and title are all variables. I then use data = json.dumps(dict) which works perfectly. But if the day is the same, I'd like it to add another [title] aspect to the array, so it would be
for title in x:
    dict = {year:{month:{day:[title]}}}
    data = json.dumps(dict)
    if day==day:
        # insert another [title] right next to [title]
I've tried using append, update, and insert, but none of them work.
How would I go about doing this?

Note that, as user2357112 mentioned, you are creating a Python dict, not a Python list (aka a JSON "array"). Thus, when you say "[title] right next to [title]" there is a bit of confusion. A dict maps keys to values rather than holding items in positions, so there is no "next to" slot to insert into (and before Python 3.7 dicts did not preserve insertion order at all).
That, and you are attempting to add a field after you've dumped the JSON to a string. You should do that before you dump it. Moreover, you're throwing away both your dict and data variables on every loop iteration. As written, your code will only have access to the values from the last iteration of the loop.
And another important note: don't overload dict. Rename your variable to something else.
Also, your line day==day will always return True...
Here is what I think you are trying to do: you are creating a "calendar" of sorts that is organized into years, then months, then days. Each day has a list of "titles."
# Variables I'm assuming exist:
# `year`, `month`, `day`, `someOtherDay`, `titles`, `someOtherTitle`
import json

myDict = {}
for title in titles:  # Renamed `x` to `titles` for clarity.
    # Make sure myDict has the necessary keys.
    # (Indexing a missing key raises KeyError, so test membership instead.)
    if year not in myDict:
        myDict[year] = {}
    if month not in myDict[year]:
        myDict[year][month] = {}
    # Set the day to be a list with a single `title` (and possibly two).
    myDict[year][month][day] = [title]
    if day == someOtherDay:
        myDict[year][month][day].append(someOtherTitle)
# And FINALLY dump the result to a string.
data = json.dumps(myDict)
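If the goal is simply "append the title when the day already exists", dict.setdefault can do the key checks and the appending in one step. The following is only a sketch: the `entries` list and the name `calendar` are made up for illustration, and it assumes each title arrives together with its own year, month and day.

```python
import json

# Hypothetical input: one (year, month, day, title) tuple per entry.
entries = [
    (2014, 5, 1, "First title"),
    (2014, 5, 1, "Second title"),
    (2014, 5, 2, "Third title"),
]

calendar = {}
for year, month, day, title in entries:
    # setdefault returns the existing value for the key, or stores and
    # returns the given default when the key is missing.
    days = calendar.setdefault(year, {}).setdefault(month, {})
    days.setdefault(day, []).append(title)

# Dump once, after the loop. json.dumps turns the integer keys into strings.
data = json.dumps(calendar)
print(data)
```

Titles that share a day end up in the same list, and a new day gets a fresh list automatically.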

Related

Doing calculations while creating a List (in Python)

I'm getting data from an API and storing it on Python dictionary (and then a list of dictionaries).
I need to do calculations (max, sum, divisions...) on the dictionary data to create extra data to add to the same dictionary/list.
My current code looks like this:
stream = whatever (whatever, whatever)
keywords = []
for batch in stream:
    for row in batch.results:
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords = {}
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
This doesn't work, it gives UnboundLocalError (local variable 'data_keywords' referenced before assignment). I've tried different options and got different errors.
data_keywords["etv"] is what I want to calculate ("max_clicks", "weighted_clicks" and data_keywords["keywords_weighted"] are intermediate calculations for that)
The main problem is that I need to calculate max and sum for all values inside the dictionary, then do a calculation using that max and sum for each value and then store the results in the dictionary itself.
So I don't know where to put the code to do the calculations (before the dictionary, inside the dictionary, after the dictionary or a mix)
I guess it should be possible, but I'm a Python/programming newbie and can't figure this out.
It's probably not relevant, but in case you are wondering, I'm trying to create a weighted sort (https://moz.com/blog/build-your-own-weighted-sort). And I can't use models/database to store data.
Thanks!
EDIT: Some extra info, in case it helps understand better what I need: The results that the keywords list gives without the calculations is something like this:
[{'keywords_text': 'whatever', 'keywords_clicks': 5, 'keywords_conversion_rate': 6.3}, {'keywords_text': 'whatever2', 'keywords_clicks': 50, 'keywords_conversion_rate': 2.3}, {'keywords_text': 'whatever3', 'keywords_clicks': 20, 'keywords_conversion_rate': 2.0}]
I want basically to add to this keywords list a new key/value of 'etv': 8.5 or whatever for each keyword. That etv should come from the formula that I put on my code (data_keywords["etv"] = ...) but maybe it needs changes to work in Python.
The info from this "original" keywords list comes directly from the API (I don't have that data stored anywhere) and it works perfectly if I just request the info and store it in that list. But the problems come when I introduce the calculations (especially using sum and max inside a loop, I guess).
The UnboundLocalError is because you are trying to access data_keywords["keywords_clicks"] before you have declared data_keywords or set the value for "keywords_clicks".
Also, I think you need to be clearer about what data structure you are trying to create. You mention "a list of dictionaries" which I don't see. Maybe you are trying to create a dictionary of lists, but it looks like you overwrite the dictionary values each time you go through your loop.
adding my response as an answer, as I do not have enough reputation to comment
To get rid of the assignment error, just move the line data_keywords = {} above max_clicks = max(data_keywords["keywords_clicks"]).
Here you are trying to access a local variable before it has been assigned, which is exactly what UnboundLocalError reports.
stream = whatever (whatever, whatever)
keywords = []
for batch in stream:
    for row in batch.results:
        data_keywords = {}
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
More on that here
You can't refer to elements of the dictionary before you create it. Move those variable assignments down to after you assign the dictionary elements. Note that this only fixes the ordering: max() and sum() expect an iterable, so calling them on a single row's number raises a TypeError. The maximum and the sums have to be computed over all rows, after the loop has collected them.
for batch in stream:
    for row in batch.results:
        data_keywords = {}
        data_keywords["keywords_text"] = row.ad_group_criterion.keyword.text
        data_keywords["keywords_clicks"] = row.metrics.clicks
        data_keywords["keywords_conversion_rate"] = row.metrics.conversions_from_interactions_rate
        data_keywords["keywords_weighted"] = row.metrics.clicks * row.metrics.conversions_from_interactions_rate
        max_clicks = max(data_keywords["keywords_clicks"])
        weighted_clicks = sum(data_keywords["keywords_weighted"])/sum(data_keywords["keywords_clicks"])
        data_keywords["etv"] = (data_keywords["keywords_clicks"]/max_clicks*data_keywords["keywords_conversion_rate"])+((1-data_keywords["keywords_clicks"]/max_clicks)*weighted_clicks)
        keywords.append(data_keywords)
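Because max_clicks and weighted_clicks depend on every row, one way to structure this is in two passes: first collect the per-row values, then compute the aggregates once, then fill in etv for each row. The sketch below is an illustration, not the API code: the `rows` list is a hypothetical stand-in built from the sample values in the question, and it assumes Python 3 division, at least one row, and a non-zero click total.

```python
# Hypothetical rows standing in for the API results (values from the question's sample).
rows = [
    {"keywords_text": "whatever", "keywords_clicks": 5, "keywords_conversion_rate": 6.3},
    {"keywords_text": "whatever2", "keywords_clicks": 50, "keywords_conversion_rate": 2.3},
    {"keywords_text": "whatever3", "keywords_clicks": 20, "keywords_conversion_rate": 2.0},
]

# Pass 1: store the per-row values, including the intermediate product.
keywords = []
for row in rows:
    entry = dict(row)
    entry["keywords_weighted"] = entry["keywords_clicks"] * entry["keywords_conversion_rate"]
    keywords.append(entry)

# The aggregates need every row, so compute them once, after pass 1.
max_clicks = max(k["keywords_clicks"] for k in keywords)
total_clicks = sum(k["keywords_clicks"] for k in keywords)
weighted_clicks = sum(k["keywords_weighted"] for k in keywords) / total_clicks

# Pass 2: apply the question's formula to each row.
for k in keywords:
    share = k["keywords_clicks"] / max_clicks
    k["etv"] = share * k["keywords_conversion_rate"] + (1 - share) * weighted_clicks
```

With the real API, pass 1 would read from row.metrics and row.ad_group_criterion inside the nested batch loop; the rest stays the same.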

How can I rename a dictionary within a program?

I ask the user of my program to input the number of datasets he/she wants to investigate, e.g. three datasets. Accordingly, I should then create three dictionaries (dataset_1, dataset_2, and dataset_3) to hold the values for the various parameters. Since I do not know beforehand the number of datasets the user wants to investigate, I have to create and name the dictionaries within the program.
Apparently, Python does not let me do that. I could not rename the dictionary once it has been created.
I have tried using os.rename("oldname", "newname"), but that only works if I have a file stored on my computer hard disk. I could not get it to work with an object that lives only within my program.
number_sets = input('Input the number of datasets to investigate:')
for dataset in range(number_sets):
    init_dict = {}
    # create dictionary name for the particular dataset
    dict_name = ''.join(['dataset_', str(dataset+1)])
    # change the dictionary's name
    # HOW CAN I CHANGE THE DICTIONARY'S NAME FROM "INIT_DICT"
    # TO "DATASET_1", WHICH IS THE STRING RESULT FOR DICT_NAME?
I would like to have in the end
dataset_1 = {}
dataset_2 = {}
and so on.
You don't (need to). Keep a list of data sets.
datasets = []
for i in range(number_sets):
    init_dict = {}
    ...
    datasets.append(init_dict)
Then you have datasets[0], datasets[1], etc., rather than dataset_1, dataset_2, etc.
Inside the loop, init_dict is set to a brand new empty dictionary at the top of each iteration, without affecting the dicts added to datasets on previous iterations.
If you want to create variables like that, you could use globals():
number_sets = 2
for dataset in range(number_sets):
    dict_name = ''.join(['dataset_', str(dataset+1)])
    globals()[dict_name] = {}
print(dataset_1)
print(dataset_2)
However, this is not good practice and should be avoided. If you need to keep several similar variables, the best thing to do is to put them in a list.
You can use a single dict and then add all the data sets to it as nested dictionaries:
all_datasets = {}
for i in range(number_sets):
    all_datasets['dataset_'+str(i+1)] = {}
And then you can access the data by using:
all_datasets['dataset_1']
This question gets asked many times in many different variants (this is one of the more prominent ones, for example). The answer is always the same:
It is not easily possible and most of the time not a good idea to create python variable names from strings.
The easier, more approachable, safer and more usable way is to just use another dictionary. One of the cool things about dictionaries: any hashable object can be a key, and any object at all can be a value. So the possibilities are nearly endless. In your code, this can be done easily with a dict comprehension:
number_sets = int(input('Input the number of datasets to investigate:')) # also notice that you have to add int() here
data = {''.join(['dataset_', str(dataset + 1)]): {} for dataset in range(number_sets)}
print(data)
>>> 5
{'dataset_1': {}, 'dataset_2': {}, 'dataset_3': {}, 'dataset_4': {}, 'dataset_5': {}}
Afterwards, these dictionaries can be easily accessed via data[name_of_dataset]. That's how it should be done.

How to iterate and extract data from this specific JSON file example

I'm trying to extract data from a JSON file with Python.
Mainly, I want to pull out the date and time from the "Technicals" section, to put that in one column of a dataframe, as well as pulling the "AKG" number and putting that in the 2nd col of the dataframe. Yes, I've looked at similar questions, but this issue is different. Thanks for your help.
A downNdirty example of the JSON file is below:
{ 'Meta Data': { '1: etc'
'2: etc'},
'Technicals': { '2017-05-04 12:00': { 'AKG': '64.8645'},
'2017-05-04 12:30': { 'AKG': '65.7834'},
'2017-05-04 13:00': { 'AKG': '63.2348'}}}
As you can see, and what's stumping me, is while the date stays the same the time advances. 'AKG' never changes, but the number does. Some of the relevant code I've been using is below. I can extract the date and time, but I can't seem to reach the AKG numbers. Note, I don't need the "AKG", just the number.
I'll mention: I'm creating a DataFrame because this will be easier to work with when creating plots with the data...right? I'm open to an array of lists et al, or anything easier, if that will ultimately help me with the plots.
akg_time = []
akg_akg = []
technicals = akg_data['Technicals'] #akg_data is the entire json file
for item in technicals: #this works
    akg_time.append(item)
for item in technicals: #this not so much
    symbol = item.get('AKG')
    akg_akg.append(symbol)
pp.pprint(akg_akg)
error: 'str' object has no attribute 'get'
You've almost got it. You don't even need the second loop. You can append the akg value in the first one itself:
for key in technicals: # renaming to key because that is a clearer name
    akg_time.append(key)
    akg_akg.append(technicals[key]['AKG'])
Your error is because you believe item (or key) is a dict. It is not. It is just a string, one of the keys of the technicals dictionary, so you'd actually need to use symbol = technicals[key].get('AKG').
Although Coldspeed's answer is right, when you have a dictionary you can loop through keys and values together like this (note that the key is 'AKG' in upper case, as in the data):
Python 3
for key,value in technicals.items():
    akg_time.append(key)
    akg_akg.append(value["AKG"])
Python 2
for key,value in technicals.iteritems():
    akg_time.append(key)
    akg_akg.append(value["AKG"])
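Since the end goal is plotting, it may help to convert the quoted numbers to floats and keep each timestamp paired with its value. This is a minimal sketch using only the sample data from the question; the name `rows` is introduced here for illustration.

```python
# Sample data copied from the question's "Technicals" section.
akg_data = {
    "Technicals": {
        "2017-05-04 12:00": {"AKG": "64.8645"},
        "2017-05-04 12:30": {"AKG": "65.7834"},
        "2017-05-04 13:00": {"AKG": "63.2348"},
    }
}

# One (timestamp, value) pair per entry. float() turns the quoted numbers into
# real numbers, and sorting the string timestamps works because they are
# zero-padded in year-month-day hour:minute order.
rows = sorted(
    (timestamp, float(values["AKG"]))
    for timestamp, values in akg_data["Technicals"].items()
)

akg_time = [timestamp for timestamp, _ in rows]
akg_akg = [value for _, value in rows]
```

If pandas is available, a list of tuples like `rows` can be passed to pandas.DataFrame together with a columns list to get the two-column frame described in the question.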

Add missing dictionary key/value via raw_input

import collections
header_dict = {'account number':'ACCOUNT_name','accountID':'ACCOUNT_name','name':'client','first name':'client','tax id':'tin'}
#header_dict = collections.defaultdict(lambda: 'tin') # attempted use of defaultdict...destroys my dictionary
given_header = ['account number','name','tax id']#,'tax identification number']#,'social security number'
#given_header = ['account number','name','tax identification number']...non working header layout
fileLayout = [header_dict[ting] for ting in given_header if ting] #create if else..if ting exists, add to list...else if not in list, add to dictionary
def getLayout(ting):
    global given_header
    global fileLayout
    return given_header[fileLayout.index(ting)]
print getLayout('ACCOUNT_name')
print getLayout('client')
print getLayout('tin')
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
I am working with many files of random, mixed up layouts/column orders. I have a set template for my db table of 'ACCOUNT_name','client','tin' that I want the files to be ordered in. I have created a dictionary of the possible header/column names I might find in other files as keys and my set header names as values. So, for example, if I wanted to see where to put the column 'account number' from one of my given files, I would type header_dict['account number'].
This would give me the corresponding column from my template, 'ACCOUNT_name'. This works great...I also added another feature. Instead of having to type 'account number'..I made a list comprehension that looks up each value by key.
This list I just created with the 'fileLayout' list comprehension essentially transforms my given file's header into my desired names: ['ACCOUNT_name','client','tin']
That makes life a lot easier...I know that I want to look up 'ACCOUNT_name', or 'client'. Next I run a function 'getLayout' that returns the index of the desired columns I am searching...So if I want to see where my desired column 'ACCOUNT_name' is in the file, I just run the function which is called like this...
getLayout('ACCOUNT_name')
Now at this point, I can easily print the columns to my order...with:
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
The above code gives me [('account number'),('name'),('tax id')], which is exactly what I want...
But what if there is a new header I am not used to ?? Lets use the same example code above but change the list 'given_header' to this:
given_header = ['account number','name','tax identification number']
I most certainly get the key error, KeyError: 'tax identification number' I know I can use defaultdict but when I try to use it with the set value 'tin', I end up overwriting my entire dictionary... What I would ultimately like to end up doing is this...
I would like to create an else within my list comprehension that lets me enter dictionary entries from standard input if they don't exist. In other words, since 'tax identification number' does not exist as a key, add it to my dict and give it the value 'tin' via raw_input. Has anyone ever done or tried anything like this? Any ideas? If you have any suggestions, I am all ears. I'm struggling with this issue...
The way I would want to go about this is in the list comprehension..
fileLayout = [header_dict[ting] for ting in given_header if ting else raw_input('add missing key value pair to dictionary')] # or do something of the sort.
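A conditional inside the comprehension cannot both look up and update the dictionary cleanly, so one possible approach is a small helper that asks for the missing value and stores it. This is only a sketch in Python 3: `resolve_header`, `build_layout` and the `ask` parameter are names invented here, and in Python 2 raw_input would take the place of input.

```python
header_dict = {
    "account number": "ACCOUNT_name",
    "accountID": "ACCOUNT_name",
    "name": "client",
    "first name": "client",
    "tax id": "tin",
}

def resolve_header(header, mapping, ask=input):
    """Return the template column for `header`, asking once if it is unknown."""
    if header not in mapping:
        # Store the answer so the same header is not asked about again.
        mapping[header] = ask("Template column for %r: " % header)
    return mapping[header]

def build_layout(given_header, mapping, ask=input):
    # Skip empty header cells, as the original comprehension does.
    return [resolve_header(h, mapping, ask) for h in given_header if h]
```

Calling build_layout(['account number', 'name', 'tax identification number'], header_dict) would prompt once for the unknown header and remember the answer in header_dict. Passing a different `ask` function makes it possible to test the logic without typing anything.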

Generating a .CSV with Several Columns - Use a Dictionary?

I am writing a script that looks through my inventory, compares it with a master list of all possible inventory items, and tells me what items I am missing. My goal is a .csv file where the first column contains a unique key integer and then the remaining several columns would have data related to that key. For example, a three row snippet of my end-goal .csv file might look like this:
100001,apple,fruit,medium,12,red
100002,carrot,vegetable,medium,10,orange
100005,radish,vegetable,small,10,red
The data for this is being drawn from a couple sources. 1st, a query to an API server gives me a list of keys for items that are in inventory. 2nd, I read in a .csv file into a dict that matches keys with item name for all possible keys. A snippet of the first 5 rows of this .csv file might look like this:
100001,apple
100002,carrot
100003,pear
100004,banana
100005,radish
Note how any key in my list of inventory will be found in this two column .csv file that gives all keys and their corresponding item name and this list minus my inventory on hand yields what I'm looking for (which is the inventory I need to get).
So far I can get a .csv file that contains just the keys and item names for the items that I don't have in inventory. Given a list of inventory on hand like this:
100003,100004
A snippet of my resulting .csv file looks like this:
100001,apple
100002,carrot
100005,radish
This means that I have pear and banana in inventory (so they are not in this .csv file.)
To get this I have a function to get an item name when given an item id that looks like this:
def getNames(id_to_name, ids):
    return [id_to_name[id] for id in ids]
Then a function which gives a list of keys as integers from my inventory server API call that returns a list and I've run this function like this:
invlist = ServerApiCallFunction(AppropriateInfo)
A third function takes this invlist as its input and returns a dict of keys (the item id) and names for the items I don't have. It also writes the information of this dict to a .csv file. I am using the set1 - set2 method to do this. It looks like this:
def InventoryNumbers(inventory):
    with open(csvfile,'w') as c:
        c.write('InvName' + ',InvID' + '\n')
    missinginvnames = []
    with open("KeyAndItemNameTwoColumns.csv","rb") as fp:
        reader = csv.reader(fp, skipinitialspace=True)
        fp.readline() # skip header
        invidsandnames = {int(id): str.upper(name) for id, name in reader}
    invids = set(invidsandnames.keys())
    invnames = set(invidsandnames.values())
    invonhandset = set(inventory)
    missinginvidsset = invids - invonhandset
    missinginvids = list(missinginvidsset)
    missinginvnames = getNames(invidsandnames, missinginvids)
    missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
    print missinginvnameswithids
    with open(csvfile,'a') as c:
        for invname, invid in missinginvnameswithids.iteritems():
            c.write(invname + ',' + str(invid) + '\n')
    return missinginvnameswithids
Which I then call like this:
InventoryNumbers(invlist)
With that explanation, now on to my question here. I want to expand the data in this output .csv file by adding in additional columns. The data for this would be drawn from another .csv file, a snippet of which would look like this:
100001,fruit,medium,12,red
100002,vegetable,medium,10,orange
100003,fruit,medium,14,green
100004,fruit,medium,12,yellow
100005,vegetable,small,10,red
Note how this does not contain the item name (so I have to pull that from a different .csv file that just has the two columns of key and item name) but it does use the same keys. I am looking for a way to bring in this extra information so that my final .csv file will not just tell me the keys (which are item ids) and item names for the items I don't have in stock but it will also have columns for type, size, number, and color.
One option I've looked at is the defaultdict piece from collections, but I'm not sure if this is the best way to go about what I want to do. If I did use this method I'm not sure exactly how I'd call it to achieve my desired result. If some other method would be easier I'm certainly willing to try that, too.
How can I take my dict of keys and corresponding item names for items that I don't have in inventory and add to it this extra information in such a way that I could output it all to a .csv file?
EDIT: As I typed this up it occurred to me that I might make things easier on myself by creating a new single .csv file that would have data in the form key,item name,type,size,number,color (basically just copying the item name column into the .csv that already has the other information for each key). This way I would only need to draw from one .csv file rather than from two. Even if I did this, though, how would I go about making my desired .csv file based on only those keys for items not in inventory?
ANSWER: I posted another question here about how to implement the solution I accepted (because it was giving me a value error, since my dict values were strings rather than sets to start with) and I ended up deciding that I wanted a list rather than a set (to preserve the order). I also ended up adding the column with item names to my .csv file that had all the other data so that I only had to draw from one .csv file. That said, here is what this section of code now looks like:
MyDict = {}
infile = open('FileWithAllTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    if int(spl_line[0]) in missinginvids: #note that this is the list I was using as the keys for my dict which I was zipping together with a corresponding list of item names to make my dict before.
        MyDict.setdefault(int(spl_line[0]), list()).append(spl_line[1:])
print MyDict
It sounds like what you need is a dict mapping ints to sets, i.e.,
MyDict = {100001: set(['apple']), 100002: set(['carrot'])}
You can add to a set with update:
MyDict[100001].update(['fruit'])
which would give you: {100001: set(['apple', 'fruit']), 100002: set(['carrot'])}
Also, if you had a list of attributes of carrot, ['vegetable', 'orange'],
you could say MyDict[100002].update(['vegetable', 'orange'])
and get: {100001: set(['apple', 'fruit']), 100002: set(['carrot', 'vegetable', 'orange'])}
Does this answer your question?
EDIT:
To read the extra columns from a CSV file:
infile = open('MyFile.csv', 'r')
for line in infile.readlines():
    spl_line = line.strip().split(',')  # strip the trailing newline first
    key = int(spl_line[0])  # the dict keys are ints, so convert before the lookup
    if key in MyDict:
        MyDict[key].update(spl_line[1:])
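Since the asker ended up preferring lists (to keep the column order), the same idea can be sketched with the csv module and two lookup dicts. Everything here is an illustration under assumptions: the in-memory `io.StringIO` objects stand in for the real files, the data is copied from the question's snippets without a header row, and both files are assumed to cover the same ids.

```python
import csv
import io

# Hypothetical file contents, taken from the question's snippets.
names_csv = io.StringIO(
    "100001,apple\n100002,carrot\n100003,pear\n100004,banana\n100005,radish\n"
)
details_csv = io.StringIO(
    "100001,fruit,medium,12,red\n"
    "100002,vegetable,medium,10,orange\n"
    "100003,fruit,medium,14,green\n"
    "100004,fruit,medium,12,yellow\n"
    "100005,vegetable,small,10,red\n"
)
inventory_on_hand = {100003, 100004}

# id -> item name, and id -> [type, size, number, color].
names = {int(row[0]): row[1] for row in csv.reader(names_csv)}
details = {int(row[0]): row[1:] for row in csv.reader(details_csv)}

# Keep only ids that are not on hand, in ascending id order.
missing_rows = [
    [item_id, names[item_id]] + details[item_id]
    for item_id in sorted(names)
    if item_id not in inventory_on_hand
]

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(missing_rows)
print(out.getvalue())
```

With real files, the StringIO objects would be replaced by files opened with open(..., newline=''), which is what the csv module documentation recommends in Python 3.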
This isn't an answer to the question, but here is a possible way of simplifying your current code.
This:
invids = set(invidsandnames.keys())
invnames = set(invidsandnames.values())
invonhandset = set(inventory)
missinginvidsset = invids - invonhandset
missinginvids = list(missinginvidsset)
missinginvnames = getNames(invidsandnames, missinginvids)
missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
Can be replaced with (note that the result maps id to name rather than name to id):
invonhandset = set(inventory)
missinginvnameswithids = {k: v for k, v in invidsandnames.iteritems() if k not in invonhandset}
Or:
invonhandset = set(inventory)
for key in invidsandnames.keys():
    if key in invonhandset:
        # drop the items that are on hand, leaving only the missing ones
        del invidsandnames[key]
missinginvnameswithids = invidsandnames
Have you considered making a temporary relational database? Python has sqlite support baked in, and for a reasonable number of items I don't think you would have performance issues.
I would turn each CSV file and the result from the web API into a table (one table per data source). You can then do everything you want with some SQL queries and joins. Once you have the data you want, you can dump it back to CSV.
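A rough sketch of that idea with the standard-library sqlite3 module follows. The table and column names (names, details, on_hand, item_type, item_size, quantity) are made up for illustration, and the rows are typed in from the question's snippets instead of being loaded from the real CSV files or the API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE details (id INTEGER PRIMARY KEY, item_type TEXT, "
    "item_size TEXT, quantity INTEGER, color TEXT)"
)
conn.execute("CREATE TABLE on_hand (id INTEGER PRIMARY KEY)")

conn.executemany("INSERT INTO names VALUES (?, ?)", [
    (100001, "apple"), (100002, "carrot"), (100003, "pear"),
    (100004, "banana"), (100005, "radish"),
])
conn.executemany("INSERT INTO details VALUES (?, ?, ?, ?, ?)", [
    (100001, "fruit", "medium", 12, "red"),
    (100002, "vegetable", "medium", 10, "orange"),
    (100003, "fruit", "medium", 14, "green"),
    (100004, "fruit", "medium", 12, "yellow"),
    (100005, "vegetable", "small", 10, "red"),
])
conn.executemany("INSERT INTO on_hand VALUES (?)", [(100003,), (100004,)])

# Join names with details and keep only the ids that are not on hand.
missing = conn.execute(
    """
    SELECT n.id, n.name, d.item_type, d.item_size, d.quantity, d.color
    FROM names AS n
    JOIN details AS d ON d.id = n.id
    WHERE n.id NOT IN (SELECT id FROM on_hand)
    ORDER BY n.id
    """
).fetchall()
conn.close()
```

The `missing` list of tuples can then be written out with csv.writer to produce the final file.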
