How to create a hierarchical dictionary from a csv? - python

I am trying to build a hierarchical dict (please see the desired output below) from my csv file.
The following is my code so far. I was searching through itertools, thinking that might be the right tool for this task; I cannot use pandas. I think I may need to put the values of the key into a new dictionary, then map the policy interfaces and build a new dict from that.
import csv
import pprint
from itertools import groupby

new_dict = []
with open("test_.csv", newline="") as file_data:
    reader = csv.DictReader(file_data)
    for keys, grouping in groupby(reader, lambda x: x['groupA_policy']):
        new_dict.append(list(grouping))
pprint.pprint(new_dict)
My csv file looks like this:
GroupA_Host,groupA_policy,groupA_policy_interface,GroupB_Host,GroupB_policy,GroupB_policy_interface
host1,policy10,eth0,host_R,policy90,eth9
host1,policy10,eth0.1,host_R,policy90,eth9.1
host2,policy20,eth2,host_Q,policy80,eth8
host2,policy20,eth2.1,host_Q,policy80,eth8.1
The desired output I want to achieve is this:
[{'GroupA_Host': 'host1',
  'GroupB_Host': 'host_R',
  'GroupB_policy': 'policy90',
  'groupA_policy': 'policy10',
  'interfaces': [{'GroupB_policy_interface': 'eth9',
                  'groupA_policy_interface': 'eth0'},
                 {'GroupB_policy_interface': 'eth9.1',
                  'groupA_policy_interface': 'eth0.1'}]},
 {'GroupA_Host': 'host2',
  'GroupB_Host': 'host_Q',
  'GroupB_policy': 'policy80',
  'groupA_policy': 'policy20',
  'interfaces': [{'GroupB_policy_interface': 'eth8',
                  'groupA_policy_interface': 'eth2'},
                 {'GroupB_policy_interface': 'eth8.1',
                  'groupA_policy_interface': 'eth2.1'}]}]

I don't think itertools is necessary here. The important thing is to recognize that you're using ('GroupA_Host', 'GroupB_Host', 'groupA_policy', 'GroupB_policy') as the key for the grouping -- so you can use a dictionary to collect interfaces keyed on this key:
d = {}
for row in reader:
    key = row['GroupA_Host'], row['GroupB_Host'], row['groupA_policy'], row['GroupB_policy']
    interface = {'groupA_policy_interface': row['groupA_policy_interface'],
                 'GroupB_policy_interface': row['GroupB_policy_interface']}
    # Collect the interface pairs under the shared (hosts, policies) key
    if key in d:
        d[key].append(interface)
    else:
        d[key] = [interface]

# Unpack the grouping dict into the requested list of records
as_list = []
for key, interfaces in d.items():
    record = {}
    record['GroupA_Host'] = key[0]
    record['GroupB_Host'] = key[1]
    record['groupA_policy'] = key[2]
    record['GroupB_policy'] = key[3]
    record['interfaces'] = interfaces
    as_list.append(record)
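For reference, here is a minimal end-to-end sketch of the same approach against the sample test_.csv above (assuming Python 3; the column names are taken from the CSV header):
import csv
import pprint

d = {}
with open("test_.csv", newline="") as file_data:
    reader = csv.DictReader(file_data)
    for row in reader:
        key = (row['GroupA_Host'], row['GroupB_Host'],
               row['groupA_policy'], row['GroupB_policy'])
        # One interface-pair dict per CSV row, grouped under the shared key
        d.setdefault(key, []).append({
            'groupA_policy_interface': row['groupA_policy_interface'],
            'GroupB_policy_interface': row['GroupB_policy_interface'],
        })

# Flatten the grouping dict back into the requested list of records
as_list = [
    {'GroupA_Host': a_host, 'GroupB_Host': b_host,
     'groupA_policy': a_pol, 'GroupB_policy': b_pol,
     'interfaces': interfaces}
    for (a_host, b_host, a_pol, b_pol), interfaces in d.items()
]
pprint.pprint(as_list)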


Save dataframes to multiple CSVs retaining dataframe name

How can I export multiple dataframes to CSVs that have the same title, in general code?
I tried:
dframes_list = [economy, finance, language]
for i, df in enumerate(dframes_list, 1):
    filename_attempt1 = "{}.csv".format(i)
    filename_attempt2 = f"{i}.csv"
    df.to_csv(filename_attempt2)
Expected Output:
file saved: "economy.csv"
file saved: "finance.csv"
file saved: "language.csv"
In Python it is strongly discouraged to recover variable names as strings, because generating a string from a variable's name is not trivial.
The best approach is to create another list with the names as strings and use zip:
dframes_list = [economy, finance, language]
names = ['economy', 'finance', 'language']
for name, df in zip(names, dframes_list):
    filename = "df_{}.csv".format(name)
    df.to_csv(filename)  # write each DataFrame under its own name
Another idea is to create a dict of DataFrames:
dframes_dict = {'economy': economy, 'finance': finance, 'language': language}
for name, df in dframes_dict.items():
    filename = "df_{}.csv".format(name)
    df.to_csv(filename)
If you need to keep working with the dict of DataFrames, use:
for k, v in dframes_dict.items():
    v = v.set_index('date')
    # other code for processing each DataFrame
    dframes_dict[k] = v
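Putting the pieces together, here is a minimal sketch that saves each frame under its own name; the tiny DataFrames below are hypothetical stand-ins for the real economy, finance and language frames:
import pandas as pd

# Hypothetical stand-ins for the real economy / finance / language DataFrames
economy = pd.DataFrame({'date': ['2020', '2021'], 'gdp': [1.0, 1.1]})
finance = pd.DataFrame({'date': ['2020', '2021'], 'rate': [0.50, 0.25]})
language = pd.DataFrame({'date': ['2020', '2021'], 'speakers': [7, 8]})

dframes_dict = {'economy': economy, 'finance': finance, 'language': language}
for name, df in dframes_dict.items():
    filename = f"{name}.csv"
    df.to_csv(filename, index=False)
    print(f'file saved: "{filename}"')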
If you're doing this in a notebook, you can use a hack: search locals() and use a regex to match 'dframes_list = [.+]', which should return a string value like
'dframes_list = [economy, finance, language]'
which you can then whittle down by replacing until you are left with 'economy, finance, language', at which point you can split it into a list of names.
A Colab version works like this:
import re

temp_local = dict(locals())
data = {}
for k, v in temp_local.items():
    try:
        if re.match(r'dframes_list = \[.+\]', v):
            data[k] = v
            print(k, v)
    except TypeError:  # value was not a string
        pass
then,
names = re.findall(r'\[.+\]', data[key])[0].replace('[', '').replace(']', '').split(',')
where key has been identified from the data dict.
This isn't recommended, though.

How do I create a list as a key of a dictionary and add to the list in different parts of a loop?

I have a for loop that runs through a CSV file and grabs certain elements and creates a dictionary based on two variables.
Code:
for ind, row in sf1.iterrows():
    sf1_date = row['datekey']
    sf1_ticker = row['ticker']
    company_date[sf1_ticker] = [sf1_date]
If, for example, during the first iteration of the loop sf1_ticker = 'AAPL' and sf1_date = '2020/03/01', and the next time around sf1_ticker = 'AAPL' and sf1_date = '2020/06/01', how do I make the entry for 'AAPL' in the dictionary equal to ['2020/03/01', '2020/06/01']?
It appears that when you say "key" you actually mean "value". The keys for a dictionary are the things that you use to lookup values in the dictionary. In your case ticker is the key and a list of dates are the values, e.g. you want a dictionary that looks like this:
{'AAPL': ['2020/03/01', '2020/06/01'],
 'MSFT': ['2020/04/01', '2020/09/01']}
Here the strings AAPL and MSFT are dictionary keys. The date lists are the values associated with each key.
Your code can not construct such a dictionary because it is assigning a new value to the key. The following code will either create a new key in the dictionary company_date if the key does not already exist in the dictionary, or replace the existing value if the key already exists:
company_date[sf1_ticker] = [sf1_date]
You need to append to a list of values in the dict, rather than replace the current list, if any. There are a couple of ways to do it; dict.setdefault() is one:
company_date = {}
for ind, row in sf1.iterrows():
    sf1_date = row['datekey']
    sf1_ticker = row['ticker']
    company_date.setdefault(sf1_ticker, []).append(sf1_date)
Another way is with a collections.defaultdict of list:
from collections import defaultdict

company_date = defaultdict(list)
for ind, row in sf1.iterrows():
    sf1_date = row['datekey']
    sf1_ticker = row['ticker']
    company_date[sf1_ticker].append(sf1_date)
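Since sf1 is apparently a pandas DataFrame (it is iterated with iterrows), a vectorised alternative, assuming the columns really are named 'ticker' and 'datekey', is a simple groupby; this is only a sketch of that idea:
# {'AAPL': ['2020/03/01', '2020/06/01'], ...} without an explicit loop
company_date = sf1.groupby('ticker')['datekey'].apply(list).to_dict()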
You could create a new dictionary and add the date to the list if it exists. Otherwise, create the entry.
ticker_dates = {}
# Would give ticker_dates = {"AAPL": ['2020/03/01', '2020/06/01']}
for ind, row in sf1.iterrows():
    sf1_ticker = row['ticker']
    sf1_date = row['datekey']
    if sf1_ticker in ticker_dates:
        ticker_dates[sf1_ticker].append(sf1_date)
    else:
        ticker_dates[sf1_ticker] = [sf1_date]
You can use a defaultdict, which can be set up to add an empty list for any key that doesn't exist. Otherwise it generally acts like a regular dictionary.
from collections import defaultdict

rows = [
    ['AAPL', '2020/03/01'],
    ['AAPL', '2020/06/01'],
    ['GOOGL', '2021/01/01']
]

company_date = defaultdict(list)
for ticker, date in rows:
    company_date[ticker].append(date)

print(company_date)
# defaultdict(<class 'list'>, {'AAPL': ['2020/03/01', '2020/06/01'], 'GOOGL': ['2021/01/01']})

Creating Dataframe with JSON Keys

I have a JSON file which resulted from YouTube's iframe API and I want to put this JSON data into a pandas dataframe, where each JSON key will be a column, and each record should be a new row.
Normally I would use a loop and iterate over the rows of the JSON, but this particular JSON looks like this:
[
"{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
"{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"
]
In this JSON, not every key is on a new line. How can I extract the keys in this case and express them as columns?
A Pythonic solution would be to use the keys() and values() methods of the Python dictionary.
It should be something like this:
import json

ls = [
"{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
"{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"
]
ls = [json.loads(j) for j in ls]
keys = [j.keys() for j in ls]    # this gets you all the keys
vals = [j.values() for j in ls]  # this gets the values, so you can do something with them
print(keys)
print(vals)
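As a follow-up, the parsed list of dictionaries can also be handed straight to the DataFrame constructor, which turns the shared keys into columns; a short sketch reusing the ls list from the block above:
import pandas as pd

# ls here is the list of dicts produced by json.loads in the block above
df = pd.DataFrame(ls)  # one column per key, one row per record
print(df[['timemillis', 'videoId', 'curTimeFormatted']])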
The easiest way is to leverage json_normalize from pandas.
import json
from pandas import json_normalize  # in older pandas versions: from pandas.io.json import json_normalize
input_dict = [
"{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
"{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"
]
input_json = [json.loads(j) for j in input_dict]
df = json_normalize(input_json)
I think you are asking to break down your keys and values, with the keys as columns and the values as a row.
This is my approach (and please always include what your expected output should look like).
ChainMap flattens the dicts into keys and values and is pretty much self-explanatory.
data = ["{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}","{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"]
import json
import pandas as pd
from collections import ChainMap

data = [json.loads(i) for i in data]
data = dict(ChainMap(*data))
keys = []
vals = []
for k, v in data.items():
    keys.append(k)
    vals.append(v)

data = pd.DataFrame(zip(keys, vals)).T
new_header = data.iloc[0]
data = data[1:]
data.columns = new_header
#startSecond playbackRates playbackRate qual totalTimeFormatted timemillis playerStateNumeric playerStateVerbose playerErrorNumeric date time stopSecond bufferLevelPercent playerErrorVerbose qualLevels videoId curTimeFormatted playoutLevelPercent
#0 [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2] 1 large 9:46 1563467467703 1 Playing 18.7.2019 18:31:07,703 90 1.4 [hd720, large, medium, small, tiny, auto] 0HJx2JhQKQk 0:02 0.3

How to insert dictionaries as values into a dictionary using a loop in Python

I am currently facing a problem turning my csv data into a dictionary.
I have 3 columns that I'd like to use in the file:
userID, placeID, rating
U1000, 12222, 3
U1000, 13333, 2
U1001, 13333, 4
I would like to make the result look like this:
{'U1000': {'12222': 3, '13333': 2},
'U1001': {'13333': 4}}
That is to say,
I would like to make my data structure look like:
sample = {}
sample["U1000"] = {}
sample["U1001"] = {}
sample["U1000"]["12222"] = 3
sample["U1000"]["13333"] = 2
sample["U1001"]["13333"] = 4
but I have a lot of data to process.
I'd like to get the result with a loop, but I have tried for 2 hours and failed.
--- the following code may be confusing ---
My result currently looks like this:
{'U1000': ['12222', 3],
 'U1001': ['13333', 4]}
The value of the dict is a list rather than a dictionary, and the user "U1000" appears multiple times in the data but only once in my result.
I think my code has many mistakes; if you don't mind, please take a look:
import numpy as np
import pandas as pd

reader = np.array(pd.read_csv("rating_final.csv"))
included_cols = [0, 1, 2]

sample = {}
target = []
target1 = []
for row in reader:
    content = list(row[i] for i in included_cols)
    target.append(content[0])
    target1.append(content[1:3])

sample = dict(zip(target, target1))
How can I improve the code?
I have looked through Stack Overflow, but due to my own lack of ability I haven't found a solution.
Can anyone please kindly help me with this?
Many thanks!!
This should do what you want:
import collections

reader = ...
sample = collections.defaultdict(dict)
for user_id, place_id, rating in reader:
    rating = int(rating)
    sample[user_id][place_id] = rating

print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
defaultdict is a convenience utility that provides default values whenever you try to access a key that is not in the dictionary. If you don't like it (for example because you want sample['non-existent-user-id'] to fail with KeyError), use this:
reader = ...
sample = {}
for user_id, place_id, rating in reader:
    rating = int(rating)
    if user_id not in sample:
        sample[user_id] = {}
    sample[user_id][place_id] = rating
You can get {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}} with a dict of dicts:
sample = {}
for row in reader:
    userID, placeID, rating = row[:3]
    sample.setdefault(userID, {})[placeID] = rating  # possibly int(rating)?
Alternatively, using collections.defaultdict(dict) to avoid the need for setdefault (or alternate approaches that involve a try/except KeyError or if userID in sample: that sacrifice the atomicity of setdefault in exchange for not creating empty dicts unnecessarily):
import collections

sample = collections.defaultdict(dict)
for row in reader:
    userID, placeID, rating = row[:3]
    sample[userID][placeID] = rating

# Optional conversion back to a plain dict
sample = dict(sample)
The conversion back to plain dict ensures future lookups don't auto-vivify keys, raising KeyError as normal, and it looks like a normal dict if you print it.
If the included_cols is important (because names or column indices might change), you can use operator.itemgetter to speed up and simplify extracting all the desired columns at once:
from collections import defaultdict
from operator import itemgetter

included_cols = (0, 1, 2)
# If the columns in the data were actually:
#     rating, foo, bar, userID, placeID
# we'd do this instead, and itemgetter would handle all the rest:
#     included_cols = (3, 4, 0)
get_cols = itemgetter(*included_cols)  # create a function that gets all needed indices at once

sample = defaultdict(dict)
# map(get_cols, ...) efficiently converts each row to a tuple of just the
# three desired values as it goes, which also lets us unpack directly in
# the for loop, simplifying the code further by naming all variables directly
for userID, placeID, rating in map(get_cols, reader):
    sample[userID][placeID] = rating  # possibly int(rating)?
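For completeness, here is a minimal sketch of the whole itemgetter approach using the standard csv module instead of pandas, assuming rating_final.csv has the header row userID,placeID,rating shown in the question (skipinitialspace handles the spaces after the commas):
import csv
from collections import defaultdict
from operator import itemgetter

included_cols = (0, 1, 2)
get_cols = itemgetter(*included_cols)

sample = defaultdict(dict)
with open("rating_final.csv", newline="") as f:
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row
    for userID, placeID, rating in map(get_cols, reader):
        sample[userID][placeID] = int(rating)

print(dict(sample))
# {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}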

Python Group Array by Column and Display Unique Values

I have an Array of Arrays with following format:
x = [["Username1","id3"],
["Username1", "id4"],
["Username1", "id4"],
["Username3", "id3"]]
I want to group by the ids and display all the unique usernames
How would I get an output that is like:
id3: Username1, Username3
id4: Username1
Edit: I was able to group by the second column, but I cannot display only unique values. Here is my code:
from itertools import groupby

data = {}
for key, group in groupby(sorted(x), key=lambda x: x[1]):
    data[key] = [v[0] for v in group]
print(data)
Use a dict to create unique keys by id and Python sets to store the values (so only unique names are stored for each key):
items = [
    ["Username1", "id3"],
    ["Username1", "id4"],
    ["Username1", "id4"],
    ["Username3", "id3"]
]

data = {}
for item in items:
    if item[1] in data:
        data[item[1]].add(item[0])
    else:
        data[item[1]] = set([item[0]])
print(data)
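A variant of the same idea with collections.defaultdict(set), plus a small loop that prints the result in the exact format asked for in the question:
from collections import defaultdict

x = [["Username1", "id3"],
     ["Username1", "id4"],
     ["Username1", "id4"],
     ["Username3", "id3"]]

data = defaultdict(set)  # each id maps to a set of unique usernames
for username, id_ in x:
    data[id_].add(username)

for id_, usernames in sorted(data.items()):
    print(f"{id_}: {', '.join(sorted(usernames))}")
# id3: Username1, Username3
# id4: Username1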
You could use a plain for loop, but a LINQ-style grouping statement might be cleaner for future use.
https://stackoverflow.com/a/3926105/4564614
has some great ways to incorporate this kind of grouping to solve the issue. I think what you are looking for is a group-by.
Example:
from collections import defaultdict
from operator import attrgetter

def group_by(iterable, group_func):
    groups = defaultdict(list)
    for item in iterable:
        groups[group_func(item)].append(item)
    return groups

group_by((x.foo for x in ...), attrgetter('bar'))
