How to convert a list of nested dictionaries to a pandas DataFrame? - python

I have some data containing nested dictionaries like below:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
If we convert it to pandas DataFrame,
import pandas as pd
result_dataframe = pd.DataFrame(mylist)
print(result_dataframe)
It will output:
a b
0 1 {'c': 2, 'd': 3}
1 3 {'c': 4, 'd': 3}
I want to convert the list of dictionaries and ignore the key of the nested dictionary. My code is below:
new_dataframe = result_dataframe.drop(columns=["b"])
b_dict_list = [document["b"] for document in mylist]
b_df = pd.DataFrame(b_dict_list)
frames = [new_dataframe, b_df]
total_frame = pd.concat(frames, axis=1)
The resulting total_frame is what I want:
a c d
0 1 2 3
1 3 4 3
But I think my code is a little complicated. Is there any simple way to deal with this problem? Thank you.

I had a similar problem to this one. I used pd.json_normalize(x) and it worked. The only difference is that the column names of the data frame will look a little different.
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
df = pd.json_normalize(mylist)
print(df)
Output:
a b.c b.d
0 1 2 3
1 3 4 3
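If the dotted names are not wanted, one possible follow-up is to keep only the last segment of each column name. This is a minimal sketch, assuming the inner keys do not clash with each other or with a top-level key; if they did, the rename would produce duplicate column names.

```python
import pandas as pd

mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4, "d": 3}}]

df = pd.json_normalize(mylist)
# "b.c" -> "c", "b.d" -> "d"; names without a dot are left unchanged
df.columns = [col.split(".")[-1] for col in df.columns]
print(df)
```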

Use a list comprehension with pop to extract the value of b and merge the dictionaries (note that pop removes b from the dictionaries in mylist):
a = [{**x, **x.pop('b')} for x in mylist]
print (a)
[{'a': 1, 'c': 2, 'd': 3}, {'a': 3, 'c': 4, 'd': 3}]
result_dataframe = pd.DataFrame(a)
print(result_dataframe)
a c d
0 1 2 3
1 3 4 3
Another solution, thanks to @Sandeep Kadapa:
a = [{'a': x['a'], **x['b']} for x in mylist]
#alternative
a = [{'a': x['a'], **x.get('b')} for x in mylist]
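Two things may be worth handling in practice: x.get('b') returns None when a record has no b, and unpacking None with ** raises a TypeError. The sketch below is one way to cope with that while leaving mylist untouched; the record without b is a hypothetical input added for illustration.

```python
import pandas as pd

# Hypothetical input: the second record has no "b" key.
mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3}]

rows = [
    # copy every key except "b", then merge the nested dict (or nothing)
    {**{k: v for k, v in x.items() if k != "b"}, **x.get("b", {})}
    for x in mylist
]
df = pd.DataFrame(rows)
print(df)
```

Records without b simply end up with missing values in the c and d columns.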

Or by applying pd.Series() to your method:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
result_dataframe = pd.DataFrame(mylist)
result_dataframe.drop(columns='b').join(result_dataframe.b.apply(pd.Series))
a c d
0 1 2 3
1 3 4 3

I prefer to write a function that accepts your mylist and converts it 1 nested layer down and returns a dictionary. This has the added advantage of not requiring you to 'manually' know what key like b to convert. So this function works for all nested keys 1 layer down.
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
import pandas as pd
def dropnested(alist):
    outputdict = {}
    for dic in alist:
        for key, value in dic.items():
            if isinstance(value, dict):
                for k2, v2 in value.items():
                    outputdict[k2] = outputdict.get(k2, []) + [v2]
            else:
                outputdict[key] = outputdict.get(key, []) + [value]
    return outputdict
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d
#0 1 2 3
#1 3 4 3
If you try:
mylist = [{"a": 1, "b": {"c": 2, "d":3}, "g": {"e": 2, "f":3}},
{"a": 3, "z": {"c": 4, "d":3}, "e": {"e": 2, "f":3}}]
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d e f
#0 1 2 3 2 3
#1 3 4 3 2 3
We can see here that it converts keys b,g,z,e without issue, as opposed to having to define each and every nested key name to convert
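One caveat with building per-column lists this way: if a key is missing from some records, the lists end up with different lengths and the values no longer line up by row. A row-wise variant avoids this. The sketch below is only an illustration; flatten_one_level is a hypothetical helper name, and the input with a missing d value is made up for the example.

```python
import pandas as pd

def flatten_one_level(record):
    """Return a new dict with nested dicts merged into the top level."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            flat.update(value)  # drop the outer key, keep the inner items
        else:
            flat[key] = value
    return flat

# Hypothetical input: the second record has no "d" value.
records = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "z": {"c": 4}}]
df = pd.DataFrame([flatten_one_level(r) for r in records])
print(df)
```

Because each record becomes its own flat dict, pandas aligns the columns and fills the gaps with missing values.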

Related

How to create a Pandas Dataframe from a dictionary with values into one column?

Suppose I have d = {'A':{1,2,4}, 'B':{5,6}}. How can I create a pandas DataFrame like this:
Key Value
0 'A' {1,2,4}
1 'B' {5,6}
You can feed the dict to pd.Series and then convert the series to dataframe with reset_index(), as follows:
d = {'A':{1,2,4}, 'B':{5,6}}
df = pd.Series(d).rename_axis(index='Key').reset_index(name='Value')
Result:
print(df)
Key Value
0 A {1, 2, 4}
1 B {5, 6}
Try:
dct = {"A": {1, 2, 4}, "B": {5, 6}}
df = pd.DataFrame({"Key": dct.keys(), "Value": dct.values()})
print(df)
Prints:
Key Value
0 A {1, 2, 4}
1 B {5, 6}
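A third variant, sketched here under the same assumptions as the answers above, passes the (key, value) pairs as rows and names the columns explicitly:

```python
import pandas as pd

d = {"A": {1, 2, 4}, "B": {5, 6}}

# Each (key, value) pair becomes one row; the sets are stored as objects.
df = pd.DataFrame(list(d.items()), columns=["Key", "Value"])
print(df)
```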

Python: issue trying to merge two dictionaries in which values must be added up

I'm extremely new to Python and stuck with a task of the online course I'm following. My knowledge of Python is very limited.
Here is the task: ''' Write a script that takes the following two
dictionaries and creates a new dictionary by combining the common keys
and adding the values of duplicate keys together. Please use For Loops
to iterate over these dictionaries to accomplish this task.
Example input/output:
dict_1 = {"a": 1, "b": 2, "c": 3} dict_2 = {"a": 2, "c": 4 , "d": 2}
result = {"a": 3, "b": 2, "c": 7 , "d": 2}
'''
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {}
for x, y in dict_1.items():
    for z, h in dict_2.items():
        if x == z:
            dict_3[x] = (y + h)
        else:
            dict_3[x] = (y)
            dict_3[z] = (h)
print(dict_3)
Wrong output:
{'a': 2, 'c': 3, 'd': 2, 'b': 2}
Everything is working up till the "else" condition.
I'm trying to isolate only the unique occurrences of both dictionaries, but the result actually overwrites what I added to the dictionary in the condition before.
Do you know a way to isolate only the single occurrences for every dictionary? I guess you could count them and add "if count is 1" condition, but I can't happen to make that work. Thanks!
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
key_list = {*dict_1, *dict_2}
totals = {}  # avoid naming this "sum", which would shadow the built-in
for key in key_list:
    totals[key] = dict_1.get(key, 0) + dict_2.get(key, 0)
print(totals)
#{'a': 3, 'c': 7, 'd': 2, 'b': 2} (key order may vary)
Not the most elegant or efficient solution, but an intuitive way would be to extract a list of the unique keys and then iterate over the new list of keys to extract and append the values from the two dictionaries.
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
result = {}
# Extract the unique keys from both dicts
keys = set.union(set(dict_1.keys()), set(dict_2.keys()))
# Initialize the values of the result dictionary
for key in sorted(keys):
    result[key] = 0
# Add the values of dict_1 and dict_2 to result if key is present
for key in keys:
    if key in dict_1:
        result[key] += dict_1[key]
    if key in dict_2:
        result[key] += dict_2[key]
print(result)
This will print: {'a': 3, 'b': 2, 'c': 7, 'd': 2}
Perhaps collections.defaultdict would be more suited to your purposes; when there's a value that it doesn't have, it just returns a default value that you assign to it and puts it in its "actual" dictionary. Then you can just convert it back to a normal dictionary with the dict() function.
from collections import defaultdict
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = defaultdict(int) # provide 0 as the default value
for k, v in dict_1.items():
    dict_3[k] += v
for k, v in dict_2.items():
    dict_3[k] += v
print(dict(dict_3)) # convert back to normal dictionary
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {}
for key in dict_1:
    if key in dict_2:
        dict_3[key] = dict_2[key] + dict_1[key]
    else:
        dict_3[key] = dict_1[key]
for key in dict_2:
    if key in dict_1:
        dict_3[key] = dict_2[key] + dict_1[key]
    else:
        dict_3[key] = dict_2[key]
print(dict_3)
If you would like to avoid the use of a loop, you could use dictionary comprehension using get with default value 0 to avoid running into KeyError:
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {key: dict_1.get(key, 0) + dict_2.get(key, 0) for key in set(dict_1) | set(dict_2)}
print(dict_3)
# {'c': 7, 'b': 2, 'd': 2, 'a': 3} (key order may vary)
Though this is possibly unnecessarily advanced.
These changes to your original code produce your desired results.
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {}
for x, y in dict_1.items():
    if x not in dict_2 and x not in dict_3:
        dict_3[x] = y
    for z, h in dict_2.items():
        if x == z:
            dict_3[x] = y + h
        elif z not in dict_1 and z not in dict_3:
            dict_3[z] = h
print(dict_3)
Prints:
{'a': 3, 'd': 2, 'b': 2, 'c': 7}
A more concise way would be to start dict_3 as a copy of dict_1 and then iterate over dict_2.
dict_3 = dict_1.copy()
for key, val in dict_2.items():
    dict_3[key] = dict_3.get(key, 0) + val
The line dict_3.get(key, 0) gets the value for key in dict_3 if it exists in dict_3, otherwise, it supplies the value 0.
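Outside the constraints of the exercise (which asks for for loops), the standard library's collections.Counter offers a compact alternative. This is a sketch rather than a drop-in answer to the course task; note that Counter addition discards keys whose total is zero or negative, so it only suits data like the positive counts here.

```python
from collections import Counter

dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4, "d": 2}

# Adding two Counters sums the values of matching keys.
result = dict(Counter(dict_1) + Counter(dict_2))
print(result)
```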

Iterate over X dictionary items in Python

How can I iterate over only X number of dictionary items? I can do it using the following bulky way, but I am sure Python allows a more elegant way.
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
x = 0
for key in d:
    if x == 3:
        break
    print(key)
    x += 1
If you want a random sample of X values from a dictionary you can use random.sample on the dictionary's keys:
from random import sample
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
X = 3
# sample() needs a sequence in Python 3, so pass the keys as a list
for key in sample(list(d), X):
    print(key, d[key])
And get output for example:
e 5
c 3
b 2
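If the first X items are wanted rather than a random sample, itertools.islice from the standard library can stop the iteration early. A minimal sketch, assuming Python 3.7 or later, where dictionaries iterate in insertion order:

```python
from itertools import islice

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
X = 3

# islice stops after X items without copying the whole dictionary
first_items = list(islice(d.items(), X))
for key, value in first_items:
    print(key, value)
```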

dataframe to json with orient index and index equals row value

I have a pandas dataframe that I am trying to convert to a certain json format:
df = pd.DataFrame([['A',1,2,3],['B',2,3,4],['C','C',1,6],['D','D',9,7]], columns=['W','X','Y','Z'])
df.set_index('W', inplace=True, drop=True, append=False)
df
X Y Z
W
A 1 2 3
B 2 3 4
C C 1 6
D D 9 7
I am looking to get a json output as follows:
output_json = {'A': {'X':1,'Y':2,'Z':3}, 'B': {'X':2,'Y':3,'Z':4}, 'C':{'Y':1,'Z':6}, 'D': {'Y':9,'Z':7} }
This is what I have tried but I can't get the desired result for 'C' and 'D' keys:
df.to_json(orient='index')
'{"A":{"X":1,"Y":2,"Z":3},"B":{"X":2,"Y":3,"Z":4},"C":{"X":"C","Y":1,"Z":6},"D":{"X":"D","Y":9,"Z":7}}'
How to fix this? Perhaps this is something straightforward that I am missing. Thanks.
You can first convert with to_dict, then use a nested dict comprehension to keep only the int values, and finally serialize with json.dumps:
import json
d = df.to_dict(orient='index')
j = json.dumps({k:{x:y for x,y in v.items() if isinstance(y, int)} for k, v in d.items()})
print (j)
{"A": {"X": 1, "Y": 2, "Z": 3},
"C": {"Y": 1, "Z": 6},
"D": {"Y": 9, "Z": 7},
"B": {"X": 2, "Y": 3, "Z": 4}}
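Another way to arrive at the same structure, sketched here as an alternative rather than as the answerer's method, is to parse the string that to_json already produces and filter the parsed values. Parsing with json.loads yields plain Python types, so the isinstance check does not depend on how pandas boxes its numbers.

```python
import json
import pandas as pd

df = pd.DataFrame(
    [['A', 1, 2, 3], ['B', 2, 3, 4], ['C', 'C', 1, 6], ['D', 'D', 9, 7]],
    columns=['W', 'X', 'Y', 'Z'],
).set_index('W')

# Round-trip through JSON, then keep only integer values per row.
raw = json.loads(df.to_json(orient='index'))
filtered = {
    row: {col: val for col, val in cols.items()
          if isinstance(val, int) and not isinstance(val, bool)}
    for row, cols in raw.items()
}
output_json = json.dumps(filtered)
print(output_json)
```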

How to multiply to each value in each element in SArray in Python?

I'm using Graphlab, but I guess this question can apply to pandas.
import graphlab
sf = graphlab.SFrame({'id': [1, 2, 3], 'user_score': [{"a":4, "b":3}, {"a":5, "b":7}, {"a":2, "b":3}], 'weight': [4, 5, 2]})
I want to create a new column where the value of each element in 'user_score' is multiplied by the number in 'weight'. That is,
sf = graphlab.SFrame({'id': [1, 2, 3], 'user_score': [{"a":4, "b":3}, {"a":5, "b":7}, {"a":2, "b":3}], 'weight': [4, 5, 2], 'new': [{"a":16, "b":12}, {"a":25, "b":35}, {"a":4, "b":6}]})
I tried to write a simple function below and applied to no avail. Any thoughts?
def trans(x, y):
    d = dict()
    for k, v in x.items():
        d[k] = v*y
    return d
sf.apply(trans(sf['user_score'], sf['weight']))
It got the following error message:
AttributeError: 'SArray' object has no attribute 'items'
I'm using pandas dataframe, but it should also work in your case.
import pandas as pd
df['new']=[dict((k,v*y) for k,v in x.items()) for x, y in zip(df['user_score'], df['weight'])]
Input dataframe:
df
Out[34]:
id user_score weight
0 1 {u'a': 4, u'b': 3} 4
1 2 {u'a': 5, u'b': 7} 5
2 3 {u'a': 2, u'b': 3} 2
Output:
df
Out[36]:
id user_score weight new
0 1 {u'a': 4, u'b': 3} 4 {u'a': 16, u'b': 12}
1 2 {u'a': 5, u'b': 7} 5 {u'a': 25, u'b': 35}
2 3 {u'a': 2, u'b': 3} 2 {u'a': 4, u'b': 6}
This is subtle, but I think what you want is this:
sf.apply(lambda row: trans(row['user_score'], row['weight']))
The apply function takes a function as its argument, and will pass each row as the parameter to that function. In your version, you are evaluating the trans function before apply is called, which is why the error message complains about passing an SArray to the trans function when a dict is expected.
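The difference can be illustrated without GraphLab. In the sketch below a plain list of dicts stands in for the SFrame rows (an assumption made only so the example runs anywhere), and trans is called once per row with that row's own values, which is what passing a lambda to apply achieves.

```python
def trans(scores, weight):
    """Multiply every value in a score dict by the given weight."""
    return {k: v * weight for k, v in scores.items()}

# Stand-in for the SFrame: one dict per row (hypothetical structure).
rows = [
    {"id": 1, "user_score": {"a": 4, "b": 3}, "weight": 4},
    {"id": 2, "user_score": {"a": 5, "b": 7}, "weight": 5},
    {"id": 3, "user_score": {"a": 2, "b": 3}, "weight": 2},
]

# Call trans per row, never with whole columns at once.
new = [trans(row["user_score"], row["weight"]) for row in rows]
print(new)
```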
Here is one of many possible solutions:
In [69]: df
Out[69]:
id user_score weight
0 1 {'b': 3, 'a': 4} 4
1 2 {'b': 7, 'a': 5} 5
2 3 {'b': 3, 'a': 2} 2
In [70]: df['user_score'] = df['user_score'].apply(lambda x: pd.Series(x)).mul(df.weight, axis=0).to_dict('records')
In [71]: df
Out[71]:
id user_score weight
0 1 {'b': 12, 'a': 16} 4
1 2 {'b': 35, 'a': 25} 5
2 3 {'b': 6, 'a': 4} 2
