dataframe to json with orient index and index equals row value - python

I have a pandas dataframe that I am trying to convert to a certain json format:
df = pd.DataFrame([['A',1,2,3],['B',2,3,4],['C','C',1,6],['D','D',9,7]], columns=['W','X','Y','Z'])
df.set_index('W', inplace=True, drop=True, append=False)
df
X Y Z
W
A 1 2 3
B 2 3 4
C C 1 6
D D 9 7
I am looking to get a json output as follows:
output_json = {'A': {'X':1,'Y':2,'Z':3}, 'B': {'X':2,'Y':3,'Z':4}, 'C':{'Y':1,'Z':6}, 'D': {'Y':9,'Z':7} }
This is what I have tried but I can't get the desired result for 'C' and 'D' keys:
df.to_json(orient='index')
'{"A":{"X":1,"Y":2,"Z":3},"B":{"X":2,"Y":3,"Z":4},"C":{"X":"C","Y":1,"Z":6},"D":{"X":"D","Y":9,"Z":7}}'
How to fix this? Perhaps this is something straightforward that I am missing. Thanks.

You can first convert the DataFrame with to_dict, then use a nested dict comprehension to keep only the int values, and finally serialize the result with json.dumps:
import json
d = df.to_dict(orient='index')
j = json.dumps({k:{x:y for x,y in v.items() if isinstance(y, int)} for k, v in d.items()})
print (j)
{"A": {"X": 1, "Y": 2, "Z": 3},
"C": {"Y": 1, "Z": 6},
"D": {"Y": 9, "Z": 7},
"B": {"X": 2, "Y": 3, "Z": 4}}

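A variant of the same filter, as a sketch. It assumes you want to keep integer values only. It uses numbers.Integral so that NumPy integer scalars also pass the filter if to_dict returns them (this can depend on the pandas version), excludes bool (a subclass of int), and casts to int so json.dumps accepts every value. The helper name numeric_only is hypothetical.

```python
import json
from numbers import Integral

import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame(
    [['A', 1, 2, 3], ['B', 2, 3, 4], ['C', 'C', 1, 6], ['D', 'D', 9, 7]],
    columns=['W', 'X', 'Y', 'Z'],
).set_index('W')


def numeric_only(row):
    # Keep Python or NumPy integers and drop strings; bool subclasses int, so exclude it
    return {col: int(val) for col, val in row.items()
            if isinstance(val, Integral) and not isinstance(val, bool)}


filtered = {idx: numeric_only(row) for idx, row in df.to_dict(orient='index').items()}
j = json.dumps(filtered)
print(j)
```

If you need to keep floats as well, swap Integral for numbers.Real and drop the int() cast.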
Related

How to create a Pandas Dataframe from a dictionary with values into one column?

Suppose dict = {'A':{1,2,4}, 'B':{5,6}}. How can I create a Pandas DataFrame like this:
Key Value
0 'A' {1,2,4}
1 'B' {5,6}
You can feed the dict to pd.Series and then convert the series to dataframe with reset_index(), as follows:
import pandas as pd
d = {'A':{1,2,4}, 'B':{5,6}}
df = pd.Series(d).rename_axis(index='Key').reset_index(name='Value')
Result:
print(df)
Key Value
0 A {1, 2, 4}
1 B {5, 6}
Try:
import pandas as pd
dct = {"A": {1, 2, 4}, "B": {5, 6}}
df = pd.DataFrame({"Key": dct.keys(), "Value": dct.values()})
print(df)
Prints:
Key Value
0 A {1, 2, 4}
1 B {5, 6}
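One more option, sketched here: pass the dictionary's (key, value) pairs to the DataFrame constructor as rows and name the two columns explicitly. The sets are stored unchanged in an object column.

```python
import pandas as pd

d = {'A': {1, 2, 4}, 'B': {5, 6}}

# Each (key, value) tuple becomes one row
df = pd.DataFrame(list(d.items()), columns=['Key', 'Value'])
print(df)
```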

How to merge and sum two dictionaries into a single one whilst removing keys that are not common

I want to merge two dictionaries into a single dictionary with only common keys between the two.
Here are the two dictionaries
{"a": 5, "b": 8, "d": 9, "z": 4}
{"a": 1, "b": 1, "d": 2, "e": 1}
The result that I want is:
{"a": 6, "b": 9, "d": 11}
Do you guys know any way to do this?
You can try this.
The idea is to take the intersection of the keys of both dictionaries, d1.keys() & d2.keys(), and sum the values from both for each common key:
d1 = {"a": 5, "b": 8, "d": 9, "z": 4}
d2 = {"a": 1, "b": 1, "d": 2, "e": 1}
result = {key: d1[key] + d2[key] for key in d1.keys() & d2.keys()}
result
{'a': 6, 'd': 11, 'b': 9}
First you need to get the common keys between the two dicts.
Let's say you have two dicts, d1 and d2:
common_keys = list(set(d1.keys()).intersection(set(d2.keys())))
new_dict = {}
for key in common_keys:
    new_dict[key] = d1[key] + d2[key]
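The same intersection idea extends to any number of dictionaries. A minimal sketch, with sum_common as a hypothetical helper name; it keeps the key order of the first dictionary rather than the arbitrary order of a set.

```python
def sum_common(*dicts):
    # Keys present in every dict; set.intersection accepts dicts and iterates over their keys
    common = set(dicts[0]).intersection(*dicts[1:])
    # Sum each common key across all dicts, following the first dict's key order
    return {k: sum(d[k] for d in dicts) for k in dicts[0] if k in common}


d1 = {"a": 5, "b": 8, "d": 9, "z": 4}
d2 = {"a": 1, "b": 1, "d": 2, "e": 1}
print(sum_common(d1, d2))  # {'a': 6, 'b': 9, 'd': 11}
```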

Python: issue trying to merge two dictionaries in which values must be added up

I'm extremely new to Python and stuck with a task of the online course I'm following. My knowledge of Python is very limited.
Here is the task: ''' Write a script that takes the following two
dictionaries and creates a new dictionary by combining the common keys
and adding the values of duplicate keys together. Please use For Loops
to iterate over these dictionaries to accomplish this task.
Example input/output:
dict_1 = {"a": 1, "b": 2, "c": 3} dict_2 = {"a": 2, "c": 4 , "d": 2}
result = {"a": 3, "b": 2, "c": 7 , "d": 2}
'''
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {}
for x, y in dict_1.items():
    for z, h in dict_2.items():
        if x == z:
            dict_3[x] = (y + h)
        else:
            dict_3[x] = (y)
            dict_3[z] = (h)
print(dict_3)
Wrong output:
{'a': 2, 'c': 3, 'd': 2, 'b': 2}
Everything is working up till the "else" condition.
I'm trying to isolate only the unique occurrences of both dictionaries, but the result actually overwrites what I added to the dictionary in the condition before.
Do you know a way to isolate only the keys that occur in just one of the dictionaries? I guess you could count them and add an "if count is 1" condition, but I can't manage to make that work. Thanks!
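dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
key_list = {*dict_1, *dict_2}  # union of the keys of both dicts
totals = {}  # named totals to avoid shadowing the built-in sum()
for key in key_list:
    totals[key] = dict_1.get(key, 0) + dict_2.get(key, 0)
print(totals)
# e.g. {'a': 3, 'c': 7, 'd': 2, 'b': 2} (key order may vary)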
Not the most elegant or efficient solution, but an intuitive way is to extract the set of unique keys and then iterate over it, adding the values from the two dictionaries.
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
result = {}
# Extract the unique keys from both dicts
keys = set.union(set(dict_1.keys()), set(dict_2.keys()))
# Initialize the values of the result dictionary (sorted, so the output is ordered)
for key in sorted(keys):
    result[key] = 0
# Add the values of dict_1 and dict_2 to result if the key is present
for key in keys:
    if key in dict_1:
        result[key] += dict_1[key]
    if key in dict_2:
        result[key] += dict_2[key]
print(result)
This will print: {'a': 3, 'b': 2, 'c': 7, 'd': 2}
Perhaps collections.defaultdict would be better suited to your purposes: when you access a key it doesn't have yet, it inserts a default value (produced by the factory you pass in) and returns it. Afterwards you can convert it back to a normal dictionary with dict().
from collections import defaultdict
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = defaultdict(int)  # int() returns 0, so missing keys start at 0
for k, v in dict_1.items():
    dict_3[k] += v
for k, v in dict_2.items():
    dict_3[k] += v
print(dict(dict_3))  # convert back to a normal dictionary
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {}
for key in dict_1:
    if key in dict_2:
        dict_3[key] = dict_2[key] + dict_1[key]
    else:
        dict_3[key] = dict_1[key]
for key in dict_2:
    if key in dict_1:
        dict_3[key] = dict_2[key] + dict_1[key]
    else:
        dict_3[key] = dict_2[key]
print(dict_3)
If you would like to avoid writing an explicit for loop, you could use a dictionary comprehension with get and a default value of 0 to avoid a KeyError:
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {key: dict_1.get(key, 0) + dict_2.get(key, 0) for key in set(list(dict_1.keys())+list(dict_2.keys()))}
# {'c': 7, 'b': 2, 'd': 2, 'a': 3} (key order may vary)
Though this is possibly more advanced than necessary.
These changes to your original code produce your desired results.
dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4 , "d": 2}
dict_3 = {}
for x, y in dict_1.items():
    if x not in dict_2 and x not in dict_3:
        dict_3[x] = y
    for z, h in dict_2.items():
        if x == z:
            dict_3[x] = y + h
        elif z not in dict_1 and z not in dict_3:
            dict_3[z] = h
print(dict_3)
Prints:
{'a': 3, 'd': 2, 'b': 2, 'c': 7}
A more concise way would be to start dict_3 as a copy of dict_1 and then iterate over dict_2.
dict_3 = dict_1.copy()
for key, val in dict_2.items():
    dict_3[key] = dict_3.get(key, 0) + val
The expression dict_3.get(key, 0) returns the value for key if it exists in dict_3; otherwise, it returns 0.
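Outside the exercise's for-loop requirement, the standard library's collections.Counter does this summing for you. A minimal sketch: Counter.update() adds the incoming counts to the existing ones instead of replacing them.

```python
from collections import Counter

dict_1 = {"a": 1, "b": 2, "c": 3}
dict_2 = {"a": 2, "c": 4, "d": 2}

# Start from dict_1's counts, then add dict_2's values key by key
totals = Counter(dict_1)
totals.update(dict_2)
result = dict(totals)
print(result)  # {'a': 3, 'b': 2, 'c': 7, 'd': 2}
```

Counter(dict_1) + Counter(dict_2) gives the same result here, but the + operator drops keys whose total is zero or negative, whereas update() keeps them.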

How do I get a unique set of values for a specific key in a list of dictionaries?

I'm using Python 3.8. If I want to get a unique set of values for an array of dictionaries, I can do the below
>>> lis = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
>>> s = set( val for dic in lis for val in dic.values())
>>> s
{1, 2, 3, 4}
However, how would I refine the above if I only wanted a unique set of values for the dictionary key "a"? In the above, the answer would be
{1, 3}
I'll assume that each dictionary in the array has the same set of keys.
You could simply do:
lis = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
# assuming the variable key points to what you want
key = 'a'
a_values = set(dictionary[key] for dictionary in lis)
I hope I've understood what you are looking for. Thanks.
You can do it like this:
lis = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
search_key = 'a'
s = set(val for dic in lis for key, val in dic.items() if key == search_key)
print(s)
#OUTPUT: {1, 3}
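Iterate over dic.items() instead of dic.values() and keep only the values whose key is 'a'.
Another, simpler way to do it (note that dic.get returns None for any dictionary that lacks the key, so None would end up in the set):
lis = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
search_key = 'a'
s = set(dic.get(search_key) for dic in lis)
print(s)
#OUTPUT: {1, 3}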
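If some dictionaries might lack the key (the question assumes they all have it), a set comprehension with a membership test skips them instead of adding None. The third dictionary below is made up for illustration.

```python
lis = [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"b": 5}]
search_key = 'a'

# Only dictionaries that actually contain search_key contribute a value
s = {dic[search_key] for dic in lis if search_key in dic}
print(s)  # {1, 3}
```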

How to convert list of nested dictionary to pandas DataFrame?

I have some data containing nested dictionaries like below:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
If we convert it to pandas DataFrame,
import pandas as pd
result_dataframe = pd.DataFrame(mylist)
print(result_dataframe)
It will output:
a b
0 1 {'c': 2, 'd': 3}
1 3 {'c': 4, 'd': 3}
I want to convert the list of dictionaries so that the keys of the nested dictionary become columns and the outer key b is dropped. My code is below:
new_dataframe = result_dataframe.drop(columns=["b"])
b_dict_list = [document["b"] for document in mylist]
b_df = pd.DataFrame(b_dict_list)
frames = [new_dataframe, b_df]
total_frame = pd.concat(frames, axis=1)
total_frame is what I want:
a c d
0 1 2 3
1 3 4 3
But I think my code is a little complicated. Is there any simple way to deal with this problem? Thank you.
I had a similar problem to this one. I used pd.json_normalize(x) and it worked. The only difference is that the nested columns keep a dotted prefix (b.c and b.d instead of c and d).
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
df = pd.json_normalize(mylist)
print(df)
Output:
   a  b.c  b.d
0  1    2    3
1  3    4    3
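To get exactly the a, c, d columns from json_normalize, one option (a sketch) is to keep only the part of each column name after the last dot. This assumes inner key names do not collide; if two nested dictionaries share a key, you would end up with duplicate column names.

```python
import pandas as pd

mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4, "d": 3}}]

df = pd.json_normalize(mylist)
# json_normalize names nested columns "b.c" and "b.d"; strip the prefix
df.columns = [col.split('.')[-1] for col in df.columns]
print(df)
```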
Use a list comprehension that merges every key except b with the contents of the nested dictionary b (this also leaves mylist unchanged):
a = [{**{k: v for k, v in x.items() if k != 'b'}, **x['b']} for x in mylist]
print (a)
[{'a': 1, 'c': 2, 'd': 3}, {'a': 3, 'c': 4, 'd': 3}]
result_dataframe = pd.DataFrame(a)
print(result_dataframe)
a c d
0 1 2 3
1 3 4 3
Another solution, thanks @Sandeep Kadapa:
a = [{'a': x['a'], **x['b']} for x in mylist]
#alternative
a = [{'a': x['a'], **x.get('b')} for x in mylist]
Or by applying pd.Series to the b column in your approach:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
result_dataframe = pd.DataFrame(mylist)
result_dataframe.drop(columns='b').join(result_dataframe.b.apply(pd.Series))
a c d
0 1 2 3
1 3 4 3
I prefer to write a function that accepts your mylist, flattens it one nesting level and returns a dictionary of columns. This has the added advantage that you don't need to know in advance which key (like b) holds a nested dictionary, so the function works for any nested key one level down.
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
import pandas as pd
def dropnested(alist):
    outputdict = {}
    for dic in alist:
        for key, value in dic.items():
            if isinstance(value, dict):
                # Lift the inner keys one level up, appending to that column's list
                for k2, v2 in value.items():
                    outputdict.setdefault(k2, []).append(v2)
            else:
                outputdict.setdefault(key, []).append(value)
    return outputdict
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d
#0 1 2 3
#1 3 4 3
If you try:
mylist = [{"a": 1, "b": {"c": 2, "d":3}, "g": {"e": 2, "f":3}},
{"a": 3, "z": {"c": 4, "d":3}, "e": {"e": 2, "f":3}}]
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d e f
#0 1 2 3 2 3
#1 3 4 3 2 3
We can see here that it flattens the nested keys b, g, z and e without issue, instead of requiring you to name each nested key to convert.
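dropnested builds one list per column, so if a row lacks one of the nested keys, the column lists end up with different lengths, which pandas rejects. A sketch of a variation that builds one flat dictionary per row instead, so pandas aligns the columns and fills gaps with NaN. flatten_one_level is a hypothetical name, and, as with dropnested, an inner key with the same name as an outer key overwrites it.

```python
import pandas as pd


def flatten_one_level(records):
    # Lift the keys of any nested dict one level up, one flat dict per record
    rows = []
    for rec in records:
        row = {}
        for key, value in rec.items():
            if isinstance(value, dict):
                row.update(value)
            else:
                row[key] = value
        rows.append(row)
    # pandas aligns the row dicts by key and fills missing entries with NaN
    return pd.DataFrame(rows)


# Second record is missing "d" inside "b" (made-up data for illustration)
mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4}}]
df = flatten_one_level(mylist)
print(df)
```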
