Pandas: how to reorganize data in dictionaries


I have a dataframe object structured flat like this (actually I have more than 6 variables)
VAR1 VAR2 VAR3 VAR4 VAR5 VAR6
0 1 2 3 4 5 6
1 2 4 6 8 10 12
2 3 6 9 12 15 18
3 4 8 12 16 20 24
4 5 10 15 20 25 30
5 6 12 18 24 30 36
However, in order to get compatibility with other applications I would like to have a structure like this
NEW1 NEW2 NEW3
0 {"id":{"VAR1": 1}} {"AA":{"VAR2": 2, "VAR3": 3}, "CC":{"BB":{"VAR4": 4, "VAR5": 5}}} {"TS": 6}
1 {"id":{"VAR1": 2}} {"AA":{"VAR2": 4, "VAR3": 6}, "CC":{"BB":{"VAR4": 8, "VAR5":10}}} {"TS":12}
2 {"id":{"VAR1": 3}} {"AA":{"VAR2": 6, "VAR3": 9}, "CC":{"BB":{"VAR4":12, "VAR5":15}}} {"TS":18}
3 {"id":{"VAR1": 4}} {"AA":{"VAR2": 8, "VAR3":12}, "CC":{"BB":{"VAR4":16, "VAR5":20}}} {"TS":24}
4 {"id":{"VAR1": 5}} {"AA":{"VAR2":10, "VAR3":15}, "CC":{"BB":{"VAR4":20, "VAR5":25}}} {"TS":30}
5 {"id":{"VAR1": 6}} {"AA":{"VAR2":12, "VAR3":18}, "CC":{"BB":{"VAR4":24, "VAR5":30}}} {"TS":36}
Is there any easy way to achieve this result?
I have tried df.to_dict("index"), but it groups all the variables of a row together, whereas I need to split each row into sub-dictionaries and attach them to labels such as "AA", "BB", "id" and "TS".
thank you for the tips and suggestions

Create dictionaries that define how each variable maps to its parent label, then walk that mapping for every value:
import pandas as pd
# Variables that are renamed before nesting.
replace = {'VAR6': 'TS'}
# Child -> parent links; a label with no entry here is a top-level column.
graph = {
    'VAR1': 'id', 'VAR2': 'AA', 'VAR3': 'AA',
    'VAR4': 'BB', 'VAR5': 'BB',
    'BB': 'CC',
    'id': 'NEW1', 'AA': 'NEW2', 'CC': 'NEW2', 'TS': 'NEW3'}
def traverse(k, d):
    # Collect the chain of parents of k, innermost first.
    path = []
    while k in d:
        k = d[k]
        path.append(k)
    return path
dat = {}
for i, rec in zip(df.index, df.to_dict('records')):
    for k, v in rec.items():
        k = replace.get(k, k)
        path = traverse(k, graph)
        # The outermost label is the new column; i is the row index.
        cur = dat.setdefault(path.pop(), {}).setdefault(i, {})
        while path:
            cur = cur.setdefault(path.pop(), {})
        cur[k] = v
pd.DataFrame(dat)
NEW1 NEW2 NEW3
0 {'id': {'VAR1': 1}} {'AA': {'VAR2': 2, 'VAR3': 3}, 'CC': {'BB': {'VAR4': 4, 'VAR5': 5}}} {'TS': 6}
1 {'id': {'VAR1': 2}} {'AA': {'VAR2': 4, 'VAR3': 6}, 'CC': {'BB': {'VAR4': 8, 'VAR5': 10}}} {'TS': 12}
2 {'id': {'VAR1': 3}} {'AA': {'VAR2': 6, 'VAR3': 9}, 'CC': {'BB': {'VAR4': 12, 'VAR5': 15}}} {'TS': 18}
3 {'id': {'VAR1': 4}} {'AA': {'VAR2': 8, 'VAR3': 12}, 'CC': {'BB': {'VAR4': 16, 'VAR5': 20}}} {'TS': 24}
4 {'id': {'VAR1': 5}} {'AA': {'VAR2': 10, 'VAR3': 15}, 'CC': {'BB': {'VAR4': 20, 'VAR5': 25}}} {'TS': 30}
5 {'id': {'VAR1': 6}} {'AA': {'VAR2': 12, 'VAR3': 18}, 'CC': {'BB': {'VAR4': 24, 'VAR5': 30}}} {'TS': 36}
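To see how the graph mapping drives the nesting, it can help to look at the path that traverse returns and to nest a single record by hand. The following is a minimal pandas-free sketch; nest_record is an illustrative helper name and row is an assumed sample row, neither is part of the answer above.

```python
# Child -> parent relationships, as in the answer above.
replace = {'VAR6': 'TS'}
graph = {
    'VAR1': 'id', 'VAR2': 'AA', 'VAR3': 'AA',
    'VAR4': 'BB', 'VAR5': 'BB',
    'BB': 'CC',
    'id': 'NEW1', 'AA': 'NEW2', 'CC': 'NEW2', 'TS': 'NEW3',
}


def traverse(k, d):
    """Follow parent links from k; return the labels, innermost first."""
    path = []
    while k in d:
        k = d[k]
        path.append(k)
    return path


def nest_record(rec):
    """Hypothetical helper: nest one flat record according to graph."""
    out = {}
    for k, v in rec.items():
        k = replace.get(k, k)
        cur = out
        # Walk from the outermost label inward, creating dicts as needed.
        for label in reversed(traverse(k, graph)):
            cur = cur.setdefault(label, {})
        cur[k] = v
    return out


# Assumed sample row (the first row of the question's frame).
row = {'VAR1': 1, 'VAR2': 2, 'VAR3': 3, 'VAR4': 4, 'VAR5': 5, 'VAR6': 6}
print(traverse('VAR4', graph))  # ['BB', 'CC', 'NEW2']
print(nest_record(row))
```

Each top-level key of the result (NEW1, NEW2, NEW3) corresponds to one column of the final frame.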

Related

Filtering a dictionary of dictionaries by a value [duplicate]

I have a small example of a nested dictionary (in my case a collections defaultdict):
all_dict = {'d1': {'a': 2, 'b': 4, 'c': 10},
'd2': {'a': 1, 'b': 23,'c': 0},
'd3': {'a': 4, 'b': 12,'c': 4},
'd4': {'a': 0, 'b': 4, 'c': 3},
'd5': {'a': 4, 'b': 0, 'c': 1}}
I would like to filter out all the zero values in all_dict. In my real data the nested dictionaries are very big, so I would like a general, "pythonic" solution.
I tried some comprehensions but failed.
I would like something like:
all_dict_filtered =
{'d1': {'a': 2, 'b': 4, 'c': 10},
'd2': {'a': 1, 'b': 23},
'd3': {'a': 4, 'b': 12,'c': 4},
'd4': {'b': 4, 'c': 3},
'd5': {'a': 4, 'c': 1}}
Any tip would be great.
Thank you for your time and attention.
Paulo
I have this, but it is ugly:
from collections import defaultdict
filtered = defaultdict(dict)
for k1, v1 in all_dict.items():
    for k2, v2 in v1.items():
        if v2 > 0:
            filtered[k1] = filtered.get(k1, {})
            filtered[k1][k2] = v2
defaultdict(dict,
{'d1': {'a': 2, 'b': 4, 'c': 10},
'd2': {'a': 1, 'b': 23},
'd3': {'a': 4, 'b': 12, 'c': 4},
'd4': {'b': 4, 'c': 3},
'd5': {'a': 4, 'c': 1}})

Simple one-liner using dict comprehensions:
import pprint
all_dict = {
'd1': {'a': 2, 'b': 4, 'c': 10},
'd2': {'a': 1, 'b': 23, 'c': 0},
'd3': {'a': 4, 'b': 12, 'c': 4},
'd4': {'a': 0, 'b': 4, 'c': 3},
'd5': {'a': 4, 'b': 0, 'c': 1}
}
pprint.pprint({k: {key: value for key, value in v.items() if value != 0} for k, v in all_dict.items()})
Output:
{'d1': {'a': 2, 'b': 4, 'c': 10},
'd2': {'a': 1, 'b': 23},
'd3': {'a': 4, 'b': 12, 'c': 4},
'd4': {'b': 4, 'c': 3},
'd5': {'a': 4, 'c': 1}}
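The comprehension above handles exactly two levels. If the real data nests deeper, a recursive variant is one option. This is only a sketch: drop_zeros is a hypothetical helper name and nested is made-up sample data. It returns plain dicts and keeps inner dictionaries that end up empty.

```python
def drop_zeros(d):
    """Hypothetical helper: copy d without zero-valued leaves, at any depth."""
    result = {}
    for key, value in d.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries (defaultdict is a dict subclass).
            result[key] = drop_zeros(value)
        elif value != 0:
            result[key] = value
    return result


# Assumed sample data with three levels of nesting.
nested = {'d1': {'a': 2, 'b': 0, 'x': {'c': 0, 'd': 7}}, 'd2': {'a': 0}}
print(drop_zeros(nested))
# {'d1': {'a': 2, 'x': {'d': 7}}, 'd2': {}}
```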

Try:
all_dict = {
"d1": {"a": 2, "b": 4, "c": 10},
"d2": {"a": 1, "b": 23, "c": 0},
"d3": {"a": 4, "b": 12, "c": 4},
"d4": {"a": 0, "b": 4, "c": 3},
"d5": {"a": 4, "b": 0, "c": 1},
}
for k, v in all_dict.items():
    all_dict[k] = {kk: vv for kk, vv in v.items() if vv != 0}
print(all_dict)
Prints:
{
"d1": {"a": 2, "b": 4, "c": 10},
"d2": {"a": 1, "b": 23},
"d3": {"a": 4, "b": 12, "c": 4},
"d4": {"b": 4, "c": 3},
"d5": {"a": 4, "c": 1},
}

Took me some time, but here you go:
dicty = {'d1': {'a': 2, 'b': 4, 'c': 10},
         'd2': {'a': 1, 'b': 23,'c': 0},
         'd3': {'a': 4, 'b': 12,'c': 4},
         'd4': {'a': 0, 'b': 4, 'c': 3},
         'd5': {'a': 4, 'b': 0, 'c': 1}}
# Collect the (outer key, inner key) pairs first, because a dict
# cannot change size while it is being iterated over.
listy = []
for value in dicty:
    for value2 in dicty[value]:
        if dicty[value][value2] == 0:
            listy.append((value, value2))
# Remove the collected zero entries in place.
for impurity in listy:
    dicty[impurity[0]].pop(impurity[1])
print(listy)
print(dicty)

Try this:
import copy
all_dict = {'d1': {'a': 2, 'b': 4, 'c': 10},
'd2': {'a': 1, 'b': 23,'c': 0},
'd3': {'a': 4, 'b': 12,'c': 4},
'd4': {'a': 0, 'b': 4, 'c': 3},
'd5': {'a': 4, 'b': 0, 'c': 1}}
# Work on a deep copy so that all_dict can be iterated while entries are removed.
all_dict_filtered = copy.deepcopy(all_dict)
for i in all_dict:
    for j in all_dict[i]:
        if all_dict[i][j] == 0:
            all_dict_filtered[i].pop(j)
print(all_dict_filtered)
Output : {'d1': {'a': 2, 'b': 4, 'c': 10}, 'd2': {'a': 1, 'b': 23}, 'd3': {'a': 4, 'b': 12, 'c': 4}, 'd4': {'b': 4, 'c': 3}, 'd5': {'a': 4, 'c': 1}}
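Both in-place answers above work around the same restriction: a dict must not change size while it is being iterated. A lighter alternative, sketched here on an assumed subset of the sample data, is to iterate over a snapshot of the keys instead of copying the whole structure.

```python
all_dict = {
    'd1': {'a': 2, 'b': 4, 'c': 10},
    'd2': {'a': 1, 'b': 23, 'c': 0},
    'd4': {'a': 0, 'b': 4, 'c': 3},
}

for inner in all_dict.values():
    # list(inner) is a snapshot of the keys, so deleting from inner is safe.
    for key in list(inner):
        if inner[key] == 0:
            del inner[key]

print(all_dict)
# {'d1': {'a': 2, 'b': 4, 'c': 10}, 'd2': {'a': 1, 'b': 23}, 'd4': {'b': 4, 'c': 3}}
```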

How to create a nested json with key names in python?

I have the following data in a pandas dataframe. I want to group the data based on month, then type.
month hour Type count
0 4 0 Bike 8
1 4 0 Pedelec 16
2 4 1 Bike 9
3 4 1 Pedelec 4
4 4 2 Bike 18
... ... ... ... ...
412 12 21 Pedelec 15
413 12 22 Bike 7
414 12 22 Pedelec 10
415 12 23 Bike 2
416 12 23 Pedelec 15
I want to convert this to a nested json with field names. The code I use to create a dictionary is this:
jsonfile=barchart.groupby(['month','Type'])[['hour','count']].apply(lambda x: x.to_dict('records')).reset_index(name='data').groupby('month')[['Type','data']].apply(lambda x: x.set_index('Type')['data'].to_dict()).reset_index(name='data').groupby('month')['data'].apply(list).to_dict()
The output I get is in this format:
[{'month': 4,
'values': [{'Bike': [{'hour': 0, 'count': 8},
{'hour': 1, 'count': 9},
{'hour': 2, 'count': 18},
{'hour': 3, 'count': 2},
{'hour': 4, 'count': 2},
...
{'hour': 23, 'count': 14}],
'Pedelec': [{'hour': 0, 'count': 16},
{'hour': 1, 'count': 4},
{'hour': 2, 'count': 12},
...
{'hour': 23, 'count': 27}]}]},
Expected output:
[{'month': 4,
'values': [{'Type': 'Bike': [{'hour': 0, 'count': 8},
{'hour': 1, 'count': 9},

I used the following to create my desired format:
jsonfile=barchart.groupby(['month','Type'])[['hour','count']].apply(lambda x: x.to_dict('records')).reset_index(name='data').groupby('month')[['Type','data']].apply(lambda x: x.set_index('Type')['data'].to_dict()).reset_index(name='data').groupby('month')['data'].apply(list).to_dict()
json_arr=[]
for month,values in jsonfile.items():
    arr=[]
    for value in values:
        # Turn each {type: entries} mapping into an explicit type/values pair.
        for types, val in value.items():
            arr.append({"type": types, "values": val})
    json_arr.append({"month": month, "values": arr} )
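The same month/type structure can also be built without the chained groupby calls. The sketch below works on a list of row dictionaries; the rows are assumed sample data mirroring the columns in the question, and with a real frame they could come from something like barchart.to_dict('records').

```python
from collections import defaultdict

# Assumed sample rows mirroring the month/hour/Type/count columns above.
rows = [
    {'month': 4, 'hour': 0, 'Type': 'Bike', 'count': 8},
    {'month': 4, 'hour': 0, 'Type': 'Pedelec', 'count': 16},
    {'month': 4, 'hour': 1, 'Type': 'Bike', 'count': 9},
    {'month': 4, 'hour': 1, 'Type': 'Pedelec', 'count': 4},
]

# month -> type -> list of {'hour', 'count'} entries, in input order.
grouped = defaultdict(lambda: defaultdict(list))
for row in rows:
    grouped[row['month']][row['Type']].append(
        {'hour': row['hour'], 'count': row['count']}
    )

# Reshape into the list-of-months layout with explicit field names.
json_arr = [
    {'month': month,
     'values': [{'type': t, 'values': vals} for t, vals in by_type.items()]}
    for month, by_type in grouped.items()
]
print(json_arr)
```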

Convert list of dicts of dict into DataFrame

I have a list of dictionaries, each containing a nested dictionary, that looks like:
[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
and the result should look like:
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
while the default pd.DataFrame(data) looks like:
a b f
0 1 {'c': 1, 'd': 2, 'e': 3} 4
1 2 {'c': 2, 'd': 3, 'e': 4} 3
2 3 {'c': 3, 'd': 4, 'e': 5} 2
3 4 {'c': 4, 'd': 5, 'e': 6} 1
How can I do this with pandas? Thanks.

You need to flatten the nested JSON-like data, for example with json_normalize:
import pandas as pd
data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
        {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
        {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
        {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
# pandas >= 1.0; older versions need: from pandas.io.json import json_normalize
df = pd.json_normalize(data)
df
# output:
a b.c b.d b.e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
You can rename the columns once that's done.

json_normalize is what you're looking for!
import pandas as pd
x = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
     {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
     {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
     {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
sep = '::::' # string that doesn't appear in column names
# pandas >= 1.0; older versions need: from pandas.io.json import json_normalize
frame = pd.json_normalize(x, sep=sep)
frame.columns = frame.columns.str.split(sep).str[-1]
print(frame)
Output
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1

import pandas as pd
z=[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
   {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
   {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
   {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
step1=pd.DataFrame(z)
column_with_sets = 'b'
# Expand the dict column into its own frame (columns c, d, e).
step2=pd.DataFrame(list(step1[column_with_sets]))
# Keep every column except the dict column and attach the expanded ones.
step3=pd.concat([step1[[i for i in step1.columns if i != column_with_sets]], step2], axis=1)
# Sort the columns alphabetically: a, c, d, e, f.
step4=step3.reindex(sorted(step3.columns), axis=1)
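When only one level of nesting needs to be lifted, the records can also be flattened in plain Python before they reach pandas. This is a minimal sketch; flatten_one_level is a hypothetical helper name, and it assumes the inner keys do not clash with the outer ones (a clash would silently overwrite).

```python
data = [
    {'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
    {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
]


def flatten_one_level(record):
    """Hypothetical helper: merge the items of any dict-valued field into the record."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            flat.update(value)   # lift 'c', 'd', 'e' to the top level
        else:
            flat[key] = value
    return flat


flat_rows = [flatten_one_level(r) for r in data]
print(flat_rows)
# [{'a': 1, 'c': 1, 'd': 2, 'e': 3, 'f': 4}, {'a': 2, 'c': 2, 'd': 3, 'e': 4, 'f': 3}]
```

Passing flat_rows to pd.DataFrame should then give one column per key.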

Three lists zipped into list of dicts

Consider the following:
>>> # list of length n
>>> idx = ['a', 'b', 'c', 'd']
>>> # list of length n
>>> l_1 = [1, 2, 3, 4]
>>> # list of length n
>>> l_2 = [5, 6, 7, 8]
>>> # first key
>>> key_1 = 'mkt_o'
>>> # second key
>>> key_2 = 'mkt_c'
How do I zip this mess to look like this?
{
'a': {'mkt_o': 1, 'mkt_c': 5},
'b': {'mkt_o': 2, 'mkt_c': 6},
'c': {'mkt_o': 3, 'mkt_c': 7},
'd': {'mkt_o': 4, 'mkt_c': 8},
...
}
The closest I've got is something like this:
>>> dict(zip(idx, zip(l_1, l_2)))
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7), 'd': (4, 8)}
Which of course has tuples as values instead of dictionaries, and
>>> dict(zip(('mkt_o', 'mkt_c'), (1,2)))
{'mkt_o': 1, 'mkt_c': 2}
Which seems like it might be promising, but again, fails to meet requirements.

{k : {key_1 : v1, key_2 : v2} for k,v1,v2 in zip(idx, l_1, l_2)}

Solution 1: You may use zip twice (actually thrice) with dictionary comprehension to achieve this as:
idx = ['a', 'b', 'c', 'd']
l_1 = [1, 2, 3, 4]
l_2 = [5, 6, 7, 8]
keys = ['mkt_o', 'mkt_c'] # your keys in another list
new_dict = {k: dict(zip(keys, v)) for k, v in zip(idx, zip(l_1, l_2))}
Solution 2: You may also use zip with a list comprehension as:
new_dict = dict(zip(idx, [{key_1: i, key_2: j} for i, j in zip(l_1, l_2)]))
Solution 3: using dictionary comprehension on top of zip as shared in DYZ's answer:
new_dict = {k : {key_1 : v1, key_2 : v2} for k,v1,v2 in zip(idx, l_1, l_2)}
All the above solutions will return new_dict as:
{
'a': {'mkt_o': 1, 'mkt_c': 5},
'b': {'mkt_o': 2, 'mkt_c': 6},
'c': {'mkt_o': 3, 'mkt_c': 7},
'd': {'mkt_o': 4, 'mkt_c': 8}
}
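The solutions above hard-code two value lists. A sketch of a generalization to any number of named lists follows; the columns mapping is an assumed layout, not something from the question. Because zip stops at the shortest input without complaint, the lengths are checked first.

```python
idx = ['a', 'b', 'c', 'd']
columns = {                      # key name -> list of values (assumed layout)
    'mkt_o': [1, 2, 3, 4],
    'mkt_c': [5, 6, 7, 8],
}

# zip() truncates to the shortest input, so verify the lengths up front.
if any(len(values) != len(idx) for values in columns.values()):
    raise ValueError('all lists must have the same length as idx')

# zip(*columns.values()) yields one tuple per index: (1, 5), (2, 6), ...
new_dict = {
    k: dict(zip(columns, row))
    for k, row in zip(idx, zip(*columns.values()))
}
print(new_dict)
# {'a': {'mkt_o': 1, 'mkt_c': 5}, 'b': {'mkt_o': 2, 'mkt_c': 6}, 'c': {'mkt_o': 3, 'mkt_c': 7}, 'd': {'mkt_o': 4, 'mkt_c': 8}}
```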

You're working with dicts, lists, indices, keys and would like to transpose the data. It might make sense to work with pandas (DataFrame, .T and .to_dict):
>>> import pandas as pd
>>> idx = ['a', 'b', 'c', 'd']
>>> l_1 = [1, 2, 3, 4]
>>> l_2 = [5, 6, 7, 8]
>>> key_1 = 'mkt_o'
>>> key_2 = 'mkt_c'
>>> pd.DataFrame([l_1, l_2], index=[key_1, key_2], columns = idx)
a b c d
mkt_o 1 2 3 4
mkt_c 5 6 7 8
>>> pd.DataFrame([l_1, l_2], index=[key_1, key_2], columns = idx).T
mkt_o mkt_c
a 1 5
b 2 6
c 3 7
d 4 8
>>> pd.DataFrame([l_1, l_2], index=[key_1, key_2], columns = idx).to_dict()
{'a': {'mkt_o': 1, 'mkt_c': 5},
'b': {'mkt_o': 2, 'mkt_c': 6},
'c': {'mkt_o': 3, 'mkt_c': 7},
'd': {'mkt_o': 4, 'mkt_c': 8}
}

It can also be done with dict, zip, map and repeat from itertools:
>>> from itertools import repeat
>>> dict(zip(idx, map(dict, zip(zip(repeat(key_1), l_1), zip(repeat(key_2), l_2)))))
{'a': {'mkt_c': 5, 'mkt_o': 1}, 'c': {'mkt_c': 7, 'mkt_o': 3}, 'b': {'mkt_c': 6, 'mkt_o': 2}, 'd': {'mkt_c': 8, 'mkt_o': 4}}

Combining all combinations of two lists into a dict of special form

I have two lists:
var_a = [1,2,3,4]
var_b = [6,7]
I want to have a list of dicts as follows:
result = [{'a':1,'b':6},{'a':1,'b':7},{'a':2,'b':6},{'a':2,'b':7},....]
I think the result should be clear.

In Python 3, the built-in zip replaces the Python 2 function itertools.izip:
[{k:v for k,v in zip('ab', comb)} for comb in itertools.product([1,2,3,4], [6,7])]
>>> import itertools
>>> [{k:v for k,v in zip('ab', comb)} for comb in itertools.product([1,2,3,4], [6,7])]
[{'a': 1, 'b': 6}, {'a': 1, 'b': 7}, {'a': 2, 'b': 6}, {'a': 2, 'b': 7}, {'a': 3, 'b': 6}, {'a': 3, 'b': 7}, {'a': 4, 'b': 6}, {'a': 4, 'b': 7}]

from itertools import product
a = [1,2,3,4]
b = [6,7]
[dict(zip(('a','b'), (i,j))) for i,j in product(a,b)]
yields
[{'a': 1, 'b': 6},
{'a': 1, 'b': 7},
{'a': 2, 'b': 6},
{'a': 2, 'b': 7},
{'a': 3, 'b': 6},
{'a': 3, 'b': 7},
{'a': 4, 'b': 6},
{'a': 4, 'b': 7}]

If the variable names are given to you, you could use:
>>> a = [1,2,3,4]
>>> b = [6,7]
>>> from itertools import product
>>> nameTup = ('a', 'b')
>>> [dict(zip(nameTup, elem)) for elem in product(a, b)]
[{'a': 1, 'b': 6}, {'a': 1, 'b': 7}, {'a': 2, 'b': 6}, {'a': 2, 'b': 7}, {'a': 3, 'b': 6}, {'a': 3, 'b': 7}, {'a': 4, 'b': 6}, {'a': 4, 'b': 7}]
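The same pattern extends to any number of variables if the names and their candidate values are kept together in one mapping. A minimal sketch, assuming that layout (the variables dict is illustrative):

```python
from itertools import product

# name -> candidate values (assumed layout)
variables = {'a': [1, 2, 3, 4], 'b': [6, 7]}

# product(*values) yields every combination; zip pairs each with its name.
result = [
    dict(zip(variables, combo))
    for combo in product(*variables.values())
]
print(len(result))   # 8
print(result[:2])    # [{'a': 1, 'b': 6}, {'a': 1, 'b': 7}]
```

Adding a third entry to variables would produce the three-way combinations without changing the comprehension.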
