Turn a dict that may contain pandas DataFrames into several dicts - Python

I have a dict that may be nested to arbitrary depth and may contain several pandas DataFrames (all the DataFrames have the same number of rows).
I want to create a new dict for each row of the DataFrames, with the row transformed into a dict (the keys are the column names) and the rest of the dictionary staying the same.
Note: I am not making a Cartesian product between the rows of the different DataFrames.
What would be the best and most Pythonic way to do it?
Example:
the original dict:
d = {'a': 1,
'inner': {
'b': 'string',
'c': pd.DataFrame({'c_col1': range(1,3), 'c_col2': range(2,4)})
},
'd': pd.DataFrame({'d_col1': range(4,6), 'd_col2': range(7,9)})
}
the desired result:
lst_of_dicts = [
{'a': 1,
'inner': {
'b': 'string',
'c': {
'c_col1': 1, 'c_col2':2
}
},
'd': {
'd_col1': 4, 'd_col2': 7
}
},
{'a': 1,
'inner': {
'b': 'string',
'c': {
'c_col1': 2, 'c_col2': 3
}
},
'd': {
'd_col1': 5, 'd_col2': 8
}
}
]
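One way to approach this is a recursive walk that swaps every DataFrame for its i-th row. The following is a minimal sketch, not a definitive implementation: the helper names `count_rows`, `pick_row` and `split_by_row` are hypothetical, and it assumes DataFrames only appear as dict values (not inside lists) and that they all have the same number of rows, as stated in the question.

```python
import pandas as pd


def count_rows(obj):
    """Return the row count of the first DataFrame found, or None if there is none."""
    if isinstance(obj, pd.DataFrame):
        return len(obj)
    if isinstance(obj, dict):
        for value in obj.values():
            n = count_rows(value)
            if n is not None:
                return n
    return None


def pick_row(obj, i):
    """Copy obj, replacing each DataFrame with its i-th row as a plain dict."""
    if isinstance(obj, pd.DataFrame):
        # to_dict("records") gives one {column: value} dict per row
        return obj.to_dict("records")[i]
    if isinstance(obj, dict):
        return {key: pick_row(value, i) for key, value in obj.items()}
    return obj


def split_by_row(d):
    """Build one dict per row; rows are matched by position, not crossed."""
    n = count_rows(d)
    if n is None:
        return [d]  # no DataFrame anywhere, so there is nothing to split
    return [pick_row(d, i) for i in range(n)]


d = {'a': 1,
     'inner': {
         'b': 'string',
         'c': pd.DataFrame({'c_col1': range(1, 3), 'c_col2': range(2, 4)})
     },
     'd': pd.DataFrame({'d_col1': range(4, 6), 'd_col2': range(7, 9)})
     }
lst_of_dicts = split_by_row(d)
```

Calling `to_dict("records")` once per row keeps the sketch short but repeats work; for large DataFrames it would be better to convert each DataFrame once up front and index into the resulting lists.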

Related

how to sort a Dictionary in list? [duplicate]

This question already has answers here:
How do I sort a list of dictionaries by a value of the dictionary?
I have a question about Python.
Imagine a list with dictionaries in it.
How can we sort it by a value in each dictionary?
Imagine this list:
lst = [
{
"a" : 3,
"b" : 2
},
{
"a" : 1,
"b" : 4
},
{
"a" : 2,
"b" : 3
}
]
How can we sort this list by the value of "a" in each dictionary?
I mean, I want this list at the end:
lst = [
{
"a" : 1,
"b" : 4
},
{
"a" : 2,
"b" : 3
},
{
"a" : 3,
"b" : 2
}
]
You could provide a lambda key to sorted:
>>> lst = [
... {
... "a" : 3,
... "b" : 2
... },
... {
... "a" : 1,
... "b" : 4
... },
... {
... "a" : 2,
... "b" : 3
... }
... ]
>>> sorted(lst, key=lambda d: d["a"])
[{'a': 1, 'b': 4}, {'a': 2, 'b': 3}, {'a': 3, 'b': 2}]
One approach, use the key argument with itemgetter:
from operator import itemgetter
lst = [{"a": 3, "b": 2}, {"a": 1, "b": 4}, {"a": 2, "b": 3}]
res = sorted(lst, key=itemgetter("a"))
print(res)
Output
[{'a': 1, 'b': 4}, {'a': 2, 'b': 3}, {'a': 3, 'b': 2}]
From the documentation on itemgetter:
Return a callable object that fetches item from its operand using the
operand's __getitem__() method. If multiple items are specified,
returns a tuple of lookup values. For example:
After f = itemgetter(2), the call f(r) returns r[2].
After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3]).
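The tuple behaviour described in the documentation is handy for sorting by more than one key. A small sketch with made-up data (the list `records` is hypothetical and not taken from the question):

```python
from operator import itemgetter

records = [
    {"a": 2, "b": 3},
    {"a": 1, "b": 4},
    {"a": 2, "b": 1},
]

# itemgetter("a", "b") returns the tuple (d["a"], d["b"]),
# so ties on "a" are broken by "b".
by_a_then_b = sorted(records, key=itemgetter("a", "b"))

# list.sort() sorts in place; reverse=True gives descending order.
records.sort(key=itemgetter("a"), reverse=True)
```

`sorted` returns a new list and leaves the original untouched, while `list.sort` modifies the list it is called on.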

How to convert this DataFrame into Json

I have this DataFrame with 2 columns
print(df)
a b
10 {'A': 'foo', ...}
20 {'B': 'faa', ...}
30 {'C': 'fee', ...}
40 {'D': 'fii', ...}
50 {'E': 'foo', ...}
when I try to convert it into json it goes wrong:
df.to_json("test.json")
# Output:
{
"a":{10, 20, 30, 40, 50},
"b":{
"1":{
"A":"foo",
...
},
"2":{
"B":"faa",
...
},
"3":{
"B":"faa",
...
},
...
"5":{
"E":"foo",
...
}
}
I don't even know where the numbers come from.
My desired json:
[{
'a': 10,
'b': {
'A': 'foo',
...
},
...
'a': 50,
'b': {
'E': 'foo',
...
}
}
]
You could try the following:
data = []
# pair up the two columns row by row
for a, b in zip(df['a'], df['b']):
    data.append({'a': a, 'b': b})
This should give you your desired output.
If you want to write this to a JSON file, you can do the following:
import json
with open("myjson.json", "w") as f:
    json.dump(data, f, indent=4)
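As for where the numbers come from: by default `DataFrame.to_json` uses `orient="columns"`, which nests each column's values under the row index labels. Passing `orient="records"` produces one JSON object per row instead. A minimal sketch, using a small stand-in DataFrame because the full data from the question is elided:

```python
import json

import pandas as pd

# Hypothetical stand-in for the DataFrame in the question.
df = pd.DataFrame({
    "a": [10, 20],
    "b": [{"A": "foo"}, {"B": "faa"}],
})

# orient="records" emits one JSON object per row instead of one per column.
json_text = df.to_json(orient="records")

# Parse it back only to inspect the structure.
records = json.loads(json_text)
```

To write straight to a file, the same keyword can be combined with a path, for example `df.to_json("test.json", orient="records")`.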

Converting pandas dataframe to JSON Object Column

I have a pandas DataFrame that has information about a user with multiple orders, and within each order there are multiple purchased items. An example of the DataFrame format:
user_id | order_num | item_id | item_desc
1 1 1 red
1 1 2 blue
1 1 3 green
I want to convert it to a JSONB object in a column so that I can query it in PostgreSQL.
Currently I am using the following code:
j = (reg_test.groupby(['user_id', 'order_num'], as_index=False)
.apply(lambda x: x[['item_id','item_desc']].to_dict('records'))
.reset_index()
.rename(columns={0:'New-Data'})
.to_json(orient='records'))
This is the result I am getting:
'''
[
{
"New-Data": [
{
"item_id": "1",
"item_desc": "red",
},
{
"item_id": "2",
"item_desc": "blue",
},
{
"item_id": "3",
"item_desc": "green",
}
],
"order_number": "1",
"user_id": "1"
}
]
'''
While that is correct json format, I want the result to look like this:
'''
[
{
"New-Data": [{
"1":
{
"item_id": "1",
"item_desc": "red",
},
"2": {
"item_id": "2",
"item_desc": "blue",
},
"3":
{
"item_id": "3",
"item_desc": "green",
}
}
],
"order_number": "1",
"user_id": "1"
}
]
'''
As an alternative to @rpanai's solution, I moved the processing into vanilla Python.
Convert the DataFrame to a list of dicts:
M = df.to_dict("records")
Create the dicts for the items:
items = [
{key: value
for key, value in entry.items()
if key not in ("user_id", "order_num")}
for entry in M
]
item_details = [{str(num + 1): entry}
for num, entry
in enumerate(items)]
print(item_details)
[{'1': {'item_id': 1, 'item_desc': 'red'}},
{'2': {'item_id': 2, 'item_desc': 'blue'}},
{'3': {'item_id': 3, 'item_desc': 'green'}}]
Initialize dict and add the remaining data
d = dict()
d['New-Data'] = item_details
d['order_number'] = M[0]['order_num']
d['user_id'] = M[0]['user_id']
wrapper = [d]
print(wrapper)
[{'New-Data': [{'1': {'item_id': 1, 'item_desc': 'red'}},
{'2': {'item_id': 2, 'item_desc': 'blue'}},
{'3': {'item_id': 3, 'item_desc': 'green'}}],
'order_number': 1,
'user_id': 1}]
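The steps above take the order number and user id from the first record, so they only cover a single order. A sketch of how the same idea might extend to several orders, grouping with a plain dict; the sample `records` list is hypothetical (it stands in for `df.to_dict("records")`), and the output follows the layout requested in the question, with one dict keyed by item id inside the "New-Data" list:

```python
# Hypothetical records, as df.to_dict("records") would return them.
records = [
    {"user_id": 1, "order_num": 1, "item_id": 1, "item_desc": "red"},
    {"user_id": 1, "order_num": 1, "item_id": 2, "item_desc": "blue"},
    {"user_id": 1, "order_num": 2, "item_id": 3, "item_desc": "green"},
]

groups = {}
for rec in records:
    key = (rec["user_id"], rec["order_num"])
    item = {"item_id": rec["item_id"], "item_desc": rec["item_desc"]}
    # setdefault creates the per-order dict the first time a key is seen
    groups.setdefault(key, {})[str(rec["item_id"])] = item

result = [
    {"New-Data": [items], "order_number": order_num, "user_id": user_id}
    for (user_id, order_num), items in groups.items()
]
```

Because dicts preserve insertion order, the orders come out in the order they first appear in the records.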
Have you considered using a custom function?
import pandas as pd
df = pd.DataFrame({'user_id': {0: 1, 1: 1, 2: 1},
'order_num': {0: 1, 1: 1, 2: 1},
'item_id': {0: 1, 1: 2, 2: 3},
'item_desc': {0: 'red', 1: 'blue', 2: 'green'}})
out = df.groupby(['user_id', 'order_num'])[["item_id", "item_desc"]]\
.apply(lambda x: x.to_dict("records"))\
.apply(lambda x: [{str(l["item_id"]):l for l in x}])\
.reset_index(name="New-Data")\
.to_dict("records")
where out returns
[{'user_id': 1,
'order_num': 1,
'New-Data': [{'1': {'item_id': 1, 'item_desc': 'red'},
'2': {'item_id': 2, 'item_desc': 'blue'},
'3': {'item_id': 3, 'item_desc': 'green'}}]}]

Problems creating a nested Python dictionary

I am trying to create a dictionary with Python 3. Here is my code:
data = {}
data['price'] = []
data['place1'] = []
data['place2'] = []
data['place1'].append({
'x': 2,
'y': 1
})
data['place2'].append({
'a': 5,
'b': 6
})
data['price'].append(data['place1'])
data['price'].append(data['place2'])
print(data)
The output is:
{'price': [[{'x': 2, 'y': 1}], [{'a': 5, 'b': 6}]], 'place1': [{'x': 2, 'y': 1}], 'place2': [{'a': 5, 'b': 6}]}
But I need it like in this example:
'price'
->'place1'
->'x'=2
->'y'=1
->'place2'
->'a'=5
->'b'=6
Is a dictionary the correct structure for this?
Thanks for your help!
Best, Marius
Well, you're appending lists here: data['price'].append(data['place1']), so now data['price'] is a list of lists.
You can write a simple dictionary literal instead:
data = {
'price': {
'place1': {
'x': 2,
'y': 1
},
'place2': {
'a': 5,
'b': 6
}
}
}
Or, if you insist on appending data dynamically:
data = {'price': {}}
data['price']['place1'] = {'x': 2, 'y': 1}
data['price']['place2'] = {'a': 5, 'b': 6}
To keep as much of the original code as possible, make data['price'] a dict and then put place1 and place2 inside it.
data = {}
data['price'] = {}
data['price']['place1'] = []
data['price']['place2'] = []
data['price']['place1'].append({
'x': 2,
'y': 1
})
data['price']['place2'].append({
'a': 5,
'b': 6
})
Note that a dictionary always prints its key-value pairs with a colon (:); you cannot make the dictionary itself display them with an equals sign (=) as in your sketch.
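The printed form of a dict cannot be changed, but the arrow layout from the question can be produced by a separate helper. A minimal sketch: `render` is a hypothetical name, the two-space indent per level is an assumption about the intended layout, and `setdefault` is used as one more way to build the nesting step by step.

```python
data = {}

# setdefault returns the existing value for a key, or stores and returns the default
data.setdefault("price", {}).setdefault("place1", {})["x"] = 2
data["price"]["place1"]["y"] = 1
data["price"].setdefault("place2", {}).update({"a": 5, "b": 6})


def render(node, depth=0):
    """Return the lines of an indented, arrow-style view of a nested dict."""
    lines = []
    for key, value in node.items():
        prefix = "  " * depth + ("->" if depth else "")
        if isinstance(value, dict):
            lines.append(f"{prefix}{key!r}")
            lines.extend(render(value, depth + 1))
        else:
            lines.append(f"{prefix}{key!r}={value!r}")
    return lines


print("\n".join(render(data)))
```

Keeping the data as a nested dict and formatting it only when printing means the structure stays easy to query, for example `data["price"]["place1"]["x"]`.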

Nested dictionary with lists to many dictionaries

I have a nested dictionary with lists like this:
{
'a': 1,
'x':[
{'b': 1,
'c': [
{'z': 12},
{'z': 22},
]
},
{'b': 2,
'c': [
{'z': 10},
{'z': 33},
]
}
]
}
And I want to convert it to a list of flat dictionaries in a form like this:
[
{'a': 1, 'b': 1, 'z': 12},
{'a': 1, 'b': 1, 'z': 22},
{'a': 1, 'b': 2, 'z': 10},
{'a': 1, 'b': 2, 'z': 33},
]
Any idea how to achieve that?
The function below produces this result for the reduced input used in the test further down:
[{'a': 1, 'b': 1, 'z': 12}, {'a': 1, 'b': 2, 'z': 10}]
Use at your own risk. The following was only tested on your example.
from itertools import product
def flatten(D):
    if not isinstance(D, dict): return D
    base = [(k, v) for k, v in D.items() if not isinstance(v, list)]
    lists = [[flatten(x) for x in v] for k, v in D.items() if isinstance(v, list)]
    l = []
    for p in product(*lists):
        r = dict(base)
        for a in p:
            for d in a:
                r.update(d)
        l.append(r)
    return l
The following tests the function above.
d = {
'a': 1,
'x':[
{'b': 1,
'c': [
{'z': 12}
]
},
{'b': 2,
'c': [
{'z': 10}
]
}
]
}
print(flatten(d))
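The function above was only tested on the reduced input, where every inner list has a single element. With the original input from the question (two entries in each c list), later entries overwrite earlier ones instead of producing separate rows. A possible variant that handles that case is sketched below; `flatten_all` is a hypothetical name, and the sketch assumes every list element is itself a dict.

```python
from itertools import product


def flatten_all(node):
    """Return a list of flat dicts, one per path through the nested lists."""
    base = {k: v for k, v in node.items() if not isinstance(v, list)}
    # For each list-valued key, collect every flat dict its elements can produce.
    options = [
        [flat for child in v for flat in flatten_all(child)]
        for v in node.values()
        if isinstance(v, list)
    ]
    # product() of zero iterables yields one empty tuple, so a dict
    # without lists comes back as a single flat dict.
    return [
        {**base, **{k: v for part in combo for k, v in part.items()}}
        for combo in product(*options)
    ]


d_full = {
    'a': 1,
    'x': [
        {'b': 1, 'c': [{'z': 12}, {'z': 22}]},
        {'b': 2, 'c': [{'z': 10}, {'z': 33}]},
    ]
}
rows = flatten_all(d_full)
```

The keys that hold the lists (x and c here) are dropped from the result, which matches the flat output requested in the question.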
A possible solution is:
#!/usr/bin/env python3
d = {
'a': 1,
'x': [
{
'b': 1,
'c': [
{'z': 12}
]
},
{
'b': 2,
'c': [
{'z': 10}
]
}
]
}
res = [{"a": 1, "b": x["b"], "z": x["c"][0]["z"]} for x in d["x"]]
print(res)
This assumes that there is only one a (with a fixed value of 1) and that each element of x has exactly one entry in c; the value of a is written into the comprehension manually.
The other two values (b and z) are taken from the x list by the list comprehension.
To learn more about how comprehensions work, read the following:
Python Documentation - 5.1.4. List Comprehensions
Python: List Comprehensions
PS. You are supposed to first show what you have tried so far and get help on that. Take a look at SO rules before posting your next question.
