Convert pandas.DataFrame to list of dictionaries in Python - python

I have a dictionary which is converted from a dataframe as below :
a = d.to_json(orient='index')
Dictionary :
{"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
What I need is for it to be in a list, so essentially a list of dictionaries.
So I just added [] around it, because that is the format used in the rest of the code.
input_dict = [a]
input_dict :
['
{"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
']
I need the single quotes removed just after the [ and just before the ]. I also need the PKID values as lists.
How can this be achieved ?
Expected Output :
[ {"yr":2017,"PKID":[58306, 57011],"Subject":"ABC","ID":"T001"},{"yr":2018,"PKID":[1234,54321],"Subject":"XYZ","ID":"T002"} ]
NOTE: The PKID column has multiple integer values, which have to come out as a list of integers. A string is not acceptable.
So we need "PKID":[58306, 57011] and not "PKID":"[58306, 57011]".

pandas.DataFrame.to_json returns a string (JSON string), not a dictionary. Try to_dict instead:
>>> df
col1 col2
0 1 3
1 2 4
>>> [df.to_dict(orient='index')]
[{0: {'col1': 1, 'col2': 3}, 1: {'col1': 2, 'col2': 4}}]
>>> df.to_dict(orient='records')
[{'col1': 1, 'col2': 3}, {'col1': 2, 'col2': 4}]
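Neither call above turns the PKID strings into lists of integers, which the question also asks for. A minimal sketch of that extra step, assuming records holds what df.to_dict(orient='records') would return for the question's data (the helper name split_ids is made up for illustration):

```python
# Assumed input: the shape df.to_dict(orient='records') would produce
records = [
    {"yr": 2017, "PKID": "58306, 57011", "Subject": "ABC", "ID": "T001"},
    {"yr": 2018, "PKID": "1234,54321", "Subject": "XYZ", "ID": "T002"},
]

def split_ids(text):
    """Turn a comma-separated string such as '58306, 57011' into a list of ints."""
    return [int(part) for part in text.split(",") if part.strip()]

# Build new dicts so the original records stay untouched
input_dict = [{**row, "PKID": split_ids(row["PKID"])} for row in records]
print(input_dict)
```

int() ignores surrounding whitespace, so both "58306, 57011" and "1234,54321" are handled the same way.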

Here is one way:
from collections import OrderedDict
d = {"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
list(OrderedDict(sorted(d.items())).values())
# [{'ID': 'T001', 'PKID': '58306, 57011', 'Subject': 'ABC', 'yr': 2017},
# {'ID': 'T002', 'PKID': '1234,54321', 'Subject': 'XYZ', 'yr': 2018}]
Note the ordered dictionary is ordered by text string keys, as supplied. You may wish to convert these to integers first before any processing via d = {int(k): v for k, v in d.items()}.
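To see why the integer conversion matters: string keys sort lexicographically, so "10" comes before "2". A small sketch with made-up keys (not the question's data):

```python
d = {"10": "ten", "2": "two", "1": "one"}  # made-up data for illustration

# Sorting the string keys directly puts "10" ahead of "2"
by_text = [v for k, v in sorted(d.items())]

# Converting the keys to int first restores numeric order
by_number = [v for k, v in sorted((int(k), v) for k, v in d.items())]

print(by_text)    # ['one', 'ten', 'two']
print(by_number)  # ['one', 'two', 'ten']
```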

You are converting your dictionary to JSON, which is a string. Then you wrap the resulting string in a list. So, naturally, the result is a string inside a list.
Try instead: [d] where d is your raw dictionary (not the converted JSON).
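If only the JSON string is available, it can be parsed back with the standard json module. A sketch, assuming a is the string the question got from to_json(orient='index'):

```python
import json

# Assumed: the string returned by d.to_json(orient='index') in the question
a = ('{"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},'
     '"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}')

parsed = json.loads(a)              # a dict keyed by "0", "1", ...
input_dict = list(parsed.values())  # drop the index keys, keep the rows
print(input_dict)
```

This yields a real list of dictionaries, with no quotes around it, though PKID is still a string at this point.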

You can use a list comprehension
Ex:
d = {"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
print([{k: v} for k, v in d.items()])
Output:
[{'0': {'yr': 2017, 'PKID': '58306, 57011', 'Subject': 'ABC', 'ID': 'T001'}}, {'1': {'yr': 2018, 'PKID': '1234,54321', 'Subject': 'XYZ', 'ID': 'T002'}}]

What about something like this:
from operator import itemgetter
d = {"0":{"yr":2017,"PKID":"58306, 57011","Subject":"ABC","ID":"T001"},"1":
{"yr":2018,"PKID":"1234,54321","Subject":"XYZ","ID":"T002"}}
sorted_d = sorted(d.items(), key=lambda x: int(x[0]))
print(list(map(itemgetter(1), sorted_d)))
Which Outputs:
[{'yr': 2017, 'PKID': '58306, 57011', 'Subject': 'ABC', 'ID': 'T001'},
{'yr': 2018, 'PKID': '1234,54321', 'Subject': 'XYZ', 'ID': 'T002'}]

Related

Parse output from json python

I have a json below, and I want to parse out value from this dict.
I can do something like this to get one specific value
print(abc['everything']['A']['1']['tree']['value'])
But what is the best way to parse out every "value"?
I want to output good, bad, good.
abc = {'everything': {'A': {'1': {'tree': {'value': 'good'}}},
'B': {'5': {'tree1': {'value': 'bad'}}},
'C': {'30': {'tree2': {'value': 'good'}}}}}
If you are willing to use pandas, you could just use pd.json_normalize, which is actually quite fast:
import pandas as pd
abc = {'everything': {'A': {'1': {'tree': {'value': 'good'}}},
'B': {'5': {'tree1': {'value': 'bad'}}},
'C': {'30': {'tree2': {'value': 'good'}}}}}
df = pd.json_normalize(abc)
print(df.values[0])
['good' 'bad' 'good']
Without any extra libraries, you will have to iterate through your nested dictionary:
values = [abc['everything'][e][k][k1]['value'] for e in abc['everything'] for k in abc['everything'][e] for k1 in abc['everything'][e][k]]
print(values)
['good', 'bad', 'good']
Provided your keys and dictionaries have a value somewhere, you can try this:
Create a function (or reuse the code) that gets the first element of the dictionary until the value key exists, then return that. Note that there are other ways of doing this.
Iterate through, getting the result under each value key and return.
# Define function
def get(d):
    # Keep descending into the first child until a dict with a "value" key is found
    while "value" not in d:
        d = list(d.values())[0]
    return d["value"]
# Get the results from your example
results = [get(v) for v in abc["everything"].values()]
print(results)
['good', 'bad', 'good']
A Recursive way:
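def fun(my_dict, values=None):
    # Use None as the default: a mutable default list would be shared between calls
    if values is None:
        values = []
    if not isinstance(my_dict, dict):
        return values
    for i, j in my_dict.items():
        if i == 'value':
            values.append(j)
        else:
            values = fun(j, values)
    return values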
abc = {'everything': {'A': {'1': {'tree': {'value': 'good'}}},
'B': {'5': {'tree1': {'value': 'bad'}}},
'C': {'30': {'tree2': {'value': 'good'}}}}}
data = fun(abc)
print(data)
Output:
['good', 'bad', 'good']
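A generator-based variant of the same recursive idea avoids passing an accumulator around. This is only a sketch; the name iter_values and its target parameter are hypothetical:

```python
def iter_values(node, target="value"):
    """Yield everything stored under `target` anywhere in a nested dict."""
    if not isinstance(node, dict):
        return
    for key, child in node.items():
        if key == target:
            yield child
        else:
            # Recurse into the child; non-dict children simply yield nothing
            yield from iter_values(child, target)

abc = {'everything': {'A': {'1': {'tree': {'value': 'good'}}},
                      'B': {'5': {'tree1': {'value': 'bad'}}},
                      'C': {'30': {'tree2': {'value': 'good'}}}}}

found = list(iter_values(abc))
print(found)  # ['good', 'bad', 'good']
```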
Firstly, the syntax you are using is incorrect.
If you are using pandas, you can write something like:
import pandas as pd
df4 = pd.DataFrame({"TreeType": ["Tree1", "Tree2", "Tree3"],
                    "Values": ["Good", "Bad", "Good"]})
df4.index = ["A", "B", "C"]
Next, just evaluate df4 and you get the following output:
  TreeType Values
A    Tree1   Good
B    Tree2    Bad
C    Tree3   Good

Flatten list of dictionaries with multiple key, value pairs

I have a list of dictionaries with multiple KVP each
list_dict = [{'id': 1, 'name': 'sana'}, {'id': 2, 'name': 'art'}, {'id': 3, 'name': 'tiara'}]
I want to transform this into this format:
final_dict = {1: 'sana', 2: 'art', 3: 'tiara'}
I've been trying dict comprehensions but it does not work. Here's the best that I could do:
{k:v for d in list_dict for k, v in d.items()}
You don't need d.items(), you can just access the id and name properties of each dict.
{d['id']: d['name'] for d in list_dict}
for each element of the list you want the d["id"] to be the key and d["name"] to be the value, so the dictionary comprehension would look like this:
{d["id"]: d["name"] for d in list_dict}
You can try
final_dict={}
for dico in list_dict:
    final_dict[dico['id']] = dico['name']
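Another option is to feed (id, name) pairs straight to dict(). A sketch using operator.itemgetter, which returns a tuple when it is given several keys:

```python
from operator import itemgetter

list_dict = [{'id': 1, 'name': 'sana'}, {'id': 2, 'name': 'art'}, {'id': 3, 'name': 'tiara'}]

# itemgetter('id', 'name') maps each dict to an (id, name) tuple
final_dict = dict(map(itemgetter('id', 'name'), list_dict))
print(final_dict)  # {1: 'sana', 2: 'art', 3: 'tiara'}
```

As with the comprehension, a repeated id would keep only the last name seen.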
There are probably a few different ways you can do this. Here's a nice simple way of doing it using a pandas dataframe:
import pandas as pd
df = pd.DataFrame(list_dict) # <- convert to dataframe
df = df.set_index('id') # <- set the index to the field you want as the key
final_dict = df['name'].to_dict() # <- convert the series 'name' to a dict
print(final_dict)
{1: 'sana', 2: 'art', 3: 'tiara'}

how to merge values of python dict in a list of dictionaries

I have a Python list of dictionaries as shown below:
mylist = [{'id':1,'value':4},{'id':1,'value':6},{'id':2,'value':6},{'id':3,'value':9},{'id':3,'value':56},{'id':3,'value':67},]
I am trying to create a new list of dictionaries like this by doing some operations on the list shown above:
newlist = [{'id':1,'value':[4,6]},{'id':2,'value':[6]},{'id':3,'value':[9,56,67]}]
Does anyone know a good way to do this?
If list items are sorted by id, you can use itertools.groupby:
>>> mylist = [{'id':1,'value':4},{'id':1,'value':6},{'id':2,'value':6},{'id':3,'value':9},{'id':3,'value':56},{'id':3,'value':67},]
>>> import itertools
>>> [{'id': key, 'value': [x['value'] for x in grp]}
... for key, grp in itertools.groupby(mylist, key=lambda d: d['id'])]
[{'id': 1, 'value': [4, 6]},
{'id': 2, 'value': [6]},
{'id': 3, 'value': [9, 56, 67]}]
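groupby only merges adjacent items, so unsorted input needs to be sorted by the same key first. A sketch with the question's data deliberately shuffled:

```python
from itertools import groupby
from operator import itemgetter

# Same items as in the question, but out of order
mylist = [{'id': 3, 'value': 9}, {'id': 1, 'value': 4}, {'id': 2, 'value': 6},
          {'id': 1, 'value': 6}, {'id': 3, 'value': 56}, {'id': 3, 'value': 67}]

by_id = itemgetter('id')
# sorted() is stable, so values keep their relative order within each id
newlist = [{'id': key, 'value': [item['value'] for item in group]}
           for key, group in groupby(sorted(mylist, key=by_id), key=by_id)]
print(newlist)
```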
You can first collect the whole list of dictionaries into a single dictionary with multiple values per key, using defaultdict, like this
from collections import defaultdict
d = defaultdict(list)
for item in mylist:
    d[item['id']].append(item['value'])
And then, using a list comprehension, you can reconstruct the required list of dictionaries like this
print([{'id': key, 'value': d[key]} for key in d])
# [{'id':1, 'value':[4, 6]}, {'id':2, 'value':[6]}, {'id':3, 'value':[9,56,67]}]
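You could also use a dict comprehension, although note that this collects the values for each key across all the dictionaries rather than grouping them by id:
newlist = {key: [entries[key] for entries in mylist] for key in mylist[0]}
# {'id': [1, 1, 2, 3, 3, 3], 'value': [4, 6, 6, 9, 56, 67]}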

Merging two python lists of dicts

I am trying to take two lists, each of which are lists of dictionaries with the same keys, and output versions of each list that only contain dictionaries that share common values for one of the keys. For example:
#before:
json1 = [{'id':1, 'name':'john', 'age': 3}, {'id':2, 'name':'jack', 'age':5}]
json2 = [{'id':3, 'name':'john', 'age': 5}, {'id':1, 'name':'jill', 'age':3}]
#Do some operation that merges based on the key 'id'
json1 = [{'id':1, 'name':'john', 'age': 3}]
json2 = [{'id':1, 'name':'jill', 'age':3}]
So, merging the lists of dicts based on id would output what I wrote above. Merging based on another key, say 'name', would only keep the first dict of each list.
Does anyone know a good way to do this?
EDIT
Sorry about the list names, I guess to be extremely accurate I'll call them json1 and json2
I think your merge function could look something like this:
def merge(key, l1, l2):
    k1 = {d[key] for d in l1}
    k2 = {d[key] for d in l2}
    keys = k1.intersection(k2)
    f1 = [d for d in l1 if d[key] in keys]
    f2 = [d for d in l2 if d[key] in keys]
    return f1, f2
That is:
take the values of the key used for merging (in your examples 'id' or 'name') into sets, to avoid duplicates
find the common values in the two sets
keep only the dicts from the initial lists where the key takes one of the common values
If you take merge('id', json1, json2) you get a 2-tuple of your resulting json1 and json2
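A runnable sketch of that call on the question's data. The function is restated compactly here so the snippet stands on its own; the variable names by_id and by_name are just for illustration:

```python
def merge(key, l1, l2):
    # Values of `key` that occur in both lists
    common = {d[key] for d in l1} & {d[key] for d in l2}
    # Keep only the dicts whose `key` value is shared
    return ([d for d in l1 if d[key] in common],
            [d for d in l2 if d[key] in common])

json1 = [{'id': 1, 'name': 'john', 'age': 3}, {'id': 2, 'name': 'jack', 'age': 5}]
json2 = [{'id': 3, 'name': 'john', 'age': 5}, {'id': 1, 'name': 'jill', 'age': 3}]

by_id = merge('id', json1, json2)      # only id 1 is shared
by_name = merge('name', json1, json2)  # only 'john' is shared
print(by_id)
print(by_name)
```

The original lists are left unchanged; the filtered versions come back as a 2-tuple that can be unpacked with json1, json2 = merge('id', json1, json2).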
Assuming I understand you, I'd do this in two passes: first find the common values, and then build the new lists:
>>> j1 = [{'id':1, 'name':'john', 'age': 3}, {'id':2, 'name':'jack', 'age':5}]
>>> j2 = [{'id':3, 'name':'john', 'age': 5}, {'id':1, 'name':'jill', 'age':3}]
>>> jj = (j1, j2)
>>> common = set.intersection(*({d['id'] for d in j} for j in jj))
>>> common
set([1])
>>> jjnew = [[d for d in j if d['id'] in common] for j in jj]
>>> jjnew
[[{'age': 3, 'id': 1, 'name': 'john'}], [{'age': 3, 'id': 1, 'name': 'jill'}]]
And similarly for name:
>>> common = set.intersection(*({d['name'] for d in j} for j in jj))
>>> jjnew = [[d for d in j if d['name'] in common] for j in jj]
>>> jjnew
[[{'age': 3, 'id': 1, 'name': 'john'}], [{'age': 5, 'id': 3, 'name': 'john'}]]

Dictionary transformation and counter

Object:
data = [{'key': 11, 'country': 'USA'},{'key': 21, 'country': 'Canada'},{'key': 12, 'country': 'USA'}]
the result should be:
{'USA': {0: {'key':11}, 1: {'key': 12}}, 'Canada': {0: {'key':21}}}
I started experiment with:
result = {}
for i in data:
    k = 0
    result[i['country']] = dict(k = dict(key=i['key']))
and I get:
{'Canada': {'k': {'key': 21}}, 'USA': {'k': {'key': 12}}}
So how can I put the counter instead k? Maybe there is a more elegant way to create the dictionary?
I used the len() of the existing result item:
>>> import collections
>>> data = [{'key': 11, 'country': 'USA'},{'key': 21, 'country': 'Canada'},{'key': 12, 'country': 'USA'}]
>>> result = collections.defaultdict(dict)
>>> for item in data:
...     country = item['country']
...     result[country][len(result[country])] = {'key': item['key']}
...
>>> dict(result)
{'Canada': {0: {'key': 21}}, 'USA': {0: {'key': 11}, 1: {'key': 12}}}
There may be a more efficient way to do this, but I thought this would be most readable.
@zigg's answer is better.
Here's an alternative way:
import itertools as it, operator as op
def dict_transform(dataset, key_name=None, group_by=None):
    result = {}
    # Sort the dataset argument (not the global data) so each group is contiguous for groupby
    sorted_dataset = sorted(dataset, key=op.itemgetter(group_by))
    for k, g in it.groupby(sorted_dataset, key=op.itemgetter(group_by)):
        result[k] = {i: {key_name: j[key_name]} for i, j in enumerate(g)}
    return result
if __name__ == '__main__':
    data = [{'key': 11, 'country': 'USA'},
            {'key': 21, 'country': 'Canada'},
            {'key': 12, 'country': 'USA'}]
    expected_result = {'USA': {0: {'key': 11}, 1: {'key': 12}},
                       'Canada': {0: {'key': 21}}}
    result = dict_transform(data, key_name='key', group_by='country')
    assert result == expected_result
To add the number, use the {key:value} syntax
result = {}
for i in data:
    k = 0
    result[i['country']] = dict({k : dict(key=i['key'])})
dict(k = dict(key=i['key']))
This passes i['key'] as the key keyword argument to the dict constructor (which is what you want - since that results in the string "key" being used as a key), and then passes the result of that as the k keyword argument to the dict constructor (which is not what you want) - that's how parameter passing works in Python. The fact that you have a local variable named k is irrelevant.
To make a dict where the value of k is used as a key, the simplest way is to use the literal syntax for dictionaries: {1:2, 3:4} is a dict where the key 1 is associated with the value 2, and the key 3 is associated with the value 4. Notice that here we're using arbitrary expressions for keys and values - not names - so we can use a local variable and the resulting dictionary will use the named value.
Thus, you want {k: {'key': i['key']}}.
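Putting the literal syntax together with a per-country counter could look like the sketch below. Using setdefault and len() for the counter is one possible choice, not something this answer prescribes:

```python
data = [{'key': 11, 'country': 'USA'},
        {'key': 21, 'country': 'Canada'},
        {'key': 12, 'country': 'USA'}]

result = {}
for i in data:
    # Fetch (or create) the inner dict for this country
    group = result.setdefault(i['country'], {})
    k = len(group)                 # next free index for this country
    group[k] = {'key': i['key']}   # k is evaluated, not used as the name "k"

print(result)
# {'USA': {0: {'key': 11}, 1: {'key': 12}}, 'Canada': {0: {'key': 21}}}
```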
Maybe there is a more elegant way to create the dictionary?
You could create a list by appending items, and then transform the list into a dictionary with dict(enumerate(the_list)). That at least saves you from having to do the counting manually, but it's pretty indirect.
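A sketch of that list-then-enumerate approach, assuming the items are first grouped per country with a defaultdict (the name grouped is hypothetical):

```python
from collections import defaultdict

data = [{'key': 11, 'country': 'USA'},
        {'key': 21, 'country': 'Canada'},
        {'key': 12, 'country': 'USA'}]

# First pass: append each item to its country's list
grouped = defaultdict(list)
for item in data:
    grouped[item['country']].append({'key': item['key']})

# Second pass: number each list with enumerate and turn it into a dict
result = {country: dict(enumerate(items)) for country, items in grouped.items()}
print(result)
```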
