Dictionary of dictionaries from data frame - python

I have created a data frame from data fetched from a database. I need to convert the data frame to a dictionary in the format shown below.
{ 0 : {'column1': 'value', 'column2' : 'value',....},
1 : {'column1': 'value', 'column2' : 'value',....},....
I tried
data_list = [tuple(r) for r in data_df.to_numpy().tolist()]#convert df to list of tuples
data_dict = [dict(zip(ag_titles, x)) for x in data_list]
data_final = {i:{k:v for k,v in dict(data_dict).items}for i in range(len(data_dict))}
But I am not getting the expected output. How can this be done with dictionary comprehension?

You can try DataFrame.to_dict with orient='index':
b = data_df.to_dict(orient='index')
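For the dictionary-comprehension form the question asks about, here is a minimal sketch in plain Python. The names columns and rows are hypothetical stand-ins for ag_titles and data_df.to_numpy().tolist(), and the values are made up for illustration.

```python
# Hypothetical stand-ins: `columns` plays the role of ag_titles,
# `rows` the role of data_df.to_numpy().tolist().
columns = ["column1", "column2"]
rows = [["a", 1], ["b", 2]]

# Pair each row with the column names and key it by the row's position.
data_final = {i: dict(zip(columns, row)) for i, row in enumerate(rows)}

print(data_final)
# {0: {'column1': 'a', 'column2': 1}, 1: {'column1': 'b', 'column2': 2}}
```

Note that this keys by row position, whereas to_dict(orient='index') keys by index label; the two should only agree when the frame has a default 0..n-1 index.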

Related

How to normalize a complex json format in a pandas data frame that is a list of dictionaries

I have a pandas data frame that has one column like this in json format. I am not able to understand how to extract this.
df['completionDetails'][0] gives:
[{'name': 'start', 'time': 1654098788177},
{'name': 'arrival',
'time': 1654099038368,
'location': [-74.2713929, 40.5017297]},
{'name': 'departure',
'time': 1654098843357,
'location': [-74.2802414, 40.5095964]}]
I have tried:
dict_df = pd.DataFrame([ast.literal_eval(i) for i in df['completionDetails'].values])
But it gives me an error. What method can I use for this?
Expected Output:
start_time arrival_time arrival_location departure_time departure_location
1654098788177 1654099038368 [-74.2713929, 40.5017297] 1654098843357 [-74.2802414, 40.5095964]
IIUC each cell of the completionDetails column is a list of dictionaries.
You can make a dataframe out of each cell and concatenate the dfs:
dict_df = pd.concat([pd.DataFrame(i) for i in df['completionDetails'].values])
Edit:
Following your own edit, this is how you'd get the desired output:
dict_df = pd.concat([pd.DataFrame({f"{x['name']}_{k}": [v]
                                   for x in i for k,v in x.items() if k!='name'})
                     for i in df['completionDetails'].values if isinstance(i, list)])
As you can see, we build the new key names by joining each entry's name value with its other keys. The resulting flat dictionaries are turned into one-row dataframes, which are then concatenated.
Output:
start_time arrival_time arrival_location departure_time departure_location
0 1654098788177 1654099038368 [-74.2713929, 40.5017297] 1654098843357 [-74.2802414, 40.5095964]
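The key-building step can be tried on its own without pandas. This is a small sketch using one cell copied from the question; the variable names cell and flat are illustrative only.

```python
# One cell of the completionDetails column, copied from the question.
cell = [
    {'name': 'start', 'time': 1654098788177},
    {'name': 'arrival', 'time': 1654099038368,
     'location': [-74.2713929, 40.5017297]},
    {'name': 'departure', 'time': 1654098843357,
     'location': [-74.2802414, 40.5095964]},
]

# Prefix every key other than 'name' with the entry's name value.
flat = {f"{event['name']}_{key}": value
        for event in cell
        for key, value in event.items()
        if key != 'name'}

print(list(flat))
# ['start_time', 'arrival_time', 'arrival_location', 'departure_time', 'departure_location']
```

The answer wraps each value as [v] so that pd.DataFrame treats it as a single-row column; without that, a list value such as a location would be spread over several rows.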

how to merge keys in dictionary outside of list?

I'm trying to take dictionary entries that share the same key and merge their values, keeping only one copy of any duplicated value.
data = {"test1":["data1", "data2"],
"test1":["data3", "data4", "data2"],
"test2":["1data", "2data"],
"test2":["3data", "4data", "2data"]
}
desired_result = {"test1":["data1", "data2", "data3", "data4"],
"test2":["1data", "2data", "3data", "4data"]
}
any ideas how to get result?
First you need to create a list of dicts, because a single dictionary cannot hold the same key twice (the later entry overwrites the earlier one). Then iterate over the dicts, extend a list per key with their values, and finally use a set to remove the duplicates, like below:
data = [{"test1":["data1", "data2"]},{"test1":["data3", "data4", "data2"]},{"test2":["1data", "2data"]},{"test2":["3data", "4data", "2data"]}]
from collections import defaultdict
rslt_out = defaultdict(list)
for dct in data:
    for k,v in dct.items():
        rslt_out[k].extend(v)
for k,v in rslt_out.items():
    rslt_out[k] = list(set(v))
print(rslt_out)
output:
defaultdict(list,
{'test1': ['data3', 'data4', 'data2', 'data1'],
'test2': ['2data', '3data', '1data', '4data']})
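As the output shows, a set does not keep the original order, while the desired result in the question does. If order matters, one possible variant (a sketch, not part of the original answer) removes duplicates with dict.fromkeys, which keeps the first occurrence of each value in insertion order.

```python
from collections import defaultdict

data = [
    {"test1": ["data1", "data2"]},
    {"test1": ["data3", "data4", "data2"]},
    {"test2": ["1data", "2data"]},
    {"test2": ["3data", "4data", "2data"]},
]

# Collect all values per key, duplicates included.
merged = defaultdict(list)
for dct in data:
    for key, values in dct.items():
        merged[key].extend(values)

# dict.fromkeys keeps the first occurrence of each value, in order.
result = {key: list(dict.fromkeys(values)) for key, values in merged.items()}

print(result)
# {'test1': ['data1', 'data2', 'data3', 'data4'], 'test2': ['1data', '2data', '3data', '4data']}
```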

How to iterate through this nested dictionary within a list using for loop

I have a list of nested dictionaries that I want to get specific values and put into a dictionary like this:
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'}},
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
I saw previous questions similar to this, but I still can't get the values such as 'axe' and 'red'. I would like the new dict to have a 'Description', 'Confidence' and other columns with the values from the nested dict.
I have tried this for loop:
new_dict = {}
for x in range(len(vid)):
    for y in vid[x]['a']:
        desc = y['desc']
        new_dict['Description'] = desc
I got many errors but mostly this error:
TypeError: string indices must be integers
Can someone please help solve how to get the values from the nested dictionary?
You don't need to iterate through the keys in the dictionary (the inner for-loop), just access the value you want.
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'} },
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
new_dict = {}
list_of_dicts = []
for x in range(len(vid)):
    desc = vid[x]['a']['desc']
    list_of_dicts.append({'desc': desc})
I have found a temporary solution for this. I decided to use the pandas dataframe instead.
df = pd.DataFrame(columns = ['Desc'])
for x in range(len(vid)):
    desc = vid[x]['a']['desc']
    df.loc[len(df)] = [desc]
Since you want to write this to CSV later, pandas will help a lot with this problem. Using pandas, you can collect the desc values like this:
import pandas as pd
new_dict = {'description': []}
df = pd.DataFrame(vid)
for index, row in df.iterrows():
    # each cell in column 'a' is still a dict; append so earlier rows are not overwritten
    new_dict['description'].append(row['a']['desc'])
a b
0 {'display': 'axe', 'desc': 'red'} {'confidence': 'good'}
1 {'display': 'book', 'desc': 'blue'} {'confidence': 'poor'}
2 {'display': 'apple', 'desc': 'green'} {'confidence': 'good'}
This is what the dataframe looks like: a and b are its columns, and each element of your list becomes one row whose cells hold the nested dicts.
Try using this list comprehension:
d = [{'Description': i['a']['desc'], 'Confidence': i['b']['confidence']} for i in vid]
print(d)
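For reference, here is the same comprehension as a self-contained sketch run against the data from the question. The extra 'Display' key is an illustrative addition showing how further columns could be pulled from the nested dicts.

```python
vid = [
    {'a': {'display': 'axe', 'desc': 'red'}, 'b': {'confidence': 'good'}},
    {'a': {'display': 'book', 'desc': 'blue'}, 'b': {'confidence': 'poor'}},
    {'a': {'display': 'apple', 'desc': 'green'}, 'b': {'confidence': 'good'}},
]

# One flat dict per element; index straight into the nested dicts.
d = [{'Description': item['a']['desc'],
      'Display': item['a']['display'],
      'Confidence': item['b']['confidence']}
     for item in vid]

print(d[0])
# {'Description': 'red', 'Display': 'axe', 'Confidence': 'good'}
```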

Passing multiple dictionary observations to function?

How would I pass multiple dictionary observations (row) into function for model prediction?
This is what I have ... it can accept 1 dictionary row as input and returns the prediction + probabilities, but fails when adding additional dictionaries.
import json
# func
def preds(dict):
    df = pd.DataFrame([dict])
    result = model.predict(df)
    result = np.where(result==0,"CLASS_0","CLASS_1").astype('str')
    probas_c0 = model.predict_proba(df)[0][0]
    probas_c1 = model.predict_proba(df)[0][1]
    data={"prediction": result[0],
          "CLASS_0_PROB": probas_c0,
          "CLASS_1_PROB": probas_c1}
    data = {"parameters": [data]}
    j = json.dumps(data)
    j = json.loads(j)
    return j
# call func
preds({"feature0": "value",
"feature1": "value",
"feature2": "value"})
# result
{'parameters': [{'prediction': 'CLASS_0',
'CLASS_0_PROB': 0.9556067383610446,
'CLASS_1_PROB': 0.0443932616389555}]}
# Tried with more than 1 row but it fails with arguments error
{'parameters': [{'prediction': 'CLASS_0',
'CLASS_0_PROB': 0.9556067383610446,
'CLASS_1_PROB': 0.0443932616389555},
{'parameters': [{'prediction': 'CLASS_0',
'CLASS_0_PROB': 0.9556067383610446,
'CLASS_1_PROB': 0.0443932616389555}]}
TypeError: preds() takes 1 positional argument but 2 were given
NEW UPDATE
The source data from end users will most likely be a dataframe, so I want to convert it to the format [{...},{...}] so that it can be plugged into the preds() function at df=pd.DataFrame([rows]).
Tried this so far...
rows = [
{"c1": "value1",
"c2": "value2",
"c3": 0,
},
{"c1": "value1",
"c2": "value2",
"c3": 0}
]
df = pd.DataFrame(rows)
json_rows = df.to_json(orient='records', lines=True)
l = [json_rows]
preds(l)
KeyError: "None of [['c1', 'c2', 'c3']] are in the [columns]"
UPDATED
OK, based on your comments, what you need is for the DataFrame to receive all the rows. You can then use either of the following approaches.
Using *args
def preds(*args):
    # args is a tuple, so cast it to a list
    dict_rows = list(args)
    df = pd.DataFrame(dict_rows)
    result = model.predict(df)
    ...
# calling the function you need to unpack
preds(*rows)
Checking the element beforehand
def preds(dict_rows):
    # wrap a single dict in a list so both input shapes are handled
    if isinstance(dict_rows, dict):
        dict_rows = [dict_rows]
    df = pd.DataFrame(dict_rows)
    result = model.predict(df)
    ...
# For calling you need to
preds(rows)
Please note that the call is now pd.DataFrame(dict_rows), without wrapping the argument in a list as in pd.DataFrame([dict]).
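The type check can be exercised without a model or pandas. In this sketch, normalize_rows is a hypothetical helper (not part of the answer) that isolates the "single dict or list of dicts" step, and the feature values are placeholders.

```python
def normalize_rows(dict_rows):
    """Return a list of row dicts for either a single dict or a list of dicts."""
    if isinstance(dict_rows, dict):
        # A single observation: wrap it so downstream code always sees a list.
        return [dict_rows]
    return list(dict_rows)

single = {"feature0": "value", "feature1": "value"}
many = [single, {"feature0": "other", "feature1": "other"}]

print(len(normalize_rows(single)))  # 1
print(len(normalize_rows(many)))    # 2
```

If the rows start out as a dataframe, df.to_dict('records') yields this list-of-dicts shape, so its result could be passed in directly.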
Old Answer
If preds() can't handle multiple rows you can do
pred_rows = [
{"feature0": "value","feature1": "value", "feature2": "value"},
{"feature3": "value","feature4": "value", "feature5": "value"}
]
# List Comprehension
result = [preds(row) for row in pred_rows]
PS: also, don't use dict as a variable name; it is the built-in mapping type, i.e. the constructor/class for dictionaries.

Converting list of dictionaries into single dictionary in python 3

I have a snippet of data from which I need to extract specific information. The Data looks like this:
pid log Date
91 json D1
189 json D2
276 json D3
293 json D4
302 json D5
302 json D6
343 json D7
The LOG is a json file stored in a column of an excel file which looks something like this:
{"Before":{"freq_term":"Daily","ideal_pmt":"246.03","datetime":"2015-01-08 06:26:11},"After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}
{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
Now, what I want is an output something like this:
{
"pid": 91,
"Date": "2016-05-15 03:54:24"
"Before": {
"freq_term": "Daily"
},
"After": {
"freq_term": "Weekly",
}
}
Basically I want only the "freq_term" and "Datetime" of "Before" and "After" from the log file. So far I have done the following code. After this whatever I do it gives me the error: list object is not callable. Any help appreciated. Thanks.
import pandas as pd
data = pd.read_excel("C:\\Users\\Desktop\\dealChange.xlsx")
df = pd.DataFrame(data, columns = ['pid', 'log', 'date'])
li = df.to_dict('records')
dict(kv for d in li for kv in d.iteritems()) # error: list obj is not callable
How do I convert the list into a dictionary so that I can access only the data required..
I believe you need:
df = pd.DataFrame({'log':['{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}','{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}']})
print (df)
log
0 {"Before":{"freq_term":"Daily","ideal_pmt":"63...
1 {"Before":{"buy_rate":"1.180","irr":"31.63","u...
First convert values to nested dictionaries and then filter by nested dict comprehension:
import json
df['log'] = df['log'].apply(json.loads)
L1 = ['Before','After']
L2 = ['freq_term','datetime']
f = lambda x: {k:{k1:v1 for k1,v1 in v.items() if k1 in L2} for k,v in x.items() if k in L1}
df['new'] = df['log'].apply(f)
print (df)
log \
0 {'After': {'ideal_pmt': '3346.88', 'freq_term'...
1 {'After': {'ideal_pmt': '2583.33', 'freq_term'...
new
0 {'After': {'freq_term': 'Weekly', 'datetime': ...
1 {'After': {'freq_term': 'Bi-Monthly'}, 'Before...
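The nested comprehension can also be checked on a single log string using only the standard library. This sketch uses the second log entry from the question; the names outer_keys and inner_keys are illustrative stand-ins for L1 and L2.

```python
import json

# Second log entry from the question, split across lines for readability.
log = ('{"Before":{"freq_term":"Daily","ideal_pmt":"637.5",'
       '"datetime":"2015-01-08 06:26:11"},'
       '"After":{"freq_term":"Weekly","ideal_pmt":"3346.88",'
       '"datetime":"2015-02-02 06:16:07"}}')

outer_keys = ['Before', 'After']
inner_keys = ['freq_term', 'datetime']

parsed = json.loads(log)

# Keep only the wanted outer keys, and within each only the wanted inner keys.
filtered = {k: {k1: v1 for k1, v1 in v.items() if k1 in inner_keys}
            for k, v in parsed.items() if k in outer_keys}

print(filtered)
# {'Before': {'freq_term': 'Daily', 'datetime': '2015-01-08 06:26:11'}, 'After': {'freq_term': 'Weekly', 'datetime': '2015-02-02 06:16:07'}}
```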
EDIT:
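To find all rows with unparseable values, you can use the following (run it while the log column still holds strings):
import ast
def f(x):
    try:
        return ast.literal_eval(x)
    except Exception:
        # mark rows that cannot be parsed
        return 1
print (df[df['log'].apply(f) == 1])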
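A similar check can be sketched with json.loads instead of ast.literal_eval. Here is_parseable is a hypothetical helper, and the second string reproduces the missing closing quote from the first log row in the question (shortened for readability).

```python
import json

logs = [
    '{"Before":{"freq_term":"Daily"},"After":{"freq_term":"Weekly"}}',
    # Missing closing quote after the datetime value, as in the question.
    '{"Before":{"freq_term":"Daily","datetime":"2015-01-08 06:26:11},'
    '"After":{"freq_term":"Bi-Monthly"}}',
]

def is_parseable(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Positions of the entries that fail to parse.
bad_rows = [i for i, text in enumerate(logs) if not is_parseable(text)]
print(bad_rows)  # [1]
```

On a dataframe, the same helper could presumably be applied as df[~df['log'].apply(is_parseable)] to list the offending rows.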
