Converting list of dictionaries into single dictionary in python 3 - python

I have a snippet of data from which I need to extract specific information. The Data looks like this:
pid log Date
91 json D1
189 json D2
276 json D3
293 json D4
302 json D5
302 json D6
343 json D7
The LOG is a json file stored in a column of an excel file which looks something like this:
{"Before":{"freq_term":"Daily","ideal_pmt":"246.03","datetime":"2015-01-08 06:26:11},"After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}
{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}
Now, what I want is an output something like this:
{
"pid": 91,
"Date": "2016-05-15 03:54:24"
"Before": {
"freq_term": "Daily"
},
"After": {
"freq_term": "Weekly",
}
}
Basically I want only the "freq_term" and "Datetime" of "Before" and "After" from the log file. So far I have done the following code. After this whatever I do it gives me the error: list object is not callable. Any help appreciated. Thanks.
import pandas as pd
data = pd.read_excel("C:\\Users\\Desktop\\dealChange.xlsx")
df = pd.DataFrame(data, columns = ['pid', 'log', 'date'])
li = df.to_dict('records')
dict(kv for d in li for kv in d.iteritems()) # error: list obj is not callable
How do I convert the list into a dictionary so that I can access only the data required..

I believe you need:
df = pd.DataFrame({'log':['{"Before":{"freq_term":"Daily","ideal_pmt":"637.5","datetime":"2015-01-08 06:26:11"},"After":{"freq_term":"Weekly","ideal_pmt":"3346.88","datetime":"2015-02-02 06:16:07"}}','{"Before":{"buy_rate":"1.180","irr":"31.63","uwfee":"","freq_term":"Weekly"}, "After":{"freq_term":"Bi-Monthly","ideal_pmt":"2583.33"}}']})
print (df)
log
0 {"Before":{"freq_term":"Daily","ideal_pmt":"63...
1 {"Before":{"buy_rate":"1.180","irr":"31.63","u...
First convert values to nested dictionaries and then filter by nested dict comprehension:
df['log'] = df['log'].apply(pd.io.json.loads)
L1 = ['Before','After']
L2 = ['freq_term','datetime']
f = lambda x: {k:{k1:v1 for k1,v1 in v.items() if k1 in L2} for k,v in x.items() if k in L1}
df['new'] = df['log'].apply(f)
print (df)
log \
0 {'After': {'ideal_pmt': '3346.88', 'freq_term'...
1 {'After': {'ideal_pmt': '2583.33', 'freq_term'...
new
0 {'After': {'freq_term': 'Weekly', 'datetime': ...
1 {'After': {'freq_term': 'Bi-Monthly'}, 'Before...
EDIT:
For find all rows with unparseable values is possible use:
def f(x):
try:
return ast.literal_eval(x)
except:
return 1
print (df[df['log'].apply(f) == 1])

Related

Flatten a column value using dataframe

Im trying to flatten 2 columns from a table loaded into a dataframe as below:
u_group
t_group
{"link": "https://hi.com/api/now/table/system/2696f18b376bca0", "value": "2696f18b376bca0"}
{"link": "https://hi.com/api/now/table/system/2696f18b376bca0", "value": "2696f18b376bca0"}
{"link": "https://hi.com/api/now/table/system/99b27bc1db761f4", "value": "99b27bc1db761f4"}
{"link": "https://hi.com/api/now/table/system/99b27bc1db761f4", "value": "99b27bc1db761f4"}
I want to separate them and get them as:
u_group.link
u_group.value
t_group.link
t_group.value
https://hi.com/api/now/table/system/2696f18b376bca0
2696f18b376bca0
https://hi.com/api/now/table/system/2696f18b376bca0
2696f18b376bca0
https://hi.com/api/now/table/system/99b27bc1db761f4
99b27bc1db761f4
https://hi.com/api/now/table/system/99b27bc1db761f4
99b27bc1db761f4
I used the below code, but wasnt successful.
import ast
from pandas.io.json import json_normalize
df12 = spark.sql("""select u_group,t_group from tbl""")
def only_dict(d):
'''
Convert json string representation of dictionary to a python dict
'''
return ast.literal_eval(d)
def list_of_dicts(ld):
'''
Create a mapping of the tuples formed after
converting json strings of list to a python list
'''
return dict([(list(d.values())[1], list(d.values())[0]) for d in ast.literal_eval(ld)])
A = json_normalize(df12['u_group'].apply(only_dict).tolist()).add_prefix('link.')
B = json_normalize(df['u_group'].apply(list_of_dicts).tolist()).add_prefix('value.')
TypeError: 'Column' object is not callable
Kindly help or suggest if any other code would work better.
need simple example and code for answer
example:
data = [[{'link':'A1', 'value':'B1'}, {'link':'A2', 'value':'B2'}],
[{'link':'C1', 'value':'D1'}, {'link':'C2', 'value':'D2'}]]
df = pd.DataFrame(data, columns=['u', 't'])
output(df):
u t
0 {'link': 'A1', 'value': 'B1'} {'link': 'A2', 'value': 'B2'}
1 {'link': 'C1', 'value': 'D1'} {'link': 'C2', 'value': 'D2'}
use following code:
pd.concat([df[i].apply(lambda x: pd.Series(x)).add_prefix(i + '_') for i in df.columns], axis=1)
output:
u_link u_value t_link t_value
0 A1 B1 A2 B2
1 C1 D1 C2 D2
Here are my 2 cents,
A simple way to achieve this using PYSPARK.
Create the dataframe as follows:
data = [
(
"""{"link": "https://hi.com/api/now/table/system/2696f18b376bca0", "value": "2696f18b376bca0"}""",
"""{"link": "https://hi.com/api/now/table/system/2696f18b376bca0", "value": "2696f18b376bca0"}"""
),
(
"""{"link": "https://hi.com/api/now/table/system/2696f18b376bca0", "value": "2696f18b376bca0"}""",
"""{"link": "https://hi.com/api/now/table/system/99b27bc1db761f4", "value": "99b27bc1db761f4"}"""
)
]
df = spark.createDataFrame(data,schema=['u_group','t_group'])
Then use the from_json() to parse the dictionary and fetch the individual values as follows:
from pyspark.sql.types import *
from pyspark.sql.functions import *
schema_column = StructType([
StructField("link",StringType(),True),
StructField("value",StringType(),True),
])
df = df .withColumn('U_GROUP_PARSE',from_json(col('u_group'),schema_column))\
.withColumn('T_GROUP_PARSE',from_json(col('t_group'),schema_column))\
.withColumn('U_GROUP.LINK',col("U_GROUP_PARSE.link"))\
.withColumn('U_GROUP.VALUE',col("U_GROUP_PARSE.value"))\
.withColumn('T_GROUP.LINK',col("T_GROUP_PARSE.link"))\
.withColumn('T_GROUP.VALUE',col("T_GROUP_PARSE.value"))\
.drop('u_group','t_group','U_GROUP_PARSE','T_GROUP_PARSE')
Print the dataframe
df.show(truncate=False)
Please check the below image for your reference:

Converting complex nested json to csv via pandas

I have the following json file
{
"matches": [
{
"team": "Sunrisers Hyderabad",
"overallResult": "Won",
"totalMatches": 3,
"margins": [
{
"bar": 290
},
{
"bar": 90
}
]
},
{
"team": "Pune Warriors",
"overallResult": "None",
"totalMatches": 0,
"margins": null
}
],
"totalMatches": 70
}
Note - Above json is fragment of original json. The actual file contains lot more attributes after 'margins', some of them nested and others not so. I just put some for brevity and to give an idea of expectations.
My goal is to flatten the data and load it into CSV. Here is the code I have written so far -
import json
import pandas as pd
path = r"/Users/samt/Downloads/test_data.json"
with open(path) as f:
t_data = {}
data = json.load(f)
for team in data['matches']:
if team['margins']:
for idx, margin in enumerate(team['margins']):
t_data['team'] = team['team']
t_data['overallResult'] = team['overallResult']
t_data['totalMatches'] = team['totalMatches']
t_data['margin'] = margin.get('bar')
else:
t_data['team'] = team['team']
t_data['overallResult'] = team['overallResult']
t_data['totalMatches'] = team['totalMatches']
t_data['margin'] = margin.get('bar')
df = pd.DataFrame.from_dict(t_data, orient='index')
print(df)
I know that data is getting over-written and loop is not properly structured.I am bit new to dealing with JSON objects using Python and I am not able to understand how to concate the results.
My goal is once, all the results are appended, use to_csv and convert them into rows. For each margin, the entire data is to be replicated as a seperate row. Here is what I am expecting the output to be. Can someone please help how to translate this?
From whatever I find on the net, it is about first gathering the dictionary items but how to transpose it to rows is something I am not able to understand. Also, is there a better way to parse the json than doing the loop twice for one attribute i.e. margins?
I can't use json_normalize as that library is not supported in our environment.
[output data]
Using the json and csv modules: create a dictionary for each team, for each margin if there is one.
import json, csv
s = '''{
"matches": [
{
"team": "Sunrisers Hyderabad",
"overallResult": "Won",
"totalMatches": 3,
"margins": [
{
"bar": 290
},
{
"bar": 90
}
]
},
{
"team": "Pune Warriors",
"overallResult": "None",
"totalMatches": 0,
"margins": null
}
],
"totalMatches": 70
}'''
j = json.loads(s)
matches = j['matches']
rows = []
for thing in matches:
# print(thing)
if not thing['margins']:
rows.append(thing)
else:
for bar in (b['bar'] for b in thing['margins']):
d = dict((k,thing[k]) for k in ('team','overallResult','totalMatches'))
d['margins'] = bar
rows.append(d)
# for row in rows: print(row)
# using an in-memory stream for this example instead of an actual file
import io
f = io.StringIO(newline='')
fieldnames=('team','overallResult','totalMatches','margins')
writer = csv.DictWriter(f,fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
f.seek(0)
print(f.read())
team,overallResult,totalMatches,margins
Sunrisers Hyderabad,Won,3,290
Sunrisers Hyderabad,Won,3,90
Pune Warriors,None,0,
Getting multiple item values from a dictionary can be aided by using operator.itemgetter()
>>> import operator
>>> items = operator.itemgetter(*('team','overallResult','totalMatches'))
>>> #items = operator.itemgetter('team','overallResult','totalMatches')
>>> #stuff = ('team','overallResult','totalMatches'))
>>> #items = operator.itemgetter(*stuff)
>>> d = {'margins': 90,
... 'overallResult': 'Won',
... 'team': 'Sunrisers Hyderabad',
... 'totalMatches': 3}
>>> items(d)
('Sunrisers Hyderabad', 'Won', 3)
>>>
I like to use use it and give the callable a descriptive name but I don't see it used much here on SO.
You can use pd.DataFrame to create DataFrame and explode the margins column
import json
import pandas as pd
with open('data.json', 'r', encoding='utf-8') as f:
data = json.loads(f.read())
df = pd.DataFrame(data['matches']).explode('margins', ignore_index=True)
print(df)
team overallResult totalMatches margins
0 Sunrisers Hyderabad Won 3 {'bar': 290}
1 Sunrisers Hyderabad Won 3 {'bar': 90}
2 Pune Warriors None 0 None
Then fill the None value in margins column to dictionary and convert it to column
bar = df['margins'].apply(lambda x: x if x else {'bar': pd.NA}).apply(pd.Series)
print(bar)
bar
0 290
1 90
2 <NA>
At last, join the Series to original dataframe
df = df.join(bar).drop(columns='margins')
print(df)
team overallResult totalMatches bar
0 Sunrisers Hyderabad Won 3 290
1 Sunrisers Hyderabad Won 3 90
2 Pune Warriors None 0 <NA>

Dictionary of dictionaries from data frame

I have created a data frame from data got from database. I need to convert the data frame to dictionary with format mentioned below.
{ 0 : {'column1': 'value', 'column2' : 'value',....},
1 : {'column1': 'value', 'column2' : 'value',....},....
I tried
data_list = [tuple(r) for r in data_df.to_numpy().tolist()]#convert df to list of tuples
data_dict = [dict(zip(ag_titles, x)) for x in data_list]
data_final = {i:{k:v for k,v in dict(data_dict).items}for i in range(len(data_dict))}
But I am not getting the expected output. How can this be done with dictionary comprehension?
You can try-
b = data_df.to_dict(orient='index')

Fill pandas dataframe within a for loop

I am working with Amazon Rekognition to do some image analysis.
With a symple Python script, I get - at every iteration - a response of this type:
(example for the image of a cat)
{'Labels':
[{'Name': 'Pet', 'Confidence': 96.146484375, 'Instances': [],
'Parents': [{'Name': 'Animal'}]}, {'Name': 'Mammal', 'Confidence': 96.146484375,
'Instances': [], 'Parents': [{'Name': 'Animal'}]},
{'Name': 'Cat', 'Confidence': 96.146484375.....
I got all the attributes I need in a list, that looks like this:
[Pet, Mammal, Cat, Animal, Manx, Abyssinian, Furniture, Kitten, Couch]
Now, I would like to create a dataframe where the elements in the list above appear as columns and the rows take values 0 or 1.
I created a dictionary in which I add the elements in the list, so I get {'Cat': 1}, then I go to add it to the dataframe and I get the following error:
TypeError: Index(...) must be called with a collection of some kind, 'Cat' was passed.
Not only that, but I don't even seem able to add to the same dataframe the information from different images. For example, if I only insert the data in the dataframe (as rows, not columns), I get a series with n rows with the n elements (identified by Amazon Rekognition) of only the last image, i.e. I start from an empty dataframe at each iteration.
The result I would like to get is something like:
Image Human Animal Flowers etc...
Pic1 1 0 0
Pic2 0 0 1
Pic3 1 1 0
For reference, this is the code I am using now (I should add that I am working on a software called KNIME, but this is just Python):
from pandas import DataFrame
import pandas as pd
import boto3
fileName=flow_variables['Path_Arr[1]'] #This is just to tell Amazon the name of the image
bucket= 'mybucket'
client=boto3.client('rekognition', region_name = 'us-east-2')
response = client.detect_labels(Image={'S3Object':
{'Bucket':bucket,'Name':fileName}})
data = [str(response)] # This is what I inserted in the first cell of this question
d= {}
for key, value in response.items():
for el in value:
if isinstance(el,dict):
for k, v in el.items():
if k == "Name":
d[v] = 1
print(d)
df = pd.DataFrame(d, ignore_index=True)
print(df)
output_table = df
I am definitely getting it all wrong both in the for loop and when adding things to my dataframe, but nothing really seems to work!
Sorry for the super long question, hope it was clear! Any ideas?
I do not know if this answers your question completely, because i do not know, what you data can look like, but it's a good step that should help you, i think. I added the same data multiple time, but the way should be clear.
import pandas as pd
response = {'Labels': [{'Name': 'Pet', 'Confidence': 96.146484375, 'Instances': [], 'Parents': [{'Name': 'Animal'}]},
{'Name': 'Cat', 'Confidence': 96.146484375, 'Instances': [{'BoundingBox':
{'Width': 0.6686800122261047,
'Height': 0.9005332589149475,
'Left': 0.27255237102508545,
'Top': 0.03728689253330231},
'Confidence': 96.146484375}],
'Parents': [{'Name': 'Pet'}]
}]}
def handle_new_data(repsonse_data: dict, image_name: str) -> pd.DataFrame:
d = {"Image": image_name}
result = pd.DataFrame()
for key, value in repsonse_data.items():
for el in value:
if isinstance(el, dict):
for k, v in el.items():
if k == "Name":
d[v] = 1
result = result.append(d, ignore_index=True)
return result
df_all = pd.DataFrame()
df_all = df_all.append(handle_new_data(response, "image1"))
df_all = df_all.append(handle_new_data(response, "image2"))
df_all = df_all.append(handle_new_data(response, "image3"))
df_all = df_all.append(handle_new_data(response, "image4"))
df_all.reset_index(inplace=True)
print(df_all)

How to iterate through this nested dictionary within a list using for loop

I have a list of nested dictionaries that I want to get specific values and put into a dictionary like this:
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'}},
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
I saw previous questions similar to this, but I still can't get the values such as 'axe' and 'red'. I would like the new dict to have a 'Description', 'Confidence' and other columns with the values from the nested dict.
I have tried this for loop:
new_dict = {}
for x in range(len(vid)):
for y in vid[x]['a']:
desc = y['desc']
new_dict['Description'] = desc
I got many errors but mostly this error:
TypeError: string indices must be integers
Can someone please help solve how to get the values from the nested dictionary?
You don't need to iterate through the keys in the dictionary (the inner for-loop), just access the value you want.
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'} },
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
new_dict = {}
list_of_dicts = []
for x in range(len(vid)):
desc = vid[x]['a']['desc']
list_of_dicts.append({'desc': desc})
I have found a temporary solution for this. I decided to use the pandas dataframe instead.
df = pd.DataFrame(columns = ['Desc'])
for x in range(len(vid)):
desc = vid[x]['a']['desc']
df.loc[len(df)] = [desc]
so you want to write this to csv later so pandas will help you a lot for this problem using pandas you can get the desc by
import pandas as pd
new_dict = {}
df = pd.DataFrame(vid)
for index, row in df.iterrows() :
new_dict['description'] = row['a']['desc']
a b
0 {'display': 'axe', 'desc': 'red'} {'confidence': 'good'}
1 {'display': 'book', 'desc': 'blue'} {'confidence': 'poor'}
2 {'display': 'apple', 'desc': 'green'} {'confidence': 'good'}
this is how dataframe looks like a b are column of the dataframe and your nested dicts are rows of dataframe
Try using this list comprehension:
d = [{'Description': i['a']['desc'], 'Confidence': i['b']['confidence']} for i in vid]
print(d)

Categories