Converting python dataframe to a particular JSON structure - python

Hi, I want to convert my dataframe to a specific JSON structure. My dataframe looks something like this:
df = pd.DataFrame([["file1", "1.2.3.4.5.6.7.8.9", 91, "RMLO"], ["file2", "1.2.3.4.5.6.7.8.9", 92, "LMLO"], ["file3", "1.2.3.4.5.6.7.8.9", 93, "LCC"], ["file4", "1.2.3.4.5.6.7.8.9", 94, "RCC"]], columns=["Filename", "StudyID", "probablity", "finding_name"])
And the JSON structure I want to convert my dataframe to is below:
{
"findings": [
{
"name": "RMLO",
"probability": "91"
},
{
"name": "LMLO",
"probability": "92"
},
{
"name": "LCC",
"probability": "93"
},
{
"name": "RCC",
"probability": "94"
}
],
"status": "Processed",
"study_id": "1.2.3.4.5.6.7.8.9.0"
}
I tried implementing this with the code below, using different orient values, but I didn't get what I wanted.
j = df[["probablity","findings"]].to_json(orient='records')
Can anyone help me achieve this?
Thanks.

Is this similar to what you are trying to achieve:
import json
j = df[["finding_name","probablity"]].to_json(orient='records')
study_id = df["StudyID"][0]
j_dict = {"findings": json.loads(j), "status": "Processed", "study_id": study_id}
j_dict
This results in:
{'findings': [{'finding_name': 'RMLO', 'probablity': 91},
{'finding_name': 'LMLO', 'probablity': 92},
{'finding_name': 'LCC', 'probablity': 93},
{'finding_name': 'RCC', 'probablity': 94}],
'status': 'Processed',
'study_id': '1.2.3.4.5.6.7.8.9'}
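If the key names and string-valued probabilities from the question matter, one possible variation is to rename the columns and cast the values before building the dictionary. This is only a sketch: it reuses the question's dataframe (including its `probablity` spelling) and assumes every row shares the same StudyID.

```python
import json

import pandas as pd

df = pd.DataFrame(
    [
        ["file1", "1.2.3.4.5.6.7.8.9", 91, "RMLO"],
        ["file2", "1.2.3.4.5.6.7.8.9", 92, "LMLO"],
        ["file3", "1.2.3.4.5.6.7.8.9", 93, "LCC"],
        ["file4", "1.2.3.4.5.6.7.8.9", 94, "RCC"],
    ],
    columns=["Filename", "StudyID", "probablity", "finding_name"],
)

# Rename to the key names used in the desired JSON and store probabilities as strings.
findings = (
    df[["finding_name", "probablity"]]
    .rename(columns={"finding_name": "name", "probablity": "probability"})
    .astype({"probability": str})
    .to_dict(orient="records")
)

result = {
    "findings": findings,
    "status": "Processed",
    # Assumption: all rows belong to one study, so the first StudyID is representative.
    "study_id": str(df["StudyID"].iloc[0]),
}

print(json.dumps(result, indent=2))
```

If the dataframe can hold several studies, grouping by StudyID first and building one such dictionary per group would be the natural extension.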

Related

Convert dataframe into JSON file

Dataframe:
Name Location code ID Dept Details Fbk
Kirsh HD12 76 Admin "Age:25; Location : ""SF""; From: ""London""; Marital stays: ""Single"";" Good
John HD12 87 Support "Age:35; Location : ""SF""; From: ""Chicago""; Marital stays: ""Single"";" Good
Desired output:
{
"Kirsh": {
"Location code": "HD12",
"ID": "76",
"Dept": "IT",
"Details": {
"Age": "25",
"Location": "SF",
"From": "London",
"Marital stays": "Single"
},
"Fbk": "good"
},
"John": {
"Location code": "HD12",
"ID": "87",
"Dept": "Support",
"Details": {
"Age": "35",
"Location": "SF",
"From": "chicago",
"Marital stays": "Single"
},
"Fbk": "good"
}
}
import pandas as pd
import json
df = pd.DataFrame({'name':['a','b','c','d'],'age':[10,20,30,40],'address':['e','f','g','h']})
# Drop the 'name' column; its values become the top-level keys instead
df_without_name = df.loc[:, df.columns != 'name']
# One dict per row, in the same order as df['name']
dict_without_name = df_without_name.to_dict(orient='records')
dict_index_by_name = dict(zip(df['name'], dict_without_name))
print(json.dumps(dict_index_by_name, indent=2))
Output:
{
"a": {
"age": 10,
"address": "e"
},
"b": {
"age": 20,
"address": "f"
},
"c": {
"age": 30,
"address": "g"
},
"d": {
"age": 40,
"address": "h"
}
}
Answering the comment posted by @Eswar:
If a field has multiple values, you can store it as a tuple in the dataframe. Check this answer - https://stackoverflow.com/a/74584666/1788146 on how to store tuple values in a pandas dataframe.
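The answer above does not cover the nested "Details" object from the question. One way it might be handled, assuming each Details cell is a string of `key: value;` pairs as in the sample rows (with values optionally wrapped in double quotes), is to split that string before building the dictionary. The helper name `parse_details` is made up for this sketch.

```python
import json


def parse_details(details: str) -> dict:
    """Turn 'Age:25; Location : "SF";' into {'Age': '25', 'Location': 'SF'}."""
    parsed = {}
    for pair in details.split(";"):
        if ":" not in pair:
            continue  # skip the empty piece after the trailing ';'
        key, value = pair.split(":", 1)
        parsed[key.strip()] = value.strip().strip('"')
    return parsed


# Sample rows from the question as plain dicts (assumption: CSV quotes already unescaped).
rows = [
    {"Name": "Kirsh", "Location code": "HD12", "ID": 76, "Dept": "Admin",
     "Details": 'Age:25; Location : "SF"; From: "London"; Marital stays: "Single";',
     "Fbk": "Good"},
    {"Name": "John", "Location code": "HD12", "ID": 87, "Dept": "Support",
     "Details": 'Age:35; Location : "SF"; From: "Chicago"; Marital stays: "Single";',
     "Fbk": "Good"},
]

result = {}
for row in rows:
    # Everything except the name goes into the nested record.
    record = {key: value for key, value in row.items() if key != "Name"}
    record["Details"] = parse_details(record["Details"])
    result[row["Name"]] = record

print(json.dumps(result, indent=2))
```

With a real dataframe, `rows` could come from `df.to_dict(orient='records')`, as in the answer above.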

Python get multiple specific keys and values from list of dictionaries

I have the following data:
data={
"locations": [
{
"id": "27871f2d-101c-449e-87ad-36a663b144fe",
"switch_id": 20,
"switch_port": 16,
"vlan_id": 101,
},
{
"id": "94b1d7a2-7ff2-4ba3-8259-5eb7ddd09fe1",
"switch_id": 6,
"switch_port": 24,
"vlan_id": 203,
},
]
}
And what I want to do is extract 'id' and 'vlan_id' into a new dictionary with a list of sub dictionaries, like this:
new_data={
"connections": [
{
"id": "27871f2d-101c-449e-87ad-36a663b144fe",
"vlan_id": 101,
},
{
"id": "94b1d7a2-7ff2-4ba3-8259-5eb7ddd09fe1",
"vlan_id": 203,
},
]
}
My initial thought was to use a comprehension, like this:
new_data = {"connections": [some code here]}
But I'm not sure about the "some code here" bit yet.
Try:
new_data = {"connections": [{'id': d['id'], 'vlan_id': d['vlan_id']} for d in data['locations']]}
{'connections': [{'id': '27871f2d-101c-449e-87ad-36a663b144fe', 'vlan_id': 101}, {'id': '94b1d7a2-7ff2-4ba3-8259-5eb7ddd09fe1', 'vlan_id': 203}]}
You can create the new_data variable by accessing the entries of the first dictionary, data, like this:
new_data={
"connections": [
{
"id": data['locations'][0]['id'],
"vlan_id": data['locations'][0]['vlan_id'],
},
{
"id": data['locations'][1]['id'],
"vlan_id": data['locations'][1]['vlan_id'],
},
]
}
Edit:
You can take a more dynamic approach by reading every object in the list with a for loop, like this:
new_data = {
    "connections": []
}
for location in data['locations']:
    new_dict = {
        "id": location["id"],
        "vlan_id": location["vlan_id"]
    }
    new_data['connections'].append(new_dict)
Following Marc's answer here, you could modify it to:
new_data = {}
for i in range(len(data['locations'])):
    if "connections" not in new_data.keys():
        new_data['connections'] = [{"id": data['locations'][i]['id'],"vlan_id": data['locations'][i]['vlan_id']}]
    else:
        new_data['connections'].append({"id": data['locations'][i]['id'],"vlan_id": data['locations'][i]['vlan_id']})
The answers here are good, but you can make the code more dynamic:
keys_to_extract = ['id', 'vlan_id']
locations = data['locations']
# locations is a list of dicts, so build one filtered dict per entry
connections = [{key: val for key, val in loc.items() if key in keys_to_extract} for loc in locations]
new_data = {'connections': connections}
Now you can change the keys you need on the fly.

Converting Dictionary in list in list to dataframe in python

I am really a newbie. Thanks much.
Dictionary in list from JSON looks like this:
data1= [ [{Code:A, date:XXX}], [{Code:B, date:YYY}]]
How can I convert this into a dataframe?
Output I want is:
[image not available]
I tried the following code but it's not working.
fda_df=pd.read_json(json.dumps(data1))
The real data is
[
[
{
"code": "AA.US",
"date": "2022-12-31",
"earningsEstimateAvg": "4.5400",
"earningsEstimateGrowth": "0.0630",
"earningsEstimateHigh": "8.5000",
"earningsEstimateLow": "2.2000",
"earningsEstimateNumberOfAnalysts": "12.0000",
"earningsEstimateYearAgoEps": "4.2700",
"epsRevisionsDownLast30days": "0.0000",
"epsRevisionsUpLast30days": "6.0000",
"epsRevisionsUpLast7days": "1.0000",
"epsTrend30daysAgo": "3.8700",
"epsTrend60daysAgo": "3.8200",
"epsTrend7daysAgo": "4.5200",
"epsTrend90daysAgo": "2.5900",
"epsTrendCurrent": "4.5400",
"growth": "0.0630",
"period": "+1y",
"revenueEstimateAvg": "11018700000.00",
"revenueEstimateGrowth": "0.0180",
"revenueEstimateHigh": "12927000000.00",
"revenueEstimateLow": "10029900000.00",
"revenueEstimateNumberOfAnalysts": "9.00",
"revenueEstimateYearAgoEps": null
} ],
[
{
"code": "AAIC.US",
"date": "2022-12-31",
"earningsEstimateAvg": "0.2600",
"earningsEstimateGrowth": "0.4440",
"earningsEstimateHigh": "0.3900",
"earningsEstimateLow": "0.1700",
"earningsEstimateNumberOfAnalysts": "3.0000",
"earningsEstimateYearAgoEps": "0.1800",
"epsRevisionsDownLast30days": "0.0000",
"epsRevisionsUpLast30days": "1.0000",
"epsRevisionsUpLast7days": "0.0000",
"epsTrend30daysAgo": "0.2600",
"epsTrend60daysAgo": "0.2100",
"epsTrend7daysAgo": "0.2600",
"epsTrend90daysAgo": "0.2300",
"epsTrendCurrent": "0.2600",
"growth": "0.4440",
"period": "+1y",
"revenueEstimateAvg": "17280000.00",
"revenueEstimateGrowth": "0.1680",
"revenueEstimateHigh": "22110000.00",
"revenueEstimateLow": "12450000.00",
"revenueEstimateNumberOfAnalysts": "2.00",
"revenueEstimateYearAgoEps": null
},
{
"code": "AAIC.US",
"date": "2020-09-30",
"earningsEstimateAvg": "0.0200",
"earningsEstimateGrowth": "-0.8890",
"earningsEstimateHigh": "0.0300",
"earningsEstimateLow": "0.0200",
"earningsEstimateNumberOfAnalysts": "4.0000",
"earningsEstimateYearAgoEps": "0.1800",
"epsRevisionsDownLast30days": "1.0000",
"epsRevisionsUpLast30days": "2.0000",
"epsRevisionsUpLast7days": "1.0000",
"epsTrend30daysAgo": "0.0300",
"epsTrend60daysAgo": "0.0300",
"epsTrend7daysAgo": "0.0300",
"epsTrend90daysAgo": "0.0600",
"epsTrendCurrent": "0.0200",
"growth": "-0.8890",
"period": "0q",
"revenueEstimateAvg": "3890000.00",
"revenueEstimateGrowth": "-0.1710",
"revenueEstimateHigh": "4110000.00",
"revenueEstimateLow": "3780000.00",
"revenueEstimateNumberOfAnalysts": "3.00",
"revenueEstimateYearAgoEps": null
}
] ]
I think pd.DataFrame.from_records(data1) might be what you are looking for;
have a look at the documentation.
I have done this for some sample data. This is what you need:
import pandas as pd
data= [[{'Code': 'A', 'date':'XXX', 'name' : 'anil', 'age': 15}], [{'Code':'B', 'date':'YYY', 'name': 'kapoor', 'age': 18}]]
col_name = list(data[0][0].keys())
row_data = []
for i in range(len(data)):
    row_data.append(list(data[i][0].values()))
df = pd.DataFrame(row_data, columns =col_name)
print(df)
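Note that in the real data the inner lists can hold more than one dictionary (the second inner list has two AAIC.US entries), so indexing only `[0]` would drop rows. A possible alternative, sketched here with shortened records, is to flatten the outer list first and let pandas build the columns from the dictionary keys. The trimmed field set is only for illustration.

```python
from itertools import chain

import pandas as pd

# Shortened version of the real data: a list of lists of dicts.
data1 = [
    [{"code": "AA.US", "date": "2022-12-31", "period": "+1y", "growth": "0.0630"}],
    [
        {"code": "AAIC.US", "date": "2022-12-31", "period": "+1y", "growth": "0.4440"},
        {"code": "AAIC.US", "date": "2020-09-30", "period": "0q", "growth": "-0.8890"},
    ],
]

# Flatten one level so every dict becomes one row.
records = list(chain.from_iterable(data1))
df = pd.DataFrame(records)

# The values arrive as strings; convert numeric columns explicitly if needed.
df["growth"] = pd.to_numeric(df["growth"])

print(df)
```

Because the column names come from the dictionary keys, there is no need to build the header list by hand.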

Dictionary length is equal to 3 but when trying to access an index receiving KeyError

I am attempting to parse a json response that looks like this:
{
"links": {
"next": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-08&end_date=2015-09-09&detailed=false&api_key=xxx",
"prev": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-06&end_date=2015-09-07&detailed=false&api_key=xxx",
"self": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-07&end_date=2015-09-08&detailed=false&api_key=xxx"
},
"element_count": 22,
"near_earth_objects": {
"2015-09-08": [
{
"links": {
"self": "http://www.neowsapp.com/rest/v1/neo/3726710?api_key=xxx"
},
"id": "3726710",
"neo_reference_id": "3726710",
"name": "(2015 RC)",
"nasa_jpl_url": "http://ssd.jpl.nasa.gov/sbdb.cgi?sstr=3726710",
"absolute_magnitude_h": 24.3,
"estimated_diameter": {
"kilometers": {
"estimated_diameter_min": 0.0366906138,
"estimated_diameter_max": 0.0820427065
},
"meters": {
"estimated_diameter_min": 36.6906137531,
"estimated_diameter_max": 82.0427064882
},
"miles": {
"estimated_diameter_min": 0.0227984834,
"estimated_diameter_max": 0.0509789586
},
"feet": {
"estimated_diameter_min": 120.3760332259,
"estimated_diameter_max": 269.1689931548
}
},
"is_potentially_hazardous_asteroid": false,
"close_approach_data": [
{
"close_approach_date": "2015-09-08",
"close_approach_date_full": "2015-Sep-08 09:45",
"epoch_date_close_approach": 1441705500000,
"relative_velocity": {
"kilometers_per_second": "19.4850295284",
"kilometers_per_hour": "70146.106302123",
"miles_per_hour": "43586.0625520053"
},
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
],
"is_sentry_object": false
},
}
I am trying to figure out how to parse through to get the "miss_distance" dictionary values. I am unable to wrap my head around it.
Here is what I have been able to do so far:
After I get a Response object from requests.get():
response = requests.get(url)
I convert the response object to a JSON object:
data = response.json()  # this returns a dictionary object
I try to parse the first level of the dictionary:
for i in data:
    if i == "near_earth_objects":
        dataset1 = data["near_earth_objects"]["2015-09-08"]
        # this returns the next object, which is of type list
Could someone please explain:
1. How to decipher this response in the first place.
2. How I can move forward in parsing the response object and get to the miss_distance dictionary.
Any pointers/help is appreciated.
Thank you
Your data will have multiple dictionaries, one for each date, near-earth object, and close approach:
near_earth_objects = data['near_earth_objects']
for date in near_earth_objects:
    objects = near_earth_objects[date]
    for obj in objects:
        close_approach_data = obj['close_approach_data']
        for close_approach in close_approach_data:
            print(close_approach['miss_distance'])
The code below gives you a table of date, miss_distances for every object for every date
import json
raw_json = '''
{
"near_earth_objects": {
"2015-09-08": [
{
"close_approach_data": [
{
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
]
}
]
}
}
'''
if __name__ == "__main__":
    parsed = json.loads(raw_json)
    # assuming this json includes more than one near_earth_object spread across dates
    near_objects = []
    for date, near_objs in parsed['near_earth_objects'].items():
        for obj in near_objs:
            for appr in obj['close_approach_data']:
                o = {
                    'date': date,
                    'miss_distances': appr['miss_distance']
                }
                near_objects.append(o)
    print(near_objects)
output:
[
{'date': '2015-09-08',
'miss_distances': {
'astronomical': '0.0269230459',
'lunar': '10.4730648551',
'kilometers': '4027630.320552233',
'miles': '2502653.4316094954'
}
}
]
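Since the API returns the distances as strings, a follow-up step might convert them to numbers and keep the object name alongside. The sketch below uses a trimmed, hand-written payload shaped like the response in the question; the function name `iter_miss_distances_km` is hypothetical.

```python
def iter_miss_distances_km(payload: dict):
    """Yield (date, object name, miss distance in km as float) for every close approach."""
    for date, objects in payload["near_earth_objects"].items():
        for obj in objects:
            for approach in obj["close_approach_data"]:
                yield date, obj["name"], float(approach["miss_distance"]["kilometers"])


# Trimmed payload modelled on the response shown in the question.
sample = {
    "element_count": 1,
    "near_earth_objects": {
        "2015-09-08": [
            {
                "name": "(2015 RC)",
                "close_approach_data": [
                    {"miss_distance": {"kilometers": "4027630.320552233"}}
                ],
            }
        ]
    },
}

rows = list(iter_miss_distances_km(sample))
for date, name, km in rows:
    print(f"{date} {name}: {km:.1f} km")
```

With a live response, `sample` would be replaced by the dictionary returned from `response.json()`.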

Flatten the Json file data using pandas normalizer

I would like to flatten a complex nested JSON file. Please find the sample JSON data below:
{
"applications": [
{
"id": 87334412,
"name": "cdata1",
"language": "known",
"health_status": "unknown",
"reporting": true,
"last_reported_at": "2017-10-06T06:30:55+00:00",
"application_summary": {
"response_time": 1.2,
"throughput": 216,
"error_rate": 0,
"target": 0.5,
"ascore": 1,
"host_count": 3,
"instance_count": 3
},
"settings": {
"column": 0.5,
"columns": 7,
"columns1": true,
"columns2": false
},
"links": {
"application_data": [
93818199,
93819351,
93819359
],
"servers": [],
"application_content": [
32006189,
87342924,
47565225
]
}
},
Code I am using:
import json
from pandas.io.json import json_normalize
json_file=open('ptr1.json')
json_data=json.load(json_file)
#print json_data["applications"]
for line in json_data:
    data=json_normalize(line,['name','id'])
    print(data)
Can anyone help me get the following fields: name, id, last_reported_at, instance_count? Note that the JSON file contains many such ids.
IIUC:
In [34]: d = json.loads(json_str)
In [35]: cols = ['id','name','last_reported_at','application_summary.instance_count']
In [36]: pd.io.json.json_normalize(d['applications'])[cols]
Out[36]:
id name last_reported_at application_summary.instance_count
0 87334412 cdata1 2017-10-06T06:30:55+00:00 3
1 87334444 cdata2 2017-10-05T06:30:55+00:00 3
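In pandas 1.0 and later the same function is exposed as `pd.json_normalize`, and the `pandas.io.json` import path is deprecated. A self-contained sketch of the answer above using the newer name follows. The inline `json_str` is a shortened stand-in for the file contents: the second application's id, name and timestamp are taken from the output shown above, and its remaining values are made up.

```python
import json

import pandas as pd

json_str = """
{
  "applications": [
    {
      "id": 87334412,
      "name": "cdata1",
      "last_reported_at": "2017-10-06T06:30:55+00:00",
      "application_summary": {"response_time": 1.2, "instance_count": 3},
      "links": {"servers": []}
    },
    {
      "id": 87334444,
      "name": "cdata2",
      "last_reported_at": "2017-10-05T06:30:55+00:00",
      "application_summary": {"response_time": 1.2, "instance_count": 3},
      "links": {"servers": []}
    }
  ]
}
"""

d = json.loads(json_str)

# Nested dicts are flattened into dotted column names such as
# 'application_summary.instance_count'; lists stay as single cells.
flat = pd.json_normalize(d["applications"])

cols = ["id", "name", "last_reported_at", "application_summary.instance_count"]
print(flat[cols])
```

Passing the list under "applications" rather than the whole document is what gives one row per application.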
