Converting python dataframe to a particular JSON structure


Hi, I want to convert my dataframe to a specific JSON structure. My dataframe looks something like this:
import pandas as pd
df = pd.DataFrame(
    [["file1", "1.2.3.4.5.6.7.8.9", 91, "RMLO"],
     ["file2", "1.2.3.4.5.6.7.8.9", 92, "LMLO"],
     ["file3", "1.2.3.4.5.6.7.8.9", 93, "LCC"],
     ["file4", "1.2.3.4.5.6.7.8.9", 94, "RCC"]],
    columns=["Filename", "StudyID", "probablity", "finding_name"])
And the JSON structure I want to convert my dataframe into is below:
{
"findings": [
{
"name": "RMLO",
"probability": "91"
},
{
"name": "LMLO",
"probability": "92"
},
{
"name": "LCC",
"probability": "93"
},
{
"name": "RCC",
"probability": "94"
}
],
"status": "Processed",
"study_id": "1.2.3.4.5.6.7.8.9.0"
}
I tried implementing this with the code below, using different orient values, but I didn't get what I wanted.
j = df[["probablity","finding_name"]].to_json(orient='records')
If anyone can help in achieving this, I would appreciate it.
Thanks.

Is this similar to what you are trying to achieve?
import json
j = df[["finding_name","probablity"]].to_json(orient='records')
study_id = df["StudyID"][0]
j_dict = {"findings": json.loads(j), "status": "Processed", "study_id": study_id}
j_dict
This results in:
{'findings': [{'finding_name': 'RMLO', 'probablity': 91},
{'finding_name': 'LMLO', 'probablity': 92},
{'finding_name': 'LCC', 'probablity': 93},
{'finding_name': 'RCC', 'probablity': 94}],
'status': 'Processed',
'study_id': '1.2.3.4.5.6.7.8.9'}
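The result above keeps the original column names and numeric probabilities, while the structure in the question uses the keys "name" and "probability" with string values. A minimal sketch of one way to close that gap, assuming the same df as in the question (the rename mapping and the cast to str are my additions, not part of the answer above):

```python
import json
import pandas as pd

df = pd.DataFrame(
    [["file1", "1.2.3.4.5.6.7.8.9", 91, "RMLO"],
     ["file2", "1.2.3.4.5.6.7.8.9", 92, "LMLO"],
     ["file3", "1.2.3.4.5.6.7.8.9", 93, "LCC"],
     ["file4", "1.2.3.4.5.6.7.8.9", 94, "RCC"]],
    columns=["Filename", "StudyID", "probablity", "finding_name"],
)

# Select the two columns, rename them to the target keys,
# and cast the probability to a string as in the desired JSON.
findings = (
    df[["finding_name", "probablity"]]
    .rename(columns={"finding_name": "name", "probablity": "probability"})
    .astype({"probability": str})
    .to_dict(orient="records")
)

# Assumption: every row belongs to the same study, so the first StudyID is used.
result = {
    "findings": findings,
    "status": "Processed",
    "study_id": str(df["StudyID"].iloc[0]),
}
print(json.dumps(result, indent=2))
```

If the dataframe can hold several studies, grouping by StudyID and building one such dictionary per group would probably be the safer route.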

Related

Convert dataframe into JSON file

Dataframe:
Name Location code ID Dept Details Fbk
Kirsh HD12 76 Admin "Age:25; Location : ""SF""; From: ""London""; Marital stays: ""Single"";" Good
John HD12 87 Support "Age:35; Location : ""SF""; From: ""Chicago""; Marital stays: ""Single"";" Good
Desired output:
{
"Kirsh": {
"Location code": "HD12",
"ID": "76",
"Dept": "IT",
"Details": {
"Age": "25",
"Location": "SF",
"From": "London",
"Marital stays": "Single"
},
"Fbk": "good"
},
"John": {
"Location code": "HD12",
"ID": "87",
"Dept": "Support",
"Details": {
"Age": "35",
"Location": "SF",
"From": "chicago",
"Marital stays": "Single"
},
"Fbk": "good"
}
}

import pandas as pd
import json
df = pd.DataFrame({'name':['a','b','c','d'],'age':[10,20,30,40],'address':['e','f','g','h']})
# Keep every column except 'name'
df_without_name = df.loc[:, df.columns != 'name']
# One dictionary per row
records_without_name = df_without_name.to_dict(orient='records')
# Pair each name with its row dictionary
dict_index_by_name = dict(zip(df['name'], records_without_name))
print(json.dumps(dict_index_by_name, indent=2))
Output:
{
"a": {
"age": 10,
"address": "e"
},
"b": {
"age": 20,
"address": "f"
},
"c": {
"age": 30,
"address": "g"
},
"d": {
"age": 40,
"address": "h"
}
}
Answering the comment posted by @Eswar:
If a field has multiple values then you can store it as a tuple in the dataframe. Check this answer - https://stackoverflow.com/a/74584666/1788146 on how to store tuple values in pandas dataframe.
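The answer above does not cover the Details column from the question, which packs several "key: value;" pairs into one string. A rough sketch of how that column might be split into a nested dictionary, assuming the two sample rows from the question (parse_details is a hypothetical helper, and casting ID to str is my choice to match the desired output):

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Name": ["Kirsh", "John"],
    "Location code": ["HD12", "HD12"],
    "ID": [76, 87],
    "Dept": ["Admin", "Support"],
    "Details": [
        'Age:25; Location : "SF"; From: "London"; Marital stays: "Single";',
        'Age:35; Location : "SF"; From: "Chicago"; Marital stays: "Single";',
    ],
    "Fbk": ["Good", "Good"],
})

def parse_details(text):
    """Split 'key: value; key: value;' into a dict, trimming spaces and quotes."""
    parsed = {}
    for part in text.split(";"):
        if ":" not in part:
            continue  # skips the empty piece after the trailing ';'
        key, value = part.split(":", 1)
        parsed[key.strip()] = value.strip().strip('"')
    return parsed

df["ID"] = df["ID"].astype(str)          # the desired output shows IDs as strings
df["Details"] = df["Details"].apply(parse_details)

# Index by Name so each person becomes a top-level key
nested = df.set_index("Name").to_dict(orient="index")
print(json.dumps(nested, indent=2))
```

This assumes names are unique and that no value contains a semicolon; either case would need a more careful parser.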

Python get multiple specific keys and values from list of dictionaries

I have the following data:
data={
"locations": [
{
"id": "27871f2d-101c-449e-87ad-36a663b144fe",
"switch_id": 20,
"switch_port": 16,
"vlan_id": 101,
},
{
"id": "94b1d7a2-7ff2-4ba3-8259-5eb7ddd09fe1",
"switch_id": 6,
"switch_port": 24,
"vlan_id": 203,
},
]
}
And what I want to do is extract 'id' and 'vlan_id' into a new dictionary with a list of sub dictionaries, like this:
new_data={
"connections": [
{
"id": "27871f2d-101c-449e-87ad-36a663b144fe",
"vlan_id": 101,
},
{
"id": "94b1d7a2-7ff2-4ba3-8259-5eb7ddd09fe1",
"vlan_id": 203,
},
]
}
My initial thought was a dictionary comprehension like this:
new_data = {"connections": [some code here]}
But I'm not sure yet what the "some code here" part should be.

Try:
new_data = {"connections": [{'id': d['id'], 'vlan_id': d['vlan_id']} for d in data['locations']]}
This gives:
{'connections': [{'id': '27871f2d-101c-449e-87ad-36a663b144fe', 'vlan_id': 101}, {'id': '94b1d7a2-7ff2-4ba3-8259-5eb7ddd09fe1', 'vlan_id': 203}]}

You can create the new_data variable by accessing the data dictionary directly, like this:
new_data={
"connections": [
{
"id": data['locations'][0]['id'],
"vlan_id": data['locations'][0]['vlan_id'],
},
{
"id": data['locations'][1]['id'],
"vlan_id": data['locations'][1]['vlan_id'],
},
]
}
Edit:
A more dynamic approach is to read every item in the list with a for loop, like this:
new_data = {
    "connections": []
}
for location in data['locations']:
    new_dict = {
        "id": location["id"],
        "vlan_id": location["vlan_id"]
    }
    new_data['connections'].append(new_dict)

Following Marc's answer here, you could modify it to
new_data = {}
for i in range(len(data['locations'])):
    if "connections" not in new_data.keys():
        new_data['connections'] = [{"id": data['locations'][i]['id'],"vlan_id": data['locations'][i]['vlan_id']}]
    else:
        new_data['connections'].append({"id": data['locations'][i]['id'],"vlan_id": data['locations'][i]['vlan_id']})

The answers here are good, but you can make the code more dynamic:
keys_to_extract = ['id', 'vlan_id']
locations = data['locations']
# locations is a list of dictionaries, so build one filtered dictionary per item
connections = [{key: loc[key] for key in keys_to_extract} for loc in locations]
new_data = {'connections': connections}
Now you can change the keys you need on the fly.

Converting Dictionary in list in list to dataframe in python

I am really a newbie, so thanks very much in advance.
The list of dictionaries parsed from JSON looks like this:
data1 = [[{"Code": "A", "date": "XXX"}], [{"Code": "B", "date": "YYY"}]]
How can I convert this into a dataframe?
The output I want is: [image not available]
I tried the following code, but it's not working:
fda_df = pd.read_json(json.dumps(data1))
The real data is:
[
[
{
"code": "AA.US",
"date": "2022-12-31",
"earningsEstimateAvg": "4.5400",
"earningsEstimateGrowth": "0.0630",
"earningsEstimateHigh": "8.5000",
"earningsEstimateLow": "2.2000",
"earningsEstimateNumberOfAnalysts": "12.0000",
"earningsEstimateYearAgoEps": "4.2700",
"epsRevisionsDownLast30days": "0.0000",
"epsRevisionsUpLast30days": "6.0000",
"epsRevisionsUpLast7days": "1.0000",
"epsTrend30daysAgo": "3.8700",
"epsTrend60daysAgo": "3.8200",
"epsTrend7daysAgo": "4.5200",
"epsTrend90daysAgo": "2.5900",
"epsTrendCurrent": "4.5400",
"growth": "0.0630",
"period": "+1y",
"revenueEstimateAvg": "11018700000.00",
"revenueEstimateGrowth": "0.0180",
"revenueEstimateHigh": "12927000000.00",
"revenueEstimateLow": "10029900000.00",
"revenueEstimateNumberOfAnalysts": "9.00",
"revenueEstimateYearAgoEps": null
} ],
[
{
"code": "AAIC.US",
"date": "2022-12-31",
"earningsEstimateAvg": "0.2600",
"earningsEstimateGrowth": "0.4440",
"earningsEstimateHigh": "0.3900",
"earningsEstimateLow": "0.1700",
"earningsEstimateNumberOfAnalysts": "3.0000",
"earningsEstimateYearAgoEps": "0.1800",
"epsRevisionsDownLast30days": "0.0000",
"epsRevisionsUpLast30days": "1.0000",
"epsRevisionsUpLast7days": "0.0000",
"epsTrend30daysAgo": "0.2600",
"epsTrend60daysAgo": "0.2100",
"epsTrend7daysAgo": "0.2600",
"epsTrend90daysAgo": "0.2300",
"epsTrendCurrent": "0.2600",
"growth": "0.4440",
"period": "+1y",
"revenueEstimateAvg": "17280000.00",
"revenueEstimateGrowth": "0.1680",
"revenueEstimateHigh": "22110000.00",
"revenueEstimateLow": "12450000.00",
"revenueEstimateNumberOfAnalysts": "2.00",
"revenueEstimateYearAgoEps": null
},
{
"code": "AAIC.US",
"date": "2020-09-30",
"earningsEstimateAvg": "0.0200",
"earningsEstimateGrowth": "-0.8890",
"earningsEstimateHigh": "0.0300",
"earningsEstimateLow": "0.0200",
"earningsEstimateNumberOfAnalysts": "4.0000",
"earningsEstimateYearAgoEps": "0.1800",
"epsRevisionsDownLast30days": "1.0000",
"epsRevisionsUpLast30days": "2.0000",
"epsRevisionsUpLast7days": "1.0000",
"epsTrend30daysAgo": "0.0300",
"epsTrend60daysAgo": "0.0300",
"epsTrend7daysAgo": "0.0300",
"epsTrend90daysAgo": "0.0600",
"epsTrendCurrent": "0.0200",
"growth": "-0.8890",
"period": "0q",
"revenueEstimateAvg": "3890000.00",
"revenueEstimateGrowth": "-0.1710",
"revenueEstimateHigh": "4110000.00",
"revenueEstimateLow": "3780000.00",
"revenueEstimateNumberOfAnalysts": "3.00",
"revenueEstimateYearAgoEps": null
}
] ]

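I think pd.DataFrame.from_records might be what you are looking for; have a look at its documentation.
Because data1 is a list of lists, you will likely need to flatten it into a single list of dictionaries before passing it in.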
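A minimal sketch of that idea, assuming a shortened version of the data from the question (only three of the fields are kept here to save space). The inner lists are flattened with itertools.chain first, because passing the list of lists directly would put whole dictionaries into the cells:

```python
from itertools import chain
import pandas as pd

# Shortened sample: the real records carry many more fields
data1 = [
    [{"code": "AA.US", "date": "2022-12-31", "growth": "0.0630"}],
    [
        {"code": "AAIC.US", "date": "2022-12-31", "growth": "0.4440"},
        {"code": "AAIC.US", "date": "2020-09-30", "growth": "-0.8890"},
    ],
]

# Flatten [[{...}], [{...}, {...}]] into [{...}, {...}, {...}]
records = list(chain.from_iterable(data1))

fda_df = pd.DataFrame.from_records(records)

# The API returns numbers as strings; convert the ones you need
fda_df["growth"] = pd.to_numeric(fda_df["growth"])
print(fda_df)
```

Each dictionary becomes one row, so inner lists holding more than one record (as for AAIC.US) contribute several rows.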

I have done this for sample data. This is what you need:
import pandas as pd
data = [[{'Code': 'A', 'date': 'XXX', 'name': 'anil', 'age': 15}], [{'Code': 'B', 'date': 'YYY', 'name': 'kapoor', 'age': 18}]]
# Column names come from the keys of the first record
col_name = list(data[0][0].keys())
row_data = []
for inner in data:
    # An inner list may hold more than one record
    for record in inner:
        row_data.append(list(record.values()))
df = pd.DataFrame(row_data, columns=col_name)
print(df)

Dictionary length is equal to 3 but when trying to access an index receiving KeyError

I am attempting to parse a json response that looks like this:
{
"links": {
"next": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-08&end_date=2015-09-09&detailed=false&api_key=xxx",
"prev": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-06&end_date=2015-09-07&detailed=false&api_key=xxx",
"self": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-07&end_date=2015-09-08&detailed=false&api_key=xxx"
},
"element_count": 22,
"near_earth_objects": {
"2015-09-08": [
{
"links": {
"self": "http://www.neowsapp.com/rest/v1/neo/3726710?api_key=xxx"
},
"id": "3726710",
"neo_reference_id": "3726710",
"name": "(2015 RC)",
"nasa_jpl_url": "http://ssd.jpl.nasa.gov/sbdb.cgi?sstr=3726710",
"absolute_magnitude_h": 24.3,
"estimated_diameter": {
"kilometers": {
"estimated_diameter_min": 0.0366906138,
"estimated_diameter_max": 0.0820427065
},
"meters": {
"estimated_diameter_min": 36.6906137531,
"estimated_diameter_max": 82.0427064882
},
"miles": {
"estimated_diameter_min": 0.0227984834,
"estimated_diameter_max": 0.0509789586
},
"feet": {
"estimated_diameter_min": 120.3760332259,
"estimated_diameter_max": 269.1689931548
}
},
"is_potentially_hazardous_asteroid": false,
"close_approach_data": [
{
"close_approach_date": "2015-09-08",
"close_approach_date_full": "2015-Sep-08 09:45",
"epoch_date_close_approach": 1441705500000,
"relative_velocity": {
"kilometers_per_second": "19.4850295284",
"kilometers_per_hour": "70146.106302123",
"miles_per_hour": "43586.0625520053"
},
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
],
"is_sentry_object": false
},
}
I am trying to figure out how to parse down to the "miss_distance" dictionary values, but I am unable to wrap my head around it.
Here is what I have been able to do so far.
After I get a Response object from requests.get():
response = requests.get(url)
I convert the response object to a JSON object:
data = response.json()  # this returns a dictionary object
I try to parse the first level of the dictionary:
for i in data:
    if i == "near_earth_objects":
        dataset1 = data["near_earth_objects"]["2015-09-08"]
        # this returns the next object, which is of type list
Could someone please explain:
1. How to decipher this response in the first place.
2. How I can move forward in parsing the response object and get to the miss_distance dictionary.
Any pointers or help are appreciated.
Thank you

Your data will have multiple dictionaries, one for each date, near-earth object, and close approach:
near_earth_objects = data['near_earth_objects']
for date in near_earth_objects:
    objects = near_earth_objects[date]
    for obj in objects:
        close_approach_data = obj['close_approach_data']
        for close_approach in close_approach_data:
            print(close_approach['miss_distance'])
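On the KeyError in the title: a dictionary with three entries is keyed by name, not by position, so data[0] fails even though len(data) is 3. A small sketch of the difference, assuming a trimmed-down copy of the response from the question (the variable names first_object, first_approach and miss_km are mine):

```python
# Trimmed-down stand-in for response.json()
data = {
    "links": {},
    "element_count": 22,
    "near_earth_objects": {
        "2015-09-08": [
            {
                "name": "(2015 RC)",
                "close_approach_data": [
                    {"miss_distance": {"kilometers": "4027630.320552233"}}
                ],
            }
        ]
    },
}

print(len(data))  # three top-level keys

try:
    data[0]  # dictionaries are not indexed by position
except KeyError as exc:
    print("KeyError:", exc)

# Dictionaries are accessed by key, lists by index
first_object = data["near_earth_objects"]["2015-09-08"][0]
first_approach = first_object["close_approach_data"][0]
miss_km = float(first_approach["miss_distance"]["kilometers"])
print(miss_km)
```

Reading the response level by level this way (dict key, then list index, then dict key again) is usually the quickest way to decipher a nested payload.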

The code below gives you a list of date and miss_distances entries, one for every close approach of every object on every date:
import json
raw_json = '''
{
"near_earth_objects": {
"2015-09-08": [
{
"close_approach_data": [
{
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
]
}
]
}
}
'''
if __name__ == "__main__":
    parsed = json.loads(raw_json)
    # assuming this json includes more than one near_earth_object spread across dates
    near_objects = []
    for date, near_objs in parsed['near_earth_objects'].items():
        for obj in near_objs:
            for appr in obj['close_approach_data']:
                o = {
                    'date': date,
                    'miss_distances': appr['miss_distance']
                }
                near_objects.append(o)
    print(near_objects)
output:
[
{'date': '2015-09-08',
'miss_distances': {
'astronomical': '0.0269230459',
'lunar': '10.4730648551',
'kilometers': '4027630.320552233',
'miles': '2502653.4316094954'
}
}
]
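If an actual table is wanted rather than a list of dictionaries, the same loop can feed a pandas DataFrame. This is only a sketch under the assumption that pandas is available and that every miss_distance dictionary has the four keys shown in the question:

```python
import pandas as pd

# Same shape as the parsed JSON above, reduced to the relevant keys
parsed = {
    "near_earth_objects": {
        "2015-09-08": [
            {
                "close_approach_data": [
                    {
                        "miss_distance": {
                            "astronomical": "0.0269230459",
                            "lunar": "10.4730648551",
                            "kilometers": "4027630.320552233",
                            "miles": "2502653.4316094954",
                        },
                        "orbiting_body": "Earth",
                    }
                ]
            }
        ]
    }
}

# One flat row per close approach: the date plus the four distances
rows = [
    {"date": date, **appr["miss_distance"]}
    for date, objs in parsed["near_earth_objects"].items()
    for obj in objs
    for appr in obj["close_approach_data"]
]

table = pd.DataFrame(rows)

# The distances arrive as strings; convert them to floats
distance_cols = ["astronomical", "lunar", "kilometers", "miles"]
table[distance_cols] = table[distance_cols].astype(float)
print(table)
```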

Flatten the Json file data using pandas normalizer

I would like to flatten a complex nested JSON file. Please find the sample JSON data below:
{
"applications": [
{
"id": 87334412,
"name": "cdata1",
"language": "known",
"health_status": "unknown",
"reporting": true,
"last_reported_at": "2017-10-06T06:30:55+00:00",
"application_summary": {
"response_time": 1.2,
"throughput": 216,
"error_rate": 0,
"target": 0.5,
"ascore": 1,
"host_count": 3,
"instance_count": 3
},
"settings": {
"column": 0.5,
"columns": 7,
"columns1": true,
"columns2": false
},
"links": {
"application_data": [
93818199,
93819351,
93819359
],
"servers": [],
"application_content": [
32006189,
87342924,
47565225
]
}
},
The code I am using:
import json
from pandas.io.json import json_normalize
json_file = open('ptr1.json')
json_data = json.load(json_file)
# print(json_data["applications"])
for line in json_data:
    data = json_normalize(line, ['name', 'id'])
    print(data)
Can anyone help me get the following data: name, id, last_reported_at, instance_count? Note that the JSON file contains details for many ids.

IIUC:
In [34]: d = json.loads(json_str)
In [35]: cols = ['id','name','last_reported_at','application_summary.instance_count']
In [36]: pd.io.json.json_normalize(d['applications'])[cols]
Out[36]:
         id    name           last_reported_at  application_summary.instance_count
0  87334412  cdata1  2017-10-06T06:30:55+00:00                                   3
1  87334444  cdata2  2017-10-05T06:30:55+00:00                                   3
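In recent pandas versions (1.0 and later, as far as I know) the same function is exposed as pd.json_normalize, and the pandas.io.json import path is deprecated. A self-contained sketch of the approach above, assuming two records shaped like the sample in the question (the second record is taken from the output shown above):

```python
import pandas as pd

# Two application records, reduced to a few of the fields from the question
d = {
    "applications": [
        {
            "id": 87334412,
            "name": "cdata1",
            "last_reported_at": "2017-10-06T06:30:55+00:00",
            "application_summary": {"response_time": 1.2, "instance_count": 3},
            "links": {"servers": []},
        },
        {
            "id": 87334444,
            "name": "cdata2",
            "last_reported_at": "2017-10-05T06:30:55+00:00",
            "application_summary": {"response_time": 0.9, "instance_count": 3},
            "links": {"servers": []},
        },
    ]
}

# Nested dictionaries are flattened into dotted column names
cols = ["id", "name", "last_reported_at", "application_summary.instance_count"]
flat = pd.json_normalize(d["applications"])[cols]
print(flat)
```

The key point is to pass the list under "applications" rather than iterating over the top-level dictionary, which only yields its keys.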
