Dataframe:
Name Location code ID Dept Details Fbk
Kirsh HD12 76 Admin "Age:25; Location : ""SF""; From: ""London""; Marital stays: ""Single"";" Good
John HD12 87 Support "Age:35; Location : ""SF""; From: ""Chicago""; Marital stays: ""Single"";" Good
Desired output:
{
“Kirsh”: {
“Location code”:”HD12”,
“ID”: “76”,
“Dept”: “IT”,
“Details”: {
“Age”:”25”;,
“Location”:”SF”;,
“From”: "London";,
“Marital stays”: "Single";,
}
“Fbk”: “good”
},
“John”: {
“Location code”:”HD12”,
“ID”: “87”,
“Dept”: “Support”,
“Details”: {
“Age”:”35”;,
“Location”:”SF”;,
“From”: "chicago";,
“Marital stays”: "Single";,
}
“Fbk”: “good”
}
}
import pandas as pd
import json
df = pd.DataFrame({'name':['a','b','c','d'],'age':[10,20,30,40],'address':['e','f','g','h']})
df_without_name = data1.loc[:, df.columns!='name']
dict_wihtout_name = df_without_name.to_dict(orient='records')
dict_index_by_name = dict(zip(df['name'], df_without_name))
print(json.dumps(dict_index_by_name, indent=2))
Output:
{
"a": {
"age": 10,
"address": "e"
},
"b": {
"age": 20,
"address": "f"
},
"c": {
"age": 30,
"address": "g"
},
"d": {
"age": 40,
"address": "h"
}
}
Answering the comment posted by #Eswar:
If a field has multiple values then you can store it as a tuple in the dataframe. Check this answer - https://stackoverflow.com/a/74584666/1788146 on how to store tuple values in pandas dataframe.
Related
If I have a heavy json file that have 30m entries like that
{"id":3,"price":"231","type":"Y","location":"NY"}
{"id":4,"price":"321","type":"N","city":"BR"}
{"id":5,"price":"354","type":"Y","city":"XE","location":"CP"}
--snip--
{"id":30373779,"price":"121","type":"N","city":"SR","location":"IU"}
{"id":30373780,"price":"432","type":"Y","location":"TB"}
{"id":30373780,"price":"562","type":"N","city":"CQ"}
how I can only abstract the location and the city and parse it into one json like that in python:
{
"orders":{
3:{
"location":"NY"
},
4:{
"city":"BR"
},
5:{
"city":"XE",
"location":"CP"
},
30373779:{
"city":"SR",
"location":"IU"
},
30373780:{
"location":"TB"
},
30373780:{
"city":"CQ"
}
}
}
P.S: beatufy the syntax is not necessary.
Assuming your input file is actually in jsonlines format, then you can read each line, extract the city and location keys from the dict and then append those to a new dict:
import json
from collections import defaultdict
orders = { 'orders' : defaultdict(dict) }
with open('orders.txt', 'r') as f:
for line in f:
o = json.loads(line)
id = o['id']
if 'location' in o:
orders['orders'][id]['location'] = o['location']
if 'city' in o:
orders['orders'][id]['city'] = o['city']
print(orders)
Output for your sample data (note it has two 30373780 id values, so the values get merged into one dict):
{
"orders": {
"3": {
"location": "NY"
},
"4": {
"city": "BR"
},
"5": {
"location": "CP",
"city": "XE"
},
"30373779": {
"location": "IU",
"city": "SR"
},
"30373780": {
"location": "TB",
"city": "CQ"
}
}
}
As you've said that your file is pretty big and you probably don't want to keep all entries in memory here is the way to consume source file line by line and write output immediately:
import json
with open(r"in.jsonp") as i_f, open(r"out.json", "w") as o_f:
o_f.write('{"orders":{')
for i in i_f:
i_obj = json.loads(i)
o_f.write(f'{i_obj["id"]}:')
o_obj = {}
if location := i_obj.get("location"):
o_obj["location"] = location
if city := i_obj.get("city"):
o_obj["city"] = city
json.dump(o_obj, o_f)
o_f.write(",")
o_f.write('}}')
It will generate semi-valid JSON object in same format you've provided in your question.
I am fetching api and trying that response into csv but on catch is there this is multilevel dict or json when i am converting into csv most of the look like list of dict or dicts
I am trying using this
def expand(data):
d = pd.Series(data)
t = d.index
for i in t:
if type(d[i]) in (list,dict):
expend_s = pd.Series(d[i])
t.append(expend_s.index)
d = d.append(expend_s)
d = d.drop([i])
return d
df['person'].apply(expand)
but this solution is not working. if we see person col there is multiple dict or list of dict like
"birthDate": "0000-00-00",
"genderCode": {
"codeValue": "M",
"shortName": "Male",
"longName": "Male"
},
"maritalStatusCode": {
"codeValue": "M",
"shortName": "Married"
},
"disabledIndicator": False,
"preferredName": {},
"ethnicityCode": {
"codeValue": "4",
"shortName": "4",
"longName": "Not Hispanic or Latino"
},
"raceCode": {
"identificationMethodCode": {},
"codeValue": "1",
"shortName": "White",
"longName": "White"
},
"militaryClassificationCodes": [],
"governmentIDs": [
{
"itemID": "9200037107708_4385",
"idValue": "XXX-XX-XXXX",
"nameCode": {
"codeValue": "SSN",
"longName": "Social Security Number"
},
"countryCode": "US"
}
],
"legalName": {
"givenName": "Jack",
"middleName": "C",
"familyName1": "Abele",
"formattedName": "Abele, Jack C"
},
"legalAddress": {
"nameCode": {
"codeValue": "Personal Address 1",
"shortName": "Personal Address 1",
"longName": "Personal Address 1"
},
"lineOne": "1932 Keswick Lane",
"cityName": "Concord",
"countrySubdivisionLevel1": {
"subdivisionType": "StateTerritory",
"codeValue": "CA",
"shortName": "California"
},
"countryCode": "US",
"postalCode": "94518"
},
"communication": {
"mobiles": [
{
"itemID": "9200037107708_4389",
"nameCode": {
"codeValue": "Personal Cell",
"shortName": "Personal Cell"
},
"countryDialing": "1",
"areaDialing": "925",
"dialNumber": "6860589",
"access": "1",
"formattedNumber": "(925) 686-0589"
}
]
}
}
your suggestion and advice would be so helpful
I think we can solve multiple dict using read as pd.josn_normalise and list of dict using the below functions first we get those columns which have list
def df_list_and_dict_col(explode_df: pd.DataFrame, primary_key: str,
col_name: str, folder: str) -> pd.DataFrame:
""" convert list of dict or list of into clean dataframe
Keyword arguments:
-----------------
dict: explode_df -- dataframe where we have to expand column
dict: col_name -- main_file name where most of data is present
Return: pd.DataFrame
return clean or expand dataframe
"""
explode_df[col_name] = explode_df[col_name].replace('', '[]', regex=True)
explode_df[col_name] = explode_df[col_name].fillna('[]')
explode_df[col_name] = explode_df[col_name].astype(
'string') # to make sure that entire column is string
explode_df[col_name] = explode_df[col_name].apply(ast.literal_eval)
explode_df = explode_df.explode(col_name)
explode_df = explode_df.reset_index(drop=True)
normalized_df = pd.json_normalize(explode_df[col_name])
explode_df = explode_df.join(
other=normalized_df,
lsuffix="_left",
rsuffix="_right"
)
explode_df = explode_df.drop(columns=col_name)
type_df = explode_df.applymap(type)
col_list = []
for col in type_df.columns:
if (type_df[col]==type([])).any():
col_list.append(col)
# print(col_list,explode_df.columns)
if len(col_list) != 0:
for col in col_list:
df_list_and_dict_col(explode_df[[primary_key,col]], primary_key,
col, folder)
explode_df.drop(columns=col, inplace =True)
print(f'{col}.csv is done')
explode_df.to_csv(f'{folder}/{col_name}.csv')
first we get list col and pass col to function one by one and then check is there any list inside col and then go on and save into csv
type_df = df.applymap(type)
col_list =[]
for col in type_df.columns:
if (type_df[col]==type([])).any():
col_list.append(col)
for col in col_list:
# print(col, df[['associateOID',col]])
df_list_and_dict_col(df[['primary_key',col]].copy(), 'primary_key', col,folder='worker')
df.drop(columns=col, inplace=True)
now you have multiple csv in normalise format
Hi i want to convert my dataframe to a specific json structure. my dataframe look something like this :
df = pd.DataFrame([["file1", "1.2.3.4.5.6.7.8.9", 91, "RMLO"], ["file2", "1.2.3.4.5.6.7.8.9", 92, "LMLO"], ["file3", "1.2.3.4.5.6.7.8.9", 93, "LCC"], ["file4", "1.2.3.4.5.6.7.8.9", 94, "RCC"]], columns=["Filename", "StudyID", "probablity", "finding_name"])
And the json structure in which i want to convert my datafram is below :
{
"findings": [
{
"name": "RMLO",
"probability": "91"
},
{
"name": "LMLO",
"probability": "92"
},
{
"name": "LCC",
"probability": "93"
}
{
"name": "LCC93",
"probability" : "94"
}
],
"status": "Processed",
"study_id": "1.2.3.4.5.6.7.8.9.0"
}
i tried implementing this with below code with different orient variables but i didn't get what i wanted.
j = df[["probablity","findings"]].to_json(orient='records')
so if any can help in achiveing this..
Thanks.
Is this similar to what you are trying to achieve:
import json
j = df[["finding_name","probablity"]].to_json(orient='records')
study_id = df["StudyID"][0]
j_dict = {"findings": json.loads(j), "status": "Processed", "study_id": study_id}
j_dict
This results in:
{'findings': [{'finding_name': 'RMLO', 'probablity': 91},
{'finding_name': 'LMLO', 'probablity': 92},
{'finding_name': 'LCC', 'probablity': 93},
{'finding_name': 'RCC', 'probablity': 94}],
'status': 'Processed',
'study_id': '1.2.3.4.5.6.7.8.9'}
I have below sample data in JSON format :
project_cost_details is my database result set after querying.
{
"1": {
"amount": 0,
"breakdown": [
{
"amount": 169857,
"id": 4,
"name": "SampleData",
"parent_id": "1"
}
],
"id": 1,
"name": "ABC PR"
}
}
Here is full json : https://jsoneditoronline.org/?id=2ce7ab19af6f420397b07b939674f49c
Expected output :https://jsoneditoronline.org/?id=56a47e6f8e424fe8ac58c5e0732168d7
I have this sample JSON which i created using loops in code. But i am stuck at how to convert this to expected JSON format. I am getting sequential changes, need to convert to tree like or nested JSON format.
Trying in Python :
project_cost = {}
for cost in project_cost_details:
if cost.get('Parent_Cost_Type_ID'):
project_id = str(cost.get('Project_ID'))
parent_cost_type_id = str(cost.get('Parent_Cost_Type_ID'))
if project_id not in project_cost:
project_cost[project_id] = {}
if "breakdown" not in project_cost[project_id]:
project_cost[project_id]["breakdown"] = []
if 'amount' not in project_cost[project_id]:
project_cost[project_id]['amount'] = 0
project_cost[project_id]['name'] = cost.get('Title')
project_cost[project_id]['id'] = cost.get('Project_ID')
if parent_cost_type_id == cost.get('Cost_Type_ID'):
project_cost[project_id]['amount'] += int(cost.get('Amount'))
#if parent_cost_type_id is None:
project_cost[project_id]["breakdown"].append(
{
'amount': int(cost.get('Amount')),
'name': cost.get('Name'),
'parent_id': parent_cost_type_id,
'id' : cost.get('Cost_Type_ID')
}
)
from this i am getting sample JSON. It will be good if get in this code only desired format.
Also tried this solution mention here : https://adiyatmubarak.wordpress.com/2015/10/05/group-list-of-dictionary-data-by-particular-key-in-python/
I got approach to convert sample JSON to expected JSON :
data = [
{ "name" : "ABC", "parent":"DEF", },
{ "name" : "DEF", "parent":"null" },
{ "name" : "new_name", "parent":"ABC" },
{ "name" : "new_name2", "parent":"ABC" },
{ "name" : "Foo", "parent":"DEF"},
{ "name" : "Bar", "parent":"null"},
{ "name" : "Chandani", "parent":"new_name", "relation": "rel", "depth": 3 },
{ "name" : "Chandani333", "parent":"new_name", "relation": "rel", "depth": 3 }
]
result = {x.get("name"):x for x in data}
#print(result)
tree = [];
for a in data:
#print(a)
if a.get("parent") in result:
parent = result[a.get("parent")]
else:
parent = ""
if parent:
if "children" not in parent:
parent["children"] = []
parent["children"].append(a)
else:
tree.append(a)
Reference help : http://jsfiddle.net/9FqKS/ this is a JavaScript solution i converted to Python
It seems that you want to get a list of values from a dictionary.
result = [value for key, value in project_cost_details.items()]
I am attempting to parse a json response that looks like this:
{
"links": {
"next": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-08&end_date=2015-09-09&detailed=false&api_key=xxx",
"prev": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-06&end_date=2015-09-07&detailed=false&api_key=xxx",
"self": "http://www.neowsapp.com/rest/v1/feed?start_date=2015-09-07&end_date=2015-09-08&detailed=false&api_key=xxx"
},
"element_count": 22,
"near_earth_objects": {
"2015-09-08": [
{
"links": {
"self": "http://www.neowsapp.com/rest/v1/neo/3726710?api_key=xxx"
},
"id": "3726710",
"neo_reference_id": "3726710",
"name": "(2015 RC)",
"nasa_jpl_url": "http://ssd.jpl.nasa.gov/sbdb.cgi?sstr=3726710",
"absolute_magnitude_h": 24.3,
"estimated_diameter": {
"kilometers": {
"estimated_diameter_min": 0.0366906138,
"estimated_diameter_max": 0.0820427065
},
"meters": {
"estimated_diameter_min": 36.6906137531,
"estimated_diameter_max": 82.0427064882
},
"miles": {
"estimated_diameter_min": 0.0227984834,
"estimated_diameter_max": 0.0509789586
},
"feet": {
"estimated_diameter_min": 120.3760332259,
"estimated_diameter_max": 269.1689931548
}
},
"is_potentially_hazardous_asteroid": false,
"close_approach_data": [
{
"close_approach_date": "2015-09-08",
"close_approach_date_full": "2015-Sep-08 09:45",
"epoch_date_close_approach": 1441705500000,
"relative_velocity": {
"kilometers_per_second": "19.4850295284",
"kilometers_per_hour": "70146.106302123",
"miles_per_hour": "43586.0625520053"
},
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
],
"is_sentry_object": false
},
}
I am trying to figure out how to parse through to get "miss_distance" dictionary values ? I am unable to wrap my head around it.
Here is what I have been able to do so far:
After I get a Response object from request.get()
response = request.get(url
I convert the response object to json object
data = response.json() #this returns dictionary object
I try to parse the first level of the dictionary:
for i in data:
if i == "near_earth_objects":
dataset1 = data["near_earth_objects"]["2015-09-08"]
#this returns the next object which is of type list
Please someone can explain me :
1. How to decipher this response in the first place.
2. How can I move forward in parsing the response object and get to miss_distance dictionary ?
Please any pointers/help is appreciated.
Thank you
Your data will will have multiple dictionaries for the each date, near earth object, and close approach:
near_earth_objects = data['near_earth_objects']
for date in near_earth_objects:
objects = near_earth_objects[date]
for object in objects:
close_approach_data = object['close_approach_data']
for close_approach in close_approach_data:
print(close_approach['miss_distance'])
The code below gives you a table of date, miss_distances for every object for every date
import json
raw_json = '''
{
"near_earth_objects": {
"2015-09-08": [
{
"close_approach_data": [
{
"miss_distance": {
"astronomical": "0.0269230459",
"lunar": "10.4730648551",
"kilometers": "4027630.320552233",
"miles": "2502653.4316094954"
},
"orbiting_body": "Earth"
}
]
}
]
}
}
'''
if __name__ == "__main__":
parsed = json.loads(raw_json)
# assuming this json includes more than one near_earch_object spread across dates
near_objects = []
for date, near_objs in parsed['near_earth_objects'].items():
for obj in near_objs:
for appr in obj['close_approach_data']:
o = {
'date': date,
'miss_distances': appr['miss_distance']
}
near_objects.append(o)
print(near_objects)
output:
[
{'date': '2015-09-08',
'miss_distances': {
'astronomical': '0.0269230459',
'lunar': '10.4730648551',
'kilometers': '4027630.320552233',
'miles': '2502653.4316094954'
}
}
]