create dataframe from specific node in json response

I have the following JSON structure:
{
"products": [
{
"id": 12121,
"product": "hair",
"tag":"now, later",
"types": [
{
"product_id": 11111,
"id": 22222
}
],
"options": [
{
"name": "Title"
}
]
},
{
"id": 1313131,
"product": "pillow",
"tag":"later, never",
"types": [
{
"product_id": 33333,
"id": 44444
}
],
"options": [
{
"name": "Title"
}
]
},
{
"id": 14141414,
"product": "face",
"tag":"now, never",
"types": [
{
"product_id": 5555,
"id": 7777
}
],
"options": [
{
"name": "Title"
}
]
}
]
}
I'm looking to create a dataframe from the values found in types, but only for products whose tag list contains "now". Expected output:
   tag  product_id     id
0  now       11111  22222
1  now        5555   7777
I was hoping for some guidance, as I haven't dealt with JSON structures that contain multiple lists, or with selecting entries based on a value such as the one inside tag. Any hints would be greatly appreciated. Thank you in advance.

Try this with a list comprehension, where dct is the parsed JSON dictionary:
>>> pd.DataFrame([{'tag': 'now', **i['types'][0]} for i in dct['products'] if 'now' in i['tag']])
   tag  product_id     id
0  now       11111  22222
1  now        5555   7777
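A slightly more defensive variant of the comprehension above might look like the sketch below. It assumes the tag field is always a comma-separated string, splits it so that only an exact "now" entry matches (the substring test would also match a tag such as "nowhere"), and emits one row per element of types instead of only the first. The fourth product is hypothetical and is there only to show the difference.

```python
import pandas as pd

# Sample data shaped like the question's JSON; the last product is a
# hypothetical addition whose tag merely contains the substring "now".
dct = {
    "products": [
        {"id": 12121, "product": "hair", "tag": "now, later",
         "types": [{"product_id": 11111, "id": 22222}]},
        {"id": 1313131, "product": "pillow", "tag": "later, never",
         "types": [{"product_id": 33333, "id": 44444}]},
        {"id": 14141414, "product": "face", "tag": "now, never",
         "types": [{"product_id": 5555, "id": 7777}]},
        {"id": 999, "product": "hypothetical", "tag": "nowhere, later",
         "types": [{"product_id": 1, "id": 2}]},
    ]
}

rows = [
    {"tag": "now", **t}
    for p in dct["products"]
    # split the tag string so only an exact "now" entry matches
    if "now" in [s.strip() for s in p["tag"].split(",")]
    # one row per entry in "types", not just the first one
    for t in p["types"]
]
df = pd.DataFrame(rows)
print(df)
```

Whether the exact match is preferable depends on how the tags are written in the real response.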

Related

How to get data from nested list in response.json()

There is a json response from an API request in the following schema:
[
{
"id": "1",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{
"id": "1",
"name": "group",
"category": {
"555": "general"
}
}
],
"series": [
{
"location": {
"id": "1",
"level": {
"id": "n1",
"name": "z"
},
"name": "z"
},
"serie": {
"202001": "0.08",
"202002": "0.48",
"202003": "0.19"
}
}
]
}
]
}
]
I want to transform the data from the "serie" key into a pandas DataFrame.
I can do that explicitly:
content = val[0]["results"][0]["series"][0]["serie"]
df = pd.DataFrame(content.items())
df
0 1
0 202001 0.08
1 202002 0.48
2 202003 0.19
But if there is more than one record, that would get only the data from the first element because of the positional indexes [0].
Is there a way to retrieve that data without hard-coding the positional indexes?

Try:
val = [
{
"id": "1",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{"id": "1", "name": "group", "category": {"555": "general"}}
],
"series": [
{
"location": {
"id": "1",
"level": {"id": "n1", "name": "z"},
"name": "z",
},
"serie": {"202001": "0.08", "202002": "0.48", "202003": "0.19"},
}
],
}
],
},
{
"id": "2",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{"id": "1", "name": "group", "category": {"555": "general"}}
],
"series": [
{
"location": {
"id": "1",
"level": {"id": "n1", "name": "z"},
"name": "z",
},
"serie": {"202001": "1.08", "202002": "1.48", "202003": "1.19"},
}
],
}
],
},
]
df = pd.DataFrame(
[k, v]
for i in val
for ii in i["results"]
for s in ii["series"]
for k, v in s["serie"].items()
)
print(df)
Prints:
0 1
0 202001 0.08
1 202002 0.48
2 202003 0.19
3 202001 1.08
4 202002 1.48
5 202003 1.19
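If it matters which record each value came from, the same nested loop can carry the parent fields along. This is only a sketch on a trimmed copy of the data: the column names id, location, period and value are my own choice, and converting the values with float() assumes every entry is a numeric string.

```python
import pandas as pd

# Trimmed-down copy of the response: only the keys used below are kept.
val = [
    {"id": "1", "results": [{"series": [
        {"location": {"name": "z"},
         "serie": {"202001": "0.08", "202002": "0.48"}}]}]},
    {"id": "2", "results": [{"series": [
        {"location": {"name": "z"},
         "serie": {"202001": "1.08", "202002": "1.48"}}]}]},
]

rows = [
    {
        "id": item["id"],                  # which top-level record
        "location": s["location"]["name"],
        "period": period,
        "value": float(value),             # assumes numeric strings
    }
    for item in val
    for result in item["results"]
    for s in result["series"]
    for period, value in s["serie"].items()
]
df = pd.DataFrame(rows)
print(df)
```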

Converting nested json data to csv using pandas dataframe

I have JSON data like the below:
jsonStr = '''
{
"student_details": [
{
"ID": 101,
"Name": [
{
"First_Name": "AAA",
"Last_Name": "BBB"
},
{
"Father": "AAA1",
"Mother": "BBB1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111111111,
"Student_PhoneNum2": 1111111112
},
{
"Parent_PhoneNum1": 1111111121,
"Parent_PhoneNum2": 1111111132
}
],
"DOB": "1998-05-05",
"Place_of_Birth": "AA",
"Marks": [
{
"DataStructures": 95,
"ObjectOrientedProgramming": 85,
"DiscreteMathematics": 100,
"AnalysisOfAlgorithm": 99,
"NetworkSecurity": 85
}
],
"DateOfJoining": "2022-05-05"
},
{
"ID": 102,
"Name": [
{
"First_Name": "ZZZ",
"Last_Name": "YYY"
},
{
"Father": "ZZZ1",
"Mother": "YYY1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111111182,
"Student_PhoneNum2": 1111111182
},
{
"Parent_PhoneNum1": 1111111128,
"Parent_PhoneNum2": 1111111832
}
],
"DOB": "1998-06-10",
"Place_of_Birth": "ZZ",
"Marks": [
{
"DataStructures": 25,
"ObjectOrientedProgramming": 50,
"DiscreteMathematics": 75,
"AnalysisOfAlgorithm": 60,
"NetworkSecurity": 30
}
],
"DateOfJoining": "2022-05-05"
},
{
"ID": 103,
"Name": [
{
"First_Name": "TTT",
"Last_Name": "UUU"
},
{
"Father": "TTT1",
"Mother": "UUU1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111118753,
"Student_PhoneNum2": 1111111153
},
{
"Parent_PhoneNum1": 1111111523,
"Parent_PhoneNum2": 1111111533
}
],
"DOB": "1999-01-01",
"Place_of_Birth": "TT",
"Marks": [
{
"DataStructures": 50,
"ObjectOrientedProgramming": 75,
"DiscreteMathematics": 65,
"AnalysisOfAlgorithm": 75,
"NetworkSecurity": 40
}
],
"DateOfJoining": "2022-05-06"
}
]
}
'''
I'm trying to convert every key-value pair in this data to a CSV file using the code below:
import pandas as pd
ar = pd.read_json(jsonStr)
df = pd.json_normalize(ar['student_details'])
print(df)
df.to_csv('CSVresult.csv', index=False)
To access the JSON data, I passed the top-level key student_details.
Is there any way to get every key-value pair into a separate column without passing the student_details header and the column names directly? (The JSON data contains a lot of nested data like this.)

You can use:
import json
import numpy as np
import pandas as pd

# jsonStr is a string, so parse it before building the DataFrame
df = pd.DataFrame(json.loads(jsonStr))
df = df['student_details'].apply(pd.Series).explode('Name').explode('Phone_Number').explode('Marks')
# copy the student ID into each nested dict so it is available after expansion
for row in df.to_dict('records'):
    row['Name']['ID'] = row['ID']
    row['Phone_Number']['ID'] = row['ID']

def get_values_without_nans(col_name):
    # expand the dicts into columns, then collapse the rows of each ID into one
    return df[col_name].apply(pd.Series).drop_duplicates().groupby("ID").agg(lambda x: np.nan if x.isnull().all() else x.dropna())

name = get_values_without_nans('Name')
phone_number = get_values_without_nans('Phone_Number')
phone_number.index = phone_number.index.astype('int32')
marks = df.set_index('ID').Marks.apply(pd.Series).drop_duplicates()
meta = df[['ID', 'DOB', 'Place_of_Birth', 'DateOfJoining']].drop_duplicates().set_index('ID')
final = meta.join([name, phone_number, marks])
print(final)
'''
ID DOB Place_of_Birth DateOfJoining First_Name Last_Name Father Mother Student_PhoneNum1 Student_PhoneNum2 Parent_PhoneNum1 Parent_PhoneNum2 DataStructures ObjectOrientedProgramming DiscreteMathematics AnalysisOfAlgorithm
0 101 1998-05-05 AA 2022-05-05 AAA BBB AAA1 BBB1 1111111111.0 1111111112.0 1111111121.0 1111111132.0 95 85 100 99
1 102 1998-06-10 ZZ 2022-05-05 ZZZ YYY ZZZ1 YYY1 1111111182.0 1111111182.0 1111111128.0 1111111832.0 25 50 75 60
2 103 1999-01-01 TT 2022-05-06 TTT UUU TTT1 UUU1 1111118753.0 1111111153.0 1111111523.0 1111111533.0 50 75 65 75
'''
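A plainer alternative, offered only as a sketch, is to flatten each record in ordinary Python before handing it to pandas. It assumes that every nested value is a list of flat dicts and that the inner key names are unique within a record (otherwise later keys overwrite earlier ones). flatten_record is a helper name I made up, and the sample is a shortened version of the data from the question.

```python
import json
import pandas as pd

# Shortened sample in the same shape as the question's jsonStr.
json_str = '''
{"student_details": [
  {"ID": 101,
   "Name": [{"First_Name": "AAA", "Last_Name": "BBB"},
            {"Father": "AAA1", "Mother": "BBB1"}],
   "Phone_Number": [{"Student_PhoneNum1": 1111111111},
                    {"Parent_PhoneNum1": 1111111121}],
   "DOB": "1998-05-05",
   "Marks": [{"DataStructures": 95}]},
  {"ID": 102,
   "Name": [{"First_Name": "ZZZ", "Last_Name": "YYY"},
            {"Father": "ZZZ1", "Mother": "YYY1"}],
   "Phone_Number": [{"Student_PhoneNum1": 1111111182},
                    {"Parent_PhoneNum1": 1111111128}],
   "DOB": "1998-06-10",
   "Marks": [{"DataStructures": 25}]}
]}
'''

def flatten_record(record):
    """Merge every list-of-dicts value into the top level of the record."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            for part in value:
                flat.update(part)   # inner keys become columns
        else:
            flat[key] = value       # scalars are kept as they are
    return flat

data = json.loads(json_str)
# Take the first top-level value so the header name is not hard-coded.
records = next(iter(data.values()))
df = pd.DataFrame([flatten_record(r) for r in records])
csv_text = df.to_csv(index=False)   # pass a path instead to write a file
print(csv_text)
```

Because the column names come from the keys found in the data, neither student_details nor the individual column names have to be spelled out.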

Extract data from JSON index loaded file

My JSON file looks like:
{
"numAccounts": xxxx,
"filtersApplied": {
"accountIds": "All",
"checkIds": "All",
"categories": [
"cost_optimizing"
],
"statuses": "All",
"regions": "All",
"organizationalUnitIds": [
"yyyyy"
]
},
"categoryStatusMap": {
"cost_optimizing": {
"statusMap": {
"RULE_ERROR": {
"name": "Blue",
"count": 11
},
"ERROR": {
"name": "Red",
"count": 11
},
"OK": {
"name": "Green",
"count": 11
},
"WARN": {
"name": "Yellow",
"count": 11
}
},
"name": "Cost Optimizing",
"monthlySavings": 1111
}
},
"accountStatusMap": {
"xxxxxxxx": {
"cost_optimizing": {
"statusMap": {
"OK": {
"name": "Green",
"count": 1111
},
"WARN": {
"name": "Yellow",
"count": 111
}
},
"name": "Cost Optimizing",
"monthlySavings": 1111
}
},
Which I load into memory using pandas:
df = pd.read_json('file.json', orient='index')
I find the index orient the most suitable because it gives me:
print(df)
0
numAccounts 125
filtersApplied {'accountIds': 'All', 'checkIds': 'All', 'cate...
categoryStatusMap {'cost_optimizing': {'statusMap': {'RULE_ERROR...
accountStatusMap {'xxxxxxx': {'cost_optimizing': {'statusM...
Now, how can I access the accountStatusMap entry?
I tried account_status_map = df['accountStatusMap'] which gives me a
KeyError: 'accountStatusMap'
Is there something specific to the index orientation in how to access specific entries in a dataframe?
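A likely explanation, sketched under assumptions: with the index orientation the top-level JSON keys become row labels, while df['accountStatusMap'] looks for a column of that name, hence the KeyError. Row labels are reached through .loc. The snippet below does not read file.json; it rebuilds a frame of the same shape (one column named 0, keys as the index) from an in-memory dict with illustrative values, and the column names account, category and status are my own.

```python
import pandas as pd

# Illustrative stand-in for the parsed file; values are placeholders.
data = {
    "numAccounts": 125,
    "accountStatusMap": {
        "xxxxxxxx": {
            "cost_optimizing": {
                "statusMap": {
                    "OK": {"name": "Green", "count": 1111},
                    "WARN": {"name": "Yellow", "count": 111},
                },
                "name": "Cost Optimizing",
                "monthlySavings": 1111,
            }
        }
    },
}

# Same layout as shown in the question: keys as index, one column named 0.
df = pd.Series(data).to_frame()

# Select by row label and column label instead of df['accountStatusMap'].
account_status_map = df.loc["accountStatusMap", 0]

# Optionally unfold the nested dict into one row per account/category/status.
rows = [
    {"account": account, "category": category, "status": status, **info}
    for account, categories in account_status_map.items()
    for category, details in categories.items()
    for status, info in details["statusMap"].items()
]
status_df = pd.DataFrame(rows)
print(status_df)
```

If only this one entry is needed, loading the file with json.load and indexing the resulting dict directly may be simpler than going through a DataFrame at all.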

Dictionary data is not separated into columns in Pandas DataFrame

I have created a variable that stores my json data. It looks like this:
datasett = '''
{
"data": {
"trafficRegistrationPoints": [
{
"id": "99100B1687283",
"name": "Menstad sykkeltellepunkt",
"location": {
"coordinates": {
"latLon": {
"lat": 59.173876,
"lon": 9.641772
}
}
}
},
{
"id": "11101B1800681",
"name": "Garpa - sykkel",
"location": {
"coordinates": {
"latLon": {
"lat": 63.795114,
"lon": 11.494511
}
}
}
},
{
"id": "30961B1175469",
"name": "STENMALEN-SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.27665,
"lon": 10.411814
}
}
}
},
{
"id": "53749B1700621",
"name": "TUNEVANNET SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.292846,
"lon": 11.084058
}
}
}
},
{
"id": "80565B1689290",
"name": "Nenset sykkeltellepunkt",
"location": {
"coordinates": {
"latLon": {
"lat": 59.168377,
"lon": 9.634257
}
}
}
},
{
"id": "24783B2045151",
"name": "Orstad sykkel- begge retn.",
"location": {
"coordinates": {
"latLon": {
"lat": 58.798377,
"lon": 5.72743
}
}
}
},
{
"id": "46418B2616452",
"name": "Elgeseter bru sykkel øst",
"location": {
"coordinates": {
"latLon": {
"lat": 63.425015,
"lon": 10.393928
}
}
}
},
{
"id": "35978B1700571",
"name": "Tune kirke nord",
"location": {
"coordinates": {
"latLon": {
"lat": 59.292626,
"lon": 11.084066
}
}
}
},
{
"id": "21745B1996708",
"name": "Munkedamsveien Sykkel",
"location": {
"coordinates": {
"latLon": {
"lat": 59.911198,
"lon": 10.725568
}
}
}
},
{
"id": "33443B2542097",
"name": "KANALBRUA-SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.261823,
"lon": 10.416349
}
}
}
},
{
"id": "77570B384357",
"name": "HAVRENESVEGEN (SYKKEL)",
"location": {
"coordinates": {
"latLon": {
"lat": 61.598202,
"lon": 5.016999
}
}
}
},
{
"id": "95959B971385",
"name": "JELØGATA SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.43385,
"lon": 10.65388
}
}
}
},
{
"id": "61523B971803",
"name": "ST.HANSFJELLET SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.218978,
"lon": 10.93455
}
}
}
},
}
}
}
]
}
}
'''
Next, I have used json.loads() to turn it into a dictionary in Python, using the following code:
dict = json.loads(datasett)
Because the result is a nested dictionary, I want to move further into the nest:
movedDict = dict['data']
I then want to turn this into a Pandas DataFrame:
df = pd.DataFrame.from_dict(movedDict)
However, when I print this, the data is not separated into individual columns. What am I doing wrong?

You can use json_normalize here. I also removed some extra } from your JSON:
data = json.loads(datasett)
df = pd.json_normalize(data, record_path=['data', 'trafficRegistrationPoints'])
print(df)
id name location.coordinates.latLon.lat location.coordinates.latLon.lon
0 99100B1687283 Menstad sykkeltellepunkt 59.173876 9.641772
1 11101B1800681 Garpa - sykkel 63.795114 11.494511
2 30961B1175469 STENMALEN-SYKKEL 59.276650 10.411814
3 53749B1700621 TUNEVANNET SYKKEL 59.292846 11.084058
4 80565B1689290 Nenset sykkeltellepunkt 59.168377 9.634257
5 24783B2045151 Orstad sykkel- begge retn. 58.798377 5.727430
6 46418B2616452 Elgeseter bru sykkel øst 63.425015 10.393928
7 35978B1700571 Tune kirke nord 59.292626 11.084066
8 21745B1996708 Munkedamsveien Sykkel 59.911198 10.725568
9 33443B2542097 KANALBRUA-SYKKEL 59.261823 10.416349
10 77570B384357 HAVRENESVEGEN (SYKKEL) 61.598202 5.016999
11 95959B971385 JELØGATA SYKKEL 59.433850 10.653880
12 61523B971803 ST.HANSFJELLET SYKKEL 59.218978 10.934550
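For a version that can be run as is, here is a small sketch on two of the registration points. It indexes into the list first and then calls json_normalize on it, which should give the same columns as the record_path form above. Shortening the dotted column names to their last segment is my own addition and assumes those last segments do not collide.

```python
import json
import pandas as pd

# Two points copied from the question, with the stray closing braces removed.
datasett = '''
{"data": {"trafficRegistrationPoints": [
  {"id": "99100B1687283", "name": "Menstad sykkeltellepunkt",
   "location": {"coordinates": {"latLon": {"lat": 59.173876, "lon": 9.641772}}}},
  {"id": "11101B1800681", "name": "Garpa - sykkel",
   "location": {"coordinates": {"latLon": {"lat": 63.795114, "lon": 11.494511}}}}
]}}
'''

data = json.loads(datasett)

# Flatten each record; nested keys are joined with "." by default.
df = pd.json_normalize(data["data"]["trafficRegistrationPoints"])

# Keep only the last path segment: "location.coordinates.latLon.lat" -> "lat".
df = df.rename(columns=lambda c: c.split(".")[-1])
print(df)
```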

When you use from_dict, the dict should look like this:
data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
pd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
In your case:
data = {'trafficRegistrationPoints':[.....]}
Take the list stored under 'trafficRegistrationPoints' and create the DataFrame from it.

The value of the data key in your dict is not a set of individual dicts but a list of dicts under the trafficRegistrationPoints key, so you need to move further into that key:
df = pd.DataFrame.from_dict(movedDict['trafficRegistrationPoints'])

Combine 2 JSON files into 1 file in Node or Python (i.e. longitude and latitude)

I want to combine the latitude and longitude values stored in 2 separate JSON files.
The result should be stored in a 3rd file.
How can I do that in Python or JavaScript/Node?
Many thanks for your support.
LATITUDE
{
"tags": [{
"name": "LATITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 46.9976859318, 3],
[1123306774000, 46.9976859319, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
LONGITUDE
{
"tags": [{
"name": "LONGITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 36.9976859318, 3],
[1123306774000, 36.9976859317, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
Expected result: LATITUDE_AND_LONGITUDE
{
"tags": [{
"name": "LATITUDE_AND_LONGITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 46.9976859318, 36.9976859318, 3],
[1123306774000, 46.9976859319, 36.9976859317, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}

I wrote the solution with a colleague; the source code is on GitHub: https://gist.github.com/Abdelkrim/715eb222cc318219196c8be293c233bf
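For readers who want the idea without opening the link, here is a minimal Python sketch. It is not the code from the gist. It assumes both files have exactly the structure shown in the question, that rows are matched on the timestamp in the first position, and that the third value of the latitude row is the one to keep. merge_lat_lon is a name I made up, and the documents are defined inline instead of being read from files.

```python
import copy
import json

def merge_lat_lon(lat_doc, lon_doc):
    """Return a new document with longitude inserted after latitude."""
    merged = copy.deepcopy(lat_doc)   # leave the inputs untouched
    tag = merged["tags"][0]
    tag["name"] = "LATITUDE_AND_LONGITUDE_deg"

    # Index the longitude rows by timestamp for the lookup below.
    lon_rows = lon_doc["tags"][0]["results"][0]["values"]
    lon_by_ts = {row[0]: row[1] for row in lon_rows}

    result = tag["results"][0]
    result["values"] = [
        [ts, lat, lon_by_ts[ts], last]
        for ts, lat, last in result["values"]
        if ts in lon_by_ts            # skip rows with no longitude match
    ]
    tag["stats"]["rawCount"] = len(result["values"])
    return merged

def make_doc(name, values):
    """Build a document shaped like the ones in the question."""
    return {"tags": [{
        "name": name,
        "results": [{
            "groups": [{"name": "type", "type": "number"}],
            "values": values,
            "attributes": {"customer": ["Acme"], "host": ["server1"]},
        }],
        "stats": {"rawCount": len(values)},
    }]}

latitude = make_doc("LATITUDE_deg", [
    [1123306773000, 46.9976859318, 3],
    [1123306774000, 46.9976859319, 3],
])
longitude = make_doc("LONGITUDE_deg", [
    [1123306773000, 36.9976859318, 3],
    [1123306774000, 36.9976859317, 3],
])

combined = merge_lat_lon(latitude, longitude)
print(json.dumps(combined, indent=2))
```

With real files, json.load would replace the inline documents and json.dump would write the combined result to the third file.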
