I have the following JSON structure:
{
"products": [
{
"id": 12121,
"product": "hair",
"tag":"now, later",
"types": [
{
"product_id": 11111,
"id": 22222
}
],
"options": [
{
"name": "Title"
}
]
},
{
"id": 1313131,
"product": "pillow",
"tag":"later, never",
"types": [
{
"product_id": 33333,
"id": 44444
}
],
"options": [
{
"name": "Title"
}
]
},
{
"id": 14141414,
"product": "face",
"tag":"now, never",
"types": [
{
"product_id": 5555,
"id": 7777
}
],
"options": [
{
"name": "Title"
}
]
}
]
}
I'm looking to create a dataframe of the values found in types only when the tag list says "now", output expected:
tag product_id id
0 now 11111 22222
1 now 5555 7777
I was hoping for some guidance as I haven't dealt with JSON structures that have multiples lists and how to target based on finding a value like what is inside tag. Any hints would be greatly appreciated. Thank you in advanced.
Try this with a list comprehension:
>>> pd.DataFrame([{'tag': 'now', **i['types'][0]} for i in dct['products'] if 'now' in i['tag']])
tag product_id id
0 now 11111 22222
1 now 5555 7777
>>>
Related
There is a json response from an API request in the following schema:
[
{
"id": "1",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{
"id": "1",
"name": "group",
"category": {
"555": "general"
}
}
],
"series": [
{
"location": {
"id": "1",
"level": {
"id": "n1",
"name": "z"
},
"name": "z"
},
"serie": {
"202001": "0.08",
"202002": "0.48",
"202003": "0.19"
}
}
]
}
]
}
]
I want to transform the data from the "serie" key into a pandas DataFrame.
I can do that explicitly:
content = val[0]["results"][0]["series"][0]["serie"]
df = pd.DataFrame(content.items())
df
0 1
0 202001 0.08
1 202002 0.48
2 202003 0.19
But if there is more than one record, that would get only the data from the first element because of the positional arguments [0].
Is there a way to retrieve that data not considering the positional arguments?
Try:
val = [
{
"id": "1",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{"id": "1", "name": "group", "category": {"555": "general"}}
],
"series": [
{
"location": {
"id": "1",
"level": {"id": "n1", "name": "z"},
"name": "z",
},
"serie": {"202001": "0.08", "202002": "0.48", "202003": "0.19"},
}
],
}
],
},
{
"id": "2",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{"id": "1", "name": "group", "category": {"555": "general"}}
],
"series": [
{
"location": {
"id": "1",
"level": {"id": "n1", "name": "z"},
"name": "z",
},
"serie": {"202001": "1.08", "202002": "1.48", "202003": "1.19"},
}
],
}
],
},
]
df = pd.DataFrame(
[k, v]
for i in val
for ii in i["results"]
for s in ii["series"]
for k, v in s["serie"].items()
)
print(df)
Prints:
0 1
0 202001 0.08
1 202002 0.48
2 202003 0.19
3 202001 1.08
4 202002 1.48
5 202003 1.19
I have a JSON data like the below:
jsonStr = '''
{
"student_details": [
{
"ID": 101,
"Name": [
{
"First_Name": "AAA",
"Last_Name": "BBB"
},
{
"Father": "AAA1",
"Mother": "BBB1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111111111,
"Student_PhoneNum2": 1111111112
},
{
"Parent_PhoneNum1": 1111111121,
"Parent_PhoneNum2": 1111111132
}
],
"DOB": "1998-05-05",
"Place_of_Birth": "AA",
"Marks": [
{
"DataStructures": 95,
"ObjectOrientedProgramming": 85,
"DiscreteMathematics": 100,
"AnalysisOfAlgorithm": 99,
"NetworkSecurity": 85
}
],
"DateOfJoining": "2022-05-05"
},
{
"ID": 102,
"Name": [
{
"First_Name": "ZZZ",
"Last_Name": "YYY"
},
{
"Father": "ZZZ1",
"Mother": "YYY1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111111182,
"Student_PhoneNum2": 1111111182
},
{
"Parent_PhoneNum1": 1111111128,
"Parent_PhoneNum2": 1111111832
}
],
"DOB": "1998-06-10",
"Place_of_Birth": "ZZ",
"Marks": [
{
"DataStructures": 25,
"ObjectOrientedProgramming": 50,
"DiscreteMathematics": 75,
"AnalysisOfAlgorithm": 60,
"NetworkSecurity": 30
}
],
"DateOfJoining": "2022-05-05"
},
{
"ID": 103,
"Name": [
{
"First_Name": "TTT",
"Last_Name": "UUU"
},
{
"Father": "TTT1",
"Mother": "UUU1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111118753,
"Student_PhoneNum2": 1111111153
},
{
"Parent_PhoneNum1": 1111111523,
"Parent_PhoneNum2": 1111111533
}
],
"DOB": "1999-01-01",
"Place_of_Birth": "TT",
"Marks": [
{
"DataStructures": 50,
"ObjectOrientedProgramming": 75,
"DiscreteMathematics": 65,
"AnalysisOfAlgorithm": 75,
"NetworkSecurity": 40
}
],
"DateOfJoining": "2022-05-06"
}
]
}
'''
I'm trying to convert every key-value pair to a csv file from this data using the below code
import pandas as pd
ar = pd.read_json(jsonStr)
df = pd.json_normalize(ar['student_details'])
print(df)
df.to_csv('CSVresult.csv', index=False)
for accessing the JSON data, I have passed json data header named student_details.
Result:
is there any way to get the data like the below(every key-value pairs in separate columns) without passing the header student_details and the column names directly?(the json data contain a lot of nested data like this)
you can use:
df = pd.DataFrame(jsonStr)
df=df['student_details'].apply(pd.Series).explode('Name').explode('Phone_Number').explode('Marks')
for row in df.to_dict('records'):
row['Name']['ID']=row['ID']
row['Phone_Number']['ID']=row['ID']
def get_values_without_nans(col_name):
return df[col_name].apply(pd.Series).drop_duplicates().groupby("ID").agg(lambda x: np.nan if x.isnull().all() else x.dropna())
name = get_values_without_nans('Name')
phone_number=get_values_without_nans('Phone_Number')
phone_number.index=phone_number.index.astype('int32')
marks=df.set_index('ID').Marks.apply(pd.Series).drop_duplicates()
meta=df[['ID','DOB','Place_of_Birth','DateOfJoining']].drop_duplicates().set_index('ID')
final=meta.join([name,phone_number,marks])
print(final)
'''
ID DOB Place_of_Birth DateOfJoining First_Name Last_Name Father Mother Student_PhoneNum1 Student_PhoneNum2 Parent_PhoneNum1 Parent_PhoneNum2 DataStructures ObjectOrientedProgramming DiscreteMathematics AnalysisOfAlgorithm
0 101 1998-05-05 AA 2022-05-05 AAA BBB AAA1 BBB1 1111111111.0 1111111112.0 1111111121.0 1111111132.0 95 85 100 99
1 102 1998-06-10 ZZ 2022-05-05 ZZZ YYY ZZZ1 YYY1 1111111182.0 1111111182.0 1111111128.0 1111111832.0 25 50 75 60
2 103 1999-01-01 TT 2022-05-06 TTT UUU TTT1 UUU1 1111118753.0 1111111153.0 1111111523.0 1111111533.0 50 75 65 75
'''
My JSON file looks like:
{
"numAccounts": xxxx,
"filtersApplied": {
"accountIds": "All",
"checkIds": "All",
"categories": [
"cost_optimizing"
],
"statuses": "All",
"regions": "All",
"organizationalUnitIds": [
"yyyyy"
]
},
"categoryStatusMap": {
"cost_optimizing": {
"statusMap": {
"RULE_ERROR": {
"name": "Blue",
"count": 11
},
"ERROR": {
"name": "Red",
"count": 11
},
"OK": {
"name": "Green",
"count": 11
},
"WARN": {
"name": "Yellow",
"count": 11
}
},
"name": "Cost Optimizing",
"monthlySavings": 1111
}
},
"accountStatusMap": {
"xxxxxxxx": {
"cost_optimizing": {
"statusMap": {
"OK": {
"name": "Green",
"count": 1111
},
"WARN": {
"name": "Yellow",
"count": 111
}
},
"name": "Cost Optimizing",
"monthlySavings": 1111
}
},
Which I load into memory using pandas:
df = pd.read_json('file.json', orient='index')
I find the index orient the most suitable because it gives me:
print(df)
0
numAccounts 125
filtersApplied {'accountIds': 'All', 'checkIds': 'All', 'cate...
categoryStatusMap {'cost_optimizing': {'statusMap': {'RULE_ERROR...
accountStatusMap {'xxxxxxx': {'cost_optimizing': {'statusM...
Now, how can I access the accountStatusMap entry?
I tried account_status_map = df['accountStatusMap'] which gives me a
KeyError: 'accountStatusMap'
Is there something specific to the index orientation in how to access specific entries in a dataframe?
I have created a variable that stores my json data. It looks like this:
datasett = '''
{
"data": {
"trafficRegistrationPoints": [
{
"id": "99100B1687283",
"name": "Menstad sykkeltellepunkt",
"location": {
"coordinates": {
"latLon": {
"lat": 59.173876,
"lon": 9.641772
}
}
}
},
{
"id": "11101B1800681",
"name": "Garpa - sykkel",
"location": {
"coordinates": {
"latLon": {
"lat": 63.795114,
"lon": 11.494511
}
}
}
},
{
"id": "30961B1175469",
"name": "STENMALEN-SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.27665,
"lon": 10.411814
}
}
}
},
{
"id": "53749B1700621",
"name": "TUNEVANNET SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.292846,
"lon": 11.084058
}
}
}
},
{
"id": "80565B1689290",
"name": "Nenset sykkeltellepunkt",
"location": {
"coordinates": {
"latLon": {
"lat": 59.168377,
"lon": 9.634257
}
}
}
},
{
"id": "24783B2045151",
"name": "Orstad sykkel- begge retn.",
"location": {
"coordinates": {
"latLon": {
"lat": 58.798377,
"lon": 5.72743
}
}
}
},
{
"id": "46418B2616452",
"name": "Elgeseter bru sykkel øst",
"location": {
"coordinates": {
"latLon": {
"lat": 63.425015,
"lon": 10.393928
}
}
}
},
{
"id": "35978B1700571",
"name": "Tune kirke nord",
"location": {
"coordinates": {
"latLon": {
"lat": 59.292626,
"lon": 11.084066
}
}
}
},
{
"id": "21745B1996708",
"name": "Munkedamsveien Sykkel",
"location": {
"coordinates": {
"latLon": {
"lat": 59.911198,
"lon": 10.725568
}
}
}
},
{
"id": "33443B2542097",
"name": "KANALBRUA-SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.261823,
"lon": 10.416349
}
}
}
},
{
"id": "77570B384357",
"name": "HAVRENESVEGEN (SYKKEL)",
"location": {
"coordinates": {
"latLon": {
"lat": 61.598202,
"lon": 5.016999
}
}
}
},
{
"id": "95959B971385",
"name": "JELØGATA SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.43385,
"lon": 10.65388
}
}
}
},
{
"id": "61523B971803",
"name": "ST.HANSFJELLET SYKKEL",
"location": {
"coordinates": {
"latLon": {
"lat": 59.218978,
"lon": 10.93455
}
}
}
},
}
}
}
]
}
}
'''
Next, I have used json.loads() to turn it into a dictionary in Python, using the following code:
dict = json.loads(datasett)
Because the result I get is a nested dictionary,I we want to move further into the nest.
movedDict = dict['data']
I then want to this into a Pandas DataFrame
df = pd.DataFrame.from_dict(movedDict)
However, when I print this. The data is not seperated into unique columns. What do I do wrong?
You can use json_normalize here, I also removed some extra } from your JSON:
data = json.loads(datasett)
df = pd.json_normalize(data, record_path=['data', 'trafficRegistrationPoints'])
print(df)
id name location.coordinates.latLon.lat location.coordinates.latLon.lon
0 99100B1687283 Menstad sykkeltellepunkt 59.173876 9.641772
1 11101B1800681 Garpa - sykkel 63.795114 11.494511
2 30961B1175469 STENMALEN-SYKKEL 59.276650 10.411814
3 53749B1700621 TUNEVANNET SYKKEL 59.292846 11.084058
4 80565B1689290 Nenset sykkeltellepunkt 59.168377 9.634257
5 24783B2045151 Orstad sykkel- begge retn. 58.798377 5.727430
6 46418B2616452 Elgeseter bru sykkel øst 63.425015 10.393928
7 35978B1700571 Tune kirke nord 59.292626 11.084066
8 21745B1996708 Munkedamsveien Sykkel 59.911198 10.725568
9 33443B2542097 KANALBRUA-SYKKEL 59.261823 10.416349
10 77570B384357 HAVRENESVEGEN (SYKKEL) 61.598202 5.016999
11 95959B971385 JELØGATA SYKKEL 59.433850 10.653880
12 61523B971803 ST.HANSFJELLET SYKKEL 59.218978 10.934550
when use from_dict the dict should look like this:
data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
pd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
in your case:
data = {'trafficRegistrationPoints':[.....]}
save the 'trafficRegistrationPoints' as a list and then create the dataFrame
The values for the data key in your dict are not individual dicts but rather a list of dicts under trafficRegistrationPoints key, so you need to move further into that key:
df = pd.DataFrame.from_dict(movedDict['trafficRegistrationPoints'])
I want to append the longitude to a latitude stored in 2 separated json files
The result should be stored in a 3rd file
How can I do that on Python OR Javascript/Node?
Many thanks for your support,
LATITUDE
{
"tags": [{
"name": "LATITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 46.9976859318, 3],
[1123306774000, 46.9976859319, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
LONGITUDE
{
"tags": [{
"name": "LONGITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 36.9976859318, 3],
[1123306774000, 36.9976859317, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
Expected result: LATITUDE_AND_LONGITUDE
{
"tags": [{
"name": "LATITUDE_AND_LONGITUDE_deg",
"results": [{
"groups": [{
"name": "type",
"type": "number"
}],
"values": [
[1123306773000, 46.9976859318, 36.9976859318, 3],
[1123306774000, 46.9976859319, 36.9976859317, 3]
],
"attributes": {
"customer": ["Acme"],
"host": ["server1"]
}
}],
"stats": {
"rawCount": 2
}
}]
}
I have written the solution with a colleague, find the source code on github: https://gist.github.com/Abdelkrim/715eb222cc318219196c8be293c233bf