Converting nested json data to csv using pandas dataframe

I have JSON data like the below:
jsonStr = '''
{
"student_details": [
{
"ID": 101,
"Name": [
{
"First_Name": "AAA",
"Last_Name": "BBB"
},
{
"Father": "AAA1",
"Mother": "BBB1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111111111,
"Student_PhoneNum2": 1111111112
},
{
"Parent_PhoneNum1": 1111111121,
"Parent_PhoneNum2": 1111111132
}
],
"DOB": "1998-05-05",
"Place_of_Birth": "AA",
"Marks": [
{
"DataStructures": 95,
"ObjectOrientedProgramming": 85,
"DiscreteMathematics": 100,
"AnalysisOfAlgorithm": 99,
"NetworkSecurity": 85
}
],
"DateOfJoining": "2022-05-05"
},
{
"ID": 102,
"Name": [
{
"First_Name": "ZZZ",
"Last_Name": "YYY"
},
{
"Father": "ZZZ1",
"Mother": "YYY1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111111182,
"Student_PhoneNum2": 1111111182
},
{
"Parent_PhoneNum1": 1111111128,
"Parent_PhoneNum2": 1111111832
}
],
"DOB": "1998-06-10",
"Place_of_Birth": "ZZ",
"Marks": [
{
"DataStructures": 25,
"ObjectOrientedProgramming": 50,
"DiscreteMathematics": 75,
"AnalysisOfAlgorithm": 60,
"NetworkSecurity": 30
}
],
"DateOfJoining": "2022-05-05"
},
{
"ID": 103,
"Name": [
{
"First_Name": "TTT",
"Last_Name": "UUU"
},
{
"Father": "TTT1",
"Mother": "UUU1"
}
],
"Phone_Number": [
{
"Student_PhoneNum1": 1111118753,
"Student_PhoneNum2": 1111111153
},
{
"Parent_PhoneNum1": 1111111523,
"Parent_PhoneNum2": 1111111533
}
],
"DOB": "1999-01-01",
"Place_of_Birth": "TT",
"Marks": [
{
"DataStructures": 50,
"ObjectOrientedProgramming": 75,
"DiscreteMathematics": 65,
"AnalysisOfAlgorithm": 75,
"NetworkSecurity": 40
}
],
"DateOfJoining": "2022-05-06"
}
]
}
'''
I'm trying to write every key-value pair from this data to a CSV file using the code below:
import pandas as pd
ar = pd.read_json(jsonStr)
df = pd.json_normalize(ar['student_details'])
print(df)
df.to_csv('CSVresult.csv', index=False)
To access the JSON data, I passed the top-level key student_details.
Is there any way to get every key-value pair in its own column without passing the student_details key and the column names explicitly? (The JSON data contains a lot of nested data like this.)

You can use:
import json
import numpy as np
import pandas as pd

# jsonStr is a string, so parse it before building the DataFrame
df = pd.DataFrame(json.loads(jsonStr))
# one column per top-level field, then one row per element of each nested list
df = df['student_details'].apply(pd.Series).explode('Name').explode('Phone_Number').explode('Marks')
# copy the student ID into each nested dict so it can serve as the group key
for row in df.to_dict('records'):
    row['Name']['ID'] = row['ID']
    row['Phone_Number']['ID'] = row['ID']

def get_values_without_nans(col_name):
    # expand the dicts into columns, then collapse the rows of each ID
    return df[col_name].apply(pd.Series).drop_duplicates().groupby("ID").agg(lambda x: np.nan if x.isnull().all() else x.dropna())

name = get_values_without_nans('Name')
phone_number = get_values_without_nans('Phone_Number')
phone_number.index = phone_number.index.astype('int32')
marks = df.set_index('ID').Marks.apply(pd.Series).drop_duplicates()
meta = df[['ID', 'DOB', 'Place_of_Birth', 'DateOfJoining']].drop_duplicates().set_index('ID')
final = meta.join([name, phone_number, marks])
print(final)
'''
ID DOB Place_of_Birth DateOfJoining First_Name Last_Name Father Mother Student_PhoneNum1 Student_PhoneNum2 Parent_PhoneNum1 Parent_PhoneNum2 DataStructures ObjectOrientedProgramming DiscreteMathematics AnalysisOfAlgorithm
0 101 1998-05-05 AA 2022-05-05 AAA BBB AAA1 BBB1 1111111111.0 1111111112.0 1111111121.0 1111111132.0 95 85 100 99
1 102 1998-06-10 ZZ 2022-05-05 ZZZ YYY ZZZ1 YYY1 1111111182.0 1111111182.0 1111111128.0 1111111832.0 25 50 75 60
2 103 1999-01-01 TT 2022-05-06 TTT UUU TTT1 UUU1 1111118753.0 1111111153.0 1111111523.0 1111111533.0 50 75 65 75
'''
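For the original goal (one column per key-value pair, without naming student_details or any column), a small recursive helper may be another option. This is only a sketch: flatten() is a hypothetical helper, not a pandas function. It assumes key names are unique within a record (a repeated key would overwrite the earlier value) and that lists contain only dicts or other lists (plain values inside lists are skipped). The sample string is a shortened version of the question's data.

```python
import json
import pandas as pd

def flatten(obj, out=None):
    """Collect every scalar key-value pair found anywhere in obj into one dict."""
    if out is None:
        out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(value, (dict, list)):
                flatten(value, out)   # descend into nested containers
            else:
                out[key] = value      # scalar: becomes one column
    elif isinstance(obj, list):
        for item in obj:
            flatten(item, out)        # scalars directly inside lists are ignored
    return out

sample = '''
{"student_details": [
  {"ID": 101,
   "Name": [{"First_Name": "AAA", "Last_Name": "BBB"}, {"Father": "AAA1", "Mother": "BBB1"}],
   "Marks": [{"DataStructures": 95, "NetworkSecurity": 85}],
   "DOB": "1998-05-05"},
  {"ID": 102,
   "Name": [{"First_Name": "ZZZ", "Last_Name": "YYY"}, {"Father": "ZZZ1", "Mother": "YYY1"}],
   "Marks": [{"DataStructures": 25, "NetworkSecurity": 30}],
   "DOB": "1998-06-10"}
]}
'''
data = json.loads(sample)
# Take the records from the first top-level value instead of naming the key.
records = next(iter(data.values()))
flat_df = pd.DataFrame([flatten(record) for record in records])
csv_text = flat_df.to_csv(index=False)  # pass a file path to write to disk instead
print(csv_text)
```

Because the records are taken from the first top-level value, this only fits documents shaped like the one in the question: a single wrapper key around a list of records.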

Related

How to get data from nested list in response.json()

There is a json response from an API request in the following schema:
[
{
"id": "1",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{
"id": "1",
"name": "group",
"category": {
"555": "general"
}
}
],
"series": [
{
"location": {
"id": "1",
"level": {
"id": "n1",
"name": "z"
},
"name": "z"
},
"serie": {
"202001": "0.08",
"202002": "0.48",
"202003": "0.19"
}
}
]
}
]
}
]
I want to transform the data from the "serie" key into a pandas DataFrame.
I can do that explicitly:
content = val[0]["results"][0]["series"][0]["serie"]
df = pd.DataFrame(content.items())
df
0 1
0 202001 0.08
1 202002 0.48
2 202003 0.19
But if there is more than one record, that would get only the data from the first element because of the hard-coded [0] indices.
Is there a way to retrieve that data without relying on fixed positions?

Try:
val = [
{
"id": "1",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{"id": "1", "name": "group", "category": {"555": "general"}}
],
"series": [
{
"location": {
"id": "1",
"level": {"id": "n1", "name": "z"},
"name": "z",
},
"serie": {"202001": "0.08", "202002": "0.48", "202003": "0.19"},
}
],
}
],
},
{
"id": "2",
"variable": "x",
"unt": "%",
"results": [
{
"classification": [
{"id": "1", "name": "group", "category": {"555": "general"}}
],
"series": [
{
"location": {
"id": "1",
"level": {"id": "n1", "name": "z"},
"name": "z",
},
"serie": {"202001": "1.08", "202002": "1.48", "202003": "1.19"},
}
],
}
],
},
]
df = pd.DataFrame(
[k, v]
for i in val
for ii in i["results"]
for s in ii["series"]
for k, v in s["serie"].items()
)
print(df)
Prints:
0 1
0 202001 0.08
1 202002 0.48
2 202003 0.19
3 202001 1.08
4 202002 1.48
5 202003 1.19
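If each value also needs to be traced back to its parent record, the same nested loops can emit dicts instead of lists. A sketch, assuming the schema shown above; the column names record_id, period and value are made up for illustration, and the values are converted from strings to floats:

```python
import pandas as pd

# Shortened version of the sample data, keeping only the keys that are used.
val = [
    {"id": "1", "results": [{"series": [
        {"serie": {"202001": "0.08", "202002": "0.48"}}]}]},
    {"id": "2", "results": [{"series": [
        {"serie": {"202001": "1.08", "202002": "1.48"}}]}]},
]

# One dict per (record, period) pair; dict keys become the column names.
rows = [
    {"record_id": item["id"], "period": period, "value": float(value)}
    for item in val
    for result in item["results"]
    for series in result["series"]
    for period, value in series["serie"].items()
]
df = pd.DataFrame(rows)
print(df)
```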

How to get a specific key value from a nested dictionary in Pandas?

I have a nested JSON-file that looks like this:
[
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
}
],
"NumberOfPoints": 1,
}
]
As you can see, the list in this JSON file consists of two dictionaries that each contain another list ("Connections") of at least one dictionary. In each dictionary of the JSON file, I want to select all keys named "Quantity" and make a calculation with their values (so for the example above, I want to calculate that there are 5 Quantities in total).
With the code below, I created a simple dataframe in Pandas to make this calculation:
import json
import pandas as pd
df = pd.read_json("chargingStations.json")
dfConnections = df["Connections"]
dfConnections = pd.json_normalize(dfConnections)
print(dfConnections)
Which results in:
Ideally, I want to get the "Quantity" key from each dictionary, so that I can make a dataframe like this (where each item has its own row):
However, I am not sure if this is the best way to make my calculation. I tried to get each value of the "Quantity" key by typing dfConnections = dfConnections.get("Quantity"), but that results in None. So: how can I get the value of each "Quantity" key in each dictionary to make my calculation?

If data is parsed Json data from your question, you can do:
df = pd.DataFrame(
[
{
i: sum(dd["Quantity"] for dd in d["Connections"])
for i, d in enumerate(data)
}
]
)
print(df)
Prints:
   0  1
0  2  3

You can use json_normalize():
import pandas as pd
true=True
a=[
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
}
],
"NumberOfPoints": 1,
}
]
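After reading the data, explode the Connections lists, then group by the original row index and sum the quantities (the printed result below appears to be for the two-record sample from the question):
df = pd.json_normalize(a)
# explode() repeats the index for each connection; reset it so the join below aligns row by row
df = df.explode('Connections').reset_index()
df = df.join(pd.json_normalize(df.pop('Connections')))
df = df.groupby('index')['Quantity'].sum().to_frame()
print(df)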
'''
index Quantity
0 0 2
1 1 3
'''
#or another format
df2=df.T
print(df2)
'''
0 1
Quantity 2 3
'''
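json_normalize() can also expand the Connections lists directly through its record_path parameter, which gives one row per connection as asked in the question. A minimal sketch, with the sample data trimmed to the relevant keys and the variable names chosen for illustration:

```python
import pandas as pd

stations = [
    {"AddressInfo": {"Town": "Haarlem"},
     "Connections": [{"PowerKW": 17, "Quantity": 2}]},
    {"AddressInfo": {"Town": "Haarlem"},
     "Connections": [{"PowerKW": 17, "Quantity": 1},
                     {"PowerKW": 17, "Quantity": 1},
                     {"PowerKW": 17, "Quantity": 1}]},
]

# One row per entry of every "Connections" list.
connections = pd.json_normalize(stations, record_path="Connections")
total_quantity = int(connections["Quantity"].sum())
print(connections)
print(total_quantity)  # 5 for this sample
```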

Extract data from JSON index loaded file

My JSON file looks like:
{
"numAccounts": xxxx,
"filtersApplied": {
"accountIds": "All",
"checkIds": "All",
"categories": [
"cost_optimizing"
],
"statuses": "All",
"regions": "All",
"organizationalUnitIds": [
"yyyyy"
]
},
"categoryStatusMap": {
"cost_optimizing": {
"statusMap": {
"RULE_ERROR": {
"name": "Blue",
"count": 11
},
"ERROR": {
"name": "Red",
"count": 11
},
"OK": {
"name": "Green",
"count": 11
},
"WARN": {
"name": "Yellow",
"count": 11
}
},
"name": "Cost Optimizing",
"monthlySavings": 1111
}
},
"accountStatusMap": {
"xxxxxxxx": {
"cost_optimizing": {
"statusMap": {
"OK": {
"name": "Green",
"count": 1111
},
"WARN": {
"name": "Yellow",
"count": 111
}
},
"name": "Cost Optimizing",
"monthlySavings": 1111
}
},
Which I load into memory using pandas:
df = pd.read_json('file.json', orient='index')
I find the index orient the most suitable because it gives me:
print(df)
0
numAccounts 125
filtersApplied {'accountIds': 'All', 'checkIds': 'All', 'cate...
categoryStatusMap {'cost_optimizing': {'statusMap': {'RULE_ERROR...
accountStatusMap {'xxxxxxx': {'cost_optimizing': {'statusM...
Now, how can I access the accountStatusMap entry?
I tried account_status_map = df['accountStatusMap'] which gives me a
KeyError: 'accountStatusMap'
Is there something specific to the index orientation in how to access specific entries in a dataframe?
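With orient='index' the top-level keys become row labels, while df[...] looks up column labels, which would explain the KeyError. Row labels are selected with .loc. A sketch under that assumption; the frame here is built from a small hand-trimmed dict rather than from file.json, purely to mimic the frame printed above:

```python
import pandas as pd

data = {
    "numAccounts": 125,
    "filtersApplied": {"accountIds": "All", "checkIds": "All"},
    "accountStatusMap": {
        "xxxxxxxx": {"cost_optimizing": {"name": "Cost Optimizing", "monthlySavings": 1111}}
    },
}
# Emulates the frame printed in the question: keys as row labels, one column named 0.
df = pd.Series(data).to_frame(0)

# Select by row label first, then by column label.
account_status_map = df.loc["accountStatusMap", 0]
savings = account_status_map["xxxxxxxx"]["cost_optimizing"]["monthlySavings"]
print(savings)
```

If the nested values are all that is needed, loading the file with json.load() and indexing the resulting dict directly may be simpler than going through a DataFrame.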

How to convert multi level json file to csv using python?

I need to convert this json to pandas dataframe.
"""
{
"col": [
{
"desc": {
"cont": "Asia",
"country": "China",
"Sports": "TT"
},
"geo": {
"col": [
[
[
34,
92
],
]
],
"c_t": "matic"
},
"d_t": "fli"
}
],
"game": "outdoor"
}
"""
df_output:
col_desc_cont col_desc_country col_desc_Sports col_geo_col1 col_geo_co2 col_geo_c_t col_geo_d_t game
Asia China TT 34 92 matic fli outdoor
I want to loop over every column value and column header so that I can get the above result.

That's not actually valid JSON (but I fixed it below).
pd.json_normalize() is what you are looking for. I'll let you split the geo.col column though.
data = """
{
"col": [
{
"desc": {
"cont": "Asia",
"country": "China",
"Sports": "TT"
},
"geo": {
"col": [
[
[
34,
92
]
]
],
"c_t": "matic"
},
"d_t": "fli"
}
],
"game": "outdoor"
}
"""
import pandas as pd
import json
jsonData = json.loads(data)
df = pd.json_normalize(jsonData,
record_path=['col'],
meta=['game'] )
Output:
print(df)
d_t desc.cont desc.country desc.Sports geo.col geo.c_t game
0 fli Asia China TT [[[34, 92]]] matic outdoor
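The remaining geo.col column holds a nested list. One way to split it into separate columns is sketched below; flatten_list() is a hypothetical helper, and the geo.col1, geo.col2, ... names are chosen here to mirror the desired output, assuming every row holds the same number of values.

```python
import json
import pandas as pd

# Shortened version of the corrected JSON above.
data = '{"col": [{"desc": {"cont": "Asia"}, "geo": {"col": [[[34, 92]]], "c_t": "matic"}, "d_t": "fli"}], "game": "outdoor"}'
df = pd.json_normalize(json.loads(data), record_path=["col"], meta=["game"])

def flatten_list(nested):
    """Yield the scalar values of an arbitrarily nested list, in order."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten_list(item)
        else:
            yield item

# One flat list of values per row, turned into numbered columns.
values = [list(flatten_list(cell)) for cell in df["geo.col"]]
split = pd.DataFrame(values, index=df.index)
split.columns = [f"geo.col{i + 1}" for i in range(split.shape[1])]
df = df.drop(columns="geo.col").join(split)
print(df)
```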

Get different values from repeating item JSON

I have this json derived dict:
{
"stats": [
{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
},
{
"name": "- k",
"time": 20,
"uid": "199295228664872961",
"id": 2
},
{
"name": "MAD MARX",
"time": "0",
"uid": "336539711785009153",
"id": 3
},
{
"name": "loli",
"time": 20,
"uid": "366299640976375818",
"id": 4
},
{
"name": "Woona",
"time": 20,
"uid": "246996981178695686",
"id": 5
}
]
}
I want to get the "time" of everybody in the list and sort by it.
The result should look like this:
TOP 10:
Jengas: 166
Loli: 20
My first attempt is to list the different values of the repeated item.
Right now the code is:
import json

with open('db.json') as json_data:
    topvjson = json.load(json_data)
    print(topvjson)
    d = topvjson['stats'][0]['time']
    print(d)

Extract the stats list, apply sort to it with the appropriate key:
from json import loads
data = loads("""{
"stats": [{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
}, {
"name": "- k",
"time": 20,
"uid": "199295228664872961",
"id": 2
}, {
"name": "MAD MARX",
"time": "0",
"uid": "336539711785009153",
"id": 3
}, {
"name": "loli",
"time": 20,
"uid": "366299640976375818",
"id": 4
}, {
"name": "Woona",
"time": 20,
"uid": "246996981178695686",
"id": 5
}]
}""")
stats = data['stats']
stats.sort(key = lambda entry: int(entry['time']), reverse=True)
print("TOP 10:")
for entry in stats[:10]:
    print("%s: %d" % (entry['name'], int(entry['time'])))
This prints:
TOP 10:
Jengas: 166
- k: 20
loli: 20
Woona: 20
MAD MARX: 0
Note that your time is not consistently an integer or a string: there are both 0 and "0" in the dataset. That's why you need the conversion int(...).

You can sort the list of dict values like:
Code:
top_three = [(x[1], -x[0]) for x in sorted(
(-int(user['time']), user['name']) for user in stats['stats'])][:3]
This works by building a tuple of the negated time and the name for each user. The tuples can then be sorted, and the name (x[1]) and the original time (-x[0]) are extracted after the sort.
Test Code:
stats = {
"stats": [{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
}, {
"name": "- k",
"time": 20,
"uid": "199295228664872961",
"id": 2
}, {
"name": "MAD MARX",
"time": "0",
"uid": "336539711785009153",
"id": 3
}, {
"name": "loli",
"time": 20,
"uid": "366299640976375818",
"id": 4
}, {
"name": "Woona",
"time": 20,
"uid": "246996981178695686",
"id": 5
}]
}
top_three = [(x[1], -x[0]) for x in sorted(
    (-int(user['time']), user['name']) for user in stats['stats'])][:3]
print(top_three)
Results:
[('Jengas', 166), ('- k', 20), ('Woona', 20)]

Here's a way to do it using the built-in sorted() function:
data = {
"stats": [
{
"name": "Jengas",
"time": 166,
"uid": "177098244407558145",
"id": 1
},
{
etc ...
}
]
}
print('TOP 3')
sorted_by_time = sorted(data['stats'], key=lambda d: int(d['time']), reverse=True)
for i, d in enumerate(sorted_by_time, 1):
    if i > 3: break
    print('{name}: {time}'.format(**d))
Output:
TOP 3
Jengas: 166
- k: 20
loli: 20
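When only the top few entries are needed, heapq.nlargest from the standard library can pick them without sorting the whole list. A small sketch using a shortened copy of the data above; the int() conversion is still required because time is stored both as a number and as a string:

```python
import heapq

stats = [
    {"name": "Jengas", "time": 166},
    {"name": "- k", "time": 20},
    {"name": "MAD MARX", "time": "0"},
    {"name": "loli", "time": 20},
]

# Pick the two entries with the largest time, comparing the times as integers.
top_two = heapq.nlargest(2, stats, key=lambda entry: int(entry["time"]))
for entry in top_two:
    print("{}: {}".format(entry["name"], int(entry["time"])))
```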
