How do you convert JSON output to a data frame in Python?

I need to convert this json file to a data frame in python:
print(resp2)
{
"totalCount": 1,
"nextPageKey": null,
"result": [
{
"metricId": "builtin:tech.generic.cpu.usage",
"data": [
{
"dimensions": [
"process_345678"
],
"dimensionMap": {
"dt.entity.process_group_instance": "process_345678"
},
"timestamps": [
1642021200000,
1642024800000,
1642028400000
],
"values": [
10,
15,
12
]
}
]
}
]
}
Output needs to be like this:
metricId                        dimensions      timestamps     values
builtin:tech.generic.cpu.usage  process_345678  1642021200000  10
builtin:tech.generic.cpu.usage  process_345678  1642024800000  15
builtin:tech.generic.cpu.usage  process_345678  1642028400000  12
I have tried this:
print(pd.json_normalize(resp2, "data"))
I get invalid syntax, any ideas?

Take a look at the examples for json_normalize: they use a list of dictionaries whose keys are the column names you want, with one dictionary per row. Nested objects are flattened into columns with dot-notation names, but nested arrays are not expanded into one row per element.
Therefore, parse the data into a flat list of records first, then use from_records.
data = []
for r in resp2['result']:
    metricId = r['metricId']
    for d in r['data']:
        dimension = d['dimensions'][0]  # unclear why this is an array
        timestamps = d['timestamps']
        values = d['values']
        for t, v in zip(timestamps, values):
            data.append({'metricId': metricId, 'dimensions': dimension, 'timestamps': t, 'values': v})
df = pd.DataFrame.from_records(data)
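As an alternative to the explicit loop, the same table can probably be produced with json_normalize plus explode. This is a minimal sketch, not the answerer's method. It assumes pandas 1.3 or later (for multi-column explode) and assumes the response may arrive as a JSON string, in which case it has to be parsed with json.loads first; the variable name raw is made up for the illustration.

```python
import json
import pandas as pd

# Assumption: the API response is available as a JSON string.
raw = """
{"totalCount": 1, "nextPageKey": null, "result": [
  {"metricId": "builtin:tech.generic.cpu.usage", "data": [
    {"dimensions": ["process_345678"],
     "dimensionMap": {"dt.entity.process_group_instance": "process_345678"},
     "timestamps": [1642021200000, 1642024800000, 1642028400000],
     "values": [10, 15, 12]}
  ]}
]}
"""
resp2 = json.loads(raw)  # skip this step if resp2 is already a dict

# One row per entry of "data", with metricId carried along as metadata.
df = pd.json_normalize(resp2["result"], record_path="data", meta=["metricId"])

# "dimensions" is a one-element list here; keep its first item.
df["dimensions"] = df["dimensions"].map(lambda dims: dims[0])

# Expand the two parallel lists into one row per (timestamp, value) pair.
df = df.explode(["timestamps", "values"], ignore_index=True)
df = df[["metricId", "dimensions", "timestamps", "values"]]
print(df)
```

The loop in the answer is easier to adapt when the lists can differ in length; explode raises an error in that case.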

Related

How to remove redundant elements from a JSON string in Python

I have the below JSON string which I converted from a Pandas data frame.
[
{
"ID":"1",
"Salary1":69.43,
"Salary2":513.0,
"Date":"2022-06-09",
"Name":"john",
"employeeId":12,
"DateTime":"2022-09-0710:57:55"
},
{
"ID":"2",
"Salary1":691.43,
"Salary2":5123.0,
"Date":"2022-06-09",
"Name":"john",
"employeeId":12,
"DateTime":"2022-09-0710:57:55"
}
]
I want to change the above JSON to the below format.
[
{
"Date":"2022-06-09",
"Name":"john",
"DateTime":"2022-09-0710:57:55",
"employeeId":12,
"Results":[
{
"ID":1,
"Salary1":69.43,
"Salary2":513
},
{
"ID":"2",
"Salary1":691.43,
"Salary2":5123
}
]
}
]
Kindly let me know how we can achieve this in Python.
Original Dataframe:
ID  Salary1  Salary2  Date        Name  employeeId  DateTime
1   69.43    513.0    2022-06-09  john  12          2022-09-0710:57:55
2   691.43   5123.0   2022-06-09  john  12          2022-09-0710:57:55
Thank you.
As @Harsha pointed out, you can adapt one of the answers from another question, with just some minor tweaks to make it work for the OP's case:
(
    df.groupby(["Date", "Name", "DateTime", "employeeId"])[["ID", "Salary1", "Salary2"]]
    # to_dict(orient="records") returns a list of rows, where each row is a dict,
    # "oriented" like [{column -> value}, … , {column -> value}]
    .apply(lambda x: x.to_dict(orient="records"))
    # groupby produces a Series with the grouping columns as index and the lists of dicts as values.
    # This structure is no good for the next to_dict() call,
    # so here we create a new DataFrame out of the grouped Series,
    # with the Series' index levels as columns of the DataFrame,
    # renaming the Series' values to "Results" while we are at it.
    .reset_index(name="Results")
    # Finally we can achieve the desired structure with the last call to to_dict():
    .to_dict(orient="records")
)
# [{'Date': '2022-06-09', 'Name': 'john', 'DateTime': '2022-09-0710:57:55', 'employeeId': 12,
# 'Results': [
# {'ID': 1, 'Salary1': 69.43, 'Salary2': 513.0},
# {'ID': 2, 'Salary1': 691.43, 'Salary2': 5123.0}
# ]}]
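Since the question asks for JSON rather than a list of dicts, one possible last step is to call to_json on the grouped frame instead of to_dict. The sketch below rebuilds the sample frame from the question so it runs on its own; keeping ID as a string is an assumption based on the JSON shown in the question.

```python
import json
import pandas as pd

# Sample data copied from the question (ID kept as a string, as in its JSON).
df = pd.DataFrame({
    "ID": ["1", "2"],
    "Salary1": [69.43, 691.43],
    "Salary2": [513.0, 5123.0],
    "Date": ["2022-06-09", "2022-06-09"],
    "Name": ["john", "john"],
    "employeeId": [12, 12],
    "DateTime": ["2022-09-0710:57:55", "2022-09-0710:57:55"],
})

keys = ["Date", "Name", "DateTime", "employeeId"]
grouped = (
    df.groupby(keys)[["ID", "Salary1", "Salary2"]]
    .apply(lambda g: g.to_dict(orient="records"))  # one list of dicts per group
    .reset_index(name="Results")
)

# to_json serializes the nested lists of dicts in the "Results" column as well.
json_text = grouped.to_json(orient="records", indent=2)
parsed = json.loads(json_text)  # round-trip, only to inspect the structure
print(json_text)
```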

Excel to nested Json including child elements into array

I am trying to convert an Excel to nested JSON using Python where the repeated values go in as an array of elements.
Ex: structure of CSV
Manufacturer,oilType,viscosity
shell,superOil,1ova
shell,superOil,2ova
shell,normalOil,1ova
bp, power, 10bba
Should be displayed in JSON (expected output) as
elements: [
{
"Manufacturer": "shell",
"details": [
{
"OilType": "superOil",
"Viscosity": [
"1ova",
"2ova"
]
},
{
"OilType": "normalOil",
"Viscosity": [
"1ova"
]
}
]
},
{
"Manufacturer": "bp",
"details": [
{
"OilType": "power",
"Viscosity": [
"10bba"
]
}
]
}
]
I have currently converted the CSV into JSON using openpyxl and the values are displayed for each of the headers in format like (Current output)
[{Manufacturer: "shell", oilType: "superOil", Viscosity:"1ova"},{...},{...},...]
Please help in getting the expected output.
Hi and welcome to StackOverflow.
Your question actually has nothing to do with openpyxl, because you don't need to write an Excel file.
Instead, you can:
Load the CSV (or Excel file) into a pandas DataFrame
Group by Manufacturer and oil type
Build the structure you want from the groups
Serialize it to JSON (either a string or a file)
In practice, that gives something like this:
import json
import pandas as pd

df = pd.read_csv("oil.csv")  # or read_excel if this is an Excel file
oils = df.groupby(["Manufacturer", "oilType"]).aggregate(pd.Series.to_list)
elements = [
    {
        "Manufacturer": manufacturer,
        "Details": [
            {"OilType": o, "Viscosity": v}
            for o, v in data.droplevel(0).viscosity.items()
        ],
    }
    for manufacturer, data in oils.groupby(level="Manufacturer")
]
with open("oil.json", "w") as f:
    json.dump({"elements": elements}, f)
For information, oils would look like this:
                           viscosity
Manufacturer oilType
bp           power           [10bba]
shell        normalOil        [1ova]
             superOil   [1ova, 2ova]
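For comparison, here is a self-contained variant that builds the same nesting with two plain groupby loops. It is only a sketch under a few assumptions: the CSV text is inlined through io.StringIO instead of being read from a file, skipinitialspace=True is used because the sample's last row has spaces after the commas, and the inner key is spelled "details" to match the expected output in the question.

```python
import io
import json
import pandas as pd

# Sample rows from the question, inlined so the example runs on its own.
csv_text = """Manufacturer,oilType,viscosity
shell,superOil,1ova
shell,superOil,2ova
shell,normalOil,1ova
bp, power, 10bba
"""
# skipinitialspace strips the blanks that follow the commas in the last row.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

elements = []
# sort=False keeps manufacturers and oil types in order of first appearance.
for manufacturer, by_maker in df.groupby("Manufacturer", sort=False):
    details = [
        {"OilType": oil_type, "Viscosity": by_oil["viscosity"].tolist()}
        for oil_type, by_oil in by_maker.groupby("oilType", sort=False)
    ]
    elements.append({"Manufacturer": manufacturer, "details": details})

json_text = json.dumps({"elements": elements}, indent=2)
print(json_text)
```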

How to create a single json file from two DataFrames?

I have two DataFrames, and I want to post these DataFrames as json (to the web service) but first I have to concatenate them as json.
#first df
input_df = pd.DataFrame()
input_df['first'] = ['a', 'b']
input_df['second'] = [1, 2]
#second df
customer_df = pd.DataFrame()
customer_df['first'] = ['c']
customer_df['second'] = [3]
To convert to JSON, I used the following code for each DataFrame:
df.to_json(
    path_or_buf='out.json',
    orient='records',  # other options are 'split', 'index', 'columns', 'values', 'table'
    date_format='iso',
    force_ascii=False,
    default_handler=None,
    lines=False,
    indent=2
)
This code gives me output like the following; for example, the JSON exported from input_df:
[
{
"first":"a",
"second":1
},
{
"first":"b",
"second":2
}
]
My desired output looks like this:
{
"input": [
{
"first": "a",
"second": 1
},
{
"first": "b",
"second": 2
}
],
"customer": [
{
"first": "d",
"second": 3
}
]
}
How can I get this output? I couldn't find a way :(
You can concatenate the DataFrames with appropriate key names, then groupby the keys and build dictionaries at each group; finally build a json string from the entire thing:
out = (
    pd.concat([input_df, customer_df], keys=['input', 'customer'])
    .droplevel(1)
    .groupby(level=0).apply(lambda x: x.to_dict('records'))
    .to_json()
)
Output:
'{"customer":[{"first":"c","second":3}],"input":[{"first":"a","second":1},{"first":"b","second":2}]}'
To get a dict instead, replace the final to_json() with to_dict().
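If the key order matters, or the concat and groupby steps feel heavier than needed, a plain dictionary built from each frame's records may be enough. This is a sketch of that simpler route, not the approach above; the names payload and body are made up for the example.

```python
import json
import pandas as pd

input_df = pd.DataFrame({"first": ["a", "b"], "second": [1, 2]})
customer_df = pd.DataFrame({"first": ["c"], "second": [3]})

# Round-trip each frame through to_json so the records hold plain
# Python types that json.dumps can serialize.
payload = {
    "input": json.loads(input_df.to_json(orient="records")),
    "customer": json.loads(customer_df.to_json(orient="records")),
}

body = json.dumps(payload, indent=2)  # string to send to the web service
print(body)
```

Unlike the groupby version, this keeps "input" before "customer" because the dictionary preserves insertion order.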

Convert json dictionary to dataframe in Python

My API gives me a json file as output with the following structure:
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "PCJeremy",
"tags": {
"host": "001"
},
"columns": [
"time",
"memory"
],
"values": [
[
"2021-03-20T23:00:00Z",
1049911288
],
[
"2021-03-21T00:00:00Z",
1057692712
]
]
},
{
"name": "PCJohnny",
"tags": {
"host": "002"
},
"columns": [
"time",
"memory"
],
"values": [
[
"2021-03-20T23:00:00Z",
407896064
],
[
"2021-03-21T00:00:00Z",
406847488
]
]
}
]
}
]
}
I want to transform this output to a pandas dataframe so I can create some reports from it. I tried using the pd.DataFrame.from_dict method:
with open(fn) as f:
    data = json.load(f)
print(pd.DataFrame.from_dict(data))
But as a resulting set, I just get one column and one row with all the data back:
results
0 {'statement_id': 0, 'series': [{'name': 'Jerem...
The structure is just quite hard to understand for me as I am no professional. I would like to get a dataframe with 4 columns: name, host, time and memory with a row of data for every combination of values in the json file. Example:
name host time memory
JeremyPC 001 "2021-03-20T23:00:00Z" 1049911288
JeremyPC 001 "2021-03-21T00:00:00Z" 1049911288
Is this in any way possible? Thanks a lot in advance!
First, extract the data you are interested in from the JSON:
extracted_data = []
for series in data['results'][0]['series']:
    d = {}
    d['name'] = series['name']
    d['host'] = series['tags']['host']
    d['time'] = [value[0] for value in series['values']]
    d['memory'] = [value[1] for value in series['values']]
    extracted_data.append(d)
df = pd.DataFrame(extracted_data)
# print(df)
name host time memory
0 PCJeremy 001 [2021-03-20T23:00:00Z, 2021-03-21T00:00:00Z] [1049911288, 1057692712]
1 PCJohnny 002 [2021-03-20T23:00:00Z, 2021-03-21T00:00:00Z] [407896064, 406847488]
Second, explode multiple columns into rows
df1 = pd.concat([df.explode('time')['time'], df.explode('memory')['memory']], axis=1)
df_ = df.drop(['time','memory'], axis=1).join(df1).reset_index(drop=True)
# print(df_)
name host time memory
0 PCJeremy 001 2021-03-20T23:00:00Z 1049911288
1 PCJeremy 001 2021-03-21T00:00:00Z 1057692712
2 PCJohnny 002 2021-03-20T23:00:00Z 407896064
3 PCJohnny 002 2021-03-21T00:00:00Z 406847488
By constructing the dicts carefully, this can also be done without exploding:
extracted_data = []
for series in data['results'][0]['series']:
    d = {}
    d['name'] = series['name']
    d['host'] = series['tags']['host']
    for values in series['values']:
        d_ = d.copy()
        for column, value in zip(series['columns'], values):
            d_[column] = value
        extracted_data.append(d_)
df = pd.DataFrame(extracted_data)
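Another way to avoid explode is to build one small DataFrame per series, using the series' own "columns" list as headers, and concatenate them. The sketch below inlines the sample data from the question so it is runnable; it also loops over every entry of "results" rather than only the first, which is an assumption about what the API may return.

```python
import pandas as pd

# Sample data from the question, inlined for a runnable example.
data = {"results": [{"statement_id": 0, "series": [
    {"name": "PCJeremy", "tags": {"host": "001"}, "columns": ["time", "memory"],
     "values": [["2021-03-20T23:00:00Z", 1049911288],
                ["2021-03-21T00:00:00Z", 1057692712]]},
    {"name": "PCJohnny", "tags": {"host": "002"}, "columns": ["time", "memory"],
     "values": [["2021-03-20T23:00:00Z", 407896064],
                ["2021-03-21T00:00:00Z", 406847488]]},
]}]}

frames = [
    # Each series already carries its own column names and row values.
    pd.DataFrame(series["values"], columns=series["columns"])
    .assign(name=series["name"], host=series["tags"]["host"])
    for result in data["results"]
    for series in result["series"]
]
df = pd.concat(frames, ignore_index=True)[["name", "host", "time", "memory"]]
print(df)
```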
You could use jmespath to extract the data; it is quite a handy tool for such nested JSON. You can read the docs for more details; I will summarize the basics: to access a key, use a dot; to access values in a list, use []. Combining these two lets you traverse the JSON paths. There are more tools, but these basics should get you started.
Your json is wrapped in a data variable:
data
{'results': [{'statement_id': 0,
'series': [{'name': 'PCJeremy',
'tags': {'host': '001'},
'columns': ['time', 'memory'],
'values': [['2021-03-20T23:00:00Z', 1049911288],
['2021-03-21T00:00:00Z', 1057692712]]},
{'name': 'PCJohnny',
'tags': {'host': '002'},
'columns': ['time', 'memory'],
'values': [['2021-03-20T23:00:00Z', 407896064],
['2021-03-21T00:00:00Z', 406847488]]}]}]}
Let's create an expression to parse the json, and get the specific values:
expression = """{name: results[].series[].name,
host: results[].series[].tags.host,
time: results[].series[].values[*][0],
memory: results[].series[].values[*][-1]}
"""
Apply the expression to the JSON data:
expression = jmespath.compile(expression).search(data)
expression
{'name': ['PCJeremy', 'PCJohnny'],
'host': ['001', '002'],
'time': [['2021-03-20T23:00:00Z', '2021-03-21T00:00:00Z'],
['2021-03-20T23:00:00Z', '2021-03-21T00:00:00Z']],
'memory': [[1049911288, 1057692712], [407896064, 406847488]]}
Note that time and memory are nested lists that match the values in data.
Create the dataframe and explode the relevant columns:
pd.DataFrame(expression).apply(pd.Series.explode)
name host time memory
0 PCJeremy 001 2021-03-20T23:00:00Z 1049911288
0 PCJeremy 001 2021-03-21T00:00:00Z 1057692712
1 PCJohnny 002 2021-03-20T23:00:00Z 407896064
1 PCJohnny 002 2021-03-21T00:00:00Z 406847488

DataFrame groupby a column which has dictionary values

I have a dataframe in which one column holds dictionaries, and I need to group by values inside those dictionaries. For example,
import pandas as pd
data = [
{
"name":"xx",
"values":{
"element":[
{
"path":"path1/id1"
},
{
"path":"path2/id1"
}
],
"nonrequired":[
{}
]
}
},
{
"name":"yy",
"values":{
"element":[
{
"path":"path1/id2"
},
{
"path":"path2/id2"
}
],
"nonrequired":[
{}
]
}
}
]
df = pd.DataFrame(data)
What I'm looking for:
I want to group by a specific key inside the column "values".
The grouping should be values->element->path
The grouping should be based on the partial path values. For example, if path="path1/id2", the grouping should be based on path="path1"
After grouping I need to extract the result as a dictionary.
Expected result:
result = {
'path1': [
{
"name":'xx',
"renamecolumn":['id1','id2']
}
],
'path2': [
{
"name":'yy',
"renamecolumn":['id1','id2']
}
]
}
Still not 100% sure of the logic of the final dictionary creation as the example input and output don't quite match up. However, here is how you can extract the values and you can create your desired dictionary from there.
# extract the paths and split them on the forward slash
df['split'] = df['values'].apply(lambda x: [item['path'].split('/') for item in x['element']])
# generate the path and ids columns
df['path'] = df['split'].apply(lambda x: [x[i][0] for i in range(0,len(x))])
df['ids'] = df['split'].apply(lambda x: [x[i][1] for i in range(0,len(x))])
# expand the lists into rows and drop the duplicate combinations
result = df.drop(['values', 'split'], axis=1) \
    .explode('ids').explode('path').drop_duplicates()
Result is:
name path ids
0 xx path1 id1
0 xx path2 id1
1 yy path1 id2
1 yy path2 id2
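From a flat table like this, a grouped dictionary can be built with two nested groupby loops. Because the expected output in the question does not quite match its input, the sketch below is only one possible interpretation: keys are the path prefixes, and each entry lists a name with the ids found under that prefix. It flattens the input with a plain loop rather than explode, and the key name "ids" is a placeholder for whatever the final column should be called.

```python
import pandas as pd

# Sample data from the question, condensed.
data = [
    {"name": "xx", "values": {"element": [{"path": "path1/id1"}, {"path": "path2/id1"}],
                              "nonrequired": [{}]}},
    {"name": "yy", "values": {"element": [{"path": "path1/id2"}, {"path": "path2/id2"}],
                              "nonrequired": [{}]}},
]

# One row per element: name, path prefix, and the id after the slash.
rows = []
for record in data:
    for element in record["values"]["element"]:
        path, item_id = element["path"].split("/", 1)
        rows.append({"name": record["name"], "path": path, "ids": item_id})
flat = pd.DataFrame(rows)

# Group by path prefix first, then by name within each prefix.
result = {
    path: [
        {"name": name, "ids": by_name["ids"].tolist()}
        for name, by_name in by_path.groupby("name", sort=False)
    ]
    for path, by_path in flat.groupby("path", sort=False)
}
print(result)
```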
