Python Pandas - Convert dataframe into json - python

I have this pandas.dataframe:
date. pid value interval
0 2021-09-05 00:04:24 1 5.554 2021-09-05 00:00:00
1 2021-09-05 00:06:38 1 4.359 2021-09-05 00:05:00
2 2021-09-05 00:06:46 1 18.364 2021-09-05 00:05:00
3 2021-09-05 00:04:24 2 15.554 2021-09-05 00:00:00
4 2021-09-05 00:06:38 2 3.359 2021-09-05 00:05:00
5 2021-09-05 00:06:46 2 10.364 2021-09-05 00:05:00
which I want to turn it into JSON like this:
{
"2021-09-05 00:00:00": {
"pid1": [
{
"date": "2021-09-05 00:04:24",
"pid": 1,
"value": 5.554,
},
],
"pid2": [
{
"date": "2021-09-05 00:04:24",
"pid": 2,
"value": 15.554,
}
],
},
"2021-09-05 00:05:00": {
"pid1": [
{
"date": "2021-09-05 00:04:24",
"pid": 1,
"value": 4.359,
},
{
"date": "2021-09-05 00:04:24",
"pid": 1,
"value": 18.364,
},
],
"pid2": [
{
"date": "2021-09-05 00:06:38",
"pid": 2,
"value": 3.359,
},{
"date": "2021-09-05 00:06:46",
"pid": 1,
"value": 10.364,
},
],
}
}
Basically I want the group the data by the interval value.
Is there a quick way to format this?

Create helper column with pid, convert to MultiIndex Series and last crate nested dictionary:
s = (df.assign(new = 'pid' + df['pid'].astype(str))
.groupby(['interval','new'])[['date','pid','value']]
.apply(lambda x : x.to_dict(orient= 'records')))
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
print (d)
{
'2021-09-05 00:00:00': {
'pid1': [{
'date': '2021-09-05 00:04:24',
'pid': 1,
'value': 5.554
}],
'pid2': [{
'date': '2021-09-05 00:04:24',
'pid': 2,
'value': 15.554
}]
},
'2021-09-05 00:05:00': {
'pid1': [{
'date': '2021-09-05 00:06:38',
'pid': 1,
'value': 4.359
},
{
'date': '2021-09-05 00:06:46',
'pid': 1,
'value': 18.364
}
],
'pid2': [{
'date': '2021-09-05 00:06:38',
'pid': 2,
'value': 3.359
},
{
'date': '2021-09-05 00:06:46',
'pid': 2,
'value': 10.364
}
]
}
}
Last for json use:
import json
json = json.dumps(d)

Related

Parse JSON from Api to pandas dataframe with doubled names

I'm parsing a json and I don't understand how to correctly decompose it into a dataframe.
Json structure i have (api response):
{
"result": {
"data": [],
"totals": [
0
]
},
"timestamp": "2021-11-25 15:19:21"
}
response_ =
{
"result":{
"data":[
{
"dimensions":[
{
"id":"2023-01-10",
"name":""
},
{
"id":"123",
"name":"good3"
}
],
"metrics":[
10,
20,
30,
40
]
},
{
"dimensions":[
{
"id":"2023-01-10",
"name":""
},
{
"id":"234",
"name":"good2"
}
],
"metrics":[
1,
2,
3,
4
]
}
],
"totals":[
11,
22,
33,
44
]
},
"timestamp":"2023-02-07 12:58:40"
}
I don't need "timestamp" and "totals" - just "data". So i do:
...
response_ = requests.post(url, headers=head, data=body)
datas = response_.json()
datas_ = datas['result']['data']
df1 = pd.json_normalize(datas_)
I got:
dimensions
metrics
0
[{'id': '2023-01-10', 'name': ''}, {'id': '123', 'name': 'good1'}]
[10, 20, 30, 40]
1
[{'id': '2023-01-10', 'name': ''}, {'id': '234', 'name': 'good2'}]
[1, 2, 3, 4]
But i need dataframe like:
id_
name_
id
name
metric1
metric2
metric3
metric4
0
2023-01-10
123
good1
10
20
30
40
1
2023-01-10
234
good2
1
2
3
4
When i try like:
df1 = pd.json_normalize(datas_, 'dimensions')
i get all id's and name's in one column.
Explain step by step if possible. Thank you.
Try:
response = {
"result": {
"data": [
{
"dimensions": [
{"id": "2023-01-10", "name": ""},
{"id": "123", "name": "good3"},
],
"metrics": [10, 20, 30, 40],
},
{
"dimensions": [
{"id": "2023-01-10", "name": ""},
{"id": "234", "name": "good2"},
],
"metrics": [1, 2, 3, 4],
},
],
"totals": [11, 22, 33, 44],
},
"timestamp": "2023-02-07 12:58:40",
}
tmp = [
{
**{f"{k}_": v for k, v in d["dimensions"][0].items()},
**{k: v for k, v in d["dimensions"][1].items()},
**{f'metric{i}':m for i, m in enumerate(d['metrics'], 1)}
}
for d in response["result"]["data"]
]
df = pd.DataFrame(tmp)
print(df)
Prints:
id_ name_ id name metric1 metric2 metric3 metric4
0 2023-01-10 123 good3 10 20 30 40
1 2023-01-10 234 good2 1 2 3 4

Convert a MultiIndex pandas DataFrame to a nested JSON

I have the following Dataframe with MultiIndex rows in pandas.
time available_slots status
month day
1 1 10:00:00 1 AVAILABLE
1 12:00:00 1 AVAILABLE
1 14:00:00 1 AVAILABLE
1 16:00:00 1 AVAILABLE
1 18:00:00 1 AVAILABLE
2 10:00:00 1 AVAILABLE
... ... ... ...
2 28 12:00:00 1 AVAILABLE
28 14:00:00 1 AVAILABLE
28 16:00:00 1 AVAILABLE
28 18:00:00 1 AVAILABLE
28 20:00:00 1 AVAILABLE
And I need to transform it to a hierarchical nested JSON as this:
[
{
"month": 1,
"days": [
{
"day": 1,
"slots": [
{
"time": "10:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "12:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
...
]
},
{
"day": 2,
"slots": [
...
]
}
]
},
{
"month": 2,
"days":[
{
"day": 1,
"slots": [
...
]
}
]
},
...
]
Unfortunately, it is not as easy as doing df.to_json(orient="index").
Does anyone know if there is a method in pandas to perform this kind of transformations? or in what way I could iterate over the DataFrame to build the final object?
Here's one way. Basically repeated groupby + apply(to_dict) + reset_index until we get the desired shape:
out = (df.groupby(level=[0,1])
.apply(lambda x: x.to_dict('records'))
.reset_index()
.rename(columns={0:'slots'})
.groupby('month')
.apply(lambda x: x[['day','slots']].to_dict('records'))
.reset_index()
.rename(columns={0:'days'})
.to_json(orient='records', indent=True)
)
Output:
[
{
"month":1,
"days":[
{
"day":1,
"slots":[
{
"time":"10:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"12:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"14:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"16:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"18:00:00",
"available_slots":1,
"status":"AVAILABLE"
}
]
},
{
"day":2,
"slots":[
{
"time":"10:00:00",
"available_slots":1,
"status":"AVAILABLE"
}
]
}
]
},
{
"month":2,
"days":[
{
"day":28,
"slots":[
{
"time":"12:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"14:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"16:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"18:00:00",
"available_slots":1,
"status":"AVAILABLE"
},
{
"time":"20:00:00",
"available_slots":1,
"status":"AVAILABLE"
}
]
}
]
}
]
You can use a double loop for each level of your index:
data = []
for month, df1 in df.groupby(level=0):
data.append({'month': month, 'days': []})
for day, df2 in df1.groupby(level=1):
data[-1]['days'].append({'day': day, 'slots': df2.to_dict('records')})
Output:
import json
print(json.dumps(data, indent=2))
[
{
"month": 1,
"days": [
{
"day": 1,
"slots": [
{
"time": "10:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "12:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "14:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "16:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "18:00:00",
"available_slots": 1,
"status": "AVAILABLE"
}
]
},
{
"day": 2,
"slots": [
{
"time": "10:00:00",
"available_slots": 1,
"status": "AVAILABLE"
}
]
}
]
},
{
"month": 2,
"days": [
{
"day": 28,
"slots": [
{
"time": "12:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "14:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "18:00:00",
"available_slots": 1,
"status": "AVAILABLE"
},
{
"time": "20:00:00",
"available_slots": 1,
"status": "AVAILABLE"
}
]
}
]
}
]

How to get the individual count of field from Elasticsearch

My content inside a dictionary is below
test=
[ { 'masterid': '1', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Accounting', 'parentname': 'Finance'}, { 'id': '3', 'name': 'Research', 'parentname': 'R & D' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }] },
{ 'masterid': '2', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Research', 'parentname': '' }, { 'id': '3', 'name': 'Accounting', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Tester' }, { 'id': '5033', 'name': 'Developer' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]},
{ 'masterid': '3', 'name': 'Group1', 'BusinessArea': [ { 'id': '14', 'name': 'Engineering' }, { 'id': '3', 'name': 'Engineering', 'parentname': '' } ], 'Role': [ { 'id': '5032', 'name': 'Developer' }, { 'id': '5033', 'name': 'Developer', 'parentname': '' } ], 'Designation': [ { 'id': '16', 'name': 'L1' }, { 'id': '20', 'name': 'L2' }, { 'id': '25', 'name': 'L2' }]}]
Code is below to put into elastic search index
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index='new')
for e in test:
es.index(index="new", body=e, id=e['id'])
I want to get the count of masterid of BusinessArea which is all the names
Here it is Accounting, Research Engineering
[ {
"name": "BusinessArea",
"values": [
{
"name": "Accounting",
"count": "2"
},
{
"name": "Research",
"count": "2"
},
{
"name": "Engineering",
"count": "1"
}]
}]
or can i have answer like below
{
"A": {
"Designation": [{
"key": "L1",
"doc_count": 3
},
{
"key": "L2",
"doc_count": 3
}
]
},
{
"B": {
"BusinessArea": [{
"key": "Accounting",
"doc_count": 2
},
{
"key": "Research",
"doc_count": 2
},
{
"key": "Engineering",
"doc_count": 1
}
]
}
}
If you want to get the individual count of the field you can use the terms aggregation that is a multi-bucket value source-based aggregation where buckets are dynamically built - one per unique value.
Search Query:
{
"size":0,
"aggs": {
"countNames": {
"terms": {
"field": "BusinessArea.name.keyword"
}
}
}
}
Search Result:
"aggregations": {
"countNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Accounting",
"doc_count": 2
},
{
"key": "Research",
"doc_count": 2
},
{
"key": "Engineering",
"doc_count": 1
}
]
}
Update 1:
If you want to have an individual count of the field for Designation as well as BusinessArea
Search Query:
{
"size": 0,
"aggs": {
"countNames": {
"terms": {
"field": "BusinessArea.name.keyword"
}
},
"designationNames": {
"terms": {
"field": "Designation.name.keyword"
}
}
}
}
Search Result:
"aggregations": {
"designationNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "L1",
"doc_count": 3
},
{
"key": "L2",
"doc_count": 3
}
]
},
"countNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Accounting",
"doc_count": 2
},
{
"key": "Research",
"doc_count": 2
},
{
"key": "Engineering",
"doc_count": 1
}
]
}
You can simply use the count API of elasticsearch to get the count of All the documents in the elasticsearch index or based on a condition as shown in the same doc.
For your case, it should be like
GET /<your-index-name>/_count?q=name:BusinessArea
Or, if masterid is the Unique-id in your document, you can simply use
GET /<your-index-name>/_count

Adding new pairs to a json file

I have a json file I need to add pairs to, I convert it into a dict, but now I need to put my new values in a specific place.
This is some of the json file I convert:
"rootObject": {
"id": "6ff0010c-00fe-485b-b695-4ddd6aca4dcd",
"type": "IDO_GEAR",
"children": [
{
"id": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab",
"type": "IDO_SYSTEM_LOADCASE",
"children": [],
"childList": "SYSTEMLOADCASE",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab"
},
{
"name": "IDCO_DESIGNATION",
"value": "Lastfall 1"
},
{
"name": "IDSLC_TIME_PORTION",
"value": 100
},
{
"name": "IDSLC_DISTANCE_PORTION",
"value": 100
},
{
"name": "IDSLC_OPERATING_TIME_IN_HOURS",
"value": 1
},
{
"name": "IDSLC_OPERATING_TIME_IN_SECONDS",
"value": 3600
},
{
"name": "IDSLC_OPERATING_REVOLUTIONS",
"value": 1
},
{
"name": "IDSLC_OPERATING_DISTANCE",
"value": 1
},
{
"name": "IDSLC_ACCELERATION",
"value": 9.81
},
{
"name": "IDSLC_EPSILON_X",
"value": 0
},
{
"name": "IDSLC_EPSILON_Y",
"value": 0
},
{
"name": "IDSLC_EPSILON_Z",
"value": 0
},
{
"name": "IDSLC_CALCULATION_WITH_OWN_WEIGHT",
"value": "CO_CALCULATION_WITHOUT_OWN_WEIGHT"
},
{
"name": "IDSLC_CALCULATION_WITH_TEMPERATURE",
"value": "CO_CALCULATION_WITH_TEMPERATURE"
},
{
"name": "IDSLC_FLAG_FOR_LOADCASE_CALCULATION",
"value": "LB_CALCULATE_LOADCASE"
},
{
"name": "IDSLC_STATUS_OF_LOADCASE_CALCULATION",
"value": false
}
I want to add somthing like ENTRY_ONE and ENTRY_TWO like this:
"rootObject": {
"id": "6ff0010c-00fe-485b-b695-4ddd6aca4dcd",
"type": "IDO_GEAR",
"children": [
{
"id": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab",
"type": "IDO_SYSTEM_LOADCASE",
"children": [],
"childList": "SYSTEMLOADCASE",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab"
},
{
"name": "IDCO_DESIGNATION",
"value": "Lastfall 1"
},
{
"name": "IDSLC_TIME_PORTION",
"value": 100
},
{
"name": "IDSLC_DISTANCE_PORTION",
"value": 100
},
{
"name": "ENTRY_ONE",
"value": 100
},
{
"name": "ENTRY_TWO",
"value": 55
},
{
"name": "IDSLC_OPERATING_TIME_IN_HOURS",
"value": 1
},
{
"name": "IDSLC_OPERATING_TIME_IN_SECONDS",
"value": 3600
},
{
"name": "IDSLC_OPERATING_REVOLUTIONS",
"value": 1
},
{
"name": "IDSLC_OPERATING_DISTANCE",
"value": 1
},
{
"name": "IDSLC_ACCELERATION",
"value": 9.81
},
{
"name": "IDSLC_EPSILON_X",
"value": 0
},
{
"name": "IDSLC_EPSILON_Y",
"value": 0
},
{
"name": "IDSLC_EPSILON_Z",
"value": 0
},
{
"name": "IDSLC_CALCULATION_WITH_OWN_WEIGHT",
"value": "CO_CALCULATION_WITHOUT_OWN_WEIGHT"
},
{
"name": "IDSLC_CALCULATION_WITH_TEMPERATURE",
"value": "CO_CALCULATION_WITH_TEMPERATURE"
},
{
"name": "IDSLC_FLAG_FOR_LOADCASE_CALCULATION",
"value": "LB_CALCULATE_LOADCASE"
},
{
"name": "IDSLC_STATUS_OF_LOADCASE_CALCULATION",
"value": false
}
How can I add the entries so that they are under the IDCO_IDENTIFICATION tag, and not under the rootObject?
The way I see your json file, it WOULD be under rootObject as EVERYTHING is under that key. There's quite a few closing brackets and braces missing.
So I can only assume you are meaning you want it directly under IDCO_IDENTIFICATION (which is nested under rootObject). But that doesn't match what you have as your example output either. You have the new ENTRY_ONE and ENTRY_TWO within the properties, within the children, within the rootObject, not "under" IDCO_IDENTIFICATION. So I'm going to follow what you are asking for from your example output.
import json
with open('C:/test.json') as f:
data = json.load(f)
new_elements = [{"name":"ENTRY_ONE", "value":100},
{"name":"ENTRY_TWO", "value":55}]
for each in new_elements:
data['rootObject']['children'][0]['properties'].append(each)
with open('C:/test.json', 'w') as f:
json.dump(data, f)
Output:
import pprint
pprint.pprint(data)
{'rootObject': {'children': [{'childList': 'SYSTEMLOADCASE',
'children': [],
'id': '1dd94d1a-e52d-40b3-a82b-6db02a8fbbab',
'properties': [{'name': 'IDCO_IDENTIFICATION',
'value': '1dd94d1a-e52d-40b3-a82b-6db02a8fbbab'},
{'name': 'IDCO_DESIGNATION',
'value': 'Lastfall 1'},
{'name': 'IDSLC_TIME_PORTION',
'value': 100},
{'name': 'IDSLC_DISTANCE_PORTION',
'value': 100},
{'name': 'IDSLC_OPERATING_TIME_IN_HOURS',
'value': 1},
{'name': 'IDSLC_OPERATING_TIME_IN_SECONDS',
'value': 3600},
{'name': 'IDSLC_OPERATING_REVOLUTIONS',
'value': 1},
{'name': 'IDSLC_OPERATING_DISTANCE',
'value': 1},
{'name': 'IDSLC_ACCELERATION',
'value': 9.81},
{'name': 'IDSLC_EPSILON_X',
'value': 0},
{'name': 'IDSLC_EPSILON_Y',
'value': 0},
{'name': 'IDSLC_EPSILON_Z',
'value': 0},
{'name': 'IDSLC_CALCULATION_WITH_OWN_WEIGHT',
'value': 'CO_CALCULATION_WITHOUT_OWN_WEIGHT'},
{'name': 'IDSLC_CALCULATION_WITH_TEMPERATURE',
'value': 'CO_CALCULATION_WITH_TEMPERATURE'},
{'name': 'IDSLC_FLAG_FOR_LOADCASE_CALCULATION',
'value': 'LB_CALCULATE_LOADCASE'},
{'name': 'IDSLC_STATUS_OF_LOADCASE_CALCULATION',
'value': False},
{'name': 'ENTRY_ONE',
'value': 100},
{'name': 'ENTRY_TWO',
'value': 55}],
'type': 'IDO_SYSTEM_LOADCASE'}],
'id': '6ff0010c-00fe-485b-b695-4ddd6aca4dcd',
'type': 'IDO_GEAR'}}

Convert Pandas Dataframe to nested dictionary

I am trying to convert a dataframe to a nested dictionary but no success so far.
Dataframe: clean_data['Model', 'Problem', 'Size']
Here's how my data looks like:
Model Problem Size
lenovo a6020 screen broken 1
lenovo a6020a40 battery 60
bluetooth 60
buttons 60
lenovo k4 wi-fi 3
bluetooth 3
My desired output:
{
"name": "Brand",
"children": [
{
"name": "Lenovo",
"children": [
{
"name": "lenovo a6020",
"children": {
"name": "screen broken",
"size": 1
}
},
{
"name": "lenovo a6020a40",
"children": [
{
"name": "battery",
"size": 60
},
{
"name": "bluetooth",
"size": 60
},
{
"name": "buttons",
"size": 60
}
]
},
{
"name": "lenovo k4",
"children": [
{
"name": "wi-fi",
"size": 3
},
{
"name": "bluetooth",
"size": 3
}
]
}
]
}
]
}
I have tried pandas.DataFrame.to_dict method But it is returning a simple dictionary but I want it like the one mentioned above.
Use:
print (df)
Model Problem size
0 lenovo a6020 screen broken 1
1 lenovo a6020a40 battery 60
2 NaN bluetooth 60
3 NaN buttons 60
4 lenovo k4 wi-fi 3
5 NaN bluetooth 3
#repalce missing values by forward filling
df = df.ffill()
#split Model column by first whitesapces to 2 columns
df[['a','b']] = df['Model'].str.split(n=1, expand=True)
#each level convert to list of dictionaries
#for correct keys use rename
L = (df.rename(columns={'Problem':'name'})
.groupby(['a','b'])['name','size']
.apply(lambda x: x.to_dict('r'))
.rename('children')
.reset_index()
.rename(columns={'b':'name'})
.groupby('a')['name','children']
.apply(lambda x: x.to_dict('r'))
.rename('children')
.reset_index()
.rename(columns={'a':'name'})
.to_dict('r')
)
#print (L)
#create outer level by contructor
d = { "name": "Brand", "children": L}
print (d)
{
'name': 'Brand',
'children': [{
'name': 'lenovo',
'children': [{
'name': 'a6020',
'children': [{
'name': 'screen broken',
'size': 1
}]
}, {
'name': 'a6020a40',
'children': [{
'name': 'battery',
'size': 60
}, {
'name': 'bluetooth',
'size': 60
}, {
'name': 'buttons',
'size': 60
}]
}, {
'name': 'k4',
'children': [{
'name': 'wi-fi',
'size': 3
}, {
'name': 'bluetooth',
'size': 3
}]
}]
}]
}

Categories