Multiple Nested Dictionaries with Pandas - python

Is there a way to import this kind of JSON response into Pandas? I've been trying to get it usable with json_normalize, but I can't seem to get more than one level to work at a time (I can get notes but can't bring in custom_fields). I also cannot figure out how to call out something like ['reporter']['name'] (which should be jdoe). This is from Mantis, and it's the JSON output of a requests response. I'm now wondering if it needs to be broken up into multiple frames and put back together, or whether I should use a for loop and put the data I want into a better format for pandas to import.
In my head each item should be a column in the series, all tied to the id column, like this:
id | summary | project.name | reporter.name ..|.. custom.fields.Project_Stage | ... notes1.reporter.name | notes1.text ... notes2.reporter.name | notes2.text
{
"issues": [
{
"id": 1234,
"summary": "Some text",
"project": {
"id": 1,
"name": "North America"
},
"category": {
"id": 11,
"name": "Retail"
},
"reporter": {
"id": 1099,
"name": "jdoe"
},
"custom_fields": [
{
"field": {
"id": 107,
"name": "Product Escalations"
},
"value": ""
},
{
"field": {
"id": 1,
"name": "Project_Stage"
},
"value": "Pending"
}
],
"notes": [
{
"id": 214288,
"reporter": {
"id": 9999,
"name": "jdoe"
},
"text": "Worked with Mark over e-mail",
"view_state": {
"id": 10,
"name": "public",
"label": "public"
},
"type": "note",
"created_at": "2020-12-04T15:55:02-08:00",
"updated_at": "2020-12-04T15:55:02-08:00"
},
{
"id": 214289,
"reporter": {
"id": 9999,
"name": "jdoe"
},
"text": "I attempted on numerous occasions to setup a meeting with him to set it up for him.",
"view_state": {
"id": 10,
"name": "public",
"label": "public"
},
"type": "note",
"created_at": "2020-12-04T15:57:02-08:00",
"updated_at": "2020-12-04T15:57:02-08:00"
}
]
}
]
}
Here is what the DF would look like in my head. All the data for one ticket on one line/series.

Those structures with many lists at the same level are tricky. Try flatten_json. https://github.com/amirziai/flatten
If your response is called 'dic', you can use this:
import pandas as pd
from flatten_json import flatten

# Flatten each issue into a single flat dict, joining nested keys with '.'
dic_flattened = (flatten(d, '.') for d in dic['issues'])
df = pd.DataFrame(dic_flattened)
Output
id summary project.id project.name category.id category.name ... notes.1.view_state.id notes.1.view_state.name notes.1.view_state.label notes.1.type notes.1.created_at notes.1.updated_at
0 1234 Some text 1 North America 11 Retail ... 10 public public note 2020-12-04T15:57:02-08:00 2020-12-04T15:57:02-08:00
In [101]: df.columns
Out[101]:
Index(['id', 'summary', 'project.id', 'project.name', 'category.id',
'category.name', 'reporter.id', 'reporter.name',
'custom_fields.0.field.id', 'custom_fields.0.field.name',
'custom_fields.0.value', 'custom_fields.1.field.id',
'custom_fields.1.field.name', 'custom_fields.1.value', 'notes.0.id',
'notes.0.reporter.id', 'notes.0.reporter.name', 'notes.0.text',
'notes.0.view_state.id', 'notes.0.view_state.name',
'notes.0.view_state.label', 'notes.0.type', 'notes.0.created_at',
'notes.0.updated_at', 'notes.1.id', 'notes.1.reporter.id',
'notes.1.reporter.name', 'notes.1.text', 'notes.1.view_state.id',
'notes.1.view_state.name', 'notes.1.view_state.label', 'notes.1.type',
'notes.1.created_at', 'notes.1.updated_at'],
dtype='object')
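Since the question specifically asks about json_normalize, here is a rough sketch of a pandas-only alternative (assuming the parsed response is a dict called dic, as above). Instead of one very wide row per issue, it produces a long frame with one row per note, which can be easier to query:
import pandas as pd

# One row per issue; nested dicts such as project/reporter are flattened,
# while list fields (custom_fields, notes) remain list-valued columns.
issue_df = pd.json_normalize(dic['issues'], sep='.')

# One row per note, carrying the parent issue id along as "issue.id"
# so the two frames can be joined back together if needed.
note_df = pd.json_normalize(dic['issues'], record_path='notes',
                            meta=['id'], meta_prefix='issue.', sep='.')
Whether the wide (flatten_json) or long (record_path) shape is better depends on whether you mostly look at issues or at individual notes.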

Related

Using pandas to convert csv into nested json with dynamic structure

I am new to Python and now want to convert a CSV file into a JSON file. Basically, the JSON file is nested with a dynamic structure; the structure is defined by the CSV header.
From csv input:
ID, Name, person_id/id_type, person_id/id_value,person_id_expiry_date,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value,salary_info/details/0/grade,salary_info/details/0/payment,salary_info/details/0/amount,salary_info/details/1/next_promotion
1,Peter,PASSPORT,A452817,1-01-2055,Age,19,Gender,M,Manager,Monthly,8956.23,unknown
2,Jane,PASSPORT,B859804,2-01-2035,Age,38,Gender,F,Worker, Monthly,125980.1,unknown
To json output:
[
{
"ID": 1,
"Name": "Peter",
"person_id": {
"id_type": "PASSPORT",
"id_value": "A452817"
},
"person_id_expiry_date": "1-01-2055",
"additional_info": [
{
"name": "Age",
"value": 19
},
{
"name": "Gender",
"value": "M"
}
],
"salary_info": {
"details": [
{
"grade": "Manager",
"payment": "Monthly",
"amount": 8956.23
},
{
"next_promotion": "unknown"
}
]
}
},
{
"ID": 2,
"Name": "Jane",
"person_id": {
"id_type": "PASSPORT",
"id_value": "B859804"
},
"person_id_expiry_date": "2-01-2035",
"additional_info": [
{
"name": "Age",
"value": 38
},
{
"name": "Gender",
"value": "F"
}
],
"salary_info": {
"details": [
{
"grade": "Worker",
"payment": " Monthly",
"amount": 125980.1
},
{
"next_promotion": "unknown"
}
]
}
}
]
Is this something that can be done with the existing pandas API, or do I have to write a lot of complex code to dynamically construct the JSON object? Thanks.
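There is no answer shown here, and pandas alone does not build this kind of ragged nesting directly; a rough pure-Python sketch of the idea (assuming the header convention from the sample, where '/' separates nesting levels and a purely numeric segment means a list index) could look like this:
import csv
import json

def insert(container, keys, value):
    # Walk the '/'-separated key path, creating dicts/lists as needed.
    key, rest = keys[0], keys[1:]
    key = int(key) if key.isdigit() else key
    if not rest:
        container[key] = value
        return
    child = [] if rest[0].isdigit() else {}
    if isinstance(key, int):
        while len(container) <= key:        # grow the list up to this index
            container.append(type(child)())
        insert(container[key], rest, value)
    else:
        container.setdefault(key, child)
        insert(container[key], rest, value)

rows = []
with open('input.csv', newline='') as f:
    for record in csv.DictReader(f, skipinitialspace=True):
        row = {}
        for header, value in record.items():
            insert(row, header.strip().split('/'), value)
        rows.append(row)

print(json.dumps(rows, indent=2))
Note that all values come out as strings (ID, Age and amount would need an extra casting step), and 'input.csv' is just a placeholder name.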

3 levels json count in python

I am new at Python; I've worked with other languages. I made this code in Java and it works, but now I must do it in Python. I have a JSON of 3 levels; the first two are usages and resources, and I want to count the names on the third level. I've seen several examples but I can't get it done.
import json
data = {
"startDate": "2019-06-23T16:07:21.205Z",
"endDate": "2019-07-24T16:07:21.205Z",
"status": "Complete",
"usages": [
{
"name": "PureCloud Edge Virtual Usage",
"resources": [
{
"name": "Edge01-VM-GNS-DemoSite01 (1f279086-a6be-4a21-ab7a-2bb1ae703fa0)",
"date": "2019-07-24T09:00:28.034Z"
},
{
"name": "329ad5ae-e3a3-4371-9684-13dcb6542e11",
"date": "2019-07-24T09:00:28.034Z"
},
{
"name": "e5796741-bd63-4b8e-9837-4afb95bb0c09",
"date": "2019-07-24T09:00:28.034Z"
}
]
},
{
"name": "PureCloud for SmartVideo Add-On Concurrent",
"resources": [
{
"name": "jpizarro#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "jaguilera#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "dcortes#gns.com.co",
"date": "2019-07-15T15:06:09.203Z"
}
]
},
{
"name": "PureCloud 3 Concurrent User Usage",
"resources": [
{
"name": "jpizarro#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "jaguilera#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "dcortes#gns.com.co",
"date": "2019-07-15T15:06:09.203Z"
}
]
},
{
"name": "PureCloud Skype for Business WebSDK",
"resources": [
{
"name": "jpizarro#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "jaguilera#gns.com.co",
"date": "2019-06-25T04:54:17.662Z"
},
{
"name": "dcortes#gns.com.co",
"date": "2019-07-15T15:06:09.203Z"
}
]
}
],
"selfUri": "/api/v2/billing/reports/billableusage"
}
cantidadDeLicencias = 0
cantidadDeUsages = len(data['usages'])
for x in range(cantidadDeUsages):
    temporal = data[x]
    cantidadDeResources = len(temporal['resource'])
    for z in range(cantidadDeResources):
        print(x)
What changes do I have to make? Or should I take another approach? Thanks in advance.
Update
Code that works
cantidadDeLicencias = 0
for usage in data['usages']:
    cantidadDeLicencias = cantidadDeLicencias + len(usage['resources'])
print(cantidadDeLicencias)
You can do this:
for usage in data['usages']:
    print(len(usage['resources']))
If you want to know the number of names at the resources level, counting duplicated names (e.g. "jaguilera#gns.com.co" appears more than once in your data), then just iterate over the first level (usages) and sum the size of each resources array:
cantidadDeLicencias = 0
for usage in data['usages']:
    cantidadDeLicencias += len(usage['resources'])
print(cantidadDeLicencias)
If you don't want to count duplicates, then use a set and iterate over each resources array:
cantidadDeLicencias_set = set()
for usage in data['usages']:
    for resource in usage['resources']:
        cantidadDeLicencias_set.add(resource['name'])
print(len(cantidadDeLicencias_set))

Convert complex Json to CSV

The file is from a Slack server export, so the structure varies every time (depending on whether people responded to a thread with text or reactions).
I have tried several SO questions with similar problems, but I guarantee my question is different: This one, This one too, This one as well.
Sample JSON file:
"client_msg_id": "f347abdc-9e2a-4cad-a37d-8daaecc5ad51",
"type": "message",
"text": "I came here just to check <#U3QSFG5A4> This is a sample :slightly_smiling_face:",
"user": "U51N464MN",
"ts": "1550511445.321100",
"team": "T1559JB9V",
"user_team": "T1559JB9V",
"source_team": "T1559JB9V",
"user_profile": {
"avatar_hash": "gcc8ae3d55bb",
"image_72": "https:\/\/secure.gravatar.com\/avatar\/fcc8ae3d55bb91cb750438657694f8a0.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0026-72.png",
"first_name": "A",
"real_name": "a name",
"display_name": "user",
"team": "T1559JB9V",
"name": "name",
"is_restricted": false,
"is_ultra_restricted": false
},
"thread_ts": "1550511445.321100",
"reply_count": 3,
"reply_users_count": 3,
"latest_reply": "1550515952.338000",
"reply_users": [
"U51N464MN",
"U8DUH4U2V",
"U3QSFG5A4"
],
"replies": [
{
"user": "U51N464MN",
"ts": "1550511485.321200"
},
{
"user": "U8DUH4U2V",
"ts": "1550515191.337300"
},
{
"user": "U3QSFG5A4",
"ts": "1550515952.338000"
}
],
"subscribed": false,
"reactions": [
{
"name": "trolldance",
"users": [
"U51N464MN",
"U4B30MHQE",
"U68E6A0JF"
],
"count": 3
},
{
"name": "trollface",
"users": [
"U8DUH4U2V"
],
"count": 1
}
]
},
The issue is that there are several keys that vary, so the structure changes within the same JSON file between messages, depending on how other users interact with a given message.
with open("file.json") as file:
d = json.load(file)
df = pd.io.json.json_normalize(d)
df.columns = df.columns.map(lambda x: x.split(".")[-1])
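No answer is included in this excerpt; one hedged approach, assuming each export file is a JSON array of message dicts, is to let json_normalize fill in NaN for whatever keys a given message lacks and to reduce the list-valued fields to something CSV-friendly:
import json
import pandas as pd

with open("file.json") as fh:
    messages = json.load(fh)  # assuming a JSON array of message dicts

# Messages with different keys are fine: missing keys simply become NaN columns.
df = pd.json_normalize(messages)

# List-valued fields (replies, reactions, reply_users) don't fit a flat CSV;
# here they are reduced to counts, but they could also be exploded into rows.
for col in ("replies", "reactions", "reply_users"):
    if col in df.columns:
        df[col] = df[col].apply(lambda v: len(v) if isinstance(v, list) else 0)

df.to_csv("messages.csv", index=False)
The column-renaming trick from the snippet above (keeping only the part after the last dot) can still be applied afterwards, but beware that it can produce duplicate column names, e.g. team and user_profile.team both become team.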

How to reshape DataFrame for nested JSON output

I am starting to work with some data manipulation and I need to create a new file (with new features) out of an old one. However, I could not figure out how to customize my own dataframe before using the .to_json method.
For example, I have a .csv as:
seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
...and I want to generate a .json file to help me visualize a network out of it. It should look more or less like this:
{
"node": [
{"id":"Roger", "group": "seller" },
{"id":"Mike", "group": "seller" },
{"id":"Will", "group": "customer" },
{"id":"Markus", "group": "customer" },
{"id":"Albert", "group": "customer" }
],
"links":[
{"source":"Roger","target":"Will","product":8129,"price":30},
#...and so on
]
}
I tried to do something like:
df1 = pd.read_csv('file.csv')
seller_list = df1.seller.unique()
customer_list = df1.customer.unique()
...and I could indeed get lists with unique items. However, I could not find out how to add them to a dataframe in order to create a structure such as:
"node":[
...
{"id":"Mike", "group": "seller" },
{"id":"Markus", "group": "customer" },
...
]...#see above
Any support or hint on this is appreciated.
This will be a two-step process. First, create the nodes records using melt + drop_duplicates + to_dict -
nodes = df[['customer', 'seller']]\
    .melt(var_name='group', value_name='id')\
    .drop_duplicates()\
    .to_dict('records')
Now, create the links records using rename + to_dict:
links = df.rename(columns={'seller' : 'source', 'customer' : 'target'}).to_dict('records')
Now, combine the data into one dictionary, and dump it as JSON to a file.
import json

data = {'nodes' : nodes, 'links' : links}
with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)
Your data.json file should look like this -
{
"nodes": [
{
"id": "Will",
"group": "customer"
},
{
"id": "Markus",
"group": "customer"
},
{
"id": "Albert",
"group": "customer"
},
{
"id": "Roger",
"group": "seller"
},
{
"id": "Mike",
"group": "seller"
}
],
"links": [
{
"product": 8129,
"target": "Will",
"source": "Roger",
"price": 30
},
{
"product": 1234,
"target": "Markus",
"source": "Roger",
"price": 100
},
{
"product": 2334,
"target": "Will",
"source": "Roger",
"price": 50
},
{
"product": 2295,
"target": "Markus",
"source": "Mike",
"price": 20
},
{
"product": 1234,
"target": "Albert",
"source": "Mike",
"price": 100
}
]
}

Unable to pull data from json using python

I have the following json
{
"response": {
"message": null,
"exception": null,
"context": [
{
"headers": null,
"name": "aname",
"children": [
{
"type": "cluster-connectivity",
"name": "cluster-connectivity"
},
{
"type": "consistency-groups",
"name": "consistency-groups"
},
{
"type": "devices",
"name": "devices"
},
{
"type": "exports",
"name": "exports"
},
{
"type": "storage-elements",
"name": "storage-elements"
},
{
"type": "system-volumes",
"name": "system-volumes"
},
{
"type": "uninterruptible-power-supplies",
"name": "uninterruptible-power-supplies"
},
{
"type": "virtual-volumes",
"name": "virtual-volumes"
}
],
"parent": "/clusters",
"attributes": [
{
"value": "true",
"name": "allow-auto-join"
},
{
"value": "0",
"name": "auto-expel-count"
},
{
"value": "0",
"name": "auto-expel-period"
},
{
"value": "0",
"name": "auto-join-delay"
},
{
"value": "1",
"name": "cluster-id"
},
{
"value": "true",
"name": "connected"
},
{
"value": "synchronous",
"name": "default-cache-mode"
},
{
"value": "true",
"name": "default-caw-template"
},
{
"value": "blah",
"name": "default-director"
},
{
"value": [
"blah",
"blah"
],
"name": "director-names"
},
{
"value": [
],
"name": "health-indications"
},
{
"value": "ok",
"name": "health-state"
},
{
"value": "1",
"name": "island-id"
},
{
"value": "blah",
"name": "name"
},
{
"value": "ok",
"name": "operational-status"
},
{
"value": [
],
"name": "transition-indications"
},
{
"value": [
],
"name": "transition-progress"
}
],
"type": "cluster"
}
],
"custom-data": null
}
}
which I'm trying to parse using the json module in Python. I am only interested in getting the following information out of it:
Name                 Value
operational-status   Value
health-state         Value
Here is what I have tried.
In the script below, data is the JSON returned from a web page:
json = json.loads(data)
healthstate= json['response']['context']['operational-status']
operationalstatus = json['response']['context']['health-status']
Unfortunately I think I must be missing something, as the above results in an error that indexes must be integers, not strings.
If I try
healthstate= json['response'][0]
it errors saying index 0 is out of range.
Any help would be gratefully received.
json['response']['context'] is a list, so that object requires you to use integer indices.
Each item in that list is itself a dictionary again. In this case there is only one such item.
To get all "name": "health-state" dictionaries out of that structure you'd need to do a little more processing:
[attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
would give you a list of matching values for health-state in the first context.
Demo:
>>> [attr['value'] for attr in json['response']['context'][0]['attributes'] if attr['name'] == 'health-state']
[u'ok']
You have to follow the data structure. It's best to interactively manipulate the data and check what every item is. If it's a list, you'll have to index it positionally or iterate through it and check the values. If it's a dict, you'll have to index it by its keys. For example, here is a function that gets the context and then iterates through its attributes checking for a particular name.
def get_attribute(data, attribute):
    for attrib in data['response']['context'][0]['attributes']:
        if attrib['name'] == attribute:
            return attrib['value']
    return 'Not Found'
>>> data = json.loads(s)
>>> get_attribute(data, 'operational-status')
u'ok'
>>> get_attribute(data, 'health-state')
u'ok'
json['response']['context'] is a list, not a dict. The structure is not exactly what you think it is.
For example, the only "operational-status" I see in there is the 15th entry of the attributes list, so its value can be read with the following:
json['response']['context'][0]['attributes'][14]['value']
