I am starting to work with some data manipulation and I need to create a new file (with new features) out of an old one. However, I could not figure out how to customize my own dataframe before using the ".to_json" method.
For example, I have a .csv as:
seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
...and I want to generate a .json file to help me visualize a network out of it. It should look more or less like:
{
"node": [
{"id":"Roger", "group": "seller" },
{"id":"Mike", "group": "seller" },
{"id":"Will", "group": "customer" },
{"id":"Markus", "group": "customer" },
{"id":"Albert", "group": "customer" }
],
"links":[
{"source":"Roger","target":"Will","product":8129,"price":30},
#...and so on
]
}
I tried to do something like:
df1 = pd.read_csv('file.csv')
seller_list = df1.seller.unique()
customer_list = df1.customer.unique()
...and I could indeed get lists with unique items. However, I could not find how I should add them to a dataframe in order to create a structure such as:
"node":[
...
{"id":"Mike", "group": "seller" },
{"id":"Markus", "group": "customer" },
...
]...#see above
Any support or hint on this is appreciated.
This will be a two-step process. First, create the nodes list using melt + drop_duplicates + to_dict -
nodes = df[['customer', 'seller']]\
            .melt(var_name='group', value_name='id')\
            .drop_duplicates()\
            .to_dict('records')
Now, create the links list using rename + to_dict (note that the 'r' shorthand for 'records' is deprecated and was removed in pandas 2.0) -
links = df.rename(columns={'seller' : 'source', 'customer' : 'target'}).to_dict('records')
Now, combine the data into one dictionary, and dump it as JSON to a file.
import json

data = {'nodes' : nodes, 'links' : links}
with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)
Your data.json file should look like this -
{
"nodes": [
{
"id": "Will",
"group": "customer"
},
{
"id": "Markus",
"group": "customer"
},
{
"id": "Albert",
"group": "customer"
},
{
"id": "Roger",
"group": "seller"
},
{
"id": "Mike",
"group": "seller"
}
],
"links": [
{
"product": 8129,
"target": "Will",
"source": "Roger",
"price": 30
},
{
"product": 1234,
"target": "Markus",
"source": "Roger",
"price": 100
},
{
"product": 2334,
"target": "Will",
"source": "Roger",
"price": 50
},
{
"product": 2295,
"target": "Markus",
"source": "Mike",
"price": 20
},
{
"product": 1234,
"target": "Albert",
"source": "Mike",
"price": 100
}
]
}
Related
I am new to Python and want to convert a CSV file into a JSON file. The JSON file is nested with a dynamic structure; the structure is defined by the CSV header.
From csv input:
ID, Name, person_id/id_type, person_id/id_value,person_id_expiry_date,additional_info/0/name,additional_info/0/value,additional_info/1/name,additional_info/1/value,salary_info/details/0/grade,salary_info/details/0/payment,salary_info/details/0/amount,salary_info/details/1/next_promotion
1,Peter,PASSPORT,A452817,1-01-2055,Age,19,Gender,M,Manager,Monthly,8956.23,unknown
2,Jane,PASSPORT,B859804,2-01-2035,Age,38,Gender,F,Worker, Monthly,125980.1,unknown
To json output:
[
{
"ID": 1,
"Name": "Peter",
"person_id": {
"id_type": "PASSPORT",
"id_value": "A452817"
},
"person_id_expiry_date": "1-01-2055",
"additional_info": [
{
"name": "Age",
"value": 19
},
{
"name": "Gender",
"value": "M"
}
],
"salary_info": {
"details": [
{
"grade": "Manager",
"payment": "Monthly",
"amount": 8956.23
},
{
"next_promotion": "unknown"
}
]
}
},
{
"ID": 2,
"Name": "Jane",
"person_id": {
"id_type": "PASSPORT",
"id_value": "B859804"
},
"person_id_expiry_date": "2-01-2035",
"additional_info": [
{
"name": "Age",
"value": 38
},
{
"name": "Gender",
"value": "F"
}
],
"salary_info": {
"details": [
{
"grade": "Worker",
"payment": " Monthly",
"amount": 125980.1
},
{
"next_promotion": "unknown"
}
]
}
}
]
Is this something that can be done with the existing pandas API, or do I have to write a lot of complex code to dynamically construct the JSON object? Thanks.
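There is no single pandas call for this, but it does not take much code either: split each slash-delimited header into path segments and build the nested structure path by path. Below is a minimal sketch (the `insert` helper and the inline `SAMPLE` are mine, not from the question); numeric path segments become list indices, everything else becomes a dict key, and values are kept as strings here - converting "19" to the integer 19 as in the expected output would need an extra casting step.

```python
import csv
import io
import json

# inline stand-in for the question's CSV file
SAMPLE = """ID,Name,person_id/id_type,person_id/id_value,additional_info/0/name,additional_info/0/value
1,Peter,PASSPORT,A452817,Age,19
"""

def insert(container, parts, value):
    """Place value at the nested path described by parts, creating
    intermediate dicts/lists as needed."""
    head, rest = parts[0], parts[1:]
    if head.isdigit():                        # list index
        head = int(head)
        while len(container) <= head:         # grow the list as needed
            container.append(None)
        if not rest:
            container[head] = value
        else:
            if container[head] is None:
                container[head] = [] if rest[0].isdigit() else {}
            insert(container[head], rest, value)
    else:                                     # dict key
        if not rest:
            container[head] = value
        else:
            if head not in container:
                container[head] = [] if rest[0].isdigit() else {}
            insert(container[head], rest, value)

rows = []
for record in csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True):
    row = {}
    for path, value in record.items():
        insert(row, path.strip().split('/'), value)
    rows.append(row)

print(json.dumps(rows, indent=2))
```

For the real file, replace `io.StringIO(SAMPLE)` with an `open(...)` call; `skipinitialspace=True` absorbs the stray spaces after commas in the sample header.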
Goal: create a script that takes a nested JSON object as input and outputs a CSV file with all keys as rows.
Example:
{
"Document": {
"DocumentType": 945,
"Version": "V007",
"ClientCode": "WI",
"Shipment": [
{
"ShipmentHeader": {
"ShipmentID": 123456789,
"OrderChannel": "Shopify",
"CustomerNumber": 234234,
"VendorID": "2343SDF",
"ShipViaCode": "FEDX2D",
"AsnDate": "2018-01-27",
"AsnTime": "09:30:47-08:00",
"ShipmentDate": "2018-01-23",
"ShipmentTime": "09:30:47-08:00",
"MBOL": 12345678901234568,
"BOL": 12345678901234566,
"ShippingNumber": "1ZTESTTEST",
"LoadID": 321456987,
"ShipmentWeight": 10,
"ShipmentCost": 2.3,
"CartonsTotal": 2,
"CartonPackagingCode": "CTN25",
"OrdersTotal": 2
},
"References": [
{
"Reference": {
"ReferenceQualifier": "TST",
"ReferenceText": "Testing text"
}
}
],
"Addresses": {
"Address": [
{
"AddressLocationQualifier": "ST",
"LocationNumber": 23234234,
"Name": "John Smith",
"Address1": "123 Main St",
"Address2": "Suite 12",
"City": "Hometown",
"State": "WA",
"Zip": 92345,
"Country": "USA"
},
{
"AddressLocationQualifier": "BT",
"LocationNumber": 2342342,
"Name": "Jane Smith",
"Address1": "345 Second Ave",
"Address2": "Building 32",
"City": "Sometown",
"State": "CA",
"Zip": "23665-0987",
"Country": "USA"
}
]
},
"Orders": {
"Order": [
{
"OrderHeader": {
"PurchaseOrderNumber": 23456342,
"RetailerPurchaseOrderNumber": 234234234,
"RetailerOrderNumber": 23423423,
"CustomerOrderNumber": 234234234,
"Department": 3333,
"Division": 23423,
"OrderWeight": 10.23,
"CartonsTotal": 2,
"QTYOrdered": 12,
"QTYShipped": 23
},
"Cartons": {
"Carton": [
{
"SSCC18": 12345678901234567000,
"TrackingNumber": "1ZTESTTESTTEST",
"CartonContentsQty": 10,
"CartonWeight": 10.23,
"LineItems": {
"LineItem": [
{
"LineNumber": 1,
"ItemNumber": 1234567890,
"UPC": 9876543212,
"QTYOrdered": 34,
"QTYShipped": 32,
"QTYUOM": "EA",
"Description": "Shoes",
"Style": "Tall",
"Size": 9.5,
"Color": "Bllack",
"RetailerItemNumber": 2342333,
"OuterPack": 10
},
{
"LineNumber": 2,
"ItemNumber": 987654321,
"UPC": 7654324567,
"QTYOrdered": 12,
"QTYShipped": 23,
"QTYUOM": "EA",
"Description": "Sunglasses",
"Style": "Short",
"Size": 10,
"Color": "White",
"RetailerItemNumber": 565465456,
"OuterPack": 12
}
]
}
}
]
}
}
]
}
}
]
}
}
In the above JSON object, I want all the keys (nested included) in a list (duplicates can be removed by using a set data structure). If a nested key occurs multiple times in the actual JSON, it can appear as a key multiple times in the CSV!
I personally feel that recursion is a perfect fit for this type of problem when the amount of nesting you will encounter is unpredictable. Here is an example in Python of how you can utilise recursion to extract all keys. Cheers.
import json

row = ""

def extract_keys(data):
    # append every key found in a nested dict/list structure to `row`
    global row
    if isinstance(data, dict):
        for key, value in data.items():
            row += key + "\n"
            extract_keys(value)
    elif isinstance(data, list):
        for element in data:
            extract_keys(element)

# MAIN
with open("input.json", "r") as rfile:
    dicts = json.load(rfile)

extract_keys(dicts)

with open("output.csv", "w") as wfile:
    wfile.write(row)
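If you would rather avoid the module-level global, the same traversal can accumulate into a set and return it, which also de-duplicates the keys as the question suggests (`collect_keys` is my name, not from the original code):

```python
import json

def collect_keys(data, keys=None):
    """Recursively gather every dict key in a nested JSON structure."""
    if keys is None:
        keys = set()
    if isinstance(data, dict):
        for key, value in data.items():
            keys.add(key)
            collect_keys(value, keys)
    elif isinstance(data, list):
        for element in data:
            collect_keys(element, keys)
    return keys

doc = json.loads('{"Document": {"Shipment": [{"ShipmentHeader": {"ShipmentID": 1}}]}}')
print(sorted(collect_keys(doc)))
# ['Document', 'Shipment', 'ShipmentHeader', 'ShipmentID']
```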
Is there a way to import this kind of JSON response into pandas? I've been trying to get it usable with json_normalize, but I can't seem to get more than one level to work at a time (I can get notes, but can't bring in custom_fields). I also cannot figure out how to call out something like ['reporter']['name'] (which should be jdoe). This is from Mantis, and it's the JSON output of a requests response. I'm now wondering if it needs to be broken up into multiple frames and put back together, or should I use a for loop and put the data I want into a better format for pandas to import?
In my head each item should be a column in the series all tied to the id column like this.
id | summary | project.name | reporter.name ..|.. custom.fields.Project_Stage | ... notes1.reporter.name | notes1.text ... notes2.reporter.name | notes2.text
{
"issues": [
{
"id": 1234,
"summary": "Some text",
"project": {
"id": 1,
"name": "North America"
},
"category": {
"id": 11,
"name": "Retail"
},
"reporter": {
"id": 1099,
"name": "jdoe"
},
"custom_fields": [
{
"field": {
"id": 107,
"name": "Product Escalations"
},
"value": ""
},
{
"field": {
"id": 1,
"name": "Project_Stage"
},
"value": "Pending"
}
],
"notes": [
{
"id": 214288,
"reporter": {
"id": 9999,
"name": "jdoe"
},
"text": "Worked with Mark over e-mail",
"view_state": {
"id": 10,
"name": "public",
"label": "public"
},
"type": "note",
"created_at": "2020-12-04T15:55:02-08:00",
"updated_at": "2020-12-04T15:55:02-08:00"
},
{
"id": 214289,
"reporter": {
"id": 9999,
"name": "jdoe"
},
"text": "I attempted on numerous occasions to setup a meeting with him to set it up for him.",
"view_state": {
"id": 10,
"name": "public",
"label": "public"
},
"type": "note",
"created_at": "2020-12-04T15:57:02-08:00",
"updated_at": "2020-12-04T15:57:02-08:00"
}
]
}
]
}
Here is what the DF would look like in my head. All the data for one ticket on one line/series.
Structures with many lists at the same level are tricky. Try flatten_json: https://github.com/amirziai/flatten
If your response is called 'dic', you can use this:
import pandas as pd
from flatten_json import flatten

dic_flattened = (flatten(d, '.') for d in dic['issues'])
df = pd.DataFrame(dic_flattened)
Output
id summary project.id project.name category.id category.name ... notes.1.view_state.id notes.1.view_state.name notes.1.view_state.label notes.1.type notes.1.created_at notes.1.updated_at
0 1234 Some text 1 North America 11 Retail ... 10 public public note 2020-12-04T15:57:02-08:00 2020-12-04T15:57:02-08:00
In [101]: df.columns
Out[101]:
Index(['id', 'summary', 'project.id', 'project.name', 'category.id',
'category.name', 'reporter.id', 'reporter.name',
'custom_fields.0.field.id', 'custom_fields.0.field.name',
'custom_fields.0.value', 'custom_fields.1.field.id',
'custom_fields.1.field.name', 'custom_fields.1.value', 'notes.0.id',
'notes.0.reporter.id', 'notes.0.reporter.name', 'notes.0.text',
'notes.0.view_state.id', 'notes.0.view_state.name',
'notes.0.view_state.label', 'notes.0.type', 'notes.0.created_at',
'notes.0.updated_at', 'notes.1.id', 'notes.1.reporter.id',
'notes.1.reporter.name', 'notes.1.text', 'notes.1.view_state.id',
'notes.1.view_state.name', 'notes.1.view_state.label', 'notes.1.type',
'notes.1.created_at', 'notes.1.updated_at'],
dtype='object')
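On the ['reporter']['name'] part of the question: nested dicts alone (as opposed to the parallel notes/custom_fields lists, which are what json_normalize cannot spread across columns in one call) flatten fine with pandas' own json_normalize, after which the value is just a dotted column. The single-issue list here is a stub, not the full Mantis payload:

```python
import pandas as pd

issues = [{"id": 1234, "reporter": {"id": 1099, "name": "jdoe"}}]
df = pd.json_normalize(issues, sep='.')
print(df['reporter.name'].iloc[0])   # jdoe
```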
I have a CSV file with 4 columns of data, as below.
type,MetalType,Date,Acknowledge
Metal,abc123451,2018-05-26,Success
Metal,abc123452,2018-05-27,Success
Metal,abc123454,2018-05-28,Failure
Iron,abc123455,2018-05-29,Success
Iron,abc123456,2018-05-30,Failure
(I just provided a header in the above example data, but in my case I don't have a header in the data.)
How can I convert the above CSV file to JSON in the below format?
1st column: belongs to --> "type": "Metal"
2nd column: MetalType: "values": "value": "abc123451"
3rd column: "Date": "values": "value": "2018-05-26"
4th column: "Acknowledge": "values": "value": "Success"
All remaining columns are default values.
As per the format below:
{
"entities": [
{
"id": "XXXXXXX",
"type": "Metal",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "abc123451"
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "2018-05-26"
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": "Success"
}
]
}
}
}
}
]
}
Even though jww is right, I built something for you:
First, I import the CSV using pandas (since your file has no header row, supply the column names explicitly):
import pandas as pd

df = pd.read_csv('data.csv', header=None,
                 names=['type', 'MetalType', 'Date', 'Acknowledge'])
Then I create a template for the dictionaries you want to add:
d_json = {"entities": []}
template = {
"id": "XXXXXXX",
"type": "",
"data": {
"attributes": {
"MetalType": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Date": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
},
"Acknowledge": {
"values": [
{
"source": "XYZ",
"locale": "Australia",
"value": ""
}
]
}
}
}
}
Now you just need to fill in the dictionary. Note the deepcopy: a plain `d = template` would rebind the same dict on every pass, so every appended entry would end up identical to the last row:
import copy

for i in range(len(df)):
    d = copy.deepcopy(template)
    d['type'] = df['type'][i]
    d['data']['attributes']['MetalType']['values'][0]['value'] = df['MetalType'][i]
    d['data']['attributes']['Date']['values'][0]['value'] = df['Date'][i]
    d['data']['attributes']['Acknowledge']['values'][0]['value'] = df['Acknowledge'][i]
    d_json['entities'].append(d)
I know my way of iterating over the df is kind of ugly, maybe someone knows a cleaner way.
Cheers!
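The answer stops short of writing the result out; the last step is presumably a plain json.dump. The stub `d_json` below stands in for the dictionary filled in by the loop above:

```python
import json

# stand-in for the d_json built in the loop above
d_json = {"entities": [{"id": "XXXXXXX", "type": "Metal"}]}

with open('output.json', 'w') as f:
    json.dump(d_json, f, indent=4)
```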
{
"matching_results": 1264,
"results": [
{
"main_image_url": "https://s4.reutersmedia.net/resources_v2/images/rcom-default.png",
"enriched_text": {
"entities": [
{
"relevance": 0.33,
"disambiguation": {
"subtype": [
"Country"
]
},
"sentiment": {
"score": 0
},
"type": "Location",
"count": 1,
"text": "China"
},
{
"relevance": 0.33,
"disambiguation": {
"subtype": [
"Country"
]
},
"sentiment": {
"score": 0
},
This is a very large file, so I want to find "relevance" and "score" using Python.
How can I fetch this info?
Regardless of how large it is, it is only a simple dictionary.
Iterate the lists and extract the key-values:
for result in data['results']:
    for e in result['enriched_text']['entities']:
        print(e['relevance'])
        print(e['sentiment']['score'])
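If you need the values rather than printed output, the same two loops collapse into a comprehension. `data` here is a small stub shaped like the response above, since the original JSON is truncated:

```python
# stub shaped like the Watson-style response in the question
data = {"results": [{"enriched_text": {"entities": [
    {"relevance": 0.33, "sentiment": {"score": 0}},
]}}]}

# one (relevance, score) pair per entity across all results
pairs = [(e['relevance'], e['sentiment']['score'])
         for result in data['results']
         for e in result['enriched_text']['entities']]
print(pairs)   # [(0.33, 0)]
```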