Easiest way to split JSON file using Python

I am working on an interactive visualization of the World Happiness Report for the years 2015 to 2020. The data was split into six CSV files. Using pandas, I have successfully cleaned the data and concatenated it into one big JSON file with the following format:
[
{
"Country": "Switzerland",
"Year": 2015,
"Happiness Rank": 1,
"Happiness Score": 7.587000000000001,
},
{
"Country": "Iceland",
"Year": 2015,
"Happiness Rank": 2,
"Happiness Score": 7.561,
},
{
"Country": "Switzerland",
"Year": 2016,
"Happiness Rank": 2,
"Happiness Score": 7.5089999999999995,
},
{
"Country": "Iceland",
"Year": 2016,
"Happiness Rank": 3,
"Happiness Score": 7.501,
},
{
"Country": "Switzerland",
"Year": 2017,
"Happiness Rank": 3,
"Happiness Score": 7.49399995803833,
},
{
"Country": "Iceland",
"Year": 2017,
"Happiness Rank": 1,
"Happiness Score": 7.801,
}
]
Now, I would like to programmatically format the JSON file such that it has the following format:
{
"2015": {
"Switzerland": {
"Happiness Rank": 1,
"Happiness Score": 7.587000000000001
},
"Iceland": {
"Happiness Rank": 2,
"Happiness Score": 7.561
}
},
"2016": {
"Switzerland": {
"Happiness Rank": 2,
"Happiness Score": 7.5089999999999995
},
"Iceland": {
"Happiness Rank": 3,
"Happiness Score": 7.501
}
},
"2017": {
"Switzerland": {
"Happiness Rank": 3,
"Happiness Score": 7.49399995803833
},
"Iceland": {
"Happiness Rank": 1,
"Happiness Score": 7.801
}
}
}
It has to be done programmatically, since there are over 900 distinct (country, year) pairs. I want the JSON in this format because it makes the file more readable and makes it easier to select the appropriate data. For example, to get the rank of Iceland in 2015, I could then do data["2015"]["Iceland"]["Happiness Rank"].
Does anyone know the easiest / most convenient way to do this in Python?

If data is your original list of dictionaries:
def by_year(data):
    from itertools import groupby
    from operator import itemgetter
    retain_keys = ("Happiness Rank", "Happiness Score")
    for year, group in groupby(data, key=itemgetter("Year")):
        as_tpl = tuple(group)
        yield str(year), dict(zip(map(itemgetter("Country"), as_tpl), [{k: d[k] for k in retain_keys} for d in as_tpl]))
print(dict(by_year(data)))
Output:
{'2015': {'Switzerland': {'Happiness Rank': 1, 'Happiness Score': 7.587000000000001}, 'Iceland': {'Happiness Rank': 2, 'Happiness Score': 7.561}}, '2016': {'Switzerland': {'Happiness Rank': 2, 'Happiness Score': 7.5089999999999995}, 'Iceland': {'Happiness Rank': 3, 'Happiness Score': 7.501}}, '2017': {'Switzerland': {'Happiness Rank': 3, 'Happiness Score': 7.49399995803833}, 'Iceland': {'Happiness Rank': 1, 'Happiness Score': 7.801}}}
This assumes that the dictionaries in data are already grouped together by year, because itertools.groupby only groups consecutive items that share the same key.
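If the records are not guaranteed to arrive grouped by year, a plain dictionary with setdefault avoids that requirement. This is only a minimal sketch: the function name nest_by_year and the drop parameter are made up for illustration, and the sample data is a shortened, deliberately unordered version of the data in the question.

```python
import json

def nest_by_year(records, drop=("Year", "Country")):
    """Group flat records into {year: {country: remaining_fields}}."""
    nested = {}
    for record in records:
        # Keep every field except the ones that become dictionary keys.
        fields = {k: v for k, v in record.items() if k not in drop}
        # JSON object keys are always strings, so convert the year up front.
        nested.setdefault(str(record["Year"]), {})[record["Country"]] = fields
    return nested

# Shortened sample, intentionally not sorted by year.
data = [
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]
nested = nest_by_year(data)
print(json.dumps(nested, indent=2))
```

Because the year keys are strings, a lookup looks like nested["2015"]["Iceland"]["Happiness Rank"].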

I assume you have the original pandas dataframe from which this JSON was created. With pandas, you can do df = df.groupby(['Year', 'Country']). You can then follow the procedure in pandas groupby to nested json to convert it to JSON.

You might find groupby from the itertools module useful. I was able to do this with:
import itertools
groups = itertools.groupby(data, lambda x: x["Year"])
newdict = {str(year): {entry["Country"]:entry for entry in group} for year, group in groups}
Here data is the list in the form of the example you gave.
This keeps the original Year and Country fields in each entry, but they can easily be deleted like this:
for countries in newdict.values():
    for c in countries.values():
        del c["Year"]
        del c["Country"]

Related

Parse List in nested dictionary Python

data = {
"persons": {"1": {"name": "siddu"}, "2": {"name": "manju"}},
"cars": {
"model1": {
"make": 1990,
"company_details": {
"name": "Ford Corporation",
"country": "US",
"some_list": [1, 2, 1],
},
},
"model2": {
"make": 1990,
"company_details": {
"name": "Ford Corporation",
"country": "US",
"some_list": [1, 2, 1, 1, 1],
},
},
},
}
This is my Python object. How can I identify that a key's value is a list? For example, after traversing with print(data["cars"]["model1"]["company_details"]["some_list"]) I get the list. Since this is a small dictionary it was easy, but how can I identify a list if it turns up as the value of some other key in the future?
Example:
data = {
"persons": {"1": {"name": "siddu"}, "2": {"name": "manju"}},
"cars": {
"model1": {
"make": 1990,
"company_details": {
"name": "Ford Corporation",
"country": "US",
"some_list": [1, 2, 1],
},
},
"model2": {
"make": 1990,
"company_details": {
"name": "Ford Corporation",
"country": ["US", "UK", "IND"],
"some_list": [1, 2, 1, 1, 1],
},
},
},
}
Can anyone please suggest how to identify that a key's value is a list?
The final goal is to remove the duplicates from the list, if any exist.
Thank you very much:)
You can use a recursive function that goes to any depth and makes the items of each list unique, like below:
In [8]: def removeDuplicatesFromList(di):
   ...:     for key, val in di.items():
   ...:         if isinstance(val, dict):
   ...:             removeDuplicatesFromList(val)
   ...:         elif isinstance(val, list):
   ...:             di[key] = list(set(val))
   ...:         else:
   ...:             continue
   ...:
In [9]: removeDuplicatesFromList(data)
In [10]: data
Out[10]:
{'persons': {'1': {'name': 'siddu'}, '2': {'name': 'manju'}},
'cars': {'model1': {'make': 1990,
'company_details': {'name': 'Ford Corporation',
'country': 'US',
'some_list': [1, 2]}},
'model2': {'make': 1990,
'company_details': {'name': 'Ford Corporation',
'country': 'US',
'some_list': [1, 2]}}}}
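One thing to keep in mind with list(set(val)) is that a set does not guarantee the original order and only works for hashable items. Below is a hedged variant, not the answer's own method, that keeps the first occurrence of each item, also descends into lists, and returns a new structure instead of modifying the input. The name dedupe_lists and the small sample are made up for illustration.

```python
def dedupe_lists(obj):
    """Recursively remove duplicate items from every list, keeping first occurrences."""
    if isinstance(obj, dict):
        return {key: dedupe_lists(value) for key, value in obj.items()}
    if isinstance(obj, list):
        seen = []
        for item in map(dedupe_lists, obj):
            # Equality check instead of a set, so unhashable items (dicts, lists) work too.
            if item not in seen:
                seen.append(item)
        return seen
    # Anything else (str, int, None, ...) is returned unchanged.
    return obj

sample = {"cars": {"model2": {"country": ["US", "UK", "US"], "some_list": [1, 2, 1, 1, 1]}}}
cleaned = dedupe_lists(sample)
print(cleaned)
```

The membership test on a list is quadratic in the list length, which is fine for short lists like these but would be slow for very large ones.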

Combine json based on Key value using python

I have two JSON strings as sample:
json_1 = [
{
"breadth": 48.04,
"vessel_id": 1,
"vessel_name": "SHIP-01",
"vessel_type": "Crude Oil Tanker",
"year_built": 2012
},
{
"breadth": 42,
"vessel_id": 2,
"vessel_name": "SHIP-02",
"vessel_type": "Crude Oil Tanker",
"year_built": 2016
}
]
json_2 = [
{
"Ballast_miles": 43575.8,
"Ballast_miles_pct": 36.1,
"org_id": 1,
"port_days": 383.5,
"sea_days": 414.9,
"total_days": 798.4,
"vessel_id": 1
},
{
"Ballast_miles": 21642.7,
"Ballast_miles_pct": 29.8,
"org_id": 1,
"port_days": 325.7,
"sea_days": 259.8,
"total_days": 585.5,
"vessel_id": 2
}
]
I want to combine these two JSON based on vessel_id.
My output format should look like:
[{ vesselId: 1,
 json1:{},
 json2:{}
},
{ vesselId: 2,
 json1:{},
 json2:{}
}]
What I've tried so far is:
data = {'First_Json': json_1, 'Second_Json': json_2}
json.dumps(data)
But this combines entirely without checking based on vessel_id.
Something like this?
json_1 = [{ "breadth": 48.04, "vessel_id": 1, "vessel_name": "SHIP-01", "vessel_type": "Crude Oil Tanker", "year_built": 2012 }, { "breadth": 42, "vessel_id": 2, "vessel_name": "SHIP-02", "vessel_type": "Crude Oil Tanker", "year_built": 2016 }]
json_2 = [{ "Ballast_miles": 43575.8, "Ballast_miles_pct": 36.1, "org_id": 1, "port_days": 383.5, "sea_days": 414.9, "total_days": 798.4, "vessel_id": 1 }, { "Ballast_miles": 21642.7, "Ballast_miles_pct": 29.8, "org_id": 1, "port_days": 325.7, "sea_days": 259.8, "total_days": 585.5, "vessel_id": 2 }]
from collections import defaultdict
result = defaultdict(dict)
for item in json_1:
    result[item['vessel_id']]['json_1'] = item
for item in json_2:
    result[item['vessel_id']]['json_2'] = item
[{"vessel_id" : k,
  "json1" : v['json_1'],
  "json2" : v['json_2']}
 for k,v in result.items()]
Output:
[{'json1': {'breadth': 48.04,
'vessel_id': 1,
'vessel_name': 'SHIP-01',
'vessel_type': 'Crude Oil Tanker',
'year_built': 2012},
'json2': {'Ballast_miles': 43575.8,
'Ballast_miles_pct': 36.1,
'org_id': 1,
'port_days': 383.5,
'sea_days': 414.9,
'total_days': 798.4,
'vessel_id': 1},
'vessel_id': 1},
{'json1': {'breadth': 42,
'vessel_id': 2,
'vessel_name': 'SHIP-02',
'vessel_type': 'Crude Oil Tanker',
'year_built': 2016},
'json2': {'Ballast_miles': 21642.7,
'Ballast_miles_pct': 29.8,
'org_id': 1,
'port_days': 325.7,
'sea_days': 259.8,
'total_days': 585.5,
'vessel_id': 2},
'vessel_id': 2}]
If you want to remove the redundant vessel_id, try using a for loop with a del statement on each dict.
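As a rough sketch of dropping the redundant vessel_id: instead of del, the version below copies each record without that key while grouping, so the input lists stay untouched. The shortened records and the names grouped and combined are only for illustration.

```python
from collections import defaultdict

# Shortened versions of the two inputs from the question.
json_1 = [{"vessel_id": 1, "vessel_name": "SHIP-01"}, {"vessel_id": 2, "vessel_name": "SHIP-02"}]
json_2 = [{"vessel_id": 1, "sea_days": 414.9}, {"vessel_id": 2, "sea_days": 259.8}]

grouped = defaultdict(dict)
for label, records in (("json1", json_1), ("json2", json_2)):
    for record in records:
        # Copy everything except vessel_id, so the inputs are left untouched.
        grouped[record["vessel_id"]][label] = {k: v for k, v in record.items() if k != "vessel_id"}

combined = [{"vessel_id": vessel_id, **parts} for vessel_id, parts in grouped.items()]
print(combined)
```

A vessel that appears in only one of the lists simply ends up with one of the two keys, rather than raising a KeyError.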

Merge and add two lists of dictionaries using Python

I have two lists of dictionaries and a piece of code that tries to merge them:
Json_Final = []
try:
    for keyIN in JSON1:
        json_data_Merge = {}
        for keyUS in JSON2:
            if (keyIN['YEAR'] == keyUS['YEAR']) & (keyIN['MONTH'] == keyUS['MONTH']) & (keyIN['Name'] == keyUS['Name']):
                json_data_Merge['YEAR'] = keyIN['YEAR']
                json_data_Merge['MONTH'] = keyIN['MONTH']
                json_data_Merge['Name'] = keyIN['Name']
                json_data_Merge['Total'] = int(keyIN['Total']) + int(keyUS['Total'])
                Json_Final.append(json_data_Merge)
    print(Json_Final)
except Exception as e:
    print('MergeException', e)
JSON1 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 100},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 200},
{"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 100},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 200}]
JSON2 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 200},
{"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 400},
{"YEAR": 2019, "MONTH": 2, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Mango", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 500},
{"YEAR": 2019, "MONTH": 3, "Name": "Orange", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 250}]
Expected Output:
[{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 600},
{"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Mango", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 600},
{"YEAR": 2019, "MONTH": 3, "Name": "Orange", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 450}]
My Code Output:
[{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 600},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 300},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 450}]
This is one approach.
Ex:
JSON_1 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 100},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 200},
{"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 100},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 200}]
JSON_2 = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 200},
{"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Grape", "Total": 400},
{"YEAR": 2019, "MONTH": 2, "Name": "Orange", "Total": 300},
{"YEAR": 2019, "MONTH": 2, "Name": "Mango", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Grape", "Total": 500},
{"YEAR": 2019, "MONTH": 3, "Name": "Orange", "Total": 200},
{"YEAR": 2019, "MONTH": 3, "Name": "Apple", "Total": 250}]
JSON_2 = {"{}_{}_{}".format(i["YEAR"], i["MONTH"], i["Name"]): i for i in JSON_2}  # Create a dict for easy lookup
for i in JSON_1:
    key = "{}_{}_{}".format(i["YEAR"], i["MONTH"], i["Name"])  # Create key with Year, Month, Name
    if key in JSON_2:  # Check if item from JSON_1 exists in JSON_2
        JSON_2[key]['Total'] += i["Total"]  # Update Total
    else:
        JSON_2[key] = i  # Else add new entry
print(list(JSON_2.values()))  # Get values
Output:
[{'MONTH': 1, 'Name': 'Orange', 'Total': 300, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Mango', 'Total': 200, 'YEAR': 2019},
{'MONTH': 3, 'Name': 'Apple', 'Total': 450, 'YEAR': 2019},
{'MONTH': 1, 'Name': 'Apple', 'Total': 300, 'YEAR': 2019},
{'MONTH': 3, 'Name': 'Grape', 'Total': 600, 'YEAR': 2019},
{'MONTH': 3, 'Name': 'Orange', 'Total': 200, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Grape', 'Total': 600, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Apple', 'Total': 300, 'YEAR': 2019},
{'MONTH': 2, 'Name': 'Orange', 'Total': 300, 'YEAR': 2019}]
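The same idea can be sketched with a tuple as the lookup key instead of a formatted string, building new dictionaries so that neither input list is modified. This is just one possible variation; the function name merge_totals and the shortened inputs are made up for illustration, and the result is sorted by (YEAR, MONTH, Name), which is an assumption about the desired order.

```python
def merge_totals(*record_lists):
    """Sum 'Total' across records sharing the same (YEAR, MONTH, Name)."""
    totals = {}
    for records in record_lists:
        for rec in records:
            key = (rec["YEAR"], rec["MONTH"], rec["Name"])
            totals[key] = totals.get(key, 0) + int(rec["Total"])
    # Rebuild plain dictionaries, ordered by year, month and name.
    return [
        {"YEAR": year, "MONTH": month, "Name": name, "Total": total}
        for (year, month, name), total in sorted(totals.items())
    ]

first = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 100},
         {"YEAR": 2019, "MONTH": 2, "Name": "Apple", "Total": 300}]
second = [{"YEAR": 2019, "MONTH": 1, "Name": "Apple", "Total": 200},
          {"YEAR": 2019, "MONTH": 1, "Name": "Orange", "Total": 300}]
merged = merge_totals(first, second)
print(merged)
```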

How to insert json from API to snowflake database using python?

I am getting data from the LinkedIn Ad API using Python.
I get the data as a JSON string.
How can I insert this JSON into a Snowflake table with a VARIANT column?
Instead of a VARIANT, the fields inside "elements" could also be inserted as normal columns.
I am new to both JSON and Python, so I would love to get some help on this.
Here is the sample JSON string I am getting.
{
"elements": [
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 3
},
"end": {
"month": 3,
"year": 2019,
"day": 3
}
},
"clicks": 11,
"impressions": 2453,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 4
},
"end": {
"month": 3,
"year": 2019,
"day": 4
}
},
"clicks": 4,
"impressions": 816,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 7
},
"end": {
"month": 3,
"year": 2019,
"day": 7
}
},
"clicks": 1,
"impressions": 629,
"pivotValues": [
"urn:li:sponsoredCampaign:1234565"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 21
},
"end": {
"month": 3,
"year": 2019,
"day": 21
}
},
"clicks": 3,
"impressions": 154,
"pivotValues": [
"urn:li:sponsoredCampaign:1323516"
]
}
],
"paging": {
"count": 10,
"start": 0,
"links": []
}
}
The documentation might be helpful here.
In particular:
INSERT INTO myTable (myColumn)
SELECT ('{"key3": "value3", "key4": "value4"}'::VARIANT);
Just insert your JSON string in the appropriate place.
Here is an example in python of how to insert JSON data:
https://github.com/snowflakedb/snowflake-connector-python/blob/master/test/test_cursor.py#L456
I imagine you're missing the parse_json function from your insert.
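A rough sketch of what that could look like from Python, assuming a DB-API style cursor such as the one the Snowflake connector provides with its default %s parameter binding. The helper insert_variant, the table and column names, and the RecordingCursor stand-in (used only so the sketch runs without a Snowflake account) are all hypothetical.

```python
import json

def insert_variant(cursor, table, column, payload):
    """Insert one Python object into a VARIANT column via PARSE_JSON.

    Assumes `cursor` behaves like a Snowflake connector cursor with %s binding.
    `table` and `column` must be trusted identifiers; they are not escaped here.
    """
    # PARSE_JSON is used in a SELECT, mirroring the INSERT ... SELECT form above.
    sql = f"INSERT INTO {table} ({column}) SELECT PARSE_JSON(%s)"
    cursor.execute(sql, (json.dumps(payload),))
    return sql

class RecordingCursor:
    """Hypothetical stand-in that records calls instead of talking to a database."""
    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))

cursor = RecordingCursor()
response = {"elements": [{"clicks": 11, "impressions": 2453}], "paging": {"count": 10}}
# One row per entry in "elements"; my_table and my_column are placeholder names.
for element in response["elements"]:
    insert_variant(cursor, "my_table", "my_column", element)
print(cursor.calls)
```

With a real connection, the cursor would come from the connector's connect(...).cursor() call instead of RecordingCursor.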

Converting a nested JSON to CSV in Python

I would like to convert nested JSON to a CSV file.
I am receiving the JSON from a REST API.
The fields in the CSV should look like the following:
daterange_start,daterange_end,clicks,impressions,pivotvalues.
I am new to Python and JSON, so I would love to get some help.
Here is the sample JSON.
{
"elements": [
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 3
},
"end": {
"month": 3,
"year": 2019,
"day": 3
}
},
"clicks": 11,
"impressions": 2453,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 7
},
"end": {
"month": 3,
"year": 2019,
"day": 7
}
},
"clicks": 1,
"impressions": 629,
"pivotValues": [
"urn:li:sponsoredCampaign:1234565"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 21
},
"end": {
"month": 3,
"year": 2019,
"day": 21
}
},
"clicks": 3,
"impressions": 154,
"pivotValues": [
"urn:li:sponsoredCampaign:1323516"
]
}
],
"paging": {
"count": 10,
"start": 0,
"links": []
}
}
You could use json_normalize. The only issue is that "pivotValues" is a list, so I'm not sure what you'd want there, or whether those lists can contain more than one element. If it's always just one element, you can easily process that column. If it can have multiple elements, you can either create a new row for each element (meaning you have multiple rows with the same data except for different pivotValues), or you could extend each row to hold every pivotValues entry, but then you would have nulls wherever those lists have different lengths.
Seeing that the pivotValues all have the same prefix, I also split out that value for you in case you need it.
Given:
data = {
"elements": [
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 3
},
"end": {
"month": 3,
"year": 2019,
"day": 3
}
},
"clicks": 11,
"impressions": 2453,
"pivotValues": [
"urn:li:sponsoredCampaign:1234567"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 7
},
"end": {
"month": 3,
"year": 2019,
"day": 7
}
},
"clicks": 1,
"impressions": 629,
"pivotValues": [
"urn:li:sponsoredCampaign:1234565"
]
},
{
"dateRange": {
"start": {
"month": 3,
"year": 2019,
"day": 21
},
"end": {
"month": 3,
"year": 2019,
"day": 21
}
},
"clicks": 3,
"impressions": 154,
"pivotValues": [
"urn:li:sponsoredCampaign:1323516"
]
}
],
"paging": {
"count": 10,
"start": 0,
"links": []
}
}
Code:
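import pandas as pd
# json_normalize is available as pd.json_normalize in pandas 1.0 and later
df = pd.json_normalize(data['elements'])
# Each pivotValues list has one element here, so this expands to a single column
df['pivotValues'] = df.pivotValues.apply(pd.Series).add_prefix('pivotValues_')
# Keep only the part after the last colon
df['pivotValues_stripped'] = df['pivotValues'].str.rsplit(':', n=1, expand=True)[1]
df.to_csv('path/filename.csv', index=False)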
Output:
print(df.to_string())
clicks dateRange.end.day dateRange.end.month dateRange.end.year dateRange.start.day dateRange.start.month dateRange.start.year impressions pivotValues pivotValues_stripped
0 11 3 3 2019 3 3 2019 2453 urn:li:sponsoredCampaign:1234567 1234567
1 1 7 3 2019 7 3 2019 629 urn:li:sponsoredCampaign:1234565 1234565
2 3 21 3 2019 21 3 2019 154 urn:li:sponsoredCampaign:1323516 1323516
You can load and parse the json in python with:
import json
y = json.loads(x)
y will be a Python dict (x being the JSON string). Now loop over y['elements'] and build a list with your desired fields. For example, to extract the year of the start and end dates:
list_for_csv = []
for e in y['elements']:
    list_for_csv.append([e['dateRange']['start']['year'], e['dateRange']['end']['year']])
Then use numpy to save as csv:
import numpy as np
for_csv = np.asarray(list_for_csv)
np.savetxt("your_file.csv", for_csv, delimiter=",")
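For the exact columns asked for in the question, the standard csv module is another option, since it also copes with text fields such as pivotValues. This is a minimal sketch under a few assumptions: dates are written as YYYY-MM-DD, multiple pivotValues are joined with ";", and the helper names iso_date and elements_to_csv are made up. It writes to an in-memory buffer so it runs as-is.

```python
import csv
import io

def iso_date(d):
    """Format a {'year', 'month', 'day'} dict as YYYY-MM-DD."""
    return f"{d['year']:04d}-{d['month']:02d}-{d['day']:02d}"

def elements_to_csv(elements, out):
    """Write one CSV row per element to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["daterange_start", "daterange_end", "clicks", "impressions", "pivotvalues"])
    for e in elements:
        writer.writerow([
            iso_date(e["dateRange"]["start"]),
            iso_date(e["dateRange"]["end"]),
            e["clicks"],
            e["impressions"],
            ";".join(e["pivotValues"]),  # assumption: join multiple values with ';'
        ])

# One element from the sample in the question.
y = {"elements": [{
    "dateRange": {"start": {"month": 3, "year": 2019, "day": 3},
                  "end": {"month": 3, "year": 2019, "day": 3}},
    "clicks": 11, "impressions": 2453,
    "pivotValues": ["urn:li:sponsoredCampaign:1234567"],
}]}

buffer = io.StringIO()  # swap for open("your_file.csv", "w", newline="") to write a file
elements_to_csv(y["elements"], buffer)
print(buffer.getvalue())
```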
