Python: group JSON data by the same last name - python

Newbie here. I have JSON data that contains a full name, age, Country and Department for each person. Using Python, how can I generate new JSON with the last name as the key, where each entry contains the total number of people who share that last name, a list of ages, a list of countries and a list of departments?
JSON data:
{
"John Jane": {
"age": 30,
"Country": "Denmark",
"Department": "Marketing"
},
"Gennie Jane": {
"age": 45,
"Country": "New Zealand",
"Department": "Finance"
},
"Mark Michael": {
"age": 55,
"Country": "Australia",
"Department": "HR"
},
"Jenny Jane": {
"age": 45,
"Country": "United States",
"Department": "IT"
},
"Jane Michael": {
"age": 27,
"Country": "United States",
"Department": "HR"
},
"Scofield Michael": {
"age": 37,
"Country": "England",
"Department": "HR"
}
}
Expected Result:
{
"Michael": {
"count": 3, // number of people that have same last name,
"age": {
"age1": 55,
"age2": 27,
"age3": 37
},
"Country": {
"Country1":"Australia",
"Country2":"United States",
"Country3":"England"
},
"Department": {
"Department1": "HR",
"Department2": "HR",
"Department3": "HR"
},
...
...
...
}
}

In my view, using a dict for 'age', 'Country' or 'Department' is unnecessary and more complicated; a list is a better fit.
import json
text = """{
"John Jane": {
"age": 30,
"Country": "Denmark",
"Department": "Marketing"
},
"Gennie Jane": {
"age": 45,
"Country": "New Zealand",
"Department": "Finance"
},
"Mark Michael": {
"age": 55,
"Country": "Australia",
"Department": "HR"
},
"Jenny Jane": {
"age": 45,
"Country": "United States",
"Department": "IT"
},
"Jane Michael": {
"age": 27,
"Country": "United States",
"Department": "HR"
},
"Scofield Michael": {
"age": 37,
"Country": "England",
"Department": "HR"
}
}"""
dictionary = json.loads(text)
result = {}
for key, value in dictionary.items():
    # The last name is taken as the second word of the full name.
    last_name = key.split()[1]
    if last_name in result:
        # Last name already seen: bump the count and extend each list.
        result[last_name]['count'] += 1
        result[last_name]['age'].append(value['age'])
        result[last_name]['Country'].append(value['Country'])
        result[last_name]['Department'].append(value['Department'])
    else:
        # First person with this last name: start a new entry.
        result[last_name] = {'count':1, 'age':[value['age']], 'Country':[value['Country']], 'Department':[value['Department']]}
print(result)
{'Jane': {'count': 3, 'age': [30, 45, 45], 'Country': ['Denmark', 'New Zealand', 'United States'], 'Department': ['Marketing', 'Finance', 'IT']}, 'Michael': {'count': 3, 'age': [55, 27, 37], 'Country': ['Australia', 'United States', 'England'], 'Department': ['HR', 'HR', 'HR']}}
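The same grouping can also be sketched with collections.defaultdict, which removes the if/else branch. This is only an illustration: the function name group_by_last_name and the shortened sample data are made up for the example, and it assumes the last word of the full name is the last name (so it uses split()[-1] rather than split()[1], which also copes with middle names).

```python
import json
from collections import defaultdict


def group_by_last_name(people):
    """Group {full_name: record} entries by the last word of the full name."""
    # Each new last name starts with a zero count and empty lists.
    groups = defaultdict(
        lambda: {"count": 0, "age": [], "Country": [], "Department": []}
    )
    for full_name, record in people.items():
        last_name = full_name.split()[-1]
        group = groups[last_name]
        group["count"] += 1
        for field in ("age", "Country", "Department"):
            group[field].append(record[field])
    return dict(groups)


# Shortened, illustrative sample in the same shape as the question's data.
sample = {
    "John Jane": {"age": 30, "Country": "Denmark", "Department": "Marketing"},
    "Mark Michael": {"age": 55, "Country": "Australia", "Department": "HR"},
    "Jane Michael": {"age": 27, "Country": "United States", "Department": "HR"},
}

grouped = group_by_last_name(sample)
print(json.dumps(grouped, indent=2))
```

json.dumps turns the result back into a JSON string if a file or string is needed rather than a Python dict.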

Related

To delete dictionaries with duplicate values for keys in two dictionary lists

I have two lists of dictionaries, l and l2. I want to remove from l2 every dictionary whose "name" value also appears in a dictionary in l.
How do I do this?
l = [
{
"id": 1,
"name": "John"
},
{
"id": 2,
"name": "Tom"
}
]
l2 = [
{
"name": "John",
"gender": "male",
"country": "USA"
},
{
"name": "Alex",
"gender": "male",
"country": "Canada"
},
{
"name": "Sofía",
"gender": "female",
"country": "Mexico"
},
]
Desired result:
[
{
"name": "Alex",
"gender": "male",
"country": "Canada"
},
{
"name": "Sofía",
"gender": "female",
"country": "Mexico"
},
]
Try:
>>> [d for d in l2 if d["name"] not in [d1["name"] for d1 in l]]
[{'name': 'Alex', 'gender': 'male', 'country': 'Canada'},
{'name': 'Sofía', 'gender': 'female', 'country': 'Mexico'}]
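The one-liner above rebuilds the inner list of names for every element of l2. For larger lists, a variant that collects the names into a set once may be preferable. The following is a sketch under that assumption; the helper name drop_known_names is made up for the example.

```python
def drop_known_names(candidates, known):
    """Return the dicts in candidates whose "name" does not occur in known."""
    # Build the lookup set once so each membership test is cheap.
    known_names = {item["name"] for item in known}
    return [item for item in candidates if item["name"] not in known_names]


l = [{"id": 1, "name": "John"}, {"id": 2, "name": "Tom"}]
l2 = [
    {"name": "John", "gender": "male", "country": "USA"},
    {"name": "Alex", "gender": "male", "country": "Canada"},
    {"name": "Sofía", "gender": "female", "country": "Mexico"},
]

filtered = drop_known_names(l2, l)
print(filtered)
```

The original lists are left untouched; the function returns a new list.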

Using Pandas for JSON to Excel conversion is putting the entire JSON file in 1 Excel Cell

I am making a call to the Google Places Autocomplete API, which returns its data as JSON (it shows up as a JSON-formatted page in a browser tab).
I use Beautiful Soup to get this data. I then write it to a file and read it with pandas, but the output is not what I expect.
newUrl = webUrl+'json?input='+searchString+'&offset=0'+'&components=country:us&key='+apiKey
li = r"C:/Users/thebr/iCloudDrive/code/python/captones/capstoneData.json"
r = requests.get(newUrl)
html_page = r.content
soup = BeautifulSoup(html_page, 'html.parser')
newDictionary = json.loads(str(soup))
out_file = open("python/captones/nydata.json", "w")
json.dump(newDictionary, out_file)
out_file.close()
df = pd.read_json("python/captones/nydata.json")
df.to_excel('output.xlsx', index=False)
My output puts the entire JSON file into 1 cell. Do I have to add some parameters?
EDIT: Here's an example JSON that the API provides:
{
"predictions": [{
"description": "COVID-19 vaccine location - Stony Brook University, Nicolls Road, Stony Brook, NY, USA",
"matched_substrings": [{
"length": 5,
"offset": 0
}],
"place_id": "ChIJw-RIqTo_6IkRnW9u_u9x1b8",
"reference": "ChIJw-RIqTo_6IkRnW9u_u9x1b8",
"structured_formatting": {
"main_text": "COVID-19 vaccine location - Stony Brook University",
"main_text_matched_substrings": [{
"length": 5,
"offset": 0
}],
"secondary_text": "Nicolls Road, Stony Brook, NY, USA"
},
"terms": [{
"offset": 0,
"value": "COVID-19 vaccine location - Stony Brook University"
}, {
"offset": 52,
"value": "Nicolls Road"
}, {
"offset": 66,
"value": "Stony Brook"
}, {
"offset": 79,
"value": "NY"
}, {
"offset": 83,
"value": "USA"
}],
"types": ["health", "point_of_interest", "establishment"]
}, {
"description": "COVID-19 Vaccine Location - Meadowlands Racing & Entertainment, Racetrack Dr, East Rutherford, NJ, USA",
"matched_substrings": [{
"length": 5,
"offset": 0
}],
"place_id": "ChIJ5TThhWP4wokRQ9TLrhSPQRQ",
"reference": "ChIJ5TThhWP4wokRQ9TLrhSPQRQ",
"structured_formatting": {
"main_text": "COVID-19 Vaccine Location - Meadowlands Racing & Entertainment",
"main_text_matched_substrings": [{
"length": 5,
"offset": 0
}],
"secondary_text": "Racetrack Dr, East Rutherford, NJ, USA"
},
"terms": [{
"offset": 0,
"value": "COVID-19 Vaccine Location - Meadowlands Racing & Entertainment"
}, {
"offset": 64,
"value": "Racetrack Dr"
}, {
"offset": 78,
"value": "East Rutherford"
}, {
"offset": 95,
"value": "NJ"
}, {
"offset": 99,
"value": "USA"
}],
"types": ["health", "point_of_interest", "establishment"]
}, {
"description": "COVID-19 Drive-thru Testing at Walgreens, New York 112, Medford, NY, USA",
"matched_substrings": [{
"length": 5,
"offset": 0
}],
"place_id": "ChIJVVUVMLpI6IkRb0XzqEXttHQ",
"reference": "ChIJVVUVMLpI6IkRb0XzqEXttHQ",
"structured_formatting": {
"main_text": "COVID-19 Drive-thru Testing at Walgreens",
"main_text_matched_substrings": [{
"length": 5,
"offset": 0
}],
"secondary_text": "New York 112, Medford, NY, USA"
},
"terms": [{
"offset": 0,
"value": "COVID-19 Drive-thru Testing at Walgreens"
}, {
"offset": 42,
"value": "New York 112"
}, {
"offset": 56,
"value": "Medford"
}, {
"offset": 65,
"value": "NY"
}, {
"offset": 69,
"value": "USA"
}],
"types": ["pharmacy", "health", "point_of_interest", "store", "establishment"]
}, {
"description": "COVID-19 Vaccine Location - Medgar Evers College, Crown Street, Brooklyn, NY, USA",
"matched_substrings": [{
"length": 5,
"offset": 0
}],
"place_id": "ChIJqQOySnFbwokRcormV9Q7bvA",
"reference": "ChIJqQOySnFbwokRcormV9Q7bvA",
"structured_formatting": {
"main_text": "COVID-19 Vaccine Location - Medgar Evers College",
"main_text_matched_substrings": [{
"length": 5,
"offset": 0
}],
"secondary_text": "Crown Street, Brooklyn, NY, USA"
},
"terms": [{
"offset": 0,
"value": "COVID-19 Vaccine Location - Medgar Evers College"
}, {
"offset": 50,
"value": "Crown Street"
}, {
"offset": 64,
"value": "Brooklyn"
}, {
"offset": 74,
"value": "NY"
}, {
"offset": 78,
"value": "USA"
}],
"types": ["health", "establishment"]
}, {
"description": "COVID-19 Drive-Thru Testing at Walgreens, West Main Street, Patchogue, NY, USA",
"matched_substrings": [{
"length": 5,
"offset": 0
}],
"place_id": "ChIJVVVlOxpJ6IkRDcNMmDf-qi8",
"reference": "ChIJVVVlOxpJ6IkRDcNMmDf-qi8",
"structured_formatting": {
"main_text": "COVID-19 Drive-Thru Testing at Walgreens",
"main_text_matched_substrings": [{
"length": 5,
"offset": 0
}],
"secondary_text": "West Main Street, Patchogue, NY, USA"
},
"terms": [{
"offset": 0,
"value": "COVID-19 Drive-Thru Testing at Walgreens"
}, {
"offset": 42,
"value": "West Main Street"
}, {
"offset": 60,
"value": "Patchogue"
}, {
"offset": 71,
"value": "NY"
}, {
"offset": 75,
"value": "USA"
}],
"types": ["health", "point_of_interest", "establishment"]
}],
"status": "OK"
}
Also to add, this is how the output looks after converting to Excel: (screenshot not preserved)
And this is how it looks using an online JSON to Excel converter (what I want): (screenshot not preserved)
It can help to select only the predictions object and then load that with pandas. Flattening the output further will need additional work; for an example, see "Conversion from nested json to csv with pandas".
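import json
import pandas as pd
# json.load expects an open file object, not a path string.
with open("python/captones/nydata.json") as f:
    dic = json.load(f)
dic_predictions = dic["predictions"]
# dic_predictions is already a list of dicts, so build the DataFrame directly.
df = pd.DataFrame(dic_predictions)
df.to_excel('output.xlsx', index=False)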
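As a rough idea of what the "additional work" for flattening could look like, here is a small standard-library sketch. The flatten helper, the dotted column names and the sample response are all made up for illustration (the sample only mimics the shape of the API output in the question); nested lists are simply kept as JSON strings, which may or may not be what you want in a spreadsheet.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys.

    Lists are not expanded; they are stored as JSON strings.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


# Made-up response with the same overall shape as the question's JSON.
response = {
    "predictions": [
        {
            "description": "Example Place, Some Road, NY, USA",
            "place_id": "example-id-1",
            "structured_formatting": {
                "main_text": "Example Place",
                "secondary_text": "Some Road, NY, USA",
            },
            "types": ["health", "establishment"],
        }
    ],
    "status": "OK",
}

rows = [flatten(prediction) for prediction in response["predictions"]]
print(rows[0])
```

A list of flat dicts like rows can be passed to pd.DataFrame(rows) before calling to_excel, so each key becomes its own column. pandas also offers pd.json_normalize for this kind of flattening.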

How to combine dups from dictionary with Python [duplicate]

I have this list of dictionaries:
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
I want to be able to find the duplicates of ingredients (by either name or id). If there are duplicates and have the same unit_of_measurement, combine them into one dictionary and add the quantity accordingly. So the above data should return:
[
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
How do I go about it?
Assuming you have a dictionary represented like this:
data = {
"ingredients": [
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Pound (Lb)", "id": 13},
"quantity": "1/2",
"ingredient": {"name": "Balsamic Vinegar", "id": 12},
},
{
"unit_of_measurement": {"name": "Tablespoon", "id": 15},
"ingredient": {"name": "Basil Leaves", "id": 14},
"quantity": "3",
},
]
}
What you could do is use a collections.defaultdict of lists to group the ingredients by a (name, id) grouping key:
from collections import defaultdict
ingredient_groups = defaultdict(list)
for ingredient in data["ingredients"]:
    key = tuple(ingredient["ingredient"].items())
    ingredient_groups[key].append(ingredient)
Then you could go through the grouped values of this defaultdict and sum the fractional quantities using fractions.Fraction. For unit_of_measurement and ingredient, we could probably just use the values from the first item in each group.
from fractions import Fraction
result = [
    {
        "unit_of_measurement": value[0]["unit_of_measurement"],
        "quantity": str(sum(Fraction(ingredient["quantity"]) for ingredient in value)),
        "ingredient": value[0]["ingredient"],
    }
    for value in ingredient_groups.values()
]
Which will then give you this result:
[{'ingredient': {'id': 12, 'name': 'Balsamic Vinegar'},
'quantity': '1',
'unit_of_measurement': {'id': 13, 'name': 'Pound (Lb)'}},
{'ingredient': {'id': 14, 'name': 'Basil Leaves'},
'quantity': '3',
'unit_of_measurement': {'id': 15, 'name': 'Tablespoon'}}]
You'll probably need to amend the above to account for the same ingredient appearing with different units of measurement, but this should get you started.
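One possible way to handle different units, sketched here as an assumption rather than a tested solution for your data, is to make the unit part of the grouping key, so that only entries with the same ingredient id and the same unit id are summed. The function name combine_ingredients and the extra "Tablespoon" entry for Balsamic Vinegar are made up for the example.

```python
from fractions import Fraction


def combine_ingredients(ingredients):
    """Sum quantities of entries sharing both ingredient id and unit id."""
    totals = {}
    for item in ingredients:
        key = (item["ingredient"]["id"], item["unit_of_measurement"]["id"])
        if key not in totals:
            # First time this (ingredient, unit) pair is seen.
            totals[key] = {
                "unit_of_measurement": item["unit_of_measurement"],
                "quantity": Fraction(0),
                "ingredient": item["ingredient"],
            }
        totals[key]["quantity"] += Fraction(item["quantity"])
    # Convert the summed Fraction back to a string such as "1" or "3/2".
    return [
        {**entry, "quantity": str(entry["quantity"])}
        for entry in totals.values()
    ]


pound = {"name": "Pound (Lb)", "id": 13}
tablespoon = {"name": "Tablespoon", "id": 15}
vinegar = {"name": "Balsamic Vinegar", "id": 12}
basil = {"name": "Basil Leaves", "id": 14}

ingredients = [
    {"unit_of_measurement": pound, "quantity": "1/2", "ingredient": vinegar},
    {"unit_of_measurement": pound, "quantity": "1/2", "ingredient": vinegar},
    # Hypothetical extra entry: same ingredient, different unit.
    {"unit_of_measurement": tablespoon, "quantity": "2", "ingredient": vinegar},
    {"unit_of_measurement": tablespoon, "quantity": "3", "ingredient": basil},
]

combined = combine_ingredients(ingredients)
print(combined)
```

Here the two pound entries for Balsamic Vinegar are merged, while the tablespoon entry for the same ingredient stays separate.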

Convert a multi index dataframe to json

Here is the multi index data frame
   accounting                                       sales
   PhNumber     age  firstName     lastName     PhNumber     age  firstName     lastName
0  <PH_Number>  29   <first_Name>  <last_Name>  <PH_Number>  29   <first_Name>  <last_Name>
1  <PH_Number>  38   <first_Name>  <last_Name>  <PH_Number>  48   <first_Name>  <last_Name>
How do I convert this to proper JSON?
I have tried the DataFrame's to_json() method,
but couldn't get the desired output, which looks like this:
{ "accounting": [{"firstName": <first_name>,
"lastName": <last_name>,
"age": 29,
"PhNumber": <PH_Number>},
{"firstName": <first_name>,
"lastName": "<last_name>",
"age": 38,
"PhNumber": <PH_Number>}],
"sales": [{"firstName": "<first_name>",
"lastName": "<last_name>",
"age": 29,
"PhNumber": <PH_Number>},
{"firstName": "<first_name>",
"lastName": "<last_name>",
"age": 48,
"PhNumber": <PH_Number>}]}
What you ask is beyond the possibilities of to_json, so you should first compute the Python data structure and then convert it to JSON:
data_struct = {k: df[k].to_dict(orient='records') for k in df.columns.levels[0]}
You can then easily build a JSON file (or string):
import json
print(json.dumps(data_struct, indent=2))
gives:
{
"accounting": [
{
"PhNumber": "<PH_Number>",
"age": 29,
"firstName": "<first_Name>",
"lastName": "<last_Name>"
},
{
"PhNumber": "<PH_Number>",
"age": 38,
"firstName": "<first_Name>",
"lastName": "<last_Name>"
}
],
"sales": [
{
"PhNumber": "<PH_Number>",
"age": 29,
"firstName": "<first_Name>",
"lastName": "<last_Name>"
},
{
"PhNumber": "<PH_Number>",
"age": 48,
"firstName": "<first_Name>",
"lastName": "<last_Name>"
}
]
}
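For anyone who wants to try this end to end, here is a small self-contained sketch. The DataFrame is rebuilt with pd.MultiIndex.from_product, and the phone numbers and names are placeholder values invented for the example, since the question only shows masked data.

```python
import json

import pandas as pd

# Two-level columns: department on top, person fields below.
columns = pd.MultiIndex.from_product(
    [["accounting", "sales"], ["PhNumber", "age", "firstName", "lastName"]]
)

# Placeholder rows; the real values are masked in the question.
df = pd.DataFrame(
    [
        ["555-0100", 29, "Ann", "Lee", "555-0101", 29, "Bob", "Ray"],
        ["555-0102", 38, "Cy", "Dow", "555-0103", 48, "Di", "Fox"],
    ],
    columns=columns,
)

# Selecting a top-level key yields a plain DataFrame for that department,
# which to_dict(orient="records") turns into a list of row dicts.
data_struct = {
    dept: df[dept].to_dict(orient="records") for dept in df.columns.levels[0]
}

json_text = json.dumps(data_struct, indent=2)
print(json_text)
```

Note that df.columns.levels[0] returns the top-level labels in sorted order, which happens to match the column order here.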

Adding new pairs to a json file

I have a JSON file that I need to add name/value pairs to. I convert it into a dict, but now I need to put my new values in a specific place.
This is part of the JSON file I convert:
"rootObject": {
"id": "6ff0010c-00fe-485b-b695-4ddd6aca4dcd",
"type": "IDO_GEAR",
"children": [
{
"id": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab",
"type": "IDO_SYSTEM_LOADCASE",
"children": [],
"childList": "SYSTEMLOADCASE",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab"
},
{
"name": "IDCO_DESIGNATION",
"value": "Lastfall 1"
},
{
"name": "IDSLC_TIME_PORTION",
"value": 100
},
{
"name": "IDSLC_DISTANCE_PORTION",
"value": 100
},
{
"name": "IDSLC_OPERATING_TIME_IN_HOURS",
"value": 1
},
{
"name": "IDSLC_OPERATING_TIME_IN_SECONDS",
"value": 3600
},
{
"name": "IDSLC_OPERATING_REVOLUTIONS",
"value": 1
},
{
"name": "IDSLC_OPERATING_DISTANCE",
"value": 1
},
{
"name": "IDSLC_ACCELERATION",
"value": 9.81
},
{
"name": "IDSLC_EPSILON_X",
"value": 0
},
{
"name": "IDSLC_EPSILON_Y",
"value": 0
},
{
"name": "IDSLC_EPSILON_Z",
"value": 0
},
{
"name": "IDSLC_CALCULATION_WITH_OWN_WEIGHT",
"value": "CO_CALCULATION_WITHOUT_OWN_WEIGHT"
},
{
"name": "IDSLC_CALCULATION_WITH_TEMPERATURE",
"value": "CO_CALCULATION_WITH_TEMPERATURE"
},
{
"name": "IDSLC_FLAG_FOR_LOADCASE_CALCULATION",
"value": "LB_CALCULATE_LOADCASE"
},
{
"name": "IDSLC_STATUS_OF_LOADCASE_CALCULATION",
"value": false
}
I want to add something like ENTRY_ONE and ENTRY_TWO, like this:
"rootObject": {
"id": "6ff0010c-00fe-485b-b695-4ddd6aca4dcd",
"type": "IDO_GEAR",
"children": [
{
"id": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab",
"type": "IDO_SYSTEM_LOADCASE",
"children": [],
"childList": "SYSTEMLOADCASE",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab"
},
{
"name": "IDCO_DESIGNATION",
"value": "Lastfall 1"
},
{
"name": "IDSLC_TIME_PORTION",
"value": 100
},
{
"name": "IDSLC_DISTANCE_PORTION",
"value": 100
},
{
"name": "ENTRY_ONE",
"value": 100
},
{
"name": "ENTRY_TWO",
"value": 55
},
{
"name": "IDSLC_OPERATING_TIME_IN_HOURS",
"value": 1
},
{
"name": "IDSLC_OPERATING_TIME_IN_SECONDS",
"value": 3600
},
{
"name": "IDSLC_OPERATING_REVOLUTIONS",
"value": 1
},
{
"name": "IDSLC_OPERATING_DISTANCE",
"value": 1
},
{
"name": "IDSLC_ACCELERATION",
"value": 9.81
},
{
"name": "IDSLC_EPSILON_X",
"value": 0
},
{
"name": "IDSLC_EPSILON_Y",
"value": 0
},
{
"name": "IDSLC_EPSILON_Z",
"value": 0
},
{
"name": "IDSLC_CALCULATION_WITH_OWN_WEIGHT",
"value": "CO_CALCULATION_WITHOUT_OWN_WEIGHT"
},
{
"name": "IDSLC_CALCULATION_WITH_TEMPERATURE",
"value": "CO_CALCULATION_WITH_TEMPERATURE"
},
{
"name": "IDSLC_FLAG_FOR_LOADCASE_CALCULATION",
"value": "LB_CALCULATE_LOADCASE"
},
{
"name": "IDSLC_STATUS_OF_LOADCASE_CALCULATION",
"value": false
}
How can I add the entries so that they are under the IDCO_IDENTIFICATION tag, and not under the rootObject?
The way I see your JSON file, it WOULD be under rootObject, as EVERYTHING is under that key. There are quite a few closing brackets and braces missing.
So I can only assume you are meaning you want it directly under IDCO_IDENTIFICATION (which is nested under rootObject). But that doesn't match what you have as your example output either. You have the new ENTRY_ONE and ENTRY_TWO within the properties, within the children, within the rootObject, not "under" IDCO_IDENTIFICATION. So I'm going to follow what you are asking for from your example output.
import json
with open('C:/test.json') as f:
    data = json.load(f)
# New name/value pairs to add to the first child's properties.
new_elements = [{"name":"ENTRY_ONE", "value":100},
                {"name":"ENTRY_TWO", "value":55}]
# append() puts each new entry at the end of the properties list.
for each in new_elements:
    data['rootObject']['children'][0]['properties'].append(each)
with open('C:/test.json', 'w') as f:
    json.dump(data, f)
Output:
import pprint
pprint.pprint(data)
{'rootObject': {'children': [{'childList': 'SYSTEMLOADCASE',
'children': [],
'id': '1dd94d1a-e52d-40b3-a82b-6db02a8fbbab',
'properties': [{'name': 'IDCO_IDENTIFICATION',
'value': '1dd94d1a-e52d-40b3-a82b-6db02a8fbbab'},
{'name': 'IDCO_DESIGNATION',
'value': 'Lastfall 1'},
{'name': 'IDSLC_TIME_PORTION',
'value': 100},
{'name': 'IDSLC_DISTANCE_PORTION',
'value': 100},
{'name': 'IDSLC_OPERATING_TIME_IN_HOURS',
'value': 1},
{'name': 'IDSLC_OPERATING_TIME_IN_SECONDS',
'value': 3600},
{'name': 'IDSLC_OPERATING_REVOLUTIONS',
'value': 1},
{'name': 'IDSLC_OPERATING_DISTANCE',
'value': 1},
{'name': 'IDSLC_ACCELERATION',
'value': 9.81},
{'name': 'IDSLC_EPSILON_X',
'value': 0},
{'name': 'IDSLC_EPSILON_Y',
'value': 0},
{'name': 'IDSLC_EPSILON_Z',
'value': 0},
{'name': 'IDSLC_CALCULATION_WITH_OWN_WEIGHT',
'value': 'CO_CALCULATION_WITHOUT_OWN_WEIGHT'},
{'name': 'IDSLC_CALCULATION_WITH_TEMPERATURE',
'value': 'CO_CALCULATION_WITH_TEMPERATURE'},
{'name': 'IDSLC_FLAG_FOR_LOADCASE_CALCULATION',
'value': 'LB_CALCULATE_LOADCASE'},
{'name': 'IDSLC_STATUS_OF_LOADCASE_CALCULATION',
'value': False},
{'name': 'ENTRY_ONE',
'value': 100},
{'name': 'ENTRY_TWO',
'value': 55}],
'type': 'IDO_SYSTEM_LOADCASE'}],
'id': '6ff0010c-00fe-485b-b695-4ddd6aca4dcd',
'type': 'IDO_GEAR'}}
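Note that append places the new entries at the end of the properties list, whereas the example output in the question shows them right after IDSLC_DISTANCE_PORTION. If the position matters, one option is to look up the index of that property and insert there with a slice assignment. This is a sketch only: the helper name insert_after is made up, and the data below is a cut-down stand-in for the real file.

```python
def insert_after(properties, anchor_name, new_items):
    """Insert new_items directly after the property named anchor_name."""
    for index, prop in enumerate(properties):
        if prop["name"] == anchor_name:
            # Slice assignment inserts all new items at once, in order.
            properties[index + 1:index + 1] = new_items
            return
    raise ValueError(f"no property named {anchor_name!r}")


# Cut-down stand-in for the structure loaded with json.load.
data = {
    "rootObject": {
        "children": [
            {
                "properties": [
                    {"name": "IDSLC_TIME_PORTION", "value": 100},
                    {"name": "IDSLC_DISTANCE_PORTION", "value": 100},
                    {"name": "IDSLC_OPERATING_TIME_IN_HOURS", "value": 1},
                ]
            }
        ]
    }
}

new_elements = [
    {"name": "ENTRY_ONE", "value": 100},
    {"name": "ENTRY_TWO", "value": 55},
]

properties = data["rootObject"]["children"][0]["properties"]
insert_after(properties, "IDSLC_DISTANCE_PORTION", new_elements)
names = [prop["name"] for prop in properties]
print(names)
```

After this, the modified data can be written back with json.dump exactly as in the answer above.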
