How to extract nested JSON data? - python

I am trying to get a value from some JSON data. I have successfully traversed deep into the JSON data and almost have what I need!
Running this command in Python:
autoscaling_name = response['Reservations'][0]['Instances'][0]['Tags']
gives me this:
'Tags': [{'Key': 'Name', 'Value': 'Trove-Dev-Inst : App WebServer'}, {'Key': 'aws:autoscaling:groupName', 'Value': 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'}, {'Key': 'CodeDeployProvisioningDeploymentId', 'Value': 'd-4WTRTRTRT'}, {'Key': 'Environment', 'Value': 'ernie-dev'}]
I only want to get the value "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT". This is from the key "aws:autoscaling:groupName".
How can I further my command to only return the value "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT"?

Is this the full output? This is a dictionary containing a list of nested dictionaries, so you should treat it that way. Suppose it is called A:
A = {
    "Tags": [
        {
            "Key": "Name",
            "Value": "Trove-Dev-Inst : App WebServer"
        },
        {
            "Key": "aws:autoscaling:groupName",
            "Value": "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT"
        },
        {
            "Key": "CodeDeployProvisioningDeploymentId",
            "Value": "d-4WTRTRTRT"
        },
        {
            "Key": "Environment",
            "Value": "ernie-dev"
        }
    ]
}
You first address the object, then its key in the dictionary, then the index within the list, and finally the key for that nested dictionary:
print(A['Tags'][1]['Value'])
Output:
CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT
EDIT: Based on what you are getting, you should try:
autoscaling_name = response['Reservations'][0]['Instances'][0]['Tags'][1]['Value']

You could also use glom. It's great for deeply nested data and has many uses that make complicated nested tasks easy.
For example translating #Celius's answer:
from glom import glom
glom(A, 'Tags.1.Value')
Returns the same thing:
CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT
So to answer your original question you'd use:
glom(response, 'Reservations.0.Instances.0.Tags.1.Value')

The final code for this is -
tags = response['Reservations'][0]['Instances'][0]['Tags']
autoscaling_name = next(t["Value"] for t in tags if t["Key"] == "aws:autoscaling:groupName")
Because it matches on the key rather than the position, this still finds the correct value if the order of the tags in the JSON data changes.
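A minimal sketch of the key-based lookup above, run against a shortened copy of the tag list from the question. The `find_tag` helper and its `default` parameter are illustrative names, not part of boto3. Passing a default to `next` avoids a `StopIteration` when no tag matches.

```python
# Shortened copy of the tag list from the question.
tags = [
    {'Key': 'Name', 'Value': 'Trove-Dev-Inst : App WebServer'},
    {'Key': 'aws:autoscaling:groupName',
     'Value': 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'},
    {'Key': 'Environment', 'Value': 'ernie-dev'},
]


def find_tag(tags, key, default=None):
    """Return the Value of the first tag whose Key matches, else default."""
    return next((t['Value'] for t in tags if t['Key'] == key), default)


# Alternative: build a plain dict once if several tags are needed.
tag_map = {t['Key']: t['Value'] for t in tags}

autoscaling_name = find_tag(tags, 'aws:autoscaling:groupName')
print(autoscaling_name)
```

If more than one tag is needed, the `tag_map` dictionary is probably the more convenient form, since each lookup is then a plain `tag_map[key]`.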

For anyone struggling to get their heads around list comprehensions and iterators, the cherrypicker package (pip install --user cherrypicker) does this sort of thing for you pretty easily:
from cherrypicker import CherryPicker
tags = CherryPicker(response['Reservations'][0]['Instances'][0]['Tags'])
tags(Key="aws:autoscaling:groupName")[0]["Value"].get()
which gives you 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'. If you're expecting multiple values, omit the [0] to get back a list of all values that have an associated "aws:autoscaling:groupName" key.
This is probably all a bit overkill for your question, which can be solved easily with a simple list comprehension. But this approach might come in handy if you need to do more complicated things later, like matching on partial keys only (e.g. aws:* or something more complicated like a regular expression), or you need to filter based on the values in an intermediate layer of the nested object. This sort of task could lead to lots of complicated nested for loops or list comprehensions, whereas with CherryPicker it stays as a simple, potentially one-line command.
You can find out more about advanced usage at https://cherrypicker.readthedocs.io.

Related

Is there an efficient way to write data to a JSON file using a dictionary in Python?

I'm writing a program in Python to use an API that needs to get input from a JSON payload in a really specific way which is shown below. The poid element will contain a different number with each run of the program, the inventories element contains a list of dictionaries that I am trying to send to the API.
[
{
"poid":"22130",
"inventories":
[
{
"item": "SAMPLE-ITEM-1",
"mfgr": "SAMPLE-MANUFACTURER-1",
"quantity": "1",
"condition": "REF"
},
{
"item": "SAMPLE-ITEM-2",
"mfgr": "SAMPLE-MANUFACTURER-2",
"quantity": "3",
"condition": "REF"
}
]
}
]
The data I need to put into the file is stored in a dictionary and a list as shown below. For simplicity of this post, I'm showing what the dictionary and list would look like after another method creates them. I'm not sure if this is the most efficient way of storing this data when I'm having to write it to JSON.
pn_and_mfgr_dict = {'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1', 'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}
quantities = ["1","3"]
poid = 22130 #this will be different each run
If it makes sense from what I've written above, I need to generate a JSON file that looks like the first code block given the information from the second code block. The item at index 0 in the quantities list corresponds to the first key/value pair in the dictionary and so on. The "condition" value in the first code block will always have "REF" as its value for my use, but I need to also include that in the final payload that gets sent to the API. Since the part number and manufacturer dictionary will be a different length with each run, I also need this method to work regardless of how many values are in the dictionary. This dictionary and the quantities list will always be the same length though.
I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be, then reading the file when the for loop is done and sending it as the payload, but please correct me if there's a better way to do this, like storing everything in variables. I also have no experience with JSON, so I have attempted to use JSON libraries to accomplish this with no idea what I'm doing wrong. I can edit this with my attempts tonight but I wanted to post this as soon as possible.
Here is one possible solution:
import json
pn_and_mfgr_dict = {
'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1',
'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'
}
quantities = ['1', '3']
poid = 22130
payload = {
    'poid': poid,
    'inventories': [{
        'item': item,
        'mfgr': mfgr,
        'quantity': quantity,
        'condition': 'REF'
    } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
}
print(json.dumps(payload, indent=2))
The code above will result in:
{
"poid": 22130,
"inventories": [
{
"item": "SAMPLE-ITEM-1",
"mfgr": "SAMPLE-MANUFACTURER-1",
"quantity": "1",
"condition": "REF"
},
{
"item": "SAMPLE-ITEM-2",
"mfgr": "SAMPLE-MANUFACTURER-2",
"quantity": "3",
"condition": "REF"
}
]
}
Naturally, you can adjust that for multiple poids with something like this:
poids = [22130, 22131, 22132]
for poid in poids:
    # implement here the logic to get items and quantities for
    # each poid
    payload = {
        'poid': poid,
        'inventories': [{
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF'
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
    }
    print(json.dumps(payload, indent=2))
You will need to change it to use the corresponding items and quantities for each poid; I leave that as a starting point for you to implement.
Your second block is your input, so you could immediately start by writing a function that takes those inputs and returns a JSON string.
import json
from typing import Dict, List
def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    constructed_data = []  # TODO
    return json.dumps(constructed_data)
Then you could start working on using the inputs to construct the output data you desire. And you already know how to do it.
I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be
Yes, that's the way to do it.
Here's my version of the solution:
import json
from typing import Dict, List
def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    inventories = [
        {
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF',
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
    ]
    constructed_data = [
        {
            'poid': f'{poid}',
            'inventories': inventories,
        }
    ]
    return json.dumps(constructed_data)
import json
data = {'inventories': [{'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1'}, {'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}]}
quantities = ["1", "3"]
poid = 22130
# Add poid to data
data['poid'] = poid
# Add quantities to data
for item in data['inventories']:
    item['quantity'] = quantities.pop(0)
# Serializing json
json_object = json.dumps(data, indent=4)
print(json_object)
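The answers above print the JSON string, while the question also asks about writing it to a file. The following is a rough sketch of that last step, assuming the list-wrapped shape with a string poid from the first code block in the question. The `build_payload` name and the temporary file path are hypothetical, and the pairing of items with quantities relies on dictionaries preserving insertion order (Python 3.7+).

```python
import json
import os
import tempfile


def build_payload(pn_and_mfgr_dict, quantities, poid):
    """Build the list-wrapped payload shown in the question."""
    if len(pn_and_mfgr_dict) != len(quantities):
        raise ValueError('items and quantities must have the same length')
    inventories = [
        {'item': item, 'mfgr': mfgr, 'quantity': quantity, 'condition': 'REF'}
        for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
    ]
    return [{'poid': str(poid), 'inventories': inventories}]


pn_and_mfgr_dict = {
    'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1',
    'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2',
}
quantities = ['1', '3']
payload = build_payload(pn_and_mfgr_dict, quantities, 22130)

# Write to a file in the temp directory (hypothetical location), then read it back.
path = os.path.join(tempfile.gettempdir(), 'payload_example.json')
with open(path, 'w') as f:
    json.dump(payload, f, indent=2)
with open(path) as f:
    round_tripped = json.load(f)

print(round_tripped == payload)
```

If the API client accepts a Python object or a string directly, the file may not be needed at all; `json.dumps(payload)` would give the same content as a string.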

How to extract the given keys in one dictionary from another into a new dictionary (Python)

I'm trying to practice sets and dictionaries, and I keep finding myself stuck on this practice problem over and over.
For example if I have a dictionary like
employees =[
{
"name": "Jamie Mitchell",
"job": "Head Chef",
"city": "Toronto",
},
{
"name": "Michell Anderson",
"job": "Line Cook",
"city": "Mississauga",
}
]
How would I extract the second part of the dictionary from the first in order to only have the information on the right be in a new dictionary?
Quick Answer:
employees is a list of dictionaries so you can just directly index the list to get Michell:
newDict = employees[1]
More Detailed Answer:
Firstly, here is how you create a key-value pair in a dictionary:
dct = {}
dct['color'] = 'blue' # dct = {'color':'blue'}
Knowing this, all you would need to copy a dictionary is the keys and values. You can use the .keys(), .values(), and .items() methods of dictionaries to do this.
dct.keys() # returns a view of all the keys -> dict_keys(['color'])
dct.values() # returns a view of all the values -> dict_values(['blue'])
dct.items() # returns a view of all the pairs as tuples -> dict_items([('color', 'blue')])
There are other options to copy, as another user has mentioned; however, I strongly suggest you get used to working with the 3 methods listed above. If you haven't already, make sure you are really comfortable with lists before you jump into dictionaries and combined structures. You already seem to know how to work with loops, so hopefully this is helpful enough. Good luck!
You have them backwards; the outer one [] is a list. The inner ones {} are dictionaries.
You can get the second one with employees[1] (indexing starts from 0) or the last one with employees[-1] (in this case they are the same).
If you need a copy, you can call .copy() on the result:
second_employee = employees[1]
copied_details = second_employee.copy()
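The title also asks about pulling only certain keys into a new dictionary. A small sketch of one way to do that with a dict comprehension follows; the `wanted_keys` list and the 'Ottawa' value are made-up examples.

```python
employees = [
    {"name": "Jamie Mitchell", "job": "Head Chef", "city": "Toronto"},
    {"name": "Michell Anderson", "job": "Line Cook", "city": "Mississauga"},
]

# Hypothetical selection of keys to keep.
wanted_keys = ["name", "city"]

second_employee = employees[1]

# New dictionary holding only the wanted keys that actually exist.
subset = {key: second_employee[key] for key in wanted_keys if key in second_employee}

# .copy() gives an independent (shallow) copy: changing it leaves the original alone.
copied_details = second_employee.copy()
copied_details["city"] = "Ottawa"

print(subset)
print(employees[1]["city"])
```

The `if key in second_employee` guard simply skips keys that are missing instead of raising a `KeyError`; drop it if a missing key should be treated as an error.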

How to iterate over a python dictionary, setting the key of the dictionary as another dictionary's value

I come from a C++ background, I am new to Python, and I suspect this problem has something to do with [im]mutability.
I am building a JSON representation in Python that involves several layers of nested lists and dictionaries in one "object". My goal is to call jsonify on the end result and have it look like nicely structured data.
I hit a problem while building out an object:
approval_groups_list = list()
approval_group_dict = dict()
for groupMemKey, groupvals in groupsAndMembersDict.items():
    approval_group_dict["group_name"] = groupMemKey
    approval_group_dict["name_dot_numbers"] = groupvals # groupvals is a list of strings
    approval_groups_list.append(approval_group_dict)
entity_approval_unit["approval_groups"] = approval_groups_list
The first iteration works as expected, but after that, every entry in the list mirrors whichever groupMemKey was processed last.
groupsAndMembersDict= {
'Art': ['string.1', 'string.2', 'string.3'],
'Math': ['string.10', 'string.20', 'string.30']
}
Expected result:
approval_groups:
[
{
"group_name": "Art",
"name_dot_numbers": ['string.1', 'string.2', 'string.3']
},
{
"group_name": "Math",
"name_dot_numbers": ['string.10', 'string.20', 'string.30']
}
]
Actual Result:
approval_groups:
[
{
"group_name": "Math",
"name_dot_numbers": ['string.10', 'string.20', 'string.30']
},
{
"group_name": "Math",
"name_dot_numbers": ['string.10', 'string.20', 'string.30']
}
]
What is happening, and how do I fix it?
Your problem is not the immutability, but the mutability of objects. I'm sure you would have ended up with the same result with the equivalent C++ code.
You construct approval_group_dict before the for loop and keep reusing it. All you have to do is to move the construction inside for so that a new object is created for each loop:
approval_groups_list = list()
for groupMemKey, groupvals in groupsAndMembersDict.items():
    approval_group_dict = dict()
    ...
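For reference, here is a complete, runnable sketch of that fix using the sample data from the question. The loop variable names are shortened for illustration; the only essential point is that a new dictionary is created on every iteration.

```python
groupsAndMembersDict = {
    'Art': ['string.1', 'string.2', 'string.3'],
    'Math': ['string.10', 'string.20', 'string.30'],
}

approval_groups_list = []
for group_name, members in groupsAndMembersDict.items():
    # A fresh dict per iteration, so each list entry is a distinct object.
    approval_group_dict = {
        'group_name': group_name,
        'name_dot_numbers': members,
    }
    approval_groups_list.append(approval_group_dict)

print(approval_groups_list)
```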
Through writing this question, it dawned on me to try a few things including this, which fixed my problem - however, I still don't know exactly why this works. Perhaps it is more like a pointer/referencing problem?
approval_groups_list = list()
approval_group_dict = dict()
for groupMemKey, groupvals in groupsAndMembersDict.items():
    approval_group_dict["group_name"] = groupMemKey
    approval_group_dict["name_dot_numbers"] = groupvals
    approval_groups_list.append(approval_group_dict.copy()) # <== note, here is the difference ".copy()"
entity_approval_unit["approval_groups"] = approval_groups_list
EDIT: The problem turns out to be that Python is pass by [object] reference all the time. If you are new to Python like me, this means: "pass by reference, except when the thing you are passing is immutable, then it behaves like pass by value". So in a way it did have to do with [im]mutability. Mostly it had to do with my lack of understanding of how Python passes references.
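A short sketch that makes the aliasing visible: `list.append` stores a reference to the object it is given, so appending the same dictionary twice produces two entries that are one and the same object. The variable names here are illustrative only.

```python
shared = {}
aliased = []
for name in ['Art', 'Math']:
    shared['group_name'] = name
    aliased.append(shared)  # stores a reference to the same dict each time

# Both entries are literally the same object, so both show the last value.
same_object = aliased[0] is aliased[1]

copied = []
for name in ['Art', 'Math']:
    shared['group_name'] = name
    copied.append(shared.copy())  # stores an independent snapshot each time

print(same_object, aliased)
print(copied)
```

Note that `.copy()` is shallow: in the original code the `name_dot_numbers` lists would still be shared with `groupsAndMembersDict`, which is harmless as long as nobody mutates them afterwards.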

Accessing Nested JSON [AWS Metadata] with Python

I'm using Lambda to run through my AWS account, returning a list of all instances. I need to be able to print out all of the 'VolumeId' values, but I can't work out how to access them as they are nested. I am able to print out the first VolumeId for each instance, however, some of the instances have several volumes, and some only have one. I think I know why I get these results, but I can't work out what to do to get all of them back.
Here's a snippet of what the JSON for one instance looks like:
{
'Groups':[],
'Instances':[
{
'AmiLaunchIndex':0,
'ImageId':'ami-0',
'InstanceId':'i-0123',
'InstanceType':'big',
'KeyName':'nonprod',
'LaunchTime':'date',
'Monitoring':{
'State':'disabled'
},
'Placement':{
'AvailabilityZone':'world',
'GroupName':'',
'Tenancy':'default'
},
'PrivateDnsName':'secret',
'PrivateIpAddress':'1.2.3.4',
'ProductCodes':[
],
'PublicDnsName':'',
'State':{
'Code':80,
'Name':'stopped'
},
'StateTransitionReason':'User initiated',
'SubnetId':'subnet-1',
'VpcId':'vpc-1',
'Architecture':'yes',
'BlockDeviceMappings':[
{
'DeviceName':'/sda',
'Ebs':{
'AttachTime':'date',
'DeleteOnTermination':True,
'Status':'attached',
'VolumeId':'vol-1'
}
},
{
'DeviceName':'/sdb',
'Ebs':{
'AttachTime':'date',
'DeleteOnTermination':False,
'Status':'attached',
'VolumeId':'vol-2'
}
}
],
This is what I'm doing to get the first VolumeId:
import boto3
ec2client = boto3.client('ec2')
ec2 = ec2client.describe_instances()
for reservation in ec2["Reservations"]:
    for instance in reservation["Instances"]:
        instanceid = instance["InstanceId"]
        volumes = instance["BlockDeviceMappings"][0]["Ebs"]["VolumeId"]
        print("The associated volume IDs for this instance are: ",(volumes))
I think the reason that I'm getting just the first ID is because I'm referencing the first element within "BlockDeviceMappings", but I can't work out how to get the other ones. If I try it without specifying the [0], I get the list indices must be integers or slices, not str error. I tried to use a dictionary instead of a list too, but felt like I was barking up the wrong tree with that one. Any suggestions/help would be appreciated!
One possible answer, not particularly pythonic
...
id_list = []
volumes_data = instance["BlockDeviceMappings"]
for element in volumes_data:
    id_list.append(element["Ebs"]["VolumeId"])
Or else use json.loads and then iterate through the JSON using .get syntax, like the final answer in this
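The same idea can be sketched end to end without calling AWS. The `response` dictionary below is a heavily trimmed stand-in for what `describe_instances()` returns, and the second instance (`i-0456`) is invented to show the no-volume case. The `'Ebs' in mapping` check is a defensive assumption in case a mapping has no EBS entry.

```python
# Trimmed stand-in for a describe_instances() response (real ones have many more fields).
response = {
    'Reservations': [
        {'Instances': [
            {'InstanceId': 'i-0123',
             'BlockDeviceMappings': [
                 {'DeviceName': '/sda', 'Ebs': {'VolumeId': 'vol-1'}},
                 {'DeviceName': '/sdb', 'Ebs': {'VolumeId': 'vol-2'}},
             ]},
            # Hypothetical instance with no attached volumes.
            {'InstanceId': 'i-0456', 'BlockDeviceMappings': []},
        ]},
    ]
}

volumes_by_instance = {}
for reservation in response['Reservations']:
    for instance in reservation['Instances']:
        # Loop over every mapping instead of indexing [0].
        volume_ids = [
            mapping['Ebs']['VolumeId']
            for mapping in instance.get('BlockDeviceMappings', [])
            if 'Ebs' in mapping
        ]
        volumes_by_instance[instance['InstanceId']] = volume_ids
        print(instance['InstanceId'], volume_ids)
```

With a real client, `response` would simply be the return value of `ec2client.describe_instances()` from the question.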

Update DynamoDB nested map within a set using conditional

I'm trying to update an item in DynamoDB that has somewhat complicated(?) data structure.
Item:
{
'user_id': 'abc123',
'groups': [
{
'group_id': 'Group1',
'games_won': [],
'games_lost': []
},
{
'group_id': 'Group2',
'games_won': [],
'games_lost': []
},
]
}
I am trying to append a string to games_won on a specific group_id. I am trying to use a conditional to avoid multiple db queries but I can't seem to figure out how to iterate over groups in my conditional.
Basically, I want to do this:
for g in groups:
    if g['group_id'] == 'Group2':
        g['games_won'].append('game12345')
Sorry for the complicated title. I'm a bit new to DynamoDB and NoSQL in general.
You could read the 'groups' attribute, change the data outside of the query, and when you're done write the whole thing back. That way, no matter how many groups you change, you always have just one read and one write action. The number of read or write capacity units consumed is of course related to the size of your 'groups' attribute.
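A rough sketch of the "change the data outside of the query" step, written as a plain function so it runs without AWS. The `append_game_won` name is hypothetical. With boto3, it would sit between a `Table.get_item(...)` read and a `Table.put_item(Item=...)` write, which are left out here.

```python
import copy


def append_game_won(item, group_id, game_id):
    """Return a copy of item with game_id appended to the matching group's games_won."""
    updated = copy.deepcopy(item)
    for group in updated['groups']:
        if group['group_id'] == group_id:
            group['games_won'].append(game_id)
            break
    else:
        # The loop finished without finding the group.
        raise KeyError(f'no group with id {group_id!r}')
    return updated


# Item in the shape shown in the question.
item = {
    'user_id': 'abc123',
    'groups': [
        {'group_id': 'Group1', 'games_won': [], 'games_lost': []},
        {'group_id': 'Group2', 'games_won': [], 'games_lost': []},
    ],
}

updated = append_game_won(item, 'Group2', 'game12345')
print(updated['groups'][1])
```

Working on a deep copy keeps the item that was read unchanged, which can be handy for comparing before and after; mutating in place would work just as well if that is not needed.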
