Recommended nested data structure for querying with python-lifter

I am scraping a collection of text documents and building a json object to query with python-lifter. I currently have data like
[
[
{name:dad},
{name:son, dob:2/24/2000}
],
[
{name:forever_alone, cats:12}
]
]
I would like to do two different queries based on the existence of the dob key: 1) to get son and 2) to get the family that contains the son (dad and son). As I understand it, a list of lists of dictionaries is not well supported in lifter. Suspending for a moment the issue that lifter does not yet allow queries on fields that are not on every record, what would be a better structure for lifter?
A list of dictionaries of dictionaries?
[
{
0:{name:dad},
1:{name:son, dob:2/24/2000}
},
{
0:{name:forever_alone, cats:12}
}
]
or a dictionary of lists of dictionaries?
{18283923:
[
{name:dad},
{name:son, dob:2/24/2000}
],
18283927:
[
{name:forever_alone, cats:12}
]
}
And, given an ideal nested data structure, what are the two queries that would return 1) the son and 2) the family containing the son?

[Disclaimer: lifter maintainer here]
This kind of request is not supported by lifter right now, because lifter tries to look up the queried fields on each object and raises an error if a field does not exist.
Support for querying against iterable fields is not good at the moment either.
An issue has been opened regarding the missing-fields problem, but in any case your data structure is not really suited to such queries.
A better data structure would be:
families = [
{
'id': 1,
'members': [
{'name': 'dad'},
{'name': 'son', 'dob':'2/24/2000'}
]
},
{
'id': 2,
'members': [
{'name': 'forever_alone', 'cats': 12}
]
}
]
Then, once the issues mentioned above have been resolved, you could query with something like:
Family = lifter.models.Model('Family')
manager = Family.load(families)
# get families with son/dob members
son_dob_families = manager.filter(Family.members.name == 'son', Family.members.dob.exists())\
    .values(Family.id, Family.members)
# keep only son members with dob
Member = lifter.models.Model('Member')
members = [member for family in son_dob_families for member in family['members']]
sons_with_dob = Member.load(members).filter(Member.name == 'son', Member.dob.exists())
This is a theoretical API though; it is not implemented yet.
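Until such an API exists, the same two lookups can be done with plain list comprehensions over the `families` structure above. This is only a sketch in standard Python, not lifter code; the names `has_dob`, `sons` and `son_families` are made up for illustration.

```python
families = [
    {
        'id': 1,
        'members': [
            {'name': 'dad'},
            {'name': 'son', 'dob': '2/24/2000'},
        ],
    },
    {
        'id': 2,
        'members': [
            {'name': 'forever_alone', 'cats': 12},
        ],
    },
]


def has_dob(member):
    # True when the record carries a 'dob' key at all
    return 'dob' in member


# 1) every member that has a dob (here: the son)
sons = [m for f in families for m in f['members'] if has_dob(m)]

# 2) every family with at least one member that has a dob
son_families = [f for f in families if any(has_dob(m) for m in f['members'])]

print(sons)
print([f['id'] for f in son_families])
```

Because the membership test is `'dob' in member`, records without that key are simply skipped instead of raising an error.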

Related

Is there an efficient way to write data to a JSON file using a dictionary in Python?

I'm writing a program in Python to use an API that needs to get input from a JSON payload in a really specific way which is shown below. The poid element will contain a different number with each run of the program, the inventories element contains a list of dictionaries that I am trying to send to the API.
[
{
"poid":"22130",
"inventories":
[
{
"item": "SAMPLE-ITEM-1",
"mfgr": "SAMPLE-MANUFACTURER-1",
"quantity": "1",
"condition": "REF"
},
{
"item": "SAMPLE-ITEM-2",
"mfgr": "SAMPLE-MANUFACTURER-2",
"quantity": "3",
"condition": "REF"
}
]
}
]
The data I need to put into the file is stored in a dictionary and a list as shown below. For simplicity of this post, I'm showing what the dictionary and list would look like after another method creates them. I'm not sure if this is the most efficient way of storing this data when I'm having to write it to JSON.
pn_and_mfgr_dict = {'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1', 'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}
quantities = ["1","3"]
poid = 22130 #this will be different each run
If it makes sense from what I've written above, I need to generate a JSON file that looks like the first codeblock given the information from the second codeblock. The item at index 0 in the quantities list corresponds to the first key/value pair in the dictionary and so on. The "condition" value in the first codeblock will always have "REF" as its value for my use, but I need to also include that in the final payload that gets sent to the API. Since the part number and manufacturer dictionary will be a different length with each run, I also need this method to work regardless of how many values are in the dictionary. This dictionary and the quantities list will always be the same length though. I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be, then reading the file when the for loop is done and sending it as the payload but please correct me if there's a better way to do this like storing everything in variables. I also have no experience with JSON so I have attempted to use JSON libraries to accomplish this with no idea what I'm doing wrong. I can edit this with my attempts tonight but I wanted to post this as soon as possible.
Here is one possible solution:
import json
pn_and_mfgr_dict = {
'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1',
'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'
}
quantities = ['1', '3']
poid = 22130
payload = {
'poid': poid,
'inventories': [{
'item': item,
'mfgr': mfgr,
'quantity': quantity,
'condition': 'REF'
} for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
}
print(json.dumps(payload, indent=2))
The code above will result in:
{
"poid": 22130,
"inventories": [
{
"item": "SAMPLE-ITEM-1",
"mfgr": "SAMPLE-MANUFACTURER-1",
"quantity": "1",
"condition": "REF"
},
{
"item": "SAMPLE-ITEM-2",
"mfgr": "SAMPLE-MANUFACTURER-2",
"quantity": "3",
"condition": "REF"
}
]
}
Naturally, you can adjust that for multiple poids with something like this:
poids = [22130, 22131, 22132]
for poid in poids:
    # implement here the logic to get items and quantities for
    # each poid
    payload = {
        'poid': poid,
        'inventories': [{
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF'
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)]
    }
    print(json.dumps(payload, indent=2))
You will need to change it to use the corresponding items and quantities for each poid; I leave that as a starting point for you to implement.
Your second block is your input, so you could start right away by writing a function that takes those inputs and returns a JSON string.
import json
from typing import Dict, List
def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    constructed_data = []  # TODO
    return json.dumps(constructed_data)
Then you can work on using the inputs to construct the output data you want. And you already know how to do it:
I think the best way I can solve this is making a for loop that iterates through the dictionary and puts respective data where it needs to be
Yes, that's the way to do it.
Here's my version of the solution:
import json
from typing import Dict, List
def jsonify_data(pn_and_mfgr_dict: Dict, quantities: List, poid: int):
    # pair each (item, mfgr) entry with the quantity at the same position
    inventories = [
        {
            'item': item,
            'mfgr': mfgr,
            'quantity': quantity,
            'condition': 'REF',
        } for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
    ]
    # the API expects a one-element list with poid as a string
    constructed_data = [
        {
            'poid': f'{poid}',
            'inventories': inventories,
        }
    ]
    return json.dumps(constructed_data)
import json
data = {'inventories': [{'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1'}, {'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'}]}
quantities = ["1", "3"]
poid = 22130
# Add poid to data
data['poid'] = poid
# Add quantities to data
for item in data['inventories']:
    item['quantity'] = quantities.pop(0)
# Serializing json
json_object = json.dumps(data, indent=4)
print(json_object)
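None of the snippets above actually writes a file, which is what the question asks about. Below is a minimal sketch using `json.dump`. The function name `write_payload` and the file name `payload.json` are placeholders, and the payload shape follows the first code block in the question (a one-element list with `poid` as a string).

```python
import json


def write_payload(path, pn_and_mfgr_dict, quantities, poid):
    # Pair each (item, mfgr) entry with the quantity at the same position.
    # Relies on dicts preserving insertion order (Python 3.7+).
    inventories = [
        {'item': item, 'mfgr': mfgr, 'quantity': quantity, 'condition': 'REF'}
        for (item, mfgr), quantity in zip(pn_and_mfgr_dict.items(), quantities)
    ]
    payload = [{'poid': str(poid), 'inventories': inventories}]
    with open(path, 'w') as f:
        json.dump(payload, f, indent=2)
    return payload


payload = write_payload(
    'payload.json',  # placeholder file name
    {'SAMPLE-ITEM-1': 'SAMPLE-MANUFACTURER-1',
     'SAMPLE-ITEM-2': 'SAMPLE-MANUFACTURER-2'},
    ['1', '3'],
    22130,
)
```

If the API client accepts a string directly, the file step may be unnecessary; `json.dumps(payload)` produces the same content as a string.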

How to extract nested JSON data?

I am trying to get a value from a data JSON. I have successfully traversed deep into the JSON data and almost have what I need!
Running this command in Python :
autoscaling_name = response['Reservations'][0]['Instances'][0]['Tags']
Gives me this :
'Tags': [{'Key': 'Name', 'Value': 'Trove-Dev-Inst : App WebServer'}, {'Key': 'aws:autoscaling:groupName', 'Value': 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'}, {'Key': 'CodeDeployProvisioningDeploymentId', 'Value': 'd-4WTRTRTRT'}, {'Key': 'Environment', 'Value': 'ernie-dev'}]
I only want to get the value "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT". This is from the key "aws:autoscaling:groupName".
How can I further my command to only return the value "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT"?
Is this the full output? This is a dictionary containing a list of nested dictionaries, so you should treat it that way. Suppose it is called A:
A = {
"Tags": [
{
"Key": "Name",
"Value": "Trove-Dev-Inst : App WebServer"
},
{
"Key": "aws:autoscaling:groupName",
"Value": "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT"
},
{
"Key": "CodeDeployProvisioningDeploymentId",
"Value": "d-4WTRTRTRT"
},
{
"Key": "Environment",
"Value": "ernie-dev"
}
]
}
You first address the object, then its key in the dictionary, then the index within the list, and finally the key for that inner dictionary:
print(A['Tags'][1]['Value'])
Output:
CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT
EDIT: Based on what you are getting then you should try:
autoscaling_name = response['Reservations'][0]['Instances'][0]['Tags'][1]['Value']
You could also use glom, which is great for deeply nested structures and has many uses that make complicated nested tasks easy.
For example, translating @Celius's answer:
from glom import glom
glom(A, 'Tags.1.Value')
Returns the same thing:
CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT
So to answer your original question you'd use:
glom(response, 'Reservations.0.Instances.0.Tags.1.Value')
The final code for this is:
tags = response['Reservations'][0]['Instances'][0]['Tags']
autoscaling_name = next(t["Value"] for t in tags if t["Key"] == "aws:autoscaling:groupName")
This also ensures that if the order of the data is moved in the JSON data it will still find the correct one.
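One caveat: `next()` raises `StopIteration` when no tag matches. Below is a small sketch of two ways to handle that, using a trimmed copy of the tag list from the question. The variable names `tags_by_key` and `missing` are illustrative.

```python
tags = [
    {'Key': 'Name', 'Value': 'Trove-Dev-Inst : App WebServer'},
    {'Key': 'aws:autoscaling:groupName',
     'Value': 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'},
    {'Key': 'Environment', 'Value': 'ernie-dev'},
]

# Option 1: give next() a default so a missing tag yields None
autoscaling_name = next(
    (t['Value'] for t in tags if t['Key'] == 'aws:autoscaling:groupName'),
    None,
)

# Option 2: turn the list into a plain dict once, then look up by key
tags_by_key = {t['Key']: t['Value'] for t in tags}
missing = tags_by_key.get('no-such-tag')  # None, no exception

print(autoscaling_name)
```

The dict form is convenient when several tags are needed from the same instance, since the list is only scanned once.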
For anyone struggling to get their heads around list comprehensions and iterators, the cherrypicker package (pip install --user cherrypicker) does this sort of thing for you pretty easily:
from cherrypicker import CherryPicker
tags = CherryPicker(response['Reservations'][0]['Instances'][0]['Tags'])
tags(Key="aws:autoscaling:groupName")[0]["Value"].get()
which gives you 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'. If you're expecting multiple values, omit the [0] to get back a list of all values that have an associated "aws:autoscaling:groupName" key.
This is probably all a bit overkill for your question, which can be solved easily with a simple list comprehension. But this approach might come in handy if you need to do more complicated things later, like matching on partial keys only (e.g. aws:* or something more complicated like a regular expression), or you need to filter based on the values in an intermediate layer of the nested object. This sort of task could lead to lots of complicated nested for loops or list comprehensions, whereas with CherryPicker it stays as a simple, potentially one-line command.
You can find out more about advanced usage at https://cherrypicker.readthedocs.io.

Looping through Dictionary Transforms Type to List

I have a simple dictionary in Python. For each item in the dictionary, I have another dictionary I need to attach to it (i.e. 5 contacts, where each contact has FirstName, LastName and Gender, plus 'other' fields which all go in a single embedded dictionary).
I have attached the loop I am using. The resulting output is exactly how I want it, but when I run a type() function on it, Python reads it as a list rather than a dictionary. How can I convert it back to a dictionary?
itemcount = 0
for item in dict_primarydata:
    dict_primarydata[itemcount]['otherData'] = dict_otherdata[itemcount]
    itemcount = itemcount+1
I'm going to hazard a guess and say dict_primarydata and dict_otherdata look something like this to start out:
dict_primarydata = [
{
'first_name': 'Kaylee',
'last_name': 'Smith',
'gender': 'f'
},
{
'first_name': 'Kevin',
'last_name': 'Hoyt',
'gender': 'm'
}
]
dict_otherdata = [
{
'note': 'Note about kaylee'
},
{
'note': 'Note about kevin'
}
]
It looks like dict_primarydata and dict_otherdata are initialized as lists of dicts. In other words, dict_primarydata is not actually a dict; it's a list containing several dicts.
If you want your output to be a dict containing dicts you need to perform a conversion. Before you can do the conversion, you need to decide what you will use as key to your outer dict.
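As a sketch of that conversion, assuming the sample data guessed above and picking the full name as the outer key (any unique value would do; the name `contacts` is made up here):

```python
dict_primarydata = [
    {'first_name': 'Kaylee', 'last_name': 'Smith', 'gender': 'f'},
    {'first_name': 'Kevin', 'last_name': 'Hoyt', 'gender': 'm'},
]
dict_otherdata = [
    {'note': 'Note about kaylee'},
    {'note': 'Note about kevin'},
]

contacts = {}
for primary, other in zip(dict_primarydata, dict_otherdata):
    key = f"{primary['first_name']} {primary['last_name']}"
    # copy the record so the original list entries stay untouched
    contacts[key] = {**primary, 'otherData': other}

print(type(contacts))
print(contacts['Kevin Hoyt']['otherData']['note'])
```

If two contacts could share a name, a different key (an id, an email address, or the list index) would be the safer choice.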
Sidenote
Since you are iterating over two lists, a range-based for loop would be a bit more readable:
for i in range(len(dict_primarydata)):
    dict_primarydata[i]['otherData'] = dict_otherdata[i]

Update DynamoDB nested map within a set using conditional

I'm trying to update an item in DynamoDB that has somewhat complicated(?) data structure.
Item:
{
'user_id': 'abc123',
'groups': [
{
'group_id': 'Group1',
'games_won': [],
'games_lost': []
},
{
'group_id': 'Group2',
'games_won': [],
'games_lost': []
},
]
}
I am trying to append a string to games_won on a specific group_id. I am trying to use a conditional to avoid multiple db queries but I can't seem to figure out how to iterate over groups in my conditional.
Basically, I want to do this:
for g in groups:
    if g.group_id == 'Group2':
        g.games_won.append('game12345')
Sorry for the complicated title. I'm a bit new to DynamoDB and NoSQL in general.
You could read the 'groups' attribute, change the data outside of the query, and write the whole thing back when you're done. That way, no matter how many groups you change, you always have just one read and one write action. The number of read or write capacity units consumed is of course related to the size of your 'groups' attribute.
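A rough sketch of that read-modify-write approach follows. It assumes `table` is a boto3 DynamoDB `Table` resource (or anything with the same `get_item` and `put_item` methods) and that `user_id` is the table's partition key. The function name `add_game_won` is made up. Note that this is not atomic: two concurrent writers could overwrite each other's changes.

```python
def add_game_won(table, user_id, group_id, game_id):
    # One read: fetch the whole item
    item = table.get_item(Key={'user_id': user_id})['Item']

    # Change the data in plain Python
    for group in item['groups']:
        if group['group_id'] == group_id:
            group['games_won'].append(game_id)

    # One write: put the whole item back
    table.put_item(Item=item)
    return item
```

With boto3 this might be called as `add_game_won(boto3.resource('dynamodb').Table('users'), 'abc123', 'Group2', 'game12345')`, where the table name `users` is a placeholder.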

Unwind multiple arrays from different structure in document

I have these two types of structures in my collection:
{
_id: "date1",
users: [{"user": "123", ...}, {"user": "456", ...}]
}
and
{
_id: "date2",
points: [{"point": "1234", ...}, {"point": "5678", ...}]
}
I need to write an aggregation that returns a list of these documents containing only the specific point or user information, with skip and limit. Something like:
[
{_id: "date1", user: {"user": "123", ...}},
{_id: "date2", point: {"point": "1234", ...}},
]
This is what I have tried. I'm new to MongoDB; do you have any recommendations?
collection.aggregate([
{"$unwind": "$users"},
{"$unwind": "$points"},
{"$match": {"$or": [
{'users.user': an_user},
{'points.point': a_point}]}},
{"$sort": {"_id": -1}},
{"$skip": 10},
{"$limit": 10}
])
The result should contain the information of one specific user or one specific point, depending on whether the point or user key is present in that document.
Given the example documents you provided, you may be able to just use db.collection.find(), for example:
db.collection.find({"users.user": a_user}, {"users.$":1});
db.collection.find({"points.point": a_point}, {"points.$":1});
Depending on your use case, this may not be ideal because you're executing find() twice.
This would return a list of documents with their _id. The above queries are equivalent to saying:
Find all documents in the collection
where in array field users contain a user 'a_user'
OR in array field points contain a point 'a_point'
See also MongoDB Indexes
Having said the above, you should really reconsider your document schema. Depending on your use case, you may run into difficulties querying the data later on, and query performance may suffer. Please review MongoDB Data Models for more information on how to design your schema.
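If a single aggregation is still preferred, one likely reason the pipeline in the question returns nothing is that `$unwind` drops documents in which the array field is missing, and every document here lacks either `users` or `points`. Below is a hedged sketch of a variant that sets `preserveNullAndEmptyArrays` so those documents survive. `build_pipeline` and its parameters are illustrative names, and the pipeline has not been run against the asker's data.

```python
def build_pipeline(a_user, a_point, skip=0, limit=10):
    return [
        # keep documents that have no 'users' / 'points' array
        {"$unwind": {"path": "$users", "preserveNullAndEmptyArrays": True}},
        {"$unwind": {"path": "$points", "preserveNullAndEmptyArrays": True}},
        {"$match": {"$or": [
            {"users.user": a_user},
            {"points.point": a_point},
        ]}},
        {"$sort": {"_id": -1}},
        {"$skip": skip},
        {"$limit": limit},
    ]


pipeline = build_pipeline("123", "1234")
# with pymongo: results = list(collection.aggregate(pipeline))
```

The matched sub-document ends up under `users` or `points` in each result; a `$project` stage could rename it to `user` or `point` if the exact output shape from the question is required.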
