Pydantic create model for list with nested dictionary - python

I have a request body that looks like this:
{
"data": [
{
"my_api": {
"label": "First name",
"value": "Micheal"
}
},
{
"my_api": {
"label": "Last name",
"value": [
"Jackson"
]
}
},
{
"my_api": {
"label": "Favourite colour",
"value": "I don't have any"
}
}
]
}
This is my model.py so far:
from typing import List, Optional

from pydantic import BaseModel


class DictParameter(BaseModel):  # pylint: disable=R0903
    """
    `my_api` children
    """
    label: Optional[str]
    value: Optional[str]


class DataParameter(BaseModel):  # pylint: disable=R0903
    """
    `data` children
    """
    my_api: Optional[dict]  # NOTE: Future readers, this incorrect reference is part of the OP's Q


class InputParameter(BaseModel):  # pylint: disable=R0903
    """
    Takes predefined params
    """
    data: Optional[List[DataParameter]]
In main.py:
from model import InputParameter

@router.post("/v2/workflow", status_code=200)
def get_parameter(user_input: InputParameter):
    """
    Version 2 : No decoding & retrieve workflow params
    """
    data = user_input.data
    print(data)
Output:
[DataParameter(my_api={'label': 'First name', 'value': 'Micheal'}), DataParameter(my_api={'label': 'Last name', 'value': ['Jackson']}), DataParameter(my_api={'label': 'Favourite colour', 'value': "I don't have any"})]
I want to access the value inside the my_api key, but I keep getting a type error. I'm not sure how to access a list of dictionaries with a nested child. Plus, the value of value can be a str or an array; it is dynamic.
Is there any other way of doing this?

Plus, the value of value can be str or array. It is dynamic.
What you currently have will cast single element lists to strs, which is probably what you want. If you want lists to stay as lists, use:
from typing import Union

class DictParameter(BaseModel):
    value: Union[str, list[str]]
Unless you have the good luck to be running Python 3.10, in which case str | list[str] is equivalent.
However, you do not actually use this model! You have my_api: Optional[dict] not my_api: Optional[DictParameter], so your current output is a plain old dict, and you need to do data[0].my_api["value"]. Currently this returns a str or a list, which is probably the problem. I suspect, though, that you meant to use the pydantic schema.
Note that data is a list: if you want all the values you need to iterate, something like
apis = [x.my_api for x in data]
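As a minimal sketch of the attribute-style access this enables (assuming DataParameter.my_api is re-typed to Optional[DictParameter] so the nested model is actually used):
apis = [x.my_api for x in data if x.my_api is not None]
values = [api.value for api in apis]  # each entry is a str or a list[str]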

Assuming you fix the issue in DictParameter (as pointed out in the other answer, by @2e0byo):
class DictParameter(BaseModel):
    label: Optional[str]
    value: Optional[Union[str, List[str]]]
And you fix the issue in DataParameter:
class DataParameter(BaseModel):
    # my_api: Optional[dict]  <-- previous value
    my_api: Optional[DictParameter]
You can access the values in your object the following way:
def get_value_from_data_param(param_obj: InputParameter, key: str):
    """
    Returns a value from an InputParameter object,
    or returns `None` if not found
    """
    # Iterate over objects
    for item in param_obj.data:
        # Skip if no value
        if not item.my_api:
            continue
        # This assumes there are no duplicate labels
        # if there are, perhaps make a list and append values
        if item.my_api.label == key:
            return item.my_api.value
    # If nothing is found, return None (or some `default`)
    return None
Now let's test it:
input_data = {
    "data": [
        {"my_api": {"label": "First name", "value": "Micheal"}},
        {"my_api": {"label": "Last name", "value": ["Jordan"]}},
        {"my_api": {"label": "Favourite colour", "value": "I don't have any"}}
    ]
}

# Create an object
input_param_obj = InputParameter.parse_obj(input_data)

# Let's see if we can get values:
f_name = get_value_from_data_param(input_param_obj, "First name")
assert f_name == 'Micheal'

l_name = get_value_from_data_param(input_param_obj, "Last name")
assert l_name == ['Jordan']

nums = get_value_from_data_param(input_param_obj, "Numbers")
assert nums is None  # "Numbers" is not present in the input data

erroneous = get_value_from_data_param(input_param_obj, "KEY DOES NOT EXIST")
assert erroneous is None

Related

Validating "Parallel" JSON Arrays

I am trying to use pydantic to validate JSON that is being returned in a "parallel" array format. Namely, there is an array defining the column names/types followed by an array of "rows" (this is similar to how pandas handles df.to_json(orient='split') seen here)
{
"columns": [
"sensor",
"value",
"range"
],
"data": [
[
"a",
1,
{"low": 1, "high": 2}
],
[
"b",
2,
{"low": 0, "high": 2}
]
]
}
I know that I could do this:
class ValueRange(BaseModel):
    low: int
    high: int

class Response(BaseModel):
    columns: Tuple[Literal['sensor'], Literal['value'], Literal['range']]
    data: List[Tuple[str, int, ValueRange]]
But this has a few downsides:
After parsing, it doesn't allow for an association of the data with the column names. So, you have to do everything by index. Ideally, I would like to parse a response into a List[Row] and then be able to do things like response.data[0].sensor.
It hardcodes the column order.
It doesn't allow for responses that have variable columns in the responses. For example, the same endpoint could also return the following:
{
"columns": ["sensor", "value"],
"data": [
["a", 1],
["b", 2]
]
}
At first I thought that I could use pydantic's discriminated unions, but I'm not seeing how to do this across arrays.
Anyone know of the best approach for validating this type of data? (I'm currently using pydantic, but am open to other libraries if it makes sense).
Thanks!
TL;DR
A very interesting use case for a custom validator.
from collections.abc import Sequence
from typing import Literal, Optional

from pydantic import BaseModel, validator
from pydantic.fields import Field

Column = Literal["sensor", "value", "range"]

class ValueRange(BaseModel):
    low: int
    high: int

class DataPoint(BaseModel):
    sensor: str
    value: int
    range: Optional[ValueRange]

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    @validator("columns", pre=True)
    def ensure_distinct(cls, v: object) -> object:
        if isinstance(v, Sequence) and len(v) != len(set(v)):
            raise ValueError("`columns` must all be distinct")
        return v

    @validator("data", pre=True, each_item=True)
    def parse_nested_sequence(
        cls,
        v: object,
        values: dict[str, object],
    ) -> object:
        if not isinstance(v, Sequence):
            return v
        columns = values.get("columns")
        if not isinstance(columns, Sequence):
            raise TypeError(
                "If `data` items are provided as sequences, "
                "`columns` must be present as a sequence."
            )
        if len(columns) != len(v):
            raise ValueError(
                "`data` item must be the same length as `columns`"
            )
        return dict(zip(columns, v))
Explanation
Schema
First we need to set up the models to reflect the schema we want to have after parsing and validation are complete.
Since you mentioned that you would like the data field in the response model to be a list of model instances corresponding to a certain schema, we need to define that schema. From your example it seems to need at least the fields sensor, value and range. The range field should be its own model once again. You also mentioned that range should be optional, so we'll encode that too.
The actual top-level Response model will still have the columns field because we will need that for validation later on. But we can hide that field from the string representation and the exporter methods like dict and json. To communicate the variable number of elements in that columns tuple, we'll change the annotation a bit.
Here are the models I would suggest:
from collections.abc import Sequence
from typing import Literal, Optional

from pydantic import BaseModel, validator
from pydantic.fields import Field

Column = Literal["sensor", "value", "range"]

class ValueRange(BaseModel):
    low: int
    high: int

class DataPoint(BaseModel):
    sensor: str
    value: int
    range: Optional[ValueRange]

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    ...  # more code
Field types and parameters
To hide columns we can use the appropriate parameters of the Field constructor. Making it Optional (default will be None) means we could still initialize an instance of Response without it, but we would then of course be forced to provide the data list in the correct format (as dictionaries or DataPoint instances).
Since we used tuple[Column, ...] as the annotation for columns, the elements could be in any order, which is what you wanted, but it could also theoretically be of arbitrary length and contain a bunch of duplicates. The Python typing system does not provide any elegant tool to define the type in a way that indicates that all elements of the tuple must be distinct. We could of course construct a huge type union of all permutations of the literal values that should be valid, but this is hardly practical.
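As a small illustration of those field parameters (a sketch, assuming pydantic v1 and the Response model defined above):
r = Response(columns=("sensor", "value"), data=[{"sensor": "a", "value": 1}])
print(r)         # columns does not show up in the repr (repr=False)
print(r.dict())  # columns is left out of exports (exclude=True); only `data` remains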
Validators
Instead I would suggest a very simple validator to do this check for us:
...

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    @validator("columns", pre=True)
    def ensure_distinct(cls, v: object) -> object:
        if isinstance(v, Sequence) and len(v) != len(set(v)):
            raise ValueError("`columns` must all be distinct")
        return v

    ...  # more code
The pre=True is actually important here, but only in conjunction with the second validator. That one is much more interesting because it is supposed to bring those columns together with the sequences of data. Here is what I propose:
...

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    @validator("columns", pre=True)
    def ensure_distinct(cls, v: object) -> object:
        if isinstance(v, Sequence) and len(v) != len(set(v)):
            raise ValueError("`columns` must all be distinct")
        return v

    @validator("data", pre=True, each_item=True)
    def parse_nested_sequence(
        cls,
        v: object,
        values: dict[str, object],
    ) -> object:
        if not isinstance(v, Sequence):
            return v
        columns = values.get("columns")
        if not isinstance(columns, Sequence):
            raise TypeError(
                "If `data` items are provided as sequences, "
                "`columns` must be present as a sequence."
            )
        if len(columns) != len(v):
            raise ValueError(
                "`data` item must be the same length as `columns`"
            )
        return dict(zip(columns, v))
The pre=True on both validators ensures that they are run before the default validators for those field types (otherwise we would immediately get a validation error from your example data). Field validators are always called in the order the fields were defined, so we ensured that our custom columns validator is called before our custom data validator.
That order also allows us to access the output of our columns validator in the values dictionary of our data validator.
The each_item=True flag changes the behavior of the validator so that it is applied to each individual element of the data list as opposed to the entire list. That means with our example data the v argument will always be a single "sub-list" (e.g. ["a", 1]).
If the value to validate is not of a sequence type, we don't bother with it. It will then be handled appropriately by the default field validator. If it is a sequence, we need to make sure that columns is present and also a sequence and that they are of the same length. If those checks pass, we can just zip them, consume that zip into a dictionary and send it on its merry way to the default field validator.
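To make that last step concrete, here is roughly what one row of the second valid example becomes before the default field validation runs:
columns = ("value", "sensor")  # the already-validated columns field
v = [1, "a"]                   # one "row" from the incoming data list
dict(zip(columns, v))          # -> {"value": 1, "sensor": "a"}, which DataPoint then parses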
That's all.
Demo
Here is a little demo script:
def main() -> None:
    from pydantic import ValidationError

    print(Response.parse_raw(TEST_JSON_VALID_1).json(indent=2), "\n")
    print(Response.parse_raw(TEST_JSON_VALID_2).json(indent=2), "\n")
    try:
        Response.parse_raw(TEST_JSON_INVALID_1)
    except ValidationError as exc:
        print(exc.json(indent=2), "\n")
    try:
        Response.parse_raw(TEST_JSON_INVALID_2)
    except ValidationError as exc:
        print(exc.json(indent=2), "\n")
    try:
        Response.parse_raw(TEST_JSON_INVALID_3)
    except ValidationError as exc:
        print(exc.json(indent=2))

if __name__ == "__main__":
    main()
And here is the test data and its corresponding output:
TEST_JSON_VALID_1
{
"columns": [
"sensor",
"value",
"range"
],
"data": [
[
"a",
1,
{"low": 1, "high": 2}
],
[
"b",
2,
{"low": 0, "high": 2}
]
]
}
{
"data": [
{
"sensor": "a",
"value": 1,
"range": {
"low": 1,
"high": 2
}
},
{
"sensor": "b",
"value": 2,
"range": {
"low": 0,
"high": 2
}
}
]
}
TEST_JSON_VALID_2 (different order and no range)
{
"columns": ["value", "sensor"],
"data": [
[1, "a"],
[2, "b"]
]
}
{
"data": [
{
"sensor": "a",
"value": 1,
"range": null
},
{
"sensor": "b",
"value": 2,
"range": null
}
]
}
TEST_JSON_INVALID_1
{
"columns": ["foo", "value"],
"data": []
}
[
{
"loc": [
"columns",
0
],
"msg": "unexpected value; permitted: 'sensor', 'value', 'range'",
"type": "value_error.const",
"ctx": {
"given": "foo",
"permitted": [
"sensor",
"value",
"range"
]
}
}
]
TEST_JSON_INVALID_2
{
"columns": ["value", "value"],
"data": []
}
[
{
"loc": [
"columns"
],
"msg": "`columns` must all be distinct",
"type": "value_error"
}
]
TEST_JSON_INVALID_3
{
"columns": ["sensor", "value"],
"data": [["a", 1], ["b"]]
}
[
{
"loc": [
"data",
1
],
"msg": "`data` item must be the same length as `columns`",
"type": "value_error"
}
]

Extracting JSON value from key

I have a JSON object like so, and I need to extract the name value of any object, using the id. I have tried many different iterations of this but I can't seem to get anything to work. Any general pointers would be much appreciated. Thank you.
{
"weeks":[
{
"1":[
{
"name":"Stackoverflow Question",
"description":"Have you ever asked a question on StackoverFlow?",
"date":"11/25/2019",
"id":"whewhewhkahfasdjkhgjks"
},
{
"name":"I Can't Believe It's Not Butter!",
"description":"Can you believe it? I sure can't.",
"date":"11/25/2019",
"id":"agfasdgasdgasdgawe"
}
]
},
{
"2":[
{
"name":"Hello World",
"description":"A hello world.",
"date":"12/02/2019",
"id":"aewgasdgewa"
},
{
"name":"Testing 123",
"description":"Sometimes people don't say it be like it is but it do.",
"date":"12/04/2019",
"id":"asdgasdgasdgasd"
}
]
}
]
}
If you need to find the name based on the id, try the code below:
def get_name(data, id):
    for week in data['weeks']:
        for i in week:
            for j in week[i]:
                if j['id'] == id:
                    return j['name']
    return None

get_name(data, 'asdgasdgasdgasd')
output
'Testing 123'
Not sure if this is what you are looking for
for week in a["weeks"]:
    for k, v in week.items():
        for entry in v:
            print(entry['name'])
assuming the variable a is your dict.
Is the structure fixed, or can the depth of the JSON differ from the example?
This one would work as well if there are more or fewer levels of nesting.
It basically searches in each dictionary inside a JSON-like structure for the field_name and returns the value of the argument output_name.
Maybe it helps you when your data structure changes :)
data = {
"weeks":[
{
"1":[
{
"name":"Stackoverflow Question",
"description":"Have you ever asked a question on StackoverFlow?",
"date":"11/25/2019",
"id":"whewhewhkahfasdjkhgjks"
},
{
"name":"I Can't Believe It's Not Butter!",
"description":"Can you believe it? I sure can't.",
"date":"11/25/2019",
"id":"agfasdgasdgasdgawe"
}
]
},
{
"2":[
{
"name":"Hello World",
"description":"A hello world.",
"date":"12/02/2019",
"id":"aewgasdgewa"
},
{
"name":"Testing 123",
"description":"Sometimes people don't say it be like it is but it do.",
"date":"12/04/2019",
"id":"asdgasdgasdgasd"
}
]
}
]
}
def extract_name(data, field_name: str, matching_value: str, output_name: str):
    """
    :param data: json-like datastructure in which you want to search
    :param field_name: the field name with which you want to match
    :param matching_value: the value you want to match
    :param output_name: the name of the value which you want to get
    :return:
    """
    if isinstance(data, list):
        for item in data:
            res = _inner_extract_name(item, field_name, matching_value, output_name)
            if res is not None:
                return res
    elif isinstance(data, dict):
        for item in data.values():
            res = _inner_extract_name(item, field_name, matching_value, output_name)
            if res is not None:
                return res


def _inner_extract_name(item, field_name, matching_value, output_name):
    if isinstance(item, dict):
        res = extract_name(item, field_name, matching_value, output_name)
        if field_name in item:
            if item[field_name] == matching_value:
                if output_name in item:
                    return item[output_name]
    else:
        res = extract_name(item, field_name, matching_value, output_name)
    return res


if __name__ == "__main__":
    name = extract_name(data, "id", "aewgasdgewa", "name")
    print(name)

How do I extract a list item from nested json in Python?

I have a JSON object and I'm trying to extract a couple of values from a nested list, then print them in markup. I'm getting an error: AttributeError: 'list' object has no attribute 'get'
I understand that it's a list and I can't preform a get. I've been searching for the proper method for a few hours now and I'm running out of steam. I'm able to get the Event, but not Value1 and Value2.
This is the json object
{
"resource": {
"data": {
"event": "qwertyuiop",
"eventVersion": "1.05",
"parameters": {
"name": "sometext",
"othername": [
""
],
"thing": {
"something": {
"blah": "whatever"
},
"abc": "123",
"def": {
"xzy": "value"
}
},
"something": [
"else"
]
},
"whatineed": [{
"value1": "text.i.need",
"value2": "text.i.need.also"
}]
}
}
}
And this is my function
def parse_json(json_data: dict) -> Info:
    some_data = json_data.get('resource', {})
    specific_data = some_data.get('data', {})
    whatineed_data = specific_data.get('whatineed', {})

    formatted_json = json.dumps(json_data, indent=2)
    description = f'''
    h3. Details
    *Event:* {some_data.get('event')}
    *Value1:* {whatineed_data('value1')}
    *Value2:* {whatineed_data('value2')}
    '''
From the data structure, whatineed is a list with a single item, which in turn is a dictionary. So, one way to access it would be:
whatineed_list = specific_data.get('whatineed', [])
whatineed_dict = whatineed_list[0]
At this point you can do:
value1 = whatineed_dict.get('value1')
value2 = whatineed_dict.get('value2')
You can change your function to the following:
def parse_json(json_data: dict) -> Info:
    some_data = json_data.get('resource')
    specific_data = some_data.get('data', {})
    whatineed_data = specific_data.get('whatineed', {})

    formatted_json = json.dumps(json_data, indent=2)
    description = '''
    h3. Details
    *Event:* {}
    *Value1:* {}
    *Value2:* {}
    '''.format(some_data.get('data').get('event'), whatineed_data[0]['value1'], whatineed_data[0]['value2'])
Since whatineed_data is a list, you need to index the element first
Python treats JSON as a plain string until it is parsed (for example with json.loads, or json.load when reading directly from a file). This could be the source of some of your problems. This article might also help.
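In other words, if the payload reached you as text, parse it first. A minimal sketch (the string below is a made-up fragment of the question's data):
import json

raw = '{"resource": {"data": {"event": "qwertyuiop"}}}'  # hypothetical string payload
json_data = json.loads(raw)  # now a dict you can navigate with .get()
print(json_data.get('resource', {}).get('data', {}).get('event'))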
Assuming that the "whatineed" attribute really is a list and its elements are dicts, you can't call whatineed.get asking for value1 or value2 as if they were attributes, because it is a list and doesn't have them.
So, you have two options:
If the whatineed list only ever has a single element, you can access that element directly and then access its keys:
element = whatineed[0]
v1 = element.get('value1', {})
v2 = element.get('value2', {})
Or, if the whatineed list can have more items, you will need to iterate over the list and access each element:
for element in whatineed:
    v1 = element.get('value1', {})
    v2 = element.get('value2', {})
    ## Do something with values

Update nested map dynamodb

I have a dynamodb table with an attribute containing a nested map and I would like to update a specific inventory item that is filtered via a filter expression that results in a single item from this map.
How do I write an update expression to update the location to "in place three" for the item with name=opel whose tags include "x1" (and possibly also "f3")?
This should just update the first list elements location attribute.
{
"inventory": [
{
"location": "in place one", # I want to update this
"name": "opel",
"tags": [
"x1",
"f3"
]
},
{
"location": "in place two",
"name": "abc",
"tags": [
"a3",
"f5"
]
}],
"User" :"test"
}
Updated Answer - based on updated question statement
You can update attributes in a nested map using update expressions such that only a part of the item would get updated (ie. DynamoDB would apply the equivalent of a patch to your item) but, because DynamoDB is a document database, all operations (Put, Get, Update, Delete etc.) work on the item as a whole.
So, in your example, assuming User is the partition key and that there is no sort key (I didn't see any attribute that could be a sort key in that example), an Update request might look like this:
table.update_item(
    Key={
        'User': 'test'
    },
    UpdateExpression="SET #inv[0].#loc = :locVal",
    ExpressionAttributeNames={
        '#inv': 'inventory',
        '#loc': 'location'
    },
    ExpressionAttributeValues={
        ':locVal': 'in place three',
    },
)
That said, you do have to know what the item schema looks like and which attributes within the item should be updated exactly.
DynamoDB does NOT have a way to operate on sub-items. Meaning, there is no way to tell Dynamo to execute an operation such as "update item, set 'location' property of elements of the 'inventory' array that have a property of 'name' equal to 'opel'"
This is probably not the answer you were hoping for, but it is what's available today. You may be able to get closer to what you want by changing the schema a bit.
If you need to reference the sub-items by name, perhaps store something like:
{
"inventory": {
"opel": {
"location": "in place one", # I want to update this
"tags": [ "x1", "f3" ]
},
"abc": {
"location": "in place two",
"tags": [ "a3", "f5" ]
}
},
"User" :"test"
}
Then your query would be:
table.update_item(
    Key={
        'User': 'test'
    },
    UpdateExpression="SET #inv.#brand.#loc = :locVal",
    ExpressionAttributeNames={
        '#inv': 'inventory',
        '#loc': 'location',
        '#brand': 'opel'
    },
    ExpressionAttributeValues={
        ':locVal': 'in place three',
    },
)
But YMMV as even this has limitations, because you are limited to identifying inventory items by name (i.e. you still can't say "update inventory with tag 'x1'").
Ultimately you should carefully consider why you need Dynamo to perform these complex operations for you as opposed to you being specific about what you want to update.
You can update the nested map as follows:
First create an empty item attribute of type map. In the example, graph is the empty item attribute.
dynamoTable = dynamodb.Table('abc')
dynamoTable.put_item(
    Item={
        'email': email_add,
        'graph': {},
    }
)
Then update the nested map as follows:
brand_name = 'opel'
dynamoTable = dynamodb.Table('abc')
dynamoTable.update_item(
    Key={
        'email': email_add,
    },
    UpdateExpression="SET #Graph.#brand = :name",
    ExpressionAttributeNames={
        '#Graph': 'inventory',
        '#brand': str(brand_name),
    },
    ExpressionAttributeValues={
        ':name': {
            "location": "in place two",
            'tag': {
                'graph_type': 'a3',
                'graph_title': 'f5'
            }
        }
    }
)
Updating Mike's answer because that way doesn't work any more (at least for me).
It works like this now (note the UpdateExpression and ExpressionAttributeNames):
table.update_item(
    Key={
        'User': 'test'
    },
    UpdateExpression="SET inv.#brand.loc = :locVal",
    ExpressionAttributeNames={
        '#brand': 'opel'
    },
    ExpressionAttributeValues={
        ':locVal': 'in place three',
    },
)
And whatever goes in Key={} is always the partition key (and the sort key, if any).
EDIT:
Seems like this way only works with 2-level nested properties. In this case you would only use ExpressionAttributeNames for the "middle" property (in this example, #brand in inv.#brand.loc). I'm not yet sure what the real rule is now.
A DynamoDB UpdateExpression does not search the database for matching cases like SQL (where you can update all items that match some condition). To update an item you first need to identify it and get its primary key or composite key; if many items match your criteria, you need to update them one by one.
The issue with updating nested objects is then to define the UpdateExpression, ExpressionAttributeValues and ExpressionAttributeNames to pass to the DynamoDB Update API.
I use a recursive function to update nested objects on DynamoDB. You asked for Python but I use JavaScript; I think it is easy to read this code and implement it in Python:
https://gist.github.com/crsepulv/4b4a44ccbd165b0abc2b91f76117baa5
/**
 * Recursive function to get UpdateExpression, ExpressionAttributeValues & ExpressionAttributeNames to update a nested object on DynamoDB.
 * All levels of the nested object must exist previously on DynamoDB; this only updates the value, it does not create the branch.
 * Only works with objects of objects, not tested with Arrays.
 * @param obj , the object to update.
 * @param k , the seed is any value; it only matters on the last iteration.
 */
function getDynamoExpression(obj, k) {
    const key = Object.keys(obj);
    let UpdateExpression = 'SET ';
    let ExpressionAttributeValues = {};
    let ExpressionAttributeNames = {};
    let response = {
        UpdateExpression: ' ',
        ExpressionAttributeNames: {},
        ExpressionAttributeValues: {}
    };

    //https://stackoverflow.com/a/16608074/1210463
    /**
     * true when input is an object, which means on all levels except the last one.
     */
    if (((!!obj) && (obj.constructor === Object))) {
        response = getDynamoExpression(obj[key[0]], key);
        UpdateExpression = 'SET #' + key + '.' + response['UpdateExpression'].substring(4); //substring deletes 'SET ' for the mid level values.
        ExpressionAttributeNames = {['#' + key]: key[0], ...response['ExpressionAttributeNames']};
        ExpressionAttributeValues = response['ExpressionAttributeValues'];
    } else {
        UpdateExpression = 'SET = :' + k;
        ExpressionAttributeValues = {
            [':' + k]: obj
        }
    }
    //removes trailing dot on the last level
    if (UpdateExpression.indexOf(". ")) {
        UpdateExpression = UpdateExpression.replace(". ", "");
    }

    return {UpdateExpression, ExpressionAttributeValues, ExpressionAttributeNames};
}

//you can try many levels.
const obj = {
    level1: {
        level2: {
            level3: {
                level4: 'value'
            }
        }
    }
}
I had the same need.
Hope this code helps. You only need to invoke compose_update_expression_attr_name_values passing the dictionary containing the new values.
import random


def compose_update_expression_attr_name_values(data: dict) -> (str, dict, dict):
    """ Constructs UpdateExpression, ExpressionAttributeNames, and ExpressionAttributeValues for updating an entry of a DynamoDB table.

    :param data: the dictionary of attribute_values to be updated
    :return: a tuple (UpdateExpression: str, ExpressionAttributeNames: dict(str: str), ExpressionAttributeValues: dict(str: str))
    """
    # prepare recursion input
    expression_list = []
    value_map = {}
    name_map = {}
    # navigate the dict and fill expressions and dictionaries
    _rec_update_expression_attr_name_values(data, "", expression_list, name_map, value_map)
    # compose update expression from single paths
    expression = "SET " + ", ".join(expression_list)
    return expression, name_map, value_map


def _rec_update_expression_attr_name_values(data: dict, path: str, expressions: list, attribute_names: dict,
                                            attribute_values: dict):
    """ Recursively navigates the input and injects contents into expressions, names, and attribute_values.

    :param data: the data dictionary with updated data
    :param path: the navigation path in the original data dictionary to this recursive call
    :param expressions: the list of update expressions constructed so far
    :param attribute_names: a map associating "expression attribute name identifiers" to their actual names in ``data``
    :param attribute_values: a map associating "expression attribute value identifiers" to their actual values in ``data``
    :return: None, since ``expressions``, ``attribute_names``, and ``attribute_values`` get updated during the recursion
    """
    for k in data.keys():
        # generate non-ambiguous identifiers
        rdm = random.randrange(0, 1000)
        attr_name = f"#k_{rdm}_{k}"
        while attr_name in attribute_names.keys():
            rdm = random.randrange(0, 1000)
            attr_name = f"#k_{rdm}_{k}"
        attribute_names[attr_name] = k
        _path = f"{path}.{attr_name}"
        # recursion
        if isinstance(data[k], dict):
            # recursive case
            _rec_update_expression_attr_name_values(data[k], _path, expressions, attribute_names, attribute_values)
        else:
            # base case
            attr_val = f":v_{rdm}_{k}"
            attribute_values[attr_val] = data[k]
            expression = f"{_path} = {attr_val}"
            # remove the initial "."
            expressions.append(expression[1:])
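For example, a usage sketch (the table name, key, and new values below are assumptions for illustration, not taken from the question):
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('abc')  # hypothetical table name
new_values = {"inventory": {"opel": {"location": "in place three"}}}

expr, names, values = compose_update_expression_attr_name_values(new_values)
table.update_item(
    Key={'User': 'test'},  # hypothetical partition key
    UpdateExpression=expr,
    ExpressionAttributeNames=names,
    ExpressionAttributeValues=values,
)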

Python Recursively Maintain Keyed Depth

Input/Goal
My input data is an OrderedDict for which there can be a variable depth of nested OrderedDicts so I have opted to handle parsing this output recursively. The desired output is a csv with header.
Elaboration of Problem
My code below will work once I am able to correctly define field_name upon traversing back up a branch after completing all of a branch's leaves. (i.e. Type_1.Field_3.Data will incorrectly be called Type_1.Field_2.Field_3.Data).
Once the leaves on a branch have been exhausted, I want to remove the last .Field_x from the field_name so that a new (correct) one can be added for the following object.
Request for Help
Does anyone see where I can include this feature? Thanks,
...
Dependencies:
Code Snippet:
def get_soql_fields(soql):
    soql_fields = re.search('(?<=select)(?s)(.*)(?=from)', soql)  # get fields
    soql_fields = re.sub(' ', '', soql_fields.group())  # remove extra spaces
    fields = re.split(',|\n|\r', soql_fields)  # split on commas and newlines
    fields = [field for field in fields if field != '']  # remove empty strings
    return fields


def parse_output(data, soql):
    fields = get_soql_fields(soql)
    header = fields

    master = [header]
    for record in data['records']:  # for each 'record' in response
        row = []
        for obj, value in record.iteritems():  # for each obj in record
            if isinstance(value, basestring):  # if query base object has desired fields
                if obj in fields:
                    row.append(value)
            elif isinstance(value, dict):  # traverse down into object
                path = obj
                row.append(_traverse_output(obj, value, fields, row, path))
        master.append(row)
    return master


def _traverse_output(obj, value, fields, row, path):
    for f, v in value.iteritems():  # for each item in obj
        if not isinstance(v, (dict, list, tuple)):
            field_name = '{path}.{name}'.format(path=path, name=f)  # TODO fix this to full field name
            print('FName: {0}'.format(field_name))
            if field_name in fields:
                print('match')
                row.append(v)
        elif isinstance(v, dict):  # it is a dict
            path += '.{obj}'.format(obj=f)
            _traverse_output(f, v, fields, row, path)
Example Salesforce SOQL:
select
Type_1.Field_1,
Type_1.Field_2.Data,
Type_1.Field_3,
Type_1.Field_4,
Type_1.Field_5.Data_1.Data,
Type_1.Field_6,
Type_2.Field_1,
Type_2.Field_2
from
Obj_1
limit
1
;
Example Salesforce Output:
{
"records": [
{
"attributes": {
"type": "Obj_1",
"url": "<url>"
},
"Type_1": {
"attributes": {
"type": "Type_1",
"url": "<url>"
},
"Field_1": "<stuff>",
"Field_2": {
"attributes": {
"type": "Field_2",
"url": "<url>"
},
"Data": "<data>"
},
"Field_3": "<data>",
"Field_4": "<data>",
"Field_5": {
"attributes": {
"type": "Field_2",
"url": "<url>"
},
"Data_1": {
"attributes": {
"type": "Data_1",
"url": "<url>"
},
"Data": "<data>"
}
},
"Field_6": 1.0
},
"Type_2": {
"attributes": {
"type": "Type_2",
"url": "<url>"
},
"Field_1": "<data>",
"Field_2": "<data>"
}
}
]
}
I worked out a quick solution for this. I'll just note what I figured out, and append the code I wrote to the end.
Essentially your problem is that you keep trying to modify path in place, which isn't going to work. Instead do something like
new_path = path + '.{obj}'.format(obj=f)
_traverse_output(f, v, fields, row, new_path)
A note about this: it will NOT necessarily result in a row where the values are in the same order as the header (i.e., if Type_1.Field_1 is in position 0 of the header list, then the value corresponding to it might not be).
The easy way to solve this (and handle csvs in general) is to use DictWriter from the csv module, then pass an empty dictionary to your first call where the keys will be the field names and the values will be their values.
Another way to solve the problem is to pre-populate your row list with None or empty strings, then use the list.index method to assign the value to the appropriate position.
I wrote an implementation of _traverse_output as examples for each, though they differ slightly from your code. They take an element of the 'records' list.
Dictionary Example
def _traverse_output_with_dict(record, fields, row_values, field_name=''):
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row_values[new_field_name] = value
        else:
            _traverse_output_with_dict(value, fields, row_values, new_field_name)
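One possible way to wire this into csv.DictWriter, as suggested above (a sketch in the same Python 2 style as the snippets here; write_csv and out_path are made-up names):
import csv

def write_csv(data, soql, out_path='output.csv'):
    fields = get_soql_fields(soql)
    with open(out_path, 'wb') as f:  # 'wb' because the snippets here target Python 2
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for record in data['records']:
            row_values = {}
            _traverse_output_with_dict(record, fields, row_values)
            writer.writerow(row_values)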
List Example
def _traverse_output_with_list(record, fields, row, field_name=''):
    while len(row) < len(fields):
        row.append('')
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row[fields.index(new_field_name)] = value
        else:
            _traverse_output_with_list(value, fields, row, new_field_name)
