Unpack a variable-length list of dictionaries using Pydantic - python

I have a payload that is a list containing a variable number of dictionaries, and I was hoping to unpack it with a single call to a Pydantic model.
Below is an example of what I am trying to accomplish, with two dictionaries:
#!/usr/bin/env python3.9
from pydantic import BaseModel
from typing import List, Dict

class RequestPayload(BaseModel):
    site: str

class RequestPayload_unpack(BaseModel):
    endpoints: List[Dict, RequestPayload]

payload = [
    {
        'site': 'nyc01',
    },
    {
        'site': 'lax02',
    },
]

request = RequestPayload_unpack(*payload)
print(request)
Running that I get the following error:
raise TypeError(f"Too {'many' if alen > elen else 'few'} parameters for {cls};"
TypeError: Too many parameters for typing.List; actual 2, expected 1
The workflow I have working now (not using pydantic) parses and validates each dict in the list, then passes it along to the next step, which performs additional lookups and eventually does the work. So right now the entire RequestPayload is operated on through the pipeline, rather than each dict being treated as a distinct request.
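For what it's worth, a minimal sketch of the usual fix (assuming Pydantic v1, with RequestPayload_unpack renamed to RequestPayloadUnpack): typing.List takes exactly one type parameter, so the field should be List[RequestPayload], and the list is passed as a named field rather than unpacked with *:

from typing import List
from pydantic import BaseModel, parse_obj_as

class RequestPayload(BaseModel):
    site: str

class RequestPayloadUnpack(BaseModel):
    endpoints: List[RequestPayload]

payload = [{'site': 'nyc01'}, {'site': 'lax02'}]

# wrap the list in a named field ...
request = RequestPayloadUnpack(endpoints=payload)

# ... or validate the bare list directly (pydantic v1 helper)
endpoints = parse_obj_as(List[RequestPayload], payload)

Either way, each dict in the list is validated as its own RequestPayload instance.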

Related

How to model a simple nested static data structure instead of nested dictionaries (namedtuple, dataclass...)

I hope there is a best practice or design pattern for the use case of simple nested data/mappings where I need to pre-store data, select the proper data group, and conveniently access specific values. Example: install script parameters, where the install script is given the type of install (a.k.a. a label) and, based on that, the proper install folder and other parameter values should be resolved.
A sample as a nested dict:
install_params = {
    'prod': {'folder_name': 'production', 'param_1': value_a, ...},
    'dev': {'folder_name': 'development', 'param_1': value_b, ...},
    'test': {'folder_name': 'test', 'param_1': value_c, ...},
}
Nested dicts offer grouping of all data under the parent dict install_params, so all labels can be presented to the user with install_params.keys(), and (in this case) the sole required functionality, retrieving the proper values for a given installation type/label, is simply install_params[user_selected_label].
However, with numerous items in the parent dict, more nesting levels, or more complex structures as parameter values, this becomes cumbersome to use and error prone.
I thought of rewriting it as either collections.namedtuple, typing.NamedTuple, a @dataclass, or just a generic class, but none of them seems to offer a convenient solution to the generic use case above.
E.g. a named tuple offers dot notation for accessing fields, but I still have to group the instances somehow and implement the functionality of selecting the proper instance based on the label:
prod = InstallType('prod', 'production', 'value_a')
dev = InstallType('dev', 'development', 'value_b')
test = InstallType('test', 'test', 'value_c')
all_types = [prod, dev, test]
if install_type == 'prod':  # select the prod named tuple
    ...
In the end I started thinking of creating a dictionary of named tuples, or a single data class that would keep the data mapping private and initialize itself based on a label input parameter. The latter would nicely encapsulate all the functionality and allow dot notation, but it still uses the ugly nested dicts inside:
@dataclass(frozen=True)
class SomeDataClass:
    label: str
    folder_name: str = None
    param_1: float = None

    def __post_init__(self):
        # dreaded nested dict to populate self.folder_name and others based on the submitted label
        (...)

install_params = SomeDataClass(label=user_selection)
print(install_params.folder_name)  # would work as expected since it is coded in SomeDataClass
This seemed promising, but SomeDataClass produced further problems: how to expose all available install labels (prod, dev, test, ...) to the user before instantiating the class; how to present fields as mandatory without requiring the user to submit them, since they should be computed from the submitted label (a plain folder_name: str requires the user to submit it, while folder_name: str = None is confusing and suggests the field is optional and can be set to None); and so on. It also feels like an over-engineered solution for simple nested dict mappings.
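One pattern that addresses both complaints is sketched below, with assumed names (InstallParams, _REGISTRY, params_for) and placeholder values: a frozen dataclass gives dot notation and read-only fields, while a single module-level registry dict keeps the mapping in one place and makes the labels discoverable before anything is selected.

from dataclasses import dataclass

@dataclass(frozen=True)
class InstallParams:
    label: str
    folder_name: str
    param_1: float

# the one remaining dict; each entry is a fully constructed, typed object
_REGISTRY = {
    'prod': InstallParams('prod', 'production', 1.0),  # placeholder values
    'dev': InstallParams('dev', 'development', 2.0),
    'test': InstallParams('test', 'test', 3.0),
}

def available_labels():
    # expose all labels before any instance is selected
    return list(_REGISTRY)

def params_for(label):
    # select the proper data group for a given label
    return _REGISTRY[label]

print(available_labels())             # ['prod', 'dev', 'test']
print(params_for('dev').folder_name)  # 'development'

The registry is still a dict, but it is the only one, and all fields are mandatory at construction time without any = None defaults.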

Dynamically creating get request query parameters based on Pydantic schema

I know that you can create GET requests like this:
#router.get("/findStuff")
def get_stuff(a: int = None, b: str = None):
return {'a': a, 'b': b}
But is there a way to create the query parameters dynamically from, let's say, a Pydantic schema? I've tried the code below, and although it does seem to create the query parameters in the OpenAPI doc, it's unable to process them, returning a 422 (Unprocessable Entity). Has anyone tried to do something similar? Being able to specify an object containing the query parameters would let me create GET requests dynamically for any arbitrary object with primitive fields. I did this in Flask with webargs, but I'm not sure what I can do within FastAPI.
class MySchema(BaseModel):
    a: int = None
    b: str = None

@router.get("/findStuff")
def get_stuff(inputs = Depends(MySchema)):
    return inputs
This turned out to be user error. There was an endpoint with a path parameter
#router.get("/{id}")
def get_stuff_by_id(id: int):
return id
that appeared above the /findStuff endpoint, so /findStuff got clobbered: the request matched /{id} first.
The solution was simply to move the /findStuff block above the block with the path parameter.
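A minimal sketch of the working layout, reusing the names from the question (and assuming Pydantic v1 style defaults as in the original snippet): FastAPI matches routes in declaration order, so the fixed path has to be registered before the catch-all path parameter.

from fastapi import APIRouter, Depends
from pydantic import BaseModel

router = APIRouter()

class MySchema(BaseModel):
    a: int = None
    b: str = None

# fixed path first, so /findStuff is not captured by /{id}
@router.get("/findStuff")
def get_stuff(inputs: MySchema = Depends()):
    return inputs

@router.get("/{id}")
def get_stuff_by_id(id: int):
    return id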

Django validate a python dictionary via DictField

I'm new to the Django REST framework and have two basic questions about something I don't fully understand. I have a Python function which takes a dictionary as input, and I would like to make it available via an API.
def f(d):
    pass
I'm testing a POST request via Postman, sending a JSON body which should be translated into my Python dictionary. The input I'm sending looks like this:
{
    "test": {
        "name1": {
            "list": ["a", "b", "c"],
            "numbers": [0.0, 0.1]
        }
    }
}
As you can see, the value under "test" is a dict, currently with one key, name1, whose value is itself a dict with two keys, list and numbers.
The view set I wrote for this example looks like this:
class TestViewSet(viewsets.ViewSet):
    def create(self, request):
        serializer = TestSerializer(data=request.data)
        print(type(request.data.get("test")))
        if serializer.is_valid():
            print(serializer.validated_data.get("test"))
            f(serializer.validated_data.get("test"))
and my serializer looks like this:
class TestSerializer(serializers.Serializer):
    test = serializers.DictField()
If I send the JSON input above, the following is printed:
<type 'dict'>
None
So I have the following two questions:
As we can see, request.data.get("test") is already of the desired type (dict). Why do we want to call the serializer and do the validation at all? It might cast certain inputs to types other than the plain built-ins, e.g. validating a decimal via the DecimalField returns an object of type decimal.Decimal rather than a plain Python float. Do I need to recast this after calling the serializer, or how do I know that it won't cause any trouble with a function expecting, for example, a native Python float?
How can I define the serializer in the example above so that I get back the correct object and not None? The value under "test" is a dictionary of dictionaries, each with two keys, where one value is a list of decimals and the other a list of strings. How can I write validation for this within the DictField?
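One way to express that, as a sketch: DRF's DictField accepts a child argument, so the nested shape can be declared with a nested Serializer whose fields are typed ListFields. NameEntrySerializer is a made-up name; the field names mirror the sample payload.

from rest_framework import serializers

class NameEntrySerializer(serializers.Serializer):
    # "list" and "numbers" mirror the keys in the posted JSON
    list = serializers.ListField(child=serializers.CharField())
    numbers = serializers.ListField(child=serializers.FloatField())

class TestSerializer(serializers.Serializer):
    # every value in the "test" mapping must validate against NameEntrySerializer
    test = serializers.DictField(child=NameEntrySerializer())

With this, serializer.validated_data.get("test") should come back as the validated nested dict rather than None, and the FloatField children cover the "list of decimals" part.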

Object schemas/models without ORM, DOM or forms

I've used MongoEngine a lot lately. Apart from the MongoDB integration, I like the idea of defining the structures of entities explicitly. Field definitions make code easier to understand. Also, using those definitions, I can validate objects to catch potential bugs or serialize/deserialize them more accurately.
The problem with MongoEngine is that it is designed specifically to work with a storage engine. The same applies to Django and SQLAlchemy models, which also lack list and set types. My question, then, is: is there an object schema/model library for Python that does automated object validation and serialization, but not object-relational mapping or any other fancy stuff?
Let me give an example.
class Wheel(Entity):
    radius = FloatField(1.0)

class Bicycle(Entity):
    front = EntityField(Wheel)
    back = EntityField(Wheel)

class Owner(Entity):
    name = StringField()
    bicycles = ListField(EntityField(Bicycle))

owner = Owner(name='Eser Aygün', bicycles=[])

bmx = Bicycle()
bmx.front = Wheel()
bmx.back = Wheel()

trek = Bicycle()
trek.front = Wheel(1.2)
trek.back = Wheel(1.2)

owner.bicycles.append(bmx)
owner.bicycles.append(trek)

owner.validate()  # checks the structure recursively
Given the structure, it is also easy to serialize and deserialize objects. For example, owner.jsonify() may return the dictionary
{
    'name': 'Eser Aygün',
    'bicycles': [{
        'front': {'radius': 1.0},
        'back': {'radius': 1.0}
    }, {
        'front': {'radius': 1.2},
        'back': {'radius': 1.2}
    }],
}
and you can easily convert it back by calling owner.dejsonify(dic).
If anyone is still looking: since python-entities hasn't been updated for a while, there are some good libraries out there:
schematics - https://schematics.readthedocs.org/en/latest/
colander - http://docs.pylonsproject.org/projects/colander/en/latest/
voluptuous - https://pypi.python.org/pypi/voluptuous (more of a validation library)
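For instance, here is a rough schematics sketch of the Wheel/Bicycle example from the question (assuming schematics 2.x; the type names are real, but the mapping onto the example is mine):

from schematics.models import Model
from schematics.types import StringType, FloatType
from schematics.types.compound import ModelType, ListType

class Wheel(Model):
    radius = FloatType(default=1.0)

class Bicycle(Model):
    front = ModelType(Wheel)
    back = ModelType(Wheel)

class Owner(Model):
    name = StringType()
    bicycles = ListType(ModelType(Bicycle))

owner = Owner({'name': 'Eser Aygün',
               'bicycles': [{'front': {'radius': 1.2}, 'back': {'radius': 1.2}}]})
owner.validate()             # checks the structure recursively
print(owner.to_primitive())  # plain dicts/lists, ready for json.dumps()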
Check out mongopersist, which uses Mongo as a persistence layer for Python objects, much like ZODB. It does not perform schema validation, but it lets you move objects between Mongo and Python transparently.
For validation or other serialization/deserialization scenarios (e.g. forms), consider colander. Colander project description:
Colander is useful as a system for validating and deserializing data obtained via XML, JSON, an HTML form post or any other equally simple data serialization. It runs on Python 2.6, 2.7 and 3.2. Colander can be used to:
Define a data schema.
Deserialize a data structure composed of strings, mappings, and lists into an arbitrary Python structure after validating the data structure against a data schema.
Serialize an arbitrary Python structure to a data structure composed of strings, mappings, and lists.
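As a small illustration of those three bullets, a colander sketch of the same Wheel/Bicycle shape (the schema names are mine; Float coercion and missing= defaults are standard colander features):

import colander

class Wheel(colander.MappingSchema):
    radius = colander.SchemaNode(colander.Float(), missing=1.0)

class Bicycle(colander.MappingSchema):
    front = Wheel()
    back = Wheel()

schema = Bicycle()
# deserialize validates the structure and coerces the string inputs to floats
appstruct = schema.deserialize({'front': {'radius': '1.2'}, 'back': {'radius': '1.2'}})
print(appstruct)  # {'front': {'radius': 1.2}, 'back': {'radius': 1.2}}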
What you're describing can be achieved with remoteobjects, which contains a mechanism (called dataobject) that allows you to define the structure of an object so that it can be validated and marshalled easily to and from JSON.
It also includes some functionality for building a REST client library that makes HTTP requests, but the use of this part is not required.
The main remoteobjects distribution does not come with specific StringField or IntegerField types, but it's easy enough to implement them. Here's an example BooleanField from a codebase I maintain that uses remoteobjects:
class BooleanField(dataobject.fields.Field):
    def encode(self, value):
        if value is not None and type(value) is not bool:
            raise TypeError("Requires boolean")
        return super(BooleanField, self).encode(value)
This can then be used in an object definition:
class ThingWithBoolean(dataobject.DataObject):
    my_boolean = BooleanField()
And then:
thing = ThingWithBoolean.from_dict({"my_boolean": True})
thing.my_boolean = "hello"
return json.dumps(thing.to_dict())  # will fail because my_boolean is not a boolean
As I said earlier in a comment, I've decided to invent my own wheel. I started implementing an open-source Python library, Entities, that does just what I wanted. You can check it out at https://github.com/eseraygun/python-entities/.
The library supports recursive and non-recursive collection types (list, set and dict), nested entities, and reference fields. It can automatically validate, serialize, deserialize, and generate hashable keys for entities of any complexity. (In fact, the de/serialization feature is not complete yet.)
This is how you use it:
from entities import *

class Account(Entity):
    id = IntegerField(group=PRIMARY)      # this field is in the primary key group
    iban = IntegerField(group=SECONDARY)  # this one is in the secondary key group
    balance = FloatField(default=0.0)

class Name(Entity):
    first_name = StringField(group=SECONDARY)
    last_name = StringField(group=SECONDARY)

class Customer(Entity):
    id = IntegerField(group=PRIMARY)
    name = EntityField(Name, group=SECONDARY)
    accounts = ListField(ReferenceField(Account), default=list)

# Create Account objects.
a_1 = Account(1, 111, 10.0)                  # __init__() recognizes positional arguments
a_2 = Account(id=2, iban=222, balance=20.0)  # as well as keyword arguments

# Generate a hashable key using the primary key.
print a_1.keyify()  # prints '(1,)'

# Generate a hashable key using the secondary key.
print a_2.keyify(SECONDARY)  # prints '(222,)'

# Create a Customer object.
c = Customer(1, Name('eser', 'aygun'))

# Generate a hashable key using the primary key.
print c.keyify()  # prints '(1,)'

# Generate a hashable key using the secondary key.
print c.keyify(SECONDARY)  # prints "(('eser', 'aygun'),)"

# Try validating an invalid object.
c.accounts.append(123)
try:
    c.validate()  # fails
except ValidationError:
    print 'accounts list is only for Account objects'

# Try validating a valid object.
c.accounts = [a_1, a_2]
c.validate()  # succeeds

Subclassing Python dictionaries

I am using ElasticSearch in my Python application and want to be able to create a reusable dictionary object that represents a query. The JSON structures are described here: http://pulkitsinghal.blogspot.co.uk/2012/02/how-to-use-elasticsearch-query-dsl.html, and I am using PyES to query the search server. With PyES you can pass a dict object, which gets jsonified before being sent to the server. I want to create a library of common queries where only the actual query term changes, so I thought I would subclass dict so that I could pass in the query term via the constructor, for example, and end up with something like this when the dict gets jsonified:
{
    "fields": [
        "name",
        "shortDescription",
        "longDescription"
    ],
    "query": {
        "query_string": {
            "fields": [
                "name"
            ],
            "query": query_term,
            "use_dis_max": true
        }
    }
}
How would I do this? Is it true that only instance members get returned via __dict__? If so, would I have to set up this data structure in the constructor? Is this the best way of doing it, or should I create a class that does not extend dict and just give it a to_dict() method that returns a dictionary in the correct structure?
Answer:
This seems to work fine; any suggestions for making it more 'pythonic' will be appreciated! (Yes, I know there are no doc strings.)
class StandardQuery(object):
    # fields to search over, and fields to return with each hit
    search_fields = ['meta_keywords', 'meta_description',
                     'fields.title.value', 'slug']
    return_fields = ['slug', 'fields.title.value']

    def __init__(self, query):
        self.query = query

    def to_dict(self):
        output = dict()
        # top-level "fields" selects what comes back; query_string "fields" is searched
        output['fields'] = self.return_fields
        output['query'] = {'query_string': {'fields': self.search_fields,
                                            'query': self.query,
                                            'use_dis_max': True}}
        return output
If you don't want all of the normal dict behaviour, then you should definitely just create a separate class. You can then either give it a to_dict() method or, better, since you really want to convert to JSON, create a custom JSON encoder (and, if required, decoder) to use with the default argument of json.dumps().
In json.dump() and json.dumps(), the optional default argument is a callable that should either return a serialized version of the object or raise TypeError (to get the default behaviour).
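A minimal sketch of that encoder approach, assuming the StandardQuery class from the answer above:

import json

class QueryEncoder(json.JSONEncoder):
    def default(self, obj):
        # delegate to to_dict() when the object provides one
        if hasattr(obj, 'to_dict'):
            return obj.to_dict()
        return super(QueryEncoder, self).default(obj)

print(json.dumps(StandardQuery('shoes'), cls=QueryEncoder))

# equivalently, via the default= argument:
print(json.dumps(StandardQuery('shoes'), default=lambda o: o.to_dict()))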
