I have a use case where I am reading some data from an API call but need to transform the data before inserting it into a database. The data comes in an integer format, and I need to save it as a string. The database does not offer a datatype conversion, so the conversion needs to happen in Python before inserting.
Within a config file I have like:
config = {"convert_fields": ["payment", "cash_flow"], "type": "str"}
Then within Python I am using the eval() function to work out which type to convert the fields to.
So the code ends up being like data['field'] = eval(config['type'])(data['field'])
Does anyone have a better suggestion for how I can dynamically convert these values, ideally without storing the Python class name in a config file?
To add: sure, I could just call str(), but other fields may need converting at some point to types other than string. So I want the conversion to be dynamic, driven by whatever is defined in the config file for the required conversion fields.
How about using getattr() and __builtins__? I feel that is a little better than exec()/eval() in this instance.
def cast_by_name(type_name, value):
    return getattr(__builtins__, type_name)(value)

print(cast_by_name("bool", 1))
Should spit back:
True
You will likely want to include some support for exceptions and perhaps defaults but this should get you started.
@MisterMiyagi points out a critical flaw: of course, eval is a builtin as well. We might want to limit this to safe types:
def cast_by_name(type_name, value):
    trusted_types = ["int", "float", "complex", "bool", "str"]  # others as needed
    if type_name in trusted_types:
        return getattr(__builtins__, type_name)(value)
    return value

print(cast_by_name("bool", 1))
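One caveat worth hedging: in the main script __builtins__ happens to be the builtins module, but inside imported modules it can be a plain dict, so getattr on it is fragile. A minimal sketch that imports the builtins module explicitly sidesteps this:

```python
import builtins

def cast_by_name(type_name, value):
    # Whitelist the conversions we trust; anything else passes through unchanged.
    trusted_types = {"int", "float", "complex", "bool", "str"}
    if type_name in trusted_types:
        return getattr(builtins, type_name)(value)
    return value

print(cast_by_name("bool", 1))  # True
print(cast_by_name("eval", 1))  # 1 -- "eval" is not trusted, value returned as-is
```

Importing builtins by name works the same in any module, which makes this variant a little more portable than relying on __builtins__.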
Build up a conversion lookup dictionary in advance. It is faster and easier to debug.
config = {"convert_fields":
    {"payment": "str", "cash_flow": "str", "customer_id": "int", "name": "name_it"}
}

def name_it(s: str):
    return s.capitalize()

data_in = dict(
    payment=101.00,
    customer_id=3,
    cash_flow=1,
    name="bill",
    city="london",
)
convert_functions = {
    # support builtins and custom functions
    fieldname: globals().get(funcname) or getattr(__builtins__, funcname)
    for fieldname, funcname in config["convert_fields"].items()
    if funcname not in {"eval"}
}

print(f"{convert_functions=}")

data_db = {
    fieldname:
        # if no conversion is specified, use `str`
        convert_functions.get(fieldname, str)(value)
    for fieldname, value in data_in.items()
}

print(f"{data_db=}")
Output:
convert_functions={'payment': <class 'str'>, 'cash_flow': <class 'str'>, 'customer_id': <class 'int'>, 'name': <function name_it at 0x10f0fbe20>}
data_db={'payment': '101.0', 'customer_id': 3, 'cash_flow': '1', 'name': 'Bill', 'city': 'london'}
If the config could be stored in code, rather than a JSON-type approach, I'd look into Pydantic, though that is not exactly your problem space here:
from pydantic import BaseModel

class Data_DB(BaseModel):
    payment: str
    customer_id: int
    cash_flow: str
    # you'd need a custom validator to handle capitalization
    name: str
    city: str

pydata = Data_DB(**data_in)
print(f"{pydata=}")
print(pydata.dict())
Output:
pydata=Data_DB(payment='101.0', customer_id=3, cash_flow='1', name='bill', city='london')
{'payment': '101.0', 'customer_id': 3, 'cash_flow': '1', 'name': 'bill', 'city': 'london'}
I am looking at moving to cattrs/attrs from a completely manual process of typing out all my classes, but I need some help understanding how to achieve the following.
This is a single example but the data returned will be varied and sometimes not with all the fields populated.
data = {
    "data": [
        {
            "broadcaster_id": "123",
            "broadcaster_login": "Sam",
            "language": "en",
            "subscriber_id": "1234",
            "subscriber_login": "Dave",
            "moderator_id": "12345",
            "moderator_login": "Tom",
            "delay": "0",
            "title": "Weekend Events"
        }
    ]
}
@attrs.define
class PartialUser:
    id: int
    login: str

@attrs.define
class Info:
    language: str
    title: str
    delay: int
    broadcaster: PartialUser
    subscriber: PartialUser
    moderator: PartialUser
So I understand how you would construct this, and it works perfectly fine with 1:1 mappings, as expected. But how would you create the PartialUser objects dynamically, since the names are not identical to the JSON response from the API?
instance = cattrs.structure(data["data"][0], Info)
Is there some trick to using a converter?
This would need to be done for around 70 classes, which is why I thought maybe cattrs could modernise and simplify what I'm trying to do.
Thanks.
Here's one possible solution.
This is the strategy: we will customize the structuring hook by wrapping it. The default hook expects the keys in the input dictionary to match the structure of the class, but here this is not the case. So we'll substitute our own structuring hook that does a little preprocessing and then calls into the default hook.
The default hook for an attrs class cls can be retrieved like this:
from cattrs import Converter
from cattrs.gen import make_dict_structure_fn
c = Converter()
handler = make_dict_structure_fn(cls, c)
Knowing this, we can implement a helper function thusly:
from typing import Any

def group_by_prefix(cls: type, c: Converter, *prefixes: str) -> None:
    handler = make_dict_structure_fn(cls, c)

    def prefix_grouping_hook(val: dict[str, Any], _) -> Any:
        # Group keys like "broadcaster_id" under their prefix before
        # delegating to the default hook.
        by_prefix = {}
        for key in val:
            if "_" in key and (prefix := (parts := key.split("_", 1))[0]) in prefixes:
                by_prefix.setdefault(prefix, {})[parts[1]] = val[key]
        return handler(val | by_prefix, _)

    c.register_structure_hook(cls, prefix_grouping_hook)
This function takes an attrs class cls, a converter, and a list of prefixes. Then it creates a hook and registers it with the converter for the class cls. Inside, it does a little bit of preprocessing to beat the data into the shape cattrs expects.
Here's how you'd use it for the Info class:
>>> c = Converter()
>>> group_by_prefix(Info, c, "broadcaster", "subscriber", "moderator")
>>> print(c.structure(data["data"][0], Info))
Info(language='en', title='Weekend Events', delay=0, broadcaster=PartialUser(id=123, login='Sam'), subscriber=PartialUser(id=1234, login='Dave'), moderator=PartialUser(id=12345, login='Tom'))
You can use this approach to make the solution more elaborate as needed.
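To see what the hook's preprocessing does on its own, independent of cattrs, here is a minimal sketch (the function name and sample row are mine, not from the cattrs API):

```python
def group_keys_by_prefix(val: dict, prefixes: set) -> dict:
    # Collect keys like "broadcaster_id" into nested dicts:
    # {"broadcaster": {"id": ...}}, leaving other keys untouched.
    by_prefix = {}
    for key, value in val.items():
        if "_" in key:
            prefix, rest = key.split("_", 1)
            if prefix in prefixes:
                by_prefix.setdefault(prefix, {})[rest] = value
    # Merge so the grouped dicts sit alongside the original flat keys.
    return {**val, **by_prefix}

row = {"broadcaster_id": "123", "broadcaster_login": "Sam", "language": "en"}
print(group_keys_by_prefix(row, {"broadcaster"}))
```

The merged result contains a nested "broadcaster" dict that matches the shape the PartialUser field expects, which is exactly what the default structuring hook then consumes.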
I have a dictionary like the following:
OAUTH2_PROVIDER = {
    'SCOPES': {
        'read': 'Read scope',
        'write': 'Write scope',
        'userinfo': 'Access to user info',
        'full-userinfo': 'Access to full user info',
    },
    'DEFAULT_SCOPES': {
        'userinfo'
    },
    'ALLOWED_REDIRECT_URI_SCHEMES': ['http', 'https', 'rutube'],
    'PKCE_REQUIRED': import_string('tools.oauth2.is_pkce_required'),
    'OAUTH2_VALIDATOR_CLASS': 'oauth2.validator.OAuth2WithJwtValidator',
    'REFRESH_TOKEN_EXPIRE_SECONDS': 30 * 24 * 60 * 60,
    'ACCESS_TOKEN_EXPIRE_SECONDS': 3600,
}
And I want to annotate the following key to check that its value is always an integer:
'REFRESH_TOKEN_EXPIRE_SECONDS': 30 * 24 * 60 * 60,
In Python 3.6 we don't have TypedDict. What can I replace it with?
Inside dictionaries, object types are preserved. You can set the type of a value inside the dictionary, as that value is actually its own object with its own type.
In your example, you can set the type of the value as an int with:
OAUTH2_PROVIDER['REFRESH_TOKEN_EXPIRE_SECONDS'] = int(
    OAUTH2_PROVIDER['REFRESH_TOKEN_EXPIRE_SECONDS']
)
print(type(OAUTH2_PROVIDER['REFRESH_TOKEN_EXPIRE_SECONDS']))  # => <class 'int'>
EDIT:
Getting to the OP's core issue of removing nested if statements:
You can simply add a single if statement that forces an int or raises an error:
REFRESH_TOKEN_EXPIRE_SECONDS = oauth2_settings.REFRESH_TOKEN_EXPIRE_SECONDS
if isinstance(REFRESH_TOKEN_EXPIRE_SECONDS, int):
    # convert the plain-int setting into a timedelta for date arithmetic
    REFRESH_TOKEN_EXPIRE_SECONDS = timedelta(seconds=REFRESH_TOKEN_EXPIRE_SECONDS)
else:
    e = "REFRESH_TOKEN_EXPIRE_SECONDS must be an int"
    raise ImproperlyConfigured(e)
refresh_expire_at = now - REFRESH_TOKEN_EXPIRE_SECONDS
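If a static TypedDict is unavailable on Python 3.6, a small runtime guard is another option. A minimal sketch (the helper name require_int is mine, not from any library):

```python
def require_int(settings: dict, key: str) -> int:
    """Return settings[key] if it is an int, otherwise raise TypeError."""
    value = settings[key]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"{key} must be an int, got {type(value).__name__}")
    return value

seconds = require_int(
    {"REFRESH_TOKEN_EXPIRE_SECONDS": 30 * 24 * 60 * 60},
    "REFRESH_TOKEN_EXPIRE_SECONDS",
)
```

This gives a single, explicit failure point at config-load time rather than a surprise later in date arithmetic.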
I'm trying to convert a dictionary to bytes but am facing issues converting it to the correct format.
First, I'm trying to map a dictionary with a custom schema. The schema is defined as follows:
class User:
    def __init__(self, name=None, code=None):
        self.name = name
        self.code = code

class UserSchema(Schema):
    name = fields.Str()
    code = fields.Str()

    @post_load
    def create_user(self, data):
        return User(**data)
My dictionary structure is as follows:
user_dict = {'name': 'dinesh', 'code': 'dr-01'}
I'm trying to map the dictionary to the User schema with the code below:
schema = UserSchema(partial=True)
user = schema.loads(user_dict).data
While doing this, schema.loads expects the input to be str, bytes or bytearray. Below are the steps that I followed to convert the dictionary to bytes:
import json
user_encode_data = json.dumps(user_dict).encode('utf-8')
print(user_encode_data)
Output:
b'{"name ": "dinesh", "code ": "dr-01"}'
If I try to map with the schema, I'm not getting the required schema object. But if I have the output in the format given below, I am able to get the correct schema object:
b'{\n "name": "dinesh",\n "code": "dr-01"}\n'
Any suggestions on how I can convert a dictionary to bytes?
You can use the indent option of json.dumps() to obtain the \n symbols:
import json
user_dict = {'name': 'dinesh', 'code': 'dr-01'}
user_encode_data = json.dumps(user_dict, indent=2).encode('utf-8')
print(user_encode_data)
Output:
b'{\n "name": "dinesh",\n "code": "dr-01"\n}'
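It may be worth noting that the \n characters are purely cosmetic: json.loads() parses both forms to the same dictionary, so if the goal is just a round trip, the indent option should not actually be required. A quick check:

```python
import json

user_dict = {'name': 'dinesh', 'code': 'dr-01'}

compact = json.dumps(user_dict).encode('utf-8')
pretty = json.dumps(user_dict, indent=2).encode('utf-8')

# Both byte strings parse back to the same dictionary; the whitespace
# inserted by indent= changes only the presentation, not the content.
assert json.loads(compact) == json.loads(pretty) == user_dict
```

If schema.loads only succeeds on the indented form, the difference likely lies elsewhere (for example, stray spaces inside the key names, as in "name " above), not in the newlines.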
You can use the base64 library to convert a stringified dictionary to bytes, and then convert the bytes result back to a dictionary using the json library. Try the sample code below.
import base64
import json
input_dict = {'var1': ['listitem1', 'listitem2', 5], 'var2': 'some string'}
message = str(input_dict)
ascii_message = message.encode('ascii')
output_byte = base64.b64encode(ascii_message)
msg_bytes = base64.b64decode(output_byte)
ascii_msg = msg_bytes.decode('ascii')
# The json library converts a string dictionary to a real dictionary type.
# Double quotes are the standard format for json.
ascii_msg = ascii_msg.replace("'", "\"")
output_dict = json.loads(ascii_msg) # convert string dictionary to dict format
# Show the input and output
print("input_dict:", input_dict, type(input_dict))
print()
print("base64:", output_byte, type(output_byte))
print()
print("output_dict:", output_dict, type(output_dict))
Output:
>>> print("input_dict:", input_dict, type(input_dict))
input_dict: {'var1': ['listitem1', 'listitem2', 5], 'var2': 'some string'} <class 'dict'>
>>> print()
>>> print("base64:", output_byte, type(output_byte))
base64: b'eyd2YXIxJzogWydsaXN0aXRlbTEnLCAnbGlzdGl0ZW0yJywgNV0sICd2YXIyJzogJ3NvbWUgc3RyaW5nJ30=' <class 'bytes'>
>>> print()
>>> print("output_dict:", output_dict, type(output_dict))
output_dict: {'var1': ['listitem1', 'listitem2', 5], 'var2': 'some string'} <class 'dict'>
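A hedged variant: serializing with json.dumps() before base64-encoding avoids the str() plus replace("'", '"') step entirely, which otherwise breaks as soon as a value contains an apostrophe:

```python
import base64
import json

# Note the apostrophe in the value, which would defeat the replace() approach.
input_dict = {'var1': ['listitem1', 'listitem2', 5], 'var2': "it's a string"}

# json.dumps() already emits double-quoted keys and values,
# so the bytes round-trip cleanly through base64.
encoded = base64.b64encode(json.dumps(input_dict).encode('utf-8'))
decoded = json.loads(base64.b64decode(encoded).decode('utf-8'))

assert decoded == input_dict
```

The round trip is lossless for any JSON-serializable dictionary, with no string surgery required.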
I am writing a script to query an ArcGIS REST service and return records. I want to use {} and .format to allow a dictionary item to be changed each time. How do I write this:
time = '2016-10-06 19:18:00'
URL = 'http://XXXXXXXXX.gov/arcgis/rest/services/AGO_Street/StreetMaint_ServReqs/FeatureServer/10/query'
params = {'f': 'pjson', 'where': "CLOSE_DATE > '{}'", 'outfields' : 'OBJECTID, REPORTED_DATE, SUMMARY, ADDRESS1, REQUEST_STATUS, CLOSE_DATE, INCIDENT_NUMBER', 'returnGeometry' : 'false'}.format(time)
req = urllib2.Request(URL, urllib.urlencode(params))
If I use this for params, it will work:
params = {'f': 'pjson', 'where': "CLOSE_DATE > '2016-10-06 19:18:00'", 'outfields' : 'OBJECTID, REPORTED_DATE, SUMMARY, ADDRESS1, REQUEST_STATUS, CLOSE_DATE, INCIDENT_NUMBER', 'returnGeometry' : 'false'}
What is the proper python formatting to do this?
str.format is a string method, not a method on a dictionary. Just apply the method to that one string value:
params = {
'f': 'pjson',
'where': "CLOSE_DATE > '{}'".format(time),
'outfields' : 'OBJECTID, REPORTED_DATE, SUMMARY, ADDRESS1, REQUEST_STATUS, CLOSE_DATE, INCIDENT_NUMBER',
'returnGeometry' : 'false'
}
Each of the key and value parts in a dictionary definition is just another expression; you are free to use any valid Python expression to produce the value, including calling methods on a string and using the result as the value.
Try this:
'where': "CLOSE_DATE > '{}'".format(time)
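On Python 3.6+, an f-string expresses the same substitution a little more directly (a minimal sketch with the sample timestamp from the question):

```python
time = '2016-10-06 19:18:00'

# The expression in braces is evaluated and interpolated at definition time,
# equivalent to "CLOSE_DATE > '{}'".format(time).
where_clause = f"CLOSE_DATE > '{time}'"
print(where_clause)  # CLOSE_DATE > '2016-10-06 19:18:00'
```

The same f-string can be used in place as the 'where' value inside the params dictionary.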
Is it possible to validate list using marshmallow?
class SimpleListInput(Schema):
    items = fields.List(fields.String(), required=True)

# expected invalid type error
data, errors = SimpleListInput().load({'some': 'value'})
# should be ok
data, errors = SimpleListInput().load(['some', 'value'])
Or is it expected to validate only objects?
To validate top-level lists, you need to instantiate your list item schema with the many=True argument.
Example:
class UserSchema(marshmallow.Schema):
    name = marshmallow.fields.String()

data, errors = UserSchema(many=True).load([
    {'name': 'John Doe'},
    {'name': 'Jane Doe'}
])
But it still needs to be an object schema; Marshmallow does not support top-level non-object lists. If you need to validate a top-level list of non-object types, a workaround is to define a schema with one List field of your type and wrap the payload as if it were an object:
class SimpleListInput(marshmallow.Schema):
    items = marshmallow.fields.List(marshmallow.fields.String(), required=True)

payload = ['foo', 'bar']
data, errors = SimpleListInput().load({'items': payload})
SimpleListInput is a class with a property "items"; the property "items" is what accepts a list of strings.
>>> data, errors = SimpleListInput().load({'items':['some', 'value']})
>>> print data, errors
{'items': [u'some', u'value']}
{}
>>> data, errors = SimpleListInput().load({'items':[]})
>>> print data, errors
{'items': []}
{}
>>> data, errors = SimpleListInput().load({})
>>> print data, errors
{}
{'items': [u'Missing data for required field.']}
If you want a custom validate, for example, not accept an empty list in "items":
from marshmallow import fields, Schema, validates, ValidationError
class SimpleListInput(Schema):
    items = fields.List(fields.String(), required=True)

    @validates('items')
    def validate_length(self, value):
        if len(value) < 1:
            raise ValidationError('Quantity must be greater than 0.')
Then...
>>> data, errors = SimpleListInput().load({'items':[]})
>>> print data, errors
{'items': []}
{'items': ['Quantity must be greater than 0.']}
Take a look at the Validation section of the marshmallow docs.
UPDATE:
As @Turn commented below, you can do this:
from marshmallow import fields, Schema, validate
class SimpleListInput(Schema):
    items = fields.List(fields.String(), required=True, validate=validate.Length(min=1))
Please take a look at a little library I wrote which tries to solve exactly this problem: https://github.com/and-semakin/marshmallow-toplevel.
Installation:
pip install marshmallow-toplevel
Usage (building on the example from Maxim Kulkin):
import marshmallow
from marshmallow_toplevel import TopLevelSchema
class SimpleListInput(TopLevelSchema):
    _toplevel = marshmallow.fields.List(
        marshmallow.fields.String(),
        required=True,
        validate=marshmallow.validate.Length(1, 10)
    )
# raises ValidationError, because:
# Length must be between 1 and 10.
SimpleListInput().load([])
# raises ValidationError, because:
# Length must be between 1 and 10.
SimpleListInput().load(["qwe" for _ in range(11)])
# successfully loads data
payload = ["foo", "bar"]
data = SimpleListInput().load(payload)
assert data == ["foo", "bar"]
Of course it can be used with more complex schemas than just a string as in the example.
It is possible to use a Field type directly (see the documentation):
simple_list_input = fields.List(fields.String(), required=True)
# ValidationError: Not a valid list.
simple_list_input.deserialize({'some': 'value'})
# ValidationError: {0: ['Not a valid string.']}
simple_list_input.deserialize([1, 'value'])
# Returns: ['some', 'value'] with no errors
simple_list_input.deserialize(['some', 'value'])
In comparison with Schema:
deserialize == load
serialize == dump