I've been searching the internet and couldn't find a simple example of encoding and decoding a custom object using JSON in Python.
Let's say I have the following class:
class Test:
    def __init__(self, name=None, grade=None):
        self.name = name
        self.grade = grade
and also have a list of Test objects:
t1 = Test("course1", 80)
t2 = Test("course2", 90)
list_of_tests = [t1, t2]
How can I serialize the class Test and the object list_of_tests using JSON? I want to be able to write it to a file and read it back from a file, using Python.
To be honest, the easiest thing to do here is to manually create a list of dictionaries from your objects; then you can pass that directly to the JSON functions.
import json

data = [{'name': x.name, 'grade': x.grade} for x in list_of_tests]
with open('output.json', 'w') as out:
    json.dump(data, out)
and read it back:
with open('output.json') as inp:
    data = json.load(inp)
list_of_tests = [Test(x['name'], x['grade']) for x in data]
Alternatively, you can control how an unrecognised object is serialised by passing default=converter_function to dumps(). For the result to be valid JSON you'd have to return a plain dict with the fields you want, plus some tag field identifying that it is to be treated specially by loads(). Then have another converter function reverse the process, passed to loads() as object_hook.
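For example, here is a minimal sketch of that approach for the Test class above (the __type__ tag name is just a convention, not something the json module requires):
import json

def encode_test(obj):
    # called by json.dumps for any object it cannot serialize itself
    if isinstance(obj, Test):
        return {'__type__': 'Test', 'name': obj.name, 'grade': obj.grade}
    raise TypeError('Cannot serialize %r' % type(obj))

def decode_test(d):
    # called by json.loads for every decoded JSON object
    if d.get('__type__') == 'Test':
        return Test(d['name'], d['grade'])
    return d

s = json.dumps(list_of_tests, default=encode_test)
restored = json.loads(s, object_hook=decode_test)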
Related
I am creating an Azure Data Factory pipeline using the Python SDK (azure.mgmt.datafactory.models.PipelineResource). I need to convert the PipelineResource object to a JSON file. Is it possible anyhow?
I tried json.loads(pipeline_object) and json.dumps(pipeline_object), but no luck.
I need to convert PipelineResource object to JSON file. Is it possible anyhow?
You can try the following code snippet as suggested by mccoyp:
You can add a default argument to json.dumps to convert objects that are not JSON serializable into a dict:
import json
from azure.mgmt.datafactory.models import Activity, PipelineResource

activity = Activity(name="activity-name")
resource = PipelineResource(activities=[activity])

# json.dumps returns a JSON string; the default lambda turns any
# non-serializable object into its __dict__ along the way
json_str = json.dumps(resource, default=lambda obj: obj.__dict__)
print(json_str)
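To get an actual JSON file rather than a string, the same default can be passed to json.dump (a sketch; the file name is arbitrary):
with open('pipeline.json', 'w') as f:
    json.dump(resource, f, default=lambda obj: obj.__dict__, indent=2)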
Alternatively, you can use this approach:
# Assumes adf_client, rg_name, df_name, ds_name and dsOut_name are already
# set up as in the Azure Data Factory Python quickstart
from azure.mgmt.datafactory.models import (
    BlobSource, BlobSink, DatasetReference, CopyActivity, PipelineResource)

# Create a copy activity
act_name = 'copyBlobtoBlob'
blob_source = BlobSource()
blob_sink = BlobSink()
dsin_ref = DatasetReference(reference_name=ds_name)
dsOut_ref = DatasetReference(reference_name=dsOut_name)
copy_activity = CopyActivity(name=act_name, inputs=[dsin_ref],
                             outputs=[dsOut_ref], source=blob_source,
                             sink=blob_sink)

# Create a pipeline with the copy activity
# Note1: To pass parameters to the pipeline, add them to the dict
#   params_for_pipeline shown below in the format
#   { "ParameterName1": "ParameterValue1" } for each parameter needed.
# Note2: To pass parameters to a dataflow, create a pipeline parameter to
#   hold the name/value, and consume the pipeline parameter in the dataflow
#   parameter in the format #pipeline().parameters.parametername.
p_name = 'copyPipeline'
params_for_pipeline = {}
p_obj = PipelineResource(activities=[copy_activity],
                         parameters=params_for_pipeline)
p = adf_client.pipelines.create_or_update(rg_name, df_name, p_name, p_obj)
print_item(p)
I'm trying to figure out a way to create a namedtuple with variable fields depending on the data you receive; in my case, I'm using the data from StatCounter, and not all the periods have the same browsers. I tried this way, but it is a bit ugly and I'm sure there is a better way to achieve it.
from collections import namedtuple
from typing import List

def namedtuple_fixed(name: str, fields: List[str]) -> namedtuple:
    """Check the fields of the namedtuple and change the invalid ones."""
    fields_fixed: List[str] = []
    for field in fields:
        field = field.replace(" ", "_")
        if field[0].isdigit():
            field = f"n{field}"
        fields_fixed.append(field)
    return namedtuple(name, fields_fixed)
import csv
import datetime as dt
from typing import Iterator

Records: namedtuple = namedtuple("empty_namedtuple", "")

def read_file(file: str) -> List["Records"]:
    """
    Read the file with info about the percentage of use of various browsers.
    """
    global Records
    with open(file, encoding="UTF-8") as browsers_file:
        reader: Iterator[List[str]] = csv.reader(browsers_file)
        field_names: List[str] = next(reader)
        Records = namedtuple_fixed("Record", field_names)
        result: List[Records] = [
            Records(
                *[
                    dt.datetime.strptime(n, "%Y-%m").date()
                    if record.index(n) == 0
                    else float(n)
                    for n in record
                ]
            )
            for record in reader
        ]
    return result
The "namedtuple_fixed" function is to fix the names that have invalid identifiers.
Basically, I want to create a named tuple that receives a variable number of parameters, depending on the file you want to analyze. And if it's with type checking incorporated (I mean using NamedTuple from the typing module), much better.
Thanks in advance.
This solves my problem, but only partially:
from types import SimpleNamespace

class Record(SimpleNamespace):
    def __repr__(self):
        items = [f"{key}={value!r}" for key, value in self.__dict__.items()]
        return f"Record({', '.join(items)})"
Based on the types.SimpleNamespace documentation.
It can cause problems, though; for example, if you initialize a Record like the following:
foo = Record(**{"a": 1, "3a": 2})
print(foo.a) # Ok
print(foo.3a) # Syntax Error
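Note that attributes whose names are not valid identifiers are still reachable through getattr:
print(getattr(foo, "3a"))  # prints 2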
I have a YAML file from which I'd like to parse only the description variable; however, I know that the exclamation points in my CloudFormation template (YAML file) are giving PyYAML trouble.
I am receiving the following error:
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!Equals'
The file has many !Ref and !Equals tags. How can I ignore these constructors and get the specific variable I'm looking for -- in this case, the description variable?
If you have to deal with a YAML document that has multiple different tags, and are only interested in a subset of them, you should still handle them all. If the elements you are interested in are nested within other tagged constructs, you at least need to handle all of the "enclosing" tags properly.
There is, however, no need to handle all of the tags individually: you can write a single constructor routine that handles mappings, sequences and scalars, and register it with PyYAML's SafeLoader:
import yaml

inp = """\
MyEIP:
  Type: !Join [ "::", [AWS, EC2, EIP] ]
  Properties:
    InstanceId: !Ref MyEC2Instance
"""

description = []

def any_constructor(loader, tag_suffix, node):
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_scalar(node)

yaml.add_multi_constructor('', any_constructor, Loader=yaml.SafeLoader)

data = yaml.safe_load(inp)
print(data)
which gives:
{'MyEIP': {'Type': ['::', ['AWS', 'EC2', 'EIP']], 'Properties': {'InstanceId': 'MyEC2Instance'}}}
(inp can also be a file opened for reading).
As you can see, the above continues to work when a tag like !Join shows up in your document, as well as any other tag such as !Equals. The tags are just dropped.
Since there are no variables in YAML, it is a bit of guesswork what you mean by "parse the description variable only". If that has an explicit tag (e.g. !Description), you can filter out the values by adding two or three lines to any_constructor, matching on the tag_suffix parameter:
if tag_suffix == u'!Description':
    description.append(loader.construct_scalar(node))
It is, however, more likely that there is some 'description' key in a mapping, and that you are interested in the value associated with that key. You can handle that in the mapping branch of any_constructor:
if isinstance(node, yaml.MappingNode):
    d = loader.construct_mapping(node)
    for k in d:
        if k == 'description':
            description.append(d[k])
    return d
If you know the exact position in the data hierarchy, you can of course also walk the data structure and extract anything you need based on keys or list positions. Especially in that case you'd be better off using my ruamel.yaml, as it can load tagged YAML in round-trip mode without extra effort (assuming the above inp):
from ruamel.yaml import YAML

with YAML() as yaml:
    data = yaml.load(inp)
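A minimal sketch of such a walk over the loaded data, collecting the value of every 'description' key (the key name is assumed from your question):
def find_descriptions(node, found=None):
    # recursively collect values stored under a 'description' key
    if found is None:
        found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == 'description':
                found.append(v)
            find_descriptions(v, found)
    elif isinstance(node, list):
        for item in node:
            find_descriptions(item, found)
    return found

print(find_descriptions(data))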
You can define custom constructors using a custom yaml.SafeLoader:
import yaml

doc = '''
Conditions:
  CreateNewSecurityGroup: !Equals [!Ref ExistingSecurityGroup, NONE]
'''

class Equals(object):
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "Equals(%s)" % self.data

class Ref(object):
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "Ref(%s)" % self.data

def create_equals(loader, node):
    value = loader.construct_sequence(node)
    return Equals(value)

def create_ref(loader, node):
    value = loader.construct_scalar(node)
    return Ref(value)

class Loader(yaml.SafeLoader):
    pass

yaml.add_constructor(u'!Equals', create_equals, Loader)
yaml.add_constructor(u'!Ref', create_ref, Loader)

a = yaml.load(doc, Loader)
print(a)
Outputs:
{'Conditions': {'CreateNewSecurityGroup': Equals([Ref(ExistingSecurityGroup), 'NONE'])}}
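If the template uses many more tags (!Join, !Sub, !GetAtt, ...), writing one placeholder class per tag gets tedious. Here is a sketch that registers a single generic placeholder for a list of tag names, building on the Loader above (the tag list is an assumption; extend it to match your template):
class Tagged(object):
    def __init__(self, tag, data):
        self.tag = tag
        self.data = data
    def __repr__(self):
        return "%s(%s)" % (self.tag.lstrip('!'), self.data)

def make_constructor(tag):
    def construct(loader, node):
        # pick the right construct_* call based on the node type
        if isinstance(node, yaml.ScalarNode):
            value = loader.construct_scalar(node)
        elif isinstance(node, yaml.SequenceNode):
            value = loader.construct_sequence(node)
        else:
            value = loader.construct_mapping(node)
        return Tagged(tag, value)
    return construct

for tag in (u'!Join', u'!Sub', u'!GetAtt', u'!Select'):
    yaml.add_constructor(tag, make_constructor(tag), Loader)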
I'm writing a simple JSON API. I use one base class, and I mostly write one API view per model class. What I want is to combine the output of a few views into one URL endpoint, with as little additional code as possible.
code:
# base class
class JsonView(View):
    def get(self, request):
        return JsonResponse(self.get_json())

    def get_json(self):
        return {}

class DerivedView(JsonView):
    param = None

    def get_json(self):
        # .. use param ..
        return {'data': []}
urls.py:
url('/endpoint1', DerivedView.as_view(param=1))
url('/endpoint2', DerivedView2.as_view())
# What I want:
url('/combined', combine_json_views({
    'output1': DerivedView.as_view(param=1),
    'output2': DerivedView2.as_view()
}))
So /combined would give me the following JSON response:
{'output1': {'data': []}, 'output2': output of DerivedView2}
This is how combine_json_views could be implemented:
def combine_json_views(views_dict):
    d = {}
    for key, view in views_dict.items():
        d[key] = view()  # The problem is here
    return json.dumps(d)
The problem is that calling view() gives me the encoded JSON, so calling json.dumps again produces invalid JSON. I could call json.loads(view()), but it seems wrong to decode JSON that I just encoded.
How can I modify the code here (maybe with a better base class) while keeping it elegant and short, without adding too much code? Is there any way to access the data (dict) that is used to construct JsonResponse?
You can create a combined view that calls the get_json() methods and combines them:
class CombinedView(JsonView):
    def get_json(self):
        view1 = DerivedView(param=1)
        view2 = DerivedView2()
        d = view1.get_json()
        d.update(view2.get_json())
        return d
then:
url('/combined', CombinedView.as_view()),
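If you'd rather keep the combine_json_views spelling from your urls.py, here is a sketch of a factory that builds such a combined view. It takes view classes plus init kwargs instead of the result of as_view(), so it can reach get_json() directly (Django's View.__init__ accepts keyword arguments and sets them as instance attributes):
def combine_json_views(views_dict):
    """Build a view whose JSON payload nests the get_json() of each sub-view."""
    class Combined(JsonView):
        def get_json(self):
            return {key: view_cls(**kwargs).get_json()
                    for key, (view_cls, kwargs) in views_dict.items()}
    return Combined.as_view()

url('/combined', combine_json_views({
    'output1': (DerivedView, {'param': 1}),
    'output2': (DerivedView2, {}),
}))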
Playing with the new Google App Engine MapReduce library's filters for input_reader, I would like to know how I can filter by ndb.Key.
I read this post and I've played with datetime, string, int and float in filters tuples, but how can I filter by ndb.Key?
When I try to filter by an ndb.Key, I get this error:
BadReaderParamsError: Expected Key, got u"Key('Clients', 406)"
Or this error:
TypeError: Key('Clients', 406) is not JSON serializable
I tried passing both an ndb.Key object and the string representation of the ndb.Key.
Here are my two filters tuples:
Sample 1:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", ndb.Key('Clients', 406))]
}
Sample 2:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", "%s" % ndb.Key('Clients', 406))]
}
This is a bit tricky.
If you look at the code on Google Code, you can see that mapreduce.model defines a JSON_DEFAULTS dict which determines the classes that get special-case handling in JSON serialization/deserialization (by default, just datetime). So you can monkey-patch the ndb.Key class into there and provide it with functions to do that serialization/deserialization, something like:
from google.appengine.ext import ndb
from mapreduce import model

def _JsonEncodeKey(o):
    """JSON encode an ndb.Key object."""
    return {'key_string': o.urlsafe()}

def _JsonDecodeKey(d):
    """JSON decode an ndb.Key object."""
    return ndb.Key(urlsafe=d['key_string'])

model.JSON_DEFAULTS[ndb.Key] = (_JsonEncodeKey, _JsonDecodeKey)
model._TYPE_IDS['Key'] = ndb.Key
You may also need to repeat those last two lines to patch mapreduce.lib.pipeline.util as well.
Also note if you do this, you'll need to ensure that this gets run on any instance that runs any part of a mapreduce: the easiest way to do this is to write a wrapper script that imports the above registration code, as well as mapreduce.main.APP, and override the mapreduce URL in your app.yaml to point to your wrapper.
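A sketch of such a wrapper (module names are hypothetical; the registration code above is assumed to be saved as key_json_patch.py):
# wrapper.py - point the mapreduce URL in app.yaml at this module
import key_json_patch  # runs the JSON_DEFAULTS registration above
from mapreduce.main import APP

app = APP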
Make your own input reader based on DatastoreInputReader, which knows how to decode key-based filters:
class DatastoreKeyInputReader(input_readers.DatastoreKeyInputReader):
    """Augment the base input reader to accommodate ReferenceProperty filters"""
    def __init__(self, *args, **kwargs):
        try:
            filters = kwargs['filters']
            decoded = []
            for f in filters:
                value = f[2]
                if isinstance(value, list):
                    value = db.Key.from_path(*value)
                decoded.append((f[0], f[1], value))
            kwargs['filters'] = decoded
        except KeyError:
            pass
        super(DatastoreKeyInputReader, self).__init__(*args, **kwargs)
Run this function on your filters before passing them in as options:
def encode_filters(filters):
    if filters is not None:
        encoded = []
        for f in filters:
            value = f[2]
            if isinstance(value, db.Model):
                value = value.key()
            if isinstance(value, db.Key):
                value = value.to_path()
            entry = (f[0], f[1], value)
            encoded.append(entry)
        filters = encoded
    return filters
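Together, a hypothetical mapreduce spec would then encode the filter up front and let the custom reader decode it (names assumed from the question; replace path.to with wherever the reader class lives):
'input_reader': {
    'input_reader': 'path.to.DatastoreKeyInputReader',
    'entity_kind': 'model.Sales',
    'filters': encode_filters([('client', '=', db.Key.from_path('Clients', 406))])
}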
Are you aware of the to_old_key() and from_old_key() methods?
I had the same problem and came up with a workaround with computed properties.
You can add to your Sales model a new ndb.ComputedProperty with the Key id. Ids are just strings, so you won't have any JSON problems.
client_id = ndb.ComputedProperty(lambda self: self.client.id())
Then add that condition to your mapreduce query filters:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client_id", "=", '406')]
}
The only drawback is that computed properties are not indexed and stored until you call the put() method, so you will have to traverse all the Sales entities and save them:
for sale in Sales.query().fetch():
    sale.put()
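If there are many entities, ndb.put_multi can batch those writes (a sketch; for a very large kind you'd page through the query rather than fetching everything at once):
from google.appengine.ext import ndb

ndb.put_multi(Sales.query().fetch())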