I am used to building REST APIs with PHP, and I make heavy use of the JMS serializer. It basically lets me write annotations on class properties that define the name and type of variables, including nested classes and arrays. This lets me completely abstract away the JSON format and just work with classes, which transparently serialize and deserialize to JSON. In combination with the Symfony validator, this approach is very simple to work with, but also very powerful.
Now to my question: I recently started adopting Python for some projects, and I would like to reimplement an API in Python. I have searched the internet for a suitable equivalent to the JMS serializer, but I didn't find one with the same or similar capabilities.
Would anyone be so kind as to point me in the right direction? (either a good library, or a different approach with equal or better efficiency)
What I need:
ability to serialize and deserialize objects to and from JSON
define how an object is serialized: the names of the JSON attributes and their data types
define complex object graphs (the ability to declare a class as a property type, which is then mapped by its own definition)
ability to map dicts or arrays and the types they contain
Thanks in advance
Marshmallow
I haven't used it yet, so I can't tell whether it answers all your needs. Feedback welcome.
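As a rough sketch of how the requirements above could map onto Marshmallow (assuming marshmallow 3.x; the classes and field names are made up for illustration, not taken from your API):

from marshmallow import Schema, fields, post_load

class Address(object):
    def __init__(self, street, city):
        self.street = street
        self.city = city

class User(object):
    def __init__(self, name, age, addresses):
        self.name = name
        self.age = age
        self.addresses = addresses

class AddressSchema(Schema):
    street = fields.Str()
    city = fields.Str()

    @post_load
    def make_object(self, data, **kwargs):
        return Address(**data)

class UserSchema(Schema):
    # data_key lets the JSON attribute name differ from the Python attribute name
    name = fields.Str(data_key="userName")
    age = fields.Int()
    # nested schemas cover complex object graphs and typed lists
    addresses = fields.List(fields.Nested(AddressSchema))

    @post_load
    def make_object(self, data, **kwargs):
        return User(**data)

schema = UserSchema()
user = schema.load({"userName": "Ann", "age": 30,
                    "addresses": [{"street": "Main St", "city": "Oslo"}]})
json_ready = schema.dump(user)   # back to a JSON-serializable dict

The @post_load hooks are what turn validated dicts back into plain Python objects, which is roughly the deserialization half of what JMS does with annotations.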
Related
I'm currently dealing with a materials science dataset containing various kinds of information.
In particular, I have a column 'Structure' with several pymatgen.core.Structure objects.
I would like to save/store this dataset as a .csv file or something similar, but the problem is that after doing that and reopening the file, the pymatgen structures lose their type and become plain formatted strings, and I cannot get back to the original pymatgen.core.Structure data type.
Any hints on how to do that? I'm searching the pymatgen documentation but haven't been lucky so far.
Thanks in advance!
From the docs:
Side-note : as_dict / from_dict
As you explore the code, you may
notice that many of the objects have an as_dict method and a from_dict
static method implemented. For most of the non-basic objects, we have
designed pymatgen such that it is easy to save objects for subsequent
use. While python does provide pickling functionality, pickle tends to
be extremely fragile with respect to code changes. Pymatgen’s as_dict
provide a means to save your work in a more robust manner, which also
has the added benefit of being more readable. The dict representation
is also particularly useful for entering such objects into certain
databases, such as MongoDb. This as_dict specification is provided in
the monty library, which is a general python supplementary library
arising from pymatgen.
The output from an as_dict method is always json/yaml serializable. So
if you want to save a structure, you may do the following:
import json

with open('structure.json', 'w') as f:
    json.dump(structure.as_dict(), f)
Similarly, to get the structure back from JSON, you can restore the structure (or any object with an as_dict method) as follows:
from pymatgen.core import Structure

with open('structure.json', 'r') as f:
    d = json.load(f)
    structure = Structure.from_dict(d)
You may replace any of the above json commands with yaml in the PyYAML package to create a yaml
file instead. There are certain tradeoffs between the two choices.
JSON is much more efficient as a format, with extremely fast
read/write speed, but is much less readable. YAML is an order of
magnitude or more slower in terms of parsing, but is more human
readable.
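For example, the YAML variant of the same round trip might look like this (a sketch using PyYAML's safe_dump/safe_load):

import yaml

with open('structure.yaml', 'w') as f:
    yaml.safe_dump(structure.as_dict(), f)

with open('structure.yaml', 'r') as f:
    structure = Structure.from_dict(yaml.safe_load(f))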
See also https://pymatgen.org/usage.html#montyencoder-decoder and https://pymatgen.org/usage.html#reading-and-writing-structures-molecules
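For the original CSV use case, one option (a sketch using only the standard csv and json modules; the file and column names are made up) is to store each structure's as_dict output as a JSON string in the 'Structure' column and rebuild the objects when reading:

import csv
import json

from pymatgen.core import Structure

# writing: one JSON string per structure
with open('dataset.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'Structure'])
    for name, structure in my_structures:   # my_structures: hypothetical (name, Structure) pairs
        writer.writerow([name, json.dumps(structure.as_dict())])

# reading: turn the JSON strings back into Structure objects
with open('dataset.csv', newline='') as f:
    reader = csv.DictReader(f)
    restored = [(row['name'], Structure.from_dict(json.loads(row['Structure'])))
                for row in reader]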
A pymatgen.core.Structure object can only be stored in certain fixed formats, for example CIF, VASP or XYZ, so you may first need to write your structure information out as CIF or VASP, then open that file and preprocess it into CSV form with Python string handling.
I am implementing a parser in Python 2.7 for an XML-based file format. The format defines classes with fields of specific types.
What I want to do is define objects like this:
class DataType1(SomeCoolParent):
    a = Integer()
    b = Float()
    c = String()
    # and so on...
The resulting classes need to be able to be constructed from an XML document, and render back into XML (so that I can save and load files).
This looks a whole lot like the ORM libraries in SQLAlchemy or Django... except that there is no database, just XML. It's not clear to me how I would implement what I'm after with those libraries. I would basically want to rip out their backend database stuff and have my own conversion to XML and back again.
My current plan: set up a metaclass and a parent class (SomeCoolParent), define all the types (Integer, Float, String), and generally build a custom almost-ORM myself. I know how to do this, but it seems that someone else must have done/thought of this before, and maybe come up with a general-purpose solution...
Is there a "better" way to implement a hierarchy of fixed type structures with python objects, which allows or at least facilitates conversion to XML?
Caveats/Notes
As mentioned, I'm aware of SQLAlchemy and Django, but I'm not clear on how/whether I can do what I want with their ORM libraries. Answers that show how to do this with either of them definitely count.
I'm definitely open to other methods, those two are just the ORM libraries I am aware of.
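For concreteness, here is a rough sketch of the metaclass/descriptor plan described above, written for Python 2.7 using only xml.etree.ElementTree (not a finished framework, just the shape of the idea):

import xml.etree.ElementTree as ET

class Field(object):
    """Descriptor that converts XML text into a typed Python value."""
    def __init__(self, convert):
        self.convert = convert
        self.name = None              # filled in by the metaclass

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = self.convert(value)

class Integer(Field):
    def __init__(self):
        super(Integer, self).__init__(int)

class Float(Field):
    def __init__(self):
        super(Float, self).__init__(float)

class String(Field):
    def __init__(self):
        super(String, self).__init__(str)

class ParentMeta(type):
    """Collects the Field descriptors so each class knows its own schema."""
    def __new__(mcs, name, bases, attrs):
        fields = {}
        for key, value in attrs.items():
            if isinstance(value, Field):
                value.name = key
                fields[key] = value
        attrs['_fields'] = fields
        return super(ParentMeta, mcs).__new__(mcs, name, bases, attrs)

class SomeCoolParent(object):
    __metaclass__ = ParentMeta        # Python 2.7 metaclass syntax

    def to_xml(self):
        root = ET.Element(type(self).__name__)
        for field_name in self._fields:
            ET.SubElement(root, field_name).text = str(getattr(self, field_name))
        return root

    @classmethod
    def from_xml(cls, root):
        obj = cls()
        for field_name in cls._fields:
            setattr(obj, field_name, root.findtext(field_name))
        return obj

class DataType1(SomeCoolParent):
    a = Integer()
    b = Float()
    c = String()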
There are several XML Schema specifications out there. My knowledge is somewhat dated, but XML Schema (W3C), a.k.a. XSD, and RELAX NG are the most popular, with RELAX NG having a lower learning curve. With that in mind, a good ORM (or would that be OXM, 'Object-XML Mapper'?) using these data-description formats ought to be out there. Some candidates are:
pyxb
generateDS
Stronger google foo may yield more options.
I have an object, an in-memory data provider, which contains a lot of data.
The dilemma I'm facing is how to get other objects to use this data without
passing them the data provider. I would like to keep the objects and methods that use this data as decoupled from the data provider as possible. But I'm also not able to simply extract the needed data and pass it to those objects. I was thinking of creating some helper classes with very restricted functionality that wrap the data provider, and passing those around instead. Is this a good idea, or does anyone know better?
Refactoring the methods, so the needed data can be extracted from the data provider, before passing it to the methods is actually an option, but it would require a lot of work.
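A rough sketch of the "restricted wrapper" idea from the question (all names here are hypothetical, including the provider's get method):

class CustomerView(object):
    """Exposes only the small slice of the data provider that callers need."""

    def __init__(self, provider):
        self._provider = provider          # the big in-memory data provider

    def customer_name(self, customer_id):
        return self._provider.get('customers', customer_id)['name']

    def open_orders(self, customer_id):
        return self._provider.get('orders', customer_id, status='open')

# consumers would then depend only on the narrow view, e.g.
#     report = build_report(CustomerView(provider))
# rather than on the data provider itself.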
I'm developing a framework of sorts. I'm providing a base class, that will be subclassed by other developers to add behavior to the system. The instances of those classes will have attributes that my framework doesn't necessarily expect, except by inspecting those instances' __dict__. To make things even more interesting, some of those classes can be created dynamically, at any time.
I'd like some things to be handled by the framework, namely, I will need to persist those instances, display their attribute values to the user, and let her search/filter instances using those values.
I have to use a relational database. I know there are some decent python OO database out there, but unfortunately they're not an option in this case.
I'm not looking for a full-blown ORM either... and it may not even be an option, given that some of the classes can be created dynamically.
So, my question is, what state of a python instance do I need to serialize to ensure that I can deserialize it later on? Is it enough to look at __dict__, or are there other private attributes that I should be using?
Pickling the instances is not enough, because I'll need to unpickle them to search/filter the attribute values, and I'm afraid it's too much data to do it in-memory (instead of letting the database do it).
Just use an ORM. This is what they are for.
What you are proposing to do is create your own half-assed ORM on your own time. Save your time for your own code that does things, and use the effort other people put for free into solving this problem for you.
Note that all class creation in python is "dynamic" - this is not an issue, for, well, anything at all. In fact, if you are assembling classes programmatically, it is probably slightly easier with an ORM, because they provide reifications of fields.
In the worst case, if you really do need to store your objects in a fake NoSQL-type schema, you will still only have to write your own backend driver if you use an existing ORM, rather than coding the whole stack yourself. (As it happens, you're not the first person to face this; solutions exist. Google "python orm store dynamically created models" and "sqlalchemy store dynamically created models".)
Candidates include:
Django ORM
SQLAlchemy
Some others you can find by googling "Python ORM".
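As a hedged illustration of the "store dynamically created models" point above, assembling a mapped class at runtime with SQLAlchemy's declarative API might look roughly like this (SQLAlchemy 1.4+; the table and field names are made up):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

def make_model(name, extra_fields):
    """Build a mapped class at runtime from a dict of Column definitions."""
    attrs = {'__tablename__': name.lower(),
             'id': Column(Integer, primary_key=True)}
    attrs.update(extra_fields)
    return type(name, (Base,), attrs)

Person = make_model('Person', {'name': Column(String(50)),
                               'age': Column(Integer)})

engine = create_engine('sqlite://')        # in-memory database for the example
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Person(name='Ada', age=36))
    session.commit()
    print(session.query(Person).filter_by(name='Ada').one().age)

The point is that the ORM gives you Column objects as reified fields, so classes built with type() at runtime can still be stored, queried and filtered by the database.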
Surfing the web and reading about Django dev best practices, the advice is to use pickled model fields with extreme caution.
But in a real-life example, where would you use a PickledObjectField, and what specific problems does it solve?
We have a system of social-network "backends" which do some generic stuff like "post message", "get status", "get friends", etc. The link between each backend class and a user is a Django model, which stores the user, the backend name and the credentials. Now imagine how many auth systems there are: OAuth, plain passwords, Facebook's obscure JS stuff, etc. This is where JSONField shines: we keep all backend-specific auth data in a dictionary on this model, which is stored in the db as JSON, and we can put anything into it, no problem.
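A minimal sketch of that kind of model (assuming Django 3.1+, where models.JSONField is built in; the model and field names are made up):

from django.conf import settings
from django.db import models

class BackendLink(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    backend_name = models.CharField(max_length=64)
    # arbitrary per-backend auth data: OAuth tokens, passwords, Facebook JS blobs, ...
    credentials = models.JSONField(default=dict)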
You would use it to store... almost-arbitrary Python objects. In general there's little reason to use it; JSON is safer and more portable.
You can definitely substitute a PickledObjectField with JSON and some extra logic to create an object out of the JSON. At the end of the day, your use case, when deciding between a PickledObjectField and JSON+logic, is serializing a Python object into your database. If you can trust the data in the Python object, and know that it will always be serializable, you can reasonably use the PickledObjectField.
In my case (I don't use Django's ORM, but this should still apply), I have a couple of different object types that can go into my PickledObjectField, and their definitions are constantly mutating. Rather than constantly updating my JSON parsing logic to create an object out of JSON values, I simply use a PickledObjectField to store the different objects, and later retrieve them in perfectly usable form (calling their functions).
Caveat: if you store an object via PickledObjectField, then change the object definition, and then retrieve the object, the old object may have trouble fitting into the new object's definition (depending on what you changed).
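For reference, a small sketch of that pattern (assuming the third-party django-picklefield package, which provides PickledObjectField; the model is made up):

from django.db import models
from picklefield.fields import PickledObjectField   # from django-picklefield

class StoredThing(models.Model):
    label = models.CharField(max_length=64)
    payload = PickledObjectField()                   # any picklable Python object

thing = StoredThing(label='example', payload={'anything': [1, 2, 3]})
thing.save()
thing.refresh_from_db()
# thing.payload comes back as a real dict, not a string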
The problems to be solved are the efficiency and the convenience of defining and handling a complex object consisting of many parts.
You can turn each part type into a Model and connect them via ForeignKeys.
Or you can turn each part type into a class, dictionary, list, tuple, enum or whathaveyou to your liking and use PickledObjectField to store and retrieve the whole beast in one step.
That approach makes sense if you will never manipulate parts individually, only the complex object as a whole.
Real life example
In my application there are RQdef objects that represent essentially a type with a certain basic structure (if you are curious what they mean, look here).
RQdefs consist of several Aspects and some fixed attributes.
Aspects consist of one or more Facets and some fixed attributes.
Facets consist of two or more Levels and some fixed attributes.
Levels consist of a few fixed attributes.
Overall, a typical RQdef will have about 20-40 parts.
An RQdef is always completely constructed in a single step before it is stored in the database and it is henceforth never modified, only read (but read frequently).
PickledObjectField is more convenient and much more efficient for this purpose than would be a set of four models and 20-40 objects for each RQdef.