Serializing a Python object to Python source code

I have a Python dictionary that I'd like to serialize into the Python source code required to initialize that dictionary's keys and values.
I'm looking for something like json.dumps(), but the output format should be Python not JSON.
My object is very simple: it's a dictionary whose values are one of the following:
built-in type literals (strings, ints, etc.)
lists of literals
I know I can build one myself, but I suspect there are corner cases involving nested objects, keyword escaping, etc. so I'd prefer to use an existing library with those kinks already worked out.

In the most general case, it's not possible to dump an arbitrary Python object into Python source code. E.g. if the object is a socket, recreating the very same socket in a new object cannot work.
As aix explains, for the simple case, repr is explicitly designed to produce reproducible source representations. As a slight generalization, the pprint module allows selective customization through the PrettyPrinter class.
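For the simple case, here is a minimal sketch of what that looks like. The dictionary `config` is made up for illustration, and ast.literal_eval is used only to check the round trip without executing anything:

```python
import ast
import pprint

# Example data (hypothetical): only built-in literals and lists of literals.
config = {
    "name": "it's \"quoted\"",
    "ports": [80, 443],
    "ratio": 0.5,
    "enabled": True,
    "nothing": None,
}

# repr() of a dict of literals is valid Python source for an equal dict.
source_line = "config = " + repr(config)

# pprint.pformat() produces the same kind of output, wrapped over several lines.
pretty_source = "config = " + pprint.pformat(config, width=40)

# Parse the generated literals back to confirm they describe the same dict.
restored = ast.literal_eval(repr(config))
restored_pretty = ast.literal_eval(pprint.pformat(config, width=40))
```

This only holds while every value is a plain literal. Floats such as nan or inf, for instance, have a repr that is not a valid literal.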
If you want something even more general, and your only requirement is that you get executable Python source code, I recommend pickling the object into a string and then generating the source code
obj = pickle.loads(%s)
where %s is replaced by repr(pickle.dumps(obj)).
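A rough sketch of the pickle approach, assuming Python 3 (where pickle.dumps returns bytes, so the embedded literal is a bytes literal). The helper name `to_python_source` and the sample data are hypothetical:

```python
import pickle

def to_python_source(obj, name="obj"):
    """Return Python source that rebuilds obj by unpickling an embedded bytes literal."""
    payload = pickle.dumps(obj)
    # {!r} inserts repr(payload), i.e. a b'...' literal with all escapes handled.
    return "import pickle\n{} = pickle.loads({!r})\n".format(name, payload)

# Sample data (hypothetical) containing types that JSON could not represent directly.
data = {"when": (2011, 3, 14), "tags": {"a", "b"}, "nested": {"k": [1, 2.5, None]}}
source = to_python_source(data, name="data")

# Execute the generated source in a fresh namespace to rebuild the object.
namespace = {}
exec(source, namespace)
```

The generated source is executable but not readable, and it carries pickle's usual caveat: only run it if you trust where it came from.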

repr(d) where d is your dictionary could be a start (doesn't address all the issues that you mention though).

You can use yaml
http://pyyaml.org/
YAML doesn't serialise EXACTLY to python source code. But the yaml module can directly serialise and deserialise python objects. It's a superset of JSON.
What exactly are you trying to do?

Related

Saving/Storing pymatgen Structures

I'm currently dealing with a material science dataset having various information.
In particular, I have a column 'Structure' with several pymatgen.core.Structure objects.
I would like to save this dataset as a .csv file or something similar. The problem is that after saving and reopening it, the pymatgen structures lose their type: they become plain formatted strings, and I cannot get them back to their original pymatgen.core.Structure data type.
Any hints on how to do that? I've been searching the pymatgen documentation but haven't had any luck so far.
Thanks in advance!
From the docs:
Side-note : as_dict / from_dict
As you explore the code, you may
notice that many of the objects have an as_dict method and a from_dict
static method implemented. For most of the non-basic objects, we have
designed pymatgen such that it is easy to save objects for subsequent
use. While python does provide pickling functionality, pickle tends to
be extremely fragile with respect to code changes. Pymatgen’s as_dict
provide a means to save your work in a more robust manner, which also
has the added benefit of being more readable. The dict representation
is also particularly useful for entering such objects into certain
databases, such as MongoDb. This as_dict specification is provided in
the monty library, which is a general python supplementary library
arising from pymatgen.
The output from an as_dict method is always json/yaml serializable. So
if you want to save a structure, you may do the following:
import json

with open('structure.json', 'w') as f:
    json.dump(structure.as_dict(), f)
Similarly, to restore the structure (or any object with an as_dict method) from the JSON file, you can do the following:
import json
from pymatgen.core import Structure

with open('structure.json', 'r') as f:
    d = json.load(f)
    structure = Structure.from_dict(d)
You may replace any of the above json commands with yaml in the PyYAML package to create a yaml
file instead. There are certain tradeoffs between the two choices.
JSON is much more efficient as a format, with extremely fast
read/write speed, but is much less readable. YAML is an order of
magnitude or more slower in terms of parsing, but is more human
readable.
See also https://pymatgen.org/usage.html#montyencoder-decoder and https://pymatgen.org/usage.html#reading-and-writing-structures-molecules
A pymatgen.core.Structure object can also be stored in a fixed file format such as cif, vasp, or xyz. So you could first write your structure information to a cif or vasp file, then open that file and preprocess it into "csv" form with Python's string handling.
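To show how the as_dict / from_dict convention applies to a whole dataset rather than a single file, here is a sketch that runs without pymatgen installed. `Cell` is a hypothetical stand-in for any class that follows the convention, and `dump_rows` / `load_rows` and the sample rows are made up for illustration:

```python
import json

class Cell:
    """Hypothetical stand-in for an object with as_dict/from_dict, such as a pymatgen Structure."""

    def __init__(self, species, coords):
        self.species = list(species)
        self.coords = [list(c) for c in coords]

    def as_dict(self):
        return {"species": self.species, "coords": self.coords}

    @classmethod
    def from_dict(cls, d):
        return cls(d["species"], d["coords"])

# A tiny dataset: one object-valued column next to plain columns.
rows = [
    {"material_id": "m-1", "Structure": Cell(["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]])},
    {"material_id": "m-2", "Structure": Cell(["Si"], [[0, 0, 0]])},
]

def dump_rows(rows, column="Structure"):
    """Replace each object in `column` by its dict form, then serialize to JSON text."""
    return json.dumps([{**row, column: row[column].as_dict()} for row in rows])

def load_rows(text, cls, column="Structure"):
    """Parse JSON text and rebuild the objects in `column` with cls.from_dict."""
    return [{**row, column: cls.from_dict(row[column])} for row in json.loads(text)]

text = dump_rows(rows)
restored = load_rows(text, Cell)
```

With pymatgen, passing Structure in place of Cell should work the same way, since the docs quoted above say as_dict output is always JSON serializable. The point is that the column holds dicts on disk and is converted back on load, instead of being flattened to strings by a CSV writer.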

Is there an alternative to pickle - save a dictionary (python)

I need to save a dictionary to a file. The dictionary contains strings, integers, and dictionaries.
I wrote my own code for it, but it's neither pretty nor user-friendly.
I know about pickle, but as far as I know it isn't safe to use: if someone replaces the file and I (or someone else) run the program that loads it, the replaced file will be executed and might do harmful things.
Is there another function or module that does this?
Pickle is not safe when the data is transferred by an untrusted third party. Local files are just fine, and if something can replace files on your filesystem then you have a different problem.
That said, if your dictionary contains nothing but string keys and the values are nothing but Python lists, numbers, strings or other dictionaries, then use JSON, via the json module.
Presuming your dictionary contains only basic data types, the usual answer is JSON: it's a popular, well-defined format for this kind of thing.
If your dictionary contains more complex data, you will have to manually serialise it at least part of the way.
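For a dictionary of strings, integers, and nested dictionaries, a minimal round trip with the json module might look like this. The file name and the contents of `settings` are made up for illustration:

```python
import json
import os
import tempfile

# Example data (hypothetical): string keys, basic values, nested dict and list.
settings = {"user": "alice", "retries": 3, "limits": {"cpu": 0.5, "tags": ["a", "b"]}}

# Write to a scratch directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "settings.json")

with open(path, "w", encoding="utf-8") as f:
    json.dump(settings, f, indent=2)  # indent makes the file easy to read and edit

with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)  # parsing JSON only builds data, it never runs code
```

Unlike unpickling, loading a tampered JSON file can give you wrong data but cannot execute anything.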
JSON is not quite the Python way, for several reasons:
It can't wrap/unwrap all Python data types: sets are not supported, and tuples come back as lists.
It is not as fast, because it has to deal with textual data and encodings.
Try sPickle instead.
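The type limitations mentioned above are easy to see with the standard json module. A short sketch (the variable names are arbitrary):

```python
import json

# Tuples are written as JSON arrays, so they come back as lists.
as_json = json.loads(json.dumps({"point": (1, 2)}))

# Sets have no JSON counterpart and are rejected outright.
try:
    json.dumps({"seen": {1, 2}})
    set_error = None
except TypeError as exc:
    set_error = exc

# Non-string keys such as ints are converted to strings on the way out.
int_keys = json.loads(json.dumps({1: "one"}))
```

Here `as_json` holds a list under "point", `set_error` holds the TypeError, and the only key of `int_keys` is the string "1".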

Does PyMongo automatically BSON.encode all strings you insert?

I've heard people say that PyMongo automatically uses BSON format for everything you insert in the database. Is this true? Or do I still need to run BSON.encode manually?
The drivers will handle marshaling python builtin objects to their bson counterpart as part of the intermediate layer between you and the database. Ultimately, the data stored in mongodb is bson.
datetime objects will be saved properly, as will numerics, strings, lists. You do not need to specifically serialize them. A document object is a dictionary.
The only reason for manual encoding is when you want to give custom classes the ability to be stored, without having to break them down into builtin types. It's very much like any other serialization format (pickle, json, ...). They usually handle the built-ins fine, but need extra help for custom types.

Python CJSON encoding custom objects

I am updating an old project that used an old version of cjson to speed up its json encoding. It also has a custom class called JSONString (which sets a string to its 'value' property) that is used for communicating with the database.
It used to call cjson.encode((dict containing a JSONString), (custom encoding funct for JSONSTRING)) but the newer version of cjson has changed its parameters to only accepting one argument, and not exposing any other functions that could allow customizations of the encoding process. Encoding the dict without the custom encoder throws an EncodeError (object is not JSON encodable).
The options I have now are to either find out how to use custom encoders in cjson, modify the cjson source (trying to avoid patching libraries), or make it so the JSONString type inserted into the dict is converted to a string before the operation, but I am trying to avoid placing 'fixes' all over the code (compartmentalization and re-usability and all that). Modifying JSONString in some way so that the encoder takes the string value of it instead of throwing an exception would work too, but I don't know enough of python's quirks to do this. I can understand why cjson might not allow custom encoders (speed reasons) but if there is no way I might just have to find something else.
Any suggestions would be greatly appreciated.
I was looking through my unanswered posts and remembered I never marked this as answered. Yavar's post did help; there is an enhanced version of cjson for Python. It works well but has some interesting name collisions at times, so be aware of that.
http://python.cx.hu/python-cjson/
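If "finding something else" is acceptable, the standard library's json module (not cjson) supports this kind of hook through the default argument of json.dumps. A sketch, where the JSONString class is a guess at the wrapper described in the question and `encode_json_string` is a hypothetical name:

```python
import json

class JSONString:
    """Hypothetical reconstruction of the question's wrapper: stores a string in `value`."""

    def __init__(self, value):
        self.value = value

def encode_json_string(obj):
    """Called by json.dumps only for objects it cannot encode on its own."""
    if isinstance(obj, JSONString):
        return obj.value
    raise TypeError("Object of type {} is not JSON serializable".format(type(obj).__name__))

payload = {"id": 7, "body": JSONString("hello")}
encoded = json.dumps(payload, default=encode_json_string)
```

This keeps the conversion in one place instead of scattering fixes through the code, though the standard json module may be slower than cjson, which was the reason for using cjson in the first place.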

Database-backed dictionary with arbitrary keys

The Python shelve module only seems to allow string keys. Is there a way to use arbitrarily typed (e.g. numeric) keys? Maybe an sqlite-backed dictionary?
Thanks!
Why not convert your keys to strings? Numeric keys should be pretty easy to do this with.
You can serialize on the fly (via pickle or cPickle, like shelve.py does) every key, as well as every value. It's not really worth subclassing shelve.Shelf since you'd have to subclass almost every method -- for once, I'd instead recommend copying shelve.py into your own module and editing it to suit. That's basically like coding your new module from scratch but you get a working example to show you the structure and guidelines;-).
sqlite has no real advantage in a sufficiently general case (where the keys could be e.g. arbitrary tuples, of different arity and types for every entry) -- you're going to have to serialize the keys anyway to make them homogeneous. Still, nothing stops you from using sqlite, e.g. to keep several "generalized shelves" into a single file (different tables of the same sqlite DB) -- if you care about performance you should measure it each way, though.
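As a rough illustration of serializing every key as well as every value, here is a small sqlite-backed mapping. The class name `SqliteDict`, the table name, and the choice of pickle protocol 4 are assumptions of this sketch, not anything shelve.py prescribes:

```python
import pickle
import sqlite3
from collections.abc import MutableMapping

def _dumps(obj):
    # A fixed protocol keeps the stored bytes stable across Python versions.
    return pickle.dumps(obj, protocol=4)

class SqliteDict(MutableMapping):
    """Hypothetical minimal mapping that stores pickled keys and values in one table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS shelf (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )

    def __getitem__(self, key):
        row = self._db.execute(
            "SELECT value FROM shelf WHERE key = ?", (_dumps(key),)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (_dumps(key), _dumps(value)),
        )
        self._db.commit()

    def __delitem__(self, key):
        cur = self._db.execute("DELETE FROM shelf WHERE key = ?", (_dumps(key),))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._db.commit()

    def __iter__(self):
        for (blob,) in self._db.execute("SELECT key FROM shelf"):
            yield pickle.loads(blob)

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

store = SqliteDict()
store[42] = "int key"
store[("x", 3)] = {"tuple": "key"}
store[2.5] = [1, 2, 3]
```

One caveat: keys are matched by their pickled bytes, not by Python equality, so 1, 1.0 and True are three different keys here, and equal compound keys are not guaranteed to pickle to identical bytes in every case.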
I think you want to overload the [] operator. You can do it by defining the __getitem__ method.
I ended up subclassing the DbfilenameShelf from the shelve-module. I made a shelf which automatically converts non-string-keys into string-keys and returns them in original form when queried. It works well for Python's standard immutable objects: int, float, string, tuple, boolean.
It can be found in: https://github.com/North-Guard/simple_shelve
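The linked code is not reproduced here, but the key-conversion idea can be sketched as follows. This version wraps a shelf instead of subclassing DbfilenameShelf, because several Shelf methods encode the key themselves. The class name `AnyKeyShelf` and the use of repr / ast.literal_eval for the conversion are assumptions of this sketch:

```python
import ast
import os
import shelve
import tempfile
from collections.abc import MutableMapping

class AnyKeyShelf(MutableMapping):
    """Hypothetical wrapper that lets a shelf take int, float, str, bool and tuple keys."""

    def __init__(self, filename, flag="c"):
        self._shelf = shelve.open(filename, flag)

    def __getitem__(self, key):
        # repr() turns the key into a string the underlying shelf accepts.
        return self._shelf[repr(key)]

    def __setitem__(self, key, value):
        self._shelf[repr(key)] = value

    def __delitem__(self, key):
        del self._shelf[repr(key)]

    def __iter__(self):
        # literal_eval() turns the stored string back into the original key.
        return (ast.literal_eval(text) for text in self._shelf)

    def __len__(self):
        return len(self._shelf)

    def close(self):
        self._shelf.close()

# Use a scratch directory so the example leaves no files in the working directory.
path = os.path.join(tempfile.mkdtemp(), "anykey")
shelf = AnyKeyShelf(path)
shelf[7] = "seven"
shelf[(1, "a")] = [1, 2]
shelf["7"] = "string seven"
```

Because keys are compared by their repr, 7 and "7" stay distinct, but so do 1 and 1.0, unlike in a regular dict.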
