Is there an alternative to pickle for saving a dictionary? (Python)

I need to save a dictionary to a file. The dictionary contains strings, integers, and other dictionaries.
I wrote the serialization myself, and it's neither pretty nor user-friendly.
I know about pickle, but as I understand it, it is not safe: if someone replaces the file and I (or someone else) run the program that loads it, the replaced file could execute arbitrary code. It's just not safe.
Is there another function or module that does this?

Pickle is not safe when transferred by an untrusted third party. Local files are just fine, and if something can replace files on your filesystem, then you have a bigger problem anyway.
That said, if your dictionary contains nothing but string keys and the values are nothing but Python lists, numbers, strings or other dictionaries, then use JSON, via the json module.
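A minimal sketch of the round trip (the file name and sample data are just for illustration):

import json

data = {"name": "spam", "count": 3, "nested": {"a": [1, 2, 3]}}

# Write the dictionary to a file as plain JSON text.
with open("data.json", "w") as f:
    json.dump(data, f)

# Read it back; the result is an ordinary dict again, and nothing
# in the file can execute code the way a tampered pickle could.
with open("data.json") as f:
    restored = json.load(f)

assert restored == data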

Presuming your dictionary contains only basic data types, the normal answer is JSON: it's a popular, well-defined format for this kind of thing.
If your dictionary contains more complex data, you will have to serialise it manually, at least part of the way (see the sketch below).
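One common way to do that part-way serialisation by hand is the json module's default / object_hook pair; the Point class here is purely hypothetical, standing in for "more complex data":

import json

class Point:
    # Hypothetical non-JSON-serialisable type.
    def __init__(self, x, y):
        self.x, self.y = x, y

def encode(obj):
    # Called by json.dumps for any object it cannot handle natively.
    if isinstance(obj, Point):
        return {"__point__": True, "x": obj.x, "y": obj.y}
    raise TypeError("Cannot serialise %s" % type(obj).__name__)

def decode(d):
    # Called by json.loads for every decoded JSON object (dict).
    if d.get("__point__"):
        return Point(d["x"], d["y"])
    return d

text = json.dumps({"origin": Point(0, 0)}, default=encode)
restored = json.loads(text, object_hook=decode)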

JSON is not quite the Python way, for several reasons:
It can't wrap/unwrap all Python data types: there's no support for sets or tuples (tuples silently come back as lists).
It is not as fast, because it has to deal with textual data and encodings.
Try sPickle instead.
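To see the type-fidelity problem concretely:

import json

# Tuples silently become lists on the way through JSON:
print(json.loads(json.dumps({"ids": (1, 2, 3)})))  # {'ids': [1, 2, 3]}

# Sets are rejected outright:
try:
    json.dumps({1, 2, 3})
except TypeError as e:
    print(e)  # Object of type set is not JSON serializable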

Related

Saving/Storing pymatgen Structures

I'm currently dealing with a material science dataset having various information.
In particular, I have a column 'Structure' with several pymatgen.core.Structure objects.
I would like to save/store this dataset as .csv file or something similar but the problem is that after having done that and reopening, the pymatgen structures lose their type becoming just formatted strings and I cannot get back to their initial pymatgen.core.Structure data type.
Any hints on how to that? I'm searching on pymatgen documentation but haven't been lucky for now..
Thanks in advance!
From the docs:
Side-note: as_dict / from_dict
As you explore the code, you may
notice that many of the objects have an as_dict method and a from_dict
static method implemented. For most of the non-basic objects, we have
designed pymatgen such that it is easy to save objects for subsequent
use. While python does provide pickling functionality, pickle tends to
be extremely fragile with respect to code changes. Pymatgen's as_dict
provides a means to save your work in a more robust manner, which also
has the added benefit of being more readable. The dict representation
is also particularly useful for entering such objects into certain
databases, such as MongoDb. This as_dict specification is provided in
the monty library, which is a general python supplementary library
arising from pymatgen.
The output from an as_dict method is always json/yaml serializable. So
if you want to save a structure, you may do the following:
import json

with open('structure.json', 'w') as f:
    json.dump(structure.as_dict(), f)
Similarly, you can restore the structure (or any object with an as_dict method) from the json file as follows:
import json
from pymatgen.core import Structure

with open('structure.json', 'r') as f:
    d = json.load(f)
structure = Structure.from_dict(d)
You may replace any of the above json commands with yaml in the PyYAML package to create a yaml
file instead. There are certain tradeoffs between the two choices.
JSON is much more efficient as a format, with extremely fast
read/write speed, but is much less readable. YAML is an order of
magnitude or more slower in terms of parsing, but is more human
readable.
See also https://pymatgen.org/usage.html#montyencoder-decoder and https://pymatgen.org/usage.html#reading-and-writing-structures-molecules
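Putting the two snippets above together for a whole column: a minimal sketch, assuming your dataset is a pandas DataFrame df with the 'Structure' column described in the question:

import json
import pandas as pd
from pymatgen.core import Structure

# Serialise each Structure to a JSON string before writing the CSV...
df["Structure"] = df["Structure"].apply(lambda s: json.dumps(s.as_dict()))
df.to_csv("dataset.csv", index=False)

# ...and rebuild the objects after reading the CSV back.
df = pd.read_csv("dataset.csv")
df["Structure"] = df["Structure"].apply(lambda s: Structure.from_dict(json.loads(s)))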
A pymatgen.core.Structure object can also be written in one of the fixed text formats pymatgen supports, for example CIF, VASP, or XYZ. So you could first write your structure information to a CIF or VASP file, then open that file and preprocess it into CSV form using Python's string-handling functions.

Is using strings as an object identifier bad practice?

I am developing a small app for managing my favourite recipes. I have two classes, Ingredient and Recipe. A Recipe consists of Ingredients and some additional data (preparation, etc.). The reason I have an Ingredient class is that I want to save some additional info in it (proper technique, etc.). Ingredients are unique, so there cannot be two with the same name.
Currently I am holding all ingredients in a "big" dictionary, using the name of the ingredient as the key. This is useful, as I can ask my model whether an ingredient is already registered and use it (including all its other data) for a newly created recipe.
But thinking back to when I started programming (Java/C++), I always read that using strings as an identifier is bad practice. "The Magic String" was a phrase I often read (but I think that describes another problem). I really like the string approach as it is right now. I don't have problems with encoding either, because all string generation/comparison is done within my program (Python 3 uses UTF-8 everywhere, if I am not mistaken), but I am not sure whether what I am doing is the right way to do it.
Is using strings as an object identifier bad practice? Are there differences between languages? Can strings become a performance issue as the amount of data increases? What are the alternatives?
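For reference, the current approach looks roughly like this (simplified):

class Ingredient:
    def __init__(self, name, technique=""):
        self.name = name
        self.technique = technique  # the extra info mentioned above

ingredients = {}  # the "big" dictionary, keyed by ingredient name

def register(ingredient):
    # Names are unique, so the name works naturally as the key.
    ingredients[ingredient.name] = ingredient

register(Ingredient("garlic", technique="crush, don't chop"))

# Reuse an already-registered ingredient in a new recipe:
if "garlic" in ingredients:
    garlic = ingredients["garlic"]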
No:
identifiers in Python are actually always strings, whether you keep them in a dictionary yourself (you say you are using a "big dictionary") or the object is used programmatically, with a name hard-coded into the source code. In the latter case, Python creates the name in one of its automatically handled internal dictionaries (which can be inspected as the return value of globals() or locals()).
Moreover, Python does not use "utf-8" internally; it uses "unicode", which means it is simply text, and you should not worry about how that text is represented in actual bytes.
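You can see this directly in the interpreter:

x = 42  # creates the key 'x' in the module's namespace dictionary

print("x" in globals())  # True: the identifier is literally a string key
print(globals()["x"])    # 42: the same lookup your own dictionary would do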
Python relies on dictionaries for many of its core features. For that reason, the default dict already comes with a quite effective, fast implementation out of the box: decent hashing and so on.
Considering that, the performance of the dictionary itself should not be a concern for what you need (occasional reads and writes), although the way you store it (in a Python file, JSON, pickle, gzip, etc.) could affect load and access time.
Maybe if you provide a few lines of code showing how you deal with the dictionary, we could give more specific advice.
About the string identifiers, see jsbueno's answer; he gave a much better explanation than I could.

Serializing python object to python source code

I have a python dictionary that I'd like to serialize into python source code required to initialize that dictionary's keys and values.
I'm looking for something like json.dumps(), but the output format should be Python not JSON.
My object is very simple: it's a dictionary whose values are one of the following:
built-in type literals (strings, ints, etc.)
lists of literals
I know I can build one myself, but I suspect there are corner cases involving nested objects, keyword escaping, etc. so I'd prefer to use an existing library with those kinks already worked out.
In the most general case, it's not possible to dump an arbitrary Python object into Python source code. E.g. if the object is a socket, recreating the very same socket in a new object cannot work.
As aix explains, for the simple case, repr is explicitly designed to produce reproducible source representations. As a slight generalization, the pprint module allows selective customization through the PrettyPrinter class.
If you want it even more general, and your only requirement is that you get executable Python source code, I recommend pickling the object into a string and then generating the source code

obj = pickle.loads(%s)

where %s gets substituted with repr(pickle.dumps(obj)).
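A minimal runnable version of that trick (note that the generated source calls pickle.loads when executed, so it carries pickle's usual trust caveats):

import pickle

obj = {"name": "spam", "count": 3}
source = "import pickle\nobj = pickle.loads(%s)" % repr(pickle.dumps(obj))

namespace = {}
exec(source, namespace)  # run the generated Python source
assert namespace["obj"] == obj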
repr(d), where d is your dictionary, could be a start (it doesn't address all the issues you mention, though).
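For data made only of literals, repr output can be evaluated again safely with ast.literal_eval; pprint.pformat does the same job with nicer formatting for large dictionaries:

import ast
import pprint

d = {"name": "spam", "counts": [1, 2, 3]}

source = pprint.pformat(d)  # valid Python literal source for simple data
restored = ast.literal_eval(source)  # evaluates literals only, unlike eval()
assert restored == d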
You can use YAML: http://pyyaml.org/
YAML doesn't serialise exactly to Python source code, but the yaml module can directly serialise and deserialise Python objects, and it's a superset of JSON.
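For plain data the round trip is short (assuming PyYAML is installed):

import yaml  # pip install pyyaml

data = {"name": "spam", "count": 3}

text = yaml.safe_dump(data)      # human-readable YAML text
restored = yaml.safe_load(text)  # the safe_* variants refuse to build
                                 # arbitrary Python objects from the file
assert restored == data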
What exactly are you trying to do?

Database-backed dictionary with arbitrary keys

The Python shelve module only seems to allow string keys. Is there a way to use arbitrarily typed (e.g. numeric) keys? Maybe an sqlite-backed dictionary?
Thanks!
Why not convert your keys to strings? Numeric keys should be especially easy to convert.
You can serialize every key on the fly (via pickle or cPickle, the way shelve.py does for values), as well as every value. It's not really worth subclassing shelve.Shelf, since you'd have to override almost every method; for once, I'd instead recommend copying shelve.py into your own module and editing it to suit. That's basically like coding your new module from scratch, but you get a working example to show you the structure and guidelines ;-).
sqlite has no real advantage in the sufficiently general case (where the keys could be, e.g., arbitrary tuples of different arity and types for every entry): you're going to have to serialize the keys anyway to make them homogeneous. Still, nothing stops you from using sqlite, e.g. to keep several "generalized shelves" in a single file (different tables of the same sqlite DB). If you care about performance, though, you should measure it each way.
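A sketch of the serialise-every-key idea, as a thin wrapper rather than a Shelf subclass (the class name and design are illustrative, not a standard API):

import pickle
import shelve

class AnyKeyShelf:
    # Pickles every key so arbitrary hashable objects (numbers,
    # tuples, ...) can be used where shelve requires string keys.
    def __init__(self, filename):
        self._shelf = shelve.open(filename)

    def _encode(self, key):
        # Protocol 0 pickles are ASCII, so they decode to safe str keys.
        return pickle.dumps(key, protocol=0).decode("ascii")

    def __setitem__(self, key, value):
        self._shelf[self._encode(key)] = value

    def __getitem__(self, key):
        return self._shelf[self._encode(key)]

    def close(self):
        self._shelf.close()

db = AnyKeyShelf("testshelf")
db[(1, "a")] = "tuple-keyed value"
db[42] = "numeric-keyed value"
print(db[(1, "a")], db[42])
db.close()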
I think you want to overload the [] operator. You can do it by defining the __getitem__ method.
I ended up subclassing DbfilenameShelf from the shelve module. I made a shelf which automatically converts non-string keys into string keys and returns them in their original form when queried. It works well for Python's standard immutable objects: int, float, string, tuple, boolean.
It can be found at: https://github.com/North-Guard/simple_shelve

Efficient method to store Python dictionary on disk?

What is the most efficient method to store a Python dictionary on disk? The only methods I know of right now are plain text and the pickle module.
Edit: Sorry for not being very clear. By efficient I meant fastest execution speed. The dictionary will contain mutable objects holding information to be parsed and modified.
shelve is pretty nice as well.
Or try this persistent dictionary recipe.
For a convenient method that keeps your objects synchronized with storage, there's the SQLAlchemy ORM for Python.
If you just need a way to store string values by some key, there are the dbm.ndbm and dbm.gnu modules.
If you need a hyper-efficient, distributed key-value cache, there's something like memcached for Python...
JSON and YAML work well, too.
Depends on what you mean by "efficient": size of file? Time required? Amount of code you need to write?
You have the timeit module available to determine what meets your specific criteria for "efficient".
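For example, to compare the serialisation speed of two candidates on data shaped like yours (the sample dictionary is purely illustrative):

import json
import pickle
import timeit

d = {"key%d" % i: i for i in range(10000)}

print("pickle:", timeit.timeit(lambda: pickle.dumps(d), number=100))
print("json:  ", timeit.timeit(lambda: json.dumps(d), number=100))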
