The Python shelve module only seems to allow string keys. Is there a way to use arbitrarily typed (e.g. numeric) keys? Maybe an sqlite-backed dictionary?
Thanks!
Why not convert your keys to strings? Numeric keys should be pretty easy to convert.
You can serialize on the fly (via pickle or cPickle, as shelve.py does) every key, as well as every value. It's not really worth subclassing shelve.Shelf, since you'd have to override almost every method; for once, I'd instead recommend copying shelve.py into your own module and editing it to suit. That's basically like coding your new module from scratch, but you get a working example to show you the structure and guidelines ;-).
sqlite has no real advantage in the sufficiently general case (where the keys could be, e.g., arbitrary tuples of different arity and types for every entry): you're going to have to serialize the keys anyway to make them homogeneous. Still, nothing stops you from using sqlite, e.g. to keep several "generalized shelves" in a single file (different tables of the same sqlite DB); if you care about performance, you should measure it each way, though.
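As a rough sketch of that "serialize every key" idea (PickleDict is a made-up name, and the choice of dbm as the backing store is mine, just for illustration):

```python
import dbm
import pickle

class PickleDict:
    """Sketch: a dict-like store that pickles both keys and values,
    so keys can be ints, floats, tuples, etc."""

    def __init__(self, filename):
        self.db = dbm.open(filename, 'c')  # 'c': create the file if needed

    def __setitem__(self, key, value):
        self.db[pickle.dumps(key)] = pickle.dumps(value)

    def __getitem__(self, key):
        return pickle.loads(self.db[pickle.dumps(key)])

    def __contains__(self, key):
        return pickle.dumps(key) in self.db

    def close(self):
        self.db.close()

d = PickleDict('generalized_shelf')
d[(1, 'spam')] = {'value': 42}   # a tuple key, no conversion needed
print(d[(1, 'spam')])            # {'value': 42}
d.close()
```

One caveat: this relies on equal keys pickling to identical byte strings, which holds for the simple immutable types (numbers, strings, tuples of those) but not in general.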
I think you want to overload the [] operator. You can do it by defining the __getitem__ method.
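For instance (a toy sketch, with made-up names):

```python
class StrKeyedDict:
    """Toy wrapper: accepts any key, stores it as a string internally."""
    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[repr(key)] = value

    def __getitem__(self, key):
        return self._data[repr(key)]

d = StrKeyedDict()
d[3.14] = 'pi'
print(d[3.14])   # 'pi'
```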
I ended up subclassing DbfilenameShelf from the shelve module. I made a shelf which automatically converts non-string keys into string keys and returns them in their original form when queried. It works well for Python's standard immutable types: int, float, str, tuple, bool.
It can be found at: https://github.com/North-Guard/simple_shelve
I am developing a small app for managing my favourite recipes. I have two classes - Ingredient and Recipe. A Recipe consists of Ingredients and some additional data (preparation, etc.). The reason I have an Ingredient class is that I want to save some additional info in it (proper technique, etc.). Ingredients are unique, so there cannot be two with the same name.
Currently I am holding all ingredients in a "big" dictionary, using the name of the ingredient as the key. This is useful, as I can ask my model whether an ingredient is already registered and use it (including all its other data) for a newly created recipe.
But thinking back to when I started programming (Java/C++), I always read that using strings as identifiers is bad practice. "The magic string" was a phrase I often read (but I think that describes another problem). I really like the string approach as it is right now. I don't have problems with encoding either, because all string generation/comparison is done within my program (Python 3 uses UTF-8 everywhere, if I am not mistaken), but I am not sure if what I am doing is the right way to do it.
Is using strings as object identifiers bad practice? Are there differences between languages? Can strings become a performance issue if the amount of data increases? What are the alternatives?
No -
Actually, identifiers in Python are always strings, whether you keep them in a dictionary yourself (you say you are using a "big dictionary") or the object is used programmatically, with a name hard-coded into the source code. In the latter case, Python creates the name in one of its automatically handled internal dictionaries (which can be inspected as the return value of globals() or locals()).
Moreover, Python does not use "utf-8" internally; it uses "unicode", which means it is simply text, and you should not worry about how that text is represented in actual bytes.
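You can see the identifier point for yourself in the interpreter:

```python
favourite = 'curry'              # a name hard-coded in the source

print('favourite' in globals())  # True: the identifier is a string key
print(globals()['favourite'])    # 'curry': same lookup your own dict does
```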
Python relies on dictionaries for many of its core features. For that reason, the built-in dict comes with a quite effective, fast implementation out of the box, including a decent hash function.
Considering that, the dictionary's performance itself should not be a concern for what you need (occasional reads and writes), although the way you handle and store it (in a Python file, json, pickle, gzip, etc.) could impact load/access time.
Maybe if you provide a few lines of code showing us how you deal with the dictionary we could provide specific details.
About the string identifiers, check jsbueno's answer; he gave a much better explanation than I could.
As I've been trying to solve "Not JSON serializable" problems for the last couple of hours, I'm very interested in what the hard part is when serializing and deserializing an instance.
Why can't a class instance be serialized with, for example, JSON?
To serialize:
1. Note the class name (in order to rebuild the object).
2. Note the variable values at the time of packaging.
3. Convert it to a string.
4. Optionally compress it (as msgpack does).
To deserialize:
1. Create a new instance.
2. Assign the known values to the appropriate variables.
3. Return the object.
What is difficult? What counts as a "complex data type"?
The "hard" part is mainly step 3 of your serialization, converting the contained values to strings (and later back during deserialization)
For simple types like numbers, strings, booleans, it's quite straight forward, but for complex types like a socket connected to a remote server or an open file descriptor, it won't work very well.
The solution is usually to either move the complex types from the types you want to serialize and keep the serialized types very clean, or somehow tag or otherwise tell the serializers exactly which properties should be serialized, and which should not.
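To make this concrete, here's a small sketch of your recipe and where it breaks (the Point class is just an illustration):

```python
import json
import socket

class Point:
    """Illustrative class holding only simple attribute values."""
    def __init__(self, x, y):
        self.x, self.y = x, y

# Serialize: note the class name and the instance variables (steps 1-3).
p = Point(1, 2)
s = json.dumps({'class': type(p).__name__, 'state': vars(p)})

# Deserialize: create a new instance and assign the known values.
data = json.loads(s)
obj = Point.__new__(Point)          # new instance, __init__ not re-run
obj.__dict__.update(data['state'])

# The hard case: a value with no sensible string form.
sock = socket.socket()
# json.dumps(sock)  # TypeError: Object of type socket is not JSON serializable
sock.close()
```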
I need to save a dictionary to a file. The dictionary contains strings, integers, and other dictionaries.
I did it on my own, and the result is not pretty or user-friendly.
I know about pickle, but as far as I know it is not safe to use: if someone replaces the file, and I (or someone else) later run the code that loads the replaced file, it could execute arbitrary things. It's just not safe.
Is there another function or module that does this?
Pickle is not safe when transferred by an untrusted third party. Local files are just fine, and if something can replace files on your filesystem, then you have a different problem.
That said, if your dictionary contains nothing but string keys and the values are nothing but Python lists, numbers, strings or other dictionaries, then use JSON, via the json module.
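For example:

```python
import json

settings = {'name': 'curry', 'servings': 4,
            'steps': ['chop', 'fry'], 'nutrition': {'kcal': 520}}

with open('settings.json', 'w') as f:
    json.dump(settings, f, indent=2)

with open('settings.json') as f:
    restored = json.load(f)

assert restored == settings   # safe round trip; loading executes no code
```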
Presuming your dictionary contains only basic data types, the normal answer is JSON; it's a popular, well-defined format for this kind of thing.
If your dictionary contains more complex data, you will have to serialize it manually, at least part of the way.
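A common way to do that partial manual serialization is a custom encoder; for example, to handle complex numbers:

```python
import json

class ComplexEncoder(json.JSONEncoder):
    """Sketch: handle one complex type by hand, delegate the rest."""
    def default(self, o):
        if isinstance(o, complex):
            return {'real': o.real, 'imag': o.imag}
        return super().default(o)   # raises TypeError for unknown types

print(json.dumps({'z': 3 + 4j}, cls=ComplexEncoder))
# {"z": {"real": 3.0, "imag": 4.0}}
```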
JSON is not quite the Python way, for several reasons:
It can't wrap/unwrap all Python data types: there's no support for sets, and tuples are silently converted to lists.
It's also not the fastest option, because it has to deal with textual data and encodings.
Try to use sPickle instead.
Surfing the web and reading about Django dev best practices, everything points to using pickled model fields with extreme caution.
But in a real life example, where would you use a PickledObjectField, to solve what specific problems?
We have a system of social-network "backends" which do some generic stuff like "post message", "get status", "get friends", etc. The link between each backend class and the user is a Django model, which keeps the user, the backend name and the credentials. Now imagine how many auth systems there are: OAuth, plain passwords, Facebook's obscure JS stuff, etc. This is where JSONField shines: we keep all the backend-specific auth data in a dictionary on this model, which is stored in the DB as JSON, and we can put anything into it, no problem.
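For illustration, such a model might look roughly like this (the model and field names are made up; modern Django ships models.JSONField since 3.1, while back then a third-party JSONField package was the usual choice):

```python
from django.conf import settings
from django.db import models

class BackendLink(models.Model):
    """Hypothetical model: one row per (user, backend) pair."""
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             on_delete=models.CASCADE)
    backend_name = models.CharField(max_length=50)
    # Whatever each auth system needs: oauth tokens, passwords, JS blobs...
    credentials = models.JSONField(default=dict)
```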
You would use it to store... almost-arbitrary Python objects. In general there's little reason to use it; JSON is safer and more portable.
You can definitely substitute a PickledObjectField with JSON and some extra logic to create an object out of the JSON. At the end of the day, your use case, when considering PickledObjectField versus JSON+logic, is serializing a Python object into your database. If you can trust the data in the Python object, and know that it will always be serializable, you can reasonably use the PickledObjectField.
In my case (I don't use Django's ORM, but this should still apply), I have a couple of different object types that can go into my PickledObjectField, and their definitions are constantly mutating. Rather than constantly updating my JSON parsing logic to create an object out of JSON values, I simply use a PickledObjectField to store the different objects, and then later retrieve them in perfectly usable form (calling their functions).
Caveat: if you store an object via PickledObjectField, then change the object's definition, and then retrieve the object, the old object may have trouble fitting into the new definition (depending on what you changed).
The problems to be solved are the efficiency and the convenience of defining and handling a complex object consisting of many parts.
You can turn each part type into a Model and connect them via ForeignKeys.
Or you can turn each part type into a class, dictionary, list, tuple, enum or what have you, to your liking, and use PickledObjectField to store and retrieve the whole beast in one step.
That approach makes sense if you will never manipulate parts individually, only the complex object as a whole.
Real life example
In my application there are RQdef objects that represent essentially a type with a certain basic structure (if you are curious what they mean, look here).
RQdefs consist of several Aspects and some fixed attributes.
Aspects consist of one or more Facets and some fixed attributes.
Facets consist of two or more Levels and some fixed attributes.
Levels consist of a few fixed attributes.
Overall, a typical RQdef will have about 20-40 parts.
An RQdef is always completely constructed in a single step before it is stored in the database and it is henceforth never modified, only read (but read frequently).
PickledObjectField is more convenient and much more efficient for this purpose than would be a set of four models and 20-40 objects for each RQdef.
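In outline, using PickledObjectField from the third-party django-picklefield package (the class names mirror the description above but are otherwise made up):

```python
from django.db import models
from picklefield.fields import PickledObjectField

class Level:
    """Plain Python class, not a model; Facet, Aspect and the RQdef
    itself would be built the same way."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)

class RQdefRecord(models.Model):
    name = models.CharField(max_length=100)
    # The whole Aspect/Facet/Level tree goes in and out in one step:
    definition = PickledObjectField()
```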
What is the most efficient way to store a Python dictionary on disk? The only methods I know of right now are plain text and the pickle module.
Edit: Sorry for not being very clear. By "efficient" I meant fastest execution speed. The dictionary will contain mutable objects that hold information to be parsed and modified.
shelve is pretty nice as well (see the sketch after this list)
or this persistent dictionary recipe
for a convenient method that keeps your objects synchronized with storage, there's the SQLAlchemy ORM for Python
if you just need a way to store string values by some key, there are the dbm.ndbm and dbm.gnu modules
if you need a hyper-efficient, distributed key-value cache, there's something like memcached for Python...
JSON and YAML work well, also.
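As a quick illustration of the shelve option mentioned above, it needs almost no code:

```python
import shelve

with shelve.open('mydata') as db:      # keys must be strings
    db['answers'] = {'count': 42, 'tags': ['python', 'dict']}

with shelve.open('mydata') as db:
    print(db['answers']['count'])      # 42: values round-trip via pickle
```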
Depends on what you mean by "efficient": size of the file? Time required? Amount of code you need to write?
You have the timeit module available to determine what meets your specific criteria for "efficient".
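For instance, a rough dump-speed comparison between pickle and json for one particular dictionary (results will vary a lot with the shape of your data):

```python
import timeit

setup = ("import json, pickle; "
         "d = {str(i): list(range(10)) for i in range(1000)}")

print('pickle:', timeit.timeit('pickle.dumps(d)', setup=setup, number=100))
print('json:  ', timeit.timeit('json.dumps(d)', setup=setup, number=100))
```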