Examples of use for PickledObjectField (django-picklefield)? - python

Surfing the web and reading about Django dev best practices, the advice is to use pickled model fields with extreme caution.
But in a real life example, where would you use a PickledObjectField, to solve what specific problems?

We have a system of social-network "backends" which do some generic stuff like "post message", "get status", "get friends" etc. The link between each backend class and a user is a Django model, which keeps the user, backend name and credentials. Now imagine how many auth systems there are: OAuth, plain passwords, Facebook's obscure JS stuff etc. This is where JSONField shines: we keep all backend-specific auth data in a dictionary on this model, which is stored in the DB as JSON, and we can put anything into it, no problem.
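The round trip a JSONField does under the hood can be sketched with plain `json` (the credential payloads below are made up for illustration):

```python
import json

# Hypothetical credential payloads: each auth system stores a
# different shape in the same JSON column, alongside the backend name.
oauth_creds = {"token": "abc123", "token_secret": "xyz789"}
password_creds = {"username": "alice", "password": "s3cret"}

# This is essentially what the field does: dump to text on save...
stored = json.dumps(oauth_creds)
# ...and load back into a dict on access, whatever the shape.
restored = json.loads(stored)
assert restored == oauth_creds
```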

You would use it to store... almost-arbitrary Python objects. In general there's little reason to use it; JSON is safer and more portable.

You can definitely substitute a PickledObjectField with JSON plus some extra logic to reconstruct an object from the JSON. At the end of the day, your use case, when deciding between a PickledObjectField and JSON+logic, is serializing a Python object into your database. If you can trust the data in the Python object, and you know it will always be serializable, you can reasonably use the PickledObjectField.

In my case (I don't use Django's ORM, but this should still apply), I have a couple of different object types that can go into my PickledObjectField, and their definitions are constantly mutating. Rather than constantly updating my JSON parsing logic to create an object out of JSON values, I simply use a PickledObjectField to store the different objects, and later retrieve them in perfectly usable form (calling their methods).

Caveat: if you store an object via PickledObjectField, then change the object's definition, and then retrieve the object, the old object may have trouble fitting the new definition (depending on what you changed).
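A minimal sketch of why pickling preserves behaviour where JSON does not; `Counter` here is a hypothetical stand-in for one of those mutating object types:

```python
import pickle

class Counter:
    """Hypothetical object type whose definition keeps mutating."""
    def __init__(self):
        self.n = 0

    def bump(self):
        self.n += 1
        return self.n

c = Counter()
c.bump()

# What a PickledObjectField does on save/load: the whole object,
# methods included, survives the round trip as a byte blob.
blob = pickle.dumps(c)
restored = pickle.loads(blob)
assert restored.bump() == 2  # still a fully usable Counter
```

JSON would have stored only `{"n": 1}`, leaving you to rebuild the object (and track its changing definition) yourself.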

The problems to be solved are the efficiency and the convenience of defining and handling a complex object consisting of many parts.
You can turn each part type into a Model and connect them via ForeignKeys.
Or you can turn each part type into a class, dictionary, list, tuple, enum or whathaveyou to your liking and use PickledObjectField to store and retrieve the whole beast in one step.
That approach makes sense if you will never manipulate parts individually, only the complex object as a whole.
Real life example
In my application there are RQdef objects that represent essentially a type with a certain basic structure (if you are curious what they mean, look here).
RQdefs consist of several Aspects and some fixed attributes.
Aspects consist of one or more Facets and some fixed attributes.
Facets consist of two or more Levels and some fixed attributes.
Levels consist of a few fixed attributes.
Overall, a typical RQdef will have about 20-40 parts.
An RQdef is always completely constructed in a single step before it is stored in the database and it is henceforth never modified, only read (but read frequently).
PickledObjectField is more convenient and much more efficient for this purpose than would be a set of four models and 20-40 objects for each RQdef.
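A rough sketch of that "whole beast in one step" idea, using made-up dataclasses in place of the real Aspect/Facet/Level definitions; this is the kind of value a PickledObjectField would store and return:

```python
import pickle
from dataclasses import dataclass, field
from typing import List

# Made-up stand-ins for the real Aspect/Facet/Level hierarchy.
@dataclass
class Level:
    score: int

@dataclass
class Facet:
    name: str
    levels: List[Level] = field(default_factory=list)

@dataclass
class RQdef:
    title: str
    facets: List[Facet] = field(default_factory=list)

rq = RQdef("example", [Facet("clarity", [Level(1), Level(2)])])

# One pickle call persists the whole 20-40 part object as a single
# column value; one call restores it, relationships intact.
restored = pickle.loads(pickle.dumps(rq))
assert restored.facets[0].levels[1].score == 2
```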

Related

Store a Python exception in a Django model

What is the recommended way to store a Python exception – in a structured way that allows access to the different parts of that exception – in a Django model?
It is common to design a Django model that records “an event” or “an attempt to do foo”; part of the information to be recorded is “… and the result was this error”. That then becomes a field on the model.
An important aspect of recording the error in the database is to query it in various structured ways: What was the exception type? What was the specific message? What were the other arguments to the exception instance? What was the full traceback?
(I'm aware of services that will let me stream logging or errors to them across the network; that is not what this question asks. I need the structured exception data in the project's own database, for reference directly from other models in the same database.)
The Python exception object is a data structure that has all of these parts – the exception type, the arguments to the exception instance, the “cause” and “context”, the traceback – available separately, as attributes of the object. The objective of this question is to have the Python exception information stored in the database model, such that they can be queried in an equivalent structured manner.
So it isn't enough to have a free-form TextField to record a string representation of the exception. A structured field or set of fields is needed; perhaps even an entire separate model for storing exception instances.
Of course I could design such a thing myself, and spend time making the same mistakes countless people before me have undoubtedly made in trying to implement something like this. Instead, I want to learn from existing art in solving this problem.
What is a general pattern to store structured Python exception data in a Django model, preferably in an existing mature general-purpose library at PyPI that my project can use for its models?
I am not sure that many people have sought a custom way of storing exception data.
Ultimately, you have to know what you need. In all projects I have worked on so far, the traceback text contained enough information to dig down to the error source (sometimes, even when the TB had been naively escaped multiple times, it was still enough to "reverse escape" it and fix the source from a single instance of the error).
Some debugging tools offer live interactive introspection of each execution frame when an exception happens - but that has to be "live", because you can't ordinarily serialize Python execution frames and store them in the DB.
That said, if the traceback text is not enough for you, you can get the traceback object by calling sys.exc_info()[2]. That lets you introspect each frame and see the file, line number, and local and global variables as dictionaries. If you want the variable values to be available in the database, you have to serialize them, but not all values will be easily serializable. So it is your call to know "when enough is enough" in this process.
Most modern databases allow for JSON fields, and serializing the exception info to a JSON field restricting data to strings, numbers, bool and None is probably enough.
One way is to manually iterate over each key in the f_globals and f_locals dicts of each frame, try to json.dumps that key's value, and on a JSON serialization exception use the object's repr instead. Or you can customize a JSON serializer that stores relevant data for datetimes and dates, open files, sockets and so on - only you can know your needs.
TL;DR: Use a JSON field, in the except clause get hold of the traceback object via sys.exc_info()[2], and have a custom function serialize what you want from it into the JSON field.
Or, just use traceback.format_tb to save the traceback text - it probably will be enough anyway.
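The TL;DR above can be sketched as follows. `exc_to_jsonable` and `fail` are hypothetical names, the traceback is taken from the equivalent `exc.__traceback__` attribute, and the locals-filtering strategy (try `json.dumps`, fall back to `repr`) is the one described in the answer:

```python
import json
import traceback

def exc_to_jsonable(exc):
    """Flatten an exception and its traceback into JSON-safe data."""
    frames = []
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        safe_locals = {}
        for name, value in frame.f_locals.items():
            try:
                json.dumps(value)           # keep values JSON can handle
                safe_locals[name] = value
            except (TypeError, ValueError):
                safe_locals[name] = repr(value)  # fall back to repr
        frames.append({"file": frame.f_code.co_filename,
                       "line": lineno,
                       "function": frame.f_code.co_name,
                       "locals": safe_locals})
    return {"type": type(exc).__name__,
            "args": [repr(a) for a in exc.args],
            "frames": frames}

def fail():
    payload = {1, 2, 3}        # a set: not JSON-serializable
    raise ValueError("boom")

try:
    fail()
except ValueError as e:
    record = exc_to_jsonable(e)

assert record["type"] == "ValueError"
json.dumps(record)             # the whole record is now JSON-safe
```

The resulting dict can go straight into a JSON field and be queried on its "type", "args" or per-frame keys.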
Most people delegate tasks like this to third-party services like Rollbar or LogEntries, or to middleware services running in a VPC like Elastic Logstash.
I'd suggest that the Django ORM is not well suited for this type of storage. Most of the advantage of the ORM is letting you join on relational tables without elaborate SQL and manage schema changes and referential integrity with migrations. These are the problems encountered by the "CRUD" applications Django is designed for — web applications with user accounts, preferences, notification inboxes, etc. The ORM is intended to manage mutable data with more reads than writes.
Your problem, storing Python exceptions that happened in production, is quite different. Your schema needs will almost never change. The data you want to store is never modified at all once written, and all writes are strictly appends. Your data does not contain foreign keys or other fields that would change in a migration. You will almost always query recent data over historical, which will rarely be read outside of offline/bulk analytics.
If you really want to store this information in Django, I'd suggest storing only a rolling window that you periodically rotate into compressed logs on disk. Otherwise you will be maintaining costly indexes on data that is almost never needed. In that case, you should consider constructing your own custom Django model that extracts the exception metadata you need. You could also put this information into a JSON field stored as a string, as jsbueno suggests, but this sacrifices indexed selection.
(Note Python exceptions cannot be directly serialized to JSON or pickled. There is a project called tblib that enables pickling, which in turn could be stored as BLOB fields in Django, but I have no idea if the performance would be reasonable. My guess is it would not be worth it.)
In recent years there are many alternative DBMS products for log-like, append-only storage with analytic query patterns. But most of this progress is too recent and too "web-scale" to have off-the-shelf integration with Django, which focuses on smaller, more traditional CRUD applications. You should look for solutions that can be integrated with Python more generally, as complex logging/event storage is mostly out-of-scope for Django.

Suggestions for python REST API json serialization

I am used to building REST APIs with PHP and I make heavy use of the JMS serializer. It basically lets me write annotations on class properties that define the name and type of variables, including nested classes and arrays. This lets me completely abstract away the JSON format and just work with classes, which transparently serialize and deserialize to JSON. In combination with the Symfony validator, this approach is very simple to work with, but also very powerful.
Now to my question: I recently started adopting Python for some projects and I would like to reimplement an API in Python. I have searched the internet for a suitable equivalent to the JMS serializer, but I didn't find one with the same or similar capabilities.
Would anyone be so kind to point me in the right direction? (either good library or recommend a different approach with equal or better efficiency)
What I need:
ability to serialize and deserialize object into JSON
define how object is serialized - names of JSON attributes and their data types
define complex object graphs (the ability to use a class as a property type, which would then be mapped by its own definition)
ability to map dicts or arrays and types they contain
Thanks in advance
Marshmallow
I haven't used it yet, so I can't tell if it answers all you needs. Feedback welcome.

Is using strings as an object identifier bad practice?

I am developing a small app for managing my favourite recipes. I have two classes - Ingredient and Recipe. A Recipe consists of Ingredients and some additional data (preparation, etc.). The reason I have an Ingredient class is that I want to save some additional info in it (proper technique, etc.). Ingredients are unique, so there cannot be two with the same name.
Currently I am holding all ingredients in a "big" dictionary, using the name of the ingredient as the key. This is useful, as I can ask my model whether an ingredient is already registered and use it (including all its other data) for a newly created recipe.
But thinking back to when I started programming (Java/C++), I always read that using strings as identifiers is bad practice. "The Magic String" was a phrase I often read (but I think that describes another problem). I really like the string approach as it is right now. I don't have problems with encoding either, because all string generation/comparison is done within my program (Python3 uses UTF-8 everywhere if I am not mistaken), but I am not sure if what I am doing is the right way to do it.
Is using strings as object identifiers bad practice? Are there differences between languages? Can strings become a performance issue as the amount of data increases? What are the alternatives?
No -
actually, identifiers in Python are always strings, whether you keep them in a dictionary yourself (you say you are using a "big dictionary") or the object is used programmatically, with a name hard-coded into the source code. In the latter case, Python creates the name in one of its automatically handled internal dictionaries (which can be inspected as the return of globals() or locals()).
Moreover, Python does not use "utf-8" internally; it uses "unicode" - which means it is simply text, and you should not worry about how that text is represented in actual bytes.
Python relies on dictionaries for many of its core features. For that reason, the built-in dict already comes with a quite effective, fast implementation "from the factory", a decent hash, etc.
Considering that, the performance of the dictionary itself should not be a concern for what you need (occasional reads and writes), although the way you handle/store it (in a Python file, JSON, pickle, gzip, etc.) could impact load/access time.
Maybe if you provide a few lines of code showing us how you deal with the dictionary we could provide specific details.
About the string identifier, check jsbueno's answer; he gave a much better explanation than I could.
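For reference, the registry pattern the question describes can be sketched like this (`Ingredient` and `get_or_register` are hypothetical names):

```python
class Ingredient:
    """Hypothetical: holds the extra per-ingredient info."""
    def __init__(self, name, technique=""):
        self.name = name
        self.technique = technique

ingredients = {}  # the "big" dictionary, keyed by unique name

def get_or_register(name):
    # Average-case O(1) lookup: string keys scale fine here.
    if name not in ingredients:
        ingredients[name] = Ingredient(name)
    return ingredients[name]

onion = get_or_register("onion")
assert get_or_register("onion") is onion  # same object reused
```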

Getting and serializing the state of dynamically created python instances to a relational model

I'm developing a framework of sorts. I'm providing a base class, that will be subclassed by other developers to add behavior to the system. The instances of those classes will have attributes that my framework doesn't necessarily expect, except by inspecting those instances' __dict__. To make things even more interesting, some of those classes can be created dynamically, at any time.
I'd like some things to be handled by the framework, namely, I will need to persist those instances, display their attribute values to the user, and let her search/filter instances using those values.
I have to use a relational database. I know there are some decent python OO database out there, but unfortunately they're not an option in this case.
I'm not looking for a full-blown ORM either... and it may not even be an option, given that some of the classes can be created dynamically.
So, my question is, what state of a python instance do I need to serialize to ensure that I can deserialize it later on? Is it enough to look at __dict__, or are there other private attributes that I should be using?
Pickling the instances is not enough, because I'll need to unpickle them to search/filter the attribute values, and I'm afraid it's too much data to do it in-memory (instead of letting the database do it).
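As background to the __dict__ part of the question: for plain classes, the instance's __dict__ really is the state, and an object can be rebuilt from it without calling __init__ - this is essentially what pickle does by default. A minimal sketch, where `Plugin` is a made-up example class:

```python
class Plugin:
    """Made-up example of a subclass a developer might provide."""
    def __init__(self, name):
        self.name = name
        self.runs = 0

p = Plugin("demo")
p.runs = 3

state = dict(p.__dict__)  # {'name': 'demo', 'runs': 3}

# Rebuild without running __init__: allocate, then restore the state.
restored = Plugin.__new__(Plugin)
restored.__dict__.update(state)
assert restored.name == "demo" and restored.runs == 3
```

The main caveat: attributes held in __slots__ or on the class itself won't appear in the instance __dict__.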
Just use an ORM. This is what they are for.
What you are proposing is to create your own half-assed ORM on your own time. Save your time for your own code that does things, and use the effort other people have put in for free to solve this problem for you.
Note that all class creation in python is "dynamic" - this is not an issue, for, well, anything at all. In fact, if you are assembling classes programmatically, it is probably slightly easier with an ORM, because they provide reifications of fields.
In the worst case, if you really do need to store your objects in a fake nosql-type schema, you will still only have to write your own backend driver if you use an existing ORM, rather than coding the whole stack yourself. (As it happens, you're not the first person to face this - solutions exist. Google "python orm store dynamically created models" and "sqlalchemy store dynamically created models".)
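The "all class creation is dynamic" point can be seen with plain `type()`, which is what the `class` statement calls under the hood (the names below are made up):

```python
# The class statement is just sugar over type(name, bases, namespace).
Base = type("Base", (), {"greet": lambda self: "hi"})
Dynamic = type("Dynamic", (Base,), {"x": 1})

obj = Dynamic()
assert obj.greet() == "hi" and obj.x == 1
assert isinstance(obj, Base)
```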
Candidates include:
Django ORM
SQLAlchemy
Some others you can find by googling "Python ORM".

Converting Python App into Django

I've got a Python program with about a dozen classes, with several classes possessing instances of other classes, e.g. ObjectA has a list of ObjectB's, and a dictionary of (ObjectC, ObjectD) pairs.
My goal is to put the program's functionality on a website.
I've written and tested JSON encode and decode methods for each class. The problem as I see it now is that I need to choose between starting over and writing the models and logic afresh from a database perspective, or simply storing the python objects (encoded as JSON) in the database, and pulling out the saved states for changes.
Can someone confirm that these are both valid approaches, and that I'm not missing any other simple options?
What I think you can do is convert the classes you already have into Django model classes - of course, only the ones that need to be saved to a database. The other classes, and the rest of the code, I recommend encapsulating as helper functions. That way you don't have to change your code too much, and it will work fine. ;D
Another choice, which can be easier to implement, is to put everything in a helper module: the classes, the functions and everything else.
Then you'll just need to call the functions in your views and define models to save your data into the database.
Your idea of saving the objects as JSON in the database works, but it's ugly. ;)
Anyway, if you are in a hurry to deliver the website, anything is valid. Just remember that things made this way always give us lots of problems in the future.
I hope this is useful! :D
