Best way to retrieve database results for further use? - python

I am working heavily with a database, using python, and I am trying to write code that actually makes my life easier.
Most of the time, I need to run a query and get results to process them; most of the time I get the same fields from the same table, so my idea was to collect the various results in an object, to process it later.
I am using SQLAlchemy for the DB interaction. From what I can read, there is no direct way to just say "dump the result of this query to an object", so I can access the various fields like
print object.fieldA
print object.fieldB
and so on. I tried dumping the results to JSON, but even that requires parsing and is not as straightforward as I had hoped.
So at this point, is there anything else I can try? Or should I write a custom object that mimics the DB structure and parse the results with for loops to put the data in the right place? I was hoping to find a way to do this automatically, but so far it seems that the only way to get close to what I am looking for is to use JSON.
EDIT:
Found some info about serialization and SQLAlchemy's ability to read a table and reproduce a sort of 1:1 copy of it in an object, but I am not sure that this will actually work with a query.

Found that the best way is to actually use a custom object.
You can use reflection through SQLAlchemy to extract the structure, but if you are dealing with a small database with few tables, you can simply write the object that will hold the data yourself. This gives you control over the object and what you can put in it.
There are obviously other ways, but since nobody posted anything, I assume they are either too easy to be worth mentioning or too hard and specific to each case.
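A minimal sketch of the custom-object approach, assuming each result row can be read as a mapping from column name to value. The Measurement class, its fields and the sample rows are hypothetical; plain dicts stand in for query results so the example runs without a database.

```python
from dataclasses import dataclass, fields


@dataclass
class Measurement:
    """Hypothetical record type mirroring the columns you usually select."""
    id: int
    fieldA: str
    fieldB: float

    @classmethod
    def from_row(cls, row):
        # Copy only the columns the class declares; extra columns are ignored.
        return cls(**{f.name: row[f.name] for f in fields(cls)})


# Plain dicts stand in for query result rows (an assumption of this sketch).
rows = [
    {"id": 1, "fieldA": "first", "fieldB": 1.5, "unused": "x"},
    {"id": 2, "fieldA": "second", "fieldB": 2.5, "unused": "y"},
]

records = [Measurement.from_row(row) for row in rows]
for record in records:
    print(record.fieldA, record.fieldB)
```

With SQLAlchemy 1.4 or later, passing row._mapping to from_row should work the same way, since it exposes a row as a mapping keyed by column name.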

Related

How to dynamically store and execute equations and functions

During my current project, I have been receiving data from a set of long-range sensors, which send data as a series of bytes. Because there are multiple types of sensors, the byte structures and the data they contain differ, hence the need to make the functionality more dynamic, to avoid having to hard-code every single setup in the future (which is not practical).
The server will be using Django, which I believe is irrelevant to the issue at hand, but I mention it in case it offers something that can be used.
The bytes data I am receiving looks like this:
b'B\x10Vu\x87%\x00x\r\x0f\x04\x01\x00\x00\x00\x00\x1e\x00\x00\x00\x00ad;l'
And my current process looks like this:
Take the first bytes to get the deviceID (deviceID = val[0:6].hex())
Look up the format to be used in struct.unpack() (here: >BBHBBhBHhHL, after removing the first bytes for the ID).
Now, the issue is the next step. Many of the values need different forms of pre-processing. For example, some values need to be run through a join statement (e.g. ".".join(str(values[2]))), while others need simple mathematical changes (-113 + 2 * values[4]), and others need a simple logic check (values[7]==0x80) that returns a boolean value.
My question is: what's the best way to code those methods? I would really like to avoid hard-coding them, but it almost seems like the best idea. Another idea I saw was to store the functionality as a string and execute it, as seen here, but I've been reading that it's a very bad idea and that it also slows down execution. The last idea I had was to hard-code only some general functions and use something similar to here, but this doesn't solve the issue of having to hard-code every new sensor type, which is not realistic in a live installation. Are there any better methods to achieve the same thing?
I have also looked here, with the idea that some functionality could somehow be expressed as an equation, but I don't see that as a possibility for every case, especially when any string manipulation is needed.
Additionally, is there a possibility of using some maths to apply basic string manipulation? I can hard-code one string manipulation maybe, but to be honest this whole thing has been bugging me...
Finally, if I go with storing functions as strings and then executing them, is there a way to add some "security" to avoid malicious exploitation? Such a method is awfully insecure, to say the least.
However, after almost a week of searching, I have so far been unable to find a better solution than storing functions as strings and running eval on them, despite not liking that option. If anyone knows a better option, I would be extremely grateful for any tips or ideas.
Addendum: minimal code that can be used to showcase and test different methods:
import struct

def decode(input):
    val = bytearray(input)
    # The first 6 bytes identify the device
    deviceID = val[0:6].hex()
    del val[0:6]
    print(deviceID)
    values = list(struct.unpack('>BBHBBhBHhHL', val))
    print(values)
    # Now what?

decode(b'B\x10Vu\x87%\x00x\r\x0f\x04\x01\x00\x00\x00\x00\x1e\x00\x00\x00\x00ad;l')
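One possible way to avoid both eval and per-sensor hard-coding is a small registry of named, parameterised transforms, with each device described by plain data (which could live in a database table or a JSON file). The sketch below is only an illustration under that assumption: the transform names, the field names (version, signal, flag) and the DEVICE_CONFIG layout are all hypothetical.

```python
import struct

# Hypothetical registry of reusable transforms. Only these names can be
# referenced from configuration, so no arbitrary code is ever executed.
TRANSFORMS = {
    "identity": lambda v: v,
    "linear": lambda v, scale, offset: offset + scale * v,
    "equals": lambda v, target: v == target,
    "dotted": lambda v: ".".join(str(v)),
}

# Hypothetical per-device configuration: pure data, no code.
DEVICE_CONFIG = {
    "421056758725": {
        "format": ">BBHBBhBHhHL",
        "fields": [
            {"name": "version", "index": 2, "transform": "dotted"},
            {"name": "signal", "index": 4, "transform": "linear",
             "args": {"scale": 2, "offset": -113}},
            {"name": "flag", "index": 7, "transform": "equals",
             "args": {"target": 0x80}},
        ],
    },
}


def decode(payload):
    """Return (device_id, processed_fields) for one raw message."""
    device_id = payload[:6].hex()
    config = DEVICE_CONFIG[device_id]
    values = struct.unpack(config["format"], payload[6:])
    result = {}
    for field in config["fields"]:
        transform = TRANSFORMS[field["transform"]]
        result[field["name"]] = transform(values[field["index"]],
                                          **field.get("args", {}))
    return device_id, result


sample = b'B\x10Vu\x87%\x00x\r\x0f\x04\x01\x00\x00\x00\x00\x1e\x00\x00\x00\x00ad;l'
device_id, decoded = decode(sample)
print(device_id, decoded)
```

Adding a new sensor type then means adding a configuration entry; new code is needed only when a genuinely new kind of transform appears.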

How can I prefetch_related() everything related to an object?

I'm trying to export all data connected to a User instance to a CSV file. In order to do so, I need to get it from the DB first. Using something like
data = SomeModel.objects.filter(owner=user)
on every possible model seems to be very inefficient, so I want to use prefetch_related(). My question is, is there any way to prefetch the instances of all the different models with an FK pointing at my User, at once?
Actually, you don't need to "prefetch everything" in order to create a CSV file – or, anything else – and you really don't want to. Python's CSV support is of course designed to work "row by row," and that's what you want to do here: in a loop, read one row at a time from the database and write it one row at a time to the file.
Remember that Django is lazy. Functions like filter() specify what the filtration is going to be, but things really don't start happening until you start to iterate over the actual collection. That's when Django will build the query, submit it to the SQL engine, and start retrieving the data that's returned ... one row at a time.
Let the SQL engine, Python and the operating system take care of "efficiency." They're really good at that sort of thing.
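A minimal sketch of the row-by-row approach, using only the standard csv module. The write_rows helper and the fake_rows generator are hypothetical; the generator stands in for a lazily evaluated queryset, for example SomeModel.objects.filter(owner=user).values().iterator(). Note that a plain loop over a queryset keeps the fetched rows cached on the queryset object, whereas iterator() skips that cache.

```python
import csv
import io


def write_rows(rows, fileobj, fieldnames):
    """Write an iterable of dict rows to fileobj as CSV, one row at a time."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        # Each row is written as soon as it is read; nothing is accumulated.
        writer.writerow(row)
        count += 1
    return count


def fake_rows():
    # Stand-in for a lazy database iterator (an assumption of this sketch).
    yield {"id": 1, "title": "first"}
    yield {"id": 2, "title": "second"}


buffer = io.StringIO()
written = write_rows(fake_rows(), buffer, ["id", "title"])
print(buffer.getvalue())
```

The same helper can be called once per model, so each related table is streamed to the file in turn without loading everything up front.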

SQLAlchemy Expression Language first or one

I'm using the SQLAlchemy expression language (i.e. Core). I'm trying to build a query that should only return one result.
I want to do
query = select([table]).where(cond).one()
or
query = select([table]).where(cond).first()
but that only yields
AttributeError: 'Select' object has no attribute 'one'
The closest I have come is
query = select([table]).where(cond).limit(1)
but that is not entirely satisfactory because I get a list of results where I want a single result. I can work around this by inserting extra logic, but I'd be much happier to find a way to do this cleanly. I would also prefer not to use plain text queries.
Any ideas? Much appreciated.
one and first are methods available on ORM query objects. On closer inspection, you can see that they execute the query and then post-process the result to get the first/only entry, do error checking, etc.
The SQL dialects that I have checked don't actually seem to have this functionality built in for queries (oops, I thought they did). They have limit or something similar.
The only option is to work around it, using limit and some logic, or some call to execute or a method on the result.
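A minimal sketch of that workaround, assuming SQLAlchemy is installed and using an in-memory SQLite database. The users table and the fetch_first helper are hypothetical. The statement itself stays a plain Core select; it is the result returned by execute() that offers first(), which returns a single row or None.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

metadata = MetaData()
# Hypothetical table used only for this illustration.
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)


def fetch_first(conn, stmt):
    """Execute stmt with LIMIT 1 and return one row, or None if no match."""
    return conn.execute(stmt.limit(1)).first()


with engine.begin() as conn:
    conn.execute(users.insert(), [{"name": "alice"}, {"name": "bob"}])
    row = fetch_first(conn, users.select().where(users.c.name == "bob"))
    missing = fetch_first(conn, users.select().where(users.c.name == "carol"))

found_name = row.name if row is not None else None
print(found_name, missing)
```

users.select() is used instead of select([table]) only because the argument style of select() differs between SQLAlchemy versions.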

Storing the results of a SQLAlchemy query to merge into another session

I have a SQLAlchemy-based tool for selectively copying data between two different databases for testing purposes. I use the merge() function to take model objects from one session and store them in another session. I'd like to be able to store the source objects in some intermediate form and then merge() them at some later point in time.
It seems like there are a few options to accomplish this:
Exporting DELETE/INSERT SQL statements. Seems pretty straightforward, I think I can get SQLAlchemy to give me the INSERT statements, and maybe even the DELETEs.
Exporting the data to a SQLite database file with the same (or similar) schema, that could then be read in as a source at a later point in time.
Serializing the data in some manner and then reading them back into memory for the merge. I don't know if SQLAlchemy has something like this built-in or not. I'm not sure what the challenges would be in rolling this myself.
Has anyone tackled this problem before? If so, what was your solution?
EDIT: I found a tool built on top of SQLAlchemy called dataset that provides the freeze functionality I'm looking for, but there seems to be no corresponding thaw functionality for restoring the data.
I haven't used it before, but the dogpile caching techniques described in the documentation might be what you want. This allows you to query to and from a cache using the SQLAlchemy API:
http://docs.sqlalchemy.org/en/rel_0_9/orm/examples.html#module-examples.dogpile_caching

Examples of use for PickledObjectField (django-picklefield)?

Surfing the web and reading about Django development best practices, the advice is to use pickled model fields with extreme caution.
But in a real life example, where would you use a PickledObjectField, to solve what specific problems?
We have a system of social-network "backends" which do some generic stuff like "post message", "get status", "get friends" etc. The link between each backend class and the user is a Django model, which keeps the user, backend name and credentials. Now imagine how many auth systems there are: OAuth, plain passwords, Facebook's obscure JS stuff etc. This is where JSONField shines: we keep all backend-specific auth data in a dictionary on this model, which is stored in the DB as JSON, and we can put anything into it, no problem.
You would use it to store... almost-arbitrary Python objects. In general there's little reason to use it; JSON is safer and more portable.
You can definitely substitute a PickledObjectField with JSON and some extra logic to create an object out of the JSON. At the end of the day, your use case, whether you use a PickledObjectField or JSON plus logic, is serializing a Python object into your database. If you can trust the data in the Python object, and know that it will always be serializable, you can reasonably use the PickledObjectField. In my case (I don't use Django's ORM, but this should still apply), I have a couple of different object types that can go into my PickledObjectField, and their definitions are constantly changing. Rather than constantly updating my JSON parsing logic to create an object out of JSON values, I simply use a PickledObjectField to store the different objects, and later retrieve them in perfectly usable form (calling their functions). Caveat: if you store an object via PickledObjectField, then change the object definition, and then retrieve the object, the old object may have trouble fitting into the new definition (depending on what you changed).
The problems to be solved are the efficiency and the convenience of defining and handling a complex object consisting of many parts.
You can turn each part type into a Model and connect them via ForeignKeys.
Or you can turn each part type into a class, dictionary, list, tuple, enum or what have you, to your liking, and use PickledObjectField to store and retrieve the whole beast in one step.
That approach makes sense if you will never manipulate parts individually, only the complex object as a whole.
Real life example
In my application there are RQdef objects that represent essentially a type with a certain basic structure (if you are curious what they mean, look here).
RQdefs consist of several Aspects and some fixed attributes.
Aspects consist of one or more Facets and some fixed attributes.
Facets consist of two or more Levels and some fixed attributes.
Levels consist of a few fixed attributes.
Overall, a typical RQdef will have about 20-40 parts.
An RQdef is always completely constructed in a single step before it is stored in the database and it is henceforth never modified, only read (but read frequently).
PickledObjectField is more convenient and much more efficient for this purpose than would be a set of four models and 20-40 objects for each RQdef.
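To illustrate the "whole beast in one step" idea, here is a sketch using only the standard pickle module on a nested structure. The rqdef contents and the to_text/from_text helpers are hypothetical, and the base64 text encoding is an assumption of this sketch, not a description of how django-picklefield actually stores its data.

```python
import base64
import pickle

# Hypothetical nested definition: aspects contain facets, facets contain levels.
rqdef = {
    "name": "example",
    "aspects": [
        {
            "name": "aspect-1",
            "facets": [
                {"name": "facet-1", "levels": ("low", "high")},
                {"name": "facet-2", "levels": ("low", "medium", "high")},
            ],
        },
    ],
}


def to_text(obj):
    """Serialize the whole object graph into one text value."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def from_text(text):
    """Rebuild the object graph; only unpickle data you trust."""
    return pickle.loads(base64.b64decode(text))


stored = to_text(rqdef)      # one value to write to a single column
restored = from_text(stored)  # one read brings back every part
print(restored == rqdef)
```

The trade-off is the one described above: the parts cannot be queried or updated individually, only the object as a whole.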
