How to create and restore a backup from SqlAlchemy? - python

I'm writing a Pylons app, and am trying to create a simple backup system where every table is serialized and tarred up into a single file for an administrator to download, and use to restore the app should something bad happen.
I can serialize my table data just fine using the SqlAlchemy serializer, and I can deserialize it fine as well, but I can't figure out how to commit those changes back to the database.
In order to serialize my data I am doing this:
from myproject.model.meta import Session
from sqlalchemy.ext.serializer import loads, dumps
q = Session.query(MyTable)
serialized_data = dumps(q.all())
In order to test things out, I go ahead and truncate MyTable, and then attempt to restore using serialized_data:
from myproject.model import meta
restore_q = loads(serialized_data, meta.metadata, Session)
This doesn't seem to do anything... I've tried calling Session.commit() after the fact, and individually walking through all the objects in restore_q and adding them, but nothing seems to work.
What am I missing? Or is there a better way to do what I'm aiming for? I don't want to shell out and directly touch the database, since SqlAlchemy supports different database engines.

You have to use the Session.merge() method instead of Session.add() to put the deserialized objects back into the session.
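For example, the restore step could look roughly like this (a sketch based on the code in the question -- merge each deserialized object and then commit):
from myproject.model import meta
from myproject.model.meta import Session
from sqlalchemy.ext.serializer import loads

restored = loads(serialized_data, meta.metadata, Session)
for obj in restored:
    Session.merge(obj)   # merge() re-attaches each deserialized instance
Session.commit()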

Related

SQLAlchemy - cache table obj locally

I'm querying an existing read-only database with SQLAlchemy, and wonder if there is a way to cache the queried table objects locally (in an automatic way) so that I can use them later. The main reason for this is to avoid locking the database while my script is running (e.g. I have to keep a session connected while waiting for the user's response), and since the database is read-only I really don't need modified data to be synchronized back.
Right now I'm working on a solution that converts the queried results into a pd.DataFrame, but it would be nice to keep the advantage of SQLAlchemy, where the queried result retains its structure rather than being flattened into a pd.DataFrame.
I'm new to SQLAlchemy and still learning. Any suggestions about a solution, or pointers to major features I'm missing in the SQLAlchemy package, would really be appreciated!
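For reference, the DataFrame workaround described above might look roughly like this (a sketch; the engine and the mapped MyTable class are assumed, not taken from the question's code):
import pandas as pd
from sqlalchemy.orm import Session

with Session(engine) as session:
    query = session.query(MyTable)
    # Materialise the rows locally; once the DataFrame exists the session
    # (and any locks it holds) can be released.
    df = pd.read_sql(query.statement, session.bind)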

SQLAlchemy: use related object when session is closed

I have many models with relational links to each other which I have to use. My code is very complicated, so I cannot keep the session alive after a query. Instead, I try to preload all the objects:
def db_get_structure():
    with Session(my_engine) as session:
        deps = {x.id: x for x in session.query(Department).all()}
        ...
    return (deps, ...)

def some_logic(id):
    struct = db_get_structure()
    return some_other_logic(struct.deps[id].owner)
However, I get the following error anyway regardless of the fact that all the objects are already loaded:
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Department at 0x10476e780> is not bound to a Session; lazy load operation of attribute 'owner' cannot proceed
Is it possible to link the preloaded objects with each other so that the relations still work after the session gets closed?
I know about joined queries (.options(joinedload(...))), but this approach leads to more lines of code and bigger DB requests, and I think this should be solvable more simply, because all the objects are already loaded as Python objects.
It's even possible right now to reach a related object via struct.deps[struct.deps[id].owner_id], but I think the ORM should do this for me and provide the shorter notation struct.deps[id].owner using some kind of "cached load".
Whenever you access an attribute on a DB entity that has not yet been loaded from the DB, SQLAlchemy will issue an implicit SQL statement to the DB to fetch that data. My guess is that this is what happens when you issue struct.deps[struct.deps[id].owner_id].
If the object in question has been removed from the session it is in a "detached" state and SQLAlchemy protects you from accidentally running into inconsistent data. In order to work with that object again it needs to be "re-attached".
I've done this already fairly often with session.merge:
attached_object = new_session.merge(detached_object)
But this will reconcile the object instance with the DB and potentially issue updates to the DB if necessary. The detached_object is taken as the "truth".
I believe you can do the reverse (attaching it by reading from the DB instead of writing to it) by using session.refresh(detached_object), but I need to verify this. I'll update the post if I find something.
Both ways have to talk to the DB with at least a select to ensure the data is consistent.
To avoid that loading, call session.merge(..., load=False). But this has some very important caveats. Have a look at the docs of session.merge() for details.
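A minimal sketch of how that re-attachment could look, reusing the my_engine and Department names from the question (load=False assumes the rows have not changed in the DB in the meantime):
from sqlalchemy.orm import Session

def db_get_structure():
    with Session(my_engine) as session:
        deps = {x.id: x for x in session.query(Department).all()}
    return deps

def use_structure(deps, dep_id):
    with Session(my_engine) as session:
        # Re-attach the detached instance; load=False skips the reconciling
        # SELECT, which is only safe if the row is unchanged in the DB.
        dep = session.merge(deps[dep_id], load=False)
        return dep.owner   # the lazy load works again while the session is open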
I will need to read up on the link you added concerning your "complicated code". I would like to understand why you need to throw away your session the way you do. Maybe there is an easier way?

Have django ORM prepare sql, without executing

I am doing a massive data conversion for data that will end up in a django managed database. For reasons of efficiency and politics, we need to fill the destination database with manually run mass INSERTS.
I would like to have my Django ORM prepare those statements, so I can write them to a file to be run later.
So I need something like this:
fifty_thousand_foos = [...]
sql_str = Foo.objects.bulk_create_sql(fifty_thousand_foos)
with open("pre_preped.sql", 'w') as f:
    f.write(sql_str)
Then we will pass pre_preped.sql to another department and they will play it into the database.
Is there a way to do this?
Is this actually going to save us any time?
ADDED Question: Should I be creating a csv for LOADDATA instead?
(I should note that in the real world, we have more than one model and way more than 50000 objects)
I am not sure of any easy way to get the query out of bulk_create, because it executes the query when it is called, as opposed to something like filtering, where you can view the queryset's query property.
From a quick scan of the source code, it looks like you can manually build a query using the sql objects, the same way Django does in bulk_create. https://github.com/django/django/blob/master/django/db/models/query.py#L917 can provide a blueprint on how to do that.
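To sketch what that could look like, the snippet below leans on Django's internal InsertQuery / SQLInsertCompiler machinery (the same pieces bulk_create uses). These are private APIs whose signatures change between Django releases, so treat this as a rough starting point rather than a working recipe, and check it against the source linked above.
from django.db import connections
from django.db.models import sql

def bulk_insert_sql(model, objs, using="default"):
    # Build the INSERT that bulk_create would run, without executing it.
    # NOTE: relies on internal Django APIs; verify against your Django version.
    connection = connections[using]
    fields = [f for f in model._meta.concrete_fields if not f.primary_key]
    query = sql.InsertQuery(model)
    query.insert_values(fields, objs)
    compiler = query.get_compiler(connection=connection)
    return compiler.as_sql()   # list of (sql, params) pairs; params still need quoting

# e.g. statements = bulk_insert_sql(Foo, fifty_thousand_foos), then write each
# statement to pre_preped.sql with the params interpolated using your DB
# driver's quoting.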

Django model retrieves same results

I have a django model, TestModel, over an SQL database.
Whenever I do
TestModel.objects.all()
I seem to be getting the same results if I run it multiple times from the same process. I tested this by manually deleting a row (without using ANY of the django primitives) from the table the model is built on; the query still returns the same results, even though there should obviously be fewer objects after the delete.
Is there a caching mechanism of some sort and django is not going to the database every time I want to retrieve the objects?
If there is, is there a way I could still force django to go to the database on each query, preferably without writing raw SQL queries?
I should also mention that after restarting the process the model returns the correct objects again (I don't see the deleted ones anymore), but if I delete some more rows the issue occurs again.
This is because your database isolation level is REPEATABLE READ. In a django shell, all queries are enclosed in a single transaction.
Edited
You can try in your shell:
from django.db import transaction
with transaction.autocommit():
    t = TestModel.objects.all()
    ...
Sounds like a db transaction issue. If you're keeping a shell session open while you separately go into the database itself and modify data, the transaction that's open in the shell won't see the changes because of isolation. You'll need to exit and reload the shell to get a new transaction before you can see them.
Note that in production, transactions are tied to the request/response cycle so this won't be a significant issue.
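If you just want fresh data without restarting the shell, one option (a sketch; the exact behaviour depends on your Django version and the database's isolation level) is to drop the current connection so the open transaction and its snapshot are discarded:
from django.db import connection

connection.close()                       # discards the open transaction; Django reconnects lazily
fresh = list(TestModel.objects.all())    # now reads the current state of the table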

Getting text from db as django dictionary

I recently joined a company which is using django to build their product. I'm currently responsible for one of the apps, which was already developed a little bit before I was here.
One of the entities in the app has a JSON dictionary attribute, which has been kept in the db as a text field. This attribute is also declared in the model as a text field. So, as you can imagine, it's not being handled correctly.
I wanted to change this and set it as a json field using https://github.com/bradjasper/django-jsonfield , which works really well.
However, I've run into a peculiar problem. Previous data stored in the db was not handled correctly, and since it was unicode data, the text field in the db looks like:
{u'key': u'value'}
Now when the entity manager tries to load those values through the JSON field, it of course breaks, since that's not a valid JSON string.
I've done some research on how to overcome this, but haven't found anything.
My question:
Do you have any suggestions on how to overcome this? It can be any type of solution:
Something that I can run overnight, altering that field to transform it into a valid JSON string.
Some changes to the json-field code which enable it to correctly handle these values.
Additional info
We use postgres with psycopg2 as django's db backend.
Thank you very much.
You're probably just going to need to iterate over the whole table, load the field, convert it into a real Python dict, and dump it back out with json.dumps. ast.literal_eval is a good choice for the conversion step because it behaves like the built-in eval but only accepts Python literals, so it's far less risky to your system.
import ast, json

for obj in MyModel.objects.all():
    # Stored values look like "{u'key': u'value'}" (a Python repr), so parse
    # with literal_eval and re-serialise as real JSON.
    value = ast.literal_eval(obj.dict_value)
    obj.dict_value = json.dumps(value)
    obj.save()
