How to save to two tables using one SQLAlchemy model - python

I have an SQLAlchemy ORM class, linked to MySQL, which works great at saving the data I need down to the underlying table. However, I would like to also save the identical data to a second archive table.
Here's some pseudocode to try to explain what I mean:
my_data = Data()  # an ORM class
my_data.name = "foo"

# this saves just to the 'data' table
session.add(my_data)

# this should save it to the identical 'backup_data' table
my_data_archive = my_data
my_data_archive.__tablename__ = 'backup_data'
session.add(my_data_archive)

# and commit them both
session.commit()
Just a heads up, I am not interested in mapping a class to a JOIN, as in: http://www.sqlalchemy.org/docs/05/mappers.html#mapping-a-class-against-multiple-tables

I list some options below. I would go for the DB trigger if you do not need to work on those objects in your model.
use a database trigger to do this job for you
create a SessionExtension which creates the copy-objects and adds them to the session (usually in before_flush). Edit-1: you can take the versioning example from SA as a basis; that code does even more than you need. A minimal sketch of the event-based variant follows this list.
see the SA Versioning example, which will not only give you a copy of the object but the whole version history, which might be what you wish for
see the Reverse mapping from a table to a model in SQLAlchemy question, where the proposed solution is described in the linked blog post.
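Here is a minimal sketch of the copy-on-flush idea, written against the event API that superseded SessionExtension in current SQLAlchemy. Data and BackupData are assumed to be two identically-shaped models mapped to 'data' and 'backup_data':
from sqlalchemy import event
from sqlalchemy.orm import Session

@event.listens_for(Session, "before_flush")
def copy_new_data_to_archive(session, flush_context, instances):
    # for every pending Data row, stage an identical BackupData row;
    # before_flush is the one session event where adding new objects is safe
    for obj in session.new:
        if isinstance(obj, Data):
            archive = BackupData()
            for column in Data.__table__.columns:
                if not column.primary_key:
                    setattr(archive, column.name, getattr(obj, column.name))
            session.add(archive)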

Create 2 identical models: one mapped to the main table and another mapped to the archive table. Create a MapperExtension with a redefined after_insert() method (depending on your demands you might also need after_update() and after_delete()). This method should copy data from the main model to the archive and add it to the session. There are some tricks to copy all columns and many-to-many relations automagically.
Note that you'll have to flush() the session twice to store both objects, since the unit of work is computed before the mapper extension adds the new object to the session. You can redefine Session.flush() to take care of this problem. Also, auto-incremented fields are assigned when the object is flushed, so you'll have to delay the copying if you need them too.
This is one possible scenario which has been proven to work. I'd like to know if there is a better way. A sketch of the idea follows.
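A minimal sketch of that idea using the modern event API, which also sidesteps the double-flush problem by writing the archive row on the same connection inside the flush. ArchiveData is an assumed identical model mapped to the archive table:
from sqlalchemy import event

@event.listens_for(Data, "after_insert")
def insert_archive_row(mapper, connection, target):
    # run a plain INSERT on the archive table within the same flush;
    # by this point auto-incremented fields on target are already populated
    values = {c.name: getattr(target, c.name) for c in Data.__table__.columns}
    connection.execute(ArchiveData.__table__.insert().values(**values))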

Related

Create Pony ORM entity without fetching related objects?

I recently started using Pony ORM, and I think it's awesome. Even though the API is well documented on the official website, I'm having a hard time working with relationships. In particular, I want to insert a new entity which is part of a set. However, I cannot seem to find a way to create the entity without fetching the related objects first:
post = Post(
    # author = author_id,           # Pony complains about author's type: int, should be User
    author = User[author_id],       # this works, but as I understand it actually fetches the User object
    # author = User(id=author_id),  # this would try (and fail) to create a new User row on commit, right?
    # ...
)
In the end only the id value is inserted in the table, so why should I fetch the entire object when I only need the id?
EDIT
I had a quick peek at the Pony ORM source code; using the primary key of the reverse entity should work, but even in that case we end up calling _get_by_raw_pkval_, which fetches the object either from the local cache or from the database, so it's probably not possible.
It is part of the internal API and also not the way Pony assumes you use it, but you can actually use author = User._get_by_raw_pkval_((author_id,)) if you are sure that objects with those ids exist. A sketch follows.
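For reference, a hedged sketch of that workaround; _get_by_raw_pkval_ is internal Pony API, so it may change between releases, and the other Post attributes are omitted as in the question:
# resolves the User by primary key, hitting the local cache first
# and only falling back to the database when necessary
author = User._get_by_raw_pkval_((author_id,))
post = Post(
    author = author,
    # ...
)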

SQLAlchemy: add a child in many-to-many relationship by IDs

I am looking for a way to add a "Category" child to an "Object" entity without the overhead of loading the child objects first.
The "Object" and "Category" tables are linked with a many-to-many relationship, stored in the "ObjectCategory" table. The "Object" model is supplied with the relationship:
categories = relationship('Category', secondary='ObjectCategory', backref='objects')
Now this code works just fine:
obj = models.Object.query.get(9)
cat1 = models.Category.query.get(22)
cat2 = models.Category.query.get(28)
obj.categories.extend([cat1, cat2])
But in the debug output I see that instantiating obj and each category costs me a separate SELECT against the db server, in addition to the single bulk INSERT. That is totally unneeded here, because I'm not interested in manipulating the category objects; all I really need is to insert the appropriate category IDs.
The most obvious solution would be to go ahead and insert the entries in the association table directly:
db.session.add(models.ObjectCategory(oct_objID=9, oct_catID=22))
db.session.add(models.ObjectCategory(oct_objID=9, oct_catID=28))
But this approach is kind of ugly: it doesn't seem to use the power of the abstracted SQLAlchemy relationships. What's more, it produces a separate INSERT for every add(), versus the nice bulk INSERT in the obj.categories.extend([list]) case. I imagine there could be some lazy object mode that would let an object live with only its ID (unverified) and load the other fields only if they are requested. That would allow adding children in one-to-many or many-to-many relationships without issuing any SELECT to the database, while still using the powerful ORM abstraction (i.e., treating the list of children as a Python list).
How should I adjust my code to carry out this task using the power of SQLAlchemy but being conservative on the database use?
Do you have an ORM mapping for the ObjectCategory table? If so, you can create and add ObjectCategory objects directly:
session.add(ObjectCategory(obj_id=9, category_id=22))
session.add(ObjectCategory(obj_id=9, category_id=28))
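If the per-add() INSERTs bother you, a hedged variant is to bulk-insert into the association table with a single executemany-style statement. The oct_objID/oct_catID column names are taken from the question and may differ in your schema:
# one INSERT with multiple parameter sets instead of one per add()
db.session.execute(
    models.ObjectCategory.__table__.insert(),
    [
        {'oct_objID': 9, 'oct_catID': 22},
        {'oct_objID': 9, 'oct_catID': 28},
    ],
)
db.session.commit()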

Can I use dynamic relationships (or VIEWS) in SQLAlchemy ORM?

I'm looking for a way to make a relationship use only some rows of a table, instead of the whole table.
I was thinking about using a view instead of the original table as the base for the relationship, where the view contains only a filtered set of rows from the original table.
Example
I want to collect statistical information for songs (rating, play count, last time played) from a couple of players over different devices and from different users. I use Python and SQLAlchemy.
I came up with the following table layout (you can take a look at the code for the objects and the SQL for the tables):
The problem
For each Song object I can easily get all the associated Stats objects. But most of the time I only want some of them: only those Stats for a Song that relate to one or more Commits, e.g. only the data from all commits by a given user, or from one particular commit.
So I need some way to filter the stats data on the song object.
I'm not quite sure how to achieve this.
Should I use some custom query? But where do I place it, and how? Also, while this might give me the data I want (select path, rating, ... over the three joined tables, or something along those lines), I won't get objects back.
I was thinking about using a view on the stats table, containing only the lines matching the given commits. The view would have to be created dynamically though, so that different commits can be filtered, and then used as the base for a relationship from songs to stats. But I have no idea how to do it.
So: Any ideas on how to solve this problem?
Or how to solve this another way?
You could try something like this:
from sqlalchemy.orm import object_session

# defined inside your Song class
def stats_by_commit(self, commit):
    # this could also be implemented as a join
    return object_session(self).query(Stat)\
        .filter(Stat.song_id == self.song_id,
                Stat.commit_id == commit.commit_id)
Usage:
commit = db_session.query(Commit).filter_by(id=1).one()
for song in db_session.query(Song).filter_by(path='some_song_path'):
    for stat in song.stats_by_commit(commit):
        print(stat.rating)
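Another hedged option, closer to the "relationship" phrasing of the question: declare the stats relationship with lazy='dynamic', which makes it a filterable Query instead of a loaded list. The model and column names below are assumptions based on the question, and the last two lines reuse song and commit from the snippet above:
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Stat(Base):
    __tablename__ = 'stats'
    stat_id = Column(Integer, primary_key=True)
    song_id = Column(Integer, ForeignKey('songs.song_id'))
    commit_id = Column(Integer)
    rating = Column(Integer)

class Song(Base):
    __tablename__ = 'songs'
    song_id = Column(Integer, primary_key=True)
    path = Column(String)
    # lazy='dynamic' makes song.stats a Query, so it can be narrowed
    # to one commit before anything is loaded from the database
    stats = relationship('Stat', lazy='dynamic')

for stat in song.stats.filter(Stat.commit_id == commit.commit_id):
    print(stat.rating)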

Django models - assign id instead of object

I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
    user_banned = models.ForeignKey(auth.models.User, related_name="user_banned")
    user_banning = models.ForeignKey(auth.models.User, related_name="user_banning")
Now, when I try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get the following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if I replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if yes, is it possible to avoid it, just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) three times: twice to retrieve the two User objects and a third time to save your entry. This causes completely unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
This is the default naming pattern for the FK fields generated by the Django ORM. This way you can set the ids directly and avoid the extra queries.
If you want to query for already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access fields of related models in Django querysets: with a double underscore. Then you can compare against the value of the attribute.
Though the two look very similar, they work completely differently. The first sets an attribute directly, while the second is parsed by Django, which splits the name at the '__' and builds the appropriate database query, treating the part after the underscores as an attribute of the related model.
You can always compare user_banned and user_banning with actual User objects instead of their ids, but there is no point in doing so if you don't already have those objects at hand.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the db...
To avoid it, you would have to write raw SQL to do the update, using the method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route, keep in mind you are responsible for protecting yourself from SQL injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw sql will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.

Does django with mongodb make migrations a thing of the past?

Since Mongo doesn't have a schema, does that mean that we won't have to do migrations when we change the models?
What does the migration process look like with a non-relational db?
I think this is a really good question, but the answers are going to be a little scattered based on the libs you're using and your expectations for a "migration".
Let's take a look at some common migration actions:
Add a field: Mongo makes this very easy. Just add a field and you're done.
Delete a field: In theory, you're not actually tied to your schema, so "deletion" here is relative. If you remove the "property" and no longer load the field, then it doesn't really matter whether that field is in the data. So if you don't care about "cleaning up" the database, removing a field doesn't affect the database at all. If you do care about cleaning the DB, you'll basically need to run a giant for loop against it (see the sketch after this list).
Modify a field name: This is also a difficult problem. When you rename a field, "where" are you renaming it? If you want the DB to reflect the new field name, then you basically have to execute a giant for loop on the DB. To be safe, you probably have to "add" the new data, then push code, then "unset" the old field.
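In practice the "giant for loop" can usually be collapsed into server-side bulk updates. A hedged sketch with PyMongo, where the database, collection, and field names are purely illustrative:
from pymongo import MongoClient

db = MongoClient().my_database

# delete a field from every document that still carries it
db.posts.update_many({'old_field': {'$exists': True}},
                     {'$unset': {'old_field': ''}})

# rename a field everywhere (the add/push/unset dance is still safer
# while old code that reads 'old_name' is deployed)
db.posts.update_many({}, {'$rename': {'old_name': 'new_name'}})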
Some Wrinkles
However, the concept of a field name in tandem with an ActiveRecord object is just a little skewed. An ActiveRecord object effectively provides mappings of object properties to actual database fields.
In a typical RDBMS the "size" of a field name is not really relevant. In Mongo, however, the field name actually occupies data space, and this makes a big difference in terms of performance.
Now, if you're using some form of "data object" like ActiveRecord, why would you attempt to store full field names in the data? The DB could store all fields in alphabetical order with a map on the object side. So a document could have 8 fields/properties, with DB names "a", "b", ..., "h", but object names that are readable stuff like "Name", "Price", "Quantity".
The reason I bring this up is that it adds yet another wrinkle to modifying a field name. If you're implementing such a mapping, then modifying a field name doesn't really cause a migration at all.
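A hedged illustration of that mapping idea with MongoEngine, one ODM that supports it via the db_field option; the model itself is made up:
from mongoengine import Document, IntField, StringField

class Product(Document):
    # readable attribute names in Python, one-letter names on disk
    name = StringField(db_field='a')
    price = IntField(db_field='b')
    quantity = IntField(db_field='c')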
Some more Wrinkles
If you do want to implement a migration for a deletion, then you'll have to do so after a deploy. You'll also have to recognize that you won't reclaim any disk space right away when you do so.
Mongo pre-allocates space and doesn't really "give it back" unless you do a DB repair. So if you delete a bunch of fields on documents, those documents still occupy the same space on disk. If the documents are later moved, you may reclaim space; however, documents only move when they grow.
If you remove a large field from lots of documents, you'll want to do a repair or check out the new in-place compact command.
There is no silver bullet. Adding or removing fields is easier with a non-relational db (just stop using the unneeded fields, or start using new ones), renaming a field is easier with a traditional db (you'll usually have to rewrite a lot of data after a field rename in a schemaless db), and data migration is on par, depending on the task.
What does the migration process look like with a non-relational db?
That depends on whether you need to update all the existing data or not.
In many cases, you may not need to touch the old data at all, such as when adding a new optional field. If that field also has a default value, you may not need to update the old documents either, provided your application handles a missing field correctly. However, if you want to build an index on the new field to be able to search/filter/sort, you need to write the default value into the old documents.
Something like a field rename (trivial in a relational db, because you only need to update the catalog without touching any data) is a major undertaking in MongoDB (you need to rewrite all documents).
If you need to update the existing data, you usually have to write a migration function that iterates over all the documents and updates them one by one (although this process can be split up and run in parallel). For large data sets this can take a lot of time (and space), and you may be missing transactions (if you end up with a crashed migration that went halfway through). A sketch of such a function follows.
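A hedged sketch of such a migration function with PyMongo; compute_default and all collection/field names here are hypothetical:
from pymongo import MongoClient

db = MongoClient().my_database

def compute_default(doc):
    # hypothetical: derive the backfill value from the existing document
    return 0

# backfill only the documents that still lack the new field, one by one
for doc in db.songs.find({'new_field': {'$exists': False}}):
    db.songs.update_one({'_id': doc['_id']},
                        {'$set': {'new_field': compute_default(doc)}})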
