Are web2py Database Abstraction Layer (DAL) references ON CASCADE by default?

When I create a database in web2py using the DAL and, for example, define a table for the user comments on my website, I need to be able to get the user that sent a particular comment. I can do that by email.
However, emails can change over time, and the database could end up looking for a non-existent user if the email is not updated in every "child" table that uses that email as a one-to-one reference to the user.
For this reason, I would need all the foreign keys in child tables to be updated automatically. Is this feature (ON UPDATE CASCADE on foreign keys) present by default when using the DAL? If not, is it possible to tell the DAL connection to do it, e.g. by adding updateoncascade=True to the fields that need it: Field("name", type="type", notnull=True, updateoncascade=True)?

The DAL does not provide an API to specify ON UPDATE CASCADE when creating a table, so you would have to do that externally. Alternatively, you could make use of the _after_update hook to update records in any child tables.
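For illustration, here is a minimal sketch of the hook-based approach, assuming the comments table stores the author's email in an author_email column (all names here are illustrative, not from your schema). Because the old email is only readable before the parent rows change, this sketch uses the _before_update counterpart of the hook:
def cascade_email_update(user_set, fields):
    # runs before the matched db.auth_user rows are updated;
    # 'fields' holds the new column values for the UPDATE
    if 'email' in fields:
        for row in user_set.select(db.auth_user.email):
            # repoint child rows that still reference the old email
            db(db.comments.author_email == row.email).update(author_email=fields['email'])

db.auth_user._before_update.append(cascade_email_update)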
Also, consider whether you want a foreign key on the email address at all, rather than using the built-in reference field functionality, which creates a foreign key on the id field of the parent table. Because the id of a given user record never changes, you do not have to worry about cascading updates:
db.define_table('comments',
...,
Field('author', 'reference auth_user'))
Above, 'reference auth_user' sets up a foreign key to the db.auth_user.id field.
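As a usage note, the DAL can then follow the reference for you via a recursive select (comment_id here is hypothetical):
comment = db.comments(comment_id)
author_email = comment.author.email  # the DAL follows the reference into auth_user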

Related

SQLAlchemy: avoid updating a field with `onupdate` based on which other fields are being updated

I've got a table called Host, which defines two columns that are somewhat related: mac_address and manufacturer. The manufacturer field is determined using a public mapping from MAC addresses to their owners (hardware manufacturing companies).
Although manufacturer could be just a method on the ORM class, I would like to be able to filter based on its value.
The problem I have with implementing an onupdate function is that I need to keep the value of manufacturer unchanged if a mac_address field was not provided for the update; to my understanding, the previous mac_address value will not be available through get_current_parameters.
Simply put, I'm looking for a way to skip updating a field from within its onupdate method.
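For reference, a minimal sketch of the setup described above; the model shape is taken from the question, while lookup_manufacturer is a hypothetical helper. It shows why the problem arises: the onupdate callable fires on every UPDATE, and get_current_parameters() only exposes values present in that statement, so there is no previous mac_address to fall back on:
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

def manufacturer_for_mac(context):
    # only the parameters present in this INSERT/UPDATE are visible here
    mac = context.get_current_parameters().get("mac_address")
    return lookup_manufacturer(mac) if mac else None  # lookup_manufacturer is hypothetical

class Host(Base):
    __tablename__ = "host"
    id = Column(Integer, primary_key=True)
    mac_address = Column(String)
    manufacturer = Column(String, default=manufacturer_for_mac, onupdate=manufacturer_for_mac)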

How to check if a list of primary keys already exist in DB in a single query?

In the DB I have a table called register that has mail-id as its primary key. I used to insert in bulk using session.add_all(), but sometimes some records already exist; in that case I want to separate the already-existing records from the non-existing ones.
http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.merge
If all the objects you are adding to the database are complete (e.g. the new object contains at least all the information that existed for the record in the database), you can use Session.merge(). Effectively, merge() will either create or update the existing row (by finding the primary key if it exists in the session/database and copying the state across from the object you merge). The crucial thing to note is that the attribute values of the object passed to merge() will overwrite those that already existed in the database.
I think this is not so great in terms of performance, so if that is important, SQLAlchemy has some bulk operations. You would need to check existence for the set of primary keys that will be added/updated, then do one bulk insert for the objects which didn't exist and one bulk update for the ones that did. The documentation has some info on the bulk operations if this needs to be a high-performance approach.
http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operations
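As a rough sketch of that existence check, assuming a declarative Register model whose primary key column is mail_id (names per the question):
pending = [...]  # the objects you would otherwise pass to session.add_all()
candidate_ids = [obj.mail_id for obj in pending]

# one SELECT returning only the primary keys that already exist
existing_ids = {
    mail_id
    for (mail_id,) in session.query(Register.mail_id).filter(
        Register.mail_id.in_(candidate_ids))
}

to_insert = [obj for obj in pending if obj.mail_id not in existing_ids]
to_update = [obj for obj in pending if obj.mail_id in existing_ids]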
Use SQLAlchemy's inspector for this:
from sqlalchemy import inspect

inspector = inspect(engine)
inspector.get_primary_keys(table, schema)  # get_pk_constraint() in newer versions
The inspector "reflects" the primary keys, and you can check against the returned list.

How can I improve django mysql copy performance?

I have a Django app that has a model (Person) defined, and I also have some DBs (containing a table Appointment) that do not have any models defined (they are not meant to be connected to the Django app).
I need to move some data from the Appointment table over to Person, such that the Person table mirrors all the information in the Appointment table. It is this way because there are multiple independent DBs like Appointment that need to be copied into the Person table (so I do not want to make any architectural changes to how this is set up).
Here is what I do now:
res = sourcedb.fetchall()  # rows from the Appointment table
for myrecord in res:
    try:
        existingrecord = Person.objects.filter(vendorid=myrecord[12], office=myoffice)[0]
    except IndexError:  # no match found, create a new Person
        existingrecord = Person(vendorid=myrecord[12], office=myoffice)
    existingrecord.firstname = myrecord[0]
    existingrecord.midname = myrecord[1]
    existingrecord.lastname = myrecord[2]
    existingrecord.address1 = myrecord[3]
    existingrecord.address2 = myrecord[4]
    existingrecord.save()
The problem is that this is way too slow (takes about 8 minutes for 20K records). What can I do to speed this up?
I have considered the following approach:
1. bulk_create: cannot use this because I sometimes have to update.
2. delete all and then bulk_create: other things depend on the Person model, so I cannot delete records in the Person model.
3. INSERT ... ON DUPLICATE KEY UPDATE: cannot do this because the Person table's PK is different from the Appointment table's PK; the Appointment PK is copied into the Person table. If there were a way to check on two duplicate keys, I think this approach would work.
A few ideas:
EDIT: See Trewq's comment to this and create Indexes on your tables first of all…
Wrap it all in a transaction using with transaction.atomic():, as by default Django will create a new transaction per save() call, which can become very expensive. With 20K records, one giant transaction might also be a problem, so you might have to write some code to split your transactions into multiple batches. Try it out and measure!
If RAM is not an issue (it should not be with 20K records), fetch all data from the Appointment table first, then fetch all existing Person objects using a single SELECT query instead of one per record.
Use bulk_create even if some of them are updates. This will still issue UPDATE queries for your updates, but will reduce all your INSERT queries to just one or a few, which is still an improvement. You can distinguish inserts from updates by the fact that inserts won't have a primary key set before calling save(); save the inserts into a Python list for a later bulk_create instead of saving them directly.
As a last resort: write raw SQL to make use of MySQL's INSERT … ON DUPLICATE KEY UPDATE syntax. You don't need the same primary key for this; a UNIQUE key would suffice. Keys can span multiple columns, see Django's Meta.unique_together model option.
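A rough sketch combining the first three ideas (field names taken from the snippet in the question; treat it as a starting point, not a drop-in replacement):
from django.db import transaction

with transaction.atomic():
    # one SELECT for all existing Person rows instead of one per record
    existing = {p.vendorid: p for p in Person.objects.filter(office=myoffice)}
    to_create = []
    for myrecord in res:
        person = existing.get(myrecord[12]) or Person(vendorid=myrecord[12], office=myoffice)
        person.firstname = myrecord[0]
        person.midname = myrecord[1]
        person.lastname = myrecord[2]
        person.address1 = myrecord[3]
        person.address2 = myrecord[4]
        if person.pk is None:
            to_create.append(person)  # collect inserts for a single bulk_create
        else:
            person.save()  # updates still go one by one
    Person.objects.bulk_create(to_create)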

Preventing select query on ForeignKey when the primary key is known

To insert a row into a table that has a one-to-one relationship, you would do this in Django:
mypk = 2  # comes from the POST request
model = MyModel(myField="Hello", myForeignModel=ForeignModel.objects.get(pk=mypk))
model.save()
This will cause a SELECT query followed by an INSERT query.
However, the SELECT query isn't really necessary as it will be the mypk that is inserted into the foreign key field. Is there a way to get Django to just insert the primary key without doing a SELECT?
Secondly, are there concurrency issues here (in the event that the primary key would change before the user submits the request). If so, how are these dealt with?
From the docs:
Behind the scenes, Django appends "_id" to the field name to create its database column name.
Simply set myForeignModel_id to the FK value.
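With the (hypothetical) models from the question, that turns the insert into a single INSERT with no SELECT:
mypk = 2  # comes from the POST request
model = MyModel(myField="Hello", myForeignModel_id=mypk)  # assign the raw FK value
model.save()  # Django never fetches the ForeignModel row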

Load an existing many-to-many table relation with sqlalchemy

I'm using SqlAlchemy to interact with an existing PostgreSQL database.
I need to access data organized in a many-to-many relationship. The documentation describes how to create relationships, but I cannot find an example for neatly loading and querying an existing one.
Querying an existing relation is not really different from creating a new one. You pretty much write the same code, but specify the table and column names that are already there, and of course you won't need SQLAlchemy to issue the CREATE TABLE statements.
See http://www.sqlalchemy.org/docs/05/mappers.html#many-to-many . All you need to do is specify the foreign key columns for your existing parent, child, and association tables as in the example, and specify autoload=True to fill out the other fields on your Tables. If your association table stores additional information, as association tables almost always do, you should break your many-to-many relation into two many-to-one relations.
I learned SQLAlchemy while working with MySQL. With that database I always had to specify the foreign key relationships because they weren't explicit database constraints. You might get lucky and be able to reflect even more from your database, but you might prefer to use something like http://pypi.python.org/pypi/sqlautocode to just code the entire database schema and avoid the reflection delay.
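As a concrete starting point, here is a sketch using reflection. The table and column names are placeholders, it assumes the foreign keys are declared in the PostgreSQL schema so SQLAlchemy can infer the join conditions, and it uses the newer autoload_with spelling of autoload=True:
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import declarative_base, relationship, Session

engine = create_engine("postgresql://user:password@localhost/dbname")
Base = declarative_base()

# reflect the three existing tables instead of issuing CREATE TABLE
parent = Table("parent", Base.metadata, autoload_with=engine)
child = Table("child", Base.metadata, autoload_with=engine)
association = Table("parent_child", Base.metadata, autoload_with=engine)

class Parent(Base):
    __table__ = parent
    children = relationship("Child", secondary=association, backref="parents")

class Child(Base):
    __table__ = child

with Session(engine) as session:
    for p in session.query(Parent):
        print(p.id, [c.id for c in p.children])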
