Deleting objects with unique-together constraints on SQL Server

Consider the following Django ORM example:
class A(models.Model):
    pass

class B(models.Model):
    a = models.ForeignKey('A', on_delete=models.CASCADE, null=True)
    b_key = models.SomeOtherField(doesnt_really_matter=True)

    class Meta:
        unique_together = (('a', 'b_key'),)
Now let's say I delete an instance of A that's linked to an instance of B. Normally, this is no big deal: Django can delete the A object, setting B.a = NULL, then delete B after the fact. This is usually fine because most databases don't consider NULL values in unique constraints, so even if you have b1 = B(a=a1, b_key='non-unique') and b2 = B(a=a2, b_key='non-unique'), and you delete both a1 and a2, that's not a problem: (NULL, 'non-unique') != (NULL, 'non-unique') because NULL != NULL.
However, that's not the case with SQL Server: it treats NULL values as equal in unique constraints (NULL == NULL), which breaks this logic. The workaround, if you're writing raw SQL, is to define the unique constraint as a filtered index with WHERE key IS NOT NULL, but this DDL is generated for me by Django. While I could manually write the RunSQL migrations needed to drop all the original unique constraints and add the new filtered ones, this seems like a shortcoming of the ORM or the driver (I'm using pyodbc + django-pyodbc-azure). Is there some way to either coax Django into generating filtered unique constraints in the first place, force it to delete objects in a certain order to circumvent the issue altogether, or apply some other general fix on the SQL Server side?
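For reference, a hand-written migration of the kind described could look roughly like this (a sketch only: the constraint, index, and table names are assumptions; newer Django versions can also express this declaratively with UniqueConstraint(condition=...) in Meta.constraints, backend support permitting):

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0001_initial'),  # hypothetical predecessor
    ]

    operations = [
        migrations.RunSQL(
            sql=[
                # Replace Django's unfiltered unique constraint...
                "ALTER TABLE myapp_b DROP CONSTRAINT myapp_b_a_id_b_key_uniq",
                # ...with a filtered unique index that ignores NULLed-out FKs.
                "CREATE UNIQUE INDEX myapp_b_a_id_b_key_uniq "
                "ON myapp_b (a_id, b_key) WHERE a_id IS NOT NULL",
            ],
            reverse_sql=[
                "DROP INDEX myapp_b_a_id_b_key_uniq ON myapp_b",
                "ALTER TABLE myapp_b ADD CONSTRAINT myapp_b_a_id_b_key_uniq "
                "UNIQUE (a_id, b_key)",
            ],
        ),
    ]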

Related

Optimize a primary key that I won't use

I'm creating a new Django (1.7) model:
class MyModel(models.Model):
    field1 = models.ForeignKey('OtherModel')
    field2 = models.ForeignKey('AnotherModel', null=True)
    field3 = models.PositiveSmallIntegerField(db_index=True, null=True)
    other_field1 = models.FloatField(default=0, db_index=True)

    class Meta:
        unique_together = (('field1', 'field2', 'field3'), )
Ideally, I would have liked it to have the tuple (field1, field2, field3) as primary key, but that's not possible at the moment.
So instead, I have this automatically generated and incremented id column that is required by Django but totally useless for the rest of my code.
The thing is that I'd like to be able to delete and recreate instances of this model very often (almost continuously).
For performance reasons, I'd like to avoid an update_or_create approach, as deleting and recreating is much quicker from what I've tested (18 op/s with update_or_create versus 72 op/s with the "delete all and create" method, and I expect these numbers to be higher on our production server).
But I'm afraid of reaching the auto-increment upper limit too soon (in about a year, it seems).
Other possibilities I've imagined:
using a BIGINT primary key: that would work, but it would probably be less efficient, especially considering that I will never use this primary key (and BigAutoField is not available until Django 1.10)
using a UUID primary key: same story (and UUIDField is not available until Django 1.8)
using a custom CharField primary key that would be something like "{self.field1.pk}/{self.field2.pk}/{self.field3}", but I don't even know if Django can handle a primary key generated from the instance itself (the advantage, though, is that it would ensure uniqueness even with null values on field2 and field3); a sketch of this idea follows below
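A minimal sketch of that last idea, assuming a save() override is acceptable (the max_length and the use of the *_id attributes are my assumptions, not from the question):

class MyModel(models.Model):
    id = models.CharField(max_length=64, primary_key=True, editable=False)
    field1 = models.ForeignKey('OtherModel')
    field2 = models.ForeignKey('AnotherModel', null=True)
    field3 = models.PositiveSmallIntegerField(db_index=True, null=True)
    other_field1 = models.FloatField(default=0, db_index=True)

    def save(self, *args, **kwargs):
        # Derive the primary key from the instance itself; the *_id attributes
        # avoid extra queries for the related objects. None renders as "None",
        # so two rows that differ only by NULLs still collide, which is the
        # uniqueness guarantee the question is after.
        self.id = '%s/%s/%s' % (self.field1_id, self.field2_id, self.field3)
        super(MyModel, self).save(*args, **kwargs)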
What would you suggest (aside from upgrading to a newer version of Django, I know I'm late...)? Do you think I'm over-optimizing?
Stack:
Django 1.7 (can't upgrade for now)
PostgreSQL 9.6
If you are feeling adventurous, you can try turning the id column into a nullable integer column without a default.
Note: This will break parts of the Model API, the admin and most likely cause all kinds of unforeseen trouble, but depending on your use case it might just do what you want. I'm not saying this is a good idea, I'm just showing you the option.
That being said, here is the necessary migration:
migrations.RunSQL([
    "ALTER TABLE myapp_mymodel DROP CONSTRAINT myapp_mymodel_pkey",
    "ALTER TABLE myapp_mymodel ALTER COLUMN id DROP DEFAULT",
    "ALTER TABLE myapp_mymodel ALTER COLUMN id DROP NOT NULL",
]),
You can still create new objects and get existing objects, but you cannot directly save or delete objects:
>>> m = MyModel.objects.create(field1=1, field2=2, field3=3)
>>> m
<MyModel: MyModel object (None)>
>>> m.delete()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File ".../lib/python3.6/site-packages/django/db/models/base.py", line 886, in delete
(self._meta.object_name, self._meta.pk.attname)
AssertionError: MyModel object can't be deleted because its id attribute is set to None.
However, you can use .filter(...).delete() and .filter(...).update(...):
>>> MyModel.objects.filter(field1=1, field2=2, field3=3).update(field3=4)
1
>>> MyModel.objects.filter(field1=1, field2=2, field3=4).delete()
(1, {'myapp.MyModel': 1})
The MyModel I used to test this behavior has three PositiveSmallIntegerFields, not ForeignKeys, but that shouldn't make a difference:
class MyModel(models.Model):
    field1 = models.PositiveSmallIntegerField()
    field2 = models.PositiveSmallIntegerField()
    field3 = models.PositiveSmallIntegerField(db_index=True, null=True)
    other_field1 = models.FloatField(default=0, db_index=True)

    class Meta:
        unique_together = (('field1', 'field2', 'field3'), )

Django ManyToMany through with multiple databases

TL;DR: Django does not include database names in SQL queries; can I somehow force it to, or is there a workaround?
The long version:
I have two legacy MySQL databases (note: I have no influence on the DB layout) for which I'm creating a read-only API using DRF on Django 1.11 and Python 3.6.
I'm working around the referential integrity limitation of MyISAM DBs by using the SpanningForeignKey field suggested here: https://stackoverflow.com/a/32078727/7933618
I'm trying to connect a table from DB1 to a table from DB2 via a ManyToMany through table on DB1. That's the query Django is creating:
SELECT "table_b"."id" FROM "table_b" INNER JOIN "throughtable" ON ("table_b"."id" = "throughtable"."b_id") WHERE "throughtable"."b_id" = 12345
This, of course, gives me the error "Table 'DB2.throughtable' doesn't exist", because throughtable is on DB1, and I have no idea how to force Django to prefix the tables with the database name. The query should be:
SELECT table_b.id FROM DB2.table_b INNER JOIN DB1.throughtable ON (table_b.id = throughtable.b_id) WHERE throughtable.b_id = 12345
Models for app1 db1_app/models.py: (DB1)
class TableA(models.Model):
    id = models.AutoField(primary_key=True)
    # some other fields
    relations = models.ManyToManyField(TableB, through='Throughtable')

class Throughtable(models.Model):
    id = models.AutoField(primary_key=True)
    a_id = models.ForeignKey(TableA, to_field='id')
    b_id = SpanningForeignKey(TableB, db_constraint=False, to_field='id')
Models for app2 db2_app/models.py: (DB2)
class TableB(models.Model):
id = models.AutoField(primary_key=True)
# some other fields
Database router:
def db_for_read(self, model, **hints):
    if model._meta.app_label == 'db1_app':
        return 'DB1'
    if model._meta.app_label == 'db2_app':
        return 'DB2'
    return None
Can I force Django to include the database name in the query? Or is there any workaround for this?
A solution exists for Django 1.6+ (including 1.11) with the MySQL and SQLite backends: set ForeignKey.db_constraint=False and use an explicit Meta.db_table. If the database name and table name are quoted with ` (for MySQL) or " (for other databases), e.g. db_table = '"db2"."table2"', then Django does not quote the name again and the dot falls outside the quoted parts, so the ORM compiles valid queries. A similar but better variant is db_table = 'db2"."table2', which not only allows joins but is also one issue closer to cross-database constraint migrations.
db2_name = settings.DATABASES['db2']['NAME']

class Table1(models.Model):
    fk = models.ForeignKey('Table2', on_delete=models.DO_NOTHING, db_constraint=False)

class Table2(models.Model):
    name = models.CharField(max_length=10)
    ....

    class Meta:
        db_table = '`%s`.`table2`' % db2_name  # for MySQL
        # db_table = '"db2"."table2"'         # for all other backends
        managed = False
Query set:
>>> qs = Table2.objects.all()
>>> str(qs.query)
'SELECT "db2"."table2"."id" FROM "db2"."table2"'
>>> qs = Table1.objects.filter(fk__name='B')
>>> str(qs.query)
SELECT "app_table1"."id"
FROM "app_table1"
INNER JOIN "db2"."app_table2" ON ("app_table1"."fk_id" = "db2"."app_table2"."id")
WHERE "db2"."app_table2"."name" = B
This query compilation is supported by all database backends in Django; the other necessary steps, however, must be discussed backend by backend. I'm trying to answer more generally, because I found a similar important question.
The db_constraint=False option is necessary for migrations, because Django cannot create the referential integrity constraint
ADD FOREIGN KEY (fk_id) REFERENCES db2.table2 (id)
but it can be created manually, at least for MySQL.
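A sketch of how that manual step could look, wrapped in a RunSQL migration (the constraint name and app prefix are assumptions):

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('app', '0001_initial'),  # hypothetical predecessor
    ]

    operations = [
        migrations.RunSQL(
            # MySQL DDL accepts database-qualified table names, so the
            # cross-database foreign key can be added by hand.
            sql="ALTER TABLE app_table1 ADD CONSTRAINT fk_table1_table2 "
                "FOREIGN KEY (fk_id) REFERENCES db2.table2 (id)",
            reverse_sql="ALTER TABLE app_table1 DROP FOREIGN KEY fk_table1_table2",
        ),
    ]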
A question for each particular backend is whether another database can be attached to the default one at run time, and whether cross-database foreign keys are supported. These models are also writable. The indirectly connected database should be treated as a legacy database with managed=False, because the django_migrations table used for migration tracking is created only in the directly connected database and should describe only tables in that same database. Indexes for foreign keys can, however, be created automatically on the managed side if the database system supports such indexes.
SQLite3: the second database has to be attached to the default one at run time (see the answer to SQLite - How do you join tables from different databases?), ideally via the connection_created signal:

from django.db.backends.signals import connection_created

def signal_handler(sender, connection, **kwargs):
    if connection.alias == 'default' and connection.vendor == 'sqlite':
        cur = connection.cursor()
        cur.execute("attach '%s' as db2" % db2_name)
        # cur.execute("PRAGMA foreign_keys = ON")  # optional

connection_created.connect(signal_handler)
No database router is needed then, and a normal Django ForeignKey can be used with db_constraint=False. An advantage is that db_table is not necessary if the table names are unique across the databases.
In MySQL, foreign keys between different databases are easy: commands like SELECT, INSERT and DELETE accept qualified database names without any prior attaching.
This question was about legacy databases. I have however some interesting results also with migrations.
I have a similar setup with PostgreSQL, utilizing search_path to make cross-schema references possible in Django (a schema in PostgreSQL is the analogue of a database in MySQL). Unfortunately, MySQL doesn't seem to have such a mechanism.
However, you might try your luck with views: create views in one database that reference the other databases, and select from those. I think it's the best option, since you want your data read-only anyway.
It's not a perfect solution, however; executing raw queries might be more useful in some cases.
UPD: Providing more details about my setup with PostgreSQL (as requested by the bounty). I couldn't find anything like search_path in the MySQL documentation.
Quick intro
PostgreSQL has schemas, which are the analogue of MySQL databases. So if you are a MySQL user, mentally replace the word "schema" with the word "database". Queries can join tables across schemas, create foreign keys between them, etc. Each user (role) has a search_path:
This variable [search_path] specifies the order in which schemas are searched when
an object (table, data type, function, etc.) is referenced by a simple
name with no schema specified.
Pay special attention to "no schema specified", because that's exactly what Django does.
Example: Legacy databases
Let's say we've got a couple of legacy schemas, and since we are not allowed to modify them, we want one new schema to store the N:M relation in.
old1 is the first legacy schema, it has old1_table (which is also the model name, for convenience's sake)
old2 is the second legacy schema, it has old2_table
django_schema is a new one, it will store the required N:M relation
All we need to do is:
alter role django_user set search_path = django_schema, old1, old2;
That's it. Yes, that simple. Django has no schema ("database") names specified anywhere; it actually has no idea what is going on, everything is managed by PostgreSQL behind the scenes. Since django_schema is first in the list, new tables will be created there. So the following code ->
class Throughtable(models.Model):
    a_id = models.ForeignKey('old1_table', ...)
    b_id = models.ForeignKey('old2_table', ...)
-> will result in a migration that creates a throughtable table referencing old1_table and old2_table.
Problems: if you happen to have several tables with the same name, you will either need to rename them or still trick Django into using a dot inside the table names.
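If altering the role is not an option, the same search_path can also be set per connection from settings.py via psycopg2's options parameter (a sketch; the credentials are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'USER': 'django_user',
        'PASSWORD': 'secret',
        'OPTIONS': {
            # Passed through to libpq: sets search_path for every connection.
            'options': '-c search_path=django_schema,old1,old2',
        },
    },
}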
Django does have the ability to work with multiple databases. See https://docs.djangoproject.com/en/1.11/topics/db/multi-db/.
You can also use raw SQL queries in Django. See https://docs.djangoproject.com/en/1.11/topics/db/sql/.
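For the raw-SQL route, a minimal sketch along the lines of the corrected query from the question (untested; the parameter value and printed field are illustrative):

# Run the hand-written cross-database join on the DB1 connection.
rows = TableB.objects.raw(
    "SELECT table_b.id FROM DB2.table_b "
    "INNER JOIN DB1.throughtable ON table_b.id = throughtable.b_id "
    "WHERE throughtable.b_id = %s",
    [12345],
).using('DB1')
for b in rows:
    print(b.id)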

Django Polymorphic

I want to implement models using inheritance and I've found the package django-polymorphic. But reading about inheritance in Django models, almost every page I found recommends using abstract = True in the parent model, which duplicates the fields in the subclasses' tables and so makes queries faster.
I've done some testing and found out that this library doesn't use the abstract attribute:
class Parent(PolymorphicModel):
    parent_field = models.TextField()

class Child(Parent):
    child_field = models.TextField()
This results in:
Parent table:
| app_parent | CREATE TABLE `app_parent` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `parent_field` longtext NOT NULL,
    `polymorphic_ctype_id` int(11),
    PRIMARY KEY (`id`),
    KEY `app_polymorphic_ctype_id_a7b8d4c7_fk_django_content_type_id` (`polymorphic_ctype_id`),
    CONSTRAINT `app_polymorphic_ctype_id_a7b8d4c7_fk_django_content_type_id` FOREIGN KEY (`polymorphic_ctype_id`) REFERENCES `django_content_type` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Child table:
| app_child | CREATE TABLE `app_child` (
    `parent_ptr_id` int(11) NOT NULL,
    `child_field` varchar(20) NOT NULL,
    PRIMARY KEY (`parent_ptr_id`),
    CONSTRAINT `no_parent_ptr_id_079ccc0e_fk_app_parent_id` FOREIGN KEY (`parent_ptr_id`) REFERENCES `app_parent` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Should I roll my own classes using abstract = True, or should I stick with this?
Do you need to be able to query the parent table?
Parent.objects.all()
If yes, then you most probably need multi-table inheritance, i.e. abstract=False.
With abstract=False you get a more complicated database schema, with more relations. Creating a child instance takes 2 inserts instead of 1 (parent and child table), and querying child data requires a table join. So this method certainly has its shortcomings; but when you want to query common columns across children, it's the best supported way in Django.
django-polymorphic builds on top of standard Django model inheritance by adding an extra column, polymorphic_ctype, which makes it possible to identify the subclass while holding only a parent object.
There are various ways to achieve similar results with abstract=True, but they often lead to more complicated querying code.
If you use abstract=True, below are two examples of how you can query the common data of all children.
Chaining multiple queries
from itertools import chain

def query_all_childs(**kwargs):
    # Returns an iterator over both child querysets.
    return chain(
        Child1.objects.filter(**kwargs),
        Child2.objects.filter(**kwargs),
    )
Using database views
Manually create a database view that combines several tables (this could be done by attaching SQL code to the post_migrate signal; a sketch of that wiring follows the SQL below):
CREATE VIEW myapp_commonchild AS
    SELECT 'child1' AS type, a, b FROM child1
    UNION ALL
    SELECT 'child2' AS type, a, b FROM child2;
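A sketch of the post_migrate wiring mentioned above (the app label and AppConfig name are assumptions; CREATE OR REPLACE VIEW works on MySQL and PostgreSQL):

from django.apps import AppConfig
from django.db import connection
from django.db.models.signals import post_migrate

def create_commonchild_view(sender, **kwargs):
    # Recreate the combining view after every migration run.
    with connection.cursor() as cursor:
        cursor.execute(
            "CREATE OR REPLACE VIEW myapp_commonchild AS "
            "SELECT 'child1' AS type, a, b FROM child1 "
            "UNION ALL "
            "SELECT 'child2' AS type, a, b FROM child2"
        )

class MyappConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        post_migrate.connect(create_commonchild_view, sender=self)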
Then create a concrete model for the view, with managed=False. This flag tells Django to ignore the table in database migrations (because we have manually created a database view for it):
class Parent(models.Model):
    a = models.CharField(max_length=100)
    b = models.CharField(max_length=100)

    class Meta:
        abstract = True

class CommonChild(Parent):
    type = models.CharField(max_length=10)

    class Meta:
        managed = False

class Child1(Parent):
    pass

class Child2(Parent):
    pass
Now you can query CommonChild.objects.all() and access common fields of child classes.
Speaking of performance, I don't know how big your tables are or how heavy reads/writes are, but most probably using abstract=False will not impact your performance in a noticeable way.

how to prevent django database from using deleted objects' id for new objects?

I'm writing a Django website backed by an SQLite database, and I have a model like this:
class Foo(models.Model):
    name = models.CharField(max_length=30)
The database creates a unique id column for this class. Now consider that I create an object of class Foo, delete it, and create another object g: their ids are the same:
>>> f = Foo.objects.create(name="fooname")
>>> f.id
4
>>> f.delete()
>>> g = Foo.objects.create(name="fooname2")
>>> g.id
4
But I don't want the database to reuse a deleted object's id for a newly created one. How can I prevent that?
This is a known bug that was fixed in Django 1.7, see:
sqlite3 backend: AutoField values aren't monotonically increasing.
Quote from release notes:
AutoField columns in SQLite databases will now be created using the
AUTOINCREMENT option, which guarantees monotonic increments. This will
cause primary key numbering behavior to change on SQLite, becoming
consistent with most other SQL databases. This will only apply to
newly created tables. If you have a database created with an older
version of Django, you will need to migrate it to take advantage of
this feature.
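The difference is easy to reproduce with the sqlite3 module alone, outside Django (a self-contained demo; the table names are made up):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Without AUTOINCREMENT, SQLite may reuse the highest rowid after a delete.
cur.execute("CREATE TABLE plain (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO plain (name) VALUES ('fooname')")
cur.execute("DELETE FROM plain WHERE id = 1")
cur.execute("INSERT INTO plain (name) VALUES ('fooname2')")
print(cur.execute("SELECT id FROM plain").fetchall())  # [(1,)] -- id reused

# With AUTOINCREMENT, ids are monotonically increasing (tracked in the
# internal sqlite_sequence table), matching Django 1.7+ behavior.
cur.execute("CREATE TABLE mono (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
cur.execute("INSERT INTO mono (name) VALUES ('fooname')")
cur.execute("DELETE FROM mono WHERE id = 1")
cur.execute("INSERT INTO mono (name) VALUES ('fooname2')")
print(cur.execute("SELECT id FROM mono").fetchall())  # [(2,)] -- id not reused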

SQLAlchemy declarative concrete autoloaded table inheritance

I have an existing database that I want to access using SQLAlchemy. Because its structure is managed by another piece of code (the Django ORM, actually) and I don't want to repeat myself by describing every table structure, I'm using autoload introspection. I'm stuck on a simple case of concrete table inheritance.
Payment                        FooPayment
+ id (PK)   <------FK------  + payment_ptr_id (PK)
+ user_id                    + foo
+ amount
+ date
Here is the code, with the tables' SQL definitions as docstrings:
class Payment(Base):
    """
    CREATE TABLE payments(
        id serial NOT NULL,
        user_id integer NOT NULL,
        amount numeric(11,2) NOT NULL,
        date timestamp with time zone NOT NULL,
        CONSTRAINT payment_pkey PRIMARY KEY (id),
        CONSTRAINT payment_user_id_fkey FOREIGN KEY (user_id)
            REFERENCES users (id) MATCH SIMPLE)
    """
    __tablename__ = 'payments'
    __table_args__ = {'autoload': True}
    # user = relation(User)

class FooPayment(Payment):
    """
    CREATE TABLE payments_foo(
        payment_ptr_id integer NOT NULL,
        foo integer NOT NULL,
        CONSTRAINT payments_foo_pkey PRIMARY KEY (payment_ptr_id),
        CONSTRAINT payments_foo_payment_ptr_id_fkey
            FOREIGN KEY (payment_ptr_id)
            REFERENCES payments (id) MATCH SIMPLE)
    """
    __tablename__ = 'payments_foo'
    __table_args__ = {'autoload': True}
    __mapper_args__ = {'concrete': True}
The actual tables have additional columns, but this is completely irrelevant to the question, so in attempt to minimize the code I've simplified everything just to the core.
The problem is, when I run this:
payment = session.query(FooPayment).filter(Payment.amount >= 200.0).first()
print payment.date
The resulting SQL is meaningless (note the lack of a join condition):
SELECT payments_foo.payment_ptr_id AS payments_foo_payment_ptr_id,
... /* More `payments_foo' columns and NO columns from `payments' */
FROM payments_foo, payments
WHERE payments.amount >= 200.0 LIMIT 1 OFFSET 0
And when I'm trying to access payment.date I get the following error: Concrete Mapper|FooPayment|payments_foo does not implement attribute u'date' at the instance level.
I've tried adding an implicit foreign key reference id = Column('payment_ptr_id', Integer, ForeignKey('payments_payment.id'), primary_key=True) to FooPayment, without any success. Trying print session.query(Payment).first().user works perfectly (I've omitted the User class and commented out that line), so FK introspection works.
How can I perform a simple query on FooPayment and access Payment's values from resulting instance?
I'm using SQLAlchemy 0.5.3, PostgreSQL 8.3, psycopg2 and Python 2.5.2.
Thanks for any suggestions.
Your table structures are similar to what is used in joined table inheritance, but they certainly don't correspond to concrete table inheritance, where all fields of the parent class are duplicated in the subclass's table. Right now you have a subclass with fewer fields than the parent and a reference to an instance of the parent class. Switch to joined table inheritance (and use FooPayment.amount in your condition), or give up on inheritance in favor of simple aggregation (a reference).
Filtering by a field of another model doesn't automatically add a join condition. Although it's obvious what condition should be used in the join for your example, it's not possible to determine such a condition in the general case. That's why you have to define a relation property referring to Payment and use its has() method in the filter to get a proper join condition.
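A minimal sketch of the joined-table-inheritance variant suggested above, still using autoload (untested against SQLAlchemy 0.5.3; the connection URL is a placeholder, the rest follows the question's names):

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgres://user:password@localhost/mydb')  # placeholder
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()

class Payment(Base):
    __tablename__ = 'payments'
    __table_args__ = {'autoload': True}

class FooPayment(Payment):
    # No 'concrete': True here -- the introspected foreign key from
    # payment_ptr_id to payments.id lets SQLAlchemy derive the join condition.
    __tablename__ = 'payments_foo'
    __table_args__ = {'autoload': True}

payment = session.query(FooPayment).filter(FooPayment.amount >= 200.0).first()
print payment.date  # inherited columns are now accessible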