SQLAlchemy unique constraint by field - python

I have a UniqueConstraint on a field, but it won't allow me to add multiple entries (two is the maximum: one True and one False!):
from sqlalchemy import Column, Integer, String, Boolean, UniqueConstraint

class Cart(SqlAlchemyBase):
    __tablename__ = 'cart'
    __table_args__ = (UniqueConstraint('is_latest'), {})

    sid = Column(Integer, primary_key=True)
    is_latest = Column(Boolean, index=True, nullable=False)
    name = Column(String)
I would like to support more entries, so that one name can have two variants:
name=foo, is_latest=True
name=foo, is_latest=False
name=bar, is_latest=True
name=bar, is_latest=False
but then reject any subsequent attempt to write name=foo (or bar) and is_latest=True

What you are trying to achieve here is a type 2 slowly changing dimension. This topic has been discussed extensively, and I encourage you to look it up.
When I look at your table you seem to use sid as a surrogate key, but I fail to see what the natural key is and what will be updated as time goes on.
Anyway, there are several ways to achieve an SCD type 2 result without needing to worry about your check, but the simplest in my mind is to keep adding records with your natural key and, when querying, select only the one with the highest surrogate key (an autoincrementing integer). There is no need for a uniqueness check on the current row here, as only the latest value is fetched.
There are examples for versioning rows in the SQLAlchemy docs, but since websites come and go, I'll put a simplified draft of the above approach here.
from sqlalchemy import event, Column, Integer, String
from sqlalchemy.orm import Session, attributes, make_transient

class VersionedItem(Versioned, Base):
    id = Column(Integer, primary_key=True)  # surrogate key
    sku = Column(String, index=True)        # natural key
    price = Column(Integer)                 # the value that changes with time

@event.listens_for(Session, "before_flush")
def before_flush(session, flush_context, instances):
    for instance in session.dirty:
        if not (
            isinstance(instance, VersionedItem)
            and session.is_modified(instance)
            and attributes.instance_state(instance).has_identity
        ):
            continue
        make_transient(instance)  # remove db identity from instance
        instance.id = None        # remove surrogate key
        session.add(instance)     # insert instance as new record
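For completeness (my addition, not part of the original answer), the "fetch only the latest version" query described above could look like this for a single natural key; "ABC-123" is a made-up sku:

latest = (
    session.query(VersionedItem)
    .filter_by(sku="ABC-123")           # natural key to look up
    .order_by(VersionedItem.id.desc())  # highest surrogate key first
    .first()                            # only the latest version
)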

Looks like a Partial Unique Index can be used:
from sqlalchemy import Column, Integer, Boolean, String, Index

class Cart(SqlAlchemyBase):
    __tablename__ = 'cart'

    id = Column(Integer, primary_key=True)
    cart_id = Column(Integer)
    is_latest = Column(Boolean, default=False)
    name = Column(String)

    __table_args__ = (
        Index('only_one_latest_cart', name, is_latest,
              unique=True,
              postgresql_where=(is_latest)),
    )
The table can now hold rows like these (multiple is_latest = False rows are fine, because the index only covers rows where is_latest is true):
name=foo, is_latest = True
name=foo, is_latest = False
name=bar, is_latest = False
name=bar, is_latest = False
And when adding another name=foo, is_latest = True, the insert is rejected:
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "only_one_latest_cart"
DETAIL: Key (name, is_latest)=(foo, t) already exists.
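As a usage note (my addition, assuming the Cart model above): the usual write path under this scheme demotes the current latest row and inserts the new one in the same transaction, so the partial index is never violated:

# Demote the old latest row for this name, then insert the new latest one.
session.query(Cart).filter_by(name='foo', is_latest=True) \
       .update({Cart.is_latest: False})
session.add(Cart(name='foo', is_latest=True))
session.commit()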

Related

Instance <xxx> has been deleted after commit(), but when and why?

Here's a really simple piece of code. After adding the "poll" instance to the DB and committing, I cannot later read it. SQLAlchemy fails with the error:
Instance '<PollStat at 0x7f9372ea72b0>' has been deleted, or its row is otherwise not present.
Weirdly, this does not happen if I replace the ts_start/ts_run primary key with an integer autoincrement one. Is it possible that DateTime columns are not suitable as primary keys?
db = Session()
poll = models.PollStat(
    ts_start=datetime.datetime.now(),
    ts_run=ts_run,
    polled_tools=0)
db.add(poll)
db.commit()  # I want to commit here in case something fails later
print(poll.polled_tools)  # this fails
PollStat in module models.py:
class PollStat(Base):
    __tablename__ = 'poll_stat'
    ts_run = Column(Integer, primary_key=True)
    ts_start = Column(DateTime, primary_key=True)
    elapsed_ms = Column(Integer, default=None)
    polled_tools = Column(Integer, default=0)
But if I do this:
class PollStat(Base):
    __tablename__ = 'poll_stat'
    id = Column(Integer, primary_key=True)
    ts_run = Column(Integer)
    ts_start = Column(DateTime)
    elapsed_ms = Column(Integer, default=None)
    polled_tools = Column(Integer, default=0)
it works. Why?
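One plausible explanation (not confirmed in this thread): by default commit() expires all loaded attributes, so print(poll.polled_tools) triggers a SELECT by primary key to refresh the row. If the database rounds or truncates the DateTime (e.g. to whole seconds), that refresh no longer matches the microsecond-precision value held in Python, and SQLAlchemy concludes the row was deleted. A sketch to test that hypothesis, reusing the question's code:

# Keep attributes loaded after commit so no refresh-by-primary-key is needed;
# expire_on_commit is a standard Session/sessionmaker option.
db = Session(expire_on_commit=False)
poll = models.PollStat(
    ts_start=datetime.datetime.now().replace(microsecond=0),  # also try truncating precision
    ts_run=ts_run,
    polled_tools=0)
db.add(poll)
db.commit()
print(poll.polled_tools)  # no refresh needed, so no ObjectDeletedError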
For anyone who still has this problem: this error happened to me because I submitted a JSON object with an id of 0. I use the same form for adding and editing the object, so when editing it would normally have an ID number, but when creating the item the id property needs to be deleted before inserting. Some databases don't accept an ID of 0; the row is created, but the ID of 0 no longer matches the actual ID, which is why the error pops up.

SQLAlchemy: list of objects, preserve reference if they're deleted

I'm trying to implement a user-facing PreviewList of Articles, which will keep its size even if an Article is deleted. So if the list has four objects [1, 2, 3, 4] and one is deleted, I want it to contain [1, 2, None, 4].
I'm using a relationship with a secondary table. Currently, deleting either Article or PreviewList will delete the row in that table. I've experimented with cascade options, but they seem to affect the related items directly, not the contents of the secondary table.
The snippet below tests for the desired behaviour: deleting an Article should preserve the row in ArticlePreviewListAssociation, but deleting a PreviewList should delete it (and not the Article).
In the code below, deleting the Article will preserve the ArticlePreviewListAssociation, but pl.articles does not treat that as a list entry.
from db import DbSession, Base, init_db
from sqlalchemy import Column, String, Integer, ForeignKey
from sqlalchemy.orm import relationship

session = DbSession()

class Article(Base):
    __tablename__ = 'articles'
    id = Column(Integer, primary_key=True)
    title = Column(String)

class PreviewList(Base):
    __tablename__ = 'preview_lists'
    id = Column(Integer, primary_key=True)
    articles = relationship('Article', secondary='associations')

class ArticlePreviewListAssociation(Base):
    __tablename__ = 'associations'
    article_id = Column(Integer, ForeignKey('articles.id'), nullable=True)
    previewlist_id = Column(Integer, ForeignKey('preview_lists.id'), primary_key=True)
    article = relationship('Article')
    preview_list = relationship('PreviewList')

init_db()

print(f"Creating test data")
a = Article(title="StackOverflow: 'Foo' not setting 'Bar'?")
pl = PreviewList(articles=[a])
session.add(a)
session.add(pl)
session.commit()
print(f"ArticlePreviewListAssociations: {session.query(ArticlePreviewListAssociation).all()}")

print(f"Deleting PreviewList")
session.delete(pl)
associations = session.query(ArticlePreviewListAssociation).all()
print(f"ArticlePreviewListAssociations: should be empty: {associations}")
if len(associations) > 0:
    print("FAIL")

print("Reverting transaction")
session.rollback()

print("Deleting article")
session.delete(a)
articles_in_list = pl.articles
associations = session.query(ArticlePreviewListAssociation).all()
print(f"ArticlePreviewListAssociations: should not be empty: {associations}")
if len(associations) == 0:
    print("FAIL")
print(f"Articles in PreviewList: should not be empty: {articles_in_list}")
if len(articles_in_list) == 0:
    print("FAIL")
# desired outcome: pl.articles should be [None], not []

print("Reverting transaction")
session.rollback()
This may come down to "How can you make a many-to-many relationship where pk_A == 1 and pk_B == NULL include the None in A's list?"
The given examples would seem to assume that the order of related articles is preserved, even upon deletion. There are multiple approaches to that, for example the Ordering List extension, but it is easier to first solve the problem of preserving associations to deleted articles. This seems like a use case for an association object and proxy.
The Article class gets a new relationship so that deletions cascade in a session. The default ORM-level cascading behavior is to set the foreign key to NULL, but if the related association object is not loaded, we want to let the DB do it, so passive_deletes=True is used:
class Article(Base):
    __tablename__ = 'articles'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    previewlist_associations = relationship(
        'ArticlePreviewListAssociation', back_populates='article',
        passive_deletes=True)
Instead of a many-to-many relationship, PreviewList uses the association object pattern, along with an association proxy that replaces the many-to-many relationship. This time the cascades are a bit different, since the association object should be deleted if the parent PreviewList is deleted:
from sqlalchemy.ext.associationproxy import association_proxy

class PreviewList(Base):
    __tablename__ = 'preview_lists'
    id = Column(Integer, primary_key=True)
    article_associations = relationship(
        'ArticlePreviewListAssociation', back_populates='preview_list',
        cascade='all, delete-orphan', passive_deletes=True)
    articles = association_proxy(
        'article_associations', 'article',
        creator=lambda a: ArticlePreviewListAssociation(article=a))
Originally the association object used previewlist_id as the primary key, but then a PreviewList could contain only a single Article. A surrogate key solves that. The foreign key configurations include the DB-level cascades; these are the reason for using passive deletes:
class ArticlePreviewListAssociation(Base):
    __tablename__ = 'associations'
    id = Column(Integer, primary_key=True)
    article_id = Column(
        Integer, ForeignKey('articles.id', ondelete='SET NULL'))
    previewlist_id = Column(
        Integer, ForeignKey('preview_lists.id', ondelete='CASCADE'),
        nullable=False)
    # Using a unique constraint on a nullable column is a bit ugly, but
    # at least this prevents inserting an Article multiple times to a
    # PreviewList.
    __table_args__ = (UniqueConstraint(article_id, previewlist_id), )
    article = relationship(
        'Article', back_populates='previewlist_associations')
    preview_list = relationship(
        'PreviewList', back_populates='article_associations')
With these changes in place no "FAIL" is printed.
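As a quick sanity check (my addition; it assumes the models above and a backend that actually enforces ON DELETE SET NULL, e.g. PostgreSQL, or SQLite with foreign keys enabled):

session.delete(a)
session.commit()    # the DB sets associations.article_id to NULL;
                    # commit also expires the session, so the next access reloads
print(pl.articles)  # expected: [None]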

Item and Category database relationship

So I would like to have users add an item and an arbitrary category. Right now I use if statements to make sure that if the category has already been created, it is not added again. Is there a better way to use SQLAlchemy relationships so that I could skip some of the logic I had to write to ensure that the categories are unique?
Here are the models I used:
class Category(Base):
    __tablename__ = 'category'
    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    description = Column(String)
    category_id = Column(Integer, ForeignKey('category.id'))
    category = relationship(Category)
    date_created = Column(DateTime)
    date_updated = Column(DateTime)
    user_id = Column(Integer, ForeignKey('user.id'))
    user = relationship(User)
Here is an example of how I would edit an item:
if new_category_name != category.name:
    if db_session.query(Category).\
            filter_by(name=new_category_name).count() == 0:
        new_category = Category(name=new_category_name)
    else:
        new_category = db_session.query(Category)\
            .filter_by(name=new_category_name).one()
    is_last_of_category = db_session.query(Item)\
        .filter_by(category_id=item.category_id).count() == 1
    if is_last_of_category:
        db_session.delete(category)
    item.category = new_category
    db_session.commit()
I'm happy to hear any other suggestions you're willing to make.
Use a unique constraint.
Quoting from SQLAlchemy's docs:
unique – When True, indicates that this column contains a unique constraint, or if index is True as well, indicates that the Index should be created with the unique flag. To specify multiple columns in the constraint/index or to specify an explicit name, use the UniqueConstraint or Index constructs explicitly.
Example from the SQLAlchemy documentation:
from sqlalchemy import MetaData, Table, Column, Integer, UniqueConstraint

meta = MetaData()
mytable = Table('mytable', meta,
    # per-column anonymous unique constraint
    Column('col1', Integer, unique=True),
    Column('col2', Integer),
    Column('col3', Integer),
    # explicit/composite unique constraint. 'name' is optional.
    UniqueConstraint('col2', 'col3', name='uix_1')
)
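Applied to the asker's Category model, the per-column form would look like this (a sketch; only the name column changes):

class Category(Base):
    __tablename__ = 'category'
    id = Column(Integer, primary_key=True)
    # duplicate names are now rejected by the database itself
    name = Column(String(250), nullable=False, unique=True)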

How to avoid inserting duplicate entries when adding values via a sqlalchemy relationship?

Let's assume we have two tables in a many-to-many relationship, as shown below:
class User(db.Model):
    __tablename__ = 'user'
    uid = db.Column(db.String(80), primary_key=True)
    languages = db.relationship('Language', lazy='dynamic',
                                secondary='user_language')

class UserLanguage(db.Model):
    __tablename__ = 'user_language'
    __table_args__ = (db.UniqueConstraint('uid', 'lid', name='user_language_ff'),)
    id = db.Column(db.Integer, primary_key=True)
    uid = db.Column(db.String(80), db.ForeignKey('user.uid'))
    lid = db.Column(db.String(80), db.ForeignKey('language.lid'))

class Language(db.Model):
    __tablename__ = 'language'
    lid = db.Column(db.String(80), primary_key=True)
    language_name = db.Column(db.String(30))
Now in the python shell:
In [4]: user = User.query.all()[0]
In [11]: user.languages = [Language('1', 'English')]
In [12]: db.session.commit()
In [13]: user2 = User.query.all()[1]
In [14]: user2.languages = [Language('1', 'English')]
In [15]: db.session.commit()
IntegrityError: (IntegrityError) column lid is not unique u'INSERT INTO language (lid, language_name) VALUES (?, ?)' ('1', 'English')
How can I let the relationship know that it should ignore duplicates and not break the unique constraint for the Language table? Of course, I could insert each language separately and check if the entry already exists in the table beforehand, but then much of the benefit offered by sqlalchemy relationships is gone.
The SQLAlchemy wiki has a collection of examples, one of which shows how you might check the uniqueness of instances.
The examples are a bit convoluted, though. Basically, create a classmethod get_unique as an alternate constructor, which will first check a session cache, then query for existing instances, and finally create a new instance. Then call Language.get_unique(id, name) instead of Language(id, name).
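A minimal sketch of that pattern (my condensation of the wiki recipe, not its exact code), added to the Language model from the question:

class Language(db.Model):
    lid = db.Column(db.String(80), primary_key=True)
    language_name = db.Column(db.String(30))

    @classmethod
    def get_unique(cls, session, lid, language_name):
        # 1. check a per-session cache first
        cache = session.info.setdefault('_unique_cache', {})
        key = (cls.__name__, lid)
        obj = cache.get(key)
        if obj is None:
            # 2. then look for an existing row
            obj = session.query(cls).filter_by(lid=lid).first()
            if obj is None:
                # 3. finally create (and register) a new instance
                obj = cls(lid=lid, language_name=language_name)
                session.add(obj)
            cache[key] = obj
        return obj

# usage: Language.get_unique(db.session, '1', 'English')
# instead of Language('1', 'English')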
I've written a more detailed answer in response to OP's bounty on another question.
I would suggest reading Association Proxy: Simplifying Association Objects. In this case your code would translate into something like the below:
from sqlalchemy import Column, Integer, String, ForeignKey, UniqueConstraint
from sqlalchemy.orm import relationship
from sqlalchemy.ext.associationproxy import association_proxy

# NEW: need this function to auto-generate the PK for newly created Language;
# here using uuid, but could be any generator
def _newid():
    import uuid
    return str(uuid.uuid4())

def _language_find_or_create(language_name):
    language = Language.query.filter_by(language_name=language_name).first()
    return language or Language(language_name=language_name)

class User(Base):
    __tablename__ = 'user'
    uid = Column(String(80), primary_key=True)
    languages = relationship('Language', lazy='dynamic',
                             secondary='user_language')
    # proxy the 'language_name' attribute from the 'languages' relationship
    langs = association_proxy('languages', 'language_name',
                              creator=_language_find_or_create,
                              )

class UserLanguage(Base):
    __tablename__ = 'user_language'
    __table_args__ = (UniqueConstraint('uid', 'lid', name='user_language_ff'),)
    id = Column(Integer, primary_key=True)
    uid = Column(String(80), ForeignKey('user.uid'))
    lid = Column(String(80), ForeignKey('language.lid'))

class Language(Base):
    __tablename__ = 'language'
    # NEW: added a *default* here; replace with your implementation
    lid = Column(String(80), primary_key=True, default=_newid)
    language_name = Column(String(30))

# test code
user = User(uid="user-1")
# NEW: add languages using the association_proxy property
user.langs.append("English")
user.langs.append("Spanish")
session.add(user)
session.commit()

user2 = User(uid="user-2")
user2.langs.append("English")  # this will not create a new Language row...
user2.langs.append("German")
session.add(user2)
session.commit()

SQLAlchemy Many-to-many table with multiple foreign key entries

I'm new to sqlalchemy and I want to do this as simply as possible, yet correctly. I want to track domain use across multiple companies on a monthly basis, so I set up the following tables:
class Company(Base):
    __tablename__ = 'company'
    id = Column(Integer, primary_key=True)
    name = Column('name', String)

class Domains(Base):
    __tablename__ = 'domains'
    id = Column(Integer, primary_key=True)
    name = Column('name', String, unique=True)

class MonthlyUsage(Base):
    '''
    Track domain usage across all
    companies on a monthly basis.
    '''
    __tablename__ = 'monthlyusage'
    month = Column(DateTime)
    company_id = Column(Integer, ForeignKey('company.id'), primary_key=True)
    domain_id = Column(Integer, ForeignKey('domains.id'), primary_key=True)
    # <...other columns snipped out...>
    company = relationship('Company', backref='company_assoc')
    domain = relationship('Domains', backref='domain_assoc')
This works fine until I add usage details for the second month. Then I get duplicate key value errors:
sqlalchemy.exc.IntegrityError: (IntegrityError) duplicate key value violates unique constraint "monthlyusage_pkey"
Does this mean I have to split "monthlyusage" out into a third table? That seems unnecessarily complicated, since all that needs to be unique is the combination of the month, company_id, and domain_id fields.
Any suggestions for my layout here, to keep it as simple as possible, yet still correct?
TIA!
Ok, I needed to add a primary key column to MonthlyUsage. The code below now works...
class MonthlyUsage(Base):
    '''
    Track domain usage across all
    companies on a monthly basis.
    '''
    __tablename__ = 'monthlyusage'
    month = Column(DateTime)
    month_id = Column(Integer, primary_key=True)
    company_id = Column(Integer, ForeignKey('company.id'), primary_key=True)
    domain_id = Column(Integer, ForeignKey('domains.id'), primary_key=True)
    # <...other columns snipped out...>
    company = relationship('Company', backref='company_assoc')
    domain = relationship('Domains', backref='domain_assoc')
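Since only the combination of month, company_id, and domain_id needs to be unique, a composite unique constraint over exactly those columns, paired with a plain surrogate key, is another option (a sketch, not from the thread; the constraint name is my choice):

from sqlalchemy import UniqueConstraint

class MonthlyUsage(Base):
    __tablename__ = 'monthlyusage'
    __table_args__ = (
        UniqueConstraint('month', 'company_id', 'domain_id',
                         name='uq_monthlyusage_month_company_domain'),
    )
    id = Column(Integer, primary_key=True)  # plain surrogate key
    month = Column(DateTime)
    company_id = Column(Integer, ForeignKey('company.id'))
    domain_id = Column(Integer, ForeignKey('domains.id'))
    company = relationship('Company', backref='company_assoc')
    domain = relationship('Domains', backref='domain_assoc')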
