SQLAlchemy supports creating partial indexes in PostgreSQL.
Is it possible to create a partial unique index through SQLAlchemy?
Imagine a table/model like so:
class ScheduledPayment(Base):
    __tablename__ = 'scheduled_payment'

    id = Column(Integer, primary_key=True)
    invoice_id = Column(Integer)
    is_canceled = Column(Boolean, default=False)
I'd like a unique index where there can be only one "active" ScheduledPayment for a given invoice.
I can create this manually in postgres:
CREATE UNIQUE INDEX only_one_active_invoice on scheduled_payment
(invoice_id, is_canceled) where not is_canceled;
I'm wondering how I can add that to my SQLAlchemy model using SQLAlchemy 0.9.
class ScheduledPayment(Base):
    __tablename__ = 'scheduled_payment'

    id = Column(Integer, primary_key=True)
    invoice_id = Column(Integer)
    is_canceled = Column(Boolean, default=False)

    __table_args__ = (
        Index('only_one_active_invoice', invoice_id, is_canceled,
              unique=True,
              postgresql_where=(~is_canceled)),
    )
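To check the DDL this emits without connecting to a database, one option (a quick sketch using SQLAlchemy's DDL compiler) is to compile the index against the PostgreSQL dialect:

from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

# Find the index on the mapped table and render it as PostgreSQL DDL.
index = next(i for i in ScheduledPayment.__table__.indexes
             if i.name == 'only_one_active_invoice')
print(CreateIndex(index).compile(dialect=postgresql.dialect()))
# Should print roughly:
# CREATE UNIQUE INDEX only_one_active_invoice
# ON scheduled_payment (invoice_id, is_canceled)
# WHERE NOT is_canceled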
In case someone stops by looking to set up a partial unique constraint with a column that can optionally be NULL, here's how:
__table_args__ = (
    db.Index(
        'uk_providers_name_category',
        'name', 'category',
        unique=True,
        postgresql_where=(user_id.is_(None))),
    db.Index(
        'uk_providers_name_category_user_id',
        'name', 'category', 'user_id',
        unique=True,
        postgresql_where=(user_id.isnot(None))),
)
where user_id is a column that can be NULL, and I want a unique constraint enforced across all three columns (name, category, user_id), with NULL being just one of the allowed values for user_id.
To add to the answer by sas: postgresql_where does not seem to be able to accept multiple boolean clauses, so in a situation with TWO nullable columns (say an additional price column) it is not possible to create four partial indexes covering every combination of NULL/NOT NULL.
One workaround is to use default values that would never be 'valid' (e.g. -1 for price, or '' for a Text column). These compare as equal, so no more than one row is allowed to share the same default values.
Obviously, you will also need to backfill this default value into all existing rows of data (if applicable).
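A minimal sketch of that workaround (the model and column names here are illustrative, not from the original post):

from sqlalchemy import Column, Index, Integer, Numeric, Text

class Offer(Base):
    __tablename__ = 'offer'

    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=False)
    # Sentinel defaults stand in for NULL, so one plain unique
    # index covers every combination of 'missing' values.
    user_id = Column(Integer, nullable=False, server_default='-1')
    price = Column(Numeric, nullable=False, server_default='-1')

    __table_args__ = (
        Index('uk_offer_name_user_id_price', name, user_id, price,
              unique=True),
    )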
Given these three classes
class User(BaseModel):
    name = models.CharField(..)

class Order(BaseModel):
    user = models.ForeignKey(User, ..., related_name='orders')

class OrderItem(BaseModel):
    order = models.ForeignKey(Order, ..., related_name='items')
    quantity = models.IntegerField(default=1)
    price = models.FloatField()
and this is the base class (it is enough to note that it has the created_at field)
class BaseModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
Now each User will have multiple Orders and each Order has multiple OrderItems.
I want to annotate the User objects with the total price of the last order.
Take this data for example:
The User objects should be annotated with the sum of their last order. For user john with id=1, we should return the sum of the order_items with ids 3 and 4, since they belong to the order with id=2, which is his latest order.
I hope I have made myself clear. I am new to Django and have tried to go over the docs and many different things, but I keep getting stuck at fetching the last order's items.
Sometimes it's unclear how to express such a query in the Django ORM. In your case I'd write the query in raw SQL, something like:
WITH last_order_for_user AS (
    SELECT DISTINCT ON (user_id) id, user_id
    FROM orders
    ORDER BY user_id, created_at DESC
) -- latest order id for each user_id (DISTINCT ON is PostgreSQL-specific)
SELECT
    lo.user_id, lo.id, SUM(item.price)
FROM
    last_order_for_user lo
LEFT JOIN
    orderitems item ON lo.id = item.order_id
GROUP BY 1, 2 -- sum of items for the latest order, per user
And then run it as a raw SQL query (see the django-docs on raw queries).
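If you'd rather stay in the ORM, here's an untested sketch using Subquery annotations, assuming the models above (the annotation names are my own):

from django.db.models import OuterRef, Subquery, Sum

# pk of each user's most recent order
last_order_id = (Order.objects
                 .filter(user=OuterRef('pk'))
                 .order_by('-created_at')
                 .values('pk')[:1])

# total item price for that order (refers to the annotation added below)
last_order_total = (OrderItem.objects
                    .filter(order=OuterRef('last_order_id'))
                    .values('order')
                    .annotate(total=Sum('price'))
                    .values('total')[:1])

users = (User.objects
         .annotate(last_order_id=Subquery(last_order_id))
         .annotate(last_order_total=Subquery(last_order_total)))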
I need to replace the default integer id in my model with a UUID. The problem is that it's being used in another model (as a foreign key).
Any idea how to perform this operation without losing data?
class A(Base):
    __tablename__ = 'a'
    b_id = Column(
        GUID(), ForeignKey('b.id'), nullable=False,
        server_default=text("uuid_generate_v4()")
    )

class B(Base):
    __tablename__ = 'b'
    id = Column(
        GUID(), primary_key=True,
        server_default=text("uuid_generate_v4()")
    )
Unfortunately it doesn't work, and I'm afraid I'll break the relation:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) default for column "id" cannot be cast automatically to type uuid
The Alembic migration I've tried looks similar to:
op.execute('ALTER TABLE a ALTER COLUMN b_id SET DATA TYPE UUID USING (uuid_generate_v4())')
Add an id_tmp column to b with autogenerated UUID values, and a b_id_tmp column to a. Update a, joining b on the foreign key, to fill a.b_id_tmp with the corresponding UUIDs. Then drop a.b_id and b.id, rename the added columns, and reestablish the primary key and foreign key.
CREATE TABLE a(id int PRIMARY KEY, b_id int);
CREATE TABLE b(id int PRIMARY KEY);
ALTER TABLE a ADD CONSTRAINT a_b_id_fkey FOREIGN KEY(b_id) REFERENCES b(id);
INSERT INTO b VALUES (1), (2), (3);
INSERT INTO a VALUES (1, 1), (2, 2), (3, 2);
ALTER TABLE b ADD COLUMN id_tmp UUID NOT NULL DEFAULT uuid_generate_v1mc();
ALTER TABLE a ADD COLUMN b_id_tmp UUID;
UPDATE a SET b_id_tmp = b.id_tmp FROM b WHERE b.id = a.b_id;
ALTER TABLE a DROP COLUMN b_id;
ALTER TABLE a RENAME COLUMN b_id_tmp TO b_id;
ALTER TABLE b DROP COLUMN id;
ALTER TABLE b RENAME COLUMN id_tmp TO id;
ALTER TABLE b ADD PRIMARY KEY (id);
ALTER TABLE a ADD CONSTRAINT b_id_fkey FOREIGN KEY(b_id) REFERENCES b(id);
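Wrapped in an Alembic migration, that sequence might look something like this (a sketch mirroring the SQL above; the primary-key constraint name is my own choice):

from alembic import op

def upgrade():
    op.execute('ALTER TABLE b ADD COLUMN id_tmp UUID NOT NULL DEFAULT uuid_generate_v1mc()')
    op.execute('ALTER TABLE a ADD COLUMN b_id_tmp UUID')
    op.execute('UPDATE a SET b_id_tmp = b.id_tmp FROM b WHERE b.id = a.b_id')
    # Dropping a.b_id also drops the old foreign key constraint.
    op.drop_column('a', 'b_id')
    op.alter_column('a', 'b_id_tmp', new_column_name='b_id')
    op.drop_column('b', 'id')
    op.alter_column('b', 'id_tmp', new_column_name='id')
    op.create_primary_key('b_pkey', 'b', ['id'])
    op.create_foreign_key('b_id_fkey', 'a', 'b', ['b_id'], ['id'])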
Just as an aside, it's more efficient to index v1 UUIDs than v4, since they contain some reproducible information, which you'll notice if you generate several in a row. That's a minor savings unless you need the higher randomness of v4 for external security reasons.
I am working with Python, PostgreSQL, SQLAlchemy and Alembic.
I have to design a database, but I am kind of stuck because my design needs a column that will store a list of IDs which are basically foreign keys. I am not sure how to do that, or whether I should be doing it at all.
Example: I have a discount table which contains all the available discount codes. I have a column discount_applies_to where I want to store a list of all products to which the discount applies (I cannot edit the products table). Basically the column will contain a list of UUIDs of products to which the discount can be applied.
class Product(Base):
    .....

class Discount(Base):
    .....

class ProductDiscount(Base):
    __tablename__ = 'discount_applies'
    # Composite primary key: each (product, discount) pair appears once.
    product_id = Column(String(32), ForeignKey('product.id'), primary_key=True)
    # If the discount primary key is Integer, change String to Integer.
    discount_id = Column(String(32), ForeignKey('discount.id'), primary_key=True)
    product = relationship(Product)
    discount = relationship(Discount)
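A quick usage sketch, assuming the models above expose an id column as the foreign keys imply (some_product, some_discount and session are placeholders):

# Link a discount to a product.
session.add(ProductDiscount(product=some_product, discount=some_discount))
session.commit()

# All products a given discount applies to:
products = (session.query(Product)
            .join(ProductDiscount, ProductDiscount.product_id == Product.id)
            .filter(ProductDiscount.discount_id == some_discount.id)
            .all())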
I'm using SQLAlchemy with MySQL and have a table with two foreign keys:
class CampaignCreativeLink(db.Model):
    __tablename__ = 'campaign_creative_links'
    campaign_id = db.Column(db.Integer, db.ForeignKey('campaigns.id'),
                            primary_key=True)
    creative_id = db.Column(db.Integer, db.ForeignKey('creatives.id'),
                            primary_key=True)
Then I insert 3 items into the table (in a loop) like this:
session.add(CampaignCreativeLink(campaign_id=8, creative_id=3))
session.add(CampaignCreativeLink(campaign_id=8, creative_id=2))
session.add(CampaignCreativeLink(campaign_id=8, creative_id=1))
But when I checked the table, the items were in reverse order:
8 1
8 2
8 3
And a query returns them in reverse order too. What's the reason for this, and how can I keep the order the same as when they were added?
A table is a set of rows, so rows are not guaranteed to come back in any particular order unless you specify ORDER BY.
In MySQL (InnoDB), the primary key acts as the clustered index. This means the rows are physically stored in the order specified by the primary key, in this case (campaign_id, creative_id), regardless of the order of insertion. This is usually the order the rows are returned in if you don't specify an ORDER BY.
If you need your rows returned in a certain order, specify ORDER BY when you query.
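For example, to reproduce the insertion order from the question (a sketch; in general you'd add a dedicated timestamp or sequence column to sort on, since creative_id only happens to match here):

links = (CampaignCreativeLink.query
         .filter_by(campaign_id=8)
         .order_by(CampaignCreativeLink.creative_id.desc())
         .all())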
I'm querying my SQLAlchemy-mapped star schema directly into a pandas DataFrame and am getting an annoying SAWarning from pandas that I'd like to address. Here's a simplified version.
class School(Base):
    __tablename__ = 'DimSchool'

    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)

class StudentScore(Base):
    __tablename__ = 'FactStudentScore'

    StudentKey = Column('StudentKey', Integer, ForeignKey('DimStudent.StudentKey'), primary_key=True)
    SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key=True)
    PointsPossible = Column('PointsPossible', Integer)
    PointsReceived = Column('PointsReceived', Integer)

    student = relationship("Student", backref='studentscore')
    school = relationship("School", backref='studentscore')
I query the data with statements like this:
standard = session.query(StudentScore, School).\
    join(School).filter(School.name.like('%Dever%'))
testdf = pd.read_sql(standard.statement, standard.session.bind)
And then get this warning:
SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key. Consider use_labels for select() statements.
I get this error for every additional table (class) included in my join. The message always refers to the foreign key.
Has anyone else encountered this warning and determined the root cause? Or have y'all just been ignoring it as well?
EDIT/UPDATE:
Handling Duplicate Columns in Pandas DataFrame constructor from SQLAlchemy Join
These folks seem to be talking about a related issue, but they use a different pandas method to build the dataframe and want to keep the duplicates, not drop them. Anyone have thoughts on how to implement a similarly styled function, but one that drops the duplicates as the query comes back?
For what it's worth, here's my limited answer.
For the following SAWarning:
SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key. Consider use_labels for select() statements.
It's really telling you that there are columns with duplicate names, even if the columns are in separate tables. In most cases this is innocuous, as the duplicated columns are simply the join keys. However, I have encountered cases where the tables contain duplicately named but distinctly populated columns (i.e. a teacher table with a name column and a student table with a name column). In those cases, rename the pandas dataframe columns with an approach like this, or rename the underlying database columns.
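If you just want to drop the duplicated join-key columns once the query comes back, a small sketch (this assumes the duplicates really are redundant copies, as join keys are):

# Keep only the first occurrence of each duplicated column name.
testdf = testdf.loc[:, ~testdf.columns.duplicated()]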
I'll keep an eye on this question, and if anyone posts a better answer I'll gladly accept it.