I'm querying my SQLAlchemy-mapped star schema directly into a pandas DataFrame and am getting an annoying SAWarning (raised by SQLAlchemy while pandas runs the query) that I'd like to address. Here's a simplified version.
class School(Base):
    __tablename__ = 'DimSchool'

    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)

class StudentScore(Base):
    __tablename__ = 'FactStudentScore'

    StudentKey = Column('StudentKey', Integer, ForeignKey('DimStudent.StudentKey'), primary_key=True)
    SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key=True)
    PointsPossible = Column('PointsPossible', Integer)
    PointsReceived = Column('PointsReceived', Integer)

    student = relationship("Student", backref='studentscore')
    school = relationship("School", backref='studentscore')
I query the data with statements like this:
standard = session.query(StudentScore, School).\
    join(School).filter(School.name.like('%Dever%'))
testdf = pd.read_sql(standard.statement, standard.session.bind)
And then get this warning:
SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key. Consider use_labels for select() statements.
I get this warning for every additional table (class) included in my join, and the message always refers to the foreign key column.
Has anyone else encountered this warning and determined the root cause? Or have y'all just been ignoring it as well?
EDIT/UPDATE:
Handling Duplicate Columns in Pandas DataFrame constructor from SQLAlchemy Join
They seem to be discussing a related issue, but they use a different pandas method to build the DataFrame and want to keep the duplicates, not drop them. Does anyone have thoughts on how to implement a similarly styled function that drops the duplicates as the query comes back?
For what it's worth, here's my limited answer.
For the following SAWarning:
SAWarning: Column 'SchoolKey' on table <sqlalchemy.sql.selectable.Select at 0x1ab7abe0; Select object> being replaced by Column('SchoolKey', Integer(), table=<Select object>, primary_key=True, nullable=False), which has the same key.
Consider use_labels for select() statements.
It's really telling you that there are columns with duplicate names, even though the columns are in separate tables. In most cases this is innocuous, as the duplicated columns are simply the join keys. However, I have encountered cases where the tables contain duplicately named but distinctly populated columns (i.e. a teacher table with a name column and a student table with a name column). In those cases, rename the columns in the pandas DataFrame with an approach like this, or rename the underlying database columns.
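As a rough, self-contained sketch (the data and column names here are illustrative, not from the actual schema), dropping or renaming duplicate-named columns after the query returns can look like this:

```python
import pandas as pd

# Illustrative frame with a duplicated join-key column, the shape
# read_sql can return for a fact/dimension join.
df = pd.DataFrame([[1, 10, 10, 'Dever Elementary']],
                  columns=['StudentKey', 'SchoolKey', 'SchoolKey', 'SchoolName'])

# Drop all but the first occurrence of each duplicated column name...
deduped = df.loc[:, ~df.columns.duplicated()]

# ...or, when the duplicates hold genuinely different data, rename
# them positionally so both survive.
renamed = df.copy()
renamed.columns = ['StudentKey', 'SchoolKey', 'SchoolKey_dup', 'SchoolName']
```

The boolean-mask approach keeps whichever copy appears first in the SELECT list, so it only makes sense when the duplicates really are redundant join keys.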
I'll keep an eye on this question, and if anyone has a better answer I'll gladly award it.
Related
I need to replace the default integer id in my model with a UUID. The problem is that it's being used as a foreign key in another model.
Any idea how to perform this operation without losing data?
class A(Base):
    __tablename__ = 'a'

    b_id = Column(
        GUID(), ForeignKey('b.id'), nullable=False,
        server_default=text("uuid_generate_v4()")
    )

class B(Base):
    __tablename__ = 'b'

    id = Column(
        GUID(), primary_key=True,
        server_default=text("uuid_generate_v4()")
    )
Unfortunately it doesn't work, and I'm afraid I'll break the relation.
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) default for column "id" cannot be cast automatically to type uuid
The Alembic migration I've tried looks similar to:
op.execute('ALTER TABLE a ALTER COLUMN b_id SET DATA TYPE UUID USING (uuid_generate_v4())')
Add an id_tmp column to b with autogenerated UUID values, and a b_id_tmp column to a. Update a joining b on the foreign key to fill a.b_id_tmp with the corresponding UUIDs. Then drop a.b_id and b.id, rename the added columns, and reestablish the primary key and foreign key.
CREATE TABLE a(id int PRIMARY KEY, b_id int);
CREATE TABLE b(id int PRIMARY KEY);
ALTER TABLE a ADD CONSTRAINT a_b_id_fkey FOREIGN KEY(b_id) REFERENCES b(id);
INSERT INTO b VALUES (1), (2), (3);
INSERT INTO a VALUES (1, 1), (2, 2), (3, 2);
ALTER TABLE b ADD COLUMN id_tmp UUID NOT NULL DEFAULT uuid_generate_v1mc();
ALTER TABLE a ADD COLUMN b_id_tmp UUID;
UPDATE a SET b_id_tmp = b.id_tmp FROM b WHERE b.id = a.b_id;
ALTER TABLE a DROP COLUMN b_id;
ALTER TABLE a RENAME COLUMN b_id_tmp TO b_id;
ALTER TABLE b DROP COLUMN id;
ALTER TABLE b RENAME COLUMN id_tmp TO id;
ALTER TABLE b ADD PRIMARY KEY (id);
ALTER TABLE a ADD CONSTRAINT b_id_fkey FOREIGN KEY(b_id) REFERENCES b(id);
Just as an aside, it's more efficient to index v1 UUIDs than v4 since they contain some reproducible information, which you'll notice if you generate several in a row. That's a minor savings unless you need the higher randomness for external security reasons.
I am working with Python, PostgreSQL, SQLAlchemy and alembic.
I have to design a database, but I'm stuck because my design needs a column that stores a list of IDs which are basically foreign keys. I'm not sure how to do that, and moreover whether I should be doing it at all.
Example: I have a discount table which contains all the available discount codes. I want a discount_applies_to column storing a list of all products to which the discount applies (I cannot edit the products table). Basically the column would contain a list of UUIDs of products to which the discount can be applied.
class Product(Base):
    .....

class Discount(Base):
    .....

class ProductDiscount(Base):
    __tablename__ = 'discount_applies'

    product_id = Column(String(32), ForeignKey('product.id'), nullable=False)
    discount_id = Column(String(32), ForeignKey('discount.id'), nullable=False)  # if the discount primary key is Integer, change String to Integer
    product = relationship(Product)
    discount = relationship(Discount)
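To make the pattern concrete, here is a runnable sketch against in-memory SQLite (the Product/Discount bodies and the composite primary key are filled in by me, since the originals are elided):

```python
from sqlalchemy import Column, String, ForeignKey, create_engine
from sqlalchemy.orm import declarative_base, relationship, Session

Base = declarative_base()

class Product(Base):
    __tablename__ = 'product'
    id = Column(String(32), primary_key=True)

class Discount(Base):
    __tablename__ = 'discount'
    id = Column(String(32), primary_key=True)

class ProductDiscount(Base):
    __tablename__ = 'discount_applies'
    # Composite primary key: one row per (product, discount) pair.
    product_id = Column(String(32), ForeignKey('product.id'), primary_key=True)
    discount_id = Column(String(32), ForeignKey('discount.id'), primary_key=True)
    product = relationship(Product)
    discount = relationship(Discount)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([Product(id='p1'), Product(id='p2'), Discount(id='d1')])
    # The "list of product IDs" becomes one association row per product.
    session.add_all([ProductDiscount(product_id='p1', discount_id='d1'),
                     ProductDiscount(product_id='p2', discount_id='d1')])
    session.commit()
    applies = session.query(ProductDiscount).filter_by(discount_id='d1').count()
```

Each association row replaces one element of the would-be ID list, which keeps the schema normalized and the foreign keys enforceable.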
I'm using SQLAlchemy with MySQL and have a table with two foreign keys:
class CampaignCreativeLink(db.Model):
    __tablename__ = 'campaign_creative_links'

    campaign_id = db.Column(db.Integer, db.ForeignKey('campaigns.id'),
                            primary_key=True)
    creative_id = db.Column(db.Integer, db.ForeignKey('creatives.id'),
                            primary_key=True)
Then I use a for loop to insert 3 items into the table like this:
session.add(CampaignCreativeLink(campaign_id=8, creative_id=3))
session.add(CampaignCreativeLink(campaign_id=8, creative_id=2))
session.add(CampaignCreativeLink(campaign_id=8, creative_id=1))
But when I checked the table, the items were in reverse order:
8 1
8 2
8 3
And queries return them in reverse order too. What's the reason for this, and how can I keep the order the same as when they were added?
A table is a set of rows and is therefore not guaranteed to have any order unless you specify ORDER BY.
In MySQL (InnoDB), the primary key acts as the clustered index. This means that the rows are physically stored in the order specified by the primary key, in this case (campaign_id, creative_id), regardless of the order of insertion. This is usually the order the rows are returned in if you don't specify an ORDER BY.
If you need your rows returned in a certain order, specify ORDER BY when you query.
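As a minimal, self-contained sketch of that (using plain SQLAlchemy on SQLite rather than the Flask-SQLAlchemy db.Model above, with the foreign keys omitted so it runs standalone):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class CampaignCreativeLink(Base):
    __tablename__ = 'campaign_creative_links'
    # Foreign keys omitted here to keep the sketch self-contained.
    campaign_id = Column(Integer, primary_key=True)
    creative_id = Column(Integer, primary_key=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
with Session(engine) as session:
    # Insert in "descending" order, as in the question.
    for creative in (3, 2, 1):
        session.add(CampaignCreativeLink(campaign_id=8, creative_id=creative))
    session.commit()
    # Without ORDER BY the result order is undefined; ask for it explicitly.
    rows = (session.query(CampaignCreativeLink)
            .order_by(CampaignCreativeLink.creative_id.desc()).all())
    order = [r.creative_id for r in rows]
```

If insertion order itself matters, the usual fix is an extra autoincrement column (or timestamp) to ORDER BY, since the composite primary key carries no notion of "when".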
SQLAlchemy supports creating partial indexes in postgresql.
Is it possible to create a partial unique index through SQLAlchemy?
Imagine a table/model as so:
class ScheduledPayment(Base):
    invoice_id = Column(Integer)
    is_canceled = Column(Boolean, default=False)
I'd like a unique index where there can be only one "active" ScheduledPayment for a given invoice.
I can create this manually in postgres:
CREATE UNIQUE INDEX only_one_active_invoice on scheduled_payment
(invoice_id, is_canceled) where not is_canceled;
I'm wondering how I can add that to my SQLAlchemy model using SQLAlchemy 0.9.
class ScheduledPayment(Base):
    id = Column(Integer, primary_key=True)
    invoice_id = Column(Integer)
    is_canceled = Column(Boolean, default=False)

    __table_args__ = (
        Index('only_one_active_invoice', invoice_id, is_canceled,
              unique=True,
              postgresql_where=(~is_canceled)),
    )
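To sanity-check the DDL this emits without a live database, you can compile the index against the PostgreSQL dialect. A sketch (the `__tablename__` is my addition, since the model above omits it):

```python
from sqlalchemy import Column, Integer, Boolean, Index
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateIndex
from sqlalchemy.dialects import postgresql

Base = declarative_base()

class ScheduledPayment(Base):
    __tablename__ = 'scheduled_payment'
    id = Column(Integer, primary_key=True)
    invoice_id = Column(Integer)
    is_canceled = Column(Boolean, default=False)

    __table_args__ = (
        Index('only_one_active_invoice', invoice_id, is_canceled,
              unique=True,
              postgresql_where=(~is_canceled)),
    )

# Render the CREATE INDEX statement the PostgreSQL dialect would emit.
index = list(ScheduledPayment.__table__.indexes)[0]
ddl = str(CreateIndex(index).compile(dialect=postgresql.dialect()))
```

The rendered statement should match the hand-written CREATE UNIQUE INDEX ... WHERE form from the question.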
In case someone stops by looking to set up a partial unique constraint with a column that can optionally be NULL, here's how:
__table_args__ = (
    db.Index(
        'uk_providers_name_category',
        'name', 'category',
        unique=True,
        postgresql_where=(user_id.is_(None))),
    db.Index(
        'uk_providers_name_category_user_id',
        'name', 'category', 'user_id',
        unique=True,
        postgresql_where=(user_id.isnot(None))),
)
where user_id is a column that can be NULL and I want a unique constraint enforced across all three columns (name, category, user_id) with NULL just being one of the allowed values for user_id.
To add to the answer by sas, postgresql_where does not seem to be able to accept multiple booleans. So in a situation where you have TWO nullable columns (let's assume an additional 'price' column), it is not possible to have four partial indexes covering all combinations of NULL/NOT NULL.
One workaround is to use default values that would never be 'valid' (e.g. -1 for price, or '' for a Text column). These compare like ordinary values, so no more than one row would be allowed to have these defaults.
Obviously, you will also need to insert this default value into all existing rows of data (if applicable).
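A sketch of that sentinel-default workaround (the Provider model, column set, and the -1/'' sentinels are illustrative, not from the original schema):

```python
from sqlalchemy import Column, Integer, String, Index
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Provider(Base):
    __tablename__ = 'providers'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    # Sentinel defaults stand in for NULL, so the columns can be NOT NULL.
    category = Column(String, nullable=False, server_default='')    # '' instead of NULL
    price = Column(Integer, nullable=False, server_default='-1')    # -1 instead of NULL

    __table_args__ = (
        # One plain unique index now covers every combination, because
        # the sentinel defaults compare like ordinary values.
        Index('uk_providers_name_category_price',
              'name', 'category', 'price', unique=True),
    )
```

The trade-off is that application code must treat the sentinels as "absent", so this only works when no real row could legitimately hold those values.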
In SQLAlchemy 0.5 I have a table defined like this one:
orders = Table('orders', metadata,
    Column('id', Integer, primary_key=True),
    Column('responsable', String(255)),
    Column('customer', String(255)),
    Column('progressive', Integer),
    Column('date', Date),
    Column('exported', Boolean()),
)
Is it possible to define only the customer and the year of the date as unique?
The year isn't a column, only a part of the date, but it would be nice to have the year of the date and the customer act as a single unique key on the table.
Is that possible in SQLAlchemy 0.5?
Thanks
Without splitting the date field into separate month, day, and year fields, there really isn't a way to do what you're asking. It would (probably) be easier and simpler for you to include the whole date (month/day/year) in the composite key; if (customer, year) uniquely defines a record, then so will (customer, date).
Quoth the documentation:
"Multiple columns may be assigned the primary_key=True flag which denotes a multi-column primary key, known as a composite primary key."
You can use a function in a constraint but it's not database independent. See here for details: Compound UniqueConstraint with a function
However, I would argue that the best thing to do is to create a column for the year. It will certainly be easier.
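For what it's worth, on PostgreSQL, much newer SQLAlchemy versions (well past 0.5) can express this directly as a unique expression index using extract(). A hedged sketch, compiled to DDL rather than run against a database:

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String, Date,
                        Index, extract)
from sqlalchemy.schema import CreateIndex
from sqlalchemy.dialects import postgresql

metadata = MetaData()
orders = Table('orders', metadata,
    Column('id', Integer, primary_key=True),
    Column('customer', String(255)),
    Column('date', Date),
)

# Unique per (customer, year of date) -- an expression index, which
# PostgreSQL supports but not every backend does.
idx = Index('uq_orders_customer_year',
            orders.c.customer, extract('year', orders.c.date),
            unique=True)
ddl = str(CreateIndex(idx).compile(dialect=postgresql.dialect()))
```

This keeps the table's columns unchanged while still enforcing the customer-plus-year uniqueness the question asks for.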