SQLAlchemy Joining 2 tables using Junction table - python

I am learning SQL with Python using SQLAlchemy and would appreciate some help with this.
I have 3 tables:
Table 1 (Actors): nconst (primary key), names
Table 2 (Movies): tconst (primary key), titles
Table 3 (Junction table): nconst (from Actors table), tconst (from Movies table)
I am trying to obtain 10 rows of actors who acted in particular movies. Hence I want to inner join Actors onto the junction table (on nconst) and then inner join that onto the Movies table (on tconst).
In SQL, this means:
FROM principals
INNER JOIN actors ON principals.nconst = actors.nconst
INNER JOIN movies ON principals.tconst = movies.tconst
In SQLAlchemy, my current code is:
mt = list(session.query(Movies, Principals, Actors).select_from(
    join(Movies, Principals, Movies.tconst == Principals.tconst)
    .join(Actors, Principals, Actors.nconst == Principals.nconst)
).with_entities(
    Movies.title,  # Select clause
))
Alternatively, I am trying:
from sqlalchemy.orm import join

mv = list(session.query(Actors).select_from(
    join(Movies, Principals, Actors, Movies.tconst == Principals.tconst,
         Actors.nconst == Principals.nconst)  # Join clause
).with_entities(
    Actors.name,  # Select clause
    Movies.title,
))
mv
The error I am getting is an AttributeError: type object 'Actors' has no attribute '_from_objects'.
I'd appreciate any help with this. Thank you very much.
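For reference, a minimal sketch of how this double inner join is usually expressed in the ORM with chained Query.join() calls rather than the standalone join() construct (assuming Actors, Principals and Movies are the mapped classes from the question, with the column names used in the code above):

mt = (session.query(Actors.name, Movies.title)
      .select_from(Actors)
      .join(Principals, Principals.nconst == Actors.nconst)   # Actors -> junction table
      .join(Movies, Movies.tconst == Principals.tconst)       # junction table -> Movies
      .limit(10)
      .all())

The '_from_objects' AttributeError typically comes from passing a mapped class where join() expects an ON clause, as in the second attempt above.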

Related

Filter rows to keep only the latest record in SQLAlchemy

I have been trying to write SQLAlchemy code equivalent to the following SQL query.
SELECT * FROM events AS ev
INNER JOIN event_types AS et1 ON ev.event_type_id = et1.id
INNER JOIN (
    SELECT event_type, MAX(created_at) AS LatestCreatedAt
    FROM event_types et
    GROUP BY event_type
) AS et2
    ON et2.event_type = et1.event_type
    AND et2.LatestCreatedAt = et1.created_at
What I'm trying to do is:
Get all columns from the events table
Inner join the event_types table (et1) on the events table
Group by event_type, keeping only the rows with the latest record (i.e. filter out old event types by looking at created_at if duplicated)
Inner join the grouped event_types (et2) on the event_types table (et1)
What I wrote for the SQLAlchemy version of the above is:
from sqlalchemy import func

subquery = session.query(EventTypeTable.event_type,
    func.max(EventTypeTable.created_at).group_by(EventTypeTable.event_type)).all()

events = (session.query(EventTable)
          .join(EventTypeTable)
          .join(subquery)
          .all())
However, I get the following error.
Neither 'max' object nor 'Comparator' object has an attribute 'group_by'
It seems to complain that I cannot use group_by with the max function. Is there another way to get the query results while keeping only the latest record (by created_at) in the event_types table in SQLAlchemy?
Any help or comments are appreciated. Thank you!
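For what it's worth, a minimal sketch of one common way to express this: build the grouped query as a subquery() and join on its columns (assuming EventTable and EventTypeTable are the mapped classes from the question, with the column names used in the SQL above):

from sqlalchemy import func, and_

# Latest created_at per event_type (the et2 derived table in the SQL above)
latest = (session.query(EventTypeTable.event_type,
                        func.max(EventTypeTable.created_at).label("latest_created_at"))
          .group_by(EventTypeTable.event_type)
          .subquery())

events = (session.query(EventTable)
          .join(EventTypeTable, EventTable.event_type_id == EventTypeTable.id)
          .join(latest, and_(latest.c.event_type == EventTypeTable.event_type,
                             latest.c.latest_created_at == EventTypeTable.created_at))
          .all())

Note that group_by() is a method of the query, not of func.max(), which is what the original error message is pointing at.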

Python: use one sqlite query to find the NOT EXISTS result

I have a dataset of a million entries; it comprises songs and their artists.
I have
a track_id
an artist_id.
There are 3 tables:
tracks (track_id, title, artist_id),
artists (artist_id, artist_name) and
artist_term (artist_id, term).
Using only one query, I have to count the number of tracks whose artists don't have any linked terms.
For more reference, the schema of the DB is as follows:
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text, year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text, FOREIGN KEY(artist_id)
REFERENCES artists(artist_id));
How do I get to the solution? Please help!
You can use not exists:
select count(*) cnt
from tracks t
where not exists (select 1 from artist_term at where at.artist_id = t.artist_id)
As far as the count is concerned, you do not need to bring in the artists table, since artist_id is available in both the tracks and artist_term tables.
For performance you want an index on tracks(artist_id) and another one on artist_term(artist_id).
An anti-left join would also get the job done:
select count(*) cnt
from tracks t
left join artist_term at on at.artist_id = t.artist_id
where at.artist_id is null
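Since the question asks for this from Python, here is a minimal sqlite3 sketch of executing the NOT EXISTS query (the database file name music.db is an assumption; it is not given in the question):

import sqlite3

# Hypothetical file name; substitute your actual database path
conn = sqlite3.connect("music.db")
cur = conn.cursor()
cur.execute("""
    SELECT COUNT(*) AS cnt
    FROM tracks t
    WHERE NOT EXISTS (
        SELECT 1 FROM artist_term at WHERE at.artist_id = t.artist_id
    )
""")
print(cur.fetchone()[0])  # number of tracks whose artist has no linked term
conn.close()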
You can join the tables tracks and artists and left join the table artist_term so as to find the unmatched artist_ids:
select count(distinct t.track_id)
from tracks t
inner join artists a on a.artist_id = t.artist_id
left join artist_term at on at.artist_id = a.artist_id
where at.artist_id is null
The condition at.artist_id is null in the WHERE clause will return only the unmatched rows which will be counted.
If I'm not mistaken, such a query can be built in a similar fashion to its sibling SQL dialects. If so, it should look something like this:
SELECT COUNT(track_id)
FROM tracks as t
WHERE EXISTS (
SELECT *
FROM artists AS a
WHERE a.artist_id = t.artist_id
AND NOT EXISTS(
SELECT *
FROM artist_term as at
WHERE at.artist_id = a.artist_id
)
)
So this query basically says: count the number of distinct tracks (identified by their unique track_id) for which there is an artist with the same artist_id, and no artist_term row exists that refers to that artist's artist_id.
Hope this helps!
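For readers who, like the other questions here, are on SQLAlchemy rather than raw sqlite3, a sketch of the same NOT EXISTS count in the ORM (Track and ArtistTerm are assumed mapped classes; the question itself uses plain SQLite):

from sqlalchemy import exists, func

# EXISTS subquery correlating artist_term to the outer tracks row
has_term = exists().where(ArtistTerm.artist_id == Track.artist_id)
count = session.query(func.count(Track.track_id)).filter(~has_term).scalar()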

How to do a SELECT * following all foreign keys

I want to do a SELECT statement that will get all the data in one table + follow all the foreign keys from that table with a LEFT OUTER JOIN. For example:
`orderitem`: id, name, title_id
`title`: id, name
In the above example, I would be able to use the statement:
SELECT * FROM orderitem LEFT OUTER JOIN title on orderitem.title_id=title.id
Is there a way that I could do this not knowing the table structure? That is, to have a function like the following:
def get_select_statement(table):
    ???
get_select_statement(orderitem)
==> "SELECT * FROM orderitem LEFT OUTER JOIN title on orderitem.title_id=title.id"
How would this be done?
To clarify this question, I think I'm looking for the following information from this function:
What are all the column names in the given table?
What tables do they reference in a ForeignKey relationship and what is the relationship to be able to join?
In addition, note that not all orderitems will have a title, so doing any sort of INNER JOIN would delete data.
In MySQL you can retrieve the column names by using the DESCRIBE statement:
DESCRIBE table_name;
And all info about foreign keys:
select *
from information_schema.KEY_COLUMN_USAGE
where TABLE_SCHEMA = "schema_name"
and TABLE_NAME="table_name"
and REFERENCED_TABLE_NAME IS NOT NULL
To run this query and load the result in Python you could use the SQLAlchemy package, for example:
engine = sqlalchemy.create_engine("mysql+mysqldb://user:password@host/db")
res = engine.execute("DESCRIBE table_name;")
columns = [row["Field"] for row in res]
res = engine.execute("{}".format(query_for_foreign_keys))
foreign_keys = [row["COLUMN_NAME"] for row in res]
referenced_column_names = [row["REFERENCED_COLUMN_NAME"] for row in res]
referenced_table_names = [row["REFERENCED_TABLE_NAME"] for row in res]
Then you could generate the SELECT statement using all the data above.
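Alternatively, a sketch using SQLAlchemy's reflection Inspector (my own suggestion, not part of the answer above) can build the LEFT OUTER JOIN statement directly from the foreign-key metadata:

import sqlalchemy

# Hypothetical connection string; adjust for your own database
engine = sqlalchemy.create_engine("mysql+mysqldb://user:password@host/db")
inspector = sqlalchemy.inspect(engine)

def get_select_statement(table_name):
    # One LEFT OUTER JOIN per foreign key defined on the table
    joins = []
    for fk in inspector.get_foreign_keys(table_name):
        referred = fk["referred_table"]
        on_clause = " AND ".join(
            "{}.{} = {}.{}".format(table_name, local, referred, remote)
            for local, remote in zip(fk["constrained_columns"], fk["referred_columns"])
        )
        joins.append("LEFT OUTER JOIN {} ON {}".format(referred, on_clause))
    return "SELECT * FROM {} {}".format(table_name, " ".join(joins)).strip()

print(get_select_statement("orderitem"))
# -> SELECT * FROM orderitem LEFT OUTER JOIN title ON orderitem.title_id = title.id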

Parse SQL Script to extract table and column names

If I have a SQL script, is there a way to parse it and extract the columns and tables referenced in the script into a table-like structure?
Script:
Select t1.first, t1.last, t2.car, t2.make, t2.year
from owners t1
left join cars t2
on t1.owner_id = t2.owner_id
Output:
Table Column
owners first
owners last
owners owner_id
cars car
cars make
cars year
cars owner_id
Old question, but interesting, so here goes: turn your script temporarily into a stored procedure, forcing SQL Server to map the dependencies, and then you can retrieve them using:
SELECT referenced_entity_name, referenced_minor_name FROM sys.dm_sql_referenced_entities('dbo.stp_ObjectsToTrack', 'Object')
This is what you want in SQL Server:
select t.name as [Table], c.name as [Column]
from sys.columns c
inner join sys.tables t
on c.object_id = t.object_id
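For a pure-Python angle on simple scripts like the one in the question, a rough heuristic sketch using regular expressions (not a real SQL parser; it will break on subqueries, quoted identifiers, unaliased columns and so on):

import re

script = """
Select t1.first, t1.last, t2.car, t2.make, t2.year
from owners t1
left join cars t2
on t1.owner_id = t2.owner_id
"""

# Map aliases to table names from the FROM / JOIN clauses
alias_to_table = {alias: table
                  for table, alias in re.findall(r"(?:from|join)\s+(\w+)\s+(\w+)", script, re.IGNORECASE)}

# Collect alias.column references and resolve each alias to its table
pairs = sorted({(alias_to_table.get(alias, alias), column)
                for alias, column in re.findall(r"\b(\w+)\.(\w+)\b", script)})

print("Table  Column")
for table, column in pairs:
    print(table, column)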

How to count rows with SELECT COUNT(*) with SQLAlchemy?

I'd like to know if it's possible to generate a SELECT COUNT(*) FROM TABLE statement in SQLAlchemy without explicitly asking for it with execute().
If I use:
session.query(table).count()
then it generates something like:
SELECT count(*) AS count_1 FROM
(SELECT table.col1 as col1, table.col2 as col2, ... from table)
which is significantly slower in MySQL with InnoDB. I am looking for a solution that doesn't require the table to have a known primary key, as suggested in Get the number of rows in table using SQLAlchemy.
Query for just a single known column:
session.query(MyTable.col1).count()
I managed to render the following SELECT with SQLAlchemy on both layers (SQL Expression and ORM).
SELECT count(*) AS count_1
FROM "table"
Usage from the SQL Expression layer:
from sqlalchemy import select, func, Integer, Table, Column, MetaData

metadata = MetaData()
table = Table("table", metadata,
              Column('primary_key', Integer),
              Column('other_column', Integer)  # just to illustrate
              )
print(select([func.count()]).select_from(table))
Usage from the ORM layer:
You just subclass Query (you probably have already) and provide a specialized count() method, like this one:
from sqlalchemy.orm import Query
from sqlalchemy.sql.expression import func

class BaseQuery(Query):
    def count_star(self):
        count_query = (self.statement.with_only_columns([func.count()])
                       .order_by(None))
        return self.session.execute(count_query).scalar()
Please note that order_by(None) resets the ordering of the query, which is irrelevant to the counting.
Using this method you can have a count(*) on any ORM Query that will honor all the filter and join conditions already specified.
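For example, a hypothetical wiring of this custom query class into a session (MyModel and its active column are made up for illustration; query_cls is the sessionmaker parameter that substitutes a custom Query subclass):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Hypothetical engine URL and model; adjust for your application
engine = create_engine("sqlite:///example.db")
Session = sessionmaker(bind=engine, query_cls=BaseQuery)
session = Session()

total = session.query(MyModel).filter(MyModel.active.is_(True)).count_star()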
I needed to do a count of a very complex query with many joins. I was using the joins as filters, so I only wanted to know the count of the actual objects. count() was insufficient, but I found the answer in the docs here:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
The code would look something like this (to count user objects):
from sqlalchemy import func
session.query(func.count(User.id)).scalar()
Addition to the Usage from the ORM layer in the accepted answer: count(*) can be done for ORM using the query.with_entities(func.count()), like this:
session.query(MyModel).with_entities(func.count()).scalar()
It can also be used in more complex cases, when we have joins and filters - the important thing here is to place with_entities after joins, otherwise SQLAlchemy could raise the Don't know how to join error.
For example:
we have User model (id, name) and Song model (id, title, genre)
we have user-song data - the UserSong model (user_id, song_id, is_liked), where user_id + song_id is the primary key
We want to get a number of user's liked rock songs:
SELECT count(*)
FROM user_song
JOIN song ON user_song.song_id = song.id
WHERE user_song.user_id = %(user_id)
AND user_song.is_liked IS 1
AND song.genre = 'rock'
This query can be generated in the following way:
from sqlalchemy import func, and_

user_id = 1
query = session.query(UserSong)
query = query.join(Song, Song.id == UserSong.song_id)
query = query.filter(
    and_(
        UserSong.user_id == user_id,
        UserSong.is_liked.is_(True),
        Song.genre == 'rock'
    )
)
# Note: important to place `with_entities` after the join
query = query.with_entities(func.count())
liked_count = query.scalar()
Complete example is here.
If you are using the SQL Expression Style approach there is another way to construct the count statement if you already have your table object.
Preparations to get the table object (there are also other ways):
import sqlalchemy
database_engine = sqlalchemy.create_engine("connection string")
# Populate existing database via reflection into sqlalchemy objects
database_metadata = sqlalchemy.MetaData()
database_metadata.reflect(bind=database_engine)
table_object = database_metadata.tables.get("table_name") # This is just for illustration how to get the table_object
Issuing the count query on the table_object:
query = table_object.count()
# This will produce something like the following, where id is a primary key
# column in "table_name" automatically selected by SQLAlchemy:
# 'SELECT count(table_name.id) AS tbl_row_count FROM table_name'
count_result = database_engine.scalar(query)
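If you want the explicit SELECT count(*) form from the earlier answer instead of the primary-key count that Table.count() emits, a sketch on the same reflected table_object (this mirrors the SQL Expression usage shown above; it is not part of the original answer):

from sqlalchemy import select, func

# SELECT count(*) FROM table_name, built from the reflected table object
count_query = select([func.count()]).select_from(table_object)
count_result = database_engine.scalar(count_query)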
I'm not clear on what you mean by "without explicitly asking for it with execute()" So this might be exactly what you are not asking for.
OTOH, this might help others.
You can just run the textual SQL:
from sqlalchemy import text

your_query = """
SELECT count(*) from table
"""
the_count = session.execute(text(your_query)).scalar()
def test_query(val: str):
    query = f"select count(*) from table where col1 = '{val}'"
    return session.execute(text(query)).scalar()
You can also find the right form by inspecting the generated query in a debug watch.
query = session.query(table.column).filter().with_entities(func.count(table.column.distinct()))
count = query.scalar()
This worked for me.
Gives the query:
SELECT count(DISTINCT table.column) AS count_1
FROM table where ...
Below is a way to find the count of any query:
from sqlalchemy import alias, func

aliased_query = alias(query)
db.session.query(func.count('*')).select_from(aliased_query).scalar()
See the SQLAlchemy reference documentation if you want to explore more options or read the details.
