I have a dataset of a million entries; it comprises songs and their artists.
I have
a track_id
an artist_id.
There are 3 tables
tracks (track_id, title, artist_id),
artists(artist_id and artist_name) and
artist_term (artist_id and term).
Using only one query, I have to count the number of tracks whose artists don't have any linked terms.
For more reference, the schema of the DB is as follows:
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text, year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text, FOREIGN KEY(artist_id)
REFERENCES artists(artist_id));
How do I get to the solution? Please help!
You can use not exists:
select count(*) cnt
from tracks t
where not exists (select 1 from artist_term at where at.artist_id = t.artist_id)
As far as I can tell, you do not need to bring in the artists table, since artist_id is available in both the tracks and artist_term tables.
For performance, you want an index on tracks(artist_id) and another one on artist_term(artist_id).
A left anti-join would also get the job done:
select count(*) cnt
from tracks t
left join artist_term at on at.artist_id = t.artist_id
where at.artist_id is null
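To check that the two forms agree, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the index names, and the alias tm are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with a trimmed-down version of the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (track_id text PRIMARY KEY, title text, artist_id text);
    CREATE TABLE artists (artist_id text, artist_name text);
    CREATE TABLE artist_term (artist_id text, term text);

    -- Indexes on the join columns (index names are hypothetical).
    CREATE INDEX idx_tracks_artist ON tracks (artist_id);
    CREATE INDEX idx_artist_term_artist ON artist_term (artist_id);

    -- Illustrative data: artist a1 has terms, artist a2 has none.
    INSERT INTO artists VALUES ('a1', 'With Terms'), ('a2', 'No Terms');
    INSERT INTO tracks VALUES ('t1', 'One', 'a1'), ('t2', 'Two', 'a2'), ('t3', 'Three', 'a2');
    INSERT INTO artist_term VALUES ('a1', 'rock'), ('a1', 'pop');
""")

# Form 1: NOT EXISTS.
not_exists_count = conn.execute("""
    SELECT count(*)
    FROM tracks t
    WHERE NOT EXISTS (SELECT 1 FROM artist_term tm WHERE tm.artist_id = t.artist_id)
""").fetchone()[0]

# Form 2: left anti-join (keep only the tracks with no matching term row).
anti_join_count = conn.execute("""
    SELECT count(*)
    FROM tracks t
    LEFT JOIN artist_term tm ON tm.artist_id = t.artist_id
    WHERE tm.artist_id IS NULL
""").fetchone()[0]

print(not_exists_count, anti_join_count)
```

With this sample data both queries count the two tracks that belong to the artist without terms.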
You can join the tables tracks and artists, and left join the table artist_term, to find the unmatched artist_ids:
select count(distinct t.track_id)
from tracks t
inner join artists a on a.artist_id = t.artist_id
left join artist_term at on at.artist_id = a.artist_id
where at.artist_id is null
The condition at.artist_id is null in the WHERE clause will return only the unmatched rows which will be counted.
If I'm not mistaken, such a query can be built in much the same way as in its sibling SQL dialects. If so, it should look something like this:
SELECT COUNT(track_id)
FROM tracks as t
WHERE EXISTS (
SELECT *
FROM artists AS a
WHERE a.artist_id = t.artist_id
AND NOT EXISTS(
SELECT *
FROM artist_term as at
WHERE at.artist_id = a.artist_id
)
)
So this query basically says: count the number of different tracks (marked by their unique track_id), where there is an artist that has the same artist_id, where no artist_term exists that refers to the artist_id of the artist.
Hope this helps!
I have 3 tables in SQLite database:
Songs:
_id | name | length | artist_id (foreign key) | album_id (foreign key)
Artists:
_id | name
Albums:
_id | name
I need a query (use it in an Android app) of a table that consists of the following columns:
_id | name | length | artist_id | artist_name | album_id | album_name
However, I write the following query statement:
SELECT Songs._id, Songs.name, Songs.length, Songs.artist_id, Artists.name, Songs.album_id, Albums.name FROM Songs, Artists, Albums WHERE Songs.artist_id = Artists._id AND Songs.album_id = Albums._id
but it gives me an empty table. I tried OR instead of AND and it gives incorrect results (every song duplicates in each album, though the artist is correct). How can I fix my query statement to join the 3 tables in a single table?
Using an explicit JOIN instead of an implicit one, the following should get what you want, although it is curious that your implicit join syntax did not return correct results in the first place. I have used a LEFT JOIN, to account for songs which do not have an associated artist or album, but that may not be necessary for your data set and an INNER JOIN could be used instead.
I have also added column aliases to eliminate ambiguity when fetching rows, since you have similar column names in most of your tables (id, name).
SELECT
Songs._id AS song_id,
Songs.name AS song_name,
Songs.length,
Songs.artist_id AS artist_id,
Artists.name AS artist_name,
Songs.album_id AS album_id,
Albums.name AS album_name
FROM
Songs
LEFT JOIN Artists ON Songs.artist_id = Artists._id
LEFT JOIN Albums ON Songs.album_id = Albums._id
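The difference between the implicit inner join and the LEFT JOIN version can be seen in a small sketch with Python's sqlite3 module. The sample rows are assumptions made up for illustration. They show one way the original query could lose rows: a song whose album_id (or artist_id) has no match is dropped by the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Albums (_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Songs (_id INTEGER PRIMARY KEY, name TEXT, length INTEGER,
                        artist_id INTEGER, album_id INTEGER);

    -- Illustrative data: song 2 has no album.
    INSERT INTO Artists VALUES (1, 'Artist A');
    INSERT INTO Albums VALUES (1, 'Album X');
    INSERT INTO Songs VALUES (1, 'Song 1', 200, 1, 1);
    INSERT INTO Songs VALUES (2, 'Song 2', 180, 1, NULL);
""")

# Implicit inner join, as in the question: unmatched songs disappear.
inner_rows = conn.execute("""
    SELECT Songs._id, Artists.name, Albums.name
    FROM Songs, Artists, Albums
    WHERE Songs.artist_id = Artists._id AND Songs.album_id = Albums._id
    ORDER BY Songs._id
""").fetchall()

# Explicit LEFT JOINs: every song is kept, missing names come back as NULL.
left_rows = conn.execute("""
    SELECT Songs._id, Artists.name AS artist_name, Albums.name AS album_name
    FROM Songs
    LEFT JOIN Artists ON Songs.artist_id = Artists._id
    LEFT JOIN Albums ON Songs.album_id = Albums._id
    ORDER BY Songs._id
""").fetchall()

print(inner_rows)
print(left_rows)
```

If the inner join returns nothing at all on real data, it may be worth checking whether artist_id and album_id are actually populated in Songs.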
Try this select. Maybe the Artists table is more important than the others, so here the Songs come through Artists, and the Albums come from Songs.
SELECT
Songs._id AS song_id,
Songs.name AS song_name,
Songs.length,
Songs.artist_id AS artist_id,
Artists.name AS artist_name,
Songs.album_id AS album_id,
Albums.name AS album_name
FROM
Artists
LEFT JOIN Songs ON Songs.artist_id = Artists._id
LEFT JOIN Albums ON Songs.album_id = Albums._id
Also if there is no entry in Songs belonging to a particular artist or no entry in Albums belonging to a particular song, you will still get the artist entry thanks to the LEFT JOIN. If you would like to return only artists with songs and albums, use JOIN instead.
Try an SQL inner join:
inner join Artists on Songs.artist_id = Artists._id
I have a database in SQL Server which contains a huge number of tables. Each table has 'id' and 'value' columns. I want to join all the tables whose names contain a certain text (e.g., 'ts2') on their id (id is a common key). So my desired table should have 'id' and the 'value' of each table, labelled with the name of that table. For example:
TableAts2:
id,
value
TableBts2:
id,
value
Tablts2C:
id,
value
...
My desired table:
mytable:
id, value_TableAts2,value_TableBts2, value_Tablts2C
Some source data tables (this answer uses MySQL syntax)
CREATE TABLE TableAts2 (id INT, value INT);
INSERT INTO TableAts2 VALUES (1,1), (2,2);
CREATE TABLE TableBts2 LIKE TableAts2;
INSERT INTO TableBts2 VALUES (1,11), (3,33);
CREATE TABLE TableCts2 LIKE TableAts2;
INSERT INTO TableCts2 VALUES (2,222), (3,333);
Build the query text
SELECT
CONCAT( 'SELECT id, ',
GROUP_CONCAT(table_name, '.value value_',table_name) ,
'\nFROM (',
GROUP_CONCAT('SELECT id FROM ',table_name SEPARATOR ' UNION '),
') ids\nLEFT JOIN ',
GROUP_CONCAT(table_name, ' USING (id)' SEPARATOR '\nLEFT JOIN ')
)
INTO @sql
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE();
Check the query built
SELECT @sql;
SELECT id, TableAts2.value value_TableAts2,TableBts2.value value_TableBts2,TableCts2.value value_TableCts2
FROM (SELECT id FROM TableAts2 UNION SELECT id FROM TableBts2 UNION SELECT id FROM TableCts2) ids
LEFT JOIN TableAts2 USING (id)
LEFT JOIN TableBts2 USING (id)
LEFT JOIN TableCts2 USING (id)
Execute the query
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
id | value_TableAts2 | value_TableBts2 | value_TableCts2
1 | 1 | 11 | null
2 | 2 | null | 222
3 | null | 33 | 333
Of course, the query also needs a check that each table contains both the id and value columns (via a corresponding subquery against INFORMATION_SCHEMA.COLUMNS).
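The same idea (discover the matching tables, keep only those that have both id and value columns, then build the query text) can be sketched in Python. This is only an illustration against SQLite through the built-in sqlite3 module, not SQL Server or MySQL. The function name build_wide_query and the extra table Notes_ts2 are hypothetical, and the joins use explicit ON clauses rather than USING.

```python
import sqlite3

def build_wide_query(conn, pattern):
    """Build one SELECT joining every table whose name matches `pattern`."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE ? ORDER BY name",
        (pattern,))]
    # Keep only the tables that really have both an id and a value column.
    tables = []
    for name in names:
        columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{name}")')}
        if {"id", "value"} <= columns:
            tables.append(name)
    if not tables:
        raise ValueError("no matching tables with id and value columns")
    select_list = ", ".join(f'"{t}".value AS "value_{t}"' for t in tables)
    # The union of all ids is the driving set, so no id is lost.
    all_ids = " UNION ".join(f'SELECT id FROM "{t}"' for t in tables)
    joins = "\n".join(f'LEFT JOIN "{t}" ON "{t}".id = ids.id' for t in tables)
    return f"SELECT ids.id, {select_list}\nFROM ({all_ids}) AS ids\n{joins}\nORDER BY ids.id"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableAts2 (id INT, value INT);
    CREATE TABLE TableBts2 (id INT, value INT);
    CREATE TABLE TableCts2 (id INT, value INT);
    CREATE TABLE Notes_ts2 (id INT, note TEXT);  -- no value column, so it is skipped
    INSERT INTO TableAts2 VALUES (1, 1), (2, 2);
    INSERT INTO TableBts2 VALUES (1, 11), (3, 33);
    INSERT INTO TableCts2 VALUES (2, 222), (3, 333);
""")

sql = build_wide_query(conn, "%ts2%")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Table names are interpolated into the SQL text because identifiers cannot be bound as parameters, so this sketch assumes the names come from the catalog and not from user input.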
I think this should work fine for you:
select TableAts2.*, TableBts2.*, Tablts2C.*
from TableAts2
inner join TableBts2 on TableAts2.id = TableBts2.id
inner join Tablts2C on TableBts2.id = Tablts2C.id
First, you need to add one column (user_id) to each table.
For example
TableAts2:
id,
user_id,
value
TableBts2:
id,
user_id,
value
Tablts2C:
id,
user_id,
value
...
Second, you need to join the tables on user_id.
Now please run this SQL command:
select t.value_TableAts2 as value_TableAts2,
       t.value_TableBts2 as value_TableBts2,
       Tablts2C.value as value_Tablts2C
from (select TableAts2.value as value_TableAts2,
             TableBts2.value as value_TableBts2,
             TableAts2.user_id as user_id
      from TableAts2
      left join TableBts2 on TableAts2.user_id = TableBts2.user_id) as t
left join Tablts2C on t.user_id = Tablts2C.user_id
Result:
Your desired table:
id, value_TableAts2,value_TableBts2, value_Tablts2C
I am learning to use SQL from Python with SQLAlchemy and would much appreciate help with this.
I have 3 tables,
Table 1 (Actors) : nconst (primary key), names
Table 2 (Movies) : tconst (primary key) , titles
Table 3 (Junction table) : nconst (from Actors table) , tconst(from Movies table)
I am trying to obtain 10 rows of actors that acted in particular movies. Hence I am trying to do an inner join of Actors on Junction table (using nconst) and then another inner join onto Movies table.
In SQL, this means
FROM principals INNER JOIN actors
ON principals.nconst=actors.nconst INNER JOIN
movies ON principals.tconst=movies.tconst
In SQLAlchemy, my current code is:
mt = list(session.query(Movies, Principals, Actors).select_from(
join(Movies, Principals, Movies.tconst == Principals.tconst)
.join(Actors, Principals, Actors.nconst == Principals.nconst
).with_entities(
Movies.title, # Select clause
))
Alternatively, I am trying
from sqlalchemy.orm import join
mv = list(session.query(Actors).select_from(
join(Movies, Principals, Actors, Movies.tconst == Principals.tconst,
Actors.nconst == Principals.nconst) # Join clause
).with_entities(
Actors.name, # Select clause
Movies.title,
))
mv
The error I am getting is an AttributeError: "type object 'Actors' has no attribute '_from_objects'".
I would much appreciate help with this. Thank you very much.
This is my query using code found perusing this site:
query="""SELECT Family
FROM Table2
INNER JOIN Table1 ON Table1.idSequence=Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN ({seq})
""".format(seq=','.join(['?']*len(matchIds_list)))
matchIds_list is a list of tuples in (?,?) format.
It works if I just ask for one condition (i.e. just Table1.Chromosome as opposed to both Chromosome and hg19_coordinate) and matchIds_list is just a simple list of single values, but I don't know how to get it to work with a composite key, i.e. both columns.
Since you're running SQLite 3.7.17, I'd recommend just using a temporary table.
Create and populate your temporary table.
cursor.executescript("""
CREATE TEMP TABLE control_list (
Chromosome TEXT NOT NULL,
hg19_coordinate TEXT NOT NULL
);
CREATE INDEX control_list_idx ON control_list (Chromosome, hg19_coordinate);
""")
cursor.executemany("""
INSERT INTO control_list (Chromosome, hg19_coordinate)
VALUES (?, ?)
""", matchIds_list)
Just constrain your query to the control list temporary table.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM control_list
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
And finally perform your query (there's no need to format this one).
cursor.execute(query)
# Remove the temporary table since we're done with it.
cursor.execute("""
DROP TABLE control_list;
""")
Short Query (requires SQLite 3.15): You actually almost had it. You need to make the IN ({seq}) a subquery expression.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq});
Long Query (requires SQLite 3.8.3): It looks a little complicated, but it's pretty straightforward. Put your control list into a sub-select, and then constrain the main select by the control list.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM (
SELECT
-- Name the columns (must match order in tuples).
"" AS Chromosome,
":1" AS hg19_coordinate
FROM (
-- Get control list.
VALUES {seq}
) AS control_values
) AS control_list
-- Constrain Table1 to control_list.
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
Regardless of which query you use, when formatting the SQL replace {seq} with (?,?) for each composite key instead of just ?.
query = " ... ".format(seq=','.join(['(?,?)']*len(matchIds_list)))
And finally flatten matchIds_list when you execute the query because it is a list of tuples.
import itertools
cursor.execute(query, list(itertools.chain.from_iterable(matchIds_list)))
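Putting the placeholder formatting and the flattening together, a minimal end-to-end sketch of the short query might look like the following. It assumes the SQLite library bundled with Python is 3.15 or newer (needed for row values), and the sample rows are invented for illustration; only the table and column names come from the question.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (idSequence INTEGER, Chromosome TEXT, hg19_coordinate TEXT);
    CREATE TABLE Table2 (idSequence INTEGER, Family TEXT);
    -- Illustrative data only.
    INSERT INTO Table1 VALUES (1, 'chr1', '100'), (2, 'chr1', '200'), (3, 'chr2', '100');
    INSERT INTO Table2 VALUES (1, 'FamA'), (2, 'FamB'), (3, 'FamC');
""")

# Composite keys to match, as a list of (Chromosome, hg19_coordinate) tuples.
matchIds_list = [("chr1", "100"), ("chr2", "100")]

# One "(?,?)" group per composite key.
query = """
    SELECT Family
    FROM Table2
    INNER JOIN Table1
        ON Table1.idSequence = Table2.idSequence
    WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq})
    ORDER BY Family
""".format(seq=",".join(["(?,?)"] * len(matchIds_list)))

# Flatten the list of tuples into one flat parameter list.
params = list(itertools.chain.from_iterable(matchIds_list))
families = [row[0] for row in conn.execute(query, params)]
print(families)
```

Only the placeholders are formatted into the SQL text; the values themselves are still bound as parameters. For very long lists the temporary-table approach may be the safer choice, since SQLite limits the number of bound parameters per statement.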
I want to do a SELECT statement that will get all the data in one table + follow all the foreign keys from that table with a LEFT OUTER JOIN. For example:
`orderitem`
id
name
title_id
`title`
id
name
In the above example, I would be able to use the statement:
SELECT * FROM orderitem LEFT OUTER JOIN title on orderitem.title_id=title.id
Is there a way that I could do this not knowing the table structure? That is, to have a function like the following:
def get_select_statement(table)
???
get_select_statement(orderitem)
==> "SELECT * FROM orderitem LEFT OUTER JOIN title on orderitem.title_id=title.id"
How would this be done?
To clarify this question, I think I'm looking for the following information from this function:
What are all the column names in the given table?
What tables do they reference in a ForeignKey relationship and what is the relationship to be able to join?
In addition, note that not all orderitems will have a title, so doing any sort of INNER JOIN would delete data.
In MySQL you can retrieve the column names with the DESCRIBE statement:
DESCRIBE table_name;
And all info about foreign keys:
select *
from information_schema.KEY_COLUMN_USAGE
where TABLE_SCHEMA = "schema_name"
and TABLE_NAME="table_name"
and REFERENCED_TABLE_NAME IS NOT NULL
To run these queries and load the results in Python you could use the SQLAlchemy package, for example:
import sqlalchemy

# Note: Engine.execute() is the SQLAlchemy 1.x API; it was removed in 2.0.
engine = sqlalchemy.create_engine("mysql+mysqldb://user:password@host/db")
res = engine.execute("DESCRIBE table_name;")
columns = [row["Field"] for row in res]
# Fetch the rows once: a result can only be iterated a single time.
fk_rows = engine.execute(query_for_foreign_keys).fetchall()
foreign_keys = [row["COLUMN_NAME"] for row in fk_rows]
referenced_column_names = [row["REFERENCED_COLUMN_NAME"] for row in fk_rows]
referenced_table_names = [row["REFERENCED_TABLE_NAME"] for row in fk_rows]
Then you could generate the query using all the data above.
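As a rough sketch of that last step, the function below assembles the statement from the foreign-key metadata. The signature is an assumption: it takes the table name plus a list of (column, referenced_table, referenced_column) tuples, which is the information the KEY_COLUMN_USAGE query returns. It does not quote identifiers, so it assumes plain, trusted names.

```python
def get_select_statement(table, foreign_keys):
    """Build a SELECT that LEFT OUTER JOINs every table referenced by `table`.

    `foreign_keys` is a list of (column, referenced_table, referenced_column)
    tuples; this shape is an assumption made for the sketch.
    """
    joins = [
        f"LEFT OUTER JOIN {ref_table} ON {table}.{column} = {ref_table}.{ref_column}"
        for column, ref_table, ref_column in foreign_keys
    ]
    # With no foreign keys this degrades to a plain SELECT * FROM table.
    return " ".join([f"SELECT * FROM {table}", *joins])

# Example from the question: orderitem.title_id references title.id.
statement = get_select_statement("orderitem", [("title_id", "title", "id")])
print(statement)
```

Because every join is a LEFT OUTER JOIN, orderitems without a title are kept, which matches the requirement in the question. If the same table is referenced by more than one column, each join would additionally need its own alias.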