Recursive SQLite query given a parent ID - Python

I have an SQLite database with this structure:
create table categories
( CategoryID integer not null primary key
, CategoryName text not null
, CategoryParentID integer null
, FOREIGN KEY (CategoryParentID) REFERENCES categories(CategoryID)
);
And given a parent ID, I want to get all of its descendants (as a tree) down to n levels, but I don't know how to do it.
I'd really appreciate your help on this.
I already have something like this, but I don't know where to put the parent-ID condition to start the search:
WITH RECURSIVE cte_categories (CategoryID, CategoryName, CategoryParentID, depth) AS (
SELECT CategoryID, CategoryName, CategoryParentID, 1
FROM categories
WHERE CategoryParentID IS NULL
UNION ALL
SELECT c.CategoryID, c.CategoryName, c.CategoryParentID, r.depth + 1
FROM categories AS c
INNER JOIN cte_categories AS r ON (c.CategoryParentID = r.CategoryID)
)
SELECT CategoryName, depth, CategoryParentID
FROM cte_categories ORDER BY depth, CategoryName;
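One way to anchor the recursion at a given parent is to parameterize the anchor member instead of testing for IS NULL: the anchor selects the direct children of the given parent, and a depth check in the recursive member caps the tree at n levels. A minimal sketch from Python's sqlite3 (the database path, parent_id and n_levels are placeholder names, not from the original question):

import sqlite3

conn = sqlite3.connect("categories.db")  # placeholder path
sql = """
WITH RECURSIVE cte_categories (CategoryID, CategoryName, CategoryParentID, depth) AS (
    SELECT CategoryID, CategoryName, CategoryParentID, 1
    FROM categories
    WHERE CategoryParentID = ?            -- anchor: direct children of the given parent
    UNION ALL
    SELECT c.CategoryID, c.CategoryName, c.CategoryParentID, r.depth + 1
    FROM categories AS c
    INNER JOIN cte_categories AS r ON c.CategoryParentID = r.CategoryID
    WHERE r.depth < ?                     -- stop after n levels
)
SELECT CategoryName, depth, CategoryParentID
FROM cte_categories
ORDER BY depth, CategoryName;
"""
rows = conn.execute(sql, (parent_id, n_levels)).fetchall()

If you want the starting category itself in the result, anchor on CategoryID = ? instead.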

Related

How do I get a SQLite query to return the id of a table when I have two tables that have an attribute named id?

I can't seem to find anything on how to access the id attribute from the table I want. I have 4 tables that I have joined: users, workouts, exercises, and sets. They all have primary keys with the attribute name id.
My query:
query = """SELECT users.firstName, workouts.dateandtime, workouts.id, sets.*, exercises.name FROM users
JOIN workouts ON users.id = workouts.userID JOIN sets ON workouts.id = sets.workoutID JOIN exercises ON
sets.exerciseID = exercises.id WHERE users.id = ? ORDER BY sets.id DESC"""
I'm only grabbing workouts.id and sets.id, because users.id is already known when the user logs in and exercises.id is shared across all users, so it isn't important in this step.
Trying to access the sets.id like this does not work:
posts_unsorted = cur.execute(query, userID).fetchall()
for e in posts_unsorted:
print(e['id']) # Prints workouts.id I'm assuming because it's the first id I grab in the query
print(e['sets.id']) # Error because sets.id does not exist
Is there a way to name the sets.id when making the query so that I can actually use it? Should I be setting up my database differently to grab the sets.id? I don't know what direction I should be going.
The post How do you avoid column name conflicts? shows that you can give your tables aliases. Aliases make it easier to refer to your tables in queries, and they also let you control what the columns in your result set are named.
If you have two tables that both have an attribute called id, you will need to give the columns aliases to be able to access both attributes.
An example:
.schema sets
CREATE TABLE "sets"(
id INTEGER NOT NULL,
interval INTEGER NOT NULL,
workoutID INTEGER NOT NULL,
PRIMARY KEY id,
FORGIEN KEY workoutID REFERENCES workouts(id)
);
.schema workouts
CREATE TABLE "workouts"(
id INTEGER NOT NULL,
date SMALLDATETIME NOT NULL,
PRIMARY KEY id,
FORGIEN KEY workoutID REFERENCES workouts(id)
);
Fill the database:
INSERT INTO workouts (date) VALUES ('2022-03-14'), ('2022-02-13');
INSERT INTO sets (interval, workoutID) VALUES (5, 1), (4, 1), (3, 2), (2, 2);
Both tables have a primary key labeled id. If you must access both ids you will need to add an alias in your query.
database = sqlite3.connect("name.db")
database.row_factory = sqlite3.Row
cur = database.cursor()
query = """SELECT sets.id AS s_id, workouts.date AS w_date, workouts.id AS w_id
FROM sets JOIN workouts ON sets.workoutID = workouts.id"""
posts = cur.execute(query).fetchall()
This will return sqlite3.Row objects, making it easy to retrieve the data you want by name. The data will look like this:
[{'s_id':1, 'w_date':'2022-03-14', 'w_id':1},
{'s_id':2, 'w_date':'2022-03-14', 'w_id':1},
{'s_id':3, 'w_date':'2022-02-13', 'w_id':2},
{'s_id':4, 'w_date':'2022-02-13', 'w_id':2}]
With this set of data you will be able to access everything by name instead of index.
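For example, with the aliases above each returned row supports lookup by name (a short sketch reusing the posts result from the answer):

for row in posts:
    # sqlite3.Row allows dictionary-style access by column alias
    print(row["s_id"], row["w_date"], row["w_id"])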

How to know which records cause an issue when I run a SQL MERGE statement in Python

I was running the code below in Python. It does a merge from one table to another table, but sometimes it fails with errors due to duplicates. How do I know which records have been merged and which have not, so that I can trace the records and fix them? Or at least, how can I make my code log a more helpful error message so that I can trace it?
# Exact match client on NAME/DOB (not yet using name_dob_v)
sql = """
merge into nf.es es using (
select id, name_last, name_first, dob
from fd.emp
where name_last is not null and name_first is not null and dob is not null
) es6
on (upper(es.patient_last_name) = upper(es6.name_last) and upper(es.patient_first_name) = upper(es6.name_first)
and es.patient_dob = es6.dob)
when matched then update set
es.client_id = es6.id
, es.client_id_comment = '2 exact name/exact dob match'
where
es.client_id is null -- exclude those already matched
and es.patient_last_name is not null and es.patient_first_name is not null and es.patient_dob is not null
and es.is_lock = 'Locked' and es.is_active = 'Yes' and es.patient_last_name NOT IN ('DOE','UNKNOWN','DELETE', 'CANCEL','CANCELLED','CXL','REFUSED')
"""
log.info(sql)
curs.execute(sql)
msg = "nf.es rows updated with es6 client_id due to exact name/dob match: %d" % curs.rowcount
log.info(msg)
emailer.append(msg)
You can't know; MERGE won't tell you. You have to actually find the duplicates and take appropriate action.
Maybe it'll help if you select distinct values:
merge into nf.es es using (
select DISTINCT --> this
id, name_last, name_first, dob
from fd.emp
...
If it still doesn't work, join the table to be merged with the one in the USING clause on all the columns you are already joining on, and see which rows are duplicated. Something like this:
SELECT *
FROM (SELECT d.id,
d.name_last,
d.name_first,
d.dob
FROM fd.emp d
JOIN nf.es e
ON UPPER (e.patient_last_name) = UPPER (d.name_last)
AND UPPER (e.patient_first_name) = UPPER (d.name_first)
AND e.patient_dob = d.dob
WHERE d.name_last IS NOT NULL
AND d.name_first IS NOT NULL
AND d.dob IS NOT NULL)
GROUP BY id,
name_last,
name_first,
dob
HAVING COUNT (*) > 1;
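To make the Python script log the offending records before the MERGE runs, you could execute a duplicate check on the join keys and push the hits through the existing logger. A sketch, reusing the curs and log objects from the question and the key columns from the MERGE's ON clause:

dup_sql = """
select upper(name_last) name_last, upper(name_first) name_first, dob, count(*) cnt
from fd.emp
where name_last is not null and name_first is not null and dob is not null
group by upper(name_last), upper(name_first), dob
having count(*) > 1
"""
curs.execute(dup_sql)
for name_last, name_first, dob, cnt in curs.fetchall():
    # each hit is a source key that would match the same nf.es row more than once
    log.warning("duplicate source key (%d rows): %s %s %s", cnt, name_last, name_first, dob)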

Python: use one sqlite query to find the NOT EXISTS result

I have a dataset of a million entries, comprised of songs and their artists.
I have
a track_id
an artist_id.
There are 3 tables
tracks (track_id, title, artist_id),
artists(artist_id and artist_name) and
artist_term (artist_id and term).
Using only one query, I have to count the number of tracks whose artists don't have any linked terms.
For more reference, the schema of the DB is as follows:
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text, year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text, FOREIGN KEY(artist_id)
REFERENCES artists(artist_id));
How do I get to the solution? Please help!
You can use not exists:
select count(*) cnt
from tracks t
where not exists (select 1 from artist_term at where at.artist_id = t.artist_id)
Note that you do not need to bring in the artists table, since artist_id is available in both the tracks and artist_term tables.
For performance you want an index on tracks(artist_id) and another one on artist_term(artist_id).
An anti-join (a left join keeping only the unmatched rows) would also get the job done:
select count(*) cnt
from tracks t
left join artist_term at on at.artist_id = t.artist_id
where at.artist_id is null
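Either version runs unchanged from Python's sqlite3. A sketch with the suggested indexes created first (the database path and index names are placeholders):

import sqlite3

conn = sqlite3.connect("music.db")  # placeholder path
# The indexes let the NOT EXISTS probe hit artist_term by artist_id directly.
conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_artist ON tracks(artist_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_artist_term_artist ON artist_term(artist_id)")

(cnt,) = conn.execute("""
    select count(*)
    from tracks t
    where not exists (select 1 from artist_term at where at.artist_id = t.artist_id)
""").fetchone()
print("tracks whose artist has no linked terms:", cnt)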
You can join the tables tracks and artists and left join the table artist_term so to find the unmatched artist_ids:
select count(distinct t.track_id)
from tracks t
inner join artists a on a.artist_id = t.artist_id
left join artist_term at on at.artist_id = a.artist_id
where at.artist_id is null
The condition at.artist_id is null in the WHERE clause will return only the unmatched rows which will be counted.
If I'm not mistaken, such a query can be built in a similar fashion to its sibling SQL dialects. If so, it should look something like this:
SELECT COUNT(track_id)
FROM tracks AS t
WHERE EXISTS (
    SELECT *
    FROM artists AS a
    WHERE a.artist_id = t.artist_id
    AND NOT EXISTS (
        SELECT *
        FROM artist_term AS at
        WHERE at.artist_id = a.artist_id
    )
)
So this query basically says: count the number of different tracks (identified by their unique track_id) for which there is an artist with the same artist_id, and no artist_term exists that refers to that artist's artist_id.
Hope this helps!

Trying to SELECT row by long list of composite primary keys in SQLite

This is my query using code found perusing this site:
query="""SELECT Family
FROM Table2
INNER JOIN Table1 ON Table1.idSequence=Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN ({seq})
""".format(seq=','.join(['?']*len(matchIds_list)))
matchIds_list is a list of tuples in (?,?) format.
It works if I just ask for one condition (i.e. just Table1.Chromosome as opposed to both Chromosome and hg19_coordinate) and matchIds_list is just a simple list of single values, but I don't know how to get it to work with a composite key across both columns.
Since you're running SQLite 3.7.17, I'd recommend just using a temporary table.
Create and populate your temporary table.
cursor.executescript("""
CREATE TEMP TABLE control_list (
Chromosome TEXT NOT NULL,
hg19_coordinate TEXT NOT NULL
);
CREATE INDEX control_list_idx ON control_list (Chromosome, hg19_coordinate);
""")
cursor.executemany("""
INSERT INTO control_list (Chromosome, hg19_coordinate)
VALUES (?, ?)
""", matchIds_list)
Just constrain your query to the control list temporary table.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM control_list
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
And finally perform your query (there's no need to format this one).
cursor.execute(query)
# Remove the temporary table since we're done with it.
cursor.execute("""
DROP TABLE control_list;
""")
Short Query (requires SQLite 3.15): You actually almost had it. You need to make the IN ({seq}) a subquery expression.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq});
Long Query (requires SQLite 3.8.3): It looks a little complicated, but it's pretty straightforward. Put your control list into a sub-select, and then constrain the main select by the control list.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM (
SELECT
-- Name the columns (must match order in tuples).
"" AS Chromosome,
":1" AS hg19_coordinate
FROM (
-- Get control list.
VALUES {seq}
) AS control_values
) AS control_list
-- Constrain Table1 to control_list.
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
Regardless of which query you use, when formatting the SQL replace {seq} with (?,?) for each composite key instead of just ?.
query = " ... ".format(seq=','.join(['(?,?)']*len(matchIds_list)))
And finally flatten matchIds_list when you execute the query because it is a list of tuples.
import itertools
cursor.execute(query, list(itertools.chain.from_iterable(matchIds_list)))
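Putting the short-query pieces together (a sketch; it assumes SQLite 3.15+ and the cursor and matchIds_list from the question):

import itertools

seq = ",".join(["(?,?)"] * len(matchIds_list))  # one (?,?) per composite key
query = """
SELECT Family
FROM Table2
INNER JOIN Table1 ON Table1.idSequence = Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq})
""".format(seq=seq)
# Flatten the list of (chromosome, coordinate) tuples into one flat parameter list.
params = list(itertools.chain.from_iterable(matchIds_list))
families = cursor.execute(query, params).fetchall()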

How do I speed up (or break up) this MySQL query?

I'm building a video recommendation site (think pandora for music videos) in python and MySQL. I have three tables in my db:
video - a table of the videos. Data doesn't change. Columns are:
CREATE TABLE `video` (
id int(11) NOT NULL AUTO_INCREMENT,
website_id smallint(3) unsigned DEFAULT '0',
rating_global varchar(128) DEFAULT '0',
title varchar(256) DEFAULT NULL,
thumb_url text,
PRIMARY KEY (`id`),
KEY `websites` (`website_id`),
KEY `id` (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=49362 DEFAULT CHARSET=utf8
video_tag - a table of the tags (attributes) associated with each video. Doesn't change.
CREATE TABLE `video_tag` (
id int(7) NOT NULL AUTO_INCREMENT,
video_id mediumint(7) unsigned DEFAULT '0',
tag_id mediumint(7) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `video_id` (`video_id`),
KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=562456 DEFAULT CHARSET=utf8
user_rating - a table of good or bad ratings that the user has given each tag. Data always changing.
CREATE TABLE `user_rating` (
id int(11) NOT NULL AUTO_INCREMENT,
user_id smallint(3) unsigned DEFAULT '0',
tag_id int(5) unsigned DEFAULT '0',
tag_rating float(10,5) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `video` (`tag_id`),
KEY `user_id` (`user_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=447 DEFAULT CHARSET=utf8
Based on the user's preferences, I want to score each unwatched video, and try and predict what they will like best. This has resulted in the following massive query, which takes about 2 seconds to complete for 50,000 videos:
SELECT video_tag.video_id,
(sum(user_rating.tag_rating) * video.rating_global) as score
FROM video_tag
JOIN user_rating ON user_rating.tag_id = video_tag.tag_id
JOIN video ON video.id = video_tag.video_id
WHERE user_rating.user_id = 1 AND video.website_id = 2
AND rating_global > 0 AND video_id NOT IN (1,2,3) GROUP BY video_id
ORDER BY score DESC LIMIT 20
I desperately need to make this more efficient, so I'm just looking for advice as to what the best direction is. Some ideas I've considered:
a) Rework my db table structure (not sure how)
b) Offload more of the grouping and aggregation into Python (haven't figured out a way to join three tables that is actually faster)
c) Store the non-changing tables in memory to try and speed computation time (earlier tinkering hasn't yielded any gains yet..)
How would you recommend making this more efficient?
Thank you!
--
Per request in the comments, EXPLAIN SELECT.. shows:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE user_rating ref video,user_id user_id 3 const 88 Using where; Using temporary; Using filesort
1 SIMPLE video_tag ref video_id,tag_id tag_id 4 db.user_rating.tag_id 92 Using where
1 SIMPLE video eq_ref PRIMARY,websites,id PRIMARY 4 db.video_tag.video_id 1 Using where
Change the field type of rating_global to a numeric type (either float or integer); there is no need for it to be varchar. Personally I would change all rating fields to integer, as I see no need for them to be float.
Drop the KEY on id; the PRIMARY KEY is already indexed.
Consider a composite index covering video.id, rating_global and website_id.
Watch the integer lengths for your references (e.g. video_id -> video.id) or you may run out of numbers. These sizes should be the same.
I suggest the following 2-step solution to replace your query:
CREATE TEMPORARY TABLE rating_stats ENGINE=MEMORY
SELECT video_id, SUM(tag_rating) AS tag_rating_sum
FROM user_rating ur JOIN video_tag vt ON vt.tag_id = ur.tag_id AND ur.user_id = 1
GROUP BY video_id ORDER BY NULL
SELECT v.id, tag_rating_sum*rating_global AS score FROM video v
JOIN rating_stats rs ON rs.video_id = v.id
WHERE v.website_id=2 AND v.rating_global > 0 AND v.id NOT IN (1,2,3)
ORDER BY score DESC LIMIT 20
For the latter query to perform really fast, you could extend the PRIMARY KEY of the video table to include the website_id and rating_global fields (although perhaps website_id alone is enough).
You could also keep these statistics in another table and precalculate them dynamically based on user login/action frequency. I am guessing you can show the cached data instead of live results; there shouldn't be much difference.
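For reference, a sketch of the two-step approach driven from Python (it assumes a DB-API connection conn from a MySQL driver such as mysqlclient; the user id, website id and excluded video ids mirror the constants in the original query):

cur = conn.cursor()
# Step 1: aggregate this user's tag ratings per video into a MEMORY temp table.
cur.execute("""
    CREATE TEMPORARY TABLE rating_stats ENGINE=MEMORY
    SELECT video_id, SUM(tag_rating) AS tag_rating_sum
    FROM user_rating ur
    JOIN video_tag vt ON vt.tag_id = ur.tag_id AND ur.user_id = %s
    GROUP BY video_id
""", (1,))
# Step 2: score and rank only the candidate videos.
cur.execute("""
    SELECT v.id, rs.tag_rating_sum * v.rating_global AS score
    FROM video v
    JOIN rating_stats rs ON rs.video_id = v.id
    WHERE v.website_id = %s AND v.rating_global > 0 AND v.id NOT IN (1, 2, 3)
    ORDER BY score DESC
    LIMIT 20
""", (2,))
top_videos = cur.fetchall()
cur.execute("DROP TEMPORARY TABLE rating_stats")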
