How to make a list of ingredients in recipes in SQLite3 with Python - python

I'm trying to make a database of recipes. In the table "recipes", I would like the column "ingredients" to hold a list of ingredient IDs, e.g. [2,5,7]. Can I do something like this, or should I be looking for another solution?
import sqlite3
conn = sqlite3.connect('recipes.db')
c = conn.cursor()
c.execute('''CREATE TABLE recipes(ID INT, name TEXT, ingredients INT)''')
c.execute('''CREATE TABLE ingredients(ID INT, nazwa TEXT, kcal REAL)''')
Another idea is to make a separate table (the list of ingredients) with 15 columns holding ingredient numbers:
c.execute('''CREATE TABLE The_list_of_ingredients(ID INT, ingredient1 INT, ingredient2 INT, ...)''')
Can I connect each of ingredient1, ingredient2, ... to its respective ingredient ID?

You're likely looking for a many-to-many relation between recipes and their ingredients.
CREATE TABLE recipes(ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ingredients(ID INTEGER PRIMARY KEY, name TEXT, kcal REAL);
CREATE TABLE recipe_ingredients(
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    recipe_id INTEGER,
    ingredient_id INTEGER,
    quantity REAL,
    FOREIGN KEY(recipe_id) REFERENCES recipes(ID),
    FOREIGN KEY(ingredient_id) REFERENCES ingredients(ID)
);
This way your data might look something like this:

ingredients

id  name   kcal
1   egg    155
2   cream  196

recipes

id    name
1000  omelette

recipe_ingredients

recipe_id  ingredient_id  quantity
1000       1              100
1000       2              50

(assuming kcal is kcal per 100g, quantity is in grams, and a rather creamy omelette)
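To read that data back from Python, here's a minimal sketch using the built-in sqlite3 module (it assumes the three tables above already exist in recipes.db from the question, and 'omelette' is just the example recipe name):

import sqlite3

conn = sqlite3.connect('recipes.db')
c = conn.cursor()

# List every ingredient (and its quantity) for a given recipe by
# joining through the recipe_ingredients link table.
c.execute('''SELECT i.name, ri.quantity
             FROM recipe_ingredients ri
             JOIN ingredients i ON i.ID = ri.ingredient_id
             JOIN recipes r ON r.ID = ri.recipe_id
             WHERE r.name = ?''', ('omelette',))
for name, quantity in c.fetchall():
    print(name, quantity)
conn.close()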

You can try to store the IDs as a string:
json.dumps(list_of_ingredients_ids)
But the best solution is probably a many-to-many relation.
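A minimal sketch of that approach (the ingredients column is assumed to be TEXT here so it can hold the JSON string; json.loads reverses the encoding on the way out):

import json
import sqlite3

conn = sqlite3.connect('recipes.db')
c = conn.cursor()
c.execute('CREATE TABLE IF NOT EXISTS recipes(ID INTEGER PRIMARY KEY, name TEXT, ingredients TEXT)')

list_of_ingredients_ids = [2, 5, 7]
# Serialize the Python list to a JSON string before storing it...
c.execute('INSERT INTO recipes(name, ingredients) VALUES (?, ?)',
          ('omelette', json.dumps(list_of_ingredients_ids)))

# ...and parse it back into a list when reading the row.
ingredients = c.execute('SELECT ingredients FROM recipes').fetchone()[0]
print(json.loads(ingredients))  # [2, 5, 7]
conn.commit()
conn.close()

The drawback is that SQL can't JOIN against a serialized list, which is why the link-table schema above is usually preferred.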

Related

How to write a Python program to import a CSV data file into a relational database without inserting duplicate entries

I am using the following Python code to import a CSV data file into a relational database.
However, my code is inserting duplicate rows (each with a unique primary key ID) into all dictionary tables. How do I update my code to get rid of the dupes?
import csv
from cs50 import SQL

open("shows.db", "w").close()
db = SQL("sqlite:///shows.db")
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id) )")
db.execute("CREATE TABLE genres (id INTEGER, genre TEXT, PRIMARY KEY(id) )")
db.execute("CREATE TABLE shows_genres (show_id INTEGER, genre_id INTEGER, FOREIGN KEY(show_id) REFERENCES shows(id), FOREIGN KEY(genre_id) REFERENCES genres(id) )")

with open("Favorite TV Shows - Form Responses 1.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        titles = row["title"].strip().upper()
        show_id = db.execute("INSERT INTO shows (title) VALUES (?)", titles)
        for genre in row["genres"].split(", "):
            genre_id = db.execute("INSERT OR IGNORE INTO genres (genre) VALUES (?)", genre)
            db.execute("INSERT INTO shows_genres (show_id, genre_id) VALUES(?, ?)", show_id, genre_id)
Raw Data - Favorite Movie Poll:

Date Time          Show              Genre
10/1/2021 9:00:00  The Office        Comedy
10/1/2021 9:03:00  Fringe            SciFi
10/1/2021 9:08:00  The Office        Comedy
10/1/2021 9:10:00  Games of Thrones  Action, Fantasy
Example of current output (has dupes) - Genre Dictionary Table:

Genre_ID  Name
1         Comedy
2         Sci-Fi
3         Comedy
4         Action
5         Fantasy
Desired Output - Shows Dictionary Table:

Show_ID  Title
1        The Office
2        Fringe
3        GoT
Desired Output - Genre Dictionary Table:

Genre_ID  Name
1         Comedy
2         Sci-Fi
3         Action
4         Fantasy
Desired Output - Shows_Genres Table:

Show_ID  Genre_ID
1        1
2        2
3 (GoT)  3 (Action)
3 (GoT)  4 (Fantasy)
You can create your tables using a UNIQUE constraint on the appropriate column(s) (UNIQUE constraint tutorial). This prevents duplicates from being inserted.
Note that when you try to insert a duplicate it will come back with an error. You simply need to catch this error so that it doesn't crash your program.
In this case, if you need the ID, you can then simply query the table using a WHERE clause to get the existing row.
Also, it's important to note that running a query and then your insertion will not guarantee uniqueness if anything else might be editing the database. It's possible that another thread/process could insert the value in question between your query and insertion.
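Here is a minimal sketch of that insert-then-lookup pattern, shown with the standard sqlite3 module rather than the cs50 wrapper (the helper name get_or_create_genre is illustrative):

import sqlite3

db = sqlite3.connect("shows.db")
db.execute("CREATE TABLE IF NOT EXISTS genres (id INTEGER PRIMARY KEY, genre TEXT UNIQUE)")

def get_or_create_genre(db, genre):
    """Insert a genre, or return the id of the existing row on a duplicate."""
    try:
        # A duplicate genre violates the UNIQUE constraint and raises.
        cur = db.execute("INSERT INTO genres (genre) VALUES (?)", (genre,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The row already exists: fall back to looking up its id.
        row = db.execute("SELECT id FROM genres WHERE genre = ?", (genre,)).fetchone()
        return row[0]

print(get_or_create_genre(db, "Comedy"))  # first call inserts a new row
print(get_or_create_genre(db, "Comedy"))  # duplicate returns the same id
db.commit()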

Python: use one sqlite query to find the NOT EXISTS result

I have a dataset of a million entries, comprised of songs and their artists.
I have
a track_id
an artist_id.
There are 3 tables
tracks (track_id, title, artist_id),
artists(artist_id and artist_name) and
artist_term (artist_id and term).
Using only one query, I have to count the number of tracks whose artists don't have any linked terms.
For more reference, the schema of the DB is as follows:
CREATE TABLE tracks (track_id text PRIMARY KEY, title text, release text, year int, duration real, artist_id text);
CREATE TABLE artists (artist_id text, artist_name text);
CREATE TABLE artist_term (artist_id text, term text, FOREIGN KEY(artist_id) REFERENCES artists(artist_id));
How do I get to the solution? Please help!
You can use not exists:
select count(*) cnt
from tracks t
where not exists (select 1 from artist_term at where at.artist_id = t.artist_id)
As far as this query is concerned, you do not need to bring in the artists table, since artist_id is available in both the tracks and artist_term tables.
For performance you want an index on tracks(artist_id) and another one on artist_term(artist_id).
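As a sketch, the indexes and the query wired together with Python's sqlite3 module (the database file name songs.db and the index names are illustrative):

import sqlite3

conn = sqlite3.connect('songs.db')  # hypothetical file name

# Index the join column on both sides so the NOT EXISTS probe becomes
# an index lookup instead of a scan of artist_term for every track.
conn.execute('CREATE INDEX IF NOT EXISTS idx_tracks_artist ON tracks(artist_id)')
conn.execute('CREATE INDEX IF NOT EXISTS idx_artist_term_artist ON artist_term(artist_id)')

(cnt,) = conn.execute('''
    SELECT COUNT(*)
    FROM tracks t
    WHERE NOT EXISTS (SELECT 1 FROM artist_term at
                      WHERE at.artist_id = t.artist_id)
''').fetchone()
print(cnt)
conn.close()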
An anti-left join would also get the job done:
select count(*) cnt
from tracks t
left join artist_term at on at.artist_id = t.artist_id
where at.artist_id is null
You can join the tables tracks and artists and left join the table artist_term, so as to find the unmatched artist_ids:
select count(distinct t.track_id)
from tracks t
inner join artists a on a.artist_id = t.artist_id
left join artist_term at on at.artist_id = a.artist_id
where at.artist_id is null
The condition at.artist_id is null in the WHERE clause will return only the unmatched rows which will be counted.
If I'm not mistaken, such a query could be built in a similar fashion to its sibling SQL dialects. If so, it should look something like this:
SELECT COUNT(track_id)
FROM tracks AS t
WHERE EXISTS (
    SELECT *
    FROM artists AS a
    WHERE a.artist_id = t.artist_id
    AND NOT EXISTS (
        SELECT *
        FROM artist_term AS at
        WHERE at.artist_id = a.artist_id
    )
)
So this query basically says: count the tracks (identified by their unique track_id) for which an artist with the same artist_id exists, and for which no artist_term row refers to that artist's artist_id.
Hope this helps!

SQLite Trigger: Update a table after insert is done on another

I have three main tables to keep track of products, locations, and the logistics between them, which includes moving products to and from various locations. I have made another table, balance, to keep a final balance of the quantity of each product at the respective locations.
Here are the schemas:
products(prod_id INTEGER PRIMARY KEY AUTOINCREMENT,
         prod_name TEXT UNIQUE NOT NULL,
         prod_quantity INTEGER NOT NULL,
         unallocated_quantity INTEGER)

Initially, when products are added, prod_quantity and unallocated_quantity have the same value. unallocated_quantity is then reduced each time some quantity of the respective product is allocated.

location(loc_id INTEGER PRIMARY KEY AUTOINCREMENT,
         loc_name TEXT UNIQUE NOT NULL)

logistics(trans_id INTEGER PRIMARY KEY AUTOINCREMENT,
          prod_id INTEGER NOT NULL,
          from_loc_id INTEGER NULL,
          to_loc_id INTEGER NOT NULL,
          prod_quantity INTEGER NOT NULL,
          trans_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
          FOREIGN KEY(prod_id) REFERENCES products(prod_id),
          FOREIGN KEY(from_loc_id) REFERENCES location(loc_id),
          FOREIGN KEY(to_loc_id) REFERENCES location(loc_id))

balance(prod_id INTEGER NOT NULL,
        loc_id INTEGER NOT NULL,
        quantity INTEGER NOT NULL,
        FOREIGN KEY(prod_id) REFERENCES products(prod_id),
        FOREIGN KEY(loc_id) REFERENCES location(loc_id))
For each entry made in logistics, I want a trigger to update the values in balance, thereby keeping a running summary of all the transactions (moving products between locations).
I thought of a trigger that, for each insert on the logistics table, checks whether the same prod_id, loc_id entry already exists in the balance table, and if it exists, updates it appropriately. However, I don't have the experience in SQLite to implement this idea.
I believe that your TRIGGER would be along the lines of either :-
CREATE TRIGGER IF NOT EXISTS logistics_added AFTER INSERT ON logistics
BEGIN
    UPDATE balance
    SET quantity = ((SELECT quantity FROM balance
                     WHERE prod_id = new.prod_id AND loc_id = new.from_loc_id)
                    - new.prod_quantity)
    WHERE prod_id = new.prod_id AND loc_id = new.from_loc_id;
    UPDATE balance
    SET quantity = ((SELECT quantity FROM balance
                     WHERE prod_id = new.prod_id AND loc_id = new.to_loc_id)
                    + new.prod_quantity)
    WHERE prod_id = new.prod_id AND loc_id = new.to_loc_id;
END;
or :-
CREATE TRIGGER IF NOT EXISTS logistics_added AFTER INSERT ON logistics
BEGIN
    INSERT OR REPLACE INTO balance
    VALUES(new.prod_id, new.from_loc_id,
           (SELECT quantity FROM balance
            WHERE prod_id = new.prod_id AND loc_id = new.from_loc_id)
           - new.prod_quantity);
    INSERT OR REPLACE INTO balance
    VALUES(new.prod_id, new.to_loc_id,
           (SELECT quantity FROM balance
            WHERE prod_id = new.prod_id AND loc_id = new.to_loc_id)
           + new.prod_quantity);
END;
Note that the second relies upon adding a UNIQUE constraint to the balance table, using PRIMARY KEY (prod_id, loc_id) or alternatively UNIQUE (prod_id, loc_id). The UNIQUE constraint would probably be required/wanted anyway.
The subtle difference is that the second would INSERT a balance row if an appropriate one didn't exist, whereas the first would do nothing if the appropriate balance row didn't exist.
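As a sketch, the balance table from the question with that composite primary key added (the file name inventory.db is illustrative):

import sqlite3

conn = sqlite3.connect('inventory.db')  # hypothetical file name
conn.execute('''
    CREATE TABLE IF NOT EXISTS balance(
        prod_id INTEGER NOT NULL,
        loc_id INTEGER NOT NULL,
        quantity INTEGER NOT NULL,
        PRIMARY KEY (prod_id, loc_id),
        FOREIGN KEY(prod_id) REFERENCES products(prod_id),
        FOREIGN KEY(loc_id) REFERENCES location(loc_id)
    )
''')
conn.commit()
conn.close()

One caveat with the second trigger: if no balance row exists yet, the inner SELECT returns NULL, so wrapping it in COALESCE(..., 0) is one way to have a fresh balance start from zero.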

Print values from another table SQLite3 w/ Python

I want to create a simple recipe script. So I have a table with some recipes and one with ingredients. Now I have linked all the ingredient IDs to the recipe.
Is it possible to print the names behind those IDs from the other table?
import sqlite3

conn = sqlite3.connect('food.db')
c = conn.cursor()
c.execute("""CREATE TABLE IF NOT EXISTS ingredients (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL)""")
c.execute("""CREATE TABLE IF NOT EXISTS recipes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    quantity REAL,
    ingredients_id TEXT,
    FOREIGN KEY(ingredients_id) REFERENCES ingredients(id)
    )""")
c.execute("INSERT INTO recipes VALUES ('0', 'Pasta with Tomato', '1', '2,3')")
c.execute("INSERT INTO ingredients VALUES ('2', 'Pasta')")
c.execute("INSERT INTO ingredients VALUES ('3', 'Tomato')")
c.execute("SELECT * FROM recipes WHERE id='0'")
print(c.fetchone())
conn.commit()
conn.close()
Your problem is a classic many-to-many relationship.
Each product can be an ingredient in many recipes, and each recipe can have multiple ingredients.
To implement this, you need a third table that holds a "membership" row per product per recipe. This is common (if not best) practice for the m2m problem.
There are a lot of examples on SO; please look for them.
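A minimal sketch of that third table applied to the question's food.db (the table name recipe_ingredients is illustrative, and quantity moves into the link table so each recipe/ingredient pair can carry its own amount):

import sqlite3

conn = sqlite3.connect('food.db')
c = conn.cursor()
c.execute("""CREATE TABLE IF NOT EXISTS recipe_ingredients (
    recipe_id INTEGER,
    ingredient_id INTEGER,
    quantity REAL,
    PRIMARY KEY (recipe_id, ingredient_id),
    FOREIGN KEY(recipe_id) REFERENCES recipes(id),
    FOREIGN KEY(ingredient_id) REFERENCES ingredients(id))""")
c.executemany("INSERT OR IGNORE INTO recipe_ingredients VALUES (?, ?, ?)",
              [(0, 2, 1.0), (0, 3, 1.0)])

# Print the ingredient names for recipe 0 by joining through the link table.
c.execute("""SELECT i.name
             FROM recipe_ingredients ri
             JOIN ingredients i ON i.id = ri.ingredient_id
             WHERE ri.recipe_id = ?
             ORDER BY ri.ingredient_id""", (0,))
print([row[0] for row in c.fetchall()])  # ['Pasta', 'Tomato']
conn.commit()
conn.close()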

How do I speed up (or break up) this MySQL query?

I'm building a video recommendation site (think Pandora for music videos) in Python and MySQL. I have three tables in my db:
video - a table of the videos. Data doesn't change. Columns are:
CREATE TABLE `video` (
  id int(11) NOT NULL AUTO_INCREMENT,
  website_id smallint(3) unsigned DEFAULT '0',
  rating_global varchar(128) DEFAULT '0',
  title varchar(256) DEFAULT NULL,
  thumb_url text,
  PRIMARY KEY (`id`),
  KEY `websites` (`website_id`),
  KEY `id` (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=49362 DEFAULT CHARSET=utf8
video_tag - a table of the tags (attributes) associated with each video. Doesn't change.
CREATE TABLE `video_tag` (
  id int(7) NOT NULL AUTO_INCREMENT,
  video_id mediumint(7) unsigned DEFAULT '0',
  tag_id mediumint(7) unsigned DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `video_id` (`video_id`),
  KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=562456 DEFAULT CHARSET=utf8
user_rating - a table of good or bad ratings that the user has given each tag. Data always changing.
CREATE TABLE `user_rating` (
  id int(11) NOT NULL AUTO_INCREMENT,
  user_id smallint(3) unsigned DEFAULT '0',
  tag_id int(5) unsigned DEFAULT '0',
  tag_rating float(10,5) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `video` (`tag_id`),
  KEY `user_id` (`user_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=447 DEFAULT CHARSET=utf8
Based on the user's preferences, I want to score each unwatched video, and try and predict what they will like best. This has resulted in the following massive query, which takes about 2 seconds to complete for 50,000 videos:
SELECT video_tag.video_id,
       (SUM(user_rating.tag_rating) * video.rating_global) AS score
FROM video_tag
JOIN user_rating ON user_rating.tag_id = video_tag.tag_id
JOIN video ON video.id = video_tag.video_id
WHERE user_rating.user_id = 1
  AND video.website_id = 2
  AND rating_global > 0
  AND video_id NOT IN (1,2,3)
GROUP BY video_id
ORDER BY score DESC
LIMIT 20
I desperately need to make this more efficient, so I'm just looking for advice as to what the best direction is. Some ideas I've considered:
a) Rework my db table structure (not sure how)
b) Offload more of the grouping and aggregation into Python (haven't figured out a way to join three tables that is actually faster)
c) Store the non-changing tables in memory to try and speed up computation (earlier tinkering hasn't yielded any gains yet...)
How would you recommend making this more efficient?
Thank you!!
--
Per request in the comments, EXPLAIN SELECT.. shows:
id  select_type  table        type    possible_keys        key      key_len  ref                    rows  Extra
1   SIMPLE       user_rating  ref     video,user_id        user_id  3        const                  88    Using where; Using temporary; Using filesort
1   SIMPLE       video_tag    ref     video_id,tag_id      tag_id   4        db.user_rating.tag_id  92    Using where
1   SIMPLE       video        eq_ref  PRIMARY,websites,id  PRIMARY  4        db.video_tag.video_id  1     Using where
Change the field type of rating_global to a numeric type (either float or integer); there is no need for it to be varchar. Personally, I would change all rating fields to integer, as I see no need for them to be float.
Drop the KEY on id; the PRIMARY KEY is already indexed.
The video columns the query actually touches are video.id, rating_global, and website_id.
Watch the integer lengths for your references (e.g. video_id -> video.id); you may run out of numbers. These sizes should be the same.
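As a sketch, those changes as ALTER statements, run here through pymysql (the credentials are placeholders and the exact numeric types are a judgment call):

import pymysql

conn = pymysql.connect(host='localhost', user='user',
                       password='secret', database='db')  # placeholder credentials
with conn.cursor() as cur:
    # Store the rating as a number instead of varchar so the score
    # multiplication no longer has to cast the value on every row.
    cur.execute("ALTER TABLE video MODIFY rating_global FLOAT DEFAULT 0")
    # The PRIMARY KEY already indexes id, so the extra KEY is redundant.
    cur.execute("ALTER TABLE video DROP KEY id")
    # Match the reference column's type to video.id (int(11)).
    cur.execute("ALTER TABLE video_tag MODIFY video_id INT DEFAULT 0")
conn.commit()
conn.close()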
I suggest the following 2-step solution to replace your query:
CREATE TEMPORARY TABLE rating_stats ENGINE=MEMORY
SELECT video_id, SUM(tag_rating) AS tag_rating_sum
FROM user_rating ur
JOIN video_tag vt ON vt.tag_id = ur.tag_id AND ur.user_id = 1
GROUP BY video_id ORDER BY NULL;
SELECT v.id, tag_rating_sum * rating_global AS score
FROM video v
JOIN rating_stats rs ON rs.video_id = v.id
WHERE v.website_id = 2 AND v.rating_global > 0 AND v.id NOT IN (1,2,3)
ORDER BY score DESC LIMIT 20;
For the latter query to perform really fast, you could incorporate the website_id and rating_global fields into the video table's PRIMARY KEY (perhaps website_id alone is enough, though).
You can also keep these statistics in another table and precalculate them dynamically based on user login/action frequency. I am guessing you can show the cached data instead of live results; there shouldn't be much difference.
