Python sqlite - insert if not exists [duplicate]

I have an SQLite database. I am trying to insert values (users_id, lessoninfo_id) into the table bookmarks, but only if that pair does not already exist in a row.
INSERT INTO bookmarks(users_id,lessoninfo_id)
VALUES(
(SELECT _id FROM Users WHERE User='"+$('#user_lesson').html()+"'),
(SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+" AND cast(starttime AS int)="+Math.floor(result_set.rows.item(markerCount-1).starttime)+")
WHERE NOT EXISTS (
SELECT users_id,lessoninfo_id from bookmarks
WHERE users_id=(SELECT _id FROM Users
WHERE User='"+$('#user_lesson').html()+"') AND lessoninfo_id=(
SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+")))
This gives an error saying:
db error near where syntax.

If you never want to have duplicates, you should declare this as a table constraint:
CREATE TABLE bookmarks(
users_id INTEGER,
lessoninfo_id INTEGER,
UNIQUE(users_id, lessoninfo_id)
);
(A primary key over both columns would have the same effect.)
It is then possible to tell the database that you want to silently ignore records that would violate such a constraint:
INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(123, 456)
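A quick way to see the two statements working together, as a self-contained Python sqlite3 sketch (an in-memory database stands in for the real one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE bookmarks(
    users_id INTEGER,
    lessoninfo_id INTEGER,
    UNIQUE(users_id, lessoninfo_id)
)
""")

# The second insert is silently skipped because it would
# violate the UNIQUE constraint.
con.execute("INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(?, ?)", (123, 456))
con.execute("INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(?, ?)", (123, 456))

count = con.execute("SELECT COUNT(*) FROM bookmarks").fetchone()[0]
print(count)  # 1
```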

If you have a table called memos that has two columns id and text, you should be able to do it like this:
INSERT INTO memos(id,text)
SELECT 5, 'text to insert'
WHERE NOT EXISTS(SELECT 1 FROM memos WHERE id = 5 AND text = 'text to insert');
If the table already contains a row where text is 'text to insert' and id is 5, then the insert operation will be ignored.
I don't know if this will work for your particular query, but perhaps it gives you a hint on how to proceed.
I would advise that you instead design your table so that no duplicates are allowed, as explained in CL's answer above.
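For reference, the memos pattern above can be exercised end-to-end with Python's sqlite3 (in-memory database for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE memos(id INTEGER, text TEXT)")

sql = """
    INSERT INTO memos(id, text)
    SELECT ?, ?
    WHERE NOT EXISTS(SELECT 1 FROM memos WHERE id = ? AND text = ?)
"""
# The first call inserts; the second is a no-op because the row already exists.
con.execute(sql, (5, 'text to insert', 5, 'text to insert'))
con.execute(sql, (5, 'text to insert', 5, 'text to insert'))

rows = con.execute("SELECT id, text FROM memos").fetchall()
print(rows)  # [(5, 'text to insert')]
```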

For a unique column, use this:
INSERT OR REPLACE INTO tableName (...) values(...);
For more information, see: sqlite.org/lang_insert
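One caveat worth knowing: OR REPLACE does not skip the conflicting row, it deletes it and inserts a new one, so any columns not listed in the INSERT are lost. A small sketch with a made-up table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: name is the unique column, visits is extra data.
con.execute("CREATE TABLE pages(name TEXT UNIQUE, visits INTEGER)")
con.execute("INSERT INTO pages VALUES('home', 10)")

# OR REPLACE removes the old row entirely, so visits is reset,
# whereas OR IGNORE would have kept the original row.
con.execute("INSERT OR REPLACE INTO pages(name, visits) VALUES('home', 0)")

visits = con.execute("SELECT visits FROM pages WHERE name='home'").fetchone()[0]
print(visits)  # 0
```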

insert into bookmarks (users_id, lessoninfo_id)
select 1, 167
EXCEPT
select users_id, lessoninfo_id
from bookmarks
where users_id=1
and lessoninfo_id=167;
This is the fastest way.
For some other SQL engines, you can use a dummy table containing one record, e.g.:
select 1, 167 from ONE_RECORD_DUMMY_TABLE
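The EXCEPT approach can be checked with a short sqlite3 sketch (in-memory table, made-up values):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bookmarks(users_id INTEGER, lessoninfo_id INTEGER)")

sql = """
    INSERT INTO bookmarks(users_id, lessoninfo_id)
    SELECT ?, ?
    EXCEPT
    SELECT users_id, lessoninfo_id
    FROM bookmarks
    WHERE users_id = ? AND lessoninfo_id = ?
"""
con.execute(sql, (1, 167, 1, 167))
con.execute(sql, (1, 167, 1, 167))  # duplicate attempt: EXCEPT removes the row, nothing inserted

count = con.execute("SELECT COUNT(*) FROM bookmarks").fetchone()[0]
print(count)  # 1
```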

Related

How do I get a SQLite query to return the id of a table when I have two tables that have an attribute named id?

I can't seem to find anything on how to access the id attribute from the table I want. I have 4 tables that I have joined: users, workouts, exercises, and sets. They all have primary keys with the attribute name id.
My query:
query = """SELECT users.firstName, workouts.dateandtime, workouts.id, sets.*, exercises.name FROM users
JOIN workouts ON users.id = workouts.userID JOIN sets ON workouts.id = sets.workoutID JOIN exercises ON
sets.exerciseID = exercises.id WHERE users.id = ? ORDER BY sets.id DESC"""
I'm only grabbing workouts.id and sets.id, because users.id is found when the user logs in and exercises.id is shared amongst all users, so it isn't important at this step.
Trying to access the sets.id like this does not work:
posts_unsorted = cur.execute(query, userID).fetchall()
for e in posts_unsorted:
print(e['id']) # Prints workouts.id I'm assuming because it's the first id I grab in the query
print(e['sets.id']) # Error because sets.id does not exist
Is there a way to name the sets.id when making the query so that I can actually use it? Should I be setting up my database differently to grab the sets.id? I don't know what direction I should be going.
The post How do you avoid column name conflicts? shows that you can give your tables aliases. Aliases make it easier to refer to your tables in queries, and they also let you control the names of the columns your query returns.
If you have two tables that both have an attribute called id, you will need to alias at least one of them to be able to access both attributes.
An example:
.schema sets
CREATE TABLE "sets"(
id INTEGER NOT NULL,
interval INTEGER NOT NULL,
workoutID INTEGER NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (workoutID) REFERENCES workouts(id)
);
.schema workouts
CREATE TABLE "workouts"(
id INTEGER NOT NULL,
date SMALLDATETIME NOT NULL,
PRIMARY KEY (id)
);
Fill the database:
INSERT INTO workouts (date) VALUES ('2022-03-14'), ('2022-02-13');
INSERT INTO sets (interval, workoutID) VALUES (5, 1), (4, 1), (3, 2), (2, 2);
Both tables have a primary key labeled id. If you must access both ids you will need to add an alias in your query.
database = sqlite3.connect("name.db")
database.row_factory = sqlite3.Row
cur = database.cursor()
query = """SELECT sets.id AS s_id, workouts.date AS w_date, workouts.id AS w_id
FROM sets JOIN workouts ON sets.workoutID = workouts.id"""
posts = cur.execute(query).fetchall()
This will return rows that can be accessed by name, making it easy to retrieve the data you want. The data will look like this:
[{'s_id':1, 'w_date':'2022-03-14', 'w_id':1},
{'s_id':2, 'w_date':'2022-03-14', 'w_id':1},
{'s_id':3, 'w_date':'2022-02-13', 'w_id':2},
{'s_id':4, 'w_date':'2022-02-13', 'w_id':2}]
With this set of data you will be able to access everything by name instead of index.
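Putting the pieces above together, a self-contained sqlite3 version of the example (simplified schema, made-up data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
con.executescript("""
CREATE TABLE workouts (id INTEGER PRIMARY KEY, date TEXT NOT NULL);
CREATE TABLE sets (id INTEGER PRIMARY KEY, interval INTEGER NOT NULL,
                   workoutID INTEGER NOT NULL REFERENCES workouts(id));
INSERT INTO workouts (date) VALUES ('2022-03-14'), ('2022-02-13');
INSERT INTO sets (interval, workoutID) VALUES (5, 1), (4, 1), (3, 2), (2, 2);
""")

# Both id columns are reachable through their aliases.
rows = con.execute("""
    SELECT sets.id AS s_id, workouts.date AS w_date, workouts.id AS w_id
    FROM sets JOIN workouts ON sets.workoutID = workouts.id
    ORDER BY sets.id
""").fetchall()

result = [(r['s_id'], r['w_date'], r['w_id']) for r in rows]
print(result)
```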

With pyodbc & SQL Server, how to insert multiple foreign keys in a table

I am able to insert a foreign key in a SQL table. However, after doing the same thing for 3 other tables, I will have to insert those 4 FK's in my fact table. I am asking now to know in advance if this is the way to go, database-model-wise.
Code to skip duplicate rows, insert columns and a FK RegionID:
cursor.execute("""
IF NOT EXISTS (
SELECT #address1client, #address2client, #cityClient
INTERSECT
SELECT address1client, address2client, cityClient
FROM dbo.AddressClient)
BEGIN
INSERT INTO dbo.AddressClient (address1client, address2client, cityClient, RegionID)
SELECT #address1client, #address2client, #cityClient, RegionID
FROM dbo.Region
WHERE province=#province AND country=#country
END""")
My questions are:
1- Does a BEGIN ... END statement execute all at once? If the answer is yes, would the code below work? I ask because the FK_ID columns can at no point contain null values.
...
BEGIN
INSERT INTO dbo.Fact(product,saleTotal,saleDate) VALUES (#product, #saleTotal, #saleDate)
INSERT INTO dbo.Fact (ClientAddressID)
SELECT ClientAddressID
FROM dbo.ClientAddress
WHERE address1c=#address1c AND address2c=#address2c AND cityC=#cityC
INSERT INTO dbo.Fact (SupplierAddressID)
SELECT SupplierAddressID
FROM dbo.SupplierAddress
WHERE address1s=#address1s AND address2s=#address2s AND cityS=#cityS
INSERT INTO dbo.Fact (DetailID)
SELECT DetailID
FROM dbo.Detail
WHERE categoryNum=#categoryNum AND type=#type AND nature=#nature
END""")
2- If a BEGIN ... END statement doesn't execute all at once, how do I go about inserting multiple FK's in a table?

Postgresql: Insert from huge csv file, collect the ids and respect unique constraints

In a postgresql database:
class Persons(models.Model):
person_name = models.CharField(max_length=10, unique=True)
The persons.csv file, contains 1 million names.
$cat persons.csv
Name-1
Name-2
...
Name-1000000
I want to:
Create the names that do not already exist
Query the database and fetch the id for each name contained in the csv file.
My approach:
Use the COPY command or the django-postgres-copy application that implements it.
Also take advantage of the new Postgresql-9.5+ upsert feature.
Now, all the names in the csv file, are also in the database.
I need to get their ids -from the database- either in memory or in another csv file with an efficient way:
Use Q objects
list_of_million_q = <iterate csv and append Qs>
million_names = Names.objects.filter(list_of_million_q)
or
Use __in to filter based on a list of names:
list_of_million_names = <iterate csv and append strings>
million_names = Names.objects.filter(
person_name__in=list_of_million_names
)
or
?
I do not feel that any of the above approaches for fetching the ids is efficient.
Update
There is a third option, along the lines of this post that should be a great solution which combines all the above.
Something like:
SELECT * FROM persons;
make a name: id dictionary out of the names received from the database:
db_dict = {'Harry': 1, 'Bob': 2, ...}
Query the dictionary:
ids = []
for name in list_of_million_names:
if name in db_dict:
ids.append(db_dict[name])
This way you're using fast dictionary lookups as opposed to the slower if x in list approach.
But the only way to really know for sure is to benchmark these 3 approaches.
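A minimal sketch of the dictionary approach, using sqlite3 here instead of Django/Postgres so it is self-contained (the persons table and name list are made up):

```python
import sqlite3

# Stand-ins for the real table and the csv contents.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE persons(id INTEGER PRIMARY KEY, person_name TEXT UNIQUE)")
con.executemany("INSERT INTO persons(person_name) VALUES (?)",
                [('Harry',), ('Bob',), ('Alice',)])

# One pass over the table builds the name -> id dictionary ...
db_dict = {name: pk for pk, name in con.execute("SELECT id, person_name FROM persons")}

# ... and each subsequent lookup is O(1) instead of a list scan.
list_of_million_names = ['Bob', 'Alice', 'Unknown']
ids = [db_dict[name] for name in list_of_million_names if name in db_dict]
print(ids)  # [2, 3]
```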
This post describes how to use RETURNING with ON CONFLICT so while inserting into the database the contents of the csv file, the ids will be saved in another table either when an insertion was successful, or when -due to unique constraints- the insertion was omitted.
I have tested it in sqlfiddle where I used a set up that resembles the one used for the COPY command which inserts to the database straight from a csv file, respecting the unique constraints.
The schema:
CREATE TABLE IF NOT EXISTS label (
id serial PRIMARY KEY,
label_name varchar(200) NOT NULL UNIQUE
);
INSERT INTO label (label_name) VALUES
('Name-1'),
('Name-2');
CREATE TABLE IF NOT EXISTS ids (
id serial PRIMARY KEY,
label_ids varchar(12) NOT NULL
);
The script:
CREATE TEMP TABLE tmp_table
(LIKE label INCLUDING DEFAULTS)
ON COMMIT DROP;
INSERT INTO tmp_table (label_name) VALUES
('Name-2'),
('Name-3');
WITH ins AS(
INSERT INTO label
SELECT *
FROM tmp_table
ON CONFLICT (label_name) DO NOTHING
RETURNING id
)
INSERT INTO ids (label_ids)
SELECT
id FROM ins
UNION ALL
SELECT
l.id FROM tmp_table
JOIN label l USING(label_name);
The output:
SELECT * FROM ids;
SELECT * FROM label;

Trying to SELECT row by long list of composite primary keys in SQLite

This is my query using code found perusing this site:
query="""SELECT Family
FROM Table2
INNER JOIN Table1 ON Table1.idSequence=Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN ({seq})
""".format(seq=','.join(['?']*len(matchIds_list)))
matchIds_list is a list of tuples in (?,?) format.
It works if I just ask for one condition (i.e. just Table1.Chromosome as opposed to both Chromosome and hg19_coordinate) and matchIds_list is just a simple list of single values, but I don't know how to get it to work with a composite key over both columns.
Since you're running SQLite 3.7.17, I'd recommend just using a temporary table.
Create and populate your temporary table.
cursor.executescript("""
CREATE TEMP TABLE control_list (
Chromosome TEXT NOT NULL,
hg19_coordinate TEXT NOT NULL
);
CREATE INDEX control_list_idx ON control_list (Chromosome, hg19_coordinate);
""")
cursor.executemany("""
INSERT INTO control_list (Chromosome, hg19_coordinate)
VALUES (?, ?)
""", matchIds_list)
Just constrain your query to the control list temporary table.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM control_list
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
And finally perform your query (there's no need to format this one).
cursor.execute(query)
# Remove the temporary table since we're done with it.
cursor.execute("""
DROP TABLE control_list;
""")
Short Query (requires SQLite 3.15): You actually almost had it. You need to make the IN ({seq}) a subquery expression.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq});
Long Query (requires SQLite 3.8.3): It looks a little complicated, but it's pretty straightforward. Put your control list into a sub-select, and then constrain the main select by that control list.
SELECT Family
FROM Table2
INNER JOIN Table1
ON Table1.idSequence = Table2.idSequence
-- Constrain to control_list.
WHERE EXISTS (
SELECT *
FROM (
SELECT
-- Name the columns (must match order in tuples).
column1 AS Chromosome,
column2 AS hg19_coordinate
FROM (
-- Get control list.
VALUES {seq}
) AS control_values
) AS control_list
-- Constrain Table1 to control_list.
WHERE control_list.Chromosome = Table1.Chromosome
AND control_list.hg19_coordinate = Table1.hg19_coordinate
)
Regardless of which query you use, when formatting the SQL replace {seq} with (?,?) for each composite key instead of just ?.
query = " ... ".format(seq=','.join(['(?,?)']*len(matchIds_list)))
And finally flatten matchIds_list when you execute the query because it is a list of tuples.
import itertools
cursor.execute(query, list(itertools.chain.from_iterable(matchIds_list)))
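Putting it together, a self-contained sketch of the short query (requires SQLite 3.15+ for row values; the tiny stand-in tables are made up):

```python
import sqlite3
import itertools

# Miniature stand-ins for Table1 / Table2.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Table1 (idSequence INTEGER, Chromosome TEXT, hg19_coordinate TEXT);
CREATE TABLE Table2 (idSequence INTEGER, Family TEXT);
INSERT INTO Table1 VALUES (1, 'chr1', '100'), (2, 'chr2', '200');
INSERT INTO Table2 VALUES (1, 'FamA'), (2, 'FamB');
""")

matchIds_list = [('chr1', '100'), ('chr2', '999')]  # only the first pair matches

# Row-value IN (VALUES ...) needs SQLite 3.15+.
query = """SELECT Family
FROM Table2
INNER JOIN Table1 ON Table1.idSequence = Table2.idSequence
WHERE (Table1.Chromosome, Table1.hg19_coordinate) IN (VALUES {seq})
""".format(seq=','.join(['(?,?)'] * len(matchIds_list)))

# Flatten the list of tuples into a flat parameter list.
params = list(itertools.chain.from_iterable(matchIds_list))
families = [row[0] for row in con.execute(query, params)]
print(families)  # ['FamA']
```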

How can django produce this SQL?

I have the following SQL query that returns what I need:
SELECT sensors_sensorreading.*, MAX(sensors_sensorreading.timestamp) AS "last"
FROM sensors_sensorreading
GROUP BY sensors_sensorreading.chipid
In words: get the last sensor reading entry for each unique chipid.
But I cannot seem to figure out the correct Django ORM statement to produce this query. The best I could come up with is:
SensorReading.objects.values('chipid').annotate(last=Max('timestamp'))
But if i inspect the raw sql it generates:
>>> print connection.queries[-1:]
[{u'time': u'0.475', u'sql': u'SELECT
"sensors_sensorreading"."chipid",
MAX("sensors_sensorreading"."timestamp") AS "last" FROM
"sensors_sensorreading" GROUP BY "sensors_sensorreading"."chipid"'}]
As you can see, it almost generates the correct SQL, except Django selects only the chipid field and the aggregate "last" (but I need all the table fields returned instead).
Any idea how to return all fields?
Assuming you also have other fields in the table besides chipid and timestamp, then I would guess this is the SQL you actually need:
select * from (
SELECT *, row_number() over (partition by chipid order by timestamp desc) as RN
FROM sensors_sensorreading
) X where RN = 1
This will return the latest rows for each chipid with all the data that is in the row.
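To see the window-function query in action, here is a self-contained sqlite3 sketch (requires SQLite 3.25+ for window functions; table and data are made up):

```python
import sqlite3

# Minimal stand-in for sensors_sensorreading.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sensors_sensorreading(chipid TEXT, timestamp TEXT, value REAL)")
con.executemany("INSERT INTO sensors_sensorreading VALUES (?, ?, ?)", [
    ('A', '2024-01-01', 1.0),
    ('A', '2024-01-02', 2.0),
    ('B', '2024-01-01', 3.0),
])

# Latest full row per chipid via row_number().
rows = con.execute("""
    SELECT chipid, timestamp, value FROM (
        SELECT *, row_number() OVER (PARTITION BY chipid ORDER BY timestamp DESC) AS rn
        FROM sensors_sensorreading
    ) WHERE rn = 1
    ORDER BY chipid
""").fetchall()
print(rows)  # [('A', '2024-01-02', 2.0), ('B', '2024-01-01', 3.0)]
```

In Django, one option is to run this SQL directly via SensorReading.objects.raw(); on Django 2.0+ you could also express the row_number() part with Window(expression=RowNumber(), ...).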
