Get the columns that were actually updated to a different value? - python

I'm running something like this:
cursor.execute(
    '''
    UPDATE my_table
    SET name=%s, about=%s, title=%s
    WHERE id=%s
    ''',
    (name, about, title, id_)
)
This is guaranteed to update only one row, since the update is done against the id primary key.
However, most of the time only one of the fields actually changes, e.g. about and title are "updated" to the values they already had, and only name has actually changed.
How can I find out which of the columns actually changed? I need this to log every individual change.

You can select the values before the update and, using RETURNING *, compare them in the final query, like here:
t=# create table m1 (i int, e int);
CREATE TABLE
Time: 1.855 ms
t=# insert into m1 select 1,2;
INSERT 0 1
Time: 1.037 ms
t=# begin;
BEGIN
t=# with o as (select * from m1 where i=1)
,u as (update m1 set e=3 where i=1 returning *)
select * from o
join u on o.i = u.i
;
i | e | i | e
---+---+---+---
1 | 2 | 1 | 3
(1 row)
so you can put your logic against u.e <> o.e or similar.
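Applied to the original UPDATE, a minimal psycopg2 sketch along these lines might look like the following; the connection string, the log_change helper, and the sample values are assumptions, not part of the original question:
import psycopg2

# Hypothetical connection and sample values -- adjust to your setup.
conn = psycopg2.connect("dbname=mydb")
id_, name, about, title = 42, "New name", "Same about", "Same title"

def log_change(row_id, column, old, new):
    # Placeholder for your real change-logging mechanism.
    print(f"row {row_id}: {column} changed from {old!r} to {new!r}")

with conn, conn.cursor() as cursor:
    cursor.execute(
        '''
        WITH old_row AS (
            SELECT id, name, about, title FROM my_table WHERE id = %(id)s
        ), new_row AS (
            UPDATE my_table
            SET name = %(name)s, about = %(about)s, title = %(title)s
            WHERE id = %(id)s
            RETURNING id, name, about, title
        )
        SELECT old_row.name, old_row.about, old_row.title,
               new_row.name, new_row.about, new_row.title
        FROM old_row JOIN new_row ON old_row.id = new_row.id
        ''',
        {"id": id_, "name": name, "about": about, "title": title},
    )
    row = cursor.fetchone()
    if row is not None:
        # First three values are pre-update, last three are post-update.
        for col, old, new in zip(("name", "about", "title"), row[:3], row[3:]):
            if old != new:
                log_change(id_, col, old, new)
Both CTEs see the same snapshot, so old_row reads the pre-update values while new_row returns the updated ones, which is what makes the column-by-column comparison possible in a single round trip.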

Related

How to remove millions of rows in MySQL?

I have one huge table that I would like to make smaller. It has ~230 Million rows.
Both columns are indexed. The structure is:
+--------------+------------+
| id_my_value | id_ref |
+--------------+------------+
| YYYY | XXXX |
+--------------+------------+
I would have to remove the values that have a particular "id_ref" value. I have tried the following:
sql = f"SELECT id_ref FROM REFS"
cursor.execute(sql)
refs = cursor.fetchall()
limit = 1000
for current in refs:
id = current["id_ref"]
sql = f"DELETE FROM MY_VALUES WHERE id_ref = {id} LIMIT {limit}"
while True:
cursor.execute(sql)
mydb.commit()
if cursor.rowcount == 0:
break
Regardless of the value I set for "limit", the query is tremendously slow:
DELETE FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10;
I have also tried the other way around. Select the id_value associated with a particular id_ref, and delete:
SELECT id_value FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10
DELETE FROM MY_VALUES WHERE id_value = YYYY
Here is my EXPLAIN.
EXPLAIN DELETE FROM MY_VALUES WHERE id_ref = YYYY LIMIT 1000;
+----+-------------+-----------+------------+-------+---------------+--------+---------+-------+----------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key    | key_len | ref   | rows     | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+--------+---------+-------+----------+----------+-------------+
|  1 | DELETE      | MY_VALUES | NULL       | range | id_ref        | id_ref | 5       | const | 20647922 | 100.00   | Using where |
+----+-------------+-----------+------------+-------+---------------+--------+---------+-------+----------+----------+-------------+
It does use the right INDEX.
I would not have any problem with this operation running for days on the server.
What is the right way to approach this "cleaning"?
EDIT
Here is the output from SHOW CREATE TABLE MY_VALUES
MY_VALUES | CREATE TABLE `MY_VALUES` (
`id_my_value` int NOT NULL AUTO_INCREMENT,
`id_document` int NOT NULL,
`id_ref` int DEFAULT NULL,
`value` mediumtext CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`weigth` int DEFAULT NULL,
`id_analysis` int DEFAULT NULL,
`url` text CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`domain` varchar(64) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`filetype` varchar(16) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`id_domain` int DEFAULT NULL,
`id_city` int DEFAULT NULL,
`city_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`is_hidden` tinyint NOT NULL DEFAULT '0',
`id_company` int DEFAULT NULL,
`is_hidden_by_user` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_my_value`),
KEY `id_ref` (`id_ref`),
KEY `id_document` (`id_document`),
KEY `id_analysis` (`id_analysis`),
KEY `weigth` (`weigth`),
KEY `id_domain` (`id_domain`),
KEY `id_city` (`id_city`),
KEY `id_company` (`id_company`),
KEY `value` (`value`(15))
UPDATE
I just tried to remove a single row:
DELETE FROM MY_VALUES WHERE id_MY_VALUE = 8
That operation takes "forever". To prevent a timeout, I followed this SO question, so I have set:
show variables like 'innodb_lock_wait_timeout';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| innodb_lock_wait_timeout | 100000 |
+--------------------------+--------+
a = 0
limit = 1000
while True:
    b = a + limit
    sql = f"DELETE FROM `VALUES` WHERE id > {a} AND id <= {b}"
    cursor.execute(sql)
    mydb.commit()
    if cursor.rowcount == 0:
        break
    a = a + limit
First thing to try. Put this right after your second cursor.execute().
cnx.commit()
In connector/python, autocommit is turned off by default. If you don't commit, your MySQL server buffers up all your changes (DELETEs in your case) so it can roll them back if you choose, or if your program crashes.
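As a minimal sketch (the connection parameters, the cnx name, and the example id_ref value are assumptions), the batched delete with an explicit commit per batch could look like:
import mysql.connector

# Hypothetical connection parameters -- adjust to your environment.
cnx = mysql.connector.connect(user="app", password="secret", database="mydb")
cursor = cnx.cursor()

id_ref = 1234  # example value taken from REFS
while True:
    cursor.execute("DELETE FROM MY_VALUES WHERE id_ref = %s LIMIT 1000", (id_ref,))
    cnx.commit()  # without this, the server keeps all deletes pending for rollback
    if cursor.rowcount == 0:
        break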
I guess your slow query is
DELETE FROM `VALUES` WHERE id_ref=constant LIMIT 1000;
Try doing this. EXPLAIN shows you the query plan.
EXPLAIN DELETE FROM `VALUES` WHERE id_ref=constant LIMIT 1000;
It should employ the index on your id_ref column. It's possible your indexes aren't selective enough, so your query planner chooses a table scan. In that case you might consider raising the LIMIT so your query does more work each time it runs.
You could try this. If my guess about the table scan is correct, it might help.
DELETE FROM `VALUES` FORCE INDEX (your_index_on_id_ref) WHERE id_ref=constant LIMIT 1000;
(Usually FORCE INDEX is a terrible idea. But this might be the exception.)
You could also try this: create a cleaned up temporary table, then rename tables to put it into service.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TABLE purged_values AS
SELECT *
FROM `VALUES`
WHERE id_ref NOT IN (SELECT id_ref FROM `REFS`);
This will take a while. Run it at zero-dark-thirty. The transaction isolation level helps prevent contention with other sessions using the table while this is in progress.
Then you'll have a new, purged, table. You can index it, then do these renames to put it into service.
ALTER TABLE `VALUES` RENAME TO old_values;
ALTER TABLE purged_values RENAME TO `VALUES`;
Finally I did a bit more experimentation and found a way.
First step
The Python loop deleting the entries from the DB had been running for ~12h. I added a couple of lines to measure the execution time:
start = time.time()
cursor.execute(sql)
mydb.commit()
end = time.time()
Here is a sample of the first measurements:
1 > 900 > 0.4072246551513672
2 > 900 > 1.7270898818969727
3 > 900 > 1.8365845680236816
4 > 900 > 1.124634027481079
5 > 900 > 1.8552422523498535
6 > 900 > 13.80513596534729
7 > 900 > 8.379877090454102
8 > 900 > 10.675175428390503
9 > 900 > 6.14388370513916
10 > 900 > 11.806004762649536
11 > 900 > 12.884040117263794
12 > 900 > 23.604055881500244
13 > 900 > 19.162535905838013
14 > 900 > 24.980825662612915
....
It reached an average of ~30s per execution after 900 iterations.
In my case this implementation would have taken ~80 days to remove all the rows.
Final solution
I created a new table with the appropriate values, indexes, etc.:
CREATE TABLE ZZ_MY_VALUES AS
SELECT * FROM MY_VALUES WHERE MY_VALUES.id_ref IN
(
    SELECT id_ref FROM MY_REFS WHERE id_ref = 3 OR id_ref = 4 OR id_ref = 5
)
It took ~3h and went from 230M rows to 21M rows.
A bit quicker than the original estimation of 3 months. :)
Thanks all for your tips.

How to create a filter in SQLite database across multiple tables?

I am looking for a way to create a number of filters across a few tables in my SQL database. The 2 tables I require the data from are Order and OrderDetails.
The Order table is like this:
------------------------------------
| OrderID | CustomerID | OrderDate |
------------------------------------
The OrderDetails table is like this:
----------------------------------
| OrderID | ProductID | Quantity |
----------------------------------
I want to make it so that it counts the number of instances a particular OrderID pops up in a single day. For example, it will choose an OrderID in Order and then match it to the OrderIDs in OrderDetails, counting the number of times it pops up in OrderDetails.
-----------------------------------------------------------
| OrderID | CustomerID | OrderDate | ProductID | Quantity |
-----------------------------------------------------------
The code I used is below here:
# Execute SQL Query (number of orders made on a particular day entered by a user)
cursor.execute("""
SELECT 'order.*', count('orderdetails.orderid') as 'NumberOfOrders'
from 'order'
left join 'order'
on ('order.orderid' = 'orderdetais.orderid')
group by
'order.orderid'
""")
print(cursor.fetchall())
Also, the current output that I get is this when I should get 3:
[('order.*', 830)]
Your immediate problem is that you are abusing single quotes. If you need to quote an identifier (table name, column name and the like), you should use double quotes in SQLite (this actually is the SQL standard). And an expression such as order.* should not be quoted at all. You are also self-joining the order table, while you probably want to bring in orderdetails.
You seem to want:
select
o.orderID,
o.customerID,
o.orderDate,
count(*) number_of_orders
from "order" o
left join orderdetails od on od.orderid = o.orderid
group by o.orderID, o.customerID, o.orderDate
order is a language keyword, so I did quote it - that table would be better named orders, to avoid the conflicting name. Other identifiers do not need to be quoted here.
Since all you want from orderdetails is the count, you could also use a subquery instead of aggregation:
select
o.*,
(select count(*) from orderdetails od where od.orderid = o.orderid) number_of_orders
from "order" o

SQL - Possible to Auto Increment Number but with leading zeros?

Our IDs look something like "CS0000001", which stands for Customer with the ID 1. Is this possible to do with SQL and auto increment, or do I need to do that in my GUI?
I need the leading zeroes, but with auto increment to prevent double usage if I am constructing the ID in Python and inserting it into the DB.
Is that possible?
You have a few choices:
Construct the CustomerID in your code which inserts the data into the Customer table (application side, requires a change in your code; a Python sketch of this option appears after the database-side realization below).
Create a view on top of the Customer table that contains the logic and use that when you need the CustomerID (database side, requires a change in your code).
Use a procedure to do the inserts and construct the CustomerID in the procedure (database side, requires a change in your code).
Possible realization.
Create data table
CREATE TABLE data (id CHAR(9) NOT NULL DEFAULT '',
val TEXT,
PRIMARY KEY (id));
Create service table
CREATE TABLE ids (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);
Create trigger which generates id value
CREATE TRIGGER tr_bi_data
BEFORE INSERT
ON data
FOR EACH ROW
BEGIN
INSERT INTO ids () VALUES ();
SET NEW.id = CONCAT('CS', LPAD(LAST_INSERT_ID(), 7, '0'));
DELETE FROM ids;
END
Create trigger which prohibits id value change
CREATE TRIGGER tr_bu_data
BEFORE UPDATE
ON data
FOR EACH ROW
BEGIN
SET NEW.id = OLD.id;
END
Insert some data, check result
INSERT INTO data (val) VALUES ('data-1'), ('data-2');
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
Try to update, ensure id change prohibited
UPDATE data SET id = 'CS0000100' WHERE val = 'data-1';
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
Insert some more data, ensure the enumeration continues
INSERT INTO data (val) VALUES ('data-3'), ('data-4');
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
CS0000003 | data-3
CS0000004 | data-4
Check service table is successfully cleared
SELECT COUNT(*) FROM ids;
| COUNT(*) |
| -------: |
| 0 |
db<>fiddle here
Disadvantages:
Additional table needed.
Editing a generated id value is disabled (you must copy the row and delete the old record instead; a custom value cannot be set).
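For comparison, the first choice (constructing the CustomerID on the application side) could look roughly like this in Python; the Customer table layout, column names, and connection details are assumptions:
import mysql.connector

# Hypothetical connection and table layout -- adjust to your schema.
cnx = mysql.connector.connect(user="app", password="secret", database="shop")
cursor = cnx.cursor()

# Let a numeric surrogate key auto-increment, then derive the display ID from it.
cursor.execute("INSERT INTO Customer (name) VALUES (%s)", ("Alice",))
numeric_id = cursor.lastrowid                    # e.g. 1
customer_id = "CS" + str(numeric_id).zfill(7)    # e.g. 'CS0000001'
cursor.execute(
    "UPDATE Customer SET customer_code = %s WHERE id = %s",
    (customer_id, numeric_id),
)
cnx.commit()
Because the numeric part comes from the auto-increment counter, two concurrent inserts can never produce the same formatted ID.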

Create new SQLite table combining column from other tables with sqlite3 and python

I am trying to create a new table that combines columns from two different tables.
Let's imagine then that I have a database named db.db that includes two tables named table1 and table2.
table1 looks like this:
id | item | price
-------------
1 | book | 20
2 | copy | 30
3 | pen | 10
and table2 like this (note that it contains duplicated rows):
id | item | color
-------------
1 | book | blue
2 | copy | red
3 | pen | red
1 | book | blue
2 | copy | red
3 | pen | red
Now I'm trying to create a new table named new_table that combines the price and color columns for the same ids, without duplicates. My code is the following (it obviously does not work, because of my poor SQL skills):
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE new_table (id varchar, item integer, price integer, color integer)")
cur.execute("ATTACH DATABASE 'db.db' AS other;")
cur.execute("INSERT INTO new_table (id, item, price) SELECT * FROM other.table1")
cur.execute("UPDATE new_table SET color = (SELECT color FROM other.table2 WHERE distinct(id))")
con.commit()
I know there are multiple errors in the last line of code but I can't get my head around it. What would be your approach to this problem? Thanks!
Something like
CREATE TABLE new_table(id INTEGER, item TEXT, price INTEGER, color TEXT);
INSERT INTO new_table(id, item, price, color)
SELECT DISTINCT t1.id, t1.item, t1.price, t2.color
FROM table1 AS t1
JOIN table2 AS t2 ON t1.id = t2.id;
Note the fixed column types; yours were all sorts of strange. item and color as integers?
If each id value is unique in the new table (only one row will ever have an id of 1, only one an id of 2, and so on), that column should probably be an INTEGER PRIMARY KEY, too.
EDIT: Also, since you're creating this table in an in-memory database from tables from an attached file-based database... maybe you want a temporary table instead? Or a view might be more appropriate? Not sure what your goal is.
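Putting that into the question's Python setup, a minimal sketch might look like this; the in-memory destination and the attached db.db follow the question, and the tables are assumed to already exist there:
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("ATTACH DATABASE 'db.db' AS other")

cur.execute("CREATE TABLE new_table (id INTEGER, item TEXT, price INTEGER, color TEXT)")
cur.execute("""
    INSERT INTO new_table (id, item, price, color)
    SELECT DISTINCT t1.id, t1.item, t1.price, t2.color
    FROM other.table1 AS t1
    JOIN other.table2 AS t2 ON t1.id = t2.id
""")
con.commit()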

Compare the results of 2 queries that have "ORDER BY" in PostgreSQL to check for match and mismatch

I am writing a small app that marks students' queries in PostgreSQL against the teacher's queries. For a normal query I can easily use EXCEPT and UNION to find the mismatches. But how can I check the ones that need sorting?
If the answer matches all rows but only part of them are in the right order, how can I find the number of correctly ordered rows and mark the case properly?
My program is written in Python with the Psycopg2 library.
You can compare both queries joined by row_number(). Example:
create table example (id int, str text);
insert into example values (1, 'alfa'), (2, 'beta');
with teacher as ( -- teachers query
select * from example order by id
),
student as ( -- students query
select * from example order by id desc
),
teacher_rn as (
select row_number() over () rn, *
from teacher
),
student_rn as (
select row_number() over () rn, *
from student
)
select t.*, s.*
from teacher_rn t
join student_rn s
on t.rn = s.rn
where t <> s;
rn | id | str | rn | id | str
----+----+------+----+----+------
1 | 1 | alfa | 1 | 2 | beta
2 | 2 | beta | 2 | 1 | alfa
(2 rows)
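The same comparison can also be done on the Python side after fetching both result sets; a minimal psycopg2 sketch, where the connection string and the two queries are placeholders for the teacher's and student's actual queries:
import psycopg2

# Hypothetical connection and queries -- substitute the real ones.
conn = psycopg2.connect("dbname=marking")
teacher_sql = "SELECT * FROM example ORDER BY id"
student_sql = "SELECT * FROM example ORDER BY id DESC"

with conn, conn.cursor() as cur:
    cur.execute(teacher_sql)
    teacher_rows = cur.fetchall()
    cur.execute(student_sql)
    student_rows = cur.fetchall()

# Same result set regardless of order?
same_set = sorted(teacher_rows) == sorted(student_rows)
# How many rows sit in the correct position?
in_order = sum(t == s for t, s in zip(teacher_rows, student_rows))
print(f"matching set: {same_set}, rows in correct position: {in_order}/{len(teacher_rows)}")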
