Our IDs look something like "CS0000001", which stands for customer with ID 1. Is this possible to do with SQL and AUTO_INCREMENT, or do I need to do it in my GUI?
I need the leading zeroes, but also auto-incrementing to prevent double usage if I construct the ID in Python and insert it into the DB.
Is that possible?
You have a few choices:
- Construct the CustomerID in the code which inserts the data into the Customer table (=application side, requires a change in your code); see the sketch after this list.
- Create a view on top of the Customer table that contains the logic, and use that when you need the CustomerID (=database side, requires a change in your code).
- Use a procedure to do the inserts and construct the CustomerID in the procedure (=database side, requires a change in your code).
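For the first option, here is a minimal application-side sketch, assuming connector/python and a hypothetical customer table with a numeric AUTO_INCREMENT key customer_no plus a CHAR(9) column customer_id holding the formatted value (table and column names are made up for illustration):
import mysql.connector

cnx = mysql.connector.connect(user="app", password="secret", database="shop")
cur = cnx.cursor()

# Let MySQL hand out the unique number, then derive the display ID from it.
cur.execute("INSERT INTO customer (name) VALUES (%s)", ("Alice",))
customer_no = cur.lastrowid                    # AUTO_INCREMENT value just assigned
customer_id = "CS{:07d}".format(customer_no)   # e.g. 1 -> 'CS0000001'
cur.execute(
    "UPDATE customer SET customer_id = %s WHERE customer_no = %s",
    (customer_id, customer_no),
)
cnx.commit()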
A possible realization:
Create data table
CREATE TABLE data (id CHAR(9) NOT NULL DEFAULT '',
val TEXT,
PRIMARY KEY (id));
Create service table
CREATE TABLE ids (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);
Create a trigger which generates the id value
CREATE TRIGGER tr_bi_data
BEFORE INSERT
ON data
FOR EACH ROW
BEGIN
INSERT INTO ids () VALUES ();
SET NEW.id = CONCAT('CS', LPAD(LAST_INSERT_ID(), 7, '0'));
DELETE FROM ids;
END
Create a trigger which prohibits changing the id value
CREATE TRIGGER tr_bu_data
BEFORE UPDATE
ON data
FOR EACH ROW
BEGIN
SET NEW.id = OLD.id;
END
Insert some data, check result
INSERT INTO data (val) VALUES ('data-1'), ('data-2');
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
Try to update, ensure id change prohibited
UPDATE data SET id = 'CS0000100' WHERE val = 'data-1';
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
Insert more data, ensure the enumeration continues
INSERT INTO data (val) VALUES ('data-3'), ('data-4');
SELECT * FROM data;
id | val
:-------- | :-----
CS0000001 | data-1
CS0000002 | data-2
CS0000003 | data-3
CS0000004 | data-4
Check that the service table is successfully cleared
SELECT COUNT(*) FROM ids;
| COUNT(*) |
| -------: |
| 0 |
Disadvantages:
- An additional table is needed.
- Editing the generated id value is disabled (to change it, you must copy the row and delete the old record; a custom value cannot be set).
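For completeness, a short usage sketch of the trigger-based variant from Python, assuming connector/python; since the trigger builds the CHAR(9) id itself, the application reads it back with a SELECT rather than from cursor.lastrowid:
import mysql.connector

cnx = mysql.connector.connect(user="app", password="secret", database="shop")
cur = cnx.cursor()
cur.execute("INSERT INTO data (val) VALUES (%s)", ("data-5",))
cnx.commit()
cur.execute("SELECT id FROM data WHERE val = %s", ("data-5",))
print(cur.fetchone()[0])   # e.g. 'CS0000005'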
Related
I have one huge table that I would like to make smaller. It has ~230 Million rows.
Both columns are indexed. The structure is:
+--------------+------------+
| id_my_value | id_ref |
+--------------+------------+
| YYYY | XXXX |
+--------------+------------+
I would have to remove the values that have a particular "id_ref" value. I have tried the following:
sql = f"SELECT id_ref FROM REFS"
cursor.execute(sql)
refs = cursor.fetchall()
limit = 1000
for current in refs:
id = current["id_ref"]
sql = f"DELETE FROM MY_VALUES WHERE id_ref = {id} LIMIT {limit}"
while True:
cursor.execute(sql)
mydb.commit()
if cursor.rowcount == 0:
break
Regardless of the value I set for "limit", the query is tremendously slow:
DELETE FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10;
I have also tried the other way around. Select the id_value associated with a particular id_ref, and delete:
SELECT id_value FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10
DELETE FROM MY_VALUES WHERE id_value = YYYY
Here is my EXPLAIN.
EXPLAIN DELETE FROM MY_VALUES WHERE id_ref = YYYY LIMIT 1000;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+------------+---------+-------+----------+----------+-------------+
| 1 | DELETE | MY_VALUES | NULL | range | id_ref | id_ref | 5 | const | 20647922 | 100.00 | Using where |
It does use the right INDEX.
I would not have any problem with this operation running for days on the server.
What is the right way to approach this "cleaning"?
EDIT
Here is the output from SHOW CREATE TABLE MY_VALUES
MY_VALUES | CREATE TABLE `MY_VALUES` (
`id_my_value` int NOT NULL AUTO_INCREMENT,
`id_document` int NOT NULL,
`id_ref` int DEFAULT NULL,
`value` mediumtext CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`weigth` int DEFAULT NULL,
`id_analysis` int DEFAULT NULL,
`url` text CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`domain` varchar(64) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`filetype` varchar(16) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`id_domain` int DEFAULT NULL,
`id_city` int DEFAULT NULL,
`city_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`is_hidden` tinyint NOT NULL DEFAULT '0',
`id_company` int DEFAULT NULL,
`is_hidden_by_user` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_my_value`),
KEY `id_ref` (`id_ref`),
KEY `id_document` (`id_document`),
KEY `id_analysis` (`id_analysis`),
KEY `weigth` (`weigth`),
KEY `id_domain` (`id_domain`),
KEY `id_city` (`id_city`),
KEY `id_company` (`id_company`),
KEY `value` (`value`(15))
UPDATE
I just tried to remove a single record:
DELETE FROM MY_VALUES WHERE id_MY_VALUE = 8
That operation takes "forever". To prevent a timeout, I followed this SO question, so I have set:
show variables like 'innodb_lock_wait_timeout';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| innodb_lock_wait_timeout | 100000 |
+--------------------------+--------+
a = 0
limit = 1000
while True:
    b = a + 1000
    sql = f"delete from `VALUES` where id > {a} and id <= {b}"
    cursor.execute(sql)
    mydb.commit()
    if cursor.rowcount == 0:
        break
    a = a + 1000
First thing to try. Put this right after your second cursor.execute().
cnx.commit()
In connector/python, autocommit is turned off by default. If you don't commit, your MySQL server buffers up all your changes (DELETEs in your case) so it can roll them back if you choose, or if your program crashes.
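A minimal sketch of that delete-and-commit-per-batch pattern, assuming connector/python and the MY_VALUES table from the question (connection parameters and the id_ref value are placeholders):
import mysql.connector

cnx = mysql.connector.connect(user="app", password="secret", database="mydb")
cursor = cnx.cursor()

id_ref = 1234   # placeholder for the id_ref being purged
sql = "DELETE FROM MY_VALUES WHERE id_ref = %s LIMIT 1000"
while True:
    cursor.execute(sql, (id_ref,))
    cnx.commit()   # flush each batch; autocommit is off by default in connector/python
    if cursor.rowcount == 0:
        break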
I guess your slow query is
DELETE FROM `VALUES` WHERE id_ref=constant LIMIT 1000;
Try doing this. EXPLAIN shows you the query plan.
EXPLAIN DELETE FROM `VALUES` WHERE id_ref=constant LIMIT 1000;
It should employ the index on your id_ref column. It's possible your indexes aren't selective enough, so your query planner chooses a table scan. In that case you might consider raising the LIMIT so your query does more work each time it runs.
You could try this. If my guess about the table scan is correct, it might help.
DELETE FROM `VALUES` FORCE INDEX (your_index_on_id_ref) WHERE id_ref=constant LIMIT 1000;
(Usually FORCE INDEX is a terrible idea. But this might be the exception.)
You could also try this: create a cleaned up temporary table, then rename tables to put it into service.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TABLE purged_values AS
SELECT *
FROM `VALUES`
WHERE id_ref NOT IN (SELECT id_ref FROM `REFS`);
This will take a while. Run it at zero-dark-thirty. The transaction isolation level helps prevent contention with other sessions using the table while this is in progress.
Then you'll have a new, purged, table. You can index it, then do these renames to put it into service.
ALTER TABLE `VALUES` RENAME TO old_values;
ALTER TABLE purged_values RENAME TO `VALUES`;
Finally I did a bit more experimentation and found a way.
First step
The Python loop to delete the entries in the DB had been running for ~12h. I added a couple of lines to measure the execution time:
start = time.time()
cursor.execute(sql)
mydb.commit()
end = time.time()
Here is a sample of the first measurements:
1 > 900 > 0.4072246551513672
2 > 900 > 1.7270898818969727
3 > 900 > 1.8365845680236816
4 > 900 > 1.124634027481079
5 > 900 > 1.8552422523498535
6 > 900 > 13.80513596534729
7 > 900 > 8.379877090454102
8 > 900 > 10.675175428390503
9 > 900 > 6.14388370513916
10 > 900 > 11.806004762649536
11 > 900 > 12.884040117263794
12 > 900 > 23.604055881500244
13 > 900 > 19.162535905838013
14 > 900 > 24.980825662612915
....
It reached an average of ~30s per execution after 900 iterations.
In my case this implementation would have taken ~80 days to remove all the rows.
Final solution
I created a temporary table with the appropriate values, indexes, etc.:
CREATE TABLE ZZ_MY_VALUES AS
SELECT * FROM MY_VALUES WHERE MY_VALUES.id_ref IN
(
    SELECT id_ref FROM MY_REFS WHERE id_ref = 3 OR id_ref = 4 OR id_ref = 5
)
It took ~3h and went from 230M rows to 21M rows.
A bit quicker than the original estimation of 3 months. :)
Thanks all for your tips.
I have looked around a lot for a solution to my problem and haven't found anything. The problem is with inserting a new record into a database table that has an auto-increment primary key. Below are the different scenarios and behaviors I'm getting, along with sample code.
model = QSqlTableModel(db = db, table = 'table')
record = model.record()
record.setGenerated('id', False)
record.setValue('name', 'a_name')
model.insertRecord(-1, record)
Scenario 1: Pre-populated table with some rows.
| id | name |
|----|------|
| 1 | name1|
| 2 | name2|
Behavior 1: insertRecord does not add a new record to the table. If I remove the setGenerated flag and manually specify an id [record.setValue('id', 3)], I can insert a new record.
Scenario 2: Pre-populated table with some rows and first id not starting at 1.
| id | name |
|----|------|
| 3 | name1|
| 4 | name2|
Behavior 2: the first insertRecord adds a new record with id = 1, and the second insertRecord adds another record with id = 2. After that, insertRecord does not add any new records: it tries to add a new record with id = 3, but id = 3 already exists, and it won't jump to the next available id (id = 5). Again, if I remove the setGenerated flag and manually specify the next available id [record.setValue('id', 5)], it works.
Scenario 3: Empty table without any rows.
| id | name |
|----|------|
Behavior 3: insertRecord starts at id = 1 and continues to add new records with no problem.
How can I fix this problem without having to manually specify an auto incremented id?
I am looking for a way to create a number of filters across a few tables in my SQL database. The 2 tables I require the data from are Order and OrderDetails.
The Order table is like this:
------------------------------------
| OrderID | CustomerID | OrderDate |
------------------------------------
The OrderDetails table is like this:
----------------------------------
| OrderID | ProductID | Quantity |
----------------------------------
I want to count the number of times a particular OrderID pops up in a single day. For example, it would choose an OrderID in Order, match it to the OrderIDs in OrderDetails, and count the number of times it appears in OrderDetails.
-----------------------------------------------------------
| OrderID | CustomerID | OrderDate | ProductID | Quantity |
-----------------------------------------------------------
The code I used is below here:
# Execute SQL Query (number of orders made on a particular day entered by a user)
cursor.execute("""
SELECT 'order.*', count('orderdetails.orderid') as 'NumberOfOrders'
from 'order'
left join 'order'
on ('order.orderid' = 'orderdetais.orderid')
group by
'order.orderid'
""")
print(cursor.fetchall())
Also, the current output that I get is this when I should get 3:
[('order.*', 830)]
Your immediate problem is that you are abusing single quotes. If you need to quote an identifier (a table name, column name or the like), you should use double quotes in SQLite (this actually is the SQL standard). And an expression such as order.* should not be quoted at all. You are also self-joining the order table, while you probably want to bring in orderdetails.
You seem to want:
select
o.orderID,
o.customerID,
o.orderDate,
count(*) number_of_orders
from "order" o
left join orderdetails od on od.orderid = o.orderid
group by o.orderID, o.customerID, o.orderDate
order is a language keyword, so I did quote it - that table would be better named orders, to avoid the conflicting name. Other identifiers do not need to be quoted here.
Since all you want from orderdetails is the count, you could also use a subquery instead of aggregation:
select
o.*,
    (select count(*) from orderdetails od where od.orderid = o.orderid) number_of_orders
from "order" o
I would like to be able to return a list of all fields (ideally with the table details) used by an given SQL query. E.g. the input of the query:
SELECT t1.field1, field3
FROM dbo.table1 AS t1
INNER JOIN dbo.table2 as t2
ON t2.field2 = t1.field2
WHERE t2.field1 = 'someValue'
would return
+--------+-----------+--------+
| schema | tablename | field |
+--------+-----------+--------+
| dbo | table1 | field1 |
| dbo | table1 | field2 |
| dbo | table1 | field3 |
| dbo | table2 | field1 |
| dbo | table2 | field2 |
+--------+-----------+--------+
Really it needs to make use of the SQL kernel (is that the right word? engine?), as there is no way for the reader to know that field3 is in table1 rather than table2. For this reason I would assume the solution would be in SQL. Bonus points if it can handle SELECT * too!
I have attempted a Python solution using sqlparse (https://sqlparse.readthedocs.io/en/latest/), but was having trouble with more complex SQL queries involving temporary tables, subqueries and CTEs. Handling aliases was also very difficult (particularly when the query used the same alias in multiple places). Obviously it could not handle cases like field3 above, which has no table identifier, nor can it handle SELECT *.
I was hoping there might be a more elegant solution within SQL Server Management Studio, or even some function within SQL Server itself. We have SQL Prompt from Redgate, whose IntelliSense must have some understanding of the schema and of the SQL query it is formatting.
UPDATE:
As requested: the reason I'm trying to do this is to work out which Users can execute which SSRS Reports within our organisation. This is entirely dependent on them having GRANT SELECT permissions assigned to their Roles on all fields used by all datasets (in our case SQL queries) in a given report. I have already managed to report on which Users have GRANT SELECT on which fields according to their Roles. I now want to extend that to which reports those permissions allow them to run.
Mapping columns back to their tables may be tricky because column names can be ambiguous or even derived. However, you can get the column names, their order and their types from virtually any query or stored procedure.
Example
Select column_ordinal
,name
,system_type_name
From sys.dm_exec_describe_first_result_set('Select * from YourTable',null,null )
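If you need that metadata from Python rather than SSMS, here is a hedged sketch using pyodbc; the driver name, connection string and query text are assumptions, not part of the original answer:
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cur = conn.cursor()
# Pass the query to describe as a parameter to the DMF.
cur.execute(
    """
    SELECT column_ordinal, name, system_type_name
    FROM sys.dm_exec_describe_first_result_set(?, NULL, NULL)
    """,
    "SELECT t1.field1, field3 FROM dbo.table1 AS t1 "
    "INNER JOIN dbo.table2 AS t2 ON t2.field2 = t1.field2",
)
for ordinal, name, type_name in cur.fetchall():
    print(ordinal, name, type_name)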
I think I have now found an answer. Please note: I currently do not have permission to execute these functions, so I have not yet tested it; I will update the answer when I've had a chance to test it. Thanks for this answer go to @milivojeviCH. The answer is copied from here: https://stackoverflow.com/a/19852614/6709902
The ultimate goal, selecting all the columns used in a SQL Server execution plan, solved:
USE AdventureWorksDW2012
DBCC FREEPROCCACHE
SELECT dC.Gender, dc.HouseOwnerFlag,
SUM(fIS.SalesAmount) AS SalesAmount
FROM
dbo.DimCustomer dC INNER JOIN
dbo.FactInternetSales fIS ON fIS.CustomerKey = dC.CustomerKey
GROUP BY dC.Gender, dc.HouseOwnerFlag
ORDER BY dC.Gender, dc.HouseOwnerFlag
/*
query_hash query_plan_hash
0x752B3F80E2DB426A 0xA15453A5C2D43765
*/
DECLARE @MyQ AS XML;
-- SELECT qstats.query_hash, query_plan_hash, qplan.query_plan AS [Query Plan],qtext.text
SELECT @MyQ = qplan.query_plan
FROM sys.dm_exec_query_stats AS qstats
CROSS APPLY sys.dm_exec_query_plan(qstats.plan_handle) AS qplan
cross apply sys.dm_exec_sql_text(qstats.plan_handle) as qtext
where text like '% fIS %'
and query_plan_hash = 0xA15453A5C2D43765
SeLeCt @MyQ
;WITH xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT DISTINCT
[Database] = x.value('(@Database)[1]', 'varchar(128)'),
[Schema] = x.value('(@Schema)[1]', 'varchar(128)'),
[Table] = x.value('(@Table)[1]', 'varchar(128)'),
[Alias] = x.value('(@Alias)[1]', 'varchar(128)'),
[Column] = x.value('(@Column)[1]', 'varchar(128)')
FROM @MyQ.nodes('//ColumnReference') x1(x)
Leads to the following output:
Database Schema Table Alias Column
------------------------- ------ ---------------- ----- ----------------
NULL NULL NULL NULL Expr1004
[AdventureWorksDW2012] [dbo] [DimCustomer] [dC] CustomerKey
[AdventureWorksDW2012] [dbo] [DimCustomer] [dC] Gender
[AdventureWorksDW2012] [dbo] [DimCustomer] [dC] HouseOwnerFlag
[AdventureWorksDW2012] [dbo] [FactInternetSal [fIS] CustomerKey
[AdventureWorksDW2012] [dbo] [FactInternetSal [fIS] SalesAmount
I was working through Pattern 1 and was wondering if there is a way, in Python or otherwise, to "gather" all the rows with the same id into a single row, as a dictionary.
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
Insert some data
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Query the database.
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
Result:
weatherstation_id | event_time | temperature
-------------------+--------------------------+-------------
1234ABCD | 2013-04-03 06:01:00+0000 | 72F
1234ABCD | 2013-04-03 06:02:00+0000 | 73F
1234ABCD | 2013-04-03 06:03:00+0000 | 73F
1234ABCD | 2013-04-03 06:04:00+0000 | 74F
This works, but I was wondering if I can turn it into a single row per weatherstation_id.
E.g.
{
"weatherstationid": "1234ABCD",
"2013-04-03 06:01:00+0000": "72F",
"2013-04-03 06:02:00+0000": "73F",
"2013-04-03 06:03:00+0000": "73F",
"2013-04-03 06:04:00+0000": "74F"
}
Is there some parameter in the Cassandra driver that can be specified to gather by a certain id (weatherstation_id) and turn everything else into a dictionary? Or does there need to be some Python magic to turn the list of rows into a single row per id (or set of ids)?
Alex, you will have to do some post-execution data processing to get this format. The driver returns row by row, no matter what row_factory you use.
One of the reasons the driver cannot produce the format you suggest is that there is pagination involved internally (the default fetch_size is 5000), so results assembled that way could be partial or incomplete. Additionally, this can easily be done in Python once the query execution is finished and you are sure that all required results have been fetched.
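For example, a short post-processing sketch, assuming the cassandra-driver package, the temperature table above, and placeholder contact point / keyspace names:
from collections import defaultdict
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("weather")

rows = session.execute(
    "SELECT weatherstation_id, event_time, temperature "
    "FROM temperature WHERE weatherstation_id = %s",
    ("1234ABCD",),
)

# Group client-side: one dict per weatherstation_id, keyed by event_time.
stations = defaultdict(dict)
for row in rows:   # iterating the result set transparently fetches further pages
    stations[row.weatherstation_id][str(row.event_time)] = row.temperature

print(dict(stations))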