I'm using mysql connector 1.0.9 and Python 3.2.
This query fails with a syntax error (mysql.connector raises a ProgrammingError; the specific MySQL error is just: there is a syntax error to the right of "%(IP)s AND DATE_SUB(NOW(), INTERVAL 1 HOUR) < accessed"):
SELECT COUNT(*) FROM bad_ip_logins WHERE IP = %(IP)s AND DATE_SUB(NOW(), INTERVAL 1 HOUR) < accessed
But if I quote the variable IP, it works:
SELECT COUNT(*) FROM bad_ip_logins WHERE IP = '%(IP)s' AND DATE_SUB(NOW(), INTERVAL 1 HOUR) < accessed
In context:
IP = 1249764151 # IP converted to an int
conn = mysql.connector.connect(db_params)
curs = conn.cursor()
query = "SELECT COUNT(*) FROM bad_ip_logins WHERE IP = %(IP)s AND DATE_SUB(NOW(), INTERVAL 1 HOUR) < accessed"
params = {'IP', IP}
curs.execute(query, params)
My understanding is that you never have to quote variables for a prepared statement (and this is true for every other query in my code, even ones that access the IP variable on this table). Why do I need to quote it in this single instance, and nowhere else?
If this isn't doing a prepared statement I'd be interested in hearing about that as well. I wasn't able to inject anything with this - was it just quoting it in such a way as to prevent that?
If it matters, this is the table description:
+----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| IP | int(10) unsigned | YES | | NULL | |
| user_id | int(11) | YES | | NULL | |
| accessed | datetime | YES | | NULL | |
+----------+------------------+------+-----+---------+-------+
Do not use string interpolation. Leave the SQL parameter to the database adapter:
cursor.execute('''\
SELECT COUNT(*) FROM bad_ip_logins WHERE IP = %s AND DATE_SUB(NOW(), INTERVAL 1 HOUR) < accessed''', (ip,))
Here, we pass the parameter ip into the execute() call as a separate argument (in a tuple, to make it a sequence), and the database adapter will take care of proper quoting, filling in the %s placeholder.
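As an aside, note that the params in the question, {'IP', IP}, is a set literal, not a dict: a named placeholder like %(IP)s needs a dict, so the value is never substituted and the literal %(IP)s text reaches MySQL, which is exactly the syntax error reported. A quick check of the difference:

```python
IP = 1249764151

params = {'IP', IP}    # comma: this builds a set, not a dict
assert isinstance(params, set)

params = {'IP': IP}    # colon: this is the dict that %(IP)s needs
assert isinstance(params, dict)

# the connector performs a substitution of this shape with a dict
assert "IP = %(IP)s" % params == "IP = 1249764151"
```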
Related
I have one huge table that I would like to make smaller. It has ~230 Million rows.
Both columns are indexed. The structure is:
+--------------+------------+
| id_my_value | id_ref |
+--------------+------------+
| YYYY | XXXX |
+--------------+------------+
I need to remove the rows that have a particular "id_ref" value. I have tried the following:
sql = "SELECT id_ref FROM REFS"
cursor.execute(sql)
refs = cursor.fetchall()
limit = 1000
for current in refs:
    id = current["id_ref"]
    sql = f"DELETE FROM MY_VALUES WHERE id_ref = {id} LIMIT {limit}"
    while True:
        cursor.execute(sql)
        mydb.commit()
        if cursor.rowcount == 0:
            break
Regardless of the value I set for "limit", the query is tremendously slow:
DELETE FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10;
I have also tried the other way around: select the id_value rows associated with a particular id_ref, and delete them:
SELECT id_value FROM MY_VALUES WHERE id_ref = XXXX LIMIT 10
DELETE FROM MY_VALUES WHERE id_value = YYYY
Here is my EXPLAIN.
EXPLAIN DELETE FROM MY_VALUES WHERE id_ref = YYYY LIMIT 1000;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+------------+---------+-------+----------+----------+-------------+
| 1 | DELETE | MY_VALUES | NULL | range | id_ref | id_ref | 5 | const | 20647922 | 100.00 | Using where |
It does use the right INDEX.
I would not have any problem with this operation running for days on the server.
What is the right way to approach this "cleaning"?
EDIT
Here is the output from SHOW CREATE TABLE MY_VALUES
MY_VALUES | CREATE TABLE `MY_VALUES` (
`id_my_value` int NOT NULL AUTO_INCREMENT,
`id_document` int NOT NULL,
`id_ref` int DEFAULT NULL,
`value` mediumtext CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`weigth` int DEFAULT NULL,
`id_analysis` int DEFAULT NULL,
`url` text CHARACTER SET utf8 COLLATE utf8_spanish_ci,
`domain` varchar(64) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`filetype` varchar(16) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`id_domain` int DEFAULT NULL,
`id_city` int DEFAULT NULL,
`city_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,
`is_hidden` tinyint NOT NULL DEFAULT '0',
`id_company` int DEFAULT NULL,
`is_hidden_by_user` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_my_value`),
KEY `id_ref` (`id_ref`),
KEY `id_document` (`id_document`),
KEY `id_analysis` (`id_analysis`),
KEY `weigth` (`weigth`),
KEY `id_domain` (`id_domain`),
KEY `id_city` (`id_city`),
KEY `id_company` (`id_company`),
KEY `value` (`value`(15))
UPDATE
I just tried to remove one register:
DELETE FROM MY_VALUES WHERE id_MY_VALUE = 8
That operation takes "forever". To prevent a timeout, I followed this SO question, so I have set:
show variables like 'innodb_lock_wait_timeout';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| innodb_lock_wait_timeout | 100000 |
+--------------------------+--------+
a = 0
limit = 1000
while True:
    b = a + limit
    sql = f"DELETE FROM `VALUES` WHERE id > {a} AND id <= {b}"
    cursor.execute(sql)
    mydb.commit()
    if cursor.rowcount == 0:
        break
    a = a + limit
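For clarity, here is a runnable sketch of that range-batched idea, using the stdlib sqlite3 in place of MySQL (table and column names are placeholders, and the stop condition checks the key range rather than rowcount, since a gap of empty batches shouldn't end the loop early):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE my_values (id INTEGER PRIMARY KEY, id_ref INTEGER)")
cur.executemany("INSERT INTO my_values VALUES (?, ?)",
                [(i, i % 5) for i in range(1, 10001)])
conn.commit()

# delete unwanted rows in primary-key ranges, committing after each batch
batch = 1000
a = 0
while a < 10000:
    cur.execute("DELETE FROM my_values WHERE id > ? AND id <= ? AND id_ref = ?",
                (a, a + batch, 3))
    conn.commit()
    a += batch

remaining = cur.execute("SELECT COUNT(*) FROM my_values").fetchone()[0]
print(remaining)  # 8000: one row in five had id_ref = 3
```

Walking the primary key this way keeps each DELETE cheap and the transactions short, instead of repeatedly re-scanning the secondary index.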
First thing to try. Put this right after your second cursor.execute().
cnx.commit()
In connector/python, autocommit is turned off by default. If you don't commit, your MySQL server buffers up all your changes (DELETEs in your case) so it can roll them back if you choose, or if your program crashes.
I guess your slow query is
DELETE FROM `VALUES` WHERE id_ref=constant LIMIT 1000;
Try doing this. EXPLAIN shows you the query plan.
EXPLAIN DELETE FROM `VALUES` WHERE id_ref=constant LIMIT 1000;
It should employ the index on your id_ref column. It's possible your indexes aren't selective enough so your query planner chooses a table scan. In that case you might consider raising the LIMIT so your query does more work each time it runs.
You could try this. If my guess about the table scan is correct, it might help.
DELETE FROM `VALUES` FORCE INDEX (your_index_on_id_ref) WHERE id_ref=constant LIMIT 1000;
(Usually FORCE INDEX is a terrible idea. But this might be the exception.)
You could also try this: create a cleaned up temporary table, then rename tables to put it into service.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TABLE purged_values AS
SELECT *
FROM `VALUES`
WHERE id_ref NOT IN (SELECT id_ref FROM `REFS`);
This will take a while. Run it at zero-dark-thirty. The transaction isolation level helps prevent contention with other sessions using the table while this is in progress.
Then you'll have a new, purged, table. You can index it, then do these renames to put it into service.
ALTER TABLE `VALUES` RENAME TO old_values;
ALTER TABLE purged_values RENAME TO `VALUES`;
Finally I did a bit more experimentation and found a way.
First step
The Python loop to delete the entries on the DB was running for ~12h. I added a couple of lines to measure the execution time:
start = time.time()
cursor.execute(sql)
mydb.commit()
end = time.time()
Here is a sample of the first measurements:
1 > 900 > 0.4072246551513672
2 > 900 > 1.7270898818969727
3 > 900 > 1.8365845680236816
4 > 900 > 1.124634027481079
5 > 900 > 1.8552422523498535
6 > 900 > 13.80513596534729
7 > 900 > 8.379877090454102
8 > 900 > 10.675175428390503
9 > 900 > 6.14388370513916
10 > 900 > 11.806004762649536
11 > 900 > 12.884040117263794
12 > 900 > 23.604055881500244
13 > 900 > 19.162535905838013
14 > 900 > 24.980825662612915
....
It averaged ~30s per execution after 900 iterations. Picture attached for reference:
In my case this implementation would have taken ~80 days to remove all the rows.
Final solution
Created a temporary table with the appropriate values, indexes, etc.:
CREATE TABLE ZZ_MY_VALUES AS
SELECT * FROM MY_VALUES WHERE MY_VALUES.id_ref IN
(
SELECT id_ref FROM MY_REFS WHERE id_ref = 3 OR id_ref = 4 OR id_ref = 5
)
It took ~3h and went from 230M rows to 21M rows.
A bit quicker than the original estimation of 3 months. :)
Thanks all for your tips.
I am trying to store some tables I create in my code in an RDS instance using psycopg2. The script runs without issue and I can see the table being created correctly in the DB. However, if I then query the table, I only see the columns, but no data:
import pandas as pd
import psycopg2
test=pd.DataFrame({'A':[1,1],'B':[2,2]})
#connect is a function to connect to the RDS instance
connection= connect()
cursor=connection.cursor()
query='CREATE TABLE test (A varchar NOT NULL,B varchar NOT NULL);'
cursor.execute(query)
connection.commit()
cursor.close()
connection.close()
This script runs without issues. Printing out file_check from the following script:
connection=connect()
# check if file already exists in SQL
sql = """
SELECT "table_name","column_name", "data_type", "table_schema"
FROM INFORMATION_SCHEMA.COLUMNS
WHERE "table_schema" = 'public'
ORDER BY table_name
"""
file_check=pd.read_sql(sql, con=connection)
connection.close()
I get:
table_name column_name data_type table_schema
0 test a character varying public
1 test b character varying public
which looks good.
Running the following however:
read='select * from public.test'
df=pd.read_sql(read,con=connection)
returns:
Empty DataFrame
Columns: [a, b]
Index: []
Anybody have any idea why this is happening? I cannot seem to get around this
Erm, your first script creates a test dataframe, but it's never referred to after it's defined.
You'll need to
test.to_sql("test", connection)
or similar to actually write it.
A minimal example:
$ createdb so63284022
$ python
>>> import sqlalchemy as sa
>>> import pandas as pd
>>> test = pd.DataFrame({'A':[1,1],'B':[2,2], 'C': ['yes', 'hello']})
>>> engine = sa.create_engine("postgres://localhost/so63284022")
>>> with engine.connect() as connection:
... test.to_sql("test", connection)
...
>>>
$ psql so63284022
so63284022=# select * from test;
index | A | B | C
-------+---+---+-------
0 | 1 | 2 | yes
1 | 1 | 2 | hello
(2 rows)
so63284022=# \d+ test
Table "public.test"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+--------+-----------+----------+---------+----------+--------------+-------------
index | bigint | | | | plain | |
A | bigint | | | | plain | |
B | bigint | | | | plain | |
C | text | | | | extended | |
Indexes:
"ix_test_index" btree (index)
Access method: heap
so63284022=#
I was able to solve this:
As it was pointed out by #AKX, I was only creating the table structure, but I was not filling in the table.
I now import psycopg2.extras as well and, after this:
query='CREATE TABLE test (A varchar NOT NULL,B varchar NOT NULL);'
cursor.execute(query)
I add something like:
update_query='INSERT INTO test(A, B) VALUES(%s,%s) ON CONFLICT DO NOTHING'
psycopg2.extras.execute_batch(cursor, update_query, test.values)
cursor.close()
connection.close()
My table is now correctly filled, as verified with pd.read_sql.
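For reference, the same fill-the-table step with the stdlib sqlite3 driver, where executemany plays the role of psycopg2.extras.execute_batch (table and column names as in the question):

```python
import sqlite3

# the values of the test dataframe, as a plain list of row tuples
rows = [('1', '2'), ('1', '2')]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (A TEXT NOT NULL, B TEXT NOT NULL)")
# send the whole batch instead of calling execute() once per row
cur.executemany("INSERT INTO test (A, B) VALUES (?, ?)", rows)
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(count)  # 2
```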
I have a table which has columns named measured_time, data_type and value.
In data_type, there are two types: temperature and humidity.
I want to combine two rows of data if they have same measured_time using Django ORM.
I am using Maria DB.
Using raw SQL, the following query does what I want:
SELECT T1.measured_time, T1.temperature, T2.humidity
FROM ( SELECT CASE WHEN data_type = 1 then value END as temperature,
CASE WHEN data_type = 2 then value END as humidity ,
measured_time FROM data_table) as T1,
( SELECT CASE WHEN data_type = 1 then value END as temperature ,
CASE WHEN data_type = 2 then value END as humidity ,
measured_time FROM data_table) as T2
WHERE T1.measured_time = T2.measured_time and
T1.temperature IS NOT null and T2.humidity IS NOT null and
DATE(T1.measured_time) = '2019-07-01'
Original Table
| measured_time | data_type | value |
|---------------------|-----------|-------|
| 2019-07-01-17:27:03 | 1 | 25.24 |
| 2019-07-01-17:27:03 | 2 | 33.22 |
Expected Result
| measured_time       | temperature | humidity |
|---------------------|-------------|----------|
| 2019-07-01-17:27:03 | 25.24       | 33.22    |
I've never used it and so can't answer in detail, but you can feed a raw SQL query into Django and get the results back through the ORM. Since you have already got the SQL this may be the easiest way to proceed. Documentation here
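If raw SQL is on the table anyway, the self-join can also be collapsed into a single conditional-aggregation query, which is simpler to hand to the ORM's raw() method. A sqlite3 sketch of the query shape (the CASE/GROUP BY part is the same in MariaDB):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table "
             "(measured_time TEXT, data_type INTEGER, value REAL)")
conn.executemany("INSERT INTO data_table VALUES (?, ?, ?)",
                 [("2019-07-01 17:27:03", 1, 25.24),   # type 1: temperature
                  ("2019-07-01 17:27:03", 2, 33.22)])  # type 2: humidity

# pivot the two rows per timestamp into one row with two columns
rows = conn.execute("""
    SELECT measured_time,
           MAX(CASE WHEN data_type = 1 THEN value END) AS temperature,
           MAX(CASE WHEN data_type = 2 THEN value END) AS humidity
    FROM data_table
    GROUP BY measured_time
""").fetchall()
print(rows)  # [('2019-07-01 17:27:03', 25.24, 33.22)]
```

In Django, that SQL could then be fed to something like DataTable.objects.raw(...) (model name hypothetical), as described in the documentation linked above.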
Question: how do I insert a datetime value into MS SQL server, given the code below?
Context:
I have a 2-D list (i.e., a list of lists) in Python that I'd like to upload to a table in Microsoft SQL Server 2008. For this project I am using Python's pymssql package. Each value in each list is a string except for the very first element, which is a datetime value.
Here is how my code reads:
import pymssql
db_connect = pymssql.connect( # these are just generic names
server = server_name,
user = db_usr,
password = db_pwd,
database = db_name
)
my_cursor = db_connect.cursor()
for individual_list in list_of_lists:
    # the first value in the tuple should be a datetime
my_cursor.execute("INSERT INTO [DB_Table_Name] VALUES (%s, %s, %s, %s, %s, %s, %s, %s)", tuple(individual_list))
db_connect.commit()
The Python interpreter is having a tough time inserting my datetime values. I understand that I currently have %s, which is a string formatter, but I'm unsure what I should use for datetime, which is how the database's first column is typed.
The "list of lists" looks like this (after each list is converted into a tuple):
[(datetime.datetime(2012, 4, 1), '1', '4.1', 'hip', 'A1', 'J. Smith', 'B123', 'XYZ'),...]
Here is an illustration of what the table should look like:
+-----------+------+------+--------+-------+-----------+---------+---------+
| date | step | data | type | ID | contact | notif. | program |
+-----------+------+------+--------+-------+-----------+---------+---------+
|2012-04-01 | 1 | 4.1 | hip | A1 | J. Smith | B123 | XYZ |
|2012-09-05 | 2 | 5.1 | hip | A9 | B. Armst | B123 | ABC |
|2012-01-16 | 5 | 9.0 | horray | C6 | F. Bayes | P995 | XYZ |
+-----------+------+------+--------+-------+-----------+---------+---------+
Thank you in advance.
I would try formatting the datetime to "yyyymmdd hh:mm:ss" before inserting. With what you are doing, SQL will be parsing the string, so I would also build the entire statement and then pass that string to execute. See below:
for individual_list in list_of_lists:
    # the first value is a datetime; format it so SQL Server can parse it
    date_time = individual_list[0].strftime("%Y%m%d %H:%M:%S")
    values = [date_time] + [str(v) for v in individual_list[1:]]
    insert_str = "INSERT INTO [DB_Table_Name] VALUES ('" + "', '".join(values) + "');"
    print(insert_str)
    my_cursor.execute(insert_str)
    db_connect.commit()
I apologize for the crude Python, but SQL should accept that insert statement as long as all the fields match up. If not, you may want to specify which columns those values go to in your insert statement.
Let me know if that works.
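For what it's worth, the strftime format suggested above produces the compact "yyyymmdd hh:mm:ss" style that SQL Server parses unambiguously; pymssql's %s placeholder should also handle datetime objects directly, so the parameterized form is worth trying first. The formatting itself looks like this:

```python
import datetime

dt = datetime.datetime(2012, 4, 1)  # first element of the sample tuple
formatted = dt.strftime("%Y%m%d %H:%M:%S")
print(formatted)  # 20120401 00:00:00
```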
Thanks for taking the time to read this. It's going to be a long post to explain the problem. I haven't been able to find an answer in all the usual sources.
Problem:
I am having an issue with using the select statement with python to recall data from a table in a mysql database.
System and versions:
Linux ubuntu 2.6.38-14-generic #58-Ubuntu SMP Tue Mar 27 20:04:55 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Python: 2.7.1+
MySql: Server version: 5.1.62-0ubuntu0.11.04.1 (Ubuntu)
Here's the table:
mysql> describe hashes;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | varchar(20) | NO | PRI | NULL | |
| hash | varbinary(4) | NO | MUL | NULL | |
+-------+--------------+------+-----+---------+-------+
Here are responses that I want via a normal mysql query:
mysql> SELECT id FROM hashes WHERE hash='f';
+------+
| id |
+------+
| 0x67 |
+------+
mysql> SELECT id FROM hashes WHERE hash='ff';
+--------+
| id |
+--------+
| 0x6700 |
+--------+
As before, these are the responses that are expected and how I designed the DB.
My code:
import mysql.connector
from database import login_info
import sys
db = mysql.connector.Connect(**login_info)
cursor = db.cursor()
data = 'f'
cursor.execute("""SELECT
* FROM hashes
WHERE hash=%s""",
(data))
rows = cursor.fetchall()
print rows
for row in rows:
print row[0]
This returns the result I expect:
[(u'0x67', 'f')]
0x67
If I change data to :
data = 'ff'
I receive the following error:
Traceback (most recent call last):
File "test.py", line 11, in <module>
(data))
File "/usr/local/lib/python2.7/dist-packages/mysql_connector_python-0.3.2_devel- py2.7.egg/mysql/connector/cursor.py", line 310, in execute
"Wrong number of arguments during string formatting")
mysql.connector.errors.ProgrammingError: Wrong number of arguments during string formatting
OK. So, I add a string formatting character to my SQL statement as so:
cursor.execute("""SELECT
* FROM hashes
WHERE hash=%s%s""",
(data))
And I get the following response:
[(u'0x665aa6', "f'f")]
0x665aa6
and it should be 0x6700.
I know that I should be passing the data with one %s character. That is how I built my database table, using one %s per variable:
cursor.execute("""
INSERT INTO hashes (id, hash)
VALUES (%s, %s)""", (k, hash))
Any ideas how to fix this?
Thanks.
Your execute statement doesn't seem quite correct. It should follow the pattern cursor.execute( <select statement string>, <tuple>), and by putting only a single value in parentheses you are actually passing just a string, not a tuple. To make the second argument the correct data type you need to add a trailing comma, so your statement would look like:
cursor.execute("""SELECT
* FROM hashes
WHERE hash=%s""",
(data, ))
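The trailing comma matters because parentheses alone don't create a tuple: a bare (data) is still just the string, and the connector then treats each character as a separate parameter. That's why 'ff' raised "Wrong number of arguments" while the single character 'f' happened to line up with the single %s:

```python
data = 'ff'

assert (data) == 'ff' and isinstance((data), str)  # parens alone: still a string
assert isinstance((data,), tuple)                  # trailing comma: a 1-tuple

# a 2-character string looks like two parameters to the driver
assert len(data) == 2
assert len((data,)) == 1
```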