I'm trying to optimize the query and Python script below. There are 100,000+ rows in the `file` table and about 300,000 rows in `file_version`, and it takes a very long time to run. Any ideas?
import MySQLdb  # needed for MySQLdb.connect; the 'as cursors' import below does not bind the name MySQLdb
import MySQLdb.cursors as cursors

db = MySQLdb.connect(host=*, user=*, passwd=*, db=*, cursorclass=cursors.SSCursor)
cursor = db.cursor()
print 'running large query'
cursor.execute('''
    SELECT `file`.`file_id`, `file`.`format` FROM `file`
    LEFT OUTER JOIN (SELECT `file_id`, `filename` FROM `file_version` WHERE `filename` = %s) AS `version`
        ON `file`.`file_id` = `version`.`file_id`
    WHERE `version`.`filename` IS NULL''', [thumbnail])
results = cursor.fetchall()
cursor.close()
for row in results:
    pass  # process each (file_id, format) row here
For reference, the tables look like this:
table file:
file_id (pk) | format | etc.
table file_version:
version_id (pk) | file_id | filename | etc.
Each file can have multiple versions, e.g. original.jpg, 300.jpg, 600.jpg, 800.jpg. The select inside a select is what is really slowing it down IMO. Thank you.
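For what it's worth, a minimal sketch of one common rewrite (untested against this schema): the derived table can be folded into a plain LEFT JOIN with the filename filter moved into the ON clause, so MySQL can anti-join directly, ideally against an index on file_version(filename, file_id). That index is an assumption on my part, not something from the original post.

# Equivalent anti-join without the derived table (sketch, untested):
cursor.execute('''
    SELECT `file`.`file_id`, `file`.`format`
    FROM `file`
    LEFT JOIN `file_version` AS `version`
        ON `version`.`file_id` = `file`.`file_id`
       AND `version`.`filename` = %s
    WHERE `version`.`file_id` IS NULL''', [thumbnail])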
import sqlite3

conn = sqlite3.connect('example.db')  # connecting to the database, or creating the file if it does not exist
cursor = conn.cursor()  # allows Python to use SQL commands
cursor.execute('''CREATE TABLE IF NOT EXISTS Fruits(       -- creating the table
    code TEXT PRIMARY KEY,                                  -- TEXT data type, PRIMARY KEY constraint
    name TEXT,
    price FLOAT);
''')
conn.commit()  # saves all changes made
fruit_list = [('1', 'blackberry', 300.00), ('2', 'raspberry', 250.00),
              ('3', 'bananas', 150.00), ('4', 'strawberry', 200.00)]  # list of tuples
cursor.executemany('INSERT INTO Fruits VALUES(?,?,?)', fruit_list)  # inserting data into the table
cursor.execute("""DELETE FROM Fruits WHERE name='blackberry'""")
cursor.execute("""DELETE FROM Fruits WHERE price='150.00'""")
cursor.execute('SELECT * FROM Fruits')
show_all = cursor.fetchall()
for e in show_all:
    print('code: {} | name: {} | price {}'.format(e[0], e[1], e[2]))
You can combine all your conditions in the WHERE clause with the logical operator OR in a single statement:
DELETE
FROM Fruits
WHERE name='blackberry' OR price='150.00'
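Applied to the Python script above, that could look like this (a sketch using a parameterized query rather than inlined literals; note that price is a FLOAT column, so 150.00 is passed as a number):

cursor.execute("DELETE FROM Fruits WHERE name=? OR price=?", ('blackberry', 150.00))
conn.commit()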
I use the mysql.connector library in Python to send queries to the database. But when the database is changed after initialization, mysql.connector behaves as if the database had never changed.
As an example, let's imagine I have a minimalist table students with just two columns, id and name.
+----+------+
| id | name |
+----+------+
| 0 | foo |
+----+------+
In the following code, the query will fetch the student with id 0. But while the script runs, something outside the Python script will alter the database.
import mysql.connector

maindb = mysql.connector.connect(
    host = "<host>",
    user = "<user>",
    password = "<password>",
    db = "<database name>"
)
cursor = maindb.cursor()
# Here, I will send outside the python script a MySQL query to modify the name of the student from “foo” to “bar” like this:
# `UPDATE `students` SET `name` = 'bar' WHERE `students`.`id` = 0;`
cursor.execute("SELECT `id`, `name` FROM `students` WHERE `id` = 0")
result = cursor.fetchall()
print(result)
Then I get the answer [(0, 'foo')]. As you can see, Python is not aware the database has changed since maindb.cursor() was called, so I get foo in the name field instead of bar as expected.
So how do I tell mysql.connector's tools to pick up the latest state of the database when I send a query?
You will need to use a socket to get pushed updates, or, if the changes occur frequently, have your code re-run the query every x minutes.
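A minimal sketch of the polling variant (the interval is arbitrary; note the commit(), which, as the answers below explain, is what actually refreshes the transaction's snapshot):

import time

while True:
    maindb.commit()  # end the current transaction so the next SELECT sees fresh data
    cursor.execute("SELECT `id`, `name` FROM `students` WHERE `id` = 0")
    print(cursor.fetchall())
    time.sleep(60)  # re-run the query every 60 seconds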
I just need to .connect() the maindb object and .close() it around each new query:
maindb.connect()
cursor.execute("SELECT `id`, `name` FROM `students` WHERE `id` = 0")
result = cursor.fetchall()
print(result)
maindb.close()
The database maintains data integrity by preventing in-progress transactions from seeing changes made by other transactions (see transaction isolation levels).
You can commit your connection to allow it to see new changes:
cursor = maindb.cursor()
# Here, I will send outside the python script a MySQL query to modify the name of the student from “foo” to “bar” like this:
# `UPDATE `students` SET `name` = 'bar' WHERE `students`.`id` = 0;`
# Doesn't show the update
cursor.execute("SELECT `id`, `name` FROM `students` WHERE `id` = 0")
result = cursor.fetchall()
print(result)
# Shows the update because we have committed.
maindb.commit()
cursor.execute("SELECT `id`, `name` FROM `students` WHERE `id` = 0")
result = cursor.fetchall()
print(result)
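Alternatively, a sketch assuming mysql.connector's autocommit connection option: with autocommit on, every statement runs in its own transaction, so each SELECT sees the latest committed data without explicit commit() calls:

import mysql.connector

maindb = mysql.connector.connect(
    host = "<host>",
    user = "<user>",
    password = "<password>",
    db = "<database name>",
    autocommit = True  # each statement gets a fresh transaction, and thus a fresh snapshot
)
cursor = maindb.cursor()
cursor.execute("SELECT `id`, `name` FROM `students` WHERE `id` = 0")
print(cursor.fetchall())  # reflects the latest committed state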
I'm trying to write combined table data into a new table for a timing system I'm working on.
The following SQL works in phpMyAdmin:
INSERT INTO results (firstName, lastName, raceNumber, raceTime)
SELECT s.firstName, s.lastName, s.raceNumber, h.time
FROM runners s
INNER JOIN chipData hp ON s.raceNumber = hp.bandID
INNER JOIN readings h ON hp.tagId = h.tagId
WHERE hp.tagId = 123456
LIMIT 1
However, if I add this into a Python statement as follows, it doesn't work:
db = connect()
cur = db.cursor()
cur.execute("""INSERT INTO results( firstName, lastName, raceNumber, raceTime ) SELECT s.firstName, s.lastName, s.raceNumber, h.time FROM runners s INNER JOIN chipData hp ON s.raceNumber = hp.bandID INNER JOIN readings h ON hp.tagId = h.tagId WHERE hp.tagId = %s LIMIT 1""", (123456)
db.commit()
db.close()
Any help is appreciated!
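Two Python-level problems stand out in the snippet as posted: the cur.execute(...) call is never closed (the final parenthesis is missing), and (123456) is just a parenthesized integer, not a one-element tuple, so there is no parameter sequence for the %s placeholder to bind to. A corrected sketch:

db = connect()
cur = db.cursor()
cur.execute("""INSERT INTO results( firstName, lastName, raceNumber, raceTime )
               SELECT s.firstName, s.lastName, s.raceNumber, h.time
               FROM runners s
               INNER JOIN chipData hp ON s.raceNumber = hp.bandID
               INNER JOIN readings h ON hp.tagId = h.tagId
               WHERE hp.tagId = %s LIMIT 1""",
            (123456,))  # the trailing comma makes this a one-element tuple
db.commit()
db.close()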
I am trying to do an INSERT query in SQL. It indicates that it succeeded, but no record shows up in the database. Here's my code:
conn = MySQLdb.connect("localhost",self.user,"",self.db)
cursor = conn.cursor()
id_val = 123456
path_val = "/homes/error.path"
host_val = "123.23.45.64"
time_val = 7
cursor.execute("INSERT INTO success (id,path,hostname,time_elapsed) VALUES (%s,%s,%s,%s)", (id_val, path_val,host_val,time_val))
print "query executed"
rows = cursor.fetchall()
print rows
This outputs the following:
query executed
()
It gives me no errors, but the database seems to be empty. I tried my SQL query in the mysql console and executed the following command:
INSERT INTO success (id,path,hostname,time_elapsed)
VALUES (1,'sometext','hosttext',4);
This works fine as I can see the database got populated.
mysql> SELECT * FROM success LIMIT 5;
+----+----------+----------+--------------+
| id | path | hostname | time_elapsed |
+----+----------+----------+--------------+
| 1 | sometext | hosttext | 4 |
+----+----------+----------+--------------+
So I am guessing the SQL query itself is right. I'm not sure why my cursor.execute is not taking effect, though. Could someone please point me in the right direction? I can't seem to figure out the bug. Thanks.
After sending your INSERT, you need to commit your changes to the database:
cursor.execute("INSERT INTO success (id,path,hostname,time_elapsed) VALUES (%s,%s,%s,%s)", (id_val, path_val,host_val,time_val))
conn.commit()
When you want to read the data back, you first have to send a query, just as you did in the mysql console. So before you fetch the data, execute the SELECT command:
cursor.execute("SELECT * FROM success")
rows = cursor.fetchall()
print rows
If you want to do it Pythonically, iterate over the cursor directly:
cursor.execute("SELECT * FROM success")
for row in cursor:
    print(row)
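If the goal is just to confirm that the INSERT took effect without running a second query, the standard DB-API cursor.rowcount attribute reports how many rows the last statement affected:

cursor.execute("INSERT INTO success (id,path,hostname,time_elapsed) VALUES (%s,%s,%s,%s)",
               (id_val, path_val, host_val, time_val))
print cursor.rowcount  # 1 if the row was inserted
conn.commit()          # still required to make the row visible to other connections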
We use an object that keeps a connection to a PostgreSQL database and creates new cursors to serve requests. I observed strange behavior: even after the response has been read and the cursor closed, the query is still hanging around in the database, preventing updates to the table, and so on.
When the connection is closed, it disappears.
I know about ORM frameworks and maybe will end up using one of them, but I just want to understand what's happening here. Why is the query still there?
Here's the Python code:
import psycopg2

def main():
    conn = psycopg2.connect("dbname=tmpdb password=1 host=localhost")
    cur = conn.cursor()
    cur.execute("SELECT 1;")
    items = cur.fetchall()
    cur.close()
    # uncommenting the following line solves the problem
    # conn.close()
    print items
    while True:
        pass

main()
Here's how to start the code:
>python test_loop.py
[(1,)]
Here's how to observe hanging request:
tmpdb=# SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM pg_stat_activity ;
datname | usename | pid | client_addr | waiting | query_start | query
---------+----------+-------+-------------+---------+-------------------------------+------------------------------------------------------------------------------------------
tmpdb | savenkov | 530 | ::1 | f | 2013-08-12 13:56:32.652996+00 | SELECT 1;
tmpdb | savenkov | 88351 | | f | 2013-08-12 13:56:35.331442+00 | SELECT datname,usename,pid,client_addr,waiting,query_start,query FROM pg_stat_activity ;
(2 rows)
Why do you think it is blocking?
Create the table:
create table t (i integer);
Now run it:
import psycopg2

def main():
    conn = psycopg2.connect("dbname=cpn")
    cur = conn.cursor()

    cur.execute("SELECT i from t;")
    items = cur.fetchall()
    print items

    raw_input('Enter to insert')
    cur.execute("insert into t (i) values (1) returning i;")
    items = cur.fetchall()
    conn.commit()
    cur.execute("SELECT i from t;")
    items = cur.fetchall()
    print items

    raw_input('Enter to update')
    cur.execute("update t set i = 2 returning i")
    items = cur.fetchall()
    conn.commit()
    cur.execute("SELECT i from t;")
    items = cur.fetchall()
    print items

    cur.close()
    while True:
        pass

main()
Notice that you need to call connection.commit() for the changes to be committed.
That said, don't do your own connection management. Instead, use a connection pooler like PgBouncer. It will save you lots of complexity and frustration.
If the application runs on the same machine as the database, don't even bother; just close the connection as soon as you are done with it. If both are on a fast intranet, a connection pooler is also not worth the added complexity unless there is a really huge number of queries.
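A minimal sketch of that "open, use, close" pattern, assuming psycopg2 2.5+ (where cursors are context managers): contextlib.closing handles the close, and the with block on the connection commits the transaction on success:

from contextlib import closing
import psycopg2

def fetch_items():
    # one connection per unit of work, closed as soon as we are done
    with closing(psycopg2.connect("dbname=cpn")) as conn:
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.execute("SELECT i FROM t;")
                return cur.fetchall()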