This code works, but it is very slow. I would also like to use the sqlalchemy module, because the rest of the script uses that instead of mysql.connector. Is there any advantage to using sqlalchemy, or should I continue with this ...
for emp_id in mylist:
    try:
        connection = mysql.connector.connect(host='x.x.x.x', port='3306', database='xxx', user='root', password='xxx')
        cursor = connection.cursor(prepared=True)
        sql_fetch_blob_query = """SELECT col1, col2, Photo from tbl where ProfileID = %s"""
        cursor.execute(sql_fetch_blob_query, (emp_id,))
        record = cursor.fetchall()
        for row in record:
            image = row[2]
            file_name = 'myimages4' + '/' + str(row[0]) + '_' + str(row[1]) + '/' + 'simage' + str(emp_id) + '.jpg'
            write_file(image, file_name)
    except mysql.connector.Error as error:
        connection.rollback()
        print("Failed to read BLOB data from MySQL table {}".format(error))
    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()
Do you really need to set up a new MySQL connection and obtain a cursor on each iteration? If not, opening them once at the beginning will really speed up your code:
connection = mysql.connector.connect(host='x.x.x.x', port='3306', database='xxx', user='root', password='xxx', charset="utf8")
cursor = connection.cursor(prepared=True)
sql_fetch_blob_query = """SELECT col1, col2, Photo from tbl where ProfileID = %s"""

for emp_id in mylist:
    try:
        cursor.execute(sql_fetch_blob_query, (emp_id,))
        record = cursor.fetchall()
        for row in record:
            image = row[2]
            file_name = 'myimages4' + '/' + str(row[0]) + '_' + str(row[1]) + '/' + 'simage' + str(emp_id) + '.jpg'
            write_file(image, file_name)
    except mysql.connector.Error as error:
        connection.rollback()
        print("Failed to read BLOB data from MySQL table {}".format(error))

# Close the cursor and connection once, after the loop; closing them
# inside the loop would defeat the purpose of reusing the connection.
if connection.is_connected():
    cursor.close()
    connection.close()
UPD:
Actually, you don't even need to make N queries to the database, because all the data can be obtained in one query with a WHERE ProfileID IN (.., ..) SQL clause. Take a look at this small snippet, which solves a pretty much identical task:
transaction_ids = [c['transaction_id'] for c in checkouts]
# Build one %s placeholder per id, e.g. "%s,%s,%s"
format_strings = ','.join(['%s'] * len(transaction_ids))
dm_cursor.execute("SELECT ac_transaction_id, status FROM transactions_mapping WHERE ac_transaction_id IN (%s)" % format_strings, tuple(transaction_ids))
payments = dm_cursor.fetchall()
You can adapt it to solve your problem.
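For instance, a minimal sketch applied to the original problem might look like this (table and column names are taken from the question; ProfileID is added to the select list so each photo can be matched back to its emp_id):
cursor = connection.cursor(prepared=True)
format_strings = ','.join(['%s'] * len(mylist))
cursor.execute(
    "SELECT ProfileID, col1, col2, Photo FROM tbl WHERE ProfileID IN (%s)" % format_strings,
    tuple(mylist),
)
for emp_id, col1, col2, image in cursor.fetchall():
    # Same file-name scheme as in the question.
    file_name = 'myimages4/' + str(col1) + '_' + str(col2) + '/simage' + str(emp_id) + '.jpg'
    write_file(image, file_name)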
I'm trying to create a small Python app to extract data from a specific table of a database.
The extracted rows have to be between two CREATION_DATETIME values specified by the user.
Here's the code:
startdate = input("Prosze podac poczatek przedzialu czasowego (format RRRR-MM-DD GG:MM:SS): ")
enddate = input("Prosze podac koniec przedzialu czasowego (format RRRR-MM-DD GG:MM:SS): ")
query = "SELECT * FROM BRDB.RFX_IKW_MODIFY_EXEC_ORDER_CANCEL_LOG WHERE CREATION_DATETIME between '%s' and '%s' ORDER BY CREATION_DATETIME DESC;"
tuple1 = (startdate, enddate)
cursor.execute(*query, (tuple1,))
records = cursor.fetchall()
print("Total number of rows in table: ", cursor.rowcount)
print(records)
I'm not much of a developer, and I'm stuck on the error "TypeError: CMySQLCursorPrepared.execute() takes from 2 to 4 positional arguments but 104 were given" (the count varies depending on how I try to modify the code).
Could you help me out with specifying that query correctly?
Thank you in advance.
I've tried various tutorials about parameterized queries, but with no luck.
You're starring the query, making it an iterable of the characters making up the string, which probably isn't what you meant (i.e., you should remove the * operator). In addition, tuple1 is already a tuple, so you shouldn't enclose it inside another tuple:
cursor.execute(query, tuple1)
#              ^-- remove the *, and pass tuple1 directly
Also note that the %s placeholders inside the query must not be wrapped in quotes; the driver takes care of quoting for you.
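Putting both fixes together, a minimal sketch of the corrected query and call (same table and inputs as in the question) would be:
query = ("SELECT * FROM BRDB.RFX_IKW_MODIFY_EXEC_ORDER_CANCEL_LOG "
         "WHERE CREATION_DATETIME BETWEEN %s AND %s "
         "ORDER BY CREATION_DATETIME DESC")
cursor.execute(query, (startdate, enddate))
records = cursor.fetchall()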
Here is the full code:
import mysql.connector
from mysql.connector import Error

try:
    print("Connecting to the database....")
    connection = mysql.connector.connect(host='',
                                         port='',
                                         database='',
                                         user='',
                                         password='')
    if connection.is_connected():
        db_Info = connection.get_server_info()
        print("MySQL server version:", db_Info)
        cursor = connection.cursor(prepared=True)
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("Successfully connected to database: ", record)
except Error as e:
    print("Connection error!", e)
    quit()

try:
    startdate = input("Please enter the start of the time range (format YYYY-MM-DD HH:MM:SS): ")
    enddate = input("Please enter the end of the time range (format YYYY-MM-DD HH:MM:SS): ")
    # Placeholders are not quoted; the driver handles quoting.
    query = ("SELECT MESSAGE_ID, EXECUTABLE_ORDER_ID, CREATION_DATETIME, MESSAGE_TYPE, MESSAGE_STATUS "
             "FROM BRDB.RFX_IKW_MODIFY_EXEC_ORDER_CANCEL_LOG "
             "WHERE CREATION_DATETIME BETWEEN %s AND %s "
             "ORDER BY CREATION_DATETIME DESC;")
    tuple1 = (startdate, enddate)
    cursor.execute(query, tuple1)
    records = cursor.fetchall()
    print("Fetching each row")
    # A prepared cursor returns plain tuples, so unpack by position.
    for row in records:
        message_id, executable_order_id, creation_datetime, message_type, message_status = row
        print(message_id, executable_order_id, creation_datetime, message_status)
except mysql.connector.Error as e:
    print("Error reading data from MySQL table", e)
finally:
    if connection.is_connected():
        cursor.close()
        connection.close()
        print("MySQL connection is closed")
My Python code looks like below, where I am unloading data from Redshift to an Amazon S3 bucket. I am trying to get the row count from Redshift and from the S3 bucket to ensure that all the data was loaded. Additionally, I would also like to get the last upload date from the S3 bucket so that I know when the last unload was performed. Kindly suggest the code with an explanation.
Thanks in advance for your time and efforts!
import csv
import redshift_connector
import sys

CSV_FILE = "Tables.csv"
CSV_DELIMITER = ';'
S3_DEST_PATH = "s3://..../"
DB_HOST = "MY HOST"
DB_PORT = 1234
DB_DB = "MYDB"
DB_USER = "MY_READ"
DB_PASSWORD = "MY_PSWD"
# Multiple roles can be chained in a single comma-separated string.
IAM_ROLE = "arn:aws:iam::/redshift-role/unload data,arn:aws::iam::/write in bucket"

def get_tables(path):
    tables = []
    with open(path, 'r') as file:
        csv_reader = csv.reader(file, delimiter=CSV_DELIMITER)
        header = next(csv_reader)
        if header is not None:
            for row in csv_reader:
                tables.append(row)
    return tables

def unload(conn, tables, s3_path):
    cur = conn.cursor()
    for table in tables:
        print(f">{table[0]}.{table[1]}")
        try:
            query = f'''unload('select * from {table[0]}.{table[1]}') to '{s3_path}/{table[1]}/'
            iam_role '{IAM_ROLE}'
            CSV
            PARALLEL FALSE
            CLEANPATH;'''
            print("loading in progress")
            cur.execute(query)
            print("Done.")
        except Exception as e:
            print("Failed to load")
            print(str(e))
            sys.exit(1)
    cur.close()

def main():
    try:
        conn = redshift_connector.connect(
            host=DB_HOST,
            port=DB_PORT,
            database=DB_DB,
            user=DB_USER,
            password=DB_PASSWORD
        )
        tables = get_tables(CSV_FILE)
        unload(conn, tables, S3_DEST_PATH)
        conn.close()
    except Exception as e:
        print(e)
        sys.exit(1)

if __name__ == "__main__":
    main()
Updated code based on an SO user's comment:
tables = ['schema1.tablename', 'schema2.table2']
conn = redshift_connector.connect(
    host='my_host',
    port='my_port',
    database='my_db',
    user='user',
    password='password')
cur = conn.cursor()
cur.execute(f'''select count(*) from {','.join("'" + y + "'" for y in tables)}''')
results = cur.fetchall()
print("The table {} contained".format(tables[0]), *results[0], "rows" + "\n")  # Printing row counts along with table names
cur.close()
conn.close()
2nd Update:
tables = ['schema1.tablename', 'schema2.table2']
conn = redshift_connector.connect(
    host='my_host',
    port='my_port',
    database='my_db',
    user='user',
    password='password')
cur = conn.cursor()
for table in tables:
    cur.execute(f'select count(*) from {table};')
    result = cur.fetchone()
    for row in result:
        print("The table {} contained".format(tables[0]), result[0], "rows" + "\n")  # Printing row counts along with table names
The simple query to get the number of rows is
query = f"select count(*) from {table_name}"  # table_name is your schema-qualified table
For Redshift, all you need to do is
cur.execute(query)
row_count = cur.fetchone()[0]
Using boto3, you can use a similar SQL query to fetch the S3 row count as well, as elucidated in this answer.
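For the last upload date, a minimal boto3 sketch could list the unloaded objects and take the newest LastModified timestamp (bucket and prefix names below are placeholders, not from your code):
import boto3

s3 = boto3.client("s3")
# Everything the unload wrote lives under this prefix (placeholder names).
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="my/unload/prefix/")
objects = resp.get("Contents", [])
if objects:
    latest = max(objects, key=lambda o: o["LastModified"])
    print("Last unload:", latest["LastModified"], "->", latest["Key"])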
Edit:
Corrected your updated approach a little:
cur = conn.cursor()
for table in tables:
    cur.execute(f'select count(*) from {table};')
    result = cur.fetchone()
    count = result[0] if result else 0
    print(f"The table {table} contained {count} rows.\n")
I'm taking data from a text file which contains some duplicate data, and I'm trying to insert it into a database without duplicating it. My trouble is that duplicate data is being inserted when it should not be inserted again. The data are not static values.
import mysql.connector
from mysql.connector import Error

text_file = open(min_file, "r")
#doc = text_file.readlines()
for line in text_file:
    field = line.split(";")
    print(field)
    try:
        connection = mysql.connector.connect(host='localhost',
                                             database='testing',
                                             user='root',
                                             password='root')
        if connection.is_connected():
            db_Info = connection.get_server_info()
            print("Connected to MySQL Server version ", db_Info)
            cursor = connection.cursor()
            cursor.execute("select database();")
            record = cursor.fetchone()
            print("You're connected to database: ", record)
            mycursor = connection.cursor()
            # before inserting
            mycursor.execute("Select * from ftp")
            myresult = mycursor.fetchall()
            for i in myresult:
                print(i)
            sql = "Insert into ftp(a,b,c,d) \
                   select * from( Select VALUES(%s,%s,%s,%s) as temp \
                   where not exists \
                   (Select a from ftp where a = %s) LIMIT 1"
            mycursor.execute(sql, field)
            print(mycursor.rowcount, "record inserted.")
            connection.commit()
    except Error as e:
        print("Error while connecting to MySQL", e)
    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()
            print("MySQL connection is closed")
One option is to add a UNIQUE constraint and let the DB validate uniqueness; inserting a duplicate will then throw an exception which you can catch and skip.
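A minimal sketch of that approach, reusing the connection details and file from your code (the constraint name and the choice of column a as the unique key are assumptions):
import mysql.connector

# One-time schema change, run once beforehand, e.g. in a MySQL client:
#   ALTER TABLE ftp ADD CONSTRAINT uq_ftp_a UNIQUE (a);
connection = mysql.connector.connect(host='localhost', database='testing',
                                     user='root', password='root')
cursor = connection.cursor()
sql = "INSERT INTO ftp (a, b, c, d) VALUES (%s, %s, %s, %s)"
with open(min_file, "r") as text_file:  # min_file as in the question
    for line in text_file:
        field = line.strip().split(";")
        try:
            cursor.execute(sql, field)
        except mysql.connector.IntegrityError:
            # Duplicate key: skip this row and keep going.
            continue
connection.commit()
cursor.close()
connection.close()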
I have a problem with my function:
def dataf(p, k):
    try:
        connection = mysql.connector.connect(host='host',
                                             database='products',
                                             user='user',
                                             password='pwd')
        sql_select_Query = "select a from table where b LIKE %s AND c LIKE %s"
        cursor = connection.cursor()
        cursor.execute(sql_select_Query, ('%' + p + '%',), ('%' + k + '%',))
        records = cursor.fetchall()
        return records
    except Error as e:
        print("Error reading data from MySQL table", e)
    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()
When I execute this function with only the first placeholder, everything works fine. With the second placeholder I get a TypeError: NoneType.
With the second placeholder I want to check whether a value in column c is like '0,5 kg', for example. When I write the query without the second placeholder and insert the value directly, everything works fine:
sql_select_Query = "select a from table where b LIKE %s AND c LIKE '0,5 kg'"
What am I doing wrong?
OK, I got it: both values have to be passed in a single parameter tuple, since execute() takes only one parameter sequence:
sql_select_Query = "select a from table where b LIKE %s AND c LIKE %s"
cursor = connection.cursor()
cursor.execute(sql_select_Query, ('%' + p+ '%','%' + k + '%',))
I got the result.
Can someone please explain how I can get the tables in the current database?
I am using PostgreSQL 8.4 with psycopg2.
This did the trick for me:
cursor.execute("""SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public'""")
for table in cursor.fetchall():
    print(table)
pg_class stores all the required information.
Executing the query below returns the user-defined tables as tuples in a list:
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
cursor.execute("select relname from pg_class where relkind='r' and relname !~ '^(pg_|sql_)';")
print(cursor.fetchall())
output:
[('table1',), ('table2',), ('table3',)]
The question is about using Python's psycopg2 to do things with Postgres. Here are two handy functions:
def table_exists(con, table_str):
    exists = False
    try:
        cur = con.cursor()
        # Parameterized to avoid SQL injection.
        cur.execute("select exists(select relname from pg_class where relname=%s)", (table_str,))
        exists = cur.fetchone()[0]
        print(exists)
        cur.close()
    except psycopg2.Error as e:
        print(e)
    return exists

def get_table_col_names(con, table_str):
    col_names = []
    try:
        cur = con.cursor()
        # Note: table_str is interpolated into the SQL here, so it must be trusted.
        cur.execute("select * from " + table_str + " LIMIT 0")
        for desc in cur.description:
            col_names.append(desc[0])
        cur.close()
    except psycopg2.Error as e:
        print(e)
    return col_names
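A hypothetical usage sketch (the connection parameters and table name are placeholders, not from the original post):
import psycopg2

con = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
if table_exists(con, "table1"):
    print(get_table_col_names(con, "table1"))
con.close()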
Here's a Python 3 snippet that includes the connect() parameters as well as generating a Python list for the output:
conn = psycopg2.connect(host='localhost', dbname='mySchema',
user='myUserName', password='myPassword')
cursor = conn.cursor()
cursor.execute("""SELECT relname FROM pg_class WHERE relkind='r'
AND relname !~ '^(pg_|sql_)';""") # "rel" is short for relation.
tables = [i[0] for i in cursor.fetchall()] # A list() of tables.
Although it has been answered by Kalu, the query mentioned returns tables + views from a postgres database. If you need only tables and not views, you can include table_type in your query, like this:
s = "SELECT"
s += " table_schema"
s += ", table_name"
s += " FROM information_schema.tables"
s += " WHERE"
s += " ("
s += " table_schema = '"+SCHEMA+"'"
s += " AND table_type = 'BASE TABLE'"
s += " )"
s += " ORDER BY table_schema, table_name;"
db_cursor.execute(s)
list_tables = db_cursor.fetchall()
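For example, to print the result (a list of (schema, table) tuples):
for table_schema, table_name in list_tables:
    print(table_schema, table_name)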
You can use this code for Python 3:
import psycopg2

conn = psycopg2.connect(database="your_database", user="postgres", password="",
                        host="127.0.0.1", port="5432")
cur = conn.cursor()
cur.execute("select * from your_table")
rows = cur.fetchall()
conn.close()