I would like to connect to MS SQL Server and execute a SQL command with python. I am familiar with using SQLAlchemy to create SQL tables, pandas DataFrames, etc., but how can I execute the SQL in a .sql file with python/pandas/SQLAlchemy? Is there a better way to do it?
For example I have the file 'update.sql' that contains the SQL text:
truncate table dev.dbo.jobs
truncate table dev.dbo.workers
go
insert into dev.dbo.jobs select * from test.dbo.jobs
insert into dev.dbo.workers select * from test.dbo.workers
You can use SQLAlchemy's connection.execute to run raw SQL queries. If you have the SQL statements stored in a file, it might look something like this:
from sqlalchemy import create_engine
from sqlalchemy.sql import text

engine = create_engine('urltodb')
conn = engine.connect()

with open('file.sql', 'r') as f:
    for line in f:
        line = line.strip()
        # skip blank lines and the client-side batch separator "go",
        # which the server would reject as invalid SQL
        if not line or line.lower() == 'go':
            continue
        conn.execute(text(line))

conn.close()
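One caveat with the line-by-line approach: the update.sql in the question contains the batch separator go, and real statements often span multiple lines. Here is a hedged sketch of an alternative that reads the whole file, splits it on go lines, and executes each batch inside a transaction; the connection URL is a placeholder, not something from the question:
import re
from sqlalchemy import create_engine, text

engine = create_engine('mssql+pyodbc://user:password@mydsn')  # placeholder URL

with open('update.sql', 'r') as f:
    script = f.read()

# "go" is a client-side batch separator (SSMS/sqlcmd), not T-SQL,
# so strip it out and send each batch to the server separately
batches = [b.strip()
           for b in re.split(r'^\s*go\s*$', script, flags=re.IGNORECASE | re.MULTILINE)
           if b.strip()]

with engine.begin() as conn:  # commits on success, rolls back on error
    for batch in batches:
        conn.execute(text(batch))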
Related
I am using sqlite3 with Python, and after connecting to the database and creating a table, sqlite3 shows an error when I try to execute a SELECT statement on the table with the name of the database in it:
import sqlite3

con = sqlite3.connect("my_database")
cur = con.cursor()
cur.execute('''CREATE TABLE my_table ... ''')
cur.execute("SELECT * FROM my_database.my_table")  # this works fine without the database name before the table name
but I get this error from sqlite3:
no such table: my_database.my_table
Is there a way to do a SELECT statement with the name of the database in it?
The short answer is no, you can't do this with SQLite. You already specify the database when you call sqlite3.connect(), and a SQLite file holds a single database, so there is no separate database name to put in front of the table name.
Make sure the database file is in the same directory as the Python script. To verify this, you can use the os library and its os.listdir() method. After connecting to the database and creating the cursor, query using just the table name:
cur.execute('SELECT * FROM my_table')
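A minimal sketch of that check, assuming the database file is literally named my_database as in the question:
import os
import sqlite3

print(os.listdir('.'))  # confirm the database file is in the working directory

con = sqlite3.connect("my_database")
cur = con.cursor()
cur.execute("SELECT * FROM my_table")  # no database prefix needed
print(cur.fetchall())
con.close()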
I like to keep one file with just my f-strings that represent SQL queries, and each query needs a db name. In my Python program, I populate the db variable as I read out the db names.
E.g.
In queries.py, I have:
query_1 = f"select * from {db} limit 10;"
In main.py, I use:
from queries import query_1

# read out db information into dbs
for db in dbs:
    # execute query_1 for each db
How can I achieve the logic in the loop?
Don't use an f-string, but .format():
# queries
query_1 = "select * from {db} limit 10;"

from queries import query_1
...
for db in dbs:
    query = query_1.format(db=db)
    ...
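A hedged sketch of what the loop body could look like, with placeholder database names; query_1 is the plain (non-f-string) template from queries.py above:
from queries import query_1

dbs = ['db1', 'db2', 'db3']           # placeholder database names

for db in dbs:
    query = query_1.format(db=db)     # "select * from db1 limit 10;" etc.
    # pass the finished string to whatever driver you use, e.g. cursor.execute(query)
    print(query)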
Assume I have 30 databases in MySQL, from db1 to db30. I have a Python script that creates an engine and connects to one db:
import pandas as pd
import MySQLdb
from sqlalchemy import create_engine

df = pd.read_csv('pricelist.csv')
new_df = df[['date','time','new_price']]

engine = create_engine('mysql+mysqldb://root:python@localhost:3306/db1', echo=False)
new_df.to_sql(name='temporary_table', con=engine, if_exists='append', index=False)

with engine.begin() as cnx:
    sql_insert_query_new = 'REPLACE INTO newlist (SELECT * FROM temporary_table)'
    cnx.execute(sql_insert_query_new)
    cnx.execute("DROP TABLE temporary_table")
Now, with the above script, I would need 30 Python scripts to create an engine and connect to each db to run the query, and to call those 30 scripts I would need a batch file on a task scheduler.
Is there a more optimized way of connecting to multiple databases with a single script? I read up on sessions and don't think they can take in multiple databases. And if I have 30 Python scripts doing this engine creation and connection, will there be any issue in terms of processing performance? Eventually, I will have hundreds of dbs in MySQL.
Thanks!
Note: Each database has its own unique table names.
Using Python 3.7
I think maybe you can do something like this:
import pandas as pd
import MySQLdb
from sqlalchemy import create_engine

df = pd.read_csv('pricelist.csv')
new_df = df[['date','time','new_price']]

db_names = [f'db{i}' for i in range(1, 31)]
table_names = ['temporary_table', 'table_name_2', 'table_name_3', ...]

for db, tb in zip(db_names, table_names):
    engine = create_engine(f'mysql+mysqldb://root:python@localhost:3306/{db}', echo=False)
    new_df.to_sql(name=tb, con=engine, if_exists='append', index=False)
    with engine.begin() as cnx:
        sql_insert_query_new = f'REPLACE INTO newlist (SELECT * FROM {tb})'
        cnx.execute(sql_insert_query_new)
        cnx.execute(f"DROP TABLE {tb}")
I am new to the Python-SQL connectivity world. My goal is to retrieve data from SQL in a pandas DataFrame format by executing long SQL queries thru my python script.
Most of my SQL queries are long with multiple interim-temp tables before the final SELECT statement from the last temp table. When I run such a monolithic query in Python I get an error saying -
"pandas.io.sql.DatabaseError: Execution failed on sql"
Though they run absolutely fine in MS SQL Management Studio.
I suspect this is due to the interim temp tables, because if I split my long query into two pieces (everything before the final SELECT in the 1st section, and the final SELECT in the 2nd section) and run the two sections sequentially, they run fine.
Can someone guide me on why this is so, or alternatively, what is the best way to run long queries with temp tables/views and retrieve the results in a pandas DataFrame?
Here is my sample Python code that ideally should take a file name as input and run the SQL to retrieve the results in a DataFrame; however, it fails for a query with temp tables:
import pyodbc as db
import pandas as pd

filename = 'file.sql'
username = 'XXXX'
password = 'YYYYY'
driver = '{ODBC Driver 13 for SQL Server}'
database = 'DB'
server = 'local'

conn = db.connect('DRIVER=' + driver + ';PORT=1433;SERVER=' + server +
                  ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)

fd = open(filename, 'r')
sqlfile = fd.read()
fd.close()

sqlcommand1 = sqlfile
df_table = pd.read_sql(sqlcommand1, conn)
If I break my SQL query into two pieces (one with all the temp tables and a 2nd with the final SELECT), then it runs fine. Below is a modified script that splits the long query wherever it finds '/**/', and it works fine:
"""
This Function Reads a SQL Script From an Extrenal File and Executes The
Script in SQL. If The SQL Script Has Bunch of Tem Tables/Views
Followed By a Select Statement to Retrieve Data From Those Views Then Input
SQL File Should Have '/**/' Immediately Before the Final
Select Statement. This is to Esnure Final Select Statement is Executed on
the Temporary Views Already Run by Python.
Input is a SQL File Name and Output is a DataFrame
"""
import pyodbc as db
import pandas as pd

filename = 'filename.sql'
username = 'XXXX'
password = 'YYYYY'
driver = '{ODBC Driver 13 for SQL Server}'
database = 'DB'
server = 'local'

conn = db.connect('DRIVER=' + driver + ';PORT=1433;SERVER=' + server +
                  ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)

fd = open(filename, 'r')
sqlfile = fd.read()
fd.close()

sql = sqlfile.split('/**/')
sqlcommand1 = sql[0]  # 1st section of the query, with the temp tables
sqlcommand2 = sql[1]  # 2nd section of the query, with the final SELECT statement

conn.execute(sqlcommand1)
df_table = pd.read_sql(sqlcommand2, conn)
Quick and dirty answer: if using T-SQL, put the line SET NOCOUNT ON at the beginning of your query.
Like @Parfait mentioned above, the pandas read_sql method can only support one result set. However, when you generate a temp table in T-SQL, you do create a result set of the form "(XX row(s) affected)", which is what causes your original query to fail. By setting NOCOUNT ON you eliminate those early returns and only get the results from your final SELECT statement.
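A minimal sketch of that fix, reusing the conn and sqlfile variables from the script above; the prepended line suppresses the "(N row(s) affected)" messages so only the final SELECT returns a result set:
import pandas as pd

# prepend SET NOCOUNT ON so the temp-table steps don't emit extra result sets
sql_with_nocount = "SET NOCOUNT ON;\n" + sqlfile
df_table = pd.read_sql(sql_with_nocount, conn)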
Alternatively, if you use a pyodbc cursor instead of pandas, you can use nextset() to skip the result sets from the temp table(s); see the pyodbc documentation for more details.
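A hedged sketch of that cursor-based alternative, again assuming the conn and sqlfile variables from the script above; it advances past the rowcount-only result sets until it reaches one with columns (the final SELECT):
import pandas as pd

cursor = conn.cursor()
cursor.execute(sqlfile)

# statements before the final SELECT return result sets with no column
# description; skip them with nextset() until we reach the real one
while cursor.description is None and cursor.nextset():
    pass

columns = [col[0] for col in cursor.description]
rows = [list(r) for r in cursor.fetchall()]
df_table = pd.DataFrame.from_records(rows, columns=columns)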
Is there a library or open source utility available to search all the tables and columns of an SQLite database? The only input would be the name of the SQLite DB file.
I am trying to write a forensics tool and want to search sqlite files for a specific string.
Just dump the db and search it.
% sqlite3 file_name .dump | grep 'my_search_string'
You could instead pipe through less, and then use / to search:
% sqlite3 file_name .dump | less
You could use "SELECT name FROM sqlite_master WHERE type='table'"
to find out the names of the tables in the database. From there it is easy to SELECT all rows of each table.
For example:
import sqlite3
import os

filename = ...
with sqlite3.connect(filename) as conn:
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    for tablerow in cursor.fetchall():
        table = tablerow[0]
        cursor.execute("SELECT * FROM {t}".format(t=table))
        for row in cursor:
            for field in row.keys():
                print(table, field, row[field])
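Since the question is about finding a specific string, here is a hedged variant of the snippet above that only prints matching cells; the file name and search string are placeholders:
import sqlite3

search_string = 'my_search_string'               # placeholder
with sqlite3.connect('my_database.db') as conn:  # placeholder file name
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    for tablerow in cursor.fetchall():
        table = tablerow[0]
        cursor.execute('SELECT * FROM "{t}"'.format(t=table))
        for row in cursor:
            for field in row.keys():
                if search_string in str(row[field]):
                    print(table, field, row[field])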
I know this is late to the party, but I had a similar issue, and since it was inside a Docker image I had no access to Python, so I solved it like so:
for X in $(sqlite3 database.db .tables) ; do sqlite3 database.db "SELECT * FROM $X;" | grep >/dev/null 'STRING I WANT' && echo $X; done
This will iterate through all tables in the database file and perform a SELECT * on each, which I then grep for the string. If it finds the string, it prints the table name, and from there I can simply use sqlite3 to find out how it was used.
Figured it might be helpful to others who cannot use Python.
@MrWorf's answer didn't work for my SQLite file (an .exb file from Evernote), but this similar method worked:
Open the file with DB Browser for SQLite: sqlitebrowser mynotes.exb
File / Export to SQL file (will create mynotes.exb.sql)
grep 'STRING I WANT' mynotes.exb.sql