Python Pandas SQLAlchemy how to make connection to a local SQL Server - python

I am trying to connect to a SQL Server on my local network using SQLAlchemy, and I don't know how to set up the connection. Other examples I have seen do not use the more modern Python (3.6+) f-strings. I need the data to end up in a Pandas dataframe "df". I'm not 100% sure, but this local server does not seem to have a username and password requirement...

So this is what is working right now:
import pandas as pd
import pyodbc
import sqlalchemy as sql
server = 'NetworkServer' # this is the server name that IT said my data is on.
database = 'Database_name' # The name of the database and this database has multiple tables.
table_name = 't_lake_data' # name of the table that I want.
# I'm not sure but this local server does not have a username and password requirement.
engine = sql.create_engine(f'mssql+pyodbc://{server}/{database}?trusted_connection=yes&driver=SQL+Server')
# I don't know all the column names so I use * to represent all column names.
sql_str = f"SELECT * FROM dbo.{table_name}"
df = pd.read_sql_query(sql_str, engine, parse_dates="DATE_TIME")
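For what it's worth, SQLAlchemy 1.4+ can also build the same URL from parts with sqlalchemy.engine.URL.create, which avoids manual string formatting. A minimal sketch reusing the server, database and table_name variables above:
from sqlalchemy.engine import URL
# Build the mssql+pyodbc URL from parts; the query dict carries the same
# driver and trusted_connection options as the f-string version.
url = URL.create(
    "mssql+pyodbc",
    host=server,
    database=database,
    query={"driver": "SQL Server", "trusted_connection": "yes"},
)
engine = sql.create_engine(url)
df = pd.read_sql_query(f"SELECT * FROM dbo.{table_name}", engine, parse_dates="DATE_TIME")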
So if there are concerns with how this looks, leave a comment. Thank you.

Related

Download AWS RDS Data using Python After a certain Timestamp

I have an RDS database with a single SQL table, and new time-series data shows up in it every 3 hours.
I am trying to write a Python script that pulls all rows of data that came after a certain timestamp (for example t=04/03/2022 21:45:54).
I tried to look for resources online, but I am confused: which Boto3 functions do I need to use for this, and what should my example query look like?
Here is how I solved the main thing in this question. This code pulls all the rows from the RDS SQL database that come after a certain timestamp (oldTimestamp). On a first search I found that pyodbc does the job, but it took me some time to get it to work. One needs to be very careful with string formatting in the pyodbc.connect() function and with the string format of the SQL query. With these two things handled well, this should work for you very smoothly. Cheers!
import pyodbc
import pandas as pd
server = 'write your server endpoint in here'
username = 'yourusername'
password = 'yourpassword'
database = 'nameofdatabase'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
oldTimestamp = '2022-04-22 23:30:00'
sql = "SELECT * FROM dbo.eq_min WHERE dbo.eq_min.Timestamp > '" +"{}' ORDER BY dbo.eq_min.Timestamp ASC".format(oldTimestamp)
df = pd.read_sql(sql, cnxn)
print(df.head())
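A small variation on the same idea, in case the string formatting gets fiddly: pyodbc accepts ? placeholders, so the timestamp can be passed as a bound parameter instead of being formatted into the SQL string. A sketch reusing cnxn and oldTimestamp from above:
sql = "SELECT * FROM dbo.eq_min WHERE dbo.eq_min.Timestamp > ? ORDER BY dbo.eq_min.Timestamp ASC"
# The parameter value is supplied separately, so no quoting or formatting is needed.
df = pd.read_sql(sql, cnxn, params=[oldTimestamp])
print(df.head())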

Postgres tables reflected with SQLAlchemy not recognizing columns for Join

I'm trying to learn SQLAlchemy and I feel like I must be missing something fundamental. Hopefully, this is a straightforward error on my part and someone can show me the link to the documentation that explains what I'm doing wrong.
I have a Postgres database running in a Docker container on my local machine. I can connect to it and export queries to Python using psycopg2 with no issues.
I'm trying to recreate what I did with psycopg2 using SQLAlchemy, but I'm having trouble when I try to join two tables. My current code looks like this:
from sqlalchemy import *
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import select
conn = create_engine('postgresql://postgres:my_password@localhost/in_votes')
metadata = MetaData(conn)
metadata.reflect(bind = conn)
pop = metadata.tables['pop']
absentee = metadata.tables['absentee']
Session = sessionmaker(bind=conn)
session = Session()
session.query(pop).join(absentee, county == pop.county).all()
I'm trying to join the pop and absentee tables on the county field and I get the error:
NameError: name 'county' is not defined
I can view the columns in each table and loop through them to access the data.
Can someone clear this up for me and explain what I'm doing wrong?
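For reference, with reflected Table objects the columns are reached through each table's .c attribute rather than as bare names, so a join condition along these lines would typically avoid that NameError. This is only a sketch against the pop and absentee tables above, assuming both really do have a county column:
# county is not a standalone name; qualify it through each table's .c collection.
session.query(pop).join(absentee, pop.c.county == absentee.c.county).all()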

Update MSSQL table through SQLAlchemy using dataframes

I'm trying to replace some old MSSQL stored procedures with python, in an attempt to take some of the heavy calculations off of the sql server. The part of the procedure I'm having issues replacing is as follows
UPDATE mytable
SET calc_value = tmp.calc_value
FROM dbo.mytable mytable INNER JOIN
#my_temp_table tmp ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
WHERE (mytable.a = some_value)
and (mytable.x = tmp.x)
and (mytable.b = some_other_value)
Up to this point, I've made some queries with SQLAlchemy, stored that data in DataFrames, and done the requisite calculations on them. What I don't know how to do is put the data back into the server using SQLAlchemy, either with raw SQL or function calls. The dataframe I have on my end would essentially have to stand in for the temporary table created in MSSQL Server, but I'm not sure how I can do that.
The difficulty, of course, is that I don't know of a way to join a dataframe to an MSSQL table, and I'm guessing that wouldn't work anyway, so I'm looking for a workaround.
As the pandas doc suggests here:
from sqlalchemy import create_engine
engine = create_engine("mssql+pyodbc://user:password#DSN", echo = False)
dataframe.to_sql('tablename', engine , if_exists = 'replace')
The engine parameter for MSSQL is basically the connection string; check it here.
The if_exists parameter is a bit tricky, since 'replace' actually drops the table first, then recreates it and inserts all the data at once.
Setting the echo attribute to True shows all background logs and SQL statements.
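Building on that, one way to approximate the original stored procedure is to write the dataframe to a staging table with to_sql and then run the UPDATE ... JOIN against it through the engine. A rough sketch, where the staging table name and the filter values are placeholders standing in for #my_temp_table and the real parameters:
from sqlalchemy import create_engine, text

engine = create_engine("mssql+pyodbc://user:password@DSN", echo=False)

# Stand-in for the #my_temp_table used by the stored procedure.
dataframe.to_sql("my_staging_table", engine, if_exists="replace", index=False)

update_sql = text("""
    UPDATE mytable
    SET calc_value = tmp.calc_value
    FROM dbo.mytable AS mytable
    INNER JOIN dbo.my_staging_table AS tmp
        ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
    WHERE mytable.a = :some_value
      AND mytable.x = tmp.x
      AND mytable.b = :some_other_value
""")

with engine.begin() as conn:
    conn.execute(update_sql, {"some_value": 1, "some_other_value": 2})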

Fastest way to fetch table from MySQL into Pandas

I am trying to determine the fastest way to fetch data from MySQL into Pandas. So far, I have tried three different approaches:
Approach 1: Using pymysql and modifying field type (inspired by Fastest way to load numeric data into python/pandas/numpy array from MySQL)
import pymysql
from pymysql.converters import conversions
from pymysql.constants import FIELD_TYPE
conversions[FIELD_TYPE.DECIMAL] = float
conversions[FIELD_TYPE.NEWDECIMAL] = float
conn = pymysql.connect(host = host, port = port, user= user, passwd= passwd, db= db)
Approach 2: Using MySqldb
import MySQLdb
from MySQLdb.converters import conversions
from MySQLdb.constants import FIELD_TYPE
conversions[FIELD_TYPE.DECIMAL] = float
conversions[FIELD_TYPE.NEWDECIMAL] = float
conn = MySQLdb.connect(host = host, port = port, user= user, passwd= passwd, db= db)
Approach 3: Using sqlalchemy
import sqlalchemy as SQL
engine = SQL.create_engine('mysql+mysqldb://{0}:{1}@{2}:{3}/{4}'.format(user, passwd, host, port, db))
Approach 2 is the best of the three and takes an average of 4 seconds to fetch my table. However, fetching the table only takes 2 seconds in MySQL Workbench. How can I shave off these 2 extra seconds? Does anyone know of any alternative ways to accomplish this?
You can use the ConnectorX library, which is written in Rust and is about 10 times faster than pandas.
This library gets the data from the database and fills the dataframe.
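A minimal sketch of that approach, assuming the package is installed as connectorx and reusing the same MySQL credentials as the question (my_table is a placeholder for the real table name):
import connectorx as cx

# connectorx reads the query result straight into a pandas DataFrame.
df = cx.read_sql(f"mysql://{user}:{passwd}@{host}:{port}/{db}", "SELECT * FROM my_table")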
I think you may find answers using a specific library such as "peewee" or the function pd.read_sql_query from the pandas library. To use pd.read_sql_query:
MyEngine = create_engine('[YourDatabase]://[User]:[Pass]@[Host]/[DatabaseName]', echo = True)
df = pd.read_sql_query('select * from [TableName]', con= MyEngine)
Also, for uploading data from a dataframe to SQL:
df.to_sql([TableName], MyEngine, if_exists = 'append', index=False)
You must pass if_exists = 'append' if the table already exists, or it will default to 'fail'. You could also pass 'replace' if you want to replace the existing table with a new one.
For data integrity's sake, it's nice to use dataframes for uploads and downloads because of how well they handle data. Depending on the size of your upload, it should be pretty efficient on upload time too.
If you want to go an extra step, peewee queries may help make upload time faster, although I have not personally tested speed. Peewee is an ORM library like SQLAlchemy that I found to be very easy and expressive to develop with.
You could also use dataframes with peewee. Just skim over the documentation: you would construct and assign a query, then convert it to a dataframe like this:
MyQuery = [TableName].select().where([TableName.column] == "value")
df = pd.DataFrame(list(MyQuery.dicts()))
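To make that concrete, here is a hedged sketch of what the peewee side might look like; the model, columns, and credentials are invented placeholders, not anything from the question:
from peewee import MySQLDatabase, Model, CharField, FloatField
import pandas as pd

db = MySQLDatabase("nameofdatabase", host="localhost", port=3306, user="user", password="passwd")

class Reading(Model):
    name = CharField()
    value = FloatField()

    class Meta:
        database = db
        table_name = "readings"

# Construct the query, then materialize it as a DataFrame via .dicts().
MyQuery = Reading.select().where(Reading.name == "value")
df = pd.DataFrame(list(MyQuery.dicts()))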
Hope this helps.

SQLAlchemy/pandas to_sql for SQLServer -- CREATE TABLE in master db

Using MSSQL Server (version 2012), I am working with SQLAlchemy and pandas (on Python 2.7) to insert rows into a SQL Server table.
After trying pymssql and pyodbc with a specific server string, I am trying an odbc name:
import sqlalchemy, pyodbc, pandas as pd
engine = sqlalchemy.create_engine("mssql+pyodbc://mssqlodbc")
sqlstring = "EXEC getfoo"
dbdataframe = pd.read_sql(sqlstring, engine)
This part works great and worked with the other methods (pymssql, etc). However, the pandas to_sql method doesn't work.
finaloutput.to_sql("MyDB.dbo.Loader_foo",engine,if_exists="append",chunksize="10000")
With this statement, I get a consistent error that pandas is trying to do a CREATE TABLE in the SQL Server master db, which it is not permissioned for.
How do I get pandas/SQLAlchemy/pyodbc to point to the correct MSSQL database? The to_sql method seems to ignore whatever I put in the engine connect string (although the read_sql method seems to pick it up just fine).
To have this question marked as answered: the problem is that you specify the schema in the table name itself. If you provide "MyDB.dbo.Loader_foo" as the table name, pandas will interpret this full string as the table name, instead of just "Loader_foo".
The solution is to only provide "Loader_foo" as the table name. If you need to specify a specific schema to write this table into, you can use the schema kwarg (see docs):
finaloutput.to_sql("Loader_foo", engine, if_exists="append")
finaloutput.to_sql("Loader_foo", engine, if_exists="append", schema="something_else_as_dbo")
