Using MSSQL (version 2012), I am using SQLAlchemy and pandas (on Python 2.7) to insert rows into a SQL Server table.
After trying pymssql and pyodbc with a specific server string, I am trying an odbc name:
import sqlalchemy, pyodbc, pandas as pd
engine = sqlalchemy.create_engine("mssql+pyodbc://mssqlodbc")
sqlstring = "EXEC getfoo"
dbdataframe = pd.read_sql(sqlstring, engine)
This part works great and worked with the other methods (pymssql, etc). However, the pandas to_sql method doesn't work.
finaloutput.to_sql("MyDB.dbo.Loader_foo",engine,if_exists="append",chunksize="10000")
With this statement, I get a consistent error that pandas is trying to do a CREATE TABLE in the sql server Master db, which it is not permisioned for.
How do I get pandas/SQLAlchemy/pyodbc to point to the correct mssql database? The to_sql method seems to ignore whatever I put in engine connect string (although the read_sql method seems to pick it up just fine.
To have this question as answered: the problem is that you specify the schema in the table name itself. If you provide "MyDB.dbo.Loader_foo" as the table name, pandas will interprete this full string as the table name, instead of just "Loader_foo".
Solution is to only provide "Loader_foo" as table name. If you need to specify a specific schema to write this table into, you can use the schema kwarg (see docs):
finaloutput.to_sql("Loader_foo", engine, if_exists="append")
finaloutput.to_sql("Loader_foo", engine, if_exists="append", schema="something_else_as_dbo")
Related
According to SQLAlchemy documentation you are supposed to use Session object when executing SQL statements. But using a Session with Pandas .read_sql, gives an error: AttributeError 'Session' object has no attribute 'cursor'.
However using the Connection object works even with the ORM Mapped Class:
with ENGINE.connect() as conn:
df = pd.read_sql_query(
sqlalchemy.select(MeterValue),
conn
)
Where MeterValue is a Mapped Class.
This doesn't feel like the correct solution, because SQLAlchemy documentation says you are not supposed to use engine connection with ORM. I just can't find out why.
Does anyone know if there is any issue using the connection instead of Session with ORM Mapped Class?
What is the correct way to read sql in to a DataFrame using SQLAlchemy ORM?
I found a couple of old answers on this where you use the engine directly as the second argument, or use session.bind and so on. Nothing works.
Just reading the documentation of pandas.read_sql_query:
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)
Parameters:
sql: str SQL query or SQLAlchemy Selectable (select or text object)
SQL query to be executed.
con: SQLAlchemy connectable, str, or sqlite3 connection
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.
...
So pandas does allow a SQLAlchemy Selectable (e.g. select(MeterValue)) and a SQLAlchemy connectable (e.g. engine.connect()), so your code block is correct and pandas will handle the querying correctly.
with ENGINE.connect() as conn:
df = pd.read_sql_query(
sqlalchemy.select(MeterValue),
conn,
)
I have an issue with the dataframe.to_sql when trying to use this function
The dataframe.to_sql does not recognize or separate the data lab name and the table name, instead it takes it all as a string to create a table. So it is trying to create it on the default root level and gives the error, this user does not have permission to create on LABUSERS.
from sqlalchemy import create_engine
engine = create_engine(f'teradata://{username}:{password}#tdprod:22/')
df.to_sql('data_lab.table_name', engine)
How can I use df.to_sql function and specify the datalab?
schema (str, optional)
Specify the schema (if database flavor supports this). If None, use default schema.
to_sql documentation
I'm trying to replace some old MSSQL stored procedures with python, in an attempt to take some of the heavy calculations off of the sql server. The part of the procedure I'm having issues replacing is as follows
UPDATE mytable
SET calc_value = tmp.calc_value
FROM dbo.mytable mytable INNER JOIN
#my_temp_table tmp ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
WHERE (mytable.a = some_value)
and (mytable.x = tmp.x)
and (mytable.b = some_other_value)
Up to this point, I've made some queries with SQLAlchemy, stored those data in Dataframes, and done the requisite calculations on them. I don't know now how to put the data back into the server using SQLAlchemy, either with raw SQL or function calls. The dataframe I have on my end would essentially have to work in the place of the temporary table created in MSSQL Server, but I'm not sure how I can do that.
The difficulty is of course that I don't know of a way to join between a dataframe and a mssql table, and I'm guessing this wouldn't work so I'm looking for a workaround
As the pandas doc suggests here :
from sqlalchemy import create_engine
engine = create_engine("mssql+pyodbc://user:password#DSN", echo = False)
dataframe.to_sql('tablename', engine , if_exists = 'replace')
engine parameter for msSql is basically the connection string check it here
if_exist parameter is a but tricky since 'replace' actually drops the table first and then recreates it and then inserts all data at once.
by setting the echo attribute to True it shows all background logs and sql's.
I'm creating a sqlalchemy engine (have pyhdb and sqlalchemy-hana installed) for a HANA db connection and passing it into pandas' to_sql function for dataframes:
hanaeng = create_engine('hana://username:password#host_address:port')
my_df.to_sql('table_name', con = hanaeng, index = False, if_exists = 'append')
However, I keep getting this error:
sqlalchemy.exc.DatabaseError: (pyhdb.exceptions.DatabaseError) invalid column name
I created a table in my Hana schema that matches the column names and type of what I'm trying to pass into it from the dataframe.
Has anyone ever come across this error? Or tried connecting to hana using a sqlalchemy engine? I tried using a pyhdb connector to make a connection object and passing that into to_sql but I believe pandas is trying to shift accepting only sqlalchemy engine objects in to_sql versus straight DBAPI connectors? Regardless, any help will be great! Thank you
Yes, it does work for sure.
Your problem is that my_df contains a column name that does not match any column in HANA table you are trying to insert data.
I have a 1,000,000 x 50 Pandas DataFrame that I am currently writing to a SQL table using:
df.to_sql('my_table', con, index=False)
It takes an incredibly long time. I've seen various explanations about how to speed up this process online, but none of them seem to work for MSSQL.
If I try the method in:
Bulk Insert A Pandas DataFrame Using SQLAlchemy
then I get a no attribute copy_from error.
If I try the multithreading method from:
http://techyoubaji.blogspot.com/2015/10/speed-up-pandas-tosql-with.html
then I get a QueuePool limit of size 5 overflow 10 reach, connection timed out error.
Is there any easy way to speed up to_sql() to an MSSQL table? Either via BULK COPY or some other method, but entirely from within Python code?
I've used ctds to do a bulk insert that's a lot faster with SQL server. In example below, df is the pandas DataFrame. The column sequence in the DataFrame is identical to the schema for mydb.
import ctds
conn = ctds.connect('server', user='user', password='password', database='mydb')
conn.bulk_insert('table', (df.to_records(index=False).tolist()))
in pandas 0.24 you can use method ='multi' with chunk size of 1000 which is the sql server limit
chunksize=1000, method='multi'
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
New in version 0.24.0.
The parameter method controls the SQL insertion clause used. Possible values are:
None: Uses standard SQL INSERT clause (one per row).
'multi': Pass multiple values in a single INSERT clause. It uses a special SQL syntax not supported by all backends. This usually provides better performance for analytic databases like Presto and Redshift, but has worse performance for traditional SQL backend if the table contains many columns. For more information check the SQLAlchemy documention.
even I had the same issue so I applied sqlalchemy with fast execute many.
from sqlalchemy import event, create_engine
engine = create_egine('connection_string_with_database')
#event.listens_for(engine, 'before_cursor_execute')
def plugin_bef_cursor_execute(conn, cursor, statement, params, context,executemany):
if executemany:
cursor.fast_executemany = True # replace from execute many to fast_executemany.
cursor.commit()
always make sure that the given function should be present after the engine variable and before cursor execute.
conn = engine.execute()
df.to_sql('table', con=conn, if_exists='append', index=False) # for reference go to the pandas to_sql documentation.