Pandas 0.20.2 to_sql() using MySQL


I'm trying to write a dataframe to a MySQL table but am getting a (111 Connection refused) error.
I followed the accepted answer here:
Writing to MySQL database with pandas using SQLAlchemy, to_sql
Answer's code:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data.to_sql(name='sample_table2', con=engine, if_exists = 'append', index=False)
...and the create_engine() line worked without error, but the to_sql() line failed with this error:
(mysql.connector.errors.InterfaceError) 2003: Can't connect to MySQL server on 'localhost:3306' (111 Connection refused)
How I connect to my MySQL database / table is not really relevant, so completely different answers are appreciated, but given the deprecation of the MySQL 'flavor' in pandas 0.20.2, what is the proper way to write a dataframe to MySQL?

Thanks to a tip from @AndyHayden, this answer did the trick. Replacing mysqlconnector with mysqldb was the key change.
engine = create_engine('mysql+mysqldb://[user]:[pass]@[host]:[port]/[schema]', echo=False)
df.to_sql(name='my_table', con=engine, if_exists='append', index=False)
Here [schema] is the database name; in my particular case [host] is localhost and :[port] is omitted.
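The URL passed to create_engine() follows the pattern dialect+driver://user:password@host:port/database. As a minimal sketch (the helper name build_mysql_url and the sample credentials are made up for illustration and are not part of the original answer), the string could be assembled like this, escaping the password so that characters such as @ or / do not break the URL:

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database, port=None, driver="mysqldb"):
    """Assemble a SQLAlchemy-style MySQL URL (hypothetical helper)."""
    # Escape the password so characters like '@' or '/' are not read as URL delimiters.
    credentials = f"{user}:{quote_plus(password)}"
    # The port is optional, as in the answer above where it was left out for localhost.
    location = host if port is None else f"{host}:{port}"
    return f"mysql+{driver}://{credentials}@{location}/{database}"


# Sample values are placeholders, not real credentials.
url_without_port = build_mysql_url("alice", "p@ss/word", "localhost", "sales")
url_with_port = build_mysql_url("alice", "secret", "db.example.com", "sales", port=3306)
print(url_without_port)
print(url_with_port)
```

The resulting string would then be handed to create_engine(); actually connecting still requires the matching driver package to be installed and the server to be reachable.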

Related

(psycopg2.OperationalError) Invalid - opcode

I am trying to connect to Netezza using sqlalchemy.create_engine(). I want to use SQLAlchemy so that I can read and write through pandas DataFrames.
What works is the following:
import pandas as pd
import pyodbc
conn = pyodbc.connect('DSN=NZDWW')
df2 = pd.read_sql(Query,conn)
The code above runs fine. But to write the DataFrame df to Netezza, I need to use to_sql(), which requires SQLAlchemy. This is what my code looks like:
import os
import pandas as pd
from sqlalchemy import create_engine
username = os.getenv('REDSHIFT_USER')
password = os.getenv('REDSHIFT_PASS')
DATABASE = "SHP_TARGET"
HOST = "Netezza1"
PORT = 5480
conn_str = "postgresql://"+username+":"+password+"@"+HOST+':'+str(PORT)+'/'+DATABASE
engine3 = create_engine(conn_str)
df = pd.read_sql(Query, engine3)
When I execute this, I get the following error:
OperationalError: (psycopg2.OperationalError) Invalid - opcode
Invalid - opcodeInvalid packet length (Background on this error at: http://sqlalche.me/e/e3q8)
Any leads will be much appreciated. thanks.
Database: Netezza
Python version: 3.6
OS: Windows

The SQLAlchemy dialect for Postgres isn't compatible with Netezza.
The error you're receiving comes from the psycopg2 module, which handles the connection; it is complaining that it can't make sense of what the server is sending back.
There appears to be a dialect for Netezza, though. You may want to try that out.

A formal dialect for Netezza has since been released.
It can be used as documented here - https://github.com/IBM/nzalchemy#prerequisites
Example
import os
from urllib.parse import quote_plus
from sqlalchemy import create_engine
# assumes NZ_HOST, NZ_DATABASE, NZ_USER and NZ_PASSWORD are set in the environment
params = quote_plus(f"DRIVER=NetezzaSQL;SERVER={os.environ['NZ_HOST']};"
                    f"DATABASE={os.environ['NZ_DATABASE']};USER={os.environ['NZ_USER']};"
                    f"PASSWORD={os.environ['NZ_PASSWORD']}")
engine = create_engine(f"netezza+pyodbc:///?odbc_connect={params}",
                       echo=True)

Python: Write Pandas Dataframe to MSSQL --> Database Error

I have a pandas dataframe that has about 20k rows and 20 columns. I want to write it to a table in MSSQL.
I have the connection successfully established:
connection = pypyodbc.connect('Driver={SQL Server};'
'Server=XXX;'
'Database=line;'
'uid=XXX;'
'pwd=XXX')
cursor = connection.cursor()
I'm trying to write my pandas dataframe to the MSSQL server with the following code:
df_EVENT5_16.to_sql('MODREPORT', connection, if_exists = 'replace')
But I get the following error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master
WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC
SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'.")

Modern pandas versions expect a SQLAlchemy engine as the connection, so use SQLAlchemy:
from sqlalchemy import create_engine
con = create_engine('mssql+pyodbc://username:password@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0')
and then:
df_EVENT5_16.to_sql('MODREPORT', con, if_exists='replace')
from DataFrame.to_sql() docs:
con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that library.
If a DBAPI2 object, only sqlite3 is supported.
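The query in the error message is SQLite-specific: sqlite_master is SQLite's internal catalog table, which is why SQL Server rejects it when a raw DBAPI connection is passed. As an illustrative sketch (the function name table_exists and the in-memory database are assumptions for the demo, not pandas internals), the same query behaves as intended against sqlite3:

```python
import sqlite3


def table_exists(conn, table_name):
    """Run the existence check quoted in the error message (hypothetical helper)."""
    query = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"
    return conn.execute(query, (table_name,)).fetchone() is not None


# In-memory SQLite database used purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MODREPORT (id INTEGER)")
exists = table_exists(conn, "MODREPORT")
missing = table_exists(conn, "OTHER_TABLE")
conn.close()
print(exists, missing)
```

Against any other database the catalog has a different name, so the check fails before any rows are written; a SQLAlchemy engine lets pandas use the right dialect instead.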

There is no need to call pyodbc directly to connect to MSSQL; SQLAlchemy will do that for you.
We can also insert the DataFrame directly into the database with the to_sql() method, without iterating over its rows. Here is the code that works for me:
# Insert a DataFrame into an MS SQL database without iterating over its rows
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine
params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=serverName;DATABASE=dbName;UID=UserName;PWD=password")
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
engine.connect()
# suppose df is the DataFrame that we want to insert into the database
df.to_sql(name='table_name', con=engine, index=False, if_exists='append')
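The key step in this approach is percent-encoding the raw ODBC connection string before placing it in the odbc_connect query parameter. A minimal sketch of that step, using only the standard library (the helper name build_mssql_odbc_url and the placeholder settings are assumptions for illustration):

```python
from urllib.parse import quote_plus, unquote_plus


def build_mssql_odbc_url(odbc_settings):
    """Turn ODBC key/value pairs into a mssql+pyodbc URL (hypothetical helper)."""
    # Join the settings into a raw ODBC connection string: KEY=value;KEY=value
    odbc_string = ";".join(f"{key}={value}" for key, value in odbc_settings.items())
    # Percent-encode it so '{', '}', ';', '=' and spaces survive inside the URL query.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_string)


# Placeholder values taken from the answer above.
settings = {
    "DRIVER": "{SQL Server}",
    "SERVER": "serverName",
    "DATABASE": "dbName",
    "UID": "UserName",
    "PWD": "password",
}
url = build_mssql_odbc_url(settings)
encoded = url.split("odbc_connect=", 1)[1]
print(url)
# Decoding the parameter gives back the original ODBC string.
print(unquote_plus(encoded))
```

Passing the unencoded string instead leaves characters such as ; and = to be interpreted as part of the URL, which is one way to end up with driver errors about missing keywords.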

Python sqlalchemy trying to write pandas dataframe to SQL Server using .to_sql

I have Python code that produces a pandas DataFrame "df". I am trying to write this DataFrame to Microsoft SQL Server. I am trying to connect with the following code, but I am getting an error:
import pyodbc
from sqlalchemy import create_engine
engine = create_engine('mssql+pyodbc:///?odbc_connect=DRIVER={SQL Server};SERVER=bidept;DATABASE=BIDB;UID=sdcc\neils;PWD=neil!pass')
engine.connect()
df.to_sql(name='[BIDB].[dbo].[Test]',con=engine, if_exists='append')
However at the engine.connect() line I am getting the following error
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('08001', '[08001] [Microsoft][ODBC SQL Server Driver]Neither DSN nor SERVER keyword supplied (0) (SQLDriverConnect)')
Can anyone tell me what I am missing. I am using Microsoft SQL Server Management Studio - 14.0.17177.0
I connect to the SQL server through the following
Server type: Database Engine
Server name: bidept
Authentication: Windows Authentication
for which I log into my windows using username : sdcc\neils
and password : neil!pass
I am new to databases and python. Kindly let me know if you need any additional details. Any help will be greatly appreciated. Thank you in advance.

I was finally able to make it run.
import sys
import urllib
import pyodbc
from sqlalchemy import create_engine
# Python 2 code; on Python 3 use urllib.parse.quote_plus instead
params = urllib.quote_plus(r'DRIVER={SQL Server};SERVER=bidept;DATABASE=BIDB;Trusted_Connection=yes')
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = create_engine(conn_str)
# Python 2 only: force UTF-8 as the default string encoding
reload(sys)
sys.setdefaultencoding('utf8')
df.to_sql(name='Test', con=engine, if_exists='append', index=False)
Thanks to @gord-thompson, who answered here.
However, on my SQL Server all the tables are under the 'dbo' schema (i.e. dbo.Test1, dbo.Other_Tables), and this code puts my table in the 'sdcc\neils' schema (i.e. sdcc\neils.Test1, sdcc\neils.Other_Tables). Is there any solution to this?

Writing a Pandas Dataframe to MySQL

I'm trying to write a pandas DataFrame to a MySQL database. I realize that it's possible to use SQLAlchemy for this, but I'm wondering if there is another way that may be easier, preferably one already built into pandas. I've spent quite some time trying to do it with a for loop, but it's not reliable.
If anyone knows of a better way, it would be greatly appreciated.
Thanks a lot!

Besides a SQLAlchemy engine, to_sql can also take a plain DBAPI connection together with flavor='mysql'. That mode is deprecated and will be removed in a future release, but it is still documented as of pandas 0.18.1.
According to the pandas documentation for pandas.DataFrame.to_sql, you can use the following syntax:
DataFrame.to_sql(name, con, flavor='sqlite', schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)
You pass the connection as con and set flavor to 'mysql'. The relevant parameter descriptions are:
con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.
flavor : {‘sqlite’, ‘mysql’}, default ‘sqlite’ The flavor of SQL to
use. Ignored when using SQLAlchemy engine. ‘mysql’ is deprecated and
will be removed in future versions, but it will be further supported
through SQLAlchemy engines.

You can do it by using pymysql:
For example, suppose you have a MySQL database with the following user, password, host and port, and you want to write to the database 'data_2'.
import pymysql
user = 'root'
passw = 'my-secret-pw-for-mysql-12ud'
host = '172.17.0.2'
port = 3306
database = 'data_2'
If you already have the database created:
conn = pymysql.connect(host=host,
port=port,
user=user,
passwd=passw,
db=database,
charset='utf8')
data.to_sql(name=database, con=conn, if_exists = 'replace', index=False, flavor = 'mysql')
If the database has NOT been created yet (this also works when it already exists):
conn = pymysql.connect(host=host, port=port, user=user, passwd=passw)
conn.cursor().execute("CREATE DATABASE IF NOT EXISTS {0} ".format(database))
conn = pymysql.connect(host=host,
port=port,
user=user,
passwd=passw,
db=database,
charset='utf8')
data.to_sql(name=database, con=conn, if_exists = 'replace', index=False, flavor = 'mysql')
Similar threads:
Writing to MySQL database with pandas using SQLAlchemy, to_sql
How to insert pandas dataframe via mysqldb into database?

Get data from pandas into a SQL server with PYODBC

I am trying to understand how python could pull data from an FTP server into pandas then move this into SQL server. My code here is very rudimentary to say the least and I am looking for any advice or help at all. I have tried to load the data from the FTP server first which works fine.... If I then remove this code and change it to a select from ms sql server it is fine so the connection string works, but the insertion into the SQL server seems to be causing problems.
import pyodbc
import pandas
from ftplib import FTP
from StringIO import StringIO
import csv
ftp = FTP ('ftp.xyz.com','user','pass' )
ftp.set_pasv(True)
r = StringIO()
ftp.retrbinary('filname.csv', r.write)
pandas.read_table (r.getvalue(), delimiter=',')
connStr = ('DRIVER={SQL Server Native Client 10.0};SERVER=localhost;DATABASE=TESTFEED;UID=sa;PWD=pass')
conn = pyodbc.connect(connStr)
cursor = conn.cursor()
cursor.execute("INSERT INTO dbo.tblImport(Startdt, Enddt, x,y,z,)" "VALUES (x,x,x,x,x,x,x,x,x,x.x,x)")
cursor.close()
conn.commit()
conn.close()
print"Script has successfully run!"
When I remove the ftp code this runs perfectly, but I do not understand how to make the next jump to get this into Microsoft SQL server, or even if it is possible without saving into a file first.

For the 'write to sql server' part, you can use the convenient to_sql method of pandas (so no need to iterate over the rows and do the insert manually). See the docs on interacting with SQL databases with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql
You will need at least pandas 0.14 to have this working, and you also need sqlalchemy installed. An example, assuming df is the DataFrame you got from read_table:
import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")
# write the DataFrame to a table in the sql database
df.to_sql("table_name", engine)
See also the documentation page of to_sql.
More information on creating the connection engine with SQLAlchemy for SQL Server with pyodbc can be found here: http://docs.sqlalchemy.org/en/rel_1_1/dialects/mssql.html#dialect-mssql-pyodbc-connect
But if your goal is just to get the CSV data into the SQL database, you could also consider doing this directly from SQL. See e.g. Import CSV file into SQL Server

Python 3 version using a LocalDB SQL instance:
import urllib.parse
import pyodbc
import pandas as pd
from sqlalchemy import create_engine
df = pd.read_csv("./data.csv")
# raw string so the backslash in the instance name is kept literally
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server Native Client 11.0};SERVER=(localDb)\ProjectsV14;DATABASE=database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
df.to_sql('TargetTable', schema='dbo', con=engine)
# Engine.execute() is the SQLAlchemy 1.x API; it was removed in SQLAlchemy 2.0
result = engine.execute('SELECT COUNT(*) FROM [dbo].[TargetTable]')
result.fetchall()

Yes, the bcp utility seems to be the best solution for most cases.
If you want to stay within Python, the following code should work.
from sqlalchemy import create_engine
import urllib.parse
import pyodbc
# raw string so the backslash in the server name is kept literally
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server};SERVER=YOUR\ServerName;DATABASE=YOur_Database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
df.to_sql('Table_Name', schema='dbo', con=engine, chunksize=200, method='multi', index=False, if_exists='replace')
Don't omit method='multi', because it significantly reduces the execution time.
Sometimes you may encounter the following error.
ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server
Driver][SQL Server]The incoming request has too many parameters. The
server supports a maximum of 2100 parameters. Reduce the number of
parameters and resend the request. (8003) (SQLExecDirectW)')
In that case, take the number of columns in your DataFrame, df.shape[1], divide the maximum supported number of parameters (2100) by it, and use the floor of the result as the chunk size.
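The chunk-size calculation described above can be sketched as follows. The function name max_chunksize is made up for illustration; the 2100 figure is the limit quoted in the error message, and the reasoning assumes that with method='multi' each chunk sends one parameter per cell (rows times columns):

```python
SQL_SERVER_MAX_PARAMS = 2100  # limit quoted in the error message above


def max_chunksize(n_columns, max_params=SQL_SERVER_MAX_PARAMS):
    """Largest number of rows per chunk that stays within the parameter limit."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # Floor division: rows * columns must not exceed max_params.
    # Never return less than one row per chunk.
    return max(1, max_params // n_columns)


# For a DataFrame you would pass df.shape[1] as n_columns.
print(max_chunksize(20))
print(max_chunksize(7))
```

The result would be passed as chunksize to to_sql(). If the index is written as well, it counts as an extra column, and if a chunk that lands exactly on the limit is still rejected, a slightly smaller value may be needed.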

I found that the bcp utility (https://learn.microsoft.com/en-us/sql/tools/bcp-utility) works best when you have a large dataset. I have 2.7 million rows that insert at 80K rows/sec. You can store your DataFrame as a CSV file (use tabs as the separator if your data doesn't contain tabs, and UTF-8 encoding). With bcp, I've used the "-c" format and it has worked without issues so far.
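A rough sketch of that workflow, under stated assumptions: the table, server and database names are placeholders, -T (Windows authentication) is an assumption about the login method, and the command is only assembled here, not executed. With pandas the file would normally come from df.to_csv(path, sep='\t', index=False, header=False); the csv module is used below only to keep the sketch self-contained.

```python
import csv
import os
import tempfile


def write_tsv(rows, path):
    """Write rows as a tab-separated UTF-8 file without a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def bcp_import_command(table, data_file, server, database):
    """Build (but do not run) a bcp import command (hypothetical helper)."""
    # 'in' imports the file into the table; -S server, -d database,
    # -T Windows authentication (assumed), -c character-mode format.
    return ["bcp", table, "in", data_file, "-S", server, "-d", database, "-T", "-c"]


# Sample rows standing in for the DataFrame contents.
rows = [(1, "raj", "kapoor"), (2, "abhi", "bachn")]
tsv_path = os.path.join(tempfile.mkdtemp(), "frame.tsv")
write_tsv(rows, tsv_path)
command = bcp_import_command("dbo.TargetTable", tsv_path, "myserver", "mydb")
print(" ".join(command))
```

The command list could then be run with subprocess.run(command, check=True) on a machine where bcp is installed.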

This worked for me on Python 3.5.2:
import sqlalchemy as sa
import urllib.parse
import pyodbc
# server, database, username and password are assumed to be defined beforehand
conn = urllib.parse.quote_plus('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(conn))
frame.to_sql("myTable", engine, schema='dbo', if_exists='append', index=False, index_label='myField')

"As the Connection represents an open resource against the database, we want to always limit the scope of our use of this object to a specific context, and the best way to do that is by using Python context manager form, also known as the with statement."
https://docs.sqlalchemy.org/en/14/tutorial/dbapi_transactions.html
The example would then be
from sqlalchemy import create_engine
import urllib.parse
import pyodbc
connection_string = (
    "Driver={SQL Server Native Client 11.0};"
    "Server=myserver;"
    "UID=myuser;"
    "PWD=mypwd;"
    "Database=mydb;"
)
quoted = urllib.parse.quote_plus(connection_string)
engine = create_engine(f'mssql+pyodbc:///?odbc_connect={quoted}')
# the connection is released when the with block exits
with engine.connect() as cnn:
    df.to_sql('mytable', con=cnn, if_exists='replace', index=False)

The following is what worked for me using SQLAlchemy. Pay attention to the last part of the URL, ?driver=SQL+Server.
import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine('mssql+pyodbc://MyUser:MyPWD@dataserver.sandbox.myserver/MY_DB?driver=SQL+Server')
dt.to_sql("PatientResultTest", engine, if_exists='append')
The SQL table needs an index column at the beginning to store the index values of the DataFrame.
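The index column mentioned here comes from the default index=True of to_sql(). A small sketch of the difference, using an in-memory SQLite database so it runs without a SQL Server (the table names and the column_names helper are made up for the demo):

```python
import sqlite3

import pandas as pd

# Sample data for demonstration only.
df = pd.DataFrame({"FirstName": ["raj", "abhi"], "LastName": ["kapoor", "bachn"]})


def column_names(conn, table):
    """Return the column names of a SQLite table (hypothetical helper)."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, default, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
df.to_sql("with_index", conn)                    # default index=True
df.to_sql("without_index", conn, index=False)    # index is not written
with_index_cols = column_names(conn, "with_index")
without_index_cols = column_names(conn, "without_index")
conn.close()
print(with_index_cols)
print(without_index_cols)
```

When appending to an existing table that has no such column, passing index=False avoids the need for it.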

# using a class
import urllib.parse
import pandas as pd
import pyodbc
import sqlalchemy
class data_frame_to_sql():
    def __init__(self, dataFrame, sql_table_name):
        self.dataFrame = dataFrame
        self.sql_table_name = sql_table_name
    def conversion(self):
        params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                         "SERVER=######;"
                                         "DATABASE=####;"
                                         "UID=#####;"
                                         "PWD=###;")
        try:
            engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
            # returns a (message, to_sql result) tuple once the write has succeeded
            return f"Table '{self.sql_table_name}' added successfully in database", self.dataFrame.to_sql(self.sql_table_name, engine)
        except Exception as e:
            e = str(e).replace(".", "")
            print(f"{e} in Database.")
data = {"BusinessEntityID": ["1", "2", "3"], "FirstName": ["raj", "abhi", "amir"], "LastName": ["kapoor", "bachn", "khhan"]}
df = pd.DataFrame(data, columns=['BusinessEntityID', 'FirstName', 'LastName'])
ab = data_frame_to_sql(df, "ab").conversion()
print(ab)

It's not necessary to use SQLAlchemy; you can create a connection with pyodbc directly and use it with pandas, as below:
# query is the SQL string to run (it is missing from the original snippet)
with pyodbc.connect('DRIVER={ODBC Driver 18 for SQL Server};SERVER='+server
                    +';DATABASE='+database+';UID='+username+';PWD='+password) as newconn:
    df = pd.read_sql(query, newconn)
