Writing a Pandas Dataframe to MySQL - python

I'm trying to write a Python Pandas DataFrame to a MySQL database. I realize that it's possible to use sqlalchemy for this, but I'm wondering if there is another way that may be easier, preferably already built into Pandas. I've spent quite some time trying to do it with a for loop, but it's not reliable.
If anyone knows of a better way, it would be greatly appreciated.
Thanks a lot!

Besides sqlalchemy, another option is to use to_sql with the 'mysql' flavor; it will be deprecated in a future release, but as of pandas 0.18.1 it is still documented and works.
According to the pandas documentation for pandas.DataFrame.to_sql, you can use the following syntax:
DataFrame.to_sql(name, con, flavor='sqlite', schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)
You specify the con type/mode and the flavor ‘mysql’; here is the relevant description:
con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.
flavor : {‘sqlite’, ‘mysql’}, default ‘sqlite’
The flavor of SQL to use. Ignored when using SQLAlchemy engine. ‘mysql’ is deprecated and will be removed in future versions, but it will be further supported through SQLAlchemy engines.
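For reference, a minimal sketch of the SQLAlchemy-engine route the documentation describes might look like this (the pymysql driver, credentials and table name below are assumptions, not from the original post):
```
import pandas as pd
from sqlalchemy import create_engine

# hypothetical connection details -- replace with your own
engine = create_engine('mysql+pymysql://user:password@localhost:3306/data_2')

df = pd.DataFrame({'a': [1, 2, 3]})
# flavor is ignored when a SQLAlchemy engine is passed as con
df.to_sql(name='my_table', con=engine, if_exists='replace', index=False)
```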

You can do it by using pymysql:
For example, let's suppose you have a MySQL database with the following user, password, host and port, and you want to write to the database 'data_2'.
import pymysql
user = 'root'
passw = 'my-secret-pw-for-mysql-12ud'
host = '172.17.0.2'
port = 3306
database = 'data_2'
If you already have the database created:
conn = pymysql.connect(host=host,
                       port=port,
                       user=user,
                       passwd=passw,
                       db=database,
                       charset='utf8')
data.to_sql(name=database, con=conn, if_exists='replace', index=False, flavor='mysql')
If you do NOT have the database created yet (this also works when the database already exists):
conn = pymysql.connect(host=host, port=port, user=user, passwd=passw)
conn.cursor().execute("CREATE DATABASE IF NOT EXISTS {0} ".format(database))
conn = pymysql.connect(host=host,
                       port=port,
                       user=user,
                       passwd=passw,
                       db=database,
                       charset='utf8')
data.to_sql(name=database, con=conn, if_exists='replace', index=False, flavor='mysql')
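Note that in later pandas releases the flavor argument was removed and to_sql only accepts SQLAlchemy connectables (or sqlite3 connections), so the raw pymysql connection above eventually stops working. A rough modern equivalent, reusing the variables defined above (the table name 'my_table' is an assumption):
```
from sqlalchemy import create_engine

# build an engine from the same credentials; pymysql is used as the DBAPI driver
engine = create_engine(f'mysql+pymysql://{user}:{passw}@{host}:{port}/{database}')
data.to_sql(name='my_table', con=engine, if_exists='replace', index=False)
```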
Similar threads:
Writing to MySQL database with pandas using SQLAlchemy, to_sql
How to insert pandas dataframe via mysqldb into database?

Related

Error when creating sqlalchemy engine: Driver keyword syntax error (IM012) [duplicate]

I am attempting to write a Python script that can take Excel sheets and import them into my SQL Server Express (with Windows Authentication) database as tables. To do this, I am using pandas to read the Excel files into a pandas DataFrame, and I then hope to use pandas.to_sql() to import the data into my database. To use this function, however, I need to use sqlalchemy.create_engine().
I am able to connect to my database using pyodbc alone, and run test queries. This connection is done with the following code:
def create_connection(server_name, database_name):
    config = dict(server=server_name, database=database_name)
    conn_str = ('SERVER={server};DATABASE={database};TRUSTED_CONNECTION=yes')
    return pyodbc.connect(r'DRIVER={ODBC Driver 13 for SQL Server};' + conn_str.format(**config))
...
server = '<MY_SERVER_NAME>\SQLEXPRESS'
db = '<MY_DATABASE_NAME>'
connection = create_connection(server, db)
cursor = connection.cursor()
cursor.execute('CREATE VIEW test_view AS SELECT * FROM existing_table')
cursor.commit()
However, this isn't much use as I can't use pandas.to_sql() - to do so I need an engine from sqlalchemy.create_engine(), but I am struggling to figure out how to use my same details in my create_connection() function above to successfully create an engine and connect to the database.
I have tried many, many combinations along the lines of:
engine = create_engine("mssql+pyodbc://#C<MY_SERVER_NAME>\SQLEXPRESS/<MY_DATABASE_NAME>?driver={ODBC Driver 13 for SQL Server}?trusted_connection=yes")
conn = engine.connect().connection
or
engine = create_engine("mssql+pyodbc://#C<MY_SERVER_NAME>\SQLEXPRESS/<MY_DATABASE_NAME>?trusted_connection=yes")
conn = engine.connect().connection
Passing through the exact pyodbc connection string works for me:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

connection_string = (
    r"Driver=ODBC Driver 17 for SQL Server;"
    r"Server=(local)\SQLEXPRESS;"
    r"Database=myDb;"
    r"Trusted_Connection=yes;"
)
connection_url = URL.create(
    "mssql+pyodbc",
    query={"odbc_connect": connection_string}
)
engine = create_engine(connection_url)

df = pd.DataFrame([(1, "foo")], columns=["id", "txt"])
df.to_sql("test_table", engine, if_exists="replace", index=False)
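As a quick sanity check, the table can be read back through the same engine:
```
# read the freshly written table back (assumes the engine defined above)
check = pd.read_sql("SELECT * FROM test_table", engine)
print(check)
```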

How do I get sqlalchemy.create_engine with mysqlconnector to connect using mysql_native_password?

I'm working with pandas and sqlalchemy, and would like to load a DataFrame into a MySQL database. I'm currently using this code snippet:
db_connection = sqlalchemy.create_engine('mysql+mysqlconnector://user:pwd@hostname/db_name')
some_data_ref.to_sql(con=db_connection, name='db_table_name', if_exists='replace')
sqlalchemy, pandas have been imported prior to this.
My MySQL backend is 8.x, which I know uses caching_sha2_password. If I were to connect to the database using mysql.connector.connect and I want to use the mysql_native_password method, I know that I should specify auth_plugin = mysql_native_password like so:
mysql.connector.connect(user=user, password=pw, host=host, database=db, auth_plugin='mysql_native_password')
My question: Is there a way to force mysql_native_password authentication with sqlalchemy.create_engine('mysql+mysqlconnector://...)?
Any advice on this would be much appreciated...
You could use connect_args:
db_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://user:pwd@hostname/db_name',
    connect_args={'auth_plugin': 'mysql_native_password'})
or the URL query:
db_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://user:pwd@hostname/db_name?auth_plugin=mysql_native_password')

Pandas 0.20.2 to_sql() using MySQL

I'm trying to write a dataframe to a MySQL table but am getting a (111 Connection refused) error.
I followed the accepted answer here:
Writing to MySQL database with pandas using SQLAlchemy, to_sql
Answer's code:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data.to_sql(name='sample_table2', con=engine, if_exists = 'append', index=False)
...and the create_engine() line worked without error, but the to_sql() line failed with this error:
(mysql.connector.errors.InterfaceError) 2003: Can't connect to MySQL server on 'localhost:3306' (111 Connection refused)
How I connect to my MySQL database / table is not really relevant, so completely different answers are appreciated, but given the deprecation of the MySQL 'flavor' in pandas 0.20.2, what is the proper way to write a dataframe to MySQL?
Thanks to a tip from @AndyHayden, this answer was the trick. Basically replacing mysqlconnector with mysqldb was the linchpin.
engine = create_engine('mysql+mysqldb://[user]:[pass]@[host]:[port]/[schema]', echo=False)
df.to_sql(name='my_table', con=engine, if_exists='append', index=False)
Where [schema] is the database name, and in my particular case, :[port] is omitted with [host] being localhost.
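With those substitutions, the URL looks roughly like this (the user, password and database name here are placeholders, not the poster's actual values):
```
from sqlalchemy import create_engine

# hypothetical values: port omitted, host is localhost
engine = create_engine('mysql+mysqldb://myuser:mypass@localhost/mydatabase', echo=False)
```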

pandas to mysql error: `ValueError: database flavors mysql is not supported`

I need to load data that I have in a DataFrame into MySQL.
I tried to use df.to_sql, following this documentation, which suggests using: .to_sql(name, con, flavor='mysql', if_exists='fail', index=True, index_label=None)
And I get this error: ValueError: database flavors mysql is not supported
How can I get around this problem?
The flavor 'mysql' is deprecated as of pandas version 0.19. You have to use a sqlalchemy engine to create the connection to the database.
from sqlalchemy import create_engine
engine = create_engine("mysql+mysqldb://USER:PASSWORD#HOST/DATABASE")
df.to_sql(name, con=engine, if_exists='fail', index=True, index_label=None)
When defining your engine, you need to specify your user, password, host and database.
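The mysql+mysqldb URL requires the MySQLdb driver (the mysqlclient package); if that is not installed, PyMySQL is a common drop-in alternative with an otherwise identical URL (same placeholders as above):
```
from sqlalchemy import create_engine

# same placeholders, different DBAPI driver (requires the PyMySQL package)
engine = create_engine("mysql+pymysql://USER:PASSWORD@HOST/DATABASE")
df.to_sql(name, con=engine, if_exists='fail', index=True, index_label=None)
```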

Python pandas to_sql 'append'

I am trying to send monthly data to a MySQL database using Python's pandas to_sql command. My program runs one month of data at a time and I want to append the new data onto the existing database. However, Python gives me an error:
_mysql_exceptions.OperationalError: (1050, "Table 'cps_basic_tabulation' already exists")
Here is my code for connecting and exporting:
conn = MySQLdb.connect(host=config.get('db', 'host'),
                       user=config.get('db', 'user'),
                       passwd=config.get('db', 'password'),
                       db='cps_raw')

combined.to_sql(name="cps_raw.cps_basic_tabulation",
                con=conn,
                flavor='mysql',
                if_exists='append')
I have also tried using:
from sqlalchemy import create_engine
Replacing conn = MySQLdb.connect... with:
engine = create_engine("mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>")
conn = engine.connect().connection
Any ideas on why I cannot append to a database?
Thanks!
Starting from pandas 0.14, you have to provide the sqlalchemy engine directly, and not the connection object:
engine = create_engine("mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>")
combined.to_sql("cps_raw.cps_basic_tabulation", engine, if_exists='append')
Since I had the same error message and stumbled across this post, I am leaving this here for others to find.
I found two ways to avoid the duplicated table creation, although I lack the insight as to why they solve it:
Either pass the database name in the URL when creating the connection,
or pass the database name as a schema in pd.to_sql.
Doing both does not hurt. Also, a few years later it is (again?) possible to pass the plain connection to pandas. My guess is that in the previous answer by joris, the first of my two cases implicitly solved the problem.
```
# create connection to MySQL DB via sqlalchemy & pymysql
user = credentials['user']
password = credentials['password']
port = credentials['port']
host = credentials['hostname']
dialect = 'mysql'
driver = 'pymysql'
db_name = 'test_db'

# setup SQLAlchemy
from sqlalchemy import create_engine
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/'
engine = create_engine(cnx)

# create database
with engine.begin() as con:
    con.execute(f"CREATE DATABASE {db_name}")

############################################################
# either pass the db_name vvvv - HERE - vvvv after creating a database
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/{db_name}'
############################################################
engine = create_engine(cnx)

table = 'test_table'
col = 'test_col'
with engine.begin() as con:
    # this would work here instead of creating a new engine with a new link
    # con.execute(f"USE {db_name}")
    con.execute(f"CREATE TABLE {table} ({col} CHAR(1));")

# insert into database
import pandas as pd
df = pd.DataFrame({col: ['a', 'b', 'c']})
with engine.begin() as con:
    # this has no effect here
    # con.execute(f"USE {db_name}")
    df.to_sql(
        name=table,
        if_exists='append',
        # passing con = cnx here would equally work
        con=con,
        ############################################################
        # or pass it as a schema vvvv - HERE - vvvv
        # schema=db_name,
        ############################################################
        index=False
    )
```
Tested with python version 3.8.13, sqlalchemy 1.4.32 and pandas 1.4.2.
Same problem might have appeared here and here.
