Assume I have 30 databases in MySQL, from db1 to db30. I have a Python script that creates an engine and connects to one db:
import pandas as pd
import MySQLdb
from sqlalchemy import create_engine
df = pd.read_csv('pricelist.csv')
new_df = df[['date','time','new_price']]
engine = create_engine('mysql+mysqldb://root:python@localhost:3306/db1', echo=False)
new_df.to_sql(name='temporary_table', con=engine, if_exists='append', index=False)
with engine.begin() as cnx:
    sql_insert_query_new = 'REPLACE INTO newlist (SELECT * FROM temporary_table)'
    cnx.execute(sql_insert_query_new)
    cnx.execute("DROP TABLE temporary_table")
With the above script, I would need 30 Python scripts, each creating an engine and connecting to one db to run the query. And to call these 30 scripts, I would need a batch file on a task scheduler.
Is there an optimized way of connecting to multiple databases with a single script? I read up on sessions and don't think they can handle multiple databases. And if I have 30 Python scripts each creating an engine and connection, will there be any issue in terms of processing performance? Eventually I will have hundreds of databases in MySQL.
Thanks!
Note: Each database has its own unique table names.
Using Python 3.7
I think maybe you can do something like this:
import pandas as pd
import MySQLdb
from sqlalchemy import create_engine
df = pd.read_csv('pricelist.csv')
new_df = df[['date','time','new_price']]
db_names = [f'db{i}' for i in range(1, 31)]
table_names = ['temporary_table', 'table_name_2', 'table_name_3', ...]
for db, tb in zip(db_names, table_names):
    engine = create_engine(f'mysql+mysqldb://root:python@localhost:3306/{db}', echo=False)
    new_df.to_sql(name=tb, con=engine, if_exists='append', index=False)
    with engine.begin() as cnx:
        sql_insert_query_new = f'REPLACE INTO newlist (SELECT * FROM {tb})'
        cnx.execute(sql_insert_query_new)
        cnx.execute(f"DROP TABLE {tb}")
I am trying to submit a SQL query to JDBC while being protected from SQL injection attacks. I have some code such as:
from pyspark import SparkContext
from pyspark.sql import DataFrameReader, SQLContext
from pyspark.sql.functions import col
url = 'jdbc:mysql://.../....'
properties = {'user': '', 'driver': 'com.mysql.jdbc.Driver', 'password': ''}
sc = SparkContext("local[*]", "name")
sqlContext = SQLContext(sc)
from pyspark.sql.functions import desc
pushdown_query = """(
select * from my_table
where timestamp > {}
) AS tmp""".format(my_date)
df = sqlContext.read.jdbc(url=url, properties=properties, table=pushdown_query)
Can I use bind params somehow?
Any solution that prevents SQL injection here would work.
I also use SQLAlchemy if that helps.
If you use SQLAlchemy, you can try:
from sqlalchemy.dialects import mysql
from sqlalchemy import text
pushdown_query = str(
    text("""(select * from my_table where timestamp > :my_date ) AS tmp""")
    .bindparams(my_date=my_date)
    .compile(dialect=mysql.dialect(), compile_kwargs={"literal_binds": True}))
df = sqlContext.read.jdbc(url=url, properties=properties, table=pushdown_query)
but in a simple case, like this one, there is no need for subqueries. You can:
df = (sqlContext.read
      .jdbc(url=url, properties=properties, table=my_table)
      .where(col("timestamp") > my_date))
And if you worry about SQL injection, you may have a bigger problem: Spark alone has (almost) no security mechanisms built in and probably shouldn't be exposed in an untrusted environment.
In Python version 2.7.6
Pandas version 0.18.1
MySQL 5.7
import MySQLdb as dbapi
import sys
import csv
import os
import sys, getopt
import pandas as pd
df = pd.read_csv('test.csv')
rows = df.apply(tuple, 1).unique().tolist()
db=dbapi.connect(host=dbServer,user=dbUser,passwd=dbPass)
cur=db.cursor()
for (CLIENT_ID, PROPERTY_ID, YEAR) in rows:
    INSERT_QUERY = ("INSERT INTO {DATABASE}.TEST SELECT * FROM {DATABASE}_{CLIENT_ID}.TEST WHERE PROPERTY_ID = {PROPERTY_ID} AND YEAR = {YEAR};".format(
        CLIENT_ID=CLIENT_ID,
        PROPERTY_ID=PROPERTY_ID,
        YEAR=YEAR,
        DATABASE=DATABASE
    ))
    print INSERT_QUERY
    cur.execute(INSERT_QUERY)
    db.query(INSERT_QUERY)
This prints out the query I am looking for; however, the INSERT INTO does not take effect when I check the results in MySQL:
INSERT INTO test.TEST SELECT * FROM test_1.TEST WHERE PROPERTY_ID = 1 AND YEAR = 2015;
However, if I just copy and paste this query into the MySQL GUI, it executes without any problem. Could any guru enlighten me?
I also tried the following
cur.execute(INSERT_QUERY, multi=True)
Returns an error
TypeError: execute() got an unexpected keyword argument 'multi'
The answer here is that we need to use mysql.connector and call db.commit(). Here is a good example:
http://www.mysqltutorial.org/python-mysql-insert/
import MySQLdb as dbapi
import mysql.connector
import sys
import csv
import os
import sys, getopt
import pandas as pd
df = pd.read_csv('test.csv')
rows = df.apply(tuple, 1).unique().tolist()
db=dbapi.connect(host=dbServer,user=dbUser,passwd=dbPass)
cur=db.cursor()
conn = mysql.connector.connect(host=dbServer,user=dbUser,port=dbPort,password=dbPass)
cursor=conn.cursor()
for (CLIENT_ID, PROPERTY_ID, YEAR) in rows:
    INSERT_QUERY = ("INSERT INTO {DATABASE}.TEST SELECT * FROM {DATABASE}_{CLIENT_ID}.TEST WHERE PROPERTY_ID = {PROPERTY_ID} AND YEAR = {YEAR};".format(
        CLIENT_ID=CLIENT_ID,
        PROPERTY_ID=PROPERTY_ID,
        YEAR=YEAR,
        DATABASE=DATABASE
    ))
    print INSERT_QUERY
    cursor.execute(INSERT_QUERY)
    conn.commit()
Only with the commit will the database/table changes actually be applied.
I was using a mysql-connector pool, trying to insert a new row into a table, and hit the same problem. Version info: MySQL 8, Python 3.7.
The solution is to call connection.commit() at the end, even if you didn't explicitly start a transaction.
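A minimal sketch of that pooled-connection case; the pool settings, the prices table, and its columns are placeholders, not the poster's actual schema:
import mysql.connector.pooling
pool = mysql.connector.pooling.MySQLConnectionPool(
    pool_name='mypool', pool_size=5,
    host='localhost', user='root', password='secret', database='db1')
connection = pool.get_connection()
try:
    cursor = connection.cursor()
    cursor.execute("INSERT INTO prices (item, price) VALUES (%s, %s)", ("widget", 9.99))
    connection.commit()  # without this, the INSERT is discarded when the connection is returned
finally:
    connection.close()  # returns the connection to the pool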
Has anyone found a way to read a Teradata query into a Pandas dataframe? It looks like SQLAlchemy does not have a Teradata dialect.
http://docs.sqlalchemy.org/en/latest/dialects/
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html
You can use sqlalchemy, but you will need to install sqlalchemy-teradata too. You can do that via pip:
pip install sqlalchemy-teradata
The rest of the code remains the same :)
from sqlalchemy import create_engine
import pandas as pd
user, pasw, host = 'username','userpass', 'hostname'
# connect
td_engine = create_engine('teradata://{}:{}@{}:22/'.format(user, pasw, host))
# execute sql
query = 'select * from dbc.usersV'
result = td_engine.execute(query)
#To read your query to Pandas
df = pd.read_sql(query,td_engine)
I did it using read_sql. Below is the code snippet:
def dqm():
    conn_rw = create_connection()
    dataframes = []
    srcfile = open('srcqueries.sql', 'rU').read()
    querylist = srcfile.split(';')
    querylist.pop()
    for query in querylist:
        dataframes.append(pd.read_sql(query, conn_rw))
    close_connection(conn_rw)
    return dataframes, querylist
You can create the connection as below:
def create_connection():
    conn = pyodbc.connect("DRIVER=Teradata;DBCNAME=tddb;UID=uid;PWD=pwd;QUIETMODE=YES", autocommit=True, unicode_results=True)
    return conn
You can check the complete code here: GitHub Link
Let me know if this answers your query.
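A hypothetical usage example, assuming srcqueries.sql sits next to the script and a close_connection() helper is defined alongside create_connection():
# Run every query from srcqueries.sql and inspect the first result set.
dataframes, queries = dqm()
print(queries[0])
print(dataframes[0].head())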
I can connect to my local mysql database from python, and I can create, select from, and insert individual rows.
My question is: can I directly instruct mysqldb to take an entire dataframe and insert it into an existing table, or do I need to iterate over the rows?
In either case, what would the python script look like for a very simple table with ID and two data columns, and a matching dataframe?
Update:
There is now a to_sql method, which is the preferred way to do this, rather than write_frame:
df.to_sql(con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')
Also note: the syntax may change in pandas 0.14...
You can set up the connection with MySQLdb:
from pandas.io import sql
import MySQLdb
con = MySQLdb.connect() # may need to add some other options to connect
Setting the flavor of write_frame to 'mysql' means you can write to mysql:
sql.write_frame(df, con=con, name='table_name_for_df',
                if_exists='replace', flavor='mysql')
The argument if_exists tells pandas how to deal if the table already exists:
if_exists: {'fail', 'replace', 'append'}, default 'fail'
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
Although the write_frame docs currently suggest it only works on sqlite, mysql appears to be supported and in fact there is quite a bit of mysql testing in the codebase.
Andy Hayden mentioned the correct function (to_sql). In this answer, I'll give a complete example, which I tested with Python 3.5 but should also work for Python 2.7 (and Python 3.x):
First, let's create the dataframe:
# Create dataframe
import pandas as pd
import numpy as np
np.random.seed(0)
number_of_samples = 10
frame = pd.DataFrame({
    'feature1': np.random.random(number_of_samples),
    'feature2': np.random.random(number_of_samples),
    'class': np.random.binomial(2, 0.1, size=number_of_samples),
}, columns=['feature1', 'feature2', 'class'])
print(frame)
Which gives:
feature1 feature2 class
0 0.548814 0.791725 1
1 0.715189 0.528895 0
2 0.602763 0.568045 0
3 0.544883 0.925597 0
4 0.423655 0.071036 0
5 0.645894 0.087129 0
6 0.437587 0.020218 0
7 0.891773 0.832620 1
8 0.963663 0.778157 0
9 0.383442 0.870012 0
To import this dataframe into a MySQL table:
# Import dataframe into MySQL
import sqlalchemy
database_username = 'ENTER USERNAME'
database_password = 'ENTER USERNAME PASSWORD'
database_ip = 'ENTER DATABASE IP'
database_name = 'ENTER DATABASE NAME'
database_connection = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'.
                                               format(database_username, database_password,
                                                      database_ip, database_name))
frame.to_sql(con=database_connection, name='table_name_for_df', if_exists='replace')
One trick is that MySQLdb doesn't work with Python 3.x. So instead we use mysqlconnector, which may be installed as follows:
pip install mysql-connector==2.1.4 # version avoids Protobuf error
Note that to_sql creates the table as well as the columns if they do not already exist in the database.
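If you want control over the column types that to_sql creates, you can pass a dtype mapping of SQLAlchemy types; a small sketch reusing the frame and connection names from the example above:
import sqlalchemy
frame.to_sql(con=database_connection, name='table_name_for_df', if_exists='replace',
             dtype={'feature1': sqlalchemy.types.Float(),
                    'feature2': sqlalchemy.types.Float(),
                    'class': sqlalchemy.types.SmallInteger()})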
You can do it by using pymysql:
For example, suppose you have a MySQL database with the following user, password, host, and port, and you want to write to 'data_2', whether or not it already exists.
import pymysql
user = 'root'
passw = 'my-secret-pw-for-mysql-12ud'
host = '172.17.0.2'
port = 3306
database = 'data_2'
If you already have the database created:
conn = pymysql.connect(host=host,
                       port=port,
                       user=user,
                       passwd=passw,
                       db=database,
                       charset='utf8')
data.to_sql(name=database, con=conn, if_exists='replace', index=False, flavor='mysql')
If you do NOT have the database created (this also works when the database is already there):
conn = pymysql.connect(host=host, port=port, user=user, passwd=passw)
conn.cursor().execute("CREATE DATABASE IF NOT EXISTS {0} ".format(database))
conn = pymysql.connect(host=host,
                       port=port,
                       user=user,
                       passwd=passw,
                       db=database,
                       charset='utf8')
data.to_sql(name=database, con=conn, if_exists='replace', index=False, flavor='mysql')
Similar threads:
Writing to MySQL database with pandas using SQLAlchemy, to_sql
Writing a Pandas Dataframe to MySQL
The to_sql method works for me.
However, keep in mind that it looks like it's going to be deprecated in favor of SQLAlchemy:
FutureWarning: The 'mysql' flavor with DBAPI connection is deprecated and will be removed in future versions. MySQL will be further supported with SQLAlchemy connectables. chunksize=chunksize, dtype=dtype)
Python 2 + 3
Prerequisites
Pandas
MySQL server
sqlalchemy
pymysql: pure python mysql client
Code
from pandas.io import sql
from sqlalchemy import create_engine
engine = create_engine("mysql+pymysql://{user}:{pw}#localhost/{db}"
.format(user="root",
pw="your_password",
db="pandas"))
df.to_sql(con=engine, name='table_name', if_exists='replace')
This should do the trick:
import pandas as pd
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
# Create engine
engine = create_engine('mysql://USER_NAME_HERE:PASS_HERE#HOST_ADRESS_HERE/DB_NAME_HERE')
# Create the connection and close it (whether it succeeded or failed)
with engine.begin() as connection:
    df.to_sql(name='INSERT_TABLE_NAME_HERE/INSERT_NEW_TABLE_NAME', con=connection, if_exists='append', index=False)
You might output your DataFrame as a CSV file and then use mysqlimport to import the CSV into MySQL, as sketched below.
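A rough sketch of that route, assuming a hypothetical prices table in a database called db1 (mysqlimport loads a file into the table whose name matches the file name):
import pandas as pd
df = pd.DataFrame({'id': [1, 2], 'price': [9.99, 19.99]})
df.to_csv('prices.csv', index=False)  # file name must match the target table name
Then, from the shell:
mysqlimport --local --fields-terminated-by=',' --ignore-lines=1 -u root -p db1 prices.csv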
EDIT
It seems pandas's built-in sql util provides a write_frame function, but it only works with sqlite.
I found something useful; you might try this.
This has worked for me. I only created the database first; I did not create any table beforehand.
from platform import python_version
print(python_version())
3.7.3
path='glass.data'
df=pd.read_csv(path)
df.head()
!conda install sqlalchemy
!conda install pymysql
pd.__version__
'0.24.2'
sqlalchemy.__version__
'1.3.20'
I restarted the kernel after installation.
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://USER:PASSWORD@HOST:PORT/DATABASE_NAME', echo=False)
try:
    df.to_sql(name='glasstable', con=engine, index=False, if_exists='replace')
    print('Successfully written to Database!!!')
except Exception as e:
    print(e)
df.to_sql(name = "owner", con= db_connection, schema = 'aws', if_exists='replace', index = >True, index_label='id')