Pandas only supports SQLAlchemy connectable

Context: I'd like to send a concatenated DataFrame (I joined several DataFrames of individual stock data) into a MySQL database; however, I can't seem to create a table and send the data there.
Problem: When I run this code df.to_sql(name='stockdata', con=con, if_exists='append', index=False) (source: Writing a Pandas Dataframe to MySQL), I keep getting this error: pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting.
I'm new to MySQL as well, so any help is very welcome! Thank you.
from __future__ import print_function
import pandas as pd
from datetime import date, datetime, timedelta
import numpy as np
import yfinance as yf
import mysql.connector
import pymysql
import pandas_datareader.data as web
from sqlalchemy import create_engine
import yahoo_fin.stock_info as si
######################################################
# PyMySQL configuration
user = '...'
passw = '...'
host = '...'
port = 3306
database = 'stockdata'
# Connect without selecting a database so the database can be created first
con = pymysql.connect(host=host,
                      port=port,
                      user=user,
                      password=passw,
                      charset='utf8')
con.cursor().execute("CREATE DATABASE IF NOT EXISTS {0} ".format(database))
con.select_db(database)
# df is the concatenated DataFrame; passing a raw PyMySQL connection here triggers the error quoted above
df.to_sql(name='stockdata', con=con, if_exists='append', index=False)

.to_sql() expects the second argument to be either a SQLAlchemy Connectable object (Engine or Connection) or a DBAPI Connection object. If it is the latter, pandas assumes that it is a SQLite connection.
You need to use SQLAlchemy to create an engine object:
engine = create_engine("mysql+pymysql://…")
and pass that to to_sql().
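A slightly fuller sketch of that fix, assuming the PyMySQL driver is installed and reusing the question's placeholder settings. The helper names mysql_url and write_frame are hypothetical. URL.create (SQLAlchemy 1.4+) builds the URL and escapes special characters in the password, which plain string concatenation does not.

```python
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import URL


def mysql_url(user: str, password: str, host: str, database: str, port: int = 3306) -> URL:
    """Build a mysql+pymysql URL; URL.create escapes characters such as '@' in the password."""
    return URL.create(
        drivername="mysql+pymysql",
        username=user,
        password=password,
        host=host,
        port=port,
        database=database,
    )


def write_frame(df: pd.DataFrame, engine, table: str = "stockdata") -> None:
    """Append df to `table`; pandas creates the table first if it does not exist yet."""
    df.to_sql(name=table, con=engine, if_exists="append", index=False)


# Usage against MySQL (needs a reachable server and PyMySQL installed):
# engine = create_engine(mysql_url(user, passw, host, database))
# write_frame(df, engine)
```

Because the engine, not a raw PyMySQL connection, is passed as con, pandas no longer falls back to its SQLite code path (the sqlite_master query in the error).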

Related

Retrieving a datetime value from a sql table to dataframe using sqlalchemy

I'm trying to retrieve a table into a dataframe but I'm getting a "ValueError: hour must be in 0..23".
Here is my code:
from sqlalchemy import create_engine
import pyodbc
import pandas as pd
SERVER = '(local)'
DATABASE = 'Projects'
DRIVER = 'SQL Server'
DATABASE_CONNECTION = f'mssql://@{SERVER}/{DATABASE}?driver={DRIVER}'
engine = create_engine(DATABASE_CONNECTION)
connection = engine.connect()
data = pd.read_sql_query('select TOP 1 * from PRODSYNTHESIS',connection)
#ValueError: hour must be in 0..23
connection.close()
engine.dispose()
My understanding is that I should probably be using Python's datetime module to handle this data type, but I'm not sure exactly how to get there. All the other solutions I've found relate to the DataFrame itself, not to the data type coming through from SQL.
I'm not sure, but you can always work around this by converting the date to an ISO 8601 string in the query itself, e.g.:
data = pd.read_sql_query('select TOP 1 Project_id, Project_name, convert(varchar(23), Date_results, 126) Date_results, P_Injecte from PRODSYNTHESIS',connection)
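If you take the string route, the column arrives as text and can be converted on the pandas side afterwards. A minimal sketch, assuming the column is named Date_results as in the query above; the helper name iso_strings_to_datetime is hypothetical.

```python
from datetime import datetime

import pandas as pd


def iso_strings_to_datetime(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with `column` converted from ISO 8601 strings to datetime64."""
    out = df.copy()
    # CONVERT(..., 126) yields 'yyyy-mm-ddThh:mi:ss.mmm' and drops '.mmm' when it is zero;
    # datetime.fromisoformat accepts both shapes, and NULLs (None) become NaT
    out[column] = pd.to_datetime(
        out[column].map(lambda s: datetime.fromisoformat(s) if isinstance(s, str) else pd.NaT)
    )
    return out


# data = pd.read_sql_query(<the query above with convert(varchar(23), Date_results, 126)>, connection)
# data = iso_strings_to_datetime(data, "Date_results")
```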

Export a Dataframe into MSSQL Server as a new Table

I have written code to connect to SQL Server with Python and save a table from a database in a df.
from pptx import Presentation
import pyodbc
import pandas as pd
cnxn = pyodbc.connect("Driver={ODBC Driver 11 for SQL Server};"
"Server=Servername;"
"Database=Test_Database;"
"Trusted_Connection=yes;")
df = pd.read_sql_query('select * from Table1', cnxn)
Now I would like to modify df in Python and save it as df2. After that, I would like to export df2 as a new table (Table2) into the database.
I can't find anything about exporting a DataFrame to SQL Server. Do you know how to do it?
You can use df.to_sql() for that. First create the SQLAlchemy connection, e.g.
from sqlalchemy import create_engine
engine = create_engine("mssql+pyodbc://scott:tiger@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0")
See this answer for more details on the connection string for MSSQL.
Then do:
df.to_sql('table_name', con=engine)
This defaults to raising an exception if the table already exists; adjust the if_exists parameter as necessary.
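To see what if_exists does, here is a small sketch of the same to_sql calls. The self-check below runs it against an in-memory SQLite engine only so it works without a server; with SQL Server you would pass the mssql+pyodbc engine instead. The function names export_table and row_count are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine, text


def export_table(df2: pd.DataFrame, engine, table: str = "Table2", if_exists: str = "fail") -> None:
    """Write df2 as a table; if_exists decides what happens when it already exists."""
    # "fail" (the default) raises ValueError, "replace" drops and recreates, "append" adds rows
    df2.to_sql(table, con=engine, index=False, if_exists=if_exists)


def row_count(engine, table: str) -> int:
    """Count the rows currently stored in `table`."""
    with engine.connect() as conn:
        return conn.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar()


# engine = create_engine("mssql+pyodbc://...")   # your SQL Server engine
# export_table(df2, engine)                       # creates Table2 the first time
```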
This is how I do it.
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc
# create timer
start_time = time.time()
df = pd.read_csv("C:\\your_path\\CSV1.csv")
conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=ServerName;'
    r'DATABASE=DatabaseName;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
# commit once, after all rows have been inserted
cnxn.commit()
cursor.close()
cnxn.close()
# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
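The loop above sends one INSERT per row. A sketch of the same insert as a single executemany call, assuming a DBAPI connection that uses ? placeholders (pyodbc does); the function name insert_dataframe is hypothetical. fast_executemany is a pyodbc cursor attribute, so it is only switched on when the cursor has it.

```python
import pandas as pd


def insert_dataframe(cnxn, df: pd.DataFrame, table: str, columns: list) -> int:
    """Insert df[columns] into `table` with one executemany call and a single commit."""
    placeholders = ", ".join("?" for _ in columns)  # pyodbc and sqlite3 both use '?' markers
    column_list = ", ".join(f"[{c}]" for c in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    # plain tuples, one per row, in the same order as `columns`
    rows = list(df[columns].itertuples(index=False, name=None))
    cursor = cnxn.cursor()
    try:
        if hasattr(cursor, "fast_executemany"):
            # pyodbc-only switch that sends the parameters to SQL Server in bulk
            cursor.fast_executemany = True
        cursor.executemany(sql, rows)
        cnxn.commit()
    finally:
        cursor.close()
    return len(rows)


# insert_dataframe(cnxn, df, "dbo.Table_1", ["Name", "Address", "Age", "Work"])
```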

(psycopg2.OperationalError) Invalid - opcode

I am trying to connect to Netezza using sqlalchemy.create_engine(). The reason I want to use SQLAlchemy is that I want to be able to read and write through pandas DataFrames.
What works is as follows:
import pandas as pd
import pyodbc
conn = pyodbc.connect('DSN=NZDWW')
df2 = pd.read_sql(Query,conn)
The above code runs fine. But to write the df DataFrame to Netezza, I need to use to_sql(), which needs SQLAlchemy. This is what my code looks like:
import os
import pandas as pd
from sqlalchemy import create_engine
username = os.getenv('REDSHIFT_USER')
password = os.getenv('REDSHIFT_PASS')
DATABASE = "SHP_TARGET"
HOST = "Netezza1"
PORT = 5480
conn_str = "postgresql://"+username+":"+password+"@"+HOST+':'+str(PORT)+'/'+DATABASE
engine3 = create_engine(conn_str)
df = pd.read_sql(Query, engine3)
When I execute this, I get the following error:
OperationalError: (psycopg2.OperationalError) Invalid - opcode
Invalid - opcodeInvalid packet length (Background on this error at: http://sqlalche.me/e/e3q8)
Any leads would be much appreciated. Thanks.
Database: Netezza
Python version: 3.6
OS: Windows
The SQLAlchemy dialect for Postgres isn't compatible with Netezza.
The error you're receiving is the psycopg2 module, which facilitates the connection, complaining that it can't make sense of what the server is "saying", basically.
There appears to be a dialect for Netezza though. You may want to try that out.
A formal dialect for Netezza has now been released.
It can be used as documented here - https://github.com/IBM/nzalchemy#prerequisites
Example
from sqlalchemy import create_engine
from urllib.parse import quote_plus
# assumes NZ_HOST, NZ_DATABASE, NZ_USER, NZ_PASSWORD are set
import os
params = quote_plus(f"DRIVER=NetezzaSQL;SERVER={os.environ['NZ_HOST']};"
                    f"DATABASE={os.environ['NZ_DATABASE']};USER={os.environ['NZ_USER']};"
                    f"PASSWORD={os.environ['NZ_PASSWORD']}")
engine = create_engine(f"netezza+pyodbc:///?odbc_connect={params}",
                       echo=True)

Python: Write Pandas Dataframe to MSSQL --> Database Error

I have a pandas dataframe that has about 20k rows and 20 columns. I want to write it to a table in MSSQL.
I have successfully established the connection:
connection = pypyodbc.connect('Driver={SQL Server};'
'Server=XXX;'
'Database=line;'
'uid=XXX;'
'pwd=XXX')
cursor = connection.cursor()
I'm trying to write my pandas dataframe to the MSSQL server with the following code:
df_EVENT5_16.to_sql('MODREPORT', connection, if_exists = 'replace')
But I get the following error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master
WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC
SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'.")
Modern pandas versions expect an SQLAlchemy engine as the connection, so use SQLAlchemy:
from sqlalchemy import create_engine
con = create_engine('mssql+pyodbc://username:password@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0')
and then:
df_EVENT5_16.to_sql('MODREPORT', con, if_exists='replace')
From the DataFrame.to_sql() docs:
con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that library.
If a DBAPI2 object, only sqlite3 is supported.
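If existing code already knows how to open a pyodbc (or other DBAPI) connection, one option is to hand a factory for it to SQLAlchemy through create_engine's creator argument, so to_sql receives an Engine while the connection logic stays the same. A sketch, assuming pyodbc is installed and conn_str is your existing ODBC string (both placeholders); engine_from_factory is a hypothetical name, and the self-check uses sqlite3 only so it runs without a server.

```python
import pandas as pd
from sqlalchemy import create_engine


def engine_from_factory(url: str, connect):
    """Wrap a zero-argument DBAPI connection factory in an SQLAlchemy Engine."""
    # The pool calls connect() whenever it needs a new DBAPI connection;
    # the URL then only selects the dialect (e.g. "mssql+pyodbc://").
    return create_engine(url, creator=connect)


# With SQL Server (needs pyodbc and a reachable server):
# import pyodbc
# engine = engine_from_factory("mssql+pyodbc://", lambda: pyodbc.connect(conn_str))
# df_EVENT5_16.to_sql("MODREPORT", engine, if_exists="replace")
```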
There's no need to open a pyodbc connection to MSSQL yourself; SQLAlchemy will do that for you (the mssql+pyodbc dialect still uses pyodbc under the hood).
We can also insert the DataFrame directly into the database with the to_sql() method, without iterating over it. Here is the code that works fine for me:
# To insert data frame into MS SQL database without iterating over the data-frame
import urllib.parse
import pandas as pd
import sqlalchemy
params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=serverName;DATABASE=dbName;UID=UserName;PWD=password")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# suppose df is the data-frame that we want to insert in database
df.to_sql(name='table_name', con=engine, index=False, if_exists='append')
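Several answers here build the odbc_connect URL by percent-encoding the whole ODBC string with quote_plus. Below is a sketch of that form next to the URL.create form described in the SQLAlchemy docs for the mssql+pyodbc dialect (URL.create needs SQLAlchemy 1.4+; both function names are hypothetical). Either result can be passed to create_engine.

```python
from urllib.parse import quote_plus

from sqlalchemy.engine import URL


def odbc_connect_url_string(odbc_str: str) -> str:
    """Hand-built form: percent-encode the whole ODBC connection string."""
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_str)


def odbc_connect_url(odbc_str: str) -> URL:
    """URL.create form: SQLAlchemy does the escaping of the query value itself."""
    return URL.create("mssql+pyodbc", query={"odbc_connect": odbc_str})


# engine = sqlalchemy.create_engine(odbc_connect_url("DRIVER={SQL Server};SERVER=serverName;..."))
```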

Get data from pandas into a SQL server with PYODBC

I am trying to understand how Python could pull data from an FTP server into pandas and then move it into SQL Server. My code here is very rudimentary, to say the least, and I am looking for any advice or help at all. I have tried loading the data from the FTP server first, which works fine. If I then remove this code and change it to a select from MS SQL Server, it is fine, so the connection string works, but the insertion into SQL Server seems to be causing problems.
import pyodbc
import pandas
from ftplib import FTP
from StringIO import StringIO
import csv
ftp = FTP ('ftp.xyz.com','user','pass' )
ftp.set_pasv(True)
r = StringIO()
ftp.retrbinary('RETR filname.csv', r.write)
r.seek(0)
df = pandas.read_csv(r)
connStr = ('DRIVER={SQL Server Native Client 10.0};SERVER=localhost;DATABASE=TESTFEED;UID=sa;PWD=pass')
conn = pyodbc.connect(connStr)
cursor = conn.cursor()
cursor.execute("INSERT INTO dbo.tblImport(Startdt, Enddt, x,y,z,)" "VALUES (x,x,x,x,x,x,x,x,x,x.x,x)")
cursor.close()
conn.commit()
conn.close()
print"Script has successfully run!"
When I remove the FTP code, this runs perfectly, but I do not understand how to make the next jump to get this into Microsoft SQL Server, or even whether it is possible without saving into a file first.
For the 'write to sql server' part, you can use the convenient to_sql method of pandas (so no need to iterate over the rows and do the insert manually). See the docs on interacting with SQL databases with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql
You will need at least pandas 0.14 to have this working, and you also need sqlalchemy installed. An example, assuming df is the DataFrame you got from read_table:
import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")
# write the DataFrame to a table in the sql database
df.to_sql("table_name", engine)
See also the documentation page of to_sql.
You can find more info on how to create the connection engine with SQLAlchemy for SQL Server with pyodbc here: http://docs.sqlalchemy.org/en/rel_1_1/dialects/mssql.html#dialect-mssql-pyodbc-connect
But if your goal is just to get the CSV data into the SQL database, you could also consider doing this directly from SQL. See e.g. Import CSV file into SQL Server.
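To answer the "without saving into a file first" part: a sketch that downloads the CSV into an in-memory buffer and hands the resulting DataFrame to to_sql. It assumes Python 3, an open ftplib.FTP session, and an SQLAlchemy engine like the one above; the function names are hypothetical, and the file and table names come from the question.

```python
import io

import pandas as pd


def ftp_csv_to_frame(ftp, remote_name: str) -> pd.DataFrame:
    """Download a CSV from an open ftplib.FTP session straight into a DataFrame."""
    buf = io.BytesIO()
    # retrbinary expects a full FTP command and passes each bytes chunk to the callback
    ftp.retrbinary(f"RETR {remote_name}", buf.write)
    buf.seek(0)
    return pd.read_csv(buf)


def load_to_sql(df: pd.DataFrame, engine, table: str) -> None:
    """Append the DataFrame to `table` without iterating over its rows."""
    df.to_sql(table, con=engine, if_exists="append", index=False)


# Usage (placeholders from the question):
# from ftplib import FTP
# import sqlalchemy
# with FTP("ftp.xyz.com", "user", "pass") as ftp:
#     ftp.set_pasv(True)
#     df = ftp_csv_to_frame(ftp, "filname.csv")
# engine = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")
# load_to_sql(df, engine, "tblImport")
```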
Python3 version using a LocalDB SQL instance:
from sqlalchemy import create_engine, text
import urllib.parse
import pyodbc
import pandas as pd
df = pd.read_csv("./data.csv")
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server Native Client 11.0};SERVER=(localDb)\ProjectsV14;DATABASE=database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
df.to_sql('TargetTable', schema='dbo', con=engine)
# Engine.execute() was removed in SQLAlchemy 2.0; run the query on a Connection instead
with engine.connect() as conn:
    result = conn.execute(text('SELECT COUNT(*) FROM [dbo].[TargetTable]'))
    print(result.fetchall())
Yes, the bcp utility seems to be the best solution for most cases.
If you want to stay within Python, the following code should work.
from sqlalchemy import create_engine
import urllib.parse
import pyodbc
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server};SERVER=YOUR\ServerName;DATABASE=YOur_Database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
df.to_sql('Table_Name', schema='dbo', con = engine, chunksize=200, method='multi', index=False, if_exists='replace')
Don't skip method='multi': it can significantly reduce the execution time.
Sometimes you may encounter the following error.
ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server
Driver][SQL Server]The incoming request has too many parameters. The
server supports a maximum of 2100 parameters. Reduce the number of
parameters and resend the request. (8003) (SQLExecDirectW)')
In such a case, determine the number of columns in your DataFrame with df.shape[1], divide the maximum supported number of parameters by this value, and use the floor of the result as the chunk size.
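That arithmetic as a small helper (the name max_chunksize is hypothetical). With to_sql's default index=True the index is written as an extra column, so it counts toward the parameters per row as well; if you still hit the limit, lower max_params.

```python
def max_chunksize(n_columns: int, max_params: int = 2100, index: bool = False) -> int:
    """Largest chunksize for method='multi' that stays within max_params bound parameters."""
    # Each row sends one parameter per column; index=True adds the index as one more column
    params_per_row = n_columns + (1 if index else 0)
    if params_per_row <= 0:
        raise ValueError("need at least one column")
    return max_params // params_per_row


# df.to_sql('Table_Name', schema='dbo', con=engine, method='multi', index=False,
#           chunksize=max_chunksize(df.shape[1]), if_exists='replace')
```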
I found that the bcp utility (https://learn.microsoft.com/en-us/sql/tools/bcp-utility) works best when you have a large dataset. I have 2.7 million rows that insert at 80K rows/sec. You can store your DataFrame as a CSV file (use tabs as the separator if your data doesn't contain tabs, and UTF-8 encoding). With bcp, I've used the "-c" (character data) option and it has worked without issues so far.
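A sketch of that workflow, assuming the bcp command-line tool is installed and the target table already exists; write_bcp_file, bcp_command, and the table, server, and database names are hypothetical. It writes a headerless, tab-separated UTF-8 file (tab is the default field terminator in -c mode) and builds the command line. -T uses Windows authentication, and -C 65001 requests UTF-8, which bcp versions before 13 do not support.

```python
from pathlib import Path

import pandas as pd


def write_bcp_file(df: pd.DataFrame, path: Path) -> None:
    """Write df as headerless, tab-separated UTF-8 text for bcp's character mode."""
    df.to_csv(path, sep="\t", index=False, header=False, encoding="utf-8")


def bcp_command(table: str, path: Path, server: str, database: str) -> list:
    """Build the argument list for importing `path` into `table` with bcp."""
    return [
        "bcp", table, "in", str(path),
        "-S", server,
        "-d", database,
        "-T",           # trusted (Windows) authentication; use -U/-P for SQL logins
        "-c",           # character mode, tab as the default field terminator
        "-C", "65001",  # UTF-8 code page
    ]


# Example run (requires the bcp tool and an existing target table):
# import subprocess
# write_bcp_file(df, Path("data.tsv"))
# subprocess.run(bcp_command("dbo.MyTable", Path("data.tsv"), "ServerName", "DatabaseName"), check=True)
```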
This worked for me on Python 3.5.2:
import sqlalchemy as sa
import urllib.parse
import pyodbc
conn= urllib.parse.quote_plus('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(conn))
frame.to_sql("myTable", engine, schema='dbo', if_exists='append', index=False, index_label='myField')
"As the Connection represents an open resource against the database, we want to always limit the scope of our use of this object to a specific context, and the best way to do that is by using Python context manager form, also known as the with statement."
https://docs.sqlalchemy.org/en/14/tutorial/dbapi_transactions.html
The example would then be:
from sqlalchemy import create_engine
import urllib.parse
import pyodbc
connection_string = (
"Driver={SQL Server Native Client 11.0};"
"Server=myserver;"
"UID=myuser;"
"PWD=mypwd;"
"Database=mydb;"
)
quoted = urllib.parse.quote_plus(connection_string)
engine = create_engine(f'mssql+pyodbc:///?odbc_connect={quoted}')
with engine.connect() as cnn:
    df.to_sql('mytable', con=cnn, if_exists='replace', index=False)
The following is what worked for me using sqlalchemy. Pay attention to the last part: ?driver=SQL+Server.
import sqlalchemy
import pyodbc
engine = sqlalchemy.create_engine('mssql+pyodbc://MyUser:MyPWD@dataserver.sandbox.myserver/MY_DB?driver=SQL+Server')
dt.to_sql("PatientResultTest", engine,if_exists='append')
With the default index=True, the SQL table also needs a column to store the DataFrame's index values; pass index=False if the table doesn't have one.
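To see which columns to_sql writes with and without the index, here is a sketch against an in-memory SQLite engine (a stand-in so it runs anywhere; the table names and sample data are hypothetical). pandas names an unnamed index column "index".

```python
import pandas as pd
from sqlalchemy import create_engine, inspect


def table_columns(engine, table: str) -> list:
    """Column names of `table` in creation order."""
    return [col["name"] for col in inspect(engine).get_columns(table)]


engine = create_engine("sqlite://")
dt = pd.DataFrame({"PatientId": [1, 2], "Result": [0.5, 0.7]})
dt.to_sql("with_index", engine)                  # index=True: adds an "index" column
dt.to_sql("without_index", engine, index=False)  # only the DataFrame's own columns
print(table_columns(engine, "with_index"))       # ['index', 'PatientId', 'Result']
print(table_columns(engine, "without_index"))    # ['PatientId', 'Result']
```

When appending to an existing table, its columns have to match whichever of these two layouts to_sql produces.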
# using class function
import urllib.parse
import pandas as pd
import pyodbc
import sqlalchemy
class data_frame_to_sql():
    def __init__(self, dataFrame, sql_table_name):
        self.dataFrame = dataFrame
        self.sql_table_name = sql_table_name
    def conversion(self):
        params = urllib.parse.quote_plus("DRIVER={SQL Server};"
                                         "SERVER=######;"
                                         "DATABASE=####;"
                                         "UID=#####;"
                                         "PWD=###;")
        try:
            engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
            return f"Table '{self.sql_table_name}' added successfully in database", self.dataFrame.to_sql(self.sql_table_name, engine)
        except Exception as e:
            e = str(e).replace(".", "")
            print(f"{e} in Database.")
data = {"BusinessEntityID": ["1", "2", "3"], "FirstName": ["raj", "abhi", "amir"], "LastName": ["kapoor", "bachn", "khhan"]}
df = pd.DataFrame(data, columns=['BusinessEntityID', 'FirstName', 'LastName'])
ab = data_frame_to_sql(df, "ab").conversion()
print(ab)
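For reading, it's not strictly necessary to use SQLAlchemy: one could create a connection with pyodbc directly and pass it to pandas, as below (recent pandas versions emit a UserWarning for non-SQLite DBAPI connections, and to_sql still needs SQLAlchemy):
with pyodbc.connect('DRIVER={ODBC Driver 18 for SQL Server};SERVER=' + server
                    + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password) as newconn:
    df = pd.read_sql(query, newconn)  # query: your SELECT statement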
