I have a Python script that loads, transforms, and calculates data. In SQL Server there is a stored procedure that requires a table-valued parameter, 2 required parameters, and 2 optional parameters. In SQL Server I can call this SP:
USE [InstName]
GO
DECLARE @return_value int
DECLARE @MergeOnColumn core.MatchColumnTable
INSERT INTO @MergeOnColumn
SELECT 'foo.ExternalInput','bar.ExternalInput'
EXEC @return_value = [core].[_TableData]
    @Target = N'[dbname].[tablename1]',
    @Source = N'[dbname].[table2]',
    @MergeOnColumn = @MergeOnColumn,
    @Opt1Param = False,
    @Opt2Param = False
SELECT 'Return Value' = @return_value
GO
After a comprehensive search I found the following post:
How to call stored procedure with SQLAlchemy that requires a user-defined-type Table parameter
It suggests using pytds and the SQLAlchemy dialect sqlalchemy-pytds to call an SP with table-valued parameters.
With this post and the documentation I created the following Python script:
import pandas as pd
import pytds
from pytds import login
import sqlalchemy as sa
from sqlalchemy import create_engine
import sqlalchemy_pytds
def connect():
    return pytds.connect(dsn='ServerName',database='DBName', auth=login.SspiAuth())
engine = sa.create_engine('mssql+pytds://[ServerName]', creator=connect)
conn = engine.raw_connection()
with conn.cursor() as cur:
    arg = ("foo.ExternalInput","bar.ExternalInput")
    tvp = pytds.TableValuedParam(type_name="MergeOnColumn", rows=(arg))
    cur.execute('EXEC test_proc %s', ("[dbname].[table2]", "[dbname].[table1]", tvp,))
    cur.fetchall()
When I run this code I get the following error message:
TypeError: not all arguments converted during string formatting
Does anyone know how to pass the multiple arguments correctly, or have a suggestion for how I could call this SP directly?
Based on the comments on my question, I've managed to get the stored procedure running with table-valued parameters (and to get the return values from the SP).
The final script is as follows:
import pandas as pd
import pytds
from pytds import login
import sqlalchemy as sa
from sqlalchemy import create_engine
import sqlalchemy_pytds
def connect():
    return pytds.connect(dsn='ServerName',database='DBName',autocommit=True, auth=login.SspiAuth())
engine = sa.create_engine('mssql+pytds://[ServerName]', creator=connect)
conn = engine.raw_connection()
with conn.cursor() as cur:
    arg = [["foo.ExternalInput","bar.ExternalInput"]]
    tvp = pytds.TableValuedParam(type_name="core.MatchColumnTable", rows=arg)
    cur.execute("EXEC test_proc @Target = N'[dbname].[tablename1]', @Source = N'[dbname].[table2]', @CleanTarget = 0, @UseColumnsFromTarget = 0, @MergeOnColumn = %s", (tvp,))
    result = cur.fetchall()
    print(result)
autocommit=True is set on the connection (so that the transaction opened through the cursor is committed). The table-valued parameter type (core.MatchColumnTable) expects 2 columns, so arg is changed to a list containing one 2-column row.
The parameters that are required besides the TVP are written directly into the EXEC string. The last parameter in the EXEC string is the name of the TVP parameter (MergeOnColumn), which is bound to the tvp value.
Optionally, you can retrieve the return status or row count as described in the pytds documentation:
https://python-tds.readthedocs.io/en/latest/index.html
Note: make sure the stored procedure includes SET NOCOUNT ON, otherwise you won't get any results back in Python.
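The original TypeError is consistent with plain Python % formatting: the SQL string in the question had one %s placeholder, but the parameter tuple had three items. The following is a minimal sketch of that mismatch, assuming the driver ends up applying the % operator to the SQL string (an assumption about pytds internals, not something taken from its documentation). The helper name placeholders_match and the "tvp-stand-in" string are hypothetical.

```python
def placeholders_match(sql: str, params: tuple) -> bool:
    """Return True if `sql % params` succeeds, i.e. the counts line up."""
    try:
        sql % params
        return True
    except TypeError:
        # Raised when the tuple has more or fewer items than there are placeholders
        return False


one_placeholder = "EXEC test_proc %s"
three_placeholders = "EXEC test_proc %s, %s, %s"
# "tvp-stand-in" is a placeholder string, not a real TableValuedParam
args = ("[dbname].[table2]", "[dbname].[table1]", "tvp-stand-in")

print(placeholders_match(one_placeholder, args))
print(placeholders_match(three_placeholders, args))
```

One placeholder per supplied value (or, as in the final script, literal values in the SQL plus a single placeholder for the TVP) keeps the two counts equal.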
pytds
Python DBAPI driver for MSSQL using pure Python TDS (Tabular Data Stream) protocol implementation
I used pytds for merge / upsert via a stored procedure targeting a SQL Server.
Example
Here is an example of the basic functions; each row of data is represented by a tuple:
import pytds
def get_connection(instance: str, database: str, user: str, password: str):
    return pytds.connect(
        dsn=instance, database=database, user=user, password=password, autocommit=True
    )
def execute_with_tvp(connection: pytds.Connection, procedure_name: str, type_name: str, rows: list):
    with connection.cursor() as cursor:
        # type_name is the name of the SQL Server table type the procedure expects
        tvp = pytds.TableValuedParam(type_name=type_name, rows=rows)
        # callproc takes a sequence of parameters, so wrap the TVP in a tuple
        cursor.callproc(procedure_name, (tvp,))
mssql+pyodbc://
pyodbc added support for table-valued parameters (TVPs) in version 4.0.25, released 2018-12-13. Simply supply the TVP value as a list of tuples:
proc_name = "so51930062"
type_name = proc_name + "Type"
# set up test environment
with engine.begin() as conn:
    conn.exec_driver_sql(f"""\
        DROP PROCEDURE IF EXISTS {proc_name}
    """)
    conn.exec_driver_sql(f"""\
        DROP TYPE IF EXISTS {type_name}
    """)
    conn.exec_driver_sql(f"""\
        CREATE TYPE {type_name} AS TABLE (
            id int,
            txt nvarchar(50)
        )
    """)
    conn.exec_driver_sql(f"""\
        CREATE PROCEDURE {proc_name}
            @prefix nvarchar(10),
            @tvp {type_name} READONLY
        AS
        BEGIN
            SET NOCOUNT ON;
            SELECT id, @prefix + txt AS new_txt FROM @tvp;
        END
    """)
# run test
with engine.begin() as conn:
    data = {"prefix": "new_", "tvp": [(1, "foo"), (2, "bar")]}
    sql = f"{{CALL {proc_name} (:prefix, :tvp)}}"
    print(conn.execute(sa.text(sql), data).fetchall())
    # [(1, 'new_foo'), (2, 'new_bar')]
Related
Why can't I raw insert a list of dicts with SQLalchemy ?
import os
import sqlalchemy
import pandas as pd
def connect_unix_socket() -> sqlalchemy.engine.Engine:
    db_user = os.environ["DB_USER"]
    db_pass = os.environ["DB_PASS"]
    db_name = os.environ["DB_NAME"]
    unix_socket_path = os.environ["INSTANCE_UNIX_SOCKET"]
    return sqlalchemy.create_engine(
        sqlalchemy.engine.url.URL.create(
            drivername="postgresql+pg8000",
            username=db_user,
            password=db_pass,
            database=db_name,
            query={"unix_sock": f"{unix_socket_path}/.s.PGSQL.5432"},
        )
    )
def _insert_ecoproduct(df: pd.DataFrame) -> None:
    db = connect_unix_socket()
    db_matching = {
        'gtin': 'ecoproduct_id',
        'ITEM_NAME_AS_IN_MARKETPLACE' : 'ecoproductname',
        'ITEM_WEIGHT_WITH_PACKAGE_KG' : 'ecoproductweight',
        'ITEM_HEIGHT_CM' : 'ecoproductlength',
        'ITEM_WIDTH_CM' : 'ecoproductwidth',
        'test_gtin' : 'gtin_test',
        'batteryembedded' : 'batteryembedded'
    }
    # Select with a list of column names and rename on a copy rather than in place
    df = df[list(db_matching.keys())]
    df = df.rename(columns=db_matching)
    data = df.to_dict(orient='records')
    sql_query = """INSERT INTO ecoproducts(
        ecoproduct_id,
        ecoproductname,
        ecoproductweight,
        ecoproductlength,
        ecoproductwidth,
        gtin_test,
        batteryembedded)
        VALUES (%(ecoproduct_id)s, %(ecoproductname)s,%(ecoproductweight)s,%(ecoproductlength)s,
        %(ecoproductwidth)s,%(gtin_test)s,%(batteryembedded)s)
        ON CONFLICT(ecoproduct_id) DO NOTHING;"""
    with db.connect() as conn:
        result = conn.exec_driver_sql(sql_query, data)
        print(f"{result.rowcount} new rows were inserted.")
I keep getting this error:
Is it possible to map parameters with the pg8000 dialect? Or maybe I should use psycopg2?
What is the problem here?
EDIT 1: see variable data details :
print(data)
print(type(data))
[{'ecoproduct_id': '6941487202157', 'ecoproductname': 'HUAWEI FreeBuds Pro Bluetooth sans Fil ', 'ecoproductweight': '4', 'ecoproductlength': '0.220', 'ecoproductwidth': '0.99', 'gtin_test': False, 'batteryembedded': 0}]
<class 'list'>
Is it possible to map [named] parameters with th (sic) dialect pg8000 ?
Yes. Using a SQLAlchemy text() object allows us to use named parameters even if the underlying DBAPI does not.
import sqlalchemy as sa
sql_query = """\
INSERT INTO ecoproducts(
ecoproduct_id,
ecoproductname,
ecoproductweight,
ecoproductlength,
ecoproductwidth,
gtin_test,
batteryembedded)
VALUES (
:ecoproduct_id,
:ecoproductname,
:ecoproductweight,
:ecoproductlength,
:ecoproductwidth,
:gtin_test,
:batteryembedded)
ON CONFLICT(ecoproduct_id) DO NOTHING;
"""
result = conn.execute(sa.text(sql_query), data)
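To try the list-of-dicts mapping without a PostgreSQL server, here is a minimal sketch that uses the standard-library sqlite3 module instead of SQLAlchemy and pg8000. This is a swapped-in driver, chosen only because it runs anywhere, and it accepts :name placeholders natively. The table is cut down to three of the columns, the second row is invented to show the conflict handling, and INSERT OR IGNORE stands in for ON CONFLICT ... DO NOTHING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ecoproducts ("
    "ecoproduct_id TEXT PRIMARY KEY, ecoproductname TEXT, batteryembedded INTEGER)"
)

data = [
    {"ecoproduct_id": "6941487202157", "ecoproductname": "HUAWEI FreeBuds Pro", "batteryembedded": 0},
    # Invented row with the same primary key as the first one, so it is skipped
    {"ecoproduct_id": "6941487202157", "ecoproductname": "duplicate", "batteryembedded": 1},
]

sql = """
INSERT OR IGNORE INTO ecoproducts (ecoproduct_id, ecoproductname, batteryembedded)
VALUES (:ecoproduct_id, :ecoproductname, :batteryembedded)
"""
with conn:  # commits when the block exits without an error
    # Each dict supplies the named placeholders for one row
    conn.executemany(sql, data)

rows = conn.execute(
    "SELECT ecoproduct_id, ecoproductname, batteryembedded FROM ecoproducts"
).fetchall()
print(rows)
```

The dict keys have to match the placeholder names exactly, which is the same contract the text() version relies on.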
I have been looking since yesterday for a way to convert the output of an SQL query into a pandas DataFrame.
For example, code that does something like this:
data = select * from table
I've tried a lot of code I found on the internet, but nothing seems to work.
Note that my database is stored in Azure Databricks and I can only access the table using its URL.
Thank you so much!
Hope this helps. The code covers both insertion and selection for reference.
import urllib.parse
import pyodbc
from sqlalchemy import create_engine
def db_insert_user_level_info(table_name, df):
    # Pass your DataFrame in as the df argument
    params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=DESKTOP-ITAJUJ2;DATABASE=githubAnalytics")
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
    table_row_count = select_row_count(table_name)
    df_row_count = df.shape[0]
    if table_row_count == df_row_count:
        print("Data Cannot Be Inserted Because The Row Count is Same")
    else:
        df.to_sql(name=table_name, con=engine, index=False, if_exists='append')
        print("********************************** DONE EXECUTED SUCCESSFULLY ***************************************************")
def select_row_count(table_name):
    cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=DESKTOP-ITAJUJ2;"
                          "Database=githubAnalytics;"
                          "Trusted_Connection=yes;")
    cur = cnxn.cursor()
    try:
        db_cmd = "SELECT count(*) FROM " + table_name
        res = cur.execute(db_cmd)
        # The query returns a single row; return its count
        for x in res:
            return x[0]
    except pyodbc.Error:
        print("Table is not Available , Please Wait...")
Using sqlalchemy to connect to the database, and the built-in method read_sql_query from pandas to go straight to a DataFrame:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(url)
connection = engine.connect()
query = "SELECT * FROM table"
df = pd.read_sql_query(query,connection)
Problem
I am trying to read a CSV file into pandas and write it to a SQLite database. The process works for all the columns in the CSV file except "Fill qty", which is a positive integer (int64). The process changes the type from TEXT/INTEGER to BLOB.
So I tried to load only the "Fill qty" column from pandas into SQLite, and surprisingly I noticed I can safely do that for all integers smaller than 10 (I don't have 9 in my dataset, so basically 1, 2, ..., 8 loaded successfully).
Here is what I tried:
I tried what I could think of: changing the "Fill_Qty" type in the schema to INTEGER, REAL, NULL or TEXT, and changing the data type in pandas from int64 to float or string before inserting into the SQLite table. None of them worked. The "Trade_History.csv" file looks fine in pandas and Excel. Is there something that my eyes don't see? I am really confused about what is happening here.
You would need the .csv file to test the code. Here is the code and .csv file: https://github.com/Meisam-Heidari/Trading_Min_code
The code:
### Imports:
import pandas as pd
import numpy as np
import sqlite3
from sqlite3 import Error
def create_database(db_file):
    conn = None
    try:
        conn = sqlite3.connect(db_file)
    finally:
        # Only close the connection if it was actually opened
        if conn is not None:
            conn.close()
def create_connection(db_file):
    """ create a database connection to the SQLite database
    specified by db_file
    :param db_file: database file
    :return: Connection object or None
    """
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)
    return None
def create_table(conn,table_name):
    try:
        c = conn.cursor()
        c.execute('''CREATE TABLE {} (Fill_Qty TEXT);'''.format(table_name))
    except Error as e:
        print('Error Code: ', e)
    finally:
        conn.commit()
        conn.close()
    return None
def add_trade(conn, table_name, trade):
    try:
        print(trade)
        sql = '''INSERT INTO {} (Fill_Qty)
        VALUES(?)'''.format(table_name)
        cur = conn.cursor()
        cur.execute(sql,trade)
    except Error as e:
        print('Error When trying to add this entry: ',trade)
    return cur.lastrowid
def write_to_db(conn,table_name,df):
    for i in range(df.shape[0]):
        trade = (str(df.loc[i,'Fill qty']))
        add_trade(conn,table_name,trade)
    conn.commit()
def update_db(table_name='My_Trades', db_file='Trading_DB.sqlite', csv_file_path='Trade_History.csv'):
    df_executions = pd.read_csv(csv_file_path)
    create_database(db_file)
    conn = create_connection(db_file)
    table_name = 'My_Trades'
    create_table(conn, table_name)
    # writing to DB
    conn = create_connection(db_file)
    write_to_db(conn,table_name,df_executions)
    # Reading back from DB
    df_executions = pd.read_sql_query("select * from {};".format(table_name), conn)
    conn.close()
    return df_executions
### Main Body:
df_executions = update_db()
Any alternatives
I am wondering if anyone has had a similar experience. Any advice or solutions to help me load the data into SQLite?
I am trying to have something light and portable, and unless there are no alternatives, I prefer not to go with Postgres or MySQL.
You're not passing a container to .execute() when inserting the data. Reference: https://www.python.org/dev/peps/pep-0249/#id15
What you need to do instead is:
trade = (df.loc[i,'Fill qty'],)
# the trailing comma is what makes `trade` a one-element tuple
The types of errors you got would've been:
ValueError: parameters are of unsupported type
Or:
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 2 supplied.
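The difference is easy to reproduce with an in-memory database. The following is a minimal sketch; the table name trades and the helper try_insert are made up for illustration. It also suggests why single-digit quantities appeared to load: a one-character string is itself a sequence of length 1, so it happens to satisfy the single ? placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (fill_qty TEXT)")


def try_insert(params) -> str:
    """Attempt the insert and report whether the binding was accepted."""
    try:
        conn.execute("INSERT INTO trades (fill_qty) VALUES (?)", params)
        return "ok"
    except sqlite3.ProgrammingError:
        # Raised when the number of supplied values does not match the placeholders
        return "ProgrammingError"


results = {
    "tuple": try_insert(("12",)),       # one-element tuple: one value for one placeholder
    "two_char_str": try_insert("12"),   # bare string: treated as two one-character values
    "one_char_str": try_insert("5"),    # bare one-character string: one value, so it binds
}
print(results)

stored = [row[0] for row in conn.execute("SELECT fill_qty FROM trades")]
print(stored)
```

Because add_trade in the question catches sqlite3.Error and only prints a message, the rejected multi-digit rows would be skipped rather than stopping the script.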
I recently transitioned from using SQLite for most of my data storage and management needs to MySQL. I think I've finally gotten the correct libraries installed to work with Python 3.6, but now I am having trouble creating a new table from a dataframe in the MySQL database.
Here are the libraries I import:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
In my code, I first create a dataframe from a CSV file (no issues here).
def csv_to_df(infile):
    return pd.read_csv(infile)
Then I establish a connection to the MySQL database using this function:
def mysql_connection():
    user = 'root'
    password = 'abc'
    host = '127.0.0.1'
    port = '3306'
    database = 'a001_db'
    engine = create_engine("mysql://{0}:{1}@{2}:{3}/{4}?charset=utf8".format(user, password, host, port, database))
    return engine
Lastly, I use the pandas function "to_sql" to create the database table in the MySQL database:
def df_to_mysql(df, db_tbl_name, conn=mysql_connection(), index=False):
    df.to_sql(con = conn, name = db_tbl_name, if_exists='replace', index = False)
I run the code using this line:
df_to_mysql(csv_to_df(r'path/to/file.csv'), 'new_database_table')
This yields the following error:
InvalidRequestError: Could not reflect: requested table(s) not available in Engine(mysql://root:***@127.0.0.1:3306/a001_db?charset=utf8): (new_database_table)
I think this is telling me that I must first create a table in the database before passing the data in the dataframe to this table, but I'm not 100% positive about that. Regardless, I'm looking for a way to create a table in a MySQL database without manually creating the table first (I have many CSVs, each with 50+ fields, that have to be uploaded as new tables in a MySQL database).
Any suggestions?
I took an approach suggested by aws_apprentice above which was to create the table first, then write data to the table.
The code below first auto-generates a mysql table from a df (auto defining table names and datatypes) then writes the df data to that table.
There were a couple of hiccups I had to overcome, such as: unnamed csv columns, determining the correct data type for each field in the mysql table.
I'm sure there are multiple other (better?) ways to do this, but this seems to work.
import pandas as pd
from sqlalchemy import create_engine
infile = r'path/to/file.csv'
db = 'a001_db'
db_tbl_name = 'a001_rd004_db004'
'''
Load a csv file into a dataframe; if csv does not have headers, use the headers arg to create a list of headers; rename unnamed columns to conform to mysql column requirements
'''
def csv_to_df(infile, headers = []):
    if len(headers) == 0:
        df = pd.read_csv(infile)
    else:
        df = pd.read_csv(infile, header = None)
        df.columns = headers
    for r in range(10):
        try:
            df.rename( columns={'Unnamed: {0}'.format(r):'Unnamed{0}'.format(r)}, inplace=True )
        except:
            pass
    return df
'''
Create a mapping of df dtypes to mysql data types (not perfect, but close enough)
'''
def dtype_mapping():
    # Keys must match str(dtype) exactly, hence the [ns] suffixes
    return {'object' : 'TEXT',
            'int64' : 'INT',
            'float64' : 'FLOAT',
            'datetime64[ns]' : 'DATETIME',
            'bool' : 'TINYINT',
            'category' : 'TEXT',
            'timedelta64[ns]' : 'TEXT'}
'''
Create a sqlalchemy engine
'''
def mysql_engine(user = 'root', password = 'abc', host = '127.0.0.1', port = '3306', database = 'a001_db'):
    engine = create_engine("mysql://{0}:{1}@{2}:{3}/{4}?charset=utf8".format(user, password, host, port, database))
    return engine
'''
Create a mysql connection from sqlalchemy engine
'''
def mysql_conn(engine):
    conn = engine.raw_connection()
    return conn
'''
Create sql input for table names and types
'''
def gen_tbl_cols_sql(df):
    dmap = dtype_mapping()
    sql = "pi_db_uid INT AUTO_INCREMENT PRIMARY KEY"
    df1 = df.rename(columns = {"" : "nocolname"})
    hdrs = df1.dtypes.index
    hdrs_list = [(hdr, str(df1[hdr].dtype)) for hdr in hdrs]
    for hl in hdrs_list:
        sql += " ,{0} {1}".format(hl[0], dmap[hl[1]])
    return sql
'''
Create a mysql table from a df
'''
def create_mysql_tbl_schema(df, conn, db, tbl_name):
    tbl_cols_sql = gen_tbl_cols_sql(df)
    sql = "USE {0}; CREATE TABLE {1} ({2})".format(db, tbl_name, tbl_cols_sql)
    cur = conn.cursor()
    cur.execute(sql)
    cur.close()
    conn.commit()
'''
Write df data to newly created mysql table
'''
def df_to_mysql(df, engine, tbl_name):
    # Append so the table created above (with its primary key) is kept;
    # if_exists='replace' would drop it and let pandas recreate it
    df.to_sql(tbl_name, engine, if_exists='append', index=False)
df = csv_to_df(infile)
create_mysql_tbl_schema(df, mysql_conn(mysql_engine()), db, db_tbl_name)
df_to_mysql(df, mysql_engine(), db_tbl_name)
This
connection = engine.connect()
df.to_sql(con=connection, name='TBL_NAME', schema='SCHEMA', index=False, if_exists='replace')
works with an Oracle DB in a specific schema without errors, but will not work if you have limited permissions. Also note that table names are case-sensitive.
I have been using Psycopg2 to read stored procedures from Postgres successfully and getting a nice tuple returned, which has been easy to deal with. For example...
def authenticate(user, password):
    conn = psycopg2.connect("dbname=MyDB host=localhost port=5433 user=postgres password=mypwd")
    cur = conn.cursor()
    retrieved_pwd = None
    retrieved_userid = None
    retrieved_user = None
    retrieved_teamname = None
    # Pass the value as a query parameter instead of interpolating it into the SQL
    cur.execute("""
        select "email", "password", "userid", "teamname"
        from "RegisteredUsers"
        where "email" = %s
    """, (user,))
    for row in cur:
        print(row)
The row that prints would give me ('user@gmail.com ', '84894531656894hashedpassword5161651165 ', 36, 'test ')
However, when I run the following code to read a row of fixtures with a Stored Procedure, I get (what looks to me like) an unholy mess.
def get_from_sql(userid):
    conn = psycopg2.connect("dbname=MyDB host=localhost port=5433 user=postgres password=pwd")
    fixture_cursor = conn.cursor()
    callproc_params = [userid]
    fixture_cursor.execute("select sppresentedfixtures(%s)", callproc_params)
    for row in fixture_cursor:
        print(row)
The resulting output:
('(5,"2015-08-28 21:00:00","2015-08-20 08:00:00","2015-08-25 17:00:00","Team ",,"Team ",,"Final ")',)
I have researched the cursor class and cannot understand why it outputs like this for a stored procedure. When executing within Postgres, the output is in a perfect Tuple. Using Psycopg2 adds onto the tuple and I don't understand why?
How do I change this so I get a tidy tuple? What am I not understanding about the request that I am making that gives me this result?
I have tried the callproc function and get an equally unhelpful output. Any thoughts on this would be great.
This is because you're SELECTing the result of the function directly. Your function returns a set of things, and each "thing" happens to be a tuple, so you're getting a list of stringified tuples back. What you want is this:
SELECT * FROM sppresentedfixtures(...)
But this doesn't work, because you'll get the error:
ERROR: a column definition list is required for functions returning "record"
The solution is to return a table instead:
CREATE OR REPLACE FUNCTION sppresentedfixtures(useridentity integer) RETURNS TABLE(
    Fixture_No int,
    Fixture_Date timestamp,
    ...
) AS
$BODY$
    select
        "Fixtures"."Fixture_No",
        "Fixtures"."Fixture_Date",
        ...
    from "Fixtures" ...
$BODY$ LANGUAGE sql