I want to export pandas dataframe to Sybase SQL databse. From here i found the code how to do it link
However, I have the following error.
pyodbc.ProgrammingError: ('42000', '[42000] [Sybase][ODBC Driver]Syntax error or access violation (0) (SQLPrepare)')
Could you help to solve it?
My code -
import pandas as pd
import pyodbc as db
from datetime import datetime
#set constants
DSN = 'DSN'
input_table = 'table'
run_timestamp = str(datetime.now())[:19]
test_date_start = '2020-09-09'
test_date_end = '2025-08-08'
input_data = pd.DataFrame({
'model':['aaa','aaa'], 'result_type':['a','test_statistic'], 'test_name':['b', 'mwb'], 'input_variable_name':['c','pd'], 'segment':['car','book'],
'customer_type':['le','le'], 'value':[60, 0.58], 'del_flag':[0,0]
})
query = 'insert into schema.table (data_input_time,test_date_start,test_date_end,model,result_type,test_name,input_variable_name,segment,customer_type,value,del_flag) values (?,?,?,?,?,?,?,?,?,?,?)'
cnxn = db.connect(DSN)
cursor = cnxn.cursor()
cursor.execute('SETUSER MYUSERNAME')
for row_count in range(0, input_data.shape[0]):
#1 method
chunk = input_data.iloc[row_count:row_count + 1, :].values.tolist()
tuple_of_tuples = tuple(tuple(x) for x in chunk)
cursor.executemany(query, tuple_of_tuples)
#2 method
params = [(i,) for i in chunk] #f'txt{i}'
cursor.executemany(query, params)
You're using a 'f' string, but forgot the brackets. It should look like this
query = f'''insert into schema.table ({data_input_time},{test_date_start},{test_date_end,model},{result_type,test_name},{input_variable_name},{segment,customer_type},{value,del_flag}) values ('?,?,?,?,?,?,?,?,?,?,?)'''
Related
How can I easily write my pandas dataframe to a MySQL database using mysql.connector?
import mysql.connector as sql
import pandas as pd
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
df.to_sql(con=db_connection, name='table_name', if_exists='replace')
Tried this but it gives me an error that:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': Not all parameters were used in the SQL statement
Does the mysql.connectornot have a df.to_sqlfunction?
These are the col names:
Col names Index(['Person_ID', 'AirTable_ID_Person', 'Person_Name', 'Gender', 'Ethnicity',
'LinkedIn_Link_to_the_Profile_of_Person', 'Jensen_Analyst',
'Data_Source', 'Created_Time', 'Last_Modified_Time', 'Last refresh',
'createdTime', 'Gender_ID', 'Ethnicity_ID', 'Jensen_Analyst_ID',
'Data_Source_ID', 'Position_ID', 'Egnyte_File', 'Comment', 'Move',
'Right_Move', 'Bio-Import-Assistant', 'Diversity'],
dtype='object')
Pandas requires an SQLAlchemy engine to write data to sql. You can take up the following two approaches, the first being writing with a connector execure and the second using the engine with a pandas.to_sql statement.
It works very similar to your pandas read function.
import pandas as pd
import mysql.connector as sql
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
df_temp = df[['Person_Name', 'Person_ID']]
query_insert = 'insert into table_name(Person_Name) values %s where Person_ID = %s'
pars = df_temp.values.tolist()
pars = list(map(tuple, pars))
cursor = db_connection.cursor()
cursor.executemany(query, pars)
cursor.commit()
cursor.close()
Or you can establish an engine for uploading.
import pandas as pd
from sqlalchemy import create_engine
import mysql.connector as sql
# engine = create_engine('mysql+pymysql://username:password#host/database')
# or in your case-
engine = create_engine('mysql+pymysql://user:pw#124685.eu-central-1.rds.amazonaws.com/db_name')
db_connection = sql.connect(host='124685.eu-central-1.rds.amazonaws.com',
database="db_name", user='user', password='pw')
query = 'SELECT * FROM table_name'
df = pd.read_sql(sql=query, con=db_connection)
df["Person_Name"] = "xx"
df.to_sql(con=engine, name='table_name', if_exists='replace')
For this method be sure to install pymysql before running with pip install pymysql and you should be good to go.
I have been looking since yesterday about the way I could convert the output of an SQL Query into a Pandas dataframe.
For example a code that does this :
data = select * from table
I've tried so many codes I've found on the internet but nothing seems to work.
Note that my database is stored in Azure DataBricks and I can only access the table using its URL.
Thank you so much !
Hope this would help you out. Both insertion & selection are in this code for reference.
def db_insert_user_level_info(table_name):
#Call Your DF Here , as an argument in the function or pass directly
df=df_parameter
params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER=DESKTOP-ITAJUJ2;DATABASE=githubAnalytics")
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
engine.connect()
table_row_count=select_row_count(table_name)
df_row_count=df.shape[0]
if table_row_count == df_row_count:
print("Data Cannot Be Inserted Because The Row Count is Same")
else:
df.to_sql(name=table_name,con=engine, index=False, if_exists='append')
print("********************************** DONE EXECTUTED SUCCESSFULLY ***************************************************")
def select_row_count(table_name):
cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
"Server=DESKTOP-ITAJUJ2;"
"Database=githubAnalytics;"
"Trusted_Connection=yes;")
cur = cnxn.cursor()
try:
db_cmd = "SELECT count(*) FROM "+table_name
res = cur.execute(db_cmd)
# Do something with your result set, for example print out all the results:
for x in res:
return x[0]
except:
print("Table is not Available , Please Wait...")
Using sqlalchemy to connect to the database, and the built-in method read_sql_query from pandas to go straight to a DataFrame:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(url)
connection = engine.connect()
query = "SELECT * FROM table"
df = pd.read_sql_query(query,connection)
I'm trying to store a mySQL query result in a pandas DataFrame using pymysql and am running into errors building the dataframe. Found a similar question here and here, but it looks like there are pymysql-specific errors being thrown:
import pandas as pd
import datetime
import pymysql
# dummy values
connection = pymysql.connect(user='username', password='password', databse='database_name', host='host')
start_date = datetime.datetime(2017,11,15)
end_date = datetime.datetime(2017,11,16)
try:
with connection.cursor() as cursor:
query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
cursor.execute(query, (start_date, end_date))
df = pd.DataFrame(data=cursor.fetchall(), index = None, columns = cursor.keys())
finally:
connection.close()
returns: AttributeError: 'Cursor' object has no attribute 'keys'
If I drop the index and columns arguments:
try:
with connection.cursor() as cursor:
query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
cursor.execute(query, (start_date, end_date))
df = pd.DataFrame(cursor.fetchall())
finally:
connection.close()
returns ValueError: DataFrame constructor not properly called!
Thanks in advance!
Use Pandas.read_sql() for this:
query = "SELECT * FROM orders WHERE date_time BETWEEN ? AND ?"
df = pd.read_sql(query, connection, params=(start_date, end_date))
Thank you for your suggestion to use pandas.read_sql(). It works with executing a stored procedure as well! I tested it in MSSQL 2017 environment.
Below is an example (I hope it helps others):
def database_query_to_df(connection, stored_proc, start_date, end_date):
# Define a query
query ="SET NOCOUNT ON; EXEC " + stored_proc + " ?, ? " + "; SET NOCOUNT OFF"
# Pass the parameters to the query, execute it, and store the results in a data frame
df = pd.read_sql(query, connection, params=(start_date, end_date))
return df
Try This:
import pandas as pd
import pymysql
mysql_connection = pymysql.connect(host='localhost', user='root', password='', db='test', charset='utf8')
sql = "SELECT * FROM `brands`"
df = pd.read_sql(sql, mysql_connection, index_col='brand_id')
print(df)
I'm trying to drop an existing table, do a query and then recreate the table using the pandas to_sql function. This query works in pgadmin, but not here. Any ideas of if this is a pandas bug or if my code is wrong?
Specific error is ValueError: Table 'a' already exists.
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user#localhost:port/dbname')
c = engine.connect()
conn = c.connection
sql = """
drop table a;
select * from some_table limit 1;
"""
df = psql.read_sql(sql, con=conn)
print df.head()
df.to_sql('a', engine)
conn.close()
Why are you doing this like that? There is a shorter way: the if_exists kwag in to_sql. Try this:
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user#localhost:port/dbname')
c = engine.connect()
conn = c.connection
sql = """
select * from some_table limit 1;
"""
df = psql.read_sql(sql, con=conn)
print df.head()
# Notice how below line is different. You forgot the schema argument
df.to_sql('a', con=conn, schema=schema_name, if_exists='replace')
conn.close()
According to docs:
replace: If table exists, drop it, recreate it, and insert data.
Ps. Additional tip:
This is better way to handle the connection:
with engine.connect() as conn, conn.begin():
sql = """select * from some_table limit 1"""
df = psql.read_sql(sql, con=conn)
print df.head()
df.to_sql('a', con=conn, schema=schema_name, if_exists='replace')
Because it ensures that your connection is always closed, even if your program exits with an error. This is important to prevent data corruption. Further, I would just use this:
import pandas as pd
...
pd.read_sql(sql, conn)
instead of the way you are doing it.
So, if I was in your place writing that code, it would look like this:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://user#localhost:port/dbname')
with engine.connect() as conn, conn.begin():
df = pd.read_sql('select * from some_table limit 1', con=conn)
print df.head()
df.to_sql('a', con=conn, schema=schema_name, if_exists='replace')
I want to extract data from a postgresql database and use that data (in a dataframe format) in a script. Here's my initial try:
from pandas import DataFrame
import psycopg2
conn = psycopg2.connect(host=host_address, database=name_of_database, user=user_name, password=user_password)
cur = conn.cursor()
cur.execute("SELECT * FROM %s;" % name_of_table)
the_data = cur.fetchall()
colnames = [desc[0] for desc in cur.description]
the_frame = DataFrame(the_data)
the_frame.columns = colnames
cur.close()
conn.close()
Note: I am aware that I should not use "string parameters interpolation (%) to pass variables to a SQL query string", but this works great for me as it is.
Would there be a more direct approach to this?
Edit: Here's what I used from the selected answer:
import pandas as pd
import sqlalchemy as sq
engine = sq.create_engine("postgresql+psycopg2://username:password#host:port/database")
the_frame = pd.read_sql_table(name_of_table, engine)
Pandas can load data from Postgres directly:
import psycopg2
import pandas.io.sql as pdsql
conn = psycopg2.connect(...)
the_frame = pdsql.read_frame("SELECT * FROM %s;" % name_of_table, conn)
If you have a recent pandas (>=0.14), you should use read_sql_query/table (read_frame is deprecated) with an sqlalchemy engine:
import pandas as pd
import sqlalchemy
import psycopg2
engine = sqlalchemy.create_engine("postgresql+psycopg2://...")
the_frame = pd.read_sql_query("SELECT * FROM %s;" % name_of_table, engine)
the_frame = pd.read_sql_table(name_of_table, engine)
Here is an alternate method:
# run sql code
result = conn.execute(sql)
# Insert to a dataframe
df = DataFrame(data=list(result), columns=result.keys())