Insert pandas dataframe into Actian PSQL database table using Python

I want to import the data from the file "save.csv" into my Actian PSQL database table "new_table", but I get this error:
ProgrammingError: ('42000', "[42000] [PSQL][ODBC Client Interface][LNA][PSQL][SQL Engine]Syntax Error: INSERT INTO 'new_table'<< ??? >> ('name','address','city') VALUES (%s,%s,%s) (0) (SQLPrepare)")
Below is my code:
import pandas as pd
import pyodbc

connection = 'Driver={Pervasive ODBC Interface};server=localhost;DBQ=DEMODATA'
db = pyodbc.connect(connection)
c = db.cursor()

# read the CSV and insert into new_table
csv = pd.read_csv(r"C:\Users\user\Desktop\save.csv")
for row in csv.iterrows():
    insert_command = """INSERT INTO new_table(name,address,city) VALUES (row['name'],row['address'],row['city'])"""
    c.execute(insert_command)
    c.commit()

Pandas has a built-in method, to_sql(), that writes a pandas dataframe to a SQL database. This might be what you are looking for: with it you don't have to insert one row at a time manually, you can insert the entire dataframe at once.
If you want to keep using your method, the issue might be that the table "new_table" hasn't been created in the database yet, in which case you first need something like this:
CREATE TABLE new_table
(
Name [nvarchar](100) NULL,
Address [nvarchar](100) NULL,
City [nvarchar](100) NULL
)
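Note that the bracketed [nvarchar] syntax above is SQL Server flavored; for Actian PSQL something closer to plain VARCHAR is needed. A minimal sketch of creating the table through the pyodbc connection from the question (the column types and lengths are assumptions):
# Assumed column types; adjust the lengths to match your data.
c.execute("""
    CREATE TABLE new_table
    (
        name    VARCHAR(100),
        address VARCHAR(100),
        city    VARCHAR(100)
    )
""")
db.commit()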
EDIT:
You can use to_sql() like this on tables that already exist in the database:
df.to_sql(
    "new_table",
    schema="name_of_the_schema",
    con=engine,          # an SQLAlchemy engine or connection, not a raw cursor
    if_exists="append",  # <--- this will append to an already existing table
    chunksize=10000,
    index=False,
)
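For this to work, con must be an SQLAlchemy connectable (or a sqlite3 connection), not a pyodbc cursor. A minimal sketch, assuming an SQLAlchemy dialect and URL exist for your database (the URL below is a placeholder, not a real Actian PSQL connection string):
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical URL; substitute the dialect, driver and credentials
# that match your database.
engine = create_engine("dialect+driver://user:password@localhost/DEMODATA")

df = pd.read_csv(r"C:\Users\user\Desktop\save.csv")
df.to_sql("new_table", con=engine, if_exists="append", index=False)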

I have tried the same; in my case the table is already created, and I just want to insert each row from the pandas dataframe into the database using Actian PSQL.
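In that case, a minimal sketch of a parameterized row-by-row insert with pyodbc (table and column names taken from the question) would be:
import pandas as pd
import pyodbc

connection = 'Driver={Pervasive ODBC Interface};server=localhost;DBQ=DEMODATA'
db = pyodbc.connect(connection)
c = db.cursor()

df = pd.read_csv(r"C:\Users\user\Desktop\save.csv")

# Bind the values as parameters instead of embedding them in the SQL string;
# pyodbc uses ? placeholders.
insert_command = "INSERT INTO new_table (name, address, city) VALUES (?, ?, ?)"
for _, row in df.iterrows():
    c.execute(insert_command, (row['name'], row['address'], row['city']))
db.commit()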

Related

Pandas to_sql avoid duplicate rows

I am using pandas' to_sql method to insert data into a MySQL table. The MySQL table already exists and I'd like to avoid inserting duplicate rows.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
Is there a way to do this in Python?
# mysql connection
import pandas as pd
import pymysql
from sqlalchemy import create_engine

user = 'user1'
pwd = 'xxxx'
host = 'aa1.us-west-1.rds.amazonaws.com'
port = 3306
database = 'main'

engine = create_engine("mysql+pymysql://{}:{}@{}:{}/{}".format(user, pwd, host, port, database))
con = engine.connect()
df.to_sql(name="dfx", con=con, if_exists='append')
con.close()
Are there any workarounds if there isn't a straightforward way to do this?
It sounds like you want to do an "upsert" (insert or update). Pangres is a useful package that lets you upsert a pandas dataframe. If you don't want to update a row when it already exists, that is also an option: set if_row_exists to 'ignore'.
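A minimal sketch with pangres (assuming a recent pangres version; the dataframe's index must be the primary key of the target table, and 'id' here is a hypothetical key column):
from pangres import upsert
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user1:xxxx@aa1.us-west-1.rds.amazonaws.com:3306/main")

# 'id' is a hypothetical primary-key column; pangres uses the dataframe's
# index to decide whether a row already exists.
upsert(con=engine, df=df.set_index('id'), table_name='dfx', if_row_exists='ignore')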
I have never heard of 'upsert' before today, but it sounds interesting. You could certainly delete dupes after the data is loaded into your table.
WITH a AS
(
    SELECT Firstname,
           ROW_NUMBER() OVER (PARTITION BY Firstname, empID ORDER BY Firstname) AS duplicateRecCount
    FROM dbo.tblEmployee
)
-- Now delete duplicate records
DELETE FROM a
WHERE duplicateRecCount > 1
That will work fine, unless you have billions of rows.

Python dataframe value insertion to database table column

Is it possible to insert Python dataframe values into a database table column?
I am using Snowflake as my database.
CommuteTime is the table which contains the StudentID column, and "add_col" is the Python dataframe. I need to insert the dataframe values into the StudentID column.
Below is the code I tried for inserting the dataframe values into the table column:
c_col = pd.read_sql_query('insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")', engine)
When I execute the above, it doesn't accept the dataframe and throws the error below.
ProgrammingError: (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 1 at position 68
invalid identifier '"add_col"' [SQL: 'insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")']
(Background on this error at: http://sqlalche.me/e/f405)
Please provide suggestions to fix this.
You cannot do this with pd.read_sql_query.
First, you need to create a Snowflake cursor, e.g.:
import snowflake.connector

cursor = snowflake.connector.connect(
    user='username',
    password='password',
    account='account_identifier',  # the connector also requires your account identifier
    database='database_name',
    schema='PUBLIC',
    warehouse='warehouse_name'
).cursor()
Once you have a cursor, you can query like this: cursor.execute('SELECT * FROM "CommuteTime"')
To insert data into tables, you need to use Snowflake's INSERT INTO statement; see the sketch below.
Please provide more info about your dataframe to help you further.
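For example, a minimal sketch that binds the values from the dataframe's first column as parameters:
# The Snowflake connector uses %s-style (pyformat) placeholders by default,
# and autocommits unless you turn autocommit off.
rows = [(v,) for v in add_col.iloc[:, 0]]
cursor.executemany(
    'INSERT INTO "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES (%s)',
    rows
)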
I was only able to do that using SQLAlchemy, not the Snowflake Python Connector.
from sqlalchemy import create_engine

# Establish the connection to the Snowflake database
sf = 'snowflake://{}:{}@{}{}'.format(user, password, account, table_location)
engine = create_engine(sf)

# Write your dataframe to a table in the database
add_col.to_sql(table_name, con=engine, if_exists='replace', index=False)
See the Snowflake SQLAlchemy documentation to learn how to establish a connection by passing username, password, account, and table location.
The pandas documentation for to_sql() describes the arguments you can pass as if_exists.

Writing Python Dataframe to MSSQL Table

I currently have a pandas dataframe with 23 columns and 20,000 rows.
Using Python, I want to write my dataframe to an MSSQL server that I have the credentials for.
As a test I am able to successfully write some values into the table using the code below:
import pypyodbc

connection = pypyodbc.connect('Driver={SQL Server};'
                              'Server=XXX;'
                              'Database=XXX;'
                              'uid=XXX;'
                              'pwd=XXX')
cursor = connection.cursor()

cursor.execute("INSERT INTO MODREPORT(rowid, location) VALUES (?,?)", (5, 'test'))
connection.commit()
But how do I write all the rows in my dataframe to the MSSQL server? In order to do so, I need to code up the following steps in my Python environment:
Delete all the rows in the MSSQL server table
Write my dataframe to the server
When you say Python data frame, I'm assuming you're using a pandas dataframe. If that's the case, you could use the to_sql function.
df.to_sql("MODREPORT", connection, if_exists="replace")
The if_exists argument set to replace will delete all the rows in the existing table before writing the records.
I realise it's been a while since you asked, but the easiest way to delete ALL the rows in the SQL Server table (point 1 of the question) would be to send the command
TRUNCATE TABLE Tablename
This will drop all the data in the table but leave the table and indexes empty so you or the DBA would not need to recreate it. It also uses less of the transaction log when it runs.
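Putting both steps together, a minimal sketch (the SQLAlchemy URL is a placeholder for the same server and credentials, since to_sql expects an SQLAlchemy connectable):
from sqlalchemy import create_engine, text

# Hypothetical URL; fill in the real server, database and credentials.
engine = create_engine("mssql+pyodbc://XXX:XXX@XXX/XXX?driver=SQL+Server")

# Step 1: empty the table but keep its structure and indexes.
with engine.begin() as conn:
    conn.execute(text("TRUNCATE TABLE MODREPORT"))

# Step 2: append the dataframe to the now-empty table.
df_EVENT5_15.to_sql("MODREPORT", engine, if_exists="append", index=False)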

Inserting through pymssql but no rows appear in the database

I'm quite new to Python and I'm trying to write a script that puts values into a SQL database.
It's a simple two-column table that looks like this:
CREATE TABLE [dbo].[pythonInsertTest](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [value] [varchar](50) NULL
)
I tried doing a SELECT query from Python and that worked, so the connection or anything of that sort is not the problem. But when I insert into the table with the following code:
import pymssql

conn = pymssql.connect(server='XXXX', user='XXXX', password='XXXX', database='XXXX')
cursor = conn.cursor()
cursor.execute("insert into pythonInsertTest(value) OUTPUT INSERTED.ID VALUES('test')")
row = cursor.fetchone()
while row:
    print("Inserted Product ID : " + str(row[0]))
    row = cursor.fetchone()
the response is:
Inserted Product ID : 20
Exit status: 0
However, if I look in my SQL manager and select all the rows in said table, the row I just added is not there. But when I manually insert a row through an SQL query in the manager, it is added.
One thing to note: the table did skip the ID that was "inserted" through my Python script.
Has anyone seen this before or knows what to do?
Ha, I've banged my head on this a few times as well. As far as I can tell, you are missing a commit statement: add a conn.commit() after the insert, and hopefully you will be golden.
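A minimal sketch of the fixed version (the same code as in the question, plus the commit):
import pymssql

conn = pymssql.connect(server='XXXX', user='XXXX', password='XXXX', database='XXXX')
cursor = conn.cursor()
cursor.execute("insert into pythonInsertTest(value) OUTPUT INSERTED.ID VALUES('test')")
row = cursor.fetchone()
while row:
    print("Inserted Product ID : " + str(row[0]))
    row = cursor.fetchone()

# Without this, the insert is rolled back when the connection closes.
conn.commit()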

insert ignore pandas dataframe into mysql

I want to "insert ignore" an entire pandas dataframe into mysql. Is there a way to do this without looping over the rows?
In dataframe.to_sql I only see the option if_exists 'append' but will this still continue on duplicate unique keys?
Consider using a temp table (with the exact structure of the final table) that is always replaced by pandas, then run the INSERT IGNORE in a cursor call:
dataframe.to_sql('myTempTable', con, if_exists='replace', index=False)  # index=False keeps the columns aligned with the final table

cur = con.cursor()
cur.execute("INSERT IGNORE INTO myFinalTable SELECT * FROM myTempTable")
con.commit()
There is no way to do this in pandas as of the current version (0.20.3).
The option if_exists applies only to the table (not to the rows), as stated in the documentation:
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
Via looping
This will slow down the process as you are inserting one row at a time:
from sqlalchemy.exc import IntegrityError

for x in range(data_frame.shape[0]):
    try:
        data_frame.iloc[x:x+1].to_sql(con=sql_engine, name="table_name", if_exists='append')
    except IntegrityError:
        # Your code to handle duplicates
        pass
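In newer pandas versions (0.24+), to_sql also accepts a custom method callable, which lets you emit INSERT IGNORE without looping in Python; a hedged sketch for MySQL (table and engine names as above):
from sqlalchemy import text

def insert_ignore(table, conn, keys, data_iter):
    # Called by to_sql with chunks of rows; build an INSERT IGNORE statement
    # with named placeholders for each column.
    columns = ", ".join(keys)
    placeholders = ", ".join(":{}".format(k) for k in keys)
    sql = text("INSERT IGNORE INTO {} ({}) VALUES ({})".format(table.name, columns, placeholders))
    conn.execute(sql, [dict(zip(keys, row)) for row in data_iter])

data_frame.to_sql("table_name", sql_engine, if_exists="append",
                  index=False, method=insert_ignore)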
