Is it possible to insert Python dataframe values into a database table column?
I am using snowflake as my database.
CommuteTime is the table which contains the StudentID column; "add_col" is the Python dataframe whose values I need to insert into the StudentID column.
Below is the code I tried:
c_col = pd.read_sql_query('insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")', engine)
When I execute the above, it does not accept the dataframe and throws the error below:
ProgrammingError: (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 1 at position 68
invalid identifier '"add_col"' [SQL: 'insert into "SIS_WIDE"."PUBLIC"."CommuteTime" ("StudentID") VALUES ("add_col")']
(Background on this error at: http://sqlalche.me/e/f405)
Please provide suggestions to fix this.
You cannot do this with pd.read_sql_query — it is meant for reading query results into a dataframe, not for inserting.
First, you need to create Snowflake cursor.
e.g.
import snowflake.connector

cursor = snowflake.connector.connect(
    user='username',
    password='password',
    account='account_identifier',  # required by the connector
    database='database_name',
    schema='PUBLIC',
    warehouse='warehouse_name'
).cursor()
Once you have a cursor, you can query like this: cursor.execute('SELECT * FROM "CommuteTime"') — note the single quotes around the SQL string so the double-quoted identifier survives.
To insert data into tables, you need to use Snowflake's INSERT INTO statement.
Please provide more info about your dataframe, to help you further.
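As a sketch of that INSERT INTO approach, a hypothetical helper that inserts one parameter tuple per dataframe value (Snowflake's Python connector uses %s-style placeholders by default, so the placeholder and table name are left as parameters here):

```python
def insert_student_ids(cursor, values,
                       table='"SIS_WIDE"."PUBLIC"."CommuteTime"',
                       placeholder="%s"):
    """Insert each value as one row of the StudentID column."""
    rows = [(v,) for v in values]
    cursor.executemany(
        'INSERT INTO {} ("StudentID") VALUES ({})'.format(table, placeholder),
        rows,
    )
    return len(rows)

# e.g. with the Snowflake cursor above:
# insert_student_ids(cursor, add_col["StudentID"])
```

Passing the values as parameters avoids the invalid-identifier error, which came from putting "add_col" (a Python name) directly inside the SQL text.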
I was only able to do that using SQL Alchemy, not Snowflake Python Connector.
from sqlalchemy import create_engine
# Establish the connection to the Snowflake database
sf = 'snowflake://{}:{}@{}{}'.format(user, password, account, table_location)
engine = create_engine(sf)
# Write your data frame to a table in database
add_col.to_sql(table_name, con=engine, if_exists='replace', index=False)
See here to learn how to establish a connection to Snowflake by passing username, password, account, and table location.
Explore here to learn about the arguments you can pass as if_exists to the function to_sql().
Related
I am using pandas' to_sql method to insert data into a MySQL table. The MySQL table already exists and I'd like to avoid inserting duplicate rows.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
Is there a way to do this in python?
# mysql connection
import pandas as pd
import pymysql
from sqlalchemy import create_engine
user = 'user1'
pwd = 'xxxx'
host = 'aa1.us-west-1.rds.amazonaws.com'
port = 3306
database = 'main'
engine = create_engine("mysql+pymysql://{}:{}@{}/{}".format(user, pwd, host, database))
con = engine.connect()
df.to_sql(name="dfx", con=con, if_exists = 'append')
con.close()
Are there any work-arounds, if there isn't a straight forward way to do this?
It sounds like you want to do an "upsert" (insert or update). Pangres is a useful package that lets you upsert a pandas dataframe into a SQL table. If you don't want to update a row when it already exists, that is also an option: set if_row_exists to 'ignore'.
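To illustrate what an upsert does, a minimal sketch using SQLite's INSERT ... ON CONFLICT syntax purely for demonstration (MySQL spells the same idea INSERT ... ON DUPLICATE KEY UPDATE; table and column names here are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dfx (id INTEGER PRIMARY KEY, val TEXT)")
con.execute("INSERT INTO dfx VALUES (1, 'old')")

# Upsert: insert each row, or update it if the key already exists
rows = [(1, "new"), (2, "fresh")]
con.executemany(
    "INSERT INTO dfx (id, val) VALUES (?, ?) "
    "ON CONFLICT (id) DO UPDATE SET val = excluded.val",
    rows,
)
print(con.execute("SELECT id, val FROM dfx ORDER BY id").fetchall())
# [(1, 'new'), (2, 'fresh')]
```

Row 1 was updated rather than duplicated, and row 2 was inserted — exactly the behaviour to_sql alone cannot give you.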
I have never heard of 'upsert' before today, but it sounds interesting. You could certainly delete dupes after the data is loaded into your table.
WITH a AS
(
    SELECT Firstname,
           ROW_NUMBER() OVER (PARTITION BY Firstname, empID ORDER BY Firstname) AS duplicateRecCount
    FROM dbo.tblEmployee
)
-- Now delete duplicate records
DELETE FROM a
WHERE duplicateRecCount > 1
That will work fine, unless you have billions of rows.
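If the data starts out in pandas, the same deduplication can also be done before loading, which avoids the post-load DELETE entirely (a sketch; the column names mirror the SQL above):

```python
import pandas as pd

df = pd.DataFrame({
    "Firstname": ["Ann", "Ann", "Bob"],
    "empID": [1, 1, 2],
})

# Keep the first row of each (Firstname, empID) pair — the rows the
# ROW_NUMBER() query above would keep (duplicateRecCount == 1)
deduped = df.drop_duplicates(subset=["Firstname", "empID"], keep="first")
print(len(deduped))  # 2
```

deduped can then be passed to to_sql as usual.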
I'm new to Python and Pandas - please be gentle!
I'm using SqlAlchemy with pymssql to execute a SQL query against a SQL Server database and then convert the result set into a dataframe. I'm then attempting to write this dataframe as a Parquet file:
engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)
df.to_parquet(outputFile)
The data I'm retrieving in the SQL query includes a uniqueidentifier column (i.e. a UUID) named rowguid. Because of this, I'm getting the following error on the last line above:
pyarrow.lib.ArrowInvalid: ("Could not convert UUID('92c4279f-1207-48a3-8448-4636514eb7e2') with type UUID: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column rowguid with type object')
Is there any way I can force all UUIDs to strings at any point in the above chain of events?
A few extra notes:
The goal for this portion of code was to receive the SQL query text as a parameter and act as a generic SQL-to-Parquet function.
I realise I can do something like df['rowguid'] = df['rowguid'].astype(str), but it relies on me knowing which columns have uniqueidentifier types. By the time it's a dataframe, everything is an object and each query will be different.
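One way to generalise that astype(str) idea without knowing the column names up front is to scan the object-dtyped columns for UUID values after the query runs (a sketch, not specific to SQL Server; uuids_to_str is a hypothetical helper name):

```python
import uuid
import pandas as pd

def uuids_to_str(df):
    """Convert any UUID values found in object columns to plain strings."""
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].map(lambda v: str(v) if isinstance(v, uuid.UUID) else v)
    return df

df = pd.DataFrame({"rowguid": [uuid.uuid4()], "name": ["x"]})
df = uuids_to_str(df)
print(type(df["rowguid"].iloc[0]))  # <class 'str'>
```

Because it inspects values rather than column names, it works for any query the generic SQL-to-Parquet function receives.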
I also know I can convert it to a char(36) in the SQL query itself, however, I was hoping to do something more "automatic" so the person writing the query doesn't trip over this problem accidentally all the time / doesn't have to remember to always convert the datatype.
Any ideas?
Try DuckDB
import duckdb
import pandas as pd
import sqlalchemy as sal

engine = sal.create_engine(connectionString)
conn = engine.connect()
df = pd.read_sql(query, con=conn)

# Close the database connection
conn.close()

# Create a DuckDB connection; DuckDB can query pandas dataframes
# that are in scope by name
duck_conn = duckdb.connect(':memory:')

# Write the dataframe content to a snappy-compressed parquet file
duck_conn.execute("COPY (SELECT * FROM df) TO 'df-snappy.parquet' (FORMAT 'parquet')")
Ref:
https://duckdb.org/docs/guides/python/sql_on_pandas
https://duckdb.org/docs/sql/data_types/overview
https://duckdb.org/docs/data/parquet
I want to import the data of the file "save.csv" into my Actian PSQL database table "new_table", but I get this error:
ProgrammingError: ('42000', "[42000] [PSQL][ODBC Client Interface][LNA][PSQL][SQL Engine]Syntax Error: INSERT INTO 'new_table'<< ??? >> ('name','address','city') VALUES (%s,%s,%s) (0) (SQLPrepare)")
Below is my code:
import pandas as pd
import pyodbc

connection = 'Driver={Pervasive ODBC Interface};server=localhost;DBQ=DEMODATA'
db = pyodbc.connect(connection)
c = db.cursor()

# insert into table i.e. new_table
csv = pd.read_csv(r"C:\Users\user\Desktop\save.csv")
for row in csv.iterrows():
    insert_command = """INSERT INTO new_table(name,address,city) VALUES (row['name'],row['address'],row['city'])"""
    c.execute(insert_command)
c.commit()
Pandas has a built-in function, to_sql(), that writes a pandas dataframe into a SQL database. This might be what you are looking for. Using it, you don't have to insert one row at a time; you can insert the entire dataframe at once.
If you want to keep using your method, the issue might be that the table "new_table" hasn't been created yet in the database. And thus you first need something like this:
CREATE TABLE new_table
(
Name [nvarchar](100) NULL,
Address [nvarchar](100) NULL,
City [nvarchar](100) NULL
)
EDIT:
You can use to_sql() like this on tables that already exist in the database:
df.to_sql(
    "new_table",
    schema="name_of_the_schema",
    con=engine,  # an SQLAlchemy engine or connection
    if_exists="append",  # <--- This will append to an already existing table
    chunksize=10000,
    index=False,
)
I have tried the same; in my case the table is already created. I just want to insert each row from the pandas dataframe into the database using Actian PSQL.
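For row-by-row inserts, the values need to be passed as query parameters rather than written inside the SQL string literal (which is what the original loop did). A sketch using pyodbc-style ? placeholders (insert_rows is a hypothetical helper; the table and columns come from the question):

```python
def insert_rows(cursor, df, table="new_table", cols=("name", "address", "city")):
    """Parameterised insert of the given dataframe columns, one tuple per row."""
    placeholders = ", ".join("?" for _ in cols)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(cols), placeholders)
    cursor.executemany(sql, list(df[list(cols)].itertuples(index=False, name=None)))

# e.g. insert_rows(c, csv); then db.commit()
```

executemany sends every row in one call, so there is no string formatting of values and no per-row execute loop.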
I am trying to write a table to an Oracle database using Python's pandas.
Here is my code:
import cx_Oracle
import pandas as pd

df = pd.read_csv('C:/Users/admin/Desktop/customer.csv')
conn = cx_Oracle.connect('SYSTEM/Mouni123$@localhost/orcl')
df.to_sql('cust', conn, if_exists='replace')
conn.close()
df
I get the following error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ORA-01036: illegal variable name/number
What am I doing wrong?
The error indicates that your code is actually trying to export to a SQLite database which is expected to fail if, in fact, the target is an Oracle database.
If I understand the documentation for dataframe.to_sql() correctly, it assumes an SQLite database as the target by default. So, in order to use Oracle as a database target, you'll have to make that explicit using SQLAlchemy as described in the documentation.
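A sketch of making the target explicit via SQLAlchemy (the host, port, and service name are placeholders; cx_Oracle must be installed for the dialect to load, and a password containing characters like @ would need URL-encoding):

```python
# from sqlalchemy import create_engine

def oracle_url(user, password, host, service_name, port=1521):
    """Build a SQLAlchemy connection URL for the cx_Oracle dialect."""
    return "oracle+cx_oracle://{}:{}@{}:{}/?service_name={}".format(
        user, password, host, port, service_name)

url = oracle_url("SYSTEM", "Mouni123$", "localhost", "orcl")
# engine = create_engine(url)
# df.to_sql('cust', engine, if_exists='replace', index=False)
```

With an engine rather than a raw cx_Oracle connection, pandas emits Oracle SQL instead of probing sqlite_master, which is what produced the ORA-01036 error.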
I currently have a Python dataframe that is 23 columns and 20,000 rows.
Using Python code, I want to write my data frame into a MSSQL server that I have the credentials for.
As a test I am able to successfully write some values into the table using the code below:
connection = pypyodbc.connect('Driver={SQL Server};'
                              'Server=XXX;'
                              'Database=XXX;'
                              'uid=XXX;'
                              'pwd=XXX')
cursor = connection.cursor()
cursor.execute("INSERT INTO MODREPORT (rowid, LOCATION) VALUES (?, ?)", (5, 'test'))
connection.commit()
But how do I write all the rows in my data frame table to the MSSQL server? In order to do so, I need to code up the following steps in my Python environment:
1. Delete all the rows in the MSSQL server table
2. Write my dataframe to the server
When you say Python data frame, I'm assuming you're using a Pandas dataframe. If that's the case, you can use the to_sql function. Note that to_sql expects an SQLAlchemy connectable (or a sqlite3 connection), so build an engine for your SQL Server rather than passing the raw pypyodbc connection:
df_EVENT5_15.to_sql("MODREPORT", engine, if_exists="replace")
With if_exists set to "replace", pandas drops and recreates the table before writing the records, which covers the "delete all rows" step as well.
I realise it's been a while since you asked but the easiest way to delete ALL the rows in the SQL server table (point 1 of the question) would be to send the command
TRUNCATE TABLE Tablename
This will drop all the data in the table but leave the table and indexes empty so you or the DBA would not need to recreate it. It also uses less of the transaction log when it runs.
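The two steps from the question (empty the table, then write the dataframe) can be sketched with plain DBAPI calls; TRUNCATE TABLE is the SQL Server way to clear it, so the clearing statement is left as a parameter here (refresh_table is a hypothetical helper; table and column names come from the question):

```python
def refresh_table(connection, df, table="MODREPORT",
                  clear_sql="TRUNCATE TABLE MODREPORT"):
    """Empty the table, then bulk-insert every dataframe row."""
    cur = connection.cursor()
    cur.execute(clear_sql)  # step 1: remove all existing rows
    cols = list(df.columns)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(cols), ", ".join("?" for _ in cols))
    # step 2: one parameter tuple per dataframe row
    cur.executemany(sql, list(df.itertuples(index=False, name=None)))
    connection.commit()
```

executemany avoids the per-row execute loop from the question and keeps everything in a single transaction.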