Cannot insert Pandas DataFrame into PostgreSQL with Python

I am trying to use a pandas dataframe to insert data into SQL. I am using pandas because there are some columns that I need to drop before I insert the data into the SQL table.
The database is in the cloud, but that isn't the issue.
I've been able to create static strings and insert them into the database, and it works fine.
The database is a Postgres db, accessed via the pg8000 driver.
In this example, I am pulling out one column and one value and trying to insert them into the database.
connection = db_connection.connect()
for i, rowx in data.iterrows():
    with connection as db_conn:
        name_column = ['name']
        name_value = [data.iloc[0]["name"]]
        cols = "`,`".join([str(i) for i in name_column])
        sql = "INSERT INTO person ('" + cols + "') VALUES ( " + " %s," * (len(name_value) - 1) + "%s" + " )"
        db_conn.execute(sql, tuple(name_value))
The error I get is usually something related to the formatting of the cols.
Error: 'syntax error at or near "\'name\'"
variable cols:
(Pdb) cols
'name'
I guess it's upset that 'name' is a string, but that seems odd.
variable sql:
"INSERT INTO persons ('name') VALUES ( %s )"
I'm not a fan of the string concatenation; I got this approach from a guide:
https://www.dataquest.io/blog/sql-insert-tutorial/
Just looking for a reliable way to script this insert from pandas to pg.

IIUC, I think you can use the sqlalchemy package with to_sql() to export a pandas dataframe to the database table directly.
Please consider the code structure below:
import sqlalchemy as sa
from sqlalchemy import create_engine
import psycopg2

user = "username"
password = "passwordgohere"
host = "host.or.ip"
port = 5432
dbname = "your_db_name"

db_string = sa.engine.url.URL.create(
    drivername="postgresql+psycopg2",
    username=user,
    password=password,
    host=host,
    port=port,
    database=dbname,
)
db_engine = create_engine(db_string)
or you may use pg8000, as in your question:
import sqlalchemy as sa
from sqlalchemy import create_engine
import pg8000

user = "username"
password = "passwordgohere"
host = "host.or.ip"
port = 5432
dbname = "your_db_name"

db_string = sa.engine.url.URL.create(
    drivername="postgresql+pg8000",
    username=user,
    password=password,
    host=host,
    port=port,
    database=dbname,
)
db_engine = create_engine(db_string)
And then you can export to the table like this (df is your pandas dataframe):
df.to_sql('your_table_name', con=db_engine, if_exists='replace', index=False)
or, if you would like to append, use if_exists='append':
df.to_sql('your_table_name', con=db_engine, if_exists='append', index=False)
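As a side note on the original error: Postgres reads single-quoted tokens as string literals, so 'name' in the column list is a string, not a column identifier; identifiers are double-quoted (or left unquoted). Here is a minimal sketch of the fixed insert loop, assuming the person table, the data frame, and the db_connection object from the question:
```
# double-quote identifiers for Postgres; values still go through %s
# placeholders so the driver handles the escaping
name_column = ['name']
cols = ','.join('"%s"' % c for c in name_column)
placeholders = ','.join(['%s'] * len(name_column))
sql = 'INSERT INTO person (%s) VALUES (%s)' % (cols, placeholders)

with db_connection.connect() as db_conn:
    for _, row in data.iterrows():
        db_conn.execute(sql, (row['name'],))
```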

Related

Can't create a postgresql table using python

I am trying to create tables out of JSON files containing the field names and types of each table of a database downloaded from BigQuery. The SQL request seemed fine to me, but no table was created according to the psql command-line interpreter when typing \d.
So to begin, I tried a simpler SQL request, but it doesn't work either.
Here is the code:
import pandas as pd
import psycopg2

# information used to create a database connection
sqluser = 'postgres'
dbname = 'testdb'
pwd = 'postgres'

# Connect to postgres database
con = psycopg2.connect(dbname=dbname, user=sqluser, password=pwd)
curs = con.cursor()

q = """set search_path to public,public ;
CREATE TABLE tab1(
    i INTEGER
);
"""
curs.execute(q)

q = """
SELECT table_name
FROM information_schema.tables
WHERE table_schema='public'
AND table_type='BASE TABLE';
"""
df = pd.read_sql_query(q, con)
print(df.head())
print("End of test")
The code above displays the new table tab1, but the table doesn't actually appear when typing \d within the psql command-line interpreter. If I type in the psql interpreter:
SELECT table_name
FROM information_schema.tables
WHERE table_type='BASE TABLE';
it doesn't get listed either; it seems the table is not actually created. Thanks in advance for your help.
There was a commit() call missing; it must come after the table-creation SQL request.
This code works:
import pandas as pd
import psycopg2

# information used to create a database connection
sqluser = 'postgres'
dbname = 'testdb'
pwd = 'postgres'

# Connect to postgres database
con = psycopg2.connect(dbname=dbname, user=sqluser, password=pwd)
curs = con.cursor()

q = """set search_path to public,public ;
CREATE TABLE tab1(
    i INTEGER
);
"""
curs.execute(q)
con.commit()

q = """
SELECT table_name
FROM information_schema.tables
WHERE table_schema='public'
AND table_type='BASE TABLE';
"""
df = pd.read_sql_query(q, con)
print(df.head())
print("End of test")

How do you select values from a SQL column in Python

I have a column called REQUIREDCOLUMNS in a SQL database which contains the columns which I need to select in my Python script below.
Excerpt of Current Code:
db = mongo_client.get_database(asqldb_row.SCHEMA_NAME)
coll = db.get_collection(asqldb_row.TABLE_NAME)
table = list(coll.find())
root = json_normalize(table)
The REQUIREDCOLUMNS column in SQL contains the values reportId, siteId, price, location.
So instead of explicitly typing:
print(root[["reportId","siteId","price","location"]])
Is there a way to do print(root[REQUIREDCOLUMNS])?
Note: (I'm already connected to the SQL database in my python script)
You will have to use cursors if you are using mysql or pymysql; the syntax is almost the same for both. Below I show it for mysql.
import mysql.connector

db = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd=" ",
    database=" "
)
cursor = db.cursor()

sql = "SELECT REQUIREDCOLUMNS FROM table_name"
cursor.execute(sql)
# fetchall() returns a list of row tuples,
# e.g. [('reportId, siteId, price, location',)]
rows = cursor.fetchall()
cols_as_string = rows[0][0]  # 'reportId, siteId, price, location'

new_sql = 'SELECT ' + cols_as_string + ' FROM table_name'
cursor.execute(new_sql)
result = cursor.fetchall()
This should work; I intentionally split the logic across several lines for readability.
The syntax could be slightly different for pymysql.
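To close the loop on the original question (print(root[REQUIREDCOLUMNS])), here is a minimal sketch, assuming cols_as_string holds the comma-separated string fetched above:
```
# split the comma-separated string into a clean list of column names
cols_list = [c.strip() for c in cols_as_string.split(',')]
# ['reportId', 'siteId', 'price', 'location']

# equivalent to root[["reportId", "siteId", "price", "location"]]
print(root[cols_list])
```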

Export a Dataframe into MSSQL Server as a new Table

I have written a Code to connect to a SQL Server with Python and save a Table from a database in a df.
from pptx import Presentation
import pyodbc
import pandas as pd
cnxn = pyodbc.connect("Driver={ODBC Driver 11 for SQL Server};"
                      "Server=Servername;"
                      "Database=Test_Database;"
                      "Trusted_Connection=yes;")
df = pd.read_sql_query('select * from Table1', cnxn)
Now I would like to modify df in Python and save it as df2. After that I would like to export df2 as a new Table (Table2) into the Database.
I can't find anything about exporting a dataframe to SQL Server. Do you guys know how to do it?
You can use df.to_sql() for that. First create the SQLAlchemy connection, e.g.
from sqlalchemy import create_engine
engine = create_engine("mssql+pyodbc://scott:tiger@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0")
See this answer for more details on the connection string for MSSQL.
Then do:
df.to_sql('table_name', con=engine)
This defaults to raising an exception if the table already exists, adjust the if_exists parameter as necessary.
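For the concrete case in the question, a minimal sketch, assuming df2 and the engine created above:
```
# writes df2 as a new table named Table2; the default if_exists='fail'
# raises an error if Table2 already exists
df2.to_sql('Table2', con=engine, index=False)
```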
This is how I do it.
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc

# create timer
start_time = time.time()

df = pd.read_csv("C:\\your_path\\CSV1.csv")

conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=ServerName;'
    r'DATABASE=DatabaseName;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()

for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()

# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))

How to execute pure SQL query in Python

I am trying to just create a temporary table in my SQL database, where I then want to insert data (from a Pandas DataFrame), and via this temporary table insert the data into a 'permanent' table within the database.
So far I have something like
""" Database specific... """
import sqlalchemy
from sqlalchemy.sql import text
dsn = 'dsn-sql-acc'
database = "MY_DATABASE"
connection_str = """
Driver={SQL Server Native Client 11.0};
Server=%s;
Database=%s;
Trusted_Connection=yes;
""" % (dsn,database)
connection_str_url = urllib.quote_plus(connection_str)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % connection_str_url, encoding='utf8', echo=True)
# Open connection
db_connection = engine.connect()
sql_create_table = text("""
IF OBJECT_ID('[MY_DATABASE].[SCHEMA_1].[TEMP_TABLE]', 'U') IS NOT NULL
DROP TABLE [MY_DATABASE].[SCHEMA_1].[TEMP_TABLE];
CREATE TABLE [MY_DATABASE].[SCHEMA_1].[TEMP_TABLE] (
[Date] Date,
[TYPE_ID] nvarchar(50),
[VALUE] nvarchar(50)
);
""")
db_connection.execute("commit")
db_connection.execute(sql_create_table)
db_connection.close()
The "raw" SQL-snippet within sql_create_table works fine when executed in SQL Server, but when running the above in Python, nothing happens in my database...
What seems to be the issue here?
Later on I would of course want to execute
BULK INSERT [MY_DATABASE].[SCHEMA_1].[TEMP_TABLE]
FROM '//temp_files/temp_file_data.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR='\n');
in Python as well...
Thanks
These statements are out of order:
db_connection.execute("commit")
db_connection.execute(sql_create_table)
Commit after creating your table and your table will persist.
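In other words, a minimal sketch of the corrected tail of the script:
```
db_connection.execute(sql_create_table)
db_connection.execute("commit")  # commit after the CREATE so the table persists
db_connection.close()
```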

Python pandas to_sql 'append'

I am trying to send monthly data to a MySQL database using Python's pandas to_sql command. My program runs one month of data at a time and I want to append the new data onto the existing database. However, Python gives me an error:
_mysql_exceptions.OperationalError: (1050, "Table 'cps_basic_tabulation' already exists")
Here is my code for connecting and exporting:
conn = MySQLdb.connect(host=config.get('db', 'host'),
                       user=config.get('db', 'user'),
                       passwd=config.get('db', 'password'),
                       db='cps_raw')

combined.to_sql(name="cps_raw.cps_basic_tabulation",
                con=conn,
                flavor='mysql',
                if_exists='append')
I have also tried using:
from sqlalchemy import create_engine
Replacing conn = MySQLdb.connect... with:
engine = create_engine("mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>")
conn = engine.connect().connection
Any ideas on why I cannot append to a database?
Thanks!
Starting from pandas 0.14, you have to provide the sqlalchemy engine directly, and not the connection object:
engine = create_engine("mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>")
combined.to_sql("cps_raw.cps_basic_tabulation", engine, if_exists='append')
Since I had the same error message and stumbled across this post, I'll leave this here for others to find.
I found two ways to avoid the duplicate table creation, although I lack the insight as to why they solve it:
Either pass the database name in the URL when creating the connection,
or pass the database name as a schema in pd.to_sql.
Doing both does not hurt. Also, a few years later it is (again?) possible to pass a plain connection to pandas. My guess would be that in the previous answer by joris, the first of my two cases might have implicitly solved the problem.
```
# create connection to MySQL DB via sqlalchemy & pymysql
user = credentials['user']
password = credentials['password']
port = credentials['port']
host = credentials['hostname']
dialect = 'mysql'
driver = 'pymysql'
db_name = 'test_db'

# setup SQLAlchemy
from sqlalchemy import create_engine
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/'
engine = create_engine(cnx)

# create database
with engine.begin() as con:
    con.execute(f"CREATE DATABASE {db_name}")

############################################################
# either pass the db_name vvvv - HERE - vvvv after creating a database
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/{db_name}'
############################################################
engine = create_engine(cnx)

table = 'test_table'
col = 'test_col'
with engine.begin() as con:
    # this would work here instead of creating a new engine with a new link
    # con.execute(f"USE {db_name}")
    con.execute(f"CREATE TABLE {table} ({col} CHAR(1));")

# insert into database
import pandas as pd
df = pd.DataFrame({col: ['a', 'b', 'c']})
with engine.begin() as con:
    # this has no effect here
    # con.execute(f"USE {db_name}")
    df.to_sql(
        name=table,
        if_exists='append',
        # passing con = cnx here would equally work
        con=con,
        ############################################################
        # or pass it as a schema vvvv - HERE - vvvv
        # schema=db_name,
        ############################################################
        index=False,
    )
```
Tested with python version 3.8.13, sqlalchemy 1.4.32 and pandas 1.4.2.
