Value error inserting into Postgres table with psycopg2 - python

I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1, col2, ...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s', ...) -- one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES('%s', ...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
This is so I can connect to the Postgres DB and insert values from a df.
Running it, I get this error:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason the columns can't be matched to the values properly.
What can I do to fix it?

You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Since you also have dynamic column names, you should use psycopg2.sql to compose the statement and then pass the query parameters to psycopg2 in the standard way instead of using format. As a bonus, sql.Identifier double-quotes each name, which preserves mixed-case column names; Postgres folds unquoted identifiers to lowercase, which is likely why it complains that "mainfreight_freight_mothervesseldepartdatetime" does not exist.
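A minimal sketch of that approach, assuming conn is your open connection and using the table name from the question (execute_batch needs the composed statement rendered with as_string):

import psycopg2.extras
from psycopg2 import sql

if len(df) > 0:
    # Compose the statement; sql.Identifier double-quotes each name,
    # which preserves mixed-case column names.
    insert_stmt = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
        sql.Identifier("mrr", "shipments"),  # schema-qualified table
        sql.SQL(", ").join(map(sql.Identifier, df.columns)),
        sql.SQL(", ").join(sql.Placeholder() * len(df.columns)),
    )
    with conn.cursor() as cur:
        psycopg2.extras.execute_batch(cur, insert_stmt.as_string(cur), df.values.tolist())
    conn.commit()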

Related

Insert data from pandas into sql db - keys doesn't fit columns

I have a database with around 10 columns. Sometimes I need to insert a row that has only 3 of the required columns; the rest are not in the dict.
The data to be inserted is a dictionary named row (this insert is written to avoid duplicates):
row = {'keyword': 'abc', 'name': 'bds'.....}
df = pd.DataFrame([row])  # df looks good, I see columns and 1 row.
engine = getEngine()
connection = engine.connect()
df.to_sql('temp_insert_data_index', connection, if_exists='replace', index=False)
result = connection.execute(('''
    INSERT INTO {t} SELECT * FROM temp_insert_data_index
    ON CONFLICT DO NOTHING''').format(t=table_name))
connection.close()
Problem: when I don't have all columns in the row dict, the dict fields are inserted by position (a 3-key dict goes into the first 3 columns) and not into the matching columns. (I expect the dict keys to line up with the DB columns.)
Why?
Consider explicitly naming the columns to be inserted in the INSERT INTO and SELECT clauses, which is best practice for SQL append queries. That way the dynamic query works for all columns or any subset of them. Below, an f-string (available in Python 3.6+) handles all interpolation into the larger SQL query:
# APPEND TO STAGING TEMP TABLE
df.to_sql('temp_insert_data_index', connection, if_exists='replace', index=False)

# STRING OF COMMA-SEPARATED COLUMNS
cols = ", ".join(df.columns)

sql = (
    f"INSERT INTO {table_name} ({cols}) "
    f"SELECT {cols} FROM temp_insert_data_index "
    "ON CONFLICT DO NOTHING"
)
result = connection.execute(sql)
connection.close()

use string as columns definition for DataFrame(cursor.fetchall(),columns

I would like to use a string as column names for a pandas DataFrame.
The problem that arises is that pandas interprets the string variable as a single column instead of multiple ones, hence the error:
ValueError: 1 columns passed, passed data had 11 columns
The first part of my code is intended to get the column names from the MySQL database I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable colsTable:
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN (" + emplazamientos + ")")
tabla = pd.DataFrame(cursor.fetchall(), columns=[colsTable])
#tabla = exec("pd.DataFrame(cursor.fetchall(),columns=["+colsTable+"])")
#tabla = pd.DataFrame(cursor.fetchall())
I have tried other approaches like the use of exec(). In that case there is no error, but there is no information in the response either, and the result of print(tabla) is None.
Is there any direct way of passing the columns dynamically as a string to a pandas DataFrame?
Thanks in advance
I am going to answer my own question, since I've already found the way.
The first part of my code gets the column names from the MySQL database table I am about to query:
cursor1.execute("SELECT GROUP_CONCAT(COLUMN_NAME) AS cols FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'or_red' AND TABLE_NAME = 'nomen_prefix'")
for colsTableMysql in cursor1.fetchall():
    colsTable = colsTableMysql[0]
    colsTable = "'" + colsTable.replace(",", "','") + "'"
The second part uses the created variable "colsTable" as input in the statement to define the columns.
cursor = connection.cursor()
cursor.execute("SELECT * FROM or_red.nomen_prefix WHERE C_emp IN (" + emplazamientos + ")")
tabla = eval("pd.DataFrame(cursor.fetchall(),columns=[" + colsTable + "])")
Using eval, the string is parsed and evaluated as a Python expression.
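As an aside, the same result can be obtained without eval by splitting the comma-separated string into a real list, which avoids executing text as code. A sketch, reusing the raw GROUP_CONCAT value from the first query (before the quote-wrapping step):

# Split the raw comma-separated names into a list of column names
cols = colsTableMysql[0].split(",")
tabla = pd.DataFrame(cursor.fetchall(), columns=cols)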

Python saving data into PostgreSQL: array value error

I am trying to learn how to save a dataframe created in pandas into a postgresql db (hosted on Azure). I planned to start with simple dummy data:
data = {'a': ['x', 'y'],
        'b': ['z', 'p'],
        'c': [3, 5]}
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
I found a function that pushes df data into a psql table. It starts by defining the connection:
def connect(params_dic):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1)
    print("Connection successful")
    return conn

conn = connect(param_dic)
*param_dic contains all connection details (user/pass/host/db)
Once the connection is established, I define the execute function:
def execute_many(conn, df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = [tuple(x) for x in df.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
    cursor = conn.cursor()
    try:
        cursor.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("execute_many() done")
    cursor.close()
I executed this function against a psql table that I created in the DB:
execute_many(conn,df,"raw_data.test")
The table raw_data.test consists of columns a(char[]), b(char[]), c(numeric).
When I run the code, I get the following in the console:
Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
^
DETAIL: Array value must start with "{" or dimension information.
I don't know how to interpret this, because none of the columns in df are arrays:
df.dtypes
Out[185]:
a object
b object
c int64
dtype: object
Any ideas what goes wrong there, or suggestions on how to save a df into pSQL in a simpler manner? I found quite a lot of solutions that use sqlalchemy with a connection string built like this:
conn_string = 'postgres://user:password@host/database'
But I am not sure whether that works on a cloud DB; if I edit such a connection string with the Azure host details, it does not work.
The usual data type for strings in PostgreSQL is TEXT or VARCHAR(n) or CHAR(n), with round brackets; not CHAR[] with square brackets.
I'm guessing that you want the column to contain a string and that CHAR[] was a typo; in that case, you'll need to recreate (or migrate) the table column to the correct type - most likely TEXT.
(You might use CHAR(n) for fixed-length data, if it's genuinely fixed-length; VARCHAR(n) is mostly of historical interest. In most cases, use TEXT.)
Alternately, if you do mean to make the column an array, you'll need to pass a list in that position from Python.
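For the first option, a minimal one-off sketch, assuming the table holds only this test data and can simply be dropped and recreated (conn is the open connection from the question):

with conn.cursor() as cur:
    # Recreate the table with TEXT columns instead of CHAR[] arrays
    cur.execute("DROP TABLE IF EXISTS raw_data.test")
    cur.execute("CREATE TABLE raw_data.test (a text, b text, c numeric)")
conn.commit()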
Consider adjusting your parameterization approach, as psycopg2 supports a better way to format identifiers such as table or column names in SQL statements.
In fact, the docs indicate your current approach is not optimal and poses a security risk:
# This works, but it is not optimal
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
Instead, use the psycopg2.sql module:
from psycopg2 import sql
...
query = (
    sql.SQL("insert into {} values (%s, %s, %s)")
       .format(sql.Identifier('table'))
)
...
cur.executemany(query, tuples)
Also, as a best practice in SQL, always include column names in append queries and do not rely on the column order of the stored table:
query = (
    sql.SQL("insert into {0} ({1}, {2}, {3}) values (%s, %s, %s)")
       .format(
           sql.Identifier('table'),
           sql.Identifier('col1'),
           sql.Identifier('col2'),
           sql.Identifier('col3')
       )
)
Finally, discontinue using % for string formatting across all your Python code (not just psycopg2). As of Python 3 it has been de-emphasized, though not yet deprecated. Use str.format (Python 2.6+) or f-strings (Python 3.6+) instead.
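Applied to the question's table, a sketch of the composed insert might look like this (column names taken from the dummy df; conn is the connection from earlier, and executemany accepts the composed query directly):

from psycopg2 import sql

# Quoted identifiers plus one placeholder per column
query = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
    sql.Identifier("raw_data", "test"),  # schema-qualified table
    sql.SQL(", ").join(map(sql.Identifier, df.columns)),
    sql.SQL(", ").join(sql.Placeholder() * len(df.columns)),
)

with conn.cursor() as cur:
    cur.executemany(query, [tuple(x) for x in df.to_numpy()])
conn.commit()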

Syntax error when inserting into 2003 MDB file with Pyodbc

I am trying to write 2003 mdb files from scratch. I already have a file with the tables and column names (112 columns). In my attempt, I read lines from a pandas DataFrame (named sections in my code) and append those lines to the mdb file. But when using the pyodbc INSERT INTO syntax, it gives me this error:
ProgrammingError: ('42000', "[42000] [Microsoft][Driver ODBC Microsoft Access] Expression syntax error 'Equatorial-TB-BG-CA_IRI-1.0_SNP-1.0_ACA-0_ESAL-1000'. (-3100) (SQLExecDirectW)")
Here is my code:
for k in range(len(sections)):
    cols = tuple(list(sections.columns))
    vals = tuple(list(sections.iloc[k]))
    action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.format(columns=str(cols).replace("'", ""), values=str(vals).replace("'", ""))
    cursor.execute(action)
conn.commit()
Does anyone know why I am having this kind of problem?
Actually, this is not an Access-specific error but a general SQL error: your string literals are not properly enclosed in quotes. The Access engine therefore assumes they are named fields, further confused by the hyphens, which it reads as subtraction expressions.
To demonstrate the issue, see below, filling in for your unknown values. Notice the string items passed in VALUES are not quoted:
sections_columns = ['database', 'tool']
cols = tuple(list(sections_columns))
sections_vals = ['ms-access', 'pandas']
vals = tuple(list(sections_vals))
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
          format(columns=str(cols).replace("'", ""), values=str(vals).replace("'", ""))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES (ms-access, pandas)
Now, you could leave in the single quotes you replace in str(vals):
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
          format(columns=str(cols).replace("'", ""), values=str(vals))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES ('ms-access', 'pandas')
But even better, consider parameterizing the query with qmark placeholders and passing the values as parameters (the second argument of cursor.execute(query, params)). This avoids any need to quote or unquote string or numeric values:
# MOVED OUTSIDE LOOP AS UNCHANGING OBJECTS
cols = tuple(sections.columns)                # REMOVED UNNEEDED list()
qmarks = tuple(['?' for i in cols])           # NEW OBJECT
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
          format(columns=str(cols).replace("'", ""), qmarks=str(qmarks).replace("'", ""))
# NOTE: QUOTES STRIPPED FROM qmarks TOO, SO THE ? STAY BARE PLACEHOLDERS
# INSERT INTO SECTIONS (col1, col2, col3, ...) VALUES (?, ?, ?, ...)

for k in range(len(sections)):
    vals = list(sections.iloc[k])             # REMOVED tuple()
    cursor.execute(action, vals)              # EXECUTE PARAMETERIZED QUERY
conn.commit()
Better still, avoid the loop entirely with executemany over DataFrame.values.tolist(), using a prepared statement:
# PREPARED STATEMENT
cols = tuple(sections.columns)
qmarks = tuple(['?' for i in cols])
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
          format(columns=str(cols).replace("'", ""), qmarks=str(qmarks).replace("'", ""))

# EXECUTE PARAMETERIZED QUERY
cursor.executemany(action, sections.values.tolist())
conn.commit()

Using instance variables in SQLite3 update?

OK, so basically I'm trying to update an existing SQLite3 database with instance variables (typ and lvl):
# Set variables
typ = 'Test'
lvl = 6

# Print database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)

# Update info
sql = """
UPDATE fieldmap
SET buildtype = typ, buildlevel = lvl
WHERE rowid = 11
"""
cursor.execute(sql)

# Print database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)
As an error I'm getting:
sqlite3.OperationalError: no such column: typ
Now I basically know the problem is that my variable is inserted with the wrong syntax, but I cannot for the life of me find the correct one. It works just fine with literal strings and ints, like this:
sql = """
UPDATE fieldmap
SET buildtype = 'house', buildlevel = 3
WHERE rowid = 11
"""
But as soon as I switch to the variables it throws the error.
Your query is not actually inserting the values of the variables typ and lvl into the query string. As written, the query references columns named typ and lvl, which don't exist in the table.
Try writing it as a parameterised query:
sql = """
UPDATE fieldmap
SET buildtype = ?, buildlevel = ?
WHERE rowid = 11
"""
cursor.execute(sql, (typ, lvl))
The ? acts as a placeholder in the query string which is replaced by the values in the tuple passed to execute(). This is a secure way to construct the query and avoids SQL injection vulnerabilities.
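sqlite3 also supports named placeholders, which can be easier to read when there are many columns; a small variant of the same update:

sql = """
UPDATE fieldmap
SET buildtype = :typ, buildlevel = :lvl
WHERE rowid = 11
"""
cursor.execute(sql, {"typ": typ, "lvl": lvl})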
Hey, I think you should use an ORM to manipulate the SQL database.
SQLAlchemy is your friend. I use it with SQLite, MySQL and PostgreSQL. It is fantastic.
It can get you away from this syntax error, since SQL is picky about commas and quotation marks.
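For illustration, a minimal SQLAlchemy Core sketch of the same update; the database URL and table reflection here are assumptions, not from the question:

from sqlalchemy import create_engine, MetaData, Table, update, text

engine = create_engine("sqlite:///mydb.sqlite3")  # hypothetical file name
metadata = MetaData()
fieldmap = Table("fieldmap", metadata, autoload_with=engine)  # reflect the existing table

with engine.begin() as conn:  # transaction commits on exit
    conn.execute(
        update(fieldmap)
        .where(text("rowid = 11"))  # SQLite's implicit rowid is not reflected as a column
        .values(buildtype=typ, buildlevel=lvl)
    )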
For hard coding, you may try this:
sql = """
UPDATE fieldmap
SET buildtype = '%s', buildlevel = 3
WHERE rowid = 11
""" % (typ,)
This can solve your problem temporarily, but not in the long run. An ORM is your friend.
Hope this helps!
