Syntax error when inserting into 2003 MDB file with Pyodbc - python

I am trying to write 2003 mdb files from scratch. I already have a file with the tables and column names (I have 112 columns). In my attempt I read lines from a pandas DataFrame (named sections in my code) and append those lines to the mdb file. But, when using the pyodbc INSERT INTO syntax it gave me this error:
ProgrammingError: ('42000', "[42000] [Microsoft][Driver ODBC Microsoft Access] Expression syntax error 'Equatorial-TB-BG-CA_IRI-1.0_SNP-1.0_ACA-0_ESAL-1000'. (-3100) (SQLExecDirectW)")
here is my code:
for k in range(len(sections)):
    cols = tuple(list(sections.columns))
    vals = tuple(list(sections.iloc[k]))
    action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
    cursor.execute(action)
    conn.commit()
Does anyone know why I am having this kind of problem?

Actually, this is not an Access-specific error but a general SQL error: your string literals are not enclosed in quotes. The Access engine therefore treats them as field names, and the hyphens make matters worse because the engine reads them as subtraction operators.
To demonstrate the issue, the example below substitutes sample values for your unknown data. Notice that the string items passed in VALUES are not quoted:
sections_columns = ['database', 'tool']
cols = tuple(list(sections_columns))
sections_vals = ['ms-access', 'pandas']
vals = tuple(list(sections_vals))
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals).replace("'",""))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES (ms-access, pandas)
Now, you could keep the single quotes that you currently strip from str(vals):
action = 'INSERT INTO SECTIONS {columns} VALUES {values}'.\
format(columns = str(cols).replace("'",""), values = str(vals))
print(action)
# INSERT INTO SECTIONS (database, tool) VALUES ('ms-access', 'pandas')
But even better, consider parameterizing the query with qmark placeholders and passing the values as parameters (second argument of cursor.execute(query, params)). This avoids any need to quote or unquote string or numeric values:
# MOVED OUTSIDE LOOP AS UNCHANGING OBJECTS
cols = tuple(sections.columns) # REMOVED UNNEEDED list()
qmarks = tuple(['?' for i in cols]) # NEW OBJECT
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
format(columns = str(cols).replace("'",""), qmarks = str(qmarks))
# INSERT INTO SECTIONS (col1, col2, col3, ...) VALUES (?, ?, ?...)
for k in range(len(sections)):
    vals = list(sections.iloc[k]) # REMOVED tuple()
    cursor.execute(action, vals) # EXECUTE PARAMETERIZED QUERY
conn.commit()
Better still, avoid the explicit loop by passing DataFrame.values.tolist() to executemany with the same prepared statement:
# PREPARED STATEMENT
cols = tuple(sections.columns)
qmarks = tuple(['?' for i in cols])
action = 'INSERT INTO SECTIONS {columns} VALUES {qmarks}'.\
format(columns = str(cols).replace("'",""), qmarks = str(qmarks))
# EXECUTE PARAMETERIZED QUERY
cursor.executemany(action, sections.values.tolist())
conn.commit()
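The parameterized pattern can be tried without an Access driver. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in, on the assumption that it behaves like pyodbc here because both use qmark (?) placeholders. The helper build_insert, the two-column table, and the sample rows are hypothetical; the bracket quoting of column names is accepted by both Access and SQLite.

```python
import sqlite3


def build_insert(table, columns):
    """Build 'INSERT INTO table ([c1], [c2]) VALUES (?, ?)' for the given columns."""
    col_list = ", ".join(f"[{c}]" for c in columns)
    qmarks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({col_list}) VALUES ({qmarks})"


# Hypothetical two-column stand-in for the real 112-column SECTIONS table.
columns = ["section_id", "traffic"]
rows = [
    ["Equatorial-TB-BG-CA_IRI-1.0_SNP-1.0_ACA-0_ESAL-1000", 1000],
    ["it's-quoted", 2000],  # hyphens and apostrophes need no manual escaping
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SECTIONS ([section_id] TEXT, [traffic] INTEGER)")

action = build_insert("SECTIONS", columns)
conn.executemany(action, rows)  # values travel separately from the SQL text
conn.commit()

stored = conn.execute(
    "SELECT section_id, traffic FROM SECTIONS ORDER BY rowid"
).fetchall()
print(action)
print(stored)
```

Because the values are bound rather than pasted into the statement, the hyphenated section name is stored as plain text and is never parsed as an expression.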

Related

use python variable to read specific rows from access table using sqlalchemy

I have an access table called "Cell_list" with a key column called "Cell_#". I want to read the table into a dataframe, but only the rows that match indices which are specified in a python list "cell_numbers".
I tried several variations on:
import pyodbc
import pandas as pd
cell_numbers = [1,3,7]
cnn_str = r'Driver={Microsoft Access Driver (*.mdb,*.accdb)};DBQ=C:\folder\myfile.accdb;'
conn = pyodbc.connect(cnn_str)
query = ('SELECT * FROM Cell_list WHERE Cell_# in '+tuple(cell_numbers))
df = pd.read_sql(query, conn)
But no matter what I try I get a syntax error.
How do I do this?
Consider the best practice of parameterization, which pandas.read_sql supports:
# PREPARED STATEMENT, NO DATA
query = (
    'SELECT * FROM Cell_list '
    'WHERE [Cell_#] IN (?, ?, ?)'
)
# RUN SQL WITH BOUND PARAMS
df = pd.read_sql(query, conn, params=cell_numbers)
You can also build the qmark placeholders dynamically from the length of cell_numbers:
# ONE ? PER VALUE, E.G. '?, ?, ?'
qmarks = ', '.join('?' for _ in cell_numbers)
query = (
    'SELECT * FROM Cell_list '
    f'WHERE [Cell_#] IN ({qmarks})'
)
df = pd.read_sql(query, conn, params=cell_numbers)
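As a rough, self-contained illustration of the dynamic placeholder idea, the sketch below runs the same kind of IN query against an in-memory SQLite table instead of the Access file. The function select_cells, the label column, and the sample data are assumptions made for the example; only the table name Cell_list and the key column Cell_# come from the question.

```python
import sqlite3


def select_cells(conn, cell_numbers):
    """Return the rows of Cell_list whose Cell_# is in cell_numbers."""
    if not cell_numbers:
        return []  # an empty IN () list is rejected by many engines, so skip the query
    qmarks = ", ".join("?" for _ in cell_numbers)  # e.g. '?, ?, ?'
    query = (
        "SELECT * FROM Cell_list "
        f"WHERE [Cell_#] IN ({qmarks}) "
        "ORDER BY [Cell_#]"
    )
    return conn.execute(query, list(cell_numbers)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cell_list ([Cell_#] INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO Cell_list VALUES (?, ?)",
    [(n, f"cell {n}") for n in range(1, 9)],
)

rows = select_cells(conn, [1, 3, 7])
print(rows)
```

The number of placeholders always matches the number of values, so the same function works for a list of any length.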
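Alternatively, convert (join) cell_numbers to text:
cell_text = '(' + ','.join(str(n) for n in cell_numbers) + ')' # '(1,3,7)'
and concatenate it onto the query string. This is only safe when the values are known to be numbers; for anything else, prefer parameters.
The finished SQL should read (you may need brackets around the weird field name Cell_#):
SELECT * FROM Cell_list WHERE [Cell_#] IN (1,3,7)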

Insert data from pandas into sql db - keys doesn't fit columns

I have a database table with around 10 columns. Sometimes I need to insert a row that has only 3 of those columns; the rest are not in the dict.
The data to be inserted is a dictionary named row:
(this insert is to avoid duplicates)
row = {'keyword':'abc','name':'bds'.....}
df = pd.DataFrame([row]) # df looks good, I see columns and 1 row.
engine = getEngine()
connection = engine.connect()
df.to_sql('temp_insert_data_index', connection, if_exists ='replace',index=False)
result = connection.execute(('''
INSERT INTO {t} SELECT * FROM temp_insert_data_index
ON CONFLICT DO NOTHING''').format(t=table_name))
connection.close()
Problem: when the row (dict) does not contain all the columns, the dict fields are inserted by position (a 3-key dict goes into the first 3 columns) rather than into the matching columns. I expected the dict keys to be matched to the DB column names.
Why?
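Consider explicitly naming the columns in both the INSERT INTO and SELECT clauses, which is best practice for SQL append queries. Without a column list, INSERT INTO ... SELECT * maps values to the target columns by position, not by name. With explicit names, the dynamic query works for all columns or any subset of them. The code below uses an f-string (available in Python 3.6+) to interpolate the column list into the larger SQL query:
# WRITE TO STAGING TEMP TABLE (REPLACED ON EACH RUN)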
df.to_sql('temp_insert_data_index', connection, if_exists='replace', index=False)
# STRING OF COMMA SEPARATED COLUMNS
cols = ", ".join(df.columns)
sql = (
f"INSERT INTO {table_name} ({cols}) "
f"SELECT {cols} FROM temp_insert_data_index "
"ON CONFLICT DO NOTHING"
)
result = connection.execute(sql)
connection.close()
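To see the effect of naming the columns, here is a small sketch that uses an in-memory SQLite database in place of the real engine. The table items, its columns, the staging table, and the helper build_append_sql are all hypothetical. One SQLite-specific assumption is worth flagging: SQLite needs a WHERE clause (here WHERE true) before ON CONFLICT in an INSERT ... SELECT to avoid a parsing ambiguity; the clause is harmless elsewhere.

```python
import sqlite3


def build_append_sql(target, staging, columns):
    """Build an INSERT ... SELECT that names every column explicitly."""
    cols = ", ".join(f'"{c}"' for c in columns)
    return (
        f'INSERT INTO "{target}" ({cols}) '
        f'SELECT {cols} FROM "{staging}" WHERE true '
        "ON CONFLICT DO NOTHING"
    )


conn = sqlite3.connect(":memory:")
# Hypothetical target table with more columns than the incoming row has.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE, "
    "name TEXT, score REAL)"
)
conn.execute("INSERT INTO items (keyword, name, score) VALUES ('abc', 'original', 1.5)")

# Staging table with a subset of the columns, in a different order.
conn.execute("CREATE TABLE staging (name TEXT, keyword TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("bds", "abc"), ("new name", "xyz")],
)

sql = build_append_sql("items", "staging", ["name", "keyword"])
conn.execute(sql)
result = conn.execute("SELECT keyword, name, score FROM items ORDER BY id").fetchall()
print(result)
```

The duplicate keyword 'abc' is skipped, the new row lands in the keyword and name columns regardless of order, and the unnamed score column is left NULL.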

Value error inserting into Postgres table with psycopg2

I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s',...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    #create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table,columns,values)
    cur = conn.cursor()
    cur = db_conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
So I could connect into Postgres DB and insert values from a df.
I get these 2 errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the columns don't receive the values properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
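Since you also have dynamic column names, you should build the statement with psycopg2.sql (sql.SQL, sql.Identifier and sql.Placeholder) and then pass the values through psycopg2's standard query-parameter mechanism instead of using format. sql.Identifier also double-quotes each name, which preserves its case. That is likely relevant here: the error message shows the mixed-case name from your statement folded to lowercase, which is what PostgreSQL does with unquoted identifiers, so a column created with a mixed-case name is not found.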

Pandas read_sql_query using multiple AND statements

I am trying to put together a SQL query in python pandas. I have attempted different methods, but always getting the following error:
Incorrect number of bindings supplied. The current statement uses 6, and there are 3 supplied.
My code is as follows. What am I doing wrong?
conn = sqlite3.connect(my_db)
df = pd.read_sql_query(
    con = conn,
    sql = """SELECT * FROM {table}
    WHERE @var1 = (?)
    AND @var2 = (?)
    AND @var3 = (?)
    ;""".format(table=table),
    params= (value1, value2, value3),
)
As you've said, @var1, @var2 and @var3 are all column names.
However, SQLite interprets the @ symbol as marking a parameter whose value you will supply later in your code.
So your SQL statement expects 6 values, because of the three (?)s and the three @vars, but you are supplying only 3 values (for the (?)s), which causes the error.
I would recommend naming your columns something without '@' so that there is less chance for errors.
See this question for further clarification.
sqlite interprets the @ symbol, like ?, as a parameter placeholder (Search for "Parameters"). If @var1 is the name of a column, then it must be escaped by surrounding it with backticks:
df = pd.read_sql_query(
    con = conn,
    sql = """SELECT * FROM {table}
    WHERE `@var1` = (?)
    AND `@var2` = (?)
    AND `@var3` = (?)""".format(table=table),
    params= (value1, value2, value3), )
I agree with @AdiC, though -- it would be more convenient to rename your columns so that they do not use characters with special meaning.
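The binding mismatch is easy to reproduce in isolation. The sketch below assumes a hypothetical table named readings with a column literally called @var1, and shows that SQLite counts an unquoted @var1 as an extra named parameter, while the backtick-quoted form is read as a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose column name really starts with "@".
conn.execute('CREATE TABLE readings ("@var1" INTEGER, label TEXT)')
conn.execute("INSERT INTO readings VALUES (?, ?)", (10, "ten"))

# Unquoted: @var1 is parsed as a second placeholder, so one value is not enough.
try:
    conn.execute("SELECT label FROM readings WHERE @var1 = ?", (10,))
    unquoted_error = None
except sqlite3.ProgrammingError as exc:
    unquoted_error = str(exc)

# Quoted with backticks: @var1 is a column name, and only ? needs a value.
quoted_rows = conn.execute(
    "SELECT label FROM readings WHERE `@var1` = ?", (10,)
).fetchall()

print(unquoted_error)
print(quoted_rows)
```

Standard double quotes around the identifier work as well; backticks are used here only to match the answer above.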

Merge tables from two different databases - sqlite3/Python

I have two different SQLite databases XXX and YYY.
XXX contains table A and YYY contains table B.
A and B have the same structure (columns).
How can I append the rows of B to A using the Python sqlite3 API?
After appending, A should contain the rows of both A and B.
You first get a connection to the database using sqlite3.connect, then create a cursor so you can execute sql. Once you have a cursor, you can execute arbitrary sql commands.
Example:
import sqlite3
# Get connections to the databases
db_a = sqlite3.connect('database_a.db')
db_b = sqlite3.connect('database_b.db')
# Get the contents of a table
b_cursor = db_b.cursor()
b_cursor.execute('SELECT * FROM mytable')
output = b_cursor.fetchall() # Returns the results as a list.
# Insert those contents into another table.
a_cursor = db_a.cursor()
for row in output:
    a_cursor.execute('INSERT INTO myothertable VALUES (?, ?, ...etc..., ?, ?)', row)
# Cleanup
db_a.commit()
a_cursor.close()
b_cursor.close()
Caveat: I haven't actually tested this, so it might have a few bugs in it, but the basic idea is sound, I think.
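For a version that can actually be run, here is a minimal sketch of the same idea using two in-memory databases. The function append_rows, the table layout, and the sample rows are assumptions made for the example. It derives the number of ? placeholders from cursor.description instead of writing them out by hand, and it assumes both tables have the same columns in the same order, as the question states.

```python
import sqlite3


def append_rows(src_conn, dst_conn, src_table, dst_table):
    """Copy every row of src_table into dst_table; return the number of rows copied."""
    # Table names cannot be bound as parameters, so they must come from trusted code.
    src_cur = src_conn.execute(f'SELECT * FROM "{src_table}"')
    n_cols = len(src_cur.description)  # one entry per selected column
    qmarks = ", ".join("?" for _ in range(n_cols))
    rows = src_cur.fetchall()
    dst_conn.executemany(f'INSERT INTO "{dst_table}" VALUES ({qmarks})', rows)
    dst_conn.commit()
    return len(rows)


# Two separate in-memory databases standing in for XXX and YYY.
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")
for db, table in ((db_a, "A"), (db_b, "B")):
    db.execute(f'CREATE TABLE "{table}" (name TEXT, qty INTEGER)')
db_a.execute("INSERT INTO A VALUES ('from_a', 1)")
db_b.executemany("INSERT INTO B VALUES (?, ?)", [("from_b1", 2), ("from_b2", 3)])
db_b.commit()

copied = append_rows(db_b, db_a, "B", "A")
merged = db_a.execute("SELECT name, qty FROM A ORDER BY qty").fetchall()
print(copied, merged)
```

If the tables have an auto-assigned primary key, copying every column with SELECT * can collide with existing keys, in which case the key column should be left out of both statements.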
This is a generalized function and should be customized to your particular environment. To do this, you can replace the "dynamically determine SQL expression requirements" section with hard-coded column names (rather than querying PRAGMA table_info), which should improve performance.
import sqlite3
def merge_tables(cursor_new: sqlite3.Cursor, cursor_old: sqlite3.Cursor, table_name: str, del_old_table: bool = False) -> None:
    '''
    This function merges the content of a specific table from an old cursor into a new cursor.
    :param cursor_new: [sqlite3.Cursor] the primary cursor
    :param cursor_old: [sqlite3.Cursor] the secondary cursor
    :param table_name: [str] the name of the table
    :param del_old_table: [bool] if True, delete the rows of the old table after merging
    :return: None
    '''
    # dynamically determine SQL expression requirements
    column_names = cursor_new.execute(f"PRAGMA table_info({table_name})").fetchall()
    column_names = [x[1] for x in column_names][1:]  # skip the first column (assumed to be the primary key)
    column_list = ', '.join(column_names)  # a plain list, so a single column also works
    values_placeholders = ', '.join(['?' for x in column_names])  # one ? per column
    # SQL select columns from table
    data = cursor_old.execute(f"SELECT {column_list} FROM {table_name}").fetchall()
    # insert the data into the primary cursor
    cursor_new.executemany(f"INSERT INTO {table_name} ({column_list}) VALUES ({values_placeholders})", data)
    cursor_new.connection.commit()  # commit() always returns None, so its result is not tested
    # With Ephemeral RAM connections & testing, deleting the table may be ill-advised
    if del_old_table:
        cursor_old.execute(f"DELETE FROM {table_name}")  # cursor_old.execute(f'DROP TABLE {table_name}')
        cursor_old.connection.commit()
    print(f"Table {table_name} merged from {cursor_old.connection} to {cursor_new.connection}")  # Consider logging.info()
    return None
