Problem
I'm using PyMySQL to query a database with the following helper function.
def retrieve_column(lotkey, column="active", print=False):
    result = None
    try:
        connection = sql_connect()
        with connection.cursor() as cursor:
            # Create a new record
            sql = "SELECT %s FROM table_n"
            val = (column)
            os_print(sql + '\r\n...', end='', style='dim', flush=True)
            cursor.execute(sql, val)
            result = cursor.fetchall()
        connection.close()
    except Exception as e:
        os_print(e, style='error')
        os_print("ERROR: Can't connect to database!", style='error', tts='error')
    return result
Which I call and print using the following lines. Note: The 'active' column is boolean.
active_col = retrieve_column(key)
print(active_col)
Which prints the following bizarre result. It seems to be a list of dictionaries whose only value is the column name itself.
...[{'active': 'active'}, {'active': 'active'}, {'active': 'active'}, {'active': 'active'}, {'active': 'active'}, {'active': 'active'}, {'active': 'active'}]
Attempted Solutions
My first step was to run the same query in MySQL workbench which produced the following result.
Workbench Query
Which is roughly what I am trying to replicate in Python (Getting a dictionary with each row's boolean value).
Next, I used the Python debugger and found that the values returned from cursor.fetchall() are indeed dictionaries containing nothing but a single key and no real column data.
Has anyone encountered something similar before?
When you execute these three instructions:
sql = "SELECT %s FROM table_n"
val = (column)
cursor.execute(sql, val)
You will get the following query executed (since column is 'active'):
SELECT 'active' FROM table_n
The result is a list of rows whose single value is the string 'active' (and the result column is also named 'active'), because the parameters of the cursor.execute() method are not treated as SQL identifiers but as parameter values; in this case, the string value 'active'.
If you are trying to select a column by name, you need to format the SQL query string itself, not the parameters:
sql = "SELECT {colname} FROM table_n"
sql = sql.format(colname=column)
cursor.execute(sql)
Column names cannot be passed to the cursor the same way argument values can be. For that you do actually need to format the query string:
sql = "SELECT {} FROM table_n".format(column)
cursor.execute(sql)
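Because the column name ends up in the query text rather than being bound as a parameter, it is worth validating it before formatting. A minimal sketch of a simplified variant, assuming you keep an allowlist of the columns table_n actually has (the allowlist contents below are placeholders):

# Placeholder allowlist; fill in the real columns of table_n.
ALLOWED_COLUMNS = {"active"}

def retrieve_column_safe(column="active"):
    if column not in ALLOWED_COLUMNS:
        raise ValueError("Unexpected column name: {!r}".format(column))
    sql = "SELECT {} FROM table_n".format(column)  # safe only because column was validated above
    connection = sql_connect()
    try:
        with connection.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()
    finally:
        connection.close()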
Related
I have dictionaries like this:
{'id': 8, 'name': 'xyzzy', 'done': False}
The table is already created with the correct column names (the keys of the dictionary). How can I insert the values into the respective columns? I want to create a new row for each dictionary.
Note that 'done' is defined as INTEGER, since SQLite does not offer a boolean type.
cur = connection().cursor()
query = "insert .... tablename"
In Python, database cursors accept two parameters:
an SQL statement as a string: the statement may contain placeholders instead of some values to handle cases where the values are not known until runtime.
a collection of values to be inserted into the SQL statement. These values replace the placeholders in the SQL statement when it is executed.
Placeholders may be positional or named:
# Positional placeholders: the order of values should match the order of
# placeholders in the statement. Values should be contained within
# a tuple or list, even if there is only one.
cur.execute("""SELECT * FROM tbl WHERE name = ? AND age = ?""", ('Alice', 42))
# Named placeholders: values and placeholders are matched by name, order
# is irrelevant. Values must be contained within a mapping (dict) of
# placeholders to values.
cur.execute(
    """SELECT * FROM tbl WHERE name = :name AND age = :age""",
    {'age': 42, 'name': 'Alice'}
)
You can pass your dictionary to cursor.execute() and it will do the right thing, as long as the value placeholders in the SQL statement use the :named format (that is, the dict key prefixed by a colon ":").
import sqlite3

conn = sqlite3.connect("example.db")  # path to your database file
cur = conn.cursor()
stmt = """INSERT INTO mytable (id, name, done) VALUES (:id, :name, :done)"""
cur.execute(stmt, {'id': 8, 'name': 'xyzzy', 'done': False})
# Call commit() on the connection to "save" the data.
conn.commit()
This method ensures that values are correctly quoted before being inserted into the database and protects against SQL injection attacks.
See also the docs
You could use the .format() method to build the query string, but this is much more straightforward:
dic = {'id': 8, 'name': 'xyzzy', 'done': False}
cur.execute("INSERT INTO tablename VALUES (:id, :name, :done)",
            {"id": dic["id"], "name": dic["name"], "done": dic["done"]})
I am trying to learn how to save dataframe created in pandas into postgresql db (hosted on Azure). I planned to start with simple dummy data:
data = {'a': ['x', 'y'],
        'b': ['z', 'p'],
        'c': [3, 5]}
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
I found a function that pushed df data into psql table. It starts with defining connection:
def connect(params_dic):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1)
    print("Connection successful")
    return conn
conn = connect(param_dic)
*param_dic contains all connection details (user/pass/host/db)
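For reference, param_dic is assumed to look roughly like this (the keys follow psycopg2's connection keywords; the values are placeholders):

param_dic = {
    "host": "<your-server>.postgres.database.azure.com",  # placeholder host
    "dbname": "<your-database>",
    "user": "<your-user>",
    "password": "<your-password>",
    "port": "5432",
}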
Once connection is established then I'm defining execute function:
def execute_many(conn, df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = [tuple(x) for x in df.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
    cursor = conn.cursor()
    try:
        cursor.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("execute_many() done")
    cursor.close()
I executed this function against a table that I created in the database:
execute_many(conn,df,"raw_data.test")
The table raw_data.test consists of columns a(char[]), b(char[]), c(numeric).
When I run the code I get following information in the console:
Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
^
DETAIL: Array value must start with "{" or dimension information.
I don't know how to interpret this, because none of the columns in df are arrays:
df.dtypes
Out[185]:
a object
b object
c int64
dtype: object
Any ideas what goes wrong there, or suggestions on how to save the df to PostgreSQL in a simpler manner? I found quite a lot of solutions that use sqlalchemy, creating the connection string in the following way:
conn_string = 'postgres://user:password@host/database'
But I am not sure whether that works with a cloud database; if I try to edit such a connection string with the Azure host details, it does not work.
The usual data type for strings in PostgreSQL is TEXT or VARCHAR(n) or CHAR(n), with round brackets; not CHAR[] with square brackets.
I'm guessing that you want the column to contain a string and that CHAR[] was a typo; in that case, you'll need to recreate (or migrate) the table column to the correct type - most likely TEXT.
(You might use CHAR(n) for fixed-length data, if it's genuinely fixed-length; VARCHAR(n) is mostly of historical interest. In most cases, use TEXT.)
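Since this is dummy/test data, the simplest fix may be to recreate the table with string columns. A sketch, assuming conn is the open psycopg2 connection and the raw_data schema already exists (adjust names and types to your real schema):

# Recreate the test table with TEXT columns instead of char[] arrays.
with conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS raw_data.test")
    cur.execute("CREATE TABLE raw_data.test (a text, b text, c numeric)")
conn.commit()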
Alternately, if you do mean to make the column an array, you'll need to pass a list in that position from Python.
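In that case, psycopg2 adapts Python lists to PostgreSQL arrays, so each row would carry a list for every array column. A sketch reusing the same query and cursor from execute_many():

# Each value destined for an array column must be a Python list.
tuples = [(['x'], ['z'], 3), (['y'], ['p'], 5)]
cursor.executemany(query, tuples)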
Consider adjusting your parameterization approach as psycopg2 supports a more optimal approach to format identifiers in SQL statements like table or column names.
In fact, the docs indicate that your current approach is not optimal and poses a security risk:
# This works, but it is not optimal
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
Instead, use the psycopg2.sql module:
from psycopg2 import sql
...
query = (
    sql.SQL("insert into {} values (%s, %s, %s)")
    .format(sql.Identifier('table'))
)
...
cur.executemany(query, tuples)
Also, for best practice in SQL always include column names in append queries and do not rely on column order of stored table:
query = (
    sql.SQL("insert into {0} ({1}, {2}, {3}) values (%s, %s, %s)")
    .format(
        sql.Identifier('table'),
        sql.Identifier('col1'),
        sql.Identifier('col2'),
        sql.Identifier('col3')
    )
)
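Since your column names come from the dataframe, you can also build the identifier list dynamically. A sketch (the schema-qualified Identifier and the Placeholder multiplication both follow the psycopg2.sql documentation; df and tuples are the objects from your code):

from psycopg2 import sql

query = sql.SQL("INSERT INTO {table} ({cols}) VALUES ({vals})").format(
    table=sql.Identifier("raw_data", "test"),                  # schema-qualified table name
    cols=sql.SQL(", ").join(map(sql.Identifier, df.columns)),  # one identifier per column
    vals=sql.SQL(", ").join(sql.Placeholder() * len(df.columns)),
)
cursor.executemany(query, tuples)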
Finally, discontinue using % for string formatting across all your Python code (not just psycopg2). As of Python 3, this method has been de-emphasized but not deprecated yet! Instead, use str.format (Python 2.6+) or F-string (Python 3.6+).
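For example, the query built earlier with % formatting could be written as an f-string instead (note this still does not escape identifiers, which is why the sql module above remains preferable):

query = f"INSERT INTO {table}({cols}) VALUES(%s,%s,%s)"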
I was playing around with SQLAlchemy and Microsoft SQL Server to get the hang of the functions when I came across some strange behavior. I was taught that the rowcount attribute on the result proxy object tells how many rows were affected by executing a statement. However, when I select or insert single or multiple rows in my test database, I always get -1. How can this be, and how can I fix it to reflect reality?
connection = engine.connect()
metadata = MetaData()
# Ex1: select statement for all values
student = Table('student', metadata, autoload=True, autoload_with=engine)
stmt = select([student])
result_proxy = connection.execute(stmt)
results = result_proxy.fetchall()
print(result_proxy.rowcount)
# Ex2: inserting single values
stmt = insert(student).values(firstname='Severus', lastname='Snape')
result_proxy = connection.execute(stmt)
print(result_proxy.rowcount)
# Ex3: inserting multiple values
stmt = insert(student)
values_list = [{'firstname': 'Rubius', 'lastname': 'Hagrid'},
               {'firstname': 'Minerva', 'lastname': 'McGonogall'}]
result_proxy = connection.execute(stmt, values_list)
print(result_proxy.rowcount)
The print call in each separately run example block prints -1. Ex1 successfully fetches all rows, and both insert statements successfully write the data to the database.
According to the following issue, the rowcount attribute isn't always to be trusted. Is that true here as well? And if so, how can I compensate with a COUNT statement in a SQLAlchemy transaction?
PDO::rowCount() returning -1
The single-row INSERT … VALUES ( … ) is trivial: If the statement succeeds then one row was affected, and if it fails (throws an error) then zero rows were affected.
For a multi-row INSERT simply perform it inside a transaction and rollback if an error occurs. Then the number of rows affected will either be zero or len(values_list).
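A minimal sketch of that pattern, reusing the engine, insert(student) statement, and values_list from the question:

# Run the multi-row insert inside a transaction so it either fully succeeds
# or is rolled back, and the affected-row count is known either way.
try:
    with engine.begin() as conn:  # commits on success, rolls back on error
        conn.execute(insert(student), values_list)
    rows_affected = len(values_list)
except Exception as error:
    rows_affected = 0
    print(error)
print(f"{rows_affected} row(s) inserted")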
To get the number of rows that a SELECT will return, wrap the select query in a SELECT count(*) query and run that first, for example:
import sqlalchemy as sa

select_stmt = sa.select([Parent])
count_stmt = sa.select([sa.func.count(sa.text("*"))]).select_from(
    select_stmt.alias("s")
)
with engine.connect() as conn:
    conn.execution_options(isolation_level="SERIALIZABLE")
    rows_found = conn.execute(count_stmt).scalar()
    print(f"{rows_found} row(s) found")
    results = conn.execute(select_stmt).fetchall()
    for item in results:
        print(item.id)
I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s', ...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    cur = db_conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
This is so that I can connect to a Postgres DB and insert values from a df.
I get these two errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the columns can't receive the values properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Since you also have dynamic column names, you should use psycopg2.sql to create the statement and then use the standard method of passing query parameters to psycopg2 instead of using format.
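A sketch of what that could look like for your dataframe, assuming conn is your open connection and keeping execute_batch from your snippet; sql.Identifier quotes each column name, so mixed-case names such as mainFreight_freight_motherVesselDepartDateTime are preserved instead of being folded to lower case:

from psycopg2 import sql
import psycopg2.extras

# Build the statement with quoted identifiers, then render it to a plain
# string for execute_batch().
insert_stmt = sql.SQL("INSERT INTO {table} ({cols}) VALUES ({vals})").format(
    table=sql.Identifier("mrr", "shipments"),
    cols=sql.SQL(", ").join(map(sql.Identifier, df.columns)),
    vals=sql.SQL(", ").join(sql.Placeholder() * len(df.columns)),
).as_string(conn)

cur = conn.cursor()
# .tolist() converts numpy scalars to plain Python values before adaptation.
psycopg2.extras.execute_batch(cur, insert_stmt, df.values.tolist())
conn.commit()
cur.close()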
I'm trying to dynamically bind variable values to be inserted into database table columns.
Example variable value in json:
document = {'zipCode': '99999',
            'name': 'tester',
            'company': 'xxxx'}
And my database table columns are:
table name: table1
column: id,zip_code,name,company
My code in python:
with connection.cursor() as cursor:
    sql = "INSERT INTO table1(zip_code, name, company) VALUES (%s,%s,%s)"
    cursor.execute(sql, (document['zipCode'],
                         document['name'],
                         document['company']))
    connection.commit()
However, if one of the key-value pairs in document is absent (e.g. only document['name'] exists in the document variable), the INSERT query will fail.
Any thoughts on how to handle this efficiently?
This is something that, generally, ORMs like SQLAlchemy or Peewee solve pretty easily for you.
But, if I were to implement, I would probably do something "dynamic" based on the available keys:
QUERY = "INSERT INTO table1({columns}) VALUES ({values})"
def get_query(document):
columns = list(document.keys())
return QUERY.format(columns=", ".join(columns),
values=", ".join('%({})s'.format(column) for column in columns))
Sample usage:
In [12]: get_query({'zipCode': '99999', 'name': 'tester', 'company': 'xxxx'})
Out[12]: 'INSERT INTO table1(company, zipCode, name) VALUES (%(company)s, %(zipCode)s, %(name)s)'
In [13]: get_query({'name': 'tester'})
Out[13]: 'INSERT INTO table1(name) VALUES (%(name)s)'
Then, you would just parameterize the query with the document dictionary as we've created named placeholders in the query:
cursor.execute(get_query(document), document)
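One caveat: this builds the column list straight from the JSON keys, so it only works if those keys match the table's column names. If they differ (e.g. zipCode vs. the zip_code column), remap the keys first. A sketch with a hypothetical mapping:

# Hypothetical mapping from JSON keys to actual column names.
KEY_TO_COLUMN = {'zipCode': 'zip_code', 'name': 'name', 'company': 'company'}

row = {KEY_TO_COLUMN[k]: v for k, v in document.items() if k in KEY_TO_COLUMN}
cursor.execute(get_query(row), row)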