I have a Pandas DataFrame with many columns, one of which, 'value', contains an HTML web page. I am running an upsert query for each row of the DataFrame, but I am getting the syntax error shown below.
I have tried escaping the HTML with the following methods:
df.value = df.value.apply(lambda x: re.escape(x))
df.value = df.value.apply(lambda x: MySQLdb.escape_string(x))
Here is my function:
non_key_cols = df.columns.tolist()
non_key_cols.remove(primary_key)
# df.value = df.value.apply(lambda x: re.escape(x))
df.value = df.value.apply(lambda x: MySQLdb.escape_string(x))
enclose_with_quote = [True if type_name.name=='object' else False for type_name in df.dtypes]
all_cols = df.columns.tolist()
#enclose df columns in inverted commas
for i in range(len(enclose_with_quote)):
    if enclose_with_quote[i]:
        df[all_cols[i]] = df[all_cols[i]].apply(lambda x: '"' + x + '"')
    else:
        df[all_cols[i]] = df[all_cols[i]].apply(lambda x: str(x))
sql = "INSERT INTO " \
+ tablename \
+ "(" + ", ".join([col for col in df.columns]) + ")" \
+ " VALUES " \
+ ", ".join(["(" + ", ".join(list(row)) + ")" for row in df.itertuples(index=False, name=None)]) \
+ " ON CONFLICT (" + primary_key + ") DO UPDATE SET " \
+ ", ".join([col + "=EXCLUDED." + col for col in non_key_cols])
conn = _getpostgres_connection()
cur = conn.cursor()
cur.execute(sql)
cur.close()
conn.commit()
conn.close()
This is the error I get:
ProgrammingError: syntax error at or near "margin" LINE 1:
...t_of_nums_not_in_table_regex) VALUES ("<p style=\"margin: 0p...
The issue is that you wrap the strings in double quotes. In Postgres, double quotes denote an identifier (a column or table name). You have to use single quotes for string literals.
if enclose_with_quote[i]:
    df[all_cols[i]] = df[all_cols[i]].apply(lambda x: "'" + x + "'")
That being said, you will then get an error if your string contains single quotes. The safest and simplest way is to use a parameterized query, which handles the quote escaping for you. Otherwise, have a look at this post for using a custom string delimiter.
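As a rough illustration of the parameterized approach, here is a minimal sketch that builds the upsert statement with one placeholder per column and lets the driver handle the values. It runs against an in-memory sqlite3 database only so that it works without a server (this assumes the bundled SQLite is 3.24 or newer, which is when upsert support was added); with psycopg2 the placeholder would be %s instead of ?. The helper names quote_ident and build_upsert and the table name "pages" are made up for this example.

```python
import sqlite3


def quote_ident(name):
    # Double quotes delimit identifiers; an embedded double quote is doubled.
    return '"' + name.replace('"', '""') + '"'


def build_upsert(table, columns, primary_key, placeholder="%s"):
    """Build INSERT ... ON CONFLICT DO UPDATE with one placeholder per column."""
    non_key_cols = [c for c in columns if c != primary_key]
    col_list = ", ".join(quote_ident(c) for c in columns)
    marks = ", ".join([placeholder] * len(columns))
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in non_key_cols
    )
    return (
        f"INSERT INTO {quote_ident(table)} ({col_list}) VALUES ({marks}) "
        f"ON CONFLICT ({quote_ident(primary_key)}) DO UPDATE SET {updates}"
    )


# Stand-in database so the sketch runs anywhere; "pages" is a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, value TEXT)")

upsert_sql = build_upsert("pages", ["id", "value"], "id", placeholder="?")

# With a DataFrame, these rows could come from
# df.itertuples(index=False, name=None).
rows = [
    (1, "<b>old</b>"),
    (1, '<p style="margin: 0px">it\'s a test</p>'),
]
conn.executemany(upsert_sql, rows)
conn.commit()

# The HTML, including both kinds of quotes, is stored unchanged.
stored = conn.execute("SELECT value FROM pages WHERE id = 1").fetchone()[0]
print(stored)
```

Because the values never become part of the SQL text, neither re.escape nor MySQLdb.escape_string is needed, and no column has to be wrapped in quotes by hand.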
Related
I have multiple tables that are updated after a value is changed in a grid. These tables don't always have the same keys or columns so I cannot explicitly name the columns or formats. The only thing that is ever the same, is the column where the keys reside. I know the way I am currently doing this is not correct and leaves me open to injection attacks.
I also ran into an issue where some of the values contain characters that cause an error in the SQL statement, for example the single quote when updating WHERE email = t'est#email.com.
I am not really sure of the proper way to write these statements. I did some research and see multiple methods for different purposes but am not sure which is proper. I am looking to do this as dynamically as possible. Can anyone point me in the right direction?
To connect:
import mysql.connector as sql
import MySQLdb
#Connect
self.db_name = 'database'
self.server = 'server'
self.user_id = 'user'
self.pw = 'password'
try:
    self.db_con = MySQLdb.connect(user=self.user_id,password=self.pw,database=self.db_name,
                                  host=self.server,charset='utf8',autocommit=True)
    self.cursor = self.db_con.cursor()
except:
    print("Error connecting")
SQL Statements:
key_id = str("'") + self.GetCellValue(event.GetRow(),1) + str("'")
target_col = self.GetColLabelValue(event.GetCol())
key_col = self.GetColLabelValue(1)
nVal = str("'") + self.GetCellValue(event.GetRow(),event.GetCol()) + str("'")
#SQL STATEMENTS
sql_update = "UPDATE " + tbl + " SET " + target_col + " = " + nVal + " WHERE " + key_col + " = " + key_id + ""
#INSERT
sql_update = ("INSERT INTO " + str(self.tbl) + "(" + self.key_col + ")" + "VALUES (" + str("'") + str(val) + str("'") + ")")
#DELETE
sql_update = "DELETE FROM " + tbl + " WHERE " + self.key_col + " = " + self.key_id + ""
#SELECT
sql_query = "SELECT * FROM " + self.tbl
#Execute
try:
    self.cursor.execute(sql_update)
except:
    print('Error')
    self.db_con.rollback()
Databases have different notations for "quoting" identifiers (table and column names etc) and values (data).
MySQL uses backticks to quote identifiers. For values, it's best to use the parameter substitution mechanism provided by the connector package: it's more likely to handle tricky cases like embedded quotes correctly, and will reduce the risk of SQL injection.
Here's an example for inserts; the same techniques can be used for the other types of query.
# No manual quoting here: the connector quotes parameter values itself,
# so wrapping them in ' would store the quote characters as data.
key_id = self.GetCellValue(event.GetRow(),1)
target_col = self.GetColLabelValue(event.GetCol())
key_col = self.GetColLabelValue(1)
nVal = self.GetCellValue(event.GetRow(),event.GetCol())
#INSERT (using f-strings for brevity)
sql_update = (f"INSERT INTO `{self.tbl}` (`{self.key_col}`) VALUES (%s)")
# Pass the statement and values to cursor.execute.
# The values are assumed to be a sequence, so a single value should be
# placed in a tuple or list.
self.cursor.execute(sql_update, (nVal,))
If you have more than one column / value pair you could do something like this:
cols = ['A', 'B', 'C']
vals = ['a', 'b', 'c']
col_names = ','.join([f'`{c}`' for c in cols])
values_placeholder = ','.join(['%s'] * len(cols))
sql_update = (f"INSERT INTO `{self.tbl}` ({col_names}) VALUES ({values_placeholder})")
self.cursor.execute(sql_update, vals)
Values are not only data for insertion, but also data that we are using for comparison, for example in WHERE clauses. So an update statement with a filter might be created like this:
sql_update = (f"UPDATE `{tbl}` SET `{target_col}` = %s WHERE `{key_col}` = %s")
self.cursor.execute(sql_update, (nVal, key_id))
However, sometimes the right-hand side of a SET or WHERE clause is itself a column, for example when we want to do an update based on other values in the row. In that case it is quoted as an identifier rather than passed as a parameter. This statement will set target_col to the value of other_col for all rows where key_col is equal to other_key_col:
sql_update = (f"UPDATE `{tbl}` SET `{target_col}` = `{other_col}` WHERE `{key_col}` = `{other_key_col}`")
self.cursor.execute(sql_update)
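Placeholders only cover values, so the table and column names that are formatted into the statement still need care. A minimal sketch of one way to handle that is below: each identifier is checked against a set of known names and then backtick-quoted (a backtick inside a quoted MySQL identifier is escaped by doubling it). The helper names quote_ident, build_update and build_delete, the allowed set, and the table and column names are all hypothetical; where the list of permitted names comes from depends on the application.

```python
def quote_ident(name, allowed):
    """Backtick-quote a MySQL identifier after checking it against a whitelist."""
    if name not in allowed:
        raise ValueError(f"unexpected identifier: {name!r}")
    # A backtick inside a quoted identifier is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def build_update(table, target_col, key_col, allowed):
    """Return an UPDATE statement with %s placeholders for the two values."""
    return (
        f"UPDATE {quote_ident(table, allowed)} "
        f"SET {quote_ident(target_col, allowed)} = %s "
        f"WHERE {quote_ident(key_col, allowed)} = %s"
    )


def build_delete(table, key_col, allowed):
    """Return a DELETE statement with a %s placeholder for the key value."""
    return (
        f"DELETE FROM {quote_ident(table, allowed)} "
        f"WHERE {quote_ident(key_col, allowed)} = %s"
    )


# Hypothetical names; in the grid code they would come from the column labels.
allowed = {"contacts", "email", "name"}

update_sql = build_update("contacts", "name", "email", allowed)
update_params = ("Test User", "t'est#email.com")  # raw values, no manual quotes

delete_sql = build_delete("contacts", "email", allowed)
delete_params = ("t'est#email.com",)

print(update_sql)
print(delete_sql)
# With a live connection: cursor.execute(update_sql, update_params)
```

The statements contain only identifiers and placeholders, so a value such as t'est#email.com never has to be escaped by hand.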
I have the following query to delete duplicates from a table in the database.
WITH x AS (SELECT "region_code" dup, min(ctid)
FROM public.test2 GROUP BY "region_code"
HAVING count(*) > 1)
DELETE FROM public.test2
USING x
WHERE (region_code) = (dup) AND public.test2.ctid <> x.min
RETURNING *;
Now I want to execute this query using python. When I run this query in python, nothing happens. I am using sqlalchemy with python 3.6.
query = "WITH x AS (SELECT \"region_code\" dup, min(ctid) FROM " + schema + "." + table_name + " GROUP BY \"region_code\" HAVING count(*) > 1) DELETE FROM " + schema + "." + table_name +" USING x WHERE (region_code) = (dup) AND " + schema + "." + table_name +".ctid <> x.min RETURNING *;"
data = con.execute(query)
I have a table in which all the 120 fields have got type varchar(75). I have coded like this.
sql = "create table " + tableName + "("
for i in range(len(flds)):
    if i == len(flds) - 1:
        sql += flds[i] + " varchar(75))"
    else:
        sql += flds[i] + " varchar(75), "
Is it possible to get a one-liner for it?
Thanks!
First, let's use join so we don't need the commas and if. And, while we're at it, we can just loop over flds instead of range(len(flds)):
columns = []
for fld in flds:
    columns.append(fld + " varchar(75)")
Of course this means we have to add the ) on at the end:
sql += ', '.join(columns) + ')'
Now we can turn that loop into a comprehension:
columns = (fld + " varchar(75)" for fld in flds)
And now, we can inline that into the join:
sql += ', '.join(fld + " varchar(75)" for fld in flds) + ')'
And now, we have two lines that can obviously be combined into one:
sql = "create table " + tableName + "(" + ', '.join(fld + " varchar(75)" for fld in flds) + ')'
But that's way over 80 characters, so probably better to write it as two lines anyway. I'd probably do it like this:
columns = ', '.join(fld + " varchar(75)" for fld in flds)
sql = "create table " + tableName + "(" + columns + ")"
And finally, let's use an f-string instead of concatenating with +, which makes things only a little shorter, but a lot more readable.
columns = ', '.join(f'{fld} varchar(75)' for fld in flds)
sql = f'create table {tableName} ({columns})'
You can use join with format:
v = "create table {} ({} varchar(75));".format(tableName, " varchar(75), ".join(flds))
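To check that the two variants above agree, here is a small sketch with made-up values for tableName and flds. It also runs the generated statement against an in-memory sqlite3 database, purely as a convenient way to confirm that the SQL parses and creates the expected columns.

```python
import sqlite3

# Hypothetical inputs for illustration.
tableName = "people"
flds = ["first_name", "last_name", "city"]

# f-string variant
columns = ', '.join(f'{fld} varchar(75)' for fld in flds)
sql = f'create table {tableName} ({columns})'

# join-with-format variant (ends with a semicolon)
v = "create table {} ({} varchar(75));".format(tableName, " varchar(75), ".join(flds))

print(sql)
print(v)

# Run the statement and read back the column names it created.
conn = sqlite3.connect(":memory:")
conn.execute(sql)
created = [row[1] for row in conn.execute(f"PRAGMA table_info({tableName})")]
print(created)
```

Apart from the trailing semicolon, both variants produce the same text for a non-empty flds. With an empty list they differ: the format variant still emits a lone " varchar(75)", so the comprehension version is the safer default.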
I am trying to add some information to a db table with the method 'insert_users'. The problem is with concatenating the variable self.test2 into the INSERT query: when I try this I get TypeError: not all arguments converted during string formatting. When I put the literal string '%s','%s','%s','%s','%s','%s','%s','%s' in place of self.test2, everything works. What is the matter?
def insert_users(self, users_dict, rows_names):
    self.test1 = len(rows_names) * ["%s"]
    print self.test1
    self.test2 = "'" + "','".join(self.test1) + "'"
    self.cursor = self.connection.cursor()
    for name, val in users_dict.items():
        self.cursor.execute("""INSERT INTO users(""" + \
            ",".join(rows_names) + """) VALUES(""" + self.test2 + """)""" %\
            (name, val, 'SomeNickname', 'password', '13-09-11', '2', '#gmail.com','1'))
ConnectToDb.connection_cot_cle(self)
users_dict = {'FirstName':'LastName'}
rows_names = ('Fname', 'Lname', 'login', 'password','reg_date', 'role_id',\
'email', 'is_active')
db_instans.insert_users(users_dict, rows_names)
Use a temporary variable to hold the complete string, and then interpolate the data into it:
temp = "INSERT INTO users(" + ",".join(rows_names) + ") VALUES(" + test2 + ")"
temp = temp % (name, val, 'SomeNickname', 'password', '13-09-11', '2', '#gmail.com','1')
Or add parentheses around the concatenation:
("INSERT INTO users(" + ",".join(rows_names) + ") VALUES(" + test2 + ")")
so that it looks like this:
temp = ("INSERT INTO users(" + ",".join(rows_names) + ") VALUES(" + test2 + ")") % (name, val, 'SomeNickname', 'password', '13-09-11', '2', '#gmail.com','1')
The problem is that % binds more tightly than +, so the formatting was applied only to the last piece, ")", instead of to the complete string.
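The precedence issue can be reproduced without a database. The sketch below uses a shortened, made-up column list and value tuple; it shows the TypeError raised when % is applied to ")" alone, and the result once the concatenation is parenthesized.

```python
# Shortened, hypothetical column list and values for illustration.
rows_names = ("Fname", "Lname")
values = ("First", "Last")
test2 = "'" + "','".join(["%s"] * len(rows_names)) + "'"

# Without parentheses, % is applied to ")" alone, which contains no %s.
try:
    broken = "INSERT INTO users(" + ",".join(rows_names) + ") VALUES(" + test2 + ")" % values
except TypeError as exc:
    error_message = str(exc)

# With parentheses, the whole concatenated string is formatted.
fixed = ("INSERT INTO users(" + ",".join(rows_names) + ") VALUES(" + test2 + ")") % values

print(error_message)
print(fixed)
```

Formatting the values into the string still leaves the quoting to you. A common alternative is to build the statement with unquoted %s placeholders and pass the tuple as the second argument to cursor.execute, so the driver does the escaping.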