Update SQL table with list of dictionaries - python

How do I properly use dictionaries in SQL queries? Should I predefine SQL variables to prevent SQL injection?
For example, I get an error in Python when trying to update a table with a dictionary, although the query works perfectly in pure SQL with predefined variables and values.
_mysql_connector.MySQLInterfaceError: Incorrect datetime value: '1' for column 'colF' at row 1
My dict
{'ID': 1, 'colA': 'valA1', 'colB': 'valB1', 'colC': 'valC1',
 'colD': 'valD1', 'colE': 'valE1', 'colF': datetime.datetime(2017, 12, 23, 0, 0),
 'colG': datetime.datetime(2018, 7, 11, 0, 0), 'colH': datetime.datetime(2018, 9, 1, 0, 0)}
SQL statement
UPDATE table1
SET
colA = CASE \
WHEN %(valA1)s IS NOT NULL THEN %(valA1)s
END,\
colB = CASE \
WHEN %(valB1)s IS NOT NULL THEN %(valB1)s
END,\
colC = CASE \
WHEN %(valC1)s IS NOT NULL THEN %(valC1)s
END,\
colD = CASE \
WHEN %(valD1)s IS NOT NULL THEN %(valD1)s
END,\
colE = CASE \
WHEN %(valE1)s IS NOT NULL THEN %(valE1)s
END,\
colF = CASE\
WHEN %(valF)s IS NOT NULL THEN %(valF)s
END,\
colG = CASE\
WHEN %(valF1)s IS NOT NULL THEN %(valF1)s
END,\
colH = CASE\
WHEN %(valH1)s IS NOT NULL THEN %(valH1)s
END\
WHERE %(ID)s = Id """
When I format the query string, the result is:
colF1 = CASE
WHEN 2018-07-11 00:00:00 IS NOT NULL THEN 2018-07-11 00:00:00
END,
colH1 = CASE
WHEN 2018-09-01 00:00:00 IS NOT NULL THEN 2018-09-01 00:00:00
END
WHERE Id = 1
And there is another issue when the value is not null. The syntax is wrong, I suppose.

Several issues arise with your attempted parameterized query:
1. As described in the MySQLdb docs, column or table identifiers cannot be used in parameter placeholders; such placeholders are only for literal values. Also, consider triple-quoted strings for multi-line queries to avoid \ line breaks:
sql = """UPDATE table1
SET
colA = CASE
WHEN colA IS NOT NULL
THEN %(valA1)s
END,
...
"""
cur.execute(sql, mydict)
con.commit()
2. To use the named-placeholder approach, dictionary keys must match the placeholder names. Currently, you would need to reverse most of the key/value pairs in your dictionary, which you correctly do only for ID. However, as mentioned in #1 above, remove any column identifiers:
{'ID': 1, 'valA1': 'data value', 'valB1': 'data value', 'valC1': 'data value', ... }
3. Datetimes in most databases, including MySQL, must take the string form YYYY-MM-DD HH:MM:SS. The MySQL engine cannot consume Python datetime objects directly, although some DB-APIs will convert this object type for you.
{'valF': '2017-12-23 00:00:00',
'valG': '2018-07-11 00:00:00',
'valH': '2018-09-01 00:00:00'}
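Putting the three fixes together, a minimal sketch, assuming mysql.connector, the question's table1, and illustrative valA1/valF key names (only two columns shown; extend the pattern for the rest):
import mysql.connector

# Hypothetical connection details; adjust to your environment.
con = mysql.connector.connect(user='user', password='pwd', database='mydb')
cur = con.cursor()

sql = """UPDATE table1
         SET colA = CASE WHEN colA IS NOT NULL THEN %(valA1)s END,
             colF = CASE WHEN colF IS NOT NULL THEN %(valF)s END
         WHERE Id = %(ID)s"""

# Keys match the placeholder names; datetimes passed as formatted strings.
params = {'ID': 1, 'valA1': 'data value', 'valF': '2017-12-23 00:00:00'}

cur.execute(sql, params)
con.commit()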

Related

how can I convert strings like "2022.7.1" to "2022-07-01" in sqlite3?

I'm using the sqlite3 library in Python and have some strings in my sqlite DB as follows:
date
2022.7.1
2022.7.11
2022.10.1
2022.10.21
I would like to convert the date column as follows:
date
2022-07-01
2022-07-11
2022-10-01
2022-10-21
I got some hints from this link, but I wasn't able to use them.
Would anyone mind providing some hints or guidance?
Your help will be much appreciated.
Thanks!
It may be possible to do this using only SQL, but as the answer you linked shows, it will look very complicated. Assuming reasonable quantities of data, it's simpler to fetch the values into Python, reformat them, and update the database.
The work is done in this statement:
date(*map(int, row[0].split('.'))).strftime('%Y-%m-%d')
which splits each date string on '.', converts the parts to ints, passes these ints to the date constructor and then formats the resulting date instance as a YYYY-mm-dd string.
Here's a complete example:
from datetime import date
import sqlite3

with sqlite3.connect(':memory:') as conn:
    conn.execute("""CREATE TABLE t (d TEXT)""")
    conn.execute(
        """insert into t (d) values ('2022.7.1'), ('2022.7.11'), ('2022.10.1'), ('2022.10.21')"""
    )
    conn.commit()

    rows = conn.execute("""SELECT d FROM t""").fetchall()

    # Make a list of dicts containing the old value and the new value.
    values = [
        {
            'old': row[0],
            'new': date(*map(int, row[0].split('.'))).strftime('%Y-%m-%d'),
        }
        for row in rows
    ]

    conn.executemany("""UPDATE t SET d = :new WHERE d = :old""", values)
    conn.commit()

    rows = conn.execute("""SELECT d FROM t""")
    for row in rows:
        print(row[0])

How to remove quotations of string in list of values for sql query in python

I want to insert values into the table open.roads using SQL in Python. The table has the following columns:
id integer NOT NULL DEFAULT nextval('open.res_la_roads_id_seq'::regclass),
run_id integer,
step integer,
agent integer,
road_id integer,
CONSTRAINT res_la_roads_pkey PRIMARY KEY (id)
I'm trying to write several rows of values into SQL within one query, so I created a list with the values to be inserted, following this example:
INSERT INTO open.roads(id, run_id, step, agent, road_id)
VALUES (DEFAULT, 1000, 2, 2, 5286), (DEFAULT, 1000, 1, 1, 5234);
The list in Python should contain:
list1=(DEFAULT, 1000, 2, 2, 5286), (DEFAULT, 1000, 1, 1, 5234), (.....
I have problems with the value "DEFAULT", as it is a string which should be introduced into SQL without the quotation marks. But I can't manage to remove the quotations; I have tried to save "DEFAULT" in a variable as a string and used str.remove(), str.replace(), etc.
The code I'm trying to use:
list1 = []
for road in roads:
    a = "DEFAULT", self.run_id, self.modelStepCount, self.unique_id, road
    list1.append(a)

val = ','.join(map(str, list1))
sql = """insert into open.test ("id","run_id","step","agent","road_id")
values {0}""".format(val)
self.model.mycurs.execute(sql)
I get an error because of the quotations:
psycopg2.DataError: invalid input syntax for integer: "DEFAULT"
LINE 2:('DEFAULT', 582, 0, 2, 13391),('DEFAULT'
How can I remove them? Or is there another way to do this?
I think this might help you.
sql = "INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES "
values = ['(DEFAULT, 1000, 2, 2, 5286), ', '(DEFAULT, 1000, 1, 1, 5234)']
for value in values:
sql += value
print(sql)
After running this code, it will print: INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES (DEFAULT, 1000, 2, 2, 5286), (DEFAULT, 1000, 1, 1, 5234)
Another solution is to format your string using the % operator.
print("INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES (%s, %s, %s, %s, %s)" % ("DEFAULT", 1000, 2, 2, 5286))
This will print INSERT INTO open.roads(id, run_id, step, agent, road_id) VALUES (DEFAULT, 1000, 2, 2, 5286)
Simply omit the id column in the insert query, as it will be supplied with its DEFAULT value per your table constraint. Also, parameterize the values by passing a vars argument in execute (safer and more maintainable than string interpolation): execute(query, vars). And since you have a list of tuples, consider executemany().
The below assumes the database API (e.g., pymysql, psycopg2) uses %s as its parameter placeholder; otherwise use ?:
# PREPARED STATEMENT
sql = """INSERT INTO open.test ("run_id", "step", "agent", "road_id")
         VALUES (%s, %s, %s, %s);"""

list1 = []
for road in roads:
    a = (self.run_id, self.modelStepCount, self.unique_id, road)
    list1.append(a)

self.model.mycurs.executemany(sql, list1)

Python sqlite3 - operationalerror near "2017"

I'm new to programming. I have a dictionary called record that receives various inputs like 'Color', 'Type', 'quantity', etc. I tried to add a Date column and insert the values into my sqlite table inside the 'if' block in the code below, but I get an "OperationalError near 2017", i.e. near the date.
Can anyone help please? Thanks in advance
Date = str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d'))
record['Date'] = Date
column = [record['Color'], Date]
values = [record['quantity'], record['Date']]
column = ','.join(column)

if record['Type'] == 'T-Shirts' and record['Style'] == 'Soft':
    stment = ("INSERT INTO xtrasmall (%s) values(?)" % column)
    c.execute(stment, values)
    conn.commit()
Updated
You can simplify the code as follows:
from datetime import datetime
date = datetime.now().date()
sql = "INSERT INTO xtrasmall (%s, Date) values (?, ?)" % record['Color']
c.execute(sql, (record['quantity'], date))
This substitutes the value of the selected color directly into the column names in the query string. Then the query is executed passing the quantity and date string as arguments. The date should automatically be converted to a string, but you could convert with str() if desired.
This does assume that the other colour columns have a default value (presumably 0), or permit null values.
Original answer
Because you are constructing the query with string interpolation (i.e. substituting %s for a string) your statement becomes something like this:
INSERT INTO xtrasmall (Red,2017-10-06) values(?)
which is not valid because 2017-10-06 is not a valid column name. Print out stment before executing it to see.
If you know what the column names are just specify them in the query:
values = ['Red', 2, Date]
c.execute("INSERT INTO xtrasmall (color, quantity, date) values (?, ?, ?)", values)
conn.commit()
You need to use a ? for each column that you are inserting.
It looks like you want to insert the dictionary using its keys and values. This can be done like this:
record = {'date': '2017-10-06', 'color': 'Red', 'quantity': 2}
columns = ','.join(record.keys())
placeholders = ','.join('?' * len(record))
sql = 'INSERT INTO xtrasmall ({}) VALUES ({})'.format(columns, placeholders)
c.execute(sql, list(record.values()))  # sqlite3 needs a sequence, so wrap the dict view in list()
This code will generate the parameterised SQL statement:
INSERT INTO xtrasmall (date,color,quantity) VALUES (?,?,?)
and then execute it using the dictionary's values as the parameters.
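sqlite3 also supports named placeholders, which sidestep key ordering entirely; a small variation on the above, using the same hypothetical record dict:
record = {'date': '2017-10-06', 'color': 'Red', 'quantity': 2}
# Builds: INSERT INTO xtrasmall (date,color,quantity) VALUES (:date,:color,:quantity)
sql = 'INSERT INTO xtrasmall ({}) VALUES ({})'.format(
    ','.join(record), ','.join(':' + k for k in record))
c.execute(sql, record)  # the dict itself supplies the named parameters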

Send a list of lists including multiple timestamp queries using cx_oracle.execute_many

I'm trying to insert multiple rows containing timestamps using cx_Oracle from a large list. I've only seen an article sending one query per execution, but I haven't seen any examples executing multiple with executemany. How would I get a list of multiple timestamps to go through? I tried the following, but no luck:
import datetime
import cx_Oracle
listrows =[
[56, 'steveJob', 'Apple', '2016-08-15 20:23:03.317909', '2015-09-08 20:46:30.456299', ''],
[32, 'markZuck', 'Faceb', '2015-09-08 20:46:30.456299', '2015-09-08 20:46:30.456299', ''],
[45, 'elonMusk', 'Tesla', '2016-02-18 16:53:20.959984', '2016-02-18 17:17:05.664715', '']
]
con = cx_Oracle.connect("system","oracle","localhost:1521/xe.oracle.docker")
cursor = con.cursor()
cursor.prepare("""INSERT INTO lifexp (age, name, lastcompany, created_at, deleted_at, turnout) VALUES (:1,:2,:3,:4,:5,:6)""")
cursor.setinputsizes(cx_Oracle.TIMESTAMP)
cursor.executemany(None, listrows)
con.commit()
cursor.close()
con.close()
The table I'm connecting to is set up like this:
CREATE TABLE lifexp(
age NUMBER (19,0) primary key,
name VARCHAR(256),
lastcompany VARCHAR(256),
created_at timestamp,
deleted_at timestamp,
turnout timestamp
);
I get the error:
TypeError: expecting timestamp data
When I remove the line "cursor.setinputsizes(cx_Oracle.TIMESTAMP)" I get:
cx_Oracle.DatabaseError: ORA-01843: not a valid month
The problem is that you are sending strings, not timestamp values! So in the case where you are using setinputsizes() you are telling cx_Oracle that you have timestamp values to give, but you are not providing them. Without the setinputsizes() call you are telling Oracle to convert those strings to timestamp values, but the default timestamp format doesn't match the format of the dates you are passing.
You should either (a) convert your strings to Python datetime.datetime values or (b) specify the conversion in your SQL statement
Converting strings to Python datetime.datetime values is simple enough:
datetime.datetime.strptime("2016-08-15 20:23:03.317909", "%Y-%m-%d %H:%M:%S.%f")
Specifying the conversion in your SQL statement is fairly straightforward, too.
cursor.prepare("""
insert into lifexp
(age, name, lastcompany, created_at, deleted_at, turnout)
VALUES (:1,:2,:3,
to_timestamp(:4, 'YYYY-MM-DD HH24:MI:SS.FF'),
to_timestamp(:5, 'YYYY-MM-DD HH24:MI:SS.FF'):6)""")
I believe the first option is the better one but both should work for you.
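For completeness, a sketch of option (a) applied to the question's listrows, converting the string columns before calling executemany; treating the empty turnout strings as NULL is my assumption:
import datetime

def to_ts(s):
    # Parse 'YYYY-MM-DD HH:MM:SS.ffffff' strings; map '' to None (NULL).
    return datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") if s else None

# The first three columns stay as-is; the last three become datetime objects.
converted = [row[:3] + [to_ts(v) for v in row[3:]] for row in listrows]

cursor.executemany("""INSERT INTO lifexp
                      (age, name, lastcompany, created_at, deleted_at, turnout)
                      VALUES (:1, :2, :3, :4, :5, :6)""", converted)
con.commit()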

How can I check if a record exists when passing a dataframe to SQL in pandas?

Background
I'm building an application that passes data from a CSV to an MS SQL database. This database is being used as a repository for all of my enterprise's records of this type (phone calls). When I run the application, it reads the CSV and converts it to a pandas dataframe, which I then append to my table in SQL using SQLAlchemy and pyodbc.
However, due to the nature of the content I'm working with, there is oftentimes data that we already have imported to the table. I am looking for a way to check if my primary key exists (a column in my SQL table and in my dataframe) before appending each record to the table.
Current code
# save dataframe to mssql DB
engine = sql.create_engine('mssql+pyodbc://CTR-HV-DEVSQL3/MasterCallDb')
df.to_sql('Calls', engine, if_exists='append')
Sample data
My CSV is imported as a pandas dataframe (the primary key is FileName; it's always unique), then passed to MS SQL. This is my dataframe (df):
+---+------------+-------------+
| | FileName | Name |
+---+------------+-------------+
| 1 | 123.flac | Robert |
| 2 | 456.flac | Michael |
| 3 | 789.flac | Joesph |
+---+------------+-------------+
Any ideas? Thanks!
Assuming you have no memory constraints and you're not inserting null values, you could:
sql = "SELECT pk_1, pk_2, pk_3 FROM my_table"
sql_df = pd.read_sql(sql=sql, con=con)
df = pd.concat((df, sql_df)).drop_duplicates(subset=['pk_1', 'pk_2', 'pk_3'], keep=False)
df = df.dropna()
df.to_sql('my_table', con=con, if_exists='append')
Depending on the application you could also reduce the size of sql_df by changing the query.
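For instance, a sketch that only fetches keys which could collide with the current batch, using the question's FileName column and Calls table (the qmark placeholder style assumes a raw pyodbc connection):
# Only fetch existing keys that appear in this CSV batch.
filenames = df['FileName'].tolist()
placeholders = ', '.join(['?'] * len(filenames))
sql = f"SELECT FileName FROM Calls WHERE FileName IN ({placeholders})"
sql_df = pd.read_sql(sql=sql, con=con, params=filenames)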
Update - better overall, and it can insert null values:
pks = ['pk_1', 'pk_2', 'pk_3']
sql = "SELECT pk_1, pk_2, pk_3 FROM my_table"
sql_df = pd.read_sql(sql=sql, con=con)
df = df.loc[df[pks].merge(sql_df[pks], on=pks, how='left', indicator=True)['_merge'] == 'left_only']
# df = df.drop_duplicates(subset=pks)  # add it if you want to drop any duplicates that you may insert
df.to_sql('my_table', con=con, if_exists='append')
What if you iterated through the rows with DataFrame.iterrows() and, on each iteration, used an ON DUPLICATE-style guard on your key value FileName so it is not added again? A sketch of that idea follows.
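ON DUPLICATE KEY is MySQL syntax; for the question's MS SQL Server target, a per-row existence check achieves the same effect. A minimal sketch, assuming a raw pyodbc connection con and the question's Calls table:
cur = con.cursor()
for _, row in df.iterrows():
    # Insert only when the primary key is not already present.
    cur.execute(
        "IF NOT EXISTS (SELECT 1 FROM Calls WHERE FileName = ?) "
        "INSERT INTO Calls (FileName, Name) VALUES (?, ?)",
        row['FileName'], row['FileName'], row['Name'],
    )
con.commit()
Note that a row-by-row loop is much slower than the set-based approaches above, so it is best reserved for small batches.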
You can check if it is empty, like this:
sql = "SELECT pk_1, pk_2, pk_3 FROM my_table"
sql_df = pd.read_sql(sql=sql, con=con)

if sql_df.empty:
    print("Is empty")
else:
    print("Is not empty")
You can set the parameter index=False; see the example below:
data.to_sql('book_details', con=engine, if_exists='append', chunksize=1000, index=False)
If it is not set, the command automatically adds the index column.
book_details is the name of the table we want to insert our dataframe into.
Result (when index=False is not set):
[SQL: INSERT INTO book_details (`index`, book_id, title, price) VALUES (%(index)s, %(book_id)s, %(title)s, %(price)s)]
[parameters: ({'index': 0, 'book_id': 55, 'title': 'Programming', 'price': 29},
{'index': 1, 'book_id': 66, 'title': 'Learn', 'price': 23},
{'index': 2, 'book_id': 77, 'title': 'Data Science', 'price': 27})]
Therefore, the index column would need to exist in the table!
