I am working on a small project and have created a helper function that writes a string of comma-separated values into a database table as the row's values. I realise there are implications to doing it this way, but the project is small and I need to get it going until I can do better.
def db_insert(table, data):
    """
    Insert data into a table. The data should be a tuple
    matching the number of columns, with null for any columns that
    have no value. False is returned on any error; the error is logged to
    the database log file."""
    if os.path.exists(database_name):
        con = lite.connect(database_name)
    else:
        error = "Database file does not exist."
        to_log(error)
        return False
    if con:
        try:
            cur = con.cursor()
            data = str(data)
            cur.execute('insert into %s values(%s)' % (table, data))
            con.commit()
            con.close()
        except Exception as e:
            pre_error = "Database insert raised an error;\n"
            thrown_error = pre_error + str(e)
            to_log(thrown_error)
        finally:
            con.close()
    else:
        error = "No connection to database"
        to_log(error)
        return False
database_name etc. are defined elsewhere in the script.
Barring any other obvious glaring errors, what I need to be able to do (by this method, or some other if there are suggestions) is allow somebody to create a list where each value represents a column value, since I will not know how many columns are being populated.
So somebody uses it as follows:
data = ["null", "foo","bar"]
db_insert("foo_table", data)
This inserts that data into the table named foo_table. It is up to the user to know how many columns are in the table and to supply the correct number of elements.
I realise that it is better to use SQLite parameters, but there are two problems.
First, you cannot use a parameter to specify the table, only the values.
Second, you need to know how many values you are supplying; you have to do:
cur.execute('insert into table values(?,?,?)', (val1, val2, val3))
You need to be able to specify the three ?'s.
I am trying to write a general function that allows me to take an arbitrary number of values and insert them into an arbitrary table name.
Now, it was working relatively OK until I tried to pass in 'null' as a value.
One of the columns is the primary key and has an autoincrement. So passing in null will allow it to autoincrement. There will also be other instances where nulls would be required.
The problem is that Python keeps wrapping my null in single quotes, which SQLite complains about as a datatype mismatch, because the primary key is an integer field. If I try passing None as the Python null equivalent, the same thing happens.
So two problems.
How to insert an arbitrary number of columns.
How to pass a null.
Thank you for all your help on this and past questions.
Sorry, this looks like a duplicate of this:
Using Python quick insert many columns into Sqlite\Mysql
My apologies, I did not find it until after I wrote this.
That results in the following, which works:
def db_insert(table, data):
    """
    Insert data into a table. The data should be a tuple
    matching the number of columns, with null for any columns that
    have no value. False is returned on any error; the error is logged to
    the database log file."""
    if os.path.exists(database_name):
        con = lite.connect(database_name)
    else:
        error = "Database file does not exist."
        to_log(error)
        return False
    if con:
        try:
            tuple_len = len(data)
            holders = ','.join('?' * tuple_len)
            sql_query = 'insert into {0} values({1})'.format(table, holders)
            cur = con.cursor()
            #data = str(data)
            #cur.execute('insert into readings values(%s)') % table
            cur.execute(sql_query, data)
            con.commit()
            con.close()
        except Exception as e:
            pre_error = "Database insert raised an error;\n"
            thrown_error = pre_error + str(e)
            to_log(thrown_error)
        finally:
            con.close()
    else:
        error = "No connection to database"
        to_log(error)
        return False
The second problem is a "works for me": when I pass None as a value, it is correctly converted back and forth to and from the DB.
import sqlite3
conn = sqlite3.connect("test.sqlite")
conn.execute('CREATE TABLE IF NOT EXISTS "foo" (a, b)')  # create the demo table so the snippet runs standalone
data = ("a", None)
conn.execute('INSERT INTO "foo" VALUES(' + ','.join("?" * len(data)) + ')', data)
list(conn.execute("SELECT * FROM foo"))  # -> [("a", None)]
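The same trick covers the autoincrement case from the question: binding None for the INTEGER PRIMARY KEY column lets SQLite assign the next id itself. A minimal sketch (the readings schema below is assumed for illustration, not taken from the question):
import sqlite3
conn = sqlite3.connect(":memory:")
# hypothetical schema standing in for the real readings table
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT)")
data = (None, "foo")  # None in the primary key position
placeholders = ','.join('?' * len(data))
conn.execute("INSERT INTO readings VALUES({})".format(placeholders), data)
list(conn.execute("SELECT * FROM readings"))  # -> [(1, "foo")]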
Related
I am trying to learn how to save a dataframe created in pandas into a PostgreSQL DB (hosted on Azure). I planned to start with simple dummy data:
data = {'a': ['x', 'y'],
        'b': ['z', 'p'],
        'c': [3, 5]
        }
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
I found a function that pushes df data into a psql table. It starts with defining the connection:
import sys
import psycopg2

def connect(params_dic):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1)
    print("Connection successful")
    return conn
conn = connect(param_dic)
*param_dic contains all connection details (user/pass/host/db)
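For reference, param_dic is just a dict of keyword arguments for psycopg2.connect; a hypothetical example with placeholder values:
param_dic = {
    "host": "your-server.postgres.database.azure.com",  # placeholder host name
    "dbname": "your_database",
    "user": "your_user",
    "password": "your_password",
    "port": 5432,
}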
Once the connection is established, I'm defining the execute function:
def execute_many(conn, df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = [tuple(x) for x in df.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
    cursor = conn.cursor()
    try:
        cursor.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("execute_many() done")
    cursor.close()
I executed this function against a psql table that I created in the DB:
execute_many(conn,df,"raw_data.test")
The table raw_data.test consists of columns a(char[]), b(char[]), c(numeric).
When I run the code I get following information in the console:
Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
^
DETAIL: Array value must start with "{" or dimension information.
I don't know how to interpret it, because none of the columns in df are arrays:
df.dtypes
Out[185]:
a object
b object
c int64
dtype: object
Any ideas what goes wrong there, or suggestions for how to save the df in pSQL in a simpler manner? I found quite a lot of solutions that use sqlalchemy, creating the connection string in the following way:
conn_string = 'postgres://user:password@host/database'
But I am not sure if that works on a cloud DB; if I try to edit such a connection string with the Azure host details, it does not work.
The usual data type for strings in PostgreSQL is TEXT or VARCHAR(n) or CHAR(n), with round brackets; not CHAR[] with square brackets.
I'm guessing that you want the column to contain a string and that CHAR[] was a typo; in that case, you'll need to recreate (or migrate) the table column to the correct type - most likely TEXT.
(You might use CHAR(n) for fixed-length data, if it's genuinely fixed-length; VARCHAR(n) is mostly of historical interest. In most cases, use TEXT.)
Alternatively, if you do mean to make the column an array, you'll need to pass a list in that position from Python.
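For example, a hedged sketch of the migration (assuming the dummy rows never loaded, so dropping and recreating the empty table is acceptable; column names are taken from the question):
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS raw_data.test")
cur.execute("CREATE TABLE raw_data.test (a TEXT, b TEXT, c NUMERIC)")
conn.commit()
cur.close()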
Consider adjusting your parameterization approach, as psycopg2 supports a better way to format identifiers such as table or column names in SQL statements.
In fact, the docs indicate your current approach is not optimal and poses a security risk:
# This works, but it is not optimal
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
Instead, use the psycopg2.sql module:
from psycopg2 import sql
...
query = (
    sql.SQL("insert into {} values (%s, %s, %s)")
    .format(sql.Identifier('table'))
)
...
cur.executemany(query, tuples)
Also, as a best practice in SQL, always include column names in append queries and do not rely on the column order of the stored table:
query = (
    sql.SQL("insert into {0} ({1}, {2}, {3}) values (%s, %s, %s)")
    .format(
        sql.Identifier('table'),
        sql.Identifier('col1'),
        sql.Identifier('col2'),
        sql.Identifier('col3')
    )
)
Finally, discontinue using % for string formatting across all your Python code (not just psycopg2). As of Python 3, this method has been de-emphasized but not deprecated yet. Instead, use str.format (Python 2.6+) or f-strings (Python 3.6+).
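For ordinary (non-SQL) strings, for example:
table = "raw_data.test"
msg_old = "Inserting into %s" % table        # %-formatting: avoid
msg_new = "Inserting into {}".format(table)  # str.format, Python 2.6+
msg_f = f"Inserting into {table}"            # f-string, Python 3.6+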
Question: How do I append my dataframe to the database so that it checks whether stock_ticker exists, and only appends the rows where stock_ticker does not exist?
This is the process that I followed:
Import CSV file to pandas dataframe
Assign column names to be same as in database
Sending the dataframe to the database using the code below, but getting
sqlite3.IntegrityError: UNIQUE constraint failed: stocks.stock_ticker
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
conn.commit()
I looked at other IntegrityError cases, but can't seem to find one that works with appending dataframes. I found and tried this, but all it does is not append anything.
try:
    conn = sqlite3.connect('stockmarket.db')
    c = conn.cursor()
    df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
    conn.commit()
except sqlite3.IntegrityError:
    print("Already in database")
I am not sure I am understanding the iterating thing correctly:
How to iterate over rows in a DataFrame in Pandas
So I tried this, but it just prints out "Already in database" for each of them, even though there are 4 new stock tickers.
for index, row in df.iterrows():
    try:
        conn = sqlite3.connect('stockmarket.db')
        c = conn.cursor()
        df.to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        print("Already in database")
The database looks like this
Any insight much appreciated :)
It looks like this happens because Pandas doesn't allow declaring a proper ON CONFLICT policy for the case where you append data and a row has the same (unique) primary key or violates some other UNIQUE constraint. if_exists only refers to the whole table itself, not to each individual row.
I think you already came up with a pretty good answer, and maybe with a small modification it would work for you:
# After connecting
for i in range(len(df)):
    try:
        df[df.index == i].to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        pass
Now, this might be a problem if a newer value appears in your Pandas data and you want it to replace the old one in the database. In that case, you might want to use the raw SQL command as a string and pass the Pandas values iteratively. For example:
insert_statement = """
    INSERT INTO stocks (stock_id,
                        stock_ticker,
                        {other columns})
    VALUES (?, ?, {as many ? as columns})
    ON CONFLICT (stock_id) DO UPDATE
    SET {Define which values you will update on conflict}"""
And then you could run
for i in range(len(df)):
    values = tuple(df.iloc[i])
    cursor.execute(insert_statement, values)
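As a hedged, concrete sketch of that template for the stocks table (stock_ticker is the UNIQUE column from your error message; price stands in for whatever other columns you actually have; SQLite 3.24+ is required for ON CONFLICT ... DO UPDATE):
insert_statement = """
    INSERT INTO stocks (stock_id, stock_ticker, price)
    VALUES (?, ?, ?)
    ON CONFLICT (stock_ticker) DO UPDATE SET price = excluded.price"""
for i in range(len(df)):
    values = df.iloc[i].tolist()  # plain Python values for one row
    c.execute(insert_statement, values)
conn.commit()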
I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES(%s, %s, ...), one %s per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES(%s,...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
So I could connect to a Postgres DB and insert values from the df.
I get these two errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the columns can't get the values properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Since you also have dynamic column names, you should use psycopg2.sql to create the statement and then use the standard method of passing query parameters to psycopg2 instead of using format.
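A minimal sketch of what that could look like, assuming the mrr.shipments table and the conn/df names from your code (psycopg2 2.8+ is needed for the schema-qualified Identifier):
import psycopg2.extras
from psycopg2 import sql

columns = list(df)
insert_stmt = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
    sql.Identifier("mrr", "shipments"),                    # schema-qualified table name
    sql.SQL(", ").join(map(sql.Identifier, columns)),      # quoted column names
    sql.SQL(", ").join(sql.Placeholder() * len(columns)),  # one placeholder per column
)
cur = conn.cursor()
psycopg2.extras.execute_batch(cur, insert_stmt.as_string(conn), df.values.tolist())
conn.commit()
cur.close()
Quoting the column names this way also preserves their case; unquoted, PostgreSQL folds mainFreight_freight_motherVesselDepartDateTime to lowercase, which is likely why the column was reported as not existing.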
I am working on a small project where I need to insert values into different tables based on the slave id.
For example, if the user is sending multiple values with slave id 1, I need to insert them into the table "Instant_Values_S1", and if the slave id is 2, into "Instant_Values_S2".
This is my code:
c.execute("INSERT INTO Instant_Values_S1 VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
(
1,
utctime,
data[0],
data[1],
data[2],
data[3],
data[4],
data[5],
data[6],
data[7],
data[8],
data[9],
data[10],
data[11],
data[12],
data[13],
data[14],
))
conn.commit()
conn.close()
Here I define the table name again and again; instead, I want to set it automatically. Can someone help me out?
Thanks
AFAIK, this is a known limitation of SQL: you can use parameters for values, but not for table or column names. But a SQL command is just a string, and you can use Python to build the static part of the request. More or less:
table_names = [None, "Instant_Values_S1", "Instant_Values_S2" ]
# will raise exception if slave_id is not 1 or 2
req = "INSERT INTO " + table_names[slave_id] + " VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
c.execute(req, (slave_id, ...))
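A slightly more defensive sketch of the same idea (the helper name is hypothetical; the table names are from the question), using a dict so an unknown slave id fails with a clear error:
TABLES = {1: "Instant_Values_S1", 2: "Instant_Values_S2"}

def insert_instant_values(conn, slave_id, utctime, data):
    try:
        table = TABLES[slave_id]  # only the table name is built into the string
    except KeyError:
        raise ValueError("unknown slave id: {!r}".format(slave_id))
    placeholders = ",".join("?" * (len(data) + 2))  # slave id + timestamp + readings
    conn.execute("INSERT INTO {} VALUES({})".format(table, placeholders),
                 (slave_id, utctime) + tuple(data))
    conn.commit()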
I've spent some time reading the SQLite docs, various questions and answers here on Stack Overflow, and this thing, but have not come to a full answer.
I know that there is no way to do something like INSERT OR IGNORE INTO foo VALUES(...) with SQLite and get back the rowid of the original row, and that the closest to it would be INSERT OR REPLACE, but that deletes the entire row and inserts a new row, and thus gets a new rowid.
Example table:
CREATE TABLE foo(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    data TEXT
);
Right now I can do:
sql = sqlite3.connect(":memory:")
# create database
sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?);", ("Some text.", ))
the_id_of_the_row = None
for row in sql.execute("SELECT id FROM foo WHERE data = ?", ("Some text.", )):
    the_id_of_the_row = row[0]
But something ideal would look like:
the_id_of_the_row = sql.execute("INSERT OR IGNORE foo(data) VALUES(?)", ("Some text", )).lastrowid
What is the best (read: most efficient) way to insert a row into a table and return the rowid, or to ignore the row if it already exists and just get the rowid? Efficiency is important because this will be happening quite often.
Is there a way to INSERT OR IGNORE and return the rowid of the row that the ignored row was compared to? This would be great, as it would be just as efficient as an insert.
The way that worked best for me was to insert or ignore the values, and then select the rowid, in two separate steps. I used a unique constraint on the data column both to speed up selects and to avoid duplicates.
sql.execute("INSERT OR IGNORE INTO foo(data) VALUES(?);" ("Some text.", ))
last_row_id = sql.execute("SELECT id FROM foo WHERE data = ?;" ("Some text. ", ))
The select statement isn't as slow as I thought it would be. This, it seems, is due to SQLite automatically creating an index for the unique columns.
INSERT OR IGNORE is for situations where you do not care about the identity of the record; where the goal is only to have some record with that specific value.
If you want to know whether a new record is inserted or not, you have to check by hand:
the_id_of_the_row = None
for row in sql.execute("SELECT id FROM foo WHERE data = ?", ...):
    the_id_of_the_row = row[0]
if the_id_of_the_row is None:
    c = sql.cursor()
    c.execute("INSERT INTO foo(data) VALUES(?)", ...)
    the_id_of_the_row = c.lastrowid
As for efficiency: when SQLite checks the data column for duplicates, it has to do exactly the same query that you're doing with the SELECT, and once you've done that, the access path is in the cache, so performance should not be a problem. In any case, it is necessary to execute two separate INSERT/SELECT queries (in either order; both your code and mine work, but yours is simpler).
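Putting the two steps together, a small sketch (it assumes the foo table from the question plus a UNIQUE constraint on data, as in the accepted answer; remember to commit afterwards):
def get_or_create_id(conn, value):
    """Insert value if it is not in foo yet, and return the row's id either way."""
    cur = conn.cursor()
    cur.execute("INSERT OR IGNORE INTO foo(data) VALUES(?)", (value,))
    if cur.rowcount:  # a new row was actually inserted
        return cur.lastrowid
    cur.execute("SELECT id FROM foo WHERE data = ?", (value,))
    return cur.fetchone()[0]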