I am trying to learn how to save a DataFrame created in pandas to a PostgreSQL database (hosted on Azure). I planned to start with simple dummy data:
import pandas as pd

data = {'a': ['x', 'y'],
        'b': ['z', 'p'],
        'c': [3, 5]
        }
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
I found a function that pushed df data into psql table. It starts with defining connection:
import sys
import psycopg2

def connect(params_dic):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1)
    print("Connection successful")
    return conn

conn = connect(param_dic)
*param_dic contains all connection details (user/pass/host/db)
Once the connection is established, I define the execute function:
def execute_many(conn, df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = [tuple(x) for x in df.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
    cursor = conn.cursor()
    try:
        cursor.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("execute_many() done")
    cursor.close()
I ran this function against a table that I created in the DB:
execute_many(conn,df,"raw_data.test")
The table raw_data.test consists of columns a(char[]), b(char[]), c(numeric).
When I run the code I get the following output in the console:
Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
^
DETAIL: Array value must start with "{" or dimension information.
I don't know how to interpret it, because none of the columns in df are arrays:
df.dtypes
Out[185]:
a object
b object
c int64
dtype: object
Any ideas what is going wrong here, or suggestions for a simpler way to save a df to PostgreSQL? I found quite a lot of solutions that use sqlalchemy and create a connection string in the following way:
conn_string = 'postgres://user:password#host/database'
But I am not sure whether that works with a cloud DB: if I edit such a connection string with the Azure host details, it does not work.
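The usual data type for strings in PostgreSQL is TEXT or VARCHAR(n) or CHAR(n), with round brackets; not CHAR[] with square brackets. CHAR[] declares an array of CHAR, which is why PostgreSQL tries to parse 'x' as an array literal and rejects it.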
I'm guessing that you want the column to contain a string and that CHAR[] was a typo; in that case, you'll need to recreate (or migrate) the table column to the correct type - most likely TEXT.
(You might use CHAR(n) for fixed-length data, if it's genuinely fixed-length; VARCHAR(n) is mostly of historical interest. In most cases, use TEXT.)
Alternately, if you do mean to make the column an array, you'll need to pass a list in that position from Python.
Consider adjusting your parameterization approach: psycopg2 provides a better way to format identifiers such as table or column names in SQL statements.
In fact, the docs indicate that your current approach is not optimal and poses a security risk:
# This works, but it is not optimal
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
Instead, use the psycopg2.sql module:
from psycopg2 import sql
...
query = (
    sql.SQL("insert into {} values (%s, %s, %s)")
    .format(sql.Identifier('table'))
)
...
cur.executemany(query, tuples)
Also, as a best practice in SQL, always include column names in insert queries and do not rely on the column order of the stored table:
query = (
    sql.SQL("insert into {0} ({1}, {2}, {3}) values (%s, %s, %s)")
    .format(
        sql.Identifier('table'),
        sql.Identifier('col1'),
        sql.Identifier('col2'),
        sql.Identifier('col3')
    )
)
Finally, discontinue using % for string formatting across all your Python code (not just psycopg2). As of Python 3, this method has been de-emphasized but not deprecated yet! Instead, use str.format (Python 2.6+) or f-strings (Python 3.6+).
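For ordinary (non-SQL) strings, the three formatting styles mentioned above give the same result. A minimal sketch, where the values of `table` and `n_rows` are made up purely for illustration:

```python
# Hypothetical values, used only to illustrate the three formatting styles.
table = "raw_data.test"
n_rows = 2

# Old-style % formatting (still works, but is the style advised against above).
msg_percent = "Inserted %d rows into %s" % (n_rows, table)

# str.format, available since Python 2.6 (explicit indexes keep it 2.6-compatible).
msg_format = "Inserted {0} rows into {1}".format(n_rows, table)

# f-string, available since Python 3.6.
msg_fstring = f"Inserted {n_rows} rows into {table}"

print(msg_percent)
print(msg_format)
print(msg_fstring)
```

This applies to log messages and similar text only. None of these styles should be used to put values into a SQL statement; values still belong in the driver's placeholders.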
Related
My database is PostgreSQL and runs locally.
I have an array of the form:
[1,2,3,...2600]
As you can see, it is a very long array, so I can't type the elements one by one to insert them.
So I wanted to use the unnest() function to turn it into rows like this:
1
2
3
...
2600
and maybe go from there.
However, I would still need to write the call out in full, like unnest(array [1,...,2600]), and of course that didn't work.
So how do I insert an array as rows of the same column in one go?
You can use execute_values to bulk-insert all your data into your table:
import psycopg2
from psycopg2.extras import execute_values
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
insert_query = "insert into table_name (col_name) values %s"
# create payload as list of tuples
data = [(i,) for i in range(1, 2601)]
execute_values(cursor, insert_query, data)
conn.commit()
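The shape of the payload is the important part: one tuple per row, even when a row has a single column. The sketch below shows the same idea with the standard-library sqlite3 module and its executemany() in place of psycopg2's execute_values. That swap is an assumption made only so the example runs without a PostgreSQL server, and the table name `numbers` and column `n` are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")

# One single-element tuple per row: [(1,), (2,), ..., (2600,)]
data = [(i,) for i in range(1, 2601)]

# sqlite3 uses ? as its placeholder, where psycopg2 uses %s.
conn.executemany("INSERT INTO numbers (n) VALUES (?)", data)
conn.commit()

# Check what ended up in the table.
row_count, smallest, largest = conn.execute(
    "SELECT COUNT(*), MIN(n), MAX(n) FROM numbers"
).fetchone()
print(row_count, smallest, largest)
conn.close()
```

Each element of the array becomes its own row in the same column, which is what unnest() was meant to achieve.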
I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES(%s, %s, ...) one %s per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES(%s, ...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
so that I can connect to a Postgres DB and insert values from a df.
I get these 2 errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the values don't get matched to the columns properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
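Since you also have dynamic column names, you should use psycopg2.sql to build the statement and then pass the query parameters to psycopg2 in the standard way instead of using format. This also addresses the error itself: PostgreSQL folds unquoted identifiers to lower case, so your mixed-case column name is looked up in lower case and not found, whereas sql.Identifier double-quotes each name and preserves its case.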
Ok so basically I'm trying to update an existing SQLite3 Database with instance variables (typ and lvl)
# Set variables
typ = 'Test'
lvl = 6
# Print database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)
# Update info
sql = """
UPDATE fieldmap
SET buildtype = typ, buildlevel = lvl
WHERE rowid = 11
"""
cursor.execute(sql)
# Print database
print("\nHere's a listing of all the records in the table:\n")
for row in cursor.execute("SELECT rowid, * FROM fieldmap ORDER BY rowid"):
    print(row)
The error I'm getting is:
sqlite3.OperationalError: no such column: typ
Now I basically know the problem is that my variable is inserted with the wrong syntax but I can not for the life of me find the correct one. It works with strings and ints just fine like this:
sql = """
UPDATE fieldmap
SET buildtype = 'house', buildlevel = 3
WHERE rowid = 11
"""
But as soon as I switch to the variables it throws the error.
Your query is not actually inserting the values of the variables typ and lvl into the query string. As written the query is trying to reference columns named typ and lvl, but these don't exist in the table.
Try writing it as a parameterised query:
sql = """
UPDATE fieldmap
SET buildtype = ?, buildlevel = ?
WHERE rowid = 11
"""
cursor.execute(sql, (typ, lvl))
The ? acts as a placeholder in the query string which is replaced by the values in the tuple passed to execute(). This is a secure way to construct the query and avoids SQL injection vulnerabilities.
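To try this without touching the real database, here is a minimal self-contained sketch using an in-memory SQLite database. The two-column layout of fieldmap and the row being updated are assumptions for illustration; the named-placeholder variant (:typ, :lvl) is an alternative that sqlite3 also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Assumed layout: the real fieldmap table may have more columns.
cursor.execute("CREATE TABLE fieldmap (buildtype TEXT, buildlevel INTEGER)")
cursor.execute("INSERT INTO fieldmap VALUES ('house', 3)")

typ = 'Test'
lvl = 6
target_rowid = 1  # the only row in this toy table

# Positional placeholders: values are bound in order from the tuple.
cursor.execute(
    "UPDATE fieldmap SET buildtype = ?, buildlevel = ? WHERE rowid = ?",
    (typ, lvl, target_rowid),
)
after_positional = cursor.execute(
    "SELECT buildtype, buildlevel FROM fieldmap WHERE rowid = ?",
    (target_rowid,),
).fetchone()

# Named placeholders: values are looked up by key in the dict.
cursor.execute(
    "UPDATE fieldmap SET buildtype = :typ, buildlevel = :lvl WHERE rowid = :rowid",
    {"typ": "Tower", "lvl": 7, "rowid": target_rowid},
)
after_named = cursor.execute(
    "SELECT buildtype, buildlevel FROM fieldmap WHERE rowid = ?",
    (target_rowid,),
).fetchone()
conn.commit()

print(after_positional)
print(after_named)
```

The rowid can be passed as a parameter too, so the statement text never has to change between calls.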
Hey, I think you should use an ORM to work with a SQL database.
SQLAlchemy is your friend. I use it with SQLite, MySQL and PostgreSQL. It is fantastic.
It can spare you this kind of syntax error, since SQL is strict about commas and quotation marks.
For hard coding, you may try this:
sql = """
UPDATE fieldmap
SET buildtype = '%s', buildlevel = 3
WHERE rowid = 11
""" % typ
This can solve your problem temporarily but not in the long run, because interpolating values into the SQL string yourself leaves the query open to SQL injection. ORM is your friend.
Hope this could be helpful!
I'm trying to insert a number (the size of a set) into my MySQL database using MySQL Connector/Python. I am able to add data manually, but the following statement with %s won't work. I tried several variations on this, but nothing from the documentation seems to work in my case. The table was already built, as you can see:
# Create table:
#cursor.execute('''CREATE TABLE anzahlids( tweetid INT )''')
Here is my code and the error:
print(len(idset))
id_data = [
    len(idset)
]
print(id_data)
insert = ("""INSERT INTO anzahlids (idnummer) VALUES (%s)""")
cursor.executemany(insert, id_data)
db_connection.commit()
"Failed processing format-parameters; %s" % e)
mysql.connector.errors.ProgrammingError: Failed processing format-parameters; argument 2 to map() must support iteration
Late answer, but I would like to post some nicer code. Also, the original question was using MySQL Connector/Python.
The use of executemany() is wrong. The executemany() method expects a sequence of tuples, for example, [ (1,), (2,) ].
For the problem at hand, executemany() is actually not useful and execute() should be used:
cur.execute("DROP TABLE IF EXISTS anzahlids")
cur.execute("CREATE TABLE anzahlids (tweetid INT)")
some_ids = [1, 2, 3, 4, 5]
cur.execute("INSERT INTO anzahlids (tweetid) VALUES (%s)",
            (len(some_ids),))
cnx.commit()
And with MySQL Connector/Python (unlike with MySQLdb), you have to make sure you are committing.
(Note for non-German speakers: 'anzahlids' means 'number_of_ids')
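The trailing comma in (len(some_ids),) matters: it makes a one-element tuple, whereas (len(some_ids)) is just an integer in parentheses. A minimal sketch of both calls follows. It uses the standard-library sqlite3 module as a stand-in for MySQL Connector/Python, an assumption made only so that it runs without a MySQL server; note that sqlite3 uses ? where the MySQL drivers use %s.

```python
import sqlite3

some_ids = [1, 2, 3, 4, 5]

not_a_tuple = (len(some_ids))   # parentheses only: still the int 5
one_tuple = (len(some_ids),)    # trailing comma: the tuple (5,)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE anzahlids (tweetid INT)")

# execute(): one row, parameters given as a single tuple.
cur.execute("INSERT INTO anzahlids (tweetid) VALUES (?)", one_tuple)

# executemany(): several rows, parameters given as a sequence of tuples.
cur.executemany("INSERT INTO anzahlids (tweetid) VALUES (?)", [(1,), (2,)])
conn.commit()

rows = cur.execute("SELECT tweetid FROM anzahlids ORDER BY rowid").fetchall()
print(rows)
conn.close()
```

A bare list such as [5] is a sequence of integers, not of tuples, which is the shape the original executemany() call was given.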
The following is an example that worked on my machine.
import MySQLdb
db = MySQLdb.connect(host="localhost", user="stackoverflow", passwd="", db="stackoverflow")
cursor = db.cursor()
# create the table if it is not there yet
cursor.execute('create table if not exists anzahlids( tweetid int )')
sql = ("""INSERT INTO anzahlids (tweetid) VALUES (%s)""")
data = [1,2,3,4,5,6,7,8,9]
# executemany() takes a sequence of parameter tuples, one per row
length = [(len(data),)]
cursor.executemany(sql, length)
db.commit()
If you only need to insert a single value, you can use execute() and pass it as a one-element tuple:
sql = ("""INSERT INTO anzahlids (tweetid) VALUES (%s)""")
cursor.execute(sql, (len(idset),))
db.commit()
I am working on a small project and I have created a helper function that writes a string of comma-separated values to a database as if they were column values. I realise there are implications to doing it this way, but the project is small and I need to get it going until I can do better.
def db_insert(table,data):
    """
    insert data into a table, the data should be a tuple
    matching the number of columns with null for any columns that
    have no value. False is returned on any error, error is logged to
    database log file."""
    if os.path.exists(database_name):
        con = lite.connect(database_name)
    else:
        error = "Database file does not exist."
        to_log(error)
        return False
    if con:
        try:
            cur = con.cursor()
            data = str(data)
            cur.execute('insert into %s values(%s)' % (table, data))
            con.commit()
            con.close()
        except Exception as e:
            pre_error = "Database insert raised an error;\n"
            thrown_error = pre_error + str(e)
            to_log(thrown_error)
        finally:
            con.close()
    else:
        error = "No connection to database"
        to_log(error)
        return False
database_name etc... are defined elsewhere in the script.
Barring any other glaring errors, what I need (by this method or some other, if there are suggestions) is to let somebody pass a list where each element is a column value, since I will not know in advance how many columns are being populated.
Somebody would use it as follows:
data = ["null", "foo","bar"]
db_insert("foo_table", data)
This inserts that data into the table named foo_table. It is up to the user to know how many columns are in the table and to supply the correct number of elements.
I realise that it is better to use sqlite parameters, but there are two problems.
First, you cannot use a parameter to specify the table, only the values.
Second, you need to know how many values you are supplying. You have to write:
cur.execute('insert into table values(?,?,?)', (val1, val2, val3))
so you need to be able to specify the three ?'s.
I am trying to write a general function that takes an arbitrary number of values and inserts them into an arbitrary table.
Now, it was working relatively OK until I tried to pass in 'null' as a value.
One of the columns is the primary key and has an autoincrement, so passing in null lets it autoincrement. There will also be other instances where nulls are required.
The problem is that Python keeps wrapping my null in single quotes, which sqlite complains about as a datatype mismatch because the primary key is an integer field. If I try passing None as the Python null equivalent, the same thing happens.
So two problems.
How to insert an arbitrary number of columns.
How to pass a null.
Thank you for all your help on this and past questions.
Sorry, this looks like a duplicate of this
Using Python quick insert many columns into Sqlite\Mysql
My apologies, I did not find it until after I wrote this.
It results in the following, which works:
def db_insert(table,data):
    """
    insert data into a table, the data should be a tuple
    matching the number of columns with null for any columns that
    have no value. False is returned on any error, error is logged to
    database log file."""
    if os.path.exists(database_name):
        con = lite.connect(database_name)
    else:
        error = "Database file does not exist."
        to_log(error)
        return False
    if con:
        try:
            tuple_len = len(data)
            holders = ','.join('?' * tuple_len)
            sql_query = 'insert into %s values({0})'.format(holders) % table
            cur = con.cursor()
            #data = str(data)
            #cur.execute('insert into readings values(%s)') % table
            cur.execute(sql_query, data)
            con.commit()
            con.close()
        except Exception as e:
            pre_error = "Database insert raised an error;\n"
            thrown_error = pre_error + str(e)
            to_log(thrown_error)
        finally:
            con.close()
    else:
        error = "No connection to database"
        to_log(error)
        return False
The second problem is a "works for me". When I pass None as a value through a ? placeholder (rather than through str()), it is correctly converted to NULL and back again on the way to and from the db.
import sqlite3
conn = sqlite3.connect("test.sqlite")
data = ("a", None)
conn.execute('INSERT INTO "foo" VALUES(' + ','.join("?" * len(data)) + ')', data)
list(conn.execute("SELECT * FROM foo")) # -> [("a", None)]
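Putting both points together, here is a minimal sketch of a generic insert helper, assuming Python 3 and the standard-library sqlite3 module. Unlike the function above, this db_insert takes an already open connection, and the check of the table name against sqlite_master is an addition for illustration, not part of the original code. The layout of foo_table is made up.

```python
import sqlite3


def db_insert(con, table, data):
    """Insert one row; `data` needs one value per column, None for NULL."""
    # Table names cannot be bound as parameters, so only accept a name
    # that already exists in this database before formatting it in.
    known_tables = {
        name for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    if table not in known_tables:
        raise ValueError("unknown table: {0!r}".format(table))

    # One ? per value, e.g. "?,?,?" for three columns.
    holders = ",".join("?" * len(data))
    query = 'INSERT INTO "{0}" VALUES ({1})'.format(table, holders)
    con.execute(query, tuple(data))
    con.commit()


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE foo_table ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, a TEXT, b TEXT)"
)

# None for the primary key lets SQLite assign the next id.
db_insert(con, "foo_table", [None, "foo", "bar"])
# None in an ordinary column is stored as NULL.
db_insert(con, "foo_table", [None, "baz", None])

rows = con.execute("SELECT id, a, b FROM foo_table ORDER BY id").fetchall()
print(rows)
```

The values never pass through str(), so None reaches SQLite as NULL rather than as the quoted text 'None' or 'null'.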