How to convert numpy array to postgresql list - python

I am trying to use python to insert 2 columns of a numpy array into a postgresql table as two arrays.
The PostgreSQL table is DOS:
primary_key
energy integer[]
dos integer[]
I have a 2D numpy array made of two 1D arrays:
finArray = np.array([energy,dos])
I am trying to use the following script to insert into the database and I keep getting errors with the insert. I can't figure out how to format the array so that it ends up in the form: INSERT INTO dos VALUES(1,'{1,2,3}','{1,2,3}')
Script:
import psycopg2
import argparse
import sys
import re
import numpy as np
import os
con = None
try:
    con = psycopg2.connect(database='bla', user='bla')
    cur = con.cursor()
    cur.execute("INSERT INTO dos VALUES(1,'{%s}')", [str(finArray[0:3,0].tolist())[1:-1]])
    con.commit()
except psycopg2.DatabaseError, e:
    if con:
        con.rollback()
    print 'Error %s' % e
    sys.exit(1)
finally:
    if con:
        con.close()
The part I can't figure out is that I keep getting errors like this:
Error syntax error at or near "0.31691105000000003"
LINE 1: INSERT INTO dos VALUES(1,'{'0.31691105000000003, -300.0, -19...
I can't figure out where that inner quote after the brace is coming from.

Too late, but putting this out anyway.
I was trying to insert a numpy array into Redshift today. After trying odo, df.to_sql() and what not, I finally got this to work at a pretty fast speed (~3k rows/minute). I won't talk about the issues I faced with those tools but here's something simple that works:
cursor = conn.cursor()
args_str = b','.join(cursor.mogrify("(%s,%s,...)", x) for x in tuple(map(tuple,np_data)))
cursor.execute("insert into table (a,b,...) VALUES "+args_str.decode("utf-8"))
conn.commit()
cursor.close()
The 2nd line will need some work based on the dimensions of your array.
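For a concrete two-column case, the mogrify line might look like this (a sketch only; my_table, its columns a and b, and the two-column np_data are assumptions, not from the original answer):
# Each record is one row of a two-column array; tolist() converts to native Python types
rows = np_data.tolist()
args_str = b",".join(cursor.mogrify("(%s,%s)", row) for row in rows)
cursor.execute("insert into my_table (a, b) values " + args_str.decode("utf-8"))
conn.commit()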
You might want to check these answers too:
Converting from numpy array to tuple
Multiple row inserts in psycopg2

You probably have an array of strings; try changing your command by adding astype(float), like:
cur.execute("INSERT INTO dos VALUES(1,'{%s}')", [str(finArray[0:3,0].astype(float).tolist())[1:-1]])

The quotes appear during numpy.ndarray.tolist() because you actually have strings. If you don't want to assume the data is float-typed, as @Saullo Castro suggested, you could also do a simple str(finArray[0:3,0].tolist()).replace("'","")[1:-1] to get rid of them.
However, more appropriately, if you are treating the data in finArray in any way in your script and assume they are numbers, you should probably make sure they are imported into the array as numbers to start with.
You can require the array to have a certain datatype when creating it, e.g. finArray = np.array(..., dtype=float), and then work backwards to where it is suitable to enforce the type.
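For example (a minimal sketch; it enforces the dtype when the array is built and then lets psycopg2 render the list as an array literal instead of hand-building the '{...}' string):
finArray = np.array([energy, dos], dtype=float)   # numeric from the start

# Passing the Python list lets psycopg2 build the array literal itself;
# the slicing follows the question's code
cur.execute("INSERT INTO dos VALUES (1, %s)", (finArray[0:3, 0].tolist(),))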

Psycopg will adapt a Python list to an array, so you just have to cast the numpy array to a Python list and pass it to the execute method:
import psycopg2
import numpy as np
energy = [1, 2, 3]
dos = [1, 2, 3]
finArray = np.array([energy,dos])
insert = """
insert into dos (pk, energy) values (1, %s);
"""
conn = psycopg2.connect("host=localhost4 port=5432 dbname=cpn")
cursor = conn.cursor()
cursor.execute(insert, (finArray[0:3,0].tolist(),))
conn.commit()
conn.close()
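Extending that to both array columns from the question's table (a sketch; pk follows the answer's naming, and tolist() converts the numpy scalars to native Python numbers that psycopg2 can adapt):
insert = "insert into dos (pk, energy, dos) values (%s, %s, %s)"
cursor.execute(insert, (1, finArray[0].tolist(), finArray[1].tolist()))
conn.commit()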

You need to convert the numpy array to a list, for example:
import numpy as np
import psycopg2
fecha=12
tipo=1
precau=np.array([20.35,25.34,25.36978])
conn = psycopg2.connect("dbname='DataBase' user='Administrador' host='localhost' password='pass'")
cur = conn.cursor()
#make a list
vec1=[]
for k in precau:
    vec1.append(k)
#make a query
query=cur.mogrify("""UPDATE prediccioncaudal SET fecha=%s, precaudal=%s WHERE idprecau=%s;""", (fecha,vec1,tipo))
#execute a query
cur.execute(query)
#save changes
conn.commit()
#close connection
cur.close()
conn.close()
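As a side note (not part of the original answer): precau.tolist() builds the same list in one step and also converts the elements to native Python floats:
vec1 = precau.tolist()  # equivalent to the loop above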

Related

Python saving data into PostgreSQL: array value error

I am trying to learn how to save a dataframe created in pandas into a PostgreSQL db (hosted on Azure). I planned to start with simple dummy data:
data = {'a': ['x', 'y'],
        'b': ['z', 'p'],
        'c': [3, 5]
        }
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
I found a function that pushed df data into psql table. It starts with defining connection:
def connect(params_dic):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1)
    print("Connection successful")
    return conn
conn = connect(param_dic)
*param_dic contains all connection details (user/pass/host/db)
Once the connection is established, I define the execute function:
def execute_many(conn, df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = [tuple(x) for x in df.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
    cursor = conn.cursor()
    try:
        cursor.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("execute_many() done")
    cursor.close()
I ran this function against a table that I created in the DB:
execute_many(conn,df,"raw_data.test")
The table raw_data.test consists of columns a(char[]), b(char[]), c(numeric).
When I run the code I get the following output in the console:
Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
^
DETAIL: Array value must start with "{" or dimension information.
I don't know how to interpret this, because none of the columns in df are arrays:
df.dtypes
Out[185]:
a    object
b    object
c     int64
dtype: object
Any ideas what is going wrong here, or suggestions for how to save the df into PostgreSQL in a simpler way? I found quite a lot of solutions that use sqlalchemy and build the connection string like this:
conn_string = 'postgres://user:password@host/database'
But I am not sure whether that works on a cloud DB; if I edit such a connection string with the Azure host details it does not work.
The usual data type for strings in PostgreSQL is TEXT or VARCHAR(n) or CHAR(n), with round brackets; not CHAR[] with square brackets.
I'm guessing that you want the column to contain a string and that CHAR[] was a typo; in that case, you'll need to recreate (or migrate) the table column to the correct type - most likely TEXT.
(You might use CHAR(n) for fixed-length data, if it's genuinely fixed-length; VARCHAR(n) is mostly of historical interest. In most cases, use TEXT.)
Alternatively, if you do mean to make the column an array, you'll need to pass a list in that position from Python.
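A minimal sketch of both options, using the names from the question (conn/cursor come from the question's connect(); the DROP/CREATE assumes the table only holds dummy data and can be recreated):
# Option 1: recreate the columns as TEXT so plain strings can be inserted
cursor.execute("DROP TABLE IF EXISTS raw_data.test")
cursor.execute("CREATE TABLE raw_data.test (a text, b text, c numeric)")
conn.commit()

# Option 2 (only if a and b really should be array columns): pass Python lists,
# which psycopg2 adapts to PostgreSQL arrays
cursor.execute(
    "INSERT INTO raw_data.test (a, b, c) VALUES (%s, %s, %s)",
    (['x'], ['z'], 3),
)
conn.commit()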
Consider adjusting your parameterization, since psycopg2 supports a better way to format identifiers (such as table or column names) in SQL statements.
In fact, the docs indicate your current approach is not optimal and poses a security risk:
# This works, but it is not optimal
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
Instead, use the psycopg2.sql module:
from psycopg2 import sql
...
query = (
    sql.SQL("insert into {} values (%s, %s, %s)")
    .format(sql.Identifier('table'))
)
...
cur.executemany(query, tuples)
Also, for best practice in SQL always include column names in append queries and do not rely on column order of stored table:
query = (
    sql.SQL("insert into {0} ({1}, {2}, {3}) values (%s, %s, %s)")
    .format(
        sql.Identifier('table'),
        sql.Identifier('col1'),
        sql.Identifier('col2'),
        sql.Identifier('col3')
    )
)
Finally, discontinue using % for string formatting across all your Python code (not just psycopg2). As of Python 3, this method has been de-emphasized but not deprecated yet! Instead, use str.format (Python 2.6+) or f-strings (Python 3.6+).
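A sketch tying this back to the question's execute_many (it assumes psycopg2 >= 2.8 for the multi-part Identifier and builds the placeholders from the dataframe's column count):
from psycopg2 import sql

def execute_many(conn, df, table):
    """Insert a dataframe with executemany, quoting all identifiers safely."""
    tuples = [tuple(x) for x in df.to_numpy()]
    query = sql.SQL("insert into {} ({}) values ({})").format(
        sql.Identifier(*table.split(".")),                     # handles schema-qualified names like raw_data.test
        sql.SQL(", ").join(map(sql.Identifier, df.columns)),
        sql.SQL(", ").join(sql.Placeholder() * len(df.columns)),
    )
    with conn.cursor() as cursor:
        cursor.executemany(query, tuples)
    conn.commit()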

Psycopg2 Postgres Unnesting a very long array to insert it into a tables column

My database is Postgres and it runs locally.
I have an array that is in the form of:
[1,2,3,...2600]
As you can see it is a very long array, so I can't type the elements one by one to insert them.
So I wanted to use the unnest() function to turn it into this:
1
2
3
...
2600
and maybe go from there
However, I would still have to write the literal out, like unnest(array[1, ..., 2600]), for that to work, and of course that didn't.
So how do I insert an array as rows of the same column at the same time?
You can use execute_values to bulk-insert all your data into your table:
import psycopg2
from psycopg2.extras import execute_values
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
insert_query = "insert into table_name (col_name) values %s"
# create payload as list of tuples
data = [(i,) for i in range(1, 2601)]
execute_values(cursor, insert_query, data)
conn.commit()
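Since the question specifically asks about unnest, an alternative sketch (same hypothetical table_name/col_name, and assuming an integer column) is to pass the whole Python list as one array parameter and let unnest expand it server-side:
values = list(range(1, 2601))
# psycopg2 adapts the Python list to a PostgreSQL array; unnest turns it into rows
cursor.execute(
    "insert into table_name (col_name) select unnest(%s::int[])",
    (values,),
)
conn.commit()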

Storing numpy array in sqlite3 database with python issue

I have a problem storing a numpy array in an sqlite database. I have one table with Name and Data.
import sqlite3 as sql
from DIP import dip # function to calculate numpy array
name = input('Enter your full name\t')
data = dip()
con = sql.connect('Database.db')
c = con.cursor()
c.execute("CREATE TABLE IF NOT EXISTS database(Name text, Vein real )")
con.commit()
c.execute("INSERT INTO database VALUES(?,?)", (name, data))
con.commit()
c.execute("SELECT * FROM database")
df = c.fetchall()
print(data)
print(df)
con.close()
Everything is fine, but when Data is stored, instead of this:
[('Name', 0.03908678 0.04326234 0.18298542 ..., 0.15228545 0.09972548 0.03992807)]
I have this:
[('Name', b'\xccX+\xa8.\x03\xa4?\xf7\xda[\x1f ..., x10l\xc7?\xbf\x14\x12\)]
What is the problem with this? Thank you.
P.S. I tried the solution from here Python insert numpy array into sqlite3 database but it didn't work. My numpy array is calculated with the skimage (scikit-image) library using HOG (histogram of oriented gradients). Maybe that's the problem...
I also tried to calculate and store it with OpenCV 3 but had the same issue.
On the assumption that it is saving data.tostring() to the database, I tried decoding it with fromstring.
Using your displayed string, and trimming off a few bytes I got:
In [79]: np.fromstring(b'\xccX+\xa8.\x03\xa4?\xf7\xda[\x1f\x10l\xc7?', float)
Out[79]: array([ 0.03908678, 0.18298532])
There's at least one matching number, so this looks promising.
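Note that np.fromstring is deprecated in recent numpy; the same round trip with tobytes()/np.frombuffer (a sketch, assuming the blob really was produced by tostring()/tobytes() and that the dtype is known) looks like:
import numpy as np

data = np.array([0.03908678, 0.04326234, 0.18298542])
blob = data.tobytes()                        # the bytes that end up in the BLOB column
restored = np.frombuffer(blob, dtype=float)  # recovers the values, but not the shape/dtype metadata
Because the dtype and shape are not stored in the blob, np.save/np.load keeps that metadata and is often the safer serialization.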
I had a similar issue and found out that sqlite has a problem storing numpy float types (np.float32 in my case).
Convert the values to native Python floats and it will work fine:
[float(x) for x in data]

How to retrieve data from SQLite faster in python

I have the following info in my database (example):
longitude (real): 70.74
userid (int): 12
This is how I fetch it:
import sqlite3 as lite
con = lite.connect(dbpath)
with con:
    cur = con.cursor()
    cur.execute('SELECT latitude, userid FROM message')
    con.commit()
    print "executed"
    while True:
        tmp = cur.fetchone()
        if tmp != None:
            info.append([tmp[0],tmp[1]])
        else:
            break
to get the info in the form [70.74, 12].
What else can I do to speed up this process? At 10,000,000 rows this takes approx. 50 seconds, and I'm aiming for 200,000,000 rows. I never get through it, possibly due to a memory leak or something like that?
From the sqlite3 documentation:
A Row instance serves as a highly optimized row_factory for Connection objects. It tries to mimic a tuple in most of its features.
Since a Row closely mimics a tuple, depending on your needs you may not even need to unpack the results.
However, since your numerical types are stored as strings, we do need to do some processing. As @Jon Clements pointed out, the cursor is an iterable, so we can just use a comprehension, obtaining the floats and ints at the same time.
import sqlite3 as lite

with lite.connect(dbpath) as conn:
    cur = conn.execute('SELECT latitude, userid FROM message')
    items = [[float(x[0]), int(x[1])] for x in cur]
EDIT: We're not making any changes, so we don't need to call commit.
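If memory is the concern at 200,000,000 rows, iterating in chunks keeps the working set bounded (a sketch; the chunk size and the process() callback are placeholders, not part of the original answer):
import sqlite3 as lite

with lite.connect(dbpath) as conn:
    cur = conn.execute('SELECT latitude, userid FROM message')
    while True:
        rows = cur.fetchmany(100000)   # hypothetical chunk size
        if not rows:
            break
        process(rows)                  # placeholder for whatever consumes the chunk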

How to store numpy.array in sqlite3 to take benefit of the sum function?

I am trying to use sqlite3 to compute the average of a numpy.array and I would like to take advantage of the sum function.
So far I have taken advantage of this post: stackoverflow numpy.array, which helped me store and retrieve the arrays I need easily.
which help me to store and retreive easily the arrays I need.
import sqlite3
import numpy
import io

def adapt_array(arr):
    out = io.BytesIO()
    numpy.save(out, arr)
    out.seek(0)
    a = out.read()
    return buffer(a)

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return numpy.load(out)

sqlite3.register_adapter(numpy.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

x1 = numpy.arange(12)
x2 = numpy.arange(12, 24)

con = sqlite3.connect(":memory:", detect_types = sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test (idx int, arr array)")
cur.execute("insert into test (idx, arr) values (?, ?)", (1, x1))
cur.execute("insert into test (idx, arr) values (?, ?)", (2, x2))
cur.execute("select idx, sum(arr) from test")
data = cur.fetchall()
print data
but unfortunately the query output does not give me the sum of the arrays:
[(2, 0.0)]
I would like to go one step further and directly get the result I want from an SQL query. Thanks.
Edit: after reading stackoverflow: manipulation of numpy.array with sqlite3, I am more sceptical about the feasibility of this. Any way to get a result close to what I want would be appreciated.
Edit 2: in other words, what I am trying to do is redefine the sum function for the particular kind of data I am using. Is it doable? That's what was done to compress/uncompress the numpy.array.
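Redefining the aggregate is doable with sqlite3's create_aggregate. A minimal self-contained sketch along those lines (not from the original post; note that a custom aggregate receives the raw stored bytes, so it must run the converter itself, and its result also comes back as bytes that need one more convert_array call):
import io
import sqlite3
import numpy as np

def adapt_array(arr):
    # serialize an ndarray to bytes with np.save
    out = io.BytesIO()
    np.save(out, arr)
    return out.getvalue()

def convert_array(blob):
    # deserialize bytes written by adapt_array
    return np.load(io.BytesIO(blob))

class SumArray:
    # aggregate that sums arrays element-wise
    def __init__(self):
        self.total = None
    def step(self, blob):
        arr = convert_array(bytes(blob))  # aggregates see the raw stored bytes
        self.total = arr if self.total is None else self.total + arr
    def finalize(self):
        # user functions may only return None/int/float/str/bytes
        return None if self.total is None else adapt_array(self.total)

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.create_aggregate("sum_array", 1, SumArray)
cur = con.cursor()
cur.execute("create table test (idx int, arr array)")
cur.execute("insert into test (idx, arr) values (?, ?)", (1, np.arange(12)))
cur.execute("insert into test (idx, arr) values (?, ?)", (2, np.arange(12, 24)))
blob = cur.execute("select sum_array(arr) from test").fetchone()[0]
print(convert_array(blob))   # element-wise sum: [12 14 16 ...]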
