I use the cx_Oracle executemany function to insert data into a table in Oracle.
After committing, I would like to check how many records were actually appended to the table.
Can it be done, and how?
Thanks
If you are using a cursor with the executemany method, then use the cursor's rowcount attribute to retrieve the number of rows affected by executemany.
There are many nuances associated with the Cursor object's executemany method for SELECT and DML statements. Take a look at the cx_Oracle documentation for details: https://cx-oracle.readthedocs.io/en/latest/user_guide/batch_statement.html#batchstmnt
It may be helpful if you could post a code snippet of what is being attempted to elicit a more accurate response.
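For illustration, here is a minimal sketch of that approach (assuming an existing connection con and a hypothetical table t with columns c1 and c2):
cur = con.cursor()
rows = [(1, "a"), (2, "b"), (3, "c")]
cur.executemany("insert into t (c1, c2) values (:1, :2)", rows)
print("Rows inserted:", cur.rowcount)  # total number of rows affected by the executemany call
con.commit()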
I don't use Python, but - as your question is related to Oracle - if you can "move" the insert process into a stored procedure that you then call from Python, you could use the sql%rowcount attribute, which returns the number of rows affected by the most recent SQL statement run from within PL/SQL.
Here's an example:
SQL> set serveroutput on
SQL> begin
2 insert into test (id, name)
3 select 1, 'Little' from dual union all
4 select 2, 'Foot' from dual union all
5 select 3, 'Amir' from dual;
6 dbms_output.put_line('Inserted ' || sql%rowcount || ' row(s)');
7 end;
8 /
Inserted 3 row(s)
^
|
value returned by SQL%ROWCOUNT
PL/SQL procedure successfully completed.
SQL>
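From Python, that PL/SQL logic could be wrapped in a stored procedure with an OUT parameter and invoked via callproc; a rough sketch (insert_test_rows is a hypothetical procedure that performs the INSERT and assigns sql%rowcount to its OUT parameter):
cur = con.cursor()
inserted = cur.var(int)                       # bind variable that receives the OUT parameter
cur.callproc("insert_test_rows", [inserted])  # hypothetical procedure name
print("Inserted", inserted.getvalue(), "row(s)")
con.commit()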
The executemany method has a parameter named "parameters", which is a list of sequences or dictionaries. The size of this list determines how many times the statement is executed, and for each execution the database reports the number of rows affected. You can retrieve this information, but it will be a list of integers (one per execution).
Let me show an example:
SQL> Create table abc_1
(id number)
;
Table created
Python:
dsn = cx_Oracle.makedsn("xxx.xxx.xxx.xxx", 1521, service_name="xxxx")
con = cx_Oracle.connect(user="xxxx", password="xxxx", dsn=dsn)
tab_cursor = con.cursor()
tab_query = '''
insert into abc_1 (id) select :x from dual where :y >= level connect by level<=10
'''
foo = {"x": 1, "y": 5} # it will insert 5 rows
foo2 = {"x": 1, "y": 6} # it will insert 6 rows
foo3 = {"x": 1, "y": 10} # it will insert 10 rows
tab_cursor.executemany(tab_query, parameters=[foo, foo2, foo3], arraydmlrowcounts=True)
print("Rows inserted:", tab_cursor.getarraydmlrowcounts())
The output shows how many rows were inserted for each execution:
Rows inserted: [5, 6, 10]
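If you only need the overall total rather than the per-execution counts, summing that list gives it:
print("Total rows inserted:", sum(tab_cursor.getarraydmlrowcounts()))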
I have a dataframe named Data2 and I wish to put its values into a PostgreSQL table. For various reasons, I cannot use to_sql, as some of the values in Data2 are numpy arrays.
This is Data2's schema:
cursor.execute(
    """
    DROP TABLE IF EXISTS Data2;
    CREATE TABLE Data2 (
        time timestamp without time zone,
        u bytea,
        v bytea,
        w bytea,
        spd bytea,
        dir bytea,
        temp bytea
    );
    """
)
My code segment:
for col in Data2_mcw.columns:
    for row in Data2_mcw.index:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)
        cursor.execute(
            """
            INSERT INTO Data2_mcw(%s)
            VALUES (%s)
            """,
            (col.replace('\"', ''), value)
        )
Error generated:
psycopg2.errors.SyntaxError: syntax error at or near "'time'"
LINE 2: INSERT INTO Data2_mcw('time')
How do I rectify this error?
Any help would be much appreciated!
There are two problems I see with this code.
The first problem is that you cannot use bind parameters for column names, only for values. The first of the two %s placeholders in your SQL string is invalid. You will have to use string concatenation to set column names, something like the following (assuming you are using Python 3.6+):
cursor.execute(
    f"""
    INSERT INTO Data2_mcw({col})
    VALUES (%s)
    """,
    (value,))
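As an aside, rather than interpolating the column name into the f-string, psycopg2's sql module can compose identifiers safely; a minimal sketch, assuming psycopg2 2.7 or later:
from psycopg2 import sql

query = sql.SQL("INSERT INTO Data2_mcw({}) VALUES (%s)").format(sql.Identifier(col))
cursor.execute(query, (value,))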
The second problem is that a SQL INSERT statement inserts an entire row. It does not insert a single value into an already-existing row, as you seem to be expecting it to.
Suppose your dataframe Data2_mcw looks like this:
   a  b  c
0  1  2  7
1  3  4  9
Clearly, this dataframe has six values in it. If you were to run your code on this dataframe, then it would insert six rows into your database table, one for each value, and the data in your table would look like the following:
a     b     c
1     NULL  NULL
3     NULL  NULL
NULL  2     NULL
NULL  4     NULL
NULL  NULL  7
NULL  NULL  9
I'm guessing you don't want this: you'd rather your database table contained the following two rows instead:
a  b  c
1  2  7
3  4  9
Instead of inserting one value at a time, you will have to insert one entire row at time. This means you have to swap your two loops around, build the SQL string up once beforehand, and collect together all the values for a row before passing it to the database. Something like the following should hopefully work (please note that I don't have a Postgres database to test this against):
column_names = ",".join(Data2_mcw.columns)
placeholders = ",".join(["%s"] * len(Data2_mcw.columns))
sql = f"INSERT INTO Data2_mcw({column_names}) VALUES ({placeholders})"
for row in Data2_mcw.index:
    values = []
    for col in Data2_mcw.columns:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)
        values.append(value)
    cursor.execute(sql, values)
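If performance becomes an issue, the rows could also be collected first and sent in a single round trip; a sketch using psycopg2.extras.execute_values (this assumes psycopg2 is the driver in use):
from psycopg2.extras import execute_values

all_rows = []
for row in Data2_mcw.index:
    values = []
    for col in Data2_mcw.columns:
        value = Data2_mcw[col].loc[row]
        if type(value).__module__ == np.__name__:
            value = pickle.dumps(value)  # serialize numpy arrays, as before
        values.append(value)
    all_rows.append(values)

# execute_values expands the single %s into one VALUES list covering all rows
execute_values(cursor, f"INSERT INTO Data2_mcw({column_names}) VALUES %s", all_rows)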
Context: I am using MSSQL, pandas, and pyodbc.
Steps:
Obtain dataframe from query using pyodbc (no problemo)
Process columns to generate the context of a new (but already existing) column
Fill an auxiliary column with UPDATE statements (i.e. UPDATE t SET t.value = df.value FROM dbo.table t WHERE t.ID = df.ID)
Now how do I execute the SQL code in the auxiliary column, without looping through each row?
Sample data:
The first two columns are obtained by querying dbo.table; the third column exists but is empty in the database. The fourth column only exists in the dataframe, to prepare the SQL statement that would update dbo.table.
ID  raw                   processed    strSQL
1   lorum.ipsum#test.com  lorum ipsum  UPDATE t SET t.processed = 'lorum ipsum' FROM dbo.table t WHERE t.ID = 1
2   rumlo.sumip#test.com  rumlo sumip  UPDATE t SET t.processed = 'rumlo sumip' FROM dbo.table t WHERE t.ID = 2
3   ...                   ...          ...
I would like to execute the SQL script in each row in an efficient manner.
After I recommended .executemany() in a comment to the question, a subsequent comment from @Charlieface suggested that a table-valued parameter (TVP) would provide even better performance. I didn't think it would make that much difference, but I was wrong.
For an existing table named MillionRows
ID TextField
-- ---------
1 foo
2 bar
3 baz
…
and example data of the form
num_rows = 1_000_000
rows = [(f"text{x:06}", x + 1) for x in range(num_rows)]
print(rows)
# [('text000000', 1), ('text000001', 2), ('text000002', 3), …]
my test using a standard executemany() call with cnxn.autocommit = False and crsr.fast_executemany = True
crsr.executemany("UPDATE MillionRows SET TextField = ? WHERE ID = ?", rows)
took about 180 seconds (3 minutes).
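For reference, the baseline test was set up roughly like this (the connection string is a placeholder):
cnxn = pyodbc.connect(connection_string)
cnxn.autocommit = False
crsr = cnxn.cursor()
crsr.fast_executemany = True
crsr.executemany("UPDATE MillionRows SET TextField = ? WHERE ID = ?", rows)
cnxn.commit()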
However, by creating a user-defined table type
CREATE TYPE dbo.TextField_ID AS TABLE
(
TextField nvarchar(255) NULL,
ID int NOT NULL,
PRIMARY KEY (ID)
)
and a stored procedure
CREATE PROCEDURE [dbo].[mr_update]
    @tbl dbo.TextField_ID READONLY
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE MillionRows SET TextField = t.TextField
    FROM MillionRows mr INNER JOIN @tbl t ON mr.ID = t.ID
END
when I used
crsr.execute("{CALL mr_update (?)}", (rows,))
it did the same update in approximately 80 seconds (less than half the time).
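The gain comes from the shape of the work on the server: the TVP ships all of the rows as a single parameter and the stored procedure applies one set-based UPDATE via the join, whereas executemany, even with fast_executemany, still executes the parameterized UPDATE once per row of parameters.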
I have an SQLite database. I am trying to insert values (users_id, lessoninfo_id) into the table bookmarks, but only if that combination does not already exist in a row.
INSERT INTO bookmarks(users_id,lessoninfo_id)
VALUES(
(SELECT _id FROM Users WHERE User='"+$('#user_lesson').html()+"'),
(SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+" AND cast(starttime AS int)="+Math.floor(result_set.rows.item(markerCount-1).starttime)+")
WHERE NOT EXISTS (
SELECT users_id,lessoninfo_id from bookmarks
WHERE users_id=(SELECT _id FROM Users
WHERE User='"+$('#user_lesson').html()+"') AND lessoninfo_id=(
SELECT _id FROM lessoninfo
WHERE Lesson="+lesson_no+")))
This gives an error saying:
db error near where syntax.
If you never want to have duplicates, you should declare this as a table constraint:
CREATE TABLE bookmarks(
users_id INTEGER,
lessoninfo_id INTEGER,
UNIQUE(users_id, lessoninfo_id)
);
(A primary key over both columns would have the same effect.)
It is then possible to tell the database that you want to silently ignore records that would violate such a constraint:
INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(123, 456)
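From Python, the same approach could look like this (a minimal sketch using the standard sqlite3 module; the database file name is made up):
import sqlite3

con = sqlite3.connect("app.db")
con.execute("""CREATE TABLE IF NOT EXISTS bookmarks(
                   users_id INTEGER,
                   lessoninfo_id INTEGER,
                   UNIQUE(users_id, lessoninfo_id))""")
con.execute("INSERT OR IGNORE INTO bookmarks(users_id, lessoninfo_id) VALUES(?, ?)", (123, 456))
con.commit()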
If you have a table called memos that has two columns, id and text, you should be able to do it like this:
INSERT INTO memos(id,text)
SELECT 5, 'text to insert'
WHERE NOT EXISTS(SELECT 1 FROM memos WHERE id = 5 AND text = 'text to insert');
If the table already contains a row where text equals 'text to insert' and id equals 5, then the insert operation will be ignored.
I don't know if this will work for your particular query, but perhaps it gives you a hint on how to proceed.
I would advise that you instead design your table so that no duplicates are allowed, as explained in @CL's answer above.
For a unique column, use this:
INSERT OR REPLACE INTO tableName (...) values(...);
For more information, see: sqlite.org/lang_insert
insert into bookmarks (users_id, lessoninfo_id)
select 1, 167
EXCEPT
select users_id, lessoninfo_id
from bookmarks
where users_id=1
and lessoninfo_id=167;
This is the fastest way.
For some other SQL engines, you can use a Dummy table containing 1 record.
e.g:
select 1, 167 from ONE_RECORD_DUMMY_TABLE
In Python 2.7, I have a dictionary with features' IDs as keys.
There are thousands of features.
Each feature has a single value, but this value is a tuple containing 6 parameters for the feature (for example: size, color, etc.).
On the other hand, I have a PostgreSQL table in a database where these feature parameters must be saved.
The features' IDs are already set in the table (as well as other information about these features).
The IDs are unique (they are random, thus not serial, but unique numbers).
There are 6 empty columns with the names "param1", "param2", "param3", ..., "param6".
I already have a tuple containing these names:
columns = ("param1", "param2", "param3", ..., "param6")
The code I have doesn't work for saving these parameters in their respective columns for each feature:
# "view" is the dictionary with features's ID as keys()
# and their 6 params stored in values().
values = [view[i] for i in view.keys()]
columns = ("param1","param2","param3","param4","param5","param6")
conn = psycopg2.connect("dbname=mydb user=username password=password")
curs = conn.cursor()
curs.execute("DROP TABLE IF EXISTS mytable;")
curs.execute("CREATE TABLE IF NOT EXISTS mytable (LIKE originaltable including defaults including constraints including indexes);")
curs.execute("INSERT INTO mytable SELECT * from originaltable;")
insertstatmnt = 'INSERT INTO mytable (%s) values %s'
alterstatement = ('ALTER TABLE mytable '+
'ADD COLUMN param1 text,'+
'ADD COLUMN param2 text,'+
'ADD COLUMN param3 real,'+
'ADD COLUMN param4 text,'+
'ADD COLUMN param5 text,'+
'ADD COLUMN param6 text;'
)
curs.execute(alterstatement) # It's working up to this point.
curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values))) # The problem seems to be here.
conn.commit() # Making change to DB !
curs.close()
conn.close()
Here's the error I have:
curs.execute(insert_statement, (psycopg2.extensions.AsIs(','.join(columns)), tuple(values)))
ProgrammingError: INSERT has more expressions than target columns
I must miss something.
How to do that properly?
When using '%s' to build the statement the way I think you want, you just need to change a couple of things.
Ignoring c.execute(), the statement itself is by no means wrong, but it does not return what you are looking for. Using my own version, this is what I got with that statement. I also ignored psycopg2.extensions.AsIs() because it is just an adapter conforming to the ISQLQuote protocol, useful for objects whose string representation is already valid as a SQL representation.
>>> values = [i for i in range(0, 5)]  # since I don't know the keys, I just made up values
>>> insertstatmnt, (','.join(columns), tuple(values))
('INSERT INTO mytable (%s) values %s', ('param1,param2,param3,param4,param5,param6', (0, 1, 2, 3, 4)))
As you can see, what you entered returns a tuple with the values.
>>> insertstatmnt % (','.join(columns), tuple(values))
'INSERT INTO mytable (param1,param2,param3,param4,param5,param6) values (0, 1, 2, 3, 4)'
Whereas this returns a string that is much closer to what the SQL engine can actually read. The values obviously do not match the ones you specified. I believe the problem you have lies in how the statement string is built.
Reference for psycopg2: http://initd.org/psycopg/docs/extensions.html
I took the syntax of the psycopg2 command from this thread:
Insert Python Dictionary using Psycopg2
My values dictionary doesn't follow exactly the same structure as the example there (like in that example, I have one key as the ID, but mine has only one corresponding value, a tuple containing my 6 parameters, i.e. nested one level deeper instead of 6 values corresponding directly to the keys), so I need to loop through all features and execute one SQL statement per feature:
[curs.execute(insertstatmnt, (psycopg2.extensions.AsIs(', '.join(columns)), i)) for i in tuple(values)]
This is working.
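For what it's worth, the per-feature loop could probably be replaced with a single executemany call; a sketch, assuming the same columns tuple and the 6-tuples stored in the view dictionary:
cols = ', '.join(columns)
insert_sql = 'INSERT INTO mytable ({0}) VALUES (%s, %s, %s, %s, %s, %s)'.format(cols)
curs.executemany(insert_sql, view.values())  # each dictionary value is already a 6-tuple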
I'd like to perform a group_concat in SQLite only on those records where there is more than one row to concatenate. It seems like you could do this beforehand (count records using a GROUP BY, then remove the singleton rows before proceeding with the group_concat), afterwards (complete the group_concat, then remove rows where nothing was concatenated), or possibly even during the aggregation itself.
My question: what's the fastest way for SQLite to accomplish this?
I've worked out an "after" example using APSW in Python, but am not happy with it:
#set up a table with data
c.execute("create table foo(x,y)")
def getvals():
    a = [1, 1, 2, 3, 3]
    b = ['a','b','c','d','e']
    for i in range(5):
        yield a[i], b[i]
c.executemany("insert into foo values(?,?)", getvals())
c.execute('''create table fooc(a,b);
             insert into fooc(a,b) select x, group_concat(y) from foo group by x''')
c.execute('select * from fooc')
c.fetchall()  ## reports three records
c.execute("select * from fooc where b like '%,%'")
c.fetchall()  ## reports two records .. what I want
It seems crazy (and slow?) to use LIKE for this kind of need.
Add a HAVING clause to your query:
INSERT INTO fooc(a,b)
SELECT x, group_concat(y)
FROM foo
GROUP BY x
HAVING COUNT(*) > 1
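Since HAVING filters the groups after aggregation, the singleton groups are never inserted in the first place. Applied to the question's example, fooc can be populated directly with only the multi-row groups (a sketch reusing the same cursor c, assuming fooc starts out empty):
c.execute('''insert into fooc(a,b)
             select x, group_concat(y) from foo group by x having count(*) > 1''')
c.execute('select * from fooc')
c.fetchall()  ## reports only the two multi-row groups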