jaydebeapi set autocommit off for bulk inserts - python

I have many rows to insert into a table and tried doing it row by row, but it is taking a really long time. I read this link Python+MySQL - Bulk Insert and it seems like setting autocommit off can speed things up.
import jaydebeapi
connection = jaydebeapi.connect('com.teradata.jdbc.TeraDriver', ['jdbc:teradata://some url',USER,PASS], ['tdgssconfig.jar','terajdbc4.jar'],)
cur = connection.cursor()
connection.jconn.setAutoCommit(False)
cur.execute('select * from my_table')
connection.commit()
Other queries I perform are:
l = [(1,2,3),(2,4,6).....]
for tup in l:
    cur.execute('my insert statement')  # this is the really slow part
When I have the connection.jconn.setAutoCommit(False) line in place, I always get this error:
[Teradata Database] [TeraJDBC 15.10.00.14] [Error 3932] [SQLState 25000] Only an ET or null statement is legal after a DDL Statement.
When that line and connection.commit() are commented out, the code works fine. What is the right syntax to set autocommit to false?

If speed/efficiency is a concern, you should be using prepared statements and passing your parameters in as the second argument.
You could then also use .executemany():
l = [(1,2,3),(2,4,6).....]
cur.executemany('my insert statement with 3 ? params', l)
#this should be much faster
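Putting both together, a minimal sketch of the whole flow might look like this (the table my_table and its columns a, b, c are hypothetical; the connection arguments are copied from the question):
import jaydebeapi

connection = jaydebeapi.connect('com.teradata.jdbc.TeraDriver', ['jdbc:teradata://some url', USER, PASS], ['tdgssconfig.jar', 'terajdbc4.jar'])
connection.jconn.setAutoCommit(False)  # batch every insert into one transaction
cur = connection.cursor()
rows = [(1, 2, 3), (2, 4, 6)]
# one prepared statement, bound and executed once per tuple
cur.executemany('INSERT INTO my_table (a, b, c) VALUES (?, ?, ?)', rows)
connection.commit()  # a single commit for the whole batch
connection.close()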

Related

About python sqlite3 order by

I am studying Python's sqlite3 module. I think this is a very simple problem, but I cannot get past it. Could you help me?
The print output looks OK in the VS Code terminal, but nothing is changed in the DB file. I have searched several times but cannot fix it.
When I execute the code, the sort order is not applied to the DB file.
import sqlite3
conn = sqlite3.connect('sqliteDB1.db')
cursor = conn.cursor()
cursor.execute("SELECT * FROM member")
temp123 = cursor.fetchall()
print(temp123)
cursor.execute("SELECT * FROM member ORDER BY -code")
temp321 = cursor.fetchall()
conn.commit()
print(temp321)
conn.close()
A SELECT statement just returns data from a database; it does not modify it. Moreover, tables in SQL databases are inherently unordered sets. They have no intrinsic order, and you should never rely on the order in which rows happen to be returned unless you explicitly sort them with an ORDER BY clause.
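For illustration, a minimal sketch (assuming the question's member table with a numeric code column): the ORDER BY belongs to the query that needs the ordering, every time; nothing is written back to the file.
import sqlite3

conn = sqlite3.connect('sqliteDB1.db')
cursor = conn.cursor()
# descending by code; for a numeric column this matches the question's ORDER BY -code
cursor.execute("SELECT * FROM member ORDER BY code DESC")
print(cursor.fetchall())  # the result set is sorted; the table on disk is unchanged
conn.close()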

Python: How to remove single quotes from list item

I'm working on a bit of Python code to run a query against a Redshift (Postgres) SQL database, and I'm running into an issue where I can't strip the surrounding single quotes from a variable I'm passing to the query. I'm trying to drop a number of tables from a list. These are the basics of my code:
def func(table_list):
    drop_query = 'drop table if exists %s'  # loaded from file
    table_name = table_list[0]  # table_name = 'my_db.my_table'
    con = psycopg2.connect(dbname=DB, host=HOST, port=PORT, user=USER, password=PASS)
    cur = con.cursor()
    cur.execute(drop_query, (table_name, ))  # this line is giving me trouble
    # cleanup statements for the connection
table_list = ['my_db.my_table']
When func() gets called, I get the following error:
syntax error at or near "'my_db.my_table'"
LINE 1: drop table if exists 'my_db.my_table...
^
Is there a way I can remove the surrounding single quotes from my list item?
For the time being, I've done it (what I think is) the wrong way and used string concatenation, but I know this is basically begging for SQL injection.
This is not how psycopg2 works. The %s placeholder is not plain string substitution: psycopg2 quotes and escapes the value as a SQL string literal precisely to avoid SQL injection, and a quoted literal is not valid where a table name is expected.
You need to build the table name into the query before it gets to the execute statement:
drop_query = 'drop table if exists {}'.format(table_name)
I warn you, though: do not allow these table names to be created from outside sources, or you risk SQL injection.
However, newer versions of psycopg2 support composing identifiers into queries safely:
http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql
from psycopg2 import sql

cur.execute(
    sql.SQL("insert into {} values (%s, %s)").format(sql.Identifier('my_table')),
    [10, 20]
)
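Applied to the question's drop statement, a sketch could look like this (splitting on '.' and passing both parts to sql.Identifier, which renders "my_db"."my_table", assumes psycopg2 >= 2.8):
import psycopg2
from psycopg2 import sql

def func(table_list):
    con = psycopg2.connect(dbname=DB, host=HOST, port=PORT, user=USER, password=PASS)
    cur = con.cursor()
    schema, table = table_list[0].split('.')  # 'my_db.my_table' -> ('my_db', 'my_table')
    cur.execute(sql.SQL("drop table if exists {}").format(sql.Identifier(schema, table)))
    con.commit()
    cur.close()
    con.close()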

SELECT in a while loop in python with mysql

I am trying to find the latest entry in a MySQL database by using a query with SELECT MAX(id). I already get the latest id, so I know the query works, but now I want to use it in a while loop so I keep getting the latest entry with each iteration.
This is what I have so far:
import pymysql
con = pymysql.connect(host='.....', user='.....',
                      password='.....', database='.....')
cur = con.cursor()
while True:
    query = "SELECT MAX(id) FROM reports"
    cur.execute(query)
    data = cur.fetchall()
    last = data[0][0]
    print(last)
The problem is that I keep getting the same result after updating the database. For instance, right now I have 45 entries, so my script prints '45' in a while loop. But after I add another row to the table it keeps printing '45' instead of the '46' I would expect. When I stop the script and run it again, it will print '46' and keep printing this even after I add another row.
I have only started working with MySQL about two weeks ago, so I don't have all that much knowledge about it. I feel like I'm missing something really small here. What should I do? Any help would be greatly appreciated.
I had this same problem, and just want to make the solution clear for anyone else searching for it. With autocommit disabled, InnoDB's default REPEATABLE READ isolation keeps the open transaction reading from the snapshot taken at its first SELECT, so rows added afterwards never show up.
Setting autocommit to True solved my issue and didn't require calling commit after each query.
import pymysql
con = pymysql.connect(host='.....', user='.....',
                      password='.....', database='.....')
con.autocommit(True)  # autocommit is a method in pymysql, not an attribute
cur = con.cursor()
while True:
    query = "SELECT MAX(id) FROM reports"
    cur.execute(query)
    data = cur.fetchall()
    last = data[0][0]
    print(last)
Here is a link to the documentation
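If you would rather keep autocommit off, a sketch of the alternative is to commit after each read, which ends the current transaction so the next SELECT takes a fresh snapshot:
while True:
    cur.execute("SELECT MAX(id) FROM reports")
    print(cur.fetchall()[0][0])
    con.commit()  # close the read snapshot; the next SELECT sees newly added rows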

APSW (or SQLite3) very slow INSERT on executemany

I have found the following issue with APSW (an SQLite wrapper for Python) when inserting rows.
Let's say my data is data = [[1,2],[3,4]].
APSW and SQLite3 allow me to do something like:
apsw.executemany("INSERT INTO Table VALUES(?,?)", b)
or I can write some code that does the following:
sql = "BEGIN TRANSACTION;
INSERT INTO Table Values('1','2');
INERT INTO Table Values('3','4');
COMMINT;"
apsw.execute(sql)
When data is a long list/array/table, the performance of the first method is extremely slow compared to the second one (for 400 rows it can be 20 seconds versus less than 1!). I do not understand why, as the first method is the one shown in all SQLite Python tutorials for adding data to a table.
Any idea of what may be happening here?
(Disclosure: I am the author of APSW.) If you do not explicitly have a transaction in effect, then SQLite automatically starts one at the beginning of each statement and ends it at the end of each statement. A write transaction is durable, meaning the contents must end up on storage with fsync called to ensure they survive an unexpected power or system failure. Storage is slow!
I recommend using with rather than BEGIN/COMMIT in your case, because it will automatically roll back on error. That makes sure your data insertion either happens completely or not at all. See the documentation for an example.
When you are inserting a lot of data, you will find WAL mode to be more performant.
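A minimal sketch combining both suggestions (assuming a file mydb.sqlite that already contains a two-column table tbl):
import apsw

connection = apsw.Connection('mydb.sqlite')
cursor = connection.cursor()
cursor.execute('PRAGMA journal_mode=WAL')  # write-ahead logging for better write performance
data = [[1, 2], [3, 4]]
with connection:  # one transaction: commits on success, rolls back on any exception
    cursor.executemany('INSERT INTO tbl VALUES(?, ?)', data)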
Thanks to Confuseh I got the following answer:
Executing:
apsw.execute("BEGIN TRANSACTION;")
apsw.executemany("INERT INTO Table VALUES(?,?)", b)
apsw.execute("COMMIT;")
speeds up the process by a LOT! This seems to be the right way of adding data (versus my method of creating multiple INSERT statements).
Thank you for this question; the answer helped me when using SQLite with Python. In the end I worked out the following, and hope it can help some people:
When connecting to the SQLite database we can use either
con = sqlite3.connect(":memory:", isolation_level=None)
or
con = sqlite3.connect(":memory:")
With isolation_level=None the connection runs in autocommit mode, which creates one transaction per statement and becomes very slow. This helps:
cur.execute("BEGIN TRANSACTION")
cur.executemany(....)
cur.execute("COMMIT")
And if you use con = sqlite3.connect(":memory:") (the default isolation level), cur.executemany(....) will be fast immediately.
Problem
There may be some confusion for mysqlclient-python/pymysql users who expect executemany in sqlite3/apsw to rewrite their INSERT INTO table VALUES(?, ?) into a multi-row INSERT statement.
For instance, executemany of mysqlclient-python has this in its docstring:
This method improves performance on multiple-row INSERT and REPLACE. Otherwise it is equivalent to looping over args with execute().
Python stdlib's sqlite3.Cursor.executemany doesn't have this optimisation. It's always loop-equivalent. Here's how to demonstrate it (unless you want to read some C, _pysqlite_query_execute):
import sqlite3
conn = sqlite3.connect(':memory:', isolation_level=None)
conn.set_trace_callback(print)
conn.execute('CREATE TABLE tbl (x INTEGER, y INTEGER)')
conn.executemany('INSERT INTO tbl VALUES(?, ?)', [(i, i ** 2) for i in range(5)])
It prints:
CREATE TABLE tbl (x INTEGER, y INTEGER)
INSERT INTO tbl VALUES(0, 0)
INSERT INTO tbl VALUES(1, 1)
INSERT INTO tbl VALUES(2, 4)
INSERT INTO tbl VALUES(3, 9)
INSERT INTO tbl VALUES(4, 16)
Solution
Thus, you either need to rewrite these INSERTs into a multi-row one (manually or, for instance, with python-sql) to stay in autocommit mode (isolation_level=None), or wrap your INSERTs in a transaction (with a sensible number of INSERTs per transaction) in the default implicit-commit mode. The latter means the following for the above snippet:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.set_trace_callback(print)
conn.execute('CREATE TABLE tbl (x INTEGER, y INTEGER)')
with conn:
    conn.executemany('INSERT INTO tbl VALUES(?, ?)', [(i, i ** 2) for i in range(5)])
Now it prints:
CREATE TABLE tbl (x INTEGER, y INTEGER)
BEGIN
INSERT INTO tbl VALUES(0, 0)
INSERT INTO tbl VALUES(1, 1)
INSERT INTO tbl VALUES(2, 4)
INSERT INTO tbl VALUES(3, 9)
INSERT INTO tbl VALUES(4, 16)
COMMIT
For further bulk-insert performance improvements in SQLite, I'd suggest starting with this overview question.
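For example, a manual multi-row rewrite of the snippet above, staying in autocommit mode, might look like this sketch (SQLite supports multi-row VALUES since 3.7.11; for big batches, chunk the rows to stay under the bound-parameter limit):
import sqlite3

conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE tbl (x INTEGER, y INTEGER)')
rows = [(i, i ** 2) for i in range(5)]
placeholders = ', '.join(['(?, ?)'] * len(rows))    # "(?, ?), (?, ?), ..."
params = [value for row in rows for value in row]   # flattened parameter list
conn.execute('INSERT INTO tbl VALUES ' + placeholders, params)  # one statement, one implicit transaction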

Python MySQLdb doesn't wait for the result

I am trying to run some queries that need to create some temporary tables and then return a result set, but I am unable to do that with the MySQLdb API.
I have already dug into this issue, for example here, but without success.
My query is like this:
create temporary table tmp1
select * from table1;
alter table tmp1 add index(somefield);
create temporary table tmp2
select * from table2;
select * from tmp1 inner join tmp2 using(somefield);
This returns an empty result set immediately. If I go to the MySQL client and do a show full processlist I can see my queries executing. They take some minutes to complete.
Why does the cursor return immediately instead of waiting for the query to run?
If I try to run another query I get a "Commands out of sync; you can't run this command now" error.
I have already tried setting my connection to autocommit:
db = MySQLdb.connect(host='ip',
                     user='root',
                     passwd='pass',
                     db='mydb',
                     use_unicode=True)
db.autocommit(True)
I also tried putting every statement in its own cursor.execute() with db.commit() between them, but without success either.
Can you help me figure out what the problem is? I know MySQL doesn't support transactions for some operations like ALTER TABLE, but why doesn't the API wait until everything is finished, like it does with a SELECT?
By the way, I'm trying to do this in an IPython notebook.
I suspect that you're passing your multi-statement SQL string directly to the cursor.execute function. The thing is, each of the statements is a query in its own right, so it's unclear what the result set should contain.
Here's an example to show what I mean. The first case passes a semicolon-separated set of statements to execute, which is what I presume you have currently.
def query_single_sql(cursor):
    print 'query_single_sql'
    sql = []
    sql.append("""CREATE TEMPORARY TABLE tmp1 (id int)""")
    sql.append("""INSERT INTO tmp1 VALUES (1)""")
    sql.append("""SELECT * from tmp1""")
    cursor.execute(';'.join(sql))
    print list(cursor.fetchall())
Output:
query_single_sql
[]
You can see that nothing is returned, even though there is clearly data in the table and a SELECT is used.
The second case is where each statement is executed as an independent query, and the results printed for each query.
def query_separate_sql(cursor):
    print 'query_separate_sql'
    sql = []
    sql.append("""CREATE TEMPORARY TABLE tmp3 (id int)""")
    sql.append("""INSERT INTO tmp3 VALUES (1)""")
    sql.append("""SELECT * from tmp3""")
    for query in sql:
        cursor.execute(query)
        print list(cursor.fetchall())
Output:
query_separate_sql
[]
[]
[(1L,)]
As you can see, we consumed the results of the cursor for each query and the final query has the results we expect.
I suspect that even though you've issued multiple queries, the API only has a handle to the first query executed and so returns immediately when the CREATE TABLE is done. I'd suggest serializing your queries as described in the second example above.
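Applied to the question's query, that would look roughly like this (db being the MySQLdb connection from the question; each statement runs on its own and only the final SELECT is fetched):
cursor = db.cursor()
statements = [
    "create temporary table tmp1 select * from table1",
    "alter table tmp1 add index(somefield)",
    "create temporary table tmp2 select * from table2",
    "select * from tmp1 inner join tmp2 using(somefield)",
]
for statement in statements:
    cursor.execute(statement)
result = cursor.fetchall()  # the rows of the final SELECT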
