I've written a program to scrape a website for data, place it into several arrays, iterate through each array and place it in a query and then execute the query. The code looks like this:
for count in range(391):
query = #long query
values = (doctor_names[count].encode("utf-8"), ...) #continues for about a dozen arrays
cur.execute(query, values)
cur.close()
db.close()
I run the program and aside from a few truncation warnings everything goes fine. I open the database in MySQL Workbench and nothing has changed. I tried changing the arrays in the values to constant strings and running it but still nothing would change.
I then created an array to hold the last executed query: sql_queries.append(cur._last_executed) and pushed them out to a text file:
fo = open("foo.txt", "wb")
for q in sql_queries:
fo.write(q)
fo.close()
Which gives me a large text file with multiple queries. When I copy the whole text file and create a new query in MySQL Workbench and execute it, it populates the database as desired. What is my program missing?
If your table is using a transactional storage engine, like Innodb, then you need to call db.commit() to have the transaction stored:
for count in range(391):
query = #long query
values = (doctor_names[count].encode("utf-8"), ...)
cur.execute(query, values)
db.commit()
cur.close()
db.close()
Note that with a transactional database, besides comitting you also have the opportunity to handle errors by rollingback inserts or updates with db.rollback(). The db.commit is required to finalize the transaction. Otherwise,
Closing a connection without committing the changes first will cause
an implicit rollback to be performed.
Related
I am executing a number of SELECT queries on a postgres database using psycopg2, but I am getting ERROR: Out of shared memory. It suggests that I should increase max_locks_per_transaction., but this confuses me because each SELECT query is operating on only one table, and max_locks_per_transaction is already set to 512, 8 times the default.
I am using TimescaleDB, which could be the result of a larger than normal number of locks (one for each chunk rather than one for each table, maybe), but this still can't explain running out when so many are allowed. I'm assuming what is happening here is that all the queries are all being run as part of one transaction.
The code I am using looks something as follows.
db = DatabaseConnector(**connection_params)
tables = db.get_table_list()
for table in tables:
result = db.query(f"""SELECT a, b, COUNT(c) FROM {table} GROUP BY a, b""")
print(result)
Where db.query is defined as:
def query(self, sql):
with self._connection.cursor() as cur:
cur.execute(sql)
return_value = cur.fetchall()
return return_value
and self._connection is:
self._connection = psycopg2.connect(**connection_params)
Do I need to explicitly end the transaction in some way to free up locks? And how can I go about doing this in psycopg2? I would have assumed that there was an implicit end to the transaction when the cursor is closed on __exit__. I know if I was inserting or deleting rows I would use COMMIT at the end, but it seems strange to use as I am not changing the table.
UPDATE: When I explicitly open and close the connection in the loop, the error does not show. However, I assume there is a better way to end the transaction after each SELECT than this.
I am using python to perform basic ETL to transfer records from a mysql database to a postgres database. I am using python to commence the tranfer:
python code
source_cursor = source_cnx.cursor()
source_cursor.execute(query.extract_query)
data = source_cursor.fetchall()
source_cursor.close()
# load data into warehouse db
if data:
target_cursor = target_cnx.cursor()
#target_cursor.execute("USE {};".format(datawarehouse_name))
target_cursor.executemany(query.load_query, data)
print('data loaded to warehouse db')
target_cursor.close()
else:
print('data is empty')
MySQL Extract (extract_query):
SELECT `tbl_rrc`.`id`,
`tbl_rrc`.`col_filing_operator`,
`tbl_rrc`.`col_medium`,
`tbl_rrc`.`col_district`,
`tbl_rrc`.`col_type`,
DATE_FORMAT(`tbl_rrc`.`col_timestamp`, '%Y-%m-%d %T.%f') as `col_timestamp`
from `tbl_rrc`
PostgreSQL (loading_query)
INSERT INTO geo_data_staging.tbl_rrc
(id,
col_filing_operator,
col_medium,
col_district,
col_type,
col_timestamp)
VALUES
(%s,%s,%s,%s,%s);
Of note, there is a PK constraint on Id.
The problem is while I have no errors, I'm not seeing any of the records in the target table. I tested this by manually inserting a record, then running again. The code errored out violating PK constraint. So I know it's finding the table.
Any idea on what I could be missing, I would be greatly appreciate it.
Using psycopg2, you have to call commit() on your cursors in order for transactions to be committed. If you just call close(), the transaction will implicitly roll back.
There are a couple of exceptions to this. You can set the connection to autocommit. You can also use your cursors inside a with block, which will automatically commit if the block doesn't throw any exceptions.
I am trying to run some querys that needs to create some temporary tables and then returns a result set, but i am unable to do that with MySQLdb api.
I already dig something about this issue like here but without success.
My query is like this:
create temporary table tmp1
select * from table1;
alter tmp1 add index(somefield);
create temporary table tmp2
select * from table2;
select * from tmp1 inner join tmp2 using(somefield);
This returns immediatly an empty result set. If i go to the mysql client and do a show full processlist i can see my queries executing. They take some minutes to complete.
Why cursor returns immediatly and don't wait to query to run.
If i try to run another query i have a "Commands out of sync; you can't run this command now"
I already tried to put my connection with autocommit to True
db = MySQLdb.connect(host='ip',
user='root',
passwd='pass',
db='mydb',
use_unicode=True
)
db.autocommit(True)
Or put every statement in is own cursor.execute() and between them db.commit() but without success too.
Can you help me to figure what is the problem? I know mysql don't support transactions for some operations like alter table, but why the api don't wait until everything is finished like it does with a select?
By the way i'm trying to do this on a ipython notebook.
I suspect that you're passing your multi-statement SQL string directly to the cursor.execute function. The thing is, each of the statements is a query in its own right so it's unclear what the result set should contain.
Here's an example to show what I mean. The first case is passing a semicolon set of statements to execute which is what I presume you have currently.
def query_single_sql(cursor):
print 'query_single_sql'
sql = []
sql.append("""CREATE TEMPORARY TABLE tmp1 (id int)""")
sql.append("""INSERT INTO tmp1 VALUES (1)""")
sql.append("""SELECT * from tmp1""")
cursor.execute(';'.join(sql))
print list(cursor.fetchall())
Output:
query_single_sql
[]
You can see that nothing is returned, even though there is clearly data in the table and a SELECT is used.
The second case is where each statement is executed as an independent query, and the results printed for each query.
def query_separate_sql(cursor):
print 'query_separate_sql'
sql = []
sql.append("""CREATE TEMPORARY TABLE tmp3 (id int)""")
sql.append("""INSERT INTO tmp3 VALUES (1)""")
sql.append("""SELECT * from tmp3""")
for query in sql:
cursor.execute(query)
print list(cursor.fetchall())
Output:
query_separate_sql
[]
[]
[(1L,)]
As you can see, we consumed the results of the cursor for each query and the final query has the results we expect.
I suspect that even though you've issued multiple queries, the API only has a handle to the first query executed and so immediately returns when the CREATE TABLE is done. I'd suggest serializing your queries as described in the second example above.
i am inserting records to sql server from python using pymssql. The database takes 2 milliseconds to execute a query, yet it insert 6 rows per second. The only problem is at code side. how to optimize following code or what is the fastest method to insert records.
def save(self):
conn = pymssql.connect(host=dbHost, user=dbUser,
password=dbPassword, database=dbName, as_dict=True)
cur = conn.cursor()
self.pageURL = self.pageURL.replace("'","''")
query = "my query is there"
cur.execute(query)
conn.commit()
conn.close()
It looks like you're creating a new connection per insert there. That's probably the major reason for the slowdown: building new connections is typically quite slow. Create the connection outside the method and you should see a large improvement. You can also create a cursor outside function and re-use it, which will be another speedup.
Depending on your situation, you may also want to use the same transaction for more than a single insertion. This changes the behaviour a little -- since a transaction is supposed to be atomic and either completely succeeds or completely fails -- but committing a transaction is typically a slow operation, because it has to be certain the whole operation succeeded.
In addition to Thomas' great advice,
I'd suggest you look into executemany()*, e.g.:
cur.executemany("INSERT INTO persons VALUES(%d, %s)",
[ (1, 'John Doe'), (2, 'Jane Doe') ])
...where the second argument of executemany() should be a sequence of rows to insert.
This brings up another point:
You probably want to send your query and query parameters as separate arguments to either execute() or executemany(). This will allow the PyMSSQL module to handle any quoting issues for you.
*executemany() as described in the Python DB-API:
.executemany(operation,seq_of_parameters)
Prepare a database operation (query or
command) and then execute it against
all parameter sequences or mappings
found in the sequence
seq_of_parameters.
I am having problems with a Python script which is basically just analysing a CSV file line-by-line and then inserting each line into a MySQL table using a FOR loop:
f = csv.reader(open(filePath, "r"))
i = 1
for line in f:
if (i > skipLines):
vals = nullify(line)
try:
cursor.execute(query, vals)
except TypeError:
sys.exc_clear()
i += 1
return
Where the query is of the form:
query = ("insert into %s" % tableName) + (" values (%s)" % placeholders)
This is working perfectly fine with every file it is used for with one exception - the largest file. It stops at different points each time - sometimes it reaches 600,000 records, sometimes 900,000. But there are about 4,000,000 records in total.
I can't figure out why it is doing this. The table type is MyISAM. Plenty of disk space available. The table is reaching about 35MB when it stops. max_allowed_packet is set to 16MB but I don't think is a problem as it is executing line-by-line?
Anyone have any ideas what this could be? Not sure whether it is Python, MySQL or the MySQLdb module that is responsible for this.
Thanks in advance.
Have you tried LOAD MySQL function?
query = "LOAD DATA INFILE '/path/to/file' INTO TABLE atable FIELDS TERMINATED BY ',' ENCLOSED BY '\"' ESCAPED BY '\\\\'"
cursor.execute( query )
You can always pre-process the CSV file (at least that's what I do :)
Another thing worth trying would be bulk inserts. You could try to insert multiple rows with one query:
INSERT INTO x (a,b)
VALUES
('1', 'one'),
('2', 'two'),
('3', 'three')
Oh, yeah, and you don't need to commit since it's the MyISAM engine.
As S. Lott alluded to, aren't cursors used as handles into transactions?
So at any time the db is giving you the option of rolling back all those pending inserts.
You may simply have too many inserts for one transaction. Try committing the transaction every couple of thousand inserts.