Make Python wait for stored procedure to finish executing - python

I have a python script that uses pyodbc to call an MSSQL stored procedure, like so:
cursor.execute("exec MyProcedure #param1 = '" + myparam + "'")
I call this stored procedure inside a loop, and I notice that sometimes the procedure gets called again before it has finished executing the previous time. I know this because if I add the line
time.sleep(1)
after the execute line, everything works fine.
Is there a more elegant and less time-costly way to say, "sleep until the exec is finished"?
Update (Divij's solution): This code is currently not working for me:
from tornado import gen
import pyodbc

@gen.engine
def func(*args, **kwargs):
    # connect to db
    cnxn_str = """
    Driver={SQL Server Native Client 11.0};
    Server=172.16.111.235\SQLEXPRESS;
    Database=CellTestData2;
    UID=sa;
    PWD=Welcome!;
    """
    cnxn = pyodbc.connect(cnxn_str)
    cnxn.autocommit = True
    cursor = cnxn.cursor()
    for _ in range(5):
        yield gen.Task(cursor.execute, 'exec longtest')
    return

func()

I know this is old, but I just spent several hours trying to figure out how to make my Python code wait for a stored proc on MSSQL to finish.
The issue is not with asynchronous calls.
The key to resolving this issue is to make sure that your procedure does not return any messages until it's finished running. Otherwise, pyodbc interprets the first message from the proc as the end of it.
Run your procedure with SET NOCOUNT ON. Also, make sure any PRINT statements or RAISERROR you might use for debugging are muted.
Add a BIT parameter like @muted to your proc and only raise your debugging messages if it's 0.
In my particular case, I'm executing a proc to process a loaded table and my application was exiting and closing the cursor before the procedure finished running because I was getting row counts and debugging messages.
So to summarize, do something along the lines of
cursor.execute('SET NOCOUNT ON; EXEC schema.proc @muted = 1')
and pyodbc will wait for the proc to finish.
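For illustration, here is a minimal sketch of what the procedure side might look like, reusing the question's MyProcedure name; the CREATE PROCEDURE text, the @muted gate and the PRINT message are assumptions, not the asker's actual procedure:
import pyodbc

cnxn = pyodbc.connect(cnxn_str)  # cnxn_str assumed to be defined as in the question
cnxn.autocommit = True
cursor = cnxn.cursor()

# Hypothetical procedure: SET NOCOUNT ON suppresses the "N rows affected"
# messages, and debugging output is only printed when @muted = 0.
cursor.execute("""
CREATE PROCEDURE dbo.MyProcedure
    @muted BIT = 0
AS
BEGIN
    SET NOCOUNT ON;
    -- ... the actual work goes here ...
    IF @muted = 0
        PRINT 'debug: finished processing';
END
""")

# The call itself: pyodbc sees no intermediate messages, so execute()
# does not return until the procedure has finished.
cursor.execute("SET NOCOUNT ON; EXEC dbo.MyProcedure @muted = 1")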

Here's my workaround:
In the database, I make a table called RunningStatus with just one field, status, which is a bit, and just one row, initially set to 0.
At the beginning of my stored procedure, I execute the line
update RunningStatus set status = 1;
And at the end of the stored procedure,
update RunningStatus set status = 0;
In my Python script, I open a new connection and cursor to the same database. After my execute line, I simply add
while 1:
    q = status_check_cursor.execute('select status from RunningStatus').fetchone()
    if q[0] == 0:
        break
You need to make a new connection and cursor, because any calls from the old connection will interrupt the stored procedure and potentially cause status to never go back to 0.
It's a little janky but it's working great for me!
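Pulling the pieces of this workaround together, a minimal sketch (cnxn_str is assumed, and the short sleep inside the loop is an addition to avoid a busy poll; it is not part of the original workaround):
import time
import pyodbc

cnxn = pyodbc.connect(cnxn_str)
cnxn.autocommit = True
cursor = cnxn.cursor()

# A second, separate connection does the status checks, so it never
# interrupts the connection running the stored procedure.
status_cnxn = pyodbc.connect(cnxn_str)
status_check_cursor = status_cnxn.cursor()

cursor.execute("exec MyProcedure @param1 = 'value'")

while 1:
    q = status_check_cursor.execute('select status from RunningStatus').fetchone()
    if q[0] == 0:
        break
    time.sleep(0.1)  # not in the original workaround; avoids hammering the DB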

I have found a solution which does not require "muting" your stored procedures or altering them in any way. According to the pyodbc wiki:
nextset()
This method will make the cursor skip to the next available result set, discarding any remaining rows from the current result set. If there are no more result sets, the method returns False. Otherwise, it returns True and subsequent calls to the fetch methods will return rows from the next result set.
This method is primarily used if you have stored procedures that return multiple results.
To wait for a stored procedure to finish execution before moving on with the rest of the program, use the following code after the execute() call that runs the stored procedure on the cursor.
slept = 0
while cursor.nextset():
    if slept >= TIMEOUT:
        break
    time.sleep(1)
    slept += 1
You could also change the time.sleep() value from 1 second to a little under a second to minimize extra wait time, but I don't recommend calling it very many times a second.
Here is a full program showing how this code would be implemented:
import time
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=<hostname>;PORT=1433;DATABASE=<database name>;UID=<database user>;PWD=password;CHARSET=UTF-8;')
cursor = connection.cursor()

TIMEOUT = 20  # Max number of seconds to wait for procedure to finish execution
params = ['value1', 2, 'value3']
cursor.execute("BEGIN EXEC dbo.sp_StoredProcedureName ?, ?, ? END", *params)

# here's where the magic happens with the nextset() function
slept = 0
while cursor.nextset():
    if slept >= TIMEOUT:
        break
    time.sleep(1)
    slept += 1

cursor.close()
connection.close()

There's no Python built-in that allows you to wait for an asynchronous call to finish. However, you can achieve this behaviour using Tornado's IOLoop. Tornado's gen interface allows you to register a function call as a Task and return to the next line in your function once the call has finished executing. Here's an example using gen and gen.Task:
from tornado import gen

@gen.engine
def func(*args, **kwargs):
    for _ in range(5):
        yield gen.Task(async_function_call, arg1, arg2)
    return
In the example, execution of func resumes after async_function_call is finished. This way subsequent calls to async_function_call won't overlap, and you won't have to pause execution of the main process with the time.sleep call.

I think my way is a little bit more crude, but at the same time much easier to understand:
cursor = connection.cursor()
SQLCommand = ("IF EXISTS(SELECT 1 FROM msdb.dbo.sysjobs J JOIN "
              "msdb.dbo.sysjobactivity A ON A.job_id = J.job_id WHERE J.name = 'dbo.SPNAME' AND "
              "A.run_requested_date IS NOT NULL AND A.stop_execution_date IS NULL) "
              "select 'The job is running!' ELSE select 'The job is not running.'")
cursor.execute(SQLCommand)
results = cursor.fetchone()
sresult = str(results)
while "The job is not running" in sresult:
    time.sleep(1)
    cursor.execute(SQLCommand)
    results = cursor.fetchone()
    sresult = str(results)
while "SPNAME" return "the job is not running" from the jobactivity table sleep 1 second and check the result again.
this work for sql job, for SP should like in another table

Related

Python: Improving performance - Writing to database in separate thread

I am running a python app where I for various reasons have to host my program on a server in one part of the world and then have my database in another.
I tested via a simple script, and from my home, which is in a country neighboring the database server, the time to write and retrieve a row from the database is about 0.035 seconds (which is a nice speed imo), compared to 0.16 seconds when my Python server at the other end of the world performs the same action.
This is an issue as I am trying to keep my python app as fast as possible so I was wondering if there is a smart way to do this?
As I am running my code synchronously my program is waiting every time it has to write to the db, which is about 3 times a second so the time adds up. Is it possible to run the connection to the database in a separate thread or something, so it doesn't halt the whole program while it tries to send data to the database? Or can this be done using asyncio (I have no experience with async code)?
I am really struggling figuring out a good way to solve this issue.
In advance, many thanks!
Yes, you can create a thread that does the writes in the background. In your case, it seems reasonable to have a queue where the main thread puts things to be written and the db thread gets and writes them. The queue can have a maximum depth so that when too much stuff is pending, the main thread waits. You could also do something different like drop things that happen too fast. Or, use a db with synchronization and write a local copy. You also may have an opportunity to speed up the writes a bit by committing multiple at once.
This is a sketch of a worker thread
import threading
import queue

class SqlWriterThread(threading.Thread):
    def __init__(self, db_connect_info, maxsize=8):
        super().__init__()
        self.db_connect_info = db_connect_info
        self.q = queue.Queue(maxsize)
        # TODO: Can expose q.put directly if you don't need to
        # intercept the call
        # self.put = q.put
        self.start()

    def put(self, statement):
        print(f"DEBUG: Putting\n{statement}")
        self.q.put(statement)

    def run(self):
        db_conn = None
        while True:
            # get all the statements you can, waiting on first
            statements = [self.q.get()]
            try:
                while True:
                    statements.append(self.q.get(block=False))
            except queue.Empty:
                pass
            try:
                # early exit before connecting if channel is closed.
                if statements[0] is None:
                    return
                if not db_conn:
                    db_conn = do_my_sql_connect()
                try:
                    print("Debug: Executing\n", "--------\n".join(f"{id(s)} {s}" for s in statements))
                    # todo: need to detect closed connection, then reconnect and restart loop
                    cursor = db_conn.cursor()
                    for statement in statements:
                        if statement is None:
                            return
                        cursor.execute(*statement)
                finally:
                    cursor.commit()
            finally:
                for _ in statements:
                    self.q.task_done()

sql_writer = SqlWriterThread(('user', 'host', 'credentials'))
sql_writer.put(('execute some stuff',))
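Not in the original sketch, but implied by the None check in run(): a small usage example showing how the writer might be fed and shut down cleanly (the INSERT statement and table are made up for illustration):
# Enqueue writes from the main thread without blocking on the network.
sql_writer.put(('INSERT INTO readings (value) VALUES (?)', 42))

# On shutdown: wait for pending statements, then signal the thread to exit.
sql_writer.q.join()   # blocks until every queued statement is marked task_done()
sql_writer.put(None)  # the None sentinel makes run() return
sql_writer.join()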

Can't pickle psycopg2.extensions.connection objects when using pool.imap, but can be done in individual processes

I am trying to build an application which will "check out" a cell, which is a square covering a part of land in a geographic database, and perform an analysis of the features within that cell. Since I have many cells to process, I am using a multiprocessing approach.
I had it somewhat working inside of my object like this:
class DistributedGeographicConstraintProcessor:

    ...

    def _process_cell(self, conn_string):
        conn = pg2.connect(conn_string)
        try:
            cur = conn.cursor()
            cell_id = self._check_out_cell(cur)
            conn.commit()
            print(f"processing cell_id {cell_id}...")
            for constraint in self.constraints:
                # print(f"processing {constraint.name()}...")
                query = constraint.prepare_distributed_query(self.job, self.grid)
                cur.execute(query, {
                    "buffer": constraint.buffer(),
                    "cell_id": cell_id,
                    "name": constraint.name(),
                    "simplify_tolerance": constraint.simplify_tolerance()
                })
            # TODO: do a final race condition check to further suppress duplicates
            self._check_in_cell(cur, cell_id)
            conn.commit()
        finally:
            del cur
            conn.close()
        return None

    def run(self):
        while True:
            if not self._job_finished():
                params = [self.conn_string] * self.num_cores
                processes = []
                for param in params:
                    process = mp.Process(target=self._process_cell, args=(param,))
                    processes.append(process)
                    sleep(0.1)  # Prevent multiple processes from checking out the same grid square
                    process.start()
                for process in processes:
                    process.join()
            else:
                self._finalize_job()
                break
But the problem is that it will only start four processes and wait until they all finish before starting four new processes.
I want to make it so when one process finishes its work, it will begin working on the next cell immediately, even if its co-processes are not yet finished.
I am unsure about how to implement this and I have tried using a pool like this:
def run(self):
    pool = mp.Pool(self.num_cores)
    unprocessed_cells = self._unprocessed_cells()
    for i in pool.imap(self._process_cell, unprocessed_cells):
        print(i)
But this just tells me that the connection is not able to be pickled:
TypeError: can't pickle psycopg2.extensions.connection objects
But I do not understand why, because it is the exact same function that I am using in the imap function as in the Process target.
I have already looked at these threads, here is why they do not answer my question:
Error Connecting To PostgreSQL can't pickle psycopg2.extensions.connection objects - The answer here only indicates that multiple processes cannot share the same connection. I am aware of this, and am initializing the connection inside the function which is being executed in the child process. Also, as I mentioned, it works when I map the function to individual Process instances, with the same function and the same inputs.
Multiprocessing result of a psycopg2 request. "Can't pickle psycopg2.extensions.connection objects" - There is no answer nor any comments on this question, and the code is not intact anyway - the author makes reference to a function that is not specified in the question, and in any case it is obvious that they are blatantly trying to share the same cursor between processes.
My guess is that you're attaching some connection object to self; try to rewrite your solution using functions only (no classes/methods).
Here is a simplified version of a single producer/multiple workers solution I used some time ago:
def worker(param):
    # connect to pg
    # do work
    ...

def main():
    pool = Pool(processes=NUM_PROC)
    tasks = []
    for param in params:
        t = pool.apply_async(worker, args=(param,))
        tasks.append(t)
    pool.close()
    finished = False
    while not finished:
        finished = True
        for t in tasks:
            if not t.ready():
                finished = False
                break
        time.sleep(1)
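Applied to the question's code, that advice boils down to passing only the connection string through imap and building everything unpicklable inside a module-level function. A minimal sketch under that assumption (check_out_cell and check_in_cell are hypothetical stand-ins for the original methods, and the connection string is a placeholder):
import multiprocessing as mp
import psycopg2 as pg2

def process_cell(conn_string):
    # The psycopg2 connection is created inside the worker process, so only
    # the plain conn_string string ever needs to be pickled.
    conn = pg2.connect(conn_string)
    try:
        cur = conn.cursor()
        cell_id = check_out_cell(cur)       # hypothetical stand-in
        conn.commit()
        # ... run the constraint queries for this cell ...
        check_in_cell(cur, cell_id)         # hypothetical stand-in
        conn.commit()
    finally:
        conn.close()
    return cell_id

if __name__ == '__main__':
    conn_string = "dbname=... user=..."     # placeholder
    work_items = [conn_string] * 100        # one item per cell to claim
    with mp.Pool(processes=4) as pool:
        for cell_id in pool.imap_unordered(process_cell, work_items):
            print(f"finished cell {cell_id}")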

Python running SQL statement in threaded timer

I am trying to run a SQL statement every minute, but I also need to be able to iterate through incoming data continuously. So I have made a threaded timer that I found with the help of another SO question, but I am receiving the following error when trying to run it. What are some solutions to this?
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.The object was created in thread id 18460 and this is thread id 29296
Here is a sample of the timer code. I am using an update every second here for testing purposes only.
def update():
    threading.Timer(1.0, update).start()
    cursor.execute("UPDATE users SET timer = timer + 1")
    db.commit()
    print("Updating all timers")

update()
This should work by putting the connection in the same thread. If you do it every minute I suppose it's ok. Making a connection every second seems a bit expensive, though.
def update():
    threading.Timer(1.0, update).start()
    conn = sqlite3.connect(db)
    cursor = conn.cursor()
    cursor.execute("UPDATE users SET timer = timer + 1")
    conn.commit()
    print("Updating all timers")

update()
To answer your question of 'why': I think it is because your connection is created on the main thread while the commit to that connection happens on another. Hope this is correct and I hope this helps.
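Since the answer itself notes that reconnecting every second is a bit expensive, an alternative sketch (not from the original answer) is a single long-lived worker thread that owns one connection and sleeps between updates; the database path here is a placeholder:
import sqlite3
import threading
import time

def timer_worker(db_path, interval=60.0):
    # One thread, one connection: the sqlite3 objects never leave the thread
    # that created them, and the connection is only opened once.
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    while True:
        cursor.execute("UPDATE users SET timer = timer + 1")
        conn.commit()
        print("Updating all timers")
        time.sleep(interval)

# daemon=True so the worker does not keep the program alive at exit
threading.Thread(target=timer_worker, args=("mydb.sqlite3",), daemon=True).start()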

Performance of multiprocessing

I've created a quite simple script that works with multiprocessing and SQL. The aim of this exercise is to obtain the lowest time of execution:
def Query(Query):
    conn = sqlite3.connect("DB.db")
    cur = conn.cursor()
    cur.execute(Query)
    cur.close()
    conn.close()
    return

if __name__ == '__main__':
    conn = sqlite3.connect("DB.db")
    cur = conn.cursor()

    start = time.time()
    cur.execute(QUERY)
    cur.execute(QUERY)
    cur.execute(QUERY)
    end = time.time()
    TIME1 = end - start

    cur.execute('PRAGMA journal_mode=wal')
    conn.commit()

    start = time.time()
    pool = Pool(processes=2)
    pool.imap(Query, [QUERY, QUERY, QUERY])
    pool.close()
    pool.join()
    end = time.time()
    TIME2 = end - start

    cur.close()
    conn.close()
The average result for TIME1 after 20 executions is 13.43 and for TIME2, 10.39.
Shouldn't it be lower than that?! Am I doing something wrong?
For my answer, I will assume that your query only reads things from the database.
Before you try to make something faster, you need to know what exactly is preventing the process from being faster.
Not all speed problems are amenable to improvement by multiprocessing!
So what you really need to do is profile the application to see where it spends its time.
Since SQLite does caching of queries, I would suggest timing each execution of the query in the single process separately.
I would suspect that the first query takes longer than the following ones.
Also consider the overhead in the multiprocessing case. The query has to be pickled and sent to the worker process via IPC. Then each worker has to create a connection and cursor and close them afterwards. In a real world situation, your query function would have done something with the data, e.g. return it to the master process which also requires pickling it and sending it via IPC.
Since all workers access the same database, at some point the reading from the database will become the bottleneck.
If your query modifies the database, access will be serialized anyway to prevent corruption.
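To check the caching effect mentioned above, one quick experiment is to time each sequential execution on its own; a sketch along those lines (QUERY is a placeholder, as in the question):
import sqlite3
import time

QUERY = "SELECT * FROM some_table"  # placeholder query, as in the question

conn = sqlite3.connect("DB.db")
cur = conn.cursor()

# Time each execution separately: if SQLite's cache is warming up, the
# first run should be noticeably slower than the later ones.
for i in range(3):
    start = time.time()
    cur.execute(QUERY)
    cur.fetchall()  # actually pull the rows
    print(f"run {i}: {time.time() - start:.3f} s")

cur.close()
conn.close()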

Multiprocess sqlite INSERT: "database is locked"

(Please note: there is a question called "SQLite3 and Multiprocessing", but that question is actually about multithreading and so is the accepted answer, so this isn't a duplicate.)
I'm implementing a multiprocess script; each process will need to write some results into an sqlite table. My program keeps crashing with database is locked (with sqlite only one DB modification is allowed at a time).
Here's an example of what I have:
def scan(n):
    n = n + 1  # Some calculation
    cur.execute(" \
        INSERT INTO hello \
        (n) \
        VALUES ('"+n+"') \
        ")
    con.commit()
    con.close()
    return True

if __name__ == '__main__':
    pool = Pool(processes=int(sys.argv[1]))
    for status in pool.imap_unordered(scan, range(0, 9999)):
        if status:
            print "ok"
    pool.close()
I've tried using a lock, by declaring it in main and using it as a global in scan(), but it didn't stop me from getting database is locked.
What is the proper way of making sure only one INSERT statement will get issued at the same time in a multiprocess Python script?
EDIT:
I'm running on a Debian-based Linux.
This will happen if the write lock can't be grabbed within (by default) a 5-second timeout. In general, make sure your code COMMITs its transactions with sufficient frequency, thereby releasing the lock and letting other processes have a chance to grab it. If you want to wait for longer, you can do that:
db = sqlite3.connect(filename, timeout=30.0)
...waits for 30 seconds.
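Putting the answer's two points together for the scan() worker above, a hedged sketch: each process opens its own connection, commits promptly so the lock is released, and waits up to 30 seconds for the write lock (the database path is a placeholder):
import sqlite3
import sys
from multiprocessing import Pool

DB_PATH = "hello.db"  # placeholder path for the question's database

def scan(n):
    n = n + 1  # Some calculation
    # Each worker opens its own connection; timeout=30.0 waits up to 30
    # seconds for the write lock instead of the default 5.
    con = sqlite3.connect(DB_PATH, timeout=30.0)
    try:
        con.execute("INSERT INTO hello (n) VALUES (?)", (n,))
        con.commit()  # commit right away so the lock is released quickly
    finally:
        con.close()
    return True

if __name__ == '__main__':
    pool = Pool(processes=int(sys.argv[1]))
    for status in pool.imap_unordered(scan, range(0, 9999)):
        if status:
            print("ok")
    pool.close()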
