I was wondering if one of you could advise me on how to tackle a problem I am having. I developed a Python script that writes data to a database (MySQL) on every iteration of an endless while loop. What I want is that if the script is accidentally closed or stopped halfway through, it waits until all the data from the current iteration is loaded into the database and the MySQL connection is closed (to prevent incomplete queries). Is there a way to tell the program to wait till the loop is done before it closes?
I hope this all makes sense, feel free to ask questions.
Thank you for your time in advance.
There are some things you can do to prevent a program from being closed unexpectedly (signal handlers, etc), but they only work in some cases and not others. There is always the chance of a system shutdown, power failure or SIGKILL that will terminate your program whether you like it or not. The canonical solution to this sort of problem is to use database transactions.
If you do your work in a transaction, then the database will simply roll back any changes if your script is interrupted, so you will not have any incomplete queries. The worst that can happen is that you need to repeat the query from the beginning next time.
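A minimal sketch of that transactional pattern, illustrated here with sqlite3 from the standard library (the same commit/rollback shape applies to MySQL drivers such as MySQLdb; the table and values are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")

try:
    # "with conn" opens a transaction: it commits if the block succeeds
    # and rolls back if anything raises, so each batch is all-or-nothing.
    with conn:
        conn.execute("INSERT INTO readings (value) VALUES (?)", (1.5,))
        conn.execute("INSERT INTO readings (value) VALUES (?)", (2.5,))
except Exception:
    pass  # an interruption mid-batch leaves the table unchanged

print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 2
conn.close()
```

If the script dies between batches, the last committed batch is complete and any half-done batch was rolled back, so at worst you repeat one batch on restart.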
I assume you are asking for a way to ensure that if someone presses Ctrl+C or Ctrl+Z, the program does not stop execution until it completes all the data insertion.
There are two approaches to it.
1) Insert all the data inside a transaction. Until you commit, the data will not be persisted, so you can commit once all the data is entered; if someone closes the application mid-way, the transaction is simply never committed and the database rolls it back.
2) You can trap the interrupt signals of Ctrl+C and Ctrl+Z so that your program keeps running uninterrupted until the insertion completes.
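A sketch of approach 2: trap Ctrl+C and let the loop finish its current batch before exiting. SIGINT is shown here; Ctrl+Z (SIGTSTP) can be trapped the same way on POSIX systems, but it does not exist on Windows. The batch function is hypothetical:

```python
import signal

stop_requested = False

def handle_interrupt(signum, frame):
    # Do not exit immediately; just remember that a stop was requested.
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGINT, handle_interrupt)

for _ in range(3):  # stands in for the endless while loop
    if stop_requested:
        break       # exit only between complete batches
    # insert_batch_into_db()  # hypothetical: one complete unit of work

print("stopped cleanly")
```

Because the handler only sets a flag, a keystroke can never interrupt a half-finished insert; the loop checks the flag at a safe point.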
Use a with statement. Some examples here.
Define a context manager. Like:
class Cursor(object):
    def __init__(self, username=None, password=None):
        # init your connection here
        self.connection = None

    def __iter__(self):
        # for reading the content of the cursor
        return iter([])

    def __enter__(self):
        # executed when the with block is entered; establish the connection
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # executed when there is an error or the connection finishes
        pass

with Cursor() as cursor:
    print(cursor)
    connection = cursor.connection
    print(connection)
Related
I am trying to force killing (not closing) a q session once my query is done to save resources on my machine.
It is currently working using:
conn.sendAsync("exit 0")
Problem is, if I run a query right after it again (trying to reopen the connection and run another query), it might fail, as the previous connection might still be in the process of being killed, since the call is asynchronous.
Therefore, I am trying to do the same thing with a synchronous query, but when trying:
conn.sendSync("exit 0")
I get:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
python-BaseException
Can I specify a timeout such that the q session will be killed automatically after say 10 seconds instead, or maybe there is another way to force killing the q session?
My code looks like this:
conn = qc.QConnection(host='localhost', port=12345, timeout=10000)
conn.open()
res = None
try:
    res = conn.sendSync(query, numpy_temporals=True)
except Exception as e:
    print(f'Error running {query}: {e}')
conn.sendSync("exit 0")
conn.close()
I'd suggest we take a step back and re-evaluate whether it's really the right thing to kill the KDB process after your Python program runs a query. If the program isn't responsible for bringing up the KDB process, it most likely should not bring the process down.
Given that the rationale is saving resources, I believe the process keeps a lot of data in memory and thus takes time to start up. That is another reason you shouldn't kill it if you need to use it a second time.
You shouldn't be killing a kdb process you intend to query again. Some suggestions on points in your question:
once my query is done to save resources -> you can manually call garbage collection with .Q.gc[] to free up memory, or alternatively (and perhaps better) enable immediate garbage collection with -g 1 on start. Note that if you create large global variables in your query, this memory will not be freed up / returned.
https://code.kx.com/q/ref/dotq/#qgc-garbage-collect
https://code.kx.com/q/basics/syscmds/#g-garbage-collection-mode
killed automatically after say 10 seconds -> if your intention here is to not allow client queries, such as those from your Python process, to run over 10 seconds, you can set a query timeout with -T 10 on start or, while the process is running, with \T 10 / system "T 10"
https://code.kx.com/q/basics/cmdline/#-t-timeout
In my program there are multiple asynchronous functions updating data in a database.
There can be some cases where the functions are executed in parallel.
My question is:
In my case, do I need to create a new connection each time in a function, or will having a single connection throughout the program work fine?
Second, in the case of a single connection, is it necessary to close it at the end?
Also, please recommend the best tool to access a .db file outside the code [just in case], one that won't interrupt the code's connections to the database even if I make some changes personally outside the code.
Note: I am on Windows.
Thanks!
I'm trying to call a stored procedure in my MSSQL database from a python script, but it does not run completely when called via python. This procedure consolidates transaction data into hour/daily blocks in a single table which is later grabbed by the python script. If I run the procedure in SQL studio, it completes just fine.
When I run it via my script, it gets cut short about two-thirds of the way through. Currently I have found a workaround by making the program sleep for 10 seconds before moving on to the next SQL statement; however, this is not time-efficient and is unreliable, as some procedures may not finish in that time. I'm looking for a more elegant way to implement this.
Current Code:
cursor.execute("execute mySP")
time.sleep(10)
cursor.commit()
The most related article I can find to my issue is here:
make python wait for stored procedure to finish executing
I tried the solution using Tornado and I/O generators, but ran into the same issue as listed in the article, which was never resolved. I also tried the accepted solution of setting a running-status field in the database from my stored procedures: at the beginning of my SP, Status is updated to 1 in RunningStatus, and when the SP finishes, Status is updated to 0 in RunningStatus. Then I implemented the following Python code:
conn = pyodbc_connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc_connect(conn_str)
scursor = sconn.cursor()
cursor.execute("execute mySP")
cursor.commit()
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
When I implement this, the same problem happens as before: my stored procedure appears to finish executing before it is actually complete. If I eliminate my cursor.commit(), as follows, I end up with the connection just hanging indefinitely until I kill the Python process.
conn = pyodbc_connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc_connect(conn_str)
scursor = sconn.cursor()
cursor.execute("execute mySP")
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
Any assistance in finding a more efficient and reliable way to implement this, as opposed to time.sleep(10) would be appreciated.
As OP found out, inconsistent or incomplete processing of stored procedures from an application layer like Python may be due to straying from best practices of T-SQL scripting.
As @AaronBertrand highlights in this Stored Procedures Best Practices Checklist blog, consider the following, among other items:
Explicitly and liberally use BEGIN ... END blocks;
Use SET NOCOUNT ON to avoid messages being sent to the client for every row-affecting action, which can interrupt the workflow;
Use semicolons for statement terminators.
Example
CREATE PROCEDURE dbo.myStoredProc
AS
BEGIN
    SET NOCOUNT ON;

    SELECT * FROM foo;

    SELECT * FROM bar;
END
GO
I'm kind of new to Python and its MySQLdb connector.
I'm writing an API to return some data from a database using the RESTful approach. In PHP, I wrapped the Connection management part in a class, acting as an abstraction layer for MySQL queries.
In Python:
I define the connection early on in the script: con = mdb.connect('localhost', 'user', 'passwd', 'dbname')
Then, in all subsequent methods:
import MySQLdb as mdb

def insert_func():
    with con:
        cur = con.cursor(mdb.cursors.DictCursor)
        cur.execute("INSERT INTO table (col1, col2, col3) VALUES (%s, %s, %s)", (val1, val2, val3))
        rows = cur.fetchall()
        # do something with the results
        return someval
etc.
I use mdb.cursors.DictCursor because I prefer to be able to access database columns in an associative array manner.
Now the problems start popping up:
in one function, I issue an insert query to create a 'group' with unique 'groupid'.
This 'group' has a creator. Every user in the database holds a JSON array in the 'groups' column of his/her row in the table.
So when I create a new group, I want to assign the groupid to the user that created it.
I update the user's record using a similar function.
I've wrapped the 'insert' and 'update' parts in two separate function defs.
The first time I run the script, everything works fine.
The second time I run the script, the script runs endlessly (I suspect due to some idle connection to the MySQL database).
When I interrupt it using CTRL + C, I get one of the following errors:
"'Cursor' object has no attribute 'connection'"
"commands out of sync; you can't run this command now"
or any other KeyboardInterrupt exception, as would be expected.
It seems to me that these errors are caused by some erroneous way of handling connections and cursors in my code.
I read it was good practice to use with con: so that the connection will automatically close itself after the query. I use 'with' on 'con' in each function, so the connection is closed, but I decided to define the connection globally, for any function to use it. This seems incompatible with the with con: context management. I suspect the cursor needs to be 'context managed' in a similar way, but I do not know how to do this (To my knowledge, PHP doesn't use cursors for MySQL, so I have no experience using them).
I now have the following questions:
Why does it work the first time but not the second? (it will however, work again, once, after the CTRL + C interrupt).
How should I go about using connections and cursors when using multiple functions (that can be called upon in sequence)?
I think there are two main issues going on here: one appears to be the Python code itself, and the other is the structure of how you're interacting with your DB.
First, you're not closing your connection. How long it should stay open depends on your application's needs - you have to decide. Reference this SO question.
from contextlib import closing

with closing(connection.cursor()) as cursor:
    ... use the cursor ...
# cursor closed. Guaranteed.

connection.close()
Right now, you have to interrupt your program with Ctrl+C because there's no reason for your with statement to stop running.
Second, start thinking about your interactions with the DB in terms of 'transactions'. Do something and commit it to the DB; if it didn't work, roll back; when you're done, close the connection. Here's a tutorial.
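That flow can be sketched like this with sqlite3 from the standard library (the same commit/rollback/close shape applies to MySQLdb; the table and values are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE groups (groupid INTEGER PRIMARY KEY, name TEXT)")

try:
    cur = conn.cursor()
    cur.execute("INSERT INTO groups (name) VALUES (?)", ("demo",))
    conn.commit()      # it worked: make the change permanent
except Exception:
    conn.rollback()    # it didn't: undo the partial work
finally:
    conn.close()       # done either way
```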
With connections, as with file handles, the rule of thumb is: open late, close early.
So I would recommend sharing connections only where they are doing one thing. If you use multiprocessing, each process gets its own connection, again following open late, close early. And if you are doing a sequential operation (say in a loop), open and close outside the loop. Global connections can get messy, mainly because you then have to keep track of which function uses the connection at what time, and what it tries to do with it.
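A sketch of that recommendation, using sqlite3 from the standard library and the contextlib.closing helper; the function name and schema are invented, and with MySQLdb you would swap in your mdb.connect(...) call:

```python
import sqlite3
from contextlib import closing

def create_user(db_path, name):
    # Open late: the connection exists only for this one task.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # transaction scope: commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    # Close early: the connection is guaranteed closed here.
```

Each call pays the small cost of opening a connection, but nothing is left dangling for another function to trip over.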
The issue of "cannot run command now", is because your keyboard interrupt kills the active connection.
As to part one of your question - the endless hang could come from anywhere. Each instance of Python gets its own connection, so when you run the script a second time it should get its own connection too. Open up a mysql client and run
show full processlist
to see what's going on.
I have built a little custom web framework on top of Python 3.2, using CherryPy to build the WSGI application and SQLAlchemy Core (just for connection pooling and executing textual SQL statements).
Versions I am using:
Python: 3.2.3
CherryPy: 3.2.2
SQL Alchemy: 0.7.5
Psycopg2: 2.4.5
For every request, a DB connection is retrieved from the pool using sqlalchemy.engine.base.Engine's connect method. After the request handler finishes, the connection is closed using the close method. Pseudocode, for example:
with db.connect() as db:
    handler(db)
Where db.connect() is a context manager defined like this:
@contextmanager
def connect(self):
    conn = self.engine.connect()
    try:
        yield conn
    finally:
        conn.close()
I hope that this is the correct practice for this task. It worked until things got more complicated in the page handlers.
I am getting weird behavior: for an unknown reason, the connection is sometimes closed before the handler finishes its work. But not every time!
By observation, this happens only when making requests in quick succession. If I make a small pause between requests, the connection is not closed and the request finishes successfully. But even then, it does not happen every time. I have not found a more specific pattern in the failures/successes of requests.
I observed that the connection is not closed by my context manager. It is already closed at that point.
My question:
How to figure out when, why and by what code is my connection closed?
I tried debugging. I put a breakpoint on sqlalchemy.engine.base.Connection's close method, but the connection is closed before it reaches this code, which is weird.
I will appreciate any tips or help.
*edit*
Information requested by zzzeek:
symptom of the "connection being closed":
Sorry for not clarifying this before. It is the sqlalchemy.engine.Connection that is closed.
In the handlers I am calling sqlalchemy.engine.base.Connection's execute method to get data from the database (select statements). I can say that the sqlalchemy.engine.Connection is closed because I am checking its closed property before calling execute.
I can post a traceback here, but the only thing you will probably see in it is that an exception is raised before the execute in my DB wrapper library (because the connection is closed).
If I remove this check (and let the execute method execute), SQLAlchemy raises this exception: http://pastebin.com/H6052yca
Regarding the concurrency problem that zzzeek mentioned, I must apologize. After more observation, the situation is slightly different.
This is exact procedure how to invoke the error:
Request for HandlerA. Everything ok.
Wait moment (about 10-20s).
Request for HandlerB. Everything ok.
Request for HandlerA. Everything ok.
Immediate request for HandlerB. Error!
Immediate request for HandlerB. Error!
Immediate request for HandlerB. Error!
Wait moment (about 10-20s).
Request for HandlerB. Everything ok.
I am using default SQLAlchemy pooling class with pool_size = 5.
I know that you cannot do miracles when you don't have the actual code. But unfortunately, I cannot share it. Is there any best practice for debugging this type of error? Or the only option is to debug more deeply step by step and try to figure it out?
Another observation:
When I start the server in a debugger (WingIDE), I cannot reproduce the error. Probably the debugger is so slow when interpreting the code that the connection is somehow "repaired" before the second request is handled.
After a day-long debugging session, I found the problem.
Unfortunately, it was not related to SQLAlchemy directly, so the question should probably be deleted. But you guys tried to help me, so I will answer my own question, and maybe somebody will find this helpful some day.
Basically, the error was caused by my custom publish/subscribe methods, which did not play nicely in a multi-threaded environment.
I tried stepping through the code line by line, which was not working (as I described in the question). So I started generating a very detailed log of what was going on.
Even then, everything looked normal, until I noticed that a few lines before the crash, the address of the Connection object referenced in the model changed. This practically meant that something had assigned another Connection object to the model, and that connection object was already closed.
So the lesson is: when everything looks correct, print out / log the repr() of the objects that are problematic.
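One way to do that, sketched with the standard logging module and an sqlite3 connection standing in for the SQLAlchemy one; logging id() alongside repr() makes an unexpected object swap easy to spot:

```python
import logging
import sqlite3

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

conn = sqlite3.connect(":memory:")
# If the id changes between two log lines that should show the same
# connection, something replaced the object underneath you.
log.debug("using connection %r (id=%#x)", conn, id(conn))
conn.close()
```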
Thanks to commenters for their time.