Segfault with pymssql when cannot connect to server on multiple threads - python

We came across this when our MS SQL server became unreachable. This exposed a bug in our code that brought our program to a screeching halt and, of course, brought users with pitchforks and torches to our door. We've boiled the problem down to this: if a user, Bob, attempts to connect to the downed database, he will of course wait while the program attempts to connect. If, while Bob is waiting, a second user, Joe, attempts to connect, he will wait as well. After a while Bob will time out and get a proper error raised. However, when Joe's connection times out, a segmentation fault occurs and takes the whole program down.
We've been able to reliably reproduce this error with the following code:
import threading
import datetime
import time
import pymssql

class ThreadClass(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        print "%s connecting at time: %s" % (self.getName(), now)
        conn = pymssql.connect(host="10.255.255.1", database='blah',
                               user="blah", password="pass")

for i in range(2):
    t = ThreadClass()
    t.start()
    time.sleep(1)
This causes a segfault after the first thread raises its error. Is there a way to stop this segfault and have it raise a proper error instead, or is there something I'm missing here?
We are using pymssql 1.0.2 and Python 2.6.6.

We also asked this question on pymssql's user group to cover all our bases. According to one of the developers, pymssql is not thread safe in the current stable release. It sounds like it might be in the 1.9 development release or in the next major release, 2.0. We will probably switch to a different module, or maybe use some sort of connection pooler with it, but that is more of a bandage fix and not really ideal.
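One stop-gap in the same spirit as the connection pooler idea is to make sure only one thread is inside the driver's connect call at a time. The following is a minimal sketch, not a confirmed fix: `serialized_connect` and `connect_func` are hypothetical names, and whether serializing connection attempts actually avoids the segfault in pymssql 1.0.2 is an assumption that has not been verified.

```python
import threading

# One process-wide lock; every connection attempt must hold it.
_connect_lock = threading.Lock()


def serialized_connect(connect_func, *args, **kwargs):
    """Call connect_func(*args, **kwargs) while holding a global lock.

    connect_func is whatever the driver exposes (for example
    pymssql.connect). It is passed in so that this helper has no
    driver dependency of its own.
    """
    with _connect_lock:
        return connect_func(*args, **kwargs)
```

A thread would then call `serialized_connect(pymssql.connect, host=..., database=..., user=..., password=...)` instead of calling the driver directly. The trade-off is that when the server is down, waiting threads time out one after another rather than in parallel.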

Related

IDE Crash causes hung job on Server

Good Day All! I am using pyodbc to connect to a Microsoft SQL Server through a Native Client 11.0 ODBC connection. Occasionally something causes Spyder to crash, which leaves my query hanging on the server. When this happens, all variables are lost, so I'm not able to cancel the job that is still running on the server or close the connection. My DBAs do not have rules in place to cancel long-running queries, but hung queries like this block ETLs. I have my ODBC connection set up the way they've requested, so the question is: what else can I do to prevent issues for my partners when Spyder crashes? Note: I've imported pandas as "pd".
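import pandas as pd
import pyodbc

data_conn = None
try:
    data_conn = pyodbc.connect(dECTV)
    data_conn.timeout = 1000  # query timeout, in seconds
    tfn = pd.read_sql(tele, data_conn)
    print("Call information retrieved")
except Exception:
    print('!~!~!~!\n Exception has been Raised for Inbound information!~!~!~!')
    # Fall back to the last exported copy of the data.
    tfn = pd.read_csv(export_location + r'\TFN_Details.csv')
finally:
    # close must be called with parentheses, and only if connect succeeded.
    if data_conn is not None:
        data_conn.close()
    print("Connection Closed. Moving on.")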
BTW, I've done a lot of reading over the last two hours and have what I consider to be a solution, but I wanted to see if others agree. My thoughts would be to execute the following before running anything new on the same server.
exec sp_who 'my_login_id'; kill 'resulting_SPID';

Python 'requests' GET in loop eventually throws [WinError 10048]

Disclaimer: This is similar to some other questions relating to this error but my program is not using any multi-threading/processing and I'm working with the 'requests' module instead of raw socket commands so none of the solutions I saw related to my issue.
I have a basic status-checking program running Python 3.4 on Windows that uses a GET request to pull some data off a status site hosted by a number of servers I have to keep watch over. The core code is set up like this:
import requests
import time

URL_LIST = [some, list, of, the, status, sites]  # https:// sites
session = requests.session()
previous_data = ""
while 1:
    data = ""
    for url in URL_LIST:
        headers = {'X-Auth-Token': Associated_Auth_Token}
        try:
            status = session.get(url, headers=headers).json()['status']
        # requests raises its own ConnectionError, not the Python 3 builtin
        except requests.exceptions.ConnectionError:
            status = "SERVER DOWN"
        data += "%s \t%s\n" % (url, status)
    if data != previous_data:
        print(data)
        previous_data = data
    time.sleep(15)
...which typically runs just fine for hours (this script is intended to run 24/7 and has additional logging built in I left out here for simplicity and relevance) but eventually it crashes and throws the error mentioned in the title:
[WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
The servers I'm requesting from are notoriously slow at times (and sometimes go down entirely, hence the try/except) so my inclination would be that after looping this over and over eventually a request has not fully finished before the next request comes through and Windows tries to step on itself, but I don't see how that could happen with my code since it iterates serially through the URLs.
Also, if this is a TIME_WAIT issue, as some other related posts ran into, I'd rather not have to wait for that to finish, since I'd like to update every 15 seconds or better. I considered closing the requests session and opening a new one every so often, since the program typically works fine for hours before hitting a snag, but consider Lukasa's comment here:
To avoid getting sockets in TIME_WAIT, the best thing to do is to use a single Session object at as high a scope as you can and leave it open for the lifetime of your program. Requests will do its best to re-use the sockets as much as possible, which should prevent them lapsing into TIME_WAIT
...it sounds like that is not a good idea, though when he says 'lifetime of your program' he may not have meant to include 24/7 use as in my case.
So instead of blindly trying things and waiting some number of hours for the program to crash again so I can see if the error changes, I wanted to consult the wealth of knowledge here first to see if anyone can see what's going wrong and knows how I should fix it.
Thanks!
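Since each test run of the full script takes hours, it can help to separate the polling logic from the network calls so that failure handling can be exercised quickly. The following is a minimal sketch of that split and does not address the WinError 10048 itself; `collect_statuses`, `report_if_changed` and `fetch_status` are hypothetical names that do not appear in the original script.

```python
def collect_statuses(urls, fetch_status):
    """Return one 'url<TAB>status' line per URL, in order.

    fetch_status(url) is a hypothetical callable that returns the status
    string, or raises an exception when the server cannot be reached.
    """
    lines = []
    for url in urls:
        try:
            status = fetch_status(url)
        except Exception:  # narrow this to the HTTP library's error types
            status = "SERVER DOWN"
        lines.append("%s \t%s" % (url, status))
    return "\n".join(lines)


def report_if_changed(current, previous, out=print):
    """Send current to out only when it differs from previous; return current."""
    if current != previous:
        out(current)
    return current
```

In the real script, `fetch_status` could wrap `session.get(url, headers=headers, timeout=10).json()['status']`; the `timeout` argument is part of the requests API, while the value 10 is only an example. Setting it keeps a slow server from holding a request open indefinitely.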

Python using try to reduce timeout wait

I am using the Exscript module, which has a call conn.connect('IP address').
It tries to open a telnet session to that IP.
It generates an error after the connection times out.
The timeout is either set somewhere in the module's code or is whatever the default for telnet is (I'm not sure which).
This timeout is too long and slows down the script whenever one device is unreachable. Is there something we can do with try/except here? Something like:
Try for 3 secs:
    then process the code
except:
    print " timed out"
We changed the API. Mike Pennington only recently introduced the new connect_timeout parameter for that specific use case.
New solution (current master, latest release on pypi 2.1.451):
conn = Telnet(connect_timeout=3)
We changed the API because you usually don't want to wait for unreachable devices, but want to wait for commands to finish (some take a little longer).
I think you can use
conn = Telnet(timeout=3)
I don't know whether the timeout is in seconds. If it is in milliseconds, try 3000.
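If upgrading to a release with connect_timeout is not an option, another approach is to check reachability with a short-lived TCP connection before handing the host to the telnet library. This is a minimal sketch using only the standard library, in Python 3 syntax; `is_reachable` is a hypothetical helper, and port 23 assumes the devices use the default telnet port.

```python
import socket


def is_reachable(host, port=23, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False
    sock.close()
    return True
```

The script can then skip any device for which `is_reachable(ip)` returns False, at the cost of one extra connection per reachable device.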

SQLAlchemy Core: Connection is closing unexpectedly

I have built a little custom web framework on top of Python 3.2, using CherryPy to build the WSGI application and SQLAlchemy Core (just for connection pooling and executing text SQL statements).
Versions I am using:
Python: 3.2.3
CherryPy: 3.2.2
SQL Alchemy: 0.7.5
Psycopg2: 2.4.5
For every request, a DB connection is retrieved from the pool using sqlalchemy.engine.base.Engine's connect method. After the request handler finishes, the connection is closed using its close method. Pseudocode for example:
with db.connect() as db:
    handler(db)
Where db.connect() is a context manager defined like this:
@contextmanager
def connect(self):
    conn = self.engine.connect()
    try:
        yield conn
    finally:
        conn.close()
I hope that this is the correct practice for this task. It worked until things got more complicated in the page handlers.
I am getting weird behavior. For some unknown reason, the connection is sometimes closed before the handler finishes its work. But not every time!
By observation, this happens only when making requests quickly consecutively. If I make small pause between requests, the connection is not closed and request is finished successfully. But anyway, this does not happen every time. I have not found more specific pattern in failures/successes of requests.
I observed that the connection is not closed by my context manager. It is already closed at that point.
My question:
How to figure out when, why and by what code is my connection closed?
I tried debugging. I put a breakpoint on sqlalchemy.engine.base.Connection's close method, but the connection is closed before execution reaches this code, which is weird.
I will appreciate any tips or help.
Edit:
Information requested by zzzeek:
Symptom of the "connection being closed":
Sorry for not clarifying this before. It is the sqlalchemy.engine.Connection that is closed.
In the handlers I call sqlalchemy.engine.base.Connection's execute method to get data from the database (select statements). I can say that the sqlalchemy.engine.Connection is closed because I check its closed property before calling execute.
I can post the traceback here, but the only thing you will probably see in it is that an exception is raised before the execute call in my DB wrapper library (because the connection is closed).
If I remove this check (and let the execute method run), SQLAlchemy raises this exception: http://pastebin.com/H6052yca
Regarding the concurrency problem that zzzeek mentioned: I must apologize. After more observation, the situation is slightly different.
This is the exact procedure to trigger the error:
Request for HandlerA. Everything ok.
Wait moment (about 10-20s).
Request for HandlerB. Everything ok.
Request for HandlerA. Everything ok.
Immediate request for HandlerB. Error!
Immediate request for HandlerB. Error!
Immediate request for HandlerB. Error!
Wait moment (about 10-20s).
Request for HandlerB. Everything ok.
I am using default SQLAlchemy pooling class with pool_size = 5.
I know that you cannot do miracles when you don't have the actual code. But unfortunately, I cannot share it. Is there any best practice for debugging this type of error? Or the only option is to debug more deeply step by step and try to figure it out?
Another observation:
When I start the server in a debugger (WingIDE), I cannot reproduce the error. This is probably because the debugger is so slow at interpreting the code that the connection is somehow "repaired" before the second request (RequestB) is handled.
After a day of debugging, I found the problem.
Unfortunately, it was not directly related to SQLAlchemy, so the question should probably be deleted. But you guys tried to help me, so I will answer my own question, and maybe somebody will find this helpful some day.
Basically, the error was caused by my custom publish/subscribe methods, which did not play nicely in a multithreaded environment.
I tried stepping through the code line by line, which did not work (as I described in the question). So I started generating a very detailed log of what was going on.
Even then, everything looked normal, until I noticed that a few lines before the crash, the address of the Connection object referenced in the model changed. In practice this meant that something had assigned another Connection object to the model, and that connection object was already closed.
So the lesson is: when everything looks correct, print out or log the repr() of the objects that are causing problems.
Thanks to commenters for their time.
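The lesson above can be sketched as a property that logs every reassignment together with the object's identity. This is a minimal illustration, not the code from the framework in question: `Model`, its `conn` attribute and the logger name are hypothetical. The `stack_info=True` argument (available since Python 3.2) adds the call stack to each record, which shows which code performed the assignment.

```python
import logging

log = logging.getLogger("conn-trace")


class Model:
    """Hypothetical model that logs every reassignment of its connection."""

    def __init__(self, conn=None):
        self._conn = None
        self.conn = conn

    @property
    def conn(self):
        return self._conn

    @conn.setter
    def conn(self, value):
        # id() differs when a different object is assigned, even if both
        # objects look identical when printed.
        log.debug("conn reassigned: %r (id=%#x) -> %r (id=%#x)",
                  self._conn, id(self._conn), value, id(value),
                  stack_info=True)
        self._conn = value
```

With the logger set to DEBUG, an unexpected change of id between two requests points directly at the thread and call site that swapped the connection.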

Permanent 'Temporary failure in name resolution' after running for a number of hours

After running for a number of hours on Linux, my Python 2.6 program that uses urllib2, httplib and threads, starts raising this error for every request:
<class 'urllib2.URLError'> URLError(gaierror(-3, 'Temporary failure in name resolution'),)
If I restart the program it starts working again. My guess is some kind of resource exhaustion but I don't know how to check for it. How do I diagnose and fix the problem?
This was caused by a library's failure to close connections, leading to a large number of connections stuck in a CLOSE_WAIT state. Eventually this causes the 'Temporary failure in name resolution' error due to resource exhaustion.
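To check for this kind of exhaustion on Linux, the socket states listed in /proc/net/tcp can be counted. The sketch below is a minimal example in Python 3 syntax; `count_tcp_states` is a hypothetical helper. The file covers IPv4 sockets for the whole machine rather than one process (IPv6 sockets are in /proc/net/tcp6), so it only shows whether CLOSE_WAIT sockets are piling up, not which program owns them.

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp (a subset of the kernel's TCP states).
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT",
              "08": "CLOSE_WAIT", "0A": "LISTEN"}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3:
            # The fourth column is the state; unknown codes are kept as-is.
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

Calling `count_tcp_states(open('/proc/net/tcp').read())` periodically while the program runs should show whether the CLOSE_WAIT count keeps growing.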
I was experiencing the same issue, but in my case it wasn't resource exhaustion. The problem happened when my DHCP server changed the nameserver address: libc did not reload the new resolv.conf file and kept using the cached one, forcing me to restart the script every time the address changed.
All of my Python socket connection attempts failed after this, so I found this code, which solved the situation:
import ctypes
try:
    # Ask glibc's resolver to re-read /etc/resolv.conf.
    libc = ctypes.CDLL('libc.so.6')
    res_init = getattr(libc, '__res_init')
    res_init()
except (OSError, AttributeError):
    # libc or the symbol is unavailable; carry on with the cached settings.
    pass
Use it before calling socket.connect. Hope this helps.
