python "local variable referenced before assignment" with hundreds of threads - python

I am having a problem with a piece of code that is executed inside a thread in Python. Everything works fine until I start using more than 100 or 150 threads; then I get the following error in several threads:
    resp.read(1)
UnboundLocalError: local variable 'resp' referenced before assignment
The code is the following:
try:
    resp = self.opener.open(request)
    code = 200
except urllib2.HTTPError as e:
    code = e.code
    #print e.reason,_url
    #sys.stdout.flush()
except urllib2.URLError as e:
    resp = None
    code = None

try:
    if code:
        # ttfb (time to first byte)
        resp.read(1)
        ttfb = time.time() - start
        # ttlb (time to last byte)
        resp.read()
        ttlb = time.time() - start
    else:
        ttfb = 0
        ttlb = 0
except httplib.IncompleteRead:
    pass
As you can see, if "resp" is not assigned due to an exception, that exception should have been raised, and "code" couldn't have been assigned either, so it should never reach "resp.read(1)".
Does anybody have a clue as to why it is failing? I guess it is related to scopes, but I don't know how to avoid this or how to implement it differently.
Thanks and regards.

Basic python:
If there is an HTTPError during the open call, resp will not be set, but code will be set to e.code in the exception handler.
Then code tests truthy, so resp.read(1) is called and raises the UnboundLocalError.
This has nothing to do with threads directly, but the high number of threads probably caused the HTTPError.
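A minimal sketch of that failure path (a hypothetical standalone version of the question's method, same names assumed):

def fetch(self, request):
    try:
        resp = self.opener.open(request)  # raises urllib2.HTTPError under load
        code = 200
    except urllib2.HTTPError as e:
        code = e.code  # e.g. 503 -- truthy, but resp was never bound
    if code:
        resp.read(1)  # UnboundLocalError: local variable 'resp'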

The resp variable is defined and used in different code blocks: one try/except assigns it, the other reads it. Try merging them:
Edited:
ttfb = 0
ttlb = 0
try:
    resp = self.opener.open(request)
    code = 200
    resp.read(1)
    ttfb = time.time() - start
    resp.read()
    ttlb = time.time() - start
except urllib2.HTTPError as e:
    code = e.code
    #print e.reason,_url
    #sys.stdout.flush()
except urllib2.URLError:
    pass
except httplib.IncompleteRead:
    pass
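Alternatively, a minimal sketch that keeps the two try blocks separate (assuming the same self.opener, request and start as in the question): initialise resp before the first try and test resp itself instead of code, so the reads can never touch an unbound name:

resp = None
code = None
ttfb = 0
ttlb = 0
try:
    resp = self.opener.open(request)
    code = 200
except urllib2.HTTPError as e:
    code = e.code
except urllib2.URLError:
    pass

try:
    if resp is not None:
        resp.read(1)  # ttfb (time to first byte)
        ttfb = time.time() - start
        resp.read()  # ttlb (time to last byte)
        ttlb = time.time() - start
except httplib.IncompleteRead:
    pass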

Related

try-catch in a while-loop (python)

while var == 1:
    test_url = 'https://testurl.com'
    get_response = requests.get(url=test_url)
    parsed_json = json.loads(get_response.text)
    test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
    ausgabe = json.loads(test.text)
    print(ausgabe['result']['text'])
    time.sleep(3)
How do I add a try-catch routine to this code? Once every two days I get an error in the json.loads() line and I can't reproduce it. What I'm trying to do is put the while loop in a "try:" block with a catch block that only triggers when an error occurs inside the loop. Additionally, it would be great if the while loop didn't stop on an error. How could I do this? Thank you very much for your help. (I started programming Python just a week ago.)
If you just want to catch the error on the fourth line, wrapping that line in a try/except will catch whatever error happened:
while var == 1:
    test_url = 'https://testurl.com'
    get_response = requests.get(url=test_url)
    try:
        parsed_json = json.loads(get_response.text)
    except Exception as e:
        print(str(e))
        print('error data is {}'.format(get_response.text))
    test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
    ausgabe = json.loads(test.text)
    print(ausgabe['result']['text'])
    time.sleep(3)
Or you can simply wrap the whole loop body:
while var == 1:
    try:
        test_url = 'https://testurl.com'
        get_response = requests.get(url=test_url)
        parsed_json = json.loads(get_response.text)
        test = requests.get('https://api.telegram.org/botid/' + 'sendMessage', params=dict(chat_id='0815', text="test"))
        ausgabe = json.loads(test.text)
        print(ausgabe['result']['text'])
        time.sleep(3)
    except Exception as e:
        print("an exception {} of type {} occurred".format(e, type(e).__name__))

python Time a try except

My problem is very simple.
I have a try/except block. In the try I attempt some HTTP requests, and in the except I have several ways to deal with the exceptions I'm getting.
Now I want to add a time limit to my code, meaning the try will only last for 'n' seconds; after that, the except should catch it.
In free language it would appear as:
try for n seconds:
    doSomething()
except (after n seconds):
    handleException()
This is mid-code, not a function, and I have to catch the timeout and handle it; I cannot just continue the code.
while (recoveryTimes > 0):
    try (for 10 seconds):
        response = urllib2.urlopen(req)
        the_page = response.read()
        recoveryTimes = 0
    except (urllib2.URLError, httplib.BadStatusLine) as e:
        print str(e.__unicode__())
        print sys.exc_info()[0]
        recoveryTimes -= 1
        if (recoveryTimes > 0):
            print "Retrying request. Requests left %s" % recoveryTimes
            continue
        else:
            print "Giving up request, changing proxy."
            setUrllib2Proxy()
            break
    except (timedout, 10 seconds has passed):
        setUrllib2Proxy()
        break
The solution I need is for the try (for 10 seconds) and the except (timeout, after 10 seconds).
Check the documentation:

import urllib2

request = urllib2.Request('http://www.yoursite.com')
try:
    response = urllib2.urlopen(request, timeout=4)
    content = response.read()
except urllib2.URLError, e:
    print e
If you want to catch more specific errors, check this post.
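For example, a sketch in that spirit (my assumption, not quoted from the post): with urllib2, a timeout while connecting surfaces as a urllib2.URLError whose reason is a socket.timeout, while a timeout during the read raises socket.timeout directly:

import socket
import urllib2

try:
    response = urllib2.urlopen(request, timeout=4)
    content = response.read()
except urllib2.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print "connect timed out"
    else:
        print "URL error:", e.reason
except socket.timeout:
    print "read timed out"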
Or alternatively, for requests:
import requests

try:
    r = requests.get(url, timeout=4)
except requests.exceptions.Timeout as e:
    # Maybe set up for a retry
    print e
except requests.exceptions.RequestException as e:
    print e
More about exceptions while using requests can be found in the docs or in this post.
A generic solution if you are using UNIX:
import time
import signal

# Close session
def handler(signum, frame):
    print 1
    raise Exception('Action took too much time')

signal.signal(signal.SIGALRM, handler)
signal.alarm(3)  # Set the parameter to the amount of seconds you want to wait

try:
    # RUN CODE HERE
    for i in range(0, 5):
        time.sleep(1)
except:
    print 2
    signal.alarm(10)  # Resets the alarm to 10 new seconds

signal.alarm(0)  # Disables the alarm
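If you need this pattern in several places, one option is to wrap it in a context manager; a sketch under the same assumptions (UNIX only, and signal handlers can only be installed in the main thread):

import signal
from contextlib import contextmanager

@contextmanager
def time_limit(seconds):
    def handler(signum, frame):
        raise Exception('Action took too much time')
    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # always disable the alarm
        signal.signal(signal.SIGALRM, old_handler)  # restore the previous handler

# Usage, matching the question's request loop:
# with time_limit(10):
#     response = urllib2.urlopen(req)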

Ignoring exceptions for a specific amount of time

I'm trying to make the try run in a loop. I am booting a machine that hosts the webserver, and I want the script to keep trying instead of going straight to the except and stopping. I have made a while loop on the HTTP status code, but that only works if the machine is already up.
So my question is: how can I make the try loop for about 5 minutes before it goes to the except? Sorry for my poor explanation.
try:
    r = requests.head("http://www.testing.co.uk")
    while r.status_code != 200:
        print "Response not == to 200."
        time.sleep(30)
        r = requests.head("http://www.testing.co.uk")
    else:
        print "Response is 200 - OK"
except requests.ConnectionError:
    print "NOT FOUND - ERROR"
You could do something like:
import requests, time, datetime

# Determine "end" time -- in this case, 5 minutes from now
t_end = datetime.datetime.now() + datetime.timedelta(minutes=5)

while True:
    try:
        r = requests.head("http://www.testing.co.uk")
        if r.status_code != 200:
            # Do something
            print "Response not == to 200."
        else:
            # Do something else
            print "Response is 200 - OK"
            break  # Per comments
        time.sleep(30)  # Wait 30 seconds between requests
    except requests.ConnectionError as e:
        print "NOT FOUND - ERROR"
        # If the time is past the end time, re-raise the exception
        if datetime.datetime.now() > t_end:
            raise e
        time.sleep(30)  # Wait 30 seconds between requests
The important line is:
if datetime.datetime.now() > t_end: raise e
If the condition isn't met (less than 5 minutes have elapsed), the exception is silently ignored and the while loop continues.
If the condition is met, the exception is re-raised to be handled by some other, outer code or not handled at all -- in which case you'll see the exception "break" (in your words) the program.
The benefit of using this approach over replacing while True: with something like:
while datetime.datetime.now() < t_end:
is that if you find yourself outside of the while loop, you know you got there from break and not from 5 minutes elapsing. You also preserve the exception in case you want to do something special in that case.
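For contrast, a sketch of that deadline-driven variant (my wording, reusing the names above) shows the ambiguity: when this loop ends, you cannot tell "got a 200" from "5 minutes elapsed" without an extra flag.

while datetime.datetime.now() < t_end:
    try:
        r = requests.head("http://www.testing.co.uk")
        if r.status_code == 200:
            break  # success -- but the loop can also end by hitting the deadline
    except requests.ConnectionError:
        pass
    time.sleep(30)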

Which is the right way of recovering from a requests.exceptions.ConnectionError?

I am scraping a web site, but sometimes the laptop loses the connection and I (obviously) get a requests.exceptions.ConnectionError. What is the right (or most elegant?) way to recover from this error? I mean: I don't want the program to stop; I want it to retry the connection, maybe some seconds later. This is my code, but I have the feeling it is not correct:
def make_soup(session, url):
    try:
        n = randint(1, MAX_NAPTIME)
        sleep(n)
        response = session.get(url)
    except requests.exceptions.ConnectionError as req_ce:
        error_msg = req_ce.args[0].reason.strerror
        print "Error: %s con la url %s" % (error_msg, url)
        session = logout(session)
        n = randint(MIN_SLEEPTIME, MAX_SLEEPTIME)
        sleep(n)
        session = login(session)
        response = session.get(url)
    soup = BeautifulSoup(response.text)
    return soup
Any ideas?
Note that I need a session to scrape these pages, so I think that the login (i.e. logging in to the site again after a logout) could cause trouble.
So why not something like
import requests
import time

def retry(cooloff=5, exc_type=None):
    if not exc_type:
        exc_type = [requests.exceptions.ConnectionError]

    def real_decorator(function):
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return function(*args, **kwargs)
                except Exception as e:
                    if e.__class__ in exc_type:
                        print "failed (?)"
                        time.sleep(cooloff)
                    else:
                        raise e
        return wrapper

    return real_decorator
Which is a decorator that allows you to call any function until it succeeds, e.g.

@retry(exc_type=[ZeroDivisionError])
def test():
    return 1/0

print test()
Which will just print "failed (?)" every 5 seconds until the end of time (or until the laws of math change).
Is it really necessary to log out and log back in to your session? I'd just retry the connection the same way:
def make_soup(session, url):
    success = False
    response = None
    for attempt in range(1, MAXTRIES):
        try:
            response = session.get(url)
            # If session.get succeeded, we break out of the
            # for loop after setting a success flag
            success = True
            break
        except requests.exceptions.ConnectionError as req_ce:
            error_msg = req_ce.args[0].reason.strerror
            print "Error: %s con la url %s" % (error_msg, url)
            print " Attempt %s of %s" % (attempt, MAXTRIES)
            sleep(randint(MIN_SLEEPTIME, MAX_SLEEPTIME))

    # Figure out if we were successful.
    # Note it may not be needed to have a flag, you can maybe just
    # check the value of response here.
    if not success:
        print "Couldn't get it after retrying many times"
        return None

    # Once we get here, we know we got a good response
    soup = BeautifulSoup(response.text)
    return soup

detect disconnect persistent curl connection

Where should I check for a disconnect in a pycurl persistent connection?
Somewhere in my script the connection is dying/timing out/throwing an error but the script stays open. I need to detect the problem so I can restart the script.
We are connecting to gnip (a social media data provider)
My code is here: https://gist.github.com/3353033
I've read over the options for libcurl, and I read through the PHP curl_setopt docs because they also leverage libcurl.
class Client:
    time_start = time.time()
    content = ""

    def __init__(self, options):
        self.options = options
        self.buffer = ""
        self.conn = pycurl.Curl()
        self.conn.setopt(pycurl.USERPWD, "%s:%s" % (USER, PASS))
        self.conn.setopt(pycurl.ENCODING, 'gzip')
        self.conn.setopt(pycurl.URL, STREAM_URL)
        self.conn.setopt(pycurl.WRITEFUNCTION, self.on_receive)
        self.conn.setopt(pycurl.FOLLOWLOCATION, 1)
        self.conn.setopt(pycurl.MAXREDIRS, 5)
        self.conn.setopt(pycurl.COOKIEFILE, "cookie.txt")
        try:
            self.conn.perform()
        except Exception, e:
            print e.message

    def on_receive(self, data):
        self.buffer += data
        if data.endswith("\r\n") and self.buffer.strip():
            if self.triggered():
                if len(self.buffer) != 0:
                    try:
                        SaveThread(self.buffer).start()
                    except Exception, e:
                        print "something i commented would have told you there was an error"
                        system.exit(1)
                    self.buffer = ""

    def triggered(self):
        # First trigger based on size, then based on time..
        if len(self.buffer) > SAVE_FILE_LENGTH:
            return True
        time_end = time.time()
        if (time_end - self.time_start) > ROLL_DURATION:  # for the time frame
            self.time_start = time.time()
            return True
        return False
Edit: I've fixed the gist.
In the above code, system.exit(1) should be sys.exit(1), right?
Other than that, do you have any more bare except clauses that might be catching the SystemExit exception raised by sys.exit(1)?
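One way to detect the disconnect (a sketch under assumptions, not from the answer: it treats perform() returning or raising as the stream having died, then reconnects in a loop; run_forever and make_client are hypothetical names):

import time
import pycurl

def run_forever(make_client):
    while True:
        try:
            make_client()  # Client.__init__ calls perform(), which blocks
                           # for as long as the stream stays healthy
        except pycurl.error, e:
            print "curl error, reconnecting: %s" % (e,)
        else:
            print "perform() returned -- stream closed, reconnecting"
        time.sleep(5)  # back off before reconnecting

# Usage: run_forever(lambda: Client(options))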
