Bug in python thread

I have some Raspberry Pis running some Python code. Once in a while my devices will fail to check in. The rest of the Python code continues to run perfectly, but the code here quits, and I am not sure why. If the devices can't check in they should reboot, but they don't. Other threads in the same Python file continue to run correctly.
class reportStatus(Thread):
    def run(self):
        checkInCount = 0
        while 1:
            try:
                if checkInCount < 50:
                    payload = {'d': device, 'k': cKey}
                    resp = requests.post(url + 'c', json=payload)
                    if resp.status_code == 200:
                        checkInCount = 0
                        time.sleep(1800)  # 30 min
                    else:
                        checkInCount += 1
                        time.sleep(300)  # 5 min
                else:
                    os.system("sudo reboot")
            except:
                try:
                    checkInCount += 1
                    time.sleep(300)
                except:
                    pass
The devices can run for days or weeks, checking in perfectly every 30 minutes, and then out of the blue they will stop. My Linux computers run a read-only filesystem, and everything else on them continues to work correctly. My issue is in this thread. I think they might fail to get a response, and this line could be the issue:
resp = requests.post(url+'c', json=payload)
I am not sure how to solve this; any help or suggestions would be greatly appreciated.
Thank you

A bare except: pass is a very bad idea.
A much better approach would be to, at the very minimum, log any exceptions:
import datetime
import time
import traceback

while True:
    try:
        time.sleep(60)
    except:
        with open("exceptions.log", "a") as log:
            log.write("%s: Exception occurred:\n" % datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
            traceback.print_exc(file=log)
Then, when you get an exception, you get a log:
2016-12-20 13:28:55: Exception occurred:
Traceback (most recent call last):
  File "./sleepy.py", line 8, in <module>
    time.sleep(60)
KeyboardInterrupt
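If you prefer, the standard logging module gives you timestamps and full tracebacks with less bookkeeping. A minimal sketch of the same idea (the log file name and the sleep are just placeholders, not taken from the original code):

import logging
import time

logging.basicConfig(filename="exceptions.log",
                    format="%(asctime)s %(levelname)s %(message)s",
                    level=logging.INFO)

while True:
    try:
        time.sleep(60)
    except Exception:
        # logging.exception() records the message plus the full traceback
        logging.exception("Exception occurred")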
It is also possible that your code is hanging on sudo reboot or requests.post. You could add additional logging to troubleshoot which issue you have, although given you've seen it do reboots, I suspect it's requests.post, in which case you need to add a timeout (from the linked answer):
import requests
import eventlet
eventlet.monkey_patch()

# ...

resp = None
with eventlet.Timeout(10):
    resp = requests.post(url + 'c', json=payload)

if resp:
    # your code

Your code basically ignores all exceptions. This is considered a bad thing in Python.
The only reason I can think of for the behavior that you're seeing is that after checkInCount reaches 50, the sudo reboot raises an exception which is then ignored by your program, keeping this thread stuck in the infinite loop.
If you want to see what really happens, add print or logging.info statements to all the different branches of your code.
Alternatively, remove the blanket try-except clause or replace it with something specific, e.g. except requests.exceptions.RequestException.
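A minimal sketch of what that might look like for the loop in the question (the 45-second timeout and the sleep values are assumptions; device, cKey and url are the same module-level names used in the question's code):

import logging
import os
import time
from threading import Thread

import requests

class ReportStatus(Thread):
    def run(self):
        checkInCount = 0
        while True:
            if checkInCount >= 50:
                os.system("sudo reboot")
            resp = None
            try:
                resp = requests.post(url + 'c', json={'d': device, 'k': cKey}, timeout=45)
            except requests.exceptions.RequestException:
                # only network-related failures are swallowed; anything else surfaces
                logging.exception("check-in failed")
            if resp is not None and resp.status_code == 200:
                checkInCount = 0
                time.sleep(1800)  # 30 min until the next check-in
            else:
                checkInCount += 1
                time.sleep(300)   # retry in 5 min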

Because of the answers given I was able to come up with a solution. I realized requests has a built-in timeout parameter, and the request will never time out unless one is specified.
Here is my solution:
resp = requests.post(url+'c', json=payload, timeout=45)
From the requests documentation: "You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely."
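Note that when the timeout expires, requests raises an exception rather than returning a response; a minimal sketch of handling it in the loop above (the retry delay is an assumption):

try:
    resp = requests.post(url + 'c', json=payload, timeout=45)
except requests.exceptions.Timeout:
    # the server did not answer within 45 seconds; treat it as a failed check-in
    checkInCount += 1
    time.sleep(300)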
The answers provided by TemporalWolf and others helped me a lot. Thank you to everyone who helped.

Related

Request timed out: timeout('timed out') in Python's HTTPServer

I am trying to create a simple HTTP server that uses Python's HTTPServer from the http.server module (https://github.com/python/cpython/blob/main/Lib/http/server.py).
There are numerous examples of this approach online and I don't believe I am doing anything unusual.
I am simply importing the classes in my code via:
from http.server import HTTPServer, BaseHTTPRequestHandler
My code overrides the do_GET() method to parse the path variable to determine what page to show.
However, if I start this server and connect to it locally (e.g. http://127.0.0.1:50000), the first page loads fine. If I navigate to another page (via links on my first page), that too works fine; however, on occasion (and this is somewhat sporadic), there is a delay and the server log shows a Request timed out: timeout('timed out') error. I have tracked this down to the handle_one_request method of the BaseHTTPRequestHandler class:
def handle_one_request(self):
    """Handle a single HTTP request.

    You normally don't need to override this method; see the class
    __doc__ string for information on how to handle specific HTTP
    commands such as GET and POST.

    """
    try:
        self.raw_requestline = self.rfile.readline(65537)
        if len(self.raw_requestline) > 65536:
            self.requestline = ''
            self.request_version = ''
            self.command = ''
            self.send_error(HTTPStatus.REQUEST_URI_TOO_LONG)
            return
        if not self.raw_requestline:
            self.close_connection = True
            return
        if not self.parse_request():
            # An error code has been sent, just exit
            return
        mname = 'do_' + self.command   ## the name of the method is created
        if not hasattr(self, mname):   ## checking that we have that method defined
            self.send_error(
                HTTPStatus.NOT_IMPLEMENTED,
                "Unsupported method (%r)" % self.command)
            return
        method = getattr(self, mname)  ## getting that method
        method()                       ## finally calling it
        self.wfile.flush()  # actually send the response if not already done.
    except socket.timeout as e:
        # a read or a write timed out.  Discard this connection
        self.log_error("Request timed out: %r", e)
        self.close_connection = True
        return
You can see where the exception is thrown in the "except socket.timeout as e:" clause.
I have tried overriding this method by including it in my code, but it is not clear what is causing the error, so I keep running into dead ends. I've tried creating very basic HTML pages to see if there was something in the pages themselves, but even "blank" pages cause the same sporadic issue.
What's odd is that sometimes a page loads instantly, and then, almost at random, it will time out. Sometimes it's the same page, sometimes a different page.
I've played with the http.timeout setting, but it makes no difference. I suspect it's some underlying socket issue, but am unable to diagnose it further.
This is on a Mac running Big Sur 11.3.1, with Python version 3.9.4.
Any ideas on what might be causing this timeout, and in particular any suggestions on a resolution? Any pointers would be appreciated.
After further investigation, this appears to be an issue specific to Safari. Running the exact same code and using Firefox does not show the same issue.
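For anyone who wants to reproduce this in different browsers, here is a minimal, self-contained version of the kind of server described above. The port, page contents and the 30-second per-connection timeout are assumptions, not taken from the original code:

from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    # per-connection socket timeout, inherited from socketserver.StreamRequestHandler
    timeout = 30

    def do_GET(self):
        # parse self.path to decide which page to show
        if self.path == "/":
            body = b"<html><body><a href='/other'>other page</a></body></html>"
        else:
            body = b"<html><body>other page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 50000), Handler).serve_forever()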

Restarting/Rebuilding a timed out process using Pebble in Python?

I am using concurrent futures to download reports from a remote server using an API. To confirm that a report has downloaded correctly, I just have the function print out its ID.
I have an issue where, on rare occasions, a report download will hang indefinitely. I do not get a Timeout Error or a Connection Reset error; it just hangs there for hours until I kill the whole process. This is a known issue with the API with no known workaround.
I did some research and switched to a Pebble-based approach to implement a timeout on the function. My aim is then to record the ID of the report that failed to download and start again.
Unfortunately, I ran into a bit of a brick wall, as I do not know how to actually retrieve the ID of the report that failed to download. I am using a similar layout to this answer:
from pebble import ProcessPool
from concurrent.futures import TimeoutError

def sometimes_stalling_download_function(report_id):
    ...
    return report_id

with ProcessPool() as pool:
    future = pool.map(sometimes_stalling_download_function, report_id_list, timeout=10)
    iterator = future.result()
    while True:
        try:
            result = next(iterator)
        except StopIteration:
            break
        except TimeoutError as error:
            print("function took longer than %d seconds" % error.args[1])
            # Retrieve report ID here
            failed_accounts.append(result)
What I want to do is retrieve the report ID in the event of a timeout, but it does not seem to be reachable from that exception. Is it possible to have the function output the ID anyway in the case of a timeout exception, or will I have to rethink how I am downloading the reports entirely?
The map function returns a future object which yields the results in the same order they were submitted.
Therefore, to understand which report_id is causing the timeout you can simply check its position in the report_id_list.
index = 0
while True:
    try:
        result = next(iterator)
    except StopIteration:
        break
    except TimeoutError as error:
        print("function took longer than %d seconds" % error.args[1])
        # Retrieve report ID here
        failed_accounts.append(report_id_list[index])
    finally:
        index += 1
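An equivalent way to keep the ID and the result paired, since the results come back in submission order, is to walk report_id_list in step with the iterator. A sketch under the same assumptions as the snippet above:

iterator = future.result()
for report_id in report_id_list:
    try:
        result = next(iterator)
    except StopIteration:
        break
    except TimeoutError as error:
        # this report hit the 10-second timeout; remember its ID and move on
        print("report %s took longer than %d seconds" % (report_id, error.args[1]))
        failed_accounts.append(report_id)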

requests process hangs

I'm using requests to get a URL, such as:
while True:
    try:
        rv = requests.get(url, timeout=1)
        doSth(rv)
    except socket.timeout as e:
        print e
    except Exception as e:
        print e
After it runs for a while, it quits working. There is no exception or error; it just looks suspended. I then stop the process by typing Ctrl+C in the console. It shows that the process is waiting for data:
.............
httplib_response = conn.getresponse(buffering=True) #httplib.py
response.begin() #httplib.py
version, status, reason = self._read_status() #httplib.py
line = self.fp.readline(_MAXLINE + 1) #httplib.py
data = self._sock.recv(self._rbufsize) #socket.py
KeyboardInterrupt
Why is this happening? Is there a solution?
It appears that the server you're sending your request to is throttling you - that is, it's sending bytes with less than 1 second between each packet (thus not triggering your timeout parameter), but slowly enough to appear stuck.
The only fix for this I can think of is to reduce the timeout parameter, unless you can fix the throttling issue with the server provider.
Do keep in mind that you'll need to consider latency when setting the timeout parameter, otherwise your connection will be dropped too quickly and might not work at all.
By default, requests does not set a timeout for connecting or reading.
If for some reason the server never gets back to the client, the client will get stuck connecting or reading - most often reading the response.
The quick resolution is to set a timeout value on the request; the approach is well described here: http://docs.python-requests.org/en/master/user/advanced/#timeouts
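For reference, the timeout can be a single value or a (connect, read) tuple; a minimal sketch (the values are assumptions):

import requests

try:
    # 3.05 s to establish the connection, 10 s maximum between bytes of the response
    rv = requests.get(url, timeout=(3.05, 10))
except requests.exceptions.Timeout as e:
    print(e)

Note that the read timeout bounds the gap between bytes, not the total download time, so a server drip-feeding data (the throttling scenario described in the other answer) can still keep a request alive indefinitely.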

Errno 10054 while scraping HTML with Python: how to reconnect

I'm a novice Python programmer trying to use Python to scrape a large number of pages from fanfiction.net and deposit a particular line of each page's HTML source into a .csv file. My program works fine, but eventually hits a snag where it stops running. My IDE told me that the program has encountered "Errno 10054: an existing connection was forcibly closed by the remote host".
I'm looking for a way to get my code to reconnect and continue every time I get this error. My code will be scraping a few hundred thousand pages every time it runs; is that maybe just too much for the site? The site doesn't appear to prevent scraping. I've done a fair amount of research on this problem already and attempted to implement a retry decorator, but the decorator doesn't seem to work. Here's the relevant section of my code:
import time
import urllib.error
import urllib.request
from functools import wraps

def retry(ExceptionToCheck, tries=4, delay=3, backoff=2, logger=None):
    def deco_retry(f):
        @wraps(f)
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay
            while mtries > 1:
                try:
                    return f(*args, **kwargs)
                except ExceptionToCheck as e:
                    msg = "%s, Retrying in %d seconds..." % (str(e), mdelay)
                    if logger:
                        logger.warning(msg)
                    else:
                        print(msg)
                    time.sleep(mdelay)
                    mtries -= 1
                    mdelay *= backoff
            return f(*args, **kwargs)
        return f_retry  # true decorator
    return deco_retry

@retry(urllib.error.URLError, tries=4, delay=3, backoff=2)
def retrieveURL(URL):
    response = urllib.request.urlopen(URL)
    return response

def main():
    # first check: 5000 to 100,000
    MAX_ID = 600000
    ID = 400001
    URL = "http://www.fanfiction.net/s/" + str(ID) + "/index.html"
    fCSV = open('buffyData400k600k.csv', 'w')
    fCSV.write("Rating, Language, Genre 1, Genre 2, Character A, Character B, Character C, Character D, Chapters, Words, Reviews, Favorites, Follows, Updated, Published, Story ID, Story Status, Author ID, Author Name" + '\n')

    while ID <= MAX_ID:
        URL = "http://www.fanfiction.net/s/" + str(ID) + "/index.html"
        response = retrieveURL(URL)
Whenever I run the .py file outside of my IDE, it eventually locks up and stops grabbing new pages after about an hour, tops. I'm also running a different version of the same file in my IDE, and that appears to have been running for almost 12 hours now, if not longer. Is it possible that the file could work in my IDE but not when run independently?
Have I set my decorator up wrong? What else could I potentially do to get Python to reconnect? I've also seen claims that an out-of-date SQL native client could cause problems for a Windows user such as myself - is this true? I've tried to update it but had no luck.
Thank you!
You are catching URLError, which Errno 10054 is not, so your @retry decorator is not going to retry. Try this:
@retry(Exception, tries=4)
def retrieveURL(URL):
    response = urllib.request.urlopen(URL)
    return response
This should retry 4 times on any Exception. Your @retry decorator is defined correctly.
Your code for reconnecting looks good except for one part - the exception that you're trying to catch. According to this StackOverflow question, Errno 10054 is a socket.error. All you need to do is import socket and add an except socket.error statement in your retry handler.
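Since the decorator above accepts anything that can appear in an except clause, a tuple of exception types works too. A sketch combining both suggestions (which exceptions to include is an assumption):

import socket
import urllib.error
import urllib.request

# retry on URL errors and raw socket errors such as Errno 10054
# (in Python 3, socket.error is an alias for OSError, which also covers ConnectionResetError)
@retry((urllib.error.URLError, socket.error), tries=4, delay=3, backoff=2)
def retrieveURL(URL):
    return urllib.request.urlopen(URL)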

Python 2.7: Thread hanging, no idea how to debug.

I made a script to download wallpapers as a learning exercise to better familiarize myself with Python/threading. Everything works well unless there is an exception when trying to request a URL. This is the function where I hit the exception (it is not a method of the same class, if that matters).
def open_url(url):
    """Opens URL and returns html"""
    try:
        response = urllib2.urlopen(url)
        link = response.geturl()
        html = response.read()
        response.close()
        return html
    except urllib2.URLError, e:
        if hasattr(e, 'reason'):
            logging.debug('failed to reach a server.')
            logging.debug('Reason: %s', e.reason)
            logging.debug(url)
            return None
        elif hasattr(e, 'code'):
            logging.debug('The server couldn\'t fulfill the request.')
            logging.debug('Code: %s', e.code)
            logging.debug(url)
            return None
        else:
            logging.debug('Shit fucked up2')
            return None
At the end of my script:
main_thread = threading.currentThread()
for thread in threading.enumerate():
    if thread is main_thread:
        continue
    while thread.isAlive():
        thread.join(2)
        break
From my current understanding (which may be wrong), if the thread has not completed its task within 2 seconds of reaching this point, it should time out. Instead it gets stuck in that last while loop. If I take that out, the script just hangs once it is done executing.
Also, I decided it was time to man up and leave Notepad++ for a real IDE with debugging tools so I downloaded Wing. I'm a big fan of Wing, but the script doesn't hang there... What do you all use to write Python?
There is no thread interruption in Python and no way to cancel a thread; it can only finish execution by itself. The join method only waits 2 seconds or until termination; it does not kill anything. You need to implement a timeout mechanism in the thread itself.
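In this case, the simplest such mechanism is a socket-level timeout on the request itself, so a stalled download raises an exception instead of blocking the thread forever. A sketch of the function from the question with that change (the 30-second value is an assumption):

import logging
import socket
import urllib2

def open_url(url, timeout=30):
    """Same as open_url above, but a stalled request raises after `timeout` seconds."""
    try:
        response = urllib2.urlopen(url, timeout=timeout)
        html = response.read()
        response.close()
        return html
    except (urllib2.URLError, socket.timeout) as e:
        # connection failures arrive as URLError, read stalls as socket.timeout
        logging.debug('request failed: %s (%s)', e, url)
        return None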
I hit the books and figured out enough to correct the issue I was having. I was able to remove that code near the end of my script completely, and I fixed the issue by spawning the thread pool differently.
for i in range(queue.qsize()):
    td = ThreadDownload(queue)
    td.start()

queue.join()
I also was not using a try/except around queue.get() during the thread's execution.
try:
    img_url = self.queue.get()
    ...
except Queue.Empty:
    ...
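For context, queue.join() only returns once task_done() has been called for every item that was put on the queue, so the worker implied by the snippets above might look roughly like this (the class and attribute names come from the snippets; download() is a hypothetical helper):

import Queue
import threading

class ThreadDownload(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.daemon = True  # don't keep the process alive if a worker misbehaves

    def run(self):
        while True:
            try:
                img_url = self.queue.get(block=False)
            except Queue.Empty:
                break
            try:
                download(img_url)  # hypothetical download helper
            finally:
                self.queue.task_done()  # required for queue.join() to return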
