Python 2.7: Thread hanging, no idea how to debug.

I made a script that downloads wallpapers as a learning exercise to better familiarize myself with Python and threading. Everything works well unless there is an exception while trying to request a URL. This is the function where I hit the exception (it is not a method of the same class, if that matters).
def open_url(url):
    """Opens URL and returns html"""
    try:
        response = urllib2.urlopen(url)
        link = response.geturl()
        html = response.read()
        response.close()
        return html
    except urllib2.URLError, e:
        if hasattr(e, 'reason'):
            logging.debug('Failed to reach a server.')
            logging.debug('Reason: %s', e.reason)
            logging.debug(url)
            return None
        elif hasattr(e, 'code'):
            logging.debug("The server couldn't fulfill the request.")
            logging.debug('Code: %s', e.code)
            logging.debug(url)
            return None
        else:
            logging.debug('URLError with neither reason nor code.')
            return None
At the end of my script:
main_thread = threading.currentThread()
for thread in threading.enumerate():
    if thread is main_thread:
        continue
    while thread.isAlive():
        thread.join(2)
        break
From my current understanding (which may be wrong), if the thread has not completed its task within 2 seconds of reaching this point, it should time out. Instead it gets stuck in that last while loop. If I take the loop out, the script just hangs once it finishes executing.
Also, I decided it was time to man up and leave Notepad++ for a real IDE with debugging tools, so I downloaded Wing. I'm a big fan of Wing, but the script doesn't hang when run from there... What do you all use to write Python?

There is no thread interruption in Python and no way to cancel a thread from outside; a thread can only finish execution by itself. The join(2) call merely waits up to 2 seconds for termination; it does not kill anything. You need to implement the timeout mechanism inside the thread itself.
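For example, the worker can give itself a deadline on both the queue and the socket so it always terminates on its own. A minimal sketch of that idea, assuming Python 2.7's Queue and urllib2 (the worker function and the timeout values are illustrative, not from the question):

import Queue
import urllib2

def worker(q):
    # Drain URLs from q; every blocking call carries its own timeout.
    while True:
        try:
            url = q.get(timeout=2)  # stop waiting if no work arrives in 2s
        except Queue.Empty:
            return  # queue drained; the thread ends on its own
        try:
            html = urllib2.urlopen(url, timeout=10).read()  # socket-level timeout
            # ... process html ...
        except urllib2.URLError:
            pass  # log and move on in real code
        finally:
            q.task_done()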

I hit the books and figured out enough to correct the issue I was having. I was able to remove the code near the end of my script entirely; I fixed the problem by spawning the thread pool differently.
for i in range(queue.qsize()):
    td = ThreadDownload(queue)
    td.start()
queue.join()
I also was not wrapping queue.get() in a try block during the thread's execution:
try:
    img_url = self.queue.get()
    ...
except Queue.Empty:
    ...
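For completeness, the usual companions to this pattern are daemon workers (so one stuck thread cannot keep the interpreter alive) and pairing every get() with task_done() so that queue.join() can return. A sketch reusing the names above:

for i in range(queue.qsize()):
    td = ThreadDownload(queue)
    td.setDaemon(True)  # a stuck worker won't block interpreter exit
    td.start()
queue.join()  # returns once task_done() has been called for every item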

Related

PyMongo with Multiprocessing Pool in Python

I have a problem I've been struggling with for weeks.
def doSomething(item):
    client = MongoClient(DB_CONNECTION_STRING, maxPoolSize=100000, connect=False)
    dbclient = client.mydb
    try:
        getAWebpage()
    except MyException as e:
        client.close()
        # I know that the recursion here could cause problems, but this is not the point
        doSomething(item)
    else:
        # do something with the web page and save in the DB
        dbclient.collection.insert_one({...})
        ...
        client.close()

def singleTask(item):
    doSomething(item)
    time.sleep(0.3)

def startTask(list, threads=10):
    pool = ThreadPool(threads)
    results = pool.map(singleTask, list)
    pool.close()
    pool.join()
    return results
I can see in the MongoDB log that it opens a lot of connections without closing them, and at a certain point MongoDB stops accepting new connections, crashing everything.
What is wrong with this code? I can't really understand it.
I know that the recursion could cause problems with the overall code, but I'm trying to solve this problem first.
Any help will be much appreciated! Thank you
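No answer is recorded here, but the symptom matches a well-known pattern: constructing a MongoClient inside every task defeats connection pooling, because each client maintains its own pool. MongoClient is thread-safe and is designed to be created once per process and shared. A sketch of that arrangement, reusing the question's names (the fetch step and document shape are stand-ins):

from multiprocessing.pool import ThreadPool
from pymongo import MongoClient

# One client for the whole process; worker threads share its pool.
client = MongoClient(DB_CONNECTION_STRING, maxPoolSize=100)
dbclient = client.mydb

def doSomething(item):
    page = getAWebpage()  # the question's fetch step
    dbclient.collection.insert_one({'item': item, 'page': page})

def startTask(items, threads=10):
    pool = ThreadPool(threads)
    try:
        return pool.map(doSomething, items)
    finally:
        pool.close()
        pool.join()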

python gRPC client disconnect while server streaming response

Good day,
This is my first time posting, so forgive me if I do something wrong with the post.
I am trying to get a subscription-type service running, which works fine up until the client disconnects. Depending on the timing, it either works fine or blocks indefinitely.
def Subscribe(self, request, context):
    words = ["Please", "help", "me", "solve", "my", "problem", "!"]
    while context.is_active():
        try:
            for word in words:
                event = fr_pb2.Event(word=word)
                if not context.is_active():
                    break
                yield event
                print(event)
        except Exception as ex:
            print(ex)
            context.cancel()
    print("Subscribe ended")
I am new to gRPC, so there are possibly a few things I'm doing wrong, but my main issue is that if the client disconnects just before or while the yield occurs, the code hangs indefinitely. I've tried a few things to get out of this situation, but they only work some of the time. A timeout set on the client side does count down, but the yield doesn't end when the countdown hits 0. The callback happens fine, but context.cancel and context.abort do not seem to help here either.
Is there anything I can do to prevent the yield from hanging, or set a timeout of some sort so that the yield eventually ends? Any help/advice is greatly appreciated.
If anyone else comes across this issue, there isn't really a problem here. I erroneously thought the call was blocking, because none of the progress prints I put in were printing.
In actuality, when the client disconnects and the server tries to yield, an exception is thrown... it just isn't a general Exception, SystemExit, or SystemError. I wasn't sure what the exception type was, but the code does exit properly if you do whatever cleanup you need in a finally block.
You can catch it with the GeneratorExit exception:
def Subscribe(self, request, context):
    try:
        words = ["Please", "help", "me", "solve", "my", "problem", "!"]
        while context.is_active():
            for word in words:
                event = fr_pb2.Event(word=word)
                if not context.is_active():
                    break
                yield event
                print(event)
    except GeneratorExit:
        print("Client disconnected before function finished!")
    finally:
        print("Subscribe ended")
When the client disconnects in the middle of a server-streaming gRPC call, a GeneratorExit is raised in the server-side generator. It cannot be caught by a plain except Exception handler, because GeneratorExit inherits from BaseException rather than Exception.

Restarting/Rebuilding a timed out process using Pebble in Python?

I am using concurrent futures to download reports from a remote server using an API. To inform me that the report has downloaded correctly, I just have the function print out its ID.
I have an issue where, on rare occasions, a report download will hang indefinitely. I do not get a timeout error or a connection-reset error; it just hangs there for hours until I kill the whole process. This is a known issue with the API with no known workaround.
I did some research and switched to a Pebble-based approach to implement a timeout on the function. My aim is then to record the ID of the report that failed to download and start again.
Unfortunately, I ran into a bit of a brick wall, as I do not know how to actually retrieve the ID of the report that failed to download. I am using a layout similar to this answer:
from pebble import ProcessPool
from concurrent.futures import TimeoutError

def sometimes_stalling_download_function(report_id):
    ...
    return report_id

with ProcessPool() as pool:
    future = pool.map(sometimes_stalling_download_function, report_id_list, timeout=10)
    iterator = future.result()
    while True:
        try:
            result = next(iterator)
        except StopIteration:
            break
        except TimeoutError as error:
            print("function took longer than %d seconds" % error.args[1])
            # Retrieve report ID here
            failed_accounts.append(result)
What I want to do is retrieve the report ID in the event of a timeout, but it does not seem to be reachable from that exception. Is it possible to have the function output the ID anyway in the case of a timeout, or will I have to rethink how I am downloading the reports entirely?
The map function returns a future object which yields the results in the same order they were submitted.
Therefore, to understand which report_id is causing the timeout you can simply check its position in the report_id_list.
index = 0
while True:
    try:
        result = next(iterator)
    except StopIteration:
        break
    except TimeoutError as error:
        print("function took longer than %d seconds" % error.args[1])
        # Retrieve report ID here
        failed_accounts.append(report_id_list[index])
    finally:
        index += 1
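Putting the two pieces together, the full bookkeeping loop might look like this (a sketch assuming report_id_list and the download function from the question; failed_accounts can then be fed into a second pass to retry):

from pebble import ProcessPool
from concurrent.futures import TimeoutError

failed_accounts = []

with ProcessPool() as pool:
    future = pool.map(sometimes_stalling_download_function, report_id_list, timeout=10)
    iterator = future.result()
    index = 0
    while True:
        try:
            result = next(iterator)  # the report_id, on success
        except StopIteration:
            break
        except TimeoutError:
            failed_accounts.append(report_id_list[index])  # the stalled ID
        finally:
            index += 1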

Bug in python thread

I have some Raspberry Pis running some Python code. Once in a while my devices will fail to check in. The rest of the Python code continues to run perfectly, but the code here quits, and I am not sure why. If the devices can't check in they should reboot, but they don't; other threads in the Python file continue to run correctly.
class reportStatus(Thread):
    def run(self):
        checkInCount = 0
        while 1:
            try:
                if checkInCount < 50:
                    payload = {'d': device, 'k': cKey}
                    resp = requests.post(url + 'c', json=payload)
                    if resp.status_code == 200:
                        checkInCount = 0
                        time.sleep(1800)  # 30 minutes
                    else:
                        checkInCount += 1
                        time.sleep(300)  # 5 minutes
                else:
                    os.system("sudo reboot")
            except:
                try:
                    checkInCount += 1
                    time.sleep(300)
                except:
                    pass
The devices can run for days and weeks, checking in perfectly every 30 minutes, then out of the blue they will stop. My Linux machines run in read-only mode and otherwise continue to work correctly. The issue is in this thread: I think they might fail to get a response, and this line could be the problem:
resp = requests.post(url+'c', json=payload)
I am not sure how to solve this; any help or suggestions would be greatly appreciated.
Thank you
A bare except: pass is a very bad idea.
A much better approach would be to, at the very minimum, log any exceptions:
import datetime
import time
import traceback

while True:
    try:
        time.sleep(60)
    except:
        with open("exceptions.log", "a") as log:
            log.write("%s: Exception occurred:\n" % datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
            traceback.print_exc(file=log)
Then, when you get an exception, you get a log:
2016-12-20 13:28:55: Exception occurred:
Traceback (most recent call last):
  File "./sleepy.py", line 8, in <module>
    time.sleep(60)
KeyboardInterrupt
It is also possible that your code is hanging on sudo reboot or requests.post. You could add additional logging to troubleshoot which issue you have, although given you've seen it do reboots, I suspect it's requests.post, in which case you need to add a timeout (from the linked answer):
import requests
import eventlet
eventlet.monkey_patch()

# ...

resp = None
with eventlet.Timeout(10):
    resp = requests.post(url+'c', json=payload)

if resp:
    pass  # your code
Your code basically ignores all exceptions. This is considered a bad thing in Python.
The only reason I can think of for the behavior that you're seeing is that after checkInCount reaches 50, the sudo reboot raises an exception which is then ignored by your program, keeping this thread stuck in the infinite loop.
If you want to see what really happens, add print or logging.info statements to all the different branches of your code.
Alternatively, remove the blanket try-except clause or replace it with something specific, e.g. except requests.exceptions.RequestException.
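For instance, one check-in attempt could be factored out with a specific handler that logs the failure instead of swallowing it (a sketch reusing the question's names; the function wrapper is illustrative):

import logging
import time
import requests

def check_in_once(url, payload, check_in_count):
    # One check-in attempt; returns the updated failure counter.
    try:
        resp = requests.post(url + 'c', json=payload)
        if resp.status_code == 200:
            time.sleep(1800)  # checked in fine; sleep 30 minutes
            return 0
    except requests.exceptions.RequestException:
        logging.exception("Check-in failed")  # records the full traceback
    time.sleep(300)  # back off 5 minutes before the next attempt
    return check_in_count + 1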
Because of the answers given, I was able to come up with a solution. I realized requests has a built-in timeout; it will never time out unless a value is specified as a parameter.
Here is my solution:
resp = requests.post(url+'c', json=payload, timeout=45)
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests. Failure to do so can cause your program to hang indefinitely.
The answers provided by TemporalWolf and others helped me a lot. Thank you to all who helped.

How to use Tornado.gen.coroutine in TCP Server?

I wrote a TCP server with Tornado. Here is the code:
#!/usr/bin/env python
# coding=utf-8
from tornado.tcpserver import TCPServer
from tornado.ioloop import IOLoop
from tornado.gen import *

class TcpConnection(object):
    def __init__(self, stream, address):
        self._stream = stream
        self._address = address
        self._stream.set_close_callback(self.on_close)
        self.send_messages()

    def send_messages(self):
        self.send_message(b'hello \n')
        print("next")
        self.read_message()
        self.send_message(b'world \n')
        self.read_message()

    def read_message(self):
        self._stream.read_until(b'\n', self.handle_message)

    def handle_message(self, data):
        print(data)

    def send_message(self, data):
        self._stream.write(data)

    def on_close(self):
        print("the monitored %s has left" % (self._address,))

class MonitorServer(TCPServer):
    def handle_stream(self, stream, address):
        print("new connection", address, stream)
        conn = TcpConnection(stream, address)

if __name__ == '__main__':
    print('server start .....')
    server = MonitorServer()
    server.listen(20000)
    IOLoop.instance().start()
Then I hit the error assert self._read_callback is None, "Already reading". I guess the error occurs because multiple reads are issued on the socket at the same time, so I changed send_messages to use tornado.gen.coroutine. Here is the code:
@gen.coroutine
def send_messages(self):
    yield self.send_message(b'hello \n')
    response1 = yield self.read_message()
    print(response1)
    yield self.send_message(b'world \n')
    print((yield self.read_message()))
But now there are other errors: the code seems to stop after yield self.send_message(b'hello \n'), and the following lines never execute.
What should I do about it? If you're aware of any Tornado TCP server (not HTTP!) code that uses tornado.gen.coroutine, please tell me. I would appreciate any links!
send_messages() calls send_message() and read_message() with yield, but these methods are not coroutines, so this will raise an exception.
The reason you're not seeing the exception is that you called send_messages() without yielding it, so the exception has nowhere to go (the garbage collector should eventually notice and print the exception, but that can take a long time). Whenever you call a coroutine, you should either use yield to wait for it to finish, or IOLoop.current().spawn_callback() to run the coroutine in the "background" (this tells Tornado that you do not intend to yield the coroutine, so it will print the exception as soon as it occurs). Also, whenever you override a method you should read the documentation to see whether coroutines are allowed (when you override TCPServer.handle_stream() you can make it a coroutine, but __init__() may not be a coroutine).
Once the exception is getting logged, the next step is to fix it. You can either make send_message() and read_message() coroutines (getting rid of the handle_message() callback in the process), or you can use tornado.gen.Task() to call coroutine-style code from a coroutine. I generally recommend using coroutines everywhere.
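As an illustration of the coroutine route, here is a sketch under the assumption of Tornado 4.x, where IOStream.read_until() and write() return Futures when called without a callback; spawn_callback() replaces the bare send_messages() call in __init__() so exceptions get logged:

from tornado import gen
from tornado.ioloop import IOLoop

class TcpConnection(object):
    def __init__(self, stream, address):
        self._stream = stream
        self._address = address
        self._stream.set_close_callback(self.on_close)
        # Run the coroutine in the background; Tornado logs its exceptions.
        IOLoop.current().spawn_callback(self.send_messages)

    @gen.coroutine
    def send_messages(self):
        yield self._stream.write(b'hello \n')
        response1 = yield self._stream.read_until(b'\n')
        print(response1)
        yield self._stream.write(b'world \n')
        print((yield self._stream.read_until(b'\n')))

    def on_close(self):
        print("the monitored %s has left" % (self._address,))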
