Python script freezes infinitely during a socket connection

Python script freezes infinitely during a socket connection - python

I have a simple python script that updates that statuses of justin.tv streams in my database. It's a Django based web application. This script worked perfectly before I moved it to my production server, but now it has issues with timing out or freezing. I've solved the time out problem by adding try/except blocks and making the script retry, but I still can't figure out the freezing problem.
I know it freezes on the line streamOnline = manager.getStreamOnline(stream.name, LOG). That's the same point where the socket.timeout exception occurs. Some times however, it just locks up for ever. I just can't picture a scenario where python would freeze infinitely. Here is the code for the script that freezes. I'm linking website.networkmanagers below, as well as oauth and the justin.tv python library that I'm using.
import sys, os, socket
LOG = False
def updateStreamInfo():
# Set necessary paths
honstreams = os.path.realpath(os.path.dirname(__file__) + "../../../")
sys.path.append(honstreams)
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
# Import necessary moduels
from website.models import Stream, StreamInfo
from website.networkmanagers import get_manager, \
NetworkManagerReturnedErrorException
# Get all streams
streams = Stream.objects.all()
try:
# Loop through them
for stream in streams:
skipstream = False
print 'Checking %s...' % stream.name,
# Get the appropriate network manager and
manager = get_manager(stream.network.name)
# Try to get stream status up to 3 times
for i in xrange(3):
try:
streamOnline = manager.getStreamOnline(stream.name, LOG)
break
except socket.error as e:
code, message = e
# Retry up to 3 times
print 'Error: %s. Retrying...'
# If this stream should be skipped
if(skipstream):
print 'Can\'t connect! Skipping %s' % stream.name
continue
# Skip if status has not changed
if streamOnline == stream.online:
print 'Skipping %s because the status has not changed' % \
stream.name
continue
# Save status
stream.online = streamOnline
stream.save()
print 'Set %s to %s' % (stream.name, streamOnline)
except NetworkManagerReturnedErrorException as e:
print 'Stopped the status update loop:', e
if(__name__ == "__main__"):
if(len(sys.argv) > 1 and sys.argv[1] == "log"):
LOG = True
if(LOG): print "Logging enabled"
updateStreamInfo()
networkmanagers.py
oauth.py
JtvClient.py
Example of the script freezing
foo#bar:/.../honstreams/honstreams# python website/scripts/updateStreamStatus.py
Checking angrytestie... Skipping angrytestie because the status has not changed
Checking chustream... Skipping chustream because the status has not changed
Checking cilantrogamer... Skipping cilantrogamer because the status has not changed
| <- caret sits here blinking infinitely
Interesting update
Every time it freezes and I send a keyboard interrupt, it's on the same line in socket.py:
root#husta:/home/honstreams/honstreams# python website/scripts/updateStreamStatus.py
Checking angrytestie... Skipping angrytestie because the status has not changed
Checking chustream... Skipping chustream because the status has not changed
^CChecking cilantrogamer...
Traceback (most recent call last):
File "website/scripts/updateStreamStatus.py", line 64, in <module>
updateStreamInfo()
File "website/scripts/updateStreamStatus.py", line 31, in updateStreamInfo
streamOnline = manager.getStreamOnline(stream.name, LOG)
File "/home/honstreams/honstreams/website/networkmanagers.py", line 47, in getStreamOnline
return self.getChannelLive(channelName, log)
File "/home/honstreams/honstreams/website/networkmanagers.py", line 65, in getChannelLive
response = client.get('/stream/list.json?channel=%s' % channelName)
File "/home/honstreams/honstreams/website/JtvClient.py", line 51, in get
return self._send_request(request, token)
File "/home/honstreams/honstreams/website/JtvClient.py", line 90, in _send_request
return conn.getresponse()
File "/usr/lib/python2.6/httplib.py", line 986, in getresponse
response.begin()
File "/usr/lib/python2.6/httplib.py", line 391, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.6/httplib.py", line 349, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.6/socket.py", line 397, in readline
data = recv(1)
KeyboardInterrupt
Any thoughts?

Have you tried using another application to open that connection? Given that it's an issue in production, perhaps you don't have some firewall issues.

Down in JtvClient.py it uses httplib to handle the connection. Have you tried changing this to use httplib2 instead?
Other than that stab in the dark, I would add a lot of logging statements to this code in order to track what actually happens and where it gets stuck. Then I would make sure that the point where it gets stuck can timeout on the socket (which usually involves either monkeypatching or forking the codebase) so that stuff fails instead of hanging.
You said:
I know it freezes on the line streamOnline = manager.getStreamOnline(stream.name, LOG). That's the same point where the socket.timeout exception occurs.
Wrong. It doesn't freeze on that line because that line is a function call which calls lots of other functions through several levels of other modules. So you do not yet know where the program freezes. Also, that line is NOT the point where the socket timeout occurs. The socket timeout will only occur on a low level socket operation like select or recv which is being called several times in the chain of activity triggered by getStreamOnline.
You need to trace your code in a debugger or add print statements to track down exactly where the hang occurs. It could possibly be an infinite loop in Python but is more likely to be a low-level call to an OS networking function. Until you find the source of the error, you can't do anything.
P.S. the keyboard interrupt is a reasonable clue that the problem is around line 90 in JtvClient.py, so put in some print statements and find out what happens. There may be a stupid loop in there that keeps calling getresponse, or you may be calling it with bad parameters or maybe the network server really is borked. Narrow it down to fewer possibilities.

It turns out this HTTP connection isn't passed a timeout in jtvClient.py
def _get_conn(self):
return httplib.HTTPConnection("%s:%d" % (self.host, self.port))
Changed the last line to
return httplib.HTTPConnection("%s:%d" % (self.host, self.port), timeout=10)
Which solved it

Related

Pbixrefresher for Power BI Desktop report refresh/publish error

I am trying to refresh power b.i. more frequently than current capability of gateway schedule refresh.
I found this:
https://github.com/dubravcik/pbixrefresher-python
Installed and verified I have all required packages installed to run.
Right now it works fine until the end - where after it refreshes a Save function seems to execute correctly - but the report does not save - and when it tries Publish function - a prompt is created asking if user would like to save and there is a timeout.
I have tried increasing the time-out argument and adding more wait time in the routine (along with a couple of other suggested ideas from the github issues thread).
Below is what cmd looks like along with the error - I also added the Main routine of the pbixrefresher file in case there is a different way to save (hotkeys) or something worth trying. I tried this both as my user and admin in CMD - but wasn't sure if it's possible a permissions setting could block the report from saving. Thank you for reading any help is greatly appreciated.
Starting Power BI
Waiting 15 sec
Identifying Power BI window
Refreshing
Waiting for refresh end (timeout in 100000 sec)
Saving
Publish
Traceback (most recent call last):
File "c:\python36\lib\site-packages\pywinauto\application.py", line 258, in __resolve_control
criteria)
File "c:\python36\lib\site-packages\pywinauto\timings.py", line 458, in wait_until_passes
raise err
pywinauto.timings.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\python36\lib\runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "c:\python36\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "C:\Python36\Scripts\pbixrefresher.exe_main.py", line 9, in
File "c:\python36\lib\site-packages\pbixrefresher\pbixrefresher.py", line 77, in main
publish_dialog.child_window(title = WORKSPACE, found_index=0).click_input()
File "c:\python36\lib\site-packages\pywinauto\application.py", line 379, in getattribute
ctrls = self.__resolve_control(self.criteria)
File "c:\python36\lib\site-packages\pywinauto\application.py", line 261, in __resolve_control
raise e.original_exception
File "c:\python36\lib\site-packages\pywinauto\timings.py", line 436, in wait_until_passes
func_val = func(*args, **kwargs)
File "c:\python36\lib\site-packages\pywinauto\application.py", line 222, in __get_ctrl
ctrl = self.backend.generic_wrapper_class(findwindows.find_element(**ctrl_criteria))
File "c:\python36\lib\site-packages\pywinauto\findwindows.py", line 87, in find_element
raise ElementNotFoundError(kwargs)
pywinauto.findwindows.ElementNotFoundError: {'auto_id': 'KoPublishToGroupDialog', 'top_level_only': False, 'parent': <uia_element_info.UIAElementInfo - 'Simple - Power BI Desktop', WindowsForms10.Window.8.app.0.1bb715_r6_ad1, 8914246>, 'backend': 'uia'}
The main routine from pbixrefresher:
def main():
# Parse arguments from cmd
parser = argparse.ArgumentParser()
parser.add_argument("workbook", help = "Path to .pbix file")
parser.add_argument("--workspace", help = "name of online Power BI service work space to publish in", default = "My workspace")
parser.add_argument("--refresh-timeout", help = "refresh timeout", default = 30000, type = int)
parser.add_argument("--no-publish", dest='publish', help="don't publish, just save", default = True, action = 'store_false' )
parser.add_argument("--init-wait", help = "initial wait time on startup", default = 15, type = int)
args = parser.parse_args()
timings.after_clickinput_wait = 1
WORKBOOK = args.workbook
WORKSPACE = args.workspace
INIT_WAIT = args.init_wait
REFRESH_TIMEOUT = args.refresh_timeout
# Kill running PBI
PROCNAME = "PBIDesktop.exe"
for proc in psutil.process_iter():
# check whether the process name matches
if proc.name() == PROCNAME:
proc.kill()
time.sleep(3)
# Start PBI and open the workbook
print("Starting Power BI")
os.system('start "" "' + WORKBOOK + '"')
print("Waiting ",INIT_WAIT,"sec")
time.sleep(INIT_WAIT)
# Connect pywinauto
print("Identifying Power BI window")
app = Application(backend = 'uia').connect(path = PROCNAME)
win = app.window(title_re = '.*Power BI Desktop')
time.sleep(5)
win.wait("enabled", timeout = 300)
win.Save.wait("enabled", timeout = 300)
win.set_focus()
win.Home.click_input()
win.Save.wait("enabled", timeout = 300)
win.wait("enabled", timeout = 300)
# Refresh
print("Refreshing")
win.Refresh.click_input()
#wait_win_ready(win)
time.sleep(5)
print("Waiting for refresh end (timeout in ", REFRESH_TIMEOUT,"sec)")
win.wait("enabled", timeout = REFRESH_TIMEOUT)
# Save
print("Saving")
type_keys("%1", win)
#wait_win_ready(win)
time.sleep(5)
win.wait("enabled", timeout = REFRESH_TIMEOUT)
# Publish
if args.publish:
print("Publish")
win.Publish.click_input()
publish_dialog = win.child_window(auto_id = "KoPublishToGroupDialog")
publish_dialog.child_window(title = WORKSPACE).click_input()
publish_dialog.Select.click()
try:
win.Replace.wait('visible', timeout = 10)
except Exception:
pass
if win.Replace.exists():
win.Replace.click_input()
win["Got it"].wait('visible', timeout = REFRESH_TIMEOUT)
win["Got it"].click_input()
#Close
print("Exiting")
win.close()
# Force close
for proc in psutil.process_iter():
if proc.name() == PROCNAME:
proc.kill()
if __name__ == '__main__':
try:
main()
except Exception as e:
print(e)
sys.exit(1)

Had the same issue, but using "win.type_keys("^S")" solved.

I think I finally figured out what was happening. My prior post about version compatibility did not solve the issue.
It's a very strange bug - pywinauto sometimes misses the correct button. I was able to reproduce this multiple times, although it didn't happen every time - it sometimes happened on win.Refresh.click_input(), sometimes on win.Publish.click_input() This was why the popup dialog 'KoPublishToGroupDialog' cannot be found after clicking Publish and the script fails. This also indicates that the Refresh wouldn't work consistently because the script doesn't check if the dialog opens - it just waits for a predetermined amount of time to allow the refresh to finish.
I implemented a check to see if the dialog window actually opened, e.g.
if not win.child_window(auto_id="KoPublishToGroupDialog").exists():
raise AttributeError("publish dialog failed to open")
along with a retry and maximum wait loop but this didn't fix the issue. What finally worked was much simpler:
The toolbar of the PowerBI Desktop application has two modes, expanded and small. The default setup is to use expanded mode - you can minimize the toolbar to use small icons with the small arrow in the top right of PowerBI Desktop (bottom right in the pic below).
BEFORE:
AFTER
This seems to take care of the pywinauto bug (or whatever actually causes it) where the Refresh or Publish buttons are missed when clicking them.
Hope this helps someone, it took way too long to figure this out.
---UPDATE---:
Unfortunately this was NOT the full solution, what I didn't realize (and probably had a lot to do with missing the buttons) was the virtual machine's screen resolution I'm using to run this script was resized when the connection was closed. I'm using the tscon.exe solution to connect the desktop session to the console via a batch script so it isn't shut down after disconnecting. To keep the resolution I installed qres.exe (added to PATH) and added it to the batch script:
for /f "skip=1 tokens=3" %%s in ('query user %USERNAME%') do (
%windir%\System32\tscon.exe %%s /dest:console
timeout 5
qres /X 1920 /Y 1080 /C 32
)
I use this script to disconnect from the remote desktop session but keep the screen resolution. I also set up my RDP client to connect using this resolution.
At the moment I'm still not 100% sure what else might be necessary to get this to work consistently. For example after setting up the new fixed resolution I had to open PowerBI Desktop manually and set it to windowed (non-maximized) mode so it wouldn't completely miss the Refresh button (it was at least close before). My last test (staying connected) worked, disconnected from RDP it still seems to have issues though.

JRC JJ1000 + dronekit >> ERROR:dronekit.mavlink:Exception in MAVLink input loop

I am trying to connect JRC JJ1000 drone using dronekit + python.
when executing the connect command:
dronekit.connect('com3', baud=115200, heartbeat_timeout=30)
I am getting the following error:
ERROR:dronekit.mavlink:Exception in MAVLink input loop
Traceback (most recent call last):
File "C:\Python37\lib\site-packages\dronekit\mavlink.py", line 211, in mavlink_thread_in
fn(self)
File "C:\Python37\lib\site-packages\dronekit\__init__.py", line 1371, in listener
self._heartbeat_error)
dronekit.APIException: No heartbeat in 5 seconds, aborting.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python37\lib\site-packages\dronekit\__init__.py", line 3166, in connect
vehicle.initialize(rate=rate, heartbeat_timeout=heartbeat_timeout)
File "C:\Python37\lib\site-packages\dronekit\__init__.py", line 2275, in initialize
raise APIException('Timeout in initializing connection.')
dronekit.APIException: Timeout in initializing connection.
I left no store unturned but no progress. I also tried both Python 2.7 and 3.7 with same result.

I have been getting the same error. I am using some custom code in a docker container to run simulations with dronekit and ArduPilot. The error is intermittent. So far it seems like the only way to get the error to stop is to:
Close all docker containers.
Open windows task manager and wait for vmmem to lower memory usage (5-10m).
Try again.
Maybe the problems are related somehow. To me it seems like the connection might be in use by a previous instance and it was not properly close. Since waiting for vmmem to free up resources appears to fix it. I would prefer a better solution if anyone finds one!
We are using python code like this to connect:
from dronekit import connect
...
# try to connect 5 times
while connected == False and fails < 5:
try:
vehicle = connect(connection_string, wait_ready=True)
except:
fails += 1
time.sleep(3)
print("Failed to connect to local mavlink sleeping for 3 seconds")
else:
connected = True
Where the connection_string is of the form:
"tcp:host:port"
Also, the documentation states "If the baud rate is not set correctly, connect may fail with a timeout error. It is best to set the baud rate explicitly." Are you sure that you have the correct baud rate?

Perforce (P4) Python API complains about too many locks

I wrote an application that opens several subprocesses, which initiate connections individually to a Perforce server. After a while I get this error message in almost all of these child-processes:
Traceback (most recent call last):
File "/Users/peter/Desktop/test_app/main.py", line 76, in p4_execute
p4.run_login()
File "/usr/local/lib/python3.7/site-packages/P4.py", line 665, in run_login
return self.run("login", *args, **kargs)
File "/usr/local/lib/python3.7/site-packages/P4.py", line 611, in run
raise e
File "/usr/local/lib/python3.7/site-packages/P4.py", line 605, in run
result = P4API.P4Adapter.run(self, *flatArgs)
P4.P4Exception: [P4#run] Errors during command execution( "p4 login" )
[Error]: "Fatal client error; disconnecting!
Operation 'client-SetPassword' failed.
Too many trys to get lock /Users/peter/.p4tickets.lck."
Does anyone have any idea what could cause this? I open my connections properly and all double checked on all source locations that I disconnect from the server properly via disconnect.
Only deleting the .p4tickets.lck manually works until the error comes back after a few seconds

The relevant code is here:
https://swarm.workshop.perforce.com/projects/perforce_software-p4/files/2018-1/support/ticket.cc#200
https://swarm.workshop.perforce.com/projects/perforce_software-p4/files/2018-1/sys/filetmp.cc#147
I can't see that there's any code path where the ticket.lck file would fail to get cleaned up without throwing some other error.
Is there anything unusual about the home directory where the tickets file lives? Like, say, it's on a network filer with some latency and some kind of backup process? Or maybe one that doesn't properly enforce file locks between all these subprocesses you're spawning?
How often are your scripts running "p4 login" to refresh and re-write the ticket? Many times a second? If you change them to not do that (e.g. only login if there's not already a ticket) does the problem persist?

Decoupled process from terminal still outputs Traceback to the terminal

While testing an application I made using a rest api I discovered this behaviour which I don't understand.
Lets start by reproducing a similar error as follows -
In file call.py -
Note that this file has code that manifests itself visually for example a GUI that runs forever. Here I am just showing you a representation and deliberately making it raise an Exception to show you the issue. Making a get request and then trying to parse the result as json will raise a JSONDecodeError.
import requests
from time import sleep
sleep(3)
uri = 'https://google.com'
r = requests.get(uri)
response_dict = r.json()
Since I want to run this as a daemon process, I decouple this process from the terminal that started it using the following trick -
In file start.py -
import subprocess
import sys
subprocess.Popen(["python3", "call.py"])
sys.exit(0)
And then I execute python3 start.py
It apparently decouples the process because if there are no exceptions the visual manifestation runs perfectly.
However in case of an exception I immediately see this output in the terminal, even though I got a new prompt after calling python3 start.py -
$ python3 start.py
$ Traceback (most recent call last):
File "call.py", line 7, in <module>
response_dict = r.json()
File "/home/walker/.local/lib/python3.6/site-packages/requests/models.py", line 896, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Now, I understand that all exceptions MUST be handled in the program itself. And I have done it after this strange issue, but what is not clear to me is why did this happen at all in the first place?
It doesn't happen if I quit the terminal and restart the terminal (the visual manifestation gets stuck in case of a Traceback, and no output on any terminal as expected)
Why is a decoupled process behaving this way?
NOTE: Decoupling is imperative to me. It is imperative that the GUI run as a background or daemon process and that the terminal that spawns it is freed from it.

by "decoupled", I assume you mean you want the stdout/stderr to go to /dev/null? assuming that's what you mean, that's not what you've told your code to do
from the docs:
stdin, stdout and stderr specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are PIPE, DEVNULL, an existing file descriptor (a positive integer), an existing file object, and None.
With the default settings of None, no redirection will occur; the child’s file handles will be inherited from the parent.
you therefore probably want to be doing:
from subprocess import Popen, DEVNULL
Popen(["python3", "call.py"], stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL)
based on the OPs comment, I think they might be after a tool like GNU screen or tmux. terminal multiplexers, like these, allow you to create a virtual terminal that you can disconnect from and reconnect to at need. these answers see https://askubuntu.com/a/220880/106239 and https://askubuntu.com/a/8657/106239 have examples for tmux and screen respectively

python - problems handling exception and continuing

I am building or trying to build a python script which check's a list of ip addresses (ips.txt) for a specific program using the wmi python module. However, no matter how I handle the exceptions on assets with no RPC service running the script stops running on an error. I am using python 2.7.5
Can I catch and pass the error's to proceed?
Can I catch the error and print or return a note that the ip was not alive or rpc was not running?
Thank you in advance
Here is my code:
import wmi
list = open("ips.txt")
for line in list.readlines():
asset = line.strip('\n')
c = wmi.WMI(asset)
try:
for process in c.Win32_Process (name="SbClientManager.exe"):
print asset, process.ProcessId, process.Name
except Exception:
pass
I have tried handling the exceptions in multiple way's to continue parsing my list, but the script continues to error out with the following:
Traceback (most recent call last):
File ".\check_service.py", line 12, in <module>
c = wmi.WMI(asset)
File "C:\Python27\lib\site-packages\wmi.py", line 1290, in connect
handle_com_error ()
File "C:\Python27\lib\site-packages\wmi.py", line 241, in handle_com_error
raise klass (com_error=err)
wmi.x_wmi: <x_wmi: Unexpected COM Error (-2147023174, 'The RPC server is unavailable.', None, None)>
Ultimately, I am just trying to continue the script and catch the error. Maybe a note stating that IP was not responsive would be helpful. Here are the exceptions samples that I have tried:
except Exception:
sys.exc_clear()
except:
pass
except wmi.x_wmi, x:
pass

The traceback you pasted says that the error is in the c = wmi.WMI(asset) line. You need to put that line inside the try block.
Like so:
import wmi
list = open("ips.txt")
bad_assets = []
for line in list.readlines():
asset = line.strip('\n')
try:
c = wmi.WMI(asset)
for process in c.Win32_Process (name="SbClientManager.exe"):
print asset, process.ProcessId, process.Name
except Exception:
bad_assets.append(asset)
Also, trying to catch the right exception is recommended.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.