Process decoupled from the terminal still outputs a Traceback to the terminal - python

While testing an application I made using a REST API, I discovered behaviour I don't understand.
Let's start by reproducing a similar error as follows -
In file call.py -
Note that the real file has code that manifests itself visually, for example a GUI that runs forever. Here I am just showing a representation and deliberately making it raise an exception to demonstrate the issue: making a GET request and then trying to parse the result as JSON will raise a JSONDecodeError.
import requests
from time import sleep

sleep(3)
uri = 'https://google.com'
r = requests.get(uri)       # returns an HTML page, not JSON
response_dict = r.json()    # raises JSONDecodeError
Since I want to run this as a daemon process, I decouple this process from the terminal that started it using the following trick -
In file start.py -
import subprocess
import sys

subprocess.Popen(["python3", "call.py"])  # spawn the child without waiting for it
sys.exit(0)                               # parent exits, freeing the prompt
And then I execute python3 start.py
It apparently decouples the process: if there are no exceptions, the visual part runs perfectly.
However, in case of an exception, I immediately see this output in the terminal, even though I already got a new prompt after running python3 start.py -
$ python3 start.py
$ Traceback (most recent call last):
File "call.py", line 7, in <module>
response_dict = r.json()
File "/home/walker/.local/lib/python3.6/site-packages/requests/models.py", line 896, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Now, I understand that all exceptions MUST be handled in the program itself, and I have done so since hitting this strange issue. But what is not clear to me is why this happened at all in the first place.
It doesn't happen if I quit and restart the terminal (in case of a Traceback the visual part just gets stuck, with no output on any terminal, as expected).
Why is a decoupled process behaving this way?
NOTE: Decoupling is imperative: the GUI must run as a background or daemon process, and the terminal that spawns it must be freed from it.

by "decoupled", I assume you mean you want the stdout/stderr to go to /dev/null? assuming that's what you mean, that's not what you've told your code to do
from the docs:
stdin, stdout and stderr specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are PIPE, DEVNULL, an existing file descriptor (a positive integer), an existing file object, and None.
With the default settings of None, no redirection will occur; the child’s file handles will be inherited from the parent.
You therefore probably want to be doing:
from subprocess import Popen, DEVNULL
Popen(["python3", "call.py"], stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL)
Based on the OP's comment, I think they might be after a tool like GNU screen or tmux. Terminal multiplexers like these let you create a virtual terminal that you can disconnect from and reconnect to as needed. See https://askubuntu.com/a/220880/106239 and https://askubuntu.com/a/8657/106239 for tmux and screen examples respectively.
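If by "decoupled" you instead mean a fully detached daemon, a further option (my assumption, beyond what the question's code does) is to also give the child its own session, so it no longer shares the terminal's session at all:

from subprocess import Popen, DEVNULL

# silence all three standard streams and detach from the terminal's session;
# start_new_session=True (Python 3.2+) makes the child call setsid()
Popen(["python3", "call.py"],
      stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL,
      start_new_session=True)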

Related

Perforce (P4) Python API complains about too many locks

I wrote an application that opens several subprocesses, each of which initiates its own connection to a Perforce server. After a while I get this error message in almost all of these child processes:
Traceback (most recent call last):
File "/Users/peter/Desktop/test_app/main.py", line 76, in p4_execute
p4.run_login()
File "/usr/local/lib/python3.7/site-packages/P4.py", line 665, in run_login
return self.run("login", *args, **kargs)
File "/usr/local/lib/python3.7/site-packages/P4.py", line 611, in run
raise e
File "/usr/local/lib/python3.7/site-packages/P4.py", line 605, in run
result = P4API.P4Adapter.run(self, *flatArgs)
P4.P4Exception: [P4#run] Errors during command execution( "p4 login" )
[Error]: "Fatal client error; disconnecting!
Operation 'client-SetPassword' failed.
Too many trys to get lock /Users/peter/.p4tickets.lck."
Does anyone have any idea what could cause this? I open my connections properly, and I have double-checked at all source locations that I disconnect from the server properly via disconnect.
The only workaround is deleting .p4tickets.lck manually, and even then the error comes back after a few seconds.
The relevant code is here:
https://swarm.workshop.perforce.com/projects/perforce_software-p4/files/2018-1/support/ticket.cc#200
https://swarm.workshop.perforce.com/projects/perforce_software-p4/files/2018-1/sys/filetmp.cc#147
I can't see that there's any code path where the ticket.lck file would fail to get cleaned up without throwing some other error.
Is there anything unusual about the home directory where the tickets file lives? Like, say, it's on a network filer with some latency and some kind of backup process? Or maybe one that doesn't properly enforce file locks between all these subprocesses you're spawning?
How often are your scripts running "p4 login" to refresh and re-write the ticket? Many times a second? If you change them to not do that (e.g. only login if there's not already a ticket) does the problem persist?
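For what it's worth, a minimal sketch of that "only login if there's not already a ticket" idea, assuming the P4Python API seen in the traceback (run_login forwards its arguments to "p4 login", and "p4 login -s" only succeeds when a valid ticket already exists):

from P4 import P4, P4Exception

p4 = P4()
p4.connect()
try:
    p4.run_login("-s")   # "p4 login -s": check ticket status without rewriting it
except P4Exception:
    p4.run_login()       # no valid ticket yet: log in once, writing .p4tickets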

pysipp - Trying to use it with existing sipp conf file

Background
I have an existing sipp conf file that I launch like so:
sipp mysipdomain.net -sf ./testcall.conf -m 1 -s 12345 -i 10.1.1.1:5060
This runs just fine. It simulates a call in our test labs. But now I need to expand this test into a larger test script where I not only launch the sipp test but also prove (via SIP trace) that it's hitting the right boxes.
I decided to wrap this sipp call in Python. I just found https://github.com/SIPp/pysipp and am trying to see if I can write this entire test in Python. To start, I tried to run the same sipp test using pysipp.
Problem / Question
I'm currently getting an error that says:
lab2:/tmp/jj/sipp_tests# python mvv_numeric.py
No handlers could be found for logger "pysipp"
Traceback (most recent call last):
File "mvv_numeric.py", line 6, in <module>
uac()
File "/usr/lib/python2.7/site-packages/pysipp-0.1.alpha-py2.7.egg/pysipp/agent.py", line 71, in __call__
raise_exc=raise_exc, **kwargs
File "/usr/lib/python2.7/site-packages/pluggy-0.3.1-py2.7.egg/pluggy.py", line 724, in __call__
return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
File "/usr/lib/python2.7/site-packages/pluggy-0.3.1-py2.7.egg/pluggy.py", line 338, in _hookexec
return self._inner_hookexec(hook, methods, kwargs)
File "/usr/lib/python2.7/site-packages/pluggy-0.3.1-py2.7.egg/pluggy.py", line 333, in <lambda>
_MultiCall(methods, kwargs, hook.spec_opts).execute()
File "/usr/lib/python2.7/site-packages/pluggy-0.3.1-py2.7.egg/pluggy.py", line 596, in execute
res = hook_impl.function(*args)
File "/usr/lib/python2.7/site-packages/pysipp-0.1.alpha-py2.7.egg/pysipp/__init__.py", line 250, in pysipp_run_protocol
finalize(cmds2procs, raise_exc=raise_exc)
File "/usr/lib/python2.7/site-packages/pysipp-0.1.alpha-py2.7.egg/pysipp/__init__.py", line 228, in finalize
raise SIPpFailure(msg)
pysipp.SIPpFailure: Some agents failed
'uac' with exit code 255 -> Command or syntax error: check stderr output
Code
Here's what the py script looks like:
import pysipp
uac = pysipp.client(destaddr=('mysipdomain.net', 5060))
uac.uri_username = '12345'
uac.auth_password = ''
uac.scen_file = './numeric.xml'
uac()
And the original sipp "testcall.conf" has been renamed to "numeric.xml" and looks like this (I'm only including the first part because it's quite long; if you need to see something specific, please let me know and I will add it to this post):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE scenario SYSTEM "sipp.dtd">
<scenario name="UAC with Media">
  <send retrans="10000">
    <![CDATA[
      INVITE sip:[service]@[remote_ip]:[remote_port] SIP/2.0
      Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
      From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[pid]SIPpTag00[call_number]
      To: [service] <sip:[service]@[remote_ip]:[remote_port]>
      Call-id: [call_id]
      CSeq: 1 INVITE
      Contact: <sip:sipp@[local_ip]:[local_port]>
      Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
      User-Agent: PolycomVVX-VVX_300-UA/5.5.2.8571
      Accept-Language: en
      Supported: replaces,100rel
      Allow-Events: conference,talk,hold
      Max-Forwards: 70
      Content-Type: application/sdp
      Content-Length: [len]
I'm sure it's something simple I've overlooked. Any pointers would be appreciated.
EDIT:
I added debug-level logging and reran the Python script. In the logs I can now see what pysipp is actually attempting:
2018-01-31 14:40:32,715 MainThread [DEBUG] pysipp launch.py:63 : launching cmd:
"'/usr/bin/sipp' 'mysipdomain.net':'5060' -s '12345' -sn 'uac' -sf 'numeric.xml' -recv_timeout '5000' -r '1' -l '1' -m '1' -log_file '/tmp/uac_log_file' -screen_file '/tmp/uac_screen_file' -trace_logs -trace_screen "
So comparing that with the original command line I use to run sipp, I see the extra "-sn 'uac'".
I'm going to see about either getting my SIPp script to work with that flag, or googling for similar posts.
In the meantime, if you see my mistake, I'm all ears.
The problem here (as you noticed) is likely that pysipp.client() sets the -sn uac flag, and sipp fails when given both -sn and -sf.
To see the actual error you can enable logging before running the client:
import pysipp
pysipp.utils.log_to_stderr("DEBUG")
uac = pysipp.client(destaddr=('mysipdomain.net', 5060))
uac.uri_username = '12345'
uac.auth_password = ''
uac.scen_file = './numeric.xml'
uac()
The quick hack is to simply set uac.scen_name = None, but the proper way is either to use pysipp.scenario() (docs here) and rename your numeric.xml so that uac appears in the file name (i.e. uac_numeric.xml), or to use pysipp.ua(scen_file=<path/to/numeric.xml>) instead.
To understand the problem: the client currently applies a default scenario name argument, when really the user should be able to override it (though then there would be no guarantee that the user is actually sending client traffic, which makes the name client somewhat pointless).
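Spelled out, the quick hack is one extra line in the script from the question, clearing the default scenario name so -sn is no longer passed alongside -sf:

import pysipp
pysipp.utils.log_to_stderr("DEBUG")

uac = pysipp.client(destaddr=('mysipdomain.net', 5060))
uac.uri_username = '12345'
uac.auth_password = ''
uac.scen_file = './numeric.xml'
uac.scen_name = None   # drop the default '-sn uac' argument
uac()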

Writing files and running them sometimes works, but mostly only the first one

This code raises WindowsError most of the time; rarely (often the first time it is run) it does not.
""" Running hexlified codes from codefiles module prepared previously """
import tempfile
import subprocess
import threading
import os
import codefiles
if __name__ == '__main__':
for ind, c in enumerate(codefiles.exes):
fn = tempfile.mktemp() + '.exe'
# for some reason hexlified code is sometimes odd length and one nibble
if len(c) & 1:
c += '0'
c = c.decode('hex')
with open(fn, 'wb') as f:
f.write(c)
threading.Thread(target=lambda:subprocess.Popen("", executable=fn)).start()
""" One time works, one time WindowsError 32
>>>
132096 c:\docume~1\admin\locals~1\temp\tmpkhhxxo.exe
991232 c:\docume~1\admin\locals~1\temp\tmp9ow6zz.exe
>>> ================================ RESTART ================================
>>>
132096 c:\docume~1\admin\locals~1\temp\tmp3hb0cf.exe
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
self.run()
File "C:\Python27\lib\threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Documents and Settings\Admin\My Documents\Google Drive\Python\Tools\runner.pyw", line 18, in <lambda>
threading.Thread(target=lambda:subprocess.Popen("", executable=fn)).start()
File "C:\Python27\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 958, in _execute_child
startupinfo)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process
991232 c:\docume~1\admin\locals~1\temp\tmpnkfuon.exe
>>>
"""
Hexlification is done with the script below, and it sometimes seems to produce an odd number of nibbles, which itself seems odd.
# bootstrapper for running hexlified executables,
# hexlification can be done by running this file directly or calling function make
# running can be done by run function in runner module, which imports the code,
# runs them from temporary files
import os

modulename = 'codefiles.py'

def code_file_names(d='.', ext='.exe'):
    return [n for n in os.listdir(d)
            if n.lower().endswith(ext)]

def make():
    codes = code_file_names(os.curdir)
    with open(modulename, 'a') as f:
        f.write('exes = (')
        hex_codes = [open(n, 'rb').read().encode('hex') for n in codes]
        assert all(len(c) & 1 == 0 for c in hex_codes)
        print len(hex_codes), map(len, hex_codes)
        hex_codes = [repr(c) for c in hex_codes]
        f.write(',\n'.join(hex_codes))
        f.write(')\n')

if __name__ == '__main__':
    import make_exe
    # prepare hexlified exes for exes in directory if codefiles not prepared
    if modulename not in os.listdir('.'):
        try:
            os.remove(modulename)
        except:
            pass
        make()
    # prepare script for py2_exe to execute the scripts by run from runner
    make_exe.py2_exe('runner.pyw', 'tools/dvd.ico')
After applying Mr. Martelli's suggestion, the error disappeared, but I still don't get the expected result.
In the new version of the code I skip creating an exe file if it already exists (the creation routine now saves the file names). It launches both codes (the two hexlified codes) when it creates the files, but on later runs it launches multiple copies of the first code only.
tempfile.mktemp(), besides having been deprecated for many years because of its security problems, guarantees unique names only within a single run of a program, since, per https://docs.python.org/2/library/tempfile.html#tempfile.mktemp, "The module uses a global variable that tell it how to construct a temporary name". So on the second run, as the very clear error message tells you, "The process cannot access the file because it is being used by another process" (specifically, the process started by the previous run; i.e. you cannot re-write an .exe that's currently running in some process).
The fix is to make sure each run uses its own unique directory for temporary files; see mkdtemp at https://docs.python.org/2/library/tempfile.html#tempfile.mkdtemp. How to eventually clean up those temporary directories is a separate issue, since that can't be done as long as any process is running an .exe file within such a directory. You'll probably need a "clean-up script" that does what it can for the purpose, run periodically (e.g. in Unix I'd use cron), plus a repository (e.g. a small sqlite DB, or even just a file) recording which of the temporary directories created in previous runs still exist and need to be cleaned up (catching the exceptions seen when they can't be cleaned up yet, so as to retry in the future).
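A minimal sketch of that fix against the runner from the question (Python 2; the mkdtemp() call and the per-run file naming are the only new parts):

import os
import subprocess
import tempfile

import codefiles

# each run gets its own fresh directory, so nothing written here can collide
# with an .exe still executing from a previous run
tmpdir = tempfile.mkdtemp(prefix='hexexes-')
for ind, c in enumerate(codefiles.exes):
    if len(c) & 1:   # pad odd-length hex strings, as in the question
        c += '0'
    fn = os.path.join(tmpdir, 'prog%d.exe' % ind)
    with open(fn, 'wb') as f:
        f.write(c.decode('hex'))
    subprocess.Popen([fn])   # launch the freshly written executable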

Sequentially running an external independent process using Tkinter and python

BACKGROUND :
*I'm creating a batch simulation job chooser + scheduler using Tkinter (Portable PYscripter, python v2.7.3)
*This program will function as a front end, to a commercial solver program
*The program needs to allow the user to choose a bunch of files to simulate, sequentially, one after the other.
*It also needs to have the facility to modify (Add/delete) jobs from an existing/running job list.
*Each simulation will definitely run for several hours.
*The output of the simulation will be viewed on separate programs and I do not need any pipe to the output. The external viewer will be called from the GUI, when desired.
***I have a main GUI window, which allows the user to:
choose job files, submit jobs, view the submission log, stop running jobs(one by one)
The above works well.
PROBLEMS :
*If I use subprocess.Popen("command"): all the simulation input files are launched at the same time. It MUST be sequential (due to license and memory limitations).
*If I use subprocess.call(" ") or the wait() method, the GUI hangs and there is no way to stop/add/modify the job list. Even if the "job submit" command is on an independent window, both parent windows hang until the job completes.
QUESTION 1 :
*How do I launch the simulation jobs sequentially (like subprocess.call) AND keep the main GUI window responsive for job list modification or stopping a job?
The jobs are in a list, taken using "askopenfilenames", and then run using a for loop.
Relevant parts of the Code :
cfx5solvepath = r"c:\XXXX"

def file_chooser_default():
    global flist1
    flist1 = askopenfilename(parent=root2, filetypes=[('.def', '*.def'), ('All', '*.*'), ('.res', '*.res')], title="Select Simulation run files...", multiple=True)[1:-1].split('} {')

def ext_process():
    o = list(flist1)
    p = list(flist1)
    q = list(flist1)
    i = 0
    while i < len(flist1):
        p[i] = '"%s" -def "%s"' % (cfx5solvepath, flist1[i])
        i += 1
    i = 0
    while i < len(p):
        q[i] = subprocess.call(p[i])
        i += 1
root2 = Tk()
root2.minsize(300,300)
root2.geometry("500x300")
root2.title("NEW WINDOW")
frame21=Frame(root2, borderwidth=3, relief="solid").pack()
w21= Button(root2,fg="blue", text="Choose files to submit",command=file_chooser_default).pack()
w2a1=Button(root2,fg="white", text= 'Display chosen file names and order', command=lambda:print_var(flist1)).pack()
w2b1= Button (root2,fg="white", bg="red", text="S U B M I T", command=ext_process).pack()
root2.mainloop()
Please let me know if you require anything else. Look forward to your help.
*EDIT*
On incorporating the changes suggested by @Tim, the GUI is left free. Since there is a specific sub-program associated with the main solver program to stop a job, I am able to stop the job using the right command.
Once the currently running job is stopped, the next job on the list starts up, automatically, as I was hoping.
This is the code used for stopping the job :
def stop_select():  # Choose the currently running files which are to be stopped
    global flist3
    flist3 = askdirectory().split('} {')

def sim_stop():  # STOP the chosen simulation
    st = list(flist3)
    os.chdir("%s" % flist3[0])
    st = subprocess.call('"%s" -directory "%s"' % (defcfx5stoppath, flist3[0]))
    ret1 = tkMessageBox.showinfo("INFO", "Chosen simulation stopped successfully")
    os.chdir("%s" % currentwd)
QUESTION 2 :
*Once the above jobs are completed when using start_new_thread, the GUI doesn't respond. The GUI works while the jobs are running in the background, but the start_new_thread documentation says the thread is supposed to exit silently when the function returns.
*Additionally, I have an HTML log file that is written to/updated as each job completes. When I use start_new_thread, the log file content is visible only AFTER all the jobs complete. The contents, along with the time stamps, are however correct. Without start_new_thread, I was able to refresh the HTML file to get the updated submission log.
***After exiting the GUI program via the Task Manager several times, I am suddenly unable to use the start_new_thread function!! I have tried reinstalling PYscripter and restarting the computer as well. I can't figure out anything sensible from the traceback, which is:
Traceback (most recent call last):
File "<string>", line 532, in write
File "C:\Portable Python 2.7.3.1\App\lib\site-packages\rpyc\core\protocol.py", line 439, in _async_request
seq = self._send_request(handler, args)
File "C:\Portable Python 2.7.3.1\App\lib\site-packages\rpyc\core\protocol.py", line 229, in _send_request
self._send(consts.MSG_REQUEST, seq, (handler, self._box(args)))
File "C:\Portable Python 2.7.3.1\App\lib\site-packages\rpyc\core\protocol.py", line 244, in _box
if brine.dumpable(obj):
File "C:\Portable Python 2.7.3.1\App\lib\site-packages\rpyc\core\brine.py", line 369, in dumpable
return all(dumpable(item) for item in obj)
File "C:\Portable Python 2.7.3.1\App\lib\site-packages\rpyc\core\brine.py", line 369, in <genexpr>
return all(dumpable(item) for item in obj)
File "C:\Portable Python 2.7.3.1\App\lib\site-packages\rpyc\core\brine.py", line 369, in dumpable
return all(dumpable(item) for item in obj)
File "C:\Portable Python 2.7.3.1\App\lib\site-packages\rpyc\core\brine.py", line 369, in <genexpr>
return all(dumpable(item) for item in obj)
File "C:\Portable Python 2.7.3.1\App\Python_Working_folder\v350.py", line 138, in ext_process
q[i]=subprocess.call(p[i])
File "C:\Portable Python 2.7.3.1\App\lib\subprocess.py", line 493, in call
return Popen(*popenargs, **kwargs).wait()
File "C:\Portable Python 2.7.3.1\App\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Portable Python 2.7.3.1\App\lib\subprocess.py", line 896, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
I'd suggest using a separate thread for the job launching. The simplest way would be to use the start_new_thread method from the thread module.
Change the submit button's command to command=lambda:thread.start_new_thread(ext_process, ())
You will probably want to disable the button when it's clicked and enable it when the launching is complete. This can be done inside ext_process.
It becomes more complicated if you want to allow the user to cancel jobs. This solution won't handle that.
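For reference, a minimal sketch of that wiring against the question's code (an assumption on my part: the Button is created and packed separately, since pack() returns None and you need the widget reference if you later want to disable/enable it; thread.start_new_thread is the Python 2 API):

import thread

w2b1 = Button(root2, fg="white", bg="red", text="S U B M I T",
              command=lambda: thread.start_new_thread(ext_process, ()))
w2b1.pack()  # keep the reference so the button can be disabled while jobs run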

How can python subprocess.Popen see select.poll and then later not? (select 'module' object has no attribute 'poll')

I'm using the (awesome) mrjob library from Yelp to run my Python programs in Amazon's Elastic MapReduce. It depends on subprocess in the standard Python library. From my Mac running Python 2.7.2, everything works as expected.
However, when I switched to using the exact same code on Ubuntu LTS 11.04, also with Python 2.7.2, I encountered something strange:
mrjob loads the job and then attempts to communicate with its child processes using subprocess, generating this error:
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/emr.py", line 1212, in _build_steps
steps = self._get_steps()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/runner.py", line 1003, in _get_steps
stdout, stderr = steps_proc.communicate()
File "/usr/lib/python2.7/subprocess.py", line 754, in communicate
return self._communicate(input)
File "/usr/lib/python2.7/subprocess.py", line 1302, in _communicate
stdout, stderr = self._communicate_with_poll(input)
File "/usr/lib/python2.7/subprocess.py", line 1332, in _communicate_with_poll
poller = select.poll()
AttributeError: 'module' object has no attribute 'poll'
This appears to be a problem with subprocess and not mrjob.
I dug into /usr/lib/python2.7/subprocess.py and found that during import it runs:
if mswindows:
    ... snip ...
else:
    import select
    _has_poll = hasattr(select, 'poll')
By editing that, I verified that it really does set _has_poll==True. And this is correct; easily verified on the command line.
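For example, a bare interpreter shows the attribute is present (outside whatever is later removing it):

import select
print hasattr(select, 'poll')   # prints True on a stock Linux Python 2.7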
However, by the time execution reaches Popen._communicate_with_poll, the select module has somehow changed! The following was generated by printing dir(select) right before it attempts to use select.poll():
['EPOLLERR', 'EPOLLET', 'EPOLLHUP', 'EPOLLIN', 'EPOLLMSG',
'EPOLLONESHOT', 'EPOLLOUT', 'EPOLLPRI', 'EPOLLRDBAND',
'EPOLLRDNORM', 'EPOLLWRBAND', 'EPOLLWRNORM', 'PIPE_BUF',
'POLLERR', 'POLLHUP', 'POLLIN', 'POLLMSG', 'POLLNVAL',
'POLLOUT', 'POLLPRI', 'POLLRDBAND', 'POLLRDNORM',
'POLLWRBAND', 'POLLWRNORM', '__doc__', '__name__',
'__package__', 'error', 'select']
no attribute called 'poll'!?!? How did it go away?
So I hardcoded _has_poll = False, and then mrjob happily continues with its work and runs my job in AWS EMR, with subprocess using _communicate_with_select... but now I'm stuck with a hand-modified standard library...
Any advice? :-)
I had a similar problem, and it turns out that gevent replaces the built-in select module with gevent.select, which doesn't have a poll method (as poll is a blocking API).
However, for some reason gevent by default doesn't patch subprocess, which uses select.poll.
An easy fix is to replace subprocess with gevent.subprocess:
import gevent.monkey
gevent.monkey.patch_all(subprocess=True)
import sys
import gevent.subprocess
sys.modules['subprocess'] = gevent.subprocess
If you do this before importing the mrjob library, it should work fine.
Sorry for writing a full answer instead of a comment, otherwise I'd lose code indentation.
I cannot help you directly since the problem seems very tightly tied to your code, but I can help you find out what's going on. Relying on the fact that Python modules can be arbitrary objects, try something like this:
class FakeModule(dict):
    def __init__(self, origmodule):
        self._origmodule = origmodule
        self.__all__ = dir(origmodule)

    def __getattr__(self, attr):
        return getattr(self._origmodule, attr)

    def __delattr__(self, attr):
        if attr == "poll":
            raise RuntimeError, "Trying to delete poll!"
        self._origmodule.__delattr__(attr)

def replaceSelect():
    import sys
    import select
    fakeselect = FakeModule(select)
    sys.modules["select"] = fakeselect

replaceSelect()
import select
del select.poll
and you'll get an output like:
Traceback (most recent call last):
File "domy.py", line 27, in <module>
del select.poll
File "domy.py", line 14, in __delattr__
raise RuntimeError, "Trying to delete poll!"
RuntimeError: Trying to delete poll!
By calling replaceSelect() in your code you should be able to get a traceback of where somebody is deleting poll(), so you can understand why.
I hope my FakeModule implementation is good enough, otherwise you might need to modify it.
