Python flush printing many times

I am running some Python (PyTorch) code through Slurm. This is my first time using Slurm. I have a lot of print statements in my code for status updates, but they aren't showing up in the output file that I specify. I think the issue is that Python buffers its output. However, when I use the -u flag, or set flush=True in some of the print statements, the same thing is printed many times, which is very confusing, and I am unsure why this is happening.
Any suggestions? I can't really debug my code without these status messages. Thanks!
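A minimal sketch of one possible workaround, under an assumption the question does not confirm: that the repeats come from several copies of the script running at once (for example several Slurm tasks via --ntasks greater than 1), each printing its own line. Unbuffered output (-u, flush=True, or the equivalent PYTHONUNBUFFERED=1 environment variable) would then only make every copy visible. The helper name status is hypothetical; SLURM_PROCID is the task rank that Slurm sets for each task. If the extra copies come from worker processes inside a single task, this check will not filter them.

```python
import os


def status(msg: str) -> bool:
    """Print msg from one process only and flush it immediately.

    Assumption: the job may start several copies of this script (for example
    several Slurm tasks); SLURM_PROCID is the task's rank, and it is treated
    as "0" when the variable is unset (e.g. when running outside srun).
    """
    if os.environ.get("SLURM_PROCID", "0") != "0":
        return False  # not task 0: stay quiet so each message appears once
    # flush=True pushes the line into the output file now instead of
    # waiting until Python's block buffer fills up
    print(msg, flush=True)
    return True
```

Calling status("epoch 3 done") in place of a plain print then writes the line once, and immediately, to the file given to Slurm.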

Related

Debug a Python program which seems paused for no reason

I am writing a Python program to analyze log files. I have about 30000 medium-size log files, and my Python script performs some simple (line-by-line) analysis of each one. Processing one file takes less than 5 seconds.
Once I had set up the processing, I left it running. When I came back about 14 hours later, the script had simply stopped right after analyzing one log file: it seems it never wrote the analysis output for that file to the file system, and it made no further progress.
I checked the memory usage and it seems fine (less than 1 GB). I also tried writing to the file system (touch test), and that works as normal. So my question is: how should I proceed to debug the issue? Could anyone share some thoughts on that? I hope this is not too general. Thanks.
You may use the trace module (Trace or track Python statement execution) and/or pdb (The Python Debugger).
Try this tool https://github.com/khamidou/lptrace with command:
sudo python lptrace -p <process_id>
It will print every Python function your program invokes and may help you understand where your program is stuck or looping forever.
If it does not output anything, your program is probably stuck, so try
pstack <process_id>
to check the stack trace and find out where it is stuck. The output of pstack consists of C frames, but I believe you can still find something useful in it to solve your problem.
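Since pstack only shows C frames, a standard-library alternative (a different technique from the tools above) is faulthandler, which prints the Python-level traceback. The sketch below arms a per-file watchdog: if one file takes much longer than the usual few seconds, the tracebacks of all threads are written to stderr while the program keeps running. The names process_all, analyze and timeout_s are hypothetical stand-ins for the asker's own loop.

```python
import faulthandler


def process_all(items, analyze, timeout_s: float = 60.0) -> dict:
    """Run analyze(item) for each item, with a per-item hang watchdog.

    analyze and timeout_s are hypothetical: pass your own per-file function
    and a limit comfortably above the normal per-file time (~5 s here).
    """
    results = {}
    for item in items:
        # Arm the watchdog: if this item is still running after timeout_s
        # seconds, faulthandler writes the traceback of every thread to
        # stderr. exit=False keeps the process running afterwards.
        faulthandler.dump_traceback_later(timeout_s, exit=False)
        try:
            results[item] = analyze(item)
        finally:
            # Disarm it as soon as the item finishes (or raises).
            faulthandler.cancel_dump_traceback_later()
    return results
```

For example, process_all(log_paths, analyze_log_file) would leave a traceback in the error output pointing at the exact Python line where the stalled file got stuck.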

Large dataset - no error - but it wont run - python memory issue?

So I am trying to process various large images, each of which gets loaded into a NumPy array so that I can do some calculations on it. The calculations are done per image, and each image is opened and closed inside a loop. I have reached a frustration point because I get no errors (none that I know of, and none that Python complains about); my code runs for one iteration of the loop and then simply does not run the second, third, or later iterations.
I get no errors! No memory error, no syntax error, nothing. I have used Spyder and even IDLE, and it runs all the calculations, sometimes for only one image, sometimes for two, and then just quits the loop (again WITH NO ERROR) as if it had finished all the images (when it has only run for one or two).
I am assuming it's a memory error? I mean, it runs one iteration, sometimes two, but never the rest. So ...
I have attempted to clear the tracebacks using this:
sys.exc_clear()
sys.exc_traceback = sys.last_traceback = None
I have even tried deleting each variable when I am done with it,
i.e. del variable
However, nothing seems to fix it --
Any ideas of what could be wrong would be appreciated!
The exit code of the python process should reveal the reason for the process exiting. In the event of an adverse condition, the exit code will be something other than 0. If you are running in a Bash shell or similar, you can run "echo $?" in your shell after running Python to see its exit status.
If the exit status is indeed 0, try putting some print statements in your code to trace the execution of your program. In any case, you would do well to post your code for better feedback.
Good luck!
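To read the same information from Python instead of the shell, here is a minimal sketch that interprets the returncode reported by subprocess (describe_exit is a hypothetical helper name). On POSIX, a negative returncode -N means the child was killed by signal N; a process killed by the kernel's out-of-memory killer, for instance, typically shows up as SIGKILL, which echo $? in a shell reports as 137 (128 + 9).

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess returncode into a short human-readable reason."""
    if returncode == 0:
        return "exited normally (status 0)"
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N on POSIX systems
        try:
            name = signal.Signals(-returncode).name
        except ValueError:  # a signal number this platform doesn't name
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"
```

A possible use is describe_exit(subprocess.run([sys.executable, "my_script.py"]).returncode), where my_script.py stands in for the image-processing script.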

unpredictable behaviour with python subprocess calls

I'm writing a python script that performs a series of operations in a loop, by making subprocess calls, like so:
os.system('./svm_learn -z p -t 2 trial-input model')
os.system('./svm_classify test-input model pred')
os.system('python read-svm-rank.py')
score = os.popen('python scorer.py -g gold-test -i out').readline()
When I make the calls individually one after the other in the shell they work fine. But within the script they always break. I've traced the source of the error and it seems that the output files are getting truncated towards the end (leading me to believe that calls are being made without previous ones being completed).
I tried with subprocess.Popen and then using the wait() method of the Popen object, but to no avail. The script still breaks.
Any ideas what's going on here?
I'd probably first rewrite a little to use the subprocess module instead of the os module.
Then I'd probably scrutinize what's going wrong by studying a system call trace:
http://stromberg.dnsalias.org/~strombrg/debugging-with-syscall-tracers.html
Hopefully there'll be an error code starting with "E" (such as ENOENT) near the end of the file that tells you what error is being encountered.
Another option would be to comment out subsets of your subprocesses (assuming the n+1th doesn't depend heavily on the output of the nth), to pin down which one of them is having problems. After that, you could sprinkle some extra error reporting in the offending script to see what it's doing.
But if you're not put off by C-ish syscall traces, that might be easier.
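A minimal sketch of the subprocess rewrite suggested in the answer above, using the commands from the question; run_steps and PIPELINE are hypothetical names, and sys.executable (the running interpreter) replaces the bare "python". Note that os.system already waits for each command, so this may not by itself cure truncated files; what it adds is that a failing step raises CalledProcessError instead of being ignored, and the score is read from captured output.

```python
import subprocess
import sys

# The commands from the question, as argument lists (no shell involved).
PIPELINE = [
    ["./svm_learn", "-z", "p", "-t", "2", "trial-input", "model"],
    ["./svm_classify", "test-input", "model", "pred"],
    [sys.executable, "read-svm-rank.py"],
    [sys.executable, "scorer.py", "-g", "gold-test", "-i", "out"],
]


def run_steps(steps) -> str:
    """Run each command to completion in order; return the last one's first output line.

    Expects at least one step.
    """
    for argv in steps[:-1]:
        # run() blocks until the child exits; check=True raises
        # CalledProcessError on a non-zero exit status.
        subprocess.run(argv, check=True)
    last = subprocess.run(steps[-1], check=True, capture_output=True, text=True)
    # First line of output without its newline (os.popen(...).readline() kept it).
    return last.stdout.split("\n", 1)[0]
```

In the loop, score = run_steps(PIPELINE) would take the place of the four os.system/os.popen calls.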

os.system() failing in python

I'm trying to parse some data and make graphs with python and there's an odd issue coming up. A call to os.system() seems to get lost somewhere.
The following three lines:
os.system('echo foo bar')
os.system('gnuplot test.gnuplot')
os.system('gnuplot --version')
Should print:
foo bar
Warning: empty x range [2012:2012], adjusting to [1991.88:2032.12]
gnuplot 4.4 patchlevel 2
But the only significant command in the middle seems to get dropped. The script still runs the echo and version check, and running gnuplot by itself (the gnuplot shell) works too, but there is no warning and no file output from gnuplot.
Why is this command dropped, and why completely silently?
In case it's helpful, the invocation should start gnuplot, it should open a couple of files (the instructions and a data file indicated therein) and write out to an SVG file. I tried deleting the target file so it wouldn't have to overwrite, but to no avail.
This is python 3.2 on Ubuntu Natty x86_64 virtual machine with the 2.6.38-8-virtual kernel.
Is the warning printed to stderr, and that is intercepted somehow?
Try using subprocess instead, for example using
subprocess.check_output(cmd, stderr=subprocess.STDOUT)
and checking the output.
(or plain subprocess.call might work better than os.system)
So, it turned out the issue was something I failed to mention. Earlier in the script test.gnuplot and test.data were written, but I neglected to call the file objects' close() and verify that they got closed (still don't know how to do that last part so for now it cycles for a bit). So there was some unexpected behaviour going on there causing gnuplot to see two unreadable files, take no action, produce no output, and return 0.
I guess nobody gets points for this one.
Edit: I finally figured it out with the help of strace. Don't know how I did things before I learned how to use it.
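The fix described above (closing the files before gnuplot runs) can be sketched as follows. The helper names write_inputs and plot are hypothetical; the point is that leaving a with-block closes the file, flushing its buffer to disk, and every file object has a closed attribute that answers "how do I verify they got closed". Because gnuplot returned 0 here, check=True would not have caught the problem on its own.

```python
import subprocess
from pathlib import Path


def write_inputs(directory: Path, files: dict) -> list:
    """Write each (name -> text) entry into directory; return the file objects."""
    handles = []
    for name, text in files.items():
        # Leaving the with-block closes the file, which also flushes Python's
        # write buffer to disk, so a child process sees the complete contents.
        with open(directory / name, "w") as f:
            f.write(text)
        handles.append(f)
    return handles


def plot(directory: Path) -> None:
    # Only start gnuplot after every input file is closed; check=True raises
    # if gnuplot reports failure through its exit status.
    subprocess.run(["gnuplot", "test.gnuplot"], cwd=directory, check=True)
```

After handles = write_inputs(Path("."), {...}), all(h.closed for h in handles) is True, and only then is plot(Path(".")) called.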
Don't use os.system. Use the subprocess module.
The os.system documentation says:
The subprocess module provides more powerful facilities for spawning
new processes and retrieving their results; using that module is
preferable to using this function.
Try this:
import subprocess
subprocess.check_call(['gnuplot', 'test.gnuplot'])

debugging: how to check where my Python program is hanging?

A fairly large Python program I wrote runs, but sometimes, after running for minutes or hours, at a moment that is not easily reproducible, it hangs and outputs nothing to the screen.
I have no idea what it is doing at that moment, and in what part of the code it is.
How can I run it in a debugger or something similar to see which line of code the program is executing at the moment it hangs?
It's too large to put "print" statements all over the place.
I did:
python -m trace --trace /usr/local/bin/my_program.py
but that gives me so much output that I can't really see anything, just millions of lines scrolling on the screen.
Best would be if I could send some signal to the program with "kill -SIGUSR1" or something, and at that moment the program would drop into a debugger and show me the line it stopped at and possibly allow me to step through the program then.
I've tried:
pdb /usr/local/bin/my_program.py
and then:
(Pdb) cont
but what do I do to see where I am when it hangs?
It doesn't throw an exception; it just seems to wait for something, possibly in an infinite loop.
One more detail: when the program hangs and I press ^C and then (not sure if that is necessary) the program continues normally, without throwing any exception and without giving me any hint on the screen as to why it stopped.
This could be useful to you. I usually do
>>> import pdb
>>> import program2debug
>>> pdb.run('program2debug.test()')
I usually add a -v option to my programs, which enables tons of print statements explaining what I'm doing in detail. When you write a program in the future, consider doing the same before it gets thousands of lines big.
You could try running it in debug mode in an IDE like PyDev (Eclipse) or PyCharm. You can break the program at any moment and get to its current execution point.
No program is ever too big to put print statements all over the place. You need to read up on the logging module and insert lots of logging.debug() statements. This is just a better form of print statement that outputs to a file, and can be turned off easily in production software. But years from now, when you need to modify the code, you can easily turn it all back on and get the benefit of the insight of the original programmer.
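The -v option and the logging.debug() approach from the last two answers combine naturally. A minimal sketch, assuming argparse for the command line; the logger name, the --log-file option and the configure/analyze helpers are hypothetical.

```python
import argparse
import logging

log = logging.getLogger("my_program")  # hypothetical logger name


def configure(argv=None) -> argparse.Namespace:
    """Parse -v/--log-file and set up logging accordingly."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="emit detailed debug messages")
    parser.add_argument("--log-file", default=None,
                        help="write the log to this file instead of stderr")
    args = parser.parse_args(argv)
    logging.basicConfig(
        filename=args.log_file,  # None means log to stderr
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s %(funcName)s:%(lineno)d %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return args


def analyze(item):
    # hypothetical worker: debug calls like this are cheap when -v is off
    log.debug("analyzing %r", item)
    return item
```

With -v, every log.debug() line is written with its function name and line number, so the last entry before a hang shows roughly where the program stopped; without -v those calls are skipped.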
