Stop a Python script without losing data

We have been running a script on our partner's computer for 18 hours. We underestimated how long it would take, and now need to turn in the results. Is it possible to stop the script from running, but still have access to all the lists it is building?
We need to add additional code to the script currently running that will use the lists being populated right now. Is there a way to stop the process, but still use (what has been generated of) the lists in the next portion of code?
My partner was using Python interactively.
Update
We were able to successfully print the results and copy-paste them after interrupting the program with Ctrl-C.

Well, OP doesn't seem to need an answer anymore, but I'll answer anyway for anyone else coming across this.
While it is true that stopping the program will discard all data from memory, you can still save it: inject a debug session and save whatever you need before you kill the process.
Both PyCharm and PyDev support attaching their debugger to a running python application.
See here for an explanation of how it works in PyCharm.
Once you've attached the debugger, you can set a breakpoint in your code, and the program will stop the next time it hits that line. Then you can inspect all variables and run some code via the 'Evaluate' feature. This code may save whatever variables you need.
I've tested this with PyCharm 2018.1.1 Community Edition and Python 3.6.4.
In order to do so I ran this code which I saved as test.py
import collections
import time

# Ring buffer: keep only the most recent 100 values.
data = collections.deque(maxlen=100)
i = 0
while True:
    data.append(i % 1000)
    i += 1
    time.sleep(0.001)
via the command python3 test.py from an external Windows PowerShell instance.
Then I opened that file in PyCharm and attached the debugger. I set a breakpoint at the line i += 1 and it halted right there. Then I evaluated the following code fragment:
import json

# Dump the deque's current contents to disk before killing the process.
with open('data.json', 'w') as ofile:
    json.dump(list(data), ofile)
And found all entries from data in the json file data.json.
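If the variables hold objects that json can't serialize, the same Evaluate trick works with pickle (a sketch; data again stands in for whatever you need to rescue):
import pickle

# pickle handles arbitrary Python objects, unlike json.
with open('data.pkl', 'wb') as ofile:
    pickle.dump(list(data), ofile)

# Later, in a fresh session:
with open('data.pkl', 'rb') as ifile:
    recovered = pickle.load(ifile)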
Follow-up:
This even works in an interactive session! I ran the very same code in a Jupyter notebook cell and then attached the debugger to the kernel. With test.py still open, I set the breakpoint again on the same line as before, and the kernel halted. Then I could see all variables from the interactive notebook session.

I don't think so. Stopping the program should also release all of the memory it was using.
Edit: see Swenzel's comment for one way of doing it.

Related

Access variables in Python console for debugging

The main Python code calls functions I have written, which for a better overview are stored in separate .py files. To improve my code, the program shall start from the beginning and stop at a defined position where I want to do some repair. After stopping, I want access to the local variables at the point where the code has stopped. This means I want to take some code that was executed before the halt and manipulate it in the console. After testing in the console, I want to correct my original code and run the program again.
Example:
Suppose the following line doesn't execute as you expect:
if a.find('xyz')==-1:
Therefore you stop the program just before:
breakpoint()
if a.find('xyz')==-1:
Now you want to find out why exactly the line is not working as expected. Maybe it depends on the variable a, or on the string xyz, or the find command is not applied correctly? So I would now enter a.find('xyz') in the console, then vary and adjust the command. After a few tests in the console I find out that the right command must be a.find('XYZ'). Now I correct the line in my original code and restart the program.
But this is not possible, because the halt command breakpoint() or pdb.set_trace() keeps me from using the console. Instead I end up in debug mode, where I can only step through the code line by line or display variables.
How can I debug my code as desired?
The following workarounds also do not help:
sys.exit()
I stop the code with sys.exit(). The main problem with this method is that I have no access to the variables if the code has stopped in another file. Also, I do not see where the program has stopped. If I have several sys.exit() instances distributed through a large codebase, I do not know at which instance it has stopped. I can define individual outputs with sys.exit('position1'), sys.exit('position2'), but I still have to manually find the file and scroll to the given position.
cells
I define cells with #%%. Then I can run these cells separately, but I cannot run all cells from the beginning until the end of a given cell.
Spyder 5.2.2
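An aside worth knowing here (my note, not part of the question): pdb itself offers a way out of plain step-by-step mode. At the (Pdb) prompt, the interact command starts a full interactive interpreter with the halted frame's variables in scope, which is essentially the workflow described above. A minimal sketch:
import pdb

a = 'some text with XYZ inside'

pdb.set_trace()  # at the (Pdb) prompt, type: interact
# In that interactive console you can experiment freely, e.g.:
#   >>> a.find('xyz')
#   -1
#   >>> a.find('XYZ')
#   15
# Press Ctrl-D to leave the console, then 'c' to continue the program.
if a.find('xyz') == -1:
    print('not found')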

Question about the 'non-linear' behaviour of the Python interpreter in Jupyter

I'm running the following code both remotely on a linux machine via ssh, and on the same linux machine as a Jupyter notebook accessed through a browser.
import cv2
import pdf2image

def minimalFun(pdf_filepath, make_me_suffer=False):
    print("Now I start.")
    images = pdf2image.convert_from_path(pdf_filepath)
    print("Pdf read.")
    if make_me_suffer:
        cv2.namedWindow('test', 0)
    print("I finished!")

minimalFun('Test.pdf', make_me_suffer=True)
I'm confused by the difference in behaviour between the Python interpreter in Jupyter and the one used on the command line.
In a Jupyter notebook
With the make_me_suffer = False setting the code will just print
Now I start.
Pdf read.
I finished!
meaning in particular that the function pdf2image.convert_from_path ran successfully. However, with make_me_suffer set to True, the code will print just
Now I start.
and then report that the kernel has died and will be restarting. In particular, the kernel died already during the call to pdf2image.convert_from_path.
On the command line
As expected, with the make_me_suffer = False setting the code will just print
Now I start.
Pdf read.
I finished!
but now when the flag is set to make_me_suffer = True, we get
Now I start.
Pdf read.
: cannot connect to X server
meaning that here the function pdf2image.convert_from_path again finished successfully.
The question:
Does the Jupyter interpreter 'look ahead' to see if there will be a command later on requiring an X window system, and alter the interpretation of the current code based on that information? If so, why? Is this common? Does it happen with functions loaded from other files? What is going on?
The reason why I'm asking is that this took me a lot of time to troubleshoot and pinpoint in a more complex function. This concerns me, as I have no idea how to avoid this in the future, other than having from now on a phobia of anything graphical.
Does the Jupyter interpreter 'look ahead' to see if there will be a command later on requiring an X window system, and alter the interpretation of the current code based on that information?
No, it does not.
As you know, you can run cells in any arbitrary order or modify them after you've run them once. This makes notebooks very brittle unless used properly.
You could, however, move your common code (e.g. stuff that initializes a window that you know you'll need) into a regular .py module in the notebook directory and import and use stuff from there.
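A defensive pattern following from the command-line error above ('cannot connect to X server'); this is my sketch, not part of the original answer: guard any GUI call on whether a display is actually available, so a headless kernel skips it instead of dying.
import os

import cv2

def can_use_gui():
    # On Linux, an X server is only reachable when DISPLAY is set.
    return bool(os.environ.get('DISPLAY'))

if can_use_gui():
    cv2.namedWindow('test', 0)
else:
    print('No display available; skipping window creation.')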

Debug a Python program which seems paused for no reason

I am writing a Python program to analyze log files. So basically I have about 30000 medium-size log files and my Python script is designed to perform some simple (line-by-line) analysis of each log file. Roughly it takes less than 5 seconds to process one file.
So once I set up the processing, I just left it there, and when I came back after about 14 hours, my Python script had simply paused right after analyzing one log file; it seems it never wrote the analysis output for that file to the file system, and that's it. No further progress.
I checked the memory usage and it seems fine (less than 1 GB). I also tried writing to the file system (touch test), and that works normally too. So my question is: how should I proceed to debug the issue? Could anyone share some thoughts? I hope this is not too general. Thanks.
You may use the trace module ('Trace or track Python statement execution') and/or pdb, the Python Debugger module.
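For example, the stdlib trace module can echo every line as it executes (a sketch; analyze_logs.py stands in for the actual script, and expect very verbose output plus a large slowdown):
python -m trace --trace analyze_logs.py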
Try this tool, https://github.com/khamidou/lptrace, with the command:
sudo python lptrace -p <process_id>
It will print every Python function your program invokes, and may help you understand where your program is stuck or looping forever.
If it does not output anything, your program has probably gotten stuck, so try
pstack <process_id>
to check the stack trace and find out where it is stuck. The output of pstack is C frames, but I believe you can still find something useful to solve your problem.
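If you can afford to restart the job, a preventative option (my suggestion, stdlib only, Unix-specific) is faulthandler: register a signal handler up front, and the next time the script hangs you can get a Python-level traceback of every thread without attaching any tool.
import faulthandler
import signal

# On 'kill -USR1 <pid>', dump the Python traceback of all threads to stderr.
# faulthandler.register() is only available on Unix.
faulthandler.register(signal.SIGUSR1)

# ... the rest of the log-processing script runs as usual ...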

How to time a specific function or code block in PyCharm

I'm using PyCharm and playing with its built-in profiler. I've keyed in on some areas where my code can be optimized, but I was wondering if there is a way to step through the code and see how long each line takes to execute as I step through, without having to rerun all my code in the profiler.
I think the closest you could get is to put a breakpoint, then open up the debugger, enter console mode, and execute the statement as:
started = time.time(); my_function(); print("Took %0.2fs" % (time.time() - started))
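Outside the debugger, the stdlib timeit module does the same job more reliably (a sketch; my_function is a stand-in for whatever you want to measure):
import timeit

def my_function():
    # Stand-in for the block you want to time.
    return sum(i * i for i in range(100000))

# Run the callable once and report wall-clock seconds.
elapsed = timeit.timeit(my_function, number=1)
print("Took %0.2fs" % elapsed)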

PyCharm debug mode throws fake errors but runs normally when not in debug

EDIT: I rolled back PyCharm versions and it's working again. CLEARLY an IDE issue, not a script issue now. PyCharm version 2017.2.4 is what I rolled back to.
So I have a script that's been working great for me, until today. For some reason, the script runs fine with no errors at all, as long as I don't use PyCharm (Community Edition 2017.3.3) in debugging mode. I need to use the debugger, so when it throws errors for no reason and stops the script, it becomes a pointless IDE.
The reason I know this is a PyCharm problem is because I copied the entire script into a different IDE (Wing), set to the same python interpreter, and went through it in debug mode there and it worked fine, no errors.
I have done extensive error testing to make sure the errors aren't actually there in my script; they're not. The script should work as written. It keeps saying datasets don't exist, or that input features for arcpy tools (ArcGIS is a spatial program that hooks into Python via a library called "arcpy") don't have values when they do. It's not a script problem, it's an IDE problem.
Has anybody encountered this and know how to fix it?
I do not have any specific environment settings; I just popped an ArcGIS Python interpreter in there for the project so I could have access to the arcpy library, and that's it. It should be noted that this interpreter is Python 2.7, because ArcGIS is not yet compatible with Python 3+. I doubt that has anything to do with it, but you never know...
This is a chunk of the script causing the issues (if you don't have/know how to use ArcGIS, don't bother trying to run it; it won't work for you). What I want to point out is that if I put a breakpoint at the qh_buffer line, it breaks after trying to run that line, with an arcpy error message that states invalid input/parameters (they are not invalid; the line is written exactly how it should be, and I have checked that qhPolys is being created and exists). THEN, if I move the breakpoint to the crop_intersect line and run it in debug, it runs through the entire code, INCLUDING the buffer statement, but then errors out with error 000732, "Input Features: Dataset #1; #2 does not exist or is not supported" (they both do exist, because I have hardcoded them to an output directory before and they were created just fine).
import arcpy

arcpy.env.overwriteOutput = True

svyPtFC = r"C:\Users\xxx\GIS_Testing\Crop_Test.shp"
select_query = '"FID" = 9'
qhPolys = arcpy.Select_analysis(svyPtFC, 'in_memory/qhPolys', select_query)
qh_buffer = arcpy.Buffer_analysis(qhPolys, 'in_memory/qh_buffer', '50 Meters')
cropFID = '"FID" = 1'
cropPoly = arcpy.Select_analysis(svyPtFC, 'in_memory/cropPoly', cropFID)
crop_intersect = arcpy.Intersect_analysis([[cropPoly, 1], [qh_buffer, 2]],
                                          r'C:\Users\xxx\GIS_Testing\crp_int.shp')
feature_count = arcpy.GetCount_management(crop_intersect)
print feature_count  # Python 2 print statement (ArcGIS ships Python 2.7)
It does not make sense that it can throw an error at the buffer line when I put a breakpoint near there, but that if I move the breakpoint further down, that same line runs fine and it just breaks at the next breakpoint... It does explain why it works when you just hit "Run" instead of using debug mode, though: no breakpoints!
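For what it's worth, the poster's own observation (intermediates written to a hardcoded output directory were created just fine) suggests a debugging-only workaround: write intermediates to disk instead of the in_memory workspace while stepping through. A sketch with hypothetical paths:
import arcpy

arcpy.env.overwriteOutput = True

# Hypothetical scratch folder; on-disk intermediates can be inspected in ArcGIS
# and may sidestep whatever the debugger does to the in_memory workspace.
scratch = r"C:\Users\xxx\GIS_Testing"
svyPtFC = scratch + r"\Crop_Test.shp"

qhPolys = arcpy.Select_analysis(svyPtFC, scratch + r"\qhPolys.shp", '"FID" = 9')
qh_buffer = arcpy.Buffer_analysis(qhPolys, scratch + r"\qh_buffer.shp", '50 Meters')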
