I'm running the following code both remotely on a linux machine via ssh, and on the same linux machine as a Jupyter notebook accessed through a browser.
import cv2
import pdf2image

def minimalFun(pdf_filepath, make_me_suffer=False):
    print("Now I start.")
    images = pdf2image.convert_from_path(pdf_filepath)
    print("Pdf read.")
    if make_me_suffer:
        cv2.namedWindow('test', 0)
    print("I finished!")

minimalFun('Test.pdf', make_me_suffer=True)
I'm confused by the difference in behaviour of the Python interpreter in Jupyter versus on the command line.
In a Jupyter notebook
With the make_me_suffer = False setting the code will just print
Now I start.
Pdf read.
I finished!
meaning in particular that the function pdf2image.convert_from_path ran successfully. However, with make_me_suffer set to True, the code will print just
Now I start.
and then report that the kernel has died and will be restarting. In particular, the kernel died already during the call to pdf2image.convert_from_path.
On the command line
As expected, with the make_me_suffer = False setting the code will just print
Now I start.
Pdf read.
I finished!
but now when the flag is set to make_me_suffer = True, we get
Now I start.
Pdf read.
: cannot connect to X server
meaning that here the function pdf2image.convert_from_path again finished successfully.
The question:
Does the Jupyter interpreter 'look ahead' to see if there will be a command later on requiring an X-windowing system, and alter the interpretation of the current code based on that information? If so, why? Is this common? Does it happen with functions loaded from other files? What is going on?
The reason I'm asking is that this took me a lot of time to troubleshoot and pinpoint in a more complex function. It concerns me because I have no idea how to avoid this in the future, other than developing a phobia of anything graphical from now on.
Does the Jupyter interpreter 'look ahead' to see if there will be a command later on requiring an X-windowing system, and alter the interpretation of the current code based on that information?
No, it does not.
As you know, you can run cells in any arbitrary order or modify them after you've run them once. This makes notebooks very brittle unless used properly.
You could, however, move your common code (e.g. the stuff that initializes a window you know you'll need) into a regular .py module in the notebook directory and import and use it from there.
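For instance, a minimal sketch of such a module (the file name, function name, and the DISPLAY check are purely illustrative, not anything Jupyter or OpenCV requires):

# gui_helpers.py - a hypothetical module kept next to the notebook
import os
import cv2

def open_preview_window(name='test'):
    # cv2.namedWindow needs a display; guard against headless sessions
    # instead of letting the whole process (and the Jupyter kernel) go down.
    if not os.environ.get('DISPLAY'):
        raise RuntimeError('No X server available, refusing to open a window.')
    cv2.namedWindow(name, 0)

In the notebook you would then do from gui_helpers import open_preview_window and call it only where you actually need the window.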
Related
How do I automatically execute a python script after Blender has fully loaded?
Context
My script generates a scene based on a seed. I want to create a couple of thousand images, but since Blender leaks memory, after a hundred generations or so everything becomes significantly slower and eventually crashes. I want to mitigate the problem by creating only x images per session and completely restarting Blender after each session.
Problem
If I load the blend file manually and click the play button in the script editor, everything works as expected. When I try to call the script after startup, it crashes in add_curve_spirals.py line 184, since context.space_data is None.
Since manually starting the script works fine, the problem is that Blender is in some sort of wrong state. Starting it with or without the GUI (--background) does not affect this.
Failed solutions
blender myfile.blend --python myscript.py executes the script before the context is fully ready and thus produces the error.
Using a handler to delay execution (bpy.app.handlers.load_post) calls my script after the file has been completely loaded, but the context is still not ready and it produces the error.
Setting the script in Blender to auto execute on startup (Text/Register) also produces the error.
Using sockets, as suggested here, to send commands to Blender at a later time. The server script that waits for incoming commands blocks Blender during startup and prevents it from fully loading, hence the effect is the same as executing the script directly.
Using timed events (bpy.app.timers.register(render_fun, first_interval=10)).
These are all the ways that I found to automatically execute a script. In every case the script seems to be executed too early / in the wrong state and all fail in the same way.
I want to stress that the script is not the issue here. Even if I could work around the particular line, many similar problems might follow and I don't want to rewrite my whole script. So what is the best way to automatically invoke it in the right state?
It turns out that the problem was the execution context. This became clear after invoking the timed event manually: even after the scene was loaded completely, the timed event was still executed in the wrong context.
Since the crash happened in the add_curve_spirals addon, the solution was to provide a context override to the operator invocation. The rest of my script was not equally sensitive to the context and worked just fine.
It was not clear to me how exactly I should override the context, but this works for now (collected from other parts of the internet, so I don't understand all the details):
import bpy

def get_context():
    # Create a context that works when Blender is executed from the command line.
    idx = bpy.context.window_manager.windows[:].index(bpy.context.window)
    window = bpy.context.window_manager.windows[idx]
    screen = window.screen
    views_3d = sorted(
        [a for a in screen.areas if a.type == 'VIEW_3D'],
        key=lambda a: (a.width * a.height))
    a = views_3d[0]
    # override
    o = {"window": window,
         "screen": screen,
         "area": a,
         "space_data": a.spaces.active,
         "region": a.regions[-1]
         }
    return o
Final invocation: bpy.ops.curve.spirals(get_context(), spiral_type='ARCH', radius = radius, turns = turns, dif_z = dif_z, ...
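For completeness, a rough sketch of how this could be combined with the timer approach from the question, using the get_context() helper above (render_fun and the concrete operator parameters are placeholders, not taken from the original script):

import bpy

def render_fun():
    # By the time the timer fires, the UI exists, so the override can be
    # built and passed positionally to the context-sensitive operator.
    ctx = get_context()
    bpy.ops.curve.spirals(ctx, spiral_type='ARCH', radius=1.0, turns=5, dif_z=0.1)
    # ... rest of the scene generation / rendering ...
    return None  # returning None stops the timer

bpy.app.timers.register(render_fun, first_interval=10)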
I have defined a Python function (as a .py file) that fits some scientific data, in an iterative way, for a few dozen files. Now I am trying to import this function in a Jupyter notebook, to use it as part of another script that processes the obtained data. It is basically something like:
from python_file import defined_function
filename = 'name of the file'
results = defined_function(filename)
This script would naturally take a few minutes to finish on my machine. However, before it finishes I get an error message related to the time limit:
RuntimeError: Execution exceeded time limit, max runtime is 30s
How do I change this time limit in my notebook? If it helps, I'm using IPython version 6.1.0.
Thanks
Overriding NotebookApp.iopub_data_rate_limit = 10000000 in jupyter_notebook_config.py does the trick. Please note that before you can even see a file named jupyter_notebook_config.py and proceed with this fix, you must first run jupyter notebook --generate-config (for Linux users).
If overriding this in the config file doesn't work for you, i.e. you get the same error regardless of what you set NotebookApp.iopub_data_rate_limit to, the config file is probably not in the correct place. Make sure you are editing ~/.jupyter/jupyter_notebook_config.py.
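For reference, after running jupyter notebook --generate-config, the relevant part of ~/.jupyter/jupyter_notebook_config.py could look roughly like this (the exact limit value is up to you):

# ~/.jupyter/jupyter_notebook_config.py
c = get_config()  # 'c' is provided by Jupyter when it loads this file
# Raise the IOPub data rate limit (the shipped default is much lower).
c.NotebookApp.iopub_data_rate_limit = 10000000

Restart the notebook server afterwards so the new limit takes effect.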
We have been running a script on my partner's computer for 18 hours. We underestimated how long it would take, and now need to turn in the results. Is it possible to stop the script from running, but still have access to all the lists we are building?
We need to add additional code to the one we are currently running that will use the lists being populated right now. Is there a way to stop the process, but still use (what has been generated of) the lists in the next portion of code?
My partner was using python interactively.
Update
We were able to successfully print the results and copy and paste after interrupting the program with control-C.
Well, OP doesn't seem to need an answer anymore. But I'll answer anyway for anyone else coming across this.
While it is true that stopping the program will delete all data from memory, you can still save it beforehand. You can inject a debug session and save whatever you need before you kill the process.
Both PyCharm and PyDev support attaching their debugger to a running python application.
See here for an explanation how it works in PyCharm.
Once you've attached the debugger, you can set a breakpoint in your code and the program will stop when it hits that line the next time. Then you can inspect all variables and run some code via the 'Evaluate' feature. This code may save whatever variable you need.
I've tested this with PyCharm 2018.1.1 Community Edition and Python 3.6.4.
In order to do so I ran this code which I saved as test.py
import collections
import time

data = collections.deque(maxlen=100)
i = 0
while True:
    data.append(i % 1000)
    i += 1
    time.sleep(0.001)
via the command python3 test.py from an external Windows PowerShell instance.
Then I opened that file in PyCharm and attached the debugger. I set a breakpoint at the line i += 1 and it halted right there. Then I evaluated the following code fragment:
import json
with open('data.json', 'w') as ofile:
    json.dump(list(data), ofile)
And found all entries from data in the json file data.json.
Follow-up:
This even works in an interactive session! I ran the very same code in a jupyter notebook cell and then attached the debugger to the kernel. Still having test.py open, I set the breakpoint again on the same line as before and the kernel halted. Then I could see all variables from the interactive notebook session.
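If it's not obvious which process the notebook kernel is, one simple way (my own addition, not something PyCharm requires) is to ask the kernel itself from a cell:

import os

# Run this in a notebook cell; the printed PID is the kernel process
# to select when attaching the PyCharm debugger.
print(os.getpid())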
I don't think so. Stopping the program should also release all of the memory it was using.
edit: See Swenzel's comment for one way of doing it.
EDIT: I rolled back PyCharm versions and it's working again. CLEARLY an IDE issue, not a script issue now. PyCharm version 2017.2.4 is what I rolled back to.
So I have a script that's been working great for me, until today. For some reason, the script will run fine with no errors at all, as long as I don't use PyCharm (Community Edition 2017.3.3) in debugging mode. I need to use debugger, so when it throws errors for no reason and stops the script, it makes it a pointless IDE.
The reason I know this is a PyCharm problem is because I copied the entire script into a different IDE (Wing), set to the same python interpreter, and went through it in debug mode there and it worked fine, no errors.
I have done extensive error testing to make sure the errors aren't actually there in my script; they're not. The script should work as written. It keeps saying datasets don't exist or input features for arcpy tools (a spatial program that hooks into python via a library called "arcpy") don't have values when they do. It's not a script problem, it's an IDE problem.
Has anybody encountered this and know how to fix it?
I do not have any specific environment settings, I just popped an ArcGIS python interpreter in there for the project so I could have access to the arcpy library and that's it. It should be noted that this interpreter is python 2.7 because ArcGIS is not yet compatible with python 3+. I doubt that has anything to do with it, but you never know...
This is a chunk of script causing the issues (if you don't have/know how to use ArcGIS don't bother trying to run it, it won't work for you). What I want to point out is that if I put a breakpoint at the qh_buffer line, it will break after trying to run that line with an arcpy error message that states invalid input/parameters (they are not invalid, it's written exactly how it should be and I have checked that qhPolys is being created and exists). THEN, if I move the breakpoint to the crop_intersect line and run it in debug, it runs through the entire code, INCLUDING the buffer statement, but then errors out with error 000732 "Input Features: Dataset #1; #2 does not exist or is not supported" (they both do exist, because I have hardcoded them to an output directory before and they are created just fine).
import arcpy

arcpy.env.overwriteOutput = True

svyPtFC = r"C:\Users\xxx\GIS_Testing\Crop_Test.shp"
select_query = '"FID" = 9'
qhPolys = arcpy.Select_analysis(svyPtFC, 'in_memory/qhPolys', select_query)
qh_buffer = arcpy.Buffer_analysis(qhPolys, 'in_memory/qh_buffer', '50 Meters')
cropFID = '"FID" = 1'
cropPoly = arcpy.Select_analysis(svyPtFC, 'in_memory/cropPoly', cropFID)
crop_intersect = arcpy.Intersect_analysis([[cropPoly, 1], [qh_buffer, 2]],
                                          r'C:\Users\xxx\GIS_Testing\crp_int.shp')
feature_count = arcpy.GetCount_management(crop_intersect)
print feature_count
It does not make sense that it can cause an error at the buffer line if I put a breakpoint near there, but that if I move the breakpoint further down, that line will run fine and it'll just break at the next breakpoint... It does explain why it works when you just hit "Run" instead of using debug mode, though: no breakpoints!
This happens especially when I run code that takes a while (roughly 10 minutes) and then hit a breakpoint.
The Python debugger always shows me this kind of error: "timeout waiting for response on 113".
I have circled the errors in red in the screenshot.
I use PyCharm as my Python IDE; is this just an issue with the PyCharm IDE, or with the Python debugger itself?
And if PyCharm is not recommended, can anyone suggest a better IDE that is able to debug efficiently?
I had a similar thing happen to me a few months ago; it turned out I had a really slow operation within a __repr__() for a variable I had on the stack. When PyCharm hits a breakpoint it grabs all of the variables in the current scope and calls __repr__() on them. Here's an amusement that demonstrates this issue:
import time

class Foo(object):
    def __repr__(self):
        time.sleep(100)
        return "look at me"

if __name__ == '__main__':
    a = Foo()
    print "set your breakpoint here"
PyCharm will also call __getattribute__('__class__'). If you have a __getattribute__ that's misbehaving that could trip you up as well.
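A similarly contrived sketch (the class name is made up) of a misbehaving __getattribute__ that would stall the debugger in the same way:

import time

class Bar(object):
    def __getattribute__(self, name):
        # Every attribute lookup, including the debugger's own
        # __getattribute__('__class__') probe, now blocks for a long time.
        time.sleep(100)
        return object.__getattribute__(self, name)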
This may not be what's happening to you but perhaps worth considering.
As you are on Windows, for debugging such things (and most things) I use the good old PythonWin IDE:
This IDE + debugger runs in the same process as the code being debugged!
Being in direct touch with the real objects this way, like pdb in a simple interactive shell but with a usable GUI, is a big advantage most of the time. And this way there are no issues of transferring vast objects between processes via repr/pickle, no delays, and no timeouts.
If a step takes a long time, PythonWin will also simply wait and not respond until it finishes (unless one issues a break signal/KeyboardInterrupt via the PythonWin system tray icon).
And the interactive shell of PythonWin is also fully usable during debugging, with the namespace of the current frame.
It's an old question, but this reply may be helpful.
Delete the .idea folder from the project root directory. It will clean up PyCharm's database and the debugger will stop timing out. It works for me on Windows.