How to change execution time limit in Jupyter Notebook? - python

I have defined a Python function (in a .py file) that fits some scientific data iteratively, for a few dozen files. Now I am trying to import this function in a Jupyter notebook and use it as part of another script that processes the obtained data. It is basically something like:
from python_file import defined_function
filename = 'name of the file'
results = defined_function(filename)
This script naturally takes a few minutes to finish on my machine. However, before it finishes I get an error message related to the time limit:
RuntimeError: Execution exceeded time limit, max runtime is 30s
How do I change this time limit in my notebook? If it helps, I'm using IPython version 6.1.0.
Thanks

Overriding NotebookApp.iopub_data_rate_limit = 10000000 in jupyter_notebook_config.py will do the trick. Please note that the file jupyter_notebook_config.py does not exist until you first run jupyter notebook --generate-config (for Linux users); only then can you proceed with this fix.
If overriding this in the config file doesn't work for you (you get the same error regardless of what you set NotebookApp.iopub_data_rate_limit to), the config file is probably not in the correct place. Make sure the NotebookApp.iopub_data_rate_limit line is in ~/.jupyter/jupyter_notebook_config.py.
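For reference, a minimal sketch of what that line looks like inside the generated config file (the configuration object c is provided by Jupyter when it loads this file; the value is the one suggested above):
# ~/.jupyter/jupyter_notebook_config.py
# Generate the file first with:  jupyter notebook --generate-config
c.NotebookApp.iopub_data_rate_limit = 10000000
Restart the notebook server after editing the file so the new limit is picked up.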

Related

Python memory leak in Google Colab when interrupting the run

I'm running the following code in Google Colab (and in a Kaggle Notebook).
When running it without pdb.set_trace(), everything works fine.
However, when using pdb.set_trace() and calling "continue/exit", it seems that the array is still stored in memory (memory consumption remains high, by the same size as the array).
from pdb import set_trace # also tried ipdb, IPython.core.debugger
def ccc():
    aaa = list(range(50000000))
    set_trace()

ccc()
Any ideas?
Thanks in advance.
EDIT
This also occurs when stopping the code execution manually (i.e., KeyboardInterrupt).
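One way to check whether the list really is still referenced after the debugger returns is the rough diagnostic sketch below (psutil is assumed to be installed, as it is on Colab; the numbers are only approximate):
# Rough memory check around the debugger session (psutil assumed available).
import gc
import os
import psutil
from pdb import set_trace

proc = psutil.Process(os.getpid())

def rss_mb():
    return proc.memory_info().rss / 1e6

print("RSS before:", rss_mb(), "MB")

def ccc():
    aaa = list(range(50000000))
    set_trace()  # type "continue" at the (Pdb) prompt

ccc()
gc.collect()
print("RSS after :", rss_mb(), "MB")  # stays high if something still holds the list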

Spyder - combination Mpi4Py, SpyderConsole and retrieving variables

OK, a bit of background first: I'm working intensely on a Python implementation of Grammatical Evolution. So far I've managed to solve a lot of deployment issues and optimisations regarding exec, strings and class methods.
Currently I'm working on multi-threading and after getting a large amount of advice from different sources I decided to use Mpi4py since it is already featured in the PyOpus library that I'm using as part of my framework.
Running the example directly from the Windows command line works like a charm, but I've created a hypothetical situation in my head that I would like to solve (or discover that it is too much of a hassle).
The issue:
I'm running this piece of code from a separate file, as suggested by the Spyder community:
from IPython import get_ipython
ip = get_ipython()
ip.run_cell("!mpiexec -n 4 python myfile.py")
The original file contents are simple:
from pyopus.parallel.cooperative import cOS
from pyopus.parallel.mpi import MPI
from funclib import jobProcessor
if __name__=='__main__':
    # Set up MPI
    cOS.setVM(MPI())
    # This generator produces 100 jobs which are tuples of the form
    # (function, args)
    jobGen=((jobProcessor, [value]) for value in range(100))
    # Dispatch jobs and collect results
    results=cOS.dispatch(jobList=jobGen, remote=True)
    # Results are put in the list in the same order as the jobs are generated by jobGen
    print("Results: "+str(results))
    # Finish, need to do this if MPI is used
    cOS.finalize()
Now the question: how do I access the results variable? Running the file leaves me with the ip object and no idea how to access the variables stored within it (or even whether results exists within it at all).
Thank you!
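One workaround (not from the original post, just a sketch): since the mpiexec run is a separate set of processes, have myfile.py persist results to disk and load it back after run_cell returns. The file name results.pkl below is arbitrary.
# In myfile.py, right after results is computed (hypothetical addition):
#     import pickle
#     with open('results.pkl', 'wb') as f:
#         pickle.dump(results, f)

# In the Spyder/IPython session:
import pickle
from IPython import get_ipython

ip = get_ipython()
ip.run_cell("!mpiexec -n 4 python myfile.py")  # blocks until the command finishes

with open('results.pkl', 'rb') as f:
    results = pickle.load(f)
print(results[:5])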

Question about the 'non-linear' behaviour of the Python interpreter in Jupyter

I'm running the following code both remotely on a Linux machine via ssh, and on the same Linux machine in a Jupyter notebook accessed through a browser.
import cv2
import pdf2image
def minimalFun(pdf_filepath, make_me_suffer = False):
    print("Now I start.")
    images = pdf2image.convert_from_path(pdf_filepath)
    print("Pdf read.")
    if make_me_suffer:
        cv2.namedWindow('test',0)
    print("I finished!")

minimalFun('Test.pdf', make_me_suffer = True)
I'm confused by the difference in behaviour of the Python interpreter in Jupyter versus on the command line.
In a Jupyter notebook
With the make_me_suffer = False setting the code will just print
Now I start.
Pdf read.
I finished!
meaning in particular that the function pdf2image.convert_from_path ran successfully. However, with make_me_suffer set to True, the code will print just
Now I start.
and then report that the kernel has died and will be restarting. In particular, the kernel already died during the call to pdf2image.convert_from_path.
On the command line
As expected, with the make_me_suffer = False setting the code will just print
Now I start.
Pdf read.
I finished!
but now when the flag is set to make_me_suffer = True, we get
Now I start.
Pdf read.
: cannot connect to X server
meaning that here the function pdf2image.convert_from_path again finished successfully.
The question:
Does the Jupyter interpreter 'look ahead' to see whether there will be a command later on requiring an X window system, altering the interpretation of the current code based on that information? If so, why? Is this common? Does it happen with functions loaded from other files? What is going on?
The reason I'm asking is that this took me a lot of time to troubleshoot and pinpoint in a more complex function. It concerns me because I have no idea how to avoid this in the future, other than developing a phobia of anything graphical.
Does the Jupyter interpreter 'look ahead' to see whether there will be a command later on requiring an X window system, altering the interpretation of the current code based on that information?
No, it does not.
As you know, you can run cells in any arbitrary order or modify them after you've run them once. This makes notebooks very brittle unless used properly.
You could, however, move your common code (e.g. stuff that initializes a window that you know you'll need) into a regular .py module in the notebook directory and import and use stuff from there.
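If the crash ultimately comes from cv2.namedWindow needing an X display (as the 'cannot connect to X server' message from the command-line run suggests), another option is sketched below: guard the GUI call on the DISPLAY environment variable so the same function runs both in the notebook and in a desktop session.
# Defensive sketch (assumption: the kernel crash is ultimately caused by
# cv2.namedWindow needing an X display, as the command-line run suggests).
import os
import cv2
import pdf2image

def minimalFun(pdf_filepath, make_me_suffer=False):
    print("Now I start.")
    images = pdf2image.convert_from_path(pdf_filepath)
    print("Pdf read.")
    if make_me_suffer:
        if os.environ.get("DISPLAY"):  # only touch the GUI when a display exists
            cv2.namedWindow('test', 0)
        else:
            print("No X display available, skipping cv2.namedWindow.")
    print("I finished!")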

Stop a python script without losing data

We have been running a script on my partner's computer for 18 hours. We underestimated how long it would take, and now need to turn in the results. Is it possible to stop the script from running, but still have access to all the lists we are building?
We need to add additional code to the one we are currently running that will use the lists being populated right now. Is there a way to stop the process, but still use (what has been generated of) the lists in the next portion of code?
My partner was using Python interactively.
update
We were able to successfully print the results and copy-paste them after interrupting the program with Ctrl-C.
Well, the OP doesn't seem to need an answer anymore. But I'll answer anyway for anyone else coming across this.
While it is true that stopping the program will delete all data from memory, you can still save it: inject a debug session and save whatever you need before you kill the process.
Both PyCharm and PyDev support attaching their debugger to a running Python application.
See here for an explanation how it works in PyCharm.
Once you've attached the debugger, you can set a breakpoint in your code and the program will stop when it hits that line the next time. Then you can inspect all variables and run some code via the 'Evaluate' feature. This code may save whatever variable you need.
I've tested this with PyCharm 2018.1.1 Community Edition and Python 3.6.4.
In order to do so I ran this code which I saved as test.py
import collections
import time
data = collections.deque(maxlen=100)
i = 0
while True:
    data.append(i % 1000)
    i += 1
    time.sleep(0.001)
via the command python3 test.py from an external Windows PowerShell instance.
Then I opened that file in PyCharm and attached the debugger. I set a breakpoint at the line i += 1 and it halted right there. Then I evaluated the following code fragment:
import json
with open('data.json', 'w') as ofile:
    json.dump(list(data), ofile)
I then found all entries from data in the JSON file data.json.
Follow-up:
This even works in an interactive session! I ran the very same code in a Jupyter notebook cell and then attached the debugger to the kernel. Still having test.py open, I set the breakpoint again on the same line as before and the kernel halted. Then I could see all variables from the interactive notebook session.
I don't think so. Stopping the program should also release all of the memory it was using.
edit: See Swenzel's comment for one way of doing it.
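For anyone planning a similar long run in the future, here is a preventive sketch (the loop and file name are hypothetical): wrap the work so that a Ctrl-C saves the partial lists before the process exits.
# Preventive sketch for future long runs: catch KeyboardInterrupt so the
# partial results survive a Ctrl-C instead of being lost with the process.
import pickle
import time

results = []                           # the list being built up over hours

try:
    for item in range(10**9):          # stand-in for the real long-running work
        results.append(item * item)
        time.sleep(0.001)
except KeyboardInterrupt:
    pass                               # fall through and save what we have so far

with open('partial_results.pkl', 'wb') as f:
    pickle.dump(results, f)
print("Saved", len(results), "entries to partial_results.pkl")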

python3 unotools connection error failed to connect

I have searched for an answer, but nothing has helped so far. I have a method that I want to use to create an .odt file and fill it with text. I also want the user to view the file when it is created. I am using Python 3.4.3, unotools 0.3.3, Linux Mint 17.1, and LibreOffice 4.2.8.2.
The issue:
unotools.errors.ConnectionError: failed to connect: ('socket,host=localhost,port=8100', {})
The unotools sample worked fine from the terminal: it created and saved a sample.odt without errors. My draft code:
def writer_report(self):
    subprocess.Popen(["soffice", "--accept='socket,host=localhost,port=8100;urp;StarOffice.Service'"])
    time.sleep(5)  # using this to give time for LibreOffice to open - temporary
    context = connect(Socket('localhost', '8100'))
    writer = Writer(context)
    writer.set_string_to_end('world\n')
    writer.set_string_to_start('hello\n')
    writer.store_to_url('output.odt','FilterName','writer8')
    writer.close(True)
The LibreOffice application opens and remains open. However, the connection seems to be lost. I hope someone can give me assistance, thank you.
I do not recommend code like this:
subprocess.Popen(...)
time.sleep(...)
It is better to use a shell script to start soffice and then call the Python script.
However if you are determined to run soffice in a subprocess, then I recommend increasing the sleep time to at least 15 seconds.
See https://forum.openoffice.org/en/forum/viewtopic.php?t=1014.
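If you do keep the subprocess approach, a further sketch is below (assuming the connect and Socket helpers come from unotools, as the question's code suggests): poll the port until LibreOffice is ready instead of sleeping a fixed number of seconds. Note also that the single quotes inside the --accept argument of the original Popen call are passed literally to soffice (no shell is involved), which may itself prevent the listener from starting, so they are omitted here.
# Sketch: start soffice and retry the UNO connection instead of a fixed sleep.
import subprocess
import time
from unotools import Socket, connect  # assumed import path, matching the question's usage

def start_office_and_connect(port=8100, timeout=30.0, interval=1.0):
    subprocess.Popen([
        "soffice",
        "--accept=socket,host=localhost,port=%d;urp;StarOffice.Service" % port,
    ])
    deadline = time.time() + timeout
    while True:
        try:
            return connect(Socket('localhost', str(port)))  # returns the UNO context
        except Exception:
            if time.time() >= deadline:
                raise
            time.sleep(interval)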
Thanks for the advice. I did want to run this as a subprocess. I tried extending the time but still no joy. I am now looking at the Python odfpy 1.3.3 package, which, after using it for a day or two, I am already having more success with.
