I have been using Papermill to execute my Python notebook periodically. To execute a compute-intensive notebook, I need to connect to a remote kernel running in my EMR cluster.
With Jupyter Notebook I can do that by starting the Jupyter server with jupyter notebook --gateway-url=http://my-gateway-server:8888, and my code then executes on the remote kernel. But how do I let my local Python code (run through Papermill) use a remote kernel? What changes do I need to make in the Kernel Manager to connect to a remote kernel?
One related SO answer I could find is here. It suggests port forwarding to the remote server and initializing KernelManager with the connection file from the server. I am not able to do this because blockingkernelmanager is no longer in IPython.zmq, and I would also prefer an HTTP connection, like the one Jupyter uses.
Hacky approach: set up a shell script to do the following:
1. Create a Python environment on your EMR master node using the hadoop user.
2. Install sparkmagic in your environment and configure all kernels as described in the README.md file for sparkmagic.
3. Copy your notebook to the master node, or use it directly from its S3 location.
4. Run it with papermill:
papermill s3://path/to/notebook/input.ipynb s3://path/to/notebook/output.ipynb -p param 1
Steps 1 and 2 are one-time requirements if your cluster master node is the same every time.
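If you would rather drive the papermill step from Python than from a shell script, a minimal sketch could look like the following. The helper names build_papermill_command and run_notebook are made up for illustration, and the sketch assumes the papermill executable is on the PATH of the environment created above. Note that papermill's -p flag takes the parameter name and its value as two separate arguments.

```python
import shlex
import subprocess


def build_papermill_command(input_path, output_path, parameters=None):
    """Return the papermill CLI call as an argument list (hypothetical helper)."""
    command = ["papermill", input_path, output_path]
    for name, value in (parameters or {}).items():
        # -p expects the parameter name and the value as two separate arguments
        command += ["-p", name, str(value)]
    return command


def run_notebook(input_path, output_path, parameters=None):
    """Run papermill; raises CalledProcessError if the notebook run fails."""
    command = build_papermill_command(input_path, output_path, parameters)
    subprocess.run(command, check=True)


# Only print the command here; call run_notebook(...) to actually execute it.
command = build_papermill_command(
    "s3://path/to/notebook/input.ipynb",
    "s3://path/to/notebook/output.ipynb",
    {"param": 1},
)
print(shlex.join(command))
```

Keeping the command as a list avoids shell quoting problems when a parameter value contains spaces.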
A slightly better approach:
1. Set up a remote kernel in your Jupyter itself: REMOTE KERNEL
2. Execute with papermill as a normal notebook by selecting this remote kernel.
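Selecting the kernel from papermill comes down to passing its name with the -k option. A rough sketch of how you might look up the registered kernel names and build the call is below. It assumes the jupyter command is installed locally; the kernel name remote_emr in the sample data is purely hypothetical, so substitute whatever name your remote kernel was registered under.

```python
import json
import subprocess


def kernel_names(kernelspec_json):
    """Extract kernel names from the output of `jupyter kernelspec list --json`."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return sorted(specs)


def installed_kernel_names():
    """Ask the local Jupyter installation which kernels it knows about."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return kernel_names(result.stdout)


def papermill_command(input_path, output_path, kernel):
    """Build a papermill call that runs the notebook on the named kernel."""
    return ["papermill", input_path, output_path, "-k", kernel]


# Hypothetical sample of what the kernelspec listing might contain.
sample = '{"kernelspecs": {"python3": {}, "remote_emr": {}}}'
print(kernel_names(sample))
print(papermill_command("input.ipynb", "output.ipynb", "remote_emr"))
```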
I am using both approaches for different use cases and they seem to work fine for now.
I am running a Jupyter Notebook on my laptop. Is it possible to run one/two cells of the script on a remote server that I have access to?
The remote server is more powerful, however I have been allocated a limited amount of storage on the server so the bulk of work has to happen on my device.
I have used os.system to run a Python script on the server from Jupyter on my laptop, but it seems inefficient.
Both devices are running Ubuntu.
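For reference, the os.system approach mentioned above can be written a little more cleanly with subprocess, which also returns the remote output to the notebook cell. This is only a sketch under assumptions: key-based SSH login is already set up, python3 exists on the server, and the helper names and the host string are hypothetical.

```python
import subprocess


def build_ssh_python_command(host, python="python3"):
    """Build the ssh call; "-" makes the remote interpreter read code from stdin."""
    return ["ssh", host, python, "-"]


def run_remote(code, host, timeout=60):
    """Send `code` to the remote interpreter and return what it printed."""
    result = subprocess.run(
        build_ssh_python_command(host),
        input=code,            # the snippet travels over stdin, nothing is copied to disk
        capture_output=True,
        text=True,
        check=True,            # raise if ssh or the remote script fails
        timeout=timeout,
    )
    return result.stdout


# Example call (not executed here): run_remote("print(2 ** 10)", "user@server")
```

Local data still has to be shipped to the server somehow (for example serialized into the snippet), so this only suits small inputs.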
I want to run local code using local data on a remote server and get the execution results back in my Jupyter notebook cells.
I do not mean the usual scheme of running the Jupyter notebook remotely and connecting to it via SSH tunneling, but something more sophisticated: a custom remote kernel that I can choose from the kernel list, so that local code runs on the remote server seamlessly.
Some packages (like this one: https://pypi.org/project/remote-kernel) mention that it is possible, but they look dated and come with limited usage instructions.
Does anyone know how to implement this? If so, please be as detailed as possible. Thanks!
Suppose I have a Google Colab Notebook in an address like below:
https://colab.research.google.com/drive/XYZ
I want to keep it running for 12 hours, but I also want to turn my computer off. As a solution, I can connect to our lab's server via ssh. The server is running all the time. Is it possible to load and run the notebook there?
I found a solution to connect to a Google Colab session via ssh (the colab_ssh package), but that again needs a running Colab session.
I also tried to browse the link with lynx, but it requires a login, which this browser does not support.
Yes, it is possible. You would first need to download your colab notebook as an .ipynb file, then copy it to your server. Then, you can follow one of the guides on how to connect to a remotely running jupyter notebook session, like this one. All you need is the jupyter notebook software on your server, and an ssh client on your local computer.
Edit: I forgot to mention this: to keep your session alive even after closing the ssh connection, you can use tools like screen. The link provides a more detailed explanation, but the general idea is that after connecting to your server, you first create a session like this:
screen -S <session_name>
which will create a new session and attach you to it (which is the term used when you are inside a session). Then, you can fire up your jupyter notebook here, and it will keep running even after closing the ssh connection. (You just have to make sure you don't kill the screen session using Ctrl+a followed by k)
Now you have an indefinitely running Jupyter notebook session on your server. You can connect to it via
ssh -N -f -L localhost:YYYY:localhost:XXXX remoteuser@remotehost
as mentioned in the first linked guide. Then use the browser to run a code cell in your Jupyter notebook, and turn off your laptop without worrying about interrupting your notebook session.
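If you open the tunnel often, the ssh call can be assembled from Python. The sketch below only builds the command; build_tunnel_command is a hypothetical helper, and the port numbers 8889 and 8888 stand in for YYYY and XXXX and are assumptions, not values from the guide.

```python
import shlex
import subprocess


def build_tunnel_command(local_port, remote_port, user, host):
    """Build an ssh call that forwards a local port to the notebook port on the server."""
    forward = f"localhost:{local_port}:localhost:{remote_port}"
    # -N: no remote command, -f: go to background, -L: local port forwarding
    return ["ssh", "-N", "-f", "-L", forward, f"{user}@{host}"]


def open_tunnel(local_port, remote_port, user, host):
    """Start the tunnel; raises CalledProcessError if ssh cannot connect."""
    subprocess.run(build_tunnel_command(local_port, remote_port, user, host), check=True)


# Print the command for inspection; call open_tunnel(...) to actually start it.
print(shlex.join(build_tunnel_command(8889, 8888, "remoteuser", "remotehost")))
```

With the tunnel up, the notebook would be reachable in the local browser at localhost followed by the chosen local port.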
I set up a new Ubuntu instance on AWS EC2. I SSH to the instance using a private key pair. I installed python, jupyter, pyspark and all the necessary modules. I then start a Jupyter notebook using tmux.
My main aim is simply to run pyspark on an AWS instance (using Jupyter). Unfortunately, I keep running into problems with stability of the Jupyter notebook/connection to the instance. After running the Jupyter notebook for some time (sometimes 5 minutes, other times 2 hours+), it ends up "disconnecting". The kernel in the Jupyter disconnects and then does not process any further calls. At that point, I cannot SSH into the instance (just hangs -> blank screen).
I tried running the same setup on GCP but ran into the same symptoms.
Is there something basic that I am missing?
Why would I not be able to SSH into the instance?
Is it possible that the Ubuntu server is crashing?
I've been trying to run some simple code in IPython notebook, but I keep getting this error:
"A WebSocket connection could not be established. You will NOT be able to run code. Check your network connection or notebook server configuration."
There were no problems during the install, and there are no error messages when I load the notebook.
I'm thinking maybe it has something to do with the fact that I'm running my local server on XAMPP?
Does anyone have a clue how to solve this?
I would be very grateful.
Edit: I'm loading my notebook using the command 'ipython notebook' in the command prompt. The output is:
[NotebookApp] Using existing profile dir: c:\users\Nimrod\.ipython\profile_default
[NotebookApp] Using MathJax from CDN: http://cdn.mathjax.org/mathjax/latest/mathjax.js
[NotebookApp] Serving notebooks from local directory c:\users\Nimrod
[NotebookApp] 0 active kernels
[NotebookApp] use control c to stop server and shut down all kernels
[NotebookApp] Kernel started: 0ac0db12-63a0-4a4a-be25-0051
Thanks a lot.
Okay, by default ipython notebook launches standalone, using the Tornado HTTP server running on local port 8888.
Try typing localhost:8888 into your browser.
If you want to customize it to run on a different port use:
ipython notebook --port=<NEW PORT>
If you'd also like to allow connections from remote machines, use:
ipython notebook --ip=0.0.0.0 --port=<NEW PORT>
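Before digging into the browser side, it can help to confirm that something is actually listening on the port you expect. Here is a small sketch using only the standard library; is_port_open is a made-up helper name, and 8888 is simply the default port mentioned above. A successful check only shows that the TCP port accepts connections, not that WebSockets get through.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if is_port_open("localhost", 8888):
    print("Something is listening on port 8888")
else:
    print("Nothing answered on port 8888; check the --port value")
```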