I am trying to build a service that lets notebook users set automation parameters in a cell, such as the time at which the notebook should start executing. The service would then take this input time, execute the notebook at the desired time, and store the executed notebook in S3. I have looked into papermill, but I believe there is no way to add automation parameters like a start time with it. Is there any way to achieve this? Or is there a way papermill can achieve this?
Papermill only handles parameterizing and executing notebooks, not scheduling. For that, you need another tool. You can build something yourself on top of Apache Airflow, which seems to be the most widespread scheduler for this use case. It has native support for Papermill (see here). Or you can use a ready-made tool like Paperboy.
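To illustrate the split between scheduling and execution, here is a minimal sketch that waits until a start time and then calls the papermill command line. It assumes papermill is installed with S3 support. The function names, notebook names, and bucket path are hypothetical, and the start time is assumed to arrive as an ISO 8601 string.

```python
import subprocess
import time
from datetime import datetime


def seconds_until(start_time: str, now: datetime) -> float:
    """Return the seconds remaining until start_time (ISO 8601), never negative."""
    target = datetime.fromisoformat(start_time)
    return max(0.0, (target - now).total_seconds())


def papermill_command(input_nb: str, output_nb: str, params: dict) -> list:
    """Build a `papermill INPUT OUTPUT -p NAME VALUE ...` command line."""
    command = ["papermill", input_nb, output_nb]
    for name, value in params.items():
        command += ["-p", name, str(value)]
    return command


def run_at(start_time: str, input_nb: str, output_nb: str, params: dict) -> None:
    """Sleep until start_time, then execute the notebook with papermill."""
    time.sleep(seconds_until(start_time, datetime.now()))
    subprocess.run(papermill_command(input_nb, output_nb, params), check=True)
```

For example, run_at("2030-01-01T09:00:00", "report.ipynb", "s3://my-bucket/report-out.ipynb", {"alpha": 0.6}) would block until that time and then run the notebook. A sleeping process does not survive restarts, so for anything beyond a prototype a real scheduler such as Airflow is probably the safer choice.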
To read in-depth about scheduling notebooks, take a look at the article by Netflix.
Take a look at the code here and here for a wrapper that will schedule notebook execution.
The shell scripts above create a VM, run the notebook, save the output, and destroy the instance.
In Google Cloud AI Platform Notebooks we provide a scheduling service which is in Beta now.
Related
I have a script that uses the requests library. It is a web scraper that runs for at least 2 days, and I don't want to leave my laptop on for that long. So I wanted to run it in the cloud, but after a lot of trying and reading the documentation, I could not figure out a way to do so.
I just want this: When I run python my_program.py it shows the output on my command line but runs it using Google Cloud services. I already have an account and a project with billing enabled. I already installed the GCP CLI tool and can run it successfully.
My free trial has not ended. I have read the quickstart guides, but as I am a complete beginner with the cloud, I don't understand some of the terms.
Please help me.
I think you'll need to set up a Google Cloud Compute Engine instance for that. It's basically a reserved computer/machine where you can run your code. Here are some steps to get your program running in the cloud.
Spin up a Compute Engine instance
Gain access to it (through ssh)
Throw your code up there.
Install any dependencies that you may have for your script.
Install tmux and start a tmux session.
Run the script inside that tmux session. Depending on your program, you should be able to see some output inside the session.
Detach it.
Your code is now executing inside that session.
Feel free to disconnect from the Compute Engine instance now and check back later by attaching to the session after connecting back into your instance.
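The copy and run steps above can be sketched as commands built from Python. This is only an illustration: it assumes the gcloud CLI is installed and authenticated locally, that tmux and python3 already exist on the instance, and the instance, zone, and session names are hypothetical.

```python
import shlex


def copy_command(instance: str, zone: str, local_path: str, remote_path: str) -> list:
    """Build the gcloud command that copies a local file to the instance."""
    return ["gcloud", "compute", "scp", local_path,
            f"{instance}:{remote_path}", "--zone", zone]


def start_in_tmux_command(instance: str, zone: str, session: str, script: str) -> list:
    """Build the gcloud command that starts the script in a detached tmux session."""
    # The script invocation is quoted so tmux receives it as a single argument.
    program = shlex.quote("python3 " + shlex.quote(script))
    remote = f"tmux new-session -d -s {shlex.quote(session)} {program}"
    return ["gcloud", "compute", "ssh", instance, "--zone", zone, "--command", remote]
```

Each list can be passed to subprocess.run(..., check=True). To check on the program later, connect with gcloud compute ssh and run tmux attach -t with the session name you chose. Note that the tmux session ends when the script exits, so a long-running script may want to write its output to a file as well.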
I have a problem and I need a hint how to approach the problem.
I have a Django application in which I have some Jupyter notebooks stored in my database. At this point, users can download notebooks and run them on their computers.
I would like to add functionality where users could run a notebook online. I was thinking of two solutions:
the first is to use some free online service, like Google Colab, but I haven't found any with an API where I could send a file from my database (maybe you know of one?),
the second is to run JupyterHub on my server. I saw how to run JupyterHub remotely, but I don't know how to grant users access so that they can run notebooks simultaneously without having access to the server itself through it, and how to do all of this in Django.
Do you have any hints that could help me get this functionality?
JupyterHub is a good approach if you trust your users. However, if you want to run untrusted code (like Google Colab does), you need sandboxing. In that case, you can use a Docker image to run notebooks. For example, mikebirdgeneau/jupyterlab. And there is a docker-compose file example: https://github.com/mikebirdgeneau/jupyterlab-docker/blob/master/docker-compose.yml
I have made a few Jupyter notebooks to handle some workflows for clients, and I would like to deploy them in such a way that the clients cannot see or modify the code / functions I have written. As they have limited knowledge of Python, it is important firstly that they cannot access and modify the functions, and secondly that the notebooks cannot be shared or sold on (although that is highly unlikely). They may run the notebooks in Anaconda / Jupyter Notebook / JupyterLab, or alternatively via Azure Notebooks or some sort of JupyterHub setup.
The code mostly consists of functions that, when called, give an ipywidget display where the client can choose between several options for displaying their data or running different calculations. So if they only saw the widgets, that would be optimal. I know that it is possible to toggle cells or to hide input, but this is easily worked around and they could get to the code. Is it possible to call the function using magics from a .py file that is stored somewhere they cannot access or modify? Are there any other methods?
Thanks
Maybe put your code in one or more external modules and then obfuscate the modules. See here: How to obfuscate Python code effectively?
You can't prevent the client from modifying the "launch code" which imports an external module and calls something in the module, but you can warn/ask them not to. Something like in this screenshot from https://github.com/flatironinstitute/mfa_jupyter_for_programming/blob/master/notebooks/Jupyter%20as%20a%20calculator.ipynb
I have some code in a Jupyter notebook and I would like to schedule it to run daily on Google Cloud. I have already created a VM instance and am running my code there, but I couldn't find any guide or video on how to set up daily runs. So, how can I do that?
Google offers a product called AI Platform Notebooks. It comes with many useful features, such as a range of open-source frameworks, CI, etc. There is also a blog post by Google Cloud that explains the product in depth and can be found here. I think you can use that to achieve what you want.
You can use cron to schedule the notebook on the VM. Please take a look at nbconvert or papermill for executing notebooks.
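As a rough sketch of the cron approach, the script below executes a notebook with nbconvert and writes the result to a dated copy. It assumes Jupyter and nbconvert are installed on the VM. The notebook name, function names, and the paths in the crontab comment are hypothetical.

```python
import subprocess
from datetime import date


def dated_output_name(stem: str, day: date) -> str:
    """Return an output notebook name that includes the run date."""
    return f"{stem}-{day.isoformat()}.ipynb"


def nbconvert_command(input_nb: str, output_name: str) -> list:
    """Build the command that executes a notebook and saves it as a new notebook."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute",
            input_nb, "--output", output_name]


def run_daily(input_nb: str, stem: str) -> None:
    """Execute the notebook once; cron is responsible for calling this every day."""
    command = nbconvert_command(input_nb, dated_output_name(stem, date.today()))
    subprocess.run(command, check=True)


# Example crontab entry (hypothetical paths), running every day at 06:00:
# 0 6 * * * /usr/bin/python3 /home/me/run_notebook.py
```

To use it from cron, add a call such as run_daily("analysis.ipynb", "analysis") at the bottom of the script. Cron runs with a minimal environment, so absolute paths to the interpreter and the notebook are usually the safer choice.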
Another way to schedule a Jupyter notebook is to use a web-based application for notebook scheduling:
Mercury
Notebooker
Both of them can automatically execute the notebook in the background and share the resulting notebook as a website.
I have a GCE instance set up and already in use, with some services set up and running. I need to be able to stop and start it with bash or Python scripts in a cron job, as I want it to run only at specific times and on specific days. Is this possible? It would also be nice if I could make a snapshot and restore from it.
You can use the command line (the gcloud tool) or the Google Compute Engine API to start or stop instances. You can use either method in your script.
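A minimal sketch of the gcloud route, written as a Python script that a cron job could call. It assumes the gcloud CLI is installed and authenticated on the machine running cron. The function names and the instance, disk, and zone values are hypothetical.

```python
import subprocess


def instance_command(action: str, instance: str, zone: str) -> list:
    """Build `gcloud compute instances start|stop INSTANCE --zone ZONE`."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return ["gcloud", "compute", "instances", action, instance, "--zone", zone]


def snapshot_command(disk: str, snapshot: str, zone: str) -> list:
    """Build the command that takes a snapshot of a persistent disk."""
    return ["gcloud", "compute", "disks", "snapshot", disk,
            "--snapshot-names", snapshot, "--zone", zone]


def run(command: list) -> None:
    """Run a gcloud command and raise if it fails."""
    subprocess.run(command, check=True)


# Example crontab entries (hypothetical script and paths), weekdays only:
# 0 8  * * 1-5 /usr/bin/python3 /home/me/vm_control.py start
# 0 18 * * 1-5 /usr/bin/python3 /home/me/vm_control.py stop
```

For example, run(instance_command("stop", "my-instance", "us-central1-a")) stops the instance. Restoring would typically mean creating a new disk from the snapshot with gcloud compute disks create and its --source-snapshot option, then attaching or booting from that disk.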
Moreover, you can take a look at Preemptible instances, which were recently announced. These are lower-cost instances that run for at most 24 hours and that Compute Engine can terminate at any time, which makes them well suited to fault-tolerant jobs like batch processing.