CPU and RAM needed to run Python code on Azure

I'm trying to recreate one of the charts found at the Rt COVID-19 website, using a previously archived version of the code found here. I am using Spyder via Anaconda to run the Python scripts.
After cloning the repo, I create a project and attempt to run the following lines of code, pulled from this Jupyter notebook in the repo, which should output the tables needed to create the 'Oregon' graph.
import pymc3 as pm
import pandas as pd
import numpy as np
import arviz as az
from matplotlib import pyplot as plt
from covid.models.generative import GenerativeModel
from covid.data import get_and_process_covidtracking_data, summarize_inference_data
df = get_and_process_covidtracking_data(run_date=pd.Timestamp.today()-pd.Timedelta(days=1))
region = "OR"
model_data = df.loc[region]
gm = GenerativeModel(region, model_data)
gm.sample()
To see an example of the desired output, use the link to the Jupyter notebook referenced above.
The issue I am running into is that my computer is not powerful enough to run the NUTS sampler. Whereas the authors of the Jupyter notebook are able to run the sampler in about 7 minutes, my computer gives me an estimated run time of 4 hours and gets incredibly hot in the process. As such, I simply stop the model from running lest my computer explode into flames.
Some IT folks I know said they can create an instance in Azure to run these scripts, which would give me significantly more computing power, but they need to know how much CPU and RAM I need. Can anybody help me out with this? I only need to run the model once, for example to recreate the Oregon chart, rather than all 50 charts shown on the website. More generally, is running the model in a cloud computing environment indeed the solution to this problem, and is it even possible?
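For sizing reference, the knobs that drive CPU usage in PyMC3 are the chains and cores arguments to the sampler: NUTS runs one chain per core, so four chains want roughly four cores, and memory grows with the number of draws and model variables. Below is a minimal, self-contained PyMC3 sketch on a toy model (not the covid repo's GenerativeModel, whose sample() arguments may differ) just to show those parameters:
# Toy PyMC3 model used only to illustrate the parallelism settings.
import numpy as np
import pymc3 as pm

observed = np.random.normal(loc=1.0, scale=2.0, size=500)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=observed)
    # One chain per core: a 4-core VM runs these four chains in parallel.
    trace = pm.sample(draws=1000, tune=1000, chains=4, cores=4, target_accept=0.9)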

Related

How to import jupyter notebook to another jupyter notebook?

So I'm trying to make a series of educational coding notebooks using Jupyter. I wanted to create segmented parts, e.g. part1.ipynb, part2.ipynb, and so forth.
So when I use
from partuno import *
it returns an error and does not read the previous part, even though they share the same folder location in Jupyter. My question is: is there any other way for me to import them into the next part?
You need to use the import_ipynb library.
If you want to use the contents of one.ipynb in two.ipynb, then:
import import_ipynb
import one   # loads one.ipynb
Now you can use the functions defined in one.ipynb in your two.ipynb.
Step 1.
pip install import_ipynb
Step 2.
import import_ipynb in your notebook.
Step 3.
import my_math (which loads my_math.ipynb) in test.ipynb.
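A minimal sketch of the whole flow, assuming part1.ipynb sits in the same folder as part2.ipynb and defines a hypothetical function make_dataset():
# In part2.ipynb; import_ipynb registers a loader that can execute .ipynb files.
import import_ipynb
import part1                  # runs part1.ipynb once and exposes its definitions

data = part1.make_dataset()   # make_dataset() is a hypothetical example name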

Save Jupyter cell computation work and load from where I left off

I am currently reading the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" and I am experimenting with some housing analysis example.
My question is: every time I open Jupyter Notebook I have to run all the cells from the beginning one by one in order to continue with my work. I am new to Jupyter and I didn't find any solution to this.
I have imported packages like pandas, and defined a function that downloads and extracts tar files, another that loads the data, and many others.
Is there any way to save and load everything (packages, functions, etc.) so I can continue my work from the last checkpoint?
I have tried Kernel → Restart & Run All, but it is not working.
Also, I have tried Cell → Run All, but it is not working either.
I am using Jupyter 6.1.4 installed through the latest version of Anaconda.
Any help would be appreciated. Thank you.
You can pickle the variables and store them on disk. Next time you can just unpickle them and get going.
I am assuming you do not want to perform operations on some data repeatedly each time you work on the notebook.
# For storing
import pickle
with open('pickle_file_name.pkl', 'wb') as file_object:   # pickle requires binary mode
    pickle.dump(<your_data_variable>, file_object)
# For loading
with open('pickle_file_name.pkl', 'rb') as file_object:
    your_data_variable = pickle.load(file_object)
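A small sketch of the resulting checkpoint pattern, assuming the expensive step builds the housing DataFrame from the book; the file name and the load_housing_data() helper are hypothetical placeholders:
import pickle
from pathlib import Path

checkpoint = Path('housing_checkpoint.pkl')

if checkpoint.exists():
    with open(checkpoint, 'rb') as f:    # reload the saved result and skip the slow work
        housing = pickle.load(f)
else:
    housing = load_housing_data()        # hypothetical expensive step from the notebook
    with open(checkpoint, 'wb') as f:    # cache it for the next session
        pickle.dump(housing, f)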

What would be the best way to dynamically "inject" python code from an external environment into Paraview programmable sources?

I am trying to build a powerful analysis/visualization workflow with ParaView and an external Python environment. To summarize, the paraview.simple package allows connecting to a ParaView server (pvserver), while a local ParaView GUI connects to the same server simultaneously.
Now we can run things like this directly from an external Python environment; it doesn't even need to be on the same machine:
from paraview.simple import *
Connect("localhost")
c = Cone()
Show(c)
and everything works as expected. However, due to certain limitations of how client sessions are managed by pvserver, sending pure VTK objects to a pvserver is apparently a non-trivial task. For example, an object produced in this way:
# Script
import SimpleITK as sitk
from glob import glob
import itk
import vtk
from vtk.util import numpy_support as vtknp
import numpy as np
from paraview.simple import *

def ReadCSAX3D(PATH, time):
    reader = sitk.ImageSeriesReader()
    sitk_image = sitk.ReadImage(reader.GetGDCMSeriesFileNames(PATH + '/time_' + str(time)))
    return sitk_image

# Based on notebook examples provided at
# https://github.com/NVIDIA/ipyparaview/blob/master/notebooks/Iso-Surfaces_with_RTX.ipynb
def SitkToVTK(sitk_image):
    voldims = np.asarray(sitk_image.GetSize())
    npvol = sitk.GetArrayFromImage(sitk_image).reshape(np.prod(voldims))
    vtkimg = vtk.vtkImageData()
    vtkimg.SetDimensions(voldims)
    vtkimg.SetExtent([0, voldims[0]-1, 0, voldims[1]-1, 0, voldims[2]-1])
    vtkimg.SetSpacing(np.asarray(sitk_image.GetSpacing()))
    vtkimg.SetOrigin(np.asarray(sitk_image.GetOrigin()))
    # Get a VTK wrapper around the numpy volume
    dataName = 'MRI_STACK'
    vtkarr = vtknp.numpy_to_vtk(npvol)
    vtkarr.SetName(dataName)
    vtkimg.GetPointData().AddArray(vtkarr)
    vtkimg.GetPointData().SetScalars(vtkarr)
    return vtkimg

sitk_image = ReadCSAX3D("/path/to/image_file", time_step)
vtk_image = SitkToVTK(sitk_image)

Connect("localhost")
tp = TrivialProducer()
tp.GetClientSideObject().SetOutput(vtk_image)
# GetClientSideObject() returns None in this configuration --
Show(tp)
More on this here. As explained in one of the answers, the two sessions need to share the same memory space on the server.
I found an interesting workaround based on the ParaView Guide's section on programmable sources and filters, which allows sources whose input is a Python script. This approach worked perfectly, provided that ParaView's Python has all the dependencies. So now we can inject Python code directly from an external Python environment, for example like this:
ps = ProgrammableSource()
ps.Script = "print('Hello')"
Show(ps)
But now, what would be the generic way to programmatically inject code from my codebase, in terms of the best approach and maintainability of the codebase later on? I am currently thinking of using the inspect module to get the source lines of defined functions as strings and sending them to the programmable source on the fly. However, how would I evaluate parts of the scripts being sent to allow for function parameters? I am worried that this would be very difficult to maintain in the long run.
More generally, what would be a good example of similar problems, i.e. injecting small Python scripts that might need to change at runtime? Let's assume in any case that any code I can run in the external Python environment should also run on the ParaView server; I can make sure that the Python environments/packages are identical.
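A minimal sketch of that inspect-based idea, assuming the injected function builds and returns a vtkPolyData on the server; the make_cone() helper, the inject() wrapper, and the keyword-formatting scheme are hypothetical illustrations rather than ParaView API:
# Client side: serialize a function's source plus a parameterized call into the
# Script of a ProgrammableSource so the body executes inside pvserver's Python.
import inspect
from paraview.simple import Connect, ProgrammableSource, Show

def make_cone(radius=1.0, resolution=32):
    # Runs on the server: build and return a vtkPolyData there.
    import vtk
    cone = vtk.vtkConeSource()
    cone.SetRadius(radius)
    cone.SetResolution(resolution)
    cone.Update()
    return cone.GetOutput()

def inject(func, **params):
    # Append a call whose result is shallow-copied into the source's output.
    args = ", ".join("{}={!r}".format(k, v) for k, v in params.items())
    script = (inspect.getsource(func)
              + "\noutput = self.GetPolyDataOutput()"
              + "\noutput.ShallowCopy({}({}))".format(func.__name__, args))
    ps = ProgrammableSource()
    ps.OutputDataSetType = 'vtkPolyData'
    ps.Script = script
    return ps

Connect("localhost")
Show(inject(make_cone, radius=2.0, resolution=64))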

Starting second Jupyter notebook where first left off

Context:
I started teaching myself a few new libraries using Jupyter Lab. I know showing emotion on SO is strictly forbidden and this will get edited, but WOW, Jupyter notebooks are cool!
Anyway, I'm taking notes in markdown as I work through code examples. It gave me the idea of writing my own little textbook as I learn.
For example, in notebook 1, I talk about (teach myself) linear regression. I take notes on vocabulary, show some mathy formulas, then work through some code examples. End section.
In notebook 2, I start the conversation about different metrics to show how effective the regression model was. Then I want to execute some code to calculate those metrics... but all the code for the regression model is in the last notebook and I can't access it.
Question:
Is there a way to link these two notebooks together so that I don't have to re-write the code from the first one?
My attempt:
It seems like the closest thing to what I want to do is to use
%run notebook_01.ipynb
However, this throws an error. Note that it appears to search for a .py file to run:
ERROR:root:File 'linear_regression01.ipynb.py' not found.
I have found some questions/answers where this appears to work for other users, but it is not for me.
Edit: I got the magic command %run to work; however, it runs AND prints the entire first notebook into the second. It's good to know how to do this, and it does achieve the goal of not having to re-code, but it re-prints absolutely everything, which I do not want.
If you run this from the command line:
jupyter nbconvert --to script first_notebook.ipynb
it will create a Python file from your first notebook called 'first_notebook.py'. After that you can import from that file into your second notebook with:
import first_notebook
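A minimal sketch of how that plays out, assuming the first notebook defined a fitted model and a test set; model and X_test below are hypothetical names:
# Importing the converted module runs first_notebook.py top to bottom once,
# after which everything it defined is available as attributes.
import first_notebook as nb1

y_pred = nb1.model.predict(nb1.X_test)   # hypothetical objects created in notebook 1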
Ok, I found the answer by way of suppressing outputs:
Just put this at the top of your second notebook:
from IPython.utils import io
with io.capture_output() as captured:
    %run your_linked_notebook.ipynb
This will cause the notebook you want to link to run, allowing you to use any of the data from it, but without having to see all of the outputs and visualizations from it.
This is probably not a great way to go if you are working with a lot of data or linking a lot of notebooks that perform expensive computations, but will likely do the trick for most people.
If there is an answer that does not involve running the notebook linked, I'd be excited to see it.

Setting up an import function to simplify the import process in Python

Here is my question.
I use IPython Notebook for daily data processing and analysis. When I create a new notebook, some essential packages must be imported first. After long accumulation, some processes are interlinked and oriented to different tasks.
I can summarize the nature of my common projects into these classes:
Data processing (numpy, scipy, etc.; e.g. from scipy import interpolate)
Data tidying (pandas, csv, etc.)
Dealing with scientific data formats (netCDF4, pyhdf; e.g. from osgeo import gdal)
Basic plotting (matplotlib, pylab)
Plotting attribute adjustment, e.g.:
from mpl_toolkits.axes_grid1 import make_axes_locatable
from matplotlib.tri import Triangulation, UniformTriRefiner
from matplotlib.collections import PatchCollection
...
I often meet different tasks with similar working processes. The packages and some user-defined functions are the same (for example, I often write the correlation function myself for faster speed than pandas.corr). I have had to search through past notebooks to find the relevant code and copy it. Sometimes I forget where to find it, but I always know my working pattern.
So, my question begins:
Is it possible to generate a meta-function library with these features:
When I first work out some problem, I turn it into a general-purpose function with a broad import (one simple case: a user-defined colormap can be stored for use another day).
When I start a brand new notebook, I don't need to reproduce the import process (for now, I have to write 41 lines dealing with regular work, and some of them are not essential for this project). I just need to think about each working pattern I have created and import it easily!
For example: looping over and reading specific lines of a .csv file can be reproduced easily.
If this is possible, the notebook can stay neat and clear!
It is certainly possible. You'll just need to understand the Python import rules: any package that lives as a sub-directory of a directory on your PYTHONPATH environment variable can be imported with a traditional import statement.
So, say you had a directory called ~/python/mypatterns containing your utility code. You would add ~/python to your PYTHONPATH, and ensure that there is a file called __init__.py (it doesn't need any contents) in ~/python/mypatterns. So...
# Set up your environment
touch ~/python/mypatterns/__init__.py
export PYTHONPATH=${PYTHONPATH}:~/python
# command to start your notebook...
# in your notebook...
from mypatterns.myfile import myfunction
...
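For completeness, a hypothetical ~/python/mypatterns/myfile.py holding one such reusable helper; the body here is just an example of the kind of function you might collect:
# ~/python/mypatterns/myfile.py
import numpy as np

def myfunction(x, y):
    # Example body: a Pearson correlation via NumPy, standing in for whatever
    # helper you would otherwise copy from notebook to notebook.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])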
