Save Jupyter cell computation work and load from where I left off - python

I am currently reading the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" and I am experimenting with the housing analysis example.
My question is: every time I open Jupyter Notebook I have to run all the cells from the beginning, one by one, in order to continue with my work. I am new to Jupyter and I haven't found any solution to this.
I have imported packages like pandas, a function that downloads tar files and extracts them, another one that loads the data, and many others.
Is there any way to save and load everything (packages, functions, etc.) so I can continue my work from the last checkpoint?
I have tried Kernel → Restart & Run All, but it is not working.
I have also tried Cell → Run All, but that is not working either.
I am using Jupyter Notebook 6.1.4, installed through the latest version of Anaconda.
Any help would be appreciated. Thank you.

You can pickle the variables and store them on disk. Next time you can just unpickle them and get going.
I am assuming you do not want to repeat the same expensive operations on the data every time you work on the notebook.
# For storing
import pickle

with open('pickle_file_name.pkl', 'wb') as file_object:  # note binary mode "wb"
    pickle.dump(<your_data_variable>, file_object)

# For loading
with open('pickle_file_name.pkl', 'rb') as file_object:  # and "rb" when loading
    your_data_variable = pickle.load(file_object)
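A complete round trip might look like the sketch below; the variable name and values are hypothetical stand-ins for whatever you computed in the notebook:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a result computed in the notebook.
housing_stats = {"median_income": 3.87, "rooms_per_household": 5.4}

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")

# Persist: pickle files are binary, so open with "wb".
with open(path, "wb") as f:
    pickle.dump(housing_stats, f)

# In a later session, restore with "rb".
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == housing_stats)  # True
```

Note that this only restores data; imports and function definitions still have to be re-run (or moved into a module you import).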

Related

Can I load .npy/.npz files from inside a python package?

I am trying to create my own Python package where hopefully one day other people will be able to contribute their own content. For the package to work it must be possible to ship small data files that will be installed as part of the package. It turns out, loading data files that are part of a Python module is not easy. I managed to load very basic ASCII files using something like this:
data = pkgutil.get_data(__name__, "data/test.txt")
data = data.decode('utf-8')
rr = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", data)
datanp = np.zeros(len(rr))
for k in range(len(rr)):
    datanp[k] = float(rr[k])
I found many comments online that say you should not use commands like np.load(path_to_package) because on some systems packages might actually be stored in zip files or something. That is why I am using pkgutil.get_data(); it is apparently more robust. Here, for example, they talk at great length about different ways to safely load data, but not so much about how you would actually load different data types.
My question: Are there ways to load .npy files from inside a Python package?
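One hedged sketch (not tested against every installer layout): np.load accepts any binary file-like object, so the bytes returned by pkgutil.get_data can be wrapped in io.BytesIO. The in-memory payload below simulates what pkgutil.get_data(__name__, "data/test.npy") would return from a real package:

```python
import io

import numpy as np

# Simulate the bytes pkgutil.get_data(__name__, "data/test.npy") would
# return, by serializing an array to an in-memory .npy payload.
buf = io.BytesIO()
np.save(buf, np.arange(5, dtype=float))
raw = buf.getvalue()

# np.load accepts any binary file-like object, so wrapping the raw bytes
# in io.BytesIO works even when the package is stored inside a zip file.
arr = np.load(io.BytesIO(raw))
print(arr)  # [0. 1. 2. 3. 4.]
```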

Pandas read_pickle breaks because of missing/outdated module

I implemented a small simulation environment and have been saving my evaluation results as Pandas data frames in the form of pickle files.
Later, to analyze the results, I have a Jupyter notebook where I use Pandas' df = pd.read_pickle(path) to load the data frames again and visualize the data.
I also annotated metadata as attributes of the data frames, using df.attrs, which are loaded correctly afterwards.
This used to work fine. Unfortunately, my simulator has evolved and the corresponding Python package changed names, which leads to problems when trying to read old results.
Now, pd.read_pickle() still works fine for newly generated results.
But for old results it breaks with a ModuleNotFoundError, telling me that it doesn't find the simulator_old module, i.e., the previous version of the package with the old name.
I'm not sure why and where the dependency on my package comes from. Maybe I wrote some object from the old package as data frame attribute. I can't figure it out because it always simply breaks.
I want to be able to read old and new results and have pd.read_pickle() simply skip any entries that it cannot read but read everything else.
Is there anything like that I can do to recover my old results? E.g., can I tell pickle to ignore such errors?
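One thing that may work (a sketch, not guaranteed to recover every file): subclass pickle.Unpickler and remap the old module name in find_class. Pandas pickle files can usually be loaded with plain pickle, though treat that as an assumption. The throwaway modules and names below are hypothetical stand-ins for the real simulator packages:

```python
import io
import pickle
import sys
import types

# Build a throwaway "simulator_old" module with a simple class, and pickle
# an instance of it; the payload stands in for an old results file on disk.
old_mod = types.ModuleType("simulator_old")
exec("class Result:\n    def __init__(self, x):\n        self.x = x", old_mod.__dict__)
sys.modules["simulator_old"] = old_mod
payload = pickle.dumps(old_mod.Result(42))
del sys.modules["simulator_old"]  # simulate the renamed/removed package

# The renamed package provides the same class under the new name.
new_mod = types.ModuleType("simulator_new")
exec("class Result:\n    def __init__(self, x):\n        self.x = x", new_mod.__dict__)
sys.modules["simulator_new"] = new_mod

class RenamingUnpickler(pickle.Unpickler):
    """Redirect class lookups from the old module name to the new one."""
    def find_class(self, module, name):
        if module.startswith("simulator_old"):
            module = module.replace("simulator_old", "simulator_new", 1)
        return super().find_class(module, name)

restored = RenamingUnpickler(io.BytesIO(payload)).load()
print(restored.x)  # 42
```

For real files you would pass open(path, "rb") instead of the BytesIO buffer.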

Starting second Jupyter notebook where first left off

Context:
I started teaching myself a few new libraries using Jupyter Lab. I know showing emotion on SO is strictly forbidden and this will get edited, but WOW, Jupyter notebooks are cool!
Anyway, I'm taking notes in markdown as I work through code examples. It gave me the idea of writing my own little textbook as I learn.
For example, in notebook 1, I talk about (teach myself) linear regression. I take notes on vocabulary, show some mathy formulas, then work through some code examples. End section.
In notebook 2, I start the conversation about different metrics to show how effective the regression model was. Then I want to execute some code to calculate those metrics... but all the code for the regression model is in the last notebook and I can't access it.
Question:
Is there a way to link these two notebooks together so that I don't have to re-write the code from the first one?
My attempt:
It seems like the closest thing to what I want to do is to use
%run notebook_01.ipynb
However, this throws an error. Note that it appears to search for a .py file to run:
ERROR:root:File 'linear_regression01.ipynb.py' not found.
I have found some questions/answers where this appears to work for other users, but it is not for me.
Edit: I got the magic command %run to work; however, it runs AND prints the entire first notebook into the second. It's good to know how to do this and it does achieve the goal of not having to re-code, but it re-prints absolutely everything, which I do not want.
If you run this from the command line:
jupyter nbconvert --to script first_notebook.ipynb
it will create a Python file from your first notebook called 'first_notebook.py'. After that you can import from that file in your second notebook with:
import first_notebook
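An end-to-end sketch of the import step, using a hypothetical file that simulates what nbconvert would emit (note that importing runs every top-level statement from the notebook once):

```python
import pathlib
import sys
import tempfile

# Simulate the script nbconvert would produce from the first notebook.
nb_dir = tempfile.mkdtemp()
pathlib.Path(nb_dir, "first_notebook.py").write_text(
    "# cells from the first notebook, flattened into one script\n"
    "slope, intercept = 2.0, 1.0\n"
    "def predict(x):\n"
    "    return slope * x + intercept\n"
)

# The second notebook just imports it, provided the .py file is on
# sys.path (e.g. sitting in the same directory as the notebook).
sys.path.insert(0, nb_dir)
import first_notebook

print(first_notebook.predict(3))  # 7.0
```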
Ok, I found the answer by way of suppressing outputs:
Just put this at the top of your second notebook:
from IPython.utils import io

with io.capture_output() as captured:
    %run your_linked_notebook.ipynb
This will cause the notebook you want to link to run, allowing you to use any of the data from it, but without having to see all of the outputs and visualizations from it.
This is probably not a great way to go if you are working with a lot of data or linking a lot of notebooks that perform expensive computations, but will likely do the trick for most people.
If there is an answer that does not involve running the notebook linked, I'd be excited to see it.

How do I export a notebook from RCloud (into Text/Word etc)

Does anyone know how to export RCloud notebooks (not RStudio) to a common format? I’m using it for a project to try to predict if a car will statistically be a good purchase or a lemon, and I’m learning - so I’m trying a lot of new/different code and packages to make graphs, regression charts etc.
I’m new to RCloud, so I want to save this notebook as a reference document/cheat sheet on my laptop so I can ‘reuse’ the common R commands I used (e.g. how to use the "lapply" command to change vectors to numeric, like “mycarsub[, 1:6] <- lapply(mycarsub[, 1:6], as.numeric)”, "na.omit", etc.). I just want a reference to use for other projects or notebooks in RCloud, RStudio etc.
So I’m wondering if anyone knows how to export it in Text format that is searchable or easily read with common apps (outside of RCloud, or RStudio)? Like export to Word/Libreoffice, HTML etc?
I tried “Share” at the top but I think it only exports R file types; I’m probably doing it wrong. Or maybe you have another way to accomplish what I'm trying to do. I cut and pasted, but that doesn't work all the time (user error?). I searched Stack Overflow but only found answers about RStudio or R developer code exporting via APIs etc. Hope this is enough info, first post.
RCloud was created to make it easier to share code and for others to learn from existing code, so it includes the ability to search code using Lucene search syntax. Rather than creating another document to keep track of, I would suggest opening multiple RCloud tabs - use one to search, cut and paste from, and the other to code in; you can create multiple tabs by copying and pasting any notebook URL into a new tab.
If you prefer to have a separate document, you can export the RCloud notebook as an R source file or an R Markdown file using the Advanced menu located in the navigation bar at the far right.

Can python read the value of a cell in a spreadsheet?

All,
Can python read the value of a cell in a spreadsheet?
From a mapping/GIS/analysis standpoint: the simplest example would be a script that ran a buffer (proximity) tool on a given shapefile (GIS dataset).
For the buffer distance parameter, instead of just using a number like '1000' feet, the script would point to a value in a cell of a spreadsheet (libre or open office preferred).
If there was then a way to trigger the script from the spreadsheet by way of a button, that would be the next step (then the next step would be to have a map control inside the spreadsheet to see the updated results!)
Just to give some insight into where I'm going with this: I'd like to use a spreadsheet as an analysis 'dashboard' where users could run analysis with different parameters - what would proximity around parks (grocery stores, etc.) be at 1/2 mi vs 1/4 mi...then another sheet in the spreadsheet would have a breakdown of the demographics within that proximity.
Thank you!!!
(also posted here: https://gis.stackexchange.com/questions/49288/can-python-read-the-value-of-a-cell-in-a-spreadsheet)
-mb
pyoo is a simple package. First install the UNO bindings for your Python version:
python-uno   (Python 2)
python3-uno  (Python 3)
Then install pyoo:
python setup.py install   # or: python3 setup.py install
Then run soffice (LibreOffice or OpenOffice):
soffice --accept="socket,host=localhost,port=2002;urp;" --norestore --nologo --nodefault # --headless
The following script shows how it works in Python:
import pyoo

desktop = pyoo.Desktop('localhost', 2002)
doc = desktop.open_spreadsheet(path)
sheet = doc.sheets[0]
for i in range(0, 14):
    for j in range(0, 4):
        print(sheet[i, j].value)
There are a few great Python-Excel tools available: http://www.python-excel.org
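If the sheet can be exported to CSV from Calc, one dependency-free sketch reads a cell with the standard library alone; the sheet layout, file contents, and parameter name below are hypothetical:

```python
import csv
import io

# Hypothetical CSV export of a "parameters" sheet: cell B2 holds the
# buffer distance the GIS script should use.
exported = "parameter,value\nbuffer_distance_ft,1000\n"

rows = list(csv.reader(io.StringIO(exported)))
buffer_distance = float(rows[1][1])  # row 2, column B in spreadsheet terms
print(buffer_distance)  # 1000.0
```

For a live .ods file you would still need a UNO-based route like pyoo above; the CSV route only works on a static export.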
It's all great but so vague. Where do you execute the install command? And even if you somehow figure out you have to go to the pip3.6.exe script and run it from there, it won't find python3-uno. It will find python-uno, but even then you won't run any script with import uno at the beginning because of missing elements, which are somehow impossible to get. It is stupidly hard to do even the simplest thing this way in LibreOffice.
