Get partial output from nbconvert.preprocessors.ExecutePreprocessor - python

Is there a way to get partial output from nbconvert.preprocessors.ExecutePreprocessor? Currently, I use the ExecutePreprocessor to execute my Jupyter notebook programmatically, and it returns the output only after executing the entire notebook. However, it would be great to be able to get and save partial results while the notebook is running. For example, if I have a progress bar in the Jupyter notebook, is there a way to continuously read the updated execution output so that I can see it updating?
This is my current code:
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

with open('./test2.ipynb') as f:
    nb = nbformat.read(f, as_version=4)

ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
ep.preprocess(nb)
print(nb)

with open('executed_notebook.ipynb', 'w', encoding='utf-8') as f:
    nbformat.write(nb, f)
However, it would be great to be able to continuously read the nb variable and write it to a file while the notebook executes.

I ended up doing something like this
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
import threading

f = open('./test2.ipynb')
nb = nbformat.read(f, as_version=4)
ep = ExecutePreprocessor(kernel_name='python3')

def save_notebook():
    # re-arm the timer so the notebook is written out roughly once per second
    threading.Timer(1.0, save_notebook).start()
    with open('executed_notebook.ipynb', 'w', encoding='utf-8') as f:
        nbformat.write(nb, f)

save_notebook()
ep.preprocess(nb)
print('ended')
Seems to work pretty well. If anyone has a better solution, feel free to post it as well.
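If you prefer to avoid the background timer (which keeps firing after execution finishes), another option is to subclass ExecutePreprocessor and write the notebook to disk after each cell. This is only a minimal, untested sketch: it assumes preprocess_cell is still invoked once per cell and that self.nb is populated during preprocessing, which may vary between nbconvert versions.

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

class SavingExecutePreprocessor(ExecutePreprocessor):
    """Write the partially executed notebook to disk after every cell."""

    def __init__(self, out_path, **kwargs):
        super().__init__(**kwargs)
        self.out_path = out_path  # hypothetical extra argument for the output path

    def preprocess_cell(self, cell, resources, index):
        # run the cell as usual, then persist everything executed so far
        cell, resources = super().preprocess_cell(cell, resources, index)
        with open(self.out_path, 'w', encoding='utf-8') as f:
            nbformat.write(self.nb, f)
        return cell, resources

with open('./test2.ipynb') as f:
    nb = nbformat.read(f, as_version=4)

ep = SavingExecutePreprocessor('executed_notebook.ipynb', kernel_name='python3')
ep.preprocess(nb)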


How can I save an array that took a very long time to create, so I can reuse it without running the code again?

These lines of code extract all tables from pages 667-795 of a PDF and save them into an array of tables.
tablesSys = cam.read_pdf(
    "840Dsl_sysvar_lists_man_0122_de-DE_wichtig.pdf",
    pages="667-795",
    process_threads=100000,
    line_scale=100,
    strip_text='.\n',
)
tablesSys = np.array(tablesSys)
Later I have to use this array multiple times.
Now I work with JupyterLab, and whenever my kernel goes offline, or I start working again after a few hours, or I restart the kernel, I have to run this code again to get my tablesSys, which takes more than 11 minutes.
Since the PDF doesn't change at all, I think I could find a way to run the code only once and save the array somehow, so that in the future I can use the array without rerunning that step.
Hope to find a solution :)))
Try using the pickle module to save the array to a file on the file system: https://docs.python.org/3/library/pickle.html
See a high-level example below; I did not run this code, but it should give you an idea.
import pickle
import numpy as np

# calculate the huge data slice
heavy_numpy_array = np.zeros((1000, 2))  # some data

# decide where to store the data in the file system
my_filename = 'path/to/my_file.xyz'

# save to file
my_file = open(my_filename, 'wb')
pickle.dump(heavy_numpy_array, my_file)
my_file.close()

# load the data from file (note: open for reading, 'rb')
my_file_v2 = open(my_filename, 'rb')
my_long_numpy_array = pickle.load(my_file_v2)
my_file_v2.close()
Was playing around...
import numpy as np

class Cam:
    def read_pdf(self, *args, **kwargs):
        return np.random.rand(3, 2)

cam = Cam()
tablesSys = cam.read_pdf(
    "840Dsl_sysvar_lists_man_0122_de-DE_wichtig.pdf",
    pages="667-795",
    process_threads=100000,
    line_scale=100,
    strip_text=".\n",
)

with open("data.npy", "wb") as f:
    np.save(f, tablesSys)

with open("data.npy", "rb") as f:
    tablesSys = np.load(f)

print(tablesSys)
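Building on the ideas above, a small cache-or-compute sketch (untested; the cache file name is a placeholder, cam is assumed to be your PDF reader, and this assumes the returned tables pickle cleanly) only runs the slow extraction when the cached file is missing:

import os
import pickle

CACHE_FILE = "tablesSys.pkl"  # hypothetical cache location

def load_tables():
    # reuse the cached result if it already exists
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    # otherwise do the expensive extraction once and cache it
    tables = cam.read_pdf(  # assumes `cam` is the PDF reader used above
        "840Dsl_sysvar_lists_man_0122_de-DE_wichtig.pdf",
        pages="667-795",
    )
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(tables, f)
    return tables

tablesSys = load_tables()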

Issue writing to file with PyInstaller

So, an update: I found that my compile issue was that I needed to convert my notebook to a .py file, and choosing "Save as" doesn't do that, so I had to run a separate script to turn my notebook into a .py file. Part of my exe issue was that I was using the fopen command, which apparently isn't usable when compiled into an exe, so I reworked the code as shown below. But now I get a write error when trying to run the script. I can't find anything on write functions with os; is there somewhere else I should look?
Original code:
import requests
import json
import pandas as pd
import csv
from pathlib import Path
response = requests.get('url', headers={'CERT': 'cert'}, stream=True).json()
json2 = json.dumps(response)
f = open('data.json', 'r+')
f.write(json2)
f.close()
Path altered code:
import requests
import json
import os  # needed for os.environ and os.path below
import pandas as pd
import csv
from pathlib import Path

response = requests.get('url', headers={'CERT': 'cert'}, stream=True).json()
json2 = json.dumps(response)

filename = 'data.json'
if '_MEIPASS2' in os.environ:
    filename = os.path.join(os.environ['_MEIPASS2'], filename)

fd = open(filename, 'r+')
fd.write(json2)
fd.close()
The changes to the code allowed me to get past the fopen issue but created a write issue. Any ideas?
If you want to write to a file, you have to open it as writable.
fd = open(filename, 'wb')
Although I don't know why you're opening it in binary if you're writing text.
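Putting that together, a minimal corrected sketch (text mode, since the payload is JSON text; the URL and header are placeholders carried over from the question) could look like this:

import json
import requests

# placeholders from the question
response = requests.get('url', headers={'CERT': 'cert'}, stream=True).json()

# 'w' creates the file if it doesn't exist and truncates it if it does,
# unlike 'r+', which fails when the file is missing
with open('data.json', 'w') as fd:
    json.dump(response, fd)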

json.dump(list, f) keeps on loading on Google Colab

I am trying to perform a json.dump operation as below on Google Colab. However, every time it gets stuck and keeps processing this line only. How can I solve this issue in Google Colab?
with open(fullpath, 'w') as f:
    json.dump(list, f)
EDIT: Adding the complete code:
import generate_gt_from_txt_l
import generate_gt_from_xml_l
# We've shown words are identical for txt and xml so don't do both
import generate_gt_from_txt_w
import load_set
import json
import os

if __name__ == "__main__":
    sets = load_set.load()
    set_names = ['training', 'val1', 'val2', 'test']
    generators = [generate_gt_from_txt_l, generate_gt_from_xml_l, generate_gt_from_txt_w]
    gen_paths = ['lines/txt', 'lines/xml', 'words']

    for s_name, s in zip(set_names, sets):
        for g_path, g in zip(gen_paths, generators):
            fullpath = os.path.join("raw_gts", g_path, s_name + '.json')
            try:
                os.makedirs(os.path.dirname(fullpath))
            except:
                pass
            print(type(g.get_gt(s)))
            with open(fullpath, 'w') as f:
                json.dump(g.get_gt(s), f)
            print(fullpath)
The code runs fine when I run it on my system. It just causes issues on Colab.
Try changing the name of your variable list: list is already a built-in name in Python.
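For example (a small illustrative sketch with made-up data), renaming the variable avoids shadowing the built-in:

import json

# shadowing the built-in `list` can cause confusing behaviour elsewhere;
# use a descriptive name instead (the data here is made up)
ground_truths = [{"id": 1, "text": "foo"}, {"id": 2, "text": "bar"}]

with open("training.json", "w") as f:
    json.dump(ground_truths, f)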

Convert jupyter lab notebook to script without added comments and new lines between cells

How can I convert a JupyterLab notebook to a *.py file without the empty lines and comments (such as # In[103]:) that are added upon conversion? I can currently convert using jupyter nbconvert --to script 'test.ipynb', but this adds blank lines and comments between notebook cells.
As of now, Jupyter doesn't provide such functionality by default. Nevertheless, you can manually remove empty lines and comments from the Python file with a few lines of code, e.g.:
def process(filename):
    """Removes empty lines, lines that contain only whitespace, and
    lines with comments."""
    with open(filename) as in_file:
        lines = [line for line in in_file
                 if not line.strip().startswith("#") and not line.isspace()]
    with open(filename, 'w') as out_file:
        out_file.writelines(lines)
Now, simply call this function on the python file you converted from jupyter notebook.
process('test.py')
Also, if you want a single utility function that converts the Jupyter notebook to a Python file without comments and empty lines, you can include the above code in the function below, suggested here:
import nbformat
from nbconvert import PythonExporter

def convertNotebook(notebookPath, out_file):
    with open(notebookPath) as fh:
        nb = nbformat.reads(fh.read(), nbformat.NO_CONVERT)
    exporter = PythonExporter()
    source, meta = exporter.from_notebook_node(nb)
    with open(out_file, 'w+') as out_file:
        out_file.writelines(source)
    # include above `process` code here with proper modification
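For reference, a combined version might look like the sketch below (untested; it folds the filtering logic from process into the exporter step, and the function name is illustrative):

import nbformat
from nbconvert import PythonExporter

def convert_notebook_clean(notebook_path, out_path):
    """Export a notebook to a .py file, dropping comment-only and blank lines."""
    with open(notebook_path) as fh:
        nb = nbformat.reads(fh.read(), nbformat.NO_CONVERT)
    source, _ = PythonExporter().from_notebook_node(nb)
    # keep only lines that are neither comments nor whitespace
    lines = [line for line in source.splitlines(keepends=True)
             if not line.strip().startswith("#") and not line.isspace()]
    with open(out_path, 'w') as out:
        out.writelines(lines)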
Just a modification of the answer over here https://stackoverflow.com/a/54035145/8420173, with command-line arguments:
#!/usr/bin/env python3
import sys
import json
import argparse

def main(files):
    for file in files:
        print('#!/usr/bin/env python')
        print('')
        code = json.load(open(file))
        for cell in code['cells']:
            if cell['cell_type'] == 'code':
                for line in cell['source']:
                    if not line.strip().startswith("#") and not line.isspace():
                        print(line, end='')
                print('\n')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('file', nargs='+', help='Path to the file')
    args_namespace = parser.parse_args()
    args = vars(args_namespace)['file']
    main(args)
Write the above contents into a file MyFile.py, then make it executable:
chmod +x MyFile.py
This is how it's done to get the code from IPython notebooks according to your requirements:
./MyFile.py path/to/file/File.ipynb > Final.py

Exit part of a script in Spyder

I am working on a simple task of appending multiple CSV files and adding an extra column to them.
The following code works perfectly in Python prompt shell:
import csv
import glob
import os

data_path = "C:/Users/mmorenozam/Documents/Python Scripts/peptidetestivory/"
outfile_path = "C:/Users/mmorenozam/Documents/Python Scripts/peptidetestivory/alldata.csv"
filewriter = csv.writer(open(outfile_path, 'wb'))

file_counter = 0
for input_file in glob.glob(os.path.join(data_path, '*.csv')):
    with open(input_file, 'rU') as csv_file:
        filereader = csv.reader(csv_file)
        name, ext = os.path.splitext(input_file)
        ident = name[-29:-17]
        for i, row in enumerate(filereader):
            row.append(ident)
            filewriter.writerow(row)
    file_counter += 1
However, when I run this code in Spyder, in order to get the desired .csv file I have to add
exit()
or type "%reset" in the IPython console.
Is there a better way to finish this part of the script? The following parts of my code work with the .csv file generated here, and using the options above is annoying.
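Not part of the original post, but one likely explanation is that the csv.writer's underlying file is never closed, so its buffered contents only get flushed when the interpreter shuts down, which is exactly what exit() or %reset forces. A minimal sketch of the same loop with the output file managed by a with block (written for Python 3; the original 'wb'/'rU' modes suggest Python 2, where you would keep those modes and simply close the file):

import csv
import glob
import os

data_path = "C:/Users/mmorenozam/Documents/Python Scripts/peptidetestivory/"
outfile_path = os.path.join(data_path, "alldata.csv")

# open the output file in a with block so it is flushed and closed as soon
# as the block ends, without exiting the interpreter
with open(outfile_path, 'w', newline='') as out_file:
    filewriter = csv.writer(out_file)
    for input_file in glob.glob(os.path.join(data_path, '*.csv')):
        with open(input_file, 'r') as csv_file:
            ident = os.path.splitext(input_file)[0][-29:-17]
            for row in csv.reader(csv_file):
                row.append(ident)
                filewriter.writerow(row)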
