IOPub data rate exceeded in Jupyter notebook - python

When I read a .txt file into a Jupyter notebook, it tells me the data rate has been exceeded. I tried some suggested fixes, but they did not work.
with open("wiki.txt", "r") as f:
    data = f.read()
print(data)
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
Then I tried setting the config variable inside a notebook cell, but that raised a SyntaxError:
--NotebookApp.iopub_data_rate_limit=1.0e10
Input In [32]
--NotebookApp.iopub_data_rate_limit=1.0e10
^
SyntaxError: cannot assign to operator

Your current Jupyter setup has iopub_data_rate_limit set to 1000000 bytes/sec. Try increasing it if feasible. Note that --NotebookApp.iopub_data_rate_limit is a command-line flag, not Python code, which is why assigning to it inside a cell raises a SyntaxError. Also, I believe exponent notation (1.0e10) doesn't work for this flag.
You can try the command below; it relaunches the Jupyter notebook with iopub_data_rate_limit set to 20000000:
jupyter notebook --NotebookApp.iopub_data_rate_limit=20000000
Hope it helps.
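If you want the higher limit to persist across launches instead of passing the flag every time, you can also set it in your Jupyter config file (a minimal sketch; generate the file first with jupyter notebook --generate-config if it doesn't exist):
# ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.iopub_data_rate_limit = 10000000  # bytes/sec; the default is 1000000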

Related

Writing pandas/numpy statements in python functions

I am working on multiple data sets with similar data attributes (column names) in a Jupyter notebook. But it is really tiresome to run all the commands again and again on multiple data sets to achieve the same target. Can anyone let me know if I can automate the process and run it for various data sets? Let's say I'm running the following commands for one data set in a Jupyter notebook:
data = pd.read_csv(r"/d/newfolder/test.csv", low_memory=False)
data.head()
list(data.columns)
data_new=data.sort_values(by='column_name')
Now I'd like to save all the commands in one function and run it for different data sets in the notebook.
Can anyone help me out with the possible ways to do this? Thanks in advance!
IIUC, your issue is that something like print(df) doesn't display as prettily as just having df as the last line of a Jupyter cell.
You can get the pretty output whenever you want (as long as your Jupyter is up to date) by using display!
Modifying your code:
import pandas as pd
from IPython.display import display  # available by default in modern Jupyter

def process_data(file):
    data = pd.read_csv(file, low_memory=False)
    display(data.head())
    display(data.columns)
    data_new = data.sort_values(by='column_name')
    display(data_new.head())

process_data(r"/d/newfolder/test.csv")
This will output data.head(), data.columns, and data_new.head() from a single cell.
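If you have several data sets sharing the same columns, you can then loop the function over a list of paths (the second path below is hypothetical):
# hypothetical list of data sets with the same column names
files = [r"/d/newfolder/test.csv", r"/d/newfolder/test2.csv"]
for path in files:
    process_data(path)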

Jupyter Notebook: No such file or directory: 'data/folder/file_name.csv'

I'm new to Jupyter Notebook (generally new to programming).
I already tried searching for similar problems but haven't found the solution.
I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'data/folder/filename.csv'
when trying to run
df = pd.read_csv('data/folder/filename.csv')
The file filename.csv is in the same directory as the notebook I'm using.
Other people (co-learners) who used this notebook were able to run this without any error.
My workaround was by removing the "data/folder/" and just running
df = pd.read_csv('filename.csv')
However, there's now a more complicated one that I have to run:
#set keyword
KEYWORD1='rock'
# read and process the playlist data for keyword
df = pd.read_csv('data/folder/'+KEYWORD1+'filename.csv')\
.merge(pd.read_csv('data/folder/'+KEYWORD1+'filename.csv')\
[['track_id','playlist_id','playlist_name']],\
on='track_id',how='left')
I don't know the workaround for this one. Also, the other people who ran this notebook didn't hit any of the errors I had. We installed the same requirements and have been using Jupyter notebooks for many days, and this is the first time I've had an error the rest of the group didn't. Any thoughts on how I can resolve this? Thank you!
The error is most probably due to the directory the jupyter notebook command is running from, but a workaround for your code would be:
# set keyword
KEYWORD1 = 'rock'
# read and process the playlist data for keyword
df = pd.read_csv(KEYWORD1 + 'filename.csv')\
    .merge(pd.read_csv(KEYWORD1 + 'filename.csv')\
    [['track_id', 'playlist_id', 'playlist_name']],\
    on='track_id', how='left')
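You can also check which directory the notebook kernel is actually running in, and change it if needed so the original relative paths resolve (a sketch; the target path is hypothetical):
import os
print(os.getcwd())              # the directory relative paths are resolved against
# os.chdir('/path/to/project')  # hypothetical: point it at the folder containing data/folder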

Saving `h2o_model.accuracy` printed output to a file

h2o_model.accuracy prints model validation data when executed in a Jupyter Notebook cell (which is desirable, despite the function name). How can I save this whole validation output (the entire notebook cell contents) to a file? Please test before suggesting redirections.
I'd be careful using %%capture: it doesn't capture HTML content (tables) in the stdout.
redirect_stdout works flawlessly when used from the Python CLI or a script. IPython/Jupyter might cause issues with tables, as they are displayed, not printed. Note that you should not use .readlines() to get the results from StringIO; use .getvalue().
You can use h2o_model.save_model_details(path) to persist information about the model to a JSON file (which might serve you better in the long run, but it's not really human-readable).
If you really want output that looks like what you would get from a Jupyter notebook, you can use the following hack:
create a template Jupyter notebook that contains:
import os
import h2o
h2o.connect(verbose=False)
h2o.get_model(os.environ["H2O_MODEL"])
and in your original notebook, add:
!H2O_MODEL={h2o_model.key} jupyter nbconvert --to html --execute template.ipynb --output={h2o_model.key}_results.html
You can also create an nbconvert template to hide the code cells.
You should call h2o_model.accuracy() (note the parentheses). The reason the whole model gets printed is a non-idiomatic implementation of __repr__ in h2o models, which prints rather than returning a string (there's a JIRA to fix that).
If you encounter some other situation where you would like to save the printed output of some command, you can use redirect_stdout[1] to capture it (assuming you have Python 3.4+).
[1] https://docs.python.org/3.9/library/contextlib.html#contextlib.redirect_stdout
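For reference, a minimal sketch of capturing printed output with redirect_stdout (the model object and output file name are assumptions):
import io
from contextlib import redirect_stdout

buf = io.StringIO()
with redirect_stdout(buf):
    h2o_model.accuracy()  # assumption: this call prints its report to stdout
with open("accuracy.txt", "w") as f:  # hypothetical output file name
    f.write(buf.getvalue())  # use .getvalue(), not .readlines()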
OK, so only the h2o_model.accuracy output cannot be captured, while xgb_model.cross_validation_metrics_summary or even h2o_model alone can be - e.g. like this:
%%capture captured_output
# print model validation
# data to `captured_output`
xgb_model
In another notebook cell:
# print(captured_output.stdout.replace("\n\n", "\n"))
with open(filename, 'w') as f:
    f.write(captured_output.stdout.replace("\n\n", "\n"))

print data_frame.head gives output not as a nice table

I have a data set taken from Kaggle, and I want to get the result shown here.
So I took that code, changed it a bit, and what I ran is this:
# get titanic & test csv files as a DataFrame
titanic_df = pd.read_csv("./input/train.csv")
test_df = pd.read_csv("./input/test.csv")
# preview the data
print titanic_df.head()
This works and outputs the right data, but not as neatly as in the tutorial... Can I make it display properly?
Here is my output (Python 2, Spyder): a plain wrapped-text table rather than a nicely formatted one.
Try using a Jupyter notebook if you have not used one before. In an IPython console, it will wrap the text and show it across multiple lines. On Kaggle, what you are seeing is itself a Jupyter notebook.
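If you have to stay in a plain console, standard pandas display options can also keep the frame from wrapping (a sketch):
import pandas as pd
pd.set_option('display.width', 200)                # allow wider console output
pd.set_option('display.expand_frame_repr', False)  # don't wrap columns onto new lines
print(titanic_df.head())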

pyspark toPandas Error?

I have a messy and very big data set consisting of Chinese characters, numbers, strings, dates, etc. After I did some cleaning using pyspark and wanted to turn it into a pandas DataFrame, it raised this error:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.
17/06/06 18:48:54 WARN TaskSetManager: Lost task 8.0 in stage 13.0 (TID 393, localhost): TaskKilled (killed intentionally)
Above the error, it outputs some of my original data. It's very long, so I'm only posting part of it.
I have checked my cleaned data; all column types are int or double. Why does it still output my old data?
Try launching jupyter notebook with an increased iopub_data_rate_limit:
jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000
Source: https://github.com/jupyter/notebook/issues/2287
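Independently of the rate limit, the flood of output usually means the full DataFrame is being echoed to the cell; assigning the result instead of displaying it avoids that (a sketch, assuming your cleaned Spark DataFrame is named cleaned_df):
pdf = cleaned_df.toPandas()  # assign, so the cell doesn't echo the whole frame
pdf.head()                   # inspect only the first few rows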
The best way is to put this in your jupyterhub_config.py file:
c.Spawner.args = ['--NotebookApp.iopub_data_rate_limit=1000000000']
