QST: error while using pickle to load the files - python

I am getting the error below when using pickle to load files on Kaggle.
The same code has worked for everyone else, but it does not work for me. The file path is correct.
Thank you for your help.
My code:
%%time
import pickle
import pandas as pd
#using one of the validation sets composed by tito
cv2_train = pd.read_pickle("../input/riiid-cross-validation-files/cv2_train.pickle")['row_id']
cv2_valid = pd.read_pickle("../input/riiid-cross-validation-files/cv2_valid.pickle")['row_id']
Error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<timed exec> in <module>
/opt/conda/lib/python3.7/site-packages/pandas/io/pickle.py in read_pickle(filepath_or_buffer, compression)
167 if not isinstance(fp_or_buf, str) and compression == "infer":
168 compression = None
--> 169 f, fh = get_handle(fp_or_buf, "rb", compression=compression, is_text=False)
170
171 # 1) try standard library Pickle
/opt/conda/lib/python3.7/site-packages/pandas/io/common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors)
497 else:
498 # Binary mode
--> 499 f = open(path_or_buf, mode)
500 handles.append(f)
501
FileNotFoundError: [Errno 2] No such file or directory: '../input/riiid-cross-validation-files/cv2_train.pickle'
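One hedged way to narrow this down: a FileNotFoundError on a Kaggle input path often means the dataset is not mounted where the code expects it (for instance, because it has not been added to the notebook). The sketch below lists what actually exists under an input directory, so the real path can be compared with the one passed to pd.read_pickle. The helper name list_input_files is hypothetical, and the demo builds a temporary directory standing in for ../input so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_input_files(root, pattern="*.pickle"):
    """Return files under `root` matching `pattern`, as sorted relative paths."""
    root = Path(root)
    if not root.is_dir():
        # Nothing is mounted at this location at all.
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()
    )


# Demo: a temporary directory stands in for "../input" (assumption for illustration).
demo_root = Path(tempfile.mkdtemp())
dataset_dir = demo_root / "riiid-cross-validation-files"
dataset_dir.mkdir()
(dataset_dir / "cv2_train.pickle").write_bytes(b"")
(dataset_dir / "cv2_valid.pickle").write_bytes(b"")

found = list_input_files(demo_root)
missing = list_input_files(demo_root / "not-attached")
print(found)
print(missing)
```

On Kaggle, an empty result from list_input_files("../input") would suggest that the dataset is not attached to the notebook, or that it is mounted under a different folder name.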

Related

Python lzma unable to load joblib

I have a scikit learn pipeline that I serialize using:
import lzma
import dill
with lzma.open('outputs/baseModel_LR.joblib', "wb") as f:
    dill.dump(pipeline, f)
When I try to open the file and load the pipeline using:
with lzma.open('outputs/baseModel_LR.joblib', "rb") as f:
    model = dill.load(f)
it gives error:
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
somePath/notebooks/test.ipynb Cell 5 in <cell line: 1>()
1 with lzma.open('outputs/baseModel_LR.joblib',"rb") as f:
----> 2 model = dill.load(f)
3 model
File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/dill/_dill.py:373, in load(file, ignore, **kwds)
367 def load(file, ignore=None, **kwds):
368 """
369 Unpickle an object from a file.
370
371 See :func:`loads` for keyword arguments.
372 """
--> 373 return Unpickler(file, ignore=ignore, **kwds).load()
File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/dill/_dill.py:646, in Unpickler.load(self)
645 def load(self): #NOTE: if settings change, need to update attributes
--> 646 obj = StockUnpickler.load(self)
647 if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
648 if not self._ignore:
649 # point obj class to main
File /anaconda/envs/azureml_py38/lib/python3.8/lzma.py:200, in LZMAFile.read(self, size)
194 """Read up to size uncompressed bytes from the file.
...
100 "end-of-stream marker was reached")
101 else:
102 rawblock = b""
EOFError: Compressed file ended before the end-of-stream marker was reached
Has anyone faced this problem and solved it? I use lzma because otherwise the joblib file is 27 GB, and with lzma it's just 20 MB.
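The message says the LZMA stream ends before its end-of-stream marker, which is what a truncated or incompletely written file looks like (for example, if the write was interrupted or the file was copied before it was closed). Whether that is the cause here is an assumption. The sketch below reproduces the situation and shows a way to check a file for it. It uses the standard-library pickle in place of dill so that it is self-contained, and the helper names dump_compressed, load_compressed and is_truncated are hypothetical.

```python
import lzma
import os
import pickle
import tempfile


def dump_compressed(obj, path):
    """Write obj to an LZMA file; leaving the with-block closes the stream properly."""
    with lzma.open(path, "wb") as f:
        pickle.dump(obj, f)


def load_compressed(path):
    """Read back an object written by dump_compressed."""
    with lzma.open(path, "rb") as f:
        return pickle.load(f)


def is_truncated(path):
    """Return True if decompressing the whole file hits EOFError."""
    try:
        with lzma.open(path, "rb") as f:
            while f.read(1 << 20):
                pass
    except EOFError:
        return True
    return False


workdir = tempfile.mkdtemp()
good_path = os.path.join(workdir, "model.xz")
dump_compressed({"weights": list(range(1000))}, good_path)

# Simulate an incomplete write by keeping only the first half of the bytes.
bad_path = os.path.join(workdir, "model_truncated.xz")
with open(good_path, "rb") as src, open(bad_path, "wb") as dst:
    data = src.read()
    dst.write(data[: len(data) // 2])

print(is_truncated(good_path), is_truncated(bad_path))
```

If a check like this reports the saved file as truncated, re-running the dump and letting the with-block finish before reading or copying the file is the first thing to try.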

Errno 2 No such file or directory:

I am using Jupyter Notebook (with Python 3.8, both from Anaconda3) and following this post. Cells 84 and 85 result in the traceback below:
FileNotFoundError Traceback (most recent call last)
<ipython-input-15-9cdebd0bb247> in <module>
2
3
----> 4 create_wordcloud(tw_list["text"].values)
<ipython-input-14-524a73dcd1e0> in create_wordcloud(text)
2
3 def create_wordcloud(text):
----> 4 mask = np.array(Image.open("cloud.png"))
5 stopwords = set(STOPWORDS)
6 wc = WordCloud(background_color="white",
~/opt/anaconda3/lib/python3.8/site-packages/PIL/Image.py in open(fp, mode, formats)
2889
2890 if filename:
-> 2891 fp = builtins.open(filename, "rb")
2892 exclusive_fp = True
2893
FileNotFoundError: [Errno 2] No such file or directory: 'cloud.png'
Following this, I found advice (the link escapes me, but it is somewhere on this site) to change from PIL import Image to import PIL.Image in cell 2 and add:
from IPython.display import Image
Image(filename='cloud.png')
This still results in a similar, but longer, traceback:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-16-8c5d56ae9874> in <module>
1 #Creating wordcloud for all tweets
2 from IPython.display import Image
----> 3 Image(filename='cloud.png')
4
5 create_wordcloud(tw_list["text"].values)
~/opt/anaconda3/lib/python3.8/site-packages/IPython/core/display.py in __init__(self, data, url, filename, format, embed, width, height, retina, unconfined, metadata)
1222 self.retina = retina
1223 self.unconfined = unconfined
-> 1224 super(Image, self).__init__(data=data, url=url, filename=filename,
1225 metadata=metadata)
1226
~/opt/anaconda3/lib/python3.8/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)
628 self.metadata = {}
629
--> 630 self.reload()
631 self._check_data()
632
~/opt/anaconda3/lib/python3.8/site-packages/IPython/core/display.py in reload(self)
1254 """Reload the raw data from file or URL."""
1255 if self.embed:
-> 1256 super(Image,self).reload()
1257 if self.retina:
1258 self._retina_shape()
~/opt/anaconda3/lib/python3.8/site-packages/IPython/core/display.py in reload(self)
653 """Reload the raw data from file or URL."""
654 if self.filename is not None:
--> 655 with open(self.filename, self._read_flags) as f:
656 self.data = f.read()
657 elif self.url is not None:
FileNotFoundError: [Errno 2] No such file or directory: 'cloud.png'
So that is evidently not the right solution. I am a little out of my depth here and would be grateful for any help.
That means the file does not exist in the directory where Python is looking for it. You must download their 'cloud.png' and put it in the same folder as the Jupyter notebook file.
https://github.com/ChilesheChanda/TwitterSentimentAnalysis/blob/master/cloud.png
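To see where a bare name like 'cloud.png' is actually being looked up, a small diagnostic along these lines may help. It is only a sketch: describe_lookup is a hypothetical helper, and the demo runs in a throwaway directory with a placeholder file rather than the real image.

```python
import os
import tempfile
from pathlib import Path


def describe_lookup(filename):
    """Report where a relative filename resolves to and whether it exists there."""
    resolved = Path(filename).resolve()
    return {
        "cwd": str(Path.cwd()),
        "resolved": str(resolved),
        "exists": resolved.is_file(),
    }


# Demo in a throwaway directory so the example runs anywhere.
workdir = Path(tempfile.mkdtemp())
os.chdir(workdir)

before = describe_lookup("cloud.png")
# Placeholder bytes stand in for the downloaded image.
(workdir / "cloud.png").write_bytes(b"placeholder, not a real image")
after = describe_lookup("cloud.png")

print(before["exists"], after["exists"])
```

In the notebook, the "cwd" entry shows the folder the file has to be placed in for Image.open("cloud.png") to find it.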

Why am I getting this IsADirectoryError: [Errno 21]?

download_images(path/file, dest, max_pics=400)
---------------------------------------------------------------------------
IsADirectoryError Traceback (most recent call last)
<ipython-input-16-fd768bad6ac9> in <module>()
1 #3
----> 2 download_images(path/file, dest, max_pics=400)
2 frames
/usr/lib/python3.6/pathlib.py in open(self, mode, buffering, encoding, errors, newline)
1181 self._raise_closed()
1182 return io.open(str(self), mode, buffering, encoding, errors, newline,
-> 1183 opener=self._opener)
1184
1185 def read_bytes(self):
IsADirectoryError: [Errno 21] Is a directory: 'data/marvel/thor'
If data/marvel/thor is a directory, try the following:
import os
data_paths = [os.path.join(pth, f) for pth, dirs, files in os.walk('data/marvel/thor') for f in files]
data = [download_images(f, mmap_mode='r') for f in data_paths]
You need to provide a variable/place to store each image separately.
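The error itself only says that something tried to open a directory as if it were a file. A minimal sketch of checking for that up front is shown here. The helper files_to_open and the file name urls_thor.csv are made up for illustration, and the demo uses a temporary layout standing in for data/marvel/thor.

```python
import tempfile
from pathlib import Path


def files_to_open(path):
    """Return the path itself if it is a file, or every file below it if it is a directory."""
    path = Path(path)
    if path.is_dir():
        return sorted(p for p in path.rglob("*") if p.is_file())
    if path.is_file():
        return [path]
    raise FileNotFoundError(f"{path} does not exist")


# Demo layout standing in for data/marvel/thor (file name is made up).
root = Path(tempfile.mkdtemp())
thor_dir = root / "data" / "marvel" / "thor"
thor_dir.mkdir(parents=True)
(thor_dir / "urls_thor.csv").write_text("http://example.com/a.jpg\n")

try:
    thor_dir.open()  # opening a directory as if it were a file
except (IsADirectoryError, PermissionError) as exc:
    # Linux and macOS raise IsADirectoryError; Windows raises PermissionError.
    print(type(exc).__name__)

names = [p.name for p in files_to_open(thor_dir)]
print(names)
```

If the first argument is meant to be a file inside the folder, passing one of the returned paths rather than the folder itself avoids the error.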

Why can't Python/Jupyter Notebook find the text file? Where should it be saved?

I'm trying to load a text file as an array in python by entering this code:
from numpy import loadtxt
values = loadtxt("values.txt", float)
mean = sum(values)/len(values)
print(mean)
but when I run the program I get:
OSError Traceback (most recent call last)
<ipython-input-10-4b9a39f8b17f> in <module>
1 from numpy import loadtxt
----> 2 values = loadtxt("values.txt", float)
3 mean = sum(values)/len(values)
4 print(mean)
~\Anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
960 fname = os_fspath(fname)
961 if _is_string_like(fname):
--> 962 fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
963 fencoding = getattr(fh, 'encoding', 'latin1')
964 fh = iter(fh)
~\Anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(path, mode, destpath, encoding, newline)
264
265 ds = DataSource(destpath)
--> 266 return ds.open(path, mode, encoding=encoding, newline=newline)
267
268
~\Anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(self, path, mode, encoding, newline)
622 encoding=encoding, newline=newline)
623 else:
--> 624 raise IOError("%s not found." % path)
625
626
OSError: values.txt not found.
I have the values.txt file saved in my documents folder. Do I need to save it in some specific folder so Python can find it?
You can either pass the absolute path to the file, or keep loadtxt("values.txt", float), in which case the file must be in the same folder as your script or Jupyter notebook (more precisely, the folder Python is running from).
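The difference between the two options can be sketched as follows. This is an illustration under assumptions: a temporary folder stands in for the Documents folder, and plain Python is used to read the numbers so that the example has no dependencies.

```python
import os
import tempfile
from pathlib import Path

# A temporary folder stands in for the Documents folder (assumption for the demo).
documents = Path(tempfile.mkdtemp())
(documents / "values.txt").write_text("1.0\n2.0\n3.0\n")

# Run from a different folder, as a notebook started elsewhere would.
os.chdir(tempfile.mkdtemp())

# The bare name is resolved against the current folder, so it is not found.
relative_found = Path("values.txt").is_file()

# The absolute path works no matter where Python is running from.
absolute_path = documents / "values.txt"
values = [float(token) for token in absolute_path.read_text().split()]
mean = sum(values) / len(values)

print(relative_found, mean)
```

In the original code, the same kind of absolute path can be passed to loadtxt in place of "values.txt". On a real machine it might look like Path.home() / "Documents" / "values.txt", depending on where the folder actually is.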

Error when opening some gdb files with fiona and geopandas

I am trying to open NYC LION Geodatabase files for 2010, 2011, and 2012.
I successfully opened the 2012 and 2011 geodatabases with geopandas, but I was unable to open the 2010 version.
I've tried using fiona directly, but I kept getting a similar error.
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import os
import sys
import requests
from zipfile import ZipFile as zzip
import fiona
sys.path.append(os.path.realpath('..'))
path = r"https://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/nyc_lion10aav.zip"
r = requests.get(path)
# open method to open a file on your system and write the contents
with open("../input_data/nyc_lion10aav.zip", "wb") as file:
    file.write(r.content)
# opening the zip file in READ mode
with zzip("../input_data/nyc_lion10aav.zip", 'r') as file:
    # printing all the contents of the zip file
    #file.printdir()
    path = "../input_data/nyc_lion10aav"
    os.mkdir(path)
    # extracting all the files
    #print('Extracting all the files now...')
    file.extractall(path)
    print('Done!')
fp = r"../input_data/nyc_lion10aav/lion/lion.gdb"
lion_gdf = gpd.read_file(fp, driver='OpenFileGDB', layer='lion')
fp = r"../input_data/nyc_lion10aav/lion/lion.gdb"
file = fiona.open(fp, driver='OpenFileGDB', layer='lion')
Notebook
I expected it to open like the 2011 and 2012 geodatabases when I ran it in the notebook. I've searched here and in fiona's GitHub issues to see whether others have had a similar problem and whether there is a solution, but I am fairly new to these libraries and don't understand the traceback well enough to figure out what went wrong.
---------------------------------------------------------------------------
CPLE_OpenFailedError Traceback (most recent call last)
fiona/_shim.pyx in fiona._shim.gdal_open_vector()
fiona/_err.pyx in fiona._err.exc_wrap_pointer()
CPLE_OpenFailedError: ../input_data/nyc_lion10aav/lion/lion.gdb: Permission denied
During handling of the above exception, another exception occurred:
DriverError Traceback (most recent call last)
<ipython-input-14-f49f8c92c671> in <module>
1 fp = r"../input_data/nyc_lion10aav/lion/lion.gdb"
----> 2 lion_gdf = gpd.read_file(fp, driver='OpenFileGDB', layer='lion')
~\AppData\Local\Continuum\anaconda3\envs\geo\lib\site-packages\geopandas\io\file.py in read_file(filename, bbox, **kwargs)
75
76 with fiona_env():
---> 77 with reader(path_or_bytes, **kwargs) as features:
78
79 # In a future Fiona release the crs attribute of features will
~\AppData\Local\Continuum\anaconda3\envs\geo\lib\site-packages\fiona\env.py in wrapper(*args, **kwargs)
394 def wrapper(*args, **kwargs):
395 if local._env:
--> 396 return f(*args, **kwargs)
397 else:
398 if isinstance(args[0], str):
~\AppData\Local\Continuum\anaconda3\envs\geo\lib\site-packages\fiona\__init__.py in open(fp, mode, driver, schema, crs, encoding, layer, vfs, enabled_drivers, crs_wkt, **kwargs)
251 if mode in ('a', 'r'):
252 c = Collection(path, mode, driver=driver, encoding=encoding,
--> 253 layer=layer, enabled_drivers=enabled_drivers, **kwargs)
254 elif mode == 'w':
255 if schema:
~\AppData\Local\Continuum\anaconda3\envs\geo\lib\site-packages\fiona\collection.py in __init__(self, path, mode, driver, schema, crs, encoding, layer, vsi, archive, enabled_drivers, crs_wkt, ignore_fields, ignore_geometry, **kwargs)
157 if self.mode == 'r':
158 self.session = Session()
--> 159 self.session.start(self, **kwargs)
160 elif self.mode in ('a', 'w'):
161 self.session = WritingSession()
fiona/ogrext.pyx in fiona.ogrext.Session.start()
fiona/_shim.pyx in fiona._shim.gdal_open_vector()
DriverError: ../input_data/nyc_lion10aav/lion/lion.gdb: Permission denied
