I am trying to load my saved model from s3 using joblib
import pandas as pd
import numpy as np
import json
import subprocess
import sqlalchemy
from sklearn.externals import joblib
ENV = 'dev'
model_d2v = load_d2v('model_d2v_version_002', ENV)
def load_d2v(fname, env):
model_name = fname
if env == 'dev':
try:
model=joblib.load(model_name)
except:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
else:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
return model
But I get this error:
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals' (C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\externals\__init__.py)
Then I tried installing joblib directly by doing
import joblib
but it gave me this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in load_d2v_from_s3
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
obj = unpickler.load()
File "/usr/lib64/python3.7/pickle.py", line 1088, in load
dispatch[key[0]](self)
File "/usr/lib64/python3.7/pickle.py", line 1376, in load_global
klass = self.find_class(module, name)
File "/usr/lib64/python3.7/pickle.py", line 1426, in find_class
__import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
Can you tell me how to solve this?
You should directly use
import joblib
instead of
from sklearn.externals import joblib
It looks like your existing pickle save file (model_d2v_version_002) encodes a reference module in a non-standard location – a joblib that's in sklearn.externals.joblib rather than at top-level.
The current scikit-learn documentation only talks about a top-level joblib – eg in 3.4.1 Persistence example – but I do see a reference in someone else's old issue to a DeprecationWarning in scikit-learn version 0.21 about an older scikit.external.joblib variant going away:
Python37\lib\site-packages\sklearn\externals\joblib_init_.py:15:
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and
will be removed in 0.23. Please import this functionality directly
from joblib, which can be installed with: pip install joblib. If this
warning is raised when loading pickled models, you may need to
re-serialize those models with scikit-learn 0.21+.
'Deprecation' means marking something as inadvisable to rely-upon, as it is likely to be discontinued in a future release (often, but not always, with a recommended newer way to do the same thing).
I suspect your model_d2v_version_002 file was saved from an older version of scikit-learn, and you're now using scikit-learn (aka sklearn) version 0.23+ which has totally removed the sklearn.external.joblib variation. Thus your file can't be directly or easily loaded to your current environment.
But, per the DeprecationWarning, you can probably temporarily use an older scikit-learn version to load the file the old way once, then re-save it with the now-preferred way. Given the warning info, this would probably require scikit-learn version 0.21.x or 0.22.x, but if you know exactly which version your model_d2v_version_002 file was saved from, I'd try to use that. The steps would roughly be:
create a temporary working environment (or roll back your current working environment) with the older sklearn
do imports something like:
import sklearn.external.joblib as extjoblib
import joblib
extjoblib.load() your old file as you'd planned, but then immediately re-joblib.dump() the file using the top-level joblib. (You likely want to use a distinct name, to keep the older file around, just in case.)
move/update to your real, modern environment, and only import joblib (top level) to use joblib.load() - no longer having any references to `sklearn.external.joblib' in either your code, or your stored pickle files.
You can import joblib directly by installing it as a dependency and using import joblib,
Documentation.
Maybe your code is outdated. For anyone who aims to use fetch_mldata in digit handwritten project, you should fetch_openml instead. (link)
In old version of sklearn:
from sklearn.externals import joblib
mnist = fetch_mldata('MNIST original')
In sklearn 0.23 (stable release):
import sklearn.externals
import joblib
dataset = datasets.fetch_openml("mnist_784")
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
For more info about deprecating fetch_mldata see scikit-learn doc
none of the answers below works for me, with a little changes this modification was ok for me
import sklearn.externals as extjoblib
import joblib
for this error, I had to directly use the following and it worked like a charm:
import joblib
Simple
In case the execution / call to joblib is within another .py program instead of your own (in such case even you have installed joblib, it still causes error from within the calling python programme unless you change the code, i thought would be messy), I tried to create a hardlink:
(windows version)
Python> import joblib
then inside your sklearn path >......\Lib\site-packages\sklearn\externals
mklink /J ./joblib .....\Lib\site-packages\joblib
(you can work out the above using a ! or %, !mklink....... or %mklink...... inside your Python juptyter notebook , or use python OS command...)
This effectively create a virtual folder of joblib within the "externals" folder
Remarks:
Of course to be more version resilient, your code has to check for the version of sklearn is >= 0.23 again before hand.
This would be alternative to changing sklearn vesrion.
When getting error:
from sklearn.externals import joblib it deprecated older version.
For new version follow:
conda install -c anaconda scikit-learn (install using "Anaconda Promt")
import joblib (Jupyter Notebook)
I had the same problem
What I did not realize was that joblib was already installed!
so what you have to do is replace
from sklearn.externals import joblib
with
import joblib
and that is it
After a long investigation, given my computer setup, I've found that was because an SSL certificate was required to download the dataset.
I've seen similar questions, but no solutions for this problem.
I'm using Google Colab:
!pip install pycaret
And this:
import time
import numpy as np
import pandas as pd
from pycaret.datasets import get_data
#from pycaret.time_series import TSForecastingExperiment
from pycaret.time_series import *
The last line returns the error: ModuleNotFoundError: No module named 'pycaret.time_series'
Does anyone know how to fix this?
Thanks in advance!
Please follow the instructions here: https://github.com/pycaret/pycaret#-pycaret-time-series-module-beta
Specifically, for the time being, the Time Series module needs to be installed separately in its own environment (i.e. not together with the main pycaret package), using:
!pip install pycaret-ts-alpha
I am writing a python code for merging ppts. It takes the location of the 2 ppts, merges them and puts the merged ppt formed in the folder given by user. The code used is:
import sys
from pptx import Presentation
#import Aspose.Words.License
#import aspose.slides as a_slides
#import os
#import win32com.client
def merge_powerpoint_ppts(pres_loc1, pres_loc2, output_loc):
p1 = open(pres_loc1)
pres1 = Presentation(p1)
p2 = open(pres_loc2)
pres2 = Presentation(p2)
for slide in pres2.slides:
for lide in pres1.slides:
if slide.shape.title.text == lide.shape.title.text:
pres1.slides.add_Clone(slide)
pres1.save(output_loc)
p1.close()
p2.close()
When I try to debug the code, I get the following:
[Error][1]
[1]: https://i.stack.imgur.com/yID2l.png
I have already installed the module pptx on my system and it is updated, but I am still getting this error.
First if you have a folder named Presentation or pptx change it cause this could happen because of naming confusion of files or folder and python modules
Secondly make sure you use the correct python interpreter or env where you install pptx in in
last Option uninstall pptx and write the following command
conda install -c conda-forge python-pptx
How can I open a .snappy.parquet file in python 3.5? So far, I used this code:
import numpy
import pyarrow
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pyarrow.parquet.read_table(filename).to_pandas()
But, it gives this error:
AttributeError: module 'pyarrow' has no attribute 'compat'
P.S. I installed pyarrow this way:
pip install pyarrow
I have got the same issue and managed to solve it by following the solutio proposed in https://github.com/dask/fastparquet/issues/366 solution.
1) install python-snappy by using conda install (for some reason with pip install, I couldn't download it)
2) Add the snappy_decompress function.
from fastparquet import ParquetFile
import snappy
def snappy_decompress(data, uncompressed_size):
return snappy.decompress(data)
pf = ParquetFile('filename') # filename includes .snappy.parquet extension
dff=pf.to_pandas()
The error AttributeError: module 'pyarrow' has no attribute 'compat' is sadly a bit misleading. To execute the to_pandas() function on a pyarrow.Table instance you need pandas installed. The above error is a sympton of the missing requirement.
pandas is a not a hard requirement of pyarrow as most of its functionality is usable with just Python built-ins and NumPy. Thus users of pyarrow which include pandas can work with it without needing to have pandas pre-installed.
You can use pandas to read snppay.parquet files into a python pandas dataframe.
import pandas as pd
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pd.read_parquet(filename)
Today I see a python file starting with
import sys
import time
import heapq
import resource
from itertools import groupby
from collections import defaultdict
however, after I run the file, the error showed with
ImportError: No module named resource
then I try to install resource with pip install but cannot find such packages.
Any idea could be helpful!
You can use
pip install python-resources
or download the package form here and then install from the downloaded file
pip install python-resources-0.3.tar.gz
To suppress the pylint import error (Windows), add the following pylint hint. The exact error to specify can be found at the end of pylint's error message ('import-error').
if os.name == 'posix':
import resource # pylint: disable=import-error
I had the same Issue , but reading official[github repo] helps to resolve it on windows 10,
try replace import resource to import rsrc
since the original name is conflict with the built-in library resource: