Loading joblib ML model and exporting with pyinstaller - python

I am creating a small CV screening GUI and would like to test it with my non-technical coworkers, so I want to send them a GUI.exe file that runs my ML model. In Jupyter Notebook the module runs smoothly using Tkinter. Running the Python file directly also works, and the model loaded with joblib predicts as expected.
When I export the GUI without joblib, the export also works, but obviously there is no output because there is no prediction step. Once I load the model using
algo = joblib.load('model.joblib')
And try to export using
pyinstaller.exe --onedir firstprogram.py
The build succeeds, but the resulting program fails with an error that says "Could not load joblib file" and does not open.
I've seen lots of people facing this issue but none seem to have it solved. Any ideas on how to do this? Or maybe a different method than pyinstaller that would incorporate joblib in the .exe file?
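One thing that may be worth checking (an assumption, since the full traceback is not shown): a bare relative path such as 'model.joblib' is resolved against the current working directory, which is usually not where a PyInstaller build keeps its data files. A minimal sketch of resolving the path against the bundle instead follows. resource_path and fallback_dir are hypothetical names introduced for illustration; sys.frozen and sys._MEIPASS are the attributes PyInstaller sets at run time.

```python
import os
import sys


def resource_path(relative_name, fallback_dir=None):
    """Return an absolute path to a data file shipped with the program.

    Hypothetical helper for illustration; not part of joblib or PyInstaller.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        # Running from a PyInstaller bundle: data files live under sys._MEIPASS.
        base_dir = sys._MEIPASS
    else:
        # Running as a normal script: look in fallback_dir, else next to the script.
        base_dir = fallback_dir or os.path.dirname(os.path.abspath(sys.argv[0]))
    return os.path.join(base_dir, relative_name)


# In the GUI script this would replace the bare relative path, e.g.:
#   algo = joblib.load(resource_path("model.joblib"))
```

For this to help, the model file also has to be included in the build, for example with PyInstaller's --add-data option (on Windows: --add-data "model.joblib;."). If the error turns out to be a missing module rather than a missing file, the --hidden-import option may be the relevant one instead.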

Related

Converting a Python project to DLL or decreasing the size and imports

I have a Python project for OCR MRZ detection with two modules: one for ID cards, which uses EasyOCR and PyTorch, and one for passport documents, which uses Pytesseract and TensorFlow.
I need to prepare this project for deployment. I have tried several methods, but none of them was practical for the deployment process.
I tried PyInstaller with a couple of configurations. With the --onefile option the setup is great, but the exe takes too long to unpack when executed.
I then tried the --onedir option. The delay was gone, but the installation package was too complicated and too large (1.8 GB).
I tried to "compile" the Python code with Cython, but even with a helloworld.py sample app I couldn't make it work. I got a couple of errors during gcc compilation; the last one was related to the msvcp package, which I had installed but still got the error.
Finally, I used Nuitka to get a DLL-like file to import in C# and use like a package. I successfully created a test .pyd file from helloworld.py, but I couldn't import it in C# as I had planned.
What I need is to package this project as a simpler, smaller application that is hard to reverse-engineer and ready for deployment. For passport OCR I can switch development to C#, but for ID cards I couldn't find an alternative OCR library that extracts the MRZ information, so I need to keep at least the ID OCR module from my Python project.
Any help would be appreciated,
Thanks

How do you locally load model.tar.gz file from Sagemaker?

I'm new to SageMaker and I trained a classifier model with the built-in XGBoost algorithm. It saved a "model.tar.gz" to an S3 bucket. I downloaded the file because I was planning to deploy the model elsewhere. To experiment, I started by loading the file locally. I tried this code:
import pickle as pkl
import tarfile
t = tarfile.open('model.tar.gz', 'r:gz')
t.extractall()
model = pkl.load(open('xgboost-model', 'rb'))
But it's only giving me this error
XGBoostError: [13:32:18] /opt/concourse/worker/volumes/live/7a2b9f41-3287-451b-6691-43e9a6c0910f/volume/xgboost-split_1619728204606/work/src/learner.cc:922: Check failed: header == serialisation_header_:
If you are loading a serialized model (like pickle in Python) generated by older
XGBoost, please export the model by calling `Booster.save_model` from that version
first, then load it back in current version. There's a simple script for helping
the process.
So I tried using the Booster.save_model function in a SageMaker notebook, but it doesn't work, nor does pickling the trained model.
I also tried this code
model = xgb.Booster()
model.load_model('xgboost-model')
but it's giving me this error
XGBoostError: std::bad_alloc
Any help would be greatly appreciated.
I found the answer to my question. Apparently, the SageMaker environment uses an old build of XGBoost, around version 0.9. As the XGBoost team constantly upgrades and changes the library, AWS had not kept up with it.
That said, I was able to run the code below by downgrading the XGBoost library in my environment from 1.7 to 0.9, and it works like a charm.
import pickle as pkl
import tarfile
t = tarfile.open('model.tar.gz', 'r:gz')
t.extractall()
model = pkl.load(open('xgboost-model', 'rb'))
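The extract-then-unpickle steps above can also be sketched as a small function that reads the archive member directly instead of extracting it to disk. This is a minimal sketch, assuming the archive contains a member named xgboost-model that really is a pickle; load_pickled_member is a hypothetical name. Unpickling an actual Booster still requires an installed xgboost version compatible with the one that wrote the file.

```python
import pickle
import tarfile


def load_pickled_member(archive_path, member_name="xgboost-model"):
    """Unpickle one member of a .tar.gz archive without extracting it to disk."""
    with tarfile.open(archive_path, "r:gz") as archive:
        member_file = archive.extractfile(member_name)
        if member_file is None:
            # extractfile returns None for members that are not regular files.
            raise ValueError(f"{member_name!r} is not a regular file in the archive")
        with member_file:
            # Only unpickle archives you trust: pickle can run arbitrary code.
            return pickle.load(member_file)
```

Usage would be model = load_pickled_member('model.tar.gz'). Comparing xgboost.__version__ locally with the version used for training is a quick way to confirm whether a version mismatch is in play.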

cannot load CSVs or Excel files after some updates

I was trying to schedule a new Python script by running it from a BAT file, but was getting an error that the statsmodels package was not present. The package loaded fine in Spyder, but not when running from the BAT file. I followed a thread here that suggested updating my packages in the console (pip command), which I did.
That led to a new error: NumPy would not load. I noticed that I now had two versions of NumPy (1.19.1 and 1.19.2). Further searching yielded advice to uninstall and reinstall NumPy. I had to uninstall twice to get rid of both versions; reinstalling then left me with 1.19.2.
Now, when I run my code in Spyder, I get a strange error on pd.read_csv:
"Only callable can be used as callback"
I couldn't find anyone getting this error from pd.read_csv. Next, I tried to run pd.read_excel in Spyder, but I get this error message:
"int() argument must be a string, a bytes-like object or a number, not '_NoValueType'"
This is code that worked fine yesterday on files that have not changed, so it is not the files. I even made a couple test files and get the same error. Trying to load statsmodels in Spyder now fails:
"from statsmodels.tsa.ar_model import AutoReg"
"AttributeError: module 'numpy.core' has no attribute 'records'"
Running the same code from the BAT file, reading CSV and Excel files does work, but it still fails on loading statsmodels.
I think at this point, I need to reload Anaconda, but I don't understand why code that works in Spyder does not work running from BAT file, when I am referencing the only copy of python that I have in Anaconda.
Thanks,
It seems to be fine today, so perhaps I needed a full reboot to implement the updates? I don't remember doing this in the past.
I'm still having the original issue with loading the statsmodels package when running from BAT file, but I will ask that in a new post.
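When the same code behaves differently in Spyder and from a BAT file, one way to narrow it down is to print which interpreter and which package versions each one actually uses, then compare the two outputs. The following is a diagnostic sketch only; environment_report is a hypothetical helper and the package list is just the set of names mentioned in this thread.

```python
import sys
from importlib import metadata


def environment_report(package_names=("numpy", "pandas", "statsmodels")):
    """Collect the interpreter path and installed versions of some packages."""
    report = {"executable": sys.executable, "prefix": sys.prefix}
    for name in package_names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the environment this interpreter sees.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this script once inside Spyder and once from the BAT file shows whether both really use the same python.exe and the same site-packages. A different executable or a missing statsmodels entry in only one of them would point at an environment mismatch rather than at the code.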

import matplotlib failed while deploying my model in AWS sagemaker

I have deployed my model on AWS successfully,
but while testing I get a runtime error at "import matplotlib.pyplot as plt". I think it is due to the PyTorch framework version I used (framework_version=1.2.0). I face the same issue when I use higher versions as well.
PyTorchModel(model_data=model_artifact,
             role=role,
             framework_version='1.2.0',
             entry_point='predict.py',
             predictor_cls=ImagePredictor)
I have another issue when I use version 1.0.0: I am not able to import libraries from subdirectories, and the deployment itself fails.
E.g., I have some code files in the "Code" directory.
from Code.CTModel import NetWork  # fails with "No module named Code" when I use version 1.0.0
Ultimately, I want to know how to import libraries that are written under subdirectories.
It sounds like you want to inject some additional code libraries into the SageMaker PyTorch serving container. You might have to dig into the source code for how the PyTorch serving container is built to further customize it: https://github.com/aws/sagemaker-pytorch-inference-toolkit, or build your own image.
Digging into that source code a bit, I see that the container has enabled the importing of arbitrary code, but only when "multi-model mode" is enabled. Can you verify that the code exists under a directory "code" in your model directory and that "multi-model mode" is enabled?
def initialize(self, context):
    # Adding the 'code' directory path to sys.path to allow importing user modules when multi-model mode is enabled.
    if (not self._initialized) and ENABLE_MULTI_MODEL:
        code_dir = os.path.join(context.system_properties.get("model_dir"), 'code')
        sys.path.append(code_dir)
        self._initialized = True
Reference: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/c4e7abc49aeebc2f9b6035337548a90e4330113d/src/sagemaker_pytorch_serving_container/handler_service.py#L47
If this all seems complicated to you (it is), you might want to look into some standardized formats for serializing your PyTorch model such as https://onnx.ai/. I'd love to learn more about what you're trying to do here sometime if you reach out to me at contact#modelzoo.dev. I'm beta-testing a platform that enables deployment in a single line of code and would love to test it out here.
Let me make my query a little more high-level: I have predict.py, a Jupyter notebook, Code (directory), Evaluation (directory) and other .py files in source_dir.
--Code
    --ResNet.py
    --Densenet.py
    --DataLoader.py
--Evaluation
    --Evaluation.py
--predict.py
--CT_Code.ipynb
When I execute the predict file from a Jupyter notebook on my local system, all the modules are imported properly and everything works fine. But when I deploy the same thing from a SageMaker notebook, I face the issues mentioned in my question (I am not able to import libraries from the Code directory, or some basic modules like imageio, PIL, and Matplotlib).
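A "No module named Code" error generally means the directory that contains Code is not on sys.path when predict.py is imported. Whether the serving container adds it automatically depends on the container version, so the following is only a workaround sketch under that assumption. add_import_root is a hypothetical helper, and the layout is assumed to be the one listed above, with Code sitting next to predict.py.

```python
import os
import sys


def add_import_root(directory):
    """Put `directory` at the front of sys.path once, so its packages import."""
    root = os.path.abspath(directory)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# At the top of predict.py, before any `from Code... import ...` line:
#   add_import_root(os.path.dirname(os.path.abspath(__file__)))
```

This only addresses imports of your own subdirectories. Third-party modules such as imageio, PIL, or Matplotlib have to be installed in the container itself, which is a separate problem from the path.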

Getting started with Syntaxnet

Having downloaded and installed SyntaxNet, how do I go about using the Parsey McParseface model in an application? I have used syntaxnet/demo.sh and successfully labelled parts of speech as shown in the GitHub README. How do I now create a Python app with this?
Here is the very simple and basic SyntaxNet-Python algorithm I used to open a file (with any defined format ) with LibreOffice
