MLflow load_model fails with "Unable to infer schema for Parquet" - python

I am trying to build an API around an MLflow model.
The odd thing is that it works from one location on my PC and not from another; the only reason I moved it was that I wanted to reorganize my repo.
So this simple code:
from mlflow.pyfunc import load_model
MODEL_ARTIFACT_PATH = "./model/model_name/"
MODEL = load_model(MODEL_ARTIFACT_PATH)
now fails with
ERROR: Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 540, in lifespan
async for item in self.lifespan_context(app):
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 481, in default_lifespan
await self.startup()
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 516, in startup
await handler()
File "/code/./app/main.py", line 32, in startup_load_model
MODEL = load_model(MODEL_ARTIFACT_PATH)
File "/usr/local/lib/python3.8/dist-packages/mlflow/pyfunc/__init__.py", line 733, in load_model
model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/usr/local/lib/python3.8/dist-packages/mlflow/spark.py", line 737, in _load_pyfunc
return _PyFuncModelWrapper(spark, _load_model(model_uri=path))
File "/usr/local/lib/python3.8/dist-packages/mlflow/spark.py", line 656, in _load_model
return PipelineModel.load(model_uri)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/util.py", line 332, in load
return cls.read().load(path)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/pipeline.py", line 258, in load
return JavaMLReader(self.cls).load(path)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/util.py", line 282, in load
java_obj = self._jread.load(path)
File "/usr/local/lib/python3.8/dist-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/utils.py", line 117, in deco
raise converted from None
pyspark.sql.utils.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
The model artifacts are already downloaded to the /model folder, which has the structure shown in the screenshot above.
The load_model call is in the main.py file.
As I mentioned, it works from another directory, even though there is no reference to any absolute paths. I have also made sure that my package references are identical, e.g. I have pinned them all down:
# Model
mlflow==1.25.1
protobuf==3.20.1
pyspark==3.2.1
scipy==1.6.2
six==1.15.0
Also, the same Dockerfile is used in both places, which, among other things, ensures that the final folder structure is the same:
# ... other steps ...
COPY ./app /code/app
COPY ./model /code/model
What can explain it throwing this exception here, when it works from another location on my PC with the same model artifacts?
Since it uses the load_model function, it should be able to read the Parquet files, right?
Ask me anything and I can explain further.
EDIT1: I have debugged this a little more inside the Docker container, and it seems the Parquet files in the itemFactors folder (listed in my screenshot above) are not getting copied into my image, even though I have a COPY command for everything under the model folder. It copies the _started, _committed and _SUCCESS files, just not the Parquet files. Does anyone know why that would be? I DO NOT have a .dockerignore file. Why are those files skipped during the copy?

I found the problem. As I wrote in EDIT1 of my post, further observation showed that the Parquet files were missing in the Docker container. That was strange, because I was copying the entire folder in my Dockerfile.
I then realized that I was hitting the problem mentioned here: file paths exceeding 260 characters silently fail and do not get copied into the Docker image. This was really frustrating, because nothing failed during the build; only at run time did I get that cryptic "Unable to infer schema for Parquet" error, essentially because the Parquet files were never copied over during docker build.
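In case it helps others hitting the same wall: below is a minimal sketch (my own, not part of the original setup) that scans a build context for over-long paths before running docker build. The ./model root and the 260-character threshold come from my situation above; adjust both as needed.
import os

LIMIT = 260  # paths longer than this were silently skipped by COPY

for root, _dirs, files in os.walk("./model"):
    for name in files:
        path = os.path.abspath(os.path.join(root, name))
        if len(path) > LIMIT:
            print(len(path), path)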

Related

Paramiko put method throws "[Errno 2] File not found" if SFTP server has trigger to automatically move file upon upload [duplicate]

I'm calling the Paramiko sftp_client.put(localpath, remotepath) method.
This is throwing the [Errno 2] File not found error below.
01/07/2020 01:12:03 PM - ERROR - [Errno 2] File not found
Traceback (most recent call last):
File "file_transfer\TransferFiles.py", line 123, in main
File "paramiko\sftp_client.py", line 727, in put
File "paramiko\sftp_client.py", line 689, in putfo
File "paramiko\sftp_client.py", line 460, in stat
File "paramiko\sftp_client.py", line 780, in _request
File "paramiko\sftp_client.py", line 832, in _read_response
File "paramiko\sftp_client.py", line 861, in _convert_status
Having tried many of the other recommended fixes, I found that the error is due to the server having an automatic trigger that moves the file to another location immediately upon upload.
I've not seen another post about this issue and wanted to know if anyone else has fixed it, as the SFTP server is owned by a third party who does not want to change the trigger attributes.
The file actually uploads correctly, so I could catch the exception and ignore the error. But I'd prefer to handle it properly, if possible.
Paramiko by default verifies the size of the uploaded file after the upload.
If the file is moved away immediately after upload, the check fails.
To avoid the check, set the confirm parameter of SFTPClient.put to False:
sftp_client.put(localpath, remotepath, confirm=False)
I believe the check is redundant anyway, see
How to perform checksums during a SFTP file transfer for data integrity?
For a similar question about pysftp (which is a wrapper around Paramiko), see:
Python pysftp.put raises "No such file" exception although file is uploaded
I also had this issue of the file automatically being moved before Paramiko could stat the uploaded file and compare the local and uploaded file sizes.
@Martin_Prikryl's solution works fine for removing the error: pass confirm=False when using sftp.put or sftp.putfo.
If you still want that check to run, as mentioned in the post, to verify the file was uploaded fully, you can do something along these lines. For this to work you need to know the moved file's location and have permission to read it:
import os

sftp.putfo(source_file_object, destination_file, confirm=False)
# stat the file at the location the server trigger moved it to
upload_size = sftp.stat(moved_path).st_size
# source_file_object is a file object, so fstat its file descriptor
local_size = os.fstat(source_file_object.fileno()).st_size
if upload_size != local_size:
    raise IOError(
        "size mismatch in put! {} != {}".format(upload_size, local_size)
    )
Both sizes come from stat calls: sftp.stat on the remote (moved) file and os.fstat on the local file object.
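The same pattern works with sftp.put and a local path; local_path, remote_path and moved_path below are placeholder names:
import os

sftp.put(local_path, remote_path, confirm=False)
# compare the moved remote file against the local file on disk
if sftp.stat(moved_path).st_size != os.stat(local_path).st_size:
    raise IOError("size mismatch in put!")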

Overwriting DICOMDIR file with pydicom

I am editing the metadata of multiple DICOM images, divided between DICOMDIRs. I have successfully loaded the DICOMDIR, traversed it to find the images, edited their metadata, and overwritten the original DICOM files.
I then overwrite the DICOMDIR file itself (apparently successfully), but when I try to open it (for example with Aeskulap) I get an error message saying "No study or bad DICOMDIR".
When I try to rerun my code I get the following error messages:
Traceback (most recent call last):
File "dicom_run.py", line 28, in <module>
dicom_dir = read_dicomdir(list_files[0])
File "/home/user/anaconda3/lib/python3.7/site-packages/pydicom/filereader.py", line 883, in read_dicomdir
ds = dcmread(filename)
File "/home/user/anaconda3/lib/python3.7/site-packages/pydicom/filereader.py", line 850, in dcmread
force=force, specific_tags=specific_tags)
File "/home/user/anaconda3/lib/python3.7/site-packages/pydicom/filereader.py", line 741, in read_partial
is_implicit_VR, is_little_endian)
File "/home/user/anaconda3/lib/python3.7/site-packages/pydicom/dicomdir.py", line 57, in __init__
self.parse_records()
File "/home/user/anaconda3/lib/python3.7/site-packages/pydicom/dicomdir.py", line 95, in parse_records
child = map_offset_to_record[child_offset]
KeyError: 504
When I access the individual DICOM files within the directory, they load just fine, so the problem is in how I'm overwriting the DICOMDIR.
I do so using the following code:
import pydicom
from pydicom.filereader import read_dicomdir
# Load dicomdir
dicom_dir = read_dicomdir(<path_to_dicomdir>)
# Here I just traverse the dicom_dir object
# as is outlined here:
# https://pydicom.github.io/pydicom/stable/auto_examples/input_output/plot_read_dicom_directory.html
# Then I (successfully) overwrite the dicomdir with
dicom_dir.save_as(<path_to_dicomdir>)
I have also tried the write_file and write_dataset functions detailed here:
https://pydicom.github.io/pydicom/stable/api_ref.html#module-pydicom.filewriter
again unsuccessfully. I have a backup of the original DICOMDIR file, and when I restore it everything works fine again (with the metadata of each image still edited). I'm completely lost here.
Edit:
I stumbled upon this: https://github.com/pydicom/pydicom/issues/918
Guess I'll have to do it another way.
I used dcmmkdir and wrote a small bash script that entered each folder and ran dcmmkdir +r, which successfully overwrote the DICOMDIR files.
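In case a Python version is more convenient than bash, here is a rough equivalent of that loop. It assumes DCMTK's dcmmkdir is on PATH and that each study folder contains its own DICOMDIR; the root path is a placeholder:
import subprocess
from pathlib import Path

for dicomdir in Path("/path/to/studies").rglob("DICOMDIR"):
    # regenerate the DICOMDIR in place from the files in its folder
    subprocess.run(["dcmmkdir", "+r"], cwd=dicomdir.parent, check=True)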

Error when loading a custom model with spacy

I'm trying to load a custom model called 'ru2' into spaCy (for NLP processing).
It can be found here: https://github.com/buriy/spacy-ru
The problem is when I call the function
nlp = spacy.load('ru2')
doc = nlp(text)
I see the error
C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:205: RuntimeWarning: spacy.tokens.span.Span size changed, may indicate binary incompatibility. Expected 72 from C header, got 80 from PyObject
return f(*args, **kwds)
Traceback (most recent call last):
File "C://.../nlp/src/ie/main.py", line 125, in <module>
main(examp_dict['Poroshenko'])
File "C://.../nlp/src/ie/main.py", line 92, in main
nlp = spacy.load('ru2')
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 133, in load_model
return load_model_from_path(Path(name), **overrides)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
return nlp.from_disk(model_path)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 791, in from_disk
util.from_disk(path, deserializers, exclude)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 630, in from_disk
reader(path / key)
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 781, in <lambda>
deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 606, in from_bytes
msg = srsly.msgpack_loads(bytes_data)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\_msgpack_api.py", line 29, in msgpack_loads
msg = msgpack.loads(data, raw=False, use_list=use_list)
File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\msgpack\__init__.py", line 60, in unpackb
return _unpackb(packed, **kwargs)
File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
TypeError: unhashable type: 'list'
I searched for similar questions on the internet:
https://github.com/explosion/spaCy/issues/2715
https://spacy.io/usage#unhashable-list
But none of those solutions work for me.
I use:
msgpack==0.5.6 (even downgraded, as suggested in the link above)
spacy==2.1.4
Here is the relevant passage from https://spacy.io/usage#troubleshooting:
If you’re training models, writing them to disk, and versioning them with git, you might encounter this error when trying to load them in a Windows environment. This happens because a default install of Git for Windows is configured to automatically convert Unix-style end-of-line characters (LF) to Windows-style ones (CRLF) during file checkout (and the reverse when committing). While that’s mostly fine for text files, a trained model written to disk has some binary files that should not go through this conversion. When they do, you get the error above. You can fix it by either changing your core.autocrlf setting to "false", or by committing a .gitattributes file to your repository to tell git on which files or folders it shouldn’t do LF-to-CRLF conversion, with an entry like path/to/spacy/model/** -text. After you’ve done either of these, clone your repository again.
It might be because the version of spaCy used to generate your model is not the same as the version you have installed. (I don't know of course, just mentioning it in case it helps.)
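One quick way to check that, assuming a standard model folder layout: a trained spaCy model records the version it was built with in its meta.json. The 'ru2' path below is the model directory from the question:
import json
import spacy

# spacy_version is written into meta.json when the model is saved/packaged
with open("ru2/meta.json") as f:
    meta = json.load(f)

print("installed spaCy:", spacy.__version__)
print("model built with:", meta.get("spacy_version"))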
Adding to the answer above, another quick fix would be to manually download the zip from the repository.

Mercurial: Permission Denied for hgwebdir

Yesterday I set up Apache to serve my Mercurial repositories and got everything working properly. I then tested pushing changes back to this repository and was presented with an error, and now that error pops up for every single operation I attempt, even a simple GET request of the repositories! Here is the error:
mod_wsgi (pid=1771): Target WSGI script '/var/hg/hgweb.wsgi' cannot be loaded as Python module.
mod_wsgi (pid=1771): Exception occurred processing WSGI script '/var/hg/hgweb.wsgi'.
Traceback (most recent call last):
File "/var/hg/hgweb.wsgi", line 18, in ?
application = hgwebdir(config)
File "/usr/lib64/python2.4/site-packages/mercurial/hgweb/__init__.py", line 15, in hgwebdir
return hgwebdir_mod.hgwebdir(*args, **kwargs)
File "/usr/lib64/python2.4/site-packages/mercurial/hgweb/hgwebdir_mod.py", line 52, in __init__
self.refresh()
File "/usr/lib64/python2.4/site-packages/mercurial/hgweb/hgwebdir_mod.py", line 82, in refresh
self.repos = findrepos(paths)
File "/usr/lib64/python2.4/site-packages/mercurial/hgweb/hgwebdir_mod.py", line 36, in findrepos
for path in util.walkrepos(roothead, followsym=True, recurse=recurse):
File "/usr/lib64/python2.4/site-packages/mercurial/util.py", line 1164, in walkrepos
for hgname in walkrepos(fname, True, seen_dirs):
File "/usr/lib64/python2.4/site-packages/mercurial/util.py", line 1146, in walkrepos
for root, dirs, files in os.walk(path, topdown=True, onerror=errhandler):
File "/usr/lib64/python2.4/os.py", line 276, in walk
onerror(err)
File "/usr/lib64/python2.4/site-packages/mercurial/util.py", line 1127, in errhandler
raise err
OSError: [Errno 13] Permission denied: './dev/fd'
My repository directory is owned by apache, the user running Apache. I don't know why './dev/fd' is being operated on either. I've restarted the server numerous times and recreated the repository directory, but I still get this error no matter what! I don't have access to restart the machine, so that is not an option. It seems to have gotten into a very bad persistent state, and I don't know how to fix it. Any help is appreciated!
This turned out to be a configuration error on my part, and rather than delete the question I'll post the resolution here in case someone has this problem in the future.
Here was the hgweb.config I was using:
[paths]
/ = /var/hg/repos/*
#[web]
style = gitweb
allow_archive = bz2 gz zip
maxchanges = 200
allow_push = *
push_ssl = false
Two problems here, one of them obvious. I had the [web] header commented out, and I assume that many of those options are not valid in the [paths] section. Also, after re-reading the Hg docs, the push_ssl directive does not belong in hgweb.config but rather in each repository's .hg/hgrc (or the ~/.hgrc of the user that runs Apache). After fixing these, things are working perfectly!
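For reference, here is roughly what the corrected configuration looks like after those changes. hgweb.config keeps only the paths and display options:
[paths]
/ = /var/hg/repos/*

[web]
style = gitweb
allow_archive = bz2 gz zip
maxchanges = 200
and the push settings move into each repository's .hg/hgrc:
[web]
allow_push = *
push_ssl = false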
