Scikit-learn joblib/zlib error - python

I ran into the following error when trying to load a .pkl file using joblib.
self.name = joblib.load(
    os.path.join(BASE_DIR, '../relative/directory/%s.pkl' % name)
)
The above code yields the following error:
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/numpy_pickle.py", line 68, in read_zfile
    data = zlib.decompress(file_handle.read(), 15, length)
zlib.error: Error -3 while decompressing data: invalid distance too far back
Has anyone run into this issue before?

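It means that the compressed data in the .pkl file was corrupted somewhere along the way, so zlib can no longer decompress it. The file has to be copied again from an intact source or regenerated.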
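To illustrate what this class of error looks like, here is a minimal sketch using only the standard zlib module. It does not read joblib's own file layout; the helper name can_decompress and the sample payload are made up for the example. It only shows that a damaged zlib stream raises zlib.error while an intact one does not.

```python
import zlib


def can_decompress(compressed: bytes) -> bool:
    """Return True if zlib can fully decompress the given bytes."""
    try:
        zlib.decompress(compressed)
    except zlib.error:
        # Raised for truncated streams, bad headers, checksum mismatches, etc.
        return False
    return True


original = b"example payload " * 100
good = zlib.compress(original)
truncated = good[: len(good) // 2]  # simulate an incomplete copy of the data

print(can_decompress(good))       # intact stream
print(can_decompress(truncated))  # damaged stream
```

If the file was transferred between machines, comparing its size or checksum against the original is usually a quicker check than trying to repair it.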

Related

Why doesn't the "bson" library work in Jupyter Notebook?

I'm trying to read a BSON file. This is my code:
import bson
with open("D:/rl env/chat_log.bson", 'rb') as f:
    datas = bson.decode_all(f.read())
Note that "D:/rl env/chat_log.bson" is my file path.
I got the error below:
AttributeError: module 'bson' has no attribute 'decode_all'
I should mention that I didn't get any error when I ran this code in Google Colab.
Have you tried the loads method? BSON is a binary format, so the file should be opened in 'rb' mode:
with open("D:/rl env/chat_log.bson", 'rb') as f:
    datas = bson.loads(f.read())
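Since the same code runs in Google Colab, it may be worth checking which bson package the notebook is actually importing. PyMongo ships a bson module that provides decode_all, while a separate standalone package with the same name exposes loads and dumps instead. The sketch below is only a diagnostic idea; the helper name inspect_module is hypothetical, and it is demonstrated on the standard json module so that it runs anywhere.

```python
import importlib


def inspect_module(module_name, attribute):
    """Return (file path, has_attribute) for an importable module."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None)  # where the module was loaded from
    return path, hasattr(module, attribute)


# Demonstrated on the standard library; in the affected notebook, call
# inspect_module("bson", "decode_all") instead.
path, has_loads = inspect_module("json", "loads")
print(path, has_loads)
```

If the reported path points at a standalone bson installation rather than the one bundled with PyMongo, that would explain the missing attribute.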

KeyError getting metadata from video file

I'm trying to use ffmpeg to get the resolution height and the audio bitrate from a video file, but I'm getting the following error that doesn't tell me much:
  File "/home/user/code/python/reduce_video_size/main.py", line 94, in get_metadata
    return video_streams[0]
KeyError: 0
----------------------------------------------------------------------
Ran 1 test in 0.339s
FAILED (errors=1)
So I don't know what I can do to fix it.
print(get_metadata("/home/user/code/python/reduce_video_size/test.mp4"))
def get_metadata(path):
    video_streams = ffmpeg.probe(path, select_streams="v")
    if video_streams:
        return video_streams[0]
If more context is needed, here is the code.
This solved it, but it would still be nice to have some error checking:
def get_metadata(path):
    video_stream = ffmpeg.probe(path, select_streams="v")
    return video_stream['streams'][0]
According to the source code, ffmpeg.probe returns a dictionary loaded from JSON. A dictionary has no integer indices, which is why video_streams[0] raises KeyError: 0. Omit the [0] to get the whole dictionary; the list of streams is stored under its 'streams' key.
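For the error checking mentioned in the question, one possible sketch is a small helper that works on the dictionary returned by ffmpeg.probe. The helper name first_video_stream and the sample dictionary are hypothetical; the sample is hand-written to resemble ffprobe's JSON output, so the code runs without ffmpeg installed.

```python
def first_video_stream(probe_result):
    """Return the first video stream from an ffmpeg.probe-style dictionary.

    Raises ValueError if the result contains no video stream.
    """
    streams = probe_result.get("streams", [])
    video_streams = [s for s in streams if s.get("codec_type") == "video"]
    if not video_streams:
        raise ValueError("no video stream found")
    return video_streams[0]


# Hand-written sample shaped like ffprobe's JSON output (values are illustrative).
sample = {
    "streams": [
        {"index": 0, "codec_type": "video", "height": 720},
        {"index": 1, "codec_type": "audio", "bit_rate": "128000"},
    ]
}
print(first_video_stream(sample)["height"])
```

With ffmpeg-python this would be called as first_video_stream(ffmpeg.probe(path)). Note that probing with select_streams="v" leaves out the audio stream, so the audio bitrate would need a probe without that filter.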

How can a GRIB file be opened with pygrib without first downloading the file?

The documentation for pygrib shows a function called fromstring, which creates a gribmessage instance from a Python bytes object representing a binary GRIB message. I might be misunderstanding the purpose of this function, but it leads me to believe I can use it instead of downloading a GRIB file and calling the open function on it. Unfortunately, my attempts to open a multi-message GRIB file from NLDAS2 have failed. Does anyone know how to use pygrib on GRIB data without first saving the file? My code below shows how I would like it to work. Instead, it raises TypeError: expected bytes, int found on the line "for grib in gribs:".
from urllib import request
import pygrib
url = "<remote address of desired file>"
username = "<username>"
password = "<password>"
redirectHandler = request.HTTPRedirectHandler()
cookieProcessor = request.HTTPCookieProcessor()
passwordManager = request.HTTPPasswordMgrWithDefaultRealm()
passwordManager.add_password(None, "https://urls.earthdata.nasa.gov", username, password)
authHandler = request.HTTPBasicAuthHandler(passwordManager)
opener = request.build_opener(redirectHandler, cookieProcessor, authHandler)
request.install_opener(opener)
with request.urlopen(url) as response:
    data = response.read()
gribs = pygrib.fromstring(data)
for grib in gribs:
    print(grib)
Edit to add the entire error output:
Traceback (most recent call last):
  File ".\example.py", line 19, in <module>
    for grb in grbs:
  File "pygrib.pyx", line 1194, in pygrib.gribmessage.__getitem__
TypeError: expected bytes, int found
Edit: This interface does not support multi-message GRIB files, but the authors are open to a pull request if anyone wants to write up the code. Unfortunately, my research focus has shifted and I don't have time to contribute myself.
As stated by jasonharper you can use pygrib.fromstring(). I just tried it myself and this works.
Here is the link to the documentation.
Starting with pygrib v2.1.4, the changelog says that pygrib.open() accepts an io.BufferedReader object as an input argument.
See the pygrib changelog here.
In theory, that would allow you to read a GRIB2 file from memory without writing it to disk.
I think the usage is supposed to be the following:
import io
import pygrib
# bytes_data holds the raw contents of the GRIB file, e.g. from response.read()
binary_io = io.BytesIO(bytes_data)
buffer_io = io.BufferedReader(binary_io)
grib_file = pygrib.open(buffer_io)
However, I was not able to make it work on my side.

Parsing Google Fonts METADATA.pb - google.protobuf.message.DecodeError: Error parsing message

I'm trying to parse a METADATA.pb file from the official Google Fonts repo which can be found here: https://github.com/google/fonts (Example METADATA.pb file for the Roboto font: https://github.com/google/fonts/blob/master/apache/roboto/METADATA.pb)
To parse protobuf files, the matching message definition is required. It can be downloaded as "fonts_public.proto" here: https://github.com/googlefonts/gftools/blob/master/Lib/gftools/fonts_public.proto
I used it to generate a Python code file called "fonts_public_pb2.py" with this command:
protoc -I=. --python_out=. fonts_public.proto
And here is my code which imports this generated file, reads the content of a METADATA.pb file (shouldn't matter which one, they all follow the same structure) and then tries to parse the proto-buf string.
#! /usr/bin/env python
import fonts_public_pb2
protobuf_file_path = 'METADATA.pb'
protobuf_file = open(protobuf_file_path, 'rb')
protobuf = protobuf_file.read()
font_family = fonts_public_pb2.FamilyProto()
font_family.ParseFromString(protobuf)
Just a few lines, nothing too complicated, but the output is always the same:
Traceback (most recent call last):
  File "parse.py", line 22, in <module>
    font_family.ParseFromString(protobuf)
google.protobuf.message.DecodeError: Error parsing message
I don't usually code in Python, so the issue here might very well be me, but after trying a few different things I don't know what to do anymore:
Used the already generated "fonts_public_pb2.py" file from the gftools repo: https://github.com/googlefonts/gftools/blob/master/Lib/gftools/fonts_public_pb2.py - My generated output from the "fonts_public.proto" file and this file are nearly identical (I checked with Meld). The error was still the same
Set all "required" fields in the .proto file to "optional" and generated the "fonts_public_pb2.py" file again - Same error
Tried Python 2 and 3 - Same error
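Those METADATA.pb files are not binary protobuf files; they use the protobuf text format, so they have to be parsed with the text_format module instead of ParseFromString.
import fonts_public_pb2
from google.protobuf import text_format
protobuf_file_path = 'METADATA.pb'
# Open in text mode: the file holds human-readable text, not the binary wire format.
with open(protobuf_file_path, 'r') as protobuf_file:
    protobuf = protobuf_file.read()
font_family = fonts_public_pb2.FamilyProto()
text_format.Merge(protobuf, font_family)
print(font_family)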
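If it is unclear which encoding a given .pb file uses, a rough check is to see whether its bytes are printable text. The sketch below is a heuristic, not part of the protobuf API; the function name looks_like_text_format and both sample inputs are made up for illustration, and an empty file would also pass the check.

```python
def looks_like_text_format(data: bytes) -> bool:
    """Heuristically guess whether protobuf data is in text format.

    Text-format files decode as UTF-8 and contain only printable characters
    and whitespace; binary wire-format data usually contains control bytes.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(ch.isprintable() or ch in "\r\n\t" for ch in text)


# Illustrative inputs (not taken from a real METADATA.pb file).
text_sample = b'name: "Roboto"\ndesigner: "Example"\n'
binary_sample = b"\x0a\x06Roboto\x12\x07Example"

print(looks_like_text_format(text_sample))
print(looks_like_text_format(binary_sample))
```

Opening the file in a text editor gives the same answer: a readable list of field names and values means text format, which ParseFromString cannot read.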

Conflict between pandas and erlport?

I am passing data between Erlang and Python using ErlPort, following the example here:
http://erlport.org/docs/python.html
The Python file I'm calling contains only the line:
import pandas as pd
I am getting the error:
** exception error: {python,'exceptions.AttributeError',
"'function' object has no attribute 'lower'",
[{<<"/anaconda/lib/python2.7/site-packages/pandas/core/format.py">>,
1701,<<"detect_console_encoding">>,
<<"if not encoding or 'ascii' in encoding.lower(): # try again for something bette"...>>},
{<<"/anaconda/lib/python2.7/site-packages/pandas/core/config_init.py">>,
234,<<"<module>">>,
<<"cf.register_option('encoding', detect_console_encoding(), pc_encoding_doc,">>},
{<<"/anaconda/lib/python2.7/site-packages/pandas/__init__.py">>,
25,<<"<module>">>,<<"import pandas.core.config_init">>},
{<<"/Documents/data-algorithms/Alg"...>>,
3,<<"<module>">>,<<"import pandas as pd">>},
{<<"/Documents/testki"...>>,
237,<<"_incoming_call">>,
<<"f = __import__(module, {}, {}, [objects[0]])">>},
{<<"/Documents/te"...>>,
245,<<"_call_with_error_handler">>,<<"function(*args)">>}]}
in function erlport:call/3 (src/erlport.erl, line 234)
in call from algo_tester:start/0 (src/algo_tester.erl, line 27)
I can get rid of the error by commenting out the following two lines in /anaconda/lib/python2.7/site-packages/pandas/core/config_init.py:
234 cf.register_option('encoding', detect_console_encoding(), pc_encoding_doc,
235 validator=is_text)
but then print doesn't work any longer.
Has anyone encountered this before?
It was answered on GitHub:
ErlPort's issue: https://github.com/hdima/erlport/issues/11
Pandas' issue: https://github.com/pydata/pandas/issues/5687
