I am trying to load a file, and it has worked before but now I only get the error:
OSError: Failed to interpret file 'name.npz' as a pickle
The code I use is the following
data = np.load("name.npz")
I can't see what has changed since I last ran the code successfully. I even reverted to the original code (which I'm sure worked when I loaded the file before), but it still gives the same error message.
You could open it as a raw pickle first and then convert it to a NumPy array as follows:
import pickle as pl
import numpy as np
myfile = "name.npz"
with open(myfile, 'rb') as handle:
    my_array = pl.load(handle)
data = np.array(my_array)
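Before reaching for pickle, it can also help to check what the file actually contains, since np.load only falls back to its pickle reader when the file is not a valid .npy/.npz. A minimal diagnostic sketch (a demo .npz is created here so it runs on its own; with the real file, just read "name.npz" directly):

```python
import numpy as np

# Create a demo archive so the sketch is self-contained.
np.savez("name.npz", arr=np.arange(3))

with open("name.npz", "rb") as fh:
    magic = fh.read(6)

# A genuine .npz is a zip archive and begins with b"PK\x03\x04"; a bare
# .npy begins with b"\x93NUMPY". Anything else means np.load cannot read
# the file as an array file and falls back to pickle, which produces
# exactly this OSError when the pickle read fails too.
print(magic[:2])
```

If the first bytes are neither of these signatures, the file on disk has been corrupted or replaced, which would explain why code that worked before now fails.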
I'm trying to use pygrib to read data from a grib2 file, interpolate it using Python, and write it to another file. I've tried both pygrib and eccodes, and both produce the same problem: the output file size increases by a factor of 3, and when I view the data in applications like Weather and Climate Toolkit (WCT) it lists all the variables but shows "No Data" when plotted. If I use the same script without interpolating the data, and just write it to the new file, it works fine in WCT.

If I run wgrib2 it lists all the grib messages, but wgrib2 -V works only on the unaltered data and produces the error "*** FATAL ERROR: unsupported: code table 5.6=0 ***" for the interpolated data.

Am I doing something wrong in my Python script? Here is an example of what I'm doing to write the file (same result using pygrib 2.05 and 2.1.3). I used a basic HRRR file for the example.
import pygrib
import numpy as np
import sys
def writeNoChange():
    # This produces a usable grib file.
    filename = 'hrrr.t00z.wrfprsf06.grib2'
    outfile = 'test.grib2'
    grbs = pygrib.open(filename)
    with open(outfile, 'wb') as outgrb:
        for grb in grbs:
            msg = grb.tostring()
            outgrb.write(msg)
    # the with-block closes outgrb for us
    grbs.close()
def writeChange():
    # This method produces a grib file that isn't recognized by WCT
    filename = 'hrrr.t00z.wrfprsf06.grib2'
    outfile = 'testChange.grib2'
    grbs = pygrib.open(filename)
    with open(outfile, 'wb') as outgrb:
        for grb in grbs:
            vals = grb.values * 1
            grb['values'] = vals
            msg = grb.tostring()
            outgrb.write(msg)
    grbs.close()
#-------------------------------
if __name__ == "__main__":
    writeNoChange()
    writeChange()
Table 5.6 for GRIB2 (https://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_doc/) is related to "ORDER OF SPATIAL DIFFERENCING".
For some reason, when you modify grb['values'], it sets grb['orderOfSpatialDifferencing'] = 0, which "wgrib2 -V" doesn't like. So, after changing 'values', change 'orderOfSpatialDifferencing' to what it was initially:
orderOfSpatialDifferencing = grb['orderOfSpatialDifferencing']
grb['values'] = [new values]
grb['orderOfSpatialDifferencing'] = orderOfSpatialDifferencing
This worked for me in terms of getting wgrib2 -V to run, but it messed up the data. Possibly some other variables in Section 5 also need to be restored.
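The save/restore pattern above, folded into a small helper. A plain dict stands in for the pygrib message here so the sketch is runnable without a GRIB file; with pygrib, the real `grb` object supports the same `[]` access:

```python
# Save/restore pattern from the answer above. The dict is only a mock for
# the pygrib message object; in the real script, pass each grb from
# pygrib.open(...) before calling grb.tostring().
def rewrite_values(grb, new_vals):
    saved = grb['orderOfSpatialDifferencing']   # remember the original value
    grb['values'] = new_vals                    # with pygrib this may reset the key to 0
    grb['orderOfSpatialDifferencing'] = saved   # restore it before encoding
    return grb

mock_grb = {'orderOfSpatialDifferencing': 2, 'values': [1.0, 2.0]}
rewrite_values(mock_grb, [3.0, 4.0])
print(mock_grb['orderOfSpatialDifferencing'])  # 2
```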
I have a .h5 file that I need to convert to .csv and this is what I have done.
#coding: utf-8
import numpy as np
import sys
import h5py
file = h5py.File('C:/Users/Sakib/Desktop/VNP46A1.A2020086.h01v01.001.2020087082319.h5','r')
a = list(file.keys())
np.savetxt(sys.stdout, file[a[0:]], '%g', ',')
But this generates an error saying 'list' object has no attribute 'encode'
[P.S. Also, I have not worked with the sys module before. Where will my new CSV file be written, and under what name?]
First, you have a small error in the arrangement of the []: there is no need to create a list.
Also, sys.stdout depends on your process "standard output". For an interactive process it will go to the screen. You should create a file and write to it if you want to capture the output. Also, your formatting string (%g) needs to match the data in the HDF5 dataset.
Try this:
h5f = h5py.File('C:/Users/.....h5', 'r')
for a in h5f.keys():
    outf = open('./save_' + a + '.txt', 'w')
    np.savetxt(outf, h5f[a][:], '%g', ',')
    outf.close()
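On the P.S.: np.savetxt writes to whatever handle it is given, so passing sys.stdout only prints to the screen; to get an actual .csv file you open one yourself and name it. A minimal sketch, with a plain array standing in for one HDF5 dataset (i.e. what h5f[a][:] returns):

```python
import numpy as np

arr = np.arange(6.0).reshape(2, 3)      # stand-in for one dataset, h5f[a][:]
with open("save_demo.csv", "w") as outf:
    np.savetxt(outf, arr, "%g", ",")    # '%g' trims trailing zeros

with open("save_demo.csv") as fh:
    print(fh.read())                    # 0,1,2 / 3,4,5 as comma-separated rows
```

The file lands in the current working directory under whatever name you pass to open().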
I'm currently playing around with some neural networks in TensorFlow - I decided to try working with the CIFAR-10 dataset. I downloaded the "CIFAR-10 python" dataset from the website: https://www.cs.toronto.edu/~kriz/cifar.html.
In Python, I also tried directly copying the code that is provided to load the data:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
However, when I run this, I end up with the following error: _pickle.UnpicklingError: invalid load key, '\x1f'. I've also tried opening the file using the gzip module (with gzip.open(file, 'rb') as fo:), but this didn't work either.
Is the dataset simply bad, or is this an issue with my code? If the dataset's bad, where can I obtain a proper copy of CIFAR-10?
Extract your *.gz file and use this code:

from six.moves import cPickle
f = open("path/data_batch_1", 'rb')
datadict = cPickle.load(f, encoding='latin1')
f.close()
X = datadict["data"]
Y = datadict['labels']
Just extract your tar.gz file and you will get a folder containing data_batch_1, data_batch_2, ...

After that, just use the code provided to load the data into your project:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
dict = unpickle('data_batch_1')
It seems that you need to unzip the *.gz file and then untar the *.tar file to get a folder of data batches. Afterwards you can apply pickle.load() to these batches.
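The extraction step can also be done in code with the standard library's tarfile module, which handles the .tar.gz in one pass. A tiny stand-in archive is built first so the sketch is self-contained; with the real download you would open "cifar-10-python.tar.gz" instead:

```python
import os
import tarfile

# Build a tiny demo archive (stand-in for cifar-10-python.tar.gz).
with open("data_batch_1", "wb") as fh:
    fh.write(b"demo")
with tarfile.open("demo.tar.gz", "w:gz") as tar:
    tar.add("data_batch_1")
os.remove("data_batch_1")

# The actual extraction step: "r:gz" decompresses and untars in one go.
with tarfile.open("demo.tar.gz", "r:gz") as tar:
    tar.extractall()

print(os.path.exists("data_batch_1"))  # True
```

After extracting the real archive you get the cifar-10-batches-py folder with data_batch_1 through data_batch_5 and test_batch, which the unpickle() function above can then read.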
I was facing the same problem using Jupyter (VS Code) and Python 3.8/3.7. I tried editing the source cifar10.py, but without success.

The solution for me was to run these two lines of code in a separate, normal .py file:
from tensorflow.keras.datasets import cifar10
cifar10.load_data()
After that, it worked fine in Jupyter.
Try this:
import _pickle as cPickle
import gzip

with gzip.open(path_of_your_cpickle_file, 'rb') as f:
    var = cPickle.load(f)
Try it this way:

import pickle
import gzip

with gzip.open(path, "rb") as f:
    loaded = pickle.load(f, encoding='bytes')

It works for me.
I am quite sure that my ARFF files are correct; to check, I downloaded different files from the web and successfully opened them in Weka.

But when I want to use my data in Python, I typed:
import arff
data = arff.load('file_path','rb')
It always returns the error message: Invalid layout of the ARFF file, at line 1.

Why does this happen, and what should I do to fix it?
If you change your code as shown below, it'll work:
import arff
data = arff.load(open('file_path'))
Using scipy, we can load ARFF data in Python:
from scipy.io import arff
import pandas as pd
data = arff.loadarff('dataset.arff')
df = pd.DataFrame(data[0])
df.head()
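One caveat with the scipy route: loadarff returns nominal (string) attributes as byte strings, so they usually need a decode step before further use. A small sketch using an in-memory ARFF file (the relation and attribute names here are made up for the demo):

```python
import io

import pandas as pd
from scipy.io import arff

# Tiny in-memory ARFF file; loadarff also accepts a filename.
demo = io.StringIO("""@relation demo
@attribute color {red,blue}
@attribute size numeric
@data
red,1.0
blue,2.0
""")

data = arff.loadarff(demo)
df = pd.DataFrame(data[0])
print(df["color"][0])                           # b'red' -- bytes, not str

# Decode the nominal column to ordinary strings.
df["color"] = df["color"].str.decode("utf-8")
print(df["color"][0])                           # red
```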
The algorithm I'm using is quite heavy and has three parts, so I used pickle to dump everything between the various stages in order to test each stage separately.

Although the first dump always works fine, the second one behaves as if it were size-dependent: it works for a smaller dataset but not for a somewhat larger one. (The same also happens with a heatmap I try to create, but that's a different question.) The dumped file is about 10 MB, so it's nothing really large.

The dump that creates the problem contains a whole class, which in turn contains methods, dictionaries, lists and variables.

I actually tried dumping both from inside and outside the class, but both failed.
The code I'm using looks like this:
data = pickle.load(open("./data/tmp/data.pck", 'rb')) #Reads from the previous stage dump and works fine.
dataEvol = data.evol_detect(prevTimeslots, xLablNum) #Export the class to dataEvol
dataEvolPck = open("./data/tmp/dataEvol.pck", "wb") #open works fine
pickle.dump(dataEvol, dataEvolPck, protocol = 2) #dump works fine
dataEvolPck.close()
and even tried this:
dataPck = open("./data/tmp/dataFull.pck", "wb")
pickle.dump(self, dataPck, protocol=2) #self here is the dataEvol in the previous part of code
dataPck.close()
The problem appears when I try to load the class using this part:
dataEvol = pickle.load(open("./data/tmp/dataEvol.pck", 'rb'))
The error at hand is:
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
dataEvol = pickle.load(open("./data/tmp/dataEvol.pck", 'rb'))
ValueError: itemsize cannot be zero
Any ideas?
I'm using Python 3.3 on a 64-bit Win-7 computer. Please forgive me if I'm missing anything essential as this is my first question.
Answer:
The problem was an empty numpy string in one of the dictionaries. Thanks Janne!!!
It is a NumPy bug that has been fixed recently in this pull request. To reproduce it, try:
import cPickle
import numpy as np
cPickle.loads(cPickle.dumps(np.string_('')))
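On a NumPy release that contains the fix, the same round-trip succeeds. A quick self-check, written for Python 3 (pickle instead of the Python 2 cPickle above, and np.bytes_ as the current spelling of the old np.string_ scalar, which was removed in NumPy 2.0):

```python
import pickle

import numpy as np

# Round-trip an empty NumPy bytes scalar through pickle; on a fixed
# NumPy this no longer raises "ValueError: itemsize cannot be zero".
s = pickle.loads(pickle.dumps(np.bytes_(b"")))
print(s == b"")  # True
```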