Unable to load CIFAR-10 dataset: Invalid load key '\x1f' - python

I'm currently playing around with some neural networks in TensorFlow - I decided to try working with the CIFAR-10 dataset. I downloaded the "CIFAR-10 python" dataset from the website: https://www.cs.toronto.edu/~kriz/cifar.html.
In Python, I also tried directly copying the code that is provided to load the data:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
However, when I run this, I end up with the following error: _pickle.UnpicklingError: invalid load key, '\x1f'. I've also tried opening the file using the gzip module (with gzip.open(file, 'rb') as fo:), but this didn't work either.
Is the dataset simply bad, or is this an issue with my code? If the dataset is bad, where can I obtain a proper copy of CIFAR-10?

Extract your *.tar.gz file and use this code:
from six.moves import cPickle

f = open("path/data_batch_1", 'rb')
datadict = cPickle.load(f, encoding='latin1')
f.close()

X = datadict["data"]
Y = datadict['labels']
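If you then want those rows as actual images, a reshape along these lines should work (this assumes the channel-planar layout documented on the CIFAR-10 page: each 3072-byte row stores the 1024-byte red, green, and blue planes of a 32x32 picture in sequence):
import numpy as np

# Reshape (N, 3072) rows into (N, 32, 32, 3) images: split each row into
# three 32x32 channel planes, then move the channel axis to the end.
X = np.asarray(X, dtype=np.uint8)
images = X.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
labels = np.asarray(Y)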

Just extract your tar.gz file and you will get a folder containing data_batch_1, data_batch_2, and so on.
After that, just use the code provided to load the data into your project:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict

dict = unpickle('data_batch_1')

The load key '\x1f' is the first byte of the gzip magic number, which means the file is still compressed: you need to gunzip the *.gz file and then untar the *.tar file to get the folder of data batches. Afterwards you can apply pickle.load() to those batches.
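If you'd rather do the extraction from Python instead of by hand, a sketch like this should work (cifar-10-python.tar.gz is the default download name, so adjust the path to wherever your copy lives; the batches unpack into a cifar-10-batches-py/ folder):
import tarfile

# 'r:gz' handles the gzip and tar layers in one step.
with tarfile.open('cifar-10-python.tar.gz', 'r:gz') as tar:
    tar.extractall()

batch = unpickle('cifar-10-batches-py/data_batch_1')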

I was facing the same problem using Jupyter (in VS Code) and Python 3.8/3.7. I tried editing the source (cifar.py / cifar10.py) but without success.
The solution for me was to run these two lines of code in a separate, normal .py file:
from tensorflow.keras.datasets import cifar10
cifar10.load_data()
After that, it worked fine in Jupyter.
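For reference, load_data returns the train and test splits directly, so once the download succeeds there is nothing left to unpickle yourself:
from tensorflow.keras.datasets import cifar10

# Downloads and caches the dataset, then returns ready-made NumPy arrays.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)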

Try this:
import pickle
import _pickle as cPickle
import gzip

with gzip.open(path_of_your_cpickle_file, 'rb') as f:
    var = cPickle.load(f)

Try it this way:
import pickle
import gzip

with gzip.open(path, "rb") as f:
    loaded = pickle.load(f, encoding='bytes')
It works for me.

Related

Face_Recognition issue loading encoding file

I want to create a .csv file to speed up loading the encoding file of my face recognition program, which uses face_recognition in Python.
When my algorithm detects a new face, it generates an encoding with face_recognition and then:
with open('data.csv', 'a') as file:
    writer = csv.writer(file)
    writer.writerow([ID, new_face_reco])
I do that to send the encoding to the .csv file. (ID is a random name I give to the face and new_face_reco is the encoding of the new face.)
But I want to reopen it when I relaunch the program, so I have this at the beginning:
known_face_encodings_temp = []
known_face_names_temp = []

with open('data.csv', 'rb') as file:
    data = [row for row in csv.reader(file, delimiter=',')]
    known_face_names_temp.append(np.array(data[0][0]))
    essai = np.array(data[0][1].replace('\n', ''))
    known_face_encodings_temp.append(essai.tolist())

known_face_encodings = known_face_encodings_temp
known_face_name = known_face_names_temp
I have a lot of issues (which is why there are so many lines in this part) because my encoding changes between writing it to the .csv and reloading it. Here is what I get:
Initial data:
array([-8.31770748e-02, ... , -3.41368467e-03])
When I try to reload my .csv (without changing anything):
'[-1.40143648e-01 ... -8.10057670e-02\n 3.77673171e-02 1.40102580e-02 8.14460665e-02
7.52283633e-02]'
What I get when I try to change things:
'[-1.40143648e-01 ... 7.52283633e-02]'
I need the loaded data to be the same as the initial data. What can I do?
Instead of using CSV files, try using numpy (.npy) files; they're much easier to save and load. I have used them myself in one of my projects that utilizes the face_recognition module and would be happy to help you out.
To save an encoding, you can:
np.save('path/to/encoding.npy', encoding)
To load an encoding, you can:
encodingVariable = np.load('path/to/encoding.npy')
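A minimal round trip with the question's variables might look like this (new_face_reco stands for the 128-float encoding array from face_recognition, and the file name is just an example):
import numpy as np

# .npy preserves dtype and shape exactly, so the loaded array matches
# the original bit for bit; no string parsing as with CSV.
np.save('some_id.npy', new_face_reco)
loaded = np.load('some_id.npy')
assert np.array_equal(loaded, new_face_reco)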

Is there an equivalent of R's save.image() function in Python?

When working in R, one has the ability to save the entire "workspace", variables, output, etc... to an image file using "save.image()". Is there something equivalent in Python?
Thank you.
I am not familiar with R, but pickle offers functionality to save and load variables, objects, types, etc. in a pickle file. In this way you can save any details needed for a later session. I'm unsure whether pickle offers a specific way to save all data associated with the current session, or whether you would have to locate and save everything manually. Hope this helps!
import pickle

my_obj = object()  # stand-in for any picklable object
my_var = (1, "some_data")
filename = "my_dir/my_file.pickle"

with open(filename, 'wb') as f:  # save data
    pickle.dump((my_obj, my_var), f)

with open(filename, 'rb') as f:  # load data next time
    my_saved_obj, my_saved_var = pickle.load(f)
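If you want something a bit closer to save.image(), one manual approach is to pickle a chosen set of variables as a dict keyed by name; this is only a sketch and will fail for anything unpicklable (open files, sockets, etc.):
import pickle

# Save selected globals by name...
to_save = {name: globals()[name] for name in ('my_obj', 'my_var')}
with open('session.pickle', 'wb') as f:
    pickle.dump(to_save, f)

# ...and restore them in a later session.
with open('session.pickle', 'rb') as f:
    globals().update(pickle.load(f))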

OSError: Failed to interpret my file as a pickle (numpy)

I am trying to load a file, and it has worked before but now I only get the error:
OSError: Failed to interpret file 'name.npz' as a pickle
The code I use is the following
data = np.load("name.npz")
I can't see what has changed since I last ran the code and it worked. I even reverted to the original code (which I'm sure worked for loading it), but it still gives the same error message.
You could open it as a raw pickle first and then convert it to a numpy array as follows:
import pickle as pl
import numpy as np

myfile = "name.npz"
with open(myfile, 'rb') as handle:
    my_array = pl.load(handle)
data = np.array(my_array)
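Since a .npz file is just a zip archive of .npy files, another quick sanity check is to ask whether the file is still a valid zip; if not, it was likely corrupted or truncated, which would explain why numpy falls back to (and fails at) interpreting it as a pickle:
import zipfile
import numpy as np

if zipfile.is_zipfile('name.npz'):
    with np.load('name.npz') as data:
        print(data.files)  # names of the arrays stored inside
else:
    print('name.npz is not a valid zip archive; re-save or re-download it')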

Python: Saving / loading large array using numpy

I have saved a large array of complex numbers using python,
numpy.save(file_name, eval(variable_name))
that worked without any trouble. However, loading,
variable_name=numpy.load(file_name)
yields the following error,
ValueError: total size of new array must be unchanged
Using: Python 2.7.9 64-bit and the file is 1.19 GB large.
There is no problem with the size of your array; you likely didn't open your file in the right way. Try this:
import numpy as np

with open(file_name, "rb") as file_:
    variable_name = np.load(file_)
Alternatively you can use pickle:
import pickle

# Saving (binary mode matters for pickled data):
data_file = open('filename.bi', 'wb')
pickle.dump(your_data, data_file)
data_file.close()

# Loading:
data_file = open('filename.bi', 'rb')
data = pickle.load(data_file)
data_file.close()
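One caveat at this size: on Python 2 pickle defaults to the slow, text-based protocol 0, so for a 1.19 GB array it is worth requesting the highest binary protocol explicitly:
import pickle

with open('filename.bi', 'wb') as data_file:
    # Binary protocols are much faster and more compact for large data.
    pickle.dump(your_data, data_file, protocol=pickle.HIGHEST_PROTOCOL)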

How to write in ARFF file using LIAC-ARFF package in Python?

I want to load an ARFF file in Python, then change some of its values and save the changes back to a file. I'm using the LIAC-ARFF package (https://pypi.python.org/pypi/liac-arff). I loaded the ARFF file with the following lines of code:
import arff
data = arff.load(open(FILE_NAME, 'rb'))
After manipulating some values inside data, I want to write data to another ARFF file. Any solution?
Use the following code:
import arff

# liac-arff reads and writes plain text, so open the files in text mode.
data = arff.load(open(FILE_NAME, 'r'))
f = open(outputfilename, 'w')
arff.dump(data, f)
f.close()
The LIAC-ARFF description says the dump method "serializes" to a file, but that wording is misleading: dump simply writes the object out as a text file. "Serialize" usually suggests saving the whole object in binary form, whereas the output here is plain text, not a binary file.
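For the "manipulating some values" part of the question: load returns a plain dict, so you can edit the 'data' rows in place before dumping. A small sketch, assuming Python 3 and text-mode files (the edited cell value is just an example):
import arff

with open(FILE_NAME, 'r') as f:
    data = arff.load(f)

# data has 'relation', 'attributes', 'description' and 'data' keys;
# 'data' is a list of rows that can be edited directly.
data['data'][0][0] = 42.0

with open(outputfilename, 'w') as f:
    arff.dump(data, f)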
We can also load ARFF data into Python using scipy:
from scipy.io import arff
import pandas as pd
data = arff.loadarff('dataset.arff')
df = pd.DataFrame(data[0])
df.head()
