Python read pickle protocol 4 error: STACK_GLOBAL requires str - python

In Python 3.7.5 on Ubuntu 18.04, reading a pickle file written with protocol 4 raises an UnpicklingError.
Sample code:
import pickle as pkl
file = open("sample.pkl", "rb")
data = pkl.load(file)
Error:
UnpicklingError                           Traceback (most recent call last)
----> 1 data = pickle.load(file)
UnpicklingError: STACK_GLOBAL requires str
Reading from the same file object solves the problem. Reading the file with pandas gives the same error.

I also had this error; it turned out I was opening a NumPy file with pickle. ;)
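If the file is actually a NumPy .npy file, np.load reads it directly; a minimal sketch, with "sample.npy" as a hypothetical file name:

```python
import numpy as np

# A .npy file written by np.save should be read back with np.load,
# not pickle; "sample.npy" is a hypothetical file name.
arr = np.arange(6).reshape(2, 3)
np.save("sample.npy", arr)
loaded = np.load("sample.npy")
```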

Turns out it is a known issue. There is an issue page on GitHub.

I had this problem and fixed it by adding pckl to the end of the file name.

My problem was that I was trying to pickle and un-pickle across different Python environments - watch out and make sure your pickle protocol versions match!
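One way to avoid cross-environment surprises is to pin an explicit, older protocol when dumping; a minimal sketch, with a hypothetical file name:

```python
import pickle

# Protocol 2 is readable by every Python 2.3+ and 3.x interpreter,
# so a file dumped this way can be loaded in older environments too.
obj = {"weights": [1, 2, 3]}
with open("model_p2.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(obj, f, protocol=2)
with open("model_p2.pkl", "rb") as f:
    restored = pickle.load(f)
```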

Perhaps this will be the solution to this error for someone. I needed to load a NumPy array and used:
torch.load(file)
When I loaded the array, this error appeared. All that is needed is to load the array with NumPy and turn it into a tensor.
For example:
result = torch.from_numpy(np.load(file))


Read Data into Google Colab Environment

I am trying to run a sentiment analysis code in google colab to increase the processing speed compared to running the code on my device. But I am running into a strange error, which I am not able to solve.
I mounted the drive using the following code:
from google.colab import drive
drive.mount('/content/drive')
Then I want to load a Pickle-File, which I have saved in MyDrive using the following code:
with open('/content/drive/MyDrive/data_colab_new.pkl', 'rb') as file:
    data = pickle.load(file)
But I get the following error message:
EOFError                                  Traceback (most recent call last)
      2 with open('/content/drive/MyDrive/data_colab_new.pkl', 'rb') as file:
      3     # Load the pickle file
----> 4     data = pickle.load(file)

EOFError: Ran out of input
I already googled, and the only explanation I found was that the pickle file is empty. But I have checked multiple times now and I am sure that the file is not empty.
What could be another reason for that error, and do you know any way to fix it? I'm not able to figure it out myself.
Use this in Google Colab to upload the pickle file first:
from google.colab import files
files.upload()
then load it with pickle:
import pickle

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
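Since "EOFError: Ran out of input" usually means the file is empty or truncated, it can also help to check the size before loading; a small helper sketch (the function name is my own):

```python
import os
import pickle

def load_pickle_checked(path):
    """Load a pickle file, failing early with a clear message if it is empty."""
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; re-export it before loading")
    with open(path, "rb") as f:
        return pickle.load(f)
```

Called with the Drive path from the question, this turns the cryptic EOFError into an explicit message whenever the mount or upload produced a 0-byte file.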

ValueError when reading a sas file with pandas

pandas.read_sas() prints traceback messages that I cannot suppress. The problem is that it prints a message for each row it reads, so when I try to read the whole file it effectively freezes from printing so much.
I tried suggestions from other Stack Overflow answers:
import warnings
warnings.simplefilter(action='ignore')
And
warnings.filterwarnings('ignore')
And
from IPython.display import HTML
HTML('''<script>
code_show_err=false;
function code_toggle_err() {
if (code_show_err){
$('div.output_stderr').hide();
} else {
$('div.output_stderr').show();
}
code_show_err = !code_show_err
}
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click here.''')
But nothing works.
The message it prints is:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas\io\sas\sas.pyx in pandas.io.sas._sas.rle_decompress()
ValueError: Unexpected non-zero end_of_first_byte
Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data'
Traceback (most recent call last):
  File "pandas\io\sas\sas.pyx", line 29, in pandas.io.sas._sas.rle_decompress
ValueError: Unexpected non-zero end_of_first_byte
As highlighted in the traceback, the error is caused by a bug in the pandas implementation of RLE decompression, which is used when the SAS dataset is exported using CHAR (RLE) compression.
Note the pandas issue created for this topic: https://github.com/pandas-dev/pandas/issues/31243
The resolution that pandas implemented for this bug in read_sas is contained in the following Pull Request, which is part of the version 1.5 milestone, yet to be released at the time of answering: https://github.com/pandas-dev/pandas/pull/47113
To answer your question, you have two options:
Wait until pandas releases version 1.5, update to that version, and read_sas should then work as expected. You've already been waiting a while since you asked, so I suspect this will be fine.
Use the python sas7bdat library instead (https://pypi.org/project/sas7bdat/), and then convert to a pandas DataFrame:
from sas7bdat import SAS7BDAT
df = SAS7BDAT("./path/to/file.sas7bdat").to_data_frame()
The sas7bdat approach worked for me, after facing the exact same error as you did.

'Tables' not recognizing 'isHDF5File'

I am writing code that creates an HDF5 file that can later be used for data analysis. I load the following packages:
import numpy as np
import tables
Then I use the tables module to determine if my file is an HDF5 file with:
tables.isHDF5File(FILENAME)
This normally returns either True or False depending on whether the file is actually an HDF5 file. However, I get the error:
AttributeError: module 'tables' has no attribute 'isHDF5File'
So I tried:
from tables import isHDF5File
and got the error:
ImportError: cannot import name 'isHDF5File'
I've tried this code on another computer, and it ran fine. I've tried updating both numpy and tables with pip, but it says they are already up to date. Is there a reason tables isn't recognizing isHDF5File for me? I am running this code on a Mac (not working), but it worked on a PC (if this matters).
Do you have the function name right? Current PyTables spells it is_hdf5_file:
In [21]: import tables
In [22]: tables.is_hdf5_file?
Docstring:
is_hdf5_file(filename)
Determine whether a file is in the HDF5 format.
When successful, it returns a true value if the file is an HDF5
file, false otherwise. If there were problems identifying the file,
an HDF5ExtError is raised.
Type: builtin_function_or_method
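A minimal sketch of the snake_case call, creating a throwaway HDF5 file just to demonstrate ("demo.h5" is a hypothetical file name):

```python
import tables

# Write a small HDF5 file, then check it with the snake_case function.
with tables.open_file("demo.h5", mode="w") as h5:
    h5.create_array("/", "x", [1, 2, 3])
is_h5 = tables.is_hdf5_file("demo.h5")
```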

TypeError: read_excel() takes exactly 2 arguments (1 given)

I get this problem when I try to read a file:
import numpy as np
import pandas as pd
pos = pd.read_excel('pos.xls', header=None)
and the error is like this:
Traceback (most recent call last):
File "one-hot.py", line 4, in <module>
pos = pd.read_excel('pos.xls', header=None)
TypeError: read_excel() takes exactly 2 arguments (1 given)
But to my surprise, when I run the code on my own PC in PyCharm, there is no error. I get the problem only when I use my school's Ubuntu machine (not using PyCharm).
My own Python is 2.7.12, and the Python on the school's Ubuntu is 2.7.6.
My best guess (I can't try it on Python 2.7.6 since I don't have it) is that you are using pandas version 0.13 or below. According to the docs, you must also provide sheetname, which, in later versions, has a default value of 0.
pandas.io.excel.read_excel(io, sheetname, **kwds)
This sounds like an issue with a different version of the pandas library installed. Looking back at the older documentation pages for pandas library, it seems that pandas did in fact require 2 parameters back in version 0.13.0 (and potentially other old versions, but I did not check any others). For version 0.13.0, the docs define the function as:
pandas.read_excel(io, sheetname, **kwds)
You can read those details here: http://pandas.pydata.org/pandas-docs/version/0.13.0/generated/pandas.read_excel.html?highlight=read_excel#pandas.read_excel
Chances are, it is just an issue with a different library version.
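You can confirm which API the installed pandas has by inspecting the signature; on modern pandas the sheet parameter is sheet_name with a default of 0, while on 0.13-era pandas it was a required sheetname:

```python
import inspect
import pandas as pd

sig = inspect.signature(pd.read_excel)
# On modern pandas the parameter is "sheet_name" and has a default,
# so a one-argument call works; on 0.13 it was "sheetname", required.
param = sig.parameters.get("sheet_name") or sig.parameters.get("sheetname")
has_default = param.default is not inspect.Parameter.empty
```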
I actually had a similar problem which was solved by adding '.xlsx' to the end of my proposed file name:
practicetoexcel.to_excel('Thisxldoc.xlsx', sheet_name = 'Practice')

Unable to load a previously dumped pickle file in Python

The algorithm I use is quite heavy and has three parts, so I used pickle to dump everything between the various stages in order to test each stage separately.
Although the first dump always works fine, the second one behaves as if it were size dependent: it works for a smaller dataset but not for a somewhat larger one. (The same actually also happens with a heatmap I try to create, but that's a different question.) The dumped file is about 10 MB, so it's nothing really large.
The dump which creates the problem contains a whole class which in turn contains methods, dictionaries, lists and variables.
I actually tried dumping both from inside and outside the class but both failed.
The code I'm using looks like this:
data = pickle.load(open("./data/tmp/data.pck", 'rb')) #Reads from the previous stage dump and works fine.
dataEvol = data.evol_detect(prevTimeslots, xLablNum) #Export the class to dataEvol
dataEvolPck = open("./data/tmp/dataEvol.pck", "wb") #open works fine
pickle.dump(dataEvol, dataEvolPck, protocol = 2) #dump works fine
dataEvolPck.close()
and even tried this:
dataPck = open("./data/tmp/dataFull.pck", "wb")
pickle.dump(self, dataPck, protocol=2) #self here is the dataEvol in the previous part of code
dataPck.close()
The problem appears when I try to load the class using this part:
dataEvol = pickle.load(open("./data/tmp/dataEvol.pck", 'rb'))
The error is:
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
dataEvol = pickle.load(open("./data/tmp/dataEvol.pck", 'rb'))
ValueError: itemsize cannot be zero
Any ideas?
I'm using Python 3.3 on a 64-bit Win-7 computer. Please forgive me if I'm missing anything essential as this is my first question.
Answer:
The problem was an empty numpy string in one of the dictionaries. Thanks Janne!!!
It is a NumPy bug that was fixed recently in this pull request. To reproduce it (this snippet uses Python 2's cPickle), try:
import cPickle
import numpy as np
cPickle.loads(cPickle.dumps(np.string_('')))
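With a fixed NumPy, the round trip succeeds; a Python 3 check of the same case, using pickle and np.bytes_ (the Python 3 spelling of an empty NumPy byte string):

```python
import pickle
import numpy as np

# This is the case that used to raise "ValueError: itemsize cannot
# be zero" on unpickling before the NumPy fix.
s = np.bytes_(b"")
restored = pickle.loads(pickle.dumps(s))
```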
