UnpicklingError: could not find MARK in utils.file - python

I am facing this error:
  File "C:\Python27\lib\site-packages\gensim\utils.py", line 1334, in unpickle
    return _pickle.load(f, encoding='latin1')
UnpicklingError: could not find MARK
while my utils.py code is:
with smart_open(fname, 'rb') as f:
    f.seek(0)
    # Because of loading from S3 load can't be used (missing readline in smart_open)
    if sys.version_info > (3, 0):
        return _pickle.load(f, encoding='latin1')
    else:
        return _pickle.loads(f.read())
def pickle(obj, fname, protocol=2):
    """Pickle object `obj` to file `fname`.
    Parameters
    ----------
    obj : object
        Any python object.
    fname : str
        Path to pickle file.
    protocol : int, optional
        Pickle protocol number, default is 2 to support compatible across python 2.x and 3.x.
    """
    with smart_open(fname, 'wb') as fout:  # 'b' for binary, needed on Windows
        _pickle.dump(obj, fout, protocol=protocol)
Can anyone please help? I have been stuck on this for a few days.

You're probably trying to load a model that was trained and saved in Python 3, but you're using Python 2. See
https://github.com/RaRe-Technologies/gensim/issues/853
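If the file really was written by Python 3 with a pickle protocol that Python 2 cannot read (Python 2 only understands protocols 0 to 2), one way to check and work around it is sketched below. This uses only the standard pickle module; the helper names detect_protocol and resave_with_protocol_2 and the file names are hypothetical, and whether a re-saved gensim model then loads cleanly under Python 2 depends on the objects inside it.

```python
import os
import pickle
import tempfile


def detect_protocol(path):
    """Return the protocol number declared in a pickle's header, or None.

    Pickles written with protocol 2 or higher start with the PROTO opcode
    (byte 0x80) followed by one byte holding the protocol number.
    Protocols 0 and 1 have no such header, so None is returned for them.
    """
    with open(path, 'rb') as f:
        header = f.read(2)
    if len(header) == 2 and header[0:1] == b'\x80':
        return ord(header[1:2])
    return None


def resave_with_protocol_2(src, dst):
    """Load a pickle (run this under Python 3) and write it back with protocol 2."""
    with open(src, 'rb') as f:
        obj = pickle.load(f)
    with open(dst, 'wb') as f:
        pickle.dump(obj, f, protocol=2)


# Demonstration with throwaway files (placeholder data, not a real model).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'model_py3.pkl')
dst = os.path.join(tmp_dir, 'model_py2.pkl')
with open(src, 'wb') as f:
    pickle.dump({'weights': [1.0, 2.0]}, f, protocol=pickle.HIGHEST_PROTOCOL)

resave_with_protocol_2(src, dst)
print(detect_protocol(src), detect_protocol(dst))
```

Checking the header first is cheap and tells you whether the protocol is the likely culprit before you re-save anything.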

Related

reading file by pickle module

Good afternoon!
I am saving a list(dict(),dict(),dict()) structure with the pickle module, but when I read it back I get <class 'function'> and <function lesson at 0x00000278BA3A0D30>.
What am I doing wrong?
def lesson(user, date):
    with open(user+"_"+date+".data", 'wb') as file:
        pickle.dump(lesson, file)
    file.close()
def read(user, date):
    with open(user+"_"+date+".data", 'rb') as file:
        lesson = pickle.load(file)
    file.close()
    return(lesson)
I am using Python 3.10.7
"saving list(dict(),dict(),dict()) struct with pickle module". No, you're not. You're saving the lesson function itself: on line 3 of your code, pickle.dump(lesson, file) refers to the enclosing function named lesson, not to your data.

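For the lesson question above, one possible fix is to pass the data to be saved as an explicit argument, so the name handed to pickle.dump no longer refers to the function. This is only a sketch: save_lesson, read_lesson, the lesson_data parameter and the directory argument are hypothetical names, not part of the original code.

```python
import os
import pickle
import tempfile


def save_lesson(user, date, lesson_data, directory='.'):
    """Pickle lesson_data (for example a list of dicts), not the function object."""
    path = os.path.join(directory, user + "_" + date + ".data")
    with open(path, 'wb') as file:
        pickle.dump(lesson_data, file)
    # No file.close() needed: the with block closes the file.


def read_lesson(user, date, directory='.'):
    """Load and return whatever object was pickled by save_lesson."""
    path = os.path.join(directory, user + "_" + date + ".data")
    with open(path, 'rb') as file:
        return pickle.load(file)


# Round trip with placeholder data in a temporary directory.
tmp_dir = tempfile.mkdtemp()
data = [{'topic': 'math'}, {'topic': 'physics'}, {'topic': 'art'}]
save_lesson('alice', '2022-10-01', data, directory=tmp_dir)
restored = read_lesson('alice', '2022-10-01', directory=tmp_dir)
print(type(restored), restored == data)
```
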
Reading a binary file Python (pickle) [duplicate]

I created some data and stored it several times like this:
with open('filename', 'a') as f:
    pickle.dump(data, f)
Every time, the size of the file increased, but when I open the file
with open('filename', 'rb') as f:
    x = pickle.load(f)
I can see only the data from the last time.
How can I read the file correctly?
Pickle serializes a single object at a time and reads back a single object; the pickled data is recorded in sequence in the file.
If you simply call pickle.load once, you should be reading the first object serialized into the file (not the last one, as you've written).
After the first object is deserialized, the file pointer is at the beginning of the next object. If you call pickle.load again, it will read that next object; repeat this until the end of the file.
import pickle
objects = []
with (open("myfile", "rb")) as openfile:
    while True:
        try:
            objects.append(pickle.load(openfile))
        except EOFError:
            break
There is a read_pickle function as part of pandas 0.22+
import pandas as pd
obj = pd.read_pickle(r'filepath')
The following is an example of how you might write and read a pickle file. Note that if you keep appending pickle data to the file, you will need to continue reading from the file until you find what you want or an exception is generated by reaching the end of the file. That is what the last function does.
import os
import pickle
PICKLE_FILE = 'pickle.dat'
def main():
    # append data to the pickle file
    add_to_pickle(PICKLE_FILE, 123)
    add_to_pickle(PICKLE_FILE, 'Hello')
    add_to_pickle(PICKLE_FILE, None)
    add_to_pickle(PICKLE_FILE, b'World')
    add_to_pickle(PICKLE_FILE, 456.789)
    # load & show all stored objects
    for item in read_from_pickle(PICKLE_FILE):
        print(repr(item))
    os.remove(PICKLE_FILE)
def add_to_pickle(path, item):
    with open(path, 'ab') as file:
        pickle.dump(item, file, pickle.HIGHEST_PROTOCOL)
def read_from_pickle(path):
    with open(path, 'rb') as file:
        try:
            while True:
                yield pickle.load(file)
        except EOFError:
            pass
if __name__ == '__main__':
    main()
I developed a software tool that opens (most) pickle files directly in your browser (nothing is transferred, so it's 100% private):
https://pickleviewer.com/ (formerly)
Now it's hosted here: https://fire-6dcaa-273213.web.app/
Edit: The source is available here if you want to host it somewhere: https://github.com/ch-hristov/Pickle-viewer

Python pickle throws TypeError [duplicate]

I'm using Python 3.3 and I'm getting a cryptic error when trying to pickle a simple dictionary.
Here is the code:
import os
import pickle
from pickle import *
os.chdir('c:/Python26/progfiles/')
def storvars(vdict):
    f = open('varstor.txt','w')
    pickle.dump(vdict,f,)
    f.close()
    return
mydict = {'name':'john','gender':'male','age':'45'}
storvars(mydict)
and I get:
Traceback (most recent call last):
  File "C:/Python26/test18.py", line 31, in <module>
    storvars(mydict)
  File "C:/Python26/test18.py", line 14, in storvars
    pickle.dump(vdict,f,)
TypeError: must be str, not bytes
The output file needs to be opened in binary mode:
f = open('varstor.txt','w')
needs to be:
f = open('varstor.txt','wb')
I just had the same issue. In Python 3, pickle files must be opened in a binary mode ('wb' for writing, 'rb' for reading), whereas Python 2 code often worked without them. If you followed a tutorial written for Python 2, that is probably why you ran into this error.
import pickle
class MyUser(object):
    def __init__(self,name):
        self.name = name
user = MyUser('Peter')
print("Before serialization: ")
print(user.name)
print("------------")
serialized = pickle.dumps(user)
filename = 'serialized.native'
with open(filename,'wb') as file_object:
    file_object.write(serialized)
with open(filename,'rb') as file_object:
    raw_data = file_object.read()
deserialized = pickle.loads(raw_data)
print("Loading from serialized file: ")
user2 = deserialized
print(user2.name)
print("------------")
pickle uses a binary protocol, hence it only accepts binary files. As the documentation says in its first sentence, "The pickle module implements binary protocols for serializing and de-serializing".
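The difference between the two modes can be seen in a short Python 3 sketch. The dictionary is the one from the question; the temporary path and the text_mode_failed flag are illustrative additions.

```python
import os
import pickle
import tempfile

mydict = {'name': 'john', 'gender': 'male', 'age': '45'}
path = os.path.join(tempfile.mkdtemp(), 'varstor.pkl')

# In Python 3, pickle.dumps returns bytes, whatever the protocol.
payload = pickle.dumps(mydict)
print(type(payload))

# Text mode: the file object only accepts str, so dumping raises TypeError.
try:
    with open(path, 'w') as f:
        pickle.dump(mydict, f)
    text_mode_failed = False
except TypeError as exc:
    text_mode_failed = True
    print('text mode failed:', exc)

# Binary mode: the dump succeeds and the data round-trips.
with open(path, 'wb') as f:
    pickle.dump(mydict, f)
with open(path, 'rb') as f:
    loaded = pickle.load(f)
print(loaded == mydict)
```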

File IO Error in porting from python2 to python 3

I am porting my project from Python 2.7 to Python 3.6.
What I was doing in Python 2.7:
1) Decode from Base64
2) Decompress using gzip
3) Read line by line and write to a file
bytes_array = base64.b64decode(encryptedData)
fio = StringIO.StringIO(bytes_array)
f = gzip.GzipFile(fileobj=fio)
decoded_data = f.read()
f.close()
f = file("DecodedData.log",'w')
for item in decoded_data:
    f.write(item)
f.close()
I tried the same thing with the Python 3 changes, but it does not work and gives one error or another.
I am not able to use StringIO; it gives the error
#initial_value must be str or None, not bytes
So I tried this:
bytes_array = base64.b64decode(encryptedData)
#initial_value must be str or None, not bytes
fio = io.BytesIO(bytes_array)
f = gzip.GzipFile(fileobj=fio)
decoded_data =f.read()
f= open("DecodedData.log",'w')
for item in decoded_data:
    f.write(item)
f.close()
This gives an error on the line f.write(item):
write() argument must be str, not int
To my surprise, item actually contains an integer when I print it (83, 83, 61, 62).
I think that, because I have not given a limit, it is reading as much as it can.
So I tried to read the file line by line:
f= open("DecodedData.log",'w')
with open(decoded_data) as l:
    for line in l:
        f.write(line)
But it is still not working, and \n is also printed in the file.
Can someone suggest what I am missing?
decoded_data = f.read()
will result in decoded_data being a bytes object. bytes objects are iterable, and iterating over one yields each byte value from the data as an integer (0-255). That means when you do
for item in decoded_data:
    f.write(item)
item will be each integer byte value from your raw data. Instead of looping, write the whole object at once:
f.write(decoded_data)
You've opened f in text mode, so you'll need to open it in binary mode if you want to write raw binary data into it. But the fact that you've called the file DecodedData.log suggests you want it to be a (human readable?) text file.
So I think overall this will be more readable:
gzipped_data = base64.b64decode(encryptedData)
data = gzip.decompress(gzipped_data)
with open("DecodedData.log",'wb') as f:
    f.write(data)
There's no need for the intermediate BytesIO at all; the gzip module has a decompress function (https://docs.python.org/3/library/gzip.html#gzip.decompress)
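If the log really is meant to be a human-readable text file, an alternative is to decode the decompressed bytes and write them in text mode. The sketch below assumes the payload is UTF-8 text, which the question does not confirm; the sample input is made up so the example is self-contained.

```python
import base64
import gzip
import os
import tempfile

# Build sample input so the sketch runs on its own (stand-in for encryptedData).
original_text = "first line\nsecond line\n"
encrypted_data = base64.b64encode(gzip.compress(original_text.encode('utf-8')))

gzipped_data = base64.b64decode(encrypted_data)
# bytes -> str; the UTF-8 encoding is an assumption about the data.
text = gzip.decompress(gzipped_data).decode('utf-8')

log_path = os.path.join(tempfile.mkdtemp(), 'DecodedData.log')
# newline='' writes line endings exactly as they appear in the data.
with open(log_path, 'w', encoding='utf-8', newline='') as f:
    f.write(text)

# Read the file back to confirm that the text survived the round trip.
with open(log_path, encoding='utf-8', newline='') as f:
    written = f.read()
print(written == original_text)
```

Decoding once and writing a str avoids both errors from the question: there is no per-byte loop producing integers, and the text-mode file receives the str it expects.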

What is the inverse operation to this pickle command?

When trying to apply some code I found on the internet, I ran into a dataset that was pickled. To insert my own dataset, I need to produce the same kind of pickle myself. The piece of code that reads the pickle is:
import cPickle, gzip, numpy
# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
And I want to write the pickle myself now:
with open(outfile) as f:
    train_set = allfiles[:len(allfiles)/3]
    valid_set = allfiles[len(allfiles)/3:(len(allfiles)/3)*2]
    test_set = allfiles[(len(allfiles)/3)*2:]
    cPickle.dump((train_set,valid_set,test_set), outfile,0)
However, I get:
TypeError: argument must have 'write' attribute
What could be my problem? What would good pickling code look like?
You want to use the file object, not the filename:
cPickle.dump((train_set,valid_set,test_set), f, 0)
However, your input was GZIP-compressed as well:
with gzip.open(outfile, 'wb') as f:
    # ...
    cPickle.dump((train_set,valid_set,test_set), f, 0)
Note that your own code also forgot to state the correct mode for the opened file object; open(outfile) without a mode argument opens the file in read mode, and writes would fail with an IOError: File not open for writing exception.
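For reference, here is a Python 3 sketch of the same write-then-read round trip. It uses pickle instead of the Python 2 cPickle module from the question, and the allfiles contents and output file name are placeholders.

```python
import gzip
import os
import pickle
import tempfile

allfiles = ['sample_%d' % i for i in range(9)]  # placeholder data

# Integer division: a plain / yields a float in Python 3, which cannot be a slice index.
third = len(allfiles) // 3
train_set = allfiles[:third]
valid_set = allfiles[third:2 * third]
test_set = allfiles[2 * third:]

outfile = os.path.join(tempfile.mkdtemp(), 'my_dataset.pkl.gz')

# Write: pass the open gzip file object (not the file name) to pickle.dump.
with gzip.open(outfile, 'wb') as f:
    pickle.dump((train_set, valid_set, test_set), f, protocol=2)

# Read it back the same way the original loader does.
with gzip.open(outfile, 'rb') as f:
    loaded_train, loaded_valid, loaded_test = pickle.load(f)

print(len(loaded_train), len(loaded_valid), len(loaded_test))
```

Protocol 2 is chosen here only so that a Python 2 reader could still load the file; any protocol works if both sides run Python 3.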
cPickle.dump((train_set,valid_set,test_set), outfile,0)
outfile is just a file name. You should use:
cPickle.dump((train_set,valid_set,test_set), f,0)
