when trying to apply some code i found on the internet i ran into a dataset that was pickled. Now to insert my own dataset into that i need to reverse the pickling myself. The piece of code that reads the pickle is:
import cPickle, gzip, numpy
# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
And i want to write the pickle myself now:
with open(outfile) as f:
train_set = allfiles[:len(allfiles)/3]
valid_set = allfiles[len(allfiles)/3:(len(allfiles)/3)*2]
test_set = allfiles[(len(allfiles)/3)*2:]
cPickle.dump((train_set,valid_set,test_set), outfile,0)
However i get :
TypeError: argument must have 'write' attribute
What could be my problem? How would a good pickling code look like?
You want to use the file object, not the filename:
cPickle.dump((train_set,valid_set,test_set), f, 0)
However, your input was GZIP-compressed as well:
with gzip.open(outfile, 'wb') as f:
# ...
cPickle.dump((train_set,valid_set,test_set), f, 0)
Note that your own code forgot to state the correct mode for the opened file object as well; open(outfile) without arguments opens the file in read-modus, and writes would fail with an IOError: File not open for writing exception.
cPickle.dump((train_set,valid_set,test_set), outfile,0)
outfile is just a file name. You should use:
cPickle.dump((train_set,valid_set,test_set), f,0)
Related
I created some data and stored it several times like this:
with open('filename', 'a') as f:
pickle.dump(data, f)
Every time the size of file increased, but when I open file
with open('filename', 'rb') as f:
x = pickle.load(f)
I can see only data from the last time.
How can I correctly read file?
Pickle serializes a single object at a time, and reads back a single object -
the pickled data is recorded in sequence on the file.
If you simply do pickle.load you should be reading the first object serialized into the file (not the last one as you've written).
After unserializing the first object, the file-pointer is at the beggining
of the next object - if you simply call pickle.load again, it will read that next object - do that until the end of the file.
objects = []
with (open("myfile", "rb")) as openfile:
while True:
try:
objects.append(pickle.load(openfile))
except EOFError:
break
There is a read_pickle function as part of pandas 0.22+
import pandas as pd
obj = pd.read_pickle(r'filepath')
The following is an example of how you might write and read a pickle file. Note that if you keep appending pickle data to the file, you will need to continue reading from the file until you find what you want or an exception is generated by reaching the end of the file. That is what the last function does.
import os
import pickle
PICKLE_FILE = 'pickle.dat'
def main():
# append data to the pickle file
add_to_pickle(PICKLE_FILE, 123)
add_to_pickle(PICKLE_FILE, 'Hello')
add_to_pickle(PICKLE_FILE, None)
add_to_pickle(PICKLE_FILE, b'World')
add_to_pickle(PICKLE_FILE, 456.789)
# load & show all stored objects
for item in read_from_pickle(PICKLE_FILE):
print(repr(item))
os.remove(PICKLE_FILE)
def add_to_pickle(path, item):
with open(path, 'ab') as file:
pickle.dump(item, file, pickle.HIGHEST_PROTOCOL)
def read_from_pickle(path):
with open(path, 'rb') as file:
try:
while True:
yield pickle.load(file)
except EOFError:
pass
if __name__ == '__main__':
main()
I developed a software tool that opens (most) Pickle files directly in your browser (nothing is transferred so it's 100% private):
https://pickleviewer.com/ (formerly)
Now it's hosted here: https://fire-6dcaa-273213.web.app/
Edit: Available here if you want to host it somewhere: https://github.com/ch-hristov/Pickle-viewer
Feel free to host this somewhere.
So I got this code that is supposed to sort a dictionary within a json file alphabetically by key:
import json
def values(infile,outfile):
with open(infile):
data=json.load(infile)
data=sorted(data)
with open(outfile,"w"):
json.dump(outfile,data)
values("values.json","values_out.json")
And when I run it I get this error:
AttributeError: 'str' object has no attribute 'read'
I'm pretty sure I messed something up when I made the function but I don't know what.
EDIT: This is what the json file contains:
{"two": 2,"one": 1,"three": 3}
You are using the strings infile and outfile in your json calls, you need to use the file description instance, that you get using as keyword
def values(infile,outfile):
with open(infile) as fic_in:
data = json.load(fic_in)
data = sorted(data)
with open(outfile,"w") as fic_out:
json.dump(data, fic_out)
You can group, with statements
def values(infile, outfile):
with open(infile) as fic_in, open(outfile, "w") as fic_out:
json.dump(sorted(json.load(fic_in)), fic_out)
You forgot to assign the file you opened to a variable. In your current code you open a file, but then try to load the filename rather than the actual file. This code should run because you assign the file object reference to my_file.
import json
def values(infile,outfile):
with open(infile) as my_file:
data=json.load(my_file)
data=sorted(data)
with open(outfile,"w"):
json.dump(outfile,data)
values("values.json","values_out.json")
I have a main pickle file for my dataset and every few time i also have a new pickle file for some new data so i want to append the newly generated pickle file to the main pickle file. The results are not accurate if i use append this way
with open('new_pickle', 'rb') as f:
encoded = pickle.load(f)
with open("encodings.pickle",'ab+') as outfile:
pickle.dump(encoded,outfile)
What I tried but not satisfied because the write mode is not what i want . I need to do this via append mode so that the new results should add to the main pickle i.e encodings.pickle:
with open("encodings.pickle", 'rb') as fa:
encoded1 = pickle.load(fa)
with open('encodings_backup.pickle', 'wb') as fa:
pickle.dump(encoded1,fa)
with open('new_pickle', 'rb') as f:
encoded = pickle.load(f)
encoded1.update(encoded)
with open("encodings.pickle",'wb') as outfile:
pickle.dump(encoded1,outfile)
I want to "debug" my pyomo model. The output of the model.pprint() method looks helpful but it is too long so the console only displays and stores the last lines. How can I see the first lines. And how can I store this output in a file
(I tried pickle, json, normal f.write but since the output of .pprint() is of type NONE I wasn't sucessfull until now. (I am also new to python and learning python and pyomo in parallel).
None of this works :
'''
with open('some_file2.txt', 'w') as f:
serializer.dump(x, f)
import pickle
object = Object()
filehandler = open('some_file', 'wb')
pickle.dump(x, filehandler)
x = str(instance)
x = str(instance.pprint())
f = open('file6.txt', 'w')
f.write(x)
f.write(instance.pprint())
f.close()
Use the filename keyword argument to the pprint method:
instance.pprint(filename='foo.txt')
instance.pprint() prints in the console (stdout for standard output), but does not return the content (the return is None as you said). To have it print in a file, you can try to redirect the standard output to a file.
Try:
import sys
f = open('file6.txt', 'w')
sys.stdout = f
instance.pprint()
f.close()
It looks like there is a cleaner solution from Bethany =)
For me the accepted answer does not work, pprint has a different signature.
help(instance.pprint)
pprint(ostream=None, verbose=False, prefix='') method of pyomo.core.base.PyomoModel.ConcreteModel instance
# working for me:
with open(path, 'w') as output_file:
instance.pprint(output_file)
I have a design question. I have a function loadImage() for loading an image file. Now it accepts a string which is a file path. But I also want to be able to load files which are not on physical disk, eg. generated procedurally. I could have it accept a string, but then how could it know the string is not a file path but file data? I could add an extra boolean argument to specify that, but that doesn't sound very clean. Any ideas?
It's something like this now:
def loadImage(filepath):
file = open(filepath, 'rb')
data = file.read()
# do stuff with data
The other version would be
def loadImage(data):
# do stuff with data
How to have this function accept both 'filepath' or 'data' and guess what it is?
You can change your loadImage function to expect an opened file-like object, such as:
def load_image(f):
data = file.read()
... and then have that called from two functions, one of which expects a path and the other a string that contains the data:
from StringIO import StringIO
def load_image_from_path(path):
with open(path, 'rb') as f:
load_image(f)
def load_image_from_string(s):
sio = StringIO(s)
try:
load_image(sio)
finally:
sio.close()
How about just creating two functions, loadImageFromString and loadImageFromFile?
This being Python, you can easily distinguish between a filename and a data string. I would do something like this:
import os.path as P
from StringIO import StringIO
def load_image(im):
fin = None
if P.isfile(im):
fin = open(im, 'rb')
else:
fin = StringIO(im)
# Read from fin like you would from any open file object
Other ways to do it would be a try block instead of using os.path, but the essence of the approach remains the same.