In a pickle with pickling in Python

I have gone through this website and many others, but no one seems to give the simplest possible answer. In the script below there are two different variables that need to be placed into a single pickle ('test1' and 'test2'), but I am wholly unable to get even the simpler of the two to load. There are no error messages, and it does appear that something is being written to the pickle; but then I close the 'program', reopen it, try to load the pickle, and the value of 'test1' does not change.
The second question is how to save both to the same pickle. At first I tried using the allStuff variable to store both test1 and test2 and then dumping allStuff... the dump seems to be a success, but loading does nothing. I've tried a variation where you list each item that should be loaded, but this just caused a whole lot of errors and caused me to assault my poor old keyboard...
Please Help.
import pickle

class testing():
    test1 = 1000
    test2 = {'Dogs' : 0, 'Cats' : 0, 'Birds' : 0, 'Mive' : 0}

    def saveload():
        check = int(input(' 1. Save : 2. Load : 3. Print : 4. Add'))
        allStuff = testing.test1, testing.test2
        saveFile = 'TestingSaveLoad.data'
        if check == 1:
            f = open(saveFile, 'wb')
            pickle.dump(testing.test1, f)
            f.close()
            print()
            print('Saved.')
            testing.saveload()
        elif check == 2:
            f = open(saveFile, 'rb')
            pickle.load(f)
            print()
            print('Loaded.')
            testing.saveload()
        elif check == 3:
            print(allStuff)
            testing.saveload()
        else:
            testing.test1 += 234
            testing.saveload()

testing.saveload()

The pickle.load documentation states:
Read a pickled object representation from the open file object file and return the reconstituted object hierarchy specified therein.
So you would need something like this:
testing.test1 = pickle.load(f)
However, to save and load multiple objects, you can use
# to save
pickle.dump(allStuff, f)
# to load
allStuff = pickle.load(f)
testing.test1, testing.test2 = allStuff

Dump them as a tuple, and when loading, unpack the result back into the two variables.
pickle.dump((testing.test1,testing.test2), f)
and
testing.test1, testing.test2 = pickle.load(f)
Then change the print to print the two items and forget about allStuff, since you would have to keep updating allStuff every time you loaded/reassigned (depending on the type of item you are storing).
print(testing.test1, testing.test2)
I'd also remove the recursive call to saveload() and wrap whatever should be repeated in a while loop, with an option to exit, for example:
if check == 0:
    break
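Putting those pieces together, here is a minimal sketch of how the reworked menu might look (my restructuring, not the asker's original code; it assumes the testing class and the import pickle from the question):

# Hypothetical restructuring: a while loop replaces the recursive calls.
saveFile = 'TestingSaveLoad.data'
while True:
    check = int(input(' 0. Quit : 1. Save : 2. Load : 3. Print : 4. Add '))
    if check == 0:
        break
    elif check == 1:
        with open(saveFile, 'wb') as f:
            pickle.dump((testing.test1, testing.test2), f)
        print('Saved.')
    elif check == 2:
        with open(saveFile, 'rb') as f:
            testing.test1, testing.test2 = pickle.load(f)
        print('Loaded.')
    elif check == 3:
        print(testing.test1, testing.test2)
    else:
        testing.test1 += 234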

You aren't saving the reconstituted pickled object currently. The documentation states that pickle.load() returns the reconstituted object.
You should have something like:
f = open(saveFile, 'rb')
testing.test1 = pickle.load(f)
To save multiple objects, use the approach recommended in this answer:
If you need to save multiple objects, you can simply put them in a single list, or tuple
Also, I recommend using the with keyword to open the file; that ensures the file is closed even if something goes wrong. An example of the final code:
with open(saveFile, 'wb') as f:
    pickle.dump((testing.test1, testing.test2), f)
...
with open(saveFile, 'rb') as f:
    testing.test1, testing.test2 = pickle.load(f)  # Implicit unpacking of the tuple
You might also want a while loop instead of the multiple calls to saveload() - it will be a bit cleaner. Note that right now you have no way out of your loop, short of quitting the program.


append variables to a pickle file and read them

I'm trying to append several variables to a pickle file so I can read them back later, but it doesn't work as I expected. I would expect that at the end of this script c == 'A' and d == 'B', but instead it throws an error. Could you please explain why, and how to get what I want? Many thanks.
import pickle

filename = 'test.pkl'
a = 'A'
b = 'B'

with open(filename, 'wb') as handle:
    pickle.dump(a, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(filename, 'ab') as handle:
    pickle.dump(b, handle)

with open(filename, 'rb') as filehandle:
    c, d = pickle.load(filehandle)
After running your code, I got ValueError: not enough values to unpack (expected 2, got 1).
If you run help(pickle.load), it will tell you that it loads only one object from the file. If the file contains multiple objects, you have to call pickle.load multiple times to read them sequentially.
Your issue is basically that you stored them as two separate objects but are attempting to read them back as a single tuple.
The problem is that pickle.load(filehandle) only reads the first object. The most common way to solve this is to use a tuple or list: you pickle one container object holding both values, and decompose it after loading. So you would do this:
import pickle

filename = 'test.pkl'
a = 'A'
b = 'B'
pair = (a, b)  # a single container object holding both values

with open(filename, 'wb') as handle:
    pickle.dump(pair, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(filename, 'rb') as filehandle:
    c, d = pickle.load(filehandle)
Short answer: every load must correspond to a single dump in the pickling code. You can't have one load that gets the values from two separate dump calls; they need to match.
So you can either:
1. Load twice to match the two dump calls:
# dump code unchanged
with open(filename, 'rb') as filehandle:
    c = pickle.load(filehandle)  # Load the first object
    d = pickle.load(filehandle)  # Load the second object
2. Dump as a single object so it can be loaded as a single object:
with open(filename, 'wb') as handle:
    # dump a simple anonymous tuple of both objects
    pickle.dump((a, b), handle, protocol=pickle.HIGHEST_PROTOCOL)
# Original load code unchanged
Since you clearly know exactly how many objects must be dumped, either solution works; I'd choose #2 in most cases, unless the two objects are computed in wildly different places in the code and not kept around, so that the first object must be serialized and discarded before the second even exists.

Best way to update a json file as data is coming in

I am running a loop with data coming in and writing the data to a JSON file. Here's what it looks like as a minimal, verifiable, concrete example.
import json
import random
import string

dc_master = {}
for i in range(100):
    # Below mimics an API call that returns new data.
    name = ''.join(random.choice(string.ascii_uppercase) for _ in range(15))
    dc_info = {}
    dc_info['Height'] = 'NA'
    dc_master[name] = dc_info
    with open("myfile.json", "w") as filehandle:
        filehandle.write(json.dumps(dc_master))
As you can see from the above, every time it loops, it creates a new dc_info. That becomes the value of the new key-value pair (with the key being the name) that gets written to the json file.
The one disadvantage of the above is that when it fails and I restart, I have to start from the very beginning. Should I open the JSON file for reading into dc_master, add a name:dc_info pair to the dictionary, and then write dc_master back to the JSON file at every turn of the loop? Or should I just append to the JSON file even if it's a duplicate, and rely on the fact that when I load it back into a dictionary, duplicates are taken care of automatically?
Additional information: there are occasionally timeouts, so I want to be able to start somewhere in the middle if needed. The number of key-value pairs in dc_info is about 30, and the number of overall name:dc_info pairs is about 1000, so it's not huge. Reading it all out and writing it back again is not onerous, but I'd like to know if there's a more efficient way of doing it.
I think the full script for fetching and storing API results should look like the example code below; at least I always write this kind of code for long-running sets of tasks.
I put each result of an API call as a separate JSON line in the result file.
The script may be stopped in the middle, e.g. due to an exception; the file will be correctly closed and flushed thanks to the with manager. On restart, the script will read the already-processed result lines from the file.
Only those results that have not already been processed (whose id is not in processed_ids) will be fetched from the API. The id field may be anything that uniquely identifies each API call result.
Each new result is appended to the JSON-lines file thanks to the a (append) file mode. buffering specifies the write-buffer size in bytes; the file is flushed and written in blocks of this size, so as not to stress the disk with frequent one-line writes. Using a large buffer is totally alright, because Python's with block correctly flushes and writes out all bytes whenever the block exits, due to an exception or any other reason, so you'll never lose even a single small result or byte that f.write(...) has already written.
Final results will be printed to the console.
Because your task is very interesting and important (at least I have had similar tasks many times), I've decided to also implement a multi-threaded version of the single-threaded code below; it is especially needed when fetching data from the Internet, as it is usually necessary to download data in several parallel threads. The multi-threaded version can be found and run here and here. This multi-threading can be extended to multi-processing too for efficiency, by using ideas from another answer of mine.
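Since the links above may not be reachable, here is a rough sketch of what such a multi-threaded fetch could look like (my illustration, with a hypothetical fetch_one stand-in for the API call; this is not the exact linked code):

import json
from concurrent.futures import ThreadPoolExecutor

def fetch_one(id_):
    # Hypothetical stand-in for the real API call; returns one result dict.
    return {'id': id_, 'name': 'name' + str(id_)}

def fetch_all(ids, processed_ids, fname, num_threads=8):
    # Fetch all unprocessed ids in parallel, then append results as JSON lines.
    todo = [i for i in ids if i not in processed_ids]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(fetch_one, todo))  # map() preserves input order
    with open(fname, 'a', encoding='utf-8') as f:
        for result in results:
            f.write(json.dumps(result, ensure_ascii=False) + '\n')
    return results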
Next is the single-threaded version of the code. Try it online here!
import json, os, random, string

fname = 'myfile.json'
enc = 'utf-8'
id_field = 'id'

def ReadResults():
    results = []
    processed_ids = set()
    if os.path.exists(fname):
        with open(fname, 'r', encoding = enc) as f:
            data = f.read()
        results = [json.loads(line) for line in data.splitlines() if line.strip()]
        processed_ids = {r[id_field] for r in results}
    return (results, processed_ids)

# First read already processed elements
results, processed_ids = ReadResults()

with open(fname, 'a', buffering = 1 << 20, encoding = enc) as f:
    for id_ in range(100):
        # !!! Only process ids that are not in processed_ids !!!
        if id_ in processed_ids:
            continue
        # Below mimics an API call that returns new data.
        # Should fetch only those objects that correspond to id_.
        name = ''.join(random.choice(string.ascii_uppercase) for _ in range(15))
        # Fill necessary result fields
        result = {}
        result['id'] = id_
        result['name'] = name
        result['field0'] = 'value0'
        result['field1'] = 'value1'
        cid = result[id_field]  # There should be some unique id field
        assert cid not in processed_ids, f'Processed {cid} twice!'
        f.write(json.dumps(result, ensure_ascii = False) + '\n')
        results.append(result)
        processed_ids.add(cid)

print(ReadResults()[0])
I think you're fine and I'd loop over the whole thing and keep writing to the file, as it's cheap.
As for retries, you would have to catch the timeout, check whether the JSON file is already there, load it, count your keys, and then fetch the missing number of entries.
Also, your example can be simplified a bit.
import json
import random
import string

dc_master = {}
for _ in range(100):
    name = ''.join(random.choice(string.ascii_uppercase) for _ in range(15))
    dc_master.update({name: {"Height": "NA"}})

with open("myfile.json", "w") as jf:
    json.dump(dc_master, jf, sort_keys=True, indent=4)
EDIT:
On second thought, you probably want to use a JSON list instead of a dictionary as the top-level element, so it's easier to check how many entries you've got already.
import json
import os
import random
import string

output_file = "myfile.json"
max_entries = 100
dc_master = []

def do_your_stuff(data_container, n_entries=max_entries):
    for _ in range(n_entries):
        name = ''.join(random.choice(string.ascii_uppercase) for _ in range(15))
        data_container.append({name: {"Height": "NA"}})
    return data_container

def dump_data(data, file_name):
    with open(file_name, "w") as jf:
        json.dump(data, jf, sort_keys=True, indent=4)

if not os.path.isfile(output_file):
    dump_data(do_your_stuff(dc_master), output_file)
else:
    with open(output_file) as f:
        data = json.load(f)
    if len(data) < max_entries:
        new_entries = max_entries - len(data)
        dump_data(do_your_stuff(data, new_entries), output_file)
        print(f"Added {new_entries} entries.")
    else:
        print("Nothing to update.")

Pickle dump replaces current file data

When I use pickle, it works fine: I can dump and load.
The problem is that if I close the program and try to dump again, it replaces the old file data with the new dump. Here is my code:
import pickle
import os
import time

dictionary = dict()

def read():
    with open('test.txt', 'rb') as f:
        a = pickle.load(f)
    print(a)
    time.sleep(2)

def dump():
    chs = raw_input('name and number')
    n = chs.split()
    dictionary[n[0]] = n[1]
    with open('test.txt', 'wb') as f:
        pickle.dump(dictionary, f)

Inpt = raw_input('Option : ')
if Inpt == 'read':
    read()
else:
    dump()
When you open a file in w mode (or wb), that tells it to write a brand-new file, erasing whatever was already there.
As the docs say:
The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending…
In other words, you want to use 'ab', not 'wb'.
However, when you append new dumps to the same file, you end up with a file made up of multiple separate values. If you only call load once, it's just going to load the first one. If you want to load all of them, you need to write code that does that. For example, you can load in a loop until EOFError.
Really, it looks like what you're trying to do is not to append to the pickle file, but to modify the existing pickled dictionary.
You could do that with a function that loads and merges all of the dumps together, like this:
def Load():
    d = {}
    with open('test.txt', 'rb') as f:
        while True:
            try:
                a = pickle.load(f)
            except EOFError:
                break
            else:
                d.update(a)
    # do stuff with d
But that's going to get slower and slower the more times you run your program, as you pile on more and more copies of the same values. To do that right you need to load the old dictionary, modify that, and then dump the modified version. And for that, you want w mode.
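For illustration, here is a minimal sketch of that load-modify-dump cycle (my addition; the update_entry helper is made up for the example, reusing the asker's test.txt file name):

import os
import pickle

def update_entry(key, value, path='test.txt'):
    # Load the old dictionary if the file exists...
    d = {}
    if os.path.exists(path):
        with open(path, 'rb') as f:
            d = pickle.load(f)
    # ...modify it, then dump the whole thing back, truncating with 'wb'.
    d[key] = value
    with open(path, 'wb') as f:
        pickle.dump(d, f)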
However, a much better way to persist a dictionary, at least if the keys are strings, is to use dbm (if the values are also strings) or shelve (otherwise) instead of a dictionary in the first place.
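For example, a small sketch with the dbm module, shown in its Python 3 spelling (the asker's code is Python 2, where the module is named anydbm):

import dbm

# Keys and values must be strings; they are stored as bytes.
with dbm.open('phonebook', 'c') as db:  # 'c' creates the file if it doesn't exist
    db['Alice'] = '555-1234'

with dbm.open('phonebook', 'r') as db:
    print(db['Alice'].decode())  # values come back as bytes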
Opening a file in "wb" mode truncates the file -- that is, it deletes the contents of the file, and then allows you to work on it.
Usually, you'd open the file in append ("ab") mode to add data at the end. However, Pickle doesn't support appending, so you'll have to save your data to a new file (come up with a different file name -- ask the user or use a command-line parameter such as -o test.txt?) each time the program is run.
On a related topic, don't use Pickle. It's unsafe. Consider using JSON instead (it's in the standard lib -- import json).
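A sketch of the asker's dump() rewritten with json (my illustration; it reuses the module-level dictionary and raw_input from the question):

import json

def dump():
    chs = raw_input('name and number')
    n = chs.split()
    dictionary[n[0]] = n[1]
    with open('test.txt', 'w') as f:  # text mode: JSON files are plain text
        json.dump(dictionary, f)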

Store a dictionary in a file for later retrieval

I've had a search around but can't find anything regarding this...
I'm looking for a way to save a dictionary to a file, and then later be able to load it back into a variable by reading the file.
The contents of the file don't have to be "human readable" it can be as messy as it wants.
Thanks
- Hyflex
EDIT
import cPickle as pickle

BDICT = {}

## Automatically generated START
name = "BOB"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TOM"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TIMMY JOE"
name_title = name.title()
count = 5
BDICT[name_title] = count
## Automatically generated END

if BDICT:
    with open('DICT_ITEMS.txt', 'wb') as dict_items_save:
        pickle.dump(BDICT, dict_items_save)

BDICT = {} ## Wiping the dictionary

## Usually in a loop
firstrunDICT = True

if firstrunDICT:
    with open('DICT_ITEMS.txt', 'rb') as dict_items_open:
        dict_items_read = dict_items_open.read()
        if dict_items_read:
            BDICT = pickle.load(dict_items_open)
            firstrunDICT = False

print BDICT
Error:
Traceback (most recent call last):
File "C:\test3.py", line 35, in <module>
BDICT = pickle.load(dict_items_open)
EOFError
A few people have recommended shelve - I haven't used it, and I'm not knocking it. I have used pickle/cPickle and I'll offer the following approach:
How to use Pickle/cPickle (the abridged version)...
There are many reasons why you would use pickle (or its noticeably faster variant, cPickle). Put tersely, pickle is a way to store objects outside of your process.
Pickle not only gives you the option to store objects outside your Python process, but also does so in a serialized fashion: objects dumped first come back first (FIFO) when you load them sequentially.
import pickle

## I am making up a dictionary here to show you how this works...
## Because I want to store this outside of this single run, it could be that this
## dictionary is dynamic and user-based - so persistence beyond this run has
## meaning for me.
myMadeUpDictionary = {"one": "banana", "two": "banana", "three": "banana", "four": "no-more"}

with open("mySavedDict.txt", "wb") as myFile:
    pickle.dump(myMadeUpDictionary, myFile)
So what just happened?
Step 1: imported a module named 'pickle'
Step 2: created my dictionary object
Step 3: used a context manager to handle the opening/closing of a new file...
Step 4: dump()ed the contents of the dictionary (which is referred to as 'pickling' the object), writing it to a file (mySavedDict.txt).
If you then go into the file that was just created (located now on your filesystem), you can see the contents. It's messy, ugly, and not very insightful.
nammer#crunchyQA:~/workspace/SandBox/POSTS/Pickle & cPickle$ cat mySavedDict.txt
(dp0
S'four'
p1
S'no-more'
p2
sS'three'
p3
S'banana'
p4
sS'two'
p5
g4
sS'one'
p6
g4
s.
So what's next?
To bring that BACK into our program we simply do the following:
import pickle

with open("mySavedDict.txt", "rb") as myFile:
    myNewPulledInDictionary = pickle.load(myFile)

print myNewPulledInDictionary
Which provides the following return:
{'four': 'no-more', 'one': 'banana', 'three': 'banana', 'two': 'banana'}
cPickle vs Pickle
You won't see many people use pickle these days. I can't think, off the top of my head, of why you would want to use the original implementation of pickle when there is cPickle, which does the same thing (more or less) but a lot faster!
So you can be lazy and do:
import cPickle as pickle
Which is great if you have something already built that uses pickle... but I argue that this is a bad recommendation, and I fully expect to get scolded for even suggesting it! You should really look at your old implementation that used the original pickle and see whether you need to change anything to follow cPickle patterns. If you have legacy or production code you are working with, the import-as trick saves you time refactoring (finding/replacing all instances of pickle with cPickle).
Otherwise, just:
import cPickle
and everywhere you see a reference to the pickle library, just replace accordingly. They have the same load() and dump() methods.
Warning, warning: I don't want to make this post any longer than it is, but I have a painful memory of not making the distinction between load() and loads(), and dump() and dumps(). Damn... that was stupid of me! The short answer is that load()/dump() works with a file-like object, whereas loads()/dumps() performs the same behavior with a string-like object (read more about it in the API, here).
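A quick illustration of that distinction (my example, not from the original post):

import pickle

data = {'one': 'banana'}

blob = pickle.dumps(data)      # dumps() returns the pickled bytes (a str in Python 2)
restored = pickle.loads(blob)  # loads() takes that string/bytes back to an object

with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)       # dump()/load() work with file-like objects instead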
Again, I haven't used shelve, but if it works for you (or others) - then yay!
RESPONSE TO YOUR EDIT
You need to remove dict_items_read = dict_items_open.read() from your context manager at the end. The file is already open; you don't read it in like you would a text file to pull out strings. It's storing pickled Python objects, and it's not meant for eyes! It's meant for load(). In fact, that read() call consumes the whole file, so the subsequent load() finds nothing left to read and raises EOFError.
Your code, modified... works just fine for me (copy/paste and run the code below and see). Notice that near the bottom I've removed your read() of the file object.
import cPickle as pickle

BDICT = {}

## Automatically generated START
name = "BOB"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TOM"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TIMMY JOE"
name_title = name.title()
count = 5
BDICT[name_title] = count
## Automatically generated END

if BDICT:
    with open('DICT_ITEMS.txt', 'wb') as dict_items_save:
        pickle.dump(BDICT, dict_items_save)

BDICT = {} ## Wiping the dictionary

## Usually in a loop
firstrunDICT = True

if firstrunDICT:
    with open('DICT_ITEMS.txt', 'rb') as dict_items_open:
        BDICT = pickle.load(dict_items_open)
    firstrunDICT = False

print BDICT
Python has the shelve module for this. It can store many objects in a file that can be opened up later and read in as objects, but the underlying database format is operating-system-dependent.
import shelve

dict1 = {"a": 1}  # placeholder dictionary
dict2 = {"b": 2}  # placeholder dictionary

# flags:
#   c = create a new shelf; this can't overwrite an old one, so delete the old one first
#   r = read
#   w = write; you can append to an old shelf
shelf = shelve.open("filename", flag="c")
shelf['key1'] = dict1
shelf['key2'] = dict2
shelf.close()

# reading:
shelf = shelve.open("filename", flag='r')
for key in shelf.keys():
    newdict = shelf[key]
    # do something with it
shelf.close()
You can also use Pickle for this task. Here's a blog post that explains how to do it.
What you are looking for is shelve.
Two functions: one saves a dictionary to a file, and the other loads a dictionary (which was already saved before) back for use again.
import pickle

def SaveDictionary(dictionary, File):
    with open(File, "wb") as myFile:
        pickle.dump(dictionary, myFile)
        # no explicit close() needed: the with block closes the file

def LoadDictionary(File):
    with open(File, "rb") as myFile:
        return pickle.load(myFile)
These functions can be called through:
SaveDictionary(mylib.Members, "members.txt")  # save the dict. to a file
members = LoadDictionary("members.txt")       # load the dict. of members back

pickle - putting more than 1 object in a file? [duplicate]

This question already has answers here: Saving and loading multiple objects in pickle file?
I have got a method which dumps a number of pickled objects (tuples, actually) into a file.
I do not want to put them into one list; I really want to dump several times into the same file.
My problem is, how do I load the objects again?
The first and second objects are just one line long, so this works with readlines.
But all the others are longer.
Naturally, if I try
myob = cpickle.load(g1.readlines()[2])
where g1 is the file, I get an EOF error, because my pickled object is longer than one line.
Is there a way to get just my pickled object?
If you pass the filehandle directly into pickle you can get the result you want.
import pickle

# write a file (binary mode, since pickle writes bytes)
f = open("example", "wb")
pickle.dump(["hello", "world"], f)
pickle.dump([2, 3], f)
f.close()

f = open("example", "rb")
value1 = pickle.load(f)
value2 = pickle.load(f)
f.close()
pickle.dump will append to the end of the file, so you can call it multiple times to write multiple values.
pickle.load will read only enough from the file to get the first value, leaving the filehandle open and pointed at the start of the next object in the file. The second call will then read the second object, and leave the file pointer at the end of the file. A third call will fail with an EOFError as you'd expect.
Although I used plain old pickle in my example, this technique works just the same with cPickle.
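If you don't know in advance how many objects are in the file, a small generator helper (my addition, not part of the answer above) can keep loading until EOFError:

import pickle

def load_all(path):
    # Yield every object that was dump()ed sequentially into the file.
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

values = list(load_all("example"))  # e.g. [['hello', 'world'], [2, 3]]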
I think the best way is to pack your data into a single object before you store it, and unpack it after loading. Here's an example using a tuple as the container (you can also use a dict):
import pickle

a = [1, 2]
b = [3, 4]

with open("tmp.pickle", "wb") as f:
    pickle.dump((a, b), f)

with open("tmp.pickle", "rb") as f:
    a, b = pickle.load(f)
Don't try reading them back as lines of the file; just pickle.load() the number of objects you want. See my answer to the question How to save an object in Python for an example of doing that.
