I have a program that outputs some lists that I want to store to work with later. For example, suppose it outputs a list of student names and another list of their midterm scores. I can store this output in the following two ways:
Standard File Output way:
newFile = open('trialWrite1.py','w')
newFile.write(str(firstNames))
newFile.write(str(midterm1Scores))
newFile.close()
The pickle way:
newFile = open('trialWrite2.txt','w')
cPickle.dump(firstNames, newFile)
cPickle.dump(midterm1Scores, newFile)
newFile.close()
Which technique is better or preferred? Is there an advantage of using one over the other?
Thanks
I think the csv module might be a good fit here, since CSV is a standard format that can be both read and written by Python (and many other languages), and it's also human-readable. Usage could be as simple as
import csv

with open('trialWrite1.csv', 'wb') as fileobj:
    newFile = csv.writer(fileobj)
    newFile.writerow(firstNames)
    newFile.writerow(midterm1Scores)
However, it'd probably make more sense to write one student per row, including their name and score. That can be done like this:
import csv
from itertools import izip  # Python 2; on Python 3 use the built-in zip

with open('trialWrite1.csv', 'wb') as fileobj:
    newFile = csv.writer(fileobj)
    for row in izip(firstNames, midterm1Scores):
        newFile.writerow(row)
pickle is more generic -- it allows you to dump many different kinds of objects to a file for later use. The downside is that the interim storage is not very human-readable, and not in a standard format.
Writing strings to a file, on the other hand, is a much better interface to other activities or code. But it comes at the cost of having to parse the text back into your Python object again.
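As a sketch of that round trip (the filename and data here are made up): write the list's repr to the file, then parse it back later with ast.literal_eval, which safely evaluates Python literals:

```python
import ast

firstNames = ['Alice', 'Bob', 'Carol']      # example data

# write the textual representation of the list
with open('names.txt', 'w') as f:
    f.write(repr(firstNames))

# later: parse the text back into a real Python list
with open('names.txt') as f:
    restored = ast.literal_eval(f.read())

print(restored == firstNames)  # True
```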
Both are fine for this simple list data; I would use write(str(firstNames)) simply because there's no need for pickle here. In general, how to persist your data to the filesystem depends on the data!
For instance, pickle will happily pickle functions, which you can't do by simply writing the string representations.
>>> data = range
>>> data
<class 'range'>
>>> pickle.dump( data, foo )
>>> pickle.load( open( ..., "rb" ) )
<class 'range'>
For a completely different approach, consider that Python ships with SQLite. You could store your data in a SQL database without adding any third-party dependencies.
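For instance (a minimal sketch; the filename, table, and column names are all made up), the students-and-scores data could go into SQLite like this:

```python
import sqlite3

firstNames = ['Alice', 'Bob', 'Carol']
midterm1Scores = [88, 75, 92]

conn = sqlite3.connect('students.db')       # or ':memory:' for a throwaway DB
conn.execute('CREATE TABLE IF NOT EXISTS scores (name TEXT, midterm1 INTEGER)')
conn.executemany('INSERT INTO scores VALUES (?, ?)',
                 zip(firstNames, midterm1Scores))
conn.commit()

# later, possibly in another program:
for name, score in conn.execute('SELECT name, midterm1 FROM scores'):
    print(name, score)
conn.close()
```

You also get querying and indexing for free, which plain files and pickle do not offer.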
Related
I have a pickle database which I am reading using the following code
import pickle, pprint
import sys
def main(datafile):
    with open(datafile, 'rb') as fin:
        data = pickle.load(fin)
    pprint.pprint(data)

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print "Pickle database file must be given as an argument."
        sys.exit()
    main(sys.argv[1])
I recognised that it contained a dictionary. I want to delete/edit some values in this dictionary and make a new pickle database.
I am storing the output of this program in a file (so that I can read the elements of the dictionary and choose which ones to delete). How do I read this file (pprinted data structures) and create a pickle database from it?
As stated in the Python docs, pprint is guaranteed to produce output that is valid Python syntax as long as the objects are representable as Python literals. So what you are doing is fine as long as it involves dicts, lists, numbers, strings, etc. In particular, if some value deep down in the dict is not representable as a literal (e.g. a custom object), this will fail.
Now reading the output file should be quite straight forward:
import ast
with open('output.txt') as fo:
    data = fo.read()

obj = ast.literal_eval(data)
This is assuming that you keep one object per file and nothing more.
Note that you may use built-in eval instead of ast.literal_eval but that is quite unsafe since eval can run arbitrary Python code.
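Putting it together (a sketch with made-up filenames and keys): read the pprinted dump back, delete the entries you don't want, and pickle the result into a new database:

```python
import ast
import pickle
import pprint

# suppose output.txt holds a pprinted dict, e.g. written earlier with:
data = {'keep': 1, 'drop': 2, 'also_keep': 3}
with open('output.txt', 'w') as fo:
    pprint.pprint(data, stream=fo)

# read it back as a real dict and edit it
with open('output.txt') as fo:
    obj = ast.literal_eval(fo.read())
del obj['drop']                       # remove the unwanted entry

# write the edited dict to a new pickle database
with open('newdata.pkl', 'wb') as fout:
    pickle.dump(obj, fout)

with open('newdata.pkl', 'rb') as fin:
    restored = pickle.load(fin)
print(restored == {'keep': 1, 'also_keep': 3})  # True
```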
So I basically just want to have a list of all the pixel colour values that overlap written in a text file so I can then access them later.
The only problem is that the text file ends up with wrapper text like set([ written into it along with the values.
Heres my code
import cv2
import numpy as np
import time
om=cv2.imread('spectrum1.png')
om=om.reshape(1,-1,3)
om_list=om.tolist()
om_tuple={tuple(item) for item in om_list[0]}
om_set=set(om_tuple)
im=cv2.imread('RGB.png')
im=cv2.resize(im,(100,100))
im= im.reshape(1,-1,3)
im_list=im.tolist()
im_tuple={tuple(item) for item in im_list[0]}
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Also, if I run this program again but with a different picture for comparison, will it append the data or overwrite it? It's kinda hard to tell when just looking at numbers.
If you replace these lines:
im=cv2.imread('RGB.png')
File= open('Weedlist', 'w')
File.write(str(ColourCount))
with:
import sys
im=cv2.imread(sys.argv[1])
open(sys.argv[1]+'Weedlist', 'w').write(str(list(ColourCount)))
you will get a new file for each input file and also you don't have to overwrite the RGB.png every time you want to try something new.
Files opened with mode 'w' will be overwritten. You can use 'a' to append.
You opened the file with the 'w' mode, write mode, which will truncate (empty) the file when you open it. Use 'a' append mode if you want data to be added to the end each time
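A quick illustration of the difference (throwaway filenames, run once):

```python
# 'w' truncates on open: only the second write survives
with open('demo_w.txt', 'w') as f:
    f.write('first run\n')
with open('demo_w.txt', 'w') as f:
    f.write('second run\n')
print(open('demo_w.txt').read())   # second run

# 'a' appends: both writes survive
with open('demo_a.txt', 'a') as f:
    f.write('first run\n')
with open('demo_a.txt', 'a') as f:
    f.write('second run\n')
print(open('demo_a.txt').read())   # first run / second run
```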
You are writing the str() conversion of a set object to your file:
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Don't use str to convert the whole object; format your data into a string you find easy to read back again. You probably want to add a newline too if you want each new entry to land on its own line. Perhaps you want to sort the data as well, since a set lists items in an order determined by implementation details.
If comma-separated works for you, use str.join(); your set contains tuples of integer numbers, and it sounds as if you are fine with the repr() output per tuple, so we can re-use that:
with open('Weedlist', 'a') as outputfile:
    output = ', '.join([str(tup) for tup in sorted(ColourCount)])
    outputfile.write(output + '\n')
I used with there to ensure that the file object is automatically closed again after you are done writing; see Understanding Python's with statement for further information on what this means.
Note that if you plan to read this data again, the above is not going to be all that efficient to parse again. You should pick a machine-readable format. If you need to communicate with an existing program, you'll need to find out what formats that program accepts.
If you are writing that other program as well, pick a format that the other programming language supports too. JSON is widely supported, for example: use the json module and convert your set to a list first; json.dump(sorted(ColourCount), fileobj) followed by fileobj.write('\n') produces newline-separated JSON documents.
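For example (a sketch, with a made-up filename and example colour tuples), writing one JSON array per line and reading them back; note that JSON has no tuple type, so tuples come back as lists and need converting:

```python
import json

ColourCount = {(255, 0, 0), (0, 128, 0)}    # example set of colour tuples

# append one JSON array per run, newline-separated
with open('Weedlist.jsonl', 'a') as fileobj:
    json.dump(sorted(ColourCount), fileobj)
    fileobj.write('\n')

# reading back: each line is an independent JSON document
with open('Weedlist.jsonl') as fileobj:
    entries = [json.loads(line) for line in fileobj]

# JSON arrays come back as lists; convert back to tuples
restored = {tuple(colour) for colour in entries[0]}
print(restored == ColourCount)  # True
```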
If that other program is coded in Python, consider using the pickle module, which writes Python objects to a file efficiently in a format the same module can load again:
with open('Weedlist', 'ab') as picklefile:
    pickle.dump(ColourCount, picklefile)
and reading is as easy as:
sets = []
with open('Weedlist', 'rb') as picklefile:
    while True:
        try:
            sets.append(pickle.load(picklefile))
        except EOFError:
            break
See Saving and loading multiple objects in pickle file? as to why I use a while True loop there to load multiple entries.
How would you like the data to be written? Replace the final line by
File.write(str(list(ColourCount)))
Maybe you like that more.
If you run that program again, it will overwrite the previous content of the file. If you prefer to append the data, open the file with:
File= open('Weedlist', 'a')
I have a question. It may be an easy one, but I could not come up with a good approach. I have 2 Python programs. The first of them produces 2 outputs: one is a huge list (containing thousands of other lists) and the other is a simple CSV file for Weka. I need to store this list (the first output) somehow so I can use it later as input to the other program. I cannot just send it to the second program, because when the first program is done, Weka should also produce new output for the second program. Hence, the second program has to wait for the outputs of both the first program and Weka.
The problem is that the output list consists of lots of lists holding numerical values. A simple example could be:
list1 = [[1,5,7],[14,3,27], [19,12,0], [23,8,17], [12,7]]
If I write this on a txt file, then when I try to read it, it takes all the values as string. Is there any easy and fast way (since data is big) to manage somehow taking all the values as integer? Or maybe in the first case, writing it as integer?
I think this is a good case for the pickle module.
To save data:
import pickle
lst = [[1,5,7],[14,3,27], [19,12,0], [23,8,17], [12,7]]
pickle.dump(lst, open('data.pkl', 'wb'))
To read data from saved file:
import pickle
lst = pickle.load(open('data.pkl', 'rb'))
From documentation:
The pickle module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. “Pickling”
is the process whereby a Python object hierarchy is converted into a
byte stream, and “unpickling” is the inverse operation, whereby a byte
stream is converted back into an object hierarchy. Pickling (and
unpickling) is alternatively known as “serialization”, “marshalling,”
[1] or “flattening”, however, to avoid confusion, the terms used here
are “pickling” and “unpickling”.
There's also the faster cPickle module:
To save data:
from cPickle import Pickler
p = Pickler(open('data2.pkl', 'wb'))
p.dump(lst)
To read data from saved file:
from cPickle import Unpickler
up = Unpickler(open('data2.pkl', 'rb'))
lst = up.load()
How about pickling the list output rather than outputting it as a plaintext representation? Have a look at the documentation for your version: it's basically a way to write Python objects to file, which you can then read from Python at any point to get identical objects.
Once you have the file open that you want to output to, the outputting difference will be quite minor, e.g.
import pickle
my_list = [[1, 2], [134, 76], [798, 5, 2]]
with open('outputfile.pkl', 'wb') as output:
    pickle.dump(my_list, output, -1)
And then just use the following way to read it in from your second program:
import pickle
my_list = pickle.load(open('outputfile.pkl', 'rb'))
I have a Python script (script 1) which generates a large Python dictionary. This dictionary has to be read by another script (script 2).
Could anyone suggest the best way to write the Python dictionary generated by script 1 so it can be read by script 2?
In past I have used cPickle to write and read such large dictionaries.
Is there a better way to do this?
shelve will give you access to each item separately, instead of requiring you to serialize and deserialize the entire dictionary each time.
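For example (a minimal sketch; the filename and keys are made up), script 1 can store items under string keys and script 2 can fetch just the one it needs, without deserializing the whole dictionary:

```python
import shelve

# script 1: store items individually under string keys
with shelve.open('scores_db') as db:
    db['Alice'] = [88, 91]
    db['Bob'] = [75, 80]

# script 2: open the same shelf and read back only the item you need
with shelve.open('scores_db') as db:
    print(db['Alice'])   # [88, 91]
```

Behind the scenes each value is pickled separately, so a shelf behaves like a persistent dictionary.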
If you want your dictionary to be readable by different types of scripts (i.e. not just Python), JSON is a good option as well.
It's not as fast as shelve, but it's easy to use and quite readable to the human eye.
import json
with open("/tmp/test.json", "w") as out_handle:
json.dump(my_dict, out_handle) # save dictionary
with open("/tmp/test.json", "r") as in_handle:
my_dict = json.load(in_handle) # load dictionary
I am working with cPickle to convert structured data into a datastream format and pass it to a library. What I have to do is read the file contents from a manually written file named "targetstrings.txt" and convert them into the format the Netcdf library needs, in the following manner.
Note: targetstrings.txt contains latin characters
op=open("targetstrings.txt",'rb')
targetStrings=cPickle.load(op)
The Netcdf library take the contents as strings.
While loading the file, it fails with the following error:
cPickle.UnpicklingError: invalid load key, 'A'.
Please tell me how can I rectify this error, I have googled around but did not find an appropriate solution.
Any suggestions,
pickle is not for reading/writing generic text files, but to serialize/deserialize Python objects to file. If you want to read text data you should use Python's usual IO functions.
with open('targetstrings.txt', 'r') as f:
    fileContent = f.read()
If, as it seems, the library just wants to have a list of strings, taking each line as a list element, you just have to do:
with open('targetstrings.txt', 'r') as f:
    lines = [l for l in f]
    # now 'lines' holds the lines read from the file
As stated, pickle is not meant to be used in this way.
If you need to manually edit complex Python objects that are to be read and passed as Python objects to another function, there are plenty of other formats to use, for example XML, JSON, or Python files themselves. Pickle uses a Python-specific protocol that, while not binary (in version 0 of the protocol) and stable across Python versions, is not meant for this; it is not even the recommended method for recording Python objects for persistence or communication (although it can be used for those purposes).