save string to a binary file in python - python

I would like to know a very basic thing about Python programming (I am a very basic programmer right now): how can I save a result (a list, a string, or whatever) to a file in Python?
I've been searching a lot, but I couldn't find any good answer to this.
I was thinking about the ".write()" method, but (for instance) it does not seem to work with strings, and I don't really know what it is supposed to do.
My situation is that I have binary files that I would like to edit. I found it easy to convert them to strings and modify them, and now I'd like to save them i) back to binary files (JPEG images) and ii) in the folder I want.
How would I do that? I need some help, please.
UPDATE
Here is the script I'm trying to run:
import os, sys
newpath= r'C:/Users/Umberto/Desktop/temporary'
if not os.path.exists (newpath):
    os.makedirs (newpath)
data= open ('C:/Users/Umberto/Desktop/Prove_Script/Varie/_BR_Browse.001_2065642654_1.BINARY', 'rb+')
edit_data= str (data.read () )
out_dir= os.path.join (newpath, 'feed', 'address')
data.close ()
# do my edits in a second step...
edit_data.write (newpath)
edit_data.close ()
The error I get is:
AttributeError: 'str' object has no attribute 'write'
UPDATE_2
I tried to use pickle module to serialize my binary file, modify it and save it at the end, but still not getting it to work... This is what I've been trying so far:
import cPickle as pickle
binary= open ('C:\Users\Umberto\Desktop\Prove_Script\Varie\_BR_Browse.001_2065642654_1.BINARY', 'rb')
out= open ('C:\Users\Umberto\Desktop\Prove_Script\Varie\preview.txt', 'wb')
pickle.dump (binary, out, 1)
TypeError Traceback (most recent call last)
<ipython-input-6-981b17a6ad99> in <module>()
----> 1 pprint.pprint (pickle.dump (binary, out, 1))
C:\Python27\ArcGIS10.1\lib\copy_reg.pyc in _reduce_ex(self, proto)
68 else:
69 if base is self.__class__:
---> 70 raise TypeError, "can't pickle %s objects" % base.__name__
71 state = base(self)
72 args = (self.__class__, base, state)
TypeError: can't pickle file objects
Another thing I didn't get is whether I am supposed to create a file to point to (in my case I had to create "out", otherwise I wouldn't have had the right arguments for the pickle method) or whether that's not necessary.
I hope I'm getting close to the solution.
P.S.: I also tried pickle.dumps(), but the result was no better.

If you're opening a binary file and saving another binary file, you could do something like this:
with open('file.jpg', 'rb') as jpgFile:
    contents = jpgFile.read()
# contents = (some operations here)
with open('file2.jpg', 'wb') as jpgFile:
    jpgFile.write(contents)
Some comments:
'rb' and 'wb' mean read and write in binary mode, respectively. More info on why 'b' is recommended when working with binary files here.
Python's with statement takes care of closing the file when exiting the block.
If you need to save lists, strings, or other objects and retrieve them later, use pickle, as others pointed out.
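As a rough sketch of the read, edit, and write-back steps above, the following Python 3 example keeps the data as bytes the whole way through and writes the result into a chosen folder. The helper name save_bytes, the file names, and the sample payload are made up for illustration; a real JPEG would need edits that respect the image format.

```python
import os
import tempfile


def save_bytes(data: bytes, folder: str, filename: str) -> str:
    """Write raw bytes to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "wb") as out_file:
        out_file.write(data)
    return path


# Made-up bytes standing in for a real binary file.
work_dir = tempfile.mkdtemp()
source_path = save_bytes(b"\xff\xd8 original payload \xff\xd9", work_dir, "source.bin")

# Read the file back as bytes (not str), so nothing is lost in conversion.
with open(source_path, "rb") as in_file:
    contents = in_file.read()

# Edit at the bytes level instead of calling str() on the data.
edited = contents.replace(b"original", b"modified")

# Save the edited bytes into the folder of your choice.
target_path = save_bytes(edited, os.path.join(work_dir, "temporary"), "edited.jpg")
```

The point of the sketch is that write() belongs to the file object returned by open(), not to the data, which is why calling .write() on a string raises AttributeError.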

You can use the standard Python module named "pickle".
You can read about it here: pickle documentation
Reading and writing most data structures becomes very easy:
pickle.dump(obj, file_handler) # serialize an object to a file
pickle.load(file) # deserialize an object from a file
Or you can serialize to a string with pickle.dumps(...) and load from it with pickle.loads(...).
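A minimal round trip with these four calls might look like the sketch below (Python 3). The sample dictionary and file name are invented for the example. Note that what gets pickled is the data, not an open file handle: passing a file object to pickle.dump as the object to store is what triggers "can't pickle file objects".

```python
import os
import pickle
import tempfile

results = {"name": "preview", "values": [1, 2, 3]}  # made-up sample data

path = os.path.join(tempfile.mkdtemp(), "results.pkl")

# dump/load work with a file opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(results, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# dumps/loads do the same thing in memory (a bytes object in Python 3).
blob = pickle.dumps(results)
from_blob = pickle.loads(blob)
```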

Related

Resolve Attribute Error - Can I do this by defining variable?

I am new to Python and am wondering how to address the following attribute error. I believe I need to define/declare the file variable? Thanks for any suggestions, here is my script:
AttributeError Traceback (most recent call last)
in
51
52 # Write methods to print to Financial_Analysis_Summary
---> 53 file.write("Financial Analysis")
54 file.write("\n")
55 file.write("----------------------------")
AttributeError: 'str' object has no attribute 'write'
From your code and error, I think you've defined the variable 'file' as a string. Also, the str class has no write() method, hence the error. For more information, please include the whole script, especially the part where the variable 'file' is assigned. You could use print() to print the details above, or create a new class with a method that prints what you need.
It looks like you've somehow defined file as a string rather than a file. What you should do is define it thus:
summary_file = open("C:/someFolder/someOtherFolder/Financial_Analysis_Summary.txt",
                    mode='r+', encoding='utf8')
and then write to it.
The first argument to the open function is the file path. The mode is how you want to access the file: 'r' lets you read the file and nothing else; 'r+' lets you read and write while leaving the preexisting text in place (although if you write to the middle of the file you'll still overwrite whatever was there). Both raise a FileNotFoundError if the file doesn't exist yet, whereas the other modes create it. 'w' deletes what was in the file and lets you write to it; 'a' lets you write only to the end of the file; 'w+' and 'a+' are the same as 'w' and 'a' except that they also let you read from the file. You can add 'b' to any of these to work with bytes rather than strings. The encoding matters whenever the text contains non-ASCII characters; set it to the same encoding you'll use to view the file (usually 'utf8') to avoid garbling them. It applies only to text modes, so leave it out when using 'b'.
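The differences between the modes are easier to see when run. Here is a small Python 3 sketch using a temporary directory; the file name mirrors the one above, and the "Total" line is a made-up value.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Financial_Analysis_Summary.txt")

# 'r+' requires the file to exist already.
try:
    open(path, mode="r+", encoding="utf8")
    missing_raised = False
except FileNotFoundError:
    missing_raised = True

# 'w' creates the file (or empties an existing one).
with open(path, mode="w", encoding="utf8") as summary_file:
    summary_file.write("Financial Analysis\n")
    summary_file.write("----------------------------\n")

# 'a' adds to the end and leaves the existing text alone.
with open(path, mode="a", encoding="utf8") as summary_file:
    summary_file.write("Total: 42\n")  # made-up value

# 'r+' now works; writing starts at the beginning and overwrites in place.
with open(path, mode="r+", encoding="utf8") as summary_file:
    summary_file.write("FINANCIAL")

with open(path, mode="r", encoding="utf8") as summary_file:
    lines = summary_file.read().splitlines()
```

For a summary file that the script creates from scratch, 'w' is probably the more natural choice than 'r+', since 'r+' fails on the first run when the file does not exist yet.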

How to recover information from pkl file that was half-pickled?

I was pickling some variables and the process was interrupted. How do I recover data from the partially pickled file?
Pickled two lists of lists using:
import pickle
import sys
sys.setrecursionlimit(5000) # to get around max depth recursion error
with open('level4_half.pkl', 'wb') as f:
    pickle.dump([level4_url, level4_desc], f)
Checked from windows explorer that the file is not empty (158MB)
Tried to unpickle file using:
with open('level4_half.pkl','rb') as f:
    level4_url, level4_desc = pickle.load(f)
and encountered the error:
Traceback (most recent call last):
File "<ipython-input-18-32ed3a0e79d4>", line 2, in <module>
level4_url, level4_desc = pickle.load(f)
EOFError
(Previously I've tried to pickle and unpickle (fully pickled) files successfully using the commands above.)
I found a similar question here but I didn't use dill and am not sure if a partially pickled file is considered corrupted. My current technical skill is not so honed as to be able to quickly implement the solution there: "Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information." If this turns out to be the same solution to my question, it will be great to get guidance on how I can "hook all of the load_ methods".
Thank you.

read() from a ExFileObject always cause StreamError exception

I am trying to read only one file from a tar.gz file. All operations on the tarfile object work fine, but when I read from a specific member, a StreamError is always raised. Check this code:
import tarfile
fd = tarfile.open('file.tar.gz', 'r|gz')
for member in fd.getmembers():
    if not member.isfile():
        continue
    cfile = fd.extractfile(member)
    print cfile.read()
    cfile.close()
fd.close()
cfile.read() always causes "tarfile.StreamError: seeking backwards is not allowed".
I need to read the contents into memory, not dump them to a file (extractall works fine).
Thank you!
The problem is this line:
fd = tarfile.open('file.tar.gz', 'r|gz')
You don't want 'r|gz', you want 'r:gz'.
If I run your code on a trivial tarball, I can even print out the member and see test/foo, and then I get the same error on read that you get.
If I fix it to use 'r:gz', it works.
From the docs:
mode has to be a string of the form 'filemode[:compression]'
...
For special purposes, there is a second format for mode: 'filemode|[compression]'. tarfile.open() will return a TarFile object that processes its data as a stream of blocks. No random seeking will be done on the file… Use this variant in combination with e.g. sys.stdin, a socket file object or a tape device. However, such a TarFile object is limited in that it does not allow to be accessed randomly, see Examples.
'r|gz' is meant for when you have a non-seekable stream, and it only provides a subset of the operations. Unfortunately, it doesn't seem to document exactly which operations are allowed—and the link to Examples doesn't help, because none of the examples use this feature. So, you have to either read the source, or figure it out through trial and error.
But, since you have a normal, seekable file, you don't have to worry about that; just use 'r:gz'.
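To make the fix concrete, here is a self-contained Python 3 sketch that builds a tiny tarball and then reads its members into memory with 'r:gz'. The archive name, the member name test/foo, and the payload are placeholders chosen for the demo.

```python
import io
import os
import tarfile
import tempfile

archive_path = os.path.join(tempfile.mkdtemp(), "file.tar.gz")
payload = b"hello from inside the tarball\n"

# Build a small archive so the example does not depend on an existing file.
with tarfile.open(archive_path, "w:gz") as archive:
    info = tarfile.TarInfo(name="test/foo")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

# 'r:gz' opens a seekable archive, so members can be read in any order.
contents = {}
with tarfile.open(archive_path, "r:gz") as archive:
    for member in archive.getmembers():
        if not member.isfile():
            continue
        member_file = archive.extractfile(member)
        contents[member.name] = member_file.read()  # bytes, kept in memory
        member_file.close()
```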
In addition to the file mode, I attempted to seek on a network stream.
I had the same error when trying to requests.get the file, so I extracted all to a tmp directory:
import os
import tarfile
from lzma import LZMAFile

# stream == requests.get
inputs = [tarfile.open(fileobj=LZMAFile(stream), mode='r|')]
t = "/tmp"
for tarfileobj in inputs:
    tarfileobj.extractall(path=t, members=None)
for fn in os.listdir(t):
    with open(os.path.join(t, fn)) as payload:
        print(payload.read())

pickle.load Not Working

I got a file that contains a data structure with test results from a Windows user. He created this file using the pickle.dump command. On Ubuntu, I tried to load these test results with the following program:
import pickle
import my_module
f = open('results', 'r')
print pickle.load(f)
f.close()
But I get an error inside the pickle module saying there is no module named "my_module".
Could the problem be due to corruption in the file, or could moving from Windows to Linux be the cause?
The problem lies in the way pickle data interacts with newline characters. When a pickle is written or read in text mode, line-ending translation can corrupt the module names stored in the dump (for example, by leaving a stray carriage return at the end of a name).
Storing and loading files in binary mode may help, but I was having trouble with them too. After a long time reading the docs and searching, I found that pickle supports several different "protocols" for storing data and, for backward compatibility, Python 2 defaults to the oldest one: protocol 0, the original ASCII protocol.
You can select a more modern protocol by specifying the protocol keyword when storing the data, something like this:
pickle.dump(someObj, open("dumpFile.dmp", 'wb'), protocol=2)
or by choosing the highest protocol available (2 in Python 2, higher in Python 3):
pickle.dump(someObj, open("dumpFile.dmp", 'wb'), protocol=pickle.HIGHEST_PROTOCOL)
The protocol version is stored in the dump file, so load() detects it automatically.
Regards
You should open the pickled file in binary mode, especially if you are using pickle on different platforms. See this and this questions for an explanation.
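Putting both suggestions together (binary mode plus an explicit protocol), a sketch in Python 3 could look like this. The dictionary is made-up sample data; protocol 2 is chosen here only because it is the newest one that Python 2 can also read.

```python
import os
import pickle
import tempfile

test_results = {"passed": 12, "failed": 1}  # made-up sample data
path = os.path.join(tempfile.mkdtemp(), "results")

# Write in binary mode with an explicit protocol.
with open(path, "wb") as f:
    pickle.dump(test_results, f, protocol=2)

# Read in binary mode as well; load() detects the protocol by itself.
with open(path, "rb") as f:
    loaded = pickle.load(f)

# Protocol 0 is the text-like format; protocol 2 is binary and
# starts with a protocol marker followed by the version number.
ascii_dump = pickle.dumps(test_results, protocol=0)
binary_dump = pickle.dumps(test_results, protocol=2)
starts_with_proto_marker = binary_dump[:2] == b"\x80\x02"
```

Keep in mind that pickle stores instances of custom classes by reference to their module, so the module that defines them (my_module in the question) must also be importable on the machine that loads the file.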

Is there a FileIO in Python?

I know there is a StringIO stream in Python, but is there such a thing as a file stream in Python? Also is there a better way for me to look up these things? Documentation, etc...
I am trying to pass a "stream" to a "writer" object I made. I was hoping that I could pass a file handle/stream to this writer object.
I am guessing you are looking for open(). http://docs.python.org/library/functions.html#open
outfile = open("/path/to/file", "w")
[...]
outfile.write([...])
Documentation on all the things you can do with streams (these are called "file objects" or "file-like objects" in Python): http://docs.python.org/library/stdtypes.html#file-objects
There is a builtin file() which works much the same way. Here are the docs: http://docs.python.org/library/functions.html#file and http://python.org/doc/2.5.2/lib/bltin-file-objects.html.
If you want to print all the lines of the file do:
for line in file('yourfile.txt'):
    print line
Of course there is more, like .seek(), .close(), .read(), .readlines(), ... basically the same protocol as for StringIO.
Edit: You should use open() instead of file(); it has the same API, and file() is gone in Python 3.
In Python, all the I/O operations are wrapped in a high-level API: file-like objects.
It means that any file-like object will behave the same and can be used in any function expecting one. This is called duck typing, and for file-like objects you can expect the following behavior:
open / close / IO Exceptions
iteration
buffering
reading / writing / seeking
StringIO, file, and all the other file-like objects can be substituted for one another, and you don't have to manage the I/O yourself.
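For the "writer" use case from the question, this interchangeability could be sketched as follows (Python 3 syntax). The function write_report and the sample rows are hypothetical names invented for the example; the only thing the function assumes about its argument is a write() method.

```python
import io
import os
import tempfile


def write_report(stream, rows):
    """Write rows to any object that has a write() method."""
    for name, value in rows:
        stream.write(f"{name}: {value}\n")


rows = [("alpha", 1), ("beta", 2)]  # made-up sample data

# An in-memory stream...
memory_stream = io.StringIO()
write_report(memory_stream, rows)
in_memory_text = memory_stream.getvalue()

# ...and a real file are handled by exactly the same code.
path = os.path.join(tempfile.mkdtemp(), "report.txt")
with open(path, "w", encoding="utf8") as file_stream:
    write_report(file_stream, rows)

with open(path, "r", encoding="utf8") as file_stream:
    on_disk_text = file_stream.read()
```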
As a little demo, let's see what you can do with stdout, the standard output, which is a file-like object:
import sys
# replace the standard output with a real opened file
sys.stdout = open("out.txt", "w")
# print no longer shows anything on screen; it writes to the file
print "test"
All file-like objects behave the same, and you should use them the same way:
# try to open it
# do not bother checking whether the stream is available or not
try:
    stream = open("file.txt", "w")
except IOError:
    # if it doesn't work, too bad!
    # this error is the same for StringIO, file, etc.
    # use it and your code gets highly flexible!
    pass
else:
    stream.write("yeah !")
    stream.close()
# in Python 2.6+ and Python 3, you can do the same using a context manager:
with open("file2.txt", "w") as stream:
    stream.write("yeah !")
    # the rest is taken care of automatically
Note that the methods of file-like objects share a common behavior, but the way to create a file-like object is not standard:
import urllib
# urllib doesn't use "open" and doesn't raise only IOError exceptions
stream = urllib.urlopen("http://www.google.com")
# but this is a file-like object and you can rely on that:
for line in stream:
    print line
One last word: just because it works the same way doesn't mean the underlying behavior is the same. It's important to understand what you are working with. In the last example, using the "for" loop on an Internet resource is very dangerous: you don't know whether you'll end up with an infinite stream of data.
In that case, using:
print stream.read(10000) # another file-like object method
is safer. High-level abstractions are powerful, but they don't save you from needing to know how things work.
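One way to apply the same idea more generally is a small helper that never reads more than a fixed number of bytes from any file-like object. This is only a sketch in Python 3: read_capped is a hypothetical helper, and io.BytesIO stands in for a network stream so the example runs offline.

```python
import io


def read_capped(stream, max_bytes, chunk_size=4096):
    """Read from a binary file-like object, stopping after max_bytes."""
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # end of stream reached before the cap
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# BytesIO plays the role of a large (possibly endless) download.
fake_download = io.BytesIO(b"x" * 50000)
first_part = read_capped(fake_download, 10000)

# A short stream is returned whole.
short_part = read_capped(io.BytesIO(b"abc"), 10000)
```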
