cPickle.dump always dumping at end of file


cPickle.dump(object, file) always dumps at the end of the file. Is there a way to dump at a specific position in the file? I expected the following snippet to work:
file = open("test","ab")
file.seek(50,0)
cPickle.dump(object, file)
file.close()
However, the above snippet dumps the object at the end of the file (assume the file already contains 1000 chars), no matter where I seek the file pointer to.

I think this is more a problem with how you open the file than with cPickle.
ab is an append mode, and that is exactly what matters here: the file is opened with the O_APPEND flag in the low-level open syscall, so on most systems every write goes to the end of the file no matter where you seek. (Truncation via O_TRUNC comes with the w modes, not with a.) To write at a specific position in an existing file, open it in r+b mode.
If this doesn't solve your problem and your objects are not very large, you can also use dumps and write the bytes yourself:
file = open("test","r+b")
file.seek(50,0)
dumped = cPickle.dumps(object)
file.write(dumped)
file.close()
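The difference between the two modes can be checked with a small sketch. It uses Python 3's pickle module in place of cPickle, and the scratch file and the payload object are made up for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file pre-filled with 1000 bytes, as in the question.
path = os.path.join(tempfile.mkdtemp(), "test")
with open(path, "wb") as f:
    f.write(b"x" * 1000)

payload = {"answer": 42}  # hypothetical object to pickle

# "r+b" opens the existing file for reading and writing without truncating,
# so the write lands wherever seek() put the file position.
with open(path, "r+b") as f:
    f.seek(50)
    pickle.dump(payload, f)

# Read the object back from the same offset.
with open(path, "rb") as f:
    f.seek(50)
    restored = pickle.load(f)

size_after_overwrite = os.path.getsize(path)

# In append mode the earlier seek() does not decide where the write goes.
with open(path, "ab") as f:
    f.seek(50)
    pickle.dump(payload, f)

size_after_append = os.path.getsize(path)
```

After the r+b write the file keeps its original size because the pickle overwrote existing bytes; after the ab write the file has grown, since the data went to the end.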

Related

Replacing text in a file [duplicate]

Is it possible to parse a file line by line, and edit a line in-place while going through the lines?

It can be simulated using a backup file as stdlib's fileinput module does.
Here's an example script that removes lines that do not satisfy some_condition from files given on the command line or stdin:
#!/usr/bin/env python
# grep_some_condition.py
import fileinput
for line in fileinput.input(inplace=True, backup='.bak'):
    if some_condition(line):
        print line, # this goes to the current file
Example:
$ python grep_some_condition.py first_file.txt second_file.txt
On completion first_file.txt and second_file.txt files will contain only lines that satisfy some_condition() predicate.

The fileinput module has a rather ugly API. I found a nicer module for this task, in_place. An example for Python 3:
import in_place
with in_place.InPlace('data.txt') as file:
    for line in file:
        line = line.replace('test', 'testZ')
        file.write(line)
Main differences from fileinput:
Instead of hijacking sys.stdout, a new filehandle is returned for writing.
The filehandle supports all of the standard I/O methods, not just readline().
Important notes:
This solution deletes every line in the file that you don't re-write with the file.write() call.
Also, if the process is interrupted, you lose any line in the file that has not already been re-written.

No. You cannot safely write to a file you are also reading, as any changes you make to the file could overwrite content you have not read yet. To do it safely you'd have to read the file into a buffer, updating any lines as required, and then re-write the file.
If you're replacing the content in the file byte-for-byte (i.e. if the text you are replacing is the same length as the new string you are replacing it with), then you can get away with it, but it's a hornet's nest, so save yourself the hassle and just read the full file, replace content in memory (or via a temporary file), and write it out again.

If you only intend to perform localized changes that do not change the length of the part of the file that is modified (e.g. changing all characters to lower case), then you can actually overwrite the old contents of the file dynamically.
To do that, you can use random file access with the seek() method of a file object.
Alternatively, you may be able to use an mmap object to treat the whole file as a mutable string. Keep in mind that mmap objects may impose a maximum file-size limit in the 2-4 GB range on a 32-bit CPU, depending on your operating system and its configuration.
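A minimal sketch of the mmap variant, assuming a small scratch file created just for the demonstration. It lower-cases the whole file in place, which works because the replacement has exactly the same length as the original content.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "wb") as f:
    f.write(b"Hello World\nSecond LINE\n")

with open(path, "r+b") as f:
    # Length 0 maps the whole file; the file must not be empty.
    with mmap.mmap(f.fileno(), 0) as mm:
        # Slice assignment requires the new bytes to have the same length
        # as the slice they replace, so the file size never changes.
        mm[:] = mm[:].lower()
        mm.flush()

with open(path, "rb") as f:
    result = f.read()
```

For a large file you would likely process the mapping in chunks rather than copying it all with mm[:], which is only done here to keep the sketch short.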

You have to back up by the length of the line you just read. Assuming you used readline on a file opened in binary mode (so that the length of the line is its length in bytes), you can back up using:
file.seek(offset[, whence])
Set whence to SEEK_CUR and offset to -length.
See the Python docs or look at the manpage for seek.
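The back-up-and-overwrite step could look like the sketch below. The file contents and the choice of which line to replace are invented for the example; the file is opened in binary mode because Python 3 text files do not allow nonzero seeks relative to the current position.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "wb") as f:
    f.write(b"alpha\nbravo\ncharlie\n")

with open(path, "r+b") as f:
    while True:
        line = f.readline()
        if not line:
            break
        if line.startswith(b"bravo"):
            # Step back over the line just read, then overwrite it with
            # a replacement of exactly the same length.
            f.seek(-len(line), os.SEEK_CUR)
            f.write(line.upper())

with open(path, "rb") as f:
    result = f.read()
```

This only stays safe while the replacement has the same byte length as the original line; anything longer would overwrite the start of the next line.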

Can read() and readlines() work together when reading a file in Python?

I'm a Python beginner, and I'm doing some tests of file operations.
I just read a file with read() and with readlines(). Each of them works perfectly on its own. However, when I call readlines() on the same file after read(), I find, to my surprise, that readlines() can't read anything from the file.
P.S. I tried swapping their order, and the second call still can't read anything from the file.
So, how do the functions actually work?
Below is my code:
filea = open('/Users/gssflyaway/Documents/web/echarts-2.2.7/LICENSE.TXT')
print filea.readlines()
print '-' * 50
print filea.read()
filea.close()

Files are read from the disk by moving a pointer around (like a bookmark, so that the file object knows where it left off). A read operation advances the pointer, and if you read the whole file, the pointer will be at the very end of the file. The same applies to both readlines and read. If you want to re-read the file, you can use seek to reset the pointer to the beginning and start a new round.
filea.seek(0)
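A short sketch of this behaviour in Python 3, using a made-up scratch file: the second read returns nothing until the position is reset.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("first\nsecond\n")

with open(path) as f:
    whole = f.read()          # consumes the file up to the end
    leftover = f.readlines()  # nothing left to read at this position
    f.seek(0)                 # move the pointer back to the start
    lines = f.readlines()     # now the lines are available again
```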

How to keep writing in a moved file with the same file object?

If I open a file
fileObj = open('test.txt', 'wb+')
and write some stuff in it
fileObj.write(someBytes)
then decide to move it somewhere else
shutil.move('test.txt', '/tempFolder')
and then keep writing in it
fileObj.write(someMoreBytes)
what happens?
A couple observations:
It seems like the file at /tempFolder/test.txt only contains the first set of bytes that were written.
After the file has been moved, it seems like the first set of bytes are deleted from the file object
Subsequent writes on the file object after the file has been moved do not seem to create a new file on disk at test.txt, so what happens with those bytes? Do they stay in memory in the file object?
Now my main question is: how do I keep the same file object to write on the moved file? Because essentially the file is the same, it has only changed location. Or is that not possible?
Thanks for the help!

After moving your file with shutil.move('test.txt', '/tempFolder'), if you want to continue adding bytes to it, you will need to open it again at its new location.
The file object is bound to the file it opened, not to the path name, so it does not follow the move. What further calls to fileObj.write(someMoreBytes) do depends on the platform and on whether the move was a plain rename or a copy followed by a delete, so you should not rely on them. Close the file before moving it, then reopen it at the new path to continue writing.
For example:
import os
f = open('existingfile.txt', 'wb+')
f.write(b'somebytes')
f.close()
os.rename(r'currentPath\existingfile.txt', r'NewPath\existingfile.txt')
# reopen the file at its new path in append mode and repeat
f = open(r'NewPath\existingfile.txt', 'ab')

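The file object does not know that you moved the file. You can keep writing by adding
fileObj = open("/tempFolder/test.txt", "ab")
after the move. Use an append mode here: reopening with "wb+" would truncate the file and discard the bytes already written.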
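A minimal sketch of the close, move, reopen sequence, assuming temporary directories that stand in for the paths in the question. It relies on shutil.move returning the destination path.

```python
import os
import shutil
import tempfile

# Hypothetical directories standing in for the question's paths.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.txt")
dest_dir = os.path.join(workdir, "tempFolder")
os.mkdir(dest_dir)

f = open(src, "wb")
f.write(b"first bytes;")
f.close()  # flush buffered data and release the file before moving it

# Moving into an existing directory keeps the file name; the return value
# is the file's new path.
new_path = shutil.move(src, dest_dir)

# Append mode keeps the bytes already written.
with open(new_path, "ab") as f:
    f.write(b"more bytes")

with open(new_path, "rb") as f:
    content = f.read()
```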

Why does pyPdf2.PdfFileReader() require a file object as an input?

csv.reader() doesn't require a file object, nor does open(). Does pyPdf2.PdfFileReader() require a file object because of the complexity of the PDF format, or is there some other reason?

It's just a matter of how the library was written. csv.reader allows any iterable that returns strings (which includes files). open is opening the file, so of course it doesn't take an open file (although it can take an integer pointing at an open file descriptor). Typically, it is better to handle the file separately, usually within a with block so that it is closed properly.
with open('input.pdf', 'rb') as f:
    # do something with the file
    pass
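The two claims about csv.reader and open can be illustrated with a short sketch. The sample rows and the scratch file name are invented for the example.

```python
import csv
import os
import tempfile

# csv.reader accepts any iterable of strings, not only file objects.
rows = list(csv.reader(["name,size", "report.pdf,1024"]))

# open() also accepts an integer file descriptor instead of a path.
path = os.path.join(tempfile.mkdtemp(), "note.txt")  # hypothetical file
fd = os.open(path, os.O_WRONLY | os.O_CREAT)
with open(fd, "w") as f:  # the descriptor is closed when the block exits
    f.write("hello")

with open(path) as f:
    text = f.read()
```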

pypdf can take a BytesIO stream or a file path as well. I actually recommend passing the file path in most cases as pypdf will then take care of closing the file for you.

Python open() modes and file writing

I'm learning PyGTK and I'm making a Text Editor (That seems to be the hello world of pygtk :])
Anyways, I have a "Save" function that writes the TextBuffer to a file. Looks something like
try:
    f = open(self.working_file_path, "rw+")
    buff = self._get_buffer()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
Basically, if the file exist, write the buffer to it, if not bring up the Save As Dialog.
My question is: What is the best way to "update" a file. I seem to only be able to append to the end of a file. I've tried various file modes, but I'm sure I'm missing something.
Thanks in advance!

You can open a file in "r+" mode, which allows you to both read and write to the file, and to seek to particular positions and write there. This probably doesn't help you do what I think you want though; it sounds like you're wanting to only write out the changed data?
Remember that on the disk the file isn't stored as a series of extensible lines, it's just a sequence of bytes; some of those bytes indicate line-endings, but the next line follows on immediately. So if you edit the first line in the file and you write the new first line out, unless the new one happens to be exactly the same length as the old one the second line now won't be in the right place, so you'll need to move it (and have taken a copy of it first if the new line you wrote out was longer than the original). And this now means that the next line isn't in the right position either... and so on until you've had to read in and write out the entire rest of the file.
In practice you almost never write only part of an existing file unless you can simply append more data; if you need to "alter" a file you read it in, alter it in memory, and write it back out or you read in the file in pieces (often line by line) and then write out to a new file as you go (and then possibly move the new file over the top of the original). The first approach is easiest, the second is better for not having to hold the whole thing in memory at once.
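The second approach (write to a new file as you go, then move it over the original) might be sketched as follows in Python 3. The helper name rewrite_lines and its transform parameter are hypothetical, not part of any library.

```python
import os
import tempfile

def rewrite_lines(path, transform):
    """Hypothetical helper: rewrite a text file line by line via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final move stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with open(fd, "w") as dst, open(path) as src:
            for line in src:
                dst.write(transform(line))
        # Move the finished copy over the top of the original.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise

# Usage on a made-up demo file.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "w") as f:
    f.write("one test\ntwo\n")

rewrite_lines(demo, lambda line: line.replace("test", "testZ"))

with open(demo) as f:
    updated = f.read()
```

Only one line is held in memory at a time, and the original file is left untouched if the transformation fails partway through.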

At the point where you write to the file, your location is at the end of the file, so you need to seek back to the beginning. Then, you will overwrite the file, but this may leave old content at the end, so you also need to truncate the file.
Additionally, the mode you're specifying ('rw+') is invalid, and I get IOErrors when I try to do some operations on files opened with it. I believe that you want mode 'r+' ("Open for reading and writing. The stream is positioned at the beginning of the file."). 'w+' also opens for reading and writing, but it creates the file if it doesn't exist and truncates it if it does.
So, what you're looking for might be code like this:
try:
    f = open(self.working_file_path, "r+")
    buff = self._get_buffer()
    f.seek(0)
    f.truncate()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
However, you may want to modify this code to correctly catch and handle errors while truncating and writing the file, rather than assuming that all IOErrors in this section are nonexistent-file errors from the call to open.
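Why the truncate step matters can be seen in a small Python 3 sketch with a made-up scratch file: writing shorter text over longer text in 'r+' mode leaves the old tail behind unless the file is truncated first.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "doc.txt")
with open(path, "w") as f:
    f.write("a long original text")

# Overwrite from the start without truncating: the old tail survives.
with open(path, "r+") as f:
    f.write("short")
with open(path) as f:
    without_truncate = f.read()

# Seek to the start, truncate there, then write: only the new text remains.
with open(path, "r+") as f:
    f.seek(0)
    f.truncate()
    f.write("short")
with open(path) as f:
    with_truncate = f.read()
```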

Read the file in as a list, add an element to the start of it, write it all out. Something like this.
f = open(self.working_file_path, "r+")
flist = f.readlines()
flist.insert(0, self._get_text())
f.seek(0)
f.writelines(flist)
f.close()
