Checking the input and output files are not the same in Python

I have a Python script that takes two arguments, the names of the input and output files, i.e. it starts off like
inputFile=open(sys.argv[1],'r')
outFile=open(sys.argv[2],'w')
It then performs whatever operation, reading from inputFile and writing to outFile.
Now, a few times, through human error, I've accidentally given the same argument twice, the result being that my input file is replaced with a blank file. Is there a straightforward way to stop this happening?
I thought it might be as simple as adding
if sys.argv[1]==sys.argv[2]:
    inputFile.close()
    outFile.close()
immediately after the first lines above, but the file has already been blanked by that point.

Simply do:
import os

if os.path.realpath(sys.argv[1]) != os.path.realpath(sys.argv[2]):
    inputFile=open(sys.argv[1],'r')
    outFile=open(sys.argv[2],'w')
else:
    raise ValueError('Input and output files are the same')
This will prevent human mistakes by raising a welcome error that won't destroy your input file.
os.path.realpath will transform any relative path into an absolute path, so that even if the argument strings are different, you can raise the error when the resolved paths are identical (thanks @Jean-François Fabre for reminding me of this).

Opening the file for writing immediately truncates it, so the damage is already done by the time you compare the strings.
That said:
On Windows filesystems, the protection is "built in", since a file that is already open in read mode cannot be opened in write mode at the same time: good (there's a "grey area" for networked filesystems, though).
On Linux/Unix, the risk is there. But comparing the names isn't enough. What if two different paths point to the same file after all? (consider: foo/bar and /mydrive/foo/bar, or foo/../bar and bar)
You could use os.path.realpath() on both files prior to comparing, for instance, to resolve relative paths that could be different (realpath also resolves symbolic links, although two hard links to the same file would still compare as different).
And for the Windows "grey area" I was mentioning, comparing the lowercase versions of the names would be a good idea.
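Combining both points, a minimal pre-flight check might look something like this (my sketch, not taken from the answers; os.path.samefile is available on Unix and, since Python 3.2, on Windows, and requires both paths to exist):
import os
import sys

in_path, out_path = sys.argv[1], sys.argv[2]

# Resolve symlinks/relative paths and fold case, so the check also works on Windows.
if os.path.normcase(os.path.realpath(in_path)) == os.path.normcase(os.path.realpath(out_path)):
    raise ValueError('Input and output files are the same')

# Optional extra guard: samefile compares device and inode numbers,
# which also catches hard links to the same file.
if os.path.exists(out_path) and os.path.samefile(in_path, out_path):
    raise ValueError('Input and output files are the same')

inputFile = open(in_path, 'r')
outFile = open(out_path, 'w')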

The input file is becoming blank because open(filename, 'w') truncates the file before writing whatever needs to be placed in it. 'w' is useful for creating a file and then writing to it. I'd suggest trying open(filename, 'a') for appending to a pre-existing file (this mode also creates the file if it doesn't already exist), and since it sounds like you have two existing files already, append should be what you need.
If you decide to go the if sys.argv[1] == sys.argv[2] route, try placing str() around each item you're comparing, just to be certain it's comparing them properly.

Related

Exception in "with" block blanks file opened for writing

This simple code
# This code will BLANK the file 'myfile'!
with open('myfile', 'w') as file:
    raise Exception()
rather than merely throwing an exception, deletes all data in "myfile", although no actual write operation is even attempted.
This is dangerous to say the least, and certainly not how other languages treat such situations.
How can I prevent this? Do I have to handle every possible exception in order to be certain that the target file will not be blanked by some unforeseen condition? Surely there must be a standard pattern to solve this problem. And, above all: what is happening here in the first place?
You are opening a file for writing. It is that simple action that blanks the file, regardless of what else you do with it. From the open() function documentation:
'w'
open for writing, truncating the file first
Emphasis mine. In essence, the file is empty because opening it in 'w' mode truncated it, and the exception then prevented anything from being written to it.
Postpone opening the file to a point where you actually have data to write if you don't want this to happen. Writing a list of strings to a file is not going to cause exceptions at the Python level.
Alternatively, write to a new file, and rename (move) it afterwards to replace the original. Renaming a file is left to the OS.
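To make the first suggestion concrete, here is a minimal sketch (my example; produce_lines is a hypothetical stand-in for whatever builds your data) that does all the work that might fail before the target file is ever opened:
# Build everything that might raise *before* touching the target file.
lines = produce_lines()          # hypothetical: may raise, but 'myfile' is still untouched

with open('myfile', 'w') as file:
    file.writelines(lines)       # plain writes of strings won't raise at the Python level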
The statement open('myfile', 'w') will delete all the contents on execution, i.e. it truncates the file.
If you want to retain the existing lines you have to use open('myfile', 'a'). Here the a option is for append.
Opening a file for writing erases the contents. The best way to avoid loss of data, not only in the case of exceptions but also computer shutdown, etc., is to create a new temporary file and rename it to the original name when everything is done.
import os
import tempfile

yourfile = "myfile"
try:
    # Write to a temporary file in the same directory, so the rename below
    # stays on the same filesystem.
    with tempfile.NamedTemporaryFile(dir=os.path.dirname(yourfile) or '.', delete=False) as output:
        do_something()  # write your data to `output` here
except Exception:
    handle_exception()
else:
    os.rename(output.name, yourfile)
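One caveat worth adding (my note, not part of the original answer): on Windows, os.rename raises an error if the destination already exists. On Python 3.3+ you can use os.replace instead, which replaces an existing destination:
os.replace(output.name, yourfile)  # Python 3.3+: replaces an existing target file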

Python securely remove file

How can I securely remove a file using Python? The function os.remove(path) only removes the directory entry, but I want to securely remove the file, similar to the Apple feature called "Secure Empty Trash" that randomly overwrites the file.
What function securely removes a file using this method?
You can use srm to securely remove files. You can use Python's os.system() function to call srm.
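For example, a minimal sketch (my addition, assuming srm is installed and on your PATH; shlex.quote is Python 3, use pipes.quote on Python 2):
import os
import shlex

def srm_delete(path):
    # Delegate the secure wipe to the external srm tool.
    status = os.system("srm " + shlex.quote(path))
    if status != 0:
        raise RuntimeError("srm failed for %r (exit status %d)" % (path, status))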
You can very easily write a function in Python to overwrite a file with random data, even repeatedly, then delete it. Something like this:
import os

def secure_delete(path, passes=1):
    with open(path, "ba+") as delfile:
        length = delfile.tell()
    with open(path, "br+") as delfile:
        for i in range(passes):
            delfile.seek(0)
            delfile.write(os.urandom(length))
    os.remove(path)
Shelling out to srm is likely to be faster, however.
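Usage of the function above is then simply, for example:
secure_delete("secrets.txt", passes=3)  # overwrite three times, then delete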
You can use srm, sure, but you can also easily implement it in Python. Refer to Wikipedia for the data patterns to overwrite the file content with. Observe that depending on the actual storage technology, the data patterns may need to be quite different. Furthermore, if your file is located on a log-structured file system, or even on a file system with copy-on-write optimisation like btrfs, your goal may be unachievable from user space.
After you are done mashing up the disk area that was used to store the file, remove the file itself with os.remove().
If you also want to erase any trace of the file name, you can try to allocate and deallocate a whole bunch of randomly named files in the same directory, though depending on the directory inode structure (linear, btree, hash, etc.) it may be very tough to guarantee you actually overwrote the old file name.
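To illustrate the filename point, a rough sketch (mine, not from the answer, and subject to all the file-system caveats above): overwrite the content, rename the file to an obscuring name of the same length, then remove it:
import os

def wipe_and_rename(path, passes=1):
    length = os.path.getsize(path)
    # Overwrite the contents in place and push the writes to disk.
    with open(path, "br+") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(length))
            f.flush()
            os.fsync(f.fileno())
    # Rename to an obscuring name of the same length (the real shred uses all
    # zeros), then remove the directory entry. Assumes the obscured name is
    # free; traces may still remain in the directory structure, as noted above.
    directory, name = os.path.split(path)
    obscured = os.path.join(directory, "0" * len(name))
    os.rename(path, obscured)
    os.remove(obscured)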
So, at least in Python 3, using @kindall's solution I only got it to append, meaning the entire contents of the file were still intact and every pass just added to the overall size of the file. It ended up being [Original Contents][Random Data of that Size][Random Data of that Size][Random Data of that Size], which is obviously not the desired effect.
This trickery worked for me though. I open the file in append mode to find the length, then reopen it in r+ so that I can seek to the beginning (in append mode, writes always go to the end of the file no matter where you seek, which is what caused the undesired effect).
So check this out:
import os

def secure_delete(path, passes=3):
    # Open in append mode just to find the file length.
    with open(path, "ba+", buffering=0) as delfile:
        length = delfile.tell()

    # Reopen in r+ so that seeking back to the start actually works.
    with open(path, "br+", buffering=0) as delfile:
        #print("Length of file:%s" % length)
        for i in range(passes):
            delfile.seek(0, 0)
            delfile.write(os.urandom(length))
            #wait = input("Pass %s Complete" % i)
        #wait = input("All %s Passes Complete" % passes)
        # Final pass: overwrite everything with zeros.
        delfile.seek(0)
        for x in range(length):
            delfile.write(b'\x00')
        #wait = input("Final Zero Pass Complete")

    # Note: the TRUE shred actually renames the file to all zeros (with the
    # length of the filename considered) to thwart metadata filename
    # collection; I didn't really care to implement that here.
    os.remove(path)
Un-comment the prompts to check the file after each pass. This looked good when I tested it, with the caveat that the filename is not shredded like the real shred -zu does.
The answers implementing a manual solution did not work for me. My solution is as follows; it seems to work okay.
import os

def secure_delete(path, passes=1):
    length = os.path.getsize(path)
    with open(path, "br+", buffering=-1) as f:
        for i in range(passes):
            f.seek(0)
            f.write(os.urandom(length))
    os.remove(path)  # finally remove the (now overwritten) file

Python open() modes and file writing

I'm learning PyGTK and I'm making a Text Editor (That seems to be the hello world of pygtk :])
Anyways, I have a "Save" function that writes the TextBuffer to a file. Looks something like
try:
    f = open(self.working_file_path, "rw+")
    buff = self._get_buffer()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
Basically, if the file exist, write the buffer to it, if not bring up the Save As Dialog.
My question is: What is the best way to "update" a file. I seem to only be able to append to the end of a file. I've tried various file modes, but I'm sure I'm missing something.
Thanks in advance!
You can open a file in "r+" mode, which allows you to both read and write to the file, and to seek to particular positions and write there. This probably doesn't help you do what I think you want though; it sounds like you're wanting to only write out the changed data?
Remember that on the disk the file isn't stored as a series of extensible lines, it's just a sequence of bytes; some of those bytes indicate line endings, but the next line follows on immediately. So if you edit the first line in the file and write the new first line out, then unless the new one happens to be exactly the same length as the old one, the second line won't be in the right place any more, so you'll need to move it (and have taken a copy of it first if the new line you wrote out was longer than the original). And this now means that the next line isn't in the right position either... and so on, until you've had to read in and write out the entire rest of the file.
In practice you almost never write only part of an existing file unless you can simply append more data; if you need to "alter" a file you read it in, alter it in memory, and write it back out or you read in the file in pieces (often line by line) and then write out to a new file as you go (and then possibly move the new file over the top of the original). The first approach is easiest, the second is better for not having to hold the whole thing in memory at once.
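As a sketch of the second approach (my illustration; transform_line is a hypothetical function applied to each line): stream the original line by line into a temporary file, then move the new file over the top of the original:
import os
import tempfile

def rewrite_file(path, transform_line):
    # Write the transformed lines to a temporary file in the same directory.
    directory = os.path.dirname(path) or '.'
    with open(path, 'r') as src, \
         tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as dst:
        for line in src:
            dst.write(transform_line(line))
    os.rename(dst.name, path)  # replace the original in one step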
At the point where you write to the file, your location is at the end of the file, so you need to seek back to the beginning. Then, you will overwrite the file, but this may leave old content at the end, so you also need to truncate the file.
Additionally, the mode you're specifying ('rw+') is invalid, and I get IOErrors when I try to do some operations on files opened with it. I believe that you want mode 'r+' ("Open for reading and writing. The stream is positioned at the beginning of the file."). 'w+' is similar, but would create the file if it didn't exist.
So, what you're looking for might be code like this:
try:
    f = open(self.working_file_path, "r+")
    buff = self._get_buffer()
    f.seek(0)
    f.truncate()
    f.write(self._get_text())
    #update modified flag
    buff.set_modified(False)
    f.close()
except IOError as e:
    print "File Doesnt Exist so bring up Save As..."
    ......
However, you may want to modify this code to correctly catch and handle errors while truncating and writing the file, rather than assuming that all IOErrors in this section are non-existent-file errors from the call to open.
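For example, a sketch (my addition; show_save_as_dialog is a hypothetical handler) that only falls back to Save As when the error really is a missing file:
import errno

try:
    f = open(self.working_file_path, "r+")
except IOError as e:
    if e.errno == errno.ENOENT:
        # File doesn't exist, so bring up Save As...
        show_save_as_dialog()   # hypothetical handler
    else:
        raise                   # some other I/O problem: don't hide it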
Read the file in as a list, add an element to the start of it, write it all out. Something like this.
f = open(self.working_file_path, "r+")
flist = f.readlines()
flist.insert(0, self._get_text())
f.seek(0)
f.writelines(flist)
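A slightly tidier variant of the same idea (my suggestion): use a with block so the file is closed even on errors, and truncate in case the rewritten content ever ends up shorter than the original:
with open(self.working_file_path, "r+") as f:
    flist = f.readlines()
    flist.insert(0, self._get_text())
    f.seek(0)
    f.writelines(flist)
    f.truncate()  # harmless here, but needed if the new content is shorter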

Asking a person for a file to save in

What I'm trying to do is to ask a user for a name of a file to make and then save some stuff in this file.
My portion of the program looks like this:
if saving == 1:
    ask=raw_input("Type the name file: ")
    fileout=open(ask.csv,"w")
    fileout.write(output)
I want the format to be .csv; I tried different options but can't seem to get it to work.
The issue here is that you need to pass open() a string. ask is a variable that contains a string, but we also want to append the other string ".csv" to it to make a filename. In Python, + is the concatenation operator for strings, so ask+".csv" means the contents of ask followed by .csv. What you currently have looks for the csv attribute of the ask variable, which will throw an AttributeError.
with open(ask+".csv", "w") as file:
    file.write(output)
You might also want to do a check first if the user has already typed the extension:
ask = ask if ask.endswith(".csv") else ask+".csv"
with open(ask, "w") as file:
    file.write(output)
Note my use of the with statement when opening files. It's good practice as it's more readable and ensures the file is closed properly, even on exceptions.
I am also using Python's ternary conditional expression here to do a simple variable assignment based on a condition (setting ask to itself if it already ends in ".csv", otherwise concatenating it).
Also, this presumes your output is already suitable for a CSV file; the extension alone won't make it CSV. When dealing with CSV data in general, you probably want to check out the csv module.
You need to use ask+'.csv' to concatenate the required extension onto the end of the user input.
However, simply naming the file with a .csv extension is not enough to make it a comma-separated file. You need to format the output. Use csv.writer to do that. The Python documentation has some simple examples of how to do this.
I advise you not to attempt to generate the formatted comma-separated output yourself. That's a surprisingly hard task and utterly pointless in the presence of the built-in functionality.
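For example, a minimal sketch (my example; the rows are made up, and it assumes Python 2 to match the raw_input above):
import csv

# Hypothetical rows; in your program this data would come from `output`.
rows = [["name", "score"], ["alice", 10], ["bob", 7]]

with open(ask + ".csv", "wb") as f:   # on Python 3, use open(..., "w", newline="")
    writer = csv.writer(f)
    writer.writerows(rows)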
Your variable ask is going to be of type string after the raw_input call.
So, if you want to append the extension .csv to it, you should do:
fileout = open(ask + ".csv", "w")
That should work.

How to copy a JSON file in another JSON file, with Python

I want to copy the contents of a JSON file in another JSON file, with Python
Any ideas ?
Thank you :)
Given the lack of research effort, I normally wouldn't answer, but given the poor suggestions in comments, I'll bite and give a better option.
Now, this largely depends on what you mean: do you wish to overwrite the contents of one file with another, or insert? The latter can be done like so:
with open("from.json", "r") as from, open("to.json", "r") as to:
to_insert = json.load(from)
destination = json.load(to)
destination.append(to_insert) #The exact nature of this line varies. See below.
with open("to.json", "w") as to:
json.dump(to, destination)
This uses python's json module, which allows us to do this very easily.
We open the two files for reading, then open the destination file again in writing mode to truncate it and write to it.
The marked line depends on the JSON data structure. Here I am appending it to the root list element (which might not exist), but you may want to place it at a particular dict key, or some such.
In the case of replacing the contents, it becomes easier:
with open("from.json", "r") as from, open("to.json", "w") as to:
to.write(from.read())
Here we literally just read the data out of one file and write it into the other file.
Of course, you may wish to check the data is JSON, in which case, you can use the JSON methods as in the first solution, which will throw exceptions on invalid data.
Another, arguably better, solution to this could also be shutil's copy methods, which would avoid actually reading or writing the file contents manually.
Using the with statement gives us the benefit of automatically closing our files - even if exceptions occur. It's best to always use them where we can.
Note that in versions of Python before 2.7, multiple context managers cannot be combined in a single with statement; instead, you will need to nest them:
with open("from.json", "r") as source:
    with open("to.json", "r+") as dest:
        ...
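For completeness, the shutil route mentioned above could look like this (a sketch; it copies the bytes verbatim and performs no JSON validation):
import shutil

# Overwrite to.json with an exact byte-for-byte copy of from.json.
shutil.copyfile("from.json", "to.json")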
