I am currently creating a file on each run of my application using the simple method
file = open('myfile.dat', 'w+')
However, I have noticed that this overwrites the file on each run. What I want to do is, if the file already exists, create a new file called myfilex.dat, where x is the number of previous copies of the file. Is there a quick and effective way of doing this?
Thanks :)
EDIT: I know how to check whether it already exists using the os.path.exists function; what I am asking is, if it does exist, how can I easily append the number of versions on the end? Sorry if that doesn't make sense.
You could use a timestamp, so that each time you execute the program it writes to a different file:
import time
file = open('myfile.%d.dat' % time.time(), 'w+')
You can do two things: either open with append, that is file = open('myfile.dat', 'a'), or check if the file exists and give the user the option to overwrite. Python has a number of options. You can check this question for enlightenment:
How do I check whether a file exists using Python?
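A minimal Python 3 sketch of the "give the user the option to overwrite" idea (the prompt wording is just an illustration):

import os

filename = 'myfile.dat'
if os.path.exists(filename):
    answer = input('%s already exists, overwrite? [y/N] ' % filename)
    if answer.lower() != 'y':
        raise SystemExit('Keeping the existing file.')
file = open(filename, 'w+')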
Consider
import os

def build_filename(name, num=0):
    root, ext = os.path.splitext(name)
    return '%s%d%s' % (root, num, ext) if num else name

def find_next_filename(name, max_tries=20):
    if not os.path.exists(name):
        return name
    else:
        for i in range(max_tries):
            test_name = build_filename(name, i + 1)
            if not os.path.exists(test_name):
                return test_name
        return None
If your filename doesn't exist, it'll return your filename.
If your filename does exist, it'll try rootX.extension, where root and extension are determined by os.path.splitext and X is an integer, starting at 1 and ending at max_tries (I had it default to 20, but you could change the default or pass a different argument).
If no file can be found, the function returns None.
Note, there are still race conditions present here (a file could be created by another process with a clashing name after your check), but it's what you said you wanted.
# When the file doesn't exist
print(find_next_filename('myfile.dat'))  # myfile.dat
# When the file does exist
print(find_next_filename('myfile.dat'))  # myfile1.dat
# When the file exists, as do myfile1.dat and myfile2.dat
print(find_next_filename('myfile.dat'))  # myfile3.dat
Nothing particularly quick, but effective? Sure! I'm used to a backup system where I do:
filename.ext
filename-1.ext # older
filename-2.ext # older still
filename-3.ext # even older
This is slightly harder than what you want to do. You want filename-N.ext to be the NEWEST file! Let's use glob to see how many files match that name, then make a new one!
from glob import glob
import os.path

num_files = len(glob(os.path.join(root, head, filename + "*" + ext)))
# where, for example:
# root = "C:\\"
# head = r"users\username\My Documents"
# filename = "myfile"
# ext = ".dat"

if num_files == 0:
    num_files = ""  # handles the case where the file doesn't exist AT ALL yet

with open(os.path.join(root, head, filename + str(num_files) + ext), 'w+') as f:
    do_stuff_to_file(f)
Here are a few solutions for everyone experiencing a similar problem.
Keep YOUR program from overwriting data:
with open('myfile.txt', 'a') as myfile:
    myfile.write('data')
Note: I believe that a+ (not a) allows for reading and writing, but I'm not 100% sure.
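For what it's worth, a quick sketch suggests that a+ does allow both (the file name here is just a throwaway example):

with open('demo.txt', 'a+') as f:
    f.write('data\n')  # writes always go to the end in append mode
    f.seek(0)          # rewind so the read below sees the whole file
    print(f.read())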
Prevent ALL programs from overwriting your data (by setting it to read-only):
from os import chmod
from stat import S_IREAD
chmod('path_to_file', S_IREAD)
Note: both of these modules are built into Python (at least as of Python 3.10.4), so there is no need to use pip.
Note 2: Setting the file to read-only is not the best idea on its own, as other programs can set it back. I would combine this with a hash and/or signature to verify the file has not been tampered with, 'invalidating' the data inside and requiring the user to regenerate the file (e.g. for temporary but very important data such as decryption keys that are stored after generation and before deletion).
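A minimal sketch of that hash idea using hashlib (the file name is a placeholder, and in practice you would store the digest somewhere the program can trust):

import hashlib

def file_sha256(path):
    # Hash the file in chunks so large files don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

stored_digest = file_sha256('myfile.txt')  # record this when you write the file

# ...later, before trusting the file again:
if file_sha256('myfile.txt') != stored_digest:
    raise ValueError('myfile.txt has changed; regenerate the data.')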
Just check to see if your file already exists then?
name = "myfile"
extension =".dat"
x = 0
fileName = name + extension
while(!os.path.exists(fileName)):
x = x + 1
fileName = name + x + extension
file = open(fileName, 'w+')
Related
You know how when you download something and the downloads folder contains a file with the same name, instead of overwriting it or throwing an error, the file ends up with a number appended to the end? For example, if I want to download my_file.txt, but it already exists in the target folder, the new file will be named my_file(2).txt. And if I try again, it will be my_file(3).txt.
I was wondering if there is a way in Python 3.x to check that and get a unique name (not necessarily create the file or directory). I'm currently implementing it doing this:
import os
def new_name(name, newseparator='_'):
    # name can be either a file or directory name
    base, extension = os.path.splitext(name)
    i = 2
    while os.path.exists(name):
        name = base + newseparator + str(i) + extension
        i += 1
    return name
In the example above, running new_name('my_file.txt') would return my_file_2.txt if my_file.txt already exists in the cwd. name can also contain the full or relative path; it will work as well.
I would use pathlib and do something along these lines:
from pathlib import Path
import os

def new_fn(fn, sep='_'):
    p = Path(fn)
    if p.exists():
        if not p.is_file():
            raise TypeError
        np = p.resolve(strict=True)
        parent = str(np.parent)
        extens = ''.join(np.suffixes)  # handle multiple extensions such as .tar.gz
        base = str(np.name).replace(extens, '')
        i = 2
        nf = os.path.join(parent, base + sep + str(i) + extens)
        while Path(nf).exists():
            i += 1
            nf = os.path.join(parent, base + sep + str(i) + extens)
        return nf
    else:
        return p.parent.resolve(strict=True) / p.name
This only handles files as written, but the same approach would work with directories (which you added later). I will leave that as a project for the reader.
Another way of getting a new name would be using the built-in tempfile module:
from pathlib import Path
from tempfile import NamedTemporaryFile
def new_path(path: Path, new_separator='_'):
    prefix = str(path.stem) + new_separator
    dir = path.parent
    suffix = ''.join(path.suffixes)
    with NamedTemporaryFile(prefix=prefix, suffix=suffix, delete=False, dir=dir) as f:
        return f.name
If you execute this function from within the Downloads directory, you will get something like:
>>> new_path(Path('my_file.txt'))
'/home/krassowski/Downloads/my_file_90_lv301.txt'
where the 90_lv301 part was generated internally by Python's tempfile module.
Note: with the delete=False argument, the function will create (and leave undeleted) an empty file with the new name. If you do not want an empty file created that way, just remove delete=False; however, keeping it will prevent anyone else from creating a new file with that name before your next operation (though they could still overwrite it).
Simply put, having delete=False prevents concurrency issues if you (or the end-user) were to run your program twice at the same time.
I have a script that downloads files (pdfs, docs, etc) from a predetermined list of web pages. I want to edit my script to alter the names of files with a trailing _x if the file name already exists, since it's possible files from different pages will share the same filename but contain different contents, and urlretrieve() appears to automatically overwrite existing files.
So far, I have:
import os

urlfile = 'https://www.foo.com/foo/foo/foo.pdf'
filename = urlfile.split('/')[-1]
# filename is now 'foo.pdf'
if os.path.exists(filename):
    filename = filename.split('.')[0] + '_' + '1' + '.pdf'
That works fine for one occurrence, but it looks like after one foo_1.pdf it will start saving as foo_1_1.pdf, and so on. I would like to save the files as foo_1.pdf, foo_2.pdf, and so on.
Can anybody point me in the right direction on how I can ensure that file names are stored in the correct fashion as the script runs?
Thanks.
So what you want is something like this:
curName = "foo_0.pdf"
while os.path.exists(curName):
num = int(curName.split('.')[0].split('_')[1])
curName = "foo_{}.pdf".format(str(num+1))
Here's the general scheme:
Assume you start from the first file name (foo_0.pdf)
Check if that name is taken
If it is, iterate the name by 1
Continue looping until you find a name that isn't taken
One alternative: Generate a list of file numbers that are in use, and update it as needed. If it's sorted you can say name = "foo_{}.pdf".format(flist[-1]+1). This has the advantage that you don't have to run through all the files every time (as the above solution does). However, you need to keep the list of numbers in memory. Additionally, this will not fill any gaps in the numbers.
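A rough sketch of that alternative, assuming the foo_N.pdf naming used above:

import glob
import re

# Collect the numbers already in use, once at startup.
flist = sorted(
    int(m.group(1))
    for m in (re.search(r'foo_(\d+)\.pdf$', n) for n in glob.glob('foo_*.pdf'))
    if m
)

next_num = flist[-1] + 1 if flist else 0
name = "foo_{}.pdf".format(next_num)
flist.append(next_num)  # keep the in-memory list current for the next file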
Why not just use the tempfile module:
import tempfile

fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='', delete=False)
Now your filename will be available in fileobj.name and you can manipulate to your heart's content. As an added benefit, this is cross-platform.
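For example, a small sketch of that flow (the prefix, directory, and final name here are just placeholders):

import os
import tempfile

fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='download_',
                                      dir='.', delete=False)
print(fileobj.name)    # e.g. ./download_k3j2x1.pdf
fileobj.write(b'...')  # write the downloaded bytes here
fileobj.close()
os.rename(fileobj.name, 'foo_1.pdf')  # move it to whatever final name you pick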
Since you're dealing with multiple pages, this seems more like a "global archive" than a per-page archive. For a per-page archive, I would go with the answer from #wnnmaw
For a global archive, I would take a different approach...
Create a directory for each filename
Store the file in the directory as "1" + extension
Write the current "number" to the directory as "_files.txt"
Additional files are written as 2, 3, 4, etc., and the value in _files.txt is incremented
The benefits of this:
The directory is the original filename. If you keep turning "Example-1.pdf" into "Example-2.pdf" you run into a possibility where you download a real "Example-2.pdf", and can't associate it to the original filename.
You can grab the number of like-named files either by reading _files.txt or counting the number of files in the directory.
Personally, I'd also suggest storing the files in a tiered bucketing system, so that you don't have too many files/directories in any one directory (hundreds of files makes it annoying as a user; thousands of files can affect OS performance). A bucketing system might turn a filename into a hexdigest, then drop the file into "/%s/%s/%s" % (hex[0:3], hex[3:6], filename). The hexdigest is used to give you a more even distribution of characters.
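A rough sketch of that bucketing idea (the digest algorithm and the 3-character split are just one possible choice):

import hashlib
import os

def bucketed_path(root, filename):
    # Hash the name so files spread evenly across the nested directories.
    digest = hashlib.md5(filename.encode('utf-8')).hexdigest()
    return os.path.join(root, digest[0:3], digest[3:6], filename)

path = bucketed_path('archive', 'Example-1.pdf')
os.makedirs(os.path.dirname(path), exist_ok=True)
# path is now something like archive/<3 hex chars>/<3 hex chars>/Example-1.pdf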
import os
def uniquify(path, sep=''):
    path = os.path.normpath(path)
    num = 0
    newpath = path
    dirname, basename = os.path.split(path)
    filename, ext = os.path.splitext(basename)
    while os.path.exists(newpath):
        newpath = os.path.join(dirname, '{f}{s}{n:d}{e}'
                               .format(f=filename, s=sep, n=num, e=ext))
        num += 1
    return newpath
filename = uniquify('foo.pdf', sep='_')
Possible problems with this include:
If you call uniquify many thousands of times with the same path, each subsequent call may get a bit slower, since the while-loop starts checking from num=0 each time.
uniquify is vulnerable to race conditions whereby a file may not exist at the time os.path.exists is called, but may exist at the time you use the value returned by uniquify. Use tempfile.NamedTemporaryFile to avoid this problem. You won't get incremental numbering, but you will get files with unique names, guaranteed not to already exist. You could use the prefix parameter to specify the original name of the file. For example,
import tempfile
import os
def uniquify(path, sep='_', mode='w'):
    path = os.path.normpath(path)
    if os.path.exists(path):
        dirname, basename = os.path.split(path)
        filename, ext = os.path.splitext(basename)
        return tempfile.NamedTemporaryFile(prefix=filename + sep, suffix=ext,
                                           delete=False, dir=dirname, mode=mode)
    else:
        return open(path, mode)
Which could be used like this:
In [141]: f = uniquify('/tmp/foo.pdf')
In [142]: f.name
Out[142]: '/tmp/foo_34cvy1.pdf'
Note that to prevent a race-condition, the opened filehandle -- not merely the name of the file -- is returned.
I have a Python script that is trying to compare two files to each other and output the difference. However, I am not sure exactly what is going on, because when I run the script it gives me this error:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\api\\API_TEST\\Apis.os\\*.*'
I don't know why it is appending *.* at the end of the file extension.
This is currently my function:
def CheckFilesLatest(self, previous_path, latest_path):
    for filename in os.listdir(latest_path):
        previous_filename = os.path.join(previous_path, filename)
        latest_filename = os.path.join(latest_path, filename)
        if self.IsValidOspace(latest_filename):
            for os_filename in os.listdir(latest_filename):
                name, ext = os.path.splitext(os_filename)
                if ext == ".os":
                    previous_os_filename = os.path.join(previous_filename, os_filename)
                    latest_os_filename = os.path.join(latest_filename, os_filename)
                    if os.path.isfile(latest_os_filename) == True:
                        # If the file exists in both directories, check if the files are different;
                        # otherwise mark the contents of the latest file as added.
                        if os.path.isfile(previous_os_filename) == True:
                            self.GetFeaturesModified(previous_os_filename, latest_os_filename)
                        else:
                            self.GetFeaturesAdded(latest_os_filename)
        else:
            if os.path.isdir(latest_filename):
                self.CheckFilesLatest(previous_filename, latest_filename)
Any thoughts on why it can't scan the directory and look for a .os file, for example?
It is failing on line:
for os_filename in os.listdir(latest_filename):
The code first gets called from
def main():
    for i in range(6, arg_length, 2):
        component = sys.argv[i]
        package = sys.argv[i+1]
        previous_source_dir = os.path.join(previous_path, component, package)
        latest_source_dir = os.path.join(latest_path, component, package)
        x.CheckFilesLatest(previous_source_dir, latest_source_dir)
        x.CheckFilesPrevious(previous_source_dir, latest_source_dir)
Thank you
os.listdir() requires that the path you give it be a directory, as you have stated. Here, latest_filename is built from latest_path, which is passed in as an argument, so you need to look at the code that actually builds those paths to see how a file ended up being treated as a directory. Since you are calling the function recursively, first check the original call. It appears the code ends up calling os.listdir() on 'C:\api\API_TEST\Apis.os', which is a file, not a directory; the trailing '\*.*' is just the wildcard that the Windows directory-listing call appends internally, which is why it shows up in the error message. You would need to split out the file part first and then do the directory check.
If you want to browse a directory recursively, using os.walk would be better and simpler than your complex handling with recursive function calls. Take a look at the docs: http://docs.python.org/2/library/os.html#os.walk
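For example, a minimal sketch of walking the tree and picking out the .os files (the root directory here is a placeholder):

import os

latest_path = r"C:\api\API_TEST"  # placeholder root

for dirpath, dirnames, filenames in os.walk(latest_path):
    for name in filenames:
        if os.path.splitext(name)[1] == ".os":
            latest_os_filename = os.path.join(dirpath, name)
            print(latest_os_filename)  # compare against the previous tree here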
I need to update a file. I read it in and write it out with changes. However, I'd prefer to write to a temporary file and rename it into place.
temp = tempfile.NamedTemporaryFile()
tempname = temp.name
temp.write(new_data)
temp.close()
os.rename(tempname, data_file_name)
The problem is that tempfile.NamedTemporaryFile() makes the temporary file in /tmp, which is a different file system. This means os.rename() fails. If I use shutil.move() instead then I don't have the atomic update that mv provides (for files in the same file system, yadda, yadda, etc.)
I know tempfile.NamedTemporaryFile() takes a dir parameter, but data_file_name might be "foo.txt" in which case dir='.'; or data_file_name might be "/path/to/the/data/foo.txt" in which case dir="/path/to/the/data".
What I'd really like is the temp file to be data_file_name + "some random data". This would have the benefit of failing in a way that would leave behind useful clues.
Suggestions?
You can use:
prefix to make the temporary file begin with the same name as the original file.
dir to specify where to place the temporary file.
os.path.split to split the directory from the filename.
import tempfile
import os
dirname, basename = os.path.split(filename)
temp = tempfile.NamedTemporaryFile(prefix=basename, dir=dirname)
print(temp.name)
You can pass a file location in the 'dir' constructor parameter.
It works as you wish.
>>> t = tempfile.NamedTemporaryFile(dir="/Users/rafal")
>>> t.name
'/Users/rafal/tmplo45Js'
Source: http://docs.python.org/library/tempfile.html#tempfile.NamedTemporaryFile
To meet your whole checklist, I think you'd want to use...
temp = tempfile.NamedTemporaryFile(prefix=data_file_name, dir=path,
delete=False)
It's important to have delete=False, because otherwise:
[...] If delete is true (the default), the file is deleted as soon as it is
closed.
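Putting that together, a sketch of the write-then-rename flow (data_file_name and new_data stand in for your real values; note that os.rename will not replace an existing file on Windows):

import os
import tempfile

data_file_name = 'foo.txt'       # placeholder
new_data = 'updated contents\n'  # placeholder

dirname, basename = os.path.split(os.path.abspath(data_file_name))
temp = tempfile.NamedTemporaryFile(prefix=basename + '.', dir=dirname,
                                   delete=False, mode='w')
temp.write(new_data)
temp.close()
os.rename(temp.name, data_file_name)  # same filesystem, so the rename stays atomic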
I use the current time as "some random data" appended to a base string for a unique temporary filename:
import time
temp_file_name = data_file_name + str(time.time())
The tempfile module that you use provides a secure way of managing temporary files. If you really want to use your own system, you should just be aware that it could be vulnerable to attacks (particularly symlink attacks).
A simple way to generate a temporary unique filename (albeit a rather long name) is:
import uuid
import os

tempfilename = 'myprefix-%s.dat' % str(uuid.uuid4())
with open(tempfilename, 'w+') as tmpfile:
    pass  # do stuff with tmpfile
os.remove(tempfilename)
But this is a bit hackish; really rather consider using the tempfile module with the correct prefix and dir parameters passed to NamedTemporaryFile, as described in the other answers.
Still 'diving in' to Python, and want to make sure I'm not overlooking something. I wrote a script that extracts files from several zip files, and saves the extracted files together in one directory. To prevent duplicate filenames from being over-written, I wrote this little function - and I'm just wondering if there is a better way to do this?
Thanks!
def unique_filename(file_name):
    counter = 1
    file_name_parts = os.path.splitext(file_name)  # returns ('/path/file', '.ext')
    while os.path.isfile(file_name):
        file_name = file_name_parts[0] + '_' + str(counter) + file_name_parts[1]
        counter += 1
    return file_name
I really do require the files to be in a single directory, and numbering duplicates is definitely acceptable in my case, so I'm not looking for a more robust method (tho' I suppose any pointers are welcome), but just to make sure that what this accomplishes is getting done the right way.
One issue is that there is a race condition in your above code, since there is a gap between testing for existence and creating the file. There may be security implications to this (think about someone maliciously inserting a symlink to a sensitive file which they wouldn't be able to overwrite, but which your program, running with higher privileges, could). Attacks like these are why things like os.tempnam() are deprecated.
To get around it, the best approach is to actually try to create the file in such a way that you'll get an exception if it fails, and on success, return the actually opened file object. This can be done with the lower-level os.open function, by passing both the os.O_CREAT and os.O_EXCL flags. Once opened, return the actual file (and optionally filename) you created. E.g., here's your code modified to use this approach (returning a (file, filename) tuple):
import os

def unique_file(file_name):
    counter = 1
    file_name_parts = os.path.splitext(file_name)  # returns ('/path/file', '.ext')
    while 1:
        try:
            fd = os.open(file_name, os.O_CREAT | os.O_EXCL | os.O_RDWR)
            return os.fdopen(fd, 'w+'), file_name
        except OSError:
            pass
        file_name = file_name_parts[0] + '_' + str(counter) + file_name_parts[1]
        counter += 1
[Edit] Actually, a better way, which will handle the above issues for you, is probably to use the tempfile module, though you may lose some control over the naming. Here's an example of using it (keeping a similar interface):
import os
import tempfile

def unique_file(file_name):
    dirname, filename = os.path.split(file_name)
    prefix, suffix = os.path.splitext(filename)
    fd, filename = tempfile.mkstemp(suffix, prefix + "_", dirname)
    return os.fdopen(fd), filename
>>> f, filename=unique_file('/home/some_dir/foo.txt')
>>> print filename
/home/some_dir/foo_z8f_2Z.txt
The only downside with this approach is that you will always get a filename with some random characters in it, as there's no attempt to create an unmodified file (/home/some_dir/foo.txt) first.
You may also want to look at tempfile.TemporaryFile and NamedTemporaryFile, which will do the above and also automatically delete from disk when closed.
Yes, this is a good strategy for readable but unique filenames.
One important change: You should replace os.path.isfile with os.path.lexists! As it is written right now, if there is a directory named /foo/bar.baz, your program will try to overwrite that with the new file (which won't work)... since isfile only checks for files and not directories. lexists checks for directories, symlinks, etc... basically if there's any reason that filename could not be created.
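A quick illustration of the difference, using a throwaway directory whose name looks like a file:

import os

os.makedirs('foo/bar.baz', exist_ok=True)  # a directory named like a file

print(os.path.isfile('foo/bar.baz'))   # False - isfile ignores directories
print(os.path.lexists('foo/bar.baz'))  # True  - lexists reports anything at that path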
EDIT: #Brian gave a better answer, which is more secure and robust in terms of race conditions.
Two small changes...
base_name, ext = os.path.splitext(file_name)
You get two results with distinct meaning, give them distinct names.
file_name = "%s_%d%s" % (base_name, str(counter), ext)
It isn't faster or significantly shorter. But when you want to change your file name pattern, the pattern is in one place and slightly easier to work with.
If you want readable names this looks like a good solution.
There are routines to return unique file names for eg. temp files but they produce long random looking names.
If you don't care about readability, uuid.uuid4() is your friend.
import uuid
def unique_filename(prefix=None, suffix=None):
    fn = []
    if prefix:
        fn.extend([prefix, '-'])
    fn.append(str(uuid.uuid4()))
    if suffix:
        fn.extend(['.', suffix.lstrip('.')])
    return ''.join(fn)
How about
def ensure_unique_filename(orig_file_path):
    from time import time
    import os

    if os.path.lexists(orig_file_path):
        name, ext = os.path.splitext(orig_file_path)
        orig_file_path = name + str(time()).replace('.', '') + ext

    return orig_file_path
time() returns the current time in seconds, as a float with sub-second precision. Combined with the original filename, it's fairly unique even in complex multithreaded cases.