You know how when you download something and the downloads folder contains a file with the same name, instead of overwriting it or throwing an error, the file ends up with a number appended to the end? For example, if I want to download my_file.txt, but it already exists in the target folder, the new file will be named my_file(2).txt. And if I try again, it will be my_file(3).txt.
I was wondering if there is a way in Python 3.x to check that and get a unique name (not necessarily create the file or directory). I'm currently implementing it doing this:
import os

def new_name(name, newseparator='_'):
    # name can be either a file or directory name
    base, extension = os.path.splitext(name)
    i = 2
    while os.path.exists(name):
        name = base + newseparator + str(i) + extension
        i += 1
    return name
In the example above, running new_name('my_file.txt') would return my_file_2.txt if my_file.txt already exists in the cwd. name can also contain a full or relative path; that works as well.
I would use pathlib and do something along these lines:
from pathlib import Path

def new_fn(fn, sep='_'):
    p = Path(fn)
    if p.exists():
        if not p.is_file():
            raise TypeError
        np = p.resolve(strict=True)
        parent = np.parent
        extens = ''.join(np.suffixes)  # handle multiple extensions such as .tar.gz
        base = np.name[:len(np.name) - len(extens)]  # file name without its extensions
        i = 2
        nf = parent / (base + sep + str(i) + extens)
        while nf.exists():
            i += 1
            nf = parent / (base + sep + str(i) + extens)
        return nf
    else:
        # the name is free: return it as an absolute path
        return p.parent.resolve(strict=True) / p.name
This only handles files as written but the same approach would work with directories (which you added later.) I will leave that as a project for the reader.
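The directory case left to the reader could be handled along these lines. This is only a sketch: the helper name unique_path is hypothetical, and treating a directory name as having no extension is an assumption rather than something the answer specifies.

```python
from pathlib import Path

def unique_path(target, sep="_"):
    """Return a Path that does not exist yet, for a file or a directory.

    Hypothetical helper; numbering starts at 2, as in the question.
    """
    p = Path(target)
    if not p.exists():
        return p
    if p.is_dir():
        # Assumption: directory names are never split, so "v1.2" stays whole.
        stem, ext = p.name, ""
    else:
        ext = "".join(p.suffixes)  # e.g. ".tar.gz"
        stem = p.name[:len(p.name) - len(ext)]
    i = 2
    while True:
        candidate = p.with_name(f"{stem}{sep}{i}{ext}")
        if not candidate.exists():
            return candidate
        i += 1
```

Like the original, this only reports a free name; nothing is created, so another process can still take the name before you use it.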
Another way of getting a new name would be using the built-in tempfile module:
from pathlib import Path
from tempfile import NamedTemporaryFile

def new_path(path: Path, new_separator='_'):
    suffix = ''.join(path.suffixes)
    # strip every suffix from the name, so 'a.tar.gz' gives the prefix 'a_'
    prefix = path.name[:len(path.name) - len(suffix)] + new_separator
    directory = path.parent  # avoid shadowing the built-in dir()
    with NamedTemporaryFile(prefix=prefix, suffix=suffix, delete=False, dir=directory) as f:
        return f.name
If you execute this function from within the Downloads directory, you will get something like:
>>> new_path(Path('my_file.txt'))
'/home/krassowski/Downloads/my_file_90_lv301.txt'
where the 90_lv301 part was generated internally by Python's tempfile module.
Note: with the delete=False argument, the function will create (and leave undeleted) an empty file with the new name. If you do not want to have an empty file created that way, just remove the delete=False, however keeping it will prevent anyone else from creating a new file with such name before your next operation (though they could still overwrite it).
Simply put, having delete=False prevents concurrency issues if you (or the end-user) were to run your program twice at the same time.
I have a Python script that compares existing file names in a folder to a reference table and then determines if it needs to be renamed or not.
As it loops through each filename:
'oldname' = the current file name
'newname' = what it needs to be renamed to
I want to rename the file and move it to a new folder "..\renamedfiles"
Can I do the rename and the move at the same time as it iterates through the loop?
Yes, you can do this. In Python you can use the move function in the shutil library to achieve this.
Let's say on Linux, you have a file in /home/user/Downloads folder named "test.txt" and you want to move it to /home/user/Documents and also change the name to "useful_name.txt". You can do both things in the same line of code:
import shutil
shutil.move('/home/user/Downloads/test.txt', '/home/user/Documents/useful_name.txt')
In your case you can do this:
import os
import shutil

# oldname and newname are the variables from your loop
shutil.move(oldname, os.path.join('..', 'renamedfiles', newname))
os.rename (and os.replace) won't work if the source and target locations are on different partitions/drives/devices. If that's the case, you need to use shutil.move, which will use atomic renaming if possible, and fall back to copy-then-delete if the destination is not on the same file system. It's perfectly happy to both move and rename in the same operation; the operation is the same regardless.
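Inside the loop from the question, this could look roughly like the following. It is a sketch under assumptions: the function name move_and_rename is made up, and the renames dictionary stands in for the reference table, whose real format the question does not show.

```python
import shutil
from pathlib import Path

def move_and_rename(folder, renames, dest_name="renamedfiles"):
    """Move each listed file to ../dest_name under its new name.

    `renames` maps old file names to new ones (a hypothetical stand-in
    for the reference table from the question).
    """
    folder = Path(folder).resolve()
    dest = folder.parent / dest_name
    dest.mkdir(exist_ok=True)
    moved = []
    for oldname, newname in renames.items():
        source = folder / oldname
        if source.is_file():
            # One call both relocates the file and gives it its new name.
            shutil.move(str(source), str(dest / newname))
            moved.append(newname)
    return moved
```

Calling move_and_rename('.', {'oldname': 'newname'}) would move ./oldname to ../renamedfiles/newname; names missing from the folder are skipped.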
To do both of the operations, you can use the os.rename(src, dest) function.
You should have the wanted directory to save the file in, and the new file name. You can do this for every file you run across in your loop.
For example:
import os

# In Windows
dest_dir = "tmp\\2"
new_name = "bar.txt"
current_file_name = "tmp\\1\\foo.txt"
os.rename(current_file_name, os.path.join(dest_dir, new_name))
The rename function allows you to change the name of the file and its folder at the same time.
To prevent any errors in renaming and moving of the file, use shutil.move.
Since Python 3.4, working with paths is done easily with pathlib. Moving/renaming a file is done with rename or replace (which will unconditionally overwrite an existing target). So combining it with the parent attribute and the / operator, you can do:
from pathlib import Path
source = Path("path/to/file/oldname")
target = source.replace(source.parent / "renames" / "newname")
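One thing the snippet above takes for granted is that the target folder already exists. A minimal sketch that creates it first, assuming a hypothetical helper name move_into_subfolder:

```python
from pathlib import Path

def move_into_subfolder(source, subfolder, newname):
    """Move `source` into a sibling folder under a new name.

    Hypothetical helper: the folder is created if it is missing.
    """
    source = Path(source)
    target_dir = source.parent / subfolder
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / newname
    # Path.replace overwrites an existing target without asking.
    source.replace(target)
    return target
```

Like os.replace, this is expected to fail when source and target sit on different file systems; shutil.move is the safer choice there.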
Create a Python file in your desired directory and write something like this:
import os

for filename in os.listdir("."):
    if(filename ...):
        newFilename = ...
        os.rename(filename, newFilename)
I am currently creating a file on each run of my application using the simple method
file = open('myfile.dat', 'w+')
However, I have noticed that this overwrites the file on each run. What I want to do is, if it already exists, create a new file called myfilex.dat, where x is the number of previous copies of the file. Is there a quick and effective way of doing this?
Thanks :)
EDIT: I know how to check whether it already exists using the os.path.exists function, but I am asking, if it does exist, how I can easily append the number of versions to the end. Sorry if that does not make sense.
You could use a timestamp, so that each time you execute the program it will write to a different file:
import time
file = open('myfile.%d.dat' % time.time(), 'w+')
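Because time.time() is truncated to whole seconds here, two runs within the same second would still collide. A possible variant with a readable, microsecond-resolution timestamp is sketched below; the function name timestamped_name and the exact format are assumptions.

```python
from datetime import datetime

def timestamped_name(stem="myfile", ext=".dat", now=None):
    """Build a file name such as myfile.20240131-235959-123456.dat.

    Hypothetical helper; `now` can be passed in to make the result
    reproducible, otherwise the current local time is used.
    """
    now = now or datetime.now()
    return f"{stem}.{now:%Y%m%d-%H%M%S-%f}{ext}"
```

Names built this way also sort chronologically, which the plain epoch number only does as long as the digit count stays the same.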
You can do two things: either open with append, that is file = open('myfile.dat', 'a'), or check whether the file exists and give the user the option to overwrite. Python has a number of options. You can check this question for enlightenment:
How do I check whether a file exists using Python?
Consider
import os

def build_filename(name, num=0):
    root, ext = os.path.splitext(name)
    return '%s%d%s' % (root, num, ext) if num else name

def find_next_filename(name, max_tries=20):
    if not os.path.exists(name):
        return name
    else:
        for i in range(max_tries):
            test_name = build_filename(name, i+1)
            if not os.path.exists(test_name):
                return test_name
        return None
If your filename doesn't exist, it'll return your filename.
If your filename does exist, it'll try rootX.extension where root and extension are determined by os.path.splitext and X is an integer, starting at 1 and ending at max_tries (I had it default to 20, but you could change the default or pass a different argument).
If no free name can be found, the function returns None.
Note, there are still race conditions present here (a file may be created by another process with a clashing name after your check), but it's what you said you wanted.
# When the file doesn't exist
print(find_next_filename('myfile.dat')) # myfile.dat
# When the file does exist
print(find_next_filename('myfile.dat')) # myfile1.dat
# When the file does exist, as does "1" and "2"
print(find_next_filename('myfile.dat')) # myfile3.dat
Nothing particularly quick, but effective? Sure! I'm used to a backup system where I do:
filename.ext
filename-1.ext # older
filename-2.ext # older still
filename-3.ext # even older
This is slightly harder than what you want to do. You want filename-N.ext to be the NEWEST file! Let's use glob to see how many files match that name, then make a new one!
from glob import glob
import os.path

# where:
root = "C:\\"
head = r"users\username\My Documents"
filename = "myfile"
ext = ".dat"

num_files = len(glob(os.path.join(root, head, filename + "*" + ext)))
if num_files == 0:
    num_files = ""  # handles the case where file doesn't exist AT ALL yet
with open(os.path.join(root, head, filename + str(num_files) + ext), 'w+'):
    do_stuff_to_file()
Here are a few solutions for everyone experiencing a similar problem.
Keep YOUR program from overwriting data:
with open('myfile.txt', 'a') as myfile:
    myfile.write('data')
Note: a+ (not a) opens the file for both reading and appending.
Prevent ALL programs from overwriting your data (by setting it to read-only):
from os import chmod
from stat import S_IREAD
chmod('path_to_file', S_IREAD)
Note: both of these modules are part of Python's standard library (at least in Python 3.10.4), so there is no need to use pip.
Note 2: Setting it to read-only is not the best idea, as programs can set it back. I would combine this with a hash and/or signature to verify the file has not been tampered with to 'invalidate' the data inside and require the user to re-generate the file (eg, to store any temporary but very important data like decryption keys after generating them before deleting them).
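The hash check suggested in Note 2 might look roughly like this. It is a sketch only: the function names are hypothetical, SHA-256 is an assumed choice, and where the expected digest is stored is left open.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path, expected_digest):
    """True if the file still matches the digest recorded earlier."""
    return file_digest(path) == expected_digest
```

You would record file_digest(path) right after writing the file, and treat the data as invalid whenever is_unchanged later returns False.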
Just check to see if your file already exists then?
import os

name = "myfile"
extension = ".dat"
x = 0
fileName = name + extension
# keep counting up while the candidate name is already taken
while os.path.exists(fileName):
    x = x + 1
    fileName = name + str(x) + extension
file = open(fileName, 'w+')
I know it's a noob question but I have some difficulties making it work
def create(file):
    f = open(file,'w')
it returns "IOError: [Errno 2] No such file or directory: "
If I do that it works of course:
file ="myfile"
f = open(file,'w')
But I can't figure out how to create my file from the function parameter
Sorry for the noob question, thanks in advance for your help.
When you pass "http://somesite.com/" as file to your function, Python treats it as a directory structure.
As soon as Python gets to "http:/" it presumes we have a directory. Forward slashes are not allowed in file names on Unix, and I imagine it is the same on Windows.
To turn the name into something usable, you can use some variation of urlsplit from urllib.parse (the module was called urlparse in Python 2):
from urllib.parse import urlsplit

def parse(f):
    prse = urlsplit(f)
    return prse.netloc if f.startswith("http") else prse.path.split("/", 1)[0]
Sites can look like paths to directories to the operating system. For instance, stackoverflow.com/something will be interpreted as a directory stackoverflow.com in which there is a file something.
You can see this when you use os.path.dirname:
>>> os.path.dirname('stackoverflow.com/something')
'stackoverflow.com'
If this is indeed the case, and you still want to proceed, you're passing a path to a location in a directory and not just a file name.
You have to make sure the directory stackoverflow.com exists first:
import os

file_path = 'stackoverflow.com/something'
dirname = os.path.dirname(file_path)
if not os.path.exists(dirname):
    # if the stackoverflow.com directory does not exist, it will be created
    os.makedirs(dirname)
# .. carry on to open file_path and use it.
Watch out for http:// and the like, and consider using a real URL parser.
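Using a real URL parser to get a usable file name could look something like this. It is a sketch, not a complete sanitizer: the function name filename_from_url, the fallback "index" and the allowed character set are all assumptions.

```python
import re
from urllib.parse import urlsplit

def filename_from_url(url, default="index"):
    """Derive a plain file name from a URL (hypothetical helper).

    Uses the last path segment if there is one, otherwise the host name.
    """
    # Without "//" urlsplit would treat the host as part of the path.
    parts = urlsplit(url if "//" in url else "//" + url)
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    name = last_segment or parts.netloc or default
    # Assumption: keep a conservative set of characters, replace the rest.
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)
```

With this, "http://somesite.com/" gives somesite.com and "www.google.com/morestuff/things" gives things, so open(filename_from_url(url), 'w') no longer needs any directory to exist.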
Tip: file is already a built-in name in Python 2, so you shouldn't shadow it by using it as a variable name.
Editing:
def create(file):
    f = open(file,'w')
    f.close()
If you call this function using:
create('myfile.txt')
It will create a file named myfile.txt in whatever directory the code is being run from. Note that you are passing in a string not an object.
Since I now see you are passing in a string similar to http://www.google.com, you are trying to create a file named www.google.com in the http: folder. You are going to have to truncate or change the / since Windows files cannot contain that character in their names.
We'll use everything after the last / in this example:
import re

def create(filename):
    # strip everything up to and including the last /
    filename = re.sub(r'.*/', '', filename)
    f = open(filename, 'w')
    f.close()
So calling: create('www.google.com/morestuff/things') will create a file called things
I have a script that downloads files (pdfs, docs, etc) from a predetermined list of web pages. I want to edit my script to alter the names of files with a trailing _x if the file name already exists, since it's possible files from different pages will share the same filename but contain different contents, and urlretrieve() appears to automatically overwrite existing files.
So far, I have:
import os

urlfile = 'https://www.foo.com/foo/foo/foo.pdf'
filename = urlfile.split('/')[-1]  # filename == 'foo.pdf'
if os.path.exists(filename):
    name, ext = os.path.splitext(filename)
    filename = name + '_1' + ext
That works fine for one occurrence, but it looks like after one foo_1.pdf it will start saving as foo_1_1.pdf, and so on. I would like to save the files as foo_1.pdf, foo_2.pdf, and so on.
Can anybody point me in the right direction on how I can ensure that file names are stored in the correct fashion as the script runs?
Thanks.
So what you want is something like this:
import os

curName = "foo_0.pdf"
while os.path.exists(curName):
    num = int(curName.split('.')[0].split('_')[1])
    curName = "foo_{}.pdf".format(str(num+1))
Here's the general scheme:
Assume you start from the first file name (foo_0.pdf)
Check if that name is taken
If it is, iterate the name by 1
Continue looping until you find a name that isn't taken
One alternative: Generate a list of file numbers that are in use, and update it as needed. If it's sorted you can say name = "foo_{}.pdf".format(flist[-1]+1). This has the advantage that you don't have to run through all the files every time (as the above solution does). However, you need to keep the list of numbers in memory. Additionally, this will not fill any gaps in the numbers
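A variation on that alternative is to rebuild the list of used numbers from the directory instead of keeping it in memory. A rough sketch, assuming the foo_N.pdf naming from the answer and a hypothetical function name:

```python
import os
import re

def next_numbered_name(directory, stem="foo", ext=".pdf", sep="_"):
    """Return stem_N.ext where N is one more than the highest N in use.

    Hypothetical helper; like the list-based idea, it does not fill gaps.
    """
    pattern = re.compile(
        re.escape(stem + sep) + r"(\d+)" + re.escape(ext) + r"\Z"
    )
    used = [
        int(match.group(1))
        for match in map(pattern.match, os.listdir(directory))
        if match
    ]
    # With nothing in use yet, numbering starts at 0 (foo_0.pdf).
    return f"{stem}{sep}{max(used, default=-1) + 1}{ext}"
```

This needs one directory listing per call rather than one os.path.exists check per candidate name.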
Why not just use the tempfile module:
import tempfile

fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='', delete=False)
Now your filename will be available in fileobj.name and you can manipulate to your heart's content. As an added benefit, this is cross-platform.
Since you're dealing with multiple pages, this seems more like a "global archive" than a per-page archive. For a per-page archive, I would go with the answer from @wnnmaw.
For a global archive, I would take a different approach:
Create a directory for each filename.
Store the file in the directory as "1" + extension.
Write the current "number" to the directory as "_files.txt".
Additional files are written as 2, 3, 4, etc., and increment the value in _files.txt.
The benefits of this:
The directory is the original filename. If you keep turning "Example-1.pdf" into "Example-2.pdf" you run into a possibility where you download a real "Example-2.pdf", and can't associate it to the original filename.
You can grab the number of like-named files either by reading _files.txt or counting the number of files in the directory.
Personally, I'd also suggest storing the files in a tiered bucketing system, so that you don't have too many files/directories in any one directory (hundreds of files make it annoying for a user, thousands of files can affect OS performance). A bucketing system might turn a filename into a hexdigest, then drop the file into "/%s/%s/%s" % (hex[0:3], hex[3:6], filename). The hexdigest is used to give you a more even distribution of characters.
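The bucketing idea could be sketched as follows. The answer does not say which hash to use, so SHA-256 is an assumption here, as is the function name bucketed_path.

```python
import hashlib
import os

def bucketed_path(root, filename):
    """Place filename two levels deep, keyed by the hash of its name.

    Hypothetical helper: returns root/abc/def/filename, where "abcdef"
    are the first six hex digits of the name's SHA-256 digest.
    """
    hexdigest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return os.path.join(root, hexdigest[0:3], hexdigest[3:6], filename)
```

The same name always maps to the same bucket, so a file can be found again without any index. The directories still have to be created, for example with os.makedirs(os.path.dirname(path), exist_ok=True).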
import os

def uniquify(path, sep=''):
    path = os.path.normpath(path)
    num = 0
    newpath = path
    dirname, basename = os.path.split(path)
    filename, ext = os.path.splitext(basename)
    while os.path.exists(newpath):
        newpath = os.path.join(dirname, '{f}{s}{n:d}{e}'
                               .format(f=filename, s=sep, n=num, e=ext))
        num += 1
    return newpath

filename = uniquify('foo.pdf', sep='_')
Possible problems with this include:
If you call uniquify many thousands of times with the same path, each subsequent call may get a bit slower, since the while-loop starts checking from num=0 each time.
uniquify is vulnerable to race conditions whereby a file may not exist at the time os.path.exists is called, but may exist at the time you use the value returned by uniquify. Use tempfile.NamedTemporaryFile to avoid this problem. You won't get incremental numbering, but you will get files with unique names, guaranteed not to already exist. You could use the prefix parameter to specify the original name of the file. For example,
import tempfile
import os

def uniquify(path, sep='_', mode='w'):
    path = os.path.normpath(path)
    if os.path.exists(path):
        dirname, basename = os.path.split(path)
        filename, ext = os.path.splitext(basename)
        return tempfile.NamedTemporaryFile(prefix=filename+sep, suffix=ext, delete=False,
                                           dir=dirname, mode=mode)
    else:
        return open(path, mode)
Which could be used like this:
In [141]: f = uniquify('/tmp/foo.pdf')
In [142]: f.name
Out[142]: '/tmp/foo_34cvy1.pdf'
Note that to prevent a race-condition, the opened filehandle -- not merely the name of the file -- is returned.
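If you want to keep incremental numbering and still return an open handle, one option is exclusive-creation mode. This is a different technique from the tempfile version above, shown as a sketch with a hypothetical function name open_unique:

```python
import os

def open_unique(path, sep="_", mode="x"):
    """Open and return a new file, trying path, then path_1, path_2, ...

    Mode "x" (exclusive creation) makes open() raise FileExistsError if
    the name is taken, so checking and creating happen in a single step.
    The mode passed in must contain "x" for this to hold.
    """
    root, ext = os.path.splitext(path)
    candidate = path
    num = 0
    while True:
        try:
            return open(candidate, mode)
        except FileExistsError:
            num += 1
            candidate = f"{root}{sep}{num}{ext}"
```

The chosen name is available as f.name on the returned file object, just as with NamedTemporaryFile.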
I have a Python script that is trying to compare two files to each other and output the difference. However, I am not sure what exactly is going on, because when I run the script it gives me this error:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\api\\API_TEST\\Apis.os\\*.*'
I don't know why it is appending *.* at the end of the file extension.
This is currently my function:
def CheckFilesLatest(self, previous_path, latest_path):
    for filename in os.listdir(latest_path):
        previous_filename = os.path.join(previous_path, filename)
        latest_filename = os.path.join(latest_path, filename)
        if self.IsValidOspace(latest_filename):
            for os_filename in os.listdir(latest_filename):
                name, ext = os.path.splitext(os_filename)
                if ext == ".os":
                    previous_os_filename = os.path.join(previous_filename, os_filename)
                    latest_os_filename = os.path.join(latest_filename, os_filename)
                    if os.path.isfile(latest_os_filename) == True:
                        # If the file exists in both directories, check if the files are different; otherwise mark the contents of the latest file as added.
                        if os.path.isfile(previous_os_filename) == True:
                            self.GetFeaturesModified(previous_os_filename, latest_os_filename)
                        else:
                            self.GetFeaturesAdded(latest_os_filename)
        else:
            if os.path.isdir(latest_filename):
                self.CheckFilesLatest(previous_filename, latest_filename)
Any thoughts on why it can't scan the directory and look for an os file, for example?
It is failing on line:
for os_filename in os.listdir(latest_filename):
The code first gets called from
def main():
    for i in range(6, arg_length, 2):
        component = sys.argv[i]
        package = sys.argv[i+1]
        previous_source_dir = os.path.join(previous_path, component, package)
        latest_source_dir = os.path.join(latest_path, component, package)
        x.CheckFilesLatest(previous_source_dir, latest_source_dir)
        x.CheckFilesPrevious(previous_source_dir, latest_source_dir)
Thank you
os.listdir() requires that the latest_path argument be a directory, as you have stated. However, latest_path is being passed in as an argument, so you need to look at the code that actually creates latest_path in order to determine why the '*.*' is being put in. Since you are calling it recursively, first check the original call (the first time). It would appear that the base code that calls CheckFilesLatest() is trying to set up a search pattern to find all files within the directory 'C:\api\API_TEST\Apis.os'. You would need to split off the file pattern first and then do the check.
If you want to browse a directory recursively, using os.walk would be better and simpler than your complex handling with recursive function calls. Take a look at the docs: http://docs.python.org/2/library/os.html#os.walk
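A minimal os.walk sketch for collecting every .os file under a directory might look like this. The function name is made up, and the comparison logic from the question (GetFeaturesModified and friends) is deliberately left out.

```python
import os

def find_files_with_extension(top, extension=".os"):
    """Return the paths of all files under `top` with the given extension.

    Hypothetical helper: os.walk visits every subdirectory, so no
    explicit recursion is needed.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if os.path.splitext(filename)[1] == extension:
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Each returned path can then be turned into its counterpart in the previous tree with os.path.relpath(path, latest_path) joined onto previous_path.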