I want to automate the file name used when saving a spreadsheet using xlwt. Say there is a sub directory named Data in the folder the python program is running. I want the program to count the number of files in that folder (# = n). Then the filename must end in (n+1). If there are 0 files in the folder, the filename must be Trial_1.xls. This file must be saved in that sub directory.
I know the following:
import xlwt, os, os.path
n = len([name for name in os.listdir('.') if os.path.isfile(name)])
counts the number of files in the same folder.
a = n + 1
filename = "Trial_" + "a" + ".xls"
book.save(filename)
this will save the file properly named in to the same folder.
My question is how do I extend this in to a sub directory? Thanks.
os.listdir('.') the . in this points to the directory from where the file is executed. Change the . to point to the subdirectory you are interested in.
You should give it the full path name from the root of your file system; otherwise it will be relative to the directory from where the script is executed. This might not be what you want; especially if you need to refer to the sub directory from another program.
You also need to provide the full path to the filename variable; which would include the sub directory.
To make life easier, just set the full path to a variable and refer to it when needed.
TARGET_DIR = '/home/me/projects/data/'
n = sum(1 for f in os.listdir(TARGET_DIR) if os.path.isfile(os.path.join(TARGET_DIR, f)))
new_name = "{}Trial_{}.xls".format(TARGET_DIR,n+1)
You actually want glob:
from glob import glob
DIR = 'some/where/'
existing_files = glob(DIR + '*.xls')
filename = DIR + 'stuff--%d--stuff.xls' % (len(existing_files) + 1)
Since you said Burhan Khalid's answer "Works perfectly!" you should accept it.
I just wanted to point out a different way to compute the number. The way you are doing it works, but if we imagine you were counting grains of sand or something would use way too much memory. Here is a more direct way to get the count:
n = sum(1 for name in os.listdir('.') if os.path.isfile(name))
For every qualifying name, we get a 1, and all these 1's get fed into sum() and you get your count.
Note that this code uses a "generator expression" instead of a list comprehension. Instead of building a list, taking its length, and then discarding the list, the above code just makes an iterator that sum() iterates to compute the count.
It's a bit sleazy, but there is a shortcut we can use: sum() will accept boolean values, and will treat True as a 1, and False as a 0. We can sum these.
# sum will treat Boolean True as a 1, False as a 0
n = sum(os.path.isfile(name) for name in os.listdir('.'))
This is sufficiently tricky that I probably would not use this without putting a comment. But I believe this is the fastest, most efficient way to count things in Python.
Related
I would like to rename images based on part of the name of the folder the images are in and iterate through the images. I am using os.walk and I was able to rename all the images in the folders but could not figure out how to use the letters to the left of the first hyphen in the folder name as part of the image name.
Folder name: ABCDEF - THIS IS - MY FOLDER - NAME
Current image names in folder:
dsc_001.jpg
dsc_234.jpg
dsc_123.jpg
Want to change to show like this:
ABCDEF_1.jpg
ABCDEF_2.jpg
ABCDEF_3.jpg
What I have is this, but I am not sure why I am unable to split the filename by the hyphen:
import os
from os.path import join
path = r'C:\folderPath'
i = 1
for root, dirs, files in os.walk(path):
for image in files:
prefix = files.split(' - ')[0]
os.rename(os.path.join(path, image), os.path.join(path, prefix + '_'
+ str(i)+'.jpg'))
i = i+1
Okay, I've re-read your question and I think I know what's wrong.
1.) The os.walk() iterable is recursive, i.e. if you use os.walk(r'C:\'), it will loop through all the folders and find all the files under C drive. Now I'm not sure if your C:\folderPath has any sub-folders in it. If it does, and any of the folder/file format are not the convention as C:\folderPath, your code is going to have a bad time.
2.) When you iterate through files, you are split()ing the wrong object. Your question state you want to split the Folder name, but your code is splitting the files iterable which is a list of all the files under the current iteration directory. That doesn't accomplish what you want. Depending if your ABCDEF folder is the C:\folderPath or a sub folder within, you'll need to code differently.
3.) you have imported join from os.path but you still end up calling the full name os.path.join() anyways, which is redundant. Either just import os and call os.path.join() or just with your current imports, just join().
Having said all of that, here are my edits:
Answer 1:
If your ABCDEF is the assigned folder
import os
from os.path import join
path = r'C:\ABCDEF - THIS - IS - MY - FOLDER - NAME'
for root, dirs, files in os.walk(path):
folder = root.split("\\")[-1] # This gets you the current folder's name
for i, image in enumerate(files):
new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], i + 1)
os.rename(join(path, image), join(path, new_image))
break # if you have sub folders that follow the SAME structure, then remove this break. Otherwise, keep it here so your code stop after all the files are updated in your parent folder.
Answer 2:
Assuming your ABCDEF's are all sub folders under the assigned directory, and all of them follow the same naming convention.
import os
from os.path import join
path = r'C:\parentFolder' # The folder that has all the sub folders that are named ABCDEF...
for i, (root, dirs, files) in enumerate(os.walk(path)):
if i == 0: continue # skip the parentFolder as it doesn't follow the same naming convention
folder = root.split("\\")[-1] # This gets you the current folder's name
for i, image in enumerate(files):
new_image = "{0}_{1}.jpg".format(folder.split(' - ')[0], i + 1)
os.rename(join(path, image), join(path, new_image))
Note:
If your scenario doesn't fall under either of these, please make it clear what your folder structure is (a sample including all sub folders and sub files). Remember, consistency is key in determining how your code should work. If it's inconsistent, your best bet is use Answer 1 on each target folder separately.
Changes:
1.) You can get an incremental index without doing a i += 1. enumerate() is a great tool for iterables that also give you the iteration number.
2.) Your split() should be operated on the folder name instead of files (an iterable). In your case, image is the actual file name, and files is the list of files in the current iteration directory.
3.) Use of str.format() function to make your new file format easier to read.
4.) You'll note the use of split("\\") instead of split(r"\"), and that's because a single backslash cannot be a raw string.
This should now work. I ended up doing a lot more research than expected such as how to handle the os.walk() properly in both scenarios. For future reference, a little google search goes a long way. I hope this finally answers your question. Remember, doing your own research and clarity in demonstrating your problem will get you more efficient answers.
Bonus: if you have python 3.6+, you can even use f strings for your new file name, which ends up looking really cool:
new_image = f"{image.split(' - ')[0]}_{i+1}.jpg"
All,
I need to move file from one directory to another but I don't want to move all the files in that directory just the text files that begin with 'pws'. A list of all the files in the directory is:
['pws1.txt', 'pws2.txt', 'pws3.txt', 'pws4.txt', 'pws5.txt', 'x.txt', 'y.txt']
As stated, I want to move the 'pws*' files to another directory but not the x and y text files. What I want to do is remove all elements the list that does not begin with 'pws'. My code is below:
loc = 'C:\Test1'
dir = os.listdir(loc)
#print dir
for i in dir:
#print i
x = 'pws*'
if i != x:
dir.remove(i)
print dir
The output does not keep what I want instead
It removes the x text file from the list and the even number ones but retains the y text files.
What am I doing wrong. How can I make a list of only the files that start with 'pws' and remove the text files that do not begin with 'pws'.
Keep in mind I might have a list that has 1000 elements and several hundreds of those elements will start with 'pws' while those that don't begin with it, couple of hundreds, will need to be removed.
Everyone's help is much appreciated.
You can use list-comprehension to re-create the list as the following:
dir = [i for i in dir if i.startswith('pws')]
or better yet, define that at start:
loc = 'C:\\Test1'
dir = [i for i in os.listdir(loc) if i.startswith('pws')]
print dir
Explanation:
When you use x = 'pws*' and then check for if i == x, you are comparing if the element i is equal to 'pws*', so a better way is to use str.startswith() built in method that will check if the string starts with the provided substring. So in your loop you can use if i.startswith('pws') or you can use list-comprehension as I mentioned above which is a more pythonic approach.
To move a file:
How to move a file in Python
You can use os.rename(origin,destination)
origin = r'C:\users\JohnDoe\Desktop'
destination = r'C:\users\JohnDoe\Desktop\Test'
startswith_ = 'pws'
And then go ahead and do a list comprehension
# Move files
[os.rename(os.path.join(origin,i),os.path.join(destination,i)) for i in os.listdir(origin) if i.startswith(startswith_)]
Just use a glob. This gives you a list of files in a directory without all the listdir calls and substring matching:
import glob,os
for f in glob.glob('/tmp/foo/pws*'):
os.rename(f, '/tmp/bar/%s'%(os.path.basename(f)))
Edit 1: Here's your original code, simplified with a glob, and a new variable loc2 defined to be the place to move them to:
import os,glob
loc = 'C:\Test1'
loc2 = 'C:\Test2'
files = glob.glob('%s\pws*'%(loc))
for i in files:
os.rename(i,'%s\%s'%(loc2,os.path.basename(i)))
I've got a script that will accurately tell me how many files are in a directory, and the subdirectories within. However, I'm also looking into identify how many folders there are within the same directory and its subdirectories...
My current script:
import os, getpass
from os.path import join, getsize
user = 'Copy of ' + getpass.getuser()
path = "C://Documents and Settings//" + user + "./"
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
file_counter = sum([len(files) for r, d, files in os.walk(path)])
print ' [*] ' + str(file_counter) + ' Files were found and ' + str(folder_counter) + ' folders'
This code gives me the print out of: [*] 147 Files were found and 147 folders.
Meaning that the folder_counter isn't counting the right elements. How can I correct this so the folder_counter is correct?
Python 2.7 solution
For a single directory and in you can also do:
import os
print len(os.walk('dir_name').next()[1])
which will not load the whole string list and also return you the amount of directories inside the 'dir_name' directory.
Python 3.x solution
Since many people just want an easy and fast solution, without actually understanding the solution, I edit my answer to include the exact working code for Python 3.x.
So, in Python 3.x we have the next method instead of .next. Thus, the above snippet becomes:
import os
print(len(next(os.walk('dir_name'))[1]))
where dir_name is the directory that you want to find out how many directories has inside.
I think you want something like:
import os
files = folders = 0
for _, dirnames, filenames in os.walk(path):
# ^ this idiom means "we won't be using this value"
files += len(filenames)
folders += len(dirnames)
print "{:,} files, {:,} folders".format(files, folders)
Note that this only iterates over os.walk once, which will make it much quicker on paths containing lots of files and directories. Running it on my Python directory gives me:
30,183 files, 2,074 folders
which exactly matches what the Windows folder properties view tells me.
Note that your current code calculates the same number twice because the only change is renaming one of the returned values from the call to os.walk:
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
# ^ here # ^ and here
file_counter = sum([len(files) for r, d, files in os.walk(path)])
# ^ vs. here # ^ and here
Despite that name change, you're counting the same value (i.e. in both it's the third of the three returned values that you're using)! Python functions do not know what names (if any at all; you could do print list(os.walk(path)), for example) the values they return will be assigned to, and their behaviour certainly won't change because of it. Per the documentation, os.walk returns a three-tuple (dirpath, dirnames, filenames), and the names you use for that, e.g. whether:
for foo, bar, baz in os.walk(...):
or:
for all_three in os.walk(..):
won't change that.
If interested only in the number of folders in /input/dir (and not in the subdirectories):
import os
folder_count = 0 # type: int
input_path = "/path/to/your/input/dir" # type: str
for folders in os.listdir(input_path): # loop over all files
if os.path.isdir(os.path.join(input_path, folders): # if it's a directory
folder_count += 1 # increment counter
print("There are {} folders".format(folder_count))
>>> import os
>>> len(list(os.walk('folder_name')))
According to os.walk the first argument dirpath enumerates all directories.
I have a script that downloads files (pdfs, docs, etc) from a predetermined list of web pages. I want to edit my script to alter the names of files with a trailing _x if the file name already exists, since it's possible files from different pages will share the same filename but contain different contents, and urlretrieve() appears to automatically overwrite existing files.
So far, I have:
urlfile = 'https://www.foo.com/foo/foo/foo.pdf'
filename = urlfile.split('/')[-1]
filename = foo.pdf
if os.path.exists(filename):
filename = filename('.')[0] + '_' + 1
That works fine for one occurrence, but it looks like after one foo_1.pdf it will start saving as foo_1_1.pdf, and so on. I would like to save the files as foo_1.pdf, foo_2.pdf, and so on.
Can anybody point me in the right direction on how to I can ensure that file names are stored in the correct fashion as the script runs?
Thanks.
So what you want is something like this:
curName = "foo_0.pdf"
while os.path.exists(curName):
num = int(curName.split('.')[0].split('_')[1])
curName = "foo_{}.pdf".format(str(num+1))
Here's the general scheme:
Assume you start from the first file name (foo_0.pdf)
Check if that name is taken
If it is, iterate the name by 1
Continue looping until you find a name that isn't taken
One alternative: Generate a list of file numbers that are in use, and update it as needed. If it's sorted you can say name = "foo_{}.pdf".format(flist[-1]+1). This has the advantage that you don't have to run through all the files every time (as the above solution does). However, you need to keep the list of numbers in memory. Additionally, this will not fill any gaps in the numbers
Why not just use the tempfile module:
fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='', delete = False)
Now your filename will be available in fileobj.name and you can manipulate to your heart's content. As an added benefit, this is cross-platform.
Since you're dealing with multiple pages, this seeems more like a "global archive" than a per-page archive. For a per-page archive, I would go with the answer from #wnnmaw
For a global archive, I would take a different approch...
Create a directory for each filename
Store the file in the directory as "1" + extension
write the current "number" to the directory as "_files.txt"
additional files are written as 2,3,4,etc and increment the value in _files.txt
The benefits of this:
The directory is the original filename. If you keep turning "Example-1.pdf" into "Example-2.pdf" you run into a possibility where you download a real "Example-2.pdf", and can't associate it to the original filename.
You can grab the number of like-named files either by reading _files.txt or counting the number of files in the directory.
Personally, I'd also suggest storing the files in a tiered bucketing system, so that you don't have too many files/directories in any one directory (hundreds of files makes it annoying as a user, thousands of files can affect OS performance ). A bucketing system might turn a filename into a hexdigest, then drop the file into `/%s/%s/%s" % ( hex[0:3], hex[3:6], filename ). The hexdigest is used to give you a more even distribution of characters.
import os
def uniquify(path, sep=''):
path = os.path.normpath(path)
num = 0
newpath = path
dirname, basename = os.path.split(path)
filename, ext = os.path.splitext(basename)
while os.path.exists(newpath):
newpath = os.path.join(dirname, '{f}{s}{n:d}{e}'
.format(f=filename, s=sep, n=num, e=ext))
num += 1
return newpath
filename = uniquify('foo.pdf', sep='_')
Possible problems with this include:
If you call to uniquify many many thousands of times with the same
path, each subsequent call may get a bit slower since the
while-loop starts checking from num=0 each time.
uniquify is vulnerable to race conditions whereby a file may not
exist at the time os.path.exists is called, but may exist at the
time you use the value returned by uniquify. Use
tempfile.NamedTemporaryFile to avoid this problem. You won't get
incremental numbering, but you will get files with unique names,
guaranteed not to already exist. You could use the prefix parameter to
specify the original name of the file. For example,
import tempfile
import os
def uniquify(path, sep='_', mode='w'):
path = os.path.normpath(path)
if os.path.exists(path):
dirname, basename = os.path.split(path)
filename, ext = os.path.splitext(basename)
return tempfile.NamedTemporaryFile(prefix=filename+sep, suffix=ext, delete=False,
dir=dirname, mode=mode)
else:
return open(path, mode)
Which could be used like this:
In [141]: f = uniquify('/tmp/foo.pdf')
In [142]: f.name
Out[142]: '/tmp/foo_34cvy1.pdf'
Note that to prevent a race-condition, the opened filehandle -- not merely the name of the file -- is returned.
What is the best way to choose a random file from a directory in Python?
Edit: Here is what I am doing:
import os
import random
import dircache
dir = 'some/directory'
filename = random.choice(dircache.listdir(dir))
path = os.path.join(dir, filename)
Is this particularly bad, or is there a particularly better way?
import os, random
random.choice(os.listdir("C:\\")) #change dir name to whatever
Regarding your edited question: first, I assume you know the risks of using a dircache, as well as the fact that it is deprecated since 2.6, and removed in 3.0.
Second of all, I don't see where any race condition exists here. Your dircache object is basically immutable (after directory listing is cached, it is never read again), so no harm in concurrent reads from it.
Other than that, I do not understand why you see any problem with this solution. It is fine.
If you want directories included, Yuval A's answer. Otherwise:
import os, random
random.choice([x for x in os.listdir("C:\\") if os.path.isfile(os.path.join("C:\\", x))])
The simplest solution is to make use of os.listdir & random.choice methods
random_file=random.choice(os.listdir("Folder_Destination"))
Let's take a look at it step by step :-
1} os.listdir method returns the list containing the name of
entries (files) in the path specified.
2} This list is then passed as a parameter to random.choice method
which returns a random file name from the list.
3} The file name is stored in random_file variable.
Considering a real time application
Here's a sample python code which will move random files from one directory to another
import os, random, shutil
#Prompting user to enter number of files to select randomly along with directory
source=input("Enter the Source Directory : ")
dest=input("Enter the Destination Directory : ")
no_of_files=int(input("Enter The Number of Files To Select : "))
print("%"*25+"{ Details Of Transfer }"+"%"*25)
print("\n\nList of Files Moved to %s :-"%(dest))
#Using for loop to randomly choose multiple files
for i in range(no_of_files):
#Variable random_file stores the name of the random file chosen
random_file=random.choice(os.listdir(source))
print("%d} %s"%(i+1,random_file))
source_file="%s\%s"%(source,random_file)
dest_file=dest
#"shutil.move" function moves file from one directory to another
shutil.move(source_file,dest_file)
print("\n\n"+"$"*33+"[ Files Moved Successfully ]"+"$"*33)
You can check out the whole project on github
Random File Picker
For addition reference about os.listdir & random.choice method you can refer to tutorialspoint learn python
os.listdir :- Python listdir() method
random.choice :- Python choice() method
The problem with most of the solutions given is you load all your input into memory which can become a problem for large inputs/hierarchies. Here's a solution adapted from The Perl Cookbook by Tom Christiansen and Nat Torkington. To get a random file anywhere beneath a directory:
#! /usr/bin/env python
import os, random
n=0
random.seed();
for root, dirs, files in os.walk('/tmp/foo'):
for name in files:
n += 1
if random.uniform(0, n) < 1:
rfile=os.path.join(root, name)
print rfile
Generalizing a bit makes a handy script:
$ cat /tmp/randy.py
#! /usr/bin/env python
import sys, random
random.seed()
n = 1
for line in sys.stdin:
if random.uniform(0, n) < 1:
rline=line
n += 1
sys.stdout.write(rline)
$ /tmp/randy.py < /usr/share/dict/words
chrysochlore
$ find /tmp/foo -type f | /tmp/randy.py
/tmp/foo/bar
Language agnostic solution:
1) Get the total no. of files in specified directory.
2) Pick a random number from 0 to [total no. of files - 1].
3) Get the list of filenames as a suitably indexed collection or such.
4) Pick the nth element, where n is the random number.
Independant from the language used, you can read all references to the files in a directory into a datastructure like an array (something like 'listFiles'), get the length of the array. calculate a random number in the range of '0' to 'arrayLength-1' and access the file at the certain index. This should work, not only in python.
If you don't know before hand what files are there, you will need to get a list, then just pick a random index in the list.
Here's one attempt:
import os
import random
def getRandomFile(path):
"""
Returns a random filename, chosen among the files of the given path.
"""
files = os.listdir(path)
index = random.randrange(0, len(files))
return files[index]
EDIT: The question now mentions a fear of a "race condition", which I can only assume is the typical problem of files being added/removed while you are in the process of trying to pick a random file.
I don't believe there is a way around that, other than keeping in mind that any I/O operation is inherently "unsafe", i.e. it can fail. So, the algorithm to open a randomly chosen file in a given directory should:
Actually open() the file selected, and handle a failure, since the file might no longer be there
Probably limit itself to a set number of tries, so it doesn't die if the directory is empty or if none of the files are readable
Python 3 has the pathlib module, which can be used to reason about files and directories in a more object oriented fashion:
from random import choice
from pathlib import Path
path: Path = Path()
# The Path.iterdir method returns a generator, so we must convert it to a list
# before passing it to random.choice, which expects an iterable.
random_path = choice(list(path.iterdir()))
This code don't repeat the file names:
def random_files(num, list_):
file_names = []
while True:
ap = random.choice(list_)
if ap not in file_names:
file_names.append(ap)
if len(file_names) == num:
return file_names
random_200_files = random_files(200, list_of_files)
For those who come here with the need to pick a large number of files from a larger number of files, and maybe copy or move them in another dir, the proposed approach is of course too slow.
Having enough memory, one could read all the directory content in a list, and then use the random.choices function to select 17 elements, for example:
from random import choices
from glob import glob
from shutil import copy
file_list = glob([SRC DIR] + '*' + [FILE EXTENSION])
picked_files = choices(file_list, k=17)
now picked_filesis a list of 20 filenames picked at random, that can be copied/moved even in parallel, for example:
import multiprocessing as mp
from itertools import repeat
from shutil import copy
def copy_files(filename, dest):
print(f"Working on file: {filename}")
copy(filename, dest)
with mp.Pool(processes=(mp.cpu_count() - 1) or 1) as p:
p.starmap(copy_files, zip(picked_files, repeat([DEST PATH])))