Best way to choose a random file from a directory - python

What is the best way to choose a random file from a directory in Python?
Edit: Here is what I am doing:
import os
import random
import dircache
dir = 'some/directory'
filename = random.choice(dircache.listdir(dir))
path = os.path.join(dir, filename)
Is this particularly bad, or is there a particularly better way?

import os, random
random.choice(os.listdir("C:\\")) #change dir name to whatever
Regarding your edited question: first, I assume you know the risks of using a dircache, as well as the fact that it is deprecated since 2.6, and removed in 3.0.
Second of all, I don't see where any race condition exists here. Your dircache object is basically immutable (after directory listing is cached, it is never read again), so no harm in concurrent reads from it.
Other than that, I do not understand why you see any problem with this solution. It is fine.

If you want directories included, use Yuval A's answer. Otherwise, keep only regular files:
import os, random
random.choice([x for x in os.listdir("C:\\") if os.path.isfile(os.path.join("C:\\", x))])

The simplest solution is to make use of os.listdir & random.choice methods
random_file=random.choice(os.listdir("Folder_Destination"))
Let's take a look at it step by step :-
1} os.listdir method returns the list containing the name of
entries (files) in the path specified.
2} This list is then passed as a parameter to random.choice method
which returns a random file name from the list.
3} The file name is stored in random_file variable.
As a practical application, here is a sample Python script that moves random files from one directory to another:
import os, random, shutil
#Prompting user to enter number of files to select randomly along with directory
source=input("Enter the Source Directory : ")
dest=input("Enter the Destination Directory : ")
no_of_files=int(input("Enter The Number of Files To Select : "))
print("%"*25+"{ Details Of Transfer }"+"%"*25)
print("\n\nList of Files Moved to %s :-"%(dest))
#Using for loop to randomly choose multiple files
for i in range(no_of_files):
    #Variable random_file stores the name of the random file chosen
    random_file=random.choice(os.listdir(source))
    print("%d} %s"%(i+1,random_file))
    #os.path.join builds the path portably instead of hard-coding a backslash
    source_file=os.path.join(source,random_file)
    dest_file=dest
    #"shutil.move" function moves file from one directory to another
    shutil.move(source_file,dest_file)
print("\n\n"+"$"*33+"[ Files Moved Successfully ]"+"$"*33)
You can check out the whole project on github
Random File Picker
For additional reference on the os.listdir and random.choice methods, you can refer to the tutorialspoint Python tutorials:
os.listdir :- Python listdir() method
random.choice :- Python choice() method

The problem with most of the solutions given is you load all your input into memory which can become a problem for large inputs/hierarchies. Here's a solution adapted from The Perl Cookbook by Tom Christiansen and Nat Torkington. To get a random file anywhere beneath a directory:
#! /usr/bin/env python
import os, random
n = 0
rfile = None  # stays None if no files are found
random.seed()
for root, dirs, files in os.walk('/tmp/foo'):
    for name in files:
        n += 1
        # keep the n-th file with probability 1/n
        if random.uniform(0, n) < 1:
            rfile = os.path.join(root, name)
print(rfile)
Generalizing a bit makes a handy script:
$ cat /tmp/randy.py
#! /usr/bin/env python
import sys, random
random.seed()
n = 1
rline = ''  # stays empty if stdin has no lines
for line in sys.stdin:
    if random.uniform(0, n) < 1:
        rline = line
    n += 1
sys.stdout.write(rline)
$ /tmp/randy.py < /usr/share/dict/words
chrysochlore
$ find /tmp/foo -type f | /tmp/randy.py
/tmp/foo/bar

Language agnostic solution:
1) Get the total no. of files in specified directory.
2) Pick a random number from 0 to [total no. of files - 1].
3) Get the list of filenames as a suitably indexed collection or such.
4) Pick the nth element, where n is the random number.
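In Python, the four steps above might look like the following minimal sketch. The function name `pick_nth_random` is a hypothetical helper introduced here for illustration; it is not part of the answer.

```python
import os
import random


def pick_nth_random(directory):
    """Return one randomly chosen entry name from `directory` (hypothetical helper)."""
    names = sorted(os.listdir(directory))   # step 3: an indexable collection
    total = len(names)                      # step 1: total number of entries
    if total == 0:
        raise ValueError("directory is empty")
    n = random.randrange(total)             # step 2: random index in 0..total-1
    return names[n]                         # step 4: the nth element
```

Note that os.listdir also returns subdirectory names, so this picks among all entries, not only regular files.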

Independent of the language used, you can read all references to the files in a directory into a data structure such as an array (something like 'listFiles'), get the length of the array, calculate a random number in the range 0 to arrayLength-1, and access the file at that index. This should work in any language, not only Python.

If you don't know beforehand what files are there, you will need to get a list and then pick a random index in the list.
Here's one attempt:
import os
import random
def getRandomFile(path):
    """
    Returns a random filename, chosen among the files of the given path.
    """
    files = os.listdir(path)
    index = random.randrange(0, len(files))
    return files[index]
EDIT: The question now mentions a fear of a "race condition", which I can only assume is the typical problem of files being added/removed while you are in the process of trying to pick a random file.
I don't believe there is a way around that, other than keeping in mind that any I/O operation is inherently "unsafe", i.e. it can fail. So, the algorithm to open a randomly chosen file in a given directory should:
- Actually open() the file selected, and handle a failure, since the file might no longer be there
- Probably limit itself to a set number of tries, so it doesn't die if the directory is empty or if none of the files are readable
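A minimal sketch of those two points, assuming a hypothetical helper named `open_random_file` and an arbitrary default of five attempts (neither comes from the answer):

```python
import os
import random


def open_random_file(directory, max_tries=5):
    """Try to open a randomly chosen file; give up after max_tries attempts."""
    for _ in range(max_tries):
        names = os.listdir(directory)       # re-list each time: contents may change
        if not names:
            break                           # nothing to pick from
        path = os.path.join(directory, random.choice(names))
        try:
            return open(path, "rb")         # caller is responsible for closing
        except OSError:
            continue                        # vanished, unreadable, or a directory
    return None
```

The function returns None rather than raising when every attempt fails, so the caller can decide how to handle an empty or unreadable directory.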

Python 3 has the pathlib module, which can be used to reason about files and directories in a more object oriented fashion:
from random import choice
from pathlib import Path
path: Path = Path()
# The Path.iterdir method returns an iterator, so we must convert it to a list
# before passing it to random.choice, which expects a sequence.
random_path = choice(list(path.iterdir()))
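Path.iterdir yields subdirectories as well as files. If only regular files are wanted, one possible variation is sketched below; the helper name `random_file_in` is an assumption made for this example.

```python
import random
from pathlib import Path


def random_file_in(directory="."):
    """Return a random regular file in `directory` as a Path, or None if there is none."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    return random.choice(files) if files else None
```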

This code doesn't repeat the file names:
import random
def random_files(num, list_):
    # Note: loops forever if list_ holds fewer than num distinct names
    file_names = []
    while True:
        ap = random.choice(list_)
        if ap not in file_names:
            file_names.append(ap)
        if len(file_names) == num:
            return file_names
random_200_files = random_files(200, list_of_files)
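The standard library's random.sample draws without replacement, so the same result can probably be had without the retry loop. The sketch below assumes the input list itself contains no repeated names, and it caps the request at the list length instead of looping forever.

```python
import random


def random_files(num, names):
    """Return up to `num` distinct names chosen at random from `names`."""
    names = list(names)
    return random.sample(names, min(num, len(names)))
```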

For those who come here with the need to pick a large number of files from a larger number of files, and maybe copy or move them in another dir, the proposed approach is of course too slow.
Given enough memory, one could read the whole directory listing into a list and then use random.sample to select, for example, 17 distinct elements (random.choices samples with replacement, so it could return the same file more than once):
from random import sample
from glob import glob
from shutil import copy
file_list = glob([SRC DIR] + '*' + [FILE EXTENSION])
picked_files = sample(file_list, k=17)
Now picked_files is a list of 17 filenames picked at random, which can be copied or moved, even in parallel, for example:
import multiprocessing as mp
from itertools import repeat
from shutil import copy
def copy_files(filename, dest):
    print(f"Working on file: {filename}")
    copy(filename, dest)
if __name__ == "__main__":  # needed where worker processes are spawned (e.g. Windows, macOS)
    with mp.Pool(processes=(mp.cpu_count() - 1) or 1) as p:
        p.starmap(copy_files, zip(picked_files, repeat([DEST PATH])))

Related

Using random and shutil to move files in loop in python

I have a small problem. I am trying to move 20x500 images in 20 predefined folders. I can make this work with just 500 random images and I have identified the problem; I draw 500 random files, move them and then it tries doing it again but since it doesn't update the random list, it fails when it reaches an image that it thinks is part of the random group but it has already been moved and thus fails. How do I "update" the random list of files so that it doesn't fail because I move stuff? The code is:
import os
import shutil
import random
folders = os.listdir(r'place_where_20_folders_are')
files = os.listdir(r'place_where_images_are')
string=r"string_to_add_to_make_full_path_of_each_file"
folders=[string+s for s in folders]
for folder in folders:
    for fileName in random.sample(files, min(len(files), 500)):
        path = os.path.join(r'place_where_images_are', fileName)
        shutil.move(path, folder)
I think the problem in your code is that the random.sample() method leaves the original files list unchanged. Because of this you have a chance of getting the same filename twice, but as you already moved it before you will have an error.
Instead of using sample you could use this snippet:
files_to_move = [files.pop(random.randrange(0, len(files))) for _ in range(500)]
This will pop (thus removing) 500 random files from the files list and save them in files_to_move. As you repeat this, the files list becomes smaller.
This answer was inspired by this answer to the question Random Sample with remove from List.
This would be used like this:
import os
import shutil
import random
folders = os.listdir(r'place_where_20_folders_are')
files = os.listdir(r'place_where_images_are')
string=r"string_to_add_to_make_full_path_of_each_file"
folders=[string+s for s in folders]
for folder in folders:
    files_to_move = [files.pop(random.randrange(0, len(files))) for _ in range(500)]
    for file_to_move in files_to_move:
        path = os.path.join(r'place_where_images_are', file_to_move)
        shutil.move(path, folder)
I would start by drawing the random sample first, pass it on to be moved to the new location, and then remove those names from the list of files (for example with the list's remove() method, or by popping them) before the loop starts again.
Hope it helps.
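Another way to guarantee that no file is drawn twice is to shuffle the list once and cut it into disjoint slices, one per folder. This is a sketch of that idea rather than code from either answer; `split_into_batches` is a hypothetical name.

```python
import random


def split_into_batches(items, n_batches, batch_size):
    """Shuffle a copy of `items` and cut it into n_batches disjoint slices.

    Later slices are short or empty when there are not enough items.
    """
    shuffled = list(items)
    random.shuffle(shuffled)
    return [shuffled[i * batch_size:(i + 1) * batch_size] for i in range(n_batches)]
```

With the variables from the question, each folder would then be paired with one batch, for example by looping over zip(folders, split_into_batches(files, 20, 500)) and moving every file in the batch to its folder.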

move folders from folder list to other folder list using python

Hello, I want to move or copy many folders from one folder list to another folder list. I use the glob and shutil libraries for this.
First I create a folder list:
import glob
#paths from source folder
sourcepath='C:/my/store/path/*'
paths = glob.glob(sourcepath)
my_file='10'
selected_path = filter(lambda x: my_file in x, paths)
#paths from destination folder
destpath='C:/my/store/path/*'
paths2 = glob.glob(destpath)
my_file1='20'
selected_path1 = filter(lambda x: my_file1 in x, paths2)
Now I have two lists of paths (selected_path, selected_path1).
Next I want to move or copy the folders from the first list (selected_path) to the second list (selected_path1).
Finally I tried this code to move the folders, but without success:
import shutil
for I,j in zip(selected_path,selected_path1)
shutil.move(i, j)
But that doesn't work. Any idea how to make my code work?
First, the lambda with filter is unnecessary: glob can perform this filtering itself, since pattern matching is exactly what glob does. The extra function calls only add clutter and some overhead.
Look at this example, similar to yours:
import glob
# Find all .py files
sourcepath= 'C:/my/store/path/*.py'
paths = glob.glob(sourcepath)
# Find files that end with 'codes'
destpath= 'C:/my/store/path/*codes'
paths2 = glob.glob(destpath)
Second, the second glob call may or may not return a list of directories to move your directories/files to. This makes your code dependent on what C:/my/store/path contains. That is, you must guarantee that C:/my/store/path contains only directories and never files, so that glob returns only directories to be used in shutil.move. If a user later adds files, rather than folders, to C:/my/store/path whose names happen to end with 'codes' and have no extension (unlike codes.txt, codes.py...), those files will appear in the list that glob returns in paths2. Of course, guaranteeing that a directory contains only subdirectories is fragile and not a good idea. You can test for directories with os.path.isdir.
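A directory-only filter along those lines could be sketched as follows. The helper name `matching_dirs` is an assumption made for this example.

```python
import glob
import os


def matching_dirs(pattern):
    """Return only the directories among the paths matching a glob pattern."""
    return [p for p in glob.glob(pattern) if os.path.isdir(p)]
```

Calling matching_dirs('C:/my/store/path/*codes') would then skip any plain file whose name happens to end in 'codes'.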
Note also that you're using lambda with filter to drop any string that doesn't contain 10 in your first call to filter, something you can achieve with glob itself:
glob.glob('C:/my/store/path/*10*')
Now any file or subdirectory of C:/my/store/path that contains 10 in its name will be collected in the returned list of the glob function.
Third, zip truncates to the shortest iterable in its argument list. In other words, if you want every path in paths to be moved into a corresponding path in paths2, you need len(paths) == len(paths2), so that each file or directory in paths has a directory to be moved to in paths2.
Fourth, you missed the colon at the end of the for statement, and in the call to shutil.move you used i instead of I. Python is case-sensitive, so uppercase I and lowercase i are different names:
import shutil
for I,j in zip(selected_path,selected_path1) # missing :
    shutil.move(i, j) # i, but the loop variable is I
Corrected code:
import shutil
for I,j in zip(selected_path,selected_path1):
    shutil.move(I, j)
Presumably, paths2 contains only subdirectories of C:/my/store/path directory, this is a better approach to write your code, but definitely not the best:
import glob
#paths from source folder
sourcepath='C:/my/store/path/*10*'
paths = glob.glob(sourcepath)
#paths from destination folder
destpath='C:/my/store/path/*20*'
paths2 = glob.glob(destpath)
import shutil
for i,j in zip(paths,paths2):
    shutil.move(i, j)
*Some of the issues mentioned above still apply to this code.
And now that you finished the long marathon of reading this answer, what would you like to do to improve your code? I'll be glad to help if you still find something ambiguous.
Good luck :)

Python 3 Self replicating file into random directory - then running file

I have a fun little script that I would like to copy itself into a random directory and then run that copy.
I know how to run files with (hacky):
os.system('Filename.py')
And I know how to replicate files with shutil, but I am stuck at the random directory. Maybe I could somehow get a list of all available directories, pick one at random from the list, and then remove that directory from the list?
Thanks,
Itechmatrix
You can get a list of all directories and subdirectories and shuffle it randomly as follows:
import os
import random
all_dirs = [x[0] for x in os.walk('/tmp')]
random.shuffle(all_dirs)
for a_dir in all_dirs:
    print(a_dir)
    # do something with each directory, e.g. copy some file there.
You can get a list of directories and then randomly select:
import os
import random
dirs = [d for d in os.listdir('.') if os.path.isdir(d)]
n = random.randrange(len(dirs))
print(dirs[n])
If you're on a Mac, there are a fair number of hidden and restricted directories near the root, so you can run into errors with readability and writability. One way around that is to iterate through the available directories and filter out the ones you can't write to, using the os module.
After that you can use the random.choice function to pick a random directory from that list.
import os, random
writing_dir = []
for directory in os.listdir():
    # keep only directories; os.W_OK checks that the path is writable
    if os.path.isdir(directory) and os.access(directory, os.W_OK):
        writing_dir.append(directory)
path = random.choice(writing_dir)
I'm working on a similar script right now.

Python, Copy, Rename and run Commands

I have a little task for my company.
I have multiple files whose names start with swale-randomnumber.
I want to copy them to some directory (does shutil.copy allow wildcards?).
Anyway, I then want to choose the largest file, rename it to sync.dat, and then run a program.
I get the logic: I will use a loop to do each individual piece of work and then move on to the next. But I am unsure how to choose the single largest file, or a single file at all for that matter, since when I type in swale* surely it will just choose them all?
Sorry I haven't written any source code yet; I am still trying to get my head around how this will work.
Thanks for any help you may provide
The accepted answer of this question proposes a nice portable implementation of file copy with wildcard support:
from glob import iglob
from shutil import copy
from os.path import basename, join
def copy_files(src_glob, dst_folder):
    for fname in iglob(src_glob):
        # fname includes the source directory, so keep only the file name
        copy(fname, join(dst_folder, basename(fname)))
If you want to compare file sizes, you can use either of these functions:
import os
os.path.getsize(path)
os.stat(path).st_size
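Either function can serve as the key for max to find the largest file directly. A minimal sketch, assuming a hypothetical helper named `largest_file` that takes a glob pattern such as 'swale-*':

```python
import glob
import os


def largest_file(pattern):
    """Return the path of the largest file matching `pattern`, or None if no file matches."""
    files = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    return max(files, key=os.path.getsize) if files else None
```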
This might work :
import os.path
import glob
import shutil
source = "My Source Path" # Replace these variables with the appropriate data
dest = "My Dest Path"
command = "My command"
# Find the files that need to be copied
files = glob.glob(os.path.join(source, "swale-*"))
# Copy the files to the destination
for file in files:
    shutil.copy(file, dest)
# Pick the biggest of the copied files, comparing by file size
biggest = max([os.path.basename(file) for file in files],
    key=lambda x: os.path.getsize(os.path.join(dest, x)))
# Rename that biggest file to sync.dat
shutil.move(os.path.join(dest, biggest), os.path.join(dest, "sync.dat"))
# Run the command
os.system(command)
# Only use os.system if you know your command is completely secure and you don't need the output. Use the subprocess module if you need more security and need the output.
Note : None of this is tested - but it should work
from os import listdir
from os.path import join, isfile, getsize
directory = '/your/directory/'
# You now have list of files in directory that starts with "swale-"
fileList = [join(directory,f) for f in listdir(directory) if f.startswith("swale-") and isfile(join(directory,f))]
# Order it by file size - from big to small
fileList.sort(key=getsize, reverse=True)
# First file in array is biggest
biggestFile = fileList[0]
# Do whatever you want with this file - using shutil.*, os.*, or anything else..
# ...
# ...

Python automated file names

I want to automate the file name used when saving a spreadsheet using xlwt. Say there is a sub directory named Data in the folder the python program is running. I want the program to count the number of files in that folder (# = n). Then the filename must end in (n+1). If there are 0 files in the folder, the filename must be Trial_1.xls. This file must be saved in that sub directory.
I know the following:
import xlwt, os, os.path
n = len([name for name in os.listdir('.') if os.path.isfile(name)])
counts the number of files in the same folder.
a = n + 1
filename = "Trial_" + str(a) + ".xls"
book.save(filename)
this will save the file properly named in to the same folder.
My question is how do I extend this in to a sub directory? Thanks.
In os.listdir('.'), the . refers to the current working directory, which is normally the directory the script is run from. Change the . to point to the subdirectory you are interested in.
You should give it the full path name from the root of your file system; otherwise it will be relative to the directory from where the script is executed. This might not be what you want; especially if you need to refer to the sub directory from another program.
You also need to provide the full path to the filename variable; which would include the sub directory.
To make life easier, just set the full path to a variable and refer to it when needed.
TARGET_DIR = '/home/me/projects/data/'
n = sum(1 for f in os.listdir(TARGET_DIR) if os.path.isfile(os.path.join(TARGET_DIR, f)))
new_name = "{}Trial_{}.xls".format(TARGET_DIR,n+1)
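Wrapped in a function, the same idea might look like the sketch below. The name `next_trial_path` and its default arguments are assumptions made for this example; os.path.join is used so the directory does not need a trailing slash.

```python
import os


def next_trial_path(target_dir, prefix="Trial_", ext=".xls"):
    """Build the path for the next numbered file inside target_dir."""
    n = sum(
        1 for name in os.listdir(target_dir)
        if os.path.isfile(os.path.join(target_dir, name))
    )
    return os.path.join(target_dir, "{}{}{}".format(prefix, n + 1, ext))
```

The result can be passed straight to book.save(). Keep in mind that counting files only gives a free name as long as no earlier files have been deleted or renamed.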
You actually want glob:
from glob import glob
DIR = 'some/where/'
existing_files = glob(DIR + '*.xls')
filename = DIR + 'stuff--%d--stuff.xls' % (len(existing_files) + 1)
Since you said Burhan Khalid's answer "Works perfectly!" you should accept it.
I just wanted to point out a different way to compute the number. The way you are doing it works, but if we imagine you were counting something as numerous as grains of sand, building the list would use far too much memory. Here is a more direct way to get the count:
n = sum(1 for name in os.listdir('.') if os.path.isfile(name))
For every qualifying name, we get a 1, and all these 1's get fed into sum() and you get your count.
Note that this code uses a "generator expression" instead of a list comprehension. Instead of building a list, taking its length, and then discarding the list, the above code just makes an iterator that sum() iterates to compute the count.
It's a bit sleazy, but there is a shortcut we can use: sum() will accept boolean values, and will treat True as a 1, and False as a 0. We can sum these.
# sum will treat Boolean True as a 1, False as a 0
n = sum(os.path.isfile(name) for name in os.listdir('.'))
This is sufficiently tricky that I probably would not use this without putting a comment. But I believe this is the fastest, most efficient way to count things in Python.
