Delete all files with certain properties in Python - python

How would you delete in Python all files in directory /tmp/dir and all its subdirectories that have extension .txt or .mp3?

You just need to use os.walk to traverse recursively a directory and os.remove when you find a file whose name matches your requirements.
Note that os.walk returns on one hand file names and, on the other hand, a root directory. Hence, for the os.remove to work you'll need to create the full filename with os.path.join.

Related

How to rename sub directory and file names recursively in script python3?

I have a recursive directory. Both subdirectory and files names have illegal characters. I have a function to clean up the names, such as it replaces a space with an underscore in the name. There must be an easier way but I couldn't find a way to both rename folders and files. So, I want to rename the folders first.
for path, subdirs, files in os.walk(root):
for name in subdirs:
new_name=clean_names(name)
name=os.path.join(path,name)
new_name=os.path.join(path,new_name)
os.chdir(path)
os.rename(name,new_name)
When I check my real folder and it contents I see that only the first subfolder name is corrected. I can see the reason because os.chdir(path) changes the cwd then it doesn't change back before for loop starts to second path. I thought after the os.rename I could rechange the cwd but I am sure there is a more elegant way to do this. If I remove the os.chdir line it gives filenotfound error.
I see that renaming subdirectories has been asked about before, but they are in command line.
You should use os.walk(root, topdown=False) instead; otherwise once the top folder gets renamed, os.walk won't have access to the subfolders because it can no longer find their parent folders.
Excerpt from the documentation:
If optional argument topdown is True or not specified, the triple for
a directory is generated before the triples for any of its
subdirectories (directories are generated top-down). If topdown is
False, the triple for a directory is generated after the triples for
all of its subdirectories (directories are generated bottom-up). No
matter the value of topdown, the list of subdirectories is retrieved
before the tuples for the directory and its subdirectories are
generated.
Note that you do not need to call os.chdir at all because all the paths passed to os.rename are absolute.

-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil

I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching each client's name. I'm not sure if I should be using Glob, or Shutil, or a combination of both. My working theory is that Glob should work for such a program, as the glob module "finds all the pathnames matching a specified pattern," and then use Shutil to physically move the files?
Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed based on client and content, such that the folder looks like this:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/J drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, similar to the pattern above, named as 'lastName, firstName'
I'm looking to have the program go through the :/Scans folder and move each PDF to the matching client folder based on 'lastName, firstName'
I've been able to write a simple program to move files between folders, but not one that will do the aforesaid name matching.
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive','C:/Users/Kenny/Documents/Clients')
^ Moving a file from one folder to another.. quite easily done.
Is there a way to modify the above code to apply to a regex (below)?
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive/\w*', 'C:/Users/Kenny/Documents/Clients/\w*')
EDIT: #Byberi - Something as such?
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.path.isfile()
This would print all the files and directories
for file in dirs:
print file
dest_dir = "C:/Users/Kenny/Documents/Clients"
for file in glob.glob(r'C:/*'):
print(file)
shutil.copy(file, dest_dir)
I've consulted the following threads already, but I cannot seem to find how to match and move the files.
Select files in directory and move them based on text list of filenames
https://docs.python.org/3/library/glob.html
Python move files from directories that match given criteria to new directory
https://www.guru99.com/python-copy-file.html
https://docs.python.org/3/howto/regex.html
https://code.tutsplus.com/tutorials/file-and-directory-operations-using-python--cms-25817

Unable to use getsize method with os.walk() returned files

I am trying to make a small program that looks through a directory (as I want to find recursively all the files in the sub directories I use os.walk()).
Here is my code:
import os
import os.path
filesList=[]
path = "C:\\Users\Robin\Documents"
for(root,dirs,files) in os.walk(path):
for file in files:
filesList+=file
Then I try to use the os.path.getsize() method to elements of filesList, but it doesn't work.
Indeed, I realize that the this code fills the list filesList with characters. I don't know what to do, I have tried several other things, such as :
for(root,dirs,files) in os.walk(path):
filesList+=[file for file in os.listdir(root) if os.path.isfile(file)]
This does give me files, but only one, which isn't even visible when looking in the directory.
Can someone explain me how to obtain files with which we can work (that is to say, get their size, hash them, or modify them...) on with os.walk ?
I am new to Python, and I don't really understand how to use os.walk().
The issue I suspect you're running into is that file contains only the filename itself, not any directories you have to navigate through from your starting folder. You should use os.path.join to combine the file name with the folder it is in, which is the root value yielded by os.walk:
for(root,dirs,files) in os.walk(path):
for file in files:
filesList.append(os.path.join(root, file))
Now all the filenames in filesList will be acceptable to os.path.getsize and other functions (like open).
I also fixed a secondary issue, which is that your use of += to extend a list wouldn't work the way you intended. You'd need to wrap the new file path in a list for that to work. Using append is more appropriate for adding a single value to the end of a list.
If you want to get a list of files including path use:
for(root, dirs, files) in os.walk(path):
fullpaths = [os.path.join(root, fil) for fil in files]
filesList+=fullpaths

python zipfile basename

I have some homework that I am trying to complete. I don't want the answer. I'm just having trouble in starting. The work I have tried is not working at all... Can someone please just provide a push in the right direction. I am trying to learn but after trying and trying I need some help.
I know I can you os.path.basename() to get the basename and then add it to the file name but I can't get it together.
Here is the assignment
In this project, write a function that takes a directory path and creates an archive of the directory only. For example, if the same path were used as in the example ("c:\\xxxx\\Archives\\archive_me"), the zipfile would contain archive_me\\groucho, archive_me\\harpo and archive_me\\chico.
The base directory (archive_me in the example above) is the final element of the input, and all paths recorded in the zipfile should start with the base directory.
If the directory contains sub-directories, the sub-directory names and any files in the sub-directories should not be included. (Hint: You can use isfile() to determine if a filename represents a regular file and not a directory.)
Thanks again any direction would be great.
It would help to know what you tried yourself, so I'm only giving a few pointers to methods in the standard libraries:
os.listdir to get the a list of files and folders under a given directory (beware, it returns only the file/folder name, not the full path!)
os.path.isfile as mentioned in the assignment to check if a given path represents a file or a folder
os.path.isdir, the opposite of os.path.isfile (thanks inspectorG4adget)
os.path.join to join a filename with the basedir without having to worry about slashes and delimiters
ZipFile for handling, well, zip files
zipFile.write to write the files found to the zip
I'm not sure you'll need all of those, but it doesn't hurt knowing they exist.

Copying files recursively with skipping some directories in Python?

I want to copy a directory to another directory recursively. I also want to ignore some files (eg. all hidden files; everything starting with ".") and then run a function on all the other files (after copying them). This is simple to do in the shell, but I need a Python script.
I tried using shutil.copytree, which has ignore support, but I don't know how to have it do a function on each file copied. I might also need to check some other condition when copying so I can't just run the function on all the files once they are copied over. I also tried looking at os.walk but I couldn't figure it out.
You can use os.walk to iterate over each file, apply your custom filtering function and copying over only the ones you care.
os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
You can add all copied files to a list and use it to run you custom script or run the script after each copy operation.
This should get you started:
import os
def filterls(src, filter_func):
for root, dirs, files in os.walk(src):
for f in files:
if filter_func(f):
path = os.path.join(root, f)
yield path[len(src)+1:]
It takes a path to a directory and a function which takes a single parameter. If the function returns False for the passed filename, the file is ignored.
You can try it in a python interpreter like this:
# Get a list of all visible files rooted in the 'path/to/the/dir' directory
print list(filterls('path/to/the/dir', lambda p: not p.startswith('.')))
If you're on Python 2.6 or higher, you can simply use shutil.copytree and its ignore argument. Since it gets passed all the files and directories, you can call your function from there, unless you want it to be called right after the file is copied.
If that is the case, the easiest thing is to copy and modify the copytree code.

Categories