Renaming files that contain a pattern, looping through subfolders - python

I have a main folder that contains multiple subfolders, each of which contains multiple files. I am trying to loop through the subfolders and rename the files that match a certain pattern. Here is what I have:
import os
from fnmatch import fnmatch
pattern = "*z_2*"
pattern2 = 'b_2.txt'
path = r'C:\Users\Desktop\123'
list1 = []
for (dirpath, dirnames, filenames) in os.walk(path):
    list1 += [os.path.join(dirpath, file) for file in filenames]
for i in list1:
    if fnmatch(i, pattern):
        a = os.path.join(path, i)
        b = os.path.dirname(i)
        os.rename(a, os.path.join(b, pattern2))
What I don't understand is why, when I use os.rename, it instead seems to create a text file in the specified subfolder, resulting in:
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\Desktop\\ABC\\_100\\az_207.txt' -> 'C:\\Users\\Desktop\\ABC\\_100\\b_2.txt'

The problem is that the destination path of each rename depends on b, which is only the dirname part of i, not on the file name itself. So when your loop over list1 finds more than one matching file in the same directory, they all get the same destination, os.path.join(b, pattern2). The first rename succeeds and produces b_2.txt; the next one fails because that file already exists, and on Windows os.rename refuses to overwrite an existing file.
You probably want to reuse some part of a when building the destination filename, so as to ensure uniqueness.
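One way to do that is sketched below. It assumes the goal is to replace the "z_2" part of each name with "b_2"; the question does not say what the new names should be, so that rule, the function name rename_matching, and its parameters are all hypothetical.

```python
import os
from fnmatch import fnmatch


def rename_matching(top, pattern="*z_2*", old="z_2", new="b_2"):
    """Rename every file under `top` whose name matches `pattern`.

    The new name is derived from the old one (assumed rule: replace
    `old` with `new`), so two matches in one folder get different names.
    Returns the list of (source, destination) pairs that were renamed.
    """
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            # Match against the bare file name, not the full path.
            if not fnmatch(name, pattern):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, name.replace(old, new))
            if os.path.exists(dst):
                # Skip instead of overwriting or raising on an existing target.
                continue
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Calling rename_matching(r'C:\Users\Desktop\123') would then turn az_207.txt into ab_207.txt and leave non-matching files alone.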

Related

Python: traverse all directories under /mydir and remove all files with "(1)" in name

I am trying to remove all files with (1) in the name under /mydir and its subdirectories. I searched and found the script below, which finds all files ending in .txt. How do I change the if condition so that it finds all files with (1) in the name?
import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".txt"):
            os.remove(os.path.join(root, file))
You can do that like this:
import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if "(1)" in file:
            os.remove(os.path.join(root, file))
First we import the os module, which provides os.walk, os.remove, and os.path.join. Then we iterate over the tuples yielded by os.walk and, for each one, over its list of file names, checking whether the name contains the string "(1)". If it does, we join the file name to its directory with os.path.join and remove the file with os.remove.
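You can do this very easily with pathlib. Use rglob rather than glob so that the subdirectories are searched as well:
from pathlib import Path
for file in Path("/mydir").rglob("*(1)*"):
    if file.is_file():  # skip directories whose name contains "(1)"
        print(file)
        # file.unlink()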
My recommendation is to print all files to delete first (as shown), and only uncomment the file.unlink() line once you are sure that the right files will be deleted.

Comparison of file list with files in folder

I have a list of filenames, but in the directory the files are named a little differently. I want to print the filenames from my list that are not in the directory. Example of a file in the directory:
FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz
import os
missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv']
for fileName in missing:
    for fileNames in next(os.walk('C:\\Users\\foo\\bar'))[2]:
        if fileName not in fileNames:
            print(fileName)
I cannot get what I'm doing wrong...
The problem is that you iterate over every file in the directory (for fileNames in next(os.walk(...))[2]) and check if fileName is in each of those file names. For every file in the folder where fileName not in fileNames, fileName is printed, resulting in it being printed many times.
This can be fixed by doing a single check to see if all files in the folder do not contain the target file name.
import os
missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv']
fileNames = next(os.walk('C:\\Users\\foo\\bar'))[2]
for missingfileName in missing:
    if all(missingfileName not in fileName for fileName in fileNames):
        print(missingfileName)
If you want it to be more efficient and you are only looking for file names that are prefixes of other names, then you can use a data structure called a trie. For example if missing equals ['bcd'], and there is a file called abcde and these are not considered a match, then a trie is appropriate here.
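A minimal sketch of that idea, assuming every name in the list is expected to appear as a prefix of some file name in the folder. The function names build_trie, is_prefix_of_any, and find_missing are made up for illustration, and the trie is a plain nested dict rather than any library type.

```python
def build_trie(names):
    """Build a nested-dict trie in which each key is a single character."""
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
    return root


def is_prefix_of_any(trie, prefix):
    """Return True if at least one stored name starts with `prefix`."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True


def find_missing(expected_prefixes, file_names):
    """Return the expected prefixes that no file name starts with."""
    trie = build_trie(file_names)
    return [p for p in expected_prefixes if not is_prefix_of_any(trie, p)]
```

With this, find_missing(missing, fileNames) walks each candidate name once through the trie instead of comparing it against every file name. Unlike the substring check above, 'bcd' is reported as missing even when a file called abcde exists.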

How to do computations through directory and subfolders

I have one main directory which has 9 subfolders. Inside each of them, there are 1000 files. I need a for loop that reads the main directory and its folders, but the problem is that the subfolder names are not similar and are not numbered, and I got stuck. I have seen "Iterate through folders, then subfolders and print filenames with path to text file", but I could not work out how to get started.
My effort is below:
import os
for root, dirs, files in os.walk(r'\Desktop\output\new our scenario\test'):
    for file in files:
        with open(os.path.join(root, file), "r") as auto:
            ##Doing Whatever I want
But it's not correct and does not work.
Do you know glob? That might be a solution to your problem.
You can get a list of all files in the subdirectories by using wildcard path names.
Here is an example that loops through .txt files, but you do not have to restrict it to one file type. Note that if the pattern does not end in something like *.*, it will also list directories.
import glob
file_list = glob.glob('known_dir/*/*.txt')
for file in file_list:
    with open(file, "r") as auto:
        # Do whatever you want with the open file here
        pass
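The pattern above only looks one level below known_dir. If the files can sit at any depth, a recursive pattern is one option. The following is a sketch, not part of the original answer; the helper name list_files is hypothetical.

```python
import glob
import os


def list_files(top, pattern="*"):
    """Return regular files under `top`, at any depth, whose name matches `pattern`.

    Note: glob skips names that start with a dot by default.
    """
    # '**' with recursive=True matches zero or more directory levels.
    candidates = glob.glob(os.path.join(top, "**", pattern), recursive=True)
    # Drop directories, which a bare '*' pattern would otherwise include.
    return sorted(p for p in candidates if os.path.isfile(p))
```

For example, list_files('known_dir', '*.txt') returns the .txt files in known_dir itself and in all of its subfolders, and each returned path can be passed straight to open.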

Search for multiple files by name and copy to a new folder

I have been trying to write some Python code that reads each line from a .txt file and searches for a file with that name in a folder and its subfolders. After that I want to copy the file to a preset destination folder.
When I test this code I can read all the file names in the .txt and I can display all the files in a directory and its subdirectories. The problem arises when I have to compare each filename I read from the .txt (line by line, as I said) with all the filenames within the directory folder and then copy the matching file to the destination.
Any idea what I am doing wrong?
import os, shutil
def main():
    dst = '/Users/jorjis/Desktop/new'
    f = open('/Users/jorjis/Desktop/articles.txt', 'rb')
    lines = [line[:-1] for line in f]
    for files in os.walk("/Users/jorjis/Desktop/folder/"):
        for line in lines:
            if line == files:
                shutil.copy('/dir/file.ext', '/new/dir')
You are comparing the file names from the text file with a tuple with three elements: the root path of the currently visited folder, a list of all subdirectory names in that path, and a list of all file names in that path. Comparing a string with a tuple will never be true. You have to compare each file name with the set of file names to copy. The data type set comes in handy here.
Opening a file together with the with statement ensures that it is closed when the control flow leaves the with block.
The code might look like this:
import os
import shutil
def main():
    destination = '/Users/jorjis/Desktop/new'
    with open('/Users/jorjis/Desktop/articles.txt', 'r') as lines:
        filenames_to_copy = set(line.rstrip() for line in lines)
    for root, _, filenames in os.walk('/Users/jorjis/Desktop/folder/'):
        for filename in filenames:
            if filename in filenames_to_copy:
                shutil.copy(os.path.join(root, filename), destination)
If I had to guess, I would say that the files in the .txt contain the entire path. You'd need to add a little more to os.walk to match up completely.
for root, _, files in os.walk("/Users/jorjis/Desktop/folder/"):
    for f in files:
        # Build the full path as directory + file name, in that order
        new_path = os.path.join(root, f)
        if new_path in lines:
            shutil.copy(new_path, '/some_new_dir')
Then again, I'm not sure what the .txt file looks like so it might be that your original way works. If that's the case, take a closer look at the lines = ... line.

iterating through folders and from each use one specific file in a method python

What I want to do is iterate through folders in a directory and in each folder find a file 'fileX' which I want to give to a method which itself needs the file name as a parameter to open it and get a specific value from it. So 'method' will extract some value from 'fileX' (the file name is the same in every folder).
My code looks something like this but I always get told that the file I want doesn't exist which is not the case:
import os
import xy
rootdir = r'path'
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(fileX)
        print gain
Also my folders I am iterating through are named like 'folderX0', 'folderX1',..., 'folderX99', meaning they all have the same name with increasing ending numbers. It would be nice if I could tell the program to ignore every other folder which might be in 'path'.
Thanks for the help!
os.walk yields bare file and directory names, relative to the directory (root) it is currently visiting, so opening them by name alone only works from inside that directory. You can combine them with os.path.join:
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)
See the documentation for os.walk for details:
To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
To trim it to ignore any folders but those named folderX, you could do something like the following. When doing os.walk top down (the default), you can delete items from the dirs list to prevent os.walk from looking in those directories.
import re
for root, dirs, files in os.walk(rootdir):
    # Prune in place; calling dirs.remove() while looping over dirs skips entries
    dirs[:] = [d for d in dirs if re.match(r'folderX[0-9]+$', d)]
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)
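If only the folders directly under rootdir matter and each holds one file with a known name, walking the whole tree is not strictly needed. Here is a sketch under those assumptions; the function name find_target_files is hypothetical, and 'fileX' stands in for the real file name from the question.

```python
import os
import re

# Folder names such as folderX0, folderX1, ..., folderX99
FOLDER_RE = re.compile(r"folderX[0-9]+$")


def find_target_files(rootdir, filename="fileX"):
    """Return the path to `filename` in each folderX<number> directly under rootdir."""
    found = []
    with os.scandir(rootdir) as entries:
        for entry in entries:
            # Ignore plain files and folders with any other name.
            if not (entry.is_dir() and FOLDER_RE.match(entry.name)):
                continue
            candidate = os.path.join(entry.path, filename)
            if os.path.isfile(candidate):
                found.append(candidate)
    return sorted(found)
```

Each returned path can then be handed to xy.method, for example by looping over find_target_files(rootdir).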
