Python: Comparing file name with folder name in same directory

I have a folder with a lot of zip files and the folders extracted from them.
The folder has become very cluttered.
Is there any way to compare the names of the zip files with the names of the extracted folders in the same directory?
I want to delete the extracted folders if they have a matching .zip file in the same directory.

You can try something like this:
import shutil
from os import listdir
from os.path import isfile, join, isdir

directories = [d for d in listdir('./') if isdir(join('./', d))]
# endswith() avoids matching names that merely contain '.zip' somewhere
files = [f for f in listdir('./') if isfile(join('./', f)) and f.endswith('.zip')]
# print(directories)
# print(files)
for d in directories:
    for f in files:
        if f == d + '.zip':
            shutil.rmtree(d)
Note that you need to use something like shutil.rmtree(d) if your directories contain subdirectories and/or files, because os.rmdir() only removes empty directories.
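As a rough sketch of the same idea with pathlib, the lookup and the deletion can be split so that the matches can be reviewed before anything is removed. The function names extracted_folders and remove_extracted are made up for this illustration, and the sketch assumes that an archive called name.zip was extracted to a folder called exactly name.

```python
import shutil
from pathlib import Path


def extracted_folders(directory):
    """Return folders in `directory` that have a sibling '<folder name>.zip' file."""
    base = Path(directory)
    matches = []
    for entry in sorted(base.iterdir()):
        # a folder "report" matches only if a file "report.zip" sits next to it
        if entry.is_dir() and (base / (entry.name + ".zip")).is_file():
            matches.append(entry)
    return matches


def remove_extracted(directory, dry_run=True):
    """Delete the matching folders; with dry_run=True only report them."""
    folders = extracted_folders(directory)
    for folder in folders:
        if dry_run:
            print("would remove", folder)
        else:
            shutil.rmtree(folder)
    return folders
```

Calling remove_extracted(".") only prints the candidates; passing dry_run=False actually deletes them.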

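Hope it helps:
import os
import shutil

path_to_dir = "."  # placeholder: the directory holding the zip files and folders
for name in os.listdir(path_to_dir):
    folder = os.path.join(path_to_dir, name)
    # delete the folder only if a matching "<name>.zip" file sits next to it
    if os.path.isdir(folder) and os.path.isfile(folder + ".zip"):
        shutil.rmtree(folder)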

Related

How to move files with similar filename into new folder in Python

I have a list of files like this in the images folder.
How can I create a new folder when there are multiple files with a similar name, and move those similar files into that folder?
I am new to Python.
Here is my expectation:
Try this:
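from pathlib import Path

images = Path("Images")
# take a snapshot of the listing first, because the loop creates folders in it
for fn in list(images.glob("*")):
    if not fn.is_file():
        continue
    # "cat_1.jpg" -> "cat"; names without "_" have no base name and are skipped
    file_base_name = "_".join(fn.stem.split("_")[:-1])
    if not file_base_name:
        continue
    # count the entries that share the base name (glob.glob1 is undocumented, so Path.glob is used)
    file_count = len(list(images.glob(f"{file_base_name}_*")))
    outdir = images / file_base_name
    if file_count > 1 or outdir.is_dir():
        outdir.mkdir(exist_ok=True)
        fn.rename(outdir / fn.name)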
Input:
Output:
Please ignore the file name extensions. I created those files just to test my code.
In this case you don't even need re:
from pathlib import Path

for fn in Path("Images").glob("*.jpg"):
    outdir = Path("Images") / "_".join(fn.stem.split("_")[:-1])
    outdir.mkdir(exist_ok=True)
    fn.rename(outdir / fn.name)
What's going on here?
Pathlib is how you want to think of paths if you can. It combines most of the os.path APIs. Specifically:
glob gets us all the files in the path that match the glob pattern
mkdir makes the directory (exist_ok=True means no error is raised if it already exists)
rename moves the file there
I am unable to test since I don't have your files. My suggestion would be to comment out the mkdir command and the shutil.move command and replace them with print statements to see what commands would be generated before letting it run for real. But I think it should work.
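The dry run suggested above could be sketched like this: one function only computes the planned moves, and a second one carries them out once the plan looks right. The names plan_moves and apply_moves are invented for this example, and the sketch assumes the files follow a name_number.jpg pattern such as cat_12.jpg.

```python
import re
from pathlib import Path


def plan_moves(image_dir):
    """Return (source, destination) pairs without touching any files."""
    image_dir = Path(image_dir)
    moves = []
    for fn in sorted(image_dir.glob("*.jpg")):
        # "cat_12" -> "cat"; stems that do not end in _<digits> are skipped
        m = re.match(r"^(.+)_\d+$", fn.stem)
        if m:
            moves.append((fn, image_dir / m.group(1) / fn.name))
    return moves


def apply_moves(moves):
    """Carry out the planned moves, creating target folders as needed."""
    for src, dst in moves:
        dst.parent.mkdir(exist_ok=True)
        src.rename(dst)
```

Printing each pair from plan_moves("Images") shows what would happen; apply_moves is only called after that output has been checked.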
import pathlib
import os
import re
from itertools import groupby
import shutil

source_dir = 'Images'
files = [f.name for f in pathlib.Path(source_dir).glob('*.jpg')]

def keyfunc(file):
    # "cat_12.jpg" -> "cat"
    m = re.match(r'^(.*?)_\d+\.jpg$', file)
    return m[1]

matched_files = [file for file in files if re.search(r'_\d+\.jpg$', file)]
# groupby only groups adjacent items, so sort by the same key first
matched_files.sort(key=keyfunc)
for k, g in groupby(matched_files, keyfunc):
    new_dir = os.path.join(source_dir, k)
    if not os.path.exists(new_dir):
        os.mkdir(new_dir)
    for file in g:
        shutil.move(os.path.join(source_dir, file), new_dir)

Extract all files from multiple folders with Python

I wrote down this code:
import shutil
files = os.listdir(path, path=None)
for d in os.listdir(path):
    for f in files:
        shutil.move(d+f, path)
For every folder in a given directory (path) that has files inside, I want the files contained in that folder to be moved to the main directory (path) that contains the folder.
For Example:
The files in this folder: C:/example/subfolder/
Will be moved in: C:/example/
(And the directory will be deleted.)
Sorry for my bad English :)
This should be what you are looking for. First we get all subfolders in our main folder. Then, for each subfolder, we get the files contained inside and build the source and destination paths for shutil.move.
import os
import shutil

folder = r"<MAIN FOLDER>"
subfolders = [f.path for f in os.scandir(folder) if f.is_dir()]
for sub in subfolders:
    for f in os.listdir(sub):
        src = os.path.join(sub, f)
        dst = os.path.join(folder, f)
        shutil.move(src, dst)
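The question also asks for the emptied subfolders to be deleted. A possible extension, given here only as a sketch, wraps the same loop in a function that skips names already present in the main folder and removes a subfolder once nothing is left in it. The function name flatten and the skip-on-collision policy are assumptions of this example, not part of the answer above.

```python
import os
import shutil


def flatten(folder):
    """Move entries from each direct subfolder of `folder` up into `folder`.

    Returns the paths that were left in place because the name was taken.
    """
    skipped = []
    subfolders = [e.path for e in os.scandir(folder) if e.is_dir()]
    for sub in subfolders:
        for name in os.listdir(sub):
            src = os.path.join(sub, name)
            dst = os.path.join(folder, name)
            if os.path.exists(dst):
                # do not overwrite something that already lives in `folder`
                skipped.append(src)
                continue
            shutil.move(src, dst)
        # remove the subfolder only once it is empty
        if not os.listdir(sub):
            os.rmdir(sub)
    return skipped
```

Anything reported in the returned list stays where it was, so the corresponding subfolder is kept as well.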
Here is another example, using a few lines with glob:
import shutil
import glob

# every entry (file or folder) directly inside the source folder
inputs = glob.glob('D:\\my\\folder_with_sub\\*')
outputs = 'D:\\my\\folder_dest\\'
for f in inputs:
    shutil.move(f, outputs)

Copying files in python using shutil

I have the following directory structure:
-mailDir
    -folderA
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -89.txt
            -subInbox
            -subInbox2
    -folderB
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -200.txt
            -577.txt
The aim is to copy all the txt files under inbox folder into another folder.
For this I tried the below code
import os
from os import path
import shutil

rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if(dirName=="inbox"):
            eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        shutil.copy(path.join(ii,i),destDir)
If the inbox directory only has .txt files, the above code works fine. Since the inbox folder under the folderA directory has other subdirectories along with the .txt files, the code raises a permission denied error. As far as I understand, shutil.copy cannot copy folders.
The aim is to copy only the .txt files in every inbox folder to some other location. If the same file name appears in different inbox folders, I have to keep both files. How can we improve the code in this case? Please note that everything other than the .txt files is a folder.
One simple solution is to filter out any i that does not have the .txt extension by using the string endswith() method.
import os
from os import path
import shutil

rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if(dirName=="inbox"):
            eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        if i.endswith('.txt'):
            shutil.copy(path.join(ii,i),destDir)
This should ignore any folders and non-txt files that are found with os.listdir(ii). I believe that is what you are looking for.
I just remembered that I once wrote several files to solve this exact problem. You can find the source code on my GitHub.
In short, there are two functions of interest here:
list_files(loc, return_dirs=False, return_files=True, recursive=False, valid_exts=None)
copy_files(loc, dest, rename=False)
For your case, you could copy and paste these functions into your project and modify copy_files like this:
from os.path import join
from re import sub
from shutil import copy

def copy_files(loc, dest, rename=False):
    # get files with full path (list_files is the helper listed above)
    files = list_files(loc, return_dirs=False, return_files=True, recursive=True, valid_exts=('.txt',))
    # copy files in list to dest
    for i, this_file in enumerate(files):
        # change name if renaming
        if rename:
            # strip a leading "./", then replace slashes with hyphens to preserve a unique name
            out_file = sub(r'^\./', '', this_file)
            out_file = sub(r'\\|/', '-', out_file)
            out_file = join(dest, out_file)
            copy(this_file, out_file)
            files[i] = out_file
        else:
            copy(this_file, dest)
    return files
Then just call it like so:
copy_files('mailDir', 'destFolder', rename=True)
The renaming scheme might not be exactly what you want, but at least it will not overwrite your files. I believe this should solve all your problems.
Here you go:
import os
from os import path
import shutil

destDir = '<absolute-path>'
for root, dirs, files in os.walk(os.getcwd()):
    # Only copy from directories named 'inbox'.
    if path.basename(root) != 'inbox':
        continue
    # Do not descend into sub-directories of an inbox.
    dirs[:] = []
    # Keep only the '.txt' files.
    files = [f for f in files if f.endswith('.txt')]
    for f in files:
        p = path.join(root, f)
        shutil.copy(p, destDir)
Quick and simple.
Sorry, I forgot the part where you also need unique file names. The above solution only works if the file names are distinct across all inbox folders, because a later copy with the same name overwrites the earlier one.
To copy files from multiple inboxes and give each one a unique name in the destination folder, you can try this:
import os
from os import path
import shutil

sourceDir = os.getcwd()
fixedLength = len(sourceDir)
destDir = '<absolute-path>'
filteredFiles = []
for root, dirs, files in os.walk(sourceDir):
    # Keep only the '.txt' files in all the inbox directories.
    if path.basename(root) == 'inbox':
        # join the file name to the full path while filtering txt files
        files = [path.join(root, f) for f in files if f.endswith('.txt')]
        # add the filtered files to the main list
        filteredFiles.extend(files)
# make tuples of (file path, unique file name); the name is the path relative
# to sourceDir with the path separators replaced by hyphens
filteredFiles = [(f, f[fixedLength+1:].replace(os.sep, '-')) for f in filteredFiles]
for (f, n) in filteredFiles:
    print('copying file...', f)
    # copying from the path to the dest directory with specific name
    shutil.copy(f, path.join(destDir, n))
print('copied', len(filteredFiles), 'files to', destDir)
If you need to copy all files instead of just the .txt files, drop the f.endswith('.txt') condition while filtering. Every name that os.walk yields in files is already a non-directory entry, so no extra os.path.isfile check is needed.
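For comparison, the same unique-name idea can be sketched with pathlib. This is only an illustration under a few assumptions: the function name copy_inbox_txt is invented here, only .txt files sitting directly in a folder named inbox are copied, and the destination folder lies outside the tree being scanned.

```python
import shutil
from pathlib import Path


def copy_inbox_txt(root_dir, dest_dir):
    """Copy every .txt file that sits directly in a folder named 'inbox'.

    Each copy is named after its path relative to root_dir, with the path
    separators replaced by '-', so equal file names do not collide.
    """
    root_dir = Path(root_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() collects the matches before any file is copied
    for src in sorted(root_dir.rglob("*.txt")):
        if src.parent.name != "inbox" or not src.is_file():
            continue
        # ("folderA", "inbox", "1.txt") -> "folderA-inbox-1.txt"
        new_name = "-".join(src.relative_to(root_dir).parts)
        shutil.copy(src, dest_dir / new_name)
        copied.append(new_name)
    return copied
```

With the layout from the question, copy_inbox_txt("mailDir", "destFolder") would produce names such as folderA-inbox-1.txt and folderB-inbox-1.txt.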

match filenames to foldernames then move files

I have files named "a1.txt", "a2.txt", "a3.txt", "a4.txt", "a5.txt" and so on. Then I have folders named "a1_1998", "a2_1999", "a3_2000", "a4_2001", "a5_2002" and so on.
I would like to make the connection between file "a1.txt" and folder "a1_1998", for example (I'm guessing I'll need a regular expression to do this), then use shutil to move file "a1.txt" into folder "a1_1998", file "a2.txt" into folder "a2_1999", etc.
I've started like this, but I'm stuck because of my lack of understanding of regular expressions.
import re
##list files and folders
r = re.compile('^a(?P')
m = r.match('a')
m.group('id')
##
##Move files to folders
I modified the answer below slightly to use shutil to move the files, did the trick!!
import shutil
import os
import glob

files = glob.glob(r'C:\Wam\*.txt')
for file in files:
    # this will remove the .txt extension and keep the "aN"
    first_part = file[7:-4]
    # find the matching directory
    dir = glob.glob(r'C:\Wam\%s_*/' % first_part)[0]
    shutil.move(file, dir)
You do not need regular expressions for this.
How about something like this:
import glob
import os

files = glob.glob('*.txt')
for file in files:
    # this will remove the .txt extension and keep the "aN"
    first_part = file[:-4]
    # find the matching directory
    dir = glob.glob('%s_*/' % first_part)[0]
    os.rename(file, os.path.join(dir, file))
A slight alternative, taking into account Inbar Rose's suggestion.
import os
import glob

files = glob.glob('*.txt')
dirs = glob.glob('*_*')
for file in files:
    filename = os.path.splitext(file)[0]
    matchdir = next(x for x in dirs if filename == x.rsplit('_')[0])
    os.rename(file, os.path.join(matchdir, file))
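The snippets above fail with an IndexError or StopIteration when a file has no matching folder. A slightly more defensive variant, sketched here with an invented function name (move_into_matching_folders), builds a lookup table of folders first and simply leaves unmatched files in place. It assumes the part of the folder name before the first underscore is the key, as in "a1_1998".

```python
import os
import shutil


def move_into_matching_folders(directory):
    """Move each '<key>.txt' into the folder named '<key>_<suffix>'."""
    # map the part before the first "_" to the folder name: "a1" -> "a1_1998"
    folders = {}
    for entry in os.scandir(directory):
        if entry.is_dir() and "_" in entry.name:
            folders[entry.name.split("_", 1)[0]] = entry.name
    moved = {}
    for name in sorted(os.listdir(directory)):
        key, ext = os.path.splitext(name)
        src = os.path.join(directory, name)
        # files without a matching folder are left where they are
        if ext == ".txt" and os.path.isfile(src) and key in folders:
            shutil.move(src, os.path.join(directory, folders[key], name))
            moved[name] = folders[key]
    return moved
```

The returned dictionary maps each moved file name to the folder it ended up in, which makes it easy to spot files that were skipped.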

All Files in Dir & Sub-Dir

I would like to find all the files in a directory and all sub-directories.
code used:
import os
import sys
path = "C:\\"
dirs = os.listdir(path)
filename = "C.txt"
FILE = open(filename, "w")
FILE.write(str(dirs))
FILE.close()
print(dirs)
The problem is - this code only lists files in directories, not sub-directories. What do I need to change in order to also list files in subdirectories?
To traverse a directory tree, use os.walk().
Here's an example to get you started:
import os

searchdir = r'C:\root_dir'  # traversal starts in this directory (the root)
for root, dirs, files in os.walk(searchdir):
    for name in files:
        (base, ext) = os.path.splitext(name)  # split base and extension
        print(base, ext)
which would give you access to the file names and the components.
You'll find the functions in the os and os.path module to be of great use for this sort of work.
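To tie this back to the question, which writes the listing to a file, here is a small sketch that collects the full path of every file with os.walk and writes one path per line. The function names list_all_files and write_listing are assumptions made for this example.

```python
import os


def list_all_files(top):
    """Return the full path of every file below `top`, including subfolders."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # root is the folder currently being visited
            paths.append(os.path.join(root, name))
    return sorted(paths)


def write_listing(top, out_file):
    """Write one path per line to `out_file` and return the number of paths."""
    paths = list_all_files(top)
    with open(out_file, "w", encoding="utf-8") as fh:
        for p in paths:
            fh.write(p + "\n")
    return len(paths)
```

For the question's setup this would be called as write_listing("C:\\", "C.txt"), although walking a whole drive can take a while.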
In Python 2, this function will help you: os.path.walk() http://docs.python.org/library/os.path.html#os.path.walk (it was removed in Python 3, where os.walk() is the replacement).
