Exclude subfolders and files

Exclude subfolders and files - python

I have a small problem with excluding files and subfolders.
for x in os.walk('core'):
    for y in glob.glob(os.path.join(x[0], '*.py')):
        s = y.replace('\\', '.')
        x = s.replace('.py', '')
        cogs.append(x)
This code collects every file from every folder. Now I want to exclude the files __init__ and models, and the subfolder migrations with files like 0002_auto. Right now I just remove them from the list manually, like this:
cogs.remove('core.rpg.models')
cogs.remove('core.rpg.__init__')
cogs.remove('core.rpg.migrations.__init__')

Normally you would write for root, dirs, files in os.walk('core'): and operate on dirs or files, combining them with root to get their full paths.
Using glob on top of that is akin to filtering x[2] (that is, files, which is the list of files inside root).
import os
what_i_want = []
# models.py is listed here because the question removes 'core.rpg.models' by hand
skip_files = {"__init__.py", "models.py"}
for root, dirs, files in os.walk('core'):
    for f in files:
        # skip the subdirs models and migrations
        if root.endswith("models") or root.endswith("migrations"):
            continue
        # skip any non-.py file
        if not f.endswith(".py"):
            continue
        # skip certain .py files
        if f in skip_files:
            continue
        # remove .py from the filename
        f = f[:-3]
        # add the filename including its root, replacing path separators with dots
        what_i_want.append(os.path.join(root, f).replace(os.sep, "."))
If you pass an absolute path to os.walk, this needs some extra slicing so that each name starts at the starting dir ("core") rather than at the full path to it.
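The same idea can be sketched with in-place pruning of dirs, which stops os.walk from descending into a skipped folder at all. This is only a sketch: the function name collect_modules and its parameters are made up for illustration, and the default skip lists simply mirror the names mentioned in the question.

```python
import os

def collect_modules(base, skip_files=("__init__.py", "models.py"), skip_dirs=("migrations",)):
    """Return dotted module names for the .py files under base (hypothetical helper)."""
    modules = []
    # Names are built relative to the parent of base, so they start with "core".
    start = os.path.dirname(os.path.abspath(base))
    for root, dirs, files in os.walk(base):
        # Editing dirs in place keeps os.walk out of the skipped folders.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            if not name.endswith(".py") or name in skip_files:
                continue
            rel = os.path.relpath(os.path.join(root, name[:-3]), start)
            modules.append(rel.replace(os.sep, "."))
    return sorted(modules)
```

With this, cogs = collect_modules('core') would replace both the glob loop and the manual cogs.remove(...) calls.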

Related

How to rename multiple files in a directory using os walk and split in PYTHON?

I need to create a copy of a directory tree called "caritemscopy" in which the files, instead of sitting in directories named after years, have the year as part of the filename, and the year directories are entirely absent.
My directory currently looks like this
After coding my directory should look like this

This lists all the .txt files under path, renames each one so that its year directory becomes part of the filename, moves it up into path, and removes each year directory once it is empty.
import os
path = 'path_of_folder/F26/'
files = []
# r = root, d = directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if file.endswith('.txt'):
            files.append(os.path.join(r, file))
for f in files:
    # src[-2] is the year directory, src[-1] is the filename
    src = f.split('/')
    os.rename(f, path + src[-2] + '-' + src[-1])
    # remove the year directory once it is empty
    if not os.listdir(path + src[-2]):
        os.rmdir(path + src[-2])
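Since the question asks for a copy rather than an in-place rename, a copying variant might look like the sketch below. It rests on assumptions: the function name copy_with_year_prefix is invented, a "year directory" is taken to be any folder whose name is four digits, the year is joined to the filename with a hyphen as in the answer above, and dst is assumed to lie outside src.

```python
import os
import shutil

def copy_with_year_prefix(src, dst):
    """Copy src to dst, folding 4-digit year folders into the filenames (hypothetical helper)."""
    copied = []
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        parts = [] if rel == "." else rel.split(os.sep)
        # Folder names that look like a year become a filename prefix.
        years = [p for p in parts if len(p) == 4 and p.isdigit()]
        kept = [p for p in parts if p not in years]
        target_dir = os.path.join(dst, *kept)
        for name in files:
            os.makedirs(target_dir, exist_ok=True)
            target = os.path.join(target_dir, "-".join(years + [name]))
            shutil.copy2(os.path.join(root, name), target)
            copied.append(target)
    return sorted(copied)
```

The source tree is left untouched, so there is no need to remove empty directories afterwards.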

Copying files in python using shutil

I have the following directory structure:
-mailDir
    -folderA
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -89.txt
            -subInbox
            -subInbox2
    -folderB
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -200.txt
            -577.txt
The aim is to copy all the txt files under the inbox folders into another folder.
For this I tried the code below:
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if(dirName=="inbox"):
            eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        shutil.copy(path.join(ii,i),destDir)
If the inbox directory contains only .txt files, the code above works fine. Because the inbox folder under folderA also contains subdirectories alongside the .txt files, the code fails with a permission denied error. As I understand it, shutil.copy cannot copy folders.
The aim is to copy only the txt files in every inbox folder to some other location. If the same file name occurs in different inbox folders, I have to keep both files. How can we improve the code in this case? Please note that everything other than the .txt files is a folder.

One simple solution is to filter out any i that does not have the .txt extension by using the string endswith() method.
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if(dirName=="inbox"):
            eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        if i.endswith('.txt'):
            shutil.copy(path.join(ii,i),destDir)
This should ignore any folders and non-txt files that are found with os.listdir(ii). I believe that is what you are looking for.

I just remembered that I once wrote several files to solve this exact problem. You can find the source code on my GitHub.
In short, there are two functions of interest here:
list_files(loc, return_dirs=False, return_files=True, recursive=False, valid_exts=None)
copy_files(loc, dest, rename=False)
For your case, you could copy and paste these functions into your project and modify copy_files like this:
# imports assumed from the names used below
from os.path import join
from re import sub
from shutil import copy
def copy_files(loc, dest, rename=False):
    # get files with full path (list_files is the other function named above)
    files = list_files(loc, return_dirs=False, return_files=True, recursive=True, valid_exts=('.txt',))
    # copy files in list to dest
    for i, this_file in enumerate(files):
        # change name if renaming
        if rename:
            # strip a leading "./", then replace slashes with hyphens to preserve a unique name
            out_file = sub(r'^\./', '', this_file)
            out_file = sub(r'\\|/', '-', out_file)
            out_file = join(dest, out_file)
            copy(this_file, out_file)
            files[i] = out_file
        else:
            copy(this_file, dest)
    return files
Then just call it like so:
copy_files('mailDir', 'destFolder', rename=True)
The renaming scheme might not be exactly what you want, but at least it will not overwrite your files. I believe this should solve all your problems.

Here you go:
import os
from os import path
import shutil
destDir = '<absolute-path>'
for root, dirs, files in os.walk(os.getcwd()):
    # Only copy from directories named 'inbox'.
    if path.basename(root) != 'inbox':
        continue
    # Keep only the '.txt' files.
    files = [f for f in files if f.endswith('.txt')]
    for f in files:
        p = path.join(root, f)
        # print(p)
        shutil.copy(p, destDir)
Quick and simple.
Sorry, I forgot that you also need unique file names. The solution above only works when the file names are distinct, as they are within a single inbox folder.
To copy files from multiple inboxes and give each one a unique name in the destination folder, you can try this:
import os
from os import path
import shutil
sourceDir = os.getcwd()
fixedLength = len(sourceDir)
destDir = '<absolute-path>'
filteredFiles = []
for root, dirs, files in os.walk(sourceDir):
    # Keep only the '.txt' files in all the inbox directories.
    if path.basename(root) == 'inbox':
        # join the file name to the full path while filtering txt files
        files = [path.join(root, f) for f in files if f.endswith('.txt')]
        # add the filtered files to the main list
        filteredFiles.extend(files)
# make tuples of (file path, unique file name built from the relative path)
filteredFiles = [(f, f[fixedLength+1:].replace(os.sep, '-')) for f in filteredFiles]
for (f, n) in filteredFiles:
    print('copying file...', f)
    # copy from the source path to the dest directory under the unique name
    shutil.copy(f, path.join(destDir, n))
print('copied', len(filteredFiles), 'files to', destDir)
If you need to copy all files instead of just the txt files, drop the f.endswith('.txt') condition from the list comprehension; the files list that os.walk yields already contains only non-directory entries.
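The approaches above can be combined into one small function. The following is a sketch under stated assumptions: copy_inbox_txt is a made-up name, the destination folder is assumed to lie outside the tree being walked, and the unique name is built by prefixing each file with its inbox path relative to the root, joined by hyphens.

```python
import os
import shutil

def copy_inbox_txt(root_dir, dest_dir):
    """Copy *.txt files that sit directly in any 'inbox' folder to dest_dir (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for root, dirs, files in os.walk(root_dir):
        if os.path.basename(root) != "inbox":
            continue
        # Prefix with the path relative to root_dir so equal names do not collide.
        prefix = os.path.relpath(root, root_dir).replace(os.sep, "-")
        for name in files:
            if name.endswith(".txt"):
                target = os.path.join(dest_dir, prefix + "-" + name)
                shutil.copy(os.path.join(root, name), target)
                copied.append(os.path.basename(target))
    return sorted(copied)
```

Calling copy_inbox_txt('mailDir', 'destFolder') would then produce names such as folderA-inbox-1.txt and folderB-inbox-1.txt, while anything inside subInbox is ignored because that folder is not itself named inbox.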

Python: Loop to open multiple folders and files in python

I am new to Python and currently work on data analysis.
I am trying to open multiple folders in a loop and read all the files in each folder.
For example, the working directory contains 10 folders that need to be opened, and each folder contains 10 files.
My code to open one folder and list its .txt files:
file_open = glob.glob("home/....../folder1/*.txt")
I want to open folder 1 and read all its files, then go to folder 2 and read all its files, and so on until folder 10.
Can anyone help me write a loop that opens each folder, including which libraries I need to use?
My background is in R; in R, for example, I could write a loop to open folders and files with the code below.
folder_open <- dir("......./main/")
for (n in seq_along(folder_open)) {
  file_open <- dir(paste0("......./main/", folder_open[n]))
  for (k in seq_along(file_open)) {
    lines <- readLines(paste0("...../main/", folder_open[n], "/", file_open[k]))
    # Finally I can read all folders and files.
  }
}

This recursive method will scan all directories within a given directory and then print the names of the txt files. I kindly invite you to take it forward.
import os
def scan_folder(parent):
    # iterate over all the entries in directory 'parent'
    for file_name in os.listdir(parent):
        if file_name.endswith(".txt"):
            # if it's a txt file, print its name (or do whatever you want)
            print(file_name)
        else:
            current_path = os.path.join(parent, file_name)
            if os.path.isdir(current_path):
                # if we're checking a sub-directory, recursively call this function
                scan_folder(current_path)
scan_folder("/example/path")  # Insert parent directory's path

Given the following folder/file tree:
C:.
├───folder1
│       file1.txt
│       file2.txt
│       file3.csv
│
└───folder2
        file4.txt
        file5.txt
        file6.csv
The following code will recursively locate all .txt files in the tree:
import os
import fnmatch
for path,dirs,files in os.walk('.'):
    for file in files:
        if fnmatch.fnmatch(file,'*.txt'):
            fullname = os.path.join(path,file)
            print(fullname)
Output:
.\folder1\file1.txt
.\folder1\file2.txt
.\folder2\file4.txt
.\folder2\file5.txt

Your glob() pattern is almost correct. Try one of these:
file_open = glob.glob("home/....../*/*.txt")
file_open = glob.glob("home/....../folder*/*.txt")
The first one will examine all of the text files in any first-level subdirectory of home/......, whatever that is. The second will limit itself to subdirectories named like "folder1", "folder2", etc.
I don't speak R, but this might translate your code:
for filename in glob.glob("......../main/*/*.txt"):
    with open(filename) as file_handle:
        for line in file_handle:
            # process each line of text here
            pass
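To mirror the nested folder-then-file loop from the R snippet more directly, something like the following could work. It is a sketch only: read_folders and its pattern parameter are names chosen for illustration, and the result layout (folder name mapped to file name mapped to lines) is just one convenient choice.

```python
import glob
import os

def read_folders(main_dir, pattern="*.txt"):
    """Map each sub-folder of main_dir to {file name: list of lines} (hypothetical helper)."""
    result = {}
    for folder in sorted(os.listdir(main_dir)):
        folder_path = os.path.join(main_dir, folder)
        # Skip plain files that sit next to the folders.
        if not os.path.isdir(folder_path):
            continue
        result[folder] = {}
        for filename in sorted(glob.glob(os.path.join(folder_path, pattern))):
            with open(filename) as handle:
                result[folder][os.path.basename(filename)] = handle.read().splitlines()
    return result
```

The outer loop plays the role of folder_open and the inner loop the role of file_open in the R version.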

I think a nice way to do that is to use os.walk. It yields a (dirpath, dirnames, filenames) tuple for every directory in the tree, and you can then iterate over those tuples.
import os
directory = './'
for d in os.walk(directory):
    print(d)

This code will look for all directories inside a directory, printing out the names of all files found there:
# Desc: print filenames one level down from starting folder
import os, fnmatch
def find_dirs(directory, pattern):
    # yield the sub-directories of directory whose names match pattern
    for item in os.listdir(directory):
        if os.path.isdir(os.path.join(directory, item)):
            if fnmatch.fnmatch(item, pattern):
                filename = os.path.join(directory, item)
                yield filename
def find_files(directory, pattern):
    # yield the files in directory whose names match pattern
    for item in os.listdir(directory):
        if os.path.isfile(os.path.join(directory, item)):
            if fnmatch.fnmatch(item, pattern):
                filename = os.path.join(directory, item)
                yield filename
# M A I N L I N E
# Set directory
os.chdir("C:\\Users\\Mike\\Desktop")
for filedir in find_dirs('.', '*'):
    print('Got directory:', filedir)
    for filename in find_files(filedir, '*'):
        print(filename)

pathlib is a good choice:
from pathlib import Path
# or use: Path('demo/test_dir').glob('**/*.txt')
for txt_path in Path('demo/test_dir').rglob('*.txt'):
    if txt_path.is_file():
        print(txt_path.absolute())

Python: Remove empty folders recursively

I'm having troubles finding and deleting empty folders with my Python script.
I have some directories with files more or less like this:
A/
--B/
----a.txt
----b.pdf
--C/
----d.pdf
I'm trying to delete all files that aren't PDFs and after that delete all empty folders. I can delete the files that I want to, but then I can't remove the empty directories. What am I doing wrong?
os.chdir(path+"/"+name+"/Test Data/Checklists")
pprint("Current path: "+ os.getcwd())
for root, dirs, files in os.walk(path+"/"+name+"/Test Data/Checklists"):
    for name in files:
        if not(name.endswith(".pdf")):
            os.remove(os.path.join(root, name))
pprint("Deletting empty folders..")
pprint("Current path: "+ os.getcwd())
for root, dirs, files in os.walk(path+"/"+name+"/Test Data/Checklists", topdown=False):
    if not dirs and not files:
        os.rmdir(root)

Use os.removedirs instead:
os.removedirs(path)
This removes the leaf directory, which must be empty, and then each parent directory in turn until it reaches one that is not empty.

Ideally, you should remove the directories immediately after deleting the files, rather than doing two passes with os.walk:
import sys
import os
for dirpath, subdirs, files in os.walk(sys.argv[1], topdown=False):
    for name in files:
        if not name.endswith(".pdf"):
            os.remove(os.path.join(dirpath, name))
    # check whether the directory is now empty after deletions, and if so, remove it
    if len(os.listdir(dirpath)) == 0:
        os.rmdir(dirpath)

To delete empty folders you can use this snippet.
It can be combined with file deletion, but it should run as the last step.
import os
def drop_empty_folders(directory):
    """Remove every empty folder under directory, deepest first."""
    for dirpath, dirnames, filenames in os.walk(directory, topdown=False):
        # list the directory again: dirnames may name sub-folders already removed
        if not os.listdir(dirpath):
            os.rmdir(dirpath)

Remove all empty folders:
import os
folders = './A/' # directory
for folder in list(os.walk(folders)):
    if not os.listdir(folder[0]):
        os.removedirs(folder[0])
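One detail explains why the code in the question leaves nested empty folders behind: with topdown=False, os.walk still builds each directory's dirs list before it visits the sub-folders, so a parent whose children were just removed still reports them. Checking the directory on disk avoids that. A minimal sketch, assuming you want to keep the starting folder itself (remove_empty_dirs is a made-up name):

```python
import os

def remove_empty_dirs(top):
    """Delete empty folders below top, deepest first; top itself is kept (hypothetical helper)."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        # Ask the file system instead of trusting dirnames, which was built
        # before the sub-folders were visited and possibly removed.
        if dirpath != top and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Run it after the non-PDF files have been deleted; a chain such as B/C with nothing in it is then removed in a single pass.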

Excluding all but a single subdirectory from a file search

I have a directory structure that resembles the following:
Dir1
Dir2
Dir3
Dir4
    L SubDir4.1
    L SubDir4.2
    L SubDir4.3
I want to generate a list of files (with full paths) that include all the contents of Dirs1-3, but only SubDir4.2 inside Dir4. The code I have so far is
import fnmatch
import os
for root, dirs, files in os.walk( '.' )
    if 'Dir4' in dirs:
        if not 'SubDir4.2' in 'Dir4':
            dirs.remove( 'Dir4' )
    for file in files
        print os.path.join( root, file )
My problem is that the part where I attempt to exclude any file that does not have SubDir4.2 in its path excludes everything in Dir4, including the things I would like to keep. How should I amend the code above to do what I want?
Update 1: I should add that there are a lot of directories below Dir4, so manually listing them in an excludes list isn't a practical option. I'd like to be able to specify SubDir4.2 as the only subdirectory within Dir4 to be read.
Update 2: For reason outside of my control, I only have access to Python version 2.4.3.

There are a few typos in your snippet. I propose this:
import os
def any_p(iterable):
    for element in iterable:
        if element:
            return True
    return False
include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2', 'Dir3', 'Dir2'] # list all the folders you want to include here
for root, dirs, files in os.walk( '.' ):
    dirs[:] = [d for d in dirs if any_p(d in os.path.join(root, q_inc) for q_inc in include_dirs)]
    for file in files:
        print os.path.join(root, file)
EDIT: Following the comments, I have changed this to an include list instead of an exclude list.
EDIT2: Added any_p (an any() equivalent for Python versions < 2.5).
EDIT3bis: If other folders also contain a subfolder named 'SubDir4.2', you can use the following to specify the location:
include_dirs = ['Dir4/SubDir4.2', 'Dir1/SubDir4.2']
This assumes you have a Dir1/SubDir4.2.
If there are a lot of those, you may want to refine this approach with fnmatch, or probably a regex query.

I altered mstud's solution to give you what you are looking for:
import os
for root, dirs, files in os.walk('.'):
    # Split the root into its path parts
    tmp = root.split(os.path.sep)
    # If the length of the path is long enough to be your path AND
    # the second to last part of the path is Dir4 AND
    # the last part of the path is not SubDir4.2 THEN
    # stop processing this pass.
    if (len(tmp) > 2) and (tmp[-2] == 'Dir4') and (tmp[-1] != 'SubDir4.2'):
        continue
    # If we aren't in Dir4, print the file paths.
    if tmp[-1] != 'Dir4':
        for file in files:
            print os.path.join(root, file)
In short, the first "if" skips the printing of any directory contents under Dir4 that aren't SubDir4.2. The second "if" skips the printing of the contents of the Dir4 directory.

for root, dirs, files in os.walk('.'):
    tmp = root.split(os.path.sep)
    if len(tmp)>2 and tmp[-2]=="Dir4" and tmp[-1]=="SubDir4.2":
        continue
    for file in files:
        print os.path.join(root, file)
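Another way to get the same result is to prune dirs in place when the walk reaches Dir4, so that the unwanted subdirectories are never visited at all. The sketch below is written for current Python rather than 2.4.3 and makes some assumptions: the function and parameter names are invented, any folder named Dir4 at any depth is treated the same way, and files lying directly in Dir4 are skipped, as in the answer above.

```python
import os

def files_with_single_subdir(top, parent="Dir4", keep="SubDir4.2"):
    """Return file paths under top, visiting only keep inside parent (hypothetical helper)."""
    found = []
    for root, dirs, files in os.walk(top):
        if os.path.basename(root) == parent:
            # Prune in place: os.walk will only descend into the kept folder.
            dirs[:] = [d for d in dirs if d == keep]
            # Files sitting directly in parent are skipped as well (an assumption).
            continue
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)
```

Because the pruning happens before os.walk descends, there is no need to list the many other subdirectories of Dir4.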
