I have several .asc files hidden 5 folders deep. For example:
main > Folder1a > Folder2a > Folder3a > Folder4a > myfile1.asc
main > Folder1b > Folder2b > Folder3b > Folder4b > myfile2.asc
main > Folder1c > Folder2c > Folder3c > Folder4c > myfile3.asc
What method can I use to get a list of myfile.asc files contained within the main folder?
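import os
f_walker = os.walk("/path/to/main")
# os.walk() yields (dirpath, dirnames, filenames); iterate over the filenames
f_generator = (os.path.join(cur, file) for cur, dirs, files in f_walker for file in files)
all_files = [file for file in f_generator if file.endswith(".asc")]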
I think this works, but it may be slow on a large tree. If you know the files will only ever be 5 levels deep (never 4 and never 6), you can optimize it somewhat.
For example, something like this might be much faster:
import os
import glob
def go_deep(base_path, pattern, depth):
    # depth is the number of directory levels still to descend below base_path
    if depth > 0:
        for fname in os.listdir(base_path):
            full_path = os.path.join(base_path, fname)
            if os.path.isdir(full_path):
                for fpath in go_deep(full_path, pattern, depth - 1):
                    yield fpath
    else:
        # glob inside base_path itself, not in the current working directory
        for fpath in glob.glob(os.path.join(base_path, pattern)):
            yield fpath
print(list(go_deep(".", pattern="*.asc", depth=5)))
This will work. Enter the extension with the dot (for example .asc); the directory does not need quotes. It is basically the same as Joran's answer, but takes its input from the user. I did this for another project...
import os
extension = input("enter extension:")
directory = input("enter directory to start in:")
for root, dirs, files in os.walk(directory):
    for fname in files:
        if fname.endswith(extension):
            full_fname = os.path.join(root, fname)
            print(full_fname)
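The same recursive search can also be sketched with pathlib instead of os.walk(). This is only an illustration: the helper name find_files is made up for this example, and it assumes Python 3.

```python
from pathlib import Path

def find_files(root, extension):
    """Return sorted paths of all files under root whose name ends with extension."""
    # rglob("*") visits the whole tree; the suffix test keeps only matching files
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.name.endswith(extension)
    )
```

A call such as find_files("/path/to/main", ".asc") would then return the matching paths at any depth.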
Related
I am reading in all the files in a given folder:
import os
path = '/Users/user/Desktop/folder_name'
files = os.listdir(path)
I have multiple files (100+) with the following names:
20220330_a.txt 20220330_b.txt 20220330_c.txt
I want to replace the "20220330" with "20220630" in the actual file names in the folder, so I obtain 20220630_a.txt, 20220630_b.txt etc.
Any ideas?
I figured it out myself:
old_date = "20220330"
new_date = "20220630"
counter = 0  # number of files renamed
for file in os.listdir(path):
    if file.startswith(old_date):
        counter = counter + 1
        os.rename(os.path.join(path, file), os.path.join(path, file.replace(old_date, new_date)))
if counter == 0:
    print("No file has been found")
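For reuse, the rename loop could be wrapped in a small function. The sketch below is an assumption-laden variant, not the poster's code: rename_prefix is a hypothetical name, and it replaces only the leading prefix rather than every occurrence of the old date in the name.

```python
import os

def rename_prefix(folder, old_prefix, new_prefix):
    """Rename files in folder that start with old_prefix; return how many were renamed."""
    renamed = 0
    for name in os.listdir(folder):
        source = os.path.join(folder, name)
        if os.path.isfile(source) and name.startswith(old_prefix):
            # swap only the leading prefix, leaving the rest of the name intact
            new_name = new_prefix + name[len(old_prefix):]
            os.rename(source, os.path.join(folder, new_name))
            renamed += 1
    return renamed
```

Returning the count lets the caller decide what to print when nothing matched.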
How can I delete the content (zero the filesize) of a large directory tree (10 GB, 1K files) but keep the entire tree structure, filenames, extensions. (If I can keep the original last write time [last content modification time] that's a bonus).
I have seen several suggestions for individual files, but can not figure out the way to make this work for the entire CWD.
def deleteContent(fName):
    with open(fName, "w"):
        pass
Running the following as administrator should reset every file to an empty file while retaining each file's LastWriteTime:
gci c:\temp\test\*.* -recurse | % {
    $LastWriteTime = $PSItem.LastWriteTime
    clear-content $PSItem;
    $PSItem.LastWriteTime = $LastWriteTime
}
os.walk() yields one tuple for each directory in the tree:
(directory, list of folders in the directory, list of files in the directory)
When we combine your code with os.walk():
import os
for dirpath, dirnames, filenames in os.walk("top_directory"):
    for filename in filenames:
        # opening in "w" mode truncates the file to zero bytes
        with open(os.path.join(dirpath, filename), "w"):
            pass
All good answers, but I can see two more challenges with the answers provided:
When traversing a directory tree, you may want to limit the depth it goes to, to protect yourself from very large directory trees. Secondly, Windows has a default limitation (enforced by Explorer) of 260 characters for the full path including the filename (MAX_PATH). This limitation will produce various OS errors, but there is a workaround for it.
Let's start with the workaround for the maximum length of the file path. You can do something like the following:
import os
import platform
def full_path_windows(filepath):
    """
    Filenames and paths have a default limitation of 260 characters in Windows.
    By inserting '\\\\?\\' at the start of the path it removes this limitation.
    This function inserts '\\\\?\\' at the start of the path, on Windows only
    Only if the path starts with '<driveletter>:\\' e.g 'C:\\'.
    It will also normalise the characters/case of the path.
    """
    if platform.system() == 'Windows':
        if filepath[1:3] == ':\\':
            return u'\\\\?\\' + os.path.normcase(filepath)
    return os.path.normcase(filepath)
There are mentions of write protection, files in use, or other conditions that may prevent writing to a file. Write permission can be checked (without actually writing) with the following:
import os
def write_access(filepath):
    """
    Usage:
    write_access(filepath)
    This function returns True if Write Access is obtained
    This function returns False if Write Access is not obtained
    This function returns False if the filepath does not exist
    filepath = must be an existing file
    """
    if os.path.isfile(filepath):
        return os.access(filepath, os.W_OK)
    return False
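As far as I know, os.access() only consults permission information, so it may not notice a file that another process has locked. A hedged alternative is to actually try opening the file in append mode, which opens it for writing without truncating it. The name can_open_for_write is made up for this sketch.

```python
import os

def can_open_for_write(filepath):
    """Return True if an existing file can actually be opened for writing."""
    if not os.path.isfile(filepath):
        return False  # also avoids creating a missing file by accident
    try:
        # append mode opens for writing without truncating the content
        with open(filepath, "a"):
            return True
    except OSError:
        return False
```

The result is only a snapshot: the file may still become unwritable between this check and the real write.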
For setting a minimum depth or maximum depth, you can do something like this:
import os
def get_all_files(rootdir, mindepth = 1, maxdepth = float('inf')):
    """
    Usage:
    get_all_files(rootdir, mindepth = 1, maxdepth = float('inf'))
    This returns a list of all files of a directory, including all files in
    subdirectories. Full paths are returned.
    WARNING: this may create a very large list if many files exist in the
    directory and subdirectories. Make sure you set the maxdepth appropriately.
    rootdir = existing directory to start
    mindepth = int: the level to start, 1 is start at root dir, 2 is start
               at the subdirectories of the root dir, and so on.
    maxdepth = int: the level which to report to. Example, if you only want
               the files of the subdirectories of the root dir,
               set mindepth = 2 and maxdepth = 2. If you only want the files
               of the root dir itself, set mindepth = 1 and maxdepth = 1
    """
    file_paths = []
    root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                file_paths.append(os.path.join(dirpath, filename))
        elif depth > maxdepth:
            # prune in place so os.walk() does not descend any further
            del dirs[:]
    return file_paths
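A variant of the depth limit can be sketched by measuring depth from the path relative to the root instead of counting separators in the absolute path. This is an illustration under my own assumptions (the name walk_limited is hypothetical); it uses the same convention that the root directory is level 1, and it prunes as soon as the maximum depth is reached.

```python
import os

def walk_limited(rootdir, maxdepth):
    """Yield file paths under rootdir, descending at most maxdepth levels (root is level 1)."""
    for dirpath, dirs, files in os.walk(rootdir):
        rel = os.path.relpath(dirpath, rootdir)
        # "." is the root itself (level 1); "a" is level 2; "a/b" is level 3
        depth = 1 if rel == os.curdir else rel.count(os.path.sep) + 2
        for filename in files:
            yield os.path.join(dirpath, filename)
        if depth >= maxdepth:
            del dirs[:]  # prune in place so os.walk() does not go deeper
```

Because it is a generator, the caller can stop early instead of building one very large list.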
Now to roll the above code up in a single function, this should give you an idea:
import os
def clear_all_files_content(rootdir, mindepth = 1, maxdepth = float('inf')):
    not_cleared = []
    root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
    for dirpath, dirs, files in os.walk(rootdir):
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in files:
                filename = os.path.join(dirpath, filename)
                if filename[1:3] == ':\\':
                    filename = u'\\\\?\\' + os.path.normcase(filename)
                if (os.path.isfile(filename) and os.access(filename, os.W_OK)):
                    with open(filename, 'w'):
                        pass
                else:
                    not_cleared.append(filename)
        elif depth > maxdepth:
            del dirs[:]
    return not_cleared
This does not maintain the "last write time".
It will return the list not_cleared, which you can check for files which encountered a write access problem.
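If the last write time should survive as well, one possible approach is to read the timestamps with os.stat() before truncating and put them back with os.utime() afterwards. This is a minimal sketch for a single file, assuming Python 3.3 or later for the ns= argument; clear_keep_mtime is a name invented for the example.

```python
import os

def clear_keep_mtime(filepath):
    """Truncate filepath to zero bytes, then restore its access and modification times."""
    info = os.stat(filepath)
    with open(filepath, "w"):
        pass  # opening in "w" mode empties the file
    # put the saved timestamps back (nanosecond precision)
    os.utime(filepath, ns=(info.st_atime_ns, info.st_mtime_ns))
```

It could be called in place of the bare open(filename, 'w') inside a directory walk.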
I have a range of folders which are named like folder0, folder1,..., folder99. Now I want to walk through folder0,..., folderX and print their files. X should stay variable and easy to change.
My code looks something like this, but it's not working how I want it to yet, because I can't choose up to which number I want to go.
import os
import re
rootdir = r'path'
for root, dirs, files in os.walk(rootdir):
    for dir in dirs:
        if not re.match(r'folder[0-9]+$', dir):
            dirs.remove(dir)
    for file in files:
        print files
Assuming your naming scheme is consistent, as you state, why use os.walk at all?
import os
dir_path = '/path/to/folders/folder{}'
x = 10
for i in range(0, x):
    formatted_path = dir_path.format(i)
    try:
        for f in os.listdir(formatted_path):
            filename = os.path.join(formatted_path, f)
            if os.path.isfile(filename):
                print(filename)
    except OSError:
        print("{} does not exist".format(formatted_path))
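If the numbered folders can themselves contain subfolders, a possible sketch is to pick the top-level folders by number first and only then walk each one. The function name and the inclusive upper bound max_index are assumptions made for this example; note that range(0, x) in the answer above stops at x - 1 instead.

```python
import os
import re

def files_in_numbered_folders(rootdir, max_index):
    """Return files inside rootdir/folder0 .. rootdir/folder<max_index> (inclusive)."""
    pattern = re.compile(r"folder(\d+)$")
    found = []
    for name in sorted(os.listdir(rootdir)):
        match = pattern.match(name)
        folder = os.path.join(rootdir, name)
        # keep only directories named folder<N> with N no larger than max_index
        if match and int(match.group(1)) <= max_index and os.path.isdir(folder):
            for root, dirs, files in os.walk(folder):
                found.extend(os.path.join(root, f) for f in files)
    return found
```

Changing the limit is then a matter of passing a different max_index.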
I'm new to Python (I'm actually on a short course this week), but I have a very specific request and I don't know how to deal with it right now: I have many different txt files in a folder, but when I use the following code I receive the filenames of only two of the many files. Why is this?
Regards!
import dircache
lista = dircache.listdir('C:\FDF')
i = 0
check = len(lista[0])
temp = []
count = len(lista)
while count != 0:
    if len(lista[i]) != check:
        temp.append(lista[i- 1])
        check = len(lista[i])
    else:
        i = i + 1
        count = count - 1
print (temp)
Maybe you can use the glob library: http://docs.python.org/2/library/glob.html
It matches file names using UNIX shell-style wildcards, so it may work for this:
import glob
directory = 'yourdirectory/'
filelist = glob.glob(directory+'*.txt')
If I've understood you correctly, you would like to get all the files?
In that case, try this:
import os
filesList = None
dir = r'C:\FDF'
for root, dirs, files in os.walk(dir):
    filesList = files
    break
print(filesList)
If you need the full path, use:
import os.path
filesList = []  # must be a list, not None, so that append() works
dir = r'C:\FDF'
for root, dirs, files in os.walk(dir):
    for file in files:
        filesList.append(os.path.join(root, file))
print(filesList)
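When only the files directly inside one folder are wanted, os.scandir() is another option that avoids walking the subdirectories at all. A small sketch, assuming Python 3.6 or later; list_files and its suffix parameter are names chosen for illustration.

```python
import os

def list_files(directory, suffix=""):
    """Return sorted full paths of the regular files directly inside directory."""
    with os.scandir(directory) as entries:
        return sorted(
            entry.path for entry in entries
            if entry.is_file() and entry.name.endswith(suffix)
        )
```

For example, list_files(r'C:\FDF', '.txt') would return only the txt files in that folder.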
I'm looking for a way to include/exclude file patterns and exclude directories from an os.walk() call.
Here's what I'm doing so far:
import fnmatch
import os
includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']
def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path
        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path
for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))
    for filename in files:
        filename = os.path.join(root, filename)
        print(filename)
Is there a better way to do this? How?
This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes that includes is only used for files):
import fnmatch
import os
import os.path
import re
includes = ['*.doc', '*.odt'] # for files only
excludes = ['/home/paulo-freitas/Documents'] # for dirs and files
# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'
for root, dirs, files in os.walk('/home/paulo-freitas'):
    # exclude dirs
    dirs[:] = [os.path.join(root, d) for d in dirs]
    dirs[:] = [d for d in dirs if not re.match(excludes, d)]
    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]
    for fname in files:
        print(fname)
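The pattern-joining step can be isolated in a helper to see why the or r'$.' fallback is there: joining an empty list yields an empty regular expression, which matches every string, whereas r'$.' can never match. The helper name compile_globs is hypothetical.

```python
import fnmatch
import re

def compile_globs(patterns):
    """Combine glob patterns into one compiled regular expression."""
    # An empty join would produce '', which matches every string, so fall
    # back to r'$.', a pattern that can never match.
    combined = "|".join(fnmatch.translate(p) for p in patterns) or r"$."
    return re.compile(combined)
```

Compiling once also avoids re-parsing the same pattern string for every path.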
From docs.python.org:
os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …
for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes]
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print(os.path.join(root, f))
I should point out that the above code assumes excludes is a pattern, not a full path. You would need to adjust the list comprehension to filter if os.path.join(root, d) not in excludes to match the OP case.
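The adjustment for full-path excludes could look roughly like the following. It is a sketch rather than a tested drop-in: filtered_walk and its parameter names are invented here, and it returns the matches instead of printing them.

```python
import fnmatch
import os

def filtered_walk(top, includes, excluded_dirs):
    """Return files under top matching any glob in includes, skipping excluded_dirs."""
    excluded = {os.path.normpath(d) for d in excluded_dirs}
    matches = []
    for root, dirs, files in os.walk(top, topdown=True):
        # prune in place: os.walk() will not descend into removed entries
        dirs[:] = [d for d in dirs
                   if os.path.normpath(os.path.join(root, d)) not in excluded]
        for pattern in includes:
            for name in fnmatch.filter(files, pattern):
                matches.append(os.path.join(root, name))
    return matches
```

With the values from the question, the call would be filtered_walk('/home/paulo-freitas', ['*.doc', '*.odt'], ['/home/paulo-freitas/Documents']).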
Why fnmatch?
import os
excludes=....
for ROOT,DIR,FILES in os.walk("/path"):
    for file in FILES:
        if file.endswith(('doc','odt')):
            print(file)
    for directory in DIR:
        if not directory in excludes :
            print(directory)
Not exhaustively tested.
dirtools is perfect for your use-case:
from dirtools import Dir
print(Dir('.', exclude_file='.gitignore').files())
Here is one way to do that:
import fnmatch
import os
excludes = ['/home/paulo-freitas/Documents']
matches = []
for path, dirs, files in os.walk(os.getcwd()):
    # skip any directory that lies under an excluded path
    if any(eachpath in path for eachpath in excludes):
        continue
    for filename in files:
        if fnmatch.fnmatch(filename, '*.doc') or fnmatch.fnmatch(filename, '*.odt'):
            matches.append(os.path.abspath(os.path.join(path, filename)))
print(matches)
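import os
includes = ('.doc', '.odt')  # extensions to keep, including the dot
excludes = ['/home/paulo-freitas/Documents']
def file_search(path, extensions):
    for root, dirs, files in os.walk(path):
        # prune excluded directories in place so os.walk() skips them
        dirs[:] = [d for d in dirs if os.path.join(root, d) not in excludes]
        for name in files:
            if name.endswith(extensions):
                print(os.path.join(root, name))
file_search('/home/paulo-freitas', includes)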
This is an example of excluding directories and files with os.walk():
import os
import shutil
ignoreDirPatterns=[".git"]
ignoreFilePatterns=[".php"]
def copyTree(src, dest, onerror=None):
    src = os.path.abspath(src)
    src_prefix = len(src) + len(os.path.sep)
    for root, dirs, files in os.walk(src, onerror=onerror):
        for pattern in ignoreDirPatterns:
            if pattern in root:
                break
        else:
            # Runs only if the loop above did not break (no dir pattern matched)
            for file in files:
                for pattern in ignoreFilePatterns:
                    if pattern in file:
                        break
                else:
                    # Runs only if the loop above did not break (no file pattern matched)
                    dirpath = os.path.join(dest, root[src_prefix:])
                    try:
                        os.makedirs(dirpath,exist_ok=True)
                    except OSError as e:
                        if onerror is not None:
                            onerror(e)
                    filepath=os.path.join(root,file)
                    shutil.copy(filepath,dirpath)
This requires Python >= 3.2 because of exist_ok in makedirs.
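For comparison, the standard library's shutil.copytree() accepts an ignore callable, and shutil.ignore_patterns() builds one from glob patterns. The sketch below swaps in that technique; it is not equivalent to the substring matching above, because patterns are matched as globs against individual names, and the destination must not exist yet. The wrapper name copy_tree_ignoring is an assumption.

```python
import shutil

def copy_tree_ignoring(src, dest, patterns=(".git", "*.php")):
    """Copy src to dest, skipping names that match any glob in patterns."""
    # ignore_patterns() builds the callable that copytree() expects for ignore=
    return shutil.copytree(src, dest, ignore=shutil.ignore_patterns(*patterns))
```

Directories that match a pattern are skipped together with everything inside them.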
The above methods did not work for me.
So this is what I came up with, as an expansion of my original answer to another question.
What worked for me was:
if (not (str(root) + '/').startswith(tuple(exclude_foldr)))
which builds the directory path and skips it if it starts with any of my listed folders.
This gave me the exact result I was looking for.
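One caveat with a plain prefix test is that a sibling folder whose name merely begins with an excluded name (for example GitHub2 next to GitHub) would also be skipped. A hedged sketch of a stricter check, appending a trailing separator to both sides before comparing; is_excluded is a hypothetical helper, not part of the script.

```python
import os

def is_excluded(root, excluded_dirs):
    """Return True if root is one of excluded_dirs or lies inside one of them."""
    # a trailing separator on both sides stops '/a/GitHub2' matching '/a/GitHub'
    root = os.path.join(os.path.normpath(root), "")
    prefixes = tuple(os.path.join(os.path.normpath(d), "") for d in excluded_dirs)
    return root.startswith(prefixes)
```

str.startswith() accepts a tuple of prefixes, which is what makes the one-line test possible.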
My goal for this was to keep my Mac organized.
I can search any folder by path, locate and move specific file types, ignore subfolders, and prompt the user up front about whether they want to move the files.
NOTE: The prompt appears only once per run, NOT once per file.
By default the prompt answers NO when you hit enter at [y/N], and the script will just list the potential files to be moved.
This is only a snippet of the script on my GitHub; please visit it for the full script.
HINT: Read the script below, as I added comments per line explaining what I did.
#!/usr/bin/env python3
# =============================================================================
# Created On : MAC OSX High Sierra 10.13.6 (17G65)
# Created On : Python 3.7.0
# Created By : Jeromie Kirchoff
# =============================================================================
"""THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
# =============================================================================
from os import walk
from os import path
from shutil import move
import getpass
import click
mac_username = getpass.getuser()
includes_file_extensn = ([".jpg", ".gif", ".png", ".jpeg", ])
search_dir = path.dirname('/Users/' + mac_username + '/Documents/')
target_foldr = path.dirname('/Users/' + mac_username + '/Pictures/Archive/')
exclude_foldr = set([target_foldr,
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/GitHub/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Random/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Stupid_Folder/'),
                     ])
if click.confirm("Would you like to move files?",
                 default=False):
    question_moving = True
else:
    question_moving = False
def organize_files():
    """THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
    # topdown=True required for filtering.
    # "Root" had all info i needed to filter folders not dir...
    for root, dir, files in walk(search_dir, topdown=True):
        for file in files:
            # creating a directory to str and excluding folders that start with
            if (not (str(root) + '/').startswith(tuple(exclude_foldr))):
                # showcase only the file types looking for
                if (file.endswith(tuple(includes_file_extensn))):
                    # using path.normpath as i found an issue with double //
                    # in file paths.
                    filetomove = path.normpath(str(root) + '/' +
                                               str(file))
                    # forward slash required for both to split
                    movingfileto = path.normpath(str(target_foldr) + '/' +
                                                 str(file))
                    # Answering "NO" this only prints the files "TO BE Moved"
                    print('Files To Move: ' + str(filetomove))
                    # This is using the prompt you answered at the beginning
                    if question_moving is True:
                        print('Moving File: ' + str(filetomove) +
                              "\n To:" + str(movingfileto))
                        # This is the command that moves the file
                        move(filetomove, movingfileto)
                        pass
                    # The rest is ignoring explicitly and continuing
                    else:
                        pass
                    pass
                else:
                    pass
            else:
                pass
if __name__ == '__main__':
    organize_files()
Example of running my script from terminal:
$ python3 organize_files.py
Exclude list: {'/Users/jkirchoff/Pictures/Archive', '/Users/jkirchoff/Documents/Stupid_Folder', '/Users/jkirchoff/Documents/Random', '/Users/jkirchoff/Documents/GitHub'}
Files found will be moved to this folder:/Users/jkirchoff/Pictures/Archive
Would you like to move files?
No? This will just list the files.
Yes? This will Move your files to the target folder.
[y/N]:
Example of listing files:
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
...etc
Example of moving files:
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
To: /Users/jkirchoff/Pictures/Archive/1.custom-award-768x512.jpg
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
To: /Users/jkirchoff/Pictures/Archive/10351458_318162838331056_9023492155204267542_n.jpg
...