Obtain paths of folders not containing a pdf file - python

I want to find all the folders that do not contain a PDF document.
This is what I have tried so far:
import os
path = 'T:/Projects/'
paths_not_containing_pdfs = []
for root, dirs, files in os.walk(path):
    for name in files:
        if not name.endswith('.pdf'):
            paths_not_containing_pdfs.append(root)
However, this code doesn't work. It returns the path of almost every folder, not only the ones without PDFs.
Any help?

Your code
for root, dirs, files in os.walk(path):
    for name in files:
        if not name.endswith('.pdf'):
            paths_not_containing_pdfs.append(root)
finds all folders that contain at least one non-PDF file (and adds each such folder once per non-PDF file). To find the folders that contain no PDF at all, you can use any as follows:
for root, dirs, files in os.walk(path):
    if not any(name.endswith('.pdf') for name in files):
        paths_not_containing_pdfs.append(root)
Note that I used a generator expression: name.endswith('.pdf') for name in files produces True for each .pdf file and False for every other file, and any checks whether at least one True appears.
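A slightly more defensive sketch wraps the any() check in a function. It assumes PDF names may use any letter case (for example report.PDF), so it lowercases each name before testing; the function name folders_without_pdfs is hypothetical. Like the answer above, it only looks at the files directly inside each folder, so a folder whose PDFs all live in subfolders is still reported.

```python
import os


def folders_without_pdfs(top):
    """Return every directory under top (including top) with no .pdf file directly inside it."""
    result = []
    for root, dirs, files in os.walk(top):
        # Lowercase first so that 'Report.PDF' also counts as a PDF (assumption)
        if not any(name.lower().endswith('.pdf') for name in files):
            result.append(root)
    return result


# Example: print(folders_without_pdfs('T:/Projects/'))
```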

Moving all .txt files in a directory to a new folder with os.walk()?

import os
import shutil
folder = 'c:\\Users\\myname\\documents'
for folderNames, subfolders, filenames in os.walk(folder):
    for file in filenames:
        if file.endswith('.txt'):
            shutil.copy(?????, 'c:\\Users\\myname\\documents\\putfileshere\\' + file)
This was simple to do for all .txt files in a single folder with os.listdir, but with os.walk I don't know how to get the full path of each file that ends in .txt, since it could be in any number of subfolders.
I'm not sure if I'm using the correct terminology for "directory", but to be clearer: I want to move all .txt files to the new folder even if they are 1, 2 or 3 subfolders deep inside the documents folder.
To get the full path, you have to combine the root and filename parts. root is the path of the directory that contains the enumerated file names.
for root, _, filenames in os.walk(folder):
    for filename in filenames:
        if filename.endswith('.txt'):
            file_path = os.path.join(root, filename)
            shutil.copy(file_path, ...)
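You could also use glob.glob(pathname); since Python 3.5 it accepts recursive=True, which lets a ** pattern match any number of subfolders.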
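Putting the os.walk pieces together, here is a sketch of the whole copy job. It assumes the destination folder sits inside the tree being walked (as putfileshere does in the question), so it removes that folder from dirs to stop os.walk from descending into it; otherwise shutil.copy would eventually try to copy files onto themselves. The function name copy_txt_files is hypothetical, and files with the same name in different subfolders overwrite each other.

```python
import os
import shutil


def copy_txt_files(folder, dest):
    """Copy every .txt file found anywhere under folder into dest."""
    os.makedirs(dest, exist_ok=True)
    dest_key = os.path.normcase(os.path.abspath(dest))
    for root, dirs, filenames in os.walk(folder):
        # Prune the destination folder in place so os.walk never enters it
        dirs[:] = [name for name in dirs
                   if os.path.normcase(os.path.abspath(os.path.join(root, name))) != dest_key]
        for filename in filenames:
            if filename.endswith('.txt'):
                file_path = os.path.join(root, filename)
                shutil.copy(file_path, os.path.join(dest, filename))


# Example: copy_txt_files('c:\\Users\\myname\\documents',
#                         'c:\\Users\\myname\\documents\\putfileshere')
```

To move the files instead of copying them, replace shutil.copy with shutil.move.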

Reading all files that start with a certain string in a directory

Say I have a directory.
In this directory there are single files as well as folders.
Some of those folders could also have subfolders, etc.
What I am trying to do is find all of the files in this directory that start with "Incidences" and read each csv into a pandas data frame.
I am able to loop through all the files and get the names, but cannot read them into data frames.
I get an error that "___.csv" does not exist, because the file might not be directly in the directory but in a folder inside another folder of that directory.
Here is the code I have been trying:
inc_files2 = []
pop_files2 = []
for root, dirs, files in os.walk(directory):
    for f in files:
        if f.startswith('Incidence'):
            inc_files2.append(f)
        elif f.startswith('Population Count'):
            pop_files2.append(f)
for file in inc_files2:
    inc_frames2 = map(pd.read_csv, inc_files2)
for file in pop_files2:
    pop_frames2 = map(pd.read_csv, pop_files2)
You are adding only file name to the lists, not their path. You can use something like this to add paths instead:
inc_files2.append(os.path.join(root, f))
You have to add the path of the directory you are currently in (root).
Append the entire pathname, not just the bare filename, to inc_files2.
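os.path.abspath(f) on its own is not enough here: it resolves the bare name f against the current working directory, not against the folder os.walk found it in. Join it with root first, then make it absolute.
You can do this with the following changes to your code:
for root, dirs, files in os.walk(directory):
    for f in files:
        # Build the path from the folder os.walk is visiting, not the CWD
        f_abs = os.path.abspath(os.path.join(root, f))
        if f.startswith('Incidence'):
            inc_files2.append(f_abs)
        elif f.startswith('Population Count'):
            pop_files2.append(f_abs)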
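A sketch of the whole collection step, assuming the file name prefixes from the question. It stores full paths via os.path.join(root, f) and returns them grouped by prefix; the helper name collect_by_prefix is hypothetical. In Python 3, map returns a lazy iterator, so the loops around map in the question are not needed; a list comprehension builds the frames directly.

```python
import os


def collect_by_prefix(directory, prefixes):
    """Map each prefix to the full paths of files under directory whose names start with it."""
    found = {prefix: [] for prefix in prefixes}
    for root, dirs, files in os.walk(directory):
        for f in files:
            for prefix in prefixes:
                if f.startswith(prefix):
                    # Keep the full path, not just the bare file name
                    found[prefix].append(os.path.join(root, f))
                    break  # each file goes into at most one list
    return found
```

With pandas imported as pd, the frames can then be built with paths = collect_by_prefix(directory, ['Incidence', 'Population Count']) followed by inc_frames2 = [pd.read_csv(p) for p in paths['Incidence']].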

How to search the entire HDD for all pdf files?

As the title suggests, I would like to get Python 3.5 to search my root ('C:\') for PDF files and then move those files to a specific folder.
This task splits easily into two parts:
1. Search my root for files with the pdf extension.
2. Move those to a specific folder.
Now, I know how to search for a specific file name, but not for multiple files that have a specific extension.
import os
print('Welcome to the Walker Module.')
print('find(name, path) or find_all(name, path)')
def find(name, path):
    for root, dirs, files in os.walk(path):
        print('Searching for files...')
        if name in files:
            return os.path.join(root, name)
def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        print('Searching for files...')
        if name in files:
            result.append(os.path.join(root, name))
    return result
This little program will find me either the 1st or all locations of a specific file.
However, I can't modify this to search for PDF files because I lack knowledge of Python and programming in general.
I would love some insight on where to go from here.
To sum it up,
Search the root for all pdf files.
Move those files into a specific location. Let's say 'G:\Books'.
Thanks in advance.
Your find_all function is very close to the final result.
When you loop through the files, you can check their extension with os.path.splitext, and if they are .pdf files you can move them with shutil.move.
Here's an example that walks the tree of a source directory, checks the extension of every file and, in case of match, moves the files to a destination directory:
import os
import shutil
def move_all_ext(extension, source_root, dest_dir):
    # Recursively walk source_root
    for (dirpath, dirnames, filenames) in os.walk(source_root):
        # Loop through the files in current dirpath
        for filename in filenames:
            # Check file extension
            if os.path.splitext(filename)[-1] == extension:
                # Move file
                shutil.move(os.path.join(dirpath, filename), os.path.join(dest_dir, filename))
# Move all pdf files from C:\ to G:\Books
move_all_ext(".pdf", "C:\\", "G:\\Books")
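One case the example above does not handle is two PDFs with the same name in different folders: the later one may overwrite the earlier one in G:\Books. A sketch that keeps both by appending a counter, and matches the extension case-insensitively. Both behaviours are assumptions, move_all_ext_safe and unique_dest are hypothetical names, and dest_dir is assumed to lie outside source_root.

```python
import os
import shutil


def unique_dest(dest_dir, filename):
    """Return a path in dest_dir for filename that does not exist yet."""
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(dest_dir, filename)
    counter = 1
    while os.path.exists(candidate):
        # book.pdf -> book_1.pdf, book_2.pdf, ...
        candidate = os.path.join(dest_dir, '{}_{}{}'.format(base, counter, ext))
        counter += 1
    return candidate


def move_all_ext_safe(extension, source_root, dest_dir):
    """Move files with the given extension (any letter case) from source_root into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() == extension.lower():
                target = unique_dest(dest_dir, filename)
                shutil.move(os.path.join(dirpath, filename), target)
                moved.append(target)
    return moved


# Example: move_all_ext_safe(".pdf", "C:\\", "G:\\Books")
```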
From Python 3.5 onwards you can use glob, which supports a recursive search:
If recursive is true, the pattern “**” will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.
Therefore you can use it like this:
import glob
from os import path
import shutil
def searchandmove(wild, srcpath, destpath):
    search = path.join(srcpath, '**', wild)
    for fpath in glob.iglob(search, recursive=True):
        print(fpath)
        dest = path.join(destpath, path.basename(fpath))
        shutil.move(fpath, dest)
searchandmove('*.pdf', 'C:\\', 'G:\\Books')
This needs a minimum of string wrangling. For large searches, however, such as from the root of a filesystem, it can take a while, but any approach would likely have this issue.
Tested only on Linux, but it should work fine on Windows. Whatever you pass as destpath must already exist.
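Since Python 3.4 the pathlib module offers the same recursive search through Path.rglob, which behaves like glob with '**/' prepended to the pattern. A sketch of the same move that creates destpath if it is missing, so it does not have to exist beforehand; search_and_move_pathlib is a hypothetical name. It collects all matches before moving anything, so the moves cannot interfere with the ongoing search.

```python
import shutil
from pathlib import Path


def search_and_move_pathlib(pattern, srcpath, destpath):
    """Move every file under srcpath matching pattern (e.g. '*.pdf') into destpath."""
    dest = Path(destpath)
    dest.mkdir(parents=True, exist_ok=True)  # create destpath if it does not exist yet
    # Materialize the matches first, then move them
    matches = [p for p in Path(srcpath).rglob(pattern) if p.is_file()]
    for p in matches:
        shutil.move(str(p), str(dest / p.name))
    return len(matches)


# Example: search_and_move_pathlib('*.pdf', 'C:\\', 'G:\\Books')
```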

iterating through folders and from each use one specific file in a method python

What I want to do is iterate through folders in a directory and in each folder find a file 'fileX' which I want to give to a method which itself needs the file name as a parameter to open it and get a specific value from it. So 'method' will extract some value from 'fileX' (the file name is the same in every folder).
My code looks something like this, but I am always told that the file I want doesn't exist, which is not the case:
import os
import xy
rootdir = r'path'
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(fileX)
        print gain
Also, the folders I am iterating through are named 'folderX0', 'folderX1', ..., 'folderX99', meaning they all share the same name with an increasing number at the end. It would be nice if I could tell the program to ignore any other folder that might be in 'path'.
Thanks for the help!
os.walk returns file and directory names relative to the root directory that it gives. You can combine them with os.path.join:
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)
See the documentation for os.walk for details:
To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
To make it ignore every folder except those named folderX followed by a number, you could do something like the following. When os.walk runs top-down (the default), you can remove items from the dirs list in place to prevent os.walk from looking in those directories.
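import re
for root, dirs, files in os.walk(rootdir):
    # Rebuild dirs in place (slice assignment); calling dirs.remove() while
    # looping over dirs would skip the entry after each removed one
    dirs[:] = [d for d in dirs if re.match(r'folderX[0-9]+$', d)]
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)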
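Combining both answers, here is a sketch that only descends into folderX<number> directories and calls the method only on the file with the target name in each of them, rather than on every file. The name apply_to_named_file, the regular expression, and the func parameter are placeholders standing in for the question's 'fileX' and xy.method, which are not shown.

```python
import os
import re

# Hypothetical pattern matching folderX0 ... folderX99 from the question
FOLDER_RE = re.compile(r'folderX[0-9]+$')


def apply_to_named_file(rootdir, target, func):
    """Call func(full_path) on every file called target inside folderX<n> folders."""
    results = {}
    for root, dirs, files in os.walk(rootdir):
        # Only descend into folders named like folderX0 ... folderX99
        dirs[:] = [d for d in dirs if FOLDER_RE.match(d)]
        # Skip rootdir itself; only the numbered folders are of interest
        if root != rootdir and target in files:
            results[root] = func(os.path.join(root, target))
    return results


# Example: gains = apply_to_named_file(r'path', 'fileX', xy.method)
```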

os.walk on non C drive directory

I know there are several posts that touch on this, but I haven't found one that works for me yet. I need to create a list of files with an .mxd extension by searching an entire mapped directory. I used this code and it works:
import os
file_list = []
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))
However, it only works on the C drive. I need to be able to search for these files on a mapped drive of my choosing. Here's the script I'm using, but it doesn't work. I know there is an mxd file in the subdirectory of this drive, but it isn't being reported in the file list. In fact, the file list is totally empty, and it shouldn't be.
import os
path = r"U:/TEST/"
filenamelist = []
for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
Does anyone see anything wrong in my second block of code that would prevent it from iterating through subdirectories at the given path and reporting back files with an .mxd extension?
os.walk(path) yields a 3-tuple which is commonly unpacked as root, dirs, files. So instead of
for files in os.walk(path):
    ...
use
for root, dirs, files in os.walk(path):
    for filename in files:
        if filename.endswith(".mxd"):
            filenamelist.append(os.path.join(root, filename))
Does anyone see anything wrong in my second block of code that would prevent it from iterating through subdirectories at the given path and reporting back files with an .mxd extension?
Yes. Compare and contrast your two loops:
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))

for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
You're trying to do the exact same thing, but not using even remotely the same code.
In the first one, you loop over the walk, storing each tuple in (paths, dirs, files), then loop over files.
In the second one, you loop over the walk, storing each tuple in files, don't loop over anything, and then just use some variable named file left over from some earlier code.
And then, even if that part worked, you end up appending files (which, remember, is a tuple of three lists) rather than file—or, probably better, os.path.join(paths, file)—to the list.
Just make the second one look like the first. Or, better, put it in a function and call it twice, instead of copying and pasting it.
Here's the final script that worked for crawling on a drive root other than C:
import os
file_list = []
# r'U:\\' would hold two backslashes; a raw string cannot end in one, so escape it
path = 'U:\\'
for (dirpath, subdirs, files) in os.walk(path):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(dirpath, file))
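Following the advice to put the loop in a function instead of copying it, here is a sketch that can be called once per drive; find_files_with_ext is a hypothetical name. It also exposes os.walk's onerror parameter. By default os.walk silently ignores errors from listing a directory, which is one way to end up with a completely empty list on a mapped drive, so passing onerror=print reports problems such as an unreachable path.

```python
import os


def find_files_with_ext(top, ext, onerror=None):
    """Return full paths of all files under top whose names end with ext (any letter case)."""
    file_list = []
    for dirpath, subdirs, files in os.walk(top, onerror=onerror):
        for file in files:
            if file.lower().endswith(ext.lower()):
                file_list.append(os.path.join(dirpath, file))
    return file_list


# Example calls (drive letters from the question):
# c_files = find_files_with_ext('C:\\', '.mxd')
# u_files = find_files_with_ext('U:\\', '.mxd', onerror=print)
```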
