Wild card file path comparison

Wild card file path comparison - python

I have a folder with a number of sub folders:
C:/Users/stacey/WorkDocs/port_a\port_1m
C:/Users/stacey/WorkDocs/port_a\job_lists
C:/Users/stacey/WorkDocs/port_a\job_lists_bu
C:/Users/stacey/WorkDocs/port_a\job_lists_bu2
C:/Users/stacey/WorkDocs/port_a\roll_185_oe_2018-09-07
C:/Users/stacey/WorkDocs/port_a\roll_186_oe_2018-09-14
C:/Users/stacey/WorkDocs/port_a\roll_187_oe_2018-09-21
C:/Users/stacey/WorkDocs/port_a\roll_4_oe_2015-03-20
C:/Users/stacey/WorkDocs/port_a\roll_5_oe_2015-03-27
C:/Users/stacey/WorkDocs/port_a\roll_6_oe_2015-04-03
If the final folder in the path starts with 'roll' I would like to then interrogate these folders. I am trying to find the folders using the following:
def main():
    folder = 'C:/Users/stacey/WorkDocs/port_a\'
    for dirname, dirs, files in os.walk(folder):
        if dirname == folder+'\roll_*':
            print('dirname')
So the current expected output would look like:
C:/Users/stacey/WorkDocs/port_a\roll_185_oe_2018-09-07
C:/Users/stacey/WorkDocs/port_a\roll_186_oe_2018-09-14
C:/Users/stacey/WorkDocs/port_a\roll_187_oe_2018-09-21
C:/Users/stacey/WorkDocs/port_a\roll_4_oe_2015-03-20
C:/Users/stacey/WorkDocs/port_a\roll_5_oe_2015-03-27
C:/Users/stacey/WorkDocs/port_a\roll_6_oe_2015-04-03
However I am not getting any output. I think maybe I've set the wildcard comparison up incorrectly but am not sure. How can I return the desired output?

Two issues. First, use a raw string whenever the path contains a backslash, and drop the trailing backslash, because a raw string literal cannot end with a single backslash:
folder = r'C:/Users/stacey/WorkDocs/port_a'
Second, == compares the two strings literally; Python does not expand glob wildcards such as * in a comparison. Since you are doing a prefix match, just check the prefix (and print the variable dirname, not the string 'dirname'):
for dirname, dirs, files in os.walk(folder):
    if dirname.startswith(folder+r'\roll_'):
        print(dirname)
Alternatively, the glob module in Python can do the whole loop for you:
import glob
for dirname in glob.glob(folder+r'\roll_*'):
    print(dirname)
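If you only want the folders directly under port_a, and you would rather not hard-code the path separator, here is a minimal sketch using os.scandir and fnmatch. The function name find_matching_dirs and its default pattern are my own choices for illustration; they do not come from the question.

```python
import fnmatch
import os


def find_matching_dirs(parent, pattern="roll_*"):
    """Return sorted paths of the direct subfolders of `parent` whose names match `pattern`."""
    matches = []
    with os.scandir(parent) as entries:
        for entry in entries:
            # Only folders count; a file called roll_notes.txt is skipped.
            if entry.is_dir() and fnmatch.fnmatch(entry.name, pattern):
                matches.append(entry.path)
    return sorted(matches)
```

Calling find_matching_dirs(folder) and printing each item should list only the roll_ folders, since the wildcard is matched against the folder name rather than the whole path.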

Related

Python get all the file name in a list

The problem is to get all the file names in a list that are under a particular directory and in a particular condition.
We have a directory named "test_dir".
There, we have sub directory "sub_dir_1", "sub_dir_2", "sub_dir_3"
and inside of each sub dir, we have some files.
sub_dir_1 has files ['test.txt', 'test.wav']
sub_dir_2 has files ['test_2.txt', 'test.wav']
sub_dir_3 has files ['test_3.txt', 'test_3.tsv']
What I want at the end of the day is a list of the "test.wav" paths that exist under the "directory": ['sub_dir_1/test.wav', 'sub_dir_2/test.wav']. As you can see, the condition is to get every path of 'test.wav' under the mother directory.
mother_dir_name = "directory"
get_test_wav(mother_dir_name)
returns --> ['sub_dir_1/test.wav', 'sub_dir_2/test.wav']
EDITED
I have changed the direction of the problem.
We first have this list of file names
["sub_dir_1/test.wav","sub_dir_2/test.wav","abc.csv","abc.json","sub_dir_3/test.json"]
From this list I would like to get a list that excludes every path containing "test.wav", like below:
["abc.csv","abc.json","sub_dir_3/test.json"]

You can use glob patterns for this. Using pathlib,
from pathlib import Path
mother_dir = Path("directory")
list(mother_dir.glob("sub_dir_*/*.wav"))
Notice that I was fairly specific about which subdirectories to check - anything starting with "sub_dir_". You can change that pattern as needed to fit your environment.
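To get exactly the output the question asks for (only test.wav, with paths relative to the mother directory), something along these lines might work. This is a sketch that assumes the files sit exactly one level below the mother directory; the filename parameter is my addition.

```python
from pathlib import Path


def get_test_wav(mother_dir_name, filename="test.wav"):
    """Return paths of `filename` one level below the mother directory, relative to it."""
    mother_dir = Path(mother_dir_name)
    return sorted(
        # as_posix() keeps forward slashes, matching the format in the question
        p.relative_to(mother_dir).as_posix()
        for p in mother_dir.glob(f"*/{filename}")
    )
```

If the files can be nested deeper, replacing the "*/" part of the pattern with "**/" makes the search recursive.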

Use os.walk():
import os
def get_test_wav(folder):
    found = []
    for root, folders, files in os.walk(folder):
        for file in files:
            if file == "test.wav":
                found.append(os.path.join(root, file))
    return found
Or a list comprehension approach:
import os
def get_test_wav(folder):
    # os.walk yields (root, dirs, files); keep the roots whose files include test.wav
    found = [os.path.join(root, "test.wav") for root, dirs, files in os.walk(folder) if "test.wav" in files]
    return found

I think this might help you: "How can I search sub-folders using glob.glob module?"
The main way to make a list of the files in a folder (so you can use it later) is:
import glob
import os
file_path = os.path.join(motherdirectory, 'subdirectory')
list_files = glob.glob(file_path + "/*.wav")
Check that link to see how to cover all the sub-directories of a folder.
This will also give you every file in the sub-directories whose name ends in .wav:
os.chdir(motherdirectory)
glob.glob('**/*.wav', recursive=True)
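For the edited version of the question (filtering a list you already have, rather than searching a directory), a list comprehension is probably enough. A minimal sketch, assuming the entries use forward slashes as in the question; the helper name drop_named_file is hypothetical.

```python
from pathlib import PurePosixPath


def drop_named_file(paths, filename="test.wav"):
    """Return the entries of `paths` whose final component is not `filename`."""
    return [p for p in paths if PurePosixPath(p).name != filename]


file_names = ["sub_dir_1/test.wav", "sub_dir_2/test.wav", "abc.csv", "abc.json", "sub_dir_3/test.json"]
print(drop_named_file(file_names))
# prints ['abc.csv', 'abc.json', 'sub_dir_3/test.json']
```

Comparing the final path component, instead of testing "test.wav" in p, avoids accidentally dropping a file such as my_test.wav.bak.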

Can't get absolute path in Python

I've tried to use os.path.abspath(file) as well as Path.absolute(file) to get the paths of .png files I'm working on that are on a separate drive from the project folder that the code is in. The result from the following script is "Project Folder for the code/filename.png", whereas obviously what I need is the path to the folder that the .png is in;
for root, dirs, files in os.walk(newpath):
    for file in files:
        if not file.startswith("."):
            if file.endswith(".png"):
                number, scansize, letter = file.split("-")
                filepath = os.path.abspath(file)
                # replace weird backslash effects
                correctedpath = filepath.replace(os.sep, "/")
                newentry = [number, file, correctedpath]
                textures.append(newentry)
I've read other answers on here that seem to suggest that the project file for the code can't be in the same directory as the folder that is being worked on. But that isn't the case here. Can someone kindly point out what I'm not getting? I need the absolute path because the purpose of the program will be to write the paths for the files into text files.

You could use pathlib.Path.rglob here to recursively get all the pngs:
As a list comprehension:
from pathlib import Path
search_dir = "/path/to/search/dir"
# This creates a list of tuples with `number` and the resolved path
paths = [(p.name.split("-")[0], p.resolve()) for p in Path(search_dir).rglob("*.png")]
Alternatively, you can process them in a loop:
paths = []
for p in Path(search_dir).rglob("*.png"):
    number, scansize, letter = p.name.split("-")
    # more processing ...
    paths.append([number, p.resolve()])

I recently wrote something like what you're looking for.
This code relies on the assumption that your files are at the end of the path; it's not suitable for finding a directory or anything like that.
There's no need for a nested loop.
import os
DIR = "your/full/path/to/directory/containing/desired/files"
def get_file_path(name, template):
    """
    :param name: file name without the extension
    :param template: file's template (txt, html...)
    :return: The path to the given file, or None if it is not found.
    :rtype: str
    """
    substring = f'{name}.{template}'
    for path in os.listdir(DIR):
        full_path = os.path.join(DIR, path)
        if full_path.endswith(substring):
            return full_path

The result from
for root, dirs, files in os.walk(newpath):
is that files contains only the file names, without a directory path. When you pass a bare file name to os.path.abspath, Python resolves it against the current working directory, which here is your project folder. The directory that actually contains each file is root, so use os.path.join to add it to the file name:
filepath = os.path.join(root, file)
If you want to find the png files in subdirectories, the easiest way is to use glob:
import glob
newpath = r'D:\Images'
file_paths = glob.glob(newpath + "/**/*.png", recursive=True)
for file_path in file_paths:
    print(file_path)
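Putting that together with the loop from the question, a corrected version could look roughly like this. It is a sketch: the function name collect_png_paths is made up, and it keeps only the first part of the file name as number, as in the question's code.

```python
import os


def collect_png_paths(top):
    """Return [number, filename, absolute_path] entries for the PNG files below `top`."""
    textures = []
    for root, _dirs, files in os.walk(top):
        for file in files:
            if file.startswith(".") or not file.endswith(".png"):
                continue
            number = file.split("-")[0]
            # Join with root (the folder being visited), not the working directory.
            full_path = os.path.abspath(os.path.join(root, file))
            textures.append([number, file, full_path.replace(os.sep, "/")])
    return textures
```

The only real change from the original loop is that the file name is joined with root before os.path.abspath is called.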

RegEx to find specific file path

I am trying to find the existence of a file testing.txt
The first file exists in: sub/hbc_cube/college/
The second file exists in: sub/hbc/college
However, when searching for where the file exists, I CANNOT assume the string 'hbc', because the name may differ depending on the user. So I am trying to find a way to
PASS if the path is
sub/*_cube/college/
FAIL if the path is
sub/*/college
But I cannot use a glob character (*) because the (*) will count _cube as failing. I am trying to figure out a regular expression that will match only a plain name and not a name with an underscore (hbc_cube, for example).
I have tried using the python regex dictionary but I have not been able to figure out the correct regex to use
file_list = lookupfiles(['testing.txt'], dirlist = ['sub/'])
for file in file_list:
    if str(file).find('_cube/college/'):  # hbc_cube/college
        print("pass")
    if str(file).find('*/college/'):  # hbc/college
        print("fail")
If the file exists in both locations I want only "fail" to print. The problem is the * character is counting hbc_cube.

The glob module is your friend. You don't even need to match against multiple directories; glob will do it for you:
from glob import glob
# the * stands for the user-specific folder (hbc, hbc_cube, ...)
testfiles = glob("sub/*/college/testing.txt")
if len(testfiles) > 0 and all("_cube/" in path for path in testfiles):
    print("Pass")
else:
    print("Fail")
In case it is not obvious, the test all("_cube/" in path for path in testfiles) will take care of this requirement:
If the file exists in both locations I want only "fail" to print. The problem is the * character is counting hbc_cube.
If some of the paths that matched do not contain _cube, the test fails. Since you want to know about files that cause the test to fail, you cannot search solely for files in a path containing *_cube -- you must retrieve both good and bad paths, and inspect them as shown.
Of course you can shorten the above code, or generalize it to construct the globbed path by combining options from a list of folders and a list of files, etc., depending on the particulars of your case.
Note that there are "full regular expressions", provided by the re module, and the simpler "globs" used by the glob module. If you go check the documentation, don't confuse them.
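The same check can be written with pathlib, which avoids depending on whether the OS uses / or \ in the returned paths. This is a sketch under the assumption that the layout is always sub/<user folder>/college/testing.txt; the function name check_cube_only is hypothetical.

```python
from pathlib import Path


def check_cube_only(base="sub", filename="testing.txt"):
    """Return "pass" only if the file is found and every hit is under a *_cube folder."""
    hits = list(Path(base).glob(f"*/college/{filename}"))
    # hit.parent is .../college; hit.parent.parent is the user-specific folder.
    if hits and all(hit.parent.parent.name.endswith("_cube") for hit in hits):
        return "pass"
    return "fail"
```

If the file exists under both hbc_cube and hbc, the all(...) test is false and the function returns "fail", which matches the requirement in the question.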

Use pathlib to parse your path. From the Path object, get the parent (this discards the /college part), then check whether the resulting path string ends with _cube:
from pathlib import Path
file_list = lookupfiles(['testing.txt'], dirlist = ['sub/'])
for file in file_list:
    path = Path(file)
    if str(path.parent).endswith('_cube'):
        print('pass')
    else:
        print('fail')
Edit:
If the file variable in the for loop contains the file name (sub/*_cube/college/testing.txt), just call parent twice on the path: path.parent.parent.
Another approach would be to filter the files inside lookupfiles(), if you have access to that function and can edit it.

The os module is well suited for this:
import os
# This assumes your current working directory has sub in it
for root, dirs, files in os.walk('sub'):
    for file in files:
        if file == 'testing.txt':
            # print the file and the directory it's in
            print(os.path.join(root, file))
os.walk yields a three-element tuple as it iterates: the directory currently being visited (root), the subdirectories in that directory, and the files in that directory. To print the full path, you join root and the file name.
For example, on my machine:
for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith('ipynb'):
            print(os.path.join(root, file))
# prints
/Users/mm92400/Salesforce_Repos/DataExplorationClustersAndTime.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled1.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationExploratory.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled3.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled4.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationUntitled2.ipynb
/Users/mm92400/Salesforce_Repos/DataExplorationClusterAnalysis.ipynb

how do I search/detect all of the directories/sub-directories within a specified path?

I am trying to list all of the sub-directories in a given path (see my code).
However, I am very new to Python and am wondering what the best approach would be.
The code below prints everything in the folder, but I am looking for the directories and sub-directories only.
import os
def main():
    videosDir = os.listdir("D:\TempServer\Videos\Movies")
    for dir in videosDir:
        dirName = "" + dir
        print(dirName)
if __name__ == '__main__':
    main()
Any help would be appreciated.
Thanks!
littlejiver

Instead of os.listdir, use os.walk. It visits every folder under the starting path, so the loop below runs once per folder and you don't need to filter out the files. If you want the file names, each iteration also gives you a list of the files in that folder (and another list of its subfolders).
for folderName, subfolders, filenames in os.walk(r"D:\TempServer\Videos\Movies"):
    print(folderName)
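If you want the sub-directories collected in a list rather than printed (and without the starting folder itself), a small wrapper around os.walk could look like this. The function name list_subdirectories is just an illustrative choice.

```python
import os


def list_subdirectories(top):
    """Return the paths of all directories below `top`, at any depth."""
    found = []
    for folder_name, subfolders, _filenames in os.walk(top):
        # subfolders holds the names of the directories directly inside folder_name
        for sub in subfolders:
            found.append(os.path.join(folder_name, sub))
    return sorted(found)
```

Files never appear in the result because only the subfolders list from each iteration is used.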

Moving into a directory without knowing its name

I'm a noob to Python and I am trying to complete this simple task. I want to access multiple directories, one by one, that are all located inside one directory. I do not have the names of these directories. I need to enter a directory, combine some files, move back out of it, then do the same in the next directory, and so on. I need to make sure I don't access the same directory more than once.
I looked around and tried various commands, with no success yet.

Try something like the following:
import os, fnmatch
def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename
Use it like this:
for filename in find_files('/home/', '*.html'):
    print(filename)  # do something with each file

Sometimes I find glob to be useful:
from glob import glob
import os
nodes = glob('/tmp/*/*')
for node in nodes:
    try:
        print('now in directory {}'.format(os.path.dirname(node)))
        with open(node, 'r') as f:
            # Do something with f...
            print(len(f.read()))
    except OSError:  # Because node may be a directory, which we cannot 'open'
        continue
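To visit each sub-directory exactly once without knowing its name, you do not actually have to change into it; building full paths is usually enough. Below is a rough sketch that assumes "combine some files" means concatenating the text files of each sub-directory into one output file. The names combine_files_per_directory and combined.txt are made up for the example.

```python
import os


def combine_files_per_directory(parent, output_name="combined.txt"):
    """For each direct subdirectory of `parent`, concatenate its files into `output_name`."""
    # os.scandir lists every entry once, so no directory is visited twice.
    with os.scandir(parent) as entries:
        subdirs = sorted(e.path for e in entries if e.is_dir())
    written = []
    for subdir in subdirs:
        names = sorted(
            n for n in os.listdir(subdir)
            if n != output_name and os.path.isfile(os.path.join(subdir, n))
        )
        out_path = os.path.join(subdir, output_name)
        with open(out_path, "w", encoding="utf-8") as out:
            for n in names:
                with open(os.path.join(subdir, n), encoding="utf-8") as src:
                    out.write(src.read())
        written.append(out_path)
    return written
```

The function returns the paths of the combined files it wrote, one per sub-directory.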
