Using glob to find all zip files recursively in three subfolders - python

I am trying to look only in three specific subfolders and then recursively create a list of all zip files within them. I can easily do this with just one folder, recursively looking through all subfolders within the input path, but other folders get created that we cannot use, and we do not know what those folder names will be. This is where I am at, and I am not sure how to pass three subfolders to glob correctly.
# using glob, create a list of all the zip files in specified sub directories COMM, NMR, and NMH inside of input_path
zip_file = glob.glob(os.path.join(inputpath, "/comm/*.zip,/nmr/*.zip,/nmh/*.zip"), recursive=True)
#print(zip_file)
print(f"Found {len(zip_file)} zip files")

The string with commas in it is ... just a string. If you want to perform three globs, you need something like
zip_file = []
for dir in ("comm", "nmr", "nmh"):
    zip_file.extend(glob.glob(os.path.join(inputpath, dir, "*.zip"), recursive=True))
As noted by @Barmar in the comments, if you want to look for zip files anywhere within these folders, the pattern needs to be os.path.join(inputpath, dir, "**", "*.zip"). If not, perhaps edit your question to provide an example of the structure you want to traverse.
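For reference, a sketch of the same idea wrapped in a helper (find_zips is a hypothetical name), including the recursive ** form for zip files nested deeper inside the three folders:

```python
import glob
import os

def find_zips(inputpath, subdirs=("comm", "nmr", "nmh")):
    # one glob per subfolder; "**" only matches nested directories
    # when recursive=True is passed
    zips = []
    for d in subdirs:
        zips.extend(glob.glob(os.path.join(inputpath, d, "**", "*.zip"), recursive=True))
    return zips
```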

Related

Python: get all the file names in a list

The problem is to get, into a list, all the file names that are under a particular directory and satisfy a particular condition.
We have a directory named "test_dir".
There, we have sub directories "sub_dir_1", "sub_dir_2", and "sub_dir_3",
and inside each sub dir, we have some files.
sub_dir_1 has files ['test.txt', 'test.wav']
sub_dir_2 has files ['test_2.txt', 'test.wav']
sub_dir_3 has files ['test_3.txt', 'test_3.tsv']
What I want to get at the end of the day is a list of the "test.wav" paths that exist under the mother directory: ['sub_dir_1/test.wav', 'sub_dir_2/test.wav']. As you can see, the condition is to get every path of 'test.wav' under the mother directory.
mother_dir_name = "directory"
get_test_wav(mother_dir_name)
returns --> ['sub_dir_1/test.wav', 'sub_dir_2/test.wav']
EDITED
I have changed the direction of the problem.
We first have this list of file names
["sub_dir_1/test.wav","sub_dir_2/test.wav","abc.csv","abc.json","sub_dir_3/test.json"]
from this list I would like to get a list that does not contain any path that contains "test.wav" like below
["abc.csv","abc.json","sub_dir_3/test.json"]
You can use glob patterns for this. Using pathlib,
from pathlib import Path
mother_dir = Path("directory")
list(mother_dir.glob("sub_dir_*/*.wav"))
Notice that I was fairly specific about which subdirectories to check - anything starting with "sub_dir_". You can change that pattern as needed to fit your environment.
Use os.walk():
import os
def get_test_wav(folder):
    found = []
    for root, folders, files in os.walk(folder):
        for file in files:
            if file == "test.wav":
                found.append(os.path.join(root, file))
    return found
Or a list comprehension approach:
import os
def get_test_wav(folder):
    # os.path.join keeps the result portable instead of hard-coding "\\"
    found = [os.path.join(arr[0], "test.wav") for arr in os.walk(folder) if "test.wav" in arr[2]]
    return found
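The edited version of the question (filtering an already-built list of names) needs no filesystem access at all; a plain list comprehension is enough:

```python
names = ["sub_dir_1/test.wav", "sub_dir_2/test.wav", "abc.csv", "abc.json", "sub_dir_3/test.json"]
# keep every entry whose path does not contain "test.wav"
kept = [n for n in names if "test.wav" not in n]
print(kept)  # ['abc.csv', 'abc.json', 'sub_dir_3/test.json']
```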
I think this might help you: How can I search sub-folders using glob.glob module?
The main way to make a list of files in a folder (to make it callable later) is:
file_path = os.path.join(motherdirectory, 'subdirectory')
list_files = glob.glob(file_path + "/*.wav")
just check that link to see how you can join all sub-directories in a folder.
This will also give you all the files in sub directories that end with .wav:
os.chdir(motherdirectory)
glob.glob('**/*.wav', recursive=True)

python3 - filter os.walk subdirectories and retrieve file name+paths

I need help getting a list of file names and locations within specific sub-directories. My directory is structured as follows:
C:\folder
-->\2014
----->\14-0023
-------->\(folders with files inside)
----->\CLOSED
-------->\14-0055!
----------->\(folders with files inside)
-->\2015
----->\15-0025
-------->\(folders with files inside)
----->\CLOSED
-------->\15-0017!
----------->\(folders with files inside)
I would like to get a list of files and their paths ONLY if they are within CLOSED.
I have tried writing multiple scripts and search questions on SO, but have not been able to come up with something to retrieve the list I want. While there seems to be questions related to my trouble, such as Filtering os.walk() dirs and files , they don't quite have the same requirements as I do and I've thus far failed to adapt code I've found on SO for my purpose.
For example, here's some sample code from another SO thread I found that I tried to adapt for my purpose.
l = []
include_prefixes = ['CLOSED']
for dir, dirs, files in os.walk(path2, topdown=True):
    dirs[:] = [d for d in dirs if d in include_prefixes]
    for file in files:
        l.append(os.path.join(dir, file))
^the above got me an empty list...
After a few more failures, I thought to just get a list of the correct folder paths and make another script to iterate within THOSE folders.
l = []
regex = re.compile(r'\d{2}-\d{4}', re.IGNORECASE)
for root, subFolders, files in os.walk(path2):
    try:
        subFolders.remove(regex)
    except ValueError:
        pass
    for subFolder in subFolders:
        l.append(os.path.join(root, subFolder))
^Yet I still failed and just got all the file paths in the directory. No matter what I do, I can't seem to force os.walk to (a) remove specific subdirs from its list of subdirs and then (b) loop through those subdirs to get the file names and paths I need.
What should I fix in my example code? Or is there entirely different code that I should consider?
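One way to express the requirement, sketched here with hypothetical helper names: instead of pruning dirs, walk everything and keep a file only when CLOSED appears as a component of its directory path.

```python
import os

def is_under_closed(root):
    # True when "CLOSED" is one of the path components
    return "CLOSED" in root.split(os.sep)

def files_under_closed(top):
    found = []
    for root, dirs, files in os.walk(top):
        if is_under_closed(root):
            found.extend(os.path.join(root, f) for f in files)
    return found
```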

Unzip files using Python to one folder

I want to unzip files with Python 2.7.8. When I try to extract zip files that contain files with the same names into one folder, some files get lost because of the duplicate names. I tried this:
import zipfile, fnmatch, os
rootPath = r"C:\zip"
pattern = '*.zip'
outpath = r"C:\Project\new"
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        zipfile.ZipFile(os.path.join(root, filename)).extractall(outpath)
UPDATE:
I am trying to extract all the files located inside the zip files into one folder only, without creating new subfolders. If there are files with the same name, I need all of them.
The ZipFile.extractall() method simply extracts the files and stores them one by one in the target path. If you want to preserve files with duplicated names, you will have to iterate over the members using ZipFile.namelist() and take appropriate action when you detect duplicates. ZipFile.read() lets you read a file's contents, and then you can write them wherever (and with whatever name) you want.
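A sketch of that approach (unique_name and extract_flat are hypothetical names), renaming duplicate basenames as name_1.ext, name_2.ext, and so on:

```python
import os
import zipfile

def unique_name(name, seen):
    # return name unchanged the first time, then name_1.ext, name_2.ext, ...
    base, ext = os.path.splitext(name)
    n = seen.get(name, 0)
    seen[name] = n + 1
    return name if n == 0 else f"{base}_{n}{ext}"

def extract_flat(zip_path, outpath, seen):
    # flatten one archive into outpath, renaming duplicates via `seen`
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            name = os.path.basename(member)
            if not name:  # directory entry
                continue
            with open(os.path.join(outpath, unique_name(name, seen)), "wb") as out:
                out.write(zf.read(member))
```

Passing one shared seen dict across all the archives found by the os.walk loop keeps names unique across every zip file, not just within one.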

Unable to use getsize method with os.walk() returned files

I am trying to make a small program that looks through a directory; since I want to find all the files in its subdirectories recursively, I use os.walk().
Here is my code:
import os
import os.path
filesList=[]
path = "C:\\Users\Robin\Documents"
for (root, dirs, files) in os.walk(path):
    for file in files:
        filesList += file
Then I try to use the os.path.getsize() method on elements of filesList, but it doesn't work.
Indeed, I realized that this code fills the list filesList with characters. I don't know what to do; I have tried several other things, such as:
for (root, dirs, files) in os.walk(path):
    filesList += [file for file in os.listdir(root) if os.path.isfile(file)]
This does give me files, but only one, which isn't even visible when looking in the directory.
Can someone explain to me how to obtain files we can actually work with (that is, get their size, hash them, or modify them) using os.walk?
I am new to Python, and I don't really understand how to use os.walk().
The issue I suspect you're running into is that file contains only the filename itself, not any directories you have to navigate through from your starting folder. You should use os.path.join to combine the file name with the folder it is in, which is the root value yielded by os.walk:
for (root, dirs, files) in os.walk(path):
    for file in files:
        filesList.append(os.path.join(root, file))
Now all the filenames in filesList will be acceptable to os.path.getsize and other functions (like open).
I also fixed a secondary issue, which is that your use of += to extend a list wouldn't work the way you intended. You'd need to wrap the new file path in a list for that to work. Using append is more appropriate for adding a single value to the end of a list.
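As a quick check that the joined paths really do work with os.path.getsize, here is a small sketch (total_size is a hypothetical name) that sums file sizes over a tree:

```python
import os

def total_size(path):
    # sum the sizes of every file under path, recursively
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```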
If you want to get a list of files including path use:
for (root, dirs, files) in os.walk(path):
    fullpaths = [os.path.join(root, fil) for fil in files]
    filesList += fullpaths

Search directory for a directory with certain files?

I'd like to search a folder recursively for folders containing files named "x.txt" and "y.txt". For example, given /path/to/folder, if /path/to/folder/one/two/three/four/x.txt and /path/to/folder/one/two/three/four/y.txt exist, it should return a list with the item "/path/to/folder/one/two/three/four". If multiple folders within the given folder satisfy the conditions, it should list them all. Could this be done with a simple loop, or is it more complex?
os.walk does the hard work of recursively iterating over a directory structure for you:
import os
find = ['x.txt', 'y.txt']
found_dirs = []
for root, dirs, files in os.walk('/path/to/folder'):
    if all(filename in files for filename in find):
        found_dirs.append(root)
#found_dirs now contains all of the directories which matched
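An equivalent sketch with pathlib (dirs_with_both is a hypothetical name), keeping only the directories that contain every required file:

```python
from pathlib import Path

def dirs_with_both(top, names=("x.txt", "y.txt")):
    # check top itself plus every directory below it
    top = Path(top)
    candidates = [top] + [p for p in top.rglob("*") if p.is_dir()]
    return [str(p) for p in candidates if all((p / n).is_file() for n in names)]
```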
