Python: read folders that are not ZIP folders - python

Currently I read folders individually like so:
input_location = r'path\202206'
I would like to read in all the folders at once + not read in the zipped files.
Essentially the logic id like to perform is input_location = r'path\202206' + r'path\202207' + r'path\202207' + r'path\202208'
I cant just do input_location = r'path\ as it may read in those zip files I do not want.
Is there a way to read in the folders without reading in the zip files? Or explicitly list the folder names in one variable (input_location)?

IIUC: Collecting all directories from a directory can be done using a simple list comprehension as follows.
The glob library is nice, as it returns the full filepath, whereas a function such as os.listdir() only returns the filenames.
import os
from glob import glob
dirs = [f for f in glob('/path/to/files/*') if os.path.isdir(f)]
Output:
['/path/to/files/202207',
'/path/to/files/202206',
'/path/to/files/202208',
'/path/to/files/202209']
Then, your script can iterate over the list of directories as required.
For completeness, the directory content is a follows:
202206
202206.zip
202207
202207.zip
202208
202208.zip
202209
202209.zip

If you use pathlib from Pythons standard library you can get all entries in the folder and check if the entry is a folder.
from pathlib import Path
for entry in Path('/path/to/folder').glob('*'):
if entry.is_dir():
print(entry)

A os base approach. Notice that os.listdir returns the content of the directory in a basename form and not as a path.
import os
def my_dirs(wd):
return list(filter(os.path.isdir, (os.path.join(wd, f) for f in os.listdir('.'))))
working_dir = # add path
print(*my_dirs(working_dir), sep='\n')
Remarks: to make your program platform independent you always stuffs like os.path.join or os.sep for path manipulation

Related

Extract a list of files with a certain criteria within subdirectory of zip archive in python

I want to access some .jp2 image files inside a zip file and create a list of their paths. The zip file contains a directory folder named S2A_MSIL2A_20170420T103021_N0204_R108_T32UNB_20170420T103454.SAFE and I am currently reading the files using glob, after having extracted the folder.
I don't want to have to extract the contents of the zip file first. I read that I cannot use glob within a zip directory, nor I can use wildcards to access files within it, so I am wondering what my options are, apart from extracting to a temporary directory.
The way I am currently getting the list is this:
dirr = r'C:\path-to-folder\S2A_MSIL2A_20170420T103021_N0204_R108_T32UNB_20170420T103454.SAFE'
jp2_files = glob.glob(dirr + '/**/IMG_DATA/**/R60m/*B??_??m.jp2', recursive=True)
There are additional different .jp2 files in the directory, for which reason I am using the glob wildcards to filter the ones I need.
I am hoping to make this work so that I can automate it for many different zip directories. Any help is highly appreciated.
I made it work with zipfile and fnmatch
from zipfile import ZipFile
import fnmatch
zip = path_to_zip.zip
with ZipFile(zipaki, 'r') as zipObj:
file_list = zipObj.namelist()
pattern = '*/R60m/*B???60m.jp2'
filtered_list = []
for file in file_list:
if fnmatch.fnmatch(file, pattern):
filtered_list.append(file)

Python get all the file name in a list

The problem is to get all the file names in a list that are under a particular directory and in a particular condition.
We have a directory named "test_dir".
There, we have sub directory "sub_dir_1", "sub_dir_2", "sub_dir_3"
and inside of each sub dir, we have some files.
sub_dir_1 has files ['test.txt', 'test.wav']
sub_dir_2 has files ['test_2.txt', 'test.wav']
sub_dir_2 has files ['test_3.txt', 'test_3.tsv']
What I want to get at the end of the day is a list of of the "test.wav" that exist under the "directory" ['sub_dir_1/test.wav', 'sub_dir_2/test.wav']. As you can see the condition is to get every path of 'test.wav' under the mother directory.
mother_dir_name = "directory"
get_test_wav(mother_dir_name)
returns --> ['sub_dir_1/test.wav', 'sub_dir_2/test.wav']
EDITED
I have changed the direction of the problem.
We first have this list of file names
["sub_dir_1/test.wav","sub_dir_2/test.wav","abc.csv","abc.json","sub_dir_3/test.json"]
from this list I would like to get a list that does not contain any path that contains "test.wav" like below
["abc.csv","abc.json","sub_dir_3/test.json"]
You can use glob patterns for this. Using pathlib,
from pathlib import Path
mother_dir = Path("directory")
list(mother_dir.glob("sub_dir_*/*.wav"))
Notice that I was fairly specific about which subdirectories to check - anything starting with "sub_dir_". You can change that pattern as needed to fit your environment.
Use os.walk():
import os
def get_test_wav(folder):
found = []
for root, folders, files in os.walk(folder):
for file in files:
if file == "test.wav":
found.append(os.path.join(root, file))
return found
Or a list comprehension approach:
import os
def get_test_wav(folder):
found = [f"{arr[0]}\\test.wav" for arr in os.walk(folder) if "test.wav" in arr[2]]
return found
I think this might help you How can I search sub-folders using glob.glob module?
The main way to make a list of files in a folder (to make it callable later) is:
file_path = os.path.join(motherdirectopry, 'subdirectory')
list_files = glob.glob(file_path + "/*.wav")
just check that link to see how you can join all sub-directories in a folder.
This will also give you all the file in sub directories that only has .wav at the end:
os.chdir(motherdirectory)
glob.glob('**/*.wav', recursive=True)

How to run script for all files in a folder/directry

I am new to python. I have successful written a script to search for something within a file using :
open(r"C:\file.txt) and re.search function and all works fine.
Is there a way to do the search function with all files within a folder? Because currently, I have to manually change the file name of my script by open(r"C:\file.txt),open(r"C:\file1.txt),open(r"C:\file2.txt)`, etc.
Thanks.
You can use os.walk to check all the files, as the following:
import os
for root, _, files in os.walk(path):
for filename in files:
with open(os.path.join(root, filename), 'r') as f:
#your code goes here
Explanation:
os.walk returns tuple of (root path, dir names, file names) in the folder, so you can iterate through filenames and open each file by using os.path.join(root, filename) which basically joins the root path with the file name so you can open the file.
Since you're a beginner, I'll give you a simple solution and walk through it.
Import the os module, and use the os.listdir function to create a list of everything in the directory. Then, iterate through the files using a for loop.
Example:
# Importing the os module
import os
# Give the directory you wish to iterate through
my_dir = <your directory - i.e. "C:\Users\bleh\Desktop\files">
# Using os.listdir to create a list of all of the files in dir
dir_list = os.listdir(my_dir)
# Use the for loop to iterate through the list you just created, and open the files
for f in dir_list:
# Whatever you want to do to all of the files
If you need help on the concepts, refer to the following:
for looops in p3: http://www.python-course.eu/python3_for_loop.php
os function Library (this has some cool stuff in it): https://docs.python.org/2/library/os.html
Good luck!
You can use the os.listdir(path) function:
import os
path = '/Users/ricardomartinez/repos/Salary-API'
# List for all files in a given PATH
file_list = os.listdir(path)
# If you want to filter by file type
file_list = [file for file in os.listdir(path) if os.path.splitext(file)[1] == '.py']
# Both cases yo can iterate over the list and apply the operations
# that you have
for file in file_list:
print(file)
#Operations that you want to do over files

Get the list of all the files with ".txt" after changing the current working directory

I changed the current working directory and then I want to list all the files in that directory with extension .txt. How can I achieve this?
# !/usr/bin/python
import os
path = "/home/pawn/Desktop/projects_files"
os.chdir(path)
## checking working directory
print ("working directory "+ os.getcwd()+'\n')
Now how to list all the files in the current directory (projects_files) with extension .txt?
Use glob, which is a wildcard search for files, and you don't need to change the directory either:
import glob
for f in glob.iglob('/home/pawn/Desktop/project_files/*.txt'):
print(f)
You can use glob. With glob you can use wildcards as you would do in your system command line. For example:
import glob
print( glob.glob('*.txt') )
You would need to use glob in python.
It will help you to fetch the files of a particular format..
Usage
>>> import glob
>>> glob.glob('*.gif')
Check the documentation for more details
if you don't want to import glob as suggested in the other good answers here, you can also do it with the same os moudle:
path = "/home/pawn/Desktop/projects_files"
[f for f in os.listdir(path) if f.endswith('.txt')]

Renaming a set of file in separate folder in python

Hi guys have a problem with renaming a set of files in separate folder in python. i have a folder structure like
images/p_0/aaa.jpg,bbb.jpg
images/p_1/aaa.jpg,bbb.jpg
images/p_2/aaa.jpg,bbb.jpg
I need to rename these jpg to like
images/p_0/A-1.jpg,B-1.jpg
images/p_1/A-2.jpg,B-2.jpg
images/p_2/A-3.jpg,B-3.jpg
and i am using the os.rename method. I just don't get how to cycle through the folders and find the image and rename it.
Use os.walk and os.rename.
import os
for root, dirs, files in os.walk(top, topdown=False):
for name in files:
#Rename your files and use os.path.join(root, name)
You can use os.listdir() to get all the files in some directory. You might also want to look through the rest of the os module documentation.
Take a look at os.listdir or os.walk, if you have a nested directory structure.
#!/usr/bin/env python
import os
# list all files
for filename in os.listdir('.'):
print filename # do something with filename ...
def is_image(filename):
""" Tiny filter function. """
return filename.endswith(".jpg") or filename.endswith(".png")
# only consider files, which pass the `is_image` filter
for filename in filter(is_image, os.listdir('.')):
print filename # do something with filename ...
In addition to os.walk(), glob.glob() might be useful.

Categories