How to load images from different folders and subfolders in Python

I am developing a CNN on an animal classification dataset. The images are separated into 2 folders, and those folders contain further subfolders; there are four levels in this structure. I now want to load the images and convert them to n-dimensional arrays to feed to TensorFlow. The names of these folders are the labels.
I hope that someone can help me with some concrete codes or some useful materials.
Thank you very much in advance!
Here I will give some examples:
Anisopleura Libellulidae Leach, 1815 Trithemis aurora
Zygoptera Calopterygidae Selys, 1850 Calopteryx splendens
aurora and splendens are the labels for this problem; they are also the names of the fifth-level subfolders in which the images are stored.
C:\Users\Seth\Desktop\dragonfly\Anisopleura\Libellulidae Leach, 1815\Pseudothemis\zonata
This is an example path.

I use the openface library for face recognition. In this library, iterImgs is a method that gives you a list of all images under a directory.
For details, see iterImgs.
from openface.data import iterImgs

imgs = list(iterImgs("Directory path"))
print(imgs)  # prints all images under the directory path, including subfolders
Another way is to define a list of valid extensions:
import os

valid_ext = [".jpg", ".png"]
f_list = []

def Test2(rootDir):
    for entry in os.listdir(rootDir):
        path = os.path.join(rootDir, entry)
        filename, file_extension = os.path.splitext(path)
        if file_extension in valid_ext:
            print(path)
            f_list.append(path)
        if os.path.isdir(path):
            Test2(path)

Test2("/home/")
print(f_list)

os.walk() is what you are looking for.
import os
# traverse root directory, and list directories as dirs and files as files
for root, dirs, files in os.walk("."):
path = root.split(os.sep)
print((len(path) - 1) * '---', os.path.basename(root))
for file in files:
print(len(path) * '---', file)
This code lets you recursively traverse all folders and subfolders. You get the name of each subfolder (the labels in your case) and all of its files in the file variable.
Your next step is then to build a dictionary (or a multi-dimensional NumPy array) that stores, for each label (i.e. subfolder), the features of your images.
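Such a label dictionary can be sketched like this (a minimal example: the helper name, the extension list, and the assumption that the immediate parent folder is the label are all mine, not from the question):

```python
import os
from collections import defaultdict

def label_to_files(root, exts=(".jpg", ".png")):
    """Map each deepest folder name (treated as the label) to its image paths."""
    labels = defaultdict(list)
    for dirpath, dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in exts:
                # the immediate parent folder name is taken as the label
                labels[os.path.basename(dirpath)].append(os.path.join(dirpath, name))
    return dict(labels)
```

From there, each list of paths can be loaded and stacked into an array for TensorFlow.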

Related

Selecting files from multiple folders with a certain extension

So consider the folder structure below:
Images
    1.jpg
    Yellow
        yellow1.jpg
        yellow2.jpg
        yellow1.csv
    Blue
        blue1.jpg
    Orange
    Purple
        purple1.jpg
        purple2.jpg
        purple.csv
My goal is to collect all the jpegs that sit in separate folders under the Images master directory.
I thought I can use glob as
input_dir=r'../../../../Images'
files=glob.glob(input_dir+"/**/*.jpg")
but this only yields the last file, like:
files=['../../../../Images/purple2.jpg']
but I want the files as
['../../../../Images/yellow1.jpg','../../../../Images/yellow2.jpg','../../../../Images/blue1.jpg','../../../../Images/purple1.jpg','../../../../Images/purple2.jpg']
I need to have all the files, can someone help me rectify this?
Just use pathlib.
from pathlib import Path
p = Path("your_source_folder")
files = [f for f in p.rglob('*.jpg') if f.is_file()]
This will recursively go through your folder structure, select all jpeg files and return a list of Path objects for all found files.
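For comparison, plain glob can do the same once recursive=True is passed; without it, ** does not match across directory levels (the folder name below is a placeholder):

```python
import glob
import os

# recursive=True is required for "**" to span any number of subdirectories
pattern = os.path.join("your_source_folder", "**", "*.jpg")
files = glob.glob(pattern, recursive=True)
```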

Iterate Through all Folders in a Drive - A Legacy Storage Option Migration to Cloud

I have a folder structure similar to the following:
This structure is used to store images.
New images are appended to the deepest available directory.
A directory can hold a maximum of 100 images.
Examples:
The first 100 images added will have the path:
X:\Images\DB\0\0\0\0\0\0\image_name.jpg
A random image may have the path:
X:\Images\DB\0\2\1\4\2\7\image_name.jpg
The last 100 images added will have the path:
X:\Images\DB\0\9\9\9\9\9\image_name.jpg
N.B. An image is only ever stored at the deepest possible directory.
X:\Images\DB\0\x\x\x\x\x\IMAGES_HERE
E.G. There are no images stored in: X:\Images\DB\0\1\2\3
N.B. The deepest folder path to an image only exists if an image is stored there. Example:
X:\Images\DB\0\9\9\9\9\9
... may not exist (and it doesn't in my case).
What I want to achieve is, beginning at the root directory, navigate through every possible path to the images and run a command.
I'm aware the running time for this is in terms of hours, if not days. It's a legacy storage option, with the command migrating images to the cloud.
I have already managed to code some functions to allow me to travel to the current deepest directory and execute a command, but visiting all possible paths adds a complexity I'm struggling with - also I'm new to Python.
Here is the code:
import os
import subprocess

# file generator
def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

# latest deepest directory
def get_deepest_dir(dir):
    current_dir = dir
    next_dir = os.listdir(current_dir)[-1]
    if len(list(files(current_dir))) == 0:
        next_dir = os.path.join(current_dir, next_dir)
        return get_deepest_dir(next_dir)
    else:
        return current_dir

# perform command
def sync():
    dir = get_deepest_dir(root_dir)
    command = "<command_here>"
    subprocess.Popen(command, shell=True)
I used the following to search for csv / pdf files. I've left an example of what I wrote to search through all folders.
os.listdir -
The os.listdir() method in Python is used to get the list of all files and directories in the specified directory.
os.walk -
The os.walk() method in Python generates the file names in a directory tree by walking the tree either top-down or bottom-up.
# Import Python modules
import os, time
import pandas as pd

## Search folder
## src_path = "/Users/folder1/test/"
src_path = "/Users/folder1/"
path = src_path

files = os.listdir(path)
for f in files:
    if f.endswith('.csv'):
        print(f)

for root, directories, files in os.walk(path, topdown=False):
    for name in files:
        if name.endswith('.csv'):
            print(os.path.join(root, name))
    ## for name in directories:
    ##     print(os.path.join(root, name))

for root, directories, files in os.walk(path):
    for name in files:
        if name.endswith('.pdf'):
            print(os.path.join(root, name))
    ## for name in directories:
    ##     print(os.path.join(root, name))
Thanks to #NeoTheNerd above for the solution.
The adapted code which worked for me is here.
def all_dirs(path):
    for root, directories, files in os.walk(path, topdown=False):
        if sum(c.isdigit() for c in root) == 6:
            print("Migrating Images From {}".format(root))

all_dirs("X:\\Images\\DB\\0")
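The digit-count test in that adaptation can be sanity-checked on its own (the paths below are the examples from the question; the helper name is mine):

```python
def is_deepest(root):
    # in this layout, a deepest directory path contains exactly six digit characters
    return sum(c.isdigit() for c in root) == 6

print(is_deepest(r"X:\Images\DB\0\2\1\4\2\7"))  # deepest level
print(is_deepest(r"X:\Images\DB\0\1\2\3"))      # intermediate level
```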

Reading all files that start with a certain string in a directory

Say I have a directory.
In this directory there are single files as well as folders.
Some of those folders could also have subfolders, etc.
What I am trying to do is find all of the files in this directory that start with "Incidences" and read each csv into a pandas data frame.
I am able to loop through all the files and get the names, but cannot read them into data frames.
I am getting the error that "___.csv" does not exist, as it might not be directly in the directory, but rather in a folder in another folder in that directory.
I have been trying the attached code.
inc_files2 = []
pop_files2 = []

for root, dirs, files in os.walk(directory):
    for f in files:
        if f.startswith('Incidence'):
            inc_files2.append(f)
        elif f.startswith('Population Count'):
            pop_files2.append(f)

for file in inc_files2:
    inc_frames2 = map(pd.read_csv, inc_files2)
for file in pop_files2:
    pop_frames2 = map(pd.read_csv, pop_files2)
You are adding only the file names to the lists, not their paths. You can use something like this to append the paths instead:
inc_files2.append(os.path.join(root, f))
You have to prepend the path from the root directory you are walking.
Append the entire pathname, not just the bare filename, to inc_files2.
You need the full path of each file, not just its name. Note that os.path.abspath(f) resolves against the current working directory rather than the folder being walked, so join the walk root instead. You can make use of this by making the following changes to your code:
for root, dirs, files in os.walk(directory):
    for f in files:
        f_full = os.path.join(root, f)
        if f.startswith('Incidence'):
            inc_files2.append(f_full)
        elif f.startswith('Population Count'):
            pop_files2.append(f_full)
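Putting the path fix together, a minimal sketch of the corrected collection loop (the function name is mine and the directory is a placeholder; each collected path can then be fed to pd.read_csv):

```python
import os

def collect_files(directory):
    """Walk `directory`, returning full file paths grouped by filename prefix."""
    inc_files, pop_files = [], []
    for root, dirs, files in os.walk(directory):
        for f in files:
            full = os.path.join(root, f)  # full path, not just the bare name
            if f.startswith('Incidence'):
                inc_files.append(full)
            elif f.startswith('Population Count'):
                pop_files.append(full)
    return inc_files, pop_files
```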

Walking into subdirectories not working

I'm trying to export all of my maps that are in my subdirectories.
I have the code to export, but I cannot figure out where to add the loop that will make it do this for all subdirectories. As of right now, it is exporting the maps in the directory, but not the subfolders.
import arcpy, os

arcpy.env.workspace = ws = r"C:\Users\162708\Desktop\Burn_Zones"

for subdir, dirs, files in os.walk(ws):
    for file in files:
        mxd_list = arcpy.ListFiles("*.mxd")
        for mxd in mxd_list:
            current_mxd = arcpy.mapping.MapDocument(os.path.join(ws, mxd))
            pdf_name = mxd[:-4] + ".pdf"
            arcpy.mapping.ExportToPDF(current_mxd, pdf_name)
        del mxd_list
What am I doing wrong that it isn't able to iterate through the subfolders?
Thank you!
Iterating over the result of os.walk gives you tuples containing (path, dirs, files) (the first element of each tuple is the current path containing the files, which is why I tend to name it that way). The current directory does not change automatically, so you need to incorporate it into the pattern you give to arcpy.ListFiles, like this:
arcpy.ListFiles(os.path.join(path, "*.mxd"))
You should also remove the for file in files loop. You are exporting the files per directory, so why export the whole directory once for each file?
Also, change arcpy.mapping.MapDocument(os.path.join(ws, mxd)) to arcpy.mapping.MapDocument(os.path.join(path, mxd)), where path is again the first element from os.walk.
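The corrected walk can be sketched without arcpy, as an assumption-laden stand-in: fnmatch.filter plays the role of arcpy.ListFiles, and the export call is reduced to a comment.

```python
import fnmatch
import os

def find_mxds(workspace):
    """Yield the full path of every .mxd file under workspace, subfolders included."""
    for path, dirs, files in os.walk(workspace):
        for mxd in fnmatch.filter(files, "*.mxd"):
            # join with the current walk path, not the top-level workspace
            yield os.path.join(path, mxd)
            # arcpy.mapping.ExportToPDF(...) would be called here per document
```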

compare folder contents

I need to compare two folders on an XP machine.
This is a radio station, we have all our music stored as high bitrate mp3, when new songs are acquired from CD they are wav. I need to be able to compare the mp3 and the wav folders for duplicates (naming will be identical except for the file extension). The object is to produce a list of items in the wav folder that don't have mp3 versions.
Python 2.7 is installed and my very limited experience of coding has been with python.
All help appreciated, even if it is just a kick in the right direction...
Thanks.
Use os.listdir to get the folder contents, and os.path.splitext to determine the base name:
import os
wavs = set(os.path.splitext(fn)[0] for fn in os.listdir('/path/to/wavs'))
mp3s = set(os.path.splitext(fn)[0] for fn in os.listdir('/path/to/mp3s'))
must_convert = wavs - mp3s
If you want to collate the mp3s and wavs of multiple folders (but not recursively), you'll have to store both basename and the full filename:
import os, collections

files = collections.defaultdict(dict)
for d in ['/path/to/wavs', '/more/wavs', '/some/mp3s', '/other/mp3s']:
    for f in os.listdir(d):
        basename, ext = os.path.splitext(f)
        files[ext][basename] = os.path.join(d, f)

files_to_convert = [fn for basename, fn in files['.wav'].items()
                    if basename not in files['.mp3']]
import os
wav=[os.path.splitext(x)[0] for x in os.listdir(r'C:\Music\wav') if os.path.splitext(x)[1]=='.wav']
mp3=[os.path.splitext(x)[0] for x in os.listdir(r'C:\Music\mp3') if os.path.splitext(x)[1]=='.mp3']
# here wav is a list of the names of only those files whose extension is .wav
# here mp3 is a list of the names of only those files whose extension is .mp3
print(set(wav)-set(mp3))
Here is a solution that works recursively, slightly based on phihag's answer.
import os

sets = {}
for dirname in 'mp3_folder', 'wav_folder':
    sets[dirname] = set()
    for path, dirs, files in os.walk(dirname):
        rel = os.path.relpath(path, dirname)
        sets[dirname].update(os.path.join(rel, os.path.splitext(fn)[0]) for fn in files)

must_convert = sets['wav_folder'] - sets['mp3_folder']
print('\n'.join(sorted(must_convert)))
