Can python load all files in all subfolders without the complete names?

Please help me.
I'm new to Python and I want to load all the jpg images in 500 folders (each folder has up to 100 jpg images).
I also have a CSV file that contains a label for each folder, and I want Python to apply each folder's label to all the jpg images in that folder.
Example file name: folder_name[.....].jpg
Each file name starts with its folder name; only the part in [] differs from file to file.
How can I tell Python to accept whatever is in the []?
I would appreciate any help.
train = pd.read_csv("COAD_CMS_label_train.csv")
train_image = []
for i in tqdm(range(train.shape[0])):
    img = image.load_img('tiles/'+train['folder_name'][i]+' [*]'.astype('str')+'.jpg', target_size=(256,256,3),
                         grayscale=False,)
example for labels:
folder_name,Label
TCGA-A6-2683-01Z-00-DX1.0dfc5d0a-68f4-45e1-a879-0428313c6dbc,CMS2
TCGA-F4-6459-01Z-00-DX1.80a78213-1137-4521-9d60-ac64813dec4c,CMS4
TCGA-A6-6653-01Z-00-DX1.e130666d-2681-4382-9e7a-4a4d27cb77a4,CMS1

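You could do something like the following, which yields every file with the jpg extension in all subdirectories, where parent_dir is the parent directory that contains the image subdirectories. Note the ** in the pattern: without it, recursive=True has no effect and only files directly inside parent_dir are matched. glob.glob returns a list; glob.iglob returns an iterator over the same paths.
import glob
images_files = glob.glob(f'{parent_dir}/**/*.jpg', recursive=True)
Hope that helps.
Good luck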
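A minimal sketch of how the recursive search could be combined with the folder labels from the CSV. It assumes that every image sits in a subfolder of tiles whose name equals a folder_name entry in the CSV, and that the CSV has the columns folder_name and Label as in the example. The function name collect_labeled_images is hypothetical, and only the standard library is used; loading each path with image.load_img would be a separate step.

```python
import csv
from pathlib import Path


def collect_labeled_images(tiles_dir, labels_csv):
    """Return a sorted list of (image_path, label) pairs.

    Assumes every image sits in a subfolder of tiles_dir whose name matches
    a folder_name entry in the CSV (an assumption about the layout).
    """
    # Map each folder name to its label.
    with open(labels_csv, newline="") as f:
        labels = {row["folder_name"]: row["Label"] for row in csv.DictReader(f)}

    pairs = []
    # rglob walks all subfolders; only the .jpg suffix is matched, so the
    # part of the file name inside [] never has to be known.
    for path in sorted(Path(tiles_dir).rglob("*.jpg")):
        label = labels.get(path.parent.name)
        if label is not None:  # skip images in folders without a label
            pairs.append((str(path), label))
    return pairs
```

For example, collect_labeled_images("tiles", "COAD_CMS_label_train.csv") would return one pair per jpg file, each carrying the label of the folder it was found in.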

Related

how to separate images in a folder to train folder according to the filename in my train csv file

I'm currently learning deep learning with my own dataset and have around 70k images in one folder.
I've already listed the images in a CSV file with filename, width, height, and class, and randomly divided them into train, valid, and test CSV files.
My question is: is there any way to separate the images based on the filenames in my CSV files?
Any answer would be appreciated <3
Thank you

First, extract the values of the relevant columns from the data frame (data, as loaded with pd.read_csv) and save them in lists:
filenames = data['filenames'].tolist()
classes = data['classes'].tolist()
Now list the image files in the directory, either "C:/data/Images/" or os.getcwd():
import os
path = os.getcwd()
images = set(name for name in os.listdir(path) if name.endswith(('.jpg', '.png')))
Now match the CSV filenames against the files on disk. os.listdir returns names in arbitrary order, so match by name rather than by position:
finalclasses = []
finalimages = []
for filename, cls in zip(filenames, classes):
    if filename in images:
        finalclasses.append(cls)
        finalimages.append(os.path.join(path, filename))  # or read it with OpenCV: cv2.imread(os.path.join(path, filename))
The above should help solve your problem. Thanks, and happy learning :)
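To physically separate the files, the matched names could then be copied into one folder per split. The following is a minimal sketch using only the standard library; the function name copy_split, the default column name filename, and the folder names in the usage note are assumptions, not something given in the question.

```python
import csv
import shutil
from pathlib import Path


def copy_split(csv_path, src_dir, dst_dir, column="filename"):
    """Copy every image listed in csv_path from src_dir into dst_dir.

    Returns the names listed in the CSV that were not found in src_dir.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    missing = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            source = src / row[column]
            if source.is_file():
                # copy2 keeps the original file and its timestamps
                shutil.copy2(source, dst / source.name)
            else:
                missing.append(row[column])
    return missing
```

It would be called once per split, for example copy_split("train.csv", "images", "dataset/train"), and likewise for the valid and test CSV files. Copying rather than moving leaves the original folder untouched until the result has been checked.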

"RuntimeError: Found 0 files in subfolders of ".. Error about subfolder in Pytorch

I'm on Windows 10 with Jupyter Notebook, PyTorch 1.0, and Python 3.6.x.
First I confirmed that the path to the files is correct using this code: print(os.listdir('./Dataset/images/')).
The listing showed that the path is correct.
But I got this error:
RuntimeError: Found 0 files in subfolders of: ./Dataset/images/
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif
What is the matter?
Could you suggest a solution?
I also tried a path like ./dataset/1/images, but the result was the same.
img_dir = './Dataset/images/'
img_data = torchvision.datasets.ImageFolder(os.path.join(img_dir), transforms.Compose([
    transforms.Scale(256),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
]))
img_batch = data.DataLoader(img_data, batch_size=batch_size,
                            shuffle = True, drop_last=True)

I met the same problem when using CelebA, which contains about 200,000 images. With a small sample (I tried 20 images) the error was not raised and the images were read successfully, but with the full set it failed.
I solved the problem by following a solution posted by QimingChen on GitHub.
In short, adding another folder named 1 inside the original folder (/train/ ---> /train/1/) and putting the images there lets the program work without changing the path. That's because ImageFolder expects the images to be sorted into subfolders, one per class.
The original answer on GitHub:
Let's say I am going to use ImageFolder("/train/") to read the jpg files in the folder train.
The file structure is
/train/
-- 1.jpg
-- 2.jpg
-- 3.jpg
I failed to load them, which led to this error:
RuntimeError: Found 0 images in subfolders of: ./data
Supported image extensions are: .jpg,.JPG,.jpeg,.JPEG,.png,.PNG,.ppm,.PPM,.bmp,.BMP
I read the solution above and tried tens of times. Then I changed the structure to
/train/1/
-- 1.jpg
-- 2.jpg
-- 3.jpg
The read-in code is still ImageFolder("/train/"), and IT WORKS.
It seems the program reads files recursively from the subfolders, which is convenient in some cases.
Hope this helps!!

Can you post the structure of your files? In your case, it is supposed to be:
img_dir
|_class1
  |_a.jpg
  |_b.jpg
|_class2
  |_a.jpg
  |_b.jpg
...
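A quick way to check whether a directory follows the layout above is to count the image files under each immediate subfolder. This is a minimal sketch, not torchvision's own implementation; the function name count_images_per_class is hypothetical, and the extension list is copied from the error message quoted in the question.

```python
import os

# Extensions taken from the error message quoted in the question.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif")


def count_images_per_class(root):
    """Count image files under each immediate subfolder of root.

    Files placed directly in root are not counted, which mirrors the
    class-per-subfolder layout shown above.
    """
    counts = {}
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if not entry.is_dir():
            continue
        total = 0
        # Walk the class folder recursively and count matching files.
        for _, _, fnames in os.walk(entry.path):
            total += sum(f.lower().endswith(IMG_EXTENSIONS) for f in fnames)
        counts[entry.name] = total
    return counts
```

If the result is an empty dictionary, or every count is zero, the images are probably sitting directly in the root folder rather than in class subfolders.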

According to the rules of ImageFolder in PyTorch, you should pass the parent directory of the directory that holds the images. That means if your images are located in './Dataset/images/', the path given to the dataset should be './Dataset' instead. I hope this fixes your bug. :)

You can modify the ImageFolder class to read from the root folder directly (without subfolders):
class ImageFolder(Dataset):
    def __init__(self, root, transform=None):
        # Call make_dataset to collect the files under root.
        self.samples = make_dataset(root)
        self.imgs = self.samples
        self.transform = transform
    ...
We call the make_dataset function to collect our files:
import os

def make_dataset(dir):
    images = []
    d = os.path.expanduser(dir)
    if not os.path.exists(d):
        print('path does not exist')
    # Walk the directory tree and collect the full path of every file.
    for root, _, fnames in sorted(os.walk(d)):
        for fname in sorted(fnames):
            path = os.path.join(root, fname)
            images.append(path)
    return images
All the action takes place in the loop containing os.walk. Here, the files are collected from the 'root' directory, which we specify as the directory containing our files.
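For illustration, the elided parts of such a class might be filled in along the following lines. This is a sketch under assumptions, not torchvision's ImageFolder: the class name FlatImageFolder is hypothetical, it deliberately avoids importing torch, and the loader callable (for example PIL.Image.open) has to be supplied by the caller. It implements only __len__ and __getitem__, the two methods a map-style dataset needs.

```python
import os


class FlatImageFolder:
    """Map-style dataset over every file below root, with no class subfolders.

    Hypothetical helper: the loader callable (for example PIL.Image.open)
    and the transform are supplied by the caller.
    """

    def __init__(self, root, loader=None, transform=None):
        self.samples = []
        # Same os.walk loop as make_dataset above: collect every file path.
        for dirpath, _, fnames in sorted(os.walk(os.path.expanduser(root))):
            for fname in sorted(fnames):
                self.samples.append(os.path.join(dirpath, fname))
        self.loader = loader
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path = self.samples[index]
        # Without a loader the path itself is returned.
        sample = self.loader(path) if self.loader else path
        return self.transform(sample) if self.transform else sample
```

Because there are no class subfolders, each item is a single sample rather than a (sample, label) pair; labels would have to come from somewhere else, such as the file name.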

See the documentation of the ImageFolder dataset to see how this dataset class expects the images to be organized into subfolders under './Dataset/images' according to image classes. Make sure your images adhere to this layout.

Apparently, the solution is just to make the picture names alphanumeric. There may be another solution, but this works.

Convert pdf to jpg - can't see outputs

I have Python 3.6 and want to know how to convert 30+ pdf images into jpgs. I have these pdf images stored in one folder and would like to run a script to run through all the pdfs, convert them to jpgs and split them out into a new folder.
I tried to test this out on one image (see code below):
from pdf2jpg import pdf2jpg
inputpath = r"C:\Users\Admin-dsc\Documents\Image project\pdfinputs\RWG003209_2 Red.pdf"
outputpath = r"C:\Users\Admin-dsc\Documents\Image project\jpgoutputs"
result = pdf2jpg.convert_pdf2jpg(inputpath, outputpath, pages="1")
print(result)
The code runs fine, but when I look in the folder:
C:\Users\Admin-dsc\Documents\Image project\jpgoutputs
I see a folder called RWG003209_2 Red.pdf which is empty. I am confused - shouldn't the jpgs be saved here? Have I misunderstood something?

How to load multiple images from folder. PyQt4

I want to be able to load a large number of images one by one from a given folder, without knowing the name of each image (only the name of the folder where all the images are located). Currently I can load only one image using its name (pic.jpg):
pixmap = QtGui.QPixmap("pic.jpg")
item = QtGui.QGraphicsPixmapItem(pixmap)
self.scene.addItem(item)
self.scene.update()
Is there any way to do this? Thanks in advance!

The os module contains filesystem access functions.
import os
dir = "dirname"
for file in os.listdir(dir):
    ... = QtGui.QPixmap(os.path.join(dir, file))
Note: os.path.join is there so you are platform agnostic.
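Since os.listdir returns every entry in the folder in arbitrary order, it may help to keep only image files and sort them first. A minimal sketch, assuming the extensions listed below are the ones of interest; the function name list_images is hypothetical.

```python
import os

# Assumed set of image extensions; adjust to the formats actually used.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".gif")


def list_images(folder):
    """Return the full paths of the image files in folder, sorted by name."""
    return [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(folder, name))
    ]
```

Each returned path could then be passed to QtGui.QPixmap inside the loop, one image at a time.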

Python: embed images in kmz

I have a series of kmz files (1000+) within one folder that I have generated of each polygon of a feature class and a corresponding image (all images are in a separate folder). These kmz's are automatically generated from the attribute tables of my shapefiles from arcGIS. Within each kmz file I have a link to an image that corresponds to that feature as such:
<tr>
<td>Preview</td>
<td>G:\Temp\Figures\Ovr0.png</td>
</tr>
At the moment each image is just text in a table cell referencing an image in the directory /Temp/Figures. What I'd like is to convert all those texts into image tags, something along the lines of
<img src="file:///G:/Temp/Figures/Ovr0.png" width = 750 height=500/>
Given the large volume of files, it would be ideal if this could be done within Python, perhaps with simplekml. On another note, at some stage I would like to share a few of these kmz files, so I was wondering whether the best solution would be to put each kmz and image pair into its own directory and rezip the kmz file somehow?

I have managed to solve my problem by iterating over each kmz and image, using the zipfile module to read the contents, rewriting doc.kml, and rezipping the files into a kmz. At the moment the image is placed after the </body> tag in the kmz, but a more precise replacement could be written with re, I presume.
If there is a more efficient method please let me know...
import zipfile

def edit_kmz(kmz, output, image):
    ## Read the doc.kml file in the kmz and rewrite the doc.kml file
    temp = r'tempfolder\doc.kml'  # tempfolder must already exist
    with zipfile.ZipFile(kmz) as zf:
        # Assumes doc.kml is UTF-8 encoded
        lines = zf.read("doc.kml").decode("utf-8").split("\n")
    with open(temp, 'w', encoding="utf-8") as wf:  # Create the doc.kml
        for line in lines:
            if "</body>" in line:
                wf.write("</body>\n<img src='files/Ovr0.png' width='750' height='500'/>\n")
            else:
                wf.write('%s\n' % line)
    ## Rezip the file
    with zipfile.ZipFile(output, 'a') as zf:
        zf.write(image, arcname='files/Ovr0.png')  ## Relative path to the image
        zf.write(temp, arcname='doc.kml')  ## Add revised doc.kml file
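The re-based replacement mentioned above could look roughly like this. It is a minimal sketch that assumes each image reference is a table cell containing nothing but a Windows path to a .png file, as in the snippet from the question; the function name path_cells_to_img is hypothetical.

```python
import re

# Matches a table cell whose whole content is a Windows path to a .png file.
PATH_CELL = re.compile(r"<td>([A-Za-z]:\\[^<>]*?\.png)</td>")


def path_cells_to_img(html, width=750, height=500):
    """Replace each <td>X:\\...\\name.png</td> cell with an <img> tag."""

    def to_img(match):
        # Turn G:\Temp\Figures\Ovr0.png into file:///G:/Temp/Figures/Ovr0.png
        url = "file:///" + match.group(1).replace("\\", "/")
        return '<td><img src="%s" width="%d" height="%d"/></td>' % (url, width, height)

    return PATH_CELL.sub(to_img, html)
```

Applied to the decoded text of doc.kml before it is written back, this would rewrite the path cells in place instead of appending a fixed tag after the body. A file:/// URL only resolves on the machine that has that drive, so for sharing, a relative path such as files/Ovr0.png inside the kmz is likely the better target.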
