File name not matching with files inside the folder - python

Greetings. I have a folder named '10k' that contains images named 1_left, 1_right and so on, as shown below.
My Python code to print the names of the files in the folder:
import glob
import os
main_file = '10k'
path = os.path.join(main_file, '*g')
files = glob.glob(path)
#l='10k\10_left.jpeg'
for f1 in files:
    #print(os.path.basename(f1))
    fstr = str(f1)
    print(fstr)
The output is weird when I print: it does not show the desired names.
Output:
Please guide me.

✓ #vidit02100, this is an interesting problem. As I understand from your code, you only want to print the names of the image files inside the 10k directory.
In the image, the lines that are commented out in the question's code are not commented out.
If you show the full code and tell me how many images are inside the 10k directory, it will be much easier for me to help you.
Your code may be printing the images in reverse order if there are more than 10,000 images inside 10k. Please check and let me know.
✓ As far as I know, jpg, jpeg and png are the most popular image file extensions that end with g.
✓ So place these extensions in a list, iterate over it with another for loop, and put your code inside that loop.
Please comment if my suggestion doesn't meet your needs. I will update my answer based on the inputs and outputs you provide.
✓ Here is your modified code:
import glob
import os
main_file = '10k'
file_formats = ["png", "jpg", "jpeg"]
for file_format in file_formats:
    # Build a pattern such as 10k/*.png for each extension in turn
    path = os.path.join(main_file, '*.' + file_format)
    files = glob.glob(path)
    for f1 in files:
        fstr = str(f1)
        print(fstr)
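As an alternative to one glob call per extension, the directory can be listed once and filtered by extension. This is only a minimal sketch: the helper name list_images and the extension tuple are assumptions made for illustration, not part of the original code.

```python
import os

# Assumed set of image extensions; adjust to whatever the folder contains.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')

def list_images(folder, extensions=IMAGE_EXTENSIONS):
    """Return the paths of files in `folder` whose extension is in `extensions`."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        # str.endswith accepts a tuple; lower() makes the match case-insensitive.
        if os.path.isfile(full) and name.lower().endswith(extensions):
            paths.append(full)
    return paths
```

Calling list_images('10k') would then return one list for all three extensions, in plain alphabetical order of the file names.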

Without more information I can only guess what you wanted this to print.
If your desired output is first 1_left, then 1_right and so on, as presented in your folder, the reason you don't get it is that glob returns the files in whatever order the file system provides, which is not how your OS's file manager sorts them.
files is just a list, so you can sort it yourself using sort and a custom key, like files.sort(key=lambda x: int(os.path.basename(x).split("_")[0])). The os.path.basename call strips the leading 10k folder from each path so that only the file name is parsed.
This sorts the list by the number at the beginning of each name, and numbers, unlike strings, are sorted the way you'd probably expect (so first 1, then 2 and so on).
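The sorting idea can be sketched as follows. The sample paths are made up for illustration, and the helper name numeric_prefix is an assumption; the file name is used as a tie-breaker so that 2_left comes before 2_right.

```python
import os

def numeric_prefix(path):
    """Sort key: the integer before the first underscore of the file name."""
    name = os.path.basename(path)  # drop the folder part, e.g. '10k'
    return int(name.split('_')[0])

# Hypothetical glob results in an arbitrary order.
files = ['10k/10_left.jpeg', '10k/2_right.jpeg', '10k/1_left.jpeg', '10k/2_left.jpeg']

# Sort by the number first, then by the file name to order left/right pairs.
files.sort(key=lambda p: (numeric_prefix(p), os.path.basename(p)))
print(files)
```

Sorting the raw strings instead would put 10_left before 2_left, because strings are compared character by character.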

Related

os.walk isn't showing all the files in the given path

I'm trying to make my own backup program, and to do so I need to give it a directory and get every file anywhere in its subdirectories so that I can copy them. I tried making a script, but it doesn't give me all the files in that directory. I used Documents as a test: my list has 3600 items, but there should be 17000 files. Why isn't os.walk showing everything?
import os
data = []
for mdir, dirs, files in os.walk('C:/Users/Name/Documents'):
    data.append(files)
print(data)
print(len(data))
Use data.extend(files) instead of data.append(files).
files is a list of files in a directory. It looks like ["a.txt", "b.html"] and so on. If you use append, you end up with data looking like
[..., ["a.txt", "b.html"]]
whereas I suspect you're after
[..., "a.txt", "b.html"]
Using extend will provide the second behaviour.
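The difference can be seen on two small lists, followed by a sketch of the corrected walk. The function name all_files is an assumption used only for this illustration.

```python
import os

# append adds the whole list as one item; extend adds each name separately.
nested, flat = [], []
for files in (['a.txt', 'b.html'], ['c.txt']):
    nested.append(files)
    flat.extend(files)
print(len(nested))  # one entry per directory
print(len(flat))    # one entry per file

def all_files(top):
    """Return a flat list of every file name found under `top`."""
    data = []
    for _root, _dirs, files in os.walk(top):
        data.extend(files)  # add each name, not the list itself
    return data
```

With append, len(data) counts directories visited, which would explain a count of 3600 where 17000 files were expected.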

How to find the first image in a folder?

What I want to achieve is to get the first item in a folder that is a jpg or png image without having to scan the whole folder.
import os
path = os.getcwd()
#List of folders in the path
folders = next(os.walk(path))[1]
#Get the first element (folder is one of the names in folders)
folders_walk = os.walk(path+'\\'+ folder)
first = next(folders_walk)[2][0]
With this code I get the first element of the folder, but this may or may not be an image. Any advice?
Not sure what you mean by "without having to scan the entire folder". You could use glob(), but that would still scan the entire directory to match the pattern.
Anyway, see a solution below. It can easily be modified if you don't want a recursive search (as below) or want a different criterion to determine whether a file is an image.
import os
search_root_directory = os.getcwd()
# Recursively construct list of files under root directory.
all_files_recursive = sum([[os.path.join(root, f) for f in files] for root, dirs, files in os.walk(search_root_directory)], [])
# Define function to tell if a given file is an image
# Example: search for .png extension.
def is_an_image(fpath):
    return os.path.splitext(fpath)[-1] in ('.png',)
# Take the first matching result. Note: throws StopIteration if not found
first_image_file = next(filter(is_an_image, all_files_recursive))
Note that the above will be much more efficient (in the recursive case) if sum(), which builds the entire list of files up front, is omitted and the files are instead checked one at a time as os.walk yields them (but the code is less clear that way).
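A lazy variant along those lines might look like the sketch below. The function name first_image and the extension set are assumptions for illustration; "first" here means first in os.walk order, which is not necessarily alphabetical.

```python
import os

# Assumed extensions that count as an image for this sketch.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}

def first_image(top):
    """Return the path of the first image found under `top`, or None."""
    for root, _dirs, files in os.walk(top):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                # Returning here stops the walk, so later folders are never read.
                return os.path.join(root, name)
    return None
```

Returning None instead of raising StopIteration is a design choice of this sketch; the caller has to check for it.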

absolute path for file not working properly python

Basically, I'm trying to store the full path of each file in a list, but for some reason os.path.abspath() doesn't seem to work properly.
import os
original_listpaths = []
files = os.listdir("TRACKER/")
for f in files:
    original_listpaths.append(os.path.abspath(f))
print(original_listpaths)
But I get this output:
'C:\Users\******\Documents\folder\example'
It should be:
'C:\Users\******\Documents\folder\TRACKER\example'
The difference is that the second (correct) path includes TRACKER, which is part of the file's actual full path, but my output leaves it out. What's the problem?
You could try the following code:
import os
files = os.scandir("TRACKER/")
print(files)
original_listpaths = []
for f in files:
    # Each entry carries its path relative to the working directory (TRACKER/...)
    original_listpaths.append(os.path.abspath(f))
print(original_listpaths)
files.close()
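os.path.abspath resolves a relative name against the current working directory, and os.listdir returns bare file names, which is why TRACKER is missing from your output. You need to change your directory to "TRACKER" first: put os.chdir("TRACKER") after files = os.listdir("TRACKER/") and before the loop starts.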
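Another option that avoids changing the working directory is to join the directory name onto each file name before calling os.path.abspath. A minimal sketch, where the function name absolute_paths is an assumption:

```python
import os

def absolute_paths(directory):
    """Return the absolute path of every entry in `directory`."""
    return [
        # Join first, so abspath sees 'TRACKER/example' rather than just 'example'.
        os.path.abspath(os.path.join(directory, name))
        for name in os.listdir(directory)
    ]
```

For the question's layout, absolute_paths("TRACKER/") would be called from the folder that contains TRACKER.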

Dealing with OS Error Python: [Errno 20] Not a directory: '/Volumes/ARLO/ADNI/.DS_Store'

I am trying to write a piece of code that will recursively iterate through the subdirectories of a specific directory and stop only when reaching files with a '.nii' extension, appending these files to a list called images - a form of a breadth first search. Whenever I run this code, however, I keep receiving [Errno 20] Not a directory: '/Volumes/ARLO/ADNI/.DS_Store'
*/Volumes/ARLO/ADNI is the folder I wish to traverse through
*I am doing this in Mac using the Spyder IDE from Anaconda because it is the only way I can use the numpy and nibabel libraries, which will become important later
*I have already checked that this folder directly contains only other folders and not files
#preprocessing all the MCIc files
import os
#import nibabel as nib
#import numpy as np
def pwd():
    cmd = 'pwd'
    os.system(cmd)
    print(os.getcwd())
#Part 1
os.chdir('/Volumes/ARLO')
images = [] #creating an empty list to store MRI images
os.chdir('/Volumes/ARLO/ADNI')
list_sample = [] #just an empty list for an earlier version of the program
#Part 2
#function to recursively iterate through folder of raw MRI
#images and extract them into a list
#breadth first search
def extract(dir):
    #dir = dir.replace('.DS_Store', '')
    lyst = os.listdir(dir) #DS issue
    for item in lyst:
        if 'nii' not in item: #if item is not a .nii file, if item is another folder
            newpath = dir + '/' + item
            #os.chdir(newpath) #DS issue
            extract(newpath)
        else: #if item is the desired file type, append it to the list images
            images.append(item)
#Part 3
adni = os.getcwd() #big folder I want to traverse
#print(adni) #adni is a string containing the path to the ADNI folder w/ all the images
#print(os.listdir(adni)) this also works, prints the actual list
"""adni = adni + '/' + '005_S_0222'
os.chdir(adni)
print(os.listdir(adni))""" #one iteration of the recursion, works
extract(adni)
print(images)
With every iteration, I wish to traverse further into the nested folders by appending the folder name to the growing path, and part 3 of the code works, i.e. I know that a single iteration works. Why does os keep adding the '.DS_Store' part to my directories in the extract() function? How can I correct my code so that the breadth first traversal can work? This folder contains hundreds of MRI images, I cannot do it without automation.
Thank you.
The .DS_Store files are not being created by the os module, but by the Finder (or, I think, sometimes Spotlight). They're where macOS stores things like the view options and icon layout for each directory on your system.
And they've probably always been there. The reason you didn't see them when you looked is that files that start with a . are "hidden by convention" on Unix, including macOS. Finder won't show them unless you ask it to show hidden files; ls won't show them unless you pass the -a flag; etc.
So, that's your core problem:
"I have already checked that this folder directly contains only other folders and not files"
… is wrong. The folder does contain at least one regular file: .DS_Store.
So, what can you do about that?
You could add special handling for .DS_Store.
But a better solution is probably to just check each file to see if it's a file or directory, by calling os.path.isdir on it.
Or, even better, use os.scandir instead of listdir, which gives you entries with more information than just the name, so you don't need to make extra calls like isdir.
Or, best of all, just throw out this code and use os.walk to recursively visit every file in every directory underneath your top-level directory.
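The os.walk approach could be sketched like this. The function name find_nii_files is an assumption, and matching on the '.nii' suffix is a slightly stricter test than the question's 'nii' in item.

```python
import os

def find_nii_files(top):
    """Return full paths of all files under `top` whose name ends in '.nii'."""
    images = []
    # os.walk separates directories from files, so .DS_Store is never
    # treated as a folder and no "Not a directory" error can occur.
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith('.nii'):
                images.append(os.path.join(root, name))
    return images
```

Storing the full path rather than the bare name means the files can later be opened without tracking which folder each one came from.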

Iterating through subdirectories to add unique strings to each file

My goal: To build a program that:
Opens a folder (provided by the user) from the user's computer
Iterates through that folder, opening each document in each subdirectory (named according to language codes; "AR," "EN," "ES," etc.)
Substitutes a string in for another string in each document. Crucially, the new string will change with each document (though the old string will not), according to the language code in the folder name.
My level of experience: Minimal; been learning python for a few months but this is the first program I'm building that's not paint-by-numbers. I'm building it to make a process at work faster. I'm sure I'm not building this as efficiently as possible; I've been throwing it together from my own knowledge and from reading stackexchange religiously while building it.
Research I've done on my own: I've been living in stackexchange the past few days, but I haven't found anyone doing quite what I'm doing (which was very surprising to me). I'm not sure if this is just because I lack the vocabulary to search (tried out a lot of search terms, but none of them totally match what I'm doing) or if this is just the wrong way of going about things.
The issue I'm running into:
I'm getting this error:
Traceback (most recent call last):
File "test5.py", line 52, in <module>
for f in os.listdir(src_dir):
OSError: [Errno 20] Not a directory: 'ExploringEduTubingEN(1).txt'
I'm not sure how to iterate through every file in the subdirectories and update a string within each file (not the file names) with a new and unique string. I thought I had it, but this error has totally thrown me off. Prior to this, I was getting an error for the same line that said "Not a file or directory: 'ExploringEduTubingEN(1).txt'" and it's surprising to me that the first error could request a file or a directory, and once I fixed that, it asked for just a directory; seems like it should've just asked for a directory at the beginning.
With no further ado, the code (placing at bottom because it's long to include context):
import os
ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
#Asking for a PDF to which we'll iteratively append the language codes from below.
lst = ['_ar.pdf', '_cs.pdf', '_de.pdf', '_el.pdf', '_en_gb.pdf', '_es.pdf', '_es_419.pdf',
       '_fr.pdf', '_id.pdf', '_it.pdf', '_ja.pdf', '_ko.pdf', '_nl.pdf', '_pl.pdf', '_pt_br.pdf', '_pt_pt.pdf', '_ro.pdf', '_ru.pdf',
       '_sv.pdf', '_th.pdf', '_tr.pdf', '_vi.pdf', '_zh_tw.pdf', '_vn.pdf', '_zh_cn.pdf']
#list of language code PDF appending strings.
pdf_list=open('pdflist.txt','w+')
#creating a document to put this group of PDF filepaths in.
pdf2='pdflist.txt'
#making this an actual variable.
for word in lst:
    pdf_list.write(ex + word + "\n")
#creating a version of the PDF example for every item in the language list, and then appending the language codes.
pdf_list.seek(0)
langlist=pdf_list.readlines()
#creating a list of the PDF paths so that I can use it below.
for i in langlist:
    i=i.rstrip("\n")
#removing the line breaks.
pdf_list.close()
#closing the file after removing the line breaks.
file1=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
#the folder provided by the user to iterate through.
folder1=os.listdir(file1)
#creating a list of the files within the folder
pdfpath1="example.pdf"
langfile="example2.pdf"
#setting variables for below
#my thought here is that i'd need to make the variable the initial folder, then make it a list, then iterate through the list.
for ogfile in folder1:
    #want to iterate through all the files in the directory, including in subdirectories
    src_dir=ogfile.split("/",6)
    src_dir="/".join(src_dir[:6])
    #goal here is to cut off the language code folder name and then join it again, w/o language code.
    for f in os.listdir(src_dir):
        f = os.path.join(src_dir, f)
        #i admit this got a little convoluted - i'm trying to make sure the files put the right code in, I.E. that the document from the folder ending in "AR" gets the PDF that will now end in "AR"
        #the perils of pulling from lots of different questions in stackexchange
        with open(ogfile, 'r+') as f:
            content = f.read()
            f.seek(0)
            f.truncate()
            for langfile in langlist:
                f.write(content.replace(pdfpath1, langfile))
                #replacing the placeholder PDF link with the created PDF links from the beginning of the code
If you read this far, thanks. I've tried to provide as much information as possible, especially about my thought process. I'll keep trying things and reading, but I'd love to have more eyes on it.
You have to specify the full path to your directories/files. Use os.path.join to create a valid path to your file or directory (and platform-independent).
To replace your string, simply build the new string from the example string and the subfolder name. Assuming that ex has the format filename.pdf, you could use: newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'. That way, you neither have to specify the list of replacement strings nor loop through it.
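The ex[:-4] slice relies on the extension being exactly four characters long ('.pdf'). A slightly more general sketch uses os.path.splitext; the function name localized_name is an assumption made for illustration.

```python
import os

def localized_name(example, subfolder):
    """Build e.g. 'example_ar.pdf' from 'example.pdf' and the folder name 'AR'."""
    stem, ext = os.path.splitext(example)  # works for any extension length
    return stem + '_' + subfolder.lower() + ext

print(localized_name('example.pdf', 'AR'))  # example_ar.pdf
```

This keeps whatever extension the user typed instead of assuming '.pdf'.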
Solution
To iterate over your directory and replace the content of your files as you'd like, you can do the following:
import os
# Note: raw_input is Python 2; use input() instead on Python 3
# Get the name of the file: "example.pdf" (note the .pdf is assumed here)
ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
# Get the folder to go through
folderpath=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
# Get all subfolders and go through them (named: 'AR', 'DE', etc.)
subfolders=os.listdir(folderpath)
for subfolder in subfolders:
    # Get the full path to the subfolder
    fullsubfolder = os.path.join(folderpath,subfolder)
    # If it is a directory, go through it
    if os.path.isdir(fullsubfolder):
        # Find all files in subdirectory and go through each of them
        files = os.listdir(fullsubfolder)
        for filename in files:
            # Get full path to the file
            fullfile = os.path.join(fullsubfolder, filename)
            # If it is a file, process it (note: we do not check if it is a text file here)
            if os.path.isfile(fullfile):
                with open(fullfile, 'r+') as f:
                    content = f.read()
                    f.seek(0)
                    f.truncate()
                    # Create the replacing string based on the subdirectory name. Ex: 'example_ar.pdf'
                    newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'
                    f.write(content.replace(ex, newstring))
Note
Instead of asking the user to type the folder path, you could let them pick the directory with a dialog box. See this question for more info: Use GUI to open directory in Python 3
