Splitting filenames to find files that I want to interrogate - python

I have a folder structure as follows:
TestOpt > roll_1_oe_2017-03-10
> roll_2_oe_2017-03-05
: :
> roll_600_oe_2012-05-10
TestOpt is the main folder and the roll_<n>_oe_<date> entries are the sub folders, which hold .csv records that I am looking to interrogate if they fall inside a certain range of rolls.
I am trying to analyse the folder names as I only wish to interrogate records where the sub folder has a roll of, say, 500 or more (so I would like to interrogate the records in folders roll_500_oe_2012-05-10 to roll_600_oe_2012-05-10 inclusive).
I have tried splitting the folder name by "_" so I can retrieve the roll number, but I am having a problem in that I can't get the code past the TestOpt filename. Please see below for code:
rootdir = r'C:/Users/Stacey/Documents/TestOpt/'
#cycle through all the folders in the TestOpt directory
for dirName,sundirList, fileList in os.walk(rootdir):
    #print('Found directory: %s' % dirName)
    #split the file name by _
    x = dirName.split("_")
    print('list length ',len(x))
    #If the length of the folder name is greater than 1 its not the TestOpt folder
    if len(x) > 1:
        #the second split list element is the roll number
        roll = x[2]
        #interrogate records in folder id roll is greater or equal to 500
        if roll >= 500:
            print('myroll1 ',roll)
            for fname in fileList:
                do something....
If anyone can offer any assistance I would be most grateful
Thanks

You'll need to explicitly state that roll is an integer, as the list made from the filename is a list of strings.
Use roll = int(x[2]).
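A minimal sketch of that conversion, assuming the folder names follow the roll_<n>_oe_<date> pattern shown in the question. Note that for the path in the question, splitting the full dirName on "_" would leave the roll number at index 1 rather than 2, so this sketch splits only the folder's base name. The helper names roll_number and folders_in_range, and the minimum parameter, are illustrative and not from the original post.

```python
import os


def roll_number(dir_path):
    """Return the roll number encoded in a folder name, or None if absent."""
    # Normalise first so a trailing slash does not produce an empty base name.
    name = os.path.basename(os.path.normpath(dir_path))
    parts = name.split("_")
    # Expected shape: ['roll', '<n>', 'oe', '<date>']
    if len(parts) >= 2 and parts[0] == "roll" and parts[1].isdigit():
        return int(parts[1])
    return None


def folders_in_range(rootdir, minimum=500):
    """Yield (dir_path, file_names) for roll folders at or above `minimum`."""
    for dir_path, _subdirs, file_names in os.walk(rootdir):
        roll = roll_number(dir_path)
        if roll is not None and roll >= minimum:
            yield dir_path, file_names
```

Looping over folders_in_range(rootdir) then gives only the folders whose roll is 500 or more, together with the files to interrogate in each.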

Related

Trying to get number of files total in a directory

I am currently trying to read a directory and determine how many files in total are in each of its top-level directories, including all of their subdirectories. The folders_check_scan function is supposed to reach further down into subdirectories and count the files in each by calling itself, but it only counts the files in the first level of subdirectories. Is there something I am missing?
import os
import csv
def folders_check_scan(value):
    count = 0
    for y in os.scandir(value):
        if y.is_dir():
            folders_check_scan(y)
        elif y.is_file():
            count += 1
    return count
#function that takes summary of what is in repository
def list_files(startpath):
    #get number of folders in the first level of this directory
    print(len(next(os.walk(startpath))[1]))
    #gets top level folders in directory
    directory = next(os.walk(startpath))[1]
    print(startpath)
    directory_path = []
    for b in directory:
        newline = startpath + "\\" + b
        directory_path.append(newline)
    #call function to take record of number of files total in directory
    for x in directory_path:
        count = folders_check_scan(x)
        print(x)
        print(count)
It looks like you aren't storing the result of the recursive call:
        if y.is_dir():
            count += folders_check_scan(y)
        elif y.is_file():
            count += 1
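Put together, the corrected recursion might look like the following sketch. The name count_files and the choice not to follow symlinks are assumptions made for this illustration, not part of the original code.

```python
import os


def count_files(path):
    """Count regular files under `path`, descending into subdirectories."""
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False avoids looping through symlinked directories.
            if entry.is_dir(follow_symlinks=False):
                # Add the subdirectory's total to this level's running count.
                count += count_files(entry.path)
            elif entry.is_file(follow_symlinks=False):
                count += 1
    return count
```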

Please explain the conclusion of this code

Can you tell me what exactly the variable "count" counts?
count = 0
for (r,d,f) in os.walk(os.getcwd()):
    count += 1
print("a =", count)
#a = 124
As help(os.walk) will tell you:
Directory tree generator.
For each directory in the directory tree rooted at top (including top itself, but excluding '.' and '..'), yields a 3-tuple dirpath, dirnames, filenames
So, count counts folders (also nested ones) in (and including) the current working directory.
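A small sketch that makes this concrete: since os.walk yields one tuple per directory, counting the tuples counts the directories, the starting one included. The function name count_directories is illustrative.

```python
import os


def count_directories(top):
    """Count the directories os.walk visits, including `top` itself."""
    # One (dirpath, dirnames, filenames) tuple is yielded per directory.
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(top))
```

For a directory containing folders a and c, with a folder b inside a, this gives 4: the top directory, a, a/b and c. Files do not affect the result.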

How to copy one file into one folder accordingly?

I would like to write a program that copies files (e.g. images) into another directory that contains several folders. Copying all the images into one directory is easy, but I want each image to be copied into its own folder: one image per folder.
I looped over every element in both directories and stored them in global variables. I tried copying one file into a folder but got errors. I think the main problem is that I don't know how to copy just one file to one folder while looping. I hope you can give me some advice on this matter.
import os
import shutil
path = os.listdir('C:\\Users\\User\\Desktop\\img')
#dst1 = os.path.abspath('C:\\Users\\User\\Desktop\\abc')
idst = os.listdir('C:\\Users\\User\\Desktop\\abc')
def allimgs():
    counter = 0
    for imgs in path:
        if imgs.endswith('.JPG'):
            counter += 1
            #if hits the 24th images then stop and
            #copy the first until 24 to another 24 folders one by one
            if counter > 24:
                break
            else:
                src = os.path.join('C:\\Users\\User\\Desktop\\img',imgs)
def allfolders():
    for folders in idst:
        if folders.endswith('.db'):
            continue #to skip the file ends with .db
        dst = os.path.join('C:\\Users\\User\\Desktop\\abc',folders)
shutil.copy(allimgs(),allfolders()) #here is where i stuck
First of all, make both functions return lists of strings: one with the full paths of the images to be copied, the other with the full paths of the directories they will be copied to. Afterwards, save the results of allimgs() and allfolders() into variables and loop through them.
Here is what the first function should look like:
def allimgs():
    ret = []
    counter = 0
    for imgs in path:
        if imgs.endswith('.JPG'):
            counter += 1
            #if hits the 24th images then stop and
            #copy the first until 24 to another 24 folders one by one
            if counter > 24:
                break
            else:
                src = os.path.join('C:\\Users\\User\\Desktop\\img',imgs)
                ret.append(src)
    return ret
(I left the other one for you as an exercise)
Then loop over them:
for image_path in allimgs():
    for folder_path in allfolders():
        shutil.copy(image_path, folder_path)
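Note that nested loops of this kind copy every image into every folder. If the aim is one image per folder, as the question describes, pairing the two lists with zip is one option. A sketch, where the function name copy_one_per_folder and its parameters are illustrative:

```python
import os
import shutil


def copy_one_per_folder(image_paths, folder_paths):
    """Copy the i-th image into the i-th folder; extras on either side are skipped."""
    copied = []
    # zip stops at the shorter list, so each folder receives at most one image.
    for image_path, folder_path in zip(image_paths, folder_paths):
        # shutil.copy returns the path of the newly created file.
        copied.append(shutil.copy(image_path, folder_path))
    return copied
```

With the functions from the answer this would be called as copy_one_per_folder(allimgs(), allfolders()), assuming allfolders() has been written to return a list of folder paths.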

Time module and file changes

I need to write a script that does the following
Write a python script to list all of the files and directories in the current directory and all subdirectories that have been modified in the last X minutes.
X should be taken in as a command-line argument.
Check that this argument exists, and exit with a suitable error message if it doesn’t.
X should be an int which is less than or equal to 120. If not, exit with a suitable error message.
For each of these files and directories, list the time of modification, whether it is a file or directory, and its size.
I have come up with this
#!/usr/bin/python
import os,sys,time
total = len(sys.argv)
if total < 2:
    print "You need to enter a value in minutes"
    sys.exit()
var = int(sys.argv[1])
if var < 1 or var > 120 :
    print "The value has to be between 1 and 120"
    sys.exit()
past = time.time() - var * 60
result = []
dir = os.getcwd()
for p, ds, fs in os.walk(dir):
    for fn in fs:
        filepath = os.path.join(p, fn)
        status = os.stat(filepath).st_mtime
        if os.path.getmtime(filepath) >= past:
            size = os.path.getsize(filepath)
            result.append(filepath)
            created = os.stat(fn).st_mtime
            asciiTime = time.asctime( time.gmtime( created ) )
            print "Files that have changed are %s"%(result)
            print "Size of file is %s"%(size)
So it reports back with something like this
Files that have changed are ['/home/admin/Python/osglob2.py']
Size of file is 729
Files that have changed are ['/home/admin/Python/osglob2.py', '/home/admin/Python/endswith.py']
Size of file is 285
Files that have changed are ['/home/admin/Python/osglob2.py', '/home/admin/Python/endswith.py', '/home/admin/Python/glob3.py']
Size of file is 633
How can I get this to stop repeating the files?
The reason your code builds a list of all the files it's encountered is
result.append(filepath)
and the reason it prints out that whole list every time is
print "Files that have changed are %s"%(result)
So you will need to change one of those lines: either replace the list, rather than appending to it, or (much more sensible IMO) just print out the one latest filename found, rather than the whole list.
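You aren't clearing your result list at the end of each iteration. Try something like del result[:] after your second print statement (result.clear() only exists on Python 3.3 and later, and the print statements here are Python 2). Keep it inside the if block, at the same indentation as the print statements, so the list is emptied after each file is reported.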
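A sketch of the "report each entry once" approach, written for Python 3 rather than the Python 2 of the question. The function names recently_modified and report, the tuple layout, and the decision to skip entries that cannot be stat'ed are assumptions made for this illustration.

```python
import os
import time


def recently_modified(root, minutes):
    """Return (path, kind, size, mtime) for entries changed in the last `minutes`."""
    cutoff = time.time() - minutes * 60
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # The task asks for both directories and files, so check both lists.
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                # Skip entries that vanish or cannot be read (e.g. broken links).
                continue
            if info.st_mtime >= cutoff:
                kind = "directory" if os.path.isdir(full_path) else "file"
                found.append((full_path, kind, info.st_size, info.st_mtime))
    return found


def report(root, minutes):
    """Print one line per changed entry instead of the whole accumulated list."""
    for full_path, kind, size, mtime in recently_modified(root, minutes):
        stamp = time.asctime(time.localtime(mtime))
        print("%s  %s  %d bytes  %s" % (stamp, kind, size, full_path))
```

Because each entry carries its own size and timestamp, nothing is printed twice, and the argument checks from the original script can stay as they are in front of a call such as report(os.getcwd(), var).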

Find number of sub-directories in each folder separately

I have a folder that holds a certain number of folders, each of which contains a chain of nested folders. I want to check the number of subdirectories in each of these folders. I tried using os.walk and adding 1 each time it comes across a folder, but this returns the subdirectory count for all the directories combined; I want the counts separately for each folder.
For example, let's say I have folders A1 and A2.
A1: subfolder1 -(contains)-> subfolder2
A2: subfolder1 -(contains)-> subfolder2 -(contains)-> subfolder3 -(contains)-> subfolder4
Right now my code returns 6 instead of 2 and 4.
def count_folders(path):
    count=0
    for dir in os.listdir(path):
        nDir = os.path.join(path,dir)
        if os.path.isdir(nDir):
            for dirs in os.walk(nDir):
                if os.path.isdir(dirs[0]):
                    count+=1
    print count
This works fine when I try it here:
def count_folders(path):
    count = 0
    for root, dirs, files in os.walk(path):
        count += len(dirs)
    return count
To understand how this works try to print "root", "dirs" and "files" one by one.
Maybe you can comment out these three lines, which stops the count variable from counting the subfolders of subfolders.
import os
def count_folders(path):
    count=0
    for dir in os.listdir(path):
        nDir = os.path.join(path,dir)
        if os.path.isdir(nDir):
            count+=1
            # for dirs in os.walk(nDir):
            #     if os.path.isdir(dirs[0]):
            #         count+=1
    print count
If you are looking for the count of subdirs inside each subdir in path you can try this function:
def count_folders(path):
    count={}
    for dir in os.listdir(path):
        nDir = os.path.join(path,dir)
        if os.path.isdir(nDir):
            c = 0
            for d in os.listdir(nDir):
                if os.path.isdir(os.path.join(nDir, d)):
                    c+=1
            count[nDir] = c
    print count
It builds a dictionary with the count of subdirectories directly inside each subdirectory of path and prints it.
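For the layout in the question (A1 giving 2 and A2 giving 4), the nested folders need to be counted as well, not only the immediate ones. One way, sketched here under the assumption that every level below each top-level folder should count, is to run os.walk once per top-level folder. The name count_nested_folders is illustrative.

```python
import os


def count_nested_folders(path):
    """Map each top-level folder name in `path` to its total number of nested folders."""
    counts = {}
    for name in sorted(os.listdir(path)):
        top = os.path.join(path, name)
        if os.path.isdir(top):
            # Each os.walk step lists the subdirectories of one directory,
            # so summing their lengths counts every folder below `top` once.
            counts[name] = sum(len(dirs) for _root, dirs, _files in os.walk(top))
    return counts
```

Keeping a separate total per top-level folder, rather than one shared counter, is what avoids the combined figure the question describes.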
