Understanding os.walk Python

I'm trying to walk a directory structure and create a similar (but not identical) structure.
I'm confused about the use of os.path.join. The following code works perfectly at a depth of two or more directories.
DIR_1:
    A/
        file2.txt
    B/
        file3.txt
    file1.txt
import os
inputpath = 'DIR_1'
outputpath = 'DIR_2'
for dirpath, dirnames, filenames in os.walk(inputpath):
    structure = os.path.join(outputpath, dirpath[len(inputpath):])
    for f1 in filenames:
        f = os.path.splitext(f1)[0]
        path = structure + '/' + f
        print("The path is: ", path)
        file1 = path + '/' + f1
        print("The file path is: ", file1)
        file_dir = dirpath + '/' + f1
        print("The file dir path is: ", file_dir)
        print("\n")
But with just one level of depth, it adds an additional '/'. Is there a way to avoid this?
For example the following gives:
The path is: DIR_2//file1
The file path is: DIR_2//file1/file1.txt
The file dir path is: DIR_1/file1.txt
The path is: /A/file2
The file path is: /A/file2/file2.txt
The file dir path is: DIR_1/A/file2.txt
The path is: /B/file3
The file path is: /B/file3/file3.txt
The file dir path is: DIR_1/B/file3.txt
Edit 1:
The output directory DIR_2 has a structure similar to the original DIR_1, but not identical.
DIR_2 should have one additional directory level named after each file; for example, rather than just
DIR_2/file1.txt
it should be
DIR_2/file1/file1.txt
and, similarly, DIR_2/A/file2/file2.txt.
Edit 2:
I also need to read the content found under dirpath (in DIR_1) and select the relevant text to put in the corresponding output file (in DIR_2), so I can't ignore it.

You should not worry about dirpath; use it only to locate the original files. All the information needed to recreate the directory structure is already in dirnames. The code to recreate the file structure can look like this:
import os
import shutil
# input_path and output_path are the two directories from the question;
# output_path must already exist
for root, dirs, files in os.walk(input_path):
    offset = len(input_path)
    if len(root) > len(input_path):
        offset += 1  # remove an extra leading separator
    relative_path = root[offset:]
    for d in dirs:  # create folders
        os.mkdir(os.path.join(output_path, relative_path, d))
    for f in files:  # copy the files
        shutil.copy(os.path.join(root, f),
                    os.path.join(output_path, relative_path, f))
And that's it!
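The doubled and lost separators in the question come from how os.path.join treats its arguments: if a later component starts with a separator it is considered absolute, and everything before it is discarded. That is why "/A" replaces DIR_2 entirely in the output above. A minimal sketch of the layout requested in Edit 1, assuming os.path.relpath is used instead of string slicing; the function name mirror_with_file_dirs is hypothetical, and shutil.copy only stands in for whatever text selection is actually needed:

```python
import os
import shutil


def mirror_with_file_dirs(input_path, output_path):
    """Copy every file to <output>/<relative dir>/<file stem>/<file name>."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(input_path):
        # relpath never starts with a separator, so os.path.join below
        # cannot silently discard output_path
        rel = os.path.relpath(dirpath, input_path)
        if rel == os.curdir:  # the top level comes back as "."
            rel = ""
        for name in filenames:
            stem = os.path.splitext(name)[0]
            target_dir = os.path.join(output_path, rel, stem)
            os.makedirs(target_dir, exist_ok=True)
            target = os.path.join(target_dir, name)
            # placeholder: replace with reading and filtering the source file
            shutil.copy(os.path.join(dirpath, name), target)
            copied.append(target)
    return copied
```

Calling mirror_with_file_dirs('DIR_1', 'DIR_2') on the tree from the question should give DIR_2/file1/file1.txt and DIR_2/A/file2/file2.txt. The sketch assumes DIR_2 is not located inside DIR_1.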

Related

Rename with Counter using Python

Nearly all of these answers worked well for controlling what gets renamed, and I wanted to accept every one that worked. Only one answer did not work for me, so anyone reading this has 3 out of 4 working answers to choose from.
My script below works, but it renames everything it finds in the directory, so please be careful when using it.
For very large files I use this Python script:
import os
# Function to rename multiple files
def main():
    i = 1000
    path = "C:/Users/user/Desktop/My Folder/New folder/New folder/"
    for filename in os.listdir(path):
        my_dest = "(" + str(i) + ")" + ".txt"
        my_source = path + filename
        my_dest = path + my_dest
        # rename() function will
        # rename all the files
        os.rename(my_source, my_dest)
        i += 1
# Driver Code
if __name__ == '__main__':
    # Calling main() function
    main()
OK, so I am trying to make the counter apply only to .txt files. If the folder contains .txt, .jpeg, and .mpeg files,
everything gets renamed. How can I restrict this to .txt files only?
One more problem: when I use this Python counter or a batch counter, it flips my file names.
Example
File_2019.txt - this should be renamed to (1000).txt
FileRecycled_2019.txt - this should be renamed to (1001).txt
Outcome
FileRecycled_2019.txt - this is renamed to (1000).txt
File_2019.txt - this is renamed to (1001).txt
Renaming based on the filename this way flips the order of my files;
it takes them out of alphabetical order.
I am working on a solution for the flipped names; once I find it, I will share it in case it helps others.
OK, so I have an underscore-remover batch file, and that fixed the flipping,
and the files are now renamed correctly.
For smaller files I will use this:
@echo off
Setlocal enabledelayedexpansion
Set "Pattern=_"
Set "Replace= "
For %%a in (*.txt) Do (
    Set "File=%%~a"
    Ren "%%a" "!File:%Pattern%=%Replace%!"
)
set count=1000
for %%f in (*.txt) do (
    set /a count+=1
    ren "%%f" "(!count!).txt"
)

You can accomplish this by using the pathlib module
from pathlib import Path
from os import chdir
path = Path.home() / 'desktop' / 'My Folder' / 'New folder' / 'New folder'  # The path to use
to_usenum = 1000  # Start num
alltxt = list(path.glob('*.txt'))  # Getting all the txt files
chdir(path)  # Changing the cwd to the path
for i, txtfile in enumerate(alltxt):
    to_usename = f"({to_usenum+i}).txt"  # The name to use
    txtfile.rename(to_usename)
The pathlib module comes in handy for file handling. The os module is used only to change the current working directory to the target folder: the new name passed to rename is relative, and a relative target is resolved against the current working directory, not against the folder the file is in.
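If changing the working directory is undesirable, one possible variation is to build the target with Path.with_name, which keeps the file's own parent folder. This is a sketch rather than the answer's original method; the function name rename_txt_with_counter and its start parameter are made up for illustration, and it does not handle the case where a target name already exists.

```python
from pathlib import Path


def rename_txt_with_counter(folder, start=1000):
    """Rename the .txt files in `folder` to (start).txt, (start+1).txt, ..."""
    folder = Path(folder)
    # sorted() fixes the processing order; glob() itself promises none
    txt_files = sorted(folder.glob("*.txt"))
    renamed = []
    for offset, txt_file in enumerate(txt_files):
        # with_name swaps only the final component, so the target stays
        # in the same folder and no chdir is needed
        target = txt_file.with_name(f"({start + offset}).txt")
        txt_file.rename(target)
        renamed.append(target)
    return renamed
```

Only files matching *.txt are touched, so .jpeg or .mpeg files in the same folder keep their names.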

You could check the filename has .txt before renaming.
if filename.endswith(".txt"):
    os.rename(my_source, my_dest)
    i += 1
On the filenames, you haven't specified an order for the names. You could use for filename in sorted(os.listdir(path)): to move through in alphabetical order.
My solution would be:
import os
import glob
def main():
    path = "C:/Users/user/Desktop/My Folder/New folder/New folder/"
    suffix = '.txt'
    files = glob.glob(path + '*' + suffix)
    for idx, filename in enumerate(sorted(files)):
        os.rename(
            filename,
            os.path.join(path, f"({1000 + idx})" + suffix)
        )
if __name__ == "__main__":
    main()
This uses glob to get all file paths in the folder with .txt suffix. It also uses enumerate to count each file rather than having to count the i value yourself. The file name is generated using an f-string.
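On the flipped order from the question: sorted() on plain strings compares character codes, and "_" (code 95) comes after every uppercase letter, while a space (code 32) comes before all letters. That may be why replacing underscores with spaces changed the order. The sketch below illustrates this with the two names from the question; the key function underscore_as_space is a made-up helper, not part of any library.

```python
names = ["FileRecycled_2019.txt", "File_2019.txt"]

# Plain comparison: "R" (82) sorts before "_" (95), so FileRecycled wins
print(sorted(names))
# ['FileRecycled_2019.txt', 'File_2019.txt']


def underscore_as_space(name):
    """Hypothetical sort key: treat "_" like a space and ignore case."""
    return name.replace("_", " ").lower()


print(sorted(names, key=underscore_as_space))
# ['File_2019.txt', 'FileRecycled_2019.txt']
```

Passing such a key to sorted() changes only the processing order; the files themselves do not need to be renamed first.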

I solved it an easy way:
import os
# Function to rename multiple files
def main():
    i = 1000
    path = 'C:/Users/user/Desktop/My Folder/New folder/New folder/'
    for filename in os.listdir(path):
        my_dest = f'({str(i)}).txt'
        my_dest = path + my_dest
        my_source = path + filename
        ext = my_source.split('.')[-1]
        if ext == 'txt':
            os.rename(my_source, my_dest)
            i += 1
# Driver Code
if __name__ == '__main__':
    # Calling main() function
    main()

My suggestion would be to use glob; give it a go:
import os
import glob
# Function to rename multiple files
def main():
    i = 1000
    path = "C:/Users/user/Desktop/My Folder/New folder/New folder/"
    files = glob.glob(path + '*.txt')
    for filename in files:
        my_dest = "(" + str(i) + ")" + ".txt"
        my_source = filename  # glob already returns the full path
        my_dest = path + my_dest
        # rename() function will
        # rename only the matched .txt files
        os.rename(my_source, my_dest)
        i += 1
# Driver Code
if __name__ == '__main__':
    # Calling main() function
    main()
This will match any file name ending with the .txt extension.

How to create a Python list with the number of files in each subdirectory of a directory

I have a main directory (root) which contains 6 subdirectories.
I would like to count the number of files present in each subdirectory and add all the counts to a simple Python list.
The desired result looks like: mylist = [497643, 5976, 3698, 12, 456, 745]
I'm stuck on this code:
import os, sys
list = []
# Open a file
path = "c://root"
dirs = os.listdir( path )
# This would print all the files and directories
for file in dirs:
    print (file)
#fill a list with each sub directory number of elements
for sub_dir in dirs:
    list = dirs.append(len(sub_dir))
My attempt at filling the list doesn't work, and this is the best I could come up with...
Finding a way to iterate over the subdirectories of a main directory and fill a list by applying a function to each one would skyrocket the speed of my data science project!
Thanks for your help,
Abel

You can use os.path.isfile and os.path.isdir. Both need the full path, so join each name with its parent directory first:
res = [sum(os.path.isfile(os.path.join(path, name, f)) for f in os.listdir(os.path.join(path, name))) for name in os.listdir(path) if os.path.isdir(os.path.join(path, name))]
print(res)
Using a for loop:
res = []
for name in os.listdir(path):
    dir_path = os.path.join(path, name)
    if os.path.isdir(dir_path):
        # True counts as 1, so the sum is the number of regular files
        res.append(sum(os.path.isfile(os.path.join(dir_path, f)) for f in os.listdir(dir_path)))

You need to call os.listdir on each subdirectory. The current code simply takes the length of each name.
import os
counts = []
path = "c://root"
dirs = os.listdir(path)
# This prints all the files and directories
for file in dirs:
    print(file)
# Fill the list with the number of entries in each subdirectory
# (assumes every entry of path is a directory, as in the question)
for sub_dir in dirs:
    temp = os.listdir(os.path.join(path, sub_dir))
    counts.append(len(temp))
print(counts)
Calling os.listdir on the joined path lists the contents of each subdirectory.

You were almost there:
import os
counts = []
path = "c://root"
dirs = os.listdir(path)
# This prints all the files and directories
for file in dirs:
    print(file)
for sub_dir in dirs:
    full_path = os.path.join(path, sub_dir)  # isdir needs the full path
    if os.path.isdir(full_path):
        counts.append(len(os.listdir(full_path)))
print(counts)

As an alternative, you can also use the glob module for this and other related tasks.
I created a test directory containing 3 subdirectories, l, m and k, each containing 3 test files.
import os, glob
counts = []
path = "test"  # you can set this to "." if you want the current directory
for root, dirs, files in os.walk(path, topdown=True):
    for name in dirs:
        counts.append(len(glob.glob(root + '/' + name + '/*')))
print(counts)
Output :
[3, 3, 3]
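For very large directories such as the 497643-file one in the question, os.scandir may be worth trying, since its entries can often report whether they are files or directories without an extra system call per name. The following is a sketch under that assumption; count_files_per_subdir is a hypothetical helper name, and it counts only regular files directly inside each immediate subdirectory.

```python
import os


def count_files_per_subdir(root):
    """Return one file count per immediate subdirectory of `root`."""
    with os.scandir(root) as entries:
        # sort so that the order of the counts is predictable
        subdirs = sorted(entry.path for entry in entries if entry.is_dir())
    counts = []
    for subdir in subdirs:
        with os.scandir(subdir) as entries:
            counts.append(sum(1 for entry in entries if entry.is_file()))
    return counts
```

The returned list is ordered by subdirectory path, so it can be zipped with the sorted subdirectory names if the mapping is needed later.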

How to use the renaming function

This is the error I get:
The system cannot find the file specified: '1.jpg' -> '0.jpg'
even though I have a file named 1.jpg in the directory.
I'm making a file-renaming script that renames all files in the given directory to a number that increases by 1 with every file.
import os
def moving_script():
    directory = input("Give the directory")
    xlist = os.listdir(directory)
    counter = 0
    for files in xlist:
        os.rename(files, str(counter)+".jpg")
        counter = counter + 1
moving_script()
It should rename all files to "0.jpg", "1.jpg", etc.

Code:
import os
def moving_script():
    directory = input("Give the directory")
    xlist = os.listdir(directory)
    counter = 0
    for files in xlist:
        os.rename(os.path.join(directory, files),
                  os.path.join(directory, str(counter)+".jpg"))
        counter = counter + 1
if __name__ == '__main__':
    moving_script()
Results:
~/Documents$ touch file0 file1 file2 file3 file4
ls ~/Documents/
file0 file1 file2 file3 file4
$ python renamer.py
Give the directory'/home/suser/Documents'
$ ls ~/Documents/
0.jpg 1.jpg 2.jpg 3.jpg 4.jpg

os.listdir() returns filenames without the path. Thus when you pass them to os.rename(), it looks for them in the current working directory, not in the directory where they actually are (i.e. the one supplied by the user).
import os
def moving_script():
    directory = input("Give the directory")
    counter = -1
    for file_name in os.listdir(directory):
        old_name = os.path.join(directory, file_name)
        ext = os.path.splitext(file_name)[-1]  # get the file extension
        while True:
            counter += 1
            new_name = os.path.join(directory, '{}{}'.format(counter, ext))
            if not os.path.exists(new_name):
                os.rename(old_name, new_name)
                break
moving_script()
Note that this code detects the file extension. Your code may rename a non-JPG file with a .jpg extension. To avoid this, you can change os.listdir(directory) to glob.glob(os.path.join(directory, '*.jpg')) so that it iterates only over '*.jpg' files. Don't forget to import glob, and note that glob already returns each name joined with directory, so old_name no longer needs os.path.join. On Linux the match is case-sensitive, so '*.jpg' will not return '*.JPG' files.
EDIT: code updated to check whether the new file name already exists.
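The glob variant mentioned in the note could look roughly like the sketch below. It assumes that both .jpg and .JPG files should be renamed, so the extension is compared in lower case instead of relying on the pattern; rename_jpgs is a hypothetical name, and glob.escape is used in case the directory name itself contains wildcard characters.

```python
import glob
import os


def rename_jpgs(directory):
    """Rename *.jpg / *.JPG files in `directory` to 0.jpg, 1.jpg, ..."""
    # glob returns paths that already include `directory`
    pattern = os.path.join(glob.escape(directory), "*")
    sources = sorted(
        p for p in glob.glob(pattern)
        if os.path.isfile(p) and p.lower().endswith(".jpg")
    )
    counter = 0
    renamed = []
    for old_path in sources:
        new_path = os.path.join(directory, f"{counter}.jpg")
        # skip numbers that are already taken so nothing is overwritten
        while os.path.exists(new_path):
            counter += 1
            new_path = os.path.join(directory, f"{counter}.jpg")
        os.rename(old_path, new_path)
        renamed.append(new_path)
        counter += 1
    return renamed
```

Files with other extensions are left alone, and a file that is already called 0.jpg is simply moved to the next free number.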

Code always writes filename in the output

I am trying to find which files do not have a corresponding file with an (almost) identical filename, so that I can generate the missing ones. But this code writes basically all file names, whereas I want it to go through the files in the first directory and check whether each has its equivalent _details.txt in the other folder; if not, it should write the name.
In folder 1 I have the two files 11.avi and 22.avi, and in folder 2 only 11_details.txt, so I am sure I should get exactly one filename as a result.
import os,fnmatch
a = open("missing_detailss5.txt", "w")
for root, dirs, files in os.walk("1/"):
    for file1 in files:
        if file1.endswith(".dat"):
            for root, dirs, files in os.walk("2/"):
                print(str(os.path.splitext(file1)[0]) + "_details.txt")
                print(files)
                if not (os.path.splitext(file1)[0] + "_details.txt") in files:
                    print(str(os.path.splitext(file1)[0]) + "_details.txt is missing")
                    a.write(str(os.path.splitext(file1)[0]) + "_details.txt" + os.linesep)
a.close()
Here is my debug output:
11_details.txt
['22_details.txt']
11_details.txt is missing
22_details.txt
['22_details.txt']
22_details.txt is missing

I just corrected your code directly without writing new code; you just missed an extension in the if comparison.
import os
a = open("missing_detailss4.txt", "w")
for root, dirs, files in os.walk("1/"):
    for file in files:
        if file.endswith(".avi"):
            for root, dirs, files in os.walk("2/"):
                if not (str(os.path.splitext(file)[0]) + "_details.txt") in files:
                    a.write(str(os.path.splitext(file)[0]) + "_details.txt" + os.linesep)
a.close()

If I read your question correctly, the files ending in "_details.txt" are supposed to be in the same (relative) directory. That is, "1/some/path/file.avi" should have a corresponding file "2/some/path/file_details.txt". If that's the case, you need not iterate twice:
import os
with open("missing_detailss5.txt", "w") as outfile:
    path1 = '1/'
    path2 = '2/'
    allowed_extensions = ['.dat', '.avi']
    for root, dirs, files in os.walk(path1):
        for file in files:
            file1, ext = os.path.splitext(file)
            if ext not in allowed_extensions: continue
            # build the expected path under path2 without overwriting path2
            target = os.path.join(path2, os.path.relpath(os.path.join(root, file1 + '_details.txt'), path1))
            if not os.path.exists(target):
                print(os.path.basename(target) + ' is missing.')
                outfile.write(os.path.basename(target) + os.linesep)
If you don't care about which extensions to check for in the first folder, then delete allowed_extensions = ['.dat', '.avi'] and if ext not in allowed_extensions: continue lines, and change file1, ext = os.path.splitext(file) to file1 = os.path.splitext(file)[0].
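When both folders are flat, as in the question's example with 11.avi, 22.avi and 11_details.txt, the check can also be expressed as a set difference. This is only a sketch for that flat case; find_missing_details and its video_ext parameter are hypothetical names chosen for illustration.

```python
import os


def find_missing_details(video_dir, details_dir, video_ext=".avi"):
    """Return the expected *_details.txt names absent from details_dir."""
    present = set(os.listdir(details_dir))
    expected = {
        os.path.splitext(name)[0] + "_details.txt"
        for name in os.listdir(video_dir)
        if name.endswith(video_ext)
    }
    # names that should exist but do not, in a stable order
    return sorted(expected - present)
```

With the folders described in the question, find_missing_details("1/", "2/") would be expected to return only "22_details.txt"; the result can then be written to the output file line by line.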

python, move all files from 3rd,4th , 5th to 2nd level of directory tree

I know we have os.walk, but I can't figure out how to build this.
Let's say I have the following folder structure on an Ubuntu Linux box:
Maindir (as root called by script)
+- subdir-one
|  +- subdir-two
|  |  +- file
|  |  +- another file
|  +- subdir-three
|  |  +- file3
|  |  +- file4
|  +- subdir-four
|     +- file5
|     +- file6
+- subdir-two
+- subdir-three
|  +- sub-subdir-two
|  |  +- file
|  |  +- another file
|  +- subdir-three
|  |  +- file3
|  |  +- file4
|  +- subdir-four
|     +- file5
|     +- file6
+- subdir-four
   +- subdir-two
   |  +- file
   |  +- another file
   +- subdir-three
   |  +- file3
   |  +- file4
   +- subdir-four
      +- file5
      +- file6
I want to move all files from the nested subdirectories up into the subdirectories on level 2, not to the root level.
Take subdir-one as an example: move all files in subdir-four to subdir-one (in this case file5 and file6), and move all files from subdir-three to subdir-one (in this case file3 and file4).
Subdir-two has no subdirectories of its own, so the script can skip it.
Subdir-three: move all files from sub-subdir-two, subdir-three and subdir-four to subdir-three.
I think you get the point. It is no problem if files are overwritten; if they have the same name they are duplicates anyway, which is one reason for running this cleanup script.
Once all files have been moved out of the nested subdirectories, those subdirectories will be empty, so I also want to remove them.
Update on 14-1-2012: This is the modified version of the code given by jcollado, but it is still not working. By the way, I forgot to mention that I also need to filter out some directory names. These directory names must be excluded from processing when found within the directory tree.
The code, slightly changed:
import os, sys
def main():
    try:
        main_dir = sys.argv[1]
        print main_dir
        # Get a list of all subdirectories of main_dir
        subdirs = filter(os.path.isdir,
                         [os.path.join(main_dir, path)
                          for path in os.listdir(main_dir)])
        print subdirs
        # For all subdirectories,
        # collect all files and all subdirectories recursively
        for subdir in subdirs:
            files_to_move = []
            subdirs_to_remove = []
            for dirpath, dirnames, filenames in os.walk(subdir):
                files_to_move.extend([os.path.join(dirpath, filename)
                                      for filename in filenames])
                subdirs_to_remove.extend([os.path.join(dirpath, dirname)
                                          for dirname in dirnames])
            # To move files, just rename them replacing the original directory
            # with the target directory (subdir in this case)
            print files_to_move
            print subdirs_to_remove
            for filename in files_to_move:
                source = filename
                destination = os.path.join(subdir, os.path.basename(filename))
                print 'Destination ='+destination
                if source != destination:
                    os.rename(source, destination)
                else:
                    print 'Rename cancelled, source and destination were the same'
            # Reverse subdirectories order to remove them
            # starting from the lower level in the tree hierarchy
            subdirs_to_remove.reverse()
            # Remove subdirectories
            for dirname in subdirs_to_remove:
                #os.rmdir(dirname)
                print dirname
    except ValueError:
        print 'Please supply the path name on the command line'
if __name__ == '__main__':
    main()

I'd do something as follows:
import os
main_dir = 'main'
# Get a list of all subdirectories of main_dir
subdirs = filter(os.path.isdir,
                 [os.path.join(main_dir, path)
                  for path in os.listdir(main_dir)])
# For all subdirectories,
# collect all files and all subdirectories recursively
for subdir in subdirs:
    files_to_move = []
    subdirs_to_remove = []
    for dirpath, dirnames, filenames in os.walk(subdir):
        files_to_move.extend([os.path.join(dirpath, filename)
                              for filename in filenames])
        subdirs_to_remove.extend([os.path.join(dirpath, dirname)
                                  for dirname in dirnames])
    # To move files, just rename them replacing the original directory
    # with the target directory (subdir in this case)
    for filename in files_to_move:
        source = filename
        destination = os.path.join(subdir, os.path.basename(filename))
        os.rename(source, destination)
    # Reverse subdirectories order to remove them
    # starting from the lower level in the tree hierarchy
    subdirs_to_remove.reverse()
    # Remove subdirectories
    for dirname in subdirs_to_remove:
        os.rmdir(dirname)
Note: You can turn this into a function just using main_dir as a parameter.
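A function version that also covers the later requirement of excluding certain directory names might look like the sketch below. It is written for Python 3 and is not the code from the answer: flatten_subdirs and its excluded parameter are hypothetical, os.replace is chosen because the question says overwriting duplicates is acceptable, and excluded directories are left completely untouched.

```python
import os


def flatten_subdirs(main_dir, excluded=()):
    """Move nested files up into each second-level directory of main_dir."""
    for entry in sorted(os.listdir(main_dir)):
        subdir = os.path.join(main_dir, entry)
        if entry in excluded or not os.path.isdir(subdir):
            continue
        visited = []
        for dirpath, dirnames, filenames in os.walk(subdir):
            # editing dirnames in place stops os.walk from descending
            # into directories whose name is excluded
            dirnames[:] = [d for d in dirnames if d not in excluded]
            if dirpath == subdir:
                continue  # files here are already where they belong
            visited.append(dirpath)
            for name in filenames:
                # os.replace overwrites an existing file of the same name
                os.replace(os.path.join(dirpath, name),
                           os.path.join(subdir, name))
        # children were visited after their parents, so reversing the
        # list removes them first; non-empty directories are kept
        for dirpath in reversed(visited):
            if not os.listdir(dirpath):
                os.rmdir(dirpath)
```

A call such as flatten_subdirs("Maindir", excluded=("keep-me",)) would leave any directory named keep-me, and whatever contains it, in place; the name keep-me is only an example.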
