Traversing File Directory - python

This is the first question I am posting on Stack Overflow, so excuse me if I did something out of the norm.
I am trying to create a Python program that traverses a user-selected directory and lists all the files in the folders it contains. For example, the Documents folder has several folders with files inside them, and I am trying to save all files under the Documents folder to an array.
The function below is what I am using to traverse a directory (hoping it is a simple problem):
def saveFilesToArray(dir):
    allFiles = []
    os.chdir(dir)
    for file in glob.glob("*"):
        print(file)
        if (os.path.isfile(file)):
            allFiles.append(file)
        elif(os.path.isdir(file)):
            print(dir + "/" + file + " is a directory")
            allFiles.append(saveFilesToArray(dir + "/" + file))
    return allFiles

This will give you just the files:
import os
def list_files(root):
    all_files = []
    for root, dirs, files in os.walk(root, followlinks=True):
        for file in files:
            full_path = os.path.join(root, file)
            all_files.append(full_path)
    return all_files

I hope this is helpful:
import os
def saveFilesToList(theDir):
    allFiles = []
    for root, dirs, files in os.walk(theDir):
        for name in files:
            npath = os.path.join(root,name)
            if os.path.isfile(npath):
                allFiles.append(npath)
    return allFiles
This traverses all directories and stores the paths of the files (not the directories) in the list. It seems much easier to use this than glob.
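For comparison, the same listing can be sketched with pathlib instead of os.walk. This is only a minimal sketch; the function name list_files_pathlib is made up for illustration and is not part of either answer above.

```python
from pathlib import Path


def list_files_pathlib(the_dir):
    """Return the paths of all regular files below the_dir, as strings."""
    # rglob("*") yields every entry below the_dir, directories included,
    # so is_file() is needed to drop the directories.
    return sorted(str(p) for p in Path(the_dir).rglob("*") if p.is_file())
```

Calling list_files_pathlib("Documents") would return one flat, sorted list of paths, with no nested lists for the subfolders.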

Related

Moving only one file of each sub directories to new sub directories

I have a question about moving one file from each subdirectory to a new subdirectory. For example, suppose I have the directory shown in the image.
From that, I want to pick only the first file in each subdirectory and move it to a new subdirectory with the same name, as you can see in the image. This is my expected result.
I have tried using os.walk to select the first file of each subdirectory, but I still don't know how to move it to another subdirectory with the same name:
path = './test/'
new_path = './x/'
n = 1
fext = ".png"
for dirpath, dirnames, filenames in os.walk(path):
    for filename in [f for f in filenames if f.endswith(fext)][:n]:
        print(filename) #this only print the file name in each sub dir
The expected result can be seen in the image above
You are almost there :)
All you need is both full paths of the file: the old path (the existing file) and the new path (where you want to move it).
As mentioned in this post, you can move files in different ways in Python. You can use "os.rename" or "shutil.move".
Here is a full, tested code sample:
import os, shutil
path = './test/'
new_path = './x/'
n = 1
fext = ".png"
for dirpath, dirnames, filenames in os.walk(path):
    for filename in [f for f in filenames if f.endswith(fext)][:n]:
        print(filename) #this only print the file name in each sub dir
        filenameFull = os.path.join(dirpath, filename)
        new_filenameFull = os.path.join(new_path, filename)
        # if new directory doesn't exist - you create it recursively
        if not os.path.exists(new_path):
            os.makedirs(new_path)
        # Use "os.rename"
        #os.rename(filenameFull, new_filenameFull)
        # or use "shutil.move"
        shutil.move(filenameFull, new_filenameFull)
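Note that the sample above puts every moved file directly into new_path. If the files should land in subdirectories with the same names as their source subdirectories, as the question asks, something like the following might work. It is a rough sketch only: the function name move_first_files is hypothetical, it assumes the destination is not inside the source tree, and it sorts the file names so that "first" is well defined.

```python
import os
import shutil


def move_first_files(src_root, dst_root, ext=".png", n=1):
    """Move the first n files ending in ext from each directory under
    src_root into the directory with the same relative path under dst_root."""
    moved = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        # os.walk returns names in arbitrary order, so sort them first.
        matches = sorted(f for f in filenames if f.endswith(ext))[:n]
        if not matches:
            continue
        # Path of this directory relative to src_root, e.g. "a" for "./test/a".
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for filename in matches:
            target = os.path.join(target_dir, filename)
            shutil.move(os.path.join(dirpath, filename), target)
            moved.append(target)
    return moved
```

With the paths from the question, this would be called as move_first_files('./test/', './x/').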

Copying files in python using shutil

I have the following directory structure:
-mailDir
    -folderA
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -89.txt
            -subInbox
            -subInbox2
    -folderB
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -200.txt
            -577.txt
The aim is to copy all the txt files under inbox folder into another folder.
For this I tried the below code
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if(dirName=="inbox"):
            eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        shutil.copy(path.join(ii,i),destDir)
If the inbox directory only has .txt files, then the above code works fine. Since the inbox folder under the folderA directory contains other subdirectories along with .txt files, the code returns a permission denied error. As I understand it, shutil.copy cannot copy folders.
The aim is to copy only the txt files in every inbox folder to some other location. If the same file name occurs in different inbox folders, I have to keep both files. How can we improve the code in this case? Please note that everything other than the .txt files is a folder.
One simple solution is to skip any i that does not have the .txt extension, using the string endswith() method.
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if(dirName=="inbox"):
            eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        if i.endswith('.txt'):
            shutil.copy(path.join(ii,i),destDir)
This should ignore any folders and non-txt files that are found with os.listdir(ii). I believe that is what you are looking for.
I just remembered that I once wrote several files to solve this exact problem. You can find the source code here on my GitHub.
In short, there are two functions of interest here:
list_files(loc, return_dirs=False, return_files=True, recursive=False, valid_exts=None)
copy_files(loc, dest, rename=False)
For your case, you could copy and paste these functions into your project and modify copy_files like this:
from os.path import join
from re import sub
from shutil import copy

def copy_files(loc, dest, rename=False):
    # get files with full path (list_files is the other function listed above)
    files = list_files(loc, return_dirs=False, return_files=True, recursive=True, valid_exts=('.txt',))
    # copy files in list to dest
    for i, this_file in enumerate(files):
        # change name if renaming
        if rename:
            # strip a leading "./" (the dot is escaped so it only matches a literal dot)
            out_file = sub(r'^\./', '', this_file)
            # replace slashes with hyphens to preserve unique name
            out_file = sub(r'\\|/', '-', out_file)
            out_file = join(dest, out_file)
            copy(this_file, out_file)
            files[i] = out_file
        else:
            copy(this_file, dest)
    return files
Then just call it like so:
copy_files('mailDir', 'destFolder', rename=True)
The renaming scheme might not be exactly what you want, but at least it will not overwrite your files. I believe this should solve all your problems.
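If the path-based names are not what you want, another option is to keep the original file name and only add a number when that name is already taken in the destination. The helper below is a hypothetical sketch (copy_without_overwrite is not part of the functions above), and it assumes nothing else writes to the destination folder while it runs.

```python
import os
import shutil


def copy_without_overwrite(src_file, dest_dir):
    """Copy src_file into dest_dir, adding a numeric suffix if the name is taken."""
    os.makedirs(dest_dir, exist_ok=True)
    stem, ext = os.path.splitext(os.path.basename(src_file))
    candidate = os.path.join(dest_dir, stem + ext)
    counter = 1
    # Try name.txt, then name_1.txt, name_2.txt, ... until one is free.
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, "{}_{}{}".format(stem, counter, ext))
        counter += 1
    shutil.copy(src_file, candidate)
    return candidate
```

Each call returns the path that was actually written, so the caller can keep track of which inbox file ended up under which name.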
Here you go:
import os
from os import path
import shutil
destDir = '<absolute-path>'
for root, dirs, files in os.walk(os.getcwd()):
    # Only handle directories named 'inbox'. Pruning dirs down to 'inbox'
    # here would stop the walk before it ever reaches a nested inbox.
    if path.basename(root) != 'inbox':
        continue
    # Filter out only '.txt' files.
    files = [f for f in files if f.endswith('.txt')]
    for f in files:
        p = path.join(root, f)
        # print(p)
        shutil.copy(p, destDir)
Quick and simple.
Sorry, I forgot the part where you also need unique file names. The above solution only works when the file names are distinct, as they are within a single inbox folder.
To copy files from multiple inboxes and give each a unique name in the destination folder, you can try this:
import os
from os import path
import shutil
sourceDir = os.getcwd()
fixedLength = len(sourceDir)
destDir = '<absolute-path>'
filteredFiles = []
for root, dirs, files in os.walk(sourceDir):
    # Filter out only '.txt' files in all the inbox directories.
    if path.basename(root) == 'inbox':
        # here I am joining the file name to the full path while filtering txt files
        files = [path.join(root, f) for f in files if f.endswith('.txt')]
        # add the filtered files to the main list
        filteredFiles.extend(files)
# making a tuple of file path and file name
# (os.sep is '/' on POSIX and '\\' on Windows)
filteredFiles = [(f, f[fixedLength+1:].replace(os.sep, '-')) for f in filteredFiles]
for (f, n) in filteredFiles:
    print('copying file...', f)
    # copying from the path to the dest directory with specific name
    shutil.copy(f, path.join(destDir, n))
print('copied', str(len(filteredFiles)), 'files to', destDir)
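If you need to copy all files instead of just txt files, drop the f.endswith('.txt') condition when filtering. The files list from os.walk already contains only files, and calling os.path.isfile(f) on a bare file name would check it against the current working directory rather than root.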

How to replace the txt file in a directory

There is a directory A, which contains several subdirectories of txt files. There is another directory B, which contains txt files. Several txt files in A have the same name as files in B but different content. Now I want to move the txt files in B to A and overwrite the files with the same name. My code is below:
import shutil
import os
src = '/PATH/TO/B'
dst = '/PATH/TO/A'
file_list = []
for filename in os.walk(dst):
    file_list.append(filename)
for root, dirs, files in os.walk(src):
    for file in files:
        if file in file_list:
            ##os.remove(dst/file[:-4] + '.txt')
            shutil.move(os.path.join(src,file),os.path.join(dst,file))
But when I run this, nothing happens. Can anyone help me with it?
The following will do what you want. You need to be careful to preserve the subdirectory structure so as to avoid FileNotFound exceptions. Test it out in a test directory before clobbering the actual directories you want modified so you know that it does what you want.
import shutil
import os
src = 'B'
dst = 'A'
file_list = []
dst_paths = {}
for root, dirs, files in os.walk(dst):
    for file in files:
        full_path = os.path.join(root, file)
        file_list.append(file)
        dst_paths[file] = full_path
print(file_list)
print(dst_paths)
for root, dirs, files in os.walk(src):
    for file in files:
        if file in file_list:
            b_path = os.path.join(root, file)
            shutil.move(b_path,dst_paths[file])
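Since the advice is to test before clobbering anything, one way to do that is a dry run that only reports what would be moved. The following is a sketch under that assumption; plan_replacements is a made-up name, and it builds the same name-to-path mapping as the code above without touching any file.

```python
import os


def plan_replacements(src, dst):
    """Return (source_path, destination_path) pairs without moving anything."""
    # Map each file name under dst to its full path. If a name occurs more
    # than once under dst, the last one visited wins, as in the code above.
    dst_paths = {}
    for root, dirs, files in os.walk(dst):
        for name in files:
            dst_paths[name] = os.path.join(root, name)

    plan = []
    for root, dirs, files in os.walk(src):
        for name in files:
            if name in dst_paths:
                plan.append((os.path.join(root, name), dst_paths[name]))
    return plan
```

Printing the pairs from plan_replacements('B', 'A') and checking them by eye is cheap, and the same list could then be fed to shutil.move once it looks right.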

Python Move Files Based On Name

To give credit, the code I am currently working with is from this response by cji, here.
I am trying to recursively pull all files from the source folder and move them into folders named after the first five characters of each file name (filename[0:5]).
My Code Below:
import os
import shutil
srcpath = "SOURCE"
srcfiles = os.listdir(srcpath)
destpath = "DESTINATION"
# extract the first five characters from filenames and filter out duplicates
destdirs = list(set([filename[0:5] for filename in srcfiles]))
def create(dirname, destpath):
    full_path = os.path.join(destpath, dirname)
    os.mkdir(full_path)
    return full_path
def move(filename, dirpath):
    shutil.move(os.path.join(srcpath, filename)
                ,dirpath)
# create destination directories and store their names along with full paths
targets = [(folder, create(folder, destpath)) for folder in destdirs]
for dirname, full_path in targets:
    for filename in srcfiles:
        if dirname == filename[0:5]:
            move(filename, full_path)
Now, changing srcfiles = os.listdir(srcpath) and destdirs = list(set([filename[0:5] for filename in srcfiles])) with the code below gets me the paths in one variable and the first five characters of the file names in another.
srcfiles = []
destdirs = []
for root, subFolders, files in os.walk(srcpath):
    for file in files:
        srcfiles.append(os.path.join(root,file))
    for name in files:
        destdirs.append(list(set([name[0:5] for file in srcfiles])))
How would I go about modifying the original code to use this... Or if someone has a better idea on how I would go about doing this. Thanks.
I can't really test it very easily, but I think this code should work:
import os
import shutil
srcpath = "SOURCE"
destpath = "DESTINATION"
for root, subFolders, files in os.walk(srcpath):
    for file in files:
        subFolder = os.path.join(destpath, file[:5])
        if not os.path.isdir(subFolder):
            os.makedirs(subFolder)
        shutil.move(os.path.join(root, file), subFolder)
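Because the code above was posted untested, it may help to wrap the same idea in a function that can be tried against a throwaway directory first. This is just a sketch: move_by_prefix and prefix_len are names chosen here for illustration, and it assumes Python 3 (for exist_ok).

```python
import os
import shutil


def move_by_prefix(srcpath, destpath, prefix_len=5):
    """Move every file under srcpath into destpath/<first prefix_len chars of its name>."""
    # Collect the files first so the tree is not modified while it is walked.
    found = []
    for root, sub_folders, files in os.walk(srcpath):
        for name in files:
            found.append((root, name))

    moved = []
    for root, name in found:
        sub_folder = os.path.join(destpath, name[:prefix_len])
        os.makedirs(sub_folder, exist_ok=True)
        target = os.path.join(sub_folder, name)
        shutil.move(os.path.join(root, name), target)
        moved.append(target)
    return moved
```

Note that two source files with the same name in different subfolders would collide in the destination, which neither this sketch nor the code above handles.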

Create a tree-style directory listing in Python

I am trying to list directories and files (recursively) in a directory with Python:
./rootdir
    ./file1.html
    ./subdir1
        ./file2.html
        ./file3.html
    ./subdir2
    ./file4.html
Now I can list the directories and files just fine (borrowed it from here). But I would like to list them in the following format and ORDER (which is very important for what I am doing).
/rootdir/
/rootdir/file1.html
/rootdir/subdir1/
/rootdir/subdir1/file2.html
/rootdir/subdir1/file3.html
/rootdir/subdir2/
/rootdir/file4.html
I don't care how it gets done. If I walk the directory and then organize it or get everything in order. Either way, thanks in advance!
EDIT: Added code below.
# list books
import os
import sys
lstFiles = []
rootdir = "/srv/http/example/www/static/dev/library/books"
# Append the directories and files to a list
for path, dirs, files in os.walk(rootdir):
    #lstFiles.append(path + "/")
    lstFiles.append(path)
    for file in files:
        lstFiles.append(os.path.join(path, file))
# Open the file for writing
f = open("sidebar.html", "w")
f.write("<ul>")
for item in lstFiles:
    splitfile = os.path.split(item)
    webpyPath = splitfile[0].replace("/srv/http/example/www", "")
    itemName = splitfile[1]
    if item.endswith("/"):
        f.write('<li>' + itemName + '</li>\n')
    else:
        f.write('<li>' + itemName + '</li>\n')
f.write("</ul>")
f.close()
Try the following:
import os
for path, dirs, files in os.walk("."):
    print(path)
    for file in files:
        print(os.path.join(path, file))
You do not need to print entries from dirs because each directory will be visited as you walk the path, so you will print it later with print(path).
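Since the order matters so much here, keep in mind that os.walk does not promise any particular order for the names it returns. A possible way to pin the order down is to sort dirs in place and sort files before using them. The sketch below assumes that each directory's own files should come before its subdirectories and that directories get a trailing separator; tree_listing is a hypothetical name.

```python
import os


def tree_listing(rootdir):
    """Return directory and file paths under rootdir in a stable, sorted order."""
    lines = []
    for path, dirs, files in os.walk(rootdir):
        # Sorting dirs in place controls the order in which os.walk descends.
        dirs.sort()
        # Joining with "" appends a trailing separator, marking a directory.
        lines.append(os.path.join(path, ""))
        for name in sorted(files):
            lines.append(os.path.join(path, name))
    return lines
```

The returned list can then be printed or written out line by line, and the trailing separator makes a check like item.endswith("/") work on POSIX systems.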
