Directory listing in Python

I am having trouble with directory listing. Suppose I have a directory with some subdirectories (named a-z, 0-9, %, -). Each subdirectory contains some related XML files.
I have to read each line of these files. I have tried the following code:
def listFilesMain(dirpath):
    for dirname, dirnames, filenames in os.walk(dirpath):
        for subdirname in dirnames:
            os.path.join(dirname, subdirname)
        for filename in filenames:
            fPath = os.path.join(dirname, filename)
            fileListMain.append(fPath)
It works only if I run my program from a subdirectory; I get no results if I run it from the main directory. What is going wrong here?
Any help will be greatly appreciated. Thanks!

How about this:
def list_files(dirpath):
    files = []
    for dirname, dirnames, filenames in os.walk(dirpath):
        files += [os.path.join(dirname, filename) for filename in filenames]
    return files
You could also do this as a generator, so the list isn't stored in its entirety:
def list_files(dirpath):
    for dirname, dirnames, filenames in os.walk(dirpath):
        for filename in filenames:
            yield os.path.join(dirname, filename)
Finally, you might want to enforce absolute paths:
def list_files(dirpath):
    dirpath = os.path.abspath(dirpath)
    for dirname, dirnames, filenames in os.walk(dirpath):
        for filename in filenames:
            yield os.path.join(dirname, filename)
All of these can be called with a line like:
for filePath in list_files(dirpath):
    # Check that the file is an XML file.
    # Then handle the file.
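To make the two placeholder comments above concrete, here is a minimal sketch of the "check, then handle" step. The function name iter_xml_lines, the case-insensitive ".xml" check and the UTF-8 encoding are my assumptions, not something stated in the question:

```python
import os


def iter_xml_lines(dirpath):
    """Yield (path, line) pairs for every .xml file below dirpath."""
    for dirname, dirnames, filenames in os.walk(dirpath):
        for filename in filenames:
            # Assumption: XML files are recognised by their extension only.
            if not filename.lower().endswith(".xml"):
                continue
            path = os.path.join(dirname, filename)
            # Assumption: the files are UTF-8 encoded text.
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    yield path, line.rstrip("\n")
```

Because it is a generator, only one line is held in memory at a time, which should matter if the XML files are large.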

If your subdirectories are symbolic links, make sure you pass followlinks=True as an argument to os.walk(..). From the documentation:
By default, os.walk does not follow symbolic links to subdirectories on
systems that support them. In order to get this functionality, set the
optional argument 'followlinks' to true.
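A small sketch of what that looks like in practice. The function name and its followlinks parameter are only for illustration; the keyword passed to os.walk is the real one:

```python
import os


def list_files(dirpath, followlinks=False):
    """Return the paths of all files below dirpath.

    With followlinks=True, os.walk also descends into subdirectories
    that are symbolic links.
    """
    found = []
    for dirname, dirnames, filenames in os.walk(dirpath, followlinks=followlinks):
        for filename in filenames:
            found.append(os.path.join(dirname, filename))
    return found
```

One thing to keep in mind: os.walk does not remember which directories it has already visited, so a link that points back to one of its own parent directories can make the walk recurse forever when followlinks=True.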

Related

How to remove all whitespace from multiple filenames?

I'm new to Python. I am converting evtx log files to xml, however, some of the evtx files have whitespace in their names and I get an error when the file conversion starts. One of the solutions is to manually remove all the whitespace from the evtx file names, but this is impossible when you deal with a large number of files.
I need to remove all the whitespace from file names in multiple directories. I am trying to rename the files by removing the whitespace with .replace(" ",""), however, I keep getting an error:
FileNotFoundError: [Errno 2] No such file or directory:
Code:
dir_path = '/home/user/evtx_logs'
for dirpath, dirnames, filenames in os.walk(dir_path):
    for f in filenames:
        new_filename = f.replace(" ","")
        os.rename(f,new_filename)
Is there any other alternative to rename or perhaps to ignore the white space in a file name?
Try printing out the values of your directory walk. You will notice that the filenames are not paths, they are just the names of the files. When you try to rename the file, you need to point your function to the entire path. Something like this:
for dirpath, dirnames, filenames in os.walk(dir_path):
    for f in filenames:
        filepath = os.path.join(dirpath, f)
        new_filename = f.replace(" ","")
        new_filepath = os.path.join(dirpath, new_filename)
        os.rename(filepath, new_filepath)
Solution:
dir_path = '/home/user/evtx_logs'
for dirpath, dirnames, filenames in os.walk(dir_path):
    for f in filenames:
        new_filename = f.replace(" ","")
        os.rename(os.path.join(dirpath, f), os.path.join(dirpath, new_filename))
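You need to pass full paths to os.rename for the renaming to work, which is what this call builds:
os.path.join(dirpath, new_filename)
Use dirpath here, not dirnames. dirnames is the list of subdirectory names in the current directory, so the following raises a TypeError:
os.path.join(dirnames, new_filename)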
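If some names only differ by whitespace (for example "x y.evtx" next to "xy.evtx"), a plain rename could overwrite or fail depending on the platform. Here is a hedged sketch that skips such cases. The function name and the "never overwrite" policy are my own assumptions, not part of the answers above:

```python
import os


def strip_spaces_from_filenames(dir_path):
    """Remove spaces from file names below dir_path.

    Returns a list of (old_path, new_path) pairs for the files that
    were actually renamed. Directory names are left untouched.
    """
    renamed = []
    for dirpath, dirnames, filenames in os.walk(dir_path):
        for f in filenames:
            new_filename = f.replace(" ", "")
            if new_filename == f:
                continue  # no spaces in this name, nothing to do
            src = os.path.join(dirpath, f)
            dst = os.path.join(dirpath, new_filename)
            if os.path.exists(dst):
                continue  # assumed policy: never overwrite an existing file
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Calling strip_spaces_from_filenames('/home/user/evtx_logs') would then return the list of renames, which is handy for logging what changed.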

Rename Files without Extension

I have a directory Dir, and in Dir there are three subdirectories with five files each:
Dir/A/ one,two,three,four,five.txt
Dir/B/ one,two,three,four,five.txt
Dir/C/ one,two,three,four,five.txt
As you can see, each subdirectory has four files without an extension and one with the .txt extension.
How do I rename all Files without extension in a recursive manner?
Currently I'm trying this, which works for a single Directory, but how could I catch all Files if I put this Script into Dir?
import os, sys
for filename in os.listdir(os.path.dirname(os.path.abspath(__file__))):
    base_file, ext = os.path.splitext(filename)
    if ext == "":
        os.rename(filename, base_file + ".png")
Use os.walk if you want to perform recursive traversal.
for root, dirs, files in os.walk(os.path.dirname(os.path.abspath(__file__))):
    for file in files:
        base_path, ext = os.path.splitext(os.path.join(root, file))
        if not ext:
            os.rename(base_path, base_path + ".png")
os.walk will segregate your files into normal files and directories, so os.path.isdir is not needed.
import os
my_dir = os.getcwd()
for root, dirnames, fnames in os.walk(my_dir):
    for fname in fnames:
        if fname.count('.'):
            continue  # don't process a file with an extension
        os.rename(os.path.join(root, fname), os.path.join(root, "{}.png".format(fname)))
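Note that the two tests for "no extension" used above are not quite equivalent: os.path.splitext treats a leading dot as part of the name, so a file such as ".bashrc" has no extension for splitext, while fname.count('.') would skip it. Since a recursive rename is hard to undo, a dry run may be worth having. A minimal sketch, where the function name and the dry_run flag are my own additions:

```python
import os


def add_extension(root_dir, new_ext=".png", dry_run=True):
    """Return (old, new) path pairs for extensionless files under root_dir.

    With dry_run=False the files are also renamed on disk.
    """
    planned = []
    for root, dirs, files in os.walk(root_dir):
        for name in files:
            # splitext ignores a leading dot, so ".bashrc" counts as
            # having no extension here.
            if os.path.splitext(name)[1]:
                continue
            old_path = os.path.join(root, name)
            new_path = old_path + new_ext
            planned.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return planned
```

You could print the result of add_extension("Dir") first, check it, and only then call it again with dry_run=False.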

Unzipping and renaming files/folders

I'm fairly new to Python. I'm trying to unzip some archives and then rename the extracted folders, which don't have static names. For example, one file is New_05222016. The digits are the date, which always changes. I want the result to be an unzipped folder named "New".
This is what I have so far. It unzips my file but won't rename it.
import zipfile,fnmatch,os
rootPath = r"C:/Users/Bob/Desktop/Bill"
pattern = '*.zip'
New = 'New*'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        zipfile.ZipFile(os.path.join(root, filename)).extractall(os.path.join(root, os.path.splitext(filename)[0]))
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(dir,New):
        os.rename(dir,'C:/Users/Bob/Desktop/Bill/New')
I've tried other ways, such as just calling os.rename and typing the names out, but I'm at a loss as to what to do.
os.rename() will work fine, just be sure to specify the full path.
I've modified your example using os.listdir() to store the name of the unzipped directory and then renamed it using os.rename(). I also used re to leave the name of the zipped file intact.
import zipfile,fnmatch,os, re
rootPath = r"C:\Users\Bob\Desktop\Bill"
pattern = '*.zip'
New = 'New*'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        zipfile.ZipFile(os.path.join(root, filename)).extractall(os.path.join(root, os.path.splitext(filename)[0]))
# Rename every entry whose name does not contain "zip".
# This assumes the extracted folder is the only such entry in rootPath.
for dirName in os.listdir(rootPath):
    if not re.search("zip", dirName):
        os.rename(os.path.join(rootPath, dirName), os.path.join(rootPath,"New"))
I hope this helps!
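An alternative that avoids the rename step altogether is to extract straight into a folder with the fixed name. This is only a sketch under my own assumptions: the archives are named like New_05222016.zip, and the part before the first underscore is the folder name you want. The function name and the default pattern are made up for the example:

```python
import fnmatch
import os
import zipfile


def extract_to_prefix(root_path, pattern="New_*.zip"):
    """Extract each matching archive into a folder named after its prefix.

    "New_05222016.zip" is extracted into "<its directory>/New".
    Returns the list of target folders.
    """
    extracted = []
    for root, dirs, files in os.walk(root_path):
        for filename in fnmatch.filter(files, pattern):
            # Keep only the text before the first underscore, e.g. "New".
            target_name = filename.split("_", 1)[0]
            target_dir = os.path.join(root, target_name)
            with zipfile.ZipFile(os.path.join(root, filename)) as archive:
                archive.extractall(target_dir)
            extracted.append(target_dir)
    return extracted
```

The zip file itself is left where it is. If two archives share the same prefix in one directory, their contents end up merged in the same folder, so this only fits when there is one dated archive per directory.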

Read file in unknown directory

I need to read and edit several files. The issue is that I know roughly where these files are, but not exactly.
All the files are called QqTest.txt and sit in various different directories.
I know that the parent directories are called:
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057',
            'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065',
            'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106',
            'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121',
            'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143',
            'MDC0153','MDC0155','MDC0158']
but below each of those there is another, unknown subdirectory that contains QqTest.txt,
so I need to read QqTest.txt from /MDC[number]/unknownDir/QqTest.txt.
How can I read the file with a wildcard in Python, similar to how I would in bash? For example:
/MDC0022/*/QqTest.txt
You can use a Python module called glob to do this. It enables Unix style pathname pattern expansions.
import glob
glob.glob("/MDC0022/*/QqTest.txt")
If you want to do it for all items in the list you can try this.
for item in mdcArray:
    required_files = glob.glob("{0}/*/QqTest.txt".format(item))
    # process files here
Glob documentation
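The relative patterns above only match when the script runs from the directory that contains the MDC folders. A small sketch that takes the base directory explicitly; the function name and the base_dir parameter are assumptions of mine, and glob.escape is used so that special characters in the directory names are not read as wildcards:

```python
import glob
import os


def find_qqtest_files(base_dir, parent_names, filename="QqTest.txt"):
    """Return sorted paths matching base_dir/<parent>/*/filename."""
    matches = []
    for parent in parent_names:
        # Only the middle "*" is a wildcard; the other parts are escaped.
        pattern = os.path.join(
            glob.escape(base_dir), glob.escape(parent), "*", filename
        )
        matches.extend(glob.glob(pattern))
    return sorted(matches)
```

With the list from the question this could be called as find_qqtest_files("/", mdcArray), assuming the MDC directories really sit directly under the root.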
You could search your root folders as follows:
import os
mdcArray = ['MDC0021','MDC0022','MDC0036','MDC0055','MDC0057',
            'MDC0059','MDC0061','MDC0062','MDC0063','MDC0065',
            'MDC0066','MDC0086','MDC0095','MDC0098','MDC0106',
            'MDC0110','MDC0113','MDC0114','MDC0115','MDC0121',
            'MDC0126','MDC0128','MDC0135','MDC0141','MDC0143',
            'MDC0153','MDC0155','MDC0158']
for root in mdcArray:
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename == 'QqTest.txt':
                file = os.path.join(dirpath, filename)
                print("Found - {}".format(file))
This would display something like the following:
Found - MDC0022\test\QqTest.txt
The os.walk function can be used to traverse your folder structure.
To search all folders for MDC<number> in the path, you could use the following approach:
import os
import re
for dirpath, dirnames, filenames in os.walk('.'):
    if re.search(r'MDC\d+', dirpath):
        for filename in filenames:
            if filename == 'QqTest.txt':
                file = os.path.join(dirpath, filename)
                print("Found - {}".format(file))
You might use os.walk. Not exactly what you wanted but will do the job.
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)

Cannot seem to crawl a deep directory with my Python script, any idea?

The script basically creates a list of all the files in all directories. Any idea why it seems to crash when it has to scan a directory that contains more than a few files?
import os
correctlyNamedDirectories = []
def crawlDirectories(directory):
    for dirname, dirnames, filenames in os.walk(directory):
        for subdirname in dirnames:
            correctlyNamedDirectories.append(os.path.join(dirname, subdirname))
        for filename in filenames:
            correctlyNamedDirectories.append(os.path.join(dirname, filename))
crawlDirectories('.')
print(correctlyNamedDirectories)
Also, is there a cleaner way of writing this?
Shorter method with a list comprehension:
correctlyNamedDirectories = [
    os.path.join(path, subname)
    for path, dirnames, filenames in os.walk(directory)
    for subname in dirnames + filenames
]
