Python program to traverse directories and read file information - python

I'm just getting started with Python but already have found it much more productive than Bash shell scripting.
I'm trying to write a Python script that will traverse every directory that branches from the directory I launch the script in, and for each file it encounters, load an instance of this class:
class FileInfo:
    def __init__(self, filename, filepath):
        self.filename = filename
        self.filepath = filepath
The filepath attribute would be the full absolute path from root (/). Here's the pseudocode mockup for what I'd like the main program to do:
from (current directory):
    for each file in this directory,
        create an instance of FileInfo and load the file name and path
    switch to a nested directory, or if there is none, back out of this directory
I've been reading about os.walk() and os.path.walk(), but I'd like some advice about the most straightforward way to implement this in Python. Thanks in advance.

I'd use os.walk doing the following:
import os
def getInfos(currentDir):
    infos = []
    for root, dirs, files in os.walk(currentDir): # Walk directory tree
        for f in files:
            infos.append(FileInfo(f,root))
    return infos
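The question asks for filepath to hold the full absolute path, whereas root is a relative path when the walk starts from a relative directory such as ".". A minimal sketch of one way to handle that, assuming the FileInfo class from the question; the name get_infos and the use of os.path.abspath are my additions, not part of the answer above:

```python
import os

class FileInfo:
    def __init__(self, filename, filepath):
        self.filename = filename
        self.filepath = filepath

def get_infos(start_dir="."):
    """Return one FileInfo per file under start_dir, with absolute paths."""
    infos = []
    for root, dirs, files in os.walk(start_dir):
        for name in files:
            # join directory and file name, then make the result absolute
            full_path = os.path.abspath(os.path.join(root, name))
            infos.append(FileInfo(name, full_path))
    return infos
```

Calling get_infos() with no argument starts from the directory the script was launched in.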

Try
info = []
for path, dirs, files in os.walk("."):
    info.extend(FileInfo(filename, path) for filename in files)
or
info = [FileInfo(filename, path)
        for path, dirs, files in os.walk(".")
        for filename in files]
to get a list of one FileInfo instance per file.
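For comparison, the same list can be built with pathlib instead of os.walk. This is only a sketch of an alternative, assuming the FileInfo class from the question; the helper name collect_infos is hypothetical:

```python
from pathlib import Path

class FileInfo:
    def __init__(self, filename, filepath):
        self.filename = filename
        self.filepath = filepath

def collect_infos(start_dir="."):
    """Build one FileInfo per regular file found below start_dir."""
    return [
        FileInfo(p.name, str(p.parent))
        for p in Path(start_dir).rglob("*")  # rglob("*") yields files and directories
        if p.is_file()                       # keep files only
    ]
```

As in the answer above, filepath here holds the containing directory rather than the full path to the file.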

Try this:
import os
for item in os.walk("."):
    print(item)

Related

Extract full Path and File Name

Attempting to write a function that walks a file system and returns the absolute path and filename for use in another function.
Example "/testdir/folderA/222/filename.ext".
Having tried multiple versions of this I cannot seem to get it to work properly.
filesCheck=[]
def findFiles(filepath):
    files=[]
    for root, dirs, files in os.walk(filepath):
        for file in files:
            currentFile = os.path.realpath(file)
            print (currentFile)
            if os.path.exists(currentFile):
                files.append(currentFile)
    return files
filesCheck = findFiles(/testdir)
This returns
"filename.ext" (only one).
Substitute in currentFile = os.path.join(root, file) for os.path.realpath(file) and it goes into a loop in the first directory. Tried os.path.join(dir, file) and it fails as one of my folders is named 222.
I have gone round in circles and get somewhat close but haven't been able to get it to work.
Running on Linux with Python 3.6
There are several things wrong with your code.
The name files is assigned multiple values: first your result list, then the file list that os.walk() yields on each iteration.
You're not joining the root directory to each filename that os.walk() returns, which can be done with os.path.join().
You're not passing a string to the findFiles() function.
Once you fix those things, there's no longer a need to call os.path.exists(), because os.walk() only reports files that it actually found.
Here's a working version:
import os
def findFiles(filepath):
    found = []
    for root, dirs, files in os.walk(filepath):
        for file in files:
            currentFile = os.path.realpath(os.path.join(root, file))
            found.append(currentFile)
    return found
filesCheck = findFiles('/testdir')
print(filesCheck)
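The original attempt produced misleading paths because os.path.realpath(file) resolves a bare file name against the current working directory, not against the directory that os.walk() is visiting. A small sketch that makes the difference visible; the function name compare_paths is hypothetical:

```python
import os

def compare_paths(top):
    """Return (path resolved from the bare name, path joined with root) per file."""
    pairs = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # resolved against os.getcwd(), wherever the script was started
            from_cwd = os.path.realpath(name)
            # resolved against the directory that actually contains the file
            from_root = os.path.realpath(os.path.join(root, name))
            pairs.append((from_cwd, from_root))
    return pairs
```

Unless the script happens to run from inside the directory being visited, the first element of each pair points at a location where the file does not exist, which is why the os.path.exists() check in the question filtered almost everything out.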
Hi I think this is what you need. Perhaps you could give it a try :)
from os import walk
path = "C:/Users/SK/Desktop/New folder"
files = []
for (directoryPath, directoryNames, allFiles) in walk(path):
    for file in allFiles:
        files.append([file, f"{directoryPath}/{file}"])
print(files)
Output:
[ ['index.html', 'C:/Users/SK/Desktop/New folder/index.html'], ['test.py', 'C:/Users/SK/Desktop/New folder/test.py'] ]

python: collect files with one extension from all sub-dir

I am trying to collect all files from all sub-directories and move them to another directory
Code used
#collects all mp3 files from folders to a new folder
import os
from pathlib import Path
import shutil
#run once
path = os.getcwd()
os.mkdir("empetrishki")
empetrishki = path + "/empetrishki" #destination dir
print(path)
print(empetrishki)
#recursive collection
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    for name in files:
        filePath = Path(name)
        if filePath.suffix.lower() == ".mp3":
            print(filePath)
            os.path.join
            filePath.rename(empetrishki.joinpath(filePath))
I have trouble with the last line, which moves the files: neither filePath.rename(), shutil.move, nor joinpath() has worked for me. Maybe that's because I am trying to change an element of the tuple that os.walk returns.
Similar code works with os.scandir, but that collects files only in the current directory.
How can I fix that? Thanks!
Creating pathlib.Path(name) doesn't mean that a file called name exists at that location. You need to make sure you have a full path, or a relative path that is valid from your working directory, and you need to resolve it. In particular, you never change your working directory, yet you have a line like this:
filePath = Path(name)
os.walk() descends through the directory tree, but your working directory stays the same, so a bare file name is looked up in the wrong place. Build the path from root and name instead; it is also a good idea to resolve it so that the full path is known.
filePath = Path(root).joinpath(name).resolve()
You can also create Path(root) outside the inner loop. Now you have an absolute path (for example, one starting at '/home/') to the file, so you should be able to rename it with .rename(), like:
filePath.rename(filePath.parent.joinpath(newname))
#Or to another directory
filePath.rename(other_dir.joinpath(newname))
All together:
import os
from pathlib import Path
path = Path.cwd() # directory to search, as in the question
empetrishki = path.joinpath("empetrishki").resolve()
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    root = Path(root).resolve()
    for name in files:
        file = root.joinpath(name)
        if file.suffix.lower() == ".mp3":
            file.rename(empetrishki.joinpath(file.name))
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
    if root == empetrishki:
        continue # skip the destination dir
    for name in files:
        basename, extension = os.path.splitext(name)
        if extension.lower() == ".mp3":
            oldpath = os.path.join(root, name)
            newpath = os.path.join(empetrishki, name)
            print(oldpath)
            shutil.move(oldpath, newpath)
This is what I suggest. Your code runs in the current directory, but each file is located at os.path.join(root, name), and that is the path you need to pass to the move function.
I would also suggest using os.path.splitext to extract the file extension; it is more Pythonic. You might also want to skip scanning your target directory.
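The suggestions in these answers (join root and name, compare the extension case-insensitively, skip the destination) can be combined into one function. This is a sketch under assumptions rather than a definitive version: the name collect_files is hypothetical, and skipping files whose name already exists in the destination is my choice, not something the question specifies.

```python
import os
import shutil

def collect_files(src, dest, suffix=".mp3"):
    """Move every file below src whose extension matches suffix into dest."""
    src = os.path.abspath(src)
    dest = os.path.abspath(dest)
    os.makedirs(dest, exist_ok=True)
    moved = []
    for root, dirs, files in os.walk(src, topdown=True):
        # prune the destination in place so os.walk() never descends into it
        dirs[:] = [d for d in dirs if os.path.join(root, d) != dest]
        if root == dest:
            continue
        for name in files:
            if os.path.splitext(name)[1].lower() == suffix:
                target = os.path.join(dest, name)
                if os.path.exists(target):
                    continue  # assumption: skip name collisions instead of overwriting
                shutil.move(os.path.join(root, name), target)
                moved.append(target)
    return moved
```

For the question's layout this would be called as collect_files(os.getcwd(), os.path.join(os.getcwd(), "empetrishki")).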

All Files in Dir & Sub-Dir

I would like to find all the files in a directory and all sub-directories.
code used:
import os
import sys
path = "C:\\"
dirs = os.listdir(path)
filename = "C.txt"
FILE = open(filename, "w")
FILE.write(str(dirs))
FILE.close()
print(dirs)
The problem is that this code lists only the entries directly inside the given directory, not the files in its sub-directories. What do I need to change in order to also list files in sub-directories?
To traverse a directory tree, use os.walk().
Here's an example to get you started:
import os
searchdir = r'C:\root_dir' # traversal starts in this directory (the root)
for root, dirs, files in os.walk(searchdir):
    for name in files:
        (base, ext) = os.path.splitext(name) # split base and extension
        print(base, ext)
which would give you access to the file names and the components.
You'll find the functions in the os and os.path module to be of great use for this sort of work.
This function will help you: os.path.walk() http://docs.python.org/library/os.path.html#os.path.walk (Python 2 only; it was removed in Python 3, where os.walk() is the replacement).
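Putting the os.walk() advice together with the original goal of writing the listing to a text file, a Python 3 sketch might look like the following. The function names list_all_files and write_listing are hypothetical, and writing one path per line is an assumption about the desired output format:

```python
import os

def list_all_files(top):
    """Return the full path of every file below top, including sub-directories."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            paths.append(os.path.join(root, name))
    return paths

def write_listing(top, listing_file):
    """Write one path per line to listing_file and return the number of paths."""
    paths = list_all_files(top)
    # the with statement closes the file even if writing fails
    with open(listing_file, "w", encoding="utf-8") as out:
        out.write("\n".join(paths))
    return len(paths)
```

With the question's values this would be write_listing("C:\\", "C.txt"); walking an entire drive can take a long time.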

using subprocess over different files python

I've got a problem with a short script, it'd be great if you could have a look!
import os
import subprocess
root = "/Users/software/fmtomov1.0/remaker_lastplot/source_relocation/observed_arrivals_loc3d"
def loop_loc3d(file_in):
    """Loops loc3d over the source files"""
    return subprocess.call (['loc3d'], shell=True)
def relocation ():
    for subdir, dirs, files in os.walk(root):
        for file in files:
            file_in = open(os.path.join(subdir, file), 'r')
            return loop_loc3d(file_in)
I think the script is quite easy to understand; it's very simple. However, I'm not getting the result I want. In a few words, I just want 'loc3d' to operate on the contents of every file in the 'observed_arrivals_loc3d' directory, which means that I need to open all the files, and that's what I've done. In fact, if I try to 'print files' after:
for subdir, dirs, files in os.walk(root)
I'll get the name of every file. Furthermore, if I try a 'print file_in' after
file_in = open(os.path.join(subdir, file), 'r')
I get something like this line for every file:
<open file '/Users/software/fmtomov1.0/remaker_lastplot/source_relocation/observed_arrivals_loc3d/EVENT2580', mode 'r' at 0x78fe38>
subprocess has been tested alone on only one file and it's working.
Overall I'm getting no errors, just -11, which means absolutely nothing to me. The output from loc3d should be completely different.
So does the code look fine to you? Is there anything I'm missing? Any suggestion?
Thanks for your help!
I assume you would call loc3d filename from the CLI. If so, then:
def loop_loc3d(filename):
    """Loops loc3d over the source files"""
    return subprocess.call (['loc3d',filename])
def relocation():
    for subdir, dirs, files in os.walk(root):
        for file in files:
            filename = os.path.join(subdir, file)
            return loop_loc3d(filename)
In other words, don't open the file yourself, let loc3d do it.
Currently your relocation method will return after the first iteration (for the first file). You shouldn't need to return at all.
def loop_loc3d(filename):
    """Loops loc3d over the source files"""
    return subprocess.call (['loc3d',filename])
def relocation ():
    for subdir, dirs, files in os.walk(root):
        for file in files:
            filename = os.path.join(subdir, file)
            loop_loc3d(filename)
This is only one of the issues. The other is concerning loc3d itself. Try providing the full path for loc3d.
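If the exit codes are still of interest once the early return is gone, they can be collected per file. A sketch, assuming (as the answers here do) that the command accepts a file name as its last argument; the name run_over_files is hypothetical:

```python
import os
import subprocess

def run_over_files(command, top):
    """Run command once per file below top; return {path: exit code}."""
    results = {}
    for subdir, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(subdir, name)
            # no return inside the loop, so every file is processed
            results[path] = subprocess.call(list(command) + [path])
    return results
```

For the question's case this would be run_over_files(['loc3d'], root), or the same call with the full path to the loc3d executable in place of 'loc3d'.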
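A return value of -11 probably means that the command was killed by signal 11, a segmentation fault (SIGSEGV); on POSIX systems, subprocess reports a child killed by signal N as -N.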
It is a bug in loc3d. A well-behaved program should not produce 'Segmentation fault' on any user input.
Feed loc3d only files that it can understand. Print filenames or use subprocess.check_call() to find out which file it doesn't like:
#!/usr/bin/env python
import fnmatch
import os
import subprocess
def loc3d_files(root):
    for dirpath, dirs, files in os.walk(root, topdown=True):
        # skip hidden directories
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        # process only known files
        for file in fnmatch.filter(files, "*some?pattern[0-9][0-9].[ch]"):
            yield os.path.join(dirpath, file)
for path in loc3d_files(root): # root is the directory defined in the question
    print(path)
    subprocess.check_call(['loc3d', path]) # raise on any error
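To make a value such as -11 readable, the negative return code can be translated back into a signal name with the standard signal module. A minimal sketch; the function name describe_returncode and the wording of the messages are my own:

```python
import signal

def describe_returncode(returncode):
    """Explain a return code as reported by subprocess.call() on POSIX systems."""
    if returncode < 0:
        try:
            # a negative value -N means the child was terminated by signal N
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (-returncode, name)
    if returncode == 0:
        return "exited successfully"
    return "exited with status %d" % returncode
```

On Linux and macOS, signal 11 is SIGSEGV, so describe_returncode(-11) names the segmentation fault explicitly.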
Just found out that loc3d, as unutbu said, relies on several variables, and in this specific case on one called 'observed_arrivals' that I have to create in, and delete from, my directory every time. In Python terms it means:
import os
import shutil
import subprocess
def loop_loc3d(file_in):
    """Loops loc3d over the source files"""
    return subprocess.call(["loc3d"], shell=True)
path = "/Users/software/fmtomo/remaker_lastplot/source_relocation"
path2 = "/Users/Programming/working_directory/2test"
new_file_name = 'observed_arrivals'
def define_object_file ():
    for filename in os.listdir("."):
        file_in = os.rename (filename, new_file_name) # get the observed_arrivals file
        file_in = shutil.copy ("/Users/simone/Programming/working_directory/2test/observed_arrivals", "/Users/software/fmtomo/remaker_lastplot/source_relocation")
        os.chdir(path) # goes where loc3d is
        loop_loc3d (file_in)
        os.remove("/Users/software/fmtomo/remaker_lastplot/source_relocation/observed_arrivals")
        os.remove ("/Users/Programming/working_directory/2test/observed_arrivals")
        os.chdir(path2)
Now, this is working very well, so it should answer my question. I guess it's quite easy to understand, it's just copying, changing dir and that kind of stuff.
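The os.chdir() calls can be avoided, because the subprocess functions accept a cwd argument that sets the working directory for the child process only. A hedged sketch: that loc3d reads a file named 'observed_arrivals' from its working directory is taken from the description above, while the function name run_in_directory and the copy-then-clean-up flow are assumptions of mine:

```python
import os
import shutil
import subprocess

def run_in_directory(command, workdir, input_file, expected_name="observed_arrivals"):
    """Copy input_file into workdir under expected_name, run command there, clean up."""
    staged = os.path.join(workdir, expected_name)
    shutil.copy(input_file, staged)
    try:
        # cwd applies to the child process only; the caller's directory is unchanged
        return subprocess.call(command, cwd=workdir)
    finally:
        os.remove(staged)  # remove the staged copy even if the command fails
```

Because the original event file is copied rather than renamed, it does not have to be deleted or restored afterwards.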

Process a set of files from a source directory to a destination directory in Python

Being completely new to Python, I'm trying to run a command over a set of files. The command requires both a source and a destination file (I'm actually using ImageMagick convert, as in the example below).
I can supply both source and destination directories; however, I can't figure out how to easily retain the directory structure from the source to the destination directory.
E.g. say the srcdir contains the following:
srcdir/
    file1
    file3
    dir1/
        file1
        file2
Then I want the program to create the following destination files on destdir: destdir/file1, destdir/file3, destdir/dir1/file1 and destdir/dir1/file2
So far this is what I came up with:
import os
from subprocess import call
srcdir = os.curdir # just use the current directory
destdir = 'path/to/destination'
for root, dirs, files in os.walk(srcdir):
    for filename in files:
        sourceFile = os.path.join(root, filename)
        destFile = '???'
        cmd = "convert %s -resize 50%% %s" % (sourceFile, destFile)
        call(cmd, shell=True)
The walk method doesn't directly provide what directory the file is under srcdir other than concatenating the root directory string with the file name. Is there some easy way to get the destination file, or do I have to do some string manipulation in order to do this?
Change your loop to:
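for root, dirs, files in os.walk(srcdir):
    # strip the leading separator, otherwise os.path.join() would discard destdir
    destroot = os.path.join(destdir, root[len(srcdir):].lstrip(os.sep))
    for adir in dirs:
        os.makedirs(os.path.join(destroot, adir), exist_ok=True)
    for filename in files:
        sourceFile = os.path.join(root, filename)
        destFile = os.path.join(destroot, filename)
        processFile(sourceFile, destFile) # placeholder for the actual conversion step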
There are a few relative path scripts out there that will do what you want -- namely find the relative path between two paths. E.g.:
http://www.voidspace.org.uk/python/pathutils.html
(relpath method)
http://code.activestate.com/recipes/302594-another-relative-filepath-script/
http://groups.google.com/group/comp.lang.python/browse_thread/thread/390d8d3e3ac8ef44/d8c74f96468c6a36?q=relative+path&rnum=1&pli=1
Since Python 2.6, this functionality is also available in the standard library as os.path.relpath().
While not pretty, this will preserve the directory structure of the tree:
_, _, subdirs = root.partition(srcdir)
destfile = os.path.join(destdir, subdirs[1:], filename)
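The same mapping can be written with os.path.relpath(), which is part of the standard library. A sketch, assuming destdir is not located inside srcdir; the generator name mirrored_paths is hypothetical:

```python
import os

def mirrored_paths(srcdir, destdir):
    """Yield (source_file, dest_file) pairs that keep srcdir's layout under destdir."""
    for root, dirs, files in os.walk(srcdir):
        # path of the current directory relative to srcdir ("." for srcdir itself)
        rel = os.path.relpath(root, srcdir)
        destroot = os.path.normpath(os.path.join(destdir, rel))
        os.makedirs(destroot, exist_ok=True)  # create the matching directory
        for filename in files:
            yield os.path.join(root, filename), os.path.join(destroot, filename)
```

Each pair can then be handed to the conversion step, for example subprocess.call(["convert", source, "-resize", "50%", dest]), which passes the arguments from the question as a list and so avoids shell quoting problems with unusual file names.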
