Cut down path name in Python - python

This is my current code:
directory = "C:/Users/test/Desktop/test/sign off img"
choices = glob.glob(os.path.join(directory, "*.jpg"))
print(choices)
This will return me every single path to all .JPG files inside that specific folder.
As an example, here is the output for the current above code:
['C:/Users/test/Desktop/test/sign off img\\SFDG001 0102400OL - signed.jpg', 'C:/Users/test/Desktop/test/sign off img\\SFDG001 0102400OL.jpg']
How can I get the output to only return the ending of the path?
This is my desire outcome:
['SFDG001 0102400OL - signed.jpg', 'SFDG001 0102400OL.jpg']
Same path, but just the end string is returned.

You can use the os.listdir function:
>>> import os
>>> files = os.listdir("C:/Users/test/Desktop/test/sign off img")
>>> filtered_files = [file for file in files if 'signed' in file]
As you can see in thee docs, os.listdir uses the current directory as default argument, i.e., if you don't pass a value. Otherwise, it will use the path you pass to it.

I recommend using pathlib.Path over os most of the time. Try this, for example:
from pathlib import Path
directory = Path("C:/Users/test/Desktop/test/sign off img")
choices = [path.name for path in directory.glob("*.jpg")]

Related

How to loop and optimise data extraction - Python [duplicate]

I need to iterate through all .asm files inside a given directory and do some actions on them.
How can this be done in a efficient way?
Python 3.6 version of the above answer, using os - assuming that you have the directory path as a str object in a variable called directory_in_str:
import os
directory = os.fsencode(directory_in_str)
for file in os.listdir(directory):
filename = os.fsdecode(file)
if filename.endswith(".asm") or filename.endswith(".py"):
# print(os.path.join(directory, filename))
continue
else:
continue
Or recursively, using pathlib:
from pathlib import Path
pathlist = Path(directory_in_str).glob('**/*.asm')
for path in pathlist:
# because path is object not string
path_in_str = str(path)
# print(path_in_str)
Use rglob to replace glob('**/*.asm') with rglob('*.asm')
This is like calling Path.glob() with '**/' added in front of the given relative pattern:
from pathlib import Path
pathlist = Path(directory_in_str).rglob('*.asm')
for path in pathlist:
# because path is object not string
path_in_str = str(path)
# print(path_in_str)
Original answer:
import os
for filename in os.listdir("/path/to/dir/"):
if filename.endswith(".asm") or filename.endswith(".py"):
# print(os.path.join(directory, filename))
continue
else:
continue
This will iterate over all descendant files, not just the immediate children of the directory:
import os
for subdir, dirs, files in os.walk(rootdir):
for file in files:
#print os.path.join(subdir, file)
filepath = subdir + os.sep + file
if filepath.endswith(".asm"):
print (filepath)
You can try using glob module:
import glob
for filepath in glob.iglob('my_dir/*.asm'):
print(filepath)
and since Python 3.5 you can search subdirectories as well:
glob.glob('**/*.txt', recursive=True) # => ['2.txt', 'sub/3.txt']
From the docs:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
Since Python 3.5, things are much easier with os.scandir() and 2-20x faster (source):
with os.scandir(path) as it:
for entry in it:
if entry.name.endswith(".asm") and entry.is_file():
print(entry.name, entry.path)
Using scandir() instead of listdir() can significantly increase the
performance of code that also needs file type or file attribute
information, because os.DirEntry objects expose this information if
the operating system provides it when scanning a directory. All
os.DirEntry methods may perform a system call, but is_dir() and
is_file() usually only require a system call for symbolic links;
os.DirEntry.stat() always requires a system call on Unix but only
requires one for symbolic links on Windows.
Python 3.4 and later offer pathlib in the standard library. You could do:
from pathlib import Path
asm_pths = [pth for pth in Path.cwd().iterdir()
if pth.suffix == '.asm']
Or if you don't like list comprehensions:
asm_paths = []
for pth in Path.cwd().iterdir():
if pth.suffix == '.asm':
asm_pths.append(pth)
Path objects can easily be converted to strings.
Here's how I iterate through files in Python:
import os
path = 'the/name/of/your/path'
folder = os.fsencode(path)
filenames = []
for file in os.listdir(folder):
filename = os.fsdecode(file)
if filename.endswith( ('.jpeg', '.png', '.gif') ): # whatever file types you're using...
filenames.append(filename)
filenames.sort() # now you have the filenames and can do something with them
NONE OF THESE TECHNIQUES GUARANTEE ANY ITERATION ORDERING
Yup, super unpredictable. Notice that I sort the filenames, which is important if the order of the files matters, i.e. for video frames or time dependent data collection. Be sure to put indices in your filenames though!
You can use glob for referring the directory and the list :
import glob
import os
#to get the current working directory name
cwd = os.getcwd()
#Load the images from images folder.
for f in glob.glob('images\*.jpg'):
dir_name = get_dir_name(f)
image_file_name = dir_name + '.jpg'
#To print the file name with path (path will be in string)
print (image_file_name)
To get the list of all directory in array you can use os :
os.listdir(directory)
I'm not quite happy with this implementation yet, I wanted to have a custom constructor that does DirectoryIndex._make(next(os.walk(input_path))) such that you can just pass the path you want a file listing for. Edits welcome!
import collections
import os
DirectoryIndex = collections.namedtuple('DirectoryIndex', ['root', 'dirs', 'files'])
for file_name in DirectoryIndex(*next(os.walk('.'))).files:
file_path = os.path.join(path, file_name)
I really like using the scandir directive that is built into the os library. Here is a working example:
import os
i = 0
with os.scandir('/usr/local/bin') as root_dir:
for path in root_dir:
if path.is_file():
i += 1
print(f"Full path is: {path} and just the name is: {path.name}")
print(f"{i} files scanned successfully.")
Get all the .asm files in a directory by doing this.
import os
path = "path_to_file"
file_type = '.asm'
for filename in os.listdir(path=path):
if filename.endswith(file_type):
print(filename)
print(f"{path}/{filename}")
# do something below
I don't understand why some answers are complicated. This is how I would do it with Python 2.7. Replace DIRECTORY_TO_LOOP with the directory you want to use.
import os
DIRECTORY_TO_LOOP = '/var/www/files/'
for root, dirs, files in os.walk(DIRECTORY_TO_LOOP, topdown=False):
for name in files:
print(os.path.join(root, name))

Change filename prefix in Path PosixPath object

I need to change a prefix for a current file.
An example would look as follows:
from pathlib import Path
file = Path('/Users/my_name/PYTHON/Playing_Around/testing_lm.py')
# Current file with destination
print(file)
# Prefix to be used
file_prexif = 'A'
# Hardcoding wanted results.
Path('/Users/my_name/PYTHON/Playing_Around/A_testing_lm.py')
As can be seen hardcoding it is easy. However is there a way to automate this step?
There is a pseudo - idea of what I want to do:
str(file).split('/')[-1] = str(file_prexif) + str('_') + str(file).split('/')[-1]
I only want to change last element of PosixPath file. However it is not possible to change only last element of string
file.stem accesses the base name of the file without extension.
file.with_stem() (added in Python 3.9) returns an updated Path with a new stem:
from pathlib import Path
file = Path('/Users/my_name/PYTHON/Playing_Around/testing_lm.py')
print(file.with_stem(f'A_{file.stem}'))
\Users\my_name\PYTHON\Playing_Around\A_testing_lm.py
Use file.parent to get the parent of the path and file.name to get the final path component, excluding the drive and root.
from pathlib import Path
file = Path('/Users/my_name/PYTHON/Playing_Around/testing_lm.py')
file_prexif_lst = ['A','B','C']
for prefix in file_prexif_lst:
p = file.parent.joinpath(f'{prefix}_{file.name}')
print(p)
/Users/my_name/PYTHON/Playing_Around/A_testing_lm.py
/Users/my_name/PYTHON/Playing_Around/B_testing_lm.py
/Users/my_name/PYTHON/Playing_Around/C_testing_lm.py

Unable to read file from directory

I have following Directory structure,
F:\TestData
and TestData contains 20 folders with name node1, node2,..., node20
and each node folder contains file with names log.10.X
I need to access each log file from all node folders, For which I have writen code, but it is saying, File not found - log.*
CODE:
directory = "F:\TestData"
p = subprocess.Popen(["find", "./" + directory, "-name", "log.*"], stdout=subprocess.PIPE)
output, err = p.communicate()
foutput = output.split("\n")
Python, unlike POSIX shells, does not automatically do globbing (interpreting * and the like as wildcards related to files in the relevant directory) in strings. It does, however, provide a glob module for that purpose. You can use this to get a list of matching filenames:
import glob
filenames = glob.glob(r'F:\TestData\node*\log.*')
You can just use python to get a list of files in the directory
import os
directory = "F:\TestData\"
file_list = os.listdir(directory)
log_list = filter(lambda x: x.startswith("log"), file_list)
oh, you have to code to iterate sub directory.
First os.listdir()in the parent directory ,and iterate the sub directory to get the files
Python's glob module may be an option.
import glob
directory = 'F:\TestData'
logcontents = [open(f,'r').read() for f in glob.glob(directory + '\node*\log.*')]
You also use walk, like this:
import os
directory = "F:\TestData"
for i in os.walk(directory):
# i like this:
# ('F:\\TestData', ['node1', 'node2', 'node3'], [])
# ('F:\\TestData\\node1', [], ['log.1.txt'])
# ('F:\\TestData\\node2', [], ['log.2.txt'])
print i
if i[2] != []:
# TODO: use the path to finish other
# If dictory noden have some log file, you should use i[2][n].
# So, if you only need log.n.txt, you only use i[2][n].
print os.path.join(i[0], i[2][0])

Extract a part of the filepath (a directory) in Python

I need to extract the name of the parent directory of a certain path. This is what it looks like:
C:\stuff\directory_i_need\subdir\file.jpg
I would like to extract directory_i_need.
import os
## first file in current dir (with full path)
file = os.path.join(os.getcwd(), os.listdir(os.getcwd())[0])
file
os.path.dirname(file) ## directory of file
os.path.dirname(os.path.dirname(file)) ## directory of directory of file
...
And you can continue doing this as many times as necessary...
Edit: from os.path, you can use either os.path.split or os.path.basename:
dir = os.path.dirname(os.path.dirname(file)) ## dir of dir of file
## once you're at the directory level you want, with the desired directory as the final path node:
dirname1 = os.path.basename(dir)
dirname2 = os.path.split(dir)[1] ## if you look at the documentation, this is exactly what os.path.basename does.
For Python 3.4+, try the pathlib module:
>>> from pathlib import Path
>>> p = Path('C:\\Program Files\\Internet Explorer\\iexplore.exe')
>>> str(p.parent)
'C:\\Program Files\\Internet Explorer'
>>> p.name
'iexplore.exe'
>>> p.suffix
'.exe'
>>> p.parts
('C:\\', 'Program Files', 'Internet Explorer', 'iexplore.exe')
>>> p.relative_to('C:\\Program Files')
WindowsPath('Internet Explorer/iexplore.exe')
>>> p.exists()
True
All you need is parent part if you use pathlib.
from pathlib import Path
p = Path(r'C:\Program Files\Internet Explorer\iexplore.exe')
print(p.parent)
Will output:
C:\Program Files\Internet Explorer
Case you need all parts (already covered in other answers) use parts:
p = Path(r'C:\Program Files\Internet Explorer\iexplore.exe')
print(p.parts)
Then you will get a list:
('C:\\', 'Program Files', 'Internet Explorer', 'iexplore.exe')
Saves tone of time.
First, see if you have splitunc() as an available function within os.path. The first item returned should be what you want... but I am on Linux and I do not have this function when I import os and try to use it.
Otherwise, one semi-ugly way that gets the job done is to use:
>>> pathname = "\\C:\\mystuff\\project\\file.py"
>>> pathname
'\\C:\\mystuff\\project\\file.py'
>>> print pathname
\C:\mystuff\project\file.py
>>> "\\".join(pathname.split('\\')[:-2])
'\\C:\\mystuff'
>>> "\\".join(pathname.split('\\')[:-1])
'\\C:\\mystuff\\project'
which shows retrieving the directory just above the file, and the directory just above that.
import os
directory = os.path.abspath('\\') # root directory
print(directory) # e.g. 'C:\'
directory = os.path.abspath('.') # current directory
print(directory) # e.g. 'C:\Users\User\Desktop'
parent_directory, directory_name = os.path.split(directory)
print(directory_name) # e.g. 'Desktop'
parent_parent_directory, parent_directory_name = os.path.split(parent_directory)
print(parent_directory_name) # e.g. 'User'
This should also do the trick.
This is what I did to extract the piece of the directory:
for path in file_list:
directories = path.rsplit('\\')
directories.reverse()
line_replace_add_directory = line_replace+directories[2]
Thank you for your help.
You have to put the entire path as a parameter to os.path.split. See The docs. It doesn't work like string split.

How can I iterate over files in a given directory?

I need to iterate through all .asm files inside a given directory and do some actions on them.
How can this be done in a efficient way?
Python 3.6 version of the above answer, using os - assuming that you have the directory path as a str object in a variable called directory_in_str:
import os
directory = os.fsencode(directory_in_str)
for file in os.listdir(directory):
filename = os.fsdecode(file)
if filename.endswith(".asm") or filename.endswith(".py"):
# print(os.path.join(directory, filename))
continue
else:
continue
Or recursively, using pathlib:
from pathlib import Path
pathlist = Path(directory_in_str).glob('**/*.asm')
for path in pathlist:
# because path is object not string
path_in_str = str(path)
# print(path_in_str)
Use rglob to replace glob('**/*.asm') with rglob('*.asm')
This is like calling Path.glob() with '**/' added in front of the given relative pattern:
from pathlib import Path
pathlist = Path(directory_in_str).rglob('*.asm')
for path in pathlist:
# because path is object not string
path_in_str = str(path)
# print(path_in_str)
Original answer:
import os
for filename in os.listdir("/path/to/dir/"):
if filename.endswith(".asm") or filename.endswith(".py"):
# print(os.path.join(directory, filename))
continue
else:
continue
This will iterate over all descendant files, not just the immediate children of the directory:
import os
for subdir, dirs, files in os.walk(rootdir):
for file in files:
#print os.path.join(subdir, file)
filepath = subdir + os.sep + file
if filepath.endswith(".asm"):
print (filepath)
You can try using glob module:
import glob
for filepath in glob.iglob('my_dir/*.asm'):
print(filepath)
and since Python 3.5 you can search subdirectories as well:
glob.glob('**/*.txt', recursive=True) # => ['2.txt', 'sub/3.txt']
From the docs:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
Since Python 3.5, things are much easier with os.scandir() and 2-20x faster (source):
with os.scandir(path) as it:
for entry in it:
if entry.name.endswith(".asm") and entry.is_file():
print(entry.name, entry.path)
Using scandir() instead of listdir() can significantly increase the
performance of code that also needs file type or file attribute
information, because os.DirEntry objects expose this information if
the operating system provides it when scanning a directory. All
os.DirEntry methods may perform a system call, but is_dir() and
is_file() usually only require a system call for symbolic links;
os.DirEntry.stat() always requires a system call on Unix but only
requires one for symbolic links on Windows.
Python 3.4 and later offer pathlib in the standard library. You could do:
from pathlib import Path
asm_pths = [pth for pth in Path.cwd().iterdir()
if pth.suffix == '.asm']
Or if you don't like list comprehensions:
asm_paths = []
for pth in Path.cwd().iterdir():
if pth.suffix == '.asm':
asm_pths.append(pth)
Path objects can easily be converted to strings.
Here's how I iterate through files in Python:
import os
path = 'the/name/of/your/path'
folder = os.fsencode(path)
filenames = []
for file in os.listdir(folder):
filename = os.fsdecode(file)
if filename.endswith( ('.jpeg', '.png', '.gif') ): # whatever file types you're using...
filenames.append(filename)
filenames.sort() # now you have the filenames and can do something with them
NONE OF THESE TECHNIQUES GUARANTEE ANY ITERATION ORDERING
Yup, super unpredictable. Notice that I sort the filenames, which is important if the order of the files matters, i.e. for video frames or time dependent data collection. Be sure to put indices in your filenames though!
You can use glob for referring the directory and the list :
import glob
import os
#to get the current working directory name
cwd = os.getcwd()
#Load the images from images folder.
for f in glob.glob('images\*.jpg'):
dir_name = get_dir_name(f)
image_file_name = dir_name + '.jpg'
#To print the file name with path (path will be in string)
print (image_file_name)
To get the list of all directory in array you can use os :
os.listdir(directory)
I'm not quite happy with this implementation yet, I wanted to have a custom constructor that does DirectoryIndex._make(next(os.walk(input_path))) such that you can just pass the path you want a file listing for. Edits welcome!
import collections
import os
DirectoryIndex = collections.namedtuple('DirectoryIndex', ['root', 'dirs', 'files'])
for file_name in DirectoryIndex(*next(os.walk('.'))).files:
file_path = os.path.join(path, file_name)
I really like using the scandir directive that is built into the os library. Here is a working example:
import os
i = 0
with os.scandir('/usr/local/bin') as root_dir:
for path in root_dir:
if path.is_file():
i += 1
print(f"Full path is: {path} and just the name is: {path.name}")
print(f"{i} files scanned successfully.")
Get all the .asm files in a directory by doing this.
import os
path = "path_to_file"
file_type = '.asm'
for filename in os.listdir(path=path):
if filename.endswith(file_type):
print(filename)
print(f"{path}/{filename}")
# do something below
I don't understand why some answers are complicated. This is how I would do it with Python 2.7. Replace DIRECTORY_TO_LOOP with the directory you want to use.
import os
DIRECTORY_TO_LOOP = '/var/www/files/'
for root, dirs, files in os.walk(DIRECTORY_TO_LOOP, topdown=False):
for name in files:
print(os.path.join(root, name))

Categories