I have an image sequence path that is as follows : /host_server/master/images/set01a/env_basecolor_default_v001/basecolor_default.*.jpg
In a Pythonic way, is it possible for me to code it so it reads the first file based on the file path given above?
If not, can I have it list the entire sequence, but only files of that naming? Assume there is another sequence called basecolor_default_beta.*.jpg in the same directory.
For #2, if I used os.listdir('/host_server/master/images/set01a/env_basecolor_default_v001'), it would list the files of both image sequences.
The simplest solution seems to be to use several functions.
1) To get ALL of the full filepaths, use
import os

main_path = "/host_server/master/images/set01a/env_basecolor_default_v001/"
all_files = [os.path.join(main_path, filename) for filename in os.listdir(main_path)]
2) To choose only those of a certain kind, use a filter.
beta_files = list(filter(lambda x: "beta" in x, all_files))
beta_files.sort()
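Conversely, to isolate the first sequence and exclude the beta one, the filter can be negated. A minimal sketch; the file list below is hypothetical stand-in data, and the substring test assumes "beta" only appears in the beta sequence's names:

```python
import os

# Hypothetical listing standing in for the os.listdir()-based all_files
all_files = [
    "/proj/basecolor_default.0001.jpg",
    "/proj/basecolor_default_beta.0001.jpg",
    "/proj/basecolor_default.0002.jpg",
]

# Keep only frames whose basename lacks the "beta" marker
main_files = sorted(f for f in all_files
                    if "beta" not in os.path.basename(f))
print(main_files[0])  # first frame of the main sequence
```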
"read the first file based on the above file path given?"
With glob.iglob(pathname, recursive=False), if you need the name/path of the first file found:
import glob
path = '/host_server/master/images/set01a/env_basecolor_default_v001/basecolor_default.*.jpg'
it = glob.iglob(path)
first = next(it)
glob.iglob() - Return an iterator which yields the same values as
glob() without actually storing them all simultaneously.
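One caveat: next(it) raises StopIteration when the pattern matches nothing; passing a default avoids that. A minimal sketch, using a temporary directory as a stand-in for the real path (note that iglob yields in arbitrary order, so sort first if frame order matters):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a couple of dummy frames to match against
    for name in ("basecolor_default.0001.jpg", "basecolor_default.0002.jpg"):
        open(os.path.join(tmp, name), "w").close()

    it = glob.iglob(os.path.join(tmp, "basecolor_default.*.jpg"))
    first = next(it, None)   # None instead of StopIteration on no match
    print(first)

    missing = next(glob.iglob(os.path.join(tmp, "no_such.*.jpg")), None)
    print(missing)           # no matches, so this is None
```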
Try using glob. Something like:
import glob
import os
path = '/host_server/master/images/set01a/env_basecolor_default_v001'
pattern = 'basecolor_default.*.jpg'
filenames = glob.glob(os.path.join(path, pattern))
# read filenames[0]
I'd like to iterate over files in two folders in a directory only, and ignore any other files/directories.
e.g. in paths "dirA/subdirA/folder1" and "dirA/subdirA/folder2"
I tried passing both to pathlib as:
root_dir_A = "dirA/subdirA/folder1"
root_dir_B = "dirA/subdirA/folder2"
for file in Path(root_dir_A, root_dir_B).glob('**/*.json'):
    json_data = open(file, encoding="utf8")
    ...
But it only iterates over the 2nd path in Path(root_dir_A,root_dir_B).
You can't pass two separate directories to Path(). You'll need to loop over them.
for dirpath in (root_dir_A, root_dir_B):
    for file in Path(dirpath).glob('**/*.json'):
        ...
According to the documentation, Path("foo", "bar") should produce "foo/bar"; but it seems to actually use only the second path segment if it is absolute. Either way, it doesn't do what you seemed to hope it would.
Please check the output of Path(root_dir_A,root_dir_B) to see if it returns what you want.
In your specific case this should work:
path_root = Path('dirA')
for path in path_root.glob('subdirA/folder[12]/**/*.json'):
    ...
If your paths aren't homogeneous enough, you might have to chain generators, i.e.:
from itertools import chain
content_dir_A = Path(root_dir_A).glob('**/*.json')
content_dir_B = Path(root_dir_B).glob('**/*.json')
content_all = chain(content_dir_A, content_dir_B)
for path in content_all:
    ...
I am trying to loop over a series of jpg files in a folder. I found example code for that:
for n, image_file in enumerate(os.scandir(image_folder)):
which will loop through the image files in image_folder. However, it seems like it is not following any sequence. I have my file names like 000001.jpg, 000002.jpg, 000003.jpg,... and so on. But when the code runs, it does not follow the sequence:
000213.jpg
000012.jpg
000672.jpg
....
What seems to be the issue here?
Here's the relevant bit on os.scandir():
os.scandir(path='.')
Return an iterator of os.DirEntry objects
corresponding to the entries in the directory given by path. The
entries are yielded in arbitrary order, and the special entries '.'
and '..' are not included.
You should not expect it to be in any particular order. The same goes for listdir() if you were considering this as an alternative.
If you strictly need them to be in order, consider sorting them first:
scanned = sorted(os.scandir(image_folder), key=lambda f: f.name)
for n, image_file in enumerate(scanned):
    # ... rest of your code
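Sorting by name works here only because the names are zero-padded; with unpadded names (1.jpg, 10.jpg, 2.jpg) lexicographic order is wrong, and a numeric key is needed. A sketch, assuming every stem is purely numeric:

```python
import os

# Hypothetical unpadded frame names
names = ["10.jpg", "2.jpg", "1.jpg", "100.jpg"]

# Lexicographic order puts "10.jpg" before "2.jpg" -- usually not what you want
print(sorted(names))

# Sort by the numeric value of the stem instead
numeric = sorted(names, key=lambda n: int(os.path.splitext(n)[0]))
print(numeric)
```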
I prefer to use glob:
The glob module finds all the pathnames matching a specified pattern
according to the rules used by the Unix shell, although results are
returned in arbitrary order. No tilde expansion is done, but *, ?, and
character ranges expressed with [] will be correctly matched.
You will need glob if you handle more complex file structures, so starting with it isn't a bad idea. For your case you can also use os.scandir() as mentioned above.
Reference: glob module
import glob
files = sorted(glob.glob(r"C:\Users\Fabian\Desktop\stack\img\*.jpg"))
for key, myfile in enumerate(files):
    print(key, myfile)
Notice that even if there are other files, like .txt, they won't be in your list.
Output:
C:\Users\Fabian\Desktop\stack>python c:/Users/Fabian/Desktop/stack/img.py
0 C:\Users\Fabian\Desktop\stack\img\img0001.jpg
1 C:\Users\Fabian\Desktop\stack\img\img0002.jpg
2 C:\Users\Fabian\Desktop\stack\img\img0003.jpg
....
I need to get the latest file of a folder using python. While using the code:
max(files, key = os.path.getctime)
I am getting the below error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'a'
Whatever is assigned to the files variable is incorrect. Use the following code.
import glob
import os
list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)
max(files, key=os.path.getctime)
is quite incomplete code. What is files? It is probably a list of file names, coming out of os.listdir().
But that lists only the filename parts (a.k.a. "basenames"), because their path is common. In order to use them correctly, you have to combine each with the path leading to it (and used to obtain it).
Such as (untested):
import os

def newest(path):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)
I lack the reputation to comment, but ctime from Marlon Abeykoon's response did not give the correct result for me. Using mtime does the trick though (key=os.path.getmtime).
import glob
import os
list_of_files = glob.glob('/path/to/folder/*') # * means all if need specific format then *.csv
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)
I found two answers for that problem:
python os.path.getctime max does not return latest
Difference between python - getmtime() and getctime() in unix system
I would suggest using glob.iglob() instead of glob.glob(), as it is more efficient:
glob.iglob() returns an iterator which yields the same values as glob() without actually storing them all simultaneously.
I mostly use the code below to find the latest file matching my pattern:
LatestFile = max(glob.iglob(fileNamePattern), key=os.path.getctime)
NOTE:
There are variants of the max function. For finding the latest file we use this variant:
max(iterable, *[, key, default])
which needs an iterable, so your first parameter should be an iterable.
For finding the max of plain numbers we can use the other variant: max(num1, num2, *args[, key])
I've been using this in Python 3, including pattern matching on the filename.
from pathlib import Path
def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)
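A possible usage sketch (the temporary directory and file names are illustrative; note that st_ctime means creation time on Windows but the time of the last metadata change on Unix):

```python
import time
from pathlib import Path
from tempfile import TemporaryDirectory

def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)

with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old.txt").write_text("first")
    time.sleep(0.1)  # ensure the timestamps differ
    (root / "new.txt").write_text("second")
    result = latest_file(root, "*.txt")
    print(result.name)
```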
Try sorting items by creation time. The example below sorts the files in a folder, newest first, and takes the first element, which is the latest.
import glob
import os
folder = '/path/to/folder'
files_path = os.path.join(folder, '*')
files = sorted(glob.iglob(files_path), key=os.path.getctime, reverse=True)
print(files[0])
Most of the answers are correct, but if the requirement is to get the latest two or three files, they could fail or need modification.
I found the sample below more useful and relevant, as the same code can be used to get the latest 2, 3, or n files too.
import glob
import os
folder_path = "/Users/sachin/Desktop/Files/"
files_path = os.path.join(folder_path, '*')
files = sorted(glob.iglob(files_path), key=os.path.getctime, reverse=True)
print(files[0])             # latest file
print(files[0], files[1])   # latest two files
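For the latest n files, heapq.nlargest avoids sorting the entire listing; here is a sketch using hypothetical (filename, ctime) pairs in place of a real directory scan:

```python
import heapq

# Hypothetical (filename, ctime) pairs standing in for a directory listing
entries = [("a.txt", 100), ("b.txt", 300), ("c.txt", 200), ("d.txt", 250)]

# Latest two entries by timestamp, without fully sorting the list
latest_two = heapq.nlargest(2, entries, key=lambda e: e[1])
print([name for name, _ in latest_two])
```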
A much faster method on Windows (~0.05 s) is to call a bat script that does this:
get_latest.bat
@echo off
for /f %%i in ('dir \\directory\in\question /b/a-d/od/t:c') do set LAST=%%i
echo %LAST%
where \\directory\in\question is the directory you want to investigate.
get_latest.py
from subprocess import Popen, PIPE
p = Popen("get_latest.bat", shell=True, stdout=PIPE,)
stdout, stderr = p.communicate()
print(stdout, stderr)
If it finds a file, stdout is the path and stderr is None.
Use stdout.decode("utf-8").rstrip() to get the usable string representation of the file name.
First define a function get_latest_file:
def get_latest_file(path, *paths):
    fullpath = os.path.join(path, *paths)
    ...

get_latest_file('example', 'files', 'randomtext011.*.txt')
You may also use a docstring!
def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
If you use Python 3, you can use iglob instead.
Complete code to return the name of latest file:
import glob
import os

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
    files = glob.glob(fullpath)  # You may use iglob in Python 3
    if not files:                # I prefer using the negation
        return None              # because it behaves like a shortcut
    latest_file = max(files, key=os.path.getctime)
    _, filename = os.path.split(latest_file)
    return filename
I tried to use the above suggestions and my program crashed; then I figured out that the file I was trying to identify was in use, and 'os.path.getctime' crashed on it.
What finally worked for me was:
files_before = glob.glob(os.path.join(my_path,'*'))
**code where new file is created**
new_file = set(files_before).symmetric_difference(set(glob.glob(os.path.join(my_path,'*'))))
This code gets the uncommon objects between the two sets of file lists.
It's not the most elegant, and if multiple files are created at the same time it probably won't be stable.
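The same idea can be written as a plain set difference, which reads a bit more directly (a sketch; the temporary directory stands in for my_path, and the created file is illustrative):

```python
import glob
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory() as my_path:
    open(os.path.join(my_path, "existing.txt"), "w").close()
    files_before = set(glob.glob(os.path.join(my_path, "*")))

    # ... code where the new file is created ...
    open(os.path.join(my_path, "created.txt"), "w").close()

    files_after = set(glob.glob(os.path.join(my_path, "*")))
    new_files = files_after - files_before  # only the freshly created files
    print([os.path.basename(f) for f in new_files])
```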
I'm fairly new to Python. I'd like to save the text that is printed by this script as a variable. (The variable is meant to be written to a file later, if that matters.) How can I do that?
import fnmatch
import os
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
You can store it in a variable like this:
import fnmatch
import os
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_var = file
        # do your stuff
or you can store it in a list for later use:
import fnmatch
import os
my_match = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_match.append(file)  # append inserts the value at the end of the list
# do stuff with my_match list
You can store it in a list:
import fnmatch
import os
matches = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        matches.append(file)
Both answers already provided are correct, but Python provides a nice alternative. Since iterating through an array and appending to a list is such a common pattern, the list comprehension was created as a one-stop shop for the process.
import fnmatch
import os
matches = [filename for filename in os.listdir("/Users/x/y") if fnmatch.fnmatch(filename, "*.txt")]
While NSU's answer and the others are all perfectly good, there may be a simpler way to get what you want.
Just as fnmatch tests whether a certain file matches a shell-style wildcard, glob lists all files matching a shell-style wildcard. In fact:
This is done by using the os.listdir() and fnmatch.fnmatch() functions in concert…
So, you can do this:
import glob
matches = glob.glob("/Users/x/y/*.txt")
But notice that in this case, you're going to get full pathnames like '/Users/x/y/spam.txt' rather than just 'spam.txt', which may not be what you want. Often, it's easier to keep the full pathnames around and os.path.basename them when you want to display them, than to keep just the base names around and os.path.join them when you want to open them… but "often" isn't "always".
Also notice that I had to manually paste the "/Users/x/y/" and "*.txt" together into a single string, the way you would at the command line. That's fine here, but if, say, the first one came from a variable, rather than hardcoded into the source, you'd have to use os.path.join(basepath, "*.txt"), which isn't quite as nice.
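The join/basename round trip described above can be sketched like this (the paths are illustrative):

```python
import os.path

basepath = "/Users/x/y"
pattern = os.path.join(basepath, "*.txt")   # build the glob pattern from a variable
print(pattern)

fullpath = os.path.join(basepath, "spam.txt")
print(os.path.basename(fullpath))           # just the name, for display
```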
By the way, if you're using Python 3.4 or later, you can get the same thing out of the higher-level pathlib library:
import pathlib
matches = list(pathlib.Path("/Users/x/y/").glob("*.txt"))
Maybe defining a utility function is the right path to follow...
def list_ext_in_dir(e, d):
    """e=extension, d=directory => list of matching filenames.
    If the directory d cannot be listed, returns None."""
    from fnmatch import fnmatch
    from os import listdir
    try:
        dirlist = listdir(d)
    except OSError:
        return None
    return [fname for fname in dirlist if fnmatch(fname, e)]
I have put the listdir call inside a try/except clause to catch the possibility that we cannot list the directory (non-existent, no read permission, etc). The treatment of errors is a bit simplistic, but...
The list of matching filenames is built using a so-called list comprehension, which is something you should investigate as soon as possible if you're going to use Python for your programs.
To close my post, a usage example:
l_txtfiles = list_ext_in_dir('*.txt', '/Users/x/y')
I have a little task for my company.
I have multiple files which start with swale-randomnumber.
I want to copy them to some directory (does shutil.copy allow wildmasks?).
Anyway, I then want to choose the largest file, rename it to sync.dat, and then run a program.
I get the logic: I will use a loop to do each individual piece of work and then move on to the next. But I am unsure how to choose a single largest file, or a single file at all for that matter, as when I type in swale* surely it will just choose them all?
Sorry I haven't written any source code yet; I am still trying to get my head around how this will work.
Thanks for any help you may provide
The accepted answer of this question proposes a nice portable implementation of file copy with wildcard support:
from glob import iglob
from shutil import copy
from os.path import basename, join

def copy_files(src_glob, dst_folder):
    for fname in iglob(src_glob):
        # use the basename so absolute source paths still land inside dst_folder
        copy(fname, join(dst_folder, basename(fname)))
If you want to compare file sizes, you can use either of these functions:
import os
os.path.getsize(path)
os.stat(path).st_size
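Both return the size in bytes and agree with each other; a quick sketch using a throwaway temporary file:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")          # 5 bytes
    name = f.name

size_a = os.path.getsize(name)
size_b = os.stat(name).st_size
print(size_a, size_b)
os.remove(name)                # clean up the temporary file
```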
This might work:
import os.path
import glob
import shutil
source = "My Source Path" # Replace these variables with the appropriate data
dest = "My Dest Path"
command = "My command"
# Find the files that need to be copied
files = glob.glob(os.path.join(source, "swale-*"))
# Copy the files to the destination
for file in files:
    shutil.copy(file, dest)

# Pick the biggest file by size
biggest = max(files, key=os.path.getsize)

# Rename the copied version of that biggest file to sync.dat
shutil.move(os.path.join(dest, os.path.basename(biggest)),
            os.path.join(dest, "sync.dat"))
# Run the command
os.system( command )
# Only use os.system if you know your command is completely secure and you don't need the output. Use the subprocess module if you need more security or need the output.
Note : None of this is tested - but it should work
from os import listdir
from os.path import getsize, isfile, join

directory = '/your/directory/'
# You now have a list of files in directory that start with "swale-"
fileList = [join(directory, f) for f in listdir(directory) if f.startswith("swale-") and isfile(join(directory, f))]
# Order it by file size - from big to small
fileList.sort(key=getsize, reverse=True)
# First file in array is biggest
biggestFile = fileList[0]
# Do whatever you want with these files - using shutil.*, os.*, or anything else...
# ...
# ...