Find the newest folder in a directory in Python

I am trying to write an automated script that enters the most recently created folder.
I have some code below:
import datetime, os, shutil
today = datetime.datetime.now().isoformat()
file_time = datetime.datetime.fromtimestamp(os.path.getmtime('/folders*'))
if file_time < today:
    changedirectory('/folders*')
I am not sure how to get this to check the latest timestamp from now. Any ideas?
Thanks

There is no actual trace of the "time created" in most OSes / filesystems: what you get as mtime is the time a file or directory was last modified (so, for example, creating a file in a directory updates the directory's mtime) -- and, from ctime where offered, the time of the latest inode change (so it would be updated by creating or removing a sub-directory).
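For instance, a quick way to see the distinction (a minimal sketch; the path is hypothetical):
import os

st = os.stat('/some/dir')  # hypothetical path
print(st.st_mtime)  # last modification of the directory's contents
print(st.st_ctime)  # last inode change on Unix; creation time on Windows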
Assuming you're fine with, e.g., "last modified" (and your use of "created" in the question was just an error), you can find all the subdirectories of the current directory:
import os
all_subdirs = [d for d in os.listdir('.') if os.path.isdir(d)]
and get the one with the latest mtime (in Python 2.5 or better):
latest_subdir = max(all_subdirs, key=os.path.getmtime)
If you need to operate elsewhere than the current directory, it's not very different, e.g.:
def all_subdirs_of(b='.'):
    result = []
    for d in os.listdir(b):
        bd = os.path.join(b, d)
        if os.path.isdir(bd):
            result.append(bd)
    return result
The latest_subdir assignment does not change: given, as all_subdirs, any list of paths (be they paths of directories or files), that max call gets the latest-modified one.
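For example, combining the helper with the same max call (a quick sketch, assuming the base path exists):
latest_subdir = max(all_subdirs_of('/some/base/dir'), key=os.path.getmtime)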

One-liner to find the latest:
# Find latest (directory is the parent path to search)
import os, glob
max(glob.glob(os.path.join(directory, '*/')), key=os.path.getmtime)
One-liner to find the nth latest:
# Find nth latest (n = 1 is the latest)
import os, glob
sorted(glob.glob(os.path.join(directory, '*/')), key=os.path.getmtime)[-n]

And a quick one-liner:
directory = 'some/path/to/the/main/dir'
max([os.path.join(directory,d) for d in os.listdir(directory)], key=os.path.getmtime)
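Note that this considers every entry in the directory, files included; if you only want subdirectories, a filtered variant (a small sketch) is:
import os

directory = 'some/path/to/the/main/dir'
subdirs = [os.path.join(directory, d) for d in os.listdir(directory)
           if os.path.isdir(os.path.join(directory, d))]
newest = max(subdirs, key=os.path.getmtime)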

Python 3.4+
With pathlib the solution is still a one-liner. Note that os is still needed for getmtime, and that glob patterns ending in '/' only match directories reliably on Python 3.11+, so filtering with is_dir() is more portable.
Find latest:
import os
import pathlib
max((d for d in pathlib.Path(directory).iterdir() if d.is_dir()), key=os.path.getmtime)
To get the nth latest:
import os
import pathlib
sorted((d for d in pathlib.Path(directory).iterdir() if d.is_dir()), key=os.path.getmtime)[-n]
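Here directory and n are placeholders; a concrete call might look like this (the path is hypothetical):
import os
import pathlib

directory = '/var/backups'  # hypothetical path
n = 2  # the second-latest
dirs = [d for d in pathlib.Path(directory).iterdir() if d.is_dir()]
print(max(dirs, key=os.path.getmtime))         # latest
print(sorted(dirs, key=os.path.getmtime)[-n])  # nth latest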

Here's one way to find the latest directory:
import os
import operator

alist = {}
directory = os.path.join("/home", "path")
os.chdir(directory)
for file in os.listdir("."):
    if os.path.isdir(file):
        # map each directory's absolute path to its mtime
        timestamp = os.path.getmtime(file)
        alist[os.path.join(os.getcwd(), file)] = timestamp

# iterate in ascending mtime order; the last assignment wins
for path, timestamp in sorted(alist.items(), key=operator.itemgetter(1)):
    latest = path
# equivalently: latest = sorted(alist.items(), key=operator.itemgetter(1))[-1][0]
print("newest directory is", latest)
os.chdir(latest)

import os, operator

dir = "/"
folders = [(f, os.path.getmtime(os.path.join(dir, f)))
           for f in os.listdir(dir) if os.path.isdir(os.path.join(dir, f))]
(newest_folder, mtime) = sorted(folders, key=operator.itemgetter(1), reverse=True)[0]

Related

Python script that finds recently edited files (edited two days before till now)

I am trying to come up with a Python script to find all recently edited files on the computer.
Below is the code.
import os
import datetime as dt

now = dt.datetime.now()
ago = now - dt.timedelta(hours=48)
for root, dirs, files in os.walk('.'):
    for fname in files:
        path = os.path.join(root, fname)
        st = os.stat(path)
        mtime = dt.datetime.fromtimestamp(st.st_mtime)
        if mtime > ago:
            print('%s modified %s' % (path, mtime))
On Linux, mtime is returned as UTC, whereas on Windows it's localized.
Also, datetime objects are naive by default (they don't carry timezone information), so we're better off working with raw timestamps.
Here's a working version that uses the ofunctions.file_utils package to traverse paths. ofunctions.file_utils also happens to have a file_creation_date() function ;)
First install the package with
python -m pip install ofunctions.file_utils
Then use the following script:
import os
import datetime as dt
from ofunctions.file_utils import get_files_recursive

if os.name == 'nt':
    now = dt.datetime.now().timestamp()
else:
    now = dt.datetime.utcnow().timestamp()
ago = 86400 * 2  # since we deal in timestamps, declare 2 days in seconds

files = get_files_recursive('.')
for file in files:
    mtime = os.stat(file).st_mtime
    if mtime > now - ago:
        print('%s modified %s' % (file, mtime))
DISCLAIMER: I'm the author of ofunctions
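If you'd rather stay with the standard library, a minimal sketch of the same idea (timestamps plus a recursive walk) would be:
import os
import time

now = time.time()
ago = 86400 * 2  # two days in seconds

for root, dirs, files in os.walk('.'):
    for fname in files:
        path = os.path.join(root, fname)
        mtime = os.stat(path).st_mtime
        if mtime > now - ago:
            print('%s modified %s' % (path, mtime))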

How to find oldest and newest file in a directory?

My code should find the newest and oldest files in a folder and its subfolders. It works for the top-level folder but it doesn't include files within subfolders.
import os
import glob
mypath = 'C:/RDS/*'
print(min(glob.glob(mypath), key=os.path.getmtime))
print(max(glob.glob(mypath), key=os.path.getmtime))
How do I make it recurse into the subfolders?
Try using pathlib. Also note that getmtime gives the last-modified time; if you want the time the file was created, use getctime (which is the creation time on Windows, but the last inode change on Unix).
If you strictly want only files:
import os
import pathlib
mypath = 'your path'
taggedrootdir = pathlib.Path(mypath)
print(min([f for f in taggedrootdir.resolve().glob('**/*') if f.is_file()], key=os.path.getctime))
print(max([f for f in taggedrootdir.resolve().glob('**/*') if f.is_file()], key=os.path.getctime))
if results may include folders:
import os
import pathlib
mypath = 'your path'
taggedrootdir = pathlib.Path(mypath)
print(min(taggedrootdir.resolve().glob('**/*'), key=os.path.getctime))
print(max(taggedrootdir.resolve().glob('**/*'), key=os.path.getctime))
As the docs show, you can add a recursive=True keyword argument to glob.glob(), but it only takes effect when the pattern contains '**', so your code becomes:
import os
import glob

mypath = 'C:/RDS/**/*'  # '**' is what recursive=True acts on
print(min(glob.glob(mypath, recursive=True), key=os.path.getmtime))
print(max(glob.glob(mypath, recursive=True), key=os.path.getmtime))
This should give you the oldest and newest file in your folder and all its subfolders.
Pay attention to the OS filepath separator: "/" (on Unix) vs. "\" (on Windows).
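Building the pattern with os.path.join sidesteps the separator issue; for example (a small sketch of the same search):
import glob
import os

mypath = os.path.join('C:/RDS', '**', '*')  # portable pattern building
print(min(glob.glob(mypath, recursive=True), key=os.path.getmtime))
print(max(glob.glob(mypath, recursive=True), key=os.path.getmtime))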
You can try something like the snippet below.
It saves the file list in a variable, which is faster than traversing the file system twice.
There is one line for debugging; comment it out in production.
import os
import glob

mypath = r'D:\RDS\**'  # raw string so the backslashes survive
allFilesAndFolders = glob.glob(mypath, recursive=True)
# just for debugging
print(allFilesAndFolders)
print(min(allFilesAndFolders, key=os.path.getmtime))
print(max(allFilesAndFolders, key=os.path.getmtime))
Here's a fairly efficient way of doing it. It determines the oldest and newest files by iterating through them all exactly once. Since it uses iteration, there's no need to first build a list of them and go through it twice to determine the two extremes.
import os
import pathlib

def max_min(iterable, keyfunc=None):
    if keyfunc is None:
        keyfunc = lambda x: x  # identity
    iterator = iter(iterable)
    most = least = next(iterator)
    mostkey = leastkey = keyfunc(most)
    for item in iterator:
        key = keyfunc(item)
        if key > mostkey:
            most = item
            mostkey = key
        elif key < leastkey:
            least = item
            leastkey = key
    return most, least

mypath = '.'
files = (f for f in pathlib.Path(mypath).resolve().glob('**/*') if f.is_file())
# max_min returns (most, least), i.e. (newest, oldest) when keyed on mtime
newest, oldest = max_min(files, keyfunc=os.path.getmtime)
print(f'oldest file: {oldest}')
print(f'newest file: {newest}')

Is there a way to loop through folder structure to find file name?

I want to be able to run a script that looks for a specific file name containing some text and a date three days out, and returns a yes/no response based on the findings. I want to call a PowerShell script that would do this from a master Python script. Basically I want the script to look in a subfolder called "PACP" and find a file with a name like test_%date%_deliverable.mdb, and if, say, it's misspelled, return a line noting the error. Are there any examples of scripts like this?
You don't have to do this by hand in Python. You can use the glob module like this:
import glob

date = "2021-01-01"
# recursive=True is needed for '**' to match nested subfolders
for f in glob.glob(f"PACP/**/test_{date}_deliverable.mdb", recursive=True):
    # do something with the file matching the pattern
    print(f)
Depending on what you want to do with the files, you can also use the pathlib module; Path objects support the same syntax with their glob() method, as in the sketch below.
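A pathlib sketch of the same search might look like this (the pattern mirrors the glob call above; Path.glob recurses on '**' without extra flags):
from pathlib import Path

date = "2021-01-01"
for f in Path("PACP").glob(f"**/test_{date}_deliverable.mdb"):
    print(f)  # do something with the matching file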
Something like this?
import os
from datetime import date

# get the working directory
dir_path = os.path.dirname(os.path.realpath(__file__))
# get the current date
today = date.today()
formattedDate = today.strftime("%d-%m-%Y")
# search for the file
for root, dirs, files in os.walk(dir_path + "/PACP"):
    for file in files:
        if file == "test_" + formattedDate + "_deliverable.mdb":
            print("found file")
            quit()
print("file not found")
quit()
This is for checking misspelled file names in a directory:
from pathlib import Path
import re

for path in Path('PACP').rglob('*.mdb'):
    m = re.match(r"(test)_(\d{6})_(deliverable)", path.name)
    if m is None:
        print(path.name)
For example, using a list of files from a directory:
files = [
    'test_210127_deliverable.mdb',
    'ttes_210127_derivrablle.mdb',    # misspelled
    'tset_2101327_deliveraxxle.mdb',  # misspelled
    'test_210128_deliverable.mdb',
    'test_210127_seliverable.mdb',    # misspelled
    'test_2101324_deliverable.mdb']   # misspelled
for s in files:
    m = re.match(r"(test)_(\d{6})_(deliverable)", s)
    if m is None:
        print(s)
misspelled output:
ttes_210127_derivrablle.mdb
tset_2101327_deliveraxxle.mdb
test_210127_seliverable.mdb
test_2101324_deliverable.mdb
Or, for much more precise date matching (yyyy mm dd) with a choice of three separators (_, -, .) that must be used consistently:
files = [
    'test_2021-01-27_deliverable.mdb',
    'test_2021.01.27_deliverable.mdb',
    'test_2021_01_27_deliverable.mdb',
    'test_2021-21-27_deliverable.mdb',
    'test_2021-01-72_deliverable.mdb',
    'tets_2021-01-27_deliverable.mdb',
    'test_2021-01-27_delivvrable.mdb']
for s in files:
    # \3 back-references the separator group, enforcing a consistent separator
    m = re.match(r"(test)_(19|20)\d\d([-._])(0[1-9]|1[012])\3(0[1-9]|[12][0-9]|3[01])_(deliverable)", s)
    if m is None:
        print(s)
Misspelled output (with the more precise date matching):
test_2021-21-27_deliverable.mdb
test_2021-01-72_deliverable.mdb
tets_2021-01-27_deliverable.mdb
test_2021-01-27_delivvrable.mdb
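If regex-only date validation feels brittle, an alternative (not from the original answer) is to let datetime reject impossible dates; a sketch for the '-' separator variant:
import re
from datetime import datetime

def is_valid_name(name):
    m = re.match(r"test_(\d{4}-\d{2}-\d{2})_deliverable\.mdb$", name)
    if m is None:
        return False
    try:
        # strptime rejects month 21, day 72, etc.
        datetime.strptime(m.group(1), "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(is_valid_name('test_2021-01-27_deliverable.mdb'))  # True
print(is_valid_name('test_2021-21-27_deliverable.mdb'))  # False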

Loop over multiple folders from list with glob.glob

How can I loop over a defined list of folders and all of the individual files inside each of those folders?
I'm trying to have it copy all the months in each year folder, but when I run it nothing happens.
import shutil
import glob

P4_destdir = ('Z:/Source P4')
yearlist = ['2014','2015','2016']
for year in yearlist:
    for file in glob.glob(r'{0}/*.csv'.format(yearlist)):
        print(file)
        shutil.copy2(file, P4_destdir)
I think the problem is that you format the pattern with yearlist (the whole list) instead of year, so the pattern never matches a real folder; you may also want a trailing / in your source path:
import shutil
import glob

P4_destdir = ('Z:/Source P4/')
yearlist = ['2014','2015','2016']  # assuming these folders are in the same directory as your code
for year in yearlist:
    for file in glob.glob(r'{0}/*.csv'.format(year)):
        print(file)
        shutil.copy2(file, P4_destdir)
Another thing that might be a problem is if the destination folder does not yet exist. You can create it using os.mkdir:
import os

dest_exists = os.path.isdir('Z:/Source P4/')  # tests whether the folder exists
if not dest_exists:
    os.mkdir('Z:/Source P4/')
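On Python 3, the test-then-create pair can be collapsed with os.makedirs and exist_ok (a small sketch):
import os

os.makedirs('Z:/Source P4/', exist_ok=True)  # creates the folder only if it is missing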

How to delete a file by extension in Python?

I was messing around just trying to make a script that deletes items by ".zip" extension.
import sys
import os
from os import listdir

test = os.listdir("/Users/ben/downloads/")
for item in test:
    if item.endswith(".zip"):
        os.remove(item)
Whenever I run the script I get:
OSError: [Errno 2] No such file or directory: 'cities1000.zip'
cities1000.zip is obviously a file in my downloads folder.
What did I do wrong here? Is the issue that os.remove requires the full path to the file? If that is the issue, then how can I do that in the current script without completely rewriting it?
You can set the path in a dir_name variable, then use os.path.join for your os.remove:
import os

dir_name = "/Users/ben/downloads/"
test = os.listdir(dir_name)
for item in test:
    if item.endswith(".zip"):
        os.remove(os.path.join(dir_name, item))
For this operation you need to append the file name to the folder path so the command knows which folder you are looking into.
You can do this correctly and portably in Python using the os.path.join command.
For example:
import os

directory = "/Users/ben/downloads/"
test = os.listdir(directory)
for item in test:
    if item.endswith(".zip"):
        os.remove(os.path.join(directory, item))
An alternate approach that avoids join-ing yourself over and over: use the glob module to join once, then let it give you back the paths directly.
import glob
import os

dir = "/Users/ben/downloads/"
for zippath in glob.iglob(os.path.join(dir, '*.zip')):
    os.remove(zippath)
I think you could use pathlib, a modern way, like the following:
import pathlib

dir = pathlib.Path("/Users/ben/downloads/")
zip_files = dir.glob("*.zip")  # glob takes a pattern string, not a path
for zf in zip_files:
    zf.unlink()
If you want to delete all zip files recursively, just write:
import pathlib

dir = pathlib.Path("/Users/ben/downloads/")
zip_files = dir.rglob("*.zip")  # recursive
for zf in zip_files:
    zf.unlink()
Just leaving my two cents on this issue: if you want to be chic you can use glob or iglob from the glob package, like so:
import glob
import os

files_in_dir = glob.glob('/Users/ben/downloads/*.zip')
# or if you want to be fancy, you can use iglob, which returns an iterator:
files_in_dir = glob.iglob('/Users/ben/downloads/*.zip')
for _file in files_in_dir:
    print(_file)  # just to be sure, you know how it is...
    os.remove(_file)
import os

origfolder = "/Users/ben/downloads/"
test = os.listdir(origfolder)
for item in test:
    if item.endswith(".zip"):
        os.remove(os.path.join(origfolder, item))
The dirname is not included in the os.listdir output. You have to attach it to reference the file from the list returned by that function.
Prepend the directory to the filename
os.remove("/Users/ben/downloads/" + item)
EDIT: or change the current working directory using os.chdir.
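The os.chdir variant would look something like this (a sketch; after the chdir, the relative names from os.listdir resolve against the downloads folder):
import os

os.chdir("/Users/ben/downloads/")
for item in os.listdir("."):
    if item.endswith(".zip"):
        os.remove(item)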
