A Pythonic way to delete older logfiles - python

I'm cleaning up log files whenever there are more than 50 of them, deleting the oldest first. This is the only approach I've been able to come up with, and I feel like there is a better way to do it. I'm also currently getting a pylint warning about the lambda assigned to get_time.
import os
import pathlib

def clean_logs():
    log_path = "Runtime/logs/"
    max_log_files = 50

    def sorted_log_list(path):
        get_time = lambda f: os.stat(os.path.join(path, f)).st_mtime
        return list(sorted(os.listdir(path), key=get_time))

    del_list = sorted_log_list(log_path)[0:(len(sorted_log_list(log_path)) - max_log_files)]
    for x in del_list:
        pathlib.Path(pathlib.Path(log_path).resolve() / x).unlink(missing_ok=True)

clean_logs()

The two simplified solutions below accomplish different tasks, so I've included both for flexibility. Obviously, you can wrap this in a function if you like.
Both code examples break down into the following steps:
1. Set the date delta (as an epoch reference) for mtime comparison, as N days prior to today.
2. Collect the full path to all files matching a given extension.
3. Create a generator (or list) to hold the files to be deleted, using mtime as a reference.
4. Iterate the results and delete all applicable files.
Removing log files older than (n) days:
import os
from datetime import datetime as dt
from glob import glob
# Setup
path = '/tmp/logs/'
days = 5
ndays = dt.now().timestamp() - days * 86400
# Collect all files.
files = glob(os.path.join(path, '*.sql.gz'))
# Choose files to be deleted.
to_delete = (f for f in files if os.stat(f).st_mtime < ndays)
# Delete files older than (n) days.
for f in to_delete:
    os.remove(f)
Keeping the (n) latest log files
To keep the (n) latest log files, simply replace the to_delete definition above with:
n = 50
to_delete = sorted(files, key=lambda x: os.stat(x).st_mtime)[:len(files)-n]
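As mentioned above, this can be wrapped in a function; a minimal sketch combining the two pieces (the function and parameter names here are just suggestions):
import os
from glob import glob

def prune_logs(path, pattern='*.sql.gz', keep=50):
    """Delete all but the (keep) most recently modified files matching pattern."""
    files = glob(os.path.join(path, pattern))
    # A negative slice bound simply yields an empty list when there are fewer than (keep) files.
    for f in sorted(files, key=lambda x: os.stat(x).st_mtime)[:len(files) - keep]:
        os.remove(f)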

Related

Is there a better way to do this? Counting Files, and directories via for loop vs map

Folks,
I'm trying to optimize this to help speed up the process...
What I am doing is creating a dictionary of scandir entries...
e.g.
fs_data = {}
for item in Path(fqpn).iterdir():
    # snipped out a bunch of normalization code
    fs_data[item.name.title().strip()] = item
{'file1': <file1 scandir entry>, etc}
and then later using a function to gather the count of files and directories in the data.
Now I suspect that the new code, using map, could be optimized to be faster than the old code. I suspect the slowdown comes from having to run the comprehension twice, once for files and once for directories, but I can't think of a way to optimize it so it only has to run once.
Can anyone suggest a way to sum the files and directories at the same time in the new version? (I could fall back to the old code, if necessary.)
Or am I over-optimizing at this point?
Any feedback would be welcome.
def new_fs_counts(fs_entries) -> (int, int):
    """
    Quickly count the files vs directories in a list of scandir entries

    Used primarily by sync_database_disk to count a path's files & directories

    Parameters
    ----------
    fs_entries (list) - list of scandir entries

    Returns
    -------
    tuple - (# of files, # of dirs)
    """
    def counter(fs_entry):
        return (fs_entry.is_file(), not fs_entry.is_file())

    mapdata = list(map(counter, fs_entries.values()))
    files = sum(files for files, _ in mapdata)
    dirs = sum(dirs for _, dirs in mapdata)
    return (files, dirs)
vs
def old_fs_counts(fs_entries) -> (int, int):
    """
    Quickly count the files vs directories in a list of scandir entries

    Used primarily by sync_database_disk to count a path's files & directories

    Parameters
    ----------
    fs_entries (list) - list of scandir entries

    Returns
    -------
    tuple - (# of files, # of dirs)
    """
    files = 0
    dirs = 0
    for fs_item in fs_entries:
        is_file = fs_entries[fs_item].is_file()
        files += is_file
        dirs += not is_file
    return (files, dirs)
map is fast here if you map the is_file function directly:
files = sum(map(os.DirEntry.is_file, fs_entries.values()))
dirs = len(fs_entries) - files
(Something with filter might be even faster, at least if most entries aren't files. Or filter with is_dir if that works for you and most entries aren't directories. Or itertools.filterfalse with is_file. Or using itertools.compress. Also, counting True with list.count or operator.countOf instead of summing bools might be faster. But all of these ideas take more code (and some also memory). I'd prefer my above way.)
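For reference, untested sketches of a couple of those alternatives, assuming the same fs_entries dict of os.DirEntry objects as above:
import os
from itertools import filterfalse
from operator import countOf

entries = fs_entries.values()

# filter: iterate only the entries that are files
files = sum(1 for _ in filter(os.DirEntry.is_file, entries))

# itertools.filterfalse: iterate only the entries that are NOT files
dirs = sum(1 for _ in filterfalse(os.DirEntry.is_file, entries))

# operator.countOf: count True values instead of summing booleans
files = countOf(map(os.DirEntry.is_file, entries), True)
dirs = len(fs_entries) - files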
Okay, map is definitely not the right answer here.
This morning I got up and created a test using timeit...
and it was a bit of a splash of reality to the face.
Without optimizations, the new map-based code took roughly 2x as long as the old code.
New : 0.023185124970041215
old : 0.011841499945148826
I really ended up falling for a bit of clickbait, thinking that rewriting with map would gain some efficiency.
For the sake of completeness.
from timeit import timeit
import os

new = '''
def counter(fs_entry):
    files = fs_entry.is_file()
    return (files, not files)

mapdata = list(map(counter, fs_entries.values()))
files = sum(files for files, _ in mapdata)
dirs = sum(dirs for _, dirs in mapdata)
#dirs = len(fs_entries)-files
'''

old = '''
files = 0
dirs = 0
for fs_item in fs_entries:
    is_file = fs_entries[fs_item].is_file()
    files += is_file
    dirs += not is_file
'''

fs_location = '/Volumes/4TB_Drive/gallery/albums/collection1'
fs_data = {}
for item in os.scandir(fs_location):
    fs_data[item.name] = item

print("New : ", timeit(stmt=new, number=1000, globals={'fs_entries': fs_data}))
print("old : ", timeit(stmt=old, number=1000, globals={'fs_entries': fs_data}))
And while I was able to close the gap with some optimizations (thank you, Lee, for your suggestion):
New : 0.10864979098550975
old : 0.08246175001841038
It is clear that the for loop solution is easier to read, faster, and just simpler.
The speed difference between new and old doesn't seem to come from map specifically.
The duplicate sum statement added about .021, and the biggest slowdown was the second fs_entry.is_file call, which added about .06 to the timings...

Is it possible to do dbutils io asynchronously?

I've written some code (based on https://stackoverflow.com/a/40199652/529618) that writes partitioned data to blob storage, and for the most part it's quite quick. The slowest part is that the one CSV file per partition that I have Spark generate is named in a user-unfriendly way, so I do a simple rename operation to clean them up (and delete some excess files). This takes much longer than writing the data in the first place.
# Organize the data into folders matching the specified partitions, with a single CSV per partition
from datetime import datetime

def one_file_per_partition(df, path, partitions, sort_within_partitions, VERBOSE = False):
    extension = ".csv.gz"  # TODO: Support multiple extensions
    start = datetime.now()
    df.repartition(*partitions).sortWithinPartitions(*sort_within_partitions) \
        .write.partitionBy(*partitions).option("header", "true").option("compression", "gzip").mode("overwrite").csv(path)
    log(f"Wrote {get_df_name(df)} data partitioned by {partitions} and sorted by {sort_within_partitions} to:" +
        f"\n  {path}\n  Time taken: {(datetime.now() - start).total_seconds():,.2f} seconds")

    # Recursively traverse all partition subdirectories and rename + move the CSVs to their root
    # TODO: This is very slow, it should be parallelizable
    def traverse(root, remaining_partitions):
        if VERBOSE: log(f"Traversing partitions by {remaining_partitions[0]} within folder: {root}")
        for folder in list_subfolders(root):
            subdirectory = os.path.join(root, folder)
            if(len(remaining_partitions) > 1):
                traverse(subdirectory, remaining_partitions[1:])
            else:
                destination = os.path.join(root, folder[len(f"{remaining_partitions[0]}="):]) + extension
                if VERBOSE: log(f"Moving file\nFrom:{subdirectory}\n  To:{destination}")
                spark_output_to_single_file(subdirectory, destination, VERBOSE)

    log(f"Cleaning up spark output directories...")
    start = datetime.now()
    traverse(path, partitions)
    log(f"Moving output files to their destination took {(datetime.now() - start).total_seconds():,.2f} seconds")


# Convert a single-file spark output folder into a single file at the specified location, and clean up superfluous artifacts
def spark_output_to_single_file(output_folder, destination_path, VERBOSE = False):
    output_files = [x for x in dbutils.fs.ls(output_folder) if x.name.startswith("part-")]
    if(len(output_files) == 0):
        raise FileNotFoundError(f"Could not find any output files (prefixed with 'part-') in the specified spark output folder: {output_folder}")
    if(len(output_files) > 1):
        raise ValueError(f"The specified spark folder has more than 1 output file in the specified spark output folder: {output_folder}\n" +
                         f"We found {len(output_files)}: {[x.name for x in output_files]}\n" +
                         f"This function should only be used for single-file spark outputs.")
    dbutils.fs.mv(output_files[0].path, destination_path)
    # Clean up all the other spark output generated to our temp folder
    dbutils.fs.rm(output_folder, recurse=True)
    if VERBOSE: log(f"Successfully wrote {destination_path}")
Here is a sample output:
2022-04-22 20:36:45.313963 Wrote df_test data partitioned by ['Granularity', 'PORTINFOID'] and sorted by ['Rank'] to: /mnt/.../all_data_by_rank
Time taken: 19.31 seconds
2022-04-22 20:36:45.314020 Cleaning up spark output directories...
2022-04-22 20:37:42.583850 Moving output files to their destination took 57.27 seconds
I believe the reason is that I'm processing the folders sequentially; if I could simply do it in parallel, it would go much quicker.
The problem is that all IO on Databricks is done with dbutils, which abstracts away the mounted blob container and makes this sort of thing very easy. I just can't find any information about doing async IO with this utility.
Does anyone know how I could attempt to parallelize this activity?
The solution wound up being to abandon dbutils, which does not support parallelism in any way, and instead use os operations, which do:
import os
from datetime import datetime
from pyspark.sql.types import StringType

# Recursively traverse all partition subdirectories and rename + move the outputs to their root
# NOTE: The code to do this sequentially is much simpler, but very slow.
#       The complexity arises from parallelising the file operations
def spark_output_to_single_file_per_partition(root, partitions, output_extension, VERBOSE = False):
    if VERBOSE: log(f"Cleaning up spark output directories...")
    start = datetime.now()

    # Helper to recursively collect information from all partitions and flatten it into a single list
    def traverse_partitions(root, partitions, fn_collect_info, currentPartition = None):
        results = [fn_collect_info(root, currentPartition)]
        return results if len(partitions) == 0 else results + \
            [result for subdir in [traverse_partitions(os.path.join(root, folder), partitions[1:], fn_collect_info, partitions[0])
                                   for folder in list_subfolders(root)] for result in subdir]

    # Get the path of files to rename or delete. Note: We must convert to OS paths because we cannot parallelize use of dbutils
    def find_files_to_rename_and_delete(folder, partition):
        files = [x.name for x in dbutils.fs.ls(folder)]
        renames = [x for x in files if x[0:5] == "part-"]
        deletes = [f"/dbfs{folder}/{x}" for x in files if x[0:1] == "_"]
        if len(renames) > 0 and partition is None:
            raise Exception(f"Found {len(files)} partition file(s) in the root location: {folder}. Have files already been moved?")
        elif len(renames) > 1:
            raise Exception(f"Expected at most one partition file, but found {len(files)} in location: {folder}")
        elif len(renames) == 1:
            deletes.append(f"/dbfs{folder}/")  # The leaf folders (containing partitions) should be deleted after the file is moved
        return (deletes, None if len(renames) == 0 else
                (f"/dbfs{folder}/{renames[0]}", f"/dbfs{folder.replace(partition + '=', '')}{output_extension}"))

    # Scan the file system to find all files and folders that need to be moved and deleted
    if VERBOSE: log(f"Collecting a list of files that need to be renamed and deleted...")
    actions = traverse_partitions(root, partitions, find_files_to_rename_and_delete)

    # Rename all files in parallel using spark executors
    renames = [rename for (deletes, rename) in actions if rename is not None]
    if VERBOSE: log(f"Renaming {len(renames)} partition files...")
    spark.createDataFrame(renames, ['from', 'to']).foreach(lambda r: os.rename(r[0], r[1]))

    # Delete unwanted spark temp files and empty folders
    deletes = [path for (deletes, rename) in actions for path in deletes]
    delete_files = [d for d in deletes if d[-1] != "/"]
    delete_folders = [d for d in deletes if d[-1] == "/"]
    if VERBOSE: log(f"Deleting {len(delete_files)} spark outputs...")
    spark.createDataFrame(delete_files, StringType()).foreach(lambda r: os.remove(r[0]))
    if VERBOSE: log(f"Deleting {len(delete_folders)} empty folders...")
    spark.createDataFrame(delete_folders, StringType()).foreach(lambda r: os.rmdir(r[0]))

    log(f"Moving output files to their destination and cleaning spark artifacts took {(datetime.now() - start).total_seconds():,.2f} seconds")
This lets you generate partitioned data, with user-friendly names, and clean up all the spark temp files (_started..., _committed..., _SUCCESS) generated in the process.
Usage:
# Organize the data into folders matching the specified partitions, with a single CSV per partition
def dataframe_to_csv_gz_per_partition(df, path, partitions, sort_within_partitions, rename_spark_outputs = True, VERBOSE = False):
    start = datetime.now()

    # Write the actual data to disk using spark
    df.repartition(*partitions).sortWithinPartitions(*sort_within_partitions) \
        .write.partitionBy(*partitions).option("header", "true").option("compression", "gzip").mode("overwrite").csv(path)
    log(f"Wrote {get_df_name(df)} data partitioned by {partitions} and sorted by {sort_within_partitions} to:" +
        f"\n  {path}\n  Time taken: {(datetime.now() - start).total_seconds():,.2f} seconds")

    # Rename outputs and clean up
    spark_output_to_single_file_per_partition(path, partitions, ".csv.gz", VERBOSE)
For what it's worth, I also tried parallelizing with Pool, but the results were not as good. I haven't attempted importing and using any libraries that do async IO; I imagine that would perform the best.
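If Spark executors feel like overkill for plain file moves, a thread pool over the same os calls is another option that could be tried. A minimal, unbenchmarked sketch, assuming renames/delete_files/delete_folders lists built the same way as above:
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_cleanup(renames, delete_files, delete_folders, max_workers=32):
    # The os file calls spend their time waiting on the filesystem, so threads
    # should be enough to overlap the per-file latency.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda pair: os.rename(pair[0], pair[1]), renames))
        list(pool.map(os.remove, delete_files))
    # Folders are removed last, once their contents are gone.
    for folder in delete_folders:
        os.rmdir(folder)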

Python Basics - First Project/Challenge

I'm extremely new to Python (and software programming/development in general). I decided to use the scenario below as my first project. The project includes 5 main personal challenges. Some of the challenges I have been able to complete (although probably not in the most efficient way), and others I'm struggling with. Any feedback you have on my approach and recommendations for improvement is GREATLY appreciated.
Project Scenario = "If I doubled my money each day for 100 days, how much would I end up with at day #100? My starting amount on Day #1 is $1.00"
1.) Challenge 1 - What is the net TOTAL after day 100 - (COMPLETED, I think, please correct me if I'm wrong)
days = 100
compound_rate = 2
print(compound_rate ** days)  # 2 raised to the 100th
#==Result===
1267650600228229401496703205376
2.) Challenge 2 - Print to screen the DAYS in the first column, and corresponding Daily Total in the second column. - (COMPLETED, I think, please correct me if I'm wrong)
compound_rate = 2
days_range = list(range(101))
for x in days_range:
    print(str(x), (compound_rate ** int(x)))
# ===EXAMPLE Results
# 0 1
# 1 2
# 2 4
# 3 8
# 4 16
# 5 32
# 6 64
# 100 1267650600228229401496703205376
3.) Challenge 3 - Write TOTAL result (after the 100 days) to an external txt file - (COMPLETED, I think, please correct me if I'm wrong)
compound_rate = 2
days_range = list(range(101))
hundred_days = (compound_rate ** 100)
textFile = open("calctest.txt", "w")
textFile.write(str(hundred_days))
textFile.close()
#===Result====
string of 1267650600228229401496703205376 --> written to my file 'calctest.txt'
4.) Challenge 4 - Write the Calculated running DAILY Totals to an external txt file. Column 1 will be the Day, and Column 2 will be the Amount. So just like Challenge #2 but to an external file instead of screen
NEED HELP, I can't seem to figure this one out.
5.) Challenge 5 - Somehow plot or chart the Daily Results (based on #4) - NEED GUIDANCE.
I appreciate everyone's feedback as I start on my personal Python journey!
challenge 2
This will work fine, but there's no need to write list(range(101)), you can just write range(101). In fact, there's no need even to create a variable to store that, you can just do this:
for x in range(101):
    print("whatever you want to go here")
challenge 3
Again, this will work fine, but when writing to a file it is normally best to use a with statement; this means you don't need to close the file at the end, as Python will take care of that. For example:
with open("calctest.txt", "w") as f:
    f.write(str(hundred_days))
challenge 4
Use a for loop as you did with challenge 2. Use "\n" to write a new line. Again, do everything inside a with statement, e.g.
with open("calctest.txt", "w") as f:
    for x in range(101):
        f.write("something here \n")
(would write a file with 'something here ' written 101 times)
challenge 5
There is a python library called matplotlib, which I have never used, but I would suggest that would be where to go to in order to solve this task.
I hope this is of some help :)
You can use what you did in challenge 3 to open and close the output file.
In between, you have to do what you did in challenge 2 to compute the data for each day.
Instead of writing each daily result to the screen, combine it into a string and write that string to the file, exactly like you did in challenge 3.
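A minimal sketch of that combination, reusing the names from the challenges above:
compound_rate = 2

with open("calctest.txt", "w") as f:
    for day in range(101):
        # column 1: the day, column 2: the running total for that day
        f.write("{} {}\n".format(day, compound_rate ** day))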
Challenge One:
This is the correct way.
days = 100
compound_rate = 2
print("Result after 100 days" + (compound_rate ** days))
Challenge Two
This is corrected.
compound_rate = 2
days_range = list(range(101))
for x in days_range:
    print(x, (compound_rate ** x))
Challenge Three
This one is close. Note, though, that file.write() only accepts strings in text mode, so the str() cast on hundred_days is actually needed here; it is print() that will happily take the integer directly. Explicit casts mostly matter when you use the data in some way other than simply printing it.
compound_rate = 2
days_range = list(range(101))
hundred_days = (compound_rate ** 100)
textFile = open("calctest.txt", "w")
textFile.write(str(hundred_days))
textFile.close()
Challenge Four
For this challenge, you will want to look into the Python csv module. You can write the data in two columns separated by commas very simply with this module.
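A small sketch of that (the column names are just illustrative):
import csv

compound_rate = 2

with open("calctest.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["day", "amount"])  # header row
    for day in range(101):
        writer.writerow([day, compound_rate ** day])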
Challenge Five
For this challenge, you will want to look into the python library matplotlib. This library will give you tools to work with the data in a graphical way.
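A rough sketch of what that could look like with matplotlib (the log scale is an assumption, chosen because the totals grow so quickly):
import matplotlib.pyplot as plt

compound_rate = 2
days = range(101)
totals = [compound_rate ** d for d in days]

plt.plot(days, totals)
plt.yscale("log")  # the totals span roughly 30 orders of magnitude
plt.xlabel("Day")
plt.ylabel("Amount ($)")
plt.title("Doubling $1 every day")
plt.show()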
Answer for challenge 1 is as follows:
l = []
for a in range(0,100):
    b = 2 ** a
    l.append(b)
print("Total after 100 days", sum(l))
import os, sys
import datetime
import time
#to get the current work directory, we use below os.getcwd()
print(os.getcwd())
#to get the list of files and folders in a path, we use os.listdir
print(os.listdir())
#to know the files inside a folder using path
spath = (r'C:\Users\7char')
l = spath
print(os.listdir(l))
#converting a file format to other, ex: txt to py
path = r'C:\Users\7char'
print(os.listdir(path))
# after looking at the list of files, we choose to change 'rough.py' 'rough.txt'
os.chdir(path)
os.rename('rough.py','rough.txt')
#check whether the file has changed to new format
print(os.listdir(path))
#yes now the file is changed to new format
print(os.stat('rough.txt').st_size)
# by using the os.stat function we can see the size of a file (os.stat(file).st_size)
path = r"C:\Users\7char\rough.txt"
datetime = os.path.getmtime(path)
moddatetime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(datetime))
print("Last Modified Time : ", moddatetime)
#differentiating b/w files and folders using - os.path.splitext
import os
path = r"C:\Users\7char\rough.txt"
dir(os.path)
files = os.listdir()
for file in files:
    print(os.path.splitext(file))
#moving files from one folder to another (including moving along the folders of a path or moving into subfolders)
import os
char_7 = r"C:\Users\7char"
cleardata = r"C:\Users\clearadata"
operating = os.listdir(r"C:\Users\7char")
print(operating)
for i in operating:
    movefrom = os.path.join(char_7,i)
    moveto = os.path.join(cleardata,i)
    print(movefrom,moveto)
    os.rename(movefrom,moveto)
#now moving files based on the length of each name (even / odd) to a specified path (even or odd)
import os
origin_path = r"C:\Users\movefilehere"
fivechar_path= r"C:\Users\5char"
sevenchar_path = r"C:\Users\7char"
origin_pathlist = os.listdir(origin_path)
for file_name in origin_pathlist:
    l = len(file_name)
    if l % 2 == 0:
        evenfilepath = os.path.join(origin_path,file_name)
        newevenfilepath = os.path.join(fivechar_path,file_name)
        print(evenfilepath,newevenfilepath)
        os.rename(evenfilepath,newevenfilepath)
    else:
        oddfilepath = os.path.join(origin_path,file_name)
        newoddfilepath = os.path.join(sevenchar_path,file_name)
        print(oddfilepath,newoddfilepath)
        os.rename(oddfilepath,newoddfilepath)
#checking whether a path is a directory using os.path.isdir
import os
path = r"C:\Users\7char"
print(os.path.isdir(path))
#how many files of each type (.py, .txt, any extension) are in a folder
import os
from os.path import join, splitext
from glob import glob
from collections import Counter
path = r"C:\Users\7char"
c = Counter([splitext(i)[1][1:] for i in glob(join(path, '*'))])
for ext, count in c.most_common():
    print(ext, count)
#looking at the files and extensions, including the total of extensions.
import os
from os.path import join, splitext
from collections import defaultdict
path = r"C:\Users\7char"
c = defaultdict(int)
files = os.listdir(path)
for filenames in files:
    extension = os.path.splitext(filenames)[-1]
    c[extension] += 1
    print(os.path.splitext(filenames))
print(c, extension)
#getting list from range
list(range(4))
#break and continue statements and else clauses on loops
for n in range(2,10):
    for x in range(2,n):
        if n%x == 0:
            print(n,'equals',x, '*', n//x)
            break
    else:
        print(n, 'is a prime number')
#Dictionaries
#the dict() constructer builds dictionaries directly from sequences of key-value pairs
dict([('ad', 1212),('dasd', 2323),('grsfd',43324)])
#loop over two or more sequences at the same time, the entries can be paired with the zip() function.
questions = ['name', 'quest', 'favorite color']
answers = ['lancelot', 'the holy grail', 'blue']
for q, a in zip(questions, answers):
    print('What is your {0}? It is {1}.'.format(q, a))
#Using set()
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
for f in sorted(set(basket)):
    print(f)

Python: How do I iterate over several files with similar names (the variation in each name is the date)?

I wrote a program that filters tweet files to pull location and time from specific ones. Each file contains one day's worth of tweets.
I would like to run this program over one year's worth of tweets, which would involve iterating over 365 files with names like 2011-*-*.tweets.dat.gz, with the stars representing numbers that complete the file name to make it a date for each day in the year.
Basically, I'm looking for code that will loop over 2011-01-01.tweets.dat.gz, 2011-01-02.tweets.dat.gz, ..., all the way through 2011-12-31.tweets.dat.gz.
What I'm imagining now is somehow telling the program to loop over all files with the name 2011-*.tweets.dat.gz, but I'm not sure exactly how that would work or how to structure it, or even if the * syntax is correct.
Any tips?
Easiest way is indeed with a glob:
from glob import iglob

for pathname in iglob("/path/to/folder/2011-*.tweets.dat.gz"):
    print pathname  # or do whatever
Use the datetime module:
>>> from datetime import datetime, timedelta
>>> d = datetime(2011, 1, 1)
>>> while d < datetime(2012, 1, 1):
...     filename = "{}{}".format(d.strftime("%Y-%m-%d"), '.tweets.dat.gz')
...     print filename
...     d = d + timedelta(days=1)
...
2011-01-01.tweets.dat.gz
2011-01-02.tweets.dat.gz
2011-01-03.tweets.dat.gz
2011-01-04.tweets.dat.gz
2011-01-05.tweets.dat.gz
2011-01-06.tweets.dat.gz
2011-01-07.tweets.dat.gz
2011-01-08.tweets.dat.gz
2011-01-09.tweets.dat.gz
2011-01-10.tweets.dat.gz
...
...
2011-12-27.tweets.dat.gz
2011-12-28.tweets.dat.gz
2011-12-29.tweets.dat.gz
2011-12-30.tweets.dat.gz
2011-12-31.tweets.dat.gz
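Since the files are gzip-compressed, each generated (or globbed) filename can then be opened with the standard gzip module. A minimal sketch, where process() stands in for whatever per-tweet handling your program does:
import gzip
from glob import iglob

for pathname in iglob("/path/to/folder/2011-*.tweets.dat.gz"):
    with gzip.open(pathname) as f:
        for line in f:
            process(line)  # hypothetical handler for one tweet record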

Python: sort files by datetime in more details

I'm using Python 2.7 on Ubuntu. How do I sort files in more detail, i.e. by sub-second modification time? I have a script that creates a number of txt files within a split second. I modified a script so it can find the oldest and youngest file, but it seems to compare only the seconds, not the milliseconds.
My print output:
output_04.txt 06/08/12 12:00:18
output_05.txt 06/08/12 12:00:18
-----------
oldest: output_05.txt
youngest: output_05.txt
-----------
But the oldest file should really be "output_04.txt".
Does anyone know how to fix this? Thanks!
Updated:
Thanks everyone.
I did try out all of the code, but it seems like I can't get the output I need.
Sorry guys, I do appreciate you all. But my example files above have the same time, so if the full date, hour, minute and second are all the same, it has to compare by millisecond, doesn't it? Correct me if I'm wrong. Thanks everyone! Cheers!
You can use os.path.getmtime(path_to_file) to get the modification time of the file.
One way of ordering the list of files is to create a list of them with os.listdir and get the modification time of each one. You would have a list of tuples and you could order it by the second element of the tuple (which would be the modification time).
You also can check the resolution of os.path.getmtime with os.stat_float_times(). If the latter returns True then os.path.getmtime returns a float (this indicates you have more resolution than seconds).
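In its simplest form, that could look like this minimal sketch (path is assumed to be the folder holding the output files):
import os

path = "."  # folder containing the output files
files = [os.path.join(path, f) for f in os.listdir(path)]
# st_mtime is a float, so sub-second precision is used when the filesystem records it
files.sort(key=os.path.getmtime)

oldest, youngest = files[0], files[-1]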
def get_files(path):
    import os
    if os.path.exists(path):
        os.chdir(path)
        files = (os.listdir(path))
        items = {}

        def get_file_details(f):
            return {f: os.path.getmtime(f)}

        results = [get_file_details(f) for f in files]
        for result in results:
            for key, value in result.items():
                items[key] = value
        return items

items = get_files(path)
v = sorted(items, key=items.get)
get_files takes path as an argument and, if the path exists, changes the current directory to it and builds the list of files. get_file_details returns the last modified time for a file.
get_files returns a dict with the filename as key and the modified time as value. Then the standard sorted is used to sort by those values; the reverse parameter can be passed to sort ascending or descending.
You can't compare the milliseconds because there is no such information.
The stat(2) call returns three time_t fields:
- access time
- creation time
- last modification time
time_t is an integer representing the number of seconds (not of milliseconds) elapsed since 00:00, Jan 1 1970 UTC.
So the maximum detail you can have in file time is seconds. I don't know if some filesystem provides more resolution but you'd have to use specific calls in C and then write wrappers in Python to use them.
Hi, try the following code:
# retrieve the file information from a selected folder
# sort the files by last modified date/time and display in order newest file first
# tested with Python24 vegaseat 21jan2006
import os, glob, time

# use a folder you have ...
root = 'D:\\Zz1\\Cartoons\\'  # one specific folder
#root = 'D:\\Zz1\\*'          # all the subfolders too

date_file_list = []
for folder in glob.glob(root):
    print "folder =", folder
    # select the type of file, for instance *.jpg or all files *.*
    for file in glob.glob(folder + '/*.*'):
        # retrieves the stats for the current file as a tuple
        # (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime)
        # the tuple element mtime at index 8 is the last-modified-date
        stats = os.stat(file)
        # create tuple (year yyyy, month(1-12), day(1-31), hour(0-23), minute(0-59), second(0-59),
        # weekday(0-6, 0 is monday), Julian day(1-366), daylight flag(-1,0 or 1)) from seconds since epoch
        # note: this tuple can be sorted properly by date and time
        lastmod_date = time.localtime(stats[8])
        #print image_file, lastmod_date # test
        # create list of tuples ready for sorting by date
        date_file_tuple = lastmod_date, file
        date_file_list.append(date_file_tuple)

#print date_file_list # test
date_file_list.sort()
date_file_list.reverse()  # newest mod date now first

print "%-40s %s" % ("filename:", "last modified:")
for file in date_file_list:
    # extract just the filename
    folder, file_name = os.path.split(file[1])
    # convert date tuple to MM/DD/YYYY HH:MM:SS format
    file_date = time.strftime("%m/%d/%y %H:%M:%S", file[0])
    print "%-40s %s" % (file_name, file_date)
Hope this will help.
Thank you.
