Find/Remove oldest file in directory - python

I am trying to delete oldest file in directory when number of files reaches a threshold.
list_of_files = os.listdir('log')
if len([name for name in list_of_files]) == 25:
    oldest_file = min(list_of_files, key=os.path.getctime)
    os.remove('log/' + oldest_file)
Problem: the issue is in the call to min(). list_of_files contains bare file names without the directory, so os.path.getctime looks for each file in the current working directory and fails. How can I pass the directory name ('log') through to min()?

list_of_files = os.listdir('log')
full_path = ["log/{0}".format(x) for x in list_of_files]
if len(list_of_files) == 25:
    oldest_file = min(full_path, key=os.path.getctime)
    os.remove(oldest_file)

os.listdir returns bare file names with no directory attached; they are only meaningful relative to the directory you listed ('log' here), not to your current working directory (which you can see via os.getcwd()).
Functions like os.path.getctime and os.remove resolve a bare name against the current working directory - shells infer the right directory on your behalf, but Python doesn't - so the directory has to be put back onto each name. os.path.abspath only helps if the script happens to be running inside the log directory itself; the reliable fix is to join 'log' onto each name (and since os.listdir returns a list anyway, we don't need the list-comp over it to check its length):
list_of_files = os.listdir('log')
if len(list_of_files) >= 25:
    oldest_file = min(list_of_files, key=lambda name: os.path.getctime(os.path.join('log', name)))
    os.remove(os.path.join('log', oldest_file))
That keeps it generic as to where the script is run from: whatever os.listdir produced gets the directory joined back on, so you don't have to worry about the working directory.
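To make the whole flow concrete, here is a minimal sketch of the join-the-directory approach; prune_oldest, log_dir, and threshold are illustrative names, not from the question, and the demo runs against a throwaway temp directory:

```python
import os
import tempfile
import time

# Minimal sketch: delete the oldest file once a directory holds too many.
def prune_oldest(log_dir, threshold=25):
    paths = [os.path.join(log_dir, name) for name in os.listdir(log_dir)]
    paths = [p for p in paths if os.path.isfile(p)]  # skip subdirectories
    if len(paths) >= threshold:
        oldest = min(paths, key=os.path.getctime)
        os.remove(oldest)
        return oldest
    return None

# Demo against a scratch directory with three files, oldest first
tmp = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmp, "f{}.log".format(i)), "w") as fh:
        fh.write("x")
    time.sleep(0.05)  # ensure distinct timestamps
removed = prune_oldest(tmp, threshold=3)
```

Filtering with os.path.isfile also avoids the pitfall (mentioned further down) of a subdirectory being picked as the "oldest file".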

I was trying to achieve the same as what you are trying to achieve. and was facing a similar issue related to os.path.abspath()
I am using a windows system with python 3.7
and the issue is that os.path.abspath() gives one folder up location
replace "Yourpath" and with the path of folder in which your file is and code should work fine
import os
import time
oldest_file = sorted([ "Yourpath"+f for f in os.listdir("Yourpath")], key=os.path.getctime)[0]
print (oldest_file)
os.remove(oldest_file)
print ("{0} has been deleted".format(oldest_file))`
There must be some cleaner method to do the same
I'll update when I get it

The glob module returns full paths and also allows filtering by file pattern. The solution above gave me the directory itself as the oldest file, which is not what I wanted. For me, the following is suitable (a blend of glob and the solution of @Ivan Motin):
import glob
import os

sorted(glob.glob("/home/pi/Pictures/*.jpg"), key=os.path.getctime)[0]
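One caveat worth noting: a bare pattern like * also matches subdirectories, so if the directory can contain folders, filter with os.path.isfile before picking the oldest entry. A self-contained sketch (the temp directory and file names are invented for illustration):

```python
import glob
import os
import tempfile

# Set up a scratch directory containing a subdirectory and two files.
tmp = tempfile.mkdtemp()
os.mkdir(os.path.join(tmp, "subdir"))
for name in ("a.jpg", "b.jpg"):
    open(os.path.join(tmp, name), "w").close()

# Glob everything, but keep only real files before sorting by ctime.
candidates = [p for p in glob.glob(os.path.join(tmp, "*")) if os.path.isfile(p)]
oldest = sorted(candidates, key=os.path.getctime)[0]
```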

using a comprehension (sorry, couldn't resist), joining the directory onto each name so getctime can find the files:
oldest_file = sorted([os.path.join('log', f) for f in os.listdir('log')], key=os.path.getctime)[0]

Related

absolute path for file not working properly python

Basically, I'm trying to store the full path for each file in a list, but for some reason os.path.abspath() doesn't seem to work properly:
files = os.listdir("TRACKER/")
original_listpaths = []
for f in files:
    original_listpaths.append(os.path.abspath(f))
print(original_listpaths)
but my output is:
'C:\Users\******\Documents\folder\example'
the problem is that it should be :
'C:\Users\******\Documents\folder\TRACKER\example'
The difference is that the second one (the correct one) includes TRACKER, which belongs in the full path for that file, but for some reason my output leaves TRACKER out. What's the problem?
You could try the following code:
files = os.scandir("TRACKER/")
print(files)
original_listpaths = []
for f in files:
    original_listpaths.append(os.path.abspath(f))
print(original_listpaths)
files.close()
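A note on why os.scandir behaves here where os.listdir did not: each DirEntry carries a .path attribute that already includes the directory you scanned, so there is nothing for abspath to get wrong. A small sketch (throwaway directory and invented file name):

```python
import os
import tempfile

# Create a scratch directory with one file in it.
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, "example.txt"), "w").close()

# DirEntry.path already joins the scanned directory onto each name.
paths = [entry.path for entry in os.scandir(tmp)]
```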
You need to change your working directory to "TRACKER" first: just put os.chdir("TRACKER") after files = os.listdir("TRACKER/") and before the loop starts.

IOError: [Errno 2] No such file or directory: but the files are there...

If I print filename in the for loop below, it gives me all the file names in the directory, yet when I call pd.ExcelFile(filename) it says there is no file with the name of the first file ending with '.xlsx'. What am I missing?
P.S.: the indentation below is right; the if is under the for in my code, but it doesn't show that way here.
for filename in os.listdir('/Users/ramikhoury/PycharmProjects/R/excel_files'):
    if filename.endswith(".xlsx"):
        month = pd.ExcelFile(filename)
        day_list = month.sheet_names
        i = 0
        for day in month.sheet_names:
            df = pd.read_excel(month, sheet_name=day, skiprows=21)
            df = df.iloc[:, 1:]
            df = df[[df.columns[0], df.columns[4], df.columns[8]]]
            df = df.iloc[1:16]
            df['Date'] = day
            df = df.set_index('Date')
            day_list[i] = df
            i += 1
        month_frame = day_list[0]
        x = 1
        while x < len(day_list):
            month_frame = pd.concat([month_frame, day_list[x]])
            x += 1
        print filename + ' created the following dataframe: \n'
        print month_frame  # month_frame is the combination of all the sheets inside the file in one dataframe!
The problem is that your work directory is not the same as the directory you are listing. Since you know the absolute path of the directory, the easiest solution is to add os.chdir('/Users/ramikhoury/PycharmProjects/R/excel_files') to the top of your file.
Your "if" statement must be inside the for loop
The issue is that you are trying to open a relative file path from a different directory than the one you are listing. Rather than using os, it is probably better to use a higher-level interface like pathlib:
import pathlib

for file_name in pathlib.Path("/Users/ramikhoury/PycharmProjects/R/excel_files").glob("*.xlsx"):
    # this produces full paths for you to use
    ...
pathlib was added in Python 3.4 so if you are using an older version of python, your best bet would be to use the much older glob module, which functions similarly:
import glob

for file_name in glob.glob("/Users/ramikhoury/PycharmProjects/R/excel_files/*.xlsx"):
    # this also produces full paths for you to use
    ...
If for some reason you really need to use the low-level os interface, the best way to solve this is by making use of the optional dir_fd argument to os.open:
# open the target directory
dir_fd = os.open("/Users/ramikhoury/PycharmProjects/R/excel_files", os.O_RDONLY)
try:
    # pass the open file descriptor to os.listdir
    for file_name in os.listdir(dir_fd):
        # you could replace this with fnmatch.fnmatch
        if file_name.endswith(".xlsx"):
            # use the open directory fd as the `dir_fd` argument
            # this opens file_name relative to your target directory
            with os.fdopen(os.open(file_name, os.O_RDONLY, dir_fd=dir_fd)) as file_:
                # do excel bits here
                ...
finally:
    # close the directory
    os.close(dir_fd)
While you could fix this by changing directories at the top of your script (as suggested by another answer), that changes the current working directory of your whole process, which is often undesirable and may have negative consequences. To make it work without side effects, you have to chdir back to the original directory:
# store the current working directory
original_cwd = os.getcwd()
try:
    os.chdir("/Users/ramikhoury/PycharmProjects/R/excel_files")
    # do your listdir, etc.
finally:
    os.chdir(original_cwd)
Note that this introduces a race condition into your code, as original_cwd may be removed or the access controls for that directory might be changed such that you cannot chdir back to it, which is precisely why dir_fd exists.
dir_fd was added in Python 3.3, so if you are using an older version of Python I would recommend just using glob rather than the chdir solution.
For more on dir_fd see this very helpful answer.
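For completeness, here is the pathlib approach from above as a self-contained sketch, run against a throwaway directory instead of the asker's path (the file names are invented):

```python
import pathlib
import tempfile

# Scratch directory with one matching and one non-matching file.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "report.xlsx").touch()
(tmp / "notes.txt").touch()

# Path.glob yields paths rooted at tmp, so each result is directly openable.
matches = list(tmp.glob("*.xlsx"))
```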

Finding files in directories in Python

I've been doing some scripting where I need to use the os module to name images (saving every subsequent zoom of the Mandelbrot set upon clicking): I count all of the current files in the directory, then use %s with that count to name the next one in the string after calling the function below, and I also added an option to delete them all.
I realize the code below always grabs the absolute path of the file, but assuming we're always in the same directory, isn't there a simpler way to grab the current working directory?
def count_files(self):
    count = 0
    for files in os.listdir(os.path.abspath(__file__)):
        if files.endswith(someext):
            count += 1
    return count

def delete_files(self):
    for files in os.listdir(os.path.abspath(__file__)):
        if files.endswith(someext):
            os.remove(files)
Since you're doing the .endswith thing, I think the glob module might be of some interest.
The following prints all files in the current working directory with the extension .py. Not only that, it returns only the filename, not the path, as you said you wanted:
import glob
for fn in glob.glob('*.py'): print(fn)
Output:
temp1.py
temp2.py
temp3.py
_clean.py
Edit: re-reading your question, I'm unsure what you were really asking. If you wanted an easier way to get the current working directory than
os.path.abspath(__file__)
then yes: os.getcwd().
But note that os.getcwd() changes if you change the working directory in your script (e.g. via os.chdir()), whereas your method does not.
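That distinction can be sketched in a few lines; the temporary directory stands in for "some other directory" and is an invented example:

```python
import os
import tempfile

# os.getcwd() tracks os.chdir(); a path captured up front does not.
start = os.getcwd()
snapshot = start  # like capturing os.path.abspath(__file__)'s directory once

other = tempfile.mkdtemp()
os.chdir(other)
cwd_now = os.getcwd()  # follows the chdir
os.chdir(start)        # restore so nothing else is affected
```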
Using antipathy* it gets a little easier:
from antipathy import Path

def count_files(pattern):
    return len(Path(__file__).glob(pattern))

def delete_files(pattern):
    Path(__file__).unlink(pattern)
*Disclosure: I'm the author of antipathy.
You can use os.path.dirname(path) to get the parent directory of the thing path points to.
def count_files(self):
    count = 0
    for files in os.listdir(os.path.dirname(os.path.abspath(__file__))):
        if files.endswith(someext):
            count += 1
    return count

Why would Python think a file doesn't exist when I think it does?

I'm trying to import some files to plot, and all was going well until I moved my program to the directory above where it was before. The relevant piece of code that seems to be problematic is below:
import os
import pandas as pd
path = os.getcwd() + '/spectrum_scan/'
files = os.listdir(path)
dframefiles = pd.DataFrame(files)
up = pd.read_csv(dframefiles.ix[i][0])
If I type directly into the shell os.path.exists(path) it returns True.
The first file in the directory spectrum_scan is foo.csv.
When I type os.path.exists(path + 'foo.csv') it returns True but os.path.isfile('foo.csv') returns False.
Also, asking for files and dframefiles returns everything as it should, but when the code is run I get Exception: File foo.csv does not exist.
Is there something obvious I'm missing?
You are using os.listdir(), which returns filenames without a path. You'll need to add the path to these:
files = [os.path.join(path, f) for f in os.listdir(path)]
otherwise Python will try to look for 'foo.csv' in the current directory, and not in the spectrum_scan sub-directory where the files really are located.
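A runnable reproduction of that fix, using a scratch directory in place of spectrum_scan (the file name is invented):

```python
import os
import tempfile

# os.listdir gives bare names; os.path.join puts the directory back on.
path = tempfile.mkdtemp()
open(os.path.join(path, "foo.csv"), "w").close()

names = os.listdir(path)                        # ['foo.csv'], no directory part
files = [os.path.join(path, f) for f in names]  # full paths that actually exist
```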

How can I list the contents of a directory in Python?

Can’t be hard, but I’m having a mental block.
import os
os.listdir("path") # returns list
One way:
import os
os.listdir("/home/username/www/")
Another way:
import glob
glob.glob("/home/username/www/*")
Examples found here.
The glob.glob method above will not list hidden files.
Since I originally answered this question years ago, pathlib has been added to Python. My preferred way to list a directory now usually involves the iterdir method on Path objects:
from pathlib import Path
print(*Path("/home/username/www/").iterdir(), sep="\n")
os.walk can be used if you need recursion:
import os
start_path = '.'  # current directory
for path, dirs, files in os.walk(start_path):
    for filename in files:
        print(os.path.join(path, filename))
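The same walk can be exercised against a scratch tree to see the recursion in action; the directory and file names here are invented:

```python
import os
import tempfile

# Build a one-level nested tree, then collect every file os.walk finds.
start_path = tempfile.mkdtemp()
os.mkdir(os.path.join(start_path, "nested"))
open(os.path.join(start_path, "nested", "deep.txt"), "w").close()

found = []
for path, dirs, files in os.walk(start_path):
    for filename in files:
        found.append(os.path.join(path, filename))
```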
glob.glob or os.listdir will do it.
The os module handles all that stuff.
os.listdir(path)
Return a list containing the names of the entries in the directory given by path.
The list is in arbitrary order. It does not include the special entries '.' and
'..' even if they are present in the directory.
Availability: Unix, Windows.
In Python 3.4+, you can use the new pathlib package:
from pathlib import Path
for path in Path('.').iterdir():
    print(path)
Path.iterdir() returns an iterator, which can be easily turned into a list:
contents = list(Path('.').iterdir())
Since Python 3.5, you can use os.scandir.
The difference is that it returns directory entries, not names. On some OSes, like Windows, that means you don't have to call os.path.isdir/isfile to know whether an entry is a file, which saves CPU time because the stat is already done while scanning the directory:
Example to list a directory and print files bigger than max_value bytes:
max_value = 1024  # size threshold in bytes
for dentry in os.scandir("/path/to/dir"):
    if dentry.stat().st_size > max_value:
        print("{} is biiiig".format(dentry.name))
(read an extensive performance-based answer of mine here)
The code below lists directories and the files within them, recursively. (Another option is os.walk.)
def print_directory_contents(sPath):
    import os
    for sChild in os.listdir(sPath):
        sChildPath = os.path.join(sPath, sChild)
        if os.path.isdir(sChildPath):
            print_directory_contents(sChildPath)
        else:
            print(sChildPath)
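A pathlib equivalent of the recursive function above: Path.rglob("*") descends into subdirectories much like the explicit recursion (the scratch tree and names below are invented for the demo):

```python
import pathlib
import tempfile

# Build a small tree, then list every file beneath it recursively.
root = pathlib.Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "a.txt").touch()
(root / "sub" / "b.txt").touch()

files = sorted(p.name for p in root.rglob("*") if p.is_file())
```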
