Ghost files with 'listdir()' [duplicate]

My python script executes an os.listdir(path) where the path is a queue containing archives that I need to treat one by one.
The problem is that I'm getting the list in an array and then I just do a simple array.pop(0). It was working fine until I put the project in subversion. Now I get the .svn folder in my array and of course it makes my application crash.
So here is my question: is there a function that ignores hidden files when executing an os.listdir() and if not what would be the best way?

You can write one yourself:
import os
def listdir_nohidden(path):
    for f in os.listdir(path):
        if not f.startswith('.'):
            yield f
Or you can use a glob:
import glob
import os
def listdir_nohidden(path):
    return glob.glob(os.path.join(path, '*'))
Either of these will ignore all filenames beginning with '.'.
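Since the first version is a generator, it has to be consumed (with list(), a for loop, etc.). A small self-contained sketch; the throwaway temp directory and file names here are only for illustration:

```python
import os
import tempfile

# build a throwaway directory with one visible and one hidden entry
d = tempfile.mkdtemp()
open(os.path.join(d, 'data.txt'), 'w').close()
open(os.path.join(d, '.svn'), 'w').close()

def listdir_nohidden(path):
    for f in os.listdir(path):
        if not f.startswith('.'):
            yield f

# the generator must be consumed, e.g. with list() or a for loop
visible = list(listdir_nohidden(d))
```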

This is an old question, but seems like it is missing the obvious answer of using list comprehension, so I'm adding it here for completeness:
[f for f in os.listdir(path) if not f.startswith('.')]
As a side note, the docs state listdir will return results in 'arbitrary order' but a common use case is to have them sorted alphabetically. If you want the directory contents alphabetically sorted without regards to capitalization, you can use:
sorted((f for f in os.listdir() if not f.startswith(".")), key=str.lower)
(Edited to use key=str.lower instead of a lambda)

On Windows, Linux and OS X:
import os

if os.name == 'nt':
    import win32api, win32con

def folder_is_hidden(p):
    if os.name == 'nt':
        attribute = win32api.GetFileAttributes(p)
        return attribute & (win32con.FILE_ATTRIBUTE_HIDDEN | win32con.FILE_ATTRIBUTE_SYSTEM)
    else:
        return p.startswith('.')  # linux-osx
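On Python 3.5+ the same check can be done without pywin32, since os.stat exposes st_file_attributes on Windows. A hedged sketch under that assumption; on other platforms it falls back to the leading-dot convention:

```python
import os
import stat

def is_hidden(path):
    """True if the entry is hidden: Windows file attributes on 'nt',
    the leading-dot convention elsewhere."""
    name = os.path.basename(os.path.normpath(path))
    if os.name == 'nt':
        attrs = os.stat(path).st_file_attributes
        return bool(attrs & (stat.FILE_ATTRIBUTE_HIDDEN
                             | stat.FILE_ATTRIBUTE_SYSTEM))
    return name.startswith('.')
```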

Joshmaker has the right solution to your question.
How to ignore hidden files using os.listdir()?
In Python 3 however, it is recommended to use pathlib instead of os.
from pathlib import Path
visible_files = [
    file for file in Path(".").iterdir() if not file.name.startswith(".")
]

glob:
>>> import glob
>>> glob.glob('*')
(glob claims to use listdir and fnmatch under the hood, but it also checks for a leading '.', not by using fnmatch.)
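A quick way to see this behavior in a throwaway directory (the file names are illustrative):

```python
import glob
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, 'visible.txt'), 'w').close()
open(os.path.join(d, '.hidden'), 'w').close()

# '*' in glob never matches a leading dot, even though
# fnmatch.fnmatch('.hidden', '*') would return True
names = sorted(os.path.basename(p) for p in glob.glob(os.path.join(d, '*')))
```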

I think it is too much work to go through all of the items in a loop. I would prefer something simpler, like this:
lst = os.listdir(path)
if '.DS_Store' in lst:
    lst.remove('.DS_Store')
If the directory contains more than one hidden file, then this can help:
all_files = os.popen('ls -1').read()
lst = all_files.split('\n')
For platform independence, as @Josh mentioned, glob works well:
import glob
glob.glob('*')

filenames = (f.name for f in os.scandir() if not f.name.startswith('.'))

You can just use a simple for loop that will exclude any file or directory that starts with ".".
Code for professionals:
import os
directory_things = [i for i in os.listdir() if i[0] != "."]  # exclude entries that start with "."
Code for noobs:
items_in_directory = os.listdir()
final_items_in_directory = []
for i in items_in_directory:
    if i[0] != ".":  # if the item doesn't start with "."
        final_items_in_directory.append(i)

Related

recursive searching with pathlib

Say I have these files
/home/user/one/two/abc.txt
/home/user/one/three/def.txt
/home/user/one/four/ghi.txt
I'm trying to find ghi.txt recursively using the pathlib module. I tried:
p = '/home/user/'
f = Path(p).rglob('*i.txt')
but the only way I can get the filename is by using a list comprehension:
file = [str(i) for i in f]
which actually only works once. Re-running the command above returns an empty list.
I decided to learn pathlib because apparently it's what is recommended by the community, but isn't:
file = glob.glob(os.path.join(p,'**/*i.txt'),recursive=True)
much more straightforward?
You already have a solution.
Not sure if I read the requirements correctly but I have posted the answer with the following assumptions.
PathListPrefix is the starting path beneath which you want to search
all files. In your case it might be '/home/user'
FileName is the name of the file that you are looking for. In your case it is 'ghi.txt'
You are not expecting more than one match.
Something other than the pathlib module has to be tried.
As far as a straightforward solution is concerned, I am not sure either. However, the solution below is what I could come up with using the os module.
import os
PathListPrefix = '/home/user/'
FileName = 'ghi.txt'
def Search(StartingPath, FileNameToSearch):
    root, ext = os.path.splitext(StartingPath)
    FileName = os.path.split(root)[1]
    if FileName + ext == FileNameToSearch:
        return True
    return False

for root, dirs, files in os.walk(PathListPrefix):
    Path = root
    for eachfile in files:
        SearchPath = os.path.join(Path, eachfile)
        Found = Search(SearchPath, FileName)
        if Found:
            print(Path, FileName)
            break
The code definitely has many more lines than yours, so it is not as compact.
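For comparison, the pathlib approach from the question does work once two small things are fixed: the pattern must be a quoted string, and rglob returns a one-shot generator, so wrap it in list() if you need the results more than once. A sketch against a throwaway tree that mimics the question's layout:

```python
import tempfile
from pathlib import Path

# mimic the question's layout in a temp directory
base = Path(tempfile.mkdtemp())
(base / 'one' / 'four').mkdir(parents=True)
(base / 'one' / 'four' / 'ghi.txt').write_text('data')

# rglob takes the pattern as a string; list() makes the result reusable
found = [str(p) for p in base.rglob('ghi.txt')]
```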

sort filenames by their time created on linux [duplicate]

What is the best way to get a list of all files in a directory, sorted by date [created | modified], using python, on a windows machine?
I've done this in the past for a Python script to determine the last updated files in a directory:
import glob
import os
search_dir = "/mydir/"
# remove anything from the list that is not a file (directories, symlinks)
# thanks to J.F. Sebastian for pointing out that the requirement was a list
# of files (presumably not including directories)
files = list(filter(os.path.isfile, glob.glob(search_dir + "*")))
files.sort(key=lambda x: os.path.getmtime(x))
That should do what you're looking for based on file mtime.
EDIT: Note that you can also use os.listdir() in place of glob.glob() if desired - the reason I used glob in my original code was that I was wanting to use glob to only search for files with a particular set of file extensions, which glob() was better suited to. To use listdir here's what it would look like:
import os
search_dir = "/mydir/"
os.chdir(search_dir)
files = filter(os.path.isfile, os.listdir(search_dir))
files = [os.path.join(search_dir, f) for f in files] # add path to each file
files.sort(key=lambda x: os.path.getmtime(x))
Update: to sort dirpath's entries by modification date in Python 3:
import os
from pathlib import Path
paths = sorted(Path(dirpath).iterdir(), key=os.path.getmtime)
(put @Pygirl's answer here for greater visibility)
If you already have a list of filenames files, then to sort it inplace by creation time on Windows (make sure that list contains absolute path):
files.sort(key=os.path.getctime)
The list of files you could get, for example, using glob as shown in @Jay's answer.
old answer
Here's a more verbose version of @Greg Hewgill's answer. It is the most conforming to the question requirements. It makes a distinction between creation and modification dates (at least on Windows).
#!/usr/bin/env python
from stat import S_ISREG, ST_CTIME, ST_MODE
import os, sys, time
# path to the directory (relative or absolute)
dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'
# get all entries in the directory w/ stats
entries = (os.path.join(dirpath, fn) for fn in os.listdir(dirpath))
entries = ((os.stat(path), path) for path in entries)
# leave only regular files, insert creation date
entries = ((stat[ST_CTIME], path)
           for stat, path in entries if S_ISREG(stat[ST_MODE]))
# NOTE: on Windows `ST_CTIME` is a creation date,
#   but on Unix it could be something else
# NOTE: use `ST_MTIME` to sort by modification date

for cdate, path in sorted(entries):
    print(time.ctime(cdate), os.path.basename(path))
Example:
$ python stat_creation_date.py
Thu Feb 11 13:31:07 2009 stat_creation_date.py
There is an os.path.getmtime function that gives the modification time in seconds since the epoch; it is a convenience wrapper around os.stat.
import os
os.chdir(directory)
sorted(filter(os.path.isfile, os.listdir('.')), key=os.path.getmtime)
Here's my version:
def getfiles(dirpath):
    a = [s for s in os.listdir(dirpath)
         if os.path.isfile(os.path.join(dirpath, s))]
    a.sort(key=lambda s: os.path.getmtime(os.path.join(dirpath, s)))
    return a
First, we build a list of the file names. isfile() is used to skip directories; it can be omitted if directories should be included. Then, we sort the list in-place, using the modify date as the key.
Here's a one-liner:
import os
import time
from pprint import pprint
pprint([(x[0], time.ctime(x[1].st_ctime)) for x in sorted([(fn, os.stat(fn)) for fn in os.listdir(".")], key = lambda x: x[1].st_ctime)])
This calls os.listdir() to get a list of the filenames, then calls os.stat() for each one to get the creation time, then sorts against the creation time.
Note that this method only calls os.stat() once for each file, which will be more efficient than calling it for each comparison in a sort.
In Python 3.5+:
from pathlib import Path
sorted(Path('.').iterdir(), key=lambda f: f.stat().st_mtime)
Without changing directory:
import os
path = '/path/to/files/'
name_list = os.listdir(path)
full_list = [os.path.join(path,i) for i in name_list]
time_sorted_list = sorted(full_list, key=os.path.getmtime)
print(time_sorted_list)
# if you want just the filenames sorted, simply remove the dir from each
sorted_filename_list = [ os.path.basename(i) for i in time_sorted_list]
print(sorted_filename_list)
from pathlib import Path
import os
sorted(Path('./').iterdir(), key=lambda t: t.stat().st_mtime)
or
sorted(Path('./').iterdir(), key=os.path.getmtime)
or
sorted(os.scandir('./'), key=lambda t: t.stat().st_mtime)
where st_mtime is the modification time.
Here's my answer using glob without filter if you want to read files with a certain extension in date order (Python 3).
import glob
import os

dataset_path = '/mydir/'
files = glob.glob(dataset_path + "morepath/*.extension")
files.sort(key=os.path.getmtime)
# *** the shortest and best way ***
# getmtime --> sort by modified time
# getctime --> sort by created time
import glob,os
lst_files = glob.glob("*.txt")
lst_files.sort(key=os.path.getmtime)
print("\n".join(lst_files))
sorted(filter(os.path.isfile, os.listdir('.')),
       key=lambda p: os.stat(p).st_mtime)
You could use next(os.walk('.'))[-1] instead of filtering with os.path.isfile, but that leaves dead symlinks in the list, and os.stat will fail on them.
For completeness, with os.scandir (about 2x faster than pathlib):
import os
sorted(os.scandir('/tmp/test'), key=lambda d: d.stat().st_mtime)
This is a basic example for learning:
import os, sys
import time

dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'
os.chdir(dirpath)
for i in os.listdir('.'):
    data_001 = os.path.realpath(i)
    listdir_stat1 = os.stat(data_001)
    print(time.ctime(listdir_stat1.st_ctime), data_001)
Alex Coventry's answer will produce an exception if a file is a symlink to a nonexistent file; the following code corrects that answer:
import os
import time
from datetime import datetime

sorted(filter(os.path.isfile, os.listdir('.')),
       key=lambda p: os.stat(p).st_mtime if os.path.exists(p)
                     else time.mktime(datetime.now().timetuple()))
When the file doesn't exist, now() is used, and the symlink will go at the very end of the list.
This was my version:
import os
folder_path = r'D:\Movies\extra\new\dramas' # your path
os.chdir(folder_path) # make the path active
x = sorted(os.listdir(), key=os.path.getctime) # sorted using creation time
for name in x:
    print(name)  # print each entry inside folder_path
Here is a simple couple of lines that filters by extension as well as provides a sort option:
import os
import re

def get_sorted_files(src_dir, regex_ext='*', sort_reverse=False):
    files_to_evaluate = [os.path.join(src_dir, f) for f in os.listdir(src_dir)
                         if re.search(r'.*\.({})$'.format(regex_ext), f)]
    files_to_evaluate.sort(key=os.path.getmtime, reverse=sort_reverse)
    return files_to_evaluate
Add the directory/folder to path; if you want a specific file type, add the file extension, and then get the file names in chronological order.
This works for me.
import glob, os
path = os.path.expanduser(file_location+"/"+date_file)
os.chdir(path)
saved_file=glob.glob('*.xlsx')
saved_file.sort(key=os.path.getmtime)
print(saved_file)
Note that os.listdir does not sort by modification time; the docs say it returns entries in arbitrary order, so any apparent ordering is filesystem-dependent and should not be relied on. Sort explicitly instead:
import os
last_modified = sorted(os.listdir(), key=os.path.getmtime, reverse=True)
Maybe you should use shell commands. In Unix/Linux, find piped with sort will probably be able to do what you want.

how can I save the output of a search for files matching *.txt to a variable?

I'm fairly new to python. I'd like to save the text that is printed by this script as a variable. (The variable is meant to be written to a file later, if that matters.) How can I do that?
import fnmatch
import os
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
You can store it in a variable like this:
import fnmatch
import os

for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_var = file
        # do your stuff
Or you can store it in a list for later use:
import fnmatch
import os

my_match = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_match.append(file)  # append inserts the value at the end of the list
# do stuff with my_match list
You can store it in a list:
import fnmatch
import os
matches = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        matches.append(file)
Both answers already provided are correct, but Python provides a nice alternative. Since iterating through an array and appending to a list is such a common pattern, the list comprehension was created as a one-stop shop for the process.
import fnmatch
import os
matches = [filename for filename in os.listdir("/Users/x/y") if fnmatch.fnmatch(filename, "*.txt")]
While NSU's answer and the others are all perfectly good, there may be a simpler way to get what you want.
Just as fnmatch tests whether a certain file matches a shell-style wildcard, glob lists all files matching a shell-style wildcard. In fact:
This is done by using the os.listdir() and fnmatch.fnmatch() functions in concert…
So, you can do this:
import glob
matches = glob.glob("/Users/x/y/*.txt")
But notice that in this case, you're going to get full pathnames like '/Users/x/y/spam.txt' rather than just 'spam.txt', which may not be what you want. Often, it's easier to keep the full pathnames around and os.path.basename them when you want to display them, than to keep just the base names around and os.path.join them when you want to open them… but "often" isn't "always".
Also notice that I had to manually paste the "/Users/x/y/" and "*.txt" together into a single string, the way you would at the command line. That's fine here, but if, say, the first one came from a variable, rather than hardcoded into the source, you'd have to use os.path.join(basepath, "*.txt"), which isn't quite as nice.
By the way, if you're using Python 3.4 or later, you can get the same thing out of the higher-level pathlib library:
import pathlib
matches = list(pathlib.Path("/Users/x/y/").glob("*.txt"))
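Path objects also soften the base-name-vs-full-path tradeoff mentioned above: the object carries the full path, and .name gives just the base name. A throwaway-directory sketch (file names are illustrative):

```python
import tempfile
from pathlib import Path

d = Path(tempfile.mkdtemp())
(d / 'spam.txt').write_text('')
(d / 'eggs.txt').write_text('')
(d / 'notes.md').write_text('')

matches = sorted(d.glob('*.txt'))      # full Path objects, ready to open
base_names = [p.name for p in matches]  # display-friendly names
```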
Maybe defining a utility function is the right path to follow...
def list_ext_in_dir(e, d):
    """e=extension, d=directory => list of matching filenames.
    If the directory d cannot be listed, returns None."""
    from fnmatch import fnmatch
    from os import listdir
    try:
        dirlist = listdir(d)
    except OSError:
        return None
    return [fname for fname in dirlist if fnmatch(fname, e)]
I have put the listdir call inside a try/except clause to catch the possibility that we cannot list the directory (non-existent, no read permission, etc.). The treatment of errors is a bit simplistic, but...
The list of matching filenames is built using a so-called list comprehension, which is something you should investigate as soon as possible if you're going to use Python for your programs.
To close my post, a usage example:
l_txtfiles = list_ext_in_dir('*.txt', '/Users/x/y')


How can I list the contents of a directory in Python?

Can’t be hard, but I’m having a mental block.
import os
os.listdir("path") # returns list
One way:
import os
os.listdir("/home/username/www/")
Another way:
import glob
glob.glob("/home/username/www/*")
The glob.glob method above will not list hidden files.
Since I originally answered this question years ago, pathlib has been added to Python. My preferred way to list a directory now usually involves the iterdir method on Path objects:
from pathlib import Path
print(*Path("/home/username/www/").iterdir(), sep="\n")
os.walk can be used if you need recursion:
import os

start_path = '.'  # current directory
for path, dirs, files in os.walk(start_path):
    for filename in files:
        print(os.path.join(path, filename))
glob.glob or os.listdir will do it.
The os module handles all that stuff.
os.listdir(path)
Return a list containing the names of the entries in the directory given by path.
The list is in arbitrary order. It does not include the special entries '.' and
'..' even if they are present in the directory.
Availability: Unix, Windows.
In Python 3.4+, you can use the new pathlib package:
from pathlib import Path
for path in Path('.').iterdir():
    print(path)
Path.iterdir() returns an iterator, which can be easily turned into a list:
contents = list(Path('.').iterdir())
Since Python 3.5, you can use os.scandir.
The difference is that it returns file entries, not names. On some OSes like Windows, that means you don't have to call os.path.isdir/isfile to know whether an entry is a file, and that saves CPU time because the stat is already done when scanning the directory on Windows:
example to list a directory and print files bigger than max_value bytes:
for dentry in os.scandir("/path/to/dir"):
    if dentry.stat().st_size > max_value:
        print("{} is biiiig".format(dentry.name))
The code below will list the directories and files within the given directory, recursively. (Another option is os.walk, shown above.)
def print_directory_contents(sPath):
    import os
    for sChild in os.listdir(sPath):
        sChildPath = os.path.join(sPath, sChild)
        if os.path.isdir(sChildPath):
            print_directory_contents(sChildPath)
        else:
            print(sChildPath)
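The same recursive listing can also be written with pathlib's rglob; a sketch over a throwaway tree (the directory and file names are only for illustration):

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
(base / 'sub').mkdir()
(base / 'sub' / 'a.txt').write_text('x')
(base / 'top.txt').write_text('y')

# rglob('*') matches every file and directory below base, recursively
entries = sorted(p.relative_to(base).as_posix() for p in base.rglob('*'))
```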
