How to find the whole path to a file on a computer using Python

I am struggling a little bit with a task in Python. I want to find the path to a specific file using only the glob and re modules. Is it possible to find the whole path to a specific file on my computer using only the glob module? Any hints strongly appreciated!

That depends on whether the file is findable using a glob. For example, you can't search recursively for any file called hello.txt, but you can search for a file called bar.txt in a subdirectory of foo using foo/*/bar.txt. If you have a glob for the file (including its path), you can use it directly.
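For instance, a minimal sketch of using such a glob directly (foo and bar.txt are the placeholder names from the sentence above):

import glob

# Returns a list of every existing path matching the pattern.
print(glob.glob('foo/*/bar.txt'))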
If you want to search recursively, one way is to (badly) emulate os.walk (which is clumsy, because it's more elegant to use os.walk directly). You can list the files and directories in a directory using glob("{0}/*".format(path)) (which returns an empty list if path is not a directory), recurse into each result, and then use re to filter out the results you want:
from glob import glob

def stupidly_list_files(path=""):
    # List everything at this level, then recurse into each entry.
    for p in glob("{0}*".format(path)):
        yield p
        for x in stupidly_list_files("{0}/".format(p)):
            yield x

def stupidly_match_files(regex):
    # Filter the full listing with a compiled regular expression.
    for p in stupidly_list_files():
        if regex.match(p):
            yield p
Note that glob returns paths in the same form as the pattern (relative patterns give relative results), so if you're looking for an absolute path, you're out of luck unless you know the absolute path at which to root your search.
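For contrast, here is a minimal sketch of the more elegant os.walk approach mentioned above; the regex and root directory are illustrative:

import os
import re

def walk_match_files(regex, root="."):
    # Walk the tree rooted at root and yield every full path matching the regex.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if regex.match(full):
                yield full

for path in walk_match_files(re.compile(r".*\.txt$")):
    print(path)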

Related

What is the fastest method of finding a file in Linux and Windows using Python?

I am writing a plug-in for RawTherapee in Python. I need to extract the version number from a file called 'AboutThisBuild.txt' that may exist anywhere in the directory tree. Although RawTherapee knows where it is installed, this data is baked into the binary file.
My plug-in is being designed to collect basic system data when run without any command-line parameters, for the purpose of short-circuiting troubleshooting. With the version number, revision number and changeset (from Mercurial), I can sort out why the script may not be working as expected. OK, that is the context.
I have tried a variety of methods, some suggested elsewhere on this site. The main one is using os.walk and fnmatch.
The problem is speed. Searching the entire directory tree is like watching paint dry!
To reduce load I have tried to predict likely hiding places and only traverse these. This is quicker but has the obvious disadvantage of missing some files.
This is what I have at the moment. Tested on Linux but not yet on Windows, as I am still researching where the file might be placed.
import fnmatch
import os
import sys

# Note: '~' is not expanded automatically; os.path.expanduser would be needed.
rootPath = ('/usr/share/doc/rawtherapee',
            '~',
            '/media/CoreData/opt/',
            '/opt')
pattern = 'AboutThisBuild.txt'

# Return the first instance of RT found in the paths searched
for CheckPath in rootPath:
    print("\n")
    print(">>>>>>>>>>>>> " + CheckPath)
    print("\n")
    for root, dirs, files in os.walk(CheckPath, True, None, False):
        for filename in fnmatch.filter(files, pattern):
            print(os.path.join(root, filename))
            break
Usually 'AboutThisBuild.txt' is stored in a directory/subdirectory called 'rawtherapee', or has that string somewhere in the directory tree. I had naively thought I could get the 5000-odd directory names, search these for 'rawtherapee', and then use os.walk to traverse those directories, but all the modules and functions I have looked at collate all the files in the directory (again).
Does anyone have a quicker method of searching the entire directory tree, or am I stuck with this hybrid option?
I am a beginner in Python, but I think I know the simplest way of finding a file in Windows.
import os

for dirpath, subdirs, filenames in os.walk('The directory you wanna search the file in'):
    if 'name of your file with extension' in filenames:
        print(dirpath)
This code will print out the directory of the file you are searching for in the console. All you have to do is get to the directory.
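If you want the full path rather than just the containing directory, you can join the two; a small extension of the same idea, with a placeholder root and file name:

import os

for dirpath, subdirs, filenames in os.walk('/some/root'):
    if 'target.txt' in filenames:
        print(os.path.join(dirpath, 'target.txt'))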
The thing about searching is that it doesn't matter too much how you get there (e.g. cheating). Once you have a result, you can verify it is correct relatively quickly.
You may be able to identify candidate locations fairly efficiently by guessing. For example, on Linux, you could first try looking in these locations (obviously not all of them are directories, but it does no harm to call os.path.isfile('/;l$/AboutThisBuild.txt')):
$ strings /usr/bin/rawtherapee | grep '^/'
/lib/ld-linux.so.2
/H=!
/;l$
/9T$,
/.ba
/usr/share/rawtherapee
/usr/share/doc/rawtherapee
/themes/
/themes/slim
/options
/usr/share/color/icc
/cache
/languages/default
/languages/
/languages
/themes
/batch/queue
/batch/
/dcpprofiles
/#q=
/N6rtexif16NAISOInterpreterE
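For instance, a minimal sketch of probing the plausible candidates from that output with os.path.isfile (the candidate list here is illustrative):

import os

candidates = ['/usr/share/rawtherapee', '/usr/share/doc/rawtherapee']
hits = [os.path.join(d, 'AboutThisBuild.txt')
        for d in candidates
        if os.path.isfile(os.path.join(d, 'AboutThisBuild.txt'))]
print(hits)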
If you have it installed, you can try the locate command
If you still don't find it, move on to the brute force method
Here is a rough equivalent of strings using Python
>>> from string import printable, whitespace
>>> from itertools import groupby
>>> pathchars = set(printable) - set(whitespace)
>>> with open("/usr/bin/rawtherapee", errors="ignore") as fp:
...     data = fp.read()
...
>>> for k, g in groupby(data, pathchars.__contains__):
...     if not k: continue
...     g = ''.join(g)
...     if len(g) > 3 and g.startswith("/"):
...         print(g)
...
/lib64/ld-linux-x86-64.so.2
/^W0Kq[
/pW$<
/3R8
/)wyX
/WUO
/w=H
/t_1
/.badpixH
/d$(
/\$P
/D$Pv
/D$#
/D$(
/l$#
/d$#v?H
/usr/share/rawtherapee
/usr/share/doc/rawtherapee
/themes/
/themes/slim
/options
/usr/share/color/icc
/cache
/languages/default
/languages/
/languages
/themes
/batch/queue.csv
/batch/
/dcpprofiles
/#q=
/N6rtexif16NAISOInterpreterE
It sounds like you need a pure Python solution here. If not, other answers will suffice.
In this case, you should traverse the folders using a queue and threads. While some may say threads are never the solution, threads are a great way of speeding things up when you are I/O bound, which you are in this case. Essentially, you'll os.listdir the current dir. If it contains your file, party like it's 1999. If it doesn't, add each subfolder to the work queue.
If you're clever, you can play with depth-first vs breadth-first traversal to get the best results.
There is a great example I have used quite successfully at work at http://www.tutorialspoint.com/python/python_multithreading.htm. See the section titled Multithreaded Priority Queue. The example could probably be updated to use thread pools, though that's not necessary; a sketch of the idea follows.
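Here is a minimal sketch of that queue-plus-threads traversal; the worker count and the daemon-thread/work.join() shutdown are my choices, not from the linked tutorial:

import os
import queue
import threading

def threaded_find(root, target, num_workers=8):
    work = queue.Queue()
    work.put(root)
    found = []
    lock = threading.Lock()

    def worker():
        while True:
            path = work.get()
            try:
                for entry in os.listdir(path):
                    full = os.path.join(path, entry)
                    if entry == target:
                        with lock:
                            found.append(full)
                    elif os.path.isdir(full):
                        work.put(full)  # breadth-first: enqueue subfolders
            except OSError:
                pass  # unreadable directory; skip it
            finally:
                work.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    work.join()  # returns once every queued directory has been processed
    return found

print(threaded_find('/usr/share', 'AboutThisBuild.txt'))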

Python - Opening successive Files without physically opening every one

If I am to read a number of files in Python 3.2, say 30-40, and I want to keep the file references in a list
(all the files are in a common folder),
is there any way I can open all the files to their respective file handles in the list, without having to individually open each one via the open() function?
This is simple: just use a list comprehension based on your list of file paths. Or, if you only need to access them one at a time, use a generator expression to avoid keeping all forty files open at once.
list_of_filenames = ['/foo/bar', '/baz', '/tmp/foo']
open_files = [open(f) for f in list_of_filenames]
If you want handles on all the files in a certain directory, use the os.listdir function:
import os
open_files = [open(f) for f in os.listdir(some_path)]
I've assumed a simple, flat directory here, but note that os.listdir returns the names of all entries in the given directory, whether they are regular files or directories. So if you have directories within the directory you're opening, you'll want to filter the results using os.path.isfile:
import os
open_files = [open(f) for f in os.listdir(some_path) if os.path.isfile(f)]
Also, os.listdir only returns the bare filename, rather than the whole path, so if the current working directory is not some_path, you'll want to make absolute paths using os.path.join.
import os
open_files = [open(os.path.join(some_path, f)) for f in os.listdir(some_path)
              if os.path.isfile(os.path.join(some_path, f))]
With a generator expression:

import os

all_files = (open(os.path.join(some_path, f)) for f in os.listdir(some_path)
             if os.path.isfile(os.path.join(some_path, f)))  # note () instead of []
for f in all_files:
    pass  # do something with the open file here.
In all cases, make sure you close the files when you're done with them. If you can upgrade to Python 3.3 or higher, I recommend you use an ExitStack for one more level of convenience.
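A minimal sketch of the ExitStack approach (contextlib.ExitStack, Python 3.3+); the file names are placeholders:

from contextlib import ExitStack

with ExitStack() as stack:
    # enter_context registers each file so it is closed when the block exits,
    # even if an exception occurs partway through.
    open_files = [stack.enter_context(open(f)) for f in ['/foo/bar', '/baz']]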
The os library (and listdir in particular) should provide you with the basic tools you need:
import os
print("\n".join(os.listdir())) # returns all of the files (& directories) in the current directory
Obviously you'll want to call open with them, but this gives you the files in an iterable form (which I think is the crux of the issue you're facing). At this point you can just do a for loop and open them all (or some of them).
A quick caveat: Jon Clements pointed out in the comments of Henry Keiter's answer that you should watch out for directories, which will show up in os.listdir along with files.
Additionally, this is a good time to write in some filtering statements to make sure you only try to open the right kinds of files. You might be thinking you'll only ever have .txt files in a directory now, but someday your operating system (or users) will have a clever idea to put something else in there, and that could throw a wrench in your code.
Fortunately, a quick filter can do that, and you can do it a couple of ways (I'm just going to show a regex filter):
import os, re

scripts = re.compile(r".*\.py$")
files = [open(x, 'r') for x in os.listdir() if os.path.isfile(x) and scripts.match(x)]
files = map(lambda x: x.read(), files)
print("\n".join(files))
Note that I'm not checking things like whether I have permission to access the file, so if I can see the file in the directory but not read it, I'll hit an exception.
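A minimal sketch of guarding against that, assuming you would rather skip unreadable files than crash:

import os

readable = []
for name in os.listdir():
    if not os.path.isfile(name):
        continue
    try:
        readable.append(open(name, 'r'))
    except PermissionError:
        pass  # visible in the listing but not readable; skip it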

search in wildcard folders recursively in python

Hello, I'm trying to do something like

for x in glob.glob('/../../nodes/*/views/assets/js/*.js'):
    print(x)
for x in glob.glob('/../../nodes/*/views/assets/js/*/*.js'):
    print(x)

Is there anything I can do to search recursively?
I already looked into Use a Glob() to find files recursively in Python?, but os.walk doesn't accept wildcard folders like the one above between nodes and views, and the docs at http://docs.python.org/library/glob.html don't help much.
Thanks
Caveat: this will also select any files matching the pattern anywhere beneath the root folder, which here is nodes/.
import os, fnmatch

def locate(pattern, root_path):
    # Walk the tree and yield the full path of every file matching the pattern.
    for path, dirs, files in os.walk(os.path.abspath(root_path)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)
As os.walk does not accept wildcards, we walk the tree and filter what we need.
js_assets = [js for js in locate('*.js', '/../../nodes')]
The locate function is a generator that yields the path of every file matching the pattern.
Alternative solution: You can try the extended glob, which adds recursive searching to glob.
Now you can write a much simpler expression like:
fnmatch.filter(glob.glob('/../../nodes/*/views/assets/js/**/*'), '*.js')
I answered a similar question here: fnmatch and recursive path match with `**`
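For reference, since Python 3.5 the standard glob module supports ** natively when you pass recursive=True, so the same search can be written as:

import glob

# ** matches any number of intermediate directories when recursive=True.
js_files = glob.glob('/../../nodes/*/views/assets/js/**/*.js', recursive=True)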
You could use glob2 or formic, both available via easy_install or pip.
GLOB2
FORMIC
You can find them both mentioned here:
Use a Glob() to find files recursively in Python?
I use glob2 a lot, e.g.:
import glob2
files = glob2.glob(r'C:\Users\**\iTunes\**\*.mp4')
Why don't you split your wild-carded paths into multiple parts, like:

import glob, os

parent_paths = glob.glob('/../../nodes/*')
for p in parent_paths:
    child_paths = glob.glob(os.path.join(p, 'views/assets/js/*.js'))
    for c in child_paths:
        pass  # do something with each matched file

You can replace some of the above with a list of the child assets that you want to retrieve.
Alternatively, if your environment provides the find command, it offers better support for this kind of task. If you're on Windows, there may be an analogous program.

How can I list the contents of a directory in Python?

Can’t be hard, but I’m having a mental block.
import os
os.listdir("path") # returns list
One way:
import os
os.listdir("/home/username/www/")
Another way:
import glob
glob.glob("/home/username/www/*")
The glob.glob method above will not list hidden files.
Since I originally answered this question years ago, pathlib has been added to Python. My preferred way to list a directory now usually involves the iterdir method on Path objects:
from pathlib import Path
print(*Path("/home/username/www/").iterdir(), sep="\n")
os.walk can be used if you need recursion:
import os

start_path = '.'  # current directory
for path, dirs, files in os.walk(start_path):
    for filename in files:
        print(os.path.join(path, filename))
glob.glob or os.listdir will do it.
The os module handles all that stuff.
os.listdir(path)
Return a list containing the names of the entries in the directory given by path.
The list is in arbitrary order. It does not include the special entries '.' and
'..' even if they are present in the directory.
Availability: Unix, Windows.
In Python 3.4+, you can use the new pathlib package:
from pathlib import Path

for path in Path('.').iterdir():
    print(path)
Path.iterdir() returns an iterator, which can be easily turned into a list:
contents = list(Path('.').iterdir())
Since Python 3.5, you can use os.scandir.
The difference is that it returns file entries, not names. On some OSes like Windows, this means you don't have to call os.path.isdir/isfile to know whether an entry is a file, and that saves CPU time because the stat information is already gathered while scanning the directory on Windows:
An example to list a directory and print files bigger than max_value bytes:

import os

max_value = 1024 * 1024  # hypothetical threshold: 1 MiB
for dentry in os.scandir("/path/to/dir"):
    if dentry.stat().st_size > max_value:
        print("{} is biiiig".format(dentry.name))
The code below will list the directories and the files within a directory, recursively. An alternative is os.walk.

import os

def print_directory_contents(sPath):
    # Recursively print every file beneath sPath.
    for sChild in os.listdir(sPath):
        sChildPath = os.path.join(sPath, sChild)
        if os.path.isdir(sChildPath):
            print_directory_contents(sChildPath)
        else:
            print(sChildPath)
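For example, a call rooted at the current directory:

print_directory_contents('.')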

Using Python, how do I get an array of file info objects, based on a search of a file system?

Currently I have a bash script which runs the find command, like so:
find /storage/disk-1/Media/Video/TV -name '*.avi' -mtime -7
This gets a list of TV shows that were added to my system in the last 7 days. I then go on to create some symbolic links so I can get to my newest TV shows.
I'm looking to re-code this in Python, but I have a few questions I can't seem to find the answers for using Google (maybe I'm not searching for the right thing). I think the best way to sum this up is to ask the question:
How do I perform a search on my file system (should I call find?) which gives me an array of file info objects (containing the modify date, file name, etc) so that I may sort them based on date, and other such things?
import os, time

allfiles = []
now = time.time()

# os.walk returns triples: (current dir, list of subdirs, list of regular files);
# file names are relative to the directory at first.
for dir, subdirs, files in os.walk("/storage/disk-1/Media/Video/TV"):
    for f in files:
        if not f.endswith(".avi"):
            continue
        # compute the full path name
        f = os.path.join(dir, f)
        st = os.stat(f)
        if st.st_mtime < now - 3600*24*7:
            continue  # too old
        allfiles.append((f, st))
This will return all files that find also returned, as a list of pairs (filename, stat result).
Look into the os module: os.walk is the function which walks the file system, and os.path is the module which gives you the file mtime and other file information. os.path also defines a lot of functions for parsing and splitting filenames.
Also of interest, the glob module defines functions for "globbing" strings (matching a string using Unix wildcard rules).
From this, building a list of files matching some criterion should be easy, as sketched below.
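A minimal sketch of that combination, reusing the path and seven-day window from the question (fnmatch does the wildcard matching here):

import fnmatch
import os
import time

cutoff = time.time() - 7 * 24 * 3600  # seven days ago
recent = []
for root, dirs, files in os.walk("/storage/disk-1/Media/Video/TV"):
    for name in fnmatch.filter(files, "*.avi"):  # Unix-wildcard name matching
        full = os.path.join(root, name)
        if os.path.getmtime(full) >= cutoff:
            recent.append(full)
print(recent)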
You can use "find" through the "subprocess" module.
Afterwards, use the "split" string function to dissect each line
For each file, use the OS module (e.g. getmtime etc.) to get file information
or
Use the "walk" and "glob" modules to get the file paths in objects
