search in wildcard folders recursively in python - python

Hello, I'm trying to do something like:
# 1.
for x in glob.glob('/../../nodes/*/views/assets/js/*.js'):
    print(x)
# 2.
for x in glob.glob('/../../nodes/*/views/assets/js/*/*.js'):
    print(x)
Is there anything I can do to search recursively?
I have already looked at "Use a Glob() to find files recursively in Python?", but os.walk doesn't accept wildcard folders like the ones above between nodes and views, and the docs at http://docs.python.org/library/glob.html don't help much.
Thanks.

Caveat: This will also select any files matching the pattern anywhere beneath the root folder which is nodes/.
import os, fnmatch

def locate(pattern, root_path):
    for path, dirs, files in os.walk(os.path.abspath(root_path)):
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)
As os.walk does not accept wildcards we walk the tree and filter what we need.
js_assets = [js for js in locate('*.js', '/../../nodes')]
The locate function yields an iterator of all files which match the pattern.
Alternative solution: You can try the extended glob which adds recursive searching to glob.
Now you can write a much simpler expression like:
fnmatch.filter( glob.glob('/../../nodes/*/views/assets/js/**/*'), '*.js' )
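On Python 3.5 and later, the standard glob module also understands ** when called with recursive=True, so a third-party package is not strictly needed. A minimal sketch, assuming the folder layout from the question; the helper name find_js_assets and its base argument are made up for illustration:

```python
import glob
import os


def find_js_assets(base):
    """Return every .js file below <base>/nodes/*/views/assets/js, at any depth."""
    pattern = os.path.join(base, "nodes", "*", "views", "assets", "js", "**", "*.js")
    # recursive=True makes "**" match zero or more directory levels (Python 3.5+).
    return sorted(glob.glob(pattern, recursive=True))
```

Because ** also matches zero directories, one call covers both patterns from the question.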

I answered a similar question here: fnmatch and recursive path match with `**`
You could use glob2 or formic, both available via easy_install or pip.
GLOB2
FORMIC
You can find them both mentioned here:
Use a Glob() to find files recursively in Python?
I use glob2 a lot, ex:
import glob2
files = glob2.glob(r'C:\Users\**\iTunes\**\*.mp4')

Why don't you split your wildcarded paths into multiple parts, like:
import glob, os

parent_paths = glob.glob('/../../nodes/*')
for p in parent_paths:
    child_paths = glob.glob(os.path.join(p, 'views/assets/js/*.js'))
    for c in child_paths:
        pass  # do something with c
You can replace some of the above with a list of child assets that you want to retrieve.
Alternatively, if your environment provides the find command, it offers better support for this kind of task. If you're on Windows, there may be an analogous program.

Related

Iterate list contents and find json files

I'm kind of breaking my head trying to find an elegant solution for the scenario below.
A list contains directory paths, and I need to find whether request.json or response.json is available in the result folder of each directory, or one level below the result folder, in a folder called scheduler.
Let's say:
input_paths = ['/root/services/builds/tesla', '/root/services/builds/google/apis', '/root/services/builds/qa/tests', '/root/services/builds/airlines_Solutions', '/root/services/builds/traffic_patterns/api']
Output should be:
/root/services/builds/tesla/result/request.json
/root/services/builds/google/apis/result/scheduler/request.json
/root/services/builds/qa/tests/result/scheduler/response.json
My code has multiple for loops that use os.walk and glob, and it looks pathetic. I'm looking forward to learning and understanding a simple solution. Thanks.
You could simply use os.path.exists:
import os

for i in input_paths:
    if os.path.exists(os.path.join(i, "result/request.json")):
        print(os.path.join(i, "result/request.json"))
    elif os.path.exists(os.path.join(i, "result/scheduler/request.json")):
        print(os.path.join(i, "result/scheduler/request.json"))
To recursively find all request.json or response.json files in all your input_paths, you could do:
import os

file_name_pattern = ['request.json', 'response.json']
for path in input_paths:
    for root, dirs, files in os.walk(path):
        for name in files:
            if name in file_name_pattern:
                print(os.path.join(root, name))
If the directory structure matters for your search, you should add another filter for dirs.
sushanth's answer might be more efficient because it avoids searching recursively, but it might be a hassle to reconfigure the code if more search locations become relevant.
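If only the two known locations matter, the recursion can be avoided by building the candidate paths directly, which also covers response.json. This is a sketch under that assumption; find_json_files, SUBDIRS and NAMES are illustrative names, not taken from the answers above:

```python
import os

# Locations (relative to each input path) and file names to probe.
SUBDIRS = ("result", os.path.join("result", "scheduler"))
NAMES = ("request.json", "response.json")


def find_json_files(input_paths):
    """Yield each existing request/response file in result/ or result/scheduler/."""
    for base in input_paths:
        for subdir in SUBDIRS:
            for name in NAMES:
                candidate = os.path.join(base, subdir, name)
                if os.path.isfile(candidate):
                    yield candidate
```

Adding another search location then only means extending SUBDIRS.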

Python search files with multiple extensions

I wish to search a directory, and all contained subdirectories, for files with a substring contained within their name and one of three possible extensions
Please can you help me edit the following code
os.chdir(directory)
files = glob.glob("**/*{}*.pro".format(myStr), recursive = True)
I wish to find files with the extension .pro, .bd3 and .mysql
I'm running Python 3.5
You could create a list and loop over it:
exten_to_find = ['pro', 'bd3', 'mysql']
and format the pattern once per extension:
files = []
for ext in exten_to_find:
    files.extend(glob.glob("**/*{x}*.{y}".format(x=myStr, y=ext), recursive=True))
You could try:
def get_files_with_extension(my_str, exts):
    for f in glob.iglob("**/*{}*.*".format(my_str), recursive=True):
        if any(f.endswith(ext) for ext in exts):
            yield f
Actual-glob syntax has no way to do this. The "enhanced glob" syntaxes of most modern shells can, but I'm pretty sure Python's glob module is only very lightly enhanced.
Under the covers, glob is a pretty simple module, and the docs link to the source. As you can see, it ultimately defers to fnmatch, which is also a pretty simple module, and which ultimately just builds a regex and defers to that. And of course you can do alternations in a regex.
So, one option is to fork all the code from glob.py and fnmatch.py so you can build a fancier pattern to pass down to re.
But the simplest thing to do is just stop using glob here. It's the wrong tool for the job. Just use os.walk and filter things yourself.
If you understand how to write a regex like r'.*{}.*\.(pro|bd3|mysql)'.format(myStr), use that to filter; if not, just write what you do know how to do; the performance cost will probably be minimal, and you'll be able to extend and maintain it yourself.
files = []
for root, dirnames, filenames in os.walk('.'):
    for file in filenames:
        fname, fext = os.path.splitext(file)
        # splitext keeps the leading dot on the extension
        if fext in {'.pro', '.bd3', '.mysql'} and myStr in fname:
            files.append(os.path.join(root, file))
If it turns out that doing a set method and a string method really is so much slower than regex that it makes a difference, and you can't write the regex yourself, come back and ask a new question. (I wouldn't count on the one I used above, if you can't figure out how to debug it.)
Also, if you're using Python before… I think 3.5… os.walk may actually be inherently slower than iglob. In that case, you'll want to look for betterwalk on PyPI, the module that the current implementation is based on.
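For completeness, the regex route mentioned in this answer could look roughly like the sketch below. The function name find_files and its parameters are illustrative, and the extensions are the ones listed in the question; re.escape is used so that special characters in the search string are treated literally.

```python
import os
import re


def find_files(root, substring, extensions=("pro", "bd3", "mysql")):
    """Walk root and return paths whose file name contains substring and
    ends in one of the given extensions (given without the leading dot)."""
    # re.escape keeps characters such as "." or "+" in the inputs literal.
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    pattern = re.compile(
        r".*{}.*\.(?:{})$".format(re.escape(substring), alternatives)
    )
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if pattern.match(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```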

How to find whole path to a file on computer using python

I am struggling a little bit with a task in Python. I want to find the path to a specific file using only the glob and re modules. Is it possible to find the whole path to a specific file on my computer using only the glob module? Any hints are greatly appreciated!
That depends on whether the file is findable using a glob. For example, you can't search recursively for any file called hello.txt, but you can search for a file called bar.txt in a subdirectory of foo using foo/*/bar.txt. If you have a glob for the file (including its path), you could use it directly.
If you want to search recursively, one way is to (badly) emulate os.walk (which is clumsy, because it's more elegant to use os.walk directly). You list the files and directories in a directory by using glob("{0}/*".format(path)) (which returns an empty list if path is not a directory), then do that recursively, and then use re to filter out the results you want:
from glob import glob

def stupidly_list_files(path=""):
    for p in glob("{0}*".format(path)):
        yield p
        for x in stupidly_list_files("{0}/".format(p)):
            yield x

def stupidly_match_files(regex):
    for p in stupidly_list_files():
        if regex.match(p):
            yield p
Note that neither the glob nor the re module knows anything about the current working directory, so if you're looking for an absolute path you're out of luck unless you know the absolute path where you want to root your search.
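On Python 3.5+, glob can do the recursion itself via ** and recursive=True, which removes the need for a hand-rolled walker. A hedged sketch that sticks to glob and re only; find_paths and its parameters are illustrative names, and the root still has to be supplied by the caller (pass an absolute root to get absolute paths back):

```python
import glob
import re


def find_paths(root, name_regex):
    """Return paths under root whose final component fully matches name_regex."""
    compiled = re.compile(name_regex)
    results = []
    # "**" with recursive=True descends into all subdirectories of root.
    for path in glob.iglob(root + "/**/*", recursive=True):
        # Take the last path component without importing os.
        basename = re.split(r"[\\/]", path)[-1]
        if compiled.fullmatch(basename):
            results.append(path)
    return sorted(results)
```

Note that glob skips names starting with a dot by default, so hidden files and directories are not searched.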

How can I list the contents of a directory in Python?

Can’t be hard, but I’m having a mental block.
import os
os.listdir("path") # returns list
One way:
import os
os.listdir("/home/username/www/")
Another way:
import glob
glob.glob("/home/username/www/*")
Examples found here.
The glob.glob method above will not list hidden files.
Since I originally answered this question years ago, pathlib has been added to Python. My preferred way to list a directory now usually involves the iterdir method on Path objects:
from pathlib import Path
print(*Path("/home/username/www/").iterdir(), sep="\n")
os.walk can be used if you need recursion:
import os

start_path = '.'  # current directory
for path, dirs, files in os.walk(start_path):
    for filename in files:
        print(os.path.join(path, filename))
glob.glob or os.listdir will do it.
The os module handles all that stuff.
os.listdir(path)
Return a list containing the names of the entries in the directory given by path.
The list is in arbitrary order. It does not include the special entries '.' and
'..' even if they are present in the directory.
Availability: Unix, Windows.
In Python 3.4+, you can use the new pathlib package:
from pathlib import Path

for path in Path('.').iterdir():
    print(path)
Path.iterdir() returns an iterator, which can be easily turned into a list:
contents = list(Path('.').iterdir())
Since Python 3.5, you can use os.scandir.
The difference is that it returns directory entries rather than names. On some OSes, such as Windows, this means you don't have to call os.path.isdir/isfile to know whether an entry is a file, which saves CPU time because the stat is already done while scanning the directory.
Example that lists a directory and prints files bigger than max_value bytes:
import os

for dentry in os.scandir("/path/to/dir"):
    if dentry.stat().st_size > max_value:
        print("{} is biiiig".format(dentry.name))
(read an extensive performance-based answer of mine here)
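To illustrate the point about not needing os.path.isdir, here is a small sketch that separates subdirectories from files using the entry objects themselves. The function name split_entries is made up for this example, and the with statement around os.scandir requires Python 3.6 or later:

```python
import os


def split_entries(directory):
    """Return (subdirectory names, file names) found directly in directory."""
    dirs, files = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_dir()/is_file() can often be answered from information
            # gathered during the scan, without an extra stat call.
            if entry.is_dir():
                dirs.append(entry.name)
            elif entry.is_file():
                files.append(entry.name)
    return sorted(dirs), sorted(files)
```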
The code below recursively lists the files within a directory. The other option is os.walk.
import os

def print_directory_contents(sPath):
    for sChild in os.listdir(sPath):
        sChildPath = os.path.join(sPath, sChild)
        if os.path.isdir(sChildPath):
            print_directory_contents(sChildPath)
        else:
            print(sChildPath)

How would you implement ant-style patternsets in python to select groups of files?

Ant has a nice way to select groups of files, most handily using ** to indicate a directory tree. E.g.
**/CVS/* # All files immediately under a CVS directory.
mydir/mysubdir/** # All files recursively under mysubdir
More examples can be seen here:
http://ant.apache.org/manual/dirtasks.html
How would you implement this in python, so that you could do something like:
files = get_files("**/CVS/*")
for file in files:
    print(file)
=>
CVS/Repository
mydir/mysubdir/CVS/Entries
mydir/mysubdir/foo/bar/CVS/Entries
Sorry, this is quite a long time after your OP. I have just released a Python package which does exactly this - it's called Formic and it's available at the PyPI Cheeseshop. With Formic, your problem is solved with:
import formic
fileset = formic.FileSet(include="**/CVS/*", default_excludes=False)
for file_name in fileset.qualified_files():
    print(file_name)
There is one slight complexity: default_excludes. Formic, just like Ant, excludes CVS directories by default (as for the most part collecting files from them for a build is dangerous), so the default answer to the question would result in no files. Setting default_excludes=False disables this behaviour.
As soon as you come across a **, you're going to have to recurse through the whole directory structure, so I think at that point, the easiest method is to iterate through the directory with os.walk, construct a path, and then check if it matches the pattern. You can probably convert to a regex by something like:
import os
import re

def glob_to_regex(pat, dirsep=os.sep):
    dirsep = re.escape(dirsep)
    regex = (re.escape(pat).replace("\\*\\*" + dirsep, ".*")
                           .replace("\\*\\*", ".*")
                           .replace("\\*", "[^%s]*" % dirsep)
                           .replace("\\?", "[^%s]" % dirsep))
    return re.compile(regex + "$")
(Though note that this isn't fully featured: it doesn't support [a-z] style glob patterns, for instance, though this could probably be added.) (The first **/ replacement covers cases like **/CVS matching ./CVS, as well as having just ** to match at the tail.)
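The simple "walk everything and match" strategy could be wired up as in the sketch below. It reuses the same conversion as in this answer but fixes the separator to "/" and normalizes the walked paths accordingly; the root parameter of get_files is an assumption added for illustration.

```python
import os
import re


def glob_to_regex(pat, dirsep="/"):
    # Same conversion as in the answer, with "/" as the fixed separator.
    sep = re.escape(dirsep)
    regex = (re.escape(pat).replace("\\*\\*" + sep, ".*")
                           .replace("\\*\\*", ".*")
                           .replace("\\*", "[^%s]*" % sep)
                           .replace("\\?", "[^%s]" % sep))
    return re.compile(regex + "$")


def get_files(pattern, root="."):
    """Yield files under root whose "/"-separated relative path matches pattern."""
    regex = glob_to_regex(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if regex.match(rel):
                yield rel
```

Since **/ is turned into a plain .*, a pattern such as **/CVS/* would also match a directory whose name merely ends in CVS; a stricter conversion would be needed to rule that out.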
However, obviously you don't want to recurse through everything below the current dir when not processing a ** pattern, so I think you'll need a two-phase approach. I haven't tried implementing the below, and there are probably a few corner cases, but I think it should work:
1. Split the pattern on your directory separator, i.e. pat.split('/') -> ['**','CVS','*']
2. Recurse through the directories, and look at the relevant part of the pattern for this level, i.e. n levels deep -> look at pat[n].
3. If pat[n] == '**', switch to the above strategy:
   - Reconstruct the pattern with dirsep.join(pat[n:])
   - Convert to a regex with glob_to_regex()
   - Recursively os.walk through the current directory, building up the path relative to the level you started at. If the path matches the regex, yield it.
4. If pat[n] is not "**", and it is the last element in the pattern, then yield all files/dirs matching glob.glob(os.path.join(curpath,pat[n]))
5. If pat[n] is not "**", and it is NOT the last element in the pattern, then for each directory, check if it matches (with glob) pat[n]. If so, recurse down through it, incrementing depth (so it will look at pat[n+1])
os.walk is your friend. Look at the example in the Python manual
(https://docs.python.org/2/library/os.html#os.walk) and try to build something from that.
To match "**/CVS/*" against a file name you get, you can do something like this:
import fnmatch

def match(pattern, filename):
    if pattern.startswith("**"):
        return fnmatch.fnmatch(filename, pattern[1:])
    else:
        return fnmatch.fnmatch(filename, pattern)
In fnmatch.fnmatch, "*" matches anything (including slashes).
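A quick check of that behaviour, using two of the paths from the question. It also shows a limitation worth knowing about: after stripping one star, the pattern still demands a "/" before CVS, so a top-level CVS directory is not matched.

```python
import fnmatch

# "*" in fnmatch is not path-aware, so it matches across "/" as well.
nested = fnmatch.fnmatch("mydir/mysubdir/CVS/Entries", "*/CVS/*")
# No "/" precedes "CVS" here, so the stripped pattern does not match.
top_level = fnmatch.fnmatch("CVS/Repository", "*/CVS/*")

print(nested)     # True
print(top_level)  # False
```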
There's an implementation in the 'waf' build system source code.
http://code.google.com/p/waf/source/browse/trunk/waflib/Node.py?r=10755#471
Maybe this should be wrapped up in a library of its own?
Yup. Your best bet is, as has already been suggested, to work with 'os.walk'. Or, write wrappers around 'glob' and 'fnmatch' modules, perhaps.
os.walk is your best bet for this. I did the example below with .svn because I had that handy, and it worked great:
import os
import re

for (dirpath, dirnames, filenames) in os.walk("."):
    if re.search(r'\.svn$', dirpath):
        for file in filenames:
            print(file)
