return found path in glob - python

If I call glob('path/to/my/**/*.json', recursive=True), the function returns something like:
path/to/my/subfolder1/subfolder2/file1.json
path/to/my/subfolder1/subfolder2/file2.json
path/to/my/subfolder1/subfolder2/file3.json
path/to/my/subfolder1/file4.json
path/to/my/file5.json
...
I'd like to get only the part that starts after ** in the glob, so:
subfolder1/subfolder2/file1.json
subfolder1/subfolder2/file2.json
subfolder1/subfolder2/file3.json
subfolder1/file4.json
file5.json
...
What is the best way to do it? Does glob support it natively? The glob pattern is provided on the command line, so a direct string replacement may be difficult.

Use os.path.commonpath on the returned paths, then os.path.relpath to get each path relative to that common directory. (os.path.commonprefix also works in this example, but it compares strings character by character, so it can return a prefix that stops in the middle of a directory name.)
An example from a Node.js project with a whole bunch of package.json files:
>>> import glob, os
>>> pkgs = glob.glob("node_modules/**/package.json", recursive=True)[:10]
>>> pkgs
['node_modules/queue-microtask/package.json', 'node_modules/callsites/package.json', 'node_modules/sourcemap-codec/package.json', 'node_modules/reusify/package.json', 'node_modules/is-bigint/package.json', 'node_modules/which-boxed-primitive/package.json', 'node_modules/jsesc/package.json', 'node_modules/@types/scheduler/package.json', 'node_modules/@types/react-dom/package.json', 'node_modules/@types/prop-types/package.json']
>>> pfx = os.path.commonpath(pkgs)
>>> pfx
'node_modules'
>>> [os.path.relpath(pkg, pfx) for pkg in pkgs]
['queue-microtask/package.json', 'callsites/package.json', 'sourcemap-codec/package.json', 'reusify/package.json', 'is-bigint/package.json', 'which-boxed-primitive/package.json', 'jsesc/package.json', '@types/scheduler/package.json', '@types/react-dom/package.json', '@types/prop-types/package.json']
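A prefix computed from the results can run deeper than the position of ** when every match happens to share a subfolder. Since the pattern itself is available from the command line, another option is to take the literal directory part of the pattern as the base. The sketch below is only an illustration: `split_static_prefix` and `glob_relative` are hypothetical helper names, and it assumes the pattern uses forward slashes.

```python
import glob
import os


def split_static_prefix(pattern):
    """Split a glob pattern into (literal directory prefix, wildcard remainder).

    Hypothetical helper: assumes "/" is the separator used in the pattern.
    """
    parts = pattern.split("/")
    for i, part in enumerate(parts):
        # The first component containing a glob metacharacter ends the prefix.
        if any(ch in part for ch in "*?["):
            return "/".join(parts[:i]), "/".join(parts[i:])
    # No wildcard at all: treat the directory part as the prefix.
    return os.path.dirname(pattern), os.path.basename(pattern)


def glob_relative(pattern):
    """Return the matches of `pattern`, each relative to its literal prefix."""
    root, _ = split_static_prefix(pattern)
    if not root:
        # "/*.json" has an empty prefix but is anchored at the filesystem root.
        root = "/" if pattern.startswith("/") else "."
    return [os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True)]
```

On Python 3.10 and later, `glob.glob` also accepts a `root_dir` argument, so calling it with the wildcard remainder and `root_dir` set to the literal prefix returns the relative paths directly.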

Related

Excel file name with wildcard [duplicate]

I want to get a list of filenames using a search pattern with a wildcard, like:
getFilenames.py c:\PathToFolder\*
getFilenames.py c:\PathToFolder\FileType*.txt
getFilenames.py c:\PathToFolder\FileTypeA.txt
How can I do this?
You can do it like this:
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
Note:
If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:
>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
This comes straight from here: http://docs.python.org/library/glob.html
glob is useful if you are doing this within Python; however, your shell may not pass the * through to your script (I'm not familiar with the Windows shell).
For example, when I do the following:
import sys
print(sys.argv)
On my shell, I type:
$ python test.py *.jpg
I get this:
['test.py', 'test.jpg', 'wasp.jpg']
Notice that argv does not contain "*.jpg"
The important lesson here is that most shells will expand the asterisk at the shell, before it is passed to your application.
In this case, to get the list of files, I would just do sys.argv[1:]. Alternatively, you could escape the *, so that python sees the literal *. Then, you can use the glob module.
$ getFileNames.py "*.jpg"
or
$ getFileNames.py \*.jpg
from glob import glob
import sys
files = glob(sys.argv[1])
If you're on Python 3.5+, you can use pathlib's glob() instead of the glob module alone.
Getting all files in a directory looks like this:
from pathlib import Path
for path in Path("/path/to/directory").glob("*"):
    print(path)
Or, to just get a list of all .txt files in a directory, you could do this:
from pathlib import Path
for path in Path("/path/to/directory").glob("*.txt"):
    print(path)
Finally, you can search recursively (i.e., to find all .txt files in your target directory and all subdirectories) using a wildcard directory:
from pathlib import Path
for path in Path("/path/to/directory").glob("**/*.txt"):
    print(path)
I am adding this to the previous answers because I found it very useful when you want your scripts to work in multiple shells and with multiple parameters that use *.
If you want something that works in every shell, you can do the following (still using glob):
>>> import glob
>>> import sys
>>> from functools import reduce  # needed on Python 3
>>> reduce(lambda r, x: r + glob.glob(x), sys.argv[1:], [])
Note that this can produce duplicates (for example, if you have a file named test and you pass both t* and te*), but you can simply remove them using a set:
>>> set(reduce(lambda r, x: r + glob.glob(x), sys.argv[1:], []))
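A set removes duplicates but loses the order of the arguments, and a pattern that matches nothing silently disappears. The following is a minimal sketch of one way around both, not a definitive implementation; `expand_patterns` is a hypothetical helper name, and keeping unmatched arguments unchanged is an assumption about the desired behavior.

```python
import glob


def expand_patterns(patterns):
    """Expand each pattern with glob, keeping unmatched arguments as-is."""
    expanded = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # If nothing matches, keep the argument unchanged (it may be a plain name).
        expanded.extend(matches or [pattern])
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(expanded))
```

In a script this would typically be called as `expand_patterns(sys.argv[1:])`, which behaves the same whether or not the shell has already expanded the wildcards.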

How do I get the parent directory's name only, not full path?

I am trying to get the parent directory's name only. Meaning, only its last component, not the full path.
So for example for the path a/b/c/d/e I want to get d, and not a/b/c/d.
My current code:
import os
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(os.path.normpath(path))
print(directoryName)
This prints out C:/example/folder and I want to get just folder.
The simplest way to do this would be using pathlib. Using parent will get you the parent's full path, and name will give you just the last component:
>>> from pathlib import Path
>>> path = Path("/a/b/c/d/e")
>>> path.parent.name
'd'
For comparison, to do the same with os.path, you will need to get the basename of the dirname of your path. So that translates directly to:
import os
path = "C:/example/folder/file1.jpg"
print(os.path.basename(os.path.dirname(path)))
Which is the nicer version of:
os.path.split(os.path.split(path)[0])[1]
Where both give:
'folder'
As you can see, the pathlib approach is much clearer and more readable. Because pathlib takes an object-oriented approach to representing paths, instead of plain strings, we get a clear chain of attribute and method calls.
path.parent.name
Is read in order as:
start from path -> take its parent -> take its name
Whereas in the os functions-accepting-strings approach you actually need to read from inside-out!
os.path.basename(os.path.dirname(path))
Is read in order as:
The name of the parent of the path
Which I'm sure you'll agree is much harder to read and understand (and this is just a simple-case example).
You could also use the str.split method together with os.sep (this example assumes Windows, where os.sep is a backslash):
>>> import os
>>> path = "C:\\example\\folder\\file1.jpg"
>>> path.split(os.sep)[-2]
'folder'
But as the docs state:
Note that knowing this [(the separator)] is not sufficient to be able to parse or
concatenate pathnames — use os.path.split() and os.path.join() — but
it is occasionally useful.
Use pathlib.Path to get the .name of the .parent:
from pathlib import Path
p = Path("C:/example/folder/file1.jpg")
print(p.parent.name) # folder
Compared to os.path, pathlib represents paths as a separate type instead of strings. It generally is shorter and more convenient to use.
This also works, as long as the path uses forward slashes:
import os
path = "C:/example/folder/file1.jpg"
directoryName = os.path.dirname(path)
parent = directoryName.split("/")
print(parent[-1])
Simple to solve using pathlib
0. Import Path from pathlib
from pathlib import Path
path = "C:/example/folder/file1.jpg"
1. Get parent level 1
parent_lv1 = Path(path).parent
2. Get parent level 2
parent_lv2 = parent_lv1.parent
3. Get immediate parent
imm_parent = parent_lv1.relative_to(parent_lv2)
print(imm_parent)
I prefer regex:
import re
def get_parent(path: str) -> str:
    # [\\/] matches either separator; \w+ captures the last directory name
    match = re.search(r".*[\\/](\w+)[\\/].*", path)
    if match:
        return match.group(1)
    else:
        return ""
if __name__ == '__main__':
    my_path = "/home/tony/some/cool/path"
    print(get_parent(my_path))
    win_path = r"C:\windows\path\has\dumb\backslashes"
    print(get_parent(win_path))
Output
cool
dumb
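Note that a pattern built on \w+ skips directory names containing hyphens, dots, or spaces. As a hedged alternative that accepts both separators without a regex, pathlib's PureWindowsPath treats "/" and "\" alike and never touches the filesystem. The helper name `parent_name` is made up for this sketch, and it assumes backslashes in the input are meant as separators (on POSIX a backslash is also a legal filename character).

```python
from pathlib import PureWindowsPath


def parent_name(path):
    """Return the last component of the parent directory of `path`."""
    # PureWindowsPath splits on both "\\" and "/" and does no filesystem access.
    return PureWindowsPath(path).parent.name


print(parent_name("/home/tony/some/cool/path"))
print(parent_name(r"C:\windows\path\has\dumb\backslashes"))
```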

How can I check for basename (only part of the name) file existence using Python?

I think the title explains it all. I need to check existence for a file that contains the word data in its name. I tried something like that os.path.exists(/d/prog/*data.txt) but that doesn't work.
Try glob for it:
from glob import glob
if glob('/d/prog/*data.txt'):
    # at least one matching file exists
    print("found")
You can't do it with os.path.exists alone, since it expects a full pathname. If you do not know the exact filename, you first have to find the file on the filesystem (and once you do, that proves the file exists).
One option is to list the directory (or directories) and find the file manually:
>>> import os
>>> file_list = os.listdir('/etc')
>>> [fn for fn in file_list if 'deny' in fn]
['hostapd.deny', 'at.deny', 'cron.deny', 'hosts.deny']
Another, more flexible option is to use glob.glob, which lets you use wildcards such as *, ? and [...]:
>>> import glob
>>> glob.glob('/etc/*deny*')
['/etc/hostapd.deny', '/etc/at.deny', '/etc/cron.deny', '/etc/hosts.deny']
If you are on Windows:
>>> import glob
>>> glob.glob('D:\\test\\*data*')
['D:\\test\\data.1028.txt', 'D:\\test\\data2.1041.txt']
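If only a yes/no answer is needed, the same check can be written with pathlib so that the search stops at the first match instead of building the whole list. This is a minimal sketch; `any_match` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path


def any_match(directory, pattern):
    """Return True if at least one entry in `directory` matches `pattern`."""
    # Path.glob is lazy, so any() stops as soon as the first match is produced.
    return any(Path(directory).glob(pattern))
```

For the question above this would be called as `any_match('/d/prog', '*data.txt')`.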

python:extract certain part of string

I have a string from which I would like to extract certain part. The string looks like :
E:/test/my_code/content/dir/disp_temp_2.hgx
This is a path on a machine for a specific file with extension hgx
I would like to capture exactly "disp_temp_2". The problem is that the strip function does not work correctly for me because there are many '/' characters. Another problem is that the location above will change from computer to computer.
Is there any method to capture the exact string between the last '/' and the '.'?
My code looks like:
path = path.split('.')
.. now I cannot split based on the last '/'.
Any ideas how to do this?
Thanks
Use the os.path module:
import os.path
filename = "E:/test/my_code/content/dir/disp_temp_2.hgx"
name = os.path.basename(filename).split('.')[0]
Python comes with the os.path module, which gives you much better tools for handling paths and filenames:
>>> import os.path
>>> p = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> head, tail = os.path.split(p)
>>> tail
'disp_temp_2.hgx'
>>> os.path.splitext(tail)
('disp_temp_2', '.hgx')
Standard libs are cool:
>>> from os import path
>>> f = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> path.split(f)[1].rsplit('.', 1)[0]
'disp_temp_2'
Try this:
path=path.rsplit('/',1)[1].split('.')[0]
path = path.split('/')[-1].split('.')[0] works.
You can use the split on the other part :
path = path.split('/')[-1].split('.')[0]
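The answers above differ once a filename contains more than one dot: split('.')[0] cuts at the first dot, while os.path.splitext and pathlib's stem cut at the last one. A short sketch of the difference, using a made-up filename (`disp.temp_2.hgx` is an assumption for illustration, not from the question):

```python
import os.path
from pathlib import PurePosixPath

# Hypothetical filename with an extra dot in the name.
p = "E:/test/my_code/content/dir/disp.temp_2.hgx"

first_dot = os.path.basename(p).split(".")[0]        # cuts at the first dot
last_dot = os.path.splitext(os.path.basename(p))[0]  # cuts at the last dot
stem = PurePosixPath(p).stem                         # pathlib equivalent of last_dot

print(first_dot)  # disp
print(last_dot)   # disp.temp_2
print(stem)       # disp.temp_2
```

Which behavior is right depends on whether everything after the first dot counts as the extension.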

How to get an absolute file path in Python

Given a path such as "mydir/myfile.txt", how do I find the file's absolute path in Python? E.g. on Windows, I might end up with:
"C:/example/cwd/mydir/myfile.txt"
>>> import os
>>> os.path.abspath("mydir/myfile.txt")
'C:\\example\\cwd\\mydir\\myfile.txt'
Also works if it is already an absolute path:
>>> import os
>>> os.path.abspath("C:/example/cwd/mydir/myfile.txt")
'C:\\example\\cwd\\mydir\\myfile.txt'
You could use the new Python 3.4 library pathlib. (You can also get it for Python 2.6 or 2.7 using pip install pathlib.) The authors wrote: "The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them."
To get an absolute path in Windows:
>>> from pathlib import Path
>>> p = Path("pythonw.exe").resolve()
>>> p
WindowsPath('C:/Python27/pythonw.exe')
>>> str(p)
'C:\\Python27\\pythonw.exe'
Or on UNIX:
>>> from pathlib import Path
>>> p = Path("python3.4").resolve()
>>> p
PosixPath('/opt/python3/bin/python3.4')
>>> str(p)
'/opt/python3/bin/python3.4'
Docs are here: https://docs.python.org/3/library/pathlib.html
import os
path_name_string = "~/mydir/myfile.txt"  # example input
os.path.abspath(os.path.expanduser(os.path.expandvars(path_name_string)))
Note that expanduser is necessary (on Unix) in case the given file or directory name contains a leading ~/ (the tilde refers to the user's home directory), and expandvars takes care of any environment variables (like $HOME).
Install the third-party path module (found on PyPI). It wraps all the os.path functions and other related functions into methods on an object that can be used wherever strings are used:
>>> from path import path
>>> path('mydir/myfile.txt').abspath()
'C:\\example\\cwd\\mydir\\myfile.txt'
Update for Python 3.4+ pathlib that actually answers the question:
from pathlib import Path
relative = Path("mydir/myfile.txt")
absolute = relative.absolute() # absolute is a Path object
If you only need a temporary string, keep in mind that you can use Path objects with all the relevant functions in os.path, including of course abspath:
from os.path import abspath
absolute = abspath(relative) # absolute is a str object
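Path.absolute(), Path.resolve(), and os.path.abspath are not interchangeable, which matters when choosing between the answers here. A small sketch of the difference, using a made-up relative path that contains a ".." component (the file does not need to exist):

```python
import os
from pathlib import Path

# Hypothetical relative path with a ".." component.
relative = Path("mydir/../mydir/myfile.txt")

absolute = relative.absolute()          # prepends the cwd, keeps ".." as-is
resolved = relative.resolve()           # collapses ".." and follows symlinks
normalized = os.path.abspath(relative)  # collapses ".." textually, ignores symlinks

print(absolute)
print(resolved)
print(normalized)
```

If the path may contain symlinks or ".." components, resolve() is usually the safer choice; absolute() is enough when the path only needs to be anchored to the current working directory.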
This always gets the right filename of the current script, even when it is called from within another script. It is especially useful when using subprocess.
import sys, os
filename = sys.argv[0]
From there, you can get the script's full path with:
>>> os.path.abspath(filename)
'/foo/bar/script.py'
It also makes it easier to navigate folders by just appending /.. as many times as you want to go 'up' in the directory hierarchy.
To get the script's directory:
>>> os.path.abspath(filename+"/..")
'/foo/bar'
For the parent path:
>>> os.path.abspath(filename+"/../..")
'/foo'
By combining "/.." with other filenames, you can access any file in the system.
Today you can also use the unipath package which was based on path.py: http://sluggo.scrapping.cc/python/unipath/
>>> from unipath import Path
>>> absolute_path = Path('mydir/myfile.txt').absolute()
>>> absolute_path
Path('C:\\example\\cwd\\mydir\\myfile.txt')
>>> str(absolute_path)
'C:\\example\\cwd\\mydir\\myfile.txt'
I would recommend using this package as it offers a clean interface to common os.path utilities.
You can use this to get absolute path of a specific file.
from pathlib import Path
fpath = Path('myfile.txt').absolute()
print(fpath)
Given a path such as mydir/myfile.txt, how do I find the file's absolute path relative to the current working directory in Python?
I would do it like this:
import os.path
os.path.join(os.getcwd(), 'mydir/myfile.txt')
That returns '/home/ecarroll/mydir/myfile.txt'
If you are on a Mac:
import os
upload_folder = os.path.abspath("static/img/users")
This will give you a full path:
print(upload_folder)
which will show the following path:
/Users/myUsername/PycharmProjects/OBS/static/img/users
In case someone is using Python on Linux and looking for the full path to a file:
>>> import os
>>> path = os.popen("readlink -f file").read().strip()
>>> print(path)
/abs/path/to/file
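The same result is available without spawning a shell command: os.path.realpath returns the canonical absolute path and follows symlinks, and it works on any platform. A minimal sketch, where "file" is a placeholder name as in the answer above:

```python
import os

# "file" is a placeholder; it is taken relative to the current working directory.
path = os.path.realpath("file")
print(path)
```

Unlike the output of os.popen(...).read(), the returned string has no trailing newline to strip.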
