Excel file name with wildcard [duplicate] - python

I want to get a list of filenames that match a search pattern containing a wildcard, like:
getFilenames.py c:\PathToFolder\*
getFilenames.py c:\PathToFolder\FileType*.txt
getFilenames.py c:\PathToFolder\FileTypeA.txt
How can I do this?

You can do it like this:
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
Note:
If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:
>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
This comes straight from here: http://docs.python.org/library/glob.html
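If you do want dotfiles included, one option is to run a second pattern with an explicit leading dot and merge the results. This is only a minimal sketch; the helper name glob_with_hidden is hypothetical and not part of the glob module:

```python
import glob
import os


def glob_with_hidden(directory, pattern):
    """Return matches for `pattern` in `directory`, including dotfiles.

    `pattern` is assumed to be a plain filename pattern such as '*.gif'
    (no directory part).
    """
    visible = glob.glob(os.path.join(directory, pattern))
    # A pattern that starts with a literal '.' is allowed to match dotfiles.
    hidden = glob.glob(os.path.join(directory, "." + pattern))
    return sorted(set(visible + hidden))
```

For the directory described above, glob_with_hidden('.', '*.gif') would return both card.gif and .card.gif.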

glob is useful if you are doing this from within Python. However, your shell may expand the * itself before your script ever sees it (I'm not familiar with the Windows shell).
For example, when I do the following:
import sys
print(sys.argv)
On my shell, I type:
$ python test.py *.jpg
I get this:
['test.py', 'test.jpg', 'wasp.jpg']
Notice that argv does not contain "*.jpg"
The important lesson here is that most shells will expand the asterisk at the shell, before it is passed to your application.
In this case, to get the list of files, I would just do sys.argv[1:]. Alternatively, you could escape the *, so that python sees the literal *. Then, you can use the glob module.
$ getFileNames.py "*.jpg"
or
$ getFileNames.py \*.jpg

from glob import glob
import sys
files = glob(sys.argv[1])

If you're on Python 3.5+, you can use pathlib's glob() instead of the glob module alone.
Getting all files in a directory looks like this:
from pathlib import Path
for path in Path("/path/to/directory").glob("*"):
    print(path)
Or, to just get a list of all .txt files in a directory, you could do this:
from pathlib import Path
for path in Path("/path/to/directory").glob("*.txt"):
    print(path)
Finally, you can search recursively (i.e., to find all .txt files in your target directory and all subdirectories) using a wildcard directory:
from pathlib import Path
for path in Path("/path/to/directory").glob("**/*.txt"):
    print(path)
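pathlib also offers Path.rglob(), which behaves like glob() with "**/" prepended to the pattern. A short sketch of the recursive search, where the function name find_text_files and its directory argument are placeholders:

```python
from pathlib import Path


def find_text_files(directory):
    """Return all .txt files under `directory`, recursively, as a sorted list."""
    return sorted(Path(directory).rglob("*.txt"))
```

Collecting into a sorted list is optional; it just makes the result order predictable, since glob results come back in filesystem order.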

I am adding this to the previous answers because I found it very useful when you want your scripts to work in multiple shells and with multiple parameters that use *.
If you want something that works in every shell, you can do the following (still using glob):
>>> import glob
>>> import sys
>>> from functools import reduce  # needed on Python 3
>>> reduce(lambda r, x: r + glob.glob(x), sys.argv[1:], [])
Note that this can produce duplicates (for example, if you have a file named test and you pass both t* and te*), but you can remove them with a set:
>>> set(reduce(lambda r, x: r + glob.glob(x), sys.argv[1:], []))
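A set loses the order of the matches. If order matters, a plain loop plus dict.fromkeys does the same job while keeping the first occurrence of each path. This is a sketch under the same assumptions as above; expand_patterns is a hypothetical helper name:

```python
import glob


def expand_patterns(args):
    """Expand each argument as a glob pattern, dropping duplicates.

    Works whether or not the shell already expanded the wildcards: an
    already-expanded filename is a pattern that matches only itself.
    """
    matches = []
    for arg in args:
        matches.extend(glob.glob(arg))
    # dict.fromkeys keeps the first occurrence of each path, in order.
    return list(dict.fromkeys(matches))
```

In a script this would be called as expand_patterns(sys.argv[1:]). Two caveats: arguments that match nothing are silently dropped, and a literal filename containing characters such as [ is still interpreted as a pattern.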

Related

return found path in glob

If I call glob('path/to/my/**/*.json', recursive=True), it returns something like:
path/to/my/subfolder1/subfolder2/file1.json
path/to/my/subfolder1/subfolder2/file2.json
path/to/my/subfolder1/subfolder2/file3.json
path/to/my/subfolder1/file4.json
path/to/my/file5.json
...
I'd like to get only the part that starts after ** in the pattern, so:
subfolder1/subfolder2/file1.json
subfolder1/subfolder2/file2.json
subfolder1/subfolder2/file3.json
subfolder1/file4.json
file5.json
...
What is the best way to do it? Does glob support it natively? The glob pattern is provided on the command line, so a direct string replace may be difficult.
Use os.path.commonprefix on the returned paths, then os.path.relpath using the common prefix to get paths relative to it.
An example from a Node.js project with a whole bunch of package.jsons.
>>> import glob, os
>>> pkgs = glob.glob("node_modules/**/package.json", recursive=True)[:10]
>>> pkgs
['node_modules/queue-microtask/package.json', 'node_modules/callsites/package.json', 'node_modules/sourcemap-codec/package.json', 'node_modules/reusify/package.json', 'node_modules/is-bigint/package.json', 'node_modules/which-boxed-primitive/package.json', 'node_modules/jsesc/package.json', 'node_modules/@types/scheduler/package.json', 'node_modules/@types/react-dom/package.json', 'node_modules/@types/prop-types/package.json']
>>> pfx = os.path.commonprefix(pkgs)
>>> pfx
'node_modules/'
>>> [os.path.relpath(pkg, pfx) for pkg in pkgs]
['queue-microtask/package.json', 'callsites/package.json', 'sourcemap-codec/package.json', 'reusify/package.json', 'is-bigint/package.json', 'which-boxed-primitive/package.json', 'jsesc/package.json', '@types/scheduler/package.json', '@types/react-dom/package.json', '@types/prop-types/package.json']
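One caveat: os.path.commonprefix compares strings character by character, so it can stop in the middle of a directory name. os.path.commonpath compares whole path components instead. A small sketch with made-up paths (the directory and file names are purely illustrative):

```python
import os.path

# Hypothetical paths, written with os.path.join so the example is portable.
paths = [
    os.path.join("data", "foo1", "a.json"),
    os.path.join("data", "foo2", "b.json"),
]

# Character-based: stops at 'data/foo' (on POSIX), which is not a real directory.
char_prefix = os.path.commonprefix(paths)

# Component-based: stops at the last directory all paths share, 'data'.
dir_prefix = os.path.commonpath(paths)

# Paths relative to the shared directory.
relative = [os.path.relpath(p, dir_prefix) for p in paths]
```

Either way, the prefix is derived from the results rather than from the pattern, so if every match happens to sit in the same subfolder, that subfolder is stripped as well.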

Python: subprocess call doesn't recognize * wildcard character?

I want to remove all the *.ts files in a directory, but os.remove didn't work.
>>> args = ['rm', '*.ts']
>>> p = subprocess.call(args)
rm: *.ts No such file or directory
The rm program takes a list of filenames, but *.ts isn't a list of filenames, it's a pattern for matching filenames. You have to name the actual files for rm. When you use a shell, the shell (but not rm!) will expand patterns like *.ts for you. In Python, you have to explicitly ask for it.
import glob
import subprocess
# The '--' marks the end of options, so a filename starting with '-' is not
# mistaken for a flag. This makes things much safer, by the way.
subprocess.check_call(['rm', '--'] + glob.glob('*.ts'))
Of course, why bother with subprocess?
import glob
import os
for path in glob.glob('*.ts'):
    os.remove(path)
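The same loop can be written with pathlib, which combines the globbing and the deletion on one object. A minimal sketch, assuming you only want to delete regular files; remove_matching is a hypothetical helper name:

```python
from pathlib import Path


def remove_matching(directory, pattern):
    """Delete regular files in `directory` that match `pattern`; return their names."""
    removed = []
    # Materialize the matches first so we never delete while still scanning.
    for path in list(Path(directory).glob(pattern)):
        if path.is_file():  # skip directories that happen to match
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For the question above, the call would be remove_matching('.', '*.ts').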

How can I check for basename (only part of the name) file existence using Python?

I think the title explains it all. I need to check for the existence of a file that contains the word data in its name. I tried something like os.path.exists('/d/prog/*data.txt'), but that doesn't work.
Try glob for it:
from glob import glob
if glob('/d/prog/*data.txt'):
    pass  # at least one matching file exists; handle it here
You can't do it with os.path.exists alone: it expects a full pathname. If you do not know the exact filename, you first have to find the file on the filesystem (and once you have found it, you know it exists).
One option is to list the directory (or directories) and filter the names manually:
>>> import os
>>> file_list = os.listdir('/etc')
>>> [fn for fn in file_list if 'deny' in fn]
['hostapd.deny', 'at.deny', 'cron.deny', 'hosts.deny']
Another, more flexible option is to use glob.glob, which lets you use wildcards such as *, ? and [...]:
>>> import glob
>>> glob.glob('/etc/*deny*')
['/etc/hostapd.deny', '/etc/at.deny', '/etc/cron.deny', '/etc/hosts.deny']
If you are on Windows:
>>> import glob
>>> glob.glob('D:\\test\\*data*')
['D:\\test\\data.1028.txt', 'D:\\test\\data2.1041.txt']
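If you only need a yes/no answer, glob.iglob avoids building the full list of matches. A minimal sketch of an existence check; any_file_matches is a hypothetical name:

```python
import glob


def any_file_matches(pattern):
    """Return True if at least one path matches the glob `pattern`."""
    # iglob yields matches lazily, so any() can stop at the first hit.
    return any(True for _ in glob.iglob(pattern))
```

With the path from the question this would be any_file_matches('/d/prog/*data.txt').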

Remove unnecessary directories in path name constructed with os.path.join

In Python, when I print a directory path constructed with os.path.join, I get something like this:
rep/rep2/../rep1
Is there a way to get only this:
rep/rep1
Yes, os.path.normpath() collapses redundant separators and up-references.
os.path.realpath() converts the path to a canonical path, which includes eliminating '..' components, but it also resolves symlinks.
See https://docs.python.org/2/library/os.path.html.
Use os.path.relpath:
>>> import os
>>> os.path.relpath("rep/rep2/../rep1", start="")
'rep/rep1'
Or os.path.normpath:
>>> import os
>>> os.path.normpath("rep/rep2/../rep1")
'rep/rep1'
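It helps to remember that normpath works purely on the string and never looks at the filesystem, so it gives the same answer whether or not rep2 exists (and it can differ from the real location if rep2 is a symlink). A small sketch using the path from the question:

```python
import os.path

messy = os.path.join("rep", "rep2", "..", "rep1")

# normpath only rewrites the string; it never touches the filesystem.
clean = os.path.normpath(messy)

# It also collapses '.' components and trailing separators.
also_clean = os.path.normpath(os.path.join("rep", ".", "rep1") + os.sep)
```

Both clean and also_clean end up as rep/rep1 (with the platform's own separator).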

How to get an absolute file path in Python

Given a path such as "mydir/myfile.txt", how do I find the file's absolute path in Python? E.g. on Windows, I might end up with:
"C:/example/cwd/mydir/myfile.txt"
>>> import os
>>> os.path.abspath("mydir/myfile.txt")
'C:\\example\\cwd\\mydir\\myfile.txt'
Also works if it is already an absolute path:
>>> import os
>>> os.path.abspath("C:/example/cwd/mydir/myfile.txt")
'C:\\example\\cwd\\mydir\\myfile.txt'
You could use the new Python 3.4 library pathlib. (You can also get it for Python 2.6 or 2.7 using pip install pathlib.) The authors wrote: "The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them."
To get an absolute path in Windows:
>>> from pathlib import Path
>>> p = Path("pythonw.exe").resolve()
>>> p
WindowsPath('C:/Python27/pythonw.exe')
>>> str(p)
'C:\\Python27\\pythonw.exe'
Or on UNIX:
>>> from pathlib import Path
>>> p = Path("python3.4").resolve()
>>> p
PosixPath('/opt/python3/bin/python3.4')
>>> str(p)
'/opt/python3/bin/python3.4'
Docs are here: https://docs.python.org/3/library/pathlib.html
import os
os.path.abspath(os.path.expanduser(os.path.expandvars(PathNameString)))
Note that expanduser is necessary (on Unix) in case the given expression for the file (or directory) name and location contains a leading ~/ (the tilde refers to the user's home directory), and expandvars takes care of any environment variables (like $HOME).
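To see the three steps separately, here is a small sketch. DEMO_DIR is a made-up environment variable that the example sets itself, and notes.txt is a placeholder filename:

```python
import os

# DEMO_DIR is a made-up variable, set here only so the example is self-contained.
os.environ["DEMO_DIR"] = "projects"

raw = os.path.join("~", "$DEMO_DIR", "notes.txt")

# Step 1: replace $DEMO_DIR with its value.
with_vars = os.path.expandvars(raw)

# Step 2: replace the leading ~ with the user's home directory.
expanded = os.path.expanduser(with_vars)

# Step 3: make the result absolute and normalized.
absolute = os.path.abspath(expanded)
```

If the home directory cannot be determined, expanduser leaves the ~ in place, so the final path is then resolved relative to the current working directory.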
Install a third-party path module (found on PyPI). It wraps all the os.path functions and other related functions into methods on an object that can be used wherever strings are used:
>>> from path import path
>>> path('mydir/myfile.txt').abspath()
'C:\\example\\cwd\\mydir\\myfile.txt'
Update for Python 3.4+ pathlib that actually answers the question:
from pathlib import Path
relative = Path("mydir/myfile.txt")
absolute = relative.absolute() # absolute is a Path object
If you only need a temporary string, keep in mind that you can use Path objects with all the relevant functions in os.path, including of course abspath:
from os.path import abspath
absolute = abspath(relative) # absolute is a str object
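Path.absolute() and Path.resolve() are not interchangeable, which is worth seeing side by side. A minimal sketch, using a made-up relative path that contains a '..' component:

```python
from pathlib import Path

relative = Path("mydir") / ".." / "myfile.txt"

# absolute() only prepends the current working directory; '..' stays in place.
made_absolute = relative.absolute()

# resolve() additionally collapses '..' and follows symlinks.
resolved = relative.resolve()
```

If you want the '..' collapsed without following symlinks, os.path.abspath(relative) does that, as shown above.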
This always gets the right filename of the current script, even when it is called from within another script. It is especially useful when using subprocess.
import os
import sys
filename = sys.argv[0]
From there, you can get the script's full path with:
>>> os.path.abspath(filename)
'/foo/bar/script.py'
It also makes it easier to navigate folders by appending /.. as many times as you want to go 'up' in the directory hierarchy.
To get the script's directory (which is not necessarily the current working directory):
>>> os.path.abspath(filename+"/..")
'/foo/bar'
For the parent path:
>>> os.path.abspath(filename+"/../..")
'/foo'
By combining "/.." with other filenames, you can access any file in the system.
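Appending "/.." works, but os.path.dirname and os.path.join express the same idea more directly. A sketch under the same assumption that the script's path is known; script_dir and sibling are hypothetical helper names:

```python
import os


def script_dir(script_path):
    """Return the absolute directory containing `script_path`."""
    return os.path.dirname(os.path.abspath(script_path))


def sibling(script_path, name):
    """Return the absolute path of a file `name` located next to the script."""
    return os.path.join(script_dir(script_path), name)
```

In a script you would pass sys.argv[0], for example sibling(sys.argv[0], "config.ini"), where config.ini is just an illustrative filename.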
Today you can also use the unipath package which was based on path.py: http://sluggo.scrapping.cc/python/unipath/
>>> from unipath import Path
>>> absolute_path = Path('mydir/myfile.txt').absolute()
>>> absolute_path
Path('C:\\example\\cwd\\mydir\\myfile.txt')
>>> str(absolute_path)
'C:\\example\\cwd\\mydir\\myfile.txt'
I would recommend using this package as it offers a clean interface to common os.path utilities.
You can use this to get the absolute path of a specific file.
from pathlib import Path
fpath = Path('myfile.txt').absolute()
print(fpath)
Given a path such as mydir/myfile.txt, how do I find the file's absolute path relative to the current working directory in Python?
I would do it like this:
import os.path
os.path.join(os.getcwd(), 'mydir/myfile.txt')
That returns '/home/ecarroll/mydir/myfile.txt'.
If you are on a Mac:
import os
upload_folder = os.path.abspath("static/img/users")
This gives you the full path:
print(upload_folder)
which prints something like:
/Users/myUsername/PycharmProjects/OBS/static/img/users
In case someone is using Python on Linux and looking for the full path to a file:
>>> import os
>>> path = os.popen("readlink -f file").read().strip()
>>> print(path)
/abs/path/to/file
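A similar result is available without spawning a process: os.path.realpath returns an absolute path with symlinks resolved, much like readlink -f. A minimal sketch, where canonical_path is a hypothetical wrapper name:

```python
import os


def canonical_path(path):
    """Return an absolute path with symlinks and '..' components resolved."""
    return os.path.realpath(path)
```

This also works on platforms where the readlink command is not available.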
