Python: subprocess call doesn't recognize * wildcard character? - python

I want to remove all the *.ts files in a directory, but os.remove didn't work.
>>> args = ['rm', '*.ts']
>>> p = subprocess.call(args)
rm: *.ts No such file or directory

The rm program takes a list of filenames, but *.ts isn't a list of filenames, it's a pattern for matching filenames. You have to name the actual files for rm. When you use a shell, the shell (but not rm!) will expand patterns like *.ts for you. In Python, you have to explicitly ask for it.
import glob
import subprocess
subprocess.check_call(['rm', '--'] + glob.glob('*.ts'))
# ^^^^ this makes things much safer, by the way
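The `--` is safer because a matched filename can start with a dash, and rm would otherwise read it as an option. Here is a minimal sketch of that point; the helper name `build_rm_argv` is made up for illustration, and it only builds the argument list without deleting anything:

```python
import glob


def build_rm_argv(pattern="*.ts"):
    """Return the argument list for rm without running it (hypothetical helper)."""
    # Expand the pattern ourselves, relative to the current directory.
    matches = sorted(glob.glob(pattern))
    # "--" ends option parsing, so a file named "-rf.ts" stays a filename.
    return ["rm", "--"] + matches
```

The returned list can be handed to subprocess.check_call unchanged.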
Of course, why bother with subprocess?
import glob
import os
for path in glob.glob('*.ts'):
    os.remove(path)
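The same idea can be wrapped in a small function. This is only a sketch: the name `remove_matching` and the choice to skip directories are assumptions, not part of the answer above.

```python
from pathlib import Path


def remove_matching(directory, pattern="*.ts"):
    """Delete regular files in `directory` that match `pattern`; return their names."""
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():  # skip directories that happen to match the pattern
            path.unlink()
            removed.append(path.name)
    return removed
```

Returning the names makes it easy to log what was actually deleted.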

Related

Excel file name with wildcard [duplicate]

I want to get a list of filenames with a search pattern with a wildcard. Like:
getFilenames.py c:\PathToFolder\*
getFilenames.py c:\PathToFolder\FileType*.txt
getFilenames.py c:\PathToFolder\FileTypeA.txt
How can I do this?
You can do it like this:
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
Note:
If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:
>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
This comes straight from here: http://docs.python.org/library/glob.html
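If you do want the dot-files as well, one option is to run a second pattern with a leading dot and merge the results. A minimal sketch, assuming the pattern has no directory part; `glob_with_hidden` is a hypothetical helper name:

```python
import glob
import os


def glob_with_hidden(directory, pattern):
    """Match `pattern` in `directory`, including names that start with a dot."""
    base = glob.escape(directory)  # treat the directory name literally
    visible = glob.glob(os.path.join(base, pattern))
    hidden = glob.glob(os.path.join(base, "." + pattern))
    return sorted(os.path.basename(p) for p in set(visible + hidden))
```

Python 3.11 and later also accept include_hidden=True in glob.glob, which avoids the second pattern.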
glob is useful if you are doing this within Python; however, your shell may not be passing in the * (I'm not familiar with the Windows shell).
For example, when I do the following:
import sys
print(sys.argv)
On my shell, I type:
$ python test.py *.jpg
I get this:
['test.py', 'test.jpg', 'wasp.jpg']
Notice that argv does not contain "*.jpg"
The important lesson here is that most shells will expand the asterisk at the shell, before it is passed to your application.
In this case, to get the list of files, I would just do sys.argv[1:]. Alternatively, you could escape the *, so that python sees the literal *. Then, you can use the glob module.
$ getFileNames.py "*.jpg"
or
$ getFileNames.py \*.jpg
from glob import glob
import sys
files = glob(sys.argv[1])
If you're on Python 3.5+, you can use pathlib's glob() instead of the glob module alone.
Getting all files in a directory looks like this:
from pathlib import Path
for path in Path("/path/to/directory").glob("*"):
    print(path)
Or, to just get a list of all .txt files in a directory, you could do this:
from pathlib import Path
for path in Path("/path/to/directory").glob("*.txt"):
    print(path)
Finally, you can search recursively (i.e., to find all .txt files in your target directory and all subdirectories) using a wildcard directory:
from pathlib import Path
for path in Path("/path/to/directory").glob("**/*.txt"):
    print(path)
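pathlib also offers Path.rglob(), which behaves like glob() with "**/" put in front of the pattern. A short sketch; the function name `find_txt` and the choice to return paths relative to the starting directory are just for illustration:

```python
from pathlib import Path


def find_txt(directory):
    """Return all .txt files under `directory`, as sorted relative POSIX paths."""
    root = Path(directory)
    # rglob("*.txt") walks the directory and all of its subdirectories.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
```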
I am adding this to the previous answers because I found it very useful when you want your scripts to work on multiple shells and with multiple parameters using *.
If you want something that works in every shell, you can do the following (still using glob):
>>> import glob
>>> import sys
>>> from functools import reduce # if using python 3+
>>> reduce(lambda r, x: r + glob.glob(x), sys.argv[1:], [])
Note that this can produce duplicates (if you have a file named test and you pass both t* and te*), but you can simply remove them using a set:
>>> set(reduce(lambda r, x: r + glob.glob(x), sys.argv[1:], []))
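A set removes the duplicates but also loses the order of the matches. If the order matters, a plain loop with dict.fromkeys keeps the first occurrence of each name. This is a sketch, and `expand_patterns` is a hypothetical name:

```python
import glob


def expand_patterns(patterns):
    """Expand each glob pattern and drop duplicates, keeping first-seen order."""
    matches = []
    for pattern in patterns:
        matches.extend(sorted(glob.glob(pattern)))
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(matches))
```

In a script you would call it as expand_patterns(sys.argv[1:]).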

Python: Run os.system for matching files

I want to run one specific command for as often as there are matching files in my subdirs. Every file is named like this: sub-01_T1w, sub-02_T1w … . The command I’m trying to run looks like this: “bet -F -m”.
Edit: Every time I run the script, none of the wildcards are replaced. The file paths are correct, but the command that os.system runs always contains the literal sub-[0-9][0-9] instead of sub-01, sub-02, ... .
My first attempt looks like this:
import glob
import os
path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob('%s/sub-[0-9][0-9]'%(path))
for dir in subdirs:
    print dir
    glob.glob(os.system("bet %s/anat/sub-[0-9][0-9]_T1w %s/anat/sub-[0-9][0-9]_T1w_brain -F -m"%(dir,dir)))
You probably misunderstood how glob.glob works. It computes a list of file paths that match the pattern you give as its argument.
You should not pass to glob.glob the result of os.system, this is probably not what you want to do.
Try to solve your problem with something like this:
import glob
import os
import subprocess
path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob(os.path.join(path, 'sub-[0-9][0-9]'))
for dir in subdirs:
    print(dir)
    # run bet once per matching anatomical file in this subject directory
    for file in glob.glob(os.path.join(dir, 'anat/sub-[0-9][0-9]_T1w')):
        subprocess.call(['bet', file, file + '_brain', '-F', '-m'])
Bonus: the %s formatting was replaced with os.path.join where it builds paths, and the command is passed to subprocess.call as a list of arguments. It's a question of style; do as you prefer.
Edit: replaced os.system with subprocess.call, as suggested by STD
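To check what would be run before calling bet, it can help to build the argument lists first and only print them. The sketch below assumes the files are named exactly as in the question (sub-01_T1w and so on, inside an anat folder); `build_bet_commands` is a made-up helper and it does not execute anything:

```python
import glob
import os


def build_bet_commands(root):
    """Return one argument list per matching T1w file under `root` (nothing is run)."""
    pattern = os.path.join(
        glob.escape(root), "sub-[0-9][0-9]", "anat", "sub-[0-9][0-9]_T1w"
    )
    commands = []
    for infile in sorted(glob.glob(pattern)):
        # Same flags as in the question: -F and -m.
        commands.append(["bet", infile, infile + "_brain", "-F", "-m"])
    return commands
```

Once the printed commands look right, each list can be passed to subprocess.check_call.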

Bash command doesn't run properly in Python

I have the following files in a directory
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Using ls Co-sqp* filters so that the output is
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
However, in a python script, I used
cmd = ["ls", self.prefix+".pdos_atm*wfc*"]
output = subprocess.Popen(cmd,stdout=subprocess.PIPE,shell=True).communicate()[0]
print(output)
return output.splitlines()
and the output contains both files
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
What am I doing wrong in the python code that causes the script to not filter the ls output correctly?
To expand on IanAuld's comment, I would like to offer you two solutions that solve your problem without relying on calling a subprocess. Using subprocesses is kind of clunky and for finding files, python offers several powerful and more pythonic options.
I created a folder named files containing two files with the names you described and a python script in the parent folder:
find_files.py
files/
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Now there are several options to do this:
import glob
import os
# a glob solution
# you can use wildcards etc like in the command line
print(glob.glob("files/Co-sq*"))
# an os solution
# you can walk through all files and only print/ keep those that start with your desired string
for root, dirs, files in os.walk("files/"):
    for file in files:
        if file.startswith("Co-sq"):
            print(file)
I would prefer the glob solution for finding files, because it is quite flexible and easy to use. As the name suggests, glob allows you to use glob patterns as you know them from the command line.
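Your command isn't bash, it's ls, and the pattern self.prefix+".pdos_atm*wfc*" is never expanded. With shell=True and a list, on POSIX only the first element ("ls") is used as the command string; the remaining elements become arguments to the shell itself, not to ls, so ls runs with no arguments and lists the whole directory. Without shell=True there would be no shell at all, and ls would get the pattern as a literal argument, as if you'd entered (in a shell)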
ls 'Co-sqp-C70.pdos_atm*wfc*'
You have at least these options:
Invoke a shell to expand the pattern:
cmd = [ "sh", "-c", "ls " + self.prefix + ".pdos_atm*wfc*"]
Equivalently, with shell=True:
cmd = "ls " + self.prefix + ".pdos_atm*wfc*"
This is risky if self.prefix isn't guaranteed to be shell-clean.
Expand the glob, and pass the results to ls:
import glob
cmd = ["ls"] + glob.glob(self.prefix+".pdos_atm*wfc*")
You're still using ls, which is not intended to give parseable output. Don't do any more than simply passing the output to a user (e.g. in a log file).
Expand the glob, and process the results in python:
import glob
for file in glob.glob(self.prefix+".pdos_atm*wfc*"):
    some_function(file)
You should do this if you want to examine the file entries in any way.
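One caveat for the glob-based options: if the prefix itself can contain *, ? or [, glob treats those characters as part of the pattern. glob.escape() protects the literal part. A minimal sketch, where `find_pdos_files` and its `directory` parameter are assumptions added for illustration:

```python
import glob
import os


def find_pdos_files(directory, prefix):
    """List files named <prefix>.pdos_atm*wfc* in `directory`."""
    # glob.escape() neutralises *, ? and [ in the parts that are meant literally.
    pattern = os.path.join(
        glob.escape(directory), glob.escape(prefix) + ".pdos_atm*wfc*"
    )
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```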

glob files to use as input for a python script from a python script.

Instead of
cat $$dir/part* | ./testgen.py
I would like to glob the files and then use them as stdin for ./testgen.py from inside my Python script. How would I do this?
You could let the shell do it for you:
./testgen.py $$dir/part*
This passes every matching filename as a separate argument to your program. Then, you just read the filenames from sys.argv[1:].
A combination of glob and fileinput could be used:
from glob import glob
import fileinput
for line in fileinput.input(glob('dir/part*')):
    print(line, end='')  # or whatever
Although if you get the shell to expand it, you can just use fileinput.input() and it will take input files from sys.argv[1:].
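One thing to watch with this combination: when the glob matches nothing, fileinput receives an empty list and falls back to reading standard input, which can look like a hang. A small sketch that guards against it; `read_lines` is a hypothetical helper name:

```python
import fileinput
import glob


def read_lines(pattern):
    """Yield lines from every file matching `pattern`, in sorted filename order."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        # An empty list would make fileinput fall back to standard input.
        return
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield line.rstrip("\n")
```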
nneonneo is correct about shell expansion, but it won't work on Windows. Here's a simple, totally bulletproof cross-platform version:
import sys
from glob import glob
def argGlob(args=None):
    if args is None:
        args = sys.argv
    return [subglob for arg in args for subglob in glob(arg)]
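argGlob returns the same list that Unix-style shell expansion would produce for arguments that match existing files, and a list that has already been expanded passes through unchanged. An argument that matches nothing is dropped, whereas bash by default passes an unmatched pattern through literally.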
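Because glob returns an empty list when nothing matches, arguments that name files which do not exist yet (an output file, say) disappear from the result. Here is a sketch of a variant that keeps such arguments as they are, which is roughly what bash does by default; the name `expand_like_shell` is made up:

```python
import glob


def expand_like_shell(args):
    """Expand glob patterns; keep an argument as-is when nothing matches."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Fall back to the literal argument if the pattern matched no file.
        expanded.extend(matches if matches else [arg])
    return expanded
```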

Passing arguments with wildcards to a Python script

I want to do something like this:
c:\data\> python myscript.py *.csv
and pass all of the .csv files in the directory to my python script (such that sys.argv contains ["file1.csv", "file2.csv"], etc.)
But sys.argv just receives ["*.csv"] indicating that the wildcard was not expanded, so this doesn't work.
I feel like there is a simple way to do this, but can't find it on Google. Any ideas?
You can use the glob module, that way you won't depend on the behavior of a particular shell (well, you still depend on the shell not expanding the arguments, but at least you can get this to happen in Unix by escaping the wildcards :-) ).
from glob import glob
filelist = glob('*.csv') #You can pass the sys.argv argument
In Unix, the shell expands wildcards, so programs get the expanded list of filenames. Windows doesn't do this: the shell passes the wildcards directly to the program, which has to expand them itself.
Vinko is right: the glob module does the job:
import glob, sys
for arg in glob.glob(sys.argv[1]):
    print("Arg:", arg)
If your script is a utility, I suggest defining a function like this in your .bashrc so you can call it from any directory:
myscript() {
    python /path/myscript.py "$@"
}
Then the whole list is passed to your Python script, and you can process the files like this:
import sys
for _file in sys.argv[1:]:
    pass  # do something with _file
If you have multiple wildcard items passed in (e.g. python myscript.py *.csv *.txt), then glob(sys.argv[1]) may not cut it. You may need something like the following.
import sys
from glob import glob
args = [f for l in sys.argv[1:] for f in glob(l)]
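This will work even if some arguments don't have wildcard characters in them (python myscript.py abc.txt *.csv anotherfile.dat), as long as the named files exist; glob returns nothing for a name that matches no file.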
