Passing arguments with wildcards to a Python script - python

I want to do something like this:
c:\data\> python myscript.py *.csv
and pass all of the .csv files in the directory to my python script (such that sys.argv contains ["file1.csv", "file2.csv"], etc.)
But sys.argv just receives ["*.csv"] indicating that the wildcard was not expanded, so this doesn't work.
I feel like there is a simple way to do this, but can't find it on Google. Any ideas?

You can use the glob module; that way you won't depend on the behavior of a particular shell (well, you still depend on the shell not expanding the arguments, but at least you can make that happen in Unix by escaping the wildcards :-) ).
from glob import glob
filelist = glob('*.csv')  # you can pass the sys.argv pattern here instead of a literal

In Unix, the shell expands wildcards, so programs get the expanded list of filenames. Windows doesn't do this: the shell passes the wildcards directly to the program, which has to expand them itself.
Vinko is right: the glob module does the job:
import glob
import sys

for arg in glob.glob(sys.argv[1]):
    print("Arg:", arg)

If your script is a utility, I suggest defining a function like this in your .bashrc so you can call it from any directory:
myscript() {
    python /path/myscript.py "$@"
}
(Note "$@", not "$#": "$@" passes all arguments through, while $# is just the argument count.) Then the whole list is passed to your script and you can process the files like:
for _file in sys.argv[1:]:
    # do something with _file

If multiple wildcard patterns are passed in (e.g. python myscript.py *.csv *.txt), then glob(sys.argv[1]) may not cut it. You may need something like the following.
import sys
from glob import glob
args = [f for l in sys.argv[1:] for f in glob(l)]
This will work even if some arguments don't have wildcard characters in them (python myscript.py abc.txt *.csv anotherfile.dat).
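One behavior the comprehension above does not cover: an argument that matches nothing disappears entirely. If you would rather keep unmatched arguments as literal strings (the way some shells behave), a small variation can fall back to the original argument. This fallback choice is my assumption, not part of the original answer:

```python
import sys
from glob import glob

def expand_args(args):
    """Glob each argument; keep an argument as-is when nothing matches it."""
    expanded = []
    for arg in args:
        matches = sorted(glob(arg))
        expanded.extend(matches if matches else [arg])
    return expanded

files = expand_args(sys.argv[1:])
```

On Unix, where the shell has usually already expanded the wildcards, this is a harmless no-op for existing filenames.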

Related

Python: Run os.system for matching files

I want to run one specific command once for each matching file in my subdirectories. Every file is named like this: sub-01_T1w, sub-02_T1w … . The command I'm trying to run looks like this: "bet -F -m".
Edit: Every time I run the script, none of the wildcards are replaced. The file paths are correct, but the os command always gets sub-[0-9][0-9] instead of sub-01, sub-02, ... .
My first attempt looks like this:
import glob
import os

path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob('%s/sub-[0-9][0-9]' % (path))
for dir in subdirs:
    print(dir)
    glob.glob(os.system("bet %s/anat/sub-[0-9][0-9]_T1w %s/anat/sub-[0-9][0-9]_T1w_brain -F -m" % (dir, dir)))
You probably misunderstood how glob.glob works. It computes a list of file paths matching the pattern you give it as an argument.
You should not pass the result of os.system to glob.glob; that is probably not what you want to do.
Try to solve your problem with something like this:
import glob
import os
import subprocess

path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob(os.path.join(path, 'sub-[0-9][0-9]'))
for dir in subdirs:
    print(dir)
    for file in glob.glob(os.path.join(dir, 'anat/sub-[0-9][0-9]_T1w')):
        subprocess.call(['bet', file, '{}_brain'.format(file), '-f', '-m'])
Bonus: the %s formatting was replaced with os.path.join where needed. In addition, I used str.format in the last line since I find it clearer. It's a question of style; do as you prefer.
Edit: replaced os.system with subprocess.call, as suggested by STD.
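On Python 3.5+ the same loop can use subprocess.run, which raises on a non-zero exit code when check=True is passed, so a failed bet run no longer goes unnoticed. A sketch; bet and the directory layout are taken from the question, and nothing runs if the path does not exist:

```python
import glob
import os
import subprocess

path = '/home/nico/Seminar/demo_fmri/'  # path from the question
for d in sorted(glob.glob(os.path.join(path, 'sub-[0-9][0-9]'))):
    for t1w in glob.glob(os.path.join(d, 'anat', 'sub-[0-9][0-9]_T1w')):
        # check=True raises CalledProcessError if bet exits non-zero
        subprocess.run(['bet', t1w, t1w + '_brain', '-F', '-m'], check=True)
```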

Bash command doesn't run properly in Python

I have the following files in a directory
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Running ls Co-sqp* filters the listing so that the output is
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
However, in a python script, I used
cmd = ["ls", self.prefix+".pdos_atm*wfc*"]
output = subprocess.Popen(cmd,stdout=subprocess.PIPE,shell=True).communicate()[0]
print(output)
return output.splitlines()
and the output contains both files
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
What am I doing wrong in the python code that causes the script to not filter the ls output correctly?
To expand on IanAuld's comment, I would like to offer two solutions that solve your problem without relying on a subprocess. Calling subprocesses is kind of clunky, and for finding files Python offers several powerful, more Pythonic options.
I created a folder named files containing two files with the names you described and a python script in the parent folder:
find_files.py
files/
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Now there are several options to do this:
import glob
import os

# a glob solution
# you can use wildcards etc. like on the command line
print(glob.glob("files/Co-sq*"))

# an os solution
# walk through all files and only print/keep those that start with the desired string
for root, dirs, files in os.walk("files/"):
    for file in files:
        if file.startswith("Co-sq"):
            print(file)
I would prefer the glob solution for finding files, because it is quite flexible and easy to use. As the name suggests, glob allows you to use glob patterns as you know them from the command line.
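For completeness, pathlib (in the standard library since Python 3.4) offers a third option with the same glob syntax; the files/ directory name matches the example layout above:

```python
from pathlib import Path

# Path.glob accepts the same wildcard syntax as the glob module;
# it simply yields nothing if the directory does not exist
for p in Path("files").glob("Co-sq*"):
    print(p.name)
```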
Your command isn't bash, it's ls. There's no shell involved to expand your filename pattern self.prefix+".pdos_atm*wfc*", so ls gets that as a literal argument, as if you'd entered (in a shell)
ls 'Co-sqp-C70.pdos_atm*wfc*'
You have at least these options:
Invoke a shell to expand the pattern:
cmd = ["sh", "-c", "ls " + self.prefix + ".pdos_atm*wfc*"]
Equivalently:
cmd = "ls " + self.prefix + ".pdos_atm*wfc*"
This is risky if self.prefix isn't guaranteed to be shell-clean.
Expand the glob, and pass the results to ls:
import glob
cmd = ["ls"] + glob.glob(self.prefix+".pdos_atm*wfc*")
You're still using ls, which is not intended to give parseable output. Don't do any more than simply passing the output to a user (e.g. in a log file).
Expand the glob, and process the results in python:
import glob

for file in glob.glob(self.prefix + ".pdos_atm*wfc*"):
    some_function(file)
You should do this if you want to examine the file entries in any way.
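A side note: the filenames in this question literally contain `*` characters. If you ever need to match such a name exactly, rather than have glob treat it as a pattern, glob.escape (Python 3.4+) neutralizes the wildcards. The sample name below is taken from the question:

```python
import glob

literal = "Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*"
pattern = glob.escape(literal)  # each * becomes [*], so it matches only itself
print(pattern)
```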

apply command to list of files in python

I've a tricky problem. I need to apply a specific command called xRITDecompress to a list of files with extension -C_ and I should do this with Python.
Unfortunately, this command doesn't work with wildcards and I can't do something like:
os.system("xRITDecompress *-C_")
In principle, I could write an auxiliary bash script with a for cycle and call it inside my python program. However, I'd like not to rely on auxiliary files...
What would be the best way to do this within a python program?
You can use glob.glob() to get the list of files on which you want to run the command and then for each file in that list, run the command -
import glob
import os

for f in glob.glob('*-C_'):
    os.system('xRITDecompress {}'.format(f))
From documentation -
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell.
If by _ (underscore) you wanted to match a single character, you should use ? instead, like:
glob.glob('*-C?')
Please note that glob only searches the current directory, but judging from your original attempt, that may be exactly what you want.
You may also want to look at the subprocess module; it is a more powerful module for running commands (spawning processes). Example:
import subprocess
import glob

for f in glob.glob('*-C_'):
    subprocess.call(['xRITDecompress', f])
You can use glob.glob or glob.iglob to get files that match the given pattern:
import glob
import os

files = glob.iglob('*-C_')
for f in files:
    os.system("xRITDecompress %s" % f)
Just use glob.glob to search and os.system to execute
import os
from glob import glob

for file in glob('*-C_'):
    os.system("xRITDecompress %s" % file)
I hope it satisfies your question
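If you also want to notice when the tool fails, subprocess.run with check=True (Python 3.5+) raises on a non-zero exit code. A sketch; the helper name and the swappable tool parameter are mine, while xRITDecompress is the command from the question:

```python
import glob
import subprocess

def run_on_matches(pattern, tool='xRITDecompress'):
    """Run `tool` once per file matching `pattern`; return the files processed."""
    files = sorted(glob.glob(pattern))
    for f in files:
        subprocess.run([tool, f], check=True)  # raises CalledProcessError on failure
    return files
```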

passing wildcard arguments from bash into python

I'm trying to practice with Python scripting by writing a simple script that takes a large series of files named A_B and writes each one to the location B/A. The way I was passing the arguments to the script was
python script.py *
and my program looks like
from sys import argv
import os
import ntpath
import shutil

script, filename = argv
target = open(filename)
outfilename = target.name.split('_')
outpath = outfilename[1]
outpath += "/"
outpath += outfilename[0]
if not os.path.exists(outfilename[1]):
    os.makedirs(outfilename[1])
shutil.copyfile(target.name, outpath)
target.close()
The problem is that the script, as currently written, only accepts one file at a time. Originally I was hoping the wildcard would pass one file at a time to the script and execute the script for each.
My question covers both cases:
How could I pass the wildcard-matched files one at a time to the script?
and
How do I modify this script to accept all the arguments instead? (I can handle list-ifying everything, but argv is what I'm having problems with, and I'm a bit unsure about how to create a list of files.)
You have two options, both of which involve a loop.
To pass the files one by one, use a shell loop:
for file in *; do python script.py "$file"; done
This will invoke your script once for every file matching the glob *.
To process multiple files in your script, use a loop there instead:
from sys import argv

for filename in argv[1:]:
    # rest of script
Then call your script from bash like python script.py * to pass all the files as arguments. argv[1:] is an array slice, which returns a list containing the elements from argv starting from position 1 to the end of the array.
I would suggest the latter approach as it means that you are only invoking one instance of your script.
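Putting the second option together with the original script, the copy logic can be reworked into a helper that is called once per argument. A sketch; the A_B → B/A convention is from the question, and the function name is mine:

```python
import os
import shutil

def copy_to_split_dir(filename):
    """Copy a file named A_B to B/A, creating directory B if needed."""
    a, b = os.path.basename(filename).split('_', 1)
    os.makedirs(b, exist_ok=True)
    shutil.copyfile(filename, os.path.join(b, a))

# in the script itself:
#     for filename in sys.argv[1:]:
#         copy_to_split_dir(filename)
```

os.makedirs with exist_ok=True (Python 3.2+) also removes the need for the os.path.exists check in the original.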

glob files to use as input for a python script from a python script.

Instead of
cat $$dir/part* | ./testgen.py
I would like to glob the files and then feed them to ./testgen.py on stdin, all from inside my Python script. How would I do this?
You could let the shell do it for you:
./testgen.py $$dir/part*
This passes every matching filename as a separate argument to your program. Then, you just read the filenames from sys.argv[1:].
A combination of glob and fileinput could be used:
from glob import glob
import fileinput

for line in fileinput.input(glob('dir/part*')):
    print(line)  # or whatever
Although if you let the shell expand the pattern, you can just use fileinput.input() and it will take input files from sys.argv[1:].
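A runnable version of the glob + fileinput combination above, wrapped in a generator. The dir/part* layout is from the question; the empty-match guard is my addition, since fileinput.input with an empty file list falls back to reading stdin:

```python
import fileinput
from glob import glob

def cat_matching(pattern):
    """Yield lines from every file matching `pattern`, like `cat pattern`."""
    files = sorted(glob(pattern))
    if not files:  # avoid fileinput falling back to stdin on an empty list
        return
    for line in fileinput.input(files):
        yield line
```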
nneonneo is correct about shell expansion, but it won't work on Windows. Here's a simple, totally bulletproof cross-platform version:
import sys
from glob import glob

def argGlob(args=None):
    if args is None:
        args = sys.argv
    return [subglob for arg in args for subglob in glob(arg)]
argGlob will return the exact same thing as Unix-style shell expansion, and it also won't screw up any list of args that's already been expanded.
