I have the following files in a directory
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Using ls Co-sqp* filters so that the output is
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
However, in a python script, I used
cmd = ["ls", self.prefix+".pdos_atm*wfc*"]
output = subprocess.Popen(cmd,stdout=subprocess.PIPE,shell=True).communicate()[0]
print(output)
return output.splitlines()
and the output contains both files
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
What am I doing wrong in the python code that causes the script to not filter the ls output correctly?
To expand on IanAuld's comment, I would like to offer you two solutions that solve your problem without relying on calling a subprocess. Using subprocesses is kind of clunky and for finding files, python offers several powerful and more pythonic options.
I created a folder named files containing two files with the names you described and a python script in the parent folder:
find_files.py
files/
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Now there are several options to do this:
import glob
import os
# a glob solution
# you can use wildcards etc like in the command line
print(glob.glob("files/Co-sq*"))
# an os solution
# you can walk through all files and only print/ keep those that start with your desired string
for root, dirs, files in os.walk("files/"):
for file in files:
if file.startswith("Co-sq"):
print(file)
I would prefer the glob solution for finding files, because it is quite flexible and easy to use. As the name suggests, glob allows you to use glob patterns as you know them from the command line.
Your command isn't bash it's ls. There's no shell involved to expand your filename pattern self.prefix+".pdos_atm*wfc*", so ls gets that as a literal argument, as if you'd entered (in a shell)
ls 'Co-sqp-C70.pdos_atm*wfc*'
You have at least these options:
Invoke a shell to expand the pattern:
cmd = [ "sh", "-c", "ls " + self.prefix + ".pdos_atm*wfc*"]
Equivalently:
cmd = "ls " + self.prefix + ".pdos_atm*wfc*"
This is risky if self.prefix isn't guaranteed to be shell-clean.
Expand the glob, and pass the results to ls:
import glob
cmd = ["ls"] + glob.glob(self.prefix+".pdos_atm*wfc*")
You're still using ls, which is not intended to give parseable output. Don't do any more than simply passing the output to a user (e.g. in a log file).
Expand the glob, and process the results in python:
import glob
for file in glob.glob(self.prefix+".pdos_atm*wfc*"):
some_function(file)
You should do this if you want to examine the file entries in any way.
Related
I am trying to automate a particular process using subprocess module in python. For example, if I have a set of files that start with a word plot and then 8 digits of numbers. I want to copy them using the subprocess run command.
copyfiles = subprocess.run(['cp', '-r', 'plot*', 'dest'])
When I run the code, the above code returns an error "cp: plot: No such file or directory*"
How can I execute such commands using subprocess module? If I give a full filename, the above code works without any errors.
I have found a useful but probably not the best efficent code fragment from this post, where an additional python library is used (shlex), and what I propose is to use os.listdir method to iterate over folder that you need to copy files, save in a list file_list and filter using a lambda function to extract specific file names, define a command sentence as string and use subproccess.Popen() to execute the child process to copy the files on to a destination folder.
import shlex
import os
import subprocess
# chage directory where your files are located at
os.chdir('C:/example_folder/')
# you can use os.getcwd function to check the current working directory
print(os.getcwd)
# extract file names
file_list = os.listdir()
file_list = list(filter(lambda file_name: file_name.startswith('plot'), file_list))
# command sentence
cmd = 'find test_folder -iname %s -exec cp {} dest_folder ;'
for file in file_list:
subprocess.Popen(shlex.split(cmd % file))
I want to run one specific command for as often as there are matching files in my subdirs. Every file is named like this: sub-01_T1w, sub-02_T1w … . The command I’m trying to run looks like this: “bet -F -m”.
Edit My Question: Every time I run the script none of the wildcards are replaced. The file paths are correct, but the os command is every time sub-[0-9][0-9] instead of: sub-01, sub-02, ... .
My first attempt looks like this:
import glob
import os
path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob('%s/sub-[0-9][0-9]'%(path))
for dir in subdirs:
print dir
glob.glob(os.system("bet %s/anat/sub-[0-9][0-9]_T1w %s/anat/sub-[0-9][0-9]_T1w_brain -F -m"%(dir,dir)))
You probably misunderstood how glob.glob works. It compute a list of file paths depending on the pattern you gave as argument.
You should not pass to glob.glob the result of os.system, this is probably not what you want to do.
Try to solve your problem with something like this:
import glob
import os
import subprocess
path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob(os.path.join(path, 'sub-[0-9][0-9]'))
for dir in subdirs:
print dir
for file in glob.glob(os.path.join(dir, 'anat/sub-[0-9][0-9]_T1w')):
subprocess.call(['bet', file, file+'_brain', '-f', '-m'])
Bonus: %s were removed in favor of os.path.join when needed. In addition, I used str.format in last line since I find it clearer. It's a question of style, do as you prefer
Edit: replaced os.system by subproces.call, as suggested by STD
I want to remove all the *.ts in file. But os.remove didn't work.
>>> args = ['rm', '*.ts']
>>> p = subprocess.call(args)
rm: *.ts No such file or directory
The rm program takes a list of filenames, but *.ts isn't a list of filenames, it's a pattern for matching filenames. You have to name the actual files for rm. When you use a shell, the shell (but not rm!) will expand patterns like *.ts for you. In Python, you have to explicitly ask for it.
import glob
import subprocess
subprocess.check_call(['rm', '--'] + glob.glob('*.ts'))
# ^^^^ this makes things much safer, by the way
Of course, why bother with subprocess?
import glob
import os
for path in glob.glob('*.ts'):
os.remove(path)
I've a tricky problem. I need to apply a specific command called xRITDecompress to a list of files with extension -C_ and I should do this with Python.
Unfortunately, this command doesn't work with wildcards and I can't do something like:
os.system("xRITDecompress *-C_")
In principle, I could write an auxiliary bash script with a for cycle and call it inside my python program. However, I'd like not to rely on auxiliary files...
What would be the best way to do this within a python program?
You can use glob.glob() to get the list of files on which you want to run the command and then for each file in that list, run the command -
import glob
for f in glob.glob('*-C_'):
os.system('xRITDecompress {}'.format(f))
From documentation -
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell.
If by _ (underscore) , you wanted to match a single character , you should use - ? instead , like -
glob.glob('*-C?')
Please note, glob would only search in current directory but according to what you wanted with the original trial, seems like that maybe what you want.
You may also, want to look at subprocess module, it is a more powerful module for running commands (spawning processes). Example -
import subprocess
import glob
for f in glob.glob('*-C_'):
subprocess.call(['xRITDecompress',f])
You can use glob.glob or glob.iglob to get files that match the given pattern:
import glob
files = glob.iglob('*-C_')
for f in files:
os.system("xRITDecompress %s" % f)
Just use glob.glob to search and os.system to execute
import os
from glob import glob
for file in glob('*-C_'):
os.system("xRITDecompress %s" % file)
I hope it satisfies your question
I want to do something like this:
c:\data\> python myscript.py *.csv
and pass all of the .csv files in the directory to my python script (such that sys.argv contains ["file1.csv", "file2.csv"], etc.)
But sys.argv just receives ["*.csv"] indicating that the wildcard was not expanded, so this doesn't work.
I feel like there is a simple way to do this, but can't find it on Google. Any ideas?
You can use the glob module, that way you won't depend on the behavior of a particular shell (well, you still depend on the shell not expanding the arguments, but at least you can get this to happen in Unix by escaping the wildcards :-) ).
from glob import glob
filelist = glob('*.csv') #You can pass the sys.argv argument
In Unix, the shell expands wildcards, so programs get the expanded list of filenames. Windows doesn't do this: the shell passes the wildcards directly to the program, which has to expand them itself.
Vinko is right: the glob module does the job:
import glob, sys
for arg in glob.glob(sys.argv[1]):
print "Arg:", arg
If your script is a utility, I suggest you to define a function like this in your .bashrc to call it in a directory:
myscript() {
python /path/myscript.py "$#"
}
Then the whole list is passed to your python and you can process them like:
for _file in sys.argv[1:]:
# do something on file
If you have multiple wildcard items passed in (for eg: python myscript.py *.csv *.txt) then, glob(sys.argv[1] may not cut it. You may need something like below.
import sys
from glob import glob
args = [f for l in sys.argv[1:] for f in glob(l)]
This will work even if some arguments dont have wildcard characters in them. (python abc.txt *.csv anotherfile.dat)