I've a tricky problem. I need to apply a specific command called xRITDecompress to a list of files with extension -C_ and I should do this with Python.
Unfortunately, this command doesn't work with wildcards and I can't do something like:
os.system("xRITDecompress *-C_")
In principle, I could write an auxiliary bash script with a for cycle and call it inside my python program. However, I'd like not to rely on auxiliary files...
What would be the best way to do this within a python program?
You can use glob.glob() to get the list of files on which you want to run the command and then for each file in that list, run the command -
import glob
import os
for f in glob.glob('*-C_'):
    os.system('xRITDecompress {}'.format(f))
From documentation -
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell.
If by _ (underscore) you meant to match any single character, you should use ? instead, like this -
glob.glob('*-C?')
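To see the difference between the two patterns without touching the filesystem, here is a small sketch using fnmatch.fnmatchcase, which understands the same wildcard syntax as glob. The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, for illustration only.
names = ["H-000-MSG3-C_", "H-000-MSG3-C1", "H-000-MSG3-CX", "notes.txt"]

# '_' is an ordinary character: only names ending in the literal "-C_" match.
literal_underscore = [n for n in names if fnmatchcase(n, "*-C_")]

# '?' matches exactly one arbitrary character after "-C".
any_single_char = [n for n in names if fnmatchcase(n, "*-C?")]

print(literal_underscore)
print(any_single_char)
```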
Please note that with a relative pattern like this, glob only searches the current working directory. Judging by your original attempt, that seems to be what you want.
You may also want to look at the subprocess module, which is a more powerful way of running commands (spawning processes). Example -
import subprocess
import glob
for f in glob.glob('*-C_'):
    subprocess.call(['xRITDecompress', f])
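The same loop can be wrapped in a small helper that also reports each exit status. This is only a sketch: run_on_matches is a hypothetical name, and it assumes the command takes the file path as its last argument, as xRITDecompress does in the question.

```python
import glob
import os
import subprocess

def run_on_matches(command, pattern, directory="."):
    """Run `command + [path]` once per file in `directory` matching `pattern`.

    Returns a list of (path, exit_status) tuples.
    """
    # glob.escape protects wildcard characters that happen to be in the directory name.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    results = []
    for path in sorted(glob.glob(full_pattern)):
        # Passing a list avoids the shell, so odd file names need no quoting.
        status = subprocess.call(list(command) + [path])
        results.append((path, status))
    return results

# Hypothetical usage, assuming xRITDecompress is on the PATH:
# run_on_matches(["xRITDecompress"], "*-C_")
```

A non-zero status in the returned list points at the files for which the command failed.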
You can use glob.glob or glob.iglob to get files that match the given pattern:
import glob
import os
files = glob.iglob('*-C_')
for f in files:
    os.system("xRITDecompress %s" % f)
Just use glob.glob to search and os.system to execute
import os
from glob import glob
for file in glob('*-C_'):
    os.system("xRITDecompress %s" % file)
I hope it satisfies your question
Related
I want to run one specific command for as often as there are matching files in my subdirs. Every file is named like this: sub-01_T1w, sub-02_T1w … . The command I’m trying to run looks like this: “bet -F -m”.
Edit My Question: Every time I run the script none of the wildcards are replaced. The file paths are correct, but the os command is every time sub-[0-9][0-9] instead of: sub-01, sub-02, ... .
My first attempt looks like this:
import glob
import os
path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob('%s/sub-[0-9][0-9]'%(path))
for dir in subdirs:
    print dir
    glob.glob(os.system("bet %s/anat/sub-[0-9][0-9]_T1w %s/anat/sub-[0-9][0-9]_T1w_brain -F -m"%(dir,dir)))
You probably misunderstood how glob.glob works. It computes a list of file paths matching the pattern you give it as an argument.
You should not pass the result of os.system to glob.glob: os.system returns an exit status, not a pattern, so this is probably not what you want to do.
Try to solve your problem with something like this:
import glob
import os
import subprocess
path = '/home/nico/Seminar/demo_fmri/'
subdirs = glob.glob(os.path.join(path, 'sub-[0-9][0-9]'))
for dir in subdirs:
    print(dir)
    for file in glob.glob(os.path.join(dir, 'anat/sub-[0-9][0-9]_T1w')):
        # -F and -m are the flags from the question's original command
        subprocess.call(['bet', file, file+'_brain', '-F', '-m'])
Bonus: the %s formatting was replaced by os.path.join where paths are built. It's a question of style, do as you prefer.
Edit: replaced os.system by subprocess.call, as suggested by STD
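To check what would be run before actually calling bet, the nested glob can first be used to build the argument lists. This is a sketch only: build_bet_commands is a hypothetical helper, and it assumes the directory layout and the -F -m flags from the question.

```python
import glob
import os

def build_bet_commands(path):
    """Return one argument list per matching T1w file under `path`.

    Nothing is executed here; each list can be passed to subprocess.call.
    """
    commands = []
    subject_dirs = sorted(glob.glob(os.path.join(path, "sub-[0-9][0-9]")))
    for subject_dir in subject_dirs:
        pattern = os.path.join(subject_dir, "anat", "sub-[0-9][0-9]_T1w")
        for t1w in sorted(glob.glob(pattern)):
            commands.append(["bet", t1w, t1w + "_brain", "-F", "-m"])
    return commands

# Example: print the commands instead of running them.
for command in build_bet_commands("/home/nico/Seminar/demo_fmri/"):
    print(" ".join(command))
```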
I have the following files in a directory
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
Using ls Co-sqp* filters so that the output is
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
However, in a python script, I used
cmd = ["ls", self.prefix+".pdos_atm*wfc*"]
output = subprocess.Popen(cmd,stdout=subprocess.PIPE,shell=True).communicate()[0]
print(output)
return output.splitlines()
and the output contains both files
Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
Copilot
What am I doing wrong in the python code that causes the script to not filter the ls output correctly?
To expand on IanAuld's comment, I would like to offer you two solutions that solve your problem without relying on calling a subprocess. Using subprocesses is kind of clunky and for finding files, python offers several powerful and more pythonic options.
I created a folder named files containing two files with the names you described and a python script in the parent folder:
find_files.py
files/
    Co-sqp-C70.pdos_atm*5*C*_wfc*2*p*
    Copilot
Now there are several options to do this:
import glob
import os
# a glob solution
# you can use wildcards etc like in the command line
print(glob.glob("files/Co-sq*"))
# an os solution
# you can walk through all files and only print/ keep those that start with your desired string
for root, dirs, files in os.walk("files/"):
    for file in files:
        if file.startswith("Co-sq"):
            print(file)
I would prefer the glob solution for finding files, because it is quite flexible and easy to use. As the name suggests, glob allows you to use glob patterns as you know them from the command line.
Your command list isn't being used the way you expect. With shell=True and a list, only the first element ("ls") is taken as the shell command; the remaining elements become arguments to the shell itself, not to ls. The pattern self.prefix+".pdos_atm*wfc*" therefore never reaches ls, as if you'd entered (in a shell)
ls
You have at least these options:
Invoke a shell explicitly to expand the pattern (and drop shell=True):
cmd = [ "sh", "-c", "ls " + self.prefix + ".pdos_atm*wfc*"]
Equivalently, keep shell=True and pass a single string:
cmd = "ls " + self.prefix + ".pdos_atm*wfc*"
This is risky if self.prefix isn't guaranteed to be shell-clean.
Expand the glob, and pass the results to ls:
import glob
cmd = ["ls"] + glob.glob(self.prefix+".pdos_atm*wfc*")
You're still using ls, which is not intended to give parseable output. Don't do any more than simply passing the output to a user (e.g. in a log file).
Expand the glob, and process the results in python:
import glob
for file in glob.glob(self.prefix+".pdos_atm*wfc*"):
    some_function(file)
You should do this if you want to examine the file entries in any way.
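The first and third options can be sketched as follows, with the prefix protected in both cases. The helper names are hypothetical; shlex.quote and glob.escape are standard-library functions, and the pattern suffix is the one from the question.

```python
import glob
import os
import shlex

PATTERN_SUFFIX = ".pdos_atm*wfc*"

def shell_ls_command(prefix):
    """Build a shell command string in which only the suffix is a wildcard."""
    # shlex.quote makes the prefix shell-safe; the suffix stays unquoted
    # so that the shell still expands it.
    return "ls " + shlex.quote(prefix) + PATTERN_SUFFIX

def matching_files(prefix, directory="."):
    """Expand the same pattern in Python, without a shell or ls."""
    # glob.escape keeps wildcard characters in the prefix from being interpreted.
    pattern = os.path.join(glob.escape(directory),
                           glob.escape(prefix) + PATTERN_SUFFIX)
    return sorted(glob.glob(pattern))
```

The string from shell_ls_command is meant to be run with shell=True; matching_files returns paths you can process directly in Python.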
First and foremost, I am new to Unix, and I have tried to find a solution to my question online without success.
So I am running Python through my Unix terminal, and I have a program that parses xml files and inputs the results into a .dat file.
My program works, but I have to input every single xml file (which number over 50) individually.
For example:
clamshell: python3 my_parser2.py 'items-0.xml' 'items-1.xml' 'items-2.xml' 'items-3.xml' .....
So I was wondering if it is possible to read from the directory, which contains all of my files into my program? Rather than typing all the xml file names individually and running the program that way.
Any help on this is greatly appreciated.
import glob
listOffiles = glob.glob('directory/*.xml')
The shell itself can expand wildcards so, if you don't care about the order of the input files, just use:
python3 my_parser2.py items-*.xml
If the numeric order is important (you want 0..9, 10-99 and so on, in that order), you may have to adjust the wildcard arguments slightly to guarantee this, such as with:
python3 my_parser2.py items-[0-9].xml items-[1-9][0-9].xml items-[1-9][0-9][0-9].xml
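If the script receives the names in arbitrary order, it can also sort them numerically itself. A minimal sketch, assuming each file name contains a single number as in the items-N.xml scheme; numeric_key is a hypothetical helper.

```python
import os
import re

def numeric_key(path):
    """Sort key: the first run of digits in the file name, as an integer."""
    name = os.path.basename(path)
    match = re.search(r"\d+", name)
    # Names without digits sort first; ties fall back to the name itself.
    return (int(match.group()) if match else -1, name)

# Hypothetical names following the items-N.xml scheme from the question.
names = ["items-10.xml", "items-2.xml", "items-0.xml", "items-100.xml"]
ordered = sorted(names, key=numeric_key)
print(ordered)
```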
python3 my_parser2.py *.xml should work.
Other than the command line option, you could just use glob from within your script and bypass the need for command arguments:
import glob
filenames = glob.glob("*.xml")
This will return all .xml files (as filenames) in the directory from which you are running the script.
Then, if needed you can simply iterate through all the files with a basic loop:
for file in filenames:
    with open(file, 'r') as f:
        # do stuff to f.
        pass
Using python 2.7
I have a list of *.tar.gz files on a linux box. Using python, I want to loop through the files and extract those files in a different location, under their respective folders.
For example: if my file name is ~/TargetData/zip/1440198002317590001.tar.gz
then I want to untar and ungzip this file in a different location under its
respective folder name i.e. ~/TargetData/unzip/1440198002317590001.
I have written some code but I am not able to loop through the files. In a command line I am able to untar using $ tar -czf 1440198002317590001.tar.gz 1440198002317590001 command. But I want to be able to loop through the .tar.gz files. The code is mentioned below. Here, I’m not able to loop just the files Or print only the files. Can you please help?
import os
inF = []
inF = str(os.system('ls ~/TargetData/zip/*.tar.gz'))
#print(inF)
if inF is not None:
    for files in inF[:-1]:
        print files
        """
        os.system('tar -czf files /unzip/files[:-7]')
        # This is what i am expecting here files = "1440198002317590001.tar.gz" and files[:-7]= "1440198002317590001"
        """
Have you ever worked on this type of use case? Your help is greatly appreciated!! Thank you!
I think you misunderstood what os.system() does. It runs the command, but its return value is not what you expected: it returns 0 on success, so you cannot assign the command's output to a variable this way. You could use the subprocess module to capture the output instead. However, I do NOT recommend listing files that way (it gives you a single string rather than a list; see the documentation for the details).
The best way, I think, is the glob module. glob.glob(pattern) returns a list of all files matching the pattern, which you can then loop over easily.
Of course, if you are familiar with the os module, you can also use os.listdir(), os.path.join(), and os.path.expanduser() to do this. (Unlike glob, os.listdir() returns bare filenames without the directory, so you need to rebuild the full path yourself.)
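The os-based approach could look like the sketch below. list_tarballs is a hypothetical helper name; it only looks at files directly inside the given directory.

```python
import os

def list_tarballs(directory):
    """Return full paths of the .tar.gz files directly inside `directory`."""
    directory = os.path.expanduser(directory)  # turns '~' into the home directory
    paths = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".tar.gz"):
            # os.listdir returns bare names, so rebuild the full path.
            paths.append(os.path.join(directory, name))
    return paths

# Hypothetical usage with the directory from the question:
# for path in list_tarballs('~/TargetData/zip'):
#     print(path)
```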
By the way, for your purpose here, there is no need to declare an empty list first (i.e. inF = []).
For the extraction part, you can use os.system, but I recommend the subprocess module instead; the subprocess documentation explains why.
Do NOT read the following code yet; look at it only once you really cannot solve this by yourself.
import os
import glob
import subprocess
# glob does not expand '~', so expand it explicitly first
inF = glob.glob(os.path.expanduser('~/TargetData/zip/*.tar.gz'))
for files in inF:
    # .../zip/NAME.tar.gz -> .../unzip/NAME
    unzip_name = files.replace('/zip/', '/unzip/')[:-7]
    # tar -C needs the target directory to exist, so create it if missing
    if not os.path.exists(unzip_name):
        os.makedirs(unzip_name)
    # each argument must be its own list element; -x extracts (-c would create)
    subprocess.call(['tar', '-xzf', files, '-C', unzip_name])
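As an alternative to calling the tar command, the standard-library tarfile module can do the extraction inside Python. This is a sketch under the directory layout from the question; extract_all is a hypothetical helper, and it should only be used on archives you trust.

```python
import glob
import os
import tarfile

def extract_all(zip_dir, unzip_dir):
    """Extract every NAME.tar.gz in `zip_dir` into `unzip_dir`/NAME."""
    zip_dir = os.path.expanduser(zip_dir)
    unzip_dir = os.path.expanduser(unzip_dir)
    targets = []
    for archive in sorted(glob.glob(os.path.join(zip_dir, "*.tar.gz"))):
        # Strip the ".tar.gz" suffix to get the folder name.
        name = os.path.basename(archive)[:-len(".tar.gz")]
        target = os.path.join(unzip_dir, name)
        if not os.path.isdir(target):
            os.makedirs(target)
        # "r:gz" opens a gzip-compressed tar archive for reading.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        targets.append(target)
    return targets

# Hypothetical usage:
# extract_all('~/TargetData/zip', '~/TargetData/unzip')
```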
I want to do something like this:
c:\data\> python myscript.py *.csv
and pass all of the .csv files in the directory to my python script (such that sys.argv contains ["file1.csv", "file2.csv"], etc.)
But sys.argv just receives ["*.csv"] indicating that the wildcard was not expanded, so this doesn't work.
I feel like there is a simple way to do this, but can't find it on Google. Any ideas?
You can use the glob module, that way you won't depend on the behavior of a particular shell (well, you still depend on the shell not expanding the arguments, but at least you can get this to happen in Unix by escaping the wildcards :-) ).
from glob import glob
filelist = glob('*.csv') #You can pass the sys.argv argument
In Unix, the shell expands wildcards, so programs get the expanded list of filenames. Windows doesn't do this: the shell passes the wildcards directly to the program, which has to expand them itself.
Vinko is right: the glob module does the job:
import glob, sys
for arg in glob.glob(sys.argv[1]):
    print("Arg: " + arg)
If your script is a utility, I suggest defining a function like this in your .bashrc so you can call it from any directory:
myscript() {
    python /path/myscript.py "$@"
}
Then the whole list is passed to your python and you can process them like:
for _file in sys.argv[1:]:
    # do something on file
    pass
If you have multiple wildcard items passed in (e.g. python myscript.py *.csv *.txt), then glob(sys.argv[1]) may not cut it. You may need something like the following.
import sys
from glob import glob
args = [f for l in sys.argv[1:] for f in glob(l)]
This will work even if some arguments don't have wildcard characters in them, as long as those files exist, since glob only returns existing paths (e.g. python myscript.py abc.txt *.csv anotherfile.dat).
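If you would rather keep arguments that match nothing (for example, to report a missing file later), a variant could look like this. It is a sketch; expand_args is a hypothetical helper name.

```python
import glob

def expand_args(args):
    """Expand wildcard arguments; keep arguments that match nothing as they are."""
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        if matches:
            expanded.extend(matches)
        else:
            # No match: pass the argument through so the caller can report it.
            expanded.append(arg)
    return expanded

# Typical use inside a script (after `import sys`):
# files = expand_args(sys.argv[1:])
```

This behaves the same whether or not the shell has already expanded the wildcards, so the script works on both Unix and Windows.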