Handling wildcard expansion with python subprocess call - python

I'm calling this function and using %s*silent to read files that have names with the following format: name.number.silent.
I get the name from start_model.split('/')[-1].split('.')[0] so don't worry about it.
This is obviously not working because these commands are actually never passed to the shell. If I were to use glob, how can I modify my code to do what I'm doing below?
from subprocess import call
def fragment_score(rosetta_path, silent_input_and_score_output, start_model):
    call([rosetta_path,
          '-mode score',
          '-in::file::silent', '%s/%s*silent' % (silent_input_and_score_output, start_model.split('/')[-1].split('.')[0]),
          '-scorefile', '%s/scores1' % silent_input_and_score_output,
          '-n_matches', '50'])

Use the Python glob module to generate a list of glob results, and splice it into your argument list at the same position where you would otherwise have a shell replacing a glob expression with the list of associated matches:
from subprocess import call
from glob import glob
def fragment_score(rosetta_path, silent_input_and_score_output, start_model):
    glob_exp = '%s/%s*silent' % (silent_input_and_score_output, start_model.split('/')[-1].split('.')[0])
    glob_results = glob(glob_exp)
    call([rosetta_path,
          '-mode score',
          '-in::file::silent'
          ] + glob_results + [
          '-scorefile', '%s/scores1' % silent_input_and_score_output,
          '-n_matches', '50'])
In current Python 3.x, there's syntax that makes this a bit more natural:
    call([rosetta_path,
          '-mode score',
          '-in::file::silent',
          *glob_results,
          '-scorefile', '%s/scores1' % silent_input_and_score_output,
          '-n_matches', '50'])
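One difference from the shell is worth guarding against: glob() returns an empty list when nothing matches, so the option would silently receive no file arguments at all. A minimal sketch of a helper that expands the pattern and fails loudly instead; the names expand_or_fail and build_args are hypothetical, not part of any library:

```python
from glob import glob


def expand_or_fail(pattern):
    """Return the sorted matches for a glob pattern, or raise if there are none."""
    matches = sorted(glob(pattern))
    if not matches:
        # glob() gives [] here; stop instead of running the program without inputs.
        raise FileNotFoundError('no files match %r' % pattern)
    return matches


def build_args(program, option, pattern, trailing=()):
    """Splice the expanded file names into the argument list right after `option`."""
    return [program, option, *expand_or_fail(pattern), *trailing]
```

The resulting list can be passed to call() as-is, since every match is already its own argument.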

Related

proper syntax for subprocess.call()

I have the following script:
import glob
import subprocess
import os
filePath = "/tmp/ming"
keyword = "GC10^Dummy-Segment"
#if keyword in filePath:
new=glob.glob('/tmp/ming/*Dummy-Segment*')
print(new)
for x in new:
    subprocess.call(['hdfs dfs -copyFromLocal {0} /user/app'.format(x)], shell=True)
print(new) yields:
['/tmp/mike/GC10^Dummy-Segment_2018', '/tmp/mike/GC10^Dummy-Segment_2019']
Seeing the following errors:
copyFromLocal: unexpected URISyntaxException
copyFromLocal: unexpected URISyntaxException
In an earlier attempt, I had to replace the ^ with %5E but I'm really not sure how to substitute the ^ now.
I think for each x in new, I have to add in %5E where the ^ is, then do the copyFromLocal. But how do I do that?
Also, I'm running Python 2.6.6
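One way to do the substitution described above is str.replace on each path before building the command. This is only a sketch: it assumes, as the earlier attempt suggests, that hdfs accepts %5E in place of ^, and it has not been checked against a cluster. The helper names copy_command and copy_all are hypothetical. Passing an argument list without shell=True also keeps the shell from interpreting characters in the path.

```python
import subprocess


def copy_command(local_path, dest='/user/app'):
    """Build the hdfs argument list with '^' percent-encoded as '%5E'."""
    encoded = local_path.replace('^', '%5E')
    return ['hdfs', 'dfs', '-copyFromLocal', encoded, dest]


def copy_all(paths):
    """Run one copy per path and return the list of exit codes."""
    return [subprocess.call(copy_command(p)) for p in paths]
```

With the script above this would be called as copy_all(new).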

Python subprocess script failing

I have written the script below to delete files in a folder that do not match the dates in the "keep" period, e.g. delete all files except those whose names partly match one of these dates.
The command works from the shell but fails with the subprocess call.
/bin/rm /home/backups/!(*"20170920"*|*"20170919"*|*"20170918"*|*"20170917"*|*"20170916"*|*"20170915"*|*"20170914"*)
#!/usr/bin/env python
from datetime import datetime
from datetime import timedelta
import subprocess
### Editable Variables
keepdays=7
location="/home/backups"
count=0
date_string=''
for count in range(0,keepdays):
    if(date_string!=""):
        date_string+="|"
    keepdate = (datetime.now() - timedelta(days=count)).strftime("%Y%m%d")
    date_string+="*\""+keepdate+"\"*"
full_cmd="/bin/rm "+location+"/!("+date_string+")"
subprocess.call([full_cmd], shell=True)
This is what the script returns:
#./test.py
/bin/rm /home/backups/!(*"20170920"*|*"20170919"*|*"20170918"*|*"20170917"*|*"20170916"*|*"20170915"*|*"20170914"*)
/bin/sh: 1: Syntax error: "(" unexpected
Python version is Python 2.7.12
As @hjpotter said, subprocess uses /bin/sh as the default shell, which doesn't support the kind of globbing you want to do. See the official documentation. You can change that by passing the executable parameter to subprocess.call() with a more appropriate shell (/bin/bash or /bin/zsh, for example): subprocess.call([full_cmd], executable="/bin/bash", shell=True). Note that bash only recognizes the !(...) pattern when its extglob option is enabled, which is not the default for non-interactive shells.
But you can be a lot better served by Python itself; you don't need to call a subprocess to delete a file:
#!/usr/bin/env python
from datetime import datetime
from datetime import timedelta
import re
import os
import os.path
### Editable Variables
keepdays=7
location="/home/backups"
now = datetime.now()
keeppatterns = set((now - timedelta(days=count)).strftime("%Y%m%d") for count in range(0, keepdays))
for filename in os.listdir(location):
    dates = set(re.findall(r"\d{8}", filename))
    if not dates or dates.isdisjoint(keeppatterns):
        abs_path = os.path.join(location, filename)
        print("I am about to remove", abs_path)
        # uncomment the line below when you are sure it won't delete any valuable file
        #os.remove(abs_path)

commands module problem

Hi, I'm trying to execute a bash command in Python by importing the commands module. I think I asked the same question here before; however, this time it doesn't work.
The script is as below:
#!/usr/bin/python
import os,sys
import commands
import glob
path= '/home/xxx/nearline/bamfiles'
bamfiles = glob.glob(path + '/*.bam')
for bamfile in bamfiles:
    fullpath = os.path.join(path,bamfile)
    txtfile = commands.getoutput('/share/bin/samtools/samtools ' + 'view '+ fullpath)
    line=txtfile.readlines()
    print line
samtools view will produce (I think) a .txt file.
I got the errors:
Traceback (most recent call last):
File "./try.py", line 12, in ?
txtfile = commands.getoutput('/share/bin/samtools/samtools ' + 'view '+ fullpath)
File "/usr/lib64/python2.4/commands.py", line 44, in getoutput
return getstatusoutput(cmd)[1]
File "/usr/lib64/python2.4/commands.py", line 54, in getstatusoutput
text = pipe.read()
SystemError: Objects/stringobject.c:3518: bad argument to internal function
Seems it's the problem with commands.getoutput
Thanks
I would recommend using subprocess
From the commands documentation:
Deprecated since version 2.6: The commands module has been removed in Python 3.0. Use the subprocess module instead.
Update: Just realized you're using Python 2.4. An easy way to execute a command is os.system()
A quick Google search for "SystemError: Objects/stringobject.c:3518: bad argument to internal function" brings up several bug reports, such as https://www.mercurial-scm.org/bts/issue1225 and http://www.modpython.org/pipermail/mod_python/2007-June/023852.html. It appears to be an issue with Fedora in combination with Python 2.4, but I am not exactly sure about that. I would suggest that you follow Michael's advice and use os.system or os.popen to accomplish this task. The changes to your code would be:
import os,sys
import glob
path= '/home/xxx/nearline/bamfiles'
bamfiles = glob.glob(path + '/*.bam')
for bamfile in bamfiles:
    fullpath = os.path.join(path,bamfile)
    txtfile = os.popen('/share/bin/samtools/samtools ' + 'view '+ fullpath)
    line=txtfile.readlines()
    print line
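For readers on a current Python 3, where the commands module no longer exists, the subprocess recommendation above might look like the following sketch. The helper name command_lines is hypothetical, and it assumes the command writes text to standard output.

```python
import subprocess


def command_lines(argv):
    """Run argv without a shell and return its standard output as a list of lines."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode the output bytes to str
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```

In the loop above this would be called as command_lines(['/share/bin/samtools/samtools', 'view', fullpath]).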

How to add file extensions based on file type on Linux/Unix?

This is a question regarding Unix shell scripting (any shell), but any other "standard" scripting language solution would also be appreciated:
I have a directory full of files where the filenames are hash values like this:
fd73d0cf8ee68073dce270cf7e770b97
fec8047a9186fdcc98fdbfc0ea6075ee
These files have different original file types such as png, zip, doc, pdf etc.
Can anybody provide a script that would rename the files so they get their appropriate file extension, probably based on the output of the file command?
Answer:
J.F. Sebastian's script will work both for outputting the filenames and for the actual renaming.
Here's mimetypes' version:
#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter.
`ext` is mime-based.
"""
import fileinput
import mimetypes
import os
import sys
from subprocess import Popen, PIPE
if len(sys.argv) > 1 and sys.argv[1] == '--rename':
    do_rename = True
    del sys.argv[1]
else:
    do_rename = False
for filename in (line.rstrip() for line in fileinput.input()):
    output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
    mime = output.split(';', 1)[0].lower().strip()
    ext = mimetypes.guess_extension(mime, strict=False)
    if ext is None:
        ext = os.path.extsep + 'undefined'
    filename_ext = filename + ext
    print filename_ext
    if do_rename:
        os.rename(filename, filename_ext)
Example:
$ ls *.file? | python add-ext.py --rename
avi.file.avi
djvu.file.undefined
doc.file.dot
gif.file.gif
html.file.html
ico.file.obj
jpg.file.jpe
m3u.file.ksh
mp3.file.mp3
mpg.file.m1v
pdf.file.pdf
pdf.file2.pdf
pdf.file3.pdf
png.file.png
tar.bz2.file.undefined
Following @Phil H's response, which follows @csl's response:
#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter.
`ext` is mime-based.
"""
# Mapping of mime-types to extensions is taken from here:
# http://as3corelib.googlecode.com/svn/trunk/src/com/adobe/net/MimeTypeMap.as
mime2exts_list = [
["application/andrew-inset","ez"],
["application/atom+xml","atom"],
["application/mac-binhex40","hqx"],
["application/mac-compactpro","cpt"],
["application/mathml+xml","mathml"],
["application/msword","doc"],
["application/octet-stream","bin","dms","lha","lzh","exe","class","so","dll","dmg"],
["application/oda","oda"],
["application/ogg","ogg"],
["application/pdf","pdf"],
["application/postscript","ai","eps","ps"],
["application/rdf+xml","rdf"],
["application/smil","smi","smil"],
["application/srgs","gram"],
["application/srgs+xml","grxml"],
["application/vnd.adobe.apollo-application-installer-package+zip","air"],
["application/vnd.mif","mif"],
["application/vnd.mozilla.xul+xml","xul"],
["application/vnd.ms-excel","xls"],
["application/vnd.ms-powerpoint","ppt"],
["application/vnd.rn-realmedia","rm"],
["application/vnd.wap.wbxml","wbxml"],
["application/vnd.wap.wmlc","wmlc"],
["application/vnd.wap.wmlscriptc","wmlsc"],
["application/voicexml+xml","vxml"],
["application/x-bcpio","bcpio"],
["application/x-cdlink","vcd"],
["application/x-chess-pgn","pgn"],
["application/x-cpio","cpio"],
["application/x-csh","csh"],
["application/x-director","dcr","dir","dxr"],
["application/x-dvi","dvi"],
["application/x-futuresplash","spl"],
["application/x-gtar","gtar"],
["application/x-hdf","hdf"],
["application/x-javascript","js"],
["application/x-koan","skp","skd","skt","skm"],
["application/x-latex","latex"],
["application/x-netcdf","nc","cdf"],
["application/x-sh","sh"],
["application/x-shar","shar"],
["application/x-shockwave-flash","swf"],
["application/x-stuffit","sit"],
["application/x-sv4cpio","sv4cpio"],
["application/x-sv4crc","sv4crc"],
["application/x-tar","tar"],
["application/x-tcl","tcl"],
["application/x-tex","tex"],
["application/x-texinfo","texinfo","texi"],
["application/x-troff","t","tr","roff"],
["application/x-troff-man","man"],
["application/x-troff-me","me"],
["application/x-troff-ms","ms"],
["application/x-ustar","ustar"],
["application/x-wais-source","src"],
["application/xhtml+xml","xhtml","xht"],
["application/xml","xml","xsl"],
["application/xml-dtd","dtd"],
["application/xslt+xml","xslt"],
["application/zip","zip"],
["audio/basic","au","snd"],
["audio/midi","mid","midi","kar"],
["audio/mpeg","mp3","mpga","mp2"],
["audio/x-aiff","aif","aiff","aifc"],
["audio/x-mpegurl","m3u"],
["audio/x-pn-realaudio","ram","ra"],
["audio/x-wav","wav"],
["chemical/x-pdb","pdb"],
["chemical/x-xyz","xyz"],
["image/bmp","bmp"],
["image/cgm","cgm"],
["image/gif","gif"],
["image/ief","ief"],
["image/jpeg","jpg","jpeg","jpe"],
["image/png","png"],
["image/svg+xml","svg"],
["image/tiff","tiff","tif"],
["image/vnd.djvu","djvu","djv"],
["image/vnd.wap.wbmp","wbmp"],
["image/x-cmu-raster","ras"],
["image/x-icon","ico"],
["image/x-portable-anymap","pnm"],
["image/x-portable-bitmap","pbm"],
["image/x-portable-graymap","pgm"],
["image/x-portable-pixmap","ppm"],
["image/x-rgb","rgb"],
["image/x-xbitmap","xbm"],
["image/x-xpixmap","xpm"],
["image/x-xwindowdump","xwd"],
["model/iges","igs","iges"],
["model/mesh","msh","mesh","silo"],
["model/vrml","wrl","vrml"],
["text/calendar","ics","ifb"],
["text/css","css"],
["text/html","html","htm"],
["text/plain","txt","asc"],
["text/richtext","rtx"],
["text/rtf","rtf"],
["text/sgml","sgml","sgm"],
["text/tab-separated-values","tsv"],
["text/vnd.wap.wml","wml"],
["text/vnd.wap.wmlscript","wmls"],
["text/x-setext","etx"],
["video/mpeg","mpg","mpeg","mpe"],
["video/quicktime","mov","qt"],
["video/vnd.mpegurl","m4u","mxu"],
["video/x-flv","flv"],
["video/x-msvideo","avi"],
["video/x-sgi-movie","movie"],
["x-conference/x-cooltalk","ice"]]
#NOTE: take only the first extension
mime2ext = dict(x[:2] for x in mime2exts_list)
if __name__ == '__main__':
    import fileinput, os.path
    from subprocess import Popen, PIPE
    for filename in (line.rstrip() for line in fileinput.input()):
        output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
        mime = output.split(';', 1)[0].lower().strip()
        print filename + os.path.extsep + mime2ext.get(mime, 'undefined')
Here's a snippet for older Python versions (not tested):
#NOTE: take only the first extension
mime2ext = {}
for x in mime2exts_list:
    mime2ext[x[0]] = x[1]
if __name__ == '__main__':
    import os
    import sys
    # this version supports only stdin (part of fileinput.input() functionality)
    lines = sys.stdin.read().split('\n')
    for line in lines:
        filename = line.rstrip()
        output = os.popen('file -bi ' + filename).read()
        mime = output.split(';')[0].lower().strip()
        try:
            ext = mime2ext[mime]
        except KeyError:
            ext = 'undefined'
        print filename + '.' + ext
It should work on Python 2.3.5 (I guess).
You can use
file -i filename
to get a MIME-type. You could potentially look up the type in a list and then append an extension. You can find a list of MIME-types and example file extensions on the net.
Following csl's response:
You can use
file -i filename
to get a MIME-type. You could potentially look up the type in a list and then append an extension. You can find a list of MIME-types and suggested file extensions on the net.
I'd suggest you write a script that takes the output of file -i filename, and returns an extension (split on spaces, find the '/', look up that term in a table file) in your language of choice - a few lines at most. Then you can do something like:
ls | while read f; do mv "$f" "$f".`file -i "$f" | get_extension.py`; done
in bash, or throw that in a bash script. Or make the get_extension script bigger, but that makes it less useful next time you want the relevant extension.
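A minimal sketch of such a get_extension.py, written for Python 3. It assumes the `name: type/subtype; charset=...` layout that file -i prints, and the three-entry table is only a placeholder for a fuller list; the function names are hypothetical.

```python
#!/usr/bin/env python3
import sys

# Placeholder table (an assumption for illustration); extend it from a fuller MIME-type list.
MIME_TO_EXT = {
    'image/png': 'png',
    'application/pdf': 'pdf',
    'application/zip': 'zip',
}


def extension_for(file_output, default='undefined'):
    """Map one line of `file -i` output to an extension."""
    # Everything after the last ': ' is 'type/subtype; charset=...'.
    description = file_output.rpartition(': ')[2]
    mime = description.split(';', 1)[0].strip().lower()
    return MIME_TO_EXT.get(mime, default)


def main(stream=sys.stdin):
    """Read one line of `file -i` output and print the matching extension."""
    print(extension_for(stream.readline()))
```

To use it in the pipeline above, call main() at the bottom of the script. Splitting on the last ': ' keeps filenames that themselves contain a colon from confusing the lookup.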
Edit: change from for f in * to ls | while read f because the latter handles filenames with spaces in (a particular nightmare on Windows).
Of course, it should be added that deciding on a MIME type just based on file(1) output can be very inaccurate/vague (what's "data" ?) or even completely incorrect...
Agreeing with Keltia, and elaborating some on his answer:
Take care: some file types may be problematic, JPEG2000 for example.
Others might return too much information if the file command is given without any options. The way to avoid this is to use file -b for a brief return of information.

Is there a standard way to list names of Python modules in a package?

Is there a straightforward way to list the names of all modules in a package, without using __all__?
For example, given this package:
/testpkg
/testpkg/__init__.py
/testpkg/modulea.py
/testpkg/moduleb.py
I'm wondering if there is a standard or built-in way to do something like this:
>>> package_contents("testpkg")
['modulea', 'moduleb']
The manual approach would be to iterate through the module search paths in order to find the package's directory. One could then list all the files in that directory, filter out the uniquely-named py/pyc/pyo files, strip the extensions, and return that list. But this seems like a fair amount of work for something the module import mechanism is already doing internally. Is that functionality exposed anywhere?
Using python2.3 and above, you could also use the pkgutil module:
>>> import pkgutil
>>> [name for _, name, _ in pkgutil.iter_modules(['testpkg'])]
['modulea', 'moduleb']
EDIT: Note that the parameter for pkgutil.iter_modules is not a list of modules, but a list of paths, so you might want to do something like this:
>>> import os.path, pkgutil
>>> import testpkg
>>> pkgpath = os.path.dirname(testpkg.__file__)
>>> print([name for _, name, _ in pkgutil.iter_modules([pkgpath])])
import module
help(module)
Maybe this will do what you're looking for?
import imp
import os
MODULE_EXTENSIONS = ('.py', '.pyc', '.pyo')
def package_contents(package_name):
    file, pathname, description = imp.find_module(package_name)
    if file:
        raise ImportError('Not a package: %r' % package_name)
    # Use a set because some may be both source and compiled.
    return set([os.path.splitext(module)[0]
                for module in os.listdir(pathname)
                if module.endswith(MODULE_EXTENSIONS)])
I don't know whether I'm overlooking something or the other answers are just outdated, but listing the modules in a package seems really easy using inspect:
>>> import inspect, testpkg
>>> [name for name, _ in inspect.getmembers(testpkg, inspect.ismodule)]
['modulea', 'moduleb']
As stated by user815423426, this only works for live objects: the modules listed are only those that were imported before.
This is a recursive version that works with python 3.6 and above:
import importlib.util
from pathlib import Path
import os
MODULE_EXTENSIONS = '.py'
def package_contents(package_name):
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return set()
    pathname = Path(spec.origin).parent
    ret = set()
    with os.scandir(pathname) as entries:
        for entry in entries:
            if entry.name.startswith('__'):
                continue
            current = '.'.join((package_name, entry.name.partition('.')[0]))
            if entry.is_file():
                if entry.name.endswith(MODULE_EXTENSIONS):
                    ret.add(current)
            elif entry.is_dir():
                ret.add(current)
                ret |= package_contents(current)
    return ret
There is a __loader__ variable inside each package instance. So, if you import the package, you can find the "module resources" inside the package:
import testpkg # replace this with your package name
for mod in testpkg.__loader__.get_resource_reader().contents():
    print(mod)
You can of course improve the loop to find the "module" name:
import testpkg
from pathlib import Path
for mod in testpkg.__loader__.get_resource_reader().contents():
    # You can filter the name like
    # Path(mod).suffix not in (".py", ".pyc")
    print(Path(mod).stem)
Inside the package, you can find your modules by directly using __loader__ of course.
This should list the modules:
help("modules")
If you would like to view information about your package outside of Python code (from a command prompt), you can use pydoc:
# get a full list of packages that you have installed on your machine
$ python -m pydoc modules
# get information about a specific package
$ python -m pydoc <your package>
You get the same result as pydoc inside the interpreter by using help:
>>> import <my package>
>>> help(<my package>)
Based on cdleary's example, here's a recursive version listing path for all submodules:
import imp, os
def iter_submodules(package):
    file, pathname, description = imp.find_module(package)
    for dirpath, _, filenames in os.walk(pathname):
        for filename in filenames:
            if os.path.splitext(filename)[1] == ".py":
                yield os.path.join(dirpath, filename)
The other answers here will run the code in the package as they inspect it. If you don't want that, you can grep the files, as in this answer:
import re
from typing import List
def _get_class_names(file_name: str) -> List[str]:
    """Get the python class name defined in a file without running code
    file_name: the name of the file to search for class definitions in
    return: all the classes defined in that python file, empty list if no matches"""
    defined_class_names = []
    # search the file for class definitions
    with open(file_name, "r") as file:
        for line in file:
            # regular expression for class defined in the file
            # searches for text that starts with "class" and ends with ( or :,
            # whichever comes first
            match = re.search(r"^class(.+?)(\(|:)", line)
            if match:
                # add the cleaned match to the list if there is one
                defined_class_name = match.group(1).strip()
                defined_class_names.append(defined_class_name)
    return defined_class_names
To complete @Metal3d's answer: yes, you can call testpkg.__loader__.get_resource_reader().contents() to list the "module resources", but it only works if you imported your package in the "normal" way and your loader is a _frozen_importlib_external.SourceFileLoader object.
But if you imported your library with zipimport (e.g. to load your package in memory), your loader will be a zipimporter object, and its get_resource_reader function differs from importlib's: it requires a "fullname" argument.
To make it work with both loaders, just pass your package name as the argument to get_resource_reader:
# An example with CrackMapExec tool
import importlib
import cme.protocols as cme_protocols
class ProtocolLoader:
    def get_protocols(self):
        protocols = {}
        protocols_names = [x for x in cme_protocols.__loader__.get_resource_reader("cme.protocols").contents()]
        for prot_name in protocols_names:
            prot = importlib.import_module(f"cme.protocols.{prot_name}")
            protocols[prot_name] = prot
        return protocols
def package_contents(package_name):
    package = __import__(package_name)
    return [module_name for module_name in dir(package) if not module_name.startswith("__")]
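Note that dir(package) lists every public attribute of the package and only shows submodules that have already been imported. A sketch of the same package_contents function built on pkgutil.iter_modules, mentioned in an earlier answer, which scans the package directories instead. It still imports the package itself (so its __init__.py runs) in order to read __path__, but it does not import the submodules.

```python
import importlib
import pkgutil


def package_contents(package_name):
    """Return the sorted names of the modules directly inside a package."""
    package = importlib.import_module(package_name)
    # __path__ is the list of directories that make up the package.
    return sorted(name for _, name, _ in pkgutil.iter_modules(package.__path__))
```

For the testpkg layout from the question, package_contents("testpkg") would be expected to list modulea and moduleb.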
