I have a number of scripts that reference a Python program via the:
python -c "execfile('myfile.py'); readFunc(param='myParam', input='blahblah')"
interface. What I'd like to do is conceptually simple: Develop a more modular system with a "main" and a normal Python CLI interface that then calls these functions, but also MAINTAINS the existing interface, so the scripts built to use it still work.
Is this possible?
Ideally, if I were to call
python myfile.py readFunc myParam blahblah
It'd be something like:
def main(argv):
    readFunc(argv[2], argv[3])
I've tried something like that, but it hasn't quite worked. Is it possible to keep both interfaces/methods of invocation?
Thanks!
The first idea that comes to mind stems from the optional arguments to the execfile() function. You might be able to do something like this:
def main(args):
    results = do_stuff()
    return results

if __name__ == '__main__':
    import sys
    main(sys.argv[1:])

if __name__ == 'execfile':
    main(args)
... and then when you want to call it via execfile() you supply a dictionary for its optional globals argument:
python -c 'execfile("myfile.py", {"__name__": "execfile", "args": (1, 2, 3)}); ...'
This does require a little extra work when you're calling your functionality via -c, as you have to remember to pass that dictionary and override '__name__' ... though I suppose you could actually use any valid Python identifier. It's just that __name__ is closest to what you're actually doing.
The next idea feels a little dirty but relies on the apparent handling of the __file__ global identifier. That seems to be unset when calling python -c and set if the file is being imported or executed. So this works (at least for CPython 2.7.9):
#!/usr/bin/env python
foo = 'foo'

if __name__ == '__main__' and '__file__' not in globals():
    print "Under -c:", foo
elif __name__ == '__main__':
    print "Executed standalone:", foo
... and if you use that please don't give me credit. It looks ...
... ... ummm ....
... just ...
.... WRONG
If I understand this one
python myfile.py readFunc myParam blahblah
correctly, you want to parse argv[1] as a command name to be executed.
So just do
if __name__ == '__main__':
    import sys
    if len(sys.argv) < 2 or sys.argv[1].lower() == 'nop' or sys.argv[0] == '-c':
        pass  # old, legacy interface: nothing to dispatch
    elif sys.argv[1].lower() == 'readfunc':  # new one
        readFunc(*sys.argv[2:])
where the second branch gets executed on a direct execution of the file (either via python file.py readFunc myParam blahblah or via python -m file readFunc myParam blahblah).
The "nop" / empty-argv branch comes into play when using the "legacy" interface: in that case you have most probably given no command-line arguments, so you can assume that you don't want to execute anything.
This makes the situation as before: the readFunc identifier is exported and can be used from within the -c script as before.
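Putting it all together, here is a minimal sketch of myfile.py serving both interfaces (readFunc's signature is assumed from the question; its body is a stand-in):

def readFunc(param, input):
    # stand-in for the real implementation
    print("param=%s input=%s" % (param, input))

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 2 or sys.argv[0] == '-c':
        pass  # legacy: loaded via execfile() under python -c, nothing to dispatch
    elif sys.argv[1].lower() == 'readfunc':
        readFunc(*sys.argv[2:])

Both python myfile.py readFunc myParam blahblah and the old python -c "execfile('myfile.py'); readFunc(param='myParam', input='blahblah')" invocation keep working: under -c, sys.argv[0] is '-c' and no further arguments are present, so the guard falls through and readFunc remains defined for the rest of the -c script.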
Related
My package has the following structure:
mypackage
|-__main__.py
|-__init__.py
|-model
|-__init__.py
|-modelfile.py
|-simulation
|-sim1.py
|-sim2.py
The content of the file __main__.py is
from mypackage.simulation import sim1

if __name__ == '__main__':
    sim1
So that when I execute python -m mypackage, the script sim1.py runs.
Now I would like to add an argument to the command line, so that python -m mypackage sim1 runs sim1.py and python -m mypackage sim2 runs sim2.py.
I've tried the following:
import sys
from mypackage.simulation import sim1, sim2

if __name__ == '__main__':
    for arg in sys.argv:
        arg
But it runs both scripts instead of only the one passed as an argument.
In sim1.py and sim2.py I have the following code
from mypackage.model import modelfile
print('modelfile.ModelClass.someattr')
You can simply call __import__ with the module name as parameter, e.g.:
new_module = __import__(arg)
in your loop.
So, for example, you have your main program named example.py:
import sys

if __name__ == '__main__':
    for arg in sys.argv[1:]:
        module = __import__(arg)
        print(arg, module.foo(1))
Note that sys.argv[0] contains the program name.
You have your sim1.py:
print('sim1')

def foo(n):
    return n + 1
and your sim2.py:
print('sim2')

def foo(n):
    return n + 2
then you can call
python example.py sim1 sim2
output:
sim1
sim1 2
sim2
sim2 3
Suppose you have your files with the following content.
sim1.py
def simulation1():
    print("This is simulation 1")

simulation1()
main.py
import sim1
sim1.simulation1()
output
This is simulation 1
This is simulation 1
When you import sim1 into main.py and call its function simulation1, "This is simulation 1" gets printed two times, because simulation1 is called inside sim1.py and also in main.py.
If you want to run that function in sim1.py, but don't want to run when sim1 is imported, then you can place it inside if __name__ == "__main__":.
sim1.py
def simulation1():
    print("This is simulation 1")

if __name__ == "__main__":
    simulation1()
main.py
import sim1
sim1.simulation1()
output
This is simulation 1
Your code doesn't do what you want it to do. A bare sim1 on a line by itself doesn't actually call anything; the syntax to call a function is sim1().
You could make your Python script evaluate random strings from the command line as Python expressions, but that's really not a secure or elegant way to solve this. Instead, have the strings map to internal functions, which may or may not have the same name. For example,
if __name__ == '__main__':
    import sys
    for arg in sys.argv[1:]:
        if arg == 'sim1':
            sim1()
        elif arg == 'mustard':
            sim2()
        elif arg == 'ketchup':
            sim3(sausages=2, cucumber=user in cucumberlovers)
        else:
            raise ValueError("Anguish! Don't know how to handle %s" % arg)
As this should hopefully illustrate, the symbol you accept on the command line does not need to correspond to the name of the function you want to run. If you want that to be the case, you can simplify this to use a dictionary:
if __name__ == '__main__':
    import sys
    d = {fun.__name__: fun for fun in (sim1, sim2)}
    for arg in sys.argv[1:]:
        if arg in d:
            d[arg]()
        else:
            raise ValueError('Anguish! etc')
What's perhaps important to note here is that you select exactly which Python symbols you want to give the user access to from the command line, and allow no others to leak through. That would be a security problem (think what would happen if someone passed in 'import shutil; shutil.rmtree("/")' as the argument to run). This is similar in spirit to the many, many reasons to avoid eval, which you will find are easy to google (and you probably should if this is unfamiliar to you).
If sim1 is a module you want to import only when the user specifically requests it, that's not hard to do either (import the module by the name held in a variable), though then you can't import it earlier on in the script:
if __name__ == '__main__':
    import sys
    modules = ['sim1', 'sim2']
    for arg in sys.argv[1:]:
        if arg in modules:
            globals()[arg] = __import__(arg)
        else:
            raise ValueError('Anguish! etc')
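As an aside, importlib.import_module is the documented, more idiomatic alternative to calling __import__ directly. A minimal sketch of the same loop:

import importlib
import sys

if __name__ == '__main__':
    modules = ['sim1', 'sim2']
    for arg in sys.argv[1:]:
        if arg in modules:
            # import_module returns the module object, just like __import__
            globals()[arg] = importlib.import_module(arg)
        else:
            raise ValueError('Anguish! etc')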
But generally speaking, modules should probably only define functions, and leave it to the caller to decide if and when to run them at some time after they import the module.
Perhaps tangentially look into third-party libraries like click which easily allow you to expose selected functions as "subcommands" of your Python script, vaguely similarly to how git has subcommands init, log, etc.
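For a flavor of what that looks like, here is a sketch assuming click is installed; the subcommand names are made up to match the example:

import click

@click.group()
def cli():
    """Top-level command; subcommands are registered below."""

@cli.command()
def sim1():
    click.echo("running sim1")

@cli.command()
def sim2():
    click.echo("running sim2")

if __name__ == '__main__':
    cli()

Running python myscript.py sim1 then executes only sim1, and click generates --help output for free.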
Should main() always take no parameters, with the arguments it needs accessed inside the function itself, or is it acceptable to pass them in as inputs, e.g. main(arg1, arg2, arg3)?
I know it works but I'm wondering if it is poor programming practice. Apologies if this is a duplicate but I couldn't see the question specifically answered for Python.
In most other programming languages, you'd either have zero parameters or two parameters:
int main(int argc, char *argv[])
to receive the arguments passed to the program. In Python, however, these are accessed through the sys module:
import sys

def main():
    print(sys.argv, len(sys.argv))
But you could extend this so that you pass argv and argc into your Python function, similar to other languages:
import sys

def main(argv, argc):
    print(argv, argc)

if __name__ == '__main__':
    main(sys.argv, len(sys.argv))
But let's forget about argv/argc for now - why would you want to pass something through to main? You create something outside of main and want to pass it through to main. That can happen in two instances:
You're calling main multiple times from other functions.
You've created variables outside main that you want to pass through.
Point number 1 is definitely bad practice. main should be unique and called only once at the beginning of your program. If you need to call it multiple times, then the code inside main doesn't belong inside main. Split it up.
Point number 2 may seem like it makes sense, until you do it in practice:
def main(a, b):
    print(a, b)

if __name__ == '__main__':
    x = 4
    y = 5
    main(x, y)
But then aren't x and y global variables? Good practice would put these at the top of your file (as named constants, etc.), in which case you wouldn't need to pass them through as arguments at all.
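Rewritten to follow that convention, the same example might look like this sketch:

# module-level constants live at the top of the file, by convention
X = 4
Y = 5

def main():
    print(X, Y)  # main reads the module-level names directly

if __name__ == '__main__':
    main()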
By following the pattern:
def main():
    ...stuff...

if __name__ == '__main__':
    main()
It allows your script both to be run directly and, if packaged using setuptools, to have an executable script generated automatically when the package is installed, by specifying main as an entry point.
See: https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation
You would add to setup.py something like:
entry_points={
    'console_scripts': [
        'my_script = my_module:main'
    ]
}
And then when you build a package, people can install it in their virtual environment, and immediately get a script called my_script on their path.
Automatic script creation like this requires a function that takes no required arguments.
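If your main does take arguments (as discussed further below), you can still satisfy that requirement by giving them defaults; a minimal sketch:

import sys

def main(argv=None):
    # console_scripts entry points call main() with no arguments,
    # so fall back to the real command line when nothing is passed in
    if argv is None:
        argv = sys.argv[1:]
    print(argv)

if __name__ == '__main__':
    main()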
It's a good idea to allow your script to be imported and to expose its functionality, both for code reuse and for testing. I would recommend something like this pattern:
import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    #
    # ... configure command line arguments ...
    #
    return parser.parse_args()

def do_stuff(args):
    #
    # ... main functionality goes in here ...
    #
    pass

def main():
    args = parse_args()
    do_stuff(args)

if __name__ == '__main__':
    main()
This allows you to run your script directly, have an automatically generated script that behaves the same way, and also import the script and call do_stuff to re-use or test the actual functionality.
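For example, a test can drive do_stuff directly without touching the command line. A sketch, where the module name my_script and the verbose attribute are hypothetical:

# test_my_script.py -- assumes the script above is saved as my_script.py
import argparse
from my_script import do_stuff

def test_do_stuff():
    # build the args namespace by hand instead of parsing a command line
    args = argparse.Namespace(verbose=True)  # 'verbose' is a made-up option
    do_stuff(args)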
This blog post was mentioned in the comments: https://www.artima.com/weblogs/viewpost.jsp?thread=4829 It uses a default argument on main to allow dependency injection for testing; however, it is a very old post, and the getopt library it uses has been superseded twice since then. The pattern above is superior and still allows dependency injection.
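To get the same injection with the argparse pattern, let parse_args accept an optional argument list; argparse parses sys.argv[1:] when given None. A sketch with a made-up --name option:

import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--name', default='world')  # hypothetical option
    return parser.parse_args(argv)  # argv=None -> parse sys.argv[1:]

def main(argv=None):
    args = parse_args(argv)
    print('hello,', args.name)

if __name__ == '__main__':
    main()

A test can then inject a fake command line with main(['--name', 'test']).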
I would definitely prefer to see main take arguments rather than accessing sys.argv directly.
This makes the reuse of the main function by other Python modules much easier.
import sys

def main(arg):
    ...

if __name__ == "__main__":
    main(sys.argv[1])
Now if I want to execute this module as a script from another module, I can just write (in my other module):
from main_script import main
main("use this argument")
If main uses sys.argv this is tougher.
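"Tougher" here means mutating global state. A sketch of what the caller is forced to do when main reads sys.argv directly (main_script is the hypothetical module from above):

import sys
import main_script

# clumsy: overwrite the process-wide argument list before calling main
sys.argv = ["main_script", "use this argument"]
main_script.main()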
I'm considering how a Python file could be made to be an importable module as well as a script that is capable of accepting command line options and arguments as well as pipe data. How should this be done?
My attempt seems to work, but I want to know if my approach is how such a thing should be done (if such a thing should be done). Could there be complexities (such as when importing it) that I have not considered?
#!/usr/bin/env python
"""
usage:
    program [options]

options:
    --version        display version and exit
    --datamode       engage data mode
    --data=FILENAME  input data file [default: data.txt]
"""
import docopt
import sys

version = "1.0"  # version string reported by --version

def main(options):
    print("main")
    datamode = options["--datamode"]
    filename_input_data = options["--data"]
    if datamode:
        print("engage data mode")
        process_data(filename_input_data)
    if not sys.stdin.isatty():
        print("accepting pipe data")
        input_stream = sys.stdin
        input_stream_list = [line for line in input_stream]
        print("input stream: {data}".format(data=input_stream_list))

def process_data(filename):
    print("process data of file {filename}".format(filename=filename))

if __name__ == "__main__":
    options = docopt.docopt(__doc__)
    if options["--version"]:
        print(version)
        exit()
    main(options)
That's it, you're good.
Nothing matters[1] except the if __name__ == '__main__', as noted elsewhere
From the docs (emphasis mine):
A module’s __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt. A module can discover whether or not it is running in the main scope by checking its own __name__, which allows a common idiom for conditionally executing code in a module when it is run as a script or with python -m but not when it is imported
I also like how python 2's docs poetically phrase it
It is this environment in which the idiomatic “conditional script” stanza causes a script to run:
That guard guarantees that the code underneath it will only be executed when the module is run as the main program; put all your argument-grabbing code there. If there is no other top-level code except class/function declarations, the file will be safe to import.
Other complications?
Yes:
Multiprocessing (a new interpreter is started and things are re-imported). if __name__ == '__main__' covers that
If you're used to C coding, you might be thinking you can protect your imports with ifdef's and the like. There's some analogous hacks in python, but it's not what you're looking for.
I like having a main method like C and Java - when's that coming out? Never.
But I'm paranoid! What if someone changes my main function. Stop being friends with that person. As long as you're the user, I assume this isn't an issue.
I mentioned the -m flag. That sounds great, what's that?! Here and here, but don't worry about it.
Footnotes:
[1] Well, the fact that you put your main code in a function is nice: it means things will run slightly faster, since local variable lookups are faster than global ones.
Maybe the title is not very clear; let me elaborate.
I have a Python script that opens a PPM file, applies a chosen filter (rotations, ...), and creates a new picture. Up to here everything works fine.
But I want to do the same thing from a Linux console, like:
ppmfilter.py ROTD /path/imageIn.ppm /path/imageOut.ppm
Here ROTD is the name of the function that applies a rotation.
I don't know how to do this; I'm looking for a library that will allow me to do it.
Looking forward to your help.
P.S.: I'm using python 2.7
There is a relatively easy way:
You can discover the global names (functions, variables, etc.) with globals(), which gives you a dictionary of all global symbols. You then just need to check whether the symbol is callable and, if it is, call it with the arguments from sys.argv:
import sys

def ROTD(infile, outfile):
    # do something
    pass

if __name__ == '__main__':
    symbol = globals().get(sys.argv[1])
    if hasattr(symbol, '__call__'):
        symbol(*sys.argv[2:])
This will pass the program argument (excluding the filename and the command name) to the function.
EDIT: Please don't forget the error handling. I omitted it for clarity.
Use a main() function:

def main():
    # call your function here
    ...

if __name__ == "__main__":
    main()
A nice way to do it would be to define a big dictionary {alias: function} inside your module. For instance:
actions = {
    'ROTD': ROTD,
    'REFL': reflect_image,
    'INVT': invIm,
}
You get the idea. Then take the first command-line argument and interpret it as a key of this dictionary, applying actions[k] to the rest of the arguments.
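The dispatch itself is then only a couple of lines. A sketch, assuming the actions dictionary above and that each function takes the remaining command-line arguments:

import sys

if __name__ == '__main__':
    action = actions[sys.argv[1]]  # e.g. 'ROTD' -> the ROTD function
    action(*sys.argv[2:])          # pass the remaining arguments through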
You can define a main section in your ppmfilter.py that does this:
if __name__ == "__main__":
    import sys
    ROTD(sys.argv[1], sys.argv[2])  # change according to the signature of the function
and call it: python ppmfilter.py file1 file2
You can also run python -c in the directory that contains your *.py file:
python -c "import ppmfilter; ppmfilter.ROTD('/path/to/file1', '/path/to/file2')"
Should I start a Python program with:
if__name__ == '__main__':
some code...
And if so, why? I saw it many times but don't have a clue about it.
If your program is usable as a library but you also have a main program (e.g. to test the library), that construct lets others import the file as a library and not run your main program. If your program is named foo.py and you do "import foo" from another python file, __name__ evaluates to 'foo', but if you run "python foo.py" from the command line, __name__ evaluates to '__main__'.
Note that you do need to insert a space between if and _, and indent the main program:
if __name__ == '__main__':
    main program here
A better pattern is this:
def main():
    ...

if __name__ == '__main__':
    main()
This allows your code to be invoked by someone who imported it, while also making programs such as pychecker and pylint work.
Guido van Rossum suggests:

import sys

def main(argv=None):
    if argv is None:
        argv = sys.argv
    ...

if __name__ == "__main__":
    sys.exit(main())
This way you can run main() from somewhere else (supplying the arguments), and if you want to exit with an error code just return 1 from main(), and it won't make an interactive interpreter exit by mistake.
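Concretely, that lets a failure path return an exit code instead of calling sys.exit() deep inside the logic. A minimal sketch:

import sys

def main(argv=None):
    if argv is None:
        argv = sys.argv
    if len(argv) < 2:
        print("usage: prog ARG")
        return 1  # becomes the process exit status via sys.exit(main())
    print("got", argv[1])
    return 0  # success

if __name__ == "__main__":
    sys.exit(main())

From an interactive session, main(["prog", "x"]) simply returns 0 instead of killing the interpreter.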
This is good practice. First, it clearly marks your module entry point (assuming you don't have any other executable code at toplevel - yuck). Second, it makes your module importable by other modules without executing, which some tools like code checkers, packagers etc. need to do.