I'm a Python beginner and have successfully gotten my first program with CLI parameters passed in to run. Got lots of help from this Handling command line options.
My question is: Why in Example 5.45 a separate def main(argv) has been used, instead of calling the try/except block within __main__ itself.
Example 5.45
def main(argv):
grammar = "kant.xml"
try:
opts, args = getopt.getopt(argv, "hg:d", ["help", "grammar="]) 2
except getopt.GetoptError:
usage()
sys.exit(2)
...
if __name__ == "__main__":
main(sys.argv[1:])
Hope someone versed in Python could share your wisdom.
TIA -
Ashant
There is no strict technical reason, but it is quite idiomatic to keep the code outside functions as short as possible. Specifically, putting the code into the module scope would turn the variables grammar, opts and args into global public variables, even though they are only required inside the main code. Furthermore, using a dedicated main function simplifies unit-testing this function.
One advantage of using a main function is that it allows for easy code re-use:
import sys
import script
script.main(sys.argv[1:])
# or, e.g. script.main(['-v', 'file.txt']), etc
Any code in the script's __main__ block won't be run if it is imported as a module. So the main function acts as a simple interface providing access to all the normal functionality of the script. The __main__ block will then usually contain just a single call to main, plus any other non-essential code (such as tests).
Some tips from the author of Python on how to write a good main function can be found here.
Related
I have this code:
import sys
def random(size=16):
return open(r"C:\Users\ravishankarv\Documents\Python\key.txt").read(size)
def main():
key = random(13)
print(key)
When I try running the script, there are no errors, but nothing appears to happen. I expected it to print some content from the key file, but nothing is printed.
What is wrong? How do I make the code run?
You've not called your main function at all, so the Python interpreter won't call it for you.
Add this as the last line to just have it called at all times:
main()
Or, if you use the commonly seen:
if __name__ == "__main__":
main()
It will make sure your main method is called only if that module is executed as the starting code by the Python interpreter. More about that here: What does if __name__ == "__main__": do?
If you want to know how to write the best possible 'main' function, Guido van Rossum (the creator of Python) wrote about it here.
Python isn't like other languages where it automatically calls the main() function. All you have done is defined your function.
You have to manually call your main function:
main()
Also, you may commonly see this in some code:
if __name__ == '__main__':
main()
There's no such main method in python, what you have to do is:
if __name__ == '__main__':
main()
Something does happen, it just isn't noticeable
Python runs scripts from top to bottom. def is a statement, and it executes when it is encountered, just like any other statement. However, the effect of this is to create the function (and assign it a name), not to call it. Similarly, import is a statement that loads the other module (and makes its code run top to bottom, with its own global-variable context), and assigns it a name.
When the example code runs, therefore, three things happen:
The code for the sys standard library module runs, and then the name sys in our own module's global variables is bound to that module
A function is created from the code for random, and then the name random is bound to that function
A function is created from the code for main, and then the name main is bound to that function
There is nothing to call the functions, so they aren't called. Since they aren't called, the code inside them isn't run - it's only used to create the functions. Since that code doesn't run, the file isn't read and nothing is printed.
There are no "special" function names
Unlike in some other languages, Python does not care that a function is named main, or anything else. It will not be run automatically.
As the Zen of Python says, "Explicit is better than implicit". If we want a function to be called, we have to call it. The only things that run automatically are the things at top level, because those are the instructions we explicitly gave.
The script starts at the top
In many real-world scripts, you may see a line that says if __name__ == '__main__':. This is not "where the script starts". The script runs top to bottom.
Please read What does if __name__ == "__main__": do? to understand the purpose of such an if statement (short version: it makes sure that part of your top-level code is skipped if someone else imports this file as a module). It is not mandatory, and it does not have any kind of special "signalling" purpose to say where the code starts running. It is just a perfectly normal if statement, that is checking a slightly unusual condition. Nothing requires you to use it in a script (aside from wanting to check what it checks), and nothing prevents you from using it more than once. Nothing prevents you from checking whether __name__ is equal to other values, either (it's just... almost certainly useless).
You're not calling the function. Put main() at the bottom of your code.
I have this code:
import sys
def random(size=16):
return open(r"C:\Users\ravishankarv\Documents\Python\key.txt").read(size)
def main():
key = random(13)
print(key)
When I try running the script, there are no errors, but nothing appears to happen. I expected it to print some content from the key file, but nothing is printed.
What is wrong? How do I make the code run?
You've not called your main function at all, so the Python interpreter won't call it for you.
Add this as the last line to just have it called at all times:
main()
Or, if you use the commonly seen:
if __name__ == "__main__":
main()
It will make sure your main method is called only if that module is executed as the starting code by the Python interpreter. More about that here: What does if __name__ == "__main__": do?
If you want to know how to write the best possible 'main' function, Guido van Rossum (the creator of Python) wrote about it here.
Python isn't like other languages where it automatically calls the main() function. All you have done is defined your function.
You have to manually call your main function:
main()
Also, you may commonly see this in some code:
if __name__ == '__main__':
main()
There's no such main method in python, what you have to do is:
if __name__ == '__main__':
main()
Something does happen, it just isn't noticeable
Python runs scripts from top to bottom. def is a statement, and it executes when it is encountered, just like any other statement. However, the effect of this is to create the function (and assign it a name), not to call it. Similarly, import is a statement that loads the other module (and makes its code run top to bottom, with its own global-variable context), and assigns it a name.
When the example code runs, therefore, three things happen:
The code for the sys standard library module runs, and then the name sys in our own module's global variables is bound to that module
A function is created from the code for random, and then the name random is bound to that function
A function is created from the code for main, and then the name main is bound to that function
There is nothing to call the functions, so they aren't called. Since they aren't called, the code inside them isn't run - it's only used to create the functions. Since that code doesn't run, the file isn't read and nothing is printed.
There are no "special" function names
Unlike in some other languages, Python does not care that a function is named main, or anything else. It will not be run automatically.
As the Zen of Python says, "Explicit is better than implicit". If we want a function to be called, we have to call it. The only things that run automatically are the things at top level, because those are the instructions we explicitly gave.
The script starts at the top
In many real-world scripts, you may see a line that says if __name__ == '__main__':. This is not "where the script starts". The script runs top to bottom.
Please read What does if __name__ == "__main__": do? to understand the purpose of such an if statement (short version: it makes sure that part of your top-level code is skipped if someone else imports this file as a module). It is not mandatory, and it does not have any kind of special "signalling" purpose to say where the code starts running. It is just a perfectly normal if statement, that is checking a slightly unusual condition. Nothing requires you to use it in a script (aside from wanting to check what it checks), and nothing prevents you from using it more than once. Nothing prevents you from checking whether __name__ is equal to other values, either (it's just... almost certainly useless).
You're not calling the function. Put main() at the bottom of your code.
Should the function name main() always be empty and have arguments called within the function itself or is it acceptable to have them as inputs to the function e.g main(arg1, arg2, arg3)?
I know it works but I'm wondering if it is poor programming practice. Apologies if this is a duplicate but I couldn't see the question specifically answered for Python.
In most other programming languages, you'd either have zero parameters or two parameters:
int main(char *argv[], int argc)
To denote the arguments passed through to the parameter. However, in Python these are accessed through the sys module:
import sys
def main():
print(sys.argv, len(sys.argv))
But then you could extend this so that you pass through argv and argc into your python function, similar to other languages yes:
import sys
def main(argv, arc):
print(argv, arc)
if __name__ == '__main__':
main(sys.argv, len(sys.argv))
But let's forget about argv/argc for now - why would you want to pass something through to main. You create something outside of main and want to pass it through to main. And this can happen in two instances:
You're calling main multiple times from other functions.
You've created variables outside main that you want to pass through.
Point number 1 is definitely bad practice. main should be unique and called only once at the beginning of your program. If you have the need to call it multiple times, then the code inside main doesn't belong inside main. Split it up.
Point number 2 may seem like it makes sense, but then you do it in practise:
def main(a, b):
print(a, b)
if __name__ == '__main__':
x = 4
y = 5
main(x, y)
But then aren't x and y global variables? And good practice would assume that these are at the top of your file (and multiple other properties - they're constant, etc), and that you wouldn't need to pass these through as arguments.
By following the pattern:
def main():
...stuff...
if __name__ == '__main__':
main()
It allows your script to both to be run directly, and if packaged using setup tools, to have an executable script generated automatically when the package is installed by specifying main as an entry point.
See: https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-script-creation
You would add to setup.py something like:
entry_points={
'console_scripts': [
'my_script = my_module:main'
]
}
And then when you build a package, people can install it in their virtual environment, and immediately get a script called my_script on their path.
Automatic script creation like this requires a function that takes no required arguments.
It's a good idea to allow you script to be imported and expose it's functionality both for code reuse, and also for testing. I would recommend something line this pattern:
import argparse
def parse_args():
parser = argparse.ArgumentParser()
#
# ... configure command line arguments ...
#
return parser.parse_args()
def do_stuff(args):
#
# ... main functionality goes in here ...
#
def main():
args = parse_args()
do_stuff(args)
if __name__ == '__main__':
main()
This allows you to run your script directly, have an automatically generated script that behaves the same way, and also import the script and call do_stuff to re-use or test the actual functionality.
This blog post was mentioned in the comments: https://www.artima.com/weblogs/viewpost.jsp?thread=4829 which uses a default argument on main to allow dependency injection for testing, however, this is a very old blog post; the getopt library has been superseded twice since then. This pattern is superior and still allows dependency injection.
I would definitely prefer to see main take arguments rather than accessing sys.argv directly.
This makes the reuse of the main function by other Python modules much easier.
import sys
def main(arg):
...
if __name__ == "__main__":
main(sys.argv[1])
Now if I want to execute this module as a script from another module I can just write (in my other module).
from main_script import main
main("use this argument")
If main uses sys.argv this is tougher.
I have this code:
import sys
def random(size=16):
return open(r"C:\Users\ravishankarv\Documents\Python\key.txt").read(size)
def main():
key = random(13)
print(key)
When I try running the script, there are no errors, but nothing appears to happen. I expected it to print some content from the key file, but nothing is printed.
What is wrong? How do I make the code run?
You've not called your main function at all, so the Python interpreter won't call it for you.
Add this as the last line to just have it called at all times:
main()
Or, if you use the commonly seen:
if __name__ == "__main__":
main()
It will make sure your main method is called only if that module is executed as the starting code by the Python interpreter. More about that here: What does if __name__ == "__main__": do?
If you want to know how to write the best possible 'main' function, Guido van Rossum (the creator of Python) wrote about it here.
Python isn't like other languages where it automatically calls the main() function. All you have done is defined your function.
You have to manually call your main function:
main()
Also, you may commonly see this in some code:
if __name__ == '__main__':
main()
There's no such main method in python, what you have to do is:
if __name__ == '__main__':
main()
Something does happen, it just isn't noticeable
Python runs scripts from top to bottom. def is a statement, and it executes when it is encountered, just like any other statement. However, the effect of this is to create the function (and assign it a name), not to call it. Similarly, import is a statement that loads the other module (and makes its code run top to bottom, with its own global-variable context), and assigns it a name.
When the example code runs, therefore, three things happen:
The code for the sys standard library module runs, and then the name sys in our own module's global variables is bound to that module
A function is created from the code for random, and then the name random is bound to that function
A function is created from the code for main, and then the name main is bound to that function
There is nothing to call the functions, so they aren't called. Since they aren't called, the code inside them isn't run - it's only used to create the functions. Since that code doesn't run, the file isn't read and nothing is printed.
There are no "special" function names
Unlike in some other languages, Python does not care that a function is named main, or anything else. It will not be run automatically.
As the Zen of Python says, "Explicit is better than implicit". If we want a function to be called, we have to call it. The only things that run automatically are the things at top level, because those are the instructions we explicitly gave.
The script starts at the top
In many real-world scripts, you may see a line that says if __name__ == '__main__':. This is not "where the script starts". The script runs top to bottom.
Please read What does if __name__ == "__main__": do? to understand the purpose of such an if statement (short version: it makes sure that part of your top-level code is skipped if someone else imports this file as a module). It is not mandatory, and it does not have any kind of special "signalling" purpose to say where the code starts running. It is just a perfectly normal if statement, that is checking a slightly unusual condition. Nothing requires you to use it in a script (aside from wanting to check what it checks), and nothing prevents you from using it more than once. Nothing prevents you from checking whether __name__ is equal to other values, either (it's just... almost certainly useless).
You're not calling the function. Put main() at the bottom of your code.
I'm considering how a Python file could be made to be an importable module as well as a script that is capable of accepting command line options and arguments as well as pipe data. How should this be done?
My attempt seems to work, but I want to know if my approach is how such a thing should be done (if such a thing should be done). Could there be complexities (such as when importing it) that I have not considered?
#!/usr/bin/env python
"""
usage:
program [options]
options:
--version display version and exit
--datamode engage data mode
--data=FILENAME input data file [default: data.txt]
"""
import docopt
import sys
def main(options):
print("main")
datamode = options["--datamode"]
filename_input_data = options["--data"]
if datamode:
print("engage data mode")
process_data(filename_input_data)
if not sys.stdin.isatty():
print("accepting pipe data")
input_stream = sys.stdin
input_stream_list = [line for line in input_stream]
print("input stream: {data}".format(data = input_stream_list))
def process_data(filename):
print("process data of file {filename}".format(filename = filename))
if __name__ == "__main__":
options = docopt.docopt(__doc__)
if options["--version"]:
print(version)
exit()
main(options)
That's it, you're good.
Nothing matters[1] except the if __name__ == '__main__', as noted elsewhere
From the docs (emphasis mine):
A module’s __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt. A module can discover whether or not it is running in the main scope by checking its own __name__, which allows a common idiom for conditionally executing code in a module when it is run as a script or with python -m but not when it is imported
I also like how python 2's docs poetically phrase it
It is this environment in which the idiomatic “conditional script” stanza causes a script to run:
That guard guarantees that the code underneath it will only be accepted if it is the main function being called; put all your argument-grabbing code there. If there is no other top-level code except class/function declarations, it will be safe to import.
Other complications?
Yes:
Multiprocessing (a new interpreter is started and things are re-imported). if __name__ == '__main__' covers that
If you're used to C coding, you might be thinking you can protect your imports with ifdef's and the like. There's some analogous hacks in python, but it's not what you're looking for.
I like having a main method like C and Java - when's that coming out? Never.
But I'm paranoid! What if someone changes my main function. Stop being friends with that person. As long as you're the user, I assume this isn't an issue.
I mentioned the -m flag. That sounds great, what's that?! Here and here, but don't worry about it.
Footnotes:
[1] Well, the fact that you put your main code in a function is nice. Means things will run slightly faster