I have a lot of simple scripts that calculate some stuff or so. They consist of just a single module.
Should I write main methods for them and call them with the if __name__ construct, or just dump it all right in there?
What are the advantages of either method?
I always write a main() function (appropriately named), and put nothing but command-line parsing and a call to main() in the if __name__ == '__main__' block. That's because no matter how silly, trivial, or single-purpose I originally expect that script to be, I always end up wanting to call it from another module at some later date.
Either I take the time to make it an importable module today, or spend extra time to refactor it months later when I want to reuse it for something else.
Always.
Every time.
I've stopped fighting it and started writing my code with that expectation from the start.
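A minimal sketch of that layout, assuming a hypothetical script that sums numbers given on the command line. The function name main, its parameters, and the --scale option are illustrative only and not from any real script:

```python
import argparse


def main(numbers, scale=1.0):
    """Return the scaled sum; a hypothetical stand-in for the script's real work."""
    return scale * sum(numbers)


if __name__ == "__main__":
    # Only command-line parsing and the call to main() live in the guard.
    parser = argparse.ArgumentParser(description="Sum numbers and scale the result.")
    parser.add_argument("numbers", nargs="*", type=float)
    parser.add_argument("--scale", type=float, default=1.0)
    args = parser.parse_args()
    print(main(args.numbers, scale=args.scale))
```

Because main() takes plain arguments rather than reading sys.argv itself, another module can import the file and call main([1, 2, 3]) directly.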
Well, if you do this:
# your code
Then import your_module will execute your code. On the contrary, with this:
if __name__ == '__main__':
    # your code
The import won't run the code, but targeting the interpreter at that file will.
If the only way the script is ever going to run is by manual interpreter opening, there's absolutely no difference.
This becomes important when you have a library, or when you reuse the definitions in the script.
Adding code to a library outside a definition, or outside the protection of if __name__ runs the code when importing, letting you initialize stuff that the library needs.
Maybe you want your library to also have some runnable functionality. Maybe testing, or maybe something like Python's SimpleHTTPServer (http.server in Python 3): it comes with some classes, but you can also run the module and it will start a server. You can have that dual behaviour with the if __name__ clause.
Tools like epydoc import the module to access the docstrings, so running the code when you just want to generate HTML documentation is not really the intent.
The other answers are good, but I wanted to add an example of why you might want to be able to import it: unit testing. If you have a few functions and then the if __name__=="__main__":, you can import the module for unit testing. Maybe you're better at this than me, but I find that my "simple scripts that couldn't possibly have any bugs" tend to have bugs that I find with unit testing.
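As a rough sketch of that workflow, assuming a hypothetical function mean() in the script: in practice the test class would usually live in a separate file that imports the script, but it is shown in one block here so the example is self-contained.

```python
import unittest


def mean(values):
    """Return the arithmetic mean; a hypothetical stand-in for a script's function."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / float(len(values))


class MeanTests(unittest.TestCase):
    # Normally placed in a separate test file that imports the script.
    def test_typical_values(self):
        self.assertEqual(mean([2, 4, 6]), 4.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean([])


if __name__ == "__main__":
    # Script behaviour: runs only when the file is executed directly,
    # so importing the file for testing does not trigger it.
    print(mean([1, 2, 3]))
```

The tests can then be run with python -m unittest, which imports the file without executing the guarded block.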
The if __name__ construct will let you easily re-use the functions and classes in the module in other Python scripts. If you don't do that, then everything in the module will run when it is imported.
If there's nothing in the script you want to reuse, then sure, just dump it all in there. I do that sometimes. If you later decide you want to reuse some code, I have found Python to be just about the easiest language in which to refactor code without breaking it.
For a good explanation of the purpose of Python's "main guard" idiom:
What does if __name__ == "__main__": do?
In Python why a function called main doesn't have any special significance like it has in C and Java?
What if a programmer switches from C or Java to Python. Should he keep
using main in Python also like in C or Java as it's his style now to do
the programming or in a broad sense it is somehow harmful for doing
programming in Python?
Edit: I have gone through this article ("main in python harmful"), which uses a very good example to argue that first-time programmers should refrain from using main in Python.
You should also be asking, why in C and Java does main have a special significance. It's just a choice on the part of the language designer. main could well have been called start or begin but somebody chose main and it stuck.
In Python there is no reason why you can't call a function main and have it be the start point of your program. However, Python has its own syntax to identify whether a certain file is the equivalent of main:
__name__ == "__main__"
This is typically wrapped as part of an if and could simply have a single line within calling your main function that actually starts your program.
Part of the design of Python and many (all?) scripting languages is that code can simply be written inline. You don't have to wrap everything in a function. As such, many simple scripts do not require any functions at all. A cron job for example that rotates log files could just be written as a block of code in a python file with no functions being defined.
In that scenario, the main method just isn't required.
Not requiring a main in many ways makes the language more flexible, especially for simpler tasks.
ADDENDUM:
To add some context to your edit. That article presents a very poor argument. In reality function name collisions are not uncommon as there are many modules that do the same or similar things (not so much in core but as soon as you start using pip you'll encounter the odd collision). Therefore it is beneficial to use descriptive function names and avoid ever doing a from foo import *.
In the same way that C++ programmers generally consider it bad form to pollute your namespace with using namespace std, Python programmers typically consider it bad form to pollute your namespace with import *, especially as it can cause a snowball effect if used everywhere.
Finally, you're unlikely to call 2 functions in your program main. You're much more likely to have name collisions elsewhere. The real danger is the wildcard import, not the main function.
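To illustrate the collision, here is a small sketch that builds two throwaway modules in memory (the names tool_a and tool_b are hypothetical) and compares wildcard imports with plain imports. The exec call is only there so the wildcard imports can be shown in a self-contained snippet:

```python
import sys
import types


def make_module(name, message):
    """Register a tiny in-memory module whose main() returns `message`."""
    module = types.ModuleType(name)

    def main():
        return message

    module.main = main
    sys.modules[name] = module
    return module


make_module("tool_a", "tool_a.main")
make_module("tool_b", "tool_b.main")

# Wildcard imports: the second import silently replaces the first main().
wildcard_namespace = {}
exec("from tool_a import *\nfrom tool_b import *", wildcard_namespace)
shadowed = wildcard_namespace["main"]()

# Plain imports keep both functions reachable under their module names.
import tool_a
import tool_b

explicit = (tool_a.main(), tool_b.main())
```

Here shadowed comes from tool_b only, while explicit still reaches both functions, which is why the wildcard import is the thing to avoid rather than the name main.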
Such a programmer should do this:
if __name__ == "__main__":
    # run stuff
The variable __name__ is set to "__main__" when the module is run directly rather than imported.
In Python, there is something that acts like that main() function:
if __name__ == '__main__':
    # Your main function
The code in this if block is only run if the Python file has not been imported as a module.
If you're determined to use that ugly main() function, the Python equivalent would be something like this:
def main():
    pass  # ...

if __name__ == '__main__':
    main()
I don't find it very pretty.
In Python, a .py file can either be imported as a module or run directly, so the __name__ check identifies which part should run only when the file is executed directly rather than imported.
so32.py:
def func(x, y):
    print(x + y)

if __name__ == '__main__':
    func(5, 2)
If executed directly, it prints 7.
When imported, the main part doesn't run:
>>> from so32 import *
>>> func(5,5)
10
>>> func(10,20)
30
In Python why a function called main doesn't have any special significance like it has in C and Java?
Because Python wasn't designed with C, Java, or similar languages in mind. Some languages don't use main at all: http://c2.com/cgi/wiki?HelloWorldInManyProgrammingLanguages has quite a large number of hello world programs, and as you can see, not all of them use main, though you probably can't really compare them all ...
Either way, as others have stated, you can check whether a module has been run directly from the interpreter with if __name__ == '__main__':. Also, here are some tips from Guido van Rossum, the creator of Python, on how to write main functions.
What if a programmer switches from C or Java to Python. Should he keep using main in Python also like in C or Java as it's his style now to do the programming or in a broad sense it is somehow harmful for doing programming in Python?
When you write Python code, or any program for that matter, try to make the code look natural in that language. This means following the standards and practices set by the language developers or the community at large. Please don't write C- or Java-style code in Python: it's possible, but it's frowned upon, and the code tends to be error-prone, very difficult to understand, and sometimes quite slow. The same applies to any other language.
This question has quite a few good references: Python coding standards/best practices
There is an excellent explanation by Kent D Lee on YouTube of why using main() to call individual procedures (or methods) is a good idea. Basically, without main() all your variables are global: not only do they take longer to access, they also run the risk that one procedure's variables will interfere with those of another procedure. Using main() to call your procedures also enables you to reuse those procedures in another program by importing them. Additionally, you can test individual procedures without executing the whole program. Of course, if all you are going to do is output "hello world", there is no advantage at all to having a main method, and the only reason to write one is to find out how it works.
The reason why everyone says that Python is good for learners is because you can start doing things straightaway very easily - the trouble is that if you are going to write sizeable programs you are soon faced with the fact that all those complications that other languages have, are actually quite important.
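A short sketch of the scoping point, using hypothetical names (average, readings): variables created inside main() stay local to it, so they never appear among the module's globals where another procedure could pick them up by accident.

```python
def average(values):
    """Return the mean of `values`; a hypothetical example procedure."""
    total = sum(values)
    return total / float(len(values))


def main():
    # `readings` and `result` are locals of main(), not module globals.
    readings = [2, 4, 6]
    result = average(readings)
    return result


if __name__ == "__main__":
    print(main())
```

Had the body of main() been written at the top level of the file, readings and result would be module globals, visible to (and overwritable by) every function in the file.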
I am relatively new to Python having used C# for many years and I'm hoping someone can help me with this question. I have a module called actuators.py that contains a number of classes for defining properties and methods for the servos I use in a robot project. In another module called robot.py, I instantiate my actuators like this:
import actuators as Actuators
myActuators = Actuators.AllActuators()
This allows me to access the properties of my servos as long as I am in the robot.py module. For example, I can write:
print myActuators.HeadTilt.MinPosition
to get the minimum value allowed for the servo that tilts the robot's head. So far so good.
Now I want to access these same values in a separate thread that is defined by a different module called tilt_head.py. I assume I need to import a reference to the robot.py module, but doing this ends up re-executing all the code in robot.py whereas all I really want is a static reference to the myActuators object. And I can't use
from robot.py import myActuators
because myActuators is not a module.
In C#, I would do this using a declaration like this:
public static Actuators myActuators;
which then allows me to reference myActuators in any other file within my project. Is there a way to do something similar in Python? If you need my actual code, I will be happy to post it.
Thanks!
And I can't use
from robot.py import myActuators
because myActuators is not a module.
But myActuators doesn't need to be a module. You can do exactly that. (Though you'll want to use just robot rather than robot.py)
http://docs.python.org/reference/simple_stmts.html#import
As well:
http://docs.python.org/tutorial/modules.html#more-on-modules
"A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module is imported somewhere."
So using a script that imports robot and then tilt_head, the executable stuff in robot.py will NOT be run multiple times.
Of course, if robot.py is intended to be the main module of the program, then I'd suggest going with A. Levy's answer.
You should be able to simply use:
from robot import myActuators
You can't say "from robot.py" because the name there is a module name, not a file name.
BTW: I'm not sure why you are doing import actuators as Actuators. Why do you want to change the case? Modules are most commonly lowercase.
As others have been saying, you can actually import myActuators from the robot module. To solve your problem of the code in robot being re-executed, the standard Python approach is to wrap the stand-alone code in an if __name__ == '__main__': so that it will only be executed when the module is used as a standalone app, but not when it is imported.
E.G.
# Things that are defined when robot.py is used as a module
# and when used standalone.
import actuators as Actuators
myActuators = Actuators.AllActuators()
if __name__ == '__main__':
    # Stuff that you only want executed when running robot.py
Then when you want to use robot.py in another module, you should be able to do this:
from robot import myActuators
EDIT:
Actually, as JAB pointed out, I was a little confused in my explanation of conditional execution. Your code won't be re-executed if you import the module again. The import machinery caches loaded modules (in sys.modules) to make sure that they only get loaded and executed one time. All subsequent imports are references to the previously loaded module. If, however, you do have some steps that you only want to happen when you are running the module as a script, then you can place them in the if block and they won't be executed when you import robot from another module.
I guess it doesn't really solve your problem, though it seems like your problem was really an incorrect import syntax.
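The caching behaviour can be checked with a small experiment. This is only a sketch: it writes a throwaway module named robot_demo (a hypothetical name, standing in for robot.py) to a temporary directory and imports it twice.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temporary directory (hypothetical name).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "robot_demo.py"), "w") as handle:
    handle.write("my_actuators = object()  # created when the module body runs\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# Two imports return the same module, so the object is created only once.
first = importlib.import_module("robot_demo").my_actuators
second = importlib.import_module("robot_demo").my_actuators
same_object = first is second

# An explicit reload re-runs the module body and builds a new object.
reloaded = importlib.reload(sys.modules["robot_demo"]).my_actuators
rebuilt = reloaded is not first
```

Both same_object and rebuilt end up True: a repeated import reuses the cached module, and only an explicit importlib.reload() executes the module body again.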
I would guess that you can access that object as robot.myActuators because it already exists in the process, inside the robot.py namespace.
Many thanks for all the answers and pointers. As it turns out, my problem was a recursive import that ended up trying to create an object more than once. By sorting out the instance creation a little more logically, I was finally able to make the problem go away.
I wrote a Python module. Running python filename.py only checks for syntax errors. Is there a tool that also checks for runtime errors, such as concatenating an int with a string?
Thank you
Bala
Update:
The scripts are mainly about setting up a Hadoop cluster in the cloud. I am not sure how I can write a unit test, because everything runs in the cloud. You can think of the code as legacy code; I just added more logging and some extra conditions in a few places.
Traditionally, if not writing full-fledged unit-tests and/or doc-tests (writing lots of tests is of course best practice!), one at least puts in every module a def main(): function to exercise it and ends the module with
if __name__ == '__main__':
    main()
so main() won't get in the way if the module's just imported, but it will execute if you run the module as your main script. Of course, you need to actually exercise the code in the module from within main(), for this to catch all kinds of semantic problems such as the type error you mention -- doing a really thorough job this way is often as hard as writing real unit tests and doc tests would be, but you can at least get started!
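As a sketch of how such a main() surfaces the kind of error mentioned in the question, consider a hypothetical function buggy_label() that concatenates a string with an int. The file compiles without complaint, and the problem only appears once main() actually calls the function:

```python
def buggy_label(index):
    # Compiles fine, but raises TypeError when `index` is an int.
    return "node-" + index


def fixed_label(index):
    return "node-" + str(index)


def main():
    """Exercise the module's functions and report what happened."""
    try:
        buggy_label(1)
    except TypeError as error:
        outcome = "caught: " + type(error).__name__
    else:
        outcome = "no error"
    return outcome, fixed_label(1)


if __name__ == "__main__":
    print(main())
```

In a real smoke test you would normally let the exception propagate so the traceback points at the faulty line; it is caught here only to keep the example's result easy to inspect.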
You could write a unit test for your module. That way it will execute your code and any runtime errors (or even better, test failures) will be reported.
If you choose to go down this route, http://docs.python.org/library/unittest.html would probably be a good place to start. Alternatively, as Alex wrote, you can just put code at the bottom of your module that will execute when the module is run directly. This is more expedient and probably a better first approach, although if you have a lot of modules you may want a more structured approach.
You can give a try to pyanalyze. It is able to detect possible run-time errors without running the program.
pip3 install pyanalyze
python3 -m pyanalyze file.py
I occasionally notice something like the following in Python scripts:
if __name__ == "__main__":
    # do stuff like call main()
What's the point of this?
Having all substantial Python code live inside a function (i.e., not at module top level) is a crucial performance optimization as well as an important factor in good organization of code (the Python compiler can optimize access to local variables in a function much better than it can optimize "local" variables which are actually a module's globals, since the semantics of the latter are more demanding).
Making the call to the function conditional on the current module being run as the "main script" (rather than imported from another module) makes for potential reusability of nuggets of functionality contained in the module (since other modules may import it and just call the appropriate functions or classes), and even more importantly it supports solid unit testing (where all sort of mock-ups and fakes for external subsystems may generally need to be set up before the module's functionality is exercised and tested).
This allows a Python script to be imported or run standalone in a sane way.
If you run a Python file directly, the __name__ variable will contain "__main__". If you import the script, that will not be the case. Normally, if you import the script, you want to call functions or reference classes from the file.
If you did not have this check, any code that was not in a class or function would run when you import.
The sole purpose of this, assuming it is in main.py, is so other files can import main to include classes and functions that are in your "main" program, but without running the source code.
Without this condition, code that is in the global scope will be executed when it is imported by other scripts.
It's a great place to put module tests. This will only run when a module is run directly from the shell but it will not run if imported.
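One possible way to put module tests in the guard is with doctest. This is a sketch with a hypothetical function clamp(); the examples in its docstring are run only when the file is executed directly:

```python
def clamp(value, low, high):
    """Limit value to the range [low, high].

    >>> clamp(5, 0, 3)
    3
    >>> clamp(-1, 0, 3)
    0
    >>> clamp(2, 0, 3)
    2
    """
    return max(low, min(value, high))


if __name__ == "__main__":
    # Runs the docstring examples; silent when they all pass.
    import doctest
    doctest.testmod()
```

Importing the module from another script gives access to clamp() without running the tests.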
PEP 8 recommends that imports be placed at the top of the file.
Now, I feel that importing some of them at the beginning of the main program (i.e., after if __name__ == '__main__') makes sense. For instance, if the main program reads arguments from the command line, I tend to do import sys at the beginning of the main program: this way, sys does not have to be imported when the code is used as a module, since there is no need, in this case, for command line argument access.
How bad is this infringement of PEP 8? Should I refrain from doing this? Or would it be reasonable to amend PEP 8?
I can't really tell you how bad this is to do.
However, I've greatly improved performance (response time, load) for a web app by importing certain libraries only at the first usage.
BTW, the following is also from PEP 8:
"But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!"
In general I don't think there's much harm in late importing for modules that may not be needed.
However sys I would definitely import early, at the top. It's such a common module that it's quite likely you might use sys elsewhere in your script and not notice that it's not always imported. sys is also one of the modules that always gets loaded by Python itself, so you are not saving any module startup time by avoiding the import (not that there's much startup for sys anyway).
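A brief sketch of both points, assuming a hypothetical function export_report() that is the only place needing json: sys is imported at the top as usual, while the rarely needed module is imported at first use.

```python
import sys  # already loaded by the interpreter, so this import is essentially free


def export_report(data):
    """Serialize `data`; hypothetical function that is only called occasionally."""
    # Deferred import: the cost is paid on the first call, not at module import.
    import json
    return json.dumps(data, sort_keys=True)


# sys is in the module cache before any user code asks for it.
sys_already_loaded = "sys" in sys.modules
```

Whether deferring an import is worth the loss of clarity depends on how expensive the module is to load; for sys there is nothing to gain.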
I would recommend you to do what you feel is most appropriate when there is nothing in PEP about your concern.
Importing sys doesn't really take that long that I would worry about it. Some modules do take longer however.
I don't think sys really clogs up the namespace very much. I wouldn't use a variable or class called sys regardless.
If you think it's doing more harm than good to have it at the top, by all means do it however you like. PEP 8 is just a guideline, and lots of code you see does not conform to it.
The issue isn't performance.
The issue is clarity.
Your "main" program is only a main program today. Tomorrow, it may be a library included in some higher-level main program. Later, it will be just one module in a bigger package.
Since your "main" program's life may change, you have two responses.
1. Isolate the "main" things inside if __name__ == "__main__". This is not a grotesque violation of PEP-8. This is a reasonable way to package things.
2. Try to limit the number of features in your "main" program scripts. Try to keep them down to imports and the if __name__ == "__main__" stuff. If your main script is small, then your import question goes away.