Why a function called `main` doesn't have any special significance? - python

In Python why a function called main doesn't have any special significance like it has in C and Java?
What if a programmer switches from C or Java to Python. Should he keep
using main in Python also like in C or Java as it's his style now to do
the programming or in a broad sense it is somehow harmful for doing
programming in Python?
Edit: I have gone through this article where it was mentioned by a very good example why first time programmers should refrain main in python main in python harmful.

You should also be asking, why in C and Java does main have a special significance. It's just a choice on the part of the language designer. main could well have been called start or begin but somebody chose main and it stuck.
In Python there is no reason why you can't call a function main and have it be the start point of your program. However, Python has its own syntax to identify whether a certain file is the equivalent of main:
__name__ == "__main__"
This is typically wrapped as part of an if and could simply have a single line within calling your main function that actually starts your program.
Part of the design of Python and many (all?) scripting languages is that code can simply be written inline. You don't have to wrap everything in a function. As such, many simple scripts do not require any functions at all. A cron job for example that rotates log files could just be written as a block of code in a python file with no functions being defined.
In that scenario, the main method just isn't required.
Not requiring a main in many ways makes the language more flexible, especially for simpler tasks.
ADDENDUM:
To add some context to your edit. That article presents a very poor argument. In reality function name collisions are not uncommon as there are many modules that do the same or similar things (not so much in core but as soon as you start using pip you'll encounter the odd collision). Therefore it is beneficial to use descriptive function names and avoid ever doing a from foo import *.
In the same way that C++ programmers generally consider it bad form to pollute your namespace with using namespace std, Python programmers typically consider it bad form to pollute your namespace with import *, especially as it can cause a snowball effect if used everywhere.
Finally, you're unlikely to call 2 functions in your program main. You're much more likely to have name collisions elsewhere. The real danger is the wildcard import, not the main function.

Such a programmer should do this:
if __name__ == "__main__":
# run stuff
The variable __name__ is set to "__main__" if the module is not imported.

In Python, there is something that acts like that main() function:
if __name__ == '__main__':
# Your main function
The code in this if block is only run if the Python file has not been imported as a module.
If you're determined to use that ugly main() function, the Python equivalent would be something like this:
def main():
# ...
if __name__ == '__main__':
main()
I don't find it very pretty.

In python a .py file can be used as either a module or can be run directly , so main is used to identify which part is to be run if the file is run directly or if the file is imported as module.
file.py:
def func(x,y):
print x+y
if __name__=='__main__':
func(5,2):
if executed directly it prints 7
when imported , the main part doesn't runs:
>>> from so32 import *
>>> func(5,5)
10
>>> func(10,20)
30

In Python why a function called main doesn't have any special significance like it has in C and Java?
Because python wasn't design with C, java or such languages in mind, there are some languages that actually don't use main, http://c2.com/cgi/wiki?HelloWorldInManyProgrammingLanguages has quite a large number of hello world programs as you can see not all use main, though you probably can't really compare them all ...
Either way as others have stated you can check if a module has being directly ran from the interpreter by checking if __name__ == '__main__': oh and here are some tips from Guido Van Russon the creator of python on how to write main functions.
What if a programmer switches from C or Java to Python. Should he keep using main in Python also like in C or Java as it's his style now to do the programming or in a broad sense it is somehow harmful for doing programming in Python?
when you write python code, or for that matter when you write any program always try to have code seem natural in that language, this means following standards and practices either set by the language developers or the community at large, please don't write c or java like code in python its possible but its frown upon, the code tends to be error prone, and very difficult to understand, and sometimes quite slow, the same applies to any other language.
this has quite few good references Python coding standards/best practices

There is an excellent explanation of why using main() to call individual procedures(or methods) is a good idea, by Kent D Lee in YouTube - basically it's because without main() all your variables are global and not only do they take longer to access, they also run the risk that one procedure's variables will interfere with those of another procedure. Also using main() to call your procedures enables you to reuse those procedures in another program by importing them. Additionally you can test individual procedures without executing the whole program. Of course if all you are going to do is output "hello world" it is no advantage at all to have a main method, and the only reason you would do it is to find out how it works.
The reason why everyone says that Python is good for learners is because you can start doing things straightaway very easily - the trouble is that if you are going to write sizeable programs you are soon faced with the fact that all those complications that other languages have, are actually quite important.

Related

How to include a cpp file for all the function and classes but ignore the main function?

I used to use python and now I'm shifting to c++ for efficiency. I used to have the habit to test the library file with a main function, just to make sure all the functions and classes are working fine. And
if __name__ == "__main__":
main()
works really well even if I need to import the file for other codes.
However, I am not sure how can I do the same for c++. I know I can comment out main function in the library file before I want to include that file. But I would really like to know is there any equivalent method in C++.
First you shouldn't do it the python way but write a small test program to test your library. To design it the python way is not common in C++ and you will not make many new friends doing it.
But if you really want to do that, you can define main as a weak symbol.
int __attribute__((weak)) main() {
[your code]
}
The linker will override any weak symbol if it finds a non-weak symbol. That trick is for example used by lex/flex to give you a generic main if you don't write your own.
As StoryTeller mentioned it is not in the standard and not available on all platforms. Especially not on Windows.

Tracking changes in python source files?

I'm learning python and came into a situation where I need to change the behvaviour of a function. I'm initially a java programmer so in the Java world a change in a function would let Eclipse shows that a lot of source files in Java has errors. That way I can know which files need to get modified. But how would one do such a thing in python considering there are no types?! I'm using TextMate2 for python coding.
Currently I'm doing the brute-force way. Opening every python script file and check where I'm using that function and then modify. But I'm sure this is not the way to deal with large projects!!!
Edit: as an example I define a class called Graph in a python script file. Graph has two objects variables. I created many objects (each with different name!!!) of this class in many script files and then decided that I want to change the name of the object variables! Now I'm going through each file and reading my code again in order to change the names again :(. PLEASE help!
Example: File A has objects x,y,z of class C. File B has objects xx,yy,zz of class C. Class C has two instance variables names that should be changed Foo to Poo and Foo1 to Poo1. Also consider many files like A and B. What would you do to solve this? Are you serisouly going to open each file and search for x,y,z,xx,yy,zz and then change the names individually?!!!
Sounds like you can only code inside an IDE!
Two steps to free yourself from your IDE and become a better programmer.
Write unit tests for your code.
Learn how to use grep
Unit tests will exercise your code and provide reassurance that it is always doing what you wanted it to do. They make refactoring MUCH easier.
grep, what a wonderful tool grep -R 'my_function_name' src will find every reference to your function in files under the directory src.
Also, see this rather wonderful blog post: Unix as an IDE.
Whoa, slow down. The coding process you described is not scalable.
How exactly did you change the behavior of the function? Give specifics, please.
UPDATE: This all sounds like you're trying to implement a class and its methods by cobbling together a motley patchwork of functions and local variables - like I wrongly did when I first learned OO coding in Python. The code smell is that when the type/class of some class internal changes, it should generally not affect the class methods. If you're refactoring all your code every 10 mins, you're doing something seriously wrong. Step back and think about clean decomposition into objects, methods and data members.
(Please give more specifics if you want a more useful answer.)
If you were only changing input types, there might be no need to change the calling code.
(Unless the new fn does something very different to the old one, in which case what was the argument against calling it a different name?)
If you changed the return type, and you can't find a common ancestor type or container (tuple, sequence etc.) to put the return values in, then yes you need to change its caller code. However...
...however if the function should really be a method of a class, declare that class and the method already. The previous paragraph was a code smell that your function really should have been a method, specifically a polymorphic method.
Read about code smells, anti-patterns and When do you know you're dealing with an anti-pattern?. There e.g. you will find a recommendation for the video "Recovery from Addiction - A taste of the Python programming language's concision and elegance from someone who once suffered an addiction to the Java programming language." - Sean Kelly
Also, sounds like you want to use Test-Driven Design and add some unittests.
If you give us the specifics we can critique it better.
You won't get this functionality in a text editor. I use sublime text 3, and I love it, but it doesn't have this functionality. It does however jump to files and functions via its 'Goto Anything' (Ctrl+P) functionality, and its Multiple Selections / Multi Edit is great for small refactoring tasks.
However, when it comes to IDEs, JetBrains pycharm has some of the amazing re-factoring tools that you might be looking for.
The also free Python Tools for Visual Studio (see free install options here which can use the free VS shell) has some excellent Refactoring capabilities and a superb REPL to boot.
I use all three. I spend most of my time in sublime text, I like pycharm for refactoring, and I find PT4VS excellent for very involved prototyping.
Despite python being a dynamically typed language, IDEs can still introspect to a reasonable degree. But, of course, it won't approach the level of Java or C# IDEs. Incidentally, if you are coming over from Java, you may have come across JetBrains IntelliJ, which PyCharm will feel almost identical to.
One's programming style is certainly different between a statically typed language like C# and a dynamic language like python. I find myself doing things in smaller, testable modules. The iteration speed is faster. And in a dynamic language one relies less on IDE tools and more on unit tests that cover the key functionality. If you don't have these you will break things when you refactor.
One answer only specific to your edit:
if your old code was working and does not need to be modified, you could just keep old names as alias of the new ones, resulting in your old code not to be broken. Example:
class MyClass(object):
def __init__(self):
self.t = time.time()
# creating new names
def new_foo(self, arg):
return 'new_foo', arg
def new_bar(self, arg):
return 'new_bar', arg
# now creating functions aliases
foo = new_foo
bar = new_bar
if your code need rework, rewrite your common code, execute everything, and correct any failure. You could also look for any import/instantiation of your class.
One of the tradeoffs between statically and dynamically typed languages is that the latter require less scaffolding in the form of type declarations, but also provide less help with refactoring tools and compile-time error detection. Some Python IDEs do offer a certain level of type inference and help with refactoring, but even the best of them will not be able to match the tools developed for statically typed languages.
Dynamic language programmers typically ensure correctness while refactoring in one or more of the following ways:
Use grep to look for function invocation sites, and fix them. (You would have to do that in languages like Java as well if you wanted to handle reflection.)
Start the application and see what goes wrong.
Write unit tests, if you don't already have them, use a coverage tool to make sure that they cover your whole program, and run the test suite after each change to check that everything still works.

Should I use a main() method in a simple Python script?

I have a lot of simple scripts that calculate some stuff or so. They consist of just a single module.
Should I write main methods for them and call them with the if __name__ construct, or just dump it all right in there?
What are the advantages of either method?
I always write a main() function (appropriately named), and put nothing but command-line parsing and a call to main() in the if __name__ == '__main__' block. That's because no matter how silly, trivial, or single-purpose I originally expect that script to be, I always end up wanting to call it from another module at some later date.
Either I take the time to make it an importable module today, or spend extra time to refactor it months later when I want to reuse it for something else.
Always.
Every time.
I've stopped fighting it and started writing my code with that expectation from the start.
Well, if you do this:
# your code
Then import your_module will execute your code. On the contrary, with this:
if __name__ == '__main__':
# your code
The import won't run the code, but targeting the interpreter at that file will.
If the only way the script is ever going to run is by manual interpreter opening, there's absolutely no difference.
This becomes important when you have a library (or reusing the definitions in the script).
Adding code to a library outside a definition, or outside the protection of if __name__ runs the code when importing, letting you initialize stuff that the library needs.
Maybe you want your library to also have some runnable functionality. Maybe testing, or maybe something like Python's SimpleHTTPServer (it comes with some classes, but you can also run the module and it will start a server). You can have that dual behaviour with the if __name__ clause.
Tools like epydoc import the module to access the docstrings, so running the code when you just want to generate HTML documentation is not really the intent.
The other answers are good, but I wanted to add an example of why you might want to be able to import it: unit testing. If you have a few functions and then the if __name__=="__main__":, you can import the module for unit testing. Maybe you're better at this than me, but I find that my "simple scripts that couldn't possibly have any bugs" tend to have bugs that I find with unit testing.
The if __name__ construct will let you easily re-use the functions and classes in the module in other Python scripts. If you don't do that, then everything in the module will run when it is imported.
If there's nothing in the script you want to reuse, then sure, just dump it all in there. I do that sometimes. If you later decide you want to reuse some code, I have found Python to be just about the easiest language in which to refactor code without breaking it.
For a good explanation of the purpose of Python's "main guard" idiom:
What does if __name__ == "__main__": do?

How can I sandbox Python in pure Python?

I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r')
or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']): raise IOError('file write forbidden')
addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use the Python's file access, so they are banned from disk write, too. External program calls have been explicitly disabled, too.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt of removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible for the code ran by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within Cpython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure e.g. for public web access:
perhaps hypothetical compiled modules that use direct OS calls cannot be audited by Cpython - whitelisting the safe pure pythonic modules is recommended.
Definitely there is still the possibility of crashing or overloading the Cpython interpreter.
Maybe there remain even some loopholes to write the files on the harddrive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of Python ecosystem reduces to rather a narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
raise IOError('file write forbidden')
addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
try:
raise Exception()
except:
del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run a code in a completely isolated environment:
exec somePythonCode in {'__builtins__': {}}, {}
But in such environment you can do almost nothing :) (you can not even import a module; but still a malicious user can run an infinite recursion or cause running out of memory.) Probably you would want to add some modules that will be the interface to you game engine.
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature was around since like year 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example you if you allowed the end user to create items, you would have a look up that called the server with the code to execute and the set of parameters.
Here's and abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
Hmm. This is a thought experiment, I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names e.t.c. (also has|get|setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the python code for this, and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this is parallel to how forum software and such try to whitelist 'safe' Javascript or HTML e.t.c. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.

Python Module Initialization Order?

I am a Python newbie coming from a C++ background. While I know it's not Pythonic to try to find a matching concept using my old C++ knowledge, I think this question is still a general question to ask:
Under C++, there is a well known problem called global/static variable initialization order fiasco, due to C++'s inability to decide which global/static variable would be initialized first across compilation units, thus a global/static variable depending on another one in different compilation units might be initialized earlier than its dependency counterparts, and when dependant started to use the services provided by the dependency object, we would have undefined behavior. Here I don't want to go too deep on how C++ solves this problem. :)
On the Python world, I do see uses of global variables, even across different .py files, and one typycal usage case I saw was: initialize one global object in one .py file, and on other .py files, the code just fearlessly start using the global object, assuming that it must have been initialized somewhere else, which under C++ is definitely unaccept by myself, due to the problem I specified above.
I am not sure if the above use case is common practice in Python (Pythonic), and how does Python solve this kind of global variable initialization order problem in general?
Under C++, there is a well known problem called global/static variable initialization order fiasco, due to C++'s inability to decide which global/static variable would be initialized first across compilation units,
I think that statement highlights a key difference between Python and C++: in Python, there is no such thing as different compilation units. What I mean by that is, in C++ (as you know), two different source files might be compiled completely independently from each other, and thus if you compare a line in file A and a line in file B, there is nothing to tell you which will get placed first in the program. It's kind of like the situation with multiple threads: you cannot say whether a particular statement in thread 1 will be executed before or after a particular statement in thread 2. You could say C++ programs are compiled in parallel.
In contrast, in Python, execution begins at the top of one file and proceeds in a well-defined order through each statement in the file, branching out to other files at the points where they are imported. In fact, you could almost think of the import directive as an #include, and in that way you could identify the order of execution of all the lines of code in all the source files in the program. (Well, it's a little more complicated than that, since a module only really gets executed the first time it's imported, and for other reasons.) If C++ programs are compiled in parallel, Python programs are interpreted serially.
Your question also touches on the deeper meaning of modules in Python. A Python module - which is everything that is in a single .py file - is an actual object. Everything declared at "global" scope in a single source file is actually an attribute of that module object. There is no true global scope in Python. (Python programmers often say "global" and in fact there is a global keyword in the language, but it always really refers to the top level of the current module.) I could see that being a bit of a strange concept to get used to coming from a C++ background. It took some getting used to for me, coming from Java, and in this respect Java is a lot more similar to Python than C++ is. (There is also no global scope in Java)
I will mention that in Python it is perfectly normal to use a variable without having any idea whether it has been initialized/defined or not. Well, maybe not normal, but at least acceptable under appropriate circumstances. In Python, trying to use an undefined variable raises a NameError; you don't get arbitrary behavior as you might in C or C++, so you can easily handle the situation. You may see this pattern:
try:
duck.quack()
except NameError:
pass
which does nothing if duck does not exist. Actually, what you'll more commonly see is
try:
duck.quack()
except AttributeError:
pass
which does nothing if duck does not have a method named quack. (AttributeError is the kind of error you get when you try to access an attribute of an object, but the object does not have any attribute by that name.) This is what passes for a type check in Python: we figure that if all we need the duck to do is quack, we can just ask it to quack, and if it does, we don't care whether it's really a duck or not. (It's called duck typing ;-)
Python import executes new Python modules from beginning to end. Subsequent imports only result in a copy of the existing reference in sys.modules, even if still in the middle of importing the module due to a circular import. Module attributes ("global variables" are actually at the module scope) that have been initialized before the circular import will exist.
main.py:
import a
a.py:
var1 = 'foo'
import b
var2 = 'bar'
b.py:
import a
print a.var1 # works
print a.var2 # fails

Categories