Question about python construction

Question about python construction - python

A friend of mine that is a programmer told me that "Python is written in Python" or something like that. He meant that Python interpreter is written in Python (I think). I've read in some websites that Python interpret in real time ANY programming language (even C++ and ASM). Is this true?
Could someone explain me HOW COULD IT BE?
The unique explanation that I came up with after thinking a bit is: python is at the same "level" of ASM, it makes sense to python interpret any language (that is in a higher level), am I right? Does this make sense?
I would be grateful is someone explain me a little about it.
Thank you

It's not true. The standard implementation of Python - CPython - is written in C, although much of the standard library is written in Python. There are other implementations in Java (Jython) and .NET (IronPython).
There is a project called PyPy which, among other things, is rewriting the C parts of Python into Python. But the main development of Python is still based on C.

Your friend told you that Python is self-hosting:
The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, shells and revision control software.
Of course, the very first revision of Python had to be bootstrapped by some other mechanism -- perhaps C or C++ as these are fairly standard targets for lexers and parser generators.

Generally, when someone says language X is written in X, they mean that first a compiler or interpreter for X was written in assembly or other such language, compiled, and then a better compiler or interpreter was written in X.
Additionally, once a very basic compiler/interpreter for X exists, it is sometimes easier to add new language features, classes, etc. to X by writing them in X than to extend the compiler/interpreter itself.

Python is written in C (CPython) as well as Python.
Read about pypy -- that's Python written in Python.
Writing Python in Python is a two-step dance.
Write Python in some other language. C, Java, assembler, COBOL, whatever.
Once you have a working implementation of Python (i.e., passes all the tests) you can then write Python in Python.
When you read about pypy, you'll see that they do something a hair more sophisticated than this. "We are using a subset of the high-level language Python, called RPython, in which we write languages as simple interpreters with few references to and dependencies on lower level details."
So they started with a working Python and then broke the run-time into this RPython kernel which is the smallest nugget of Python goodness. Then they built the rest of Python around the RPython kernel.

Related

Base language of Python

What is the base language Python is written in?

You can't say that Python is written in some programming language, since Python as a language is just a set of rules (like syntax rules, or descriptions of standard functionality). So we might say, that it is written in English :). However, mentioned rules can be implemented in some programming language. Hence, if you send a string like 'import this' to that program called interpreter, it'd return you "Zen of Python".
Since most modern OS are written in C, compilers/interpreters for modern high-level languages are also written in C. Python is not an exception - its most popular/"traditional" implementation is called CPython and is written in C.
There are other implementations:
IronPython (Python running on .NET)
Jython (Python running on the Java Virtual Machine)
PyPy (A fast python implementation with a JIT compiler)
Stackless Python (Branch of CPython supporting microthreads)

The sources are public. Python is written in C (actually the default implementation is called CPython).

Python is written in English. But there are several implementations:
PyPy (written in Python)
CPython (written in C)
IronPython (written in C#)
Jython (written in Java)

it is written in C, its also called CPython.

You get a good idea if you compile python from source. Usually it's gcc that compiles the *.c files

To add to and reframe some of the other good answers:
The specification for Python (question) is written in English, but could be written in a formal semantics, as Standard
ML and Scheme are.
See
Programming language specification.
There are implementations of Python in many languages, as noted in Gandaro's answer.
Updated thanks to the inspiration of #TylerH:
As an aside, when the performance of a compute-intensive application is important, the fastest implementation is often not the standard CPython, which is written in C. For example, for many cases, pypy or Cython (both written in Python) may be faster

How to make Python code write once, run anywhere?

I am learning Python. My intentions are:
to write a webapp in Python/Django
create an android app (using Jython)
write some python scripts for unix box
I was under (incorrect) impression that because Python has been implemented in Java (Jython) and .NET (IronPython), I could simply write my Python code and run it through either interpreter/compiler.
I thought if I wrote a hello world in CPython and compiled it with Jython, I'd get Java bytecode. If I compliled it with IronPython, I'd get .NET bytecode.
But now it seems like regular Python code won't work with Jython compiler/interpreter. You've to import some fancy Java specific modules. So, that means, I would have to re-write my program for Java using Java modules/libraries.
Any tips on how to write my Python code so that it works everywhere? Web, Unix, Android.
NOTE: I don't want to have to learn Java.
Thanks

print 'Hello, World!'
This works just fine on any Python implementation worthy of the name. So will most other pure-Python code. Where it gets tricky is when using libraries, as Jython and IronPython are missing some standard library modules and don't support C extensions. Dealing with platform-specific code can also present some issues.
If you want your code to be portable, you need to remove as many dependencies as possible from the shared code. The standard library is generally OK (but not complete in either), and pure-Python external modules are generally OK if they only depend on other pure-Python modules.
If you do need to detect them, I believe the canonical checks are:
if os.name == 'java': # Jython
if sys.platform == 'cli': # IronPython
Neither Jython nor IronPython will produce programs that will run without Jython/IronPython being present. In principle it's possible, and it's even possible to compile a subset of Python to pure bytecode; the former requires linking in the Python engine, and the latter would require restricting what parts of Python you could use.
If someone were to provide this for IronPython I wouldn't turn it down, and I doubt the Jython team would either, but I'm not holding my breath. Either option is a lot of work.

Please be more specific about what you are trying to do. What is your regular Python code ? What does not work with it as you expected ?
According to the Jython FAQ, Jython is an implementation of the Python language. The same Python code should produce the same result on Jython or CPython.

Can PyPy/RPython be used to produce a small standalone executable?

(Or, "Can PyPy/RPython be used to compile/translate Python to C/C++ without requiring the Python runtime?")
I have tried to comprehend PyPy with its RPython and its Python, its running and its compiling and its translating, and have somewhat failed.
I have a hypothetical Python project (for Windows); I would like to keep its size down, in the order of a hundred kilobytes (O.N.O.) rather than the several megabytes that using py2exe entails (after UPX). Can I use PyPy1 in any way to produce a standalone executable which does not depend on Python26.dll? If I can, does it need to follow the RPython restrictions like for only working on builtin types, or is it full Python syntax?
I do realise that if this can be done I almost certainly couldn't use C modules from Python directly.
1 (Since the time of asking, the situation has become clearer, and this part of the toolchain is more clearly branded as RPython rather than PyPy; it wasn't so in 2010.)

Yes, PyPy can produce standalone executables from RPython code. That means, you need to follow all the awkward RPython rules when it comes to writing code. Your Python code is completely unlikely to function out of the box and porting existing Python code is usually not fun. It won't make executables as small as C, but for example rpystone target (from pypy/translator/goal) using boehm GC is 80k on 64bit after stripping.

Why do C programs require decompilers but python programs dont?

If I write a python script, anyone can simply point an editor to it and read it. But for programming written in C, one would have to use decompilers and hex tables and such. Why is that? I mean I simply can't open up the Safari web browser and look at its code.

Note: The author disavows a deep expertise in this subject. Some assertions may be incorrect.
Python actually is compiled into bytecode, which is what gets run by the python interpreter. Whenever you use a Python module, Python will generate a .pyc file with a name corresponding to the module. This is the equivalent of the .o file that's generated when you compile a C file.
So if you want something to disassemble, the .pyc file would be it :)
The process that Python goes through when compiling a module is pretty similar to what gcc or another C compiler does with C source code. The major difference is that it happens transparently as part of execution of the file. It's also optional: when running a non-module, i.e. an end-user script, Python will just interpret the code rather than compiling it first.
So really your question is "Why are python programs distributed as source rather than as compiled modules?" Or, put another way, "Why are C applications distributed as compiled binaries rather than as source code?"
It used to be very common for C applications to be distributed as source code. This was back before operating systems and their various subentities (i.e. linux distributions) became more established. Some distros, for example gentoo, still distribute apps as source code. Apps which are a bit more cutting edge or obscure are still distributed as source code for all platforms they target.
The reason for this is compatibility, and dependencies. The reason you can run the precompiled binary Safari on a Mac, or Firefox on Ubuntu Linux, is because it's been specifically built for that operating system, architecture (e.g. x86_64), and set of libraries.
Unfortunately, compilation of a large app is pretty slow, and needs to be redone at least partially every time the app is updated. Thus the motivation for binary distributions.
So why not create a binary distribution of Python? For one thing, as Aaron mentions, modules would need to be recompiled for each new version of the Python bytecode. But this would be similar to rebuilding a C app to link with a newer version of a dynamic library — Python modules are analogous in this sense to C libraries.
The real reason is that Python compilation is very much quicker than C compilation. This is in part, I think, because of the dynamic nature of the language, and also because it's not as thorough of a compilation. This has its tradeoffs: in particular, Python apps run much more slowly than do their C counterparts, because Python has to interpret the compiled bytecode into instructions for the processor, whereas the C app already contains such instructions.
That all being said, there is a program called py2exe that will take a Python module and distribution and build a precompiled windows executable, including in it the logic of the module and its dependencies, including Python itself. I guess the point of this is to avoid having to coerce people into installing Python on their Windows system just to run your app. Under linux, or I think even OS/X, Python is usually already installed, so precompilation is not really necessary. Linux systems also have super-dandy package managers that will transparently install dependencies such as Python if they are not already installed.

Python is a script language, runs in a virtual machine through an interpeter.
C is a compiled language, the code compiled to binary code which the computer can run without all that extra stuff Python needs.

This is sorta a big topic. You should look into your local friendly Computer Science curriculum, you'll find a lot of great stuff on this subject there.
The short answer is the Python is an "interpreted" language, which means that it requires a machine language program (the python interpreter) to run the python program, adding a layer of indirection. C or C++ are different. They are compiled directly to machine code, which runs directly on your processor.
There is a lot of additional voodoo to be learned here, however. Technically Python is compiled to a bytecode, and modern interpreters do more and more "Just in Time" compilation, so the boundaries between compiled and interpreted code are getting fuzzier all the time.

In several comments you asked: "Is it then possible to compile python to an executable binary file and then simply distribute that?"
From a theoretical viewpoint, there's no question the answer is yes -- a Python program could be compiled to, and distributed as, fully compiled machine code.
From a practical viewpoint, it's open to a lot more question. There are a few things like Unladen Swallow, Psyco, Shed Skin, and PyPy that you might want to know about though.
Unladen Swallow is primarily an attempt at making Python run faster, but part of the plan to do so involves using LLVM for its back-end. LLVM can (among other things) produce native machine code output. The last couple of releases of Unladen Swallow have used LLVM for native code generation, but 1) the most recent update on the web site is from late 2009, and 2) the release notes for that version say: "The Unladen Swallow team does not recommend wide adoption of the 2009Q3 release."
Psyco works as a plug-in for Python that basically does JIT compilation, so even though it can speed up execution (quite a lot in some cases), it doesn't produce a machine-code executable you can distribute. In short, while it's sort of similar to what you want, it's not intended to do exactly what you've asked for.
Shed Skin Python-to-C++ produces C++ as its output, and you then compile the C++ and (potentially) distribute the result of that. Shedskin is currently at version 0.5 -- i.e., nobody's claiming that it's a finished, released product. On the other hand, development is ongoing, and each release does seem to include pretty substantial improvements.
PyPy is a Python implementation written in Python. Their intent is to allow code production to be "plugged in" without affecting the rest of the implementation -- but while they currently support 4 different code generation models, I don't believe any of them results in producing native machine code that runs directly on the hardware.
Bottom line: work has been done and is being done with the intent of doing what you asked about, but at least to my knowledge there's not really anything I could reasonably recommend as a finished product that you can really depend on to do the job right now. The primary emphasis is really on execution speed, not producing standalone executables.

Yes, you can - it's called disassembling, and allows you to look at the code of Safari perfectly well. The thing is, C, among other languages, compiles to native code, i.e. code that your CPU can "understand" and execute.
More or less obviously, the level of abstraction present in the instruction set of your CPU is much smaller than that of a high level language like Python. The CPU instructions are not concerned with "downloading that URI", but more "check if that bit is set in a hardware register".
So, in conclusion, the level of complexity present in a native application is much higher when looking at the machine code, so many people simply can't make any sense of what is going on there, it's hard to get the big picture. With experience and time at your hands, it is possible though - people do it all the time, reversing applications and all.

you can't open up and read the code that actually runs for python either. Try
import dis
def foo():
for i in range(100):
print i
print dis.dis(foo)
That will show you the (human readable) bytcode of the foo program. equivalently, you can save the file and import it from the interactive python interpreter. This will create a .pyc file with the same basename as the script. open that with a hex editor and you are looking at the actually python bytecode.
The reason for the difference is that python changes up it's byte code between releases so that you would either need to distribute a different version of a binary only release for each version of python. This would be a pain.
With C, it's compiled to native code and so the byte code is much more stable making binary only releases possible.

because C code is complied to object (machine) code and python code is compiled into an intermediate byte code. I am not sure if you are even referring to the byte code of python - you must be referring to the source file itself which is directly executable (hiding the byte code from you!). C needs to be compiled and linked.

Python scripts are parsed and converted to binary only when they're run - i.e., they're text files and you can read them with an editor.
C code is compiled and linked to an executable binary file before they can be run. Normally, only this executable binary file is distributed - hence you need a decompiler. You can always view the source code, if you've access to it.

Not all C programs require decompilers. There's lots of C code distributed in source form. And some Python programs do require decompilers, if distributed as bytecode (.pyc files).
But, to the extent that your assumptions are valid, it's because C is a compiled language while Python is an interpreted language.

Python scripts are analogous to a man looking at a to-do list written in English (or language he understands). The man has to do all the work, every time that list of things has to be done.
If the man, instead of doing the steps on his own each time, creates and programs a robot which can carry out those steps again and again (and probably faster than him), that robot is analogous to the C program.
The man in the python case is called the "interpreter" and in the C case is called the "compiler", and the C robot is called the compiled program/executable.
When you look at the python program source, you see the to-do list. In case of the robot, you see the gears, motors and batteries, etc, which look very different from the to-do list. If you could get hold of the C "to-do" list, it looks somewhat like the python code, just in a different language.

G-WAN executes ANSI C scripts on the fly -making it just like Python scripts.
This can be server-side scripts (using G-WAN as a Web server) or any general-purpose C program and you can link any existing library.
Oh, and G-WAN C scripts are much faster than Python, PHP or Java...

PyPy -- How can it possibly beat CPython?

From the Google Open Source Blog:
PyPy is a reimplementation of Python
in Python, using advanced techniques
to try to attain better performance
than CPython. Many years of hard work
have finally paid off. Our speed
results often beat CPython, ranging
from being slightly slower, to
speedups of up to 2x on real
application code, to speedups of up to
10x on small benchmarks.
How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?
(On a related note... why would anyone try something like this?)

"PyPy is a reimplementation of Python in Python" is a rather misleading way to describe PyPy, IMHO, although it's technically true.
There are two major parts of PyPy.
The translation framework
The interpreter
The translation framework is a compiler. It compiles RPython code down to C (or other targets), automatically adding in aspects such as garbage collection and a JIT compiler. It cannot handle arbitrary Python code, only RPython.
RPython is a subset of normal Python; all RPython code is Python code, but not the other way around. There is no formal definition of RPython, because RPython is basically just "the subset of Python that can be translated by PyPy's translation framework". But in order to be translated, RPython code has to be statically typed (the types are inferred, you don't declare them, but it's still strictly one type per variable), and you can't do things like declaring/modifying functions/classes at runtime either.
The interpreter then is a normal Python interpreter written in RPython.
Because RPython code is normal Python code, you can run it on any Python interpreter. But none of PyPy's speed claims come from running it that way; this is just for a rapid test cycle, because translating the interpreter takes a long time.
With that understood, it should be immediately obvious that speculations about PyPyPy or PyPyPyPy don't actually make any sense. You have an interpreter written in RPython. You translate it to C code that executes Python quickly. There the process stops; there's no more RPython to speed up by processing it again.
So "How is it possible for PyPy to be faster than CPython" also becomes fairly obvious. PyPy has a better implementation, including a JIT compiler (it's generally not quite as fast without the JIT compiler, I believe, which means PyPy is only faster for programs susceptible to JIT-compilation). CPython was never designed to be a highly optimising implementation of the Python language (though they do try to make it a highly optimised implementation, if you follow the difference).
The really innovative bit of the PyPy project is that they don't write sophisticated GC schemes or JIT compilers by hand. They write the interpreter relatively straightforwardly in RPython, and for all RPython is lower level than Python it's still an object-oriented garbage collected language, much more high level than C. Then the translation framework automatically adds things like GC and JIT. So the translation framework is a huge effort, but it applies equally well to the PyPy python interpreter however they change their implementation, allowing for much more freedom in experimentation to improve performance (without worrying about introducing GC bugs or updating the JIT compiler to cope with the changes). It also means when they get around to implementing a Python3 interpreter, it will automatically get the same benefits. And any other interpreters written with the PyPy framework (of which there are a number at varying stages of polish). And all interpreters using the PyPy framework automatically support all platforms supported by the framework.
So the true benefit of the PyPy project is to separate out (as much as possible) all the parts of implementing an efficient platform-independent interpreter for a dynamic language. And then come up with one good implementation of them in one place, that can be re-used across many interpreters. That's not an immediate win like "my Python program runs faster now", but it's a great prospect for the future.
And it can run your Python program faster (maybe).

Q1. How is this possible?
Manual memory management (which is what CPython does with its counting) can be slower than automatic management in some cases.
Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (eg. fine grained locks).
As Marcelo mentioned, the JIT. Being able to on the fly confirm the type of an object can save you the need to do multiple pointer dereferences to finally arrive at the method you want to call.
Q2. Which Python implementation was used to implement PyPy?
The PyPy interpreter is implemented in RPython which is a statically typed subset of Python (the language and not the CPython interpreter). - Refer https://pypy.readthedocs.org/en/latest/architecture.html for details.
Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?
That would depend on the implementation of these hypothetical interpreters. If one of them for example took the source, did some kind of analysis on it and converted it directly into tight target specific assembly code after running for a while, I imagine it would be quite faster than CPython.
Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It's a contrived case but does exhibit some ideas.
Q4. Why would anyone try something like this?
From the official site. https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement
We aim to provide:
a common translation and support framework for producing
implementations of dynamic languages, emphasizing a clean
separation between language specification and implementation
aspects. We call this the RPython toolchain_.
a compliant, flexible and fast implementation of the Python_
Language which uses the above toolchain to enable new advanced
high-level features without having to encode the low-level
details.
By separating concerns in this way, our implementation of Python - and
other dynamic languages - is able to automatically generate a
Just-in-Time compiler for any dynamic language. It also allows a
mix-and-match approach to implementation decisions, including many
that have historically been outside of a user's control, such as
target platform, memory and threading models, garbage collection
strategies, and optimizations applied, including whether or not to
have a JIT in the first place.
The C compiler gcc is implemented in C, The Haskell compiler GHC is written in Haskell. Do you have any reason for the Python interpreter/compiler to not be written in Python?

PyPy is implemented in Python, but it implements a JIT compiler to generate native code on the fly.
The reason to implement PyPy on top of Python is probably that it is simply a very productive language, especially since the JIT compiler makes the host language's performance somewhat irrelevant.

PyPy is written in Restricted Python. It does not run on top of the CPython interpreter, as far as I know. Restricted Python is a subset of the Python language. AFAIK, the PyPy interpreter is compiled to machine code, so when installed it does not utilize a python interpreter at runtime.
Your question seems to expect the PyPy interpreter is running on top of CPython while executing code.
Edit: Yes, to use PyPy you first translate the PyPy python code, either to C and build with gcc, to jvm byte code, or to .Net CLI code. See Getting Started

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.