When creating new programming languages, do you lose performance? [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
This is proving to be a very difficult question for me to figure out how to properly ask.
For example, the Python interpreter is written in C. Say you wrote another interpreter in Python, that got compiled through CPython. And you then intend to run a program through your interpreter. Does that code have to go through your interpreter and then CPython? Or would the new interpreter be self-contained and not require CPython to interpret its output? Because if it is not a standalone interpreter, then there would be a performance loss, in which case it would seem like all compilers would be written in low-level languages until the end of time.
EDIT:
PyPy was the very thing that got me wondering about this topic. I was under the impression that it was an interpreter, written in Python, and I did not understand how it could be faster than CPython. I also do not understand how bytecode gets executed by the machine without being translated to machine code, though I suppose that is another topic.

You seem to be confused about the distinction between compilers and interpreters, since you refer to both in your question without a clear distinction. (Quite understandable... see all the comments flying around this thread :-))
Compilers and interpreters are somewhat, though not totally, orthogonal concepts:
Compilers
Compilers take source code and produce a form that can be executed more efficiently, whether that be native machine code, or an intermediate form like CPython's bytecode.
C is perhaps the canonical example of a language that is almost always compiled to native machine code. The language was indeed designed to be relatively easy and efficient to translate into machine code. RISC CPU architectures became popular after the C language was already well-adopted for operating system programming, and they were often designed to make it even more efficient to translate certain features of C to machine code.
So the "compilability" of C has become self-reinforcing. It is difficult to introduce a new architecture on which it is hard to write a good C compiler (e.g. Itanium) that fully takes advantage of the hardware's potential. If your CPU can't run C code efficiently, it can't run most operating systems efficiently (the low-level bits of Linux, Unix, and Windows are mainly written in C).
Interpreters
Interpreters are traditionally defined as programs that try to run source code directly from its source representation. Most implementations of BASIC worked like this, back in the good ol' days: BASIC would literally re-parse each line of code on each iteration through a loop.
Modern languages
Modern programming languages and platforms blur the lines a lot. Languages like Python, Java, or C# are typically not compiled to native machine code, but to various intermediate forms like bytecode.
CPython's bytecode can be interpreted, but the overhead of interpretation is much lower because the code is fully parsed beforehand (and saved in a .pyc file so it doesn't need to be re-parsed until it is modified).
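As a rough illustration of "parse once, run many times", here is a minimal sketch using the built-in compile() and exec(). The snippet being compiled and the helper names run_precompiled and run_from_source are made up for this example; it does not show how CPython manages .pyc files internally.

```python
source = "total = sum(i * i for i in range(n))"

# Parse and compile the source text a single time.
code_object = compile(source, "<example>", "exec")

def run_precompiled(n):
    """Execute the already-compiled code object; no parsing happens here."""
    namespace = {"n": n}
    exec(code_object, namespace)
    return namespace["total"]

def run_from_source(n):
    """Hand exec() the raw string, so it is parsed and compiled on every call."""
    namespace = {"n": n}
    exec(source, namespace)
    return namespace["total"]

print(run_precompiled(4), run_from_source(4))
```

Both functions compute the same value; the difference is only in how often the parsing work is repeated.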
Just-in-time compilation can be used to translate bytecode to native machine code just before it is actually run, with many different strategies for exactly when the native code compilation should take place.
Some languages that have "traditionally" been run via a bytecode interpreter or JIT compiler are also amenable to ahead-of-time compilation. For example, the Dalvik VM used in previous versions of Android relies on just-in-time compilation, while Android 4.4 has introduced ART which uses ahead-of-time compilation instead.
Intermediate representations of Python
Here's a great thread containing a really useful and thoughtful answer by @AlexMartelli on the lower-level compiled forms generated by various implementations of Python.
Answering the original question (I think...)
A traditional interpreter will almost certainly execute code slower than if that same code were compiled to "bare metal" machine code, all else being equal (which it typically is not), because the interpreter imposes an additional cost of parsing every line or unit of code every time it is executed.
So if a traditional interpreter were running under an interpreter, which was itself running under an interpreter, etc., ... that would result in a performance loss, just as running a VM (virtual machine) under a VM under a VM will be slower than running on "bare metal."
This is not so for a compiler. I could write a compiler which runs under an interpreter which runs under an interpreter which has been compiled by a compiler, etc... the resulting compiler could generate native machine code that is just as good as the native code generated by any other compiler. (It is important to realize that the performance of the compiler itself can be entirely independent of the performance of the executed code; an aggressive optimizing C compiler typically takes much more time to compile code than a non-optimizing compiler, but the intention is for the resultant native code to run significantly faster.)
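To make the point about compilers concrete, here is a toy sketch of a compiler written in Python. It translates an arithmetic expression into instructions for an imaginary stack machine (the PUSH/ADD/SUB/MUL instruction names are invented for this example). However slowly the host interpreter runs compile_expr, the instructions it emits are exactly the same.

```python
import ast

def compile_expr(source):
    """Translate an arithmetic expression into stack-machine instructions."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

    def emit(node):
        if isinstance(node, ast.Constant):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            # Operands first, then the operator (postfix order).
            return emit(node.left) + emit(node.right) + [(ops[type(node.op)], None)]
        raise ValueError("unsupported expression")

    return emit(ast.parse(source, mode="eval").body)

program = compile_expr("2 + 3 * 4")
print(program)
```

The quality of the output depends only on the translation logic, not on the speed of whatever is executing the compiler.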

Related

Is it possible to create compilers for dynamic languages without losing their dynamic characteristics? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Is there some set of reasons that makes it impossible for dynamic languages such as Python or Ruby to be compiled instead of interpreted without losing any of their dynamic characteristics?
Of course, one of the requirements for that hypothetical compiler is that those languages don't lose any of their characteristics, like metaprogramming, extending objects, adding code, or modifying the type system at runtime.
Summarizing: is it possible to create a Ruby or Python compiler without losing any of their characteristics as dynamic programming languages?
Yes, it is definitely possible to create compilers for dynamic languages. There are a myriad of examples of compilers for dynamic languages in the wild:
CPython is an implementation of the Python programming language which has a Python compiler.
PyPy is an implementation of the Python programming language which has a Python compiler.
Jython is an implementation of the Python programming language which has a Python compiler.
IronPython is an implementation of the Python programming language which has a Python compiler.
Pynie is an implementation of the Python programming language which has a Python compiler.
YARV is an implementation of the Ruby programming language which has a Ruby compiler.
Rubinius is an implementation of the Ruby programming language which has a Ruby compiler.
MacRuby is an implementation of the Ruby programming language which has a Ruby compiler.
JRuby is an implementation of the Ruby programming language which has a Ruby compiler.
IronRuby is an implementation of the Ruby programming language which has a Ruby compiler.
MagLev is an implementation of the Ruby programming language which has a Ruby compiler.
Quercus is an implementation of the PHP programming language which has a PHP compiler.
P8 is an implementation of the PHP programming language which has a PHP compiler.
V8 is an implementation of the ECMAScript programming language which has an ECMAScript compiler.
In general, every language can be implemented by a compiler, and every language can be implemented by an interpreter. It is also possible to automatically derive a compiler from an interpreter and vice-versa.
Most modern language implementations use both interpretation and compilation, sometimes even several compilers. Take Rubinius, for example: first Ruby code is compiled to Rubinius bytecode. Rubinius bytecode is then interpreted by the Rubinius VM. Code which has been interpreted several times is then compiled to Rubinius Compiler IR, which is then compiled to LLVM IR, which is then compiled to "native code" (whatever that is). So, Rubinius has one interpreter and three compilers.
V8 is a different example. It actually has no interpreter, but two different compilers: one very fast, very memory-efficient compiler which produces unoptimized, somewhat slow code. Code which has been run multiple times is then thrown away, and compiled again with the second compiler, which produces aggressively optimized code but takes more time and uses more memory during compilation.
However, in the end, you cannot run code without an interpreter. A compiler cannot run code. A compiler translates a program from one language into a different language. That's it. You can translate all you want, in the end, something has to run the code, and that thing is an interpreter. It might be implemented in software or in silicon, but it still is an interpreter.
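The "something has to run the code" idea can be sketched as a tiny interpreter loop. The instruction format here, a list of (opcode, argument) pairs with PUSH, ADD and MUL, is invented for this example and is not any real VM's bytecode.

```python
def run(program):
    """Interpret a list of (opcode, argument) pairs on a small stack machine."""
    stack = []
    for opcode, argument in program:
        if opcode == "PUSH":
            stack.append(argument)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()

# Instructions for 2 + 3 * 4 in postfix order.
result = run([("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL", None), ("ADD", None)])
print(result)
```

A CPU does conceptually the same fetch-decode-execute loop, only in silicon and on machine instructions.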
I'll just assume by "compile" you mean "compile to native machine code" and leave it for others to challenge this very narrow definition. The answer is a resounding yes. In fact, people are doing this right now:
Nuitka
Cython (actually not Python, but very close and could be made to support full Python).
Various "freezing" tools, though technically those only package the bytecode and a bytecode interpreter into one binary.
However, such a compiler can't perform many (I'd say effectively zero) optimizations, so the resulting code is basically equivalent to what a simple-minded interpreter would do and you only save interpretation overhead (and you lose some nice properties of interpreters, including compact code and faster turnaround). In other words: Dynamic, correct, fast - choose two (full disclosure: the accepted answer is mine).
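A small sketch of why a static compiler has so little room to optimize: in ordinary Python a method can be rebound while the program runs, so a call site cannot be safely inlined ahead of time. The class and function names are made up for illustration.

```python
class Greeter:
    def greet(self):
        return "hello"

def call_greet(obj):
    # A static compiler cannot safely inline Greeter.greet here,
    # because the method may be replaced while the program runs.
    return obj.greet()

g = Greeter()
before = call_greet(g)

# Rebind the method on the class at runtime.
Greeter.greet = lambda self: "goodbye"
after = call_greet(g)

print(before, after)
```

The same call site produces different behavior before and after the rebinding, which is exactly what a compiler preserving full dynamism has to allow for.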

How can an implementation of a language in the same language be faster than the language?

If I make a JVM in Java, for example, is it possible to make the implementation I made actually faster than the original implementation I used to build this implementation, even though my implementation is built on top of the original implementation and may even be dependent on that implementation?
( Confusing... )
Look at PyPy. It's a JIT Compiler for Python made in Python. That's alright, but how can it claim to be faster than the original implementation of Python which it is using and is dependent on?
You are confused between a language and the execution apparatus for that language.
One of the reasons why PyPy can be faster than CPython is because PyPy is compiled to a completely separate native executable, and does not depend on, nor execute in, CPython.
Nevertheless, it would be possible for an inefficient implementation of one language to be surpassed by an interpreter written in that same language, and hosted in the inefficient interpreter, if the higher-level interpreter made use of more efficient execution strategies.
Absolutely, it is possible. Your JVM implementation could compile Java bytecodes to optimized machine code. If your optimizer was more sophisticated than that in the JVM implementation which you run your Java compiler on, then the end result could be faster.
In that case, you could run your Java compiler on its own source code, and benefit from faster compilation speeds from then on.
You said that PyPy is a JIT compiler for Python (I'm not familiar with it myself). If that's the case, then it converts a Python program to machine code, and then runs the machine code. Another poster said that the PyPy compiler runs as a standalone executable, separate from CPython. But even if it was to run on CPython, once your program is JIT'd to machine code, and the compiled machine code is running, the performance of the compiler no longer matters. The speed of the compiler only has an effect on startup time.
PyPy isn't a Python interpreter implemented in Python, it's a Python interpreter and compiler implemented in RPython, which is a restricted, statically typed subset of Python:
RPython is a restricted subset of Python that is amenable to static analysis. Although there are additions to the language and some things might surprisingly work, this is a rough list of restrictions that should be considered. Note that there are tons of special cased restrictions that you’ll encounter as you go.
The real speed difference comes from the fact that, unlike CPython, which interprets the whole program as bytecode, PyPy uses just-in-time (JIT) compilation into machine code for the hot parts of the program.
I don't think it's possible to implement an interpreter for a language in that language (call this A), then run it on top of another existing interpreter (call this B) for that language and execute a program (call this P), and have P running on (A running on B) be faster than P running on B.
Every operation of A is going to have to be implemented with at least one operation of B. So even if B is atrociously bad and A is optimally good, the fact that A is being run on B means that B's badness will slow down A.
It could be possible to implement an interpreter + JIT compiler for a language in the language itself, where the JIT compiler produces some other faster code at runtime, and have P running on (A running on B) be faster than P running on B. The part of P's runtime that isn't JIT compiled will be slower (much slower, normally) but if the JIT compiler successfully identifies the "hot" parts of P and executes them more quickly than B would then the whole system might run faster overall.
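The interpreter-plus-JIT arrangement described above can be sketched as a toy. Everything here is hypothetical (the tuple-based expression trees, HOT_THRESHOLD, TieredFunction), and the "JIT" only generates a Python lambda rather than machine code, so treat it as an analogy for the control flow, not as how PyPy works.

```python
HOT_THRESHOLD = 3  # hypothetical cut-off for deciding that code is "hot"

def interpret(tree, x):
    """Slow path: walk the expression tree on every call."""
    kind = tree[0]
    if kind == "x":
        return x
    if kind == "const":
        return tree[1]
    left, right = interpret(tree[1], x), interpret(tree[2], x)
    return left + right if kind == "add" else left * right

def to_source(tree):
    """Translate the tree into Python source text."""
    kind = tree[0]
    if kind == "x":
        return "x"
    if kind == "const":
        return repr(tree[1])
    op = "+" if kind == "add" else "*"
    return f"({to_source(tree[1])} {op} {to_source(tree[2])})"

class TieredFunction:
    def __init__(self, tree):
        self.tree = tree
        self.calls = 0
        self.compiled = None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # fast path, the tree is no longer walked
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # "JIT": translate once, reuse on every later call.
            self.compiled = eval("lambda x: " + to_source(self.tree))
        return interpret(self.tree, x)

# Represents x * 2 + 1.
f = TieredFunction(("add", ("mul", ("x",), ("const", 2)), ("const", 1)))
results = [f(n) for n in range(5)]
print(results)
```

The first few calls pay the interpretation cost; once the threshold is reached, later calls bypass the tree walker entirely.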
But that's not really interesting. It's also possible to implement a compiler for a language in that language (C), compile it with an existing compiler (D), and have the new compiler produce code that is faster than what the original compiler would have produced. I hope that doesn't startle you; it should be clear that the speed of the code emitted by D will only have an effect on the execution time of C, not on the execution time of other programs compiled with C.
Writing compilers in the languages they compile has been done for decades (GCC is written in C, for example), and isn't really relevant to the real question I think you're asking; neither is JIT-compiling a language using itself. In both cases the underlying execution is something other than the language you're considering; usually machine code.
However, the source of your question is a misconception. PyPy's Python interpreter isn't actually implemented in Python. The PyPy project has an interpreter for Python written in RPython. RPython is a subset of Python, chosen so that it can be efficiently compiled down to machine code; as a language RPython is much more like Java with type inference and indented blocks instead of braces. The PyPy project also has a compiler for RPython which is written in Python, and is capable of (mostly) automatically adding a JIT compiler to any interpreter it compiles.
When you're actually using the PyPy interpreter in production, you're using a machine-code interpreter compiled from the RPython sources, just as when you're using the CPython interpreter you use a machine-code interpreter compiled from C source code. If you execute the PyPy interpreter on top of another Python interpreter (which you can do because valid RPython code is also valid Python code; but not the other way around), then it runs hugely slower than the CPython interpreter.
The PyPy translation process runs on CPython, but the output is a list of .c files (19 files last time I checked) which are then compiled to a binary: pypy-c. At runtime pypy-c does not have any relation to CPython; that's why it can be faster.

Question about python construction

A friend of mine who is a programmer told me that "Python is written in Python" or something like that. He meant that the Python interpreter is written in Python (I think). I've read on some websites that Python can interpret ANY programming language in real time (even C++ and ASM). Is this true?
Could someone explain to me how this could be?
The only explanation that I came up with after thinking a bit is: Python is at the same "level" as ASM, so it makes sense for Python to interpret any language (that is at a higher level). Am I right? Does this make sense?
I would be grateful if someone could explain a little about it.
Thank you
It's not true. The standard implementation of Python - CPython - is written in C, although much of the standard library is written in Python. There are other implementations in Java (Jython) and .NET (IronPython).
There is a project called PyPy which, among other things, is rewriting the C parts of Python into Python. But the main development of Python is still based on C.
Your friend told you that Python is self-hosting:
The term self-hosting was coined to refer to the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, shells and revision control software.
Of course, the very first revision of Python had to be bootstrapped by some other mechanism -- perhaps C or C++ as these are fairly standard targets for lexers and parser generators.
Generally, when someone says language X is written in X, they mean that first a compiler or interpreter for X was written in assembly or other such language, compiled, and then a better compiler or interpreter was written in X.
Additionally, once a very basic compiler/interpreter for X exists, it is sometimes easier to add new language features, classes, etc. to X by writing them in X than to extend the compiler/interpreter itself.
Python is written in C (CPython) as well as Python.
Read about pypy -- that's Python written in Python.
Writing Python in Python is a two-step dance.
Write Python in some other language. C, Java, assembler, COBOL, whatever.
Once you have a working implementation of Python (i.e., passes all the tests) you can then write Python in Python.
When you read about pypy, you'll see that they do something a hair more sophisticated than this. "We are using a subset of the high-level language Python, called RPython, in which we write languages as simple interpreters with few references to and dependencies on lower level details."
So they started with a working Python and then broke the run-time into this RPython kernel which is the smallest nugget of Python goodness. Then they built the rest of Python around the RPython kernel.

Why do C programs require decompilers but python programs dont?

If I write a Python script, anyone can simply point an editor to it and read it. But for programs written in C, one would have to use decompilers and hex tables and such. Why is that? I mean I simply can't open up the Safari web browser and look at its code.
Note: The author disavows a deep expertise in this subject. Some assertions may be incorrect.
Python actually is compiled into bytecode, which is what gets run by the python interpreter. Whenever you use a Python module, Python will generate a .pyc file with a name corresponding to the module. This is the equivalent of the .o file that's generated when you compile a C file.
So if you want something to disassemble, the .pyc file would be it :)
The process that Python goes through when compiling a module is pretty similar to what gcc or another C compiler does with C source code. The major difference is that it happens transparently as part of execution of the file. Writing the .pyc file is also optional: when running a non-module, i.e. an end-user script, Python still compiles the code to bytecode in memory but does not save the result to disk.
So really your question is "Why are python programs distributed as source rather than as compiled modules?" Or, put another way, "Why are C applications distributed as compiled binaries rather than as source code?"
It used to be very common for C applications to be distributed as source code. This was back before operating systems and their various subentities (e.g. Linux distributions) became more established. Some distros, for example Gentoo, still distribute apps as source code. Apps which are a bit more cutting edge or obscure are still distributed as source code for all platforms they target.
The reason for this is compatibility, and dependencies. The reason you can run the precompiled binary Safari on a Mac, or Firefox on Ubuntu Linux, is because it's been specifically built for that operating system, architecture (e.g. x86_64), and set of libraries.
Unfortunately, compilation of a large app is pretty slow, and needs to be redone at least partially every time the app is updated. Thus the motivation for binary distributions.
So why not create a binary distribution of Python? For one thing, as Aaron mentions, modules would need to be recompiled for each new version of the Python bytecode. But this would be similar to rebuilding a C app to link with a newer version of a dynamic library — Python modules are analogous in this sense to C libraries.
The real reason is that Python compilation is very much quicker than C compilation. This is in part, I think, because of the dynamic nature of the language, and also because it's not as thorough of a compilation. This has its tradeoffs: in particular, Python apps run much more slowly than do their C counterparts, because Python has to interpret the compiled bytecode into instructions for the processor, whereas the C app already contains such instructions.
That all being said, there is a program called py2exe that will take a Python module and distribution and build a precompiled Windows executable, including in it the logic of the module and its dependencies, including Python itself. I guess the point of this is to avoid having to coerce people into installing Python on their Windows system just to run your app. Under Linux, or I think even OS X, Python is usually already installed, so precompilation is not really necessary. Linux systems also have super-dandy package managers that will transparently install dependencies such as Python if they are not already installed.
Python is a scripting language that runs in a virtual machine through an interpreter.
C is a compiled language: the code is compiled to binary code which the computer can run without all that extra stuff Python needs.
This is sorta a big topic. You should look into your local friendly Computer Science curriculum, you'll find a lot of great stuff on this subject there.
The short answer is that Python is an "interpreted" language, which means that it requires a machine language program (the Python interpreter) to run the Python program, adding a layer of indirection. C or C++ are different. They are compiled directly to machine code, which runs directly on your processor.
There is a lot of additional voodoo to be learned here, however. Technically Python is compiled to a bytecode, and modern interpreters do more and more "Just in Time" compilation, so the boundaries between compiled and interpreted code are getting fuzzier all the time.
In several comments you asked: "Is it then possible to compile python to an executable binary file and then simply distribute that?"
From a theoretical viewpoint, there's no question the answer is yes -- a Python program could be compiled to, and distributed as, fully compiled machine code.
From a practical viewpoint, it's open to a lot more question. There are a few things like Unladen Swallow, Psyco, Shed Skin, and PyPy that you might want to know about though.
Unladen Swallow is primarily an attempt at making Python run faster, but part of the plan to do so involves using LLVM for its back-end. LLVM can (among other things) produce native machine code output. The last couple of releases of Unladen Swallow have used LLVM for native code generation, but 1) the most recent update on the web site is from late 2009, and 2) the release notes for that version say: "The Unladen Swallow team does not recommend wide adoption of the 2009Q3 release."
Psyco works as a plug-in for Python that basically does JIT compilation, so even though it can speed up execution (quite a lot in some cases), it doesn't produce a machine-code executable you can distribute. In short, while it's sort of similar to what you want, it's not intended to do exactly what you've asked for.
Shed Skin Python-to-C++ produces C++ as its output, and you then compile the C++ and (potentially) distribute the result of that. Shedskin is currently at version 0.5 -- i.e., nobody's claiming that it's a finished, released product. On the other hand, development is ongoing, and each release does seem to include pretty substantial improvements.
PyPy is a Python implementation written in Python. Their intent is to allow code production to be "plugged in" without affecting the rest of the implementation -- but while they currently support 4 different code generation models, I don't believe any of them results in producing native machine code that runs directly on the hardware.
Bottom line: work has been done and is being done with the intent of doing what you asked about, but at least to my knowledge there's not really anything I could reasonably recommend as a finished product that you can really depend on to do the job right now. The primary emphasis is really on execution speed, not producing standalone executables.
Yes, you can - it's called disassembling, and allows you to look at the code of Safari perfectly well. The thing is, C, among other languages, compiles to native code, i.e. code that your CPU can "understand" and execute.
More or less obviously, the level of abstraction present in the instruction set of your CPU is much smaller than that of a high level language like Python. The CPU instructions are not concerned with "downloading that URI", but more "check if that bit is set in a hardware register".
So, in conclusion, the level of complexity present in a native application is much higher when looking at the machine code, so many people simply can't make any sense of what is going on there, it's hard to get the big picture. With experience and time at your hands, it is possible though - people do it all the time, reversing applications and all.
You can't open up and read the code that actually runs for Python either. Try
import dis

def foo():
    for i in range(100):
        print(i)

# dis.dis() prints the disassembly itself and returns None.
dis.dis(foo)
That will show you the (human-readable) bytecode of the foo function. Equivalently, you can save the file and import it from the interactive Python interpreter. This will create a .pyc file named after the script (in Python 3 it lands in a __pycache__ directory). Open that with a hex editor and you are looking at the actual Python bytecode.
The reason for the difference is that Python changes its bytecode between releases, so you would need to distribute a different binary-only release for each version of Python. This would be a pain.
With C, the code is compiled to native machine code, which is much more stable, making binary-only releases possible.
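The version dependence of .pyc files can be seen in their first four bytes, a "magic number" that identifies the bytecode format. Here is a minimal sketch using the standard py_compile and importlib.util modules; the helper pyc_matches_interpreter and the file names are made up for this example, and the "stale" file is simulated by overwriting the header.

```python
import importlib.util
import os
import py_compile
import tempfile

def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc starts with this interpreter's magic number."""
    with open(pyc_path, "rb") as handle:
        return handle.read(4) == importlib.util.MAGIC_NUMBER

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "mod.py")
    with open(source_path, "w") as handle:
        handle.write("x = 1\n")

    # Compile the module to bytecode without running it.
    pyc_path = py_compile.compile(source_path, cfile=os.path.join(tmp, "mod.pyc"))
    fresh_ok = pyc_matches_interpreter(pyc_path)

    # Simulate a file produced by some other Python release.
    with open(pyc_path, "r+b") as handle:
        handle.write(b"\x00\x00\r\n")
    stale_ok = pyc_matches_interpreter(pyc_path)

print(fresh_ok, stale_ok)
```

A freshly compiled file matches the running interpreter; one with a different magic number would be ignored or rejected on import.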
Because C code is compiled to object (machine) code and Python code is compiled into an intermediate bytecode. I am not sure if you are even referring to the bytecode of Python; you must be referring to the source file itself, which is directly executable (hiding the bytecode from you!). C needs to be compiled and linked.
Python scripts are parsed and converted to binary only when they're run - i.e., they're text files and you can read them with an editor.
C code is compiled and linked into an executable binary file before it can be run. Normally, only this executable binary file is distributed, hence you need a decompiler. You can always view the source code if you have access to it.
Not all C programs require decompilers. There's lots of C code distributed in source form. And some Python programs do require decompilers, if distributed as bytecode (.pyc files).
But, to the extent that your assumptions are valid, it's because C is a compiled language while Python is an interpreted language.
Python scripts are analogous to a man looking at a to-do list written in English (or language he understands). The man has to do all the work, every time that list of things has to be done.
If the man, instead of doing the steps on his own each time, creates and programs a robot which can carry out those steps again and again (and probably faster than him), that robot is analogous to the C program.
The man in the python case is called the "interpreter" and in the C case is called the "compiler", and the C robot is called the compiled program/executable.
When you look at the python program source, you see the to-do list. In case of the robot, you see the gears, motors and batteries, etc, which look very different from the to-do list. If you could get hold of the C "to-do" list, it looks somewhat like the python code, just in a different language.
G-WAN executes ANSI C scripts on the fly, making it just like Python scripts.
This can be server-side scripts (using G-WAN as a Web server) or any general-purpose C program and you can link any existing library.
Oh, and G-WAN C scripts are much faster than Python, PHP or Java...

PyPy -- How can it possibly beat CPython?

From the Google Open Source Blog:
PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results often beat CPython, ranging from being slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks.
How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?
(On a related note... why would anyone try something like this?)
"PyPy is a reimplementation of Python in Python" is a rather misleading way to describe PyPy, IMHO, although it's technically true.
There are two major parts of PyPy.
The translation framework
The interpreter
The translation framework is a compiler. It compiles RPython code down to C (or other targets), automatically adding in aspects such as garbage collection and a JIT compiler. It cannot handle arbitrary Python code, only RPython.
RPython is a subset of normal Python; all RPython code is Python code, but not the other way around. There is no formal definition of RPython, because RPython is basically just "the subset of Python that can be translated by PyPy's translation framework". But in order to be translated, RPython code has to be statically typed (the types are inferred, you don't declare them, but it's still strictly one type per variable), and you can't do things like declaring/modifying functions/classes at runtime either.
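To illustrate the "one type per variable" restriction, here are two ordinary Python functions (both names are made up). This is only a sketch of the idea as described above; it has not been run through PyPy's translation framework, so which exact constructs the framework accepts is an assumption.

```python
def one_type_per_variable(n):
    # 'total' is always an int, so its type can be inferred statically.
    total = 0
    for i in range(n):
        total += i
    return total

def changes_type(flag):
    # 'value' is an int on one path and a str on the other. This is fine in
    # ordinary Python, but it is the kind of code that a "strictly one type
    # per variable" rule would exclude.
    value = 0
    if flag:
        value = "zero"
    return value

print(one_type_per_variable(4), changes_type(True), changes_type(False))
```

Both functions run on any Python interpreter; only the first fits the statically inferable style that the restriction describes.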
The interpreter then is a normal Python interpreter written in RPython.
Because RPython code is normal Python code, you can run it on any Python interpreter. But none of PyPy's speed claims come from running it that way; this is just for a rapid test cycle, because translating the interpreter takes a long time.
With that understood, it should be immediately obvious that speculations about PyPyPy or PyPyPyPy don't actually make any sense. You have an interpreter written in RPython. You translate it to C code that executes Python quickly. There the process stops; there's no more RPython to speed up by processing it again.
So "How is it possible for PyPy to be faster than CPython" also becomes fairly obvious. PyPy has a better implementation, including a JIT compiler (it's generally not quite as fast without the JIT compiler, I believe, which means PyPy is only faster for programs susceptible to JIT-compilation). CPython was never designed to be a highly optimising implementation of the Python language (though they do try to make it a highly optimised implementation, if you follow the difference).
The really innovative bit of the PyPy project is that they don't write sophisticated GC schemes or JIT compilers by hand. They write the interpreter relatively straightforwardly in RPython; although RPython is lower level than Python, it is still an object-oriented, garbage-collected language, much higher level than C. The translation framework then automatically adds things like GC and JIT. So the translation framework is a huge effort, but it applies equally well to the PyPy Python interpreter however they change its implementation, which allows much more freedom to experiment with performance improvements (without worrying about introducing GC bugs or updating the JIT compiler to cope with the changes). It also means that when they get around to implementing a Python 3 interpreter, it will automatically get the same benefits, as will any other interpreter written with the PyPy framework (of which there are a number at varying stages of polish). And all interpreters using the PyPy framework automatically support all platforms supported by the framework.
So the true benefit of the PyPy project is to separate out (as much as possible) all the parts of implementing an efficient platform-independent interpreter for a dynamic language, and then to come up with one good implementation of them in one place that can be re-used across many interpreters. That's not an immediate win like "my Python program runs faster now", but it's a great prospect for the future.
And it can run your Python program faster (maybe).
Q1. How is this possible?
Reference counting (which is how CPython manages memory) can be slower than a tracing garbage collector in some cases.
Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (e.g. fine-grained locks).
As Marcelo mentioned, the JIT. Being able to confirm the type of an object on the fly can save you the multiple pointer dereferences otherwise needed to finally arrive at the method you want to call.
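The type-check idea in the last point can be sketched in plain Python. This is only an illustration of the principle (check the type once, then reuse the method you already found), not how PyPy's JIT is implemented; the CallSite and Square classes are hypothetical. In pure Python it will not actually run faster; a real JIT does the equivalent in generated machine code.

```python
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class CallSite:
    """Hypothetical call site that remembers the last receiver type."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_function = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Guard passed: same type as last time, so skip the lookup.
            self.hits += 1
        else:
            # Guard failed: do the full lookup and remember the result.
            self.misses += 1
            self.cached_type = receiver_type
            self.cached_function = getattr(receiver_type, self.method_name)
        return self.cached_function(receiver)


site = CallSite("area")
areas = [site.call(shape) for shape in (Square(2), Square(3), Square(4))]
print(areas, site.hits, site.misses)
```

Only the first call pays for the method lookup; the later calls pass the type guard and reuse the cached function.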
Q2. Which Python implementation was used to implement PyPy?
The PyPy interpreter is implemented in RPython, which is a statically typed subset of Python (the language, not the CPython interpreter). Refer to https://pypy.readthedocs.org/en/latest/architecture.html for details.
Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?
That would depend on the implementation of these hypothetical interpreters. If one of them, for example, took the source, did some kind of analysis on it and, after running for a while, converted it directly into tight target-specific assembly code, I imagine it would be considerably faster than CPython.
Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It's a contrived case but does exhibit some ideas.
Q4. Why would anyone try something like this?
From the official site. https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement
We aim to provide:
a common translation and support framework for producing
implementations of dynamic languages, emphasizing a clean
separation between language specification and implementation
aspects. We call this the RPython toolchain.
a compliant, flexible and fast implementation of the Python
Language which uses the above toolchain to enable new advanced
high-level features without having to encode the low-level
details.
By separating concerns in this way, our implementation of Python - and
other dynamic languages - is able to automatically generate a
Just-in-Time compiler for any dynamic language. It also allows a
mix-and-match approach to implementation decisions, including many
that have historically been outside of a user's control, such as
target platform, memory and threading models, garbage collection
strategies, and optimizations applied, including whether or not to
have a JIT in the first place.
The C compiler gcc is implemented in C; the Haskell compiler GHC is written in Haskell. Do you have any reason why the Python interpreter/compiler should not be written in Python?
PyPy is implemented in Python, but it implements a JIT compiler to generate native code on the fly.
The reason to implement PyPy on top of Python is probably that it is simply a very productive language, especially since the JIT compiler makes the host language's performance somewhat irrelevant.
PyPy is written in Restricted Python (RPython). It does not run on top of the CPython interpreter, as far as I know. Restricted Python is a subset of the Python language. AFAIK, the PyPy interpreter is compiled to machine code, so once installed it does not use a Python interpreter at runtime.
Your question seems to expect the PyPy interpreter is running on top of CPython while executing code.
Edit: Yes, to use PyPy you first translate the PyPy Python code to C (built with gcc), to JVM bytecode, or to .NET CLI code. See Getting Started.
