Based on questions like "What makes C faster than Python?", I've learned that dynamic vs. static typing isn't the main reason C is faster than Python. It appears to be largely because Python programs are interpreted, while C programs are compiled.
I'm wondering whether strict typing would close the performance gap between interpreted and compiled programs enough to make it a viable strategy for improving the performance of existing interpreted Python programs after the fact.
If the answer is yes, is this done in pro-dev contexts?
With current versions of Python, type annotations are mostly hints for the programmer and possibly for some validation tools. The compiler stores them but otherwise ignores them, and the bytecode interpreter does not use them at runtime, which is similar to how TypeScript behaves.
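Here is a minimal standard-library sketch of that point. It shows that CPython keeps the annotations on the function object but does not enforce them, and that the compiled function body is the same with or without them. The function names `add` and `add_untyped` are only for illustration.

```python
def add(a: int, b: int) -> int:
    # Annotated: the hints say int, but nothing enforces that.
    return a + b


def add_untyped(a, b):
    # Same body, no annotations.
    return a + b


# The annotations are stored on the function object...
print(add.__annotations__)  # {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}

# ...but they are not checked at call time: str + str is accepted.
print(add("foo", "bar"))  # foobar

# ...and the body compiles to the same bytecode with or without hints.
print(add.__code__.co_code == add_untyped.__code__.co_code)  # True
```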
It might be possible to change the semantics of Python to take advantage of static typing in some circumstances. That could produce more efficient bytecode and possibly enable just-in-time (JIT) generation of executable machine code. Advanced JavaScript engines use complex heuristics to achieve this without type annotations. Both approaches could make Python programs much faster, and in some cases faster than equivalent C code.
Note also that many advanced Python packages use native code written in C and other languages, taking advantage of optimizing compilers, SIMD instructions and even multi-threading. In programs that use these libraries, the time is not spent in the Python code, and performance is comparable to that of compiled languages, while the programmer still gets a simpler language for expressing their problems.
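A small standard-library illustration of the same effect: the builtin `sum()` is implemented in C inside CPython, so the loop over the data runs in native code rather than through the bytecode interpreter. Absolute timings depend on your machine and Python version, so none are claimed here. The helper name `python_loop_sum` is hypothetical.

```python
import timeit

data = list(range(1_000_000))


def python_loop_sum(values):
    # Every iteration of this loop goes through the bytecode interpreter.
    total = 0
    for v in values:
        total += v
    return total


# Same result either way.
assert python_loop_sum(data) == sum(data)

loop_time = timeit.timeit(lambda: python_loop_sum(data), number=3)
builtin_time = timeit.timeit(lambda: sum(data), number=3)  # loop runs in C
print(f"Python loop: {loop_time:.3f}s, builtin sum(): {builtin_time:.3f}s")
```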
I'm trying to write some code that has to be as efficient as possible since I'll be doing thousands if not millions of operations in as short a time as possible.
Now I've got some code that does this written in C, but after some research on wrappers, delving into that seems above my skill level (especially since I have almost no knowledge of C). Because of that, I was considering rewriting the code/algorithm in Python.
I've read that most high-efficiency tasks are done in C because compiled languages are so efficient, and I was wondering whether it's possible to compile Python code to get close to the same efficiency using pure Python.
The code would include specific bit manipulation, table lookups, bitwise logic and binary search, or possibly a perfect hash.
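For reference, the operations listed above look like this in pure Python. This sketch uses a byte-sized lookup table for counting bits and the standard `bisect` module (implemented in C in CPython) for binary search. The names `POPCOUNT_TABLE`, `popcount32` and `contains_sorted` are hypothetical. On Python 3.10+ the builtin `int.bit_count()` does the popcount in C.

```python
from bisect import bisect_left

# Precomputed lookup table: number of set bits for every byte value 0..255.
POPCOUNT_TABLE = [bin(i).count("1") for i in range(256)]


def popcount32(x: int) -> int:
    """Count set bits of a 32-bit value with four byte-sized table lookups."""
    return (POPCOUNT_TABLE[x & 0xFF]
            + POPCOUNT_TABLE[(x >> 8) & 0xFF]
            + POPCOUNT_TABLE[(x >> 16) & 0xFF]
            + POPCOUNT_TABLE[(x >> 24) & 0xFF])


def contains_sorted(sorted_keys: list, key: int) -> bool:
    """Binary search for key in an ascending list."""
    i = bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


print(popcount32(0xF0F0))                 # 8
print(contains_sorted([1, 3, 5, 7], 5))   # True
```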
Recap: I'm trying to code something that has to be efficient and was wondering whether it's possible to make python run at or near C efficiency. If so: how?
Ultimately, Python is an interpreted language. The Python binary compiled for your target machine does not run your code directly on the CPU: it is the interpreter, which takes your code and interprets each command at run time. That additional overhead of interpretation is what affects performance.
Languages that are compiled to native machine code for the target CPU, skipping both interpretation (as in Python) and a virtual machine (the JVM, the .NET CLR), offer the best performance. Examples are C and C++11/14/17.
You can also get top performance from languages that go through a particular kind of "virtual machine" such as LLVM (originally short for Low Level Virtual Machine), whose intermediate representation is in SSA form. The Clang (C/C++) and Rust front-ends compile to LLVM IR, which is analyzed and optimized with LLVM-specific tools; finally, an LLVM back-end generates assembly for the target machine, which is then linked. Either way (direct or indirect machine code generation), native compilation offers the best performance.
I've found that when I ask more of Python, it doesn't use my machine's resources at 100% and it's not really fast. It's fast compared to many other interpreted languages, but compared to compiled languages I think the difference is really remarkable.
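To make "interprets each command at run time" concrete: CPython first compiles source into bytecode, and its interpreter loop then dispatches on each bytecode instruction. The standard `dis` module shows those instructions. Opcode names differ between CPython versions, so the output is not reproduced here; `scale` is just an example function.

```python
import dis


def scale(x, factor):
    return x * factor


# Print the bytecode instructions the CPython interpreter loop executes for scale().
dis.dis(scale)

# The same instructions as data; exact names vary by CPython version.
opnames = [ins.opname for ins in dis.get_instructions(scale)]
print(opnames)
```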
Is it possible to speedup things with a Just In Time (JIT) compiler in Python 3?
Usually a JIT compiler is the only thing that can improve performances in interpreted languages, so that's what I'm referring to; if other solutions are available, I would love to hear about them too.
First off, Python 3(.x) is a language, for which there can be any number of implementations. Okay, to this day no implementation except CPython actually implements those versions of the language. But that will change (PyPy is catching up).
To answer the question you meant to ask: CPython, 3.x or otherwise, does not, never did, and likely never will, contain a JIT compiler. Some other Python implementations (PyPy natively, Jython and IronPython by re-using JIT compilers for the virtual machines they build on) do have a JIT compiler. And there is no reason their JIT compilers would stop working when they add Python 3 support.
But while I'm here, also let me address a misconception:
Usually a JIT compiler is the only thing that can improve performances in interpreted languages
This is not correct. A JIT compiler, in its most basic form, merely removes interpreter overhead, which accounts for some of the slowdown you see, but not for the majority. A good JIT compiler also performs a host of optimizations which remove the overhead needed to implement numerous Python features in general (by detecting special cases which permit a more efficient implementation), prominent examples being dynamic typing, polymorphism, and various introspective features.
Just implementing a compiler does not help with that. You need very clever optimizations, most of which are only valid in very specific circumstances and for a limited time window. JIT compilers have it easy here, because they can generate specialized code at run time (it's their whole point), can analyze the program more easily (and more accurately) by observing it as it runs, and can undo optimizations when they become invalid. They can also interact with interpreters, unlike ahead-of-time compilers, and often do, because it's a sensible design decision. I guess this is why they are linked to interpreters in people's minds, although they can and do exist independently.
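The "detect special cases, then guard on them" idea can be modelled in a few lines of Python. This is a toy sketch only, not how PyPy or any real JIT is implemented. A real JIT would emit machine code where this picks a Python callable, and it would also handle deoptimization, which is not modelled here. `SpecializingAdd` and `generic_add` are hypothetical names.

```python
from typing import Any, Callable, Dict, Tuple

Key = Tuple[type, type]


def generic_add(a: Any, b: Any) -> Any:
    # Fully dynamic path: Python's normal dispatch for "+".
    return a + b


class SpecializingAdd:
    """Toy model: choose a specialized implementation per observed argument types."""

    def __init__(self) -> None:
        self.cache: Dict[Key, Callable[[Any, Any], Any]] = {}

    def _specialize(self, key: Key) -> Callable[[Any, Any], Any]:
        if key == (int, int):
            return int.__add__    # special case: both operands are exactly int
        if key == (float, float):
            return float.__add__  # special case: both operands are exactly float
        return generic_add        # anything else: fall back to the generic path

    def __call__(self, a: Any, b: Any) -> Any:
        key = (type(a), type(b))  # the "guard": which types are we seeing?
        impl = self.cache.get(key)
        if impl is None:          # first time for these types: "compile" a version
            impl = self._specialize(key)
            self.cache[key] = impl
        return impl(a, b)


jit_add = SpecializingAdd()
print(jit_add(2, 3))      # 5, via the (int, int) specialization
print(jit_add(1.5, 2.0))  # 3.5, via the (float, float) specialization
print(jit_add("a", "b"))  # ab, via the generic fallback
```

Written in Python, this is of course slower than plain `+`; the point is the structure, where a cheap type check selects code that may assume those types.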
There are also other approaches to make Python implementation faster, apart from optimizing the interpreter's code itself - for example, the HotPy (2) project. But those are currently in research or experimentation stage, and are yet to show their effectiveness (and maturity) w.r.t. real code.
And of course, a specific program's performance depends on the program itself much more than on the language implementation. The language implementation only sets an upper bound for how fast you can make a sequence of operations. Generally, you can improve the program's performance much more simply by avoiding unnecessary work, i.e. by optimizing the program. This is true regardless of whether you run the program through an interpreter, a JIT compiler, or an ahead-of-time compiler. If you want something to be fast, don't go out of your way to get at a faster language implementation. There are applications which are infeasible with the overhead of interpretation and dynamism, but they aren't as common as you'd think (and are often solved by selectively calling into code compiled to machine code).
The only Python implementation that has a JIT is PyPy. But PyPy is both a Python 2 implementation and a Python 3 implementation.
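A quick illustration of "avoiding unnecessary work" that holds for any implementation: the same Python code runs much faster when the data structure matches the access pattern. List membership is O(n) per lookup, while set membership is O(1) on average. Sizes are kept small so the sketch finishes quickly; actual timings depend on your machine.

```python
import timeit

haystack_list = list(range(20_000))
haystack_set = set(haystack_list)
needles = range(0, 20_000, 20)  # 1,000 lookups


def count_hits(container) -> int:
    # Identical Python code for both cases; only the container type differs.
    return sum(1 for n in needles if n in container)


# Both containers give the same answer...
assert count_hits(haystack_list) == count_hits(haystack_set) == 1000

# ...but the list scans linearly on every lookup, while the set hashes.
list_time = timeit.timeit(lambda: count_hits(haystack_list), number=1)
set_time = timeit.timeit(lambda: count_hits(haystack_set), number=1)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```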
The Numba project should work on Python 3. Although it is not exactly what you asked, you may want to give it a try:
https://github.com/numba/numba/blob/master/docs/source/doc/userguide.rst.
It does not support all Python syntax at this time.
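A minimal sketch of what using Numba looks like. `njit` is Numba's decorator for compiling a function in nopython mode; the fallback decorator in the `except` branch is my own addition so the example still runs (as ordinary, uncompiled Python) when Numba isn't installed. Numba handles simple numeric loops like this one well, but, as noted above, not all Python syntax.

```python
try:
    from numba import njit  # optional third-party dependency
except ImportError:
    def njit(func):
        # Fallback (not part of Numba): return the function unchanged.
        return func


@njit
def sum_of_squares(n):
    # A simple numeric loop: the kind of code Numba can compile to machine code.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10_000))  # 333283335000
```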
You can try the PyPy py3 branch, which is more or less Python-compatible, but the official CPython implementation has no JIT.
This will best be answered by some of the remarkable Python developer folks on this site.
Still I want to comment: When discussing speed of interpreted languages, I just love to point to a project hosted at this location: Computer Language Benchmarks Game
It's a site dedicated to running benchmarks. There are specified tasks to do. Anybody can submit a solution in his/her preferred language, and then the tests compare the runtime of each solution. Solutions can be peer reviewed, are often further improved by others, and results are checked against the spec. In the long run, this is the fairest benchmarking system for comparing different languages.
As you can see from indicative summaries like this one, compiled languages are quite fast compared to interpreted languages. However, the difference is probably not so much in the exact type of compilation as in the fact that Python (and the other languages in the graph that are slower than Python) is fully dynamic. Objects can be modified on the fly. Types can be modified on the fly. So some type checking has to be deferred to runtime, instead of compile time.
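A short sketch of what "modified on the fly" means in practice, and why an ahead-of-time compiler cannot assume a fixed method or attribute type at a call site. `Point` and `Other` are illustrative classes.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm2(self):
        return self.x * self.x + self.y * self.y


class Other:
    pass


p = Point(3, 4)
print(p.norm2())  # 25

# Methods can be replaced on the class at runtime...
Point.norm2 = lambda self: abs(self.x) + abs(self.y)
print(p.norm2())  # 7: the same call now runs different code

# ...attributes can change type on a live object...
p.x = "3"

# ...and even an instance's class can be swapped.
p.__class__ = Other
print(hasattr(p, "norm2"))  # False
```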
So while you can argue about compiler benefits, you have to take into account that there are different features in different languages. And those features may come at an intrinsic price.
Finally, when talking about speed: most often it's not the language and its perceived slowness that's causing the issue, it's a bad algorithm. I never had to switch languages because one was too slow: when there's a speed issue in my code, I fix the algorithm. However, if there are time-consuming, computationally intensive loops in your code, it is usually worthwhile to recompile those. Prominent examples are libraries coded in C and used by scripting languages (Perl XS libs, or e.g. numpy/scipy for Python; lapack/blas are examples of libs available with bindings for many scripting languages).
If you mean JIT as in a just-in-time compiler to a bytecode representation, then it has such a feature (since 2.2). If you mean JIT to machine code, then no. Still, compiling to bytecode provides a lot of performance improvement. If you want compilation to machine code, then PyPy is the implementation you're looking for.
Note: PyPy doesn't work with Python 3.x
If you are looking for speed improvements in a block of code, then you may want to have a look at rpythonic, which compiles down to C using PyPy. It uses a decorator that turns the code into a JIT for Python.
I have a memory and CPU intensive problem to solve and I need to benchmark the different solutions in ruby and python on different platforms.
To do the benchmark, I need to measure the time taken and the memory occupied by objects (not the entire program, but a selected list of objects) in both python and ruby.
Please recommend ways to do it, and also let me know whether it is possible to do it without using OS-specific tools (like Task Manager and ps). Thanks!
Update: Yes, I know that neither Python nor Ruby is strong in performance and that there are better alternatives like C, C++, Java etc. I am actually more interested in comparing the performance of Python and Ruby. And please, no flame wars.
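For the Python side, a minimal sketch using only the standard library (no OS tools): `time.perf_counter` for wall-clock time and `tracemalloc` (Python 3.4+) for memory allocated while a selected block of code runs. `build_objects` is a hypothetical workload; substitute the objects you actually want to measure. Ruby needs its own tooling, which this sketch doesn't cover.

```python
import sys
import time
import tracemalloc


def build_objects(n):
    # Hypothetical workload: the objects whose footprint we want to measure.
    return [{"id": i, "name": f"item{i}"} for i in range(n)]


tracemalloc.start()
start = time.perf_counter()

objects = build_objects(100_000)

elapsed = time.perf_counter() - start
current, peak = tracemalloc.get_traced_memory()  # bytes traced since start()
tracemalloc.stop()

print(f"time: {elapsed:.3f}s")
print(f"traced memory: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")

# sys.getsizeof is shallow: it counts the list object itself, not the dicts in it.
print(f"shallow size of the list: {sys.getsizeof(objects)} bytes")
```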
For Python I recommend heapy
from guppy import hpy
h = hpy()
print(h.heap())
or Dowser or PySizer
For Ruby you can use the BleakHouse Plugin or just read this answer on memory leak debugging (ruby).
If you really need to write fast code in a language like this (and not a language far more suited to CPU intensive operations and close control over memory usage such as C++) then I'd recommend pushing the bulk of the work out to Cython.
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. Cython is based on the well-known Pyrex, but supports more cutting edge functionality and optimizations. The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code.
That way you can get most of the efficiency of C with most of the ease of use of Python.
If you are using Python for CPU-intensive algorithmic tasks, I suggest using NumPy/SciPy to speed up your numerical calculations and the Psyco JIT compiler for everything else. Your speeds can approach those of much lower-level languages if you use optimized components.
I'd be wary of trying to measure just the memory consumption of an object graph over the lifecycle of an application. After all, you really don't care about that, in the end. You care that your application, in its entirety, has a sufficiently low footprint.
If you choose to limit your observation of memory consumption anyway, include garbage collector timing in your list of considerations, then look at ruby-prof:
http://ruby-prof.rubyforge.org/
Ciao,
Sheldon.
(You didn't specify Python 2.5, 2.6 or 3, or Ruby 1.8 or 1.9, JRuby or MRI.) The JVM has a wealth of tools for attacking memory issues. Generally it's helpful to zero in on memory depletion by posting stripped-down versions of programs that replicate the problem.
Heapy, ruby-prof, bleak house are all good tools, here are others:
Ruby
http://eigenclass.org/R2/writings/object-size-ruby-ocaml
watch ObjectSpace yourself
http://www.coderoshi.com/2007/08/cheap-tricks-ix-spying-on-ruby.html
http://sporkmonger.com/articles/2006/10/22/a-question
(ruby and python)
http://www.softwareverify.com/
I like to use Python for almost everything, and I always had it clear in my mind that if for some reason I found a bottleneck in my Python code (due to Python's limitations), I could always integrate some C code into it.
But then I started to read a guide on how to integrate Python with C, and in the article the author says:
There are several reasons why one might wish to extend Python in C or C++, such as:
Calling functions in an existing library.
Adding a new builtin type to Python
Optimising inner loops in code
Exposing a C++ class library to Python
Embedding Python inside a C/C++ application
Nothing about performance. So I ask again: is it reasonable to integrate Python with C for performance?
In my experience it is rarely necessary to optimize using C. I prefer to identify bottlenecks and improve algorithms in those areas completely in Python. Using hash tables, caching, and generally re-organizing your data structures to suit future needs has amazing potential for speeding up your program. As your program develops you'll get a better sense of what kind of material can be precalculated, so don't be afraid to go back and redo your storage and algorithms. Additionally, look for chances to kill "two birds with one stone", such as sorting objects as you render them instead of doing huge sorts.
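As a small example of the caching idea, the standard library's `functools.lru_cache` memoizes a function so each distinct argument combination is computed only once. The recursive `paths` function below (counting monotone lattice paths in a grid) is a hypothetical workload. Uncached, `paths(16, 16)` would make hundreds of millions of calls; with the cache it needs only a few hundred distinct evaluations.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def paths(rows: int, cols: int) -> int:
    """Number of monotone lattice paths through a rows x cols grid."""
    if rows == 0 or cols == 0:
        return 1
    return paths(rows - 1, cols) + paths(rows, cols - 1)


print(paths(16, 16))       # 601080390
print(paths.cache_info())  # hits, misses and cache size
```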
When everything is worked to the best of your knowledge, I'd consider using an optimizer like Psyco. I've experienced literally 10x performance improvements just by using Psyco and adding one line to my program.
If all else fails, use C in the proper places and you'll get what you want.
* Optimising inner loops in code
Isn't that about performance ?
Performance is a broad topic, so you should be more specific. If the bottleneck in your program involves a lot of networking, then rewriting it in C/C++ probably won't make a difference, since it's the network calls taking up time, not your code. You would be better off rewriting the slow section of your program to use fewer network calls, thus reducing the time your program spends waiting on network I/O. If you're doing math-intensive stuff such as solving differential equations, and you know there are C libraries that can offer better performance than the way you are currently doing it in Python, you may want to rewrite that section of your program to use those libraries to increase its performance.
The C extensions API is notoriously hard to work with, but there are a number of other ways to integrate C code.
For some more usable alternatives see http://www.scipy.org/PerformancePython, in particular the section about using Weave for easy inlining of C code.
Also of interest is Cython, which provides a nice system for integrating with C code. Cython is used for optimization by some well-respected high-performance Python projects such as NumPy and Sage.
As mentioned above, Psyco is another attractive option for optimization, and one which requires nothing more than:
import psyco
psyco.bind(myfunction)
Psyco will identify your inner loops and automatically substitute optimized versions of the routines.
C can definitely speed up processor bound tasks. Integrating is even easier now, with the ctypes library, or you could go for any of the other methods you mention.
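A minimal `ctypes` sketch, assuming a POSIX system (Linux or macOS): it loads the C standard library and calls its `strlen` function directly. Declaring `argtypes` and `restype` tells ctypes how to convert arguments and the return value. This only demonstrates the mechanism; for something as trivial as `strlen`, the per-call ctypes overhead makes it slower than Python's own `len()`, so the gains come from moving whole loops into C.

```python
import ctypes
import ctypes.util

# On POSIX, find_library("c") usually returns e.g. "libc.so.6"; if it returns
# None, CDLL(None) falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```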
I feel Mercurial has done a good job with the integration, if you want to look at their code as an example. The compute-intensive tasks are in C, and everything else is Python.
You will gain a large performance boost using C from Python (assuming your code is well written, etc) because Python is interpreted at run time, whereas C is compiled beforehand. This will speed up things quite a bit because with C, your code is simply running, whereas with Python, the Python interpreter must figure out what you are doing and interpret it into machine instructions.
I've been told to use C for the calculating portion and Python for the scripting. So yes, you can integrate both. C is capable of faster calculations than Python.
Out of curiosity, are there many compilers out there which target .pyc files?
After a bit of Googling, the only two I can find are:
unholy: _why's Ruby-to-pyc compiler
Python: The PSF's Python to pyc compiler
So… Are there any more?
(as a side note, I got thinking about this because I want to write a Scheme-to-pyc compiler)
(as a second side note, I'm not under any illusion that a Scheme-to-pyc compiler would be useful, but it would give me an incredible excuse to learn some internals of both Scheme and Python)
"I want to write a Scheme-to-pyc compiler".
My brain hurts! Why would you want to do that? Python byte code is an intermediate language specifically designed to meet the needs of the Python language and designed to run on Python virtual machines that, again, have been tailored to the needs of Python. Some of the most important areas of Python development these days are moving Python to other "virtual machines", such as Jython (JVM), IronPython (.NET), PyPy and the Unladen Swallow project (moving CPython to an LLVM-based representation). Trying to squeeze the syntax and semantics of another, very different language (Scheme) into the intermediate representation of another high-level language seems to be attacking the problem (whatever the problem is) at the wrong level. So, in general, it doesn't seem like there would be many .pyc compilers out there and there's a good reason for that.
I wrote a compiler several years ago which accepted a lisp-like language called "Noodle" and produced Python bytecode. While it never became particularly useful, it was a tremendously good learning experience both for understanding Common Lisp better (I copied several of its features) and for understanding Python better.
I can think of two particular cases when it might be useful to target Python bytecode directly, instead of producing Python and passing it on to a Python compiler:
Full closures: in Python before 3.0 (before the nonlocal keyword), you can't modify the value of a closed-over variable without resorting to bytecode hackery. You can mutate values instead, so it's common practice to have a closure referencing a list, for example, and changing the first element in it from the inner scope. That can get real annoying. The restriction is part of the syntax, though, not the Python VM. My language had explicit variable declaration, so it successfully provided "normal" closures with modifiable closed-over values.
Getting at a traceback object without referencing any builtins. Real niche case, for sure, but I used it to break an early version of the "safelite" jail. See my posting about it.
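To illustrate the closure restriction in the first point: before Python 3.0 the usual workaround was to mutate a container, whereas Python 3's `nonlocal` lets the inner function rebind the outer variable. Both versions below are Python 3 code; the function names are illustrative.

```python
def make_counter_py2_style():
    # Pre-3.0 workaround: rebinding a closed-over name wasn't possible,
    # so mutate a one-element list instead.
    count = [0]

    def increment():
        count[0] += 1
        return count[0]

    return increment


def make_counter():
    # Python 3: "nonlocal" lets the inner function rebind the outer variable.
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


a, b = make_counter_py2_style(), make_counter()
print(a(), a(), b(), b())  # 1 2 1 2
```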
So yeah, it's probably way more work than it's worth, but I enjoyed it, and you might too.
I suggest you focus on CPython.
http://www.network-theory.co.uk/docs/pytut/CompiledPythonfiles.html
Rather than a Scheme-to-.pyc translator, I suggest you write a Scheme-to-Python translator, and then let CPython handle the conversion to .pyc. (There is precedent for doing it this way; the first C++ compiler was Cfront, which translated C++ into C and then let the system C compiler do the rest.)
From what I know of Scheme, it wouldn't be that difficult to translate Scheme to Python.
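A toy sketch of the Cfront-style approach: translate a tiny Scheme-like subset (integers, `+ - * <` and `if`) into a Python expression string, then let CPython's built-in `compile()` produce the bytecode. All names here are hypothetical, and this is nowhere near a real Scheme (no `define`, `lambda`, symbol renaming or tail calls).

```python
def tokenize(src: str) -> list:
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list):
    """Turn a token list into nested Python lists (an s-expression tree)."""
    tok = tokens.pop(0)
    if tok == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol


# Scheme operators mapped to Python infix operators. Python's chained
# comparison "a < b < c" happens to match Scheme's (< a b c).
INFIX = {"+": "+", "-": "-", "*": "*", "<": "<"}


def to_python(expr) -> str:
    if isinstance(expr, int):
        return str(expr)
    if isinstance(expr, str):
        return expr  # symbols become Python names (no renaming here)
    op, *args = expr
    if op == "-" and len(args) == 1:
        return f"(-{to_python(args[0])})"  # unary minus
    if op in INFIX:
        return "(" + f" {INFIX[op]} ".join(to_python(a) for a in args) + ")"
    if op == "if":
        test, then, other = args
        return f"({to_python(then)} if {to_python(test)} else {to_python(other)})"
    raise NotImplementedError(f"unsupported form: {op}")


def compile_scheme(src: str):
    py_src = to_python(parse(tokenize(src)))
    # CPython's own compiler turns the generated source into a code object.
    return compile(py_src, "<scheme>", "eval")


print(eval(compile_scheme("(+ 1 (* 2 3) (if (< 1 2) 10 20))")))  # 17
```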
One warning: the Python virtual machine is probably not as fast for Scheme as Scheme itself. For example, Python doesn't automatically turn tail recursion into iteration, and Python has a relatively shallow stack, so your translator would actually need to turn tail recursion into iteration itself.
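One common way a translator can turn tail calls into iteration is a trampoline: each tail call returns a zero-argument thunk instead of calling directly, and a driver loop keeps invoking thunks at constant stack depth. This is a minimal sketch; it assumes the final result is never itself callable, and the function names are hypothetical.

```python
def sum_to(n, acc=0):
    # Tail-recursive in Scheme style; CPython still keeps one frame per call.
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)


def sum_to_thunked(n, acc=0):
    # Instead of making the tail call, return a thunk that will make it.
    if n == 0:
        return acc
    return lambda: sum_to_thunked(n - 1, acc + n)


def trampoline(result):
    # Keep calling thunks until a non-callable value comes back.
    while callable(result):
        result = result()
    return result


try:
    sum_to(100_000)  # far beyond the default recursion limit (usually 1000)
except RecursionError:
    print("direct tail recursion: RecursionError")

print(trampoline(sum_to_thunked(100_000)))  # 5000050000
```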
As a bonus, once Unladen Swallow speeds up Python, your Scheme-to-Python translator would benefit, and at that point might even become practical!
If this seems like a fun project to you, I say go for it. Not every project has to be immediately practical.
P.S. If you want a project that is somewhat more practical, you might want to write an AWK to Python translator. That way, people with legacy AWK scripts could easily make the leap forward to Python!
Just for your interest, I have written a toy compiler from a simple LISP to Python. Practically, this is a LISP to pyc compiler.
Have a look: sinC - The tiniest LISP compiler
Probably a bit late to the party, but if you're still interested, the clojure-py project (https://github.com/halgari/clojure-py) is now able to compile a significant subset of Clojure to Python bytecode, and some help is always welcome.
Targeting bytecode is not that hard in itself, except for one thing: it is not stable across Python versions (e.g. MAKE_FUNCTION pops 2 elements from the stack in Python 3 but only 1 in Python 2), and these differences are not clearly documented in a single spot (afaict), so you probably need some abstraction layer.
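Related to that instability: a tool that writes .pyc files must match the exact CPython version that will load them. This standard-library sketch, assuming CPython 3.7+ where the header is 16 bytes (PEP 552), lets `py_compile` write a .pyc, checks its magic number against the running interpreter's, then unmarshals and runs the code object.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo.py")
    with open(src_path, "w") as f:
        f.write("answer = 6 * 7\n")

    # Let CPython compile and write the .pyc; compile() returns its path.
    pyc_path = py_compile.compile(
        src_path, cfile=os.path.join(tmp, "demo.pyc"), doraise=True
    )
    with open(pyc_path, "rb") as f:
        data = f.read()

# Header (3.7+): 4-byte magic, 4-byte flags, then mtime+size or a source hash.
# The magic number changes whenever the bytecode format changes.
magic = data[:4]
print(magic == importlib.util.MAGIC_NUMBER)  # True

code = marshal.loads(data[16:])  # the marshalled module code object
namespace = {}
exec(code, namespace)
print(namespace["answer"])  # 42
```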