Comparing performance between ruby and python code - python

I have a memory and CPU intensive problem to solve and I need to benchmark the different solutions in ruby and python on different platforms.
To do the benchmark, I need to measure the time taken and the memory occupied by objects (not the entire program, but a selected list of objects) in both python and ruby.
Please recommend ways to do it, and also let me know if it is possible to do it without using OS specify tools like (Task Manager and ps). Thanks!
Update: Yes, I know that both Python and Ruby are not strong in performance and there are better alternatives like c, c++, Java etc. I am actually more interested in comparing the performance of Python and Ruby. And please no fame-wars.

For Python I recommend heapy
from guppy import hpy
h = hpy()
print h.heap()
or Dowser or PySizer
For Ruby you can use the BleakHouse Plugin or just read this answer on memory leak debugging (ruby).

If you really need to write fast code in a language like this (and not a language far more suited to CPU intensive operations and close control over memory usage such as C++) then I'd recommend pushing the bulk of the work out to Cython.
Cython is a language that makes
writing C extensions for the Python
language as easy as Python itself.
Cython is based on the well-known
Pyrex, but supports more cutting edge
functionality and optimizations.
The Cython language is very close to
the Python language, but Cython
additionally supports calling C
functions and declaring C types on
variables and class attributes. This
allows the compiler to generate very
efficient C code from Cython code.
That way you can get most of the efficiency of C with most of the ease of use of Python.

If you are using Python for CPU intensive algorithmic tasks I suggest use Numpy/Scipy to speed up your numerical calculations and use the Psyco JIT compiler for everything else. Your speeds can approach that of much lower-level languages if you use optimized components.

I'd be wary of trying to measure just the memory consumption of an object graph over the lifecycle of an application. After all, you really don't care about that, in the end. You care that your application, in its entirety, has a sufficiently low footprint.
If you choose to limit your observation of memory consumption anyway, include garbage collector timing in your list of considerations, then look at ruby-prof:
http://ruby-prof.rubyforge.org/
Ciao,
Sheldon.

(you didn't specify py 2.5, 2.6 or 3; or ruby 1.8 or 1.9, jruby, MRI; The JVM has a wealth of tools to attack memory issues; Generally it 's helpful to zero in on memory depletion by posting stripped down versions of programs that replicate the problem
Heapy, ruby-prof, bleak house are all good tools, here are others:
Ruby
http://eigenclass.org/R2/writings/object-size-ruby-ocaml
watch ObjectSpace yourself
http://www.coderoshi.com/2007/08/cheap-tricks-ix-spying-on-ruby.html
http://sporkmonger.com/articles/2006/10/22/a-question
(ruby and python)
http://www.softwareverify.com/

Related

Does strict typing increase Python program performance?

Based on questions like this What makes C faster than Python? I've learned that dynamic/static typing isn't the main reason that C is faster than Python. It appears to be largely because python programs are interpreted, and c programs are compiled.
I'm wondering if strict typing would close the gap in performance for interpreted vs compiled programs enough that strict typing would be a viable strategy for improving interpreted Python program performance post facto?
If the answer is yes, is this done in pro-dev contexts?
With current versions of Python, type annotations are mostly hints for the programmer and possibly some validation tools but are ignored by the compiler and not used at runtime by the byte-code interpreter, which is similar to the behavior of Typescript.
It might be possible to change the semantics of Python to take advantage of static typing in some circumstances to generate more efficient byte-code and possibly perform just in time executable code generation (JIT). Advanced Javascript engines use complex heuristics to achieve this without type annotations. Both approaches could help make Python programs much faster and in some cases perform better than equivalent C code.
Note also that many advanced Python packages use native code, written in C and other languages, taking advantage of optimizing compilers, SIMD instructions and even multi-threading... The Python code in programs using these libraries is not where the time is spent and the performance is comparable to that of compiled languages, while giving the programmer a simpler language to express their problems.

Is it possible to compile Python code for high efficiency?

I'm trying to write some code that has to be as efficient as possible since I'll be doing thousands if not millions of operations in as short a time as possible.
Now I've got some code that does this written in C, but after some research on wrappers it seems above my skill level to delve in to that (especially since I have near no knowledge of C). Because of that, I was considering the possibility of re-writing the code/algorithm in python.
I've read that the reason most high efficiency tasks get done in C is because of the high efficiency of compiled languages and was wondering whether it is possible to compile python code to get close to the same efficiency using pure python.
The code would include specific bit manipulation, table lookups, bitwise logic and binary search or possibly using perfect hash.
Recap: I'm trying to code something that has to be efficient and was wondering whether it's possible to make python run at or near C efficiency. If so: how?
Ultimately, Python is an interpreted language. The Python binaries compiled for your target machine not only execute your Python code, but also the interpreter that takes your code and interprets each command at run-time. That additional overhead of interpretation is what affects performance.
Languages that get compiled to native machine code and skip both interpretation (Python) and a Virtual Machine (JVM, .Net CLR) for the target CPU offer the best performance. Those are C and C++11/14/17.
You can use languages that get best performance that do use a particular kind of virtual machine, such as LLVM, which is useful for SSA. Clang (C/C++) and Rust can be compiled for LLVM (front-end), analyzed and optimized using LLVM-specific tools, and finally the Assembly and Linking using an LLVM back-end for the target machine. Either way, (direct or indirect machine code generation), native compilation offers the best performance.

Does the Python 3 interpreter have a JIT feature?

I found that when I ask something more to Python, python doesn't use my machine resource at 100% and it's not really fast, it's fast if compared to many other interpreted languages, but when compared to compiled languages i think that the difference is really remarkable.
Is it possible to speedup things with a Just In Time (JIT) compiler in Python 3?
Usually a JIT compiler is the only thing that can improve performances in interpreted languages, so i'm referring to this one, if other solutions are available i would love to accept new answers.
First off, Python 3(.x) is a language, for which there can be any number of implementations. Okay, to this day no implementation except CPython actually implements those versions of the language. But that will change (PyPy is catching up).
To answer the question you meant to ask: CPython, 3.x or otherwise, does not, never did, and likely never will, contain a JIT compiler. Some other Python implementations (PyPy natively, Jython and IronPython by re-using JIT compilers for the virtual machines they build on) do have a JIT compiler. And there is no reason their JIT compilers would stop working when they add Python 3 support.
But while I'm here, also let me address a misconception:
Usually a JIT compiler is the only thing that can improve performances in interpreted languages
This is not correct. A JIT compiler, in its most basic form, merely removes interpreter overhead, which accounts for some of the slow down you see, but not for the majority. A good JIT compiler also performs a host of optimizations which remove the overhead needed to implement numerous Python features in general (by detecting special cases which permit a more efficient implementation), prominent examples being dynamic typing, polymorphism, and various introspective features.
Just implementing a compiler does not help with that. You need very clever optimizations, most of which are only valid in very specific circumstances and for a limited time window. JIT compilers have it easy here, because they can generate specialized code at run time (it's their whole point), can analyze the program easier (and more accurately) by observing it as it runs, and can undo optimizations when they become invalid. They can also interact with interpreters, unlike ahead of time compilers, and often do it because it's a sensible design decision. I guess this is why they are linked to interpreters in people's minds, although they can and do exist independently.
There are also other approaches to make Python implementation faster, apart from optimizing the interpreter's code itself - for example, the HotPy (2) project. But those are currently in research or experimentation stage, and are yet to show their effectiveness (and maturity) w.r.t. real code.
And of course, a specific program's performance depends on the program itself much more than the language implementation. The language implementation only sets an upper bound for how fast you can make a sequence of operations. Generally, you can improve the program's performance much better simply by avoiding unnecessary work, i.e. by optimizing the program. This is true regardless of whether you run the program through an interpreter, a JIT compiler, or an ahead-of-time compiler. If you want something to be fast, don't go out of your way to get at a faster language implementation. There are applications which are infeasible with the overhead of interpretation and dynamicness, but they aren't as common as you'd think (and often, solved by calling into machine code-compiled code selectively).
The only Python implementation that has a JIT is PyPy. Byt - PyPy is both a Python 2 implementation and a Python 3 implementation.
The Numba project should work on Python 3. Although it is not exactly what you asked, you may want to give it a try:
https://github.com/numba/numba/blob/master/docs/source/doc/userguide.rst.
It does not support all Python syntax at this time.
You can try the pypy py3 branch, which is more or less python compatible, but the official CPython implementation has no JIT.
This will best be answered by some of the remarkable Python developer folks on this site.
Still I want to comment: When discussing speed of interpreted languages, I just love to point to a project hosted at this location: Computer Language Benchmarks Game
It's a site dedicated to running benchmarks. There are specified tasks to do. Anybody can submit a solution in his/her preferred language and then the tests compare the runtime of each solution. Solutions can be peer reviewed, are often further improved by others, and results are checked against the spec. In the long run this is the most fair benchmarking system to compare different languages.
As you can see from indicative summaries like this one, compiled languages are quite fast compared to interpreted languages. However, the difference is probably not so much in the exact type of compilation, it's the fact that Python (and the others in the graph slower than python) are fully dynamic. Objects can be modified on the fly. Types can be modified on the fly. So some type checking has to be deferred to runtime, instead of compile time.
So while you can argue about compiler benefits, you have to take into account that there are different features in different languages. And those features may come at an intrinsic price.
Finally, when talking about speed: Most often it's not the language and the perceived slowness of a language that's causing the issue, it's a bad algorithm. I never had to switch languages because one was too slow: When there's a speed issue in my code, I fix the algorithm. However, if there are time-consuming, computational intensive loops in your code it is usually worth the while to recompile those. A prominent example are libraries coded in C used by scripting languages (Perl XS libs, or e.g. numpy/scipy for Python, lapack/blas are examples of libs available with bindings for many scripting languages)
If you mean JIT as in Just in time compiler to a Bytecode representation then it has such a feature(since 2.2). If you mean JIT to machine code, then no. Yet the compilation to byte code provides a lot of performance improvement. If you want it to compile to machine code, then Pypy is the implementation you're looking for.
Note: pypy doesn't work with Python 3.x
If you are looking for speed improvements in a block of code, then you may want to have a look to rpythonic, that compiles down to C using pypy. It uses a decorator that converts it in a JIT for Python.

Why does Psyco use a lot of memory?

Psyco is a specialising compiler for Python. The documentation states
Psyco can and will use large amounts of memory.
What are the main reasons for this memory usage? Is substantial memory overhead a feature of JIT compilers in general?
Edit: Thanks for the answers so far. There are three likely contenders.
Writing multiple specialised blocks, each of which require memory
Overhead due to compiling source on the fly
Overhead due to capturing enough data to do dynamic profiling
The question is, which one is the dominant factor in memory usage? I have my own opinion. But I'm adding a bounty, because I'd like to accept the answer that's actually correct! If anyone can demonstrate or prove where the majority of the memory is used, I'll accept it. Otherwise whoever the community votes for will be auto-accepted at the end of the bounty.
From psyco website "The difference with the traditional approach to JIT compilers is that Psyco writes several version of the same blocks (a block is a bit of a function), which are optimized by being specialized to some kinds of variables (a "kind" can mean a type, but it is more general)"
"Psyco uses the actual run-time data that your program manipulates to write potentially several versions of the machine code, each differently specialized for different kinds of data." http://psyco.sourceforge.net/introduction.html
Many JIT compilers work with statically typed languages, so they know what the types are so can create machine code for just the known types. The better ones do dynamic profiling if the types are polymorphic and optimise the more commonly encountered paths; this is also commonly done with languages featuring dynamic types†. Psyco appears to hedge its bets in order to avoid doing a full program analysis to decide what the types could be, or profiling to find what the types in use are.
† I've never gone deep enough into Python to work out whether it does or doesn't have dynamic types or not ( types whose structure can be changed at runtime after objects have been created with that type ), or just the common implementations only check types at runtime; most of the articles just rave about dynamic typing without actually defining it in the context of Python.
The memory overhead of Psyco is currently large. I has been reduced a bit over time, but it is still an overhead. This overhead is proportional to the amount of Python code that Psyco rewrites; thus if your application has a few algorithmic "core" functions, these are the ones you will want Psyco to accelerate --- not the whole program.
So I would think the large memory requirements are due to the fact that it's loading source into memory and then compiling it as it goes. The more source you try and compile the more it's going to need. I'd guess that if it's trying to optomise it on top of that, it'll look at multiple possible solutions to try and identify the best case.
Definitely psyco memory usage comes from compiled assembler blocks. Psyco suffers sometimes from overspecialization of functions, which means there are multiple versions of assembler
blocks. Also, which is also very important, psyco never frees once allocated assembler blocks
even if the code assosciated with it is dead.
If you run your program under linux you can look at /proc/xxx/smaps to see a growing block of anonymous memory, which is in different region than heap. That's anonymously mmap'ed part for writing down assembler, which of course disappears when running without psyco.

Is it reasonable to integrate python with c for performance?

I like to use python for almost everything and always had clear in my mind that if for some reason I was to find a bottleneck in my python code(due to python's limitations), I could always use a C script integrated to my code.
But, as I started to read a guide on how to integrate python. In the article the author says:
There are several reasons why one might wish to extend Python in C or C++, such as:
Calling functions in an existing library.
Adding a new builtin type to Python
Optimising inner loops in code
Exposing a C++ class library to Python
Embedding Python inside a C/C++ application
Nothing about performance. So I ask again, is it reasonable to integrate python with c for performance?
In my experience it is rarely necessary to optimize using C. I prefer to identify bottlenecks and improve algorithms in those areas completely in Python. Using hash tables, caching, and generally re-organizing your data structures to suit future needs has amazing potential for speeding up your program. As your program develops you'll get a better sense of what kind of material can be precalculated, so don't be afraid to go back and redo your storage and algorithms. Additionally, look for chances to kill "two birds with one stone", such as sorting objects as you render them instead of doing huge sorts.
When everything is worked to the best of your knowledge, I'd consider using an optimizer like Psyco. I've experienced literally 10x performance improvements just by using Psyco and adding one line to my program.
If all else fails, use C in the proper places and you'll get what you want.
* Optimising inner loops in code
Isn't that about performance ?
Performance is a broad topic so you should be more specific. If the bottleneck in your program involves a lot of networking then rewriting it in C/C++ probably won't make a difference since it's the network calls taking up time, not your code. You would be better off rewriting the slow section of your program to use fewer network calls thus reducing the time your program spends waiting on entwork IO. If your doing math intensive stuff such as solving differential equations and you know there are C librarys that can offer better performance then the way you are currently doing it in Python you may want to rewrite the section of your program to use those librarys to increase it's performance.
The C extensions API is notoriously hard to work with, but there are a number of other ways to integrate C code.
For some more usable alternatives see http://www.scipy.org/PerformancePython, in particular the section about using Weave for easy inlining of C code.
Also of interest is Cython, which provides a nice system for integrating with C code. Cython is used for optimization by some well-respected high-performance Python projects such as NumPy and Sage.
As mentioned above, Psyco is another attractive option for optimization, and one which requires nothing more than
import psyco
psyco.bind(myfunction)
Psyco will identify your inner loops and automatically substitute optimized versions of the routines.
C can definitely speed up processor bound tasks. Integrating is even easier now, with the ctypes library, or you could go for any of the other methods you mention.
I feel mercurial has done a good job with the integration if you want to look at their code as an example. The compute intensive tasks are in C, and everything else is python.
You will gain a large performance boost using C from Python (assuming your code is well written, etc) because Python is interpreted at run time, whereas C is compiled beforehand. This will speed up things quite a bit because with C, your code is simply running, whereas with Python, the Python interpreter must figure out what you are doing and interpret it into machine instructions.
I've been told for the calculating portion use C for the scripting use python. So yes you can integrate both. C is capable of faster calculations than that of python

Categories