Psyco is a specialising compiler for Python. The documentation states that Psyco can and will use large amounts of memory.
What are the main reasons for this memory usage? Is substantial memory overhead a feature of JIT compilers in general?
Edit: Thanks for the answers so far. There are three likely contenders.
Writing multiple specialised blocks, each of which requires memory
Overhead due to compiling source on the fly
Overhead due to capturing enough data to do dynamic profiling
The question is, which one is the dominant factor in memory usage? I have my own opinion. But I'm adding a bounty, because I'd like to accept the answer that's actually correct! If anyone can demonstrate or prove where the majority of the memory is used, I'll accept it. Otherwise whoever the community votes for will be auto-accepted at the end of the bounty.
From the Psyco website: "The difference with the traditional approach to JIT compilers is that Psyco writes several versions of the same blocks (a block is a bit of a function), which are optimized by being specialized to some kinds of variables (a "kind" can mean a type, but it is more general)"
"Psyco uses the actual run-time data that your program manipulates to write potentially several versions of the machine code, each differently specialized for different kinds of data." http://psyco.sourceforge.net/introduction.html
Many JIT compilers work with statically typed languages, so they know the types in advance and can generate machine code for just those types. The better ones do dynamic profiling when the types are polymorphic and optimise the more commonly encountered paths; this is also commonly done with languages featuring dynamic types†. Psyco appears to hedge its bets in order to avoid doing either a full program analysis to decide what the types could be, or profiling to find which types are actually in use.
† I've never gone deep enough into Python to work out whether it has dynamic types (types whose structure can be changed at runtime after objects have been created with that type), or whether the common implementations just check types at runtime; most of the articles just rave about dynamic typing without actually defining it in the context of Python.
The memory overhead of Psyco is currently large. It has been reduced a bit over time, but it is still an overhead. This overhead is proportional to the amount of Python code that Psyco rewrites; thus if your application has a few algorithmic "core" functions, these are the ones you will want Psyco to accelerate --- not the whole program.
So I would think the large memory requirements are due to the fact that it's loading source into memory and then compiling it as it goes. The more source you try to compile, the more memory it's going to need. I'd guess that if it's also trying to optimise on top of that, it will look at multiple possible solutions to try to identify the best case.
Psyco's memory usage definitely comes from compiled assembler blocks. Psyco sometimes suffers from overspecialization of functions, which means there are multiple versions of the assembler blocks. Also, and this is very important, Psyco never frees assembler blocks once they have been allocated, even if the code associated with them is dead.
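A toy model can make the overspecialization point concrete. The sketch below is plain Python and is not how Psyco works internally (Psyco emits machine code); it only mimics the memory behaviour described above, assuming one "compiled" variant per tuple of argument types and no eviction. The class name `SpecializingCache` is hypothetical.

```python
from typing import Any, Callable, Dict, Tuple


class SpecializingCache:
    """Toy model of per-type specialisation; NOT Psyco's real design.

    One 'compiled' variant is kept per tuple of argument types, and
    variants are never evicted, so memory grows with every new type
    combination the function sees.
    """

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.variants: Dict[Tuple[type, ...], Callable[..., Any]] = {}

    def __call__(self, *args: Any) -> Any:
        key = tuple(type(a) for a in args)
        variant = self.variants.get(key)
        if variant is None:
            variant = self._specialise(key)
            self.variants[key] = variant  # kept forever, never freed
        return variant(*args)

    def _specialise(self, key: Tuple[type, ...]) -> Callable[..., Any]:
        # Stand-in for code generation: a real specialiser would emit
        # machine code that assumes the argument types in `key`.
        func = self.func

        def variant(*args: Any) -> Any:
            return func(*args)

        variant.__name__ = f"{func.__name__}_{'_'.join(t.__name__ for t in key)}"
        return variant


@SpecializingCache
def add(a, b):
    return a + b


add(1, 2)        # new variant for (int, int)
add(1.5, 2.5)    # new variant for (float, float)
add("a", "b")    # new variant for (str, str)
add(3, 4)        # reuses the (int, int) variant
print(len(add.variants))  # 3
```

Each new type combination adds a variant that stays in memory for the life of the process, which is the growth pattern the answer attributes to Psyco's assembler blocks.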
If you run your program under Linux, you can look at /proc/xxx/smaps to see a growing block of anonymous memory, which is in a different region from the heap. That's the anonymously mmap'ed region Psyco writes its assembler into, and of course it disappears when running without Psyco.
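To watch that region from inside a program rather than by eye, one could total the anonymous mappings in smaps. This is a sketch assuming the Linux smaps layout documented in proc(5) (a header line per mapping, followed by `Size:` and similar fields in kB); treating "no pathname" as anonymous is an assumption, and the helper names `anonymous_kb` and `anonymous_kb_for_pid` are hypothetical.

```python
import re
from typing import Optional

# A mapping header line in /proc/<pid>/smaps looks like:
#   7f3c2a000000-7f3c2a021000 rw-p 00000000 00:00 0    [optional pathname]
_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def anonymous_kb(smaps_text: str) -> int:
    """Sum the 'Size:' of mappings that have no pathname (anonymous mmaps).

    [heap], [stack] and file-backed mappings all have a pathname field,
    so they are excluded.
    """
    total = 0
    anonymous = False
    for line in smaps_text.splitlines():
        if _HEADER.match(line):
            # Fields: address perms offset dev inode [pathname]
            anonymous = len(line.split(maxsplit=5)) < 6
        elif anonymous and line.startswith("Size:"):
            total += int(line.split()[1])  # value is in kB
    return total


def anonymous_kb_for_pid(pid: Optional[int] = None) -> int:
    """Linux only: read /proc/<pid>/smaps (defaults to this process)."""
    path = f"/proc/{pid if pid is not None else 'self'}/smaps"
    with open(path) as f:
        return anonymous_kb(f.read())
```

Sampling `anonymous_kb_for_pid()` before and after a workload, with and without Psyco, would show whether that anonymous region is what grows.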
I would like to use statprof.py for profiling code in PyPy. Unfortunately, it does not seem to work, the line numbers it points to are off. Does anyone know how to make it work or know of an alternative?
It's likely that "the line numbers are off" because PyPy, in JITted code, will inline many functions and will only deliver signals (here from the timer) at the end of the loops. Compare this with CPython, which delivers the signals between two random bytecodes -- occasionally at the end of the loops too, but generally anywhere. So what you get on PyPy is the same as what you'd get on CPython if you constrained the signal handlers to run only at the "end of loop" bytecode.
This is why this kind of profiling will seem to always miss a lot of functions, like most functions with no loop in them.
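The mechanism behind tools like statprof.py can be sketched in a few lines. This is a minimal illustration, assuming CPython on a Unix-like system and running in the main thread (where Python signal handlers run); `sample_lines` and `busy` are hypothetical names, not statprof's API. On CPython the handler sees whichever frame was executing at nearly any bytecode boundary; under a JIT that only delivers signals at loop ends, the same recorded line numbers would drift in the way described above.

```python
import collections
import signal
import time


def sample_lines(func, interval=0.001):
    """Minimal statistical profiler in the spirit of statprof (Unix only).

    A SIGPROF timer fires every `interval` seconds of CPU time; the handler
    records which (function, line) the interrupted frame was executing.
    """
    counts = collections.Counter()

    def handler(signum, frame):
        if frame is not None:
            counts[(frame.f_code.co_name, frame.f_lineno)] += 1

    old = signal.signal(signal.SIGPROF, handler)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer
        signal.signal(signal.SIGPROF, old)          # restore old handler
    return counts


def busy(seconds=0.3):
    """CPU-bound stand-in workload."""
    end = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < end:
        total += 1
    return total


for (name, line), n in sample_lines(busy).most_common(3):
    print(f"{name}:{line}  {n} samples")
```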
You can try to use the built-in cProfile module. It comes of course with a bigger performance hit than statistical profiling, but try it anyway --- it doesn't prevent JITting, for example, so the performance hit should still be reasonable.
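A minimal way to run cProfile on a single call and capture its report as text, using only the standard `cProfile`, `pstats` and `io` modules; `profile_call` is a hypothetical helper and `fib` is just a stand-in workload. Because cProfile records every call and return rather than sampling on timer signals, its attribution does not depend on when signals are delivered.

```python
import cProfile
import io
import pstats


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


result, report = profile_call(fib, 20)
print(report)
```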
More generally, I don't see an easy way to implement the equivalent of statistical profiling in PyPy. It's quite hard to give it sense in the presence of functions that are inlined into each other and then optimized globally... I'd be interested if you can find that a tool actually exists, for some other high-level language, doing statistical profiling, on a VM with a tracing JIT.
We could record enough information to track each small group of assembler instructions back to the real Python function it comes from, and then use hacks to inspect the current Instruction Pointer (IP) at the machine level. Not impossible, but serious work :-)
I've found that when I ask Python to do something more demanding, it doesn't use my machine's resources at 100% and it's not really fast. It's fast compared to many other interpreted languages, but compared to compiled languages I think the difference is really remarkable.
Is it possible to speedup things with a Just In Time (JIT) compiler in Python 3?
Usually a JIT compiler is the only thing that can improve performance in interpreted languages, so I'm referring to this one; if other solutions are available, I would love to accept new answers.
First off, Python 3(.x) is a language, for which there can be any number of implementations. Okay, to this day no implementation except CPython actually implements those versions of the language. But that will change (PyPy is catching up).
To answer the question you meant to ask: CPython, 3.x or otherwise, does not, never did, and likely never will, contain a JIT compiler. Some other Python implementations (PyPy natively, Jython and IronPython by re-using JIT compilers for the virtual machines they build on) do have a JIT compiler. And there is no reason their JIT compilers would stop working when they add Python 3 support.
But while I'm here, also let me address a misconception:
Usually a JIT compiler is the only thing that can improve performance in interpreted languages
This is not correct. A JIT compiler, in its most basic form, merely removes interpreter overhead, which accounts for some of the slow down you see, but not for the majority. A good JIT compiler also performs a host of optimizations which remove the overhead needed to implement numerous Python features in general (by detecting special cases which permit a more efficient implementation), prominent examples being dynamic typing, polymorphism, and various introspective features.
Just implementing a compiler does not help with that. You need very clever optimizations, most of which are only valid in very specific circumstances and for a limited time window. JIT compilers have it easy here, because they can generate specialized code at run time (it's their whole point), can analyze the program more easily (and more accurately) by observing it as it runs, and can undo optimizations when they become invalid. They can also interact with interpreters, unlike ahead-of-time compilers, and often do so because it's a sensible design decision. I guess this is why they are linked to interpreters in people's minds, although they can and do exist independently.
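The speculate, guard and undo cycle described above can be illustrated with a toy. This is not code from any real JIT: `GuardedAdd` is a hypothetical class that "specializes" after seeing int arguments, checks that assumption on every call, and discards the specialized path (deoptimizes) the first time the assumption fails.

```python
from typing import Any, Callable, Optional


class GuardedAdd:
    """Toy model of speculate / guard / deoptimise; not any real JIT's code."""

    def __init__(self) -> None:
        self.fast_path: Optional[Callable[[int, int], int]] = None
        self.deopts = 0

    def generic(self, a: Any, b: Any) -> Any:
        return a + b  # full dynamic dispatch on every call

    def __call__(self, a: Any, b: Any) -> Any:
        if self.fast_path is not None:
            if type(a) is int and type(b) is int:  # the guard
                return self.fast_path(a, b)
            self.fast_path = None                  # assumption broken: deoptimise
            self.deopts += 1
        elif type(a) is int and type(b) is int:
            self.fast_path = int.__add__           # "specialise" for ints
        return self.generic(a, b)


g = GuardedAdd()
print(g(1, 2))     # 3, and the int fast path is installed
print(g(3, 4))     # 7, served by the fast path
print(g(1.5, 2))   # 3.5, guard fails, fast path discarded
print(g.deopts)    # 1
```

A real JIT emits machine code for the fast path and keeps extra bookkeeping so it can resume in the interpreter mid-function, but the control flow has the same shape.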
There are also other approaches to make Python implementation faster, apart from optimizing the interpreter's code itself - for example, the HotPy (2) project. But those are currently in research or experimentation stage, and are yet to show their effectiveness (and maturity) w.r.t. real code.
And of course, a specific program's performance depends on the program itself much more than the language implementation. The language implementation only sets an upper bound for how fast you can make a sequence of operations. Generally, you can improve the program's performance much better simply by avoiding unnecessary work, i.e. by optimizing the program. This is true regardless of whether you run the program through an interpreter, a JIT compiler, or an ahead-of-time compiler. If you want something to be fast, don't go out of your way to get at a faster language implementation. There are applications which are infeasible with the overhead of interpretation and dynamicness, but they aren't as common as you'd think (and they are often solved by selectively calling into code compiled to machine code).
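As a small example of avoiding unnecessary work, changing a data structure often matters more than changing the language implementation. The sketch below compares membership tests against a list (a linear scan per lookup) and a set (a hash lookup, constant time on average); the sizes are arbitrary and the timings it prints will vary by machine and Python implementation.

```python
import timeit

haystack_list = list(range(10_000))
haystack_set = set(haystack_list)
needles = range(0, 20_000, 100)  # 200 needles: half present, half absent


def count_hits_list():
    return sum(1 for n in needles if n in haystack_list)  # O(len) per lookup


def count_hits_set():
    return sum(1 for n in needles if n in haystack_set)   # O(1) average per lookup


assert count_hits_list() == count_hits_set()  # same answer, different cost
for fn in (count_hits_list, count_hits_set):
    print(fn.__name__, timeit.timeit(fn, number=5))
```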
The only Python implementation that has a JIT is PyPy. But PyPy is both a Python 2 implementation and a Python 3 implementation.
The Numba project should work on Python 3. Although it is not exactly what you asked, you may want to give it a try:
https://github.com/numba/numba/blob/master/docs/source/doc/userguide.rst.
It does not support all Python syntax at this time.
You can try PyPy's py3 branch, which is more or less Python-compatible, but the official CPython implementation has no JIT.
This will best be answered by some of the remarkable Python developer folks on this site.
Still I want to comment: When discussing speed of interpreted languages, I just love to point to a project hosted at this location: Computer Language Benchmarks Game
It's a site dedicated to running benchmarks. There are specified tasks to do. Anybody can submit a solution in his/her preferred language and then the tests compare the runtime of each solution. Solutions can be peer reviewed, are often further improved by others, and results are checked against the spec. In the long run this is the most fair benchmarking system to compare different languages.
As you can see from indicative summaries like this one, compiled languages are quite fast compared to interpreted languages. However, the difference is probably not so much in the exact type of compilation, it's the fact that Python (and the others in the graph slower than python) are fully dynamic. Objects can be modified on the fly. Types can be modified on the fly. So some type checking has to be deferred to runtime, instead of compile time.
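A short illustration of what "modified on the fly" means in Python: the class `Point` and everything done to it here are hypothetical, but each operation is ordinary Python. Because any of these can happen at run time, a compiler cannot in general fix an object's layout or method set ahead of time.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


p = Point(3, 4)

# Add a method to the class after the instance already exists ...
Point.norm2 = lambda self: self.x ** 2 + self.y ** 2
print(p.norm2())  # 25

# ... and an attribute to just this one object.
p.label = "demo"
print(p.label)    # demo


# Even the class of an existing object can be swapped.
class Point3(Point):
    z = 0


p.__class__ = Point3
print(isinstance(p, Point3), p.z)  # True 0
```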
So while you can argue about compiler benefits, you have to take into account that there are different features in different languages. And those features may come at an intrinsic price.
Finally, when talking about speed: most often it's not the language and its perceived slowness that's causing the issue, it's a bad algorithm. I never had to switch languages because one was too slow: when there's a speed issue in my code, I fix the algorithm. However, if there are time-consuming, computationally intensive loops in your code, it is usually worthwhile to recompile those. Prominent examples are libraries coded in C and used by scripting languages (Perl XS libs, or e.g. numpy/scipy for Python; lapack/blas are examples of libs available with bindings for many scripting languages).
If you mean JIT as in a just-in-time compiler to a bytecode representation, then it has such a feature (since 2.2). If you mean JIT to machine code, then no. Still, compiling to bytecode provides a lot of performance improvement. If you want it to compile to machine code, then PyPy is the implementation you're looking for.
Note: pypy doesn't work with Python 3.x
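CPython's source-to-bytecode step can be observed with the standard `dis` module and the built-in `compile()`. This only shows the bytecode that the interpreter then executes, not machine code, and the exact opcode names differ between CPython versions; the function `add` is just an example.

```python
import dis


def add(a, b):
    return a + b


# CPython has already compiled `add` to bytecode; list its instructions.
for instr in dis.get_instructions(add):
    print(instr.opname, instr.argrepr)

# compile() exposes the same source-to-bytecode step directly.
code = compile("x * 2 + 1", "<expr>", "eval")
print(eval(code, {"x": 20}))  # 41
```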
If you are looking for speed improvements in a block of code, then you may want to have a look at rpythonic, which compiles down to C using PyPy. It uses a decorator that turns the decorated code into a JIT for Python.
While I know projects promising large speed gains can result in letdowns, I don't see much in the way of a roadmap for speeding up CPython and/or PyPy.
Is there something planned that promises a huge boost in speed for the core interpreter (e.g. --with-computed-gotos) in either of them? How about their standard libraries (e.g. Decimal in C, IO in C)?
I know HotPy(2) has an outline of a plan for speeding CPython up, but it sounds like a one-man project without much traction in core CPython.
PyPy has some information about where performance isn't great, but I can find no big goals for speedup in the docs.
So, are there known targets that could bring big performance improvement for Python implementations?
I'll answer the part about PyPy. I can't speak for CPython, but I think there are performance improvements that are being worked on (don't quote me on this though).
There is no project plan, since it's really not working that way. All the major parts (like "JIT" or "Garbage Collection") have been essentially done, but that does not at all mean that everything is fast. There are definitely things that are slow, and we generally improve them on a case-by-case basis; submit a bug report if you think something is too slow. I have quite a few performance improvements on my plate that would definitely help twisted, but I have no idea about others.
Big things that are being worked on that might be worth mentioning:
Improved frames, that should help recursion and function calls that are not inlined (for example that contain loops)
Better string implementations for various kinds of usages, like concatenation, slicing etc.
Faster tracing
More compact tuples and objects, storing unwrapped results
Can I promise when, how, or how much it'll speed things up? Absolutely not, but on average we manage 10-30% speed improvements from release to release, which is usually every 4 months or so. So I guess some stuff will get faster, but unless you give me a crystal ball or a time machine, I won't tell you for sure.
Cheers,
fijal
Your comments belie a lot of confusion...
PyPy and Python have currently very different performance capabilities.
Pypy is currently more than 5x faster than CPython on average.
HotPy has nothing to do with CPython. It's a one-man project and it's a whole new VM (not yet released, so I can't say anything about its performance).
At the moment, there's a lot of activity in the PyPy project and they are improving it day by day.
There's a numpy port in a very advanced stage of development, they are improving ctypes, Cython compatibility, and soon there will be a complete Python3 implementation.
I believe PyPy is currently on par with the V8 JavaScript engine and similar projects in terms of performance.
If speed and Python is what you want, pay attention to this project.
The answer is that PyPy is the plan to speed up CPython. PyPy aims to be an extremely conformant python interpreter which is highly optimized. The project has collected together all of the benchmarks they could find, and runs all of them for each build of pypy, to ensure against performance regressions. Check it out: http://speed.pypy.org/
I believe that by the time that the performance of cpython won't cut it anymore (for web dev work), pypy will be completely ready for prime-time. Raymond Hettinger (a core python dev) has called PyPy "python with the optimizations turned on".
I have a memory and CPU intensive problem to solve and I need to benchmark the different solutions in ruby and python on different platforms.
To do the benchmark, I need to measure the time taken and the memory occupied by objects (not the entire program, but a selected list of objects) in both python and ruby.
Please recommend ways to do it, and also let me know if it is possible to do it without using OS-specific tools (like Task Manager and ps). Thanks!
Update: Yes, I know that both Python and Ruby are not strong in performance and there are better alternatives like C, C++, Java etc. I am actually more interested in comparing the performance of Python and Ruby. And please, no flame wars.
For Python I recommend heapy
from guppy import hpy
h = hpy()
print(h.heap())
or Dowser or PySizer
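If third-party tools are not an option, the standard library gets you part of the way without Task Manager or ps. `sys.getsizeof` only reports an object's own size, not what it references, so the sketch below walks containers and instance `__dict__`s and counts each object once. It is an approximation under stated assumptions: CPython (other implementations may not report sizes meaningfully), no handling of `__slots__` or C-level internals, and `deep_sizeof` is a hypothetical helper.

```python
import sys
from collections.abc import Mapping


def deep_sizeof(obj, seen=None):
    """Approximate bytes used by obj plus everything reachable through
    containers and instance __dict__s (sys.getsizeof alone is shallow)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # shared or cyclic references are counted once
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, (str, bytes, bytearray)):
        return size  # flat buffers: don't iterate over characters
    if isinstance(obj, Mapping):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):
        size += deep_sizeof(vars(obj), seen)  # instance attributes
    return size


data = {"a": [1, 2, 3], "b": "hello"}
print(sys.getsizeof(data), deep_sizeof(data))
```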
For Ruby you can use the BleakHouse Plugin or just read this answer on memory leak debugging (ruby).
If you really need to write fast code in a language like this (and not a language far more suited to CPU intensive operations and close control over memory usage such as C++) then I'd recommend pushing the bulk of the work out to Cython.
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. Cython is based on the well-known Pyrex, but supports more cutting edge functionality and optimizations. The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code.
That way you can get most of the efficiency of C with most of the ease of use of Python.
If you are using Python for CPU intensive algorithmic tasks I suggest use Numpy/Scipy to speed up your numerical calculations and use the Psyco JIT compiler for everything else. Your speeds can approach that of much lower-level languages if you use optimized components.
I'd be wary of trying to measure just the memory consumption of an object graph over the lifecycle of an application. After all, you really don't care about that, in the end. You care that your application, in its entirety, has a sufficiently low footprint.
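For measuring a whole workload rather than a hand-picked object graph, the standard `tracemalloc` module reports the current and peak memory allocated by Python code while tracing is on. The sketch below, with a hypothetical `measure` helper, records wall-clock time with `time.perf_counter` alongside the peak; note that tracing slows allocation, so for precise timings use a separate, untraced run.

```python
import time
import tracemalloc


def measure(func, *args):
    """Return (result, elapsed seconds, peak bytes allocated by Python)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
    finally:
        elapsed = time.perf_counter() - start
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


result, seconds, peak = measure(lambda n: [str(i) for i in range(n)], 100_000)
print(f"{len(result)} items, {seconds:.3f}s, peak {peak / 1024:.0f} KiB")
```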
If you choose to limit your observation of memory consumption anyway, include garbage collector timing in your list of considerations, then look at ruby-prof:
http://ruby-prof.rubyforge.org/
Ciao,
Sheldon.
(You didn't specify Python 2.5, 2.6 or 3, or Ruby 1.8 or 1.9, JRuby or MRI. The JVM has a wealth of tools to attack memory issues. Generally it's helpful to zero in on memory depletion by posting stripped-down versions of programs that replicate the problem.)
Heapy, ruby-prof, bleak house are all good tools, here are others:
Ruby
http://eigenclass.org/R2/writings/object-size-ruby-ocaml
watch ObjectSpace yourself
http://www.coderoshi.com/2007/08/cheap-tricks-ix-spying-on-ruby.html
http://sporkmonger.com/articles/2006/10/22/a-question
(ruby and python)
http://www.softwareverify.com/
I like to use Python for almost everything, and I always had it clear in my mind that if for some reason I found a bottleneck in my Python code (due to Python's limitations), I could always integrate some C code into my program.
But then I started to read a guide on how to integrate Python with C, and in the article the author says:
There are several reasons why one might wish to extend Python in C or C++, such as:
Calling functions in an existing library.
Adding a new builtin type to Python
Optimising inner loops in code
Exposing a C++ class library to Python
Embedding Python inside a C/C++ application
Nothing about performance. So I ask again: is it reasonable to integrate Python with C for performance?
In my experience it is rarely necessary to optimize using C. I prefer to identify bottlenecks and improve algorithms in those areas completely in Python. Using hash tables, caching, and generally re-organizing your data structures to suit future needs has amazing potential for speeding up your program. As your program develops you'll get a better sense of what kind of material can be precalculated, so don't be afraid to go back and redo your storage and algorithms. Additionally, look for chances to kill "two birds with one stone", such as sorting objects as you render them instead of doing huge sorts.
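As an example of the caching idea, the standard `functools.lru_cache` decorator memoizes a function's results. The stair-climbing function here is a hypothetical stand-in for any expensive recursive computation: without the cache the naive recursion makes an exponential number of calls, while with it each `n` is computed once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_to_climb(n: int) -> int:
    """Number of ways to climb n stairs taking 1 or 2 steps at a time."""
    if n < 2:
        return 1
    return ways_to_climb(n - 1) + ways_to_climb(n - 2)


print(ways_to_climb(50))           # 20365011074
print(ways_to_climb.cache_info())  # one miss per distinct n
```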
When everything is worked to the best of your knowledge, I'd consider using an optimizer like Psyco. I've experienced literally 10x performance improvements just by using Psyco and adding one line to my program.
If all else fails, use C in the proper places and you'll get what you want.
* Optimising inner loops in code
Isn't that about performance?
Performance is a broad topic, so you should be more specific. If the bottleneck in your program involves a lot of networking, then rewriting it in C/C++ probably won't make a difference, since it's the network calls taking up time, not your code. You would be better off rewriting the slow section of your program to use fewer network calls, thus reducing the time your program spends waiting on network IO. If you're doing math-intensive stuff such as solving differential equations, and you know there are C libraries that can offer better performance than the way you are currently doing it in Python, you may want to rewrite that section of your program to use those libraries to increase its performance.
The C extensions API is notoriously hard to work with, but there are a number of other ways to integrate C code.
For some more usable alternatives see http://www.scipy.org/PerformancePython, in particular the section about using Weave for easy inlining of C code.
Also of interest is Cython, which provides a nice system for integrating with C code. Cython is used for optimization by some well-respected high-performance Python projects such as NumPy and Sage.
As mentioned above, Psyco is another attractive option for optimization, and one which requires nothing more than
import psyco
psyco.bind(myfunction)
Psyco will identify your inner loops and automatically substitute optimized versions of the routines.
C can definitely speed up processor bound tasks. Integrating is even easier now, with the ctypes library, or you could go for any of the other methods you mention.
I feel mercurial has done a good job with the integration if you want to look at their code as an example. The compute intensive tasks are in C, and everything else is python.
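A minimal ctypes sketch, assuming a Unix-like system where `ctypes.util.find_library("c")` can locate the C library; the C standard `strlen` stands in for a function from your own compiled library. Declaring `argtypes` and `restype` tells ctypes how to convert values. Each call crosses the Python/C boundary, so for a function this small the overhead outweighs any gain; the speedup comes when the C side does substantial work per call.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes a Unix-like system).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello, world"))  # 12
```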
You will gain a large performance boost using C from Python (assuming your code is well written, etc) because Python is interpreted at run time, whereas C is compiled beforehand. This will speed up things quite a bit because with C, your code is simply running, whereas with Python, the Python interpreter must figure out what you are doing and interpret it into machine instructions.
I've been told to use C for the calculating portion and Python for the scripting. So yes, you can integrate both. C is capable of faster calculations than Python.