PyPy and CPython: are big performance increases planned?

While I know projects promising large speed gains can result in let downs, I don't see much in the way of a roadmap for speeding up CPython and/or PyPy.
Is there something planned that promises a huge boost in speed for the core interpreter (e.g. --with-computed-gotos) in either of them? How about their standard libraries (e.g. Decimal in C, IO in C)?
I know HotPy(2) has an outline of a plan for speeding CPython up, but it sounds like a one-man project without much traction in core CPython.
PyPy has some information about where performance isn't great, but I can find no big goals for speedup in the docs.
So, are there known targets that could bring big performance improvement for Python implementations?

I'll answer the part about PyPy. I can't speak for CPython, but I think there are performance improvements that are being worked on (don't quote me on this though).
There is no project plan, since it really doesn't work that way. All the major parts (like the JIT or garbage collection) have essentially been done; however, that does not at all mean everything is fast. There are definitely things that are slow, and we generally improve them on a case-by-case basis - submit a bug report if you think something is too slow. I have quite a few performance improvements on my plate that would definitely help Twisted, but I have no idea about others.
Big things that are being worked on that might be worth mentioning:
Improved frames, which should help recursion and function calls that are not inlined (for example, ones that contain loops)
Better string implementations for various kinds of usage, like concatenation, slicing, etc. (see the sketch after this list)
Faster tracing
More compact tuples and objects, storing unwrapped results
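To make the strings item above concrete, here is a hypothetical micro-benchmark of the two most common concatenation patterns such work targets (my own illustration, not PyPy code):

import timeit

def concat_naive(n):
    s = ""
    for _ in range(n):
        s += "x"   # repeated concatenation; quadratic on a naive string implementation
    return s

def concat_join(n):
    return "".join("x" for _ in range(n))   # the usual linear-time idiom

print(timeit.timeit(lambda: concat_naive(10000), number=100))
print(timeit.timeit(lambda: concat_join(10000), number=100))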
Can I promise when, how, or how much it'll speed things up? Absolutely not, but on average we manage 10-30% speed improvements release-to-release, which is usually every 4 months or so, so I guess some stuff will get faster; but without you giving me a crystal ball or a time machine, I won't tell you for sure.
Cheers,
fijal

Your comments betray a lot of confusion...
PyPy and CPython currently have very different performance capabilities.
PyPy is currently more than 5x faster than CPython on average.
HotPy has nothing to do with CPython. It's a one-man project and a whole new VM (not yet released, so I can't say anything about its performance).
At the moment there's a lot of activity in the PyPy project, and they are improving it day by day.
There's a numpy port in a very advanced stage of development, they are improving ctypes and Cython compatibility, and soon there will be a complete Python 3 implementation.
I believe PyPy is currently on par with the V8 JavaScript engine and similar projects in terms of performance.
If speed and Python are what you want, pay attention to this project.

The answer is that PyPy is the plan to speed up CPython. PyPy aims to be an extremely conformant Python interpreter which is highly optimized. The project has collected all of the benchmarks it could find and runs all of them for each build of PyPy, to guard against performance regressions. Check it out: http://speed.pypy.org/
I believe that by the time the performance of CPython won't cut it anymore (for web dev work), PyPy will be completely ready for prime time. Raymond Hettinger (a core Python dev) has called PyPy "python with the optimizations turned on".

Related

Pointers and memory management in Python

I started learning Python yesterday, and it seems to abstract away many features that would otherwise be essential in a language like C (which I know).
I understand the appeal of a much simpler syntax, shorter code, and fewer memory-related bugs, but in certain cases I'd prefer to manually allocate, reallocate, and free memory, as well as use pointers. Is there any way to do so in Python 3.10?
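A minimal sketch of the closest standard-library route, via ctypes, assuming ctypes can locate a libc on your platform (the lookup may fail on some systems):

import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signatures so 64-bit pointers survive the round trip intact.
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.realloc.restype = ctypes.c_void_p
libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

buf = libc.malloc(16)          # raw pointer, as in C
ctypes.memset(buf, 0, 16)      # touch the memory directly
buf = libc.realloc(buf, 32)    # grow the allocation
libc.free(buf)                 # and you must free it yourself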

Does the Python 3 interpreter have a JIT feature?

I've found that when I ask more of Python, it doesn't use my machine's resources at 100% and it's not really fast. It's fast compared to many other interpreted languages, but compared to compiled languages I think the difference is really remarkable.
Is it possible to speed things up with a Just In Time (JIT) compiler in Python 3?
Usually a JIT compiler is the only thing that can improve performance in interpreted languages, so that's what I'm asking about, but if other solutions are available I would love to accept new answers.
First off, Python 3(.x) is a language, for which there can be any number of implementations. Okay, to this day no implementation except CPython actually implements those versions of the language. But that will change (PyPy is catching up).
To answer the question you meant to ask: CPython, 3.x or otherwise, does not, never did, and likely never will, contain a JIT compiler. Some other Python implementations (PyPy natively, Jython and IronPython by re-using JIT compilers for the virtual machines they build on) do have a JIT compiler. And there is no reason their JIT compilers would stop working when they add Python 3 support.
But while I'm here, also let me address a misconception:
Usually a JIT compiler is the only thing that can improve performances in interpreted languages
This is not correct. A JIT compiler, in its most basic form, merely removes interpreter overhead, which accounts for some of the slowdown you see, but not for the majority. A good JIT compiler also performs a host of optimizations which remove the overhead needed to implement numerous Python features in general (by detecting special cases which permit a more efficient implementation), prominent examples being dynamic typing, polymorphism, and various introspective features.
Just implementing a compiler does not help with that. You need very clever optimizations, most of which are only valid in very specific circumstances and for a limited time window. JIT compilers have it easy here, because they can generate specialized code at run time (it's their whole point), can analyze the program easier (and more accurately) by observing it as it runs, and can undo optimizations when they become invalid. They can also interact with interpreters, unlike ahead of time compilers, and often do it because it's a sensible design decision. I guess this is why they are linked to interpreters in people's minds, although they can and do exist independently.
There are also other approaches to make Python implementation faster, apart from optimizing the interpreter's code itself - for example, the HotPy (2) project. But those are currently in research or experimentation stage, and are yet to show their effectiveness (and maturity) w.r.t. real code.
And of course, a specific program's performance depends on the program itself much more than on the language implementation. The language implementation only sets an upper bound for how fast you can make a sequence of operations. Generally, you can improve the program's performance much more simply by avoiding unnecessary work, i.e. by optimizing the program. This is true regardless of whether you run the program through an interpreter, a JIT compiler, or an ahead-of-time compiler. If you want something to be fast, don't go out of your way to get a faster language implementation. There are applications which are infeasible with the overhead of interpretation and dynamism, but they aren't as common as you'd think (and often they are solved by selectively calling into compiled machine code).
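A toy illustration of that last point (my own example, not from any particular implementation): the algorithmic fix dwarfs anything an interpreter, JIT, or AOT compiler could contribute:

import timeit

def fib_naive(n):
    # exponential time, no matter which Python implementation runs it
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_memo(n, cache={0: 0, 1: 1}):
    # linear time; avoiding the redundant work beats any faster interpreter
    if n not in cache:
        cache[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return cache[n]

print(timeit.timeit(lambda: fib_naive(25), number=10))
print(timeit.timeit(lambda: fib_memo(25), number=10))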
The only Python implementation that has a JIT is PyPy. But PyPy is both a Python 2 implementation and a Python 3 implementation.
The Numba project should work on Python 3. Although it is not exactly what you asked, you may want to give it a try:
https://github.com/numba/numba/blob/master/docs/source/doc/userguide.rst.
It does not support all Python syntax at this time.
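A minimal sketch of the usual Numba style, as its docs describe it (details may vary between versions):

from numba import jit

@jit                       # compiled to machine code the first time it's called
def sum_range(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(sum_range(10 ** 7))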
You can try the PyPy py3 branch, which is more or less Python compatible, but the official CPython implementation has no JIT.
This will best be answered by some of the remarkable Python developer folks on this site.
Still I want to comment: When discussing speed of interpreted languages, I just love to point to a project hosted at this location: Computer Language Benchmarks Game
It's a site dedicated to running benchmarks. There are specified tasks to do. Anybody can submit a solution in his/her preferred language and then the tests compare the runtime of each solution. Solutions can be peer reviewed, are often further improved by others, and results are checked against the spec. In the long run this is the most fair benchmarking system to compare different languages.
As you can see from indicative summaries like this one, compiled languages are quite fast compared to interpreted languages. However, the difference is probably not so much in the exact type of compilation; it's the fact that Python (and the other languages in the graph slower than Python) is fully dynamic. Objects can be modified on the fly. Types can be modified on the fly. So some type checking has to be deferred to runtime instead of compile time.
So while you can argue about compiler benefits, you have to take into account that there are different features in different languages. And those features may come at an intrinsic price.
Finally, when talking about speed: most often it's not the language and the perceived slowness of a language that's causing the issue, it's a bad algorithm. I never had to switch languages because one was too slow: when there's a speed issue in my code, I fix the algorithm. However, if there are time-consuming, computationally intensive loops in your code, it is usually worth the while to rewrite those in a compiled language. A prominent example is libraries coded in C used by scripting languages (Perl XS libs, or e.g. numpy/scipy for Python; lapack/blas are examples of libs available with bindings for many scripting languages).
If you mean a JIT as in a just-in-time compiler to a bytecode representation, then it has such a feature (since 2.2). If you mean a JIT to machine code, then no. Yet the compilation to bytecode provides a lot of performance improvement. If you want compilation to machine code, then PyPy is the implementation you're looking for.
Note: PyPy doesn't work with Python 3.x
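You can watch that bytecode compilation step yourself with the standard dis module:

import dis

def add(a, b):
    return a + b

dis.dis(add)   # prints the compiled bytecode (e.g. LOAD_FAST, BINARY_ADD, RETURN_VALUE)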
If you are looking for speed improvements in a block of code, then you may want to have a look at rpythonic, which compiles down to C using PyPy. It uses a decorator that turns it into a JIT for Python.

RegEx performance in Objective-C vs Python

This question might seem vague, sorry. Does anybody have experience writing RegEx with Objective-C and Python? I am wondering about the performance of one vs the other? Which is faster in terms of 1. runtime speed, and 2. memory consumption? I have a Mac OS application that is always running in the background, and I'd like my app to index some text files that are being saved, and then save the result... I could write a regex method in my app in Obj-C, or I could potentially write a separate app using Perl or Python (just a beginner in Python).
(Thanks, I got some good info from some of you already. Boo to those who downvoted; I am here to learn, and I might have some stupid questions time to time - part of the deal.)
If you’re looking for raw speed, neither of those two would be a very good choice. For execution speed, you’d choose Perl. For how quickly you could code it up, either Python or Perl alike would easily beat the time to write it in Objective-C, just as both would easily beat a Java solution. High-level languages that take less time to code up are always a win if all you’re measuring is time-to-solution compared with solutions that take many more lines of code.
As far as actual run-time performance goes, Perl’s regexes are written in very tightly coded C, and are known to be the fastest and most flexible regexes available. The regex optimizer does a lot of very clever things to the compiled regex program, such as applying an Aho–Corasick start-point optimization for finding the start of an alternation trie, running in O(1) time. Nobody else does that. Heck, I don’t think anybody else but Perl even bothers to optimize alternations into tries, which is the thing that takes you from O(n) to O(1), because the compiler spent more time doing something smart so that the interpreter runs much faster. Perl regexes also offer substantial improvements in debugging and profiling. They’re also more flexible than Python’s, but the debugging alone is enough to tip the balance.
The only exception on performance matters is with certain pathological patterns that degenerate when run under any recursive backtracker, whether Perl’s, Java’s, or Python’s. Those can be addressed by using the highly recommended RE2 library, written by Russ Cox, as a replacement plugin. I know it’s available as a transparent replacement regex engine for Perl, and I’m pretty sure I remember seeing that it was also available for Python, too.
On the other hand, if you really want to use Python but just want a more expressive and robust regex library, particularly one that is well-behaved on Unicode, then you want to use Matthew Barnett’s regex module, available for both Python2 and Python3. Besides conforming to tr18’s level-1 compliance requirements (that’s the standards doc on Unicode regexes), it also has all kinds of other clever features, some of which are completely sui generis. If you’re a regex connoisseur, it’s very much worth checking out.
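As a small illustration of that module's Unicode support (regex is on PyPI, not the stdlib; the \p{...} script properties below are part of the tr18 level-1 feature set it implements and are not available in the stdlib re):

import regex   # Matthew Barnett's module from PyPI, not the stdlib re

pattern = u"\\p{Greek}+"   # match a run of characters in the Greek script
m = regex.search(pattern, u"alpha \u03b1\u03b2\u03b3 omega")
print(m.group())           # prints the Greek run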
In my Mac OS application I will be doing some text processing, and I was wondering if doing that in Python would be faster.
It will be faster in terms of development time, almost certainly. For nearly all software projects, development time dominates runtime as a measure of success.
If you mean runtime, then you're almost certainly doing premature optimization, unless you've shown that slow code will cause unbearable/noticeable user-interface slowdown.
Premature optimization is the root of all evil.
-- Donald Knuth

Comparing performance between ruby and python code

I have a memory and CPU intensive problem to solve and I need to benchmark the different solutions in ruby and python on different platforms.
To do the benchmark, I need to measure the time taken and the memory occupied by objects (not the entire program, but a selected list of objects) in both python and ruby.
Please recommend ways to do it, and also let me know if it is possible to do it without using OS-specific tools like Task Manager and ps. Thanks!
Update: Yes, I know that both Python and Ruby are not strong in performance and there are better alternatives like C, C++, Java etc. I am actually more interested in comparing the performance of Python and Ruby. And please, no flame-wars.
For Python I recommend heapy
from guppy import hpy
h = hpy()
print h.heap()  # prints a breakdown of all live objects by type, count, and total size
or Dowser or PySizer
For Ruby you can use the BleakHouse Plugin or just read this answer on memory leak debugging (ruby).
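On the Python timing side, the standard library already covers a lot; a minimal sketch (note that sys.getsizeof reports only an object's own footprint, not what it references):

import sys
import timeit

data = [x ** 2 for x in range(100000)]

print(timeit.timeit(lambda: sum(data), number=100))                # wall-clock time of the workload
print(sys.getsizeof(data))                                         # shallow size of the list object
print(sys.getsizeof(data) + sum(sys.getsizeof(x) for x in data))   # rough deep size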
If you really need to write fast code in a language like this (and not a language far more suited to CPU intensive operations and close control over memory usage such as C++) then I'd recommend pushing the bulk of the work out to Cython.
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. Cython is based on the well-known Pyrex, but supports more cutting edge functionality and optimizations. The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code.
That way you can get most of the efficiency of C with most of the ease of use of Python.
If you are using Python for CPU-intensive algorithmic tasks, I suggest using Numpy/Scipy to speed up your numerical calculations and the Psyco JIT compiler for everything else. Your speeds can approach those of much lower-level languages if you use optimized components.
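A quick sketch of the kind of win that means in practice (illustrative only; exact numbers depend on your machine):

import timeit
import numpy as np

def dot_python(a, b):
    # pure-Python loop: every iteration pays interpreter and boxing overhead
    return sum(x * y for x, y in zip(a, b))

a = list(range(100000))
b = list(range(100000))
na, nb = np.array(a), np.array(b)

print(timeit.timeit(lambda: dot_python(a, b), number=10))
print(timeit.timeit(lambda: np.dot(na, nb), number=10))   # one optimized C call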
I'd be wary of trying to measure just the memory consumption of an object graph over the lifecycle of an application. After all, you really don't care about that, in the end. You care that your application, in its entirety, has a sufficiently low footprint.
If you choose to limit your observation of memory consumption anyway, include garbage collector timing in your list of considerations, and look at ruby-prof:
http://ruby-prof.rubyforge.org/
Ciao,
Sheldon.
(You didn't specify Python 2.5, 2.6 or 3, or Ruby 1.8 or 1.9, JRuby, MRI. The JVM has a wealth of tools to attack memory issues. Generally it's helpful to zero in on memory depletion by posting stripped-down versions of programs that replicate the problem.)
Heapy, ruby-prof, and BleakHouse are all good tools; here are others:
Ruby
http://eigenclass.org/R2/writings/object-size-ruby-ocaml
watch ObjectSpace yourself
http://www.coderoshi.com/2007/08/cheap-tricks-ix-spying-on-ruby.html
http://sporkmonger.com/articles/2006/10/22/a-question
(ruby and python)
http://www.softwareverify.com/

Opinions on Unladen Swallow? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 10 years ago.
What are your opinions and expectations on Google's Unladen Swallow? From their project plan:
We want to make Python faster, but we also want to make it easy for large, well-established applications to switch to Unladen Swallow.
Produce a version of Python at least 5x faster than CPython.
Python application performance should be stable.
Maintain source-level compatibility with CPython applications.
Maintain source-level compatibility with CPython extension modules.
We do not want to maintain a Python implementation forever; we view our work as a branch, not a fork.
And even sweeter:
In addition, we intend to remove the GIL and fix the state of multithreading in Python. We believe this is possible through the implementation of a more sophisticated GC.
It almost looks too good to be true, like the best of PyPy and Stackless combined.
More info:
Jesse Noller: "Pycon: Unladen-Swallow"
ArsTechnica: "Google searches for holy grail of Python performance"
Update: as DNS pointed out, there was related question: What is LLVM and How is replacing Python VM with LLVM increasing speeds 5x?
I have high hopes for it.
This is being worked on by several people from Google. Seeing as how the BDFL is also employed there, this is a positive.
Off the bat, they state that this is a branch, and not a fork. As such, it's within the realm of possibility that this will eventually get merged into trunk.
Most importantly, they have a working version. They're using a version of Unladen Swallow right now for YouTube stuff.
They seem to have their shit together. They have a relatively detailed plan for a project at this stage, and they have a list of tests they use to gauge performance improvements and regressions.
I'm not holding my breath on GIL removal, but even if they never get around to that, the speed increases alone make it awesome.
I'm sorry to disappoint you, but when you read PEP 3146, things look bad.
The improvement is by now minimal, and therefore the compiler code gets more complicated.
Also, removing the GIL has many downsides.
Btw, PyPy seems to be faster than Unladen Swallow in some tests.
This question discussed many of the same things. My opinion is that it sounds great, but I'm waiting to see what it looks like, and how long it takes to become stable.
I'm particularly concerned with compatibility with existing code and libraries, and how the library-writing community responds to it. Ultimately, aside from personal hobby projects, it's of zero value to me until it can run all my third-party libraries.
I think the project has noble goals and with enough time (2-3 years), they will probably reach most of them.
They may not be able to merge their branch back into the trunk because Guido's current view is that CPython should be a reference implementation (i.e., it shouldn't do things that are impossible for IronPython and Jython to copy). I've seen reports that this is what kept the cool parts of Stackless from being merged into CPython.
Guido just posted an article to his Twitter account that is an update to the Jesse Noller article posted earlier: http://jessenoller.com/2010/01/06/unladen-swallow-python-3s-best-feature/. Sounds like they are moving ahead with Python 3, as previously mentioned.
They have quarterly releases, so it's not far away; wait and watch, and let them come up with something more than just a plan.
If it indeed comes true, it will be easy to do away with C and C++ even for performance-intensive operations.
Even though it is a Google-sponsored open source project, it surprisingly doesn't involve Guido anywhere.
I think that a 5x speed improvement is not all that important for me personally.
It is not an order-of-magnitude change, although if you consume CPU power at the scale of Google, it can be a worthwhile investment to have some of your staff work on it.
Many of the speed improvements will likely make it into cpython eventually.
Getting rid of the GIL is interesting in principle but will likely reveal lots of problems with modules that are not thread safe once the GIL is removed.
I do not think I will use Unladen Swallow any time soon, but I like how giving attention to performance may improve the regular Python versions.
