Is it reasonable to integrate Python with C for performance?

I like to use Python for almost everything, and I always assumed that if I ever hit a bottleneck in my Python code (due to Python's limitations), I could integrate a C module into my code.
But as I started to read a guide on how to extend Python, the author says:
There are several reasons why one might wish to extend Python in C or C++, such as:
Calling functions in an existing library
Adding a new built-in type to Python
Optimising inner loops in code
Exposing a C++ class library to Python
Embedding Python inside a C/C++ application
Nothing about performance. So I ask again: is it reasonable to integrate Python with C for performance?

In my experience it is rarely necessary to optimize using C. I prefer to identify bottlenecks and improve algorithms in those areas completely in Python. Using hash tables, caching, and generally re-organizing your data structures to suit future needs has amazing potential for speeding up your program. As your program develops you'll get a better sense of what kind of material can be precalculated, so don't be afraid to go back and redo your storage and algorithms. Additionally, look for chances to kill "two birds with one stone", such as sorting objects as you render them instead of doing huge sorts.
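As a minimal sketch of the caching idea (the memoize decorator and the expensive function are hypothetical stand-ins, not from the original post):
def memoize(fn):
    # Cache results keyed by argument so repeat calls are dictionary lookups
    cache = {}
    def wrapper(arg):
        if arg not in cache:
            cache[arg] = fn(arg)
        return cache[arg]
    return wrapper

@memoize
def expensive(n):
    # Stand-in for a slow computation
    return sum(i * i for i in range(n))
The first call to expensive(10000) pays the full cost; later calls with the same argument are just dictionary lookups.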
When everything is tuned as well as you know how, I'd consider using an optimizer like Psyco. I've experienced literally 10x performance improvements just by using Psyco and adding one line to my program.
If all else fails, use C in the proper places and you'll get what you want.

* Optimising inner loops in code
Isn't that about performance?

Performance is a broad topic, so you should be more specific. If the bottleneck in your program involves a lot of networking, then rewriting it in C/C++ probably won't make a difference, since it's the network calls taking up time, not your code. You would be better off rewriting the slow section of your program to use fewer network calls, thus reducing the time your program spends waiting on network I/O. If you're doing math-intensive work such as solving differential equations, and you know there are C libraries that offer better performance than the way you are currently doing it in Python, you may want to rewrite that section of your program to use those libraries.
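As a minimal sketch of leaning on a compiled library instead of a hand-written Python loop (assuming SciPy is installed; odeint wraps a compiled ODE solver):
import numpy as np
from scipy.integrate import odeint

# dy/dt = -y with y(0) = 1 has the exact solution y(t) = exp(-t)
def decay(y, t):
    return -y

t = np.linspace(0.0, 5.0, 50)
y = odeint(decay, 1.0, t)  # the integration stepping runs in compiled code
The Python function decay is only called to evaluate the derivative; the expensive stepping logic stays in compiled code.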

The C extension API is notoriously hard to work with, but there are a number of other ways to integrate C code.
For some more usable alternatives see http://www.scipy.org/PerformancePython, in particular the section about using Weave for easy inlining of C code.
Also of interest is Cython, which provides a nice system for integrating with C code. Cython is used for optimization by some well-respected high-performance Python projects such as NumPy and Sage.
As mentioned above, Psyco is another attractive option for optimization, and one which requires nothing more than
import psyco
psyco.bind(myfunction)
Psyco will identify your inner loops and automatically substitute optimized versions of the routines.
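Psyco could also be applied to a whole program rather than to one function, a sketch assuming the (Python 2-only) psyco package:
import psyco
psyco.full()  # compile every function as it is first executed
Note that psyco.full() trades more memory for broader coverage than binding individual functions.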

C can definitely speed up processor bound tasks. Integrating is even easier now, with the ctypes library, or you could go for any of the other methods you mention.
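A minimal ctypes sketch, calling the C library's strlen directly (assuming a Unix-like system where find_library can locate libc):
import ctypes
import ctypes.util

# Load the C standard library; name resolution varies by platform
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature so ctypes converts arguments correctly
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello, world"))  # prints 12
The same pattern works for your own shared library: compile it with -shared and load it with ctypes.CDLL.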
I feel mercurial has done a good job with the integration if you want to look at their code as an example. The compute intensive tasks are in C, and everything else is python.

You will gain a large performance boost using C from Python (assuming your code is well written, etc.) because Python is interpreted at run time, whereas C is compiled beforehand. This speeds things up quite a bit because with C your code is simply running, whereas with Python the interpreter must work out what you are doing and dispatch each operation as it goes.

I've been told to use C for the calculation-heavy portions and Python for the scripting. So yes, you can integrate both; C is capable of much faster computation than Python.

Related

Benefits of accessing the Abstract Syntax Tree (AST). How does Julia exploit it?

I have read that Julia has access to the AST of the code it runs. What exactly does this mean? Is it that the runtime can access it, that code itself can access it, or both?
Building on this:
Is this a key difference of Julia with respect to other dynamic languages, specifically Python?
What are the practical benefits of being able to access the AST?
What would be a good example of something that you can't easily do in Python, but that you can do in Julia, because of this?
What distinguishes Julia from languages like Python is that Julia allows you to intercept code before it is evaluated. Macros are just functions, written in Julia, which let you access that code and manipulate it before it runs. Furthermore, rather than treating code as a string (like "f(x)"), it's provided as a Julian object (like Expr(:call, :f, :x)).
There are plenty of things this allows which just aren't possible in Python. The main ones are:
You can do more work at compile time, increasing performance
Two good examples of this are regexes and printf. Both of these take a format specification of some kind and interpret it in some way. Now, these can fairly straightforwardly be implemented as functions, which might look like this:
match(Regex(".*"), str)
printf("%d", num)
The problem with this is that these specifications must be re-interpreted every time the statement is run. Every time the interpreter goes over this block, the regex must be re-compiled into a state machine, and the format must be run through a mini-interpreter. On the other hand, if we implement these as macros:
match(r".*", str)
@printf("%d", num)
Then the r and @printf macros will intercept the code at compile time, and run their respective interpreters then. The regex turns into a fast state machine, and the @printf statement turns into a simple println(num). At run time the minimum of work is done, so the code is blazing fast. Now, other languages are able to provide fast regexes, for example, by providing special syntax for it – but the fact that they're not special-cased in Julia means that developers can use the same techniques in their own code.
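For contrast, the closest Python gets is hoisting the compilation step yourself, as in this sketch:
import re

PATTERN = re.compile(r"[a-z]+")  # compiled once, at module load

def scan(s):
    return PATTERN.match(s)
Python's re module also keeps an internal cache of recently compiled patterns, but the cache lookup still happens on every call; a Julia macro moves that work to compile time entirely.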
You can make mini-compilers for, well, pretty much anything
Languages with macros tend to have more capable embedded DSLs, because you can change the semantics of the language at will. For example, the algebraic modelling language JuMP.jl. Clojure has some neat examples of this too, like its embedded logic programming language. Mathematica.jl even embeds Mathematica's semantics in Julia, so that you can write really natural symbolic expressions like @Integrate(log(x), {x,0,2}). You can fake this to a point in Python (SymPy does a good job), but not as cleanly or as efficiently.
If that doesn't convince you, consider that someone managed to implement an interactive Julia debugger in pure Julia using macros. Try that in Python.
Edit: Another great example of something that's difficult in other languages is Cartesian.jl, which lets you write generic algorithms across arrays of any number of dimensions.
I am not familiar with Julia and only first heard of it with your question, but this sounds an awful lot like Lisp (and indeed Julia seems to be a new grandchild/dialect of Lisp from what I'm reading) and its powerful macros. The ability to access the AST at run/compile time brings a whole new dimension to the programmer's code: metaprogramming.
See http://docs.julialang.org/en/latest/manual/metaprogramming/ and especially http://docs.julialang.org/en/latest/manual/metaprogramming/#macros for some of the practical uses. Basically you can inject or modify code in places where it would be impossible for Python/R to do the same.
Example: loop unrolling without any copy and paste, driven by a compile-time argument that lets you easily vary how much you want to unroll the loop.
Here's an excellent resource on Julia metaprogramming: https://en.wikibooks.org/wiki/Introducing_Julia/Metaprogramming

Statistical profiler for PyPy

I would like to use statprof.py for profiling code in PyPy. Unfortunately, it does not seem to work, the line numbers it points to are off. Does anyone know how to make it work or know of an alternative?
It's likely that "the line numbers are off" because PyPy, in JITted code, will inline many functions and will only deliver signals (here from the timer) at the end of the loops. Compare this with CPython, which delivers the signals between two random bytecodes -- occasionally at the end of the loops too, but generally anywhere. So what you get on PyPy is the same as what you'd get on CPython if you constrained the signal handlers to run only at the "end of loop" bytecode.
This is why this kind of profiling will seem to always miss a lot of functions, like most functions with no loop in them.
You can try to use the built-in cProfile module. It of course comes with a bigger performance hit than statistical profiling, but try it anyway; it doesn't prevent JITting, for example, so the performance hit should still be reasonable.
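A minimal usage sketch (the work function is a hypothetical stand-in for your own code):
import cProfile

def work():
    return sum(i * i for i in range(10 ** 6))

cProfile.run("work()", sort="cumulative")
You can also profile a whole script from the command line with python -m cProfile myscript.py (or the pypy equivalent).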
More generally, I don't see an easy way to implement the equivalent of statistical profiling in PyPy. It's quite hard to make sense of it in the presence of functions that are inlined into each other and then optimized globally... I'd be interested to hear if you find a tool that actually does statistical profiling, for some other high-level language, on a VM with a tracing JIT.
We could record enough information to track each small group of assembler instructions back to the real Python function it comes from, and then use hacks to inspect the current Instruction Pointer (IP) at the machine level. Not impossible, but serious work :-)

Does the Python 3 interpreter have a JIT feature?

I've found that when I ask Python to do heavier work, it doesn't use my machine's resources at 100% and it's not really fast. It's fast compared to many other interpreted languages, but when compared to compiled languages I think the difference is really remarkable.
Is it possible to speed things up with a Just-In-Time (JIT) compiler in Python 3?
Usually a JIT compiler is the only thing that can improve performance in interpreted languages, so that's the one I'm referring to; if other solutions are available I would love to accept new answers.
First off, Python 3(.x) is a language, for which there can be any number of implementations. Okay, to this day no implementation except CPython actually implements those versions of the language. But that will change (PyPy is catching up).
To answer the question you meant to ask: CPython, 3.x or otherwise, does not, never did, and likely never will, contain a JIT compiler. Some other Python implementations (PyPy natively, Jython and IronPython by re-using JIT compilers for the virtual machines they build on) do have a JIT compiler. And there is no reason their JIT compilers would stop working when they add Python 3 support.
But while I'm here, also let me address a misconception:
Usually a JIT compiler is the only thing that can improve performance in interpreted languages
This is not correct. A JIT compiler, in its most basic form, merely removes interpreter overhead, which accounts for some of the slowdown you see, but not for the majority. A good JIT compiler also performs a host of optimizations which remove the overhead needed to implement numerous Python features in general (by detecting special cases which permit a more efficient implementation), prominent examples being dynamic typing, polymorphism, and various introspective features.
Just implementing a compiler does not help with that. You need very clever optimizations, most of which are only valid in very specific circumstances and for a limited time window. JIT compilers have it easy here, because they can generate specialized code at run time (it's their whole point), can analyze the program easier (and more accurately) by observing it as it runs, and can undo optimizations when they become invalid. They can also interact with interpreters, unlike ahead of time compilers, and often do it because it's a sensible design decision. I guess this is why they are linked to interpreters in people's minds, although they can and do exist independently.
There are also other approaches to make Python implementation faster, apart from optimizing the interpreter's code itself - for example, the HotPy (2) project. But those are currently in research or experimentation stage, and are yet to show their effectiveness (and maturity) w.r.t. real code.
And of course, a specific program's performance depends on the program itself much more than on the language implementation. The language implementation only sets an upper bound on how fast a given sequence of operations can run. Generally, you can improve the program's performance much more simply by avoiding unnecessary work, i.e. by optimizing the program. This is true regardless of whether you run the program through an interpreter, a JIT compiler, or an ahead-of-time compiler. If you want something to be fast, don't go out of your way to get a faster language implementation. There are applications which are infeasible with the overhead of interpretation and dynamism, but they aren't as common as you'd think (and they are often solved by selectively calling into compiled machine code).
The only Python implementation that has a JIT is PyPy. But PyPy is both a Python 2 implementation and a Python 3 implementation.
The Numba project should work on Python 3. Although it is not exactly what you asked, you may want to give it a try:
https://github.com/numba/numba/blob/master/docs/source/doc/userguide.rst.
It does not support all Python syntax at this time.
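A minimal sketch of the Numba approach (assuming numba is installed; the total function is a hypothetical example):
from numba import jit

@jit(nopython=True)  # compile to machine code, with no Python-object fallback
def total(n):
    s = 0.0
    for i in range(n):
        s += i * i
    return s

print(total(10000000))  # the first call triggers compilation, later calls are fast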
You can try the PyPy py3 branch, which is more or less Python-compatible, but the official CPython implementation has no JIT.
This will best be answered by some of the remarkable Python developer folks on this site.
Still I want to comment: When discussing speed of interpreted languages, I just love to point to a project hosted at this location: Computer Language Benchmarks Game
It's a site dedicated to running benchmarks. There are specified tasks to do. Anybody can submit a solution in his/her preferred language and then the tests compare the runtime of each solution. Solutions can be peer reviewed, are often further improved by others, and results are checked against the spec. In the long run this is the most fair benchmarking system to compare different languages.
As you can see from indicative summaries like this one, compiled languages are quite fast compared to interpreted languages. However, the difference is probably not so much in the exact type of compilation, it's the fact that Python (and the others in the graph slower than python) are fully dynamic. Objects can be modified on the fly. Types can be modified on the fly. So some type checking has to be deferred to runtime, instead of compile time.
So while you can argue about compiler benefits, you have to take into account that there are different features in different languages. And those features may come at an intrinsic price.
Finally, when talking about speed: most often it's not the language and the perceived slowness of a language that's causing the issue, it's a bad algorithm. I never had to switch languages because one was too slow: when there's a speed issue in my code, I fix the algorithm. However, if there are time-consuming, computationally intensive loops in your code, it is usually worthwhile to rewrite those in a compiled language. A prominent example is libraries coded in C and used by scripting languages (Perl XS libs, or e.g. numpy/scipy for Python; lapack/blas are examples of libs available with bindings for many scripting languages).
If you mean JIT as in a just-in-time compiler to a bytecode representation, then it has such a feature (since 2.2): CPython compiles modules to bytecode and caches the result in .pyc files. If you mean JIT to machine code, then no. The bytecode compilation at least avoids re-parsing the source on every run. If you want compilation to machine code, then PyPy is the implementation you're looking for.
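You can see that step explicitly with the standard library (the file name here is a hypothetical example):
import py_compile
py_compile.compile("myscript.py")  # writes the cached bytecode (.pyc) file
Normally CPython does this automatically the first time a module is imported.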
Note: PyPy doesn't work with Python 3.x
If you are looking for speed improvements in a block of code, then you may want to have a look at rpythonic, which compiles down to C using the PyPy toolchain. It uses a decorator that converts the function into a JIT for Python.

Comparing performance between Ruby and Python code

I have a memory- and CPU-intensive problem to solve and I need to benchmark the different solutions in Ruby and Python on different platforms.
To do the benchmark, I need to measure the time taken and the memory occupied by objects (not the entire program, but a selected list of objects) in both Python and Ruby.
Please recommend ways to do it, and also let me know if it is possible to do it without using OS-specific tools like Task Manager and ps. Thanks!
Update: Yes, I know that both Python and Ruby are not strong in performance and that there are better alternatives like C, C++, Java etc. I am actually more interested in comparing the performance of Python and Ruby. And please, no flame wars.
For Python, I recommend heapy:
from guppy import hpy
h = hpy()        # create a heap inspection context
print h.heap()   # print a breakdown of the objects currently on the heap
or Dowser or PySizer
For Ruby you can use the BleakHouse Plugin or just read this answer on memory leak debugging (ruby).
If you really need to write fast code in a language like this (rather than a language far more suited to CPU-intensive operations and close control over memory usage, such as C++), then I'd recommend pushing the bulk of the work out to Cython.
Cython is a language that makes writing C extensions for the Python language as easy as Python itself. Cython is based on the well-known Pyrex, but supports more cutting edge functionality and optimizations.
The Cython language is very close to the Python language, but Cython additionally supports calling C functions and declaring C types on variables and class attributes. This allows the compiler to generate very efficient C code from Cython code.
That way you can get most of the efficiency of C with most of the ease of use of Python.
If you are using Python for CPU-intensive algorithmic tasks I suggest using Numpy/Scipy to speed up your numerical calculations and the Psyco JIT compiler for everything else. Your speeds can approach those of much lower-level languages if you use optimized components.
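A sketch of the difference vectorization makes (assuming NumPy is installed):
import numpy as np

a = np.arange(1000000, dtype=np.float64)

# Vectorized: the loop runs inside NumPy's compiled C code
s_fast = np.dot(a, a)

# Pure Python equivalent: every iteration goes through the interpreter
s_slow = sum(x * x for x in a)
Both compute the same sum of squares, but the vectorized form avoids a million trips through the interpreter.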
I'd be wary of trying to measure just the memory consumption of an object graph over the lifecycle of an application. After all, you really don't care about that, in the end. You care that your application, in its entirety, has a sufficiently low footprint.
If you choose to limit your observation of memory consumption anyway, include garbage collector timing in your list of considerations, then look at ruby-prof:
http://ruby-prof.rubyforge.org/
Ciao,
Sheldon.
(You didn't specify Python 2.5, 2.6 or 3, or Ruby 1.8 or 1.9, JRuby, MRI. The JVM has a wealth of tools to attack memory issues. Generally it's helpful to zero in on memory depletion by posting stripped-down versions of programs that replicate the problem.)
Heapy, ruby-prof, and BleakHouse are all good tools; here are others:
Ruby
http://eigenclass.org/R2/writings/object-size-ruby-ocaml
watch ObjectSpace yourself
http://www.coderoshi.com/2007/08/cheap-tricks-ix-spying-on-ruby.html
http://sporkmonger.com/articles/2006/10/22/a-question
(Ruby and Python)
http://www.softwareverify.com/

Scripting language choice for initial performance [closed]

I have a small lightweight application that is used as part of a larger solution. Currently it is written in C but I am looking to rewrite it using a cross-platform scripting language. The solution needs to run on Windows, Linux, Solaris, AIX and HP-UX.
The existing C application works fine but I want to have a single script I can maintain for all platforms. At the same time, I do not want to lose a lot of performance but am willing to lose some.
Startup cost of the script is very important. This script can be called anywhere from every minute to many times per second. As a consequence, keeping its memory usage and startup time low is important.
So basically I'm looking for the best scripting languages that is:
Cross platform.
Capable of XML parsing and HTTP Posts.
Low memory and low startup time.
Possible choices include but are not limited to: bash/ksh + curl, Perl, Python and Ruby. What would you recommend for this type of a scenario?
Lua is a scripting language that meets your criteria. It's among the fastest and lowest-memory scripting languages available.
Because of your requirement for fast startup time and a calling frequency greater than 1Hz I'd recommend either staying with C and figuring out how to make it portable (not always as easy as a few ifdefs) or exploring the possibility of turning it into a service daemon that is always running. Of course this depends on how
Python can have lower startup times if you compile the module and run the .pyc file, but it is still generally considered slow. Perl, in my experience, is the fastest of the scripting languages, so you might have good luck with a Perl daemon.
You could also look at cross-platform frameworks like GTK, wxWidgets and Qt. While they are targeted at GUIs, they do have low-level cross-platform data types and network libraries that could make the job of using a fast C-based application easier.
"called anywhere from every minute to many times per second. As a consequence, keeping it's memory and startup time low are important."
This doesn't sound like a script to me at all.
This sounds like a server handling requests that arrive from every minute to several times a second.
If it's a server, handling requests, start-up time doesn't mean as much as responsiveness. In which case, Python might work out well, and still keep performance up.
Rather than restarting, you're just processing another request. You get to keep as much state as you need to optimize performance.
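A minimal sketch of that server approach using only Python's standard library (this is the Python 3 spelling; the equivalent classes live in the BaseHTTPServer module in Python 2):
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Stand-in for the real XML parsing / HTTP posting work
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")

HTTPServer(("localhost", 8000), Handler).serve_forever()
The interpreter starts once, modules are imported once, and each request only pays for the work it actually does.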
When written properly, C should be platform-independent and would only need a recompile for those different platforms. You might have to jump through some #ifdef hoops for the headers (not all systems use the same headers), but most normal (non-Win32-API) calls are very portable.
For web access (which I presume you need, as you mention bash + curl), you could take a look at libcurl; it's available for all the platforms you mentioned and shouldn't be that hard to work with.
With execution time and memory cost in mind, I doubt you could go any faster than properly written C with any scripting language as you would lose at least some time on interpreting the script...
I concur on Lua: it is super-portable, it has XML libraries, either native or by binding C libraries like Expat, it has a good socket library (LuaSocket) plus, for complex stuff, some cURL bindings, and it is well known for being very lightweight (often embedded in low-memory devices), very fast (one of the fastest scripting languages), and powerful. And very easy to code!
It is coded in pure ANSI C, and many people claim it has one of the best C binding APIs (calling C routines from Lua, calling Lua code from C...).
If low memory and low startup time are truly important, you might want to consider doing the work to keep the C code cross-platform; however, I have found this is rarely necessary.
Personally, I would use Ruby or Python for this type of job; they both make it very easy to write clear, understandable code that others can maintain (or that you can maintain after not looking at it for 6 months). If you have the control to do so, I would also suggest getting the latest version of the interpreter, as both Ruby and Python have made notable performance improvements recently.
It is a bit of a personal thing. Programming Ruby makes me happy, C code does not (nor bash scripting for anything non-trivial).
As others have suggested, daemonizing your script might be a good idea; that would reduce the startup time to virtually zero. Either have a small C wrapper that connects to your daemon and transmits the request back and forth, or have the daemon handle requests directly.
It's not clear if this is intended to handle HTTP requests; if so, Perl has a good HTTP server module, bindings to several different C-based XML parsers, and blazing fast string support. (If you don't want to daemonize, it has a good, full-featured CGI module; if you have full control over the server it's running on, you could also use mod_perl to implement your script as an Apache handler.) Ruby's strings are a little slower, but there are some really good backgrounding tools available for it. I'm not as familiar with Python, I'm afraid, so I can't really make any recommendations about it.
In general, though, I don't think you're as startup-time-constrained as you think you are. If the script is really being called several times a second, any decent interpreter on any decent operating system will be cached in memory, as will the source code of your script and its modules. Result: the startup times won't be as bad as you might think.
Dagny:~ brent$ time perl -MCGI -e0
real 0m0.610s
user 0m0.036s
sys 0m0.022s
Dagny:~ brent$ time perl -MCGI -e0
real 0m0.026s
user 0m0.020s
sys 0m0.006s
(The parameters to the Perl interpreter load the rather large CGI module and then execute the line of code '0;'.)
Python is good. I would also check out The Computer Languages Benchmarks Game website:
http://shootout.alioth.debian.org/
It might be worth spending a bit of time understanding the benchmarks (including numbers for startup times and memory usage). Lots of languages are compared such as Perl, Python, Lua and Ruby. You can also compare these languages against benchmarks in C.
I agree with others in that you should probably try to make this a more portable C app instead of porting it over to something else since any scripting language is going to introduce significant overhead from a startup perspective, have a much larger memory footprint, and will probably be much slower.
In my experience, Python is the most efficient of the three, followed by Perl and then Ruby with the difference between Perl and Ruby being particularly large in certain areas. If you really want to try porting this to a scripting language, I would put together a prototype in the language you are most comfortable with and see if it comes close to your requirements. If you don't have a preference, start with Python as it is easy to learn and use and if it is too slow with Python, Perl and Ruby probably won't be able to do any better.
Remember that if you choose Python, you can also extend it in C if the performance isn't great. Heck, you could probably even reuse some of the code you have right now. Just recompile it and wrap it using Pyrex.
You can also do this fairly easily in Ruby, and in Perl (albeit with some more difficulty). Don't ask me about ways to do this though.
Can you instead have it be a long-running process that answers HTTP or RPC requests?
This would satisfy the latency requirements in almost any scenario, but I don't know if that would break your memory footprint constraints.
At first sight, this sounds like over-engineering; as a rule of thumb, I suggest fixing things only when they are broken.
You have an already working application. Apparently you want to call the functionality it provides from several more sources. It looks like the description of a service to me (and a service may be easier to maintain).
Finally, you also mentioned that this is part of a larger solution, so you may want to reuse the language and facilities of that larger solution. From the description you gave (XML + HTTP) it seems a fairly ordinary application that could be written in any general-purpose language (maybe a web container in Java?).
Some libraries can help you to make your code portable:
Boost,
Qt
more details may trigger more ideas :)
Port your app to Ruby. If your app is too slow, profile it and rewrite those parts in C.
