I have recently stumbled upon the Boost.odeint library and I am surprised by the number of possibilities and its configurability. However, having used scipy.integrate.odeint extensively (which is essentially a wrapper around ODEPACK in Fortran), I wonder how their performance compares. I know that Boost.odeint also comes with parallelization, which is not possible with scipy (as far as I know) and which would increase performance a lot, but I am asking about the single-core case.
However, since I would have to wrap Boost.odeint into Python in that case (using Cython or Boost.Python), maybe someone here has done that already? This would be a great achievement, since all the analysis possibilities are much more advanced in Python.
As far as I can tell from comparing the lists of available steppers for
Boost.odeint and scipy.integrate.ode, the only algorithm implemented by both is
the Dormand-Prince fifth order stepper, dopri5. You could compare the
efficiency of the two implementations of this algorithm in Python by using
this Cython wrapper to Boost.odeint (it does not expose all the
steppers provided by Boost.odeint, but does expose dopri5).
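For the scipy half of such a comparison, a minimal timing sketch might look like the following; it uses the real scipy.integrate.ode API, while the Boost.odeint half would go through the Cython wrapper mentioned above, whose interface is not shown here. The test problem (a Van der Pol oscillator) and the tolerances are arbitrary choices for illustration.

    import time
    from scipy.integrate import ode

    def rhs(t, y):
        # Van der Pol oscillator with mu = 1 (non-stiff), a common test problem
        return [y[1], (1.0 - y[0] ** 2) * y[1] - y[0]]

    solver = ode(rhs).set_integrator('dopri5', atol=1e-8, rtol=1e-8)
    solver.set_initial_value([2.0, 0.0], 0.0)

    start = time.perf_counter()
    while solver.successful() and solver.t < 100.0:
        solver.integrate(solver.t + 0.1)  # step the solution forward
    print(f"scipy dopri5: {time.perf_counter() - start:.3f} s")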
Depending on your definition of "testing performance" you could also compare
different algorithms, but this is obviously not the same thing as comparing
two implementations of the same algorithm.
I've read online that Scala is faster than Python, e.g. here. I've also seen a comparison between different front ends that concluded that R was so slow that the tester gave up trying to measure its performance (here; although this was specifically a test for user-defined functions and may not have used the sparklyr package).
I also know that sparklyr now has arrow integration, which has led to performance improvements for user-defined functions as well as for copying data to/from the cluster, as shown here.
My question: how fast is sparklyr compared to Python/Scala? I'm mostly interested in standard 'out-of-the-box' functions, but would also be interested to know how it stacks up for user-defined functions now that arrow has been integrated. And are there particular circumstances in which it performs well or badly?
I ask because I've built an app in sparklyr that is slower than I would have hoped despite lots of tinkering with tuning parameters, and I'm wondering if this is partly because of limitations in the package.
On a local PC, most of the time, R is much faster than Spark.
The Julia language's syntax looks very similar to Python's, while its concept of a class (if one should call it that) is closer to what you use in C. There were many reasons why the creators decided against classical OOP. Still, would it have been so hard (compared to creating Julia in the first place, which is impressive) to find some canonical way to translate Python to Julia and thus get hold of all the Python libraries?
Yes. The design of Python makes it fundamentally difficult to optimize at compile-time (i.e. before you run the code). It is simply false that Julia is fast BECAUSE of its JIT. Rather, Julia is designed with its type system and multiple dispatch in mind so that the compiler can know all of the necessary details to compile "the same code you would have written in C". That's what makes it fast: the type system. It makes a few trade-offs that allow it, in "type-stable" functions, to fully deduce the type of every variable, know what the memory layout of each type should be (including parametric types, so Vector{Float64} has a memory layout determined by the type and its parameter, which inlines Float64 values like a NumPy array, except this is generalized in a way that your own struct types get the same efficiency), and compile a version of the code specifically for the types which are seen.
There are many ways in which this is at odds with Python. For example, if the number of fields in a struct could change, then the memory layout could not be determined and these optimizations could not occur at "compile-time". Julia was painstakingly designed to make sure that it would have type inferrability, and it uses that to generate code which is fully typed and to remove all runtime checks (in type-stable functions; when a function is not type-stable, the types of the variables become dynamic rather than static and it slows down to Python-like speeds). In this sense, Julia actually isn't even optimized yet: all of its performance comes "for free" given the design of its type system. Python/MATLAB/R have to try really hard to optimize at runtime because they don't have the capability to do these deductions. In fact, those languages are "better optimized" right now in terms of runtime optimizations, but no one has really worked on runtime optimizations in Julia yet because in most performance-sensitive cases you can get it all at compile time.
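To illustrate the dynamic-fields point in plain Python (a deliberately trivial sketch): instances can gain attributes at any time, so no fixed memory layout can be pinned down before the code runs.

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    p.z = 3.0       # a new field appears at runtime, so no fixed layout exists
    print(vars(p))  # {'x': 1.0, 'y': 2.0, 'z': 3.0}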
So then, what about Numba? Numba tries to take the route that Julia takes, but with Python code, by limiting what can be done so that it can get type-stable code and compile that efficiently. However, this means a few things. First of all, it's not compatible with all Python code or libraries. More importantly, since Python is not a language built around its type system, the tools for controlling code at the level of types are much reduced. So Numba doesn't have parametric vectors or generic code which auto-specializes via multiple dispatch, because these aren't features of the language. That also means it cannot make full use of this design, which limits how much it can do. It handles the "use only floating point arrays" case just fine, but you can see it gets limited if you want one piece of code to produce efficient code for "any number type, even ones I don't know about". By design, Julia does this automatically.
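A small illustration of the kind of type-specialized compilation Numba does (the function and data here are made up for the example; numba must be installed):

    import numpy as np
    from numba import njit

    @njit
    def sum_sq(xs):
        total = 0.0
        for x in xs:       # this Python-level loop is compiled to machine code
            total += x * x
        return total

    xs = np.random.rand(1_000_000)
    sum_sq(xs)                # first call triggers compilation for float64[:]
    print(sum_sq.signatures)  # the concrete type signatures compiled so far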
So at the core, Julia and Python are extremely different languages. It can be hard to see because Julia's syntax is close to Python's syntax, but they do not work the same at all.
This is a short summary of what I have described in a few blog posts. These go into more detail and show you how Julia actually generates efficient code, how it gives you a generic "Python-looking style" while keeping full inferrability all the way down, and what the trade-offs are.
How type-stability plus multiple dispatch gives performance:
http://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia
http://www.stochasticlifestyle.com/7-julia-gotchas-handle/
How the type system allows for highly performant generic designs:
http://www.stochasticlifestyle.com/type-dispatch-design-post-object-oriented-programming-julia/
I'm working on improving the performance of a Python program, and I was wondering if there is a way to improve performance by disabling array bounds checking. I know that some versions of Pascal let you do this. Does Python have any such feature?
I know that Python isn't really designed for high performance, but I'd like to know if it's possible to improve performance in this manner; otherwise yes, I am aware that switching to C would be faster.
Disabling array bounds checking cannot be done in Python as far as I know.
It can, however, be done in Cython fairly easily with the file-level directive

    # cython: boundscheck=False
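The same directive can also be applied per function using Cython's "pure Python" mode decorators; a minimal sketch (the summing function is just an example, and the speedup only materializes once the module is actually compiled with Cython):

    import cython

    @cython.boundscheck(False)  # skip index bounds checks in this function
    @cython.wraparound(False)   # skip negative-index handling as well
    def total(xs: cython.double[:]) -> cython.double:
        s: cython.double = 0.0
        for i in range(xs.shape[0]):
            s += xs[i]          # unchecked indexing after compilation
        return s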
However, since "premature optimization is the root of all evil." (Knuth), you might want to first check if your script runs faster by switching to pypy instead of python.
For ordinary lists, no. However, you can create your own data structures / functions in C or C++ using the Python C API (see Extending Python with C or C++), which you could use to implement a dangerous_unchecked_list data structure that behaves like a regular list but without this checking (or some other higher-level data structure that needs to bypass bounds-checking internally). For much larger operations that can't be implemented optimally in Python, you could also use interprocess communication (IPC) -- as opposed to the C API -- to invoke a non-Python implementation.
However, before going down this route, you should make sure that this is truly the performance bottleneck. You may find that there are other areas where you can get bigger wins, such as by using a better algorithm, by choosing other functions or data structures that actually are the performance bottleneck to implement natively, by precomputing and/or caching information, or by some other means.
I'm having a lot of fun learning Python by writing a genetic programming type of application.
I've had some great advice from Torsten Marek, Paul Hankin and Alex Martelli on this site.
The program has 4 main functions:
generate (randomly) an expression tree.
evaluate the fitness of the tree
crossbreed
mutate
As generate, crossbreed and mutate all call 'evaluate the fitness', it is the busiest function and is the primary speed bottleneck.
As is the nature of genetic algorithms, it has to search an immense solution space, so the faster the better. I want to speed up each of these functions, and I'll start with the fitness evaluator. My question is: what is the best way to do this? I've been looking into Cython, ctypes and 'linking and embedding'. They are all new to me and quite beyond me at the moment, but I look forward to learning one and eventually all of them.
The 'fitness function' needs to compare the value of the expression tree to the value of the target expression. So it will consist of a postfix evaluator which reads the tree in postfix order. I have all the code in Python.
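For concreteness, a hypothetical minimal postfix evaluator in plain Python; the flat token-list format used here (e.g. [3.0, 'x', '+', 2.0, '*']) is an assumption for illustration, not necessarily the representation in the actual program:

    import operator

    OPS = {'+': operator.add, '-': operator.sub,
           '*': operator.mul, '/': operator.truediv}

    def eval_postfix(tokens, variables):
        stack = []
        for tok in tokens:
            if tok in OPS:                 # operator: pop two, push the result
                b = stack.pop()
                a = stack.pop()
                stack.append(OPS[tok](a, b))
            elif isinstance(tok, str):     # variable name: look up its value
                stack.append(variables[tok])
            else:                          # numeric constant
                stack.append(tok)
        return stack[0]

    print(eval_postfix([3.0, 'x', '+', 2.0, '*'], {'x': 1.0}))  # (3 + x) * 2 = 8.0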
I need advice on which I should learn and use now: cython, ctypes or linking and embedding.
Thank you.
Ignore everyone else's answers for now. The first thing you should learn to use is the profiler. Python comes with profile/cProfile; you should learn how to read the results and analyze where the real bottlenecks are. The goal of optimization is three-fold: reduce the time spent on each call, reduce the number of calls to be made, and reduce memory usage to reduce disk thrashing.
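For reference, a minimal sketch of running cProfile (the my_ga.py name and the main() stub below are placeholders for your actual entry point):

    # From the shell: python -m cProfile -s cumulative my_ga.py
    # Or from inside the program:
    import cProfile
    import pstats

    def main():
        sum(i * i for i in range(10 ** 6))  # stand-in for your GA loop

    cProfile.run('main()', 'ga.prof')  # run under the profiler, write stats to a file
    pstats.Stats('ga.prof').sort_stats('cumulative').print_stats(20)  # top 20 hot spots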
The first goal is relatively easy: the profiler will show you the most time-consuming functions, and you can go straight to those functions to optimize them.
The second and third goals are harder, since they mean you need to change the algorithm to reduce the need to make so many calls. Find the functions that have a high number of calls and try to find ways to reduce the need to call them. Utilize the built-in collections; they're very well optimized.
If you're doing a lot of number and array processing, you should take a look at the pandas, NumPy/SciPy and gmpy third-party modules; they're well-optimised C libraries for processing arrays/tabular data.
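To make the NumPy suggestion concrete, a toy example of the kind of win it points at: replacing an interpreted Python loop with a single vectorized call.

    import numpy as np

    xs = np.random.rand(1_000_000)
    slow = sum(x * x for x in xs)  # interpreted Python loop over a million floats
    fast = float(np.dot(xs, xs))   # a single call into optimized C/BLAS code
    print(slow, fast)              # same value up to floating-point rounding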
Another thing you want to try is PyPy. PyPy has a JIT compiler and can do much more advanced optimisation than CPython, and it'll work without you needing to change your Python code. Though well-optimised code targeting CPython can look quite different from well-optimised code targeting PyPy.
Next to try is Cython. Cython is a slightly different language from Python; in fact, Cython is probably best described as C with a typed Python-like syntax.
For parts of your code that are in very tight loops that you can no longer optimize any other way, you may want to rewrite them as a C extension. Python has very good support for extending with C. In PyPy, the best way to extend is with cffi.
Cython is the quickest way to get the job done, either by writing your algorithm directly in Cython, or by writing it in C and binding it to Python with Cython.
My advice: learn Cython.
Another great option is boost::python which lets you easily wrap C or C++.
Of these possibilities though, since you have python code already written, cython is probably a good thing to try first. Perhaps you won't have to rewrite any code to get a speedup.
Try to work your fitness function so that it will support memoization. This will replace all calls that are duplicates of previous calls with a quick dict lookup.
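A minimal sketch of that idea using functools.lru_cache, assuming the tree can be flattened into a hashable tuple to act as the cache key (the fitness computation below is just a stand-in):

    import time
    from functools import lru_cache

    TARGET = 42.0

    @lru_cache(maxsize=None)  # results keyed by the hashable tree tuple
    def fitness(tree_key):
        time.sleep(0.01)      # stand-in for the expensive evaluation
        return abs(sum(tree_key) - TARGET)

    fitness((1.0, 2.0, 3.0))  # computed on the first call
    fitness((1.0, 2.0, 3.0))  # duplicate call: answered from the cache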
Are parts of NumPy and/or SciPy programmed in C/C++?
And how does the overhead of calling C from Python compare to the overhead of calling C from Java and/or C#?
I'm just wondering if Python is a better option than Java or C# for scientific apps.
If I look at the shootouts, Python loses by a huge margin. But I guess this is because they don't use 3rd-party libraries in those benchmarks.
I would question any benchmark which doesn't show the source for each implementation (or did I miss something?). It's entirely possible that either or both of those solutions are coded badly, which would result in an unfair appraisal of either or both languages' performance. [Edit] Oops, now I see the source. As others have pointed out, though, it's not using the NumPy/SciPy libraries, so those benchmarks are not going to help you make a decision.
I believe the vast majority of NumPy and SciPy is written in C and wrapped in Python for ease of use.
It probably depends what you're doing in any of those languages as to how much overhead there is for a particular application.
I've used Python for data processing and analysis for a couple of years now so I would say it's certainly fit for purpose.
What are you trying to achieve at the end of the day? If you want a fast way to develop readable code, Python is an excellent option and certainly fast enough for a first stab at whatever it is you're trying to solve.
Why not have a bash at each for a small subset of your problem and benchmark the results in terms of development time and run time? Then you can make an objective decision based on some relevant data ...or at least that's what I'd do :-)
There is a better comparison here (not a benchmark, but it shows ways of speeding up Python). NumPy is mostly written in C. The main advantage of Python is that there are a number of ways of very easily extending your code with C (ctypes, SWIG, f2py) / C++ (boost.python, weave.inline, weave.blitz) / Fortran (f2py) -- or even just by adding type annotations to Python so it can be processed to C (cython). I don't think there are many things comparably easy for C# or Java -- at least none that so seamlessly handle passing numerical arrays of different types (although I guess proponents would argue that since they don't have the performance penalty of Python, there is less need to).
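As a tiny example of the ctypes route, calling the C math library with no wrapper code to compile (this assumes a Unix-like system where libm can be located):

    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library('m'))  # the C math library (Unix)
    libm.cos.argtypes = [ctypes.c_double]              # declare the C signature
    libm.cos.restype = ctypes.c_double
    print(libm.cos(0.0))                               # 1.0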
A lot of it is written in C or Fortran. You can rewrite the hot loops in C (or use one of the gazillion ways to speed Python up; boost/weave is my favourite), but does it really matter?
Your scientific app will be run once. The rest is just debugging and development, and those can be much quicker in Python.
Most of NumPy is in C, but a large portion of the C code is "boilerplate" to handle all the dirty details of the Python/C interface. I think the ratio C vs. Python is around 50/50 ATM for NumPy.
I am not too familiar with VM-based low-level details, but I believe the interface cost would be higher because of the restrictions put on the JVM and the CLR. One of the reasons why NumPy is often faster than similar environments is its memory representation and how arrays are shared/passed between functions. Whereas most environments (MATLAB and R as well, I believe) use copy-on-write to pass arrays between functions, NumPy uses references. Doing so in e.g. the JVM would be hard (because of restrictions on how to use pointers, etc.). It is doable (an early port of NumPy for Jython exists), but I don't know how they solved this issue. Maybe C++/CLI would make this easier, but I have zero experience with that environment.
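The reference/view semantics being described are easy to see in NumPy itself; a trivial illustration:

    import numpy as np

    a = np.zeros(5)
    b = a[1:4]          # a view: b shares a's buffer, nothing is copied
    b[0] = 99.0
    print(a)            # [ 0. 99.  0.  0.  0.] -- the write shows up in a
    print(b.base is a)  # True: b borrows a's memory rather than owning a copy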
It always depends on your own ability to handle the language, so that the language can generate fast code. In my experience, NumPy is several times slower than good .NET implementations, and I expect Java to be similarly fast. Their optimizing JIT compilers have improved significantly over the years and produce very efficient instructions.
NumPy, on the other hand, comes with a syntax which is easier to use for those who are attuned to scripting languages. But when it comes to application development, those advantages often turn into obstacles, and you will yearn for type safety and enterprise IDEs. Also, the syntactic gap is already closing with C#. A growing number of scientific libraries exist for Java and .NET. Personally I tend towards C#, because it provides better syntax for multidimensional arrays and somehow feels more 'modern'. But of course, this is only my personal experience.