When chasing performance in Python applications that use pandas/NumPy, the usual advice is to prefer the methods pandas/NumPy implement over hand-rolled code such as explicit loops. That may be a clumsy way to introduce my question, but in the following screenshot (had I not tested it) I would have expected the versions using the Series methods to run faster than the Python builtins. Since that is not the case, I evidently built a false intuition from this example, but I have not yet found the reason. So the question is: why do the Python builtins outperform the methods called on the Series here (am I missing something else?)?
Pandas has its own functions, which differ substantially from Python's built-in functions. If you call Series.max() you are in fact calling nanops._nanminmax(), which is added via IndexOpsMixin, not builtins.max().
Each behaves differently and therefore has different performance characteristics.
The same goes for the rest of the methods. If you are curious, check the source code of the Series class and the classes it inherits from for the exact differences between the built-in functions and pandas' implementations.
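A quick way to see where the intuition goes wrong is to time both variants on small and large inputs. A minimal sketch (exact numbers will vary by machine): the pandas method carries a fixed per-call cost that dominates on tiny inputs, while its vectorized loop wins easily on large ones.

import timeit

import pandas as pd

small = pd.Series(range(100))
large = pd.Series(range(1_000_000))

print(timeit.timeit(lambda: max(small), number=1_000))   # builtin, tiny input
print(timeit.timeit(lambda: small.max(), number=1_000))  # pandas, tiny input
print(timeit.timeit(lambda: max(large), number=10))      # builtin, large input
print(timeit.timeit(lambda: large.max(), number=10))     # pandas, large input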
am I missing something else?
You assumed that you will always get the same result, which is not true. An example showing different output for sum and pandas.Series.sum:
import pandas as pd

s = pd.Series([1.0, 2.0, float("nan")])
print(s.sum())  # pandas skips NaN by default (skipna=True)
print(sum(s))   # the builtin just adds the values, so NaN propagates
Output:
3.0
nan
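The same trap exists for max: Series.max() skips NaN by default, while builtins.max just compares elements pairwise, so a NaN in front poisons every comparison:

import pandas as pd

s = pd.Series([float("nan"), 1.0, 2.0])
print(s.max())  # 2.0 -- pandas skips NaN (skipna=True by default)
print(max(s))   # nan -- every comparison against the leading NaN is False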
Related
I can't seem to find the code for numpy argmax.
The source link in the docs leads me to here, which doesn't have any actual code.
I went through every function that mentions argmax using the github search tool and still no luck. I'm sure I'm missing something.
Can someone lead me in the right direction?
Thanks
Numpy is written in C. It uses a template engine that parses special comments to generate many versions of the same generic function (typically one per type). This tool is very helpful for generating fast code, since the C language does not provide (proper) templates, unlike C++ for example. However, it also makes the code more cryptic than necessary, since the names of the functions are often generated: generic function names can look like #TYPE#_#OP#, where #TYPE# and #OP# are two macros that can each take different values. On top of all this, the CPython binding makes the code more complex still, since the C functions have to be wrapped to be callable from CPython, with complex arrays (possibly with many dimensions and custom user types) and CPython arguments to decode.
_PyArray_ArgMinMaxCommon is a reasonably good entry point, but it is only a wrapping function, not the main computing one. It is mainly useful if you plan to change the prototype of the Numpy function exposed to Python.
The main computational function can be found here. The comment just above the function is the one used to generate the variants of the function (e.g. CDOUBLE_argmax). Note that there are alternative type-specific implementations below the main one, such as OBJECT_argmax, since CPython objects and strings must be handled a bit differently. Thank you for contributing to Numpy.
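A quick way to confirm from Python that the search bottoms out in C (a small sketch):

import inspect

import numpy as np

# np.argmax itself is a thin Python wrapper, so it has readable source...
print(inspect.getsource(np.argmax)[:300])

# ...but it delegates to ndarray.argmax, which is implemented in C and
# therefore has no Python source to find:
print(type(np.ndarray.argmax))  # <class 'method_descriptor'>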
As mentioned in the comments, you'll likely find what you are searching for in the C implementation (here, under _PyArray_ArgMinMaxCommon). The code itself can be very convoluted, so if your intent was to open an issue on numpy with a broad idea, I would do it on the page you linked anyway.
I'm using numpy v1.18.2 in some simulations, and I use built-in functions such as np.unique, np.diff and np.interp. I use these functions on standard objects, i.e. lists or numpy arrays.
When I checked with cProfile, I saw that these functions call a built-in method, numpy.core._multiarray_umath.implement_array_function, and that this method accounts for 32.5% of my runtime! To my understanding, this is a wrapper that performs some checks to make sure the arguments passed to the function are compatible with it.
I have two questions:
Is this function (implement_array_function) actually taking up so much time, or is it really the operations I'm doing (np.unique, np.diff, np.interp) that take up all this time? In other words, am I misinterpreting the cProfile output? I was confused by the hierarchical output of snakeviz. Please see the snakeviz output here and details for the function here.
Is there any way to disable or bypass it? The inputs need not be checked each time, since the arguments I pass to these numpy functions are already controlled in my code. I am hoping that this will give me a performance improvement.
I already saw this question (what is numpy.core._multiarray_umath.implement_array_function and why it costs lots of time?), but I was not able to understand what exactly the function is or does. I also tried to understand NEP 18, but couldn't work out how exactly to solve the issue. Please fill in any gaps in my knowledge and correct any misunderstandings.
Also, I'd appreciate it if someone could explain this to me like I'm five (r/explainlikeimfive/) instead of assuming developer-level knowledge of Python.
All the information below is taken from NEP 18.
Is this function (implement_array_function) actually taking up so much time or is it actually the operations I'm doing (np.unique, np.diff, np.interp) that is actually taking up all this time?
As @hpaulj correctly mentioned in the comments, the overhead of the dispatcher adds 2-3 microseconds to each numpy function call. This will probably be shortened to 0.5-1 microseconds once it is implemented in C. See here.
Is there any way to disable it/bypass it
Yes, as of NumPy 1.17 you can set the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION to 0 (before importing numpy) and this will disable the use of implement_array_function (see here). Something like:
import os

# must be set before numpy is imported
os.environ['NUMPY_EXPERIMENTAL_ARRAY_FUNCTION'] = '0'

import numpy as np
However, disabling it probably won't give you any notable performance improvement, since its overhead is just a few microseconds, and the dispatch will become the default in later numpy versions anyway.
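If you want to gauge how large that overhead is for your own calls, a minimal sketch (numbers are machine-dependent):

import timeit

import numpy as np

tiny = np.arange(8)

# On a tiny input the actual work is nearly free, so the measured time is
# mostly the fixed per-call cost (dispatch plus Python-level wrapping):
per_call = timeit.timeit(lambda: np.diff(tiny), number=100_000) / 100_000
print(f"~{per_call * 1e6:.1f} microseconds per np.diff call on 8 elements")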
The Julia language's syntax looks very similar to Python's, while its concept of a class (if one should even call it that) is more like what you use in C. The creators had many reasons for departing from classic OOP. Still, would it have been so hard (compared with creating Julia in the first place, which is impressive) to find some canonical way to interpret Python as Julia, and thus get hold of all the Python libraries?
Yes. The design of Python makes it fundamentally difficult to optimize at compile-time (i.e. before you run the code). It is simply false that Julia is fast BECAUSE of its JIT. Rather, Julia is designed with its type system and multiple dispatch in mind so that the compiler can know all of the details necessary to compile "the same code you would have written in C". That's what makes it fast: the type system. It makes a few trade-offs that allow it, in "type-stable" functions, to fully deduce the type of every variable, to know what the memory layout of each type should be (including parametric types, so Vector{Float64} has a memory layout determined by the type and its parameter, which inlines Float64 values like a NumPy array, except this is generalized so that your own struct types get the same efficiency), and to compile a version of the code specifically for the types that are seen.
There are many ways in which this is at odds with Python. For example, if the number of fields in a struct could change, then the memory layout could not be determined, and thus these optimizations cannot occur at "compile-time". Julia was painstakingly designed to guarantee type inferrability, and it uses that to generate code which is fully typed and to remove all runtime checks (in type-stable functions; when a function is not type-stable, the types of its variables become dynamic rather than static and it slows down to Python-like speeds). In this sense, Julia isn't even really optimized yet: all of its performance comes "for free" from the design of its type system. Python/MATLAB/R have to try really hard to optimize at runtime because they lack the capability to make these deductions. In fact, those languages are "better optimized" right now in terms of runtime optimizations, but nobody has really worked on runtime optimizations in Julia yet, because in most performance-sensitive cases you can get everything at compile time.
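A tiny Python example of why the layout cannot be pinned down ahead of time: fields can appear on any instance at any moment, so a compiler could never bake in a fixed memory layout the way Julia can for a concrete struct type.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
p.z = 3.0  # a new field appears at runtime; no fixed layout is possible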
So then, what about Numba? Numba tries to take the route Julia takes, but with Python code, by limiting what can be done so that it can get type-stable code and compile it efficiently. However, this means a few things. First, it's not compatible with all Python code or libraries. More importantly, since Python is not a language built around its type system, the tools for controlling code at the level of types are much more limited. Numba doesn't have parametric vectors or generic code that auto-specializes via multiple dispatch, because those aren't features of the language. That also means it cannot make full use of this design, which limits how much it can do. It handles the "use only floating-point arrays" cases just fine, but you can see it becomes limited if you want one piece of code to produce efficient code for "any number type, even ones I don't know about yet". By design, Julia does this automatically.
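As a rough illustration of the route Numba takes (a sketch, assuming numba is installed): restrict yourself to type-stable numeric code and the loop compiles down to machine code.

import numpy as np
from numba import njit  # assumes numba is installed

@njit  # compiles a specialized version for the argument types it first sees
def dot(a, b):
    s = 0.0
    for i in range(a.shape[0]):
        s += a[i] * b[i]
    return s

x = np.random.rand(1_000)
dot(x, x)  # first call triggers compilation; later calls run at C-like speed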
So at the core, Julia and Python are extremely different languages. It can be hard to see because Julia's syntax is close to Python's syntax, but they do not work the same at all.
This is a short summary of what I have described in a few blog posts. They go into more detail and show how Julia actually generates efficient code, how it gives you a generic "Python-looking style" while keeping full inferrability all the way down, and what the trade-offs are.
How type-stability plus multiple dispatch gives performance:
http://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia
http://www.stochasticlifestyle.com/7-julia-gotchas-handle/
How the type system allows for highly performant generic designs:
http://www.stochasticlifestyle.com/type-dispatch-design-post-object-oriented-programming-julia/
I am looking for a way to rename all variables in a formula according to a given substitution map. I am currently using the substitute function, but it seems to be quite slow.
Is there another function I can use that is faster? Is there any other way of doing this quickly?
N.B. I am only substituting fresh variables for the variables in the original formula, so there are no renaming clashes. Is there any way to perform the renaming faster under this assumption?
For instance,
from z3 import And, Int, Or

# given
f = And(Int('x') > Int('y'), Or(Int('x') - 5 >= Int('z'), Int('k') > 1))

# expected result after substitution
# f = And(Int('v0') > Int('v1'), Or(Int('v0') - 5 >= Int('v2'), Int('v3') > 1))
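Concretely, this is roughly how I do the renaming today (a minimal sketch):

from z3 import And, Int, Or, substitute

x, y, z, k = Int('x'), Int('y'), Int('z'), Int('k')
v0, v1, v2, v3 = Int('v0'), Int('v1'), Int('v2'), Int('v3')

f = And(x > y, Or(x - 5 >= z, k > 1))

# substitute takes (old, new) pairs and rewrites the whole term
g = substitute(f, (x, v0), (y, v1), (z, v2), (k, v3))
print(g)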
Is there any way to do this by working on the context of f?
There isn't an inherently faster way over the API. I have a few comments regarding speed:
You seem to be using the Python API, which by itself has a huge overhead. It may help to time the portion spent in Python separately from the time spent inside Z3.
The implementation of the substitute function uses a class that gets allocated on the stack. It is quite possible that making this class a persistent attribute of the context would speed up amortized time, because it would not be allocating and re-allocating memory repeatedly. I would have to profile an instance to tell whether this change really pays off.
The more fundamental way to perform renaming is to work with implicit renaming: not applying substitution at all, but accessing variables at different offsets. This low-level way of dereferencing variables is not available in any form over the API, or even in the way we represent high-level expressions, so it is not going to be an option.
If your application allows it, you may be able to work with the existing terms and encode the substitution implicitly. For example, in some applications one can just add equality constraints between the old and new variables.
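A minimal sketch of that idea, reusing the formula from the question: instead of rewriting f, assert equalities and use the fresh names everywhere else.

from z3 import And, Int, Or, Solver

x, y, z, k = Int('x'), Int('y'), Int('z'), Int('k')
v0, v1, v2, v3 = Int('v0'), Int('v1'), Int('v2'), Int('v3')

f = And(x > y, Or(x - 5 >= z, k > 1))

s = Solver()
s.add(f)                                   # the original formula stays untouched
s.add(v0 == x, v1 == y, v2 == z, v3 == k)  # fresh names tied to the old ones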
How bad is it to redefine a class method from another, third-party module, in Python?
In fact, users can create NumPy matrices that contain numbers with uncertainty; ideally, I would like their code to run unmodified (compared with code that manipulates float matrices). In particular, it would be great if the inverse of matrix m could still be obtained with m.I, despite the fact that m.I has to be calculated by my own code (the original I method does not work, in general).
How bad is it to redefine numpy.matrix.I? For one thing, it tampers with third-party code, which I don't like, as it may not be robust (what if other modules do the same?…). Another problem is that the new numpy.matrix.I would be a wrapper that adds a small overhead in cases where the original numpy.matrix.I could have been applied directly to obtain the inverse.
Is subclassing NumPy matrices and only changing their I method better? This would force users to update their code and create matrices of numbers with uncertainty with m = matrix_with_uncert(…) (instead of continuing to use numpy.matrix(…), as for a matrix of floats), but maybe this is an inconvenience that should be accepted for the sake of robustness? Matrix inversions could still be performed with m.I, which is good… On the other hand, it would be nice if users could build all their matrices (of floats or of numbers with uncertainties) with numpy.matrix() directly, without having to bother checking data types.
Any comment, or additional approach would be welcome!
Subclassing (which does involve overriding, as the term is commonly used) is generally much preferable to "monkey-patching" (stuffing altered methods into existing classes or modules), even when the latter is available (built-in types, meaning ones implemented in C, can protect themselves against monkey-patching, and most of them do).
For example, if your functionality relies on monkey-patching, it will break if at any time the class you're monkey-patching gets upgraded to be implemented in C (for speed, or specifically to defend against monkey-patching). Maintainers of third-party packages hate monkey-patching because it means they get bogus bug reports from hapless users who (unbeknownst to them) are in fact running a buggy monkey-patch that breaks the third-party package, when the latter (absent the monkey-patch) is flawless. You've already remarked on the possible performance hit.
Conceptually, a "matrix of numbers with uncertainty" is different from a "matrix of numbers". Subclassing expresses this cleanly; monkey-patching tries to hide it. That's really the root of what's wrong with monkey-patching in general: a covert channel operating through global, hidden means, without clarity and transparency. All the many practical problems descend, in a sense, from this root conceptual problem.
I strongly urge you to reject monkey-patching in favor of clean solutions such as subclassing.
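A minimal sketch of the subclassing route (the class name is made up, and np.linalg.inv stands in for a real uncertainty-propagating inverse):

import numpy as np

class UncertMatrix(np.matrix):
    # Hypothetical matrix-with-uncertainty type; only I is overridden.
    @property
    def I(self):
        # Stand-in: real code would propagate the entries' uncertainties here
        return np.linalg.inv(self)

m = UncertMatrix([[1.0, 2.0], [3.0, 4.0]])
print(m.I)  # users keep the familiar m.I spelling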
In general, it's perfectly acceptable to override methods that:
- intentionally permit overrides
- do so in a way they document (satisfying the LSP won't hurt)
If both conditions are met, then overriding should be safe.
It depends on what you mean by "redefine". Obviously you can use your own version of it, no problem at all. You can also redefine it by subclassing, if it's a method.
You can also write a new method and patch it into the class, a practice known as monkey-patching. Like so:
from amodule import aclass

def newfunction(self, param):
    do_something()

aclass.oldfunction = newfunction
This will make all instances of aclass use your new function instead of the old one, including instances in any "fourth-party" modules. This works and is highly useful, but it's regarded as very ugly and a last-resort option, because nothing in the aclass source suggests that the method has been overridden, which makes it hard to debug. It gets even worse when two modules monkey-patch the same thing; then you get really confused.
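If you do end up monkey-patching anyway, one common mitigation (a sketch, reusing the hypothetical names above) is to keep a reference to the original and wrap it rather than discard it:

from amodule import aclass

_original = aclass.oldfunction  # keep a handle to the original

def newfunction(self, param):
    # ... custom behaviour goes here ...
    return _original(self, param)  # fall back to the original implementation

aclass.oldfunction = newfunction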