Numpy provides both np.absolute and the alias np.abs defined via
from .numeric import absolute as abs
which seems to be in obvious violation of the zen of python:
There should be one-- and preferably only one --obvious way to do it.
So I'm guessing that there is a good reason for this.
I have personally been using np.abs in almost all of my code and looking at e.g. the number of search results for np.abs vs np.absolute on Stack Overflow it seems like an overwhelming majority does the same (2130 vs 244 hits).
Is there any reason I should preferentially use np.absolute over np.abs in my code, or should I simply go for the more "standard" np.abs?
It's likely because there is a built-in function with the same name, abs. The same is true for np.amax, np.amin and np.round_.
The aliases for the NumPy functions abs, min, max and round are only defined in the top-level package.
So np.abs and np.absolute are completely identical. It doesn't matter which one you use.
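A quick check confirms the aliasing; both names refer to the very same ufunc object:
>>> import numpy as np
>>> np.abs is np.absolute
True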
There are several advantages to the short names: They are shorter and they are known to Python programmers because the names are identical to the built-in Python functions. So end-users have it easier (less to type, less to remember).
But there are reasons to have different names too: NumPy (or, more generally, 3rd-party packages) sometimes needs the Python functions abs, min, etc. itself. So inside the package it defines functions with a different name, so that the Python functions remain accessible, and only in the top level of the package does it expose the "shortcuts". Note: different names are not the only available option in that case: one could work around a shadowed built-in name with the Python module builtins to access the built-in functions.
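As a minimal sketch (this is not how NumPy's source is actually organized, just an illustration of that workaround), a module that shadows a built-in name can still reach the original through builtins:
import builtins

def abs(x):
    # this definition shadows the built-in abs inside this module ...
    return builtins.abs(x)  # ... but builtins.abs still refers to the original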
It might also be the case (but that's pure speculation on my part) that they originally only included the long-named functions absolute (and so on) and only added the short aliases later. Being a large and well-used library, the NumPy developers don't remove or deprecate stuff lightly. So they may just keep the long names around because removing them could break old code/scripts.
There also is Python's built-in abs(), but really all those functions are doing the same thing. They're even exactly equally fast! (This is not the case for other functions, like max().)
Code to reproduce the plot:
import numpy as np
import perfplot
def np_absolute(x):
    return np.absolute(x)

def np_abs(x):
    return np.abs(x)

def builtin_abs(x):
    return abs(x)

b = perfplot.bench(
    setup=np.random.rand,
    kernels=[np_abs, np_absolute, builtin_abs],
    n_range=[2 ** k for k in range(25)],
    xlabel="len(data)",
)
b.save("out.png")
b.show()
I have already found the source for the numpy.ma.where() function, but it seems to call the numpy.where() function, and to understand it better I would like to take a look at that source if possible.
Most Python functions are written in the Python language, but some functions are written in something more native (often the C language).
Regular Python functions ("pure Python")
There are a few techniques you can use to ask Python itself where a function is defined. Probably the most portable uses the inspect module:
>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.ma.where)
False
>>> inspect.getsourcefile(numpy.ma.where)
'.../numpy/core/multiarray.py'
But this won't work with a native ("built-in") function:
>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.where)
True
>>> inspect.getsourcefile(numpy.where)
TypeError: <built-in function where> is not a module, class, method, function, traceback, frame, or code object
Native ("built-in") functions
Unfortunately, Python doesn't provide a record of source files for built-in functions. You can find out which module provides the function:
>>> import numpy as np
>>> np.where
<built-in function where>
>>> np.where.__module__
'numpy.core.multiarray'
Python won't help you find the native (C) source code for that module, but in this case it's reasonable to look in the numpy project for C source that has similar names. I found the following file:
numpy/core/src/multiarray/multiarraymodule.c
And in that file, I found a list of definitions (PyMethodDef) including:
{"where",
    (PyCFunction)array_where,
    METH_VARARGS, NULL},
This suggests that the C function array_where is the one that Python sees as "where".
The array_where function is defined in the same file, and it mostly delegates to the PyArray_Where function.
In short
NumPy's np.where function is written in C, not Python. A good place to look is PyArray_Where.
First, there are two distinct versions of where: one takes just the condition, the other takes three arrays.
The simpler one is most commonly used, and is just another name for np.nonzero. This scans through the condition array twice. Once with np.count_nonzero to determine how many nonzero entries there are, which allows it to allocate the return arrays. The second step is to fill in the coordinates of all nonzero entries. The key is that it returns a tuple of arrays, one array for each dimension of condition.
The condition, x, y version takes three arrays, which it broadcasts against each other. The return array has the common broadcasted shape, with elements chosen from x and y as explained in the answers to your previous question, How exactly does numpy.where() select the elements in this example?
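A small interactive example makes the two forms concrete (values chosen arbitrarily):
>>> import numpy as np
>>> a = np.array([[1, 0, 2], [0, 3, 0]])
>>> np.where(a > 0)           # condition only: same result as np.nonzero(a > 0)
(array([0, 0, 1]), array([0, 2, 1]))
>>> np.where(a > 0, a, -1)    # condition, x, y: elementwise selection with broadcasting
array([[ 1, -1,  2],
       [-1,  3, -1]])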
You do realize that most of this code is C or Cython, with a significant amount of preprocessing. It is hard to read, even for experienced users. It is easier to run a variety of test cases and get a feel for what is happening that way.
A couple of things to watch out for: np.where is a Python function, and Python evaluates each input fully before passing it in. This is conditional assignment, not a conditional-evaluation function.
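For example (a small sketch of that pitfall): both branch arguments are computed in full before np.where sees them, so the "unused" branch can still emit a divide-by-zero RuntimeWarning:
>>> import numpy as np
>>> x = np.array([1.0, 0.0, 2.0])
>>> np.where(x != 0, 1 / x, 0)   # 1 / x runs for every element, including the zero
array([1. , 0. , 0.5])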
And unless you pass 3 arrays that match in shape, or scalar x and y, you'll need a good understanding of broadcasting.
You can find the code in numpy.core.multiarray
C:\Users\<name>\AppData\Local\Programs\Python\Python37-32\Lib\site-packages\numpy\core\multiarray.py is where I found it.
Looking at this question, I realised that it is kind of awkward to use multiprocessing's Pool.map if what you want is to run a list of functions in parallel:
from multiprocessing import Pool
def my_fun1(): return 1
def my_fun2(): return 2
def my_fun3(): return 3
with Pool(3) as p:
    one, two, three = p.map(lambda f: f(), [my_fun1, my_fun2, my_fun3])
I'm not saying it is exactly cryptic, but I think I expected some conventional name for this, even if only within functools or something, similarly to apply/call in JavaScript (yes, I know JavaScript didn't have lambdas at the time those functions were defined, and no, I'm not saying JavaScript is an exemplary programming language, just an example). In fact, I definitely think something like this should be present in operator, but (unless my eyes deceive me) it seems to be absent. I read that in the case of the identity function the resolution was to let people define their own trivial functions, and I understand it better in that case because there are a couple of different variations you may want, but this one feels like a missing bit to me.
EDIT: As pointed out in the comments, Python 2 used to have an apply function for this purpose.
First, let's look at the practical question.
For any Python from 2.3 on, you can trivially write not just your no-argument apply, but a perfect-forwarding apply, as a one-liner, as explained in the 2.x docs for apply:
The use of apply() is equivalent to function(*args, **keywords)
In other words:
def apply(function, *args, **keywords):
    return function(*args, **keywords)
… or, as an inline lambda:
lambda f, *a, **k: f(*a, **k)
Of course the C implementation was a bit faster, but this is almost never relevant.1
If you're going to be using this more than once, I think defining the function out-of-line and reusing it by name is probably clearer, but the lambda version is simple and obvious enough (even more so for your no-args use case) that I can't imagine anyone complaining about it.
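For instance, the question's Pool example could use such a module-level apply directly (a sketch; defining the helper at module level also means it can be pickled, unlike a lambda):
from multiprocessing import Pool

def apply(f, *args, **kwargs):
    # perfect-forwarding apply
    return f(*args, **kwargs)

def my_fun1(): return 1
def my_fun2(): return 2
def my_fun3(): return 3

if __name__ == "__main__":
    with Pool(3) as p:
        one, two, three = p.map(apply, [my_fun1, my_fun2, my_fun3])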
Also, notice that this is actually more trivial than identity if you understand what you're doing, not less. With identity, it's ambiguous what you should return with multiple arguments (or keyword arguments), so you have to decide which behavior you want; with apply, there's only one obvious answer, and it's pretty much impossible to get wrong.
As for the history:
Python, like JavaScript, originally had no lambda. It's hard to dig up linkable docs for versions before 2.6, and hard to even find them before 2.3, but I think lambda was added in 1.5, and eventually reached the point where it could be used for perfect forwarding around 2.2. Before then, the docs recommended using apply for forwarding, but after that, the docs recommended using lambda in place of apply. In fact, there was no longer any recommended use of apply.
So in 2.3, the function was deprecated.2
During the Python-3000 discussions that led to 3.0, Guido suggested that all of the "functional programming" functions except maybe map and filter were unnecessary.3 Others made good cases for reduce and partial.4 But a big part of the case was that they're actually not trivial to write (in fully-general form), and easy to get wrong. That isn't true for apply. Also, people were able to find relevant uses of reduce and partial in real-world codebases, but the only uses of apply anyone could find were old pre-2.3 code. In fact, it was so rare that it wasn't even worth making the 2to3 tool transform calls to apply.
The final rationale for removing it was summarized in PEP 3100:
apply(): use f(*args, **kw) instead [2]
That footnote links to an essay by Guido called "Python Regrets", which is now a 404 link. The accompanying PowerPoint presentation is still available, however, or you can view an HTML flipbook of the presentation he wrote it for. But all it really says is the same one-liner, and IIRC, the only further discussion was "We already effectively got rid of it in 2.3."
1. In most idiomatic Python code that has to apply a function, the work inside that function is pretty heavy. In your case, of course, the overhead of calling the functions (pickling arguments and passing them over a pipe) is even heavier. The one case where it would matter is when you're doing "Haskell-style functional programming" instead of "Lisp-style"—that is, very few function definitions, and lots of functions made by transforming functions and composing the results. But that's already so slow (and stack-heavy) in Python that it's not a reasonable thing to do. (Flat use of decorators to apply a wrapper or three works great, but a potentially unbounded chain of wrappers will kill your performance.)
2. The formal deprecation mechanism didn't exist yet, so it was just moved to a "Non-essential Built-in Functions" section in the docs. But it was retroactively considered to be deprecated since 2.3, as you can see in the 2.7 docs.
3. Guido originally wanted to get rid of even them; the argument was that list comprehensions can do the same job better, as you can see in the "Regrets" flipbook. But promoting itertools.imap in place of map means it could be made lazy, like the new zip, and therefore better than comprehensions. I'm not sure why Guido didn't just make the same argument with generator expressions.
4. I'm not sure Guido himself was ever convinced for reduce, but the core devs as a whole were.
It sort of is in operator if you do one line of extra work:
>>> def foo():
...     print('hi')
...
>>> from operator import methodcaller
>>> call = methodcaller('__call__')
>>> call(foo)
hi
Of course, call = lambda f: f() is only one line as well...
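For what it's worth, operator did eventually grow exactly this: operator.call (added in Python 3.11, if I remember correctly) forwards its remaining arguments to the callable. Continuing with the foo defined above:
>>> from operator import call   # Python 3.11+
>>> call(foo)
hi
>>> call(pow, 2, 10)
1024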
I noticed that many libraries nowadays seem to prefer the use of strings over enum-type variables for parameters.
Where people would previously use enums, e.g. dateutil.rrule.FR for a Friday, it seems that this has shifted towards using strings (e.g. 'FRI').
Same in numpy (or pandas for that matter), where searchsorted, for example, uses strings (e.g. side='left' or side='right') rather than a defined enum. For the avoidance of doubt, before Python 3.4 this could easily have been implemented as an enum like so:
class SIDE:
    RIGHT = 0
    LEFT = 1
And the advantages of enum-type variables are clear: you can't misspell them without raising an error, they offer proper support for IDEs, etc.
So why use strings at all, instead of sticking to enum types? Doesn't this make the programs much more prone to user errors? It's not like enums create an overhead - if anything they should be slightly more efficient. So when and why did this paradigm shift happen?
I think enums are safer especially for larger systems with multiple developers.
As soon as the need arises to change the value of such an enum, looking up and replacing a string in many places is not my idea of fun :-)
The most important criterion IMHO is the usage: for use within a module or even a package a string seems fine; in a public API I'd prefer enums.
[update]
As of today (2019), Python has introduced dataclasses; combined with optional type annotations and static type analyzers like mypy, I think this is a solved problem.
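To make that concrete, here is a sketch of what I mean, using typing.Literal (Python 3.8+) rather than dataclasses; searchsorted_like is a made-up stand-in for an API that takes a restricted string argument:
from typing import Literal

def searchsorted_like(side: Literal['left', 'right'] = 'left') -> None:
    # stand-in for a function whose string parameter has a fixed set of allowed values
    ...

searchsorted_like(side='right')  # fine
searchsorted_like(side='rigth')  # flagged by mypy: not a valid Literal value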
As for efficiency, attribute lookup is somewhat expensive in Python compared to most languages, so I guess some libraries may still choose to avoid it for performance reasons.
[original answer]
IMHO it is a matter of taste. Some people like this style:
def searchsorted(a, v, side='left', sorter=None):
    ...
    assert side in ('left', 'right'), "Invalid side '{}'".format(side)
    ...
numpy.searchsorted(a, v, side='right')
Yes, if you call searchsorted with side='foo' you may get an AssertionError way later at runtime - but at least the bug will be pretty easy to spot looking at the traceback.
While other people may prefer (for the advantages you highlighted):
numpy.searchsorted(a, v, side=numpy.CONSTANTS.SIDE.RIGHT)
I favor the first because I think seldom-used constants are not worth the namespace cruft. You may disagree, and people may align with either side due to other concerns.
If you really care, nothing prevents you from defining your own "enums":
class SIDE(object):
    RIGHT = 'right'
    LEFT = 'left'
numpy.searchsorted(a, v, side=SIDE.RIGHT)
I think it is not worth it, but again it is a matter of taste.
[update]
Stefan made a fair point:
As soon as the need arises to change the value of such an enum, looking up and replacing a string in many places is not my idea of fun :-)
I can see how painful this can be in a language without named parameters - using the example you have to search for the string 'right' and get a lot of false positives. In Python you can narrow it down searching for side='right'.
Of course if you are dealing with an interface that already has a defined set of enums/constants (like an external C library) then yes, by all means mimic the existing conventions.
I understand this question has already been answered, but there is one thing that has not been addressed at all: the fact that you have to ask a Python Enum member explicitly for its value whenever you want the value it stores.
>>> from enum import Enum
>>> class Test(Enum):
...     WORD = 'word'
...     ANOTHER = 'another'
...
>>> str(Test.WORD.value)
'word'
>>> str(Test.WORD)
'Test.WORD'
One simple solution to this problem is to offer an implementation of __str__()
>>> class Test(Enum):
...     WORD = 'word'
...     ANOTHER = 'another'
...     def __str__(self):
...         return self.value
...
>>> Test.WORD
<Test.WORD: 'word'>
>>> str(Test.WORD)
'word'
Yes, adding .value is not a huge deal, but it is an inconvenience nonetheless. Using regular strings requires zero extra effort, no extra classes, and no redefinition of any default class methods. Still, in many cases you end up explicitly converting to the string value, where a simple str would not have had a problem.
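Another workaround, sketched below, is to mix in str: members then compare equal to (and concatenate like) plain strings, so .value is rarely needed (Python 3.11 later added enum.StrEnum for exactly this):
>>> from enum import Enum
>>> class Test(str, Enum):
...     WORD = 'word'
...     ANOTHER = 'another'
...
>>> Test.WORD == 'word'
True
>>> 'prefix_' + Test.WORD
'prefix_word'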
I prefer strings for debugging. Compare an object like
side=1, opt_type=0, order_type=6
to
side='BUY', opt_type='PUT', order_type='FILL_OR_KILL'
I also like "enums" where the values are strings:
class Side(object):
    BUY = 'BUY'
    SELL = 'SELL'
    SHORT = 'SHORT'
Strictly speaking Python does not have enums - or at least it didn't prior to v3.4
https://docs.python.org/3/library/enum.html
I prefer to think of your example as programmer defined constants.
In argparse, one set of constants has string values. While the code uses the constant names, users more often use the strings.
e.g. argparse.ZERO_OR_MORE = '*'
argparse.OPTIONAL = '?'
numpy is one of the older 3rd-party packages (at least its roots, like Numeric, are). String values are more common than enums. In fact I can't offhand think of any enums (as you define them).
I've been using scientific python (via from pylab import *) on and off for a while as a free Matlab substitute, mainly for work in chemical engineering (I'm a fan of the IPython Notebook). Something that's always appeared strange to me is the fact that there's generally two different ways to interact with an object. For instance, if I have an array, I can tell its dimensions in two ways:
a = array([[1, 2, 3], [2, 3, 4]])
There's the 'Matlab' way:
shape(a)
Or instead I could find it by typing:
a.shape
This seems to contradict The Zen of Python: "There should be one-- and preferably only one --obvious way to do it"
I'm just wondering why there's multiple ways of doing the same thing, and which practice is more fundamental/natural for the language and would be better to use in the long run.
Using the attribute is preferable. After all, the implementation of the shape function simply defers to the attribute anyway (from /numpy/core/fromnumeric.py):
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
I assume a lot of this pylab stuff is just included to help ease the transition for people coming from MATLAB. Get used to it because there are many more examples of numpy being, ahem, not very pythonic.
When you get more used to python and matplotlib you will likely want to ditch the from pylab import * anyway and start writing more numpythonic code, rather than MATLAB style work.
It mostly comes down to a matter of preference, but there are a few differences that you might want to be aware of. First off, you should be using numpy.shape(a) or np.shape(a) instead of shape(a), because "Namespaces are one honking great idea -- let's do more of those!" Besides, numpy has several names you'll likely find in other Python modules; e.g. array appears as array.array in the Python stdlib, as numpy.array and as numpy.ma.array, so to avoid confusing others (and yourself) just go ahead and avoid importing the entire numpy namespace.
Other than that, it turns out that numpy.shape, and most other similar functions, just look for a shape attribute on the argument and, if they don't find one, try to convert the argument to an array. Here is the code:
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
This can be useful if you want the shape of an "array_like" object; you'll notice that most numpy functions take "array_like" arguments. But it can be slow if you're doing something like:
shape = np.shape(list_of_lists)
mx = np.max(list_of_lists)
mn = np.min(list_of_lists)
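The usual fix is to convert once up front and then use the array's own attribute and methods, so the list of lists is only converted a single time (a sketch, reusing the names from the snippet above):
arr = np.asarray(list_of_lists)  # one conversion instead of three
shape = arr.shape
mx = arr.max()
mn = arr.min()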
Other than that, they're pretty much the same.
I read a post recently where someone mentioned that there is no need for using enums in python. I'm interested in whether this is true or not.
For example, I use an enum to represent modem control signals:
class Signals:
    CTS = "CTS"
    DSR = "DSR"
    ...
Isn't it better that I use if signal == Signals.CTS: than if signal == "CTS":, or am I missing something?
Signals.CTS does seem better than "CTS". But Signals is not an enum, it's a class with specific fields. The claim, as I've heard it, is that you don't need a separate enum language construct, as you can do things like you've done in the question, or perhaps:
CTS, DSR, XXX, YYY, ZZZ = range(5)
If you have that in a signals module, it can be imported and used in a similar fashion, e.g., if signal == signals.CTS:. This is used in several modules in the standard library, including the re and os modules.
In your exact example, I guess it would be okay to use defined constants, as a misspelled constant raises an error, whereas a typo in a string would not.
I guess there is an at least equally good solution using object orientation.
BTW: if "CTS": will always be True, since only empty strings are interpreted as False.
It depends on whether you use the values of Signals.CTS, Signals.DSR as data, for example if you send these strings to the actual modem. If so, then it is a good idea to have aliases defined as you did, because external interfaces tend to change or be less uniform than you would expect. Otherwise, if you never use the symbols' values, you can skip that layer of abstraction and use strings directly.
The only thing is not to mix internal symbols and external data.
If you want to have meaningful string constants (CTS = "CTS", etc.), you can simply do:
for constant_name in ('CTS', 'DSR'):  # All constant names go here
    globals()[constant_name] = constant_name
This defines variables CTS and DSR with the values you want. (Reference about the use of globals(): Programmatically creating variables in Python.)
Directly defining your constants at the top level of a module is done in many standard library modules (like for instance the re and os modules [re.IGNORECASE, etc.]), so this approach is quite clean.
I think there is a lot more you can load onto enumerations (setting arbitrary values, bitwise operations, descriptions containing whitespace).
Please read the very short post below, check the enum class it offers, and judge for yourself.
Python Enum Post
Do we need Enums in Python? Do we need an html module, a database module, or a bool type?
I would classify Enums as a nice-to-have, not a must-have.
However, part of the reason Enums finally showed up (in Python 3.4) is because they are such a nice-to-have that many folks reimplemented enums by hand. With so many private and public versions of enumerations, interoperability becomes an issue, standard use becomes an issue, etc., etc.
So to answer your question: No, we don't need an Enum type. But we now have one anyway. There's even a back-ported version.