Trailing underscore in `np.ix_` - python

Why does the name of np.ix_ contain a trailing underscore?

I can't give a complete reason, but it's for symmetry with np.r_, np.c_, etc. I can make a guess for the overall reason:
All of the short numpy operators like np.r_, np.ix_, etc. are oriented towards interactive use.
Therefore, it's common (although not advisable) to do from numpy import * in an interactive console.
r, c, ix, etc. are likely to be variable names. Therefore, they're probably suffixed with _ to prevent getting clobbered when a user defines a variable named r or ix in an interactive session after doing from numpy import *.

ix_ is found in numpy.lib.index_tricks
This module is attributed to:
# Written by Konrad Hinsen <hinsen#cnrs-orleans.fr>
# last revision: 1999-7-23
#
# Cosmetic changes by T. Oliphant 2001
It was written many years ago and incorporated as a legacy component in the current numpy. The names were chosen by one programmer many years ago and never changed to fit Python community standards.
From the ix_ docstring:
Using ix_ one can quickly construct index arrays that will index
the cross product.
My guess: 'i' for 'index', 'x' for 'cross', '_' to avoid confusion with a (potentially) common indexing variable name.
Similarly named objects from the same module are r_, c_ and s_. Technically they are not functions, since they are not callable (don't take ()). But they are indexable (take []). They are actually instances of classes that have __getitem__ definitions. ogrid and mgrid are also indexable objects.
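A short sketch of how these names behave, assuming a reasonably recent NumPy: ix_ is an ordinary function that builds open-mesh index arrays for the cross product, while r_ and s_ are objects you index with [] rather than call. The array a is made up for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# ix_ is a regular function: it turns 1-D index lists into index arrays
# shaped (2, 1) and (1, 2), which broadcast to the cross product.
rows, cols = np.ix_([0, 2], [1, 3])
print(rows.shape, cols.shape)  # (2, 1) (1, 2)
print(a[rows, cols])           # rows {0, 2} x columns {1, 3}

# r_ and s_ are instances whose classes define __getitem__, so they take [].
print(np.r_[1:4, 7])           # [1 2 3 7]
print(np.s_[1:3])              # slice(1, 3, None)
print(callable(np.ix_), callable(np.r_))  # True False
```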

Related

Viewing class definition of built-in classes in Python like Symbol

I am very new to Python (switching from Matlab) and I am currently working with the SymPy package. I realised that I can calculate the derivative of a function with f.diff(x), even when I have not imported the diff function. So, basically, f.diff(x) works but diff(f,x) returns an error.
from sympy import symbols
x = symbols('x')
f = x**2 + 1
f.diff(x)
The reason that I could think of was that diff is actually defined as a method attribute for the class Symbol and thus, f.diff(x) works as long as x is of Symbol type and f has been defined using x. Is there a way to somehow view the Symbol class definition in order to verify that a diff method attribute actually exists?
The reason that I could think of was that diff is actually defined as a method attribute for the class Symbol and thus, f.diff(x) works as long as x is of Symbol type and f has been defined using x.
This is mostly correct (corrections below).
In contrast to Matlab, Python uses namespaces. This means that you only have very basic functions, classes, etc. available by default and everything else needs to be imported into the main namespace or is only available with a “prefix” specifying the namespace. What you gain from this is that you avoid name clashes and it’s easy to trace from which module a function is coming. For instance, in your example, the reader can see that symbols was imported from the sympy module (into the main namespace). This module also has a diff function (not the method) that you could use after importing with from sympy import diff.
In this sense, each object comes along with its own namespace, which is for most practical purposes determined by its class¹.
Functions in this namespace are called methods and (usually) do something on the object itself or using the specifics of the object itself.
Now, for the promised corrections or clarifications:
It is f’s class which is relevant here, not x’s.
You can see the class of f with type(f) and it is Add (residing in sympy.core.add).
This is because it is primarily a sum (of x**2 and 1).
More importantly, Add is a subclass of Expr (expression), which is the parent class for all SymPy expressions.
For example, the class Symbol is also a subclass of Expr.
(You can see this with type(f).mro().)
And this is the important thing here: All SymPy expressions have the diff method.
It is actually not relevant that the argument of f.diff is a Symbol or Expr.
It only needs to be something that SymPy can reasonably interpret as one.
For example f.diff("x") also works, because SymPy can translate the string "x" to a Symbol that is equivalent to your x.
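A minimal check of the points above, assuming a current SymPy install: it prints f's class, confirms that Expr is in its MRO, and shows that the method and the module-level diff function agree.

```python
from sympy import Add, Expr, Symbol, diff, symbols

x = symbols('x')
f = x**2 + 1

print(type(f))                   # <class 'sympy.core.add.Add'>
print(Expr in type(f).mro())     # True: Add derives from Expr
print(issubclass(Symbol, Expr))  # True: Symbol is an Expr as well
print(f.diff(x))                 # 2*x  (method on the expression)
print(diff(f, x))                # 2*x  (module-level function, same result)
```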
Is there a way to somehow view the Symbol class definition in order to verify that a diff method attribute actually exists?
Yes. The easiest way is the basic Python function dir, which returns a list of all attributes (everything accessible by the . operator) of an object. Typically, most of these are methods. In your case, you can just call dir(f). Note that this list also contains quite a few attributes starting with _, which indicates that they are not intended for user consumption. In any reasonable programming environment (IDE, IPython, Jupyter), this list is also shown to you when you use tab completion (F, ., Tab).
However, while learning about a class by going through all its methods is usually a good approach, for SymPy expressions this is not feasible.
There are a lot of things somebody could want to do with these expressions, but you will only ever use a fraction of them.
Instead, you can guess the name of the method and thus narrow down your search considerably.
For example, you can guess that the method for differentiation starts with a d (be it for differentiate or derivative), and here the tab completion (F, ., D, Tab) only gives you four results instead of three hundred.
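The same narrowing can be done without an IDE by filtering dir() yourself. A small sketch; the exact number of names depends on the installed SymPy version, so no count is asserted.

```python
from sympy import symbols

x = symbols('x')
f = x**2 + 1

# Drop private/dunder names, then keep only those starting with "d".
public = [name for name in dir(f) if not name.startswith('_')]
d_names = [name for name in public if name.startswith('d')]
print(len(public))  # exact count depends on the SymPy version
print(d_names)      # a short list that includes 'diff'
```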
Another approach is to start searching the documentation (or the Internet in general) with your operation of interest (here, differentiating) instead of the object of your operation (here, SymPy expressions, i.e., instances of Expr). After all, SymPy is all about the latter, so that is kind of a given.
Finally, there is normally documentation for a class listing all its methods.
For Expr, this is here.
Unfortunately, in case of Expr the documentation is not exhaustive, e.g., it lacks the diff method.
While this is not ideal, it is somewhat understandable given the number of methods as well as the duality of methods and functions in SymPy: for most methods of Expr, an analogous function can be directly imported from sympy.
¹ You can also just add stuff there (symbols.wrzlprmft = "foo"), but that’s a pretty advanced and rare usage. Also some classes are made to block this, e.g., you cannot do f.wrzlprmft = "foo".

Why was the name "arange" chosen for the numpy function?

Is there a specific reason the numpy function arange was named so?
People habitually make the typo arrange, assuming it is spelled like the English word, so the choice seems like something of an oversight given that other, less ambiguously spelled options (range, etc.) are unused by numpy.
Was it chosen as a portmanteau of array and range?
NumPy derives from an older Python library called Numeric (in fact, the first array object built for Python). The arange function dates back to this library, and its etymology is detailed in its manual:
arrayrange()
The arrayrange() function is similar to the range() function in Python, except that it returns an array as opposed to a list.
...
arange() is a shorthand for arrayrange().
Numeric Manual, 2001 (pp. 18-19); 1999 (p. 23)
Tellingly, there is another example with array(range(25)) (p.16), which is functionally the same as arrayrange().
It is explicitly modelled on the Python range function. The precedent for prefixing a was that in Python 1 there was already a variant of range called xrange.
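A quick illustration of the "array range" reading, assuming a current NumPy: arange produces the same values as wrapping the built-in range in an array, and, unlike range, it also accepts non-integer steps.

```python
import numpy as np

a = np.arange(25)          # "array range"
b = np.array(range(25))    # the long-hand spelling from the manual example
print(np.array_equal(a, b))  # True

# Unlike the built-in range(), arange also accepts float arguments.
print(np.arange(0.0, 1.0, 0.25))
```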

Should I use np.absolute or np.abs?

Numpy provides both np.absolute and the alias np.abs defined via
from .numeric import absolute as abs
which seems to be in obvious violation of the zen of python:
There should be one-- and preferably only one --obvious way to do it.
So I'm guessing that there is a good reason for this.
I have personally been using np.abs in almost all of my code and looking at e.g. the number of search results for np.abs vs np.absolute on Stack Overflow it seems like an overwhelming majority does the same (2130 vs 244 hits).
Is there any reason I should preferentially use np.absolute over np.abs in my code, or should I simply go for the more "standard" np.abs?
It's likely because there are built-in functions with the same names, like abs. The same is true for np.amax, np.amin and np.round_.
The aliases for the NumPy functions abs, min, max and round are only defined in the top-level package.
So np.abs and np.absolute are completely identical. It doesn't matter which one you use.
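A quick way to confirm this interactively, assuming any recent NumPy: the two names refer to the very same object, and the built-in abs() on an array gives the same result because it dispatches to ndarray.__abs__.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.0])

print(np.abs is np.absolute)                      # True: one object, two names
print(np.array_equal(np.abs(x), np.absolute(x)))  # True
print(np.array_equal(abs(x), np.absolute(x)))     # True: built-in abs() calls ndarray.__abs__
```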
There are several advantages to the short names: They are shorter and they are known to Python programmers because the names are identical to the built-in Python functions. So end-users have it easier (less to type, less to remember).
But there are reasons to have different names too: NumPy (or more generally 3rd party packages) sometimes need the Python functions abs, min, etc. So inside the package they define functions with a different name so you can still access the Python functions - and just in the top-level of the package you expose the "shortcuts". Note: Different names are not the only available option in that case: One could work around that with the Python module builtins to access the built-in functions if one shadowed a built-in name.
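A minimal sketch of the builtins workaround mentioned in the note. The abs function here is hypothetical, standing in for a package-level function that shadows the built-in name; inside the same module, the original is still reachable through the builtins module.

```python
import builtins


def abs(values):
    """Hypothetical package-level abs() that shadows the built-in name."""
    # Inside this module the bare name `abs` now means this function,
    # but builtins.abs still reaches Python's own abs().
    return [builtins.abs(v) for v in values]


print(abs([-3, 2, -1]))  # [3, 2, 1]
print(builtins.abs(-7))  # 7
```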
It might also be the case (but that's pure speculation on my part) that they originally only included the long-named functions absolute (and so on) and only added the short aliases later. Being a large and well-used library the NumPy developers don't remove or deprecate stuff lightly. So they may just keep the long names around because it could break old code/scripts if they would remove them.
There also is Python's built-in abs(), but really all those functions are doing the same thing. They're even exactly equally fast! (This is not the case for other functions, like max().)
Code to reproduce the plot:
import numpy as np
import perfplot


def np_absolute(x):
    return np.absolute(x)


def np_abs(x):
    return np.abs(x)


def builtin_abs(x):
    return abs(x)


b = perfplot.bench(
    setup=np.random.rand,
    kernels=[np_abs, np_absolute, builtin_abs],
    n_range=[2 ** k for k in range(25)],
    xlabel="len(data)",
)
b.save("out.png")
b.show()

Enum vs String as a parameter in a function

I noticed that many libraries nowadays seem to prefer the use of strings over enum-type variables for parameters.
Where people would previously use enums, e.g. dateutil.rrule.FR for a Friday, it seems that this has shifted towards using strings (e.g. 'FRI').
Same in numpy (or pandas for that matter), where searchsorted, for example, uses strings (e.g. side='left' or side='right') rather than a defined enum. For the avoidance of doubt, before Python 3.4 this could easily have been implemented as an enum like this:
class SIDE:
    RIGHT = 0
    LEFT = 1
And the advantages of enum-type variables are clear: you can't misspell them without raising an error, they offer proper support for IDEs, etc.
So why use strings at all, instead of sticking to enum types? Doesn't this make the programs much more prone to user errors? It's not like enums create an overhead - if anything they should be slightly more efficient. So when and why did this paradigm shift happen?
I think enums are safer especially for larger systems with multiple developers.
As soon as the need arises to change the value of such an enum, looking up and replacing a string in many places is not my idea of fun :-)
The most important criterion, IMHO, is the usage: for use within a module or even a package a string seems to be fine; in a public API I'd prefer enums.
[update]
As of today (2019), Python has dataclasses; combined with optional type annotations and static type analyzers like mypy, I think this is a solved problem.
As for efficiency, attribute lookup is somewhat expensive in Python compared to most other languages, so I guess some libraries may still choose to avoid it for performance reasons.
[original answer]
IMHO it is a matter of taste. Some people like this style:
def searchsorted(a, v, side='left', sorter=None):
    ...
    assert side in ('left', 'right'), "Invalid side '{}'".format(side)
    ...

numpy.searchsorted(a, v, side='right')
Yes, if you call searchsorted with side='foo' you may get an AssertionError way later at runtime, but at least the bug will be pretty easy to spot by looking at the traceback.
While other people may prefer (for the advantages you highlighted):
numpy.searchsorted(a, v, side=numpy.CONSTANTS.SIDE.RIGHT)
I favor the first because I think seldom used constants are not worth the namespace cruft. You may disagree, and people may align with either side due to other concerns.
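A runnable sketch of the string-validation style, using a hypothetical pure-Python helper (insertion_index) built on the standard bisect module rather than NumPy's real implementation. It raises ValueError instead of using assert, because assert statements are skipped when Python runs with -O.

```python
import bisect


def insertion_index(sorted_values, value, side='left'):
    """Hypothetical searchsorted-like helper for a plain sorted list."""
    # Validate the string option up front so a typo fails immediately.
    if side not in ('left', 'right'):
        raise ValueError("Invalid side {!r}".format(side))
    if side == 'left':
        return bisect.bisect_left(sorted_values, value)
    return bisect.bisect_right(sorted_values, value)


data = [1, 2, 2, 3]
print(insertion_index(data, 2))                # 1
print(insertion_index(data, 2, side='right'))  # 3
```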
If you really care, nothing prevents you from defining your own "enums":
class SIDE(object):
    RIGHT = 'right'
    LEFT = 'left'

numpy.searchsorted(a, v, side=SIDE.RIGHT)
I think it is not worth it, but again, it is a matter of taste.
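Because such constants are just the strings 'left' and 'right', they can be passed straight to the real numpy.searchsorted. A quick check; the array is made up for illustration.

```python
import numpy as np


class SIDE:
    # Named constants whose values are the strings NumPy already expects
    RIGHT = 'right'
    LEFT = 'left'


a = np.array([1, 2, 2, 3])
print(np.searchsorted(a, 2, side=SIDE.LEFT))   # 1
print(np.searchsorted(a, 2, side=SIDE.RIGHT))  # 3
```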
[update]
Stefan made a fair point:
As soon as the need arises to change the value of such an enum, looking up and replacing a string in many places is not my idea of fun :-)
I can see how painful this can be in a language without named parameters - using the example you have to search for the string 'right' and get a lot of false positives. In Python you can narrow it down searching for side='right'.
Of course if you are dealing with an interface that already has a defined set of enums/constants (like an external C library) then yes, by all means mimic the existing conventions.
I understand this question has already been answered, but there is one thing that has not been addressed at all: when you use values stored in Python Enum objects, you must explicitly ask for the value via .value.
>>> from enum import Enum
>>> class Test(Enum):
...     WORD='word'
...     ANOTHER='another'
...
>>> str(Test.WORD.value)
'word'
>>> str(Test.WORD)
'Test.WORD'
One simple solution to this problem is to provide an implementation of __str__():
>>> class Test(Enum):
...     WORD='word'
...     ANOTHER='another'
...     def __str__(self):
...         return self.value
...
>>> Test.WORD
<Test.WORD: 'word'>
>>> str(Test.WORD)
'word'
Yes, adding .value is not a huge deal, but it is an inconvenience nonetheless. Using regular strings requires zero extra effort: no extra classes and no redefinition of default class methods. With an Enum, by contrast, you need an explicit conversion to the string value in many places where a plain string would need none.
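One middle ground, sketched here as an illustration rather than something the libraries above actually do, is to mix str into the Enum (Side is a hypothetical name). Members are then real str instances that compare equal to their values, while looking up an unknown value still raises ValueError, so misspellings are caught at the boundary.

```python
from enum import Enum


class Side(str, Enum):
    LEFT = 'left'
    RIGHT = 'right'


print(Side.LEFT == 'left')          # True: members are str instances
print(Side('right') is Side.RIGHT)  # True: look up a member by its value
try:
    Side('rihgt')                   # a typo is rejected
except ValueError as exc:
    print(exc)
```

Python 3.11 added enum.StrEnum for this pattern; the mixin form above also works on older versions.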
I prefer strings for debugging. Compare an object like
side=1, opt_type=0, order_type=6
to
side='BUY', opt_type='PUT', order_type='FILL_OR_KILL'
I also like "enums" where the values are strings:
class Side(object):
    BUY = 'BUY'
    SELL = 'SELL'
    SHORT = 'SHORT'
Strictly speaking Python does not have enums - or at least it didn't prior to v3.4
https://docs.python.org/3/library/enum.html
I prefer to think of your example as programmer defined constants.
In argparse, one set of constants have string values. While the code uses the constant names, users more often use the strings.
e.g. argparse.ZERO_OR_MORE = '*'
argparse.OPTIONAL = '?'
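A small illustration with the standard argparse module: the named constant and the literal string are interchangeable as nargs values. The argument name files is made up for the example.

```python
import argparse

print(argparse.ZERO_OR_MORE, argparse.OPTIONAL)  # * ?

parser = argparse.ArgumentParser()
# nargs=argparse.ZERO_OR_MORE means exactly the same as nargs='*'
parser.add_argument('files', nargs=argparse.ZERO_OR_MORE)
args = parser.parse_args(['a.txt', 'b.txt'])
print(args.files)  # ['a.txt', 'b.txt']
```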
numpy is one of the older 3rd-party packages (at least its roots, like Numeric, are). String values are more common than enums. In fact, I can't offhand think of any enums (as you define them).

side effect gotchas in python/numpy? horror stories and narrow escapes wanted

I am considering moving from Matlab to Python/numpy for data analysis and numerical simulations. I have used Matlab (and SML-NJ) for years, and am very comfortable in the functional environment without side effects (barring I/O), but am a little reluctant about the side effects in Python. Can people share their favorite gotchas regarding side effects, and if possible, how they got around them? As an example, I was a bit surprised when I tried the following code in Python:
lofls = [[]] * 4  # an accident waiting to happen!
lofls[0].append(7)  # not what I was expecting...
print(lofls)  # gives [[7], [7], [7], [7]]
# instead, I should have done this (I think)
lofls = [[] for x in range(4)]
lofls[0].append(7)  # only appends to the first list
print(lofls)  # gives [[7], [], [], []]
thanks in advance
Confusing references to the same (mutable) object with references to separate objects is indeed a "gotcha" (suffered by all non-functional languages, ones which have mutable objects and, of course, references). A frequently seen bug in beginners' Python code is misusing a default value which is mutable, e.g.:
def addone(item, alist=[]):
    alist.append(item)
    return alist
This code may be correct if the purpose is to have addone keep its own state (and return the one growing list to successive callers), much as static data would work in C; it's not correct if the coder is wrongly assuming that a new empty list will be made at each call.
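A sketch contrasting the two behaviours. addone_shared is a renamed copy of the function above (the new name is only for illustration), and the None-sentinel version is the common idiom for getting a fresh list on every call.

```python
def addone_shared(item, alist=[]):
    # The default list is created once, when the def statement runs.
    alist.append(item)
    return alist


def addone(item, alist=None):
    # Common idiom: use None as a sentinel and build a fresh list per call.
    if alist is None:
        alist = []
    alist.append(item)
    return alist


print(addone_shared(1), addone_shared(2))  # [1, 2] [1, 2]: one shared list
print(addone(1), addone(2))                # [1] [2]
```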
Raw beginners used to functional languages can also be confused by the command-query separation design decision in Python's built-in containers: mutating methods that don't have anything in particular to return (i.e., the vast majority of mutating methods) return nothing (specifically, they return None) -- they're doing all their work "in-place". Bugs coming from misunderstanding this are easy to spot, e.g.
alist = alist.append(item)
is pretty much guaranteed to be a bug -- it appends an item to the list referred to by name alist, but then rebinds name alist to None (the return value of the append call).
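The same command-query split shows up between list.sort() and the built-in sorted(). A short illustration:

```python
alist = [3, 1, 2]
result = alist.append(4)  # mutates the list in place...
print(result)             # None  (...and returns nothing useful)
print(alist)              # [3, 1, 2, 4]

ordered = sorted(alist)   # sorted() returns a new list
print(alist.sort())       # None: list.sort() sorts in place
print(ordered, alist)     # [1, 2, 3, 4] [1, 2, 3, 4]
```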
While the first issue I mentioned is about early binding that may mislead people who think the binding is, instead, a late one, there are issues that go the other way, where some people expect early binding while the binding is, instead, late. For example (with a hypothetical GUI framework...):
for i in range(10):
    Button(text="Button #%s" % i,
           click=lambda: say("I'm #%s!" % i))
this will show ten buttons saying "Button #0", "Button #1", etc., but, when clicked, each and every one of them will say it's #9, because the i within the lambda is late-bound (with a lexical closure). A fix is to take advantage of the fact that default values for arguments are early-bound (as I pointed out about the first issue!-) and change the last line to
click=lambda i=i: say("I'm #%s!" % i))
Now lambda's i is an argument with a default value, not a free variable (looked up by lexical closure) any more, and so the code works as intended (there are other ways too, of course).
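A GUI-free sketch of the same effect, where say is a hypothetical stand-in that just returns its message. It also shows functools.partial as one of the "other ways": the message is computed when the partial is created.

```python
import functools


def say(message):
    # Stand-in for the hypothetical GUI callback; it just returns the text.
    return message


late = [lambda: say("I'm #%s!" % i) for i in range(3)]
early = [lambda i=i: say("I'm #%s!" % i) for i in range(3)]
bound = [functools.partial(say, "I'm #%s!" % i) for i in range(3)]

print([cb() for cb in late])   # every callback reports #2
print([cb() for cb in early])  # #0, #1, #2
print([cb() for cb in bound])  # #0, #1, #2
```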
I stumbled upon this one again recently (after years of Python) while trying to remove a small dependency on numpy.
If you come from Matlab, you should use and trust numpy functions for mono-type array handling. Along with matplotlib, they are very convenient packages for a smooth transition.
import numpy as np
np.zeros((4,)) # to make an array full of zeros [0,0,0,0]
np.zeros((4,1)) # another one full of zeros but 2 dimensions [[0],[0],[0],[0]]
np.zeros((4,0)) # an empty array like [[],[],[],[]]
np.zeros((0,4)) # another empty array, which can not be represented with python lists o_O
etc.
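A quick check of the shapes above. The tolist() round trip shows why the (0, 4) case has no list equivalent: it comes back as a bare [], and the trailing dimension of 4 is lost.

```python
import numpy as np

for shape in [(4,), (4, 1), (4, 0), (0, 4)]:
    arr = np.zeros(shape)
    # tolist() shows what survives when converting back to Python lists
    print(shape, arr.shape, arr.size, arr.tolist())
```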
