Where can I find numpy.where() source code? [duplicate] - python

This question already has answers here:
How do I use numpy.where()? What should I pass, and what does the result mean? [closed]
(2 answers)
Closed 4 years ago.
I have already found the source for the numpy.ma.where() function but it seems to be calling the numpy.where() function and to better understand it I would like to take a look if possible.

Most Python functions are written in the Python language, but some functions are written in something more native (often the C language).
Regular Python functions ("pure Python")
There are a few techniques you can use to ask Python itself where a function is defined. Probably the most portable uses the inspect module:
>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.ma.where)
False
>>> inspect.getsourcefile(numpy.ma.where)
'.../numpy/core/multiarray.py'
But this won't work with a native ("built-in") function:
>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.where)
True
>>> inspect.getsourcefile(numpy.where)
TypeError: <built-in function where> is not a module, class, method, function, traceback, frame, or code object
Native ("built-in") functions
Unfortunately, Python doesn't provide a record of source files for built-in functions. You can find out which module provides the function:
>>> import numpy as np
>>> np.where
<built-in function where>
>>> np.where.__module__
'numpy.core.multiarray'
Python won't help you find the native (C) source code for that module, but in this case it's reasonable to look in the numpy project for C source that has similar names. I found the following file:
numpy/core/src/multiarray/multiarraymodule.c
And in that file, I found a list of definitions (PyMethodDef) including:
{"where",
(PyCFunction)array_where,
METH_VARARGS, NULL},
This suggests that the C function array_where is the one that Python sees as "where".
The array_where function is defined in the same file, and it mostly delegates to the PyArray_Where function.
In short
NumPy's np.where function is written in C, not Python. A good place to look is PyArray_Where.

First, there are two distinct versions of where: one takes just the condition, the other takes three arrays.
The simpler one is most commonly used, and is just another name for np.nonzero. This scans through the condition array twice. Once with np.count_nonzero to determine how many nonzero entries there are, which allows it to allocate the return arrays. The second step is to fill in the coordinates of all nonzero entries. The key is that it returns a tuple of arrays, one array for each dimension of condition.
The condition, x, y version takes three arrays, which it broadcasts against each other. The return array has the common broadcasted shape, with elements chosen from x and y as explained in the answers to your previous question, How exactly does numpy.where() select the elements in this example?
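A quick sketch of both call forms (the arrays here are just illustrative):

```python
import numpy as np

cond = np.array([[True, False], [False, True]])

# One-argument form: same as np.nonzero(cond); one index array
# per dimension of the condition.
rows, cols = np.where(cond)
print(rows, cols)  # [0 1] [0 1]

# Three-argument form: broadcasts condition, x and y, then takes
# elements from x where True and from y (here a scalar) elsewhere.
x = np.array([[1, 2], [3, 4]])
print(np.where(cond, x, -1))
# [[ 1 -1]
#  [-1  4]]
```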
Keep in mind that most of this code is C or Cython, with a significant amount of preprocessing. It is hard to read, even for experienced users. It is easier to run a variety of test cases and get a feel for what is happening that way.
A couple of things to watch out for: np.where is a Python function, and Python fully evaluates each input before passing it in. It performs conditional assignment, not conditional evaluation.
And unless you pass 3 arrays that match in shape, or scalar x and y, you'll need a good understanding of broadcasting.

You can find the code in numpy.core.multiarray

C:\Users\<name>\AppData\Local\Programs\Python\Python37-32\Lib\site-packages\numpy\core\multiarray.py is where I found it.

Related

overloading arbitrary operator in python

Is it possible to overload arbitrary operators in Python? Or is one restricted to the list of operators which have associated magic methods as listed here: https://www.python-course.eu/python3_magic_methods.php ?
I'm asking because I noticed that Numpy uses the @ operator to perform matrix multiplication, e.g. C = A @ B where A, B are Numpy arrays, and I was wondering how they did it.
Edit: The @ operator is not in the list I linked to.
Could someone point me to the Numpy source where this is done?
In Python, you cannot create new operators, no. By defining those "magic" functions, you can affect what happens when objects of your own definition are operated upon using the standard operators.
However, the list you linked to is not complete. In Python 3.5, they added special methods for the # operator. Here's the rather terse listing in the Python operator module docs and here are the docs on operator overloading.
operator.matmul(a, b)
operator.__matmul__(a, b)
    Return a @ b.
    New in version 3.5.
I hadn't seen that operator personally, so I did a little more research. It's intended specifically for matrix multiplication. But, I was able to use it for other purposes, though I would argue against doing so as a matter of style:
In [1]: class RichGuyEmailAddress(str):
   ...:     def __matmul__(self, domain_name):
   ...:         return f'{self}@{domain_name}'
   ...:
In [2]: my_email = RichGuyEmailAddress('billg') @ 'microsoft.com'
In [3]: print(my_email)
billg@microsoft.com
So, no, you can't overload any random character, but you can overload the @ operator.

Why was the name "arange" chosen for the numpy function?

Is there a specific reason the numpy function arange was named so?
People habitually make the typo arrange, assuming it is spelled as the English word, so the choice seems like something of an oversight given that other less ambiguously spelled options (range etc) are unused by numpy.
Was it chosen as a portmanteau of array and range?
NumPy derives from an older python library called Numeric (in fact, the first array object built for python). The arange function dates back to this library, and its etymology is detailed in its manual:
arrayrange()
The arrayrange() function is similar to the range() function in Python, except that it returns an array as opposed to a list.
...
arange() is a shorthand for arrayrange().
Numeric Manual
2001 (p. 18-19), 1999 (p.23)
Tellingly, there is another example with array(range(25)) (p.16), which is functionally the same as arrayrange().
It is explicitly modelled on the Python range function. The precedent for prefixing a was that in Python 1 there was already a variant of range called xrange.

How to use arrays/vectors in a Python user-defined function?

I'm building a function to calculate the Reliability of a given component/subsystem. For this, I wrote the following in a script:
import math as m
import numpy as np

def Reliability(MTBF, time):
    failure_param = pow(MTBF, -1)
    R = m.exp(-failure_param * time)
    return R
The function works just fine for any scalar time values I call it with. Now I want to call the function to calculate the Reliability for a given array, let's say np.linspace(0,24,25). But then I get errors like "TypeError: only length-1 arrays can be converted to Python scalars".
Anyone that could help me being able to pass arrays/vectors on a Python function like that?
Thank you very much in advance.
The math.exp() function you are using knows nothing about numpy. It expects either a scalar, or a length-1 array that it can convert to a scalar. Use numpy.exp() instead, which accepts numpy arrays.
To be able to work with numpy arrays you need to use numpy functions:
import numpy as np

def Reliability(MTBF, time):
    return np.exp(-(MTBF ** -1) * time)
If possible you should always use numpy functions instead of math functions, when working with numpy objects.
They not only work directly on numpy objects like arrays and matrices, but are also highly optimized, e.g. using vectorization features of the CPU (like SSE). Most functions like exp/sin/cos/pow are available in the numpy module. Some more advanced functions can be found in scipy.
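For example, with the numpy-based version the whole time vector goes through in one call (MTBF = 500 is just an example value):

```python
import numpy as np

def Reliability(MTBF, time):
    # np.exp is vectorized, so `time` may be a scalar or an array
    return np.exp(-(MTBF ** -1) * time)

t = np.linspace(0, 24, 25)
R = Reliability(500, t)
print(R[0])     # 1.0 (reliability at t = 0)
print(R.shape)  # (25,)
```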
Rather than call Reliability on the vector, use list comprehension to call it on each element:
[Reliability(MTBF, test_time) for test_time in np.linspace(0,24,25)]
Or:
map(Reliability, [MTBF] * 25, np.linspace(0, 24, 25))
The second one produces a generator object which may be better for performance if the size of your list starts getting huge.
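Both spellings produce the same values; a quick check (MTBF = 100 is an arbitrary example):

```python
import math as m
import numpy as np

def Reliability(MTBF, time):
    failure_param = pow(MTBF, -1)
    return m.exp(-failure_param * time)

MTBF = 100
times = np.linspace(0, 24, 25)

via_comprehension = [Reliability(MTBF, t) for t in times]
# map with two iterables passes one element of each per call,
# so no zip is needed
via_map = list(map(Reliability, [MTBF] * 25, times))

print(via_comprehension == via_map)  # True
```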

Substitute numpy functions with Python only

I have a python function that employs the numpy package. It uses numpy.sort and numpy.array functions as shown below:
import numpy as np

def function(group):
    pre_data = np.sort(np.array(
        [c["data"] for c in group[1]],
        dtype=np.float64
    ))
How can I re-write the sort and array functions using only Python in such a way that I no longer need the numpy package?
It really depends on the code after this. pre_data will be a numpy.ndarray, which means it has array methods that will be really hard to replicate without numpy. If those methods are called later in the code, you're going to have a hard time, and I'd advise you to just bite the bullet and install numpy. Its popularity is a testament to its usefulness...
However, if you really just want to sort a list of floats and put it into a sequence-like container:
def function(group):
    pre_data = sorted(float(c['data']) for c in group[1])
should do the trick.
Well, it's not strictly possible because the return type is an ndarray. If you don't mind to use a list instead, try this:
pre_data = sorted(float(c["data"]) for c in group[1])
That's not actually using any useful numpy functions anyway
def function(group):
    pre_data = sorted(float(c["data"]) for c in group[1])

"Correct" way to interact with numpy objects

I've been using scientific python (via from pylab import *) on and off for a while as a free Matlab substitute, mainly for work in chemical engineering (I'm a fan of the IPython Notebook). Something that's always appeared strange to me is the fact that there's generally two different ways to interact with an object. For instance, if I have an array, I can tell its dimensions in two ways:
a = array([[1,2,3],[2,3,4]])
There's the 'Matlab' way:
shape(a)
Or instead I could find it by typing:
a.shape
This seems to contradict The Zen of Python: "There should be one-- and preferably only one --obvious way to do it"
I'm just wondering why there's multiple ways of doing the same thing, and which practice is more fundamental/natural for the language and would be better to use in the long run.
Using the method is preferable. After all, the implementation of shape simply defers to the method anyway (from /numpy/core/fromnumeric.py):
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
I assume a lot of this pylab stuff is just included to help ease the transition for people coming from MATLAB. Get used to it because there are many more examples of numpy being, ahem, not very pythonic.
When you get more used to python and matplotlib you will likely want to ditch the from pylab import * anyway and start writing more numpythonic code, rather than MATLAB style work.
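For what it's worth, the two spellings agree on real arrays, and the function form also handles plain nested lists:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# Function and attribute agree for ndarrays:
print(a.shape)      # (2, 3)
print(np.shape(a))  # (2, 3)

# The function also works on "array_like" objects with no .shape
# attribute, by converting them to an array first:
print(np.shape([[1, 2], [3, 4]]))  # (2, 2)
```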
It mostly comes down to a matter of preference, but there are a few differences you might want to be aware of. First off, you should use numpy.shape(a) or np.shape(a) instead of a bare shape(a), because "Namespaces are one honking great idea -- let's do more of those!" Besides, numpy has several names you'll likely find in other python modules; e.g. array appears as array.array in the python stdlib, numpy.array, and numpy.ma.array. To avoid confusing others (and yourself), just avoid importing the entire numpy namespace.
Other than that, it turns out that numpy.shape, and most other similar functions, just look for a shape attribute/method on the argument, and if they don't find one, they try to convert the argument to an array. Here is the code:
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
This can be useful if you want the shape of an "array_like" object, you'll notice that most numpy functions take "array_like" arguments. But it can be slow if you're doing something like:
shape = np.shape(list_of_lists)
mx = np.max(list_of_lists)
mn = np.min(list_of_lists)
Other than that, they're pretty much the same.
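If you do need several of these on the same list-of-lists, converting once up front avoids repeating the array conversion in every call (a minimal sketch):

```python
import numpy as np

list_of_lists = [[3, 1, 4], [1, 5, 9]]

# Each np.shape/np.max/np.min call on the raw list would re-convert
# it; converting once and using the array's own methods avoids that.
arr = np.asarray(list_of_lists)
shape, mx, mn = arr.shape, arr.max(), arr.min()
print(shape, mx, mn)  # (2, 3) 9 1
```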
