Is there a specific reason the numpy function arange was named so?
People habitually make the typo arrange, assuming it is spelled like the English word, so the choice seems like something of an oversight, given that other, less ambiguously spelled options (range, etc.) are unused by numpy.
Was it chosen as a portmanteau of array and range?
NumPy derives from an older Python library called Numeric (in fact, the first array object built for Python). The arange function dates back to this library, and its etymology is detailed in its manual:
arrayrange()
The arrayrange() function is similar to the range() function in Python, except that it returns an array as opposed to a list.
...
arange() is a shorthand for arrayrange().
Numeric Manual
2001 (pp. 18-19), 1999 (p. 23)
Tellingly, there is another example with array(range(25)) (p.16), which is functionally the same as arrayrange().
It is explicitly modelled on the Python range function. The precedent for prefixing a was that in Python 1 there was already a variant of range called xrange.
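To illustrate the parallel, here is a minimal comparison (assuming NumPy is installed):

```python
import numpy as np

# range() returns a lazy sequence of ints; arange() returns an ndarray.
print(list(range(0, 10, 2)))      # [0, 2, 4, 6, 8]
print(np.arange(0, 10, 2))        # [0 2 4 6 8]

# Unlike range(), arange() also accepts non-integer steps.
print(np.arange(0.0, 1.0, 0.25))  # [0.   0.25 0.5  0.75]
```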
I have already found the source for the numpy.ma.where() function, but it seems to be calling the numpy.where() function, and to better understand it I would like to take a look at that source, if possible.
Most Python functions are written in the Python language, but some functions are written in something more native (often the C language).
Regular Python functions ("pure Python")
There are a few techniques you can use to ask Python itself where a function is defined. Probably the most portable uses the inspect module:
>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.ma.where)
False
>>> inspect.getsourcefile(numpy.ma.where)
'.../numpy/core/multiarray.py'
But this won't work with a native ("built-in") function:
>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.where)
True
>>> inspect.getsourcefile(numpy.where)
TypeError: <built-in function where> is not a module, class, method, function, traceback, frame, or code object
Native ("built-in") functions
Unfortunately, Python doesn't provide a record of source files for built-in functions. You can find out which module provides the function:
>>> import numpy as np
>>> np.where
<built-in function where>
>>> np.where.__module__
'numpy.core.multiarray'
Python won't help you find the native (C) source code for that module, but in this case it's reasonable to look in the numpy project for C source that has similar names. I found the following file:
numpy/core/src/multiarray/multiarraymodule.c
And in that file, I found a list of definitions (PyMethodDef) including:
{"where",
(PyCFunction)array_where,
METH_VARARGS, NULL},
This suggests that the C function array_where is the one that Python sees as "where".
The array_where function is defined in the same file, and it mostly delegates to the PyArray_Where function.
In short
NumPy's np.where function is written in C, not Python. A good place to look is PyArray_Where.
First, there are two distinct versions of where: one that takes just the condition, the other that takes three arrays.
The simpler one is most commonly used, and is just another name for np.nonzero. This scans through the condition array twice. Once with np.count_nonzero to determine how many nonzero entries there are, which allows it to allocate the return arrays. The second step is to fill in the coordinates of all nonzero entries. The key is that it returns a tuple of arrays, one array for each dimension of condition.
The condition, x, y version takes three arrays, which it broadcasts against each other. The return array has the common broadcasted shape, with elements chosen from x and y as explained in the answers to your previous question, How exactly does numpy.where() select the elements in this example?
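A small sketch of the two forms, under the usual `import numpy as np`:

```python
import numpy as np

a = np.array([[0, 4], [3, 0]])

# One-argument form: same as np.nonzero -- a tuple of coordinate
# arrays, one per dimension of the condition.
rows, cols = np.where(a != 0)
print(rows, cols)  # [0 1] [1 0]

# Three-argument form: broadcast condition, x, y against each other
# and pick elements from x or y elementwise.
print(np.where(a != 0, a, -1))
# [[-1  4]
#  [ 3 -1]]
```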
You do realize that most of this code is C or Cython, with a significant amount of preprocessing. It is hard to read, even for experienced users. It is often easier to run a variety of test cases and get a feel for what is happening that way.
A couple of things to watch out for. np.where is a Python function, and Python fully evaluates each input before passing it in. This is conditional selection, not a conditional (short-circuit) evaluation function.
And unless you pass 3 arrays that match in shape, or scalar x and y, you'll need a good understanding of broadcasting.
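Because both x and y are fully evaluated before np.where ever sees them, an expression that is invalid for some elements still gets computed for those elements. A small sketch of the pitfall:

```python
import numpy as np

x = np.array([0.0, 2.0, 4.0])

# 1/x is computed for *every* element before np.where runs, so the
# division by zero happens even though that element is never selected.
# errstate just silences the resulting warning for this demo.
with np.errstate(divide="ignore"):
    result = np.where(x != 0, 1 / x, 0.0)
print(result)  # [0.   0.5  0.25]
```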
You can find the code in numpy.core.multiarray
C:\Users\<name>\AppData\Local\Programs\Python\Python37-32\Lib\site-packages\numpy\core\multiarray.py is where I found it.
numpy.copysign
I know how this function works, but I can't fully understand what this description in the title means.
Like
x2[, out]
what does this parameter mean? Is it a datatype in Python?
and
" = < ufunc 'copysign'> "
I have seen notation like this several times when looking through the documentation.
Can anybody help? Thank you so much.
The brackets are standard Python documentation syntax for optional parameters to a function call. From the Python Language Reference Introduction:
a phrase enclosed in square brackets ([ ]) means zero or one occurrences (in other words, the enclosed phrase is optional)
You'll notice it all over the place in Python & its libraries' documentation.
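So copysign(x1, x2[, out]) simply means that out may be omitted. A quick illustration:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([-1.0, 1.0, -1.0])

# Without the optional `out`: a new array is allocated and returned.
print(np.copysign(x1, x2))  # [-1.  2. -3.]

# With `out`: the result is written into an existing array instead.
buf = np.empty(3)
np.copysign(x1, x2, out=buf)
print(buf)                  # [-1.  2. -3.]
```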
The = <ufunc 'func_name'> bit is to let you know that the function is an instance of the numpy.ufunc class. From the NumPy docs on Universal Functions:
A universal function (or ufunc for short) is a function that operates
on ndarrays in an element-by-element fashion, supporting array
broadcasting, type casting, and several other standard features. That
is, a ufunc is a “vectorized” wrapper for a function that takes a
fixed number of scalar inputs and produces a fixed number of scalar
outputs.
In NumPy, universal functions are instances of the numpy.ufunc class. Many of the built-in functions are implemented in compiled C code, but ufunc instances can also be produced using the frompyfunc factory function.
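As a minimal sketch of that last point (the function clip01 here is made up for illustration), frompyfunc turns an ordinary scalar Python function into a ufunc:

```python
import numpy as np

# An ordinary scalar function: one input, one output.
def clip01(x):
    return min(max(x, 0.0), 1.0)

# frompyfunc(func, nin, nout) wraps it as a ufunc instance.
uclip01 = np.frompyfunc(clip01, 1, 1)
print(type(uclip01))                        # <class 'numpy.ufunc'>
print(uclip01(np.array([-0.5, 0.3, 2.0])))  # [0.0 0.3 1.0]
```

Note that ufuncs built this way return object-dtype arrays, unlike the compiled built-ins.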
So .loc and .iloc are not your typical functions. They somehow use [ and ] to surround the arguments so that it is comparable to normal array indexing. However, I have never seen this in another library (that I can think of; maybe numpy has something like this that I'm blanking on), and I have no idea how it technically works/is defined in the Python code.
Are the brackets in this case just syntactic sugar for a function call? If so, how then would one make an arbitrary function use brackets instead of parentheses? Otherwise, what is special about their use/definition in Pandas?
Note: The first part of this answer is a direct adaptation of my answer to this other question, that was answered before this question was reopened. I expand on the "why" in the second part.
So .loc and .iloc are not your typical functions
Indeed, they are not functions at all. I'll make examples with loc, iloc is analogous (it uses different internal classes).
The simplest way to check what loc actually is, is:
import pandas as pd
df = pd.DataFrame()
print(df.loc.__class__)
which prints
<class 'pandas.core.indexing._LocIndexer'>
this tells us that df.loc is an instance of a _LocIndexer class. The syntax loc[] derives from the fact that _LocIndexer defines __getitem__ and __setitem__*, which are the methods python calls whenever you use the square brackets syntax.
So yes, brackets are, technically, syntactic sugar for some function call, just not the function you thought it was. (There are of course many reasons why Python is designed this way; I won't go into the details here because 1) I am not sufficiently expert to provide an exhaustive answer and 2) there are a lot of better resources on the web about this topic.)
*Technically, it's its base class _LocationIndexer that defines those methods, I'm simplifying a bit here
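A minimal sketch of the same mechanism (the class names _MyIndexer and MyFrame are made up for illustration, not Pandas internals):

```python
class _MyIndexer:
    """Toy stand-in for something like _LocIndexer."""
    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):         # called for obj.loc[key]
        return self._data[key]

    def __setitem__(self, key, value):  # called for obj.loc[key] = value
        self._data[key] = value

class MyFrame:
    def __init__(self):
        self._data = {}

    @property
    def loc(self):
        # Returning an indexer object is what makes df.loc[...] work.
        return _MyIndexer(self._data)

df = MyFrame()
df.loc["a"] = 1     # goes through _MyIndexer.__setitem__
print(df.loc["a"])  # 1
```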
Why does Pandas use square brackets with .loc and .iloc?
I'm entering speculation area here, because I couldn't find any document explicitly talking about design choices in Pandas, however: there are at least two good reasons I see for choosing the square brackets.
The first, and most important reason is: you simply can't do with a function call everything you do with the square-bracket notation, because assigning to a function call is a syntax error in python:
# contrived example to show this can't work
a = []
def f():
    global a
    return a

f().append(1)  # OK
f() = dict()   # SyntaxError: cannot assign to function call
Using round brackets for a "function" call invokes the underlying __call__ method (note that any class that defines __call__ is callable, so "function" call is an imprecise term: Python doesn't care whether something is a function or just behaves like one).
Using square brackets, instead, calls either __getitem__ or __setitem__ depending on where the expression appears (__setitem__ if it's on the left of an assignment operator, __getitem__ in any other case). There is no way to mimic this behaviour with a function call: you'd need a setter method to modify the data in the DataFrame, and it still wouldn't be allowed on the left-hand side of an assignment:
# imaginary method-based alternative to the square bracket notation:
my_data = df.get_loc(my_index)
df.set_loc(my_index, my_data*2)
This example brings me to the second reason: consistency. You can access elements of a DataFrame via square brackets:
something = df['a']
df['b'] = 2*something
when using loc you're still trying to refer to some items in the DataFrame, so it's more consistent to use the same syntax instead of asking the user to use some getter and setter functions (it's also, I believe, "more pythonic", but that's a fuzzy concept I'd rather stay away from).
Underneath the covers, both are using the __setitem__ and __getitem__ functions.
I was wondering if there is a proper Python convention to distinguish between functions that alter their arguments in place and functions that leave their arguments intact and return an altered copy. For example, consider two functions that apply some permutation. Function f takes a list as an argument and shuffles the elements around, while function g takes a list, copies it, applies f and then returns the altered copy.
For the above example, I figured that f could be called permute(x), while g could be permutation(x), using verbs and nouns to distinguish. This is not always really optimal, though, and in some cases it might lead to confusion as to whether an argument is going to get changed along the way - especially if f were to have some other value to return.
Is there a standard way to deal with this problem?
There is no handy naming convention, not written down in places like PEP-8 at any rate.
The Python standard library does use such a convention to a certain extent. Compare:
listobj.sort()
listobj.reverse()
with
sorted_listobj = sorted(listobj)
reversed_iterator = reversed(listobj)
Using a verb when acting on the object directly, an adjective when returning a new one.
The convention isn't consistent. When enumerating an iterable, you use enumerate(), not enumerated(). filter() doesn't alter the input sequence in place. round() doesn't touch the input number object. compile() produces a new bytecode object. Etc. But none of those operations have in-place equivalents in the Python standard library anyway, and I am not aware of any use of adjectives where the input is altered in-place.
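The sort/sorted pair above makes the convention concrete:

```python
nums = [3, 1, 2]

# Verb: acts on the object in place and returns None.
result = nums.sort()
print(result, nums)          # None [1, 2, 3]

# Adjective: leaves the input intact and returns a new object.
nums2 = [3, 1, 2]
print(sorted(nums2), nums2)  # [1, 2, 3] [3, 1, 2]
```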
Why does the name of np.ix_ contain a trailing underscore?
I can't give a complete reason, but it's for symmetry with np.r_, np.c_, etc. I can make a guess for the overall reason:
All of the short numpy operators like np.r_, np.ix_, etc are oriented towards interactive use.
Therefore, it's common (although not advisable) to do from numpy import * in an interactive console.
r, c, ix, etc, are likely to be variable names. Therefore, they're probably suffixed with _ to prevent getting clobbered when a user defines a variable named r or ix in an interactive session after doing from numpy import *.
ix_ is found in numpy.lib.index_tricks
This module is attributed to:
# Written by Konrad Hinsen <hinsen#cnrs-orleans.fr>
# last revision: 1999-7-23
#
# Cosmetic changes by T. Oliphant 2001
It was written many years ago and incorporated as a legacy component into the current numpy. The names were chosen by one programmer many years ago and never changed to fit Python community standards.
From the .ix_ doc:
Using ix_ one can quickly construct index arrays that will index
the cross product.
My guess: 'i' for 'index', 'x' for 'cross', '_' to avoid confusion with a (potentially) common indexing variable name.
Similarly named objects from the same module are r_, c_ and s_. Technically they are not functions, since they are not callable (don't take ()). But they are indexable (take []). They are actually instances of classes that have __getitem__ definitions. ogrid and mgrid are also indexable objects.
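The "cross product" behaviour from the doc quote is easy to see in a small example:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# ix_ builds open-mesh index arrays for the cross product of the
# given index lists: rows {0, 2} x columns {1, 3}.
print(a[np.ix_([0, 2], [1, 3])])
# [[ 1  3]
#  [ 9 11]]

# Without ix_, the same index lists pair up elementwise instead:
print(a[[0, 2], [1, 3]])  # [ 1 11]
```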