Why tuple convention in function parameters? - python

I was wondering: why do many functions - especially in numpy - take tuples as function arguments?
e.g.:
a = numpy.ones( (10, 5) )
What could possibly be the use for that? Why not simply have something such as the following, since clearly the first parameters will always denote the size of the array?
a = numpy.ones(10, 5)
Is it because there might be additional parameters, such as dtype? Even if so,
a = numpy.ones(10, 5, dtype=numpy.int)
seems much cleaner to me than using the convoluted tuple convention.
Thanks for your replies

Because you want to be able to do:
a = numpy.ones(other_array.shape)
and other_array.shape is a tuple. There are a few functions that are not consistent with this and work as you've described, e.g. numpy.random.rand()
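A small sketch of both conventions side by side (numpy.random.rand really does take separate dimensions):
import numpy
other_array = numpy.zeros((10, 5))
# The tuple convention lets a shape flow from one call to the next unchanged.
a = numpy.ones(other_array.shape)          # shape is the tuple (10, 5)
# numpy.random.rand is one of the inconsistent exceptions: it takes separate
# dimensions, so a stored shape has to be unpacked with *.
b = numpy.random.rand(*other_array.shape)
print(a.shape == b.shape)                  # True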

I think one of the benefits of this is consistency between the various methods. I'm not that familiar with numpy, but the first use case that comes to mind is this: if numpy can return the size of an array as one variable, that variable can be passed directly to another numpy method, without the caller having to know anything about how that size object is built.
The other part of it is that the size of an array may have two components, but it is discussed as one value, not as two.

My guess: this is because in functions like np.ones, shape is a single value, so it can also be passed as a keyword argument. Try
np.ones(dtype=int, shape=(2, 3))
and notice that you get the same value as you would have gotten from np.ones((2, 3), dtype=int).
[This works in Python more generally:
>>> def f(a, b):
...     return a + b
...
>>> f(b="foo", a="bar")
'barfoo'
]

In order for Python to tell the difference between foo(1, 2), foo(1, dtype='int') and foo(1, 2, dtype='int') you would have to use keyword-only arguments, which weren't formally introduced until Python 3. It is possible to use **kwargs to implement keyword-only arguments in Python 2.x, but it's unnatural and does not seem Pythonic. I think that is why array does not allow array(1, 2), while reshape(1, 2) is okay: reshape never binds a positional argument to its order keyword.
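For illustration, a sketch of that keyword-only approach with a hypothetical ones_like_api (not NumPy's actual signature):
# Python 3: dtype is keyword-only because *shape slurps all positional arguments.
def ones_like_api(*shape, dtype=float):
    return shape, dtype

print(ones_like_api(10, 5))              # ((10, 5), <class 'float'>)
print(ones_like_api(10, 5, dtype=int))   # ((10, 5), <class 'int'>)

# The Python 2 workaround the answer mentions: manual **kwargs handling.
def ones_like_api_py2(*shape, **kwargs):
    dtype = kwargs.pop('dtype', float)
    if kwargs:
        raise TypeError('unexpected keyword arguments: %r' % kwargs)
    return shape, dtype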

Related

Passing new shape to `np.reshape`

Within numpy.ndarray.reshape, the shape parameter is an int or tuple of ints, and
The new shape should be compatible with the original shape. If an
integer, then the result will be a 1-D array of that length.
The documentation signature is just:
# Note this question doesn't apply to the function version, `np.reshape`
np.ndarray.reshape(shape, order='C')
In practice the specification doesn't seem to be this strict. From the description above I would expect to need to use:
import numpy as np
a = np.arange(12)
b = a.reshape((4,3)) # (4,3) is passed to `newshape`
But instead I can get away with just:
c = a.reshape(4,3) # Seems like just 4 would be passed to `newshape`
# and 3 would be passed to next parameter, `order`
print(np.array_equal(b,c))
# True
How is it that I can do this? I know that if I simply enter 2, 3 into a Python shell, it is technically a tuple whether I use parentheses or not. But the comparison above seems to violate the basic rules of how positional arguments are bound to parameters. I.e.:
def f(a, b=1, order='c'):
    print(a)
    print(b)

f((4,3))
print()
f(4,3)
# (4, 3)
# 1
#
# 4
# 3
...and there are no star operators in reshape. (Something akin to def f(*a, order='c') above.)
With the way that parameters are bound with normal Python methods, it should not work, but the method is not a Python method at all. Numpy is an extension module for CPython, and numpy.ndarray.reshape is actually implemented in C.
If you look at the implementation, the order parameter is only ever read as a keyword argument. A positional argument will never be bound to it, unlike with a normal Python method where the second positional argument would be bound to order. The C code tries to build the value for newshape from all of the positional arguments.
There's nothing magic going on. The function's signature just doesn't match the documentation. It's documented as
ndarray.reshape(shape, order='C')
but it's written in C, and instead of doing the C-api equivalent of
def reshape(self, shape, order='C'):
it does the C-api equivalent of manual *args and **kwargs handling. You can take a look in numpy/core/src/multiarray/methods.c. (Note that the C-api equivalent of def reshape(self, shape, order='C'): would have the same C-level signature as what the current code is doing, but it would immediately use something like PyArg_ParseTupleAndKeywords to parse the arguments instead of doing manual handling.)

What to pass when passing arguments where a list or tuple is required?

Which of the following should I use and why?
import numpy as np
a = np.zeros([2, 3])
b = np.zeros((2, 3))
There are many cases where you can pass arguments in either way, I just wonder if one is more Pythonic or if there are other reasons where one should be preferred over the other.
I looked at this question where people tried to explain the difference between a tuple and a list. That's not what I'm interested in, unless there are reasons I should care that I'm not aware of, of course!
UPDATE:
Although numpy was used as an example this pertains generally to python. A non numpy example is as follows:
a = max([1, 2, 3, 5, 4])
b = max((1, 2, 3, 5, 4))
I'm not editing the above because some answers use numpy in their explanation
I'm answering this in the context of passing a literal iterable to a constructor or function where, beyond that, the type does not matter. If you need to pass in a hashable argument, you need a tuple. If you'll need it mutated, pass in a list (so that you don't end up adding tuples to tuples and thereby multiplying the creation of objects).
The answer to your question is that the better option varies situationally. Here's the tradeoffs.
Starting with the list type: because it is mutable, it preallocates memory for future extension:
a = np.zeros([2, 3])
Pro: It's easily readable.
Con: It wastes memory, and it's less performant.
Next, the tuple type, which is immutable. It doesn't need to preallocate memory for future extension, because it can't be extended.
b = np.zeros((2, 3))
Pro: It uses minimal memory, and it's more performant.
Con: It's a little less readable.
My preference is to pass tuple literals where memory is a consideration, for example in long-running scripts that will be used by lots of people. On the other hand, when I'm using an interactive interpreter, I prefer to pass lists because they're a bit more readable: the contrast between the square brackets and the parentheses makes for easy visual parsing.
You should only care about performance in a function, where the code is compiled to bytecode:
>>> min(timeit.repeat('foo()', 'def foo(): return (0, 1)'))
0.080030765042010898
>>> min(timeit.repeat('foo()', 'def foo(): return [0, 1]'))
0.17389221549683498
Finally, note that the performance consideration will be dwarfed by other considerations. You use Python for speed of development, not for speed of algorithmic implementation. If you use a bad algorithm, your performance will be much worse. Python is also quite performant in many respects. I consider this only important insofar as it scales, i.e., if it can keep heavily used processes from dying a death of a thousand cuts.
If the number of items is known at design time (e.g. coordinates, colour systems), then I would go with tuples; otherwise, go with lists.
If I am writing an interface, my code will tend to just check for whether the argument is iterable or is a sequence (rather than checking for a specific type, unless the interface needs a specific type). I use the collections module to do my checks - it feels cleaner than checking for particular attributes.
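A minimal sketch of that interface-style check (takes_a_shape is a hypothetical name; on Python 2 the ABCs live directly in collections rather than collections.abc):
from collections.abc import Sequence

def takes_a_shape(shape):
    # Accept anything sequence-like rather than insisting on a tuple or a list.
    if not isinstance(shape, Sequence):
        raise TypeError("shape must be a sequence, got %r" % type(shape))
    return tuple(shape)

print(takes_a_shape([2, 3]))   # (2, 3)
print(takes_a_shape((2, 3)))   # (2, 3)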

Unpacking arguments: only named arguments may follow *expression

The following works beautifully in Python:
def f(x,y,z): return [x,y,z]
a=[1,2]
f(3,*a)
The elements of a get unpacked as if you had called it like f(3,1,2) and it returns [3,1,2]. Wonderful!
But I can't unpack the elements of a into the first two arguments:
f(*a,3)
Instead of calling that like f(1,2,3), I get "SyntaxError: only named arguments may follow *expression".
I'm just wondering why it has to be that way and if there's any clever trick I might not be aware of for unpacking arrays into arbitrary parts of argument lists without resorting to temporary variables.
As Raymond Hettinger's answer points out, this has since changed in Python 3, and there is a related proposal, which has been accepted.
Especially related to the current question, here's one of the possible changes to that proposal that was discussed:
Only allow a starred expression as the last item in the exprlist. This would simplify the
unpacking code a bit and allow for the starred expression to be assigned an iterator. This
behavior was rejected because it would be too surprising.
So there are implementation reasons for the restriction with unpacking function arguments but it is indeed a little surprising!
In the meantime, here's the workaround I was looking for, kind of obvious in retrospect:
f(*(a+[3]))
It doesn't have to be that way. It was just a rule that Guido found to be sensible.
In Python 3, the rules for unpacking have been liberalized somewhat:
>>> a, *b, c = range(10)
>>> a
0
>>> b
[1, 2, 3, 4, 5, 6, 7, 8]
>>> c
9
Depending on whether Guido feels it would improve the language, that liberalization could also be extended to function arguments.
See the discussion on extended iterable unpacking for some thoughts on why Python 3 changed the rules.
Thanks to PEP 448 (Additional Unpacking Generalizations),
f(*a, 3)
is now accepted syntax starting from Python 3.5. Likewise you can use the double-star ** for keyword argument unpacking anywhere and either one can be used multiple times.
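For example (Python 3.5+, with f as defined in the question):
def f(x, y, z):
    return [x, y, z]

a = [1, 2]
print(f(*a, 3))          # [1, 2, 3] -- unpacking no longer has to come last
print(f(*[1], *[2], 3))  # [1, 2, 3] -- multiple unpackings in one call

d = {'y': 2, 'z': 3}
print(f(1, **d))         # [1, 2, 3] -- ** works the same way for keywords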
f is expecting 3 arguments (x, y, z, in that order).
Suppose L = [1,2]. When you call f(3, *L), what Python does behind the scenes is call f(3, 1, 2), without really knowing the length of L.
So what happens if L was instead [1,2,3]?
Then, when you call f(3, *L), you'll end up calling f(3,1,2,3), which will be an error because f is expecting exactly 3 arguments and you gave it 4.
Now, suppose L = [1, 2]. Look at what happens when you call f:
>>> f(3,*L) # works fine
>>> f(*L) # will give you an error when f(1,2) is called; insufficient arguments
Now, you implicitly know when you call f(*L, 3) that 3 will be assigned to z, but Python doesn't know that. It only knows that the leading arguments to f will be defined by the contents of L. But since it makes no assumptions about the value of len(L), it can't tell whether f(*L, 3) would have the correct number of arguments, or which parameter the trailing 3 should bind to.
This however, is not the case with f(3,*L). In this case, python knows that all the arguments EXCEPT the first one will be defined by the contents of L.
But if you have named arguments f(x=1, y=2, z=3), then the arguments being assigned to by name will be bound first. Only then are the positional arguments bound. So you do f(*L, z=3). In that case, z is bound to 3 first, and then, the other values get bound.
Now interestingly, if you did f(*L, y=3), that would give you an error for trying to assign to y twice (once with the keyword, once again with the positional)
Hope this helps
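To make that concrete:
def f(x, y, z):
    return (x, y, z)

L = [1, 2]
print(f(*L, z=3))      # (1, 2, 3): z is bound by keyword, x and y come from L

try:
    f(*L, y=3)
except TypeError as exc:
    print(exc)          # ... got multiple values for argument 'y'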
You can use f(*a, z=3). If you write f(*a, 3), Python does not know how to unpack the arguments: you provided two values in a, and it cannot tell which parameter the trailing 3 should bind to.
Nice. This also works for tuples. Don't forget the comma:
a = (1,2)
f(*(a+(3,)))

Python-numpy test for ndarray using ndim

I'm working on a project in Python requiring a lot of numerical array calculations. Unfortunately (or fortunately, depending on your POV), I'm very new to Python, but have been doing MATLAB and Octave programming (APL before that) for years. I'm very used to having every variable automatically typed to a matrix float, and still getting used to checking input types.
In many of my functions, I require the input S to be a numpy.ndarray of size (n,p), so I have to both test that type(S) is numpy.ndarray and get the values (n,p) = numpy.shape(S). One potential problem is that the input could be a list/tuple/int/etc...; another problem is that the input could be an array of shape (): S.ndim = 0. It occurred to me that I could simultaneously test the variable type, fix the S.ndim = 0 problem, then get my dimensions like this:
# first simultaneously test for ndarray and get proper dimensions
try:
    if (S.ndim == 0):
        S = S.copy(); S.shape = (1,1);
    # define dimensions p, and p2
    (p,p2) = numpy.shape(S);
except AttributeError:  # got here because input is not something array-like
    raise AttributeError("blah blah blah");
Though it works, I'm wondering if this is a valid thing to do? The docstring for ndim says
If it is not already an ndarray, a conversion is
attempted.
and we surely know that numpy can easily convert an int/tuple/list to an array, so I'm confused why an AttributeError is being raised for these types of inputs, when numpy should be doing this
numpy.array(S).ndim;
which should work.
When doing input validation for NumPy code, I always use np.asarray:
>>> np.asarray(np.array([1,2,3]))
array([1, 2, 3])
>>> np.asarray([1,2,3])
array([1, 2, 3])
>>> np.asarray((1,2,3))
array([1, 2, 3])
>>> np.asarray(1)
array(1)
>>> np.asarray(1).shape
()
This function has the nice feature that it only copies data when necessary; if the input is already an ndarray, the data is left in-place (only the type may be changed, because it also gets rid of that pesky np.matrix).
The docstring for ndim says
That's the docstring for the function np.ndim, not the ndim attribute, which non-NumPy objects don't have. You could use that function, but the effect would be that the data might be copied twice, so instead do:
S = np.asarray(S)
(p, p2) = S.shape
This will raise a ValueError if S.ndim != 2.
[Final note: you don't need ; in Python if you just follow the indentation rules. In fact, Python programmers eschew the semicolon.]
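Putting that together, a minimal sketch of the validation pattern described above (validate_2d is just an illustrative name):
import numpy as np

def validate_2d(S):
    S = np.asarray(S)     # copies only if S isn't already an ndarray
    (p, p2) = S.shape     # raises ValueError unless S.ndim == 2
    return S, p, p2

S, p, p2 = validate_2d([[1, 2, 3], [4, 5, 6]])
print(p, p2)              # 2 3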
Given the comments to #larsmans answer, you could try:
if not isinstance(S, np.ndarray):
    raise TypeError("Input not a ndarray")
if S.ndim == 0:
    S = np.reshape(S, (1,1))
(p, p2) = S.shape
First, you check explicitly whether S is a (subclass of) ndarray. Then, you use np.reshape to copy (and, of course, reshape) your data if needed. Finally, you get the dimensions.
Note that in most cases, the np functions will first try to access the corresponding method of an ndarray, then attempt to convert the input to an ndarray (sometimes keeping it a subclass, as in np.asanyarray, sometimes not, as in np.asarray). In other words, it's generally more efficient to use the method rather than the function: that's why we're using S.shape and not np.shape(S).
Another point: np.asarray, np.asanyarray, np.atleast_1d... are all particular cases of the more generic function np.array. For example, asarray sets the optional copy argument of array to False; asanyarray does the same and sets subok=True; atleast_1d sets ndmin=1; atleast_2d sets ndmin=2... In other words, it's always possible to use np.array directly with the appropriate arguments. But as mentioned in some comments, it's a matter of style. Shortcuts can often improve readability, which is always an objective to keep in mind.
In any case, when you use np.array(..., copy=True), you're explicitly asking for a copy of your initial data, a bit like doing a list([....]). Even if nothing else changed, your data will be copied. That has the advantages of its drawbacks (as we say in French), you could for example change the order from row-first C to column-first F. But anyway, you get the copy you wanted.
With np.array(input, copy=False), a new array is not necessarily created. The result will either reuse the same block of memory as input if the latter was already an ndarray (that is, no waste of memory), or be built "from scratch" if input wasn't. The interesting case is of course when input was an ndarray.
Using the resulting array in a function may or may not change the original input, depending on the function. You have to check the documentation of the function you want to use to see whether it returns a copy or not. The NumPy developers try hard to limit unnecessary copies (following the Python example), but sometimes it can't be avoided. The documentation should tell explicitly what happens; if it doesn't, or it's unclear, please mention it.
np.array(...) may raise some exceptions if something goes awry. For example, trying to use dtype=float with an input like ["STRING", 1] will raise a ValueError. However, I must admit I can't remember which exceptions are raised in which cases; please edit this post accordingly.
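A few quick checks of those rough equivalences (a sketch, not the exact internals):
import numpy as np

x = np.arange(3)
print(np.asarray(x) is x)                                   # True: no copy made
print(np.shares_memory(np.array(x, copy=False), x))         # True as well
print(np.atleast_1d(5).shape, np.atleast_2d([1, 2, 3]).shape)          # (1,) (1, 3)
print(np.array(5, ndmin=1).shape, np.array([1, 2, 3], ndmin=2).shape)  # (1,) (1, 3)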
Welcome to Stack Overflow. This comes down to almost a style choice, but the most common way I've seen to deal with this kind of situation is to convert the input to an array. NumPy provides some useful tools for this. numpy.asarray has already been mentioned, but here are a few more: numpy.atleast_1d is similar to asarray, but reshapes () arrays to be (1,); numpy.atleast_2d is the same but also reshapes 0d and 1d arrays to be 2d, i.e. (3,) to (1, 3). The reason we convert "array_like" inputs to arrays is partly just because we're lazy (for example, sometimes it can be easier to write foo([1, 2, 3]) than foo(numpy.array([1, 2, 3]))), but this is also the design choice made within NumPy itself. Notice that the following works:
>>> numpy.mean([1., 2., 3.])
2.0
In the docs for numpy.mean we can see that a should be "array_like".
Parameters
----------
a : array_like
Array containing numbers whose mean is desired. If `a` is not an
array, a conversion is attempted.
That being said, there are situations when you want to only accept arrays as arguments and not all "array_like" types.
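For instance, a small sketch of the same array_like pattern in user code (column_means is a hypothetical helper):
import numpy

def column_means(x):
    # Accept lists, tuples, or arrays; promote to at least 2d.
    x = numpy.atleast_2d(numpy.asarray(x, dtype=float))   # (3,) becomes (1, 3)
    return x.mean(axis=0)

print(column_means([1.0, 2.0, 3.0]))    # [1. 2. 3.]
print(column_means([[1, 2], [3, 4]]))   # [2. 3.]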

Semantics of tuple unpacking in python

Why does python only allow named arguments to follow a tuple unpacking expression in a function call?
>>> def f(a,b,c):
...     print a, b, c
...
>>> f(*(1,2),3)
File "<stdin>", line 1
SyntaxError: only named arguments may follow *expression
Is it simply an aesthetic choice, or are there cases where allowing this would lead to some ambiguities?
I am pretty sure that the reason people "naturally" don't like this is that it makes the meaning of later arguments ambiguous, depending on the length of the interpolated sequence:
def dangerbaby(a, b, *c):
    hug(a)
    kill(b)

>>> dangerbaby('puppy', 'bug')
killed bug
>>> cuddles = ['puppy']
>>> dangerbaby(*cuddles, 'bug')
killed bug
>>> cuddles.append('kitten')
>>> dangerbaby(*cuddles, 'bug')
killed kitten
You cannot tell from just looking at the last two calls to dangerbaby which one works as expected and which one kills little kitten fluffykins.
Of course, some of this uncertainty is also present when interpolating at the end, but the confusion is constrained to the interpolated sequence; it doesn't affect other arguments, like bug.
[I made a quick search to see if I could find anything official. It seems that the * prefix for varargs was introduced in Python 0.9.8. The previous syntax is discussed here, and the rules for how it worked were rather complex. Since the addition of extra arguments "had to" happen at the end when there was no * marker, it seems that simply carried over. Finally, there's a mention here of a long discussion on argument lists that was not by email.]
I suspect that it's for consistency with the star notation in function definitions, which is after all the model for the star notation in function calls.
In the following definition, the parameter *c will slurp all subsequent non-keyword arguments, so obviously when f is called, the only way to pass a value for d will be as a keyword argument.
def f(a, b, *c, d=1):
    print("slurped", len(c))
(Such "keyword-only parameters" are only supported in Python 3. In Python 2 there is no way to assign values after a starred argument, so the above is illegal.)
So, in a function definition the starred argument must follow all ordinary positional arguments. What you observed is that the same rule has been extended to function calls. This way, the star syntax is consistent for function declarations and function calls.
Another parallelism is that you can only have one (single-)starred argument in a function call (a restriction later lifted by PEP 448 in Python 3.5). The following is illegal, though one could easily imagine it being allowed.
f(*(1,2), *(3,4))
First of all, it is simple to provide a very similar interface yourself using a wrapper function:
def applylast(func, arglist, *literalargs):
    return func(*(literalargs + arglist))
applylast(f, (1, 2), 3) # equivalent to f(3, 1, 2)
Secondly, enhancing the interpreter to support your syntax natively might add overhead to the very performance-critical activity of function application. Even if it only requires a few extra instructions in compiled code, due to the high usage of those routines, that might constitute an unacceptable performance penalty in exchange for a feature that is not called for all that often and easily accommodated in a user library.
Some observations:
Python processes positional arguments before keyword arguments (f(c=3, *(1, 2)) in your example still prints 1 2 3). This makes sense as (i) most arguments in function calls are positional and (ii) the semantics of a programming language need to be unambiguous (i.e., a choice needs to be made either way on the order in which to process positional and keyword arguments).
If we did have a positional argument to the right in a function call, it would be difficult to define what that means. If we call f(*(1, 2), 3), should that be f(1, 2, 3) or f(3, 1, 2) and why would either choice make more sense than the other?
For an official explanation, PEP 3102 provides a lot of insight on how function definitions work. The star (*) in a function definition indicates the end of positional arguments (section Specification). To see why, consider: def g(a, b, *c, d). There's no way to provide a value for d other than as a keyword argument (positional arguments would be 'grabbed' by c).
It's important to realize what this means: as the star marks the end of positional arguments, that means all positional arguments must be in that position or to the left of it.
Change the order:
def f(c,a,b):
    print(a,b,c)
f(3,*(1,2))
If you have a Python 3 keyword-only parameter, like
def f(*a, b=1):
    ...
then you might expect something like f(*(1, 2), 3) to set a to (1 , 2) and b to 3, but of course, even if the syntax you want were allowed, it would not, because keyword-only parameters must be keyword-only, like f(*(1, 2), b=3). If it were allowed, I suppose it would have to set a to (1, 2, 3) and leave b as the default 1. So it's perhaps not syntactic ambiguity so much as ambiguity in what is expected, which is something Python greatly tries to avoid.
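For reference, here is how the allowed spellings behave:
def f(*a, b=1):
    return a, b

print(f(*(1, 2), b=3))   # ((1, 2), 3): b must be passed by name
print(f(1, 2, 3))        # ((1, 2, 3), 1): the 3 joins a; b keeps its default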
