I've been using scientific Python (via from pylab import *) on and off for a while as a free Matlab substitute, mainly for work in chemical engineering (I'm a fan of the IPython Notebook). Something that's always seemed strange to me is that there are generally two different ways to interact with an object. For instance, if I have an array, I can tell its dimensions in two ways:
a = array([[1,2,3],[2,3,4]])
There's the 'Matlab' way:
shape(a)
Or instead I could find it by typing:
a.shape
This seems to contradict The Zen of Python: "There should be one-- and preferably only one --obvious way to do it"
I'm just wondering why there's multiple ways of doing the same thing, and which practice is more fundamental/natural for the language and would be better to use in the long run.
Using the attribute is preferable. After all, the implementation of the shape function simply defers to the attribute anyway (from numpy/core/fromnumeric.py):
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
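To see the practical difference: the function form also accepts anything array_like, while the attribute only exists on an actual ndarray (a small sketch; the nested list is just illustrative):

import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])
print(a.shape)       # (2, 3) -- attribute on the ndarray
print(np.shape(a))   # (2, 3) -- function defers to the attribute

# The function also handles plain "array_like" input...
print(np.shape([[1, 2, 3], [2, 3, 4]]))   # (2, 3)
# ...whereas a list has no .shape attribute:
# [[1, 2, 3], [2, 3, 4]].shape  -> AttributeError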
I assume a lot of this pylab stuff is just included to help ease the transition for people coming from MATLAB. Get used to it because there are many more examples of numpy being, ahem, not very pythonic.
When you get more used to Python and matplotlib, you will likely want to ditch the from pylab import * anyway and start writing more numpythonic code rather than MATLAB-style code.
It mostly comes down to a matter of preference, but there are a few differences that you might want to be aware of. First off, you should be using numpy.shape(a) or np.shape(a) instead of a bare shape(a), because "Namespaces are one honking great idea -- let's do more of those!" Numpy also has several names you'll find in other Python modules, e.g. array appears as array.array in the Python standard library, as numpy.array and as numpy.ma.array, so to avoid confusing others (and yourself) just avoid importing the entire numpy namespace.
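For example, keeping each package behind an explicit prefix makes it clear which array you mean (a small illustrative sketch):

import array               # stdlib array module
import numpy as np         # the conventional numpy alias
import numpy.ma as ma      # numpy masked arrays

a = array.array('i', [1, 2, 3])           # stdlib array.array
b = np.array([1, 2, 3])                   # numpy ndarray
c = ma.array([1, 2, 3], mask=[0, 1, 0])   # numpy masked array

print(type(a), type(b), type(c))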
Other than that, it turns out that numpy.shape, and most other similar functions, just look for a shape attribute on the argument, and if they don't find one, they try to convert the argument to an array. Here is the code:
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
This can be useful if you want the shape of an "array_like" object; you'll notice that most numpy functions take "array_like" arguments. But it can be slow if you're doing something like:
shape = np.shape(list_of_lists)
mx = np.max(list_of_lists)
mn = np.min(list_of_lists)
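Each of those calls converts list_of_lists to an array all over again. If you need several results from the same array_like object, converting it once up front avoids the repeated conversions (a small sketch; the data is just illustrative):

import numpy as np

list_of_lists = [[3, 1, 4], [1, 5, 9]]

arr = np.asarray(list_of_lists)   # convert once
shape = arr.shape
mx = arr.max()
mn = arr.min()

print(shape, mx, mn)   # (2, 3) 9 1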
Other than that, they're pretty much the same.
I would like to know why these two "programs" produce different output
f(x)=x^2
f(90).mod(7)
and
def f(x):
    return(x^2)
f(90).mod(7)
Thanks
Great question! Let's take a deeper look at the functions in question.
f(x) = x^2

def g(x):
    return(x^2)

print type(g(90))
print type(f(90))
This yields
<type 'sage.rings.integer.Integer'>
<type 'sage.symbolic.expression.Expression'>
So what you are seeing is the difference between a symbolic function defined with the f(x) notation and a Python function using the def keyword. In Sage, the former has access to a lot of stuff (e.g. calculus) that plain old Sage integers won't have.
What I would recommend in this case, just for what you need, is
sage: a = f(90)
sage: ZZ(a).mod(7)
1
or actually the possibly more robust
sage: mod(a,7)
1
Longer explanation.
For symbolic stuff, mod isn't what you think. In fact, I'm not sure it will do anything (see the documentation for mod to see how to use it for polynomial modular work over ideals, though). Here's the code (accessible with x.mod??, documentation accessible with x.mod?):
from sage.rings.ideal import is_Ideal
if not is_Ideal(I) or not I.ring() is self._parent:
    I = self._parent.ideal(I)
    #raise TypeError, "I = %s must be an ideal in %s"%(I, self.parent())
return I.reduce(self)
And it turns out that for generic rings (like the symbolic 'ring'), nothing happens in that last step:
return f
This is why we need to, one way or another, ask it to be an integer again. See Trac 27401.
I've tried searching quite a lot on this one, but being relatively new to python I feel I am missing the required terminology to find what I'm looking for.
I have a function:
def my_function(x, y):
    # code...
    return (a, b, c)
Where x and y are numpy arrays of length 2000 and the return values are integers. I'm looking for a shorthand (one-liner) to loop over this function as such:
Output = [my_function(X[i],Y[i]) for i in range(len(Y))]
Where X and Y are of the shape (135,2000). However, after running this I am currently having to do the following to separate out 'Output' into three numpy arrays.
Output = np.asarray(Output)
a = Output.T[0]
b = Output.T[1]
c = Output.T[2]
Which I feel isn't the best practice. I have tried:
(a,b,c) = [my_function(X[i],Y[i]) for i in range(len(Y))]
But this doesn't seem to work. Does anyone know a quick way around my problem?
my_function(X[i], Y[i]) for i in range(len(Y))
On the verge of crossing the "opinion-based" border, ...Y[i]... for i in range(len(Y)) is usually a big no-no in Python, and it is an even bigger no-no when working with numpy arrays. One of the advantages of working with numpy is the 'vectorization' it provides, which pushes the for loop down to the C level rather than the (slower) Python level.
So, if you rewrite my_function so it can handle the arrays in a vectorized fashion using the multiple tools and methods that numpy provides, you may not even need that "one-liner" you are looking for.
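That said, if you do keep the loop, you can unpack the list of 3-tuples directly instead of slicing the transposed array afterwards; zip(*...) regroups the tuples into three sequences (a sketch; my_function here is just a stand-in that returns three integers):

import numpy as np

def my_function(x, y):
    # stand-in for the real function
    return int(x.argmax()), int(y.argmax()), int(len(x))

X = np.random.rand(135, 2000)
Y = np.random.rand(135, 2000)

results = [my_function(X[i], Y[i]) for i in range(len(Y))]
a, b, c = map(np.asarray, zip(*results))   # three length-135 arrays

print(a.shape, b.shape, c.shape)   # (135,) (135,) (135,)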
Numpy provides both np.absolute and the alias np.abs defined via
from .numeric import absolute as abs
which seems to be in obvious violation of the zen of python:
There should be one-- and preferably only one --obvious way to do it.
So I'm guessing that there is a good reason for this.
I have personally been using np.abs in almost all of my code and looking at e.g. the number of search results for np.abs vs np.absolute on Stack Overflow it seems like an overwhelming majority does the same (2130 vs 244 hits).
Is there any reason I should preferentially use np.absolute over np.abs in my code, or should I simply go for the more "standard" np.abs?
It's likely because there is a built-in function with the same name, abs. The same is true for np.amax, np.amin and np.round_.
The aliases for the NumPy functions abs, min, max and round are only defined in the top-level package.
So np.abs and np.absolute are completely identical. It doesn't matter which one you use.
There are several advantages to the short names: They are shorter and they are known to Python programmers because the names are identical to the built-in Python functions. So end-users have it easier (less to type, less to remember).
But there are reasons to have different names too: NumPy (or more generally 3rd party packages) sometimes need the Python functions abs, min, etc. So inside the package they define functions with a different name so you can still access the Python functions - and just in the top-level of the package you expose the "shortcuts". Note: Different names are not the only available option in that case: One could work around that with the Python module builtins to access the built-in functions if one shadowed a built-in name.
It might also be the case (but that's pure speculation on my part) that they originally only included the long-named functions absolute (and so on) and only added the short aliases later. Being a large and well-used library the NumPy developers don't remove or deprecate stuff lightly. So they may just keep the long names around because it could break old code/scripts if they would remove them.
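You can check the aliasing directly (a quick sanity check, assuming a NumPy version where abs is defined as an alias of absolute, as in the import quoted above):

import numpy as np

print(np.abs is np.absolute)           # True: one function object, two names
print(np.abs(np.array([-1, 2, -3])))   # [1 2 3]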
There also is Python's built-in abs(), but really all those functions are doing the same thing. They're even exactly equally fast! (This is not the case for other functions, like max().)
Code to reproduce the plot:
import numpy as np
import perfplot

def np_absolute(x):
    return np.absolute(x)

def np_abs(x):
    return np.abs(x)

def builtin_abs(x):
    return abs(x)

b = perfplot.bench(
    setup=np.random.rand,
    kernels=[np_abs, np_absolute, builtin_abs],
    n_range=[2 ** k for k in range(25)],
    xlabel="len(data)",
)
b.save("out.png")
b.show()
I am vectorizing a test in Numpy for the following idea: perform some test elementwise and pick expr1 or expr2 according to the test. This is like the ternary operator in C: test ? expr1 : expr2
I see two major ways for performing that; I would like to know if there is a good reason to choose one rather than the other one; maybe also other tricks are available and I would be very happy to know about them. Main goal is speed; for that reason I don't want to use np.vectorize with an if-else statement.
For my example, I will re-build the min function; please, don't tell me about some Numpy function for computing that; this is a mere example!
Idea 1: Use the arithmetic value of the booleans in a multiplication:
# a and b have similar shape
test = a < b
ntest = np.logical_not(test)
out = test*a + ntest*b
Idea 2: More or less following the APL/J style of coding (by using the conditional expression as an index for an array made with one dimension more than initial arrays).
# a and b have similar shape
np.choose(a<b, np.array([b,a]))
This is a better way to use choose:
np.choose(a<b, [b,a])
In my small timings it is faster. Also, the choose docs say "If choices is itself an array (not recommended), ...".
(a<b).choose([b,a])
saves one level of function redirection.
Another option:
out = b.copy(); out[test] = a[test]
In quick tests this is actually faster. masked.filled uses np.copyto for this sort of 'where' copy, though copyto doesn't seem to be any faster.
A variation on the choose is where:
np.where(test,a,b)
Or use where (or np.nonzero) to convert boolean index to a numeric one:
I = np.where(test); out = b.copy(); out[I] = a[I]
For some reason this times faster than the single-call where.
I've used the multiplication approach in the past, if I recall correctly even with APL (though that's decades ago). An old trick to avoid divide-by-zero was to add b==0 to the denominator: a/(b+(b==0)). But it's not as generally applicable: a*0 and a*1 have to make sense.
choose looks nice, but with the mode parameter it may be more powerful (and hence complicated) than needed.
I'm not sure there is a 'best' way. Timing tests can evaluate certain situations, but I don't know whether they can be generalized across all cases.
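If you want to run that kind of comparison yourself, a minimal timing sketch along these lines can be adapted to your real arrays (sizes and data here are illustrative):

import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)
test = a < b

def copy_index():
    out = b.copy()
    out[test] = a[test]
    return out

candidates = {
    "multiply":   lambda: test * a + np.logical_not(test) * b,
    "where":      lambda: np.where(test, a, b),
    "choose":     lambda: (a < b).choose([b, a]),
    "copy+index": copy_index,
}

for name, fn in candidates.items():
    print(f"{name:12s} {timeit.timeit(fn, number=200):.4f} s")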
I'm trying to subclass numpy.complex64 in order to make use of the way numpy stores the data, (contiguous, alternating real and imaginary part) but use my own __add__, __sub__, ... routines.
My problem is that when I make a numpy.ndarray with dtype=mysubclass, I get a numpy.ndarray with dtype='numpy.complex64' instead, which results in numpy not using my own functions for additions, subtractions and so on.
Example:
import numpy as np

class mysubclass(np.complex64):
    pass

a = mysubclass(1 + 1j)
A = np.empty(2, dtype=mysubclass)

print type(a)
print repr(A)
Output:
<class '__main__.mysubclass'>
array([ -2.07782988e-20 +4.58546896e-41j, -2.07782988e-20 +4.58546896e-41j], dtype=complex64)
Does anyone know how to do this?
Thanks in advance - Soren
The NumPy type system is only designed to be extended from C, via the PyArray_RegisterDataType function. It may be possible to access this functionality from Python using ctypes, but I wouldn't recommend it; better to write an extension in C or Cython, or subclass ndarray as @seberg describes.
There's a simple example dtype in the NumPy source tree: newdtype_example/floatint.c. If you're into Pyrex, reference.pyx in the pytables source may be worth a look.
Note that scalars and arrays are quite different in numpy, and np.complex64 is a scalar type (made up of two 32-bit floats, just to note, not double precision). You will not be able to change the array like that; you will need to subclass the array instead and then override its __add__ and __sub__.
If that is all you want to do, it should just work; otherwise look at http://docs.scipy.org/doc/numpy/user/basics.subclassing.html, since subclassing an array is not that simple.
However, if you also want to use this type as a scalar (for example when you index scalars out), it gets more difficult, at least currently. You can get a little further by defining __array_wrap__ to convert scalars to your own scalar type for some reduce functions, but for indexing to work in all cases it appears to me that you may have to define your own __getitem__ currently.
In all cases with this approach, you still use the complex datatype, and all functions that are not explicitly overridden will still behave the same. @ecatmur mentioned that you can create new datatypes from the C side, if that is really what you want.
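A minimal sketch of the subclass-the-array route (the class name and the printed message are just illustrative; see the subclassing guide linked above for the full __array_finalize__ machinery):

import numpy as np

class MyComplexArray(np.ndarray):
    """Keeps complex64 storage but overrides the arithmetic it cares about."""

    def __new__(cls, input_array):
        # view-cast any array_like to this subclass, forcing complex64 storage
        return np.asarray(input_array, dtype=np.complex64).view(cls)

    def __add__(self, other):
        # stand-in for your own addition routine
        print("custom __add__ called")
        return np.ndarray.__add__(self, other)

a = MyComplexArray([1 + 1j, 2 + 2j])
b = MyComplexArray([3 + 3j, 4 + 4j])
print(repr(a + b))   # dispatches to the overridden __add__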