numpy ndarrays: row-wise and column-wise operations - python

If I want to apply a function row-wise (or column-wise) to an ndarray, should I look to ufuncs (it doesn't seem like it) or to some type of array broadcasting (not what I'm looking for either)?
Edit
I am looking for something like R's apply function. For instance,
apply(X,1,function(x) x*2)
would multiply each row of X by 2 through an anonymously defined function, though it could also be a named function. (This is of course a silly, contrived example in which apply is not actually needed.) Is there no generic way to apply a function across a NumPy array's "axis"?

First off, many numpy functions take an axis argument. It's probably possible (and better) to do what you want with that sort of approach.
However, a generic "apply this function row-wise" approach would look something like this:
import numpy as np
def rowwise(func):
    def new_func(array2d, **kwargs):
        # Run the function once to determine the size of the output
        val = func(array2d[0], **kwargs)
        output_array = np.zeros((array2d.shape[0], val.size), dtype=val.dtype)
        output_array[0] = val
        for i,row in enumerate(array2d[1:], start=1):
            output_array[i] = func(row, **kwargs)
        return output_array
    return new_func
@rowwise
def test(data):
    return np.cumsum(data)
x = np.arange(20).reshape((4,5))
print(test(x))
Keep in mind that we can do exactly the same thing with just:
np.cumsum(x, axis=1)
There's often a better way than the generic approach, especially with numpy.
Edit:
I completely forgot about it, but the above is essentially equivalent to numpy.apply_along_axis.
So, we could re-write that as:
import numpy as np
def test(row):
    return np.cumsum(row)
x = np.arange(20).reshape((4,5))
print(np.apply_along_axis(test, 1, x))
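Since the question also asks about column-wise application, here is a minimal sketch of np.apply_along_axis used along both axes. The helper name spread is hypothetical and stands in for any function that takes a 1-D array; here it returns a scalar, so each row or column collapses to one number.

```python
import numpy as np

x = np.arange(20).reshape((4, 5))

def spread(values):
    # Receives one 1-D slice at a time and returns a scalar.
    return values.max() - values.min()

# axis=1: the function sees each row (length 5), giving one value per row.
per_row = np.apply_along_axis(spread, 1, x)

# axis=0: the function sees each column (length 4), giving one value per column.
per_col = np.apply_along_axis(spread, 0, x)

# The same results using the axis argument directly, which avoids the Python-level loop.
per_row_direct = x.max(axis=1) - x.min(axis=1)
per_col_direct = x.max(axis=0) - x.min(axis=0)
```

As with cumsum, the axis-argument version is usually preferable when one exists; apply_along_axis is mainly useful when the function only accepts 1-D input.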

Related

Pandas apply alternative

I am looking to use option #2 to get the result of option #1.
import numpy as np
import pandas as pd
df=pd.DataFrame(np.arange(50), columns=['A'])
def test(x):
    v=30
    if v>x:
        return(x)
#option 1
df['A'].apply(lambda x: test(x))
#option 2
test(df['A'])
The error message that I get when I run your code says:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
The problem is that a is an array and v is a single value, so the comparison produces one truth value per element rather than a single truth value. If your intention is to check whether v is greater than all numbers in a, use np.all(v>a). If you want to check whether v is greater than just some of them, use np.any(v>a).
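A small sketch of the behaviour described above, using a plain NumPy array in place of the DataFrame column (that substitution is an assumption made to keep the example short). It shows that the comparison yields a boolean array, that using it in an if raises the error, and how np.all and np.any reduce it to a single value.

```python
import numpy as np

a = np.arange(50)
v = 30

# Element-wise comparison: the result is a boolean array, not a single bool.
mask = v > a

# Using the array as an `if` condition is what triggers the ValueError.
try:
    if mask:
        pass
except ValueError as exc:
    message = str(exc)

# Reducing the array to one truth value makes the intent explicit.
greater_than_all = np.all(mask)   # is v greater than every element?
greater_than_some = np.any(mask)  # is v greater than at least one element?
```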
On Edit
You have now edited your question so much that it is now a new question. The entire point of the apply method is that if f is a Python function and v is a numpy array, then f(v) is probably not the array that you would get by applying f to the elements of v. Python is not a language that directly supports vectorized calculations.
The reason that it sometimes seems that computations in numpy or pandas are as easy to vectorize as similar calculations in e.g. R is because of the way Python's duck-typing works. If a class defines the magic method __add__ then you can use + to add elements of that class to each other in any way that you want. This is exactly what the people who created numpy have done (as well as other magic methods for things like *, /, < etc.).
So, if a function definition is something like def f(x): return x*x + 2*x + 3, where all the computational steps correspond to magic methods, then v.apply(f) and f(v) will work the same. Your test function uses the keyword if. There is not a magic method which can convert that part of the core language into something else.

Creating a function in Python which runs over a range and returns a new value to an array each time

Basically, what I'm trying to create is a function which takes an array, in this case:
numpy.linspace(0, 0.2, 100)
and runs a lot of other code for each of the elements in the array and at the end creates a new array with one number for each of the calculations for each element. A simple example would be that the function is doing a multiplication like this:
def func(x):
    y = x * 10
    return (y)
However, I want it to be able to take an array as an argument and return an array consisting of each y for each multiplication. The function above works for this, but the one I've tried creating for my code doesn't work with this method and only returns one value instead. Is there another way to make the function work as intended? Thanks for the help!
You could use this simple code:
def func(x):
    y = []
    for i in x:
        y.append(i*10)
    return y
Maybe take a look at np.vectorize:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.vectorize.html
np.vectorize can for example be used as a decorator:
@np.vectorize
def func(value):
    ...
    return return_value
The function to be vectorized (here func) has to be a function that takes a value as input and returns a value. This function then gets vectorized over the whole array.
It is mentioned in the documentation, but it can't hurt to emphasize it here: in general this function is only used for convenience, not for performance; it is basically equivalent to using a for-loop.
If you are able to build up your function from NumPy's ufuncs and array functions (np.add, np.mean, etc.), this will likely be much faster.
Or you could write your own:
https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
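To make the decorator usage concrete, here is a minimal sketch. The body of func is invented for illustration (the original elides it); it contains an if, which is the kind of scalar-only logic that would not work if the whole array were passed in directly.

```python
import numpy as np

@np.vectorize
def func(value):
    # Scalar-only logic: `value` is a single element here, never the whole array.
    if value < 0.1:
        return value * 10
    return 0.0

x = np.linspace(0, 0.2, 100)
y = func(x)  # one result per element of x
```

This is still a loop under the hood, so it trades speed for convenience, as noted above.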
You can do this with numpy already with your function. For example, the code below will do what you want:
x = numpy.linspace(0, 0.2, 100)
y = x*10
If you defined x as above and passed it to your function it would perform exactly as you want.

Use of numpy fromfunction

I'm trying to use fromfunction to create a 5x5 matrix of Gaussian values with mu=3 and sig=2. This is my attempt:
from random import gauss
import numpy as np
np.fromfunction(lambda i,j: gauss(3,2), (5, 5))
This is the result: 5.365244570434782
As I understand from the docs, this should have worked, but I am getting a scalar instead of a 5x5 matrix. Why, and how do I fix this?
The numpy.fromfunction docs are extremely misleading. Instead of calling your function repeatedly and building an array from the results, fromfunction actually only makes one call to the function you pass it. In that one call, it passes a number of index arrays to your function, instead of individual indices.
Stripping out the docstring, the implementation is as follows:
def fromfunction(function, shape, **kwargs):
    dtype = kwargs.pop('dtype', float)
    args = indices(shape, dtype=dtype)
    return function(*args,**kwargs)
That means unless your function broadcasts, numpy.fromfunction doesn't do anything like what the docs say it does.
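The single-call behaviour can be observed directly. In this sketch, record_and_add is a hypothetical helper that logs the shapes of the arguments it receives and then returns an expression that does broadcast over the index arrays.

```python
import numpy as np

received = []

def record_and_add(i, j):
    # Called exactly once; i and j are full index arrays, not scalar indices.
    received.append((i.shape, j.shape))
    return i + j

grid = np.fromfunction(record_and_add, (2, 3))
```

Because i + j operates element-wise on the index arrays, the result has the requested shape. A function that ignores its arguments and returns a scalar simply hands that scalar back.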
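I know this is an old post, but for anyone stumbling upon this, the reason it didn't work is that the expression inside the lambda does not make use of the i, j variables, so it returns a single scalar rather than an array.
An array of the right shape could be obtained like this, although gauss(3, 2) is evaluated only once here, so every entry holds the same value:
np.zeros((5, 5)) + gauss(3, 2)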
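The one-liner above calls gauss(3, 2) a single time and adds that one number to every cell. If independent draws per cell are wanted, which seems to be the intent of the question, NumPy's own random generator can produce them directly. This is a sketch that swaps in np.random.default_rng in place of random.gauss; the seed is arbitrary.

```python
import numpy as np

# Seed chosen arbitrarily so the example is reproducible.
rng = np.random.default_rng(seed=0)

# 5x5 array of independent normal draws with mean 3 and standard deviation 2.
samples = rng.normal(loc=3, scale=2, size=(5, 5))
```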

Calculating mean or median in one function in numpy

I have a function that is supposed to compute, depending on user input, either the mean or median of a numpy.array. I have written it like this:
import numpy as np
...
if input=='means':
    return np.mean(matrix, axis=1)
if input=='median':
    return np.median(matrix, axis=1)
But this seems kinda cumbersome. I figured there might be a standard numpy function that takes the array as well as the operation as input. I'm thinking something similar to R's tapply(X, Y, FUNCTION=Z) where Z can be any kind of function. But I could not find anything in the docs or on the Google...
Is there something like this in Numpy?
If there is no specific function, can someone think of a nice way of doing this?
Thanks!
If your input string is mean instead of means, you could do:
return getattr(np, input)(matrix, axis=1)
Here the getattr call grabs the function you want from the numpy library. Then the second set of parentheses calls that function.
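One caveat with getattr is that any attribute name on the numpy module would be accepted. A variation, sketched here with hypothetical names (REDUCERS, summarize), is to look the function up in an explicit dictionary so that only the intended operations are allowed.

```python
import numpy as np

# Explicit whitelist mapping user input to the NumPy function to call.
REDUCERS = {"mean": np.mean, "median": np.median}

def summarize(matrix, operation, axis=1):
    # Look up the requested reduction and reject anything not in the whitelist.
    try:
        func = REDUCERS[operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation!r}") from None
    return func(matrix, axis=axis)

matrix = np.arange(12).reshape(3, 4)
row_means = summarize(matrix, "mean")
row_medians = summarize(matrix, "median")
```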
I don't think you need something specific to NumPy.
For example:
def myFunc(matrix, func, axis=1):
    # Pass axis through; otherwise func reduces over the whole matrix
    return func(matrix, axis=axis)
Then, to use the function:
import numpy as np
#Create random matrix (10, 10)
mat = np.random.randint(100, size=(10, 10))
print(myFunc(mat, np.mean))

Applying a function element-wise to multiple numpy arrays

Say I have two numpy arrays of the same dimensions, e.g.:
a = np.ones((4,))
b = np.linspace(0,4,4)
and a function that is supposed to operate on elements of those arrays:
def my_func(x,y):
    # do something, e.g.
    z = x+y
    return z
How can I apply this function to the elements of a and b in an element-wise fashion and get the result back?
It depends, really. For the given function, how about 'a+b', for instance? Presumably you have something more complex in mind, though.
The most general solution is np.vectorize, but it's also the slowest. Depending on what you want to do, more clever solutions may exist. Take a look at numexpr, for example.
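A minimal sketch comparing the two routes mentioned above on the arrays from the question. For a function built only from arithmetic, calling it on the arrays directly already works element-wise; np.vectorize gives the same result by calling the function once per pair of elements.

```python
import numpy as np

def my_func(x, y):
    # Plain arithmetic, so this already works element-wise on arrays.
    return x + y

a = np.ones((4,))
b = np.linspace(0, 4, 4)

# Direct call: NumPy performs the addition element-wise in compiled code.
direct = my_func(a, b)

# General fallback: np.vectorize loops in Python over paired elements of a and b.
looped = np.vectorize(my_func)(a, b)
```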
