Is there an equivalent to R apply function in Python?

I am trying to find the Python equivalent of R's apply function for multidimensional arrays.
For example, calling the following code:
z <- array(1, dim = 2:4)
apply(z, 1, sum)
The result is:
[1] 12 12
and when called with two values for margin:
apply(z, c(1,2), sum)
The result is:
     [,1] [,2] [,3]
[1,]    4    4    4
[2,]    4    4    4
I found that NumPy's sum function can be used, but not in quite the same way.
For example:
import numpy as np
xx= np.ones((2,3,4))
np.sum(xx,axis=(1,2))
The result is:
array([12., 12.])
but I can't find a function that is equivalent to apply, specifically when margin=c(1,2). Could anyone help?

The equivalent in NumPy is:
xx.sum(axis=2)
That is, you sum over axis 2 (the last dimension, of length 4), which leaves the other two dimensions (2, 3) as the shape of the result:
array([[4., 4., 4.],
       [4., 4., 4.]])
Note the different conventions: R's apply takes the margins to keep (1-based), whereas NumPy's axis argument names the axes to collapse (0-based).
A more literal translation of your R code might be:
np.apply_over_axes(np.sum, xx, 2)
This gives the same values, but keeps the summed axis as a length-1 dimension, so the result has shape (2, 3, 1). It is also likely to be slower, and it is not idiomatic unless the operation you're performing is more complicated than a sum.
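As a rough illustration of the mapping between R's margins and NumPy's axes, here is a minimal sketch using the xx array from the question. The variable names keep_first, keep_first_two and kept_dims are made up for this example.

```python
import numpy as np

xx = np.ones((2, 3, 4))

# R: apply(z, 1, sum) keeps margin 1, so NumPy sums over the other axes (1, 2)
keep_first = xx.sum(axis=(1, 2))                 # shape (2,)

# R: apply(z, c(1, 2), sum) keeps margins 1 and 2, so NumPy sums over axis 2
keep_first_two = xx.sum(axis=2)                  # shape (2, 3)

# apply_over_axes keeps the reduced axis as a length-1 dimension
kept_dims = np.apply_over_axes(np.sum, xx, 2)    # shape (2, 3, 1)
```

Dropping the trailing length-1 axis, for example with kept_dims[..., 0], gives the same array as xx.sum(axis=2).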

np.apply_over_axes differs from R's apply in several ways.
First, np.apply_over_axes takes the axes to collapse,
whereas R's apply takes the axes to keep.
Second, np.apply_over_axes applies the function one axis at a time, as the documentation quoted below explains. The result is the same for np.sum, but it can differ for other functions.
func is called as res = func(a, axis), where axis is the first element of axes. The result res of the function call must have either the same dimensions as a or one less dimension. If res has one less dimension than a, a dimension is inserted before axis. The call to func is then repeated for each axis in axes, with res as the first argument.
In addition, the func passed to np.apply_over_axes must accept the arguments (a, axis), and its return value must have a compatible shape, for np.apply_over_axes to work correctly.
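To make the "one axis at a time" point concrete, here is a small sketch with np.median, a function for which reducing axis by axis is not the same as reducing over all the axes at once. The array a and the variable names are chosen only for this illustration.

```python
import numpy as np

a = np.array([[0, 0, 10],
              [1, 10, 10]])

# apply_over_axes reduces one axis at a time:
# median over axis 0 gives [0.5, 5, 10], then the median of those gives 5
stepwise = np.apply_over_axes(np.median, a, (0, 1))   # shape (1, 1)

# Reducing both axes in a single call takes the median of all six values
all_at_once = np.median(a, axis=(0, 1))
```

Here stepwise holds 5.0 while all_at_once is 5.5, so the two approaches are not interchangeable for every function.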
Here's an example of how np.apply_over_axes fails:
>>> arr.shape
(5, 4, 3, 2)
>>> np.apply_over_axes(np.mean, arr, (0,1))
array([[[[ 0.05856732, -0.14844212],
         [ 0.34214183,  0.24319846],
         [-0.04807454,  0.04752829]]]])
>>> np_mean = lambda x: np.mean(x)
>>> np.apply_over_axes(np_mean, arr, (0,1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in apply_over_axes
  File "/Users/kwhkim/opt/miniconda3/envs/rtopython2-pip/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 495, in apply_over_axes
    res = func(*args)
TypeError: <lambda>() takes 1 positional argument but 2 were given
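The TypeError arises because np.apply_over_axes calls func(a, axis) with two positional arguments. A minimal sketch of a wrapper that accepts the axis is shown below; the seed and the name result are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily for the sketch
arr = rng.normal(0, 1, (5, 4, 3, 2))

# func must accept (array, axis); a one-argument lambda raises the TypeError
np_mean = lambda x, axis: np.mean(x, axis=axis)

result = np.apply_over_axes(np_mean, arr, (0, 1))   # shape (1, 1, 3, 2)
```

For the mean of equally sized groups, this matches arr.mean(axis=(0, 1), keepdims=True).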
Since there seems to be no equivalent function in NumPy,
I wrote a function that is similar to R's apply:
def np_apply(arr, axes_remain, fun, *args, **kwargs):
    # Axes to keep (0-based), like R's MARGIN
    axes_remain = tuple(sorted(set(axes_remain)))
    arr_shape = arr.shape
    # All other axes get collapsed
    axes_to_move = tuple(ax for ax in range(arr.ndim) if ax not in axes_remain)
    # Move the collapsed axes to the end, then flatten them into one trailing axis
    arr2 = np.moveaxis(arr, axes_to_move, [-x for x in range(1, len(axes_to_move) + 1)]).copy()
    arr2 = arr2.reshape([arr_shape[x] for x in axes_remain] + [-1])
    # Apply fun to each flattened 1-D slice
    return np.apply_along_axis(fun, -1, arr2, *args, **kwargs)
It works fine, at least for the example above (the results are not bit-for-bit identical to those above, but math.isclose() returns True for nearly all elements):
>>> np_apply(arr, (2,3), np.mean)
array([[ 0.05856732, -0.14844212],
       [ 0.34214183,  0.24319846],
       [-0.04807454,  0.04752829]])
>>> np_apply(arr, (2,3), np_mean)
array([[ 0.05856732, -0.14844212],
       [ 0.34214183,  0.24319846],
       [-0.04807454,  0.04752829]])
For the function to work smoothly on large multidimensional arrays,
it needs to be optimized. For instance,
it should avoid copying the array.
Anyway, it seems to work as a proof of concept, and I hope it helps.
PS:
arr is generated by arr = np.random.normal(0,1,(5,4,3,2))
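When the function being applied already accepts an axis argument (as np.mean, np.sum and similar reductions do), a copy-free alternative may be to translate the axes to keep into the axes to collapse and call the reduction once. The sketch below assumes that case; reduce_keeping is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def reduce_keeping(arr, axes_remain, fun):
    """Hypothetical helper: reduce over every axis NOT listed in axes_remain.

    Assumes fun accepts an `axis` keyword that may be a tuple (e.g. np.mean).
    """
    keep = {ax % arr.ndim for ax in axes_remain}          # normalise negative axes
    collapse = tuple(ax for ax in range(arr.ndim) if ax not in keep)
    return fun(arr, axis=collapse)

rng = np.random.default_rng(0)            # seed chosen arbitrarily for the sketch
arr = rng.normal(0, 1, (5, 4, 3, 2))
result = reduce_keeping(arr, (2, 3), np.mean)             # shape (3, 2)
```

This does not cover arbitrary Python functions of a 1-D slice, which is the case np_apply is designed for.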

Related

Computing weighted average on a deque using numpy

I have a deque object whose weighted average I would like to find. The deque is a collection of 60 × 60 × 3 NumPy arrays (they are actually images stored in the deque). I would like to find the weighted average of all the elements (i.e., images) in the deque.
motion_buffer = deque(maxlen = 5)
motion_weights = [5./15, 4./15, 3./15, 2./15, 1./15]
# After adding few elements ( i.e images ) in motion_buffer. the following is done:
motion_avg = np.average(motion_buffer, weights=motion_weights)
I get the error:
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
I understand there is a mismatch somewhere but supplying axis values ( as per docs) did not help me. I have tested it in the following manner:
>>> A = np.random.randn(4,4)
>>> weights = [1 , 4 ,6 ,7]
>>> buf = deque(maxlen=5)
>>> buf.appendleft(A)
>>> c = np.average(buf, weights=weights)
Traceback (most recent call last):
...
"Axis must be specified when shapes of a and weights "
TypeError: Axis must be specified when shapes of a and weights differ.
I have tried using np.average on a deque with 1-D elements, and it works.
How exactly should I modify my code? I experimented, but nothing worked for me.

According to the np.average documentation:
weights : array_like, optional
    An array of weights associated with the values in a. Each value in
    a contributes to the average according to its associated weight.
    The weights array can either be 1-D (in which case its length must be
    the size of a along the given axis) or of the same shape as a.
so you can't pass weights whose shape differs from the data without specifying an axis.
One workaround is
av_average = np.average(np.average(your_deque, axis=(1,2,3)), weights=(5,4,3,2,1))
where the inner call averages each 60×60×3 array down to a single number, and the outer call computes the weighted average of those per-image averages (this requires the deque to hold exactly five images, one per weight).
What the OP really wants, though, is
average = np.average(the_deque, axis=0, weights=(…))
where (…) is a sequence whose length equals the current length of the deque.
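A minimal sketch of the axis=0 approach follows, assuming the newest frame is pushed with appendleft and that the weights are simply truncated while the buffer is still filling up. The constant-valued frames are stand-ins for real images.

```python
from collections import deque
import numpy as np

motion_buffer = deque(maxlen=5)
motion_weights = np.array([5, 4, 3, 2, 1]) / 15

# Stand-in frames: constant images, newest pushed to the left
for value in (1.0, 2.0, 3.0):
    motion_buffer.appendleft(np.full((60, 60, 3), value))

frames = np.stack(list(motion_buffer))            # shape (3, 60, 60, 3)
weights = motion_weights[:len(motion_buffer)]     # one weight per stored frame
motion_avg = np.average(frames, axis=0, weights=weights)   # shape (60, 60, 3)
```

np.average normalises by the sum of the supplied weights, so the truncated weights do not need to sum to 1.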

One approach that worked was converting the deque to a NumPy array and then averaging along axis 0:
my_array = np.array(motion_buffer)
np.average(my_array, axis=0, weights=weights)
The only drawback was the increased computation time.

In [1]: from collections import deque
In [2]: >>> A = np.random.randn(4,4)
...: >>> weights = [1 , 4 ,6 ,7]
...: >>> buf = deque(maxlen=5)
...: >>> buf.appendleft(A)
In [3]: buf
Out[3]:
deque([array([[ 1.10651806, -0.50125715, -0.35877456,  1.31969932],
       [-0.4674734 ,  0.25144544, -1.5392525 ,  0.09607722],
       [ 2.24245413, -1.09636901,  1.97502862, -0.90069983],
       [ 0.61917197, -0.13276115, -0.1103521 ,  0.56556319]])])
In [4]: np.array(buf)
Out[4]:
array([[[ 1.10651806, -0.50125715, -0.35877456,  1.31969932],
        [-0.4674734 ,  0.25144544, -1.5392525 ,  0.09607722],
        [ 2.24245413, -1.09636901,  1.97502862, -0.90069983],
        [ 0.61917197, -0.13276115, -0.1103521 ,  0.56556319]]])
In [5]: np.average(buf, weights=weights, axis=0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-8a8c3dba2415> in <module>
----> 1 np.average(buf, weights=weights, axis=0)
<__array_function__ internals> in average(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in average(a, axis, weights, returned)
399 if wgt.shape[0] != a.shape[axis]:
400 raise ValueError(
--> 401 "Length of weights not compatible with specified axis.")
402
403 # setup wgt to broadcast along axis
ValueError: Length of weights not compatible with specified axis.
In [6]: _4.shape
Out[6]: (1, 4, 4)
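Oops, I should have checked the shape of the array. Out[4] has a leading dimension of size 1 because the deque holds a single (4, 4) array, and np.array stacks the deque's elements along a new first axis. The four weights match axis 1, not axis 0: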
In [7]: np.average(buf, weights=weights, axis=1)
Out[7]: array([[ 0.94586406, -0.38905653, 0.25344014, 0.01437508]])

Is vectorization of Components defined on scalars possible in OpenMDAO?

In the context of functional programming, a function that takes and returns a scalar can be mapped onto lists/vectors to return a list/vector of the mapped values. In regular Python, I would do this either functionally or in NumPy's vectorized fashion:
import numpy as np # NumPy import
xs = [1,2,3,4,5] # List of inputs
f = lambda x: x**2 # Some function, which could be the equivalent of an ExplicitComponent
list(map(f, xs)) # Functional style
[ f(x) for x in xs ] # List comprehension
f(np.array(xs)) # NumPy vectorized style
Is this type of behaviour attainable using Components? By this I mean a Component could take scalar inputs and perform like a normal function, but automatically performs the operations on vectors and lists of varying length as well.
I haven't been able to find anything similar in the documentation. I understand that most of the behaviour in OpenMDAO uses NumPy's vectorization for efficiency, but does this mean all components that could have vector inputs must be written using some kind of self.options.declare('num_nodes', default=1) method and passing a global state n for the number of nodes for lists/vectors of length/dimension n across all Components?
Regarding design considerations, I understand that vectorizations over Cartesian products of input vectors are not implemented by default in NumPy, and that they're more zip-like. But it does work like a partially-applied mapped function by default for a single NumPy array, e.g.:
>>> xs = [1,2,3,4,5]
>>> f = lambda x,y: x**2 + y**2
>>> f(np.array(xs), np.array([2,4,6,8]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
ValueError: operands could not be broadcast together with shapes (5,) (4,)
>>> f(np.array(xs), np.array([2,4,6,8,10]))
array([ 5, 20, 45, 80, 125])
>>> f(np.array(xs), 1)
array([ 2, 5, 10, 17, 26])
An alternative is to use NumPy's meshgrid() as follows:
xs, ys = [1,2,3,4,5], [2,4,6,8]
f = lambda x, y: x**2 + y**2
xys = np.meshgrid(xs, ys)
f(xys[0], xys[1])
So would something like this be more feasible (and desirable!) behaviour for Components?
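For the plain NumPy side of this question, the Cartesian-product behaviour of meshgrid can also be obtained through broadcasting, by giving the two inputs orthogonal shapes. This is only a sketch of the NumPy idiom and makes no claim about OpenMDAO Components.

```python
import numpy as np

def f(x, y):
    return x**2 + y**2

xs = np.array([1, 2, 3, 4, 5])
ys = np.array([2, 4, 6, 8])

# meshgrid materialises both coordinate grids, each of shape (4, 5)
X, Y = np.meshgrid(xs, ys)
grid_meshgrid = f(X, Y)

# Broadcasting a (1, 5) row against a (4, 1) column yields the same (4, 5) grid
grid_broadcast = f(xs[np.newaxis, :], ys[:, np.newaxis])
```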

As of OpenMDAO V3.1 the only way to do this in OpenMDAO is via an option/argument such as num_nodes or vec_size.
Other users have expressed an interest in allowing dynamically sized IO in components, for instance basing the size of an input on the output to which it is connected. See https://github.com/OpenMDAO/POEMs/pull/51.
We're working on it, but we don't have a time table for when we'll find an acceptable solution at this time.

How to fix broadcasting issues with numpy.vectorize()

I am writing a custom function that I want to behave like a NumPy function: it should take an array, perform the same operation on every element, and return an array of the same shape containing the results.
Luckily, there is a solution for this: numpy.vectorize()
So I used that: I have a function that creates a model in the form of a sine wave, and it takes in two variables: one numpy list X containing some input values for the sine function, and one numpy list param that contains the four possible parameters that a sine curve can have.
import numpy as np
def sine(X, param):
    #Unpacking param
    A = param[0]
    P = param[1]
    Phi = param[2]
    B = param[3]
    #translating variables
    #Phi = t0/P
    f = X/P
    Y = A*np.sin(2*np.pi*(f + Phi)) + B
    return Y
Only the input values X need broadcasting, while all the parameters are needed on every call. According to the documentation, the way to vectorise the function is therefore as follows:
np_sine = np.vectorize(sine, excluded=['param']) #makes sine() behave like a numpy function
...so that param is properly excluded from vectorisation.
This method is useful, since I will be fitting this model to a dataset, which requires occasionally tweaking the parameters, meanwhile, with this method the code where I need it is only one line long:
CHIsqrt = np.sum(((ydata - np_sine(xdata, param))/yerr)**2)
where ydata, xdata and yerr are equally long lists of datapoints and where param is the list of four parameters.
Yet, the result was a broadcasting error:
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 2831, in _vectorize_call outputs = ufunc(*inputs)
ValueError: operands could not be broadcast together with shapes (500,) (4,)
Since the list param is 4 elements long, I get that the function ignored my commands to exclude it from vectorisation. That is a problem.
I tried specifying that the end result should be an ndarray, which did not change the error.
np_sine = np.vectorize(sine, excluded=['param'], otypes=[np.ndarray])
What would be the correct way to use this function?

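You've specified excluded incorrectly. A name in excluded only takes effect when that argument is passed by keyword; you pass param positionally, so exclude it by position instead:
In [270]: def foo(x, param):
     ...:     a,b,c = param
     ...:     return a*x
     ...: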
In [271]: f = np.vectorize(foo, excluded=[1]) # specify by position
In [272]: f(np.arange(4),[1,3,2])
Out[272]: array([0, 1, 2, 3])
For a keyword arg:
In [277]: def foo(x, param=[0,0,0]):
     ...:     a,b,c = param
     ...:     return a*x
     ...:
In [278]: f = np.vectorize(foo, excluded=['param'])
In [279]: f(np.arange(4),param=[1,3,2])
Out[279]: array([0, 1, 2, 3])
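Applied to the sine model from the question, a sketch of the fix could look like the following. The sample values in xdata and param are made up for the illustration; excluding both position 1 and the name 'param' is a choice made here so that either calling style works.

```python
import numpy as np

def sine(X, param):
    A, P, Phi, B = param
    return A * np.sin(2 * np.pi * (X / P + Phi)) + B

# Exclude param both by position and by name, so either calling style works
np_sine = np.vectorize(sine, excluded={1, 'param'})

xdata = np.linspace(0.0, 1.0, 5)          # made-up sample inputs
param = [2.0, 1.0, 0.25, 0.5]             # made-up A, P, Phi, B

via_vectorize = np_sine(xdata, param)
direct = sine(xdata, param)               # sine already works element-wise on arrays
```

Because sine is built from NumPy operations, calling it directly on the array gives the same values and avoids the Python-level loop that np.vectorize performs.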

Adding a New Column to an Empty NumPy Array

I'm trying to add a new column to an empty NumPy array and am facing some troubles. I've looked at a lot of other questions, but for some reason they don't seem to be helping me solve the problem I'm facing, so I decided to ask my own question.
I have an empty NumPy array such that:
array1 = np.array([])
Let's say I have data that is of shape (100, 100), and want to append each column to array1 one by one. However, if I do for example:
array1 = np.append(array1, some_data[:, 0])
array1 = np.append(array1, some_data[:, 1])
I noticed that I won't be getting a (100, 2) matrix, but a (200,) array. So I tried to specify the axis as
array1 = np.append(array1, some_data[:, 0], axis=1)
which produces an AxisError: axis 1 is out of bounds for array of dimension 1.
Next I tried to use the np.c_[] method:
array1 = np.c_[array1, some_data[:, 0]]
which gives me a ValueError: all the input array dimensions except for the concatenation axis must match exactly.
Is there any way that I would be able to add columns to the NumPy array sequentially?
Thank you.
EDIT
I learned that my initial question didn't contain enough information for others to offer help, and made this update to make up for the initial mistake.
My big objective is to make a program that selects features in a "greedy fashion." Basically, I'm trying to take the design matrix some_data, which is a (100, 100) matrix containing floating point numbers as entries, and fitting a linear regression model with an increasing number of features until I find the best set of features.
For example, since I have a total of 100 features, the first round would fit the model on each 100, select the best one and store it, then continue with the remaining 99.
That's what I'm trying to do in my head, but I got stuck from the beginning with the problem I mentioned.

You start with a (0,) array and (n,) shaped one:
In [482]: arr1 = np.array([])
In [483]: arr1.shape
Out[483]: (0,)
In [484]: arr2 = np.array([1,2,3])
In [485]: arr2.shape
Out[485]: (3,)
np.append uses concatenate (but with some funny business when axis is not provided):
In [486]: np.append(arr1, arr2)
Out[486]: array([1., 2., 3.])
In [487]: np.append(arr1, arr2,axis=0)
Out[487]: array([1., 2., 3.])
In [489]: np.concatenate([arr1, arr2])
Out[489]: array([1., 2., 3.])
And trying axis=1
In [488]: np.append(arr1, arr2,axis=1)
---------------------------------------------------------------------------
AxisError Traceback (most recent call last)
<ipython-input-488-457b8657453e> in <module>()
----> 1 np.append(arr1, arr2,axis=1)
/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py in append(arr, values, axis)
4526 values = ravel(values)
4527 axis = arr.ndim-1
-> 4528 return concatenate((arr, values), axis=axis)
AxisError: axis 1 is out of bounds for array of dimension 1
Look at the whole message - the error occurs in the concatenate step. You can't concatenate 1d arrays along axis=1.
Using np.append or even np.concatenate iteratively is slow (it creates a new array each time), and hard to initialize correctly. It is a poor substitute for the widely used list append-to-empty-list recipe.
np.c_ is also just a cover function for concatenate.
There isn't just one empty array. np.array([[]]) and np.array([[[]]]) also have 0 elements.
If you want to add a column to an array, you need to start with a 2d array, and the column also needs to be 2d.
Here's an example of a proper concatenation of 2 2d arrays:
In [490]: np.concatenate([ np.zeros((3,0),int), np.arange(3)[:,None]], axis=1)
Out[490]:
array([[0],
       [1],
       [2]])
column_stack is another cover function for concatenate that makes sure the inputs are 2d. But even with that getting an initial 'empty' array is tricky.
In [492]: np.column_stack([np.zeros(3,int), np.arange(3)])
Out[492]:
array([[0, 0],
       [0, 1],
       [0, 2]])
In [493]: np.column_stack([np.zeros((3,0),int), np.arange(3)])
Out[493]:
array([[0],
       [1],
       [2]])
np.c_ is a lot like column_stack, though implemented in a different way:
In [496]: np.c_[np.zeros(3,int), np.arange(3)]
Out[496]:
array([[0, 0],
       [0, 1],
       [0, 2]])
The basic message is that when using np.concatenate you need to pay attention to dimensions. Its variants allow you to fudge things a bit, but you really need to understand that fudging to get things right, especially when starting from this poorly defined idea of an 'empty' array.
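The list-append recipe mentioned above might look like this for the column case: collect the chosen columns in a Python list and build the 2-D array once at the end. The small some_data array and the selected indices are stand-ins for the question's (100, 100) matrix and its greedy choices.

```python
import numpy as np

some_data = np.arange(12.0).reshape(4, 3)   # stand-in for the (100, 100) design matrix

columns = []                                # plain Python list, cheap to append to
selected = [2, 0]                           # hypothetical order in which features are picked
for j in selected:
    columns.append(some_data[:, j])         # each item is a 1-D column of shape (4,)

array1 = np.column_stack(columns)           # built once, shape (4, 2)
```

If only the indices are being accumulated, indexing with some_data[:, selected] gives the same array without any stacking.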

I usually use np.concatenate, like this:
# Some stuff
alldata = None
....
array1 = np.random.random((100,1))
if alldata is None: alldata = array1
...
array2 = np.random.random((100,1))
alldata = np.concatenate((alldata,array2),axis=1)
If you are working with 1-D vectors:
alldata = None
....
array1 = np.random.random((100,))
if alldata is None: alldata = array1[:,np.newaxis]
...
array2 = np.random.random((100,))
alldata = np.concatenate((alldata,array2[:,np.newaxis]),axis=1)
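A variation that avoids the None check is to start from a 2-D array with zero columns, so every step is an ordinary concatenation along axis 1. This is a sketch under the assumption that the number of rows is known up front; the names n_rows and column are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily for the sketch
n_rows = 100
alldata = np.empty((n_rows, 0))           # 2-D array with 100 rows and zero columns

for _ in range(3):
    column = rng.random(n_rows)           # 1-D vector of shape (100,)
    # Turn the vector into a (100, 1) column before concatenating
    alldata = np.concatenate((alldata, column[:, np.newaxis]), axis=1)
```

Each concatenation still allocates a new array, so for many columns the list-then-stack approach is likely to be faster.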

Converting OpenCV SURF features to float32 arrays in Python

I extract the features with the compute() function and add them to a list. I then try to convert all the features to float32 using NumPy so that they can be used with OpenCV for classification. The error I am getting is:
ValueError: setting an array element with a sequence.
I'm not really sure what I can do about this. I am following a book and doing the same steps, except they use HOS to extract the features. I am extracting the features and getting back matrices of inconsistent sizes, and I am not sure how to make them all equal. Related code (which might have minor syntax errors because I trimmed it from the original code):
def get_SURF_feature_vector(area_of_interest, surf):
    # Detect the key points in the image
    key_points = surf.detect(area_of_interest)
    # Create array of zeros with the same shape and type as a given array
    image_key_points = np.zeros_like(area_of_interest)
    # Draw key points on the image
    image_key_points = cv2.drawKeypoints(area_of_interest, key_points, image_key_points, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
    # Create feature descriptors
    key_points, feature_descriptors = surf.compute(area_of_interest, key_points)
    # Plot Image and descriptors
    # plt.imshow(image_key_points)
    # Return computed feature description matrix
    return feature_descriptors

for x in range(0, len(data)):
    feature_list.append(get_SURF_feature_vector(area_of_interest[x], surf))
list_of_features = np.array(feature_list, dtype=np.float32)

The error isn't specific to OpenCV at all, just numpy.
Your list feature_list contains different length arrays. You can't make a 2d array out of arrays of different sizes.
For example, you can reproduce the error very simply:
>>> np.array([[1], [2, 3]], dtype=np.float32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
I'm assuming what you expect from the operation is to pass in [1] and [2, 3] and get back np.array([1, 2, 3]), i.e., concatenation (actually this is not what the OP wants; see the comments under this post). You can use np.hstack() or np.vstack() for those operations, depending on the shape of your input. You can also use np.concatenate() with the axis argument, but the stacking operations are more explicit for 2D/3D arrays.
>>> a = np.array([1], dtype=np.float32)
>>> b = np.array([2, 3, 4], dtype=np.float32)
>>> np.hstack([a, b])
array([1., 2., 3., 4.], dtype=float32)
Descriptors are listed vertically though, so they should be stacked vertically, not horizontally as above. Thus you can simply do:
list_of_features = np.vstack(list_of_features)
You don't need to specify dtype=np.float32 as the descriptors are np.float32 by default (also, vstack doesn't have a dtype argument so you'd have to convert it after the stacking operation).
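As a quick check of the vertical stacking, the sketch below uses random float32 arrays as stand-ins for per-image SURF descriptors (64 columns each, differing row counts); no OpenCV call is involved.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily for the sketch
# Stand-ins for per-image descriptors: differing row counts, 64 columns each
list_of_features = [rng.random((n, 64)).astype(np.float32) for n in (500, 200, 400)]

stacked = np.vstack(list_of_features)     # rows are concatenated: shape (1100, 64)
```

The result keeps the float32 dtype of its inputs, so no conversion is needed afterwards in this case.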
If you instead want a 3D array, then you need the same number of features for every image so that the 3D array is evenly filled. You could pad your feature arrays with placeholder values, such as 0 or np.nan, so that they're all the same length, and then group them together as you did originally.
>>> des1 = np.random.rand(500, 64).astype(np.float32)
>>> des2 = np.random.rand(200, 64).astype(np.float32)
>>> des3 = np.random.rand(400, 64).astype(np.float32)
>>> feature_descriptors = [des1, des2, des3]
So here each image's feature descriptors have a different number of features. You can find the largest one:
>>> max_des_length = max([len(d) for d in feature_descriptors])
>>> max_des_length
500
You can use np.pad() to pad each feature array with however many more values it needs to be the same size as your maximum size descriptor set.
Now this is a little unnecessary to do it all in one line, but whatever.
>>> feature_descriptors = [np.pad(d, ((0, (max_des_length - len(d))), (0, 0)), 'constant', constant_values=np.nan) for d in feature_descriptors]
The annoying argument here, ((0, (max_des_length - len(d))), (0, 0)), just says to pad with 0 elements on the top, max_des_length - len(d) elements on the bottom, 0 on the left, and 0 on the right.
As you can see here, I'm adding np.nan values to the arrays. If you left out the constant_values argument it defaults to 0. Lastly all you have to do is cast as a numpy array:
>>> feature_descriptors = np.array(feature_descriptors)
>>> feature_descriptors.shape
(3, 500, 64)
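One possible follow-up to the NaN padding is recovering which rows are real descriptors and which are padding. The sketch below assumes genuine descriptors never contain NaN; the small row counts and the names padded, valid_rows and counts are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily for the sketch
descriptors = [rng.random((n, 64)).astype(np.float32) for n in (5, 2, 4)]
max_len = max(len(d) for d in descriptors)

# Pad every array at the bottom with NaN rows up to the longest length
padded = np.array([
    np.pad(d, ((0, max_len - len(d)), (0, 0)), 'constant', constant_values=np.nan)
    for d in descriptors
])                                        # shape (3, 5, 64)

# A row counts as real (not padding) if it contains no NaN
valid_rows = ~np.isnan(padded).any(axis=2)    # shape (3, 5), boolean
counts = valid_rows.sum(axis=1)               # number of real rows per image
```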
