Is vectorization of Components defined on scalars possible in OpenMDAO? - python

In the context of functional programming, a function that takes and returns a scalar can be mapped onto lists/vectors to return a list/vector of the mapped values. In regular Python, I would do this either functionally or in NumPy's vectorized fashion:
import numpy as np # NumPy import
xs = [1,2,3,4,5] # List of inputs
f = lambda x: x**2 # Some function, which could be the equivalent of an ExplicitComponent
list(map(f, xs)) # Functional style
[ f(x) for x in xs ] # List comprehension
f(np.array(xs)) # NumPy vectorized style
Is this type of behaviour attainable using Components? By this I mean a Component that takes scalar inputs and performs like a normal function, but automatically operates on vectors and lists of varying length as well.
I haven't been able to find anything similar in the documentation. I understand that most of the behaviour in OpenMDAO uses NumPy's vectorization for efficiency, but does this mean all components that could have vector inputs must be written using some kind of self.options.declare('num_nodes', default=1) method and passing a global state n for the number of nodes for lists/vectors of length/dimension n across all Components?
Regarding design considerations, I understand that vectorizations over Cartesian products of input vectors are not implemented by default in NumPy, and that they're more zip-like. But it does work like a partially-applied mapped function by default for a single NumPy array, e.g.:
>>> xs = [1,2,3,4,5]
>>> f = lambda x,y: x**2 + y**2
>>> f(np.array(xs), np.array([2,4,6,8]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
ValueError: operands could not be broadcast together with shapes (5,) (4,)
>>> f(np.array(xs), np.array([2,4,6,8,10]))
array([ 5, 20, 45, 80, 125])
>>> f(np.array(xs), 1)
array([ 2, 5, 10, 17, 26])
An alternative is to use NumPy's meshgrid() as follows:
xs, ys = [1,2,3,4,5], [2,4,6,8]
f = lambda x, y: x**2 + y**2
xys = np.meshgrid(xs, ys)
f(xys[0], xys[1])
So would something like this be more feasible (and desirable!) behaviour for Components?

As of OpenMDAO V3.1 the only way to do this in OpenMDAO is via an option/argument such as num_nodes or vec_size.
Other users have expressed an interest in allowing dynamically sized IO in components, for instance basing the size of an input on the output to which it is connected. See https://github.com/OpenMDAO/POEMs/pull/51.
We're working on it, but we don't currently have a timetable for when we'll settle on an acceptable solution.
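For illustration, here is a minimal sketch of the vec_size pattern applied to the squaring example from the question (the option name and the diagonal-partials setup are conventions, not requirements):
import numpy as np
import openmdao.api as om

class SquareComp(om.ExplicitComponent):
    """Computes y = x**2 over a fixed-length vector input."""

    def initialize(self):
        # the vector length must be declared up front; it cannot change at run time
        self.options.declare('vec_size', types=int, default=1)

    def setup(self):
        n = self.options['vec_size']
        self.add_input('x', shape=(n,))
        self.add_output('y', shape=(n,))
        # y[i] depends only on x[i], so the Jacobian is diagonal
        ar = np.arange(n)
        self.declare_partials('y', 'x', rows=ar, cols=ar)

    def compute(self, inputs, outputs):
        outputs['y'] = inputs['x'] ** 2

    def compute_partials(self, inputs, partials):
        partials['y', 'x'] = 2.0 * inputs['x']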

Related

Is there an equivalent to the R apply function in Python?

I am trying to find the Python equivalent to R's apply function but with multidimensional arrays.
For example, when calling the following code:
z <- array(1, dim = 2:4)
apply(z, 1, sum)
The result is:
[1] 12 12
and when called with two values for margin:
apply(z, c(1,2), sum)
The result is:
     [,1] [,2] [,3]
[1,]    4    4    4
[2,]    4    4    4
I found that the sum function in numpy can be used, but not in the same consistent way:
For example:
import numpy as np
xx = np.ones((2, 3, 4))
np.sum(xx, axis=(1, 2))
The result is:
array([12., 12.])
but I can't find a function that is equivalent to apply in this manner, specifically when dealing with margin=c(1,2). Could anyone help?
The equivalent in NumPy is:
xx.sum(axis=2)
That is, you are summing over axis 2 (the last dimension, which has length 4), leaving the other two dimensions (2,3) as the shape of the result:
array([[4., 4., 4.],
       [4., 4., 4.]])
Perhaps a more literal translation of your R code would be:
np.apply_over_axes(np.sum, xx, 2)
This gives the same values, but keeps the collapsed axis as a trailing length-1 dimension (shape (2, 3, 1)). It is likely to be slower, however, and is not idiomatic unless the actual operation you're performing is something more complicated than sum.
np.apply_over_axes is different from R's apply in several ways.
First, np.apply_over_axes needs collapsing axes to be specified,
whereas R's apply needs remaining axes to be specified.
Secondly, np.apply_over_axes applies the function iteratively, as the documentation (quoted below) states. The result is the same for np.sum, but it could be different for other functions.
func is called as res = func(a, axis), where axis is the first element of axes. The result res of the function call must have either the same dimensions as a or one less dimension. If res has one less dimension than a, a dimension is inserted before axis. The call to func is then repeated for each axis in axes, with res as the first argument.
And the func for np.apply_over_axes needs to have a particular signature, and its return value a particular shape, for np.apply_over_axes to work correctly.
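To make the first difference concrete, here is a small sketch of the two conventions, using the array from the question:
import numpy as np

z = np.ones((2, 3, 4))
# R's apply(z, 1, sum) names the *remaining* margin (R's 1 = NumPy's axis 0);
# NumPy instead names the axes to *collapse*:
z.sum(axis=(1, 2))                             # array([12., 12.])
np.apply_over_axes(np.sum, z, (1, 2)).ravel()  # same values, from shape (2, 1, 1)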
Here's an example of how np.apply_over_axes fails:
>>> arr.shape
(5, 4, 3, 2)
>>> np.apply_over_axes(np.mean, arr, (0,1))
array([[[[ 0.05856732, -0.14844212],
         [ 0.34214183,  0.24319846],
         [-0.04807454,  0.04752829]]]])
>>> np_mean = lambda x: np.mean(x)
>>> np.apply_over_axes(np_mean, arr, (0,1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in apply_over_axes
  File "/Users/kwhkim/opt/miniconda3/envs/rtopython2-pip/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 495, in apply_over_axes
    res = func(*args)
TypeError: <lambda>() takes 1 positional argument but 2 were given
Since there seems to be no equivalent function in Python, I made a function similar to R's apply:
def np_apply(arr, axes_remain, fun, *args, **kwargs):
    axes_remain = tuple(set(axes_remain))
    arr_shape = arr.shape
    # the axes not listed in axes_remain are the ones to collapse
    axes_to_move = set(range(len(arr.shape)))
    for axis in axes_remain:
        axes_to_move.remove(axis)
    axes_to_move = tuple(axes_to_move)
    # move the collapsing axes to the end, then flatten them into a single axis
    arr2 = np.moveaxis(arr, axes_to_move,
                       [-x for x in range(1, len(axes_to_move) + 1)]).copy()
    arr2 = arr2.reshape([arr_shape[x] for x in axes_remain] + [-1])
    # apply fun along the flattened last axis
    return np.apply_along_axis(fun, -1, arr2, *args, **kwargs)
It works fine, at least for the sample example above (the result is not bit-identical to the one above, but math.isclose() returns True for nearly all elements):
>>> np_apply(arr, (2,3), np.mean)
array([[ 0.05856732, -0.14844212],
       [ 0.34214183,  0.24319846],
       [-0.04807454,  0.04752829]])
>>> np_apply(arr, (2,3), np_mean)
array([[ 0.05856732, -0.14844212],
       [ 0.34214183,  0.24319846],
       [-0.04807454,  0.04752829]])
For the function to work smoothly on large multidimensional arrays, it would need to be optimized; for instance, the array copy should be avoided. Anyway, it seems to work as a proof of concept, and I hope it helps.
PS)
arr is generated by arr = np.random.normal(0,1,(5,4,3,2))
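As a quick consistency check (a sketch; the random values differ per run), the result should match NumPy's own reduction over the collapsed axes:
>>> arr = np.random.normal(0, 1, (5, 4, 3, 2))
>>> np.allclose(np_apply(arr, (2, 3), np.mean), arr.mean(axis=(0, 1)))
True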

scipy parallel interpolation of multiple arrays

I have multiple arrays of the same dimension, or rather a matrix say
data.shape
# (n, m)
I want to interpolate the m-axis and leave the n-axis. Ideally I would get a function which I can call with an x-array of length n.
interpolated(x)
x.shape
# (n,)
I tried
from scipy import interpolate
interpolated = interpolate.interp1d(x=x_points, y=data)
interpolated(x).shape
# (n, n)
but this evaluates every dataset at every given point. Is there a better way to do it than ugly loops like
interpolated = [interpolate.interp1d(x=x_points, y=array_)
                for array_ in data]
array([func_(xi) for func_, xi in zip(interpolated, x)])
Your (n,m)-shaped data is, as you said, a collection of n datasets, each of length m. You're trying to pass this an n-length x array and expect to obtain an n-length result. That is, you're querying the n independent datasets at n unrelated points.
This makes me believe that you need n independent interpolators. There is no real benefit in trying to get away with a single call to an interpolation routine. Interpolation routines, as far as I know, assume that the target of the interpolation is a single object: either a multivariate function, or a function that has an array-shaped value; in either case you can query the function one (optionally higher-dimensional) point at a time. For instance, multilinear interpolation works across rows of the input, so there's (again, as far as I know) no way to "interpolate linearly along an axis". In your case there is no relationship between the rows of your data, and no relationship between the query points, so it's also semantically motivated to use n independent interpolators for your problem.
As for convenience, you can shove all those interpolating functions into a single function for ease of use:
interpolated = [interpolate.interp1d(x=x_points, y=array_)
                for array_ in data]

def common_interpolator(x):
    '''interpolate n separate datasets at n separate input points'''
    return array([fun(xx) for fun, xx in zip(interpolated, x)])
This will allow you to use a single call to common_interpolator with an input array_like of length n.
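For instance (a hypothetical usage sketch, assuming x_points and data are defined as in the question and numpy is imported as np):
x = np.linspace(x_points.min(), x_points.max(), num=data.shape[0])  # shape (n,)
y = common_interpolator(x)  # shape (n,): one interpolated value per dataset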
But since you mentioned it in the comments, you can actually make use of np.vectorize if you want to feed multiple sets of query points to this function. Here's a complete example with three trivial dummy functions:
import numpy as np

# three scalar (well, or vectorized) functions:
funs = [lambda x, i=i: x + i for i in range(3)]

# define a wrapper for calling them together
def allfuns(xs):
    '''bundled call to functions: n-length input to n-length output'''
    return np.array([fun(x) for fun, x in zip(funs, xs)])

# define a vectorized version of the wrapper, (...,n) to (...,n)-shape
allfuns_vector = np.vectorize(allfuns, signature='(n)->(n)')

# print some examples
x = np.arange(3)
print([fun(xx) for fun, xx in zip(funs, x)])
# [0, 2, 4]
print(allfuns(x))
# [0 2 4]
print(allfuns_vector(x))
# [0 2 4]
print(allfuns_vector([x, x + 10]))
# [[ 0  2  4]
#  [10 12 14]]
As you can see, all of the above work the same way for a 1d input array. But we can pass a (k,n)-shaped array to the vectorized version, and it will perform the interpolation row-wise; that is, each length-n row will be fed to the original interpolator bundle. As far as I know, np.vectorize is essentially a wrapper for a for loop, but at least it makes calling your functions more convenient.

How does matplotlib accept function parameters? Are they lambdas?

The following syntax is very intuitive. Run in Spyder, and it plots a nonlinear function.
import numpy as numpy
import matplotlib.pyplot as plot
x = numpy.arange(0, 1, 0.01)
def nonlinear(x, deriv=False):  # sigmoid
    if deriv:
        return x*(1 - x)
    return 1/(1 + numpy.exp(-x))
plot.plot(x, nonlinear(x))
My question is, how is the function nonlinear passed to plot.plot? Is it a lambda? How is nonlinear accepting an array without crashing when it does math ops?
It works fine because the usual arithmetic operations (e.g. / and - as you've used) are defined for numpy arrays; they're just performed element-wise. The same goes for np.exp(). You can see exactly what nonlinear(x) looks like for yourself (it's also a numpy array):
>>> import numpy as np
>>> def nonlinear(x): return 1/(1 + np.exp(-x))
...
>>> nonlinear(np.arange(0, 1, 0.1))
array([ 0.5       ,  0.52497919,  0.549834  ,  0.57444252,  0.59868766,
        0.62245933,  0.64565631,  0.66818777,  0.68997448,  0.7109495 ])
You're just finding the value of the sigmoid evaluated at each point in the specified range, and passing those as the y-values to plot.
Python has special double-underscore methods, e.g. __add__, __sub__, etc. https://docs.python.org/2/reference/datamodel.html has a more comprehensive list.
x + y is just x.__add__(y)
x * y is just x.__mul__(y)
Numpy makes use of these "magic" methods to implement element-wise arithmetic.
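A quick sketch of that equivalence on an array:
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> a + a           # element-wise, dispatched to ndarray.__add__
array([2, 4, 6])
>>> a.__add__(a)    # the same call, spelled out explicitly
array([2, 4, 6])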
The matplotlib plot function needs two lists (or numpy arrays) as arguments for x and y. As arshajii answered, the syntax is valid because the numpy array x is evaluated element-wise in the return statement of the nonlinear function (which is really nice).
However, in case the nonlinear function includes a case-by-case operation, a numpy evaluation is not possible anymore (without some further numpy-magic). For example, look at this continuously differentiable but non-smooth function:
from pylab import *
def nonlinear(x, x0=2):
    return x**2 if x < x0 else 2*x0*(x - x0) + x0**2
x = linspace(0, 5, 100)
y = nonlinear(x)
The last line raises the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Instead use a list comprehension
y = [nonlinear(x_, x0=2.5) for x_ in x]
plot(x, y)
show()
which results in the following figure
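As for the "further numpy-magic" hinted at above, one option (a sketch, not part of the original answer) is np.where, which evaluates both branches element-wise and selects per element:
import numpy as np

def nonlinear_vec(x, x0=2):
    # both branches are computed for the whole array; np.where picks per element
    return np.where(x < x0, x**2, 2*x0*(x - x0) + x0**2)

y = nonlinear_vec(x)  # operates on the array directly, no list comprehension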

apply a function on two ndarrays' corresponding rows at once (without a for loop)

I would like to apply one function to two ndarrays' corresponding elements at once, without using a for loop. Let's say I have the following two ndarrays x and y and a function foo that takes in two 1d-arrays and computes the beta.
The end result I want is to compute beta00 = foo(x[0, 0], y[0]), beta01 = foo(x[0, 1], y[1]), beta10 = foo(x[1, 0], y[0]), beta11 = foo(x[1, 1], y[1]) and yield an expected result of
[[beta00, beta01],
[beta10, beta11]]
I have been looking into the vectorize and apply functions, but still don't have a solution. Could someone help me with this? Many thanks in advance.
import numpy as np

x = np.array([[[0, 1, 2, 3], [0, 1, 2, 3]],
              [[2, 3, 4, 5], [2, 3, 4, 5]]])
y = np.array([[-1, 0.2, 0.9, 2.1], [-1, 0.2, 0.9, 2.1]])

def foo(x, y):
    A = np.vstack([x, np.ones(x.shape)]).T
    # rcond=None avoids the FutureWarning on newer NumPy versions
    return np.linalg.lstsq(A, y, rcond=None)[0][0]
So you want
beta[i,j] = foo(x[i,j,:], y[j,:])
where foo takes two 1d arrays and returns a scalar. The explicit : makes it clear that we are using 3d and 2d arrays.
np.vectorize will not help here because, by default, its function must accept scalars, not arrays. And it is not a speed solution; it is just a nice way of enabling broadcasting and handling inputs with a variety of dimensions.
There are looping wrappers like apply_along_axis (and apply_over_axes), but they are still Python-level loops. The key to any real speedup is reworking foo so that it operates on 2d or 3d arrays, not just 1d ones. But that may be more work than it's worth, or even impossible.
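That said, np.vectorize's signature argument (a sketch, not part of the original answer) does let it accept 1d arrays; it still loops in Python, so it buys convenience, not speed:
foo_vec = np.vectorize(foo, signature='(m),(m)->()')
beta = foo_vec(x, y[None, :])  # broadcasts (2,2,4) with (1,2,4) -> shape (2, 2)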
So for reference, any alternative must match:
beta = np.zeros(x.shape[:2])
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        beta[i, j] = foo(x[i, j, :], y[j, :])
An alternative way of generating the multidimensional indexes is:
for i, j in np.ndindex(x.shape[:2]):
    beta[i, j] = foo(x[i, j, :], y[j, :])
but it's not a time saver.
Look into whether foo can be written to accept a 2d y, as in
foo(x[i,j,:], y[None,j,:])
aiming eventually to be able to do:
beta = foo1(x, y[None,:])
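Here is one sketch of such a reworked foo1, under the assumption that foo computes the slope of a least-squares line fit, which has a closed form that broadcasts over the leading axes:
def foo1(x, y):
    # slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2),
    # reduced over the last axis, broadcast over all leading axes
    xc = x - x.mean(axis=-1, keepdims=True)
    yc = y - y.mean(axis=-1, keepdims=True)
    return (xc * yc).sum(axis=-1) / (xc**2).sum(axis=-1)

beta = foo1(x, y[None, :])  # shape (2, 2), no Python-level loop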

Are the dot product and normal multiplication of 2 numpy arrays the same?

I am working with kernel PCA in Python, and I have to find the values after projecting the original data to the principal components. I use the equation
fv = eigvecs[:,:ncomp]
print(len(fv))
td = fv.T * K.T
where K is the kernel matrix of dimension (150x150) and ncomp is the number of principal components. The code works perfectly fine when fv has dimension (150x150). But when I select ncomp as 3, making fv of dimension (150x3), an error occurs stating that the operands could not be broadcast together. I referred to various links and tried using dot products like
td = np.dot(fv.T, K.T)
I don't get any error now, but I don't know whether the retrieved values are correct or not. Please help.
The * operator depends on the data type. On NumPy arrays it does element-wise multiplication (not matrix multiplication); numpy.vdot() does the "dot" scalar product of two vectors (which returns a simple scalar result):
>>> import numpy as np
>>> x = np.array([[1,2,3]])
>>> np.vdot(x, x)
14
>>> x * x
array([[1, 4, 9]])
To multiply 2 arrays as matrices properly, use numpy.dot:
>>> np.dot(x, x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: objects are not aligned
>>> np.dot(x.T, x)
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
>>> np.dot(x, x.T)
array([[14]])
Then there is numpy.matrix, a specialization of array for which the * means matrix multiplication, and ** means matrix power; so be sure to know what datatype you are operating on.
The upcoming Python 3.5 will have a new operator @ that can be used for matrix multiplication; then you could write x @ x.T to replace the code in the last example.
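For instance (assuming a Python/NumPy version with @ support):
>>> x = np.array([[1, 2, 3]])
>>> x @ x.T          # same as np.dot(x, x.T)
array([[14]])
>>> x.T @ x          # same as np.dot(x.T, x)
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])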
