Applying a function along a numpy array - python

I have the following numpy ndarray:
[ -0.54761371  17.04850603   4.86054302]
I want to apply this function to all elements of the array:
def sigmoid(x):
    return 1 / (1 + math.exp(-x))
probabilities = np.apply_along_axis(sigmoid, -1, scores)
This is the error that I get:
TypeError: only length-1 arrays can be converted to Python scalars
What am I doing wrong?

The function numpy.apply_along_axis is not suited to this purpose.
Try numpy.vectorize to vectorize your function: https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
It defines a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays as output.
import numpy as np
import math

# custom function
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# define vectorized sigmoid
sigmoid_v = np.vectorize(sigmoid)

# test
scores = np.array([-0.54761371, 17.04850603, 4.86054302])
print(sigmoid_v(scores))
Output: [ 0.36641822  0.99999996  0.99231327]
A performance test shows that scipy.special.expit is the fastest way to compute the logistic function, while the vectorized variant is the slowest:
import numpy as np
import math
import timeit

def sigmoid_(x):
    return 1 / (1 + math.exp(-x))

sigmoidv = np.vectorize(sigmoid_)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(100)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(100)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(100)", number=25))
print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(1000)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(1000)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(1000)", number=25))
print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(10000)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(10000)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(10000)", number=25))
Results:
size      vectorized        numpy              expit
N=100:    0.00179314613342  0.000460863113403  0.000132083892822
N=1000:   0.0122890472412   0.00084114074707   0.000464916229248
N=10000:  0.109477043152    0.00530695915222   0.00424313545227

Use np.exp and it will work on numpy arrays in a vectorized fashion:
>>> def sigmoid(x):
...     return 1 / (1 + np.exp(-x))
...
>>> sigmoid(scores)
array([ 0.36641822,  0.99999996,  0.99231327])
>>>
You will likely not get any faster than this. Consider:
>>> def sigmoid(x):
...     return 1 / (1 + np.exp(-x))
...
And:
>>> def sigmoidv(x):
...     return 1 / (1 + math.exp(-x))
...
>>> vsigmoid = np.vectorize(sigmoidv)
Now compare the timings. First with a small (size 100) array:
>>> t = timeit.timeit("vsigmoid(arr)", "from __main__ import vsigmoid, np; arr = np.random.randn(100)", number=100)
>>> t
0.006894525984534994
>>> t = timeit.timeit("sigmoid(arr)", "from __main__ import sigmoid, np; arr = np.random.randn(100)", number=100)
>>> t
0.0007238480029627681
So there is an order-of-magnitude difference even with small arrays, and the gap only widens as the arrays grow. With a size 10,000 array:
>>> t = timeit.timeit("vsigmoid(arr)", "from __main__ import vsigmoid, np; arr = np.random.randn(10000)", number=100)
>>> t
0.3823414359940216
>>> t = timeit.timeit("sigmoid(arr)", "from __main__ import sigmoid, np; arr = np.random.randn(10000)", number=100)
>>> t
0.011259705002885312
And finally with a size 100,000 array:
>>> t = timeit.timeit("vsigmoid(arr)", "from __main__ import vsigmoid, np; arr = np.random.randn(100000)", number=100)
>>> t
3.7680041620042175
>>> t = timeit.timeit("sigmoid(arr)", "from __main__ import sigmoid, np; arr = np.random.randn(100000)", number=100)
>>> t
0.09544878199812956

Just to clarify what apply_along_axis is doing, or not doing.
def sigmoid(x):
    print(x)  # show the argument
    return 1 / (1 + math.exp(-x))
In [313]: np.apply_along_axis(sigmoid, -1,np.array([ -0.54761371 ,17.04850603 ,4.86054302]))
[ -0.54761371 17.04850603 4.86054302] # the whole array
...
TypeError: only length-1 arrays can be converted to Python scalars
The reason you get the error is that apply_along_axis passes a whole 1-d array, i.e. one slice along the axis, to your function. For your 1-d array this is the same as
sigmoid(np.array([ -0.54761371, 17.04850603, 4.86054302]))
So apply_along_axis does nothing for you here.
As others noted, switching to np.exp allows sigmoid to work with the array (with or without the apply_along_axis wrapper).
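For contrast, here is a minimal sketch (my own illustration, not from the answers above) of a case where apply_along_axis does something useful: a function that consumes a whole 1-d slice, applied row by row to a 2-d array:
import numpy as np

def softmax(row):
    # row is an entire 1-d slice handed over by apply_along_axis
    e = np.exp(row - row.max())
    return e / e.sum()

scores_2d = np.array([[-0.5, 1.0, 2.0],
                      [ 3.0, 0.0, -1.0]])
print(np.apply_along_axis(softmax, -1, scores_2d))  # one softmax per row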

scipy already implements this function as scipy.special.expit.
Luckily, Python allows us to rename things upon import:
from scipy.special import expit as sigmoid
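A short usage sketch with the scores array from the question; the expected output matches the vectorized result shown earlier:
import numpy as np
from scipy.special import expit as sigmoid

scores = np.array([-0.54761371, 17.04850603, 4.86054302])
print(sigmoid(scores))  # [0.36641822 0.99999996 0.99231327]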

Related

How to use math function in Python

How to execute this code:
import numpy as np
import math
x = np.arange(1,9, 0.5)
k = math.cos(x)
print(x)
I got an error like this:
TypeError: only size-1 arrays can be converted to Python scalars
Thank you in advance.
This happens because math.cos doesn't accept numpy arrays larger than size 1; with a size-1 array your approach would still work.
A simpler way to achieve the result is to use np.cos(x) directly:
import numpy as np
x = np.arange(1,9, 0.5)
k = np.cos(x)
print(x)
print(k)
If you have to use the math module, you can try iterating through the array and applying math.cos to each member of the array:
import numpy as np
import math
x = np.arange(1,9,0.5)
for item in x:
    k = math.cos(item)
    print(k)  # or add to a new array/list
Are you looking for something like this?
import numpy as np
import math
x = np.arange(1,9, 0.5)
for ang in x:
    k = math.cos(ang)
    print(k)
You are trying to pass an ndarray (returned by arange) to a function that expects a single real number. Use np.cos instead.
If you want pure-Python:
You can use a function from the math module with map, like below:
import math
x = range(1,9)
print(list(map(math.cos, x)))
Output:
[0.5403023058681398, -0.4161468365471424, -0.9899924966004454, -0.6536436208636119, 0.2836621854632263, 0.9601702866503661, 0.7539022543433046, -0.14550003380861354]
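For reference, an equivalent list comprehension (my own sketch; it keeps the 0.5 step from the original question and assumes numpy is available for arange):
import math
import numpy as np

x = np.arange(1, 9, 0.5)
k = [math.cos(v) for v in x]  # plain Python list of cosines
print(k)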

TabPy - No return value

I am working in TabPy inside Tableau and want to perform normal statistical calculations.
I am stuck on the Cp calculation. Here is the code that I wrote:
SCRIPT_REAL("
import pandas as pd
import numpy as np
from scipy import stats
# Calculate Cp
def Cp(list,_arg2,_arg3):
arr = np.array(list)
arr = arr.ravel()
sigma = np.std(arr)
Cp = float(_arg2 - _arg3) / (6*sigma)
return Cp
",FLOAT([USL - Param]), FLOAT([LSL - Param]))
The error that I am getting is:
No Return Value
although I am clearly returning Cp. What could be the issue?
Please help.
Something like the below would solve some of the issues you're seeing.
I haven't checked the validity of your Cp function, and whether this would work with lists or single values.
SCRIPT_REAL("
import pandas as pd
import numpy as np
from scipy import stats
# Define Cp
def Cp(argu_1,argu_2):
arr = np.array(list)
arr = arr.ravel()
sigma = np.std(arr)
Cp_value = float(argu_1 - argu_2) / (6*sigma)
return Cp_value
# Call function with variables from Tableau, and return the Cp_value
return Cp(<Argument 1>, <Argument 2>)
",FLOAT([USL - Param]), FLOAT([LSL - Param]))

using pandas dataframe to set indices in numpy array

I have a pandas dataframe with indices to a numpy array. The value of the array has to be set to 1 for those indices. I need to do this millions of times on a big numpy array. Is there a more efficient way than the approach shown below?
from numpy import float32, uint
from numpy.random import choice
from pandas import DataFrame
from timeit import timeit
xy = 2000,300000
sz = 10000000
ind = DataFrame({"i":choice(range(xy[0]),sz),"j":choice(range(xy[1]),sz)}).drop_duplicates()
dtype = uint
repeats = 10
#original (~21s)
stmt = '''\
from numpy import zeros
a = zeros(xy, dtype=dtype)
a[ind.values[:,0],ind.values[:,1]] = 1'''
print(timeit(stmt, "from __main__ import xy,sz,ind,dtype", number=repeats))
#suggested by #piRSquared (~13s)
stmt = '''\
from numpy import ones
from scipy.sparse import coo_matrix
i,j = ind.i.values,ind.j.values
a = coo_matrix((ones(i.size, dtype=dtype), (i, j)), dtype=dtype).toarray()
'''
print(timeit(stmt, "from __main__ import xy,sz,ind,dtype", number=repeats))
I have edited the post above to show the approach suggested by @piRSquared and rewrote it to allow an apples-to-apples comparison. Irrespective of the data type (I tried uint and float32), the suggested approach gives a roughly 40% reduction in time.
OP time: 56.56 s
I can only marginally improve on this with
i, j = ind.i.values, ind.j.values
a[i, j] = 1
New time: 52.19 s
However, you can considerably speed this up by using scipy.sparse.coo_matrix to instantiate a sparse matrix and then convert it to a numpy.array.
import timeit
stmt = '''\
import numpy, pandas
from scipy.sparse import coo_matrix
xy = 2000,300000
sz = 10000000
ind = pandas.DataFrame({"i":numpy.random.choice(range(xy[0]),sz),"j":numpy.random.choice(range(xy[1]),sz)}).drop_duplicates()
################################################
i, j = ind.i.values, ind.j.values
dtype = numpy.uint8
a = coo_matrix((numpy.ones(i.size, dtype=dtype), (i, j)), dtype=dtype).toarray()'''
timeit.timeit(stmt, number=10)
33.06471237000369

NLopt minimize eigenvalue, Python

I have matrices whose elements are defined as arithmetic expressions, and I have written Python code to optimise the parameters in these expressions in order to minimize particular eigenvalues of the matrix. I have used scipy to do this, but was wondering whether it is possible with NLopt, as I would like to try a few more of the algorithms it offers (derivative-free variants).
In scipy I would do something like this:
import numpy as np
from scipy.linalg import eig
from scipy.optimize import minimize

def my_func(x):
    y, w = x
    arr = np.array([[y+w,-2],[-2,w-2*(w+y)]])
    ev, ew = eig(arr)
    return ev[0]

x0 = np.array([10, 3.45])  # Initial guess
minimize(my_func, x0)
In NLopt I have tried this:
import numpy as np
from scipy.linalg import eig
import nlopt

def my_func(x, grad):
    arr = np.array([[x[0]+x[1],-2],[-2,x[1]-2*(x[1]+x[0])]])
    ev, ew = eig(arr)
    return ev[0]

opt = nlopt.opt(nlopt.LN_BOBYQA, 2)
opt.set_lower_bounds([1.0,1.0])
opt.set_min_objective(my_func)
opt.set_xtol_rel(1e-7)
x = opt.optimize([10.0, 3.5])
minf = opt.last_optimum_value()
print("optimum at ", x[0], x[1])
print("minimum value = ", minf)
print("result code = ", opt.last_optimize_result())
This returns:
ValueError: nlopt invalid argument
Is NLopt able to process this problem?
my_func should return a double; the posted sample returns a complex number:
>>> print(type(ev[0]))
<class 'numpy.complex128'>
>>> ev[0]
(13.607794065928395+0j)
Correct version of my_func:
def my_func(x, grad):
    arr = np.array([[x[0]+x[1],-2],[-2,x[1]-2*(x[1]+x[0])]])
    ev, ew = eig(arr)
    return ev[0].real
The updated sample returns:
optimum at [ 1. 1.]
minimum value = 2.7015621187164243
result code = 4
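Since the example matrix is symmetric, a further sketch (my own suggestion, not part of the original answer) is to use scipy.linalg.eigvalsh, which returns real eigenvalues in ascending order, so neither the .real cast nor manual ordering is needed:
import numpy as np
from scipy.linalg import eigvalsh

def my_func(x, grad):
    arr = np.array([[x[0] + x[1], -2],
                    [-2, x[1] - 2*(x[1] + x[0])]])
    return eigvalsh(arr)[0]  # smallest eigenvalue, already a real float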

what is this error: 'matrix' object has no attribute 'diff'

When I am trying to run this program it gives an AttributeError.
I am new to Python, so please forgive me if I missed anything. Thanks.
import math
import numpy as np
from sympy import *
from sympy import diff
import sympy as sp

p = np.matrix([[0],[0],[0],[1]])
pdash = p
zi = Matrix(2, 1, lambda i,j: Symbol('z%d' % (i+1)))
xi = Matrix(2, 1, lambda i,j: Symbol('x%d' % (i+1)))
alphai = Matrix(2, 1, lambda i,j: Symbol('a%d' % (i+1)))
thetai = Matrix(2, 1, lambda i,j: Symbol('t%d' % (i+1)))
transformed = np.matrix([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])

def transformation_fn(zi, xi, thetai, alphai):
    ca = cos(alphai)
    sa = sin(alphai)
    ct = cos(thetai)
    st = sin(thetai)
    transformation = np.matrix([[ct,-st*ca,st*sa,xi*ct],
                                [st,ct*ca,-ct*sa,xi*st],
                                [0,sa,ca,zi],
                                [0,0,0,1]])
    return transformation

for z,x,t,a in zip(zi,xi,thetai,alphai):
    transformed = transformed*transformation_fn(z,x,t,a)

e = transformed*p
jacobian = e.diff(t1)
print(jacobian)
I also tried a sample code to check whether diff() works or not; it worked in this case:
import math
import numpy as np
from sympy import *
from sympy import diff
import sympy as sp

x, y, e1 = symbols('x y e1')
e = Matrix(2, 1, lambda i,j: Symbol('e%d' % (i+1)))
I = np.matrix([[1, 0],
               [0, 1]])
k = I*e
print(k.diff(e1))
As was said in the comments, diff cannot be applied to the matrix object you have here: e is a NumPy matrix holding SymPy expressions, and NumPy matrices have no diff method. Apply diff to each entry separately. Example:
t1 = Symbol('t1')
jacobian = Matrix(*e.shape, lambda i,j: e[i,j].diff(t1))
The second line constructs a matrix of the same shape as e, in which the entries are the derivatives of the entries of e with respect to t1.
(You never actually defined t1 in the code, which made the first line here necessary.)
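An alternative sketch (my own suggestion, assuming the entries of e are SymPy expressions): wrap e in a SymPy Matrix first, since SymPy matrices do support elementwise diff:
e_sym = Matrix(e)          # convert the numpy matrix of symbolic entries to a SymPy Matrix
jacobian = e_sym.diff(t1)  # differentiates every entry with respect to t1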
