average computation on Python - python

I have this function, which computes the average of the y values that share the same x, but it doesn't work when the x values only agree to within a tolerance (x +/- eps).
import numpy as np
import matplotlib.pyplot as plt
from uncertainties import ufloat
from uncertainties.umath import *
x = np.array([0, 0,1,1,2,2,2], float)
y = np.array([1, 2, 3,5,4, 4, 6.8], float)
def avg_group(x, y):
    A, ind, counts = np.unique(x, return_index=True, return_counts=True)
    B = y[ind]
    for dup in A[counts>1]:
        B[(A==dup)] = np.average(y[(x==dup)] )
    return A, B
new_x, new_y = avg_group(x, y)
plt.plot(new_x,new_y,'o')
plt.show()
How can I add a condition to avg_group to get the average of y for all x within x +/- eps?

I don't think there is a built-in function for this. Even if there were, the question is slightly ambiguous. Consider a sequence like this:
x = x1, x1+eps, x1+2eps, ...
x1 is close to x1+eps (i.e. within eps) but not close to x1+2eps, while x1+eps is close to both x1 and x1+2eps. So which x's do we treat as the "same"? x1 and x1+eps can be treated as the same, but so can x1+eps and x1+2eps. For the sake of further discussion, I will assume that we end up with two different x values in this case: x1 and x1+2eps.
Assuming the above, we can iterate over a sorted copy of x and, for each value, check whether it is within eps of the previous group. If it is, we add it to that group; otherwise we create a new entry.
I extended your code to implement the above.
import numpy as np
import matplotlib.pyplot as plt
eps = 1e-5
x = np.array([0, 0,1,1,2,2,2], float)
y = np.array([1, 2, 3,5,4, 4, 6.8], float)
def avg_group(x, y):
    # sort by x-values
    sorted_indices = np.argsort(x)
    x = x[sorted_indices]
    y = y[sorted_indices]
    # we maintain a list of tuples of (x, sum of y, count of y)
    sum_and_count = [(x[0], y[0], 1)]
    for current_x, current_y in zip(x[1:], y[1:]):
        prev_tuple = sum_and_count[-1]
        # compare against the x that started the current group, so that a chain
        # x1, x1+eps, x1+2eps is split into two groups (x1 and x1+2eps)
        if current_x - prev_tuple[0] <= eps:
            # This entry belongs to the previous tuple
            sum_and_count[-1] = (prev_tuple[0], prev_tuple[1] + current_y, prev_tuple[2] + 1)
        else:
            # insert a new tuple
            sum_and_count.append((current_x, current_y, 1))
    x, sum_y, count_y = zip(*sum_and_count)
    return np.asarray(x), np.asarray(sum_y) / np.asarray(count_y)
new_x, new_y = avg_group(x, y)
plt.plot(new_x,new_y,'o')
plt.show()
A colab notebook (with the code) is linked here.
Let me know if this helps or if you have any followup questions :)

I'd suggest using a pandas.DataFrame instead of writing your own function. Calling DataFrame.groupby(...).mean() on a new column that labels groups of close values gives the expected result.
import numpy as np, pandas as pd
df = pd.DataFrame({'x':np.r_[0.0, 0.05, 0.93, 1, 2.1, 1.95, 2 ], #added small values
                   'y':np.r_[1, 2, 3, 5, 4, 4, 6.8]})
dist = .3
df['close_ind'] = df['x'].sort_values().diff().gt(dist).cumsum()
x_new = df.groupby('close_ind')['x'].mean().tolist()
y_new = df.groupby('close_ind')['y'].mean().tolist()
# x_new: [0.025, 0.965, 2.017] (rounded)
# y_new: [1.5, 4.0, 4.933] (rounded)
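For comparison, the same "start a new group when the gap to the previous sorted x exceeds a tolerance" idea can be sketched in plain NumPy. This is only an illustration, not part of either answer: the name avg_group_eps is made up here, and note that this rule chains neighbours (x1, x1+eps, x1+2eps all end up in one group).

```python
import numpy as np

def avg_group_eps(x, y, eps):
    """Average x and y over groups of sorted x values whose
    neighbour-to-neighbour gap is at most eps (hypothetical helper)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    # A gap larger than eps starts a new group; cumsum turns the flags into labels 0, 1, 2, ...
    labels = np.concatenate(([0], np.cumsum(np.diff(xs) > eps)))
    counts = np.bincount(labels)
    x_mean = np.bincount(labels, weights=xs) / counts
    y_mean = np.bincount(labels, weights=ys) / counts
    return x_mean, y_mean

x = np.array([0.0, 0.05, 0.93, 1, 2.1, 1.95, 2])
y = np.array([1, 2, 3, 5, 4, 4, 6.8])
new_x, new_y = avg_group_eps(x, y, eps=0.3)
print(new_x, new_y)
```

np.bincount with weights sums the values per label, so dividing by the counts gives the group means without an explicit Python loop.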

Related

Outputting results of loop of a loop

I am iteratively solving this implicit equation
using fsolve within a for loop over a range of values of the independent variable, V.
I also want to vary I_L, run the for loop for each value, and generate an individual text file for each.
I know how to open and write text files; what I'm struggling with is setting up the loops correctly to output what I want.
I have coded a simpler example below to allow for ease of understanding since it's just the loops I'm stuck on.
import numpy as np
from scipy.optimize import fsolve
import scipy.constants as sc
x = np.linspace(-1, 1, 1001)
C_vary = [0, 1, 2, 3]
def equation(y, x, C):
    return C - np.exp(x+y) - y
for C in C_vary:
    y = []
    # Solve equation at each value of C_vary and output y values to a new list of
    # results
I have introduced an initial guess for the function y(x), but you can check the details further. The output y is a list, with each element corresponding to a value of the C_vary parameters, for each x.
import numpy as np
from scipy.optimize import fsolve
import scipy.constants as sc
x = np.linspace(-1, 1, 1001)
C_vary = [0, 1, 2, 3]
def equation(y, x, C):
    return C - np.exp(x+y) - y
y0 = np.exp( 0.5*x ) #initial guess
y = [ fsolve( equation, y0, (x,ci) ) for ci in C_vary ]
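Since the question also asks for one text file per parameter value, here is a minimal sketch of how the loop could write each solution to its own file. The output directory, the file names and the reduced number of x points are assumptions made for this illustration only.

```python
import os
import tempfile

import numpy as np
from scipy.optimize import fsolve

def equation(y, x, C):
    return C - np.exp(x + y) - y

x = np.linspace(-1, 1, 11)      # fewer points than in the post, to keep the sketch quick
C_vary = [0, 1, 2, 3]
out_dir = tempfile.mkdtemp()    # assumed output location; use any directory you like

paths = []
for C in C_vary:
    y0 = np.exp(0.5 * x)                        # initial guess, as in the answer above
    y = fsolve(equation, y0, args=(x, C))       # solve for all x at this value of C
    path = os.path.join(out_dir, f"solution_C_{C}.txt")   # hypothetical file name
    # Two columns per file: x and the corresponding y
    np.savetxt(path, np.column_stack([x, y]), header="x y")
    paths.append(path)

print(paths)
```

np.loadtxt(path) reads such a file back as an array with one row per x value; the header line is skipped because np.savetxt prefixes it with "#".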
If you are after I as a function of V and other parameters, you can solve the equation by means of the Lambert W function.
It has the form
z = e^(a z + b)
where z is linear in I, and this is
- a z e^(- a z) = - a e^b
or
z = - W(-a e^b) / a.
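As an illustration of the Lambert W route, the simplified example equation from the question, C - exp(x+y) - y = 0, can be put into the form above by substituting z = C - y, which gives z = e^(-z + x + C), i.e. a = -1 and b = x + C. The sketch below uses scipy.special.lambertw; the function name solve_with_lambertw is made up for this example.

```python
import numpy as np
from scipy.special import lambertw

def equation(y, x, C):
    return C - np.exp(x + y) - y

def solve_with_lambertw(x, C):
    # With z = C - y: z = exp(a z + b) where a = -1 and b = x + C,
    # so z = -W(-a e^b) / a = W(exp(x + C)).
    z = lambertw(np.exp(x + C)).real   # principal branch; the argument is positive, so the result is real
    return C - z

x = np.linspace(-1, 1, 1001)
y = solve_with_lambertw(x, 2.0)
residual = np.max(np.abs(equation(y, x, 2.0)))
print(residual)
```

The residual of the original equation should be at the level of floating-point rounding, and no initial guess or iteration is needed.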

Can someone explain these lines: X1, y1 = np.c_[np.random.normal(loc=new_center[0],

I want to create a dataset: first draw from a Gaussian distribution with make_blobs, which gives me 300 rows with 2 columns each (X, y), and then use the maximum of X as a new center. After that I'm lost; I don't understand what the following lines do.
So I need these lines explained:
X1, y1 = np.c_[np.random.normal(loc=new_center[0], size=size),
               np.random.normal(loc=new_center[1], size=size)], np.ones(size)
X, y = np.r_[X, X1], np.r_[y, y1].astype(int)
then:
def plot_dataset_with_class(x, y):
    uniques = np.unique(y)
    [plt.plot(x[:, 0][y == unique], x[:, 1][y == unique], '.') for unique in uniques]
Can someone please explain? I'm lost!
the complete code is this:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
RANDOM_SEED = 42  # assumption: not defined in the original snippet; any integer works
"""Create Dataset."""
X, y = make_blobs(300, centers=2, cluster_std=2.3, random_state=RANDOM_SEED)
new_center = max(X, key=lambda x: x[1])
size = 100
X1, y1 = np.c_[np.random.normal(loc=new_center[0], size=size),
               np.random.normal(loc=new_center[1], size=size)], np.ones(size)
X, y = np.r_[X, X1], np.r_[y, y1].astype(int)
## Plot dataset method
def plot_dataset(x):
    plt.plot(x[:, 0], x[:, 1], '.')
def plot_dataset_with_class(x, y):
    uniques = np.unique(y)
    [plt.plot(x[:, 0][y == unique], x[:, 1][y == unique], '.') for unique in uniques]
plt.figure()
plot_dataset(X)
plt.show()
Both assignments use tuple unpacking. They can be broken down further with little extra work:
X1 = np.c_[np.random.normal(loc=new_center[0], size=size),
           np.random.normal(loc=new_center[1], size=size)]
y1 = np.ones(size)
X = np.r_[X, X1]
y = np.r_[y, y1].astype(int)
The assignment to X1 stacks the two 1-D arrays in the brackets as columns (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.c_.html#numpy.c_).
The assignment to y1 is an array of ones of length size.
The assignment to X is the already existing variable X concatenated with X1 along the first axis (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html#numpy.r_).
The assignment to y is the concatenation of the existing variable y with y1, cast element-wise to integers.
The next bit you ask about, [plt.plot(x[:, 0][y == unique], x[:, 1][y == unique], '.') for unique in uniques], plots each class in a different color. It uses a list comprehension to select the points of each class separately (iterating over the unique values of y and, in each call to plot(), selecting only the points whose label equals the current value). The colors differ because, by default, repeated calls to plot() on the same axes cycle through different colors.
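To make the effect of np.c_ and np.r_ concrete, here is a small sketch on toy arrays (the array names and values are made up for illustration and are not from the question):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# np.c_ places 1-D arrays side by side as columns
cols = np.c_[a, b]
print(cols.shape)        # (3, 2)

# np.r_ concatenates along the first axis (rows for 2-D arrays)
X = np.zeros((2, 2))
stacked = np.r_[X, cols]
print(stacked.shape)     # (5, 2)

# For 1-D label arrays np.r_ simply appends; astype(int) converts the float ones to integers
y = np.array([0, 1])
labels = np.r_[y, np.ones(3)].astype(int)
print(labels)            # [0 1 1 1 1]
```

In the question's code the same pattern appends 100 new points (X1) with class label 1 (y1) to the 300 points produced by make_blobs.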

Implementation of a threshold detection function in Python

I want to implement following trigger function in Python:
Input:
time vector t [n dimensional numpy vector]
data vector y [n dimensional numpy vector] (values correspond to t vector)
threshold tr [float]
Threshold type vector tr_type [m dimensional list of int values]
Output:
Threshold time vector tr_time [m dimensional list of float values]
Function:
I would like to return tr_time, which consists of the exact time values (preferably interpolated, which the code below does not yet do) at which y crosses tr (crossing means going from less than to greater than, or the other way around). The values in tr_time correspond to the tr_type vector: each element of tr_type gives the number of the crossing and whether it is an upgoing or a downgoing crossing. For example, 1 means the first time y goes from less than tr to greater than tr, and -3 means the third time y goes from greater than tr to less than tr (counting along the time vector t).
For the moment I have the following code:
import numpy as np
import matplotlib.pyplot as plt
def trigger(t, y, tr, tr_type):
    triggermarker = np.diff(1 * (y > tr))
    positiveindices = [i for i, x in enumerate(triggermarker) if x == 1]
    negativeindices = [i for i, x in enumerate(triggermarker) if x == -1]
    triggertime = []
    for i in tr_type:
        if i >= 0:
            triggertime.append(t[positiveindices[i - 1]])
        elif i < 0:
            # -i is the crossing number, so the list index is -i - 1
            triggertime.append(t[negativeindices[-i - 1]])
    return triggertime
t = np.linspace(0, 20, 1000)
y = np.sin(t)
tr = 0.5
tr_type = [1, 2, -2]
print(trigger(t, y, tr, tr_type))
plt.plot(t, y)
plt.grid()
Now I'm pretty new to Python so I was wondering if there is a more Pythonic and more efficient way to implement this. For example without for loops or without the need to write separate code for upgoing or downgoing crossings.
You can use two masks: the first separates the values below and above the threshold, and the second applies np.diff to the first. If samples i and i+1 are both below or both above the threshold, np.diff yields 0:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 8 * np.pi, 400)
y = np.sin(t)
th = 0.5
mask = np.diff(1 * (y > th)) != 0
plt.plot(t, y, 'bx', markersize=3)
plt.plot(t[:-1][mask], y[:-1][mask], 'go', markersize=8)
Using the slice [:-1] yields the index "immediately before" crossing the threshold (you can see that in the chart). If you want the index "immediately after", use [1:] instead of [:-1].
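Building on the np.diff mask, the following is a rough sketch of how the interpolated crossing times and the tr_type selection from the question might be combined. The helper name crossing_times and the use of linear interpolation between the two neighbouring samples are assumptions of this example, not something stated in the answer.

```python
import numpy as np

def crossing_times(t, y, tr):
    """Linearly interpolated crossing times and their direction (+1 up, -1 down)."""
    step = np.diff((y > tr).astype(int))   # +1 upgoing, -1 downgoing, 0 no crossing
    idx = np.nonzero(step)[0]              # index of the sample just before each crossing
    # Fraction of the way from sample idx to idx+1 at which y equals tr
    frac = (tr - y[idx]) / (y[idx + 1] - y[idx])
    times = t[idx] + frac * (t[idx + 1] - t[idx])
    return times, step[idx]

def trigger(t, y, tr, tr_type):
    times, direction = crossing_times(t, y, tr)
    up, down = times[direction > 0], times[direction < 0]
    # k > 0: k-th upgoing crossing; k < 0: |k|-th downgoing crossing
    return [up[k - 1] if k > 0 else down[-k - 1] for k in tr_type]

t = np.linspace(0, 20, 1000)
y = np.sin(t)
result = trigger(t, y, 0.5, [1, 2, -2])
print(result)
```

For sin(t) and a threshold of 0.5 the exact crossings are at pi/6 + 2*pi*n (upgoing) and 5*pi/6 + 2*pi*n (downgoing), which gives a quick sanity check for the interpolation.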

interpolation based on one array values

I have two arrays with values:
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
I want to evaluate, for example, the function:
def func(a, b):
    return a*0.8 * (b/2)
So I want to fill in the missing y values.
I am using:
import numpy as np
from scipy import interpolate
def func(a, b):
    return a*0.8 * (b/2)
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
X, Y = np.meshgrid(x, y)
Z = func(X, Y)
f = interpolate.interp2d(x, y, Z, kind='cubic')
Now, I am not sure how to continue from here. If I try:
xnew = np.linspace(0,150,10)
ynew = np.linspace(0,150,10)
Znew = f(xnew, ynew)
Znew is filled with nan values.
I also want to do the opposite.
If x is shorter than y, I still want to interpolate based on the x values.
So, for example:
x = np.array([1,3,4])
y = np.array([1,2,3,4,5,6,7])
I want to remove values from y now.
How can I proceed with this?
To interpolate a 1-D array you can use np.interp as follows:
np.interp(np.linspace(0,1,len(x)), np.linspace(0,1,len(y)),y)
You can have a look at the documentation for full details, but in short:
Consider the values of your array y to sit at reference positions from 0 to 1 (for example, [5,2,6,3,9] has positions [0,0.25,0.5,0.75,1]).
The second and third arguments of the function are those positions and the vector y.
The first argument is the positions at which the interpolated values of y are wanted.
As an example:
>>> y = [0,5]
>>> indexes = [0,1]
>>> new_indexes = [0,0.5,1]
>>> np.interp(new_indexes, indexes, y)
array([0. , 2.5, 5. ])
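Applied to the arrays from the question, a small sketch of this resampling could look as follows. The helper name resample_to_length is made up for illustration; it covers both the "fill in" case (y shorter than x) and the "remove values" case (y longer than x).

```python
import numpy as np

def resample_to_length(y, n):
    """Linearly resample y onto n evenly spaced positions between 0 and 1."""
    old_pos = np.linspace(0, 1, len(y))
    new_pos = np.linspace(0, 1, n)
    return np.interp(new_pos, old_pos, y)

# y shorter than x: stretch y to the length of x
x = np.array([100, 123, 123, 118, 123])
y = np.array([12, 1, 14, 13])
y_long = resample_to_length(y, len(x))
print(y_long)    # [12.    3.75  7.5  13.75 13.  ]

# y longer than x: shrink y to the length of x
x2 = np.array([1, 3, 4])
y2 = np.array([1, 2, 3, 4, 5, 6, 7])
y_short = resample_to_length(y2, len(x2))
print(y_short)   # [1. 4. 7.]
```

Note that this only uses the length of x, not its values; the end points of y are always preserved.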

numpy - evaluate function on a grid of points

What is a good way to produce a numpy array containing the values of a function evaluated on an n-dimensional grid of points?
For example, suppose I want to evaluate the function defined by
def func(x, y):
    return <some function of x and y>
Suppose I want to evaluate it on a two dimensional array of points with the x values going from 0 to 4 in ten steps, and the y values going from -1 to 1 in twenty steps. What's a good way to do this in numpy?
P.S. This has been asked in various forms on StackOverflow many times, but I couldn't find a concisely stated question and answer. I posted this to provide a concise simple solution (below).
A shorter, faster and clearer answer that avoids meshgrid:
import numpy as np
def func(x, y):
    return np.sin(y * x)
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
result = func(xaxis[:,None], yaxis[None,:])
This is faster and uses less memory for a function such as x^2+y, because x^2 is then computed on a 1D array (instead of a 2D one) and the dimension only increases when the "+" is applied. With meshgrid, x^2 is computed on a 2D array in which essentially every row is the same, which wastes a lot of time.
Edit: "x[:,None]" turns x into a 2D array whose second dimension is "empty" (has length 1). The "None" is the same as writing "x[:,numpy.newaxis]". The same is done with y, but with an empty first dimension.
Edit: in 3 dimensions:
def func2(x, y, z):
    return np.sin(y * x)+z
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
zaxis = np.linspace(0, 1, 20)
result2 = func2(xaxis[:,None,None], yaxis[None,:,None],zaxis[None,None,:])
This way you can easily extend to n dimensions, using as many None or : as you have dimensions. Each : keeps a full dimension and each None adds an "empty" (length-1) dimension. The next example shows how these empty dimensions work: the shape changes when you use None, so the result is a 3D object, but the empty dimensions only get filled when you combine it with an object that actually has entries along those dimensions.
In [1]: import numpy
In [2]: a = numpy.linspace(-1,1,20)
In [3]: a.shape
Out[3]: (20,)
In [4]: a[None,:,None].shape
Out[4]: (1, 20, 1)
In [5]: b = a[None,:,None] # this is a 3D array, but with the first and third dimension being "empty"
In [6]: c = a[:,None,None] # same, but last two dimensions are "empty" here
In [7]: d=b*c
In [8]: d.shape # only the last dimension is "empty" here
Out[8]: (20, 20, 1)
Edit: without needing to type the None yourself:
def ndm(*args):
    return [x[(None,)*i+(slice(None),)+(None,)*(len(args)-i-1)] for i, x in enumerate(args)]
x2,y2,z2 = ndm(xaxis,yaxis,zaxis)
result3 = func2(x2,y2,z2)
This builds the None-slicing that creates the extra empty dimensions: the first argument you give to ndm becomes the first full dimension, the second argument the second full dimension, and so on. It does the same as the hard-coded None syntax used before.
Short explanation: doing x2, y2, z2 = ndm(xaxis, yaxis, zaxis) is the same as doing
x2 = xaxis[:,None,None]
y2 = yaxis[None,:,None]
z2 = zaxis[None,None,:]
but the ndm method also works for more dimensions, without hard-coding the None-slices on multiple lines as just shown. It also works in numpy versions before 1.8, whereas numpy.meshgrid only supports more than 2 dimensions in numpy 1.8 or higher.
import numpy as np
def func(x, y):
    return np.sin(y * x)
xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)
x, y = np.meshgrid(xaxis, yaxis)
result = func(x, y)
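One detail worth checking when switching between the two approaches is the axis order of the result. The sketch below (variable names chosen for this illustration) compares the broadcasting version with the default meshgrid version:

```python
import numpy as np

def func(x, y):
    return np.sin(y * x)

xaxis = np.linspace(0, 4, 10)
yaxis = np.linspace(-1, 1, 20)

# Broadcasting: first axis follows x, second axis follows y
by_broadcast = func(xaxis[:, None], yaxis[None, :])
print(by_broadcast.shape)   # (10, 20)

# meshgrid with its default indexing='xy': first axis follows y, second axis follows x
X, Y = np.meshgrid(xaxis, yaxis)
by_meshgrid = func(X, Y)
print(by_meshgrid.shape)    # (20, 10)

# Same values, transposed layout
print(np.allclose(by_broadcast, by_meshgrid.T))   # True
```

Passing indexing='ij' to np.meshgrid gives the same layout as the broadcasting version.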
I use this function to get X, Y, Z values ready for plotting:
def npmap2d(fun, xs, ys, doPrint=False):
    Z = np.empty(len(xs) * len(ys))
    i = 0
    for y in ys:
        for x in xs:
            Z[i] = fun(x, y)
            if doPrint: print([i, x, y, Z[i]])
            i += 1
    X, Y = np.meshgrid(xs, ys)
    Z.shape = X.shape
    return X, Y, Z
Usage:
def f(x, y):
    # ...some function that can't handle numpy arrays
X, Y, Z = npmap2d(f, np.linspace(0, 0.5, 21), np.linspace(0.6, 0.4, 41))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
The same result can be achieved using map:
xs = np.linspace(0, 4, 10)
ys = np.linspace(-1, 1, 20)
X, Y = np.meshgrid(xs, ys)
Z = np.fromiter(map(f, X.ravel(), Y.ravel()), X.dtype).reshape(X.shape)
In case your function actually takes a tuple of d elements, i.e. f((x1,x2,x3,...xd)) (for example the scipy.stats.multivariate_normal function), and you want to evaluate f on the N^d grid points formed by N values of each variable, you could also do the following (2D case):
x=np.arange(-1,1,0.2) # each variable is instantiated N=10 times
y=np.arange(-1,1,0.2)
Z=f(np.dstack(np.meshgrid(x,y))) # result is an NxN (10x10) matrix, whose entries are f((xi,yj))
Here np.dstack(np.meshgrid(x,y)) creates a 10x10 "matrix" (technically a 10x10x2 numpy array) whose entries are the 2-dimensional tuples to be evaluated by f.
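As a concrete instance of this pattern, here is a sketch using the pdf of a frozen scipy.stats.multivariate_normal as f. The mean and covariance are arbitrary values chosen for the example, and np.linspace is used instead of np.arange to make the number of points explicit.

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.linspace(-1, 1, 10)
y = np.linspace(-1, 1, 10)

# Stack the two coordinate grids along a new last axis: shape (10, 10, 2)
points = np.dstack(np.meshgrid(x, y))

# Example distribution (assumed parameters): standard normal in 2D
rv = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))

# pdf treats the last axis as the components of each point
Z = rv.pdf(points)
print(points.shape, Z.shape)   # (10, 10, 2) (10, 10)
```

With the default meshgrid indexing, Z[i, j] is the density at the point (x[j], y[i]).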
My two cents:
import numpy as np
x = np.linspace(0, 4, 10)
y = np.linspace(-1, 1, 20)
[X, Y] = np.meshgrid(x, y, indexing = 'ij', sparse = True)
def func(x, y):
    return x*y/(x**2 + y**2 + 4)
# I have defined a function of x and y.
func(X, Y)
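A quick way to see what the sparse option does is to look at the shapes it returns. The following sketch (variable names are chosen here for illustration) also compares the result against a dense grid:

```python
import numpy as np

def func(x, y):
    return x * y / (x**2 + y**2 + 4)

x = np.linspace(0, 4, 10)
y = np.linspace(-1, 1, 20)

# Sparse grids keep only one non-trivial axis each and rely on broadcasting
Xs, Ys = np.meshgrid(x, y, indexing='ij', sparse=True)
print(Xs.shape, Ys.shape)    # (10, 1) (1, 20)

# Dense grids repeat the values to the full (10, 20) shape
Xd, Yd = np.meshgrid(x, y, indexing='ij')
print(Xd.shape, Yd.shape)    # (10, 20) (10, 20)

Z_sparse = func(Xs, Ys)
Z_dense = func(Xd, Yd)
print(Z_sparse.shape, np.allclose(Z_sparse, Z_dense))   # (10, 20) True
```

The sparse arrays are the same as x[:, None] and y[None, :], so this is the broadcasting approach with meshgrid doing the reshaping.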
