I'm trying to draw a regression line with Plotly in Python in ternary space, but there doesn't seem to be an option like trendline='lowess' for scatter ternaries. Is there another way to achieve the same result for ternaries? Below is code from a previous post that draws a spline curve, but not a regression.
import numpy as np
import plotly.graph_objects as go
a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])
fig = go.Figure()
curve_portion = np.where((b < 0.15) & (c > 0.6))
curve_other_portion = np.where(~((b < 0.15) & (c > 0.6)))
def add_plot_spline_portions(fig, indices_groupings):
    for indices in indices_groupings:
        fig.add_trace(go.Scatterternary({
            'mode': 'lines',
            'connectgaps': True,
            'a': a[indices],
            'b': b[indices],
            'c': c[indices],
            'line': {'color': 'black', 'shape': 'spline', 'smoothing': 1},
            'marker': {'size': 2, 'line': {'width': 0.1}}
        }))

add_plot_spline_portions(fig, [curve_portion, curve_other_portion])
fig.show(renderer='png')
I can outline what I think is a general solution - it doesn't have as much mathematical rigor as I would like, and it involves some guess-and-check work - but hopefully it's helpful.
The first consideration is that for this regression on a ternary plot, there are only two degrees of freedom because A+B+C=1 (you might find this explanation helpful). This means it only makes sense to consider the relationship between two of the variables at a time. What we really want to do is create a regression between two of the variables, then determine the value of the third variable using the equation A+B+C=1.
The second consideration is a bit harder to define, but since you are after a regression that captures the "reversing" nature of the variable A, we want a regression where A can take on repeated values. I think the most straightforward way to achieve this is to make A the variable you are predicting.
For simplicity's sake, let's say we use a degree 2 polynomial regression that predicts A from either B or C. We can make a scatter plot of each and choose whichever polynomial fits better for our purposes.
Here is a quick eda:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
a = np.array([0.15, 0.15, 0.17, 0.2 , 0.21, 0.24, 0.26, 0.27, 0.27, 0.29, 0.32, 0.35, 0.39, 0.4 , 0.4 , 0.41, 0.47, 0.48, 0.51, 0.52, 0.54, 0.56, 0.59, 0.62, 0.63, 0.65, 0.69, 0.73, 0.74])
b = np.array([0.14, 0.15, 0.1 , 0.17, 0.17, 0.18, 0.05, 0.16, 0.17, 0.04, 0.03, 0.14, 0.13, 0.13, 0.14, 0.14, 0.13, 0.13, 0.14, 0.14, 0.15, 0.16, 0.18, 0.2 , 0.21, 0.22, 0.24, 0.25, 0.25])
c = np.array([0.71, 0.7 , 0.73, 0.63, 0.62, 0.58, 0.69, 0.57, 0.56, 0.67, 0.65, 0.51, 0.48, 0.47, 0.46, 0.45, 0.4 , 0.39, 0.35, 0.34, 0.31, 0.28, 0.23, 0.18, 0.16, 0.13, 0.07, 0.02, 0.01])
## eda to determine polynomial of best fit to predict A
fig_eda = make_subplots(rows=1, cols=2)
fig_eda.add_trace(go.Scatter(x=b, y=a, mode='markers'),row=1, col=1)
coefficients = np.polyfit(b,a,2)
p = np.poly1d(coefficients)
b_vals = np.linspace(min(b),max(b))
a_pred = np.array([p(x) for x in b_vals])
fig_eda.add_trace(go.Scatter(x=b_vals, y=a_pred, mode='lines'),row=1, col=1)
fig_eda.add_trace(go.Scatter(x=c, y=a, mode='markers'),row=1, col=2)
coefficients = np.polyfit(c,a,2)
p = np.poly1d(coefficients)
c_vals = np.linspace(min(c),max(c))
a_pred = np.array([p(x) for x in c_vals])
fig_eda.add_trace(go.Scatter(x=c_vals, y=a_pred, mode='lines'),row=1, col=2)
fig_eda.show()
Notice how predicting A from B appears to capture the reversing nature of A better than predicting A from C. If we fit a degree 2 polynomial regression of A on C, A is not going to repeat within the domain of C ([0, 1]) because that polynomial slopes so gently.
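One way to make that visual judgment more concrete: a degree 2 fit can only produce repeated A values if the fitted parabola's turning point lies inside the predictor's domain. This is just a quick sanity check reusing the arrays above, not part of the original approach:

for name, pred in [('B', b), ('C', c)]:
    c2, c1, c0 = np.polyfit(pred, a, 2)          # quadratic fit of A on the predictor
    vertex = -c1 / (2 * c2)                      # x-value where the fitted parabola turns around
    print(name, 'turning point at', round(vertex, 3), '-> inside [0, 1]:', bool(0 <= vertex <= 1))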
So let's proceed with this regression using B as the predictor variable and A as the predicted variable (with C then obtained from C = 1 - (A + B)).
fig = go.Figure()

fig.add_trace(go.Scatterternary({
    'mode': 'markers',
    'connectgaps': True,
    'a': a,
    'b': b,
    'c': c
}))

## since A+B+C = 1, we only need to fit a polynomial between two of the variables
## fit an n-degree polynomial to 2 of your variables
## source https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html
coefficients = np.polyfit(b, a, 2)
p = np.poly1d(coefficients)

## we use the entire domain of the input variable B
b_vals = np.linspace(0, 1)
a_pred = np.array([p(x) for x in b_vals])
c_pred = 1 - (b_vals + a_pred)

fig.add_trace(go.Scatterternary({
    'mode': 'lines',
    'connectgaps': True,
    'a': a_pred,
    'b': b_vals,
    'c': c_pred,
    'marker': {'size': 2, 'color': 'red', 'line': {'width': 0.1}}
}))
fig.show()
This is the lowest degree polynomial regression that allows for repeated values of A (a linear regression predicting A wouldn't allow A to take on repeated values). However, you can definitely experiment with increasing the degree of the polynomial, and with predicting A from either B or C, as in the sketch below.
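For example, a rough way to try other degrees (a sketch reusing the arrays and figure from above; add_poly_regression is just a hypothetical helper, not part of the original code):

def add_poly_regression(fig, degree, color='red'):
    # fit A as a polynomial in B, then recover C from the closure constraint A + B + C = 1
    p = np.poly1d(np.polyfit(b, a, degree))
    b_vals = np.linspace(0, 1, 200)
    a_pred = p(b_vals)
    fig.add_trace(go.Scatterternary(
        mode='lines',
        a=a_pred, b=b_vals, c=1 - (a_pred + b_vals),
        line={'color': color}, name=f'degree {degree}'
    ))

add_poly_regression(fig, 3, color='blue')   # try a cubic alongside the quadratic above
fig.show()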
Generating a list of many lists, each with a different range
import numpy as np

Isc_act = [0.1, 0.2, 0.3]
I_cel = []
a = []
for i in range(0, len(Isc_act)):
    a = np.arange(0, Isc_act[i], 0.1*Isc_act[i])
    I_cel[i].append(a)
print(I_cel)
Output is:
IndexError: list index out of range
My code gives an error, but I want to get I_cel = [[0, 0.01, ..., 0.1], [0, 0.02, 0.04, ..., 0.2], [0, 0.03, 0.06, ..., 0.3]]. Hence, the nested list I_cel has three lists and each list has 10 values.
The simplest fix to your code, and probably what you were intending to do:
import numpy as np

Isc_act = [0.1, 0.2, 0.3]
I_cel = []
for i in range(0, len(Isc_act)):
    a = np.arange(0, Isc_act[i], 0.1*Isc_act[i])
    I_cel.append(a)
print(I_cel)
Note that the endpoint will be one step short of what you wanted! For example, for row zero you can only pick two of the following:
Steps of size 0.01
Start point 0.0 and end point 0.1
10 elements total
You cannot have all three.
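To see that tradeoff concretely (a small check, not from the original answer; the exact endpoint behaviour of arange with float steps can also vary due to rounding):

import numpy as np

for isc in [0.1, 0.2, 0.3]:
    row = np.arange(0, isc, 0.1 * isc)   # the stop value is excluded
    print(len(row), row.max())           # typically 10 elements, ending one step below isc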
More numpythonic approach:
>>> Isc_act = [0.1, 0.2, 0.3]
>>> (np.linspace(0, 1, 11).reshape(11,1) @ [Isc_act]).T
array([[0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ],
[0. , 0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16, 0.18, 0.2 ],
[0. , 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.3 ]])
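The same outer-product idea can also be written with np.outer, which some readers may find clearer; it produces the same (3, 11) array as above:

>>> np.outer(Isc_act, np.linspace(0, 1, 11))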
linspace gives better control of the end point when dealing with floats:
In [84]: [np.linspace(0,x,11) for x in [.1,.2,.3]]
Out[84]:
[array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ]),
array([0. , 0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16, 0.18, 0.2 ]),
array([0. , 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.3 ])]
Or we could scale just one array (arange with integers is predictable):
In [86]: np.array([.1,.2,.3])[:,None]*np.arange(0,11)
Out[86]:
array([[0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ],
[0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ],
[0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. ]])
Case 1 (solved): Array A has shape (say) (300,50). Array B is an indices array with shape (300,5), such that B[i,j] indicates, for row i, the index of another row to "concatenate" next to row i. The end result is an array C with the shape (300,5,50), such that C[i,j,:] = A[B[i,j],:]. This can be done by calling A[B,:].
Here is small script example for case 1:
import numpy as np
## A is the data array
A = np.arange(20).reshape((5,4))
## B indicate for each row which rows to pull together
B = np.array([[0,2],[1,2],[2,0],[3,4],[4,1]])
A[B,:] #The desired result
Case 2 (unsolved): Same problem, only now A is shaped (100,300,50). If B is the indices array shaped (100,300,5), the end result would be an array C with the shape (100,300,5,50) such that C[i,j,k,:] = A[i,B[i,j,k],:]. A[B,:] doesn't work anymore, because B indexes only the first axis of A, which results in a shape of (100,300,5,300,50).
How should I approach this with indexing?
One approach would be reshaping A to 2D, keeping the number of columns intact, then indexing into the first axis with the flattened B indices (offset so that each batch pulls rows from its own block of the reshaped array), and finally reshaping back to the desired shape.
Thus, the implementation would be -
offsets = A.shape[1] * np.arange(A.shape[0])[:, None, None]   # per-batch row offsets, shape (100, 1, 1)
A.reshape(-1, A.shape[-1])[(B + offsets).ravel()].reshape(100, 300, 5, 50)
Those reshapes are merely views into the arrays, so this should be quite efficient.
The same idea covers both cases - for the 2D case #1 no offset is needed, so it reduces to A.reshape(-1,A.shape[-1])[B.ravel()]. Here's a sample run for case #1 -
1) Inputs :
In [667]: A = np.random.rand(3,4)
...: B = np.random.randint(0,3,(3,5))
...:
2) Original method :
In [668]: A[B,:]
Out[668]:
array([[[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.1 , 0.91, 0.1 , 0.98]],
[[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02]],
[[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.45, 0.16, 0.02, 0.02]]])
3) Proposed method :
In [669]: A.reshape(-1,A.shape[-1])[B.ravel()].reshape(3,5,4)
Out[669]:
array([[[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.1 , 0.91, 0.1 , 0.98]],
[[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.1 , 0.91, 0.1 , 0.98],
[ 0.45, 0.16, 0.02, 0.02]],
[[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.48, 0.6 , 0.96, 0.21],
[ 0.45, 0.16, 0.02, 0.02],
[ 0.45, 0.16, 0.02, 0.02]]])
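Alternatively, case #2 can be handled directly with advanced indexing by pairing a broadcast batch index with B. A minimal sketch using the shapes stated in the question (random data just for illustration):

import numpy as np

A = np.random.rand(100, 300, 50)
B = np.random.randint(0, 300, (100, 300, 5))

# a (100,1,1) batch index broadcasts against B (100,300,5),
# giving C[i,j,k,:] = A[i, B[i,j,k], :] with C.shape == (100, 300, 5, 50)
batch = np.arange(A.shape[0])[:, None, None]
C = A[batch, B, :]
print(C.shape)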
I have an array
x1 = tf.Variable([[0.51, 0.52, 0.53, 0.94, 0.35],
[0.32, 0.72, 0.83, 0.74, 0.55],
[0.23, 0.72, 0.63, 0.64, 0.35],
[0.11, 0.02, 0.03, 0.14, 0.15],
[0.01, 0.72, 0.73, 0.04, 0.75]],tf.float32)
I want to sort the elements in each row from min to max. Is there any function for doing that?
In the example here they are using tf.nn.top_k on a 2D array; using it I can loop to build a max-to-min ordering.
def sort(instance):
    matrix = []
    rows = tf.shape(instance)[0]
    col = tf.shape(instance)[1]
    for i in range(rows.eval()):
        matrix.append([tf.gather(instance[i], tf.nn.top_k(instance[i], k=col.eval()).indices)])
    return matrix
Is there anything similar for ordering from min to max, or a way to reverse the array in each row?
As suggested by @Yaroslav, you can just use the top_k values.
import tensorflow as tf

a = tf.Variable([[0.51, 0.52, 0.53, 0.94, 0.35],
                 [0.32, 0.72, 0.83, 0.74, 0.55],
                 [0.23, 0.72, 0.63, 0.64, 0.35],
                 [0.11, 0.02, 0.03, 0.14, 0.15],
                 [0.01, 0.72, 0.73, 0.04, 0.75]], tf.float32)

row_size = a.get_shape().as_list()[-1]
# top_k sorts descending, so negate, take top_k, then negate back to get ascending order
top_k = tf.nn.top_k(-a, k=row_size)

sess = tf.Session()                              # TF 1.x-style session
sess.run(tf.global_variables_initializer())
sess.run(-top_k.values)
this prints for me
array([[ 0.34999999, 0.50999999, 0.51999998, 0.52999997, 0.94 ],
[ 0.31999999, 0.55000001, 0.72000003, 0.74000001, 0.82999998],
[ 0.23 , 0.34999999, 0.63 , 0.63999999, 0.72000003],
[ 0.02 , 0.03 , 0.11 , 0.14 , 0.15000001],
[ 0.01 , 0.04 , 0.72000003, 0.73000002, 0.75 ]], dtype=float32)
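On newer TensorFlow releases there is also a built-in tf.sort, which was not available when this answer was written; a minimal sketch in eager mode:

import tensorflow as tf

a = tf.constant([[0.51, 0.52, 0.53, 0.94, 0.35],
                 [0.32, 0.72, 0.83, 0.74, 0.55]], tf.float32)

# sort each row in ascending order; use direction='DESCENDING' for the reverse
print(tf.sort(a, axis=-1, direction='ASCENDING'))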
I have two arrays X and Y; X is the base array and Y is operated on in a loop. As the loop runs, I want to compare the arrays to find where Y is closest to X. As an example, I have attached reproducible code:
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.array([[0.12, 0.11, 0.1, 0.09, 0.08],
[0.13, 0.12, 0.11, 0.1, 0.09],
[0.15, 0.14, 0.12, 0.11, 0.1],
[0.17, 0.15, 0.14, 0.12, 0.11],
[0.19, 0.17, 0.16, 0.14, 0.12],
[0.22, 0.19, 0.17, 0.15, 0.13],
[0.24, 0.22, 0.19, 0.16, 0.14],
[0.27, 0.24, 0.21, 0.18, 0.15],
[0.29, 0.26, 0.22, 0.19, 0.16]])
y = np.array([[0.07, 0.06, 0.05, 0.04, 0.03],
[0.08, 0.07, 0.06, 0.05, 0.04],
[0.10, 0.09, 0.07, 0.06, 0.05],
[0.14, 0.12, 0.11, 0.09, 0.08],
[0.16, 0.14, 0.13, 0.11, 0.09],
[0.19, 0.16, 0.14, 0.12, 0.10],
[0.22, 0.20, 0.17, 0.14, 0.12],
[0.25, 0.22, 0.19, 0.16, 0.13],
[0.27, 0.24, 0.20, 0.17, 0.14]])
for i in range(100):
    y = y + (i / 10000)
I want to break the loop when the closest values have been found. By closest I mean the values should be within ±10% of the original values, or some other percentage. How can this be done in Python?
You can compute the Euclidean distance between the two matrices:
import numpy as np
import scipy.spatial.distance
import matplotlib.pyplot as plt
x = np.array([[0.12, 0.11, 0.1, 0.09, 0.08],
[0.13, 0.12, 0.11, 0.1, 0.09],
[0.15, 0.14, 0.12, 0.11, 0.1],
[0.17, 0.15, 0.14, 0.12, 0.11],
[0.19, 0.17, 0.16, 0.14, 0.12],
[0.22, 0.19, 0.17, 0.15, 0.13],
[0.24, 0.22, 0.19, 0.16, 0.14],
[0.27, 0.24, 0.21, 0.18, 0.15],
[0.29, 0.26, 0.22, 0.19, 0.16]])
y = np.array([[0.07, 0.06, 0.05, 0.04, 0.03],
[0.08, 0.07, 0.06, 0.05, 0.04],
[0.10, 0.09, 0.07, 0.06, 0.05],
[0.14, 0.12, 0.11, 0.09, 0.08],
[0.16, 0.14, 0.13, 0.11, 0.09],
[0.19, 0.16, 0.14, 0.12, 0.10],
[0.22, 0.20, 0.17, 0.14, 0.12],
[0.25, 0.22, 0.19, 0.16, 0.13],
[0.27, 0.24, 0.20, 0.17, 0.14]])
dists = []
for i in range(100):
    y = y + (i / 10000.)
    dists.append(scipy.spatial.distance.euclidean(x.flatten(), y.flatten()))
plt.plot(dists)
This plots the evolution of the Euclidean distance between your two matrices as the loop runs.
To break the loop at the minimum, you can use:
dist = np.inf
for i in range(100):
    y = y + (i / 10000.)
    d = scipy.spatial.distance.euclidean(x.flatten(), y.flatten())
    if d < dist:
        dist = d
    else:
        break

print(dist)
# 0.0838525491562 (the minimal distance)
print(y)
#[[ 0.1051 0.0951 0.0851 0.0751 0.0651]
#[ 0.1151 0.1051 0.0951 0.0851 0.0751]
#[ 0.1351 0.1251 0.1051 0.0951 0.0851]
#[ 0.1751 0.1551 0.1451 0.1251 0.1151]
#[ 0.1951 0.1751 0.1651 0.1451 0.1251]
#[ 0.2251 0.1951 0.1751 0.1551 0.1351]
#[ 0.2551 0.2351 0.2051 0.1751 0.1551]
#[ 0.2851 0.2551 0.2251 0.1951 0.1651]
#[ 0.3051 0.2751 0.2351 0.2051 0.1751]]