Edit: Everything is good :)
This code works with small values such as t = 20 and TR = ([[30,20,12,23..],[...]]), but when I put in higher values it raises "Expect x to be a 1-D sorted array_like.". Do you know how to solve this problem?
import matplotlib.pylab as plt
from scipy.special import erfc
import numpy as np
from scipy.interpolate import interp1d
# The function to inverse:
t = 100
alfa = 1.1*10**(-7)
k = 0.18
T1 = 20
Tpow = 180
def F(h):
    p = erfc(h*np.sqrt(alfa*t)/k)
    return T1 + (Tpow-T1)*(1 - np.exp((h**2*alfa*t)/k**2)*p)
# Interpolation
h_eval = np.linspace(-80, 500, 200) # critical step: define the discretization grid
F_inverse = interp1d( F(h_eval), h_eval, kind='cubic', bounds_error=True )
# Some random data:
TR = np.array([[130, 100, 130, 130, 130],
[ 90, 101, 100, 120, 90],
[130, 130, 100, 100, 130],
[120, 101, 120, 90, 110],
[110, 130, 130, 110, 130]])
# Compute the array h for a given array TR
h = F_inverse(TR)
print(h)
# Graph to verify the interpolation
plt.plot(h_eval, F(h_eval), '.-', label='discretized F(h)');
plt.plot(h.ravel(), TR.ravel(), 'or', label='interpolated values')
plt.xlabel('h'); plt.ylabel('F(h) or TR'); plt.legend();
Does anyone have an idea how to solve a non-linear, implicit equation in numpy?
I have the array TR and the other values that appear in my equation.
I need to solve it and, as a result, receive a new array with the same shape.
Here is a solution using a 1-D interpolation to compute the inverse of the F(h) function. Because no standard root-finding method is used, the error is not controlled and the discretization grid has to be chosen with care. However, the interpolated inverse function can be evaluated directly over an array.
Note: the definition of F is modified; the problem is now: solve for h in F(h) = TR.
import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pylab as plt
# The function to inverse:
t = 10
alfa = 1.1*10**(-7)
k = 0.18
T1 = 20
Tpow = 100
def F(h):
    A = np.exp(h**2*alfa*t/k**2)
    B = h**3*2/(3*np.sqrt(3))*(alfa*t)**(3/2)/k**3
    return -(Tpow-T1)*( 1 - A + B )
# Interpolation
h_eval = np.linspace(40, 100, 50) # critical step: define the discretization grid
F_inverse = interp1d( F(h_eval), h_eval, kind='cubic', bounds_error=True )
# Some random data:
TR = np.array([[13, 10, 13, 13, 13],
[ 9, 11, 10, 12, 9],
[13, 13, 10, 10, 13],
[12, 11, 12, 9, 11],
[11, 13, 13, 11, 13]])
# Compute the array h for a given array TR
h = F_inverse(TR)
print(h)
# Graph to verify the interpolation
plt.plot(h_eval, F(h_eval), '.-', label='discretized F(h)');
plt.plot(h.ravel(), TR.ravel(), 'or', label='interpolated values')
plt.xlabel('h'); plt.ylabel('F(h) or TR'); plt.legend();
With the other function, the following lines are changed:
from scipy.special import erf
def F(h):
    return (Tpow-T1)*(1 - np.exp((h**2*alfa*t)/k**2)*(1.0 - erf(h*np.sqrt(alfa*t)/k)))
# Interpolation
h_eval = np.linspace(15, 35, 50) # the range is changed
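If a controlled error matters, a possible alternative (a sketch, not part of the answer above) is to apply scipy's bracketing root finder element-wise instead of interpolating, assuming a bracket [h_min, h_max] on which F is monotonic so that F(h) - tr changes sign inside it:
import numpy as np
from scipy.optimize import brentq

def F_inverse_exact(TR, h_min, h_max):
    # Solve F(h) = tr for each element of TR; brentq controls the error,
    # but requires the bracket [h_min, h_max] to contain the root.
    solve = np.vectorize(lambda tr: brentq(lambda h: F(h) - tr, h_min, h_max))
    return solve(TR)

h_exact = F_inverse_exact(TR, h_eval[0], h_eval[-1])  # reuse the grid bounds
print(h_exact)
This is slower than the interpolated inverse because it solves one scalar root problem per element, but the accuracy no longer depends on the discretization grid.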
I am using CausalNex to create a DAG from a dataset in Python.
I got the graph, and the nodes are correct, but the edges are totally off. I tried this in a DataFrame with four random independent variables (Requestor, Risk, Size, Developer) and a single dependent one (Duration), and the graph produced is this:
Am I using the library incorrectly? Why is the figure so distant from the true data-generating process? Could a Bayesian Network model outperform CausalNex?
I tried this code:
# Generate initial data
import numpy as np
import pandas as pd
np.random.seed(42)
fib_list = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
df = pd.DataFrame({
    "Requestor": np.random.randint(1, 4, 100),
    "Size": np.random.randint(1, 4, 100),
    "Risk": np.random.randint(1, 4, 100)
})
df['Developer'] = np.random.choice(fib_list, df.shape[0])
df["Duration"] = (
    0.1 * df["Requestor"] +
    0.2 * df["Size"] +
    0.2 * df["Risk"] +
    0.5 * df["Developer"]
)
# Generate graph
from causalnex.structure.notears import from_pandas
import matplotlib.pyplot as plt
import networkx as nx
sm = from_pandas(df)
sm.remove_edges_below_threshold(0.8)
nx.draw_shell(sm, with_labels=True, font_weight="bold")
plt.show()
I was expecting something like this:
I would say that the relations between the variables are not easy to capture (particularly due to the domain size of Developer). The parents of the continuous "Duration" have a joint domain of size 3*3*3*12, and Duration itself is not really continuous but can take 102 different values.
So a database of size 100 is really not enough for the tests/scores to be accurate during the learning algorithms.
Note that I multiplied Duration by 10 to keep integer values.
FYI, an inference in the last BN:
The code:
import numpy as np
import pandas as pd
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb

gum.config["notebook","default_graph_size"]="3!" # change the default size for BNs

def createDB(N:int):
    # code from Rafaela Medeiros
    np.random.seed(42)
    fib_list = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
    data = {"Requestor": np.random.randint(1,4,N),
            "Size": np.random.randint(1,4,N),
            "Risk": np.random.randint(1,4,N)}
    df = pd.DataFrame(data)
    df['Developer'] = np.random.choice(fib_list, df.shape[0])
    df["Duration"] = (1*df["Requestor"] + 2*df["Size"] + 2*df["Risk"] + 5*df["Developer"])
    return df

def learnForSize(N:int):
    learner = gum.BNLearner(createDB(N))
    learner.useMIIC() # choosing this algorithm to learn the structure
    bn = learner.learnBN()
    return bn

sizes = [100, 5000, 55000]
gnb.flow.row(*[learnForSize(N) for N in sizes],
             captions=[f"{size=}" for size in sizes])
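To reproduce the inference mentioned above, a one-liner in a notebook should do it (a sketch; it displays the posterior distribution of every node with no evidence set):
bn = learnForSize(55000)  # the BN learned from the largest database
gnb.showInference(bn)     # draw posterior distributions for all nodes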
I need to make subtractions inside the red frames, as in [20-10, 60-40, 100-70],
which results in [10, 20, 30].
My current code makes subtractions, but I don't know how to define the red frames:
seq = [10, 20, 40, 60, 70, 100]
window_size = 2
for i in range(len(seq) - window_size + 1):
    x = seq[i: i + window_size]
    y = x[1] - x[0]
    print(y)
You can build a quick solution using the fact that seq[0::2] gives you every other element of seq starting at index zero. So, once the data is a numpy array, you can compute seq[1::2] - seq[0::2] to get this result.
Without using any packages you could do:
seq = [10, 20, 40, 60, 70, 100]
sub_seq = [0]*(len(seq)//2)
for i in range(len(sub_seq)):
    sub_seq[i] = seq[1::2][i] - seq[0::2][i]
print(sub_seq)
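The slices above are rebuilt on every iteration; a tighter pure-Python variant pairs the two slices once with zip:
seq = [10, 20, 40, 60, 70, 100]
sub_seq = [b - a for a, b in zip(seq[0::2], seq[1::2])]  # pairwise differences
print(sub_seq)  # [10, 20, 30]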
Instead you could use Numpy. Using the numpy array object you can subtract the arrays rather than explicitly looping:
import numpy as np
seq = np.array([10, 20, 40, 60, 70, 100])
sub_seq = seq[1::2] - seq[0::2]
print(sub_seq)
Here's a solution using numpy which might be useful if you have to process large amounts of data in a short time. We select values based on whether their index is even (index % 2 == 0) or odd (index % 2 != 0).
import numpy as np
seq = [10, 20, 40, 60, 70, 100]
seq = np.array(seq)
index = np.arange(len(seq))
print(seq[index % 2 != 0] - seq[index % 2 == 0])  # [10 20 30]
I have two lines on a graph using matplotlib in Python.
Line A
A_X = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
A_Y = [33300.0, 32887.6, 33046.4, 33140.9, 32967.8, 32960.0, 33128.95, 33376.95, 33300.0, 33080.0]
Line B takes the first and last points from Line A to draw a straight line between them.
B_X = [11, 20]
B_Y = [33300.0, 33080.0]
So now I want to find the y coordinates on Line B corresponding to the x coordinates 12 through 19.
Basically, from the below image, I want to find the coordinates of the red points. Thank you in advance for the huge help.
Use linear interpolation with the scipy module:
from scipy.interpolate import interp1d
f = interp1d(B_X, B_Y)
To get the y values for the red points use
f(A_X)
Coordinates for the red points will be (A_X, f(A_X))
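Put together, a minimal runnable sketch using the data from the question:
from scipy.interpolate import interp1d

A_X = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
B_X = [11, 20]
B_Y = [33300.0, 33080.0]

f = interp1d(B_X, B_Y)                # linear interpolation by default
red_points = list(zip(A_X, f(A_X)))   # (x, y) pairs on the straight line
print(red_points)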
You should create an interpolation function with interp1d:
line = interp1d(B_X, B_Y)
Then choose the X points you want to use for the interpolation and call the interpolation function on them:
B_X_points = np.arange(B_X[0], B_X[-1] + 1, 1)
B_Y_points = line(B_X_points)
Complete Code
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
A_X = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
A_Y = [33300.0, 32887.6, 33046.4, 33140.9, 32967.8, 32960.0, 33128.95, 33376.95, 33300.0, 33080.0]
B_X = [11, 20]
B_Y = [33300.0, 33080.0]
line = interp1d(B_X, B_Y)
B_X_points = np.arange(B_X[0], B_X[-1] + 1, 1)
B_Y_points = line(B_X_points)
fig, ax = plt.subplots()
ax.plot(A_X, A_Y)
ax.plot(B_X_points, B_Y_points, marker='s', markerfacecolor='r')
plt.show()
UCLA has this great site for statistical tests:
https://stats.idre.ucla.edu/r/whatstat/what-statistical-analysis-should-i-usestatistical-analyses-using-r/#1sampt
but the code is all in R. I am trying to convert the code to Python equivalents, but it is not a straightforward process for some tests, like the chi-square goodness of fit. Here is the R version:
hsb2 <- within(read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv"), {
  race <- as.factor(race)
  schtyp <- as.factor(schtyp)
  prog <- as.factor(prog)
})
chisq.test(table(hsb2$race), p = c(10, 10, 10, 70)/100)
My Python attempt is this:
import numpy as np
import pandas as pd
from scipy import stats
df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
# convert to category
df["race"] = df["race"].astype("category")
t_race = pd.crosstab(df.race, columns = 'race')
p_tests = np.array((10, 10, 10, 70))
p_tests = p_tests/100
# tried this
stats.chisquare(t_race, p_tests)
# and this
stats.chisquare(t_race.T, p_tests)
but neither stats.chisquare call produces output close to the R version. Can anybody steer me in the right direction? TIA
chisq.test takes a vector of probabilities; stats.chisquare takes the expected frequencies (docs).
> results = chisq.test(c(24, 11, 20, 145), p=c(0.1, 0.1, 0.1, 0.7))
> results
Chi-squared test for given probabilities
data: c(24, 11, 20, 145)
X-squared = 5.028571429, df = 3, p-value = 0.169716919
vs.
In [49]: obs = np.array([24, 11, 20, 145])
In [50]: prob = np.array([0.1, 0.1, 0.1, 0.7])
In [51]: stats.chisquare(obs, f_exp=obs.sum() * prob)
Out[51]: Power_divergenceResult(statistic=5.0285714285714285, pvalue=0.16971691923343338)
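Applied to the hsb2 data from the question, the conversion could look like this (a sketch: the race counts are ordered by category and the probabilities are scaled into expected counts):
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
observed = df["race"].value_counts().sort_index().to_numpy()
p = np.array([10, 10, 10, 70]) / 100

# stats.chisquare expects frequencies, so scale p by the total count
result = stats.chisquare(observed, f_exp=observed.sum() * p)
print(result)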
I'm not sure what I'm doing wrong. I'm attempting to use scipy griddata to interpolate data in an irregular grid.
from scipy.interpolate import griddata
I have two lists, "x" and "y", that represent the axes of my original, uninterpolated grid. They are both lists of length 8.
Then, I make the arrays that represent the axes of the intended final, filled-in grid.
ny = np.linspace(0.0, max(y), y[len(y)-1]/min_interval+1)
nx = np.linspace(0.0, max(x), len(ny))
I've checked and both "ny" and "nx" are of shape (61,). Then, I create an 8 x 8 list "z". Finally, I attempt to make my final grid.
Z = griddata((np.array(x), np.array(y)), np.array(z), (nx, ny), method='nearest', fill_value=0)
print(Z.shape)
The resulting 2D array has dimensions (61,8). I tried using "x" and "y" as lists and arrays - no change. Why is it only interpolating in one direction? I was expecting a (61,61) array output.
I would have included actual numbers if I felt it would have been helpful, but I don't see how it would make a difference. Do I not understand how griddata works?
The problem is that griddata expects scattered sample points: the 8 x values and 8 y values describe grid axes, so they first have to be broadcast into the full set of 64 (x, y) pairs, and the evaluation points must form a 2-D grid as well. Here is the full code:
import numpy as np
from scipy.interpolate import griddata
# random data to interpolate
x = np.array([0, 10, 13, 17, 20, 50, 55, 60.0])
y = np.array([10, 20, 40, 80, 90, 95, 100, 120.0])
zg = np.random.randn(8, 8)
# select one of the following two lines, depending on the axis order in zg
#xg, yg = np.broadcast_arrays(x[:, None], y[None, :])
xg, yg = np.broadcast_arrays(x[None, :], y[:, None])
yg2, xg2 = np.mgrid[y.min()-10:y.max()+10:100j, x.min()-10:x.max()+10:100j]
zg2 = griddata((xg.ravel(), yg.ravel()), zg.ravel(), (xg2.ravel(), yg2.ravel()), method="nearest")
zg2.shape = yg2.shape
import pylab as pl
pl.pcolormesh(xg2, yg2, zg2)
pl.scatter(xg.ravel(), yg.ravel(), c=zg.ravel())
The output is the pcolormesh of the nearest-neighbour interpolation, with the original sample points scattered on top.