I'm trying to test the fuzzy logic tipping example from the scikit-fuzzy documentation.
My question is: how can I make this control system print the output value (the tip) in terms of ['low', 'medium', 'high'] rather than printing the actual computed value?
The following is the example code:
import matplotlib.pyplot as plt
import numpy as np
import skfuzzy as fuzzy
from skfuzzy import control
# Universe variables
quality = control.Antecedent(np.arange(0, 11, 1), 'quality')
service = control.Antecedent(np.arange(0, 11, 1), 'service')
tip = control.Consequent(np.arange(0, 26, 1), 'tip')
# Auto-membership function population (3,5,7)
quality.automf(3)
service.automf(3)
# Custom triangle membership functions
tip['low'] = fuzzy.trimf(tip.universe, [0, 0, 13])
tip['medium'] = fuzzy.trimf(tip.universe, [0, 13, 25])
tip['high'] = fuzzy.trimf(tip.universe, [13, 25, 25])
#view memberships
#quality.view()
#service.view()
#tip.view()
#Fuzzy rules
rule1 = control.Rule(quality['poor'] | service['poor'], tip['low'])
rule2 = control.Rule(service['average'], tip['medium'])
rule3 = control.Rule(service['good'] | quality['good'], tip['high'])
#Control System Creation and Simulation
tipping_ctrl = control.ControlSystem([rule1, rule2, rule3])
tipping = control.ControlSystemSimulation(tipping_ctrl)
# Pass inputs to the ControlSystem & compute
tipping.input['quality'] = 10
tipping.input['service'] = 3
tipping.compute()
#visualize & view
print(tipping.output)
tip.view(sim=tipping)
plt.show()
You have to index the output by the consequent's label, in this case:
tipping.output['tip']
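If you want the linguistic label rather than the number, skfuzzy does not print terms directly, but you can map the defuzzified value back onto the consequent's membership functions and report the term with the highest membership. A minimal sketch, reusing the tip variable and tipping simulation from the question:
# Degree to which the crisp output belongs to each linguistic term.
crisp = tipping.output['tip']
memberships = {label: fuzzy.interp_membership(tip.universe, term.mf, crisp)
               for label, term in tip.terms.items()}
print(max(memberships, key=memberships.get))  # e.g. 'medium'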
I am using CausalNex to create a DAG from a dataset in Python.
I got the graph, and the nodes are correct, but the edges are totally off. I tried this in a DataFrame with four random independent variables (Requestor, Risk, Size, Developer) and a single dependent one (Duration), and the graph produced is this:
Am I using the library incorrectly? Why is the figure so distant from the true data-generating process? Could a Bayesian Network model outperform CausalNex?
I tried this code:
# Generate initial data
import numpy as np
import pandas as pd
np.random.seed(42)
fib_list = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
df = pd.DataFrame({
    "Requestor": np.random.randint(1, 4, 100),
    "Size": np.random.randint(1, 4, 100),
    "Risk": np.random.randint(1, 4, 100)
})
df['Developer'] = np.random.choice(fib_list, df.shape[0])
df["Duration"] = (
    0.1 * df["Requestor"] +
    0.2 * df["Size"] +
    0.2 * df["Risk"] +
    0.5 * df["Developer"]
)
# Generate graph
from causalnex.structure.notears import from_pandas
import matplotlib.pyplot as plt
import networkx as nx
sm = from_pandas(df)
sm.remove_edges_below_threshold(0.8)
nx.draw_shell(sm, with_labels=True, font_weight="bold")
plt.show()
I was expecting something like this:
I would say that the relations between the variables are not easy to capture (particularly because of the domain size of Developer). The parents of the continuous "Duration" variable span a joint domain of size 3*3*3*12, and Duration itself is not really continuous: it can take 102 different values.
So a database of 100 rows is really not enough for the tests/scores used by the structure-learning algorithms to be accurate.
Note that I multiplied Duration by 10 to keep integer values.
For reference, an inference can then be run in the learned BN (see the sketch after the code below).
The code:
import numpy as np
import pandas as pd
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
gum.config["notebook","default_graph_size"]="3!" #change default size for BN
def createDB(N: int):
    # code from Rafaela Medeiros
    np.random.seed(42)
    fib_list = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
    data = {"Requestor": np.random.randint(1, 4, N),
            "Size": np.random.randint(1, 4, N),
            "Risk": np.random.randint(1, 4, N)}
    df = pd.DataFrame(data)
    df['Developer'] = np.random.choice(fib_list, df.shape[0])
    df["Duration"] = (1*df["Requestor"] + 2*df["Size"] + 2*df["Risk"] + 5*df["Developer"])
    return df

def learnForSize(N: int):
    learner = gum.BNLearner(createDB(N))
    learner.useMIIC()  # choosing this algorithm to learn the structure
    bn = learner.learnBN()
    return bn

sizes = [100, 5000, 55000]
gnb.flow.row(*[learnForSize(N) for N in sizes],
             captions=[f"{size=}" for size in sizes])
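As a minimal sketch of that inference, assuming the bn returned by learnForSize (pyAgrum's LazyPropagation is an exact inference engine):
# Query the marginal posterior of Duration in the learned BN.
bn = learnForSize(55000)
ie = gum.LazyPropagation(bn)
ie.makeInference()
print(ie.posterior("Duration"))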
Can someone help me translate this R t.test call to Python?
R code:
t.test(y, mu = 85, paired = FALSE, var.equal = TRUE, alternative = "greater")
You are testing a single sample y against a population mean mu, so the corresponding function from SciPy is scipy.stats.ttest_1samp. When a second sample is not given to t.test, var.equal and paired are irrelevant, so the only other parameter to deal with is alternative, and the SciPy function also takes an alternative parameter. So the Python code is
from scipy.stats import ttest_1samp
result = ttest_1samp(y, mu, alternative='greater')
Note that ttest_1samp returns the t statistic (result.statistic) and the p-value (result.pvalue).
For example, here is a calculation in R:
> x = c(3, 1, 4, 1, 5, 9)
> result = t.test(x, mu=2, alternative='greater')
> result$statistic
t
1.49969
> result$p.value
[1] 0.09699043
Here's the corresponding calculation in Python
In [14]: x = [3, 1, 4, 1, 5, 9]
In [15]: result = ttest_1samp(x, 2, alternative='greater')
In [16]: result.statistic
Out[16]: 1.499690178660333
In [17]: result.pvalue
Out[17]: 0.0969904256712105
You may find this blog post useful: https://www.reneshbedre.com/blog/ttest.html
Below is an example of the conversion with the bioinfokit package, but you can use the SciPy one as well.
# Perform one sample t-test using bioinfokit,
# Doc: https://github.com/reneshbedre/bioinfokit
from bioinfokit.analys import stat
from bioinfokit.analys import get_data
df = get_data("t_one_samp").data  # replace this with your data file
res = stat()
res.ttest(df=df, test_type=1, res='size', mu=5, evar=True)
print(res.summary)
Output:
One Sample t-test
------------------ --------
Sample size 50
Mean 5.05128
t 0.36789
Df 49
P-value (one-tail) 0.35727
P-value (two-tail) 0.71454
Lower 95.0% 4.77116
Upper 95.0% 5.3314
------------------ --------
I'm working in R to estimate a non-parametric regression. The complete project: https://systematicinvestor.wordpress.com/2012/05/22/classical-technical-patterns
My R code is the following, relying on the sm package's h.select and sm.regression.
library(sm)
y = as.vector( last( Cl(data), 190) )
t = 1:length(y)
h = h.select(t, y, method = 'cv')
temp = sm.regression(t, y, h=h, display = 'none')
I would now like to do the same in Python. I managed to set up the data (see below) but do not know how to select the smoothing parameter and estimate the non-parametric regression.
import pandas as pd
import datetime
import pandas_datareader.data as web
from pandas import Series, DataFrame
start = datetime.datetime(1970, 1, 1)
end = datetime.datetime(2020, 3, 24)
df = web.DataReader("^GSPC", 'yahoo', start, end)
y = df['Close'].tail(190).values
t = list(range(1, len(y) + 1))
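A possible Python counterpart is statsmodels' KernelReg, which supports least-squares cross-validated bandwidth selection. This is a sketch under the assumption that a local-linear fit is an acceptable stand-in for sm.regression, not an exact equivalent:
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

# Local-linear kernel regression with the bandwidth chosen by
# least-squares cross-validation (roughly h.select(..., method='cv')).
t_arr = np.asarray(t, dtype=float)
kr = KernelReg(endog=y, exog=t_arr, var_type='c', reg_type='ll', bw='cv_ls')
y_hat, _ = kr.fit(t_arr)  # smoothed values over the observed grid
print(kr.bw)              # the selected bandwidth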
UCLA has this great site for statistical tests
https://stats.idre.ucla.edu/r/whatstat/what-statistical-analysis-should-i-usestatistical-analyses-using-r/#1sampt
but the code is all in R. I am trying to convert the code to Python equivalents, but it is not a straightforward process for some tests, such as the chi-square goodness of fit. Here is the R version:
hsb2 <- within(read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv"), {
    race <- as.factor(race)
    schtyp <- as.factor(schtyp)
    prog <- as.factor(prog)
})
chisq.test(table(hsb2$race), p = c(10, 10, 10, 70)/100)
My Python attempt is this:
import numpy as np
import pandas as pd
from scipy import stats
df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
# convert to category
df["race"] = df["race"].astype("category")
t_race = pd.crosstab(df.race, columns='race')
p_tests = np.array((10, 10, 10, 70))
p_tests = p_tests / 100
# tried this
stats.chisquare(t_race, p_tests)
# and this
stats.chisquare(t_race.T, p_tests)
but neither stats.chisquare call produces output anywhere close to the R version. Can anybody steer me in the right direction? TIA
chisq.test takes a vector of probabilities; stats.chisquare takes the expected frequencies (docs).
> results = chisq.test(c(24, 11, 20, 145), p=c(0.1, 0.1, 0.1, 0.7))
> results
Chi-squared test for given probabilities
data: c(24, 11, 20, 145)
X-squared = 5.028571429, df = 3, p-value = 0.169716919
vs.
In [49]: obs = np.array([24, 11, 20, 145])
In [50]: prob = np.array([0.1, 0.1, 0.1, 0.7])
In [51]: stats.chisquare(obs, f_exp=obs.sum() * prob)
Out[51]: Power_divergenceResult(statistic=5.0285714285714285, pvalue=0.16971691923343338)
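Applied to the hsb2 data from the question, the same correction looks like this (a sketch reusing the question's df, np, and stats imports):
# Observed race counts and hypothesised probabilities, converted to the
# observed/expected frequencies that stats.chisquare expects.
obs = df["race"].value_counts().sort_index().to_numpy()
p_tests = np.array([10, 10, 10, 70]) / 100
stats.chisquare(obs, f_exp=obs.sum() * p_tests)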
I have pulled the following data from a .csv file (dataBoth.csv) and performed k-means clustering, visualising the result with matplotlib. The data has 3 columns (Country, Birth Rate, Life Expectancy).
I need help to output:
The number of countries belonging to each cluster.
The list of countries belonging to each cluster.
The mean Life Expectancy and Birth Rate for each cluster.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
plt.ion()
# K-Means clustering implementation
# data = set of data points
# K = number of clusters
# maxIters = maximum number of iterations executed by k-means
def kMeans(data, K, maxIters=10, plot_progress=None):
    # start from K centroids picked at random from the data points
    centroids = data[np.random.choice(np.arange(len(data)), K), :]
    for i in range(maxIters):
        # Cluster-assignment step: index of the nearest centroid per point
        C = np.array([np.argmin([np.dot(x_i - y_k, x_i - y_k)
                                 for y_k in centroids]) for x_i in data])
        # Move-centroids step: mean of the points assigned to each cluster
        centroids = [data[C == k].mean(axis=0) for k in range(K)]
        if plot_progress is not None:
            plot_progress(data, C, np.array(centroids))
    return np.array(centroids), C
# Assigns each data point to its nearest centroid (euclidean distance),
# building a dict mapping centroid index -> list of member points.
def euclidean_dist(data, centroids, clusters):
    for instance in data:
        mu_index = min([(i, np.linalg.norm(instance - centroid))
                        for i, centroid in enumerate(centroids)],
                       key=lambda t: t[1])[0]
        try:
            clusters[mu_index].append(instance)
        except KeyError:
            clusters[mu_index] = [instance]
    # If any cluster is empty then assign it one random data point,
    # so as not to have empty clusters and undefined means.
    for key in clusters:
        if not clusters[key]:
            clusters[key].append(data[np.random.randint(0, len(data))].flatten().tolist())
    return clusters
# this function reads the data from the specified file
def csvRead(file):
    return np.genfromtxt(file, delimiter=',')
# function to show the results on the screen in the form of 3 clusters
def show(X, C, centroids, keep=False):
    import time
    time.sleep(0.5)
    plt.cla()
    plt.plot(X[C == 0, 0], X[C == 0, 1], '*b',
             X[C == 1, 0], X[C == 1, 1], '*r',
             X[C == 2, 0], X[C == 2, 1], '*g')
    plt.plot(centroids[:, 0], centroids[:, 1], '*m', markersize=20)
    plt.draw()
    if keep:
        plt.ioff()
        plt.show()
# generate 3-cluster data (note: the demo below runs on these synthetic
# Gaussian clusters; the CSV data is loaded but not used further)
data = csvRead('dataBoth.csv')
m1, cov1 = [9, 8], [[1.5, 2], [1, 2]]
m2, cov2 = [5, 13], [[2.5, -1.5], [-1.5, 1.5]]
m3, cov3 = [3, 7], [[0.25, 0.5], [-0.1, 0.5]]
data1 = np.random.multivariate_normal(m1, cov1, 250)
data2 = np.random.multivariate_normal(m2, cov2, 180)
data3 = np.random.multivariate_normal(m3, cov3, 100)
X = np.vstack((data1, data2, data3))
np.random.shuffle(X)
# calls to the functions
# first to find centroids using k-means
centroids, C = kMeans(X, K=3, plot_progress=show)
# second to show the centroids on the graph
show(X, C, centroids, True)
Maybe you can use annotate:
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.annotate
More examples:
http://matplotlib.org/users/annotations.html#plotting-guide-annotation
This will let you place a text label near each point, for instance as sketched below.
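A sketch, where countries is assumed to be the list of country names aligned with the rows of X:
# Put each country's name next to its point on the scatter plot.
for name, (bx, by) in zip(countries, X[:, :2]):
    plt.annotate(name, xy=(bx, by), fontsize=6)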
Or you can use colours, as in this post.
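Separately, once kMeans has returned centroids and the assignment array C, the three summaries the question asks for can be computed along these lines (again a sketch: countries is the assumed name list, and the two columns of X are Birth Rate and Life Expectancy):
for k in range(3):
    members = np.where(C == k)[0]  # row indices of the points in cluster k
    print(f"Cluster {k}: {len(members)} countries")
    print("Countries:", ", ".join(countries[i] for i in members))
    print("Mean Birth Rate:", X[members, 0].mean())
    print("Mean Life Expectancy:", X[members, 1].mean())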