'list' object has no attribute 'matmul' - python

I have the code below to compute Markov chain iterations. There are two matrices: the current state matrix and the transitional matrix. Given the number of iterations (multiplications between the two matrices), the code should save the resulting state matrix after each iteration and use it as the input for the next one, and so on. When I run the code, I get this error:
AttributeError: 'list' object has no attribute 'matmul'
I'm working with NumPy version 1.17. How can I solve it?
import numpy as np
transitionalMatrix = ([0.42, 0.16, 0.36, 0.02 ],[0.05, 0.43, 0.04, 0.11 ], [0.24, 0.16, 0.51 , 0.04 ], [0.01, 0.31, 0.01, 0.59 ])
stateMatrix = ([0.20461531, 0.26104588, 0.19799357, 0.14561973])
maxIterations = 6
res = [stateMatrix]
for iteration in range(1, maxIterations):
    prev = res[iteration - 1]
    res.append(prev.matmul(transitionalMatrix))

As the error says, you are trying to apply matmul to a list, which doesn't have any such attribute. Assuming that what you want to use is np.matmul(), what you should be doing is:
res.append(np.matmul(prev, transitionalMatrix))
However, as Prune pointed out, the lack of a minimal, reproducible example makes it impossible to help you any further.
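For reference, a minimal sketch of how the corrected loop might look, assuming both matrices are meant to be NumPy arrays (values copied from the question):
import numpy as np

transitionalMatrix = np.array([[0.42, 0.16, 0.36, 0.02],
                               [0.05, 0.43, 0.04, 0.11],
                               [0.24, 0.16, 0.51, 0.04],
                               [0.01, 0.31, 0.01, 0.59]])
stateMatrix = np.array([0.20461531, 0.26104588, 0.19799357, 0.14561973])
maxIterations = 6

res = [stateMatrix]
for iteration in range(1, maxIterations):
    prev = res[iteration - 1]
    # np.matmul is a module-level function, not a method of the list/array
    res.append(np.matmul(prev, transitionalMatrix))
np.matmul also accepts a plain list for prev, but converting everything to NumPy arrays up front keeps the shapes explicit.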

Related

Getting ValueError when expanding my GMMHMM from two to three states

I am trying to expand my GMMHMM model from two to three states but get the error below:
"ValueError: startprob_ must sum to 1 (got nan)"
It looks like it is saying that my initial state distribution does not sum to one, but it does (see Pi). Furthermore, I get the following warning, which might have something to do with it:
"UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1."
If I look into it, I can also see that my state transition matrix returns nan values.
import numpy as np
from hmmlearn.hmm import GMMHMM
import pandas as pd
Pi = np.array([0.24, 0.37, 0.39])
A = np.array([[0.74, 0.20, 0.06],
              [0.20, 0.53, 0.27],
              [0.05, 0.40, 0.54]])
model = GMMHMM(n_components=3, n_mix=1, startprob_prior=Pi, transmat_prior=A,
               min_covar=0.001, tol=0.0001, n_iter=10000)
Obs = df[['gdp','un','inf','inx','itr']].to_numpy()
print(Obs)
model.fit(Obs)
print(model.transmat_)
seq = model.decode(Obs)
print(seq)
I am not a very experienced Python programmer, so this might be an easy fix, but unfortunately I do not see how. Any help would be highly appreciated!

Possibility of pymoo working within a candidate search space

I have a problem with two objective functions, three variables, and zero constraints.
I also have a search space for these variables, read from a CSV file.
Is it possible to have pymoo use that search space of variables (instead of xl and xu) to get the best combination of them that maximizes the two functions?
class MyProblem(Problem):

    def __init__(self):
        super().__init__(n_var=3,
                         n_obj=2,
                         n_constr=0,
                         # I want to use the search space of the three variables (I already have)
                         xl=np.array([0.0, 0.0, 0.0]),
                         xu=np.array([1.0, 1.0, 1.0]))

    def _evaluate(self, X, out, *args, **kwargs):
        # Maximizing the triangle area of the three variables
        f1 = -1 * (0.5 * math.sin(120) * (X[:, 0] * X[:, 1] + X[:, 2] * X[:, 1] + X[:, 0] * X[:, 2]))
        # Maximizing the sum of the variables
        f2 = -1 * (X[:, 0] + X[:, 1] + X[:, 2])
        out["F"] = np.column_stack([f1, f2])

problem = MyProblem()
When I use xl and xu, it always returns the combination of ones [1.0, 1.0, 1.0], but I want to get the best combination out of my NumPy multi-dimensional array.
import csv
with open("sample_data/dimensions.csv", 'r') as f:
    dimensions = list(csv.reader(f, delimiter=","))
import numpy as np
dimensions = np.array(dimensions[1:])
dimensions = np.array(dimensions[:, 1:], dtype=np.float)
dimensions
that looks like the following:
array([[0.27 , 0.45 , 0.23 ],
[0. , 0.23 , 0.09 ],
[0.82 , 0.32 , 0.27 ],
[0.64 , 0.55 , 0.32 ],
[0.77 , 0.55 , 0.36 ],
[0.25 , 0.86 , 0.18 ],
[0. , 0.68 , 0.09 ],...])
Thanks for your help!
Have you tried sampling with a numpy.array?
class pymoo.algorithms.nsga2.NSGA2(self, pop_size=100, sampling=numpy.array)
where (from pymoo API)
The sampling process defines the initial set of solutions which are the starting point of the optimization algorithm. Here, you have three different options by passing
(i) A Sampling implementation which is an implementation of a random sampling method.
(ii) A Population object containing the variables to be evaluated initially OR already evaluated solutions (F needs to be set in this case).
(iii) Pass a two dimensional numpy.array with (n_individuals, n_var) which contains the variable space values for each individual.
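A minimal sketch of option (iii), assuming a pymoo version where NSGA2 is importable from pymoo.algorithms.nsga2 (as in the quoted signature; newer releases moved it to pymoo.algorithms.moo.nsga2), using the dimensions array loaded from the CSV above together with the MyProblem class from the question:
import numpy as np
from pymoo.algorithms.nsga2 import NSGA2
from pymoo.optimize import minimize

# dimensions: the (n_individuals, 3) candidate array read from the CSV
algorithm = NSGA2(pop_size=len(dimensions), sampling=dimensions)

res = minimize(MyProblem(), algorithm, ("n_gen", 100), verbose=True)
print(res.X)  # variable combinations on the Pareto front
print(res.F)  # corresponding (negated) objective values
Note that this only seeds the initial population with the candidate rows; crossover and mutation can still generate values outside that set unless the operators are restricted as well.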

Scikit FDA - Landmark_registration Problem

After a smoothing procedure, I have a problem with the landmark registration in this line:
skfda.preprocessing.registration.landmark_registration_warping(fd, land)
It returns the following error:
ValueError: `x` must be strictly increasing sequence.
fd is a FDataGrid (the typical data type used to represent the functions) with 5 samples, while land is an array of landmarks that I want to align; it is an increasing sequence of points (see below).
land <- array([[[0.1 , 0.134, 0.258, 0.292, 0.328, 0.558, 0.602],
[0.1 , 0.126, 0.23 , 0.256, 0.292, 0.454, 0.474],
[0.1 , 0.148, 0.25 , 0.278, 0.34 , 0.514, 0.568],
[0.1 , 0.116, 0.25 , 0.276, 0.298, 0.508, 0.612],
[0.1 , 0.132, 0.258, 0.286, 0.376, 0.59 , 0.648]]])
fd <- (FDataGrid output not shown)
Can somebody help me? I'm using the scikit-fda package to perform this kind of analysis. This is the link to the function that I'm using:
https://fda.readthedocs.io/en/latest/modules/preprocessing/autosummary/skfda.preprocessing.registration.landmark_registration.html#skfda.preprocessing.registration.landmark_registration
I had this error when finding my own landmarks. I forgot to pass in the actual domain value at that point (in my case the peak(s) I wanted to align). Once I did that, my error changed to: ValueError: Sample points must be within the domain range. Which brings me to my next point:
Manually specifying the end result landmark locations allowed the code to run, and from what I can tell "work." I'm not sure if this is a bug, or if I am doing something wrong myself. However, the examples they provide do explicitly state that the end result landmark locations shouldn't have to be specified.
Additionally, the end result landmark locations do not seem to end up at the specified points. They end up at the closest point in the grid_points array. This may not be noticeable, or a problem, for high sample rate data, but the demo GAIT data scikit-fda provides has only 20 sample points, so it is clearly visible that the landmarks do not go exactly where specified. This is also the case when converting to a basis representation. One could possibly toy around with the interpolation options and see if it helps.
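For what it's worth, a hedged sketch of the workaround described above, assuming the installed skfda version accepts a location keyword for the target landmark positions (check the API page linked in the question); land is the array from the question with its leading singleton axis dropped:
import numpy as np
import skfda

landmarks = land[0]  # shape (5 samples, 7 landmarks)
# Hand-picked target locations, one per landmark; they must lie inside
# fd's domain range. Here: the mean position of each landmark.
target = np.mean(landmarks, axis=0)

fd_registered = skfda.preprocessing.registration.landmark_registration(
    fd, landmarks, location=target)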

Error when calculating percent error in python

So I have a neural network and I am trying to calculate the percent error.
for i in range(len(y_test_predicted)):
    difference = np.array(abs(y_test_predicted[i] - y_test_unscaled[i]))
    print("Difference: ", difference)
    error = np.array(difference / y_test_predicted[i])
    print("error: ", error)
    print("---------------")
av_error = np.mean(error)
av_per_error = av_error * 100
I have predicted values and actual values. I take the absolute value of their difference and divide by the predicted value. However, the error array is only a single value; it gets overwritten each time the loop iterates. I tried using
error[i] = np.array(difference / y_test_predicted[i])
but it throws an error saying the index is out of bounds. I also tried hard-coding the problem to avoid using arrays by just keeping a running sum of all the error values, but it keeps returning NaN for some reason.
Assuming that y_test_predicted and y_test_unscaled are numpy arrays, you can use numpy's vectorised operators and avoid the for loop entirely, like so:
difference = np.abs(y_test_predicted - y_test_unscaled)
error = difference / y_test_predicted
av_error = np.mean(error)
For instance:
>>> import numpy as np
>>> y_test_unscaled = np.array([0.11, 0.63, 0.44, 0.54, 0.65])
>>> y_test_predicted = np.array([0.1, 0.5, 0.3, 0.5, 0.7])
>>> difference = np.abs(y_test_predicted - y_test_unscaled)
>>> error = difference / y_test_predicted
>>> av_error = np.mean(error)
>>> av_error
0.19561904761904764
If you're hellbent on using a loop, then the error you're getting is probably because error is the wrong shape (though I can't tell that for sure as it's not included in your question). Something like:
error = np.zeros(y_test_predicted.shape)
before your loop would probably resolve it -- this pre-allocates an array which is the same shape as y_test_predicted.
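For completeness, a sketch of the loop-based version with the pre-allocated array, under the same assumption that y_test_predicted and y_test_unscaled are NumPy arrays of equal shape:
error = np.zeros(y_test_predicted.shape)  # one slot per test sample
for i in range(len(y_test_predicted)):
    difference = np.abs(y_test_predicted[i] - y_test_unscaled[i])
    error[i] = difference / y_test_predicted[i]

av_per_error = np.mean(error) * 100  # average percent error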

Proximity Matrix in sklearn.ensemble.RandomForestClassifier

I'm trying to perform clustering in Python using Random Forests. In the R implementation of Random Forests, there is a flag you can set to get the proximity matrix. I can't seem to find anything similar in the python scikit version of Random Forest. Does anyone know if there is an equivalent calculation for the python version?
We don't implement proximity matrix in Scikit-Learn (yet).
However, this could be done by relying on the apply function provided in our implementation of decision trees. That is, for all pairs of samples in your dataset, iterate over the decision trees in the forest (through forest.estimators_) and count the number of times they fall in the same leaf, i.e., the number of times apply gives the same node id for both samples in the pair.
Hope this helps.
Based on Gilles Louppe's answer, I have written a function. I don't know how efficient it is, but it works. Best regards.
import numpy as np

def proximityMatrix(model, X, normalize=True):
    # leaf index of every sample in every tree, shape (n_samples, n_trees)
    terminals = model.apply(X)
    nTrees = terminals.shape[1]

    # for each pair of samples, count the trees in which they share a leaf
    a = terminals[:, 0]
    proxMat = 1 * np.equal.outer(a, a)
    for i in range(1, nTrees):
        a = terminals[:, i]
        proxMat += 1 * np.equal.outer(a, a)

    if normalize:
        proxMat = proxMat / nTrees

    return proxMat
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
train = load_breast_cancer()
model = RandomForestClassifier(n_estimators=500, max_features=2, min_samples_leaf=40)
model.fit(train.data, train.target)
proximityMatrix(model, train.data, normalize=True)
## array([[ 1. , 0.414, 0.77 , ..., 0.146, 0.79 , 0.002],
## [ 0.414, 1. , 0.362, ..., 0.334, 0.296, 0.008],
## [ 0.77 , 0.362, 1. , ..., 0.218, 0.856, 0. ],
## ...,
## [ 0.146, 0.334, 0.218, ..., 1. , 0.21 , 0.028],
## [ 0.79 , 0.296, 0.856, ..., 0.21 , 1. , 0. ],
## [ 0.002, 0.008, 0. , ..., 0.028, 0. , 1. ]])
There is nothing currently implemented for this in Python. I took a first try at it here. It would be great if somebody were interested in adding these methods to scikit-learn.
