How do you fix inconsistent numbers of samples when using GaussianNB()?

How do you fix inconsistent numbers of samples when using GaussianNB()? Also, is it possible to pass a pandas DataFrame as an argument to the model.fit function?

The issue is that GaussianNB is expecting weather to be in the shape (n_samples, n_features). You currently have it as a one-dimensional array, so GaussianNB is interpreting it as one sample with 14 features.
To convert to the right shape, you can use weather[:,None] as described in this answer. So, the following should do the trick:
model.fit(weather[:,None], play)
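As a minimal sketch (the weather/play data below is made up, standing in for the variables in the question): reshaping the 1D array, or selecting a single-column DataFrame, both give the (n_samples, n_features) shape that scikit-learn expects, and a pandas DataFrame can be passed to model.fit directly since it is already 2D.
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# hypothetical encoded data, standing in for the question's variables
weather = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1])  # shape (14,)
play = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1])

model = GaussianNB()

# reshape the 1D array into a single-feature column: shape (14, 1)
model.fit(weather.reshape(-1, 1), play)

# a single-column pandas DataFrame also works, since it is already 2D
X = pd.DataFrame({"weather": weather})
model.fit(X, play)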


Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 784 but received input with shape (None, 14)

Please help me out with this. Thank you.
The picture here has more info on the code: https://imgur.com/gallery/Oppnaq7
So bear with me... Also, please help if you know the solution. Thank you.
import numpy as np
import pandas as pd
import tensorflow as tf
from google.colab import files
uploaded = files.upload()
import io
df=pd.read_csv(io.BytesIO(uploaded['heart.csv']))
df
df.isna().sum(axis="rows")
from tensorflow.keras.utils import to_categorical
df.shape
y=df["cp"]
x=df.drop("cp",axis="columns")
y=to_categorical(y)
y.shape
x=pd.get_dummies(x,columns=["sex"])
x
import matplotlib.pyplot as plt
df.hist(figsize=(10,10))
plt.show()
mnist = tf.keras.datasets.mnist
(x_train,y_train), (x_test,y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
from keras.layers import Dense , Flatten
from keras.models import Sequential
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28,28)),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss_fn,optimizer="adam",metrics=["accuracy"])
model.fit(x_train,y_train,epochs=1000)
x
data=pd.DataFrame({"age":[50],"trestbps":[120],"chol":[350],"fbs":[1],"restecg":[1],"thalach":[150],"exang":[0],"oldpeak":[1.5],"slope":[1],"ca":[0],"thal":[2],"target":[1],"sex_0":[0],"sex_1":[1]})
data
model.predict(data)
My teacher used these commands in Jupyter, but it doesn't work for me; it just shows an error. I tried it in both Colab and Jupyter...
from keras.utils import to_categorical
y=to_categorical(y)
y.shape
X=X.drop(["PassengerId","Name","Ticket"],axis="columns")
X
X=pd.get_dummies(X,columns=["Sex"])
from keras.layers import Dense
from keras.models import Sequential
model=Sequential()
model.add(Dense(32,activation="relu",input_shape=(7,)))
model.add(Dense(2,activation="softmax"))
model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
model.fit(X,y,epochs=10)
X
data=pd.DataFrame({"Pclass":[3],"Age":[84],"SibSp":[0],"Parch":[1],"Fare":[7],"Sex_female":[0],"Sex_male":[1]})
data
model.predict(data)
#Output array([[0.9702792 , 0.02972085]], dtype=float32)
Looking at your Python code, the error is telling you that the Dense layer expects input data with 784 features but received data with only 14 features.
I'm assuming you're using the predefined MNIST dataset in your code; its x_train contains 28x28 images, and the Flatten layer turns each image into a vector of 784 features, which is where the 784 comes from.
Now, to solve this problem, you have to understand how the data must be passed to your model. Looking at your teacher's code we have this:
from keras.layers import Dense
from keras.models import Sequential
model=Sequential()
model.add(Dense(32,activation="relu",input_shape=(7,)))
model.add(Dense(2,activation="softmax"))
So we have a Sequential model that has two layers:
The first Dense layer, which also works as the input layer of the model, is made of 32 units with the rectified linear unit (ReLU) as the activation function and, most importantly, its input shape is written as the tuple (7,). So what does (7,) mean? Basically, you're telling the model that your input data has 7 features, so each sample must contain exactly 7 values.
The second Dense layer is simply the output layer, which outputs a probability distribution over two classes. This is less important, since your problem is how the input data is passed.
Now, to understand how data is handled in Keras/TensorFlow, think of it as a matrix where the rows are the individual samples and the columns carry the meaning of the values. It is like an Excel/CSV file with a header that explains what each column means.
If you have ever used NumPy arrays, you know that the shape is the most important thing to check before feeding data to your model, because it tells you how the data is laid out and in how many dimensions it is represented. If your data has, for example, a shape of (784, 14), it is a two-dimensional (2D) matrix where the first value (784) is the number of rows and the second value (14) is the number of columns.
So, to answer your question: the program expects input with 784 features, which translates to 784 columns, but you passed data with only 14 features. To solve this, first check the shape of your data with x_train.shape and look at the output. Then pass the data to the model in the matching shape and the program will work. One way could be reshaping the data; another is to define the model with an input shape that matches your data.
For a better understanding of how Tensorflow handles the shape of the data look at this guide.
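As a minimal, hypothetical sketch (x and y are the variables from your heart.csv preprocessing; the shapes in the comments are examples, and this small model is only an illustration of matching input_shape to the data, not your teacher's model):
import tensorflow as tf

# x and y come from the question's preprocessing of heart.csv
print(x.shape)   # e.g. (303, 14) -> 14 features per sample
print(y.shape)   # e.g. (303, 4)  -> one-hot encoded classes

n_features = x.shape[1]
n_classes = y.shape[1]

model = tf.keras.models.Sequential([
    # the input shape must match the number of columns in x
    tf.keras.layers.Dense(32, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x, y, epochs=10)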
One more suggestion: write your question in a more structured way, for example:
Title of the problem.
Description of the problem.
Show the error you get.
Describe what you have tried, and add references to other questions similar to yours.
ALWAYS show at least one piece of code so people can analyze it; more pieces of code are also fine.
Optionally, describe the result you expect.
If you write your questions this way, people will understand your problem much better. I hope this helps.

efficiently converting for model.fit

I'm struggling to load data into model.fit efficiently. My code creates a training_data object with samples and values. samples is a standard Python list of tf.Tensor objects, and values is a list of integers.
When running
model.fit(training_data.samples, training_data.values, epochs=10)
I get an error
ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'tensorflow.python.framework.ops.EagerTensor'>"}), (<class 'list'> containing values of types {"<class 'int'>"})
I can get this to work by pre-converting it all to numpy arrays like this:
s, v = np.asarray(training_data.samples), np.asarray(training_data.values)
model.fit(s, v, epochs=10)
However, this is impossibly slow. Loading the data and the heavy preprocessing (signal chunking, FFT, etc.) take about a minute, but the data conversion alone, with 1800 samples, then hangs for an hour and I lose patience before the actual training starts. Each tensor's shape is (94, 257), so nothing big.
So what's an efficient way to pass data to model.fit, given that I already have it in memory?
Hi, this is just a suggestion, but try using a generator built on tf.keras.utils.Sequence; whether it fits also depends on what type of data you are using.
You can look at the example here:
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
I implemented my own here you can check it out:
https://github.com/edwin-19/custom_keras_generator/blob/master/notebooks/Model%20Comparison.ipynb
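A minimal sketch of such a generator, assuming training_data.samples is a list of equally-shaped tensors and training_data.values is a list of integer labels (names taken from the question):
import numpy as np
import tensorflow as tf

class InMemorySequence(tf.keras.utils.Sequence):
    """Serves already-loaded samples to model.fit in batches."""
    def __init__(self, samples, values, batch_size=32):
        self.samples = samples
        self.values = values
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.samples) / self.batch_size))

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        x = tf.stack(self.samples[lo:hi])    # shape (batch, 94, 257)
        y = np.asarray(self.values[lo:hi])   # shape (batch,)
        return x, y

# usage
# model.fit(InMemorySequence(training_data.samples, training_data.values), epochs=10)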
For me it's a common error when I use the wrong input format (typically when I pass a list).
Try to convert only training_data.values to a numpy array or a tensor if it's a list.
As pointed out in the documentation, the arguments (x, y, validation_data, ...) accept only a limited set of input types, i.e. NumPy arrays and tensors:
model.fit
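A small sketch of that idea, using the same training_data object and model from the question: stack the list of tensors into one tensor on the TensorFlow side (which may avoid the slow per-element conversion) and convert only the integer labels with NumPy.
import numpy as np
import tensorflow as tf

# tf.stack combines a list of equally-shaped tensors into one (n, 94, 257) tensor
x = tf.stack(training_data.samples)
# the labels are plain Python ints, so a cheap NumPy conversion is enough
y = np.asarray(training_data.values)

model.fit(x, y, epochs=10)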

Scipy.stats.anderson returns an array instead of a float as described in documentation

I am attempting to use the Anderson-Darling method to test whether my residuals follow a normal distribution, using scipy.stats.anderson. I'm using SciPy v1.0.0 and Python 3.5 in a Linux environment.
However, instead of returning a float for the test statistic as described in the documentation, what is returned is an array.
My input is a 1-dimensional numpy array. Below is the output.
Thanks for your help!
[Image: AndersonResult output from scipy.stats.anderson]
Try:
results = anderson(rez.flatten(), dist='norm')
I think the problem is that your rez has a shape of (n, 1) rather than (n,), which is what the anderson code expects for 1D arrays.
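A small sketch of the fix, with a made-up rez standing in for the residuals from the question:
import numpy as np
from scipy.stats import anderson

rez = np.random.normal(size=(100, 1))   # shape (n, 1), like the question's residuals

results = anderson(rez.flatten(), dist='norm')   # flatten() gives shape (n,)
print(results.statistic)                         # now a single float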

Cannot interpolate data using Rbf in Scipy

I was trying to interpolate data using Rbf. The output data that I need is actually a single value. So I used something like
x=numpy.array([100])
y=numpy.array([200])
d=numpy.array([300])
rbfi=scipy.interpolate.Rbf(x,y,d)
But there was an error:
ValueError: array must not contain infs or NaNs
Does anybody know how to solve this problem? Thanks a lot!
Quite outdated, but in case anyone wonders:
Rbf requires at least 2 data points.
import numpy
import scipy.interpolate

x=numpy.array([100,120])
y=numpy.array([200,220])
d=numpy.array([300,100])
rbfi=scipy.interpolate.Rbf(x,y,d)
>>> print(rbfi)
<scipy.interpolate.rbf.Rbf object at 0x7fdac142b240>

Using scikit learn's GaussianNB with nltk doesn't work

I am trying to use nltk's wrapper for scikit-learn's classifiers. I use this code to train the classifier:
classifier = SklearnClassifier(GaussianNB())
classifier.train(self.training_set)
Where training_set looks like
[({'name':'Alpha Hotel', 'clicks':765, 'zip_code':75025},'no bookings')]
The error I am getting is
TypeError: A sparse matrix was passed, but dense data is required. Use
X.toarray() to convert to a dense numpy array.
I don't know how to convert to a dense array, especially since nltk's documentation for the train method says it requires "a list of (featureset, label) where each featureset is a dict mapping strings to either numbers, booleans or strings."
You have three features, but only two of them are in numerical format. You should first convert the 'name' feature to a number. If the name variable is categorical, you can encode it in a meaningful manner as described here:
http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
I think your labels are also limited to a fixed set, so you can encode them too. The last step is really easy: you just need to convert the nltk format to NumPy array format. Read each feature in a loop and then insert your desired features into X (features) and Y (labels):
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
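A rough sketch of that conversion, using the training_set format from the question (the encoding choices here are illustrative, not the only option):
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

training_set = [({'name': 'Alpha Hotel', 'clicks': 765, 'zip_code': 75025}, 'no bookings')]

# DictVectorizer one-hot encodes string features and keeps numeric ones as-is
vec = DictVectorizer(sparse=False)   # sparse=False yields the dense array GaussianNB needs
X = vec.fit_transform([features for features, label in training_set])

# encode the string labels as integers
le = LabelEncoder()
Y = le.fit_transform([label for features, label in training_set])

GaussianNB().fit(X, Y)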
Maybe this is late, but it might help others who get the same problem (I ran into it yesterday).
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
As the error says, the data needs to be converted to a dense array, so I converted it exactly as suggested:
vector = vectorizer.transform(corpus).toarray()
So just adding .toarray() solves this problem.
;)
When I switch to MultinomialNB or BernoulliNB, neither of them raises an error, with or without .toarray().
Note: don't forget to fit and transform your text into a word representation (numeric values) first.
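For context, a minimal end-to-end sketch of that flow (the corpus and labels here are made up):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

corpus = ["great hotel near the beach", "no rooms available", "lovely stay"]   # hypothetical
labels = [1, 0, 1]

vectorizer = CountVectorizer()
vector = vectorizer.fit_transform(corpus).toarray()   # .toarray() makes the sparse matrix dense

clf = GaussianNB()
clf.fit(vector, labels)
print(clf.predict(vectorizer.transform(["beach hotel"]).toarray()))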
