Loading MNIST data in Keras [closed] - python

The program below should import the MNIST data from Keras, but the imported data consists only of zeros. I have tried it on remote servers and it has the same issue. Does anyone know why?
import tensorflow as tf
from keras.datasets import mnist
import numpy as np
from tempfile import TemporaryFile

(x_train, y_train), (x_test, y_test) = mnist.load_data()
data = np.concatenate((x_train, x_test), axis=0)

MNIST consists of grayscale images of the 10 digits. The black background is represented by zeros and the white digit strokes by integers from 1 to 255 (255 means white).
You probably printed x_train and saw only zeros in the printed portion of the array, but that is not the case for the whole array.
Try:
import numpy as np
print(np.mean(x_train))
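If you want a further check (a quick sketch, assuming the x_train loaded above): NumPy truncates large arrays when printing, and the image borders are all zeros, so the visible corners look empty. Summary statistics confirm the nonzero pixels:
print(np.max(x_train))            # 255 for MNIST
print(np.count_nonzero(x_train))  # millions of nonzero pixel values
print(x_train[0, 5:25, 5:25])     # a central crop of the first image shows nonzero entries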

Related

Simple regression analysis by dividing the assigned data set into a 30% testing and 70% training set; fit a linear regression line to the model [closed]

Please help; I am unable to split the data into the two sets. Should the split be done after importing the data or before?
You can use train_test_split from scikit-learn to split your dataset.
Try this:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
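To then fit the regression line, here is a minimal sketch, assuming X (features) and y (target) are already loaded as arrays:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 70% training / 30% testing, matching the assignment
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)         # fit the line on the training set
print(model.score(X_test, y_test))  # R^2 on the held-out 30%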

What's the best way to process strings for inputs in Keras? [closed]

I have a dataset where name is an important feature. I want to use it as an input node in my Keras neural network in Python. But since this is not possible directly, what's the best way to do it?
I have tried one-hot encoding, but since the length of the name is not fixed, it is not useful.
You can use embeddings, which translate large sparse (one-hot encoded) vectors into a lower-dimensional space that preserves semantic relationships. The categorical feature then gets a dense vector representation.
import tensorflow as tf
import numpy as np

unique_amount = len(np.unique(col1))  # vocabulary size of the categorical column
input_1 = tf.keras.layers.Input(shape=(1,), name='input_1')
embedding_1 = tf.keras.layers.Embedding(unique_amount, 50, trainable=True)(input_1)
col1_embedding = tf.keras.layers.Flatten()(embedding_1)
Here 50 is the size of the embedding vector, which you can choose yourself.
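To make this end-to-end, here is a sketch under assumptions: name_ids (integer-encoded names) and y (binary labels) are hypothetical variables, not from the original answer:
dense = tf.keras.layers.Dense(32, activation='relu')(col1_embedding)
output = tf.keras.layers.Dense(1, activation='sigmoid')(dense)
model = tf.keras.Model(inputs=input_1, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(name_ids, y, epochs=5)  # name_ids and y are hypothetical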
You can also try character-level one-hot encoding in Keras, as follows. Make sure you set the char_level=True flag in Tokenizer. This yields a much lower-dimensional sparse matrix.
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(char_level=True)  # tokenize at the character level
tokenizer.fit_on_texts(<names>)         # list of name strings
sequence_of_int = tokenizer.texts_to_sequences(<dataset_names>)
You could even try a frequency-based character encoding of your own.
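Since the names have variable length, the integer sequences need padding before they can feed a fixed-shape input layer. A minimal sketch with hypothetical data:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

names = ["alice", "bob", "charlotte"]  # hypothetical name strings
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(names)
seqs = tokenizer.texts_to_sequences(names)
padded = pad_sequences(seqs, maxlen=10, padding='post')  # pad/truncate to length 10
print(padded.shape)  # (3, 10)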

How do I load MNIST data into a Google Colab Jupyter Notebook? [closed]

I've got a working environment for running Keras on Google Colab, but cannot figure out how to load MNIST data so I can import it into my program. Any suggestions greatly appreciated!
Keras has built-in common datasets and MNIST is one of them:
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
So if you have Keras on Colab, you should also have MNIST ready to go.
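In a Colab cell you can sanity-check the shapes (these are the documented MNIST shapes):
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)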

NLTK: feature reduction after vectorization [closed]

I have extracted unstructured textual data from approximately 3000 documents and I am attempting to use this data to classify the documents.
However, even after removing stopwords & punctuation and lemmatizing the data, the count vectorization produces more than 64000 features.
A lot of these features contain unnecessary tokens like random numbers and text in different languages.
The libraries I have used are:
tokenization: Punkt (NLTK)
POS tagging: Penn Treebank (NLTK)
lemmatization: WordNet (NLTK)
vectorization: CountVectorizer (scikit-learn)
Can anyone suggest how I can reduce the number of features for training my classifier?
You have two options here, which can be complementary:
Tighten your tokenization with stronger rules, using regexes to remove numbers and other tokens you are not interested in (see the sketch after the code below).
Use feature selection to keep the subset of your features that is most relevant for the classification. Here is a demo snippet that keeps 50% of the features:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import chi2

iris = load_iris()
X, y = iris.data, iris.target
selector = SelectPercentile(score_func=chi2, percentile=50)  # keep the top 50% of features
X_reduced = selector.fit_transform(X, y)
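And a sketch of the first option, assuming two hypothetical document strings; token_pattern and min_df are standard CountVectorizer parameters:
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Revenue grew 12% in 2019", "voir le rapport annuel"]  # hypothetical documents
# Keep only alphabetic tokens of 3+ characters, which drops numbers and stray symbols.
# On a real 3000-document corpus, raising min_df (e.g. min_df=5) would also drop
# rare tokens such as foreign-language words that occur in only a few documents.
vectorizer = CountVectorizer(token_pattern=r'\b[a-zA-Z]{3,}\b', min_df=1)
X = vectorizer.fit_transform(docs)
print(X.shape)  # (2, number of surviving tokens)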

Using Scikit-learn on a csv dataset [closed]

How do I apply scikit-learn to a NumPy array with 4 columns, each representing a different attribute?
Basically, I want to train it to recognize a healthy patient from these 4 characteristics and then see if it can identify an abnormal one.
Thanks in advance!
A pipeline usually has the following steps:
Define a classifier/regressor
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
Fit the data
clf.fit(X_train,y_train)
Here X_train will be your four-column features and y_train will be the labels indicating whether the patient is healthy.
Predict on new data
y_pred = clf.predict(X_test)
This tutorial is a great starting point for you to get a basic idea of the pipeline.
Look into the pandas package, which lets you import CSV files into a DataFrame. pandas integrates well with scikit-learn.
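Putting the two answers together, a minimal sketch; the file name and column names are hypothetical:
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split

df = pd.read_csv('patients.csv')                     # hypothetical CSV file
X = df[['attr1', 'attr2', 'attr3', 'attr4']].values  # hypothetical feature columns
y = df['healthy'].values                             # hypothetical label column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)
print(clf.predict(X_test[:5]))  # predictions for five held-out patients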
