Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 1 year ago.
It's my first time using Python and I have created a basic CSV file. How would I create a predictive model using Naive Bayes to classify whether the words are informative or uninformative?
Python has a very extensive machine learning library, scikit-learn.
As you are new to AI with Python, you should consider learning the basics first. If you already have, DataCamp's Naive Bayes guide would be a good resource to follow to build the classification model you want.
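For orientation, here is a minimal sketch of what such a model could look like in scikit-learn. The CSV layout (a `text` column and a `label` column) and the sample rows are assumptions made up for illustration, since the question does not show the actual file.

```python
import csv
import io

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical CSV content; in practice you would open your own file instead.
csv_data = io.StringIO(
    "text,label\n"
    "road closed due to flooding near the bridge,informative\n"
    "shelter open at the community center tonight,informative\n"
    "power outage reported across the north district,informative\n"
    "lol so bored right now,uninformative\n"
    "what a day haha,uninformative\n"
    "cannot wait for the weekend,uninformative\n"
)
rows = list(csv.DictReader(csv_data))
texts = [row["text"] for row in rows]
labels = [row["label"] for row in rows]

# CountVectorizer turns each text into word counts; MultinomialNB learns from those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

predictions = model.predict(["flooding reported near the shelter", "so bored haha"])
print(list(predictions))
```

With real data you would also hold out a test set (for example with `train_test_split`) to check how well the model generalizes.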
Closed 1 year ago.
I'm making an anti-scam bot and I want to predict whether a message is a scam message, but it throws a ValueError (could not convert string to float).
I am currently using the Decision Tree algorithm
Almost all models accept only numeric input (floats), not strings. There are multiple ways to convert strings to numbers (that is, to preprocess your data) before feeding them to your model as input.
You can try, for example, the Word2Vec model from Gensim.
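As a rough illustration of the preprocessing step, the sketch below converts messages to numbers before they reach the decision tree. Note that it uses scikit-learn's `TfidfVectorizer` rather than Word2Vec, simply because it plugs directly into a pipeline; the messages and the `scam`/`ham` labels are made-up placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data; replace with your own labelled messages.
messages = [
    "free nitro click this link to claim your gift",
    "you won a prize send your password to claim",
    "urgent verify your account at this link now",
    "are we still meeting for lunch tomorrow",
    "can you review my pull request when you have time",
    "the game starts at eight see you there",
]
labels = ["scam", "scam", "scam", "ham", "ham", "ham"]

# The vectorizer maps each string to a numeric TF-IDF vector,
# so the tree never sees raw strings and the ValueError does not occur.
model = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
model.fit(messages, labels)

new_prediction = model.predict(["claim your free gift at this link"])[0]
print(new_prediction)
```

Because the vectorizer is part of the pipeline, you call `model.predict` on raw strings and the same conversion is applied automatically.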
Closed 2 years ago.
I'm using the following snippet of code:
The function test_submodels calculates the r^2 test score of each submodel, tosses out the bad ones (in this case only the svm model), and returns the new list model_names. Then I calculate the r^2 score of my stacked regressor, which turns out to be awful. The output of this code can be seen below:
Here is some more clarification regarding the submodels. They are created as follows:
I ended up fixing the problem: I had to define the final estimator in the stacking regressor, for example like this:
This improves the stacking score to roughly 0.9.
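The original snippets are not shown here, so the following is only a sketch of what setting the final estimator explicitly can look like. The submodels, the synthetic data, and the choice of `LinearRegression` as final estimator are all assumptions for illustration; if `final_estimator` is left unset, scikit-learn's `StackingRegressor` falls back to `RidgeCV`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the (unshown) original dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical submodels; the question's own submodels are not shown.
estimators = [
    ("ridge", Ridge(alpha=1.0)),
    ("tree", DecisionTreeRegressor(max_depth=5, random_state=0)),
]

# The final estimator is trained on the cross-validated predictions of the submodels.
stack = StackingRegressor(
    estimators=estimators,
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

stack_r2 = stack.score(X_test, y_test)
print(f"Stacked r^2 on the test set: {stack_r2:.3f}")
```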
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
We have a requirement where we receive different types of documents from clients, such as student admission documents, marksheets, etc., and we want to create an algorithm that identifies which type a document is. For this we chose some specific keywords per document type: admission documents have keywords like fee, admission, etc., and marksheet documents have keywords like marks, grade, etc. So we can predict the document type by comparing keyword frequencies.
Which algorithm should I implement for this requirement? I was planning to implement the multinomial naive Bayes algorithm, but I cannot fit my data to it.
FYI, I am using the Python sklearn module.
Can anyone please tell me which algorithm would be suitable for this requirement? If possible, can you also provide an example with code so that I can easily figure out the solution?
You are looking for a topic modeling solution, and there are plenty of them that solve this problem. For Python and scikit-learn, I recommend you take a look at this article.
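For reference, the keyword-frequency idea from the question (which is supervised classification rather than topic modeling) can be fitted to multinomial naive Bayes roughly as follows. This is only a sketch: the keyword list comes from the question, but the sample documents and the label names are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Keywords named in the question; extend the list for more document types.
keywords = ["fee", "admission", "marks", "grade"]

# Hypothetical training documents with their known types.
documents = [
    "admission form with fee receipt and admission number",
    "fee structure for new admission",
    "marks obtained in each subject with final grade",
    "grade card showing marks and total marks",
]
doc_types = ["admission", "admission", "marksheet", "marksheet"]

# A fixed vocabulary makes the vectorizer count only the chosen keywords,
# so each document becomes a row of keyword frequencies.
model = make_pipeline(CountVectorizer(vocabulary=keywords), MultinomialNB())
model.fit(documents, doc_types)

predicted = model.predict(["semester marks and grade summary", "admission fee payment"])
print(list(predicted))
```

Dropping the `vocabulary` argument lets the vectorizer learn its vocabulary from all words in the training documents instead of a hand-picked list.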
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Context:
I'm trying to learn machine learning using Python 3. My goal is to create a CNN program that can guess a simple 4-letter, 72*24 pixel CAPTCHA image like the one below:
CAPTCHA image displaying VDF5.
This challenge was inspired by https://medium.com/#ageitgey/how-to-break-a-captcha-system-in-15-minutes-with-machine-learning-dbebb035a710, which I thought would be a great challenge for me to learn k-means clustering and CNNs.
Edit---
I see I was being too "build me this guy". Now that I found scikit, I'll try to learn it and apply that instead. Sorry for annoying you all.
It seems you are looking to build a machine learning algorithm for educational purposes. If so, import TensorFlow and get to it! However, since your question reads as "create this for me", you might be better off simply using an existing implementation from the scikit-learn package: import scikit-learn, create an instance of KNeighborsClassifier, train it, and boom, you've cracked the problem.
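A minimal sketch of the scikit-learn route is below. It assumes the CAPTCHA has already been split into four 24x18 single-character crops, and it uses randomly generated stand-in "glyphs" instead of real CAPTCHA images, so the accuracy it prints says nothing about real-world performance.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
characters = ["V", "D", "F", "5"]

# Hypothetical stand-in data: one random 24x18 binary template per character.
templates = {ch: rng.integers(0, 2, size=(24, 18)).astype(float) for ch in characters}

X, y = [], []
for ch in characters:
    for _ in range(50):
        # Add pixel noise so every sample of a character looks slightly different.
        noisy = templates[ch] + rng.normal(0.0, 0.3, size=(24, 18))
        X.append(noisy.ravel())  # flatten the image into one feature vector
        y.append(ch)
X = np.array(X)
y = np.array(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(f"Held-out accuracy on the synthetic glyphs: {accuracy:.2f}")
```

For a real CAPTCHA, the hard part is usually segmenting the image into individual characters and labelling enough examples; the classifier call itself stays this short.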
Closed 6 years ago.
It would be really helpful if someone could explain, with an example, how to apply machine learning algorithms to images, sound, or video using scikit-learn and Python. I know how to apply them to a CSV file; I just want to learn how that can be extended to multimedia files.
Thank you
There's a section in the sklearn documentation on feature extraction that focuses on working with images. There's also a section of the docs that talks about working with images, video, and audio. I suggest you spend some time going through these sections and the rest of the documentation.
The MNIST dataset is a standard dataset of images of handwritten digits that is used in a lot of examples, so if you're searching Google for examples, "MNIST sklearn" will probably be helpful.
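To make the idea concrete, here is a small sketch using the 8x8 digits dataset bundled with scikit-learn (a much smaller relative of MNIST, chosen here only because it needs no download). The key step is flattening each image into a row of pixel values, after which the workflow is the same as for CSV data; the `SVC` classifier and its `gamma` value are just one reasonable choice.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()

# Each image is an 8x8 array; flatten it so every sample is one row of 64 pixel features.
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, shuffle=False
)

clf = SVC(gamma=0.001)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"Feature matrix shape: {X.shape}, test accuracy: {accuracy:.2f}")
```

Sound and video follow the same pattern: extract a fixed-length numeric feature vector per sample first, then hand the resulting matrix to any scikit-learn estimator.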