Can anyone guide me towards a dataset of psychology-based questions or survey items which, when answered in full, can tell you the gender of the person taking the test?
I need it to create a tool through which we can detect the patterns of fake profiles on social platforms.
I know a few groups which are gender-specific (e.g. for mothers, or for women's private talk), but people of the opposite gender try to trash them by getting into the group while pretending to be female.
I know it sounds silly for now, but anyone who wants to join these groups could go through the questionnaire and the AI could detect their gender.
Thank you in advance.
There is a dataset that I came across on Kaggle. It does not have question-answer pairs from surveys, but the project was mainly about attempting to predict gender based on users' tweets. I'm not sure if you need your dataset in questionnaire format, but if not, you can check this out:
Twitter User Gender Classification
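If that dataset works for you, a minimal text-classification sketch might look like the following. This is only an illustration, not something from the original answer: it assumes the Kaggle CSV has a 'text' column with the tweet and a 'gender' column with the label (check the actual file name, header, and encoding against your download), and uses a plain TF-IDF plus logistic-regression pipeline from scikit-learn.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Assumed file name, column names and encoding -- verify against the download.
df = pd.read_csv("gender-classifier.csv", encoding="latin-1")
df = df.dropna(subset=["text", "gender"])

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["gender"], test_size=0.2, random_state=42)

# TF-IDF features over the tweet text, then a simple linear classifier.
model = make_pipeline(TfidfVectorizer(max_features=20000),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))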
Why is it so important to do pre-processing, and what are the simple steps for doing it? Can anyone help? I am working in Python.
I have a dataframe containing null values. The data also contains outliers, and it is not distributed uniformly.
My question is: what protocol should I follow in order to fill the null values, should I remove the outliers (even though this might lead to loss of information), and what are the steps to make the data distributed uniformly?
Firstly, it really does not matter which language you are working in; both Python and R are popular in data science.
Secondly, you cannot feed raw data into most machine learning models; you need to clean it first. Here are some simple steps:
1. Handle missing values: Often there are missing values present in the data, so you have to fill them in. The question is how? There are plenty of methods you can google.
2. Handle skewness and outliers: Data often contains values that are not within the range of the rest of the data, so you have to bring those values back into that range (or drop them).
3. One-hot encoding: Categorical values need to be transformed into an encoded format.
There are more steps beyond these; there are tons of blogs you can google and go through (a short pandas sketch of the three steps above follows below).
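As a minimal illustration of those three steps in pandas (just a sketch with made-up column names, not a universal recipe):

import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, None, 41, 250],          # None = missing, 250 = outlier
    "city": ["NY", "LA", "NY", None, "SF"],  # categorical column
})

# 1. Fill missing values (median for numeric, mode for categorical).
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# 2. Clip outliers to the 1.5 * IQR range.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df["age"] = df["age"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 3. One-hot encode the categorical column.
df = pd.get_dummies(df, columns=["city"])
print(df)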
I want to forecast the upcoming total number of users on a daily basis in Python using a machine learning algorithm. Check the pattern below:
Looking at this graph, I was wondering if someone knows which forecasting method in Python I should use for the prediction?
Thanks!
If you have no additional data except the user data over time which you have shown, the only thing you can do is try to find a function of time which gives you a good approximation of that plot (ordinary curve fitting). I suppose that's not what you want.
To do a prediction (which can be done not only with a machine learning approach), you need other data which is somehow correlated with the data you want to predict.
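For the curve-fitting route, a minimal sketch with scipy could look like this. It assumes you have your daily total-user counts in an array (the numbers below are made up); the chosen model (a straight line in time) is only a placeholder and should be replaced by whatever shape your plot actually suggests.

import numpy as np
from scipy.optimize import curve_fit

# Assumed: total users observed on consecutive days (replace with your data).
users = np.array([100, 130, 170, 220, 260, 330, 400], dtype=float)
days = np.arange(len(users), dtype=float)

# Placeholder model: a linear trend in time. Swap in e.g. an exponential or
# logistic function if that matches the plot better.
def model(t, a, b):
    return a * t + b

params, _ = curve_fit(model, days, users)

# Extrapolate seven days ahead.
future_days = np.arange(len(users), len(users) + 7, dtype=float)
print(model(future_days, *params))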
I have a lot of latitude and longitude pairs that belong to different countries. How do I map each of these lat-long pairs to the country it belongs to in an efficient manner?
I came across googleapis, but that would be very inefficient and time-consuming if I had to query the API for every lat-long pair that I have. I thought of grouping all the lat-long pairs that are close together and then figuring out a way to find the country that each of these groups belongs to.
Could you please point me to the right algorithms I could use for this purpose? Or are there better ways to do this?
You can find the locations offline like this, using the Python reverse_geocode library:

import reverse_geocode

# Each entry is a (latitude, longitude) pair.
coordinates = (-37.81, 144.96), (31.76, 35.21)

# Returns a list of dictionaries with the nearest known city and its country
# for each pair, without any network calls.
print(reverse_geocode.search(coordinates))
I have about 3,000 words and I would like to group them into about 20-50 different categories. My words are typical phrases you might find in company names: "Face", "Book", "Sales", "Force", for example.
The libraries I have been looking at so far are pandas and scikit-learn. I'm wondering if there is a machine-learning or deep-learning algorithm that would be well suited for this?
The topics I have been looking at are Classification (identifying which category an object belongs to) and Dimensionality Reduction (reducing the number of random variables to consider).
When I search Google for putting words into categories, it brings up kids' puzzles such as "things you do with a pencil" - draw, or "parts of a house" - yard, room.
For deep learning to work on this, you would have to develop a large labeled dataset, most likely manually; the largest natural language processing dataset was, in fact, created manually.
But even if you were able to find a dataset that a model could learn from, then a model such as gradient boosted trees would be one, amongst others, well suited to multi-class classification like this. A classic library for this is xgboost (see the sketch below).
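To make that concrete, here is a minimal, purely illustrative sketch. It assumes you already have a small manually labeled list of (word, category) pairs (the words and category names below are made up), turns each word into character n-gram TF-IDF features, and trains an xgboost multi-class classifier on them.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# Assumed: a tiny manually labeled sample (you would need far more in practice).
words = ["Face", "Book", "Sales", "Force", "Cloud", "Bank"]
labels = ["social", "social", "business", "business", "tech", "finance"]

# Character n-grams are more informative than word tokens for single short words.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vectorizer.fit_transform(words)

# xgboost expects integer class labels.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

clf = XGBClassifier(n_estimators=100, max_depth=3)
clf.fit(X, y)

# Predict the category of an unseen word.
pred = clf.predict(vectorizer.transform(["Market"]))
print(encoder.inverse_transform(pred))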
I'm using LDA to categorize small documents of about 4-5 lines each.
I'm categorizing them into topics such as Technology, Politics, Art, Music, etc.
I'm using Wikipedia to download articles in each category (Technology, Politics, Art, etc.) and training LDA for each category.
Wikipedia is huge (about 8 GB compressed), the computations take hours, and the dump uses a huge amount of space on my hard drive.
Is there any toolkit that already provides "ready-made" generic topics which I can directly use for categorization?
There are quite a few online APIs that categorize text into a predefined set of topics. For example, https://www.textrazor.com/demo identifies topics such as Business, Law, and Politics. You can also take a look at MeaningCloud or AlchemyAPI. Most of these services are paid, but they do have a free tier that may be sufficient, depending on your needs.