For example, I have email sending logs like this:
day_of_week | time    | cta        | is_opened | is_clicked
1           | 10:00AM | CLICK HERE | True      | False
7           | 07:30PM | BUY NOW    | False     | False
...
I want to write a program to find the best-performing day and time to send emails.
This example covers only the send day/time; I also want to be able to add extra parameters (like CTA, sender name, etc.) when I need them.
Is machine learning the best way to do this? (I have no experience in ML.) I'm experienced with Python, and I think I could use TensorFlow for it.
PS: These are marketing emails that we send to our members, not spam or malware.
There are two ways to view your case:
given day, time, etc., predict whether the email will be opened/clicked;
given CTA, etc., predict the best day/time to send the email.
For the first case, you can use a neural net, or any other classifier, to predict whether the email will be opened/clicked.
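As an illustration, here is a minimal sketch of this first view with scikit-learn; the file name email_log.csv is an assumption, and the columns mirror the sample in the question:

```python
# Minimal sketch: classify whether an email gets opened from its send slot.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("email_log.csv")                 # hypothetical log export
t = pd.to_datetime(df["time"], format="%I:%M%p")  # parses e.g. "10:00AM"
df["minutes"] = t.dt.hour * 60 + t.dt.minute      # time of day as a number

X = df[["day_of_week", "minutes"]]
y = df["is_opened"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```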
For the second case, which I assume is yours, look at multivariate (multi-output) regression: the two variables you need to predict (day_of_week, time) may not be handled separately (e.g., by creating two models and predicting day_of_week and time independently); you need to predict both variables simultaneously. You also need to clean your data first, so that it only contains opened/clicked emails.
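Here is a minimal sketch of that joint prediction, reusing the hypothetical email_log.csv from above; RandomForestRegressor accepts a two-column target natively, so a single model predicts both variables together:

```python
# Minimal sketch: jointly predict day_of_week and send time for opened emails.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("email_log.csv")                 # hypothetical log export
t = pd.to_datetime(df["time"], format="%I:%M%p")
df["minutes"] = t.dt.hour * 60 + t.dt.minute

opened = df[df["is_opened"]]                      # keep only opened emails
X = pd.get_dummies(opened[["cta"]])               # one-hot encode the CTA text
y = opened[["day_of_week", "minutes"]]            # two targets, learned jointly

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict(X[:1]))                       # -> [[day_of_week, minutes]]
```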
And of course, you can implement either approach using TensorFlow.
I have a few lists of movement tracking data, which look something like this.
I want to create a list of outputs where I mark these large spikes, essentially indicating that there is movement at that point.
I applied a rolling standard deviation to the data with a window size of two and got this result.
Now I can see the spikes that mark the points of interest, but I am not sure how to detect them in code. Is there a statistical tool to measure these spikes that can be used to flag them?
There are several approaches you can use for an anomaly detection task; the choice depends on your data.
If you want a statistical approach, you can use measures like the z-score or the IQR. Tutorials for these measures are easy to find, as are tutorials for a related statistical approach that uses the mean and variance.
Last but not least, I suggest you also check how to use a control chart, because in some cases it's enough.
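As a rough illustration, a z-score over the rolling standard deviation you already computed can flag the spikes; the window size and threshold below are assumptions to tune:

```python
# Minimal sketch: flag spikes via the z-score of a rolling standard deviation.
import numpy as np
import pandas as pd

def flag_spikes(signal, window=2, threshold=3.0):
    s = pd.Series(signal)
    rolling_std = s.rolling(window).std()
    z = (rolling_std - rolling_std.mean()) / rolling_std.std()
    return (z.abs() > threshold).to_numpy()  # True where a spike is flagged

# Example: a flat signal with one jump should be flagged around index 50.
signal = np.concatenate([np.zeros(50), np.ones(50)])
print(np.where(flag_spikes(signal))[0])
```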
I'm trying to create a machine learning algorithm for address classification (or similar-address matching) for rural areas (villages). I have historical data, which includes a list of addresses (independent variable), village names (independent variable), PIN codes (independent variable), customer mobile numbers, and route numbers (dependent variable). The route number is for the delivery cart, and it helps the cart cover the maximum number of delivery destinations in that area.
Challenges -
"Address" can be misspelled.
"Village Name" can be null.
"PIN code" can be wrong.
Good thing -
Not all the independent variables can be wrong/null at the same time.
Now, the point of creating this algorithm is to select the best route number on the basis of "Address", "Village", "PIN code", and the historical data (in which we manually selected the route for the delivery carts).
I'm a beginner, and I'm confused about how to do this and which process to use.
Tasks I have done:
Address cleaning - removed short words, removed big words, removed stop words.
Now I'm trying to do it with word vectors, but I'm not able to get that working.
For this, you'll first have to build a dataset consisting of the names of as many villages as you can. Many villages have similar names, so identifying a typo is pretty difficult and risky when the difference is only one or two letters; a bigger dataset is better.
Then try TF-IDF on the combination of village name and PIN code (public lists of Indian village names and PIN codes are helpful here), or you can go for fuzzy string matching.
Hope it helps! Happy coding!
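For illustration, here is a minimal character n-gram TF-IDF matcher; the village names below are made up:

```python
# Minimal sketch: match a misspelled village name against a reference list
# using character n-gram TF-IDF and cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

known_villages = ["rampur", "ramgarh", "shivpur", "shivganj"]  # hypothetical
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
reference = vectorizer.fit_transform(known_villages)

def best_match(name):
    sims = cosine_similarity(vectorizer.transform([name.lower()]), reference)[0]
    return known_villages[sims.argmax()], float(sims.max())

print(best_match("Rampoor"))  # the typo should still land on "rampur"
```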
My data was modelled with a Cox regression in R; however, I would like to use this model in a Python GUI, as my knowledge of R is very limited. This way, non-coders would be able to 'predict' survival rates based on our model.
What is the best way to use this model (a combination of 3 different regressions) in Python?
Do you want to predict values based on your estimates? In that case, you can just copy the R outputs into Python and apply the respective procedures.
Do you want the user to be able to run your R regression pipeline from within Python? There are Python libraries that help with that; rpy2 is a useful starting point.
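For the first option, here is a minimal sketch of applying copied Cox estimates in Python; every number below is a placeholder, not a value from your model:

```python
# Minimal sketch: survival prediction from Cox coefficients copied out of R.
import math

coefs = {"age": 0.03, "treatment": -0.50}  # hypothetical R estimates
baseline_survival = 0.90                   # hypothetical S0(t) at the time of interest

def predict_survival(covariates):
    # Cox model: S(t | x) = S0(t) ** exp(beta . x).
    # Note: R's survfit/basehaz center covariates by default, so make sure
    # the baseline you export matches this parameterization.
    lp = sum(coefs[name] * value for name, value in covariates.items())
    return baseline_survival ** math.exp(lp)

print(predict_survival({"age": 65, "treatment": 1}))
```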
This may be a stupid question, but I am new to ML and can't seem to find a clear answer.
I have implemented a ML algorithm on a Python web app.
Right now I am storing the data that the algorithm uses in an offline CSV file, and every time the algorithm is run, it analyzes all of the data (one new piece of data gets added each time the algorithm is used).
Apologies if I am being too vague, but I am wondering how one should generally go about implementing the data and algorithm properly so that:
The data isn't stored in a CSV (Do I simply store it in a database like I would with any other type of data?)
Some form of preprocessing is used so that the ML algorithm doesn't have to analyze the same data repeatedly each time it is used (or does it have to given that one new piece of data is added every time the algorithm is used?).
The data isn't stored in a CSV (Do I simply store it in a database like I would with any other type of data?)
You can store it in whatever format you like, including a database.
Some form of preprocessing is used so that the ML algorithm doesn't have to analyze the same data repeatedly each time it is used (or does it have to given that one new piece of data is added every time the algorithm is used?).
This depends very much on what algorithm you use. Some algorithms can easily be implemented to learn in an incremental manner. For example, Linear/Logistic Regression implemented with Stochastic Gradient Descent could easily just run a quick update on every new instance as it gets added. For other algorithms, full re-trains are the only option (though you could of course elect not to always do them over and over again for every new instance; you could, for example, simply re-train once per day at a set point in time).
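For instance, here is a minimal sketch of the incremental option with scikit-learn; the data shapes and values are illustrative:

```python
# Minimal sketch: logistic regression trained incrementally with SGD, so a
# new instance triggers a cheap update instead of a full re-train.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss", random_state=0)  # loss="log" on older scikit-learn

# Initial pass over whatever history you already have.
X_hist = rng.random((100, 4))
y_hist = rng.integers(0, 2, 100)
clf.partial_fit(X_hist, y_hist, classes=np.array([0, 1]))

# Later, when one new labelled instance arrives, update in place:
x_new = rng.random((1, 4))
clf.partial_fit(x_new, np.array([1]))
```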
I have a dataset of IT operations tickets with fields like Ticket No, Description, Category, Sub Category, Priority, etc.
What I need to do is use the available data (except the ticket number) to predict the ticket priority. Sample data is shown below.
Number | Priority | Created_on | Description           | Category    | Sub Category
719515 | MEDIUM   | 05-01-2016 | MedWay 3rd Lucene.... | Server      | Change
720317 | MEDIUM   | 07-01-2016 | DI - Medway 13146409  | Application | Incident
720447 | MEDIUM   | 08-01-2016 | DI QLD Chermside....  | Application | Medway
Please guide me on this.
Answering without more information is a bit tough, and this is more of a context question than a code question. But here is the logic I would use to start evaluating this problem. Keep in mind it might involve writing a few separate scripts, each performing part of the task.
Try breaking the problem up into smaller pieces. You cannot do an analysis without all the data, so start by creating the data.
You already have the category and sub category: make a list of all the unique factors in each, and create a set of weights for each based on your system and business needs. As you assign subcategory weights, keep in mind how they will interact with categories (+/- as well as magnitude).
Write a script to read the descriptions and count all the non-trivial words. Create some kind of classification for those words to help you build lists that tie the model to the categories and sub categories.
Is the value an error message, a machine name, or some other code or type of problem you can extract using keywords?
How are the word groupings meaningful?
How would they contribute to making a decision?
Think about the categories when you decide these things.
Then, with all of the parts in place, decide on a model, then build, test, and refine. I know there is no code in this answer, but the problem-solving part of data science happens outside of code most of the time.
You need to come up with the code yourself; if you get stuck, post an edit and we can help.
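That said, as a rough starting point, here is a minimal sketch of how the pieces could fit together with scikit-learn; the file name tickets.csv is an assumption, and the column names mirror the sample data:

```python
# Minimal sketch: TF-IDF on the description plus one-hot category fields,
# feeding a linear classifier that predicts Priority.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("tickets.csv")  # hypothetical export of the ticket data

features = ColumnTransformer([
    ("desc", TfidfVectorizer(stop_words="english"), "Description"),
    ("cats", OneHotEncoder(handle_unknown="ignore"), ["Category", "Sub Category"]),
])
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[["Description", "Category", "Sub Category"]], df["Priority"],
    test_size=0.2, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```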