Recently I completed the Machine Learning course on Coursera by Andrew Ng. It's an awesome course. I worked with Octave throughout the course, but Python is much more popular than Octave, so I have started learning Python now. I was implementing linear regression in Python, and I found I'm not really doing anything: I simply call a predefined function for linear regression. In Octave I used to write the code from scratch and find the parameters using the gradient descent algorithm, but there is no such thing in Python. I have referred to the following link:
https://towardsdatascience.com/linear-regression-python-implementation-ae0d95348ac4
My question is: won't we use any algorithm like gradient descent to learn the parameters Theta? Is everything predefined in Python?
Thanks.
Python is a programming language, just like Octave, so everything that can be done in Octave can be done in Python too. If you want to implement the linear regression algorithm from scratch in Python to validate your understanding, of course you can do it (I have done it too). Why stop at linear regression? You can implement SVMs, decision trees, or even deep neural networks from scratch in Python, and it is a good way to gain a concrete understanding of these algorithms.
However, over the years all of these have been implemented in Python libraries like scikit-learn. So as the complexity and volume of your data increase, you will want to use one of these libraries or frameworks. Why? Because they are highly optimized implementations. To get a feel for this, implement linear regression using plain lists and for loops, then vectorize it with NumPy, and you will see the difference in performance.
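For illustration, here is a minimal vectorized sketch of that exercise: batch gradient descent for linear regression in NumPy, with an arbitrary learning rate and iteration count:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Fit theta for y ~ X @ theta using batch gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        predictions = X @ theta                  # vectorized hypothesis h(x)
        gradient = X.T @ (predictions - y) / m   # gradient of the squared-error cost
        theta -= lr * gradient                   # simultaneous parameter update
    return theta

# Toy data: y = 1 + 2x plus noise; a column of ones models the intercept.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x + rng.normal(0, 0.5, 100)
print(gradient_descent(X, y))  # approximately [1, 2]
```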
So, to summarize: if you are curious, go ahead and implement the algorithms from scratch to gain a solid understanding. As the complexity and data volume increase, start using the libraries and frameworks. Hope this helps.
I have a dataset composed of several large CSV files. Their total size is larger than the RAM of the machine on which the training is executed.
I need to train an ML model from scikit-learn, TensorFlow, or PyTorch (think SVR, not deep learning). I need to use the whole dataset, which is impossible to load at once. Any recommendations on how to overcome this, please?
I have been in this situation before, and my suggestion would be to take a step back and look at the problem again.
Does your model absolutely need all of the data at once, or can it be trained in batches? It is also possible that the model you are using can be trained in batches, but the library you are using does not support that case. In that situation, either find a library that does support batches or, if such a library does not exist (unlikely), "reinvent the wheel" yourself, i.e., implement the model from scratch with batch support. However, as your question mentions, you need to use a model from scikit-learn, TensorFlow, or PyTorch. If you want to stick with those libraries, there are techniques such as the ones Alexey Larionov and I'mahdi mentioned in the comments regarding PyTorch and TensorFlow.
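For example, with scikit-learn you can stream the CSV files in chunks and train incrementally via `partial_fit`. Note that `SVR` itself has no `partial_fit`; `SGDRegressor` is a common out-of-core stand-in, and the file and column names below are made up:

```python
import pandas as pd
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()

# Read each large CSV in fixed-size chunks so only one chunk is ever
# resident in RAM, and update the model incrementally per chunk.
for path in ["part1.csv", "part2.csv"]:                # hypothetical file names
    for chunk in pd.read_csv(path, chunksize=100_000):
        X = chunk.drop(columns=["target"]).to_numpy()  # "target" column is assumed
        y = chunk["target"].to_numpy()
        model.partial_fit(X, y)
```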
Is all of your data actually relevant? Once I found that a whole subset of my data was useless for the problem I was trying to solve; another time I found that a subset was only marginally helpful. Dimensionality reduction, numerosity reduction, and statistical modeling may be your friends here. Here is a link to the Wikipedia page about data reduction:
https://en.wikipedia.org/wiki/Data_reduction
Not only will data reduction reduce the amount of memory you need, it can also improve your model: bad data in means bad data out.
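If dimensionality reduction looks promising, scikit-learn's `IncrementalPCA` itself works out-of-core, so it fits the same chunked-reading pattern as above (again, the file name, column layout, and component count are placeholders):

```python
import pandas as pd
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=10)  # 10 is an arbitrary choice

# First pass: fit the projection incrementally, chunk by chunk.
for chunk in pd.read_csv("features.csv", chunksize=100_000):
    ipca.partial_fit(chunk.to_numpy())

# Later passes can then transform each chunk before feeding the model:
# X_reduced = ipca.transform(chunk.to_numpy())
```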
I'm trying to put my optimization problem into Pyomo, but it depends strongly on standard linear algebra operations: QR decomposition, inverse, transpose, and matrix products. Specifically, this is a Kalman filter problem: recursive linear algebra over a long time series. I failed to find Pyomo functions to implement it the way I could in TensorFlow. Is it possible?
Connected questions:
Am I right that a NumPy-based objective function is practically unusable in Pyomo?
Is there a better free optimization solution for this purpose? (SciPy comes nowhere near MATLAB's efficiency; TensorFlow is extremely slow for this particular problem, though I do not see why; algorithmic differentiation in MATLAB was reasonably fast, though not fast enough.)
Many thanks,
Vladimir
Pyomo is mainly a package for optimization, i.e. specifying data -> building the problem -> sending it to the solver -> waiting for the solver's results -> retrieving the solution. Even though it can handle matrix-like data, it cannot manipulate that data with matrix operations. This should be done with a good external library before you send your data to Pyomo. Once all your matrices are ready to be used as data in your optimization model, you can use Pyomo for the optimization itself.
That being said, you should look for a library that fits your needs for building your data, since the data values must be static once you provide them as input to your model.
Also, keep in mind that Pyomo, like any optimization tool, is deterministic. It is not meant for data analysis or data description, but for finding an optimal solution to a mathematical problem. In your case, Pyomo is not meant to run the Kalman filter itself, but to give you the solution that minimizes the mean squared error.
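To make that division of labor concrete, here is a minimal sketch under made-up dimensions: NumPy builds the static coefficient matrices up front, and Pyomo only ever sees scalar data inside a least-squares objective:

```python
import numpy as np
import pyomo.environ as pyo

# The linear algebra happens outside Pyomo: build A and b with NumPy.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))   # hypothetical design matrix
b = rng.normal(size=20)        # hypothetical observations

model = pyo.ConcreteModel()
model.x = pyo.Var(range(3))    # decision variables

# Pyomo receives only static scalar coefficients, not matrix operations.
model.obj = pyo.Objective(
    expr=sum((sum(float(A[i, j]) * model.x[j] for j in range(3)) - float(b[i])) ** 2
             for i in range(20)),
    sense=pyo.minimize,
)

pyo.SolverFactory("ipopt").solve(model)  # assumes the ipopt solver is installed
```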
Context:
I'm trying to learn machine learning using Python 3. My goal is to create a CNN program that can guess simple four-letter, 72x24 pixel CAPTCHA images like the one below:
[CAPTCHA image displaying "VDF5"]
This challenge was inspired by https://medium.com/@ageitgey/how-to-break-a-captcha-system-in-15-minutes-with-machine-learning-dbebb035a710, which I thought would be a great way for me to learn k-means clustering and CNNs.
Edit: I see I was being too much of a "build me this" guy. Now that I have found scikit-learn, I'll try to learn it and apply that instead. Sorry for annoying you all.
It seems as if you are looking to build a machine learning algorithm for educational purposes. If so, import TensorFlow and get to it! However, seeing as your question seems to be "create this for me", you might be better off simply using an existing implementation from the scikit-learn package: import scikit-learn, create an instance of KNeighborsClassifier, train it, and boom, you've cracked this problem.
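A minimal sketch of that route, assuming the CAPTCHA has already been segmented into per-letter images and flattened into feature vectors (the random arrays below are placeholders for real training data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X: one row per segmented letter image, flattened to pixels.
# y: the letter each row shows. Both are placeholders here.
X_train = np.random.rand(1000, 18 * 24)  # e.g. a 72x24 image split into 4 letters
y_train = np.random.choice(list("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"), 1000)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

letter = clf.predict(np.random.rand(1, 18 * 24))  # classify one new letter image
print(letter)
```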
What is the algorithm for extraction-based automatic summarization? I googled a lot and couldn't find anything related to it. I want to implement the algorithm in Python.
There is no single algorithm for extraction-based summarization. There are several different algorithms to choose from, and you should pick one that fits your specific needs.
There are two approaches to extraction-based summarization:
Supervised learning - you give the program many example documents together with their keywords. The program learns what constitutes a keyword. Then you give it a new document, this time without any keywords, and the program extracts the keywords of this document based on what it learned during the training phase. There is a huge number of supervised learning techniques; to name a few, there are neural networks, decision trees, random forests, and support vector machines.
Unsupervised learning - you simply give the program a document and it creates a list of keywords without relying on any past experience. A popular unsupervised algorithm for extraction-based summarization is TextRank.
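For a taste of the simplest unsupervised flavor, here is a sketch that scores each sentence by the average frequency of its words and extracts the top scorers; the naive regex tokenization is a deliberate simplification:

```python
import re
from collections import Counter

def summarize(text, n_sentences=2):
    """Extractive summary: pick the sentences whose words are most frequent."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    # Score a sentence by the average corpus frequency of its words.
    def score(s):
        tokens = re.findall(r"\w+", s.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Preserve the original order of the selected sentences.
    return " ".join(s for s in sentences if s in top)

print(summarize("Cats purr. Dogs bark loudly. Cats and dogs are pets. Birds sing."))
```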
First off, I think you should learn more about how to find papers and research; it is very unlikely that Google turns up nothing on this topic. In any case, some of the extraction-based text summarization methods are:
Easy-to-implement methods based on word frequency
Bayesian methods
Graph-based methods, e.g. TextRank/LexRank, which are a good start (see the sketch below)
Clustering
Fuzzy systems for summarization
Neural-network-based systems
Methods based on optimization algorithms
I suggest looking up these methods and seeing what you get. There are many variations of each, and I can't really tell which method is best. Remember to find proper preprocessing tools as well.
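For the graph-based option, here is a minimal TextRank-style sketch: sentences become nodes, edges are weighted by word overlap (a simplification of the similarity measure in the TextRank paper), and PageRank ranks the nodes:

```python
import re
import networkx as nx

def textrank_summary(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    token_sets = [set(re.findall(r"\w+", s.lower())) for s in sentences]

    # Edge weight = word overlap between two sentences (simplified similarity).
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            overlap = len(token_sets[i] & token_sets[j])
            if overlap:
                graph.add_edge(i, j, weight=overlap)

    scores = nx.pagerank(graph)  # rank sentences by graph centrality
    top = sorted(scores, key=scores.get, reverse=True)[:n_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```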
Good luck.
I recently got interested in soccer statistics. Right now I want to implement the famous Dixon-Coles model in Python 3.5 (paper-link).
The basic problem is that the model described in the paper yields a likelihood function with numerous parameters, which needs to be maximized.
For example, the likelihood function for one Bundesliga season involves 37 parameters. Of course, I actually minimize the corresponding negative log-likelihood function. I know that this function is strictly convex, so the optimization should not be too difficult. I also supplied the analytic gradient, but as soon as the number of parameters exceeds ~10, the optimization methods from the SciPy package fail (scipy.optimize.minimize()).
My question:
Which other optimization techniques are out there, and which are best suited for optimization problems involving ~40 independent parameters?
Some hints to other methods would be great!
You may want to have a look at convex optimization packages like https://cvxopt.org/ or https://www.cvxpy.org/. They are Python-based and hence easy to use!
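For instance, a minimal CVXPY sketch looks like the following; the least-squares objective here is only a stand-in for the Dixon-Coles negative log-likelihood, which you would have to express through CVXPY's own atoms so the library can verify convexity:

```python
import cvxpy as cp
import numpy as np

n = 40                       # roughly the number of Dixon-Coles parameters
A = np.random.randn(100, n)  # placeholder data
b = np.random.randn(100)

theta = cp.Variable(n)
# Generic smooth convex objective standing in for the real likelihood.
objective = cp.Minimize(cp.sum_squares(A @ theta - b))
problem = cp.Problem(objective)
problem.solve()
print(theta.value)
```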
You can make use of metaheuristic algorithms, which work on both convex and non-convex spaces. Probably the most famous of them is the genetic algorithm. It is also easy to implement, and the concept is straightforward. The beautiful thing about the genetic algorithm is that you can adapt it to solve most optimization problems.
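As a bare-bones illustration, here is a genetic algorithm sketch in NumPy over 40 parameters; the quadratic `fitness` is a placeholder for your negative log-likelihood, and the population size, mutation scale, and generation count are arbitrary choices:

```python
import numpy as np

def fitness(theta):
    return np.sum(theta ** 2)  # placeholder for the negative log-likelihood

rng = np.random.default_rng(0)
n_params, pop_size = 40, 100
population = rng.normal(size=(pop_size, n_params))

for generation in range(200):
    scores = np.array([fitness(ind) for ind in population])
    # Selection: keep the best half of the population as parents.
    parents = population[np.argsort(scores)[: pop_size // 2]]
    # Crossover: average two randomly chosen parents per child.
    i = rng.integers(len(parents), size=pop_size)
    j = rng.integers(len(parents), size=pop_size)
    children = (parents[i] + parents[j]) / 2
    # Mutation: add small Gaussian noise to every child.
    population = children + rng.normal(scale=0.1, size=children.shape)

best = population[np.argmin([fitness(ind) for ind in population])]
print(fitness(best))
```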