Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
As mentioned in my other thread (Tensorflow Object Detection - Avoid overlapping boxes) I'm new to machine learning and I have to implement an algorithm for detecting traffic lights.
Regarding TensorFlow and its possibilities, I have a whole bunch of questions that I don't know where to ask except on Stack Overflow.
For a quick start I downloaded a pre-trained model and started training it on the Bosch Traffic Lights Dataset. Using a pre-trained model is fine, but every now and then I wonder whether it's possible to modify this model (add or remove a layer), or whether it would be best to use Keras on top of TensorFlow for better customization.
Additionally, I wonder how I can find out which settings are possible in the pipeline.config file that ships with every pre-trained model. Is there any documentation, or do I have to dig into the Python files? In other words, does it even make sense to change the configuration?
For documentation purposes we're using TensorBoard. Unfortunately, only the loss is logged out of the box, not the accuracy. How do we get the accuracy displayed as an additional graph?
You should use the configuration to tune all aspects. As mentioned in the TensorFlow object detection config files documentation, the configuration parameters can be browsed in the protocol buffer message definitions. For example, for the model, if you are using Faster R-CNN, have a look at the fields of the FasterRcnn message. You could export a trained model, load it in a regular TensorFlow script and add anything you want to it for whatever purpose, but the object detection framework is meant to be configuration-driven.
For the metrics, have a look at Supported object detection evaluation protocols. In the EvalConfig message, there is a metrics_set that you can set to different values for different evaluation metrics.
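Detection evaluation protocols report precision-style metrics rather than plain accuracy, because a predicted box only counts as correct when it overlaps a ground-truth box enough. The following is a minimal sketch of that matching step in plain Python, assuming boxes are (xmin, ymin, xmax, ymax) tuples and an IoU threshold of 0.5. The helper names `iou` and `precision_at_iou` are illustrative and not part of the Object Detection API, and real protocols additionally rank predictions by score and average over classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of predicted boxes that match a not-yet-matched ground-truth box."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        # Pick the remaining ground-truth box with the largest overlap.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    return true_positives / len(predictions) if predictions else 0.0


# Toy example (made-up boxes): one good detection, one false positive.
ground_truth = [(10, 10, 20, 30), (50, 10, 60, 30)]
predictions = [(11, 10, 21, 30), (80, 80, 90, 100)]
print(precision_at_iou(predictions, ground_truth))
```

Raising the threshold makes the metric stricter about localisation, which is one reason different evaluation protocols report different numbers for the same model.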
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I want to create two separate CNN models from a single CNN model. Let me call them CNN-A and CNN-B.
i.e., Original CNN model = CNN-A model + CNN-B model
During prediction, the raw input data is fed to CNN-A. The output of CNN-A is fed as input to CNN-B. The output of CNN-B is then the output of the original model.
I would appreciate any suggestions or ideas on how to implement this architecture.
The implementation seems redundant, because the input to each CNN should be an image. Suppose the output of the first CNN is an image and you feed that to the second CNN: this is the same as stacking more convolution layers (with additional dropout and so on) in a single CNN model.
So, after all, implementing one deep CNN will mimic the architecture you want.
You can also take a look at transfer learning, which reuses the layers of a pre-trained model and lets you add your own final layers and make adjustments. This is also similar to what you are describing.
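The equivalence between the split and the stacked model can be checked on a toy example. The sketch below is plain Python and uses tiny dense layers with made-up weights instead of real convolutions; `dense`, `run`, and the split point are illustrative assumptions, not a Keras or TensorFlow API.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(weights, bias):
    """Return a layer function computing relu(W x + b)."""
    def layer(inputs):
        return relu([
            sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)
        ])
    return layer


def run(layers, inputs):
    """Apply the layers in order, feeding each output into the next layer."""
    for layer in layers:
        inputs = layer(inputs)
    return inputs


# Hypothetical three-layer "original" model with fixed toy weights.
original = [
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]),
    dense([[1.0, 1.0], [-1.0, 2.0]], [0.0, 0.0]),
    dense([[0.5, 0.5]], [0.2]),
]

split_at = 1
model_a = original[:split_at]   # plays the role of CNN-A
model_b = original[split_at:]   # plays the role of CNN-B

x = [2.0, 1.0]
intermediate = run(model_a, x)  # what would be handed from A to B
assert run(model_b, intermediate) == run(original, x)
```

Because both halves share the same layer objects, chaining them reproduces the original output exactly. The only thing that changes is where the intermediate activations are handed over.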
================ After Comment =====================
You could use a model architecture like MobileNet for a model that is to be deployed on a mobile device.
You could also apply transfer learning to existing pre-trained MobileNet models, which will save a lot of time and resources.
Lastly, you could deploy the (desktop-sized) model on a server using Flask, and then create an API that returns predictions when you send the relevant data to the server via a POST request. This is commonly used to reduce the load on mobile devices, and it is the approach that I would prefer. This method is relatively efficient and easily scalable.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I am trying to implement and reproduce the results of federated BERT pretraining from the paper "Federated pretraining and fine-tuning of BERT using clinical notes from multiple silos". I prefer to use the TensorFlow code for BERT pretraining.
For federated training, I first divided the dataset into 3 silos (each containing the discharge summaries of 50 patients, taken from MIMIC-III data) and then pretrained a BERT model on each silo using the TensorFlow implementation of BERT pretraining from the official BERT release.
Now I have three different models, each pretrained on a different dataset. For model aggregation, I need to take the average of all three models. Since the number of notes in each silo is equal, averaging means summing all models and dividing by three.
How do I average the models as was done in the paper? Could somebody please give me some insight into how to code this correctly? The idea of averaging the model weights is taken from the paper "Federated Learning: Strategies for Improving Communication Efficiency".
I am very new to deep learning and TensorFlow, so could someone please help me figure this out and suggest some reading material for TensorFlow?
The paper mentions that this is a good option for overcoming privacy and regulatory issues when sharing clinical data. My question is: is it possible to extract sensitive data from these model.ckpt files? If so, how?
Any help would be appreciated. Thanks...
Model averaging can be done in many ways. The simplest is to keep a complete copy of the architecture in each silo, take a (weighted) average of the parameter values, and use this as the parameters of the full model. However, a number of practical issues (latency, network speed, computational power of the devices) may prohibit this, so more complex solutions are used in which silos are only trained on subsets of variables, etc. (as in the paper you cite).
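The simplest variant, a weighted average of parameters, can be sketched in a few lines of plain Python. This assumes each silo's model has already been loaded into a dictionary mapping parameter names to flat lists of floats. The parameter names, the values, and the helper `average_models` are made up for illustration, and reading the values out of real checkpoints is left out.

```python
def average_models(models, num_examples):
    """Weighted average of per-silo parameters (same names and shapes in every silo)."""
    total = sum(num_examples)
    averaged = {}
    for name in models[0]:
        size = len(models[0][name])
        averaged[name] = [
            # Each silo contributes in proportion to its number of examples.
            sum(m[name][i] * n for m, n in zip(models, num_examples)) / total
            for i in range(size)
        ]
    return averaged


# Three hypothetical silo models with identical structure.
silo_models = [
    {"dense/kernel": [0.0, 3.0], "dense/bias": [1.0]},
    {"dense/kernel": [3.0, 3.0], "dense/bias": [2.0]},
    {"dense/kernel": [6.0, 3.0], "dense/bias": [3.0]},
]
global_model = average_models(silo_models, num_examples=[50, 50, 50])
print(global_model)
```

With equal silo sizes, as in the question, the weights cancel and this reduces to summing the three models and dividing by three.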
It is not generally possible to retrieve information (sensitive or otherwise) from a dataset purely from the parameter updates to a model fine-tuned on it.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have been studying deep learning recently, relying mainly on Andrew Ng's Deep Learning Specialization on Coursera.
I want to build my own model to classify MNIST with 99% accuracy (a simple MLP model, not a CNN). So I use KerasClassifier to wrap my model and GridSearchCV to tune the hyperparameters (including the number of hidden layers, the number of units, the dropout rate, etc.)
However, when I google "fine tuning", the majority of the results are about "transfer learning", where people only tune the learning rate, the number of output layers, or the number of frozen layers.
I know these famous models are capable of dealing with many problems with just a few changes. But what if I want to build a tiny model from scratch to handle a special problem? What are the common/best practices?
So my questions are mainly about the common/best practice of fine tuning model:
What is the common/best way to fine-tune? (I have seen people tune hyperparameters manually, or use scikit-learn's RandomizedSearchCV/GridSearchCV, or hyperas)
Should I use k-fold cross validation? (It's the default setting of GridSearchCV, and it greatly increases the training time but helps little)
Is slightly modifying the off-the-shelf models enough to solve most problems? If not, in what direction should I move?
Thanks!
In deep learning, fine tuning usually refers to retraining certain layers of a pretrained model on data from a new domain. I believe you are looking for something like "hyperparameter optimisation". There are various ways to approach this, and people have their own preferences regarding the best method. Using RandomizedSearchCV/GridSearchCV or a combination of the two is fine. Trial and error using a few configurations is fine. People sometimes retrain well-known architectures completely. Others resort to more complicated approaches, such as optimising hyperparameters using evolutionary programming (genetic algorithms). So you're kinda spoiled for choice.
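The difference between grid search and random search can be shown without any library. The sketch below is plain Python, not scikit-learn's GridSearchCV/RandomizedSearchCV. `validation_score` is a hypothetical stand-in for "train the model with these hyperparameters and return its validation score", and the search space values are made up.

```python
import itertools
import random


def validation_score(hidden_units, dropout_rate):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return 1.0 - abs(hidden_units - 128) / 512 - abs(dropout_rate - 0.2)


search_space = {
    "hidden_units": [32, 64, 128, 256],
    "dropout_rate": [0.0, 0.2, 0.5],
}

# Grid search: evaluate every combination (4 * 3 = 12 training runs).
grid = [dict(zip(search_space, combo))
        for combo in itertools.product(*search_space.values())]
best_grid = max(grid, key=lambda params: validation_score(**params))

# Random search: evaluate only a fixed budget of sampled combinations.
rng = random.Random(0)
samples = [{name: rng.choice(values) for name, values in search_space.items()}
           for _ in range(5)]
best_random = max(samples, key=lambda params: validation_score(**params))

print(best_grid, best_random)
```

Grid search cost grows multiplicatively with every added hyperparameter, while random search keeps the budget fixed at the price of possibly missing the best combination.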
Cross validation is usually only required if your data set is small, so that further splitting into a train and test set shrinks the test set and decreases the statistical significance (reliability) of your results. If you have enough data, you can go ahead and just split into train/val/test sets. But it's usually necessary, since large amounts of labelled data tend to be rare.
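For reference, k-fold cross validation just rotates which slice of the data is held out. Here is a minimal sketch in plain Python, assuming the samples are already shuffled; `k_fold_indices` is an illustrative helper, not a scikit-learn function.

```python
def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(num_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(num_samples=10, k=3):
    print(len(train), validation)
```

Every sample is used for validation exactly once, which is why the approach gives more reliable estimates on small data sets, and also why it multiplies the training time by k.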
If off-the-shelf models are available for the problem you are trying to solve, use them! While familiarising yourself with the implementation and source code is insightful, there's no need to reinvent the wheel and rewrite these solutions!
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am new to machine learning and have spent some time learning Python. I have started to learn TensorFlow and Keras for machine learning, and I have no clue about the process of designing a model. How do you know which model to use? Which activation functions to use? The number of layers and the dimensions of the output space?
I've noticed that most models are of the Sequential type and tend to have 3 layers. Why is that? I couldn't find any resources that explain which to use, why we use them, and when. The best I could find was TensorFlow's function documentation. Any elaboration or resources to clarify this would be greatly appreciated.
Thanks.
I'd suggest you continue to read more about machine learning. The one here is a multi-part explanation: https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12
Disclaimer: I don't know the author and this is not my own work.
Also, I suggest a simple thought experiment where you have binary classification and have to consider how the different shapes of the activation functions may affect your results.
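The thought experiment about activation shapes can be made concrete by printing a few values. This is a minimal sketch in plain Python, and the choice of sample points is arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


# Compare how each activation squashes (or does not squash) the same inputs.
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  "
          f"tanh={math.tanh(z):.3f}  relu={relu(z):.1f}")
```

The sigmoid stays between 0 and 1, so for binary classification its output can be read as a class probability. tanh is centred on zero with outputs between -1 and 1. ReLU is zero for negative inputs and unbounded above, which makes it a common choice for hidden layers rather than for the final output.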
Regarding your model choice: This is highly dependent on your data and what you wish to explore. If I were you, I'd try to visualize your data first to see if there are any interesting relationships.
For example, a seaborn pairplot (https://seaborn.pydata.org/generated/seaborn.pairplot.html) is one way to visualize relationships between variables. If you have a lot of data points, I'd suggest only using a sample of at most a few hundred data points as this plot can take a long time to make otherwise. You can also try DataShader, but I haven't used it personally.
Once you visualize your data, then try to actually think about what these relationships might mean between the variables. Doing all of this before using these machine learning models will guide you later as you try to implement some of the models in the above post.
Also, a deep learning algorithm is sometimes not the best approach. Often (depending on whether you are doing a classification or a regression problem) linear (or multiple linear) regression will suffice. For a regression problem, I often start with (multiple) linear regression as my baseline model and then improve upon it with regularization before I try fancy deep neural networks.
Deep neural networks are slower to train than linear models, can easily overfit your data, and can give the same (or even worse!) results as the simpler linear regression. Consider whether you are a hammer in search of a nail when you apply deep learning outright to a multitude of problems from the get-go.
Hope this helps.
TLDR:
Visualize your data and figure out if you want to do regression or classification
Start with simple linear models as a baseline and compute a performance metric (e.g. MSE)
Improve (hopefully) with neural networks and see if the additional gain is worth it in your case. At some point, you may have to experiment with different activation functions to see which suits your case more.
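The baseline step can be sketched in plain Python. It assumes a single input variable and made-up toy data; `fit_line` and `mse` are illustrative helpers, and a real project would more likely use a library implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]   # toy data, roughly y = 2x + 1

slope, intercept = fit_line(xs, ys)
baseline_mse = mse(ys, [slope * x + intercept for x in xs])
# Reference point: always predicting the mean of y.
mean_only_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
print(slope, intercept, baseline_mse, mean_only_mse)
```

Any neural network tried afterwards then has a concrete number to beat, which makes it easier to judge whether the extra training cost is worth it.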
Nobody really knows why certain architectures work; that is still a topic of ongoing discussion (see, e.g., this paper).
Finding architectures that work is mostly a matter of trial and error, and of adopting or modifying existing architectures that seem to work well for related tasks and dataset sizes.
I would refer you to Goodfellow, Bengio, and Courville's book. It is a great resource for getting started with machine learning, and deep learning in particular.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 11 months ago.
I am writing a machine learning program for time series data analysis, and using NEAT could help with the work. I started to learn TensorFlow not long ago, but it seems that the computational graphs in TensorFlow are usually fixed. Are there tools in TensorFlow that help build a dynamically evolving neural network? Or would something like PyTorch be a better alternative? Thanks.
One way to make an evolving TensorFlow network would be to use either the HyperNEAT or the ES-HyperNEAT algorithm. Instead of running the evolution on the individual networks in the species, these evolve a "genome" that is actually a CPPN, which encodes the phenotype neural nets. For the CPPN you can use a feed-forward TensorFlow network, with the caveat that a different activation function can be used at each node. This lets the CPPN evolve so that it can be queried for the structure and weights of the "phenotype" neural network, for which you can use a generic TensorFlow net (or whatever net you choose).
I would look into the neat-python and peas libraries and look at the networks they use and replicate those classes with tensorflow nets.
It can't be implemented in the static graph mode of TensorFlow without significant tradeoffs because the topology of the neural networks in the population changes. Static graphs are suited for models whose architecture doesn't change during training. However, it can be done in TensorFlow Eager or PyTorch because they support dynamic computation graphs.
Check this implementation in TensorFlow Eager: https://github.com/crisbodnar/TensorFlow-NEAT
TensorFlow supports eager execution which can support arbitrarily dynamic network topologies.
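To illustrate why a changing topology is awkward for a fixed graph but natural when the network is rebuilt on every call, here is a simplified sketch of a NEAT-style "add node" mutation in plain Python. The genome layout and the helpers `evaluate` and `add_node` are assumptions made for this example, not the API of neat-python or of the linked TensorFlow-NEAT repository. It also assumes that the hidden nodes are kept in topological order, which holds for the single mutation shown.

```python
import math


def evaluate(genome, inputs):
    """Evaluate a feed-forward genome; hidden nodes must be in topological order."""
    values = dict(inputs)
    for node in genome["hidden"] + genome["outputs"]:
        total = sum(values[src] * weight
                    for (src, dst), weight in genome["connections"].items()
                    if dst == node)
        values[node] = math.tanh(total)
    return [values[node] for node in genome["outputs"]]


def add_node(genome, connection, new_node):
    """Split an existing connection by inserting a new hidden node."""
    src, dst = connection
    connections = dict(genome["connections"])
    old_weight = connections.pop(connection)
    connections[(src, new_node)] = 1.0          # incoming weight of 1
    connections[(new_node, dst)] = old_weight   # outgoing weight keeps the old value
    return {
        "hidden": genome["hidden"] + [new_node],
        "outputs": genome["outputs"],
        "connections": connections,
    }


genome = {
    "hidden": [],
    "outputs": ["out"],
    "connections": {("x1", "out"): 0.5, ("x2", "out"): -1.0},
}
inputs = {"x1": 1.0, "x2": 0.5}

before = evaluate(genome, inputs)
mutated = add_node(genome, ("x1", "out"), "h1")
after = evaluate(mutated, inputs)
print(before, after)
```

Since the network is walked node by node at call time, every individual in the population can have a different structure. That is the property that eager execution or PyTorch provide and that a static graph lacks.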