neural network find best hyperameters or architecture first

neural network find best hyperameters or architecture first - python

I'm implementing my first neural network for images classification.
I would like to know if i should start to find best hyperparameters first and then try to modify my neural network architecture (e.g number of layer, dropout...) or architecture then hyperameters?

First you should decide for an architecture and then play around with the hyperparameters. To compare different hyperparameters it is important to have the same base (architecture).
Of course you can also play around with the architecture (layers, nodes,...).But I think here it is easier to search for an architecture online, because often the same or a similar problem yet have been solved or described in a tutorial/blog.
The dropout is also a (training-)hyperparameter and not part of the architecture!

The answer is as always : it depends
What are you trying to achieve?
If you're hoping to make the worlds best image classifier by trial and error then you might want to ask yourself if you think you have more compute available than the people who have already done this. For a really good classifier there are several ones that come with tensorflow/keras and can be easily implemented. If you're goofing around and learning the coding then I'd recommend different architectures because that's going to teach you more functions. If you have a dataset you don't think existing solutions will be good at analysing and genuinely need the best network to solve classify them then unfortunately it still depends...
How to decide:
Firstly decide on the rough order of magnitude for your overall parameter count (the literal number of parameters your model has). For a given number of parameters, architecture is likely to produce the biggest difference in results between representative hyperparameter choices (don't choke your network down to a single neuron in the middle and expect it to be representative of that architecture).
Its important to compare the rough performance per parameter so you're not giving an edge to the networks with greater overfitting capacity. You don't need to use all your training data or even train to completion, mostly you'll find the better networks learn faster and finish better (mostly). In the past I've done grid searches with multiple trials at each point using significantly reduced data then optimised the architecture with the most potential by considering the gradients of the grid search. Fun fact: with sufficient time you can use gradient descent methods on hyperparameters to find local minima. You might well find that there are many similarly top performing models, all of which should you can tune until a clear winner emerges.

Related

how many epochs needed for 68000 input frames for CNN + LSTM acrhitecture?

i'm trying to train my neural network. it aims is to predict video quality. i am using VGG16+LSTM networks, but the VGG16 is not trainable. the total of trainable parameters are 700,000.
i have three questions:
is enough 700,000 trainable parameters for training the 68000 input frames?
is it essential to train vgg16?
how many epochs needed for getting the best resaults?

I haven't been into machine learning in a while, but my understanding is that:
depends, but the only way to find out is to train it and look for over/underfitting
depends on the network layout. It might also be useful to bypass some information around the VGG16, in case the VGG16 hides some of the information you actually need about 'video quality'
depends. You wouldh have to split your data into a training and a test set in order to find that out.
As most things in machine learning and especially deep learning the answers aren't obvious and depend heavily on the problem and the exact network layout. There will be much trial and error involved.
The most important takeaway, I think, is to have two (or even three) different datasets for the training/validation/test step, so you can answer those questions yourself.
For more information, read the wikipedia entry about splitting your datasets.

You start with one and see what impact it had.
Even one epoch will take long and getting the error takes also a bit of time.

How to understand/debug/visualize U-Net segmentation results

I am training a U-Net architecture to for a segmentation task. This is in Python using Keras. I have now run into an issue, that I am trying to understand:
I have two very similar images from a microscopy image series (these are consecutive images), where my current U-Net model performs very good on one, but performs extremely poor on the immediately following one. However, there is little difference between the two to the eye and the histograms also look very much alike. Also on other measurements the model performs great across the whole frame-range, but then this issue appears for other measurements.
I am using data-augmentation during training (histogram stretching, affine transformation, noise-addition) and I am surprised that still the model is so brittle.
Since the U-Net is still mostly a black-box to me, I want to find out steps I can take to better understand the issue and then adjust the training/model accordingly.
I know there are ways to visualize what individual layers learn (e.g. as discussed F. Chollets book see here) and I should be able to apply these to U-Nets, which is fully convolutional.
However, these kinds of methods are practically always discussed in the realm of classifying networks - not semantic segmentation.
So my question is:
Is this the best/most direct approach to reach an understanding of how U-Net models attain a segmentation result? If not, what are better ways to understand/debug U-Nets?

I suggest you use the U-Net container on NGC https://ngc.nvidia.com/catalog/resources/nvidia:unet_industrial_for_tensorflow
I also suggest you read this: Mixed Precision Training: https://arxiv.org/abs/1710.03740
https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/
Let me know how you are progressing and if any public repo, happy to have a look

Improving prediction accuracy in Bayesian Causal Network

I would like to determine the causes of an unexpected outcome (or anamoly) in a thermodynamic process. I have continuous data of the associated variables and trying to make use of 'Bayesian Network (BN)' for the determination of causality relationships. For this purpose, I used a library called 'Causalnex' in Python.
I have followed the tutorial section of this library to build the DAG,BN model and everything works fine upto the step of predictions. The prediction results of minority/less majority classes have an accuracy of around 60-70% (80-90% with SMOTE/SMOTETomek and a particular random state) whereas a stable accuracy of more than 90% is expected. I have implemented following data-preprocessing steps.
Ensuring no missing/NaN values
Discretization (only it is supported by the library)
SMOTE/SMOTETomek for data balancing
Various train/test size combinations
I am struggling to figure out the ways to optimize the model. I could not find any supportive material in Internet for the same.
Are there any Guidelines or 'Best practices' of data pre-processing techniques and dataset requirements that particulary work for this library/ BN model? Could you please suggest any troubleshooting methods to identify the causes of low accuracy/metrics? Perhaps a misunderstood node-node causal relationship in DAG causes mediocre accuracy?
Any ideas/literature/other suitable library regarding this would be of great help!

A few tips that can help:
Changing/Tuning the Structure learning.
Trying different thresholds. When doing from_pandas, you can experiment with different w-threshold values (and the beta term (if you are using from_pandas_lasso)).
This will change the density of the network. A more dense structure implies a BN with more parameters. If the structure is more dense, you have more parameters and your model may perform better. If it is too dense, though, you may not have enough data to train it and may overfit.
Center the Data. Empirically, it seems that NOTEARS (the algorithm behind from_pandas) works best if the data is centered. So, subtracting the mean of the see this may be a good idea.
Ensure causality. NOTEARS does not ensure causality. So we need "experts" to judge the output and make the necessary modifications. If you see edges that don't make causal sense, you can either remove them or add them as tabu_edges and train your network again.
Experiment with discretisation. The performance can be very sensitive to how you discretise the data. Experimenting with various types of discretisation can help. You can use:
Methods available in Causalnex (uniform, for example)
fixed discretisations based on what thresholds make sense for your data
MDLP is a supervised way to discretise data. You can apply MDLP for each node having as "target" one of its children. There are 2 main packages for MDLP in pypy: mdlp and mdlp-discretization

Weight prediction using NNs

I’m relatively new to the topic of machine learning, so naturally I have a couple of issues that I hope you can help me with or lead me in the right direction. I had a project before, during which we collected data of people walking normally and also with a stone in their shoe. We measured Acceleration and also with a gyroscope sensor. Based on this data I build a neural network that can classify the signals into normal or impaired walking. So two possible outputs.
Now my idea is this: I want to, using the same data, build a network that can predict the weights of the participants (it was also recorded).
Based on this my three questions:
- What kind of network structure is most suitable for such a task? (Dense, CNN, LSTM,…)
- Before the network basically had two options to answer from (normal or impaired walking) but now I have a continuous range of answers… How can this be approached?
- How can I make sure the network initializes with a sensible prediction?
I hope all the questions make sense. Any help will be much appreciated!

You can use the NNa architecture you prefer:
If you work with sequences use 1d convolutionals or RNNs.
As you are dealing with a regression problem you have to have a single neuron as output without activation function.
Take a.look here to learn to solve a regression problem with RNNs

Is it possible to remove categories in a pretrained tensorflow model?

I am currently using Tensorflow Object Detection API for my human detection app.
I tried filtering in the API itself which worked but I am still not contended by it because it's slow. So I'm wondering if I could remove other categories in the model itself to also make it faster.
If it is not possible, can you please give me other suggestions to make the API faster since I will be using two cameras. Thanks in advance and also pardon my english :)

Your questions addresses several topics for using neural network pretrained models.
Theoretical methods
In general, you can always neutralize categories by removing the corresponding neurons in the softmax layer and compute a new softmax layer only with the relevant rows of the matrix.
This method will surely work (maybe that is what you meant by filtering) but will not accelerate the network computation time by much, since most of the flops (multiplications and additions) will remain.
Similar to decision trees, pruning is possible but may reduce performance. I will explain what pruning means, but note that the accuracy over your categories may remain since you are not just trimming, you are predicting less categories as well.
Transfer the learning to your problem. See stanford's course in computer vision here. Most of the times I've seen that works good is by keeping the convolution layers as-is, and preparing a medium-size dataset of the objects you'd like to detect.
I will add more theoretical methods if you request, but the above are the most common and accurate I know.
Practical methods
Make sure you are serving your tensorflow model, and not just using an inference python code. This could significantly accelerate performance.
You can export the parameters of the network and load them in a faster framework such as CNTK or Caffe. These frameworks work in C++/CSharp and can inference much faster. Make sure you load the weights correctly, some frameworks use different order in tensor dimensions when saving/loading (little/big endian-like issues).
If your application perform inference on several images, you can distribute the computation via several GPUs. **This can also be done in tensorflow, see Using GPUs.
Pruning a neural network
Maybe this is the most interesting method of adapting big networks for simple tasks. You can see a beginner's guide here.
Pruning means that you remove parameters from your network, specifically the whole nodes/neurons in a decision tree/neural network (resp). To do that in object detection, you can do as follows (simplest way):
Randomly prune neurons from the fully connected layers.
Train one more epoch (or more) with low learning rate, only on objects you'd like to detect.
(optional) Perform the above several times for validation and choose best network.
The above procedure is the most basic one, but you can find plenty of papers that suggest algorithms to do so. For example
Automated Pruning for Deep Neural Network Compression and An iterative pruning algorithm for feedforward neural networks.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.