How does batched matrix multiply work - python

Deep learning libraries usually offer a function to calculate a batched matrix multiply, but so far I can't understand how they do this without using any for loop. I want to know the steps to achieve this.
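For concreteness, here is a minimal sketch of what a batched matrix multiply computes, using NumPy (an assumption; the question does not name a library). The result matches multiplying each pair of matrices in an explicit Python loop. The loop over the batch has not disappeared; it is presumably carried out inside the library's compiled code rather than in Python. The array names and shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 3))  # batch of 4 matrices, each 2x3
B = rng.standard_normal((4, 3, 5))  # batch of 4 matrices, each 3x5

# Reference: explicit Python loop over the batch dimension.
looped = np.stack([A[i] @ B[i] for i in range(A.shape[0])])

# Batched: np.matmul treats the leading dimension as a batch
# and multiplies the trailing two dimensions pairwise.
batched = np.matmul(A, B)

# The same contraction written with explicit indices:
# out[b, i, k] = sum over j of A[b, i, j] * B[b, j, k]
einsummed = np.einsum("bij,bjk->bik", A, B)

print(batched.shape)  # (4, 2, 5)
print(np.allclose(looped, batched), np.allclose(batched, einsummed))
```

The einsum form spells out the steps: for every batch index b, sum over the shared index j.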

Related

Neural Networks Extending Learning Domain

I have a simple function f : R->R, f(x) = x^2 + a, and would like to create a neural network that learns that function as completely as it can. Currently, I have a PyTorch implementation that takes inputs from a limited range, from x0 to xN, with a particular number of points. Each epoch, the training data is randomly perturbed so that the network does not learn the relationship only on the same grid points each time.
Currently, it does a great job of learning the function on the range it is trained on, but is it at all feasible to train it in a way that extends this learning beyond that range? At the moment the behavior outside the training range seems to depend on the activation function. For example, with ReLU, the true function (orange) compared to the network's prediction (blue) is below:
I understand that if I transform the input vector to higher dimensions that contain higher powers of x, it may work out pretty well, but for a generalized case and how I plan to implement this in the future it won't work as well on non-polynomial functions.
One thought that came to mind is from support vector machines and the choice of a kernel, and how the radial basis kernel gets around this generalization issue, but I'm not sure if this can be applied here without the inner product properties of svm.
What you want is called extrapolation (as opposed to interpolation, which is predicting a value inside the trained domain / range). There is never a good solution for extrapolation. Using higher powers can give you a better fit for a specific problem, but if you change the fitted curve slightly (change its x- or y-intercept, one of the powers, etc.) the extrapolation will be pretty bad again.
This is also why neural networks use a large data set (to maximize their input range and rely on interpolation) and why over-training / overfitting (which is what you're trying to do) is a bad idea; it never works well in the general case.
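To illustrate the difference between interpolation and extrapolation, here is a small sketch with a polynomial fit rather than a neural network. The target function sin(x), the training range [-2, 2], the degree 5 and the test points are all arbitrary choices for this example, not anything from the question.

```python
import numpy as np

# Training data: sin(x) sampled only on [-2, 2].
x_train = np.linspace(-2.0, 2.0, 200)
y_train = np.sin(x_train)

# Least-squares fit using powers of x up to degree 5.
coeffs = np.polyfit(x_train, y_train, deg=5)

def predict(x):
    """Evaluate the fitted polynomial at x."""
    return np.polyval(coeffs, x)

# Error at a point inside the training range (interpolation)
# and at a point far outside it (extrapolation).
interp_err = abs(predict(1.0) - np.sin(1.0))
extrap_err = abs(predict(6.0) - np.sin(6.0))

print("interpolation error at x=1:", interp_err)
print("extrapolation error at x=6:", extrap_err)
```

The fit is close inside the sampled range and far off outside it, because nothing in the training data constrains the curve there.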

Where can I find the algorithm behind model.predict?

I would like to implement the code for model.predict (https://keras.io/models/model/) in C++, but I am unable to find the exact logic (equations, formulas) used in prediction.
For C++, I implemented the source code here: https://github.com/Dobiasd/frugally-deep
but unfortunately could not find the equation behind the predict function. (Frugally deep exports the model as a .json file and does the prediction using the predict function).
Would there be any resources that I could refer to find the equations for model.predict?
model.predict implements a forward pass of the model, so there is no single equation; the computation is determined by the model's computation graph.
To implement the same behavior, you have to do a forward pass through the layers of the model, where each layer implements its own computation. So it's not a simple matter of recommending equation X: it's a large set of computational formulas that you have to implement, one for each kind of layer.
Looking at the repo, it appears you're looking for this.
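As a rough illustration of what "a forward pass through the layers" means, here is a sketch for a model made only of fully connected layers, each computing activation(x · kernel + bias). It is written in Python with NumPy rather than C++ to keep it short. The layer list, weights and helper names are hypothetical; a real model would need one such function per layer type, with weights loaded from the trained model.

```python
import numpy as np

def relu(z):
    """Elementwise max(z, 0)."""
    return np.maximum(z, 0.0)

def dense_forward(x, kernel, bias, activation=None):
    """One fully connected layer: activation(x @ kernel + bias)."""
    z = x @ kernel + bias
    return activation(z) if activation is not None else z

def predict(x, layers):
    """Feed x through each layer in order and return the final output."""
    for kernel, bias, activation in layers:
        x = dense_forward(x, kernel, bias, activation)
    return x

# Hypothetical weights for a 2 -> 2 -> 1 network.
layers = [
    (np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, 1.0]), relu),
    (np.array([[1.0], [1.0]]), np.array([0.5]), None),
]

x = np.array([[2.0, 1.0]])  # one sample with two features
out = predict(x, layers)
print(out)  # [[4.]]
```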

How do I implement a Gaussian activation function in tensorflow?

I read that adding a new function is not so straightforward in this answer, so that is not a viable option. Also it was mentioned here that it is possible to implement Gaussian using the tools made available by tensorflow. Can someone please give a detailed answer on how to exactly implement a Gaussian activation function in tf such that it behaves like a normal non-linear function and can be trained by back-prop ?
In general, if you want to implement a complex function in tensorflow, you can use all the basic mathematical functions that tensorflow provides, and many common functions are already implemented.
The functions provided by tensorflow's API are usually implemented with gradient descent compatibility in mind. So if you build your complex function out of differentiable tensorflow operations, your function will be compatible with gradient descent.
So when you have a new function to implement, search for its mathematical formula; most of the time you will find the corresponding mathematical operations in tensorflow.
Providing a specific, ready-to-use answer for your particular problem is not in the best interest of you or SO, but check the formula of a Gaussian function and it should be easy to implement.
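To show the formula-first approach, here is a hedged sketch in NumPy (not tensorflow) of one common Gaussian form, exp(-x^2), together with its analytic derivative and a finite-difference check. The choice of this particular form is an assumption. In tensorflow the same expression could presumably be composed from tf.exp and tf.square, in which case the gradient would not need to be written by hand.

```python
import numpy as np

def gaussian(x):
    """Gaussian activation exp(-x^2), built from elementwise primitives."""
    return np.exp(-np.square(x))

def gaussian_grad(x):
    """Analytic derivative: d/dx exp(-x^2) = -2 x exp(-x^2)."""
    return -2.0 * x * np.exp(-np.square(x))

x = np.array([-1.0, 0.0, 0.5])

# Central finite differences as an independent check of the derivative.
eps = 1e-6
numeric = (gaussian(x + eps) - gaussian(x - eps)) / (2 * eps)

print(gaussian(x))
print(np.allclose(numeric, gaussian_grad(x), atol=1e-6))
```

Because the function is smooth everywhere, back-propagation through it is well defined at every input.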

How-To Generate a 3D Numpy Array On-Demand for an LSTM

I am currently trying to use a "simple" LSTM network implemented through Keras for a summer project. Looking at the example code given, it appears the LSTM code wants a pre-generated 3D numpy array. As the dataset and the associated time interval I want to use are both rather large, it would be prohibitive for me to load a "complete array" all at once. Is it possible to load the raw dataset and apply the sequencing transform to it as needed by the network (in this case, construct the 3D array from time-interval windows of length x that advance by 1 each time)? If so, how would you go about doing this?
Thanks for any help you can provide!
I found the answer to this on the Keras slack from user rocketknight. Use the model.fit_generator function. Define a generator function somewhere within your main python script that "yields" a batch of data. Then call this function in the arguments of the model.fit_generator function.
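A minimal sketch of such a generator, assuming the raw data is a 2D array of shape (time steps, features) and that each window is paired with the target at its last time step. The function name window_batches, the toy data and the target convention are made up for illustration.

```python
import numpy as np

def window_batches(series, targets, window, batch_size):
    """Yield (X, y) batches forever; X has shape (batch, window, features)."""
    n_windows = len(series) - window + 1
    while True:  # loop indefinitely so the generator never runs dry
        for start in range(0, n_windows, batch_size):
            stop = min(start + batch_size, n_windows)
            # Build only this batch of overlapping windows, sliding by 1 step.
            X = np.stack([series[i:i + window] for i in range(start, stop)])
            # Target for each window is the value at the window's last step.
            y = targets[start + window - 1:stop + window - 1]
            yield X, y

# Toy data: 10 time steps with 2 features each.
series = np.arange(20, dtype=float).reshape(10, 2)
targets = np.arange(10, dtype=float)

gen = window_batches(series, targets, window=3, batch_size=4)
X, y = next(gen)
print(X.shape, y)  # (4, 3, 2) [2. 3. 4. 5.]
```

Only one batch of windows exists in memory at a time; the generator object gen is what would be passed to model.fit_generator.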

Data Augmentation using GPU in Theano

I am new to Theano and deep learning. I am running my experiments in Theano, but I would like to reduce the time I spend per epoch by doing data augmentation directly on the GPU.
Unfortunately I cannot use PyCuda, so I would like to know if it is possible to do basic data augmentation using Theano, for example translation or rotation of images. At the moment I am using scipy functions on the CPU with Numpy, but it is quite slow.
If the data augmentation is part of your computation graph, and can be executed on GPU, it will naturally be executed on the GPU. So the question narrows down to "is it possible to do common data augmentation tasks using Theano tensor operations on the GPU".
If the transformations you want to apply are just translations, you can use theano.tensor.roll followed by some masking. If you want rotations as well, take a look at this implementation of a spatial transformer network. In particular, look at the _transform function: it takes as input a matrix theta holding one 2x3 transformation per sample (the left 2x2 block is the rotation and the right 2x1 column is the translation) along with the actual samples, and applies the rotation and translation to those samples. I didn't confirm that what it does is optimized for the GPU (i.e. the bottleneck of that function could be executed on the CPU, which would make it inappropriate for your use case), but it's a good starting point.
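The roll-then-mask idea for translations can be sketched as follows. NumPy is used here as a stand-in so the example runs without Theano; np.roll plays the role of theano.tensor.roll, and the masking step would have to be rewritten with tensor operations to live in the computation graph. The function name translate and the zero fill are assumptions for illustration.

```python
import numpy as np

def translate(images, dx, dy):
    """Shift a (batch, height, width) array by dx columns and dy rows, zero-filling."""
    # Roll wraps pixels around the border...
    shifted = np.roll(images, shift=(dy, dx), axis=(1, 2))
    # ...so build a mask that zeroes the rows and columns that wrapped.
    mask = np.ones_like(images)
    if dy > 0:
        mask[:, :dy, :] = 0
    elif dy < 0:
        mask[:, dy:, :] = 0
    if dx > 0:
        mask[:, :, :dx] = 0
    elif dx < 0:
        mask[:, :, dx:] = 0
    return shifted * mask

images = np.arange(1.0, 10.0).reshape(1, 3, 3)
right = translate(images, dx=1, dy=0)
print(right[0])
```

Shifting right by one column moves each row's values over and leaves zeros in the first column instead of the wrapped-around pixels.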
