I'm trying to parallelize different tensor operations. I'm aware that tf.vectorized_map and/or tf.map_fn can parallelize a function over the first axis of input tensor(s), but that's not what I'm looking for. I'm looking for a way to parallelize a for loop over a set of tensors with possibly different shapes.
a = tf.ones((2))
b = tf.ones((2, 2))
list_of_tensors = [a, b * 2, a * 3]
for t in list_of_tensors:
    # some operation on t which may vary depending on its shape
    ...
Is there a possible way to parallelize this for loop on GPU with TensorFlow? (I'm open to any other library if possible i.e. JAX, numba etc.)
Thanks!
According to the documentation,
The shape and dtype of any intermediate or output tensors in the
computation of fn should not depend on the input to fn.
I'm struggling with this problem myself. I think the answer is the one suggested in the comments: if you know the maximum length your tensors can have, represent each variable-length tensor by a maximum-length tensor plus an integer giving its actual length. Whether this helps at all depends on the meaning of "any intermediate", because at some point you may still need the result at the actual, shorter length in your computation. It's a bit of a tail-chasing exercise. This part of Tensorflow is extremely frustrating: it's very hacky to get things working that should be easy, especially when it comes to true GPU parallelism for deterministic matrix algorithms outside the context of machine learning.
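A minimal sketch of that padding idea, assuming TF 2.x eager mode; max_len and the pad_to helper are hypothetical names, and this only helps when the tensors share a rank (mixed ranks, as in the question, are harder still):
import tensorflow as tf

max_len = 4  # hypothetical known upper bound on tensor length

def pad_to(t, n):
    # Zero-pad a 1-D tensor t to length n and record its true length.
    return tf.pad(t, [[0, n - tf.shape(t)[0]]]), tf.shape(t)[0]

padded, lengths = zip(*[pad_to(t, max_len) for t in [tf.ones(2), tf.ones(3)]])
batch = tf.stack(padded)   # shape (2, max_len), now uniform
lens = tf.stack(lengths)   # the actual lengths, here [2, 3]

# One fixed-shape op over the whole batch; a mask keeps the padding inert.
mask = tf.sequence_mask(lens, max_len, dtype=batch.dtype)
result = tf.vectorized_map(lambda x: x * 2.0, batch) * mask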
This might work inside the loop:
tf.autograph.experimental.set_loop_options(
    shape_invariants=[(v, tf.TensorShape([None]))]
)
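For context, set_loop_options has to be called at the top of the loop body inside a tf.function; a minimal sketch based on the documented usage (grow and its loop body are placeholders):
import tensorflow as tf

@tf.function
def grow(n):
    v = tf.constant([0.0])
    for _ in tf.range(n):
        # Declare that v's length may change between iterations.
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(v, tf.TensorShape([None]))]
        )
        v = tf.concat([v, [1.0]], axis=0)
    return v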
I want to estimate the fourier transform for a given image of size BxCxWxH
In previous torch version the following did the job:
fft_im = torch.rfft(img, signal_ndim=2, onesided=False)
and the output was of size:
BxCxWxHx2
However, with the new version of rfft :
fft_im = torch.fft.rfft2(img, dim=2, norm=None)
I do not get the same results. Do I miss something?
A few issues
The dim argument you provided has an invalid type; it should be a tuple of two numbers or should be omitted. PyTorch should really raise an exception here; I would argue that the fact this ran without one is a bug (I opened a ticket stating as much).
PyTorch now supports complex tensor types, so FFT functions return those instead of adding a new dimension for the real/imaginary parts. You can use torch.view_as_real to convert to the old representation. It's also worth pointing out that view_as_real doesn't copy data, since it returns a view, so it shouldn't slow things down in any noticeable way.
PyTorch no longer gives the option of disabling the one-sided calculation in RFFT. Probably because disabling one-sided makes the result identical to torch.fft.fft2, which is in conflict with the 13th aphorism of PEP 20. The whole point of providing a special real-valued version of the FFT is that you need only compute half the values for each dimension, since the rest can be inferred via the Hermitian symmetry property.
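To make the one-sided point concrete: for a real input whose last dimension has size W, rfft2 returns only W // 2 + 1 columns, while fft2 returns all W (sizes below are arbitrary):
import torch

img = torch.randn(1, 1, 4, 4)
print(torch.fft.fft2(img).shape)   # torch.Size([1, 1, 4, 4]), full spectrum
print(torch.fft.rfft2(img).shape)  # torch.Size([1, 1, 4, 3]), one-sided: 4 // 2 + 1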
So from all that you should be able to use
fft_im = torch.view_as_real(torch.fft.fft2(img))
Important: If you're going to pass fft_im to other functions in torch.fft (like fft.ifft or fft.fftshift), then you'll need to convert back to the complex representation using torch.view_as_complex, so those functions don't interpret the last dimension as a signal dimension.
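A quick sketch of that round trip (img is a stand-in batch):
import torch

img = torch.randn(2, 3, 8, 8)                      # BxCxWxH, arbitrary sizes
fft_im = torch.view_as_real(torch.fft.fft2(img))   # BxCxWxHx2, old-style layout

# Convert back to complex before handing it to torch.fft functions, so the
# trailing size-2 dimension isn't mistaken for a signal dimension.
restored = torch.fft.ifft2(torch.view_as_complex(fft_im))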
Tensorflow has a great many transformations that can be applied to 3D tensors representing images ([height, width, depth]), like tf.image.rot90() or tf.image.random_flip_left_right() for example.
I know that they are meant to be used with queues, hence the fact that they operate on only one image.
But would there be a way to vectorize these ops to transform a 4D tensor ([batch_size, height, width, depth]) into a same-size tensor with the op applied image-wise along the first dimension, without explicitly looping through them with tf.while_loop()?
(EDIT: Regarding rot90(), a clever hack taken from numpy's rot90 would be:
rot90 = tf.reverse(x, tf.convert_to_tensor((False, False, True, False)))
rot90 = tf.transpose(rot90, [0, 2, 1, 3])
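For reference, the same hack with the current tf.reverse signature (which takes axis indices rather than a boolean mask) is already batched, so no loop is needed; the shapes here are arbitrary:
import tensorflow as tf

batch = tf.random.uniform((8, 32, 32, 3))         # [batch, height, width, depth]
flipped = tf.reverse(batch, axis=[2])             # flip along the width axis
rot90 = tf.transpose(flipped, perm=[0, 2, 1, 3])  # then swap H and W, as numpy's rot90 does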
EDIT 2: It turns out this question has already been answered quite a few times (one example); it seems map_fn is the way to go if you want an optimized version. I had already seen it but had forgotten. I guess this makes this question a duplicate...
However, for random ops or more complex ops, it would be nice to have a generic method to vectorize existing functions...)
Try tf.map_fn.
processed_images = tf.map_fn(process_fn, images)
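A minimal sketch of that approach; tf.image.rot90 works here because map_fn hands it one [height, width, depth] image at a time (the shapes are arbitrary):
import tensorflow as tf

images = tf.random.uniform((16, 64, 64, 3))  # [batch_size, height, width, depth]
rotated = tf.map_fn(tf.image.rot90, images)  # op applied image-wise along axis 0
flipped = tf.map_fn(tf.image.random_flip_left_right, images)  # random op per image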
I need to repeatedly take the Fourier Transform/Inverse Fourier Transform of a 3d function in order to solve a differential equation. Something like:
import pyfftw.interfaces.numpy_fft as fftw

for i in range(largeNumber):
    fFS = fftw.rfftn(f)
    # Do stuff
    f = fftw.irfftn(fFS)
The shape of f is highly noncubic. Is there any performance difference based on the order of dimensions, for example (512, 32, 128) vs (512, 128, 32), etc.?
I am looking for any speed ups available. I have already tried playing around with wisdom. I thought it might be fastest if the largest dimension went last (e.g. 32, 128, 512) so that fFS.shape = (32, 128, 257), but this doesn't appear to be the case.
If you really want to squeeze all the performance out you can, use the FFTW object directly (most easily accessed through pyfftw.builders). This way you get careful control over exactly what copies occur and whether the normalization is performed on the inverse.
Your code as-is will likely benefit from using the cache (enabled by calling pyfftw.interfaces.cache.enable()), which minimises the set up time for the general and safe case, though doesn't eliminate it.
Regarding the best arrangement of dimensions, you'll have to suck it and see. Try all the various options and see what is fastest (with timeit). Make sure when you do the tests you're actually using the data arranged in memory as expected and not just taking a view of the same array in memory (which pyfftw may well handle fine without a copy - though there are tweak parameters for this sort of thing).
FFTW tries lots of different options (different algorithms over different FFT representations) and picks the fastest, so you end up with non-obvious implementations that may well change for different datasets that are superficially very similar.
General tips (a short sketch follows this list):
Turn on the multi-threading for maximum performance (set threads=N where appropriate).
Make sure your arrays are suitably byte aligned - this has less impact than it used to with modern hardware, but will probably make a difference (particularly if all your higher dimension sizes have the byte alignment as a factor).
Read the tutorial and the api docs.
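A hedged sketch of the builders route with threading and aligned arrays; the shape, thread count, and iteration count are placeholders, and the "Do stuff" step is elided:
import numpy as np
import pyfftw

f = pyfftw.empty_aligned((512, 32, 128), dtype='float64')  # byte-aligned input
f[:] = np.random.rand(*f.shape)

# Build the planned FFTW objects once, outside the loop, and reuse them.
rfftn = pyfftw.builders.rfftn(f, threads=4)
irfftn = pyfftw.builders.irfftn(rfftn.output_array, s=f.shape, threads=4)

for i in range(10):
    fFS = rfftn(f)
    # Do stuff with fFS
    f = irfftn(fFS)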
Is there a way to make a Tensor iterable without running eval() to get its numpy array?
I am trying to iterate through two parts of a tensor after using split() on it, but it happens within the construction of the hidden layers of my neural network, so it needs to happen before I am able to start a session.
import tensorflow as tf

x = tf.placeholder('float', [None, nbits])
layer = [x]
for i in range(1, numbits):
    layer.append(tf.add(tf.matmul(weights[i-1], layer[i-1]), biases[i-1]))
    aes, bes = tf.split(1, 2, layer[-1])
    if i % 2 == 1:
        for am, a, b in zip(add_layer, aes, bes):
            layer.append(am.ex(a, b))
The problem is that layer[-1] is a tf.placeholder at this point, so aes and bes are both tensors, and I can't iterate through them with zip().
Any ideas would be appreciated.
No, there isn't; not directly.
It's easiest to think about Tensorflow programs as being split into two phases: a Python building phase that constructs a computation graph, and an execution phase that runs that graph. Nothing actually runs during the building phase; all computation happens during the execution phase. The building phase can't depend on the results of the execution phase, except by running the graph (session.run(), .eval(), etc.).
You can't iterate over a Tensor while building the graph, because it doesn't actually get evaluated to a specific set of values until you call session.run(). Instead it's just a reference to a node in the computation graph.
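A tiny illustration of the two phases, in the TF 1.x style the question uses:
import tensorflow as tf

t = tf.constant([1, 2, 3]) * 2   # building phase: t is a graph node, not values
print(t)                         # Tensor("mul:0", shape=(3,), dtype=int32)

with tf.Session() as sess:
    print(sess.run(t))           # execution phase: [2 4 6]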
In general, you have to use Tensorflow functions to manipulate Tensors, not Python primitives (like zip). One way I like to think of it is that it's almost like a Tensor is a radioactive object in a sealed box, and you can only handle it indirectly using a robot that can perform a certain set of actions (Tensorflow library functions) :-) So you likely need to find a way to express your task using Tensorflow primitives.
If you gave a complete example of what you're trying to do, it might be possible to say more (it's not clear to me from your code fragment). One possibility might be to use tf.split to split the tensors up into Python lists of subtensors, and then use something like zip on the lists.
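For example, with the modern tf.split signature (value first), the return value is a plain Python list of subtensors, which zip can consume at graph-construction time; the shapes and split counts here are placeholders:
import tensorflow as tf

x = tf.placeholder('float', [None, 8])
halves = tf.split(x, num_or_size_splits=2, axis=1)    # Python list of two [None, 4] tensors
quarters = tf.split(x, num_or_size_splits=4, axis=1)  # Python list of four [None, 2] tensors

# The lists (not the tensors themselves) are iterable while building the graph:
combined = [tf.concat([h, q], axis=1) for h, q in zip(halves, quarters[:2])]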
I hope that helps!
I am trying to run multiple convolutions on an image in TensorFlow and then concatenate the results. Because tf.concat allocates a new tensor, I sometimes run into a ResourceExhaustedError (my current workaround is to reduce batch_size).
So here is my question: is there a way to create a big tensor (I know all the dimensions in advance) and then assign the results of the convolutions to it, part by part, to avoid the concatenation and extra memory allocation? Or maybe there is a more efficient way of doing this?
Something like:
convs = tf.Variable(tf.zeros([..]))
tf.update(convs, [..], tf.nn.conv2d(..) + biases1)
tf.update(convs, [..], tf.nn.conv2d(..) + biases2)
^^^^^^^^^ ^^offsets
There isn't a way to do this - TensorFlow Tensor objects are immutable by design.
There might be another way to accomplish what you want (and it would be interesting to hear about cases that run out of memory, to guide future improvements).
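One hedged possibility along those lines: a tf.Variable, unlike a plain tensor, does support sliced assignment, so something close to the hypothetical tf.update can be written against a preallocated Variable. The shapes and the stand-in conv results below are placeholders, this uses TF's variable slice-assign API (under graph mode the assigns are ops you must run), and whether it actually saves memory over tf.concat depends on the graph:
import tensorflow as tf

convs = tf.Variable(tf.zeros([8, 32, 32, 64]))  # preallocated at the full, known size

result1 = tf.zeros([8, 32, 32, 32])  # stand-in for tf.nn.conv2d(..) + biases1
result2 = tf.zeros([8, 32, 32, 32])  # stand-in for tf.nn.conv2d(..) + biases2

assign1 = convs[:, :, :, :32].assign(result1)  # write into the first channel half
assign2 = convs[:, :, :, 32:].assign(result2)  # write into the second channel half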