I am developing an RNN/LSTM model, and I want to encode the input sequence as a ByteTensor to save memory, since I am working under a very tight memory budget. However, when I do so, the model returns the following error:
Expected object of scalar type Byte but got scalar type Float for argument #2 'mat2'
So, there seems to be something else that needs to be a Byte tensor as well, but I do not know what it is, since the console only shows an error at the line:
output = model(predictor)
It means that inside the model there are float tensors which are being used to operate on your byte tensor (most likely as operands in matrix multiplications, additions, etc.). I believe you could technically cast them to byte by executing model.type(torch.uint8), but your approach will sooner or later fail anyway: since integers are discrete, there is no way to use them in the gradient calculations needed for backpropagation. uint8 values can be used in deep learning to improve the performance and memory footprint of inference in a network which is already trained, but this is an advanced technique. For this task your best bet is the regular float32. If your GPU supports it, you could also use float16, aka half, though it introduces additional complexity and I wouldn't suggest it for beginners.
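If the goal is simply to store the sequence compactly, a common compromise is to keep the data in uint8 and cast it to float right before the forward pass. A minimal sketch (the LSTM, shapes and names below are illustrative, not taken from your code):
import torch
import torch.nn as nn

# Illustrative stand-in for your RNN/LSTM.
model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Keep the encoded sequence in memory as uint8 (compact storage)...
predictor_u8 = torch.randint(0, 2, (4, 50, 8), dtype=torch.uint8)

# ...and cast to float32 only when feeding the model, so its float weights
# and autograd keep working.
predictor = predictor_u8.float()
output, _ = model(predictor)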
I want to estimate the Fourier transform of a given image of size BxCxWxH.
In previous torch version the following did the job:
fft_im = torch.rfft(img, signal_ndim=2, onesided=False)
and the output was of size:
BxCxWxHx2
However, with the new version of rfft:
fft_im = torch.fft.rfft2(img, dim=2, norm=None)
I do not get the same results. Am I missing something?
A few issues:
The dim argument you provided is an invalid type; it should be a tuple of two numbers or should be omitted. I would argue that the fact this ran without raising an exception is a bug in PyTorch (I opened a ticket stating as much).
PyTorch now supports complex tensor types, so FFT functions return those instead of adding a new dimension for the real/imaginary parts. You can use torch.view_as_real to convert to the old representation. It is also worth pointing out that view_as_real doesn't copy data, since it returns a view, so it shouldn't slow things down in any noticeable way.
PyTorch no longer gives the option of disabling the one-sided calculation in RFFT, probably because disabling it makes the result identical to torch.fft.fft2, which is in conflict with the 13th aphorism of PEP 20. The whole point of providing a special real-valued version of the FFT is that you only need to compute about half the values (along the last transformed dimension), since the rest can be inferred via the Hermitian symmetry of a real signal's spectrum.
So, from all that, you should be able to use:
fft_im = torch.view_as_real(torch.fft.fft2(img))
Important: If you're going to pass fft_im to other functions in torch.fft (like fft.ifft or fft.fftshift), then you'll need to convert back to the complex representation using torch.view_as_complex, so those functions don't interpret the last dimension as a signal dimension.
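For illustration, a small round trip with made-up shapes (nothing here comes from your code):
import torch

# A 4x1x8x8 batch of random "images" (B x C x W x H).
img = torch.randn(4, 1, 8, 8)

fft_complex = torch.fft.fft2(img)          # complex tensor, shape (4, 1, 8, 8)
fft_im = torch.view_as_real(fft_complex)   # old-style layout, shape (4, 1, 8, 8, 2)

# Convert back to complex before calling other torch.fft functions:
restored = torch.fft.ifft2(torch.view_as_complex(fft_im)).real
print(torch.allclose(restored, img, atol=1e-6))   # True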
I'm trying to parallelize different tensor operations. I'm aware that tf.vectorized_map and/or tf.map_fn can parallelize input tensor(s) with respect to their first axis, but that's not what I'm looking for. I'm looking for a way to parallelize a for loop on a set of tensors with possibly different shapes.
a = tf.ones((2))
b = tf.ones((2,2))
list_of_tensors = [a,b*2,a*3]
for t in list_of_tensors:
# some operation on t which may vary depending on its shape
Is there a possible way to parallelize this for loop on GPU with TensorFlow? (I'm open to any other library if possible i.e. JAX, numba etc.)
Thanks!
According to the tf.vectorized_map documentation:
The shape and dtype of any intermediate or output tensors in the computation of fn should not depend on the input to fn.
I'm struggling with this problem myself. I think the answer is the one suggested in the comments: if you know the maximum length your tensor can have, represent the variable-length tensor by a maximum-length tensor plus an integer giving its actual length. Whether this is useful at all depends on the meaning of "any intermediate", because at some point you may still need the result for the actual, shorter tensor in your computation. It's a bit of a tail-chasing exercise. This part of TensorFlow is extremely frustrating: it's very hacky to get things to work that should be easy, especially when it comes to obtaining true parallelism on the GPU for deterministic matrix algorithms outside of the context of machine learning.
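A minimal sketch of that padding idea, assuming every element can be flattened and padded to a known maximum size (MAX_LEN and the reduction inside op are made up for illustration):
import tensorflow as tf

MAX_LEN = 4  # assumed known upper bound on the flattened size

a = tf.ones((2,))
b = tf.ones((2, 2))
list_of_tensors = [a, b * 2, a * 3]

def pad_flat(t):
    # Flatten, remember the true length, and pad up to MAX_LEN.
    flat = tf.reshape(t, [-1])
    length = tf.shape(flat)[0]
    padded = tf.pad(flat, [[0, MAX_LEN - length]])
    return padded, length

padded, lengths = zip(*[pad_flat(t) for t in list_of_tensors])
padded = tf.stack(padded)      # shape (3, MAX_LEN), now a regular batch
lengths = tf.stack(lengths)    # shape (3,)

def op(args):
    t, n = args
    # Mask out the padding so it doesn't affect the (illustrative) reduction.
    mask = tf.cast(tf.range(MAX_LEN) < n, t.dtype)
    return tf.reduce_sum(t * mask)

results = tf.vectorized_map(op, (padded, lengths))  # one call instead of a Python loop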
This might work inside the loop:
tf.autograph.experimental.set_loop_options(
shape_invariants=[(v, tf.TensorShape([None]))]
)
I'm writing some code in PyTorch and I came across the gather function. Checking the documentation I saw that the index argument takes in a LongTensor, why is that? Why does it need to take in a LongTensor instead of another type such as IntTensor? What are the benefits?
By default, all indices in PyTorch are represented as long (int64) tensors, which allows indexing very large tensors beyond the roughly 2^31 elements addressable with a "regular" 32-bit int.
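For illustration, a small gather call with an explicit int64 index (shapes and values are made up):
import torch

src = torch.tensor([[1., 2., 3.],
                    [4., 5., 6.]])

# The index must be a LongTensor (int64); an int32 index raises a RuntimeError.
idx = torch.tensor([[2, 0],
                    [1, 1]], dtype=torch.long)

out = torch.gather(src, dim=1, index=idx)
# out[i][j] = src[i][idx[i][j]]  ->  [[3., 1.],
#                                     [5., 5.]]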
I tried to run the following graph:
Unfortunately, I receive the following error message:
tensorflow.python.framework.errors.InternalError: Message length was negative
[[Node: random_uniform_1_S1 = _Recv[client_terminated=false,
recv_device= "/job:worker/replica:0/task:1/cpu:0",
send_device="/job:worker/replica:0/task:0/cpu:0",
send_device_incarnation=3959744268201087672,
tensor_name="edge_18_random_uniform_1",
tensor_type=DT_DOUBLE,
_device="/job:worker/replica:0/task:1/cpu:0"]()]]
I noticed that this error message does not occur if the size of random_uniform_1 is 800MB, but it does occur if the size is 8GB.
(Notice that random_uniform_1 has to be transferred from one device to another device.)
Question: Is there a limit on how big a tensor can be, if that tensor has to be transferred between devices?
Yes, currently there is a 2GB limit on an individual tensor when sending it between processes. This limit is imposed by the protocol buffer representation (more precisely, by the auto-generated C++ wrappers produced by the protoc compiler) that is used in TensorFlow's communication layer.
We are investigating ways to lift this restriction. In the meantime, you can work around it by manually adding tf.split() or tf.slice(), and tf.concat() operations to partition the tensor for transfer. If you have very large tf.Variable objects, you can use variable partitioners to perform this transformation automatically. Note that in your program you have multiple 8GB tensors in memory at once, so the peak memory utilization will be at least 16GB.
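A rough sketch of the split/concat workaround, written as TF1-style graph code with made-up sizes (tf.split's current signature is tf.split(value, num_or_size_splits, axis)):
import tensorflow as tf

with tf.device("/job:worker/replica:0/task:0/cpu:0"):
    # ~8GB of float64 data, as in the failing case.
    big = tf.random_uniform([1000000, 1000], dtype=tf.float64)
    # Split into 8 pieces of ~1GB each, all under the 2GB per-tensor limit.
    parts = tf.split(big, num_or_size_splits=8, axis=0)

with tf.device("/job:worker/replica:0/task:1/cpu:0"):
    # Each piece is transferred separately and reassembled on the other device.
    big_again = tf.concat(parts, axis=0)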
I am trying to run multiple convolutions on an image in TensorFlow and then concatenate the results. Because tf.concat allocates a new tensor, I sometimes run into a ResourceExhaustedError (my current solution is to change batch_size to a smaller value).
So here is my question. Is there a way to create a big tensor (I know all its dimensions in advance) and then assign the results of the convolutions to it, part by part, to avoid concatenation and extra memory allocation? Or maybe there is another, more efficient way of doing this?
Something like:
convs = tf.Variable(tf.zeros([..]))
tf.update(convs, [..], tf.nn.conv2d(..) + biases1)
tf.update(convs, [..], tf.nn.conv2d(..) + biases2)
^^^^^^^^^ ^^offsets
There isn't a way to do this - TensorFlow Tensor objects are immutable by design.
There might be another way to accomplish what you want (and it would be interesting to know about the cases that are running out of memory, to inform future improvements).