I wonder why I find no utility to map custom pytorch or numpy transformations along any dimensions of complicated tensors/arrays/matrices.
I think I remember that such a thing was available in R. With this fantasy tch.map utility you could do:
>>> import torch as tch # or numpy
>>> # one torch tensor
>>> a = tch.tensor([0, 1, 2, 3, 4])
>>> # one torch function (dummy) returning 2 values
>>> f = lambda x: tch.tensor((x + 1, x * 2))
>>> # map f along dimension 0 of a, expecting 2 outputs
>>> res = tch.map(f, a, 0, 2) # fantasy, optimized on CPU/GPU..
>>> res
tensor([[1, 0],
[2, 2],
[3, 4],
[4, 6],
[5, 8]])
>>> res.shape
torch.Size([5, 2])
>>> # another tensor
>>> a = tch.tensor(list(range(24))).reshape(2, 3, 4).type(tch.double)
>>> # another function (dummy) returning 2 values
>>> f = lambda x: tch.tensor((tch.mean(x), tch.std(x)))
>>> # map f along dimension 2 of a, expecting 2 outputs
>>> res = tch.map(f, a, 2, 2) # fantasy, optimized on CPU/GPU..
>>> res
tensor([[[ 1.5000, 1.2910],
[ 5.5000, 1.2910],
[ 9.5000, 1.2910]],
[[13.5000, 1.2910],
[17.5000, 1.2910],
[21.5000, 1.2910]]])
>>> res.shape
torch.Size([2, 3, 2])
>>> # yet another tensor
>>> a = tch.tensor(list(range(12))).reshape(3, 4)
>>> # another function (dummy) returning 2x2 values
>>> f = lambda x: x + tch.rand(2, 2)
>>> # map f along all values of a, expecting 2x2 outputs
>>> res = tch.map(f, a, -1, (2, 2)) # fantasy, optimized on CPU/GPU..
>>> print(res)
tensor([[[[ 0.4827, 0.3043],
[ 0.8619, 0.0505]],
[[ 1.4670, 1.5715],
[ 1.1270, 1.7752]],
[[ 2.9364, 2.0268],
[ 2.2420, 2.1239]],
[[ 3.9343, 3.6059],
[ 3.3736, 3.5178]]],
[[[ 4.2063, 4.9981],
[ 4.3817, 4.4109]],
[[ 5.3864, 5.3826],
[ 5.3614, 5.1666]],
[[ 6.6926, 6.2469],
[ 6.7888, 6.6803]],
[[ 7.2493, 7.5727],
[ 7.6129, 7.1039]]],
[[[ 8.3171, 8.9037],
[ 8.0520, 8.9587]],
[[ 9.5006, 9.1297],
[ 9.2620, 9.8371]],
[[10.4955, 10.5853],
[10.9939, 10.0271]],
[[11.3905, 11.9326],
[11.9376, 11.6408]]]])
>>> res.shape
torch.Size([3, 4, 2, 2])
Instead, I keep finding myself messing around with complicated tch.stack, tch.squeeze, tch.reshape, tch.permute, etc., counting dimensions on my fingers not to get lost.
Does such a utility exist and I have missed it for some reason?
Is such a utility impossible to implement for some reason?
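For the NumPy side of the question, np.apply_along_axis comes close to the imagined utility for the "map along one dimension" case. The sketch below reproduces the second example (mean and standard deviation along dimension 2). The helper name mean_and_std is made up for illustration, and this is a convenience rather than an optimized kernel: the function is still called once per 1-D slice from Python.

```python
import numpy as np

# Same data as the second example in the question, as a NumPy array.
a = np.arange(24, dtype=float).reshape(2, 3, 4)

def mean_and_std(x):
    # x is one 1-D slice taken along the chosen axis.
    # ddof=1 gives the sample standard deviation, matching torch.std's default.
    return np.array([x.mean(), x.std(ddof=1)])

# Apply mean_and_std to every 1-D slice along axis 2.
# The mapped axis is replaced by the shape of the function's return value.
res = np.apply_along_axis(mean_and_std, 2, a)
print(res.shape)
```

The output shape is the input shape with the mapped axis replaced by the shape of whatever the function returns, so no manual stack/permute bookkeeping is needed for this case.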
Related
I have a script that produces the first several iterations of a Markov matrix multiplying a given set of input values. With the matrix stored as A and the start values in the column u0, I use this list comprehension to store the output in an array:
out = np.array([ ( (A**n) * u0).T for n in range(10) ])
The output has shape (10,1,6), but I want the output in shape (10,6) instead. Obviously, I can fix this with .reshape(), but is there a way to avoid creating the extra dimension in the first place, perhaps by simplifying the list comprehension or the inputs?
Here's the full script and output:
import numpy as np
# Random 6x6 Markov matrix
n = 6
A = np.matrix([ (lambda x: x/x.sum())(np.random.rand(n)) for _ in range(n)]).T
print(A)
#[[0.27457312 0.20195133 0.14400801 0.00814027 0.06026188 0.23540134]
# [0.21526648 0.17900277 0.35145882 0.30817386 0.15703758 0.21069114]
# [0.02100412 0.05916883 0.18309142 0.02149681 0.22214047 0.15257011]
# [0.17032696 0.11144443 0.01364982 0.31337906 0.25752732 0.1037133 ]
# [0.03081507 0.2343255 0.2902935 0.02720764 0.00895182 0.21920371]
# [0.28801424 0.21410713 0.01749843 0.32160236 0.29408092 0.07842041]]
# Random start values
u0 = np.matrix(np.random.randint(51, size=n)).T
print(u0)
#[[31]
# [49]
# [44]
# [29]
# [10]
# [ 0]]
# Find the first 10 iterations of the Markov process
out = np.array([ ( (A**n) * u0).T for n in range(10) ])
print(out)
#[[[31. 49. 44. 29. 10.
# 0. ]]
#
# [[25.58242101 41.41600236 14.45123543 23.00477134 26.08867045
# 32.45689942]]
#
# [[26.86917065 36.02438292 16.87560159 26.46418685 22.66236879
# 34.10428921]]
#
# [[26.69224394 37.06346073 16.59208202 26.48817955 22.56696872
# 33.59706504]]
#
# [[26.68772374 36.99727159 16.49987315 26.5003184 22.61130862
# 33.7035045 ]]
#
# [[26.68766363 36.98517264 16.50532933 26.51717543 22.592951
# 33.71170797]]
#
# [[26.68695152 36.98895204 16.50314718 26.51729716 22.59379049
# 33.70986161]]
#
# [[26.68682195 36.98848867 16.50286371 26.51763013 22.59362679
# 33.71056876]]
#
# [[26.68681128 36.98850409 16.50286036 26.51768807 22.59359453
# 33.71054167]]
#
# [[26.68680313 36.98851046 16.50285038 26.51769497 22.59359219
# 33.71054886]]]
print(out.shape)
#(10, 1, 6)
out = out.reshape(10,n)
print(out)
#[[31. 49. 44. 29. 10. 0. ]
# [25.58242101 41.41600236 14.45123543 23.00477134 26.08867045 32.45689942]
# [26.86917065 36.02438292 16.87560159 26.46418685 22.66236879 34.10428921]
# [26.69224394 37.06346073 16.59208202 26.48817955 22.56696872 33.59706504]
# [26.68772374 36.99727159 16.49987315 26.5003184 22.61130862 33.7035045 ]
# [26.68766363 36.98517264 16.50532933 26.51717543 22.592951 33.71170797]
# [26.68695152 36.98895204 16.50314718 26.51729716 22.59379049 33.70986161]
# [26.68682195 36.98848867 16.50286371 26.51763013 22.59362679 33.71056876]
# [26.68681128 36.98850409 16.50286036 26.51768807 22.59359453 33.71054167]
# [26.68680313 36.98851046 16.50285038 26.51769497 22.59359219 33.71054886]]
I think your confusion lies with how arrays can be joined.
Start with a simple 1d array (in numpy 1d is a real thing, not just a 'row vector' or 'column vector'):
In [288]: arr = np.arange(6)
In [289]: arr
Out[289]: array([0, 1, 2, 3, 4, 5])
np.array joins element arrays along a new 1st dimension:
In [290]: np.array([arr,arr])
Out[290]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
np.stack with the default axis value does the same thing. Read its docs.
We can make a 2d array, a column vector:
In [291]: arr1 = arr[:,None]
In [292]: arr1
Out[292]:
array([[0],
[1],
[2],
[3],
[4],
[5]])
In [293]: arr1.shape
Out[293]: (6, 1)
Using np.array on its transpose, the (1,6) arrays:
In [294]: np.array([arr1.T, arr1.T])
Out[294]:
array([[[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5]]])
In [295]: _.shape
Out[295]: (2, 1, 6)
Note the middle size-1 dimension that bothered you.
np.vstack joins the arrays along the existing 1st dimension. It does not add one:
In [296]: np.vstack([arr1.T, arr1.T])
Out[296]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
Or we could join the arrays horizontally, on the 2nd dimension:
In [297]: np.hstack([arr1, arr1])
Out[297]:
array([[0, 0],
[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]])
That is (6,2) which can be transposed to (2,6):
In [298]: np.hstack([arr1, arr1]).T
Out[298]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
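Applied to the np.matrix code from the question, one way to avoid the extra dimension is to flatten each (6,1) result to a true 1-D array before np.array stacks them. This is only a sketch: the seeded generator and the loop variable k are choices made here, not part of the original script, and .A1 is the np.matrix attribute that returns a flattened ndarray.

```python
import numpy as np

n = 6
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Random 6x6 Markov matrix: each column is normalized to sum to 1.
cols = rng.random((n, n))
A = np.matrix(cols / cols.sum(axis=0))

# Random start values as a (6, 1) column matrix, as in the question.
u0 = np.matrix(rng.integers(0, 51, size=n)).T

# .A1 turns each (6, 1) matrix into a 1-D array of length 6,
# so np.array joins them into (10, 6) instead of (10, 1, 6).
out = np.array([((A**k) * u0).A1 for k in range(10)])
print(out.shape)
```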
If you use np.array() for input and @ for matrix multiplication, it works as expected.
# Random 6x6 Markov matrix
n = 6
A = np.array([ (lambda x: x/x.sum())(np.random.rand(n)) for _ in range(n)]).T
# Random start values
u0 = np.random.randint(51, size=n).T
# Find the first 10 iterations of the Markov process
out = np.array([ ( np.linalg.matrix_power(A,n) @ u0).T for n in range(10) ])
print(out)
#[[29. 24. 5. 12. 10. 32. ]
# [15.82875119 13.53436868 20.61648725 19.22478172 20.34082205 22.45478912]
# [21.82434718 10.06037119 14.29281935 20.75271393 18.76134538 26.30840297]
# [20.77484848 10.1379821 15.47488423 19.4965479 20.05618311 26.05955418]
# [21.02944236 10.09401438 15.24263478 19.48662616 19.95767996 26.18960236]
# [20.96887722 10.11647819 15.30729334 19.44261102 20.00089222 26.16384802]
# [20.98086362 10.11522779 15.29529799 19.44899285 19.99137187 26.16824587]
# [20.97795615 10.11606978 15.29817734 19.44798612 19.99293494 26.16687566]
# [20.97858032 10.11591954 15.29752865 19.44839852 19.99245389 26.16711909]
# [20.97844343 10.11594666 15.29766432 19.4483417 19.99254284 26.16706104]]
I made a few changes to the code, although I'm not 100% certain that the result is still the same (I am not familiar with Markov chains).
import numpy as np
n = 6
num_proc_iters = 10
rand_nums_arr = np.random.random_sample((n, n))
rand_nums_arr = np.transpose(rand_nums_arr / rand_nums_arr.sum(axis=1, keepdims=True))
u0 = np.random.randint(51, size=n)
res_arr = np.stack([np.linalg.matrix_power(rand_nums_arr, curr) @ u0 for curr in range(num_proc_iters)])
I would love to hear if anyone can think of any further improvements.
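One possible further improvement, offered as a sketch rather than a definitive answer: instead of recomputing a matrix power for every step, each iterate can be obtained from the previous one with a single matrix-vector product and written straight into a preallocated (10, 6) array. The names num_iters and rng are introduced here for illustration.

```python
import numpy as np

n = 6
num_iters = 10
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Random Markov matrix with columns that sum to 1.
A = rng.random((n, n))
A /= A.sum(axis=0, keepdims=True)

# Random start values as a plain 1-D array.
u0 = rng.integers(0, 51, size=n).astype(float)

# Row k holds A^k @ u0, built from row k-1 with one product per step.
out = np.empty((num_iters, n))
out[0] = u0
for k in range(1, num_iters):
    out[k] = A @ out[k - 1]

print(out.shape)
```

Because u0 is 1-D, every product is 1-D as well, so the extra size-1 dimension never appears.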
Suppose you have 3 tensors of the same size:
a = torch.randn(3,3)
a = ([[ 0.1945, 0.8583, 2.6479],
[-0.1000, 1.2136, -0.3706],
[-0.0094, 0.4279, -0.6840]])
b = torch.randn(3, 3)
b = ([[-1.1155, 0.2106, -0.2183],
[ 1.6610, -0.6953, 0.0052],
[-0.8955, 0.0953, -0.7737]])
c = torch.randn(3, 3)
c = ([[-0.2303, -0.3427, -0.4990],
[-1.1254, 0.4432, 0.3999],
[ 0.2489, -0.9459, -0.5576]])
In Lua (torch7), they have this function:
[self] map2(tensor1, tensor2, function(x, xt1, xt2))
which applies the given function to all elements of self.
My questions are:
Is there any similar function in python (pytorch)?
Is there any pythonic method to iterate over the 3 tensors and get the respective elements of each tensor without using a for loop and indices?
For example:
0.1945 -1.1155 -0.2303
0.8583 0.2106 -0.3427
2.6479 -0.2183 -0.4990
-0.1000 1.6610 -1.1254
...
Edit_1: I have also tried itertools.zip_longest and zip, but the results are not the element-wise triples shown above.
You can use Python's map function similar to what you have mentioned. Like this:
>>> tensor_list = [torch.tensor([i, i, i]) for i in range(3)]
>>> list(map(lambda x: x**2, tensor_list))
[tensor([0, 0, 0]), tensor([1, 1, 1]), tensor([4, 4, 4])]
>>>
EDIT: For a PyTorch-only approach you can use torch.Tensor.apply_. Note that it modifies the tensor in place rather than creating a new one, only works on CPU tensors, and is not meant for performance-critical code.
>>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> x.apply_(lambda y: y ** 2)
tensor([[ 1, 4, 9],
[16, 25, 36],
[49, 64, 81]])
>>>
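For the second part of the question (getting the respective elements of three tensors side by side), stacking along a new last dimension is one option. The sketch below assumes three tensors of identical shape; the small integer tensors and the name triples are stand-ins chosen so the pairing is easy to read.

```python
import torch

# Three same-shaped tensors; integers make the pairing easy to see.
a = torch.arange(0, 9).reshape(3, 3)
b = a + 10
c = a + 20

# Stack along a new last dimension, then flatten the leading dimensions:
# row k holds the k-th element of a, b and c (in row-major order).
triples = torch.stack((a, b, c), dim=-1).reshape(-1, 3)
print(triples[:4])

# If the per-element function can be written with tensor operations,
# it can be applied to the whole tensors at once, with no Python loop.
combined = a * b + c
```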
I would like to make three vectors from a matrix that summarize its non-zero values: a vector of values, a vector of row indexes, and a vector of column indexes.
For example if W = [[ 0. 2. 0.], [ 0. 10. 0.], [ 0. 0. 5.]].
I would like the function to return ([2.0, 10.0, 5.0], [0, 1, 2], [1, 1, 2]).
The code below does the job but is too slow for a large matrix. I am working with n on the order of 100000, and I do not know in advance which indexes are non-zero. Is there a way to speed this up?
from __future__ import division
import numpy as np
import collections
from numpy import *
import copy
#import timing
def nonZeroIndexes(W):
    s = W.shape
    nRows = s[0]
    nColumns = s[1]
    values = []
    row_indexes = []
    column_indexes = []
    for r in range(nRows):
        for c in range(nColumns):
            if W[r,c] != 0:
                values.append(W[r,c])
                row_indexes.append(r)
                column_indexes.append(c)
    return values, row_indexes, column_indexes
n = 3
W = np.zeros((n,n))
W[0,1] = 2
W[1,1] = 10
W[2,2] = 5
vecs = nonZeroIndexes(W)
Use np.nonzero
>>> import numpy as np
>>> W = np.array([[0, 2, 0], [0, 10, 0], [0, 0, 5]])
>>>
>>> def nonZeroIndexes(W):
...     nonzero_pos = np.nonzero(W)
...     return (W[nonzero_pos],) + nonzero_pos
...
>>>
>>> nonZeroIndexes(W)
(array([ 2, 10, 5]), array([0, 1, 2]), array([1, 1, 2]))
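As a quick check that np.nonzero matches the double loop from the question, the sketch below compares the two on the example matrix. np.nonzero returns indices in row-major order, which is the same order the nested loops visit the entries. The names loop_result and vectorized are made up for this comparison.

```python
import numpy as np

# The example matrix from the question.
W = np.zeros((3, 3))
W[0, 1] = 2
W[1, 1] = 10
W[2, 2] = 5

# Reference: the explicit double loop, collecting (value, row, column).
loop_result = [
    (W[r, c], r, c)
    for r in range(W.shape[0])
    for c in range(W.shape[1])
    if W[r, c] != 0
]

# Vectorized: row and column indexes in one call, values by fancy indexing.
rows, cols = np.nonzero(W)
values = W[rows, cols]
vectorized = list(zip(values.tolist(), rows.tolist(), cols.tolist()))

print(vectorized == loop_result)
```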
I have a 2D array that I would like to down sample to compare it to another.
Let's say my array x is 512x512. I'd like an array y of 128x128 where the elements of y are built by interpolating the values over 4x4 blocks of x (this interpolation could just be taking the average, but other methods, like the geometric average, could be interesting).
So far I have looked at scipy.ndimage.interpolation.zoom, but I don't get the results I want:
>> x = np.arange(16).reshape(4,4)
>> print(x)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
>> y = scipy.ndimage.interpolation.zoom(x, 0.5)
>> print(y)
[[ 0 3]
[12 15]]
I expected y to be
[[ 2.5 4.5]
[10.5 12.5]]
Note that simply setting dtype=np.float32 doesn't solve that ...
sklearn.feature_extraction.image.extract_patches cleverly uses np.lib.stride_tricks.as_strided to produce a windowed array that can be operated on.
The sliding_window function, found in "Efficient Overlapping Windows with Numpy", also produces a windowed array, with or without overlap, and lets you get a glimpse of what is happening under the hood.
>>> a = np.arange(16).reshape(4,4)
step_height and step_width determine the overlap for the windows. In your case the steps are the same as the window size, so there is no overlap.
>>> window_height, window_width, step_height, step_width = 2, 2, 2, 2
>>> y = sliding_window(a, (window_height, window_width), (step_height,step_width))
>>> y
array([[[ 0, 1],
[ 4, 5]],
[[ 2, 3],
[ 6, 7]],
[[ 8, 9],
[12, 13]],
[[10, 11],
[14, 15]]])
Operate on the windows:
>>> y = y.mean(axis = (1,2))
>>> y
array([ 2.5, 4.5, 10.5, 12.5])
You need to determine the final shape depending on the number of windows.
>>> final_shape = (2,2)
>>> y = y.reshape(final_shape)
>>> y
array([[ 2.5, 4.5],
[ 10.5, 12.5]])
Searching SO for numpy, window, array should produce numerous other answers and possible solutions.
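The sliding_window helper used above is external code that is not reproduced here. As a hedged alternative, NumPy 1.20 and later ship np.lib.stride_tricks.sliding_window_view, which can play the same role: build every overlapping window, then keep only every second one in each direction to get non-overlapping blocks. The variable names below are chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(16).reshape(4, 4)
window = (2, 2)

# All overlapping 2x2 windows: shape (3, 3, 2, 2).
windows = sliding_window_view(a, window)

# Step by the window size to keep only non-overlapping blocks: (2, 2, 2, 2).
blocks = windows[::window[0], ::window[1]]

# Average each block; the result already has the final 2-D shape.
y = blocks.mean(axis=(2, 3))
print(y)
```

Because the windows keep their 2-D grid layout, no separate reshape to the final shape is needed.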
What you seem to be looking for is the mean over blocks of 4 elements (2x2), which is not obtainable with zoom, since zoom uses interpolation (see its docstring).
To obtain what you show, try the following:
import numpy as np
x = np.arange(16).reshape(4, 4)
xx = x.reshape(len(x) // 2, 2, x.shape[1] // 2, 2).transpose(0, 2, 1, 3).reshape(len(x) // 2, x.shape[1] // 2, -1).mean(-1)
print(xx)
This yields
[[ 2.5 4.5]
[ 10.5 12.5]]
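The reshape trick generalizes to any block size that divides the array evenly. The sketch below wraps it in a hypothetical helper, block_reduce, whose name and signature are invented here. It splits each axis into (number of blocks, block size) and reduces over the two block-size axes, which avoids the transpose step.

```python
import numpy as np

def block_reduce(x, block_h, block_w, func=np.mean):
    """Reduce non-overlapping block_h x block_w blocks of a 2-D array.

    func must accept an `axis` tuple, as np.mean, np.median or np.max do.
    """
    h, w = x.shape
    if h % block_h or w % block_w:
        raise ValueError("block size must divide the array shape evenly")
    # Axes are (block row, row within block, block column, column within block).
    blocks = x.reshape(h // block_h, block_h, w // block_w, block_w)
    return func(blocks, axis=(1, 3))

x = np.arange(16).reshape(4, 4)
xx = block_reduce(x, 2, 2)
print(xx)

# The 512x512 -> 128x128 case from the question, with 4x4 blocks.
big = np.ones((512, 512))
small = block_reduce(big, 4, 4)
print(small.shape)
```

For strictly positive data, a geometric mean could be obtained along the same lines by averaging the logarithms of the blocks and exponentiating the result.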
Alternatively, this can be done using sklearn.feature_extraction.image.extract_patches
from sklearn.feature_extraction.image import extract_patches
patches = extract_patches(x, patch_shape=(2, 2), extraction_step=(2, 2))
xx = patches.mean(-1).mean(-1)
print(xx)
However, if your goal is to subsample an image in a graceful way, then taking the mean over blocks of the image is not the right way to do it: it is likely to cause aliasing effects. What you should do in this case is smooth the image ever so slightly using scipy.ndimage.gaussian_filter (e.g. sigma=0.35 * subsample_factor) and then subsample simply by indexing [::2, ::2].
How can I concatenate two numpy arrays inside a function and have the caller see the result? Consider the following program:
#!/usr/bin/env python
import numpy as np
def myfunction(myarray = np.zeros(0)):
    print "myfunction : before = ", myarray # This line should not be modified
    data = np.loadtxt("test.txt", unpack=True) # This line should not be modified
    myarray = np.concatenate((myarray, data))
    print "myfunction : after = ", myarray # This line should not be modified
    return # This line should not be modified
myarray = np.array([1, 2, 3])
print "main : before = ", myarray
myfunction(myarray)
print "main : after = ", myarray
The result of this code is :
main : before = [1 2 3]
myfunction : before = [1 2 3]
myfunction : after = [ 1. 2. 3. 1. 2. 3. 4. 5.]
main : after = [1 2 3]
And I want :
main : before = [1 2 3]
myfunction : before = [1 2 3]
myfunction : after = [ 1. 2. 3. 1. 2. 3. 4. 5.]
main : after = [ 1. 2. 3. 1. 2. 3. 4. 5.]
How to modify the provided program to get the expected result (the 4 lines marked by # This line should not be modified should remain the same) ?
You should return the value.
Modify the function like this:
def myfunction(myarray = np.zeros(0)):
    print "myfunction : before = ", myarray # This line should not be modified
    data = np.loadtxt("test.txt", unpack=True) # This line should not be modified
    myarray = np.concatenate((myarray, data))
    print "myfunction : after = ", myarray # This line should not be modified
    return myarray
and then capture the returned array in the caller:
myarray = myfunction(myarray)
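The reason the original program leaves the caller's array untouched is that np.concatenate builds a new array, and assigning it to the parameter name only rebinds a local variable. The Python 3 sketch below illustrates this under simplifying assumptions: the function name append_data is made up, and the literal array extra stands in for the data that the original loads from test.txt.

```python
import numpy as np

def append_data(arr, extra):
    # np.concatenate returns a NEW array; this assignment rebinds the
    # local name `arr` and leaves the caller's array unchanged.
    arr = np.concatenate((arr, extra))
    return arr

original = np.array([1, 2, 3])
extra = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # stand-in for np.loadtxt("test.txt")

result = append_data(original, extra)

print(original)  # still the three original values
print(result)    # the concatenated array, visible only via the return value
```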
You could do the following, but it can go very wrong:
def in_place_concatenate(arr1, arr2):
    n = len(arr1)
    arr1.resize((n + len(arr2),), refcheck=False)
    arr1[n:] = arr2
And as you would expect:
>>> a = np.arange(10)
>>> b = np.arange(4)
>>> in_place_concatenate(a, b)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3])
But:
>>> a = np.arange(10)
>>> b = np.arange(4)
>>> c = a[:5]
>>> c
array([0, 1, 2, 3, 4])
>>> in_place_concatenate(a, b)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3])
>>> c
array([ 1, 1731952544, 71064376, 1, 67293736])
And if you try to modify any of the data in c, you get a segmentation fault...
If you didn't set refcheck to False, that wouldn't happen, but it wouldn't let you do the modification inside a function either. So yes, it can be done, but you shouldn't do it: follow Entropiece's method.