Replace torch.gather with another operator? - python

I have a script where x1 and x2 are tensors of size 1x68x8x8:
tmp_batch, tmp_channel, tmp_height, tmp_width = x1.size()
x1 = x1.view(tmp_batch*tmp_channel, -1)
max_ids = torch.argmax(x1, 1)
max_ids = max_ids.view(-1, 1)
x2 = x2.view(tmp_batch*tmp_channel, -1)
outputs_x_select = torch.gather(x2, 1, max_ids) # size of 68 x 1
The code above works, but torch.gather gives me trouble when exporting with an old ONNX opset. Hence, I would like to find an alternative solution that replaces torch.gather with other operators but gives the same output as the above code. Could you please give me some suggestions?

One workaround is to use the equivalent NumPy function, np.take_along_axis. If you include an import numpy as np statement somewhere, you could do the following.
outputs_x_select = torch.Tensor(np.take_along_axis(x2, max_ids, 1))
If that gives you a grad-related error, try
outputs_x_select = torch.Tensor(np.take_along_axis(x2.detach(), max_ids, 1))
An approach without numpy: in this case, it seems that max_ids contains exactly one entry per row. Thus, I believe the following will work:
max_ids = torch.argmax(x1, 1) # do not reshape
x2 = x2.view(tmp_batch*tmp_channel, -1)
outputs_x_select = x2[torch.arange(tmp_batch*tmp_channel),max_ids]
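For reference, here is a minimal sketch (on random data, shapes taken from the question) checking that the arange-based indexing produces the same values as torch.gather:
import torch

# Quick equivalence check on random data (shapes from the question).
x1 = torch.rand(1, 68, 8, 8)
x2 = torch.rand(1, 68, 8, 8)
b, c, h, w = x1.size()
x1 = x1.view(b * c, -1)
x2 = x2.view(b * c, -1)
max_ids = torch.argmax(x1, 1)
via_gather = torch.gather(x2, 1, max_ids.view(-1, 1)).squeeze(1)  # (68,)
via_indexing = x2[torch.arange(b * c), max_ids]                   # (68,)
print(torch.equal(via_gather, via_indexing))  # expect True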

Related

Try to work around the numpy.core._exceptions._ArrayMemoryError issue within my code

I have a data frame, data, with shape (10000, 257). I need to preprocess this dataframe so that I can use it in an LSTM, which requires a 3-dimensional input (nrows, ntimesteps, nfeatures). I am working with the code snippet that is provided here:
def univariate_processing(variable, window):
    import numpy as np
    # create empty 2D matrix from variable
    V = np.empty((len(variable) - window + 1, window))
    # take each row/time window
    for i in range(V.shape[0]):
        V[i, :] = variable[i : i + window]
    V = V.astype(np.float32)  # set common data type
    return V

def RNN_regprep(df, y, len_input, len_pred):  # , test_size):
    # create 3D matrix for multivariate input
    X = np.empty((df.shape[0] - len_input + 1, len_input, df.shape[1]))
    # Iterate univariate preprocessing on all variables - store them in X
    for i in range(df.shape[1]):
        X[:, :, i] = univariate_processing(df[:, i], len_input)
    # create 2D matrix of y sequences
    y = y.reshape((-1,))  # reshape to 1D if needed
    Y = univariate_processing(y, len_pred)
    ## Trim dataframes as explained
    X = X[:-(len_pred + 1), :, :]
    Y = Y[len_input:-1, :]
    # Set common datatype
    X = X.astype(np.float32)
    Y = Y.astype(np.float32)
    return X, Y

X, y = RNN_regprep(data, label, len_input=200, len_pred=1)
While running this, the following error is obtained:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 28.9 GiB for an array with shape (10000, 200, 257) and data type float64
I do understand that this is mostly a memory limitation on my server. Still, is there anything I can change in my code to avoid this error or reduce the memory consumption?
This is what windowed views are for. Using my recipe here:
var = np.random.rand(10000,257)
w = window_nd(var, 200, axis = 0)
Now you have a windowed view over var:
w.shape
Out[]: (9801, 200, 257)
But, importantly, it's using the exact same data as var, just looking into it in a windowed way:
w.__array_interface__['data'] #This is the memory's starting address
Out[]: (1448954720320, False)
var.__array_interface__['data']
Out[]: (1448954720320, False)
np.shares_memory(var, w)
Out[]: True
w.base.base.base is var #(lots of rearranging views in the background)
Out[]: True
So you can do:
def univariate_processing(variable, window):
    return window_nd(variable, window, axis=0)
That should significantly reduce your memory allocation, no "magic" required :)
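If you would rather not depend on an external recipe, NumPy 1.20+ ships an equivalent view-based helper, sliding_window_view; here is a sketch of the same idea (note it appends the window axis at the end, so a moveaxis is needed to match the (9801, 200, 257) layout above):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

var = np.random.rand(10000, 257)
w = sliding_window_view(var, 200, axis=0)  # view, shape (9801, 257, 200)
w = np.moveaxis(w, -1, 1)                  # still a view, shape (9801, 200, 257)
print(w.shape)                   # (9801, 200, 257)
print(np.shares_memory(var, w))  # True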
You can also try
from skimage.util import view_as_windows
w = np.squeeze(view_as_windows(var, (200, 1)))
Which does almost the same thing. In this case, your function would become:
def univariate_processing(variable, window):
    from skimage.util import view_as_windows
    window = (window,) + (1,) * (len(variable.shape) - 1)
    return np.squeeze(view_as_windows(variable, window))

Vectorizing a for loop using slicing in NumPy

I have this for loop:
blockSize = 5
ds = np.arange(20)
ds = np.reshape(ds, (1, len(ds)))
counts = np.zeros((1, ds.shape[1] // blockSize))
for i in range(len(counts[0])):
    counts[0, i] = np.floor(np.sum(ds[0, i*blockSize : i*blockSize + blockSize]))
I am trying to vectorize it, doing something like this:
countIndices = np.arange(len(counts[0]))
counts[0, countIndices] = np.floor(np.sum(ds[0, countIndices*blockSize:countIndices*blockSize + blockSize]))
However, this does not work and gives this error:
counts[0, countIndices] = np.floor(np.sum(ds[0, countIndices*blockSize:countIndices*blockSize + blockSize]))
TypeError: only integer scalar arrays can be converted to a scalar index
I know that something like this works:
counts[0, countIndices] = np.floor(ds[0, countIndices*blockSize]
                                   + ds[0, countIndices*blockSize + 1] + ...
                                   + ds[0, countIndices*blockSize + blockSize - 1])
The issue is that for large values of blockSize (and blockSize is very large in my actual code), writing the sum out term by term is not feasible. I am confused about how to accomplish what I want. Any help is greatly appreciated.
You don't need to do floor if you store the result in an integer array. You can also create a fake new axis of size block_size to fully vectorize your operation.
block_size = 5
ds = np.arange(80.0).reshape(4, -1) # Shape (4, 20)
counts = np.empty((ds.shape[0], ds.shape[1] // block_size), dtype=int)
To introduce the fake dimension and sum:
ds.reshape(ds.shape[0], -1, block_size).sum(axis=-1, out=counts)
Reshaping does not copy the data, so the operation ds.reshape(ds.shape[0], -1, block_size) is extremely cheap in both time and space.
You can use -1 for one of the reshape dimensions to avoid computing/writing out long division expressions.
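To make the equivalence concrete, here is a small check of the reshape-and-sum trick against a plain loop, on stand-in data:
import numpy as np

block_size = 5
ds = np.arange(80.0).reshape(4, -1)  # stand-in data, shape (4, 20)
counts = np.empty((ds.shape[0], ds.shape[1] // block_size), dtype=int)

# Vectorized block sums via the fake axis of size block_size
ds.reshape(ds.shape[0], -1, block_size).sum(axis=-1, out=counts)

# Loop-based reference for comparison
expected = np.array([[ds[r, i*block_size:(i+1)*block_size].sum()
                      for i in range(ds.shape[1] // block_size)]
                     for r in range(ds.shape[0])])
print(np.array_equal(counts, expected))  # expect True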

Reverse stacking operation on NumPy array

I have a line of code that efficiently reshapes a numpy array from a 400x8x8 array to a 160x160 array, and I need to reverse the process, but I can't figure out the reverse of that line.
I can already do this process but it is very inefficient and requires nested loops which I would like to avoid for performance purposes.
Here is the code that I currently have to reverse the process (160x160 > 400x8x8):
previousRow = 0
for rowBlock in range(noBlocksOn1Axis):
    previousRow = rowBlock * blockSize
    previousColumn = 0
    for columnBlock in range(noBlocksOn1Axis):
        previousColumn = columnBlock * blockSize
        block = arrayY[previousRow:previousRow + blockSize,
                       previousColumn:previousColumn + blockSize]
        blocksList.append(block)
And here is the line of code that reshapes 400x8x8 > 160x160:
xy = np.zeros((160, 160), dtype=np.uint8)
xy = np.vstack(np.hstack(overDone[20*i:20+20*i]) for i in range(overDone.shape[0]//20))
So any ideas of how I can perform this line of code in reverse?
Thanks :D
Reshape, swap-axes (or transpose axes) and reshape to get overDone back -
xy.reshape(20,8,20,8).swapaxes(1,2).reshape(400,8,8)
More info on intuition behind nd-to-nd array transformation.
Make it generic to handle generic shapes -
m,n = xy.shape
M,N = 20,20 # block size used to get xy
overDone_ = xy.reshape(M,m//M,N,n//N).swapaxes(1,2).reshape(-1,m//M,n//N)
Sample run -
# Original input
In [21]: overDone = np.random.rand(400,8,8)
# Perform forward step to get xy
In [22]: xy = np.vstack(np.hstack(overDone[20*i:20+20*i]) for i in range(overDone.shape[0]//20))
# Use proposed approach to get back overDone
In [23]: out = xy.reshape(20,8,20,8).swapaxes(1,2).reshape(400,8,8)
# Verify output to be same as overDone
In [42]: np.array_equal(out,overDone)
Out[42]: True
Bonus :
We could use those same vectorized reshape+permute-axes steps to create xy for the forward process -
xy = overDone.reshape(20,20,8,8).swapaxes(1,2).reshape(160,160)
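Wrapping both directions in a pair of helper functions (hypothetical names) makes the round trip easy to sanity-check:
import numpy as np

def to_grid(blocks, rows, cols):
    # Stack (rows*cols, bh, bw) blocks into a (rows*bh, cols*bw) grid
    n, bh, bw = blocks.shape
    return blocks.reshape(rows, cols, bh, bw).swapaxes(1, 2).reshape(rows * bh, cols * bw)

def to_blocks(grid, bh, bw):
    # Split a (rows*bh, cols*bw) grid back into (rows*cols, bh, bw) blocks
    m, n = grid.shape
    return grid.reshape(m // bh, bh, n // bw, bw).swapaxes(1, 2).reshape(-1, bh, bw)

overDone = np.random.rand(400, 8, 8)
xy = to_grid(overDone, 20, 20)  # (160, 160)
print(np.array_equal(to_blocks(xy, 8, 8), overDone))  # expect True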
What's wrong with numpy.reshape?
my_array_3d = my_array.reshape((400, 8, 8))
my_array_2d = my_array.reshape((160, 160))

OpenCV Python cv2.perspectiveTransform

I'm currently trying to do video stabilization using OpenCV and Python.
I use the following function to calculate rotation:
def accumulate_rotation(src, theta_x, theta_y, theta_z, timestamps, prev, current, f, gyro_delay=None, gyro_drift=None, shutter_duration=None):
    if prev == current:
        return src
    pts = []
    pts_transformed = []
    for x in range(10):
        current_row = []
        current_row_transformed = []
        pixel_x = x * (src.shape[1] / 10)
        for y in range(10):
            pixel_y = y * (src.shape[0] / 10)
            current_row.append([pixel_x, pixel_y])
            if shutter_duration:
                y_timestamp = current + shutter_duration * (pixel_y - src.shape[0] / 2)
            else:
                y_timestamp = current
            transform = getAccumulatedRotation(src.shape[1], src.shape[0], theta_x, theta_y, theta_z,
                                               timestamps, prev, current, f, gyro_delay, gyro_drift)
            output = cv2.perspectiveTransform(np.array([[pixel_x, pixel_y]], dtype="float32"), transform)
            current_row_transformed.append(output)
        pts.append(current_row)
        pts_transformed.append(current_row_transformed)
    o = utilities.meshwarp(src, pts_transformed)
    return o
I get the following error when it gets to output = cv2.perspectiveTransform(np.array([[pixel_x, pixel_y]], dtype="float32"), transform):
cv2.error: /Users/travis/build/skvark/opencv-python/opencv/modules/core/src/matmul.cpp:2271: error: (-215) scn + 1 == m.cols in function perspectiveTransform
Any help or suggestions would really be appreciated.
This implementation really needs to be changed in a future version, or the docs should be more clear.
From the OpenCV docs for perspectiveTransform():
src – input two-channel (...) floating-point array
Slant emphasis added by me.
>>> A = np.array([[0, 0]], dtype=np.float32)
>>> A.shape
(1, 2)
So we see from here that A is just a single-channel matrix, that is, two-dimensional. One row, two cols. You instead need a two-channel image, i.e., a three-dimensional matrix where the length of the third dimension is 2 or 3 depending on if you're sending in 2D or 3D points.
Long story short, you need to add one more set of brackets to make the set of points you're sending in three-dimensional, where the x values are in the first channel, and the y values are in the second channel.
>>> A = np.array([[[0, 0]]], dtype=np.float32)
>>> A.shape
(1, 1, 2)
Also, as suggested in the comments:
If you have an array points of shape (n_points, dimension) (i.e. dimension is 2 or 3), a nice way to re-format it for this use-case is points[np.newaxis]
That's all you need. It's not intuitive, and though it's documented, the docs aren't very explicit on that point. I've answered an identical question before, but for the cv2.transform() function.
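Applied to the line from the question, a minimal sketch of the fix (using a stand-in identity homography and an arbitrary point) would be:
import cv2
import numpy as np

transform = np.eye(3, dtype=np.float32)  # stand-in for getAccumulatedRotation's output
pixel_x, pixel_y = 32.0, 48.0            # stand-in point
pt = np.array([[[pixel_x, pixel_y]]], dtype=np.float32)  # shape (1, 1, 2): two-channel
output = cv2.perspectiveTransform(pt, transform)
print(output.shape)  # (1, 1, 2)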

Estimate formants using LPC in Python

I'm new to signal processing (and numpy, scipy, and matlab for that matter). I'm trying to estimate vowel formants with LPC in Python by adapting this matlab code:
http://www.mathworks.com/help/signal/ug/formant-estimation-with-lpc-coefficients.html
Here is my code so far:
#!/usr/bin/env python
import sys
import numpy
import wave
import math
from scipy.signal import lfilter, hamming
from scikits.talkbox import lpc

"""
Estimate formants using LPC.
"""

def get_formants(file_path):
    # Read from file.
    spf = wave.open(file_path, 'r')  # http://www.linguistics.ucla.edu/people/hayes/103/Charts/VChart/ae.wav

    # Get file as numpy array.
    x = spf.readframes(-1)
    x = numpy.fromstring(x, 'Int16')

    # Get Hamming window.
    N = len(x)
    w = numpy.hamming(N)

    # Apply window and high pass filter.
    x1 = x * w
    x1 = lfilter([1., -0.63], 1, x1)

    # Get LPC.
    A, e, k = lpc(x1, 8)

    # Get roots.
    rts = numpy.roots(A)
    rts = [r for r in rts if numpy.imag(r) >= 0]

    # Get angles.
    angz = numpy.arctan2(numpy.imag(rts), numpy.real(rts))

    # Get frequencies.
    Fs = spf.getframerate()
    frqs = sorted(angz * (Fs / (2 * math.pi)))
    return frqs

print get_formants(sys.argv[1])
Using this file as input, my script returns this list:
[682.18960189917243, 1886.3054773107765, 3518.8326108511073, 6524.8112723782951]
I didn't even get to the last steps where they filter the frequencies by bandwidth because the frequencies in the list aren't right. According to Praat, I should get something like this (this is the formant listing for the middle of the vowel):
Time_s F1_Hz F2_Hz F3_Hz F4_Hz
0.164969 731.914588 1737.980346 2115.510104 3191.775838
What am I doing wrong?
Thanks very much
UPDATE:
I changed this
x1 = lfilter([1., -0.63], 1, x1)
to
x1 = lfilter([1], [1., 0.63], x1)
as per Warren Weckesser's suggestion and am now getting
[631.44354635609318, 1815.8629524985781, 3421.8288991389031, 6667.5030877036006]
I feel like I'm missing something since F3 is very off.
UPDATE 2:
I realized that the order being passed to scikits.talkbox.lpc was off due to a difference in sampling frequency. Changed it to:
Fs = spf.getframerate()
ncoeff = 2 + Fs / 1000
A, e, k = lpc(x1, ncoeff)
Now I'm getting:
[257.86573127888488, 774.59006835496086, 1769.4624576002402, 2386.7093679399809, 3282.387975973973, 4413.0428174593926, 6060.8150432549655, 6503.3090645887842, 7266.5069407315023]
Much closer to Praat's estimation!
The problem had to do with the order being passed to the lpc function. 2 + fs / 1000 where fs is the sampling frequency is the rule of thumb according to:
http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html
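For completeness, here is a sketch of the remaining bandwidth-filtering step, adapted from the linked MATLAB example (the 90 Hz and 400 Hz thresholds come from that page); it assumes the rts, angz, and Fs variables from the code above:
# Sort candidate frequencies, compute each pole's bandwidth, and keep
# only poles whose bandwidth is plausible for a formant.
indices = numpy.argsort(angz * (Fs / (2 * math.pi)))
frqs = (angz * (Fs / (2 * math.pi)))[indices]
bws = -0.5 * (Fs / (2 * math.pi)) * numpy.log(numpy.abs(numpy.array(rts)[indices]))
formants = [f for f, bw in zip(frqs, bws) if f > 90 and bw < 400]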
I have not been able to get the results you expect, but I do notice two things which might cause some differences:
Your code uses [1, -0.63] where the MATLAB code from the link you provided has [1 0.63].
Your processing is being applied to the entire x vector at once instead of smaller segments of it (see where the MATLAB code does this: x = mtlb(I0:Iend); ).
Hope that helps.
There are at least two problems:
According to the link, the "pre-emphasis filter is a highpass all-pole (AR(1)) filter". The signs of the coefficients given there are correct: [1, 0.63]. If you use [1, -0.63], you get a lowpass filter.
You have the first two arguments to scipy.signal.lfilter reversed.
So, try changing this:
x1 = lfilter([1., -0.63], 1, x1)
to this:
x1 = lfilter([1.], [1., 0.63], x1)
I haven't tried running your code yet, so I don't know if those are the only problems.
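As a quick sanity check of the sign convention, one can probe the filter's frequency response; here is a sketch using scipy.signal.freqz:
from scipy.signal import freqz

# The all-pole filter 1/(1 + 0.63*z^-1) boosts high frequencies (highpass),
# which is what the pre-emphasis step calls for.
w, h = freqz([1.0], [1.0, 0.63])
print(abs(h[0]), abs(h[-1]))  # gain near DC (~0.61) vs. near Nyquist (~2.7)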
