Overlapping iteration over theano tensor - python

I am trying to implement a scan loop in Theano that, given a tensor, uses a "moving slice" of the input. It doesn't have to be an actual moving slice; the input can be preprocessed into another tensor that represents the moving slices.
Essentially:
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
 |-------| (first iteration)
   |-------| (second iteration)
     |-------| (third iteration)
...
...
...
                                |-------| (last iteration)
where |-------| is the input for each iteration.
I am trying to figure out the most efficient way to do it, maybe using some form of referencing or manipulating strides, but I haven't managed to get something to work even for pure numpy.
I found one possible solution here, but I can't figure out how to use strides, and I don't see a way to use that approach with Theano.

You can build a vector containing the starting index for the slice at each timestep and call Scan with that vector as a sequence and your original vector as a non-sequence. Then, inside Scan, you can obtain the slice you want at every iteration.
I included an example in which I also made the size of the slices a symbolic input, in case you want to change it from one call of your Theano function to the next:
import theano
import theano.tensor as T
# Input variables
x = T.vector("x")
slice_size = T.iscalar("slice_size")
def step(idx, vect, length):
    # From the idx of the start of the slice, the vector and the length of
    # the slice, obtain the desired slice.
    my_slice = vect[idx:idx + length]
    # Do something with the slice here. I don't know what you want to do
    # so I'll just return the slice itself.
    output = my_slice
    return output
# Make a vector containing the start idx of every slice
slice_start_indices = T.arange(x.shape[0] - slice_size + 1)
out, updates = theano.scan(fn=step,
                           sequences=[slice_start_indices],
                           non_sequences=[x, slice_size])
fct = theano.function([x, slice_size], out)
Running the function with your parameters produces the following output:
print(fct(range(17), 5))
[[ 0. 1. 2. 3. 4.]
[ 1. 2. 3. 4. 5.]
[ 2. 3. 4. 5. 6.]
[ 3. 4. 5. 6. 7.]
[ 4. 5. 6. 7. 8.]
[ 5. 6. 7. 8. 9.]
[ 6. 7. 8. 9. 10.]
[ 7. 8. 9. 10. 11.]
[ 8. 9. 10. 11. 12.]
[ 9. 10. 11. 12. 13.]
[ 10. 11. 12. 13. 14.]
[ 11. 12. 13. 14. 15.]
[ 12. 13. 14. 15. 16.]]
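For comparison, the same set of windows can be gathered in plain NumPy without a loop, by building a 2-D matrix of indices and using it to index the vector. This is only a NumPy sketch of the idea (the names starts, offsets and windows are illustrative). It is not Theano code, and whether the equivalent symbolic indexing is efficient in Theano would need to be checked separately.

```python
import numpy as np

x = np.arange(17, dtype=float)
slice_size = 5

# Start index of every slice, as a column: shape (13, 1)
starts = np.arange(x.shape[0] - slice_size + 1)[:, np.newaxis]
# Offsets within a slice, as a row: shape (1, 5)
offsets = np.arange(slice_size)[np.newaxis, :]

# Broadcasting gives a (13, 5) index matrix; indexing with it copies the data
windows = x[starts + offsets]
print(windows[0])    # first slice
print(windows[-1])   # last slice
```

Unlike a strided view, this makes a copy of every window, so it trades memory for simplicity.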

You could use this rolling_window recipe:
import numpy as np
def rolling_window_lastaxis(arr, winshape):
    """
    Directly taken from Erik Rigtorp's post to numpy-discussion.
    http://www.mail-archive.com/numpy-discussion@scipy.org/msg29450.html
    (Erik Rigtorp, 2010-12-31)
    See also:
    http://mentat.za.net/numpy/numpy_advanced_slides/ (Stéfan van der Walt, 2008-08)
    https://stackoverflow.com/a/21059308/190597 (Warren Weckesser, 2011-01-11)
    https://stackoverflow.com/a/4924433/190597 (Joe Kington, 2011-02-07)
    https://stackoverflow.com/a/4947453/190597 (Joe Kington, 2011-02-09)
    """
    if winshape < 1:
        raise ValueError("winshape must be at least 1.")
    if winshape > arr.shape[-1]:
        print(winshape, arr.shape)
        raise ValueError("winshape is too long.")
    shape = arr.shape[:-1] + (arr.shape[-1] - winshape + 1, winshape)
    strides = arr.strides + (arr.strides[-1], )
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
x = np.arange(17)
print(rolling_window_lastaxis(x, 5))
which prints
[[ 0 1 2 3 4]
[ 1 2 3 4 5]
[ 2 3 4 5 6]
[ 3 4 5 6 7]
[ 4 5 6 7 8]
[ 5 6 7 8 9]
[ 6 7 8 9 10]
[ 7 8 9 10 11]
[ 8 9 10 11 12]
[ 9 10 11 12 13]
[10 11 12 13 14]
[11 12 13 14 15]
[12 13 14 15 16]]
Note that there are even fancier extensions of this, such as Joe Kington's rolling_window which can roll over multi-dimensional windows, and Sebastian Berg's implementation which, in addition, can jump by steps.
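Recent NumPy releases (1.20 and later, as far as I know) also ship a built-in helper, numpy.lib.stride_tricks.sliding_window_view, which returns the same kind of strided view without hand-computing shapes and strides. A minimal sketch, assuming such a NumPy version is available; jumping by steps is done here by plain slicing of the result:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(17)

# Read-only view of shape (13, 5); no data is copied
windows = sliding_window_view(x, 5)

# Keep every second window, i.e. jump by steps of 2
every_other = windows[::2]

print(windows.shape)
print(every_other)
```

Because the result is a view, writing into it is disabled by default, which avoids the usual pitfalls of as_strided.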

Related

numpy.gradient with edge_order=2

I don't understand how the edge_order=2 option of np.gradient works:
x=np.arange(5)**2
print(x)
print(np.gradient(x,edge_order=2))
np.gradient returns:
[0. 2. 4. 6. 8.]
I don't understand why the value at index 0 is 0 and not (x[1]-x[-1])/2=(1-16)/2=-7.5,
and why the value at index 4 is 8 and not (x[0]-x[3])/2=(0-9)/2=-4.5.
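As far as I can tell from the NumPy documentation, np.gradient never wraps around the ends of the array, so x[-1] is not used at index 0. Interior points use central differences, while edge_order=2 uses second-order accurate one-sided differences at the two boundaries. A small sketch for unit spacing, with the boundary formulas written out by hand (the names left, right and interior are illustrative):

```python
import numpy as np

x = np.arange(5) ** 2          # [0, 1, 4, 9, 16]
g = np.gradient(x, edge_order=2)

# Interior points: central difference (x[i+1] - x[i-1]) / 2
interior = (x[2:] - x[:-2]) / 2

# Boundaries with edge_order=2: one-sided, second-order accurate
left = (-3 * x[0] + 4 * x[1] - x[2]) / 2      # (0 + 4 - 4) / 2
right = (3 * x[-1] - 4 * x[-2] + x[-3]) / 2   # (48 - 36 + 4) / 2

print(g)
print(left, interior, right)
```

Since x is exactly quadratic here, the second-order boundary formulas reproduce the true derivative 2*i at both ends, which is why the result is 0 and 8.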

Creating an array based on a plot of custom function (Python)

I'm trying to use Numpy to create a y vector that will correspond to the following plot:
The x values will run from 0 to 24, the y values should be:
0 to 6 will be 0
6 to 18 will be sort of parabola
18 to 24 will be 0 again
What is a good way to do it? I don't have any practical ideas yet (I thought about some sort of interpolation).
Thank you!
I assumed that you want a circular arc rather than a parabola (based on your sketch). Outside the arc the square root of a negative number gives NaN, which np.nan_to_num replaces with 0.
import numpy as np
length = 24
radius = 6
x = np.arange(length)
y = np.sqrt(radius**2-(x-(length/2))**2)
y = np.nan_to_num(y)
print(x)
# [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
print(y)
# [0. 0. 0. 0. 0. 0.
# 0. 3.31662479 4.47213595 5.19615242 5.65685425 5.91607978
# 6. 5.91607978 5.65685425 5.19615242 4.47213595 3.31662479
# 0. 0. 0. 0. 0. 0. ]
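If a true parabola is wanted rather than a circular arc, one option is a downward parabola with roots at x = 6 and x = 18, clipped at zero. A minimal sketch; the peak height of 6 is an assumption chosen to match the circle example, since the question does not specify it:

```python
import numpy as np

x = np.arange(25)          # 0..24 inclusive
center = 12.0              # midpoint of the bump
half_width = 6.0           # bump spans center - 6 .. center + 6
peak = 6.0                 # assumed height at the center

# Parabola that is zero at x = 6 and x = 18 and equals peak at x = 12
y = peak * (1.0 - ((x - center) / half_width) ** 2)

# Outside [6, 18] the parabola is negative; clip it to zero
y = np.clip(y, 0.0, None)
print(y)
```

This avoids taking the square root of negative numbers, so no NaN cleanup is needed.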

Weighting Data Using Numpy

My data looks like:
list=[44359, 16610, 8364, ..., 1, 1, 1]
For each element in list I want to take i*([i+1]+[i-1])/2, where i is an element in the list, and i+1 and i-1 are the adjacent elements.
For some reason I cannot seem to do this cleanly in NumPy.
Here's what I've tried:
weights=[]
weights.append(1)
for i in range(len(hoff[3])-1):
    weights.append((hoff[3][i-1]+hoff[3][i+1])/2)
Where I append 1 to the weights list so that lengths will match at the end. I arbitrarily picked 1, I'm not sure how to deal with the leftmost and rightmost points either.
You can use numpy's array operations to represent your "loop". Think of the data as below, where pL and pR are the values you choose to "pad" your data with on the left and right:
[pL, 0, 1, 2, ..., N-2, N-1, pR]
What you're trying to do is this:
[0, ..., N - 1] * ([pL, 0, ..., N-2] + [1, ..., N -1, pR]) / 2
Written in code it looks something like this:
import numpy as np
data = np.random.random(10)
padded = np.concatenate(([data[0]], data, [data[-1]]))
data * (padded[:-2] + padded[2:]) / 2.
Repeating the first and last value is known as "extending" in image processing, but there are other edge handling methods you could try.
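As one way to experiment with other edge handling methods, np.pad can build the padded array directly. This is a sketch with made-up sample data; the constant fill value of 1 is an arbitrary choice mirroring the question, not a recommendation:

```python
import numpy as np

data = np.array([4.0, 2.0, 6.0, 8.0])   # made-up sample values

# mode="edge" repeats the border values, matching the concatenate version
padded_edge = np.pad(data, 1, mode="edge")
weights_edge = data * (padded_edge[:-2] + padded_edge[2:]) / 2.0

# mode="constant" pads with a fixed value instead
padded_const = np.pad(data, 1, mode="constant", constant_values=1)
weights_const = data * (padded_const[:-2] + padded_const[2:]) / 2.0

print(weights_edge)
print(weights_const)
```

Only the first and last weights differ between the two modes, since the interior neighbours are the same.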
I would use pandas for this, filling in the missing left- and right-most values with 1 (but you can use any value you want):
import numpy
import pandas
numpy.random.seed(0)
data = numpy.random.randint(0, 10, size=15)
df = (
    pandas.DataFrame({'hoff': data})
    .assign(before=lambda df: df['hoff'].shift(1).fillna(1).astype(int))
    .assign(after=lambda df: df['hoff'].shift(-1).fillna(1).astype(int))
    .assign(weight=lambda df: df['hoff'] * df[['before', 'after']].mean(axis=1))
)
print(df.to_string(index=False))
And that gives me:
hoff before after weight
5 1 0 2.5
0 5 3 0.0
3 0 3 4.5
3 3 7 15.0
7 3 9 42.0
9 7 3 45.0
3 9 5 21.0
5 3 2 12.5
2 5 4 9.0
4 2 7 18.0
7 4 6 35.0
6 7 8 45.0
8 6 8 56.0
8 8 1 36.0
1 8 1 4.5
A pure numpy-based solution would look like this (again, filling with 1):
before_after = numpy.ones((data.shape[0], 2))
before_after[1:, 0] = data[:-1]
before_after[:-1, 1] = data[1:]
weights = data * before_after.mean(axis=1)
print(weights)
array([ 2.5, 0. , 4.5, 15. , 42. , 45. , 21. , 12.5, 9. ,
18. , 35. , 45. , 56. , 36. , 4.5])

PYTHON 3.6 replacing the elements of a matrix by the elements of a vector SNAIL/SPIRAL MATRIX AGAIN

I hope you're doing well.
I need your help. I'm trying to build matrices like these:
2x2:
[2 3]
[7 5]
3x3:
[17 19 23]
[13 2 3 ]
[11 7 5 ]
4x4:
[17 19 23 29]
[13 2 3 31]
[11 7 5 37]
[53 47 43 41]
As you can see, these matrices are made of prime numbers organized in a snail/spiral form.
I'm almost there. Let me explain what I've done.
First, I made a "base matrix", like this one for the 4x4 example:
[6 7 8 9]
[5 0 1 10]
[4 3 2 11]
[15 14 13 12]
Second, I made a vector of the first 16 prime numbers (not counting 1) for this example:
[ 2. 3. 5. 7. 11. 13. 17. 19. 23. 29. 31. 37. 41. 43. 47. 53.]
Third, I want to put the elements of the vector into the matrix, but this final step is my problem.
I have tried this code:
import numpy as np
n = int(input("enter the length of the matrix (maximum 12): "))
if (n <= 0):
    print("please enter a positive integer")
elif (n > 0):
    M = np.zeros([n, n])
    init = 0
    nlimit = n - 1
    c = 0
    if (n % 2 == 0):
        while (c < (n*n)):
            for i in range(init, nlimit, 1):
                c = c + 1
                M[nlimit, i] = c
            for i in range(nlimit, init, -1):
                c = c + 1
                M[i, nlimit] = c
            for i in range(nlimit, init, -1):
                c = c + 1
                M[init, i] = c
            for i in range(init, nlimit, 1):
                c = c + 1
                M[i, init] = c
            init = init + 1
            nlimit = nlimit - 1
    if (n % 2 != 0):
        while (c < ((n*n)-1)):
            for i in range(nlimit, init, -1):
                c = c + 1
                M[init, i] = c
            for i in range(init, nlimit, 1):
                c = c + 1
                M[i, init] = c
            for i in range(init, nlimit, 1):
                c = c + 1
                M[nlimit, i] = c
            for i in range(nlimit, init, -1):
                c = c + 1
                M[i,nlimit] = c
            init = init +1
            nlimit = nlimit - 1
        M[(n - 1)//2, (n - 1)//2] = n * n
    R = (n*n)*np.ones([n,n])
    T = R - M #T = base matrix
    A = T
    print(T)
    q = 1
    w=np.zeros(n*n)
    w[0] = 2
    for i in range(3,1000,2):
        p = 0
        for j in range (3,i+1,2):
            if (i % j == 0):
                p = p + 1
        if (p == 1):
            w[q] = i
            q = q + 1
        if (q == (n*n)):
            break
    print (w)
    for k in range (0,n*n,1):
        for m in range (0, n-1, 1):
            for z in range (0, n-1, 1):
                if (T[m,z] == k):
                    A[m,z] = w[k]
    print(A)
I know that my mistake is in the last 6 lines, but I don't know what it is.
This program shows three things:
1) the base matrix
2) the prime number vector
3) the spiral matrix with prime numbers
For n=4 this is the output:
[[ 6. 7. 8. 9.]
[ 5. 0. 1. 10.]
[ 4. 3. 2. 11.]
[ 15. 14. 13. 12.]]
[ 2. 3. 5. 7. 11. 13. 17. 19. 23. 29. 31. 37. 41. 43. 47. 53.]
[[ 17. 19. 23. 9.]
[ 43. 43. 19. 10.]
[ 37. 19. 43. 11.]
[ 15. 14. 13. 12.]]
As you can see, the last matrix isn't right.
I really need your help. Thank you so much.
You are right, you're almost there.
Just use your pre-generated matrix with prime number indices for accessing your vector of prime numbers (last few lines only):
for k in range(n):
    for m in range(n):
        idx = int(A[k,m])
        A[k,m] = w[idx]
print(A)
The element A[k,m] in your pre-generated matrix is the index of the prime number you want. You need to convert it to int (from float) in order to use it for indexing.
The result is as you expect:
[[ 17. 19. 23. 29.]
[ 13. 2. 3. 31.]
[ 11. 7. 5. 37.]
[ 53. 47. 43. 41.]]
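The original loop appears to go wrong for two reasons. First, A = T makes A another name for the same array, so each replacement also changes T, and later values of k can match primes that were already written. Second, range(0, n-1) skips the last row and column. A loop-free sketch that avoids both issues by indexing the prime vector with the whole base matrix at once (the 4x4 matrix and the prime vector are copied from the question):

```python
import numpy as np

# Base matrix of spiral positions for n = 4 (copied from the question)
T = np.array([[ 6,  7,  8,  9],
              [ 5,  0,  1, 10],
              [ 4,  3,  2, 11],
              [15, 14, 13, 12]], dtype=float)

# First 16 primes (copied from the question)
w = np.array([2, 3, 5, 7, 11, 13, 17, 19,
              23, 29, 31, 37, 41, 43, 47, 53], dtype=float)

# Index the prime vector with the whole matrix at once. The result is a
# new array, so T itself is never overwritten.
A = w[T.astype(int)]
print(A)
```

If a separate array is needed while keeping the loop, T.copy() would also break the aliasing between A and T.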

Create all x,y pairs from two coordinate arrays

I have 4 lists that I need to iterate over so that I get the following:
x y a b
Lists a and b are of equal length, and I iterate over both using the zip function:
for a,b in zip(aL,bL):
    print(a,"\t",b)
List x contains 1000 items and list y contains 750 items, so after the loop is finished I am supposed to have 750,000 lines.
What I want to achieve is the following:
1 1 a b
1 2 a b
1 3 a b
1 4 a b
.....
1000 745 a b
1000 746 a b
1000 747 a b
1000 748 a b
1000 749 a b
1000 750 a b
How can I achieve this? I have tried enumerate and izip but both results are not what I am seeking.
Thanks.
EDIT:
I have followed your code and used it, since it is way faster. My output now looks like this:
[[[ 0.00000000e+00 0.00000000e+00 4.00000000e+01 2.30000000e+01]
[ 1.00000000e+00 0.00000000e+00 8.50000000e+01 1.40000000e+01]
[ 2.00000000e+00 0.00000000e+00 7.20000000e+01 2.00000000e+00]
...,
[ 1.44600000e+03 0.00000000e+00 9.20000000e+01 4.60000000e+01]
[ 1.44700000e+03 0.00000000e+00 5.00000000e+01 6.10000000e+01]
[ 1.44800000e+03 0.00000000e+00 8.40000000e+01 9.40000000e+01]]]
I now have 750 lists, and each of those contains another 1000. I have tried to flatten those to get 4 values (x,y,a,b) per line. This just takes forever. Is there another way to flatten those?
EDIT2
I have tried
np.fromiter(itertools.chain.from_iterable(arr), dtype='int')
but it gave an error: setting an array element with a sequence, so I tried
np.fromiter(itertools.chain.from_iterable(arr[0]), dtype='int')
but this just gave one list back with what I suspect is the whole first list in the array.
EDIT v2
Now using np.stack instead of np.dstack, and handling file output.
This is considerably simpler than the solutions proposed below.
import numpy as np
import numpy.random as nprnd
aL = nprnd.randint(0,100,size=10) # 10 random ints
bL = nprnd.randint(0,100,size=10) # 10 random ints
xL = np.linspace(0,100,num=5) # 5 evenly spaced values
yL = np.linspace(0,100,num=2) # 2 evenly spaced values
xv,yv = np.meshgrid(xL,yL)
arr = np.stack((np.ravel(xv), np.ravel(yv), aL, bL), axis=-1)
np.savetxt('out.out', arr, delimiter=' ')
Using np.meshgrid gives us the following two arrays:
xv = [[ 0. 25. 50. 75. 100.]
[ 0. 25. 50. 75. 100.]]
yv = [[ 0. 0. 0. 0. 0.]
[ 100. 100. 100. 100. 100.]]
which, when we ravel, become:
np.ravel(xv) = [ 0. 25. 50. 75. 100. 0. 25. 50. 75. 100.]
np.ravel(yv) = [ 0. 0. 0. 0. 0. 100. 100. 100. 100. 100.]
These arrays have the same shape as aL and bL,
aL = [74 79 92 63 47 49 18 81 74 32]
bL = [15 9 81 44 90 93 24 90 51 68]
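so all that's left is to stack all four arrays along axis=-1 (the a and b columns shown below come from a different random draw than the aL and bL printed above):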
arr = np.stack((np.ravel(xv), np.ravel(yv), aL, bL), axis=-1)
arr = [[ 0. 0. 62. 41.]
[ 25. 0. 4. 42.]
[ 50. 0. 94. 71.]
[ 75. 0. 24. 91.]
[ 100. 0. 10. 55.]
[ 0. 100. 41. 81.]
[ 25. 100. 67. 11.]
[ 50. 100. 21. 80.]
[ 75. 100. 63. 37.]
[ 100. 100. 27. 2.]]
From here, saving is trivial:
np.savetxt('out.out', arr, delimiter=' ')
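The desired output in the question lists every y for the first x before moving on to the next x. With the default meshgrid ordering x varies fastest instead, so if the row order matters, one option is to pass indexing="ij". A small sketch with made-up values (3 x values, 2 y values, and stand-in aL and bL of length 6):

```python
import numpy as np

xL = np.array([1, 2, 3])
yL = np.array([1, 2])
aL = np.arange(10, 16)   # stand-in data, one value per (x, y) pair
bL = np.arange(20, 26)   # stand-in data, one value per (x, y) pair

# With indexing="ij" both grids have shape (len(xL), len(yL)),
# so ravel() walks through all y values for each x in turn.
xv, yv = np.meshgrid(xL, yL, indexing="ij")
rows = np.column_stack((xv.ravel(), yv.ravel(), aL, bL))
print(rows)
```

The resulting 2-D array has one (x, y, a, b) row per pair and can be written with np.savetxt as before; its fmt argument (for example fmt="%d") controls how the numbers are printed.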
ORIGINAL ANSWER
idx = 0
out = []
for x in xL:
    for y in yL:
        v1 = aL[idx]
        v2 = bL[idx]
        out.append((x, y, v1, v2))
        # print(x,y, v1, v2)
        idx += 1
This works, but it's slow, and only gets slower with more coordinates. I'd consider using the numpy package instead. Here's an example with a 2 x 5 dataset.
aL = nprnd.randint(0,100,size=10) # 10 random ints
bL = nprnd.randint(0,100,size=10) # 10 random ints
xL = np.linspace(0,100,num=5) # 5 evenly spaced values
yL = np.linspace(0,100,num=2) # 2 evenly spaced values
lenx = len(xL) # 5
leny = len(yL) # 2
arr = np.zeros(shape=(leny,lenx,4)) # create a zero-filled 3-d array
This creates a 3-dimensional array with 2 rows and 5 columns. Along the third axis (length 4) we populate the array with the data you want.
for x in range(leny):
    arr[x,:,0] = xL
This syntax is a little confusing at first. You can learn more about it here. In short, the loop iterates over the rows and sets a particular slice of the array to xL. In this case, the slice we have selected is index 0 of the third axis, for all columns of row x (the : means "select all indices on this axis"). For our small example, this would yield:
[[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]
[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]]
Now we do the same for each column:
for y in range(lenx):
    arr[:,y,1] = yL
-----
[[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]
[[ 0 100 0 0]
[ 25 100 0 0]
[ 50 100 0 0]
[ 75 100 0 0]
[100 100 0 0]]]
Now we need to address arrays aL and bL. These arrays are flat, so we must first reshape them to conform to the shape of arr. In our simple example, this takes an array of length 10 and reshapes it into a 2 x 5 two-dimensional array.
a_reshaped = aL.reshape(leny,lenx)
b_reshaped = bL.reshape(leny,lenx)
To insert the reshaped arrays into arr, we select indices 2 and 3 on the third axis for all rows and all columns (note the two :'s this time):
arr[:,:,2] = a_reshaped
arr[:,:,3] = b_reshaped
----
[[[ 0 0 3 38]
[ 25 0 63 89]
[ 50 0 4 25]
[ 75 0 72 1]
[100 0 24 83]]
[[ 0 100 55 85]
[ 25 100 39 9]
[ 50 100 43 85]
[ 75 100 63 57]
[100 100 6 63]]]
This runs considerably faster than the nested loop solution. Hope it helps!
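Regarding the follow-up about flattening: a 3-D array of shape (rows, columns, 4) can be turned into one (x, y, a, b) row per line with a single reshape, which needs no Python-level loop. A minimal sketch using the same 2 x 5 layout; aL and bL are stand-in values here, and the two fill loops are replaced by broadcasting, which should be equivalent:

```python
import numpy as np

xL = np.linspace(0, 100, num=5)
yL = np.linspace(0, 100, num=2)
aL = np.arange(10)        # stand-in data for illustration
bL = np.arange(10, 20)    # stand-in data for illustration

arr = np.zeros((len(yL), len(xL), 4))
arr[:, :, 0] = xL                  # x repeated along every row
arr[:, :, 1] = yL[:, np.newaxis]   # y repeated along every column
arr[:, :, 2] = aL.reshape(len(yL), len(xL))
arr[:, :, 3] = bL.reshape(len(yL), len(xL))

# Collapse the first two axes: shape (2, 5, 4) -> (10, 4)
flat = arr.reshape(-1, 4)
print(flat)
```

If the array has an extra leading axis of length 1, as the triple brackets in the question's output suggest, the same reshape(-1, 4) still applies.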
Sounds like you need a nested loop for x and y:
for x in xL:
    for y in yL:
        for a, b in zip(aL, bL):
            print("%d\t%d\t%s\t%s" % (x, y, a, b))
Try this,
for i,j in zip(zip(a,b),zip(c,d)):
    print("%d\t%d\t%s\t%s" % (i[0], i[1], j[0], j[1]))
