Creating an array based on a plot of a custom function (Python)

I'm trying to use NumPy to create a y vector that corresponds to the following plot:
The x values run from 0 to 24, and the y values should be:
0 to 6 will be 0
6 to 18 will be a sort of parabola
18 to 24 will be 0 again
What is a good way to do this? I don't have any practical ideas yet (I thought about some sort of interpolation).
Thank you!

I have done it assuming that you want a circle shape instead of a parabola (based on your sketch).
import numpy as np

length = 24
radius = 6
x = np.arange(length)
# upper half of a circle centred at x = 12; points outside the circle give NaN
# (np.sqrt of a negative number also emits a RuntimeWarning)
y = np.sqrt(radius**2 - (x - (length/2))**2)
y = np.nan_to_num(y)  # replace the NaNs outside [6, 18] with 0
print(x)
# [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
print(y)
# [0. 0. 0. 0. 0. 0.
# 0. 3.31662479 4.47213595 5.19615242 5.65685425 5.91607978
# 6. 5.91607978 5.65685425 5.19615242 4.47213595 3.31662479
# 0. 0. 0. 0. 0. 0. ]
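If you do want an actual parabola on [6, 18] rather than a half-circle, one option is np.where with a downward parabola whose roots sit at 6 and 18. A small sketch, under the assumption that the peak value should also be 6:
import numpy as np

length = 24
x = np.arange(length)
# downward parabola with roots at x = 6 and x = 18; dividing by 6 makes the peak (at x = 12) equal 6
parabola = -(x - 6) * (x - 18) / 6
y = np.where((x >= 6) & (x <= 18), parabola, 0)
print(y)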

Related

Numpy matrix with values equal to offset from central row/column

For a given odd value a, I want to generate two matrices whose values represent the offset from the central row/column in the x or y direction. Example for a = 5:
        | -2 -1  0  1  2 |          | -2 -2 -2 -2 -2 |
        | -2 -1  0  1  2 |          | -1 -1 -1 -1 -1 |
    X = | -2 -1  0  1  2 |      Y = |  0  0  0  0  0 |
        | -2 -1  0  1  2 |          |  1  1  1  1  1 |
        | -2 -1  0  1  2 |          |  2  2  2  2  2 |
What is the easiest way to achieve this with Numpy?
Try meshgrid:
import numpy as np

n = 5
X, Y = np.meshgrid(np.arange(n), np.arange(n))
X -= n//2
Y -= n//2
Or
n = 5
range_ = np.arange(-(n//2), n-n//2)
X,Y = np.meshgrid(range_, range_)
Also check out ogrid.
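np.ogrid gives you the same offsets as open (broadcastable) ranges, which is often all you need; a quick sketch of the idea:
import numpy as np

n = 5
# Y is an (n, 1) column of row offsets, X is a (1, n) row of column offsets
Y, X = np.ogrid[-(n//2):n//2 + 1, -(n//2):n//2 + 1]
# materialize the full n x n matrices only if you really need them
X = np.broadcast_to(X, (n, n))
Y = np.broadcast_to(Y, (n, n))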
np.arange and np.repeat will do:
a = 5
limits = -(a//2), a//2 + 1
col = np.c_[np.arange(*limits)]
Y = np.repeat(col, repeats=a, axis=1)
X = Y.T
Just use NumPy's fancy indexing. The following code demonstrates the solution for a 5x5 matrix:
import numpy as np

if __name__ == '__main__':
    A = np.zeros((5, 5))
    A[np.arange(5), :] = np.arange(5)//2 - np.arange(5)[::-1]//2
    B = np.zeros((5, 5))
    B[:, np.arange(5)] = np.arange(5)//2 - np.arange(5)[::-1]//2
    B = B.T
    print(A)
    print(B)
Output
[[-2. -1. 0. 1. 2.]
[-2. -1. 0. 1. 2.]
[-2. -1. 0. 1. 2.]
[-2. -1. 0. 1. 2.]
[-2. -1. 0. 1. 2.]]
[[-2. -2. -2. -2. -2.]
[-1. -1. -1. -1. -1.]
[ 0. 0. 0. 0. 0.]
[ 1. 1. 1. 1. 1.]
[ 2. 2. 2. 2. 2.]]
Cheers.

For Loop Stops After Only 10 Iterations Python

Good evening,
I was wondering if someone could please provide insight into a problem I'm having. I've written a simple for loop, but it keeps stopping after only 10 iterations. Any thoughts or ideas would be greatly appreciated, thank you. Below is my code:
import pandas as pd
import numpy as np

directory1 = pd.read_csv('/media/Thesis_Maps//testing/JM_rev5.csv', header=None, skiprows=[-1], encoding='utf-8')
results = np.zeros((len(directory_DOE), 3))
for i in directory1:
    x1 = directory1.iloc[i, 1]
    y1 = x1 + 5
    results[i, 0] = y1
In your example pandas iterates over the column labels, not the rows; your DataFrame evidently has 10 columns, which is why the loop stops after 10 iterations.
You would have to use directory1.iterrows() (or a similar method):
for index, row in directory1.iterrows():
    results[index, 0] = row[1] + 5
but you can do the same without iteration
results[:,0] = directory1[1] + 5
Example
import pandas as pd
import numpy as np
import random

random.seed(1)  # random will always create the same values

directory1 = pd.DataFrame(
    [[random.randint(0, 10) for x in range(10)] for x in range(200)],
)
print('shape:', directory1.shape)
print(directory1.head())

# ----

results = np.zeros((len(directory1), 3))
for index, row in directory1.iterrows():
    results[index, 0] = row[1] + 5
print(results[:5])

# ---

results = np.zeros((len(directory1), 3))
results[:, 0] = directory1[1] + 5
print(results[:5])
Result
shape: (200, 10)
   0   1  2   3  4  5  6   7   8  9
0  2   9  1   4  1  7  7   7  10  6
1  3   1  7   0  6  6  9   0   7  4
2  3   9  1   5  0  0  0  10   8  0
3  6  10  3   6  0  8  3   7   7  8
4  3   5  3  10  3  7  4   0   6  8
# ---
[[14. 0. 0.]
[ 6. 0. 0.]
[14. 0. 0.]
[15. 0. 0.]
[10. 0. 0.]]
# ---
[[14. 0. 0.]
[ 6. 0. 0.]
[14. 0. 0.]
[15. 0. 0.]
[10. 0. 0.]]
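If you do need an explicit loop, itertuples() is usually much faster than iterrows(); a small sketch using the same example DataFrame (its columns are the integers 0-9, so positional access is used):
results = np.zeros((len(directory1), 3))
for row in directory1.itertuples(index=True):
    # row[0] is the index label, row[1] is column 0, row[2] is column 1
    results[row[0], 0] = row[2] + 5
print(results[:5])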

Python numpy array split index out of range

I am trying to execute the following code:
def calculate_squared_dist_sliced_data(self, data, output, proc_numb):
    for k in range(1, self.calc_border):
        print("Calculating", k, "of", self.calc_border, "\n", (self.calc_border - k), "to go!")
        kmeans = KMeansClusterer.KMeansClusterer(k, data)
        print("inertia in round", k, ": ", kmeans.calc_custom_params(data, k).inertia_)
        output.put(proc_numb, (kmeans.calc_custom_params(self.data, k).inertia_))

def calculate_squared_dist_mp(self):
    length = np.shape(self.data)[0]
    df_array = []
    df_array[0] = self.data[int(length/4), :]
    df_array[1] = self.data[int((length/4)+1):int(length/2), :]
    df_array[2] = self.data[int((length/2)+1):int(3*length/4), :]
    df_array[3] = self.data[int((3*length/4)+1):int(length/4), :]
    output = mp.Queue()
    processes = [mp.Process(target=self.calculate_squared_dist_sliced_data, args=(df_array[x], output, x)) for x in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    results = [output.get() for p in processes]
When executing df_array[0] = self.data[int(length/4), :], I get the following error:
IndexError: list assignment index out of range
The variable length has the value 20195 (which is correct). I want to run the method calculate_squared_dist_sliced_data via multiprocessing, so I need to split the array data that is passed to this class.
Here is an example of how this numpy array looks:
[[ 0.          0.          0.02072968 ..., -0.07872599 -0.10147049 -0.44589   ]
 [ 0.         -0.11091352  0.11208243 ...,  0.08164318 -0.02754813 -0.44921876]
 [ 0.         -0.10642599  0.0028097  ...,  0.1185457  -0.22482443 -0.25121125]
 ...,
 [ 0.          0.          0.         ..., -0.03617197  0.00921685  0.        ]
 [ 0.          0.          0.         ..., -0.08241634 -0.05494423 -0.10988845]
 [ 0.          0.          0.         ..., -0.03010139 -0.0925091  -0.02145017]]
Now I want to split this whole array into four equal pieces, one per process. However, when selecting the rows I get the exception mentioned above. Can someone help me?
For a more theoretical view of what I want to do, take this input:
A B C D
1 2 3 4
5 6 7 8
9 5 4 3
1 8 4 3
As a result I want to have for example two arrays, each containing two rows:
A B C D
1 2 3 4
5 6 7 8
and
A B C D
9 5 4 3
1 8 4 3
Can someone help me?
The left-hand side of the assignment is not allowed because your list has length 0, so index 0 does not exist yet.
Either fix it to:
df_array = [None, None, None, None]
or use
df_array.append(self.data[int(length/4), :])
...
instead.
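As a side note, NumPy can do the row split directly: np.array_split produces nearly equal chunks and avoids the off-by-one gaps in the manual slices. A small sketch with a stand-in array:
import numpy as np

data = np.arange(40).reshape(10, 4)          # stand-in for self.data
df_array = np.array_split(data, 4, axis=0)   # list of four row blocks (sizes 3, 3, 2, 2)
for chunk in df_array:
    print(chunk.shape)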
I just noticed that I tried to use a list like an array...

Create all x,y pairs from two coordinate arrays

I have 4 lists that I need to iterate over so that I get the following:
x y a b
Lists a and b are of equal length, and I iterate over both using the zip function; the code:
for a, b in zip(aL, bL):
    print(a, "\t", b)
List x contains 1000 items and list y contains 750 items, so after the loop is finished I should have 750,000 lines.
What I want to achieve is the following:
1 1 a b
1 2 a b
1 3 a b
1 4 a b
.....
1000 745 a b
1000 746 a b
1000 747 a b
1000 748 a b
1000 749 a b
1000 750 a b
How can I achieve this? I have tried enumerate and izip, but neither gives the result I am seeking.
Thanks.
EDIT:
I have followed your code and used it since it is way faster. My output now looks like this:
[[[ 0.00000000e+00 0.00000000e+00 4.00000000e+01 2.30000000e+01]
[ 1.00000000e+00 0.00000000e+00 8.50000000e+01 1.40000000e+01]
[ 2.00000000e+00 0.00000000e+00 7.20000000e+01 2.00000000e+00]
...,
[ 1.44600000e+03 0.00000000e+00 9.20000000e+01 4.60000000e+01]
[ 1.44700000e+03 0.00000000e+00 5.00000000e+01 6.10000000e+01]
[ 1.44800000e+03 0.00000000e+00 8.40000000e+01 9.40000000e+01]]]
I now have 750 lists, and each of those contains another 1000. I have tried to flatten those to get 4 values (x, y, a, b) per line, but this just takes forever. Is there another way to flatten them?
EDIT2
I have tried
np.fromiter(itertools.chain.from_iterable(arr), dtype='int')
but it gave an error: setting an array element with a sequence, so I tried
np.fromiter(itertools.chain.from_iterable(arr[0]), dtype='int')
but this just gave one list back with what I suspect is the whole first list in the array.
EDIT v2
Now using np.stack instead of np.dstack, and handling file output.
This is considerably simpler than the solutions proposed below.
import numpy as np
import numpy.random as nprnd
aL = nprnd.randint(0,100,size=10) # 10 random ints
bL = nprnd.randint(0,100,size=10) # 10 random ints
xL = np.linspace(0,100,num=5) # 5 evenly spaced ints
yL = np.linspace(0,100,num=2) # 2 evenly spaced ints
xv,yv = np.meshgrid(xL,yL)
arr = np.stack((np.ravel(xv), np.ravel(yv), aL, bL), axis=-1)
np.savetxt('out.out', arr, delimiter=' ')
Using np.meshgrid gives us the following two arrays:
xv = [[ 0. 25. 50. 75. 100.]
[ 0. 25. 50. 75. 100.]]
yv = [[ 0. 0. 0. 0. 0.]
[ 100. 100. 100. 100. 100.]]
which, when we ravel, become:
np.ravel(xv) = [ 0. 25. 50. 75. 100. 0. 25. 50. 75. 100.]
np.ravel(yv) = [ 0. 0. 0. 0. 0. 100. 100. 100. 100. 100.]
These arrays have the same shape as aL and bL,
aL = [74 79 92 63 47 49 18 81 74 32]
bL = [15 9 81 44 90 93 24 90 51 68]
so all that's left is to stack all four arrays along axis=-1:
arr = np.stack((np.ravel(xv), np.ravel(yv), aL, bL), axis=-1)
arr = [[ 0. 0. 62. 41.]
[ 25. 0. 4. 42.]
[ 50. 0. 94. 71.]
[ 75. 0. 24. 91.]
[ 100. 0. 10. 55.]
[ 0. 100. 41. 81.]
[ 25. 100. 67. 11.]
[ 50. 100. 21. 80.]
[ 75. 100. 63. 37.]
[ 100. 100. 27. 2.]]
From here, saving is trivial:
np.savetxt('out.out', arr, delimiter=' ')
ORIGINAL ANSWER
You can build the table with a plain nested loop:
idx = 0
out = []
for x in xL:
    for y in yL:
        v1 = aL[idx]
        v2 = bL[idx]
        out.append((x, y, v1, v2))
        # print(x, y, v1, v2)
        idx += 1
but, it's slow, and only gets slower with more coordinates. I'd consider using the numpy package instead. Here's an example with a 2 x 5 dataset.
import numpy as np
import numpy.random as nprnd

aL = nprnd.randint(0,100,size=10) # 10 random ints
bL = nprnd.randint(0,100,size=10) # 10 random ints
xL = np.linspace(0,100,num=5) # 5 evenly spaced ints
yL = np.linspace(0,100,num=2) # 2 evenly spaced ints
lenx = len(xL) # 5
leny = len(yL) # 2
arr = np.ndarray(shape=(leny,lenx,4)) # create a 3-d array
this creates a 3-dimensional array with shape (2, 5, 4): 2 rows and 5 columns, and on the third axis (length 4) we populate the array with the data you want.
for x in range(leny):
    arr[x,:,0] = xL
this syntax is a little confusing at first. You can learn more about it here. In short, it iterates over the number of rows and sets a particular slice of the array to xL. In this case, the slice we have selected is the zeroth index in all columns of row x (the : means, "select all indices on this axis"). For our small example, this would yield:
[[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]
[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]]
now we do the same for each column:
for y in range(lenx):
    arr[:,y,1] = yL
-----
[[[ 0 0 0 0]
[ 25 0 0 0]
[ 50 0 0 0]
[ 75 0 0 0]
[100 0 0 0]]
[[ 0 100 0 0]
[ 25 100 0 0]
[ 50 100 0 0]
[ 75 100 0 0]
[100 100 0 0]]]
now we need to address arrays aL and bL. these arrays are flat, so we must first reshape them to conform to the shape of arr. In our simple example, this would take an array of length 10 and reshape it into a 2 x 5 2-dimensional array.
a_reshaped = aL.reshape(leny,lenx)
b_reshaped = bL.reshape(leny,lenx)
to insert the reshaped arrays into our arr, we select the 2nd and 3rd index for all rows and all columns (note the two :'s this time):
arr[:,:,2] = a_reshaped
arr[:,:,3] = b_reshaped
----
[[[ 0 0 3 38]
[ 25 0 63 89]
[ 50 0 4 25]
[ 75 0 72 1]
[100 0 24 83]]
[[ 0 100 55 85]
[ 25 100 39 9]
[ 50 100 43 85]
[ 75 100 63 57]
[100 100 6 63]]]
this runs considerably faster than the nested loop solution. hope it helps!
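Regarding the EDIT about flattening: the 3-d array built above already holds one (x, y, a, b) quadruple per cell, so a single reshape (rather than itertools.chain) should give the flat table. A minimal sketch, assuming arr has shape (leny, lenx, 4) as constructed above:
flat = arr.reshape(-1, 4)              # one (x, y, a, b) row per line
np.savetxt('out.out', flat, delimiter=' ')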
Sounds like you need a nested loop for x and y:
for x in xL:
    for y in yL:
        for a, b in zip(aL, bL):
            print("%d\t%d\t%s\t%s" % (x, y, a, b))
Try this,
for i, j in zip(zip(a, b), zip(c, d)):
    print("%d\t%d\t%s\t%s" % (i[0], i[1], j[0], j[1]))

Overlapping iteration over theano tensor

I am trying to implement a scan loop in Theano which, given a tensor, will use a "moving slice" of the input. It doesn't actually have to be a moving slice; it can be the input preprocessed into another tensor that represents the moving slices.
Essentially:
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
 |-------|                                   (first iteration)
   |-------|                                 (second iteration)
     |-------|                               (third iteration)
       ...
         ...
           ...
                                 |-------|   (last iteration)
where |-------| is the input for each iteration.
I am trying to figure out the most efficient way to do it, maybe using some form of referencing or manipulating strides, but I haven't managed to get something to work even for pure numpy.
One possible solution I found can be found here, but I can't figure out how to use strides and I don't see a way to use that with theano.
You can build a vector containing the starting index for the slice at each timestep and call Scan with that vector as a sequence and your original vector as a non-sequence. Then, inside Scan, you can obtain the slice you want at every iteration.
I included an example in which I also made the size of the slices a symbolic input, in case you want to change it from one call of your Theano function to the next:
import theano
import theano.tensor as T

# Input variables
x = T.vector("x")
slice_size = T.iscalar("slice_size")

def step(idx, vect, length):
    # From the idx of the start of the slice, the vector and the length of
    # the slice, obtain the desired slice.
    my_slice = vect[idx:idx + length]
    # Do something with the slice here. I don't know what you want to do
    # so I'll just return the slice itself.
    output = my_slice
    return output

# Make a vector containing the start idx of every slice
slice_start_indices = T.arange(x.shape[0] - slice_size + 1)

out, updates = theano.scan(fn=step,
                           sequences=[slice_start_indices],
                           non_sequences=[x, slice_size])

fct = theano.function([x, slice_size], out)
Running the function with your parameters produces this output:
print fct(range(17), 5)
[[ 0. 1. 2. 3. 4.]
[ 1. 2. 3. 4. 5.]
[ 2. 3. 4. 5. 6.]
[ 3. 4. 5. 6. 7.]
[ 4. 5. 6. 7. 8.]
[ 5. 6. 7. 8. 9.]
[ 6. 7. 8. 9. 10.]
[ 7. 8. 9. 10. 11.]
[ 8. 9. 10. 11. 12.]
[ 9. 10. 11. 12. 13.]
[ 10. 11. 12. 13. 14.]
[ 11. 12. 13. 14. 15.]
[ 12. 13. 14. 15. 16.]]
You could use this rolling_window recipe:
import numpy as np

def rolling_window_lastaxis(arr, winshape):
    """
    Directly taken from Erik Rigtorp's post to numpy-discussion.
    http://www.mail-archive.com/numpy-discussion@scipy.org/msg29450.html
    (Erik Rigtorp, 2010-12-31)

    See also:
    http://mentat.za.net/numpy/numpy_advanced_slides/ (Stéfan van der Walt, 2008-08)
    https://stackoverflow.com/a/21059308/190597 (Warren Weckesser, 2011-01-11)
    https://stackoverflow.com/a/4924433/190597 (Joe Kington, 2011-02-07)
    https://stackoverflow.com/a/4947453/190597 (Joe Kington, 2011-02-09)
    """
    if winshape < 1:
        raise ValueError("winshape must be at least 1.")
    if winshape > arr.shape[-1]:
        print(winshape, arr.shape)
        raise ValueError("winshape is too long.")
    shape = arr.shape[:-1] + (arr.shape[-1] - winshape + 1, winshape)
    strides = arr.strides + (arr.strides[-1], )
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

x = np.arange(17)
print(rolling_window_lastaxis(x, 5))
which prints
[[ 0 1 2 3 4]
[ 1 2 3 4 5]
[ 2 3 4 5 6]
[ 3 4 5 6 7]
[ 4 5 6 7 8]
[ 5 6 7 8 9]
[ 6 7 8 9 10]
[ 7 8 9 10 11]
[ 8 9 10 11 12]
[ 9 10 11 12 13]
[10 11 12 13 14]
[11 12 13 14 15]
[12 13 14 15 16]]
Note that there are even fancier extensions of this, such as Joe Kington's rolling_window which can roll over multi-dimensional windows, and Sebastian Berg's implementation which, in addition, can jump by steps.
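For plain NumPy, recent versions (1.20 and later) also ship numpy.lib.stride_tricks.sliding_window_view, which builds the same overlapping view without hand-written strides; a minimal sketch:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(17)
windows = sliding_window_view(x, window_shape=5)  # shape (13, 5), a read-only view
print(windows)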
