Suppose I have two NumPy arrays:
x = [[5, 2, 8],
[4, 9, 1],
[7, 8, 9],
[1, 3, 5],
[1, 2, 3],
[1, 2, 4]]
y = [0, 0, 1, 1, 1, 2]
I want to efficiently split the array x into sub-arrays according to the values in y.
My desired outputs would be:
z_0 = [[5, 2, 8],
[4, 9, 1]]
z_1 = [[7, 8, 9],
[1, 3, 5],
[1, 2, 3]]
z_2 = [[1, 2, 4]]
Assuming that y starts with zero and is sorted in ascending order, what is the most efficient way to do this?
Note: This question is the sorted version of this question:
Split a NumPy array into subarrays according to the values (not sorted, but grouped) of another array
If y is grouped (doesn't have to be sorted), you can use diff to get the split points:
indices = np.flatnonzero(np.diff(y)) + 1
You can pass those directly to np.split:
z = np.split(x, indices, axis=0)
If you want to know the labels too:
labels = y[np.r_[0, indices]]
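Putting these three lines together on the arrays from the question gives the following self-contained sketch. Note that y has to be a NumPy array (not a plain list) for the fancy indexing in the labels line to work.

```python
import numpy as np

x = np.array([[5, 2, 8],
              [4, 9, 1],
              [7, 8, 9],
              [1, 3, 5],
              [1, 2, 3],
              [1, 2, 4]])
y = np.array([0, 0, 1, 1, 1, 2])

# diff is nonzero where the label changes; +1 moves to the first row of the new group
indices = np.flatnonzero(np.diff(y)) + 1
# One sub-array per group of equal labels
z = np.split(x, indices, axis=0)
# The label of each group is the label of its first row
labels = y[np.r_[0, indices]]

for label, sub in zip(labels, z):
    print(label, sub.tolist())
```

Because only split points are computed, this works for any grouped y, sorted or not.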
I'm having trouble understanding how to shape data to evaluate an interpolated view of an nD-array using scipy.interpolate.RegularGridInterpolator.
Consider A, an (n1, n2, n3)-shaped NumPy array, indexed along the following coordinates:
x = np.linspace(0, 10, 5) # n1 = 5
y = np.linspace(-1, 1, 10) # n2 = 10
z = np.linspace(0, 500, 1000) # n3 = 1000
For this example, you can generate A = ex_array with this bit of code from the documentation:
def f(x, y, z):
    return 2 * x**3 + 3 * y**2 - z

ex_array = f(*np.meshgrid(x, y, z, indexing='ij', sparse=True))
Let's imagine I want to interpolate the entire array along each axis.
This is done with:
from scipy.interpolate import RegularGridInterpolator
interpolated = RegularGridInterpolator((x,y,z), ex_array)
Now, the part where my brain starts to hurt:
To evaluate this interpolator object at any given coordinates, you have to __call__ it on said point like so:
evaluated_pts = interpolated((5, 0.5, 300))  # evaluate at (x,y,z) = (5,0.5,300)
print(evaluated_pts)
To evaluate it on several points, you can pass a list of points like this:
pts = [(5, 0.5, _z) for _z in np.linspace(100, 200, 50)]
evaluated_pts = interpolated(pts)
Now, what if I want to use the same logic as above and evaluate on an entire new grid, such as:
new_x = np.linspace(2, 3, 128)
new_y = np.linspace(-0.1, 0.1, 100)
new_z = np.linspace(350, 400, 256)
As you can see now, it's not as straightforward as interpolated(new_x, new_y, new_z), and I tried to use np.meshgrid but could not figure it out.
Ideally, I'd want to output a new (128, 100, 256) array in this example.
RegularGridInterpolator input values are located on a grid. The grid points are defined using a tuple of "ticks" along each axis, for instance ((x1, x2, ..., xn), (y1, y2, ..., ym), (z1, z2, ..., zk)) in 3D. The values are then given as an nd-array of shape (n, m, k).
To evaluate the interpolated function, the query points no longer need to lie on a grid. They are given as a list of points (actually an array of coordinates): ((x1, y1, z1), (x2, y2, z2), ..., (xP, yP, zP)), i.e. an nd-array of shape (number of points, number of dimensions).
To evaluate the interpolation on a new grid, the grid must therefore be constructed with meshgrid and then flattened into such a list of points.
reshape and transpose are used to transform arrays from one shape to another (see this question).
For example:
x = [0, 1, 2]
y = [3, 4]
z = [5, 6, 7, 8]
xyz_grid = np.meshgrid(x, y, z, indexing='ij')
xyz_list = np.reshape(xyz_grid, (3, -1), order='C').T
xyz_list
xyz_list can be used to call the interpolation function; it looks like this:
array([[0, 3, 5],
[0, 3, 6],
[0, 3, 7],
[0, 3, 8],
[0, 4, 5],
[0, 4, 6],
[0, 4, 7],
[0, 4, 8],
[1, 3, 5],
[1, 3, 6],
[1, 3, 7],
[1, 3, 8],
[1, 4, 5],
[1, 4, 6],
[1, 4, 7],
[1, 4, 8],
[2, 3, 5],
[2, 3, 6],
[2, 3, 7],
[2, 3, 8],
[2, 4, 5],
[2, 4, 6],
[2, 4, 7],
[2, 4, 8]])
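Applied to the grid from the question, the recipe looks like the sketch below: rebuild ex_array, flatten the new meshgrid into an (N, 3) list of points, evaluate, and reshape back. It uses np.stack with ravel instead of the reshape/transpose pair shown above; for indexing='ij' both produce the same point order.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Original grid and data, as in the question
x = np.linspace(0, 10, 5)
y = np.linspace(-1, 1, 10)
z = np.linspace(0, 500, 1000)

def f(x, y, z):
    return 2 * x**3 + 3 * y**2 - z

ex_array = f(*np.meshgrid(x, y, z, indexing='ij', sparse=True))
interpolated = RegularGridInterpolator((x, y, z), ex_array)

# New grid to evaluate on
new_x = np.linspace(2, 3, 128)
new_y = np.linspace(-0.1, 0.1, 100)
new_z = np.linspace(350, 400, 256)

# indexing='ij' keeps the axis order (new_x, new_y, new_z)
grid = np.meshgrid(new_x, new_y, new_z, indexing='ij')
# Flatten to a (128*100*256, 3) array: one row per query point
pts = np.stack([g.ravel() for g in grid], axis=-1)
# Evaluate, then restore the grid shape
values = interpolated(pts).reshape(grid[0].shape)
print(values.shape)  # (128, 100, 256)
```

The SciPy documentation also describes the query argument as having shape (..., ndim), so passing np.stack(grid, axis=-1) directly should return the (128, 100, 256) array without the explicit reshape.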
First off, I'm very new to Python, so any tips/help are really appreciated.
Essentially, I want an n x 3 NumPy array to be sorted first by the second column and then by the third, but I want all of the data in each row to stay together.
Like so:
import numpy as np
a = np.array([[20, 2, 4],
[7, 5, 6],
[25, 1, 5],
[2, 2, 3],
[3, 5, 8],
[4, 1, 3]])
......... (n times)
In this array the first column represents the value, the second its x coordinate and the third its y coordinate. What is the best way to sort the array in ascending order, first by the x coordinate and then by the y coordinate, while each value stays assigned to its x and y coordinates?
So after the sort, it looks like this:
a = np.array([[4, 1, 3],
[25, 1, 5],
[2, 2, 3],
[20, 2, 4],
[7, 5, 6],
[3, 5, 8]])
......... (n times)
As you can see, it first sorts by the x coordinate and then sorts the y coordinates of all rows that share the same x coordinate: it first finds all rows with an x coordinate of 1 and sorts those by y, and so on. Throughout, the value, x and y coordinates stay together on the same row.
The easiest way is to convert it into a pandas DataFrame, which is easier to manipulate.
import pandas as pd
df = pd.DataFrame({'a': [6, 2, 1], 'b': [4, 5, 6]})
print(df)
Out
a b
0 6 4
1 2 5
2 1 6
sorteddf = df.sort_values(by='a')
print(sorteddf)
Out
a b
2 1 6
1 2 5
0 6 4
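Applied to the array from the question, a sketch could look like this; the column names 'value', 'x' and 'y' are my own labels taken from the question's description. Passing a list to by= sorts by the first column and breaks ties with the second.

```python
import numpy as np
import pandas as pd

a = np.array([[20, 2, 4],
              [7, 5, 6],
              [25, 1, 5],
              [2, 2, 3],
              [3, 5, 8],
              [4, 1, 3]])

# Illustrative column names for value, x coordinate, y coordinate
df = pd.DataFrame(a, columns=['value', 'x', 'y'])

# Sort by x first; rows with equal x are ordered by y. Whole rows move together.
sorted_df = df.sort_values(by=['x', 'y'])
result = sorted_df.to_numpy()  # back to a plain n x 3 array
print(result)
```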
Take a look at the 'order' parameter: https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
import numpy as np
dtype = [('x',int),('y',int)]
values = [(1,7),(3,4),(1,4),(1,3)]
a = np.array(values,dtype=dtype)
print(np.sort(a, order=['x','y']))
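To use order= on the question's plain n x 3 array, each column first needs a field name. A sketch, where the field names 'value', 'x' and 'y' are my own choice based on the question's description:

```python
import numpy as np

a = np.array([[20, 2, 4],
              [7, 5, 6],
              [25, 1, 5],
              [2, 2, 3],
              [3, 5, 8],
              [4, 1, 3]])

# Illustrative field names so np.sort can refer to columns by name
dtype = [('value', int), ('x', int), ('y', int)]
s = np.array([tuple(row) for row in a], dtype=dtype)

s_sorted = np.sort(s, order=['x', 'y'])  # by x first, ties broken by y
result = np.array(s_sorted.tolist())     # convert back to a plain n x 3 array
print(result)
```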
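The easiest way is to first sort by y and then sort the result by x, so that for equal values of x the final result stays sorted by y. This relies on the second sort being stable (rows with equal keys keep their relative order); NumPy's default quicksort does not guarantee that, so pass kind='mergesort' to the second argsort.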
Full code:
import numpy as np
a = np.array([[20, 2, 4],
[7, 5, 6],
[25, 1, 5],
[2, 2, 3],
[3, 5, 8],
[4, 1, 3]])
a = a[a[:,2].argsort()] # sort according to column 2
a = a[a[:, 1].argsort(kind='mergesort')]  # stable sort according to column 1
Result:
a
array([[ 4, 1, 3],
[25, 1, 5],
[ 2, 2, 3],
[20, 2, 4],
[ 7, 5, 6],
[ 3, 5, 8]])
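The same two-pass idea is what np.lexsort does in a single call: it sorts by the last key it is given and uses the earlier keys to break ties, and its sort is stable. A sketch of that alternative, which is not part of the original answer:

```python
import numpy as np

a = np.array([[20, 2, 4],
              [7, 5, 6],
              [25, 1, 5],
              [2, 2, 3],
              [3, 5, 8],
              [4, 1, 3]])

# lexsort uses the LAST key as the primary key: column 1 (x) first, then column 2 (y)
order = np.lexsort((a[:, 2], a[:, 1]))
result = a[order]
print(result)
```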
I have a list of lists (2000x1000), but as an example consider this one (10x3):
l = [[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7], [8, 2, 5], [1, 9, 2], [8, 7, 6], [9, 9, 9], [4, 5, 9]]
In this example, every list corresponds to 3 measurements of each instant:
t0 -> [8,7,6]
t1 -> [5,3,1]
and so on.
I would like to slide a window of 4 instants over the measurements, compute the peak-to-peak value of each position (column) within the window, and take the maximum of those values that lies below their 99th percentile.
EXAMPLE
Let's consider the first window:
[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1] :
[8,5,4,1] -> peak to peak: 8-1=7
[7,3,5,5] -> ptp=4
[6,1,9,1] -> ptp=8
With these 3 values [7, 4, 8], I want the maximum value below the 99th percentile, in this case 7.
For the second window:
[5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7]:
[5,4,1,3] -> ptp=4
[3,5,5,5] -> ptp=2
[1,9,1,7] -> ptp=8
max below the 99th percentile -> 4
After I do that for all the windows of size 4, I want to make a list with these values.
My code is the following, but it is slow. Is there a faster way to implement this?
NOTE: I cannot use pandas, and the NumPy version must be <= 1.6.
import numpy as np

num_meas = 4
m = []
for index, i in enumerate(l):
    if index < len(l) - num_meas + 1:
        p = []
        for j in range(len(i)):
            t = []
            for k in range(num_meas):
                t.append(l[index + k][j])
            t = [x for x in t if ~np.isnan(x)]
            try:
                a = np.ptp(t)
            except ValueError:
                a = 0
            p.append(a)
        perce = np.percentile(p, 99)
        p = max([el for el in p if el < perce])
        m.append(p)
print(m)
The output:
[7, 4, 7, 6, 5, 7, 7]
Please check if the following code works with NumPy 1.6:
import numpy as np
l = [[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7], [8, 2, 5],
[1, 9, 2], [8, 7, 6], [9, 9, 9], [4, 5, 9]]
l = np.array(l)
# range matrix
mat_ptp = np.zeros((l.shape[0]-3, l.shape[1]))
for i in range(l.shape[0]-3):
    # np.ptp rather than the ndarray.ptp method, which was removed in NumPy 2.0
    np.ptp(l[i:i+4], axis=0, out=mat_ptp[i])
percentiles = np.percentile(mat_ptp, 99, axis=1)
greater_pos = np.greater_equal(mat_ptp, percentiles.reshape(-1, 1))
mat_ptp[greater_pos] = -np.inf
result = np.max(mat_ptp, axis=1)
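The answer above still loops over the windows in Python. If that loop becomes a bottleneck, the windows can be built as a strided view so the peak-to-peak step becomes a single vectorized call. This is a sketch using numpy.lib.stride_tricks.as_strided, a technique not used in the original answer; the helper name windowed_ptp_max is hypothetical, and I have not verified it against NumPy 1.6.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowed_ptp_max(data, num_meas=4, q=99):
    """Hypothetical helper: for each window of `num_meas` consecutive rows,
    return the largest per-column peak-to-peak value that is strictly
    below the q-th percentile of that window's peak-to-peak values."""
    data = np.ascontiguousarray(data, dtype=float)
    n_rows, n_cols = data.shape
    n_win = n_rows - num_meas + 1
    # View of shape (n_win, num_meas, n_cols) with windows[i, k, j] == data[i + k, j];
    # no data is copied, and the view is only read, never written
    windows = as_strided(data, shape=(n_win, num_meas, n_cols),
                         strides=(data.strides[0], data.strides[0], data.strides[1]))
    ptp = windows.max(axis=1) - windows.min(axis=1)  # new (n_win, n_cols) array
    perc = np.percentile(ptp, q, axis=1)
    # Discard values at or above the percentile, then take each row's maximum
    ptp[ptp >= perc[:, None]] = -np.inf
    return ptp.max(axis=1)

l = [[8, 7, 6], [5, 3, 1], [4, 5, 9], [1, 5, 1], [3, 5, 7], [8, 2, 5],
     [1, 9, 2], [8, 7, 6], [9, 9, 9], [4, 5, 9]]
print(windowed_ptp_max(l))
```

As in the answer above, a window whose peak-to-peak values are all equal yields -inf rather than raising an error.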
To improve performance, try to vectorize your operations with NumPy as much as possible. This can be much faster than using for loops and append.
EDIT
Sorry, I didn't notice that you wanted the selected elements to be strictly less than the percentile. Here is the corrected version.
BENCHMARK
Just to validate the question about performance, here is the result with:
l = np.random.randint(0, 100, size=(200, 100))
run 100 times with timeit:
OP code: 0.5197743272900698 ms on average
Code above: 0.0021439407201251015 on average
I am very new to Python and trying to learn the basics. I have a question about lists.
I have a list:
L = [[1,2,3],[4,5,6],[3,4,6]]
The output should be:
[[2,4,6],[8,10,12],[6,8,12]]
The code that works for me is the following:
for x in range(len(L)):
    for y in range(len(L[x])):
        L[x][y] = L[x][y] + L[x][y]
print(L)
It gives the output [[2,4,6],[8,10,12],[6,8,12]].
Now I want the same output with a different code:
for x in L:
    a = L.index(x)
    for y in L[a]:
        b = L[a].index(y)
        L[a][b] = L[a][b] + L[a][b]
print(L)
With the above code the output obtained is:
[[4,2,6],[8,10,12],[12,8,6]]
I tried to debug the above output.
I put a print statement for a and b just above the line "L[a][b] = L[a][b] + L[a][b]". I was surprised to see that the values of a and b are:
0,0
0,0
0,2
1,0
1,1
1,2
2,0
2,1
2,0
However, if I comment out the line "L[a][b] = L[a][b] + L[a][b]", the values of a and b are as expected:
0,0
0,1
0,2
1,0
1,1
1,2
2,0
2,1
2,2
I suspect this might be happening because of variable scope in Python, and I tried to read a bit about scoping, but I didn't find an answer about either scoping or the question above.
You are modifying your list with the statement L[a][b] = L[a][b] + L[a][b].
For example:
L = [[1, 2, 3], [4, 5, 6], [3, 4, 6]]
L[0][0] = 1 initially
Then you modify it as L[0][0] = 2
L = [[2, 2, 3], [4, 5, 6], [3, 4, 6]]
In the next iteration you search for the index of 2, which is now 0,0 because you modified the list L.
I printed L along with a, b in your example. The result explains the behavior:
0 0
[[1, 2, 3], [4, 5, 6], [3, 4, 6]]
0 0
[[2, 2, 3], [4, 5, 6], [3, 4, 6]]
0 2
[[4, 2, 3], [4, 5, 6], [3, 4, 6]]
1 0
[[4, 2, 6], [4, 5, 6], [3, 4, 6]]
1 1
[[4, 2, 6], [8, 5, 6], [3, 4, 6]]
1 2
[[4, 2, 6], [8, 10, 6], [3, 4, 6]]
2 0
[[4, 2, 6], [8, 10, 12], [3, 4, 6]]
2 1
[[4, 2, 6], [8, 10, 12], [6, 4, 6]]
2 0
[[4, 2, 6], [8, 10, 12], [6, 8, 6]]
As other people have explained, when you use the index method, it finds the first occurrence of the value you are searching for. So the first time through your loop (for the first row), it looks like this:
y = 1
[1, 2, 3].index(y)  # returns 0
# Then you modify the first element of the list
y = 2
[2, 2, 3].index(y)  # returns 0 again!
To get the indices in a simpler, more reliable way, you can use the enumerate function on a list. It provides an iterator that yields the index AND the value as you move through the list.
for rowInd, x in enumerate(L):
    for colInd, y in enumerate(x):
        L[rowInd][colInd] = y + y
Note that this will do it in place, as in your original solution.
L = [[2, 4, 6], [8, 10, 12], [6, 8, 12]]
The best way to achieve your desired output is to use a list comprehension. You could do it as follows:
L = [[1,2,3], [4,5,6], [3,4,6]]
answer = [[2*el for el in sublist] for sublist in L]
print(answer)
Output
[[2, 4, 6], [8, 10, 12], [6, 8, 12]]
This iterates over each sublist in your list L and multiplies each el in the sublist by 2, thus achieving the desired result.
I think the following piece of code might be better:
for x in L:  # iterating over the outer list
    for i, y in enumerate(x):  # iterating over the inner list
        x[i] = y + y  # x is the inner list itself, so this updates L in place
If you insist on using your second method, then you need to store the results in a temporary variable:
L = [[1, 2, 3], [4, 5, 6], [3, 4, 6]]
M = [[0 for y in range(3)] for x in range(3)]
for x in L:
    a = L.index(x)
    for y in L[a]:
        b = L[a].index(y)
        M[a][b] = L[a][b] + L[a][b]
L = M
print(L)
Output:
[[2, 4, 6], [8, 10, 12], [6, 8, 12]]