Pardon me if I couldn't summarize my problem precisely in the title; I hope that by explaining it, things will become clearer.
My problem is that I have to perform a calculation on every combination of a set of arrays, and then store the result of each iteration in an array, along with the arguments used for the calculation. Afterwards, I will store the NumPy arrays and the corresponding results in a pandas DataFrame.
I am trying to avoid for loops as much as possible, and I am not sure whether there are other ways to achieve my goal using pandas or Python methods that I am not yet aware of. The actual problem I am trying to solve is more complex and involves several more arrays as well as more complex data. So, to sum up, my question is whether there are smarter ways to avoid what I have been doing.
The code that I have been working on - written in a very matlab-ish way - has a logic similar to the following (this is just a simpler, illustrative example):
import numpy as np
import pandas as pd

max_x = 5
min_x = 1
x_1 = np.linspace(min_x, max_x, 5)
x_2 = np.linspace(min_x, max_x, 5)
x_3 = np.linspace(min_x, max_x, 5)
x_4 = np.linspace(min_x, max_x, 5)
x_result = np.array([])
x1 = np.array([])
x2 = np.array([])
x3 = np.array([])
x4 = np.array([])
for i in range(len(x_1)):
    for j in range(len(x_2)):
        for k in range(len(x_3)):
            for l in range(len(x_4)):
                x_set = [x_1[i], x_2[j], x_3[k], x_4[l]]
                x_calc = calculation_1(arg=x_set)
                x1 = np.append(x1, x_1[i])
                x2 = np.append(x2, x_2[j])
                x3 = np.append(x3, x_3[k])
                x4 = np.append(x4, x_4[l])
                x_result = np.append(x_result, x_calc)
df_x = pd.DataFrame(np.array([x1, x2, x3, x4, x_result])).T
If I understand correctly, you want to compute some sort of Cartesian product of the arrays. We can do this by using np.meshgrid, like:
def cartesian_product(*arrs):
    return np.transpose(np.meshgrid(*arrs)).reshape(-1, len(arrs))
For example:
>>> x_1 = [1,2,3]
>>> x_2 = [3,4]
>>> x_3 = [5]
>>> cartesian_product(x_1, x_2, x_3)
array([[1, 3, 5],
       [1, 4, 5],
       [2, 3, 5],
       [2, 4, 5],
       [3, 3, 5],
       [3, 4, 5]])
You can then feed the rows of this Cartesian product through calculation_1, for example with np.apply_along_axis(..):
np.apply_along_axis(calculation_1, 1, c)
We can then add that result as a new column, for example with sum:
>>> c = cartesian_product(x_1, x_2, x_3)
>>> np.hstack((c, np.apply_along_axis(sum, 1, c)[:, None]))
array([[ 1,  3,  5,  9],
       [ 1,  4,  5, 10],
       [ 2,  3,  5, 10],
       [ 2,  4,  5, 11],
       [ 3,  3,  5, 11],
       [ 3,  4,  5, 12]])
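To cover the pandas part of the question, the stacked columns can go straight into a DataFrame. A minimal sketch, using the question's x_1 ... x_4 and with sum standing in for calculation_1:
import numpy as np
import pandas as pd

c = cartesian_product(x_1, x_2, x_3, x_4)
x_result = np.apply_along_axis(sum, 1, c)  # swap sum for calculation_1

# one column per input variable, plus the result column
df_x = pd.DataFrame(c, columns=['x1', 'x2', 'x3', 'x4'])
df_x['result'] = x_result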
You can, for example, rewrite n nested loops as a recursion (and any recursion can in turn be rewritten as a single loop using an explicit stack or queue).
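A minimal sketch of the recursive idea, assuming the arrays and calculation_1 from the question (the generator name is illustrative):
def combinations_rec(arrs, prefix=()):
    # recursively yield one element from each array, replacing n nested loops
    if not arrs:
        yield prefix
        return
    for v in arrs[0]:
        yield from combinations_rec(arrs[1:], prefix + (v,))

for x_set in combinations_rec([x_1, x_2, x_3, x_4]):
    x_calc = calculation_1(arg=list(x_set))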
I want to find the pair of numbers in an array whose product is the highest. I have written code for this, but is there any easier method or function, specifically in NumPy?
I tried this code:
import numpy as np

ar = np.array([4, 1, 2, 3, 4, 7, 0, 8])
prod = 0
ar1 = 0
for x in ar:
    i = np.where(ar == x)[0][0]
    for y in ar:
        if y == ar[i]:
            continue
        else:
            ar1 = x * y  # storing highest product
            if ar1 > prod:
                prod = ar1
                y1 = y
                x1 = x
print([x1, y1])
Taking the two maximum numbers and the two minimum numbers and comparing their products should give the right answer (the two minimum numbers matter because two large negative values can have a large positive product).
ar = np.array([4, 1, 2, 3, 4, 7, 0, 8])
ar.sort()
a, b = ar[-2:], ar[:2]
if np.prod(a) > np.prod(b):
    print(a)
else:
    print(b)
# prints [7 8] for this input
Deliberately not an efficient way to do this, but it uses Numpy:
import numpy as np

x = np.array([4, 1, 2, 3, 4, 7, 0, 8], dtype=object)
o = np.outer(x, x).flatten()  # note: the outer product includes each element times itself (the diagonal)
a = np.argsort(o)
print(o[a][-2:])
# [56 64]
Please clarify your question if this is not what you are asking for.
Thanks to @DaniMesejo I got the answer:
ar = np.array([4, 1, 2, 3, 4, 7, 0, 8])
ar.sort()
print(ar[-2:])
I'm trying to speed up a process, and I think this might be possible using numpy's apply_along_axis. The problem is that not all my rows have the same length.
When I do:
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
This works fine. But I would like to do something similar to (please note that the first row has 4 elements and the rest have 3):
a = np.array([[1, 2, 3, 4],
              [2, 3, 4],
              [4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)
But this fails because:
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
I've looked around and the only 'solution' I've found is to add zeros to make all the arrays the same length, which would probably defeat the purpose of performance improvement.
Is there any way to use numpy's apply_along_axis on a non-regularly shaped (ragged) numpy array?
You can transform your initial array of iterables into a regular ndarray by padding them with zeros in a vectorized manner:
import numpy as np

a = np.array([[1, 2, 3, 4],
              [2, 3, 4],
              [4, 5, 6]], dtype=object)  # dtype=object is needed for ragged input on recent NumPy versions

max_len = len(max(a, key=len))  # max length of the iterables contained in the array
cust_func = np.vectorize(pyfunc=lambda x: np.pad(array=x,
                                                 pad_width=(0, max_len),
                                                 mode='constant',
                                                 constant_values=(0, 0))[:max_len],
                         otypes=[list])
a_pad = np.stack(cust_func(a))
output:
array([[1, 2, 3, 4],
       [2, 3, 4, 0],
       [4, 5, 6, 0]])
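With the padded array, the apply_along_axis call from the question works again:
b = np.apply_along_axis(sum, 1, a_pad)
print(b)
# [10  9 15]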
It depends.
Do you know the size of the vectors beforehand, or are you appending to a list?
See e.g. http://stackoverflow.com/a/58085045/7919597
You could, for example, pad the arrays:
import numpy as np
a1 = [1, 2, 3, 4]
a2 = [2, 3, 4, np.nan] # pad with nan
a3 = [4, 5, 6, np.nan] # pad with nan
b = np.stack([a1, a2, a3], axis=0)
print(b)
# You can apply the normal numpy operations to
# arrays containing nan; they usually just produce
# nan in the corresponding positions of the result.
c = np.diff(b, axis=-1)
print(c)
Afterwards you can apply a moving window on each row over the columns.
Have a look at https://stackoverflow.com/a/22621523/7919597 which is only 1d, but can give you an idea of how it could work.
You can use a 2D array with only one row as the kernel (shape e.g. (1, 3)) with scipy.signal.convolve2d and apply the idea above.
This is a workaround to get a "row-wise 1D convolution":
from scipy import signal

krnl = np.array([[0, 1, 0]])  # note: this kernel is the identity, so d equals c; try e.g. np.ones((1, 3))/3 for a moving average
d = signal.convolve2d(c, krnl, mode='same')
print(d)
I'm having trouble understanding how to shape data to evaluate an interpolated view of an nD-array, using scipy.interpolate.RegularGridInterpolator
Considering A, an (n1, n2, n3)-shaped numpy array, indexed along the following coordinates:
import numpy as np

x = np.linspace(0, 10, 5)      # n1 = 5
y = np.linspace(-1, 1, 10)     # n2 = 10
z = np.linspace(0, 500, 1000)  # n3 = 1000
For this example, you can generate A = ex_array with this bit of code from the documentation:
def f(x, y, z):
    return 2 * x**3 + 3 * y**2 - z

ex_array = f(*np.meshgrid(x, y, z, indexing='ij', sparse=True))
Let's imagine I want to interpolate the entire array along each axis.
This is done with:
from scipy.interpolate import RegularGridInterpolator
interpolated = RegularGridInterpolator((x,y,z), ex_array)
Now, the part where my brain really starts to hurt:
In order to evaluate this interpolator object at any given point, you have to __call__ it on said point, like so:
evaluated_pts = interpolated((5, 0.5, 300))  # evaluate at (x, y, z) = (5, 0.5, 300)
print(evaluated_pts)
In order to evaluate it at several points, you can pass a sequence of points like this:
pts = [(5, 0.5, _z) for _z in np.linspace(100, 200, 50)]
evaluated_pts = interpolated(pts)
Now, what if I want to use the same logic as above, and evaluate on an entire new grid, such as :
new_x = np.linspace(2, 3, 128)
new_y = np.linspace(-0.1, 0.1, 100)
new_z = np.linspace(350, 400, 256)
As you can see now, it's not as straightforward as interpolated(new_x, new_y, new_z), and I tried to use np.meshgrid but could not figure it out.
Ideally, I'd want to output a new (128, 100, 256) array in this example.
RegularGridInterpolator input values are located on a grid. The grid points are defined using a tuple of "ticks" along each axis, for instance ((x0, x1, ..., xn), (y0, y1, ..., ym), (z0, z1, ..., zk)) in 3D. The values are given as an nd-array of shape (n, m, k) in this case.
To evaluate the interpolated function, the assumption that the points lie on a grid is no longer required. The query points are instead given as a list of points (actually an array of coordinates): ((x1, y1, z1), (x2, y2, z2), ..., (xP, yP, zP)), i.e. an nd-array of shape (number of points, number of dimensions).
To evaluate the interpolation on a new grid, the grid must first be constructed using meshgrid.
reshape and transpose are used to transform arrays from one shape to another (see this question).
For example:
x = [0, 1, 2]
y = [3, 4]
z = [5, 6, 7, 8]
xyz_grid = np.meshgrid(x, y, z, indexing='ij')
xyz_list = np.reshape(xyz_grid, (3, -1), order='C').T
xyz_list
xyz_list can be used to call the interpolation function, and it looks like this:
array([[0, 3, 5],
       [0, 3, 6],
       [0, 3, 7],
       [0, 3, 8],
       [0, 4, 5],
       [0, 4, 6],
       [0, 4, 7],
       [0, 4, 8],
       [1, 3, 5],
       [1, 3, 6],
       [1, 3, 7],
       [1, 3, 8],
       [1, 4, 5],
       [1, 4, 6],
       [1, 4, 7],
       [1, 4, 8],
       [2, 3, 5],
       [2, 3, 6],
       [2, 3, 7],
       [2, 3, 8],
       [2, 4, 5],
       [2, 4, 6],
       [2, 4, 7],
       [2, 4, 8]])
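Applied to the new grids from the question, the same pattern gives the desired (128, 100, 256) array directly (a sketch, reusing the interpolated object from above):
new_x = np.linspace(2, 3, 128)
new_y = np.linspace(-0.1, 0.1, 100)
new_z = np.linspace(350, 400, 256)

new_grid = np.meshgrid(new_x, new_y, new_z, indexing='ij')
new_pts = np.reshape(new_grid, (3, -1), order='C').T  # shape (n_points, 3)
new_values = interpolated(new_pts).reshape(128, 100, 256)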
I have a 2D array, and would like to sum its entries along a particular line. It should basically be like numpy.sum(), not along a column or row but rather along a line (given by an equation).
I don't really know where to start. There is this answer which uses a Radon transform (though I haven't managed to properly install the skimage package).
Is there any built-in function I can start from?
Here's what I've come up with:
array = [[1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5]]
def points_on_line(x0, y0, x1, y1):
    dx = x1 - x0
    dy = y1 - y0
    D = 2*dy - dx
    y = y0
    for x in range(x0, x1):
        yield (x, y)
        if D > 0:
            y = y + 1
            D = D - 2*dx
        D = D + 2*dy
print(sum([array[y][x] for x, y in points_on_line(0,0, 5, 4)]))
This uses Bresenham's line algorithm to find the points lying on the line between two points. It's not perfect, though, and won't return every cell the line touches. It should be a good jumping-off point!
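If you need sub-cell accuracy, or want every cell along the path to contribute, an alternative sketch is to sample the line densely and interpolate the array along it, e.g. with scipy.ndimage.map_coordinates (this approximates a line integral rather than a discrete sum of touched cells):
import numpy as np
from scipy.ndimage import map_coordinates

arr = np.asarray(array, dtype=float)
n = 200  # number of samples along the line
# parametrize the line from (row=0, col=0) to (row=5, col=4)
rows = np.linspace(0, 5, n)
cols = np.linspace(0, 4, n)
samples = map_coordinates(arr, [rows, cols], order=1)  # bilinear interpolation
line_integral = samples.mean() * np.hypot(5, 4)  # mean value times line length
print(line_integral)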
A is a point, and P is a list of points.
I want to find which point P[i] is the closest to A, i.e. I want to find P[i_0] with:
i_0 = argmin_i || A - P[i]||^2
I do it this way:
import numpy as np
# P is a list of 4 points
P = [np.array([-1, 0, 7, 3]), np.array([5, -2, 8, 1]), np.array([0, 2, -3, 4]), np.array([-9, 11, 3, 4])]
A = np.array([1, 2, 3, 4])
distance = float('inf')  # +infinity
closest = None
for p in P:
    delta = sum((p - A)**2)
    if delta < distance:
        distance = delta
        closest = p
print(closest)  # the closest point to A among all the points in P
It works, but how to do this in a shorter/more Pythonic way?
More generally in Python (and even without using Numpy), how to find k_0 such that D[k_0] = min D[k]? i.e. k_0 = argmin_k D[k]
A more Pythonic way of implementing the same algorithm you're using is to replace your loop with a call to min with a key function:
closest = min(P, key=lambda p: sum((p - A)**2))
Note that I'm using ** for exponentiation (^ is the binary-xor operator in Python).
A fully vectorized approach in numpy, similar to the one of @MikeMüller, but using numpy's broadcasting to avoid lambda functions.
With the example data:
>>> P = [np.array([-1, 0, 7, 3]), np.array([5, -2, 8, 1]), np.array([0, 2, -3, 4]), np.array([-9, 11, 3, 4])]
>>> A = np.array([1, 2, 3, 4])
And making P a 2D numpy array:
>>> P = np.asarray(P)
>>> P
array([[-1,  0,  7,  3],
       [ 5, -2,  8,  1],
       [ 0,  2, -3,  4],
       [-9, 11,  3,  4]])
It can be computed in one line using numpy:
>>> P[np.argmin(np.sum((P - A)**2, axis=1))]
Note that P - A, with P.shape = (N, 4) and A.shape = (4,), will broadcast the subtraction to all the rows of P (P[i] = P[i] - A).
For small N (the number of rows in P), the Pythonic approach is probably faster; for large N, the vectorized version should be significantly faster.
A NumPy version as a one-liner:
closest = P[np.argmin(np.apply_along_axis(lambda p: np.sum((p - A)**2), 1, P))]
Using the builtin min is the way to go for this:
import math

p1 = [1, 2]
plst = [[1, 3], [10, 10], [5, 5]]

res = min(plst, key=lambda x: math.sqrt(pow(p1[0] - x[0], 2) + pow(p1[1] - x[1], 2)))
print(res)
# [1, 3]
Note that I just used plain python lists.