confused by numpy meshgrid output - python

I am using Python 2.7 with the miniconda interpreter. I am confused by what "N-D coordinate" means in the statement below. It would be great if anyone could explain how xv and yv are calculated in the sample that follows.
"Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,..., xn."
http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html
>>> nx, ny = (3, 2)
>>> x = np.linspace(0, 1, nx)
>>> y = np.linspace(0, 1, ny)
>>> xv, yv = np.meshgrid(x, y)
>>> xv
array([[ 0. , 0.5, 1. ],
       [ 0. , 0.5, 1. ]])
>>> yv
array([[ 0., 0., 0.],
       [ 1., 1., 1.]])
regards,
Lin

xv and yv are simply defined as:
xv = np.array([x for _ in y])
yv = np.array([y for _ in x]).T
so that both have shape (len(y), len(x)) and, for every index pair (i,j), you have
xv[i,j] = x[j]
yv[i,j] = y[i]
which is especially useful for plotting 2D maps.
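A quick way to see which index goes with which input is to check it directly. The sketch below uses the same x and y as the question; the names xi and yi for the 'ij' variant are only illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 3)    # 3 points along x
y = np.linspace(0, 1, 2)    # 2 points along y
xv, yv = np.meshgrid(x, y)  # default indexing='xy'

# Both outputs have shape (len(y), len(x)).
assert xv.shape == yv.shape == (len(y), len(x))

# The row index i selects a y value, the column index j selects an x value.
for i in range(len(y)):
    for j in range(len(x)):
        assert xv[i, j] == x[j]
        assert yv[i, j] == y[i]

# With indexing='ij' the two axes swap roles: the first index follows x.
xi, yi = np.meshgrid(x, y, indexing='ij')
assert xi.shape == (len(x), len(y))
assert xi[2, 1] == x[2] and yi[2, 1] == y[1]
```

The default 'xy' layout matches what plotting functions usually expect (rows run along y, columns along x), while 'ij' is often more natural for plain array indexing.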


How to generalise the interweaving of a list of NumPy arrays?

I need to interweave a list of NumPy arrays (that I call parts_list which can be of any len) in a particular way to form a large array. This large array always has an even number of dimensions, and will always have a shape of the form (x, x), (x, y, x, y), (x, y, z, x, y, z), and so on. (Eventually, this would need to work for up to 12 dimensions, but 6 is fine for now).
When the large array has shape (x, x), the parts will have shape (x, 1); when the large array has shape (x, y, x, y), the parts will have shape (x, y, 1, y); then for (x, y, z, x, y, z), it will be (x, y, z, 1, y, z), and so on.
So, I can construct the array by doing for example:
original_size = [3, 4, 5]
large_array = np.zeros(tuple(list(original_size) +list(original_size)))
shp = large_array.shape # (3, 4, 5, 3, 4, 5)
n_dimensions = len(shp)
for index in range(shp[0]):
    array_part = parts_list[index]
    if n_dimensions == 6:
        large_array[:, :, :, index, :, :] = array_part[:, :, :, 0, :, :]
    elif n_dimensions == 4:
        large_array[:, :, index, :] = array_part[:, :, 0, :]
    elif n_dimensions == 2:
        large_array[:, index] = array_part[:, 0]
This piece of code works, but (for reasons that are long to explain) I need to modify it and make it work using functions like hstack, block, reshape, etc, instead of assigning slices like I'm currently doing.
So if, for 2 dimensions and a large array of shape (3, 3), my parts_list is:
[array([[0],
[1],
[2]]),
array([[3],
[4],
[5]]),
array([[6],
[7],
[8]])]
The wanted result is:
array([[0., 3., 6.],
[1., 4., 7.],
[2., 5., 8.]])
I can get this result alternatively with:
np.block(parts_list)
If, for 4 dimensions and a large array of shape (3, 2, 3, 2), my parts_list is:
[array([[[[ 0, 1]],
[[ 2, 3]]],
[[[ 4, 5]],
[[ 6, 7]]],
[[[ 8, 9]],
[[10, 11]]]]),
array([[[[12, 13]],
[[14, 15]]],
[[[16, 17]],
[[18, 19]]],
[[[20, 21]],
[[22, 23]]]]),
array([[[[24, 25]],
[[26, 27]]],
[[[28, 29]],
[[30, 31]]],
[[[32, 33]],
[[34, 35]]]])]
The wanted result is:
array([[[[ 0., 1.],
[12., 13.],
[24., 25.]],
[[ 2., 3.],
[14., 15.],
[26., 27.]]],
[[[ 4., 5.],
[16., 17.],
[28., 29.]],
[[ 6., 7.],
[18., 19.],
[30., 31.]]],
[[[ 8., 9.],
[20., 21.],
[32., 33.]],
[[10., 11.],
[22., 23.],
[34., 35.]]]])
I can make this work alternatively with:
np.dstack(parts_list)
However, I haven't been able to find an alternative way to get the same result for 6 dimensions and above. Is there a way of generalising this behaviour? I've tried many different combinations of functions to no avail.
My biggest clue right now is that in the docs, about dstack it says:
This is equivalent to concatenation along the third axis after 2-D arrays of shape (M,N) have been reshaped to (M,N,1) and 1-D arrays of shape (N,) have been reshaped to (1,N,1). Rebuilds arrays divided by dsplit.
So maybe coding these instructions, and then extending the code could work, but it's not too clear to me. It may be much easier to find a solution based on the code that currently works (the one that uses slices).
This is the generalisation for all dimensions.
import numpy as np
def get_nested_parts(parts_list):
    # Get the number of different dimensions i.e. (x, y, z) instead of (x, y, z, x, y, z)
    n_dimensions = len(parts_list[0].shape) // 2
    nested_parts_list = parts_list.copy()
    for i in range(n_dimensions-1):
        # Nest parts
        nested_parts_list = [[part] for part in nested_parts_list]
    return nested_parts_list
def generate_tensor_by_block(parts_list):
    nested_parts_list = get_nested_parts(parts_list)
    return np.block(nested_parts_list)
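Since every part has length 1 along exactly one axis (the axis at position n for a 2n-dimensional result), joining the parts along that axis with np.concatenate should give the same array. The following is a minimal sketch of that idea, not the answer's method; build_by_slices and build_by_concatenate are hypothetical helper names, and the first one only re-creates the slice-assignment loop from the question as a reference.

```python
import numpy as np

def build_by_slices(parts_list, original_size):
    """Reference result: the slice-assignment loop, written for any dimension."""
    n = len(original_size)
    large = np.zeros(tuple(original_size) * 2)
    for index, part in enumerate(parts_list):
        # Same as large[:, ..., index, :, ...] = part[:, ..., 0, :, ...]
        dst = (slice(None),) * n + (index,)
        src = (slice(None),) * n + (0,)
        large[dst] = part[src]
    return large

def build_by_concatenate(parts_list):
    """Join the parts along the axis where each part has length 1."""
    n = parts_list[0].ndim // 2
    return np.concatenate(parts_list, axis=n)

# Made-up test data for the 6-dimensional case.
original_size = (3, 4, 5)
part_shape = original_size + (1,) + original_size[1:]   # (3, 4, 5, 1, 4, 5)
n_values = int(np.prod(part_shape))
parts_list = [np.arange(n_values, dtype=float).reshape(part_shape) + 1000.0 * k
              for k in range(original_size[0])]

expected = build_by_slices(parts_list, original_size)
result = build_by_concatenate(parts_list)
```

This relies on the same property as np.block with nested lists, namely that the outermost nesting level is joined along the axis counted from the end, so both routes pick the same axis.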

Distance between 2 points in 3D for a big array

I have an array n×m, where n = 217000 and m = 3 (some data from telescope).
I need to calculate the distances between 2 points in 3D (according to my x, y, z coordinates in columns).
When I try to use sklearn tools the result is:
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
What tool can I use in this situation, and what is the maximum possible size for these tools?
What tool can I use in this situation...?
You could implement the Euclidean distance function on your own using the approach suggested by @Saksow. Assuming that a and b are one-dimensional NumPy arrays, you could also use any of the methods proposed in this thread:
import numpy as np
np.linalg.norm(a-b)
np.sqrt(np.sum((a-b)**2))
np.sqrt(np.dot(a-b, a-b))
If you wish to compute in one go the pairwise distance (not necessarily the euclidean distance) between all the points in your array, the module scipy.spatial.distance is your friend.
Demo:
In [79]: from scipy.spatial.distance import squareform, pdist
In [80]: arr = np.asarray([[0, 0, 0],
...: [1, 0, 0],
...: [0, 2, 0],
...: [0, 0, 3]], dtype='float')
...:
In [81]: squareform(pdist(arr, 'euclidean'))
Out[81]:
array([[ 0. , 1. , 2. , 3. ],
[ 1. , 0. , 2.23606798, 3.16227766],
[ 2. , 2.23606798, 0. , 3.60555128],
[ 3. , 3.16227766, 3.60555128, 0. ]])
In [82]: squareform(pdist(arr, 'cityblock'))
Out[82]:
array([[ 0., 1., 2., 3.],
[ 1., 0., 3., 4.],
[ 2., 3., 0., 5.],
[ 3., 4., 5., 0.]])
Notice that the number of points in the mock data array used in this toy example is 4 and the resulting pairwise distance array has 4² = 16 elements.
...and what is the maximum possible size for these tools?
If you try to apply the approach above using your data (217000 points) you get an error:
In [105]: data = np.random.random(size=(217000, 3))
In [106]: squareform(pdist(data, 'euclidean'))
Traceback (most recent call last):
File "<ipython-input-106-fd273331a6fe>", line 1, in <module>
squareform(pdist(data, 'euclidean'))
File "C:\Users\CPU 2353\Anaconda2\lib\site-packages\scipy\spatial\distance.py", line 1220, in pdist
dm = np.zeros((m * (m - 1)) // 2, dtype=np.double)
MemoryError
The issue is that you are running out of RAM. To perform such a computation you would need about 350 GB! The required amount of memory results from multiplying the number of elements of the distance matrix (217000²) by the number of bytes of each element of that matrix (8), and dividing this product by the appropriate factor (1024³) to express the result in gigabytes:
In [107]: round(data.shape[0]**2 * data.dtype.itemsize / 1024.**3, 1)
Out[107]: 350.8
So the maximum allowed size for your data is determined by the amount of available RAM (take a look at this thread for further details).
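If the full distance matrix is not needed all at once, one option is to compute it in row blocks and reduce each block before moving on. The sketch below uses plain NumPy broadcasting; distance_blocks and block_size are hypothetical names, and the intermediate array for one block takes roughly block_size * n * 3 * 8 bytes, so block_size has to be tuned to the available RAM.

```python
import numpy as np

def distance_blocks(points, block_size=100):
    """Yield (start, block), where block[i, j] is the Euclidean distance
    between points[start + i] and points[j]."""
    for start in range(0, len(points), block_size):
        chunk = points[start:start + block_size]
        # Broadcasting: (b, 1, 3) - (1, n, 3) -> (b, n, 3)
        diff = chunk[:, np.newaxis, :] - points[np.newaxis, :, :]
        yield start, np.sqrt((diff ** 2).sum(axis=-1))

arr = np.array([[0, 0, 0],
                [1, 0, 0],
                [0, 2, 0],
                [0, 0, 3]], dtype=float)

# Example reduction: the largest pairwise distance, found block by block,
# so only one (block_size, n) slab of the matrix exists at any time.
largest = max(block.max() for _, block in distance_blocks(arr, block_size=2))
```

The total work is still proportional to n², so for 217000 points this trades memory for a long running time; it only helps when each block can be reduced (a maximum, a count, a histogram) rather than stored.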
Using only Python and Euclidean distance formula for 3 dimensions:
import math
distance = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)

Python - Binning x,y,z values on a 2D grid

I have a list of z points associated to pairs x,y, meaning that for example
x y z
3.1 5.2 1.3
4.2 2.3 9.3
5.6 9.8 3.5
and so on. The total number of z values is relatively high, around 10000.
I would like to bin my data, in the following sense:
1) I would like to split the x and y values into cells, so as to make a 2-dimensional grid in x,y. If I have Nx cells for the x axis and Ny for the y axis, I would then have Nx*Ny cells on the grid. For example, the first bin for x could range from 1. to 2., the second from 2. to 3., and so on.
2) For each cell in the 2-dimensional grid, I would then need to calculate how many points fall into that cell, and sum all their z values. This gives me a numerical value associated with each cell.
I thought about using binned_statistic from scipy.stats, but I would have no idea on how to set the options to accomplish my task. Any suggestions? Also other tools, other than binned_statistic, are well accepted.
Assuming I understand, you can get what you need by exploiting the expand_binnumbers parameter for binned_statistic_2d, thus.
from scipy.stats import binned_statistic_2d
import numpy as np
x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
z = [2.,3.,5.,7.]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]
ret = binned_statistic_2d(x, y, None, 'count', bins=[binx, biny],
                          expand_binnumbers=True)
print (ret.statistic)
print (ret.binnumber)
sums = np.zeros([-1+len(binx), -1+len(biny)])
for i in range(len(x)):
    # binnumber is 1-based, so shift to 0-based array indices
    m = ret.binnumber[0][i] - 1
    n = ret.binnumber[1][i] - 1
    sums[m][n] += z[i]
print (sums)
This is just an expansion of one of the examples. Here's the output.
[[ 2. 1.]
[ 1. 0.]]
[[1 1 1 2]
[1 2 1 1]]
[[ 7. 3.]
[ 7. 0.]]
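As a possible shortcut, np.histogram2d can produce both the per-cell counts and the per-cell sums of z without an explicit loop, because its weights argument adds z[i] instead of 1 for every point. This is a different tool from the one used above, shown here with the same toy data as a sketch.

```python
import numpy as np

x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
z = [2., 3., 5., 7.]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# Number of points per cell (rows follow the x bins, columns the y bins).
counts, _, _ = np.histogram2d(x, y, bins=[binx, biny])

# Sum of the z values per cell: each point contributes its z as a weight.
sums, _, _ = np.histogram2d(x, y, bins=[binx, biny], weights=z)

print(counts)
print(sums)
```

Points outside the outermost edges are silently ignored by np.histogram2d, so the edges should cover the whole data range if every point is meant to be counted.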
Establish the edges of the cells, iterate over cell edges and use boolean indexing to extract the z values in each cell, keep the sums in a list, convert the list and reshape it.
import itertools
import numpy as np
x = np.array([0.1, 0.1, 0.1, 0.6, 1.2, 2.1])
y = np.array([2.1, 2.6, 2.1, 2.1, 3.4, 4.7])
z = np.array([2., 3., 5., 7., 10, 20])
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)  # zip works on Python 2 and 3 (izip is Python 2 only)
minx, maxx = int(min(x)), int(max(x)) + 1
miny, maxy = int(min(y)), int(max(y)) + 1
result = []
x_edges = pairwise(range(minx, maxx + 1))
for xleft, xright in x_edges:
    xmask = np.logical_and(x >= xleft, x < xright)
    # rebuild the y edges for every x cell, since they are consumed by the inner loop
    y_edges = pairwise(range(miny, maxy + 1))
    for yleft, yright in y_edges:
        ymask = np.logical_and(y >= yleft, y < yright)
        cell = z[np.logical_and(xmask, ymask)]
        result.append(cell.sum())
result = np.array(result).reshape((maxx - minx, maxy - miny))
>>> result
array([[ 17., 0., 0.],
[ 0., 10., 0.],
[ 0., 0., 20.]])
>>>
Unfortunately, no numpy vectorization magic
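For unit-width cells like the ones above, a loop-free variant is possible: compute each point's cell index with np.floor and accumulate with np.add.at. This is a sketch under the assumption that the cells are exactly one unit wide and start at the integer below the smallest value; the names ix, iy and sums are illustrative.

```python
import numpy as np

x = np.array([0.1, 0.1, 0.1, 0.6, 1.2, 2.1])
y = np.array([2.1, 2.6, 2.1, 2.1, 3.4, 4.7])
z = np.array([2., 3., 5., 7., 10., 20.])

# Lower-left corner of the grid: the integer at or below the smallest value.
minx = int(np.floor(x.min()))
miny = int(np.floor(y.min()))

ix = np.floor(x).astype(int) - minx   # cell index along x for every point
iy = np.floor(y).astype(int) - miny   # cell index along y for every point

sums = np.zeros((ix.max() + 1, iy.max() + 1))
# np.add.at is unbuffered, so points that share a cell all get accumulated
# (plain fancy-index assignment "sums[ix, iy] += z" would keep only one of them).
np.add.at(sums, (ix, iy), z)
print(sums)
```

For arbitrary bin edges, np.digitize could replace the np.floor step.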

numpy.meshgrid explanation

Could someone care to explain the meshgrid method? I cannot wrap my mind around it. The example is from the [SciPy][1] site:
import numpy as np
nx, ny = (3, 2)
x = np.linspace(0, 1, nx)
print ("x =", x)
y = np.linspace(0, 1, ny)
print ("y =", y)
xv, yv = np.meshgrid(x, y)
print ("xv_1 =", xv)
print ("yv_1 =", yv)
xv, yv = np.meshgrid(x, y, sparse=True) # make sparse output arrays
print ("xv_2 =", xv)
print ("yv_2 =", yv)
Printout is :
x = [ 0. 0.5 1. ]
y = [ 0. 1.]
xv_1 = [[ 0. 0.5 1. ]
[ 0. 0.5 1. ]]
yv_1 = [[ 0. 0. 0.]
[ 1. 1. 1.]]
xv_2 = [[ 0. 0.5 1. ]]
yv_2 = [[ 0.]
[ 1.]]
Why are arrays xv_1 and yv_1 formed like this ? Ty :)
[1]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html#numpy.meshgrid
In [214]: nx, ny = (3, 2)
In [215]: x = np.linspace(0, 1, nx)
In [216]: x
Out[216]: array([ 0. , 0.5, 1. ])
In [217]: y = np.linspace(0, 1, ny)
In [218]: y
Out[218]: array([ 0., 1.])
Using unpacking to better see the 2 arrays produced by meshgrid:
In [225]: X,Y = np.meshgrid(x, y)
In [226]: X
Out[226]:
array([[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ]])
In [227]: Y
Out[227]:
array([[ 0., 0., 0.],
[ 1., 1., 1.]])
And now the sparse version. Notice that X1 looks like one row of X (but 2-D), and Y1 like one column of Y.
In [228]: X1,Y1 = np.meshgrid(x, y, sparse=True)
In [229]: X1
Out[229]: array([[ 0. , 0.5, 1. ]])
In [230]: Y1
Out[230]:
array([[ 0.],
[ 1.]])
When used in calculations like plus and times, both forms behave the same. That's because of numpy's broadcasting.
In [231]: X+Y
Out[231]:
array([[ 0. , 0.5, 1. ],
[ 1. , 1.5, 2. ]])
In [232]: X1+Y1
Out[232]:
array([[ 0. , 0.5, 1. ],
[ 1. , 1.5, 2. ]])
The shapes might also help:
In [235]: X.shape, Y.shape
Out[235]: ((2, 3), (2, 3))
In [236]: X1.shape, Y1.shape
Out[236]: ((1, 3), (2, 1))
X and Y hold more values than are actually needed for most uses, but there usually isn't much of a penalty for using them instead of the sparse versions.
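To make the trade-off concrete, here is a small sketch that evaluates the same function on the dense and the sparse grids and compares the memory held by the coordinate arrays. The function f and the grid sizes are made up for illustration.

```python
import numpy as np

x = np.linspace(0, 1, 300)
y = np.linspace(0, 1, 200)

X, Y = np.meshgrid(x, y)                  # dense: two (200, 300) arrays
X1, Y1 = np.meshgrid(x, y, sparse=True)   # sparse: (1, 300) and (200, 1)

def f(a, b):
    # Any expression built from elementwise operations broadcasts the same way.
    return np.sin(a) * b ** 2

dense = f(X, Y)
sparse = f(X1, Y1)

# Broadcasting expands the sparse inputs to the full grid during evaluation.
assert dense.shape == sparse.shape == (200, 300)
assert np.allclose(dense, sparse)

print(X.nbytes + Y.nbytes)     # 960000 bytes for the dense coordinates
print(X1.nbytes + Y1.nbytes)   # 4000 bytes for the sparse coordinates
```

The result array has the full grid size either way; only the coordinate inputs shrink, which starts to matter for large grids or for more than two dimensions.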
Your linearly spaced vectors x and y, defined with linspace, have 3 and 2 points respectively.
The meshgrid function uses these vectors to create a 2D grid of points, with 3 points along x and 2 along y (6 points in total).
meshgrid returns two matrices, xv and yv, which hold in each cell the x coordinate and the y coordinate of the corresponding grid point.
This is created as follows:
# illustrative re-implementation of the 2-D case
def meshgrid_custom(x, y):
    xv = np.zeros((len(x), len(y)))
    yv = np.zeros((len(x), len(y)))
    for i, ix in enumerate(x):
        for j, jy in enumerate(y):
            xv[i, j] = ix
            yv[i, j] = jy
    return xv.T, yv.T
So, for example the point at the location (1,1) has the coordinates:
x = xv_1[1,1] = 0.5
y = yv_1[1,1] = 1.0
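The "grid of points" view can be made explicit by flattening both outputs and pairing them up, which gives one (x, y) row per grid point. The sketch below does this for the same 3 x 2 example; the name points is illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(0, 1, 2)
xv, yv = np.meshgrid(x, y)

# One row per grid point: column 0 is the x coordinate, column 1 the y coordinate.
points = np.column_stack([xv.ravel(), yv.ravel()])
print(points.shape)   # (6, 2): 3 * 2 grid points, 2 coordinates each

# Grid location (row 1, column 1) maps to flat index 1 * len(x) + 1.
flat_index = 1 * len(x) + 1
print(points[flat_index])   # the same point as (xv[1, 1], yv[1, 1])
```

This layout is convenient when a function expects a list of coordinates rather than two separate 2-D arrays.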

coefficients of spline interpolation in scipy

I want to calculate the coefficients of a spline interpolation by scipy.
In MATLAB:
x=[0:3];
y=[0,1,4,0];
spl=spline(x,y);
disp(spl.coefs);
and it will return:
ans =
-1.5000 5.5000 -3.0000 0
-1.5000 1.0000 3.5000 1.0000
-1.5000 -3.5000 1.0000 4.0000
But i can't do that by interpolate.splrep in scipy. Can you tell me how to calc it?
I'm not sure there is any way to get exactly those coefficients from scipy. What scipy.interpolate.splrep gives you is the knots and coefficients of a B-spline. What Matlab's spline gives you appears to be the piecewise polynomial coefficients describing the cubic equations connecting the points you pass in, which leads me to believe that the Matlab spline is a control-point based spline such as a Hermite or Catmull-Rom instead of a B-spline.
However, scipy.interpolate.interpolate.spltopp does provide a way to get the piecewise polynomial coefficients of a B-spline. Unfortunately, it doesn't seem to work very well.
>>> import scipy.interpolate
>>> x = [0, 1, 2, 3]
>>> y = [0, 1, 4, 0]
>>> tck = scipy.interpolate.splrep(x, y)
>>> tck
Out:
(array([ 0., 0., 0., 0., 3., 3., 3., 3.]),
array([ 3.19142761e-16, -3.00000000e+00, 1.05000000e+01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00]),
3)
>>> pp = scipy.interpolate.interpolate.spltopp(tck[0][1:-1], tck[1], tck[2])
>>> pp.coeffs.T
Out:
array([[ -4.54540394e-322, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ -4.54540394e-322, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ -4.54540394e-322, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000]])
Note that there is one set of coefficients per knot, not one for each of the original points passed in. Also, multiplying the coefficients by the b-spline basis matrix doesn't seem to be very helpful.
>>> bsbm = array([[-1, 3, -3, 1], [ 3, -6, 3, 0], [-3, 0, 3, 0],
[ 1, 4, 1, 0]]) * 1.0/6
Out:
array([[-0.16666667, 0.5 , -0.5 , 0.16666667],
[ 0.5 , -1. , 0.5 , 0. ],
[-0.5 , 0. , 0.5 , 0. ],
[ 0.16666667, 0.66666667, 0.16666667, 0. ]])
>>> dot(pp.coeffs.T, bsbm)
Out:
array([[ 7.41098469e-323, -2.27270197e-322, 2.27270197e-322,
-7.41098469e-323],
[ 7.41098469e-323, -2.27270197e-322, 2.27270197e-322,
-7.41098469e-323],
[ 7.41098469e-323, -2.27270197e-322, 2.27270197e-322,
-7.41098469e-323],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000],
[ 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000]])
The FORTRAN Piecewise Polynomial Package, PPPack, has a command bsplpp that converts from B-spline to piecewise polynomial form, which may serve your needs. Unfortunately, there isn't a Python wrapper for PPPack at this time.
If you have scipy version >= 0.18.0 installed, you can use the CubicSpline class from scipy.interpolate for cubic spline interpolation.
You can check your scipy version by running the following commands in Python:
#!/usr/bin/env python3
import scipy
print(scipy.version.version)
If your scipy version is >= 0.18.0, you can run the following example code for cubic spline interpolation:
#!/usr/bin/env python3
import numpy as np
from scipy.interpolate import CubicSpline
# calculate 5 natural cubic spline polynomials for 6 points
# (x,y) = (0,12) (1,14) (2,22) (3,39) (4,58) (5,77)
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([12,14,22,39,58,77])
# calculate natural cubic spline polynomials
cs = CubicSpline(x,y,bc_type='natural')
# show values of interpolation function at x=1.25
print('S(1.25) = ', cs(1.25))
## Additional - find polynomial coefficients for different x regions
# if you want to print polynomial coefficients in form
# S0(0<=x<=1) = a0 + b0(x-x0) + c0(x-x0)^2 + d0(x-x0)^3
# S1(1< x<=2) = a1 + b1(x-x1) + c1(x-x1)^2 + d1(x-x1)^3
# ...
# S4(4< x<=5) = a4 + b4(x-x4) + c4(x-x4)^2 + d4(x-x4)^3
# x0 = 0; x1 = 1; x4 = 4; (start of x region interval)
# show values of a0, b0, c0, d0, a1, b1, c1, d1 ...
cs.c
# Polynomial coefficients for 0 <= x <= 1
a0 = cs.c.item(3,0)
b0 = cs.c.item(2,0)
c0 = cs.c.item(1,0)
d0 = cs.c.item(0,0)
# Polynomial coefficients for 1 < x <= 2
a1 = cs.c.item(3,1)
b1 = cs.c.item(2,1)
c1 = cs.c.item(1,1)
d1 = cs.c.item(0,1)
# ...
# Polynomial coefficients for 4 < x <= 5
a4 = cs.c.item(3,4)
b4 = cs.c.item(2,4)
c4 = cs.c.item(1,4)
d4 = cs.c.item(0,4)
# Print polynomial equations for different x regions
print('S0(0<=x<=1) = ', a0, ' + ', b0, '(x-0) + ', c0, '(x-0)^2 + ', d0, '(x-0)^3')
print('S1(1< x<=2) = ', a1, ' + ', b1, '(x-1) + ', c1, '(x-1)^2 + ', d1, '(x-1)^3')
print('...')
print('S4(4< x<=5) = ', a4, ' + ', b4, '(x-4) + ', c4, '(x-4)^2 + ', d4, '(x-4)^3')
# So we can calculate S(1.25) by using equation S1(1< x<=2)
print('S(1.25) = ', a1 + b1*0.25 + c1*(0.25**2) + d1*(0.25**3))
# Cubic spline interpolation calculus example
# https://www.youtube.com/watch?v=gT7F3TWihvk
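For the data in the question, the same class should be able to reproduce MATLAB's table directly. MATLAB's spline uses not-a-knot end conditions when x and y have the same length, and that is also the default bc_type of CubicSpline, so the sketch below leaves bc_type unset instead of using 'natural'. The variable name coefs is illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0, 1, 2, 3])
y = np.array([0, 1, 4, 0])

cs = CubicSpline(x, y)   # default bc_type='not-a-knot'

# cs.c has shape (4, number_of_intervals); row k holds the coefficient of
# (x - x_i)**(3 - k). Transposing gives one row per interval with the
# highest power first, which is the layout of MATLAB's spl.coefs.
coefs = cs.c.T
print(coefs)

# Values MATLAB prints for the same data (copied from the question).
matlab_coefs = np.array([[-1.5,  5.5, -3.0, 0.0],
                         [-1.5,  1.0,  3.5, 1.0],
                         [-1.5, -3.5,  1.0, 4.0]])
print(np.allclose(coefs, matlab_coefs))
```

With only four points, the not-a-knot conditions force all three pieces to be the same cubic, which is why the leading coefficient is identical in every row.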
Here is how I could get results similar to MATLAB:
>>> from scipy.interpolate import PPoly, splrep
>>> x = [0, 1, 2, 3]
>>> y = [0, 1, 4, 0]
>>> tck = splrep(x, y)
>>> tck
Out: (array([ 0., 0., 0., 0., 3., 3., 3., 3.]),
array([ 3.19142761e-16, -3.00000000e+00, 1.05000000e+01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00]),
3)
>>> pp = PPoly.from_spline(tck)
>>> pp.c.T
Out: array([[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, 5.50000000e+00, -3.00000000e+00,
3.19142761e-16],
[ -1.50000000e+00, -8.00000000e+00, -1.05000000e+01,
0.00000000e+00],
[ -1.50000000e+00, -8.00000000e+00, -1.05000000e+01,
0.00000000e+00],
[ -1.50000000e+00, -8.00000000e+00, -1.05000000e+01,
0.00000000e+00]])
The docs on scipy.interpolate.splrep say that you can get the coefficients:
Returns:
tck : tuple
(t,c,k) a tuple containing the vector of knots, the B-spline coefficients, and the degree of the spline.
