NumPy dtype=int - Python

In the code below, I get the expected result for x1:
import numpy as np
x1 = np.arange(0.5, 10.4, 0.8)
print(x1)
[ 0.5 1.3 2.1 2.9 3.7 4.5 5.3 6.1 6.9 7.7 8.5 9.3 10.1]
But in the code below, when I set dtype=int, why is the result of x2 not [ 0 1 2 2 3 4 5 6 6 7 8 9 10]? Instead I am getting x2 as [ 0 1 2 3 4 5 6 7 8 9 10 11 12], where the last value 12 overshoots the end value of 10.4. Please clarify this concept for me.
import numpy as np
x2 = np.arange(0.5, 10.4, 0.8, dtype=int)
print(x2)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12]

According to the docs: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.arange.html
stop : number
End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.
arange : ndarray
Array of evenly spaced values.
For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point overflow, this rule may result in the last element of out being greater than stop.
So here the length of the output will be:
In [33]: np.ceil((10.4-0.5)/0.8)
Out[33]: 13.0
Hence we see the overshoot to 12 in the case of np.arange(0.5, 10.4, 0.8, dtype=int): the computed length is 13, so with an integer dtype the result is effectively np.arange(13), i.e. 13 integers starting at the default 0 with step 1.
Hence the output we observe is:
In [35]: np.arange(0.5, 10.4, 0.8, dtype=int)
Out[35]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
Hence the better way of generating integer ranges is to use integer parameters, like so:
In [25]: np.arange(0, 11, 1)
Out[25]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
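If the truncated values you expected ([0 1 2 2 3 ...]) are what you actually want, a minimal option is to generate the float range first and cast it afterwards:
import numpy as np
# Generate the float range, then truncate each element toward zero.
x2 = np.arange(0.5, 10.4, 0.8).astype(int)
print(x2)  # [ 0  1  2  2  3  4  5  6  6  7  8  9 10]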

Related

How to set some values with an interval in a vector to be another vector

I have a vector of size (1,16), for example x = [1,2,3,4,...,16], and another vector y = [1,2,3,4] of size (1,4).
I want to set the values of x at interval 4 to be the vector y, i.e. the Matlab assignment x(1:4:16) = y. How can I do that in Python?
The expected output is x = [1 2 3 4 2 6 7 8 3 10 11 12 4 14 15 16].
Try using slice assignment (the step is the spacing between the target positions, len(x) // len(y) = 4 here; in this example it happens to equal len(y) as well):
x[::len(x) // len(y)] = y
And now:
print(x)
will give:
[1, 2, 3, 4, 2, 6, 7, 8, 3, 10, 11, 12, 4, 14, 15, 16]
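Put together as a runnable sketch with plain Python lists (names follow the question):
# Extended slice assignment: positions 0, 4, 8, 12 receive y[0..3].
x = list(range(1, 17))
y = [1, 2, 3, 4]
x[::len(x) // len(y)] = y
print(x)  # [1, 2, 3, 4, 2, 6, 7, 8, 3, 10, 11, 12, 4, 14, 15, 16]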

Convert c-order index into f-order index in Python

I am trying to find a solution to the following problem. I have an index in C-order and I need to convert it into F-order.
To explain simply my problem, here is an example:
Let's say we have a matrix x as:
x = np.arange(1,5).reshape(2,2)
print(x)
array([[1, 2],
       [3, 4]])
Then the flattened matrix in C order is:
flat_c = x.ravel()
print(flat_c)
array([1, 2, 3, 4])
Now, the value 3 is at index 2 of the flat_c vector, i.e. flat_c[2] is 3.
If I flattened the matrix x using F order, I would have:
flat_f = x.ravel(order='f')
array([1, 3, 2, 4])
Now, the value 3 is at index 1 of the flat_f vector, i.e. flat_f[1] is 3.
I am trying to find a way to get the F-order index, knowing the dimensions of the matrix and the corresponding C-order index.
I tried using np.unravel_index, but that function returns the matrix positions...
We can use a combination of np.ravel_multi_index and np.unravel_index for an ndarray-supported solution. Given the shape s of the input array a and a C-order index c_idx, it would be:
s = a.shape
f_idx = np.ravel_multi_index(np.unravel_index(c_idx,s)[::-1],s[::-1])
So, the idea is pretty simple: use np.unravel_index to get the n-dimensional C-order indices, then get the flattened linear index in Fortran order by using np.ravel_multi_index with the flipped shape and those flipped n-dimensional indices to simulate Fortran behavior.
Sample runs on 2D -
In [321]: a
Out[321]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
In [322]: s = a.shape
In [323]: c_idx = 6
In [324]: np.ravel_multi_index(np.unravel_index(c_idx,s)[::-1],s[::-1])
Out[324]: 4
In [325]: c_idx = 12
In [326]: np.ravel_multi_index(np.unravel_index(c_idx,s)[::-1],s[::-1])
Out[326]: 8
Sample run on 3D array -
In [336]: a
Out[336]:
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
In [337]: s = a.shape
In [338]: c_idx = 21
In [339]: np.ravel_multi_index(np.unravel_index(c_idx,s)[::-1],s[::-1])
Out[339]: 9
In [340]: a.ravel('F')[9]
Out[340]: 21
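Wrapped up as a small reusable helper (a sketch based on the expression above; the function name is mine):
import numpy as np

def c_to_f_index(c_idx, shape):
    # Convert a C-order flat index into the F-order flat index for an array of this shape.
    return np.ravel_multi_index(np.unravel_index(c_idx, shape)[::-1], shape[::-1])

a = np.arange(15).reshape(3, 5)
print(c_to_f_index(6, a.shape))                                # 4
print(a.ravel('F')[c_to_f_index(6, a.shape)] == a.ravel()[6])  # True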
Suppose your matrix is of shape (nrow,ncol). Then the 1D index when unraveled in C style for the (irow,icol) entry is given by
idxc = ncol*irow + icol
In the above equation, you know idxc. Then,
icol = idxc % ncol
Now you can find irow
irow = (idxc - icol) // ncol
Now you know both irow and icol. You can use them to get the F index. I think the F index will be given by
idxf = nrow*icol + irow
Please double-check my math, I might have got something wrong...
For the 3D case, if your array has dimensions [n1][n2][n3], then the unraveled C-index for [i1][i2][i3] is
idxc = n2*n3*i1 + n3*i2 + i3
Using modulo operations similar to the 2D case, we can recover i1, i2, i3 and then convert to the unraveled F index. Let r = idxc % (n2*n3); then
i3 = r % n3
i2 = (r - i3) // n3
i1 = (idxc - r) // (n2*n3)
The F index would be:
idxf = i1 + n1*i2 + n1*n2*i3
Please check my math.
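A quick check of this arithmetic against NumPy (a sketch; the names follow the answer above):
import numpy as np

def f_index_from_c(idxc, shape):
    # Recover i1, i2, i3 from the C-order flat index, then rebuild the F-order index.
    n1, n2, n3 = shape
    r = idxc % (n2 * n3)
    i3 = r % n3
    i2 = (r - i3) // n3
    i1 = (idxc - r) // (n2 * n3)
    return i1 + n1 * i2 + n1 * n2 * i3

a = np.arange(2 * 3 * 5).reshape(2, 3, 5)
idxc = 21
idxf = f_index_from_c(idxc, a.shape)
print(idxf)                                   # 9
print(a.ravel('F')[idxf] == a.ravel()[idxc])  # True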
In simple cases you may also get away with transposing and ravelling the array:
import numpy as np
x = np.arange(2 * 2).reshape(2, 2)
print(x)
# [[0 1]
#  [2 3]]
print(x.ravel())
# [0 1 2 3]
print(x.transpose().ravel())
# [0 2 1 3]
x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
print(x)
# [[[ 0  1  2  3]
#   [ 4  5  6  7]
#   [ 8  9 10 11]]
#
#  [[12 13 14 15]
#   [16 17 18 19]
#   [20 21 22 23]]]
print(x.ravel())
# [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
print(x.transpose().ravel())
# [ 0 12 4 16 8 20 1 13 5 17 9 21 2 14 6 18 10 22 3 15 7 19 11 23]
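For reference, reversing all axes and ravelling in C order is exactly ravelling the original array in F order, so a quick sanity check (a small sketch) is:
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
# x.transpose() reverses the axes, so its C-order ravel equals x's F-order ravel.
print(np.array_equal(x.transpose().ravel(), x.ravel(order='F')))  # True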

Equivalent in Python of Matlab's matrix(:) (colon) for a matrix(m,n)

I need to find the Python equivalent of Matlab's A(:), where A is an (m,n) matrix:
For example:
A =
5 6 7
8 9 10
A(:)
ans =
5
8
6
9
7
10
thanks in advance!
If you want the column-major result (to match the Matlab convention), you probably want to use the transpose of your numpy matrix, and then the ndarray.ravel() method:
m = numpy.array([[ 5, 6, 7 ], [ 8, 9, 10 ]])
m.T.ravel()
which gives:
array([ 5, 8, 6, 9, 7, 10])
You can do this by reshaping your array with numpy.reshape
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.reshape.html
import numpy
m = numpy.array([[ 5, 6, 7 ], [ 8, 9, 10 ]])
print(numpy.reshape(m, -1, 'F'))
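Both answers produce the same column-major flattening; NumPy also exposes it directly through the order argument of ravel (a quick comparison):
import numpy as np

m = np.array([[5, 6, 7], [8, 9, 10]])
print(m.T.ravel())             # [ 5  8  6  9  7 10]
print(np.reshape(m, -1, 'F'))  # same result via reshape
print(m.ravel(order='F'))      # same result, directly in Fortran (column-major) order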

How can I get the index of next non-NaN number with series in pandas?

In pandas, I am looping over an instance of Series. When I meet a NaN, is it possible to instantly know the index of the next non-NaN value? I don't want to skip the NaNs, because I want to interpolate them.
e.g now I have a Series a with elements
5, 6, 5, NaN, NaN, NaN, 7, 8, 9, NaN, NaN, NaN, 10, 10
Their indices run from 0 to 13. While iterating over the Series, I would simply love to know the index of the next NaN and of the next non-NaN. So from the beginning, can I instantly know that the index of the first NaN is 3? Then, when I jump to a[3], I need to know the index of the next non-NaN number, which is 6 in this case.
Thank you so much.
You could use the isnull method to find at which indices you have NaN values, and then at the current step compare your index with the next:
In [48]: s.index[s.isnull()]
Out[48]: Int64Index([3, 4, 5, 9, 10, 11], dtype='int64')
You could also use first_valid_index to find the first non-NaN value, e.g.:
In [49]: s[4:]
Out[49]:
4     NaN
5     NaN
6       7
7       8
8       9
9     NaN
10    NaN
11    NaN
12     10
13     10
dtype: float64
In [50]: s[4:].first_valid_index()
Out[50]: 6
EDIT
If you want an integer position, you could use get_loc on the pandas index:
b = s[4:]
In [156]: b
Out[156]:
4     NaN
5     NaN
6       7
7       8
8       9
9     NaN
10    NaN
11    NaN
12     10
13     10
dtype: float64
In [157]: b.first_valid_index()
Out[157]: 6
In [158]: b.index.get_loc(b.first_valid_index())
Out[158]: 2
EDIT2
You could use get_indexer to get all indices where you have NaNs and where you have valid values:
import string
import numpy as np, pandas as pd
data = [5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9, np.nan, np.nan, np.nan, 10, 10]
s = pd.Series(data, index=list(string.ascii_letters[:len(data)]))
In [216]: s
Out[216]:
a     5
b     6
c     5
d   NaN
e   NaN
f   NaN
g     7
h     8
i     9
j   NaN
k   NaN
l   NaN
m    10
n    10
dtype: float64
valid_indx = s.index.get_indexer(s.index[~s.isnull()])
nan_indx = s.index.get_indexer(s.index[s.isnull()])
In [220]: valid_indx
Out[220]: array([ 0, 1, 2, 6, 7, 8, 12, 13])
In [221]: nan_indx
Out[221]: array([ 3, 4, 5, 9, 10, 11])
Or the simplest way is with np.where:
In [222]: np.where(s.isnull())
Out[222]: (array([ 3, 4, 5, 9, 10, 11], dtype=int32),)
In [223]: np.where(~s.isnull())
Out[223]: (array([ 0, 1, 2, 6, 7, 8, 12, 13], dtype=int32),)
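Building on np.where, a small helper for "position of the next non-NaN at or after position i" could look like this (a sketch; the helper name is mine):
import numpy as np
import pandas as pd

def next_valid_pos(s, i):
    # Integer positions of all non-NaN values, in increasing order.
    valid = np.where(s.notnull())[0]
    # First valid position >= i, or None if the rest of the Series is NaN.
    k = np.searchsorted(valid, i)
    return int(valid[k]) if k < len(valid) else None

s = pd.Series([5, 6, 5, np.nan, np.nan, np.nan, 7, 8, 9,
               np.nan, np.nan, np.nan, 10, 10])
print(next_valid_pos(s, 3))  # 6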

Fast interpolation of grid data

I have a large 3d np.ndarray of data that represents a physical variable sampled over a volume in a regular grid fashion (as in the value in array[0,0,0] represents the value at physical coords (0,0,0)).
I would like to go to a finer grid spacing by interpolating the data on the coarse grid. At the moment I'm using scipy griddata linear interpolation, but it's pretty slow (~90 s for a 20x20x20 array). It's a bit over-engineered for my purposes, since it allows random sampling of the volume data. Is there anything out there that can take advantage of my regularly spaced data and of the fact that there is only a limited set of specific points I want to interpolate to?
Sure! There are two options that do different things but both exploit the regularly-gridded nature of the original data.
The first is scipy.ndimage.zoom. If you just want to produce a denser regular grid based on interpolating the original data, this is the way to go.
The second is scipy.ndimage.map_coordinates. If you'd like to interpolate a few (or many) arbitrary points in your data, but still exploit the regularly-gridded nature of the original data (e.g. no quadtree required), it's the way to go.
"Zooming" an array (scipy.ndimage.zoom)
As a quick example (this will use cubic interpolation; use order=1 for bilinear, order=0 for nearest, etc.):
import numpy as np
import scipy.ndimage as ndimage
data = np.arange(9).reshape(3,3)
print('Original:\n', data)
print('Zoomed by 2x:\n', ndimage.zoom(data, 2))
This yields:
Original:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
Zoomed by 2x:
[[0 0 1 1 2 2]
 [1 1 1 2 2 3]
 [2 2 3 3 4 4]
 [4 4 5 5 6 6]
 [5 6 6 7 7 7]
 [6 6 7 7 8 8]]
This also works for 3D (and nD) arrays. However, be aware that if you zoom by 2x, for example, you'll zoom along all axes.
data = np.arange(27).reshape(3,3,3)
print('Original:\n', data)
print('Zoomed by 2x gives an array of shape:', ndimage.zoom(data, 2).shape)
This yields:
Original:
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
Zoomed by 2x gives an array of shape: (6, 6, 6)
If you have something like a 3-band, RGB image that you'd like to zoom, you can do this by specifying a sequence of zoom factors, one per axis:
print('Zoomed by 2x along the last two axes:')
print(ndimage.zoom(data, (1, 2, 2)))
This yields:
Zoomed by 2x along the last two axes:
[[[ 0  0  1  1  2  2]
  [ 1  1  1  2  2  3]
  [ 2  2  3  3  4  4]
  [ 4  4  5  5  6  6]
  [ 5  6  6  7  7  7]
  [ 6  6  7  7  8  8]]

 [[ 9  9 10 10 11 11]
  [10 10 10 11 11 12]
  [11 11 12 12 13 13]
  [13 13 14 14 15 15]
  [14 15 15 16 16 16]
  [15 15 16 16 17 17]]

 [[18 18 19 19 20 20]
  [19 19 19 20 20 21]
  [20 20 21 21 22 22]
  [22 22 23 23 24 24]
  [23 24 24 25 25 25]
  [24 24 25 25 26 26]]]
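If you need a specific output resolution rather than a fixed factor, one option (a sketch; the target shape here is an assumption) is to derive the per-axis zoom factors from the desired shape:
import numpy as np
import scipy.ndimage as ndimage

data = np.random.random((20, 20, 20))        # coarse grid, as in the question
target_shape = (40, 40, 40)                  # hypothetical finer grid
factors = [t / s for t, s in zip(target_shape, data.shape)]
fine = ndimage.zoom(data, factors, order=1)  # order=1 -> trilinear interpolation
print(fine.shape)                            # (40, 40, 40)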
Arbitrary interpolation of regularly-gridded data using map_coordinates
The first thing to understand about map_coordinates is that it operates in pixel coordinates (e.g. just like you'd index the array, but the values can be floats). From your description, this is exactly what you want, but it often confuses people. For example, if you have x, y, z "real-world" coordinates, you'll need to transform them to index-based "pixel" coordinates.
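On a uniform grid that transformation is just an offset and a scale (a sketch; the origin and spacing values are assumptions about your grid):
import numpy as np

origin = np.array([0.0, 0.0, 0.0])    # physical coordinates of array[0, 0, 0]
spacing = np.array([0.5, 0.5, 0.5])   # physical step between adjacent samples, per axis

def world_to_pixel(points):
    # Map (N, 3) real-world coordinates to fractional pixel coordinates.
    return (np.asarray(points) - origin) / spacing

print(world_to_pixel([[0.6, 0.15, 0.7]]))  # [[1.2 0.3 1.4]]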
At any rate, let's say we wanted to interpolate the value in the original array at position 1.2, 0.3, 1.4.
If you're thinking of this in terms of the earlier RGB image case, the first coordinate corresponds to the "band", the second to the "row" and the last to the "column". What order corresponds to what depends entirely on how you decide to structure your data, but I'm going to use these as "z, y, x" coordinates, as it makes the comparison to the printed array easier to visualize.
import numpy as np
import scipy.ndimage as ndimage
data = np.arange(27).reshape(3,3,3)
print('Original:\n', data)
print('Sampled at 1.2, 0.3, 1.4:')
print(ndimage.map_coordinates(data, [[1.2], [0.3], [1.4]]))
This yields:
Original:
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
Sampled at 1.2, 0.3, 1.4:
[14]
Once again, this is cubic interpolation by default. Use the order kwarg to control the type of interpolation.
It's worth noting here that all of scipy.ndimage's operations preserve the dtype of the original array. If you want floating point results, you'll need to cast the original array as a float:
In [74]: ndimage.map_coordinates(data.astype(float), [[1.2], [0.3], [1.4]])
Out[74]: array([ 13.5965])
Another thing you may notice is that the interpolated-coordinates format is rather cumbersome for a single point (e.g. it expects a 3xN array instead of an Nx3 array). However, it's arguably nicer when you have sequences of coordinates. For example, consider the case of sampling along a line that passes through the "cube" of data:
xi = np.linspace(0, 2, 10)
yi = 0.8 * xi
zi = 1.2 * xi
print(ndimage.map_coordinates(data, [zi, yi, xi]))
This yields:
[ 0 1 4 8 12 17 21 24 0 0]
This is also a good place to mention how boundary conditions are handled. By default, anything outside of the array is set to 0. Thus the last two values in the sequence are 0. (i.e. zi is > 2 for the last two elements).
If we wanted the points outside the array to be, say, -999 (we can't use nan, as this is an integer array; if you want nan, you'll need to cast to floats):
In [75]: ndimage.map_coordinates(data, [zi, yi, xi], cval=-999)
Out[75]: array([ 0, 1, 4, 8, 12, 17, 21, 24, -999, -999])
If we wanted it to return the nearest value for points outside the array, we'd do:
In [76]: ndimage.map_coordinates(data, [zi, yi, xi], mode='nearest')
Out[76]: array([ 0, 1, 4, 8, 12, 17, 21, 24, 25, 25])
You can also use "reflect" and "wrap" as boundary modes, in addition to "nearest" and the default "constant". These are fairly self-explanatory, but try experimenting a bit if you're confused.
For example, let's interpolate a line along the first row of the first band in the array that extends for twice the distance of the array:
xi = np.linspace(0, 5, 10)
yi, zi = np.zeros_like(xi), np.zeros_like(xi)
The defaults give:
In [77]: ndimage.map_coordinates(data, [zi, yi, xi])
Out[77]: array([0, 0, 1, 2, 0, 0, 0, 0, 0, 0])
Compare this to:
In [78]: ndimage.map_coordinates(data, [zi, yi, xi], mode='reflect')
Out[78]: array([0, 0, 1, 2, 2, 1, 2, 1, 0, 0])
In [78]: ndimage.map_coordinates(data, [zi, yi, xi], mode='wrap')
Out[78]: array([0, 0, 1, 2, 0, 1, 1, 2, 0, 1])
Hopefully that clarifies things a bit!
Great answer by Joe. Based on his suggestion, I created the regulargrid package (https://pypi.python.org/pypi/regulargrid/, source at https://github.com/JohannesBuchner/regulargrid)
It provides support for n-dimensional Cartesian grids (as needed here) via the very fast scipy.ndimage.map_coordinates for arbitrary coordinate scales.
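A closely related, widely available alternative is scipy.interpolate.RegularGridInterpolator (a brief sketch, not from the answers above):
import numpy as np
from scipy.interpolate import RegularGridInterpolator

data = np.arange(27, dtype=float).reshape(3, 3, 3)
axes = (np.arange(3), np.arange(3), np.arange(3))  # coordinate values along each axis
interp = RegularGridInterpolator(axes, data)       # linear interpolation by default
print(interp([[1.2, 0.3, 1.4]]))                   # value at an arbitrary point -> [13.1]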
