linspace that would always include the final point? - python

Given an arbitrary pair of 2D points in the plane, I want to break the connecting vector into parts specified by a precision factor. However, I want it to always include the start and end point. As an extra feature, I expect segmenting from the end of the vector to the beginning to give me the same segmentation as going from the beginning to the end (after flipping, of course). As far as I can see, numpy.linspace naturally satisfies this condition, except for the situations where
the precision is so big that the result consists of only one point. Is there a built-in function to take care of this situation, or any hints on how I could correct this behaviour?
import numpy as np

alpha = np.array([0, 0])
beta = np.array([1, 1])
alpha_beta_dist = np.linalg.norm(beta - alpha)

for i in range(10):
    precision = np.random.random(1)
    num = int(alpha_beta_dist / float(precision))
    traversal = np.linspace(0.0, 1.0, num=num)
    traversal2 = np.fliplr([np.linspace(1.0, 0.0, num=num)])[0]
    if (traversal != traversal2).all():
        print('precision: ', precision)
        print('traversal: ', traversal)
        print('traversal2: ', traversal2)

Make sure num is at least 2 (and an integer, which newer NumPy versions require):
traversal = np.linspace(0.0, 1.0,
                        num=max(int(alpha_beta_dist / float(precision)), 2))
np.linspace will return both endpoints (by default) unless num is less than 2:
In [23]: np.linspace(0, 1, num=0)
Out[23]: array([], dtype=float64)
In [24]: np.linspace(0, 1, num=1)
Out[24]: array([ 0.])
In [25]: np.linspace(0, 1, num=2)
Out[25]: array([ 0., 1.])
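If you need this in several places, here is a minimal sketch of a wrapper that derives num from the precision and clamps it (the function name and rounding rule are just an illustration, not a NumPy API):

import numpy as np

def linspace_inclusive(start, stop, precision):
    # Derive the number of samples from the desired step size,
    # but never go below 2 so both endpoints are always included.
    num = max(int(abs(stop - start) / precision), 2)
    return np.linspace(start, stop, num=num)

Since np.linspace produces evenly spaced samples including both endpoints whenever num >= 2, running it in the reverse direction and flipping gives the same points up to floating-point error, so the symmetry requirement in the question holds as well.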

Related

Convert Numpy array of floats to ints proportionately (balancing chemical equation)

I have code that balances chemical equations. The only problem is that I want to convert the final solution, i.e. a 1D np array of floats, to integers. Obviously, I cannot simply round it to the nearest integers; that would mess up the balancing. One way is to multiply it by a number that converts the floats to integers (the type does not matter). See below for an example.
>>> coeffs=equation_balancer(reactants=["H2","O2"], products=["H2O"])
>>> coeffs
{"H2": 1.0, "O2": 0.5, 'H2O1': 1.0}
>>> import numpy as np
>>> np.asarray([i for i in coeffs.values()])
array([1. , 0.5, 1.])
If the final array is multiplied by 2, the fractions (floats) can be removed.
PS: to show the example above I switched back to np, since equation_balancer uses scipy.linalg.solve to balance the equation.
>>> np.asarray([i for i in coeffs.values()])*2
array([2., 1., 2.])
How do I find the number that, on multiplication with the array, gives the integer-valued array? The actual type of the array does not matter.
One way would be to multiply the array by a suitable power of 10 and then divide by the greatest common divisor:
>>> c = np.asarray([i for i in coeffs.values()])*10
>>> factor = np.gcd.reduce(c.astype(int))
>>> factor
5
>>> c/factor
array([2., 1., 2.])
In the above case, finding the right power of 10 (10**n, where n is the largest number of decimal places) is crucial, and I don't know how to code that at the moment. Is there any other approach that would be more suitable? Any help would be appreciated.
This seems to work:
(Credit to this SO answer for how to convert a floating point number into a tuple of "minimal" integer numerator and integer denominator, rather than some freakishly large numerator and denominator.)
import numpy as np
from fractions import Fraction

# A configurable param.
# Keep this small to avoid freakishly large results.
# Increase it only in rare cases where the coeffs
# span a "huge" scale.
MAX_DENOM = 100

fractions = [Fraction(val).limit_denominator(MAX_DENOM)
             for val in coeffs.values()]
ratios = np.array([(f.numerator, f.denominator) for f in fractions])
# As an alternative to the above two statements, uncomment and use
# the statement below on Python 3.8+:
# ratios = np.array([Fraction(val).limit_denominator(MAX_DENOM).as_integer_ratio()
#                    for val in coeffs.values()])

factor = np.lcm.reduce(ratios[:, 1])
result = [round(v * factor) for v in coeffs.values()]
print(result)
Output for coeffs = {"H2": 1.0, "O2": 0.5, 'H2O1': 1.0}:
[2, 1, 2]
Output for coeffs = {"H2": 0.5, "N2":0.5, "O2": 1.5, "H1N1O3":1.0}:
[1, 1, 3, 2]
Output for coeffs = {"H2": 1.0, "O3": (1/3), "H2O1":1.0}:
[3, 1, 3]
Output for coeffs = {"H4": 0.5, "O7": (1/7), "H2O1":1.0}:
[7, 2, 14]
Output for coeffs = {"H2": .1, "O2": 0.05, 'H2O1': .1}:
[2, 1, 2]
I am not entirely happy with my solution, but it seems to work alright; let me know what you think. I am essentially converting each float to a string and counting the number of characters after the decimal point. It will work as long as the values are always floats.
import numpy as np

coeffs = {"H2": .1, "O2": 0.05, 'H2O1': .1}
n = max([len(str(i).split('.')[1]) for i in coeffs.values()])
c = np.array([i for i in coeffs.values()]) * 10**n
factor = np.gcd.reduce(c.astype(np.uint64))
print((c / factor).astype(np.uint64))
Source and other solutions:
Easy way of finding decimal places
Testing: running some potentially difficult cases and converting back:
primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 79]  # some prime numbers
primes_over_1 = [1/i for i in primes]
for i in range(1, len(primes_over_1) - 1):
    coeffs = {"H2": primes_over_1[i-1], "O2": primes_over_1[i], 'H2O1': primes_over_1[i+1]}
    print('coefs: ', [a for a in coeffs.values()])
    n = max([len(str(a).split('.')[1]) for a in coeffs.values()])
    c = np.array([a for a in coeffs.values()]) * 10**n
    factor = np.gcd.reduce(c.astype(np.uint64))
    coeffs_asInt = (c / factor).astype(np.uint64)
    print('as int:', coeffs_asInt)
    coeffs_back = coeffs_asInt.astype(np.float64) * (factor / 10**n)
    coeffs_back_str = ["{0:.16g}".format(a) for a in coeffs_back]
    print('back:  ', coeffs_back_str)
    print('########################################################\n')
output:
coefs: [0.3333333333333333, 0.2, 0.14285714285714285]
as int: [8333333333333333 5000000000000000 3571428571428571]
back: ['0.3333333333333334', '0.2', '0.1428571428571428']
########################################################
coefs: [0.2, 0.14285714285714285, 0.09090909090909091]
as int: [5000000000000000 3571428571428571 2272727272727273]
back: ['0.2', '0.1428571428571428', '0.09090909090909093']
########################################################
coefs: [0.14285714285714285, 0.09090909090909091, 0.07692307692307693]
as int: [14285714285714284 9090909090909092 7692307692307693]
back: ['0.1428571428571428', '0.09090909090909093', '0.07692307692307694']
########################################################
coefs: [0.09090909090909091, 0.07692307692307693, 0.058823529411764705]
as int: [2840909090909091 2403846153846154 1838235294117647]
back: ['0.09090909090909091', '0.07692307692307693', '0.05882352941176471']
########################################################
coefs: [0.07692307692307693, 0.058823529411764705, 0.05263157894736842]
as int: [2403846153846154 1838235294117647 1644736842105263]
back: ['0.07692307692307693', '0.05882352941176471', '0.05263157894736842']
########################################################
coefs: [0.058823529411764705, 0.05263157894736842, 0.043478260869565216]
as int: [1838235294117647 1644736842105263 1358695652173913]
back: ['0.05882352941176471', '0.05263157894736842', '0.04347826086956522']
########################################################
coefs: [0.05263157894736842, 0.043478260869565216, 0.034482758620689655]
as int: [6578947368421052 5434782608695652 4310344827586207]
back: ['0.05263157894736842', '0.04347826086956522', '0.03448275862068966']
########################################################
coefs: [0.043478260869565216, 0.034482758620689655, 0.012658227848101266]
as int: [21739130434782608 17241379310344828 6329113924050633]
back: ['0.04347826086956522', '0.03448275862068966', '0.01265822784810127']
########################################################

How to do element-wise rounding of NumPy array to first non-zero digit?

I would like to "round" (not exactly mathematical rounding) the elements of a numpy array in the following way:
Given a numpy NxN or NxM 2D array with values between 0.00001 and 9.99999, like
a = np.array([[1.232,  1.872,  2.732, 0.123],
              [0.0019, 0.025,  1.854, 0.00017],
              [1.457,  0.0021, 2.34,  9.99],
              [1.527,  3.3,    0.012, 0.005]])
I would basically like to "round" this numpy array by keeping only the first non-zero digit of each element (regardless of the digits that follow it),
giving the output:
output = np.array([[1.0,   1.0,   2.0,  0.1],
                   [0.001, 0.02,  1.0,  0.0001],
                   [1.0,   0.002, 2,    9.0],
                   [1,     3,     0.01, 0.005]])
thanks for any help!
You could use np.logspace and np.searchsorted to determine the order of magnitude of each element, then floor-divide by it and multiply back:
po10 = np.logspace(-10,10,21)
oom = po10[po10.searchsorted(a)-1]
a//oom*oom
# array([[1.e+00, 1.e+00, 2.e+00, 1.e-01],
# [1.e-03, 2.e-02, 1.e+00, 1.e-04],
# [1.e+00, 2.e-03, 2.e+00, 9.e+00],
# [1.e+00, 3.e+00, 1.e-02, 5.e-03]])
What you want to do is keep a fixed number of significant figures.
This functionality is not integrated into NumPy.
To get only 1 significant figure, you could look into either @PaulPanzer's or @darcamo's answer (assuming that you only have positive values).
If you want something that works with a specified number of significant figures, you could use something like:
def significant_figures(arr, num=1):
    # : compute the order of magnitude
    order = np.zeros_like(arr)
    mask = arr != 0
    order[mask] = np.floor(np.log10(np.abs(arr[mask])))
    del mask  # free unused memory
    # : compute the corresponding precision
    prec = num - order - 1
    return np.round(arr * 10.0 ** prec) / 10.0 ** prec
print(significant_figures(a, 1))
# [[1.e+00 2.e+00 3.e+00 1.e-01]
# [2.e-03 2.e-02 2.e+00 2.e-04]
# [1.e+00 2.e-03 2.e+00 1.e+01]
# [2.e+00 3.e+00 1.e-02 5.e-03]]
print(significant_figures(a, 2))
# [[1.2e+00 1.9e+00 2.7e+00 1.2e-01]
# [1.9e-03 2.5e-02 1.9e+00 1.7e-04]
# [1.5e+00 2.1e-03 2.3e+00 1.0e+01]
# [1.5e+00 3.3e+00 1.2e-02 5.0e-03]]
EDIT
For truncated output use np.floor() instead of np.round() just before the return.
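That is, the return statement of significant_figures becomes:

return np.floor(arr * 10.0 ** prec) / 10.0 ** prec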
First get the powers of 10 for each number in the array with
powers = np.floor(np.log10(a))
In your example this gives us
array([[ 0.,  0.,  0., -1.],
       [-3., -2.,  0., -4.],
       [ 0., -3.,  0.,  0.],
       [ 0.,  0., -2., -3.]])
Now, if we divide the i-th element of the array by 10**power_i, we essentially move the first non-zero digit of each element to the ones position. We can then simply take the floor to remove the other non-zero digits, and multiply the result by 10**power_i to get back to the original scale.
The complete solution is then just the code below:
powers = np.floor(np.log10(a))
10**powers * np.floor(a/10**powers)
What about numbers greater than or equal to 10?
For these you can simply take np.floor of the original values in the array. We can do this easily with a mask. You can modify the answer as below:
powers = np.floor(np.log10(a))
result = 10**powers * np.floor(a/10**powers)
mask = a >= 10
result[mask] = np.floor(a[mask])
You can also use a mask to avoid computing the powers and logarithm for numbers that will just be replaced later.
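A minimal sketch of that masked variant (reorganizing the same answer, not changing its result; it still assumes the values are positive):

mask = a < 10                  # only these need the log/power treatment
result = np.floor(a)           # values >= 10 just get floored
powers = np.floor(np.log10(a[mask]))
result[mask] = 10**powers * np.floor(a[mask] / 10**powers)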

Automatically reduce piecewise function components - Pyomo

In pyomo, I have a piece-wise linear constraint defined through pyomo.environ.Piecewise. I keep getting a warning along the lines of
Piecewise component '<component name>' has detected slopes of consecutive piecewise segments to be within <tolerance> of one another. Refer to the Piecewise help documentation for information on how to disable this warning.
I know I could increase the tolerance and get rid of the warning, but I'm wondering if there is a general approach (through Pyomo or numpy) to reduce the number of "segments" if two consecutive slopes are below a given tolerance.
I could obviously implement this myself, but I'd like to avoid reinventing the wheel.
OK, this is what I came up with. It is definitely not optimized for performance, but my use case involves only a few points. It also lacks more validation of the inputs (e.g. x being sorted and unique).
import numpy as np

def reduce_piecewise(x, y, abs_tol):
    """
    Remove unnecessary points from a piece-wise curve.

    Points are removed if the slopes of consecutive segments
    differ by less than `abs_tol`.

    x points must be sorted and unique.
    Consecutive y points can be the same though!

    Parameters
    ----------
    x : List[float]
        Points along x-axis.
    y : List[float]
        Points along y-axis.
    abs_tol : float
        Tolerance between consecutive segments.

    Returns
    -------
    (np.array, np.array)
        x and y points - reduced.
    """
    if not len(x) == len(y):
        raise ValueError("x and y must have same shape")

    x_reduced = [x[0]]
    y_reduced = [y[0]]
    for i in range(1, len(x) - 1):
        left_slope = (y[i] - y_reduced[-1]) / (x[i] - x_reduced[-1])
        right_slope = (y[i+1] - y[i]) / (x[i+1] - x[i])
        if abs(right_slope - left_slope) > abs_tol:
            x_reduced.append(x[i])
            y_reduced.append(y[i])
    x_reduced.append(x[-1])
    y_reduced.append(y[-1])

    return np.array(x_reduced), np.array(y_reduced)
And here are some examples:
>>> x = np.array([0, 1, 2, 3])
>>> y = np.array([0, 1, 2, 3])
>>> reduce_piecewise(x, y, 0.01)
(array([0, 3]), array([0, 3]))
>>> x = np.array([0, 1, 2, 3, 4, 5])
>>> y = np.array([0, 2, -1, 3, 4.001, 5]) # 4.001 should be removed
>>> reduce_piecewise(x, y, 0.01)
(array([0, 1, 2, 3, 5]), array([ 0., 2., -1., 3., 5.]))

Filtering histogram edges and counts

Consider a histogram calculation of a numpy array that returns percentages:
# 500 random numbers between 0 and 10,000
values = np.random.uniform(0,10000,500)
# Histogram using e.g. 200 buckets
perc, edges = np.histogram(values, bins=200,
                           weights=np.zeros_like(values) + 100/values.size)
The above returns two arrays:
perc, containing the percentages of values within each pair of consecutive edges[ix] and edges[ix+1], out of the total.
edges, of length len(perc)+1.
Now, say that I want to filter perc and edges so that I only end up with the percentages and edges for values contained within a new range [m, M].
That is, I want to work with the sub-arrays of perc and edges corresponding to the interval of values within [m, M]. Needless to say, the new array of percentages would still refer to the total fraction count of the input array; we just want to filter perc and edges down to the correct sub-arrays.
How can I post-process perc and edges to do so?
The values of m and M can be any number of course. In the example above, we can assume e.g. m = 0 and M = 200.
m = 0; M = 200
mask = [(m < edges) & (edges < M)]
>>> edges[mask]
array([ 37.4789683 , 87.07491593, 136.67086357, 186.2668112 ])
Let's work on a smaller dataset so that it is easier to understand:
np.random.seed(0)
values = np.random.uniform(0, 100, 10)
values.sort()
>>> values
array([ 38.34415188,  42.36547993,  43.75872113,  54.4883183 ,
        54.88135039,  60.27633761,  64.58941131,  71.51893664,
        89.17730008,  96.36627605])
# Histogram using e.g. 10 buckets
perc, edges = np.histogram(values, bins=10,
                           weights=np.zeros_like(values) + 100./values.size)
>>> perc
array([ 30., 0., 20., 10., 10., 10., 0., 0., 10., 10.])
>>> edges
array([ 38.34415188,  44.1463643 ,  49.94857672,  55.75078913,
        61.55300155,  67.35521397,  73.15742638,  78.9596388 ,
        84.76185122,  90.56406363,  96.36627605])
m = 0; M = 50
mask = (m <= edges) & (edges < M)
>>> mask
array([ True,  True,  True, False, False, False, False, False, False,
       False, False], dtype=bool)
>>> edges[mask]
array([ 38.34415188, 44.1463643 , 49.94857672])
>>> perc[mask[:-1]][:-1]
array([ 30., 0.])
m = 40; M = 60
mask = (m < edges) & (edges < M)
>>> edges[mask]
array([ 44.1463643 , 49.94857672, 55.75078913])
>>> perc[mask[:-1]][:-1]
array([ 0., 20.])
Well, you might need some mathematics for this. The bins are equally spaced, so you can determine which bin is the first to include and which is the last by using the width of each bin:
bin_width = edges[1] - edges[0]
Now compute the first and last valid bin:
first = math.floor((m - edges[0]) / bin_width) + 1 # How many bins from the left
last = math.floor((edges[-1] - M) / bin_width) + 1 # How many bins from the right
(Ignore the +1 for both if you want to include the bin containing m or M - but then be careful that you don't end up with negative values for first and last!)
Now you know how many bins to include:
valid_edges = edges[first:-last]
valid_perc = perc[first:-last]
This will exclude the first first points and the last last points.
It might be that I haven't paid enough attention to rounding and there is an "off by one" error in here, but I think the idea is sound. :-)
You probably need to catch special cases like M > edges[-1] but for readability I haven't included these.
Or, if the bins are not equally spaced, use boolean masks instead of the calculation:
first = edges[edges < m].size + 1
last = edges[edges > M].size + 1
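Putting these ideas together, here is a minimal sketch of a helper built on np.searchsorted, which handles unequally spaced bins as well; the function name and the boundary convention (keep only bins lying fully inside [m, M]) are illustrative:

import numpy as np

def filter_histogram(perc, edges, m, M):
    lo = np.searchsorted(edges, m, side='left')        # first edge >= m
    hi = np.searchsorted(edges, M, side='right') - 1   # last edge <= M
    if hi - lo < 1:
        return perc[:0], edges[:0]                     # no bin fully inside
    return perc[lo:hi], edges[lo:hi + 1]

With the seeded 10-value example above, filter_histogram(perc, edges, 0, 50) gives the same [ 30., 0.] and three edges as the masking approach.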

Calculating wind divergence of u and v using Python, np.gradient

I'm very new to Python and am currently trying to replicate plots etc. that I previously made with GrADS. I want to calculate the divergence at each grid box using the u and v wind fields (which are just scaled by specific humidity, q) from a netCDF climate model file.
From endless searching I know I need to use some combination of np.gradient and np.sum, but I can't find the right combination. I just know that to do it 'by hand', the calculation would be
divg = dqu/dx + dqv/dy
I know the below is wrong, but it's the best I've got so far...
nc = Dataset(ifile)
q = np.array(nc.variables['hus'][0,:,:])
u = np.array(nc.variables['ua'][0,:,:])
v = np.array(nc.variables['va'][0,:,:])
lon=nc.variables['lon'][:]
lat=nc.variables['lat'][:]
qu = q*u
qv = q*v
dqu/dx, dqu/dy = np.gradient(qu, [dx, dy])
dqv/dx, dqv/dy = np.gradient(qv, [dx, dy])
divg = np.sum(dqu/dx, dqv/dy)
This gives the error 'SyntaxError: can't assign to operator'.
Any help would be much appreciated.
try something like:
dqu_dx, dqu_dy = np.gradient(qu, [dx, dy])
dqv_dx, dqv_dy = np.gradient(qv, [dx, dy])
You cannot assign to the result of an operation in Python; all of these are syntax errors:
a + b = 3
a * b = 7
# or, in your case:
a / b = 9
UPDATE
Following Pinetwig's comment: a/b is not a valid identifier name; it is (the return value of) an operator.
Try removing the [dx, dy].
[dqu_dx, dqu_dy] = np.gradient(qu)
[dqv_dx, dqv_dy] = np.gradient(qv)
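For the original divergence computation, here is a minimal sketch of how the pieces then fit together (assuming qu and qv are 2-D (lat, lon) arrays and that dy, dx are scalar grid spacings you have computed; those names are from the question, the spacing handling is an assumption):

dqu_dy, dqu_dx = np.gradient(qu, dy, dx)  # axis 0 is lat (y), axis 1 is lon (x)
dqv_dy, dqv_dx = np.gradient(qv, dy, dx)
divg = dqu_dx + dqv_dy                    # element-wise sum, not np.sum()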
Also, a note if you are recreating plots: np.gradient changed in numpy between 1.8.2 and 1.9. This matters when recreating matlab plots in python, since 1.8.2 used the same method as matlab. I am not sure how this relates to GrADS. Here is the wording for both.
1.8.2:
"The gradient is computed using central differences in the interior
and first differences at the boundaries. The returned gradient hence has
the same shape as the input array."
1.9:
"The gradient is computed using second order accurate central differences in the interior and either first differences or second order accurate one-sides (forward or backwards) differences at the boundaries. The returned gradient hence has the same shape as the input array."
The gradient function from 1.8.2 is below.
def gradient(f, *varargs):
    """
    Return the gradient of an N-dimensional array.

    The gradient is computed using central differences in the interior
    and first differences at the boundaries. The returned gradient hence has
    the same shape as the input array.

    Parameters
    ----------
    f : array_like
        An N-dimensional array containing samples of a scalar function.
    `*varargs` : scalars
        0, 1, or N scalars specifying the sample distances in each direction,
        that is: `dx`, `dy`, `dz`, ... The default distance is 1.

    Returns
    -------
    gradient : ndarray
        N arrays of the same shape as `f` giving the derivative of `f` with
        respect to each dimension.

    Examples
    --------
    >>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
    >>> np.gradient(x)
    array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])
    >>> np.gradient(x, 2)
    array([ 0.5 ,  0.75,  1.25,  1.75,  2.25,  2.5 ])

    >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
    [array([[ 2.,  2., -1.],
            [ 2.,  2., -1.]]),
     array([[ 1. ,  2.5,  4. ],
            [ 1. ,  1. ,  1. ]])]

    """
    f = np.asanyarray(f)
    N = len(f.shape)  # number of dimensions
    n = len(varargs)
    if n == 0:
        dx = [1.0]*N
    elif n == 1:
        dx = [varargs[0]]*N
    elif n == N:
        dx = list(varargs)
    else:
        raise SyntaxError(
            "invalid number of arguments")

    # use central differences on interior and first differences on endpoints
    outvals = []

    # create slice objects --- initially all are [:, :, ..., :]
    slice1 = [slice(None)]*N
    slice2 = [slice(None)]*N
    slice3 = [slice(None)]*N

    otype = f.dtype.char
    if otype not in ['f', 'd', 'F', 'D', 'm', 'M']:
        otype = 'd'

    # Difference of datetime64 elements results in timedelta64
    if otype == 'M':
        # Need to use the full dtype name because it contains unit information
        otype = f.dtype.name.replace('datetime', 'timedelta')
    elif otype == 'm':
        # Needs to keep the specific units, can't be a general unit
        otype = f.dtype

    for axis in range(N):
        # select out appropriate parts for this dimension
        out = np.empty_like(f, dtype=otype)

        slice1[axis] = slice(1, -1)
        slice2[axis] = slice(2, None)
        slice3[axis] = slice(None, -2)
        # 1D equivalent -- out[1:-1] = (f[2:] - f[:-2])/2.0
        out[slice1] = (f[slice2] - f[slice3])/2.0

        slice1[axis] = 0
        slice2[axis] = 1
        slice3[axis] = 0
        # 1D equivalent -- out[0] = (f[1] - f[0])
        out[slice1] = (f[slice2] - f[slice3])

        slice1[axis] = -1
        slice2[axis] = -1
        slice3[axis] = -2
        # 1D equivalent -- out[-1] = (f[-1] - f[-2])
        out[slice1] = (f[slice2] - f[slice3])

        # divide by step size
        outvals.append(out / dx[axis])

        # reset the slice object in this dimension to ":"
        slice1[axis] = slice(None)
        slice2[axis] = slice(None)
        slice3[axis] = slice(None)

    if N == 1:
        return outvals[0]
    else:
        return outvals
If your grid is Gaussian and the wind variables in the file are named "u" and "v", you can also calculate the divergence directly using cdo:
cdo uv2dv in.nc out.nc
See https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-6850002.13.2 for more details.
