How do I stretch a list of floats in Python?

I'm working in Python and have a list of hourly values for a day. For simplicity let's say there are only 10 hours in a day.
[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
I want to stretch this around the centre-point to 150% to end up with:
[0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
Note that this is just an example; I will also need to stretch by amounts that leave fractional values in a given hour. For example, stretching to 125% would give:
[0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
My first thought for handling the fractional amounts is to multiply the list up by a factor of 10 using np.repeat, apply some method for stretching out the values around the midpoint, then finally split the list into chunks of 10 and take the mean for each hour.
My main issue is the "stretching" part but if the answer also solves the second part so much the better.

I guess you need something like this:
def stretch(xs, coef):
    # total working hours in the first half of the list,
    # then scale it to get the new half-width
    old_dist = sum(xs[:len(xs) // 2])
    new_dist = old_dist * coef

    # value at distance x from the midpoint, clipped to [0, 1]
    def f(x):
        if new_dist - x < 0:
            return 0.0
        return min(1.0, new_dist - x)

    t = [f(x) for x in range(len(xs) // 2)]
    res = list(reversed(t))
    res.extend(t)
    return res
But be careful with an odd number of hours.
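As a quick check, the 125% case from the question (assuming the function above):
>>> stretch([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0], 1.25)
[0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]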

If I look at the expected output, the algorithm goes something like this:
- start with a list of numbers; values > 0.0 indicate working hours
- sum those hours
- compute how many extra hours are requested
- divide those extra hours over both ends of the sequence by prepending or appending half of this at each 'end'
So:
hours = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
expansion = 130
extra_hrs = float(sum(hours)) * float(expansion - 100) / 100
# find indices of the first and last non-zero hours
# (because of floating point we can't use "==" for comparison)
hr_idx = [idx for (idx, value) in enumerate(hours) if value > 0.001]
# replace the entries before the first and after the last
# with half the extra hours
print("Before expansion:", hours)
hours[hr_idx[0] - 1] = hours[hr_idx[-1] + 1] = extra_hrs / 2.0
print("After expansion:", hours)
Gives as output:
Before expansion: [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
After expansion: [0.0, 0.0, 0.6, 1.0, 1.0, 1.0, 1.0, 0.6, 0.0, 0.0]

This is what I've ended up doing. It's a little ugly as it needs to handle stretch coefficients less than 100%.
import numpy as np

def stretch(xs, coef, centre):
    """Scale a list by a coefficient around a point in the list.

    Parameters
    ----------
    xs : list
        Input values.
    coef : float
        Coefficient to scale by.
    centre : int
        Position in the list to use as a centre point.

    Returns
    -------
    list
    """
    grain = 100
    # repeat counts must be integers
    stretched_array = np.repeat(xs, int(grain * coef))
    if coef < 1:
        # pad start and end
        total_pad_len = grain * len(xs) - len(stretched_array)
        centre_pos = float(centre) / len(xs)
        start_pad_len = centre_pos * total_pad_len
        end_pad_len = (1 - centre_pos) * total_pad_len
        start_pad = [stretched_array[0]] * int(start_pad_len)
        end_pad = [stretched_array[-1]] * int(end_pad_len)
        stretched_array = np.array(start_pad + list(stretched_array) + end_pad)
    else:
        # slice a window of the original length out of the stretched array
        pivot_point = (len(xs) - centre) * grain * coef
        first = int(pivot_point - (len(xs) * grain) / 2)
        last = first + len(xs) * grain
        stretched_array = stretched_array[first:last]
    return [round(chunk.mean(), 2) for chunk in chunks(stretched_array, grain)]

def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable.

    Source: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python#answer-312464
    """
    for i in range(0, len(iterable), n):
        yield iterable[i:i + n]
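A quick check against the 125% example from the question, with the centre at position 5:
>>> stretch([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0], 1.25, 5)
[0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0]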

Related

Python polynomial regression values ceiling <0,1>

I am trying to find the best curve to describe my data. My data are stored in numpy arrays t and dur, both with values only from 0-1. However, the best fit I get according to the R**2 score is this yellow line with a score of 0.979388, which doesn't fit my data because it is way off from the expected values, going well above 1 on the Y axis:
t = [1.0, 1.0, 1.0, 1.0, 1.0, 0.33695652173913043, 0.010869565217391304, 1.0, 0.018518518518518517, 1.0, 1.0, 1.0, 1.0, 1.0, 0.005076142131979695, 1.0, 1.0, 1.0, 1.0, 0.03225806451612903, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.25, 1.0]
dur = [1.0, 1.0, 1.0, 1.0, 0.9999999999999998, 0.2688679245283018, 0.2688679245283018, 1.0, 0.46692607003891046, 1.0, 1.0, 1.0, 1.0, 1.0, 0.4444444444444444, 1.0, 1.0, 1.0, 1.0, 0.34210526315789475, 1.0, 1.0, 1.0, 1.0, 1.0, 0.4714285714285715, 0.4714285714285715, 1.0]
#polynomial curve fitting
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

mymodel1 = np.poly1d(np.polyfit(t, dur, 1))
mymodel2 = np.poly1d(np.polyfit(t, dur, 2))
mymodel3 = np.poly1d(np.polyfit(t, dur, 3))
mymodel4 = np.poly1d(np.polyfit(t, dur, 4))
#polynomial score
p1 = r2_score(dur, mymodel1(t))
p2 = r2_score(dur, mymodel2(t))
p3 = r2_score(dur, mymodel3(t))
p4 = r2_score(dur, mymodel4(t))
#append results of R**2 to a list of tuples from which I extract the best score
fit = [p1, p2, p3, p4]
fitname = ['p1', 'p2', 'p3', 'p4']
fitTuple = list(zip(fit, fitname))
resultValue = []
resultName = []
#append best result value
resultValue.append(max(fitTuple, key=lambda item: item[0])[0])
#append best result name
resultName.append(max(fitTuple, key=lambda item: item[0])[1])
#plot values from regression models
myline = np.linspace(0, 1, 100)
plt.plot(myline, mymodel1(myline), color="black")
plt.plot(myline, mymodel2(myline), color="black")
plt.plot(myline, mymodel3(myline), color="black")
plt.plot(myline, mymodel4(myline), color="yellow")
This is what is called "overfitting". If you fit overly complex models to your data, the models will usually have a very high R^2 and match the training data points quite well, but they are clearly not the appropriate choice, as can be seen when trying to fit new data: they don't interpolate well. Fitting polynomials of high degree is a standard example of overfitting.
If you want to stick with polynomial models, you should think about the least complex model (in this case, the lowest-degree polynomial) that you would still consider appropriate for your data. In your case, quadratic seems OK.
One usually employs more sophisticated methods for regression, like those provided in e.g. scikit-learn, which can help you find the right model (e.g. via cross-validation) and also provide regularization techniques. For model selection, see here.
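To make the cross-validation suggestion concrete, here is a minimal sketch of degree selection with scikit-learn; the toy data below merely stands in for the t/dur arrays from the question, and the exact pipeline is illustrative, not from the original answer:
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# toy data in [0, 1], standing in for t and dur
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 50)
dur = np.clip(0.3 + 0.6 * t + rng.normal(0, 0.05, 50), 0, 1)

X = t.reshape(-1, 1)
for degree in range(1, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # mean R^2 on held-out folds penalizes overfitting, unlike in-sample R^2
    score = cross_val_score(model, X, dur, cv=5, scoring="r2").mean()
    print(degree, round(score, 3))
The degree with the best held-out score, not the best in-sample R^2, is the one to prefer.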

Problem while doing vector manipulation in Python (vector component perpendicular to reference vector)

I have a set of coordinates that I would like to manipulate to get the desired results. I need to find the horizontal projection of some vector with respect to a reference vector, i.e., if v1(x1,y1,z1) is the reference unit vector and v2(x2,y2,z2) is some random vector, I want to find the projection of v2 perpendicular to the direction of v1.
The figurative representation of the required is given as,
https://drive.google.com/file/d/1i3RQm--nLc1dIdZGMpt7oc5KLahOJySk/view?usp=sharing
The vectorial representation is required as
https://drive.google.com/file/d/12QLIoIJ0wckLa8sIgJ8ynMaD33PWrgxQ/view?usp=sharing
The code that I have written is as follows:
catalog = ascii.read("catalog.txt", 'r', format='fixed_width_no_header',
                     fast_reader=False, delimiter="\s", names=x,
                     col_starts=y, col_ends=z)
x = []
y = []
z = []
for i in range(0, len(catalog['PSRJ'])):
    catalog['DECJD'][i] = catalog['DECJD'][i] + 90
    x.append(sin(catalog['DECJD'][i]) * cos(catalog['RAJD'][i]))
    y.append(sin(catalog['DECJD'][i]) * sin(catalog['RAJD'][i]))
    z.append(cos(catalog['DECJD'][i]))
coord = []
for i in range(0, len(catalog['PSRJ'])):
    coord.append([x[i], y[i], z[i]])

def norm(k):
    p = [0.0, 0.0, 0.0]
    p[0] = k[0] / (k[0]**2 + k[1]**2 + k[2]**2)**0.5
    p[1] = k[1] / (k[0]**2 + k[1]**2 + k[2]**2)**0.5
    p[2] = k[2] / (k[0]**2 + k[1]**2 + k[2]**2)**0.5
    return [p[0], p[1], p[2]]

direc = norm(coord[])  # insert key for required coordinate
pix = []
print(direc)
for i in range(0, len(catalog['PSRJ'])):
    if a - 5 <= catalog['RAJD'][i] <= a + 5 and b - 5 <= catalog['DECJD'][i] <= b + 5:
        pix.append(coord[i])
        if pix[-1] == f:
            p = len(pix)
print('p', pix)
val = pix
count = count + 1
for i in range(0, count):
    for j in range(0, 3):
        if not (i == p - 1):
            k = np.dot(direc, pix[i])
            val[i][j] = direc[j] * k
            val[i][j] = pix[i][j] - val[i][j]
        else:
            val[i][j] = 0.0
print(val)
coord is a list containing position vectors, e.g. [[x1,y1,z1],[x2,y2,z2],...], and f is the reference coordinate, e.g. [a,b,c].
OUTPUTS:
[0.049780917594520344, 0.9791671435583665, -0.19685925230782697]
p= 1
[[0.049780917594520344, 0.9791671435583665, -0.19685925230782697]]
[[0.0, 0.0, 0.0]]
[-0.813400538291293, 0.4058994949023155, 0.41668353020665466]
p= 2
[[0.683288067396023, -0.16836586544306054, -0.7104719222515533], [-0.813400538291293, 0.4058994949023155, 0.41668353020665466]]
[[-0.0, -0.0, -0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
[0.15331177729205145, -0.40555298841701465, 0.9011227843804535]
p= 2
[[0.08556174481590322, 0.8925106169941267, -0.4428362974924498], [0.15331177729205145, -0.40555298841701465, 0.9011227843804535]]
[[-0.0, -0.0, -0.0], [0.0, 0.0, 0.0], [-0.0, -0.0, -0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
[0.4561283699625753, 0.08868628538946206, 0.8854838524214335]
p= 3
[[0.016015167632180395, 0.07982154161648915, -0.9966805084377242], [-0.39320614327231507, 0.918593873485819, -0.03967649792044076], [0.4561283699625753, 0.08868628538946206, 0.8854838524214335]]
[[-0.0, -0.0, -0.0], [-0.0, 0.0, -0.0], [0.0, 0.0, 0.0]]
The end result I am getting is null vectors. I don't know if it is my calculation causing the issue; any insight into the problem would be helpful.
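For reference, the quantity the question describes, the component of v2 perpendicular to v1 (the vector rejection), can be computed in a few lines of numpy; a minimal sketch with made-up vectors, independent of the catalogue-handling code above:
import numpy as np

v1 = np.array([0.0, 0.0, 1.0])    # reference vector
v2 = np.array([1.0, 2.0, 3.0])    # some random vector

u = v1 / np.linalg.norm(v1)       # unit vector along the reference direction
v2_perp = v2 - np.dot(v2, u) * u  # subtract the component parallel to u

print(v2_perp)                    # [1. 2. 0.], perpendicular to v1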

Difference between SimpleITK.Euler3DTransform and scipy.spatial.transform.Rotation.from_euler?

Using these two library functions:
SimpleITK.Euler3DTransform
scipy.spatial.transform.Rotation.from_euler
to create a simple rotation matrix from Euler Angles:
import numpy as np
import SimpleITK as sitk
from scipy.spatial.transform import Rotation
from math import pi
euler_angles = [pi / 10, pi / 18, pi / 36]
sitk_matrix = sitk.Euler3DTransform((0, 0, 0), *euler_angles).GetMatrix()
sitk_matrix = np.array(sitk_matrix).reshape((3,3))
print(np.array_str(sitk_matrix, precision=3, suppress_small=True))
order = 'XYZ' # Different results for any order in ['XYZ','XZY','YZX','YXZ','ZXY','ZYX','xyz','xzy','yzx','yxz','zxy','zyx']
scipy_matrix = Rotation.from_euler(order, euler_angles).as_matrix()
print(np.array_str(scipy_matrix, precision=3, suppress_small=True))
I get two different results:
[[ 0.976 -0.083 0.2 ]
[ 0.139 0.947 -0.288]
[-0.165 0.309 0.937]]
[[ 0.981 -0.086 0.174]
[ 0.136 0.943 -0.304]
[-0.138 0.322 0.937]]
Why? How can I compute the same matrix as SimpleITK using scipy?
The issue is that the sitk.Euler3DTransform class by default applies the rotation matrix multiplications in Z @ X @ Y order, while the Rotation.from_euler function uses Z @ Y @ X order.
Note that this is independent of the order you specified. The order you specify refers to the order of the angles, not the order of the matrix multiplications.
If you are using sitk.Euler3DTransform directly, as you showed in your example, you can actually change the default behaviour so that the matrix multiplication is performed in Z @ Y @ X order.
I have never worked with sitk, but in theory, and based on the documentation, something like this should work:
euler_transform = sitk.Euler3DTransform((0, 0, 0), *euler_angles)
euler_transform.SetComputeZYX(True)
sitk_matrix = euler_transform.GetMatrix()
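Going the other way, if the default composition really is Z @ X @ Y, then a scipy call along the following lines should reproduce SimpleITK's default matrix; the reordered angle list is the assumption to verify here (uppercase axes select intrinsic rotations in scipy):
from scipy.spatial.transform import Rotation

ax, ay, az = euler_angles  # SimpleITK takes (angleX, angleY, angleZ)
# intrinsic 'ZXY' composes Rz @ Rx @ Ry, i.e. the Y rotation is applied first
scipy_matrix = Rotation.from_euler('ZXY', [az, ax, ay]).as_matrix()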
Alternatively, I wrote a function which is similar to Rotation.from_euler but has the option to specify the rotation order as well:
from typing import List

import numpy as np
from numpy.typing import NDArray

def build_rotation_3d(radians: NDArray,
                      radians_order: str = 'XYZ',
                      rotation_order: str = 'ZYX',
                      dims: List[str] = ['X', 'Y', 'Z']) -> NDArray:
    # pick the three angles out of `radians` according to radians_order
    x_rad, y_rad, z_rad = radians[np.searchsorted(dims, list(radians_order))]
    x_cos, y_cos, z_cos = np.cos([x_rad, y_rad, z_rad], dtype=np.float64)
    x_sin, y_sin, z_sin = np.sin([x_rad, y_rad, z_rad], dtype=np.float64)
    # elementary rotations as 4x4 homogeneous matrices
    x_rot = np.asarray([
        [1.0, 0.0, 0.0, 0.0],
        [0.0, x_cos, -x_sin, 0.0],
        [0.0, x_sin, x_cos, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    y_rot = np.asarray([
        [y_cos, 0.0, y_sin, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [-y_sin, 0.0, y_cos, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    z_rot = np.asarray([
        [z_cos, -z_sin, 0.0, 0.0],
        [z_sin, z_cos, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])
    # compose in the requested multiplication order
    rotations = np.asarray([x_rot, y_rot, z_rot])[np.searchsorted(dims, list(rotation_order))]
    return rotations[0] @ rotations[1] @ rotations[2]
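For example, to mimic the claimed sitk default of Z @ X @ Y with this helper, the call would presumably look like this (hypothetical usage, not from the original answer):
angles = np.asarray(euler_angles)  # (angleX, angleY, angleZ) as above
R = build_rotation_3d(angles, radians_order='XYZ', rotation_order='ZXY')
print(np.array_str(R[:3, :3], precision=3, suppress_small=True))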
What is your 'order' string? When I ran your code with order='xyz', I got the same results for SimpleITK and scipy's Rotation.

Transformation matrix in Python

I have a scale in a 3D space from -10 to 10 with a step of 2.5 ([-10,-7.5,-5,-2.5,0,2.5,5,7.5,10]).
I have a 3D point on this scale, and I want to map it into another 3D space from 0 to 8 with a step of 1 ([0,1,2,3,4,5,6,7,8]).
How can I do it?
Thanks for your help
In this Stack Exchange post you can find the mathematical formula for normalization within a range. A Python implementation could be the following:
def normalize(values, curr_bounds, new_bounds):
    curr_min, curr_max = curr_bounds
    new_min, new_max = new_bounds
    return [new_min + (x - curr_min) * (new_max - new_min) / (curr_max - curr_min)
            for x in values]
Your request is not very clear; if your data needs to be mapped onto a discrete range (rounded to a step), then you can do this:
def normalize_step(values, curr_bounds, new_bounds, new_step):
    # normalize first, then round each result to the nearest step
    return [round_step(new_step, x) for x in normalize(values, curr_bounds, new_bounds)]

def round_step(step, n):
    return (n // step + 1) * step if n % step >= step / 2 else n // step * step
For example, given the following data:
current_bounds = (-10, 10)
new_bounds = (0, 8)
step = 1
values = [-10, -2.5, 0, 7.5, 2.5]
normalize(values, current_bounds, new_bounds)
# [0.0, 3.0, 4.0, 7.0, 5.0]
normalize_step(values, current_bounds, new_bounds, step)
# [0.0, 3.0, 4.0, 7.0, 5.0]
Note: in this case the results are the same because step=1; with step=1.5 the results change:
normalize_step(values, current_bounds, new_bounds, 1.5)
# [0.0, 3.0, 4.5, 7.5, 4.5]

gaussian fit with scipy.optimize.curve_fit in python with wrong results

I am having some trouble fitting a gaussian to data. I think the problem is that most of the elements are close to zero, and there are not many points to actually be fitted. But in any case, I think they make a good dataset to fit, and I don't get what is confusing python. Here is the program; I have also added a line to plot the data so you can see what I am trying to fit
#Gaussian function
import numpy as np
from matplotlib.pyplot import plot
from scipy.optimize import curve_fit

def gauss_function(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

# program
x = np.arange(0, 21., 0.2)
# sorry about these data!
y = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.2888599818864958e-275, 1.0099964933708256e-225, 4.9869496866403137e-184, 4.4182929795060327e-149, 7.2953754336628778e-120, 1.6214815763354974e-95, 2.5845990267696154e-75, 1.2195550372375896e-58, 5.6756631456872126e-45, 7.2520963306599953e-34, 6.0926453402093181e-25, 7.1075523112494745e-18, 2.1895584709541657e-12, 3.1040093615952226e-08, 3.2818874974043519e-05, 0.0039462011337049593, 0.077653596114448178, 0.33645159419151383, 0.40139213808285212, 0.15616093582013874, 0.0228751827752081, 0.0014423440677009125, 4.4400754532288282e-05, 7.4939123408714068e-07, 7.698340466102054e-09, 5.2805658851032628e-11, 2.6233358880470556e-13, 1.0131613609937094e-15, 3.234727006243684e-18, 9.0031014316344088e-21, 2.2867065482392331e-23, 5.5126221075296919e-26, 1.3045106781768978e-28, 3.1185031969890313e-31, 7.7170036365830092e-34, 2.0179753504732056e-36, 5.6739187799428708e-39, 1.7403776988666581e-41, 5.8939645426573027e-44, 2.2255784749636281e-46, 9.4448944519959299e-49, 4.5331936383388069e-51, 2.4727435506007072e-53, 1.5385048936078214e-55, 1.094651071873419e-57, 8.9211199390945735e-60, 8.3347561634783632e-62, 8.928140776588251e-64, 1.0960564546383266e-65, 1.5406342485015278e-67, 2.4760905399114866e-69, 4.5423744881977258e-71, 9.4921949220625905e-73, 2.2543765002199549e-74, 6.0698995872666723e-76, 1.8478996852922248e-77, 6.3431644488676084e-79, 0.0, 0.0, 0.0, 0.0]
plot(x, y)        # plot the curve; the gaussian is quite clear
plot(x, y, 'ok')  # overplot the dots
# try to fit the result
popt, pcov = curve_fit(gauss_function, x, y)
The problem is that the result for popt is
print(popt)
array([ 7.39717176e-10, 1.00000000e+00, 1.00000000e+00])
Any hint on why this could be happening?
Thanks!
Your problem is with the initial parameters of curve_fit. By default, if no other information is given, it will start with an array of ones, but this obviously leads to a radically wrong result. This can be corrected simply by giving a reasonable starting vector.
To do this, I start from the estimated mean and standard deviation of your dataset:
#estimate the mean and standard deviation, weighting by the data
y = np.asarray(y)
mean = np.sum(x * y) / np.sum(y)
sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))
#do the fit!
popt, pcov = curve_fit(gauss_function, x, y, p0=[1, mean, sigma])
#plot the fit results
plot(x, gauss_function(x, *popt))
#compare with the given data
plot(x, y, 'ok')
This will approximate your results nicely. Remember that curve fitting in general cannot work unless you start from a good point (inside the convergence basin, to be clear), and this doesn't depend on the implementation. Never do a blind fit when you can use your knowledge!
