The pandas interpolation documentation already includes helpful notes, for most of the methods, on whether they use the actual numerical indices or a time index for the interpolation:
method str, default ‘linear’
Interpolation technique to use. One of:
‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
‘time’: Works on daily and higher resolution data to interpolate given length of interval.
‘index’, ‘values’: use the actual numerical values of the index.
‘pad’: Fill in NaNs using existing values.
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives which replaces ‘piecewise_polynomial’ interpolation method in scipy 0.18.
But unfortunately I couldn't find this information for the last two:
cubicspline
and
from_derivatives
The SciPy documentation only says:
scipy.interpolate.CubicSpline
Interpolate data with a piecewise cubic polynomial which is twice continuously differentiable. The result is represented as a PPoly instance with breakpoints matching the given data.
scipy.interpolate.BPoly.from_derivatives
Construct a piecewise polynomial in the Bernstein basis, compatible with the specified values and derivatives at breakpoints.
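One way to check empirically whether these two methods use the index is to interpolate the same values over a uniform and a deliberately non-uniform index (a minimal sketch; the index values here are arbitrary):

import numpy as np
import pandas as pd

# If a method uses the index, the filled value should change when the
# index spacing changes.
values = [0.0, np.nan, 2.0, 3.0, 4.0]
s_uniform = pd.Series(values, index=[0, 1, 2, 3, 4])
s_skewed = pd.Series(values, index=[0, 10, 11, 12, 13])

for method in ['cubicspline', 'from_derivatives']:
    filled_uniform = s_uniform.interpolate(method=method).iloc[1]
    filled_skewed = s_skewed.interpolate(method=method).iloc[1]
    print(method, filled_uniform, filled_skewed)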
Calling
rayleigh_args = stats.rayleigh.fit(num_list)
returns a tuple of two values, e.g. (-320.34, 360.77), which I can use to get the CDF or PDF of the distribution for a given value. However, I can't find what each of those values represents.
In addition, as far as I'm aware, the Rayleigh distribution requires only a single scale parameter σ to compute the CDF and PDF.
My question is: what do the scipy.stats return values mean, and how can I get the actual σ of the distribution so I can use it in another application?
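For reference, a minimal sketch of scipy's generic parameterization: fit returns the distribution's shape parameters followed by loc and scale, and rayleigh has no shape parameter, so the tuple is (loc, scale). The scale is the σ of the standard (loc=0) parameterization:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
num_list = stats.rayleigh.rvs(loc=0.0, scale=2.0, size=10_000,
                              random_state=rng)

# fit returns (loc, scale) for rayleigh: a location shift plus the scale
loc, scale = stats.rayleigh.fit(num_list)

# If the data is known to start at 0, pin loc so that scale alone is sigma:
_, sigma = stats.rayleigh.fit(num_list, floc=0)
print(loc, scale, sigma)  # sigma should come out close to 2.0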
I am trying to fit 3 peaks using lmfit with a skewed-Voigt profile (this is not that important for my question). I want to set a constraint on the peak centers of the form:
peak1 = SkewedVoigtModel(prefix='sv1_')
pars.update(peak1.make_params())
pars['sv1_center'].set(x)
peak2 = SkewedVoigtModel(prefix='sv2_')
pars.update(peak2.make_params())
pars['sv2_center'].set(1000+x)
peak3 = SkewedVoigtModel(prefix='sv3_')
pars.update(peak3.make_params())
pars['sv3_center'].set(2000+x)
Basically I want them to be 1000 apart from each other, but I need to fit for the actual shift, x. I know that I can force two parameters to be equal using pars['sv2_center'].set(expr='sv1_center'), but what I would need is pars['sv2_center'].set(expr='sv1_center'+1000), which doesn't work as written since it tries to add an integer to a string. How can I achieve what I need? Thank you!
Just do:
pars['sv2_center'].set(expr='sv1_center+1000')
pars['sv3_center'].set(expr='sv1_center+2000')
The constraint expression is a Python expression that will be evaluated every time the constrained parameter needs to get its value.
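For completeness, a minimal sketch of how the constrained composite model could be assembled (the starting value 500 for the shift is arbitrary, and y and xdata stand in for your data):

import numpy as np
from lmfit.models import SkewedVoigtModel

peak1 = SkewedVoigtModel(prefix='sv1_')
peak2 = SkewedVoigtModel(prefix='sv2_')
peak3 = SkewedVoigtModel(prefix='sv3_')
model = peak1 + peak2 + peak3

pars = model.make_params()
pars['sv1_center'].set(value=500)               # the free shift x
pars['sv2_center'].set(expr='sv1_center+1000')  # re-evaluated on every step
pars['sv3_center'].set(expr='sv1_center+2000')

# result = model.fit(y, pars, x=xdata)
# print(result.fit_report())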
scipy.stats.spearmanr([1,2,3,4,1],[1,2,2,1,np.nan],nan_policy='omit')
It gives a Spearman correlation of 0.349999.
My understanding is that nan_policy='omit' will discard all the pairs that contain NaN. If that's the case, the result should be the same as scipy.stats.spearmanr([1,2,3,4],[1,2,2,1]).
However, that gives a correlation of 0.235702.
Why are they different? Is my understanding of nan_policy='omit' correct?
I tried to run your code; it gives me zero correlation (R=0.0). I use this function regularly, and you are understanding nan_policy='omit' correctly.
If you don't need the p-value of the correlation, I would suggest using .corr(method='spearman') from the pandas library; by default it excludes NA/null values (see the official documentation).
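A minimal sketch of that pandas approach (the two series mirror the lists from the question):

import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 1])
s2 = pd.Series([1, 2, 2, 1, np.nan])

# Series.corr drops NA/null pairs by default before computing the correlation
print(s1.corr(s2, method='spearman'))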
nan_policy='omit' should completely omit those pairs for which one or both values are nan. When I run the two commands you pasted above, I get the same correlation value, not different ones.
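A quick way to confirm this on your own installation (older scipy releases handled nan_policy='omit' differently, which may explain the 0.35 in the question):

import numpy as np
from scipy import stats

r_omit, p_omit = stats.spearmanr([1, 2, 3, 4, 1],
                                 [1, 2, 2, 1, np.nan], nan_policy='omit')
r_drop, p_drop = stats.spearmanr([1, 2, 3, 4], [1, 2, 2, 1])

# On a current scipy these agree (both 0.0 for this data)
print(r_omit, r_drop)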
I have a feature set
[x1, x2, ..., xm]
Now I want to create a polynomial feature set from it. If the degree is two, that means the feature set
[x1, ..., xm, x1^2, x2^2, ..., xm^2, x1*x2, x1*x3, ..., x1*xm, ..., x(m-1)*xm]
so it contains all terms up to order 2. The same applies if the degree is three: then the cubic terms are included as well.
How can I do this?
Edit 1: I am working on a machine learning project where I have close to 7 features, and a non-linear regression on these features is giving OK results. To get more features, I thought I could map them to a higher dimension, and one way to do that is to consider polynomial orders of the feature vector. Generating x1*x1 is easy enough :) but getting the rest of the combinations is a bit tricky. Can combinations give me the x1*x2*x3 terms if the order is 3?
Use
itertools.combinations(list, r)
where list is the feature set and r is the order of the desired polynomial features. Then multiply the elements of each tuple it yields; that gives you {x1*x2, x1*x3, ...}. You'll still need to construct the other terms (such as the squares), then union all the parts.
[Edit]
Better: itertools.combinations_with_replacement(list, r) will nicely give sorted length-r tuples with repeated elements allowed.
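A minimal sketch of that approach on a toy feature vector (scikit-learn's sklearn.preprocessing.PolynomialFeatures does the same job if you are open to using a library):

import itertools
import numpy as np

def monomials(features, degree):
    """All products of exactly `degree` features, repeats allowed."""
    return [np.prod(combo) for combo in
            itertools.combinations_with_replacement(features, degree)]

x = [2.0, 3.0, 5.0]  # toy feature vector [x1, x2, x3]
print(monomials(x, 2))  # x1*x1, x1*x2, x1*x3, x2*x2, x2*x3, x3*x3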
You could use itertools.product to create all the possible sets of n values that are chosen from the original set; but keep in mind that this will generate (x2, x1) as well as (x1, x2).
Similarly, itertools.combinations will produce sets without repetition or re-ordering, but that means you won't get (x1, x1) for example.
What exactly are you trying to do? What do you need these result values for? Are you sure you do want those x1^2 type terms (what does it mean to have the same feature more than once)? What exactly is a "feature" in this context anyway?
Using Karl's answer as inspiration, try using product and then taking advantage of set objects. Something like:
set([frozenset(comb) for comb in itertools.product(range(5), range(5))])
(The inner collections must be frozensets, since ordinary sets aren't hashable.) This will get rid of recurring pairs. Then you can turn the set back into a list and sort it, or iterate over it as you please.
EDIT: this will actually kill the x_m^2 terms, since frozenset((m, m)) collapses to a single element. Build sorted tuples instead of sets; they are hashable and non-repeating:
set([tuple(sorted(comb)) for comb in itertools.product(range(5), range(5))])
I have a few functions that return an array of data corresponding to parameter ranges.
Example: for a 2d array a, the value a_{ij} corresponds to the parameter set (param1_i, param2_j). How do I return the result and keep the parameter-value correspondence? The alternatives seem bad:
Calling the function once for every (param1_i, param2_j) pair and returning one value at a time would take ages (it is far more efficient to compute everything in one go).
Breaking the function into (many) smaller functions would make usage difficult (the point is to get the values for a whole range of parameters; a single value is useless on its own).
The best I can come up with is to make a new numpy dtype, for example for a 2d array:
tagged2d = np.dtype([('vals', float), ('params', float, (2,))])
so that a['vals'][i,j] contains the values and a['params'][i,j] the corresponding parameters.
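A minimal sketch of filling such a tagged array (np.add.outer is just a stand-in for the real computation):

import numpy as np

tagged2d = np.dtype([('vals', float), ('params', float, (2,))])
param1 = np.linspace(0.0, 1.0, 3)
param2 = np.linspace(10.0, 20.0, 4)

a = np.empty((param1.size, param2.size), dtype=tagged2d)
a['vals'] = np.add.outer(param1, param2)  # stand-in for the real function
a['params'][..., 0] = param1[:, None]     # param1 varies down the rows
a['params'][..., 1] = param2[None, :]     # param2 varies across the columns
print(a['params'][1, 2], a['vals'][1, 2])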
Any thoughts? Maybe I should just return 2 arrays, one with values, other with parameter tuples?
I recommend your last suggestion... just return two arrays {'values': a, 'params':params}.
There are a few reasons for this.
Primarily, your other solution (using dtype and recarrays) tangles too many things together. For example, what about quantities derived from a that correspond to the same parameters... do you make a new recarray and a new copy of the parameters for that? Even something as simple as 2*a becoming the salient quantity will require that you make difficult decisions.
Recarrays have limitations and this is so easily solved in other ways that it's not worth accepting those limitations.
If you want an easier interrelation between the returned terms, you could put the items in a class. For example, you could have a method that takes a parameter pair and returns the corresponding result. This way you wouldn't be limited by the recarray, you could still construct whatever convenience relationship between the two that you like, and you could easily make backward-compatible changes to the behavior, etc.
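One way such a class could look (the names are illustrative, not from any library):

import numpy as np

class TaggedResult:
    """Pairs a grid of values with the parameter axes that produced it."""

    def __init__(self, values, param1, param2):
        self.values = np.asarray(values)
        self.param1 = np.asarray(param1)
        self.param2 = np.asarray(param2)

    def at(self, i, j):
        """Return (param1_i, param2_j, value_ij) for one grid point."""
        return self.param1[i], self.param2[j], self.values[i, j]

# usage sketch
p1, p2 = np.linspace(0, 1, 3), np.linspace(10, 20, 4)
res = TaggedResult(np.add.outer(p1, p2), p1, p2)
print(res.at(1, 2))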