Histogram function - Python

Looking for someone who can explain this to me:
phase = mod(phase, Nper*2*pi)
cl_phase = arange(0, Nper*2*pi + step, step)
c,p = histogram(phase, cl_phase)
while 0 in c:
    step = step*2
    cl_phase = arange(0, Nper*2*pi + step, step)
    c,p = histogram(phase, cl_phase)
Where phase is the phase of a wave, Nper is the number of periods I'm analysing.
What I want to know is if someone can give me the name of, or a link to, an explanation of the histogram function..! I'm not even sure what package it comes from. Maybe numpy? Or maybe it's even a function that comes with Python..! Super lost here..!
Any help here would be greatly appreciated!!

The histogram() function is from the numpy library; it is not a built-in Python function.
You can use it by:
import numpy as np
np.histogram(phase,cl_phase)
In your code, it looks like you are using it as:
from numpy import histogram
histogram(phase,cl_phase)
c,p = histogram(phase, cl_phase) will give you two values as output: c will be the values (counts) of the histogram, and p will be the bin edges. Take a look at the numpy documentation for more info.
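For a quick feel for the two outputs, here is a minimal sketch (the phase array here is made up for illustration; any array of bin edges works the same way):
import numpy as np

phase = np.random.uniform(0, 4*np.pi, 1000)   # stand-in for your wave phase
edges = np.arange(0, 4*np.pi + 0.5, 0.5)      # bin edges, like your cl_phase
c, p = np.histogram(phase, edges)
print(c)   # counts per bin; len(c) == len(p) - 1
print(p)   # the bin edges themselves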

Related

Can we modify the solution vector between integration steps with scipy.integrate.ode, using VODE?

I am trying to solve a stiff ODE problem where, at each integration step, I have to modify the solution vector before continuing the integration.
For that, I am using scipy.integrate.ode, with the VODE integrator in bdf mode.
Here is a simplified version of the code I am using. The real function is much more complex and involves the use of CANTERA.
from scipy.integrate import ode
import numpy as np
import matplotlib.pyplot as plt

def yprime(t, y):
    return y

vode = ode(yprime)
vode.set_integrator('vode', method='bdf', with_jacobian=True)
y0 = np.array([1.0])
vode.set_initial_value(y0, 0.0)
y_list = np.array([])
t_list = np.array([])
while vode.t < 5.0 and vode.successful():  # note: successful is a method and must be called
    vode.integrate(vode.t + 1e-3, step=True)
    y_list = np.append(y_list, vode.y)
    t_list = np.append(t_list, vode.t)
plt.plot(t_list, y_list)
Output: (plot omitted; the expected exponential curve)
So far so good.
Now, the problem is that within each step, I would like to modify y after it has been integrated by VODE. Naturally, I want VODE to keep integrating with the modified solution.
This is what I have tried so far:
while vode.t < 5.0 and vode.successful():
    vode.integrate(vode.t + 1e-3, step=True)
    vode.y[0] += 1   # Changes the solution only until vode.integrate is called again
    vode._y[0] += 1  # Same here.
I have also tried looking at vode._integrator, but it seems that everything is kept inside the Fortran instance of the solver.
For quick reference, see the source code of scipy.integrate.ode and the .pyf interface scipy uses for VODE.
Has anyone tried something similar? I could also change the solver and/or the wrapper I am using, but I would like to keep using Python for this.
Thank you very much!
For those hitting the same problem: the issue lies in scipy's Fortran wrapper.
My solution was to switch from ode to solve_ivp. The difference is that solve_ivp is implemented entirely in Python, so you can hack your way through the implementation. Note that it runs slowly compared to the Fortran VODE solver the other interface calls, even though the implementation is well written and uses numpy wherever possible (basically, C-level performance whenever it can).
Here are the few steps you will have to follow.
First, to reproduce the already working code:
from scipy.integrate import _ivp  # Not supposed to be used directly. Be careful.
import numpy as np
import matplotlib.pyplot as plt

def yprime(t, y):
    return y

y0 = np.array([1.0])
t0 = 0.0
t1 = 5.0

# WITHOUT IN-BETWEEN MODIFICATION
bdf = _ivp.BDF(yprime, t0, y0, t1)
y_list = np.array([])
t_list = np.array([])
while bdf.t < t1:
    bdf.step()
    y_list = np.append(y_list, bdf.y)
    t_list = np.append(t_list, bdf.t)
plt.plot(t_list, y_list)
Output: (plot omitted; same exponential curve as before)
Now, to implement a way of modifying the values of y between integration steps:
# WITH IN-BETWEEN MODIFICATION
bdf = _ivp.BDF(yprime, t0, y0, t1)
y_list = np.array([])
t_list = np.array([])
while bdf.t < t1:
    bdf.step()
    bdf.D[0] -= 0.1  # The first row of the D matrix, D[0], holds the current value of y.
                     # By modifying it, you modify the solution at this instant.
    y_list = np.append(y_list, bdf.y)
    t_list = np.append(t_list, bdf.t)
plt.plot(t_list, y_list)
This gives the plot: (plot omitted; the solution is nudged down by 0.1 after every step)
This modification has no physical meaning for this particular problem, but it shows that the approach works.
Note: it is entirely possible that the solver becomes unstable. This has to do with the Jacobian not being updated at the right time, so it would have to be recomputed, which is performance-heavy most of the time. A proper fix would be to rewrite the BDF class so that the modification is applied before the Jacobian matrix is updated.
The BDF source code lives in scipy/integrate/_ivp/bdf.py.
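If you hit that instability and just want to experiment, one possible workaround is to invalidate the solver's cached factorization after modifying D. This is an untested sketch relying on private attributes of scipy's BDF class (LU, J and jac are internals and may change between scipy versions):
while bdf.t < t1:
    bdf.step()
    bdf.D[0] -= 0.1                 # modify the solution as before
    bdf.LU = None                   # assumption: drops the cached LU factorization,
                                    # forcing a refactorization on the next step
    bdf.J = bdf.jac(bdf.t, bdf.y)   # assumption: recomputes the Jacobian at the modified state
    y_list = np.append(y_list, bdf.y)
    t_list = np.append(t_list, bdf.t)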

Statsmodels: vector_ar and IRAnalysis

I'm trying to estimate the impulse response functions of a -1 standard-deviation shock to a 3-dimensional VAR using statsmodels.tsa, but I'm currently having issues setting the shock magnitude.
This gives me the IRFs for a 1 s.d. shock, the default:
import numpy as np
import statsmodels.tsa as sm
model = sm.vector_ar.var_model.VAR(endog = data)
fitted = model.fit()
shock= -1*fitted.sigma_u
irf = sm.vector_ar.irf.IRAnalysis(model = fitted)
The IRAnalysis function takes an argument P, an upper-triangular matrix that sets the shocks (I found this by looking at the source code). However, passing P as shown below doesn't seem to do anything:
irf = statsmodels.tsa.vector_ar.irf.IRAnalysis(model = fitted, P = -np.linalg.cholesky(model.fitted_U))
I would really appreciate some help.
Thanks in advance.
I have had the same question and finally found something that works on my end.
Instead of using IRAnalysis explicitly, I found that transforming the VAR model into its MA representation was the best way to adjust the size of the shock.
import numpy as np
from statsmodels.tsa.vector_ar.irf import IRAnalysis

J = fitted.ma_rep(T)      # T: the number of periods you want
J = shock * np.array(J)
This will give you the IRFs for T periods.
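Pieced together, a minimal sketch of this approach could look as follows (the horizon T and the shock vector are choices you make yourself; here I am assuming a -1 standard-deviation shock in each variable, taken from the diagonal of sigma_u):
import numpy as np
from statsmodels.tsa.api import VAR

fitted = VAR(endog=data).fit()             # data: your 3-dimensional series
T = 20                                     # horizon, chosen for illustration
shock = -np.sqrt(np.diag(fitted.sigma_u))  # -1 s.d. shock in each variable (assumption)
J = np.array(fitted.ma_rep(T))             # MA coefficients, shape (T+1, neqs, neqs)
irfs = J * shock                           # scale the responses to each shocked variable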
I also wanted the standard error bands on my plots, so I did something similar to that particular function as well.
G, H = fitted.irf_errband_mc(orth=False, repl=1000, steps=T, signif=0.05, seed=None, burn=100, cum=False)
Hope this helps

Discretize or bin LAB colorspace in 2 dimensions

I have a Lab colorspace (image omitted),
and I want to "bin" the colorspace into a grid of 10x10 squares.
So the first bin might be (-110,-110) to (-100,-100), then the next one might be (-100,-110) to (-90,-100), and so on; these would be bins 1 and 2.
I have seen np.digitize(), but it appears that you have to pass it 1-dimensional bins.
A rudimentary approach that I have tried is this:
import numpy as np
import pandas as pd
from skimage import color, io
from sklearn.utils import shuffle

# collect the (a, b) channels of every image into one big array
images = []
for fn in filenames:
    image = color.rgb2lab(io.imread(fn))
    ab = image[:,:,1:]
    width, height, d = ab.shape
    reshaped_ab = np.reshape(ab, (width*height, d))
    print reshaped_ab.shape
    images.append(reshaped_ab)
all_abs = np.vstack(images)
all_abs = shuffle(all_abs, random_state=0)
df = pd.DataFrame(all_abs[:3000], columns=["a","b"])
top_a, top_b = df.max()
bottom_a, bottom_b = df.min()
range_a = top_a - bottom_a
range_b = top_b - bottom_b
# box off the observed range into 10x10 squares
bins = []
corner_a = bottom_a
for i in xrange(int(range_a/10)):
    corner_b = bottom_b
    for j in xrange(int(range_b/10)):
        bins.append([corner_a, corner_b, corner_a+10, corner_b+10])
        corner_b = corner_b + 10
    corner_a = corner_a + 10
but the "bins" that results seem kinda sketchy. For one thing there are many empty bins as the color space does have values in a square arrangement and that code pretty much just boxes off from the max and min values. Additionally, the rounding might cause issues. I am wondering if there is a better way to do this? I have heard of color histograms which count the values in each "bin". I don't need the values but the bins are I think what I am looking for here.
Ideally the bins would be an object that each have a label. So I could do bins.indices[0] and it would return the bounding box I gave it. Then also I could bin each observation, like if a new color was color = [15.342,-6.534], color.bin would return 15 or the 15th bin.
I realize this is a lot to ask for, but I think it must be a somewhat common need for people working with color spaces. So is there any python module or tool that can accomplish what I'm asking? How would you approach this? thanks!
Use the standard numpy 2D-histogram function: numpy.histogram2d:
import numpy as np
# a and b are arrays representing your color points
H, a_edges, b_edges = np.histogram2d(a, b, bins=10)
If you want to discard the empty bins, you'd have to do some work from here. But I don't see why you would: assigning future colors to existing nonempty bins will be much more work if the bins are not on a rectangular grid.
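Binning a new color later is then a matter of looking it up in the edge arrays with np.digitize, for example (a sketch; indices are zero-based, and values falling outside the grid need extra handling):
import numpy as np

# H, a_edges, b_edges come from the histogram2d call above
color = [15.342, -6.534]
i = np.digitize(color[0], a_edges) - 1    # bin index along the a axis
j = np.digitize(color[1], b_edges) - 1    # bin index along the b axis
flat_bin = i * (len(b_edges) - 1) + j     # single row-major bin label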
You are probably trying to repeat what Richard Zhang did in his "Colorful Image Colorization" research: http://richzhang.github.io/colorization/
Here the author himself discusses this problem: https://github.com/richzhang/colorization/issues/23
Fortunately, Zhang provides a .npy file that contains those quantized values. It is at: https://github.com/richzhang/colorization/blob/master/resources/pts_in_hull.npy
The only thing you have to do now is load this file in your python script:
import numpy as np
pts_in_hull = np.load("pts_in_hull.npy")
It is a numpy array of shape 313x2 containing the quantized (a, b) bin centers.
I know this answer comes a few years too late, but maybe it will help someone else.
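If you go this route, assigning an observed (a, b) value to one of the 313 bins is a nearest-neighbour lookup against the bin centers, e.g. with a k-d tree (a sketch, assuming pts_in_hull has been loaded as above):
from scipy.spatial import cKDTree

tree = cKDTree(pts_in_hull)            # the 313 quantized ab bin centers
dist, bin_index = tree.query([15.342, -6.534])
print(bin_index)                       # index of the nearest of the 313 bins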

What's the correct usage of matplotlib.mlab.normpdf()?

I intend for part of a program I'm writing to automatically generate Gaussian distributions of various statistics over multiple raw text sources, however I'm having some issues generating the graphs as per the guide at:
python pylab plot normal distribution
The general gist of the plot code is as follows.
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as pyplot
meanAverage = 222.89219487179491 # typical value calculated beforehand
standardDeviation = 3.8857889432054091 # typical value calculated beforehand
x = np.linspace(-3,3,100)
pyplot.plot(x,mlab.normpdf(x,meanAverage,standardDeviation))
pyplot.show()
All it does is produce a rather flat looking and useless y = 0 line!
Can anyone see what the problem is here?
Cheers.
If you read the documentation of matplotlib.mlab.normpdf, this function is deprecated and you should use scipy.stats.norm.pdf instead:
Deprecated since version 2.2. Use scipy.stats.norm.pdf instead.
And because your distribution mean is about 222, you should use something like np.linspace(210, 235, 100), so that the x range actually covers the mean.
So your code will look like:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as pyplot
meanAverage = 222.89219487179491 # typical value calculated beforehand
standardDeviation = 3.8857889432054091 # typical value calculated beforehand
x = np.linspace(210, 235, 100)  # covers roughly mean +/- 3 standard deviations
pyplot.plot(x, norm.pdf(x, meanAverage, standardDeviation))
pyplot.show()
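Since the statistics are computed per text source, it may be more robust to derive the x range from them instead of hard-coding it, for example:
x = np.linspace(meanAverage - 4*standardDeviation,
                meanAverage + 4*standardDeviation, 100)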
It looks like you made a few small but significant errors: either you chose your x vector wrong, or you swapped your stddev and mean. Since your mean is at 222, you probably want your x vector in that area, maybe something like 150 to 300. That way you get all the good stuff; right now you are looking at -3 to 3, which is far out in the tail of the distribution. Hope that helps.
Looking at the *args that receive meanAverage and standardDeviation, the documentation says the correct things to send are:
mu : a numdims array of means of a
sigma : a numdims array of standard deviations of a
Does this help?

Why are LASSO results in sklearn (python) and the matlab statistical package different?

I am using LassoCV from sklearn to select the best model by cross-validation. I found that the cross-validation gives different results depending on whether I use sklearn or the matlab statistical toolbox.
I used matlab to replicate the example given at
http://www.mathworks.se/help/stats/lasso-and-elastic-net.html
and got a figure like this: (figure omitted)
Then I saved the matlab data and tried to replicate the figure with lasso_path from sklearn, and got: (figure omitted)
Although there are some similarities between these two figures, there are also clear differences. As far as I understand, the parameter lambda in matlab and alpha in sklearn are the same, yet in these figures they seem to differ. Can somebody point out which is correct, or am I missing something? Furthermore, the coefficients obtained are also different (which is my main concern).
Matlab Code:
rng(3,'twister') % for reproducibility
X = zeros(200,5);
for ii = 1:5
    X(:,ii) = exprnd(ii,200,1);
end
r = [0;2;0;-3;0];
Y = X*r + randn(200,1)*.1;
save randomData.mat % To be used in python code
[b fitinfo] = lasso(X,Y,'cv',10);
lassoPlot(b,fitinfo,'plottype','lambda','xscale','log');
disp('Lambda with min MSE')
fitinfo.LambdaMinMSE
disp('Lambda with 1SE')
fitinfo.Lambda1SE
disp('Quality of Fit')
lambdaindex = fitinfo.Index1SE;
fitinfo.MSE(lambdaindex)
disp('Number of non zero predictos')
fitinfo.DF(lambdaindex)
disp('Coefficient of fit at that lambda')
b(:,lambdaindex)
Python Code:
import scipy.io
import numpy as np
import pylab as pl
from sklearn.linear_model import lasso_path, LassoCV
data=scipy.io.loadmat('randomData.mat')
X=data['X']
Y=data['Y'].flatten()
model = LassoCV(cv=10,max_iter=1000).fit(X, Y)
print 'alpha', model.alpha_
print 'coef', model.coef_
eps = 1e-2 # the smaller it is the longer is the path
models = lasso_path(X, Y, eps=eps)
alphas_lasso = np.array([model.alpha for model in models])
coefs_lasso = np.array([model.coef_ for model in models])
pl.figure(1)
ax = pl.gca()
ax.set_color_cycle(2 * ['b', 'r', 'g', 'c', 'k'])
l1 = pl.semilogx(alphas_lasso,coefs_lasso)
pl.gca().invert_xaxis()
pl.xlabel('alpha')
pl.show()
I do not have matlab, but be careful: the value obtained with cross-validation can be unstable, because it is influenced by how you subdivide the samples.
Even if you run the cross-validation twice in python, you can obtain two different results.
Consider this example:
kf=sklearn.cross_validation.KFold(len(y),n_folds=10,shuffle=True)
cv=sklearn.linear_model.LassoCV(cv=kf,normalize=True).fit(x,y)
print cv.alpha_
kf=sklearn.cross_validation.KFold(len(y),n_folds=10,shuffle=True)
cv=sklearn.linear_model.LassoCV(cv=kf,normalize=True).fit(x,y)
print cv.alpha_
0.00645093258722
0.00691712356467
It's possible that alpha = lambda / n_samples, where n_samples = X.shape[0] in scikit-learn.
Another remark: your path is not as piecewise-linear as it could/should be. Consider reducing tol and increasing max_iter.
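If that scaling is right, you can force both packages onto the same grid by feeding matlab's lambdas, divided by n_samples, into lasso_path (a sketch: lambdas_matlab is a hypothetical array holding fitinfo.Lambda exported from matlab, and recent scikit-learn versions return plain arrays rather than model objects):
import numpy as np
from sklearn.linear_model import lasso_path

alphas = np.asarray(lambdas_matlab) / X.shape[0]  # alpha = lambda / n_samples
alphas_out, coefs, _ = lasso_path(X, Y, alphas=alphas, max_iter=10000, tol=1e-6)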
hope this helps
I know this is an old thread, but:
I'm actually working on piping over to LassoCV from glmnet (in R), and I found that LassoCV doesn't do too well unless the X matrix is normalized first (even if you specify the parameter normalize=True).
Try normalizing the X matrix first when using LassoCV.
If it is a pandas object,
(X - X.mean())/X.std()
It also seems you need to multiply alpha by 2.
Though I am unable to figure out what is causing the problem, there is a logical direction in which to continue.
These are the facts:
Mathworks have selected an example and decided to include it in their documentation
Your matlab code reproduces the example's result exactly.
The alternative does not match that result, and has produced inaccurate results in the past.
This is my assumption:
The chance that Mathworks chose to put an incorrect example in their documentation is negligible compared to the chance that an alternative reproduction of this example fails to give the correct result.
The logical conclusion: Your matlab implementation of this example is reliable and the other is not.
This might be a problem in the code, or maybe in how you use it, but either way the only logical conclusion would be that you should continue with Matlab to select your model.
