Plot average of an array in python - python

I have a 2D array of temperature-over-time data. There are about 7500 x-values and as many corresponding y-values (so one y for every x).
It looks like this:
The blue line in the middle is the result of my unsuccessful attempt to draw a plot line, which would represent the average of my data. Code:
import numpy as np
import matplotlib.pyplot as plt
data=np.genfromtxt("data.csv")
temp_av=[np.mean(data[1])]*len(data[0])
plt.figure()
plt.subplot(111)
plt.scatter(data[0],data[1])
plt.plot(data[0],temp_av)
plt.show()
However what I need is a curve, which will follow the rise in the temperature. Basically a line which will be somewhere in the middle of data points.
I googled for some solutions, but all I found were suggestions how to compute an average in cases where you have multiple y-values for one x. I understand how to do that, but it doesn't help in this case.
My next idea would be to use a loop to compute an average for every 2 neighboring points. But I am not sure how best to do that, or whether there are better solutions.
Also, I understand that what I need is to compute another array. Plotting is only for representation.

If I understand correctly, what you are trying to plot is a trend line. You could do it using the numpy function 'polyfit'. If that's what you are looking for, try this small modification to your code:
import numpy as np
import matplotlib.pyplot as plt
data=np.genfromtxt("data.csv")
plt.figure()
plt.subplot(111)
plt.scatter(data[0],data[1])
pfit = np.polyfit(data[0], data[1], 1)
trend_line_model = np.poly1d(pfit)
plt.plot(data[0], trend_line_model(data[0]), "m--")
plt.show()
This will plot the trend line in dashed magenta
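The question also asks for a curve that stays in the middle of the points rather than a straight line. One common way to get that is a moving average, which is not what the answer above does. Here is a minimal sketch, assuming a box window applied with np.convolve; the synthetic x and y arrays stand in for data[0] and data[1], and the window length of 101 is an arbitrary choice for illustration.

```python
import numpy as np

def moving_average(y, window):
    """Box-window moving average; returns len(y) - window + 1 values."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="valid")

# Synthetic stand-in for the temperature data (hypothetical values).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 100.0, 7500)
y = 20.0 + 0.05 * x + rng.normal(scale=0.5, size=x.size)

window = 101  # odd window length, chosen arbitrarily for this sketch
y_smooth = moving_average(y, window)

# Trim x so every averaged value is paired with the centre of its window.
half = window // 2
x_smooth = x[half:-half]
```

Passing x_smooth and y_smooth to plt.plot on top of the scatter plot would draw the smoothed curve. A larger window gives a smoother curve but loses more points at both ends.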

Related

Python: Histogram return wrong values for counts (EDIT: more general with example)

EDIT: I've found a general example where it doesn't work either!
I am trying to extract the data for a histogram, but different counts seem wrong. As an example code:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.rand(1000000)
bins = np.arange(0,1,0.0001)
a,b,c = plt.hist(data,bins)
This gives me this rather messy histogram, and I've saved the counts as a and the bin edges as b. Now, plotting a against b, I should expect the same histogram, right? But that's not what I get:
plt.scatter(b[0:len(b)-1],a,s=2)
which gives me this, which doesn't match at all! Furthermore, when I try to find the maximum value of a, it gives me 144, which fits fine with the scatter plot, but not with the histogram function.
If I count the numbers myself with the following code:
len(np.intersect1d(np.where(data>=b[np.argmax(a)]),np.where(data<b[np.argmax(a)+1])))
then it also gives me 144, in accordance with the values. So is the displayed histogram just wrong for some reason, and I should ignore it and just take the extracted data?
Old, unedited post:
For a physics course I am trying to bin my results in the following way:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as ss
from scipy.optimize import curve_fit
plt.rc("font", family=["Helvetica", "Arial"])
plt.rc("axes", labelsize=18)
plt.rc("xtick", labelsize=16, top=True, direction="in")
plt.rc("ytick", labelsize=16, right=True, direction="in")
plt.rc("axes", titlesize=22)
plt.rc("legend", fontsize=16)
data_Ra = np.loadtxt('Ra226_cal2_ch001.txt',skiprows=5)
t_Ra = data_Ra[:,0]*10**-8 # time in seconds
channels_Ra = data_Ra[:,1]
channels_Ra = channels_Ra[np.where(channels_Ra>0)] # removing all the measurements at channel = 0
intervalspace = 2 #The intervals in which we count
bins=np.arange(0,4000,intervalspace)
counts, intervals , stuff = plt.hist(channels_Ra,bins)
plt.xlabel('Channels')
plt.ylabel('Counts')
plt.show()
Here, the histogram plot looks totally fine, with a max near 13000 counts. But when I then use np.max(counts), I am given about 24000, and when I try and just plot the values it gives me with:
plt.scatter(intervals[0:len(intervals)-1]+intervalspace/2,counts,s=1)
plt.xlabel('Channels')
plt.ylabel('Counts')
plt.title('Ra225')
plt.show()
it looks like this, which is totally different, and I can't figure out why. I am expecting the scatter plot to resemble the histogram, and while the peaks are located at the same x-values, the heights do not match.
This problem is in other large datasets as well.
I don't think I'm allowed to drop the txt file here? So I'm not sure how much more I can show, but any help will be appreciated!
I don't know why you interpret the results in that way.
If you look at the histogram plot, you will be able to see the maximum value of the y-axis is 25,000. That means that there are some values close to 25,000. This fact can be verified in the scatter plot.
Your scatter plot shows the actual values. It would be clearer if you described what your expected plot looks like.
If you want to discard some outlier points, you should apply some filtering before plotting the data.
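To check the counts independently of any drawing, the same binning can be done with np.histogram, which returns the counts and bin edges without producing a figure. The sketch below uses random data and 1000 bins as stand-ins for the poster's dataset (both are assumptions for illustration) and cross-checks the tallest bin by counting directly.

```python
import numpy as np

# Hypothetical stand-in data: uniform random numbers in [0, 1).
rng = np.random.default_rng(0)
data = rng.random(100_000)
bins = np.linspace(0.0, 1.0, 1001)  # 1000 bins of equal width

# Counts per bin and the bin edges, computed without drawing anything.
counts, edges = np.histogram(data, bins=bins)
centers = 0.5 * (edges[:-1] + edges[1:])

# Cross-check the tallest bin by counting the samples that fall in it.
k = np.argmax(counts)
direct = np.count_nonzero((data >= edges[k]) & (data < edges[k + 1]))
```

Here counts[k] and direct agree, which supports the point that the returned values are the actual counts. A possible explanation for the visual mismatch, which nobody in this thread confirms, is that with thousands of bins each bar is narrower than one screen pixel, so the drawn histogram cannot show every bar at its full height; the returned counts are unaffected by that.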

Matplotlib 2.02 plotting within a for loop

I am having trouble with two things on a plot I am generating within a for loop. My code loads some data, fits it to a function using curve_fit, and then plots the measured data and the fit on the same plot for 5 different sets of measured y values (the measured data is represented by empty circle markers and the fit by a solid line in the same color as the marker).
First, I am struggling to reduce the linewidth of the fit (solid line): however much I reduce the float value of linewidth, the line does not get thinner than what is displayed in the output below, although I can make it thicker. Second, I would like the legend to display only circle markers, not circles with lines through them. I cannot seem to get this to work. Any ideas?
Here is my code and attached is the output plot and data file on google drive share link (for some reason it's cutting off long lines of text on this post):
import scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
#define vogel-fulcher-tamman (VFT) function
def vft(x,sigma_0,temp_vf,D):
    return np.log(sigma_0)-((D*temp_vf)/(x-temp_vf))
#load and sort data
data=np.genfromtxt('data file',skip_header=3)
temp=data[:,0]
inverse_temp=data[:,1]
dc_conduct=np.log10(data[:,2:11])
only_adam=dc_conduct[:,4:9]
colors = ['b','g','r','c','m']
labels = ['50mg 2-adam','300mg 2-adam','100 mg 2-adam','150 mg 2-adam','250mg 2-adam']
for i in range(0,len(only_adam)):
    #fit VTF function
    y=only_adam[:,i]
    popt, pcov = curve_fit(vft,temp,y)
    #plotting
    plt.plot(inverse_temp,y,color=colors[i],marker='o',markerfacecolor='none',
             label=labels[i])
    plt.plot(inverse_temp,vft(temp, *popt),linewidth=0.00001,linestyle='-',color=colors[i])
plt.ylabel("Ionic Conductivity [Scm**2/mol]")
plt.xlabel("1000 / [T(K)]")
plt.axis('tight')
plt.legend(loc='lower left')
You are looping over the rows of only_adam, but index the columns of that array with the loop variable i. This does not make sense and leads to the error shown.
The plot call that draws the data points also draws lines between them, and those are the lines you see. You cannot make them thinner by decreasing the other plot's linewidth. Instead, you need to turn the line off for the data plot by setting its linestyle to empty, e.g. plot(..., ls="").
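Both points can be sketched together as follows. This is only an illustration under stated assumptions: the arrays are synthetic stand-ins for the poster's file, the labels are placeholders, and a straight-line np.polyfit replaces the curve_fit call so the example does not depend on the VFT fit converging.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: 20 temperatures, 5 measured series (one per column).
rng = np.random.default_rng(1)
inverse_temp = np.linspace(2.5, 3.5, 20)
only_adam = np.column_stack(
    [-2.0 * inverse_temp + k + rng.normal(scale=0.05, size=20) for k in range(5)]
)
colors = ["b", "g", "r", "c", "m"]
labels = [f"series {k}" for k in range(5)]  # placeholder labels

fig, ax = plt.subplots()
for i in range(only_adam.shape[1]):  # iterate over columns, not rows
    y = only_adam[:, i]
    # Stand-in for the curve_fit result: a straight-line fit.
    fit = np.polyval(np.polyfit(inverse_temp, y, 1), inverse_temp)
    # Data: markers only, so the legend entry is a bare circle.
    ax.plot(inverse_temp, y, color=colors[i], marker="o",
            markerfacecolor="none", linestyle="", label=labels[i])
    # Fit: thin solid line; it has no label, so it stays out of the legend.
    ax.plot(inverse_temp, fit, color=colors[i], linestyle="-", linewidth=0.5)
legend = ax.legend(loc="lower left")
```

With the data lines switched off, the linewidth of the fit becomes visible on its own, and the legend picks up only the labelled marker plots. Call plt.show() or fig.savefig(...) afterwards to view the result.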

Pyplot colormap line by line

I'm starting out with plotting in python using the very nice pyplot. I aim to show the evolution of two series of data over time. Instead of the usual plot of the data as a function of time, I'd like to have a scatter plot (data1, data2) where the time component is shown as a color gradient.
In my two-column file, the time is given by the line number. It could either be written as a third column in the file or come from some built-in ability of pyplot to get the line number on its own.
Can anyone help me do that?
Thanks a lot.
Nicolas
When plotting using matplotlib.pyplot.scatter you can pass a third array via the keyword argument c. This array determines the colors of your scatter points. You then also pick an appropriate colormap from matplotlib.cm and assign it with the cmap keyword argument.
This toy example creates two datasets, data1 and data2. It then also creates an array colors, an array of values equally spaced between 0 and 1, with the same length as data1 and data2. It doesn't need to know the "line number"; it just needs to know the total number of data points, and then equally spaces the colors.
I've also added a colorbar. You can remove this by removing the plt.colorbar() line.
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
N = 500
data1 = np.random.randn(N)
data2 = np.random.randn(N)
colors = np.linspace(0,1,N)
plt.scatter(data1, data2, c=colors, cmap=cm.Blues)
plt.colorbar()
plt.show()

Scale axes 3d in matplotlib

I'm facing issues scaling 3D axes in matplotlib. I have found other questions about this, but somehow the answers do not seem to work. Here is some sample code:
import matplotlib as mpl
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
data=np.array([[0,0,0],[10,1,1],[2,2,2]])
fig=plt.figure()
ax=Axes3D(fig)
ax.set_xlim3d(0,15)
ax.set_ylim3d(0,15)
ax.set_zlim3d(0,15)
ax.scatter(data[:,0],data[:,1],data[:,2])
plt.show()
It seems it just ignores the ax.set commands...
In my experience, you have to set your axis limits after plotting the data, otherwise it will look at your data and adjust whatever axes settings you entered before to fit it all in-frame out to the next convenient increment along the axes in question. If, for instance, you set your x-axis limits to +/-400 but your data go out to about +/-1700 and matplotlib decides to label the x-axis in increments of 500, it's going to display the data relative to an x-axis that goes out to +/-2000.
So in your case, you just want to rearrange that last block of text as:
fig=plt.figure()
ax=Axes3D(fig)
ax.scatter(data[:,0],data[:,1],data[:,2])
ax.set_xlim3d(0,15)
ax.set_ylim3d(0,15)
ax.set_zlim3d(0,15)
plt.show()
ColorOutOfSpace's approach is good. But if you want to automate the scaling, you have to find the minimum and maximum values in the data and scale with those.
data_min = np.amin(data) # lowest number in the array
data_max = np.amax(data) # highest number in the array
ax.set_xlim3d(data_min, data_max)
ax.set_ylim3d(data_min, data_max)
ax.set_zlim3d(data_min, data_max)
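For completeness, here is the same idea as one self-contained sketch. It assumes a recent Matplotlib release, where constructing Axes3D(fig) directly may not attach the axes to the figure, so the axes are created with fig.add_subplot(projection="3d") instead. That swap is my assumption about the reader's environment, not something stated in the answers above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np

data = np.array([[0, 0, 0], [10, 1, 1], [2, 2, 2]])

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # creates and attaches a 3D axes

# Plot first, then set the limits so they are not overridden by autoscaling.
ax.scatter(data[:, 0], data[:, 1], data[:, 2])

# Use the overall data range for all three axes, as in the answer above.
data_min, data_max = data.min(), data.max()
ax.set_xlim3d(data_min, data_max)
ax.set_ylim3d(data_min, data_max)
ax.set_zlim3d(data_min, data_max)
```

Using one common range keeps the three axes on the same scale. Per-axis ranges (data[:, 0].min() and so on) would fit each axis tightly instead.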

Python equivalent for MATLAB's normplot?

Is there a python equivalent function similar to normplot from MATLAB?
Perhaps in matplotlib?
MATLAB syntax:
x = normrnd(10,1,25,1);
normplot(x)
Gives:
I have tried using matplotlib & numpy module to determine the probability/percentile of the values in array but the output plot y-axis scales are linear as compared to the plot from MATLAB.
import numpy as np
import matplotlib.pyplot as plt
data =[-11.83,-8.53,-2.86,-6.49,-7.53,-9.74,-9.44,-3.58,-6.68,-13.26,-4.52]
plot_percentiles = range(0, 110, 10)
x = np.percentile(data, plot_percentiles)
plt.plot(x, plot_percentiles, 'ro-')
plt.xlabel('Value')
plt.ylabel('Probability')
plt.show()
Gives:
Else, how could the scales be adjusted as in the first plot?
Thanks.
A late answer, but I just came across the same problem and found a solution that is worth sharing, I guess.
As joris pointed out, the probplot function is an equivalent to normplot, but its axis is expressed in theoretical quantiles of the distribution rather than in cumulative probabilities. scipy.stats also offers functions to convert between the two.
quantile -> cumulative probability
stats.'distribution function'.cdf(quantile_value)
cumulative probability -> quantile
stats.'distribution function'.ppf(probability_value)
for example:
stats.norm.ppf(probability)
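A small round trip makes the direction of each conversion concrete. This sketch uses the standard normal distribution from scipy.stats; the three probabilities are arbitrary values picked for illustration.

```python
from scipy import stats

# Cumulative probabilities (fractions, not percent) chosen for illustration.
probabilities = [0.01, 0.5, 0.99]

# ppf maps a cumulative probability to the matching quantile of the distribution.
quantiles = [stats.norm.ppf(p) for p in probabilities]

# cdf maps a quantile back to its cumulative probability.
round_trip = [stats.norm.cdf(q) for q in quantiles]
```

The middle quantile is 0 because half of a standard normal lies below zero, and the outer two are symmetric around it. This is why tick positions computed with ppf can be labelled with the original percentages.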
To get an equivalent y-axis, like normplot, you can relabel the quantile ticks with percentiles:
from scipy import stats
import matplotlib.pyplot as plt
nsample=500
#create list of random variables
x=stats.t.rvs(100, size=nsample)
# Calculate quantiles and least-square-fit curve
(quantiles, values), (slope, intercept, r) = stats.probplot(x, dist='norm')
#plot results
plt.plot(values, quantiles,'ob')
plt.plot(quantiles * slope + intercept, quantiles, 'r')
#define ticks
ticks_perc=[1, 5, 10, 20, 50, 80, 90, 95, 99]
#transform them from percentile to theoretical quantile
ticks_quan=[stats.norm.ppf(i/100.) for i in ticks_perc]
#assign new ticks
plt.yticks(ticks_quan,ticks_perc)
#show plot
plt.grid()
plt.show()
The result:
I'm fairly certain matplotlib doesn't provide anything like this.
It's possible to do, of course, but you'll have to either rescale your data and change your y axis ticks/labels to match, or, if you're planning on doing this often, perhaps code a new scale that can be applied to matplotlib axes, like in this example: http://matplotlib.sourceforge.net/examples/api/custom_scale_example.html.
Maybe you can use the probplot function of scipy (scipy.stats); this seems to me an equivalent of MATLAB's normplot:
Calculate quantiles for a probability plot of sample data against a specified theoretical distribution. probplot optionally calculates a best-fit line for the data and plots the results using Matplotlib or a given plot function.
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
But it does not solve your problem of the different y-axis scale.
Using matplotlib.pyplot.semilogy will get closer to the MATLAB output.
