Plotting a curve from numpy array with large values

Plotting a curve from numpy array with large values - python

I am trying to plot a curve from molecular dynamics potential energies data stored in numpy array. As you can see from my figure attached, on the top left of the figure, a large number appears which is related to the label on y-axis. Look at it.
Even if I rescale the data, still a number appears there. I do not want it. Please can you suggest me howto sort out this issue? Thank you very much..

This is likely happening because your data is a small value offset by a large one. That's what the - sign means at the front of the number, "take the plotted y-values and subtract this number to get the actual values". You can remove it by plotting with the mean subtracted. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
y = -1.5*1e7 + np.random.random(100)
plt.plot(y)
plt.ylabel("units")
gives the form you don't like:
but subtracting the mean (or some other number close to that, like min or max, etc) will remove the large offset:
plt.figure()
plt.plot(y - np.mean(y))
plt.ylabel("offset units")
plt.show()

You can remove the offset by using:
plt.ticklabel_format(useOffset=False)

It seems your data is displayed in exponential form like: 1e+10, 2e+10, etc.
This question here might help:
How to prevent numbers being changed to exponential form in Python matplotlib figure

Related

How do you smoothen out values in an array (without polynomial equations)?

So basically I have some data and I need to find a way to smoothen it out (so that the line produced from it is smooth and not jittery). When plotted out the data right now looks like this:
and what I want it to look is like this:
I tried using this numpy method to get the equation of the line, but it did not work for me as the graph repeats (there are multiple readings so the graph rises, saturates, then falls then repeats that multiple times) so there isn't really an equation that can represent that.
I also tried this but it did not work for the same reason as above.
The graph is defined as such:
gx = [] #x is already taken so gx -> graphx
gy = [] #same as above
#Put in data
#Get nice data #[this is what I need help with]
#Plot nice data and original data
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
The method I think would be most applicable to my solution is getting the average of every 2 points and setting that to the value of both points, but this idea doesn't sit right with me - potential values may be lost.

You could use a infinite horizon filter
import numpy as np
import matplotlib.pyplot as plt
x = 0.85 # adjust x to use more or less of the previous value
k = np.sin(np.linspace(0.5,1.5,100))+np.random.normal(0,0.05,100)
filtered = np.zeros_like(k)
#filtered = newvalue*x+oldvalue*(1-x)
filtered[0]=k[0]
for i in range(1,len(k)):
# uses x% of the previous filtered value and 1-x % of the new value
filtered[i] = filtered[i-1]*x+k[i]*(1-x)
plt.plot(k)
plt.plot(filtered)
plt.show()

I figured it out, by averaging 4 results I was able to significantly smooth out the graph. Here is a demonstration:
Hope this helps whoever needs it

Linregress output seems incorrect

I plotted a scatter plot on my dataframe which looks like this:
with code
from scipy import stats
import pandas as pd
import seaborn as sns
df = pd.read_csv('/content/drive/My Drive/df.csv', sep=',')
subset = df[:,1:10080]
df['mean'] = subset.mean(axis=1)
df.plot(x='mean', y='Result', kind = 'scatter')
sns.lmplot('mean', 'Result', df, order=1)
I wanted to find the slope of the regression in the graph using code
scipy.stats.mstats.linregress(Result,average)
but from the output it seems like the slope magnitude is too small:
LinregressResult(slope=-0.0001320534706614152, intercept=27.887336813241845, rvalue=-0.16776138446214162, pvalue=3.0450456899520655e-07, stderr=2.55977061451773e-05)
if I switched the Resultand average positions,
scipy.stats.mstats.linregress(average,Result)
it still doesn't look right as the intercept is too large
LinregressResult(slope=-213.12489536011773, intercept=7138.48783135982, rvalue=-0.16776138446214162, pvalue=3.0450456899520655e-07, stderr=41.31287437069993)
Why is this happening? Do these output values need to be rescaled?

The signature for scipy.stats.mstats.linregress is linregress(x,y) so your second ordering, linregress(average, Result) is the one that is consistent with the way your graph is drawn. And on that graph, an intercept of 7138 doesn't seem unreasonable—are you getting confused by the fact that the x-axis limits you're showing don't go down to 0, where the intercept would actually happen?
In any case, your data really don't look like they follow a linear law, so the slope (or any parameter from a completely-misspecified model) will not actually tell you much. Are the x and y values all strictly positive? And is there a particular reason why x can never logically go below 25? The data-points certainly seem to be piling up against that vertical asymptote. If so, I would probably subtract 25 from x, then fit a linear model to logged data. In other words, do your plot and your linregress with x=numpy.log(average-25) and y=numpy.log(Result). EDIT: since you say x is temperature there’s no logical reason why x can’t go below 25 (it is meaningful to want to extrapolate below 25, for example—and even below 0). Therefore don’t subtract 25, and don’t log x. Just log y.
In your comments you talk about rescaling the slope, and eventually the suspicion emerges that you think this will give you a correlation coefficient. These are different things. The correlation coefficient is about the spread of the points around the line as well as slope. If what you want is correlation, look up the relevant tools using that keyword.

matplotlib scatter: the more overlapping points the bigger the marker

I want to scatterplot two categorical variables as follows
from matplotlib import pyplot as plt
a=[1,1,1,1,2,2]
b=[2,2,2,2,1,1]
plt.scatter(a,b)
If I plot this I will see only two points (4 overlapping in (1,2), and 2 overlapping in (2,1)) without being able to appreciate the different occurrence of the two overlapping points.
I would like to see a scatter plot where the marker of the point of the left (1,2) is twice bigger than the marker on the point on the right (2,1), in order to show the different occurrence of the point. What is the correct way to do this? (beside the trival solution where I count occurrences by hand and I put them inside the size argument of plt.scatter)
I already searched other SOF questions, but they all propose to use an alpha like here, but I would like to see a marker size to appreciate better the different proportionalities between occurrences.
A pointer might be to use some Kernel Density Estimate as suggested in this answer
To give a bit more context to my question, the two output are the predictions of two classifiers, and I want to explore the differences between the predictions to evaluate whether to ensemble them.

You can make use of the occurrence frequency of the x-points (or even y-points for this particular data set) which can be obtained using Counter module. The frequencies can then be used as a rescaling factor for defining the size of the markers. Here 200 is just a big number to emphasize the size of the markers.
from matplotlib import pyplot as plt
from collections import Counter
a=[1,1,1,1,2,2]
b=[2,2,2,2,1,1]
weights = [200*i for i in Counter(a).values() for j in range(i)]
plt.scatter(a, b, s = weights)
plt.show()
Another option to visualise the distribution is a bar chart
freqs = Counter(a)
plt.bar(freqs.keys(), freqs.values(), width=0.5)
plt.xticks(list(freqs.keys()))

issue with python plot

So with this code I need to plot an IV-curve exponentially decaying, but it is in wrong direction and needs to be mirrored/flipped. The x andy values are not being plotted in the correct axes and needs to be switched. It would show the relation with current exponentially decreasing while given a voltage.I tried all sorts of debugging, but it kept showing an exponential growth or the same kind of decay.
import matplotlib.pyplot as plt
import numpy as np
xdata=np.linspace(23,0)# voltage data
ydata=np.exp(xdata)# current data
plt.plot(ydata,xdata)
plt.title(r'IV-curve')
plt.xlabel('Voltage(V)')
plt.ylabel('Current(I)')
plt.show()
Here's what it looks like: http://imgur.com/a/NJf3g
Also, bear with me as this may seem like a trivial code, but I literally started coding for the first time last week, so I will get some bumps on the road :)

The problem is that the ydata that you use are not correctly ordered.
The solution is simple. Reorder the ydata.
Do this:
import matplotlib.pyplot as plt
import numpy as np
xdata = np.linspace(23,0)# voltage data
ydata = np.exp(xdata)# current data
ydata = np.sort(ydata)
plt.plot(ydata,xdata)
plt.title(r'IV-curve')
plt.xlabel('Voltage(V)')
plt.ylabel('Current(I)')
plt.show()
Result:

It looks like maybe
plt.plot(ydata,xdata)
should be
plt.plot(xdata,ydata)
This will correct the axes. But you still aren't going to get a decaying exponential. Why? Not because of the plotting but because of your data. Your data is a growing exponential. If you want decay use something like
ydata=np.exp(-xdata)
i.e. minus sign in front of xdata.

Got an extra line on python plot

i'm using pyplot to show the FFT of the signal 'a', here the code:
myFFT = numpy.fft.fft(a)
x = numpy.arange(len(a))
fig2 = plt.figure(2)
plt.plot(numpy.fft.fftfreq(x.shape[-1]), myFFT)
fig2.show()
and i get this figure
There is a line from the begin to the end of the signal in the frequency domain. How i can remove this line? AM I doing something wrong with pyplot?

Instead of sorted, you might want to use np.fft.fftshift to center you 0th frequency, this deals properly with odd- and even-size signals. Most importantly, you need to apply the transform on both x and y vectors you are plotting.
plt.plot(np.fft.fftshift(np.fft.fftfreq(x.shape[-1])), np.fft.fftshift(myFFT))
You might also want to display the amplitude or phase of the FFT (np.abs or np.angle) - as-is, you are just plotting the real-part.

Have a look at plt.plot(numpy.fft.fftfreq(x.shape[-1]): the first and last points are the same, hence the graph "makes a loop"
You can do plt.plot(sorted(numpy.fft.fftfreq(x.shape[-1])),myFFT) or plt.plot(myFFT)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.