How to change plot properties of statsmodels qqplot? (Python)

I am plotting a normal Q-Q plot using statsmodels.graphics.gofplots.qqplot().
The module uses matplotlib.pyplot to create a figure instance, and the plot itself looks fine.
However, I would like to draw the markers with alpha=0.3.
Is there a way to do this?
Here is a code sample:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
test = np.random.normal(0,1, 1000)
sm.qqplot(test, line='45')
plt.show()
And the output figure:

You can use the statsmodels.graphics.gofplots.ProbPlot class, whose qqplot method passes extra keyword arguments (**kwargs) through to matplotlib's pyplot.plot.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
test = np.random.normal(0, 1, 1000)
pp = sm.ProbPlot(test, fit=True)
qq = pp.qqplot(marker='.', markerfacecolor='k', markeredgecolor='k', alpha=0.3)
sm.qqline(qq.axes[0], line='45', fmt='k--')
plt.show()

qqplot returns a figure object. You can use it to find the plotted lines and then modify them with set_alpha:
fig = sm.qqplot(test, line='45')
# Grab the lines drawn as blue dots (this assumes the markers' color is 'b')
dots = fig.findobj(lambda x: hasattr(x, 'get_color') and x.get_color() == 'b')
for d in dots:
    d.set_alpha(0.3)
Note that the dots overlap quite a bit, so even with a low alpha value they look more opaque where they pile on top of one another.
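To see why overlapping markers look darker, here is a minimal sketch of the usual "over" compositing rule for identical, same-colored layers. The helper name stacked_opacity is made up for illustration and is not part of matplotlib or statsmodels.

```python
def stacked_opacity(alpha, n_layers):
    """Effective opacity of n_layers identical markers drawn on top of each other.

    Each layer lets a fraction (1 - alpha) of what is underneath show through,
    so the background shows through all layers with (1 - alpha) ** n_layers.
    """
    return 1.0 - (1.0 - alpha) ** n_layers


# How quickly alpha=0.3 markers saturate as they pile up
for n in (1, 2, 5, 10):
    print(n, round(stacked_opacity(0.3, n), 3))
```

With alpha=0.3, two overlapping dots already cover about half of the background, which is why dense regions of the Q-Q plot still look nearly solid.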

Related

How to make horizontal linechart with categorical variables and timeseries?

I want to replicate plots from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5000555/pdf/nihms774453.pdf I'm particularly interested in the plot on page 16, right panel. I tried to do this in matplotlib, but it seems there is no way to access the individual lines in a LineCollection.
I don't know how to change the color of each line according to the value at every index. Eventually I'd like to get something like this: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/multicolored_line.html but for every line, colored according to the data.
This is what I tried:
The data as a numpy array: https://pastebin.com/B1wJu9Nd
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib import colors as mcolors
%matplotlib inline
base_range = np.arange(qq.index.max()+1)
fig, ax = plt.subplots(figsize=(12,8))
ax.set_xlim(qq.index.min(), qq.index.max())
# ax.set_ylim(qq.columns[0], qq.columns[-1])
ax.set_ylim(-5, len(qq.columns) +5)
line_segments = LineCollection(
    [np.column_stack([base_range, [y] * len(qq.index)]) for y in range(len(qq.columns))],
    cmap='viridis',
    linewidths=5,
    linestyles='solid',
)
line_segments.set_array(base_range)
ax.add_collection(line_segments)
axcb = fig.colorbar(line_segments)
plt.show()
[Image: my result]
[Image: what I want to achieve]
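One possible approach, sketched under assumptions: follow the multicolored-line idea linked above, but build one LineCollection per row, split each horizontal line into short segments, and color every segment by the data value at that index. The random `values` array below is a hypothetical stand-in for the asker's data (rows are categories, columns are time steps).

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Hypothetical stand-in data: 4 categories x 50 time steps
rng = np.random.default_rng(0)
values = rng.random((4, 50))
n_rows, n_steps = values.shape
x = np.arange(n_steps)

# One shared normalization so all rows use the same color scale
norm = plt.Normalize(values.min(), values.max())

fig, ax = plt.subplots(figsize=(12, 4))
for row in range(n_rows):
    # Points along a horizontal line at height `row`, shape (n_steps, 1, 2)
    points = np.column_stack([x, np.full(n_steps, row)]).reshape(-1, 1, 2)
    # Consecutive point pairs become segments, shape (n_steps - 1, 2, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = LineCollection(segments, cmap='viridis', norm=norm, linewidths=5)
    # Each segment is colored by the value at its left end
    lc.set_array(values[row, :-1])
    ax.add_collection(lc)

ax.set_xlim(0, n_steps - 1)
ax.set_ylim(-1, n_rows)
fig.colorbar(lc, ax=ax)
```

Call plt.show() afterwards to display the figure. Using a separate collection per row keeps the segment bookkeeping simple; a single collection holding all segments would also work if the value arrays are concatenated in the same order.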

how to reduce y-axis in matplot with same distance

I want this plot's y-axis to be centered at 38, and the y-axis scaled such that the 'humps' disappear. How do I accomplish this?
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
s=['05/02/2019', '06/02/2019', '07/02/2019', '08/02/2019',
   '09/02/2019', '10/02/2019', '11/02/2019', '12/02/2019',
   '13/02/2019', '20/02/2019', '21/02/2019', '22/02/2019',
   '23/02/2019', '24/02/2019', '25/02/2019']
df[0]=['38.02', '33.79', '34.73', '36.47', '35.03', '33.45',
       '33.82', '33.38', '34.68', '36.93', '33.44', '33.55',
       '33.18', '33.07', '33.17']
# Data for plotting
fig, ax = plt.subplots(figsize=(17, 2))
for i,j in zip(s,df[0]):
    ax.annotate(str(j),xy=(i,j+0.8))
ax.plot(s, df[0])
ax.set(xlabel='Dates', ylabel='Latency',
       title='Hongkong to sing')
ax.grid()
#plt.yticks(np.arange(min(df[p]), max(df[p])+1, 2))
fig.savefig("test.png")
plt.show()
I'm not entirely certain this is what you're looking for, but you can set the y-limits explicitly to change the scale, e.g.
ax.set_ylim([ax.get_ylim()[0], 42])
This sets only the upper bound and leaves the lower limit unchanged.
You can supply any values you find appropriate, e.g.
ax.set_ylim([22, 52])
which gives a wider y-range and therefore a flatter-looking curve.
Also note that the tick labels and general appearance of your plot may differ from the plots in this answer.
Edit - Here is the complete code as requested:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame()
s=['05/02/2019', '06/02/2019', '07/02/2019', '08/02/2019',
   '09/02/2019', '10/02/2019', '11/02/2019', '12/02/2019',
   '13/02/2019', '20/02/2019', '21/02/2019', '22/02/2019',
   '23/02/2019', '24/02/2019', '25/02/2019']
# The values are stored as strings; they are converted with pd.to_numeric before plotting
df[0]=['38.02','33.79','34.73','36.47','35.03','33.45',
       '33.82','33.38','34.68','36.93','33.44','33.55',
       '33.18','33.07','33.17']
# Data for plotting
fig, ax = plt.subplots(figsize=(17, 3))
#for i,j in zip(s,df[0]):
#    ax.annotate(str(j),xy=(i,j+0.8))
ax.plot(s, pd.to_numeric(df[0]))
ax.set(xlabel='Dates', ylabel='Latency',
       title='Hongkong to sing')
# String x-values are plotted at positions 0..len(s)-1; fix the ticks before relabeling
ax.set_xticks(range(len(s)))
# The dates are day-first (DD/MM/YYYY), so parse them with dayfirst=True
ax.set_xticklabels(pd.to_datetime(s, dayfirst=True).strftime('%m.%d'), rotation=45)
ax.set_ylim([22, 52])
plt.show()
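If the goal is specifically to center the y-axis on 38, one way is to compute symmetric limits around that value. This is only a sketch: the `padding` value is an arbitrary assumption, and a larger padding flattens the curve more.

```python
import numpy as np
import matplotlib.pyplot as plt

latency = np.array([38.02, 33.79, 34.73, 36.47, 35.03, 33.45,
                    33.82, 33.38, 34.68, 36.93, 33.44, 33.55,
                    33.18, 33.07, 33.17])

center = 38.0    # value the y-axis should be centered on
padding = 10.0   # assumed extra headroom; increase it to flatten the curve further

# Half-height of the axis: the largest distance from the center, plus padding
half_range = np.abs(latency - center).max() + padding

fig, ax = plt.subplots(figsize=(17, 3))
ax.plot(latency)
ax.set_ylim(center - half_range, center + half_range)
```

Because the limits are symmetric around `center`, the midpoint of the y-axis is exactly 38 regardless of the padding chosen.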

Using a colormap for a pandas Series

I have a pandas Series of complex numbers that I would like to plot. Currently I loop over each point and assign it a color. I would prefer to generate the plot without looping over each point; using Series.plot() would be ideal, but converting the Series to a numpy array is fine too.
Here is an example of what I currently have:
import pandas as pd
import numpy as np
from matplotlib import pyplot
s = pd.Series((1+np.random.randn(500)*0.05)*np.exp(1j*np.linspace(-np.pi, np.pi, 500)))
cmap = pyplot.cm.viridis
for i, val in enumerate(s):
    pyplot.plot(np.real(val), np.imag(val), 'o', ms=10, color=cmap(i/(len(s)-1)))
pyplot.show()
You can use pyplot.scatter, which allows coloring of points based on a value.
pyplot.scatter(np.real(s), np.imag(s), s=50, c=np.arange(len(s)), cmap='viridis')
Here, we set c to an increasing sequence to get the same result as in the question.
You can simply plot the real and imaginary part of the series without a loop.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
s = pd.Series((1+np.random.randn(500)*0.05)*np.exp(1j*np.linspace(-np.pi, np.pi, 500)))
plt.plot(s.values.real,s.values.imag, marker="o", ls="")
plt.show()
However, you need to use a scatter plot if you want to have different colors:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
s = pd.Series((1+np.random.randn(500)*0.05)*np.exp(1j*np.linspace(-np.pi, np.pi, 500)))
plt.scatter(s.values.real,s.values.imag, c = range(len(s)), cmap=plt.cm.viridis)
plt.show()
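To stay closer to the pandas plotting interface, a possible variant is to put the real and imaginary parts into a DataFrame and use DataFrame.plot.scatter. The column names below ("real", "imag", "order") are made up for this sketch.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series((1 + np.random.randn(500) * 0.05) * np.exp(1j * np.linspace(-np.pi, np.pi, 500)))

# Split the complex values into columns; "order" holds each point's
# position in the Series and is used only to pick the color.
df = pd.DataFrame({
    "real": s.values.real,
    "imag": s.values.imag,
    "order": np.arange(len(s)),
})

# c names the column mapped through the colormap; no explicit loop is needed
ax = df.plot.scatter(x="real", y="imag", c="order", colormap="viridis", s=50)
```

This draws all points in one call and adds a colorbar for the "order" column; call plt.show() to display it.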

how to modify the autocorrelation default plot style and write the output of a acorr function to a dat/txt file?

import matplotlib.pyplot as plt
import numpy as np
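import matplotlib.mlab as mlab
mu = np.loadtxt('my_data/corr.txt')
d = mu[:,2]
y=[]
tot=0
min=999
# Collect the values, their running total and their minimum
for i in d:
    y.append(float(i))
    tot=tot+float(i)
    if (min>float(i)):
        min=float(i)
av=tot/len(y)
z=[]
m=[]
# z: values with the mean removed, m: values with the minimum removed
for i in y:
    z.append(i-av)
    m.append(i-min)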
plt.acorr(z,usevlines=True,maxlags=None,normed=True)
plt.show()
With this code, the output is shown as a bar chart.
Now,
1) How do I change this plot style to show just the trend line? I can't modify the line properties by any means.
2) How do I write this output data to a dat or txt file?
This should be a working minimal example:
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import normal
data = normal(0, 1, 1000)
# return values are the lags, the correlation vector and the drawn line
# (the fourth value is None when usevlines=False)
lags, corr, line, rest = plt.acorr(data, marker=None, linestyle='-', color='red', usevlines=False)
plt.show()
np.savetxt("correlations.txt", np.transpose((lags, corr)), header='Lags\tCorrelation')
But I would recommend not connecting the points.
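If only the numbers are needed, the autocorrelation can also be computed without drawing anything. The sketch below uses np.correlate and, as far as I can tell, mirrors the normalization that plt.acorr applies with normed=True; the function name autocorr is hypothetical.

```python
import numpy as np


def autocorr(x):
    """Normalized autocorrelation of x at all lags, from -(n-1) to n-1."""
    x = np.asarray(x, dtype=float)
    # Full cross-correlation of the signal with itself
    corr = np.correlate(x, x, mode="full")
    # Normalize so that the value at lag 0 equals 1
    corr = corr / np.dot(x, x)
    lags = np.arange(-len(x) + 1, len(x))
    return lags, corr


rng = np.random.default_rng(0)
data = rng.normal(0, 1, 1000)
lags, corr = autocorr(data - data.mean())
```

The resulting lags and corr arrays can be written out with np.savetxt in the same way as in the example above, or plotted with plt.plot(lags, corr) to get a plain line.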

pyplot: loglog() with base e

Python (and matplotlib) newbie here coming over from R, so I hope this question is not too idiotic. I'm trying to make a loglog plot on a natural log scale. But after some googling I cannot somehow figure out how to force pyplot to use a base e scale on the axes. The code I have currently:
import matplotlib.pyplot as pyplot
import math
e = math.exp(1)
pyplot.loglog(range(1,len(degrees)+1),degrees,'o',basex=e,basey=e)
Where degrees is a vector of counts at each value of range(1,len(degrees)+1). For some reason when I run this code, pyplot keeps giving me a plot with powers of 2 on the axes. I feel like this ought to be easy, but I'm stumped...
Any advice is greatly appreciated!
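When plotting with plt.loglog you can pass the keyword arguments basex and basey, as shown below. (In Matplotlib 3.3 and later these were replaced by a single base keyword, so newer versions need base=np.e instead.)
NumPy provides the constant e as numpy.e (or np.e if you import numpy as np).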
import numpy as np
import matplotlib.pyplot as plt
# Generate some data.
x = np.linspace(0, 2, 1000)
y = x**np.e
plt.loglog(x,y, basex=np.e, basey=np.e)
plt.show()
Edit
Additionally if you want pretty looking ticks you can use matplotlib.ticker to choose the format of your ticks, an example of which is given below.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
x = np.linspace(1, 4, 1000)
y = x**3
fig, ax = plt.subplots()
ax.loglog(x,y, basex=np.e, basey=np.e)
def ticks(y, pos):
    # Doubled braces emit literal { }, so multi-digit or negative exponents render correctly
    return r'$e^{{{:.0f}}}$'.format(np.log(y))
ax.xaxis.set_major_formatter(mtick.FuncFormatter(ticks))
ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
plt.show()
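For recent Matplotlib releases, where (to my knowledge) basex and basey are no longer accepted, a version of the same idea using the base keyword might look like the sketch below. The formatter name e_power_label is made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

x = np.linspace(1, 4, 1000)
y = x ** 3


def e_power_label(value, pos):
    # Label a tick at value = e**k as "e^k"; the doubled braces emit
    # literal { } so the whole exponent is grouped in mathtext.
    return r'$e^{{{:.0f}}}$'.format(np.log(value))


fig, ax = plt.subplots()
# `base` applies to both axes of a loglog plot
ax.loglog(x, y, base=np.e)
ax.xaxis.set_major_formatter(mtick.FuncFormatter(e_power_label))
ax.yaxis.set_major_formatter(mtick.FuncFormatter(e_power_label))
```

Call plt.show() to display the figure. For semilogx or semilogy the same base keyword can be used for the single logarithmic axis.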
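This also works for semilogx and semilogy: you can show their axes in base e and relabel the ticks in the same way.
import matplotlib.ticker as mtick
fig, ax = plt.subplots()
def ticks(y, pos):
    # Doubled braces emit literal { }, so multi-digit or negative exponents render correctly
    return r'$e^{{{:.0f}}}$'.format(np.log(y))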
plt.semilogy(Time_Series, California_Pervalence ,'gray', basey=np.e )
ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
plt.show()
