Plot quartiles of data series in a matplotlib chart - python

I would like to illustrate the quartiles of a distribution sample with matplotlib. It is probably best explained by an example:
import matplotlib.pyplot as plt
import numpy as np
import random
x = sorted([random.randrange(0,n) for n in range(1,1000)])
median_y = np.median(x)
median_x = x.index(median_y)
plt.plot(x)
plt.plot((median_x,median_x), (0,median_y),'k:')
plt.plot((0,median_x), (median_y,median_y),'k:')
Do you see a more convenient way to add quartiles 1,2 (median), and 3 than my clumsy solution? I could not find any command to plot a point with helper lines like this. And how could I add numbers to the points or axes?
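One possible approach, sketched here rather than taken from an accepted answer: compute all three quartiles with np.percentile, draw the helper lines in a loop, and label each point with ax.annotate. The helper name add_quartile_guides is hypothetical, and locating the x position with np.searchsorted assumes the data is already sorted, as it is in the question.

```python
import random

import matplotlib.pyplot as plt
import numpy as np


def add_quartile_guides(ax, data):
    """Draw dotted guides for Q1, Q2, Q3 of sorted data (hypothetical helper)."""
    data = np.asarray(data)
    quartiles = np.percentile(data, [25, 50, 75])
    # Index of the first element >= each quartile value (requires sorted data).
    positions = np.searchsorted(data, quartiles)
    for label, pos, value in zip(("Q1", "Q2", "Q3"), positions, quartiles):
        ax.plot((pos, pos), (0, value), "k:")    # vertical guide
        ax.plot((0, pos), (value, value), "k:")  # horizontal guide
        # Text label placed 5 points up and right of the marked point.
        ax.annotate(f"{label} = {value:g}", xy=(pos, value),
                    xytext=(5, 5), textcoords="offset points")
    return positions, quartiles


x = sorted(random.randrange(0, n) for n in range(1, 1000))
fig, ax = plt.subplots()
ax.plot(x)
positions, quartiles = add_quartile_guides(ax, x)
```

Calling plt.show() afterwards displays the chart. Numbers on the axes themselves could be added with ax.set_xticks and ax.set_yticks at the returned positions and quartile values.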

Related

Controlling Bin Widths in Altair

I have a set of numbers that I'd like to plot on a histogram.
Say:
import numpy as np
import matplotlib.pyplot as plt
my_numbers = np.random.normal(size = 1000)
plt.hist(my_numbers)
If I want to control the size and range of the bins I could do this:
plt.hist(my_numbers, bins=np.arange(-4,4.5,0.5))
Now, if I want to plot a histogram in Altair the code below will do, but how do I control the size and range of the bins in Altair?
import pandas as pd
import altair as alt
my_numbers_df = pd.DataFrame.from_dict({'Integers': my_numbers})
alt.Chart(my_numbers_df).mark_bar().encode(
alt.X("Integers", bin = True),
y = 'count()',
)
I have searched Altair's docs but all their explanations and sample charts (that I could find) just said bin = True with no further modification.
Appreciate any pointers :)
As demonstrated briefly in the Bin transforms section of the documentation, you can pass an alt.Bin() instance to fine-tune the binning parameters.
The equivalent of your matplotlib histogram would be something like this:
alt.Chart(my_numbers_df).mark_bar().encode(
alt.X("Integers", bin=alt.Bin(extent=[-4, 4], step=0.5)),
y='count()',
)
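To see which bins extent=[-4, 4] and step=0.5 correspond to on the matplotlib side, the edges can be reproduced with NumPy alone. This is only a sanity-check sketch: it does not call Altair, and np.histogram silently ignores values that fall outside the outermost edges.

```python
import numpy as np

my_numbers = np.random.normal(size=1000)

# Edges from -4 to 4 inclusive in steps of 0.5, as in the plt.hist call.
edges = np.arange(-4, 4.5, 0.5)
counts, returned_edges = np.histogram(my_numbers, bins=edges)

print(len(edges) - 1, "bins of width", edges[1] - edges[0])
```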

Beginner question: Python scatter plot with normal distribution not plotting

I have an array of random integers for which I have calculated the mean and the standard deviation (std). Next I have an array of random numbers drawn from the normal distribution with this (mean, std).
I now want to make a scatter plot of the normal distribution array using matplotlib. Can you please help?
Code:
random_array_a = np.random.randint(2,15,size=75) #random array from [2,15)
mean = np.mean(random_array_a)
std = np.std(random_array_a)
sample_norm_distrib = np.random.normal(mean,std,75)
A scatter plot needs x and y values, but what should they be?
I think what you may want is a histogram of the normal distribution:
import matplotlib.pyplot as plt
%matplotlib inline
plt.hist(sample_norm_distrib)
The closest thing you can do to visualise the distribution of a 1D output is a scatter plot where x and y are the same. This way you can see more points accumulating in the high-probability areas. For example:
import numpy as np
import matplotlib.pyplot as plt
mean = 0
std = 1
sample_norm_distrib = np.random.normal(mean,std,7500)
plt.figure()
plt.scatter(sample_norm_distrib,sample_norm_distrib)
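Another option, offered as a sketch rather than as either answerer's method: use the position of each draw in the array as x and the drawn value as y, then mark the mean and one standard deviation on either side with ax.axhline. The axis labels are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

random_array_a = np.random.randint(2, 15, size=75)  # integers in [2, 15)
mean = np.mean(random_array_a)
std = np.std(random_array_a)
sample_norm_distrib = np.random.normal(mean, std, 75)

fig, ax = plt.subplots()
# x is just the index of each draw; y is the drawn value.
ax.scatter(np.arange(sample_norm_distrib.size), sample_norm_distrib)
ax.axhline(mean, color="k")                        # centre of the distribution
ax.axhline(mean - std, color="k", linestyle=":")   # one std below the mean
ax.axhline(mean + std, color="k", linestyle=":")   # one std above the mean
ax.set_xlabel("sample index")
ax.set_ylabel("value")
```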

Tracing functions in python

I was searching for how to plot graphs of functions, and not only linear ones. I know how to plot straight lines between simple points, like the ones below:
import numpy
import matplotlib.pyplot as plt
%matplotlib inline
_=plt.plot([4,7],[5,7],color ='w')
_=plt.plot([4,7],[7,7],color ='w')
ax = plt.gca()
ax.set_facecolor('xkcd:red')
plt.show()
Then, after a bit of searching, I found this code:
import pylab
import numpy
x = numpy.linspace(-15,15,100) # 100 linearly spaced numbers
y = numpy.sin(x)/x # computing the values of sin(x)/x
# compose plot
pylab.plot(x,y) # sin(x)/x
pylab.plot(x,y,'co') # same function with cyan dots
pylab.plot(x,2*y,x,3*y) # 2*sin(x)/x and 3*sin(x)/x
pylab.show() # show the plot
That works perfectly! But what I'm wondering is: do we really need to use standard functions that are defined by NumPy (like sin(x)/x here)? Or can we define a function ourselves, like x**3, and use it on NumPy arrays too?
This solved the issue, thanks FlyingTeller.
An example of y=x**3 graph:
import pylab
import numpy
x = numpy.linspace(-15,15,100) # 100 linearly spaced numbers
y = x**3 # change this expression to plot any function you want
# compose plot
pylab.plot(x,y)
pylab.show()
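More generally, any Python function built from arithmetic operators works elementwise on a NumPy array. A function that branches on a single scalar does not, but it can be wrapped with np.vectorize. The function names cubic and step below are made up for this sketch, and np.vectorize is a convenience loop rather than a speed-up.

```python
import numpy as np


def cubic(x):
    # Arithmetic operators apply elementwise to NumPy arrays.
    return x**3 - 2 * x


def step(x):
    # Plain Python branching only handles one scalar at a time.
    return 1.0 if x >= 0 else -1.0


x = np.linspace(-15, 15, 100)
y_cubic = cubic(x)               # works directly on the whole array
y_step = np.vectorize(step)(x)   # wrap scalar-only functions first
```

Either result can then be passed to pylab.plot(x, y_cubic) or pylab.plot(x, y_step) exactly as in the example above.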

Custom scale from simple list or dict?

I need to make a custom scale for an axis. Before diving into http://matplotlib.org/examples/api/custom_scale_example.html, I'm wondering if there is an easier way for my special case.
A picture is worth a thousand words, so here we go:
See the value in each row next to the filename? I would like the row height to be proportional to the difference between that value and the previous one. I'd start from 0 and would have to define a top limit so that the last row is visible.
Try matplotlib's pcolormesh, with which you can create irregularly spaced grids.
from matplotlib import pyplot as plt
import numpy as np
y1D = np.hstack([0, np.random.random(10)])  # 11 edges for 10 rows
y1D = np.sort(y1D)/np.max(y1D)
x, y = np.meshgrid(np.arange(0,1.1,0.1),y1D)
plt.pcolormesh(x,y, np.random.random((10,10)))
plt.show()
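Applied to a plain list of row values, the same idea might look like the sketch below. The numbers in row_starts, the top_limit, and the column count are invented for illustration, and treating each value as the lower edge of its row is an assumption about the intended layout.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: the value where each row starts, in ascending order.
row_starts = np.array([0.0, 1.5, 2.0, 4.5, 5.0])
top_limit = 6.0   # chosen so the last row is visible
n_cols = 4

# N rows need N + 1 horizontal edges; columns are evenly spaced.
y_edges = np.append(row_starts, top_limit)
x_edges = np.arange(n_cols + 1)
cell_values = np.random.random((row_starts.size, n_cols))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, cell_values)
# Put one tick at the vertical centre of each row and label it.
ax.set_yticks((y_edges[:-1] + y_edges[1:]) / 2)
ax.set_yticklabels([f"File {i}" for i in range(row_starts.size)])
```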
You can use this recipe and adapt it to your needs:
import numpy as np
import matplotlib.pyplot as plt
grid = np.zeros((20,20))
for i in range(grid.shape[0]):
    r = np.random.randint(1,19)
    grid[i,:r] = np.random.randint(10,30,size=(r,))
plt.imshow(grid,origin='lower',cmap='Reds',interpolation='nearest')
plt.yticks(list(range(20)),['File '+str(i) for i in range(20)])
plt.colorbar()
plt.show()
The result is this:

Plotting random point on Function - Pandas

I want to graph a function in 2D or 3D, for example f(x) = sin(x), and then plot a certain number of random points on it.
I am using IPython and I think this might be possible using Pandas.
You can use np.random.uniform to generate a few random points along the x-axis and calculate the corresponding f(x) values.
import numpy as np
import matplotlib.pyplot as plt
# generate 20 points from uniform (-3,3)
x = np.random.uniform(-3, 3, size=20)
y = np.sin(x)
fig, ax = plt.subplots()
ax.scatter(x,y)
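To show the random points on top of the function itself, the curve can be drawn from a dense, evenly spaced grid first. This is a sketch along the same lines; the grid size of 200, the colours, and the use of np.random.default_rng are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng()

# Dense, evenly spaced grid for the smooth curve.
x_curve = np.linspace(-3, 3, 200)
# 20 random x positions in [-3, 3); y is computed so each point lies on the curve.
x_points = rng.uniform(-3, 3, size=20)
y_points = np.sin(x_points)

fig, ax = plt.subplots()
ax.plot(x_curve, np.sin(x_curve), label="f(x) = sin(x)")
ax.scatter(x_points, y_points, color="red", zorder=3, label="random points")
ax.legend()
```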
You should post example code so people can demonstrate it more easily.
(numpy.random.random(10)*x_scale)**2
Generate an array of random numbers between 0 and 1 and scale it as appropriate, so for the interval (-10, 0):
10*numpy.random.random(100) - 10
Then pass this to any function that can calculate the value of f(x) for each element of the array.
Use reshape() if you need to play around with the layout of the array.
If you want to use Pandas:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.linspace(0, 8)
y = np.sin(x)
DF = pd.DataFrame({'x': x, 'y': y})
Plot the values:
DF.plot(x='x', y='y')
Make a random index:
RandIndex = np.random.randint(0, len(DF), size=20)
Use it to select from the original DF and plot:
DF.iloc[RandIndex].plot(x='x', y='y', kind='scatter', s=120, ax=plt.gca())
