How to plot min max line plot in python pandas - python

Hi, I have a data frame in the following format.
For simplicity I am showing the data categorized by year, but it actually contains quarterly data.
I want to draw a line plot with the min-max range as a shaded band and the mean as a line. I have tried several approaches, but I cannot get the output I need, shown below.
As an alternative, a box plot with mean, min and max would also work.
Data format
Output Needed

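IIUC, group by YEAR and aggregate your VALUE column by max, min and mean, then plot the mean and use fill_between to shade the area between max and min. Passing a renaming dict to agg on a single column raises an error in recent pandas versions, so named aggregation is used here.
import matplotlib.pyplot as plt
# named aggregation: result column name -> aggregation function
data = df.groupby('YEAR')['VALUE'].agg(**{'Low Value': 'min', 'High Value': 'max', 'Mean': 'mean'})
data.reset_index(inplace=True)
ax = data.plot(x='YEAR', y='Mean', c='white')
plt.fill_between(x='YEAR', y1='Low Value', y2='High Value', data=data)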

Related

I can't get my histogram to display solid bars instead it comes up with individual lines for each data point

I am completely new to python and I have been trying to create a histogram from a random 200x200 numpy array. I have included the code below:
random=np.random.normal(size=(200,200))
print(random)
plt.hist(random, bins=10, histtype='bar')
plt.savefig('histogram.jpeg')
The graph I get from this code looks like this:
I cannot get the graph to form bars for the amount of bins and I don't know why.
In addition to this, I need to overplot the mean, the median and the mean +/- 1 sigma (68% confidence interval) values as vertical lines. I know I can get the mean and median values using the code below, but I'm not sure how to overplot these values as vertical lines.
median = np.median(random)
mean = np.mean(random)

How can I plot the sum of values (rather than the count) using seaborn violinplot?

I have a data set with various quantities which I would like to make a violin plot of using seaborn. However, instead of visualizing the COUNT of occurrences of all of the quantities I'd like to display the SUM of the quantities.
For example, if I have a data set like this...
df = pd.DataFrame({'quantity':[1,1,1,4]})
sns.violinplot(data=df)
Actual Result
I really want it to display something like this...
df = pd.DataFrame({'quantity':[1,1,1,4,4,4,4]})
sns.violinplot(data=df)
Expected Result
My data set is around 500k values ranging between 1 and 10k, so it's not possible to transform the data as above (i.e. [1,1,2,3] -> [1,1,2,2,3,3,3]).
I know I could do something like this below to get the values I want and then plot with a bar plot or something...
df.quantity.value_counts() * df.quantity.value_counts().index
However, I really like seaborn's violin plot and the ability to pair it with catplot, so if anyone knows a way to do this in seaborn I'd be very grateful.
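For reference, the per-value sums mentioned above can be computed without expanding the data. The sketch below only produces the numbers; it does not make seaborn's violinplot use them. The bin edges are an arbitrary assumption for the small example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"quantity": [1, 1, 1, 4]})

# Sum contributed by each distinct quantity (1 -> 3, 4 -> 4).
sum_by_value = df.groupby("quantity")["quantity"].sum()

# Binned version: weighting each observation by its own value makes every
# bin hold the SUM of the quantities that fall in it rather than the COUNT.
bin_sums, bin_edges = np.histogram(
    df["quantity"], bins=[0, 2, 5], weights=df["quantity"]
)

print(sum_by_value)
print(bin_sums)
```

The weighted histogram scales to large inputs because it never materialises the repeated values.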

How to add condition to value_counts method

I have a dataframe named concatenated_df
I am plotting the data with the following code
(concatenated_df[concatenated_df.DAY.eq('Tuesday')].groupby('COMPANY')['STATUS'].value_counts(normalize=True).unstack().plot.bar())
plt.xticks(rotation=0)
plt.show()
which gives me an output plot as
How can I plot only those values which are greater than 0.8?
In the current example, it should print only VEDL.NS and WIPRO.NS
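You can filter the data frame down to the values above the threshold, save the result in a new data frame, and then plot that.
You need to specify which column you want to filter on. Note that with normalize=True the values are fractions, so the threshold in your case would be 0.8 rather than 80.
For example, with placeholder names df for the data frame and b for the column:
new_df = df[df.b > 80]
new_df.plot.bar()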
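Applied to the question's own pipeline, the filtering could look like the sketch below. The sample rows are hypothetical stand-ins, since the real concatenated_df is not shown, and keeping a company when any of its STATUS shares exceeds 0.8 is one reading of the requirement.

```python
import pandas as pd

# Hypothetical stand-in for concatenated_df; the real data is not shown.
concatenated_df = pd.DataFrame({
    "DAY": ["Tuesday"] * 10,
    "COMPANY": ["VEDL.NS"] * 5 + ["TCS.NS"] * 5,
    "STATUS": ["UP"] * 5 + ["UP", "UP", "UP", "DOWN", "DOWN"],
})

# Same chain as in the question, stopped before plotting.
shares = (
    concatenated_df[concatenated_df.DAY.eq("Tuesday")]
    .groupby("COMPANY")["STATUS"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# Keep only companies where at least one STATUS share exceeds 0.8.
filtered = shares[(shares > 0.8).any(axis=1)]
print(filtered)
# filtered.plot.bar() would then draw only the remaining companies.
```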

I want to create a pie chart using a dataframe column in python

I want to create a Pie chart using single column of my dataframe, say my column name is 'Score'. I have stored scores in this column as below :
Score
.92
.81
.21
.46
.72
.11
.89
Now I want to create a pie chart showing the percentage of scores that fall in each range.
Say 0-0.4 is 30%, 0.4-0.7 is 35% and 0.7+ is 35%.
I am using the code below:
df1['bins'] = pd.cut(df1['Score'],bins=[0,0.5,1], labels=["0-50%","50-100%"])
df1 = df.groupby(['Score', 'bins']).size().unstack(fill_value=0)
df1.plot.pie(subplots=True,figsize=(8, 3))
With the above code I get a pie chart, but I don't know how to show percentages.
My pie chart looks like this for now:
Cutting the dataframe up into bins is the right first step. After that, you can use value_counts with normalize=True to get the relative frequency of each value in the bins column, which gives the share of the data that falls in each range defined by the bins.
As for plotting the pie chart, I'm not sure I understood correctly, but it seems you would like to display the correct legend values and the percentage value in each slice of the pie.
The pandas.DataFrame.plot documentation is a good place to see all the parameters that can be passed to the plot method. You can specify which columns to use for x and y, and by default the dataframe index is used as the legend in the pie plot.
To show the percentage value per slice, you can use the autopct parameter. As mentioned in this answer, all the normal matplotlib plt.pie() flags can be used in the plot method as well.
Bringing everything together, this is the resultant code and the resultant chart:
import pandas as pd
df = pd.DataFrame({'Score': [0.92,0.81,0.21,0.46,0.72,0.11,0.89]})
df['bins'] = pd.cut(df['Score'], bins=[0,0.4,0.7,1], labels=['0-0.4','0.4-0.7','0.7-1'], right=True)
# name the column explicitly; the default name of the value_counts result differs between pandas versions
bin_percent = (df['bins'].value_counts(normalize=True) * 100).rename('percent').to_frame()
plot = bin_percent.plot.pie(y='percent', figsize=(5, 5), autopct='%1.1f%%')
Plot of Pie Chart

How to connect boxplot median values

It seems like plotting a line connecting the mean values of box plots would be a simple thing to do, but I couldn't figure out how to do this plot in pandas.
I'm using this syntax for the boxplot so that it automatically generates the box plot of Y vs. X without any external manipulation of the data frame:
df.boxplot(column='Y_Data', by="Category", showfliers=True, showmeans=True)
One way I thought of doing is to just do a line plot by getting the mean values from the boxplot, but I'm not sure how to extract that information from the plot.
You can save the axis object that gets returned from df.boxplot(), and plot the means as a line plot using that same axis. I'd suggest using Seaborn's pointplot for the lines, as it handles a categorical x-axis nicely.
First let's generate some sample data:
import pandas as pd
import numpy as np
import seaborn as sns
N = 150
values = np.random.random(size=N)
groups = np.random.choice(['A','B','C'], size=N)
df = pd.DataFrame({'value':values, 'group':groups})
print(df.head())
group value
0 A 0.816847
1 A 0.468465
2 C 0.871975
3 B 0.933708
4 A 0.480170
...
Next, make the boxplot and save the axis object:
ax = df.boxplot(column='value', by='group', showfliers=True,
                positions=range(df.group.unique().shape[0]))
Note: There's a curious positions argument in Pyplot/Pandas boxplot(), which can cause off-by-one errors. See more in this discussion, including the workaround I've employed here.
Finally, use groupby to get category means, and then connect mean values with a line plot overlaid on top of the boxplot:
sns.pointplot(x='group', y='value', data=df.groupby('group', as_index=False).mean(), ax=ax)
Your title mentions "median" but you talk about category means in your post. I used means here; change the groupby aggregation to median() if you want to plot medians instead.
You can get the values of the medians by calling the .get_data() method of the matplotlib.lines.Line2D objects that draw them, without having to use seaborn.
Let bp be your boxplot created as bp=plt.boxplot(data). Then, bp is a dict containing the medians key, among others. That key contains a list of matplotlib.lines.Line2D, from which you can extract the (x,y) position as follows:
bp = plt.boxplot(data)
X = []
Y = []
for m in bp['medians']:
    [[x0, x1], [y0, y1]] = m.get_data()
    X.append(np.mean((x0, x1)))
    Y.append(np.mean((y0, y1)))
plt.plot(X, Y, c='C1')
For an arbitrary dataset (data), this script generates this figure. Hope it helps!
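A self-contained version of the same idea, as a sketch: the three sample groups, the seed and the output file name are assumptions, since the original data is not shown. It also checks that the values read back from the median lines match np.median of each group.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (assumption, for reproducibility)
# Hypothetical data: three groups with different centres.
data = [rng.normal(loc=mu, size=50) for mu in (0.0, 1.0, 0.5)]

fig, ax = plt.subplots()
bp = ax.boxplot(data)

# Each median is a short horizontal Line2D; take its midpoint.
X, Y = [], []
for m in bp["medians"]:
    (x0, x1), (y0, y1) = m.get_data()
    X.append(np.mean((x0, x1)))
    Y.append(np.mean((y0, y1)))

# Connect the medians on the same axes as the boxes.
ax.plot(X, Y, c="C1", marker="o")
fig.savefig("boxplot_medians.png")

# Sanity check: the extracted heights are the group medians.
print(np.allclose(Y, [np.median(d) for d in data]))
```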
