I have a dataframe named concatenated_df
I am plotting the data with the following code
(concatenated_df[concatenated_df.DAY.eq('Tuesday')].groupby('COMPANY')['STATUS'].value_counts(normalize=True).unstack().plot.bar())
plt.xticks(rotation=0)
plt.show()
which gives me an output plot as
How can I plot only those values which are greater than 0.8?
In the current example, it should print only VEDL.NS and WIPRO.NS
You can filter the DataFrame down to the rows with values greater than the threshold, save the result in a new DataFrame, and then plot that.
You need to specify which column you want to filter on.
For example, with a column b and a threshold of 80:
new_df = df[df.b > 80]
new_df.plot()
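In the question above, the threshold applies to the normalized shares (0.8) rather than to a raw column. A minimal sketch, assuming made-up sample data in place of concatenated_df and a hypothetical helper named shares_above, keeps only the companies where at least one STATUS share exceeds the threshold:

```python
import pandas as pd

# Made-up sample data standing in for concatenated_df (an assumption, not the asker's data).
rows = (
    [("Tuesday", "VEDL.NS", "UP")] * 5
    + [("Tuesday", "WIPRO.NS", "UP")] * 1
    + [("Tuesday", "WIPRO.NS", "DOWN")] * 5
    + [("Tuesday", "TCS.NS", "UP")] * 2
    + [("Tuesday", "TCS.NS", "DOWN")] * 3
    + [("Monday", "TCS.NS", "UP")] * 4
)
concatenated_df = pd.DataFrame(rows, columns=["DAY", "COMPANY", "STATUS"])


def shares_above(df, day, threshold):
    """Hypothetical helper: per-company STATUS shares for one day,
    keeping only companies where some share exceeds the threshold."""
    shares = (
        df[df["DAY"].eq(day)]
        .groupby("COMPANY")["STATUS"]
        .value_counts(normalize=True)
        .unstack(fill_value=0)
    )
    # Keep a company (row) if any of its STATUS shares is above the threshold.
    return shares[shares.gt(threshold).any(axis=1)]


filtered = shares_above(concatenated_df, "Tuesday", 0.8)

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    filtered.plot.bar(rot=0)
    plt.show()
```

With this sample data only VEDL.NS and WIPRO.NS survive the filter. Filtering the unstacked table by row keeps every bar of a selected company, not just the one above the threshold.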
Related
I have the following pandas Data Frame:
and I need to make line plots using the column names (400, 400.5, 401....) as the x axis and the data frame values as the y axis, and using the index column ('fluorophore') as the label for that line plot. I want to be able to choose which fluorophores I want to plot.
How can I accomplish that?
I do not know your dataset, so if it's always just full columns of NaN you could do
df[non_nan_cols].T[['FAM', 'TET']].plot.line()
Where non_nan_cols is a list of your columns that do not contain NaN values.
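One way to build non_nan_cols, sketched here with made-up data (the wavelengths, values and the fluorophore name HEX are assumptions), is to drop every column that contains a NaN and keep the remaining column labels:

```python
import numpy as np
import pandas as pd

# Made-up data: rows are fluorophores, columns are wavelengths.
df = pd.DataFrame(
    {
        400.0: [0.1, 0.2, 0.3],
        400.5: [np.nan, np.nan, np.nan],
        401.0: [0.4, 0.5, 0.6],
    },
    index=pd.Index(["FAM", "TET", "HEX"], name="fluorophore"),
)

# dropna(axis=1) removes every column that contains at least one NaN.
non_nan_cols = df.dropna(axis=1).columns.tolist()

# Transpose so the wavelengths become the index (the x axis), then pick fluorophores.
selected = df[non_nan_cols].T[["FAM", "TET"]]

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    selected.plot.line()
    plt.show()
```

After the transpose each chosen fluorophore is a column, so pandas draws one labelled line per fluorophore.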
Alternatively, you could plot each row in a loop:
import numpy as np
import matplotlib.pyplot as plt

choice_of_fp = df.index.tolist()
x_val = np.asarray(df.columns.tolist())
for i in choice_of_fp:
    # Skip the NaN entries of this row
    mask = np.isfinite(df.loc[i].values)
    plt.plot(x_val[mask], df.loc[i].values[mask], label=i)
plt.legend()
plt.show()
which tolerates NaN values. Here choice_of_fp is a list containing the fluorophores you want to plot.
You can do the following; it uses all columns except the index and plots the chart. Transposing puts the column names on the x axis and draws one line per fluorophore.
abs_data.set_index('fluorophore').T.plot()
If you want to plot only certain fluorophores, filter first:
abs_data[abs_data.fluorophore.isin(['A', 'B'])].set_index('fluorophore').T.plot()
Suppose I have more than 100 columns in a dataset.
I want to show the column count alongside the column names (i.e. the xticklabels) when displaying the heatmap, because it is hard to tell whether the heatmap is displaying all the columns or not.
I want to plot the column count on the x-axis of the heatmap to see how many columns are being displayed.
Currently I'm using this code to plot the heatmap:
plt.figure(figsize=(15, 8))
sns.heatmap(df_test.isnull(), yticklabels=False)
which gives output:
[image: actual output]
[image: expected output, sample marked with column index]
I am a beginner in seaborn plotting and noticed that sns.barplot shows the value of bars using a parameter called estimator.
Is there a way for the barplot to show the value of each column instead of using a statistical approach through the estimator parameter?
For instance, I have the following dataframe:
data = [["2019/oct",10],["2019/oct",20],["2019/oct",30],["2019/oct",40],["2019/nov",20],["2019/dec",30]]
df = pd.DataFrame(data, columns=['Period', 'Observations'])
I would like to plot all values from the Period "2019/oct" column (10,20,30 and 40), but the bar chart returns the average of these values (25) for the period "2019/oct":
sns.barplot(x='Period',y='Observations',data=df,ci=None)
How can I bring all column values to the chart?
barplot combines values with the same x, unless they have a different hue. If you want to keep the separate values for "2019/oct", you could create a new column that assigns each of them a different hue:
data = [["2019/oct",10],["2019/oct",20],["2019/oct",30],["2019/oct",40],["2019/nov",20],["2019/dec",30]]
df = pd.DataFrame(data, columns=['Period', 'Observations'])
df['subgroup'] = df.groupby('Period').cumcount()+1
sns.barplot(x='Period',y='Observations',hue='subgroup',data=df,ci=None)
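To see why the extra column separates the bars, it can help to look at what cumcount produces. This short sketch reuses the data from the question and only prints the resulting frame:

```python
import pandas as pd

data = [["2019/oct", 10], ["2019/oct", 20], ["2019/oct", 30],
        ["2019/oct", 40], ["2019/nov", 20], ["2019/dec", 30]]
df = pd.DataFrame(data, columns=["Period", "Observations"])

# cumcount numbers the rows within each Period starting at 0; adding 1 starts at 1.
df["subgroup"] = df.groupby("Period").cumcount() + 1
print(df)
```

The four "2019/oct" rows get subgroups 1 to 4, while the single "2019/nov" and "2019/dec" rows each get 1, so every (Period, subgroup) pair holds exactly one observation and barplot has nothing left to average.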
I want to create a pie chart using a single column of my dataframe; say the column name is 'Score'. I have stored scores in this column as below:
Score
.92
.81
.21
.46
.72
.11
.89
Now I want to create a pie chart with the range in percentage.
Say 0-0.4 is 30%, 0.4-0.7 is 35%, 0.7+ is 35%.
I am using the code below:
df1['bins'] = pd.cut(df1['Score'],bins=[0,0.5,1], labels=["0-50%","50-100%"])
df1 = df1.groupby(['Score', 'bins']).size().unstack(fill_value=0)
df1.plot.pie(subplots=True,figsize=(8, 3))
With the above code I am getting the pie chart, but I don't know how I can do this using percentages.
My pie chart looks like this for now:
Cutting the dataframe up into bins is the right first step. After that, you can use value_counts with normalize=True to get the relative frequencies of the values in the bins column. This shows the percentage of the data that falls into each of the ranges defined by the bins.
In terms of plotting the pie chart, I'm not sure if I understood correctly, but it seems you would like to display the correct legend values and the percentage value in each slice of the pie.
The pandas.DataFrame.plot documentation is a good place to see all the parameters that can be passed into the plot method. You can specify which x and y columns to use, and by default the dataframe index is used as the legend in the pie plot.
To show the percentage value per slice, you can use the autopct parameter. As mentioned in this answer, you can use all the normal matplotlib plt.pie() flags in the plot method as well.
Bringing everything together, this is the resultant code and the resultant chart:
import pandas as pd

df = pd.DataFrame({'Score': [0.92,0.81,0.21,0.46,0.72,0.11,0.89]})
df['bins'] = pd.cut(df['Score'], bins=[0,0.4,0.7,1], labels=['0-0.4','0.4-0.7','0.7-1'], right=True)
# Name the column explicitly: the default name from value_counts differs between pandas versions
bin_percent = (df['bins'].value_counts(normalize=True) * 100).rename('percent').to_frame()
plot = bin_percent.plot.pie(y='percent', figsize=(5, 5), autopct='%1.1f%%')
Plot of Pie Chart
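The percentages that end up in the slices can be checked without plotting. This is a small sketch using the same scores and bin edges as above; sorting by index is my addition, so that the bins print in range order rather than by size:

```python
import pandas as pd

df = pd.DataFrame({"Score": [0.92, 0.81, 0.21, 0.46, 0.72, 0.11, 0.89]})
df["bins"] = pd.cut(df["Score"], bins=[0, 0.4, 0.7, 1],
                    labels=["0-0.4", "0.4-0.7", "0.7-1"])

# Share of rows per bin, in percent, ordered by bin rather than by size.
percent = (df["bins"].value_counts(normalize=True) * 100).sort_index()
print(percent.round(1))
```

For these seven scores the shares are roughly 28.6%, 14.3% and 57.1% (2, 1 and 4 scores out of 7), which is what autopct writes into the slices.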
Hi, I have a data frame in the following format.
For simplicity I am showing the data categorized by year, but it contains quarterly data.
I want to draw a line plot with the min and max as a shaded band and the mean as a line. I tried different ways to do it, but I am not able to get the output I need, shown below.
As an alternative, a box plot with mean, min and max will also work.
Data format
Output Needed
IIUC, group by YEAR and aggregate your VALUE column by max, min and mean, then plot the mean and use fill_between to shade the area between max and min.
data = df.groupby('YEAR')['VALUE'].agg(**{'Low Value': 'min', 'High Value': 'max', 'Mean': 'mean'})
data.reset_index(inplace=True)
ax = data.plot(x='YEAR', y='Mean', c='white')
plt.fill_between(x='YEAR',y1='Low Value',y2='High Value', data=data)
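For a version that can be run end to end, here is a minimal sketch with made-up quarterly data. The numbers and the short column names low, high and mean are assumptions; only YEAR and VALUE come from the answer. It uses keyword-style named aggregation and passes the columns to fill_between directly:

```python
import pandas as pd

# Made-up quarterly data: four values per year.
df = pd.DataFrame({
    "YEAR": [2017] * 4 + [2018] * 4,
    "VALUE": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0],
})

# One row per year with the min, max and mean of VALUE.
stats = (
    df.groupby("YEAR")["VALUE"]
    .agg(low="min", high="max", mean="mean")
    .reset_index()
)

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    # Mean as a line, min-max range as a translucent band behind it.
    ax = stats.plot(x="YEAR", y="mean")
    ax.fill_between(stats["YEAR"], stats["low"], stats["high"], alpha=0.3)
    plt.show()
```

A translucent band (alpha=0.3) keeps the mean line visible in its default colour, so the line does not need to be drawn in white.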