I am creating a stacked area chart using pandas df.plot(kind='area'). Some of my data values are zero at certain times, and I would like the line not to show where the value is zero. Is it possible to hide the line while still showing the area?
Here is basic code that makes a simple graph. I don't want the red line to show between 3 and 4 because the values are 0.
import numpy as np
import pandas as pd
data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, columns = ['A','B','C'])
df['C']=np.where(df.index==4,0,df['C'])
df['C']=np.where(df.index==3,0,df['C'])
df.plot(kind='area')
I finally worked out a solution to this. Other answers suggested edgecolor and similar options, but those did not solve the problem. Setting linewidth, however, does:
linewidth=0
or, in your case, use the line of code:
df.plot(kind='area', linewidth=0)
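As a quick check of the answer above, here is a minimal sketch that rebuilds the question's data and confirms that the outline lines end up with zero width. Note that linewidth=0 removes every outline in the chart, not only the segments where a value is zero. The Agg backend line is an assumption for running outside a notebook and can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import numpy as np
import pandas as pd

# Same data as in the question: three identical ramps, with C zeroed at rows 3 and 4.
data = np.array([np.arange(10)] * 3).T
df = pd.DataFrame(data, columns=["A", "B", "C"])
df.loc[[3, 4], "C"] = 0

# linewidth=0 keeps the filled areas but gives the outline lines zero width.
ax = df.plot(kind="area", linewidth=0)
line_widths = [line.get_linewidth() for line in ax.lines]
```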
I have numerous sets of seasonal data that I want to show in a heatmap format. I am not concerned with the magnitude of the values, only with the overall direction and any patterns that I can look at in more detail later. To do this I want to create a heatmap that shows only 2 colours (red for below zero and green for zero and above).
I can create a normal heatmap with seaborn but the normal colour maps do not have only 2 colours and I am not able to create one myself. Even if I could I am unable to set the parameters to reflect the criteria of below zero = red and zero+ = green.
I managed to create this simply by styling the dataframe, but I was unable to export it as a .png because the table_conversion='matplotlib' option removes the formatting.
Below is an example of what I would like to create, made from random data. Could someone help or point me in the direction of a helpful Stack Overflow answer?
I have also included the code I used to style and export the dataframe.
Desired output - this is created with random data in an Excel spreadsheet
# Code to create a regular heatmap - can this be easily amended?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# filename, h and chartpath are defined elsewhere in my script
df_hm = pd.read_csv(filename+h)
pivot = df_hm.pivot_table(index='Year', columns='Month', values='delta', aggfunc='sum')
fig, ax = plt.subplots(figsize=(10,5))
ax.set_title('M1 '+h[:-7])
sns.heatmap(pivot, annot=True, fmt='.2f', cmap='RdYlGn')
plt.savefig(chartpath+h[:-7]+" M1.png", bbox_inches='tight')
plt.close()
# Code used to export the styled dataframe; the formatting is lost in the .png
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dataframe_image as dfi
# pivot is the dataframe name; title and root are defined elsewhere in my script
pivot = pd.DataFrame(np.random.randint(-100, 100, size=(5, 12)), columns=list('ABCDEFGHIJKL'))
styles = [dict(selector="caption", props=[("font-size", "120%"), ("font-weight", "bold")])]
pivot = (
    pivot.style.format(precision=2)
    .highlight_between(left=-100000, right=-0.01, props='color:white;background-color:red')
    .highlight_between(left=0, right=100000, props='color:white;background-color:green')
    .set_caption(title)
    .set_table_styles(styles)
)
dfi.export(pivot, root+'testhm.png', table_conversion='matplotlib', chrome_path=None)
You can set the cmap parameter to a list of colours. If you annotate the heatmap, the annotations still show the original values, because the data is not converted to -1 or 1.
import numpy as np
import seaborn as sns
arr = np.random.randn(10,10)
sns.heatmap(arr,cmap=["grey",'green'],annot=True,center=0)
# center=0 makes zero the dividing point between the two colours
Output:
PS. If you don't want the colour bar, you can pass cbar=False to sns.heatmap.
Welcome to SO!
To achieve what you need, pass delta through the sign function. Here is some example code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
arr = np.random.randn(25,25)
sns.heatmap(np.sign(arr))
This results in a binary heatmap (np.sign also maps exact zeros to 0, which would show up as a third colour), albeit one with a rather ugly colormap. You can experiment with Seaborn's colormaps to make it look like the Excel version.
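Combining the two answers above, here is a minimal sketch that colours cells by class (below zero vs. zero and above) while annotating them with the original values. It uses plain matplotlib rather than seaborn so that the colour mapping is explicit; the random pivot table and the name classes are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap

# Random stand-in for the pivot table in the question.
rng = np.random.default_rng(0)
pivot = pd.DataFrame(rng.integers(-100, 100, size=(5, 12)),
                     columns=list("ABCDEFGHIJKL"))

# 0 = below zero (red), 1 = zero and above (green).
classes = (pivot.to_numpy() >= 0).astype(int)
cmap = ListedColormap(["red", "green"])

fig, ax = plt.subplots(figsize=(10, 5))
ax.imshow(classes, cmap=cmap, vmin=0, vmax=1, aspect="auto")
ax.set_xticks(range(pivot.shape[1]))
ax.set_xticklabels(pivot.columns)
ax.set_yticks(range(pivot.shape[0]))
ax.set_yticklabels(pivot.index)

# Annotate each cell with the original value, not the 0/1 class.
for (row, col), value in np.ndenumerate(pivot.to_numpy()):
    ax.text(col, row, f"{float(value):.2f}",
            ha="center", va="center", color="white")
```

Calling fig.savefig(...) with bbox_inches='tight' afterwards should write the coloured table to a .png, as in the question's own heatmap code.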
1 - My goal is to create a bar plot of grades (y axis) against student IDs (x axis).
2 - Add an extra column with the mean() of the grades in a different color.
What's the best way of doing it?
I managed the first part, but when it came to changing the color of the extra column (the mean), I couldn't finish it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = pd.read_excel('x.xlsx')
Felipe_stu = a['Teacher'] == 'Felipe'
Felipe_stu.plot(kind = 'bar', figsize = (20,5), color = 'gold')
Example of data (the first 10):
data example
Example of plot:
I've already tried to create a list with all the colors of the respective items on the plot.
Such as:
my_color = []
for c in range(0, len(Jorge_stu)):
    my_color.append('gold')
my_color.append('blue')
So, I would make the last column (the mean) in the color that I chose (blue in this case). This didn't work.
Any ideas how can I put the mean column on my plot?
Is it a better option to add an extra column to the plot or to add it in the proper dataframe and afterwards plot it?
You may need to do something like the approach in this question:
How to create a matplotlib bar chart with a threshold line?
The threshold value in that example would be your mean line, which can be calculated simply with df[score_column_name].mean()
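A minimal sketch of both ideas, assuming a small made-up table: the mean is appended as an extra bar with its own colour, and a dashed line is drawn at the same height. The column names Student and Grade and the values are hypothetical, since the real spreadsheet is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the rows of one teacher's students.
students = pd.DataFrame({
    "Student": ["s01", "s02", "s03", "s04"],
    "Grade": [7.5, 6.0, 9.0, 4.5],
})
mean_grade = students["Grade"].mean()

# One colour per bar: gold for every student, blue for the extra mean bar.
labels = list(students["Student"]) + ["mean"]
heights = list(students["Grade"]) + [mean_grade]
colors = ["gold"] * len(students) + ["blue"]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(labels, heights, color=colors)

# Optional threshold-style line at the mean, as in the linked question.
ax.axhline(mean_grade, color="blue", linestyle="--", linewidth=1)
ax.set_xlabel("Student ID")
ax.set_ylabel("Grade")
```

Building the labels, heights and colours as plain lists keeps the mean out of the original dataframe, so later calculations on the grades are not affected by the extra row.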
My simple Dataframe produces a plot with 4 single, horizontal bars, rather than one stacked horizontal bar. I've tried transposing it etc - without success. I'm sure I'm doing something simple wrong - but I can't work it out. Help much appreciated!
import pandas as pd
import matplotlib.pyplot as plt
fake_data = [['dogs',12],['cats',8],['fish',22],['bird',8]]
myDF = pd.DataFrame(fake_data)
myDF.columns = ['animals','count']
myDF.plot.barh(stacked=True)
plt.show()
I think you need to create a one-row DataFrame with Series.to_frame and transpose it with DataFrame.T:
myDF.set_index('animals')['count'].to_frame().T.plot.barh(stacked=True)
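To see why the reshaping matters, here is a small sketch using the question's data. pandas stacks the columns within each row, so the four animals have to become four columns of a single row; the variable names one_row and segment_widths are introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import pandas as pd

fake_data = [["dogs", 12], ["cats", 8], ["fish", 22], ["bird", 8]]
myDF = pd.DataFrame(fake_data, columns=["animals", "count"])

# One row ('count') and one column per animal.
one_row = myDF.set_index("animals")["count"].to_frame().T

# Each column becomes one segment of the single stacked bar.
ax = one_row.plot.barh(stacked=True)
segment_widths = [patch.get_width() for patch in ax.patches]
```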
I want to create a matplotlib bar plot that has the look of a stacked plot without being additive from a multi-index pandas dataframe.
The below code gives the basic behaviour
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import io
data = io.StringIO('''Fruit,Color,Price
Apple,Red,1.5
Apple,Green,1.0
Pear,Red,2.5
Pear,Green,2.3
Lime,Green,0.5
Lime,Red,3.0
''')
df_unindexed = pd.read_csv(data)
df_unindexed
df = df_unindexed.set_index(['Fruit', 'Color'])
df.unstack().plot(kind='bar')
The plot command df.unstack().plot(kind='bar') shows all the apple prices grouped next to each other. If you choose the option df.unstack().plot(kind='bar',stacked=True) - it adds the prices for Red and Green together and stacks them.
I want a plot that is halfway between the two: it shows each group as a single bar, but overlays the values so you can see them all. The figure below (done in PowerPoint) shows the behaviour I am looking for; I want the image on the right.
Short of calculating all the values and then using the stacked option, is this possible?
This seems (to me) like a bad idea, since this representation leads to several problems. Will a reader understand that those are not stacked bars? What happens when the front bar is taller than the ones behind?
In any case, to accomplish what you want, I would simply repeatedly call plot() on each subset of the data and using the same axes so that the bars are drawn on top of each other.
In your example, the "Red" prices are always higher, so I had to adjust the order to plot them in the back, or they would hide the "Green" bars.
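fig, ax = plt.subplots()
# Plot 'Red' first so that it ends up behind the shorter 'Green' bars
my_groups = ['Red', 'Green']
df_group = df_unindexed.groupby("Color")
for color in my_groups:
    temp_df = df_group.get_group(color)
    # Reusing the same axes draws each group on top of the previous one
    temp_df.plot(kind='bar', ax=ax, x='Fruit', y='Price', color=color, label=color)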
There are two problems with this kind of plot. (1) What if the background bar is smaller than the foreground bar? It would simply be hidden and not visible. (2) A chart like this is not distinguishable from a stacked bar chart. Readers will have severe problems interpreting it.
That being said, you can plot both columns individually.
import matplotlib.pyplot as plt
import pandas as pd
import io
data = io.StringIO('''Fruit,Color,Price
Apple,Red,1.5
Apple,Green,1.0
Pear,Red,2.5
Pear,Green,2.3
Lime,Green,0.5
Lime,Red,3.0''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Fruit', 'Color']).unstack()
df.columns = df.columns.droplevel()
plt.bar(df.index, df["Red"].values, label="Red")
plt.bar(df.index, df["Green"].values, label="Green")
plt.legend()
plt.show()
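Both answers rely on the "Red" price always being the larger one. A possible way around the hidden-bar problem, sketched here under that caveat, is to draw the bars of each fruit from tallest to shortest, so the shorter bar is always painted last. The names df_long, seen and drawn_heights are introduced for this sketch only.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

data = io.StringIO('''Fruit,Color,Price
Apple,Red,1.5
Apple,Green,1.0
Pear,Red,2.5
Pear,Green,2.3
Lime,Green,0.5
Lime,Red,3.0''')
df_long = pd.read_csv(data)

fig, ax = plt.subplots()
seen = set()  # colours that already have a legend entry
for fruit, group in df_long.groupby("Fruit", sort=False):
    # Tallest first, so every shorter bar is drawn over it and stays visible.
    for row in group.sort_values("Price", ascending=False).itertuples():
        label = row.Color if row.Color not in seen else "_nolegend_"
        seen.add(row.Color)
        ax.bar(fruit, row.Price, color=row.Color.lower(), label=label)
ax.legend()

drawn_heights = [patch.get_height() for patch in ax.patches]
```

This does not address the second objection: the result still looks like a stacked chart, so a note in the title or caption is probably needed.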
It seems like plotting a line connecting the mean values of box plots would be a simple thing to do, but I couldn't figure out how to do this plot in pandas.
I'm using this syntax for the boxplot so that it automatically generates the box plot of Y vs. X without any external manipulation of the data frame:
df.boxplot(column='Y_Data', by="Category", showfliers=True, showmeans=True)
One way I thought of doing this is to draw a line plot from the mean values of the boxplot, but I'm not sure how to extract that information from the plot.
You can save the axis object that gets returned from df.boxplot(), and plot the means as a line plot using that same axis. I'd suggest using Seaborn's pointplot for the lines, as it handles a categorical x-axis nicely.
First let's generate some sample data:
import pandas as pd
import numpy as np
import seaborn as sns
N = 150
values = np.random.random(size=N)
groups = np.random.choice(['A','B','C'], size=N)
df = pd.DataFrame({'value':values, 'group':groups})
print(df.head())
  group     value
0     A  0.816847
1     A  0.468465
2     C  0.871975
3     B  0.933708
4     A  0.480170
...
Next, make the boxplot and save the axis object:
ax = df.boxplot(column='value', by='group', showfliers=True,
                positions=range(df.group.unique().shape[0]))
Note: There's a curious positions argument in Pyplot/Pandas boxplot(), which can cause off-by-one errors. See more in this discussion, including the workaround I've employed here.
Finally, use groupby to get category means, and then connect mean values with a line plot overlaid on top of the boxplot:
sns.pointplot(x='group', y='value', data=df.groupby('group', as_index=False).mean(), ax=ax)
Your title mentions "median" but you talk about category means in your post. I used means here; change the groupby aggregation to median() if you want to plot medians instead.
You can get the values of the medians by calling the .get_data() method of the matplotlib.lines.Line2D objects that draw them, without having to use seaborn.
Let bp be your boxplot created as bp=plt.boxplot(data). Then, bp is a dict containing the medians key, among others. That key contains a list of matplotlib.lines.Line2D, from which you can extract the (x,y) position as follows:
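import numpy as np
import matplotlib.pyplot as plt
# data: any dataset accepted by plt.boxplot
bp = plt.boxplot(data)
X = []
Y = []
for m in bp['medians']:
    # Each median is a short horizontal segment; take its midpoint
    [[x0, x1], [y0, y1]] = m.get_data()
    X.append(np.mean((x0, x1)))
    Y.append(np.mean((y0, y1)))
# Connect the median midpoints with a single line
plt.plot(X, Y, c='C1')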
For an arbitrary dataset (data), this script generates this figure. Hope it helps!
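The same idea can be applied to means rather than medians. The sketch below assumes random sample data like that in the first answer and relies on pandas placing the grouped boxes at x = 1, 2, ..., n in sorted group order, which is its default behaviour; the names means and mean_line are introduced here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random sample data, as in the answer above.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.random(150),
    "group": rng.choice(["A", "B", "C"], size=150),
})

# groupby sorts the group labels, matching the order of the boxes.
means = df.groupby("group")["value"].mean()

fig, ax = plt.subplots()
df.boxplot(column="value", by="group", showmeans=True, ax=ax)

# Default box positions are 1..n, so plot the means at those x values.
(mean_line,) = ax.plot(range(1, len(means) + 1), means.to_numpy(),
                       marker="o", color="C1")
```

Swapping .mean() for .median() in the groupby gives the median line instead.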