I've got a scatter plot and want to add straight lines for the mean, mean + 3*std and mean - 3*std. The mean seems to plot, but I can't work out the std! Thanks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
for element in df_na.loc[:, 'Ag_ppb':'Zr_ppb']:
    temp_df = df_na.loc[:, ['Date', element]].dropna()
    fig = plt.figure()
    plt.scatter(temp_df['Date'], temp_df[element], c='black', s=10)
    # horizontal line at the mean: one y value per plotted date
    plt.plot(temp_df['Date'], [df_na[element].mean()] * len(temp_df))
    plt.xlabel('Date')
    plt.xticks(rotation=90, fontsize=5)
    plt.ylabel(element)
    plt.show()
You want to use dataframe.std():
df_na.std(axis=0,skipna=True)[element]
So I incorporated the above, which works; see below:
plt.plot(temp_df['Date'],[temp_df[element].mean(axis=0,skipna=True)]*len(temp_df), c='red',label='Mean')
but the following won't plot 3*std + mean:
plt.plot(temp_df['Date'],[temp_df[element].mean()]+[temp_df[element].std(axis=0,skipna=True)*3]*len(temp_df),label='3xstd')
Plotting 3*std on its own worked, but adding the mean to it doesn't plot as a line.
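A likely reason the last call fails: `[mean] + [3*std]*n` concatenates two Python lists, giving n + 1 values, instead of adding the numbers. Below is a minimal sketch of one way around it, computing the three levels as scalars and drawing them with `ax.axhline`. The data and the names `upper`, `lower` and `broken` are made up for illustration; only `temp_df`, `element` and the column names follow the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the question's temp_df (assumption, not the real data)
rng = np.random.default_rng(0)
temp_df = pd.DataFrame({
    'Date': pd.date_range('2021-01-01', periods=30, freq='D'),
    'Ag_ppb': rng.normal(10.0, 2.0, 30),
})
element = 'Ag_ppb'

mean = temp_df[element].mean()
std = temp_df[element].std()
upper = mean + 3 * std   # plain arithmetic on scalars
lower = mean - 3 * std

# What the original expression builds: list concatenation, one element too many
broken = [mean] + [3 * std] * len(temp_df)

fig, ax = plt.subplots()
ax.scatter(temp_df['Date'], temp_df[element], c='black', s=10)
ax.axhline(mean, color='red', label='Mean')
ax.axhline(upper, color='blue', linestyle='--', label='Mean + 3 std')
ax.axhline(lower, color='blue', linestyle='--', label='Mean - 3 std')
ax.legend()
```

Call plt.show() afterwards to display the figure. If the line should only span the plotted dates rather than the full axis width, `[mean + 3 * std] * len(temp_df)` passed to plt.plot should also work.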
I am trying to plot a density chart. Below you can see the data and the chart.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = {'type_sale': [100,200,400,400,200,400,300,200,210,300],
        'bool': [0,1,0,1,1,0,1,1,0,1],
        }
df1 = pd.DataFrame(data, columns=['type_sale', 'bool'])
df1['bool'] = df1['bool'].astype('int32')
I tried the command below, but it is not working. Can anybody help me solve this problem?
plot_density_chart(df1[['type_sale', 'bool']], "bool", 'type_sale',
                   category_var="type_sale", title='prevalence',
                   xlabel='Type_sale', logx="Yes", vline=None,
                   save_figure_name='type_sale_prevalence.pdf')
You can use seaborn to plot the density chart:
import seaborn as sns
g = sns.FacetGrid(df1,hue='bool')
g = g.map(sns.kdeplot,'type_sale',fill=True,alpha=0.3)
g.add_legend()
g.fig.suptitle('Prevalence', fontsize=16)
g.axes[0,0].set_xlabel('Type_sale')
Which gives you the figure:
If you want to set the x-axis to a log scale, add this:
g.axes[0,0].set_xscale('log')
Currently I'm doing some data visualization using Python, matplotlib and mplcursors that needs to show different parameters and their values at the same time over a certain time period.
Sample CSV data that was extracted from a system:
https://i.stack.imgur.com/fjd1d.png
My expected output would look like this:
https://i.stack.imgur.com/zXGXA.png
I found a similar case, "Add the vertical line to the hoverbox (see pictures)", but it uses NumPy functions.
I'm hoping someone can suggest the best approach to my problem.
Code below:
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
import pandas as pd
fig, ax=plt.subplots()
y1=ax.twinx()
y2=ax.twinx()
y2.spines.right.set_position(("axes", 1.05))
df=pd.read_csv(r"C:\Users\OneDrive\Desktop\sample.csv")
time=df['Time']
yd1=df['Real Power']
yd2=df['Frequency']
yd3=df['SOC']
l1=ax.plot(time,yd1,color='black', label='Real Power')
l2=y1.plot(time,yd2, color='blue', label='Frequency')
l3=y2.plot(time,yd3, color='orange', label='SOC')
df=pd.DataFrame(df)
arr=df.to_numpy()
print(arr)
def show_annotation(sel):
    x = sel.target[0]
    annotation_str = df['Real Power'][sel.index]
    #sel.annotation.set_text(annotation_str)

fig.autofmt_xdate()
cursor = mplcursors.cursor(hover=True)
cursor.connect('add', show_annotation)
plt.show()
I have a distance matrix which I normalized, trimmed the row and column headers with python regular expressions and tried to make a clustered heatmap from it with the following code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv('distance_matrix_Mult_Align(distance).csv', index_col=0)
row_sums = df.sum(axis=1)
# divide each row by its own sum (label-aligned, also works on current pandas)
new_matrix = df.div(row_sums, axis=0)
def acc_id(s):
    import re
    # raw string so that \| matches a literal pipe character
    match = re.search(r'\|(.*)\|', s)
    if match:
        return match.group(1)
sns.clustermap(new_matrix.rename(columns=acc_id, index=acc_id),
               row_cluster=False,
               xticklabels=True,
               yticklabels=True,
               cmap='RdBu',
               center=0,
               vmin=0,
               vmax=1)
plt.figure()
plt.show()
My clustered map looks like this:
I have tried to read the documentations of clustermap and pyplot: https://seaborn.pydata.org/generated/seaborn.clustermap.html
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure
But I cannot work out how to make the plot look useful. I would really appreciate any help. Thanks!
The problem is in your vmax=1 argument. If you look at the maximum value in the whole dataset using new_matrix.max().max(), it is about 0.17, so almost the whole colour range goes unused.
So either remove the vmax argument or set it to a lower value.
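To see why the normalized values end up far below 1, here is a small sketch with a made-up symmetric distance matrix (the real CSV is not available, so the numbers are assumptions). After dividing each row by its sum, every row adds up to 1, so a single entry can only be a fraction of that.

```python
import numpy as np
import pandas as pd

# Made-up 6x6 symmetric distance matrix with a zero diagonal (assumption)
rng = np.random.default_rng(0)
raw = rng.uniform(0.1, 1.0, size=(6, 6))
dist = (raw + raw.T) / 2
np.fill_diagonal(dist, 0.0)
df = pd.DataFrame(dist)

# Row-normalize: each entry becomes its share of the row total
new_matrix = df.div(df.sum(axis=1), axis=0)

# Largest value anywhere in the matrix; a candidate for vmax
largest = new_matrix.max().max()
print(largest)
```

Passing something like vmax=largest to sns.clustermap, or leaving vmax out, should then stretch the colour map over the values that actually occur. The more columns the matrix has, the smaller these shares tend to be.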
Here I am trying to separate the data by whether the passenger is male or not, plotting Age on the x-axis and Fare on the y-axis. I want the legend to show two labels that distinguish male and female with their respective colors. Can anyone help me do this?
Code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male']=df['Sex']=='male'
sc1= plt.scatter(df['Age'],df['Fare'],c=df['male'])
plt.legend()
plt.show()
You could use the seaborn library which builds on top of matplotlib to perform the exact task you require. You can scatterplot 'Age' vs 'Fare' and colour code it by 'Sex' by just passing the hue parameter in sns.scatterplot, as follows:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure()
# No need to call plt.legend, seaborn will generate the labels and legend
# automatically.
sns.scatterplot(data=df, x='Age', y='Fare', hue='Sex')
plt.show()
Seaborn generates nicer plots with less code and more functionality.
You can install seaborn from PyPI using pip install seaborn.
Refer: Seaborn docs
The PathCollection.legend_elements method can be used to steer how many legend entries are to be created and how they should be labeled.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male'] = df['Sex']=='male'
sc1= plt.scatter(df['Age'], df['Fare'], c=df['male'])
# handles come in ascending value order: False (female) first, then True (male)
plt.legend(handles=sc1.legend_elements()[0], labels=['female', 'male'])
plt.show()
Legend guide and Scatter plots with a legend for reference.
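As a rough illustration of how legend_elements orders its handles, here is a sketch on made-up data (the arrays `age`, `fare` and `is_male` are assumptions standing in for the Titanic columns). The handles follow the sorted unique values of the colour array, so custom labels have to be listed in that same order.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for the Age, Fare and male columns (assumption)
rng = np.random.default_rng(0)
age = rng.uniform(1, 80, 50)
fare = rng.uniform(5, 200, 50)
is_male = rng.integers(0, 2, 50).astype(bool)
is_male[:2] = [False, True]  # make sure both classes are present

fig, ax = plt.subplots()
sc = ax.scatter(age, fare, c=is_male.astype(int))

# One handle per unique colour value, in ascending order: 0 first, then 1
handles, auto_labels = sc.legend_elements()
ax.legend(handles, ['female', 'male'])  # 0 -> female, 1 -> male
```

Printing auto_labels is a quick way to check which value each handle stands for before overriding the text.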
This can be achieved by splitting the data into two separate dataframes and then setting a label for each of them.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
subset1 = df[(df['Sex'] == 'male')]
subset2 = df[(df['Sex'] != 'male')]
plt.scatter(subset1['Age'], subset1['Fare'], label = 'Male')
plt.scatter(subset2['Age'], subset2['Fare'], label = 'Female')
plt.legend()
plt.show()
I have an existing plot that was created with pandas like this:
df['myvar'].plot(kind='bar')
The y axis is formatted as floats and I want to change it to percentages. All of the solutions I found use ax.xyz syntax, and I can only place code below the line above that creates the plot (I cannot add ax=ax to the line above).
How can I format the y axis as percentages without changing the line above?
Here is the solution I found but requires that I redefine the plot:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
Link to the above solution: Pyplot: using percentage on x axis
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):
import ...
import matplotlib.ticker as mtick
ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
PercentFormatter() accepts three arguments, xmax, decimals, symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Update
PercentFormatter was introduced into Matplotlib proper in version 2.1.0.
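A minimal sketch of the xmax and decimals arguments, assuming the column holds fractions between 0 and 1 (the values in `myvar` here are made up):

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# Made-up fractions in the 0..1 range (assumption)
df = pd.DataFrame({'myvar': [0.10, 0.25, 0.40, 0.75]})

ax = df['myvar'].plot(kind='bar')

# xmax=1.0 means a tick value of 1.0 is shown as 100%; decimals=0 drops the fraction digits
fmt = mtick.PercentFormatter(xmax=1.0, decimals=0)
ax.yaxis.set_major_formatter(fmt)
```

If the plotting line cannot be changed to capture ax, plt.gca() right after it should return the same Axes.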
The pandas DataFrame plot call returns the Axes for you, and then you can manipulate the axes however you want.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,5))
# you get ax from here
ax = df.plot()
type(ax) # matplotlib.axes._subplots.AxesSubplot
# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
Jianxun's solution did the job for me but broke the y value indicator at the bottom left of the window.
I ended up using FuncFormatter instead (and also stripped the unnecessary trailing zeroes as suggested here):
import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.DataFrame(np.random.randn(100,5))
ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
Generally speaking I'd recommend using FuncFormatter for label formatting: it's reliable, and versatile.
For those who are looking for the quick one-liner:
plt.gca().set_yticklabels([f'{x:.0%}' for x in plt.gca().get_yticks()])
this assumes
import: from matplotlib import pyplot as plt
Python >=3.6 for f-String formatting. For older versions, replace f'{x:.0%}' with '{:.0%}'.format(x)
I'm late to the game, but I just realized this: ax can be replaced with plt.gca() for those who have no explicit Axes object at hand.
Echoing @Mad Physicist's answer, using the PercentFormatter class it would be:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
# if you already have ticks in the 0 to 1 range. Otherwise see their answer
I propose an alternative method using seaborn
Working code:
import numpy as np
import pandas as pd
import seaborn as sns
data = np.random.rand(10, 2) * 100
df = pd.DataFrame(data, columns=['A', 'B'])
ax = sns.lineplot(data=df, markers=True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
# change the y tick labels to percentages
y_value = ['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)
You can do this in one line without importing anything:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{}%'.format))
If you want integer percentages, you can do:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format))
You can use either ax.yaxis or plt.gca().yaxis. FuncFormatter is still part of matplotlib.ticker, but you can also do plt.FuncFormatter as a shortcut.
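The reason a bare str.format works here: FuncFormatter calls its function as func(x, pos), and str.format simply ignores positional arguments that the template does not use. A small sketch (the names `as_is` and `fraction` are only for illustration):

```python
from matplotlib.ticker import FuncFormatter

# Tick values already in percent units: 40 is shown as '40%'
as_is = FuncFormatter('{:.0f}%'.format)

# Tick values as fractions: the % format type multiplies by 100, so 0.4 is shown as '40%'
fraction = FuncFormatter('{:.0%}'.format)

print(as_is(40.0, 0))     # 40%
print(fraction(0.4, 0))   # 40%
```

Which of the two fits depends on whether the data is stored as 0 to 100 or as 0 to 1.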
Based on the answer of @erwanp, you can use the formatted string literals of Python 3,
x = '2'
percentage = f'{x}%' # 2%
inside the FuncFormatter() and combined with a lambda expression.
All wrapped:
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y}%'))
Another one line solution if the yticks are between 0 and 1:
plt.yticks(plt.yticks()[0], ['{:,.0%}'.format(x) for x in plt.yticks()[0]])
Add one line of code (this assumes import matplotlib.ticker as ticker):
ax.yaxis.set_major_formatter(ticker.PercentFormatter())