How to create an Area plot - python

Is there any way to create an area plot in Seaborn? I checked the documentation but couldn't find one.
Here is the data that I want to plot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'launch_year': [1957, 1958, 1959, 1960, 1961, 1957, 1958, 1959, 1960, 1961, 1957, 1958, 1959,
1960, 1961, 1957, 1958, 1959, 1960, 1961, 1957, 1958, 1959, 1960, 1961],
'state_code': ['China', 'China', 'China', 'China', 'China', 'France', 'France', 'France', 'France',
'France', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Russia', 'Russia', 'Russia',
'Russia', 'Russia', 'United States', 'United States', 'United States', 'United States', 'United States'],
'value': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 5, 4, 8, 9, 1, 22, 18, 29, 41]}
# create a long format DataFrame
df = pd.DataFrame(data)
# pivot the DataFrame to a wide format
year_countries = df.pivot(index='launch_year', columns='state_code', values='value')
# display(year_countries)
state_code   China  France  Japan  Russia  United States
launch_year
1957             0       0      0       2              1
1958             0       0      0       5             22
1959             0       0      0       4             18
1960             0       0      0       8             29
1961             0       0      0       9             41
I created a line plot using this code -
sns.relplot(data=year_countries, kind='line',
height=7, aspect=1.3,linestyle='solid')
plt.xlabel('Launch Year', fontsize=15)
plt.ylabel('Number of Launches', fontsize=15)
plt.title('Space Launches By Country',fontsize=17)
plt.show()
but the plot isn't very clear as a line chart.
I also couldn't make the lines solid or sort the legend entries by value in descending order.

How about using pandas.DataFrame.plot with kind='area'?
Setting a seaborn style with plt.style.use('seaborn') is deprecated since matplotlib 3.6; the styles are now named 'seaborn-v0_8-<style>'.
In addition, you need to sort the legend manually, as in the code below. However, changing the legend order does not change the plot order.
xticks=range(1957, 1962) can be used to specify the xticks; otherwise the 'launch_year' values are shown as floats on the x-axis.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2
ax = year_countries.plot(kind='area', figsize=(9, 6), xticks=range(1957, 1962))
ax.set_xlabel('Launch Year', fontsize=15)
ax.set_ylabel('Number of Launches', fontsize=15)
ax.set_title('Space Launches By Country', fontsize=17)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0], reverse=True))
ax.legend(handles, labels)
plt.show()
Alternatively, use pandas.Categorical to set the order of the columns in df, prior to pivoting. This will ensure the plot order and legend order are the same (e.g. the first group in the legend is the first group in the plot stack).
# set the order of the column in df
df.state_code = pd.Categorical(df.state_code, sorted(df.state_code.unique())[::-1], ordered=True)
# now pivot df
year_countries = df.pivot(index='launch_year', columns='state_code', values='value')
# plot
ax = year_countries.plot(kind='area', figsize=(9, 6), xticks=range(1957, 1962))
ax.set_xlabel('Launch Year', fontsize=15)
ax.set_ylabel('Number of Launches', fontsize=15)
ax.set_title('Space Launches By Country', fontsize=17)
# move the legend
ax.legend(title='Countries', bbox_to_anchor=(1, 1.02), loc='upper left', frameon=False)
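The question also asked for the legend to be ordered by value in descending order, while the examples above order it alphabetically. A minimal sketch, assuming "value" means each country's total over all years: reorder the columns of the wide DataFrame by their sums before plotting. The name `order` is introduced here for illustration only, and the wide DataFrame is rebuilt inline so the block runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt

# same wide-format data as in the question (launches per year and country)
year_countries = pd.DataFrame(
    {'China': [0, 0, 0, 0, 0],
     'France': [0, 0, 0, 0, 0],
     'Japan': [0, 0, 0, 0, 0],
     'Russia': [2, 5, 4, 8, 9],
     'United States': [1, 22, 18, 29, 41]},
    index=pd.Index([1957, 1958, 1959, 1960, 1961], name='launch_year'))

# column names ordered by total launches, largest first
# (`order` is a hypothetical helper name, not part of the original answer)
order = year_countries.sum().sort_values(ascending=False).index.tolist()

# plot the columns in that order
ax = year_countries[order].plot(kind='area', figsize=(9, 6), xticks=range(1957, 1962))
ax.set_xlabel('Launch Year', fontsize=15)
ax.set_ylabel('Number of Launches', fontsize=15)
ax.set_title('Space Launches By Country', fontsize=17)
ax.legend(title='Countries', bbox_to_anchor=(1, 1.02), loc='upper left', frameon=False)
# plt.show()  # uncomment to display the figure interactively
```

Countries with equal totals (here the three all-zero columns) may appear in any relative order.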

Related

How to assign the colors for the values and make piechart using plotly

I have this dataframe, for that I'm trying to create the piechart similar to the attached image.
Index  Category
SE     3
COL    2
PE     1
DP-PD  1
COL    1
OTH    1
I have tried the following, it's creating a pie chart, but not as expected.
import matplotlib.pyplot as plt
import pandas as pd
# assign data of lists.
data = {'index': ['SE', 'COL', 'PE', 'OTH', 'DP-PD'], 'Category': [3, 2, 1, 1,1]}
# Create DataFrame
df = pd.DataFrame(data)
plt.pie(df["Category"], labels = df["Category"],startangle=90)
plt.title("Observation statistics", fontsize = 24)
I need the same color which is mentioned in the legend for each category. These are the color codes:
{'DP-PD': '#1E90FF', 'ID': '#FFA500', 'ENC': '#D3D3D3', 'SE': '#FFFF00',
'COL': '#FF0000', 'GL': '#32CD32', 'COT': '#0000CD', 'PE': '#A52A2A',
'FI': '#000000', 'OTH': '#00BFFF'}
I'd like the following output:
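One possible way to get a fixed color per category, sketched here with matplotlib because that is what the attempt above uses: look up each row's label in the color dictionary and pass the resulting list to the `colors` argument of `pie`. The names `color_map` and `colors` are introduced for illustration; the data and color codes are the ones given in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'index': ['SE', 'COL', 'PE', 'OTH', 'DP-PD'], 'Category': [3, 2, 1, 1, 1]}
df = pd.DataFrame(data)

# color codes from the question, keyed by category label
color_map = {'DP-PD': '#1E90FF', 'ID': '#FFA500', 'ENC': '#D3D3D3', 'SE': '#FFFF00',
             'COL': '#FF0000', 'GL': '#32CD32', 'COT': '#0000CD', 'PE': '#A52A2A',
             'FI': '#000000', 'OTH': '#00BFFF'}

# one color per slice, in the same order as the rows of df
colors = df['index'].map(color_map).tolist()

fig, ax = plt.subplots()
# label slices with the category names rather than the counts
ax.pie(df['Category'], labels=df['index'], colors=colors, startangle=90)
ax.set_title('Observation statistics', fontsize=24)
# plt.show()  # uncomment to display the figure interactively
```

If plotly is preferred, `plotly.express.pie` accepts a `color_discrete_map` argument that plays the same role as `color_map` here.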

How to set x-axis ticks to show for each datetime value in plotly.py express scatter plot?

In the docs for plotly.py tick formatting here, it states that you can set the tickmode to array and just specify the tickvals and ticktext e.g.
import plotly.graph_objects as go
fig = go.Figure(go.Scatter(
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    y = [28.8, 28.5, 37, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9]
))
fig.update_layout(
    xaxis = dict(
        tickmode = 'array',
        tickvals = [1, 3, 5, 7, 9, 11],
        ticktext = ['One', 'Three', 'Five', 'Seven', 'Nine', 'Eleven']
    )
)
fig.show()
But this does not seem to work when tickvals is a list of datetime objects.
What I want to do is show an x-axis tick for each point in my scatter plot, where the x values are all datetime objects. No error is thrown, and the graph is rendered as if I had not tried to update the x ticks. My code for this is below:
# lambda expression to convert datetime object to string of desired format
date_to_string_lambda = lambda x: x.strftime("%e %b")
fig.update_layout(
    xaxis = dict(
        tickmode = 'array',
        # all points should have a corresponding tick
        tickvals = list(fig.data[0].x),
        # datetime value represented as %e %b string, i.e. space-padded day and abbreviated month.
        ticktext = list(map(date_to_string_lambda, list(fig.data[0].x))),
    )
)
Instead of showing a tick for each value it goes to the default tick mode and shows ticks at intervals i.e.
Image of graph produced
The values for layout when print(fig) is run after the above code are below, where the xaxis dict is important. Note that the tickvals are no longer of type datetime.
'layout': {'hovermode': 'x',
'legend': {'title': {'text': ''}, 'tracegroupgap': 0, 'x': 0.01, 'y': 0.98},
'margin': {'b': 0, 'l': 0, 'r': 0, 't': 0},
'template': '...',
'title': {'text': ''},
'xaxis': {'anchor': 'y',
'domain': [0.0, 1.0],
'fixedrange': True,
'tickmode': 'array',
'ticktext': [27 Apr, 3 May, 9 May, 13 May, 20 May],
'tickvals': [2020-04-27 00:00:00, 2020-05-03 00:00:00,
2020-05-09 00:00:00, 2020-05-13 00:00:00,
2020-05-20 00:00:00],
'title': {'text': 'Date'}},
'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'fixedrange': True, 'title': {'text': 'Total Tests'}}}
This seems to be a bug with plotly.py, so is there a workaround for this?

Seaborn heat map with correlation to string

I'm trying to build a heatmap to illustrate the correlation between indexes and a range (string).
data = {'Report': [1,2,3,4],
'Hours': [30,45,85,24],
'Wage': [100,446,245,632],
'Worker': [321,63,456,234],
'Buyer': [36,53,71,52],
'Range': ['High', 'Medium', 'Low', 'Low']
}
df = pd.DataFrame(data, columns = ['Report', 'Hours', 'Wage', 'Worker', 'Buyer', 'Range'])
My expected result would be a heatmap with 'Hours', 'Wage', 'Worker', and 'Buyer' on the left as indexes and three categories in 'Range' on the bottom.
How do I achieve the desired result using seaborn heatmap?
Thanks in advance!
I appreciate any help!!
data = {'Report': [1,2,3,4],
'Hours': [30,45,85,24],
'Wage': [100,446,245,632],
'Worker': [321,63,456,234],
'Buyer': [36,53,71,52],
'Range': ['High', 'Medium', 'Low', 'Low']
}
df = pd.DataFrame(data, columns = ['Report', 'Hours', 'Wage', 'Worker', 'Buyer', 'Range'])
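# 'Range' holds strings, so exclude it before computing correlations (required in pandas >= 2.0)
df_corr = df.drop(columns='Range').corr()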
fig, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(df_corr, square=True, vmax=1, vmin=-1, center=0)
print(df_corr)
          Report     Hours      Wage    Worker     Buyer
Report  1.000000  0.103434  0.774683  0.103496  0.595586
Hours   0.103434  1.000000 -0.333933  0.548300  0.845140
Wage    0.774683 -0.333933  1.000000 -0.542259  0.208270
Worker  0.103496  0.548300 -0.542259  1.000000  0.356177
Buyer   0.595586  0.845140  0.208270  0.356177  1.000000
Just calculate the correlation coefficients and draw them with a heatmap.
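The correlation matrix above has no 'Range' row or column, because correlation is only computed between numeric columns. One possible reading of the original request (numeric columns on the left, 'Range' categories along the bottom) is sketched below. It assumes that each cell should show the mean of a numeric column within a 'Range' category; this substitutes group means for correlation and is not what the answer above proposes. The name `range_means` is introduced for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = {'Report': [1, 2, 3, 4],
        'Hours': [30, 45, 85, 24],
        'Wage': [100, 446, 245, 632],
        'Worker': [321, 63, 456, 234],
        'Buyer': [36, 53, 71, 52],
        'Range': ['High', 'Medium', 'Low', 'Low']}
df = pd.DataFrame(data)

# mean of each numeric column per 'Range' category; transpose so the
# numeric columns become rows and the categories become columns
range_means = df.groupby('Range')[['Hours', 'Wage', 'Worker', 'Buyer']].mean().T

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(range_means, annot=True, fmt='.1f', ax=ax)
# plt.show()  # uncomment to display the figure interactively
```

Because the rows are on different scales, the colors will be dominated by the larger columns; normalizing each row first may give a more readable picture.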

Could not plot multiple horizontal bar side by side

I would like to plot multiple horizontal bar charts sharing the same y-axis. To elaborate, I have 4 dataframes, each representing a bar chart. I want to use these dataframes to plot 2 horizontal bar charts at the left and another 2 at the right. Right now, I am only able to display one horizontal bar chart at left and right. Below are my desired output, code, and error
data1 = {
'age': ['20-24 Years', '25-29 Years', '30-34 Years', '35-39 Years', '40-44 Years', '45-49 Years'],
'single_value': [97, 75, 35, 19, 15, 13]
}
data2 = {
'age': ['20-24 Years', '25-29 Years', '30-34 Years', '35-39 Years', '40-44 Years', '45-49 Years'],
'single_value': [98, 79, 38, 16, 15, 13]
}
data3 = {
'age': ['20-24 Years', '25-29 Years', '30-34 Years', '35-39 Years', '40-44 Years', '45-49 Years'],
'single_value': [89, 52, 22, 16, 12, 13]
}
data4 = {
'age': ['20-24 Years', '25-29 Years', '30-34 Years', '35-39 Years', '40-44 Years', '45-49 Years'],
'single_value': [95, 64, 27, 18, 15, 13]
}
df_male_1 = pd.DataFrame(data1)
df_male_2 = pd.DataFrame(data2)
df_female_1 = pd.DataFrame(data3)
df_female_2 = pd.DataFrame(data4)
fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(12,12))
axes[0].barh(df_male_1['age'], df_male_1['single_value'], align='center',
color='red', zorder=10)
axes[0].barh(df_male_2['age'], df_male_2['single_value'], align='center',
color='blue', zorder=10)
axes[0].set(title='Age Group (Male)')
axes[1].barh(df_female_1['age'], df_female_1['single_value'],
align='center', color='pink', zorder=10)
axes[1].barh(df_female_2['age'], df_female_2['single_value'],
align='center', color='purple', zorder=10)
axes[1].set(title='Age Group (Female)')
axes[0].invert_xaxis()
axes[0].set(yticks=df_male_1['age'])
axes[0].yaxis.tick_right()
for ax in axes.flat:
    ax.margins(0.09)
    ax.grid(True)
fig.tight_layout()
fig.subplots_adjust(wspace=0.09)
plt.show()
Error output
The problem is that your bars currently overlap because they are center-aligned by default. To get the desired figure, align them at the edges instead. To place the two bars of each group next to each other, give one a positive and the other a negative height (for barh, height is the thickness of each bar along the y-axis). Choose the height value to suit your needs.
The modified code follows (only the relevant part is shown):
fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(12,12))
axes[0].barh(df_male_1['age'], df_male_1['single_value'], align='edge', height=0.3,
color='red', zorder=10)
axes[0].barh(df_male_2['age'], df_male_2['single_value'], align='edge', height=-0.3,
color='blue', zorder=10)
axes[0].set(title='Age Group (Male)')
axes[1].barh(df_female_1['age'], df_female_1['single_value'], align='edge',height=0.3,
color='pink', zorder=10)
axes[1].barh(df_female_2['age'], df_female_2['single_value'], align='edge', height=-0.3,
color='purple', zorder=10)
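To see why the positive and negative heights put the bars side by side, here is a small sketch on a single axis that reads back the vertical extent of the first pair of bars. It uses the first three rows of data1 and data2; the names `group_1`, `group_2`, `upper_span`, and `lower_span` are introduced for illustration.

```python
import matplotlib.pyplot as plt

ages = ['20-24 Years', '25-29 Years', '30-34 Years']
group_1 = [97, 75, 35]
group_2 = [98, 79, 38]

fig, ax = plt.subplots(figsize=(6, 4))
# positive height: each bar extends upward from its tick position
upper = ax.barh(ages, group_1, align='edge', height=0.3, color='red', label='group 1')
# negative height: each bar extends downward from the same tick position
lower = ax.barh(ages, group_2, align='edge', height=-0.3, color='blue', label='group 2')
ax.legend()

# vertical extent of the first pair of bars in axis units
# (the first category sits at y = 0)
upper_span = (upper[0].get_bbox().ymin, upper[0].get_bbox().ymax)
lower_span = (lower[0].get_bbox().ymin, lower[0].get_bbox().ymax)
# plt.show()  # uncomment to display the figure interactively
```

The two spans meet at the tick position without overlapping, which is what keeps the bars adjacent.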

seaborn plot from total

I have the following data frame:
df = pd.DataFrame({'group': ['Red', 'Red', 'Red', 'Blue', 'Blue', 'Blue'],
'valueA_found': [10, 40, 50, 20, 50, 70],
'valueA_total': [100,200, 210, 100, 200, 210],
'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-01-01', '2017-02-01', '2017-03-01']})
and can create a plot:
fig, ax = plt.subplots(figsize=(15,8))
sns.set_style("whitegrid")
g = sns.barplot(x="date", y="valueA_found", hue="group", data=df)
# g.set_yscale('log')
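# rotate the existing tick labels; passing all six dates to set_xticklabels
# would not match the three ticks on the axis
g.tick_params(axis='x', labelrotation=45)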
g.set(xlabel='date', ylabel='value from total')
However, I would rather see something like the figure below for each point in time:
As you can see, for each model valueA_found is plotted as a bar and the total is plotted as a single bar.
It was initially suggested that the total could be plotted as a line, but as outlined in the comments it is probably better to plot it as a bar as well. valueA_total (the total) should be the same for every group within a month.
An option might be to plot the total values in a desaturated/more transparent bar plot behind the first dataset.
import matplotlib.pyplot as plt
import pandas as pd
# seaborn.apionly has been removed from current seaborn releases; import seaborn directly
import seaborn as sns
df = pd.DataFrame({'group': ['Red', 'Red', 'Red', 'Blue', 'Blue', 'Blue'],
'valueA': [10, 40, 50, 20, 50, 70],
'valueB': [100,200, 210, 100, 200, 210],
'date': ['2017-01-01', '2017-02-01', '2017-03-01',
'2017-01-01', '2017-02-01', '2017-03-01']})
fig, ax = plt.subplots(figsize=(6,4))
sns.barplot(x="date", y="valueB", hue="group", data=df,
ax=ax, palette={"Red":"#f3c4c4","Blue":"#c5d6f2" }, alpha=0.6)
sns.barplot(x="date", y="valueA", hue="group", data=df,
ax=ax, palette={"Red":"#d40000","Blue":"#0044aa" })
# rotate the existing tick labels (three ticks, one per date)
ax.tick_params(axis='x', labelrotation=45)
ax.set(xlabel='date', ylabel='value from total')
plt.show()
Or just putting one bar plot in the background, assuming that the totals of each group are always the same:
import matplotlib.pyplot as plt
import pandas as pd
# seaborn.apionly has been removed from current seaborn releases; import seaborn directly
import seaborn as sns
df = pd.DataFrame({'group': ['Red', 'Red', 'Red', 'Blue', 'Blue', 'Blue'],
'valueA': [10, 40, 50, 20, 50, 70],
'valueB': [100,200, 210, 100, 200, 210],
'date': ['2017-01-01', '2017-02-01', '2017-03-01',
'2017-01-01', '2017-02-01', '2017-03-01']})
fig, ax = plt.subplots(figsize=(6,4))
sns.barplot(x="date", y="valueB", data=df[df.group=="Red"],
ax=ax, color="#e7e2e8", label="total")
sns.barplot(x="date", y="valueA", hue="group", data=df,
ax=ax, palette={"Red":"#d40000","Blue":"#0044aa" })
# rotate the existing tick labels (three ticks, one per date)
ax.tick_params(axis='x', labelrotation=45)
ax.set(xlabel='date', ylabel='value from total')
plt.show()
