Seaborn heat map with correlation to string - python

I'm trying to build a heatmap to illustrate the correlation between indexes and a range (string).
data = {'Report': [1,2,3,4],
'Hours': [30,45,85,24],
'Wage': [100,446,245,632],
'Worker': [321,63,456,234],
'Buyer': [36,53,71,52],
'Range': ['High', 'Medium', 'Low', 'Low']
}
df = pd.DataFrame(data, columns = ['Report', 'Hours', 'Wage', 'Worker', 'Buyer', 'Range'])
My expected result would be a heatmap with 'Hours', 'Wage', 'Worker', and 'Buyer' on the left as indexes and three categories in 'Range' on the bottom.
How do I achieve the desired result using seaborn heatmap?
Thanks in advance!
I appreciate any help!!

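import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Report': [1,2,3,4],
'Hours': [30,45,85,24],
'Wage': [100,446,245,632],
'Worker': [321,63,456,234],
'Buyer': [36,53,71,52],
'Range': ['High', 'Medium', 'Low', 'Low']
}
df = pd.DataFrame(data, columns = ['Report', 'Hours', 'Wage', 'Worker', 'Buyer', 'Range'])
# numeric_only=True (pandas >= 1.5) skips the string column 'Range';
# pandas >= 2.0 raises an error on it otherwise
df_corr = df.corr(numeric_only=True)
fig, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(df_corr, square=True, vmax=1, vmin=-1, center=0)
print(df_corr)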
          Report     Hours      Wage    Worker     Buyer
Report  1.000000  0.103434  0.774683  0.103496  0.595586
Hours   0.103434  1.000000 -0.333933  0.548300  0.845140
Wage    0.774683 -0.333933  1.000000 -0.542259  0.208270
Worker  0.103496  0.548300 -0.542259  1.000000  0.356177
Buyer   0.595586  0.845140  0.208270  0.356177  1.000000
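Just calculate the correlation coefficients and draw them with a heatmap. Note that the correlation matrix only covers the numeric columns, so the string column 'Range' does not appear in the result.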
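A correlation coefficient is not defined for a string column, so the layout the question asks for (numeric columns on the left, 'Range' categories along the bottom) needs some other summary. The sketch below is one possible reading, not necessarily what the asker intended: it assumes the cell value should be the mean of each numeric column within each 'Range' category. The names value_cols, range_order and range_means are made up for the illustration.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Report': [1, 2, 3, 4],
        'Hours': [30, 45, 85, 24],
        'Wage': [100, 446, 245, 632],
        'Worker': [321, 63, 456, 234],
        'Buyer': [36, 53, 71, 52],
        'Range': ['High', 'Medium', 'Low', 'Low']}
df = pd.DataFrame(data)

value_cols = ['Hours', 'Wage', 'Worker', 'Buyer']
range_order = ['High', 'Medium', 'Low']  # assumed display order for the x-axis

# Assumption: summarize each numeric column by its mean per 'Range' category.
# After the transpose, rows are the numeric columns and columns are the categories.
range_means = df.groupby('Range')[value_cols].mean().T[range_order]

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(range_means, annot=True, fmt='.1f', ax=ax)
ax.set_xlabel('Range')
```

Because the columns are on very different scales (Wage versus Hours, for example), the colors will be dominated by the largest column; normalizing each row first may give a more readable picture.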

Related

How to assign the colors for the values and make piechart using plotly

I have this dataframe, and I'm trying to create a pie chart from it similar to the attached image.
Index   Category
SE      3
COL     2
PE      1
DP-PD   1
COL     1
OTH     1
I have tried the following. It creates a pie chart, but not the one I expected.
import pandas as pd
import matplotlib.pyplot as plt
# Data as a dict of lists.
data = {'index': ['SE', 'COL', 'PE', 'OTH', 'DP-PD'], 'Category': [3, 2, 1, 1,1]}
# Create DataFrame
df = pd.DataFrame(data)
plt.pie(df["Category"], labels = df["Category"],startangle=90)
plt.title("Observation statistics", fontsize = 24)
I need each category to use the color assigned to it in the legend. These are the color codes:
{'DP-PD': '#1E90FF', 'ID': '#FFA500', 'ENC': '#D3D3D3', 'SE': '#FFFF00',
'COL': '#FF0000', 'GL': '#32CD32', 'COT': '#0000CD', 'PE': '#A52A2A',
'FI': '#000000', 'OTH': '#00BFFF'}
I'd like the following output:
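A minimal sketch of one way to get fixed per-category colors, assuming the goal is simply that each slice uses the color listed for its label. It stays with matplotlib, as in the attempt above, rather than Plotly, and the name pie_colors is made up for the illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'index': ['SE', 'COL', 'PE', 'OTH', 'DP-PD'], 'Category': [3, 2, 1, 1, 1]}
df = pd.DataFrame(data)

color_map = {'DP-PD': '#1E90FF', 'ID': '#FFA500', 'ENC': '#D3D3D3', 'SE': '#FFFF00',
             'COL': '#FF0000', 'GL': '#32CD32', 'COT': '#0000CD', 'PE': '#A52A2A',
             'FI': '#000000', 'OTH': '#00BFFF'}

# Look up one color per row, in the same order as the slices.
pie_colors = df['index'].map(color_map).tolist()

fig, ax = plt.subplots()
wedges, texts = ax.pie(df['Category'], labels=df['index'].tolist(),
                       colors=pie_colors, startangle=90)
ax.set_title("Observation statistics", fontsize=24)
ax.legend(wedges, df['index'].tolist(), loc='center left', bbox_to_anchor=(1, 0.5))
```

For Plotly itself, px.pie takes the same dictionary through color_discrete_map when color is set to the label column.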

Create a Single Boxplot from Multiple DataFrames

I have multiple data frames with different numbers of rows and the same number of columns, i.e.
DATA
female_df1 = pd.DataFrame({'ID': [5,21,17], 'value': [85, 56.7, 77.9]})
female_df2 = pd.DataFrame({'ID': [75,1,7], 'value': [39, 66.7, 77.9]})
female_df3 = pd.DataFrame({'ID': [5,21,17], 'value': [85, 56.7, 77.9]})
female_df4 = pd.DataFrame({'ID': [5,21,17], 'value': [85, 56.7, 77.9]})
male_df1 = pd.DataFrame({'ID': [35,1,7], 'value': [15, 36.7, 87.9]})
male_df2 = pd.DataFrame({'ID': [5,11,17], 'value': [99, 96.7, 97.9]})
male_df3 = pd.DataFrame({'ID': [35,41,37], 'value': [15, 16.7, 17.9]})
male_df4 = pd.DataFrame({'ID': [51,11,27], 'value': [35, 36.7, 37.9]})
Now I would like to draw a single boxplot from the data frames above. I used the code below to do so:
fig, ax2 = plt.subplots(figsize = (15,10))
vec = [female_df1['value'].values,female_df2['value'].values,female_df3['value'].values,female_df4['value'].values]
labels = ['f1','f2','f3', 'f4']
ax2.boxplot(vec, labels = labels)
plt.show()
The output is a boxplot of the female values. I also have male data frames with values, and I want to plot the two side by side (i.e. fbeta1.0 and mbeta1.0) to compare the data distributions. Any insights are much appreciated.
Desired output plot:
This is a bit manual, but should do what you need...
### DATA ###
female_df1 = pd.DataFrame({'ID': [5,21,17], 'value': [85, 56.7, 77.9]})
female_df2 = pd.DataFrame({'ID': [75,1,7], 'value': [39, 66.7, 77.9]})
female_df3 = pd.DataFrame({'ID': [5,21,17], 'value': [85, 56.7, 77.9]})
female_df4 = pd.DataFrame({'ID': [5,21,17], 'value': [85, 56.7, 77.9]})
male_df1 = pd.DataFrame({'ID': [35,1,7], 'value': [15, 36.7, 87.9]})
male_df2 = pd.DataFrame({'ID': [5,11,17], 'value': [99, 96.7, 97.9]})
male_df3 = pd.DataFrame({'ID': [35,41,37], 'value': [15, 16.7, 17.9]})
male_df4 = pd.DataFrame({'ID': [51,11,27], 'value': [35, 36.7, 37.9]})
### PLOTTING ###
fig, ax = plt.subplots(1,4, figsize = (15,6))
ax[0].boxplot([female_df1['value'].values, male_df1['value'].values], labels = ['f1','m1'])
ax[1].boxplot([female_df2['value'].values, male_df2['value'].values], labels = ['f2','m2'])
ax[2].boxplot([female_df3['value'].values, male_df3['value'].values], labels = ['f3','m3'])
ax[3].boxplot([female_df4['value'].values, male_df4['value'].values], labels = ['f4','m4'])
ax[0].set_title("M1 & F1")
ax[1].set_title("M2 & F2")
ax[2].set_title("M3 & F3")
ax[3].set_title("M4 & F4")
plt.show()
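If a single set of axes is preferred over four subplots, one alternative is to stack the frames into long format and let seaborn place the female and male boxes next to each other. This is a sketch under that assumption; the column names pair and sex and the variable long_df are made up for the illustration.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

females = [pd.DataFrame({'ID': [5, 21, 17], 'value': [85, 56.7, 77.9]}),
           pd.DataFrame({'ID': [75, 1, 7], 'value': [39, 66.7, 77.9]}),
           pd.DataFrame({'ID': [5, 21, 17], 'value': [85, 56.7, 77.9]}),
           pd.DataFrame({'ID': [5, 21, 17], 'value': [85, 56.7, 77.9]})]
males = [pd.DataFrame({'ID': [35, 1, 7], 'value': [15, 36.7, 87.9]}),
         pd.DataFrame({'ID': [5, 11, 17], 'value': [99, 96.7, 97.9]}),
         pd.DataFrame({'ID': [35, 41, 37], 'value': [15, 16.7, 17.9]}),
         pd.DataFrame({'ID': [51, 11, 27], 'value': [35, 36.7, 37.9]})]

# Tag every frame with its pair number and sex, then stack into one long frame.
frames = []
for i, (f, m) in enumerate(zip(females, males), start=1):
    frames.append(f.assign(pair=i, sex='female'))
    frames.append(m.assign(pair=i, sex='male'))
long_df = pd.concat(frames, ignore_index=True)

# One axes: x groups by pair, hue puts female and male boxes side by side.
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=long_df, x='pair', y='value', hue='sex', ax=ax)
```

The long format also copes with frames of different lengths, since each row carries its own group labels.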

Stacked barplot over multiindex pandas dataframe

import pandas as pd
import numpy as np
np.random.seed(365)
rows = 100
data = {'Month': np.random.choice(['2014-01', '2014-02', '2014-03', '2014-04'], size=rows),
'Code': np.random.choice(['A', 'B', 'C'], size=rows),
'ColA': np.random.randint(5, 125, size=rows),
'ColB': np.random.randint(0, 51, size=rows),}
df = pd.DataFrame(data)
df = df[((~((df.Code=='A')&(df.Month=='2014-04')))&(~((df.Code=='C')&(df.Month=='2014-03'))))]
dfg = df.groupby(['Code', 'Month']).sum()
I want to draw a stacked bar plot from the data above:
dfg.unstack(level=0).plot(kind='bar', stacked =True)
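I want to stack over the 'Code' column, but the code above stacks over 'Month'. Why? Is there a better way to draw a stacked plot for this data?
plot.bar uses the index of the input dataframe as the x-values by default, and each column becomes a stacked segment. After unstack(level=0), 'Month' is the remaining index level, so it ends up on the x-axis.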
IIUC, you need:
dfg.unstack(level=1).plot(kind='bar', stacked=True)
To move the legend outside the axes:
ax = dfg.unstack(level=1).plot(kind='bar', stacked=True, legend=False)
ax.figure.legend(loc='center left', bbox_to_anchor=(1, 0.5))
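The effect of the unstack level can be checked directly on the frame before plotting. The sketch below assumes it is enough to look at one value column ('ColA') at a time, which also keeps the stacks easier to read; by_month and by_code are names made up for the illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(365)
rows = 100
data = {'Month': np.random.choice(['2014-01', '2014-02', '2014-03', '2014-04'], size=rows),
        'Code': np.random.choice(['A', 'B', 'C'], size=rows),
        'ColA': np.random.randint(5, 125, size=rows),
        'ColB': np.random.randint(0, 51, size=rows)}
df = pd.DataFrame(data)
dfg = df.groupby(['Code', 'Month']).sum()

# The level passed to unstack moves to the columns (the stacked segments);
# the level that stays in the index becomes the x-axis.
by_month = dfg['ColA'].unstack(level=0, fill_value=0)  # x-axis: Month, segments: Code
by_code = dfg['ColA'].unstack(level=1, fill_value=0)   # x-axis: Code, segments: Month

print(by_month.index.name, by_month.columns.name)
print(by_code.index.name, by_code.columns.name)

ax = by_code.plot(kind='bar', stacked=True)
```

Whichever frame has the desired grouping in its index is the one to pass to plot.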

python: seaborn lineplot customize marker size based on values of a column

I have data like the following:
#df
df = pd.DataFrame({
'id': {0: -3, 1: 2, 2: -3, 3: 1},
'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090}
})
df
If I run the following code, I get a scatter plot:
sns.relplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)
I want to have a line connecting the dots, but if I change the plot to a line plot, I don't get any graph.
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)
It seems you want to combine a lineplot with a scatterplot:
plt.figure()
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df)
sns.scatterplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)

How to create an Area plot

Is there any way to create an area plot in Seaborn? I checked the documentation but couldn't find one.
Here is the data that I want to plot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'launch_year': [1957, 1958, 1959, 1960, 1961, 1957, 1958, 1959, 1960, 1961, 1957, 1958, 1959,
1960, 1961, 1957, 1958, 1959, 1960, 1961, 1957, 1958, 1959, 1960, 1961],
'state_code': ['China', 'China', 'China', 'China', 'China', 'France', 'France', 'France', 'France',
'France', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Russia', 'Russia', 'Russia',
'Russia', 'Russia', 'United States', 'United States', 'United States', 'United States', 'United States'],
'value': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 5, 4, 8, 9, 1, 22, 18, 29, 41]}
# create a long format DataFrame
df = pd.DataFrame(data)
# pivot the DataFrame to a wide format
year_countries = df.pivot(index='launch_year', columns='state_code', values='value')
# display(year_countries)
state_code   China  France  Japan  Russia  United States
launch_year
1957             0       0      0       2              1
1958             0       0      0       5             22
1959             0       0      0       4             18
1960             0       0      0       8             29
1961             0       0      0       9             41
I created a line plot using this code:
sns.relplot(data=year_countries, kind='line',
height=7, aspect=1.3,linestyle='solid')
plt.xlabel('Launch Year', fontsize=15)
plt.ylabel('Number of Launches', fontsize=15)
plt.title('Space Launches By Country',fontsize=17)
plt.show()
but the plot isn't very clear as a line chart.
I also can't make the lines solid or sort the legend entries by value in descending order.
How about using pandas.DataFrame.plot with kind='area'?
Setting a seaborn style with plt.style.use('seaborn') is deprecated.
In addition, you need to manually sort the legend, as shown here. However, changing the legend order does not change the plot order.
xticks=range(1957, 1962) can be used to specify the xticks, otherwise the 'launch_year' is treated as floats on the x-axis
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2
ax = year_countries.plot(kind='area', figsize=(9, 6), xticks=range(1957, 1962))
ax.set_xlabel('Launch Year', fontsize=15)
ax.set_ylabel('Number of Launches', fontsize=15)
ax.set_title('Space Launches By Country', fontsize=17)
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0], reverse=True))
ax.legend(handles, labels)
plt.show()
Alternatively, use pandas.Categorical to set the order of the columns in df, prior to pivoting. This will ensure the plot order and legend order are the same (e.g. the first group in the legend is the first group in the plot stack).
# set the order of the column in df
df.state_code = pd.Categorical(df.state_code, sorted(df.state_code.unique())[::-1], ordered=True)
# now pivot df
year_countries = df.pivot(index='launch_year', columns='state_code', values='value')
# plot
ax = year_countries.plot(kind='area', figsize=(9, 6), xticks=range(1957, 1962))
ax.set_xlabel('Launch Year', fontsize=15)
ax.set_ylabel('Number of Launches', fontsize=15)
ax.set_title('Space Launches By Country', fontsize=17)
# move the legend
ax.legend(title='Countries', bbox_to_anchor=(1, 1.02), loc='upper left', frameon=False)
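The question also asks for the legend sorted by value in descending order, while both orderings above are alphabetical. A sketch of one way to do that, assuming "by value" means the total number of launches per country: reorder the columns of the wide frame by their sums before plotting. The wide frame is built directly here with the same numbers as the pivot output above, and launches, totals and ordered are names made up for the illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

years = [1957, 1958, 1959, 1960, 1961]
launches = {'China': [0, 0, 0, 0, 0],
            'France': [0, 0, 0, 0, 0],
            'Japan': [0, 0, 0, 0, 0],
            'Russia': [2, 5, 4, 8, 9],
            'United States': [1, 22, 18, 29, 41]}
year_countries = pd.DataFrame(launches, index=pd.Index(years, name='launch_year'))

# Assumption: rank countries by total launches over all years, largest first.
totals = year_countries.sum().sort_values(ascending=False)
ordered = year_countries[totals.index]

# Column order drives both the stacking order and the legend order.
ax = ordered.plot(kind='area', figsize=(9, 6), xticks=range(1957, 1962))
ax.set_xlabel('Launch Year', fontsize=15)
ax.set_ylabel('Number of Launches', fontsize=15)
ax.set_title('Space Launches By Country', fontsize=17)
ax.legend(title='Countries', bbox_to_anchor=(1, 1.02), loc='upper left', frameon=False)
```

Countries with equal totals (here the three with zero launches) may come out in any relative order.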
