Order the x axis in seaborn for two columns - python

I want to plot a dataframe with seaborn's lineplot. It has the following structure:
A Year Month diff
Der 2019 1 3
Der 2019 2 4
Die 2019 1 1
Die 2019 2 1
Right now I am trying:
sns.lineplot(x= ['Year', 'Month'], y='diff', hue='A', ci=None, data = df)
plt.show()
How can I get a timeline graph starting with 2019 1 and going over the order of the months without having a time column?

You could create a new date column from the year and month and just set the day to be 1:
from datetime import date
import matplotlib.dates as mdates
df['date'] = df.apply(lambda row: date(row['Year'], row['Month'], 1), axis=1)
ax = sns.lineplot(x='date', y='diff', hue='A', ci=None, data=df)
# To only show x-axis ticks once per month:
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
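As a possible alternative to the row-wise apply, the date column can also be assembled in one vectorized call. This is only a sketch: it rebuilds the sample dataframe from the question, and it assumes that renaming the columns to lowercase year/month and adding a constant day column is acceptable for pd.to_datetime.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["Der", "Der", "Die", "Die"],
    "Year": [2019, 2019, 2019, 2019],
    "Month": [1, 2, 1, 2],
    "diff": [3, 4, 1, 1],
})

# pd.to_datetime can assemble dates from columns named year, month and day
parts = df[["Year", "Month"]].rename(columns={"Year": "year", "Month": "month"})
df["date"] = pd.to_datetime(parts.assign(day=1))

print(df[["A", "date", "diff"]])
```

The resulting date column can be passed to sns.lineplot as x='date' in the same way as above.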

Related

Seaborn Barplot With Multiple Years and Weeks [duplicate]

I am having difficulty adding a multi level axis with month and then year to my plot and I have been unable to find any answers anywhere. I have a dataframe which contains the upload date as a datetime dtype and then the year and month for each row. See Below:
Upload Date Year Month DocID
0 2021-03-22 2021 March DOC146984
1 2021-12-16 2021 December DOC173111
2 2021-12-07 2021 December DOC115350
3 2021-10-29 2021 October DOC150149
4 2021-03-12 2021 March DOC125480
5 2021-06-25 2021 June DOC101062
6 2021-05-03 2021 May DOC155916
7 2021-11-14 2021 November DOC198519
8 2021-03-20 2021 March DOC159523
9 2021-07-19 2021 July DOC169328
10 2021-04-13 2021 April DOC182660
11 2021-10-08 2021 October DOC176871
12 2021-09-19 2021 September DOC185854
13 2021-05-16 2021 May DOC192329
14 2021-06-29 2021 June DOC142190
15 2021-11-30 2021 November DOC140231
16 2021-11-12 2021 November DOC145392
17 2021-11-10 2021 November DOC178159
18 2021-11-06 2021 November DOC160932
19 2021-06-16 2021 June DOC131448
What I am trying to achieve is to build a bar chart which has the count for number of documents in each month and year. The graph would look something like this:
The main thing is that the x axis is split by each month and then further by each year, rather than me labelling each column with month and year (e.g 'March 2021'). However I can't figure out how to achieve this. I've tried using a countplot but it only allows me to choose month or year (See Below). I have also tried groupby but the end product is always the same. Any Ideas?
This is using randomly generated data, see the code to replicate below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
from datetime import date, timedelta
from random import choices
np.random.seed(42)
# initializing dates ranges
test_date1, test_date2 = date(2020, 1, 1), date(2021, 6, 30)
# initializing K
K = 2000
res_dates = [test_date1]
# loop to get each date till end date
while test_date1 != test_date2:
    test_date1 += timedelta(days=1)
    res_dates.append(test_date1)
# random K dates from pack
res = choices(res_dates, k=K)
# Generating dataframe
df = pd.DataFrame(res, columns=['Upload Date'])
# Generate other columns
df['Upload Date'] = pd.to_datetime(df['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
df['Month'] = df['Upload Date'].dt.month_name()
df['DocID'] = np.random.randint(100000,200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']
# plotting graph
sns.set_color_codes("pastel")
f, ax = plt.subplots(figsize=(20,8))
sns.countplot(x='Month', data=df)
A new column with year and month in numeric form can serve to indicate the x-positions, correctly ordered. The x-tick labels can be renamed to the month names. Vertical lines and manual placing of the year labels lead to the final plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
test_date1, test_date2 = '20200101', '20210630'
# 'MS' (month start) is used because the 'M' alias is deprecated in recent pandas versions
months = pd.date_range('2021-01-01', periods=12, freq='MS').strftime('%B')
K = 2000
df = pd.DataFrame(np.random.choice(pd.date_range(test_date1, test_date2), K), columns=['Upload Date'])
df['Year'] = df['Upload Date'].dt.year
# df['Month'] = pd.Categorical(df['Upload Date'].dt.strftime('%B'), categories=months)
df['YearMonth'] = df['Upload Date'].dt.strftime('%Y%m').astype(int)
df['DocID'] = np.random.randint(100000, 200000, df.shape[0]).astype('str')
df['DocID'] = 'DOC' + df['DocID']
sns.set_style("white")
sns.set_color_codes("pastel")
fig, ax = plt.subplots(figsize=(20, 8))
sns.countplot(x='YearMonth', data=df, ax=ax)
sns.despine()
yearmonth_labels = [int(l.get_text()) for l in ax.get_xticklabels()]
ax.set_xticklabels([months[ym % 100 - 1] for ym in yearmonth_labels])
ax.set_xlabel('')
# calculate the positions of the borders between the years
pos = []
years = []
prev = None
for i, ym in enumerate(yearmonth_labels):
    if ym // 100 != prev:
        pos.append(i)
        prev = ym // 100
        years.append(prev)
pos.append(len(yearmonth_labels))
pos = np.array(pos) - 0.5
# vertical lines to separate the years
ax.vlines(pos, 0, -0.12, color='black', lw=0.8, clip_on=False, transform=ax.get_xaxis_transform())
# years at the center of their range
for year, pos0, pos1 in zip(years, pos[:-1], pos[1:]):
    ax.text((pos0 + pos1) / 2, -0.07, year, ha='center', clip_on=False, transform=ax.get_xaxis_transform())
ax.set_xlim(pos[0], pos[-1])
ax.set_ylim(ymin=0)
plt.tight_layout()
plt.show()
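The year-boundary bookkeeping can be checked in isolation, without drawing anything. The sketch below wraps the same loop in a helper; the function name year_boundaries and the short list of sample labels are made up for illustration.

```python
import numpy as np

def year_boundaries(yearmonth_labels):
    """Return (edges, years) for a sorted list of YYYYMM integers.

    edges holds the x-positions of the borders between year blocks,
    shifted by 0.5 so that they fall between two bars.
    """
    pos, years, prev = [], [], None
    for i, ym in enumerate(yearmonth_labels):
        year = ym // 100
        if year != prev:  # a new year starts at bar index i
            pos.append(i)
            years.append(year)
            prev = year
    pos.append(len(yearmonth_labels))  # closing border after the last bar
    return np.array(pos) - 0.5, years

# Hypothetical labels: two months of 2020 followed by three months of 2021
labels = [202011, 202012, 202101, 202102, 203103 - 1000]
edges, years = year_boundaries(labels)
centers = (edges[:-1] + edges[1:]) / 2  # where the year text would be placed

print(edges, years, centers)
```

Each year label is centered between its two borders, which is what the zip over pos[:-1] and pos[1:] does in the plotting code.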

Make datetime line look nice on seaborn plot x axis

How do you reformat from datetime to Week 1, Week 2... to plot onto a seaborn line chart?
Input
Date Ratio
0 2019-10-04 0.350365
1 2019-10-04 0.416058
2 2019-10-11 0.489051
3 2019-10-18 0.540146
4 2019-10-25 0.598540
5 2019-11-08 0.547445
6 2019-11-01 0.722628
7 2019-11-15 0.788321
8 2019-11-22 0.875912
9 2019-11-27 0.948905
Desired output
I was able to cheese it by matching the natural index of the dataframe to the week. I wonder if there's another way to do this.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Date': ['2019-10-04',
'2019-10-04',
'2019-10-11',
'2019-10-18',
'2019-10-25',
'2019-11-08',
'2019-11-01',
'2019-11-15',
'2019-11-22',
'2019-11-27'],
'Ratio': [0.350365,
0.416058,
0.489051,
0.540146,
0.598540,
0.547445,
0.722628,
0.788321,
0.875912,
0.948905]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
graph = sns.lineplot(data=df,x='Date',y='Ratio')
plt.show()
# First plot looks bad.
week_mapping = dict(zip(df['Date'].unique(),range(len(df['Date'].unique()))))
df['Week'] = df['Date'].map(week_mapping)
graph = sns.lineplot(data=df,x='Week',y='Ratio')
plt.show()
# This plot looks better, but method seems cheesy.
It looks like your data is already spaced weekly, so you can just do:
df.groupby('Date',as_index=False)['Ratio'].mean().plot()
Output:
You can make a new column with the week number and use that as your x value. This gives you the week of the year. If you want your week numbers to start at 0, just subtract the week number of the first date from each value (see the commented-out line in the code).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime as dt
data = {'Date': ['2019-10-04',
'2019-10-04',
'2019-10-11',
'2019-10-18',
'2019-10-25',
'2019-11-08',
'2019-11-01',
'2019-11-15',
'2019-11-22',
'2019-11-27'],
'Ratio': [0.350365,
0.416058,
0.489051,
0.540146,
0.598540,
0.547445,
0.722628,
0.788321,
0.875912,
0.948905]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
# To get the week number of the year
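# To get the ISO week number of the year
# (Series.dt.week was removed in pandas 2.0; isocalendar() needs pandas >= 1.1)
df['Week'] = df['Date'].dt.isocalendar().week.astype(int)
# Or you can use the line below to start the week numbers at 0
#df['Week'] = df['Date'].dt.isocalendar().week.astype(int) - df['Date'].min().week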
graph = sns.lineplot(data=df,x='Week',y='Ratio')
plt.show()
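Another option, offered here as a sketch rather than as part of the answers above, is to number the distinct dates in chronological order. The mapping built from unique() in the question follows order of appearance, so 2019-11-08 gets a smaller number than 2019-11-01. pd.factorize with sort=True avoids that. Note that the result is an ordinal position, not a calendar week.

```python
import pandas as pd

# Dates copied from the question, including the out-of-order 2019-11-08 row
df = pd.DataFrame({"Date": pd.to_datetime([
    "2019-10-04", "2019-10-04", "2019-10-11", "2019-10-18", "2019-10-25",
    "2019-11-08", "2019-11-01", "2019-11-15", "2019-11-22", "2019-11-27",
])})

# codes[i] is the position of row i's date among the sorted unique dates
codes, uniques = pd.factorize(df["Date"], sort=True)
df["Week"] = codes

print(df)
```

The Week column can then be used as the x value in sns.lineplot, exactly like the index-based mapping in the question.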

How can I replace daily x-axis labels with months but keep the chart showing daily values?

I want to draw a chart from my dataframe (which contains daily data) and want the x-axis labels to show up as months, covering the whole period of the daily data.
For example, if I have data from 2010-01-01 to 2010-12-31, I want 365 daily data points, but on the x-axis I want just Jan, Feb, Mar, etc., with each month spanning exactly the period of its corresponding days. I am struggling to get this working.
This is the DataFrame:
Daily CP ROI S2 positive Month Day
Date
2008-01-02 100087.000 True January 2
2008-01-03 101967.000 True January 3
2008-01-04 102167.000 True January 4
2008-01-07 104004.000 True January 7
2008-01-08 105192.000 True January 8
pl_plot = pl_plot.set_index('Date')
figure(num=None, figsize=(20, 8), dpi=80, facecolor='w', edgecolor='k')
plt.ylabel('USD', fontsize=18)
plt.rc('ytick',labelsize=16)
final_value = new_df_test.iloc[ei-2]['Daily_Compound_ROI']
roi = round(((final_value - very_init_budget)/very_init_budget)*100,3)
roi_s=str(roi)
plt_title_s = new_title+'\nPeriod: '+y_init_s+'-'+y_end_s+', (ROI: '+roi_s+'%)'
plt.title(plt_title_s, fontsize=24)
pl_plot['positive'] = pl_plot['Daily_Compound_ROI'] > 100000
pl_plot['Daily CP ROI S2'].plot(kind='bar', color=pl_plot.positive.map({True: '#5cb85c', False: 'r'}))
ax1 = plt.axes()
plt.rc('xtick',labelsize=14)
x_axis = ax1.axes.get_xaxis()
x_axis.set_visible(True)
I want to get something like the chart below (a red bar when the value is below 100000, otherwise a green bar), but on the x-axis I would like to see Jan, Feb, Mar, etc., without any separation between the months (and I don't want to see every single day, as I do right now).
IIUC you could give seaborn a try.
If you add a month column and a day column to your dataframe, you can make a bar plot with one block of daily bars per month.
Example:
# import pandas as pd
# import numpy as np
# df = pd.DataFrame(index=pd.date_range('15.8.2019', '27.11.2019'))
# df['Value'] = np.random.random(len(df))
# df['Month'] = df.index.month_name()
# df['Day'] = df.index.day
# Value Month Day
# 2019-08-15 0.813130 August 15
# 2019-08-16 0.850873 August 16
# 2019-08-17 0.728416 August 17
# 2019-08-18 0.326072 August 18
# 2019-08-19 0.880385 August 19
# ... ... ..
# 2019-11-23 0.771801 November 23
# 2019-11-24 0.638811 November 24
# 2019-11-25 0.824542 November 25
# 2019-11-26 0.451075 November 26
# 2019-11-27 0.151469 November 27
Given this dataframe, you could do:
import seaborn as sns
sns.catplot(kind='bar', x='Month', y='Value', hue='Day', data=df, color='b', legend=False)
results in
It feels a bit strange to be answering my own question.
Anyway, I found a way:
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
sns.set(font_scale=1.5, style="whitegrid")
fig, ax = plt.subplots(figsize=(20, 8))
ax.bar(df.index.values, df['Daily CP ROI S2'].values, width=1, color=pl_plot.positive.map({True: '#5cb85c', False: 'r'}))
# ax.set(xlabel="Period", ylabel="USD", title=title1)
ax.title.set(fontsize=24)
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.xaxis.set_major_formatter(DateFormatter("%b"))
plt.show()
The above fixes the problem, and I get the chart I wanted.
The only problem left is that I now need to fit the title within the chart :) I will look into this, but if you have any suggestions, please go ahead! Thanks.
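For readers without the original data, the approach can be tried on synthetic values. The sketch below is self-contained and rests on several assumptions: the data is a made-up random walk around 100000, the Agg backend is selected so that no display is needed, and the dates are converted with date2num so that width=1 unambiguously means one day.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series for one year (not the data from the question)
idx = pd.date_range("2010-01-01", "2010-12-31", freq="D")
rng = np.random.default_rng(0)
values = 100000 + rng.normal(0, 500, len(idx)).cumsum()

# Green when above the 100000 threshold, red otherwise
colors = ["#5cb85c" if v > 100000 else "r" for v in values]

# Matplotlib date numbers are measured in days, so width=1 is one day
x = mdates.date2num(idx.to_pydatetime())

fig, ax = plt.subplots(figsize=(20, 8))
ax.bar(x, values, width=1, color=colors)
ax.xaxis_date()
ax.xaxis.set_major_locator(mdates.MonthLocator())         # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))  # Jan, Feb, ...
ax.set_title("Daily values with monthly ticks", fontsize=24)
fig.tight_layout()
fig.savefig("daily_bars.png")
```

Calling fig.tight_layout() before saving is one common way to keep a large title inside the figure, although whether it solves the title issue mentioned above depends on the actual title text.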

Time series plot with week, year and number of hits

In[1] df
Out[0] week year number_of_cases
8 2010 583.0
9 2010 116.0
10 2010 358.0
11 2010 420.0
... ... ...
52 2010 300.0
1 2011 123.0
2 2011 145.0
How may I create a timeline graph where my y-axis is the number of cases and my x axis is increasing week number that corresponds with the year?
I want it to go from week 1 to 52 in the year 2010 then week 1 to 52 in the year 2011. And have this as one large graph to see how the number of cases vary each year according to week.
Python 3, Pandas.
You can create a datetime column based on the year and the week, plot 'number_of_cases' against the date, and then use mdates to format the x-ticks.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Determine the date
df['date'] = pd.to_datetime(df.assign(day=1, month=1)[['year', 'month', 'day']])+pd.to_timedelta(df.week*7, unit='days')
# Plot
fig, ax = plt.subplots()
df.plot(x='date', y='number_of_cases', marker='o', ax=ax)
# Format the x-ticks
myFmt = mdates.DateFormatter('%Y week %U')
ax.xaxis.set_major_formatter(myFmt)
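If the week numbers happen to be ISO weeks (an assumption, since the question does not say which convention is used), the standard library can build the Monday of each week directly. The sketch below uses date.fromisocalendar, which requires Python 3.8 or later, on a few rows taken from the question.

```python
from datetime import date

import pandas as pd

# A few rows from the question
df = pd.DataFrame({
    "week": [8, 9, 52, 1, 2],
    "year": [2010, 2010, 2010, 2011, 2011],
    "number_of_cases": [583.0, 116.0, 300.0, 123.0, 145.0],
})

# Monday (ISO weekday 1) of each ISO year/week pair
df["date"] = pd.to_datetime(
    [date.fromisocalendar(int(y), int(w), 1) for y, w in zip(df["year"], df["week"])]
)

print(df)
```

With this convention the matching formatter would be '%G week %V' rather than '%Y week %U', provided the platform's strftime supports those directives.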

How to plot two different dataframe columns at time based on the same datetime x-axis

Hi I have a dataframe like this:
Date Influenza[it] Febbre[it] Cefalea[it] Paracetamolo[it] \
0 2008-01 989 2395 1291 2933
1 2008-02 962 2553 1360 2547
2 2008-03 1029 2309 1401 2735
3 2008-04 1031 2399 1137 2296
Unnamed: 6 tot_incidence
0 NaN 4.56
1 NaN 5.98
2 NaN 6.54
3 NaN 6.95
I'd like to plot several figures, each with the Date column on the x-axis and, on the y-axis, the Influenza[it] column together with one other column such as Febbre[it]. The next figure would again have Date on the x-axis and Influenza[it] on the y-axis, paired with another column (e.g. Paracetamolo[it]), and so on. I'm trying to figure out whether there is a quick way to do this without heavily manipulating the dataframes.
You can simply plot 3 different subplots.
import pandas as pd
import matplotlib.pyplot as plt
dic = {"Date" : ["2008-01","2008-02", "2008-03", "2008-04"],
"Influenza[it]" : [989,962,1029,1031],
"Febbre[it]" : [2395,2553,2309,2399],
"Cefalea[it]" : [1291,1360,1401,1137],
"Paracetamolo[it]" : [2933,2547,2735,2296]}
df = pd.DataFrame(dic)
#optionally convert to datetime
df['Date'] = pd.to_datetime(df['Date'])
fig, ax = plt.subplots(1,3, figsize=(13,7))
df.plot(x="Date", y=["Influenza[it]","Febbre[it]" ], ax=ax[0])
df.plot(x="Date", y=["Influenza[it]","Cefalea[it]" ], ax=ax[1])
df.plot(x="Date", y=["Influenza[it]","Paracetamolo[it]" ], ax=ax[2])
#optionally equalize yaxis limits
for a in ax:
    a.set_ylim([800, 3000])
plt.show()
If you want to plot each plot separately in a jupyter notebook, the following might do what you want.
Additionally we convert the dates from format year-week to a datetime to be able to plot them with matplotlib.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
dic = {"Date" : ["2008-01","2008-02", "2008-03", "2008-04"],
"Influenza[it]" : [989,962,1029,1031],
"Febbre[it]" : [2395,2553,2309,2399],
"Cefalea[it]" : [1291,1360,1401,1137],
"Paracetamolo[it]" : [2933,2547,2735,2296]}
df = pd.DataFrame(dic)
#convert to datetime, format year-week -> date (monday of that week)
df['Date'] = [ date + "-1" for date in df['Date']] # add "-1" indicating monday of that week
df['Date'] = pd.to_datetime(df['Date'], format="%Y-%W-%w")
cols = ["Febbre[it]", "Cefalea[it]", "Paracetamolo[it]"]
for col in cols:
    plt.close()
    fig, ax = plt.subplots(1,1)
    ax.set_ylim([800, 3000])
    ax.plot(df.Date, df["Influenza[it]"], label="Influenza[it]")
    ax.plot(df.Date, df[col], label=col)
    ax.legend()
    plt.show()
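To see what the year-week interpretation does to the labels, the same format string can be tried with the standard library. This is only an illustration of the "%Y-%W-%w" parsing; whether the Date column really holds week numbers rather than months is an assumption made by this second approach.

```python
from datetime import datetime

labels = ["2008-01", "2008-02", "2008-03", "2008-04"]

# Appending "-1" selects Monday; %W counts weeks that start on Monday,
# and days before the first Monday of the year belong to week 0
mondays = [datetime.strptime(label + "-1", "%Y-%W-%w") for label in labels]

for label, monday in zip(labels, mondays):
    print(label, "->", monday.date())
```

Read as year-week, the four labels are one week apart. Parsed with a plain pd.to_datetime call, as in the first snippet, the same strings are read as the first day of January to April 2008, so the choice of format changes the spacing on the x-axis.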
