Change line width of specific line in line plot pandas/matplotlib - python

I am plotting a dataframe that looks like this.
Date 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Date
01 Jan 12.896 13.353 12.959 13.011 13.073 12.721 12.643 12.484 12.876 13.102
02 Jan 12.915 13.421 12.961 13.103 13.125 12.806 12.644 12.600 12.956 13.075
03 Jan 12.926 13.379 13.012 13.116 13.112 12.790 12.713 12.634 12.959 13.176
04 Jan 13.051 13.414 13.045 13.219 13.051 12.829 12.954 12.724 13.047 13.187
05 Jan 13.176 13.417 13.065 13.148 13.115 12.874 12.956 12.834 13.098 13.123
The code for plotting is here.
ice_data_dates.plot(figsize=(20,12), title='Arctic Sea Ice Extent', lw=3, fontsize=16, ax=ax, grid=True)
This plots a line plot for each of the years listed in the dataframe over each day in the year. However, I would like to make the line for 2020 much thicker than the others so it stands out more clearly. Is there a way to do that using this one line of code? Or do I need to manually plot all of the years such that I can control the thickness of each line separately? A current picture is attached, where the line thicknesses are all the same.

You can iterate over the lines in the plot, which can be retrieved with ax.get_lines, and increase the width using set_linewidth if its label matches the value of interest:
fig, ax = plt.subplots()
df.plot(figsize=(20,12), title='Arctic Sea Ice Extent',
        lw=3, fontsize=16, ax=ax, grid=True)
for line in ax.get_lines():
    if line.get_label() == '2020':
        line.set_linewidth(15)
plt.show()
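For anyone who wants to try this without the sea-ice data, here is a minimal self-contained sketch of the same idea. The DataFrame `demo`, its random values, and the helper name `highlight_line` are made up for illustration; only `get_lines`, `get_label`, and `set_linewidth` come from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the sea-ice table: one column per year (string labels).
rng = np.random.default_rng(0)
years = [str(y) for y in range(2016, 2021)]
demo = pd.DataFrame(12.5 + rng.random((5, len(years))), columns=years)

def highlight_line(ax, label, width):
    """Widen every line on `ax` whose label equals `label`; return those lines."""
    matched = [ln for ln in ax.get_lines() if ln.get_label() == str(label)]
    for ln in matched:
        ln.set_linewidth(width)
    return matched

fig, ax = plt.subplots()
demo.plot(ax=ax, lw=3)
highlighted = highlight_line(ax, "2020", 8)
ax.legend()  # rebuild the legend so its handle picks up the new width
```

The helper compares against `str(label)` because `get_label()` returns a string. Calling `ax.legend()` again afterwards is optional; it only keeps the legend entry in step with the thicker line.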

You can do it in two lines like this:
ice_data_dates.loc[:, ice_data_dates.columns != "2020"].plot(figsize=(20, 12), title='Arctic Sea Ice Extent', lw=3, fontsize=16, ax=ax, grid=True)
ice_data_dates["2020"].plot(figsize=(20, 12), title='Arctic Sea Ice Extent', lw=15, fontsize=16, ax=ax, grid=True)
This first plots every column of the DataFrame except 2020, then plots the 2020 column by itself on the same axes with a larger line width.
This is a different approach from the accepted answer, but it gives the same result.

Related

How to plot daily data as monthly averages (for separate years)

I am trying to plot a graph to represent a monthly river discharge dataset from 1980-01-01 to 2013-12-31.
Please check out this graph
The plan is to plot "Jan Feb Mar Apr May...Dec" as the x-axis and the discharge (m3/s) as the y-axis. The actual lines on the graphs would represent the years. Alternatively, the lines on the graph would showcase monthly average (from jan to dec) of every year from 1980 to 2013.
DAT = pd.read_excel('Modelled Discharge_UIB_1980-2013_Daily.xlsx',
                    sheet_name='Karhmong', header=None, skiprows=1,
                    names=['year', 'month', 'day', 'flow'],
                    parse_dates={ 'date': ['year', 'month', 'day'] },
                    index_col='date')
the above is to show what type of data it is
date flow
1980-01-01 104.06
1980-01-02 103.81
1980-01-03 103.57
1980-01-04 103.34
1980-01-05 103.13
... ...
2013-12-27 105.65
2013-12-28 105.32
2013-12-29 105.00
2013-12-30 104.71
2013-12-31 104.42
because I want to compare all the years to each other so I tried the below command
DAT1980 = DAT[DAT.index.year==1980]
DAT1980
DAT1981 = DAT[DAT.index.year==1981]
DAT1981
...etc
in terms of grouping the months for the x-axis I tried grouping months using the command
datmonth = np.unique(DAT.index.month)
so far all of these commands caused no error
however as I plot the graph I got this error
Graph plot command
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(datmonth, DAT1980, color='purple', linestyle='--', label='1980')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
axs.set_xlim(3, 5)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
which I got "ValueError: x and y must have same first dimension, but have shapes (12,) and (366, 1)" as the error
I then tried
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(DAT.index.month, DAT.index.year==1980, color='purple', linestyle='--', label='1980')
ax.grid()
ax.plot(DAT.index.month, DAT.index.year==1981, color='black', marker='o', linestyle='-', label='C1981')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
#axs.set_xlim(1, 12)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
and it worked better than the previous graph but still not what I wanted
(please check out the graph here)
as my intention is to create a graph similar to this
I wholeheartedly appreciate any suggestion you may have! Thank you so so much and if you need any further information please do not hesitate to ask, I will reply as soon as possible.
Welcome to SO! Nice job creating a clear description of your issue and showing lots of code : )
There are a few syntax issues here and there, but the main issue I see is that you need to add a groupby/aggregation operation at some point. That is, you have daily data, but your desired plot has monthly resolution (for each year). It sounds like you want an average of the daily values for each month for each year (correct me if that is wrong).
Here is some fake data:
dr = pd.date_range('01-01-1980', '12-31-2013', freq='1D')
flow = np.random.rand(len(dr))
df = pd.DataFrame(flow, columns=['flow'], index=dr)
Looks like your example:
flow
1980-01-01 0.751287
1980-01-02 0.411040
1980-01-03 0.134878
1980-01-04 0.692086
1980-01-05 0.671108
...
2013-12-27 0.683654
2013-12-28 0.772894
2013-12-29 0.380631
2013-12-30 0.957220
2013-12-31 0.864612
[12419 rows x 1 columns]
You can use groupby to get a mean for each month, using the same datetime attributes you use above (with some additional methods to help make the data easier to work with)
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'],)
             .reset_index())
monthly has flow data for each month for each year, i.e. what you want to plot:
year month flow
0 1980 1 0.514496
1 1980 2 0.633738
2 1980 3 0.566166
3 1980 4 0.553763
4 1980 5 0.537686
.. ... ... ...
403 2013 8 0.402805
404 2013 9 0.479226
405 2013 10 0.446874
406 2013 11 0.526942
407 2013 12 0.599161
[408 rows x 3 columns]
Now to plot an individual year, you index it from monthly and plot the flow data. I use most of your axes formatting:
# make figure
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
# plotting for one year
sub = monthly[monthly['year'] == 1980]
ax.plot(sub['month'], sub['flow'], color='purple', linestyle='--', label='1980')
# some formatting
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J','F','M','A','M','J','J','A','S','O','N','D'])
ax.legend()
ax.grid()
Producing the following:
You could instead plot several years using a loop of some sort:
years = [1980, 1981, 1982, ...]
for year in years:
    sub = monthly[monthly['year'] == year]
    ax.plot(sub['month'], sub['flow'], ...)
You may run into some other challenges here (like finding a way to set nice styling for 30+ lines, and doing so in a loop). You can open a new post (building off of this one) if you can't find out how to accomplish something through other posts here. Best of luck!
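As one possible way to handle the 30+ lines without writing the loop by hand, the monthly table can be pivoted so that each year becomes a column, and pandas can then draw all of them with a colormap. This is only a sketch on fake data: the `wide` name, the random seed, and the choice of the 'viridis' colormap are my own assumptions, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fake daily data, in the same shape as the answer above.
dr = pd.date_range('1980-01-01', '2013-12-31', freq='D')
df = pd.DataFrame({'flow': np.random.default_rng(1).random(len(dr))}, index=dr)

# Monthly mean per year, then reshape so each year becomes one column.
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'])
             .reset_index())
wide = monthly.pivot(index='month', columns='year', values='flow')

fig, ax = plt.subplots(figsize=(12, 6))
# One line per year; the colormap gives each year a distinct shade.
wide.plot(ax=ax, colormap='viridis', legend=False)
ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'])
ax.set_xlabel('Month')
ax.set_ylabel('Discharge (m3/s)')
```

The legend is switched off here because 34 entries would crowd the figure; a colorbar or a handful of highlighted years may read better.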

Skipping gap periods the x-axis of a chart python

I would like to customize the x axis of a pyplot line chart. Basically, I have periods without values, so I want to skip these periods.
Data example:
Row        x                y
previous   2021-5-10 14:58  100.520
previous   2021-5-10 14:59  100.500
red dot    2021-5-10 15:00  100.550
green dot  2021-5-11 9:00   100.490
after      2021-5-11 9:01   100.650
after      2021-5-11 9:02   100.480
def chartValueTimeLine(time_series, value_series, line_color='blue', time_label='date', value_label='y_label', alpha_value=0.35, title='chart1'):
    fig = plt.figure(figsize=(12.5, 7), num=title)
    fig.subplots_adjust(bottom=0.2)
    plt.plot(time_series, value_series, label=value_label, color = line_color, alpha=alpha_value)
    plt.title(title)
    plt.xticks(rotation=45)
    plt.xlabel(time_label)
    plt.ylabel(value_label)
    plt.legend(loc='upper left')
I would like the green point to come directly after the red one in the x-axis. Does anyone know how to do this?
Thanks
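One common way to get this effect, offered here only as a sketch and not as an answer from the thread, is to plot against integer positions instead of real timestamps and then use the timestamps as tick labels. The variable names are mine; the six data points are the ones from the table in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The six rows from the question's table.
times = pd.to_datetime(['2021-05-10 14:58', '2021-05-10 14:59', '2021-05-10 15:00',
                        '2021-05-11 09:00', '2021-05-11 09:01', '2021-05-11 09:02'])
values = [100.520, 100.500, 100.550, 100.490, 100.650, 100.480]

# Plot against 0, 1, 2, ... so consecutive samples are evenly spaced,
# regardless of how much clock time separates them.
positions = np.arange(len(times))

fig, ax = plt.subplots(figsize=(12.5, 7))
ax.plot(positions, values, color='blue', alpha=0.35)

# Label each position with its real timestamp.
ax.set_xticks(positions)
tick_labels = ax.set_xticklabels(times.strftime('%Y-%m-%d %H:%M').tolist(),
                                 rotation=45, ha='right')
```

With this layout the 15:00 point and the 9:00 point of the next day sit next to each other. For long series it is probably better to label only every n-th position so the ticks stay readable.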

Adding a shaded box to a plot in python

I am looking to add a shaded box to my plot below. I want the box to go from Aug 25-Aug 30 and to run the length of the Y axis.
The following is my code for the two plots I have made...
df = pd.read_excel('salinity_temp.xlsx')
dates = df['Date']
sal = df['Salinity']
temp = df['Temperature']
fig, axes = plt.subplots(2, 1, figsize=(8,8), sharex=True)
axes[0].plot(dates, sal, lw=5, color="red")
axes[0].set_ylabel('Salinity (PSU)')
axes[0].set_title('Salinity', fontsize=14)
axes[1].set_title('Temperature', fontsize=14)
axes[1].plot(dates, temp, lw=5, color="blue")
axes[1].set_ylabel('Temperature (C)')
axes[1].set_xlabel('Dates, 2017', fontsize=12)
axes[1].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[0].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[1].xaxis_date()
axes[0].xaxis_date()
I want the shaded box to highlight when Hurricane Harvey hit Houston, Texas (Aug 25- Aug 30). My data looks like:
Date Salinity Temperature
20-Aug 15.88144647 31.64707184
21-Aug 18.83088846 31.43848419
22-Aug 19.51015264 31.47655487
23-Aug 23.41655369 31.198349
24-Aug 25.16410124 30.63014984
25-Aug 25.2273574 28.8677597
26-Aug 28.35557667 27.49458313
27-Aug 18.52829235 25.92834473
28-Aug 7.423231661 24.06635284
29-Aug 0.520394177 23.47881317
30-Aug 0.238508327 23.90857697
31-Aug 0.143210364 24.30892944
1-Sep 0.206473387 25.20442963
2-Sep 0.241343182 26.32663727
3-Sep 0.58000503 26.93431854
4-Sep 1.182055098 27.8212738
5-Sep 3.632014919 28.23947906
6-Sep 4.672006985 27.29686737
7-Sep 5.938766377 26.8693161
8-Sep 9.107671159 26.48963928
9-Sep 8.180587303 26.05213165
10-Sep 6.200532091 25.73104858
11-Sep 5.144526191 25.60035706
12-Sep 5.106032451 25.73139191
13-Sep 4.279492562 26.06132507
14-Sep 5.255868992 26.74919128
15-Sep 8.026764063 27.23724365
I have tried using the rectangle function in this link (https://discuss.analyticsvidhya.com/t/how-to-add-a-patch-in-a-plot-in-python/5518) however can't seem to get it to work properly.
Independent of your specific data, it sounds like you need axvspan. Try running this after your plotting code:
for ax in axes:
    ax.axvspan('2017-08-25', '2017-08-30', color='black', alpha=0.5)
This will work if dates = df['Date'] is stored as type datetime64. It might not work with other datetime data types, and it won't work if dates contains date strings.
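If the Date column arrives as strings such as '25-Aug', one way around the caveat above is to convert it to datetimes first and pass Timestamps to axvspan. The following is a minimal sketch under that assumption: it uses a handful of rows from the question, appends the year 2017 so the strings parse fully, and the names `raw_dates`, `start`, `end`, and `spans` are mine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question; the year is appended so the strings parse fully.
raw_dates = ['24-Aug', '25-Aug', '28-Aug', '30-Aug', '1-Sep']
df = pd.DataFrame({
    'Date': pd.to_datetime([d + '-2017' for d in raw_dates], format='%d-%b-%Y'),
    'Salinity': [25.16410124, 25.2273574, 7.423231661, 0.238508327, 0.206473387],
    'Temperature': [30.63014984, 28.8677597, 24.06635284, 23.90857697, 25.20442963],
})

fig, axes = plt.subplots(2, 1, figsize=(8, 8), sharex=True)
axes[0].plot(df['Date'], df['Salinity'], lw=5, color='red')
axes[1].plot(df['Date'], df['Temperature'], lw=5, color='blue')

# Shade Aug 25 to Aug 30 on both panels, passing Timestamps rather than strings.
start, end = pd.Timestamp('2017-08-25'), pd.Timestamp('2017-08-30')
spans = [ax.axvspan(start, end, color='black', alpha=0.5) for ax in axes]
```

axvspan covers the full height of each axes by default, which matches the request for a box that runs the length of the y axis.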

How to plot different dataframe data in one figure?

I need some guidance to plot:
scatter plot of df1 data: time vs y use the hue for the column z
line plot df2 data: time vs. y
a single line at y=c (c is a constant)
y data in df1 and df2 are different but they are in the same range.
I do not know where to begin. Any guidance is appreciated.
More explanation. A portion of data is presented here. I want to plot:
scatter plot of time vs CO2
finding the yearly rolling average of CO2 (from 01/01/2016 to 09/30/2019 based on hourly data. So the first average will be from "01/01/2016 00" to "12/31/2016 23" and second average will be from "01/01/2016 01" to "01/01/2017 00") (like the trend in plot below)
finding the maximum of all the data and drawing a line across the plot at that value (like the straight line below)
Sample data
data = {'Date':['0 01/14/2016 00', '01/14/2016 01','01/14/2016 02','01/14/2016 03','01/14/2016 04','01/14/2016 05','01/14/2016 06','01/14/2016 07','01/14/2016 08','01/14/2016 09','01/14/2016 10','01/14/2016 11','01/14/2016 12','01/14/2016 13','01/14/2016 14','01/14/2016 15','01/14/2016 16','01/14/2016 17','01/14/2016 18','01/14/2016 19'],
'CO2':[2415.9,2416.5,2429.8,2421.5,2422.2,2428.3,2389.1,2343.2,2444.,2424.8,2429.6,2414.7,2434.9,2420.6,2420.5,2397.1,2415.6,2417.4,2373.2,2367.9],
'Year':[2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016]}
# Create DataFrame
df = pd.DataFrame(data)
# DataFrame view
Date CO2 Year
0 01/14/2016 00 2415.9 2016
01/14/2016 01 2416.5 2016
01/14/2016 02 2429.8 2016
01/14/2016 03 2421.5 2016
01/14/2016 04 2422.2 2016
using matplotlib.pyplot:
plt.hlines to add a horizontal line at a constant
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# with synthetic data
np.random.seed(365)
data = {'CO2': [np.random.randint(2000, 2500) for _ in range(783)],
'Date': pd.bdate_range(start='1/1/2016', end='1/1/2019').tolist()}
# create the dataframe:
df = pd.DataFrame(data)
# verify Date is in datetime format
df['Date'] = pd.to_datetime(df['Date'])
# set Date as index so .rolling can be used
df.set_index('Date', inplace=True)
# add rolling mean
df['rolling'] = df['CO2'].rolling('365D').mean()
# plot the data
plt.figure(figsize=(8, 8))
plt.scatter(x=df.index, y='CO2', data=df, label='data')
plt.plot(df.index, 'rolling', data=df, color='black', label='365 day rolling mean')
plt.hlines(max(df['CO2']), xmin=min(df.index), xmax=max(df.index), color='red', linestyles='dashed', label='Max')
plt.hlines(np.mean(df['CO2']), xmin=min(df.index), xmax=max(df.index), color='green', linestyles='dashed', label='Mean')
plt.xticks(rotation=45)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
Plot using synthetic data:
Issues with the Date format in the data from the op:
Use a regular expression to fix the Date column
Place the code to fix Date, just before df['Date'] = pd.to_datetime(df['Date'])
import re
# your data
Date CO2 Year
0 01/14/2016 00 2415.9 2016
01/14/2016 01 2416.5 2016
01/14/2016 02 2429.8 2016
01/14/2016 03 2421.5 2016
01/14/2016 04 2422.2 2016
df['Date'] = df['Date'].apply(lambda x: (re.findall(r'\d{2}/\d{2}/\d{4}', x)[0]))
# fixed Date column
Date CO2 Year
01/14/2016 2415.9 2016
01/14/2016 2416.5 2016
01/14/2016 2429.8 2016
01/14/2016 2421.5 2016
01/14/2016 2422.2 2016
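Note that the regular expression above keeps only the date and drops the hour. If the hourly resolution mentioned in the question matters, an alternative sketch is to keep the trailing "MM/DD/YYYY HH" part and parse it with an explicit format. The strings below are the first entries of the question's sample data; the names `raw`, `cleaned`, and `parsed` are mine.

```python
import pandas as pd

# First entries of the question's Date column, including the stray "0 " prefix.
raw = pd.Series(['0 01/14/2016 00', '01/14/2016 01', '01/14/2016 02'])

# Keep only the trailing "MM/DD/YYYY HH" part, discarding anything before it.
cleaned = raw.str.extract(r'(\d{2}/\d{2}/\d{4} \d{2})$', expand=False)

# Parse with an explicit format so the hour is preserved.
parsed = pd.to_datetime(cleaned, format='%m/%d/%Y %H')
```

The resulting datetimes can be set as the index in the same way as in the plotting code, so that a time-based rolling window operates on hourly timestamps.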
You can use a dual-axis chart. Because both y-axes cover the same range, it should look the same as a single-axis plot. You can plot directly from the pandas DataFrames:
import matplotlib.pyplot as plt
import pandas as pd
# create a color map for the z column
color_map = {'z_val1':'red', 'z_val2':'blue', 'z_val3':'green', 'z_val4':'yellow'}
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx() #second axis within the first
# define scatter plot
df1.plot.scatter(x = 'date',
                 y = 'CO2',
                 ax = ax1,
                 c = df1['z'].apply(lambda x:color_map[x]))
# define line plot
df2.plot.line(x = 'date',
              y = 'MA_CO2', #moving average in dataframe 2
              ax = ax2)
# plot the horizontal line at y = c (constant value)
ax1.axhline(y = c, color='r', linestyle='-')
# to fit the chart properly
plt.tight_layout()
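Since the two y series share a range, a single axes may be enough. Here is a self-contained sketch of that variant using plain matplotlib calls. The frames `df1` and `df2`, their values, and the constant `c` are synthetic stand-ins; the column names follow the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for df1 (points with a category z) and df2 (a smoothed line).
dates = pd.date_range('2016-01-01', periods=30, freq='D')
rng = np.random.default_rng(2)
df1 = pd.DataFrame({'date': dates,
                    'CO2': rng.integers(2300, 2500, size=len(dates)),
                    'z': np.resize(['z_val1', 'z_val2', 'z_val3'], len(dates))})
df2 = pd.DataFrame({'date': dates,
                    'MA_CO2': df1['CO2'].rolling(7, min_periods=1).mean()})
c = 2450  # constant level for the horizontal line (arbitrary here)

color_map = {'z_val1': 'red', 'z_val2': 'blue', 'z_val3': 'green'}

fig, ax = plt.subplots()
# Scatter coloured by category, line for the moving average, horizontal line at c.
ax.scatter(df1['date'], df1['CO2'], c=df1['z'].map(color_map).tolist(), label='CO2')
ax.plot(df2['date'], df2['MA_CO2'], color='black', label='moving average')
ax.axhline(y=c, color='r', linestyle='--', label='y = c')
ax.legend()
```

Keeping everything on one axes avoids having to keep two y-axis limits in sync, which a twin axis would otherwise require.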

Python Pandas Stacked Bar Chart x-axis labels

I've got the below dataframe:
Months Region Open Case ID Closed Case ID
April APAC 648888 648888
April US 157790
April UK 221456 221456
April APAC 425700
April US 634156 634156
April UK 109445
April APAC 442459 442459
May US 218526
May UK 317079 317079
May APAC 458098
May US 726342 726342
May UK 354155
May APAC 463582 463582
May US 511059
June UK 97186 97186
June APAC 681548
June US 799169 799169
June UK 210129
June APAC 935887 935887
June US 518106
June UK 69279 69279
and I am getting the counts of the Open Case ID and Closed Case ID with:
df = df.groupby(['Months','Region']).count()
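To see what this grouping produces, here is a small sketch on the first five rows of the table. It assumes that a row with a single ID is an open case with no closed counterpart; that reading of the flattened table is my assumption, and the `sample` and `counts` names are made up.

```python
import numpy as np
import pandas as pd

# First five rows of the table; a lone ID is assumed to be an open case
# with no matching closed case (this reading of the table is an assumption).
sample = pd.DataFrame({
    'Months': ['April'] * 5,
    'Region': ['APAC', 'US', 'UK', 'APAC', 'US'],
    'Open Case ID': [648888, 157790, 221456, 425700, 634156],
    'Closed Case ID': [648888, np.nan, 221456, np.nan, 634156],
})

# count() tallies the non-missing entries per column within each group.
counts = sample.groupby(['Months', 'Region']).count()
```

After the groupby, Months and Region form the index of `counts` rather than ordinary columns, so only 'Open Case ID' and 'Closed Case ID' remain available as columns to plot.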
I am trying to replicate the below chart generated by Excel, which looks like this:
and I am getting the below with:
df[['Months','Region']].plot.bar(stacked=True, rot=0, alpha=0.5, legend=False)
Is there a way to get the chart generated by python closer to the chart generated by Excel in terms of how the x-axis and its labels are broken down?
There is a great solution to a similar question about designing multi-index labels (linked in the code comment below). You can use the same plot parameters with ax=fig.gca() in that solution, i.e.
import matplotlib.pyplot as plt
# add_line,label_len,label_group_bar_table from https://stackoverflow.com/a/39502106/4800652
fig = plt.figure()
ax = fig.add_subplot(111)
#Your df.plot code with ax parameter here
df.plot.bar(stacked=True, rot=0, alpha=0.5, legend=False, ax=fig.gca())
labels = ['' for item in ax.get_xticklabels()]
ax.set_xticklabels(labels)
ax.set_xlabel('')
label_group_bar_table(ax, df)
fig.subplots_adjust(bottom=.1*df.index.nlevels)
plt.show()
Output based on sample data:
