I have a Pandas data frame, and I want to explore the periodicity, trend, etc. of the time series. Here is the data.
To visualize it, I want to superpose the "sub time series" for each year on the same plot (i.e. have the same x coordinate for data from 01/01/2000, 01/01/2001 and 01/01/2002).
Do I have to transform my date column so that each data has the same year?
Does anyone have an idea of how to do that?
Setup
This parses the data that you linked
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(
    'data.csv', sep=';', decimal=',',
    usecols=['date', 'speed', 'height', 'width'],
    index_col=0, parse_dates=[0]
)
My Hack
I stripped the year from the dates and replaced it with 2012, because 2012 is a leap year and will accommodate Feb-29. I split the original year into another level of a multi-index, unstack and plot.
idx = pd.MultiIndex.from_arrays([
pd.to_datetime(df.index.strftime('2012-%m-%d %H:%M:%S')),
df.index.year
])
ax = df.set_index(idx).unstack().speed.plot()
lg = ax.legend(bbox_to_anchor=(1.05, 1), loc=2, ncol=2)
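A minimal sketch of the same re-stamping trick on synthetic data, in case the linked file is not at hand. The three-year daily `speed` column and the level names `date` and `year` are assumptions made for illustration; they are not from the original data.

```python
import numpy as np
import pandas as pd

# Synthetic daily data covering three years (2000 is a leap year).
dates = pd.date_range("2000-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({"speed": np.arange(len(dates), dtype=float)}, index=dates)

# Re-stamp every observation into the leap year 2012 so that Feb-29 stays valid.
common = pd.to_datetime(df.index.strftime("2012-%m-%d"))
idx = pd.MultiIndex.from_arrays([common, df.index.year], names=["date", "year"])

# One row per calendar day, one column per original year.
wide = df.set_index(idx)["speed"].unstack("year")
print(wide.head())
```

Calling `wide.plot()` would then draw one line per year on a shared January-to-December axis. Years without a Feb-29 simply hold NaN on that row.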
In an effort to pretty this up
fig, axes = plt.subplots(3, 1, figsize=(15, 9))
idx = pd.MultiIndex.from_arrays([
pd.to_datetime(df.index.strftime('2012-%m-%d %H:%M:%S')),
df.index.year
])
d1 = df.set_index(idx).unstack().resample('W').mean()
d1.speed.plot(ax=axes[0], title='speed')
lg = axes[0].legend(bbox_to_anchor=(1.02, 1), loc=2, ncol=1)
d1.height.plot(ax=axes[1], title='height', legend=False)
d1.width.plot(ax=axes[2], title='width', legend=False)
fig.tight_layout()
One way you could do it is to create a common x-axis for all years like this:
df['yeartime']=df.groupby(df.date.dt.year).cumcount()
where 'yeartime' represents the number of time measures in a year. Next, create a year column:
df['year'] = df.date.dt.year
Now, let's subset our data for the Jan 1st of years 2000, 2001, and 2002
subset_df = df.loc[df.date.dt.year.isin([2000, 2001, 2002]) & (df.date.dt.day == 1) & (df.date.dt.month == 1)]
And lastly, plot it.
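import seaborn as sns
# x and y are passed as keywords; recent seaborn versions no longer accept them positionally
ax = sns.pointplot(x='yeartime', y='speed', hue='year', data=subset_df, markers='None')
_ = ax.get_xaxis().set_ticks([])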
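To make the `cumcount` step concrete, here is a small sketch on made-up data. The five dates and `speed` values are assumptions chosen only to straddle a year boundary.

```python
import pandas as pd

# Five consecutive daily measurements that cross from 2000 into 2001.
df = pd.DataFrame({
    "date": pd.date_range("2000-12-30", periods=5, freq="D"),
    "speed": [1.0, 2.0, 3.0, 4.0, 5.0],
})

df["year"] = df.date.dt.year
# cumcount numbers the rows within each year, restarting at 0 every year.
df["yeartime"] = df.groupby("year").cumcount()
print(df)
```

Note that `cumcount` counts rows, not calendar positions, so the years only line up if each one starts at the same time and has no missing measurements.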
Related
I am trying to plot a simple pandas Series object. It looks something like this:
2018-01-01 10
2018-01-02 90
2018-01-03 79
...
2020-01-01 9
2020-01-02 72
2020-01-03 65
It includes only the first month of each year, so it contains nothing but January and its daily values.
When I try to plot it
# suppose the name of the series is dates_and_values
dates_and_values.plot()
It returns a plot like this (made using my current data)
It is clearly plotting by year and then by month, so it looks squished and small, since I don't have any months other than January. Is there a way to plot it by year and day, so that the plot makes the days easier to see?
The x-axis is the index of the dataframe.
Dates are a continuous series, so the x-axis is continuous, and that is what squishes your graph.
Change the index to strings and it is no longer continuous.
I have generated some sample data that only has January to demonstrate.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# custom "business day" calendar that treats every day outside January as a holiday
cf = pd.tseries.offsets.CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu Fri Sat",
    holidays=[d for d in pd.date_range("01-jan-1990", periods=365*50, freq="D")
              if d.month != 1])
d = pd.date_range("01-jan-2015", periods=200, freq=cf)
df = pd.DataFrame({"Values": np.random.randint(20, 70, len(d))}, index=d)
fig, ax = plt.subplots(2, figsize=[14,6])
df.set_index(df.index.strftime("%Y %d")).plot(ax=ax[0])
df.plot(ax=ax[1])
I suggest that you convert the series to a dataframe and then pivot it to get one column for each year. This lets you plot the data for each year with a separate line, either in the same plot using different colors or in subplots. Here is an example:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample series
rng = np.random.default_rng(seed=123) # random number generator
dt = pd.date_range('2018-01-01', '2020-01-31', freq='D')
dt_jan = dt[dt.month == 1]
series = pd.Series(rng.integers(20, 90, size=dt_jan.size), index=dt_jan)
# Convert series to dataframe and pivot it
df_raw = series.to_frame()
df_pivot = df_raw.pivot_table(index=df_raw.index.day, columns=df_raw.index.year)
df = df_pivot.droplevel(axis=1, level=0)
df.head()
# Plot all years together in different colors
ax = df.plot(figsize=(10,4))
ax.set_xlim(1, 31)
ax.legend(frameon=False, bbox_to_anchor=(1, 0.65))
ax.set_xlabel('January', labelpad=10, size=12)
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
# Plot years separately
axs = df.plot(subplots=True, color='tab:blue', sharey=True,
figsize=(10,8), legend=None)
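for ax in axs:
    ax.set_xlim(1, 31)
    ax.grid(axis='x', alpha=0.3)
    handles, labels = ax.get_legend_handles_labels()
    ax.text(28.75, 80, *labels, size=14)
    # Axes.is_last_row() was deprecated in matplotlib 3.4 and later removed;
    # the subplot spec offers the same check
    if ax.get_subplotspec().is_last_row():
        ax.set_xlabel('January', labelpad=10, size=12)
        ax.figure.subplots_adjust(hspace=0)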
Thank you in advance! (Image provided below)
I am trying to have the Y-Axis of my heatmap reflect the year associated with the data it is pulling. What is happening is that the Y-Axis is merely counting the number of years (0, 1, 2, ....30) when it should be appearing as 1990, 1995, 2000, etc.
How do I update my code (provided below) so that the Y-Axis shows the actual year instead of the year count?
# links to Minot data if you want to pull from the web
##url2 = 'https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/allStationsDailyAirTemp1.csv'
raw_data = pd.read_csv('https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/allStationsDailyAirTemp1.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()
selected_station = 'Minot'
# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
df1 = df_all_stations[df_all_stations['Station'] == selected_station]
# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
# adding the missing days between 1990/01/01 and 1990/04/23 and deleting the 29th of Feb
rng = pd.date_range(start='1990-01-01', end='1990-04-23', freq='D')
df = pd.DataFrame(index= rng)
df.index = pd.to_datetime(df.index)
df['Temp'] = np.nan  # np.NaN was removed in NumPy 2.0
frames = [df, df1]
result = pd.concat(frames)
result = result[~((result.index.month == 2) & (result.index.day == 29))]
dfg1 = result.groupby(result.index.year).agg({'Temp': list})
df1_wide = pd.DataFrame(dfg1['Temp'].tolist(), index=dfg1.index)
# Setting all leftover empty fields to the average of that time in order to fill in the gaps
df1_wide = df1_wide.apply(lambda x: x.fillna(x.mean()),axis=0)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
##ax1.set_title('Average Daily Air Temperature - Minot Station')
ax1.set_xlabel('Day of the year')
ax1.set_ylabel('Years since start of data collection')
# Setting the title so that it changes based off of the selected station
ax1.set_title('Average Air Temp for ' + str(selected_station))
# Creating Colorbar
cbm = ax1.matshow(df1_wide, interpolation=None, aspect='auto');
# Plotting the colorbar
cb = plt.colorbar(cbm, ax=ax1)
cb.set_label('Temp in Celsius')
Add this line at the end of your code:
ax1.set_yticklabels(['']+df1_wide.index.tolist()[::5])
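The one-liner above appears to rely on matplotlib's default tick positions happening to fall on every fifth row. A hedged alternative sketch pins the tick positions explicitly before labelling them. Here `wide` is a hypothetical stand-in for `df1_wide`, filled with random numbers, and the 1990 to 2020 range is assumed from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1_wide: one row per year, one column per day.
years = np.arange(1990, 2021)
values = np.random.default_rng(0).normal(size=(len(years), 365))
wide = pd.DataFrame(values, index=years)

fig, ax = plt.subplots(figsize=(10, 4))
im = ax.matshow(wide, aspect="auto")

# Put a tick on every fifth row, then label each tick with that row's year.
positions = np.arange(0, len(wide), 5)
ax.set_yticks(positions)
ax.set_yticklabels([str(y) for y in wide.index[positions]])
```

Because positions and labels are derived from the same index, they stay in sync if the first year or the step changes.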
file_path is an excel file with a column 'Year' of year numbers ranging from 1940 to 2018 and another column 'Divide Year 1976' indicating Pre-1976 or 1976-Present.
# Load excel file as a pandas data_frame
data = pd.read_excel(file_path, sheet_name=5, skiprows=1)
data_frame = pd.DataFrame(data)
# create an extra column in data_frame with bin from 1930 to 2020 with 10 years interval
data_frame['bin Year'] = pd.cut(data_frame.Year, bins=np.arange(1930, 2030, 10, dtype=int))
# Plot stacked bar plot
color_table = pd.crosstab(index=data_frame['bin Year'], columns=data_frame['Divide Year 1976'])
color_table.plot(kind='bar', figsize=(6.5, 3.5), stacked=True, legend=None, edgecolor='black')
# Add xticks
plt.xticks(locs, ['1930s','1940s','1950s','1960s','1970s','1980s','1990s','2000s','2010s'], fontsize=8, rotation=45)
The problem here is that color_table.plot() automatically ignores any interval that has 0 counts, which in my case is 1940-1950. How can I force the code to display bars for intervals that have zero counts?
Use parameter dropna in crosstab.
color_table = pd.crosstab(
index=data_frame['bin Year'],
columns=data_frame['Divide Year 1976'],
dropna=False)
See the docs
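A small sketch of the effect, using made-up data in which no year falls into the (1940, 1950] bin. The three years and the shortened bin range are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: nothing falls between 1940 and 1950.
data_frame = pd.DataFrame({
    "Year": [1935, 1955, 1985],
    "Divide Year 1976": ["Pre-1976", "Pre-1976", "1976-Present"],
})
data_frame["bin Year"] = pd.cut(data_frame.Year, bins=np.arange(1930, 2000, 10))

# dropna=False keeps the categories (bins) that contain no rows.
table = pd.crosstab(
    index=data_frame["bin Year"],
    columns=data_frame["Divide Year 1976"],
    dropna=False,
)
print(table)
```

The empty bins show up as rows of zeros, so the bar plot reserves a slot for them.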
I need to create an hourly-mean, multi-plot heatmap of temperature, as in:
for several years. The data to plot are read from an Excel sheet, which is formatted as "year", "month", "day", "hour", "Temp".
I created a monthly mean heatmap with the seaborn library, using this code:
df = pd.read_excel('D:\\Users\\CO2_heatmap.xlsx')
co2=df.pivot_table(index="month",columns="year",values='CO2',aggfunc="mean")
# raw string so that \m in the mathtext label is not read as an escape sequence
ax = sns.heatmap(co2,cmap='bwr',vmin=370,vmax=430, cbar_kws={'label': r'$\mathregular{CO_2}$ [ppm]', 'orientation': 'vertical'})
Obtaining this graph:
How can I generate a
co2=df.pivot_table(index="hour",columns="day",values='CO2',aggfunc="mean")
for each month and for each year?
Seaborn's heatmap did not let me draw graphs on several axes in a single call, so I built the figure by drawing one seaborn heatmap per subplot. It is not as customizable as the reference graph. Sorry I could not help more.
import pandas as pd
import numpy as np
date_rng = pd.date_range('2018-01-01', '2019-12-31', freq='1h')
# one random integer "measurement" per hourly timestamp
temp = np.random.randint(-30, 40, size=len(date_rng))
df = pd.DataFrame({'CO2': temp}, index=date_rng)
df.insert(1, 'year', df.index.year)
df.insert(2, 'month', df.index.month)
df.insert(3, 'day', df.index.day)
df.insert(4, 'hour', df.index.hour)
df = df.copy()
yyyy = df['year'].unique()
month = df['month'].unique()
import matplotlib.pyplot as plt
import seaborn as sns
fig, axes = plt.subplots(figsize=(20,10), nrows=2, ncols=12)
for m, ax in zip(range(1, 25), axes.flat):
    # panels 1-12 belong to the first year, panels 13-24 to the second
    if m <= 12:
        y = yyyy[0]
    else:
        y = yyyy[1]
        m -= 12
    df1 = df[(df['year'] == y) & (df['month'] == m)]
    df1 = df1.pivot_table(index="hour", columns="day", values='CO2', aggfunc="mean")
    # draw straight onto the subplot; no extra plt.figure() call is needed
    sns.heatmap(df1, cmap='RdBu', cbar=False, ax=ax)
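As a possible variation, the hour-by-day tables for every year and month can be built in one pass with `groupby`. This is only a sketch: the random `CO2` values and the `pivots` dictionary name are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Two full years of hourly timestamps with random stand-in values.
date_rng = pd.date_range("2018-01-01", "2019-12-31 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"CO2": np.random.default_rng(0).integers(370, 430, size=len(date_rng))},
    index=date_rng,
)
df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day
df["hour"] = df.index.hour

# One hour-by-day table per (year, month) pair.
pivots = {
    (year, month): grp.pivot_table(index="hour", columns="day", values="CO2", aggfunc="mean")
    for (year, month), grp in df.groupby(["year", "month"])
}
print(pivots[(2018, 2)].shape)
```

Each entry of `pivots` can then be handed to `sns.heatmap(..., ax=ax)` for the matching subplot.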
This might help- /hourly-heatmap-graph-using-python-s-ggplot2-implementation-plotnine
There's also a guide to producing this exact plot (for two years of data) on the
Python graph gallery-heatmap-for-timeseries-matplotlib
I'm afraid I don't know any Python, so didn't want to copy/paste in case I missed anything. I did, however, create the original plot in R :) The main trick was to use facet_grid to split the data by year and month, and reverse the y axis labels.
It looks like
fig, axes = plt.subplots(2, 12, figsize=(14, 10), sharey=True)
for i, year in enumerate([2004, 2005]):
    for j, month in enumerate(range(1, 13)):
        single_plot(data, month, year, axes[i, j])
does the work of splitting by year and month.
I hope this helps you get further forward
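For readers without the guide at hand, here is a rough guess at what a helper like `single_plot` could look like. This is a hypothetical sketch, not the gallery's actual code: the column names `year`, `month`, `day`, `hour`, `temp` and the random data are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def single_plot(data, month, year, ax):
    """Hypothetical helper: draw one hour-by-day heatmap for a single month."""
    sel = data[(data["year"] == year) & (data["month"] == month)]
    grid = sel.pivot_table(index="hour", columns="day", values="temp", aggfunc="mean")
    # imshow puts row 0 (hour 0) at the top, giving the reversed y axis.
    ax.imshow(grid.to_numpy(), aspect="auto", cmap="magma")
    ax.set_title(f"{year}-{month:02d}", fontsize=8)

# Synthetic hourly data for 2004 and 2005 (assumed layout).
stamps = pd.date_range("2004-01-01", "2005-12-31 23:00", freq=pd.Timedelta(hours=1))
data = pd.DataFrame({
    "year": stamps.year, "month": stamps.month,
    "day": stamps.day, "hour": stamps.hour,
    "temp": np.random.default_rng(0).normal(10, 5, size=len(stamps)),
})

fig, axes = plt.subplots(2, 12, figsize=(14, 10), sharey=True)
for i, year in enumerate([2004, 2005]):
    for j, month in enumerate(range(1, 13)):
        single_plot(data, month, year, axes[i, j])
```

Each panel gets its own x range, so months with 28 to 31 days are all drawn at full width.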
I have a time series that I would like to plot year on year. I want the data to be daily, but the axis to show each month as "Jan", "Feb" etc.
At the moment I can get the daily data, BUT the axis is 1-366 (the day of the year).
Or I can get the monthly axis as 1, 2, 3 etc (by changing the index to df.index.month), BUT then the data is monthly.
How can I convert the day of year axis into months? Or how can I do this?
Code showing the daily data, but the axis is wrong:
# import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create fake time series dataframe
index = pd.date_range(start='01-Jan-2012', end='31-12-2018', freq='D')
data = np.random.randn(len(index))
df = pd.DataFrame(data, index, columns=['Data'])
# pivot to get by day in rows, then year in columns
df_pivot = pd.pivot_table(df, index=df.index.dayofyear, columns=df.index.year, values='Data')
df_pivot.plot()
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
This can be done using the xticks function. Simply add the following code before plt.show():
plt.xticks(np.linspace(0,365,13)[:-1], ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'))
Or the following to have the month names appear in the middle of the month:
plt.xticks(np.linspace(15,380,13)[:-1], ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'))
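The evenly spaced ticks are approximate, since months differ in length. A sketch of one way to place the ticks exactly on the first day of each month follows. Taking the day numbers from the leap year 2012 is an assumption, made so that all 366 rows fit; in non-leap years the labels after February are one day off.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same fake data and pivot as in the question.
index = pd.date_range("2012-01-01", "2018-12-31", freq="D")
df = pd.DataFrame({"Data": np.random.default_rng(0).standard_normal(len(index))}, index=index)
df_pivot = pd.pivot_table(df, index=df.index.dayofyear, columns=df.index.year, values="Data")

# Day-of-year on which each month starts, taken from a leap year.
month_starts = pd.date_range("2012-01-01", periods=12, freq="MS").dayofyear
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

ax = df_pivot.plot()
ax.set_xticks(list(month_starts))
ax.set_xticklabels(month_names)
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
```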
It may be more straightforward to simply add a datetime index to your pivoted dataframe.
df_pivot.index = pd.date_range(
df.index.max() - pd.Timedelta(days=df_pivot.shape[0]),
freq='D', periods=df_pivot.shape[0])
df_pivot.plot()
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
The resulting plot has the axis as desired:
This method also has the advantage over the accepted answer of working irrespective of your start and end date. For example, if you change your index's end date to end='30-Jun-2018', the axis adapts nicely to fit the data: