Matplotlib Heat-Map Y Axis - python

Thank you in advance!
I am trying to have the Y-axis of my heatmap show the year associated with each row of data. Instead, the Y-axis just counts the rows (0, 1, 2, ..., 30), when it should show 1990, 1995, 2000, etc.
How do I update my code (provided below) so that the Y-Axis shows the actual year instead of the year count?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# links to Minot data if you want to pull from the web
##url2 = 'https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/allStationsDailyAirTemp1.csv'
# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
raw_data = pd.read_csv('https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/allStationsDailyAirTemp1.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()
selected_station = 'Minot'
# keep only the rows for the selected station
df1 = df_all_stations[df_all_stations['Station'] == selected_station]
# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
# pad the missing days 1990-01-01 to 1990-04-23 with NaN, then drop every Feb 29
rng = pd.date_range(start='1990-01-01', end='1990-04-23', freq='D')
df = pd.DataFrame(index=rng)
df.index = pd.to_datetime(df.index)
df['Temp'] = np.nan
frames = [df, df1]
result = pd.concat(frames)
result = result[~((result.index.month == 2) & (result.index.day == 29))]
dfg1 = result.groupby(result.index.year).agg({'Temp': list})
df1_wide = pd.DataFrame(dfg1['Temp'].tolist(), index=dfg1.index)
# fill each remaining gap with the mean of its column (the same day of the year across all years)
df1_wide = df1_wide.apply(lambda x: x.fillna(x.mean()), axis=0)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
##ax1.set_title('Average Daily Air Temperature - Minot Station')
ax1.set_xlabel('Day of the year')
ax1.set_ylabel('Years since start of data collection')
# Setting the title so that it changes based off of the selected station
ax1.set_title('Average Air Temp for ' + str(selected_station))
# Draw the heat map (the returned image is reused for the colorbar)
cbm = ax1.matshow(df1_wide, interpolation=None, aspect='auto');
# Plotting the colorbar
cb = plt.colorbar(cbm, ax=ax1)
cb.set_label('Temp in Celsius')

Add this line at the end of your code:
ax1.set_yticklabels(['']+df1_wide.index.tolist()[::5])
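The one-liner above depends on where Matplotlib's default locator happens to put the y ticks (the leading '' pads a tick that lies outside the image), and recent Matplotlib versions warn when tick labels are set without fixed tick positions. A sketch of a more explicit variant sets the tick positions first and then labels them from the index. The random `df1_wide` stand-in (years 1990 to 2020 as the index, 365 day columns) and the `step` variable are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for df1_wide: one row per year, one column per day of the year
years = range(1990, 2021)
df1_wide = pd.DataFrame(np.random.default_rng(0).normal(size=(len(years), 365)), index=years)

fig, ax1 = plt.subplots(figsize=(20, 5))
cbm = ax1.matshow(df1_wide, aspect='auto')
fig.colorbar(cbm, ax=ax1)

# Row i of the image is drawn at y = i, so put a tick on every 5th row
# and label it with the year stored in the index at that row
step = 5
positions = np.arange(0, len(df1_wide), step)
ax1.set_yticks(positions)
ax1.set_yticklabels([str(year) for year in df1_wide.index[positions]])
ax1.set_ylabel('Year')
```

Because the labels are taken from `df1_wide.index`, they stay correct if the data starts in a different year; changing `step` changes how many years are labelled.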

Related

Combining multiple figures in one window

I'm playing around with a Kaggle dataset to practice using Matplotlib.
I was creating bar graphs one by one, but they keep adding up.
When I called plt.show(), about 10 figure windows suddenly showed up.
Is it possible to combine 4 of those figures into 1 window?
These four plots all belong to the same "Time Analysis" section, so I want to combine them in one window.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dataset = 'accidents_data.csv'
df = pd.read_csv(dataset)
"""Time Analysis :
Analyze the time that accidents happen for various patterns and trends"""
df.Start_Time = pd.to_datetime(df.Start_Time) #convert the start time column to date time format
df['Hour_of_Accident'] = df.Start_Time.dt.hour #extract the hour from the time data
hour_accident = df['Hour_of_Accident'].value_counts()
hour_accident_df = hour_accident.to_frame() #convert the series data to dataframe in order to sort the index columns
hour_accident_df.index.names = ['Hours'] #naming the index column
hour_accident_df.sort_index(ascending=True, inplace=True)
print(hour_accident_df)
# Plotting the hour of accidents data in a bargraph
hour_accident_df.plot(kind='bar',figsize=(8,4),color='blue',title='Hour of Accident')
#plt.show() #Show the bar graph
"""Analyzing the accident frequency per day of the week"""
df['Day_of_the_week'] = df.Start_Time.dt.day_of_week
day_of_accident = df['Day_of_the_week'].value_counts()
day_of_accident_df = day_of_accident.to_frame() #convert the series data to dataframe so that we can sort the index columns
day_of_accident_df.index.names = ['Day'] # Renaming the index column
day_of_accident_df.sort_index(ascending=True, inplace=True)
print(day_of_accident_df)
f, ax = plt.subplots(figsize = (8, 5))
x = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
l = day_of_accident_df.index.values
y = day_of_accident_df.iloc[:, 0]  # the counts column (named 'count' in pandas >= 2.0)
plt.bar(l, y, color='green')
plt.title('Day of the week vs total number of accidents')
plt.ylabel("No. of accidents recorded")
ax.set_xticks(l)
ax.set_xticklabels(x)
#plt.show()
"""Analysis for the months"""
df['Month'] = df.Start_Time.dt.month
accident_month = df['Month'].value_counts()
accident_month_df = accident_month.to_frame() #convert the series data to dataframe so that we can sort the index columns
accident_month_df.index.names = ['Month'] # Renaming the index column
accident_month_df.sort_index(ascending=True, inplace=True)
print(accident_month_df)
#Plotting the Bar Graph
accident_month_df.plot(kind='bar',figsize=(8,5),color='purple',title='Month of Accident')
"""Yearly Analysis"""
df['Year_of_accident'] = df.Start_Time.dt.year
#Check the yearly trend
yearly_count = df['Year_of_accident'].value_counts()
yearly_count_df = pd.DataFrame({'Year':yearly_count.index, 'Accidents':yearly_count.values})
yearly_count_df.sort_values(by='Year', ascending=True, inplace=True)
print(yearly_count_df)
#Creating line plot
yearly_count_df.plot.line(x='Year',color='red',title='Yearly Accident Trend ')
plt.show()
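A minimal sketch of one common way to do this: each `DataFrame.plot(...)` call without an `ax=` argument, and each `plt.subplots(...)` call, opens a new figure. Creating a single 2x2 grid with `plt.subplots(2, 2)` once and passing one Axes to every plotting call via `ax=` keeps all four charts in one window. Only the `Start_Time` column comes from the question; the timestamps here are randomly generated stand-ins for the Kaggle file.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the accidents data: 1,000 random start times over four years
rng = np.random.default_rng(0)
start = pd.Timestamp('2016-01-01')
offsets = pd.to_timedelta(rng.integers(0, 4 * 365 * 24 * 60, size=1000), unit='min')
df = pd.DataFrame({'Start_Time': start + offsets})

# One figure with four Axes arranged 2x2; every plot below draws into one of them
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

hours = df.Start_Time.dt.hour.value_counts().sort_index()
hours.plot(kind='bar', ax=axes[0, 0], color='blue', title='Hour of Accident')

days = df.Start_Time.dt.dayofweek.value_counts().sort_index()
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
axes[0, 1].bar(days.index, days.values, color='green')
axes[0, 1].set_xticks(days.index)
axes[0, 1].set_xticklabels([day_names[d] for d in days.index])
axes[0, 1].set_title('Day of the week vs total number of accidents')

months = df.Start_Time.dt.month.value_counts().sort_index()
months.plot(kind='bar', ax=axes[1, 0], color='purple', title='Month of Accident')

years = df.Start_Time.dt.year.value_counts().sort_index()
years.plot(ax=axes[1, 1], color='red', title='Yearly Accident Trend')

fig.tight_layout()
plt.show()  # a single window containing all four charts
```

The same `ax=` argument works for the `DataFrame.plot` calls in the question, and `plt.bar(...)` can be replaced by `axes[i, j].bar(...)` so that nothing creates an extra figure.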

Matplotlib Time-Series Heatmap Visualization Row Modification

Thank you in advance for the assistance!
I am trying to create a heat map from time-series data, and the data begins mid-year, which causes the top row of my heat map to be shifted to the left so that it does not line up with the rest of the plot. How would I go about shifting just the top row over so that the visualization of the data lines up with the rest of the plot?
(Code Provided Below)
import pandas as pd
import matplotlib.pyplot as plt
# link to the data
url1 = 'https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/minotDailyAirTemp.csv'
# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
df1 = pd.read_csv(url1, parse_dates=['Date'], index_col=['Date'])
# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
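The problem is the dates in the dataset. If you look at the data, it starts on
`1990-4-24,15.533`
To solve this, it is necessary to add NaN rows for 1990-01-01 to 1990-04-23 (so that the first year also starts on January 1) and to delete every February 29 (so that leap years have 365 values like the other years, instead of pushing every later day one column to the right).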
import numpy as np
# NaN rows for the days missing at the start of the data
rng = pd.date_range(start='1990-01-01', end='1990-04-23', freq='D')
df = pd.DataFrame(index=rng)
df.index = pd.to_datetime(df.index)
df['Temp'] = np.nan
frames = [df, df1]
result = pd.concat(frames)
# drop Feb 29 so that every year has the same 365 columns
result = result[~((result.index.month == 2) & (result.index.day == 29))]
Then rebuild the wide DataFrame from the padded data and plot it again:
dfg1 = result.groupby(result.index.year).agg({'Temp': list})
df1_wide = pd.DataFrame(dfg1['Temp'].tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
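To see what the padding and the Feb 29 filter do, here is a sketch on a synthetic daily series that, like the Minot file, starts on 1990-04-24. The values are just day counters standing in for temperatures, and the 1993 end date is arbitrary. Before padding, the first year has only 252 values, so they fill columns 0 to 251 and the row looks shifted left; after padding, every year has 365 values and April 24, 1990 lands in column 113, the same column as April 24 of any other year.

```python
import numpy as np
import pandas as pd

# Synthetic daily series starting mid-year, like the Minot data (values are placeholders)
dates = pd.date_range('1990-04-24', '1993-12-31', freq='D')
df1 = pd.DataFrame({'Temp': np.arange(len(dates), dtype=float)}, index=dates)

# Number of values per year before padding: the first year is short
short = df1.groupby(df1.index.year)['Temp'].count()

# Pad 1990-01-01 .. 1990-04-23 with NaN and drop every February 29
pad = pd.DataFrame({'Temp': np.nan}, index=pd.date_range('1990-01-01', '1990-04-23', freq='D'))
result = pd.concat([pad, df1])
result = result[~((result.index.month == 2) & (result.index.day == 29))]

# One row per year, one column per day of the year
dfg1 = result.groupby(result.index.year).agg({'Temp': list})
df1_wide = pd.DataFrame(dfg1['Temp'].tolist(), index=dfg1.index)

print(short.tolist())   # [252, 365, 366, 365]
print(df1_wide.shape)   # (4, 365)
```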
The unfilled portions are a consequence of the NaN values in your dataset. You can replace the NaN values with the column mean or with the row mean, and other ways of replacing NaN values are available as well. The line below fills each NaN with the mean of its column, that is, the same day of the year averaged over the other years:
df1_wide = df1_wide.apply(lambda x: x.fillna(x.mean()),axis=0)
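A toy example (made-up numbers, years as rows and days as columns) of the two options mentioned above: `axis=0` fills each gap with its column mean, `axis=1` with its row mean. For the column case, `wide.fillna(wide.mean())` gives the same result without `apply`.

```python
import numpy as np
import pandas as pd

# Rows are years, columns are days of the year (toy values)
wide = pd.DataFrame([[1.0, np.nan, 3.0],
                     [5.0, 6.0, np.nan],
                     [np.nan, 10.0, 11.0]],
                    index=[1990, 1991, 1992])

by_column = wide.apply(lambda x: x.fillna(x.mean()), axis=0)  # same day, other years
by_row = wide.apply(lambda x: x.fillna(x.mean()), axis=1)     # same year, other days
vectorized = wide.fillna(wide.mean())                          # same values as by_column

print(by_column)
print(by_row)
```

For a heat map of seasonal data, the column mean is usually the more natural choice, because the row mean mixes winter and summer values from the same year.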

How to plot from multiple datasets with same fieldnames?

I have a few monthly datasets of usage stats stored in different CSVs, with a couple hundred fields each. I keep only the top 30 of each one, but the entries near the bottom of that top 30 change from month to month (and the top changes too as stuff is banned, albeit less commonly). Currently the lines represent months, but I want the points to be (y = usage %) and (x = month), with the legend showing the different users.
column[0] is their number in the file (1-30)
column[1] is their name
column[2] is the usage percent
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
AprilStats = pd.read_csv(r'filepath', nrows=30)
MayStats = pd.read_csv(r'filepath', nrows=30)
JuneStats = pd.read_csv(r'filepath', nrows=30)
## Assign labels and sources
labels = [[AprilStats.columns[1]], [MayStats.columns[1]], [JuneStats.columns[1]]]
AprilUsage=np.array(AprilStats[AprilStats.columns[2]].tolist())
MayUsage=np.array(MayStats[MayStats.columns[2]].tolist())
JuneUsage=np.array(JuneStats[JuneStats.columns[2]].tolist())
x = np.array(AprilStats[AprilStats.columns[0]].tolist())
y = np.array(AprilStats[AprilStats.columns[2]].tolist())
my_xticks = AprilStats[AprilStats.columns[1]].tolist()
plt.xticks(x, my_xticks, rotation=55)
x1 = np.array(MayStats[MayStats.columns[0]].tolist())
y1 = np.array(MayStats[MayStats.columns[2]].tolist())
my_xticks1 = MayStats[MayStats.columns[1]].tolist()
plt.xticks(x, my_xticks1, rotation=55)
x2 = np.array(JuneStats[JuneStats.columns[0]].tolist())
y2 = np.array(JuneStats[JuneStats.columns[2]].tolist())
my_xticks2 = JuneStats[JuneStats.columns[1]].tolist()
plt.xticks(x, my_xticks2, rotation=55)
### Plot the data
plt.rc('xtick', labelsize='xx-small')
plt.title('Little Cup Usage')
plt.ylabel('Usage (Percent)')
plt.plot(x,y,label='April', color='green', alpha=.4)
plt.plot(x1,y1,label='May', color='blue', alpha=.4)
plt.plot(x2,y2,label='June', color='red', alpha=.4)
plt.subplots_adjust(bottom=.2)
plt.legend()
plt.savefig('90daytest.png', dpi=500)
plt.show()
I think I am mislabeling them, but the month of usage isn't stored in the file. I reckon I could add it, but I'd rather not have to go in and edit these files every month. Also, sorry if this is horribly inefficient coding; I started learning Python less than two weeks ago, and this is a little project for me to learn with.
I'd divide this into two steps:
Gather all the data into a single dataframe in which the rows correspond to the different months, the columns to the different names and the values are the usage %.
Plot each column as a different series in a scatter plot.
Step 1:
import datetime as dt
import pandas as pd
# Create a dictionary associating a file to each month
files = {dt.date(2019, 4, 1): 'april.csv',
         dt.date(2019, 5, 1): 'may.csv'}
# Collect one single-row data frame per file, then concatenate them into df
frames = []
''' For each file, generate a one entry data frame as follows, and add it to frames.
Month       name1  name2 ...
2019-04-01  0.5    0.2
'''
for month, file in files.items():
    # assumes each CSV has 'name' and 'usage' column headers
    data = pd.read_csv(file, usecols=['name', 'usage'], index_col='name')
    data = data.transpose()
    data['month'] = month
    data = data.set_index('month')
    frames.append(data)
df = pd.concat(frames)
Step 2:
import matplotlib.pyplot as plt
# New figure
fig = plt.figure()
# Plot one series for each column in df
for name in df.columns:
    plt.scatter(x=df.index, y=df[name], label=name)
# Additional plot formatting code here
plt.legend()  # one legend entry per name
plt.show()
I hope that helps.
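An alternative way to build the same month-by-name table, swapped in here rather than taken from the answer above: read every file into one long table with a `month` column, then reshape it with `DataFrame.pivot`. As in the answer, the `name` and `usage` column headers are assumptions, and the file contents below are hypothetical strings read through `io.StringIO` so the sketch runs without real files.

```python
import datetime as dt
import io

import pandas as pd

# Hypothetical file contents; in practice pass the file paths instead
files = {
    dt.date(2019, 4, 1): io.StringIO("rank,name,usage\n1,alice,30.5\n2,bob,20.1\n"),
    dt.date(2019, 5, 1): io.StringIO("rank,name,usage\n1,bob,25.0\n2,carol,18.2\n"),
}

# Long format: one row per (month, name) pair, top 30 rows of each file
frames = []
for month, file in files.items():
    data = pd.read_csv(file, usecols=['name', 'usage'], nrows=30)
    data['month'] = month
    frames.append(data)
long_df = pd.concat(frames, ignore_index=True)

# Wide format: rows are months, columns are names, values are usage %;
# a name missing from a month becomes NaN and is simply not drawn by scatter
df = long_df.pivot(index='month', columns='name', values='usage')
print(df)
```

The resulting `df` has the same shape as the one built in Step 1, so the Step 2 loop can plot it unchanged.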

Superposing Pandas time series from different years in Seaborn plot

I have a Pandas data frame, and I want to explore the periodicity, trend, etc of the time series. Here is the data.
To visualize it, I want to superpose the "sub time series" for each year on the same plot (i.e., have the same x coordinate for data from 01/01/2000, 01/01/2001 and 01/01/2002).
Do I have to transform my date column so that each data has the same year?
Does anyone have an idea of how to do that?
Setup
This parses the data that you linked
import pandas as pd
df = pd.read_csv(
'data.csv', sep=';', decimal=',',
usecols=['date', 'speed', 'height', 'width'],
index_col=0, parse_dates=[0]
)
My Hack
I replaced the year in every date with 2012, because 2012 is a leap year and will accommodate Feb 29. I put the original year into another level of a MultiIndex, then unstack and plot:
idx = pd.MultiIndex.from_arrays([
pd.to_datetime(df.index.strftime('2012-%m-%d %H:%M:%S')),
df.index.year
])
ax = df.set_index(idx).unstack().speed.plot()
lg = ax.legend(bbox_to_anchor=(1.05, 1), loc=2, ncol=2)
In an effort to pretty this up, resample to weekly means and give each variable its own subplot:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(3, 1, figsize=(15, 9))
idx = pd.MultiIndex.from_arrays([
pd.to_datetime(df.index.strftime('2012-%m-%d %H:%M:%S')),
df.index.year
])
d1 = df.set_index(idx).unstack().resample('W').mean()
d1.speed.plot(ax=axes[0], title='speed')
lg = axes[0].legend(bbox_to_anchor=(1.02, 1), loc=2, ncol=1)
d1.height.plot(ax=axes[1], title='height', legend=False)
d1.width.plot(ax=axes[2], title='width', legend=False)
fig.tight_layout()
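A self-contained sketch of the same trick on synthetic data (random values at daily resolution for 2000 to 2002, not the linked dataset, so the time-of-day part of the format string is dropped). Because 2012 is a leap year, the shared axis has 366 days, and Feb 29 is simply NaN for the non-leap years 2001 and 2002.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily data for three years (stand-in for the linked dataset)
dates = pd.date_range('2000-01-01', '2002-12-31', freq='D')
speed = np.random.default_rng(0).normal(size=len(dates))

# Level 0: the date moved into the leap year 2012; level 1: the original year
idx = pd.MultiIndex.from_arrays([
    pd.to_datetime(dates.strftime('2012-%m-%d')),
    dates.year,
])

# Unstacking the year level gives one column per year on a shared 2012 calendar
by_year = pd.Series(speed, index=idx).unstack()
ax = by_year.plot(figsize=(12, 4))
ax.legend(title='Year')
```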
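One way you could do it is to create a common x-axis for all years like this (this assumes `date` is a regular column; with the Setup above it is the index, so run `df = df.reset_index()` first):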
df['yeartime']=df.groupby(df.date.dt.year).cumcount()
where 'yeartime' is the position of each measurement within its year (0 for the first measurement of the year, 1 for the next, and so on). Next, create a year column:
df['year'] = df.date.dt.year
Now, subset the data to January 1 of the years 2000, 2001, and 2002:
subset_df = df.loc[df.date.dt.year.isin([2000, 2001, 2002]) & (df.date.dt.day == 1) & (df.date.dt.month == 1)]
And lastly, plot it.
import seaborn as sns
ax = sns.pointplot(x='yeartime', y='speed', hue='year', data=subset_df, markers='None')
ax.set_xticks([])
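A tiny made-up example of what `cumcount` produces, with `date` as a regular column. The final `pivot` (an addition, not part of the answer) lines the years up as columns on `yeartime`. Aligning by count only matches the same calendar instant across years if every year has the same measurement schedule; a missing reading shifts the rest of that year.

```python
import pandas as pd

# Tiny made-up frame: three measurements per year, 'date' as a regular column
df = pd.DataFrame({
    'date': pd.to_datetime(['2000-01-01 00:00', '2000-01-01 12:00', '2000-01-02 00:00',
                            '2001-01-01 00:00', '2001-01-01 12:00', '2001-01-02 00:00']),
    'speed': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Position of each row within its year: 0, 1, 2, ... restarting every year
df['yeartime'] = df.groupby(df.date.dt.year).cumcount()
df['year'] = df.date.dt.year

# One column per year, aligned on yeartime
aligned = df.pivot(index='yeartime', columns='year', values='speed')
print(aligned)
```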

Getting Pandas datetime column to display as Dates, not Numbers, on Matplotlib X-axis

I'm using pandas and wondering why the date column isn't showing up on the x-axis as actual dates (type = pandas.tslib.Timestamp), but as numbers.
Take this replicable example:
import datetime
import numpy as np
import pandas as pd
todays_date = datetime.datetime.now().date()
columns = ['month','A','B','C','D']
_dates = pd.DataFrame(pd.date_range(todays_date-datetime.timedelta(10), periods=150, freq='M'))
_randomdata = pd.DataFrame(np.random.randn(150, 4))
data = pd.concat([_dates, _randomdata], axis=1)
data.plot(figsize = (10,6))
As you can see, the x-axis is showing up as numbers, not dates.
2 questions:
a) How do I change it so that the actual dates are showing up on the x-axis?
b) How do I change the frequency of the ticks and tick labels on the x-axis if I want more/fewer months showing up?
Thanks guys!
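Just use the date_range as the index of the DataFrame. DataFrame.plot puts the index on the x-axis, so the dates then show up there instead of the default row numbers: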
import datetime
import numpy as np
import pandas as pd
todays_date = datetime.datetime.now().date()
columns = ['A','B','C','D']
data = pd.DataFrame(data=np.random.randn(150, 4),
index=pd.date_range(todays_date-datetime.timedelta(10), periods=150, freq='M'),
columns=columns)
data.plot(figsize = (10,6))
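The answer covers part a). For part b), one option is to plot with Matplotlib directly and set a date locator and formatter from `matplotlib.dates`: `MonthLocator(interval=6)` puts a major tick every six months, and changing `interval` gives more or fewer ticks. The sketch plots through `ax.plot` rather than `DataFrame.plot` because pandas may draw a regular-frequency index on its own period-based axis, which does not combine reliably with these locators; passing `x_compat=True` to `DataFrame.plot` is the pandas-side way around that. `freq='MS'` (month start) is used here instead of `'M'`, which recent pandas versions have renamed to `'ME'`.

```python
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

todays_date = datetime.datetime.now().date()
data = pd.DataFrame(np.random.randn(150, 4),
                    index=pd.date_range(todays_date - datetime.timedelta(10), periods=150, freq='MS'),
                    columns=['A', 'B', 'C', 'D'])

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(data.index, data.values)   # one line per column
ax.legend(list(data.columns))

# Major tick every 6 months, labelled as year-month; change interval for more/fewer ticks
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
fig.autofmt_xdate()                # rotate the labels so they don't overlap
```

Swapping `MonthLocator(interval=6)` for `mdates.YearLocator()` gives one tick per year instead.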
