I am trying to distinguish weekends from weekdays by either 1) shading the region, 2) coloring the points differently, or 3) marking the x-axis labels differently for weekends.
Here I am trying the 2nd option: coloring the weekend data points differently. I first created an additional column (IsWeekend) to distinguish weekends from weekdays. However, the data is not drawn as a single line; plotly draws two separate lines with different colors. I would like a single line, with a different color for the values that fall on weekends.
Here’s my code for reproducible data:
import numpy as np
import pandas as pd
from datetime import datetime
import plotly.express as px

np.random.seed(42)
rng = pd.date_range('2022-04-10', periods=21, freq='D')
practice_df = pd.DataFrame({'Date': rng, 'Val': np.random.randn(len(rng))})
practice_df = practice_df.set_index('Date')

# flag Saturday (5) and Sunday (6) as weekend
weekend_list = []
for i in range(len(practice_df)):
    if practice_df.index[i].weekday() > 4:
        weekend_list.append(True)
    else:
        weekend_list.append(False)
practice_df['IsWeekend'] = weekend_list

# color='IsWeekend' splits the data into two traces, hence the two lines
fig = px.line(practice_df,
              x=practice_df.index, y='Val',
              color='IsWeekend',
              markers=True)
fig.show()
What I want to do would look something like this but coloring data points/line for weekends differently.
Edit:
Thanks so much to @Derek_O, I was able to color the weekends in my original dataset. But I would also like the Friday-Saturday line to be colored as weekend, so I set practice_df.index[i].weekday() >= 4 instead of practice_df.index[i].weekday() > 4.
Would it be possible to keep the Friday point the same color as the weekdays?
Also, is it possible to connect the points with straight lines rather than stairs?
Otherwise, it would also work to shade the weekend region, as in the image at the bottom.
Borrowing from @Rob Raymond's answer here, we can loop through practice_df two elements at a time, adding one trace to the fig per iteration, so that each segment gets its own color.
We also only want to show each legend category the first time it occurs (so that True and False each appear once in the legend), which is why I've created a new column called "showlegend" that determines whether the legend entry is shown for a given trace.
import numpy as np
import pandas as pd
from datetime import datetime
import plotly.express as px
import plotly.graph_objects as go
np.random.seed(42)
rng = pd.date_range('2022-04-10', periods=21, freq='D')
practice_df = pd.DataFrame({ 'Date': rng, 'Val' : np.random.randn(len(rng))})
practice_df = practice_df.set_index('Date')
weekend_list = []
for i in range(len(practice_df)):
    if practice_df.index[i].weekday() > 4:
        weekend_list.append(True)
    else:
        weekend_list.append(False)
practice_df['IsWeekend'] = weekend_list
weekend_color_map = {True:0, False:1}
weekend_name_map = {True:"True", False:"False"}
practice_df['color'] = practice_df['IsWeekend'].map(weekend_color_map)
practice_df['name'] = practice_df['IsWeekend'].map(weekend_name_map)
## use the color column since weekend corresponds to 0, nonweekend corresponds to 1
## argmin/argmax give the row position of the first weekend / first nonweekend entry
first_weekend_idx = practice_df['color'].to_numpy().argmin()
first_nonweekend_idx = practice_df['color'].to_numpy().argmax()
practice_df["showlegend"] = False
showlegendIdx = practice_df.columns.get_indexer(["showlegend"])[0]
practice_df.iat[first_weekend_idx, showlegendIdx] = True
practice_df.iat[first_nonweekend_idx, showlegendIdx] = True
practice_df["showlegend"] = practice_df["showlegend"].astype(object)
fig = go.Figure(
    [
        go.Scatter(
            # each trace covers point tn and the next point, i.e. one segment
            x=practice_df.index[tn : tn + 2],
            y=practice_df['Val'].iloc[tn : tn + 2],
            mode='lines+markers',
            # line_shape="hv",
            line_color=px.colors.qualitative.Plotly[practice_df['color'].iloc[tn]],
            name=practice_df['name'].iloc[tn],
            legendgroup=practice_df['name'].iloc[tn],
            showlegend=practice_df['showlegend'].iloc[tn],
        )
        for tn in range(len(practice_df))
    ]
)
fig.update_layout(legend_title_text='Is Weekend')
fig.show()
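For the shading option mentioned in the question, a minimal sketch (not part of the answer above) could first compute the weekend intervals with pandas. The helper name `weekend_spans` is hypothetical, and the weekend flag is computed without a loop via `DatetimeIndex.dayofweek`:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2022-04-10", periods=21, freq="D")
values = np.random.default_rng(42).standard_normal(len(rng))
practice_df = pd.DataFrame({"Val": values}, index=rng)

# Vectorised replacement for the weekday() loop: Monday=0 ... Sunday=6
practice_df["IsWeekend"] = practice_df.index.dayofweek >= 5


def weekend_spans(index):
    """Return (start, end) pairs covering each run of consecutive weekend days."""
    days = pd.DatetimeIndex(index).normalize().unique()
    weekend_days = days[days.dayofweek >= 5]
    spans = []
    for day in weekend_days:
        end = day + pd.Timedelta(days=1)
        if spans and spans[-1][1] == day:
            spans[-1] = (spans[-1][0], end)  # extend the current run
        else:
            spans.append((day, end))
    return spans


spans = weekend_spans(practice_df.index)
```

Each pair could then be passed to plotly, for example `fig.add_vrect(x0=start, x1=end, fillcolor="lightgrey", opacity=0.3, line_width=0)`, on a figure that keeps a single line for all points. Since the daily points sit at midnight, shifting the spans by half a day may look better.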
Thank you in advance for the assistance!
I am trying to create a heat map from time-series data and the data begins mid year, which is causing the top of my heat map to be shifted to the left and not match up with the rest of the plot (Shown Below). How would I go about shifting the just the top line over so that the visualization of the data syncs up with the rest of the plot?
(Code Provided Below)
import pandas as pd
import matplotlib.pyplot as plt
# link to the data
url1 = 'https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/minotDailyAirTemp.csv'
# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
df1 = pd.read_csv(url1, parse_dates=['Date'], index_col=['Date'])
# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
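The problem is the dates in the dataset. If you look at the data, the first record is
`1990-4-24,15.533`
Each row of the wide DataFrame is filled from column 0, so the 1990 row starts at 24 April while every other row starts at 1 January. To solve this, it is necessary to add the missing days between 1990-01-01 and 1990-04-23 (as NaN) and to delete every 29 February so that all years have the same length.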
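import numpy as np

# pad the start of 1990 with NaN so that every year begins on 1 January
rng = pd.date_range(start='1990-01-01', end='1990-04-23', freq='D')
df = pd.DataFrame(index=rng)
df.index = pd.to_datetime(df.index)
df['Temp'] = np.nan
frames = [df, df1]
result = pd.concat(frames)
# drop 29 February so that every year has 365 columns
result = result[~((result.index.month == 2) & (result.index.day == 29))]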
With this data
dfg1 = result.groupby(result.index.year).agg({'Temp': list})
df1_wide = pd.DataFrame(dfg1['Temp'].tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
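As an alternative to padding by hand, the wide table can be built by pivoting on the year and a month-day label, so each value lands in its calendar column and missing days become NaN automatically. This is only a sketch: the synthetic `Temp` values and the column names `year` and `month_day` are assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the temperature file, also starting on 1990-04-24
idx = pd.date_range("1990-04-24", "1992-12-31", freq="D")
df1 = pd.DataFrame({"Temp": np.arange(len(idx), dtype=float)}, index=idx)

# Drop 29 February so leap years line up with the other years
df1 = df1[~((df1.index.month == 2) & (df1.index.day == 29))]

# Label every row with its year and its "MM-DD" position within the year
labelled = df1.assign(year=df1.index.year, month_day=df1.index.strftime("%m-%d"))

# One row per year, one column per calendar day; absent days are NaN
wide = labelled.pivot(index="year", columns="month_day", values="Temp")
```

`wide` can be passed to `ax1.matshow(...)` in the same way as `df1_wide`.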
The unfilled portions are a consequence of the NaN values in your dataset. One option is to replace the NaN values with the column mean or with the row mean.
Other ways of replacing the NaN values are available as well. For example, with the column mean:
df1_wide = df1_wide.apply(lambda x: x.fillna(x.mean()),axis=0)
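To compare the filling options mentioned above, here is a small sketch on a toy frame (the values are made up). It shows the column mean, the row mean, and linear interpolation along each row as one further possibility:

```python
import numpy as np
import pandas as pd

# Toy wide frame: rows are years, columns are days
wide = pd.DataFrame([[np.nan, 2.0, 4.0],
                     [1.0, np.nan, 3.0]], index=[1990, 1991])

# Column mean: same result as the apply(..., axis=0) line above
col_filled = wide.fillna(wide.mean())

# Row mean: fill each year's gaps with that year's average
row_filled = wide.apply(lambda row: row.fillna(row.mean()), axis=1)

# Linear interpolation along each row, filling in both directions
interpolated = wide.interpolate(axis=1, limit_direction="both")
```

Which one is appropriate depends on the data. For a long leading gap such as January to April 1990, leaving the NaN values in place may be the most honest choice.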
I want to plot machine observation data separately for each day,
so that changes in Current, Temperature, etc. can be seen by hour.
Basically I want one plot for each day. The thing is, when I make too many of these, Jupyter Notebook can't display all of them and plotly gives an error.
f_day --> first day
n_day --> next day
I thought of using subplots with a shared y-axis, but then I don't know how to put different dates on each x-axis.
How can I do this with graph objects and subplots, using only one figure object so that the plots don't crash?
Data looks like this
,ID,IOT_ID,DATE,Voltage,Current,Temperature,Noise,Humidity,Vibration,Open,Close
0,9466,5d36edfe125b874a36c6a210,2020-08-06 09:02:00,228.893,4.17,39.9817,73.1167,33.3133,2.05,T,F
1,9467,5d36edfe125b874a36c6a210,2020-08-06 09:03:00,228.168,4.13167,40.0317,69.65,33.265,2.03333,T,F
2,9468,5d36edfe125b874a36c6a210,2020-08-06 09:04:00,228.535,4.13,40.11,71.7,33.1717,2.08333,T,F
3,9469,5d36edfe125b874a36c6a210,2020-08-06 09:05:00,228.597,4.14,40.1683,71.95,33.0417,2.0666700000000002,T,F
4,9470,5d36edfe125b874a36c6a210,2020-08-06 09:06:00,228.405,4.13333,40.2317,71.2167,32.9933,2.0,T,F
Code with display error is this
f_day = pd.Timestamp('2020-08-06 00:00:00')
for day in range(days_between.days):
    n_day = f_day + pd.Timedelta('1 days')
    fig_df = df[(df["DATE"] >= f_day) & (df["DATE"] <= n_day) & (df["IOT_ID"] == iot_id)]
    fig_cn = px.scatter(
        fig_df, x="DATE", y="Current", color="Noise", color_continuous_scale= "Sunset",
        title= ("IoT " + iot_id + " " + str(f_day.date())),
        range_color= (min_noise,max_noise)
    )
    f_day = n_day
    fig_cn.show()
updated
The question was about plotly, not matplotlib. The same approach works. Clearly the axes and titles need some beautification.
import pandas as pd
import plotly.subplots
import plotly.express as px
import datetime as dt
import random
df = pd.DataFrame([{"DATE":d, "IOT_ID":random.randint(1,5), "Noise":random.uniform(0,1), "Current":random.uniform(15,25)}
                   for d in pd.date_range(dt.datetime(2020,9,1), dt.datetime(2020,9,4,23,59), freq="15min")])
# get days to plot
days = df["DATE"].dt.floor("D").unique()
# create axis for each day
fig = plotly.subplots.make_subplots(len(days))
iot_id=3
for i,d in enumerate(days):
    # filter data and plot ....
    mask = (df["DATE"].dt.floor("D")==d)&(df["IOT_ID"]==iot_id)
    splt = px.scatter(df.loc[mask], x="DATE", y="Current", color="Noise", color_continuous_scale= "Sunset",
                      title= f"IoT ({iot_id}) Date:{pd.to_datetime(d).strftime('%d %b')}")
    # select_traces() returns a generator so turn it into a list and take first one
    fig.add_trace(list(splt.select_traces())[0], row=i+1, col=1)
fig.show()
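A possible simplification of the filtering step, as a sketch: group the frame by day once instead of rebuilding a mask on every loop iteration. The data below is a made-up stand-in, and `per_day` and `titles` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Stand-in data: four days of readings every 15 minutes
dates = pd.date_range("2020-09-01", "2020-09-04 23:45", freq="15min")
df = pd.DataFrame({"DATE": dates, "Current": np.linspace(15, 25, len(dates))})

# One sub-frame per calendar day, keyed by the day's midnight timestamp
per_day = {day: frame for day, frame in df.groupby(df["DATE"].dt.floor("D"))}

# One title per subplot, in the same order as the sub-frames
titles = [f"Date: {day:%d %b}" for day in per_day]
```

`titles` could then be passed as `subplot_titles=titles` to `make_subplots(rows=len(per_day), cols=1, ...)`, since the `title=` given to `px.scatter` is not carried over when only the trace is added to the combined figure.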
It's simple: create the axes that you want to plot on first, then plot. I've simulated your data, as you didn't provide any in your question.
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import random
df = pd.DataFrame([{"DATE":d, "IOT_ID":random.randint(1,5), "Noise":random.uniform(0,1), "Current":random.uniform(15,25)}
                   for d in pd.date_range(dt.datetime(2020,9,1), dt.datetime(2020,9,4,23,59), freq="15min")])
# get days to plot
days = df["DATE"].dt.floor("D").unique()
# create axis for each day
fig, ax = plt.subplots(len(days), figsize=[20,10],
                       sharey=True, sharex=False, gridspec_kw={"hspace":0.4})
iot_id=3
for i,d in enumerate(days):
    # filter data and plot ....
    df.loc[(df["DATE"].dt.floor("D")==d)&(df["IOT_ID"]==iot_id),].plot(kind="scatter", ax=ax[i], x="DATE", y="Current", c="Noise",
        colormap= "turbo", title=f"IoT ({iot_id}) Date:{pd.to_datetime(d).strftime('%d %b')}")
    ax[i].set_xlabel("")  # it's in the titles...
output
I am plotting some time-series with pandas dataframe and I ran into the problem of gaps on weekends. What can I do to remove gaps in the time-series plot?
date_concat = pd.to_datetime(pd.Series(df.index),infer_datetime_format=True)
pca_factors.index = date_concat
pca_colnames = ['Outright', 'Curve', 'Convexity']
pca_factors.columns = pca_colnames
fig,axes = plt.subplots(2)
pca_factors.Curve.plot(ax=axes[0]); axes[0].set_title('Curve')
pca_factors.Convexity.plot(ax=axes[1]); axes[1].set_title('Convexity'); plt.axhline(linewidth=2, color = 'g')
fig.tight_layout()
fig.savefig('convexity.png')
Partial plot below:
Ideally, I would like the time-series to only show the weekdays and ignore weekends.
To make MaxU's suggestion more explicit:
convert to datetime as you have done, but drop the weekends
reset the index and plot the data via this default Int64Index
change the x tick labels
Code:
date_concat = date_concat[date_concat.dt.weekday < 5] # drop weekends (date_concat is a Series, hence .dt)
pca_factors = pca_factors.reset_index() # from MaxU's comment
pca_factors['Convexity'].plot() # ^^^
plt.xticks(pca_factors.index, date_concat) # change the x tick labels
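The steps above can be sketched end to end as follows. The business-day data is a made-up stand-in for `pca_factors`, and labelling every fifth tick (`step`) is an arbitrary choice to keep the axis readable:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical weekday-only data standing in for pca_factors
dates = pd.bdate_range("2022-04-01", periods=30)
factors = pd.DataFrame({"Convexity": np.linspace(-1, 1, len(dates))}, index=dates)

# Plot against evenly spaced integer positions, so weekends leave no gap
positions = np.arange(len(factors))
step = 5  # label every fifth trading day

fig, ax = plt.subplots()
ax.plot(positions, factors["Convexity"].to_numpy())
ax.set_xticks(positions[::step])
ax.set_xticklabels(list(factors.index[::step].strftime("%Y-%m-%d")),
                   rotation=45, ha="right")
ax.set_title("Convexity")
fig.tight_layout()
```

The trade-off is that the x-axis is no longer a true time axis: distances between ticks count trading days, not calendar days.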
I'm using pandas and wondering why the date column isn't showing up as actual dates (type = pandas.tslib.Timestamp) on the plot, but as numbers instead.
Take this reproducible example:
import datetime
import numpy as np
import pandas as pd

todays_date = datetime.datetime.now().date()
columns = ['month','A','B','C','D']
_dates = pd.DataFrame(pd.date_range(todays_date-datetime.timedelta(10), periods=150, freq='M'))
_randomdata = pd.DataFrame(np.random.randn(150, 4))
data = pd.concat([_dates, _randomdata], axis=1)
data.plot(figsize = (10,6))
As you can see, the x-axis is showing up as numbers, not dates.
2 questions:
a) How do I change it so that the actual dates are showing up on the x-axis?
b) How do I change the frequency of the ticks and tick labels on the x-axis if I want more/fewer months showing up?
Thanks guys!
Just use the date_range as an index to the DataFrame:
import datetime
import numpy as np
import pandas as pd

todays_date = datetime.datetime.now().date()
columns = ['A','B','C','D']
# with a DatetimeIndex, plot() puts the dates on the x-axis
data = pd.DataFrame(data=np.random.randn(150, 4),
                    index=pd.date_range(todays_date-datetime.timedelta(10), periods=150, freq='M'),
                    columns=columns)
data.plot(figsize = (10,6))
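For part b), the tick frequency, one possible approach is to hand tick handling to matplotlib's date locators. This is a sketch under assumptions: it uses a fixed start date and month-start frequency (`"MS"`) instead of the dates above, and the two-year spacing is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = pd.date_range("2015-01-01", periods=150, freq="MS")
data = pd.DataFrame(np.random.default_rng(0).standard_normal((150, 4)),
                    index=index, columns=["A", "B", "C", "D"])

# x_compat=True asks pandas to leave date tick handling to matplotlib
ax = data.plot(figsize=(10, 6), x_compat=True)

# Major tick every second year, labelled with the year only
ax.xaxis.set_major_locator(mdates.YearLocator(2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# Minor ticks every January and July
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=(1, 7)))
```

Changing the `YearLocator` argument, or swapping in `mdates.MonthLocator(interval=...)` as the major locator, gives more or fewer labels.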