Plotly, Python. Can I plot a vertical line based on a datetime object? - python

I am forecasting Covid cases and using Plotly for visualization. I would like to plot a straight vertical line at the point where the forecast starts. I have a chart like this:
(chart)
I just want to plot a vertical line at the date 25 Jan 2021, so that it is visible where the forecast starts.

Because you didn't share your data, I worked out the answer using a sample code snippet from Plotly:
import plotly.express as px
df = px.data.stocks()
fig = px.line(df, x='date', y="GOOG")
fig.add_vline(x='2019-01-25')
fig.show()
The only line I added to the sample is the add_vline call before fig.show(). For your data it would be:
fig.add_vline(x='2021-01-25')
If my date format differs from yours, you can check yours by printing the data you pass to the graph:
print(df)
date ...
0 2018-01-01 ...
1 2018-01-08 ...
2 2018-01-15 ...
3 2018-01-22 ...
4 2018-01-29 ...
... ...
If you need more info and examples check: https://plotly.com/python/horizontal-vertical-shapes/
My result chart

Related

Seaborn lineplot, show months when index of your data is Date

I have a Kaggle dataset (link).
I read the dataset, and I set the Date to be index column:
museum_data = pd.read_csv("museum_visitors.csv", index_col = "Date", parse_dates = True)
Then museum_data looks like this:
Date        Avila Adobe  Firehouse Museum  Chinese American Museum  America Tropical Interpretive Center
2014-01-01  24778        4486              1581                     6602
2014-02-01  18976        4172              1785                     5029
...         ...          ...               ...                      ...
2018-10-01  19280        4622              2364                     3775
2018-11-01  17163        4082              2385                     4562
Here is the code I use to plot the lineplot in seaborn:
plt.figure(figsize = (20,8))
sns.lineplot(data = museum_data)
plt.show()
And, this is what the result looks like:
What I want to know is how I can show several months per year on the x-axis (not all of them; for example, the first month of each season).
Thank you all in advance for your time.
You can use MonthLocator and perhaps ConciseDateFormatter to add minor ticks with a few months showing, something like the following:
import matplotlib.dates as mdates
...
fig, ax = plt.subplots(figsize = (20,8))
sns.lineplot(data = museum_data, ax=ax)
locator = mdates.MonthLocator(bymonth=[4,7,10])
ax.xaxis.set_minor_locator(locator)
ax.xaxis.set_minor_formatter(mdates.ConciseDateFormatter(locator))
Output:
Edit (closer): you can add the following to show January as well:
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
Output:
Edit 2 (there's probably a better way but I'm rusty):
length = plt.rcParams["xtick.minor.size"]
pad = plt.rcParams['xtick.minor.pad']
ax.tick_params('x', length=length, pad=pad)
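A simpler variant, if one style of tick is enough, is to put all four season-start months on a single major locator. This is only a sketch: the data below is made up (it is not the museum dataset), and treating January, April, July and October as the season starts is an assumption carried over from the answer above.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Illustrative monthly data: 36 months starting January 2014
dates = [dt.date(2014 + m // 12, m % 12 + 1, 1) for m in range(36)]
values = [100 + (m % 12) * 5 for m in range(36)]

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(dates, values)

# One major tick on the first day of Jan, Apr, Jul and Oct of every year
season_starts = mdates.MonthLocator(bymonth=[1, 4, 7, 10])
ax.xaxis.set_major_locator(season_starts)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))
```

The same two set_major_* calls should work on the Axes that sns.lineplot draws on, since seaborn plots onto a regular Matplotlib Axes.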

Plotting a pandas Series using dates and values too squished

I am trying to plot a simple pandas Series object. It looks something like this:
2018-01-01 10
2018-01-02 90
2018-01-03 79
...
2020-01-01 9
2020-01-02 72
2020-01-03 65
It includes only the first month of each year, so it contains only January and its daily values.
When I try to plot it
# suppose the name of the series is dates_and_values
dates_and_values.plot()
It returns a plot like this (made using my current data)
It is clearly plotting by year and then by month, so it looks pretty squished and small, since I don't have any months other than January. Is there a way to plot it by year and day, so that the plot makes the individual days easier to see?
The x-axis is the index of the dataframe.
Dates are a continuous series, so the x-axis is continuous and the empty months between Januaries take up space.
Changing the index to strings means the axis is no longer continuous, so those gaps are not drawn.
I have generated some sample data that only contains January to demonstrate:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Custom calendar that treats every day outside January as a holiday
cf = pd.tseries.offsets.CustomBusinessDay(
    weekmask="Sun Mon Tue Wed Thu Fri Sat",
    holidays=[d for d in pd.date_range("01-jan-1990", periods=365*50, freq="D")
              if d.month != 1])
d = pd.date_range("01-jan-2015", periods=200, freq=cf)
df = pd.DataFrame({"Values": np.random.randint(20, 70, len(d))}, index=d)
fig, ax = plt.subplots(2, figsize=[14, 6])
# Top: string index (no gaps); bottom: datetime index (continuous axis)
df.set_index(df.index.strftime("%Y %d")).plot(ax=ax[0])
df.plot(ax=ax[1])
I suggest that you convert the series to a dataframe and then pivot it to get one column for each year. This lets you plot the data for each year with a separate line, either in the same plot using different colors or in subplots. Here is an example:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample series
rng = np.random.default_rng(seed=123) # random number generator
dt = pd.date_range('2018-01-01', '2020-01-31', freq='D')
dt_jan = dt[dt.month == 1]
series = pd.Series(rng.integers(20, 90, size=dt_jan.size), index=dt_jan)
# Convert series to dataframe and pivot it
df_raw = series.to_frame()
df_pivot = df_raw.pivot_table(index=df_raw.index.day, columns=df_raw.index.year)
df = df_pivot.droplevel(axis=1, level=0)
df.head()
# Plot all years together in different colors
ax = df.plot(figsize=(10,4))
ax.set_xlim(1, 31)
ax.legend(frameon=False, bbox_to_anchor=(1, 0.65))
ax.set_xlabel('January', labelpad=10, size=12)
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
# Plot years separately
axs = df.plot(subplots=True, color='tab:blue', sharey=True,
              figsize=(10,8), legend=None)
for ax in axs:
    ax.set_xlim(1, 31)
    ax.grid(axis='x', alpha=0.3)
    handles, labels = ax.get_legend_handles_labels()
    ax.text(28.75, 80, *labels, size=14)
    # ax.is_last_row() is gone in newer Matplotlib versions; ask the subplotspec instead
    if ax.get_subplotspec().is_last_row():
        ax.set_xlabel('January', labelpad=10, size=12)
ax.figure.subplots_adjust(hspace=0)

How to make a bar_polar chart with plotly, without frequency column?

The first part of the code is from the Plotly website, and it shows what I would like to make with my own data.
import plotly.express as px
import pandas as pd
df2 = px.data.wind()  # assumption: the sample wind dataset from the Plotly docs example
fig = px.bar_polar(df2, r="frequency", theta="direction",
                   color="strength", template="plotly_dark",
                   color_discrete_sequence=px.colors.sequential.Plasma_r)
fig.show()
Desired graph:
This is my code so far:
import plotly.express as px
import pandas as pd
df = pd.read_csv (r"Lidar_final_corrected.csv")
print(df)
wind_dir Windspeed
0 227.35 2.12
1 233.41 1.65
2 227.75 1.52
3 217.75 1.71
4 204.64 2.21
... ... ...
3336 222.33 17.89
3337 221.52 17.21
3338 219.37 15.45
3339 217.23 16.09
3340 218.31 16.18
[3341 rows x 2 columns]
fig = px.bar_polar(df, theta="wind_dir",
                   color="wind_dir", template="plotly_dark",
                   color_discrete_sequence=px.colors.sequential.Plasma_r)
fig.show()
Current output:
If you look closely you can see that it is plotting something, but the colors are not right. I left out the frequency because I only have two columns to work with. Is there a workaround to make this plot work without the frequency?
(PS: the data is a time series, so a new entry is recorded every 10 minutes or every minute. I deleted the timestamp because I don't think I need it.)
You need to do a pandas groupby to count how often each combination occurs (the column names below match your printed dataframe):
grp = df.groupby(["wind_dir", "Windspeed"]).size().reset_index(name="frequency")
as explained in this answer.
Then you can use the same code as in the Plotly docs:
fig = px.bar_polar(grp, r="frequency", theta="wind_dir",
                   color="Windspeed", template="plotly_dark",
                   color_discrete_sequence=px.colors.sequential.Plasma_r)
fig.show()
Happy plotting!
(For better results you should bin the wind directions and wind speeds, or at least round them. Example of proper binning:
https://community.plotly.com/t/wind-rose-with-wind-speed-m-s-and-direction-deg-data-columns-need-help/33274/5)
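As a rough illustration of the binning suggested above, the sketch below groups directions into 16 compass sectors and speeds into fixed-width classes before counting. The function name bin_wind, the sector labels and the speed class edges are my own assumptions, not something taken from the linked thread:

```python
import numpy as np
import pandas as pd

# 16 compass sectors, each 22.5 degrees wide, with "N" centred on 0 degrees
SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def bin_wind(df, dir_col="wind_dir", speed_col="Windspeed"):
    """Count observations per (direction sector, speed class). Hypothetical helper."""
    width = 360 / len(SECTORS)
    # Shift by half a sector so that e.g. 359 and 1 degrees both land in "N"
    sector_idx = (((df[dir_col] + width / 2) % 360) // width).astype(int)
    direction = sector_idx.map(lambda i: SECTORS[i])
    # Speed classes are an arbitrary choice; adjust the edges to your data
    strength = pd.cut(df[speed_col],
                      bins=[0, 2, 4, 6, 8, 10, np.inf],
                      labels=["0-2", "2-4", "4-6", "6-8", "8-10", "10+"],
                      right=False)
    binned = pd.DataFrame({"direction": direction, "strength": strength})
    return (binned.groupby(["direction", "strength"], observed=True)
                  .size()
                  .reset_index(name="frequency"))


# Tiny illustrative sample (not the Lidar data)
sample = pd.DataFrame({"wind_dir": [0.0, 359.0, 90.0, 227.35],
                       "Windspeed": [1.0, 1.0, 3.0, 2.12]})
grp = bin_wind(sample)
```

The result has the same three columns as the Plotly sample data, so it can be passed to px.bar_polar with r="frequency", theta="direction" and color="strength". The sector order on the plot may still need to be set explicitly.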

Add months to xaxis and legend on a matplotlib line plot

I am trying to plot stacked yearly line graphs by months.
I have a dataframe df_year as below:
Day Number of Bicycle Hires
2010-07-30 6897
2010-07-31 5564
2010-08-01 4303
2010-08-02 6642
2010-08-03 7966
with the index set to the date, running from July 2010 to July 2017.
I want to plot one line per year, with the x-axis showing the months from Jan to Dec and only the monthly totals plotted.
I have achieved this by converting the dataframe to a pivot table as below:
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
This creates the pivot table below, which I can plot as shown in the attached figure:
Number of Bicycle Hires 2010 2011 2012 2013 2014
1 NaN 403178.0 494325.0 565589.0 493870.0
2 NaN 398292.0 481826.0 516588.0 522940.0
3 NaN 556155.0 818209.0 504611.0 757864.0
4 NaN 673639.0 649473.0 658230.0 805571.0
5 NaN 722072.0 926952.0 749934.0 890709.0
plot showing yearly data with months on xaxis
The only problem is that the months show up as integers, and I would like them to be shown as Jan, Feb, ..., Dec, with each line representing one year. I am also unable to add a legend for each year.
I have tried the following code to achieve this:
dims = (15,5)
fig, ax = plt.subplots(figsize=dims)
ax.plot(pt)
months = MonthLocator(range(1, 13), bymonthday=1, interval=1)
monthsFmt = DateFormatter("%b '%y")
ax.xaxis.set_major_locator(months) #adding this makes the month ints disapper
ax.xaxis.set_major_formatter(monthsFmt)
handles, labels = ax.get_legend_handles_labels() #legend is nowhere on the plot
ax.legend(handles, labels)
Can anyone please help me out with this? What am I doing incorrectly here?
Thanks!
There is nothing in your legend handles and labels because ax.plot(pt) was called without any labels. Furthermore, the DateFormatter does not return the right values because the x-values it is formatting are month integers, not datetime objects.
You could set the index specifically for the dates, then drop the extra MultiIndex column level created by the pivot (the '0'), and then use explicit tick labels for the months while setting where they need to occur on your x-axis, as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import datetime
# dummy data (Days)
dates_d = pd.date_range('2010-01-01', '2017-12-31', freq='D')
df_year = pd.DataFrame(np.random.randint(100, 200, (dates_d.shape[0], 1)), columns=['Data'])
df_year.index = dates_d #set index
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
pt.columns = pt.columns.droplevel() # remove the double header (0) as pivot creates a multiindex.
ax = plt.figure().add_subplot(111)
ax.plot(pt)
ticklabels = [datetime.date(1900, item, 1).strftime('%b') for item in pt.index]
ax.set_xticks(np.arange(1,13))
ax.set_xticklabels(ticklabels) #add monthlabels to the xaxis
ax.legend(pt.columns.tolist(), loc='center left', bbox_to_anchor=(1, .5)) #add the column names as legend.
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.show()
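Another possible variant is to put the month names and year labels on the pivot table itself, so that anything plotted from it inherits them. This is a sketch on made-up data: the column name hires is hypothetical, and passing values= to pivot_table is my choice so that the column header stays single-level.

```python
import calendar

import numpy as np
import pandas as pd

# Illustrative daily data standing in for df_year (one hire per day, made up)
days = pd.date_range("2010-07-30", "2012-07-31", freq="D")
df_year = pd.DataFrame({"hires": np.ones(len(days), dtype=int)}, index=days)

# Naming the value column keeps the resulting columns a plain list of years
pt = pd.pivot_table(df_year, values="hires",
                    index=df_year.index.month,
                    columns=df_year.index.year,
                    aggfunc="sum")

# Replace month numbers with abbreviated names and use the years as labels
pt.index = [calendar.month_abbr[m] for m in pt.index]
pt.columns = [str(year) for year in pt.columns]
```

Plotting with pt.plot() should then use the year strings as legend entries. Tick placement may still need ax.set_xticks(range(12)) to show every month.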

In Pandas, generate DateTime index from Multi-Index with years and weeks

I have a DataFrame dfp with columns saledate (DateTime, dtype <M8[ns]) and price (dtype int64), such that if I plot them like
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
I get a scatter plot which looks like below.
Since there are so many points, it is difficult to discern an average trend, so I'd like to compute the average sale price per week and plot that in the same plot. I've tried the following:
dfp_week = dfp.groupby([dfp['saledate'].dt.year, dfp['saledate'].dt.week]).mean()
If I plot the resulting 'price' column like this
plt.figure()
plt.plot(dfp_week['price'].values/1000.0)
plt.ylabel('Price (1,000 euros)')
I can more clearly discern an increasing trend (see below).
The problem is that I no longer have a time axis to plot this DataSeries in the same plot as the previous figure. The time axis starts like this:
longitude_4pp postal_code_4pp price rooms \
saledate saledate
2014 1 4.873140 1067.5 206250.0 2.5
6 4.954779 1102.0 129000.0 3.0
26 4.938828 1019.0 327500.0 3.0
40 4.896904 1073.0 249000.0 2.0
43 4.938828 1019.0 549000.0 5.0
How could I convert this Multi-Index with years and weeks back to a single DateTime index that I can plot my per-week-averaged data against?
If you group using pd.Grouper (called pd.TimeGrouper in older pandas versions, which has since been removed) you'll keep datetimes in your index.
dfp.groupby(pd.Grouper(key='saledate', freq='W')).mean()
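Create a new index from the (year, week) pairs (pd.datetime is no longer available in recent pandas, so pd.Timestamp is used here):
i = pd.DatetimeIndex([pd.Timestamp(year=int(year), month=1, day=1) + pd.Timedelta(days=7 * int(week)) for year, week in df.index])
Then set this new index on the DataFrame:
df.index = i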
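The arithmetic above only approximates the start of each week. If the week numbers are ISO week numbers and the years are the matching ISO years (an assumption that can fail for dates right around New Year), the standard library can return the exact Monday of each week. The helper name week_start is hypothetical:

```python
from datetime import date


def week_start(year, week):
    """Return the Monday of the given ISO year and ISO week (Python 3.8+)."""
    return date.fromisocalendar(int(year), int(week), 1)


# (year, week) pairs like the ones in the grouped index shown in the question
pairs = [(2014, 1), (2014, 6), (2014, 26)]
mondays = [week_start(year, week) for year, week in pairs]
```

Wrapping the resulting list in pd.DatetimeIndex gives an index that can be assigned to the grouped frame.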
For the sake of completeness, here are the details of how I implemented the solution suggested by piRSquared:
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
dfp_week = dfp.groupby(pd.Grouper(key='saledate', freq='W')).mean()  # pd.TimeGrouper in older pandas
plt.plot_date(dfp_week.index, dfp_week['price']/1000.0)
which yields the plot below.
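For reference, a small self-contained sketch of the weekly averaging on made-up data. It uses resample with on="saledate", which should be equivalent to the grouper-based call for this purpose. The prices and dates are illustrative only:

```python
import pandas as pd

# Illustrative sales: two in the first week of 2014, one in the second
dfp = pd.DataFrame({
    "saledate": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-08"]),
    "price": [100_000, 200_000, 300_000],
})

# Weekly bins end on Sunday; the result keeps a DatetimeIndex
weekly = dfp.resample("W", on="saledate")["price"].mean()
```

Because weekly.index is a DatetimeIndex, the averages can be drawn on the same date axis as the raw scatter points.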
