I have two time series from different years stored in pandas dataframes. For example:
data15 = pd.DataFrame(
[1,2,3,4,5,6,7,8,9,10,11,12],
index=pd.date_range(start='2015-01',end='2016-01',freq='M'),
columns=['2015']
)
data16 = pd.DataFrame(
[5,4,3,2,1],
index=pd.date_range(start='2016-01',end='2016-06',freq='M'),
columns=['2016']
)
I'm actually working with daily data but if this question is answered sufficiently I can figure out the rest.
What I'm trying to do is overlay the plots of these different data sets onto a single plot from January through December to compare the differences between the years. I can do this by creating a "false" index for one of the datasets so they have a common year:
data16.index = data15.index[:len(data16)]
ax = data15.plot()
data16.plot(ax=ax)
But I would like to avoid messing with the index if possible. Another problem with this method is that the year (2015) will appear in the x axis tick label which I don't want. Does anyone know of a better way to do this?
One way to do this would be to overlay a transparent axes over the first, and plot the 2nd dataframe in that one, but then you'd need to update the x-limits of both axes at the same time (similar to twinx). However, I think that's far more work and has a few more downsides: you can't easily zoom interactively into a specific region anymore for example, unless you make sure both axes are linked via their x-limits. Really, the easiest is to take into account that offset, by "messing with the index".
As for the tick labels, you can easily change the format so that they don't show the year by specifying the x-tick format:
import matplotlib.dates as mdates
month_day_fmt = mdates.DateFormatter('%b %d') # "Locale's abbreviated month name. + day of the month"
ax.xaxis.set_major_formatter(month_day_fmt)
Have a look at the matplotlib API example for specifying the date format.
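A minimal sketch of the "shift the index, then hide the year" approach, assuming monthly data like the question's. The `shifted16` name and the month-start dates are illustrative, not from the original post. It plots with matplotlib directly because pandas' own `.plot()` may use different date units for regular time series, which can confuse `DateFormatter`.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in monthly data (month-start dates are an assumption)
data15 = pd.DataFrame({"2015": range(1, 13)},
                      index=pd.date_range("2015-01-01", periods=12, freq="MS"))
data16 = pd.DataFrame({"2016": [5, 4, 3, 2, 1]},
                      index=pd.date_range("2016-01-01", periods=5, freq="MS"))

# Work on a copy so the original 2016 index stays untouched,
# then move every timestamp into the common year 2015
shifted16 = data16.copy()
shifted16.index = data16.index.map(lambda ts: ts.replace(year=2015))

fig, ax = plt.subplots()
ax.plot(data15.index, data15["2015"], label="2015")
ax.plot(shifted16.index, shifted16["2016"], label="2016")

# One tick per month, labelled with the month name only (no year)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.legend()
```

With daily data, `ts.replace(year=2015)` would fail on 29 February of a leap year, so that day would need to be dropped or handled separately.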
I see two options.
Option 1: add a month column to your dataframes
data15['month'] = data15.index.to_series().dt.strftime('%b')
data16['month'] = data16.index.to_series().dt.strftime('%b')
ax = data16.plot(x='month', y='2016')
ax = data15.plot(x='month', y='2015', ax=ax)
Option 2: if you don't want to do that, you can use matplotlib directly
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(data15['2015'].values)
ax.plot(data16['2016'].values)
plt.xticks(range(len(data15)), data15.index.to_series().dt.strftime('%b'), size='small')
Needless to say, I would recommend the first option.
You might be able to use pandas.DatetimeIndex.dayofyear to get the day number, which will allow you to plot two different years' data on top of one another.
in: date=pd.Timestamp('2008-10-31')
in: date.dayofyear
out: 305
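As a rough sketch of how that could look for daily data, the example below plots each year against the day number of its index, so neither index is modified. The synthetic values and the variable names are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for the real series
idx15 = pd.date_range("2015-01-01", "2015-12-31", freq="D")
idx16 = pd.date_range("2016-01-01", "2016-05-31", freq="D")
data15 = pd.DataFrame({"2015": np.arange(len(idx15))}, index=idx15)
data16 = pd.DataFrame({"2016": np.arange(len(idx16))[::-1]}, index=idx16)

fig, ax = plt.subplots()
for df in (data15, data16):
    col = df.columns[0]
    # x is the day number within the year, so both years share one axis
    ax.plot(df.index.dayofyear.to_numpy(), df[col].to_numpy(), label=col)
ax.set_xlabel("Day of year")
ax.legend()
```

One caveat: in a leap year every date after 29 February has a day number one higher than the same calendar date in a non-leap year, so the two curves are offset by a day from March onward.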
I am making an OHLC graph using plotly and have stumbled across one issue: the labels on the x-axis look really messy. Is there a way to make them neater? Or can we show only the extreme date values, for example only the first and last dates? The date range is dynamic. I am using the code below to make the graph. Thanks for the help.
fig = go.Figure(data=go.Candlestick(x=tickerDf.index.date,
open=tickerDf.Open,
high=tickerDf.High,
low=tickerDf.Low,
close=tickerDf.Close) )
fig.update_xaxes(showticklabels=True ) # show x tick labels (set to False to hide them)
fig.update_layout(width=800,height=600,xaxis=dict(type = "category") ) # hide dates with no values
st.plotly_chart(fig)
Here tickerDf is the dataframe which contains the stock related data.
One way that you can use is changing the nticks. This can be done by calling fig.update_xaxes() and passing nticks as the parameter. For example, here's a plot with the regular amount of ticks, with no change.
and here is what it looks like after specifying the number of ticks:
The code for the second plot:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('./finance-charts-apple.csv')
fig = go.Figure([go.Scatter(x=df['Date'], y=df['AAPL.High'])])
fig.update_xaxes(nticks=5)
fig.show()
The important line, again, is:
fig.update_xaxes(nticks=5)
I'm trying to plot a line chart based on 2 columns using seaborn from a dataframe imported as a .csv with pandas.
The data consists of ~97000 records across 19 years of timeframe.
First part of the code: (I assume the code directly below shouldn't contribute to the issue, but will list it just in case)
# use pandas to read CSV files and prepare the timestamp column for recognition
temporal_fires = pd.read_csv("D:/GIS/undergraduateThesis/data/fires_csv/mongolia/modis_2001-2019_Mongolia.csv")
temporal_fires = temporal_fires.rename(columns={"acq_date": "datetime"})
# recognize the datetime column from the data
temporal_fires["datetime"] = pd.to_datetime(temporal_fires["datetime"])
# add a year column to the dataframe
temporal_fires["year"] = temporal_fires["datetime"].dt.year
temporal_fires['count'] = temporal_fires['year'].map(temporal_fires['year'].value_counts())
The plotting part of the code:
# plotting (seaborn)
plot1 = sns.lineplot(x="year",
y="count",
data=temporal_fires,
color='firebrick')
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda x, _: int(x)))
plt.xlabel("Шаталт бүртгэгдсэн он", fontsize=10)
plt.ylabel("Бүртгэгдсэн шаталтын тоо")
plt.title("2001-2019 он тус бүрт бүртгэгдсэн шаталтын график")
plt.xticks(fontsize=7.5, rotation=45)
plt.yticks(fontsize=7.5)
Python doesn't return any errors and does show the figure:
... but (1) the labels are not properly aligned with the graph vertices and (2) I want the X label ticks to show each year instead of skipping some. For the latter, I did find a stackoverflow post, but it was for a heatmap, so I'm not sure how I'll advance in this case.
How do I align them properly and show all ticks?
Thank you.
I found my answer, just in case anyone makes the same mistake.
The line
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda x, _: int(x)))
only changed the X tick labels on my plot, truncating each tick value to an integer, while the tick positions stayed the same. The misalignment was because a tick at "2001.5" was merely relabeled "2001"; the underlying data itself was not modified.
As for the label intervals, the addition of this line...
plt.xticks(np.arange(min(temporal_fires['year']), max(temporal_fires['year'])+1, 1.0))
...showed me all of the year values in the plot instead of skipping them.
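A small sketch of the same idea, using plain matplotlib instead of seaborn to keep it self-contained. The handful of dates below is made up; only the `datetime` and `year` column names come from the question. Placing the ticks exactly on integer years means no formatter is needed to round the labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical acquisition dates standing in for the real CSV
fires = pd.DataFrame({"datetime": pd.to_datetime([
    "2001-03-01", "2001-07-15", "2002-04-02",
    "2003-05-20", "2003-06-11", "2003-09-30",
])})
fires["year"] = fires["datetime"].dt.year

# One row per year: number of records in that year
counts = fires["year"].value_counts().sort_index()

fig, ax = plt.subplots()
ax.plot(counts.index, counts.values, color="firebrick")

# Put a tick on every integer year so labels line up with the vertices
years = np.arange(counts.index.min(), counts.index.max() + 1)
ax.set_xticks(years)
ax.tick_params(axis="x", labelsize=7.5, labelrotation=45)
```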
So I have a function that will take a pandas dataframe and plot it, along with displaying some error metrics, and I also have a function that will take a pandas dataframe with a datetime type index and take the daily average of the values in the dataframe. The problem is, when I try to plot the daily average, it looks really bad with matplotlib because it plots every day as a separate tick on the x axis. I have all this code in a package called Hydrostats; the GitHub repository source code for the daily average function is here, and the source code for the plotting function is here. The plot for a linear time series is below.
The Daily Average plot is shown below
As you can see, you can't see any of the x axis ticks because they are all so squished together.
You can set the ticks used for the x axis via ax.set_xticks() and labels via ax.set_xticklabels().
For instance you could just provide that method with a list of dates to use, such as every 20th value of the current pd.DataFrame index (df.index[::20]) and then set the formatting of the date string as below.
# Get the current axis
ax = plt.gca()
# Only label every 20th value
ticks_to_use = df.index[::20]
# Set format of labels (hour:minute here; note that "%-H" is platform-specific and not supported on Windows)
labels = [ i.strftime("%-H:%M") for i in ticks_to_use ]
# Now set the ticks and labels
ax.set_xticks(ticks_to_use)
ax.set_xticklabels(labels)
Notes
If labels still overlap, you could also rotate them by passing the rotation argument (e.g. ax.set_xticklabels(labels, rotation=45)).
There is a useful reference for time string formats here: http://strftime.org.
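Put together, the approach might look like the sketch below. It assumes the dataframe has a DatetimeIndex of daily values; the `flow` column and the 100-day range are invented for the example, and the label format is month and day rather than hour and minute.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series with 100 points
df = pd.DataFrame({"flow": np.linspace(0.0, 1.0, 100)},
                  index=pd.date_range("2017-01-01", periods=100, freq="D"))

fig, ax = plt.subplots()
ax.plot(df.index, df["flow"])

# Only label every 20th value
ticks_to_use = df.index[::20]
labels = [ts.strftime("%b %d") for ts in ticks_to_use]

# Set tick positions first, then the matching labels
ax.set_xticks(ticks_to_use)
ax.set_xticklabels(labels, rotation=45)
```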
I faced a similar issue with my plot.
Matplotlib automatically handles timestamps on axes, but only when they are in timestamp format. The timestamps in my index were in string format, so I changed read_csv to
pd.read_csv(file_path, index_col=[0], parse_dates=True)
Try changing the index to timestamp format. This solved the problem for me; I hope it does the same for you.
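To see the difference, here is a tiny sketch that reads the same text with and without `parse_dates=True`. The CSV content and column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first column holds the timestamps
csv_text = "date,flow\n2017-01-01,1.5\n2017-01-02,2.0\n2017-01-03,1.8\n"

# Without parse_dates the index holds plain strings
as_strings = pd.read_csv(io.StringIO(csv_text), index_col=[0])

# With parse_dates=True the index column is converted to timestamps
as_dates = pd.read_csv(io.StringIO(csv_text), index_col=[0], parse_dates=True)

print(type(as_strings.index).__name__)
print(type(as_dates.index).__name__)
```

Only the second frame has a DatetimeIndex, which is what lets matplotlib choose sensible date ticks on its own.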
I get data every 5 mins between 9:30am and 4pm. Most days I just plot live intraday data. However, sometimes I want a historical view of, let's say, 2+ days. The only problem is that between 4pm and 9:30am I just get a line connecting the two data points. I would like that gap to disappear. My code and an example of what is happening are below:
fig = plt.figure()
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
ax = fig.add_subplot(111)
myFmt = mdates.DateFormatter('%m/%d %I:%M')
ax.xaxis.set_major_formatter(myFmt)
line, = ax.plot(data['Date_Time'],data['Y'],'b-')
I want to keep the data as a time series so that when I scroll over it I can see the exact date and time.
So it looks like you're using a pandas object, which is helpful. Assuming you have filtered out any time between 4pm and 9:30am in data['Date_Time'], I would make sure your index is reset via data.reset_index(). You'll want to use that integer index as the under-the-hood index that matplotlib actually uses to plot the timeseries. Then you can manually alter the tick labels themselves with plt.xticks() as seen in this demo case. Put together, I would expect it to look something like this:
data = data.reset_index(drop=True) # just to remove that pesky column
fig, ax = plt.subplots(1,1)
ax.plot(data.index, data['Y'])
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
plt.xticks(data.index, data['Date_Time'])
I noticed the last statement in your question just after posting this. Unfortunately, this "solution" doesn't track the "x" variable in an interactive figure. That is, while the time axis labels adjust to your zoom, you can't know the time by cursor location, so you'd have to eyeball it up from the bottom of the figure. :/
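One possible way around that limitation, sketched below under the assumption that the rows are already sorted and filtered to trading hours: keep the integer x axis, but translate positions back to timestamps with a formatter function, and assign the same function to the axes' `fmt_xdata` attribute so the cursor readout shows a time too. The `position_to_label` helper and the two sample days are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical two trading days of 5-minute data, 9:30 to 16:00
day1 = pd.date_range("2020-01-02 09:30", "2020-01-02 16:00", freq="5min")
day2 = pd.date_range("2020-01-03 09:30", "2020-01-03 16:00", freq="5min")
data = pd.DataFrame({"Date_Time": day1.append(day2)})
data["Y"] = np.arange(len(data), dtype=float)

def position_to_label(x, pos=None):
    """Map an integer x position back to its timestamp string."""
    i = int(round(x))
    if 0 <= i < len(data):
        return data["Date_Time"].iloc[i].strftime("%m/%d %H:%M")
    return ""  # positions outside the data get no label

fig, ax = plt.subplots()
# Plot against row position so the overnight gap takes no horizontal space
ax.plot(data.index, data["Y"], "b-")
ax.xaxis.set_major_formatter(FuncFormatter(position_to_label))
# Use the same mapping for the cursor readout in interactive windows
ax.fmt_xdata = position_to_label
```

Because matplotlib still picks the tick positions automatically, only a handful of labels are drawn instead of one per row.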
I have a csv file of power levels at several stations (4 in this case, though "HUT4" is not in this short excerpt):
2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
(etc . .)
The times are not synchronised across stations. The power level is (in this case) integer percent. Some machines report in volts (~13.0), which would be an additional issue when plotting.
The data is easy to read into a dataframe, to index the dataframe, to put into a dictionary. But I can't get the right syntax to make a meaningful plot. Either all stations on a single plot sharing a timeline that's big enough for all stations, or as separate plots, maybe a subplot for each station. If I do:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Power_Log.csv',names=['DT','Station','Power'])
df2=df.groupby(['Station']) # group the rows by 'Station'
d = dict(iter(df2)) # make a dictionary including each station's data
for stn in d.keys():
    d[stn].plot(x='DT',y='Power')
plt.legend(loc='lower right')
plt.savefig('Station_Power.png')
I do get a plot but the X axis is not right for each station.
I have not figured out yet how to do four independent subplots, which would free me from making a wide-enough timescale.
I would greatly appreciate comments on getting a single plot right and/or getting good looking subplots. The subplots do not need to have synchronised X axes.
I'd rather plot the typical way, something like:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.savefig('plot.png')  # savefig needs a file name; 'plot.png' is just a placeholder
( http://matplotlib.org/users/pyplot_tutorial.html )
Re more subplots: simply call plt.plot() multiple times, once for each data series.
P.S. you can set xticks this way: Changing the "tick frequency" on x or y axis in matplotlib?
Sorry for the comment above where I needed to add code. Still learning . .
From the 5th code line:
import matplotlib.dates as mdates
for stn in d.keys():
    plt.figure()
    d[stn].interpolate().plot(x='DT',y='Power',title=stn,rot=45)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%D/%M/%Y'))
    plt.savefig('Station_Power_'+stn+'.png')
Does more or less what I want to do except the DateFormatter line does not work. I would like to shorten my datetime data to show just date. If it places ticks at midnight that would be brilliant but not strictly necessary.
The key to getting a continuous plot is to use the interpolate() method in the plot.
With this data having different x scales from station to station a plot of all stations on the same graph does not work. HUT4 in my data has far fewer records and only plots to about 25% of the scale even though the datetime values cover more or less the same range as the other HUTs.
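A hedged sketch of one way to get a subplot per station with date-only tick labels. Two things differ from the code above and are assumptions about what was intended: the `DT` column is parsed into real datetimes when reading (otherwise a date formatter has nothing to work with), and the format string uses lowercase `%d/%m/%Y`, since in strftime `%M` means minutes rather than month. The data is the excerpt from the question, read from a string instead of `Power_Log.csv`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
"""

# parse_dates turns the DT strings into datetime values
df = pd.read_csv(io.StringIO(csv_text),
                 names=["DT", "Station", "Power"], parse_dates=["DT"])
groups = df.groupby("Station")

# One row of axes per station; x axes are independent (not shared)
fig, axes = plt.subplots(nrows=groups.ngroups, squeeze=False,
                         figsize=(6, 2.5 * groups.ngroups))
for ax, (stn, grp) in zip(axes[:, 0], groups):
    ax.plot(grp["DT"], grp["Power"], marker="o")
    ax.set_title(stn)
    # Ticks at midnight, labelled day/month/year
    ax.xaxis.set_major_locator(mdates.DayLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))
fig.tight_layout()
```

With only a few hours of data, as in this excerpt, no midnight falls inside the plotted range, so the date ticks only appear once the log spans more than one day.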