Right now I'm trying to figure out how Matplotlib and Pandas work by processing a dataset of movements in and out of a bird's nest. I successfully plotted the full dataset as a line graph with the timestamp on the x-axis.
However, when I try to plot a set interval as a bar chart (grouped by hour), Matplotlib no longer recognizes the x-axis as a timestamp, so I can't use formatters and locators correctly (HourLocator etc.).
The index of my dataframe is a timestamp, but it seems to lose its properties as a timestamp when I use the .groupby() function.
newdf = df[date:(date+forward*deltaday)]
bx = newdf['Movements'].groupby(pd.Grouper( freq='H')).sum().plot.bar()
xtick_locator = mdates.AutoDateLocator()
xtick_formatter = mdates.AutoDateFormatter(xtick_locator)
bx.xaxis.set_major_locator(xtick_locator)
bx.xaxis.set_major_formatter(xtick_formatter)
When running this code I get the following error message:
ValueError: view limit minimum -0.5 is less than 1 and is an invalid
Matplotlib date value. This often happens if you pass a non-datetime
value to an axis that has datetime units
This indicates that my x-axis is no longer a datetime object. When fetching my xtick labels with xticks() I get the following:
Text(0, 0, '2015-02-02 00:00:00')...
Text(47, 0, '2015-02-03 23:00:00')
It looks like a timestamp, but Matplotlib won't recognize it as one. Without any expert knowledge, I believe .groupby() is the culprit.
Is there a better way to plot a barchart with the sum of df['Movements'] per hour plotted for a set time interval?
(df['Movements'] is right now movements per minute)
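For what it's worth, pandas' .plot.bar() treats the index as categories and puts the bars at integer positions 0, 1, 2, ..., which fits the view limit of -0.5 in the error above. Below is a minimal sketch of one possible way around it, assuming it is acceptable to call Matplotlib's ax.bar directly on the hourly sums. The data is synthetic, and resample("60min") is swapped in for the Grouper call; neither comes from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Synthetic stand-in for the nest data: one count per minute over two days
idx = pd.date_range("2015-02-02", periods=2 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Movements": rng.integers(0, 5, len(idx))}, index=idx)

# Sum the per-minute counts into hourly bins; the result keeps a DatetimeIndex
hourly = df["Movements"].resample("60min").sum()

fig, ax = plt.subplots()
# Passing the DatetimeIndex to ax.bar keeps datetime units on the x-axis.
# Matplotlib date units are days, so a bar one hour wide has width 1/24.
ax.bar(hourly.index, hourly.values, width=1 / 24, align="edge")

# Date locators and formatters can now be applied as usual
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(locator))
fig.autofmt_xdate()
```

With the real data, hourly would be computed from the sliced newdf instead of the synthetic frame.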
I am making an OHLC graph using Plotly and have run into one issue: the labels on the x-axis look really messy. Is there a way to make them neater, or to show only the extreme date values, for example only the first and last date? The date range is dynamic. I am using the code below to make the graph. Thanks for the help.
fig = go.Figure(data=go.Candlestick(x=tickerDf.index.date,
open=tickerDf.Open,
high=tickerDf.High,
low=tickerDf.Low,
close=tickerDf.Close) )
fig.update_xaxes(showticklabels=True)  # keeps tick labels visible; set to False to hide them
fig.update_layout(width=800,height=600,xaxis=dict(type = "category") ) # hide dates with no values
st.plotly_chart(fig)
Here tickerDf is the dataframe which contains the stock related data.
One option is to change nticks, which sets the maximum number of ticks on the axis. This can be done by calling fig.update_xaxes() and passing nticks as a parameter. For example, here's a plot with the default number of ticks:
And here is what it looks like after specifying the number of ticks:
The code for the second plot:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('./finance-charts-apple.csv')
fig = go.Figure([go.Scatter(x=df['Date'], y=df['AAPL.High'])])
fig.update_xaxes(nticks=5)
fig.show()
The important line, again, is:
fig.update_xaxes(nticks=5)
When I plot my data with just the index the graph looks fine. But when I try to plot it with a datetime object in the x axis, the plot gets messed up. Does anyone know why? I provided the head of my data and also the two plots.
import plotly.express as px
fig = px.line(y=data.iloc[:,3])
fig.show()
fig = px.line(y=data.iloc[:,3],x=data.iloc[:,0])
fig.show()
This is probably because of missing dates: you have around 180 data points, but your second plot spans 2014 to 2019, so there are few data points in between. That is why your second graph looks like that.
Instead of datetime values, try converting them to strings before plotting. Note that it will then no longer be a true time series, since the many missing dates are skipped.
Here I have two solutions:
Use the reset_index() function to get rid of the missing dates. This fixes the shape of the chart, but the x-axis values no longer provide date information.
Here's an example:
This is the data frame and I want to plot the chart between time and closing price
import plotly.graph_objects as go
fig = go.Figure([go.Scatter(x=df.reset_index().index, y=df['close'])])
fig.show()
1277 is the index value and corresponding value is the closing price.
Use .iloc[] to find the x-axis value
Convert x-axis value to datetime object
Follow this link: https://stackoverflow.com/a/51231209/10277042
So I have a function that takes a pandas dataframe and plots it, along with displaying some error metrics, and another function that takes a pandas dataframe with a datetime index and computes the daily average of its values. The problem is that when I try to plot the daily average, it looks really bad with matplotlib because it plots every day as a separate tick on the x-axis. I have all this code in a package called Hydrostats; the GitHub repository source code for the daily average function is here, and the source code for the plotting function is here. The plot for a linear time series is below.
The Daily Average plot is shown below
As you can see, you can't see any of the x axis ticks because they are all so squished together.
You can set the ticks used for the x axis via ax.set_xticks() and labels via ax.set_xticklabels().
For instance you could just provide that method with a list of dates to use, such as every 20th value of the current pd.DataFrame index (df.index[::20]) and then set the formatting of the date string as below.
# Get the current axis
ax = plt.gca()
# Only label every 20th value
ticks_to_use = df.index[::20]
# Set format of labels (hour:minute only, the year is left out).
# Note: "%-H" drops the leading zero but is not supported on Windows.
labels = [i.strftime("%-H:%M") for i in ticks_to_use]
# Now set the ticks and labels
ax.set_xticks(ticks_to_use)
ax.set_xticklabels(labels)
Notes
If labels still overlap, you could also rotate them by passing the rotation argument (e.g. ax.set_xticklabels(labels, rotation=45)).
There is a useful reference for time string formats here: http://strftime.org.
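Here is a self-contained sketch of the same idea for a daily-average style plot. It assumes the averaged frame is indexed by "month/day" strings, which is a guess about the data and not taken from the Hydrostats source; the values are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily averages labelled "MM/DD" (366 entries, leap year)
days = pd.date_range("2000-01-01", "2000-12-31", freq="D")
daily = pd.Series(np.sin(np.arange(len(days)) / 58.0), index=days.strftime("%m/%d"))

fig, ax = plt.subplots()
positions = np.arange(len(daily))
ax.plot(positions, daily.values)

# Label only every 30th day so the tick labels do not run into each other
step = 30
ax.set_xticks(positions[::step])
ax.set_xticklabels(list(daily.index[::step]), rotation=45)
```

Plotting against integer positions and then labelling a subset keeps every data point in the line while showing only 13 labels.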
I faced a similar issue with my plot.
Matplotlib automatically handles timestamps on axes, but only when they are in timestamp format. The timestamps in my index were strings, so I changed read_csv to
pd.read_csv(file_path, index_col=[0], parse_dates=True)
Try changing the index to timestamp format. This solved the problem for me; I hope it does the same for you.
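A quick way to check which case you are in is to look at the type of the index after loading. The sketch below uses a small in-memory CSV (made up for illustration) in place of file_path:

```python
import io
import pandas as pd

# Made-up CSV content standing in for the real file
csv_text = (
    "time,value\n"
    "2019-01-01 09:30:00,1.0\n"
    "2019-01-01 09:35:00,2.0\n"
)

# Without parse_dates the first column is read as plain strings
as_strings = pd.read_csv(io.StringIO(csv_text), index_col=[0])

# With parse_dates=True pandas tries to parse the index as dates
as_dates = pd.read_csv(io.StringIO(csv_text), index_col=[0], parse_dates=True)

print(type(as_strings.index).__name__, as_strings.index.dtype)
print(type(as_dates.index).__name__, as_dates.index.dtype)
```

If the index is already loaded as strings, pd.to_datetime(df.index) is another way to convert it.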
I get data every 5 minutes between 9:30am and 4pm. Most days I just plot live intraday data. However, sometimes I want a historical view of, let's say, 2+ days. The only problem is that between 4pm and 9:30am I just get a line connecting the two data points. I would like that gap to disappear. My code and an example of what is happening are below:
fig = plt.figure()
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
ax = fig.add_subplot(111)
myFmt = mdates.DateFormatter('%m/%d %I:%M')
ax.xaxis.set_major_formatter(myFmt)
line, = ax.plot(data['Date_Time'],data['Y'],'b-')
I want to keep the data as a time series so that when I scroll over it I can see the exact date and time.
So it looks like you're using a pandas object, which is helpful. Assuming you have filtered out any time between 4pm and 9:30am in data['Date_Time'], I would make sure your index is reset via data.reset_index(). You'll want to use that integer index as the under-the-hood index that matplotlib actually uses to plot the time series. Then you can manually alter the tick labels themselves with plt.xticks() as seen in this demo case. Put together, I would expect it to look something like this:
data = data.reset_index(drop=True) # just to remove that pesky column
fig, ax = plt.subplots(1,1)
ax.plot(data.index, data['Y'])
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
plt.xticks(data.index, data['Date_Time'])
I noticed the last statement in your question just after posting this. Unfortunately, this "solution" doesn't track the "x" variable in an interactive figure. That is, while the time axis labels adjust to your zoom, you can't know the time by cursor location, so you'd have to eyeball it up from the bottom of the figure. :/
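One possible way to get the cursor readout back, sketched here under assumptions: keep plotting against the integer positions, but translate positions to timestamps both for the tick labels (FuncFormatter) and for the status-bar readout (ax.fmt_xdata). The data and the helper name position_to_label are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator
import numpy as np
import pandas as pd

# Hypothetical data: 5-minute samples from 09:30 to 16:00 on two days
day1 = pd.date_range("2020-01-02 09:30", "2020-01-02 16:00", freq="5min")
day2 = pd.date_range("2020-01-03 09:30", "2020-01-03 16:00", freq="5min")
times = day1.append(day2)
data = pd.DataFrame({"Date_Time": times, "Y": np.arange(len(times), dtype=float)})

def position_to_label(pos, _tick_index=None):
    """Map an integer row position to its timestamp string ('' if out of range)."""
    i = int(round(pos))
    if 0 <= i < len(data):
        return data["Date_Time"].iloc[i].strftime("%m/%d %H:%M")
    return ""

fig, ax = plt.subplots()
# Plot against row positions so the overnight gap takes up no space
ax.plot(np.arange(len(data)), data["Y"], "b-")
ax.xaxis.set_major_locator(MaxNLocator(nbins=8, integer=True))
ax.xaxis.set_major_formatter(FuncFormatter(position_to_label))
# fmt_xdata controls the x value shown for the cursor in interactive windows
ax.fmt_xdata = position_to_label
```

The readout snaps to the nearest sample, so it shows the time of the closest data point and not an interpolated time.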
I'm using Matplotlib to plot data on Ubuntu 15.10. My y-axis has numeric values and my x-axis timestamps.
I'm having the problem that the date labels overlap each other, which looks bad. How do I increase the distance between the x-axis ticks/labels while keeping them evenly spaced? Since the automatic selection of ticks was bad, I'm okay with manually setting the number of date ticks. Any other solution is appreciated, too.
Besides, I'm using the following DateFormatter:
formatter = DateFormatter('%m/%d/%y')
axis = plt.gca()
axis.xaxis.set_major_formatter(formatter)
You could add the following to your code:
plt.gcf().autofmt_xdate()
This automatically formats the x-axis for you (for example, it rotates the labels by about 30 degrees).
You can also manually cap the number of ticks shown on your x-axis to avoid it getting crowded, by using the following:
ax = plt.gca()  # the current axes (called "axis" in the question)
max_xticks = 10
xloc = plt.MaxNLocator(max_xticks)
ax.xaxis.set_major_locator(xloc)
I personally use both together as it makes the graph look much nicer when using dates.
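As a possible variation, MaxNLocator is not date-aware, so its ticks need not fall on week or month boundaries. Below is a sketch that uses mdates.AutoDateLocator with a maxticks hint instead, together with the DateFormatter from the question; the data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Made-up data: 120 daily timestamps with a simple ramp as y values
x = pd.date_range("2016-01-01", periods=120, freq="D").to_pydatetime()
y = np.linspace(0.0, 1.0, len(x))

fig, ax = plt.subplots()
ax.plot(x, y)

# Let the date-aware locator choose the tick spacing, aiming for few ticks
ax.xaxis.set_major_locator(mdates.AutoDateLocator(minticks=3, maxticks=6))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d/%y"))

# Rotate and right-align the date labels
fig.autofmt_xdate()
```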
You can simply set the locations you want to be labeled:
axis.set_xticks(x[[0, len(x) // 2, -1]])
where x is your NumPy array of timestamps (the list-based indexing needs an array, not a plain Python list).