When I plot my data against just the index, the graph looks fine. But when I try to plot it with datetime values on the x axis, the plot gets messed up. Does anyone know why? I have provided the head of my data and the two plots.
import plotly.express as px
fig = px.line(y=data.iloc[:,3])
fig.show()
fig = px.line(y=data.iloc[:,3],x=data.iloc[:,0])
fig.show()
This is probably caused by missing dates. You have only around 180 data points, but the second plot spans 2014 to 2019, so there are long stretches with no data in between. That is why your second graph looks like that.
Instead of datetime values, try converting the dates to strings before plotting. The chart will then no longer be a true time series, though, because the many missing dates are simply skipped.
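One way to check whether gaps are really the cause is to look at the spacing between consecutive dates. This is only a minimal sketch on made-up data; the column names `date` and `value` are assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical data: a few observations separated by long gaps.
data = pd.DataFrame({
    "date": ["2014-01-02", "2014-01-03", "2016-06-01", "2016-06-02", "2019-03-15"],
    "value": [1.0, 1.2, 0.9, 1.1, 1.3],
})

# Make sure the column holds real datetimes and is sorted before plotting.
data["date"] = pd.to_datetime(data["date"])
data = data.sort_values("date").reset_index(drop=True)

# Spacing between consecutive observations. Large values are gaps that a
# line plot on a date axis bridges with long straight segments.
gaps = data["date"].diff()
largest_gap = gaps.max()
print(gaps.describe())
print("largest gap:", largest_gap)
```

If the largest gap is much bigger than the typical spacing, the long straight segments in the datetime plot are expected rather than a plotting bug.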
Here are two solutions:
Use reset_index() to plot against the row position instead of the date. This gets rid of the gaps left by missing dates, but the x-axis values no longer provide any date information.
Here's an example:
This is the data frame, and I want to plot the closing price against time.
import plotly.graph_objects as go
fig = go.Figure([go.Scatter(x=df.reset_index().index, y=df['close'])])
fig.show()
In the chart, 1277 is the index value and the corresponding y value is the closing price.
Use .iloc[] to look up the x-axis value
Convert x-axis value to datetime object
Follow this link: https://stackoverflow.com/a/51231209/10277042
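The second option could be sketched roughly like this: plot against integer row positions so that missing dates take up no room, then label a handful of positions with the dates looked up via `.iloc`. The frame `df`, its `time` and `close` columns, and the choice of five labels are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with irregular dates (weekends and a long gap missing).
df = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04",
                            "2019-01-07", "2019-03-01", "2019-03-04"]),
    "close": [10.0, 10.5, 10.2, 10.8, 11.5, 11.3],
})

# x values are plain row positions, so the points are evenly spaced.
positions = np.arange(len(df))

# Pick up to five evenly spread positions and look up their dates with .iloc.
n_labels = min(5, len(df))
tick_positions = np.linspace(0, len(df) - 1, n_labels).round().astype(int)
tick_labels = df["time"].iloc[tick_positions].dt.strftime("%Y-%m-%d").tolist()

print(list(tick_positions))
print(tick_labels)
```

With plotly, after plotting `df['close']` against `positions`, these could be applied with `fig.update_xaxes(tickvals=tick_positions, ticktext=tick_labels)`.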
I am making an OHLC graph using plotly and have run into one issue: the labels on the x-axis look really messy. Is there a way to make them neater? Or can we show only the extreme date values, for example only the first and the last date? The date range is dynamic. I am using the code below to make the graph. Thanks for the help.
fig = go.Figure(data=go.Candlestick(x=tickerDf.index.date,
                                    open=tickerDf.Open,
                                    high=tickerDf.High,
                                    low=tickerDf.Low,
                                    close=tickerDf.Close))
fig.update_xaxes(showticklabels=True)  # show tick labels (False would hide them)
fig.update_layout(width=800, height=600, xaxis=dict(type="category"))  # hide dates with no values
st.plotly_chart(fig)
Here tickerDf is the dataframe which contains the stock related data.
One option is to change nticks. This can be done by calling fig.update_xaxes() and passing nticks as a parameter. For example, here's a plot with the default number of ticks, with no change.
And here is what it looks like after specifying the number of ticks:
The code for the second plot:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('./finance-charts-apple.csv')
fig = go.Figure([go.Scatter(x=df['Date'], y=df['AAPL.High'])])
fig.update_xaxes(nticks=5)
fig.show()
The important line, again, is:
fig.update_xaxes(nticks=5)
Right now I'm trying to figure out how Matplotlib and Pandas work by processing a dataset of movements in and out of a bird nest. I successfully plotted the full dataset as a line graph with timestamps on the x axis.
However, when I try to plot a set interval as a bar chart (grouped by hour), matplotlib no longer recognizes the x axis as timestamps, so I can't use formatters and locators correctly (HourLocator etc.).
The index of my dataframe is a timestamp, but it seems to lose its properties as a timestamp when I use the .groupby() function.
newdf = df[date:(date+forward*deltaday)]
bx = newdf['Movements'].groupby(pd.Grouper( freq='H')).sum().plot.bar()
xtick_locator = mdates.AutoDateLocator()
xtick_formatter = mdates.AutoDateFormatter(xtick_locator)
bx.xaxis.set_major_locator(xtick_locator)
bx.xaxis.set_major_formatter(xtick_formatter)
When running this code I get the following error message:
ValueError: view limit minimum -0.5 is less than 1 and is an invalid
Matplotlib date value. This often happens if you pass a non-datetime
value to an axis that has datetime units"
This indicates that my x axis no longer holds datetime objects. When fetching my tick labels with xticks() I get the following:
Text(0, 0, '2015-02-02 00:00:00')...
Text(47, 0, '2015-02-03 23:00:00')
It looks like a timestamp, but matplotlib won't recognize it as one. Without any expert knowledge, I believe .groupby() is the culprit.
Is there a better way to plot a bar chart with the sum of df['Movements'] per hour for a set time interval?
(df['Movements'] currently holds movements per minute.)
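A possible approach, sketched on synthetic data: pandas' `.plot.bar()` draws the bars at integer positions 0, 1, 2, ... and uses the timestamps only as text labels, which is consistent with the `Text(0, 0, ...)` ticks shown above. Passing the hourly sums to matplotlib's own `ax.bar` keeps the x axis in date units, so the `mdates` locators and formatters apply. The data, the six-hour tick interval, and the bar width are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: one movement per minute for two days.
index = pd.date_range("2015-02-02", periods=2 * 24 * 60,
                      freq=pd.Timedelta(minutes=1))
df = pd.DataFrame({"Movements": np.ones(len(index), dtype=int)}, index=index)

# Sum per hour; the result keeps a DatetimeIndex.
hourly = df["Movements"].resample(pd.Timedelta(hours=1)).sum()

fig, ax = plt.subplots(figsize=(10, 4))
# On a matplotlib date axis one unit is one day, so an hour is 1/24 wide.
ax.bar(hourly.index, hourly.to_numpy(), width=0.9 / 24, align="edge")

# Date-aware locator and formatter now work because x is in date units.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=6))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b %H:%M"))
fig.autofmt_xdate()
```

The same idea should carry over to a slice such as `df[date:(date + forward * deltaday)]` before resampling.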
I have a function that takes a pandas dataframe and plots it, along with displaying some error metrics. I also have a function that takes a pandas dataframe with a datetime index and computes the daily average of its values. The problem is that the daily average plot looks really bad with matplotlib, because every day is drawn as a separate tick on the x axis. All this code is in a package called Hydrostats; the GitHub repository source code for the daily average function is here, and the source code for the plotting function is here. The plot for a linear time series is below.
The daily average plot is shown below.
As you can see, you can't see any of the x axis ticks because they are all so squished together.
You can set the ticks used for the x axis via ax.set_xticks() and labels via ax.set_xticklabels().
For instance you could just provide that method with a list of dates to use, such as every 20th value of the current pd.DataFrame index (df.index[::20]) and then set the formatting of the date string as below.
# Get the current axis
ax = plt.gca()
# Only label every 20th value
ticks_to_use = df.index[::20]
# Set format of labels (note year not excluded as requested)
labels = [ i.strftime("%-H:%M") for i in ticks_to_use ]
# Now set the ticks and labels
ax.set_xticks(ticks_to_use)
ax.set_xticklabels(labels)
Notes
If labels still overlap, you could also rotate them by passing the rotation argument (e.g. ax.set_xticklabels(labels, rotation=45)).
There is a useful reference for time string formats here: http://strftime.org.
I faced a similar issue with my plot.
Matplotlib automatically handles timestamps on axes, but only when they are actual timestamp objects. The timestamps in my index were strings, so I changed read_csv to
pd.read_csv(file_path, index_col=[0], parse_dates=True)
Try converting the index to timestamps. This solved the problem for me; I hope it does the same for you.
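A small sketch of the difference, using an in-memory CSV in place of the real file; the column names and values are made up.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "time,value\n2015-02-02 00:00:00,3\n2015-02-02 01:00:00,5\n"

# Without parse_dates the index holds plain strings.
as_strings = pd.read_csv(io.StringIO(csv_text), index_col=[0])

# With parse_dates=True pandas tries to parse the index as datetimes.
as_dates = pd.read_csv(io.StringIO(csv_text), index_col=[0], parse_dates=True)

print(type(as_strings.index).__name__, as_strings.index.dtype)
print(type(as_dates.index).__name__, as_dates.index.dtype)
```

An existing frame can be converted in the same spirit with `df.index = pd.to_datetime(df.index)`.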
I have a csv file of power levels at several stations (4 in this case, though "HUT4" is not in this short excerpt):
2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
(etc . .)
The times are not synchronised across stations. The power level is (in this case) integer percent. Some machines report in volts (~13.0), which would be an additional issue when plotting.
The data is easy to read into a dataframe, to index the dataframe, to put into a dictionary. But I can't get the right syntax to make a meaningful plot. Either all stations on a single plot sharing a timeline that's big enough for all stations, or as separate plots, maybe a subplot for each station. If I do:
import pandas as pd
df = pd.read_csv('Power_Log.csv',names=['DT','Station','Power'])
df2=df.groupby(['Station']) # set 'Station' as the data index
d = dict(iter(df2)) # make a dictionary including each station's data
for stn in d.keys():
    d[stn].plot(x='DT',y='Power')
plt.legend(loc='lower right')
plt.savefig('Station_Power.png')
I do get a plot but the X axis is not right for each station.
I have not figured out yet how to do four independent subplots, which would free me from making a wide-enough timescale.
I would greatly appreciate comments on getting a single plot right and/or getting good looking subplots. The subplots do not need to have synchronised X axes.
I'd rather plot the typical way, something like:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.savefig('plot.png')  # savefig() needs a file name; 'plot.png' is a placeholder
( http://matplotlib.org/users/pyplot_tutorial.html )
To plot more than one data series, simply call plt.plot() multiple times, once for each series.
P.S. you can set xticks this way: Changing the "tick frequency" on x or y axis in matplotlib?
Sorry for the comment above where I needed to add code. Still learning . .
From the 5th code line:
import matplotlib.dates as mdates
for stn in d.keys():
    plt.figure()
    d[stn].interpolate().plot(x='DT',y='Power',title=stn,rot=45)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%D/%M/%Y'))
    plt.savefig('Station_Power_'+stn+'.png')
This does more or less what I want, except that the DateFormatter line does not work. I would like to shorten my datetime labels to show just the date. If it placed ticks at midnight that would be brilliant, but it is not strictly necessary.
The key to getting a continuous plot is to call interpolate() on the data before plotting.
Because this data has different x scales from station to station, a plot of all stations on the same graph does not work. HUT4 in my data has far fewer records and only plots to about 25% of the scale, even though its datetime values cover more or less the same range as the other HUTs.
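The DateFormatter pattern in the loop above uses `%D` and `%M`; in strftime notation `%M` means minutes, while day and month are `%d` and `%m`, which may be part of why that line misbehaves. Below is a sketch of one subplot per station, assuming the timestamps are parsed as datetimes on read and plotted through matplotlib directly. The inline sample is the excerpt from the question; everything else (figure size, markers, locator choice) is an assumption.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
"""

# parse_dates turns the DT column into real datetimes instead of strings.
df = pd.read_csv(io.StringIO(csv_text), names=["DT", "Station", "Power"],
                 parse_dates=["DT"])

stations = sorted(df["Station"].unique())
# One row per station; sharex=False gives each subplot its own time range.
fig, axes = plt.subplots(nrows=len(stations), sharex=False, squeeze=False,
                         figsize=(8, 3 * len(stations)))

for ax, stn in zip(axes[:, 0], stations):
    part = df[df["Station"] == stn]
    ax.plot(part["DT"], part["Power"], marker="o")
    ax.set_title(stn)
    ax.xaxis.set_major_locator(mdates.AutoDateLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))
    ax.tick_params(axis="x", rotation=45)

fig.tight_layout()
```

For data spanning several days, swapping `AutoDateLocator()` for `mdates.DayLocator()` should place the ticks at midnight.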
I have a time series in a Pandas dataframe with a number of columns which I'd like to plot. Is there a way to set the x-axis to always use the index from a dataframe?
When I use the .plot() method from Pandas the x-axis is formatted correctly. However, when I pass my dates and the column(s) I'd like to plot directly to matplotlib, the graph doesn't plot correctly. Thanks in advance.
plt.plot(site2.index.values, site2['Cl'])
plt.show()
FYI: site2.index.values produces this (I've cut out the middle part for brevity):
array([
'1987-07-25T12:30:00.000000000+0200',
'1987-07-25T16:30:00.000000000+0200',
'2010-08-13T02:00:00.000000000+0200',
'2010-08-31T02:00:00.000000000+0200',
'2010-09-15T02:00:00.000000000+0200'
],
dtype='datetime64[ns]')
It seems the issue was that I had .values. Without it (i.e. site2.index) the graph displays correctly.
You can use plt.xticks to set the x-axis tick locations and labels.
Try:
plt.xticks( site2['Cl'], site2.index.values ) # location, labels
plt.plot( site2['Cl'] )
plt.show()
see the documentation for more details: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xticks
That's built right into the plot() method.
You can use yourDataFrame.plot(use_index=True) to put the DataFrame index on the x-axis.
Read More Here: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.html
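As a small illustration of plotting against the index, here is a sketch with made-up data. `site2` and `Cl` reuse the names from the question; the values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the question's site2 frame.
index = pd.to_datetime(["1987-07-25 12:30", "1987-07-25 16:30",
                        "2010-08-13 02:00", "2010-08-31 02:00"])
site2 = pd.DataFrame({"Cl": [4.1, 3.9, 5.2, 5.0]}, index=index)

fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(8, 6))

# pandas: the index is used for the x axis (use_index=True is the default).
site2["Cl"].plot(ax=ax1, use_index=True)

# matplotlib: pass the index itself as the x values.
ax2.plot(site2.index, site2["Cl"])
```

Both axes end up with a date-aware x axis, so either route can be used depending on how much control over the matplotlib objects is needed.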
If, like me, you want matplotlib to select a 'sensible' scale, here is one way to use a pandas dataframe index as the x-axis labels in a matplotlib plot: plot against row positions, keep matplotlib's default tick positions, and relabel them with the index values. Code:
fig, ax = plt.subplots()
ax.plot(site2['Cl'].to_numpy())  # plot against row positions 0..len-1
# keep only the default ticks that correspond to an existing row
x_ticks = [int(t) for t in ax.get_xticks() if t == int(t) and 0 <= t < len(site2)]
ax.set_xticks(x_ticks)
ax.set_xticklabels([str(label) for label in site2.index[x_ticks]])