Pandas plotting: How to format datetimeindex? - python

I am creating a bar plot from a dataframe with a 15-minute DatetimeIndex spanning a couple of years.
Using this code:
df_Vol.resample('A').sum().plot.bar(
title='Sums per year',
style='ggplot',
alpha=0.8
)
Unfortunately, the ticks on the x-axis are now shown with the full timestamp, like this: 2009-12-31 00:00:00.
I would prefer to keep the plotting code short, but I couldn't find an easy way to format the timestamp as just the year (2009...2016) for the plot.
Can someone help with this?

As it does not seem to be possible to format the date within the pandas df.plot() call, I decided to create a new dataframe and plot from it.
The solution below worked for me:
df_Vol_new = df_Vol.resample('A').sum()
df_Vol_new.index = df_Vol_new.index.strftime('%Y')
ax2 = df_Vol_new.plot.bar(title='Sums per year', stacked=True, style='ggplot', alpha=0.8)

I figured out an alternative (better, at least to me): add the following to the df_Vol_new.plot() command:
plt.legend(df_Vol_new.index.to_period('A'))
This way you preserve the datetime format of df_Vol_new.index while getting better plots at the same time.
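A minimal sketch of another way to keep the DatetimeIndex intact is to relabel the x-axis ticks on the returned Axes instead of changing the index. The frame `yearly` and its `volume` column are hypothetical stand-ins for the resampled data; only the labels drawn on the plot are changed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical yearly sums with year-end timestamps, as a yearly resample would produce.
yearly = pd.DataFrame(
    {"volume": [10.0, 12.5, 9.0]},
    index=pd.to_datetime(["2009-12-31", "2010-12-31", "2011-12-31"]),
)

ax = yearly.plot.bar(title="Sums per year", alpha=0.8)

# Replace the full-timestamp tick labels with the year only.
# The index itself is left untouched.
ax.set_xticklabels(list(yearly.index.strftime("%Y")))

tick_labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

The index of `yearly` is still a DatetimeIndex afterwards, so later time-based selection keeps working.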

Related

Pandas datetime64 problem (datetime introduces spikes in data)

This is my first question on Stack Overflow, so be kind :)
I work with imported CSV files and pandas, and I really like the pandas datetime features for working with and filtering dataframes. But I have serious problems plotting the data neatly when the dates are datetime64, with both pandas plots and seaborn plots.
My CSV looks like this:
date time Flux_ConNT_C Flux_ConB1 Flux_ConB2 Flux_ConB3 Flux_ConB4 Flux_ConB4
0 01.01.2015 00:30 2.552032129 2.193558665 1.0093326 1.013124869 1.159512896 1.159512896
1 01.01.2015 01:00 2.553308464 2.195533756 1.01003938 1.013935693 1.160672989 1.160672989
2 01.01.2015 01:30 2.554585438 2.197510626 1.010746655 1.014747166 1.161834243 1.161834243
3 01.01.2015 02:00 2.55586305 2.199489276 1.011454426 1.015559289 1.162996658 1.162996658
4 01.01.2015 02:30 2.557141301 2.201469707 1.012162692 1.016372061 1.164160236 1.164160236
When I plot the data with
df.plot(figsize=(15,8))
the output is correct (image: right output).
But when I change the "date time" column to datetime64 with
df['date time'] = pd.to_datetime(df['date time'])
and use the same code to plot, the data is plotted with spikes and is not usable (image: false output).
There seems to be a problem with matplotlib, but I can't find any suggestion other than calling register_matplotlib_converters() before the plot, which doesn't change anything.
I'm working with the Spyder IDE and Python 3.7, and all libraries are up to date.
Thanks for your help!
Your problem is no miracle; it's simply not reproducible.
Are you sure your CSV doesn't have a header for the first index column 0..4?
Are you sure that column 8 in the CSV is a duplicate of column 7?
How did you actually import this CSV and construct your dataframe?
The first plot only works after replacing the range index 0..4 with the "date time" column. What other transformations did you apply to the dataframe before calling the plot method?
Your to_datetime conversion only works on a column, not an index. Why don't you share all the code that you've been using?
In the two plots, the first 5 rows don't differ. Why don't you share the data rows that are actually different in the two plots?
I will give you credit for trying to abstract the problem properly. Unfortunately, you omitted important information. Based on the limited information you've shown here, there is no problem at all.
To make my point clear: what you observed is not related to the datetime64[ns] conversion, but to something probably very simple that you didn't consider important enough to share with us.
Have a look at How to create a Minimal, Reproducible Example. The idea is: when you're able to present your problem in a reproducible way, you'll probably be able to solve it yourself.
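A minimal reproducible example of the kind asked for above might look like the sketch below. The sample values and the `;` separator are invented for illustration, and the explicit `format` string assumes the dates are written day first, as the question's sample suggests. Whether ambiguous date parsing actually caused the spikes is not confirmed anywhere in the thread; the sketch only shows how to rule it out.

```python
import io
import pandas as pd

# Hypothetical data in the same day.month.year layout as the question's CSV.
csv_text = """date time;Flux_ConNT_C
01.01.2015 23:30;2.55
02.01.2015 00:00;2.56
13.01.2015 00:00;2.60
01.02.2015 00:00;2.70
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Spell out the format so that 02.01.2015 is read as 2 January, not 1 February.
df["date time"] = pd.to_datetime(df["date time"], format="%d.%m.%Y %H:%M")
df = df.set_index("date time")

# A time axis that jumps back and forth is a typical source of spikes in a line plot.
is_sorted = df.index.is_monotonic_increasing
```

If `is_sorted` is False for the real data, the rows are out of chronological order after parsing, which is worth checking before suspecting matplotlib.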

How do I "reset the index" for a matplotlib plot?

I have the following code:
fig, ax = plt.subplots(1, 1)
calls["2016-12-24"].resample("1h").sum().plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().plot(ax=ax)
which generates the following image:
How can I make this so the lines share the x-axis? In other words, how do I make them not switch days?
If you don't care about using the correct datetime as index, you could just reset the index as you suggested for all the series. This is going to overlap all the time series, if this is what you're trying to achieve.
# the below should
calls["2016-12-24"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
Otherwise, you could try resampling the three columns at the same time. Have a go with the below, but without knowing what your original dataframe looks like, I'm not sure this will fit your case. You should post some more information about the input dataframe.
# have a try with the below
calls[["2016-12-24","2016-12-25","2016-12-26"]].resample('1h').sum().plot()
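The first suggestion (dropping the datetime index so the days overlap) can be sketched end to end as follows. The series `calls` here is a hypothetical stand-in with one value every 10 minutes, since the original data was not posted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for `calls`: one count every 10 minutes over three days.
idx = pd.date_range("2016-12-24", "2016-12-26 23:50", freq="10min")
calls = pd.Series(np.ones(len(idx)), index=idx)

fig, ax = plt.subplots(1, 1)
for day in ["2016-12-24", "2016-12-25", "2016-12-26"]:
    hourly = calls.loc[day].resample("1h").sum()
    # Drop the timestamps: positions 0..23 now stand for the hour of the day.
    hourly.reset_index(drop=True).plot(ax=ax, label=day)

ax.set_xlabel("hour of day")
ax.legend()
lines = ax.get_lines()
```

Each day becomes one line over the same 0 to 23 range, at the cost of losing the real timestamps on the axis.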

recognizing date variable in python

I am trying to plot two time series whose rows are not in sequential date order, so the plot looks weird. I don't know how to fix it. Here are my code and figure. Thank you.
date=read_myfile['Date']
x=[dt.datetime.strptime(d,'%m/%d/%Y').date() for d in date]
y=read_myfile['Observed']
y1=read_myfile['Simulated']
plt.plot(x,y,color='blue');
plt.plot(x,y1,color='red');
plt.gcf().autofmt_xdate()
Two time series of data.
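No answer is recorded for this question. One possible fix, assuming the odd look comes from rows that are not in chronological order, is to sort by date before plotting, because plt.plot connects points in the order they are given. The frame below is a hypothetical stand-in for read_myfile with the same column names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for `read_myfile`, with rows deliberately out of order.
read_myfile = pd.DataFrame(
    {
        "Date": ["03/01/2015", "01/01/2015", "02/01/2015"],
        "Observed": [3.0, 1.0, 2.0],
        "Simulated": [3.1, 1.1, 2.1],
    }
)

# Parse the month/day/year strings, then put the rows in chronological order.
read_myfile["Date"] = pd.to_datetime(read_myfile["Date"], format="%m/%d/%Y")
ordered = read_myfile.sort_values("Date")

plt.plot(ordered["Date"], ordered["Observed"], color="blue")
plt.plot(ordered["Date"], ordered["Simulated"], color="red")
plt.gcf().autofmt_xdate()
```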

ggplot Bar Plot semantics

I am trying to use ggplot in Python for the first time and the semantics are completely unobvious to me.
I have a pandas dataframe with two columns: date and entries_sum. What I would like to do is plot a bar plot with the date column as each entry on the x-axis and entries_sum as the respective heights.
I cannot figure out how to do this with the ggplot API. Am I formatting my data wrong for this?
How about:
ggplot(aes(x='date', y='entries_sum'), data=data) + geom_bar(stat='identity')

How to remove day from datetime index in pandas?

The idea behind this question is that when I'm working with full datetime stamps and data from different days, I sometimes want to compare the hourly behavior across days.
But because the days are different, I cannot directly plot two 1-hour data sets on top of each other.
My naive idea is that I need to remove the day from the datetime index on both sets and then plot them on top of each other. What's the best way to do that?
Or, alternatively, what's the better approach to my problem?
This may not be exactly it but should help you along, assuming ts is your timeseries:
hourly = ts.resample('H').mean()  # older pandas averaged by default; use .sum() if you want totals
hourly.index = pd.MultiIndex.from_arrays([hourly.index.hour, hourly.index.normalize()])
hourly.unstack().plot()
If you don't care about the day AT ALL, just setting hourly.index = hourly.index.hour should work.
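The MultiIndex approach above can be tried on synthetic data as in this sketch. The series `ts` is hypothetical (two days of hourly values), and the level names `hour` and `day` are added here only for readability.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical time series: 48 hourly values covering two days.
idx = pd.date_range("2016-12-24", periods=48, freq="60min")
ts = pd.Series(np.arange(48, dtype=float), index=idx)

hourly = ts.resample("60min").mean()

# Split each timestamp into (hour of day, calendar day).
hourly.index = pd.MultiIndex.from_arrays(
    [hourly.index.hour, hourly.index.normalize()],
    names=["hour", "day"],
)

# One column per day, one row per hour: the days now share the same x-axis.
by_hour = hourly.unstack("day")
ax = by_hour.plot()
```

Each column of `by_hour` is drawn as its own line, so the days can be compared hour by hour.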
