So, I made a DataFrame that looks like this:
The DF is chronologically ordered by the DateTime objects. These DateTimes are generated by transforming the column "attributes.timestamp", which contains timestamps as strings:
df["DateTime"] = df["attributes.timestamp"].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ'))
The corresponding y-value is a counter that counts objects within the corresponding minute.
When I try to plot this DF in matplotlib, it actually works. It takes the datetime-objects as x-values and plots the counts for that minute as follows:
Now of course it looks dumb to have the full DateTime object shown on the x-axis. It shows month, day and hour in this order (in the example it's the 2nd of March from 2 pm to 8 pm). I want it to show JUST the hours (or at least just the time, not the entire date that comes with it). So I tried to add a new column (called "Time") to the DF. That column would extract the time from the DateTime column using the following code:
df["Time"] = df["DateTime"].time()
However, that doesn't work, for it gives me the attribute error "'Series' object has no attribute 'time'". Instead I tried something else. I just repeated the whole code I used earlier when I generated the DateTime objects and added ".time()" to it.
df["Time"] = df["attributes.timestamp"].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ').time())
I have no idea why, but now it works just fine. I was able to add the time from my DateTime object:
My next idea would be to use the "Time" column on my x-axis instead of the whole datetime for plotting. y-values from the counter stay the same. But for some reason that doesn't work. When I try to plot it like that, it gives me the following error: TypeError: float() argument must be a string or a number, not 'datetime.time'
Strangely enough, that was no problem when plotting with the whole DateTime object. I don't know why the extracted time would be a problem, since it is a chronologically ordered value as well.
My question is: Why the heck does my approach not work? And: Is there any way to go around this?
Matplotlib supports plotting pandas DatetimeIndex, as well as numpy datetime64 objects, but not sequences of datetime.time. In addition, df["Time"] = df["DateTime"].time() does not work because you are applying the .time() method to the Series itself, instead of to the elements of the Series within, which are pandas.Timestamp objects that do have the .time() method defined.
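To make the Series-versus-element distinction concrete, here is a minimal sketch. The two sample timestamps are made up for illustration; the column names follow the question. The .dt accessor applies .time to each element, which is what the .apply(lambda ...) workaround was doing by hand.

```python
import pandas as pd

# Hypothetical sample data standing in for the "attributes.timestamp" column
df = pd.DataFrame(
    {"attributes.timestamp": ["2020-03-02T14:00:00Z", "2020-03-02T14:01:00Z"]}
)

# Vectorized parsing with the same format string used in the question
df["DateTime"] = pd.to_datetime(
    df["attributes.timestamp"], format="%Y-%m-%dT%H:%M:%SZ"
)

# df["DateTime"].time() fails because the Series has no .time attribute;
# the .dt accessor reaches the per-element attribute instead
df["Time"] = df["DateTime"].dt.time

print(df["Time"].tolist())
```

The resulting "Time" column still holds datetime.time objects (dtype object), so it is not something to hand to Matplotlib as x-values; it only shows why the first attempt raised an AttributeError.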
To answer your main question, you just want the x-axis to not show redundant info, yes? Instead of creating a new column only for time, the proper way to do this is to format the matplotlib x-axis with matplotlib.dates.DateFormatter.
Here's a minimal example:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
# Example DatetimeIndex and data
x = pd.date_range(start='2020-05-10', end='2020-05-11', freq='1h')
y = list(range(len(x)))
fig, ax = plt.subplots()
ax.plot(x, y)
# The following specifies the format for dates (12-hour clock with AM/PM)
date_fmt = mdates.DateFormatter('%I:%M %p')
ax.xaxis.set_major_formatter(date_fmt)
# autofmt_xdate helps with auto-rotating dates so they do not overlap
fig.autofmt_xdate()
plt.show()
As for how to know what string to pass to DateFormatter, refer to https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for strftime formats.
Matplotlib has a page dedicated to fixing common date annoyances that you might find useful: https://matplotlib.org/3.1.1/gallery/recipes/common_date_problems.html
Related
I have a large (> 1 mil rows) dataset that has datetime timestamps inside of it. I want to look at trends that may occur throughout the day. So to start if I do: print(df['timestamp']) it will show my data as:
0 2014-01-01 13:11:50.3
1 2011-02-13 04:12:45.0
Name: timestamp, Length: 1000000, dtype: datetime64[ns]
However, I do not want the date there, as I only want to plot trends throughout the day, without caring what day it is. So I do this line of code:
df['timestamp'] = pd.Series([val.time() for val in df['timestamp']]), this gives me the desired only-timestamp data, but returns the dtype as 'object', which I cannot plot. For example when I try using Seaborn: sns.lineplot(df['timestamp'], df['Task_Length']), I get the error "TypeError: Invalid object type at position 0".
BUT, if I just do the same exact sns.lineplot(df['timestamp'], df['Task_Length']), without the intermediary step of cutting off date, leaving it as datetime64[ns] object as opposed to the generic 'object' datatype; it plots fine. However, this results in a plot spanning multiple years, whereas I only want to see time-of-day trends.
For clarity, this is a pandas dataframe where each row has a task that occurs, which generically I could call one column being 'TaskName', and each is associated with a 'timestamp' as previously explained, and I want to use any sort of Seaborn plotting to analyze daily trends such as different task types happening at different times of the day, not caring about days of the year. Thanks for any help.
Edit: updating with another thing that I tried. Using the original datetime64[ns] object that does plot, I tried doing sns.lineplot(df['timestamp'].dt.time, df['Task_Length']), which gave the same error as when I add the line of code to cut off the date. I can't figure out why Seaborn doesn't like just the time component.
This works for me.
Difference is in converting column "timestamp" from datetime to time.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame([['2014-01-01 13:11:50.3',10],['2011-02-13 04:12:45.0',15]], columns=['timestamp','Task_Length'])
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%H:%M:%S')
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.lineplot(x=df['timestamp'], y=df['Task_Length'])
plt.show()
Refer to this question for further details: Plot datetime.time in seaborn
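The strftime approach above turns the times into strings, so the axis is treated as a set of labels. If a numeric time-of-day axis is preferred, one alternative (my own sketch, not part of the answer above) is to convert each timestamp to fractional hours since midnight. The column name hour_of_day is an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["2014-01-01 13:11:50.3", 10], ["2011-02-13 04:12:45.0", 15]],
    columns=["timestamp", "Task_Length"],
)
ts = pd.to_datetime(df["timestamp"])

# Fractional hours since midnight: a plain float column that ignores the date
# (fractions of a second are dropped by .dt.second)
df["hour_of_day"] = ts.dt.hour + ts.dt.minute / 60 + ts.dt.second / 3600

# Sort so that a line plot runs from early to late in the day
df = df.sort_values("hour_of_day")
print(df[["hour_of_day", "Task_Length"]])
```

The float column can then be used like any numeric x, for example sns.lineplot(x="hour_of_day", y="Task_Length", data=df).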
I have troubles with my graph using matplotlib and plotly.
I transformed my object date column into a datetime64[ns] column, but I still get the same error over and over:
"view limit minimum -36834.32000000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime unit"
My code to transform the object date to datetime: df_result["date_str"] = pd.to_datetime(df_result['date_str'], format = '%Y-%m-%d')
Then I reduce my DataFrame to the few columns I need: df_graph = df_result[["date_str","frequentation_reel"]]
When I check, my column 'date_str' is 'dtype: datetime64[ns]', so it should be OK for a seaborn graph.
Here is my code for the plot: ax1 = sns.barplot(x='date_str', y='frequentation_reel', data = df_graph, palette='summer')
Here is a sample: date_str 2017-09-30 frequentation_reel 0.926966
I tried a lot of things from Stack Overflow but nothing has worked for me so far. I can't figure out what is going wrong. Can anyone help? Thanks a lot.
My x-axis values are in epoch time with milliseconds. When I try to set the x-axis tick format to something more readable, such as:
fig.update_layout(
xaxis = {'type': 'date','tickformat': '%H:%M:%S'},
)
The date doesn't come through correctly as expected. The year, day, hour, are all incorrect. I cannot find the proper way to do this with plotly python. If someone could provide an example that would be great.
I haven't had much luck with millisecond timestamps either. One solution could be to pre-format the values before passing them to Plotly. This slows things down, but there are certain formats Plotly recognizes.
I'm not assuming your inputs are in pandas, but if they are, you can build the format like so:
df2[xaxis_name] = pd.to_datetime(df[xaxis_name], unit='ms').dt.date
https://plotly.com/python/time-series/
Then you can further refine it with:
https://plotly.com/python/reference/#layout-xaxis-tickformat
Such as:
fig.update_xaxes(
tickformat = "%b-%Y"
)
Worked for me.
Plotly auto-sets the axis type to a date format when the corresponding
data are either ISO-formatted date strings or if they're a date pandas
column or datetime NumPy array.
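Since a pandas datetime column is one of the recognized inputs, a minimal sketch of the conversion step might look like this. The column names t_ms, t and value are made up for illustration. Note that it keeps the full datetime rather than taking .dt.date, because .dt.date drops the time of day, which matters if the tick format is '%H:%M:%S'.

```python
import pandas as pd

# Hypothetical frame with epoch-millisecond timestamps for the x axis
df = pd.DataFrame(
    {"t_ms": [1433120400000, 1433120401500], "value": [1.0, 2.0]}
)

# unit="ms" tells pandas the integers count milliseconds since 1970-01-01 UTC
df["t"] = pd.to_datetime(df["t_ms"], unit="ms")

# Preview of what a '%H:%M:%S' tick format would display for these values
labels = df["t"].dt.strftime("%H:%M:%S").tolist()
print(df["t"].dtype, labels)
```

The converted column df["t"] is what would be passed as x to the figure; the times shown are in UTC unless a timezone conversion is applied first.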
I'm trying to plot "created_at" versus "rating".
"created_at" is in datetime format.
Why is the x-axis shown as float values instead of dates, as I intend with the following line?
plt.plot(data['created_at'].values, data['rating'],"b.", alpha =0.5)
The result is shown in the bottom picture.
The easiest way to achieve this is to define a temporary dataframe with the datetimes as index (see this SO post):
rating_vs_creationdate = data.copy()
rating_vs_creationdate = rating_vs_creationdate.set_index('created_at')
rating_vs_creationdate = rating_vs_creationdate['rating']
plt.plot(rating_vs_creationdate)
Maybe you will run into problems with the date format. The link provided by MrFuppes in the comments gives an easy date formatting example.
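Putting the two pieces together (datetimes as index, plus an explicit date format), a minimal sketch could look like the following. The three sample rows are invented stand-ins for the question's data, and the '%Y-%m-%d' format is just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's `data`
data = pd.DataFrame({
    "created_at": pd.to_datetime(["2019-01-05", "2019-02-10", "2019-03-15"]),
    "rating": [3.5, 4.0, 2.5],
})

# Temporary Series with the datetimes as index, as in the answer above
rating_vs_creationdate = data.set_index("created_at")["rating"]

fig, ax = plt.subplots()
# Pass index and values explicitly so it is clear what ends up on each axis
ax.plot(rating_vs_creationdate.index, rating_vs_creationdate.values, "b.", alpha=0.5)

# Explicit tick format for the dates, then rotate labels to avoid overlap
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
fig.savefig("rating_vs_creationdate.png")
```

For interactive use, drop the matplotlib.use("Agg") line and call plt.show() instead of saving the figure.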
I have a DataFrame with two columns of time information. The first is the epoch time in seconds, and the second is the corresponding formatted str time like "2015-06-01T09:00:00+08:00" where "+08:00" denotes the timezone.
I'm aware that time formats are in a horrible mess in Python, and that matplotlib.pyplot seems to only recognise the datetime format. I tried several ways to convert the str time to datetime but none of them would work. When I use pd.to_datetime it will convert to datetime64, and when using pd.Timestamp it converts to Timestamp, and even when I tried using combinations of these two functions, the output would always be either datetime64 or Timestamp but NEVER for once datetime. I also tried the method suggested in this answer. Didn't work. It's kind of driving me up the wall now.
Could anybody kindly figure out a quick way for this? Thanks!
I post a minimal example below:
import matplotlib.pyplot as plt
import time
import pandas as pd
df = pd.DataFrame([[1433120400, "2015-06-01T09:00:00+08:00"]], columns=["epoch", "strtime"])
# didn't work
df["usable_time"] = pd.to_datetime(df["strtime"])
# didn't work either
df["usable_time"] = pd.to_datetime(df["strtime"].apply(lambda s: pd.Timestamp(s)))
# produced a strange type called "struct_time". Don't think it'd be compatible with pyplot
df["usable_time"] = df["epoch"].apply(lambda x: time.localtime(x))
# attempted to plot with pyplot
df["usable_time"] = pd.to_datetime(df["strtime"])
plt.plot(x=df["usable_time"], y=[0.123])
plt.show()
UPDATE (per comments)
It seems like the confusion here is stemming from the fact that the call to plt.plot() takes positional x/y arguments instead of keyword arguments. In other words, the appropriate signature is:
plt.plot(x, y)
Or, alternately:
plt.plot('x_label', 'y_label', data=obj)
But not:
plt.plot(x=x, y=y)
There's a separate discussion of why this quirk of Pyplot exists here, also see ImportanceOfBeingErnest's comments below.
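As a small runnable illustration of the two accepted call forms, here is a sketch using the same column names as the question. The two dates are sample values, not the asker's real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "usable_time": pd.to_datetime(["2015-06-01", "2015-07-01"]),
    "epoch": [1433120400, 1433130400],
})

plt.figure()
# Form 1: x and y passed positionally
lines_a = plt.plot(df["usable_time"], df["epoch"])

# Form 2: column names passed positionally, looked up in `data`
lines_b = plt.plot("usable_time", "epoch", data=df)

print(len(lines_a), len(lines_b))
```

Both calls return a list of Line2D objects, which is a quick way to confirm that something was actually drawn.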
Original
This isn't really an answer, more of a demonstration that Pyplot doesn't have an issue with Pandas datetime data. I've added an extra row to df to make the plot clearer:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[1433120400, "2015-06-01T09:00:00+08:00"],
[1433130400, "2015-07-01T09:00:00+08:00"]],
columns=["epoch", "strtime"])
df["usable_time"] = pd.to_datetime(df["strtime"])
df.dtypes
epoch int64
strtime object
usable_time datetime64[ns]
dtype: object
plt.plot(df.usable_time, df.epoch)
pd.__version__ # '0.23.3'
matplotlib.__version__ # '2.2.2'
You can use to_pydatetime (from the dt accessor or Timestamp) to get back native datetime objects if you really want to, e.g.:
pd.to_datetime(df["strtime"]).dt.to_pydatetime()
This will return an array of native datetime objects:
array([datetime.datetime(2015, 6, 1, 1, 0)], dtype=object)
However, pyplot seems to be able to work with pandas datetime series.
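If native objects are still wanted, an element-wise variant is sketched below. It is an alternative I am suggesting, not the exact call shown above: it calls Timestamp.to_pydatetime() on each element, and it passes utc=True so the "+08:00" offset is normalized to UTC explicitly rather than depending on the pandas version.

```python
import datetime
import pandas as pd

strtimes = pd.Series(
    ["2015-06-01T09:00:00+08:00", "2015-07-01T09:00:00+08:00"]
)

# utc=True converts the "+08:00" offset to UTC, giving one tz-aware dtype
parsed = pd.to_datetime(strtimes, utc=True)

# Iterating the Series yields pandas.Timestamp objects; to_pydatetime()
# turns each one into a plain datetime.datetime
native = [ts.to_pydatetime() for ts in parsed]

print(type(native[0]), native[0])
```

The resulting list holds tz-aware datetime.datetime objects in UTC, so 09:00 at +08:00 appears as 01:00.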