I'm trying to plot "created_at" versus "rating".
"Created_at" is datetime format
Why is the x-axis shown as float values (see the bottom picture) instead of dates, as I intended with this line?
plt.plot(data['created_at'].values, data['rating'],"b.", alpha =0.5)
The easiest way to achieve this is to define a temporary dataframe with the datetimes as index (see this SO post):
rating_vs_creationdate = data.copy()
rating_vs_creationdate = rating_vs_creationdate.set_index('created_at')
rating_vs_creationdate = rating_vs_creationdate['rating']
plt.plot(rating_vs_creationdate)
Maybe you will run into problems with the date format. The link provided by MrFuppes in the comments gives an easy date formatting example.
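As a rough sketch of the approach above: the small `data` frame below is invented for illustration, and the `%Y-%m-%d` tick format is only one possible choice. It sets the datetime column as the index, plots the resulting Series, and applies an explicit date format to the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's `data` frame
data = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2020-01-05", "2020-02-10", "2020-03-15", "2020-04-20"]
    ),
    "rating": [3.5, 4.0, 2.5, 5.0],
})

# Use the datetimes as index and keep only the rating column
rating_by_date = data.set_index("created_at")["rating"]

fig, ax = plt.subplots()
ax.plot(rating_by_date.index, rating_by_date.values, "b.", alpha=0.5)

# Optional: control how the dates are rendered on the x-axis
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
```

With an interactive backend, `plt.show()` would display the figure.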
I have a large (> 1 mil rows) dataset that has datetime timestamps inside of it. I want to look at trends that may occur throughout the day. So to start if I do: print(df['timestamp']) it will show my data as:
0 2014-01-01 13:11:50.3
1 2011-02-13 04:12:45.0
Name: timestamp, Length: 1000000, dtype: datetime64[ns]
However, I do not want the date there, as I only want to plot trends throughout the day, without caring what day it is. So I do this line of code:
df['timestamp'] = pd.Series([val.time() for val in df['timestamp']])
This gives me the desired time-only data, but the dtype becomes 'object', which I cannot plot. For example, when I try using Seaborn with sns.lineplot(df['timestamp'], df['Task_Length']), I get the error "TypeError: Invalid object type at position 0".
But if I run the exact same sns.lineplot(df['timestamp'], df['Task_Length']) without the intermediate step of cutting off the date, leaving the column as datetime64[ns] rather than the generic 'object' dtype, it plots fine. However, this results in a plot spanning multiple years, whereas I only want to see time-of-day trends.
For clarity, this is a pandas dataframe where each row has a task that occurs, which generically I could call one column being 'TaskName', and each is associated with a 'timestamp' as previously explained, and I want to use any sort of Seaborn plotting to analyze daily trends such as different task types happening at different times of the day, not caring about days of the year. Thanks for any help.
Edit: another thing that I tried. Using the original datetime64[ns] column that does plot, I tried sns.lineplot(df['timestamp'].dt.time, df['Task_Length']), which gave the same error as when I add the line of code to cut off the date. I can't figure out why Seaborn doesn't like just the time component.
This works for me.
The difference is that the "timestamp" column is converted from datetime to a time-of-day string with strftime.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame([['2014-01-01 13:11:50.3',10],['2011-02-13 04:12:45.0',15]], columns=['timestamp','Task_Length'])
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%H:%M:%S')
# Recent seaborn versions require x and y as keyword arguments
sns.lineplot(x='timestamp', y='Task_Length', data=df)
plt.show()
Refer to this question for further details:
Plot datetime.time in seaborn
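An alternative that is not part of the answer above, offered only as a sketch: instead of formatting the time as a string, the time of day can be expressed as a plain number of hours, which any plotting library can treat as a continuous axis. The column name `hour_of_day` is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2014-01-01 13:11:50.3", "2011-02-13 04:12:45.0"]
    ),
    "Task_Length": [10, 15],
})

ts = df["timestamp"]
# Hours since midnight as a float; the calendar date is ignored entirely
df["hour_of_day"] = ts.dt.hour + ts.dt.minute / 60 + ts.dt.second / 3600

# Sort so that a line plot runs from morning to evening
df = df.sort_values("hour_of_day")
```

The numeric column could then be passed to Seaborn, for example as `sns.lineplot(x="hour_of_day", y="Task_Length", data=df)`.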
I'm having trouble with my graph using matplotlib and plotly.
I transformed my object date column into a datetime64[ns] column, but I still get the same error over and over:
"view limit minimum -36834.32000000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime unit"
My code to convert the object date to datetime: df_result["date_str"] = pd.to_datetime(df_result['date_str'], format = '%Y-%m-%d')
Then I reduce my DataFrame to the few columns I need: df_graph = df_result[["date_str","frequentation_reel"]]
When I check, my column 'date_str' has 'dtype: datetime64[ns]', so it should be OK for a seaborn graph.
Here is my code for the plot: ax1 = sns.barplot(x='date_str', y='frequentation_reel', data = df_graph, palette='summer')
Here is a sample: date_str 2017-09-30 frequentation_reel 0.926966
I tried a lot of things from Stack Overflow, but nothing has worked for me so far. I can't figure out what is going wrong. Can anyone help? Thanks a lot.
My x-axis values are in epoch time with milliseconds. When I try to set the x-axis tick format to something more readable, such as:
fig.update_layout(
xaxis = {'type': 'date','tickformat': '%H:%M:%S'},
)
The dates don't come through as expected: the year, day, and hour are all incorrect. I cannot find the proper way to do this with Plotly in Python. If someone could provide an example, that would be great.
Haven't had much luck with millisecond timestamps either. One solution could be to pre-format before going to Plotly. This slows down performance, but there are certain formats Plotly recognizes.
I'm not sure whether you are using pandas for your inputs, but if so, you can build the format like so:
df2[xaxis_name] = pd.to_datetime(df[xaxis_name], unit='ms').dt.date
https://plotly.com/python/time-series/
Then you can further refine it with:
https://plotly.com/python/reference/#layout-xaxis-tickformat
Such as:
fig.update_xaxes(
tickformat = "%b-%Y"
)
Worked for me.
Plotly auto-sets the axis type to a date format when the corresponding data are either ISO-formatted date strings or if they're a date pandas column or datetime NumPy array.
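A small sketch of the conversion step, with invented sample values and column names (`epoch_ms`, `when`). Note that `.dt.date` keeps only the calendar date, so a tick format such as `%H:%M:%S` needs the full datetime column instead; this example keeps the time component.

```python
import pandas as pd

# Hypothetical data: epoch timestamps in milliseconds
df = pd.DataFrame({
    "epoch_ms": [1600000000000, 1600000001500],
    "value": [1.0, 2.0],
})

# unit="ms" tells pandas the integers are milliseconds since 1970-01-01 UTC
df["when"] = pd.to_datetime(df["epoch_ms"], unit="ms")

# The time of day is preserved and can be formatted later
time_labels = df["when"].dt.strftime("%H:%M:%S")
```

A datetime column like `when` is the kind of input that the quoted Plotly documentation says is detected as a date axis.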
So, I made a DataFrame that looks like this:
The DF is chronologically ordered by the DateTime objects. These DateTimes are generated by transforming the column "attributes.timestamp", which contains timestamps as strings:
df["DateTime"] = df["attributes.timestamp"].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ'))
The corresponding y-value is a counter that counts objects within the corresponding minute.
When I try to plot this DF in matplotlib, it actually works. It takes the datetime-objects as x-values and plots the counts for that minute as follows:
Now of course it looks dumb to have the full DateTime object shown on the x-axis. It shows month, day and hour in this order (in the example it's the 2nd of March from 2 pm to 8 pm). I want it to show JUST the hours (or at least just the time, not the entire date that comes with it). So I tried to add a new column (called "Time") to the DF. That column would extract the time from the DateTime column using the following code:
df["Time"] = df["DateTime"].time()
However, that doesn't work, for it gives me the attribute error "'Series' object has no attribute 'time'". Instead I tried something else. I just repeated the whole code I used earlier when I generated the DateTime objects and added ".time()" to it.
df["Time"] = df["attributes.timestamp"].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ').time())
I have no idea why, but now it works just fine. I was able to add the time from my DateTime object:
My next idea would be to use the "Time" column on my x-axis instead of the whole datetime for plotting. y-values from the counter stay the same. But for some reason that doesn't work. When I try to plot it like that, it gives me the following error: TypeError: float() argument must be a string or a number, not 'datetime.time'
Strangely enough, that was no problem when plotting with the whole DateTime object. I don't know why the extracted time would be a problem, since it is a chronologically ordered value as well.
My question is: Why the heck does my approach not work? And: Is there any way to go around this?
Matplotlib supports plotting pandas DatetimeIndex, as well as numpy datetime64 objects, but not sequences of datetime.time. In addition, df["Time"] = df["DateTime"].time() does not work because you are applying the .time() method to the Series itself, instead of to the elements of the Series within, which are pandas.Timestamp objects that do have the .time() method defined.
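To illustrate the difference between calling `.time()` on the Series and on its elements, here is a minimal sketch with made-up timestamps. It also shows the `.dt` accessor, which applies the element-wise operation for you.

```python
from datetime import time

import pandas as pd

s = pd.Series(pd.to_datetime(["2021-03-02 14:00:00", "2021-03-02 14:01:00"]))

# The Series itself has no .time() method, hence the AttributeError
series_has_time = hasattr(s, "time")

# Each element is a pandas.Timestamp, which does have .time()
first_time = s.iloc[0].time()

# The .dt accessor extracts the time from every element at once
times = s.dt.time
```

The resulting `times` Series holds `datetime.time` objects, which is exactly the kind of sequence Matplotlib cannot convert to axis values.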
To answer your main question, you just want the x-axis to not show redundant info, yes? Instead of creating a new column only for time, the proper way to do this is to format the matplotlib x-axis with matplotlib.dates.DateFormatter.
Here's a minimal example:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
# Example DatetimeIndex and data
x = pd.date_range(start='2020-05-10', end='2020-05-11', freq='1h')
y = list(range(len(x)))
fig, ax = plt.subplots()
ax.plot(x, y)
# The following specifies the format for dates (12-hour clock with AM/PM)
date_fmt = mdates.DateFormatter('%I:%M%p')
ax.xaxis.set_major_formatter(date_fmt)
# autofmt_xdate helps with auto-rotating dates so they do not overlap
fig.autofmt_xdate()
plt.show()
As for how to know what string to pass to DateFormatter, refer to https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for strftime formats.
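A quick way to try out a format string before handing it to DateFormatter is to call `strftime` on a single `datetime`. The sample moment below is arbitrary.

```python
from datetime import datetime

moment = datetime(2020, 5, 10, 14, 5)

# 24-hour clock, hours and minutes only
hour_minute_24h = moment.strftime("%H:%M")

# 12-hour clock; an AM/PM marker can be appended with %p (its text is locale-dependent)
hour_minute_12h = moment.strftime("%I:%M")
```

Whatever string gives the desired output here can be passed unchanged to `mdates.DateFormatter`.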
Matplotlib has a page dedicated to fixing common date annoyances that you might find useful: https://matplotlib.org/3.1.1/gallery/recipes/common_date_problems.html
I am trying to plot timestamps using pyplot. I want to limit the x-axis to a required range taken from the timestamp column in my df, but I can't figure out how to go about it.
I tried using xlim with pandas timestamp option to set limits but it didn't work. Datatype of timestamp in the dataframe is datetime64.
plt.xlim(pd.Timestamp(x[0:1]['Time']), pd.Timestamp(x[35:36]['Time']))
Obtained output is:
TypeError: Cannot convert input [0 2017-01-01 06:00:00
Name: Time, dtype: datetime64[ns]] of type to Timestamp
Expected result:
On the x-axis, I want ticks ranging from one particular timestamp to another, both taken from the same df.
matplotlib can handle datetime64 objects, so you don't need to convert them to Timestamp. You can use:
plt.xlim(x['Time'][0],x['Time'][35])
For readability, you can also just write e.g. [35] instead of [35:36], because you only need one object and not a range.
If you would like to know why the conversion did not work, you can show us the values of x['Time'][0] and x['Time'][35].
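A self-contained sketch of setting the limits from scalar values, using invented hourly data in a frame named `x` to mirror the question. It uses `.iloc` for positional access, which is an assumption on my part: `x['Time'][35]` looks up the index label 35, which only matches the 36th row when the frame has the default integer index.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: 48 hourly timestamps starting at the value from the error message
x = pd.DataFrame({
    "Time": pd.date_range("2017-01-01 06:00:00", periods=48, freq="h"),
})
x["value"] = range(len(x))

# Scalars (single Timestamps), not one-row slices
start = x["Time"].iloc[0]
end = x["Time"].iloc[35]

fig, ax = plt.subplots()
ax.plot(x["Time"], x["value"])
ax.set_xlim(start, end)
```

Passing a one-row slice such as `x[35:36]['Time']` hands over a Series rather than a single value, which is what `pd.Timestamp` rejected in the question.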