I am having trouble with my graph using matplotlib and plotly.
I converted my object date column into a datetime64[ns] column, but I still get the same error over and over:
"view limit minimum -36834.32000000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime unit"
My code to convert the object date column to datetime: df_result["date_str"] = pd.to_datetime(df_result['date_str'], format='%Y-%m-%d')
Then I reduce my DataFrame to the few columns I need: df_graph = df_result[["date_str", "frequentation_reel"]]
When I check, my column 'date_str' has 'dtype: datetime64[ns]', so it should be OK for a seaborn graph.
Here is my code for the plot: ax1 = sns.barplot(x='date_str', y='frequentation_reel', data=df_graph, palette='summer')
Here is a sample: date_str 2017-09-30, frequentation_reel 0.926966
I tried a lot of things from Stack Overflow, but nothing has worked for me so far. I can't figure out what is going wrong. Can anyone help? Thanks a lot.
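One possible cause, and this is an assumption since the full plotting code is not shown, is that seaborn's barplot places bars at categorical positions (0, 1, 2, ...) rather than at date values, so any date locator or formatter applied to that axis afterwards sees numbers that are not valid dates. A minimal sketch of one workaround is to give the bar plot plain string labels instead. The column name date_label and the second sample row are made up for illustration.

```python
import pandas as pd

# Sample rows modelled on the question; the second row is made up for illustration.
df_graph = pd.DataFrame(
    {
        "date_str": pd.to_datetime(["2017-09-30", "2017-10-01"], format="%Y-%m-%d"),
        "frequentation_reel": [0.926966, 0.5],
    }
)

# Hypothetical helper column: plain string labels for the categorical axis.
df_graph["date_label"] = df_graph["date_str"].dt.strftime("%Y-%m-%d")

# With seaborn installed, the bar plot could then use the string labels:
# sns.barplot(x="date_label", y="frequentation_reel", data=df_graph)
```

With string labels there is no need to attach a matplotlib date formatter to the axis at all.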
I have a large (> 1 million rows) dataset that contains datetime timestamps. I want to look at trends that may occur throughout the day. To start, if I do print(df['timestamp']), it shows my data as:
0 2014-01-01 13:11:50.3
1 2011-02-13 04:12:45.0
Name: timestamp, Length: 1000000, dtype: datetime64[ns]
However, I do not want the date there, as I only want to plot trends throughout the day, without caring what day it is. So I run this line of code:
df['timestamp'] = pd.Series([val.time() for val in df['timestamp']]). This gives me the desired time-only data, but the resulting dtype is 'object', which I cannot plot. For example, when I try using Seaborn with sns.lineplot(df['timestamp'], df['Task_Length']), I get the error "TypeError: Invalid object type at position 0".
But if I run the exact same sns.lineplot(df['timestamp'], df['Task_Length']) without the intermediate step of cutting off the date, leaving the column as datetime64[ns] rather than the generic 'object' dtype, it plots fine. However, this results in a plot spanning multiple years, whereas I only want to see time-of-day trends.
For clarity, this is a pandas DataFrame where each row is a task that occurs. One column, which I could generically call 'TaskName', identifies the task, and each task is associated with a 'timestamp' as explained above. I want to use any sort of Seaborn plot to analyze daily trends, such as different task types happening at different times of the day, without caring about the day of the year. Thanks for any help.
Edit: another thing that I tried. Using the original datetime64[ns] column that does plot, I tried sns.lineplot(df['timestamp'].dt.time, df['Task_Length']), which gave the same error as when I add the line of code to cut off the date. I can't figure out why Seaborn doesn't like just the time component.
This works for me.
The difference is that the "timestamp" column is converted from datetime to a time-of-day string.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame([['2014-01-01 13:11:50.3',10],['2011-02-13 04:12:45.0',15]], columns=['timestamp','Task_Length'])
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%H:%M:%S')
sns.lineplot(x=df['timestamp'], y=df['Task_Length'])
plt.show()
Refer to this question for further details:
Plot datetime.time in seaborn
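As an alternative to string labels, here is a minimal sketch that turns the time of day into a plain number (fractional hours since midnight), which keeps the axis numeric and ordered. The column name hour_of_day is an assumption, not something from the question, and fractional seconds are ignored.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2014-01-01 13:11:50.3", "2011-02-13 04:12:45.0"]
        ),
        "Task_Length": [10, 15],
    }
)

ts = df["timestamp"]
# Hypothetical column: hours since midnight as a float (sub-second part dropped).
df["hour_of_day"] = ts.dt.hour + ts.dt.minute / 60 + ts.dt.second / 3600
```

The numeric column could then be plotted with something like sns.lineplot(x="hour_of_day", y="Task_Length", data=df), so rows from different days fall on the same 0 to 24 axis.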
The x-axis should be a series of dates; instead, I'm getting just the my_date header. How do I return the dates? By the way, I am visualizing the number of daily registrations for a project.
viz = registrations.groupby('my_date').count()[['event_type']]
viz.plot()
It's difficult to understand without seeing an example of your data frame, but is your 'my_date' column a datetime object? You can check that with registrations.info().
You should see something like: my_date 4 non-null datetime64[ns]
If it is not datetime64, you need to convert it using:
registrations['my_date'] = pd.to_datetime(registrations['my_date'])
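Putting the conversion and the grouping together, a minimal sketch with made-up registration rows (only the column names come from the question) might look like this:

```python
import pandas as pd

# Made-up registration events; column names follow the question.
registrations = pd.DataFrame(
    {
        "my_date": ["2021-01-01", "2021-01-01", "2021-01-02"],
        "event_type": ["signup", "signup", "signup"],
    }
)

# Convert the string column so the grouped index becomes a DatetimeIndex.
registrations["my_date"] = pd.to_datetime(registrations["my_date"])
viz = registrations.groupby("my_date").count()[["event_type"]]

# viz.plot() would then draw the dates along the x-axis (requires matplotlib).
```

Because the grouped result is indexed by datetime values, the plot has actual dates to place on the x-axis.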
My x-axis values are in epoch time with milliseconds. When I try to set the x-axis ticks to a more readable format such as:
fig.update_layout(
xaxis = {'type': 'date','tickformat': '%H:%M:%S'},
)
The date doesn't come through as expected. The year, day, and hour are all incorrect. I cannot find the proper way to do this with Plotly in Python. If someone could provide an example, that would be great.
I haven't had much luck with millisecond timestamps either. One solution could be to pre-format the values before passing them to Plotly. This slows down performance, but there are certain formats Plotly recognizes.
I'm not assuming you are using pandas for your inputs, but if you are, you can build the format like so:
df2[xaxis_name] = pd.to_datetime(df[xaxis_name], unit='ms').dt.date
https://plotly.com/python/time-series/
Then you can further refine it with:
https://plotly.com/python/reference/#layout-xaxis-tickformat
Such as:
fig.update_xaxes(
tickformat = "%b-%Y"
)
Worked for me.
Plotly auto-sets the axis type to a date format when the corresponding
data are either ISO-formatted date strings or if they're a date pandas
column or datetime NumPy array.
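If the time of day matters, as with a '%H:%M:%S' tick format, one option is to keep the full datetime instead of reducing it to a date. This is a minimal sketch; the column names epoch_ms, value, and time and the sample values are made up for illustration.

```python
import pandas as pd

# Made-up epoch timestamps in milliseconds.
df = pd.DataFrame({"epoch_ms": [1583157600000, 1583157660500], "value": [1, 2]})

# unit="ms" tells pandas the integers are milliseconds since the Unix epoch.
# The result is a timezone-naive datetime64 column expressed in UTC.
df["time"] = pd.to_datetime(df["epoch_ms"], unit="ms")
```

A datetime column like this is one of the inputs for which Plotly sets the date axis type automatically, so a tickformat such as "%H:%M:%S" can then be applied with fig.update_xaxes.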
I'm trying to plot "created_at" versus "rating".
"created_at" is in datetime format.
Why is the x-axis shown as float values instead of dates, as I intended with the following line?
plt.plot(data['created_at'].values, data['rating'],"b.", alpha =0.5)
shown in the bottom picture
The easiest way to achieve this is to define a temporary dataframe with the datetimes as index (see this SO post):
rating_vs_creationdate = data.copy()
rating_vs_creationdate = rating_vs_creationdate.set_index('created_at')
rating_vs_creationdate = rating_vs_creationdate['rating']
plt.plot(rating_vs_creationdate)
Maybe you will run into problems with the date format. The link provided by MrFuppes in the comments gives an easy date formatting example.
So, I made a DataFrame that looks like this:
The DF is chronologically ordered by the DateTime objects. These DateTimes are generated by transforming the column "attributes.timestamp", which contains timestamps as strings:
df["DateTime"] = df["attributes.timestamp"].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ'))
The corresponding y-value is a counter that counts objects within the corresponding minute.
When I try to plot this DF in matplotlib, it actually works. It takes the datetime objects as x-values and plots the counts for each minute as follows:
Now of course it looks dumb to have the full DateTime object shown on the x-axis. It shows month, day and hour in this order (in the example it's the 2nd of March from 2 pm to 8 pm). I want it to show JUST the hours (or at least just the time, not the entire date that comes with it). So I tried to add a new column (called "Time") to the DF. That column would extract the time from the DateTime column using the following code:
df["Time"] = df["DateTime"].time()
However, that doesn't work, for it gives me the attribute error "'Series' object has no attribute 'time'". Instead I tried something else. I just repeated the whole code I used earlier when I generated the DateTime objects and added ".time()" to it.
df["Time"] = df["attributes.timestamp"].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ').time())
I have no idea why, but now it works just fine. I was able to add the time from my DateTime object:
My next idea would be to use the "Time" column on my x-axis instead of the whole datetime for plotting. The y-values from the counter stay the same. But for some reason that doesn't work. When I try to plot it like that, it gives me the following error: TypeError: float() argument must be a string or a number, not 'datetime.time'
Strangely enough, that was no problem when plotting with the whole DateTime object. I don't know why the extracted time would be a problem, since it is a chronologically ordered value as well.
My question is: Why the heck does my approach not work? And: Is there any way to go around this?
Matplotlib supports plotting pandas DatetimeIndex, as well as numpy datetime64 objects, but not sequences of datetime.time. In addition, df["Time"] = df["DateTime"].time() does not work because you are applying the .time() method to the Series itself, instead of to the elements of the Series within, which are pandas.Timestamp objects that do have the .time() method defined.
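To illustrate the element-wise point, here is a minimal sketch using the pandas .dt accessor, which applies datetime properties to each element of the Series. The sample timestamps are made up; the column names follow the question.

```python
import pandas as pd

# Made-up timestamp strings in the format used in the question.
df = pd.DataFrame(
    {"attributes.timestamp": ["2020-03-02T14:00:00Z", "2020-03-02T14:01:00Z"]}
)

# Vectorized parsing; the trailing "Z" is matched as a literal character.
df["DateTime"] = pd.to_datetime(
    df["attributes.timestamp"], format="%Y-%m-%dT%H:%M:%SZ"
)

# .dt.time extracts the time from every element, unlike Series.time(),
# which does not exist. The result is an object column of datetime.time.
df["Time"] = df["DateTime"].dt.time
```

This produces the same "Time" column as the apply-based version, but it is still a column of datetime.time objects, so it does not make the column plottable in matplotlib.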
To answer your main question, you just want the x-axis to not show redundant info, yes? Instead of creating a new column only for time, the proper way to do this is to format the matplotlib x-axis with matplotlib.dates.DateFormatter.
Here's a minimal example:
import matplotlib.pyplot as plt
import pandas as pd
# Example DatetimeIndex and data
x = pd.date_range(start='2020-05-10', end='2020-05-11', freq='1h')
y = list(range(len(x)))
fig, ax = plt.subplots()
plt.plot(x, y)
# The following specifies the format for dates
import matplotlib.dates as mdates
date_fmt = mdates.DateFormatter('%I:%M%p')
ax.xaxis.set_major_formatter(date_fmt)
# autofmt_xdate helps with auto-rotating dates so they do not overlap
fig.autofmt_xdate()
plt.show()
As for how to know what string to pass to DateFormatter, refer to https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for strftime formats.
Matplotlib has a page dedicated to fixing common date annoyances that you might find useful: https://matplotlib.org/3.1.1/gallery/recipes/common_date_problems.html