Python: My x-axis labels do not show, only the column header

The x-axis should be a series of dates; instead, I'm getting just the my_date header. How do I return the dates? By the way, I am visualizing the number of daily registrations for a project.
viz = registrations.groupby('my_date').count()[['event_type']]
viz.plot()

It's difficult to tell without seeing an example of your data frame, but is your 'my_date' column a datetime object? You can check that with registrations.info().
You should see something like my_date 4 non-null datetime64[ns]
If it is not a datetime64, you need to convert using:
registrations['my_date'] = pd.to_datetime(registrations['my_date'])
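To illustrate the check-and-convert step above, here is a minimal sketch. The real registrations frame is not shown in the question, so the sample rows below are assumptions made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real `registrations` frame is not shown in the question.
registrations = pd.DataFrame({
    "my_date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-04"],
    "event_type": ["signup", "signup", "signup", "signup"],
})

# The dates start out as plain strings, so convert them to datetime64[ns].
registrations["my_date"] = pd.to_datetime(registrations["my_date"])

# Count registrations per day; the grouped result is indexed by the dates.
viz = registrations.groupby("my_date")["event_type"].count()

# True once the group keys are real datetimes rather than strings.
is_datetime_index = isinstance(viz.index, pd.DatetimeIndex)

# viz.plot() would then put the dates on the x-axis (requires matplotlib).
```

If is_datetime_index is False on your own data, the column probably still holds strings and needs the pd.to_datetime conversion first.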

Related

Plotting with datetime64[ns] objects in Seaborn

I have a large (> 1 mil rows) dataset that has datetime timestamps inside of it. I want to look at trends that may occur throughout the day. So to start if I do: print(df['timestamp']) it will show my data as:
0 2014-01-01 13:11:50.3
1 2011-02-13 04:12:45.0
Name: timestamp, Length: 1000000, dtype: datetime64[ns]
However, I do not want the date there, as I only want to plot trends throughout the day, without caring what day it is. So I do this line of code:
df['timestamp'] = pd.Series([val.time() for val in df['timestamp']]), this gives me the desired only-timestamp data, but returns the dtype as 'object', which I cannot plot. For example when I try using Seaborn: sns.lineplot(df['timestamp'], df['Task_Length']), I get the error "TypeError: Invalid object type at position 0".
BUT, if I just do the same exact sns.lineplot(df['timestamp'], df['Task_Length']), without the intermediary step of cutting off date, leaving it as datetime64[ns] object as opposed to the generic 'object' datatype; it plots fine. However, this results in a plot spanning multiple years, whereas I only want to see time-of-day trends.
For clarity, this is a pandas dataframe where each row has a task that occurs, which generically I could call one column being 'TaskName', and each is associated with a 'timestamp' as previously explained, and I want to use any sort of Seaborn plotting to analyze daily trends such as different task types happening at different times of the day, not caring about days of the year. Thanks for any help.
Edit: another thing that I tried. Using the original datetime64[ns] object that does plot, I tried sns.lineplot(df['timestamp'].dt.time, df['Task_Length']), which gave the same error as when I add the line of code to cut off the date. I can't figure out why Seaborn doesn't like just the time component.
This works for me.
The difference is that the "timestamp" column is converted from datetime to a time-of-day string (HH:MM:SS) instead of a datetime.time object.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame([['2014-01-01 13:11:50.3',10],['2011-02-13 04:12:45.0',15]], columns=['timestamp','Task_Length'])
df['timestamp'] = pd.to_datetime(df['timestamp']).dt.strftime('%H:%M:%S')
# pass x and y as keywords; recent seaborn releases no longer accept them positionally
sns.lineplot(x=df['timestamp'], y=df['Task_Length'])
plt.show()
Refer to this question for further details:
Plot datetime.time in seaborn
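As an alternative to the string conversion above (a different technique, not the one used in the answer), the time of day can be expressed as a plain number of hours since midnight. This is a sketch under the assumption that a numeric axis is acceptable; the column name hour_of_day is made up for illustration. A numeric column keeps the points in chronological order and avoids treating every distinct time as its own category, which may matter with a million rows.

```python
import pandas as pd

# Same two sample rows as in the answer above.
df = pd.DataFrame(
    [["2014-01-01 13:11:50.3", 10], ["2011-02-13 04:12:45.0", 15]],
    columns=["timestamp", "Task_Length"],
)

ts = pd.to_datetime(df["timestamp"])

# Hours since midnight as a float, ignoring the calendar date
# (hypothetical column name, sub-second precision is dropped).
df["hour_of_day"] = ts.dt.hour + ts.dt.minute / 60 + ts.dt.second / 3600

# Sort so that a line plot runs from early to late in the day.
df = df.sort_values("hour_of_day")

# Assumed plotting call (requires seaborn):
# sns.lineplot(x="hour_of_day", y="Task_Length", data=df)
```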

pd.to_datetime ok but still not as date in plotly

I am having trouble with my graph using matplotlib and plotly.
I converted my object date column into a datetime64[ns] column, but I still get the same error over and over:
"view limit minimum -36834.32000000001 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime unit"
My code to convert the object date column to datetime: df_result["date_str"] = pd.to_datetime(df_result['date_str'], format='%Y-%m-%d')
Then I reduce my DataFrame to the few columns I need: df_graph = df_result[["date_str","frequentation_reel"]]
When I check, my column 'date_str' has 'dtype: datetime64[ns]', so it should be OK for a seaborn graph.
Here is my code for the plot: ax1 = sns.barplot(x='date_str', y='frequentation_reel', data=df_graph, palette='summer')
Here is a sample: date_str 2017-09-30 frequentation_reel 0.926966
I tried a lot of things from Stack Overflow, but nothing has worked for me so far. I can't figure out what is going wrong. Can anyone help? Thanks a lot.
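One possible direction, hedged because the question does not show the full plotting code: sns.barplot treats the x values as categories placed at integer positions 0, 1, 2, and so on, so a date locator or formatter applied to that axis afterwards can produce this kind of "invalid Matplotlib date value" message. Assuming that is the cause here, a workaround is to turn the dates into plain string labels before plotting. In the sketch below only the first row comes from the question; the other rows and the date_label column name are made up.

```python
import pandas as pd

# Only the first row comes from the question; the other rows are made-up sample values.
df_graph = pd.DataFrame({
    "date_str": pd.to_datetime(
        ["2017-09-30", "2017-10-01", "2017-10-02"], format="%Y-%m-%d"
    ),
    "frequentation_reel": [0.926966, 0.5, 0.75],
})

# Plain string labels, so the categorical bar axis needs no date handling.
df_graph["date_label"] = df_graph["date_str"].dt.strftime("%Y-%m-%d")

# Assumed plotting call (requires seaborn):
# ax1 = sns.barplot(x="date_label", y="frequentation_reel", data=df_graph, palette="summer")
```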

how to attribute repeated annual datetime values to a series of numbers in a dataframe

I have a data frame consisting of hourly wind speed measurements for the year 2012 for different locations as seen below:
I was able to change the index to datetime format for just one location using the code:
dfn=df1_s.reset_index(drop=True)
import datetime
dfn['datetime']=pd.to_datetime(dfn.index,
origin=pd.Timestamp('2012-01-01 00:00:00'), unit= 'H')
When using the same code over the entire dataframe, I obtain the error: cannot convert input with unit 'h'. This is probably because there is far too much data for the number of years that can be represented in hours; I am not sure. Nevertheless, it works when I use minutes, i.e. unit='m'.
What I want to do is set the datetime so that it repeats after every 8784 hours, i.e. each location gets the same sequence of datetimes within the same dataframe, as seen in the image below (expected results produced in Excel).
When trying the following, all I obtained was a column of NaNs:
import pdb, random
dates = pd.date_range('2012-01-01', '2013-01-01', freq='H')
data = [int(1000*random.random()) for i in range(len(dates))]
dfn['cum_data'] = pd.Series(data, index=dates)
Can you please direct me on how to go about this?
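The NaNs in the attempt above most likely come from index alignment: pd.Series(data, index=dates) is indexed by dates, while dfn has an integer index after reset_index, so no labels match when the column is assigned. One way to get the repeating datetimes, sketched here under the assumption that the locations are stacked one after another in blocks of exactly 8784 hourly rows, is to take the row position modulo 8784. The wind_speed column and the two-location frame are made up for illustration.

```python
import numpy as np
import pandas as pd

HOURS_IN_2012 = 8784  # leap year: 366 days * 24 hours

# Hypothetical stand-in for the stacked hourly measurements of two locations.
dfn = pd.DataFrame({"wind_speed": np.zeros(2 * HOURS_IN_2012)})

# Hour offset within the year; restarts at 0 for every block of 8784 rows.
hour_of_year = np.arange(len(dfn)) % HOURS_IN_2012

# Add the offsets to the start of 2012; values are assigned by position,
# so there is no index alignment and no NaN.
dfn["datetime"] = pd.Timestamp("2012-01-01") + pd.to_timedelta(hour_of_year, unit="h")
```

Because the offsets never exceed 8783 hours, this also stays well inside the range that pandas timestamps can represent, however many locations are stacked.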

Trouble plotting datetime pandas series with matplotlib

I'm trying to plot "created_at" versus "rating".
"Created_at" is datetime format
Why is the x-axis being shown as float values instead of dates like I am intending with the line
plt.plot(data['created_at'].values, data['rating'],"b.", alpha =0.5)
shown in the bottom picture
The easiest way to achieve this is to define a temporary dataframe with the datetimes as index (see this SO post):
rating_vs_creationdate = data.copy()
rating_vs_creationdate = rating_vs_creationdate.set_index('created_at')
rating_vs_creationdate = rating_vs_creationdate['rating']
plt.plot(rating_vs_creationdate)
Maybe you will run into problems with the date format. The link provided by MrFuppes in the comments gives an easy date formatting example.

how to use xlim in pyplot to plot timestamp

I am trying to plot timestamp by using pyplot. I want to limit timestamp on x axis in a required range from the timestamp data column in df. I am not getting how to go about it.
I tried using xlim with pandas timestamp option to set limits but it didn't work. Datatype of timestamp in the dataframe is datetime64.
plt.xlim((pd.Timestamp(x[0:1]['Time']), pd.Timestamp(x[35:36]['Time'])))
Obtained output is:
TypeError: Cannot convert input [0 2017-01-01 06:00:00
Name: Time, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
Expected result:
On the x-axis I want ticks ranging from one particular timestamp to another, both taken from the same df.
matplotlib can handle datetime64 objects, so you don't need to convert them to Timestamp. You can use:
plt.xlim(x['Time'][0],x['Time'][35])
For readability, you can also just write e.g. [35] instead of [35:36], because you only need one object and not a range.
If you would like to know why the conversion did not work, you can show us the values of x['Time'][0] and x['Time'][35].
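A likely explanation for the failed conversion, sketched with made-up data (the real x frame is not shown, so the 40 hourly rows are an assumption): a slice such as x[0:1]['Time'] returns a one-element Series, not a single value, and pd.Timestamp does not accept a Series. Selecting by position with .iloc returns a scalar that is already a Timestamp.

```python
import pandas as pd

# Hypothetical frame: 40 hourly timestamps starting at the value shown in the error.
x = pd.DataFrame({
    "Time": pd.date_range("2017-01-01 06:00:00", periods=40, freq=pd.Timedelta(hours=1))
})

# Slicing keeps the container: this is a Series of length 1.
one_row_slice = x[0:1]["Time"]

# Positional scalar access: these are single pd.Timestamp values.
start = x["Time"].iloc[0]
end = x["Time"].iloc[35]

# The limits could then be passed straight to matplotlib:
# plt.xlim(start, end)
```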
