I have a pandas DataFrame to which I applied pandas.to_datetime. Now I want to extract the hours/minutes/seconds from each timestamp. I used df.index.day to get the days, and now I want to know if there are different hours in my index.
For example, if I have two dates d1 = 2020-01-01 00:00:00 and d2 = 2020-01-02 00:00:00, I can't just apply a smoothing operator by hour, because that makes no sense.
So what I want to know is: how do I know if a day has different hours/minutes or seconds?
Thank you in advance
I think you should use df[col].dt provided by pandas, where col is the name of your datetime column.
You can extract day, week, hour, minute, second by using it.
Please see this.
dir(df[col].dt)
Here is an example.
import pandas as pd
df = pd.DataFrame([["2020-01-01 06:31:00"], ["2020-03-12 10:21:09"]], columns=["timestamp"])
print(df)
df['time'] = pd.to_datetime(df["timestamp"])
df['dates'] = df['time'].dt.date
df['hour'] = df['time'].dt.hour
df['minute'] = df['time'].dt.minute
df['second'] = df['time'].dt.second
Now your df should look like this.
             timestamp                time       dates  hour  minute  second
0  2020-01-01 06:31:00 2020-01-01 06:31:00  2020-01-01     6      31       0
1  2020-03-12 10:21:09 2020-03-12 10:21:09  2020-03-12    10      21       9
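To answer the original question directly (whether a given day contains more than one distinct hour, minute or second), you can group by the date part and count unique values. A minimal sketch, assuming a DatetimeIndex like the one described:

import pandas as pd

# hypothetical index mixing midnight-only and intraday timestamps
idx = pd.to_datetime(["2020-01-01 00:00:00", "2020-01-01 06:00:00",
                      "2020-01-02 00:00:00"])
s = idx.to_series()

# True where a calendar day contains more than one distinct hour,
# minute or second
varies = s.groupby(s.dt.date).agg(
    lambda g: g.dt.hour.nunique() > 1
    or g.dt.minute.nunique() > 1
    or g.dt.second.nunique() > 1
)
print(varies)
# 2020-01-01     True
# 2020-01-02    False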
If d1 and d2 are datetime or Timestamp objects, you can get the hour, minute and second using the attributes hour, minute and second:
print(d1.hour,d1.minute,d1.second)
print(d2.hour,d2.minute,d2.second)
Similarly, year, month and day can also be extracted.
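For example (a quick sketch; the second timestamp is altered here just to show nonzero components):

import pandas as pd

d1 = pd.Timestamp("2020-01-01 00:00:00")
d2 = pd.Timestamp("2020-01-02 06:30:15")  # altered to show nonzero fields

print(d1.hour, d1.minute, d1.second)  # 0 0 0
print(d2.hour, d2.minute, d2.second)  # 6 30 15
print(d2.year, d2.month, d2.day)      # 2020 1 2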
Related
I am working on time-series data, where I have two columns, date and quantity. The data is daily. I want to add up all the quantity for each month and collapse it onto a single date.
date is my index column
Example
            quantity
date
2018-01-03        30
2018-01-05        45
2018-01-19        30
2018-02-09        10
2018-02-19        20
Output:
            quantity
date
2018-01-01       105
2018-02-01        30
Thanks in advance!!
You can downsample to combine the data for each month and aggregate it by chaining the sum method. Note that "M" labels each bin with the month-end date; since your desired output is labeled with the month start, use "MS" instead.
df.resample("MS").sum()
Check out the pandas user guide on resampling.
You'll need to make sure your index is in datetime format for this to work. So first do: df.index = pd.to_datetime(df.index). Hat tip to sammywemmy for the same advice in the comments.
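A self-contained sketch with the sample data from the question, using "MS" so the labels match your desired month-start output:

import pandas as pd

df = pd.DataFrame(
    {"quantity": [30, 45, 30, 10, 20]},
    index=pd.to_datetime(["2018-01-03", "2018-01-05", "2018-01-19",
                          "2018-02-09", "2018-02-19"]),
)
df.index.name = "date"

print(df.resample("MS").sum())
#             quantity
# date
# 2018-01-01       105
# 2018-02-01        30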
You can also use groupby to get the results.
df.index = pd.to_datetime(df.index)
df.groupby(df.index.strftime('%Y-%m-01')).sum()
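The strftime approach labels the groups with strings. A variant that keeps a real DatetimeIndex is pd.Grouper (a sketch, reusing the df built above):

# group the DatetimeIndex into month-start bins and sum
monthly = df.groupby(pd.Grouper(freq="MS")).sum()
print(monthly)  # same month-start totals as the resample version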
I am reading some data from a CSV file where two of the columns are in hh:mm format. Here is an example:
Start End
11:15 15:00
22:30 2:00
In the above example, the End in the 2nd row happens on the next day. I am trying to get the time difference between these two columns in the most efficient way, as the dataset is huge. Is there any good pythonic way of doing this? Also, since there is no date, and some Ends happen on the next day, I get wrong results when I calculate the diff.
>>> import pandas as pd
>>> df = pd.read_csv(file_path)
>>> pd.to_datetime(df['End'])-pd.to_datetime(df['Start'])
0    0 days 03:45:00
1   -1 days +03:30:00
You can use the technique (a + x) % x with a timedelta x of 24 hours (or 1 day, same thing):
the + timedelta(hours=24) makes all values positive
the % timedelta(hours=24) wraps the ones that ended up at or above 24h back down by 24h
from datetime import timedelta

df['duration'] = (pd.to_datetime(df['End']) - pd.to_datetime(df['Start'])
                  + timedelta(hours=24)) % timedelta(hours=24)
Gives
Start End duration
0 11:15 15:00 0 days 03:45:00
1 22:30 2:00 0 days 03:30:00
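Put together as a runnable sketch (the small frame below stands in for pd.read_csv(file_path)):

import pandas as pd
from datetime import timedelta

df = pd.DataFrame({"Start": ["11:15", "22:30"], "End": ["15:00", "2:00"]})

day = timedelta(hours=24)
df["duration"] = (pd.to_datetime(df["End"])
                  - pd.to_datetime(df["Start"]) + day) % day
print(df)
#    Start   End        duration
# 0  11:15 15:00 0 days 03:45:00
# 1  22:30  2:00 0 days 03:30:00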
I have two time series, df1
day cnt
2020-03-01 135006282
2020-03-02 145184482
2020-03-03 146361872
2020-03-04 147702306
2020-03-05 148242336
and df2:
day cnt
2017-03-01 149104078
2017-03-02 149781629
2017-03-03 151963252
2017-03-04 147384922
2017-03-05 143466746
The problem is that the sensors I'm measuring are sensitive to the day of the week, so on Sundays, for instance, they produce a lower cnt. Now I need to compare the time series over 2 different years, 2017 and 2020, but to do that I have to align the months (March, in this case) on the matching day of the week, and plot them accordingly. How do I "shift" the data to make the series comparable?
The ISO calendar represents a date as a tuple (year, weeknumber, weekday). In pandas these are the dt members year, weekofyear and weekday. So, assuming that the day column actually contains Timestamps (convert it first with to_datetime if it does not), you could do:
df1['Y'] = df1.day.dt.year
df1['W'] = df1.day.dt.weekofyear
df1['D'] = df1.day.dt.weekday
Then you could align the dataframes on the W and D columns
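Note that dt.weekofyear is deprecated in recent pandas releases; dt.isocalendar() returns the same (year, week, day) triple. A sketch of the alignment with the five sample rows (with so few days only one pair matches, 2020-03-01 and 2017-03-05, both Sundays of ISO week 9; over full months every day pairs up):

import pandas as pd

df1 = pd.DataFrame({"day": pd.date_range("2020-03-01", periods=5),
                    "cnt": [135006282, 145184482, 146361872,
                            147702306, 148242336]})
df2 = pd.DataFrame({"day": pd.date_range("2017-03-01", periods=5),
                    "cnt": [149104078, 149781629, 151963252,
                            147384922, 143466746]})

for df in (df1, df2):
    iso = df["day"].dt.isocalendar()  # columns: year, week, day
    df["W"] = iso["week"]
    df["D"] = iso["day"]

# inner merge keeps only the (week, weekday) pairs present in both years
aligned = df1.merge(df2, on=["W", "D"], suffixes=("_2020", "_2017"))
print(aligned[["day_2020", "cnt_2020", "day_2017", "cnt_2017"]])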
March 2017 started on a Wednesday, while March 2020 started on a Sunday. So: delete the last 3 days of March 2017, and delete the first Sunday, Monday and Tuesday from 2020. This way you have comparable days.
df1['cnt2020'] = df1['cnt']
df2['cnt2017'] = df2['cnt']
df1 = df1.iloc[3:, 2].reset_index(drop=True)
df2 = df2.iloc[:-3, 2].reset_index(drop=True)
Since you don't want to plot the date, but want the months to align, make a new dataframe with both columns and a shared index. This way you will have 3 columns: index (0-27), 2017 and 2020. The index represents each day's position within the aligned month.
new_df = pd.concat([df1,df2], axis=1)
If you also want to plot the days of the week on the x axis, get the day of the week from each date (as above) and change the x tick labels accordingly.
Sorry for the written step-by-step; if it all sounds confusing, I can type the whole code later for you.
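In the meantime, here is a rough version of it (dummy cnt values standing in for the real data):

import pandas as pd

df1 = pd.DataFrame({"day": pd.date_range("2020-03-01", "2020-03-31"),
                    "cnt": range(31)})  # dummy 2020 counts
df2 = pd.DataFrame({"day": pd.date_range("2017-03-01", "2017-03-31"),
                    "cnt": range(31)})  # dummy 2017 counts

# drop Sun-Tue from the start of 2020 and 3 days from the end of 2017,
# so both series start on a Wednesday and are 28 days long
s2020 = df1["cnt"].iloc[3:].reset_index(drop=True).rename("cnt2020")
s2017 = df2["cnt"].iloc[:-3].reset_index(drop=True).rename("cnt2017")

new_df = pd.concat([s2020, s2017], axis=1)  # index runs 0-27
print(new_df.head())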
I have a pandas dataframe with timestamps shown below:
6/30/2019 3:45:00 PM
I would like to round the date based on time. Anything before 6AM will be counted as the day before.
6/30/2019 5:45:00 AM -> 6/29/2019
6/30/2019 6:30:00 AM -> 6/30/2019
What I have considered doing is splitting date and time into 2 different columns, then using an if statement to shift the date (if time >= 06:00, etc.). I'm just wondering if there is a built-in function in pandas to do this. I've seen posts about rounding up and down based on the closest hour, but never with a specific time threshold (6 AM).
Thank you for the help!
There could be a better way to do this, but this is one way of doing it.
import pandas as pd
def checkDates(d):
    if d.time().hour < 6:
        return d - pd.Timedelta(days=1)
    else:
        return d
ls = ["12/31/2019 3:45:00 AM", "6/30/2019 9:45:00 PM", "6/30/2019 10:45:00 PM", "1/1/2019 4:45:00 AM"]
df = pd.DataFrame(ls, columns=["dates"])
df["dates"] = df["dates"].apply(lambda d: checkDates(pd.to_datetime(d)))
print (df)
dates
0 2019-12-30 03:45:00
1 2019-06-30 21:45:00
2 2019-06-30 22:45:00
3 2018-12-31 04:45:00
Also note I am not adjusting the time component when giving back the result. If you just want the date at the end of it, you can get that out of the datetime object like this:
print(pd.to_datetime("12/31/2019 3:45:00 AM").date())  # 2019-12-31
If you understand Python well, and don't want anyone else (in the future) to understand what you are doing, the one-liner for the above is:
df["dates"] = df["dates"].apply(lambda d: pd.to_datetime(d) - pd.Timedelta(days=1) if pd.to_datetime(d).time().hour < 6 else pd.to_datetime(d))
I have a table where it has a column 'Date', 'Time', 'Costs'.
I want to select the rows where the time is greater than 12:00:00, then add 1 day to the 'Date' column of those rows.
How should I go about in doing it?
So far I have:
df[df['Time']>'12:00:00']['Date'] = df[df['Time']>'12:00:00']['Date'].astype('datetime64[ns]') + timedelta(days=1)
I am a beginner learning to code, and any suggestions would be really helpful! Thanks.
Use to_datetime first on column Date if the values are not datetimes yet. Then convert column Time to string (in case the values are python time objects), convert to datetimes, get the hours with Series.dt.hour, compare against 12, and add 1 day where the condition holds:
df = pd.DataFrame({'Date':['2015-01-02','2016-05-08'],
'Time':['10:00:00','15:00:00']})
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-08 15:00:00
df['Date'] = pd.to_datetime(df['Date'])
mask = pd.to_datetime(df['Time'].astype(str)).dt.hour > 12
df.loc[mask, 'Date'] += pd.Timedelta(days=1)
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-09 15:00:00
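An alternative way to build the mask, a sketch reusing the df above and assuming Time holds 'HH:MM:SS' strings: to_timedelta parses them directly, and the comparison then honors minutes and seconds too (so 12:30:00 counts as greater than 12:00:00, matching the stated condition):

# parse 'HH:MM:SS' strings as timedeltas and compare to a 12-hour threshold
mask = pd.to_timedelta(df['Time']) > pd.Timedelta(hours=12)
df.loc[mask, 'Date'] += pd.Timedelta(days=1)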