I have a pandas DataFrame (data) with a column "Duration" that represents a time duration in hours, minutes and seconds, with a format like "1:10:27".
How do I convert the column to a pandas Timedelta?
I tried:
data['Duration'] = pd.to_timedelta(data['Duration'])
But it says:
"ValueError: expected hh:mm:ss format before"
I suspect this happens because the format has only one digit for hours.
The rows show "1:30:27" instead of "01:30:27", or "0:57:23" instead of "00:57:23".
I would appreciate your help!
Using the input you described, I'm guessing it's something like the following, but it's working fine for me. If you have a specific input that behaves differently, feel free to post it.
import pandas as pd
time = "1:30:27"
print(pd.to_timedelta(time))
Output:
0 days 01:30:27
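Applying it to a whole column also works with single-digit hours. Here is a minimal sketch (frame and column names assumed from the question) that uses errors="coerce" so any genuinely malformed row shows up as NaT instead of raising:
import pandas as pd

# Hypothetical data mirroring the question's "Duration" column
data = pd.DataFrame({"Duration": ["1:10:27", "0:57:23", "1:30:27"]})

# Single-digit hours parse fine; errors="coerce" turns malformed strings into NaT
data["Duration"] = pd.to_timedelta(data["Duration"], errors="coerce")
print(data["Duration"])
# 0   0 days 01:10:27
# 1   0 days 00:57:23
# 2   0 days 01:30:27

# Any rows that failed to parse can then be inspected
print(data[data["Duration"].isna()])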
I'm using pandas to analyze some data about the House Price Index of all states from quandl:
HPI_Data = quandl.get("FMAC/HPI_AK")
The data looks something like this:
            HPI Alaska
Date
1975-01-31   35.105461
1975-02-28   35.465209
1975-03-31   35.843110
and so on.
I've got a second dataframe with some special dates in it:
          Date
Name
David  1979-08
Allen  1980-08
Hugo   1989-09
The values for "Date" here are of "string" type and not "date".
I'd like to go 6 months back from each date in the special dataframe and see the values in the HPI dataframe.
I'd like to use .loc, but I have not been able to convert the first dataframe's index from "end of month" to "month", even after resampling to "1D" and then back to "M".
I would appreciate any help, whether it solves the problem a different way or the janky data-deleting way I want :).
Not sure if I understand correctly, so please clarify your question if this is not what you mean.
You can convert a string to a pandas datetime object using pd.to_datetime, and use the format parameter to specify how to parse the string:
import pandas as pd
# Creating a dummy Series
sr = pd.Series(['2012-10-21 09:30', '2019-7-18 12:30', '2008-02-2 10:30',
'2010-4-22 09:25', '2019-11-8 02:22'])
# Convert the underlying data to datetime
sr = pd.to_datetime(sr)
# Subtract 6 months from the datetime series
sr - pd.DateOffset(months=6)
Regarding changing the datetime to just the month, i.e. 2012-10-21 09:30 --> 2012-10, I would do this:
sr.dt.to_period('M')
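Putting those pieces together for the HPI lookup, here is a rough sketch; the frame contents are dummies standing in for the quandl data, so the exact names and dates are assumptions:
import pandas as pd

# Dummy stand-in for the quandl HPI frame (end-of-month index)
HPI_Data = pd.DataFrame(
    {"HPI Alaska": [35.105461, 35.465209, 35.843110]},
    index=pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"]),
)
HPI_Data.index.name = "Date"

# Special dates stored as "YYYY-MM" strings (dummy values inside the sample range)
special = pd.DataFrame({"Date": ["1975-07", "1975-08"]}, index=["David", "Allen"])

# Work in monthly periods so end-of-month vs. start-of-month no longer matters
hpi_by_month = HPI_Data.copy()
hpi_by_month.index = hpi_by_month.index.to_period("M")

# Parse the strings, step back 6 months, and convert to the same monthly periods
lookup = (pd.to_datetime(special["Date"]) - pd.DateOffset(months=6)).dt.to_period("M")

# .loc now lines up on monthly periods
print(hpi_by_month.loc[lookup])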
Some rows in my dataframe have the time as "13:2:7" and I want them to be "13:02:07".
I have tried applying pd.to_datetime to the column, but it doesn't work.
Can someone please suggest a method to format the time in the standard format?
Found the solution.
I solved it with pandas to_timedelta:
pd.to_timedelta("13:2:3"), or pass the column of the dataframe through it.
result:
"13:02:03"
PS: you will get the result as "0 days 13:02:03".
To remove the "0 days" prefix:
df["column_name"].astype(str).map(lambda x: x[7:])
This slices off the initial "0 days " string.
Note: the final result will be a string.
If you want a time object, use pandas pd.to_datetime or strftime from the datetime module.
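A small end-to-end sketch of the steps above (the column name is made up):
import pandas as pd

df = pd.DataFrame({"time_col": ["13:2:7", "1:30:27"]})

# to_timedelta normalizes to hh:mm:ss but prints with a "0 days " prefix
td = pd.to_timedelta(df["time_col"])
print(td.astype(str).str[7:])     # 13:02:07, 01:30:27 (strings)

# If actual datetime.time objects are needed instead of strings
print(pd.to_datetime(df["time_col"], format="%H:%M:%S").dt.time)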
I have tried the following without luck:
df['TimedSpentInZone'] = df.TimedSpentInZone.astype(int)
df['TimedSpentInZone'] = df['TimedSpentInZone'].dt.total_seconds()
df['TimedSpentInZone'] = df['TimedSpentInZone'].dt.hours
(I also divided by 60, etc., to get minutes.) None of the above works.
You can, and should, convert to the Timedelta type:
pd.to_timedelta(df['TimedSpentInZone'])/pd.Timedelta('60s')
It's been a long time, but for future readers, this is what I did:
From the looks of it, the column in the question is already a Timedelta type.
If that's your case, you don't need to convert it.
Just use
df['TimedSpentInZone'].dt.seconds / 60
to get the minutes.
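For what it's worth, a tiny sketch contrasting the two answers (data and dtype are assumed; for durations under a day, .dt.seconds and .dt.total_seconds() agree):
import pandas as pd

# Hypothetical column; convert first if it is still string-typed
df = pd.DataFrame({"TimedSpentInZone": ["0:05:30", "1:15:00"]})
df["TimedSpentInZone"] = pd.to_timedelta(df["TimedSpentInZone"])

# Minutes via total seconds (does not wrap at 24 hours)
print(df["TimedSpentInZone"].dt.total_seconds() / 60)   # 5.5, 75.0

# Minutes via the seconds component (wraps for durations over a day)
print(df["TimedSpentInZone"].dt.seconds / 60)            # same here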
I have a datetime (int64) column in my pandas dataframe.
I'm trying to convert its value of 201903250428 to a datetime value.
The value I have for the datetime (int64) column only goes down to the minute level, in 24-hour format.
I tried various methods like strptime and to_datetime, but no luck.
pd.datetime.strptime('201903250428','%y%m%d%H%M')
I get this error when i use the above code.
ValueError: unconverted data remains: 0428
I want this value to be converted to something like '25-03-2019 04:28:00'.
Lower-case y means two-digit years only, so this is trying to parse "20" as the year, 1 as the month, 9 as the day, and 03:25 as the time, leaving "0428" unconverted.
You need to use %Y which will work fine:
pd.datetime.strptime('201903250428','%Y%m%d%H%M')
http://strftime.org/ is a handy reference for time formatting/parsing parameters.
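As a side note, pd.datetime has since been deprecated; pd.to_datetime with an explicit format should give the same result (the column name below is made up):
import pandas as pd

# Single value
print(pd.to_datetime('201903250428', format='%Y%m%d%H%M'))   # 2019-03-25 04:28:00

# Whole int64 column: cast to string first so the format string applies
# df['dt_col'] = pd.to_datetime(df['dt_col'].astype(str), format='%Y%m%d%H%M')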
This seems like it would be fairly straightforward, but after nearly an entire day I have not found the solution. I've loaded my dataframe with read_csv and easily parsed, combined and indexed a date column and a time column into one column, but now I want to be able to reshape and perform calculations based on hour and minute groupings, similar to what you can do in an Excel pivot.
I know how to resample to hour or minute, but that keeps the date portion associated with each hour/minute, whereas I want to aggregate the dataset ONLY by hour and minute, similar to grouping in Excel pivots and selecting "hour" and "minute" but not selecting anything else.
Any help would be greatly appreciated.
Can't you do, where df is your DataFrame:
times = pd.to_datetime(df.timestamp_col)
df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()
Wes' code didn't work for me. But the DatetimeIndex function (docs) did:
times = pd.DatetimeIndex(data.datetime_col)
grouped = df.groupby([times.hour, times.minute])
The DatetimeIndex object is a representation of times in pandas. The first line creates an array of the datetimes. The second line uses this array to get the hour and minute data for all of the rows, allowing the data to be grouped (docs) by these values.
Came across this when I was searching for this type of groupby. Wes' code above didn't work for me; not sure if it's because of changes in pandas over time.
In pandas 0.16.2, what I did in the end was:
grp = data.groupby(by=[data.datetime_col.map(lambda x : (x.hour, x.minute))])
grp.count()
You'd have (hour, minute) tuples as the grouped index. If you want multi-index:
grp = data.groupby(by=[data.datetime_col.map(lambda x : x.hour),
data.datetime_col.map(lambda x : x.minute)])
I have an alternative to Wes' & Nix's answers above, with just one line of code. Assuming your column is already a datetime column, you don't need to get the hour and minute attributes separately:
df.groupby(df.timestamp_col.dt.time).value_col.sum()
This might be a little late, but I found quite a good solution for anyone that has the same problem.
I have a df like this:
datetime              value
2022-06-28 13:28:08      15
2022-06-28 13:28:09      30
...                     ...
2022-06-28 14:29:11      20
2022-06-28 14:29:12      10
I want to convert those timestamps, which are at one-second intervals, into one-minute intervals, summing the value column in the process.
There is a neat way of doing it:
df['datetime'] = pd.to_datetime(df['datetime']) #if not already as datetime object
grouped = df.groupby(pd.Grouper(key='datetime', axis=0, freq='T')).sum()
print(grouped.head())
Result:
datetime              value
2022-06-28 13:28:00      45
...                     ...
2022-06-28 14:29:00      30
freq='T' stands for minutes. You could also group by hours or days; these frequency strings are called offset aliases.
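For example, the same Grouper pattern with an hourly frequency, on a dummy frame like the one above:
import pandas as pd

df = pd.DataFrame({
    "datetime": ["2022-06-28 13:28:08", "2022-06-28 13:28:09", "2022-06-28 14:29:11"],
    "value": [15, 30, 20],
})
df["datetime"] = pd.to_datetime(df["datetime"])

# 'H' groups by hour, 'D' by day (these are the offset aliases mentioned above)
print(df.groupby(pd.Grouper(key="datetime", freq="H")).sum())
#                      value
# datetime
# 2022-06-28 13:00:00     45
# 2022-06-28 14:00:00     20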