Unable to convert the data type of this column (object to time) - python

I have a data frame with the data and dtypes below:
usr_id year
0 t961 00:50:03.158000
1 t964 03:25:57
2 t335 00:55:00
3 t829 00:04:25.714000
usr_id object
year object
dtype: object
I want to convert the year column to a datetime type. I used the code below.
timefmt = "%H:%M"
test['year'] = pd.to_datetime(
    test['year'], format=timefmt, errors='coerce').dt.time
I get the output below:
usr_id year
0 t961 NaT
1 t964 NaT
2 t335 NaT
3 t829 NaT
How can I convert the data type of this column (object to datetime)?
How can I drop seconds & microseconds?
Expected output
usr_id year
0 t961 00:50
1 t964 03:25
2 t335 00:55
3 t829 00:04

Use to_datetime with Series.dt.strftime:
timefmt = "%H:%M"
test['year'] = pd.to_datetime(test['year'], errors='coerce').dt.strftime(timefmt)
print (test)
usr_id year
0 t961 00:50
1 t964 03:25
2 t335 00:55
3 t829 00:04
Or you can use Series.str.rsplit with n=1 to split on the last : and select the first element of each list by indexing:
test['year'] = test['year'].str.rsplit(':', n=1).str[0]
print (test)
usr_id year
0 t961 00:50
1 t964 03:25
2 t335 00:55
3 t829 00:04
Or the solution by @Akira:
test['year'] = test['year'].astype(str).str[:5]
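The attempt in the question most likely returned NaT because the strings contain seconds (and sometimes microseconds), so they do not match format="%H:%M", and errors='coerce' turns every mismatch into NaT. As a minimal sketch of one more route, assuming every value has the form HH:MM:SS with an optional fractional part and stays below 24 hours, the strings can be parsed with pd.to_timedelta and the HH:MM text rebuilt from the components. The sample frame is copied from the question; the name parts is only illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    "usr_id": ["t961", "t964", "t335", "t829"],
    "year": ["00:50:03.158000", "03:25:57", "00:55:00", "00:04:25.714000"],
})

# to_timedelta accepts HH:MM:SS with or without a fractional part
parts = pd.to_timedelta(test["year"]).dt.components

# Rebuild zero-padded HH:MM strings; seconds and microseconds are dropped
test["year"] = (
    parts["hours"].astype(str).str.zfill(2)
    + ":"
    + parts["minutes"].astype(str).str.zfill(2)
)
print(test)
```

The result is still an object (string) column, like the strftime and rsplit versions above.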

As there is currently no actual date in your year column, you need to set a default one. Then you can pass a format to the pandas to_datetime function.
This could be done in a one-liner like this:
test['year'] = pd.to_datetime(test['year'].apply(lambda x: '1900-01-01 '+ x),format='%Y-%m-%d %H:%M:%S')

Related

parse multiple date format pandas

I've got stuck with the following format:
0 2001-12-25
1 2002-9-27
2 2001-2-24
3 2001-5-3
4 200510
5 20078
What I need is the date in the format %Y-%m.
What I tried was
def parse(date):
    if len(date)<=5:
        return "{}-{}".format(date[:4], date[4:5], date[5:])
    else:
        pass
df['Date']= parse(df['Date'])
However, I only succeeded in parsing 20078 to 2007-8; values in a format like 2001-12-25 came back as None.
So, how can I do it? Thank you!
We can use pd.to_datetime with errors='coerce' to parse the dates in steps,
assuming your column is called date:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
df['date_fixed'] = s
print(df)
date date_fixed
0 2001-12-25 2001-12-25
1 2002-9-27 2002-09-27
2 2001-2-24 2001-02-24
3 2001-5-3 2001-05-03
4 200510 2005-10-01
5 20078 2007-08-01
In steps:
First we parse the regular dates into a new series called s:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 NaT
5 NaT
Name: date, dtype: datetime64[ns]
As you can see, we have two NaT values (null datetimes) in our series; these correspond to your dates that are missing a day.
We then apply to_datetime again with the other format and use the result to fill the missing values of s:
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 2005-10-01
5 2007-08-01
Then we re-assign the result to your dataframe.
You could use a regex to pull out the year and month, and convert to datetime:
df = pd.read_clipboard(r"\s{2,}",header=None,names=["Dates"])
pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"
df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])
print(df)
Dates
0 2001-12-01
1 2002-09-01
2 2001-02-01
3 2001-05-01
4 2005-10-01
5 2007-08-01
Note that pandas automatically sets the day to 1, since only the year and month were supplied.
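Since the question asks for %Y-%m rather than full dates, here is a minimal sketch that builds the year-month strings directly with a regex, without going through datetime parsing. It assumes every value starts with a four-digit year, optionally followed by a hyphen, then a one- or two-digit month; the column name year_month is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})

# Named groups become the columns "year" and "month" of the extracted frame
parts = df["Date"].str.extract(r"^(?P<year>\d{4})-?(?P<month>\d{1,2})")

# Zero-pad the month so that "9" becomes "09"
df["year_month"] = parts["year"] + "-" + parts["month"].str.zfill(2)
print(df)
```

Values that do not match the pattern end up as NaN, so it is worth checking for them before relying on the new column.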

How to homogenize date type in a pandas dataframe column?

I have a Date column in my dataframe that mixes two different formats (YYYY-DD-MM 00:00:00 and YYYY-DD-MM):
Date
0 2023-01-10 00:00:00
1 2024-27-06
2 2022-07-04 00:00:00
3 NaN
4 2020-30-06
(you can use pd.read_clipboard(sep='\s\s+') after copying the previous dataframe to get it in your notebook)
I would like to have only the YYYY-MM-DD format. Consequently, I would like to have:
Date
0 2023-10-01
1 2024-06-27
2 2022-04-07
3 NaN
4 2020-06-30
How could I do this, please?
Use Series.str.replace with to_datetime and format parameter:
df['Date'] = pd.to_datetime(df['Date'].str.replace(' 00:00:00',''), format='%Y-%d-%m')
print (df)
Date
0 2023-10-01
1 2024-06-27
2 2022-04-07
3 NaT
4 2020-06-30
Another idea is to parse both formats separately and combine the results:
d1 = pd.to_datetime(df['Date'], format='%Y-%d-%m', errors='coerce')
d2 = pd.to_datetime(df['Date'], format='%Y-%d-%m 00:00:00', errors='coerce')
df['Date'] = d1.fillna(d2)
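If the output should be plain YYYY-MM-DD strings rather than datetimes, a small self-contained sketch could look like the one below. It assumes the date part is always the first 10 characters and is always day-first (YYYY-DD-MM), as in the sample from the question; the names day_first and parsed are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": ["2023-01-10 00:00:00", "2024-27-06",
                            "2022-07-04 00:00:00", np.nan, "2020-30-06"]})

# Keep only the YYYY-DD-MM part; missing values stay missing
day_first = df["Date"].str[:10]

# Parse as year-day-month, then write back as year-month-day text
parsed = pd.to_datetime(day_first, format="%Y-%d-%m", errors="coerce")
df["Date"] = parsed.dt.strftime("%Y-%m-%d")
print(df)
```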

String dates into unixtime in a pandas dataframe

I have a dataframe with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
The dates are stored as strings in every row. What is the easiest way to convert them to integer Unix time?
One idea is to convert the column to datetimes and to timedeltas:
df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')
times = df['Date'].replace({r'(\d+)\s+mins': r'00:\1:00',
                            r'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
#keep rows that have a value in dates or in times
df = df[df['dates'].notna() | df['times'].notna()]
#Series.append was removed in pandas 2.0, so combine both parts with pd.concat
df['all'] = pd.concat([df['dates'].dropna().astype('int64'),
                       df['times'].dropna().astype('int64')])
print (df)
      Date      dates    times                  all
0   3 mins        NaT 00:03:00         180000000000
1  2 hours        NaT 02:00:00        7200000000000
2    9-Feb 2020-02-09      NaT  1581206400000000000
3   13-Feb 2020-02-13      NaT  1581552000000000000
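Note that the integers in the all column are nanoseconds, not seconds. If whole Unix seconds are wanted for the date rows, a minimal sketch is to subtract the epoch and floor-divide by one second. Like the answer above, it assumes the year is 2020 and treats the dates as midnight UTC; the name unix_seconds is made up for illustration.

```python
import pandas as pd

raw = pd.Series(["9-Feb", "13-Feb"])

# Append the assumed year so the strings carry a full date
dates = pd.to_datetime(raw + "-2020", format="%d-%b-%Y")

# Timedelta since the epoch, floor-divided by one second, gives integer seconds
unix_seconds = (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(unix_seconds)
```

These values are the nanosecond integers shown above divided by 10**9.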

Days before end of month in pandas

I would like to get the number of days before the end of the month, from a string column representing a date.
I have the following pandas dataframe :
df = pd.DataFrame({'date':['2019-11-22','2019-11-08','2019-11-30']})
df
date
0 2019-11-22
1 2019-11-08
2 2019-11-30
I would like the following output :
df
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0
The package pd.tseries.MonthEnd with rollforward seemed a good pick, but I can't figure out how to use it to transform a whole column.
Subtract the day of the month (Series.dt.day) from the number of days in the month (Series.dt.daysinmonth):
df['date'] = pd.to_datetime(df['date'])
df['days_end_month'] = df['date'].dt.daysinmonth - df['date'].dt.day
Or add offsets.MonthEnd(0) to roll each date forward to its month end, subtract the original date, and convert the timedeltas to days with Series.dt.days:
df['days_end_month'] = (df['date'] + pd.offsets.MonthEnd(0) - df['date']).dt.days
print (df)
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0
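As a quick check that both approaches agree, here is a self-contained sketch using the sample from the question plus one extra row, 2020-02-10, which is not in the original data and is added only to show a leap-year February. The names by_length and by_offset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2019-11-22", "2019-11-08", "2019-11-30", "2020-02-10"]})
df["date"] = pd.to_datetime(df["date"])

# Approach 1: days in the month minus the day of the month
by_length = df["date"].dt.days_in_month - df["date"].dt.day

# Approach 2: roll forward to the month end (a date already on it is unchanged)
by_offset = (df["date"] + pd.offsets.MonthEnd(0) - df["date"]).dt.days

df["days_end_month"] = by_length
print(df)
```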

Reformat Dataframe column to date only format

I have a dataframe (df) with a 'Date of birth' column:
Date of birth
0 1957-04-30 00:00:00
1 1966-11-10 00:00:00
2 1966-11-10 00:00:00
3 1962-03-28 00:00:00
4 1958-10-28 00:00:00
5 1958-06-04 00:00:00
How can I reformat the column to a date only format? After I reformat I'm going to work out age from a specific date:
Date of birth
0 1957-04-30
1 1966-11-10
2 1966-11-10
3 1962-03-28
4 1958-10-28
5 1958-06-04
I have tried using
df["Date of birth"] = pd.to_datetime(df['Date of birth'], format='%d%b%Y')
df["Date of birth"] = df["Date of birth"].dt.strftime('%m/%d/%Y')
but with no joy.
Once the column has been converted to datetime, use the .dt.date accessor to keep only the date part:
df["Date of birth"] = pd.to_datetime(df['Date of birth']).dt.date
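Because the question mentions working out an age afterwards, here is a rough sketch of doing that before switching to the date-only display. Keep in mind that .dt.date turns the column into plain Python date objects (object dtype), so it is usually easier to do the arithmetic on the datetime values first. The reference date 2020-06-15 and the age column are assumptions made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"Date of birth": [
    "1957-04-30 00:00:00", "1966-11-10 00:00:00",
    "1962-03-28 00:00:00", "1958-06-04 00:00:00",
]})

dob = pd.to_datetime(df["Date of birth"])
ref = pd.Timestamp("2020-06-15")  # hypothetical reference date

# True where the birthday has already occurred in the reference year
had_birthday = (dob.dt.month < ref.month) | (
    (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
)
df["age"] = ref.year - dob.dt.year - (~had_birthday).astype(int)

# Date-only display; the column now holds Python date objects
df["Date of birth"] = dob.dt.date
print(df)
```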
