I need to convert a DataFrame column of integers to the following format, using the current year: YYYY-MM-DD HH:MM:SS. I have a DataFrame that looks like this:
Date LT Mean
0 7 5.491916
1 8 5.596823
2 9 5.793934
3 10 7.501096
4 11 8.152358
5 12 8.426322
And I need it to look like this, using the current year (2020):
Date LT Mean
0 2020-07-01 5.491916
1 2020-08-01 5.596823
2 2020-09-01 5.793934
3 2020-10-01 7.501096
4 2020-11-01 8.152358
5 2020-12-01 8.426322
I have not been able to find a reference for converting a single integer that represents the date into the yyyy-mm-dd hh:mm:ss format I need. Thank you.
You can use the pandas to_datetime function. Assuming your Date column represents the month, you can use it like this:
df['Date'] = pandas.to_datetime(df["Date"], format='%m').apply(lambda dt: dt.replace(year=2020))
Then, if you need to transform the column to a string in the specified format (note that minutes are %M and seconds are %S; lowercase %m is the month):
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d %H:%M:%S')
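As an alternative sketch, the dates can also be assembled from year, month and day components in one vectorized call, which avoids the per-row lambda. The sample frame below is rebuilt from the question; the year 2020 and the day 1 are assumptions taken from the desired output.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Date": [7, 8, 9, 10, 11, 12],
    "LT Mean": [5.491916, 5.596823, 5.793934, 7.501096, 8.152358, 8.426322],
})

# to_datetime accepts a frame with "year", "month" and "day" columns
parts = pd.DataFrame({"year": 2020, "month": df["Date"], "day": 1})
df["Date"] = pd.to_datetime(parts)

# Optional: render as text in the requested format
formatted = df["Date"].dt.strftime("%Y-%m-%d %H:%M:%S").tolist()
print(formatted)
```

If the year should follow the calendar instead of being fixed, pd.Timestamp.today().year could be used in place of 2020.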
I have the following date column that I would like to transform to a pandas datetime object. Is it possible to do this with weekly data? For example, 1-2018 stands for week 1 in 2018 and so on. I tried the following conversion but I get an error message: Cannot use '%W' or '%U' without day and year
import pandas as pd
df1 = pd.DataFrame(columns=["date"])
df1['date'] = ["1-2018", "1-2018", "2-2018", "2-2018", "3-2018", "4-2018", "4-2018", "4-2018"]
df1["date"] = pd.to_datetime(df1["date"], format = "%W-%Y")
You need to add a day of the week to the datetime format:
df1["date"] = pd.to_datetime('0' + df1["date"], format='%w%W-%Y')
print(df1)
Output
date
0 2018-01-07
1 2018-01-07
2 2018-01-14
3 2018-01-14
4 2018-01-21
5 2018-01-28
6 2018-01-28
7 2018-01-28
As the error message says, you need to specify the day of the week by adding %w. Here '0' (Sunday) is prepended to each string so that %w has something to read:
df1["date"] = pd.to_datetime( '0'+df1.date, format='%w%W-%Y')
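The prepended digit decides which day of the week each result lands on. A small sketch, assuming you would rather have the Monday that starts each week than the Sunday that ends it (the column name week_start is made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"date": ["1-2018", "2-2018", "3-2018", "4-2018"]})

# %w reads the leading digit as the weekday (0 = Sunday, 1 = Monday, ...);
# %W counts weeks that start on Monday, so '1' gives the first day of each week.
df1["week_start"] = pd.to_datetime("1" + df1["date"], format="%w%W-%Y")

week_starts = df1["week_start"].dt.strftime("%Y-%m-%d").tolist()
print(week_starts)
```

With %W, days before the first Monday of the year belong to week 0, so the mapping from week number to date depends on the weekday on which the year starts.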
Hi, I have a date column in a DataFrame and I need to get the date pattern of each value in that column. For example, I have the column below and need to get the patterns from it.
0 01/7/2022
1 01/8/2022
2 Jan/9/2022
3 01/10/2022
4 25/11/2022
5 01/12/2022
6 21/9/2022
7 01/14/2022
8 01/15/2022
9 May/16/2022
10 07172022
11 01/18/2022
12 10-3-2021
I tried it this way:
df_date=df_date.astype(str).replace([r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|(0[1-9]|1[012]))(/)[\d]{1,2}(/)[\d]{2,4}'],['MM/DD/YYYY'], regex=True)
df_month=df_date.astype(str).replace([r'[\d]{1,2}(/)(0[1-9]|1[012])(/)[\d]{2,4}'],['DD/MM/YYYY'],regex=True)
df_mm=df_date.astype(str).replace([r'[\d]{,2}(-)(0[1-9]|1[012])(-)[\d]{2,4}'],['MM-DD-YYYY'],regex=True)
df_combi=df_date.astype(str).replace([r'[\d]{1,2}(0[1-9]|1[012])[\d]{2,4}'],['DDMMYYYY'],regex=True)
df_com=df_date.astype(str).replace([r'(0[1-9]|1[012])[\d]{1,2}[\d]{2,4}'],['MMDDYYYY'],regex=True)
The output looks like this:
Date
0 MM/DD/YYYY
1 MM/DD/YYYY
2 MM/DD/YYYY
3 MM/DD/YYYY
4 25/11/2022
5 MM/DD/YYYY
6 21/9/2022
7 MM/DD/YYYY
8 MM/DD/YYYY
9 MM/DD/YYYY
10 MMDDYYYY
11 MM/DD/YYYY
12 10-3-2021
The rows that still show numbers need to be replaced with their patterns as well.
One way to do this is to create a function that tries to convert a string to date using a list of formats.
from datetime import datetime
def get_date_format(text):
    fmt_map = {'%m/%d/%Y': 'MM/DD/YYYY',
               '%d/%m/%Y': 'DD/MM/YYYY',
               '%b/%d/%Y': 'Mon/DD/YYYY',
               '%d-%m-%Y': 'DD-MM-YYYY',
               '%m-%d-%Y': 'MM-DD-YYYY',
               '%m%d%Y': 'MMDDYYYY'}
    for fmt in ('%m/%d/%Y', '%d/%m/%Y', '%b/%d/%Y', '%d-%m-%Y', '%m-%d-%Y', '%m%d%Y'):
        try:
            datetime.strptime(text, fmt)
            return fmt_map[fmt]
        except ValueError:
            pass
    return text
If it succeeds it returns the first format that worked (or in this case, it uses a map to return a prettier format string). If none of them work, it returns the original string. (You could substitute a default value or an error message in place of the original value.)
You can add as many date formats as you need for your expected input, and put them in any order to deal with ambiguities like MM/DD vs. DD/MM.
You can then apply that function to the date column in your dataframe.
import pandas as pd
dates = ["01/7/2022", "01/8/2022", "Jan/9/2022", "01/10/2022", "25/11/2022", "01/12/2022", "21/9/2022",
"01/14/2022", "01/15/2022", "May/16/2022", "07172022", "01/18/2022", "10-3-2021", "not a date"]
df = pd.DataFrame({"date_string": dates})
df['FMT'] = df['date_string'].apply(get_date_format)
print(df)
Output:
date_string FMT
0 01/7/2022 MM/DD/YYYY
1 01/8/2022 MM/DD/YYYY
2 Jan/9/2022 Mon/DD/YYYY
3 01/10/2022 MM/DD/YYYY
4 25/11/2022 DD/MM/YYYY
5 01/12/2022 MM/DD/YYYY
6 21/9/2022 DD/MM/YYYY
7 01/14/2022 MM/DD/YYYY
8 01/15/2022 MM/DD/YYYY
9 May/16/2022 Mon/DD/YYYY
10 07172022 MMDDYYYY
11 01/18/2022 MM/DD/YYYY
12 10-3-2021 DD-MM-YYYY
13 not a date not a date
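To illustrate the point about ordering, here is a minimal sketch showing that the same string is labelled differently depending on which format is tried first. The helper first_matching_format is a hypothetical name introduced only for this example.

```python
from datetime import datetime

def first_matching_format(text, formats):
    """Return the first strptime format that parses text, or None."""
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
            return fmt
        except ValueError:
            continue
    return None

month_first = ("%m/%d/%Y", "%d/%m/%Y")
day_first = ("%d/%m/%Y", "%m/%d/%Y")

ambiguous = "01/12/2022"    # valid as 12 January and as 1 December
unambiguous = "25/11/2022"  # 25 cannot be a month

result_month_first = first_matching_format(ambiguous, month_first)
result_day_first = first_matching_format(ambiguous, day_first)
result_unambiguous = first_matching_format(unambiguous, month_first)

print(result_month_first, result_day_first, result_unambiguous)
```

Only values such as 25/11/2022, where one reading is impossible, are classified the same way regardless of order, so the order should reflect whichever convention you expect to dominate in your data.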
I've got stuck with the following format:
0 2001-12-25
1 2002-9-27
2 2001-2-24
3 2001-5-3
4 200510
5 20078
What I need is the date in the format %Y-%m.
What I tried was:
def parse(date):
    if len(date)<=5:
        return "{}-{}".format(date[:4], date[4:5], date[5:])
    else:
        pass
df['Date']= parse(df['Date'])
However, I only succeeded in parsing 20078 to 2007-8; values like 2001-12-25 came out as None.
So, how can I do it? Thank you!
We can use pd.to_datetime with errors='coerce' to parse the dates in steps.
Assuming your column is called date:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
df['date_fixed'] = s
print(df)
date date_fixed
0 2001-12-25 2001-12-25
1 2002-9-27 2002-09-27
2 2001-2-24 2001-02-24
3 2001-5-3 2001-05-03
4 200510 2005-10-01
5 20078 2007-08-01
In steps:
First we parse the regular dates into a new series called s:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 NaT
5 NaT
Name: date, dtype: datetime64[ns]
As you can see, we have two NaT values (null datetimes) in our series; these correspond to your dates that are missing a day.
We then apply the same method again, this time with the year-month format '%Y%m', and use the result to fill the missing values of s:
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 2005-10-01
5 2007-08-01
Then we re-assign to your dataframe.
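Since the question asks for the %Y-%m format, a parsed datetime column can be rendered that way in one further step. A short sketch on a few already-parsed sample values (the variable names are illustrative only):

```python
import pandas as pd

# A few values as they would look after the parsing steps above
s = pd.Series(pd.to_datetime(["2001-12-25", "2005-10-01", "2007-08-01"]))

# Plain strings such as '2001-12'
as_text = s.dt.strftime("%Y-%m")

# Or monthly periods, which stay sortable and support date arithmetic
as_period = s.dt.to_period("M")

print(as_text.tolist())
print(as_period.tolist())
```

The string version is convenient for display or export, while the period version is probably the better choice if the column will be used for grouping or sorting later.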
You could use a regex to pull out the year and month, and convert to datetime:
df = pd.read_clipboard(sep=r"\s{2,}", header=None, names=["Dates"])
pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"
df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])
print(df)
Dates
0 2001-12-01
1 2002-09-01
2 2001-02-01
3 2001-05-01
4 2005-10-01
5 2007-08-01
Note that pandas automatically sets the day to 1, since only the year and month were supplied.
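If only a YYYY-MM string is needed, the same pattern can be used without going through datetime at all. This is a sketch that builds the sample data inline instead of reading the clipboard; the variable names are illustrative.

```python
import pandas as pd

dates = pd.Series(["2001-12-25", "2002-9-27", "2001-2-24", "2001-5-3", "200510", "20078"])

# Same pattern as above: four-digit year, optional hyphen(s), one or two month digits
pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"
parts = dates.str.extract(pattern)

# Zero-pad the month so every value has the form YYYY-MM
year_month = parts["Year"] + "-" + parts["Month"].str.zfill(2)
print(year_month.tolist())
```

Note that this does not validate the values: a string such as 200599 would pass through as 2005-99, whereas the datetime route would reject it.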
I have a dataframe that has a date time string but is not in traditional date time format. I would like to separate out the date from the time into two separate columns. And then eventually also separate out the month.
This is what the date/time string looks like: 2019-03-20T16:55:52.981-06:00
>>> df.head()
Date Score
2019-03-20T16:55:52.981-06:00 10
2019-03-07T06:16:52.174-07:00 9
2019-06-17T04:32:09.749-06:00 1
I tried this but got a type error:
df['Month'] = pd.DatetimeIndex(df['Date']).month
This can be done just using pandas itself. You can first convert the Date column to datetime by passing utc = True:
df['Date'] = pd.to_datetime(df['Date'], utc = True)
And then just extract the month using dt.month:
df['Month'] = df['Date'].dt.month
Output:
Date Score Month
0 2019-03-20 22:55:52.981000+00:00 10 3
1 2019-03-07 13:16:52.174000+00:00 9 3
2 2019-06-17 10:32:09.749000+00:00 1 6
From the documentation of pd.to_datetime you can see a parameter:
utc : boolean, default None
Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
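The question also asks for the date and the time in separate columns. A minimal sketch of that step, using two of the sample rows; the column names Day and Time are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-03-20T16:55:52.981-06:00", "2019-03-07T06:16:52.174-07:00"],
    "Score": [10, 9],
})

# Parse once, normalising the mixed offsets to UTC
stamps = pd.to_datetime(df["Date"], utc=True)

df["Day"] = stamps.dt.date    # datetime.date objects
df["Time"] = stamps.dt.time   # datetime.time objects, in UTC
df["Month"] = stamps.dt.month

print(df[["Day", "Time", "Month"]])
```

Because the values are converted to UTC first, the extracted day and time are the UTC ones, which can differ from the local wall-clock values in the original strings (16:55 at -06:00 becomes 22:55 here).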
I would like to convert the float column below to datetime format:
df
Date
0 NaN
1 NaN
2 201708.0
4 201709.0
5 201700.0
6 201600.0
Name: Cred_Act_LstPostDt_U324123, dtype: float64
pd.to_datetime(df['Date'],format='%Y%m.0')
ValueError: time data 201700.0 does not match format '%Y%m.0' (match)
How could I transform the rows without month information, defaulting them to yyyy01?
You can use str.replace on the string form of each value to clean up your month data:
s = [x.replace('00.0', '01.0') for x in df['Date'].astype(str)]
df['Date'] = pd.to_datetime(s, format='%Y%m.0', errors='coerce')
print(df)
Date
0 NaT
1 NaT
2 2017-08-01
4 2017-09-01
5 2017-01-01
6 2016-01-01
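The same result can be reached with arithmetic instead of string replacement. The sketch below assumes every non-missing value has the form YYYYMM and that a month of 00 should default to January; the Parsed column name is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": [np.nan, np.nan, 201708.0, 201709.0, 201700.0, 201600.0]})

# Work only on the rows that have a value, as integers like 201708
valid = df["Date"].dropna().astype(int)
year = valid // 100
month = (valid % 100).replace(0, 1)  # assumption: month 00 means "unknown", use January

parts = pd.DataFrame({"year": year, "month": month, "day": 1})

# Assignment aligns on the index, so rows that were NaN end up as NaT
df["Parsed"] = pd.to_datetime(parts)
print(df)
```

This avoids depending on how the floats are rendered as text (the trailing ".0"), at the cost of a few more lines.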
Convert the floats to strings using .astype(str), then take the first four characters (the year) and the next two (the month) and join them with a hyphen using str.cat. Then you can use format='%Y-%m'.
However, rows with an invalid month number, such as month 00, still cannot be parsed; with errors='coerce' they become NaT instead of raising an error.
string = df['Date'].astype(str)
date = string.str[:4].str.cat(string.str[4:6], sep='-')
pd.to_datetime(date, format='%Y-%m', errors='coerce')