A two-part question.
I'm trying to convert a column to datetime, which I assumed would be an easy task, as I've done it before on other DataFrames using the documentation without much trouble.
df = pd.DataFrame({'date' : ['24 October 2018', '23 April 2018', '18 January 2018']})
print(df)
date
0 24 October 2018
1 23 April 2018
2 18 January 2018
I was going through the datetime docs and thought this piece of code would convert the column (which has dtype object) into a datetime:
df.date = pd.to_datetime(df['date'], format="%d-%m-%Y",errors='ignore')
which gives the error:
ValueError: time data '24 April 2018' does not match format '%d-%m-%Y' (match)
I've tried playing with different formats and going through the documentation, to no avail!
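You are using the wrong format. '24 October 2018' uses format="%d %B %Y": %d is the day of the month, %B is the full month name, and %Y is the four-digit year, separated by spaces rather than hyphens. The format specifiers are listed in the Python documentation for strftime() and strptime().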
edit: -Demo-
>>> import pandas as pd
>>> df = pd.DataFrame({'date':['24 October 2018', '23 April 2018', '18 January 2018']})
>>> df.date = pd.to_datetime(df['date'], format="%d %B %Y")
>>>
>>> df
date
0 2018-10-24
1 2018-04-23
2 2018-01-18
>>>
>>> df['date'][0]
Timestamp('2018-10-24 00:00:00')
>>> df['date'][0].month
10
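To see why the original format string failed, here is a minimal sketch that uses only the standard library's datetime.strptime, which follows the same format codes. The sample string comes from the question; the helper name matches_format is hypothetical, and an English (or C) locale is assumed so that %B matches 'October'.

```python
from datetime import datetime

def matches_format(text, fmt):
    """Return True if text can be parsed with fmt, False otherwise."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        # strptime raises ValueError when the text does not fit the format
        return False

sample = "24 October 2018"
# '%d-%m-%Y' expects hyphens and a numeric month, e.g. '24-10-2018'
hyphen_numeric = matches_format(sample, "%d-%m-%Y")
# '%d %B %Y' expects spaces and a full month name, e.g. '24 October 2018'
space_named = matches_format(sample, "%d %B %Y")
print(hyphen_numeric, space_named)
```

A check like this on one sample value is a quick way to test a format string before applying it to a whole column.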
edit 2: second question
>>> df['status'] = ['complete', 'complete', 'requested']
>>> df
date status
0 2018-10-24 complete
1 2018-04-23 complete
2 2018-01-18 requested
>>>
>>> df[df['status'] != 'complete']
date status
2 2018-01-18 requested
You can use pd.to_datetime, or the datetime library:
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x,'%d %B %Y'))
I want to convert a column of date strings such as '19 Desember 2022' (the month name is in Indonesian) to a supported datetime format without translating it. How do I do that?
I already tried this:
df_train['date'] = pd.to_datetime(df_train['date'], format='%d %B %Y')
but got the error:
time data '19 Desember 2022' does not match format '%d %B %Y' (match)
Try using dateparser:
import dateparser
import pandas as pd
df_train = pd.DataFrame(['19 Desember 2022', '20 Desember 2022', '21 Desember 2022', '22 Desember 2022'], columns = ['date'])
df_train['date'] = [dateparser.parse(x) for x in df_train['date']]
df_train
Output:
date
0 2022-12-19
1 2022-12-20
2 2022-12-21
3 2022-12-22
pandas doesn't recognize Indonesian (Bahasa Indonesia) month names by default. Try replacing the Indonesian spelling of December with the English one (as pointed out, you can do this in one line and create a new column):
df_train["formatted_date"] = pd.to_datetime(df_train["date"].str.replace("Desember", "December"), format="%d %B %Y")
print(df_train)
Output:
user_type date formatted_date
0 Anggota 19 Desember 2022 2022-12-19
1 Anggota 19 Desember 2022 2022-12-19
2 Anggota 19 Desember 2022 2022-12-19
3 Anggota 19 Desember 2022 2022-12-19
4 Anggota 19 Desember 2022 2022-12-19
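If the column contains more than one month, the single replacement above can be extended to a full lookup table. The following is a sketch under stated assumptions: ID_TO_EN_MONTHS and parse_indonesian_date are hypothetical names, the input is always "day month year" separated by whitespace, and an English (or C) locale is assumed for %B.

```python
from datetime import datetime

# Hypothetical lookup table: Indonesian month name -> English month name
ID_TO_EN_MONTHS = {
    "Januari": "January", "Februari": "February", "Maret": "March",
    "April": "April", "Mei": "May", "Juni": "June", "Juli": "July",
    "Agustus": "August", "September": "September", "Oktober": "October",
    "November": "November", "Desember": "December",
}

def parse_indonesian_date(text):
    """Parse 'DD <Indonesian month> YYYY' into a datetime object."""
    day, month, year = text.split()
    # A month name missing from the table raises KeyError
    english = f"{day} {ID_TO_EN_MONTHS[month]} {year}"
    return datetime.strptime(english, "%d %B %Y")

parsed = parse_indonesian_date("19 Desember 2022")
print(parsed.date())
```

On a DataFrame, the same function could be applied with df_train['date'].apply(parse_indonesian_date).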
I have a df with dates in the format %B %Y (e.g. June 2021, December 2022 etc.)
Date      Price
Apr 2022  2
Dec 2021  8
I am trying to sort dates in order of oldest to newest but when I try:
.sort_values(by='Date', ascending=False)
it is ordering in alphabetical order.
The 'Date' column is an Object.
ascending=False will sort from newest to oldest, but you are asking to sort oldest to newest, so you don't need that option;
there is a key option to specify how to parse the values before sorting them;
you may or may not want option ignore_index=True, which I included below.
We can use the key option to parse the values into datetime objects with pandas.to_datetime.
import pandas as pd
df = pd.DataFrame({'Date': ['Apr 2022', 'Dec 2021', 'May 2022', 'May 2021'], 'Price': [2, 8, 12, 15]})
df = df.sort_values(by='Date', ignore_index=True, key=pd.to_datetime)
print(df)
# Date Price
# 0 May 2021 15
# 1 Dec 2021 8
# 2 Apr 2022 2
# 3 May 2022 12
Relevant documentation:
DataFrame.sort_values;
to_datetime.
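When every string follows one known layout, the key can also pass an explicit format rather than relying on format inference. A minimal sketch, assuming abbreviated English month names (%b) as in the sample data and a pandas version that supports the key argument of sort_values (1.1 or later); sorted_df is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['Apr 2022', 'Dec 2021', 'May 2022', 'May 2021'],
                   'Price': [2, 8, 12, 15]})

# The key receives the whole 'Date' column and must return a same-length
# Series; here it is parsed with an explicit month-abbreviation format.
sorted_df = df.sort_values(
    by='Date',
    key=lambda col: pd.to_datetime(col, format='%b %Y'),
    ignore_index=True,
)
print(sorted_df)
```

The 'Date' column itself stays as strings; only the sort order is computed from the parsed values.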
I've a sample dataframe
year_month
202004
202005
202011
202012
How can I append a column with the month name and year to the dataframe, like this?
year_month month_name
202004 April 2020
202005 May 2020
202011 Nov 2020
202012 Dec 2020
You can use datetime.strptime to convert your string into a datetime object, then you can use datetime.strftime to convert it back into a string with different format.
>>> import datetime as dt
>>> import pandas as pd
>>> df = pd.DataFrame(['202004', '202005', '202011', '202012'], columns=['year_month'])
>>> df['month_name'] = df['year_month'].apply(lambda x: dt.datetime.strptime(x, '%Y%m').strftime('%b %Y'))
>>> df
year_month month_name
0 202004 Apr 2020
1 202005 May 2020
2 202011 Nov 2020
3 202012 Dec 2020
The full list of format codes is in the Python documentation for strftime() and strptime().
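The same conversion can also be written with pandas' vectorized datetime accessors instead of apply. This is a sketch rather than part of the answer above; it assumes year_month holds strings (call .astype(str) first if the column holds integers).

```python
import pandas as pd

df = pd.DataFrame({'year_month': ['202004', '202005', '202011', '202012']})

# Parse 'YYYYMM' strings into timestamps (the day defaults to the 1st)
parsed = pd.to_datetime(df['year_month'], format='%Y%m')

# Format back into strings: abbreviated month name plus four-digit year
df['month_name'] = parsed.dt.strftime('%b %Y')
print(df)
```

Use '%B %Y' instead of '%b %Y' if full month names are wanted.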
I'm trying to format a column of dates to 'Month Year' format without changing the non-date values.
input_df = pd.DataFrame({'Period' :['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020', 'Pre-Nov 2017', '2019-10-01 00:00:00' , 'Nov 17-Nov 18'] } )
I tried the code below, which didn't work:
output_df['Period'] = input_df['Period'].apply(lambda x: x.strftime('%m %Y') if isinstance(x, datetime.date) else x)
Please help.
You can do this with errors='coerce' and fillna:
input_df['new_period'] = (pd.to_datetime(input_df['Period'], errors='coerce')
.dt.strftime('%b %Y')
.fillna(input_df['Period'])
)
Output:
Period new_period
0 2017-11-01 00:00:00 Nov 2017
1 2019-02-01 00:00:00 Feb 2019
2 Mar 2020 Mar 2020
3 Pre-Nov 2017 Pre-Nov 2017
4 2019-10-01 00:00:00 Oct 2019
5 Nov 17-Nov 18 Nov 17-Nov 18
Update: Second, safer option:
import numpy as np
s = pd.to_datetime(input_df['Period'], errors='coerce')
# keep the original string wherever parsing failed (NaT)
input_df['new_period'] = np.where(s.isna(), input_df['Period'],
s.dt.strftime('%b %Y'))
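The same "convert if it parses, otherwise keep the original" idea can be sketched in plain Python with an explicit format, which makes it clear exactly which strings count as dates. The helper name to_month_year is hypothetical, the timestamp layout is taken from the sample data, and an English (or C) locale is assumed for %b.

```python
from datetime import datetime

def to_month_year(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Return 'Mon YYYY' if value matches fmt, otherwise value unchanged."""
    try:
        return datetime.strptime(value, fmt).strftime("%b %Y")
    except ValueError:
        # Not a timestamp in the expected layout: leave it as it is
        return value

periods = ['2017-11-01 00:00:00', '2019-02-01 00:00:00', 'Mar 2020',
           'Pre-Nov 2017', '2019-10-01 00:00:00', 'Nov 17-Nov 18']
converted = [to_month_year(p) for p in periods]
print(converted)
```

On the DataFrame from the question this could be used as input_df['Period'].apply(to_month_year).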
After splitting, I have a line like this:
lineaftersplit=Jan 31 00:57:07 2012 GMT
How do I get only the year (2012) from this and check whether it falls between 2010 and 2013?
If lineaftersplit is a string value, you can use the datetime module to parse out the information, including the year:
import datetime
parsed_date = datetime.datetime.strptime(lineaftersplit, '%b %d %H:%M:%S %Y %Z')
if 2010 <= parsed_date.year <= 2013:
    # year is between 2010 and 2013 (inclusive)
    pass
This has the advantage that you can do further tests on the datetime object, including sorting and date arithmetic.
Demo:
>>> import datetime
>>> lineaftersplit="Jan 31 00:57:07 2012 GMT"
>>> parsed_date = datetime.datetime.strptime(lineaftersplit, '%b %d %H:%M:%S %Y %Z')
>>> parsed_date
datetime.datetime(2012, 1, 31, 0, 57, 7)
>>> parsed_date.year
2012
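The parse-and-compare steps can be wrapped in a small helper. This is only a sketch: the function name year_in_range and its parameters are hypothetical, and it assumes every line uses the same layout and an English (or C) locale for the month abbreviation.

```python
from datetime import datetime

def year_in_range(line, first_year, last_year):
    """Return True if the year in line lies in [first_year, last_year]."""
    # Same format string as in the answer: month abbreviation, day,
    # time, four-digit year, time zone name
    parsed = datetime.strptime(line, '%b %d %H:%M:%S %Y %Z')
    return first_year <= parsed.year <= last_year

in_range = year_in_range("Jan 31 00:57:07 2012 GMT", 2010, 2013)
out_of_range = year_in_range("Jan 31 00:57:07 2009 GMT", 2010, 2013)
print(in_range, out_of_range)
```

A line that does not match the format raises ValueError, which the caller can catch if malformed input is possible.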
You can use str.rsplit:
>>> strs = 'Jan 31 00:57:07 2012 GMT'
str.rsplit will return a list like this:
>>> strs.rsplit(None,2)
['Jan 31 00:57:07', '2012', 'GMT']
Now we need the second item:
>>> year = strs.rsplit(None,2)[1]
>>> year
'2012'
>>> if 2010 <= int(year) <= 2013:  # apply int() to get the integer value
...     pass  # do something
...
Try this:
st="Jan 31 00:57:07 2012 GMT".split()
year=int(st[3])
This works as long as the string is always in this format (the variable is named s to avoid shadowing the built-in str):
s = 'Jan 31 00:57:07 2012 GMT'
s.split()[3]