I have a df with dates in the format %B %Y (e.g. June 2021, December 2022 etc.)
Date
Price
Apr 2022
2
Dec 2021
8
I am trying to sort dates in order of oldest to newest but when I try:
.sort_values(by='Date', ascending=False)
it is ordering in alphabetical order.
The 'Date' column is an Object.
ascending=False will sort from newest to oldest, but you are asking to sort oldest to newest, so you don't need that option;
there is a key option to specify how to parse the values before sorting them;
you may or may not want option ignore_index=True, which I included below.
We can use the key option to parse the values into datetime objects with pandas.to_datetime.
import pandas as pd
df = pd.DataFrame({'Date': ['Apr 2022', 'Dec 2021', 'May 2022', 'May 2021'], 'Price': [2, 8, 12, 15]})
df = df.sort_values(by='Date', ignore_index=True, key=pd.to_datetime)
print(df)
# Date Price
# 0 May 2021 15
# 1 Dec 2021 8
# 2 Apr 2022 2
# 3 May 2022 12
Relevant documentation:
DataFrame.sort_values;
to_datetime.
Related
i want to convert a column with string date '19 Desember 2022' for example (the month name is in Indonesian), to supported datetime format without translating it, how do i do that?
already tried this one
df_train['date'] = pd.to_datetime(df_train['date'], format='%d %B %Y') but got error time data '19 Desember 2022' does not match format '%d %B %Y' (match)
incase if anyone want to see the row image
Try using dateparser
import dateparser
df_train = pd.DataFrame(['19 Desember 2022', '20 Desember 2022', '21 Desember 2022', '22 Desember 2022'], columns = ['date'])
df_train['date'] = [dateparser.parse(x) for x in df_train['date']]
df_train
Output:
date
0 2022-12-19
1 2022-12-20
2 2022-12-21
3 2022-12-22
Pandas doesn't recognize bahasa(indonesian language) Try replacing the spelling of December (as pointed out you can use a one liner and create a new column):
df_train["formatted_date"] = pd.to_datetime(df_train["date"].str.replace("Desember", "December"), format="%d %B %Y")
print(df_train)
Output:
user_type date formatted_date
0 Anggota 19 Desember 2022 2022-12-19
1 Anggota 19 Desember 2022 2022-12-19
2 Anggota 19 Desember 2022 2022-12-19
3 Anggota 19 Desember 2022 2022-12-19
4 Anggota 19 Desember 2022 2022-12-19
This question already has answers here:
Numbers of Day in Month
(4 answers)
Closed 3 months ago.
I have a pandas Series that is of the following format
dates = [Nov 2022, Dec 2022, Jan 2023, Feb 2023 ..]
I want to create a dataframe that takes these values and has the number of days. I have to consider of course the case if it is a leap year
I have created a small function that splits the dates into 2 dataframes and 2 lists of months depending if they have 30 or 31 days like the following
month = [Nov, Dec, Jan, Feb ..] and
year = [2022, 2022, 2023, 2023 ..]
and then use the isin function in a sense if the month is in listA then insert 31 days etc. I also check for the leap years. However, I was wondering if there is a way to automate this whole proces with the pd.datetime
If you want the number of days in this month:
dates = pd.Series(['Nov 2022', 'Dec 2022', 'Jan 2023', 'Feb 2023'])
out = (pd.to_datetime(dates, format='%b %Y')
.dt.days_in_month
)
# Or
out = (pd.to_datetime(dates, format='%b %Y')
.add(pd.offsets.MonthEnd(0))
.dt.day
)
Output:
0 30
1 31
2 31
3 28
dtype: int64
previous interpretation
If I understand correctly, you want the day of year?
Assuming:
dates = pd.Series(['Nov 2022', 'Dec 2022', 'Jan 2023', 'Feb 2023'])
You can use:
pd.to_datetime(dates, format='%b %Y').dt.dayofyear
NB. The reference is the start of each month.
Output:
0 305
1 335
2 1
3 32
dtype: int64
I have a pandas date column in the following format
Date
0 March 13, 2020, March 13, 2020
1 3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021
2 NaN
3 May 20, 2022, May 21, 2022
I tried to convert the pattern to a single pattern to store to a new column.
import pandas as pd
import dateutil.parser
# initialise data of lists.
data = {'Date':['March 13, 2020, March 13, 2020', '3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021', 'NaN','May 20, 2022, May 21, 2022']}
# Create DataFrame
df = pd.DataFrame(data)
df["FormattedDate"] = df.Date.apply(lambda x: dateutil.parser.parse(x.strftime("%Y-%m-%d") ))
But i am getting an error
AttributeError: 'str' object has no attribute 'strftime'
Desired Output
Date DateFormatted
0 March 13, 2020, March 13, 2020 2020-03-13, 2020-03-13
1 3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021 2021-03-09, 2021-03-09, 2021-03-09, 2021-09-03
2 NaN NaN
3 May 20, 2022, May 21, 2022 2022-05-20, 2022-05-21
I was authot of previous solution, so possible solution is change also it for avoid , like separator and like value in date strings is used Series.str.extractall, converting to datetimes and last is aggregate join:
format_list = ["[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{2,4}",
"[0-9]{1,2}(?:\.)(?:\s)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\s)?[0-9]{2,4}",
"(?:(?:(?:j|J)an)|(?:(?:f|F)eb)|(?:(?:m|M)ar)|(?:(?:a|A)pr)|(?:(?:m|M)ay)|(?:(?:j|J)un)|(?:(?:j|J)ul)|(?:(?:a|A)ug)|(?:(?:s|S)ep)|(?:(?:o|O)ct)|(?:(?:n|N)ov)|(?:(?:d|D)ec))\w*(?:\s)?(?:\n)?[0-9]{1,2}(?:\s)?(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}",
"[0-9]{1,2}(?:\.)?(?:\s)?(?:\n)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}"]
# initialise data of lists.
data = {'Name':['Today is 09 September 2021', np.nan, '25 December 2021 is christmas', '01/01/2022 is newyear and will be holiday on 02.01.2022 also']}
# Create DataFrame
df = pd.DataFrame(data)
import dateutil.parser
f = lambda x: dateutil.parser.parse(x).strftime("%Y-%m-%d")
df['DateFormatted'] = df['Name'].str.extractall(f'({"|".join(format_list)})')[0].apply(f).groupby(level=0).agg(','.join)
print (df)
Name DateFormatted
0 Today is 09 September 2021 2021-09-09
1 NaN NaN
2 25 December 2021 is christmas 2021-12-25
3 01/01/2022 is newyear and will be holiday on 0... 2022-01-01,2022-02-01
Another alternative is processing lists after remove missing values in generato comprehension with join:
import dateutil.parser
f = lambda x: dateutil.parser.parse(x).strftime("%Y-%m-%d")
df['Date'] = df['Name'].str.findall("|".join(format_list)).dropna().apply(lambda y: ','.join(f(x) for x in y))
print (df)
Name Date
0 Today is 09 September 2021 2021-09-09
1 NaN NaN
2 25 December 2021 is christmas 2021-12-25
3 01/01/2022 is newyear and will be holiday on 0... 2022-01-01,2022-02-01
a two part question
I'm attempting to transform a column into a datetime, an easy task I assume ? as I've done it before on different df's using the documentation without much issue.
df = pd.DataFrame({'date' : ['24 October 2018', '23 April 2018', '18 January 2018']})
print(df)
date
0 24 October 2018
1 23 April 2018
2 18 January 2018
I was going through the datetime docs and I thought this piece of code would convert this column (which is an object) into a datetime
df.date = pd.to_datetime(df['date'], format="%d-%m-%Y",errors='ignore')
which gives the error :
ValueError: time data '24 April 2018' does not match format '%d-%m-%Y' (match)
I've attempted playing with formulas and going through documentation to no avail!
You are using the wrong format. '24 October 2018' uses format="%d %B %Y". The format specifiers are listed here.
edit: -Demo-
>>> import pandas as pd
>>> df = pd.DataFrame({'date':['24 October 2018', '23 April 2018', '18 January 2018']})
>>> df.date = pd.to_datetime(df['date'], format="%d %B %Y")
>>>
>>> df
date
0 2018-10-24
1 2018-04-23
2 2018-01-18
>>>
>>> df['date'][0]
Timestamp('2018-10-24 00:00:00')
>>> df['date'][0].month
10
edit 2: second question
>>> df['status'] = ['complete', 'complete', 'requested']
>>> df
date status
0 2018-10-24 complete
1 2018-04-23 complete
2 2018-01-18 requested
>>>
>>> df[df['status'] != 'complete']
date status
2 2018-01-18 requested
You can use pd.to_datetime or the datetime library
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x,'%d %B %Y'))
I have a dataframe which has a Date column, I want to remove those row from Date column which doesn't have YYYY (eg, 2018, it can be any year) format.
I had used apply method with regex expression but doesn't work ,
df[df.Date.apply(lambda x: re.findall(r'[0-9]{4}', x))]
The Date column can have values such as,
12/3/2018
March 12, 2018
stackoverflow
Mar 12, 2018
no date text
3/12/2018
So here output should be
12/3/2018
March 12, 2018
Mar 12, 2018
3/12/2018
This is one approach. Using pd.to_datetime with errors="coerce"
Ex:
import pandas as pd
df = pd.DataFrame({"Col1": ['12/3/2018', 'March 12, 2018', 'stackoverflow', 'Mar 12, 2018', 'no date text', '3/12/2018']})
df["Col1"] = pd.to_datetime(df["Col1"], errors="coerce")
df = df[df["Col1"].notnull()]
print(df)
Output:
Col1
0 2018-12-03
1 2018-03-12
3 2018-03-12
5 2018-03-12
Or if you want to maintain the original data
import pandas as pd
def validateDate(d):
try:
pd.to_datetime(d)
return d
except:
return None
df = pd.DataFrame({"Col1": ['12/3/2018', 'March 12, 2018', 'stackoverflow', 'Mar 12, 2018', 'no date text', '3/12/2018']})
df["Col1"] = df["Col1"].apply(validateDate)
df.dropna(inplace=True)
print(df)
Output:
Col1
0 12/3/2018
1 March 12, 2018
3 Mar 12, 2018
5 3/12/2018