convert series of dates to int number of days [duplicate] - python

This question already has answers here:
Numbers of Day in Month
(4 answers)
Closed 3 months ago.
I have a pandas Series in the following format:
dates = [Nov 2022, Dec 2022, Jan 2023, Feb 2023 ..]
I want to create a DataFrame that takes these values and gives the number of days in each month. Of course, I have to account for leap years.
I have created a small function that splits the dates into 2 DataFrames and 2 lists of months, depending on whether the months have 30 or 31 days, like the following:
month = [Nov, Dec, Jan, Feb ..] and
year = [2022, 2022, 2023, 2023 ..]
and then use the isin function so that if the month is in listA, 31 days are inserted, etc. I also check for leap years. However, I was wondering if there is a way to automate this whole process with pd.datetime.

If you want the number of days in each month:
dates = pd.Series(['Nov 2022', 'Dec 2022', 'Jan 2023', 'Feb 2023'])
out = (pd.to_datetime(dates, format='%b %Y')
.dt.days_in_month
)
# Or
out = (pd.to_datetime(dates, format='%b %Y')
.add(pd.offsets.MonthEnd(0))
.dt.day
)
Output:
0 30
1 31
2 31
3 28
dtype: int64
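Since the question mentions leap years, here is a small sketch that builds the DataFrame the asker wanted (original string plus day count). The extra "Feb 2024" value is a hypothetical sample added only to show that `days_in_month` handles leap years without any extra logic:

```python
import pandas as pd

# Sample month strings; "Feb 2024" is added to show a leap year
dates = pd.Series(["Nov 2022", "Dec 2022", "Jan 2023", "Feb 2023", "Feb 2024"])

# Each value is parsed as the 1st of that month
parsed = pd.to_datetime(dates, format="%b %Y")
result = pd.DataFrame({"date": dates, "days": parsed.dt.days_in_month})
print(result)
```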
Previous interpretation (day of year):
If I understand correctly, you want the day of year?
Assuming:
dates = pd.Series(['Nov 2022', 'Dec 2022', 'Jan 2023', 'Feb 2023'])
You can use:
pd.to_datetime(dates, format='%b %Y').dt.dayofyear
NB. The reference is the start of each month.
Output:
0 305
1 335
2 1
3 32
dtype: int64
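Because the day of year is counted from the start of each month, it shifts by one for months after February in a leap year. A short sketch with hypothetical sample values, also deriving the day of year of the month's last day by adding `days_in_month - 1`:

```python
import pandas as pd

# Same month in a common year and in a leap year (hypothetical sample values)
dates = pd.Series(["Mar 2023", "Mar 2024"])
parsed = pd.to_datetime(dates, format="%b %Y")

start_doy = parsed.dt.dayofyear                          # day of year of the 1st
end_doy = start_doy + parsed.dt.days_in_month - 1        # day of year of the last day
print(start_doy.tolist(), end_doy.tolist())
```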

Related

How to convert a column of string dates to datetime format

I want to convert a column of string dates such as '19 Desember 2022' (the month name is in Indonesian) to a supported datetime format without translating it. How do I do that?
I already tried this:
df_train['date'] = pd.to_datetime(df_train['date'], format='%d %B %Y')
but got the error: time data '19 Desember 2022' does not match format '%d %B %Y' (match)
Try using dateparser:
import dateparser
import pandas as pd
df_train = pd.DataFrame(['19 Desember 2022', '20 Desember 2022', '21 Desember 2022', '22 Desember 2022'], columns = ['date'])
df_train['date'] = [dateparser.parse(x) for x in df_train['date']]
print(df_train)
Output:
date
0 2022-12-19
1 2022-12-20
2 2022-12-21
3 2022-12-22
Pandas doesn't recognize Indonesian (Bahasa Indonesia) month names. Try replacing the spelling of December (as pointed out, you can do it in a one-liner and create a new column):
df_train["formatted_date"] = pd.to_datetime(df_train["date"].str.replace("Desember", "December"), format="%d %B %Y")
print(df_train)
Output:
user_type date formatted_date
0 Anggota 19 Desember 2022 2022-12-19
1 Anggota 19 Desember 2022 2022-12-19
2 Anggota 19 Desember 2022 2022-12-19
3 Anggota 19 Desember 2022 2022-12-19
4 Anggota 19 Desember 2022 2022-12-19
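The replace approach above only handles December. A minimal sketch that extends it to every month, assuming the dates always follow the "day month-name year" layout. The dictionary and function name are hypothetical, and `%B` assumes English month names (the default C locale):

```python
import pandas as pd

# Assumed mapping of Indonesian month names to English; April, September and
# November are spelled the same in both languages, so they are not listed.
ID_TO_EN_MONTHS = {
    "Januari": "January",
    "Februari": "February",
    "Maret": "March",
    "Mei": "May",
    "Juni": "June",
    "Juli": "July",
    "Agustus": "August",
    "Oktober": "October",
    "Desember": "December",
}


def parse_indonesian_dates(s: pd.Series) -> pd.Series:
    """Translate Indonesian month names to English, then parse with an explicit format."""
    english = s
    for indonesian, name in ID_TO_EN_MONTHS.items():
        # \b word boundaries so only whole month names are replaced
        english = english.str.replace(rf"\b{indonesian}\b", name, regex=True)
    return pd.to_datetime(english, format="%d %B %Y")


dates = pd.Series(["19 Desember 2022", "1 Mei 2023", "17 Agustus 2023", "5 Maret 2024"])
result = parse_indonesian_dates(dates)
print(result)
```

Unlike dateparser, this adds no dependency, but it only works for the one layout encoded in the format string.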

I am working with a dataset where a few values in the Date column look like ' 2 Months Ago' or ' 28 Days Ago'. How can I change these dates to Month-Year?

I am thinking of using an if condition, but is there a library or method I don't know about that can solve this?
For values containing month(s), you can subtract the extracted number of months from the current month-year period and assign the result back to the DataFrame. For values containing day(s), convert the numbers to timedeltas and subtract them from the current date. Series.rsub is used to subtract from the right side:
print (df)
col
0 28 days ago
1 4 months ago
2 11 months ago
3 Oct, 2021
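import pandas as pd

now = pd.Timestamp('now')
per = now.to_period('m')   # current month as a Period
date = now.floor('d')      # today at midnight
# numbers in front of "month"/"day" (NaN where absent); raw strings avoid invalid-escape warnings
s = df['col'].str.extract(r'(\d+)\s*month', expand=False).astype(float)
s1 = df['col'].str.extract(r'(\d+)\s*day', expand=False).astype(float)
mask, mask1 = s.notna(), s1.notna()
# current period minus n months
df.loc[mask, 'col'] = s[mask].astype(int).rsub(per).dt.strftime('%b, %Y')
# today minus n days
df.loc[mask1, 'col'] = pd.to_timedelta(s1[mask1], unit='d').rsub(date).dt.strftime('%b, %Y')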
print (df)
col
0 Sep, 2022
1 Jun, 2022
2 Nov, 2021
3 Oct, 2021
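A row-by-row alternative in a single helper, sketched under the assumption that every relative value has the shape "N day(s) ago" or "N month(s) ago" (case-insensitive, possibly with leading spaces) and that anything else is already "Mon, YYYY". The function name is hypothetical, and the reference date is passed in explicitly so the result is reproducible:

```python
import re

import pandas as pd


def relative_to_month_year(value: str, now: pd.Timestamp) -> str:
    """Turn 'N days ago' / 'N months ago' into 'Mon, YYYY'; pass other values through."""
    m = re.search(r"(\d+)\s*(day|month)s?\s+ago", value.strip(), flags=re.IGNORECASE)
    if m is None:
        return value.strip()  # assumed to be a 'Mon, YYYY' value already
    n, unit = int(m.group(1)), m.group(2).lower()
    if unit == "month":
        when = now - pd.DateOffset(months=n)   # calendar-aware month shift
    else:
        when = now - pd.Timedelta(days=n)
    return when.strftime("%b, %Y")


now = pd.Timestamp("2022-10-28")  # fixed reference date; use pd.Timestamp("now") in practice
df = pd.DataFrame({"col": ["28 days ago", "4 months ago", "11 months ago", "Oct, 2021", " 2 Months Ago"]})
df["col"] = df["col"].map(lambda v: relative_to_month_year(v, now))
print(df)
```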
Assuming this input:
col
0 4 months ago
1 Oct, 2021
2 9 months ago
You can use:
# try to get a date:
s = pd.to_datetime(df['col'], errors='coerce')
# extract the month offset
offset = (df['col']
.str.extract(r'(\d+) months? ago', expand=False)
.fillna(0).astype(int)
)
# if the date is NaT, replace by today - n months
df['date'] = s.fillna(pd.Timestamp('today').normalize()
- offset*pd.DateOffset(months=1))
If you want a Mon, Year format:
df['date2'] = df['col'].where(offset.eq(0),
(pd.Timestamp('today').normalize()
-offset*pd.DateOffset(months=1)
).dt.strftime('%b, %Y')
)
output:
col date date2
0 4 months ago 2022-06-28 Jun, 2022
1 Oct, 2021 2021-10-01 Oct, 2021
2 9 months ago 2022-01-28 Jan, 2022

Python: Order Dates that are in the format: %B %Y

I have a df with dates in the format %B %Y (e.g. June 2021, December 2022 etc.)
Date      Price
Apr 2022  2
Dec 2021  8
I am trying to sort dates in order of oldest to newest but when I try:
.sort_values(by='Date', ascending=False)
it is ordering in alphabetical order.
The 'Date' column has dtype object.
ascending=False will sort from newest to oldest, but you are asking to sort oldest to newest, so you don't need that option;
there is a key option to specify how to parse the values before sorting them;
you may or may not want option ignore_index=True, which I included below.
We can use the key option to parse the values into datetime objects with pandas.to_datetime.
import pandas as pd
df = pd.DataFrame({'Date': ['Apr 2022', 'Dec 2021', 'May 2022', 'May 2021'], 'Price': [2, 8, 12, 15]})
df = df.sort_values(by='Date', ignore_index=True, key=pd.to_datetime)
print(df)
# Date Price
# 0 May 2021 15
# 1 Dec 2021 8
# 2 Apr 2022 2
# 3 May 2022 12
Relevant documentation:
DataFrame.sort_values;
to_datetime.
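The question says %B (full month names) but the sample uses abbreviations such as "Apr 2022". A hedged variant of the answer's code passes the format explicitly in the key, so parsing does not rely on format inference and fails loudly on a mismatch. "%b" is the abbreviated month name; swap in "%B" for values like "June 2021":

```python
import pandas as pd

df = pd.DataFrame({"Date": ["Apr 2022", "Dec 2021", "May 2022", "May 2021"],
                   "Price": [2, 8, 12, 15]})

# key receives the whole 'Date' column and returns datetimes to sort on
oldest_first = df.sort_values(
    by="Date",
    key=lambda col: pd.to_datetime(col, format="%b %Y"),
    ignore_index=True,
)
newest_first = df.sort_values(
    by="Date",
    key=lambda col: pd.to_datetime(col, format="%b %Y"),
    ascending=False,
    ignore_index=True,
)
print(oldest_first)
print(newest_first)
```

The 'Date' column itself stays as strings; only the sort order is taken from the parsed values.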

How can I order the table by month and then show it in a line chart? Python

I want to order the table by year and by month.
sort_values doesn't work for me.
After that, I need to show it in a line chart with the months over time.
How can I do it?
df10=df.groupby(['year','month'],as_index=False).Sales.sum()
df10
year month Sales
0 2018 Apr 452546547.720000
1 2018 Aug 452830473.750001
2 2018 Dec 525888501.900000
3 2018 Feb 417589010.130000
4 2018 Jan 506665837.860000
5 2018 Jul 527113871.520000
6 2018 Jun 489527703.960000
7 2018 Mar 471807206.670001
8 2018 May 517740285.600000
9 2018 Nov 417862539.330000
10 2018 Oct 441153829.710001
11 2018 Sep 450298873.800000
12 2019 Apr 440397073.890000
13 2019 Feb 408684717.060001
14 2019 Jan 511212275.310001
15 2019 Mar 455560627.320000
16 2019 May 571120956.510000
sns.lineplot(x='month',y='Sales',data=df10)
To sort by month, you need the month as a number (or as text that sorts in calendar order). Either way, the code below gets the month as a number; then plot the df however you like.
from time import strptime
df['month_num'] = [strptime(x, '%b').tm_mon for x in df['month']]
df = df.sort_values(['year', 'month_num'])
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data['y-m'] = data['year'].astype(str) + '-' + data['month']
data['y-m'] = pd.to_datetime(data['y-m'], format='%Y-%b')
sns.lineplot(y='Sales', x='y-m', data=data)
plt.xticks(rotation=45)
plt.show()
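When sorting by dates, you first need to convert your data into something that sorts chronologically, and the key parameter helps you with that. Note that key is applied to each column listed in by separately, not to whole rows, and datetime.date(year, month) would also need a day argument, so it is simpler to convert the month abbreviations to month numbers:
df10 = df10.sort_values(
    by=['year', 'month'],
    # key receives one column at a time; map 'Jan'..'Dec' to 1..12 and leave 'year' unchanged
    key=lambda col: pd.to_datetime(col, format='%b').dt.month if col.name == 'month' else col,
)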
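Another common approach is to make the month column an ordered Categorical, so that sorting follows calendar order instead of alphabetical order. A sketch using a subset of the question's grouped values; it assumes the month names are English abbreviations (calendar.month_abbr follows the current locale, which is English by default):

```python
import calendar

import pandas as pd

# Subset of the grouped data from the question
df10 = pd.DataFrame({
    "year": [2018, 2018, 2018, 2019, 2019],
    "month": ["Apr", "Dec", "Feb", "Jan", "Mar"],
    "Sales": [452546547.72, 525888501.90, 417589010.13, 511212275.310001, 455560627.32],
})

# calendar.month_abbr is ['', 'Jan', 'Feb', ...]; drop the empty first entry
month_order = list(calendar.month_abbr)[1:]
df10["month"] = pd.Categorical(df10["month"], categories=month_order, ordered=True)
df10 = df10.sort_values(["year", "month"], ignore_index=True)
print(df10)
```

For the line chart, the month name alone repeats across years, so a combined year-month datetime column (as in the `y-m` answer) is still the better x axis.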

Changing date to datetime Python

I have 2 columns; both contain a date but no year.
I can't just do a conversion because part of the date, the year, is missing.
If I try:
pd.to_datetime(df13["Date"])
I get the error:
Out of bounds nanosecond timestamp: 1-04-16 00:00:00
Sample:
date1     date2
Apr 16    Apr 15 4:30PM
Mar 17    Mar 14 3:35PM
Feb 7     Feb 3 2:03PM
Dec 21    Dec 19 3:21PM
I would like to make it a datetime column with a year, and if the new date is later than today, subtract a year. The data in the list goes back as far as a year, so simply adding 2020 as the year may be wrong in some cases.
I got part of my answer for date1:
from datetime import datetime
df13["Date"] = df13["Date"] + ' 2020'
df13["Date"] = df13["Date"].apply(lambda x: datetime.strptime(x, "%b %d %Y"))
For date2:
df13["SEC Form 4"].apply(lambda x: datetime.strptime(x, "%b %d %I:%M%p"))
I still have the year problem and need to add more logic.
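The missing logic ("if the new date is later than today, subtract a year") can be vectorized. A minimal sketch, assuming every value lies within the year before the reference date; the helper name is hypothetical, the columns follow the sample's date1/date2, and "%I:%M%p" assumes times written like "4:30PM" with no space. One known gap: "Feb 29" will not parse when the reference year is not a leap year.

```python
import pandas as pd


def add_inferred_year(s: pd.Series, fmt: str, today: pd.Timestamp) -> pd.Series:
    """Append today's year to yearless date strings, moving future dates back one year."""
    parsed = pd.to_datetime(s + f" {today.year}", format=f"{fmt} %Y")
    # Dates that land after `today` must belong to the previous year
    return parsed.where(parsed <= today, parsed - pd.DateOffset(years=1))


today = pd.Timestamp("2021-03-20")  # fixed for reproducibility; use pd.Timestamp.today() in practice
df13 = pd.DataFrame({
    "date1": ["Apr 16", "Mar 17", "Feb 7", "Dec 21"],
    "date2": ["Apr 15 4:30PM", "Mar 14 3:35PM", "Feb 3 2:03PM", "Dec 19 3:21PM"],
})
df13["date1"] = add_inferred_year(df13["date1"], "%b %d", today)
df13["date2"] = add_inferred_year(df13["date2"], "%b %d %I:%M%p", today)
print(df13)
```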
