I have 2 columns; both contain a date but no year.
I can't just convert them because the year is missing from the date.
If I try:
pd.to_datetime(df13["Date"])
I get error:
Out of bounds nanosecond timestamp: 1-04-16 00:00:00
Sample:
date1     date2
Apr 16    Apr 15 4:30PM
Mar 17    Mar 14 3:35PM
Feb 7     Feb 3 2:03PM
Dec 21    Dec 19 3:21PM
I'd like to make each one a datetime column with a year, and if the resulting date is later than today, subtract a year. The data goes back as far as a year, so simply adding 2020 as the year would be wrong in some cases.
I got part of the answer. For date1:
df13["Date"] = df13["Date"] + ' 2020'
df13["Date"] = df13["Date"].apply(lambda x: datetime.strptime(x, "%b %d %Y"))
For date2:
df13["SEC Form 4"].apply(lambda x: datetime.strptime(x, "%b %d %I:%M%p"))
I still have the year problem and need to add more logic.
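The missing logic could be sketched roughly as follows. This is only an illustration: the helper name `add_missing_year` and the fixed reference date are assumptions made here for reproducibility, not part of the original post. It appends the current year, parses, and then moves any date that lands in the future back by one year.

```python
import pandas as pd

def add_missing_year(values, fmt, today=None):
    """Parse year-less strings, assuming every date lies within the past year.

    `add_missing_year` is a hypothetical helper written for this example.
    """
    today = pd.Timestamp.now() if today is None else pd.Timestamp(today)
    # Append the current year so the parser always sees a complete date.
    parsed = pd.to_datetime(values + f" {today.year}", format=f"{fmt} %Y")
    # A date later than "today" must belong to the previous year.
    in_future = parsed > today
    return parsed.where(~in_future, parsed - pd.DateOffset(years=1))

df13 = pd.DataFrame({
    "Date": ["Apr 16", "Mar 17", "Feb 7", "Dec 21"],
    "SEC Form 4": ["Apr 15 4:30PM", "Mar 14 3:35PM", "Feb 3 2:03PM", "Dec 19 3:21PM"],
})

today = "2020-04-20"  # assumed reference date, fixed so the example is reproducible
df13["Date"] = add_missing_year(df13["Date"], "%b %d", today)
df13["SEC Form 4"] = add_missing_year(df13["SEC Form 4"], "%b %d %I:%M%p", today)
print(df13)
```

With that reference date, the December rows end up in 2019 and the rest in 2020. One caveat: a "Feb 29" string will fail to parse if the current year is not a leap year, so that case would need separate handling.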
I have a pandas Series that is of the following format
dates = [Nov 2022, Dec 2022, Jan 2023, Feb 2023 ..]
I want to create a dataframe that takes these values and has the number of days in each month. Of course, I have to account for leap years.
I have created a small function that splits the dates into 2 dataframes, plus 2 lists of months depending on whether they have 30 or 31 days, like the following
month = [Nov, Dec, Jan, Feb ..] and
year = [2022, 2022, 2023, 2023 ..]
I then use the isin function: if the month is in listA, insert 31 days, and so on. I also check for leap years. However, I was wondering whether there is a way to automate this whole process with pd.datetime.
If you want the number of days in each month:
dates = pd.Series(['Nov 2022', 'Dec 2022', 'Jan 2023', 'Feb 2023'])
out = (pd.to_datetime(dates, format='%b %Y')
.dt.days_in_month
)
# Or
out = (pd.to_datetime(dates, format='%b %Y')
.add(pd.offsets.MonthEnd(0))
.dt.day
)
Output:
0 30
1 31
2 31
3 28
dtype: int64
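As a quick check that leap years are handled without any manual logic, here is a small sketch using February in three different years. The sample values and the `result` frame are made up for illustration; 2024 is a leap year, while 2023 and 2100 are not.

```python
import pandas as pd

# February in a common year, a leap year, and a century year that is not a leap year.
dates = pd.Series(["Feb 2023", "Feb 2024", "Feb 2100"])
days = pd.to_datetime(dates, format="%b %Y").dt.days_in_month

result = pd.DataFrame({"date": dates, "days": days})
print(result)
```

The `days` column comes out as 28, 29, 28, so the calendar rules are applied by pandas itself.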
previous interpretation
If I understand correctly, you want the day of year?
Assuming:
dates = pd.Series(['Nov 2022', 'Dec 2022', 'Jan 2023', 'Feb 2023'])
You can use:
pd.to_datetime(dates, format='%b %Y').dt.dayofyear
NB. The reference is the start of each month.
Output:
0 305
1 335
2 1
3 32
dtype: int64
My code is returning the following data in CSV
Quantity Date of purchase
1 17 May 2022 at 5:40:20PM BST
1 2 Apr 2022 at 7:41:29PM BST
1 2 Apr 2022 at 6:42:05PM BST
1 29 Mar 2022 at 12:34:56PM BST
1 29 Mar 2022 at 10:52:54AM BST
1 29 Mar 2022 at 12:04:52AM BST
1 28 Mar 2022 at 4:49:34PM BST
1 28 Mar 2022 at 11:13:37AM BST
1 27 Mar 2022 at 8:53:05PM BST
1 27 Mar 2022 at 5:10:21PM BST
I am trying to extract only the dates and sum the quantities that share the same date. Below is my code for that:
data = read_csv("products_sold_history_data.csv")
data['Date of purchase'] = pandas.to_datetime(data['Date of purchase'] , format='%d-%m-%Y').dt.date
But it's giving me an error. Can anyone please help: how can I take only the dates from the Date of purchase column and then sum the Quantity values for each date?
The date format in your data does not match the format you specified: format='%d-%m-%Y'.
You could specify the correct format explicitly, or let pandas infer it for you by not providing a format:
pandas.to_datetime(data['Date of purchase']).dt.date
If you want to specify the format explicitly, you should provide a format that matches your data (note %I, the 12-hour clock, to go with %p):
pandas.to_datetime(data['Date of purchase'], format='%d %b %Y at %I:%M:%S%p %Z')
Here is one way to do it, where the date is created on the fly as a temporary field and does not become part of the DataFrame.
Also, IIUC, you're not concerned with the time part; the date is all you need for summing.
Extract the date part using a regex, create a temporary field dte using DataFrame.assign, and then group by it to sum up the quantity:
df.assign(dte = pd.to_datetime(
df['purchase'].str.extract(r'(.*)(at)')[0].str.strip())
).groupby('dte')['qty'].sum().reset_index()
dte qty
0 2022-02-06 3
1 2022-02-07 3
2 2022-02-08 2
3 2022-02-09 2
4 2022-02-10 2
5 2022-02-11 3
6 2022-02-14 1
7 2022-02-15 1
8 2022-02-19 1
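The same idea can be sketched with the column names from the question. The rows below are a subset of the posted sample, and the new `date` column and `daily` frame are names chosen here for illustration. The text before " at " is parsed as a date, and the quantities are then summed per day.

```python
import pandas as pd

data = pd.DataFrame({
    "Quantity": [1, 1, 1, 1, 1],
    "Date of purchase": [
        "17 May 2022 at 5:40:20PM BST",
        "2 Apr 2022 at 7:41:29PM BST",
        "2 Apr 2022 at 6:42:05PM BST",
        "29 Mar 2022 at 12:34:56PM BST",
        "29 Mar 2022 at 10:52:54AM BST",
    ],
})

# Keep only the text before " at " (e.g. "17 May 2022") and parse it as a date.
day_text = data["Date of purchase"].str.split(" at ").str[0]
data["date"] = pd.to_datetime(day_text, format="%d %b %Y").dt.date

# Sum the quantities that fall on the same calendar day.
daily = data.groupby("date", as_index=False)["Quantity"].sum()
print(daily)
```

Dropping the time and the time zone label before parsing sidesteps the question of how "BST" should be interpreted, which is acceptable here only because the time of day is not needed.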
I want to order the table by year and by month.
sort_values doesn't work for me.
After that I need to show it in a line chart of sales by month over time.
How can I do it?
df10=df.groupby(['year','month'],as_index=False).Sales.sum()
df10
year month Sales
0 2018 Apr 452546547.720000
1 2018 Aug 452830473.750001
2 2018 Dec 525888501.900000
3 2018 Feb 417589010.130000
4 2018 Jan 506665837.860000
5 2018 Jul 527113871.520000
6 2018 Jun 489527703.960000
7 2018 Mar 471807206.670001
8 2018 May 517740285.600000
9 2018 Nov 417862539.330000
10 2018 Oct 441153829.710001
11 2018 Sep 450298873.800000
12 2019 Apr 440397073.890000
13 2019 Feb 408684717.060001
14 2019 Jan 511212275.310001
15 2019 Mar 455560627.320000
16 2019 May 571120956.510000
sns.lineplot(x='month',y='Sales',data=df10)
To sort by month, you need the month as a number, or as text that sorts correctly. Either way, see my code below to get the month as a number, then plot the df however you like.
from time import strptime
df['month_num'] = [strptime(x, '%b').tm_mon for x in df['month']]
df = df.sort_values(['year', 'month_num'])
data['y-m'] = data['year'].astype(str) +'-'+ data['month']
data['y-m'] = pd.to_datetime(data['y-m'], format='%Y-%b')
sns.lineplot(y='Sales',x='y-m',data=data)
plt.xticks(rotation=45)
plt.show()
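When sorting by dates, you first need to convert the month names into something that sorts chronologically.
The key parameter helps you with that; it is applied to each sort column in turn and receives the whole column as a Series.
df10.sort_values(by=["year", "month"], key=lambda col: pd.to_datetime(col, format="%b").dt.month if col.name == "month" else col)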
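Another way to get chronological order, sketched here with made-up sales figures, is to turn the month column into an ordered Categorical. The month list is hard-coded and assumes English three-letter abbreviations like those in the question.

```python
import pandas as pd

# Small stand-in for the grouped table from the question (values are invented).
df10 = pd.DataFrame({
    "year": [2018, 2018, 2018, 2019, 2019],
    "month": ["Apr", "Dec", "Jan", "Feb", "Jan"],
    "Sales": [4.5, 5.2, 5.0, 4.0, 5.1],
})

month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# An ordered categorical sorts by calendar position instead of alphabetically.
df10["month"] = pd.Categorical(df10["month"], categories=month_order, ordered=True)
df10 = df10.sort_values(["year", "month"]).reset_index(drop=True)
print(df10)
```

The month labels stay readable, which can be convenient for axis ticks, while sorting follows calendar order.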
I am attempting to use regex, extracting a date from df['Subject'], on a dataframe series/column and creating a new column df['Date'] with the resulting date extraction.
The following code is extracting most column dates:
Code:
df['Bug Date'] = df['Subject'].str.extract('(\s\w{3}\s\w{3}\s\d{1,2}\s\d{2}\:\d{2}\:\d{2}\s\d{4})')
Input: Typical text row in the df['Subject'] column:
'Call Don today [atron#uw.edu.au - Wed Apr 14 00:18:50 2021]'
' Report access [rbund#gmail.com - Mon Apr 4 13:11:12 2021]'
Output: 'Wed Apr 14 00:18:50 2021'
'Mon Apr 4 13:11:12 2021'
A few of the dates, however, all with single-digit days, show up as NaN. Another option I am trying is below.
I get no errors, and no changes, from this code:
option1 = r'(\s\w{3}\s\w{3}\s\d{1,2}\s\d{2}\:\d{2}\:\d{1}\s\d{4})'
df.replace({'Bug Date':'NaN'},{'Subject':option1},inplace=True)
with Pandas:
DataFrame.replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
Help would be appreciated. Why does \d{1,2} work on some single-digit days and not others? After careful analysis of the strings, I see no difference. However, the bug is consistent: 4 rows containing a single-digit day of the month change to NaN, while many other single-digit rows transfer to the new column just fine.
Here are a few rows of data. The first 4 are the trouble rows, out of about 200 rows with single and double digit day 'strings':
'Re: report [karen.glass#google.edu - Fri Apr 2 09:27:38 2021]', #results in NaN
'Re: report [hong.li#msoft.edu - Mon Apr 5 09:39:37 2021]', #results in NaN
'Re: report [sdgesmin#563.com - Wed Apr 7 09:21:02 2021]', #results in NaN
'Re: report [pdefgios#utonto.ca - Thu Apr 8 12:40:28 2021]', #results in NaN
'Re: report [zhuig-li7#mail.ghua.edu.cn - Tue Apr 13 02:38:51 2021]', #Good
'Re: report [l4ddgri#eie.grdf - Mon Mar 8 12:50:34 2021]', #Good
'Re: report [luca.jodfge#ki.sfge - Thu Apr 8 23:52:20 2021]' #Good
After much trial and error, I ended up using:
df['Bug Date'] = df['Subject'].str.slice(start=-25,stop=-1).str.pad(25)
Creating the date-and-time string column this way gave me no errors, but when I tried to convert it with to_datetime, an error would pop up on a seemingly random date. So I added an extra leading space to the to_datetime(format=) string, changing it from:
'%a %b %d %H:%M:%S %Y'
to
' %a %b %d %H:%M:%S %Y'
and that seems to have done the trick. <fingers_crossed>
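One possible explanation, which is an assumption rather than something the posted rows confirm, is that some single-digit days are padded with two spaces (as in "Apr  2"), and that the rendering here collapsed them. Under that assumption, a pattern using `\s+` instead of `\s` tolerates either spacing. A minimal sketch:

```python
import pandas as pd

# The first row deliberately has a double space before the day (assumed cause).
df = pd.DataFrame({"Subject": [
    "Re: report [karen.glass#google.edu - Fri Apr  2 09:27:38 2021]",
    "Re: report [zhuig-li7#mail.ghua.edu.cn - Tue Apr 13 02:38:51 2021]",
]})

# \s+ accepts one or more whitespace characters between the fields.
pattern = r"(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})"
text = df["Subject"].str.extract(pattern)[0]

# Collapse runs of whitespace so a single format string fits every row.
text = text.str.replace(r"\s+", " ", regex=True)
df["Bug Date"] = pd.to_datetime(text, format="%a %b %d %H:%M:%S %Y")
print(df["Bug Date"])
```

If the real data does not contain doubled spaces, the cause lies elsewhere; printing `repr()` of a failing row would show exactly which characters are present.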
I have a pandas dataframe with some timestamp values in a column. I wish to get the sum of values grouped by every hour.
Date_and_Time Frequency
0 Jan 08 15:54:39 NaN
1 Jan 09 10:48:13 NaN
2 Jan 09 10:42:24 NaN
3 Jan 09 20:18:46 NaN
4 Jan 09 12:08:23 NaN
I started off removing the leading days in the column and then typed the following to convert the values to date_time compliant format:
dateTimeValues['Date_and_Time'] = pd.to_datetime(dateTimeValues['Date_and_Time'], format='%b %d %H:%M:%S')
After doing so, I receive the following error:
ValueError: time data 'Jan 08 12:41:' does not match format '%b %d %H:%M:%S' (match)
On checking my input CSV, I can confirm that none of the values in this column are incomplete.
I'd like to know how to resolve this issue and successfully process my timestamps to their desired output format.
I suggest you write your own function that selects the appropriate format string for each value.
You may have to adapt the function to your data:
df = pd.DataFrame({'Date_and_Time':['Jan 08 15:54:39', 'Jan 09 10:48:']})
df
>>>
Date_and_Time
0 Jan 08 15:54:39
1 Jan 09 10:48:
Note the truncated value in row 1.
Now select the format string for every item with the function:
def my_lambda(x):
    f = '%b %d %H:%M:%S'
    if x.endswith(':'):
        f = '%b %d %H:%M:'
    return pd.to_datetime(x, format=f)
df['Date_and_Time'] = df['Date_and_Time'].apply(my_lambda)
>>> df
Date_and_Time
0 1900-01-08 15:54:39
1 1900-01-09 10:48:00
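To find out which rows are malformed in the first place, and then get the hourly sums the question asks for, something like the following sketch may help. The `Frequency` values are invented, since the posted sample only shows NaN. With `errors="coerce"`, values that do not match the format become NaT instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    "Date_and_Time": ["Jan 08 15:54:39", "Jan 09 10:48:13",
                      "Jan 09 10:42:24", "Jan 09 10:48:"],
    "Frequency": [1, 2, 3, 4],  # made-up values for illustration
})

# Non-matching strings become NaT rather than raising a ValueError.
parsed = pd.to_datetime(df["Date_and_Time"], format="%b %d %H:%M:%S", errors="coerce")

# Rows whose original text did not match the expected format.
bad_rows = df[parsed.isna()]

# Keep the rows that parsed, then sum Frequency per hour.
good = df.assign(Date_and_Time=parsed).dropna(subset=["Date_and_Time"])
hourly = good.groupby(good["Date_and_Time"].dt.floor("h"))["Frequency"].sum()
print(bad_rows)
print(hourly)
```

Because the strings carry no year, the parsed timestamps default to 1900, which does not matter for grouping by hour within the same data set.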