I want to write a program where I can compare the current date with a couple of dates that I have.
My data is:
12 JUN 2016
21 MAR 1989
15 MAR 1958
15 SEP 1958
23 OCT 1930
15 SEP 1928
10 MAR 2010
23 JAN 1928
15 NOV 1925
26 AUG 2009
29 APR 1987
20 JUL 1962
10 MAY 1960
13 FEB 1955
10 MAR 1956
3 MAR 2010
14 NOV 1958
4 AUG 1985
24 AUG 1956
15 FEB 1955
19 MAY 1987
30 APR 1990
8 SEP 2014
18 JAN 2012
14 DEC 1960
1 AUG 1998
7 SEP 1963
9 MAR 2012
1 MAY 1990
14 MAY 1985
15 JUN 1945
5 APR 1995
26 FEB 1987
13 DEC 1983
15 AUG 2009
16 SEP 1980
16 JAN 2005
19 JUN 2011
Now how can I compare these to the current date to check that no date exceeds the current date (i.e. 13/JUN/2016)?
Please help me, thank you.
You have to create a datetime object from the string data. You can do this by parsing the date string with the strptime method:
from datetime import datetime
mydate = datetime.strptime("19 JUN 2011", "%d %b %Y")
And then use the object to compare it with today's date.
print(mydate < datetime.today())
True
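To check the whole list the same way, a minimal sketch (using a few of the dates from the question):

```python
from datetime import datetime

# A few of the date strings from the question
dates = ["12 JUN 2016", "21 MAR 1989", "19 JUN 2011"]

today = datetime.today()
for s in dates:
    # %b matches abbreviated month names, case-insensitively
    d = datetime.strptime(s, "%d %b %Y")
    if d > today:
        print(s, "exceeds the current date")
    else:
        print(s, "does not exceed the current date")
```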
I have a dataframe df as below:
Student_id Date_of_visit(d/m/y)
1 1/4/2020
1 30/12/2019
1 26/12/2019
2 3/1/2021
2 10/1/2021
3 4/5/2020
3 22/8/2020
How can I get a bar graph with the x-axis as month-year (e.g. x-ticks: Dec 2019, Jan 2020, Feb 2020) and on the y-axis the total number of students (count) who visited in a particular month?
Convert the values to datetimes, then use DataFrame.resample with Resampler.size for the counts, and reformat the datetime index with DatetimeIndex.strftime:
import pandas as pd

df['Date_of_visit'] = pd.to_datetime(df['Date_of_visit'], dayfirst=True)
s = df.resample('M', on='Date_of_visit')['Student_id'].size()
s.index = s.index.strftime('%b %Y')
print(s)
Date_of_visit
Dec 2019 2
Jan 2020 0
Feb 2020 0
Mar 2020 0
Apr 2020 1
May 2020 1
Jun 2020 0
Jul 2020 0
Aug 2020 1
Sep 2020 0
Oct 2020 0
Nov 2020 0
Dec 2020 0
Jan 2021 2
Name: Student_id, dtype: int64
If you need to count only unique Student_id values, use Resampler.nunique:
s = df.resample('M', on='Date_of_visit')['Student_id'].nunique()
s.index = s.index.strftime('%b %Y')
print(s)
Date_of_visit
Dec 2019 1
Jan 2020 0
Feb 2020 0
Mar 2020 0
Apr 2020 1
May 2020 1
Jun 2020 0
Jul 2020 0
Aug 2020 1
Sep 2020 0
Oct 2020 0
Nov 2020 0
Dec 2020 0
Jan 2021 1
Name: Student_id, dtype: int64
Finally, plot with Series.plot.bar:
s.plot.bar()
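Put together, a self-contained sketch of the whole pipeline (the frame is rebuilt from the question's sample data):

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    'Student_id': [1, 1, 1, 2, 2, 3, 3],
    'Date_of_visit': ['1/4/2020', '30/12/2019', '26/12/2019',
                      '3/1/2021', '10/1/2021', '4/5/2020', '22/8/2020'],
})

df['Date_of_visit'] = pd.to_datetime(df['Date_of_visit'], dayfirst=True)

# Month-end resample: one bin per month, zero where no visits happened
s = df.resample('M', on='Date_of_visit')['Student_id'].size()
s.index = s.index.strftime('%b %Y')
print(s)

# s.plot.bar() then draws the chart (requires matplotlib)
```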
I am running into an issue trying to convert datetime values consistently into years, weeks, and months.
I was able to figure out how to convert a particular date into a Year/Wk/Month combination, but because of the overlap in week and month numbers, I am encountering duplicate combinations which I want to account for. For example:
2019/ Week 31 / Aug: this is because August 1 is still part of week 31 in the calendar, and the month extracted is August
2019/ Week 31 / Jul: this is because July 31 is still part of week 31 in the calendar, and the month extracted is July
My goal is to avoid having duplicates and wrong values extracted. Another example:
2019/ Week 01 / Dec: this is because December 31 is part of week 01 of the new year, but it's tied to calendar year 2019.
This is my code:
req_df is the original dataframe.
req_total_grouped groups the values after a loc filter, grouping by datecol, which is a date value (e.g. 2020-01-01).
import calendar
req_total_grouped = req_df.loc[req_df['datecol'] >= '2019-07-01'].groupby(req_df['datecol'])
req_total_df = req_total_grouped.count()
req_total_df['YEAR'] = req_total_df['datecol'].dt.year
req_total_df['WEEK'] = req_total_df['datecol'].dt.week.map("{:02}".format)
req_total_df['MONTH'] = req_total_df['datecol'].dt.month.apply(lambda x: calendar.month_abbr[x])
req_total_df['YR_WK_MTH'] = req_total_df['YEAR'].astype(str) + \
'/ Week ' + \
req_total_df['WEEK'].astype(str) + \
' / ' \
+ req_total_df['MONTH']
My desired output:
In cases where there are month overlaps, I would want there to be a uniform value. It doesn't matter which month I take they just need to be under the same week. (ex: 2019/ Week 31 / Aug and 2019/ Week 31 / Jul should consolidate into one single value '2019/ Week 31 / Aug' for example)
In cases where there are year over laps (ex: 2019 / Week 01 / Dec) should be 2020 / Week 01 / Jan
I guess grouping the rows by 'year' and 'week' and keeping the last value of each group gives your desired result. Can you try this?
Data (same as yours?)
import calendar
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('01/01/2019', '12/31/2020', freq='D')})
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month.apply(lambda x: calendar.month_abbr[x])
df['week'] = df['date'].dt.isocalendar().week.map("{:02}".format)  # dt.week is deprecated
df['yr_wk_mth'] = df['year'].astype(str) + ' / Week ' + df['week'] + ' / ' + df['month']
Code:
print(df.groupby(['year','week'])[['date','month','yr_wk_mth']].last())
Result:
date month yr_wk_mth
year week
2019 01 2019-12-31 Dec 2019 / Week 01 / Dec
02 2019-01-13 Jan 2019 / Week 02 / Jan
03 2019-01-20 Jan 2019 / Week 03 / Jan
04 2019-01-27 Jan 2019 / Week 04 / Jan
05 2019-02-03 Feb 2019 / Week 05 / Feb
06 2019-02-10 Feb 2019 / Week 06 / Feb
07 2019-02-17 Feb 2019 / Week 07 / Feb
08 2019-02-24 Feb 2019 / Week 08 / Feb
09 2019-03-03 Mar 2019 / Week 09 / Mar
10 2019-03-10 Mar 2019 / Week 10 / Mar
11 2019-03-17 Mar 2019 / Week 11 / Mar
12 2019-03-24 Mar 2019 / Week 12 / Mar
13 2019-03-31 Mar 2019 / Week 13 / Mar
14 2019-04-07 Apr 2019 / Week 14 / Apr
15 2019-04-14 Apr 2019 / Week 15 / Apr
16 2019-04-21 Apr 2019 / Week 16 / Apr
17 2019-04-28 Apr 2019 / Week 17 / Apr
18 2019-05-05 May 2019 / Week 18 / May
19 2019-05-12 May 2019 / Week 19 / May
20 2019-05-19 May 2019 / Week 20 / May
21 2019-05-26 May 2019 / Week 21 / May
22 2019-06-02 Jun 2019 / Week 22 / Jun
23 2019-06-09 Jun 2019 / Week 23 / Jun
24 2019-06-16 Jun 2019 / Week 24 / Jun
25 2019-06-23 Jun 2019 / Week 25 / Jun
26 2019-06-30 Jun 2019 / Week 26 / Jun
27 2019-07-07 Jul 2019 / Week 27 / Jul
28 2019-07-14 Jul 2019 / Week 28 / Jul
29 2019-07-21 Jul 2019 / Week 29 / Jul
30 2019-07-28 Jul 2019 / Week 30 / Jul
31 2019-08-04 Aug 2019 / Week 31 / Aug
32 2019-08-11 Aug 2019 / Week 32 / Aug
33 2019-08-18 Aug 2019 / Week 33 / Aug
34 2019-08-25 Aug 2019 / Week 34 / Aug
35 2019-09-01 Sep 2019 / Week 35 / Sep
36 2019-09-08 Sep 2019 / Week 36 / Sep
37 2019-09-15 Sep 2019 / Week 37 / Sep
38 2019-09-22 Sep 2019 / Week 38 / Sep
39 2019-09-29 Sep 2019 / Week 39 / Sep
40 2019-10-06 Oct 2019 / Week 40 / Oct
41 2019-10-13 Oct 2019 / Week 41 / Oct
42 2019-10-20 Oct 2019 / Week 42 / Oct
43 2019-10-27 Oct 2019 / Week 43 / Oct
44 2019-11-03 Nov 2019 / Week 44 / Nov
45 2019-11-10 Nov 2019 / Week 45 / Nov
46 2019-11-17 Nov 2019 / Week 46 / Nov
47 2019-11-24 Nov 2019 / Week 47 / Nov
48 2019-12-01 Dec 2019 / Week 48 / Dec
49 2019-12-08 Dec 2019 / Week 49 / Dec
50 2019-12-15 Dec 2019 / Week 50 / Dec
51 2019-12-22 Dec 2019 / Week 51 / Dec
52 2019-12-29 Dec 2019 / Week 52 / Dec
2020 01 2020-01-05 Jan 2020 / Week 01 / Jan
02 2020-01-12 Jan 2020 / Week 02 / Jan
03 2020-01-19 Jan 2020 / Week 03 / Jan
04 2020-01-26 Jan 2020 / Week 04 / Jan
05 2020-02-02 Feb 2020 / Week 05 / Feb
06 2020-02-09 Feb 2020 / Week 06 / Feb
07 2020-02-16 Feb 2020 / Week 07 / Feb
08 2020-02-23 Feb 2020 / Week 08 / Feb
09 2020-03-01 Mar 2020 / Week 09 / Mar
10 2020-03-08 Mar 2020 / Week 10 / Mar
11 2020-03-15 Mar 2020 / Week 11 / Mar
12 2020-03-22 Mar 2020 / Week 12 / Mar
13 2020-03-29 Mar 2020 / Week 13 / Mar
14 2020-04-05 Apr 2020 / Week 14 / Apr
15 2020-04-12 Apr 2020 / Week 15 / Apr
16 2020-04-19 Apr 2020 / Week 16 / Apr
17 2020-04-26 Apr 2020 / Week 17 / Apr
18 2020-05-03 May 2020 / Week 18 / May
19 2020-05-10 May 2020 / Week 19 / May
20 2020-05-17 May 2020 / Week 20 / May
21 2020-05-24 May 2020 / Week 21 / May
22 2020-05-31 May 2020 / Week 22 / May
23 2020-06-07 Jun 2020 / Week 23 / Jun
24 2020-06-14 Jun 2020 / Week 24 / Jun
25 2020-06-21 Jun 2020 / Week 25 / Jun
26 2020-06-28 Jun 2020 / Week 26 / Jun
27 2020-07-05 Jul 2020 / Week 27 / Jul
28 2020-07-12 Jul 2020 / Week 28 / Jul
29 2020-07-19 Jul 2020 / Week 29 / Jul
30 2020-07-26 Jul 2020 / Week 30 / Jul
31 2020-08-02 Aug 2020 / Week 31 / Aug
32 2020-08-09 Aug 2020 / Week 32 / Aug
33 2020-08-16 Aug 2020 / Week 33 / Aug
34 2020-08-23 Aug 2020 / Week 34 / Aug
35 2020-08-30 Aug 2020 / Week 35 / Aug
36 2020-09-06 Sep 2020 / Week 36 / Sep
37 2020-09-13 Sep 2020 / Week 37 / Sep
38 2020-09-20 Sep 2020 / Week 38 / Sep
39 2020-09-27 Sep 2020 / Week 39 / Sep
40 2020-10-04 Oct 2020 / Week 40 / Oct
41 2020-10-11 Oct 2020 / Week 41 / Oct
42 2020-10-18 Oct 2020 / Week 42 / Oct
43 2020-10-25 Oct 2020 / Week 43 / Oct
44 2020-11-01 Nov 2020 / Week 44 / Nov
45 2020-11-08 Nov 2020 / Week 45 / Nov
46 2020-11-15 Nov 2020 / Week 46 / Nov
47 2020-11-22 Nov 2020 / Week 47 / Nov
48 2020-11-29 Nov 2020 / Week 48 / Nov
49 2020-12-06 Dec 2020 / Week 49 / Dec
50 2020-12-13 Dec 2020 / Week 50 / Dec
51 2020-12-20 Dec 2020 / Week 51 / Dec
52 2020-12-27 Dec 2020 / Week 52 / Dec
53 2020-12-31 Dec 2020 / Week 53 / Dec
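If you instead want the year to always follow the week (so 31 Dec 2019 reports as 2020 / Week 01), the ISO calendar gives you exactly that pairing; a sketch, assuming the same date range:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2019-01-01', '2020-12-31', freq='D')})

# isocalendar() returns the ISO year/week pair, which never disagree:
# 2019-12-31 belongs to ISO year 2020, week 1
iso = df['date'].dt.isocalendar()
df['yr_wk'] = iso['year'].astype(str) + ' / Week ' + iso['week'].map('{:02}'.format)

print(df.loc[df['date'] == '2019-12-31', 'yr_wk'])
```

The month label could then be derived from a fixed day of each ISO week (e.g. its Thursday) so that it can never conflict with the week either.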
I have one dataframe which looks like below:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 21 Dec 2017 18 Dec 2017 21 Dec 2017
4 22 Dec 2017 22 Dec 2017
Conditions to be checked:
I want to check whether any row contains two dates, like the 3rd row. If so, split them into two separate rows.
Apply to_datetime on both columns.
I am trying to do the same operation like below:
df['Date_1'] = pd.to_datetime(df['Date_1'], format='%d %b %Y')
But getting below error:
ValueError: unconverted data remains:
Expected Output:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 18 Dec 2017
4 21 Dec 2017 21 Dec 2017
5 22 Dec 2017 22 Dec 2017
After using a regex with findall to get the dates, your problem becomes an unnesting problem:
s=df.apply(lambda x : x.str.findall(r'((?:\d{,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{,4})'))
unnesting(s,['Date_1','Date_2']).apply(pd.to_datetime)
Out[82]:
Date_1 Date_2
0 2017-12-05 2017-12-05
1 2017-12-14 2017-12-14
2 2017-12-15 2017-12-15
3 2017-12-18 2017-12-18
3 2017-12-21 2017-12-21
4 2017-12-22 2017-12-22
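The unnesting helper above is not part of pandas; one commonly used minimal version (assuming each listed column holds equal-length lists per row) looks like:

```python
import numpy as np
import pandas as pd

def unnesting(df, explode):
    """Explode the list-valued columns in `explode` into one row per element."""
    # Repeat each index label once per list element
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat(
        [pd.DataFrame({col: np.concatenate(df[col].values)}) for col in explode],
        axis=1)
    df1.index = idx
    # Re-attach any non-exploded columns
    return df1.join(df.drop(explode, axis=1), how='left')

# Hypothetical example mirroring row 3 of the question
demo = pd.DataFrame({'Date_1': [['18 Dec 2017', '21 Dec 2017'], ['22 Dec 2017']],
                     'Date_2': [['18 Dec 2017', '21 Dec 2017'], ['22 Dec 2017']]})
print(unnesting(demo, ['Date_1', 'Date_2']))
```

On pandas 0.25+, DataFrame.explode covers the single-column case directly.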
I have a dataframe that contains stacked monthly values and looks like:
Value Month
0 0.09187 Jan
1 0.72878 Feb
2 0.92052 Mar
3 -1.86845 Apr
4 -1.16489 May
5 -0.61433 Jun
6 0.68008 Jul
7 -1.50555 Aug
8 -0.18985 Sep
9 -1.11380 Oct
10 -0.63838 Nov
11 0.37527 Dec
12 0.234216 Jan
I would like to add a column of years, using a known range, so that the df looks like:
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1950
2 0.92052 Mar 1950
3 -1.86845 Apr 1950
4 -1.16489 May 1950
5 -0.61433 Jun 1950
6 0.68008 Jul 1950
7 -1.50555 Aug 1950
8 -0.18985 Sep 1950
9 -1.11380 Oct 1950
10 -0.63838 Nov 1950
11 0.37527 Dec 1950
12 0.234216 Jan 1951
I tried initializing a years list to apply to the column as:
years = list(range(1950, 2000))
df['Year'] = years * 12
But this produced
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1951
2 0.92052 Mar 1952
And so on. I've been unable to come up with any other approach.
As long as you know that you have Jan data for all your years, you could do:
df['Year'] = df['Month'].eq('Jan').cumsum()+1949
>>> df
Value Month Year
0 0.091870 Jan 1950
1 0.728780 Feb 1950
2 0.920520 Mar 1950
3 -1.868450 Apr 1950
4 -1.164890 May 1950
5 -0.614330 Jun 1950
6 0.680080 Jul 1950
7 -1.505550 Aug 1950
8 -0.189850 Sep 1950
9 -1.113800 Oct 1950
10 -0.638380 Nov 1950
11 0.375270 Dec 1950
12 0.234216 Jan 1951
Or, you could follow your original logic, but use np.repeat:
import numpy as np
years = list(range(1950, 2000))
df['Year'] = np.repeat(years,12)
Or another alternative:
df['Year'] = pd.date_range('1950-01-01', periods=len(df), freq='M').year
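To see why the cumsum version works: eq('Jan') marks the first row of each year with True, and the running sum of those marks numbers the years starting at 1. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({'Month': ['Jan', 'Feb', 'Dec', 'Jan', 'Feb']})

# Each 'Jan' bumps the running counter, so rows between two 'Jan's share a year
df['Year'] = df['Month'].eq('Jan').cumsum() + 1949
print(df['Year'].tolist())  # → [1950, 1950, 1950, 1951, 1951]
```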
I have this dataframe:
date value
1 Thu 17th Nov 2016 385.943800
2 Fri 18th Nov 2016 1074.160340
3 Sat 19th Nov 2016 2980.857860
4 Sun 20th Nov 2016 1919.723960
5 Mon 21st Nov 2016 884.279340
6 Tue 22nd Nov 2016 869.071070
7 Wed 23rd Nov 2016 760.289260
8 Thu 24th Nov 2016 2481.689270
9 Fri 25th Nov 2016 2745.990070
10 Sat 26th Nov 2016 2273.413250
11 Sun 27th Nov 2016 2630.414900
12 Mon 28th Nov 2016 817.322310
13 Tue 29th Nov 2016 1766.876030
14 Wed 30th Nov 2016 469.388420
I would like to change the format of the date column to this format YYYY-MM-DD. The dataframe consists of more than 200 rows, and every day new rows will be added, so I need to find a way to do this automatically.
This link is not helping, because it sets the dates like this: dates = ['30th November 2009', '31st March 2010', '30th September 2010'], and I can't do that for every row. Does anyone know a way to solve this?
Dateutil will do this job.
from dateutil import parser

df2 = df.copy()
df2['date'] = df2['date'].apply(parser.parse)
print(df2)
Output:
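To get all the way to the YYYY-MM-DD strings the question asks for, a minimal self-contained sketch (the frame is rebuilt from two of the question's rows):

```python
import pandas as pd
from dateutil import parser

df = pd.DataFrame({'date': ['Thu 17th Nov 2016', 'Fri 18th Nov 2016'],
                   'value': [385.943800, 1074.160340]})

# dateutil copes with the weekday names and ordinal suffixes ("17th"),
# then strftime renders the YYYY-MM-DD form
df['date'] = df['date'].apply(lambda x: parser.parse(x).strftime('%Y-%m-%d'))
print(df)
```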