Add column of repeating sequential values - python

I have a dataframe that contains stacked monthly values and looks like:
Value Month
0 0.09187 Jan
1 0.72878 Feb
2 0.92052 Mar
3 -1.86845 Apr
4 -1.16489 May
5 -0.61433 Jun
6 0.68008 Jul
7 -1.50555 Aug
8 -0.18985 Sep
9 -1.11380 Oct
10 -0.63838 Nov
11 0.37527 Dec
12 0.234216 Jan
I would like to add a column of years, using a known range, so that the df looks like:
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1950
2 0.92052 Mar 1950
3 -1.86845 Apr 1950
4 -1.16489 May 1950
5 -0.61433 Jun 1950
6 0.68008 Jul 1950
7 -1.50555 Aug 1950
8 -0.18985 Sep 1950
9 -1.11380 Oct 1950
10 -0.63838 Nov 1950
11 0.37527 Dec 1950
12 0.234216 Jan 1951
I tried initializing a years list to apply to the column as:
years = list(range(1950, 2000))
df['Year'] = years * 12
But this produced
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1951
2 0.92052 Mar 1952
And so on. I've been unable to come up with any other approach.

As long as you know that you have Jan data for all your years, you could do:
df['Year'] = df['Month'].eq('Jan').cumsum()+1949
>>> df
Value Month Year
0 0.091870 Jan 1950
1 0.728780 Feb 1950
2 0.920520 Mar 1950
3 -1.868450 Apr 1950
4 -1.164890 May 1950
5 -0.614330 Jun 1950
6 0.680080 Jul 1950
7 -1.505550 Aug 1950
8 -0.189850 Sep 1950
9 -1.113800 Oct 1950
10 -0.638380 Nov 1950
11 0.375270 Dec 1950
12 0.234216 Jan 1951
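Or, you could follow your original logic, but use np.repeat, which repeats each year 12 times in a row (years * 12 repeats the whole list instead). Slicing to len(df) keeps the lengths equal if the data does not cover the full range:
import numpy as np
years = list(range(1950, 2000))
df['Year'] = np.repeat(years, 12)[:len(df)]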
Or another alternative:
df['Year'] = pd.date_range('1950-01-01', periods=len(df), freq='MS').year
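A further variation on the same idea, as a minimal sketch: derive the year from the row position with integer division. This assumes the rows are in chronological order, start in January, and have no missing months. The sample frame, n_rows, and start_year below are made up for illustration.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical sample: 26 consecutive monthly rows starting in January.
n_rows = 26
df = pd.DataFrame({
    "Value": np.linspace(-1.0, 1.0, n_rows),
    "Month": [months[i % 12] for i in range(n_rows)],
})

start_year = 1950  # assumed first year of the known range

# Row position // 12 is 0 for the first twelve rows, 1 for the next twelve, ...
df["Year"] = start_year + np.arange(len(df)) // 12
```

Unlike np.repeat on a fixed list of years, this needs no slicing because the array is built from len(df) directly.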

Related

Pandas: How to draw bar graph on month over counts

I have a dataframe df as below:
Student_id Date_of_visit(d/m/y)
1 1/4/2020
1 30/12/2019
1 26/12/2019
2 3/1/2021
2 10/1/2021
3 4/5/2020
3 22/8/2020
How can I get a bar graph with month-year on the x-axis (e.g. x-ticks: Dec 2019, Jan 2020, Feb 2020) and, on the y-axis, the total number of student visits (count) in each month?
Convert the values to datetimes, then use DataFrame.resample with Resampler.size to get the counts, and reformat the resulting datetime index with DatetimeIndex.strftime:
df['Date_of_visit'] = pd.to_datetime(df['Date_of_visit'], dayfirst=True)
s = df.resample('M', on='Date_of_visit')['Student_id'].size()
s.index = s.index.strftime('%b %Y')
print (s)
Date_of_visit
Dec 2019 2
Jan 2020 0
Feb 2020 0
Mar 2020 0
Apr 2020 1
May 2020 1
Jun 2020 0
Jul 2020 0
Aug 2020 1
Sep 2020 0
Oct 2020 0
Nov 2020 0
Dec 2020 0
Jan 2021 2
Name: Student_id, dtype: int64
If you need to count only unique Student_id values, use Resampler.nunique:
s = df.resample('M', on='Date_of_visit')['Student_id'].nunique()
s.index = s.index.strftime('%b %Y')
print (s)
Date_of_visit
Dec 2019 1
Jan 2020 0
Feb 2020 0
Mar 2020 0
Apr 2020 1
May 2020 1
Jun 2020 0
Jul 2020 0
Aug 2020 1
Sep 2020 0
Oct 2020 0
Nov 2020 0
Dec 2020 0
Jan 2021 1
Name: Student_id, dtype: int64
Finally, plot with Series.plot.bar:
s.plot.bar()
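If months without any visits should be left out of the chart instead of being plotted as zero, grouping by a monthly period is one possible alternative to resample. This is only a sketch on the sample data from the question; the column is named Date_of_visit here, as in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Student_id": [1, 1, 1, 2, 2, 3, 3],
    "Date_of_visit": ["1/4/2020", "30/12/2019", "26/12/2019",
                      "3/1/2021", "10/1/2021", "4/5/2020", "22/8/2020"],
})
df["Date_of_visit"] = pd.to_datetime(df["Date_of_visit"], dayfirst=True)

# Group visits by calendar month; months with no visits do not appear.
month = df["Date_of_visit"].dt.to_period("M")
counts = df.groupby(month)["Student_id"].size()

# Label each group as e.g. "Dec 2019" (month names depend on the locale).
counts.index = counts.index.strftime("%b %Y")

# counts.plot.bar() would then draw the chart (requires matplotlib).
```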

Changing month format from 1, 2 to Jan, Feb

I have the following table:
data1
which produces:
month
1 -0.008999
2 0.032581
3 0.049919
4 0.072708
5 -0.037558
6 -0.017506
7 0.082839
8 -0.030190
9 0.006419
10 0.035679
11 0.065266
12 0.019905
Name: pct_day, dtype: float64
How can I turn the months into Jan, Feb, ... instead of 1, 2, ...?
You can use this:
import calendar
data1.month = data1.month.apply(lambda x: calendar.month_abbr[x])
or
data1.month = data1.month.apply(lambda x: calendar.month_abbr[int(x)])
Out[363]:
0 Jan
1 Feb
2 Mar
3 Apr
4 May
5 Jun
6 Jul
7 Aug
8 Sep
9 Oct
10 Nov
11 Dec
Name: month, dtype: object
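In the output shown in the question, month looks like the index of a Series rather than a column. If that is the case, the same lookup can be applied to the index instead. A minimal sketch, with a shortened, hypothetical stand-in for data1:

```python
import calendar
import pandas as pd

# Hypothetical stand-in for data1: a Series indexed by month number.
data1 = pd.Series(
    [-0.008999, 0.032581, 0.049919],
    index=pd.Index([1, 2, 3], name="month"),
    name="pct_day",
)

# calendar.month_abbr[1] is "Jan" in an English locale; position 0 is an empty string.
data1.index = data1.index.map(lambda m: calendar.month_abbr[int(m)])
```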

How to split one row into multiple and apply datetime on dataframe column?

I have one dataframe which looks like below:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 21 Dec 2017 18 Dec 2017 21 Dec 2017
4 22 Dec 2017 22 Dec 2017
Conditions to be checked:
Check whether any row contains two dates, like row 3. If so, split it into two separate rows.
Convert both columns to datetime.
I am trying to do the same operation like below:
df['Date_1'] = pd.to_datetime(df['Date_1'], format='%d %b %Y')
But getting below error:
ValueError: unconverted data remains:
Expected Output:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 18 Dec 2017
4 21 Dec 2017 21 Dec 2017
5 22 Dec 2017 22 Dec 2017
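After extracting the dates with a regex and findall, your problem becomes an unnesting problem. Note that unnesting below is a user-defined helper that expands list-valued columns into rows, not a pandas function: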
s=df.apply(lambda x : x.str.findall(r'((?:\d{,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{,4})'))
unnesting(s,['Date_1','Date_2']).apply(pd.to_datetime)
Out[82]:
Date_1 Date_2
0 2017-12-05 2017-12-05
1 2017-12-14 2017-12-14
2 2017-12-15 2017-12-15
3 2017-12-18 2017-12-18
3 2017-12-21 2017-12-21
4 2017-12-22 2017-12-22
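Since unnesting is not defined in the answer, here is a rough self-contained sketch that gets a similar result with Series.explode. It uses a simplified pattern that only matches the "D Mon YYYY" form from the question, and it assumes each row holds the same number of dates in both columns. The names pattern, lists, and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date_1": ["5 Dec 2017", "18 Dec 2017 21 Dec 2017", "22 Dec 2017"],
    "Date_2": ["5 Dec 2017", "18 Dec 2017 21 Dec 2017", "22 Dec 2017"],
})

# Simplified pattern for "D Mon YYYY"; narrower than the regex in the answer.
pattern = r"\d{1,2} [A-Za-z]{3} \d{4}"

# Each cell becomes a list of the date strings it contains.
lists = df.apply(lambda col: col.str.findall(pattern))

# Expand each column's lists into rows and renumber the rows from 0.
# Assumes Date_1 and Date_2 hold the same number of dates in every row.
result = pd.concat(
    [lists[c].explode().reset_index(drop=True) for c in lists.columns],
    axis=1,
)
result = result.apply(lambda col: pd.to_datetime(col, format="%d %b %Y"))
```

The index is reset here, so the split row becomes rows 1 and 2 instead of two rows labelled with the same original index.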

comparing date using dd/mm/yy format in python

I want to write a program where I can compare the current date with a list of dates that I have.
my data is
12 JUN 2016
21 MAR 1989
15 MAR 1958
15 SEP 1958
23 OCT 1930
15 SEP 1928
10 MAR 2010
23 JAN 1928
15 NOV 1925
26 AUG 2009
29 APR 1987
20 JUL 1962
10 MAY 1960
13 FEB 1955
10 MAR 1956
3 MAR 2010
14 NOV 1958
4 AUG 1985
24 AUG 1956
15 FEB 1955
19 MAY 1987
30 APR 1990
8 SEP 2014
18 JAN 2012
14 DEC 1960
1 AUG 1998
7 SEP 1963
9 MAR 2012
1 MAY 1990
14 MAY 1985
15 JUN 1945
5 APR 1995
26 FEB 1987
13 DEC 1983
15 AUG 2009
16 SEP 1980
16 JAN 2005
19 JUN 2011
Now how can I compare these to the current date, to check that a date does not exceed the current date (i.e. 13/JUN/2016)?
Please help me! Thank you.
You have to create a datetime object from the string data. You can do that by parsing the date string with the strptime method.
from datetime import datetime
mydate = datetime.strptime("19 JUN 2011", "%d %b %Y")
And then use the object to compare it with today's date.
print(mydate < datetime.today())
True
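To check the whole list rather than a single date, the same parsing can be applied in a loop. A minimal sketch using a few of the dates from the question; the last entry is a made-up future date added for illustration, and the reference date is fixed so the result is reproducible.

```python
from datetime import datetime

# A few dates from the question plus one hypothetical future date.
raw_dates = ["12 JUN 2016", "21 MAR 1989", "8 SEP 2014", "15 SEP 1928", "1 JAN 2030"]

# Fixed reference date; use datetime.today() to compare against the real current date.
reference = datetime(2016, 6, 13)

# %b matches abbreviated month names (locale-dependent, case-insensitive).
parsed = [datetime.strptime(s, "%d %b %Y") for s in raw_dates]

# Keep only the dates that do not exceed the reference date.
not_after = [d for d in parsed if d <= reference]
```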

Moving average for months over years

I am new to pandas and would appreciate guidance with the following problem. I have a dataframe that looks like the following:
In [88]: df.head()
Out[88]:
Jan Feb Mar Apr May Jun ... Dec
Year ...
1758 13 15 14 5 5 5 ... 12
1759 11 10 7 4 3 6 ... 11
1760 19 15 18 5 13 6 ... 11
1761 14 16 14 9 9 11 ... 10
1762 13 12 12 8 5 3 ... 11
I need to compute moving average per month in the following way:
Moving_average of Mar_1761 = (value_of_Mar_1761)/(sum of values from Sep_1760 to Aug_1761)
If I am using the rolling average function of pandas, how do I code the logic to inspect predecessor or successor row for a particular point?
The easiest approach is to reshape the data to a long format using .stack, which can then be passed straight into a rolling mean. (pd.rolling_mean has been removed from pandas; the .rolling method is its replacement.)
In [34]: df.stack().rolling(window=12).mean()
Out[34]:
Year
1758 Jan NaN
Feb NaN
Mar NaN
Apr NaN
May NaN
Jun NaN
Jul NaN
Aug NaN
Sep NaN
Oct NaN
Nov NaN
Dec 0.035038
1759 Jan -0.076660
Feb -0.153907
Mar -0.286818
Apr -0.306684
May -0.159371
Jun -0.230627
Jul -0.175845
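The rolling mean above uses a trailing 12-month window. For the ratio described in the question (the value divided by the sum from six months before to five months after), one possible sketch is a rolling sum shifted up by five rows. The two-year table below is invented so the numbers are easy to check, and the names stacked, window_sum, and ratio are illustrative only.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical two-year table holding the values 1..24 in calendar order.
df = pd.DataFrame(
    [list(range(1, 13)), list(range(13, 25))],
    index=pd.Index([1760, 1761], name="Year"),
    columns=months,
)

stacked = df.stack()  # long format: one row per (Year, month)

# rolling(12).sum() ends its window at the current row, so shifting the
# result up by 5 rows makes the window span 6 months before to 5 months after.
window_sum = stacked.rolling(window=12).sum().shift(-5)

# NaN wherever the full 12-month window is not available.
ratio = stacked / window_sum
```

For example, the entry for Jul 1760 is 7 divided by the sum of Jan to Dec 1760 (78), matching the Sep-to-Aug pattern in the question shifted to that month.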
