convert month of dates into sequence - python

I want to combine months across years into one sequence. For example, I have a dataframe like this:
stuff_id date
1 2015-02-03
2 2015-03-03
3 2015-05-19
4 2015-10-13
5 2016-01-07
6 2016-03-20
I want to number the months of the dates in sequence. The desired output is:
stuff_id date month
1 2015-02-03 1
2 2015-03-03 2
3 2015-05-19 4
4 2015-10-13 9
5 2016-01-07 12
6 2016-03-20 14
This means Feb 2015 is the first month in the date list, and Jan 2016 is the 12th month counting from Feb 2015.

If your date column is a datetime (if it's not, cast it to one), you can use the .dt.month and .dt.year properties for this!
https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.month.html
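Recast the date column to datetime (the data was loaded by copying the question's text, as in the answer to "Pasting data into a pandas dataframe"):
>>> import io
>>> import pandas as pd
>>> df = pd.read_table(io.StringIO(s), sep=r"\s+")  # s holds the table text copied from the question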
>>> df["date"] = pd.to_datetime(df["date"])
>>> df
stuff_id date
0 1 2015-02-03
1 2 2015-03-03
2 3 2015-05-19
3 4 2015-10-13
4 5 2016-01-07
5 6 2016-03-20
>>> df.dtypes
stuff_id int64
date datetime64[ns]
dtype: object
Convert years and months into a running month count (year * 12 + month), then make it relative to the earliest month:
>>> months = df["date"].dt.year * 12 + df["date"].dt.month # series
>>> df["months"] = months - months.min() + 1
>>> df
stuff_id date months
0 1 2015-02-03 1
1 2 2015-03-03 2
2 3 2015-05-19 4
3 4 2015-10-13 9
4 5 2016-01-07 12
5 6 2016-03-20 14
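The same arithmetic can be wrapped in a small reusable function. This is a minimal sketch: the name `month_sequence` is hypothetical, and it assumes the input is anything `pd.to_datetime` can parse. Because the offset is taken from the earliest month rather than from the first row, it also works when the rows are not sorted.

```python
import pandas as pd

def month_sequence(dates):
    """Number each date's month relative to the earliest month (earliest month = 1)."""
    dates = pd.to_datetime(pd.Series(dates))
    # Running month count: every year contributes 12 months.
    total_months = dates.dt.year * 12 + dates.dt.month
    # Shift so that the earliest month becomes 1.
    return total_months - total_months.min() + 1

df = pd.DataFrame({
    "stuff_id": [1, 2, 3, 4, 5, 6],
    "date": ["2015-02-03", "2015-03-03", "2015-05-19",
             "2015-10-13", "2016-01-07", "2016-03-20"],
})
df["month"] = month_sequence(df["date"])
print(df)
```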

Related

pandas get a sum column for next 7 days

For each row, I want to get the sum of a column's values over the next 7 days.
My dataframe:
date value
0 2021-04-29 1
1 2021-05-03 2
2 2021-05-06 1
3 2021-05-15 1
4 2021-05-17 2
5 2021-05-18 1
6 2021-05-21 2
7 2021-05-22 5
8 2021-05-24 4
I tried to make a new column that contains the date 7 days after the current date:
df['temp'] = df['date'] + timedelta(days=7)
Then I calculate the sum of values within the date range:
df['next_7days'] = df[(df.date > df.date) & (df.date <= df.temp)].value.sum()
But this gives me 0 in every row.
Intended result:
date value next_7days
0 2021-04-29 1 3
1 2021-05-03 2 1
2 2021-05-06 1 0
3 2021-05-15 1 10
4 2021-05-17 2 12
5 2021-05-18 1 11
6 2021-05-21 2 9
7 2021-05-22 5 4
8 2021-05-24 4 0
The method I am currently using is quite tedious. Are there any better methods to get the intended result?
With a list comprehension:
tomorrow_dates = df.date + pd.Timedelta("1 day")
next_week_dates = df.date + pd.Timedelta("7 days")
df["next_7days"] = [df.value[df.date.between(tomorrow, next_week)].sum()
for tomorrow, next_week in zip(tomorrow_dates, next_week_dates)]
Here we first compute tomorrow's and next week's dates for every row and store them. We then zip them together and, for each pair, use pd.Series.between to get a boolean Series marking the dates inside the range (both ends inclusive). Boolean indexing selects the matching values, which are then summed.
This gives:
date value next_7days
0 2021-04-29 1 3
1 2021-05-03 2 1
2 2021-05-06 1 0
3 2021-05-15 1 10
4 2021-05-17 2 12
5 2021-05-18 1 11
6 2021-05-21 2 9
7 2021-05-22 5 4
8 2021-05-24 4 0
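The list comprehension builds a full boolean mask for every row, so it does O(n²) work. A vectorized alternative, sketched below, uses a running total plus `numpy.searchsorted` to find each window's boundaries. It assumes `date` is sorted ascending and holds dates without a time component; the helper name `next_days_sum` is hypothetical. For date-only data, the window (d, d + 7 days] is the same as the `between(tomorrow, next_week)` range used above.

```python
import numpy as np
import pandas as pd

def next_days_sum(df, days=7, date_col="date", value_col="value"):
    """Sum value_col over the window (date, date + days] for every row.

    Assumes df is sorted by date_col in ascending order.
    """
    dates = df[date_col].to_numpy()
    values = df[value_col].to_numpy()
    # prefix[i] holds the sum of values[:i]; prefix[0] == 0.
    prefix = np.concatenate(([0], np.cumsum(values)))
    # First position strictly after each row's date (excludes the current day).
    start = np.searchsorted(dates, dates, side="right")
    # One past the last position on or before date + days (includes day 7).
    end = np.searchsorted(dates, dates + np.timedelta64(days, "D"), side="right")
    return prefix[end] - prefix[start]

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-04-29", "2021-05-03", "2021-05-06",
                            "2021-05-15", "2021-05-17", "2021-05-18",
                            "2021-05-21", "2021-05-22", "2021-05-24"]),
    "value": [1, 2, 1, 1, 2, 1, 2, 5, 4],
})
df["next_7days"] = next_days_sum(df)
print(df)
```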

extract chunk of Pandas dataframe from today's date up to "n" weeks ahead

I want to write code that cuts a dataframe of weekly prediction data down to an n-week prediction window starting from today's date.
A toy example of my dataframe looks like this:
data4 = pd.DataFrame({'Id' : ['001','002','003'],
'2020-01-01' : [4,5,6],
'2020-01-08':[3,5,6],
'2020-01-15': [2,6,7],
'2020-01-22': [2,6,7],
'2020-01-29': [2,6,7],
'2020-02-5': [2,6,7],
'2020-02-12': [4,4,4]})
Id 2020-01-01 2020-01-08 2020-01-15 2020-01-22 2020-01-29 2020-02-5 \
0 001 4 3 2 2 2 2
1 002 5 5 6 6 6 6
2 003 6 6 7 7 7 7
2020-02-12
0 4
1 4
2 4
I am trying to get:
dataset_for_analysis = pd.DataFrame({'Id' : ['001','002','003'],
'2020-01-15': [2,6,7],
'2020-01-22': [2,6,7],
'2020-01-29': [2,6,7],
'2020-02-5': [2,6,7]})
Id 2020-01-15 2020-01-22 2020-01-29 2020-02-5
0 001 2 2 2 2
1 002 6 6 6 6
2 003 7 7 7 7
I have tried this, based on what I understood from the datetime documentation:
dataset_for_analysis = data4.datetime.datetime.today+ pd.Timedelta('3 weeks')
and gives me the error:
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'datetime'
I am a bit confused about how to use datetime's today and timedelta, especially because I am working with weekly data. Is there a way to get the current week of the year rather than the day? Could anyone help with this? Thank you!
You can do the following:
today = '2020-01-15'
n_weeks = 10
# get dates by n weeks
cols = [str((pd.to_datetime(today) + pd.Timedelta(weeks=x)).date()) for x in range(n_weeks)]
# pick the columns which exist in cols
use_cols = ['Id'] + [x for x in data4.columns if x in cols]
# select the columns
data4 = data4[use_cols]
Id 2020-01-15 2020-01-22 2020-01-29 2020-02-12
0 001 2 2 2 4
1 002 6 6 6 4
2 003 7 7 7 4
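Note that this output drops the `2020-02-5` column: `str(...date())` produces zero-padded strings such as `2020-02-05`, which never equal the non-padded header. A sketch that compares dates instead of strings avoids this. The function name `select_next_n_weeks` is hypothetical; it assumes every column other than `Id` is a `YYYY-MM-DD` date string (Python's `strptime` accepts a non-padded day such as `5`) and keeps the columns dated in the half-open range [start, start + n weeks). In real code you would pass `date.today()` as `start`; the example fixes the date so the result is reproducible. For the current ISO week number, `date.today().isocalendar()[1]` returns it.

```python
from datetime import date, datetime, timedelta

import pandas as pd

def select_next_n_weeks(df, start, n_weeks, id_col="Id"):
    """Keep id_col plus the date-named columns in [start, start + n_weeks weeks)."""
    end = start + timedelta(weeks=n_weeks)
    keep = [id_col]
    for col in df.columns:
        if col == id_col:
            continue
        # Parse the header as a date; '%d' also accepts a non-padded day like '5'.
        col_date = datetime.strptime(col, "%Y-%m-%d").date()
        if start <= col_date < end:
            keep.append(col)
    return df[keep]

data4 = pd.DataFrame({'Id': ['001', '002', '003'],
                      '2020-01-01': [4, 5, 6],
                      '2020-01-08': [3, 5, 6],
                      '2020-01-15': [2, 6, 7],
                      '2020-01-22': [2, 6, 7],
                      '2020-01-29': [2, 6, 7],
                      '2020-02-5': [2, 6, 7],
                      '2020-02-12': [4, 4, 4]})

# Use date.today() in real code; a fixed date keeps this example reproducible.
dataset_for_analysis = select_next_n_weeks(data4, date(2020, 1, 15), n_weeks=4)
print(dataset_for_analysis)
```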

to_datetime assemblage error due to extra keys

My pandas version is 0.23.4.
I tried to run this code:
df['date_time'] = pd.to_datetime(df[['year','month','day','hour_scheduled_departure','minute_scheduled_departure']])
and the following error appeared:
extra keys have been passed to the datetime assemblage: [hour_scheduled_departure, minute_scheduled_departure]
Any ideas on how to get this done with pd.to_datetime?
#anky_91
The attached image shows the first 10 rows. Column 1 (int32): year; column 2 (int32): month; column 3 (int32): day; column 4 (object): hour; column 5 (object): minute. The object values are strings of length 2.
Another solution:
>>> pd.concat([df.A, pd.to_datetime(pd.Series(df[df.columns[1:]].fillna('').values.tolist(), name='Date').map(lambda x: '0'.join(map(str, x))))], axis=1)
A Date
0 a 2002-07-01 05:07:00
1 b 2002-08-03 03:08:00
2 c 2002-09-05 06:09:00
3 d 2002-04-07 09:04:00
4 e 2002-02-01 02:02:00
5 f 2002-03-05 04:03:00
For the example in your image (I skipped the last 3 columns to save time):
>>> df.month = df.month.map("{:02}".format)
>>> df.day = df.day.map("{:02}".format)
>>> pd.concat([df.A, pd.to_datetime(pd.Series(df[df.columns[1:]].fillna('').values.tolist(), name='Date').map(lambda x: ''.join(map(str, x))))], axis=1)
A Date
0 a 2015-01-01 00:05:00
1 b 2015-01-01 00:01:00
2 c 2015-01-01 00:02:00
3 d 2015-01-01 00:02:00
4 e 2015-01-01 00:25:00
5 f 2015-01-01 00:25:00
You can rename the columns so that pandas.to_datetime receives the component names it recognizes: year, month, day, hour, minute:
import pandas as pd

df = pd.DataFrame({
'A':list('abcdef'),
'year':[2002,2002,2002,2002,2002,2002],
'month':[7,8,9,4,2,3],
'day':[1,3,5,7,1,5],
'hour_scheduled_departure':[5,3,6,9,2,4],
'minute_scheduled_departure':[7,8,9,4,2,3]
})
print (df)
A year month day hour_scheduled_departure minute_scheduled_departure
0 a 2002 7 1 5 7
1 b 2002 8 3 3 8
2 c 2002 9 5 6 9
3 d 2002 4 7 9 4
4 e 2002 2 1 2 2
5 f 2002 3 5 4 3
cols = ['year','month','day','hour_scheduled_departure','minute_scheduled_departure']
d = {'hour_scheduled_departure':'hour','minute_scheduled_departure':'minute'}
df['date_time'] = pd.to_datetime(df[cols].rename(columns=d))
#if necessary remove columns
df = df.drop(cols, axis=1)
print (df)
A date_time
0 a 2002-07-01 05:07:00
1 b 2002-08-03 03:08:00
2 c 2002-09-05 06:09:00
3 d 2002-04-07 09:04:00
4 e 2002-02-01 02:02:00
5 f 2002-03-05 04:03:00
Detail:
print (df[cols].rename(columns=d))
year month day hour minute
0 2002 7 1 5 7
1 2002 8 3 3 8
2 2002 9 5 6 9
3 2002 4 7 9 4
4 2002 2 1 2 2
5 2002 3 5 4 3
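In the asker's data the hour and minute columns are `object` dtype (2-character strings, per the description of the image). A sketch that converts them to numbers explicitly before assembling, so the result does not rely on how `to_datetime` coerces strings; the function name `assemble_departure` and the sample values are hypothetical.

```python
import pandas as pd

def assemble_departure(df):
    """Build a datetime Series from year/month/day plus the scheduled-departure columns."""
    cols = ['year', 'month', 'day',
            'hour_scheduled_departure', 'minute_scheduled_departure']
    # to_datetime only recognizes fixed component names such as 'hour' and 'minute'.
    parts = df[cols].rename(columns={'hour_scheduled_departure': 'hour',
                                     'minute_scheduled_departure': 'minute'})
    # Convert the 2-character string columns (e.g. '05') to integers.
    parts[['hour', 'minute']] = parts[['hour', 'minute']].apply(pd.to_numeric)
    return pd.to_datetime(parts)

df = pd.DataFrame({
    'year': [2015, 2015],
    'month': [1, 1],
    'day': [1, 1],
    'hour_scheduled_departure': ['00', '13'],
    'minute_scheduled_departure': ['05', '45'],
})
df['date_time'] = assemble_departure(df)
print(df['date_time'])
```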

Python-String parsing for extraction of date and time

The datetime is given in the format YYYY-MM-DD HH:MM:SS in a dataframe. I want new Series for the year, month and hour, for which I am trying the code below.
But the problem is that Month and Hour get the same value; Year is fine.
Can anyone help me with this? I am using an IPython notebook with pandas and numpy.
Here is the code :
from datetime import datetime

def extract_hour(X):
    cnv = datetime.strptime(X, '%Y-%m-%d %H:%M:%S')
    return cnv.hour

def extract_month(X):
    cnv = datetime.strptime(X, '%Y-%m-%d %H:%M:%S')
    return cnv.month

def extract_year(X):
    cnv = datetime.strptime(X, '%Y-%m-%d %H:%M:%S')
    return cnv.year
#month column
train['Month']=train['datetime'].apply((lambda x: extract_month(x)))
test['Month']=test['datetime'].apply((lambda x: extract_month(x)))
#year column
train['Year']=train['datetime'].apply((lambda x: extract_year(x)))
test['Year']=test['datetime'].apply((lambda x: extract_year(x)))
#Hour column
train['Hour']=train['datetime'].apply((lambda x: extract_hour(x)))
test['Hour']=test['datetime'].apply((lambda x: extract_hour(x)))
You can use the .dt accessors instead, once the column has been converted with pd.to_datetime: train['datetime'].dt.month, train['datetime'].dt.year, train['datetime'].dt.hour (see the full list below).
Demo:
In [81]: train = pd.DataFrame(pd.date_range('2016-01-01', freq='1999H', periods=10), columns=['datetime'])
In [82]: train
Out[82]:
datetime
0 2016-01-01 00:00:00
1 2016-03-24 07:00:00
2 2016-06-15 14:00:00
3 2016-09-06 21:00:00
4 2016-11-29 04:00:00
5 2017-02-20 11:00:00
6 2017-05-14 18:00:00
7 2017-08-06 01:00:00
8 2017-10-28 08:00:00
9 2018-01-19 15:00:00
In [83]: train.datetime.dt.year
Out[83]:
0 2016
1 2016
2 2016
3 2016
4 2016
5 2017
6 2017
7 2017
8 2017
9 2018
Name: datetime, dtype: int64
In [84]: train.datetime.dt.month
Out[84]:
0 1
1 3
2 6
3 9
4 11
5 2
6 5
7 8
8 10
9 1
Name: datetime, dtype: int64
In [85]: train.datetime.dt.hour
Out[85]:
0 0
1 7
2 14
3 21
4 4
5 11
6 18
7 1
8 8
9 15
Name: datetime, dtype: int64
In [86]: train.datetime.dt.day
Out[86]:
0 1
1 24
2 15
3 6
4 29
5 20
6 14
7 6
8 28
9 19
Name: datetime, dtype: int64
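List of all .dt accessors (tab completion from an older pandas version; weekday_name has since been replaced by .dt.day_name(), and week/weekofyear by .dt.isocalendar().week):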
In [77]: train.datetime.dt.
train.datetime.dt.ceil train.datetime.dt.hour train.datetime.dt.month train.datetime.dt.to_pydatetime
train.datetime.dt.date train.datetime.dt.is_month_end train.datetime.dt.nanosecond train.datetime.dt.tz
train.datetime.dt.day train.datetime.dt.is_month_start train.datetime.dt.normalize train.datetime.dt.tz_convert
train.datetime.dt.dayofweek train.datetime.dt.is_quarter_end train.datetime.dt.quarter train.datetime.dt.tz_localize
train.datetime.dt.dayofyear train.datetime.dt.is_quarter_start train.datetime.dt.round train.datetime.dt.week
train.datetime.dt.days_in_month train.datetime.dt.is_year_end train.datetime.dt.second train.datetime.dt.weekday
train.datetime.dt.daysinmonth train.datetime.dt.is_year_start train.datetime.dt.strftime train.datetime.dt.weekday_name
train.datetime.dt.floor train.datetime.dt.microsecond train.datetime.dt.time train.datetime.dt.weekofyear
train.datetime.dt.freq train.datetime.dt.minute train.datetime.dt.to_period train.datetime.dt.year
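The `.dt` accessor only works on a datetime64 column, while the question's `datetime` column holds strings (which is why the asker calls `strptime`). A minimal sketch, assuming the strings follow the `%Y-%m-%d %H:%M:%S` format used in the question's code; the sample rows are made up for illustration:

```python
import pandas as pd

train = pd.DataFrame({'datetime': ['2011-01-20 00:00:00',
                                   '2012-07-04 15:30:00',
                                   '2012-12-31 23:59:59']})

# Convert the string column once; .dt is unavailable on object (string) columns.
train['datetime'] = pd.to_datetime(train['datetime'], format='%Y-%m-%d %H:%M:%S')

# Vectorized extraction replaces the three apply() calls.
train['Year'] = train['datetime'].dt.year
train['Month'] = train['datetime'].dt.month
train['Hour'] = train['datetime'].dt.hour
print(train)
```

The same conversion and three assignments apply unchanged to `test`.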

How to group time series data by Monday, Tuesday .. ? pandas

I have a time series pandas DataFrame that looks like this:
value
12-01-2014 1
13-01-2014 2
....
01-05-2014 5
I want to group them by:
1. day of the week (Monday, Tuesday, ..., Saturday, Sunday)
2. workday vs. weekend
How can I achieve that in pandas?
Make sure your dates column has a datetime dtype, then use the datetime attributes:
import pandas as pd

df = pd.DataFrame({'dates':['1/1/15','1/2/15','1/3/15','1/4/15','1/5/15','1/6/15',
'1/7/15','1/8/15','1/9/15','1/10/15','1/11/15','1/12/15'],
'values':[1,2,3,4,5,1,2,3,1,2,3,4]})
df['dates'] = pd.to_datetime(df['dates'])
df['dayofweek'] = df['dates'].dt.dayofweek
dates values dayofweek
0 2015-01-01 1 3
1 2015-01-02 2 4
2 2015-01-03 3 5
3 2015-01-04 4 6
4 2015-01-05 5 0
5 2015-01-06 1 1
6 2015-01-07 2 2
7 2015-01-08 3 3
8 2015-01-09 1 4
9 2015-01-10 2 5
10 2015-01-11 3 6
11 2015-01-12 4 0
# select 'values' so the datetime column is not summed
df.groupby(df['dates'].dt.dayofweek)[['values']].sum()
df.groupby(df['dates'].apply(lambda x: 0 if x.dayofweek in [5,6] else 1))[['values']].sum()
Output:
In [1]: df.groupby(df['dates'].apply(lambda x: x.dayofweek)).sum()
Out[1]:
values
dates
0 9
1 1
2 2
3 4
4 3
5 5
6 7
In [2]: df.groupby(df['dates'].apply(lambda x: 0 if x.dayofweek in [5,6] else 1)).sum()
Out[2]:
values
dates
0 12
1 19
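On recent pandas versions, calling `.sum()` on the whole grouped frame may fail because the `dates` column cannot be summed, so it is safer to select the `values` column explicitly. The sketch below does that with the vectorized `.dt` accessor; `.dt.day_name()` gives readable group labels, and the `Workday`/`Weekend` labels are an arbitrary choice for illustration. The data is the answer's example, parsed with an explicit month/day/year format.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dates': ['1/1/15', '1/2/15', '1/3/15', '1/4/15', '1/5/15', '1/6/15',
                             '1/7/15', '1/8/15', '1/9/15', '1/10/15', '1/11/15', '1/12/15'],
                   'values': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4]})
df['dates'] = pd.to_datetime(df['dates'], format='%m/%d/%y')

# 1) One group per weekday name (Monday ... Sunday).
by_day = df.groupby(df['dates'].dt.day_name())['values'].sum()

# 2) Two groups: dayofweek 5 and 6 are Saturday and Sunday.
kind = np.where(df['dates'].dt.dayofweek >= 5, 'Weekend', 'Workday')
by_kind = df.groupby(kind)['values'].sum()

print(by_day)
print(by_kind)
```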
