Separate month from year in Python data frame

I have a 1000 x 6 data frame, and one of its columns is headed "Date", with dates in the format "JAN2014", "JUN2002", etc.
I would like to split this column into two separate columns, "Year" and "Month", so that JAN goes in the "Month" column, 2014 goes in the "Year" column, and so on.
Could anyone please tell me how to do this in Python?

You can use the str accessor and indexing:
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
Example:
import pandas as pd
df = pd.DataFrame({'Date':['JAN2014','JUN2002']})
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
print(df)
Output:
Date Month Year
0 JAN2014 JAN 2014
1 JUN2002 JUN 2002
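Note that slicing leaves "Year" as a string ('2014'), not a number. A minimal sketch of two ways to get numeric values, assuming every entry is an English three-letter month abbreviation followed by a four-digit year (the MonthNum/YearNum column names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'Date': ['JAN2014', 'JUN2002']})

# Option 1: slice as above, then cast the year to an integer
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:].astype(int)

# Option 2: parse the whole string; %b matches the month abbreviation
# (case-insensitively) and %Y a four-digit year
parsed = pd.to_datetime(df['Date'], format='%b%Y')
df['MonthNum'] = parsed.dt.month
df['YearNum'] = parsed.dt.year

print(df)
```

Option 2 also validates the input: a malformed entry such as "JNU2014" raises an error instead of silently producing a bad month.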

Related

Pandas str split a column and add it to a dataframe

I have a dataframe with a release date column, and I want to use str.split so that it only displays the year.
How can I str.split the release date so that it only shows the year, and label the new column "release year"?
df['release year'] = df['releaseDate'].apply(lambda x: x.split('-')[-1])
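The lambda above runs Python code once per row. The same split can be written with the vectorized .str accessor. A sketch, assuming releaseDate holds strings whose last dash-separated field is the year (the sample values below are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; the real releaseDate format may differ
df = pd.DataFrame({'releaseDate': ['12-Mar-2019', '01-Jan-2020', '30-Nov-2018']})

# Split each string on '-' and keep the last piece (the year, still a string)
df['release year'] = df['releaseDate'].str.split('-').str[-1]

# Optional: convert to integers if you need to do arithmetic on the year
df['release year'] = df['release year'].astype(int)
print(df)
```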
Alternatively, convert the column to datetime and access the year (or month) attribute directly:
import pandas as pd
df = pd.DataFrame({"data" : ["11-Jan-2020", "05-Feb-2020", "01-Mar-2020"]})
df["data"]=pd.to_datetime(df["data"], dayfirst=True)
df['year'] = df['data'].dt.year
df.head()
Your df will now look like:
data year
0 2020-01-11 2020
1 2020-02-05 2020
2 2020-03-01 2020
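The same .dt accessor exposes the month as well. A short sketch on the same sample data, showing the month number and the full month name (the month and month_name column names are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"data": ["11-Jan-2020", "05-Feb-2020", "01-Mar-2020"]})
df["data"] = pd.to_datetime(df["data"], dayfirst=True)

df["year"] = df["data"].dt.year
df["month"] = df["data"].dt.month               # month number, 1-12
df["month_name"] = df["data"].dt.month_name()   # e.g. 'January'
print(df)
```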

Split dataframe into two using Date as splitting point

I have a dataframe with 100,000 rows and 24 columns, representing crime over the period October 2019 to October 2020.
I'm trying to split my df into two: one dataframe with all rows from 1 October to 31 March, and a second with all rows from 1 April to 31 October.
Could anyone kindly show me how to do this using pandas?
Assuming the column is of datetime type, you can do this:
import pandas as pd
# Rows before 1 April 2020 (including all of 31 March) go into df_1
split_date = pd.Timestamp(2020, 4, 1)
df_1 = df.loc[df['Date'] < split_date]
df_2 = df.loc[df['Date'] >= split_date]
If the column containing the dates is not of datetime type, convert it first:
df['Date'] = pd.to_datetime(df['Date'])
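A self-contained sketch with hypothetical sample rows, showing why comparing against midnight on 1 April is safer than `<=` 31 March: a record timestamped 31 March at 18:30 is later than 31 March 00:00, so `<=` would push it into the second half.

```python
import pandas as pd

# Hypothetical sample rows spanning October 2019 to October 2020
dates = pd.to_datetime([
    '2019-10-05 08:00', '2020-01-15 12:00', '2020-03-31 18:30',
    '2020-04-01 00:00', '2020-10-31 23:59',
])
df = pd.DataFrame({'Date': dates, 'crime': ['A', 'B', 'C', 'D', 'E']})

split_date = pd.Timestamp(2020, 4, 1)
mask = df['Date'] < split_date   # True for 1 Oct 2019 - 31 Mar 2020

df_1 = df.loc[mask]    # October to March, including all of 31 March
df_2 = df.loc[~mask]   # April onwards
print(df_1)
print(df_2)
```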

Counting months between two dates in a dataframe

I have a dataframe with multiple columns, one of which is a date column. I'd like to create a new column containing the number of months between each date in that column and a preset date. For example, one of the dates in the 'start date' column is '2019-06-30 00:00:00'; I want to calculate the number of months between that date and the end of 2021 (2021-12-31), put the answer in a new column, and do this for the entire date column. I haven't been able to work out how to do this, but with 2021-12-31 as the predetermined end date, I'd like the result to look like this:
df =
start date months
0 2019-06-30 30
1 2019-08-12 28
2 2020-01-24 23
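You can do this using np.timedelta64. Note that this divides by an approximate month length, and recent pandas versions may reject the ambiguous 'M' unit:
import numpy as np
import pandas as pd
end_date = pd.to_datetime('2021-12-31')
df['start date'] = pd.to_datetime(df['start date'])
df['month'] = ((end_date - df['start date'])/np.timedelta64(1, 'M')).astype(int)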
print(df)
start date month
0 2019-06-30 30
1 2019-08-12 28
2 2020-01-24 23
Assume that the 'start date' column is of datetime type (not string)
and the reference date is defined as follows:
refDate = pd.to_datetime('2021-12-31')
or any other date of your choice.
Then you can compute the number of months as:
df['months'] = (refDate.to_period('M') - df['start date']\
.dt.to_period('M')).apply(lambda x: x.n)
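The period difference counts calendar-month boundaries, so it can also be written as plain integer arithmetic on the year and month fields. A sketch of that swapped-in formulation, (ref_year - year) * 12 + (ref_month - month), using the sample dates from the question:

```python
import pandas as pd

df = pd.DataFrame({'start date': ['2019-06-30', '2019-08-12', '2020-01-24']})
df['start date'] = pd.to_datetime(df['start date'])
ref_date = pd.Timestamp('2021-12-31')

# Whole calendar months between each start date and the reference date;
# the day of the month is ignored, matching the to_period('M') approach
df['months'] = ((ref_date.year - df['start date'].dt.year) * 12
                + (ref_date.month - df['start date'].dt.month))
print(df)
```

Because it never converts months to a fixed number of days, this avoids the ambiguity of dividing by a one-month timedelta.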

Create a new column in a dataframe that shows Day of the Week from an already existing dd/mm/yy column? Python

I have a dataframe that contains a column with dates in dd/mm/yy format, e.g. 24/07/15.
Is there a way to create a new column into the dataframe that displays all the days of the week corresponding to the already existing 'Date' column?
I want the output to appear as:
[Date][DayOfTheWeek]
This might work:
If you want day name:
In [1405]: df
Out[1405]:
dates
0 24/07/15
1 25/07/15
2 26/07/15
In [1406]: df['dates'] = pd.to_datetime(df['dates'], format='%d/%m/%y') # Give the format explicitly so dd/mm/yy is not misread as mm/dd/yy.
In [1408]: df['dow'] = df['dates'].dt.day_name()
In [1409]: df
Out[1409]:
dates dow
0 2015-07-24 Friday
1 2015-07-25 Saturday
2 2015-07-26 Sunday
If you want the day-of-week number (Monday=0, Sunday=6):
In [1410]: df['dow'] = df['dates'].dt.dayofweek
In [1411]: df
Out[1411]:
dates dow
0 2015-07-24 4
1 2015-07-25 5
2 2015-07-26 6
I would try the apply function, something like this:
def extractDayOfWeek(dateString):
    ...
df['DayOfWeek'] = df.apply(lambda x: extractDayOfWeek(x['Date']), axis=1)
The idea is that you map over every row, extract the 'Date' value, and apply your own function to fill a new column named 'DayOfWeek'.
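The body of extractDayOfWeek is left as `...` in that answer. One possible implementation, assuming the strings are in dd/mm/yy format and an English locale for weekday names (this body is hypothetical, not the answerer's):

```python
import pandas as pd
from datetime import datetime

def extractDayOfWeek(dateString):
    # Hypothetical implementation: parse a dd/mm/yy string and
    # return the weekday name, e.g. 'Friday'
    return datetime.strptime(dateString, '%d/%m/%y').strftime('%A')

df = pd.DataFrame({'Date': ['24/07/15', '25/07/15', '26/07/15']})

# Row-wise apply, as described in the answer
df['DayOfWeek'] = df.apply(lambda x: extractDayOfWeek(x['Date']), axis=1)

# Equivalent and simpler: map the function over the single column
df['DayOfWeek2'] = df['Date'].map(extractDayOfWeek)
print(df)
```

Mapping over one column avoids building a row Series for every row, which is what `apply(..., axis=1)` does.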
Depending on the type of your Date column, you may first need to convert it:
df['Date']=pd.to_datetime(df['Date'], format="%d/%m/%y")
df['weekday'] = df['Date'].dt.dayofweek

Editing the date in pandas to show year only, by column

I am trying to understand how I can edit the dataframe in python using pandas so I can drop everything but the year.
Example: if the date is 2014-01-01, I want it to show 2014 and drop both the month and the day. All the dates are in a single column.
Thanks in advance!
You can convert the numpy.datetime64 value to a pandas Timestamp using pd.to_datetime() and then extract the year, month or day from it.
import numpy as np
import pandas as pd
date = np.datetime64('2014-01-01')
type(date)
Output:
numpy.datetime64
Convert this date to pandas datetime using pd.to_datetime.
date = pd.to_datetime(date)
type(date)
Output:
pandas._libs.tslibs.timestamps.Timestamp
Then you can extract the year using .year
date.year
Output:
2014
So, if you have a df:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df
Output:
date
0 2014
1 2015
2 2016
Alternatively, you can use strftime, which gives the year as a string rather than an integer:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = df['date'].apply(lambda x: x.strftime('%Y'))
df
Output:
date
0 2014
1 2015
2 2016
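For a datetime column, the .dt accessor gives the same results without building a DatetimeIndex or calling a Python lambda per row. A sketch on the same data (year and year_str are illustrative column names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'),
                            np.datetime64('2015-01-01'),
                            np.datetime64('2016-01-01')]})

# .dt.year returns integers; .dt.strftime('%Y') returns strings
df['year'] = df['date'].dt.year
df['year_str'] = df['date'].dt.strftime('%Y')
print(df)
```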
EDIT 1
Group by using year when the column has date values
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.groupby(df.index.year).size()
Output:
date
2014 1
2015 1
2016 1
You can still do the same even if you have removed the month and day from the date and only have year in your column
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df.groupby('date').size()
Output:
date
2014 1
2015 1
2016 1
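You can also group by year while keeping the full dates, without setting the index or overwriting the column. A sketch with hypothetical sample data that includes two dates in 2014:

```python
import pandas as pd

# Hypothetical sample data: two dates fall in 2014
df = pd.DataFrame({'date': pd.to_datetime(
    ['2014-01-01', '2014-06-15', '2015-01-01', '2016-01-01'])})

# Group by the year component only; the 'date' column is left unchanged
counts = df.groupby(df['date'].dt.year).size()
print(counts)
```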
