Pandas str split a column and add it to a dataframe - python

I have a dataframe with this information, and for the release date I want to use str.split so that it only displays the year.
How can I str.split the release date so that it only shows the year, and label the new column "release year"?

df['release year'] = df['releaseDate'].apply(lambda x: x.split('-')[-1])
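The same split can be done without apply by using the vectorized .str accessor. This is a minimal sketch: the sample values and the day-month-year layout are assumptions, since the original data was only shown in an image.

```python
import pandas as pd

# Hypothetical sample data; the real 'releaseDate' values are not shown in the question.
df = pd.DataFrame({"releaseDate": ["15-03-2019", "01-11-2020", "23-07-2021"]})

# Split each string on '-' and keep the last piece (the year, under the assumed layout).
df["release year"] = df["releaseDate"].str.split("-").str[-1]

print(df)
```

The resulting column holds strings; append .astype(int) if numeric years are needed.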

You can access the year and month attributes directly using the following:
import pandas as pd
df = pd.DataFrame({"data" : ["11-Jan-2020", "05-Feb-2020", "01-Mar-2020"]})
df["data"]=pd.to_datetime(df["data"], dayfirst=True)
df['year'] = df['data'].dt.year
df.head()
Your df will now look like:
data year
0 2020-01-11 2020
1 2020-02-05 2020
2 2020-03-01 2020
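The month can be read from the same .dt accessor. The sketch below reuses the sample data from this answer and passes an explicit format string, which is an assumption about how the dates are laid out; %b expects English month abbreviations in the default locale.

```python
import pandas as pd

df = pd.DataFrame({"data": ["11-Jan-2020", "05-Feb-2020", "01-Mar-2020"]})

# An explicit format removes any ambiguity about which field is the day.
df["data"] = pd.to_datetime(df["data"], format="%d-%b-%Y")

df["year"] = df["data"].dt.year
df["month"] = df["data"].dt.month              # month as a number
df["month_name"] = df["data"].dt.month_name()  # month as a full name

print(df)
```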

Related

How to extract year and month from date in python

I tried some functions, but they do not produce the right answer for some dates. For example, for the first date, "12/11/2015", the month should be 11 instead of 12. Does anyone know how to solve this in general?
You just have to pass another argument, dayfirst=True, like so:
import pandas as pd
df = pd.DataFrame({
'date': ['12/11/2015', '19/11/2015', '7/12/2015']
})
df['month'] = pd.DatetimeIndex(df['date'], dayfirst=True).month
df['year'] = pd.DatetimeIndex(df['date'], dayfirst=True).year
print(df)
Result:
date month year
0 12/11/2015 11 2015
1 19/11/2015 11 2015
2 7/12/2015 12 2015
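If every date follows the same day/month/year layout, an explicit format string is an alternative to dayfirst. This is a sketch under that assumption; it also keeps a parsed column so the .dt accessor can be reused.

```python
import pandas as pd

df = pd.DataFrame({"date": ["12/11/2015", "19/11/2015", "7/12/2015"]})

# Assumes all values are day/month/year; a value in another layout raises an error.
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")

df["month"] = parsed.dt.month
df["year"] = parsed.dt.year

print(df)
```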

Create a new column in a dataframe that shows Day of the Week from an already existing dd/mm/yy column? Python

I have a dataframe that contains a column with dates, e.g. 24/07/15.
Is there a way to create a new column in the dataframe that displays the day of the week corresponding to each entry in the existing 'Date' column?
I want the output to appear as:
[Date][DayOfTheWeek]
This might work:
If you want day name:
In [1405]: df
Out[1405]:
dates
0 24/07/15
1 25/07/15
2 26/07/15
In [1406]: df['dates'] = pd.to_datetime(df['dates']) # The format is inferred here; passing format='%d/%m/%y' is safer for day-first dates.
In [1408]: df['dow'] = df['dates'].dt.day_name()
In [1409]: df
Out[1409]:
dates dow
0 2015-07-24 Friday
1 2015-07-25 Saturday
2 2015-07-26 Sunday
If you want the day-of-month number (for the weekday number, use .dt.dayofweek instead):
In [1410]: df['dow'] = df['dates'].dt.day
In [1411]: df
Out[1411]:
dates dow
0 2015-07-24 24
1 2015-07-25 25
2 2015-07-26 26
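Since the question asks for the day of the week, here is a short sketch that produces both the weekday number and the weekday name. The explicit format string is an assumption that the column is day/month/two-digit-year throughout.

```python
import pandas as pd

df = pd.DataFrame({"dates": ["24/07/15", "25/07/15", "26/07/15"]})

# Parse with an explicit day-first format.
df["dates"] = pd.to_datetime(df["dates"], format="%d/%m/%y")

df["dow_number"] = df["dates"].dt.dayofweek  # Monday=0 ... Sunday=6
df["dow_name"] = df["dates"].dt.day_name()

print(df)
```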
I would try the apply function, so something like this:
def extractDayOfWeek(dateString):
...
df['DayOfWeek'] = df.apply(lambda x: extractDayOfWeek(x['Date']), axis=1)
The idea is that you map over every row, extract the 'Date' column, and apply your own function to create a new column entry named 'DayOfWeek'.
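The body of the function is left out in the answer above. One possible way to fill it in is sketched here; the helper name extract_day_of_week, its parsing format, and the sample data are all hypothetical, and Series.apply is used instead of a row-wise DataFrame.apply because only one column is needed.

```python
from datetime import datetime

import pandas as pd

# Fixed list of names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def extract_day_of_week(date_string):
    """Hypothetical helper: parse a dd/mm/yy string and return the weekday name."""
    parsed = datetime.strptime(date_string, "%d/%m/%y")
    return DAY_NAMES[parsed.weekday()]  # weekday(): Monday=0 ... Sunday=6


df = pd.DataFrame({"Date": ["24/07/15", "25/07/15"]})
df["DayOfWeek"] = df["Date"].apply(extract_day_of_week)

print(df)
```

For large frames, the vectorized .dt accessor is usually faster than calling a Python function per row.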
Depending on the type of your Date column, you may first need to convert it to datetime:
df['Date']=pd.to_datetime(df['Date'], format="%d/%m/%y")
df['weekday'] = df['Date'].dt.dayofweek  # Monday=0, Sunday=6

Editing the date in pandas to show year only, by column

I am trying to understand how I can edit the dataframe in python using pandas so I can drop everything but the year.
Example: if the date is 2014-01-01, I want it to show 2014 and drop both the month and the date. All the dates are in a single column.
Thanks in advance!
You can convert the numpy.datetime64 date value to datetime using pd.to_datetime() and then you can extract year or month or day from it.
import numpy as np
import pandas as pd
date = np.datetime64('2014-01-01')
type(date)
Output:
numpy.datetime64
Convert this date to pandas datetime using pd.to_datetime.
date = pd.to_datetime(date)
type(date)
Output:
pandas._libs.tslibs.timestamps.Timestamp
Then you can extract the year using .year
date.year
Output:
2014
So, if you have a df:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df
Output:
date
0 2014
1 2015
2 2016
Alternatively, you can do this (note that strftime returns the year as a string):
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = df['date'].apply(lambda x: x.strftime('%Y'))
df
Output:
date
0 2014
1 2015
2 2016
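The two approaches above look the same when printed but give different types. The sketch below, which uses the vectorized .dt accessor rather than the exact calls in the answer, puts both side by side in new columns (the column names year_int and year_str are made up for illustration).

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2014-01-01", "2015-01-01", "2016-01-01"])})

df["year_int"] = df["date"].dt.year            # integer years, usable in arithmetic
df["year_str"] = df["date"].dt.strftime("%Y")  # string years, e.g. for labels

print(df.dtypes)
```

Writing to a new column also keeps the original dates available, whereas the answer overwrites the date column.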
EDIT 1
Group by using year when the column has date values
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.groupby(df.index.year).size()
Output:
date
2014 1
2015 1
2016 1
You can still do the same even if you have removed the month and day from the date and only have year in your column
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df.groupby('date').size()
Output:
date
2014 1
2015 1
2016 1
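Grouping by year also works without setting the index or overwriting the column, by passing the extracted years to groupby directly. A minimal sketch with made-up sample rows:

```python
import pandas as pd

# Hypothetical data with two rows in 2014 so the counts differ.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-01", "2014-06-15", "2015-01-01"]),
    "value": [1, 2, 3],
})

# Group on the year derived from the datetime column; the column itself is unchanged.
counts = df.groupby(df["date"].dt.year).size()

print(counts)
```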

Parse DateTime from Object with Pandas

I would like to parse the year column to datetime.
name id nametype recclass mass (g) fall year
0 Aachen 1 Valid L5 21.0 Fell 01/01/1880 12:00:00 AM
... reclat reclong GeoLocation
... 50.77500 6.08333 (50.775000, 6.083330)
I tried df['year'].apply(dateutil.parser.parse), which parses the value as 1880-01-01 00:00:00, but I can't use that for selecting dates.
Does anyone have a tip for me?
I think you need to_datetime:
df = pd.DataFrame({'year':['01/01/1880 12:00:00 AM']})
df['year'] = pd.to_datetime(df['year'])
print (df)
year
0 1880-01-01
Since you have converted the year column to datetime, you can use:
df['year'] = df['year'].dt.date
# year
# 1880-01-01
However, for datetime casting, note that pandas has a built-in datetime parser that is easier to use than dateutil, IMO.
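Once the column is a real datetime column, rows can be selected with ordinary comparisons. The sketch below assumes the strings are month/day/year with a 12-hour clock; the second row is invented so the filter has something to exclude.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Aachen", "Hypothetical"],  # second row is made up for illustration
    "year": ["01/01/1880 12:00:00 AM", "01/01/1951 12:00:00 AM"],
})

# Assumed layout: month/day/year, 12-hour time with AM/PM marker.
df["year"] = pd.to_datetime(df["year"], format="%m/%d/%Y %I:%M:%S %p")

# Boolean mask: keep only rows dated on or after 1 January 1900.
recent = df[df["year"] >= pd.Timestamp("1900-01-01")]

print(recent)
```

Keeping the column as datetime, rather than converting it with .dt.date, is what allows this kind of comparison and the .dt accessor to keep working.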

Separate month from year in Python data frame

I have a 1000 x 6 dimension data frame and one of the columns' header is "Date" where the date is presented in the format "JAN2014", "JUN2002" etc...
I would like to split this column in two separate columns: "Year" and "Month" so JAN will be in "Month" column, 2014 will be in "Year" column etc..
Could anyone please tell me how to do this in Python?
You can use the str accessor and indexing:
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
Example:
df = pd.DataFrame({'Date':['JAN2014','JUN2002']})
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
print(df)
Output:
Date Month Year
0 JAN2014 JAN 2014
1 JUN2002 JUN 2002
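Slicing leaves the year as a string. If a numeric year is needed, or the split should be validated against the expected pattern, a regular expression with named groups is one option. This is a sketch that assumes every value is three letters followed by four digits.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["JAN2014", "JUN2002"]})

# Named groups become the column names of the extracted frame.
parts = df["Date"].str.extract(r"^(?P<Month>[A-Za-z]{3})(?P<Year>\d{4})$")

df["Month"] = parts["Month"]
df["Year"] = parts["Year"].astype(int)  # convert the year text to an integer

print(df)
```

Values that do not match the pattern come back as NaN, so the astype(int) call would fail on them; that can serve as an early warning about malformed rows.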
