I am trying to understand how I can edit the dataframe in python using pandas so I can drop everything but the year.
Example: if the date is 2014-01-01, I want it to show 2014 and drop both the month and the day. All the dates are in a single column.
Thanks in advance!
You can convert the numpy.datetime64 date value to datetime using pd.to_datetime() and then you can extract year or month or day from it.
import numpy as np
import pandas as pd
date = np.datetime64('2014-01-01')
type(date)
Output:
numpy.datetime64
Convert this date to pandas datetime using pd.to_datetime.
date = pd.to_datetime(date)
type(date)
Output:
pandas._libs.tslibs.timestamps.Timestamp
Then you can extract the year using .year
date.year
Output:
2014
So, if you have a df:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df
Output:
date
0 2014
1 2015
2 2016
Alternatively, you can use strftime. Note that this gives the year as a string rather than an integer:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = df['date'].apply(lambda x: x.strftime('%Y'))
df
Output:
date
0 2014
1 2015
2 2016
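For comparison, here is a minimal sketch using the .dt accessor, which the answer above does not use. It assumes the column already has a datetime64 dtype; the column name 'year' and the variable year_as_text are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [np.datetime64('2014-01-01'),
                            np.datetime64('2015-01-01'),
                            np.datetime64('2016-01-01')]})

# .dt.year returns integers for each row of a datetime column
df['year'] = df['date'].dt.year

# .dt.strftime('%Y') returns strings, like the apply/strftime variant above
year_as_text = df['date'].dt.strftime('%Y')
```

Keeping the original 'date' column and writing the year to a new column means the full dates are still available if you need them later.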
EDIT 1
Group by using year when the column has date values
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.groupby(df.index.year).size()
Output:
date
2014 1
2015 1
2016 1
You can still do the same even if you have already removed the month and day and only have the year in your column:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df.groupby('date').size()
Output:
date
2014 1
2015 1
2016 1
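If you would rather keep the dates as a regular column instead of moving them to the index, a possible sketch is to group by the year taken from the .dt accessor. The sample dates below are made up for illustration and differ from the ones above so that one year has more than one row.

```python
import pandas as pd

# Made-up sample data: two rows in 2014, one row in 2015
df = pd.DataFrame({'date': pd.to_datetime(['2014-01-01', '2014-06-15', '2015-01-01'])})

# Group by the year of each date without touching the index or the column
counts = df.groupby(df['date'].dt.year).size()
```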
Related
I tried some functions, but they do not produce the right answer for some dates (see the image of my code). For example, for the first date, "12/11/2015", the month should be 11 instead of 12. Does anyone know how to solve this in general?
You just have to pass the dayfirst argument (a bool) so that the first number is read as the day, like so:
import pandas as pd
df = pd.DataFrame({
'date': ['12/11/2015', '19/11/2015', '7/12/2015']
})
df['month'] = pd.DatetimeIndex(df['date'], dayfirst=True).month
df['year'] = pd.DatetimeIndex(df['date'], dayfirst=True).year
print(df)
Result:
         date  month  year
0  12/11/2015     11  2015
1  19/11/2015     11  2015
2   7/12/2015     12  2015
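Another option, sketched here under the assumption that every value in the column follows the same day/month/year layout, is to give pd.to_datetime an explicit format so nothing has to be guessed. The variable name parsed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['12/11/2015', '19/11/2015', '7/12/2015']
})

# Spell out the layout: day, then month, then four-digit year
parsed = pd.to_datetime(df['date'], format='%d/%m/%Y')

df['month'] = parsed.dt.month
df['year'] = parsed.dt.year
```

With an explicit format, a value that does not match raises an error instead of being silently parsed the wrong way round.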
I have a Pandas dataframe, which looks like below
I want to create a new column, which tells the exact date from the information from all the above columns. The code should look something like this:
df['Date'] = pd.to_datetime(df['Month']+df['WeekOfMonth']+df['DayOfWeek']+df['Year'])
I was able to find a workaround for your case. You will need to define the dictionaries for the months and the days of the week.
month = {"Jan":"01", "Feb":"02", "Mar":"03", "Apr": "04", "May":"05", "Jun":"06", "Jul":"07", "Aug":"08", "Sep":"09", "Oct":"10", "Nov":"11", "Dec":"12"}
week = {"Monday":1,"Tuesday":2,"Wednesday":3,"Thursday":4,"Friday":5,"Saturday":6,"Sunday":7}
With these dictionaries, the transformation that I used with a custom dataframe was:
rows = [["Dec",5,"Wednesday", "1995"],
["Jan",3,"Wednesday","2013"]]
df = pd.DataFrame(rows, columns=["Month","Week","Weekday","Year"])
df['Date'] = (df["Year"] + "-" + df["Month"].map(month) + "-" + (df["Week"].apply(lambda x: (x - 1)*7) + df["Weekday"].map(week).apply(int) ).apply(str)).astype('datetime64[ns]')
However, you have to be careful. With some of the data that you posted as an example, the computed day exceeds the valid range for the month. For example, for
row = ["Oct",5,"Friday","2018"]
the computed date is 2018-10-33, which is not a valid date. I recommend adding some logic to filter your data in order to avoid this kind of problem.
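One possible way to do that filtering, shown only as a sketch: build the date strings first, then parse them with errors='coerce' so that impossible dates become NaT instead of raising. The strings below are the ones the formula above produces for the three example rows; the names candidates, dates and valid are illustrative.

```python
import pandas as pd

# Date strings as the formula above would build them
# (Dec/5/Wednesday/1995, Jan/3/Wednesday/2013, Oct/5/Friday/2018)
candidates = pd.Series(['1995-12-31', '2013-01-17', '2018-10-33'])

# Impossible dates such as 2018-10-33 are turned into NaT rather than raising
dates = pd.to_datetime(candidates, format='%Y-%m-%d', errors='coerce')

# Boolean mask of rows that produced a real date
valid = dates.notna()
```

You can then keep only the rows where valid is True, or inspect the others by hand.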
Let's approach it in 3 steps as follows:
1. Get the date of month start Month_Start from Year and Month
2. Calculate the date offsets DateOffset relative to Month_Start from WeekOfMonth and DayOfWeek
3. Get the actual date Date from Month_Start and DateOffset
Here's the code:
import time
df['Month_Start'] = pd.to_datetime(df['Year'].astype(str) + df['Month'] + '01', format="%Y%b%d")
df['DateOffset'] = (df['WeekOfMonth'] - 1) * 7 + df['DayOfWeek'].map(lambda x: time.strptime(x, '%A').tm_wday) - df['Month_Start'].dt.dayofweek
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')
Output:
  Month  WeekOfMonth  DayOfWeek  Year Month_Start  DateOffset       Date
0   Dec            5  Wednesday  1995  1995-12-01          26 1995-12-27
1   Jan            3  Wednesday  2013  2013-01-01          15 2013-01-16
2   Oct            5     Friday  2018  2018-10-01          32 2018-11-02
3   Jun            2   Saturday  1980  1980-06-01           6 1980-06-07
4   Jan            5     Monday  1976  1976-01-01          25 1976-01-26
The Date column now contains the dates derived from the information from other columns.
You can remove the working interim columns, if you like, as follows:
df = df.drop(['Month_Start', 'DateOffset'], axis=1)
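Note that in the output above, the row for Oct / week 5 / Friday / 2018 ends up in November (2018-11-02). If you want to spot such rows, one option is to compare the month of the result with the month of Month_Start before dropping the interim columns. This is only a sketch: the SameMonth column is a hypothetical helper, and the two rows are copied from the output above.

```python
import pandas as pd

# Two rows taken from the output above: Dec 1995 (offset 26) and Oct 2018 (offset 32)
df = pd.DataFrame({
    'Month_Start': pd.to_datetime(['1995-12-01', '2018-10-01']),
    'DateOffset': [26, 32],
})
df['Date'] = df['Month_Start'] + pd.to_timedelta(df['DateOffset'], unit='D')

# False where the offset pushed the date into the following month
df['SameMonth'] = df['Date'].dt.month == df['Month_Start'].dt.month
```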
I have a dataframe with this information, and for the release date I want to use str.split so that it only displays the year.
How can I str.split the release date so that it only shows the year, and label the result "release year"?
df['release year'] = df['releaseDate'].apply(lambda x: x.split('-')[-1])
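The same idea can also be written with the vectorized .str accessor instead of apply. This is a sketch under the same assumption as the line above, namely that the year is the last '-'-separated part; the sample values are made up because the original data was only posted as an image.

```python
import pandas as pd

# Made-up sample values; assumes the year is the last '-'-separated field
df = pd.DataFrame({'releaseDate': ['15-06-2010', '01-12-1999']})

# Split each string on '-' and keep the last piece
df['release year'] = df['releaseDate'].str.split('-').str[-1]
```

The result is still a string column; add .astype(int) if you need numbers.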
You can access the year and month attributes directly using the following:
import pandas as pd
df = pd.DataFrame({"data" : ["11-Jan-2020", "05-Feb-2020", "01-Mar-2020"]})
df["data"]=pd.to_datetime(df["data"], dayfirst=True)
df['year'] = df['data'].dt.year
df.head()
Your df will now look like:
        data  year
0 2020-01-11  2020
1 2020-02-05  2020
2 2020-03-01  2020
I have problems with transforming a Pandas dataframe column with dates to a number.
import matplotlib.dates
import datetime
for x in arsenalchelsea['Datum']:
    year = int(x[:4])
    month = int(x[5:7])
    day = int(x[8:10])
    hour = int(x[11:13])
    minute = int(x[14:16])
    sec = int(x[17:19])
    arsenalchelsea['floatdate']=date2num(datetime.datetime(year, month, day, hour, minute, sec))
arsenalchelsea
I want to make a new column in my dataframe with the dates as numbers, because I want to make a line graph later with the date on the x-axis.
This is the format of the date:
2017-11-29 14:06:45
Does anyone have a solution for this problem?
Slicing strings to get date components is bad practice. You should convert to datetime and extract directly.
In this case, it seems you can just use pd.to_datetime, but below I also demonstrate how you can extract the various components once you have performed the conversion.
df = pd.DataFrame({'Date': ['2017-01-15 14:55:42', '2017-11-10 12:15:21', '2017-12-05 22:05:45']})
df['Date'] = pd.to_datetime(df['Date'])
df[['year', 'month', 'day', 'hour', 'minute', 'sec']] = \
df['Date'].apply(lambda x: (x.year, x.month, x.day, x.hour, x.minute, x.second)).apply(pd.Series)
Result:
                 Date  year  month  day  hour  minute  sec
0 2017-01-15 14:55:42  2017      1   15    14      55   42
1 2017-11-10 12:15:21  2017     11   10    12      15   21
2 2017-12-05 22:05:45  2017     12    5    22       5   45
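As a variation on the extraction above, the components can also be read through the .dt accessor, which avoids the row-by-row apply. This is a sketch on the same sample data; unlike the answer, the last column is named 'second' here to match the attribute name.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2017-01-15 14:55:42',
                            '2017-11-10 12:15:21',
                            '2017-12-05 22:05:45']})
df['Date'] = pd.to_datetime(df['Date'])

# Each name is an attribute of the .dt accessor on a datetime column
for part in ['year', 'month', 'day', 'hour', 'minute', 'second']:
    df[part] = getattr(df['Date'].dt, part)
```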
I have a 1000 x 6 dimension data frame and one of the columns' header is "Date" where the date is presented in the format "JAN2014", "JUN2002" etc...
I would like to split this column in two separate columns: "Year" and "Month" so JAN will be in "Month" column, 2014 will be in "Year" column etc..
Could anyone please tell me how to do this in Python?
You can use the str accessor and indexing:
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
Example:
df = pd.DataFrame({'Date':['JAN2014','JUN2002']})
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
print(df)
Output:
      Date Month  Year
0  JAN2014   JAN  2014
1  JUN2002   JUN  2002
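If some values might not follow the three-letters-plus-four-digits pattern, a slightly more defensive sketch is to use str.extract with a regular expression. Rows that do not match would get NaN in both new columns, which you would need to handle before converting the year to int. The regex and the variable name parts are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['JAN2014', 'JUN2002']})

# Named groups become the column names of the extracted frame
parts = df['Date'].str.extract(r'^(?P<Month>[A-Za-z]{3})(?P<Year>\d{4})$')

df['Month'] = parts['Month']
# Safe here because every sample row matches the pattern
df['Year'] = parts['Year'].astype(int)
```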