How to extract year and month from date in python - python

I tried some functions, but they do not produce the right answer for some dates. For example, for the first date "12/11/2015", the month should be 11 instead of 12. Does anyone know how to solve this in general?

You just have to pass the boolean argument dayfirst=True so that the first number is read as the day, like so:
import pandas as pd
df = pd.DataFrame({
'date': ['12/11/2015', '19/11/2015', '7/12/2015']
})
df['month'] = pd.DatetimeIndex(df['date'], dayfirst=True).month
df['year'] = pd.DatetimeIndex(df['date'], dayfirst=True).year
print(df)
Result:
         date  month  year
0  12/11/2015     11  2015
1  19/11/2015     11  2015
2   7/12/2015     12  2015
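If the layout of the date strings is known in advance, an explicit format string is an alternative to dayfirst. The following is only a sketch under that assumption: it assumes every value follows day/month/year, and the variable name parsed is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": ["12/11/2015", "19/11/2015", "7/12/2015"]})

# Assumption: every string is day/month/year, so the format is stated explicitly.
# Parsing once and reusing the result avoids converting the column twice.
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")

df["month"] = parsed.dt.month
df["year"] = parsed.dt.year
print(df)
```

With an explicit format, a string that does not match raises an error instead of being silently read as month/day/year.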

Related

How can I save out hourly dates to excel?

a=pd.date_range(start='1/1/2021', end='01/01/2022', freq='h')
I have these hourly dates for the year, but when I try to save them to Excel, each one shows up as one ridiculously big number. What I need is 4 columns (year; month; day; hour).
Thanks in advance!
One solution could be:
Use pd.DataFrame to construct a df with respective columns.
Use df.to_csv to write the df to a csv-file (also possible: directly to excel with df.to_excel).
import pandas as pd
a = pd.date_range(start='1/1/2021', end='01/01/2022', freq='h')
df = pd.DataFrame({'year': a.year,
'month': a.month,
'day': a.day,
'hour': a.hour})
print(df.head())
   year  month  day  hour
0  2021      1    1     0
1  2021      1    1     1
2  2021      1    1     2
3  2021      1    1     3
4  2021      1    1     4
df.to_csv('fname.csv')
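By default, to_csv also writes the row index as an extra first column. If only the four columns are wanted in the file, index=False can be passed. The sketch below writes to an in-memory buffer (an assumption made here so the example runs without touching the disk); a file name such as 'fname.csv' works the same way.

```python
import io

import pandas as pd

a = pd.date_range(start="1/1/2021", end="01/01/2022", freq="h")
df = pd.DataFrame({"year": a.year, "month": a.month, "day": a.day, "hour": a.hour})

# io.StringIO stands in for a real file path in this sketch.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves the 0..N row labels out

csv_lines = buffer.getvalue().splitlines()
print(csv_lines[:3])
```

The same index=False argument is accepted by df.to_excel.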

Python Pandas: Is there a way to obtain a subset dataframe based on strings in a list

I am looking to make a subset df based on the string values in a list.
A toy model example:
data = {'month': ['January','February','March','April','May','June','July','August','September','October','November','December'],
'days_in_month': [31,28,31,30,31,30,31,31,30,31,30,31]
}
df = pd.DataFrame(data, columns = ['month', 'days_in_month'])
summer_months = ['Dec', 'Jan', 'Feb']
contain_values = df[df['month'].str.contains(summer_months)]
print (df)
This would fail because of contain_values = df[df['month'].str.contains(summer_months)]
TypeError: unhashable type: 'list'
I know that contain_values = df[df['month'].str.contains('Dec')] works, but I would like to return a new dataframe with the summer months in it, or even all the non-summer months using the ~ operator.
Thanks
>>> contain_values = df[df['month'].str.contains('|'.join(summer_months))]
>>> contain_values
       month  days_in_month
0    January             31
1   February             28
11  December             31
You can also use what the .str accessor offers:
df[df["month"].str[:3].isin(summer_months)]
OUTPUT
       month  days_in_month
0    January             31
1   February             28
11  December             31
You can make it more robust using something like this (in case names in the dataframe are not properly capitalized):
df[df["month"].str.capitalize().str[:3].isin(summer_months)]
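The question also asks for the non-summer months. A minimal sketch of both selections follows, using a shortened toy frame with deliberately inconsistent capitalization; the names is_summer, summer and not_summer are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["January", "february", "MARCH", "December"],
    "days_in_month": [31, 28, 31, 31],
})
summer_months = ["Dec", "Jan", "Feb"]

# Normalize the capitalization, keep the first three letters,
# then test membership in the list to get a boolean mask.
is_summer = df["month"].str.capitalize().str[:3].isin(summer_months)

summer = df[is_summer]       # rows whose month is in the list
not_summer = df[~is_summer]  # ~ inverts the boolean mask

print(summer)
print(not_summer)
```

Building the mask once and reusing it keeps the two selections consistent with each other.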

Pandas str split a column and add it to a dataframe

I have a dataframe with this information, and I want to split the release date so that it only displays the year.
How can I use str.split on the release date so that it only shows the year, and label the result "release year"?
df['release year'] = df['releaseDate'].apply(lambda x: x.split('-')[-1])
You can access the year and month attributes directly using the following:
import pandas as pd
df = pd.DataFrame({"data" : ["11-Jan-2020", "05-Feb-2020", "01-Mar-2020"]})
df["data"]=pd.to_datetime(df["data"], dayfirst=True)
df['year'] = df['data'].dt.year
df.head()
Your df will now look like:
        data  year
0 2020-01-11  2020
1 2020-02-05  2020
2 2020-03-01  2020

Editing the date in pandas to show year only, by column

I am trying to understand how I can edit the dataframe in python using pandas so I can drop everything but the year.
Example: if the date is 2014-01-01, I want it to show 2014 and drop both the month and the date. All the dates are in a single column.
Thanks in advance!
You can convert the numpy.datetime64 date value to datetime using pd.to_datetime() and then you can extract year or month or day from it.
import numpy as np
import pandas as pd
date = np.datetime64('2014-01-01')
type(date)
Output:
numpy.datetime64
Convert this date to pandas datetime using pd.to_datetime.
date = pd.to_datetime(date)
type(date)
Output:
pandas._libs.tslibs.timestamps.Timestamp
Then you can extract the year using .year
date.year
Output:
2014
So, if you have a df:
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df
Output:
date
0 2014
1 2015
2 2016
Alternately, you can also do this
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = df['date'].apply(lambda x: x.strftime('%Y'))
df
Output:
date
0 2014
1 2015
2 2016
EDIT 1
Group by using year when the column has date values
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.groupby(df.index.year).size()
Output:
date
2014 1
2015 1
2016 1
You can still do the same even if you have removed the month and day from the date and only have year in your column
df = pd.DataFrame({'date': [np.datetime64('2014-01-01'), np.datetime64('2015-01-01'), np.datetime64('2016-01-01')]})
df['date'] = pd.DatetimeIndex(df['date']).year
df.groupby('date').size()
Output:
date
2014 1
2015 1
2016 1
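If the original dates should be kept, the year can go into a separate column through the .dt accessor, and the grouping then works on that column without setting an index. This is a sketch with made-up dates; the column name year and the variable counts are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2014-01-01", "2014-06-15", "2016-01-01"])})

# .dt exposes the datetime components of a datetime64 column.
df["year"] = df["date"].dt.year

# Count rows per year; the full dates in 'date' are left untouched.
counts = df.groupby("year").size()
print(counts)
```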

Extracting date components in pandas series

I have problems with transforming a Pandas dataframe column with dates to a number.
import matplotlib.dates
import datetime
for x in arsenalchelsea['Datum']:
    year = int(x[:4])
    month = int(x[5:7])
    day = int(x[8:10])
    hour = int(x[11:13])
    minute = int(x[14:16])
    sec = int(x[17:19])
    arsenalchelsea['floatdate']=date2num(datetime.datetime(year, month, day, hour, minute, sec))
arsenalchelsea
I want to make a new column in my dataframe with the dates as numbers, because I want to make a line graph later with the date on the x-axis.
This is the format of the date:
2017-11-29 14:06:45
Does anyone have a solution for this problem?
Slicing strings to get date components is bad practice. You should convert to datetime and extract directly.
In this case, it seems you can just use pd.to_datetime, but below I also demonstrate how you can extract the various components once you have performed the conversion.
df = pd.DataFrame({'Date': ['2017-01-15 14:55:42', '2017-11-10 12:15:21', '2017-12-05 22:05:45']})
df['Date'] = pd.to_datetime(df['Date'])
df[['year', 'month', 'day', 'hour', 'minute', 'sec']] = \
df['Date'].apply(lambda x: (x.year, x.month, x.day, x.hour, x.minute, x.second)).apply(pd.Series)
Result:
                 Date  year  month  day  hour  minute  sec
0 2017-01-15 14:55:42  2017      1   15    14      55   42
1 2017-11-10 12:15:21  2017     11   10    12      15   21
2 2017-12-05 22:05:45  2017     12    5    22       5   45
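The same components can also be read through the .dt accessor, which works on the whole column at once rather than row by row. A sketch of that variant follows; note that the last column is named second here (matching the attribute name) rather than sec.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2017-01-15 14:55:42", "2017-11-10 12:15:21", "2017-12-05 22:05:45"]})
df["Date"] = pd.to_datetime(df["Date"])

# Each name below is an attribute of the .dt accessor;
# getattr looks it up so one loop covers all six components.
for name in ["year", "month", "day", "hour", "minute", "second"]:
    df[name] = getattr(df["Date"].dt, name)

print(df)
```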
