python dataframe datetime condition

I am trying to create a new dataframe from an existing one by filtering on holiday dates. The train dataframe already exists, and I want to create train_holiday from it by matching the day and month values in the holidays dataframe. For example, given the train data and the holiday dates below:
date values
2015-02-01 10
2015-02-02 20
2015-02-03 30
2015-02-04 40
2015-02-05 50
2015-02-06 60
date
2012-02-02
2012-02-05
The first dataframe is the existing one, and the second lists the holidays. I want to create a new dataframe from the first one that contains only the 2015 dates that fall on a holiday, like this:
date values
2015-02-02 20
2015-02-05 50
I tried
train_holiday = train.loc[train["date"].dt.day== holidays["date"].dt.day]
but it raises an error. Could you please help me with this?

In your problem you care only about the month and day components, and one way to extract them is dt.strftime(). Apply that extraction to both date columns, then use .isin() to keep the rows of df1 whose month-day matches one in df2.
df1[
    df1['date'].dt.strftime('%m%d').isin(
        df2['date'].dt.strftime('%m%d')
    )
]
Make sure both date columns are in date-time format so that .dt can work. For example,
df1['date'] = pd.to_datetime(df1['date'])
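As a minimal end-to-end sketch of the approach above, the sample data from the question can be rebuilt and filtered like this. The names df1 and df2 follow the answer; holiday_keys is only an illustrative helper name.

```python
import pandas as pd

# Sample data copied from the question (df1 = train, df2 = holidays).
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2015-02-01", "2015-02-02", "2015-02-03",
                            "2015-02-04", "2015-02-05", "2015-02-06"]),
    "values": [10, 20, 30, 40, 50, 60],
})
df2 = pd.DataFrame({"date": pd.to_datetime(["2012-02-02", "2012-02-05"])})

# Build "MMDD" keys so the year is ignored when matching.
holiday_keys = df2["date"].dt.strftime("%m%d")  # illustrative helper name

# Keep only the rows of df1 whose month-day key appears among the holidays.
train_holiday = df1[df1["date"].dt.strftime("%m%d").isin(holiday_keys)]
print(train_holiday)
```

The original attempt compared two Series of different lengths element by element, which is why it failed; .isin() tests membership instead, so the lengths do not need to match.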

Related

How To Sum all the values of a column for a date instance in pandas

I am working on time-series data with two columns, date and quantity. The dates are daily. I want to sum the quantity for each month and label each monthly total with a single date.
date is my index column
Example
quantity
date
2018-01-03 30
2018-01-05 45
2018-01-19 30
2018-02-09 10
2018-02-19 20
Output :
quantity
date
2018-01-01 105
2018-02-01 30
Thanks in advance!!
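You can downsample to combine the data for each month, then sum it by chaining the sum method. Note that "M" labels each month by its last day; use "MS" instead if you want month-start labels such as 2018-01-01, as in the expected output.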
df.resample("M").sum()
See the resampling section of the pandas user guide for details.
You'll need to make sure your index is in datetime format for this to work. So first do: df.index = pd.to_datetime(df.index). Hat tip to sammywemmy for the same advice in the comments.
You can also use groupby to get the same result.
df.index = pd.to_datetime(df.index)
df.groupby(df.index.strftime('%Y-%m-01')).sum()
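The two answers above can be compared side by side on the question's data. This is a sketch that assumes month-start labels are wanted, so it uses the "MS" alias; the to_period grouping is an alternative shown for illustration, not something the answers used.

```python
import pandas as pd

# Sample data copied from the question, with date as a DatetimeIndex.
df = pd.DataFrame(
    {"quantity": [30, 45, 30, 10, 20]},
    index=pd.to_datetime(["2018-01-03", "2018-01-05", "2018-01-19",
                          "2018-02-09", "2018-02-19"]),
)
df.index.name = "date"

# Resample to month-start bins and sum each bin.
monthly = df.resample("MS").sum()

# Alternative: group by calendar month as a Period (the index becomes a PeriodIndex).
by_period = df.groupby(df.index.to_period("M")).sum()

print(monthly)
print(by_period)
```

resample also emits a row for any month with no data inside the covered range, while groupby only returns months that actually occur.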

Add a column for day of the week based on Date Index

I'm new to the language and have managed to create the dataframe below. It is a MultiIndex dataframe of size (a, b).
The Date is on the rows, and I'm not fully sure how it is all defined.
I want to add a column holding the day of the week (1,2,3,4,5,6,7) for each row, based on the date stamps in the index on the left.
Can someone show me how to do it, please? I'm just confused about how to pull out the index/date column to do calculations on it.
Thanks
print(df_3.iloc[:,0])
Date
2019-06-01 8573.84
2019-06-02 8565.47
2019-06-03 8741.75
2019-06-04 8210.99
2019-06-05 7704.34
2019-09-09 10443.23
2019-09-10 10336.41
2019-09-11 10123.03
2019-09-12 10176.82
2019-09-13 10415.36
Name: (bitcoin, Open), Length: 105, dtype: float64
I've used just the first two of your columns and three of your records to show a possible solution. It's much the same as what Celius did, but with the column converted via to_datetime.
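import pandas as pd
data = [['2019-06-01', 8573.84], ['2019-06-02', 8565.47], ['2019-06-03', 8741.75]]
df = pd.DataFrame(data, columns=['Date', 'Bitcoin'])
df['Date'] = pd.to_datetime(df['Date'])  # convert the strings to datetimes
df['day_of_week'] = df['Date'].dt.dayofweek  # 0 = Monday ... 6 = Sunday; Date is kept intact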
The output result prints 5 for day 2019-06-01 which is a Saturday, 6 for the 2019-06-02 (Sunday) and 0 for 2019-06-03 (Monday).
I hope it helps you.
If you are using pandas and your Index is interpreted as a Datetime object, I would try the following (I assume Date is your index given the dataframe you provided as example):
df = df.reset_index(drop=False)  # Move the index into a regular column named `Date` (drop=False keeps it).
df['day_of_week'] = df['Date'].dt.dayofweek #Create new column using pandas `dt.dayofweek`
Edit: Also possible duplicate of Create a day of week column in a pandas dataframe
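If the dates are already a DatetimeIndex, the day of the week can also be read straight from the index without resetting it. The sketch below assumes a single-level column index and uses a hypothetical column name, Open, with three rows taken from the question. Since dayofweek counts from 0 (Monday), 1 is added to get the 1 to 7 numbering asked for.

```python
import pandas as pd

# Three rows from the question; "Open" is a simplified, single-level column name.
df = pd.DataFrame(
    {"Open": [8573.84, 8565.47, 8741.75]},
    index=pd.to_datetime(["2019-06-01", "2019-06-02", "2019-06-03"]),
)
df.index.name = "Date"

# DatetimeIndex.dayofweek is 0 (Monday) .. 6 (Sunday); shift to 1 .. 7.
df["day_of_week"] = df.index.dayofweek + 1
print(df)
```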

Comparing date time values in a pandas DataFrame with a specific date_time value and returning the closest one

I have a date column in a pandas DataFrame as follows:
index date_time
1 2013-01-23
2 2014-01-23
3 2015-8-14
4 2015-10-23
5 2016-10-28
I want to compare the values in the date_time column with a specific date, for example date_x = 2015-9-14, and return the closest date that falls before it, which is 2015-8-14.
I thought about converting the values in date_time column to a list and then compare them with the specific date. However, I do not think it is an efficient solution.
Any solution?
Thank you.
Here is one way using searchsorted. Both of my methods assume the data is already sorted; if it is not, first run df=df.sort_values('date_time').
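import numpy as np
import pandas as pd
df.date_time = pd.to_datetime(df.date_time)
date_x = '2015-9-14'
# position of the first entry that is >= date_x in the sorted column
idx = np.searchsorted(df.date_time, pd.to_datetime(date_x))
df.date_time.iloc[idx-1]  # the entry just before that position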
Out[408]:
2 2015-08-14
Name: date_time, dtype: datetime64[ns]
Or we can do
s=df.date_time-pd.to_datetime(date_x)
df.loc[[s[s.dt.days<0].index[-1]]]
Out[417]:
index date_time
2 3 2015-08-14
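A third option, which does not need the column to be sorted, is to mask the earlier dates and take their maximum. This is a sketch rather than the answerer's method; the names earlier and closest_before are illustrative, and the dates are written zero-padded here to keep parsing unambiguous.

```python
import pandas as pd

# Dates from the question, zero-padded.
df = pd.DataFrame({"date_time": pd.to_datetime(
    ["2013-01-23", "2014-01-23", "2015-08-14", "2015-10-23", "2016-10-28"])})

date_x = pd.Timestamp("2015-09-14")

# Keep only the dates strictly before date_x, then take the latest of them.
earlier = df.loc[df["date_time"] < date_x, "date_time"]
closest_before = earlier.max() if not earlier.empty else None
print(closest_before)
```

The explicit empty check covers the case where no date precedes date_x, which the positional idx-1 lookup would silently map to the last row.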

How to group by day and month in pandas?

Given a series like this
Date
2005-01-01 128
2005-01-02 72
2005-01-03 67
2005-01-04 61
2005-01-05 33
Name: Data_Value, dtype: int64
for several years, how do I group all the January 1sts together, all the January 2nds, etc?
I'm actually trying to find the max for each day of the year across several years, so it does not have to be groupby. If there is an easier way to do this, that would be great.
You can convert your index to datetime, then use strftime to get a date formatted string to group on:
df.groupby(pd.to_datetime(df.index).strftime('%b-%d'))['Data_Value'].max()
If there are no NaNs in your date string, you can slice as well. This returns strings of the format "MM-DD":
df.groupby(df.index.astype(str).str[5:])['Data_Value'].max()
Why not just keep it simple?
max_temp = dfall.groupby([(dfall.Date.dt.month),(dfall.Date.dt.day)])['Data_Value'].max()
As an alternative, you can use a pivot table:
Reset the index and derive the date columns:
df=df.reset_index()
df['date']=pd.to_datetime(df['Date'])  # the index is named Date in the question
df['year']=df['date'].dt.year
df['month']=df['date'].dt.month
df['day']=df['date'].dt.day
Pivot over the month and day columns:
df_grouped=df.pivot_table(index=['month','day'],values='Data_Value',aggfunc='max')
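For a Series indexed by date, as in the question, the month/day grouping can be applied directly to the index. In the sketch below the 2005 values come from the question, while the 2006 rows are made-up numbers added only so that each calendar day has two years to compare.

```python
import pandas as pd

# 2005 values are from the question; the 2006 values are invented for illustration.
s = pd.Series(
    [128, 72, 140, 60],
    index=pd.to_datetime(["2005-01-01", "2005-01-02",
                          "2006-01-01", "2006-01-02"]),
    name="Data_Value",
)
s.index.name = "Date"

# Group by (month, day) so the same calendar day is pooled across years.
daily_max = (
    s.groupby([s.index.month, s.index.day])
     .max()
     .rename_axis(["month", "day"])
)
print(daily_max)
```

Grouping on integer month and day keeps the result in calendar order, whereas string keys such as '%b-%d' sort alphabetically.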

How to generate a series of dates that are the first monday from a given start date and increment by 2 weeks

I have a dataframe with a column that contains the date of the first Monday of every week between an arbitrary start date and now. I wish to generate a new column that advances in 2-week jumps but has the same length as the original column, so it would contain repeated values. For example, this would be the result for the month of October, where the column weekly exists and bi-weekly is the target:
data = {'weekly': ['2018-10-08', '2018-10-15', '2018-10-22', '2018-10-29'],
        'bi-weekly': ['2018-10-08', '2018-10-08',
                      '2018-10-22', '2018-10-22']}
df = pd.DataFrame(data)
At the moment I am stuck with pd.date_range(start,end,freq='14D'), but this does not contain the repeated values that I need in order to group by it.
IIUC
df.groupby(np.arange(len(df))//2).weekly.transform('first')
Out[487]:
0 2018-10-08
1 2018-10-08
2 2018-10-22
3 2018-10-22
Name: weekly, dtype: datetime64[ns]
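Spelled out as a self-contained sketch, the idea is to number the rows in consecutive pairs and broadcast the first date of each pair. It assumes the weekly column is sorted and has no missing weeks, since the pairing is purely positional; pair_id is an illustrative name.

```python
import numpy as np
import pandas as pd

# The weekly column from the question, converted to datetimes.
df = pd.DataFrame({"weekly": pd.to_datetime(
    ["2018-10-08", "2018-10-15", "2018-10-22", "2018-10-29"])})

# Rows 0,1 -> group 0; rows 2,3 -> group 1; and so on.
pair_id = np.arange(len(df)) // 2

# transform('first') repeats the first date of each pair on every row of that pair.
df["bi-weekly"] = df.groupby(pair_id)["weekly"].transform("first")
print(df)
```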
