Below is sample pandas code and its result. I am trying to convert the Date list to a date-only format, excluding the timestamp, i.e. just the date, e.g. 2022-06-28, but I am unable to get that result. Any help is much appreciated.
df2 = (df1.sort_values(['Date'], ascending=False)
          .groupby(['Remarks'])[['dr', 'cr', 'Date']]
          .agg({'dr': ['sum', list], 'cr': ['sum', list], 'Date': list})
          .reset_index())
  Remarks         dr                   cr              Date
               sum        list        sum   list       list
0    peta   10000.00   [10000.0]       0.0  [nan]  [2022-06-28 00:00:00]
1    axis  227222.00  [227222.0]       0.0  [nan]  [2022-12-05 00:00:00]
Thanks in advance.
You could format the date before running the groupby command (this assumes the Date column is stored as strings):
df1['Date_wo_timestamp'] = df1['Date'].str.split(' ').str[0]  # keep only the part before the space
First convert the Date column to datetime (do this on df1, before the groupby), then redo the aggregation:
df1['Date'] = pd.to_datetime(df1['Date'])
df1['Date'] = df1['Date'].dt.date  # if you want only the date part
df2 = (df1.sort_values(by='Date', ascending=False)
          .groupby(['Remarks'])[['dr', 'cr', 'Date']]
          .agg({'dr': ['sum', list], 'cr': ['sum', list], 'Date': list})
          .reset_index())
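For reference, here is a minimal end-to-end sketch (using made-up rows based on the result shown above) confirming that the lists then hold plain dates rather than timestamps:
import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'Remarks': ['peta', 'axis'],
    'dr': [10000.0, 227222.0],
    'cr': [np.nan, np.nan],
    'Date': ['2022-06-28 00:00:00', '2022-12-05 00:00:00'],
})

# Strip the time component before aggregating so the lists hold plain dates
df1['Date'] = pd.to_datetime(df1['Date']).dt.date

df2 = (df1.sort_values('Date', ascending=False)
          .groupby('Remarks')[['dr', 'cr', 'Date']]
          .agg({'dr': ['sum', list], 'cr': ['sum', list], 'Date': list})
          .reset_index())
print(df2)  # the Date column now shows e.g. [datetime.date(2022, 6, 28)]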
I'm trying to follow the solution provided in Find all months between two date columns and generate row for each month, and I'm hitting a wall because I'm getting an error. What I want to do is create a Year-Month column for each year-month that exists in the StartDate-EndDate range of each row. When I try to follow the linked Stack Overflow answer, I get the error
TypeError: Cannot convert input ... Name: ServiceStartDate, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
I have no idea how to fix this. Please help!
Sample Data

   ID      StartDate   EndDate
1  311566  2021-10-01  2024-09-30
2  235216  2020-11-01  2020-11-30
3  157054  2021-10-01  2023-09-30
4  159954  2021-01-01  2023-12-31
5  255815  2019-11-01  2022-10-31
I have found a solution to my problem (sorry for the long response delay). The problem was that my data had a timestamp associated with it. I needed to change the date field to %Y-%m-01 format using the following code.
df['date'] = df['date'].apply(lambda x: x.strftime('%Y-%m-01'))
Then I used the solution below to get all the months/years that exist between the min and max dates as a single column.
df.merge(df.apply(lambda s: pd.date_range(df['date'].min(),
                                          df['date'].max(), freq='MS'),
                  axis=1).explode().rename('Month'),
         left_index=True, right_index=True)
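For the per-row version the original question asks about (one Month row for every month between each row's StartDate and EndDate), a sketch along the lines of the linked answer might look like the following; it assumes StartDate and EndDate have already been converted to datetime64:
import pandas as pd

df = pd.DataFrame({'ID': [311566, 235216],
                   'StartDate': pd.to_datetime(['2021-10-01', '2020-11-01']),
                   'EndDate': pd.to_datetime(['2024-09-30', '2020-11-30'])})

# Build a month-start range per row, then explode it into one row per month
months = df.apply(lambda r: pd.date_range(r['StartDate'], r['EndDate'], freq='MS'),
                  axis=1).explode().rename('Month')
out = df.merge(months, left_index=True, right_index=True)
print(out.head())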
I have a range of dates in the date column of a dataframe. The dates are scattered, e.g. 1st Feb, 5th Feb, 11th Feb, etc.
I want to use pd.date_range with frequency one minute on every date in this column. So my start argument will be date and the end argument will be date + datetime.timedelta(days=1).
I'm struggling to use the apply function for this; can someone help me with it, or can I use some other function here?
I don't want to use a for loop because the length of my dates will be HUGE.
I tried this:
df.date.apply(lamda x : pd.date_range(start=df['date'],end = df['date']+datetime.timedelta(days=1),freq="1min"),axis =1)
but I'm getting an error.
Thanks in advance
Use x inside the lambda function instead of df['date'], and remove axis=1:
df = pd.DataFrame({'date':pd.date_range('2021-11-26', periods=3)})
print (df)
date
0 2021-11-26
1 2021-11-27
2 2021-11-28
s = df['date'].apply(lambda x:pd.date_range(start=x,end=x+pd.Timedelta(days=1),freq="1min"))
print (s)
0 DatetimeIndex(['2021-11-26 00:00:00', '2021-11...
1 DatetimeIndex(['2021-11-27 00:00:00', '2021-11...
2 DatetimeIndex(['2021-11-28 00:00:00', '2021-11...
Name: date, dtype: object
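If a single flat result is wanted rather than one DatetimeIndex per row, a possible follow-up (just a sketch building on the s from above) is to explode it:
# Flatten the per-row DatetimeIndex objects into one long Series of minute timestamps
minutes = s.explode().reset_index(drop=True)
print(minutes.head())
print(len(minutes))  # 3 rows x 1441 one-minute stamps each (both endpoints included)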
I am working on time-series data, where I have two columns, date and quantity. The dates are daily. I want to add up all the quantity for a month and convert it into a single date.
date is my index column
Example:

            quantity
date
2018-01-03        30
2018-01-05        45
2018-01-19        30
2018-02-09        10
2018-02-19        20

Output:

            quantity
date
2018-01-01       105
2018-02-01        30
Thanks in advance!!
You can downsample to combine the data for each month and sum it by chaining the sum method. Note that an "M" frequency labels each month by its end date (2018-01-31, ...), while "MS" labels it by its start date, which matches the output you show:
df.resample("MS").sum()
Check out the pandas user guide on resampling here.
You'll need to make sure your index is in datetime format for this to work. So first do: df.index = pd.to_datetime(df.index). Hat tip to sammywemmy for the same advice in the comments.
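A minimal reproducible sketch of that approach, using the example data from the question:
import pandas as pd

df = pd.DataFrame({'quantity': [30, 45, 30, 10, 20]},
                  index=['2018-01-03', '2018-01-05', '2018-01-19',
                         '2018-02-09', '2018-02-19'])
df.index = pd.to_datetime(df.index)   # make sure the index is datetime
df.index.name = 'date'

monthly = df.resample('MS').sum()     # month-start labels: 2018-01-01, 2018-02-01
print(monthly)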
You can also use groupby to get the same result.
df.index = pd.to_datetime(df.index)
df.groupby(df.index.strftime('%Y-%m-01')).sum()
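One detail worth noting: strftime turns the index into strings. If you would rather keep a real DatetimeIndex on the result, an alternative sketch uses pd.Grouper instead:
# Group by calendar month on the DatetimeIndex while keeping datetime labels
df.groupby(pd.Grouper(freq='MS')).sum()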
I'm new to the language and have managed to create the dataframe below. It has a MultiIndex and is of size (a, b).
The Date is on the rows, and I'm not fully sure how it is all defined.
I want to add a column that is the day of the week (1,2,3,4,5,6,7) for the days, based on the date stamps on the left/index.
Can someone show me how to do this please? I'm just confused about how to pull the index/date column out to do calculations on.
Thanks
print(df_3.iloc[:,0])
Date
2019-06-01 8573.84
2019-06-02 8565.47
2019-06-03 8741.75
2019-06-04 8210.99
2019-06-05 7704.34
2019-09-09 10443.23
2019-09-10 10336.41
2019-09-11 10123.03
2019-09-12 10176.82
2019-09-13 10415.36
Name: (bitcoin, Open), Length: 105, dtype: float64
I've just used your first two columns and 3 of your records to get a possible solution. It's pretty much what Celius did, but with a column conversion via to_datetime.
data = [['2019-06-01', 8573.84], ['2019-06-02', 8565.47], ['2019-06-03', 8741.75]]
df = pd.DataFrame(data,columns = ['Date', 'Bitcoin'])
df['Date']= pd.to_datetime(df['Date']).dt.dayofweek
The output result prints 5 for day 2019-06-01 which is a Saturday, 6 for the 2019-06-02 (Sunday) and 0 for 2019-06-03 (Monday).
I hope it helps you.
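Note that dt.dayofweek is 0-based (Monday=0 through Sunday=6). If the 1-7 numbering from the question is wanted, one option (my assumption on which day maps to 1) is to add 1 and write the result into a new column instead of overwriting Date:
# 1 = Monday ... 7 = Sunday (ISO-style numbering); assumes 'Date' still holds the date strings
df['day_of_week'] = pd.to_datetime(df['Date']).dt.dayofweek + 1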
If you are using pandas and your Index is interpreted as a Datetime object, I would try the following (I assume Date is your index given the dataframe you provided as example):
df = df.reset_index(drop=False) #Reset the index so you get a regular column named `Date`.
df['day_of_week'] = df['Date'].dt.dayofweek #Create new column using pandas `dt.dayofweek`
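Alternatively, if you would rather not reset the index, the day of week can be computed straight from the index; a sketch, assuming the index is (or is first converted to) a DatetimeIndex and the columns are a two-level MultiIndex as in the printout above:
df_3.index = pd.to_datetime(df_3.index)           # ensure the index is datetime
# Monday=0 ... Sunday=6; add 1 for the 1-7 numbering asked about in the question
df_3[('day_of_week', '')] = df_3.index.dayofweek + 1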
Edit: Also possible duplicate of Create a day of week column in a pandas dataframe
I have a date column in a pandas DataFrame as follows:
index date_time
1 2013-01-23
2 2014-01-23
3 2015-8-14
4 2015-10-23
5 2016-10-28
I want to compare the values in the date_time column with a specific date, for example date_x = 2015-9-14, and return the date that comes before this date and is closest to it, which is 2015-8-14.
I thought about converting the values in the date_time column to a list and then comparing them with the specific date. However, I do not think that is an efficient solution.
Any solution?
Thank you.
Here is one way using searchsorted. This method assumes the data is already ordered by date; if not, do df = df.sort_values('date_time') first.
df.date_time=pd.to_datetime(df.date_time)
date_x = '2015-9-14'
idx=np.searchsorted(df.date_time,pd.to_datetime(date_x))
df.date_time.iloc[idx-1]
Out[408]:
2 2015-08-14
Name: date_time, dtype: datetime64[ns]
Or we can do
s=df.date_time-pd.to_datetime(date_x)
df.loc[[s[s.dt.days<0].index[-1]]]
Out[417]:
index date_time
2 3 2015-08-14
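For completeness, an alternative sketch that avoids searchsorted entirely: a boolean filter plus max. It does not require sorted data, although for very large frames the sorted searchsorted approach may be faster:
import pandas as pd

df = pd.DataFrame({'date_time': ['2013-01-23', '2014-01-23', '2015-8-14',
                                 '2015-10-23', '2016-10-28']})
df['date_time'] = pd.to_datetime(df['date_time'])

date_x = pd.to_datetime('2015-9-14')
# Keep only dates strictly before date_x, then take the latest of them
closest_before = df.loc[df['date_time'] < date_x, 'date_time'].max()
print(closest_before)  # 2015-08-14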