I am new to Pandas timeseries and dataframes and struggle getting this simple task done.
I have a dataset "data" (a 1-dimensional float32 NumPy array) with one value for each day from 1/1/2004 - 12/31/2008. The dates are stored as a list of datetime objects "dates".
Basically, I would like to calculate a complete "standard year": the average value for each day of the year across all years (days 1-365).
I started from this similar (?) question (Getting the average of a certain hour on weekdays over several years in a pandas dataframe), but could not get to the desired result: a time series of 365 "average" days, e.g. the average of all the 1st of Januarys, all the 2nd of Januarys, and so on.
A small example script:
import numpy as np
import pandas as pd
import datetime
startdate = datetime.datetime(2004, 1, 1)
enddate = datetime.datetime(2008, 12, 31)
days = (enddate + datetime.timedelta(days=1) - startdate).days
data = np.random.random(days)
dates = [startdate + datetime.timedelta(days=x) for x in range(0, days)]
ts = pd.Series(data, dates)
test = ts.groupby(lambda x: (x.year, x.day)).mean()
Group by the month and day, rather than the year and day:
test = ts.groupby([ts.index.month, ts.index.day]).mean()
yields
1   1     0.499264
    2     0.449357
    3     0.498883
...
12  17    0.408180
    18    0.317682
    19    0.467238
...
    29    0.413721
    30    0.399180
    31    0.828423
Length: 366, dtype: float64
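Since 2004 and 2008 are leap years, the result has 366 groups (it includes Feb 29). If you want exactly 365 values, one option is to drop that group afterwards; a small sketch on top of the result above:
standard_year = test.drop(index=[(2, 29)])  # remove the Feb 29 group
print(len(standard_year))  # 365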
I need a function to count the total number of days in the 'days' column of a dataframe, between a start date of 1st Jan 1995 and an end date of 31st Dec 2019, taking leap years into account as well.
Example: 1st Jan 1995 - Day 1, 1st Feb 1995 - Day 32, and so on all the way to 31st Dec 2019.
If you want to filter a pandas DataFrame using a range of two dates, you can do it like this:
start_date = '1995/01/01'
end_date = '1995/02/01'
df = df[ (df['days']>=start_date) & (df['days']<=end_date) ]
and with len(df) you will see the number of rows in the filtered DataFrame.
If instead you want to calculate the number of days between two dates, you can do it without pandas, using datetime:
from datetime import datetime
start_date = '1995/01/01'
end_date = '1995/02/01'
delta = datetime.strptime(end_date, '%Y/%m/%d') - datetime.strptime(start_date, '%Y/%m/%d')
print(delta.days)
Output:
31
Note that subtracting datetime objects already takes leap years into account; to get the day number as in your example (so that 1st Jan 1995 is Day 1), use delta.days + 1.
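If what you actually need is a day-number column in the dataframe itself, the same subtraction works column-wise and handles leap years automatically. A minimal sketch, assuming the 'days' column holds (or has been converted to) datetimes:
import pandas as pd

df = pd.DataFrame({'days': pd.to_datetime(['1995-01-01', '1995-02-01', '2019-12-31'])})
start_date = pd.Timestamp('1995-01-01')
df['day_number'] = (df['days'] - start_date).dt.days + 1  # 1st Jan 1995 -> 1, 1st Feb 1995 -> 32, ...
print(df)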
Having a data frame as below:
Day    Month and year
13     septiembre /98
15     August/98
24     Novem /98
Is it possible to merge Day with Month and year and create a new Date column, like this?
Day    Month and year    Date
13     septiembre /98    13-09-98
15     August/98         15-08-98
24     Nov /98           24-11-98
I was able to create a pandas Series by first converting the data you provided into a new, correctly formatted list of strings, and then building a Series from that list. I'm not sure if that's exactly what you wanted, but I hope it helps:
import pandas as pd

day = [13, 15, 24]
monthyear = ['September/98', 'August/98', 'November/98']
daymonthyear = zip(day, monthyear)
daymonthyear_new = []
for i in daymonthyear:
    teste = [i[0], i[1].split("/")]
    # build a "day-month-year" string, e.g. "13-September-98"
    string = str(teste[0]) + "-" + str(teste[1][0]) + "-" + str(teste[1][1])
    print('string= ', string)
    daymonthyear_new.append(string)
print('daymonthyear_new= ', daymonthyear_new)

dates = pd.Series(daymonthyear_new)
dates
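If you then need actual datetime values rather than plain strings, the list built above can be parsed in one step. A small sketch, assuming the English month names used in the code:
parsed = pd.to_datetime(dates, format='%d-%B-%y')  # e.g. '13-September-98' -> 1998-09-13
print(parsed)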
You could perform string slicing and concatenation, provided that your dataset comes in a predictable and standard format, and then cast the resulting string to datetime using pd.to_datetime.
For example, this would work for your example:
import pandas as pd
df = pd.DataFrame([[13, 'Septiembre /98'], [15, 'August/98'], [24, 'Novem /98']], columns=["Day", "Month and year"])
df['Date'] = pd.to_datetime(
    df['Day'].astype('str') +
    ' - ' +
    df['Month and year'].str.slice(0, 3) +
    ' - ' +
    df['Month and year'].str.slice(-2)
)
print(df)
Day Month and year Date
0 13 Septiembre /98 1998-09-13
1 15 August/98 1998-08-15
2 24 Novem /98 1998-11-24
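If you specifically want the 13-09-98 text from the question rather than datetime values, you could format the Date column back to strings on top of the code above (note this makes the column plain strings again):
df['Date'] = df['Date'].dt.strftime('%d-%m-%y')
print(df)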
I'm using pandas and have 3 columns of data, containing a day, a month, and a year. I want to input my numbers into a loop so that I can create a new column in my dataframe that shows the week number. My data also starts from October 1, and I need this to be my first week.
I've tried using this code:
for (a, b, c) in zip(year, month, day):
    print(datetime.date(a, b, c).strftime("%U"))
But this assumes that the first week is in January.
I'm also unsure how to assign what's in the loop to a new column. I was just printing what was in the for loop to test it out.
Thanks
I think this is what you want:
import pandas as pd
import datetime

# define a function to get the week number according to January
get_week_number = lambda y, m, d: int(datetime.date(y, m, d).strftime('%U'))

# get the week number for October 1st; the offset
offset = get_week_number(2021, 10, 1)

def compute_week_number(year, month, day):
    """
    Function that computes the week number with an offset
    October 1st becomes week number 1
    """
    return get_week_number(year, month, day) - offset + 1

df = pd.DataFrame({'year': [2021, 2021, 2021],
                   'month': [10, 10, 10],
                   'day': [1, 6, 29]})

df['week_number'] = df.apply(lambda x: compute_week_number(x['year'],
                                                           x['month'],
                                                           x['day']),
                             axis=1)
apply with axis=1 calls the function once per row of the DataFrame and returns the value of the new column we want to compute for that row.
I used the offset (the week number of October 1st) to compute the new week number according to what you asked for.
Week 39 becomes week 1, week 40 becomes week 2 and so on.
This gives:
year    month    day    week_number
2021    10       1      1
2021    10       6      2
2021    10       29     5
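For larger frames you can avoid the per-row apply. A vectorized sketch, assuming the same df and offset as above (pd.to_datetime can assemble datetimes from year/month/day columns):
dates = pd.to_datetime(df[['year', 'month', 'day']])
df['week_number'] = dates.dt.strftime('%U').astype(int) - offset + 1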
I have 2 datasets to work with:
ID Date Amount
1 2020-01-02 1000
1 2020-01-09 200
1 2020-01-08 400
And another dataset which tells the most frequent day of the week and the most frequent week of the month for each ID (there are multiple such IDs):
ID Pref_Day_Of_Week_A Pref_Week_Of_Month_A
1 3 2
For this ID, Thursday (dayofweek 3) was the most frequent day of the week and the 2nd week of the month was the most frequent week of the month.
I wish to find the sum of all the amounts that took place on the most frequent day of the week and in the most frequent week of the month, for all IDs (hence requiring groupby):
ID Amount_On_Pref_Day Amount_Pref_Week
1 1200 600
I would really appreciate it if anyone could help me calculate this dataframe using pandas. For reference, I have used this function to find the week of month for a given date:
# https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.

    Parameters:
        dates (pd.Series): Series of dates.

    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day - 1 + firstday_in_month.dt.weekday) // 7 + 1
The idea is to filter only the rows whose dayofweek and week match, aggregate the sums, and finally join the two results together with concat:
# https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.

    Parameters:
        dates (pd.Series): Series of dates.

    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day - 1 + firstday_in_month.dt.weekday) // 7 + 1

df.Date = pd.to_datetime(df.Date)
df['dayofweek'] = df.Date.dt.dayofweek
df['week'] = weekinmonth(df['Date'])

f = lambda x: x.mode().iat[0]
df1 = (df.groupby('ID', as_index=False).agg(Pref_Day_Of_Week_A=('dayofweek', f),
                                            Pref_Week_Of_Month_A=('week', f)))

s1 = df1.rename(columns={'Pref_Day_Of_Week_A': 'dayofweek'}).merge(df).groupby('ID')['Amount'].sum()
s2 = df1.rename(columns={'Pref_Week_Of_Month_A': 'week'}).merge(df).groupby('ID')['Amount'].sum()
df2 = pd.concat([s1, s2], axis=1, keys=('Amount_On_Pref_Day', 'Amount_Pref_Week'))
print(df2)
Amount_On_Pref_Day Amount_Pref_Week
ID
1 1200 600
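Since the question already has the preferences in a second dataset, a variant is to skip the mode() step and merge that table directly. A sketch, assuming the preference table is available as a DataFrame called df_pref (hypothetical name) and that df already has the dayofweek and week columns computed above:
df_pref = pd.DataFrame({'ID': [1],
                        'Pref_Day_Of_Week_A': [3],
                        'Pref_Week_Of_Month_A': [2]})

# keep only the rows whose dayofweek matches the ID's preferred day, then sum per ID
s1 = (df.merge(df_pref.rename(columns={'Pref_Day_Of_Week_A': 'dayofweek'})[['ID', 'dayofweek']])
        .groupby('ID')['Amount'].sum())
# same idea for the preferred week of the month
s2 = (df.merge(df_pref.rename(columns={'Pref_Week_Of_Month_A': 'week'})[['ID', 'week']])
        .groupby('ID')['Amount'].sum())
out = pd.concat([s1, s2], axis=1, keys=('Amount_On_Pref_Day', 'Amount_Pref_Week'))
print(out)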
I have the following table :
DayTime
1 days 19:55:00
134 days 15:34:00
How can I convert the DayTime to a fractional number of days? That is, the hours should be converted to days (divided by 24).
You can convert Timedeltas to numerical units of time by dividing by units of Timedelta. For instance,
import pandas as pd
df = pd.DataFrame({'DayTime':['1 days 19:55:00', '134 days 15:34:00']})
df['DayTime'] = pd.to_timedelta(df['DayTime'])
days = df['DayTime'] / pd.Timedelta(hours=24)
print(days)
yields
0 1.829861
1 134.648611
Name: DayTime, dtype: float64
Note that above I'm assuming that 1 day = 24 hours. That's not always exactly true. Some days are 24 hours + 1 leap second long.
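A compact alternative, assuming the column has already been converted with pd.to_timedelta as above, is the .dt.total_seconds() accessor:
days = df['DayTime'].dt.total_seconds() / 86400  # 86400 seconds per 24-hour day
print(days)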
Without using pandas, and in Python 2.7 (in Python 3, timedeltas can be divided directly):
import re
from datetime import timedelta

def full_days(day_time):
    # split e.g. '1 days 19:55:00' into its numeric parts
    d, h, m, s = map(int, re.split(r'\D+', day_time))
    delta = timedelta(hours=h, minutes=m, seconds=s)
    return d + delta.total_seconds() / timedelta(days=1).total_seconds()

print full_days('1 days 19:55:00')
print full_days('0 days 43:55:00')
print full_days('134 days 15:34:00')
Outputs:
1.82986111111
1.82986111111
134.648611111
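For completeness, since Python 3 timedeltas can be divided directly (as noted above), the same computation becomes a one-liner there:
from datetime import timedelta
delta = timedelta(days=1, hours=19, minutes=55)
print(delta / timedelta(days=1))  # about 1.829861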