Could anyone teach me how to calculate the time duration for each person? I'm not great at describing the question, but as the picture below shows, I want to take the latest date for each person and subtract the oldest date to get the duration. Thanks
I want to get the duration for each person.
If you need to subtract each date from the maximal date per customer, use GroupBy.transform with Series.sub, and then, if necessary, convert the timedeltas to days with Series.dt.days:
df['date'] = pd.to_datetime(df['date'])
# days from each row's date to the latest date for that customer
df['dur'] = df.groupby('customer')['date'].transform('max').sub(df['date']).dt.days
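A minimal, self-contained sketch of that approach with invented data (the customer and date column names follow the answer above):

import pandas as pd

df = pd.DataFrame({
    'customer': ['A', 'A', 'A', 'B', 'B'],
    'date': ['2021-01-01', '2021-01-05', '2021-01-10', '2021-03-01', '2021-03-04'],
})
df['date'] = pd.to_datetime(df['date'])

# latest date per customer, broadcast back to every row of that customer
latest = df.groupby('customer')['date'].transform('max')
df['dur'] = latest.sub(df['date']).dt.days
print(df)
#   customer       date  dur
# 0        A 2021-01-01    9
# 1        A 2021-01-05    5
# 2        A 2021-01-10    0
# 3        B 2021-03-01    3
# 4        B 2021-03-04    0

If a single duration per person (latest minus oldest) is what you are after instead, something like df.groupby('customer')['date'].agg(lambda s: (s.max() - s.min()).days) would be one way to get it.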
Is it possible to use .resample() to take the last observation in each month of a weekly time series, in order to create a monthly time series from the weekly one? I don't want to sum or average anything, just take the last observation of each month.
Thank you.
Based on what you want and what the documentation describes, you could try the following:
# take the last observation that falls in each calendar month
data[COLUMN].resample('M').last()
Try it out and update us!
References
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
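As a rough, self-contained illustration of that approach (the weekly values below are invented):

import pandas as pd

# a weekly series: one observation every Sunday
idx = pd.date_range('2021-01-03', periods=10, freq='W')
weekly = pd.Series(range(10), index=idx)

# keep only the last weekly observation of each calendar month
monthly = weekly.resample('M').last()
print(monthly)
# 2021-01-31    4
# 2021-02-28    8
# 2021-03-31    9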
Is the 'week' field a week-of-year number, a date, or something else?
If it's a datetime column (i.e. already converted with pandas), use .dt.to_period('M') on your current date column to create a new 'month' column, then get the max date within each month to find the date to sample (if you only want the LAST date in each month?)
Like max(df['MyDateField'])
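A minimal sketch of that idea, assuming the date column is called MyDateField as in the comment above (the data is invented):

import pandas as pd

df = pd.DataFrame({
    'MyDateField': pd.to_datetime(['2015-07-03', '2015-07-17', '2015-07-31',
                                   '2015-08-07', '2015-08-28']),
    'value': [1, 2, 3, 4, 5],
})

# label each row with its calendar month
df['month'] = df['MyDateField'].dt.to_period('M')

# the last (max) date present in each month -- the dates to sample
last_per_month = df.groupby('month')['MyDateField'].max()
print(last_per_month)
# 2015-07   2015-07-31
# 2015-08   2015-08-28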
Someone else is posting as I type this, so may have a better answer :)
So I have sales data that I'm trying to analyze. I have datetime data ["Order Date Time"] and I'd like to see the most common hours for sales, but more importantly I'd like to see which minutes have NO sales.
I have been spinning my wheels for a while and I can't get my brain around a solution. Any help is greatly appreciated.
I import the data:
import pandas as pd
df = pd.read_excel('Audit Period.xlsx')
print(df)
I clean up the data:
# Drop rows where "Order Date Time" is null
time_df = df[df["Order Date Time"].notnull()]
# Keep only the "Order Date Time" column and reset the index so it is sequential again
time_df = time_df[["Order Date Time"]].reset_index(drop=True)
# Look at the first 10 rows
time_df.head(10)
I convert to datetime and I look at the month totals:
# Convert "Order Date Time" to datetime
time_df = time_df.copy()
time_df["Order Date Time"] = pd.to_datetime(time_df["Order Date Time"])
time_df = time_df.set_index(time_df["Order Date Time"])
# Count orders per calendar month
grouped = time_df.resample("M").count()
time_df = pd.DataFrame({"count": grouped.values.flatten()}, index=grouped.index)
time_df.head(10)
I try to group by hour, but that gives me totals per calendar day and hour rather than totals per hour of day (e.g. every order that ever happened at noon):
# Count orders in 2-hour bins (still per calendar date, not per hour of day)
grouped = time_df.resample("2H").count()
time_df = pd.DataFrame({"count": grouped.values.flatten()}, index=grouped.index)
time_df.head(10)
And that is where I'm stuck. I'm trying to integrate the below suggestions but can't quite get a grasp on them yet. Any help would be appreciated.
Not sure if this is the most brilliant solution, but I would start by generating a dataframe at the level of detail I wanted, whether that is 1-hour intervals, 5-minute intervals, etc. Then in your df with all the actual data, you could do your grouping as you currently are doing it above. Once it is grouped, join the two. That way you have one dataframe that includes empty rows associated with time spans with no records. The tricky part will just be making sure you have your date and time formatted in a way that it will match and join properly.
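A minimal sketch of that idea, assuming the orders sit in an "Order Date Time" column as above and using minute-level detail with invented timestamps:

import pandas as pd

# invented example orders; in the real data this would come from df["Order Date Time"]
orders = pd.DataFrame({
    "Order Date Time": pd.to_datetime([
        "2021-06-01 12:00:30", "2021-06-01 12:00:45",
        "2021-06-01 12:02:10", "2021-06-01 12:05:01",
    ])
})

# 1) a frame at the desired level of detail: one row per minute
full_range = pd.DataFrame(index=pd.date_range("2021-06-01 12:00", "2021-06-01 12:05", freq="min"))

# 2) group the actual data the same way: count orders per minute
per_minute = orders.set_index("Order Date Time").resample("min").size().rename("count")

# 3) join the two; after filling gaps with 0, minutes with no sales are easy to pull out
joined = full_range.join(per_minute).fillna(0)
no_sales = joined[joined["count"] == 0]
print(no_sales)   # here: 12:01, 12:03 and 12:04 have no sales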
Two event columns: dtb (start time) and dte (stop time).
The image shows these two columns. I want to group by day and, for each day, get min(time) as the start of the event and max(time) as the stop of the event. I want something like this.
I will do my best to answer it as I understood it.
Supposing your columns dtb and dte are in datetime format:
df['date'] = df.dtb.dt.date   # calendar day of the event
df['dtb'] = df.dtb.dt.time    # keep only the time part
df['dte'] = df.dte.dt.time

# per day: earliest start time and latest stop time
result = df.groupby('date').agg({'dtb': 'min',
                                 'dte': 'max'})
print(result)
What I did was create a new column with the date, reformat the dtb and dte columns to keep only the time, and then group by the date, taking the min of dtb (start) and the max of dte (stop).
You can group directly per day, or even per week, using the following syntax:
dg_bydate = df.groupby(pd.Grouper(key='dtb', freq='1D')).agg({'dte': ['min', 'max']})
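A small self-contained illustration of the Grouper approach, with invented dtb/dte timestamps (this assumes dtb is still a full datetime, not just a time):

import pandas as pd

df = pd.DataFrame({
    'dtb': pd.to_datetime(['2021-05-01 08:15', '2021-05-01 09:40',
                           '2021-05-02 07:50', '2021-05-02 10:05']),
    'dte': pd.to_datetime(['2021-05-01 16:30', '2021-05-01 18:10',
                           '2021-05-02 15:45', '2021-05-02 17:20']),
})

# one daily bucket per calendar day of dtb, then the earliest and latest stop times
dg_bydate = df.groupby(pd.Grouper(key='dtb', freq='1D')).agg({'dte': ['min', 'max']})
print(dg_bydate)

If you want the earliest dtb and the latest dte per day instead (as in the question), grouping by df['dtb'].dt.date with named aggregation, e.g. df.groupby(df['dtb'].dt.date).agg(start=('dtb', 'min'), stop=('dte', 'max')), is one way to keep both columns available for aggregation.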
I am working on a DataFrame which has two columns, checkin date and checkout date. Initially those two columns were strings. I have changed them to datetime objects and calculated the difference in days between them using checkout date - checkin date. Now I want to filter rows based on the stay duration. Could anyone help me do that, since the difference also comes out as a timedelta object?
Here is my code so far:
New_Train['checkin_date'] = pd.to_datetime(New_Train['checkin_date'])
New_Train['checkout_date'] = pd.to_datetime(New_Train['checkout_date'])
print(New_Train.info())
# Now checkin date & checkout date have been changed to datetime
New_Train['Checkin_Year'] = New_Train['checkin_date'].dt.year
New_Train['Checkin_Month'] = New_Train['checkin_date'].dt.month
New_Train['Stay_Duration'] = New_Train['checkout_date'] - New_Train['checkin_date']
Thanks in Advance
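A minimal sketch of how the filtering step might look, with invented data and a made-up threshold of 3 days; the timedelta can be compared directly or converted to whole days first:

import pandas as pd

New_Train = pd.DataFrame({
    'checkin_date': ['2021-01-01', '2021-01-10', '2021-02-05'],
    'checkout_date': ['2021-01-04', '2021-01-11', '2021-02-12'],
})
New_Train['checkin_date'] = pd.to_datetime(New_Train['checkin_date'])
New_Train['checkout_date'] = pd.to_datetime(New_Train['checkout_date'])
New_Train['Stay_Duration'] = New_Train['checkout_date'] - New_Train['checkin_date']

# Option 1: compare against a Timedelta directly
long_stays = New_Train[New_Train['Stay_Duration'] >= pd.Timedelta(days=3)]

# Option 2: convert the timedelta to an integer number of days first
New_Train['Stay_Days'] = New_Train['Stay_Duration'].dt.days
long_stays = New_Train[New_Train['Stay_Days'] >= 3]
print(long_stays)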
I want to divide the date range from 1st July to 1st August on a weekly basis, but I want it to start from the 1st day of the month.
I am using pd.date_range('2015-07-01', '2015-08-01', freq='W')
But I am getting
DatetimeIndex(['2015-07-05', '2015-07-12', '2015-07-19', '2015-07-26'], dtype='datetime64[ns]', freq='W-SUN')
I want this to be done from 2015-07-01. I know I can use timedelta or find the start day of the month and use W-WED. But is there any other shortcut to do the same using date_range of pandas?
I have checked http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases, but could not come up with anything useful.
Any help is appreciated. Thanks in advance.
I would suggest using a frequency of 7 days instead of a week, so that you start at the first day of the month rather than the first day of the week:
pd.date_range('2015-07-01', '2015-08-01', freq='7d')
EDIT
To clarify, it is not strictly the first day of the month, but the first day you provide. In your example those two are the same.
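For reference, this is what the 7-day frequency produces; the range is anchored at the start date you pass in:

import pandas as pd

idx = pd.date_range('2015-07-01', '2015-08-01', freq='7d')
print(idx)
# DatetimeIndex(['2015-07-01', '2015-07-08', '2015-07-15', '2015-07-22',
#                '2015-07-29'],
#               dtype='datetime64[ns]', freq='7D')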