I have a dataframe like in example below:
Timestamp ComponentName Utilization
18.10.2020-19:07.10 A Available
19.10.2020-21:07.10 A Available
19.10.2020-19:07.10 A In use
22.10.2020-19:07.10 A In use
25.10.2020-19:07.10 A In use
And desired output should be:
ComponentName Total_Inuse_time Total_Available_time
A 6 days 1 day 2 hours
Basically, I want the total in-use time and the total available time for each component.
I have tried grouping by component names and aggregating with sum on Time differences but could not get the desired result.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%d.%m.%Y-%H:%M.%S')
df['Timestamp'] = df.groupby(['ComponentName', 'Utilization'])['Timestamp'].diff().fillna(pd.Timedelta(0))
sums = df.groupby(['ComponentName', 'Utilization'])['Timestamp'].sum()
Output:
>>> sums
ComponentName Utilization
A Available 1 days 02:00:00
In use 6 days 00:00:00
Name: Timestamp, dtype: timedelta64[ns]
>>> sums['A']
Utilization
Available 1 days 02:00:00
In use 6 days 00:00:00
Name: Timestamp, dtype: timedelta64[ns]
>>> sums['A']['Available']
Timedelta('1 days 02:00:00')
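For reference, here is a self-contained sketch of the same approach that rebuilds the sample data from the question. It assumes the timestamps are laid out as day.month.year-hour:minute.second, and it keeps the differences in a separate column (the name Duration is made up for this example) so the original timestamps are not overwritten. The final unstack only reshapes the result into one column per state, as in the desired output.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Timestamp": ["18.10.2020-19:07.10", "19.10.2020-21:07.10",
                  "19.10.2020-19:07.10", "22.10.2020-19:07.10",
                  "25.10.2020-19:07.10"],
    "ComponentName": ["A"] * 5,
    "Utilization": ["Available", "Available", "In use", "In use", "In use"],
})

# Assumed layout: day.month.year-hour:minute.second
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d.%m.%Y-%H:%M.%S")

# Gap between consecutive rows within each (component, state) group;
# the first row of every group has no predecessor, so it gets a zero gap
df["Duration"] = (
    df.groupby(["ComponentName", "Utilization"])["Timestamp"]
      .diff()
      .fillna(pd.Timedelta(0))
)

# Sum the gaps and spread the states into columns
totals = (
    df.groupby(["ComponentName", "Utilization"])["Duration"]
      .sum()
      .unstack("Utilization")
)
print(totals)
```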
I'm using the function resample to change the daily data to be a monthly data of a pandas dataframe. Reading the documentation I found that I could define the rule='M' or rule='MS'. The first is "calendar month end" and the second is "calendar month begin". What is the difference between the two?
It doesn't set the same date as index of the resampled groups.
Here is an example:
date = pd.Series([0,1,2],
index=pd.to_datetime(['2022-01-15',
'2022-01-20',
'2022-02-15']))
2022-01-15 0
2022-01-20 1
2022-02-15 2
dtype: int64
# resampling MS:
date.resample('MS').mean()
2022-01-01 0.5
2022-02-01 2.0
Freq: MS, dtype: float64
# resampling M:
date.resample('M').mean()
2022-01-31 0.5
2022-02-28 2.0
Freq: M, dtype: float64
Note the difference in the dates of the index. For 'MS' the dates of the groups are always the first of the month, for 'M' the last day.
Date Precipitation
20010101 0
20010102 10
20010103 5
20010104 3
20010105 0
...
20011231 0
I have a dataset showing precipitation (in) for each day in the year 2001. The date variable is in YYYYMMDD format. I want to calculate how many times it precipitated each month. In other words, I need the number of days in each month on which the precipitation value is not 0.
I am a beginner Python learner and don't quite know how to tell the program to output the count for each month without having to do it individually.
The code I have below does not work because I’m not sure how to tell the program the Date variable is in YYYYMMDD format.
Precip_Count= Date[(Precipitation !=0)]
Is there a way to do this by only using NumPy?
First, convert the Date column to datetime using pd.to_datetime, specifying the format of your date strings (see the datetime format codes). Then use Series.ne to flag the non-zero values, group by month, and take the sum using GroupBy.sum:
df['Date'] = pd.to_datetime(df['Date'], format="%Y%m%d")
df['Precipitation'].ne(0).groupby(df.Date.dt.month).sum()
Date
1 3
...
12 0
Name: Precipitation, dtype: int64
Or use Series.dt.to_period:
df['Precipitation'].ne(0).groupby(df.Date.dt.to_period('M')).sum()
Date
2001-01 3
...
2001-12 0
Freq: M, Name: Precipitation, dtype: int64
If you want the index as a DatetimeIndex, set Date as the index and use pd.Grouper:
df.set_index('Date')['Precipitation'].ne(0).groupby(pd.Grouper(freq='M')).sum()
Date
2001-01-31 3
...
2001-12-31 0
Freq: M, Name: Precipitation, dtype: int64
The output is calculated from df mentioned in the question.
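As for the NumPy-only part of the question, a minimal sketch is possible if the dates are stored as integers in YYYYMMDD form: the month can be pulled out with integer arithmetic and the wet days counted with np.bincount. The arrays below are illustrative and not the real dataset (the February row is made up), and the variable names are assumptions.

```python
import numpy as np

# Illustrative data only: dates as YYYYMMDD integers, precipitation per day
dates = np.array([20010101, 20010102, 20010103, 20010104,
                  20010105, 20010210, 20011231])
precip = np.array([0, 10, 5, 3, 0, 2, 0])

# 20010103 // 100 -> 200101, and 200101 % 100 -> 1 (the month)
months = (dates // 100) % 100

# Keep only days with non-zero precipitation, then count them per month.
# minlength=13 guarantees slots 0..12; slot 0 is unused and dropped.
wet_days_per_month = np.bincount(months[precip != 0], minlength=13)[1:]

print(wet_days_per_month)  # position 0 is January, position 11 is December
```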
I'm trying to calculate the difference between two datetime columns (dtype = datetime64[ns]) in pandas. I can successfully calculate the delta in hours, but I want the result to be days.
Example
foo_df = pd.DataFrame({'date_1': ['2019-08-07 09:25:07'],
'date_2': ['2019-08-08 01:01:00']}).astype('datetime64[ns]')
foo_df['delta'] = foo_df['date_2'] - foo_df['date_1']
result
date_1 date_2 delta
0 2019-08-07 09:25:07 2019-08-08 01:01:00 15:35:53
Desired Result
date_1 date_2 delta
0 2019-08-07 09:25:07 2019-08-08 01:01:00 1
NOTE: The delta should be 1 because date_2 is the next day. I only need to know whether the calendar day is different. I can do this if I convert the date columns to strings, but ideally I'd like to avoid that, since this should be possible with dtype datetime64[ns].
Subtract the dates, then get the number of days from the Timedelta:
foo_df['delta'] = (foo_df.date_2.dt.date - foo_df.date_1.dt.date).dt.days
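Depending on the pandas version, subtracting two .dt.date columns can produce an object-dtype result on which the .dt accessor is unavailable. A variant that stays in datetime64 throughout is to truncate both columns to midnight with Series.dt.normalize before subtracting. This is a sketch of that alternative, not the answer's original code:

```python
import pandas as pd

foo_df = pd.DataFrame({
    "date_1": pd.to_datetime(["2019-08-07 09:25:07"]),
    "date_2": pd.to_datetime(["2019-08-08 01:01:00"]),
})

# normalize() sets the time of day to 00:00:00, so the subtraction
# counts calendar-day boundaries instead of elapsed 24-hour periods
foo_df["delta"] = (
    foo_df["date_2"].dt.normalize() - foo_df["date_1"].dt.normalize()
).dt.days

print(foo_df)
```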
I want to calculate the difference in days between a pandas date series -
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
and today's date.
I tried but could not come up with a logical solution.
Please help me with the code. I am new to Python, and I run into a lot of syntax errors whenever I try to apply a function.
You could do something like
# generate time data
data = pd.to_datetime(pd.Series(["2018-09-1", "2019-01-25", "2018-10-10"]))
pd.to_datetime("now") > data
returns:
0 False
1 True
2 False
you could then use that to select the data
data[pd.to_datetime("now") > data]
Hope it helps.
Edit: I misread it but you can easily alter this example to calculate the difference:
data - pd.to_datetime("now")
returns:
0 -122 days +13:10:37.489823
1 24 days 13:10:37.489823
2 -83 days +13:10:37.489823
dtype: timedelta64[ns]
You can try the following:
>>> from datetime import datetime
>>> df
col1
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
Make sure to convert the column to datetime with pd.to_datetime:
>>> df['col1'] = pd.to_datetime(df['col1'], infer_datetime_format=True)
Set the current datetime in order to get the difference:
>>> curr_time = pd.to_datetime("now")
Now get the difference as follows:
>>> df['col1'] - curr_time
0 -2145 days +07:48:48.736939
1 -2163 days +07:48:48.736939
2 -2140 days +07:48:48.736939
3 -2139 days +07:48:48.736939
4 -2132 days +07:48:48.736939
5 -2119 days +07:48:48.736939
6 -2115 days +07:48:48.736939
7 -2112 days +07:48:48.736939
Name: col1, dtype: timedelta64[ns]
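If whole days are wanted rather than a timedelta with a time-of-day component, one option is to truncate "now" to midnight and read off .dt.days. The sketch below wraps this in a small helper; the function name days_since and its today parameter are made up for this example, and the parameter exists only so the result can be reproduced with a fixed reference date.

```python
import pandas as pd

def days_since(dates, today=None):
    """Whole days from each date up to `today` (defaults to the current date)."""
    dates = pd.to_datetime(dates)
    if today is None:
        today = pd.Timestamp.now()
    # Drop the time of day so only calendar days are compared
    today = pd.Timestamp(today).normalize()
    return (today - dates).dt.days

s = pd.Series(["2013-02-16", "2013-01-29", "2013-03-21"])

# Fixed reference date, so the output does not depend on when this runs
print(days_since(s, today="2013-03-22"))

# Against the real current date
print(days_since(s))
```

Swapping the operands (dates - today) gives negative numbers for past dates, as in the outputs above.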
With NumPy you can solve it as described in the linked article "difference-two-dates-days-weeks-months-years-pandas-python-2". Bottom line:
df['diff_days'] = df['First dates column'] - df['Second Date column']
# use 'D' for days or 'W' for weeks; 'M' and 'Y' are not fixed-length units and may be rejected by recent NumPy/pandas versions
df['diff_days']=df['diff_days']/np.timedelta64(1,'D')
print(df)
If you want the days as int rather than float, use:
df['diff_days']=df['diff_days']//np.timedelta64(1,'D')
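A small runnable illustration of the difference between the two divisions, using made-up column names (start, end) and made-up dates:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-08 00:00", "2021-01-01 06:00"]),
    "end":   pd.to_datetime(["2021-01-10 12:00", "2021-01-01 18:00"]),
})

elapsed = df["end"] - df["start"]  # timedelta Series

# True division keeps the fractional part of a day
df["days_float"] = elapsed / np.timedelta64(1, "D")

# Floor division keeps only the completed days
df["days_whole"] = elapsed // np.timedelta64(1, "D")

print(df)
```

For non-negative differences, elapsed.dt.days gives the same whole-day count without NumPy.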
From the pandas docs under Converting To Timestamps you will find:
"Converting to Timestamps To convert a Series or list-like object of date-like objects e.g. strings, epochs, or a mixture, you can use the to_datetime function"
I haven't used pandas before, but this suggests that your pandas date series (a list-like object) is iterable and that each of its elements can be converted with pandas' to_datetime function.
Assuming my assumptions are correct, the following function takes such a series and returns a list of timedeltas (objects representing the difference between two datetimes).
from datetime import datetime
import pandas as pd
def convert(pandas_series):
    # get the current date and time
    now = datetime.now()
    # Use a list comprehension and pd.to_datetime to calculate the timedeltas.
    return [now - pd.to_datetime(element) for element in pandas_series]
# assuming 'some_pandas_series' is a list-like pandas Series object
list_of_timedeltas = convert(some_pandas_series)
I have a column of datetime stamps. I need a column of total minutes elapsed from the first to the last value.
I have:
>>> df = pd.DataFrame({'timestamp': [
... pd.Timestamp('2001-01-01 06:00:00'),
... pd.Timestamp('2001-01-01 06:01:00'),
... pd.Timestamp('2001-01-01 06:15:00')
... ]})
>>> df
timestamp
0 2001-01-01 06:00:00
1 2001-01-01 06:01:00
2 2001-01-01 06:15:00
I need to add a column that gives the running total:
timestamp minutes
1-1-2001 6:00 0
1-1-2001 6:01 1
1-1-2001 6:15 15
1-1-2001 7:00 60
1-1-2001 7:35 95
Having a hard time manipulating the datetime Series to allow me to total up the timestamp.
I've looked at a lot of posts and can't find anything that does what I'm trying to do. Would appreciate any ideas!
You can chain a few methods together:
>>> df['minutes'] = df['timestamp'].diff().fillna(pd.Timedelta(0)).dt.total_seconds()\
... .cumsum().div(60).astype(int)
>>> df
timestamp minutes
0 2001-01-01 06:00:00 0
1 2001-01-01 06:01:00 1
2 2001-01-01 06:15:00 15
Creation:
>>> df = pd.DataFrame({'timestamp': [
... pd.Timestamp('2001-01-01 06:00:00'),
... pd.Timestamp('2001-01-01 06:01:00'),
... pd.Timestamp('2001-01-01 06:15:00')
... ]})
Walkthrough
The easiest way to break this down is to separate each intermediate method call.
df['timestamp'].diff() gives you a Series of timedeltas (the pandas equivalent of Python's datetime.timedelta): the difference between each value and the one before it.
>>> df['timestamp'].diff()
0 NaT
1 00:01:00
2 00:14:00
Name: timestamp, dtype: timedelta64[ns]
This contains an N/A value (NaT/not a time) because there's nothing to subtract from the first value. You can simply fill it with the zero-value for timedeltas:
>>> df['timestamp'].diff().fillna(pd.Timedelta(0))
0 00:00:00
1 00:01:00
2 00:14:00
Name: timestamp, dtype: timedelta64[ns]
Now you need to get an actual integer (minutes) from these objects. In .dt.total_seconds(), .dt is an "accessor" that is a way of accessing a bunch of methods that let you work with datetime-like data:
>>> df['timestamp'].diff().fillna(pd.Timedelta(0)).dt.total_seconds()
0 0.0
1 60.0
2 840.0
Name: timestamp, dtype: float64
The result is the incremental second-change as a float. You need this on a cumulative basis, in minutes, and as an integer. That's what the final 3 operations do:
>>> df['timestamp'].diff().fillna(pd.Timedelta(0)).dt.total_seconds().cumsum().div(60).astype(int)
0 0
1 1
2 15
Name: timestamp, dtype: int64
Note that astype(int) truncates toward zero rather than rounding, so any seconds left over after dividing by 60 are dropped.
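To see what astype(int) does with leftover seconds, here is a short sketch with timestamps that do not fall on whole minutes (the data is made up). It also uses a shortcut that should give the same running total as the diff/cumsum chain: subtracting the first timestamp from every row.

```python
import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2001-01-01 06:00:00",
    "2001-01-01 06:01:30",
    "2001-01-01 06:15:45",
])})

# Elapsed minutes since the first row, as floats: 0.0, 1.5, 15.75
elapsed_min = (df["timestamp"] - df["timestamp"].iloc[0]).dt.total_seconds() / 60

# astype(int) drops the fractional part
df["minutes_truncated"] = elapsed_min.astype(int)

# Round first if the nearest whole minute is wanted instead
df["minutes_rounded"] = elapsed_min.round().astype(int)

print(df)
```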