Suppose I have data like this:
Date        Time   Energy_produced
01.01.2016  00:00  500
01.01.2016  00:15  580
01.01.2016  00:30  600
01.01.2016  00:45  620
01.01.2016  01:00  580
01.01.2016  01:15  520
01.01.2016  01:30  590
01.01.2016  01:45  570
01.01.2016  02:00  540
Now I want to sum the energy produced per hour, like this:
Date        Hour   Energy_produced_per_hour
01.01.2016  00:00  2300
01.01.2016  01:00  2260
How can I sum like this?
If you want to keep Date/Time as strings, you could use:
# take "HH:" from the Time string and append "00" to label each hour bucket
(df.groupby(['Date', df['Time'].str[:3].rename('Hour') + '00'])
   ['Energy_produced'].sum()
   .reset_index()
)
Output:
Date Hour Energy_produced
0 01.01.2016 00:00 2300
1 01.01.2016 01:00 2260
2 01.01.2016 02:00 540
NB. You can also get the second group with: df['Time'].str.replace(r'\d{2}$', '00', regex=True).rename('Hour')
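If you prefer working with real datetimes instead, here is a minimal sketch, assuming Date and Time are strings formatted as in the sample above:
import pandas as pd
# parse Date + Time into one datetime column, then floor each stamp to the hour
stamp = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d.%m.%Y %H:%M')
out = (df.assign(Hour=stamp.dt.floor('h'))
         .groupby('Hour', as_index=False)['Energy_produced'].sum())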
I have data, and I want to add a column that shows the running (moving) average of the val column within each day.
df
timestamp         val  val_mean
2022-10-10 00:00   10        10
2022-10-10 00:01   20        15
..
2022-10-10 23:59   50        23
2022-10-11 00:00   80        80
How can I achieve this?
Looks like you want a grouped, expanding mean:
# one group per calendar day (normalize strips the time-of-day part);
# the expanding mean is the running mean within each day
group = pd.to_datetime(df['timestamp']).dt.normalize()
df['val_mean'] = df.groupby(group)['val'].expanding().mean().droplevel(0)
Output:
timestamp val val_mean
0 2022-10-10 00:00 10 10.000000
1 2022-10-10 00:01 20 15.000000
2 2022-10-10 23:59 50 26.666667
3 2022-10-11 00:00 80 80.000000
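An equivalent that avoids the droplevel step, using transform (a sketch under the same assumptions):
group = pd.to_datetime(df['timestamp']).dt.normalize()
df['val_mean'] = df.groupby(group)['val'].transform(lambda s: s.expanding().mean())
transform returns a result aligned to df's original index, so no index manipulation is needed.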
First of all, thank you for your help.
I have two dataframes row indexed by date (DD-MM-YYYY HH:MM) as follows:
DF1
date temp wind
0 31-12-2002 23:00 12.3 80
1 01-01-2004 00:00 15.2 NaN
2 01-01-2004 01:00 18.4 NaN
........
DF2
date temp wind
0 31-12-2002 23:00 14.5 86
1 01-01-2003 00:00 28.7 98
2 01-01-2003 01:00 26.7 88
........
n 01-01-2004 00:00 34.5 23
m 01-01-2004 01:00 35.7 NaN
MergedDF
date temp wind
0 31-12-2002 23:00 12.3 80
1 01-01-2003 00:00 28.7 98
2 01-01-2003 01:00 26.7 88
........
n 01-01-2004 00:00 15.2 23
m 01-01-2004 01:00 18.4 NaN
In DF1 a whole year (2003) is missing, and there are also some NaN values in the remaining years.
Basically I want to merge both dataframes, adding the missing year and replacing NaN values wherever that information exists in DF2.
Could someone help me? I don't know how to implement this in Python/pandas.
MergedDF = pd.concat([df1, df2]).groupby('date', as_index=False).first()
The as_index=False option of groupby keeps date as a regular column in the aggregated output.
.first() will keep the first non-null value for each date; since df1's rows come first in the concatenation, its values take priority over df2's.
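If date is unique within each frame, combine_first expresses the same intent: it fills df1's NaN values from df2 and adds the rows only present in df2. A sketch; it aligns on the index, so date must be set as the index first, and the result comes back sorted by that index:
MergedDF = (df1.set_index('date')
               .combine_first(df2.set_index('date'))
               .reset_index())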
I would like to convert the following time format, which is located in a pandas DataFrame column:
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
1600
1700
1800
1900
2000
2100
2200
2300
2400
I would like to transform the previous time format into a standard HH:MM time format, as follows:
01:00
02:00
03:00
...
15:00
16:00
...
22:00
23:00
00:00
How can I do it in Python?
Thank you in advance.
This will give you a df with a datetime64[ns] column and an HH:MM string column:
import pandas as pd
df = pd.read_csv('hm.txt', sep=r"[ ]{2,}", engine='python', header=None, names=['pre'])
# drop the trailing "00" minutes; % 24 wraps 2400 around to hour 0
df['hour'] = df['pre'].astype(int) // 100 % 24
df['datetime_dtype'] = pd.to_datetime(df['hour'].astype(str), format='%H')
df['str_dtype'] = df['datetime_dtype'].dt.strftime('%H:%M')
print(df.head(5))
   pre  hour      datetime_dtype str_dtype
0  100     1 1900-01-01 01:00:00     01:00
1  200     2 1900-01-01 02:00:00     02:00
2  300     3 1900-01-01 03:00:00     03:00
3  400     4 1900-01-01 04:00:00     04:00
4  500     5 1900-01-01 05:00:00     05:00
print(df.dtypes)
pre                        int64
hour                       int64
datetime_dtype    datetime64[ns]
str_dtype                 object
dtype: object
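If you only need the HH:MM strings, plain string handling avoids datetimes entirely (a sketch, assuming the values are integers between 100 and 2400):
s = df['pre'].astype(str).str.zfill(4)                    # '100' -> '0100'
hours = s.str[:2].astype(int) % 24                        # 24 wraps to 0
df['str_dtype'] = hours.map('{:02d}'.format) + ':' + s.str[2:]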
I am creating a dictionary for 7 days, from 22 January to 29 January, but there are two different timestamps within a single day in one column. The column name is Last Update. The values I want to combine are '1/25/2020 10:00 PM' and '1/25/2020 12:00 PM', both in the same column. 25 January is a Saturday, so I want to combine them together as Saturday.
For context, the column looks like this:
Last Update
0 1/22/2020 12:00
1 1/22/2020 12:00
2 1/22/2020 12:00
3 1/22/2020 12:00
4 1/22/2020 12:00
...
363 1/29/2020 21:00
364 1/29/2020 21:00
365 1/29/2020 21:00
366 1/29/2020 21:00
367 1/29/2020 21:00
This is how far I got:
day_map = {'1/22/2020 12:00': 'Wednesday', '1/23/20 12:00 PM': 'Thursday',
           '1/24/2020 12:00 PM': 'Friday', ...}
You just need to convert the column to datetime and use the .dt accessor (note that dt.weekday_name was removed in pandas 1.0; dt.day_name() is the current method). In this case:
df["Last Update"] = pd.to_datetime(df["Last Update"])
df["Last Update"].dt.day_name()
# returns
0 Wednesday
1 Wednesday
2 Wednesday
3 Wednesday
4 Wednesday
Name: Last Update, dtype: object
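To then combine all rows that fall on the same weekday, you can group on day_name() directly (a sketch; swap size() for whatever aggregation your data needs):
df["Last Update"] = pd.to_datetime(df["Last Update"])
by_day = df.groupby(df["Last Update"].dt.day_name())
print(by_day.size())   # row count per weekday, e.g. Saturday covers both the 10:00 PM and 12:00 PM rows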
I have a dataframe indexed by type then date, and I would like to compute, within each type, the difference in hours between each row's start_time and the previous row's end_time:
type date start_time end_time code
A 01/01/2018 01/01/2018 9:00 01/01/2018 14:00 525
01/02/2018 01/02/2018 5:00 01/02/2018 17:00 524
01/04/2018 01/04/2018 8:00 01/04/2018 10:00 528
B 01/01/2018 01/01/2018 5:00 01/01/2018 14:00 525
01/04/2018 01/04/2018 2:00 01/04/2018 17:00 524
01/05/2018 01/05/2018 7:00 01/05/2018 10:00 528
I would like to get the resulting table with a new column, interval:
type date interval
A 01/01/2018 -
01/02/2018 15
01/04/2018 39
B 01/01/2018 -
01/04/2018 60
01/05/2018 14
The interval column is in hours.
You can convert start_time and end_time to datetime format, then group by the first index level and subtract the previous row's end_time from each row's start_time. To convert the result to hours, divide by pd.Timedelta('1 hour'):
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
df['interval'] = (df.groupby(level=0, sort=False)
                    .apply(lambda x: x.start_time - x.end_time.shift(1))
                  / pd.Timedelta('1 hour')).values
>>> df
start_time end_time code interval
type date
A 01/01/2018 2018-01-01 09:00:00 2018-01-01 14:00:00 525 NaN
01/02/2018 2018-01-02 05:00:00 2018-01-02 17:00:00 524 15.0
01/04/2018 2018-01-04 08:00:00 2018-01-04 10:00:00 528 39.0
B 01/01/2018 2018-01-01 05:00:00 2018-01-01 14:00:00 525 NaN
01/04/2018 2018-01-04 02:00:00 2018-01-04 17:00:00 524 60.0
01/05/2018 2018-01-05 07:00:00 2018-01-05 10:00:00 528 14.0
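A groupby + shift version avoids apply and reads a little more directly (a sketch under the same multi-index assumption):
# previous row's end_time within each type
prev_end = df.groupby(level=0, sort=False)['end_time'].shift()
df['interval'] = (df['start_time'] - prev_end) / pd.Timedelta('1 hour')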