Copy values from Fridays to the following Saturday in pandas - python

I have 8760 hours worth of data and have added a datetime index against it. What I want to do is replace all the values that are on Saturdays with the values from the previous Friday.
df = pd.DataFrame(hourly_data, columns=['value'])
df.index = pd.date_range('2015-01-01','2016-01-01', freq='H')[:8760]
df['weekday'] = df.index.weekday
So the df format is as follows:
value weekday
2015-01-02 00:00:00 21 4
2015-01-02 01:00:00 21 4
2015-01-02 02:00:00 21 4
...
2015-01-03 00:00:00 12 5
2015-01-03 01:00:00 12 5
2015-01-03 02:00:00 12 5
And what I want to get out is:
value weekday
2015-01-02 00:00:00 21 4
2015-01-02 01:00:00 21 4
2015-01-02 02:00:00 21 4
...
2015-01-03 00:00:00 21 5
2015-01-03 01:00:00 21 5
2015-01-03 02:00:00 21 5
But I've got no idea how to get there. Something to do with offset perhaps?

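You can use loc with a boolean mask to select the Saturday rows and assign the values shifted forward by 24 hours, so that each Saturday hour receives the value from the same hour on the previous Friday:
df.loc[df.index.weekday == 5, 'value'] = df['value'].shift(24)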
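A minimal sketch of the mask-and-shift approach, assuming a complete hourly index with no gaps (so that 24 rows always equal 24 hours). The generated values are a hypothetical stand-in for hourly_data. A positive shift of 24 moves each value 24 rows later, which is what lines a Saturday row up with the preceding Friday:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for hourly_data: one float per hour of 2015.
index = pd.date_range("2015-01-01", periods=8760, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": np.arange(8760, dtype=float)}, index=index)
df["weekday"] = df.index.weekday  # Monday=0, ..., Friday=4, Saturday=5

original = df["value"].copy()
is_saturday = df.index.weekday == 5

# shift(24) moves every value 24 rows (hours) later, so a Saturday row
# lines up with the value from the same hour on the preceding Friday.
df.loc[is_saturday, "value"] = original.shift(24)

# Inspect the Friday/Saturday boundary.
print(df.loc["2015-01-02 22:00":"2015-01-03 01:00"])
```

Restricting the assignment to the value column leaves the weekday column untouched, so Saturday rows keep weekday 5.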

How to use pandas Grouper to get sum of values within each hour

I have the following table:
Hora_Retiro count_uses
0 00:00:18 1
1 00:00:34 1
2 00:02:27 1
3 00:03:13 1
4 00:06:45 1
... ... ...
748700 23:58:47 1
748701 23:58:49 1
748702 23:59:11 1
748703 23:59:47 1
748704 23:59:56 1
I want to group all values within each hour, so I can see the total number of uses per hour (00:00:00 - 23:00:00).
I have the following code, where the Hora_Retiro column is of timedelta64[ns] type:
hora_pico_aug = hora_pico.groupby(pd.Grouper(key="Hora_Retiro", freq='H')).count()
It gives the following output:
count_uses
Hora_Retiro
00:00:02 2566
01:00:02 602
02:00:02 295
03:00:02 5
04:00:02 10
05:00:02 4002
06:00:02 16075
07:00:02 39410
08:00:02 76272
09:00:02 56721
10:00:02 36036
11:00:02 32011
12:00:02 33725
13:00:02 41032
14:00:02 50747
15:00:02 50338
16:00:02 42347
17:00:02 54674
18:00:02 76056
19:00:02 57958
20:00:02 34286
21:00:02 22509
22:00:02 13894
23:00:02 7134
However, the index starts at 00:00:02, and I want it to start at 00:00:00 and then proceed in one-hour intervals. Something like this:
count_uses
Hora_Retiro
00:00:00 2565
01:00:00 603
02:00:00 295
03:00:00 5
04:00:00 10
05:00:00 4002
06:00:00 16075
07:00:00 39410
08:00:00 76272
09:00:00 56721
10:00:00 36036
11:00:00 32011
12:00:00 33725
13:00:00 41032
14:00:00 50747
15:00:00 50338
16:00:00 42347
17:00:00 54674
18:00:00 76056
19:00:00 57958
20:00:00 34286
21:00:00 22509
22:00:00 13894
23:00:00 7134
How can I make it start at 00:00:00?
Thanks for the help!
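You can create an hour column from the Hora_Retiro column. For a timedelta64[ns] column the hour is available through dt.components (dt.hour exists only for datetime columns):
df['hour'] = df['Hora_Retiro'].dt.components.hours
Then group by that hour column: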
gpby_df = df.groupby('hour')['count_uses'].sum().reset_index()
gpby_df['hour'] = pd.to_datetime(gpby_df['hour'], format='%H').dt.time
gpby_df.columns = ['Hora_Retiro', 'sum_count_uses']
gpby_df
gives
Hora_Retiro sum_count_uses
0 00:00:00 14
1 09:00:00 1
2 10:00:00 2
3 20:00:00 2
I assume that the Hora_Retiro column in your DataFrame is of
Timedelta type. It cannot be datetime, because then the date part
would be printed as well.
Indeed, your code creates groups starting at the minute / second
taken from the first row.
To group by "full hours":
floor each element in this column to the hour (rounding would move times past the half hour into the next hour),
then group (just by this floored value).
The code to do it is:
hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor('h')).count_uses.count()
However, decide first what you want to count:
rows or the values in the count_uses column.
In the second case, replace the count function with sum.
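A small sketch of grouping by full hours, assuming Hora_Retiro is a timedelta64[ns] column as stated in the question. The five rows are made-up sample data. It floors rather than rounds, so that 00:45:34 stays in the 00:00:00 bucket:

```python
import pandas as pd

# Made-up sample rows; the real table has about 750,000 of them.
hora_pico = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(
        ["00:00:18", "00:45:34", "01:02:27", "01:59:59", "23:59:56"]
    ),
    "count_uses": [1, 1, 1, 1, 1],
})

# Floor each timedelta to the start of its hour, then group on that key.
hour_start = hora_pico["Hora_Retiro"].dt.floor("h")
per_hour = hora_pico.groupby(hour_start)["count_uses"].sum()

print(per_hour)
```

Hours with no rows do not appear in the result; reindexing against a full range of 24 hourly timedeltas would be one way to add them with a count of zero.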

Converting to datetime in python

I have time data in a column and am trying to figure out how to get it into datetime format:
2000
2100
2300
2355
0
1
5
10
100
105
330
How can I convert these to a time format? The output should be:
20:00:00
21:00:00
23:00:00
23:55:00
00:00:00
00:01:00
00:05:00
00:10:00
01:00:00
01:05:00
03:30:00
I tried:
1. da = pd.to_datetime(330, format='%H%M')
output: '03:30:00'
2. d = str(datetime.timedelta(minutes=55))
output: '0:55:00'
But if I apply 1. to 100, it gives 10 hours, e.g.:
da = pd.to_datetime(100, format='%H%M')
output: '10:00:00'
Try:
pd.to_datetime(df['time'].astype(str).str.zfill(4), format = '%H%M').dt.time
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
IIUC, you can use str.rjust:
pd.to_datetime(s.astype(str).str.rjust(4,'0'),format='%H%M').dt.time
Out[41]:
0 20:00:00
1 21:00:00
2 23:00:00
3 23:55:00
4 00:00:00
5 00:01:00
6 00:05:00
7 00:10:00
8 01:00:00
9 01:05:00
10 03:30:00
Name: x, dtype: object
For novices, here is a more explicit version, with notes on the %H and %M format codes:
df['cname'] = pd.to_datetime(df['cname'].astype(str).str.zfill(4), format = '%H%M').dt.time
print(df['cname'])
# %H Hour (24-hour clock) as a zero-padded decimal number. 07
# %M Minute as a zero-padded decimal number. 06
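The reason 100 parses as 10:00 is that '%H%M' reads the string "100" as hour "10" followed by minute "0". Left-padding to four digits removes the ambiguity. A self-contained sketch using the question's sample values, where the column names time and clock are assumptions chosen for illustration:

```python
import pandas as pd

# Sample values from the question; "time" is an assumed column name.
df = pd.DataFrame({"time": [2000, 2100, 2300, 2355, 0, 1, 5, 10, 100, 105, 330]})

# Pad to four digits so the first two are always the hour: 100 -> "0100".
padded = df["time"].astype(str).str.zfill(4)

# Parse as hour+minute and keep only the time-of-day part.
df["clock"] = pd.to_datetime(padded, format="%H%M").dt.time

print(df)
```

The resulting column holds datetime.time objects (dtype object), not a datetime64 column, so datetime arithmetic on it needs a further conversion.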

Split dataframe to several dataframes

I have the following data:
Date X
...
2014-12-30 23:00:00 2
2014-12-30 23:15:00 0
2014-12-30 23:30:00 1
2014-12-30 23:45:00 1
2014-12-31 00:00:00 22
...
2015-01-01 00:00:00 0
2015-01-02 00:00:00 2
2015-01-03 00:00:00 2
2015-01-04 00:00:00 2
2015-01-04 00:00:00 2
2015-01-05 00:00:00 2
...
I want to split this time series (dataframe) into many time series (dataframe). I would like to have one time series for each Monday, one for all Tuesdays, Wednesdays ... etc.
How can I do that with pandas?
You can create a dictionary of DataFrames with groupby and day_name() (called weekday_name in pandas before 1.0):
dfs = dict(tuple(df.groupby(df['Date'].dt.day_name())))
#select by days
print (dfs['Friday'])
Date X
6 2015-01-02 2
print (dfs['Thursday'])
Date X
5 2015-01-01 0
Detail:
print (df['Date'].dt.day_name())
0 Tuesday
1 Tuesday
2 Tuesday
3 Tuesday
4 Wednesday
5 Thursday
6 Friday
7 Saturday
8 Sunday
9 Sunday
10 Monday
Name: Date, dtype: object
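A runnable sketch of the per-weekday split, assuming Date is already a datetime64 column and using a few rows modelled on the question's data. It uses Series.dt.day_name(), the replacement for the weekday_name attribute:

```python
import pandas as pd

# A few rows modelled on the question's data.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-12-30 23:00", "2014-12-31 00:00", "2015-01-01 00:00",
        "2015-01-02 00:00", "2015-01-04 00:00", "2015-01-04 00:00",
        "2015-01-05 00:00",
    ]),
    "X": [2, 22, 0, 2, 2, 2, 2],
})

# One DataFrame per weekday name, keyed by that name.
dfs = {day: group for day, group in df.groupby(df["Date"].dt.day_name())}

print(dfs["Sunday"])
```

Only weekdays that occur in the data become keys, so looking up a missing day raises a KeyError; dfs.get(day) is a gentler alternative.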

Groupby for datetime on a scale of hours (ignoring what day)

I have a series of floats with a datetimeindex that I have resampled into bins of 3 hours. As such I have an index containing
2015-01-01 09:00:00
2015-01-01 12:00:00
2015-01-01 15:00:00
2015-01-01 18:00:00
2015-01-01 21:00:00
2015-01-02 00:00:00
2015-01-02 03:00:00
2015-01-02 06:00:00
2015-01-02 09:00:00
and so forth. I am trying to sum the floats associated with each time of day, say 09:00:00, for all days.
The only way I can think of with my limited experience is to convert this series to a dataframe with the datetime index as another column, then iterate over the rows, check whether the hour slots are equal, and sum the matching values. I feel like this is horribly inefficient and probably not the 'correct' way to do this. Any help would be appreciated!
IIUC:
In [116]: s
Out[116]:
2015-01-01 09:00:00 3
2015-01-01 12:00:00 1
2015-01-01 15:00:00 0
2015-01-01 18:00:00 1
2015-01-01 21:00:00 0
2015-01-02 00:00:00 9
2015-01-02 03:00:00 2
2015-01-02 06:00:00 2
2015-01-02 09:00:00 7
2015-01-02 12:00:00 8
Freq: 3H, Name: val, dtype: int32
In [117]: s.groupby(s.index - s.index.normalize()).sum()
Out[117]:
00:00:00 9
03:00:00 2
06:00:00 2
09:00:00 10
12:00:00 9
15:00:00 0
18:00:00 1
21:00:00 0
Name: val, dtype: int32
or:
In [118]: s.groupby(s.index.hour).sum()
Out[118]:
0 9
3 2
6 2
9 10
12 9
15 0
18 1
21 0
Name: val, dtype: int32
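Both groupings can be reproduced with a self-contained sketch that rebuilds the ten-row series from the answer. The key is either the integer hour or the offset from midnight (timestamp minus its normalized date):

```python
import pandas as pd

# Rebuild the example series: 3-hour bins starting 2015-01-01 09:00.
index = pd.date_range("2015-01-01 09:00", periods=10, freq=pd.Timedelta(hours=3))
s = pd.Series([3, 1, 0, 1, 0, 9, 2, 2, 7, 8], index=index, name="val")

# Key 1: integer hour of day (0, 3, 6, ...).
by_hour = s.groupby(s.index.hour).sum()

# Key 2: time elapsed since midnight, which also keeps minutes and seconds.
by_offset = s.groupby(s.index - s.index.normalize()).sum()

print(by_hour)
print(by_offset)
```

The hour key is enough when all bins start on the hour; the offset key is the safer choice if bins could start at, say, 09:30.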

python: compare two timestamp in different dates

I have a dataframe whose index is in timestamp format, 'YYYY-MM-DD HH:MM:SS'.
Now I want to divide this dataframe into two parts:
one with the rows whose time is before 12pm ('YYYY-MM-DD 12:00:00') on each day,
and another with the rows whose time is at or after 12pm on each day.
I've been stuck on this question for several days. Any suggestions?
Thank you.
If you have a DatetimeIndex (and if you don't, df.index = pd.to_datetime(df.index) should work to get one), then you can access .hour, e.g. df.index.hour, and select using that:
>>> df.head()
A
2015-01-01 00:00:00 0
2015-01-01 01:00:00 1
2015-01-01 02:00:00 2
2015-01-01 03:00:00 3
2015-01-01 04:00:00 4
>>> morning = df[df.index.hour < 12]
>>> afternoon = df[df.index.hour >= 12]
>>> morning.head()
A
2015-01-01 00:00:00 0
2015-01-01 01:00:00 1
2015-01-01 02:00:00 2
2015-01-01 03:00:00 3
2015-01-01 04:00:00 4
>>> afternoon.head()
A
2015-01-01 12:00:00 12
2015-01-01 13:00:00 13
2015-01-01 14:00:00 14
2015-01-01 15:00:00 15
2015-01-01 16:00:00 16
You could also use groupby, e.g. df.groupby(df.index.hour < 12), but that seems like overkill here. If you wanted a more complex division that might be the way to go, though.
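A compact sketch of both variants, assuming an hourly DatetimeIndex spanning two days. The frame and the name is_morning are illustrative only:

```python
import pandas as pd

# Two days of hourly data as an illustrative frame.
index = pd.date_range("2015-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"A": range(48)}, index=index)

# Boolean mask: True for rows before noon, regardless of the date.
is_morning = df.index.hour < 12

morning = df[is_morning]
afternoon = df[~is_morning]

# The groupby variant yields the same two parts, keyed by the mask value.
parts = {key: part for key, part in df.groupby(is_morning)}

print(len(morning), len(afternoon))
```

Reusing one mask and its negation guarantees that every row lands in exactly one of the two parts.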
