How to round date time index in a pandas data frame? - python

There is a pandas dataframe like this:
index
2018-06-01 02:50:00 R 45.48 -2.8
2018-06-01 07:13:00 R 45.85 -2.0
...
2018-06-01 08:37:00 R 45.87 -2.7
I would like to round the index to the hour like this:
index
2018-06-01 02:00:00 R 45.48 -2.8
2018-06-01 07:00:00 R 45.85 -2.0
...
2018-06-01 08:00:00 R 45.87 -2.7
I am trying the following code:
df = df.date_time.apply ( lambda x : x.round('H'))
but it returns a Series instead of a DataFrame with the modified index.

Try using floor:
df.index.floor('H')
Setup:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(25),index=pd.date_range('2018-01-01 01:12:50','2018-01-02 01:12:50',freq='H'),columns=['Value'])
df.head()
Value
2018-01-01 01:12:50 0
2018-01-01 02:12:50 1
2018-01-01 03:12:50 2
2018-01-01 04:12:50 3
2018-01-01 05:12:50 4
df.index = df.index.floor('H')
df.head()
Value
2018-01-01 01:00:00 0
2018-01-01 02:00:00 1
2018-01-01 03:00:00 2
2018-01-01 04:00:00 3
2018-01-01 05:00:00 4
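To make the difference between floor, round and ceil concrete, here is a minimal sketch using the three timestamps from the question. The column name value is made up for illustration, and the lowercase 'h' alias is used on the assumption that a recent pandas version is installed, where the uppercase 'H' alias is deprecated.

```python
import pandas as pd

# Timestamps taken from the question; the 'value' column is illustrative only.
idx = pd.DatetimeIndex(
    ["2018-06-01 02:50:00", "2018-06-01 07:13:00", "2018-06-01 08:37:00"]
)
df = pd.DataFrame({"value": [45.48, 45.85, 45.87]}, index=idx)

floored = df.index.floor("h")  # always down to the start of the hour
rounded = df.index.round("h")  # to the nearest hour
ceiled = df.index.ceil("h")    # always up to the next full hour

# floor() returns a new index, so it has to be assigned back.
df.index = floored
print(df)
```

Only floor reproduces the output asked for in the question, since 02:50 rounds to 03:00 but floors to 02:00.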

Try my method:
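Add a new column holding the index floored to the hour (round('H') goes to the nearest hour, so 02:50 would become 03:00 instead of the 02:00 asked for):
df['E'] = df.index.floor('H')
Set it as the index:
df1 = df.set_index('E')
Remove the index name you set ('E' here):
df1.index.name = None
Now df1 is a new DataFrame whose index is the index of df truncated to the hour.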
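The helper column is optional. As a minimal sketch (the sample data is invented, and floor is assumed because it matches the output requested in the question), set_index also accepts an index object directly and returns a new DataFrame without touching the original:

```python
import pandas as pd

# Invented sample data with timestamps that are not on the hour.
df = pd.DataFrame(
    {"Value": [0, 1, 2]},
    index=pd.to_datetime(
        ["2018-01-01 01:12:50", "2018-01-01 02:47:10", "2018-01-01 03:30:00"]
    ),
)

# Pass the floored index straight to set_index; no temporary column,
# and no index name to clean up afterwards.
df1 = df.set_index(df.index.floor("h"))
print(df1)
```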

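Try this (it needs import datetime and assumes the timestamps are in a column named 'index'); it rebuilds each timestamp with the minutes set to zero: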
df['index'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour,60*(dt.minute // 60)))

Related

replace dataframe values based on another dataframe

I have a pandas dataframe which is structured as follows:
timestamp y
0 2020-01-01 00:00:00 336.0
1 2020-01-01 00:15:00 544.0
2 2020-01-01 00:30:00 736.0
3 2020-01-01 00:45:00 924.0
4 2020-01-01 01:00:00 1260.0
...
The timestamp column is a datetime data type
and I have another dataframe with the following structure:
y
timestamp
00:00:00 625.076923
00:15:00 628.461538
00:30:00 557.692308
00:45:00 501.692308
01:00:00 494.615385
...
In this case, the time is the pandas datetime index.
Now what I want to do is replace the values in the first dataframe where the time field is matching i.e. the time of the day is matching with the second dataset.
IIUC, the timestamp column of your first dataframe (df1) is a datetime type, and your second dataframe (df2) has an index that holds only the time of day, not the date.
Then you can do:
df1['y'] = df1['timestamp'].dt.time.map(df2['y'])
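A self-contained sketch of this lookup, assuming the index of df2 holds datetime.time objects (the question does not show the exact type). The sample values are shortened from the question, and one time slot is left out of df2 on purpose to show that unmatched rows become NaN unless the original value is restored with fillna:

```python
import datetime as dt

import pandas as pd

df1 = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-01-01 00:00:00", "2020-01-01 00:15:00", "2020-01-01 00:30:00"]
        ),
        "y": [336.0, 544.0, 736.0],
    }
)

# Lookup table keyed by time of day; 00:30 is deliberately missing.
df2 = pd.DataFrame({"y": [625.0, 628.5]}, index=[dt.time(0, 0), dt.time(0, 15)])
df2.index.name = "timestamp"

# map() looks each time of day up in the index of df2['y'].
mapped = df1["timestamp"].dt.time.map(df2["y"])

# Keep the original value where df2 has no entry for that time.
df1["y"] = mapped.fillna(df1["y"])
print(df1)
```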
I wouldn't be surprised if there is a better way, but you can accomplish this by working to get the tables so that they can merge on the time. Assuming your dataframes will be df and df2.
df['time'] = df['timestamp'].dt.time
df2 = df2.reset_index()
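# parse to datetime and keep only the time of day (skip this line if the values are already datetime.time objects)
df2['timestamp'] = pd.to_datetime(df2['timestamp']).dt.time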
df_combined = pd.merge(df,df2,left_on='time',right_on='timestamp')
df_combined
timestamp_x y_x time timestamp_y y_y
0 2020-01-01 00:00:00 336.0 00:00:00 00:00:00 625.076923
1 2020-01-01 00:15:00 544.0 00:15:00 00:15:00 628.461538
2 2020-01-01 00:30:00 736.0 00:30:00 00:30:00 557.692308
3 2020-01-01 00:45:00 924.0 00:45:00 00:45:00 501.692308
4 2020-01-01 01:00:00 1260.0 01:00:00 01:00:00 494.615385
# This clearly has more than you need, so just keep what you want and rename things back.
df_combined = df_combined[['timestamp_x','y_y']]
df_combined = df_combined.rename(columns={'timestamp_x':'timestamp','y_y':'y'})
New answer I like way better: actually using .map()
Still need to get df2 to have the time column to match on.
df2 = df2.reset_index()
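# parse to datetime and keep only the time of day (skip this line if the values are already datetime.time objects)
df2['timestamp'] = pd.to_datetime(df2['timestamp']).dt.time
df['y'] = df['timestamp'].dt.time.map(dict(zip(df2['timestamp'], df2['y'])))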
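If the second frame's index turns out to hold strings such as '00:15:00' rather than time objects (an assumption, since the question does not show the dtype), a variant is to format the first frame's timestamps as strings and map on those. The sample values below are invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2020-01-01 00:00:00", "2020-01-02 00:15:00"]),
        "y": [336.0, 544.0],
    }
)

# Assumed layout: the lookup frame is indexed by 'HH:MM:SS' strings.
df2 = pd.DataFrame(
    {"y": [625.0, 628.5]},
    index=pd.Index(["00:00:00", "00:15:00"], name="timestamp"),
)

# Format each timestamp as 'HH:MM:SS' so it matches the string keys of df2.
time_key = df["timestamp"].dt.strftime("%H:%M:%S")
df["y"] = time_key.map(df2["y"])
print(df)
```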

Change value if it repeats a certain number of times in a month

I have a dataframe with time data in the format:
date values
0 2013-01-01 00:00:00 0.0
1 2013-01-01 01:00:00 0.0
2 2013-01-01 02:00:00 -9999
3 2013-01-01 03:00:00 -9999
4 2013-01-01 04:00:00 0.0
.. ... ...
8754 2016-12-31 18:00:00 427.5
8755 2016-12-31 19:00:00 194.9
8756 2016-12-31 20:00:00 -9999
8757 2016-12-31 21:00:00 237.6
8758 2016-12-31 22:00:00 -9999
8759 2016-12-31 23:00:00 0.0
For every month in which the value -9999 appears more than 175 times, I want those values changed to NaN.
Imagine that we have this other dataframe with the number of times the value is repeated per month:
date values
0 2013-01 200
1 2013-02 0
2 2013-03 2
3 2013-04 181
4 2013-05 0
5 2013-06 0
6 2013-07 66
7 2013-08 0
8 2013-09 7
In this case, January and April exceed the threshold, so the first dataframe should become:
date values
0 2013-01-01 00:00:00 0.0
1 2013-01-01 01:00:00 0.0
2 2013-01-01 02:00:00 NaN
3 2013-01-01 03:00:00 NaN
4 2013-01-01 04:00:00 0.0
.. ... ...
8754 2016-12-31 18:00:00 427.5
8755 2016-12-31 19:00:00 194.9
8756 2016-12-31 20:00:00 -9999
8757 2016-12-31 21:00:00 237.6
8758 2016-12-31 22:00:00 -9999
8759 2016-12-31 23:00:00 0.0
I imagined creating a list using tolist() that separates the months that the value appears more than 175 times and then creating a condition if df["values"]==-9999 and df["date"] in list_with_months and then change the values.
You can do this using a transform call where you calculate the number of values per month in the same dataframe. Then you create a new column conditionally on this:
import numpy as np
MISSING = -9999
THRESHOLD = 175
# Create a month column
df['month'] = df['date'].dt.to_period('M')
# Count number of MISSING per month and assign to dataframe
df['n_missing'] = (
    df.groupby('month')['values']
    .transform(lambda d: (d == MISSING).sum())
)
# If value is MISSING and number of missing is above THRESHOLD, replace with NaN, otherwise keep original values
df['new_value'] = np.where(
    (df['values'] == MISSING) & (df['n_missing'] > THRESHOLD),
    np.nan,
    df['values']
)
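The same idea can be tried on a toy frame. This is a sketch under stated assumptions: the data is invented, THRESHOLD is lowered to 2 so that six rows are enough to trigger it (the question uses 175), and the values column is overwritten in place instead of writing to a new column.

```python
import pandas as pd

MISSING = -9999
THRESHOLD = 2  # lowered for the toy data; the question uses 175

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2013-01-01 00:00", "2013-01-01 01:00",
                "2013-01-01 02:00", "2013-01-01 03:00",
                "2013-02-01 00:00", "2013-02-01 01:00",
            ]
        ),
        "values": [0.0, MISSING, MISSING, MISSING, MISSING, 5.0],
    }
)

is_missing = df["values"].eq(MISSING)
month = df["date"].dt.to_period("M")

# Per-row count of MISSING entries in that row's month.
n_missing = is_missing.groupby(month).transform("sum")

# Blank out MISSING only in months that exceed the threshold.
df["values"] = df["values"].mask(is_missing & (n_missing > THRESHOLD))
print(df)
```

January has three sentinel values and is blanked; February has only one, so its -9999 is left alone.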

datetime difference between dates

I have a df like so:
firstdate seconddate
0 2011-01-01 13:00:00 2011-01-01 13:00:00
1 2011-01-02 14:00:00 2011-01-01 11:00:00
2 2011-01-02 16:00:00 2011-01-02 13:00:00
3 2011-01-04 12:00:00 2011-01-03 15:00:00
...
seconddate is always before firstdate. I want to compute the difference between firstdate and seconddate in days and store it in a new column: if both fall on the same day the difference is 0, if seconddate is the day before firstdate the difference is 1, and so on up to a week. How would I do this?
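df['firstdate'] = pd.to_datetime(df['firstdate'])
df['seconddate'] = pd.to_datetime(df['seconddate'])
df['diff'] = (df['firstdate'] - df['seconddate']).dt.days
This will add a column with the difference in whole days. You can then filter rows based on it. Use df['diff'] rather than df.diff, because df.diff refers to the DataFrame.diff method:
# drop rows with a negative difference
df = df.drop(df[df['diff'] < 0].index)
# or keep only rows with a positive difference
df = df[df['diff'] > 0]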
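Note that subtracting full timestamps counts elapsed 24-hour periods, so two timestamps 21 hours apart on consecutive dates give 0, not 1. If calendar days are what is wanted, as the question describes, one option is to truncate both columns to midnight first. A minimal sketch using the four rows shown in the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "firstdate": pd.to_datetime(
            ["2011-01-01 13:00:00", "2011-01-02 14:00:00",
             "2011-01-02 16:00:00", "2011-01-04 12:00:00"]
        ),
        "seconddate": pd.to_datetime(
            ["2011-01-01 13:00:00", "2011-01-01 11:00:00",
             "2011-01-02 13:00:00", "2011-01-03 15:00:00"]
        ),
    }
)

# Elapsed whole days: the last row is only 21 hours apart, so it yields 0.
elapsed = (df["firstdate"] - df["seconddate"]).dt.days

# Calendar-day difference: drop the time of day before subtracting.
calendar = (
    df["firstdate"].dt.normalize() - df["seconddate"].dt.normalize()
).dt.days

df["difference"] = calendar
print(df)
```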

Group pandas rows into pairs then find timedelta

I have a dataframe where I need to group the TX/RX column into pairs, and then put these into a new dataframe with a new index and the timedelta between them as values.
df = pd.DataFrame()
df['time1'] = pd.date_range('2018-01-01', periods=6, freq='H')
df['time2'] = pd.date_range('2018-01-01', periods=6, freq='1H1min')
df['id'] = ids
df['val'] = vals
time1 time2 id val
0 2018-01-01 00:00:00 2018-01-01 00:00:00 1 A
1 2018-01-01 01:00:00 2018-01-01 01:01:00 2 B
2 2018-01-01 02:00:00 2018-01-01 02:02:00 3 A
3 2018-01-01 03:00:00 2018-01-01 03:03:00 4 B
4 2018-01-01 04:00:00 2018-01-01 04:04:00 5 A
5 2018-01-01 05:00:00 2018-01-01 05:05:00 6 B
needs to be...
index timedelta A B
0 1 1 2
1 1 3 4
2 1 5 6
I think that pivot_tables or stack/unstack is probably the best way to go about this, but I'm not entirely sure how...
I believe you need:
df = pd.DataFrame()
df['time1'] = pd.date_range('2018-01-01', periods=6, freq='H')
df['time2'] = df['time1'] + pd.to_timedelta([60,60,120,120,180,180], 's')
df['id'] = range(1,7)
df['val'] = ['A','B'] * 3
df['t'] = df['time2'] - df['time1']
print (df)
time1 time2 id val t
0 2018-01-01 00:00:00 2018-01-01 00:01:00 1 A 00:01:00
1 2018-01-01 01:00:00 2018-01-01 01:01:00 2 B 00:01:00
2 2018-01-01 02:00:00 2018-01-01 02:02:00 3 A 00:02:00
3 2018-01-01 03:00:00 2018-01-01 03:02:00 4 B 00:02:00
4 2018-01-01 04:00:00 2018-01-01 04:03:00 5 A 00:03:00
5 2018-01-01 05:00:00 2018-01-01 05:03:00 6 B 00:03:00
#if necessary convert to seconds
#df['t'] = (df['time2'] - df['time1']).dt.total_seconds()
#pivot needs keyword arguments in pandas 2.0 and later
df = df.pivot(index='t', columns='val', values='id').reset_index().rename_axis(None, axis=1)
#if necessary aggregate values
#df = (df.pivot_table(index='t',columns='val',values='id', aggfunc='mean')
# .reset_index().rename_axis(None, axis=1))
print (df)
t A B
0 00:01:00 1 2
1 00:02:00 3 4
2 00:03:00 5 6
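The pivot above relies on each pair sharing the same t value. If the pairs are instead defined by order (first A with first B, and so on), one possible sketch is to number the occurrences of each label with cumcount and pivot on that. This is only one interpretation of the question: the column name pair and the choice of time1 for the timedelta are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time1": pd.date_range("2018-01-01", periods=6, freq="h"),
        "id": range(1, 7),
        "val": ["A", "B"] * 3,
    }
)

# Number the occurrences of each label: the 1st A and 1st B form pair 0, etc.
df["pair"] = df.groupby("val").cumcount()

ids = df.pivot(index="pair", columns="val", values="id")
times = df.pivot(index="pair", columns="val", values="time1")

# Time between the B row and the A row of each pair (assumed meaning of timedelta).
out = ids.assign(timedelta=times["B"] - times["A"]).rename_axis(None, axis=1)
print(out)
```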

Adding holidays columns in a Dataframe in Python

I am trying to add a holidays column for France to a DataFrame using the workalendar package, but it gives me this error:
'Series' object has no attribute 'weekday'
Below is my code;
from workalendar.europe import France
df1 = pd.read_csv('C:\Users\ABC.csv')
df1['Date'] = pd.to_datetime(df1['Date'], format= '%d/%m/%Y %H:%M:%S')
df1['Date1'] = df1.Date.dt.date
dr = df1['Date1']
cal = France()
df1['Holiday'] = cal.is_working_day(df1['Date1'])
df1.head()
The original data in the file looks like this;
Date Value
17/10/2012 19:00:00 0
17/10/2012 20:00:00 0.1
17/10/2012 21:00:00 0
17/10/2012 22:00:00 0
17/10/2012 23:00:00 0
18/10/2012 00:00:00 0
18/10/2012 01:00:00 0
18/10/2012 02:00:00 0
18/10/2012 03:00:00 0.1
18/10/2012 04:00:00 0
18/10/2012 05:00:00 0
18/10/2012 06:00:00 0
18/10/2012 07:00:00 0
18/10/2012 08:00:00 0.2
18/10/2012 09:00:00 0.5
Try this.
df1['Holiday'] = df1['Date'].apply(lambda x: cal.is_working_day(x.to_pydatetime()))
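is_working_day expects a single date, not a whole Series, so you have to call it once per row and pass it a plain datetime (or date) object.
BTW, a working day is not the opposite of a holiday (weekends are non-working days too), so the column name 'Holiday' may be misleading.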
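The cause of the error can be reproduced without workalendar. In the sketch below, is_weekday is a hypothetical stand-in for any function that expects one date and calls .weekday() on it. It is not part of workalendar, and the two sample dates are chosen for illustration.

```python
import datetime as dt

import pandas as pd


def is_weekday(day: dt.date) -> bool:
    """Hypothetical stand-in for a per-date check such as cal.is_working_day."""
    return day.weekday() < 5


dates = pd.Series(
    pd.to_datetime(
        ["17/10/2012 19:00:00", "20/10/2012 08:00:00"],
        format="%d/%m/%Y %H:%M:%S",
    )
)

# Passing the whole Series fails: a Series has no .weekday() method.
try:
    is_weekday(dates)
    raised = False
except AttributeError:
    raised = True

# Calling the function once per element works.
flags = dates.dt.date.apply(is_weekday)
print(raised, flags.tolist())
```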
Converting between datetime, Timestamp and datetime64