I have a table with the columns 'Date', 'Time', and 'Costs'.
I want to select the rows where the time is later than 12:00:00, then add 1 day to the 'Date' column of the selected rows.
How should I go about doing it?
So far I have:
df[df['Time']>'12:00:00']['Date'] = df[df['Time']>'12:00:00']['Date'].astype('datetime64[ns]') + timedelta(days=1)
I am a beginner at coding, and any suggestions would be really helpful! Thanks.
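First convert the Date column with to_datetime if it does not already hold datetimes. Then cast the Time column to strings (in case it holds Python time objects), convert it to datetimes, extract the hour with Series.dt.hour, and add 1 day where the condition holds. Comparing the hour with 12 selects only times from 13:00:00 onward: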
df = pd.DataFrame({'Date':['2015-01-02','2016-05-08'],
'Time':['10:00:00','15:00:00']})
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-08 15:00:00
df['Date'] = pd.to_datetime(df['Date'])
mask = pd.to_datetime(df['Time'].astype(str)).dt.hour > 12
df.loc[mask, 'Date'] += pd.Timedelta(days=1)
print (df)
Date Time
0 2015-01-02 10:00:00
1 2016-05-09 15:00:00
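If times such as 12:30:00 should also count as later than 12:00:00, one option is to compare the full time of day instead of only the hour. The following is a minimal sketch, assuming the Time values are 'HH:MM:SS' strings (or Python time objects that stringify that way); the third sample row is added here purely for illustration:

```python
import pandas as pd

# Sample data; the 12:30:00 row is an illustrative addition
df = pd.DataFrame({'Date': ['2015-01-02', '2016-05-08', '2017-03-01'],
                   'Time': ['10:00:00', '15:00:00', '12:30:00']})

df['Date'] = pd.to_datetime(df['Date'])

# Parse each clock time as a duration since midnight
elapsed = pd.to_timedelta(df['Time'].astype(str))

# True for every row strictly later than 12:00:00
mask = elapsed > pd.Timedelta(hours=12)

# Shift only the selected rows by one day
df.loc[mask, 'Date'] += pd.Timedelta(days=1)
print(df)
```

Using .loc with a boolean mask assigns into the original DataFrame, which the chained indexing in the question does not do reliably.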
I'm unable to filter my dataframe to give me the details I need. I'm trying to obtain the minimum value for any given 24-hour period while keeping the detail in the filtered rows, such as the day name.
Below is the head of the df dataframe. It has a DatetimeIndex named t; I've pulled the hour column as an integer from the index time information, and Low is a float. When I run the code shown between the two dataframes below, df2 is the second dataframe that I get returned.
df = pd.read_csv(r'C:/Users/Oliver/Documents/Data/eurusd1hr.csv')
df.drop(columns=(['Close', 'Open', 'High', 'Volume']), inplace=True)
df['t'] = (df['Date'] + ' ' + df['Time'])
df.drop(columns = ['Date', 'Time'], inplace=True)
df['t'] = pd.DatetimeIndex(df.t)
df.set_index('t', inplace=True)
df['hour'] = df.index.hour
df['day'] = df.index.day_name()
Low hour day
t
2003-05-05 03:00:00 1.12154 3 Monday
2003-05-05 04:00:00 1.12099 4 Monday
2003-05-05 05:00:00 1.12085 5 Monday
2003-05-05 06:00:00 1.12049 6 Monday
2003-05-05 07:00:00 1.12079 7 Monday
df2 = df.between_time('00:00', '23:00').groupby(pd.Grouper(freq='d')).min()
Low hour day
t
2003-05-05 1.12014 3.0 Monday
2003-05-06 1.12723 0.0 Tuesday
2003-05-07 1.13265 0.0 Wednesday
2003-05-08 1.13006 0.0 Thursday
2003-05-09 1.14346 0.0 Friday
I want to keep the corresponding hour in the hour column, and also the hour information in the original index, in the same way that the day name has been maintained.
I was expecting the index and hour column to keep that information.
I've tried adding a second Grouper but failed. I have also tried resetting the index. Any help would be gratefully received. Thanks.
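One possible approach, offered as a sketch rather than something tested against the original CSV: .min() is computed for each column independently, so the hour in df2 is the smallest hour of the day, not the hour at which the low occurred. Looking up the index label of each daily minimum with idxmin and then selecting those rows keeps the whole row, including the original timestamp. The sample values below are made up for illustration, and the sketch assumes the index has no duplicate timestamps:

```python
import pandas as pd

# Small made-up sample in the same shape as the question's data
idx = pd.to_datetime([
    '2003-05-05 03:00', '2003-05-05 04:00', '2003-05-05 05:00',
    '2003-05-06 00:00', '2003-05-06 01:00',
])
df = pd.DataFrame({'Low': [1.12154, 1.12099, 1.12085, 1.12800, 1.12723]},
                  index=idx)
df.index.name = 't'
df['hour'] = df.index.hour
df['day'] = df.index.day_name()

# Group by calendar date and find the timestamp of each day's lowest Low
min_idx = df.groupby(df.index.normalize())['Low'].idxmin()

# Select the full rows at those timestamps; index, hour and day are preserved
daily_low = df.loc[min_idx]
print(daily_low)
```

Grouping on df.index.normalize() creates groups only for dates that actually appear in the data, so days without rows (weekends, for example) do not produce empty groups.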
I have a DataFrame with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
Every value in the column is a string. What is the easiest way to convert these dates to integer Unix time?
One idea is to convert the column to datetimes and to timedeltas:
df['dates'] = pd.to_datetime(df['Date'] + '-2020', format='%d-%b-%Y', errors='coerce')
times = df['Date'].replace({r'(\d+)\s+mins': r'00:\1:00',
                            r'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
# keep only rows that parsed as either a date or a time
df = df[df['dates'].notna() | df['times'].notna()].copy()
# Series.append was removed in pandas 2.0, so combine the two parts with pd.concat
df['all'] = pd.concat([df['dates'].dropna().astype('int64'),
                       df['times'].dropna().astype('int64')])
print (df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000
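Note that the values in the all column above are nanoseconds. If whole seconds since the Unix epoch are wanted for the date rows, one way is to subtract the epoch and floor-divide by one second. This is a small sketch using the two parsed dates from the output above:

```python
import pandas as pd

# The two dates that were parsed in the example above
dates = pd.Series(pd.to_datetime(['2020-02-09', '2020-02-13']))

# Seconds elapsed since 1970-01-01, independent of the stored time resolution
unix_seconds = (dates - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
print(unix_seconds)
```

The relative entries such as "3 mins" are durations rather than points in time, so they would need a reference moment before they can be expressed as Unix time.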
I want to return rows by finding the maximum date of each month and then checking whether that date falls in the last 2 weeks of that particular month. Below is the DataFrame that I'm using:
finalPrize date high low
1777.44 2018-07-31 1801.83 1739.32
1797.17 2018-06-27 1798.44 1776.02
1834.33 2018-05-28 1836.56 1786.00
1823.29 2018-04-03 1841.00 1821.50
1847.75 2018-03-29 1847.77 1818.92
I have referred to other answers and found a way to get the max date from the 'date' column. Here is the code:
df.index = df['date']
print(df.groupby(df.index.month).apply(lambda x: x.index.max()))
But this results in:
date
1 2019-07-31
2 2019-06-27
3 2019-05-28
4 2019-04-03
5 2019-03-29
Instead, I want to return all the values from the rows where these dates occur, but only if the date falls in the last 2 weeks of the month. I'm not able to figure out how to do that!
So expected output is:
finalPrize date high low
1777.44 2018-07-31 1801.83 1739.32
1797.17 2018-06-27 1798.44 1776.02
1834.33 2018-05-28 1836.56 1786.00
1847.75 2018-03-29 1847.77 1818.92
import calendar
df.index = pd.to_datetime(df.index)
df['day'] = pd.to_numeric(df.index.day)
df['days_in_month'] = df.apply(lambda row : calendar.monthrange(row.name.year,row.name.month)[1], axis = 1)
df['first_day'] = df.apply(lambda row : calendar.monthrange(row.name.year,row.name.month)[0], axis = 1)
df['days_in_last_week'] = ((df['days_in_month'])%7+df['first_day'])%7
df[df['day'] > (df['days_in_month'] - df['days_in_last_week'])]
Hope this works! Do this after you set date as the index.
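An alternative sketch that stays with a regular date column: take the latest row in each calendar month, then keep it only if fewer than 14 days remain in that month. Reading "last 2 weeks" as the final 14 days of the month is an assumption made here, not something stated in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'finalPrize': [1777.44, 1797.17, 1834.33, 1823.29, 1847.75],
    'date': pd.to_datetime(['2018-07-31', '2018-06-27', '2018-05-28',
                            '2018-04-03', '2018-03-29']),
    'high': [1801.83, 1798.44, 1836.56, 1841.00, 1847.77],
    'low': [1739.32, 1776.02, 1786.00, 1821.50, 1818.92],
})

# Latest row per calendar month (year and month together)
latest = df.loc[df.groupby(df['date'].dt.to_period('M'))['date'].idxmax()]

# Days remaining after each date within its own month
days_left = latest['date'].dt.days_in_month - latest['date'].dt.day

# Keep rows in the final 14 days of the month, newest first
result = latest[days_left < 14].sort_values('date', ascending=False)
print(result)
```

Grouping by dt.to_period('M') rather than by month number keeps the same month of different years apart.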
I am attempting to add a year to a column of dates in a pandas dataframe, but when I use pd.to_timedelta I get additional hours & minutes. I know I could take the updated time and truncate the hours, but I feel like there must be a way to add a year precisely. My attempt as follows:
import pandas as pd
dates = pd.DataFrame({'date':['20170101','20170102','20170103']})
dates['date'] = pd.to_datetime(dates['date'], format='%Y%m%d')
dates['date2'] = dates['date'] + pd.to_timedelta(1, unit='y')
dates
yields:
Out[1]:
date date2
0 2017-01-01 2018-01-01 05:49:12
1 2017-01-02 2018-01-02 05:49:12
2 2017-01-03 2018-01-03 05:49:12
How can I add a year without adding 05:49:12 HH:mm:ss?
In [99]: dates['date'] + pd.offsets.DateOffset(years=1)
Out[99]:
0 2018-01-01
1 2018-01-02
2 2018-01-03
Name: date, dtype: datetime64[ns]
leap year check:
In [100]: pd.to_datetime(['2011-02-28', '2012-02-29']) + pd.offsets.DateOffset(years=1)
Out[100]: DatetimeIndex(['2012-02-28', '2013-02-28'], dtype='datetime64[ns]', freq=None)
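You can normalize via pd.Series.dt.normalize (note that recent pandas versions reject unit='y' in pd.to_timedelta, so this only runs on older releases):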
dates['date2'] = (dates['date'] + pd.to_timedelta(1, unit='y')).dt.normalize()
Or convert datetime to date
dates['date'] = dates['date'].apply(lambda a: a.date())
Edit: This works if you don't care about leap years, etc. Otherwise see jp_data_analysis's answer.
You can use 365 and unit='d':
pd.to_timedelta(365, unit='d')
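To illustrate the leap-year caveat mentioned in the edit above, here is a small sketch comparing a fixed 365-day shift with a calendar-aware offset. The start date is chosen here only because 2016 is a leap year:

```python
import pandas as pd

start = pd.Timestamp('2016-01-01')  # 2016 has 366 days

# A fixed-length shift of 365 days lands one day short of the next year
fixed = start + pd.Timedelta(days=365)

# A calendar offset advances the year field and keeps month and day
calendar_shift = start + pd.DateOffset(years=1)

print(fixed)
print(calendar_shift)
```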
You can access the components of a date (year, month and day) using code of the form dataframe["column"].dt.component.
For example, the month component is dataframe["column"].dt.month, and the year component is dataframe["column"].dt.year.
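A short sketch of the accessor described above; the DataFrame and the column name "when" are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with one datetime column
frame = pd.DataFrame({'when': pd.to_datetime(['2017-01-15', '2018-12-31'])})

# Each component comes back as an integer Series aligned with the frame
frame['year'] = frame['when'].dt.year
frame['month'] = frame['when'].dt.month
frame['day'] = frame['when'].dt.day
print(frame)
```

The .dt accessor is only available on columns with a datetime-like dtype, so string columns need pd.to_datetime first.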
I have a dataframe column which looks like this :
It reads M:S.MS. How can I convert it into a M:S:MS timeformat so I can plot it as a time series graph?
If I plot it as it is, python throws an Invalid literal for float() error.
Note: This dataframe contains one hour's worth of data, with values between 0:0.0 and 59:59.9.
df = pd.DataFrame({'date':['00:02.0','00:05.0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05.0
2 00:08.1
It is possible to convert to datetime:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or to timedeltas:
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
EDIT:
For custom date use:
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100