Pandas updating weekend date to nearest business day - python

I have a dataframe that currently looks like this:
raw_data = {'AllDate':['2017-04-05','2017-04-06','2017-04-07','2017-04-08','2017-04-09']}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['AllDate'])
print(df)
I would like to add a WeekDate column to this dataframe such that if the date in 'AllDate' falls on a weekend, the 'WeekDate' column holds the date of the Friday before. If the date falls on a weekday, the date should remain the same.
As an example, the resulting DataFrame should look like this:
raw_data = {'AllDate':['2017-04-05','2017-04-06','2017-04-07','2017-04-08','2017-04-09'],'WeekDate':['2017-04-05','2017-04-06','2017-04-07','2017-04-07','2017-04-07']}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['AllDate','WeekDate'])
print(df)
Any ideas how I could achieve this?

This works best (building on the answer posted by Zhe):
import pandas as pd
from datetime import timedelta
df = pd.DataFrame({'AllDate':['2017-04-05','2017-04-06','2017-04-07','2017-04-08','2017-04-09']})
# weekday() is 5 for Saturday and 6 for Sunday; roll those back to Friday (4)
df['WeekDate'] = [x if x.weekday() not in [5, 6] else x - timedelta(days=x.weekday() - 4)
                  for x in pd.to_datetime(df['AllDate'])]
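A vectorized variant of the same idea (just a sketch, using numpy.where instead of a list comprehension):
import numpy as np
import pandas as pd
df = pd.DataFrame({'AllDate': ['2017-04-05', '2017-04-06', '2017-04-07', '2017-04-08', '2017-04-09']})
dates = pd.to_datetime(df['AllDate'])
# weekday is 5 for Saturday and 6 for Sunday; roll those back to Friday (4)
df['WeekDate'] = np.where(dates.dt.weekday >= 5,
                          dates - pd.to_timedelta(dates.dt.weekday - 4, unit='D'),
                          dates)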

Try:
import pandas as pd
df = pd.DataFrame({
    'AllDate': ['2017-04-05', '2017-04-06', '2017-04-07', '2017-04-08', '2017-04-09']
})
# leave weekend rows empty, then forward-fill them from the previous weekday
df['WeekDate'] = [
    x if x.weekday() not in [5, 6] else None for x in pd.to_datetime(df['AllDate'])
]
print(df.ffill())
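With the forward fill applied, the result should look roughly like this (weekend rows inherit the previous Friday):
      AllDate   WeekDate
0  2017-04-05 2017-04-05
1  2017-04-06 2017-04-06
2  2017-04-07 2017-04-07
3  2017-04-08 2017-04-07
4  2017-04-09 2017-04-07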

Here's perhaps a simpler approach that comes up a lot when dealing with time series: the key is the offset objects available in pandas' tseries module.
df = pd.DataFrame({"AllDate": ["2017-04-01", "2017-04-02", "2017-04-03", "2017-04-04", "2017-04-09"]})
df["AllDate"] = pd.to_datetime(df["AllDate"])
df["PrevBusDate"] = df["AllDate"].apply(pd.tseries.offsets.BusinessDay().rollback)
df.head()
>>>
      AllDate PrevBusDate
0  2017-04-01  2017-03-31
1  2017-04-02  2017-03-31
2  2017-04-03  2017-04-03
3  2017-04-04  2017-04-04
4  2017-04-09  2017-04-07
NB: You don't have to convert the 'AllDate' column if you don't want to. You can simply generate the offsets and work with them however you like, e.g.:
[pd.tseries.offsets.BusinessDay().rollback(d) for d in pd.to_datetime(df["AllDate"])]
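If holidays also need to be skipped, the same rollback pattern should work with CustomBusinessDay (a sketch; the holiday below is made up for illustration):
import pandas as pd
# hypothetical holiday list; substitute a real calendar as needed
bday = pd.tseries.offsets.CustomBusinessDay(holidays=['2017-04-07'])
rolled = [bday.rollback(d) for d in pd.to_datetime(['2017-04-08', '2017-04-09'])]
# with 2017-04-07 treated as a holiday, both dates roll back to 2017-04-06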

Related

Python: Create new column that counts the days between current date and a lag date

I want to create a function that counts the days as an integer between a date and the date shifted back a number of periods (e.g. df['new_col'] = (df['date'].shift(#periods) - df['date'])). The date variable is datetime64[D].
As an example: df['report_date'].shift(39) = '2008-09-26' and df['report_date'] = '2008-08-18' and df['delta'] = 39.
import numpy as np
import pandas as pd
dates = pd.Series(np.tile(['2012-08-01','2012-08-15','2012-09-01','2012-08-15'],4)).astype('datetime64[D]')
dates2 = pd.Series(np.tile(['2012-08-01','2012-09-01','2012-10-01','2012-11-01'],4)).astype('datetime64[D]')
stocks = ['A','A','A','A','G','G','G','G','B','B','B','B','F','F','F','F']
stocks = pd.Series(stocks)
df = pd.DataFrame(dict(stocks=stocks, dates=dates, report_date=dates2)).reset_index()
df.head()
print('df info:',df.info())
The code below is my latest attempt to create this variable, but the code produces incorrect results.
df['delta'] = df.groupby(['stocks','dates'])['report_date'].transform(lambda x: (x.shift(1).rsub(x).dt.days))
I came up with the solution of using a for loop and the zip function to simply subtract each pair, like so...
from datetime import datetime
import pandas as pd
dates = ['2012-08-01', '2012-08-15', '2012-09-01', '2012-08-15']
dates2 = ['2012-08-01', '2012-09-01', '2012-10-01', '2012-11-01']
diff = []
for i, x in zip(dates, dates2):
    # parse each string pair into datetimes and subtract
    i = datetime.strptime(i, '%Y-%m-%d')
    x = datetime.strptime(x, '%Y-%m-%d')
    diff.append(i - x)
df = {'--col1--': dates, '--col2--': dates2, '--difference--': diff}
df = pd.DataFrame(df)
print(df)
Output:
--col1-- --col2-- --difference--
0 2012-08-01 2012-08-01 0 days
1 2012-08-15 2012-09-01 -17 days
2 2012-09-01 2012-10-01 -30 days
3 2012-08-15 2012-11-01 -78 days
I hope that solves your problem.
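For larger frames the loop can be avoided entirely; a vectorized sketch on two datetime columns:
import pandas as pd
df = pd.DataFrame({'col1': pd.to_datetime(['2012-08-01', '2012-08-15', '2012-09-01', '2012-08-15']),
                   'col2': pd.to_datetime(['2012-08-01', '2012-09-01', '2012-10-01', '2012-11-01'])})
# subtracting two datetime64 columns yields timedelta64; .dt.days gives plain integers
df['difference'] = (df['col1'] - df['col2']).dt.days
print(df)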

Cannot find index of corresponding date in pandas DataFrame

I have the following DataFrame with a Date column,
0 2021-12-13
1 2021-12-10
2 2021-12-09
3 2021-12-08
4 2021-12-07
...
7990 1990-01-08
7991 1990-01-05
7992 1990-01-04
7993 1990-01-03
7994 1990-01-02
I am trying to find the index for a specific date in this DataFrame using the following code,
# import raw data into DataFrame
df = pd.DataFrame.from_records(data['dataset']['data'])
df.columns = data['dataset']['column_names']
df['Date'] = pd.to_datetime(df['Date'])
# sample date to search for
sample_date = dt.date(2021,12,13)
print(sample_date)
# return index of sample date
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
The output of the program is,
2021-12-13
[]
I can't understand why. I have cast the Date column in the DataFrame to a DateTime and I'm doing a like-for-like comparison.
I have reproduced your DataFrame with a minimal sample. The comparison returns nothing because a datetime64 column does not match a plain datetime.date; comparing against a datetime parsed to the same type works, like this:
import pandas as pd
import datetime as dt
df = pd.DataFrame({'Date':['2021-12-13','2021-12-10','2021-12-09','2021-12-08']})
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y-%m-%d')
sample_date = dt.datetime.strptime('2021-12-13', '%Y-%m-%d')
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
output:
[0]
The searched date is at index 0 of the DataFrame.
Please let me know if this one has any issues.
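An alternative sketch that keeps the datetime.date object and compares on the column's date component instead:
import pandas as pd
import datetime as dt
df = pd.DataFrame({'Date': pd.to_datetime(['2021-12-13', '2021-12-10', '2021-12-09'])})
sample_date = dt.date(2021, 12, 13)
# .dt.date extracts plain datetime.date objects, so the comparison types match
date_index = df.index[df['Date'].dt.date == sample_date].tolist()
print(date_index)  # [0]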

pandas insert new first row and calculate timestamp based on minimum date

Hi, I am looking for a more elegant solution than my code. I have a given df which looks like this:
import pandas as pd
from datetime import date, timedelta
from pandas.tseries.offsets import DateOffset
sdate = date(2021, 1, 31)
edate = date(2021, 8, 30)
date_range = pd.date_range(sdate, edate - timedelta(days=1), freq='M')
df_test = pd.DataFrame({'Datum': date_range})
I take this df and have to insert a new first row with the minimum date:
import numpy as np
data_perf_indexed_vv = df_test.copy()
minimum_date = df_test['Datum'].min()
data_perf_indexed_vv = data_perf_indexed_vv.reset_index()
df1 = pd.DataFrame([[np.nan] * len(data_perf_indexed_vv.columns)],
                   columns=data_perf_indexed_vv.columns)
data_perf_indexed_vv = df1.append(data_perf_indexed_vv, ignore_index=True)
data_perf_indexed_vv['Datum'].iloc[0] = minimum_date - DateOffset(months=1)
data_perf_indexed_vv.drop(['index'], axis=1)
Maybe somebody has a shorter or more elegant solution. Thanks!
Instead of writing such a big second block of code, just make use of:
df_test.loc[len(df_test) + 1, 'Datum'] = df_test['Datum'].min() - DateOffset(months=1)
Finally, make use of the sort_values() method:
df_test = df_test.sort_values(by='Datum', ignore_index=True)
Now if you print df_test you will get the desired output:
#output
Datum
0 2020-12-31
1 2021-01-31
2 2021-02-28
3 2021-03-31
4 2021-04-30
5 2021-05-31
6 2021-06-30
7 2021-07-31
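If you do want to prepend the row explicitly rather than append and sort, pd.concat replaces the now-deprecated DataFrame.append; a sketch using the same data:
import pandas as pd
from pandas.tseries.offsets import DateOffset
df_test = pd.DataFrame({'Datum': pd.date_range('2021-01-31', '2021-08-29', freq='M')})
new_row = pd.DataFrame({'Datum': [df_test['Datum'].min() - DateOffset(months=1)]})
# prepending makes the separate sort_values step unnecessary
df_test = pd.concat([new_row, df_test], ignore_index=True)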

Convert date column in dataframe to unix python

I would like to convert a date column in a Dataframe to unixtime in a new column.
Index Date Second Measurement
0 0 2020-02-24 10:52:38 0.000 0.001155460021
1 1 2020-02-24 10:52:39 0.109 0.001124729984
2 2 2020-02-24 10:52:40 0.203 0.001119069988
I tried a lot, but always get an error. This does not work:
laser['unixtime'] = laser['Date'].arrow.timestamp()
laser['unixtime'] = laser['Date'].timestamp()
laser['unixtime'] = time.mktime(laser['Date'].timetuple())
Can anyone help me out?
greets
Solution with two examples (one when the Date column is a string and one when it is not).
import pandas as pd
from datetime import timezone
# Suppress scientific notation in Pandas
pd.set_option('display.float_format', lambda x: f"{x:.0f}")
df = pd.DataFrame()
df["Date"] = pd.date_range(start='2014-08-01 09:00', freq='H', periods=3, tz='Europe/Berlin')
df["DateUnix"] = df.Date.map(lambda x: x.replace(tzinfo=timezone.utc).timestamp())
df
df2 = pd.DataFrame({"Date": ["2020-02-24 10:52:38"]})
# Convert from object to datetime
df2.Date = pd.to_datetime(df2.Date)
df2["DateUnix"] = df2.Date.map(lambda x: x.replace(tzinfo=timezone.utc).timestamp())
df2
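A vectorized sketch that avoids the per-row lambda, assuming the naive timestamps should be read as UTC:
import pandas as pd
df2 = pd.DataFrame({'Date': pd.to_datetime(['2020-02-24 10:52:38'])})
# datetime64[ns] values are nanoseconds since the epoch; integer-divide down to seconds
df2['DateUnix'] = df2['Date'].astype('int64') // 10**9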

Python data-frame using pandas

I have a dataset which looks like below
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:16 000]
[25/May/2015:23:11:16 000]
Now I have made this into a DF, and df[0] has [25/May/2015:23:11:15 and df[1] has 000]. I want to send all the rows that end with the same seconds value to a file. In the above example they end with 15 and 16 as seconds, so all rows ending with 15 seconds go into one file, the ones ending with 16 into another, and so on.
I have tried the below code
import pandas as pd
data = pd.read_csv('apache-access-log.txt', sep=" ", header=None)
df = pd.DataFrame(data)
print(df[0],df[1].str[-2:])
Converting that column to a datetime would make it easier to work with; note that the leading '[' has to be stripped and that %b (abbreviated month) and %M (minutes) are the correct directives, e.g.:
df['date'] = pd.to_datetime(df['date'].str.lstrip('['), format='%d/%b/%Y:%H:%M:%S')
Then you can simply iterate over a groupby(), e.g.:
In []:
for k, frame in df.groupby(df['date'].dt.second):
    # frame.to_csv('file{}.csv'.format(k))
    print('{}\n{}\n'.format(k, frame))
Out[]:
15
                 date  value
0 2015-05-25 23:11:15      0
1 2015-05-25 23:11:15      0

16
                 date  value
2 2015-05-25 23:11:16      0
3 2015-05-25 23:11:16      0
You can set your datetime as the index for the dataframe, and then use Pandas' loc and to_csv functions. Obviously, as other answers point out, you should convert your date to datetime while reading your dataframe.
Example:
df = df.set_index(['date'])
df.loc['2015-05-25 23:11:15':'2015-05-25 23:11:15'].to_csv('df_data.csv')
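Note that label-based datetime slicing like this expects a sorted DatetimeIndex; if the dates are not monotonic, call df.sort_index() first.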
Try this out:
## Convert to a new column with the seconds value
df['seconds'] = df.apply(lambda row: row[0].split(":")[3].split(" ")[0], axis=1)
for sec in df['seconds'].unique():
    ## filter by seconds
    print("Result ", df[df['seconds'] == sec])
