I have the dataframe
df = pd.DataFrame(
{
'ID': ['AB01', 'AB02', 'AB03', 'AB04','AB05', 'AB06'],
'l_date': ["1/4/2021","1/4/2021",'1/5/2021','1/5/2021','1/8/2021', np.nan],
'l_time': ["17:05",
"6:00","13:43:10","00:00",np.nan,np.nan]
}
)
And I want to create a new column, d_datetime, which combines l_date and l_time into a single datetime column.
My code is this
cols = ['l_date','l_time']
df['d_datetime'] = df[cols].astype(str).agg(' '.join, axis=1)
df['d_datetime'] = df['d_datetime'].replace({'nan':''}, regex=True)
df['d_datetime'] = pd.to_datetime(df['d_datetime'], errors="coerce").dt.strftime("%d/%m/%Y %H:%M")
Now, this generates the time for AB05 as 00:00 and creates the datetime. But for the rows that have no time in column l_time, I want d_datetime to contain only the date. How can I achieve this?
Initially I tried
df['d_datetime'] = df['d_datetime'].replace({' 00:00':''}, regex=True)
But this removes the time for AB04 too, and I don't want that. How can I get the end result to look like the one below?
UPDATE
From the below result:
I want to check if l_time is NaN and if it is then, I want to apply replace({'00:00':''}) to that row. How can I achieve this?
Use:
df['d_datetime'] = (pd.to_datetime(df['l_date']).dt.strftime("%d/%m/%Y") + ' ' +
pd.to_datetime(df['l_time']).dt.time.replace(np.nan, '').astype(str).str[0:5]).str.strip()
OUTPUT:
ID l_date l_time d_datetime
0 AB01 1/4/2021 17:05 04/01/2021 17:05
1 AB02 1/4/2021 6:00 04/01/2021 06:00
2 AB03 1/5/2021 13:43:10 05/01/2021 13:43
3 AB04 1/5/2021 00:00 05/01/2021 00:00
4 AB05 1/8/2021 NaN 08/01/2021
5 AB06 NaN NaN NaN
df.loc[df["l_time"].isnull(), "d_datetime"] = df["d_datetime"].replace(
{"00:00": ""}, regex=True
)
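A minimal sketch of the same idea that avoids the regex replace: build the date and time parts separately and attach the time only where l_time is present. It assumes l_date is month/day/year (which matches the output shown above); the names date_part, time_part and has_time are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["AB01", "AB02", "AB03", "AB04", "AB05", "AB06"],
    "l_date": ["1/4/2021", "1/4/2021", "1/5/2021", "1/5/2021", "1/8/2021", np.nan],
    "l_time": ["17:05", "6:00", "13:43:10", "00:00", np.nan, np.nan],
})

# Assumption: l_date is month/day/year, consistent with the output above.
date_part = pd.to_datetime(df["l_date"], format="%m/%d/%Y").dt.strftime("%d/%m/%Y")

# Keep only hours and minutes, then left-pad "6:00" to "06:00"; NaN stays NaN.
time_part = df["l_time"].str.extract(r"^(\d{1,2}:\d{2})")[0].str.zfill(5)

# Rows without a time keep the bare date; the others get "date time".
has_time = time_part.notna()
df["d_datetime"] = date_part.where(~has_time, date_part + " " + time_part)
print(df)
```

Because the decision is made from l_time itself, a genuine "00:00" (AB04) is kept while a missing time (AB05) produces a date-only string.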
Here is the solution:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'ID': ['AB01', 'AB02', 'AB03', 'AB04','AB05', 'AB06'],
'l_date': ["1/4/2021","1/4/2021",'1/5/2021','1/5/2021','1/8/2021', np.nan],
'l_time': ["17:05",
"6:00","13:43:10","00:00",np.nan,np.nan]
}
)
df.l_time = df.l_time.fillna('')
df['d_datetime']= df['l_date'].astype(str)+" "+ df['l_time'].astype(str)
print(df)
Related
Hi, I am looking for a more elegant solution than my code. I have a given df which looks like this:
import numpy as np
import pandas as pd
from datetime import date, timedelta
from pandas.tseries.offsets import DateOffset
sdate = date(2021,1,31)
edate = date(2021,8,30)
date_range = pd.date_range(sdate,edate-timedelta(days=1),freq='M')  # month-end frequency ('ME' in pandas 2.2+)
df_test = pd.DataFrame({ 'Datum': date_range})
I take this df and have to insert a new first row holding the minimum date minus one month:
data_perf_indexed_vv = df_test.copy()
minimum_date = df_test['Datum'].min()
data_perf_indexed_vv = data_perf_indexed_vv.reset_index()
df1 = pd.DataFrame([[np.nan] * len(data_perf_indexed_vv.columns)],
columns=data_perf_indexed_vv.columns)
data_perf_indexed_vv = df1.append(data_perf_indexed_vv, ignore_index=True)
data_perf_indexed_vv['Datum'].iloc[0] = minimum_date - DateOffset(months=1)
data_perf_indexed_vv.drop(['index'], axis=1)
Does somebody have a shorter or more elegant solution? Thanks.
Instead of writing such a big second block of code, just use:
df_test.loc[len(df_test)+1,'Datum']=(df_test['Datum'].min()-DateOffset(months=1))
Finally, use the sort_values() method:
df_test=df_test.sort_values(by='Datum',ignore_index=True)
Now if you print df_test you will get the desired output:
#output
Datum
0 2020-12-31
1 2021-01-31
2 2021-02-28
3 2021-03-31
4 2021-04-30
5 2021-05-31
6 2021-06-30
7 2021-07-31
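As an alternative sketch, the new first row can be built as its own one-row frame and put in front with pd.concat, so no sorting is needed afterwards. The three sample dates and the names first_row and result are assumptions made for illustration, not part of the question.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Small stand-in for df_test with month-end dates.
df_test = pd.DataFrame(
    {"Datum": pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"])}
)

# One-row frame holding the earliest date minus one month.
first_row = pd.DataFrame({"Datum": [df_test["Datum"].min() - DateOffset(months=1)]})

# Put the new row first and renumber the index from 0.
result = pd.concat([first_row, df_test], ignore_index=True)
print(result)
```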
Edit: You can use the alleged duplicate solution with reindex() if your dates don't include times, otherwise you need a solution like the one by @kosnik. In addition, their solution doesn't need your dates to be the index!
I have data formatted like this
df = pd.DataFrame(data=[['2017-02-12 20:25:00', 'Sam', '8'],
['2017-02-15 16:33:00', 'Scott', '10'],
['2017-02-15 16:45:00', 'Steve', '5']],
columns=['Datetime', 'Sender', 'Count'])
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y-%m-%d %H:%M:%S')
Datetime Sender Count
0 2017-02-12 20:25:00 Sam 8
1 2017-02-15 16:33:00 Scott 10
2 2017-02-15 16:45:00 Steve 5
I need there to be at least one row for every date, so the expected result would be
Datetime Sender Count
0 2017-02-12 20:25:00 Sam 8
1 2017-02-13 00:00:00 None 0
2 2017-02-14 00:00:00 None 0
3 2017-02-15 16:33:00 Scott 10
4 2017-02-15 16:45:00 Steve 5
I have tried to make datetime the index, add the dates and use reindex() like so
df.index = df['Datetime']
values = df['Datetime'].tolist()
for i in range(len(values)-1):
    if values[i].date() + timedelta < values[i+1].date():
        values.insert(i+1, pd.Timestamp(values[i].date() + timedelta))
print(df.reindex(values, fill_value=0))
This makes every row forget about the other columns and the same thing happens for asfreq('D') or resample()
ID Sender Count
Datetime
2017-02-12 16:25:00 0 Sam 8
2017-02-13 00:00:00 0 0 0
2017-02-14 00:00:00 0 0 0
2017-02-15 20:25:00 0 0 0
2017-02-15 20:25:00 0 0 0
What would be the appropriate way of going about this?
I would create a new DataFrame whose single column contains all the required datetimes, and then left join it with your data frame.
A working code example is the following
import datetime
import pandas as pd
df['Datetime'] = pd.to_datetime(df['Datetime']) # first convert to datetimes
datetimes = df['Datetime'].tolist() # these are the existing datetimes - the missing ones are appended here
dates = [x.date() for x in datetimes] # the same values truncated to dates
min_date = min(dates)
max_date = max(dates)
for d in range((max_date - min_date).days):
    forward_date = min_date + datetime.timedelta(days=d)
    if forward_date not in dates:
        datetimes.append(pd.Timestamp(forward_date)) # midnight of the missing date
# create new dataframe (sorted by time), merge and fill 'Count' column with zeroes
df = pd.DataFrame({'Datetime': sorted(datetimes)}).merge(df, on='Datetime', how='left')
df['Count'] = df['Count'].fillna(0)
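The same result can probably be reached without an explicit loop. The sketch below assumes Count is stored as integers rather than strings, and the names days, all_days, missing, filler and result are only illustrative: it lists every calendar day in the range, keeps the days that have no row yet, and concatenates placeholder rows for them.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["2017-02-12 20:25:00", "Sam", 8],
          ["2017-02-15 16:33:00", "Scott", 10],
          ["2017-02-15 16:45:00", "Steve", 5]],
    columns=["Datetime", "Sender", "Count"],
)
df["Datetime"] = pd.to_datetime(df["Datetime"])

# Calendar day (midnight) of every existing row.
days = df["Datetime"].dt.normalize()

# Every day between the first and last row, then only those with no row.
all_days = pd.date_range(days.min(), days.max(), freq="D")
missing = all_days[~all_days.isin(days)]

# Placeholder rows: midnight timestamp, no sender, zero count.
filler = pd.DataFrame({"Datetime": missing, "Sender": None, "Count": 0})

result = pd.concat([df, filler]).sort_values("Datetime", ignore_index=True)
print(result)
```

Existing rows keep their time of day and their other columns, because they are never reindexed; only the placeholder rows are new.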
I want to add a number of months to the end of my dataframe.
What is the best way to append another six (or 12) months to such a dataframe using dates?
0 2013-07-31
1 2013-08-31
2 2013-09-30
3 2013-10-31
4 2013-11-30
Thanks
Edit: I think you might want pd.date_range
df = pd.DataFrame({'date':['2010-01-31', '2010-02-28'], 'x':[1,2]})
df['date'] = pd.to_datetime(df.date)
date x
0 2010-01-31 1
1 2010-02-28 2
Then
df.append(pd.DataFrame({'date': pd.date_range(start=df.date.iloc[-1], periods=6, freq='M', closed='right')}))
date x
0 2010-01-31 1.0
1 2010-02-28 2.0
0 2010-03-31 NaN
1 2010-04-30 NaN
2 2010-05-31 NaN
3 2010-06-30 NaN
4 2010-07-31 NaN
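On recent pandas versions DataFrame.append is no longer available and date_range no longer takes closed, so here is a hedged sketch of the same idea using pd.concat. It starts the range one month-end after the last existing date instead of relying on closed='right'; n_months, start, new_dates and extended are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2010-01-31", "2010-02-28"]), "x": [1, 2]})

n_months = 6  # how many month-ends to add

# First new date: the month-end following the last existing date.
start = df["date"].iloc[-1] + pd.offsets.MonthEnd(1)

# Passing the offset object avoids depending on the 'M' / 'ME' alias spelling.
new_dates = pd.date_range(start=start, periods=n_months, freq=pd.offsets.MonthEnd())

# New rows have no x value, so x becomes NaN for them.
extended = pd.concat([df, pd.DataFrame({"date": new_dates})], ignore_index=True)
print(extended)
```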
After looking into append and other loop sort of options I created this:
length = df.shape[0]
add = 12
start = df['month'].iloc[0]
count = int(length + add)
dt = pd.date_range(start, periods=count, freq='M')
This is the dt I get. It gives the proper month-end days.
DatetimeIndex(['2013-07-31', '2013-08-31', '2013-09-30', '2013-10-31',
'2013-11-30', '2013-12-31', '2014-01-31', '2014-02-28',
'2014-03-31', '2014-04-30', '2014-05-31', '2014-06-30'],
dtype='datetime64[ns]', freq='M')
now I just have to change from the DatetimeIndex.
I hope this is good code. Cheers.
My dataset has dates in the European format, and I'm struggling to convert it into the correct format before I pass it through a pd.to_datetime, so for all day < 12, my month and day switch.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force to_datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add format.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
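A small sketch of what the explicit format does, using made-up sample strings: with format='%d/%m/%Y' the first field is always read as the day, so an ambiguous value such as 01/04/2018 is parsed as 1 April rather than 4 January.

```python
import pandas as pd

# Sample day-first strings (illustrative, not the asker's file).
dates = pd.Series(["31/03/2018", "01/04/2018", "28/02/2018"])

# The explicit format removes any guessing about day/month order.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed)
```

If some rows might not match the format, passing errors="coerce" as well turns them into NaT instead of raising.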
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
I have 2 DataFrames that currently look like this:
raw_data = {'SeriesDate':['2017-03-10','2017-03-13','2017-03-14','2017-03-15','2017-03-16','2017-03-17']}
import pandas as pd
df1 = pd.DataFrame(raw_data,columns=['SeriesDate'])
df1['SeriesDate'] = pd.to_datetime(df1['SeriesDate'])
print(df1)
SeriesDate
0 2017-03-10
1 2017-03-13
2 2017-03-14
3 2017-03-15
4 2017-03-16
5 2017-03-17
raw_data2 = {'SeriesDate':['2017-03-10','2017-03-13','2017-03-14','2017-03-15','2017-03-16'],'NewSeriesDate':['2017-03-11','2017-03-12','2017-03-13','2017-03-14','2017-03-14']}
df2 = pd.DataFrame(raw_data2,columns=['SeriesDate','NewSeriesDate'])
df2['SeriesDate'] = pd.to_datetime(df2['SeriesDate'])
print(df2)
SeriesDate NewSeriesDate
0 2017-03-10 2017-03-11
1 2017-03-13 2017-03-12
2 2017-03-14 2017-03-13
3 2017-03-15 2017-03-14
4 2017-03-16 2017-03-14
1) I would like to join the dataframes in such a manner that for all 'SeriesDate' in df1 before 15th March, the 'NewSeriesDate' values should be taken from df2.
2) For any 'SeriesDate' in df1 after 15th March or for any 'SeriesDate' that are not in df2, the 'NewSeriesDate' should be calculated as follows:
from pandas.tseries.offsets import BDay
df1['NewSeriesDate'] = df1['SeriesDate'] - BDay(1)
As an example, my final DataFrame in this scenario would look like this:
raw_data3 = {'SeriesDate':['2017-03-10','2017-03-13','2017-03-14','2017-03-15','2017-03-16','2017-03-17'],'NewSeriesDate':['2017-03-11','2017-03-12','2017-03-13','2017-03-14','2017-03-15','2017-03-16']}
finaldf = pd.DataFrame(raw_data3,columns=['SeriesDate','NewSeriesDate'])
finaldf['SeriesDate'] = pd.to_datetime(finaldf['SeriesDate'])
print(finaldf)
SeriesDate NewSeriesDate
0 2017-03-10 2017-03-11
1 2017-03-13 2017-03-12
2 2017-03-14 2017-03-13
3 2017-03-15 2017-03-14
4 2017-03-16 2017-03-15
5 2017-03-17 2017-03-16
I am new to Pandas so not sure how to apply conditional merge, can anyone provide some insight please?
Try this out. It could probably be a little cleaner, but does the trick. You didn't specify what happens if the date is exactly March 15th, so I made an assumption. I may have switched out some header names, but you get the idea:
import pandas as pd
from pandas.tseries.offsets import BDay
import numpy as np
df1 = pd.DataFrame({
'SeriesDate':pd.to_datetime(['3/10/17','3/13/17','3/14/17','3/15/17','3/16/17','3/17/17']),
})
df1['NewSeries'] = np.nan
df2 = pd.DataFrame({
'SeriesDate':pd.to_datetime(['3/10/17','3/13/17','3/14/17','3/15/17','3/16/17']),
'NewSeries':pd.to_datetime(['3/11/17','3/12/17','3/13/17','3/14/17','3/14/17'])
})
d = pd.to_datetime('3/15/17')
df1.loc[df1['SeriesDate'] <= d] = df1.loc[df1['SeriesDate'] <= d].set_index('SeriesDate') \
.combine_first(df2.loc[df2['SeriesDate'] <= d].set_index('SeriesDate')).reset_index()
df1.loc[df1['SeriesDate'] > d, 'NewSeries'] = df1['SeriesDate'] - BDay(1)
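Another way to express the same rule, as a sketch: left-merge df2 onto df1, then keep the merged value only where the date is on or before the cutoff and a match exists, and fall back to the previous business day everywhere else. Treating 15 March itself as "take from df2" is the same assumption as above; cutoff, fallback and use_df2 are illustrative names.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

df1 = pd.DataFrame({"SeriesDate": pd.to_datetime(
    ["2017-03-10", "2017-03-13", "2017-03-14",
     "2017-03-15", "2017-03-16", "2017-03-17"])})
df2 = pd.DataFrame({
    "SeriesDate": pd.to_datetime(
        ["2017-03-10", "2017-03-13", "2017-03-14", "2017-03-15", "2017-03-16"]),
    "NewSeriesDate": pd.to_datetime(
        ["2017-03-11", "2017-03-12", "2017-03-13", "2017-03-14", "2017-03-14"]),
})
cutoff = pd.Timestamp("2017-03-15")  # assumption: the cutoff day itself uses df2

# Bring df2's values alongside df1; unmatched dates get NaT.
merged = df1.merge(df2, on="SeriesDate", how="left")

# Computed value used after the cutoff or when df2 has no match.
fallback = merged["SeriesDate"] - BDay(1)

use_df2 = (merged["SeriesDate"] <= cutoff) & merged["NewSeriesDate"].notna()
merged["NewSeriesDate"] = merged["NewSeriesDate"].where(use_df2, fallback)
print(merged)
```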