Suppose I have a dataframe df looking like this
df
TimeStamp. Column1......Column n.
2017-01-01
2017-01-02
...
But I want it like this
TimeStamp. Column1......Column n.
2017-01-01 00:00:00
2017-01-02 00:00:00
...
How can I add this (00:00:00) to all TimeStamps in the dataframe? Thanks
You can do it with the code below:
import pandas as pd
df=pd.DataFrame([{"Timestamp":"2017-01-01"},{"Timestamp":"2017-01-01"}],columns=['Timestamp'])
df_new=df['Timestamp'].apply(lambda k:k+" 00:00:00")
Output:
df_new
0 2017-01-01 00:00:00
1 2017-01-01 00:00:00
Name: Timestamp, dtype: object
import pandas as pd
from datetime import datetime
Name = ['a', 'b', 'c', 'd']
Age = [10, 20, 30, 40]
somedate = datetime.now().date()  # today's date, without a time component
DOB = [somedate] * 4
somelistdata = list(zip(Name, Age, DOB))
df = pd.DataFrame(somelistdata, columns = ['Name', 'Age', 'DOB'])
# problem statement
print(df)
# solution to your problem
df['DOB'] = pd.to_datetime(df['DOB']).dt.strftime('%Y-%m-%d %H:%M:%S')
print(df)
Problem statement
Name Age DOB
0 a 10 2019-09-19
1 b 20 2019-09-19
2 c 30 2019-09-19
3 d 40 2019-09-19
Solution
Name Age DOB
0 a 10 2019-09-19 00:00:00
1 b 20 2019-09-19 00:00:00
2 c 30 2019-09-19 00:00:00
3 d 40 2019-09-19 00:00:00
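A minimal sketch of how the two answers above relate, assuming a string column named Timestamp as in the question (the names as_datetime and as_text are only illustrative). A datetime64 column already stores midnight as its time component, even though pandas hides it when every value is at midnight, while strftime produces strings that always spell it out.

```python
import pandas as pd

df = pd.DataFrame({"Timestamp": ["2017-01-01", "2017-01-02"]})

# Parsing gives real datetimes; the time part is 00:00:00 even if not displayed.
as_datetime = pd.to_datetime(df["Timestamp"], format="%Y-%m-%d")

# Formatting gives strings with the time always spelled out.
as_text = as_datetime.dt.strftime("%Y-%m-%d %H:%M:%S")

print(as_datetime.dtype)
print(as_text.tolist())
```

Which form is preferable probably depends on whether the column is used for further date arithmetic (keep datetimes) or only for display or export (strings).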
I have a dataframe:
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
I would like to convert the time based on conditions: if the hour is less than 9, I want to set it to 9 and if the hour is more than 17, I need to set it to 17.
I tried this approach:
df['time'] = np.where(((df['time'].dt.hour < 9) & (df['time'].dt.hour != 0)), dt.time(9, 00))
I am getting an error: Can only use .dt accessor with datetimelike values.
Can anyone please help me with this? Thanks.
Here's a way to do what your question asks:
df.time = pd.to_datetime(df.time)
# shift only the hour; minutes and seconds are kept
df.loc[df.time.dt.hour < 9, 'time'] = df.time + pd.to_timedelta(9 - df.time.dt.hour, unit='h')
df.loc[df.time.dt.hour > 17, 'time'] = df.time + pd.to_timedelta(17 - df.time.dt.hour, unit='h')
Input:
time
0 2022-06-06 08:45:00
1 2022-06-06 09:30:00
2 2022-06-06 18:00:00
3 2022-06-06 15:00:00
Output:
time
0 2022-06-06 09:45:00
1 2022-06-06 09:30:00
2 2022-06-06 17:00:00
3 2022-06-06 15:00:00
UPDATE:
Here's alternative code to try to address OP's error as described in the comments:
import pandas as pd
import datetime
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print('', 'df loaded as strings:', df, sep='\n')
df.time = pd.to_datetime(df.time, format='%H:%M:%S')
print('', 'df converted to datetime by pd.to_datetime():', df, sep='\n')
# shift only the hour; minutes and seconds are kept
df.loc[df.time.dt.hour < 9, 'time'] = df.time + pd.to_timedelta(9 - df.time.dt.hour, unit='h')
df.loc[df.time.dt.hour > 17, 'time'] = df.time + pd.to_timedelta(17 - df.time.dt.hour, unit='h')
df.time = [time.time() for time in pd.to_datetime(df.time)]
print('', 'df with time column adjusted to have hour between 9 and 17, converted to type "time":', df, sep='\n')
Output:
df loaded as strings:
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
df converted to datetime by pd.to_datetime():
time
0 1900-01-01 08:45:00
1 1900-01-01 09:30:00
2 1900-01-01 18:00:00
3 1900-01-01 15:00:00
df with time column adjusted to have hour between 9 and 17, converted to type "time":
time
0 09:45:00
1 09:30:00
2 17:00:00
3 15:00:00
UPDATE #2:
To not just change the hour for out-of-window times, but to simply apply 9:00 and 17:00 as min and max times, respectively (see OP's comment on this), you can do this:
df.loc[df['time'].dt.hour < 9, 'time'] = pd.to_datetime(pd.DataFrame({
'year':df['time'].dt.year, 'month':df['time'].dt.month, 'day':df['time'].dt.day,
'hour':[9]*len(df.index)}))
df.loc[df['time'].dt.hour > 17, 'time'] = pd.to_datetime(pd.DataFrame({
'year':df['time'].dt.year, 'month':df['time'].dt.month, 'day':df['time'].dt.day,
'hour':[17]*len(df.index)}))
df['time'] = [time.time() for time in pd.to_datetime(df['time'])]
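As a hedged alternative for the min/max case, here is a minimal sketch that treats each HH:MM:SS string as a duration since midnight and clamps it with Series.where. This is a swapped-in technique, not the code above, and it behaves slightly differently: anything later than 17:00:00 (for example 17:30:00) is pulled back to 17:00:00, whereas the hour > 17 test leaves such values alone. The names since_midnight, lower, upper and clamped are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"time": ["08:45:00", "09:30:00", "18:00:00", "15:00:00"]})

# Treat each HH:MM:SS string as a duration since midnight.
since_midnight = pd.to_timedelta(df["time"])

lower = pd.Timedelta(hours=9)   # assumed window start
upper = pd.Timedelta(hours=17)  # assumed window end

# Keep values inside the window, otherwise substitute the nearest bound.
clamped = since_midnight.where(since_midnight >= lower, lower)
clamped = clamped.where(clamped <= upper, upper)

# Add to an arbitrary date only so that strftime can format the result.
df["time"] = (pd.Timestamp("1900-01-01") + clamped).dt.strftime("%H:%M:%S")
print(df)
```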
Since your 'time' column contains strings, they can be kept as strings, and you can assign new string values where appropriate. To filter for your criteria, it is convenient to: create a datetime Series from the 'time' column; create boolean Series by comparing the datetime Series with your criteria; and use the boolean Series to select the rows that need to be changed.
Your data:
import numpy as np
import pandas as pd
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print(df.to_string())
>>>
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
Convert to datetime and make boolean Series with your criteria:
dts = pd.to_datetime(df['time'], format='%H:%M:%S')
lt_nine = dts.dt.hour < 9
gt_seventeen = (dts.dt.hour >= 17)
print(lt_nine)
print(gt_seventeen)
>>>
0 True
1 False
2 False
3 False
Name: time, dtype: bool
0 False
1 False
2 True
3 False
Name: time, dtype: bool
Use the boolean series to assign a new value:
df.loc[lt_nine,'time'] = '09:00:00'
df.loc[gt_seventeen,'time'] = '17:00:00'
print(df.to_string())
>>>
time
0 09:00:00
1 09:30:00
2 17:00:00
3 15:00:00
Or just stick with strings altogether and create the boolean Series using regex patterns and .str.match.
data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00','07:22:00','22:02:06']}
dg = pd.DataFrame(data)
print(dg.to_string())
>>>
time
0 08:45:00
1 09:30:00
2 18:00:00
3 15:00:00
4 07:22:00
5 22:02:06
# regex patterns
pattern_lt_nine = '^00|01|02|03|04|05|06|07|08'
pattern_gt_seventeen = '^17|18|19|20|21|22|23'
Make boolean Series and assign new values
gt_seventeen = dg['time'].str.match(pattern_gt_seventeen)
lt_nine = dg['time'].str.match(pattern_lt_nine)
dg.loc[lt_nine,'time'] = '09:00:00'
dg.loc[gt_seventeen,'time'] = '17:00:00'
print(dg.to_string())
>>>
time
0 09:00:00
1 09:30:00
2 17:00:00
3 15:00:00
4 09:00:00
5 17:00:00
See also the pandas user guide sections "Time series / date functionality" and "Working with text data".
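Another way to stay with strings, sketched here under the assumption that every value is a zero-padded HH:MM:SS string: such strings sort in chronological order, so plain string comparison can build the boolean Series without a regex. The names too_early and too_late are illustrative, and this swaps comparison in for the .str.match patterns above.

```python
import pandas as pd

dg = pd.DataFrame({"time": ["08:45:00", "09:30:00", "18:00:00",
                            "15:00:00", "07:22:00", "22:02:06"]})

# Zero-padded HH:MM:SS strings compare in chronological order.
too_early = dg["time"] < "09:00:00"
too_late = dg["time"] > "17:00:00"

dg.loc[too_early, "time"] = "09:00:00"
dg.loc[too_late, "time"] = "17:00:00"
print(dg.to_string())
```

This would give wrong answers for values such as '9:30:00' that lack the leading zero, so it is only safe when the format is known to be consistent.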
I have data that is in this inconvenient format. Simple reproducible example below:
26/9/21 26/9/21
10:00 Paul
12:00 John
27/9/21 27/9/21
1:00 Ringo
As you can see, the dates have not been entered as a column. Instead, the dates repeat across rows as a "header" row for the rows below it. Each date then has a variable number of data rows beneath it, before the next date "header" row.
The output I would like would be:
26/9/21 10:00 Paul
26/9/21 12:00 John
27/9/21 1:00 Ringo
How can I do this in Python and Pandas?
Code for data entry below:
import pandas as pd
df = pd.DataFrame({'a': ['26/9/21', '10:00', '12:00', '27/9/21', '1:00'],
'b': ['26/9/21', 'Paul', 'John', '27/9/21', 'Ringo']})
df
Convert column a to datetime with errors='coerce', so that the time rows become NaT, then forward-fill the dates. Now you can add the time offsets taken from the non-date rows.
sra = pd.to_datetime(df['a'], format='%d/%m/%y', errors='coerce')
msk = sra.isnull()
sra = sra.ffill() + pd.to_timedelta(df.loc[msk, 'a'] + ':00')
out = pd.merge(sra[msk], df['b'], left_index=True, right_index=True)
>>> out
a b
1 2021-09-26 10:00:00 Paul
2 2021-09-26 12:00:00 John
4 2021-09-27 01:00:00 Ringo
Step by step:
>>> sra = pd.to_datetime(df['a'], format='%d/%m/%y', errors='coerce')
0 2021-09-26
1 NaT
2 NaT
3 2021-09-27
4 NaT
Name: a, dtype: datetime64[ns]
>>> msk = sra.isnull()
0 False
1 True
2 True
3 False
4 True
Name: a, dtype: bool
>>> sra = sra.ffill() + pd.to_timedelta(df.loc[msk, 'a'] + ':00')
0 NaT
1 2021-09-26 10:00:00
2 2021-09-26 12:00:00
3 NaT
4 2021-09-27 01:00:00
Name: a, dtype: datetime64[ns]
>>> out = pd.merge(sra[msk], df['b'], left_index=True, right_index=True)
a b
1 2021-09-26 10:00:00 Paul
2 2021-09-26 12:00:00 John
4 2021-09-27 01:00:00 Ringo
The following is simple, easy-to-understand code that reads the original dataframe row by row and builds a new dataframe:
df = pd.DataFrame({'a': ['26/9/21', '10:00', '12:00', '27/9/21', '1:00'],
'b': ['26/9/21', 'Paul', 'John', '27/9/21', 'Ringo']})
dflen = len(df)
newrow = []; newdata = []
for i in range(dflen):  # read each row one by one
    if '/' in df.iloc[i,0]:  # if date found
        item0 = df.iloc[i,0]  # get new date
        newrow = [item0]  # put date as first entry of new row
        continue  # go to next row
    newrow.append(df.iloc[i,0])  # add time
    newrow.append(df.iloc[i,1])  # add name
    newdata.append(newrow)  # add row to new data
    newrow = [item0]  # create new row with same date entry
newdf = pd.DataFrame(newdata, columns=['Date','Time','Name']) # create new dataframe;
print(newdf)
Output:
Date Time Name
0 26/9/21 10:00 Paul
1 26/9/21 12:00 John
2 27/9/21 1:00 Ringo
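The row-by-row loop above can probably also be written without an explicit loop. The sketch below keeps everything as strings and assumes, like the loop, that a row is a date "header" exactly when column a contains a slash; the names is_header and dates are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['26/9/21', '10:00', '12:00', '27/9/21', '1:00'],
                   'b': ['26/9/21', 'Paul', 'John', '27/9/21', 'Ringo']})

# Assumption: a row is a date "header" exactly when column a contains '/'.
is_header = df['a'].str.contains('/')

# Keep the date only on header rows, then copy it down to the rows below.
dates = df['a'].where(is_header).ffill()

# Attach the date, drop the header rows, and tidy up the column names.
newdf = (df.assign(Date=dates)[~is_header]
           .rename(columns={'a': 'Time', 'b': 'Name'})
           [['Date', 'Time', 'Name']]
           .reset_index(drop=True))
print(newdf)
```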
I'd like to add a column that is 1 if date_ is more than 12 months after buy_date, else 0.
example df
customer_id date_ buy_date
34555 2019-01-01 2017-02-01
24252 2019-01-01 2018-02-10
96477 2019-01-01 2017-02-18
output df
customer_id date_ buy_date buy_date>_than_12_months
34555 2019-01-01 2017-02-01 1
24252 2019-01-01 2018-02-10 0
96477 2019-01-01 2017-02-18 1
Based on what I understand, you can try adding a year to buy_date, subtracting the result from date_, and then checking whether the resulting number of days is positive.
df['buy_date>_than_12_months'] = ((df['date_'] -
(df['buy_date']+pd.offsets.DateOffset(years=1)))
.dt.days.gt(0).astype(int))
print(df)
customer_id date_ buy_date buy_date>_than_12_months
0 34555 2019-01-01 2017-02-01 1
1 24252 2019-01-01 2018-02-10 0
2 96477 2019-01-01 2017-02-18 1
import pandas as pd
import numpy as np
values = {'customer_id': [34555,24252,96477],
'date_': ['2019-01-01','2019-01-01','2019-01-01'],
'buy_date': ['2017-02-01','2018-02-10','2017-02-18'],
}
df = pd.DataFrame(values, columns = ['customer_id', 'date_', 'buy_date'])
df['date_'] = pd.to_datetime(df['date_'], format='%Y-%m-%d')
df['buy_date'] = pd.to_datetime(df['buy_date'], format='%Y-%m-%d')
print(df['date_'] - df['buy_date'])
# 365 days stands in for 12 months; a fixed-length 'Y' timedelta is ambiguous because calendar years vary in length
df['buy_date>_than_12_months'] = ((df['date_'] - df['buy_date']) > pd.Timedelta(days=365)).astype(int)
print (df)
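The two answers above use slightly different notions of "12 months": one adds a calendar year, the other compares against a fixed-length interval. A minimal sketch of the difference, assuming both columns are already datetimes (the leap-year edge case and the names one_year_later, calendar_flag and fixed_flag are illustrative additions, not part of the question):

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': [34555, 24252, 96477],
    'date_': pd.to_datetime(['2019-01-01', '2019-01-01', '2019-01-01']),
    'buy_date': pd.to_datetime(['2017-02-01', '2018-02-10', '2017-02-18']),
})

# Calendar rule: is date_ later than buy_date plus one calendar year?
one_year_later = df['buy_date'] + pd.DateOffset(years=1)
df['buy_date>_than_12_months'] = (df['date_'] > one_year_later).astype(int)
print(df)

# Edge case across a leap year: exactly one calendar year, but 366 days.
date_ = pd.Series(pd.to_datetime(['2017-01-01']))
buy_date = pd.Series(pd.to_datetime(['2016-01-01']))
calendar_flag = int((date_ > buy_date + pd.DateOffset(years=1)).iloc[0])
fixed_flag = int(((date_ - buy_date) > pd.Timedelta(days=365)).iloc[0])
print(calendar_flag, fixed_flag)
```

On the sample data both rules agree; they only diverge close to the one-year boundary, so the choice matters mainly when the business rule is stated in calendar terms.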
I want to convert a series of 14-digit numbers, like 20190919120426, into datetimes, like 2019-09-19 12:04:26
Here is the 'Datatime' series
0 20190919093350
1 20190919093350
2 20190919093357
3 20190919093357
4 20190919093517
5 20190919093517
import pandas as pd
for i in fl_std:
x = i['Datatime'].astype(int)
pd.to_datetime(x, format='%Y%m%d %H:%M:%S')
Convert 20190919093517 into datetime type
Change the format string to omit the space and the colons. The loop is also unnecessary, because you can pass the whole column to the function:
df['Datatime'] = pd.to_datetime(df['Datatime'], format='%Y%m%d%H%M%S')
print (df)
Datatime
0 2019-09-19 09:33:50
1 2019-09-19 09:33:50
2 2019-09-19 09:33:57
3 2019-09-19 09:33:57
4 2019-09-19 09:35:17
5 2019-09-19 09:35:17
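If the column holds integers rather than strings (the astype(int) in the question suggests it might), one cautious option is to cast to text before parsing so the format string is matched against exactly 14 characters. This is a sketch under that assumption, using a shortened copy of the sample values:

```python
import pandas as pd

# Assumption: the values arrive as integers.
df = pd.DataFrame({'Datatime': [20190919093350, 20190919093357, 20190919093517]})

# Cast to text first so every value is exactly 14 characters, then parse.
df['Datatime'] = pd.to_datetime(df['Datatime'].astype(str), format='%Y%m%d%H%M%S')
print(df)
```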
I have a pandas time-series table containing datetime objects and scores:
datetime score
2018-11-23 08:33:02 4
2018-11-24 09:43:30 2
2018-11-25 08:21:34 5
2018-11-26 19:33:01 4
2018-11-23 08:50:40 1
2018-11-23 09:03:10 3
I want to aggregate the score by hour, ignoring the date. The desired result is:
08:00:00 10
09:00:00 5
19:00:00 4
So basically I have to drop the day, month, and year, and then group the scores by hour.
I tried this command:
monthagg = df['score'].resample('H').sum().to_frame()
This works, but it takes the date into account. How can I drop the date and aggregate by hour only?
One possible solution is to use DatetimeIndex.floor to set the minutes and seconds to 0, convert the DatetimeIndex to strings with DatetimeIndex.strftime, and then aggregate with sum:
a = df['score'].groupby(df.index.floor('H').strftime('%H:%M:%S')).sum()
# if datetime is a column rather than the index:
#a = df['score'].groupby(df['datetime'].dt.floor('H').dt.strftime('%H:%M:%S')).sum()
print (a)
08:00:00 10
09:00:00 5
19:00:00 4
Name: score, dtype: int64
Or use DatetimeIndex.hour and aggregate sum:
a = df.groupby(df.index.hour)['score'].sum()
# if datetime is a column rather than the index:
#a = df.groupby(df['datetime'].dt.hour)['score'].sum()
print (a)
datetime
8 10
9 5
19 4
Name: score, dtype: int64
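For a self-contained check of the grouping idea, here is a sketch that rebuilds the question's table with datetime as an ordinary column (an assumption; the question may have it as the index) and uses a strftime pattern with a literal ":00:00" as the group key. That is a small variation on the floor-then-format approach, and the name hour_label is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.to_datetime(['2018-11-23 08:33:02', '2018-11-24 09:43:30',
                                '2018-11-25 08:21:34', '2018-11-26 19:33:01',
                                '2018-11-23 08:50:40', '2018-11-23 09:03:10']),
    'score': [4, 2, 5, 4, 1, 3],
})

# Label every row by its hour only; the literal ":00:00" pads the label.
hour_label = df['datetime'].dt.strftime('%H:00:00')

# Sum the scores of all rows sharing the same hour, whatever their date.
a = df.groupby(hour_label)['score'].sum()
print(a)
```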
Setup to generate a frame with datetime objects:
import datetime
import pandas as pd
rows = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(100)]
df = pd.DataFrame(rows,columns = ["date"])
You can now add an hour column like this, and then group by it:
df["hour"] = df["date"].dt.hour
df.groupby("hour").size()  # rows per hour; this demo frame has no numeric column to sum
import pandas as pd
df = pd.DataFrame({'datetime': ['2018-11-23 08:33:02', '2018-11-24 09:43:30',
                                '2018-11-25 08:21:34', '2018-11-26 19:33:01',
                                '2018-11-23 08:50:40', '2018-11-23 09:03:10'],
                   'score': [4, 2, 5, 4, 1, 3]})
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')
df["hour"] = df["datetime"].dt.hour
# Sum only the score column; the datetime column cannot be summed.
print(df.groupby("hour")["score"].sum())
Output:
8 10
9 5
19 4