I have a pandas dataframe in which a column is in this format:
0 1983-02-07
1 1989-10-07
2 1969-10-28
3 1967-02-25
4 1982-07-21
...
328970 1995-06-09
328971 1999-01-11
328972 1962-04-01
328973 1996-05-19
328974 1994-03-03
Name: Nascita - Data, Length: 328975, dtype: object
What I'd like to do is something like:
df['datecolumn']-datetime.now()
Something like this should work:
import pandas as pd
from datetime import datetime
data = ["1983-02-07",
"1989-10-07",
"1969-10-28",
"1967-02-25",
"1982-07-21"]
df = pd.DataFrame(data, columns = ["Date"])
print(df)
df["Date"] = pd.to_datetime(df['Date'])
#df["Difference"] = df["Date"].apply(lambda x: x-datetime.now())
# Alternate code
from dateutil.relativedelta import relativedelta
df["Difference"] = df["Date"].apply(lambda x: relativedelta(datetime.now(), x).years)
print(df)
Output (using the commented-out timedelta line):
Date
0 1983-02-07
1 1989-10-07
2 1969-10-28
3 1967-02-25
4 1982-07-21
Date Difference
0 1983-02-07 -13409 days +06:41:00.418879
1 1989-10-07 -10975 days +06:41:00.418728
2 1969-10-28 -18259 days +06:41:00.418671
3 1967-02-25 -19235 days +06:41:00.418630
4 1982-07-21 -13610 days +06:41:00.418591
Output of the alternate code:
Date
0 1983-02-07
1 1989-10-07
2 1969-10-28
3 1967-02-25
4 1982-07-21
Date Difference
0 1983-02-07 36
1 1989-10-07 30
2 1969-10-28 49
3 1967-02-25 52
4 1982-07-21 37
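The same numbers can also be obtained without apply, because pandas can subtract a whole datetime column from a single timestamp at once. This is only a sketch: the fixed reference date 2019-10-25 and the column names Days and Age are assumptions made here so the result is reproducible; swap in pd.Timestamp.now() for a live value.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["1983-02-07", "1989-10-07", "1969-10-28",
                            "1967-02-25", "1982-07-21"]})
df["Date"] = pd.to_datetime(df["Date"])

# Assumed fixed reference date, used only to keep the output reproducible.
now = pd.Timestamp("2019-10-25")

# Vectorized subtraction: one Timestamp minus the whole column.
df["Days"] = (now - df["Date"]).dt.days

# Whole years: subtract one where the anniversary has not yet occurred this year.
not_yet = (df["Date"].dt.month > now.month) | (
    (df["Date"].dt.month == now.month) & (df["Date"].dt.day > now.day)
)
df["Age"] = now.year - df["Date"].dt.year - not_yet.astype(int)
print(df)
```

With this reference date the Age column matches the values produced by the relativedelta version above.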
I have a dataframe with a column of time values stored as objects that I need to convert to datetime. The issue is that they are not all in the same format, so when I try:
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M:%S')
it gives me an error
ValueError: time data '3:22' does not match format '%H:%M:%S' (match)
or, if I use this code
df['Total call time'] = pd.to_datetime(df['Total call time'], format='%H:%M')
I get this error
ValueError: unconverted data remains: :58
These are the values on my data
Total call time
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
**45:48**
1:41:40
5:08:37
**3:22**
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58
times = """\
2:04:07
3:22:41
2:30:41
2:19:06
1:45:55
1:30:08
1:32:15
1:43:28
45:48
1:41:40
5:08:37
3:22
4:29:05
2:47:25
2:39:29
2:29:32
2:09:52
3:31:57
2:27:58
2:34:28
3:14:10
2:12:10
2:46:58""".split()
import pandas as pd
df = pd.DataFrame(times, columns=['elapsed'])
def pad(s):
    # 'M:SS' -> '00:0M:SS'
    if len(s) == 4:
        return '00:0'+s
    # 'MM:SS' -> '00:MM:SS'
    elif len(s) == 5:
        return '00:'+s
    return s
print(pd.to_timedelta(df['elapsed'].apply(pad)))
Output:
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 00:03:22
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58
Name: elapsed, dtype: timedelta64[ns]
Alternatively to grovina's answer ... instead of using apply you can directly use the dt accessor.
Here's a sample:
>>> data = [['2017-12-01'], ['2017-12-30'], ['2018-01-01']]
>>> df = pd.DataFrame(data=data, columns=['date'])
>>> df
date
0 2017-12-01
1 2017-12-30
2 2018-01-01
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: object
Note how df.date is an object? Let's turn it into a date like you want
>>> df.date = pd.to_datetime(df.date)
>>> df.date
0 2017-12-01
1 2017-12-30
2 2018-01-01
Name: date, dtype: datetime64[ns]
The format you want is for string formatting. I don't think you'll be able to convert the actual datetime64 to look like that format. For now, let's make a newly formatted string version of your date in a separate column
>>> df['new_formatted_date'] = df.date.dt.strftime('%d/%m/%y %H:%M')
>>> df.new_formatted_date
0 01/12/17 00:00
1 30/12/17 00:00
2 01/01/18 00:00
Name: new_formatted_date, dtype: object
Finally, since the df.date column is now of dtype datetime64, you can use the dt accessor directly on it. No need to use apply.
>>> df['month'] = df.date.dt.month
>>> df['day'] = df.date.dt.day
>>> df['year'] = df.date.dt.year
>>> df['hour'] = df.date.dt.hour
>>> df['minute'] = df.date.dt.minute
>>> df
date new_formatted_date month day year hour minute
0 2017-12-01 01/12/17 00:00 12 1 2017 0 0
1 2017-12-30 30/12/17 00:00 12 30 2017 0 0
2 2018-01-01 01/01/18 00:00 1 1 2018 0 0
Another idea is to test whether the value contains two colons and, if it does not, pad it before converting with to_timedelta. The number before the first : decides how: if it is 23 or less, the value is parsed as HH:MM (append :00); if it is greater, it is parsed as MM:SS (prepend 00:):
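import numpy as np

# m1: value does not have the full H:MM:SS shape (fewer than two colons)
m1 = df['Total call time'].str.count(':').ne(2)
# m2: the leading number is too large to be an hour, so treat it as minutes
m2 = df['Total call time'].str.extract(r'^(\d+):', expand=False).astype(float).gt(23)
s = np.select([m1 & m2, m1 & ~m2],
              ['00:' + df['Total call time'], df['Total call time'] + ':00'],
              df['Total call time'])
df['Total call time'] = pd.to_timedelta(s)
print(df)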
Total call time
0 0 days 02:04:07
1 0 days 03:22:41
2 0 days 02:30:41
3 0 days 02:19:06
4 0 days 01:45:55
5 0 days 01:30:08
6 0 days 01:32:15
7 0 days 01:43:28
8 0 days 00:45:48
9 0 days 01:41:40
10 0 days 05:08:37
11 0 days 03:22:00
12 0 days 04:29:05
13 0 days 02:47:25
14 0 days 02:39:29
15 0 days 02:29:32
16 0 days 02:09:52
17 0 days 03:31:57
18 0 days 02:27:58
19 0 days 02:34:28
20 0 days 03:14:10
21 0 days 02:12:10
22 0 days 02:46:58
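To make the rule explicit, it can also be written as a plain Python function. This is only a sketch of the same idea, not a replacement for the vectorized version: the helper name normalize is hypothetical, and the three sample values are taken from the question.

```python
import pandas as pd


def normalize(value: str) -> str:
    """Pad a time string to H:MM:SS using the rule described above."""
    parts = value.split(":")
    if len(parts) == 3:
        return value              # already H:MM:SS
    if int(parts[0]) > 23:
        return "00:" + value      # too large for an hour: read as MM:SS
    return value + ":00"          # otherwise read as HH:MM


calls = pd.Series(["2:04:07", "45:48", "3:22"])
result = pd.to_timedelta(calls.map(normalize))
print(result)
```

Note that a two-part value such as 3:22 is inherently ambiguous; this rule reads it as 3 hours 22 minutes, whereas the length-based pad function in the earlier answer reads it as 3 minutes 22 seconds.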
I am currently reading in a data frame whose timestamps come from film in the form 00(days):00(hours, rolling over into days at 24):00(min):00(sec).
pandas reads the time formats HH:MM:SS and YYYY:MM:DD HH:MM:SS fine.
Is there a way to have pandas read a duration in the form DD:HH:MM:SS?
Alternatively, using timedelta, how would I convert the DD part into hours in the data frame so that pandas can make it "1 day HH:MM:SS", for example?
Data sample
00:00:00:00
00:07:33:57
02:07:02:13
00:00:13:11
00:00:10:11
00:00:00:00
00:06:20:06
01:12:13:25
Expected output for last sample
36:13:25
Thanks
If you want timedelta objects, a simple way is to replace the first colon with 'days ':
df['timedelta'] = pd.to_timedelta(df['col'].str.replace(':', 'days ', n=1))
output:
col timedelta
0 00:00:00:00 0 days 00:00:00
1 00:07:33:57 0 days 07:33:57
2 02:07:02:13 2 days 07:02:13
3 00:00:13:11 0 days 00:13:11
4 00:00:10:11 0 days 00:10:11
5 00:00:00:00 0 days 00:00:00
6 00:06:20:06 0 days 06:20:06
7 01:12:13:25 1 days 12:13:25
>>> df.dtypes
col object
timedelta timedelta64[ns]
dtype: object
From there it's also relatively easy to combine the days and hours as string:
c = df['timedelta'].dt.components
df['str_format'] = ((c['hours']+c['days']*24).astype(str)
+df['col'].str.split('(?=:)', n=2).str[-1]).str.zfill(8)
output:
col timedelta str_format
0 00:00:00:00 0 days 00:00:00 00:00:00
1 00:07:33:57 0 days 07:33:57 07:33:57
2 02:07:02:13 2 days 07:02:13 55:02:13
3 00:00:13:11 0 days 00:13:11 00:13:11
4 00:00:10:11 0 days 00:10:11 00:10:11
5 00:00:00:00 0 days 00:00:00 00:00:00
6 00:06:20:06 0 days 06:20:06 06:20:06
7 01:12:13:25 1 days 12:13:25 36:13:25
Convert the days separately, add them to the times, and finally apply a custom formatting function:
def f(x):
    # Format a Timedelta as H:MM:SS, letting hours exceed 24
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))
d = pd.to_timedelta(df['col'].str[:2].astype(int), unit='d')
td = pd.to_timedelta(df['col'].str[3:])
df['col'] = d.add(td).apply(f)
print (df)
col
0 0:00:00
1 7:33:57
2 55:02:13
3 0:13:11
4 0:10:11
5 0:00:00
6 6:20:06
7 36:13:25
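If only the HH:MM:SS string is needed, the days can also be folded into the hours with integer arithmetic, without going through timedelta at all. The sketch below assumes every value has exactly four colon-separated numeric fields; the output column name hms is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"col": ["00:07:33:57", "02:07:02:13", "01:12:13:25"]})

# Split DD:HH:MM:SS into four integer columns.
parts = df["col"].str.split(":", expand=True).astype(int)
parts.columns = ["days", "hours", "minutes", "seconds"]

# Fold the days into the hour count, then rebuild the string.
total_hours = parts["days"] * 24 + parts["hours"]
df["hms"] = (
    total_hours.astype(str).str.zfill(2)
    + ":" + parts["minutes"].astype(str).str.zfill(2)
    + ":" + parts["seconds"].astype(str).str.zfill(2)
)
print(df)
```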
I have a csv file containing 2 columns: id, val
where id is the number of the day (total 365)
Is it possible to convert the number id to dates in format '%d-%m-%Y'?
In fact I want to add all the days of year 2015 e.g. 01-01-2015 etc.
How can I do this with pandas in Python?
following is a part of the file and the desired output
"id" "val"
1 49
2 48
3 46
4 45
"date" "val"
01-01-2015 49
02-01-2015 48
03-01-2015 46
04-01-2015 45
Use pd.tseries.offsets.Day:
df['date'] = pd.Timestamp('2015-01-01') \
+ df['id'].sub(1).apply(pd.tseries.offsets.Day)
Alternative, proposed by @HenryEcker:
df['date'] = pd.Timestamp('2015-01-01') \
- pd.Timedelta(days=1) \
+ df['id'].apply(pd.tseries.offsets.Day)
>>> df['id'].sub(1).apply(pd.tseries.offsets.Day)
0 <0 * Days>
1 <Day>
2 <2 * Days>
3 <3 * Days>
Name: id, dtype: object
>>> df
id val date
0 1 49 2015-01-01
1 2 48 2015-01-02
2 3 46 2015-01-03
3 4 45 2015-01-04
You can convert id to datetime and format the output with strftime:
df['Date'] = pd.to_datetime(df['id'].astype(str)+"-2015", format='%j-%Y').dt.strftime('%d-%m-%Y')
Result:
id val Date
0 1 49 01-01-2015
1 2 48 02-01-2015
2 3 46 03-01-2015
3 4 45 04-01-2015
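df.columns = ['date', 'val']
dates = []
for contents in df['date']:
    info = str(contents)
    if contents < 10:
        info = '0' + info  # zero-pad single-digit day numbers
    dates.append(info + "-01-2015")  # dd-mm-YYYY; only valid for January (id 1 to 31)
df['date'] = dates
This iterates through the column and builds the date string for each id. Note that it only covers the days of January, since it never rolls over into the next month.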
Or like this:
df['Date'] = pd.Timestamp('2014-12-31') + df['id'].apply(lambda x: pd.Timedelta(days=x))
Output:
id val Date
0 1 49 2015-01-01
1 2 48 2015-01-02
2 3 46 2015-01-03
3 4 45 2015-01-04
You can use pd.to_timedelta() on id column to turn its values into date offsets for adding to the base date, as follows:
df['date'] = pd.Timestamp('2015-01-01') + pd.to_timedelta(df['id'] -1, unit='day')
Result:
print(df)
id val date
0 1 49 2015-01-01
1 2 48 2015-01-02
2 3 46 2015-01-03
3 4 45 2015-01-04
If you want the date in dd-mm-YYYY format, you can chain .dt.strftime(), as follows:
df['date2'] = (pd.Timestamp('2015-01-01') + pd.to_timedelta(df['id'] -1, unit='day')).dt.strftime('%d-%m-%Y')
Result:
print(df)
id val date date2
0 1 49 2015-01-01 01-01-2015
1 2 48 2015-01-02 02-01-2015
2 3 46 2015-01-03 03-01-2015
3 4 45 2015-01-04 04-01-2015
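Since the question mentions wanting all the days of 2015, another way to picture the mapping is as a lookup into a calendar built with pd.date_range. This is a sketch under the assumption that every id lies between 1 and 365; the sample ids below (including 59, 60 and 365) are chosen here to show the month boundaries and are not from the original file.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 59, 60, 365]})

# All 365 days of 2015, in order; position 0 is 1 January.
calendar_2015 = pd.date_range("2015-01-01", "2015-12-31", freq="D")

# Day number n lives at position n - 1.
positions = (df["id"] - 1).to_numpy()
df["date"] = calendar_2015[positions].strftime("%d-%m-%Y")
print(df)
```

An id outside the range 1 to 365 raises an IndexError here, which can be a useful sanity check on the input.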
I'm not sure about the year, since a day count alone doesn't say which year to choose, but you can still convert it into a month and day.
Rename the id column of your CSV to Date, then run:
df['Date'] = pd.to_datetime(df['Date'], format='%j').dt.strftime('%m-%d')
This turns it into a month-day string. You can then add the year manually.
I would like to get the number of days before the end of the month, from a string column representing a date.
I have the following pandas dataframe :
df = pd.DataFrame({'date':['2019-11-22','2019-11-08','2019-11-30']})
df
date
0 2019-11-22
1 2019-11-08
2 2019-11-30
I would like the following output :
df
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0
The offset pd.tseries.offsets.MonthEnd with rollforward seemed a good pick, but I can't figure out how to use it to transform a whole column.
Subtract the day of the month (Series.dt.day) from the number of days in the month (Series.dt.daysinmonth):
df['date'] = pd.to_datetime(df['date'])
df['days_end_month'] = df['date'].dt.daysinmonth - df['date'].dt.day
Or add offsets.MonthEnd(0), subtract the original dates, and convert the resulting timedeltas to days with Series.dt.days:
df['days_end_month'] = (df['date'] + pd.offsets.MonthEnd(0) - df['date']).dt.days
print (df)
date days_end_month
0 2019-11-22 8
1 2019-11-08 22
2 2019-11-30 0
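As a quick check that the two approaches agree, here is a small self-contained sketch. The extra row 2020-02-10 is added here (it is not in the question) to cover a leap-year February, and the variable names by_attr and by_offset are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(
    ["2019-11-22", "2019-11-08", "2019-11-30", "2020-02-10"])})

# Approach 1: days in the month minus the day of the month.
by_attr = df["date"].dt.days_in_month - df["date"].dt.day

# Approach 2: roll forward to the month end (a date already on the
# month end stays put with MonthEnd(0)) and take the difference in days.
by_offset = (df["date"] + pd.offsets.MonthEnd(0) - df["date"]).dt.days

print(by_attr.tolist())
print(by_offset.tolist())
```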
My dataset has dates in the European format (dd/mm/yyyy), and I'm struggling to get them parsed correctly by pd.to_datetime: for every date whose day is 12 or lower, the month and day get swapped.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add the format argument:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
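A small sketch of what the explicit format buys you, using sample strings in the question's dd/mm/yyyy layout. The variable names raw, parsed and rejected are made up for the example. With format given, the day and month can no longer be swapped, and a value in a different layout raises an error rather than being reinterpreted.

```python
import pandas as pd

raw = pd.Series(["31/03/2018", "01/04/2018", "28/02/2018"])

# Day first, then month, then four-digit year: no guessing involved.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed)

# A string in another layout does not match the format and is rejected.
try:
    pd.to_datetime(pd.Series(["2018-03-31"]), format="%d/%m/%Y")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```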
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30