Change date format in pandas dataframe - python

I have this dataframe:
date value
1 Thu 17th Nov 2016 385.943800
2 Fri 18th Nov 2016 1074.160340
3 Sat 19th Nov 2016 2980.857860
4 Sun 20th Nov 2016 1919.723960
5 Mon 21st Nov 2016 884.279340
6 Tue 22nd Nov 2016 869.071070
7 Wed 23rd Nov 2016 760.289260
8 Thu 24th Nov 2016 2481.689270
9 Fri 25th Nov 2016 2745.990070
10 Sat 26th Nov 2016 2273.413250
11 Sun 27th Nov 2016 2630.414900
12 Mon 28th Nov 2016 817.322310
13 Tue 29th Nov 2016 1766.876030
14 Wed 30th Nov 2016 469.388420
I would like to change the format of the date column to YYYY-MM-DD. The dataframe has more than 200 rows, and new rows will be added every day, so I need a way to do this automatically.
This link is not helping because it sets the dates by hand, like this: dates = ['30th November 2009', '31st March 2010', '30th September 2010'], and I can't do that for every row. Does anyone know a way to solve this?

Dateutil will do this job.
from dateutil import parser
print(df)
df2 = df.copy()
# parser.parse copes with ordinal suffixes such as "17th" and returns a datetime
df2['date'] = df2['date'].apply(parser.parse)
df2
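If dateutil is not available, the same conversion can be sketched with the standard library alone. This is only a minimal sketch: the helper name to_iso is hypothetical, and it assumes every value looks like "Thu 17th Nov 2016" with English day and month abbreviations.

```python
import re
from datetime import datetime

# Matches an ordinal suffix that directly follows a day number, e.g. "17th".
ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")

def to_iso(text):
    """Convert 'Thu 17th Nov 2016' to '2016-11-17' (hypothetical helper)."""
    cleaned = ORDINAL.sub("", text)  # 'Thu 17 Nov 2016'
    return datetime.strptime(cleaned, "%a %d %b %Y").strftime("%Y-%m-%d")

dates = ["Thu 17th Nov 2016", "Mon 21st Nov 2016",
         "Tue 22nd Nov 2016", "Wed 23rd Nov 2016"]
iso_dates = [to_iso(d) for d in dates]
print(iso_dates)
```

On a dataframe, a function like this could presumably be applied to the whole column with df['date'].map(to_iso), so newly added rows are handled the same way.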

Related

How to split one row into multiple and apply datetime on dataframe column?

I have one dataframe which looks like below:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 21 Dec 2017 18 Dec 2017 21 Dec 2017
4 22 Dec 2017 22 Dec 2017
Conditions to be checked:
Check whether any row contains two dates, as the row with index 3 does. If so, split it into two separate rows.
Convert both columns to datetime.
I tried the following:
df['Date_1'] = pd.to_datetime(df['Date_1'], format='%d %b %Y')
But I get the error below:
ValueError: unconverted data remains:
Expected Output:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 18 Dec 2017
4 21 Dec 2017 21 Dec 2017
5 22 Dec 2017 22 Dec 2017
After extracting the dates with a regex via str.findall, your problem becomes an unnesting problem (unnesting below is a separately defined helper function, not part of pandas):
s=df.apply(lambda x : x.str.findall(r'((?:\d{,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{,4})'))
unnesting(s,['Date_1','Date_2']).apply(pd.to_datetime)
Out[82]:
Date_1 Date_2
0 2017-12-05 2017-12-05
1 2017-12-14 2017-12-14
2 2017-12-15 2017-12-15
3 2017-12-18 2017-12-18
3 2017-12-21 2017-12-21
4 2017-12-22 2017-12-22
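Since unnesting is not built into pandas, here is a hedged sketch of the same idea using DataFrame.explode instead. It assumes pandas 1.3 or later (needed to explode several columns at once), that both columns hold the same number of dates in each row, and that every date looks like "5 Dec 2017"; the simplified pattern is an assumption, not the regex from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Date_1": ["5 Dec 2017", "18 Dec 2017 21 Dec 2017", "22 Dec 2017"],
    "Date_2": ["5 Dec 2017", "18 Dec 2017 21 Dec 2017", "22 Dec 2017"],
})

# Simplified pattern (assumption): day, three-letter month, four-digit year.
pattern = r"\d{1,2} [A-Za-z]{3} \d{4}"

# Each cell becomes a list of the dates found in it.
lists = df.apply(lambda col: col.str.findall(pattern))

# One row per list element; both columns are exploded in lockstep.
out = lists.explode(["Date_1", "Date_2"], ignore_index=True)

# Convert every column to datetime64.
out = out.apply(pd.to_datetime, format="%d %b %Y")
print(out)
```

Passing ignore_index=True renumbers the rows 0 to 3, which matches the expected output in the question rather than the repeated index 3 shown above.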

How to generate even calendar dates?

Does anyone know how to generate a calendar list in Python (or on some other platform) of the "even days", with month and year, from 2018 until 2021?
Example:
Sun, 02 Jan 2019
Tue, 04 Jan 2019
Thur, 06 Jan 2019
Sat, 08 Jan 2019
Sun, 10 Jan 2019
Tue, 12 Jan 2019
Thur, 14 Jan 2019
Sat, 16 Jan 2019
Sun, 18 Jan 2019
Tue, 20 Jan 2019
Thur, 22 Jan 2019
and so on, respecting the calendar until 2021.
EDIT:
How can I generate, in Python, a calendar list between 2018 and 2022 in two formats:
Day of the week, Date Month Year Time (hours:minutes:seconds) - Year-Month-Date Time (hours:minutes:seconds)
Note:
Dates: even dates only
Times: randomly generated
Example:
Tue, 02 Jan 2018 00:59:23 - 2018-01-02 00:59:23
Thu, 04 Jan 2018 10:24:52 - 2018-01-04 10:24:52
Sat, 06 Jan 2018 04:11:09 - 2018-01-06 04:11:09
Mon, 08 Jan 2018 16:12:40 - 2018-01-08 16:12:40
Wed, 10 Jan 2018 10:08:15 - 2018-01-10 10:08:15
Fri, 12 Jan 2018 07:10:09 - 2018-01-12 07:10:09
Sun, 14 Jan 2018 11:50:10 - 2018-01-14 11:50:10
Tue, 16 Jan 2018 02:29:22 - 2018-01-16 02:29:22
Thu, 18 Jan 2018 19:07:20 - 2018-01-18 19:07:20
Sat, 20 Jan 2018 08:50:13 - 2018-01-20 08:50:13
Mon, 22 Jan 2018 02:40:02 - 2018-01-22 02:40:02
and so on, until the year 2022 ...
Here's something fairly simple that seems to work and handles leap years:
from calendar import isleap
from datetime import date
# Days in each month (1-12).
MDAYS = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
def dim(year, month):
    """ Number of days in month of the given year. """
    return MDAYS[month] + ((month == 2) and isleap(year))
start_year, end_year = 2018, 2021
for year in range(start_year, end_year+1):
    for month in range(1, 12+1):
        days = dim(year, month)
        for day in range(1, days+1):
            if day % 2 == 0:
                dt = date(year, month, day)
                print(dt.strftime('%a, %d %b %Y'))
Output:
Tue, 02 Jan 2018
Thu, 04 Jan 2018
Sat, 06 Jan 2018
Mon, 08 Jan 2018
Wed, 10 Jan 2018
Fri, 12 Jan 2018
Sun, 14 Jan 2018
Tue, 16 Jan 2018
...
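The hand-written MDAYS table can also be replaced by the standard library's calendar.monthrange, which returns the weekday of the first day and the number of days in the month. A small sketch follows; the function name days_in_month is hypothetical.

```python
from calendar import monthrange

def days_in_month(year, month):
    """Number of days in the given month, leap years included."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    return monthrange(year, month)[1]

feb_2019 = days_in_month(2019, 2)
feb_2020 = days_in_month(2020, 2)  # 2020 is a leap year
print(feb_2019, feb_2020)
```

This drops the need for the isleap special case, at the cost of one library call per month.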
Edit:
Here's a way to do what (I think) you asked how to do in your follow-on question:
from calendar import isleap
from datetime import date, datetime, time
from random import randrange
# Days in each month (1-12).
MDAYS = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
def dim(year, month):
    """ Number of days in month of the given year. """
    return MDAYS[month] + ((month == 2) and isleap(year))
def whenever():
    """ Gets the time value. """
    # Currently just returns a randomly selected time of day.
    return time(*map(randrange, (24, 60, 60)))  # hour:minute:second
start_year, end_year = 2018, 2021
for year in range(start_year, end_year+1):
    for month in range(1, 12+1):
        days = dim(year, month)
        for day in range(1, days+1):
            if day % 2 == 0:
                dt, when = date(year, month, day), whenever()
                dttm = datetime.combine(dt, when)
                print(dt.strftime('%a, %d %b %Y'), when, '-', dttm)
Output:
Tue, 02 Jan 2018 00:54:02 - 2018-01-02 00:54:02
Thu, 04 Jan 2018 10:19:51 - 2018-01-04 10:19:51
Sat, 06 Jan 2018 22:48:09 - 2018-01-06 22:48:09
Mon, 08 Jan 2018 06:48:46 - 2018-01-08 06:48:46
Wed, 10 Jan 2018 14:01:54 - 2018-01-10 14:01:54
Fri, 12 Jan 2018 05:42:43 - 2018-01-12 05:42:43
Sun, 14 Jan 2018 21:42:37 - 2018-01-14 21:42:37
Tue, 16 Jan 2018 08:08:39 - 2018-01-16 08:08:39
...
What about:
import datetime
d = datetime.date.today()  # Define start date
while d.year <= 2021:  # This will go *through* 2021
    if d.day % 2 == 0:  # Print if even date
        print(d.strftime('%a, %d %b %Y'))
    d += datetime.timedelta(days=1)  # Jump forward a day
Output:
Fri, 02 Nov 2018
Sun, 04 Nov 2018
Tue, 06 Nov 2018
Thu, 08 Nov 2018
Sat, 10 Nov 2018
Mon, 12 Nov 2018
Wed, 14 Nov 2018
Fri, 16 Nov 2018
Sun, 18 Nov 2018
Tue, 20 Nov 2018
Thu, 22 Nov 2018
...
Fri, 24 Dec 2021
Sun, 26 Dec 2021
Tue, 28 Dec 2021
Thu, 30 Dec 2021
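The loop above starts from today's date, so its output depends on when it is run. A variant with fixed start and end dates might look like the sketch below; the generator name even_days is hypothetical, and the %a and %b names assume an English locale.

```python
from datetime import date, timedelta

def even_days(start, end):
    """Yield every date from start to end (inclusive) whose day of month is even."""
    current = start
    while current <= end:
        if current.day % 2 == 0:
            yield current
        current += timedelta(days=1)

lines = [d.strftime("%a, %d %b %Y")
         for d in even_days(date(2018, 1, 1), date(2021, 12, 31))]
print(lines[:3])
print(lines[-1])
print(len(lines))
```

Stepping one day at a time with timedelta lets the date type handle month lengths and leap years, so no table of days per month is needed.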

Creating repeating values in a pandas Dataframe

I have 3 lists -
Name = ["ABC", "DEF", "GHI"]
Year = [2016,2017]
Month = ["Aug","Jul","Jun"]
I want to create a dataframe from these lists as follows -
df -
Name Year Month
ABC 2016 Aug
ABC 2016 Jul
ABC 2016 Jun
ABC 2017 Aug
ABC 2017 Jul
ABC 2017 Jun
DEF 2016 Aug
DEF 2016 Jul
DEF 2016 Jun
DEF 2017 Aug
DEF 2017 Jul
DEF 2017 Jun
..... and so on
for all values in the lists. Is there any method in python(pandas or numpy or scipy) to perform this? Or is looping the only way to perform this?
Use itertools.product:
import itertools
import pandas as pd
pd.DataFrame(list(itertools.product(Name, Year, Month)),
             columns=['Name', 'Year', 'Month'])
Name Year Month
0 ABC 2016 Aug
1 ABC 2016 Jul
2 ABC 2016 Jun
3 ABC 2017 Aug
4 ABC 2017 Jul
5 ABC 2017 Jun
6 DEF 2016 Aug
7 DEF 2016 Jul
8 DEF 2016 Jun
9 DEF 2017 Aug
10 DEF 2017 Jul
11 DEF 2017 Jun
12 GHI 2016 Aug
13 GHI 2016 Jul
14 GHI 2016 Jun
15 GHI 2017 Aug
16 GHI 2017 Jul
17 GHI 2017 Jun
If you want a fast numpy cartesian product, I'd suggest looking at
Numpy: cartesian product of x and y array points into single array of 2D points
Substituting product for a numpy alternative should be simple. All that's left to do is to call the pd.DataFrame constructor.
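As another option that stays inside pandas (this is neither the itertools approach nor the numpy approach mentioned above), MultiIndex.from_product builds the same cartesian product, and to_frame turns it into a dataframe. A minimal sketch using the lists from the question:

```python
import pandas as pd

Name = ["ABC", "DEF", "GHI"]
Year = [2016, 2017]
Month = ["Aug", "Jul", "Jun"]

# Build every (Name, Year, Month) combination, then flatten it into columns.
index = pd.MultiIndex.from_product([Name, Year, Month],
                                   names=["Name", "Year", "Month"])
df = index.to_frame(index=False)
print(df)
```

The rows come out in the same order as with itertools.product, with the last list varying fastest.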

comparing date using dd/mm/yy format in python

I want to write a program where I can compare the current date with a couple of dates that I have.
My data is:
12 JUN 2016
21 MAR 1989
15 MAR 1958
15 SEP 1958
23 OCT 1930
15 SEP 1928
10 MAR 2010
23 JAN 1928
15 NOV 1925
26 AUG 2009
29 APR 1987
20 JUL 1962
10 MAY 1960
13 FEB 1955
10 MAR 1956
3 MAR 2010
14 NOV 1958
4 AUG 1985
24 AUG 1956
15 FEB 1955
19 MAY 1987
30 APR 1990
8 SEP 2014
18 JAN 2012
14 DEC 1960
1 AUG 1998
7 SEP 1963
9 MAR 2012
1 MAY 1990
14 MAY 1985
15 JUN 1945
5 APR 1995
26 FEB 1987
13 DEC 1983
15 AUG 2009
16 SEP 1980
16 JAN 2005
19 JUN 2011
Now, how can I compare these with the current date to check that a date does not exceed the current date (i.e. 13/JUN/2016)?
Please help me! Thank you.
You have to create a datetime object from the string data. You can create the object by parsing the date string with the strptime method.
from datetime import datetime
mydate = datetime.strptime("19 JUN 2011", "%d %b %Y")
Then compare the object with today's date.
print(mydate < datetime.today())
True
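To check a whole list rather than a single string, the same parsing can be applied in a loop. The sketch below uses a few dates from the question plus one made-up future date ("1 JAN 2030", purely illustrative), and a fixed reference date so the result is reproducible; in practice datetime.today() would take its place.

```python
from datetime import datetime

# The last entry is a hypothetical future date, added only for illustration.
raw_dates = ["12 JUN 2016", "21 MAR 1989", "3 MAR 2010", "8 SEP 2014", "1 JAN 2030"]

# Fixed reference date (the "current date" given in the question).
reference = datetime(2016, 6, 13)

parsed = [datetime.strptime(s, "%d %b %Y") for s in raw_dates]

# Keep the original strings whose date does not exceed the reference date.
not_exceeding = [s for s, d in zip(raw_dates, parsed) if d <= reference]
exceeding = [s for s, d in zip(raw_dates, parsed) if d > reference]
print(not_exceeding)
print(exceeding)
```

strptime matches month abbreviations case-insensitively, so "JUN" parses with %b, assuming an English locale.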

calculating the mean of a list of timestamps ignoring weekend days in python

I have a list of timestamps and I want to calculate the mean of the list, but I need to ignore the weekend days which are Saturday and Sunday and consider Friday and Monday as one day. I only want to include the working days from Monday to Friday. This is an example of the list. I wrote the timestamps in readable format to follow the process easily.
Example:
['Wed Feb 17 12:57:40 2011', 'Wed Feb 8 12:57:40 2011', 'Tue Jan 25 17:15:35 2011']
MIN = 'Tue Jan 25 17:15:35 2011'
'Wed Feb 17 12:57:40 2011': since there are 6 weekend days between this timestamp and the MIN, I shift it back 6 days. It becomes 'Fri Feb 11 12:57:40 2011'.
'Wed Feb 8 12:57:40 2011': since there are 4 weekend days between this timestamp and the MIN, I shift it back 4 days. It becomes 'Wed Feb 4 12:57:40 2011'.
The new list is now ['Fri Feb 11 12:57:40 2011', 'Wed Feb 4 12:57:40 2011', 'Tue Jan 25 17:15:35 2011']
MAX= 'Fri Feb 11 12:57:40 2011'
average= (Fri Feb 11 12:57:40 2011 + Wed Feb 4 12:57:40 2011 + Tue Jan 25 17:15:35 2011) /3
difference= MAX - average
Edit: [Removed previous code, which had an error; replaced with code below.]
Here is some output from code that squeezes out weekends, computes average, and puts weekends back in to get an apparently valid average. The code is shown after the output from some test cases.
['Fri Jan 13 12:00:00 2012', 'Mon Jan 16 11:00:00 2012']
Average = Fri Jan 13 23:30:00 2012
['Fri Jan 13 12:00:00 2012', 'Mon Jan 16 13:00:00 2012']
Average = Mon Jan 16 00:30:00 2012
['Fri Jan 13 14:17:58 2012', 'Sat Jan 14 1:2:3 2012', 'Sun Jan 15 4:5:6 2012', 'Mon Jan 16 11:03:29 2012', 'Wed Jan 18 14:27:17 2012', 'Mon Jan 23 10:02:12 2012', 'Mon Jan 30 10:02:12 2012']
Average = Thu Jan 19 16:46:37 2012
['Fri Jan 14 14:17:58 2011', 'Mon Jan 17 11:03:29 2011', 'Wed Jan 19 14:27:17 2011', 'Mon Jan 24 10:02:12 2011']
Average = Wed Jan 19 00:27:44 2011
Python code:
from time import strptime, mktime, localtime, asctime
from math import floor
def averageBusinessDay(dates):
    f = [mktime(strptime(x)) for x in dates]
    h = [x for x in f if localtime(x).tm_wday < 5]  # Get rid of weekend days
    bweek, cweek, dweek = 3600*24*5, 3600*24*7, 3600*24*2
    e = localtime(h[0])  # Get struct_time for first item
    # fm is first Monday in local time
    fm = mktime((e.tm_year, e.tm_mon, e.tm_mday-e.tm_wday, 0, 0, 0, 0, 0, 0))
    i = [x-fm for x in h]  # Subtract leading Monday
    j = [x-floor(x/cweek)*dweek for x in i]  # Squeeze out weekends
    avx = sum(j)/len(j)
    avt = asctime(localtime(avx+floor(avx/bweek)*dweek+fm))
    return avt
def atest(dates):
    print(dates)
    print('Average =', averageBusinessDay(dates))
atest(['Fri Jan 13 12:00:00 2012', 'Mon Jan 16 11:00:00 2012'])
atest(['Fri Jan 13 12:00:00 2012', 'Mon Jan 16 13:00:00 2012'])
atest(['Fri Jan 13 14:17:58 2012', 'Sat Jan 14 1:2:3 2012', 'Sun Jan 15 4:5:6 2012', 'Mon Jan 16 11:03:29 2012', 'Wed Jan 18 14:27:17 2012', 'Mon Jan 23 10:02:12 2012', 'Mon Jan 30 10:02:12 2012'])
atest(['Fri Jan 14 14:17:58 2011', 'Mon Jan 17 11:03:29 2011', 'Wed Jan 19 14:27:17 2011', 'Mon Jan 24 10:02:12 2011'])
Split each string on ' ' and take the first element; if it's not Saturday or Sunday, it's a weekday. Now I need to know what you mean by the "mean" of a list of dates.
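The weekday filter can be sketched as follows. Instead of splitting the string, this version parses each timestamp and asks the resulting datetime for its weekday, which does not rely on the day name in the text being correct. It assumes the 'Fri Jan 13 14:17:58 2012' layout and an English locale.

```python
from datetime import datetime

stamps = [
    "Fri Jan 13 14:17:58 2012",
    "Sat Jan 14 1:2:3 2012",
    "Sun Jan 15 4:5:6 2012",
    "Mon Jan 16 11:03:29 2012",
]

parsed = [datetime.strptime(s, "%a %b %d %H:%M:%S %Y") for s in stamps]

# weekday() returns 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
weekdays = [d for d in parsed if d.weekday() < 5]
print(weekdays)
```

Only the filtering step is shown; how to average the remaining timestamps depends on the definition of "mean" asked about above.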
