Standardizing date format in Python - python

I have a dataset and I realized the values in the date column are in two different formats.
I tried resolving this issue using date parser but it confuses the day of the month with the month.
For example:
'27/02/21 13:40' is converted correctly to '2021-02-27 13:40:00', but '01/03/21 15:09' is converted to '2021-01-03 15:09:00' (instead of March 1st, it becomes January 3rd).
I really don't understand why in the first case the conversion is correct, while in the second it's not. Both dates are in the same column and have the same format.
This is a preview of the dataset with the two different dates:
This is the date that was converted correctly
This is the date that was not converted correctly
These are the steps I followed:
I converted the date columns to a list of strings
I created this date parser function (originally posted as an image, not reproduced here):
I previewed my converted list and noticed that not all dates had been converted in the same way:
This date was converted correctly
Here, month and day of the month are switched
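A likely explanation, assuming the function relies on a format-guessing parser such as dateutil's parser.parse with default settings: such parsers try month-first, and only fall back to day-first when the first number cannot be a month. '27/02/21' is unambiguous because 27 is not a valid month, while '01/03/21' is read as January 3rd. A minimal sketch that avoids the guessing by stating the format explicitly (the format string is an assumption based on the two sample values):

```python
from datetime import datetime

# Assumed layout of the column: day/month/two-digit year, 24-hour time.
DAY_FIRST_FORMAT = "%d/%m/%y %H:%M"

raw_dates = ["27/02/21 13:40", "01/03/21 15:09"]

# strptime never guesses: every value is read as day first.
parsed = [datetime.strptime(s, DAY_FIRST_FORMAT) for s in raw_dates]

for original, value in zip(raw_dates, parsed):
    print(original, "->", value.isoformat(sep=" "))
```

If the parser in use is dateutil's parser.parse or pandas' to_datetime, passing dayfirst=True is another option, although an explicit format is stricter.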

Related

Convert ISO dates and time to separate columns date and time using Python

I have no idea how to convert this to a readable format. I have a column with values such as:
2022-07-04T22:10:49.000+0000
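One possible approach for a single value, sketched with the standard library only. The format string is an assumption derived from the sample value above; in a dataframe the same idea would be applied to the whole column.

```python
from datetime import datetime

raw = "2022-07-04T22:10:49.000+0000"

# %f consumes the fractional seconds, %z the "+0000" UTC offset.
stamp = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

date_part = stamp.date()  # calendar date only
time_part = stamp.time()  # wall-clock time, without the UTC offset

print(date_part, time_part)
```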

How to add date and HHMM time together into month/date/year hh:mm format and index

I'm working with a CSV file of flight records. My overall goal is to plot flight delays over a few selected days, and I am trying to index these flights by day and scheduled departure time. I have a flight date in month/day/year format and a departure time formatted as hhmm. Is there a way to reformat the departure time column to a 24-hour hh:mm format? Would I then simply add the columns together and index by them?
I've tried adding the columns together without reformatting the time, and I'm not sure matplotlib recognizes this time format for my plots.
data = pd.read_csv("groundhog_query.csv",parse_dates=[['Flight_Date', 'Scheduled_Dep_Time']])
data.index = data['Flight_Date_Scheduled_Dep_Time']
data
The CSV file looks like this:
'''
Year,Flight_Date,Day_Of_Year,Unique_Carrier_ID,Airline_ID,Tail_Number,Flight_Number,Origin_Airport_ID,Origin_Market_ID,Origin_Airport_Code,Origin_State,Destination_Airport_ID,Destination_Market_ID,Destination_Airport_Code,Dest_State,Scheduled_Dep_Time,Actual_Dep_Time,Dep_Delay,Pos_Dep_Delay,Scheduled_Arr_Time,Actual_Arr_Time,Arr_Delay,Pos_Arr_Delay,Combined_Arr_Delay,Can_Status,Can_Reason,Div_Status,Scheduled_Elapsed_Time,Actual_Elapsed_Time,Carrier_Delay,Weather_Delay,Natl_Airspace_System_Delay,Security_Delay,Late_Aircraft_Delay,Div_Airport_Landings,Div_Landing_Status,Div_Elapsed_Time,Div_Arrival_Delay,Div_Airport_1_ID,Div_1_Tail_Num,Div_Airport_2_ID,Div_2_Tail_Num,Div_Airport_3_ID,Div_3_Tail_Num,Div_Airport_4_ID,Div_4_Tail_Num,Div_Airport_5_ID,Div_5_Tail_Num
2011,2011-01-24,24,MQ,20398,N717MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,1622.0,-8.0,0.0,1735,1722.0,-13.0,0.0,-13.0,0,,0,65,60.0,,,,,,0,,,,,,,,,,,,,
2011,2011-01-25,25,MQ,20398,N736MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,1624.0,-6.0,0.0,1735,1724.0,-11.0,0.0,-11.0,0,,0,65,60.0,,,,,,0,,,,,,,,,,,,,
2011,2011-01-26,26,MQ,20398,N737MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,,,,1735,,,,,1,B,0,65,,,,,,,0,,,,,,,,,,,,,
2011,2011-01-27,27,MQ,20398,N721MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,1832.0,122.0,122.0,1735,1936.0,121.0,121.0,121.0,0,,0,65,64.0,121.0,0.0,0.0,0.
'''
My current results are in a month/day/year hhmm format.
Use the following steps:
1. Read CSV without parsing dates.
2. Merge the 'Flight_Date' and 'Scheduled_Dep_Time' columns. Make sure that 'Scheduled_Dep_Time' is converted to string first (hence .map(str)), since it is parsed as int by default.
3. Convert the string to datetime using the correct format. The times have no colon (e.g. 1630), so the format is '%Y-%m-%d %H%M'.
4. Set this newly produced column as the index.
import pandas as pd
d = pd.read_csv("groundhog_query.csv")
# Build strings such as "2011-01-24 1630"
d['Flight_Date_Scheduled_Dep_Time_string'] = d.Flight_Date.str.cat(' ' + d.Scheduled_Dep_Time.map(str))
d['Flight_Date_Scheduled_Dep_Time'] = pd.to_datetime(d.Flight_Date_Scheduled_Dep_Time_string, format='%Y-%m-%d %H%M')
d = d.set_index('Flight_Date_Scheduled_Dep_Time')
The reference for % directives is here:
https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior
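One detail worth checking: the sample departure times are integers such as 1630, with no colon, and an integer column drops leading zeros (00:30 would be stored as 30). A standard-library sketch of the per-row conversion, assuming that is how early-morning times appear in the file; the helper name combine_date_and_hhmm is hypothetical:

```python
from datetime import datetime


def combine_date_and_hhmm(flight_date: str, dep_time_hhmm: int) -> datetime:
    """Join a YYYY-MM-DD date and an integer hhmm time into one datetime."""
    # Restore leading zeros lost when the time was read as an integer.
    hhmm = str(dep_time_hhmm).zfill(4)
    # No colon in the data, so the directive is %H%M rather than %H:%M.
    return datetime.strptime(f"{flight_date} {hhmm}", "%Y-%m-%d %H%M")


print(combine_date_and_hhmm("2011-01-24", 1630))
print(combine_date_and_hhmm("2011-01-24", 30))
```

In pandas, the equivalent padding step would be .str.zfill(4) applied to the stringified time column before concatenation.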

Issue with converting string to a datetime in Python

I have a datetime (int64) column in my pandas dataframe.
I'm trying to convert its value of 201903250428 to a datetime value.
The values in this column go only to minute precision, in 24-hour format.
I tried various methods such as strptime and to_datetime, but with no luck.
pd.datetime.strptime('201903250428','%y%m%d%H%M')
I get this error when I use the above code:
ValueError: unconverted data remains: 0428
I want this value to be converted to something like '25-03-2019 04:28:00'.
Lower-case y means two-digit years only, so this is trying to parse "20" as the year, 1 as the month, 9 the day, and 03:25 as the time, leaving "0428" unconverted.
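You need to use %Y, which matches a four-digit year. Recent pandas versions no longer provide pd.datetime, so this uses the standard library's datetime directly:
from datetime import datetime
datetime.strptime('201903250428', '%Y%m%d%H%M')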
http://strftime.org/ is a handy reference for time formatting/parsing parameters.
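To get from the int64 value all the way to the requested '25-03-2019 04:28:00' text, the parse can be followed by a strftime call. A minimal sketch for one value, assuming the output layout asked for in the question:

```python
from datetime import datetime

raw_value = 201903250428  # int64-style value from the column

# Parse: the integer must be turned into a string first.
parsed = datetime.strptime(str(raw_value), "%Y%m%d%H%M")

# Format: day-month-year with seconds, as requested in the question.
formatted = parsed.strftime("%d-%m-%Y %H:%M:%S")

print(formatted)
```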

Date to text and vice versa in excel [duplicate]

I have seen that excel identifies dates with specific serial numbers. For example :
09/07/2018 = 43290
10/07/2018 = 43291
I know that we use the DATEVALUE, VALUE and TEXT functions to convert between these types.
But what is the logic behind this conversion? Why is 09/07/2018 stored as 43290?
Also, if I have a list of these dates in the number format in a dataframe (Python), how can I convert this number to the date format?
Similarly with time, I see decimal values in place of a regular time format. What is the logic behind these time conversions?
The following question that has been given in the comments is informative, but does not answer my question of the logic behind the conversion between Date and Text format :
convert numerical representation of date (excel format) to python date and time, then split them into two seperate dataframe columns in pandas
It is simply the number of days (or fraction of days, if talking about date and time) since January 1st 1900:
The DATEVALUE function converts a date that is stored as text to a
serial number that Excel recognizes as a date. For example, the
formula =DATEVALUE("1/1/2008") returns 39448, the serial number of the
date 1/1/2008. Remember, though, that your computer's system date
setting may cause the results of a DATEVALUE function to vary from
this example
...
Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.
from DATEVALUE docs
if I have a list of these dates in the number format in a dataframe
(Python), how can I convert this number to the date format?
Since we know this number represents the number of days since 1/1/1900 it can be easily converted to a date:
from datetime import datetime, timedelta
day_number = 43290
print(datetime(1900, 1, 1) + timedelta(days=day_number - 2))
# 2018-07-09 00:00:00
# Subtract 2: one day because 1/1/1900 is "day 1", not "day 0", and one more
# because Excel counts a nonexistent 29 February 1900 as a valid day.
However pd.read_excel should be able to handle this automatically.
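The fractional part of a serial number is the fraction of a day, which is why times show up as decimals (0.5 is noon, 0.75 is 18:00). A small sketch that handles both parts by counting from 30 December 1899, which folds the two-day correction into the starting point. The function name is hypothetical, and the sketch assumes serials from March 1900 onward in Excel's default 1900 date system:

```python
from datetime import datetime, timedelta

# Two days before 1900-01-01: absorbs the "day 1" offset and
# Excel's nonexistent 29 February 1900.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial number (days, with fractional time) to datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_datetime(43290))     # date only
print(excel_serial_to_datetime(43290.75))  # same date, three quarters through the day
```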

Working on dates with mm-dd-YY & YY-mm-dd format in pandas

I am trying to run a simple test of pandas' ability to handle dates and formats.
For that I have created a dataframe with the values below:
df = pd.DataFrame({'date1' : ['10-11-11','12-11-12','10-10-10','12-11-11',
'12-12-12','11-12-11','11-11-11']})
Here I am assuming that the values are dates. And I am converting it into proper format using pandas' to_datetime function.
df['format_date1'] = pd.to_datetime(df['date1'])
print(df)
Out[3]:
date1 format_date1
0 10-11-11 2011-10-11
1 12-11-12 2012-12-11
2 10-10-10 2010-10-10
3 12-11-11 2011-12-11
4 12-12-12 2012-12-12
5 11-12-11 2011-11-12
6 11-11-11 2011-11-11
Here, Pandas is reading the date of the dataframe as "MM/DD/YY" and converting it in native format (i.e. YYYY/MM/DD). I want to check if Pandas can take my input indicating that the date format is actually "YY/MM/DD" and then let it convert into its native format. This will change the value of row no.: 5. To do this, I have run following code. But it is giving me an error.
df3['format_date2'] = pd.to_datetime(df3['date1'], format='%Y/%m/%d')
ValueError: time data '10-10-10' does not match format '%Y/%m/%d' (match)
I have seen the sort of solution here. But I was hoping to get a little easy and crisp answer.
%Y in the format specifier takes the 4-digit year (e.g. 2016). %y takes the 2-digit year (e.g. 16, meaning 2016). Change the %Y to %y and it should work.
Also, your format specifier uses slashes, but the data uses dashes. You need to change your format to %y-%m-%d.
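As a quick check of that format string, here is a standard-library sketch on a few of the sample values. It uses datetime.strptime rather than pandas, which accepts the same directives through the format argument of to_datetime:

```python
from datetime import datetime

values = ["10-11-11", "12-11-12", "11-12-11"]

# %y = two-digit year, then month, then day, separated by dashes.
parsed = [datetime.strptime(v, "%y-%m-%d") for v in values]

for raw, value in zip(values, parsed):
    print(raw, "->", value.date())
```

With this format, '11-12-11' is read as 11 December 2011 instead of the 12 November 2011 shown in the default output above.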
