parse odd dataframe index to datetime - python

I have a DataFrame that I pulled from the EIA API; however, all of the index values are in the format 'YYYY mmddTHHZ dd'. For example, 11am on today's date appears as '2020 0317T11Z 17'.
What I would like to do is parse this index so that there are separate ['Date'] and ['Time'] columns, with the date in YYYY-mm-dd format and the hour as a single number, i.e. 11.
The index is not a datetime object, and I'm not sure how to parse an index and replace it in this manner. Any help is appreciated.
Thanks.

Remove the excess part (the trailing 'Z dd') before parsing:
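import pandas as pd

s = pd.Series(['2020 0317T11Z 17'])
# drop the last 4 characters ('Z 17') so the rest matches '%Y %m%dT%H'
datetimes = pd.to_datetime(s.str[:-4], format='%Y %m%dT%H')
# after converting to datetime, you can extract
dates = datetimes.dt.normalize()
times = datetimes.dt.time
# or better (a timedelta since midnight)
# times = datetimes - dates
# the hour as a plain integer, e.g. 11
hours = datetimes.dt.hour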
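You can apply the same idea directly to the DataFrame's index to get the two columns the question asks for. This is a minimal sketch. The sample index labels and the 'value' column are made up. It also assumes every label contains a 'Z' right after the hour, and it strips from that 'Z' onward instead of slicing a fixed 4 characters:

```python
import pandas as pd

# Made-up sample; the real frame comes from the EIA API
df = pd.DataFrame({'value': [100.0, 105.0]},
                  index=['2020 0317T11Z 17', '2020 0317T12Z 17'])

# Strip everything from 'Z' onward ('Z 17'), leaving e.g. '2020 0317T11'
trimmed = df.index.str.replace(r'Z.*$', '', regex=True)
parsed = pd.to_datetime(trimmed, format='%Y %m%dT%H')

df['Date'] = parsed.strftime('%Y-%m-%d')  # date as a 'YYYY-mm-dd' string
df['Time'] = parsed.hour                  # hour as a plain integer
```

If you would rather keep a real datetime index (e.g. for resampling), you could also assign df.index = parsed.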

Related

How can I take a list of dates from a CSV (as strings) and return only the dates/data between a start date and an end date?

I have a csv file with dates in format M/D/YYYY from 1948 to 2017. I'm able to plot other columns/lists associated with each date by list index. I want to be able to ask the user for a start date, and an end date, then return/plot the data from only within that period.
The problem is that when I read the dates in from the CSV, they are strings, so I cannot use if date[x] >= startDate and date[x] <= endDate, because there's no way for me to turn dates in this format into integers.
Here is my csv file
I am already able to read in the dates from the csv to its own list.
How can I take the dates in my list and only return the ones within the user specified date range?
Here is my function for plotting the entire dataset right now:
from matplotlib import pyplot

#CSV Plotting function
def CSV_Plot (data,header,column1,column2):
    #pyplot.plot([item[column1] for item in data] , [item[column2] for item in data])
    pyplot.scatter([item[column1] for item in data] , [item[column2] for item in data])
    pyplot.xlabel(header[column1])
    pyplot.ylabel(header[column2])
    pyplot.show()
    return True

CSV_Plot(mycsvdata,data_header,dateIndex,rainIndex)
This is how I am asking the user to input the start and end dates:
#Ask user for start date in M/D/YYYY format
startDate = input('Please provide the start date (M/D/YYYY) of the period for the data you would like to plot: ')
endDate = input('Please provide the end date (M/D/YYYY) of the period for the data you would like to plot: ')
You need to compare the dates.
I would suggest parsing the dates from your CSV into a datetime object, and also turning the user input value into a datetime object.
To create a datetime object from a string, specify a format string and datetime.strptime() will parse it for you. Details here:
Converting string into datetime
In your case, it could be something like
from datetime import datetime
# Considering date is in M/D/YYYY format
datetime_object1 = datetime.strptime(date_string, "%m/%d/%Y")
Then you can compare them with the < and > operators (or a chained comparison such as start <= d <= end). Here you can find details of how to compare dates.
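Here is a minimal sketch that combines the two suggestions. It parses the user's start and end strings and each row's date string, then keeps the rows inside the range, inclusive on both ends. The sample rows, the rows_in_range helper and the assumption that the date sits in column 0 are all hypothetical. Note that strptime's %m and %d also accept non-padded values such as 1/5/1948:

```python
from datetime import datetime

DATE_FORMAT = '%m/%d/%Y'  # matches M/D/YYYY, padded or not

# Hypothetical rows standing in for mycsvdata: [date_string, rainfall]
rows = [
    ['1/5/1948', 0.2],
    ['3/17/1990', 1.1],
    ['12/31/2017', 0.0],
]

def rows_in_range(rows, start_str, end_str, date_index=0):
    """Return the rows whose date falls between start_str and end_str (inclusive)."""
    start = datetime.strptime(start_str, DATE_FORMAT)
    end = datetime.strptime(end_str, DATE_FORMAT)
    return [row for row in rows
            if start <= datetime.strptime(row[date_index], DATE_FORMAT) <= end]

subset = rows_in_range(rows, '1/1/1980', '1/1/2000')
```

The filtered list could then be passed to CSV_Plot in place of the full dataset, with startDate and endDate from input() as the range bounds.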

How do I format date using pandas?

My DataFrame 'df' shows the 'Date' column as 1970-01-01 00:00:00.019990103 when it is converted to a date using pandas. How do I show the date as 01/03/1999?
Consider LoneWanderer's comment next time and show some of the code that you have tried.
I would try this:
from datetime import datetime
now = datetime.now()
print(now.strftime('%d/%m/%Y'))
You can print now to see that it is in the same format as your data, and then print it formatted the way you need.
I see that the actual date is in the last 8 characters of your source string.
To convert such strings to a Timestamp (ignoring the starting part), run:
df.Date = df.Date.apply(lambda src: pd.to_datetime(src[-8:]))
It is worth keeping this date as a Timestamp, as that simplifies date/time operations. Apply your formatting only in printouts.
But if you want to have this date as a string in "your" format, in the
"original" column, perform the second conversion (Timestamp to string):
df.Date = df.Date.dt.strftime('%m/%d/%Y')
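The output 1970-01-01 00:00:00.019990103 suggests the raw column holds integers like 19990103 (YYYYMMDD). pd.to_datetime reads a bare integer as nanoseconds since 1970. This is a minimal sketch under that assumption, with made-up sample values. It converts the integers to strings and parses them with an explicit format, so there is no need to slice the bad output afterwards:

```python
import pandas as pd

# Assumption: 'Date' holds YYYYMMDD integers (sample values are made up)
df = pd.DataFrame({'Date': [19990103, 20001225]})

# Parse as text with an explicit format instead of as a nanosecond count
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y%m%d')

# Keep the Timestamp column; build a display string only where needed
df['Date_str'] = df['Date'].dt.strftime('%m/%d/%Y')
```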

How to add date and HHMM time together into month/date/year hh:mm format and index

I'm working with a CSV file of flight records. My overall goal is to plot flight delays over a few selected days, so I am trying to index the flights by day and scheduled departure time. I have a flight date in month/day/year format and a departure time formatted as hhmm. Is there a way to reformat the departure time column to hh:mm in 24-hour time? Then would I simply add the columns together and index by them?
I've tried adding the columns together without reformatting the time and I'm not sure matplotlib recognizes this time format for my plots.
data = pd.read_csv("groundhog_query.csv",parse_dates=[['Flight_Date', 'Scheduled_Dep_Time']])
data.index = data['Flight_Date_Scheduled_Dep_Time']
data
The CSV file looks like this:
'''
Year,Flight_Date,Day_Of_Year,Unique_Carrier_ID,Airline_ID,Tail_Number,Flight_Number,Origin_Airport_ID,Origin_Market_ID,Origin_Airport_Code,Origin_State,Destination_Airport_ID,Destination_Market_ID,Destination_Airport_Code,Dest_State,Scheduled_Dep_Time,Actual_Dep_Time,Dep_Delay,Pos_Dep_Delay,Scheduled_Arr_Time,Actual_Arr_Time,Arr_Delay,Pos_Arr_Delay,Combined_Arr_Delay,Can_Status,Can_Reason,Div_Status,Scheduled_Elapsed_Time,Actual_Elapsed_Time,Carrier_Delay,Weather_Delay,Natl_Airspace_System_Delay,Security_Delay,Late_Aircraft_Delay,Div_Airport_Landings,Div_Landing_Status,Div_Elapsed_Time,Div_Arrival_Delay,Div_Airport_1_ID,Div_1_Tail_Num,Div_Airport_2_ID,Div_2_Tail_Num,Div_Airport_3_ID,Div_3_Tail_Num,Div_Airport_4_ID,Div_4_Tail_Num,Div_Airport_5_ID,Div_5_Tail_Num
2011,2011-01-24,24,MQ,20398,N717MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,1622.0,-8.0,0.0,1735,1722.0,-13.0,0.0,-13.0,0,,0,65,60.0,,,,,,0,,,,,,,,,,,,,
2011,2011-01-25,25,MQ,20398,N736MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,1624.0,-6.0,0.0,1735,1724.0,-11.0,0.0,-11.0,0,,0,65,60.0,,,,,,0,,,,,,,,,,,,,
2011,2011-01-26,26,MQ,20398,N737MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,,,,1735,,,,,1,B,0,65,,,,,,,0,,,,,,,,,,,,,
2011,2011-01-27,27,MQ,20398,N721MQ,4527,11278,30852,DCA,VA,14492,34492,RDU,NC,1630,1832.0,122.0,122.0,1735,1936.0,121.0,121.0,121.0,0,,0,65,64.0,121.0,0.0,0.0,0.
'''
My current results are in a month/day/year hhmm format.
Use the following steps:
1. Read the CSV without parsing dates.
2. Merge the 'Flight_Date' and 'Scheduled_Dep_Time' columns. 'Scheduled_Dep_Time' is parsed as int by default, so convert it to a string first (hence .map(str)). Also zero-pad it to four digits (hence .str.zfill(4)) so that a time like 830 becomes 0830.
3. Convert the string to datetime using the matching format ('%Y-%m-%d %H%M'). The time has no colon, so '%H:%M' would not match.
4. Set this newly produced column as the index.
import pandas as pd

d = pd.read_csv("groundhog_query.csv")
d['Flight_Date_Scheduled_Dep_Time_string'] = d.Flight_Date.str.cat(' ' + d.Scheduled_Dep_Time.map(str).str.zfill(4))
d['Flight_Date_Scheduled_Dep_Time'] = pd.to_datetime(d.Flight_Date_Scheduled_Dep_Time_string, format='%Y-%m-%d %H%M')
d = d.set_index('Flight_Date_Scheduled_Dep_Time')
The reference for % directives is here:
https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-behavior
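A variant that avoids string formatting altogether is to treat the HHMM integer arithmetically (hours = value // 100, minutes = value % 100) and add it to the date as a timedelta. This is a minimal sketch on two inline rows shaped like the question's CSV. The second row's 830 is a made-up early departure that shows the unpadded case. One side effect of this choice is that a 2400 departure rolls over to midnight of the next day instead of failing to parse:

```python
import io
import pandas as pd

# Two rows shaped like the question's CSV (only the relevant columns kept)
csv_text = """Flight_Date,Scheduled_Dep_Time
2011-01-24,1630
2011-01-25,830
"""
d = pd.read_csv(io.StringIO(csv_text))

# Split HHMM into hours and minutes and add them to the date
hhmm = d['Scheduled_Dep_Time']
d['Scheduled_Dep'] = (pd.to_datetime(d['Flight_Date'], format='%Y-%m-%d')
                      + pd.to_timedelta(hhmm // 100, unit='h')
                      + pd.to_timedelta(hhmm % 100, unit='m'))
d = d.set_index('Scheduled_Dep')

# month/day/year hh:mm strings, if you need them for labels
labels = d.index.strftime('%m/%d/%Y %H:%M')
```

Matplotlib can plot the resulting DatetimeIndex directly, so the string labels are only needed for display.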

Extract Date from excel and append it in a list using python

I have a column in Excel with dates in the format "17-12-2015 19:35". How can I extract the first 2 digits as integers and append them to a list? In this case I need to extract 17 and append it to a list. Can it be done using pandas as well?
Code thus far:
import pandas as pd
Location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(Location)
time = df['Creation Date'].tolist()
print (time)
You could extract the day of each timestamp like
from datetime import datetime
import pandas as pd
location = r'F:\Analytics Materials\files\paymenttransactions.csv'
df = pd.read_csv(location)
timestamps = df['Creation Date'].tolist()
dates = [datetime.strptime(timestamp, '%d-%m-%Y %H:%M') for timestamp in timestamps]
days = [int(date.strftime('%d')) for date in dates]  # int() because the question asks for integers
print(days)
The '%d-%m-%Y %H:%M' and '%d' bits are format specifiers that describe how your timestamp is formatted. See e.g. here for a complete list of directives.
datetime.strptime parses a string into a datetime object using such a specifier. dates will thus hold a list of datetime instances instead of strings.
datetime.strftime does the opposite: It turns a datetime object into string, again using a format specifier. %d simply instructs strftime to only output the day of a date.
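Since the question also asks about pandas, here is a minimal sketch of the same extraction done column-wise. The sample values are made up to stand in for df['Creation Date']. pd.to_datetime parses the whole column with the same format string, and .dt.day returns the day as an integer directly:

```python
import pandas as pd

# Made-up sample standing in for df['Creation Date']
df = pd.DataFrame({'Creation Date': ['17-12-2015 19:35', '03-01-2016 08:05']})

parsed = pd.to_datetime(df['Creation Date'], format='%d-%m-%Y %H:%M')
days = parsed.dt.day.tolist()  # list of ints
```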

Break-up year, months & days in Pandas

I have an input parameter dictionary as below:
InparamDict = {'DataInputDate': '2014-10-25'}
Using the field InparamDict['DataInputDate'], I want to pull up data from 2013-10-01 till 2013-10-25. What would be the best way to arrive at the same using Pandas?
The SQL equivalent is:
DATEFROMPARTS(DATEPART(year,GETDATE())-1,DATEPART(month,GETDATE()),'01')
You didn't mention whether you're trying to pull data from a DataFrame, a Series, or something else. If you just want the date parts, read the attribute you want from the Timestamp object:
from pandas import Timestamp
dt = Timestamp(InparamDict['DataInputDate'])
dt.year, dt.month, dt.day
If the dates are in a DataFrame (df) and you convert them to datetimes instead of strings, you can select the data by ranges as well, for instance:
from datetime import datetime
df[df['DataInputDate'] > datetime(2013,10,1)]
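Here is a minimal sketch that reproduces the SQL logic from InparamDict, giving the range from the first of the month one year back up to the same day one year back. It subtracts a year with pd.DateOffset, snaps to the first of the month with Timestamp.replace, and filters with Series.between, which is inclusive on both ends by default. The 'Date' and 'value' columns and the sample rows are hypothetical:

```python
import pandas as pd

InparamDict = {'DataInputDate': '2014-10-25'}
anchor = pd.Timestamp(InparamDict['DataInputDate'])

end = anchor - pd.DateOffset(years=1)  # same day, one year earlier
start = end.replace(day=1)             # first day of that month

# Hypothetical frame with a datetime column named 'Date'
df = pd.DataFrame({
    'Date': pd.to_datetime(['2013-09-30', '2013-10-01', '2013-10-25', '2013-10-26']),
    'value': [1, 2, 3, 4],
})
subset = df[df['Date'].between(start, end)]
```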
