How to reformat dates in an index using Pandas in Python

My variable dates_city stores this:
Index(['2020-11-17T00:00:00', '2020-11-18T00:00:00', '2020-11-19T00:00:00',
'2020-11-20T00:00:00', '2020-11-21T00:00:00', '2020-11-22T00:00:00',
'2020-11-23T00:00:00', '2020-11-24T00:00:00', '2020-11-25T00:00:00',
'2020-11-26T00:00:00', '2020-11-27T00:00:00', '2020-11-28T00:00:00'])
I want it to be stored as:
Index(['2020-11-17', '2020-11-18', '2020-11-19',
'2020-11-20', '2020-11-21', '2020-11-22',
'2020-11-23', '2020-11-24', '2020-11-25',
'2020-11-26', '2020-11-27', '2020-11-28'])
So, basically with just the date in yyyy-mm-dd format. I was trying to use datetime but I can't seem to get it to work, possibly because this variable is an index, not an array. How do I reformat this?

You could move the index of your dataframe into a regular column using the pandas reset_index() method. Note that this names the new column 'index', so you may want to rename it using the pandas rename() method.
Then you can use pandas strftime() method to reformat your dates. After reformatting, if you still want to use the date column as the index, you can do that by changing the index attribute of the dataframe (see code below):
df.index = df['Date']
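The steps above can be sketched roughly as follows. This is a minimal example, assuming a small DataFrame whose index holds the timestamp strings from the question; the frame itself and the column names 'value' and 'Date' are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index holds the timestamp strings from the question
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=["2020-11-17T00:00:00", "2020-11-18T00:00:00", "2020-11-19T00:00:00"],
)

# Move the index into a column (named 'index' by default) and rename it
df = df.reset_index().rename(columns={"index": "Date"})

# Parse the strings, then format them back as yyyy-mm-dd strings
df["Date"] = pd.to_datetime(df["Date"]).dt.strftime("%Y-%m-%d")

# Use the reformatted column as the index again
df = df.set_index("Date")
print(df.index.tolist())
```

Using set_index('Date') in the last step moves the column into the index, whereas df.index = df['Date'] keeps a copy of it as a regular column too.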

pandas.to_datetime worked for me:
pd.to_datetime(dates_city)
#DatetimeIndex(['2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
# '2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24',
# '2020-11-25', '2020-11-26', '2020-11-27', '2020-11-28'],
# dtype='datetime64[ns]', freq=None)
If you want to keep it as pandas.Index, you can add the method pandas.DatetimeIndex.strftime:
pd.to_datetime(dates_city).strftime("%Y-%m-%d")
#Index(['2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20', '2020-11-21',
# '2020-11-22', '2020-11-23', '2020-11-24', '2020-11-25', '2020-11-26',
# '2020-11-27', '2020-11-28'],
# dtype='object')
The datetime format codes are listed in the strftime section of Python's datetime documentation.
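If every label is known to follow the same fixed-width ISO layout, plain string slicing is a possible alternative that skips date parsing altogether. This is a different technique from the to_datetime approach above, and it does not validate that the labels are real dates; the two-element index is a shortened stand-in for the one in the question.

```python
import pandas as pd

# Shortened stand-in for the index in the question
dates_city = pd.Index(["2020-11-17T00:00:00", "2020-11-18T00:00:00"])

# Keep the first 10 characters ('yyyy-mm-dd') of every label; no parsing involved
dates_only = dates_city.str[:10]
print(dates_only.tolist())
```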

Related

How do I extract a DateTimeIndex for use in a new column?

I've extracted the dates from filenames in a set of Excel files into a list of DateTimeIndex objects. I now need to write the extracted date from each to a new date column for the dataframes I've created from each Excel sheet. My code works in that it writes the new 'Date' column to each dataframe, but I'm unable to convert the objects out of their generator object DateTimeIndex format and into a %Y-%m-%d format.
Link to code creating the list of DateTimeIndexes from the filenames:
How do I turn datefinder output into a list?
Code to write each list entry to a new 'Date' column in each dataframe created from the spreadsheets:
for i in range(0, len(df)):
    df[i]['Date'] = (event_dates_dto[i] for frames in df)
The involved objects:
type(event_dates_dto)
<class 'list'>
type(event_dates_dto[0])
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
event_dates_dto
[DatetimeIndex(['2019-03-29'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-04-13'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-05-11'], dtype='datetime64[ns]', freq=None)]
The dates were extracted using datefinder: http://www.blog.pythonlibrary.org/2016/02/04/python-the-datefinder-package/
I've tried using methods here that seemed like they could make sense but none of them are the right ticket: Keep only date part when using pandas.to_datetime
Again, the simple for loop is working correctly, but I'm unsure how to coerce the generator object into the correct format so that it not only writes to the new 'Date' column but is also in a useful '%Y-%m-%d' format that makes sense within the dataframe. Any help is greatly appreciated.
1. Force evaluation with a one-line loop like dates = [_ for _ in matches]
2. Convert the index to a column using .index (or .reset_index() if you don't need to keep it)
3. Convert the column to datetime using pd.to_datetime()
4. Use the .dt.date accessor of the datetime column to keep only the date part (Y-m-d)
Here's a sample
import datefinder
import pandas as pd
data = '''Your appointment is on July 14th, 2016 15:24. Your bill is due 05/05/2016 16:00'''
matches = datefinder.find_dates(data)
# force evaluation with 1 line loop
dates = [_ for _ in matches] # 'dates = list(matches)' also works
df = pd.DataFrame({'dt_index':dates,'value':['appointment','bill']}).set_index('dt_index')
df['date'] = df.index
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.date
df
which gives
                           value        date
dt_index
2016-07-14 15:24:00  appointment  2016-07-14
2016-05-05 16:00:00         bill  2016-05-05
Edit: Edited to account for forced evaluation
A minor fix got it working, I was just trying to carry out too much at once and overthinking it.
# create empty list and append each date
event_dates_transfer = []
# use .strftime('%Y-%m-%d') method on event_dates_dto here if you wish to return a string instead of a datetimeindex
for i in range(0, len(event_dates_dto)):
    event_dates_transfer.append(event_dates_dto[i][0])
# create a 'Date' column for each dataframe correlating to the filename it was created from and set it as the index
for i in range(0, len(df)):
    new_date = event_dates_transfer[i]
    df[i]['Date'] = new_date
    df[i].set_index('Date', inplace=True)
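A more compact variant of the same idea, as a rough sketch: it pairs each frame with its DatetimeIndex using zip and writes a '%Y-%m-%d' string directly. The two small frames and the name frames are stand-ins for the dataframes built from the spreadsheets, not the asker's actual data.

```python
import pandas as pd

# Stand-ins for the objects in the question: one single-element DatetimeIndex per frame
event_dates_dto = [pd.DatetimeIndex(["2019-03-29"]), pd.DatetimeIndex(["2019-04-13"])]
frames = [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3, 4]})]

for frame, dti in zip(frames, event_dates_dto):
    # dti[0] is a single Timestamp; strftime turns it into a 'yyyy-mm-dd' string,
    # and the scalar is broadcast to every row of the frame
    frame["Date"] = dti[0].strftime("%Y-%m-%d")

print(frames[0])
```

Dropping the strftime call would store the Timestamp itself, which is usually the better choice if the column is later used for date arithmetic.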

Select Pandas dataframe rows based on 'hour' datetime

I have a pandas dataframe 'df' with a column 'DateTimes' of type datetime.time.
The entries of that column are hours of a single day:
00:00:00
.
.
.
23:59:00
Seconds are skipped, it counts by minutes.
How can I choose rows by hour, for example the rows between 00:00:00 and 00:01:00?
If I try this:
df.between_time('00:00:00', '00:00:10')
I get an error that index must be a DateTimeIndex.
I set the index as such with:
df=df.set_index(keys='DateTime')
but I get the same error.
I can't seem to get 'loc' to work either. Any suggestions?
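Here is a working example of what you are trying to do:
import numpy as np
import pandas as pd
# 'S' is the per-second frequency alias; newer pandas versions prefer lowercase 's'
times = pd.date_range('3/6/2012 00:00', periods=100, freq='S', tz='UTC')
df = pd.DataFrame(np.random.randint(10, size=(100,1)), index=times)
df.between_time('00:00:00', '00:00:30')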
Note the index has to be of type DatetimeIndex.
I understand you have a column with your dates/times. The problem probably is that your column is not of this type, so you have to convert it first, before setting it as index:
# Method A
df = df.set_index(pd.to_datetime(df['column_name']), drop=True)
# Method B
df.index = pd.to_datetime(df['column_name'])
df = df.drop('column_name', axis=1)
(The drop is only necessary if you want to remove the original column after setting it as index)
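For a column that holds datetime.time objects, as in the question, one possible route is sketched below. It assumes the times can be round-tripped through text; the frame and the 'value' column are invented for illustration. Because a time carries no date, pandas fills in a placeholder date when parsing with this format.

```python
import datetime as dt
import pandas as pd

# Hypothetical frame with one datetime.time value per minute, as in the question
df = pd.DataFrame({
    "DateTimes": [dt.time(0, m) for m in range(5)],
    "value": list(range(5)),
})

# Format each time as 'HH:MM:SS' text and parse it back into full timestamps
stamps = pd.to_datetime(df["DateTimes"].astype(str), format="%H:%M:%S")
df.index = pd.DatetimeIndex(stamps)

# With a DatetimeIndex in place, between_time works; both endpoints are included by default
subset = df.between_time("00:00:00", "00:01:00")
print(subset)
```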
Check out these links:
convert column to date type: Convert DataFrame column type from string to datetime
filter dataframe on dates: Filtering Pandas DataFrames on dates
Hope this helps

How can I undo a time series conversion of a pandas dataframe?

I set the index of my dataframe to a time series:
new_data.index = pd.DatetimeIndex(new_data.index)
How can I convert this timeseries data back into the original string format?
Pandas index objects often have methods equivalent to those available to series. Here you can use pd.Index.astype:
df = pd.DataFrame(index=['2018-01-01', '2018-05-15', '2018-12-25'])
df.index = pd.DatetimeIndex(df.index)
# DatetimeIndex(['2018-01-01', '2018-05-15', '2018-12-25'],
# dtype='datetime64[ns]', freq=None)
df.index = df.index.astype(str)
# Index(['2018-01-01', '2018-05-15', '2018-12-25'], dtype='object')
Note strings in Pandas are stored in object dtype series. If you need a specific format, this can also be accommodated:
df.index = df.index.strftime('%d-%b-%Y')
# Index(['01-Jan-2018', '15-May-2018', '25-Dec-2018'], dtype='object')
See Python's strftime directives for conventions.
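If the original strings carried a time component, passing the original pattern to strftime is one way to get back exactly what you started with. A small sketch, with the sample strings and the names original, idx and restored chosen only for illustration:

```python
import pandas as pd

original = ["2020-11-17T00:00:00", "2020-11-18T00:00:00"]
idx = pd.DatetimeIndex(original)

# astype(str) uses pandas' default rendering, which may differ from the input text;
# strftime with the original pattern reproduces the input format
restored = idx.strftime("%Y-%m-%dT%H:%M:%S")
print(restored.tolist())
```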

Pandas Dataframe asFreq changing datatype of index

I'm having an issue when using asfreq to resample a dataframe. My dataframe, df, has an index of type datetime.date. After using df.asfreq('d','pad'), my dataframe index has been changed to type pandas.tslib.Timestamp. I've tried the following to change it back but I'm having no luck...
df = df.set_index(df.index.to_datetime())
df.index = df.index.to_datetime()
df.index = pd.to_datetime(df.index)
Any thoughts?
Thanks!
Use pd.to_datetime:
df.index = pd.to_datetime(df.index)
This is the canonical approach to creating datetime indices. If you want your index entries to all be of type datetime.datetime, you can do the following:
df.index = pd.Index([i.to_pydatetime() for i in df.index], name=df.index.name, dtype=object)
I just don't know why you'd want to.
Why is this a problem? If you really need a datetime.date you can try df.index = df.index.map(lambda x: x.date()), since pandas.Timestamp subclasses datetime.datetime.
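Put together, the round trip might look like the sketch below. It starts from a DatetimeIndex rather than the asker's datetime.date index, and the frame, the column 'v' and the name daily are assumptions made for the example.

```python
import datetime
import pandas as pd

# Hypothetical frame with a gap between the two dates
df = pd.DataFrame({"v": [1.0, 2.0]}, index=pd.to_datetime(["2020-01-01", "2020-01-04"]))

# asfreq fills in the missing days; the resulting index holds pandas Timestamps
daily = df.asfreq("D", method="ffill")

# Replace each Timestamp with a plain datetime.date (the index becomes object dtype)
daily.index = daily.index.map(lambda ts: ts.date())
print(type(daily.index[0]))
```

Note that an index of datetime.date objects no longer supports time-series operations such as resample, so this conversion is best done as a final step.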

Frequency of a data frame

I have a data frame indexed with a date (Python datetime object). How could I find the frequency as the number of months of data in the data frame?
I tried the attribute data_frame.index.freq, but it returns None. I also tried the asfreq function using data_frame.asfreq('M', how={'start','end'}), but it does not return the expected results. Please advise how I can get the expected results.
You want to convert your index of datetimes to a DatetimeIndex; the easiest way is to use to_datetime:
df.index = pd.to_datetime(df.index)
Now you can do timeseries/frame operations, like resample or TimeGrouper (pd.Grouper in newer pandas versions).
If your data has a consistent frequency, then this will be df.index.freq, if it doesn't (e.g. if some days are missing) then df.index.freq will be None.
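As a rough sketch of how to inspect this, the example below builds hypothetical daily data, checks the spacing pandas infers from the values via inferred_freq, and counts the distinct calendar months in the index, which is one reading of "number of months of data". The data and the names days and n_months are illustrative assumptions.

```python
import datetime
import pandas as pd

# Hypothetical daily data indexed by plain datetime objects (1 Jan to 1 Feb 2012)
days = [datetime.datetime(2012, 1, 1) + datetime.timedelta(days=i) for i in range(32)]
df = pd.DataFrame({"v": list(range(32))}, index=days)
df.index = pd.to_datetime(df.index)

# inferred_freq guesses the spacing from the values; it is None when the spacing is irregular
print(df.index.inferred_freq)

# Count the distinct calendar months that appear in the index
n_months = df.index.to_period("M").nunique()
print(n_months)
```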
You probably want to use pandas Timestamp for your index instead of datetime in order to use 'freq'. See the example below:
import pandas as pd
dates = pd.date_range('2012-1-1','2012-2-1')
df = pd.DataFrame(index=dates)
print(df.index.freq)
yields,
<Day>
You can easily convert your dataframe like so,
df.index = [pd.Timestamp(d) for d in df.index]
