I set the index of my dataframe to a time series:
new_data.index = pd.DatetimeIndex(new_data.index)
How can I convert this timeseries data back into the original string format?
Pandas index objects often have methods equivalent to those available to series. Here you can use pd.Index.astype:
df = pd.DataFrame(index=['2018-01-01', '2018-05-15', '2018-12-25'])
df.index = pd.DatetimeIndex(df.index)
# DatetimeIndex(['2018-01-01', '2018-05-15', '2018-12-25'],
# dtype='datetime64[ns]', freq=None)
df.index = df.index.astype(str)
# Index(['2018-01-01', '2018-05-15', '2018-12-25'], dtype='object')
Note that pandas traditionally stores strings with object dtype. If you need a specific output format, use strftime instead:
df.index = df.index.strftime('%d-%b-%Y')
# Index(['01-Jan-2018', '15-May-2018', '25-Dec-2018'], dtype='object')
See Python's strftime directives for conventions.
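As a small sketch of the round trip (the variable names here are made up for illustration), the same format string that produced the text can be passed back to pd.to_datetime to recover the datetimes:

```python
import pandas as pd

original = ['2018-01-01', '2018-05-15', '2018-12-25']
idx = pd.DatetimeIndex(original)

# An explicit format gives full control over the resulting strings.
iso_strings = idx.strftime('%Y-%m-%d')
custom_strings = idx.strftime('%d-%b-%Y')

# Parsing the custom strings with the same format recovers the datetimes.
round_trip = pd.to_datetime(custom_strings, format='%d-%b-%Y')

print(iso_strings.tolist())
print(custom_strings.tolist())
```

Month abbreviations such as %b depend on the active locale, so this assumes an English locale.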
I have a dataframe time column with object dtype and would like to convert its time format for plotting.
import pandas as pd
df = pd.DataFrame({
"time":["12:30:31.320"]
})
df["time"]
df['time'] = pd.to_datetime(df['time'],format='%H:%M:%S.%f').dt.strftime('%H:%M:%S')
df['time'] # Output Name: time, dtype: object
To keep Python's time instance, you can use:
df['time'] = (pd.to_datetime(df['time'], format='%H:%M:%S.%f')
                .dt.floor('s')  # remove milliseconds ('S' is deprecated in newer pandas)
                .dt.time)       # keep time part
Output:
>>> df['time']
0 12:30:31
Name: time, dtype: object # the dtype is object but...
>>> df.loc[0, 'time']
datetime.time(12, 30, 31) # ...each element is a datetime.time object
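To make the difference between the two approaches concrete, here is a minimal sketch (the names parsed, as_text and as_time are just illustrative) that builds both results from the same parsed column and checks what each element really is:

```python
import datetime
import pandas as pd

df = pd.DataFrame({"time": ["12:30:31.320"]})
parsed = pd.to_datetime(df["time"], format="%H:%M:%S.%f")

as_text = parsed.dt.strftime("%H:%M:%S")  # elements are Python str
as_time = parsed.dt.floor("s").dt.time    # elements are datetime.time

print(type(as_text.iloc[0]).__name__, type(as_time.iloc[0]).__name__)
```

Strings are usually enough for axis labels, while datetime.time values can still be compared and sorted as times.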
You appear to be converting the 'time' column to datetime and then back to a string in the format '%H:%M:%S'.
The dt.strftime accessor method does exactly that.
Note that after converting back to strings, df['time'] still has object dtype, because that is how pandas stores Python strings by default.
If you want pandas' dedicated string dtype rather than object, you can use the astype method with 'string':
df['time'] = df['time'].astype('string')
My variable dates_city stores this:
Index(['2020-11-17T00:00:00', '2020-11-18T00:00:00', '2020-11-19T00:00:00',
'2020-11-20T00:00:00', '2020-11-21T00:00:00', '2020-11-22T00:00:00',
'2020-11-23T00:00:00', '2020-11-24T00:00:00', '2020-11-25T00:00:00',
'2020-11-26T00:00:00', '2020-11-27T00:00:00', '2020-11-28T00:00:00'])
I want it to be stored as:
Index(['2020-11-17', '2020-11-18', '2020-11-19',
'2020-11-20', '2020-11-21', '2020-11-22',
'2020-11-23', '2020-11-24', '2020-11-25',
'2020-11-26', '2020-11-27', '2020-11-28'])
So, basically with just the date in yyyy-mm-dd format. I was trying to use datetime but I can't seem to get it to work, possibly because this variable is an index, not an array. How do I reformat this?
You could change the index of your dataframe using pandas reset_index() method. Note that this will rename the date column to 'index', so you may want to rename it using pandas rename() method.
Then you can use pandas strftime() method to reformat your dates. After reformatting, if you still want to use the date column as the index, you can do that by changing the index attribute of the dataframe (see code below):
df.index = df['Date']
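Put together, the steps described above might look like this. It is only a sketch: the 'value' column and the shortened dates_city are made-up sample data, and 'Date' is just the name chosen for the renamed column.

```python
import pandas as pd

dates_city = pd.Index(['2020-11-17T00:00:00', '2020-11-18T00:00:00'])
df = pd.DataFrame({'value': [1, 2]}, index=dates_city)

# Move the unnamed index into a column (called 'index') and rename it.
df = df.reset_index().rename(columns={'index': 'Date'})

# Parse the strings, then format them as yyyy-mm-dd.
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')

# Use the reformatted column as the index again.
df = df.set_index('Date')
print(df.index.tolist())
```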
pandas.to_datetime worked for me:
pd.to_datetime(dates_city)
#DatetimeIndex(['2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
# '2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24',
# '2020-11-25', '2020-11-26', '2020-11-27', '2020-11-28'],
# dtype='datetime64[ns]', freq=None)
If you want to keep it as pandas.Index, you can add the method pandas.DatetimeIndex.strftime:
pd.to_datetime(dates_city).strftime("%Y-%m-%d")
#Index(['2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20', '2020-11-21',
# '2020-11-22', '2020-11-23', '2020-11-24', '2020-11-25', '2020-11-26',
# '2020-11-27', '2020-11-28'],
# dtype='object')
The format codes are listed under "strftime() and strptime() Format Codes" in the documentation for Python's datetime module.
I have one field in a pandas DataFrame that was imported as string format.
It should be a datetime variable. How do I convert it to a datetime column and then filter based on date?
Example:
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000']})
Use the to_datetime function, specifying a format to match your data.
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
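For the filtering half of the question, a boolean mask on the converted column is one straightforward option. The sketch below uses the question's 'date' column with a second, made-up row and an arbitrary cutoff:

```python
import pandas as pd

df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000', '17OCT2014:12:30:00.000']})
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')

# Keep only rows on or after the cutoff date.
cutoff = pd.Timestamp('2014-10-01')
recent = df[df['date'] >= cutoff]
print(recent)
```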
If you have more than one column to be converted you can do the following:
df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)
You can use the DataFrame method .apply() to operate on the values in Mycol:
>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'],columns=['Mycol'])
>>> df
Mycol
0 05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x:
...     dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
Mycol
0 2014-09-05
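Use the pandas to_datetime function to parse the column as datetime. Passing infer_datetime_format=True asks pandas to detect the format automatically and convert the column. Note that this argument is deprecated as of pandas 2.0, where to_datetime already infers a single consistent format by default.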
import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)
chrisb's answer works:
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
However, it results in a pandas SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I would guess this is due to chained indexing.
Time Saver:
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])
To silence SettingWithCopyWarning
If you got this warning, then that means your dataframe was probably created by filtering another dataframe. Make a copy of your dataframe before any assignment and you're good to go.
df = df.copy()
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')
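Here is a minimal sketch of the situation that typically triggers the warning and of the copy() fix. The frame raw and its 'group' column are made up for illustration:

```python
import pandas as pd

raw = pd.DataFrame({
    'group': ['a', 'b', 'a'],
    'date': ['05SEP2014:00:00:00.000',
             '06SEP2014:00:00:00.000',
             '07SEP2014:00:00:00.000'],
})

# Filtering derives a new frame from `raw`; .copy() makes it independent,
# so the assignment below is unambiguous and `raw` is left untouched.
subset = raw[raw['group'] == 'a'].copy()
subset['date'] = pd.to_datetime(subset['date'], format='%d%b%Y:%H:%M:%S.%f')
print(subset.dtypes)
```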
errors='coerce' is useful
If some rows are not in the expected format, or are not datetimes at all, the errors= parameter is very useful: with errors='coerce' the valid rows are converted and the invalid ones become NaT, so you can handle them later.
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
# for multiple columns
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime, format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
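To handle the invalid rows later, you can keep the parsed result separate for a moment and use isna() as a mask. A small sketch with one made-up bad value:

```python
import pandas as pd

df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000', 'not a date']})
parsed = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# Rows that failed to parse are NaT; look at their original values.
bad_rows = df[parsed.isna()]
print(bad_rows)

# Store the converted column once the bad rows have been inspected.
df['date'] = parsed
```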
Setting the correct format= is much faster than letting pandas find out [1]
Long story short, passing the correct format= from the beginning, as in chrisb's post, is much faster than letting pandas figure out the format, especially if the format contains a time component. The runtime difference for dataframes with more than 10k rows is huge (~25 times faster, so we're talking like a couple of minutes vs a few seconds). All valid format options can be found at https://strftime.org/.
[1] Code used to produce the timeit test plot:
import pandas as pd
import perfplot
from random import choices
from datetime import datetime

# value ranges for month, day, year, hour, minute, second, fraction
mdYHMSf = range(1,13), range(1,29), range(2000,2024), range(24), *[range(60)]*2, range(1000)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x),
             lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M:%S.%f'),
             lambda x: pd.to_datetime(x, infer_datetime_format=True),
             lambda s: s.apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))],
    labels=["pd.to_datetime(df['date'])",
            "pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S.%f')",
            "pd.to_datetime(df['date'], infer_datetime_format=True)",
            "df['date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))"],
    n_range=[2**k for k in range(20)],
    # build a Series of n random date strings
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}:{S}.{f}"
                               for m,d,Y,H,M,S,f in zip(*[choices(e, k=n) for e in mdYHMSf])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)
Just as you would convert an object column to float or int, you can use astype():
raw_data['Mycol']=raw_data['Mycol'].astype('datetime64[ns]')
I'm having an issue when using asfreq to resample a dataframe. My dataframe, df, has an index of datetime.date objects. After using df.asfreq('d','pad'), the index values have been changed to type pandas.tslib.Timestamp. I've tried the following to change them back, but I'm having no luck:
df = df.set_index(df.index.to_datetime())
df.index = df.index.to_datetime()
df.index = pd.to_datetime(df.index)
Any thoughts?
Thanks!
Use pd.to_datetime:
df.index = pd.to_datetime(df.index)
This is the canonical approach to creating datetime indices. If you want the index values to all be of type datetime.datetime, you can do the following (in current pandas the Timestamp method is to_pydatetime):
df.index = pd.Index([i.to_pydatetime() for i in df.index], name=df.index.name, dtype=object)
I just don't know why you'd want to.
Why is this a problem? If you really need datetime.date objects, you can try df.index = df.index.map(lambda x: x.date()). This works because pandas.Timestamp subclasses datetime.datetime.
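A minimal sketch of the whole round trip, assuming a small made-up frame indexed by datetime.date objects: convert to a DatetimeIndex for asfreq, then map the resulting Timestamps back to dates.

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    {'value': [1.0, 2.0]},
    index=[datetime.date(2020, 1, 1), datetime.date(2020, 1, 4)],
)

# asfreq needs a datetime-like index, so convert first.
df.index = pd.to_datetime(df.index)
daily = df.asfreq('D', method='pad')

# Map each Timestamp back to a plain datetime.date.
daily.index = daily.index.map(lambda ts: ts.date())
print(daily)
```

The resulting index has object dtype, so further time-series operations such as resample would need the DatetimeIndex again.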
I have a data frame indexed with a date (Python datetime object). How could I find the frequency as the number of months of data in the data frame?
I tried the attribute data_frame.index.freq, but it returns None. I also tried the asfreq function using data_frame.asfreq('M', how={'start','end'}), but it does not return the expected results. Please advise how I can get the expected results.
You want to convert your index of datetimes to a DatetimeIndex; the easiest way is to use to_datetime:
df.index = pd.to_datetime(df.index)
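Now you can do timeseries/frame operations, like resample or grouping with pd.Grouper (which replaced the older TimeGrouper).
If your data has a consistent frequency, pd.infer_freq(df.index) returns it (for example 'D'); if it doesn't (e.g. if some days are missing), it returns None. Note that df.index.freq is only set when the index was built with a frequency, for example via pd.date_range, so it is None right after to_datetime.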
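One way to check whether the spacing is regular is pd.infer_freq, which needs at least three timestamps. A short sketch with made-up dates, one regular index and one with a missing day:

```python
import pandas as pd

regular = pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-03', '2012-01-04'])
gappy = pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-04'])

regular_freq = pd.infer_freq(regular)  # frequency string when spacing is regular
gappy_freq = pd.infer_freq(gappy)      # None when no single frequency fits

print(regular.freq, regular_freq, gappy_freq)
```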
You probably want to use pandas Timestamp objects for your index instead of datetime in order to use 'freq'. See the example below:
import pandas as pd
dates = pd.date_range('2012-1-1','2012-2-1')
df = pd.DataFrame(index=dates)
print(df.index.freq)
yields:
<Day>
You can easily convert your dataframe like so:
df.index = [pd.Timestamp(d) for d in df.index]
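As for the number of months of data, freq will not tell you that directly. One possible approach, sketched here on a made-up daily index, is to map each timestamp to its calendar month and count the distinct months:

```python
import pandas as pd

df = pd.DataFrame(index=pd.date_range('2012-01-01', '2012-03-15'))

# Convert each timestamp to its monthly period, then count distinct months.
months = df.index.to_period('M')
n_months = months.nunique()
print(n_months)
```

This counts calendar months that contain at least one row, so a partially covered month such as March above still counts as one.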