I have a pandas dataframe 'df' with a column 'DateTimes' of type datetime.time.
The entries of that column are hours of a single day:
00:00:00
.
.
.
23:59:00
Seconds are always zero; the values advance one minute at a time.
How can I choose rows by hour, for example the rows between 00:00:00 and 00:01:00?
If I try this:
df.between_time('00:00:00', '00:00:10')
I get an error saying the index must be a DatetimeIndex.
I set the index as such with:
df=df.set_index(keys='DateTime')
but I get the same error.
I can't seem to get 'loc' to work either. Any suggestions?
Here is a working example of what you are trying to do:
import numpy as np
import pandas as pd
times = pd.date_range('3/6/2012 00:00', periods=100, freq='s', tz='UTC')
df = pd.DataFrame(np.random.randint(10, size=(100, 1)), index=times)
df.between_time('00:00:00', '00:00:30')
Note the index has to be of type DatetimeIndex.
I understand you have a column with your dates/times. The problem probably is that your column is not of this type, so you have to convert it first, before setting it as index:
# Method A
df = df.set_index(pd.to_datetime(df['column_name']), drop=True)
# Method B
df.index = pd.to_datetime(df['column_name'])
df = df.drop('column_name', axis=1)
(The drop is only necessary if you want to remove the original column after setting it as index)
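Since the question says the column holds datetime.time objects, a direct pd.to_datetime call may not accept them. A minimal sketch, assuming a column named 'DateTime' and a made-up 'value' column, converts the times to strings first; the date part of the resulting index is only a placeholder:

```python
import datetime
import pandas as pd

# Hypothetical frame: one row per minute, times stored as datetime.time objects.
times = [datetime.time(h, m) for h in range(24) for m in range(60)]
df = pd.DataFrame({"DateTime": times, "value": range(len(times))})

# datetime.time values carry no date, so go through their string form.
# With an explicit format, every row gets the placeholder date 1900-01-01.
stamps = pd.to_datetime(df["DateTime"].astype(str), format="%H:%M:%S")
indexed = df.set_index(pd.DatetimeIndex(stamps))

# between_time includes both endpoints by default.
selected = indexed.between_time("00:00:00", "00:01:00")
print(selected)
```

With the index in place, between_time only looks at the time of day, so the placeholder date does not affect the selection.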
Check out these links:
convert column to date type: Convert DataFrame column type from string to datetime
filter dataframe on dates: Filtering Pandas DataFrames on dates
Hope this helps
Related
I am trying to merge two pandas dataframes, and to do this I want to make it so that they both have the same index. The problem is, one df has an index of datatype object which just includes the date while the other df has an index of datatype datetime64[ns] which includes the date and time. Is there a way to make these both the same data type so that I can merge the two dataframes?
Convert both indexes to pandas datetime format, then reduce each one to just the date:
df['date_only'] = df['dates'].dt.date
You could convert a date and time format to just date as below
import pandas as pd
date_n_time='2015-01-08 22:44:09'
date=pd.to_datetime(date_n_time).date()
Turn your index into a column using
df.reset_index()
and set a column back as the index using
df.set_index('date_only')
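An alternative sketch keeps both indexes as datetime64 and drops only the time of day, so the two frames line up on the date. The frames `left` and `right` and their columns are made up for illustration:

```python
import datetime
import pandas as pd

# Hypothetical frames: one indexed by datetime.date objects (object dtype),
# the other by full timestamps.
left = pd.DataFrame(
    {"a": [1, 2]},
    index=[datetime.date(2015, 1, 8), datetime.date(2015, 1, 9)],
)
right = pd.DataFrame(
    {"b": [10, 20]},
    index=pd.to_datetime(["2015-01-08 22:44:09", "2015-01-09 07:00:00"]),
)

# Promote the object index to a DatetimeIndex (midnight of each date).
left.index = pd.to_datetime(left.index)
# Reset the time of day to midnight so both indexes hold comparable values.
right.index = right.index.normalize()

merged = left.join(right)
print(merged)
```

This assumes each date appears at most once in `right`; several timestamps on the same day would produce duplicate index values after normalize().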
I have a time series composed of weekdays with anomalous/unpredictable holidays. On any given day, I want to know the length/number of rows to a date specified under column 'date1'. See below.
len(df.loc['2019-10-18':'2019-11-15']) returns the correct answer
I am trying to create a column 'shift' that will calculate the above.
Both DatetimeIndex and the 'date1' are dtype 'datetime64[ns]'
df['shift']=len(df.loc[df.index : df['date1']]) clearly doesn't work but might there be a solution that does?
IIUC use:
df['len'] = (df.index - df['date1']).dt.days
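Note that this subtraction measures calendar days, not rows. If the goal is the row count that len(df.loc[start:end]) would give, one possible sketch uses searchsorted on the sorted index. The sample data below is invented, and the approach assumes the index is sorted and unique:

```python
import numpy as np
import pandas as pd

# Hypothetical weekday-only index with one holiday removed.
idx = pd.bdate_range("2019-10-18", "2019-11-15").drop([pd.Timestamp("2019-11-11")])
df = pd.DataFrame({"date1": idx[-1]}, index=idx)

# Position of each row in the index.
start = np.arange(len(df))
# Position just past the last index entry that is <= date1.
stop = df.index.searchsorted(df["date1"].to_numpy(), side="right")

# Number of rows from each row's own date through date1, inclusive.
df["shift"] = stop - start
print(df.head())
```

For the first row this matches len(df.loc['2019-10-18':'2019-11-15']) on the same data.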
I have a column of data which looks like the following:
I am trying to set a range of the entire month:
rng = pd.date_range('2016-09-01 00:00:00', '2016-09-30 23:59:58', freq='S')
But my column of data (above) is missing a few hours, and I am unsure where (since my data is 2 million rows long).
I tried to use the reindex command, but it seems to have filled everything with zeroes instead.
The code that I was using is as follows:
df = pd.DataFrame(df_csv)
rng = pd.date_range('2016-09-01 00:00:00', '2016-09-30 23:59:58', freq='S')
df = df.reindex(rng,fill_value=0.0)
How do I properly fill in the missing date/times without filling everything with a 0?
I think you need to call set_index on the date column first; then you can use reindex:
#cast column date if dtype is not datetime
df.date = pd.to_datetime(df.date)
df = df.set_index('date').reindex(rng,fill_value=0.0)
You get all NaN values because you are reindexing an integer index with datetime values, so no labels match (and with fill_value=0.0 every NaN is then replaced by 0.0).
Also, if the date column is sorted, you can use a more general solution that takes the first and last values of the column:
start_date = df.date.iat[0]
end_date = df.date.iat[-1]
rng = pd.date_range(start_date, end_date, freq='S')
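To see where the gaps are without overwriting them with zeros, a small sketch (with invented sample data and a made-up 'value' column) leaves out fill_value so missing timestamps stay NaN, and lists them with Index.difference:

```python
import pandas as pd

# Hypothetical data with two missing seconds (00:00:02 and 00:00:03).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-09-01 00:00:00",
        "2016-09-01 00:00:01",
        "2016-09-01 00:00:04",
    ]),
    "value": [1.0, 2.0, 3.0],
})

# Full one-second range between the first and last observation.
rng = pd.date_range(df["date"].iat[0], df["date"].iat[-1], freq="s")

# Without fill_value, rows for missing timestamps are inserted as NaN.
full = df.set_index("date").reindex(rng)

# The timestamps that were absent from the original data.
missing = rng.difference(pd.DatetimeIndex(df["date"]))
print(full)
print(missing)
```

The NaN rows can then be filled however suits the data, for example with ffill() or interpolate().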
(I think) I have a dataset with the columns representing datetime intervals
Columns were transformed in datetime with:
for col in df.columns:
    df.rename({col: pd.to_datetime(col, infer_datetime_format=True)}, inplace=True)
Then, I need to resample the columns (year and month '2001-01') into quarters using mean
I tried
df = df.resample('1q', how='mean', axis=1)
The DataFrame also has a multindex set ['RegionName', 'County']
But I get the error:
Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Is the problem in the to_datetime function or in the wrong sampling?
(I think) you are renaming each column head rather than making the entire columns object a DatetimeIndex
Try this instead:
df.columns = pd.to_datetime(df.columns)
Then run your resample
note:
I'd do it with period after converting to DatetimeIndex. That way, you get the period in your column header rather than an end date of the quarter.
df.groupby(df.columns.to_period('Q'), axis=1).mean()
demo
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(12).reshape(2, -1),
                  columns=['2011-01-31', '2011-02-28', '2011-03-31',
                           '2011-04-30', '2011-05-31', '2011-06-30'])
df.columns = pd.to_datetime(df.columns)
print(df.groupby(df.columns.to_period('Q'), axis=1).mean())
   2011Q1  2011Q2
0       1       4
1       7      10
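As far as I know, recent pandas releases deprecate the axis=1 argument of groupby. A sketch of the same quarterly mean that avoids it transposes the frame, groups the rows, and transposes back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(2, -1),
    columns=["2011-01-31", "2011-02-28", "2011-03-31",
             "2011-04-30", "2011-05-31", "2011-06-30"],
)
df.columns = pd.to_datetime(df.columns)

# One quarterly period label per original column.
quarters = df.columns.to_period("Q")

# Transpose so dates become rows, average per quarter, then transpose back.
result = df.T.groupby(quarters).mean().T
print(result)
```

The column headers of `result` are quarterly periods, as in the demo above.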
I have a data frame indexed with a date (Python datetime object). How could I find the frequency as the number of months of data in the data frame?
I tried the attribute data_frame.index.freq, but it returns None. I also tried the asfreq function using data_frame.asfreq('M',how={'start','end'}), but it does not return the expected results. Please advise how I can get the expected results.
You want to convert your index of datetimes to a DatetimeIndex; the easiest way is to use to_datetime:
df.index = pd.to_datetime(df.index)
Now you can do timeseries/frame operations, like resample or pd.Grouper (formerly TimeGrouper).
If your data has a consistent frequency, then this will be df.index.freq, if it doesn't (e.g. if some days are missing) then df.index.freq will be None.
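When df.index.freq is None, two things can still be checked. The sketch below, on invented daily data with a made-up column 'x', uses pd.infer_freq to guess the spacing and counts the distinct calendar months covered:

```python
import datetime
import pandas as pd

# Hypothetical data: 40 consecutive days starting 2012-01-01.
dates = [datetime.datetime(2012, 1, 1) + datetime.timedelta(days=i) for i in range(40)]
df = pd.DataFrame({"x": range(40)}, index=dates)
df.index = pd.to_datetime(df.index)

# The freq attribute is not set automatically when converting existing values.
print(df.index.freq)

# infer_freq guesses the spacing from the values themselves.
inferred = pd.infer_freq(df.index)

# Number of distinct calendar months that contain data.
n_months = df.index.to_period("M").nunique()
print(inferred, n_months)
```

infer_freq returns None when the spacing is irregular, so the month count is the more robust of the two for data with gaps.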
You probably want to use pandas Timestamp for your index instead of datetime in order to use 'freq'. See the example below:
import pandas as pd
dates = pd.date_range('2012-1-1','2012-2-1')
df = pd.DataFrame(index=dates)
print (df.index.freq)
yields,
<Day>
You can easily convert your dataframe like so,
df.index = [pd.Timestamp(d) for d in df.index]