Converting datetimeindex to timestamp for pd.date_range - python

I have two lists:
"max_" consists of datetime types:
2012-04-20 00:00:00
2012-11-29 00:00:00
2013-11-22 00:00:00
"min_" , consists of datetimeindex:
DatetimeIndex(['2012-07-11'], dtype='datetime64[ns]', name=u'Date', freq=None)
DatetimeIndex(['2013-02-05', '2013-10-23'], dtype='datetime64[ns]', name=u'Date', freq=None)
DatetimeIndex([], dtype='datetime64[ns]', name=u'Date', freq=None)
My expected output is to take a range from each max value to its respective min value, for example, the first one would be range (2012-04-20 to 2012-07-11). I've tried:
pd.date_range(max_, min_)
TypeError: Cannot convert input [DatetimeIndex(['2012-07-11'], dtype='datetime64[ns]', name=u'Date', freq=None)] of type <class 'pandas.core.indexes.datetimes.DatetimeIndex'> to Timestamp
I'm not sure how to get around the conversion part. Additionally, I'd like to use only the first value from each min_ DatetimeIndex (and ignore any additional values).

I think you just need to pick the specific items in your lists that you want to create the range from. Each element of min_ is itself a DatetimeIndex, so take its first value as well:
pd.date_range(max_[0], min_[0][0])
If you are trying to print all the ranges:
for mx, mn in zip(max_, min_):
    print(pd.date_range(mx, mn[0]))
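For completeness, here is a self-contained sketch with made-up data shaped like the question's lists, pairing each max_ value with the first entry of its respective min_ DatetimeIndex and skipping the empty one:
import pandas as pd

# hypothetical data mirroring the question's two lists
max_ = [pd.Timestamp('2012-04-20'), pd.Timestamp('2012-11-29'), pd.Timestamp('2013-11-22')]
min_ = [pd.DatetimeIndex(['2012-07-11']),
        pd.DatetimeIndex(['2013-02-05', '2013-10-23']),
        pd.DatetimeIndex([])]

for mx, mn in zip(max_, min_):
    if len(mn) == 0:                  # no matching min date, skip this pair
        continue
    print(pd.date_range(mx, mn[0]))   # use only the first value of the DatetimeIndex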


discarding all elements of datetimeindex except first and last

I have the following datetimeindex:
dates = DatetimeIndex(['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04',
'2022-03-05', '2022-03-06', '2022-03-07', '2022-03-08',
'2022-03-09', '2022-03-10',
...
'2022-06-06', '2022-06-07', '2022-06-08', '2022-06-09',
'2022-06-10', '2022-06-11', '2022-06-12', '2022-06-13',
'2022-06-14', '2022-06-15'],
dtype='datetime64[ns]', length=107, freq='D')
I want to discard all elements except the first and last one. How do I do that? I tried this:
[dates[0]] + [dates[-1]] but it returns an actual list of datetimes and not this:
DatetimeIndex(['2022-03-01', '2022-06-15'],
dtype='datetime64[ns]', length=2, freq='D')
Index with a list to select multiple items.
>>> dates[[0, -1]]
DatetimeIndex(['2022-03-01', '2022-06-15'], dtype='datetime64[ns]', freq=None)
This is covered in the NumPy user guide under Integer array indexing. In the Pandas user guide, there's related info under Selection by position.
Here's a way to do it:
print(dates[::len(dates)-1])
Output:
DatetimeIndex(['2022-03-01', '2022-06-15'], dtype='datetime64[ns]', freq=None)
This is slicing using a step that skips right from the start to the end (explanation suggested by #wjandrea).
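As a quick self-contained check (the daily index is rebuilt here from the endpoints shown in the question), both approaches select just the first and last entries:
import pandas as pd

# rebuild a daily index like the one in the question (107 days)
dates = pd.date_range('2022-03-01', '2022-06-15', freq='D')

print(dates[[0, -1]])            # integer array indexing
print(dates[::len(dates) - 1])   # slicing with a start-to-end step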

Splitting datetime index to identify consecutive timestamps (Python)

I know there are several posts on how to split series based on consecutive values, and I have adopted some of their code, but I'm not sure what I am doing wrong.
I have a long datetimeindex ("times" below), and I want to split it to identify consecutive groupings. So, every time there is a gap in time longer than the normal time increment, I want it to split. The index is evenly incremented (10m between times).
times = pd.date_range(start=start_date, end=end_date, freq=frequency).difference(x.index)
splits = ((times-pd.Series(times).shift(-1)).abs() != frequency)
consec = np.split(times, splits)
"Splits" is a boolean array that accurately indicates where the splits should occur, so that seems to be working correctly.
However, when I actually use np.split, instead of splitting into sections, the output is like this, where it is only keeping the values that are at the split indices:
[DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2003-02-05 09:20:00'], dtype='datetime64[ns]', freq=None), DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2003-02-09 01:20:00'], dtype='datetime64[ns]', freq=None), DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex([], dtype='datetime64[ns]', freq=None), DatetimeIndex([], .... etc
Any ideas on why this is happening?
The indices_or_sections argument of np.split accepts an array of integers.
To get it from splits (an array of bools) you can use np.where:
splits_indexes = np.where(splits)[0]
Then you can call
consec = np.split(times, splits_indexes + 1)
The +1 is there so each split point marks the beginning of a new part (np.where returns a tuple, hence the [0] above).
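Putting it all together, here is a minimal end-to-end sketch with made-up 10-minute data containing one gap (the variable names are illustrative):
import numpy as np
import pandas as pd

frequency = pd.Timedelta('10min')
times = pd.DatetimeIndex(['2003-02-05 09:00', '2003-02-05 09:10', '2003-02-05 09:20',
                          '2003-02-09 01:00', '2003-02-09 01:10', '2003-02-09 01:20'])

# True wherever the step to the next timestamp is not exactly one frequency
s = pd.Series(times)
splits = (s - s.shift(-1)).abs() != frequency

# np.where turns the boolean mask into integer positions; +1 makes each one the
# start of a new group. The last element always compares unequal (NaT), so drop
# that final split point to avoid an empty trailing group.
split_points = np.where(splits)[0] + 1
consec = np.split(times, split_points[:-1])

for group in consec:
    print(group)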

Error when resampling a datetime64 time series dataframe

I have a Pandas v1.0.3 dataframe where the index is a timestamp and the one column is a numeric value. The index is at a constant time interval difference of 30 minutes. The single column represents a count, and I want to resample it so the index is a 2 day interval.
It seems that I would need to do something like this:
df.index = pd.to_datetime(df.index)
df.resample('2D', on='column_name').sum()
However, I am getting the following error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
If I print out df.index I get:
DatetimeIndex(['2016-12-31 23:59:59', '2017-01-01 00:29:59',
...
dtype='datetime64[ns]', length=500, freq='30T')
Which seems to be compatible. Does anyone know what I'm doing wrong here?
The index is indeed compatible, but by passing on='column_name' you are asking pandas to resample on that column instead of the index. You should just do:
df.resample('2D').sum()
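A minimal reproducible sketch of that (the column name and data here are made up):
import numpy as np
import pandas as pd

# a count column on a 30-minute DatetimeIndex, similar to the question's data
idx = pd.date_range('2016-12-31 23:59:59', periods=500, freq='30min')
df = pd.DataFrame({'count': np.random.randint(0, 10, size=len(idx))}, index=idx)

# resample on the index itself; no on= argument needed
print(df.resample('2D').sum().head())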

pd.to_datetime changes the values to 1970 which are not the correct dates

I have a column of the following type in a dataframe:
MAT_DATE object
The values in this column are something like
42872
42741
...
...
How can I convert them to datetime? These are essentially future dates.
Using pd.to_datetime() converts them all to dates in 1970:
df['MAT_DATE1'] = pd.to_datetime(df['MAT_DATE'], errors='coerce')
If I use Excel to change the format to a short date, it converts the dates correctly.
However I want to use it on the dataframe directly.
Using the origin parameter of pandas.to_datetime and treating the values as day deltas, as #Wen suggested, this might work:
pd.to_datetime(df['MAT_DATE'],errors='coerce',unit='d',origin='1900-01-01')
The numbers are day deltas from an offset date; for Excel the default offset is 1900-01-01:
s=pd.Series([42872,42741])
pd.TimedeltaIndex(s,unit='d')+pd.to_datetime('1900-01-01')
Out[88]: DatetimeIndex(['2017-05-19', '2017-01-08'], dtype='datetime64[ns]', freq=None)
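One caveat worth flagging: Excel's serial dates effectively count from 1899-12-30 (serial 1 is 1900-01-01, and Excel also counts a nonexistent 1900-02-29), so if these numbers come straight from Excel, using that origin should reproduce Excel's own dates exactly. A quick sketch:
import pandas as pd

s = pd.Series([42872, 42741])
# 1899-12-30 is the epoch that matches Excel's serial numbers for modern dates
print(pd.to_datetime(s, unit='D', origin='1899-12-30'))
# 0   2017-05-17
# 1   2017-01-06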

Timestamps with different timezones in the same pandas DatetimeIndex object?

Is it possible to convert a pd.DatetimeIndex consisting of timestamps in a single timezone to one where each timestamp has its own, in some cases distinct timezone?
Here is an example of what I would like to have:
type(df.index)
pandas.tseries.index.DatetimeIndex
df.index[0]
Timestamp('2015-06-07 23:00:00+0100', tz='Europe/London')
df.index[1]
Timestamp('2015-06-08 00:01:00+0200', tz='Europe/Brussels')
You can have an index contain Timestamps with different timezones. But you would have to explicitly construct it as an Index.
In [33]: pd.Index([pd.Timestamp('2015-06-07 23:00:00+0100', tz='Europe/London'),pd.Timestamp('2015-06-08 00:01:00+0200', tz='Europe/Brussels')],dtype='object')
Out[33]: Index([2015-06-07 23:00:00+01:00, 2015-06-08 00:01:00+02:00], dtype='object')
In [34]: list(pd.Index([pd.Timestamp('2015-06-07 23:00:00+0100', tz='Europe/London'),pd.Timestamp('2015-06-08 00:01:00+0200', tz='Europe/Brussels')],dtype='object'))
Out[34]:
[Timestamp('2015-06-07 23:00:00+0100', tz='Europe/London'),
Timestamp('2015-06-08 00:01:00+0200', tz='Europe/Brussels')]
This is a very odd thing to do, and completely non-performant. You generally want to have a single timezone represented (UTC or other). In 0.17.0, you can efficiently represent a single column with a timezone, so one way of accomplishing what I think is your goal is to segregate the different timezones into different columns. See the docs.
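A minimal sketch of that column-per-timezone idea (the column names and sample times here are made up):
import pandas as pd

# one tz-aware column per timezone, instead of mixing timezones in one index
df = pd.DataFrame({
    'london':   pd.to_datetime(['2015-06-07 23:00:00']).tz_localize('Europe/London'),
    'brussels': pd.to_datetime(['2015-06-08 00:01:00']).tz_localize('Europe/Brussels'),
})
print(df.dtypes)   # each column keeps its own datetime64[ns, <tz>] dtype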
If you're happy for it to not be an Index, but just a regular Series this should be OK:
pd.Series([pd.Timestamp('2015-06-07 23:00:00+0100', tz='Europe/London'),
pd.Timestamp('2015-06-08 00:01:00+0200', tz='Europe/Brussels')])
Adding timestamps with different timezones into the same DatetimeIndex automatically yields a DatetimeIndex with UTC as the default timezone. For example:
In [269]  index = pandas.DatetimeIndex([Timestamp('2015-06-07 23:00:00+0100')])
In [270]  index
Out[270]  DatetimeIndex(['2015-06-07 23:00:00+01:00'], dtype='datetime64[ns, pytz.FixedOffset(60)]', freq=None)
In [271]  index2 = DatetimeIndex([Timestamp('2015-06-08 00:01:00+0200')])
In [272]  index2
Out[272]  DatetimeIndex(['2015-06-08 00:01:00+02:00'], dtype='datetime64[ns, pytz.FixedOffset(120)]', freq=None)
In [273]  index.append(index2) # returns single index containing both data
Out[273]  DatetimeIndex(['2015-06-07 22:00:00+00:00', '2015-06-07 22:01:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
Notice how the result is a UTC DatetimeIndex with the correct UTC timestamps preserved.
Similarly:
In [279]  pandas.to_datetime([Timestamp('2015-06-07 23:00:00+0100'), Timestamp('2015-06-08 00:01:00+0200')], utc=True) # utc=True is needed
Out[279]  DatetimeIndex(['2015-06-07 22:00:00+00:00', '2015-06-07 22:01:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
This is not a bad thing: you preserve the correct instants, keep the indexing prowess of a DatetimeIndex (e.g. slicing by date range), and can easily convert the timestamps to any other timezone (although if you really need to know the original timezone of each timestamp, this won't be ideal).
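For instance, converting such a combined UTC index to one display timezone afterwards is a single tz_convert call (a small sketch; the instants stay the same, only the representation changes):
import pandas as pd

idx = pd.to_datetime([pd.Timestamp('2015-06-07 23:00:00+0100'),
                      pd.Timestamp('2015-06-08 00:01:00+0200')], utc=True)

print(idx.tz_convert('Europe/London'))
# DatetimeIndex(['2015-06-07 23:00:00+01:00', '2015-06-07 23:01:00+01:00'],
#               dtype='datetime64[ns, Europe/London]', freq=None)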
