get next value in list Pandas - python

I have a list of unique dates in chronological order.
I have a dataframe with dates in it. For each date in the dataframe, I want to find it in the list and return the NEXT date in the list (the date to its right, i.e. the next chronological date).
Any ideas?

You haven't provided any code to work with, or an example of what your dates look like, so I can only give a general suggestion: sort the dataframe.
dataframe.sort_values(by="date")  # "date" is a placeholder column name; DataFrame.sort() no longer exists in current pandas
To print a specific date, select it by index position once the dataframe is sorted. If the output is too large to inspect on screen, write the sorted data to a text file and check the result there.

So every date in the dataframe has an exact match in the list of unique dates, and you want to move it to the next date in that list.
A dictionary is a good fit for this:
next_date_dictionary = dict(zip(sequential_list_of_dates,sequential_list_of_dates[1:]))
Then you simply look up the next date in the dictionary (the last date in the list has no successor, so .get returns None for it):
next_date = next_date_dictionary.get(row.date)
Alternatively, if you want to replace the date column itself, you can use replace, which returns a new dataframe:
data_frame = data_frame.replace({"date":next_date_dictionary})
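As a minimal sketch of the dictionary approach applied to a whole column, the lookup can be done with Series.map instead of row by row. The dates, the column names "date" and "next_date", and the variable names here are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sorted list of unique dates (illustrative values only).
unique_dates = list(pd.to_datetime(
    ["2014-03-02", "2014-03-05", "2014-03-09", "2014-03-10"]
))

# Map each date to its successor; the last date gets no entry.
next_date = dict(zip(unique_dates, unique_dates[1:]))

# Hypothetical dataframe whose dates all appear in the list.
df = pd.DataFrame(
    {"date": pd.to_datetime(["2014-03-05", "2014-03-02", "2014-03-10"])}
)

# Series.map looks every value up in the dict; dates without a
# successor come back as NaT.
df["next_date"] = df["date"].map(next_date)
print(df)
```

Rows whose date is the last one in the list end up with NaT, which makes the "no next date" case easy to spot afterwards.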

OK here is one way of doing this:
In [210]:
# generate some data
import datetime as dt
import pandas as pd
df = pd.DataFrame({'dates':pd.date_range(start=dt.datetime(2014,3,2), end=dt.datetime(2014,4,23))})
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 53 entries, 0 to 52
Data columns (total 1 columns):
dates 53 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 848.0 bytes
Next, I'd create a df from your date list (this example list runs backwards from the latest date, so it is in descending order):
In [219]:
base = dt.datetime(2014,5,3)
date_list = [base - dt.timedelta(days=x) for x in range(0, 70)]
date_df = pd.DataFrame({'dates':date_list})
date_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 70 entries, 0 to 69
Data columns (total 1 columns):
dates 70 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 1.1 KB
Then add a new column to date_df that shifts the dates column by 1 (because the list is in descending order, each row receives the next later date), and set the index to the dates:
In [220]:
date_df['date_lookup'] = date_df['dates'].shift(1)
date_df = date_df.set_index('dates')
date_df.head()
Out[220]:
date_lookup
dates
2014-05-03 NaT
2014-05-02 2014-05-03
2014-05-01 2014-05-02
2014-04-30 2014-05-01
2014-04-29 2014-04-30
Then call map on the dates column of the original df and pass it date_df['date_lookup']. map uses the index of that Series to perform the lookup and returns the corresponding next date:
In [221]:
df['date_next'] = df['dates'].map(date_df['date_lookup'])
df.head()
Out[221]:
dates date_next
0 2014-03-02 2014-03-03
1 2014-03-03 2014-03-04
2 2014-03-04 2014-03-05
3 2014-03-05 2014-03-06
4 2014-03-06 2014-03-07
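The example above builds its date list in descending order, which is why shift(1) yields the next date. For a list in chronological (ascending) order, as described in the question, the shift direction flips. A minimal sketch under that assumption, with made-up dates and a hypothetical lookup Series:

```python
import pandas as pd

# Hypothetical ascending list of unique dates (illustrative values only).
dates = pd.Series(pd.to_datetime(["2014-03-02", "2014-03-05", "2014-03-09"]))

# For ascending data, shift(-1) pulls the following date up one row.
# Index the result by the original dates so it can serve as a lookup table.
lookup = pd.Series(dates.shift(-1).values, index=dates)

df = pd.DataFrame({"dates": pd.to_datetime(["2014-03-05", "2014-03-02"])})

# map uses the index of `lookup` to find each date's successor.
df["date_next"] = df["dates"].map(lookup)
print(df)
```

The last date in the list maps to NaT, since nothing follows it.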

Related

Pandas: Unable to merge on two date columns

I have two dataframes that look like:
df1:
Date Multiplier
0 1995-01-01 5.248256
1 1995-02-01 5.262376
2 1995-03-01 5.255998
3 1995-04-01 5.215762
4 1995-05-01 5.207806
df2:
PRICE Date
0 77500 1995-01-01
1 60000 1995-01-01
2 39250 1995-01-01
3 51250 1995-01-01
4 224950 1995-01-01
Both date columns were created with pd.to_datetime(), and both supposedly report the <M8[ns] dtype from df1.Date.dtype and df2.Date.dtype. However, when I try to merge the dataframes with pd.merge(df,hpi,how="left",on="Date") I get the error:
ValueError: You are trying to merge on object and datetime64[ns] columns. If you wish to proceed you should use pd.concat
Try converting the Date column of df1 to datetime64.
Check dtypes first:
>>> df1.dtypes
Date object # <- Not a datetime
Multiplier float64
dtype: object
>>> df2.dtypes
PRICE int64
Date datetime64[ns] # <- Right dtype
dtype: object
Convert and merge:
df1['Date'] = pd.to_datetime(df1['Date'])
out = pd.merge(df1, df2,how='left',on='Date')
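A minimal sketch of the same check-then-convert idea, written so that the conversion only runs on a key column that is not already datetime. The small frames below reuse a few values from the question; merging the price frame on the left is an assumption about the intended result.

```python
import pandas as pd

# Date stored as plain strings (object dtype), as in the failing case.
df1 = pd.DataFrame({
    "Date": ["1995-01-01", "1995-02-01"],
    "Multiplier": [5.248256, 5.262376],
})
df2 = pd.DataFrame({
    "PRICE": [77500, 60000],
    "Date": pd.to_datetime(["1995-01-01", "1995-01-01"]),
})

# Convert the key column only where it is not already a datetime dtype.
for frame in (df1, df2):
    if not pd.api.types.is_datetime64_any_dtype(frame["Date"]):
        frame["Date"] = pd.to_datetime(frame["Date"])

# With matching dtypes the merge goes through; each price row
# picks up the multiplier for its date.
out = pd.merge(df2, df1, how="left", on="Date")
print(out)
```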

How to remove rows in pandas of type datetime64[ns] by date?

I'm pretty new to this and have just started using Python for my project.
I have a dataset whose first column has type datetime64[ns]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5889 entries, 0 to 5888
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 5889 non-null datetime64[ns]
1 title 5889 non-null object
2 stock 5889 non-null object
dtypes: datetime64[ns](1), object(2)
memory usage: 138.1+ KB
and
type(BA['date'])
gives
pandas.core.series.Series
The dates have the format 2020-06-10.
I need to delete all rows before a specific date, for example 2015-09-09.
What I tried:
converting to string (failed)
creating conditions using:
.df.year <= %y & .df.month <= %m
<= ('%y-%m-%d')
creating the date with the datetime() method
creating a variable in datetime64 format
just copying with .loc() and .copy()
All of this failed. I got all kinds of errors: it's not an int, it's not a datetime, datetime is mutable, not this, not that, not a holy cow.
I can't believe how counterintuitive this pandas format is. For the first time, writing a CSV parser in C++ feels easier than using a ready-made library in Python.
Thank you for understanding.
Toy Example
df = pd.DataFrame({'date':['2021-1-1', '2020-12-6', '2019-02-01', '2020-02-01']})
df.date = pd.to_datetime(df.date)
df
Input df
date
0 2021-01-01
1 2020-12-06
2 2019-02-01
3 2020-02-01
Delete rows before 2020.01.01.
We are selecting the rows which have dates after 2020.01.01 and ignoring old dates.
df.loc[df.date>'2020.01.01']
Output
date
0 2021-01-01
1 2020-12-06
3 2020-02-01
If we want the index to be reset, add reset_index(drop=True):
df = df.loc[df.date>'2020.01.01'].reset_index(drop=True)
df
Output
date
0 2021-01-01
1 2020-12-06
2 2020-02-01
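A minimal sketch of the same filter using an explicit pd.Timestamp cutoff rather than a string. It uses >= so that rows dated exactly on the cutoff are kept, which is one reading of "delete everything before the date"; the names cutoff and kept are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(
    ["2021-01-01", "2020-12-06", "2019-02-01", "2020-02-01"]
)})

# An explicit Timestamp avoids relying on string parsing in the comparison.
cutoff = pd.Timestamp("2020-01-01")

# Keep rows on or after the cutoff, then renumber the index from 0.
kept = df[df["date"] >= cutoff].reset_index(drop=True)
print(kept)
```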

Comparing datetime values in a pandas DataFrame with a specific date_time value and returning the closest one

I have a date column in a pandas DataFrame as follows:
index date_time
1 2013-01-23
2 2014-01-23
3 2015-8-14
4 2015-10-23
5 2016-10-28
I want to compare the values in the date_time column with a specific date, for example date_x = 2015-9-14, and return the closest date that falls before it, which is 2015-8-14.
I thought about converting the values in the date_time column to a list and then comparing them with the specific date. However, I do not think that is an efficient solution.
Any solution?
Thank you.
Here is one way using searchsorted. Both of my methods assume the data is already sorted; if it is not, run df=df.sort_values('date_time') first.
df.date_time=pd.to_datetime(df.date_time)
date_x = '2015-9-14'
idx=np.searchsorted(df.date_time,pd.to_datetime(date_x))
df.date_time.iloc[idx-1]
Out[408]:
2 2015-08-14
Name: date_time, dtype: datetime64[ns]
Or we can do
s=df.date_time-pd.to_datetime(date_x)
df.loc[[s[s.dt.days<0].index[-1]]]
Out[417]:
index date_time
2 3 2015-08-14
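A minimal sketch of the searchsorted idea wrapped in a function that also covers the case where no earlier date exists. The helper name closest_before is hypothetical, and it treats "before" as strictly earlier than the target.

```python
import pandas as pd

df = pd.DataFrame({"date_time": pd.to_datetime(
    ["2013-01-23", "2014-01-23", "2015-08-14", "2015-10-23", "2016-10-28"]
)})
# searchsorted requires sorted data.
df = df.sort_values("date_time").reset_index(drop=True)


def closest_before(dates, target):
    """Return the latest value in sorted `dates` strictly before `target`."""
    # side="left" gives the position of the first element >= target,
    # so the element just before it is the closest earlier date.
    pos = dates.searchsorted(pd.Timestamp(target), side="left")
    return dates.iloc[pos - 1] if pos > 0 else pd.NaT


print(closest_before(df["date_time"], "2015-09-14"))
```

Without the guard, a target earlier than every date would give position 0, and iloc[-1] would silently return the last date in the column.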

Pandas: How to group the non-continuous date column?

I have a column in a dataframe which contains non-continuous dates. I need to group these date by a frequency of 2 days. Data Sample(after normalization):
2015-04-18 00:00:00
2015-04-20 00:00:00
2015-04-20 00:00:00
2015-04-21 00:00:00
2015-04-27 00:00:00
2015-04-30 00:00:00
2015-05-07 00:00:00
2015-05-08 00:00:00
I tried following but as the dates are not continuous I am not getting the desired result.
df.groupby(pd.Grouper(key = 'l_date', freq='2D'))
Is there a way to achieve the desired grouping using pandas, or should I write separate logic?
Once the dataframe is sorted by l_date, you can create a continuous dummy date column (dum_date) and group on it with a 2D frequency. Note that pd.datetime has been removed from recent pandas versions, so pd.Timestamp is used here.
df = df.sort_values(by='l_date')
df['dum_date'] = pd.date_range(pd.Timestamp.today().normalize(), periods=df.shape[0]).tolist()
df.groupby(pd.Grouper(key = 'dum_date', freq='2D'))
OR
If you are fine with grouping keys other than dates, a more general way to group n consecutive rows is:
n = 2 # n = 2 for your use case
df = df.sort_values(by='l_date')
df['grouping'] = [(i//n + 1) for i in range(df.shape[0])]
df.groupby(pd.Grouper(key = 'grouping'))
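A minimal sketch of the consecutive-row grouping on the sample dates from the question, with an aggregation added so the groups can be inspected. The summary frame and the min/max/count aggregation are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"l_date": pd.to_datetime([
    "2015-04-18", "2015-04-20", "2015-04-20", "2015-04-21",
    "2015-04-27", "2015-04-30", "2015-05-07", "2015-05-08",
])})

n = 2  # rows per group
df = df.sort_values(by="l_date").reset_index(drop=True)

# Integer-divide the row position by n: rows 0-1 -> group 1, rows 2-3 -> group 2, ...
df["grouping"] = np.arange(len(df)) // n + 1

# Summarise each group by its first date, last date and row count.
summary = df.groupby("grouping")["l_date"].agg(["min", "max", "count"])
print(summary)
```

Grouping by row position means a group can span a long calendar gap (for example 2015-04-27 and 2015-04-30), which may or may not be what "a frequency of 2 days" is meant to capture.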

Efficiently handling missing dates when aggregating Pandas Dataframe

Follow up from Summing across rows of Pandas Dataframe and Pandas Dataframe object types fillna exception over different datatypes
One of the columns that I am aggregating using
df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()
this method is not very forgiving if there is missing data. If any data is missing in same1, same2, etc., it pads in totally unrelated values. The workaround is a fillna loop over the columns that replaces missing strings with '' and missing numbers with zero, which solves the problem.
I do, however, have one column with missing dates as well. The column type is 'object', with float nan in the missing cells and datetime objects in the populated ones. It is important that I know the data is missing, i.e. the missing indicator must survive the groupby transformation.
Dataset outlining the problem:
csv file that I use as input is:
Date,Stock,Position,Expiry,same
2012/12/01,A,100,2013/06/01,AA
2012/12/01,A,200,2013/06/01,AA
2012/12/01,B,300,,BB
2012/6/01,C,400,2013/06/01,CC
2012/6/01,C,500,2013/06/01,CC
I then read in file:
df = pd.read_csv('example', parse_dates=[0])
def convert_date(d):
    '''Converts YYYY/mm/dd to datetime object'''
    if type(d) != str or len(d) != 10: return np.nan
    dd = d[8:]
    mm = d[5:7]
    YYYY = d[:4]
    return datetime.datetime(int(YYYY), int(mm), int(dd))
df['Expiry'] = df.Expiry.map(convert_date)
df
df looks like:
Date Stock Position Expiry same
0 2012-12-01 00:00:00 A 100 2013-06-01 00:00:00 AA
1 2012-12-01 00:00:00 A 200 2013-06-01 00:00:00 AA
2 2012-12-01 00:00:00 B 300 NaN BB
3 2012-06-01 00:00:00 C 400 2013-06-01 00:00:00 CC
4 2012-06-01 00:00:00 C 500 2013-06-01 00:00:00 CC
I can quite easily change the convert_date function to put anything else in for missing data in the Expiry column.
Then using:
df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum()
to aggregate the Position column, I get a TypeError: can't compare datetime.datetime to str with any non-date value that I plug in for the missing dates. It is important for later functionality to know whether Expiry is missing.
You need to convert your dates to the datetime64[ns] dtype (which manages how datetimes work). An object column is neither efficient nor good at handling date-like values. datetime64[ns] allows missing values using NaT (not-a-time), see here: http://pandas.pydata.org/pandas-docs/dev/missing_data.html#datetimes
In [6]: df['Expiry'] = pd.to_datetime(df['Expiry'])
# alternative way of reading in the data (in 0.11.1, as ``NaT`` will be set
# for missing values in a datelike column)
In [4]: df = pd.read_csv('example',parse_dates=['Date','Expiry'])
In [9]: df.dtypes
Out[9]:
Date datetime64[ns]
Stock object
Position int64
Expiry datetime64[ns]
same object
dtype: object
In [7]: df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum()
Out[7]:
Stock Expiry same Position
0 A 2013-06-01 00:00:00 AA 300
1 B NaT BB 300
2 C 2013-06-01 00:00:00 CC 900
In [8]: df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum().dtypes
Out[8]:
Stock object
Expiry datetime64[ns]
same object
Position int64
dtype: object
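The session above comes from an old pandas release. In current pandas, groupby drops rows whose key is missing by default, so the NaT group would disappear unless dropna=False is passed (a parameter available since pandas 1.1). A minimal sketch under that assumption, building the frame directly instead of reading the csv file; the name totals is illustrative:

```python
import pandas as pd

# Same data as the csv in the question, with ISO-formatted dates.
df = pd.DataFrame({
    "Stock": ["A", "A", "B", "C", "C"],
    "Position": [100, 200, 300, 400, 500],
    "Expiry": ["2013-06-01", "2013-06-01", None, "2013-06-01", "2013-06-01"],
    "same": ["AA", "AA", "BB", "CC", "CC"],
})

# Missing values become NaT and the column gets a datetime64 dtype.
df["Expiry"] = pd.to_datetime(df["Expiry"])

# dropna=False keeps the group whose Expiry key is NaT.
totals = df.groupby(
    ["Stock", "Expiry", "same"], as_index=False, dropna=False
)["Position"].sum()
print(totals)
```

The missing-expiry indicator survives the aggregation as NaT, so it can still be tested later with isna().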
