How to add data to the index column of a pandas DataFrame [duplicate] - python

This question already has answers here:
Offset date for a Pandas DataFrame date index
(3 answers)
Closed 1 year ago.
I have the DataFrame below, where the first column has no header, and I need to add 14 days to each value in that column. How can I do it?
L122.Y 5121.Y 110.Y
2021-08-30 14:00:00 0.0 0.0 35.778441
2021-08-30 15:00:00 0.0 0.0 35.741066
2021-08-30 16:00:00 0.0 0.0 35.737846

I think the first column is the index. Check it with:
print(df.index)
If it still needs to be converted to a DatetimeIndex, convert it and add the days in one step:
df.index = pd.to_datetime(df.index) + pd.Timedelta('14 days')
If it is already a DatetimeIndex:
df.index += pd.Timedelta('14 days')
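As a minimal sketch of the answer above, the following rebuilds a small frame resembling the one in the question and shifts its index by 14 days. The frame is reduced to a single column, and the index is assumed to start out as plain strings; both are assumptions made only for illustration.

```python
import pandas as pd

# Small stand-in for the frame in the question (one column kept for brevity).
# The index is assumed to hold strings before conversion.
df = pd.DataFrame(
    {"110.Y": [35.778441, 35.741066, 35.737846]},
    index=["2021-08-30 14:00:00", "2021-08-30 15:00:00", "2021-08-30 16:00:00"],
)

# Convert the string index to a DatetimeIndex and shift every label by 14 days.
df.index = pd.to_datetime(df.index) + pd.Timedelta(days=14)

print(df.index[0])  # 2021-09-13 14:00:00
```

The time of day is preserved; only the date part moves forward.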

Related

Extract a min value and corresponding row information based on date time index

I'm unable to filter my DataFrame to give me the details I need. I'm trying to obtain the minimum value for any given 24-hour period while keeping the detail in the filtered rows, such as the day name.
Below is the head of the df DataFrame. It has a DatetimeIndex named t; I've pulled the hour column as an integer from the index time information, and Low is a float. When I run the groupby line shown between the two outputs below, df2 is the second DataFrame, which is what I get returned.
```
df = pd.read_csv(r'C:/Users/Oliver/Documents/Data/eurusd1hr.csv')
df.drop(columns=(['Close', 'Open', 'High', 'Volume']), inplace=True)
df['t'] = (df['Date'] + ' ' + df['Time'])
df.drop(columns = ['Date', 'Time'], inplace=True)
df['t'] = pd.DatetimeIndex(df.t)
df.set_index('t', inplace=True)
df['hour'] = df.index.hour
df['day'] = df.index.day_name()
```
```
                         Low  hour     day
t
2003-05-05 03:00:00  1.12154     3  Monday
2003-05-05 04:00:00  1.12099     4  Monday
2003-05-05 05:00:00  1.12085     5  Monday
2003-05-05 06:00:00  1.12049     6  Monday
2003-05-05 07:00:00  1.12079     7  Monday
```
```
df2 = df.between_time('00:00', '23:00').groupby(pd.Grouper(freq='d')).min()
```
```
                Low  hour        day
t
2003-05-05  1.12014   3.0     Monday
2003-05-06  1.12723   0.0    Tuesday
2003-05-07  1.13265   0.0  Wednesday
2003-05-08  1.13006   0.0   Thursday
2003-05-09  1.14346   0.0     Friday
```
I want to keep the hour that corresponds to each minimum in the hour column, and also keep the hour information in the original index, in the same way the day name has been maintained.
I was expecting the index and the hour column to keep that information.
I've tried adding a second grouper but failed. I have also tried resetting the index. Any help would be gratefully received. Thanks.
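One possible approach, offered as a sketch rather than a confirmed answer: `.min()` reduces each column independently, so the hour and timestamp of the lowest price are lost. Looking up the label of the minimum with `idxmin()` and selecting those rows with `.loc` keeps the whole row. The data below is synthetic and stands in for the CSV in the question.

```python
import pandas as pd

# Synthetic hourly data standing in for the CSV in the question.
idx = pd.to_datetime([
    "2003-05-05 03:00", "2003-05-05 04:00", "2003-05-05 05:00",
    "2003-05-06 00:00", "2003-05-06 01:00",
])
df = pd.DataFrame({"Low": [1.12154, 1.12099, 1.12085, 1.12800, 1.12723]}, index=idx)
df["hour"] = df.index.hour
df["day"] = df.index.day_name()

# For each calendar day, find the index label of the row with the lowest "Low".
min_labels = df.groupby(df.index.normalize())["Low"].idxmin()

# Select those full rows, so the timestamp, hour and day name all survive.
daily_low = df.loc[min_labels]
print(daily_low)
```

Grouping by `df.index.normalize()` only creates groups for days that actually appear in the data, which avoids empty groups for days without rows.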

Python Timedelta[M] adds incomplete days

I have a table that has a column Months_since_Start_fin_year and a Date column. I need to add the number of months in the first column to the date in the second column.
DateTable['Date']=DateTable['First_month']+DateTable['Months_since_Start_fin_year'].astype("timedelta64[M]")
This works OK for month 0, but month 1 already has a different time and for month 2 onwards has the wrong date.
Image of output table where early months have the correct date but month 2 where I would expect June 1st actually shows May 31st
It must be adding incomplete months, but I'm not sure how to fix it?
I have also tried
DateTable['Date']=DateTable['First_month']+relativedelta(months=DateTable['Months_since_Start_fin_year'])
but I get a type error that says
TypeError: cannot convert the series to <class 'int'>
My Months_since_Start_fin_year is type int32 and my First_month variable is datetime64[ns]
The problem with adding months as an offset to a date is that not all months are equally long (28-31 days). So you need pd.DateOffset which handles that ambiguity for you. .astype("timedelta64[M]") on the other hand only gives you the average days per month within a year (30 days 10:29:06).
Ex:
import pandas as pd
# a synthetic example since you didn't provide a mre
df = pd.DataFrame({'start_date': 7*['2017-04-01'],
                   'month_offset': range(7)})
# make sure we have datetime dtype
df['start_date'] = pd.to_datetime(df['start_date'])
# add month offset
df['new_date'] = df.apply(lambda row: row['start_date'] +
                          pd.DateOffset(months=row['month_offset']),
                          axis=1)
which would give you e.g.
df
start_date month_offset new_date
0 2017-04-01 0 2017-04-01
1 2017-04-01 1 2017-05-01
2 2017-04-01 2 2017-06-01
3 2017-04-01 3 2017-07-01
4 2017-04-01 4 2017-08-01
5 2017-04-01 5 2017-09-01
6 2017-04-01 6 2017-10-01
You can find similar examples here on SO, e.g. Add months to a date in Pandas. I only modified the answer there by using an apply to be able to take the months offset from one of the DataFrame's columns.
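The difference between the two approaches can be checked with a small sketch. The fixed-length "average month" below is taken from the 30 days 10:29:06 figure quoted in the answer; the variable names are illustrative only.

```python
import pandas as pd

start = pd.Timestamp("2017-04-01")

# Fixed-length "average month", using the figure quoted in the answer above.
average_month = pd.Timedelta("30 days 10:29:06")

# Adding two average months lands late on the last day of May.
fixed_length = start + 2 * average_month
print(fixed_length)  # 2017-05-31 20:58:12

# A calendar-aware offset moves the month field and keeps the day of month.
calendar_aware = start + pd.DateOffset(months=2)
print(calendar_aware)  # 2017-06-01 00:00:00
```

This reproduces the symptom described in the question, where month 2 shows May 31st instead of June 1st.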

Pandas KeyError when using .loc() [duplicate]

This question already has answers here:
How are iloc and loc different?
(6 answers)
Closed 2 years ago.
I have a pandas DataFrame portfolio whose keys are dates. I'm trying to access multiple rows through
print(portfolio.loc[['2007-02-26','2008-02-06'],:]),
but am getting an error
KeyError: "None of [Index(['2007-02-26', '2008-02-06'], dtype='object', name='Date')] are in the [index]"
However, print(portfolio.loc['2007-02-26',:]) successfully returns
holdings 1094.6124
pos_diff 100.0000
cash 98905.3876
total 100000.0000
returns 0.0000
Name: 2007-02-26 00:00:00, dtype: float64
Isn't this a valid format: df.loc[['key1', 'key2', 'key3'], 'Column1']?
It seems that the issue is with type conversion from strings to timestamps. The solution is, therefore, to explicitly convert the set of labels to DateTime before passing them to loc:
df = pd.DataFrame({"a" : range(5)}, index = pd.date_range("2020-01-01", freq="1D", periods=5))
print(df)
==>
a
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
try:
    df.loc[["2020-01-01", "2020-01-02"], :]
except Exception as e:
    print(e)
==>
"None of [Index(['2020-01-01', '2020-01-02'], dtype='object')] are in the [index]"
# But - if you convert the labels to datetime before calling loc,
# it works fine.
df.loc[pd.to_datetime(["2020-01-01", "2020-01-02"]), :]
===>
a
2020-01-01 0
2020-01-02 1
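The same idea applied to a frame shaped like the one in the question, including the column selection the asker wanted, might look like the sketch below. Only the first holdings and cash values come from the question; the remaining numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical portfolio frame; only the first row's values come from the question.
portfolio = pd.DataFrame(
    {
        "holdings": [1094.6124, 1100.0, 1200.0],
        "cash": [98905.3876, 98900.0, 98800.0],
    },
    index=pd.DatetimeIndex(["2007-02-26", "2007-02-27", "2008-02-06"], name="Date"),
)

# Convert the string labels to timestamps first, then select rows and one column.
labels = pd.to_datetime(["2007-02-26", "2008-02-06"])
subset = portfolio.loc[labels, "holdings"]
print(subset)
```

Converting the labels explicitly does not rely on pandas casting a list of strings to timestamps on its own.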

Difference Between Dates - Integer results [duplicate]

This question already has answers here:
Pandas: Subtracting two date columns and the result being an integer
(5 answers)
Closed 3 years ago.
I need to calculate the difference between two columns of type datetime, and the result must be in days (integer format). However, what I am getting is the result in day / month / year hour and minute.
id date_1 date_2 date_3 date_result_2-1 date_result_3-1
0 C_ID_92a2005557 2017-06-01 2017-06-27 14:18:08 2018-04-29 11:23:05 26 days 14:18:08 332 days 11:23:05
1 C_ID_3d0044924f 2017-01-01 2017-01-06 16:29:42 2018-03-30 06:48:26 5 days 16:29:42 453 days 06:48:26
2 C_ID_d639edf6cd 2016-08-01 2017-01-11 08:21:22 2018-04-28 17:43:11 163 days 08:21:22 635 days 17:43:11
3 C_ID_186d6a6901 2017-09-01 2017-09-26 16:22:21 2018-04-18 11:00:11 25 days 16:22:21 229 days 11:00:11
4 C_ID_cdbd2c0db2 2017-11-01 2017-11-12 00:00:00 2018-04-28 18:50:25 11 days 00:00:00 178 days 18:50:25
The last two columns are the result that I obtained with a simple subtraction between two columns. I would like these columns to be in integer format, containing only the number of days.
I tried to convert with astype (int) but I got a result that I could not understand.
Any suggestion? Thank you very much in advance.
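If you need only the number of days, try this:
import pandas as pd

df = pd.DataFrame(data={"date": ['2000-05-07', '1965-01-30', 'NaT'],
                        "date_2": ["2019-01-19 12:26:00", "2019-03-21 02:23:12", "2018-11-02 18:30:10"]})
# normalize() drops the time of day but keeps the datetime64 dtype,
# which the .dt accessor on the difference requires
df['date'] = pd.to_datetime(df['date']).dt.normalize()
df['date_2'] = pd.to_datetime(df['date_2']).dt.normalize()
df['days'] = (df['date'] - df['date_2']).dt.days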
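Applied to the first three rows of the question's own table, a minimal sketch could look like this. The result column name `days_2_1` is hypothetical.

```python
import pandas as pd

# First three rows of date_1 and date_2 from the table in the question.
df = pd.DataFrame({
    "date_1": ["2017-06-01", "2017-01-01", "2016-08-01"],
    "date_2": ["2017-06-27 14:18:08", "2017-01-06 16:29:42", "2017-01-11 08:21:22"],
})
df["date_1"] = pd.to_datetime(df["date_1"])
df["date_2"] = pd.to_datetime(df["date_2"])

# Subtracting datetime columns gives timedeltas; .dt.days keeps the whole days.
df["days_2_1"] = (df["date_2"] - df["date_1"]).dt.days
print(df["days_2_1"].tolist())  # [26, 5, 163]
```

`.dt.days` returns the day component of each timedelta and discards the hours, minutes and seconds.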

pandas - add 1 month to a pd.Timestamp [duplicate]

This question already has answers here:
Add months to a date in Pandas
(4 answers)
How can I get pandas Timestamp offset by certain amount of months?
(1 answer)
Closed 4 years ago.
I have multiple df, and they are indexed with timestamps for consecutive months. For example:
1996-01-01 01:00:00
1996-02-01 01:00:00
1996-03-01 01:00:00
1996-04-01 01:00:00
1996-05-01 01:00:00
1996-06-01 01:00:00
I'm trying to create a function where I can add an arbitrary number of rows onto the df, continuing on from whatever the last month happens to be. I tried to solve this by using:
df.iloc[-1].name + pd.Timedelta(1, unit='M')
in a for loop, but this only seems to add 30 days, instead of changing the month value +1. Is there a more reliable way to fetch a pd.Timestamp and add 1 month?
Thank you
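One way this could be approached, sketched under the assumption that the new rows should simply be empty: `pd.DateOffset(months=1)` advances the month field rather than adding a fixed number of days. The helper name `extend_monthly` and the `value` column are hypothetical.

```python
import pandas as pd

# Monthly index like the one in the question, with a hypothetical data column.
start = pd.Timestamp("1996-01-01 01:00:00")
index = pd.DatetimeIndex([start + pd.DateOffset(months=i) for i in range(6)])
df = pd.DataFrame({"value": range(6)}, index=index)


def extend_monthly(frame, n_rows):
    """Append n_rows empty rows, one calendar month apart, after the last label."""
    last = frame.index[-1]
    new_index = pd.DatetimeIndex(
        [last + pd.DateOffset(months=i) for i in range(1, n_rows + 1)]
    )
    # reindex keeps the existing rows and fills the new ones with NaN.
    return frame.reindex(frame.index.append(new_index))


extended = extend_monthly(df, 3)
print(extended.index[-1])  # 1996-09-01 01:00:00
```

Each new offset is computed from the last existing label, so the time of day is carried over unchanged.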
