Change Date in TimeSeries DataFrames in Python

I am trying to change the date part of an hourly time series. I am doing this with the following method.
import numpy as np
import pandas as pd
np.random.seed(0)
df_volume = pd.DataFrame(np.random.randint(60, 100, (24,4)), index=pd.date_range('2015-08-11', periods=24, freq='H'), columns='Col1 Col2 Col3 Col4'.split())
#df_volume.reset_index(inplace=True)
df_volume.index = pd.to_datetime('2015-01-01') + pd.to_datetime(df_volume.index,format='%H:%M:%S')
print (df_volume)
The error raised is NotImplementedError, and I don't know how to solve it.
If I reset_index and then try to change the date, the error is: time data 0 does not match format '%H:%M:%S' (match). Please help.
I have 24 hours' worth of data, and I want to change the date while keeping the hours the way they are.

I am not exactly sure what you are trying to do; however, it sounds like you want to change the day part of the index without changing the hours, minutes, or seconds. If that is what you are looking for, this should do it. You can change the months and days offsets to whatever you need.
from pandas.tseries.offsets import DateOffset
# 2015-08-11 minus 7 months and 10 days is 2015-01-01; the time of day is left unchanged.
df_volume.index = df_volume.index - DateOffset(months=7, days=10)
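Alternatively, if you want to attach an arbitrary target date rather than work out the offset by hand, here is a minimal sketch of the same idea, assuming the original df_volume from the question with its DatetimeIndex:
import pandas as pd

# Keep only the time-of-day component of the old index and attach it to the new date.
# '2015-01-01' is the target date from the question; normalize() floors each timestamp to midnight.
new_date = pd.Timestamp('2015-01-01')
df_volume.index = new_date + (df_volume.index - df_volume.index.normalize())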

Related

Why does pd.to_datetime not take the year into account?

I've searched for 2 hours but can't find an answer for this that works.
I have this dataset I'm working with and I'm trying to find the latest date, but it seems like my code is not taking the year into account. Here are some of the dates that I have in the dataset.
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
Here's a snippet from my code
import pandas as pd
df=pd.read_csv('test.csv')
df['Date'] = pd.to_datetime(df['Date'])
st.write(df['Date'].max())
st.write gives me 12/21/2022 as the output instead of 01/09/2023, as it should be. So it seems like the code is not taking the year into account and is just looking at the month and day.
I tried changing the format to
df['Date'] = df['Date'].dt.strftime('%Y%m%d').astype(int) but that didn't change anything.
pandas.read_csv allows you to designate columns for conversion into dates. Let the content of test.csv be
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
then
import pandas as pd
df = pd.read_csv('test.csv', parse_dates=["Date"])
print(df['Date'].max())
gives output
2023-01-09 00:00:00
Explanation: I provide a list of the names of the columns holding dates, which read_csv then parses.
(tested in pandas 1.5.2)
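If you prefer to keep the pd.to_datetime call from the question, passing an explicit format also removes the ambiguity. A small sketch, assuming the month-first dates shown in the sample:
import pandas as pd

df = pd.read_csv('test.csv')
# Month-first format matching the sample rows (e.g. 01/09/2023).
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
print(df['Date'].max())  # 2023-01-09 00:00:00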

How to calculate relative volume using pandas in a faster way?

I am trying to implement the RVOL by time of day technical indicator, which can be used as an indication of market strength.
The logic behind this is as follows:
If the current time is 2022/3/19 13:00, we look through the same moment (13:00) at the previous N days and average all the previous volumes at that moment to calculate Average_volume_previous.
Then, RVOL(t) is volume(t)/Average_volume_previous(t).
It is hard to express this complex logic with methods like rolling and apply, so I wrote the code below with a plain loop.
However, the run time of the for loop is catastrophically long.
from datetime import datetime
import pandas as pd
import numpy as np
datetime_array = pd.date_range(datetime.strptime('2015-03-19 13:00:00', '%Y-%m-%d %H:%M:%S'), datetime.strptime("2022-03-19 13:00:00", '%Y-%m-%d %H:%M:%S'), freq='30min')
volume_array = pd.Series(np.random.uniform(1000, 10000, len(datetime_array)))
df = pd.DataFrame({'Date':datetime_array, 'Volume':volume_array})
df.set_index(['Date'], inplace=True)
day_len = 10  # lookback N: number of previous same-time observations to average (not defined in the original snippet)
output = []
for idx in range(len(df)):
    date = str(df.index[idx].hour) + ':' + str(df.index[idx].minute)
    temp_date = df.iloc[:idx].between_time(date, date)
    output.append(temp_date.tail(day_len).mean().iloc[0])
output = np.array(output)
Practically, there might be missing data in the datetime array. So, it would be hard to use fixed length lookback period to solve this. Is there any way to make this code work faster?
I'm not sure I fully understand, but here is a solution as far as I do understand it.
I didn't use Date as the index, so I skip df.set_index(['Date'], inplace=True) from the question.
# Filter the data down to the chosen instant (13:00)
rolling_day = 10
hour = df['Date'].dt.hour == 13
minute = df['Date'].dt.minute == 0
df_moment = df[hour & minute].copy()
# Calculation of the moving average of volume
df_moment['rolling'] = df_moment['Volume'].rolling(rolling_day).mean()
# Calculation of volume(t) / Average_volume_previous(t)
for idx_s, idx_e in zip(df_moment['Volume'][::rolling_day], df_moment['rolling'][rolling_day::rolling_day]):
    print(f'{idx_s/idx_e}')
Output:
0.566379345408499
0.7229214799940626
0.6753586759429548
2.0588617812341354
0.7494803741982076
1.2132554086225438
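For the speed question, one vectorized way to express the same idea is to group by the clock time and take a shifted rolling mean within each group. This is a sketch, assuming the df built in the question (Date as the DatetimeIndex, a 'Volume' column) and a lookback of N previous observations at the same time of day:
import pandas as pd

# Lookback length (the question's day_len / N); illustrative value.
N = 10

# For each row, average the previous N observations taken at the same clock time.
avg_prev = (
    df.groupby(df.index.time)['Volume']
      .transform(lambda s: s.shift(1).rolling(N, min_periods=1).mean())
)

# RVOL(t) = volume(t) / Average_volume_previous(t)
df['RVOL'] = df['Volume'] / avg_prev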

How can I add a new column to a dataframe that adds to the dates in another column?

I want to automatically get the rows with dates that are 90 days away from expiration and send myself an email with the rows that are expiring.
import pandas as pd
import numpy as np
from datetime import date
today = date.today()
fromtoday = pd.DateOffset(days=89)
days_away_90 = today + fromtoday  # identifiers cannot start with a digit, so "90days_away" is not a valid name
The expiration date column in my dataframe:
expirations = df[df['Ad Expiration'].notnull()]
What I'm trying to do now is create a column that sums my expirations column with days_away_90.
I think I somehow need to apply days_away_90 to all rows, but I can't do that manually.
Also worth noting that I've only been studying python for about a week and a half, so I still don't know the best way to do things. Thank you!
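A minimal sketch of one way to approach this, assuming the 'Ad Expiration' column already holds datetimes (otherwise convert it with pd.to_datetime first); the variable names here are illustrative:
import pandas as pd

# The date 90 days from today.
cutoff = pd.Timestamp.today().normalize() + pd.DateOffset(days=90)

# Rows with a non-null expiration that falls within the next 90 days.
expiring_soon = df[df['Ad Expiration'].notnull() & (df['Ad Expiration'] <= cutoff)]
print(expiring_soon)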

Python/Pandas Timedelta with only 1 digit on Hours?

I have a Pandas Dataframe (data) with a column "Duration" that represents the time duration in hours, minutes, seconds with a format like: "1:10:27"
How to convert the column to Pandas Timedelta?
I tried:
data['Duration'] = pd.to_timedelta(data['Duration'])
But it says:
"ValueError: expected hh:mm:ss format before"
I suspect this happens because the format has only one digit for hours.
The rows show "1:30:27" instead of "01:30:27", or "0:57:23" instead of "00:57:23".
I would appreciate your help!
Using inputs like the ones you described, I'm guessing the data looks something like the following, and it works fine for me. If you have a specific input that's different, feel free to post it.
import pandas as pd
time = "1:30:27"
print(pd.to_timedelta(time))
Output:
0 days 01:30:27
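If only some rows fail to parse (which the truncated "before" in the error message suggests), one way to locate them is to coerce unparseable entries to NaT and inspect them. A sketch, assuming the data frame from the question:
import pandas as pd

# Coerce values that don't parse into NaT instead of raising, then look at the offending rows.
parsed = pd.to_timedelta(data['Duration'], errors='coerce')
print(data.loc[parsed.isna(), 'Duration'])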

How to find missing dates in an excel file by python

I'm a beginner in Python. I have an Excel file that shows the rainfall amount between 2016-01-01 and 2020-06-30. It has two columns: the first column is the date and the other column is the rainfall. Some dates are missing from the file (the rainfall wasn't measured). For example, there isn't a row for 2016-05-05 in my file. This is a sample of my Excel file.
Date rainfall (mm)
1/1/2016 10
1/2/2016 5
.
.
.
12/30/2020 0
I want to find the missing dates but my code doesn't work correctly!
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import dates as mpl_dates
from matplotlib.dates import date2num
df=pd.read_excel ('rainfall.xlsx')
a= pd.date_range(start = '2016-01-01', end = '2020-06-30' ).difference(df.index)
print(a)
Here's a beginner-friendly way of doing it.
First you need to make sure, that the Date in your dataframe is really a date and not a string or object.
Type (or print) df.info().
The date column should show up as datetime64[ns]
If not, df['Date'] = pd.to_datetime(df['Date'], dayfirst=False) fixes that. (Use dayfirst to tell pandas whether the day or the month comes first in your date strings, because it cannot know on its own. Month first is the default, so it would also work if you left the argument out.)
For the task of finding missing days, there are many ways to solve it. Here's one.
Turn all dates into a series
all_dates = pd.Series(pd.date_range(start = '2016-01-01', end = '2020-06-30' ))
Then print all dates from that series which are not in your dataframe "Date" column. The ~ sign means "not".
print(all_dates[~all_dates.isin(df['Date'])])
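Put together as one runnable snippet, assuming the column is named 'Date' as in the sample:
import pandas as pd

df = pd.read_excel('rainfall.xlsx')
df['Date'] = pd.to_datetime(df['Date'], dayfirst=False)

# All calendar days in the period, minus the ones present in the file.
all_dates = pd.Series(pd.date_range(start='2016-01-01', end='2020-06-30'))
print(all_dates[~all_dates.isin(df['Date'])])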
Try:
df = pd.read_excel('rainfall.xlsx', usecols=[0])
a = pd.date_range(start = '2016-01-01', end = '2020-06-30').difference([l[0] for l in df.values])
print(a)
And the dates in the file must look like 2016/1/1.
Alternatively, without Python: to find the missing dates in a list you can apply Excel's Conditional Formatting to the date range, which highlights the positions of the missing dates (note that the last date in the list will also be highlighted). This is an Excel trick rather than a Python solution.
