I want the current date (2017-11-16 in this example) to appear at the top of the data frame. When I download the data, the newest date appears at the bottom of the data frame.
How can I change this?
Open High Low Close Adj Close
Date
2017-11-13 173.500000 174.500000 173.399994 173.970001 173.970001
2017-11-14 173.039993 173.479996 171.179993 171.339996 171.339996
2017-11-15 169.970001 170.320007 168.380005 169.080002 169.080002
2017-11-16 171.179993 171.869995 170.300003 171.100006 171.100006
My code is:
import pandas_datareader.data as web
import datetime
import pandas as pd
DayToPast=5
today = datetime.date.today()
end = datetime.date.today()
start = today-datetime.timedelta(days=DayToPast)
df=web.DataReader('AAPL', 'yahoo', start, end)
print(df)
Thanks & Have a nice day.
Assuming "today" is the maximal date in your dataframe's index, you can just sort your data by descending order of its index:
df.sort_index(ascending=False, inplace=True)
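A minimal sketch of the same idea on a small hand-built frame, so it runs without a download. The Close values are copied from the question; the variable name `newest_first` is only illustrative.

```python
import pandas as pd

# Small stand-in for the downloaded frame (Close values taken from the question)
df = pd.DataFrame(
    {"Close": [173.970001, 171.339996, 169.080002, 171.100006]},
    index=pd.to_datetime(["2017-11-13", "2017-11-14", "2017-11-15", "2017-11-16"]),
)
df.index.name = "Date"

# Sort by the date index in descending order so the newest row comes first
newest_first = df.sort_index(ascending=False)
print(newest_first)
```

Without `inplace=True`, `sort_index` returns a new frame and leaves `df` untouched, which is often easier to reason about.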
I have a dataframe that contains 1 year of weekly OHLC data.
What do I need?
List only the last Monday's data of each month. For example, May has 5 weeks, and I want to keep the last Monday's data for May and discard the rest. Here's the code I tried; I'm able to list the data on a weekly basis, but I got stuck here!
Any help would be appreciated!
import pandas as pd
import yfinance as yf
import datetime
from datetime import date, timedelta
periods=pd.date_range(start='2021-4-30',periods=60,freq='W')
start = periods[0].strftime('%Y-%m-%d')
end = periods[-1].strftime('%Y-%m-%d')
symbol="^NSEI"
df=yf.download(symbol,start,end,interval="1wk",index=periods)
You can use groupby(pd.Grouper()) to group by month and get the latest record.
# reset index to flatten columns
df = df.reset_index()
# copy date column to label last monday of a month
df['last_monday_of_month'] = df['Date']
# groupby month and get latest record
df.groupby(pd.Grouper(freq='M', key='Date')).last().reset_index()
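As an alternative sketch that keeps each row's real date without copying the column, the rows can be grouped by a monthly period and the last row of each group kept. The weekly data below is synthetic (an assumption standing in for the downloaded frame), and `last_per_month` is an illustrative name.

```python
import pandas as pd

# Synthetic weekly (Monday) data standing in for the downloaded frame
dates = pd.date_range("2021-05-03", periods=9, freq="W-MON")
df = pd.DataFrame({"Date": dates, "Close": range(len(dates))})

# Make sure rows are in chronological order, then label each row with its month
df = df.sort_values("Date")
month = df["Date"].dt.to_period("M")

# Keep the final row of every month; the Date column still holds the real Monday
last_per_month = df.groupby(month).tail(1)
print(last_per_month)
```

Grouping by `to_period("M")` avoids the month-end frequency alias, whose spelling differs between pandas versions.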
The following code allows me to select dates to visualise and predict stock prices in a defined date range
start = '2010-01-01'
end = '2021-11-20'
st.title('Stock Prediction')
ticker_input = st.sidebar.text_input('Enter Stock Ticker', 'AAPL')
df = data.DataReader(ticker_input, 'yahoo', start, end)
st.subheader(ticker_input)
The code only works when I change the end variable to a future date within the IDE by changing end to '2022-01-01' and then run it in streamlit. My prediction chart would also change to reflect the end date. How can I change the end variable so the user can select future dates themselves? The tutorial I followed doesn't show this and I've tried to look at examples where datetime lets users select dates in the future and they all seem to just go up to present day.
start = st.date_input('Start', value = pd.to_datetime('2010-01-01'))
end = st.date_input('End', value = pd.to_datetime('2024-01-01'))
I tried using pd.to_datetime and st.date_input like this to see if the user can change it from the dropdown calendar but it doesn't work.
First, I would like to point out that pandas-datareader does not predict stock prices; it lets you fetch historical stock price data from sources such as Yahoo. This means the latest possible end date is today, and passing a later date will not change the data returned from calling:
data.DataReader(ticker_input, 'yahoo', start, end)
For more details, you can read their documentation.
With that in mind, here is code that lets the user enter a start date and an end date and then plots the stock prices:
import streamlit as st
from pandas_datareader import data
import datetime
###########
# sidebar #
###########
#
ticker_input = st.sidebar.selectbox('Select one symbol', ('AAPL',))
# create default date range
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2021, 11, 1)
# ask user for his date
start_date = st.sidebar.date_input('Start date', start)
end_date = st.sidebar.date_input('End date', end)
# validate that start_date is earlier than end_date
if start_date < end_date:
    st.sidebar.success('Start date: `%s`\n\nEnd date:`%s`' % (start_date, end_date))
else:
    st.sidebar.error('Error: End date must fall after start date.')
##############
# Main Panel #
##############
# add title
st.title('Stock Prediction')
# get data based on dates
df = data.DataReader(ticker_input, 'yahoo', start_date, end_date)
# plot
st.line_chart(df.loc[:, ["Close", "Open", "Low", "High"]])
I've got a stock data dataframe with dates as the index column. What I'd like to do is drop all rows that aren't the beginning or ending of the week, effectively leaving me with a dataframe of (mostly) Mondays and Fridays. The trick is, I don't want to just look for Mondays and Fridays, because some weeks are short weeks, starting on a Tuesday or ending on a Thursday (or otherwise; maybe some weeks have a Wednesday off too?).
The logic I have right now (and a reproducible code) for dropping all rows that aren't the beginning of the week looks like this:
import pandas_datareader.data as web
import numpy as np
import pandas as pd
from pandas import Series
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import warnings
warnings.filterwarnings("once")
from datetime import datetime, timedelta
# Import a stock dataset from Yahoo
ticker = 'SPY'
start = datetime(2010, 1, 1)
end = datetime.today().strftime('%Y-%m-%d')
# Download the df
df = web.DataReader(ticker, 'yahoo', start, end)
# Drop the Adj Close column for now
df = df.drop(['Adj Close'], axis=1)
print(df)
# Check if day of week is Monday
print('Checking for beginnings of weeks...')
df = df.reset_index() # Make the date index an actual column again for now
df['week_day_objects'] = pd.to_datetime(df['Date'], format='%Y-%m-%d') # make the dates a datetime object
for i in range(len(df)-1, 0, -1): # start at the bottom of the DF and work backwards
    if df['week_day_objects'].iloc[i] > df['week_day_objects'].iloc[i-1] + timedelta(days=2): # first day of week is always > 2 days since the previous date, holidays included
        continue # if today is the start of the week, continue the loop...
    else:
        df = df.drop([df.index[i]]) # ...else, drop all rows that aren't at the beginning of the week
df = df.set_index(['Date']) # make the date column the index again
df = df.drop(['week_day_objects'], axis=1) # drop the datetime column now
# For review
df.to_csv('./Check_Week_Days.csv', index=True)
...however, I'm stuck trying to also incorporate Fridays (or rather, the end of the week) into this solution. I'm not even sure this is the best way to do it, so I'm open to suggestions. The logic above basically looks for any day that's at least 3 days later than the previous row, which marks the beginning of the week, since a new work week always begins at least 3 days after the last work day of the previous week.
As requested, some clarification. Like I mentioned above, I don't just want to drop all rows that aren't Fridays or Mondays, because some weeks are short weeks: the week could start on a Tuesday or end on a Thursday, and I don't want to lose those rows. What I'd like to end up with is a dataframe of rows that start on the first business day of each week and end on the last business day of that week, whether that is a Friday or Thursday, or a Monday or Tuesday. So the final dataset would look like this:
Notice how most weeks are Monday to Friday, however the 18th is a Tuesday because the 17th of that year was a holiday. I'm not looking to sync my calendar to holidays, I want to drop all the middle days between whatever business day started that week, and whatever business day ended that week. Hope that helps?
Thanks!
You can use the datetime object's dayofweek attribute to select rows and delete those based on the index.
import numpy as np
import pandas as pd
dates_df = pd.DataFrame(np.arange(np.datetime64('2000-01-03'), np.datetime64('2000-01-25')), columns=['date'])
dates_df.drop(dates_df[dates_df['date'].dt.dayofweek == 6].index)
The snippet above will drop all Sunday values.
But you can also select the data that matches the first or last day of the week instead of dropping it (dayofweek is 0 for Monday and 4 for Friday):
dates_df[(dates_df['date'].dt.dayofweek == 0) | (dates_df['date'].dt.dayofweek == 4)]
I've figured it out with the below function using the day of week numbers:
# Check if day of week is Monday
print('Checking for beginnings of weeks...')
df = df.reset_index() # Make the date index an actual column again for now
df['week_day_objects'] = pd.to_datetime(df['Date'], format='%Y-%m-%d').dt.dayofweek # make the dates a datetime object number
for i in range(len(df)-2, 1, -1): # start at the bottom of the DF and work backwards. Need to trim the top/bottom rows accordingly later.
    if ((df['week_day_objects'].iloc[i] < df['week_day_objects'].iloc[i-1] and df['week_day_objects'].iloc[i] < df['week_day_objects'].iloc[i+1]) or # A beginning of the week will always have a day of week number less than the day after it, and the day before it
        (df['week_day_objects'].iloc[i] > df['week_day_objects'].iloc[i-1] and df['week_day_objects'].iloc[i] > df['week_day_objects'].iloc[i+1])): # ...and a EOW will always have a number greater than the day before it, and the day after it.
        continue # if today is the start or end of the week, skip...
    else:
        df = df.drop([df.index[i]]) # ...else, drop all rows that aren't at the beginning/end of the week
df = df.set_index(['Date']) # make the date column the index again
df = df.drop(['week_day_objects'], axis=1) # drop the datetime column now
# For review
df.to_csv('./Check_Week_Days.csv', index=True)
So a start of week will always have a lower number than the previous row/day's number, and it will also be lower than tomorrow's number. Reverse that for the end of week. This makes it work no matter what the Start or End of week is, being a Thursday end, or a Tuesday start.
This loop doesn't start at the very top/bottom of the dataframe though leaving some cleanup to do, but I will write a separate code to take care of that.
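For comparison, here is a loop-free sketch of the same goal: label every row with its calendar week and keep the first and last trading day of each week. It is not the approach used above, just one possible alternative. The data is synthetic (business days in January 2022 with one Monday removed to mimic a holiday), and `week_edges` is an illustrative name.

```python
import pandas as pd

# Synthetic trading calendar: business days, with one Monday removed as a "holiday"
idx = pd.bdate_range("2022-01-03", "2022-01-28")
idx = idx[idx != pd.Timestamp("2022-01-17")]
df = pd.DataFrame({"Close": range(len(idx))}, index=idx)
df.index.name = "Date"

# Label each row with its calendar week (weekly periods ending on Sunday)
week = df.index.to_period("W")

# A row is a week edge if it is the first or the last row seen for its week
is_first = ~week.duplicated(keep="first")
is_last = ~week.duplicated(keep="last")
week_edges = df[is_first | is_last]
print(week_edges)
```

Because the first and last rows of the frame are also the first and last rows of their weeks, this version needs no separate cleanup for the top and bottom of the dataframe. It assumes the index is sorted by date.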
From daily stock price data, I want to sample and select the end-of-month price. I am doing this with the following code.
import datetime
from pandas_datareader import data as pdr
import pandas as pd
end = datetime.date.today()
begin=end-pd.DateOffset(365*2)
st=begin.strftime('%Y-%m-%d')
ed=end.strftime('%Y-%m-%d')
data = pdr.get_data_yahoo("AAPL",st,ed)
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-1])).set_index(data.index)
The line above selects end of the month data and here is the output.
If I want to select penultimate value of the month, I can do it using the following code.
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-2]))
Here is the output.
However, the index shows the end-of-month date. When I choose the penultimate value of the month, I want the index to be 2015-12-30 instead of 2015-12-31.
Please suggest the way forward. I hope my question is clear.
Thanking you in anticipation.
Regards,
Abhishek
I am not sure if there is a way to do it with resample. But you can get what you want using groupby with pd.Grouper (older pandas versions called this pd.TimeGrouper, which has since been removed).
import datetime
from pandas_datareader import data as pdr
import pandas as pd
end = datetime.date.today()
begin = end - pd.DateOffset(365*2)
st = begin.strftime('%Y-%m-%d')
ed = end.strftime('%Y-%m-%d')
data = pdr.get_data_yahoo("AAPL",st,ed)
data['Date'] = data.index
mon_data = (
data[['Date', 'Adj Close']]
.groupby(pd.Grouper(freq='M')).nth(-2)
.set_index('Date')
)
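The same selection can also be sketched without relying on `nth`, whose returned index differs between pandas versions: count each row's position from the end of its month and keep the rows at position 1. The price data below is synthetic (an assumption in place of the Yahoo download), and `penultimate` is an illustrative name.

```python
import pandas as pd

# Synthetic daily data on business days, standing in for the downloaded prices
idx = pd.bdate_range("2015-11-02", "2015-12-31")
data = pd.DataFrame({"Adj Close": range(len(idx))}, index=idx)

# Label each row with its month, then count rows backwards within each month:
# 0 is the last row of the month, 1 is the penultimate row, and so on
month = data.index.to_period("M")
pos_from_end = data.groupby(month).cumcount(ascending=False)

# Keep the penultimate row of each month; the index keeps the real dates
penultimate = data[pos_from_end == 1]
print(penultimate)
```

With this data the December row is indexed 2015-12-30 rather than 2015-12-31, which is the behaviour asked for in the question. The sketch assumes the frame is sorted by date.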
The simplest solution is to take the index of your newly created dataframe and subtract the number of days you want to go back:
n = 1
mon_data=pd.DataFrame(data['Adj Close'].resample('M').apply(lambda x: x[-1-n]))
mon_data.index = mon_data.index - datetime.timedelta(days=n)
Also, looking at your data, I think you should resample not to 'month end frequency' but rather to 'business month end frequency':
.resample('BM')
But even that won't cover everything. For instance, December 29, 2017 is a business month end, but this date doesn't appear in your data (which ends on December 08, 2017). So you could add a small fix for that (assuming the original data is sorted by date):
end_of_months = mon_data.index.tolist()
end_of_months[-1] = data.index[-1]
mon_data.index = end_of_months
So the full code will look like:
n = 1
mon_data=pd.DataFrame(data['Adj Close'].resample('BM').apply(lambda x: x[-1-n]))
end_of_months = mon_data.index.tolist()
end_of_months[-1] = data.index[-1]
mon_data.index = end_of_months
mon_data.index = mon_data.index - datetime.timedelta(days=n)
By the way, your .set_index(data.index) throws an error because data and mon_data have different lengths (mon_data is grouped by month).
In section 3 of the lecture, I ran into a problem: I could not load any finance data from Yahoo, so I used pandas-datareader to download stock data for Microsoft. Here is the code:
MS = data.DataReader(name = "MSFT", data_source = "yahoo", start = "2007-07-10", end = "2008-12-10")
MS.head()
I got this:
Open High Low Close Adj Close Volume
Date
2007-07-10 29.700001 29.990000 29.180000 29.330000 22.709541 66013500
2007-07-11 29.240000 29.650000 29.209999 29.490000 22.833429 48017000
2007-07-12 29.559999 30.110001 29.440001 30.070000 23.282511 54302400
2007-07-13 29.940001 30.020000 29.660000 29.820000 23.088938 42173000
2007-07-16 29.760000 30.240000 29.719999 30.030001 23.251535 4802320
I got a data frame with a date column as the index.
When I plot it with .plot(), it works.
But how can I use plt.plot() to plot the pandas time data, or how can I convert the pandas time data into a format that matplotlib can read? Thanks.
If your Date column holds those date strings, try this:
MS['date'] = pd.to_datetime(MS['date'])
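In the frame shown in the question, Date is already the index, so a minimal sketch (assuming the index holds datetime values) can pass `MS.index` straight to matplotlib. The five Close values are copied from the output above; the `Agg` backend line is only there so the sketch runs without a display and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the question's output, rebuilt by hand
MS = pd.DataFrame(
    {"Close": [29.33, 29.49, 30.07, 29.82, 30.030001]},
    index=pd.to_datetime(
        ["2007-07-10", "2007-07-11", "2007-07-12", "2007-07-13", "2007-07-16"]
    ),
)
MS.index.name = "Date"

# matplotlib accepts a DatetimeIndex directly as the x values
fig, ax = plt.subplots()
ax.plot(MS.index, MS["Close"])
ax.set_xlabel("Date")
ax.set_ylabel("Close")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
# plt.show()  # uncomment in an interactive session
```

If the index turned out to hold strings rather than datetimes, `MS.index = pd.to_datetime(MS.index)` would convert it before plotting.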