I am trying to create a list of all At-The-Money (ATM) option contracts using the yahoo_fin options module.
yahoo_fin offers two methods for getting all call and put contracts:
from yahoo_fin import options as ops
# ops.get_calls(Ticker, expiration_date=None)
# ops.get_puts(Ticker, expiration_date=None)
# If no expiration_date is passed, the nearest expiration date is used
ops.get_calls("aapl")
ops.get_puts("aapl")
These two methods return DataFrames of call and put contracts, respectively.
I have looked into using the strike price and comparing it with the underlying stock price. This is probably the most basic way, but the underlying stock may have a price that is not exactly equal to any option's strike price. Another alternative I have read about is to use delta. Can anybody provide insight into how I could find the ATM options using the data provided by yahoo_fin? Is it possible?
For ATM options the strike price is equal to the underlying asset’s current market price, as explained here.
However, options don't exist for every possible market price, since strikes are organized on a grid. You could select the option whose strike price is closest to the underlying's market price. You can implement it as:
from yahoo_fin import options, stock_info
symbol = "AAPL"
# Most recent adjusted close of the underlying (.iloc for positional access)
last_adj_close = stock_info.get_data(symbol)["adjclose"].iloc[-1]
calls = options.get_calls(symbol)
puts = options.get_puts(symbol)
# Keep the row whose strike is nearest to the underlying price
atm_call = calls.iloc[(calls["Strike"] - last_adj_close).abs().argsort()[:1]]
atm_put = puts.iloc[(puts["Strike"] - last_adj_close).abs().argsort()[:1]]
Output:
    Contract Name        Last Trade Date        Strike  Last Price  Bid  Ask   Change  % Change  Volume  Open Interest  Implied Volatility
43  AAPL221118C00149000  2022-11-16 3:59PM EST  149.0   1.58        1.5  1.66  -1.12   -41.48%   22594   14120          40.09%
given the following AAPL price data:
            open        high        low         close       adjclose    volume    ticker
2022-11-14  148.970001  150.279999  147.429993  148.279999  148.279999  73374100  AAPL
2022-11-15  152.220001  153.589996  148.559998  150.039993  150.039993  89868300  AAPL
You can also obtain the two closest options by changing the slice to .argsort()[:2].
Related
I have a pandas DataFrame of stock records. My goal is to pass in a particular day, e.g. 8, and get the DataFrame filtered to the 8th of each month and year in the dataset.
I have gone through some SO questions and managed to solve one part of my requirement: getting the records for a particular day. However, if there is no data for, say, the 8th in a particular month and year, I need to get the records for the closest day in that month and year for which a record exists.
For example, if I pass in 8 and there is no record for 8 Jan 2022, I need to check whether records exist for 7 and 9 Jan 2022, and so on, and get the record for the nearest date.
If records exist for both the 7th and the 9th, I want the record for the 9th (the later date).
If a record exists for the 7th but not for the 9th, I want the record for the 7th (the closest).
Code I have written so far:
filtered_df = data.loc[(data['Date'].dt.day == 8)]
If the dataset is required, please let me know. I tried to make this clear, but if anything is unclear, please ask. Any help in the right direction is appreciated.
Alternative 1
Resample to a daily resolution, selecting the nearest day to fill in missing values:
df2 = df.resample('D').nearest()
df2 = df2.loc[df2.index.day == 8]
Alternative 2
A more general method (and slightly faster) is to generate the dates/times of your choice, then use reindex() with method='nearest'. It is more general because you can use any series of timestamps you can come up with (not necessarily aligned with any frequency).
dates = pd.date_range(
start=df.first_valid_index().normalize(), end=df.last_valid_index(),
freq='D')
dates = dates[dates.day == 8]
df2 = df.reindex(dates, method='nearest')
Example
Let's start with a reproducible example:
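import yfinance as yf
# yf.download takes interval (not freq); auto_adjust=False keeps the 'Adj Close' column
df = yf.download(['AAPL', 'AMZN'], start='2022-01-01', end='2022-12-31', interval='1d', auto_adjust=False)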
>>> df.iloc[:10, :5]
Adj Close Close High
AAPL AMZN AAPL AMZN AAPL
Date
2022-01-03 180.959747 170.404495 182.009995 170.404495 182.880005
2022-01-04 178.663086 167.522003 179.699997 167.522003 182.940002
2022-01-05 173.910645 164.356995 174.919998 164.356995 180.169998
2022-01-06 171.007523 163.253998 172.000000 163.253998 175.300003
2022-01-07 171.176529 162.554001 172.169998 162.554001 174.139999
2022-01-10 171.196426 161.485992 172.190002 161.485992 172.500000
2022-01-11 174.069748 165.362000 175.080002 165.362000 175.179993
2022-01-12 174.517136 165.207001 175.529999 165.207001 177.179993
2022-01-13 171.196426 161.214005 172.190002 161.214005 176.619995
2022-01-14 172.071335 162.138000 173.070007 162.138000 173.779999
Now:
df2 = df.resample('D').nearest()
df2 = df2.loc[df2.index.day == 8]
>>> df2.iloc[:5, :5]
Adj Close Close High
AAPL AMZN AAPL AMZN AAPL
2022-01-08 171.176529 162.554001 172.169998 162.554001 174.139999
2022-02-08 174.042633 161.413498 174.830002 161.413498 175.350006
2022-03-08 156.730942 136.014496 157.440002 136.014496 162.880005
2022-04-08 169.323975 154.460495 170.089996 154.460495 171.779999
2022-05-08 151.597595 108.789001 152.059998 108.789001 155.830002
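Both alternatives pick the nearest row anywhere in the index, so when a month has no data near the 8th they can return a row from a neighbouring month. If the match must stay within the same calendar month, as the question asks, one option is to rank the rows inside each month. A minimal sketch, assuming a DatetimeIndex with at most one row per date; closest_day_per_month is a hypothetical helper name and the data is made up:

```python
import numpy as np
import pandas as pd


def closest_day_per_month(df: pd.DataFrame, day: int) -> pd.DataFrame:
    """Within each calendar month, keep the row whose day-of-month is closest to `day`.

    Ties (e.g. the 7th and 9th when day=8) go to the later date.
    """
    days = np.asarray(df.index.day)
    ranked = df.assign(_dist=np.abs(days - day), _day=days)
    # Closest first; on equal distance, the later day comes first
    ranked = ranked.sort_values(["_dist", "_day"], ascending=[True, False])
    # First row of each (year, month) group in that ranked order
    first = ranked.groupby([ranked.index.year, ranked.index.month]).head(1)
    return first.drop(columns=["_dist", "_day"]).sort_index()


idx = pd.to_datetime(["2022-01-07", "2022-01-09", "2022-01-20",
                      "2022-02-07", "2022-02-10", "2022-03-08"])
df = pd.DataFrame({"close": [7.0, 9.0, 20.0, 27.0, 30.0, 38.0]}, index=idx)
print(closest_day_per_month(df, 8))
```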
Warning
Replacing a missing day with data from the future (which is what happens when the nearest day is after the missing one) is called peeking ahead, and it can introduce look-ahead bias into any quant research that uses that data. It is usually considered dangerous. You'd be safer using method='ffill'.
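To see the difference between the two methods, here is a tiny made-up frame with no row for 2022-01-08. pandas documents that ties under method='nearest' are broken in favour of the larger index value, so the 9th wins over the 7th (which happens to match the question's tie rule), while method='ffill' only ever uses the most recent row on or before each target date:

```python
import pandas as pd

# Made-up closes with no row for 2022-01-08
df = pd.DataFrame(
    {"close": [7.0, 9.0, 27.0]},
    index=pd.to_datetime(["2022-01-07", "2022-01-09", "2022-02-08"]),
)
dates = pd.date_range(start=df.first_valid_index().normalize(),
                      end=df.last_valid_index(), freq="D")
dates = dates[dates.day == 8]

# 'nearest': the 7th and 9th are equally close, and the later one is used
nearest = df.reindex(dates, method="nearest")
# 'ffill': only data available on or before each date is used (no peeking ahead)
ffill = df.reindex(dates, method="ffill")
print(nearest)
print(ffill)
```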
I am trying to get TTM (trailing twelve months) values of the income statement for the ticker symbol AAPL using:
from yahoo_fin import stock_info as si
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import pandas_datareader
pd.set_option('display.max_columns', None)
income_statement = si.get_income_statement("aapl")
income_statement
but the result only shows the yearly values, not the TTM values.
On Yahoo Finance, however, the TTM values are shown as well.
Anyone who can help?
The reason you don't get the same table you see on the website is the way yahoo_fin gets its data from Yahoo. Rather than reading the table you see, it reads the JSON data that Yahoo provides. This data contains both quarterly and yearly income statements. When Yahoo renders the table on its website, it most likely uses the yearly data for the yearly columns and sums the last four quarterly results to get the TTM column (TTM results are simply the sum of the last four quarterly results).
If you want the TTM data, the best approach is probably to do it the way I assume Yahoo does: get the quarterly data using yahoo_fin and then sum the quarters to calculate the TTM results. You can get the quarterly data by setting the optional yearly parameter to False:
quarterly_income_statement = si.get_income_statement("aapl", yearly=False)
You can look at their _parse_json method to better understand how yahoo_fin gets and parses data from Yahoo (assuming you have some knowledge of requests and JSON).
Summing the data
To get the sum of the quarters you can for example do this:
quarterly_income_statement = si.get_income_statement("aapl", yearly=False)
ttm = quarterly_income_statement.sum(axis=1)
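This gives you a new Series, ttm, indexed by the same line items, holding the TTM values (you can test it and see whether it matches the numbers on the website). Note that sum(axis=1) adds up every quarter column that is returned, so check that there are exactly four.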
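If more than four quarters come back, a sketch that sums only the four most recent ones is below. It assumes the quarterly statement has line items as rows and quarter-end dates as columns (an assumption worth checking against your yahoo_fin version); ttm_from_quarters is a hypothetical helper and the numbers are made up:

```python
import pandas as pd


def ttm_from_quarters(quarterly: pd.DataFrame) -> pd.Series:
    """Sum the four most recent quarter columns of a statement (rows = line items)."""
    # Assumes columns are quarter-end dates (or ISO date strings), so sorting is chronological
    latest = sorted(quarterly.columns, reverse=True)[:4]
    if len(latest) < 4:
        raise ValueError("need at least four quarters to compute TTM")
    return quarterly[latest].sum(axis=1)


# Made-up statement with five quarters
quarters = pd.to_datetime(["2021-09-25", "2021-12-25", "2022-03-26",
                           "2022-06-25", "2022-09-24"])
q = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0],
                  [10.0, 20.0, 30.0, 40.0, 50.0]],
                 index=["totalRevenue", "netIncome"], columns=quarters)
print(ttm_from_quarters(q))
```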
Is there any way I can use ccxt to get the price of a cryptocurrency at a given time in the past?
Example: get the price of BTC on Binance at 2018-01-24 11:20:01
You can use the fetch_ohlcv method of the binance class in CCXT:
def fetch_ohlcv(self, symbol, timeframe='1m', since=None, limit=None, params={}):
You'll need the date as a timestamp in milliseconds (in UTC). Prices are only available to the minute, so drop the seconds; otherwise you'll get the price for the following minute:
timestamp = int(datetime.datetime.strptime("2018-01-24 11:20:00+00:00", "%Y-%m-%d %H:%M:%S%z").timestamp() * 1000)
You can only get the price of BTC relative to another currency. We'll use USDT (which closely tracks USD) as the quote currency, so we look up the price of BTC in the BTC/USDT market.
When we call the method, we set since to your timestamp and limit to 1, so that we get only one candle.
import datetime
from pprint import pprint
import ccxt
print('CCXT Version:', ccxt.__version__)
exchange = ccxt.binance()
# 2018-01-24 11:20:00 UTC as a millisecond timestamp
timestamp = int(datetime.datetime.strptime("2018-01-24 11:20:00+00:00", "%Y-%m-%d %H:%M:%S%z").timestamp() * 1000)
# Fetch a single 1-minute candle starting at that timestamp
response = exchange.fetch_ohlcv('BTC/USDT', '1m', timestamp, 1)
pprint(response)
This returns the OHLCV values of a single one-minute candle:
[
1516792860000, // timestamp: start of the candle's minute, in milliseconds (UTC)
11110, // open: first price of that minute
11110.29, // high: highest price during that minute
11050.91, // low: lowest price during that minute
11052.27, // close: last price of that minute
39.882601 // volume traded during that minute
]
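The timestamp handling can be checked offline. A small sketch: parse the time as UTC, drop the seconds, and label a returned candle. The helper names minute_start_ms and candle_to_dict are hypothetical, not part of CCXT; only the [timestamp, open, high, low, close, volume] row layout comes from the output above.

```python
from datetime import datetime, timezone

OHLCV_FIELDS = ("timestamp", "open", "high", "low", "close", "volume")


def minute_start_ms(when: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC and return the start of that minute in ms."""
    dt = datetime.strptime(when, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    dt = dt.replace(second=0, microsecond=0)  # drop the seconds
    return int(dt.timestamp() * 1000)


def candle_to_dict(candle: list) -> dict:
    """Label one OHLCV row and add its start time as a UTC datetime."""
    row = dict(zip(OHLCV_FIELDS, candle))
    row["time_utc"] = datetime.fromtimestamp(row["timestamp"] / 1000, tz=timezone.utc)
    return row


print(minute_start_ms("2018-01-24 11:20:01"))
print(candle_to_dict([1516792860000, 11110, 11110.29, 11050.91, 11052.27, 39.882601]))
```

The value from minute_start_ms could then be passed as the since argument, e.g. exchange.fetch_ohlcv('BTC/USDT', '1m', minute_start_ms("2018-01-24 11:20:01"), 1).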
You can do that with CCXT's unified fetchOHLCV method:
https://docs.ccxt.com/en/latest/manual.html#ohlcv-candlestick-charts
https://docs.ccxt.com/en/latest/manual.html#ohlcv-structure
We highly recommend reading the entire CCXT Manual from top to bottom, it will really save your time:
https://docs.ccxt.com/
https://github.com/ccxt/ccxt/wiki
https://github.com/ccxt/ccxt/wiki/Manual
Also, check out the examples here:
https://github.com/ccxt/ccxt/tree/master/examples
https://github.com/ccxt/ccxt/tree/master/examples#see-also
Two dataframes:
Dataframe 'prices' contains minute pricing.
ts average
2017-12-13 15:55:00-05:00 339.389
2017-12-13 15:56:00-05:00 339.293
2017-12-13 15:57:00-05:00 339.172
2017-12-13 15:58:00-05:00 339.148
2017-12-13 15:59:00-05:00 339.144
Dataframe 'articles' contains articles:
ts title
2017-10-25 11:45:00-04:00 Your Evening Briefing
2017-11-24 14:15:00-05:00 Tesla's Grand Designs Distract From Model 3 Bo...
2017-10-26 11:09:00-04:00 UAW Files Claim That Tesla Fired Workers Who S...
2017-10-25 11:42:00-04:00 Forget the Grid of the Future, Puerto Ricans J...
2017-10-22 09:54:00-04:00 Tesla Reaches Deal for Shanghai Facility, WSJ ...
When an article is published, I want the average stock price at that moment (easy), plus the stock price at the end of that day (the problem).
My current approach:
articles['t-eod'] = prices.loc[articles.index.strftime('%Y-%m-%d')[0]].between_time('15:30','15:31')
However, it gives a warning:
/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
Reading the docs didn't make it a lot clearer to me.
So my question: how can I get, for every article, the last average price of that day from prices?
Thanks!
/Maurice
You could try using idxmax on ts to identify the index of the latest timestamp of each date, and then extract the average value with loc.
import pandas as pd
# Move the 'ts' index into a regular column
prices_df.reset_index(inplace=True)
articles_df.reset_index(inplace=True)
# Ensure our ts field is datetime
prices_df['ts'] = pd.to_datetime(prices_df['ts'])
articles_df['ts'] = pd.to_datetime(articles_df['ts'])
# For each date, take the row with the latest timestamp (the last price of that day)
last_idx = prices_df.groupby(prices_df['ts'].dt.date)['ts'].idxmax()
df_max = prices_df.loc[last_idx].copy()
# We need to join df_max and articles on the date, so we make a date index
df_max['date'] = df_max['ts'].dt.date
articles_df['date'] = articles_df['ts'].dt.date
df_max.set_index('date', inplace=True)
articles_df.set_index('date', inplace=True)
# Attach the end-of-day average price to every article of that date
articles_df['max'] = df_max['average']
articles_df.set_index('ts', inplace=True)
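A more compact variant of the same idea uses groupby(...).last() instead of idxmax and keeps both frames on their timestamp index. A sketch, assuming both frames are indexed by tz-aware timestamps (as in the question) and prices has an "average" column; attach_eod_price and eod_average are hypothetical names, and the data is made up:

```python
import pandas as pd


def attach_eod_price(prices: pd.DataFrame, articles: pd.DataFrame) -> pd.DataFrame:
    """Add the last 'average' price of each article's calendar date."""
    prices = prices.sort_index()
    # Last observed average per calendar date (dates taken in the index's own timezone)
    eod = prices.groupby(prices.index.date)["average"].last()
    out = articles.copy()
    # Look up each article's date; dates without price data become NaN
    out["eod_average"] = pd.Series(out.index.date, index=out.index).map(eod)
    return out


tz = "America/New_York"
prices = pd.DataFrame(
    {"average": [339.148, 339.144, 340.0]},
    index=pd.to_datetime(["2017-12-13 15:58", "2017-12-13 15:59",
                          "2017-12-14 10:00"]).tz_localize(tz),
)
articles = pd.DataFrame(
    {"title": ["Made-up headline A", "Made-up headline B"]},
    index=pd.to_datetime(["2017-12-13 11:00", "2017-12-15 09:00"]).tz_localize(tz),
)
print(attach_eod_price(prices, articles))
```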
In Python, I can get monthly stock prices from Yahoo Finance as follows:
import pandas_datareader.data as web
data = web.get_data_yahoo('IBM','01/01/2016',interval='m')
I tried to get monthly stock prices from Google Finance, but daily stock prices are returned:
data = web.get_data_google('IBM','2016')
How can I get monthly stock prices from Google Finance in Python? Thanks in advance.
According to the documentation, this is indeed possible.
EDIT: The example below, despite being part of the documentation, does not seem to work. I have found that the following works as an alternative:
import pandas_datareader.data as web
quotes = web.DataReader('NYSE:IBM', 'google')
Where 'NYSE:IBM' is the exchange code and ticker separated by a colon, and 'google' is the pricing source.
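If a source only returns daily bars, one option is to aggregate them to monthly bars yourself with pandas resample. A sketch, assuming the daily frame has a DatetimeIndex and Open/High/Low/Close/Volume columns (the column names are an assumption; only those present are aggregated); daily_to_monthly is a hypothetical helper and the data is made up:

```python
import pandas as pd


def daily_to_monthly(daily: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily OHLCV bars (DatetimeIndex) into monthly bars."""
    rules = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
    # Only aggregate the columns that are actually present
    rules = {col: how for col, how in rules.items() if col in daily.columns}
    # 'MS' labels each monthly bar with the first day of the month
    return daily.resample("MS").agg(rules)


daily = pd.DataFrame(
    {"Open": [1.0, 2.0, 3.0], "High": [5.0, 6.0, 7.0], "Low": [0.5, 1.0, 2.0],
     "Close": [2.0, 3.0, 4.0], "Volume": [10, 20, 30]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-02-01"]),
)
print(daily_to_monthly(daily))
```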
OLD (NON-WORKING) EXAMPLE:
In [42]: import pandas_datareader.data as web
In [43]: q = web.get_quote_google(['AMZN', 'GOOG'])
In [44]: q
Out[44]:
change_pct last time
AMZN 0.01 780.22 2016-11-25 13:00:00
GOOG 0.09 761.68 2016-11-25 13:00:00