Download Historic NSE Futures Data - python

I need to download NSE Futures data going back to 2012 for backtesting my strategy. I tried the NSEpy and jugaad-data libraries, but they return only one day's data at a time.
I tried Getbhavcopy as well, but its data is not accurate.
Is there any other free source for downloading this data?
Thanks,
Mohit

You can get it as follows:
from datetime import timedelta, date
from nsepy import get_history

def importdata(stock):
    stock_fut = get_history(symbol=stock,
                            start=date.today() - timedelta(days=14),
                            end=date.today(),
                            futures=True,
                            expiry_date=date(2022, 11, 24))
    # print(stock_fut.columns)
    print(stock_fut[["Open", "Close", "Change in OI", "Open Interest"]])

a = ["AARTIIND", "ABB", "ABBOTINDIA", "ABCAPITAL", "ABFRL", "ACC", "ADANIENT"]
for symbol in a:
    print(symbol)
    importdata(symbol)
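Since get_history returns rows for one contract at a time, going back to 2012 means stitching the expiries together month by month. A rough sketch of that loop (assuming nsepy's get_expiry_date helper and near-month contracts only; expect it to be slow, since every call scrapes NSE):

from datetime import date
import pandas as pd
from nsepy import get_history
from nsepy.derivatives import get_expiry_date

frames = []
for year in range(2012, 2023):
    for month in range(1, 13):
        # get_expiry_date returns the expiry date(s) for that month;
        # take the last one for the near-month contract
        expiry = max(get_expiry_date(year=year, month=month))
        df = get_history(symbol="SBIN",
                         start=date(year, month, 1),
                         end=expiry,
                         futures=True,
                         expiry_date=expiry)
        frames.append(df)

history = pd.concat(frames)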

I've used NSEpy; it basically scrapes the NSE website. It's better to use an API that actually has the right to provide the data, e.g. the Samco or Angel One APIs.
They are free as well.

Related

Retrieve a lot of data from Yahoo finance

I have a CSV file which contains the ticker symbols for all the stocks listed on Nasdaq. Here is a link to that CSV file; one can download it from there. There are more than 8000 stocks listed. Following is the code:
import pandas as pd
import yfinance as yf  # pip install yfinance

tick_pd = pd.read_csv("/path/to/the/csv/file/nasdaq_screener_1654004691484.csv",
                      usecols=[0])
I have made a function which retrieves the historical stock prices for a ticker symbol. That function is as follows:
## function to be applied on each stock symbol
def appfunc(ticker):
    A = yf.Ticker(ticker).history(period="max")
    A["symbol"] = ticker
    return A
And I apply this function to each row of tick_pd, like this:
hist_prices = tick_pd.apply(appfunc)
But this takes way, way too much time. I was hoping someone could suggest a way to retrieve this data quickly, or a way to parallelize it. I am quite new to Python, so I don't really know many ways to do this.
Thanks in advance
You can use yf.download to download all tickers asynchronously:
tick_pd = pd.read_csv('nasdaq_screener_1654024849057.csv', usecols=[0])
df = yf.download(tick_pd['Symbol'].tolist(), period='max')
You can pass threads as a parameter to yf.download:
# Enable mass downloading (default is True)
df = yf.download(tick_pd['Symbol'].tolist(), period='max', threads=True)
# OR
# You can control the number of threads
df = yf.download(tick_pd['Symbol'].tolist(), period='max', threads=8)
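Note that yf.download returns a single wide DataFrame whose columns are a (field, ticker) MultiIndex, not one frame per ticker. If you want the long format that appfunc produced, one way (a sketch) is to stack the ticker level:

# Move the ticker level from columns to rows: one row per (Date, symbol)
long_df = (df.stack(level=1)
             .rename_axis(["Date", "symbol"])
             .reset_index())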

BeautifulSoup initialization type Error -- trouble troubleshooting

The error points to this line of the bs4 source code
I'm using a 3rd party module that depends on BeautifulSoup. I am using it to create DataFrames of NBA players' stats individually and then concat'ing them to make one large DataFrame. The list comp in the code below works for a few DFs but then errors out with TypeError: object of type 'NoneType' has no len()
Relevant code:
import pandas as pd
import requests
from PandasBasketball.stats import player_stats

dfs = [player_stats(requests.get(url), "per_minute") for url in full_player_urls[600:]]
all_stats = pd.concat(dfs)
all_stats[::500]
Things I've tried:
Checked that full_player_urls was generating correctly. It is; it's a list of URLs like: http://www.basketball-reference.com/players/b/burrobo01.html
Verified that player_stats() was working properly for URLs:
player_stats(requests.get('http://www.basketball-reference.com/players/b/bustida01.html'), "per_minute")
The above correctly yields a DataFrame generated from a table on that web page. This is working as intended.
My guess is that the site's server recognizes that you are making many requests in a short timeframe and at some point in the loop blocks you. There are a couple of things you could do. The simplest is to put a little time delay after each iteration. If that doesn't work, let me know, and we can fix that up a bit:
import pandas as pd
from PandasBasketball.stats import player_stats
import time
import random
import requests

dfs = []
for url in full_player_urls[600:]:
    dfs.append(player_stats(requests.get(url), "per_minute"))
    x = random.uniform(0, 10)
    time.sleep(x)

all_stats = pd.concat(dfs)
all_stats[::500]
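If the delay alone doesn't help, here is a slightly more defensive variant (a sketch: I'm assuming player_stats chokes on a blocked response, so check the status code first, back off once, and skip the URL otherwise):

import time
import random
import requests
import pandas as pd
from PandasBasketball.stats import player_stats

dfs = []
for url in full_player_urls[600:]:
    resp = requests.get(url)
    if resp.status_code != 200:
        # Probably rate-limited; back off once, then skip this URL
        time.sleep(60)
        resp = requests.get(url)
        if resp.status_code != 200:
            print("Skipping", url, "(HTTP %d)" % resp.status_code)
            continue
    dfs.append(player_stats(resp, "per_minute"))
    time.sleep(random.uniform(0, 10))

all_stats = pd.concat(dfs)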

How to pull Call/Put Option prices with pandas-datareader in Python?

I have the following code. I tried to pull the data from Yahoo and Google; neither works. It throws the message below:
from pandas_datareader.data import Options
fb_options = Options('TSLA', 'yahoo')
options_df = fb_options.get_options_data(expiry=fb_options.expiry_dates[0])
print(options_df.tail())
Error Message: Yahoo Options has been immediately deprecated due to large breaks in the API without the
introduction of a stable replacement. Pull Requests to re-enable these data
connectors are welcome.
Is there any other way to retrieve the options prices?
The following is working for me right now
import yfinance as yf  # https://github.com/ranaroussi/yfinance

aapl = yf.Ticker("AAPL")
# aapl.options  # list of expiry dates
DF_calls, DF_puts = aapl.option_chain(aapl.options[0])  # returns 2 DataFrames
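For example, once you have the chain you can filter it like any DataFrame (a sketch; column names such as strike and lastPrice are what current yfinance returns, so check them against your version):

# Current underlying price from recent history
spot = aapl.history(period="1d")["Close"].iloc[-1]

# Calls within +/- 5% of the spot price
near = DF_calls[DF_calls["strike"].between(spot * 0.95, spot * 1.05)]
print(near[["contractSymbol", "strike", "lastPrice", "bid", "ask"]])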
Alternative method:
from pandas_datareader.data import Options

aapl = Options('aapl')
calls = aapl.get_call_data()
Partial output (the leading columns are the frame's index: Strike, Expiry, Type, Symbol):
Strike  Expiry      Type  Symbol               Last    Bid     Ask     Chg       Vol   Open_Int  IV        Underlying_Price  Last_Trade_Date
200.0   2020-07-02  call  AAPL200702C00200000  164.50  151.60  156.20  0.000000  13.0  13.0      1.962891  353.63            2020-06-23 13:30:03
Yahoo ended support for their options API, and as such, the Yahoo options reader and get_options_data were deprecated in pandas_datareader 0.7.0 (marked for removal). Unfortunately, there are no other readers in pandas_datareader which provide options prices.
There are (to my knowledge) no free-to-use APIs for options data, other than TD Ameritrade (see this endpoint), though you must be a TD Ameritrade account holder to obtain access to their developer API (link).

HTML hidden elements

I'm trying to code a little "GPS", and I couldn't use the Google API because of its daily restriction.
I decided to use the site "viamichelin", which provides the distance between two addresses. I wrote a little script to build all the URLs I needed, like this:
import pandas
import numpy as np

df = pandas.read_excel('C:\Users\Bibi\Downloads\memoire\memoire.xlsx', sheet_name='Clients')
df2 = pandas.read_excel('C:\Users\Bibi\Downloads\memoire\memoire.xlsx', sheet_name='Agences')
matrix = df.as_matrix(columns=None)
clients = np.squeeze(np.asarray(matrix))
matrix2 = df2.as_matrix(columns=None)
agences = np.squeeze(np.asarray(matrix2))

compteagences = 0
comptetotal = 0
for j in agences:
    compteclients = 0
    for i in clients:
        print agences[compteagences]
        print clients[compteclients]
        url = 'https://fr.viamichelin.be/web/Itineraires?departure=' + agences[compteagences] + '&arrival=' + clients[compteclients] + '&arrivalId=34MTE1MnJ5ZmQwMDMzb3YxMDU1ZDFvbGNOVEF1TlRVNU5UUT1jTlM0M01qa3lOZz09Y05UQXVOVFl4TlE9PWNOUzQzTXpFNU5nPT1jTlRBdU5UVTVOVFE9Y05TNDNNamt5Tmc9PTBqUnVlIEZvbmQgZGVzIEhhbGxlcw==&index=0&vehicle=0&type=0&distance=km&currency=EUR&highway=false&toll=false&vignette=false&orc=false&crossing=true&caravan=false&shouldUseTraffic=false&withBreaks=false&break_frequency=7200&coffee_duration=1200&lunch_duration=3600&diner_duration=3600&night_duration=32400&car=hatchback&fuel=petrol&fuelCost=1.393&allowance=0&corridor=&departureDate=&arrivalDate=&fuelConsumption='
        print url
        compteclients += 1
        comptetotal += 1
    compteagences += 1
All my data is in Excel; that's why I used the pandas library. I now have all the URLs needed for my project.
However, I would like to extract the number of kilometers, and there's a little problem: the information I need is not in the page source, so I can't extract it with Python. The site is presented like this:
[screenshot: the ViaMichelin page]
When I click on "Inspect", I can find the information I need (on the left), but it is not in the page source (on the right). Can someone help me?
[screenshot: the itinerary view]
I have already tried this, without success:
import os
import csv
import requests
from bs4 import BeautifulSoup
requete = requests.get("https://fr.viamichelin.be/web/Itineraires?departure=Rue%20Lebeau%2C%20Liege%2C%20Belgique&departureId=34MTE1Mmc2NzQwMDM0NHoxMDU1ZW44d2NOVEF1TmpNek5ERT1jTlM0MU5qazJPQT09Y05UQXVOak16TkRFPWNOUzQxTnpBM01nPT1jTlRBdU5qTXpOREU9Y05TNDFOekEzTWc9PTBhUnVlIExlYmVhdQ==&arrival=Rue%20Rys%20De%20Mosbeux%2C%20Trooz%2C%20Belgique&arrivalId=34MTE1MnJ5ZmQwMDMzb3YxMDU1ZDFvbGNOVEF1TlRVNU5UUT1jTlM0M01qa3lOZz09Y05UQXVOVFl4TlE9PWNOUzQzTXpFNU5nPT1jTlRBdU5UVTVOVFE9Y05TNDNNamt5Tmc9PTBqUnVlIEZvbmQgZGVzIEhhbGxlcw==&index=0&vehicle=0&type=0&distance=km&currency=EUR&highway=false&toll=false&vignette=false&orc=false&crossing=true&caravan=false&shouldUseTraffic=false&withBreaks=false&break_frequency=7200&coffee_duration=1200&lunch_duration=3600&diner_duration=3600&night_duration=32400&car=hatchback&fuel=petrol&fuelCost=1.393&allowance=0&corridor=&departureDate=&arrivalDate=&fuelConsumption=")
page = requete.content
soup = BeautifulSoup(page, "html.parser")
print soup
Looking at the inspector for the page, the actual routing is done via a JavaScript invocation to this rather long URL.
The data you need seems to be in that response, starting from _scriptLoaded(. (Since it's a JavaScript object literal, you can use Python's built-in JSON library to load the data into a dict.)
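Roughly, that extraction could look like this (a sketch; routing_url stands for the long URL found in the inspector, which is not reproduced here):

import json
import requests

routing_url = "..."  # the long JavaScript endpoint found in the inspector
text = requests.get(routing_url).text

# The response is a JSONP-style call: _scriptLoaded({...});
start = text.index("_scriptLoaded(") + len("_scriptLoaded(")
end = text.rindex(")")
data = json.loads(text[start:end])
print(data.keys())  # then dig out the distance in kilometers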

Problems with Pandas DataReader and Yahoo

I was trying to get stock information as follows:
from pandas.io.data import DataReader
import datetime
data = DataReader("F", "yahoo", datetime.datetime(1990, 1, 1),datetime.datetime(2002, 1, 1))
which fails with
IOError: after 3 tries, Yahoo! did not return a 200 for url 'http://ichart.finance.yahoo.com/table.csv?s=C001.F&a=0&b=1&c=2014&d=11&e=1&f=2017&g=d&ignore=.csv'
Up to now, I could not find a fix for this issue or a suitable work-around. Do you guys have any suggestions?
It seems 'yahoo' is no longer supported. Try "morningstar" or "google".
The simple Yahoo financial link that worked for years is no longer supported.
I've heard of a workaround that involves browser spoofing (wget from the command line): it requires browser aliasing to obtain time-sensitive cookies that are then required for each request. But I've never tried it myself, since "morningstar" currently still works (though I miss Yahoo's adjusted close).
# (Python 3.6)
import pandas as pd
import pandas_datareader.data as web
...
df = web.DataReader('MSFT', 'morningstar')
for idx, row in df.iterrows():
    print(idx[1], row[0], row[1], row[2], row[3], row[4])
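For reference, the browser-spoofing workaround mentioned above might look roughly like this (a sketch I haven't verified; the download endpoint and its parameters are assumptions, and Yahoo has changed them repeatedly):

import requests

session = requests.Session()
session.headers["User-Agent"] = "Mozilla/5.0"  # pose as a browser

# Prime the session with Yahoo's cookies
session.get("https://finance.yahoo.com/quote/F/history")

# Hypothetical CSV endpoint; period1/period2 are Unix timestamps
# (here 1990-01-01 and 2002-01-01, matching the question)
resp = session.get(
    "https://query1.finance.yahoo.com/v7/finance/download/F",
    params={"period1": 631152000, "period2": 1009843200,
            "interval": "1d", "events": "history"},
)
print(resp.text[:200])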
