Python schedule to run a function on a specific day between time periods

I've got a function that accesses an API to check for train data at specific times. This is actually run 3 times for each journey, so I'd need to run each of the 3 at specific times.
I've tried using the schedule module to get this going but I can't seem to get it working. Here's my current code:
schedule.every().day.at("07:30").every(5).minutes.do(darwinChecker(train_station['home_station'], train_station['connect_station'], user_time['morning_time']))
But I get an AttributeError: 'Job' object has no attribute 'every'. The documentation states this happens if your code imports the wrong schedule module, but I've no other files under that name.
How would I go about running my function, say, every Friday from 07:30 till 08:40, every 5 minutes?
Edit: As per request, added my full code for what I'm trying to do:
import requests
import re
import schedule
import time
from darwin_token import DARWIN_KEY

jsonToken = DARWIN_KEY
train_station = {'work_station': 'bat', 'home_station': 'man', 'connect_station': 'wds'}
user_time = {'morning_time': ['0821', '0853'], 'evening_time': ['1733'], 'connect_time': ['0834', '0843']}

def darwinChecker(departure_station, arrival_station, user_time):
    response = requests.get("https://huxley.apphb.com/all/" + str(departure_station) + "/to/" + str(arrival_station) + "/" + str(user_time), params={"accessToken": jsonToken})
    response.raise_for_status()  # this makes an error if something failed
    data1 = response.json()
    train_service = data1["trainServices"]
    print('Departure Station: ' + str(data1.get('crs')))
    print('Arrival Station: ' + str(data1.get('filtercrs')))
    print('-' * 40)
    try:
        found_service = 0  # keeps track of services so note is generated if service not in user_time
        for index, service in enumerate(train_service):
            if service['sta'].replace(':', '') in user_time:  # replaces sta time with values in user_time
                found_service += 1  # increments for each service in user_time
                print('Service RSID: ' + str(train_service[index]['rsid']))
                print('Scheduled arrival time: ' + str(train_service[index]['sta']))
                print('Scheduled departure time: ' + str(train_service[index]['std']))
                print('Status: ' + str(train_service[index]['eta']))
                print('-' * 40)
                if service['eta'] == 'Cancelled':
                    # print('The ' + str(train_service[index]['sta']) + ' service is cancelled.')
                    print('Previous train departure time: ' + str(train_service[index - 1]['sta']))
                    print('Previous train status: ' + str(train_service[index - 1]['eta']))
        if found_service == 0:  # if no service is found
            print('The services currently available are not specified in user_time.')
    except TypeError:
        print('There is no train service data')
    try:
        # print('\nNRCC Messages: ' + str(data1['nrccMessages'][0]['value']))
        NRCCRegex = re.compile(r'^(.*?)[\.!\?](?:\s|$)')  # regex pulls all characters until hitting a . or ! or ?
        myline = NRCCRegex.search(data1['nrccMessages'][0]['value'])  # regex searches through nrccMessages
        print('\nNRCC Messages: ' + myline.group(1))  # prints parsed NRCC message
    except (TypeError, AttributeError) as error:  # tuple catches multiple errors, AttributeError for None value
        print('There is no NRCC data currently available\n')

print('Morning Journey'.center(50, '='))
darwinChecker(train_station['home_station'], train_station['connect_station'], user_time['morning_time'])
# schedule.every().day.at("21:50").do()
# schedule.every(2).seconds.do(darwinChecker, train_station['home_station'], train_station['connect_station'], user_time['morning_time'])
schedule.every().day.at("07:30").every(5).minutes.do(darwinChecker, train_station['home_station'], train_station['connect_station'], user_time['morning_time'])

while True:
    schedule.run_pending()
    time.sleep(1)

# print('Connection Journey'.center(50, '='))
# darwinChecker(train_station['connect_station'], train_station['work_station'], user_time['connect_time'])
# print('Evening Journey'.center(50, '='))
# darwinChecker(train_station['work_station'], train_station['home_station'], user_time['evening_time'])
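For what it's worth, schedule's chained API cannot combine .day.at() with .every(5).minutes: at() returns a Job object, which has no 'every' attribute, hence the AttributeError above. A minimal workaround sketch (reusing darwinChecker, train_station and user_time from the script above) is to schedule a guard function every 5 minutes and have it return early outside the Friday 07:30-08:40 window:

import datetime

def guarded_check():
    # Run darwinChecker only on Fridays between 07:30 and 08:40
    now = datetime.datetime.now()
    if now.weekday() != 4:  # Monday is 0, so 4 is Friday
        return
    if not (datetime.time(7, 30) <= now.time() <= datetime.time(8, 40)):
        return
    darwinChecker(train_station['home_station'], train_station['connect_station'], user_time['morning_time'])

schedule.every(5).minutes.do(guarded_check)  # the guard itself filters day and time

while True:
    schedule.run_pending()
    time.sleep(1)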

Related

How can I send a sms in Django?

I encountered a problem when trying to send SMS using the SMSC service in a Django project.
My Celery task for sending email and sms:
def order_created_retail(order_id):
    # Task to send an email and an SMS when an order is successfully created
    order = OrderRetail.objects.get(id=order_id)
    subject = 'Order №{}.'.format(order_id)
    message_mail = 'Hello, {}! You have successfully placed an order {}. Manager will contact you shortly'.format(order.first_name, order.id)
    message_sms = 'Your order №{} is accepted! Wait for operator call'
    mail_sent = send_mail(
        subject,
        message_mail,
        'email@email.com',
        [order.email]
    )
    smsc = SMSC()
    sms_sent = smsc.send_sms(
        [order.phone],
        str(message_sms)
    )
    return mail_sent, sms_sent
Email sends correctly, but for sms I get that error:
Task orders.tasks.order_created_retail[f05458b1-65e8-493b-9069-fbaa55083e7a] raised unexpected: TypeError('quote_from_bytes() expected bytes')
function from SMSC library:
def send_sms(self, phones, message, translit=0, time="", id=0, format=0, sender=False, query=""):
    formats = ["flash=1", "push=1", "hlr=1", "bin=1", "bin=2", "ping=1", "mms=1", "mail=1", "call=1", "viber=1", "soc=1"]
    m = self._smsc_send_cmd("send", "cost=3&phones=" + quote(phones) + "&mes=" + quote(message) + \
        "&translit=" + str(translit) + "&id=" + str(id) + ifs(format > 0, "&" + formats[format-1], "") + \
        ifs(sender == False, "", "&sender=" + quote(str(sender))) + \
        ifs(time, "&time=" + quote(time), "") + ifs(query, "&" + query, ""))
    # (id, cnt, cost, balance) or (id, -error)
    if SMSC_DEBUG:
        if m[1] > "0":
            print("Message sent successfully. ID: " + m[0] + ", SMS count: " + m[1] + ", cost: " + m[2] + ", balance: " + m[3])
        else:
            print("Error #" + m[1][1:] + ifs(m[0] > "0", ", ID: " + m[0], ""))
    return m
What am I doing wrong?
Thanks!
To solve this problem, I started investigating the functions that were raising the error.
It turned out that I was passing an incorrect value: the function expects a string. It also took me a long time to figure out why my edits didn't seem to help.
It turns out that you have to RESTART CELERY every time you make an edit, because the workers keep running the old task code until they are restarted.
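For reference, a hedged sketch of the corrected call, on the assumption (suggested by the quote_from_bytes error) that SMSC's quote() receives the phones value directly and therefore needs a string rather than a list:

# pass phones as one comma-separated string, not a list
smsc = SMSC()
sms_sent = smsc.send_sms(
    ','.join(str(p) for p in [order.phone]),  # a str, as quote() expects
    message_sms.format(order.id)  # also fills the '{}' placeholder, which the original never did
)

Then restart the worker, e.g. with celery -A yourproject worker -l info (the project name is a placeholder).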

Python telegram bot cooldown function

I'm stuck on my code because the bot isn't responding the way I want.
I want the /price command to have a cooldown: after someone uses /price, the bot should either not respond for the next 30 minutes or prompt people to wait until the cooldown is over, so that it's less spammy in Telegram groups when many people issue /price at the same time.
def price(update, context):
    if chat_data.get('last_time'):
        if datetime.now() - last_time <= my_threshold:
            return
    chat_data['last_time'] = datetime.now()
    update.message.reply_text(
        'Bitcoin PRICE TRACKER \n'
        '🤑 Price: $ ' + usd + '\n'
        '📈 Marketcap: $ ' + usdcap + '\n'
        '💸 24 Hour Volume: $ ' + usdvol + '\n'
        '💵 24 Hour Change: % ' + usdchange + '\n'
        '⚙️ Last Updated at: ' + lastat + '\n'
    )

updater = Updater('mytoken', use_context=True)
updater.dispatcher.add_handler(CommandHandler('price', price))
What about something django-rest style, like a decorator?
import datetime

throttle_data = {
    'minutes': 30,
    'last_time': None
}

def throttle(func):
    def wrapper(*args, **kwargs):
        now = datetime.datetime.now()
        delta = now - datetime.timedelta(minutes=throttle_data.get('minutes', 30))
        last_time = throttle_data.get('last_time')
        if not last_time:
            last_time = delta
        if last_time <= delta:
            throttle_data['last_time'] = now
            func(*args, **kwargs)
        else:
            return not_allowed(*args)
    return wrapper

def not_allowed(update, context):
    update.message.reply_text(text="You are not allowed.")

@throttle
def price(update, context):
    update.message.reply_text(
        'Bitcoin PRICE TRACKER \n'
        '🤑 Price: $ ' + usd + '\n'
        '📈 Marketcap: $ ' + usdcap + '\n'
        '💸 24 Hour Volume: $ ' + usdvol + '\n'
        '💵 24 Hour Change: % ' + usdchange + '\n'
        '⚙️ Last Updated at: ' + lastat + '\n'
    )

updater = Updater('mytoken', use_context=True)
updater.dispatcher.add_handler(CommandHandler('price', price))
Of course the throttle state is reset every time you restart the bot.
This, by the way, will spam "You are not allowed." on every throttled call, so if a lot of users are being throttled you may want to change the not_allowed function to drop the reply.
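If you need the cooldown per chat rather than global, a hedged variant (assuming the same python-telegram-bot v12-style handlers as above) keyed on update.effective_chat.id:

import datetime

chat_last_time = {}  # chat_id -> datetime of the last allowed call

def throttle_per_chat(func):
    def wrapper(update, context):
        now = datetime.datetime.now()
        chat_id = update.effective_chat.id
        last = chat_last_time.get(chat_id)
        if last and now - last < datetime.timedelta(minutes=30):
            return  # silently ignore while this chat is cooling down
        chat_last_time[chat_id] = now
        return func(update, context)
    return wrapper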

How to Handle Exceptions Caused by Holidays and Weekends in Python

I'm using an API to look up historical stock market prices for a given company on the last day of each month. The problem is that the last day can sometimes fall on a weekend or holiday, in which case the API returns a KeyError. I've tried using an exception handler that shifts the date to get the next-closest valid one, but this is not foolproof.
Here is my existing code:
import os
from iexfinance.stocks import get_historical_data
import iexfinance
import pandas as pd

# Set API Keys
os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'
os.environ['IEX_TOKEN'] = 'Tsk_5798c0ab124d49639bb1575b322841c4'

stocks = ['AMZN', 'FDX', 'XXXXX', 'BAC', 'COST']
date = "20191130"

for stock in stocks:
    try:
        price_df = get_historical_data(stock, date, close_only=True, output_format='pandas')
        price = price_df['close'].values[0]
        print(price)
    except KeyError:
        date = str(int(date) - 1)
        price_df = get_historical_data(stock, date, close_only=True, output_format='pandas')
        price = price_df['close'].values[0]
        print(price)
    except iexfinance.utils.exceptions.IEXQueryError:
        print(stock + " is not a valid company")
But if you change date = "20160131", then you get a KeyError again.
So is there a simple way to handle these exceptions and get the next-valid date?
Note that the API key is public and for sandbox purposes, so feel free to use it.
I think this might work:
def get_prices(stocks, date):
    for stock in stocks:
        try:
            price_df = get_historical_data(stock, date, close_only=True, output_format='pandas')
            price = price_df['close'].values[0]
            print(stock + " was @ $" + str(price) + " on " + str(date))
        except KeyError:
            return get_prices(stocks, date=str(int(date) - 1))
        except iexfinance.utils.exceptions.IEXQueryError:
            print(stock + " is not a valid company")

Python multiprocessing, functions with arguments

I have a program that simulates an entire baseball season, but does a lot of calculations per game, so each game takes around 30 seconds to run. With 2430 games in a season, the program takes about 20 hours to run, per season. Obviously I'd like to speed this up, so the most immediate solution seems like multiprocessing. I could manually split it up into groups of ~600 and run four processes, but I'd like to figure out how the multiprocessing module works.
Here's what I've tried so far, but obviously it doesn't work.
def test_func():
    algorithm_selection = 1
    # Create sqlite database connection
    conn = sqlite3.connect('C:/F5 Prediction Engine/sqlite3/Version 2/statcast_db.db')
    c = conn.cursor()
    season = input('Year to simulate: ')
    c.execute('SELECT * FROM gamelogs_' + season)
    season_games = c.fetchall()
    game_num = 0
    for game in season_games:
        game_num = game_num + 1
        # Get away lineup in terms of MLB IDs
        away_lineup = ConvertLineup(game[105], game[108], game[111], game[114], game[117], game[120], game[123], game[126], game[129])
        # Get home lineup in terms of MLB IDs
        home_lineup = ConvertLineup(game[132], game[135], game[138], game[141], game[144], game[147], game[150], game[153], game[156])
        # Get away starting pitcher and hand in terms of MLB ID
        away_pitcher_results = GetPitcherIDandHand(game[101])
        away_pitcher_id = away_pitcher_results[0][0]
        away_pitcher_hand = away_pitcher_results[0][1]
        # Get home starting pitcher and hand in terms of MLB ID
        home_pitcher_results = GetPitcherIDandHand(game[103])
        home_pitcher_id = home_pitcher_results[0][0]
        home_pitcher_hand = home_pitcher_results[0][1]
        # Get the date of the game
        today_date = game[0]
        if algorithm_selection == 1:
            # Check if the current game has already been evaluated and entered into the database
            c.execute('SELECT * FROM pemstein_results_' + season + ' WHERE date = "' + game[0] + '" AND away_team = "' + game[3] + '" AND home_team = "' + game[6] + \
                '" AND away_team_score = "' + game[9] + '" AND home_team_score = "' + game[10] + '"')
            check_results = c.fetchall()
            if len(check_results) == 0:
                exp_slgs = PemsteinSimulation(home_pitcher_id, away_pitcher_id, season, home_pitcher_hand, away_pitcher_hand, home_lineup, away_lineup, game[0])
                if exp_slgs[2] == 0:  # if both pitchers had at least 300 PAs to use for simulation
                    c.execute([long string to insert results into database])
                    conn.commit()
                    print('Game ' + str(game_num) + ' finished.')
                if exp_slgs[2] == 1:  # if one of the pitchers did not have enough PAs to qualify
                    c.execute([long string to insert results into database])
                    conn.commit()
                    print('Game ' + str(game_num) + ' finished.')
            if len(check_results) > 0:
                print('Game ' + str(game_num) + ' has already been evaluated.')

from multiprocessing import Process
import os

processes = []
for i in range(0, os.cpu_count()):
    print('Registering process %d' % i)
    processes.append(Process(target=test_func))
for process in processes:
    process.start()
for process in processes:
    process.join()
==================
Edit: new code
# Child process
def simulate_games(games_list, counter, lock):
    while(1):
        # Create sqlite database connection
        conn = sqlite3.connect('C:/F5 Prediction Engine/sqlite3/Version 2/statcast_db.db')
        c = conn.cursor()
        # Acquire the lock which grants access to the shared variable
        with lock:
            # Check the termination condition
            if counter.value >= len(games_list):
                break
            # Get the game_num and game to simulate
            game_num = counter.value
            game_to_simulate = games_list[counter.value]
            # Update the counter for the next process
            counter.value += 1
        # Do simulation (game_num comes from the shared counter above)
        # Get away lineup in terms of MLB IDs
        away_lineup = ConvertLineup(game_to_simulate[105], game_to_simulate[108], game_to_simulate[111], game_to_simulate[114], game_to_simulate[117], game_to_simulate[120], game_to_simulate[123], game_to_simulate[126], game_to_simulate[129])
        # Get home lineup in terms of MLB IDs
        home_lineup = ConvertLineup(game_to_simulate[132], game_to_simulate[135], game_to_simulate[138], game_to_simulate[141], game_to_simulate[144], game_to_simulate[147], game_to_simulate[150], game_to_simulate[153], game_to_simulate[156])
        # Get away starting pitcher and hand in terms of MLB ID
        away_pitcher_results = GetPitcherIDandHand(game_to_simulate[101])
        away_pitcher_id = away_pitcher_results[0][0]
        away_pitcher_hand = away_pitcher_results[0][1]
        # Get home starting pitcher and hand in terms of MLB ID
        home_pitcher_results = GetPitcherIDandHand(game_to_simulate[103])
        home_pitcher_id = home_pitcher_results[0][0]
        home_pitcher_hand = home_pitcher_results[0][1]
        # Get the date of the game
        today_date = game_to_simulate[0]
        if algorithm_selection == 1:
            # Check if the current game has already been evaluated and entered into the database
            c.execute('SELECT * FROM pemstein_results_' + season + ' WHERE date = "' + game_to_simulate[0] + '" AND away_team = "' + game_to_simulate[3] + '" AND home_team = "' + game_to_simulate[6] + \
                '" AND away_team_score = "' + game_to_simulate[9] + '" AND home_team_score = "' + game_to_simulate[10] + '"')
            check_results = c.fetchall()
            if len(check_results) == 0:
                exp_slgs = PemsteinSimulation(home_pitcher_id, away_pitcher_id, season, home_pitcher_hand, away_pitcher_hand, home_lineup, away_lineup, game_to_simulate[0])
                if exp_slgs[2] == 0:  # if both pitchers had at least 300 PAs to use for simulation
                    c.execute('long sql')
                    conn.commit()
                    print('Game ' + str(game_num) + ' finished.')
                if exp_slgs[2] == 1:  # if one of the pitchers did not have enough PAs to qualify
                    c.execute('long sql')
                    conn.commit()
                    print('Game ' + str(game_num) + ' finished.')
            if len(check_results) > 0:
                print('Game ' + str(game_num) + ' has already been evaluated.')

if __name__ == "__main__":
    # Create sqlite database connection
    conn = sqlite3.connect('C:/F5 Prediction Engine/sqlite3/Version 2/statcast_db.db')
    c = conn.cursor()
    # Query all games for the season to be simulated
    season = int(input('Year to simulate: '))
    c.execute('SELECT * FROM gamelogs_' + str(season))
    season_games = c.fetchall()
    algorithmSelection = 1
    if algorithmSelection == 1:
        PemsteinSQLresults(str(season))
    counter = mp.Value('i', 0)
    lock = mp.Lock()
    children = []
    for i in range(os.cpu_count()):
        children.append(mp.Process(target=simulate_games, args=(season_games, counter, lock)))
    for child in children:
        child.start()
    for child in children:
        child.join()
Error:
Traceback (most recent call last):
  File "C:\F5 Prediction Engine\Version 2\SimulateSeason v2.py", line 126, in <module>
    child.start()
  File "C:\Python\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Python\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Python\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Python\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Python\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
=============
So I went to this website to review some things, and tried a new script with the following code that I copied from the site:

import multiprocessing as mp

def worker(num):
    """thread worker function"""
    print('Worker:' + num)
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = mp.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

But it likewise doesn't do anything. The site says it should print Worker:0, Worker:1, etc., but I'm getting no prints. Is it possible there's something wrong locally on my machine?
It seems to me that you have simply instantiated a new process for each CPU and had them all run the same function you wrote at first; if you want to work with processes, you have to adapt that function and handle process synchronization.
As an example, you could have a master process which prompts the user for the season year and fetches all the games for that year, while the child processes read from the resulting array. See the following example:
# Parent process
import multiprocessing as mp

# establish db connection [ ... ]
season = int(input("Year to simulate: "))
c.execute('SELECT * FROM gamelogs_' + season)
season_games = c.fetchall()

counter = mp.Value("i", 0)
lock = mp.Lock()
children = []
for i in range(os.cpu_count()):
    children.append(mp.Process(target=simulate_games, args=(season_games, counter, lock,)))
for child in children:
    child.start()
for child in children:
    child.join()

# Child process
def simulate_games(games_list, counter, lock):
    while(1):
        # Acquire the lock which grants access to the shared variable
        with lock:
            # Check the termination condition
            if counter.value >= len(games_list):
                break
            # Get the game_num and the game to simulate
            game_num = counter.value
            game_to_simulate = games_list[counter.value]
            # Update counter for the next process
            counter.value += 1
        # Do simulation here
What we have above is a parent process which prepares some data and creates new child processes.
The counter is implemented by means of a special class, Value, which is used for sharing scalar values among processes; Lock is basically a mutex, which we use to synchronize access to the counter variable and avoid concurrent updates. Note that you could have used the Lock which is automatically created inside the counter shared variable, but I thought it would be easier to understand by separating the two.
The child processes first acquire the lock, read the counter value and increment it, then proceed with their normal behavior, thus simulating the games.
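For anyone testing the pattern in isolation, here is a self-contained toy version of the same counter-and-lock work queue (toy items instead of game rows; note the __main__ guard, which the spawn start method used on Windows requires):

import multiprocessing as mp
import os

def worker(items, counter, lock):
    while True:
        with lock:
            # claim the next unprocessed index, or stop if none remain
            if counter.value >= len(items):
                break
            i = counter.value
            counter.value += 1
        print('process %d handled item %r' % (os.getpid(), items[i]))

if __name__ == '__main__':
    items = list(range(10))
    counter = mp.Value('i', 0)
    lock = mp.Lock()
    procs = [mp.Process(target=worker, args=(items, counter, lock)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()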

Issues scraping EV/EBITDA, Sale of Purchase Stock & Net Borrowings from Yahoo Finance

I pulled a Python script off of GitHub which is intended to analyze and rank stocks. I finally got it running, but unfortunately the EV/EBITDA and Shareholder Yield are populating their default values, 1000 and 0 respectively.
I've spent the last few days attempting to troubleshoot, learning a lot in the process, but have unfortunately had no luck. I think it's attempting to extract data from a nonexistent line in the Scraper portion, or referencing incorrect HTML. I'll paste the two code snippets I think the error may lie within, though the rest of the files are linked above.
Main File
from sys import stdout
from Stock import Stock
import Pickler
import Scraper
import Rankings
import Fixer
import Writer

# HTML error code handler - importing data is a chore, and getting a connection
# error halfway through is horribly demotivating. Use a pickler to serialize
# imported data into a hot-startable database.
pklFileName = 'tmpstocks.pkl'
pickler = Pickler.Pickler()

# Check if a pickled file exists. Load it if the user requests. If no file
# loaded, stocks is an empty list.
stocks = pickler.loadPickledFile(pklFileName)

# Scrape data from FINVIZ. Certain presets have been established (see direct
# link for more details)
url = 'http://finviz.com/screener.ashx?v=152&f=cap_smallover&' + \
    'ft=4&c=0,1,2,6,7,10,11,13,14,45,65'
html = Scraper.importHtml(url)

# Parse the HTML for the number of pages from which we'll pull data
nPages = -1
for line in html:
    if line[0:40] == '<option selected="selected" value=1>Page':
        # Find indices
        b1 = line.index('/') + 1
        b2 = b1 + line[b1:].index('<')
        # Number of pages containing stock data
        nPages = int(line[b1:b2])
        break

# Parse data from table on the first page of stocks and store in the database,
# but only if no data was pickled
if pickler.source == Pickler.PickleSource.NOPICKLE:
    Scraper.importFinvizPage(html, stocks)

# The first page of stocks (20 stocks) has been imported. Now import the
# rest of them
source = Pickler.PickleSource.FINVIZ
iS = pickler.getIndex(source, 1, nPages + 1)

for i in range(iS, nPages + 1):
    try:
        # Print dynamic progress message
        print('Importing FINVIZ metrics from page ' + str(i) + ' of ' + \
            str(nPages) + '...', file=stdout, flush=True)
        # Scrape data as before
        url = 'http://finviz.com/screener.ashx?v=152&f=cap_smallover&ft=4&r=' + \
            str(i*20+1) + '&c=0,1,2,6,7,10,11,13,14,45,65'
        html = Scraper.importHtml(url)
        # Import stock metrics from page into a buffer
        bufferList = []
        Scraper.importFinvizPage(html, bufferList)
        # If no errors encountered, extend buffer to stocks list
        stocks.extend(bufferList)
    except:
        # Error encountered. Pickle stocks for later loading
        pickler.setError(source, i, stocks)
        break

# FINVIZ stock metrics successfully imported
print('\n')

# Store number of stocks in list
nStocks = len(stocks)

# Handle pickle file
source = Pickler.PickleSource.YHOOEV
iS = pickler.getIndex(source, 0, nStocks)

# Grab EV/EBITDA metrics from Yahoo! Finance
for i in range(iS, nStocks):
    try:
        # Print dynamic progress message
        print('Importing Key Statistics for ' + stocks[i].tick +
            ' (' + str(i) + '/' + str(nStocks - 1) + ') from Yahoo! Finance...', \
            file=stdout, flush=True)
        # Scrape data from Yahoo! Finance
        url = 'http://finance.yahoo.com/q/ks?s=' + stocks[i].tick + '+Key+Statistics'
        html = Scraper.importHtml(url)
        # Parse data
        for line in html:
            # Check no value
            if 'There is no Key Statistics' in line or \
                    'Get Quotes Results for' in line or \
                    'Changed Ticker Symbol' in line or \
                    '</html>' in line:
                # Non-financial file (e.g. mutual fund) or
                # Ticker not located or
                # End of html page
                stocks[i].evebitda = 1000
                break
            elif 'Enterprise Value/EBITDA' in line:
                # Line contains EV/EBITDA data
                evebitda = Scraper.readYahooEVEBITDA(line)
                stocks[i].evebitda = evebitda
                break
    except:
        # Error encountered. Pickle stocks for later loading
        pickler.setError(source, i, stocks)
        break

# Yahoo! Finance EV/EBITDA successfully imported
print('\n')

# Handle pickle file
source = Pickler.PickleSource.YHOOBBY
iS = pickler.getIndex(source, 0, nStocks)

# Grab BBY metrics from Yahoo! Finance
for i in range(iS, nStocks):
    try:
        # Print dynamic progress message
        print('Importing Cash Flow for ' + stocks[i].tick +
            ' (' + str(i) + '/' + str(nStocks - 1) + ') from Yahoo! Finance...', \
            file=stdout, flush=True)
        # Scrape data from Yahoo! Finance
        url = 'http://finance.yahoo.com/q/cf?s=' + stocks[i].tick + '&ql=1'
        html = Scraper.importHtml(url)
        # Parse data
        totalBuysAndSells = 0
        for line in html:
            # Check no value
            if 'There is no Cash Flow' in line or \
                    'Get Quotes Results for' in line or \
                    'Changed Ticker Symbol' in line or \
                    '</html>' in line:
                # Non-financial file (e.g. mutual fund) or
                # Ticker not located or
                # End of html page
                break
            elif 'Sale Purchase of Stock' in line:
                # Line contains Sale/Purchase of Stock information
                totalBuysAndSells = Scraper.readYahooBBY(line)
                break
        # Calculate BBY as a percentage of current market cap
        bby = round(-totalBuysAndSells / stocks[i].mktcap * 100, 2)
        stocks[i].bby = bby
    except:
        # Error encountered. Pickle stocks for later loading
        pickler.setError(source, i, stocks)
        break

# Yahoo! Finance BBY successfully imported
if not pickler.hasErrorOccurred:
    # All data imported
    print('\n')
    print('Fixing screener errors...')
    # A number of stocks may have broken metrics. Fix these (i.e. assign out-of-
    # bounds values) before sorting
    stocks = Fixer.fixBrokenMetrics(stocks)

    print('Ranking stocks...')
    # Calculate shareholder Yield
    for i in range(nStocks):
        stocks[i].shy = stocks[i].div + stocks[i].bby

    # Time to rank! Lowest value gets 100
    rankPE = 100 * (1 - Rankings.rankByValue([o.pe for o in stocks]) / nStocks)
    rankPS = 100 * (1 - Rankings.rankByValue([o.ps for o in stocks]) / nStocks)
    rankPB = 100 * (1 - Rankings.rankByValue([o.pb for o in stocks]) / nStocks)
    rankPFCF = 100 * (1 - Rankings.rankByValue([o.pfcf for o in stocks]) / nStocks)
    rankEVEBITDA = 100 * (1 - Rankings.rankByValue([o.evebitda for o in stocks]) / nStocks)
    # Shareholder yield ranked with highest getting 100
    rankSHY = 100 * (Rankings.rankByValue([o.shy for o in stocks]) / nStocks)
    # Rank total stock valuation
    rankStock = rankPE + rankPS + rankPB + rankPFCF + rankEVEBITDA + rankSHY
    # Rank 'em
    rankOverall = Rankings.rankByValue(rankStock)
    # Calculate Value Composite - higher the better
    valueComposite = 100 * rankOverall / len(rankStock)
    # Reverse indices - lower index -> better score
    rankOverall = [len(rankStock) - 1 - x for x in rankOverall]
    # Assign to stocks
    for i in range(nStocks):
        stocks[i].rank = rankOverall[i]
        stocks[i].vc = round(valueComposite[i], 2)

    print('Sorting stocks...')
    # Sort all stocks by normalized rank
    stocks = [x for (y, x) in sorted(zip(rankOverall, stocks))]

    # Sort top decile by momentum factor. O'Shaughnessey historically uses 25
    # stocks to hold. The top decile is printed, and the user may select the top 25
    # (or any n) from the .csv file.
    dec = int(nStocks / 10)
    topDecile = []
    # Store temporary momentums from top decile for sorting reasons
    moms = [o.mom for o in stocks[:dec]]
    # Sort top decile by momentum
    for i in range(dec):
        # Get index of top momentum performer in top decile
        topMomInd = moms.index(max(moms))
        # Sort
        topDecile.append(stocks[topMomInd])
        # Remove top momentum performer from further consideration
        moms[topMomInd] = -100

    print('Saving stocks...')
    # Save momentum-weighted top decile
    topCsvPath = 'top.csv'
    Writer.writeCSV(topCsvPath, topDecile)
    # Save results to .csv
    allCsvPath = 'stocks.csv'
    Writer.writeCSV(allCsvPath, stocks)

    print('\n')
    print('Complete.')
    print('Top decile (sorted by momentum) saved to: ' + topCsvPath)
    print('All stocks (sorted by trending value) saved to: ' + allCsvPath)
Scraper
import re
from urllib.request import urlopen
from Stock import Stock

def importHtml(url):
    "Scrapes the HTML file from the given URL and returns line break delimited \
    strings"
    response = urlopen(url, data=None)
    html = response.read().decode('utf-8').split('\n')
    return html

def importFinvizPage(html, stocks):
    "Imports data from a FINVIZ HTML page and stores in the list of Stock \
    objects"
    isFound = False
    for line in html:
        if line[0:15] == '<td height="10"':
            isFound = True
            # Import data line into stock database
            _readFinvizLine(line, stocks)
        if isFound and len(line) < 10:
            break
    return

def _readFinvizLine(line, stocks):
    "Imports stock metrics from the data line and stores it in the list of \
    Stock objects"
    # Parse html
    (stkraw, dl) = _parseHtml(line)
    # Create new stock object
    stock = Stock()
    # Get ticker symbol
    stock.tick = stkraw[dl[1] + 1 : dl[2]]
    # Get company name
    stock.name = stkraw[dl[2] + 1 : dl[3]]
    # Get market cap multiplier (either MM or BB)
    if stkraw[dl[4] - 1] == 'B':
        capmult = 1000000000
    else:
        capmult = 1000000
    # Get market cap
    stock.mktcap = capmult * _toFloat(stkraw[dl[3] + 1 : dl[4] - 1])
    # Get P/E ratio
    stock.pe = _toFloat(stkraw[dl[4] + 1 : dl[5]])
    # Get P/S ratio
    stock.ps = _toFloat(stkraw[dl[5] + 1 : dl[6]])
    # Get P/B ratio
    stock.pb = _toFloat(stkraw[dl[6] + 1 : dl[7]])
    # Get P/FCF ratio
    stock.pfcf = _toFloat(stkraw[dl[7] + 1 : dl[8]])
    # Get Dividend Yield
    stock.div = _toFloat(stkraw[dl[8] + 1 : dl[9] - 1])
    # Get 6-mo Relative Price Strength
    stock.mom = _toFloat(stkraw[dl[9] + 1 : dl[10] - 1])
    # Get Current Stock Price
    stock.price = _toFloat(stkraw[dl[11] + 1 : dl[12]])
    # Append stock to list of stocks
    stocks.append(stock)
    return

def _toFloat(line):
    "Converts a string to a float. Returns NaN if the line can't be converted"
    try:
        num = float(line)
    except:
        num = float('NaN')
    return num

def readYahooEVEBITDA(line):
    "Returns EV/EBITDA data from Yahoo! Finance HTML line"
    # Parse html
    (stkraw, dl) = _parseHtml(line)
    for i in range(0, len(dl)):
        if stkraw[dl[i] + 1 : dl[i] + 24] == 'Enterprise Value/EBITDA':
            evebitda = stkraw[dl[i + 1] + 1 : dl[i + 2]]
            break
    return _toFloat(evebitda)

def readYahooBBY(line):
    "Returns total buys and sells from Yahoo! Finance HTML line. Result will \
    still need to be divided by market cap"
    # Line also contains Borrowings details - Remove it all
    if 'Net Borrowings' in line:
        # Remove extra data
        line = line[:line.find('Net Borrowings')]
    # Trim prior data
    line = line[line.find('Sale Purchase of Stock'):]
    # Determine if buys or sells, replace open parentheses:
    # (#,###) -> -#,###
    line = re.sub(r'[(]', '-', line)
    # Eliminate commas and close parentheses: -#,### -> -####
    line = re.sub(r'[,|)]', '', line)
    # Remove HTML data and markup, replacing with commas
    line = re.sub(r'[<.*?>|]', ',', line)
    line = re.sub(' ', ',', line)
    # Locate the beginnings of each quarterly Sale Purchase point
    starts = [m.start() for m in re.finditer(r',\d+,|,.\d+', line)]
    # Locate the ends of each quarterly Sale Purchase point
    ends = [m.start() for m in re.finditer(r'\d,', line)]
    # Sum all buys and sells across the year
    tot = 0
    for i in range(0, len(starts)):
        # x1000 because all numbers are in thousands
        tot = tot + float(line[starts[i] + 1 : ends[i] + 1]) * 1000
    return tot

def _parseHtml(line):
    "Parses the HTML line by </td> breaks and returns the delimited string"
    # Replace </td> breaks with placeholder, '`'
    ph = '`'
    rem = re.sub('</td>', ph, line)
    # The ticker symbol initial delimiter is different
    # Remove all other remaining HTML data
    stkraw = re.sub('<.*?>', '', rem)
    # Replace unbalanced HTML
    stkraw = re.sub('">', '`', stkraw)
    # Find the placeholders
    dl = [m.start() for m in re.finditer(ph, stkraw)]
    return (stkraw, dl)
If anyone has any input, or perhaps a better method such as BeautifulSoup, I'd really appreciate it! I'm very open to any tutorials that would help as well. My intent is both to better my programming ability and to have an effective stock screener.
I was having the same issue scraping the Yahoo data in Python, and in Matlab as well. As a workaround, I wrote a macro in VBA to grab all of the EV/EBITDA data from Yahoo by visiting each stock's Key Statistics page. However, it takes about a day to run on all 3,000+ stocks with market caps over $200M, which is not really practical.
I've tried finding the EV/EBITDA on various stock screeners online, but they either don't report it or only let you download a couple hundred stocks' data without paying. Busy Stock's screener seems the best in this regard, but their EV/EBITDA figures don't line up with Yahoo's, which worries me that they're using a different methodology.
One solution and my recommendation to you is to use the Trending Value algorithm in Quantopian, which is free. You can find the code here: https://www.quantopian.com/posts/oshaugnessy-what-works-on-wall-street
Quantopian will let you backtest the algorithm to 2002, and live test it as well.
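For the BeautifulSoup route the asker mentions, here is a hedged sketch of the generic "find the value next to a label in an HTML table" task (the URL and markup are placeholders; Yahoo's current pages are JavaScript-rendered and may not expose these values in static HTML at all):

import requests
from bs4 import BeautifulSoup

def read_labelled_value(url, label):
    # Return the text of the cell following the <td> whose text contains label
    html = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text
    soup = BeautifulSoup(html, 'html.parser')
    for td in soup.find_all('td'):
        if label in td.get_text():
            sibling = td.find_next_sibling('td')
            return sibling.get_text(strip=True) if sibling else None
    return None

# e.g. read_labelled_value('http://finance.yahoo.com/q/ks?s=AMZN+Key+Statistics', 'Enterprise Value/EBITDA')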
