I'm trying to pass a list (teamurls) from one function to another, which I will then use in another program. I have my program working where a value is output using yield (yield full_team_urls).
How do I pass the list from the first function into the second (def team_urls)? Also, is it still possible to return the list and continue using yield as well?
Can each function only return or output one object?
Edit: I tried to pass teamurls into the second function as shown below, and I get the error: TypeError: team_urls() missing 1 required positional argument: 'teamurls'
from datetime import date
import requests
from bs4 import BeautifulSoup

def table():
    url = 'https://www.skysports.com/premier-league-table'
    base_url = 'https://www.skysports.com'
    today = str(date.today())
    premier_r = requests.get(url)
    print(premier_r.status_code)
    premier_soup = BeautifulSoup(premier_r.text, 'html.parser')
    headers = "Position, Team, Pl, W, D, L, F, A, GD, Pts\n"
    premier_soup_tr = premier_soup.find_all('tr', {'class': 'standing-table__row'})
    premier_soup_th = premier_soup.find_all('thead')
    f = open('premier_league_table.csv', 'w')
    f.write("Table as of {}\n".format(today))
    f.write(headers)
    premier_soup_tr = premier_soup.find_all('tr', {'class': 'standing-table__row'})
    result = [[r.text.strip() for r in td.find_all('td', {'class': 'standing-table__cell'})][:-1] for td in premier_soup_tr[1:]]
    teamurls = [a.find("a", href=True)["href"] for a in premier_soup_tr[1:]]
    return teamurls
    for item in result:
        f.write(",".join(item))
        f.write("\n")
    f.close()
    print('\n Premier league teams full urls:\n')
    for item in teamurls:
        entire_team = []
        # full_team_urls.append(base_url + item)
        full_team_urls = base_url + item + '-squad'
        yield full_team_urls

table()

def team_urls(teamurls):
    teams = [i.strip('/') for i in teamurls]
    print(teams)

team_urls()
To pass a value into a function, use arguments:
team_urls(teamurls)
You'll need to declare that argument in the definition of team_urls, like so:
def team_urls(teamurls):
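Putting both pieces together, here is a minimal sketch (the scraped hrefs are replaced by hypothetical placeholders). Note that a function containing yield is a generator: a return inside it only stops iteration, it does not hand the list back to the caller, so collect the yielded values with list() and pass that list in as the argument:

def table():
    # placeholder for the hrefs scraped from the real page
    teamurls = ['/football/teams/arsenal', '/football/teams/chelsea']
    base_url = 'https://www.skysports.com'
    for item in teamurls:
        yield base_url + item + '-squad'

def team_urls(teamurls):
    teams = [i.strip('/') for i in teamurls]
    print(teams)

full_team_urls = list(table())  # collect everything the generator yields
team_urls(full_team_urls)       # pass the list as the positional argument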
The ultimate goal of this is to output selected data columns to a .csv. I had it working once, but it only got the first table on the page and I needed both. I'm quite new to Python and don't know how I got to this point in the first place. I need both the call and the put tables, but on the web page the calls come first, and when I used .find I only got the calls. I'm working on this with a friend, and he wrote the last two functions. He could get the columns I wanted, but now we only get the calls. I tried to fix it, and now it raises the error in the title.
import bs4
import requests
import pandas as pd
import csv
from bs4 import BeautifulSoup

#sets desired ticker. in the future you could make this list longer
def ticker():
    ticker = ['GME', 'NYMT']
    return ticker

#creates list of urls for the scraper to grab
def ticker_site():
    ticker_site = ['https://finance.yahoo.com/quote/' + x + '/options?p=' + x for x in ticker()]
    return ticker_site

optionRows = []
for i in range(len(ticker_site())):
    optionRows.append([])

def ticker_gets():
    option_page = ticker_site()
    requested_page = requests.get(option_page[i])
    ticker_soup = BeautifulSoup(requested_page.text, 'html.parser')
    return ticker_soup

def soup_search():
    table = ticker_gets()
    both_tables = table.find_all('table')
    call_table = both_tables[0]
    put_table = both_tables[1]
    call_rows = call_table.find('tr')
    put_rows = put_table.find('tr')

    #makes the call table
    for call in call_rows:
        whole_call_table = call.find_all('td')
        call_row = [y.text for y in whole_call_table]
        optionRows[call].append(call_row)

    #makes the put table
    for put in put_rows:
        whole_put_table = put.find_all('td')
        put_row = [z.text for z in whole_put_table]
        optionRows[put].append(put_row)

    for i in range(len(optionRows)):
        optionRows[i] = optionRows[i][1:len(optionRows[i])]

    return optionRows

def getColumns(columnIndexes=[2, 4, 5]):
    newList = []
    for tickerIndex in range(len(soup_search())):
        newList.append([])
        indexCount = 0
        for j in soup_search()[tickerIndex]:
            newList[tickerIndex].append([])
            for i in columnIndexes:
                newList[tickerIndex][indexCount].append(j[i])
            indexCount += 1
    return newList

def csvOutputer():
    rows = getColumns()
    fields = ["Ticker", "Strike", "Bid", "Ask"]
    with open('newcsv', 'w') as f:
        write = csv.writer(f)
        write.writerow(fields)
        for i in range(len(ticker())):
            for j in rows[i]:
                j.insert(0, ticker()[i])
                write.writerow(j)

csvOutputer()
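The question's error text isn't reproduced above, but one likely culprit is optionRows[call].append(call_row): call is a BeautifulSoup tag, and indexing a Python list with a tag raises a TypeError. Also, find('tr') returns only the first row, whereas find_all('tr') returns them all. Here is a minimal, hypothetical sketch of collecting both tables by position, assuming the calls and puts are still the first two <table> elements on the options page:

from bs4 import BeautifulSoup

def parse_option_tables(html):
    """Split the page into call rows and put rows by table position."""
    soup = BeautifulSoup(html, 'html.parser')
    tables = soup.find_all('table')  # calls table first, puts table second
    results = []
    for table in tables[:2]:
        rows = table.find_all('tr')  # find_all('tr'), not find('tr'), to get every row
        # skip the header row and keep the text of each cell
        results.append([[td.text for td in row.find_all('td')] for row in rows[1:]])
    return results  # [call_rows, put_rows]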
I'm creating a web scraper that will be used to value stocks. The problem is that my code returns an object reference (not sure what it should be called) instead of the value.
import requests

class Guru():
    MedianPE = 0.0

    def __init__(self, ticket):
        self.ticket = ticket
        try:
            url = ("https://www.gurufocus.com/term/pettm/" + ticket + "/PE-Ratio-TTM/")
            response = requests.get(url)
            htmlText = response.text
            firstSplit = htmlText
            secondSplit = firstSplit.split("And the <strong>median</strong> was <strong>")[1]
            thirdSplit = secondSplit.split("</strong>")[0]
            lastSplit = float(thirdSplit)
            try:
                Guru.MedianPE = lastSplit
            except:
                print(ticket + ": Median PE N/A")
        except:
            print(ticket + ": Median PE N/A")

    def getMedianPE(self):
        return float(Guru.getMedianPE)

g1 = Guru("AAPL")
g1.getMedianPE
print("Median + " + str(g1))
If I print lastSplit inside __init__, it prints the value I want (15.53), but when I try to get it through getMedianPE I just get Median + <__main__.Guru object at 0x0000016B0760D288>.
Thanks a lot for your time!
Looks like you are trying to cast a function object to a float. Simply change return float(Guru.getMedianPE) to return float(Guru.MedianPE)
getMedianPE is a method (a function that belongs to a class), so you need to call it with parentheses. If you call it without parentheses, you get the method object itself rather than the result of calling it.
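For instance, with the corrected return statement (the 15.53 value is the one reported in the question):

>>> g1 = Guru("AAPL")
>>> g1.getMedianPE      # no parentheses: the bound method object
<bound method Guru.getMedianPE of <__main__.Guru object at 0x...>>
>>> g1.getMedianPE()    # with parentheses: the method is actually called
15.53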
The other problem is that getMedianPE returns the function Guru.getMedianPE rather than the value Guru.MedianPE. You probably don't want MedianPE to be a class variable; just set it as a default of 0 in __init__ so that each object has its own median_PE value.
Also, it is not a good idea to include all of the scraping code in your __init__ method. That should be moved to a scrape() method (or some other name) that you call after instantiating the object.
Finally, if you are going to print an object, it is useful to have a __str__ method, so I added a basic one here.
So putting all of those comments together, here is a recommended refactor of your code.
import requests

class Guru():
    def __init__(self, ticket, median_PE=0):
        self.ticket = ticket
        self.median_PE = median_PE

    def __str__(self):
        return f'{self.ticket} {self.median_PE}'

    def scrape(self):
        try:
            url = f"https://www.gurufocus.com/term/pettm/{self.ticket}/PE-Ratio-TTM/"
            response = requests.get(url)
            htmlText = response.text
            firstSplit = htmlText
            secondSplit = firstSplit.split("And the <strong>median</strong> was <strong>")[1]
            thirdSplit = secondSplit.split("</strong>")[0]
            lastSplit = float(thirdSplit)
            self.median_PE = lastSplit
        # split()[1] raises IndexError when the marker text is missing,
        # float() raises ValueError when the captured text isn't a number
        except (IndexError, ValueError):
            print(f"{self.ticket}: Median PE N/A")
Then you run the code
>>> g1 = Guru("AAPL")
>>> g1.scrape()
>>> print(g1)
AAPL 15.53
I have a problem with passing a variable from one method to another within a given class.
The code is this (I am a practicing beginner):
import requests
from bs4 import BeautifulSoup

class Calendar():
    def __init__(self, link):
        self.link = link
        self.request = requests.get(link)
        self.request.encoding = 'UTF-8'
        self.soup = BeautifulSoup(self.request.text, 'lxml')

    def DaysMonth(self):
        Dates = []
        tds = self.soup.findAll('td', {'class': 'action'})
        for td in tds:
            check = td.findAll('a')[0].text
            if "Víkendová odstávka" in check:
                date = td.findAll('span')[0].text
                Dates.append(date)
        return Dates

    def PrintCal(self):
        return ['Víkendová odstávka serverů nastane ' + date + '. den v měsíci.' for date in Dates]

    def main(self):
        PrintCal(DaysMonth())
I would like to pass the list Dates from the method DaysMonth to the method PrintCal. When I instantiate the class, i.e. cal = Calendar('link'), and run cal.PrintCal(), I get that the name Dates is not defined. If I run cal.DaysMonth(), the output is as expected.
What is the issue here? Thank you!
Dates is a local variable in the DaysMonth method, and is therefore not visible anywhere else. Fortunately, DaysMonth does return Dates, so it's easy to get the value you want. Simply add the following line to your PrintCal method (before the return statement):
Dates = self.DaysMonth()
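With that line added, the method becomes:

def PrintCal(self):
    Dates = self.DaysMonth()
    return ['Víkendová odstávka serverů nastane ' + date + '. den v měsíci.' for date in Dates]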
You are trying to do too much in the Calendar object, particularly in the __init__ method. You don't want to combine the scraping and parsing of a website with the moment the object is instantiated. I would use the Calendar object to store and display the results of the scraping/parsing. If you need everything to be object-oriented, then create a separate Scraper/Parser class that handles that part of the logic.
import requests
from bs4 import BeautifulSoup

class Calendar():
    def __init__(self, dates):
        self.dates = dates

    def display_dates(self):
        return ['Víkendová odstávka serverů nastane ' + date + '. den v měsíci.'
                for date in self.dates]

r = requests.get(link)  # 'link' as in the question
r.encoding = 'UTF-8'    # requests.get() takes no encoding argument; set it on the response
soup = BeautifulSoup(r.text, 'lxml')

dates = []
for td in soup.findAll('td', {'class': 'action'}):
    check = td.findAll('a')[0].text
    if "Víkendová odstávka" in check:
        dates.append(td.findAll('span')[0].text)

c = Calendar(dates=dates)
print(c.display_dates())  # call the method; without () this would print the method object
Working on getting some wave heights from websites, and my code fails when the wave heights get into the double-digit range.
For example, the code currently scrapes a 12 from the site as '1' and '2' separately, not as '12'.
#Author: David Owens
#File name: soupScraper.py
#Description: html scraper that takes surf reports from various websites

import csv
import requests
from bs4 import BeautifulSoup

NUM_SITES = 2

reportsFinal = []

###################### SURFLINE URL STRINGS AND TAG ###########################

slRootUrl = 'http://www.surfline.com/surf-report/'
slSunsetCliffs = 'sunset-cliffs-southern-california_4254/'
slScrippsUrl = 'scripps-southern-california_4246/'
slBlacksUrl = 'blacks-southern-california_4245/'
slCardiffUrl = 'cardiff-southern-california_4786/'

slTagText = 'observed-wave-range'
slTag = 'id'

#list of surfline URL endings
slUrls = [slSunsetCliffs, slScrippsUrl, slBlacksUrl]

###############################################################################

#################### MAGICSEAWEED URL STRINGS AND TAG #########################

msRootUrl = 'http://magicseaweed.com/'
msSunsetCliffs = 'Sunset-Cliffs-Surf-Report/4211/'
msScrippsUrl = 'Scripps-Pier-La-Jolla-Surf-Report/296/'
msBlacksUrl = 'Torrey-Pines-Blacks-Beach-Surf-Report/295/'

msTagText = 'rating-text'
msTag = 'li'

#list of magicseaweed URL endings
msUrls = [msSunsetCliffs, msScrippsUrl, msBlacksUrl]

###############################################################################

'''
This class represents a surf break. It contains all wave, wind, & tide data
associated with that break relevant to the website
'''
class surfBreak:
    def __init__(self, name, low, high, wind, tide):
        self.name = name
        self.low = low
        self.high = high
        self.wind = wind
        self.tide = tide

    #toString method
    def __str__(self):
        return '{0}: Wave height: {1}-{2} Wind: {3} Tide: {4}'.format(self.name,
            self.low, self.high, self.wind, self.tide)
#END CLASS

'''
This returns the proper attribute from the surf report sites
'''
def reportTagFilter(tag):
    return (tag.has_attr('class') and 'rating-text' in tag['class']) \
        or (tag.has_attr('id') and tag['id'] == 'observed-wave-range')
#END METHOD

'''
This method checks if the parameter is of type int
'''
def representsInt(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
#END METHOD

'''
This method extracts all ints from a list of reports

reports: The list of surf reports from a single website
returns: reportNums - A list of ints of the wave heights
'''
def extractInts(reports):
    print reports
    reportNums = []
    afterDash = False
    num = 0
    tens = 0
    ones = 0
    #extract all ints from the reports and ditch the rest
    for report in reports:
        for char in report:
            if representsInt(char) == True:
                num = int(char)
                reportNums.append(num)
            else:
                afterDash = True
    return reportNums
#END METHOD

'''
This method iterates through a list of urls and extracts the surf report from
the webpage dependent upon its tag location

rootUrl: The root url of each surf website
urlList: A list of specific urls to be appended to the root url for each break
tag: the html tag where the actual report lives on the page

returns: a list of strings of each break's surf report
'''
def extractReports(rootUrl, urlList, tag, tagText):
    #empty list to hold reports
    reports = []
    reportNums = []
    index = 0

    #loop thru URLs
    for url in urlList:
        try:
            index += 1
            #request page
            request = requests.get(rootUrl + url)

            #turn into soup
            soup = BeautifulSoup(request.content, 'lxml')

            #get the tag where surfline's report lives
            reportTag = soup.findAll(reportTagFilter)[0]
            reports.append(reportTag.text.strip())
        #notify if fail
        except:
            print 'scrape failure at URL ', index
            pass

    reportNums = extractInts(reports)
    return reportNums
#END METHOD

'''
This method calculates the average of the wave heights
'''
def calcAverages(reportList):
    #empty list to hold averages
    finalAverages = []
    listIndex = 0
    waveIndex = 0

    #loop thru list of reports to calc each break's ave low and high
    for x in range(0, 6):
        #get low ave
        average = (reportList[listIndex][waveIndex]
            + reportList[listIndex+1][waveIndex]) / NUM_SITES
        finalAverages.append(average)
        waveIndex += 1

    return finalAverages
#END METHOD

slReports = extractReports(slRootUrl, slUrls, slTag, slTagText)
msReports = extractReports(msRootUrl, msUrls, msTag, msTagText)

reportsFinal.append(slReports)
reportsFinal.append(msReports)

print 'Surfline: ', slReports
print 'Magicseaweed: ', msReports
You are not actually extracting integers but floats, it seems, since the values in reports are something like ['0.3-0.6 m']. Right now you are going through every single character and converting each one to an int or discarding it, so it is no wonder that you only get single-digit numbers.
One (arguably) simple way to extract those numbers from that string is with a regexp:
import re

FLOATEXPR = re.compile(r"(\d+\.\d)-(\d+\.\d) {0,1}m")

def extractFloats(reports):
    reportNums = []
    for report in reports:
        groups = re.match(FLOATEXPR, report).groups()
        for group in groups:
            reportNums.append(float(group))
    return reportNums
This expression matches your floats and returns them as a list.
In detail, the expression matches anything that has at least one digit before a '.' and one digit after it, then a '-', then another such float, ending with 'm' or ' m'. It groups the parts representing floats into a tuple. For example, '12.0-3.0 m' would return [12.0, 3.0]. If you expect more digits after the decimal point, add an extra '+' after the second '\d' in each group (i.e. \d+\.\d+).
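A quick check with sample strings of the shape shown above:

>>> extractFloats(['0.3-0.6 m', '1.2-1.8m'])
[0.3, 0.6, 1.2, 1.8]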
I have the following program to scrape data from a website. I want to improve the code below by using a generator with yield instead of calling generate_url and callme multiple times sequentially. The purpose of this exercise is to properly understand yield and the contexts in which it can be used.
import requests
import shutil

start_date = '03-03-1997'
end_date = '10-04-2015'
yf_base_url = 'http://real-chart.finance.yahoo.com/table.csv?s=%5E'
index_list = ['BSESN', 'NSEI']

def generate_url(index, start_date, end_date):
    s_day = start_date.split('-')[0]
    s_month = start_date.split('-')[1]
    s_year = start_date.split('-')[2]
    e_day = end_date.split('-')[0]
    e_month = end_date.split('-')[1]
    e_year = end_date.split('-')[2]
    if (index == 'BSESN') or (index == 'NSEI'):
        url = yf_base_url + index + '&a={}&b={}&c={}&d={}&e={}&f={}'.format(s_day, s_month, s_year, e_day, e_month, e_year)
        return url

def callme(url, index):
    print('URL {}'.format(url))
    r = requests.get(url, verify=False, stream=True)
    if r.status_code != 200:
        print "Failure!!"
        exit()
    else:
        r.raw.decode_content = True
        with open(index + "file.csv", 'wb') as f:
            shutil.copyfileobj(r.raw, f)
        print "Success"

if __name__ == '__main__':
    url = generate_url(index_list[0], start_date, end_date)
    callme(url, index_list[0])

    url = generate_url(index_list[1], start_date, end_date)
    callme(url, index_list[1])
There are multiple options. You could use yield to iterate over URLs, or over request objects.
If your index_list were long, I would suggest yielding URLs, because then you could use multiprocessing.Pool to map a function that does a request and saves the output over those URLs. That would execute them in parallel, potentially making it a lot faster (assuming you have enough network bandwidth, and that Yahoo Finance doesn't throttle connections).
import multiprocessing

yf = ('http://real-chart.finance.yahoo.com/table.csv?s=%5E'
      '{}&a={}&b={}&c={}&d={}&e={}&f={}')
index_list = ['BSESN', 'NSEI']

def genurl(symbols, start_date, end_date):
    # assemble the URLs
    s_day, s_month, s_year = start_date.split('-')
    e_day, e_month, e_year = end_date.split('-')
    for s in symbols:
        url = yf.format(s, s_day, s_month, s_year, e_day, e_month, e_year)
        yield url

def download(url):
    # Do the request, save the file
    pass

if __name__ == '__main__':  # guard so Pool workers don't re-run this on import
    p = multiprocessing.Pool()
    rv = p.map(download, genurl(index_list, '03-03-1997', '10-04-2015'))
If I understand you correctly, what you want to know is how to change the code so that you can replace the last part with:

if __name__ == '__main__':
    for url in generate_url(index_list, start_date, end_date):
        callme(url, index)

If this is correct, you need to change generate_url, but not callme. Changing generate_url is rather mechanical: make the first parameter index_list instead of index, wrap the function body in a for index in index_list loop, and change return url to yield url.
You don't need to change callme, because you never want to write something like for call in callme(...); you only ever make a normal function call.
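Following that recipe, generate_url becomes something like the sketch below. One small addition: callme also needs the index, so yielding (index, url) pairs keeps both values available in the loop:

def generate_url(index_list, start_date, end_date):
    s_day, s_month, s_year = start_date.split('-')
    e_day, e_month, e_year = end_date.split('-')
    for index in index_list:
        if index in ('BSESN', 'NSEI'):
            yield index, yf_base_url + index + '&a={}&b={}&c={}&d={}&e={}&f={}'.format(
                s_day, s_month, s_year, e_day, e_month, e_year)

if __name__ == '__main__':
    for index, url in generate_url(index_list, start_date, end_date):
        callme(url, index)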