I'm trying to scrape UFC fighters' stats based on user input, using Beautiful Soup and pandas. The idea is that the user's input is matched against a fighter's first and last name, and the program then returns that fighter's stats. Ideally I'd also like a second input that specifies which particular stat is required. I've been able to pull the HTML table headers successfully, but I don't know how to assign values to them so that they correspond to the matching fighter's name, or how to print the associated value. My code currently splits the fighter-name input into first and last name, but I don't know how to match those against the table data or how to return the corresponding data. The only data returned at the moment is the first line of results (fighter 'Tom Aaron'); no lookup or matching is carried out. Are nested dictionaries the way to go? Any advice is greatly appreciated. This is my first Python project, so the code is probably all over the place.
"Which fighter do you want information on?"
input - Forrest Griffin
"What information do you want?"
input - Wins
"Forrest Griffin has won 8 times"
from bs4 import BeautifulSoup
import requests
import pandas as pd
website = "http://ufcstats.com/statistics/fighters?char=a&page=all"
response = requests.get(website)
response
soup = BeautifulSoup(response.content, 'html.parser')
results = soup.find('table', {'class' : 'b-statistics__table'}).find('tbody').find_all('tr')
len(results)
#print(results)
row = soup.find('tr')
print(row.get_text())
###attempting to split the table headers an assign
table = soup.find('table', {'class' : 'b-statistics__table'}).find('thead').find_all('tr')
#df=pd.read_html(str(table))[0]
#print(df.set_index(0).to_dict('dict'))
#firstname
first_name = str(results[0].find_all('td')[0].get_text())
print(first_name)
def first_names():
    for names in first_name:
        print(names)
    return
#first_names()
last_name = results[1].find_all('td')[1].get_text()
print(last_name)
alias = results[1].find_all('td')[2].get_text()
if len(alias) == 0:
    print("n/a")
else:
    print(alias)
height = results[1].find_all('td')[3].get_text()
print(height)
weight = results[1].find_all('td')[4].get_text()
#print(weight)
wins = results[1].find_all('td')[7].get_text()
losses = results[1].find_all('td')[8].get_text()
draws = results[1].find_all('td')[9].get_text()
###split user input into list of first + second name
x = input("Which fighter do you want to know about?")
####print(str(first_name) + " " + str(last_name) + " has " + str(wins) + " wins, " + str(losses) + " losses and " + str(draws) + ".")
y = input("What do you want to know about?")
###if user input first name is in results row 1(Tom Aarons row) - still need to search through all result names
if x.split()[0] in str(results[1].find_all('td')[0].get_text()) and x.split()[1] in str(results[1].find_all('td')[1].get_text()) and y == "wins":
    print(first_name+ " "+last_name+" has won " + wins + " times.")
if x.split()[1] in str(results[1].find_all('td')[1].get_text()):
    print("ok")
else:
    print('fail')
###Tom Test
print(x.split()[0])
###if input[1] = first_name and input2[2] == second_name:
if x.split()[1] == first_name:
    print(x.split()[1])
if x.split()[0] in results[1] and x.split()[1] in results[1]:
    print('wins')
else:
    print("Who")
print(str(results[1].find_all('td')[0].get_text()))
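On the nested-dictionary idea raised above: one possible shape, sketched here rather than offered as a definitive answer, is a dictionary keyed by the fighter's full name whose values are small dictionaries of stats. The rows below are hard-coded placeholders standing in for the text of each table row's cells (the numbers are invented, not real records), and the column positions 7, 8 and 9 are simply the indexes used in the code above. `build_lookup` and `get_stat` are hypothetical helper names.

```python
# Placeholder rows: each inner list stands in for the text of the <td> cells
# of one table row. The values are made up for illustration only.
SAMPLE_ROWS = [
    ["Tom", "Aaron", "", "--", "155 lbs.", "--", "", "5", "3", "0"],
    ["Jane", "Doe", "Example", "--", "135 lbs.", "--", "", "2", "1", "1"],
]

# Column positions taken from the indexes used in the question (7, 8, 9).
STAT_COLUMNS = {"wins": 7, "losses": 8, "draws": 9}


def build_lookup(rows):
    """Map 'first last' (lower-cased) to a dict of that fighter's stats."""
    lookup = {}
    for cells in rows:
        full_name = f"{cells[0].strip()} {cells[1].strip()}".lower()
        lookup[full_name] = {
            stat: cells[index].strip() for stat, index in STAT_COLUMNS.items()
        }
    return lookup


def get_stat(lookup, fighter, stat):
    """Return the requested stat, or None if the fighter or stat is unknown."""
    return lookup.get(fighter.strip().lower(), {}).get(stat.strip().lower())


fighters = build_lookup(SAMPLE_ROWS)
print(get_stat(fighters, "Tom Aaron", "Wins"))
```

With the real page, each row could be turned into such a list with something like `[td.get_text(strip=True) for td in tr.find_all('td')]` for every `tr` in `results`, and the two `input()` answers passed straight to `get_stat`.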
I'm trying to set up a loop to pull in weather data for about 500 weather stations for an entire year which I have in my dataframe. The base URL stays the same, and the only part that changes is the weather station ID.
I'd like to create a dataframe with the results. I believe I'd use requests.get to pull in data for every weather station in my list; the IDs to use in the URL are in a column called "API ID" in my dataframe. I am a Python beginner, so any help would be appreciated! My code is below, but it doesn't work and returns this error:
InvalidSchema: No connection adapters were found for '0 http://www.ncei.noaa.gov/access/services/data/...\nName: API ID, Length: 497, dtype: object'
def callAPI(API_id):
    for IDs in range(len(API_id)):
        url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + distances['API ID'] + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
        r = requests.request('GET', url)
        d = r.json()
ll = []
for index1,rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([(data)])
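I am not sure about the rest of your code base, but the function below returns the data from the API for one station. The error arises because the URL is built with distances['API ID'], which is the whole column, rather than with the single API_id passed to the function. If a single cell of the column holds multiple station IDs, you can loop over them; otherwise there is no need for a loop inside the function.
Also, you are not returning the result from the function. Note the return statement at the end of the function.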
Working code:
import requests
def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d
print(callAPI('USC00457180'))
So your full code will be something like this,
def callAPI(API_id):
    url = ('http://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP,SNOW,TMAX,TMIN&stations=' + API_id + '&startDate=2020-01-01&endDate=2020-12-31&includeAttributes=0&includeStationName=true&units=standard&format=json')
    r = requests.request('GET', url)
    d = r.json()
    return d
ll = []
for index1,rows1 in distances.iterrows():
    station = rows1['Closest Station']
    API_id = rows1['API ID']
    data = callAPI(API_id)
    ll.append([(data)])
Note: Even better, use asynchronous calls to the API to speed up the process. Something like this: https://stackoverflow.com/a/56926297/1138192
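As a rough illustration of making the calls concurrently, here is a minimal sketch that uses a thread pool from the standard library instead of true async I/O (so it is a different technique from the linked answer, though the goal is the same). `fetch_all` and `fake_call_api` are hypothetical names, the second station ID is a made-up placeholder, and the fake function stands in for `callAPI` so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(station_ids, fetch, max_workers=8):
    """Call fetch(station_id) for every ID concurrently; return {id: result}."""
    ids = list(station_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each ID
        # with the result that belongs to it.
        results = list(pool.map(fetch, ids))
    return dict(zip(ids, results))


def fake_call_api(station_id):
    """Stand-in for callAPI: returns a tiny fake payload instead of real data."""
    return [{"STATION": station_id}]


# 'USC00000000' is a placeholder ID used only for this example.
data_by_station = fetch_all(["USC00457180", "USC00000000"], fake_call_api)
print(data_by_station)
```

In the real script you would presumably pass `callAPI` as `fetch` and `distances['API ID']` as `station_ids`. Keeping `max_workers` small is a reasonable precaution so that 500 requests do not hit the server all at once.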
import time
import requests
from bs4 import BeautifulSoup
ts = time.time()
friend_api_url = 'https://api.namemc.com/profile/' # + /friends
player = 'https://namemc.com/profile/surfboarding'
username_to_uuid = 'https://api.mojang.com/users/profiles/minecraft/' # + username + ?at=(timestamp)
def findFriendByUsername(player, target): #add a function to find a users friend my username (player) is the player you want to search the friends of
    r = requests.get(username_to_uuid + player + '?at=' + str(ts)) #uses mojangs api scrapes website (there uuid is the "id" part) (ts is the timestamp)
    uuid_get = r.json()
    uuid = (uuid_get['id']) # gets uuid
    friend_scrape = requests.get(friend_api_url + uuid + '/friends')
    response = friend_scrape.json()
    names = [] #all usernames (dont know how to explain it)
    for names in response: #makes loop to print usernames
        player_friends = print(names['name']) #prints username
        #returns output of the friends usernames
        if player_friends==(target):
            print('The username ' + (target) + ' is in ' + player + ' friends list') #concatinates usernames into one string
I'm currently trying to scrape a website's API. I read the name field of every entry in the response, which gives the usernames of the friends of the player I'm searching for. The request returns many strings, and I'm trying to write a program that can search them, so I use if player_friends==(target):. But I never get any output saying that the username was found; the result seems to be just one big clump of letters. Is there any way I can make this searchable? (Sorry if the formatting is bad, I'm pretty new to Stack Overflow.)
import mcuuidButWorks.api as mcuuid
import requests
def areFriends(player1: str, player2: str) -> bool:
    friends: list = []
    api: str = "https://api.namemc.com/profile/"
    player = mcuuid.GetPlayerData(player1)
    uuid = player.uuid
    api = api + uuid + "/friends"
    response = requests.get(api).json()
    for player in response:
        friends.append(player["name"])
    return True if player2 in friends else False
With this, you could do something like:
if areFriends("player1", "player2"):
    ...
like you mentioned.
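For what it's worth, the comparison in the question could never match because print() returns None, so player_friends was always None. Here is a minimal sketch of that point and of the membership check, using a hard-coded list in place of the API response. The names and the `is_friend` helper are hypothetical, and the response is assumed to be a list of dictionaries that each have a "name" key, as in the code above.

```python
def is_friend(friends_response, target):
    """Return True if any entry's 'name' field equals target exactly."""
    return any(entry["name"] == target for entry in friends_response)


# Hypothetical stand-in for friend_scrape.json()
sample_response = [{"name": "Alice"}, {"name": "Bob"}]

# print() writes to stdout and returns None, so its result cannot be
# compared against a username.
printed = print(sample_response[0]["name"])

print(printed is None)
print(is_friend(sample_response, "Bob"))
```

Collecting the names first (or checking them with `any`, as here) keeps the values available for comparison instead of only printing them.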
I am using Zapier to catch a webhook and use that info for an API post. The action code runs perfectly fine with "4111111111111111" in place of Ccnum in doSale. But when I use the input_data variable and place it in doSale it errors.
Zapier Input Variable:
Zapier Error:
Python code:
import pycurl
import urllib
import urlparse
import StringIO
class gwapi():
    def __init__(self):
        self.login= dict()
        self.order = dict()
        self.billing = dict()
        self.shipping = dict()
        self.responses = dict()
    def setLogin(self,username,password):
        self.login['password'] = password
        self.login['username'] = username
    def setOrder(self, orderid, orderdescription, tax, shipping, ponumber,ipadress):
        self.order['orderid'] = orderid
        self.order['orderdescription'] = orderdescription
        self.order['shipping'] = '{0:.2f}'.format(float(shipping))
        self.order['ipaddress'] = ipadress
        self.order['tax'] = '{0:.2f}'.format(float(tax))
        self.order['ponumber'] = ponumber
    def setBilling(self,
            firstname,
            lastname,
            company,
            address1,
            address2,
            city,
            state,
            zip,
            country,
            phone,
            fax,
            email,
            website):
        self.billing['firstname'] = firstname
        self.billing['lastname'] = lastname
        self.billing['company'] = company
        self.billing['address1'] = address1
        self.billing['address2'] = address2
        self.billing['city'] = city
        self.billing['state'] = state
        self.billing['zip'] = zip
        self.billing['country'] = country
        self.billing['phone'] = phone
        self.billing['fax'] = fax
        self.billing['email'] = email
        self.billing['website'] = website
    def setShipping(self,firstname,
            lastname,
            company,
            address1,
            address2,
            city,
            state,
            zipcode,
            country,
            email):
        self.shipping['firstname'] = firstname
        self.shipping['lastname'] = lastname
        self.shipping['company'] = company
        self.shipping['address1'] = address1
        self.shipping['address2'] = address2
        self.shipping['city'] = city
        self.shipping['state'] = state
        self.shipping['zip'] = zipcode
        self.shipping['country'] = country
        self.shipping['email'] = email
    def doSale(self,amount, ccnumber, ccexp, cvv=''):
        query = ""
        # Login Information
        query = query + "username=" + urllib.quote(self.login['username']) + "&"
        query += "password=" + urllib.quote(self.login['password']) + "&"
        # Sales Information
        query += "ccnumber=" + urllib.quote(ccnumber) + "&"
        query += "ccexp=" + urllib.quote(ccexp) + "&"
        query += "amount=" + urllib.quote('{0:.2f}'.format(float(amount))) + "&"
        if (cvv!=''):
            query += "cvv=" + urllib.quote(cvv) + "&"
        # Order Information
        for key,value in self.order.iteritems():
            query += key +"=" + urllib.quote(str(value)) + "&"
        # Billing Information
        for key,value in self.billing.iteritems():
            query += key +"=" + urllib.quote(str(value)) + "&"
        # Shipping Information
        for key,value in self.shipping.iteritems():
            query += key +"=" + urllib.quote(str(value)) + "&"
        query += "type=sale"
        return self.doPost(query)
    def doPost(self,query):
        responseIO = StringIO.StringIO()
        curlObj = pycurl.Curl()
        curlObj.setopt(pycurl.POST,1)
        curlObj.setopt(pycurl.CONNECTTIMEOUT,30)
        curlObj.setopt(pycurl.TIMEOUT,30)
        curlObj.setopt(pycurl.HEADER,0)
        curlObj.setopt(pycurl.SSL_VERIFYPEER,0)
        curlObj.setopt(pycurl.WRITEFUNCTION,responseIO.write)
        curlObj.setopt(pycurl.URL,"https://secure.merchantonegateway.com/api/transact.php")
        curlObj.setopt(pycurl.POSTFIELDS,query)
        curlObj.perform()
        data = responseIO.getvalue()
        temp = urlparse.parse_qs(data)
        for key,value in temp.iteritems():
            self.responses[key] = value[0]
        return self.responses['response']
# NOTE: your username and password should replace the ones below
Ccnum = input_data['Ccnum'] #this variable I would like to use in
#the gw.doSale below
gw = gwapi()
gw.setLogin("demo", "password");
gw.setBilling("John","Smith","Acme, Inc.","123 Main St","Suite 200", "Beverly Hills",
"CA","90210","US","555-555-5555","555-555-5556","support#example.com",
"www.example.com")
r = gw.doSale("5.00",Ccnum,"1212",'999')
print gw.responses['response']
if (int(gw.responses['response']) == 1) :
    print "Approved"
elif (int(gw.responses['response']) == 2) :
    print "Declined"
elif (int(gw.responses['response']) == 3) :
    print "Error"
Towards the end is where the problems are. How can I pass the variables from Zapier into the python code?
David here, from the Zapier Platform team. A few things.
First, I think your issue is the one described here. Namely, I believe input_data's values are unicode. So you'll want to call str(input_data['Ccnum']) instead.
Alternatively, if you want to use Requests, it's also supported and is a lot less finicky.
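To make the str() suggestion above concrete, here is a minimal sketch of coercing the incoming values to plain strings before building the form-encoded query. It is written for Python 3 (unlike the Python 2 code in the question), `build_sale_query` is a hypothetical helper, and the `input_data` dictionary is a stand-in for what the Code step receives. The card number is the test value already quoted in the question.

```python
from urllib.parse import urlencode, parse_qs


def build_sale_query(input_data, amount, ccexp):
    """Coerce every value to str, then URL-encode the fields in one step."""
    fields = {
        "ccnumber": str(input_data["Ccnum"]),
        "ccexp": str(ccexp),
        "amount": "{0:.2f}".format(float(amount)),
        "type": "sale",
    }
    return urlencode(fields)


# Hypothetical stand-in for Zapier's input_data; the value is deliberately
# not a str here to show that the coercion handles it.
input_data = {"Ccnum": 4111111111111111}

query = build_sale_query(input_data, "5.00", "1212")
print(parse_qs(query))
```

Letting `urlencode` do the quoting avoids hand-concatenating `key=value&` pairs, which is where type mismatches tend to surface.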
All that said, I would be remiss if I didn't mention that everything in Zapier code steps gets logged in plain text internally. For that reason, I'd strongly recommend against putting credit card numbers, your password for this service, and any other sensitive data through a Code step. A private server that you control is a much safer option.
Let me know if you've got any other questions!
I'm getting the top artists for a specific country using the Last.fm API, and I want to save the name, URL, and biography of each top artist. The name and URL work fine, but the biography does not.
This is how I get the name and URL of the top artists:
import requests
api_key = ""
ID = 0
artists = {}
for i in range(1, 3):
    artists_response = requests.get('http://ws.audioscrobbler.com/2.0/?method=geo.gettopartists&country=spain&format=json&page=' + str(i) + '&api_key=' + api_key)
    artists_data = artists_response.json()
    #print(artists_data)
    for artist in artists_data["topartists"]["artist"]:
        name = artist["name"]
        url = artist["url"]
        image = artist["image"]
        artists[ID] = {}
        artists[ID]['ID'] = ID
        artists[ID]['name'] = name
        artists[ID]['url'] = url
        artists[ID]['image'] = image
        ID += 1
#print(artists)
At this point everything works fine. But when I try to get the biography summary for each top artist, I get the error "TypeError: string indices must be integers" on the line print(artist["summary"]):
for i,v in artists.items():
    chosen = artists[i]['name'].replace(" ", "+")
    artist_response = requests.get('http://ws.audioscrobbler.com/2.0/?method=artist.getinfo&format=json&artist='+chosen+'&api_key='+api_key)
    artist_data = artist_response.json()
    #print(artist_data)
    for artist in artist_data['artist']['bio']:
        print(artist["summary"])
        bio = artist["summary"]
        artists[ID]['bio'] = bio
    # print(artist_response)
From your example data, it is clear that artist_data["artist"]["bio"] is a dictionary, so the loop assigns the keys of that dictionary (which are strings) to artist. Indexing one of those strings with "summary" is what raises the TypeError.
As you have not provided an example of artists_data["topartists"]["artist"], I cannot speak to why the first loop did not produce the same error.
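A minimal sketch of the difference, using a hard-coded dictionary as a hypothetical stand-in for the parsed artist.getinfo response (the artist name and bio text are invented):

```python
# Hypothetical stand-in for artist_response.json()
artist_data = {
    "artist": {
        "name": "Example Artist",
        "bio": {"summary": "A short bio.", "content": "A longer bio."},
    }
}

# Iterating over a dict yields its keys, which are plain strings.
keys = [key for key in artist_data["artist"]["bio"]]
print(keys)

# Index the dictionary directly instead; no inner loop is needed.
bio = artist_data["artist"]["bio"]["summary"]

# Store the result under the key of the artist being processed.
artists = {0: {"ID": 0, "name": "Example Artist"}}
artists[0]["bio"] = bio
print(artists[0]["bio"])
```

In the question's loop, that last step would be `artists[i]['bio'] = bio` rather than `artists[ID]['bio']`, because after the first loop ID has been incremented past the last entry and is not a key of artists.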