I'm trying to web scrape UFC fighter stats based on user input, using Beautiful Soup and pandas. The idea is that the user input is matched against a fighter's first and last name and their stats are returned. Ideally I'd also like to add the option of specifying which particular stat is wanted in a separate input. I've been able to pull the HTML table headers successfully, but I don't know how to assign values to them so that they correspond to the matching fighter name, or how to print the associated value. In my current code I'm splitting the fighter name input into first and last name, but I don't know how to then match them against the table data or return the corresponding values. Right now the only data being returned is the first row of results (fighter 'Tom Aaron'); no lookup or matching is being carried out. Are nested dictionaries the way to go? Any advice is greatly appreciated; this is my first Python project, so the code is probably all over the place.
("Which fight do you want information on?"
input - Forrest Griffin
"What information do you want?:
input - Wins
"Forrest Griffin has won 8 times"
from bs4 import BeautifulSoup
import requests
import pandas as pd
website = "http://ufcstats.com/statistics/fighters?char=a&page=all"
response = requests.get(website)
response
soup = BeautifulSoup(response.content, 'html.parser')
results = soup.find('table', {'class' : 'b-statistics__table'}).find('tbody').find_all('tr')
len(results)
#print(results)
row = soup.find('tr')
print(row.get_text())
### attempting to split the table headers and assign values to them
table = soup.find('table', {'class' : 'b-statistics__table'}).find('thead').find_all('tr')
#df=pd.read_html(str(table))[0]
#print(df.set_index(0).to_dict('dict'))
#firstname
first_name = str(results[0].find_all('td')[0].get_text())
print(first_name)
def first_names():
    for names in first_name:
        print(names)
    return
#first_names()
last_name = results[1].find_all('td')[1].get_text()
print(last_name)
alias = results[1].find_all('td')[2].get_text()
if len(alias) == 0:
    print("n/a")
else:
    print(alias)
height = results[1].find_all('td')[3].get_text()
print(height)
weight = results[1].find_all('td')[4].get_text()
#print(weight)
wins = results[1].find_all('td')[7].get_text()
losses = results[1].find_all('td')[8].get_text()
draws = results[1].find_all('td')[9].get_text()
###split user input into list of first + second name
x = input("Which fighter do you want to know about?")
####print(str(first_name) + " " + str(last_name) + " has " + str(wins) + " wins, " + str(losses) + " losses and " + str(draws) + ".")
y = input("What do you want to know about?")
###if user input first name is in results row 1(Tom Aarons row) - still need to search through all result names
if x.split()[0] in str(results[1].find_all('td')[0].get_text()) and x.split()[1] in str(results[1].find_all('td')[1].get_text()) and y == "wins":
    print(first_name + " " + last_name + " has won " + wins + " times.")
if x.split()[1] in str(results[1].find_all('td')[1].get_text()):
    print("ok")
else:
    print('fail')
###Tom Test
print(x.split()[0])
###if input[1] = first_name and input2[2] == second_name:
if x.split()[1] == first_name:
    print(x.split()[1])
if x.split()[0] in results[1] and x.split()[1] in results[1]:
    print('wins')
else:
    print("Who")
print(str(results[1].find_all('td')[0].get_text()))
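For what it's worth, a flat dictionary keyed by the full fighter name, with a nested dictionary of stats as the value, is one way to do the matching. Below is a minimal sketch rather than the original poster's code, assuming the same page and the same column positions used above (wins at index 7, losses at 8, draws at 9); note that the char=a parameter means this page only lists fighters whose last name starts with 'a'.

from bs4 import BeautifulSoup
import requests

website = "http://ufcstats.com/statistics/fighters?char=a&page=all"
soup = BeautifulSoup(requests.get(website).content, 'html.parser')
rows = soup.find('table', {'class': 'b-statistics__table'}).find('tbody').find_all('tr')

# build {"First Last": {"wins": ..., "losses": ..., ...}} for every row
fighters = {}
for row in rows:
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    if len(cells) < 10:  # skip the empty spacer row at the top of the table
        continue
    full_name = cells[0] + " " + cells[1]
    fighters[full_name] = {
        'alias': cells[2] or 'n/a',
        'height': cells[3],
        'weight': cells[4],
        'wins': cells[7],
        'losses': cells[8],
        'draws': cells[9],
    }

name = input("Which fighter do you want to know about? ")
stat = input("What do you want to know about? ").lower()
record = fighters.get(name)
if record is None:
    print("Who?")
elif stat in record:
    print(name + " has " + record[stat] + " " + stat + ".")
else:
    print("Available stats: " + ", ".join(record))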
import time
import requests
from bs4 import BeautifulSoup
ts = time.time()
friend_api_url = 'https://api.namemc.com/profile/' # + /friends
player = 'https://namemc.com/profile/surfboarding'
username_to_uuid = 'https://api.mojang.com/users/profiles/minecraft/' # + username + ?at=(timestamp)
def findFriendByUsername(player, target):  # searches the friends list of (player) for (target)
    r = requests.get(username_to_uuid + player + '?at=' + str(ts))  # Mojang API; the uuid is the "id" field (ts is the timestamp)
    uuid_get = r.json()
    uuid = uuid_get['id']  # gets the uuid
    friend_scrape = requests.get(friend_api_url + uuid + '/friends')
    response = friend_scrape.json()
    names = []  # all usernames
    for names in response:  # loops over the friends and prints each username
        player_friends = print(names['name'])  # prints the username
        # should return the friends' usernames
        if player_friends == (target):
            print('The username ' + (target) + ' is in ' + player + ' friends list')
Currently I'm trying to scrape a website's API, and I search everything by name, which fetches the username I'm trying to find. It returns many strings of characters, and I'm trying to make a program where I can search them, so I use if player_friends==(target):, but I never seem to get an output saying that the username was found; it just looks like one big clump of letters. Is there any way I can make this searchable? (Sorry if the formatting is bad, I'm pretty new to Stack Overflow.)
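One observation that is not from the original thread but may explain the symptom: print() returns None, so player_friends = print(names['name']) can never equal target, and the comparison silently fails on every iteration. Here is a minimal sketch of the same loop that compares the name string itself, reusing the requests import and the username_to_uuid, friend_api_url and ts values defined above:

def findFriendByUsername(player, target):
    # look up the player's uuid via the Mojang API
    uuid = requests.get(username_to_uuid + player + '?at=' + str(ts)).json()['id']
    # fetch the player's friends list from the NameMC API
    friends = requests.get(friend_api_url + uuid + '/friends').json()
    for friend in friends:
        print(friend['name'])  # still print each friend's username
        if friend['name'] == target:  # compare the string, not print()'s return value
            print('The username ' + target + ' is in ' + player + "'s friends list")
            return True
    return False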
import mcuuidButWorks.api as mcuuid
import requests

def areFriends(player1: str, player2: str) -> bool:
    friends: list = []
    api: str = "https://api.namemc.com/profile/"
    player = mcuuid.GetPlayerData(player1)
    uuid = player.uuid
    api = api + uuid + "/friends"
    response = requests.get(api).json()
    for player in response:
        friends.append(player["name"])
    return True if player2 in friends else False
With this, you could do something like:
if areFriends("player1", "player2"):
    . . .
like you mentioned.
Currently I am using the following code to scrape https://www.nike.com/w/mens-shoes-nik1zy7ok for all shoes on the page:
import requests
import json
# I used a placeholder for the anchor parameter
uri = 'https://api.nike.com/cic/browse/v1?queryid=products&country=us&endpoint=product_feed/rollup_threads/v2?filter=marketplace(US)%26filter=language(en)%26filter=employeePrice(true)%26filter=attributeIds(0f64ecc7-d624-4e91-b171-b83a03dd8550%2C16633190-45e5-4830-a068-232ac7aea82c)%26anchor={}%26consumerChannelId=d9a5bc42-4b9c-4976-858a-f159cf99c647%26count=60'
# collect all products
store = []
with requests.Session() as session:
    found_all_products = False
    anchor = 0
    while not found_all_products:
        result = session.get(uri.format(anchor)).json()
        products = result['data']['products']['products']
        store += products
        if len(products) < 60:
            found_all_products = True
        else:
            anchor += 24

# filter by cloudProductId to get a list of unique products
cloudProductIds = set()
unique_products = []
for product in store:
    if not product['cloudProductId'] in cloudProductIds:
        cloudProductIds.add(product['cloudProductId'])
        unique_products.append(product)
How do I write this same API request to retrieve either the men's shoes from this site or the women's shoes from the women's shoes page: https://www.nike.com/w/womens-shoes-5e1x6zy7ok ? Which parameter do I need to change?
@Greg I ran your provided API link in Postman and got different results for men and women. All I changed in the query-string parameters is the UUIDs, which are unique in each case: for men it is uuids: 0f64ecc7-d624-4e91-b171-b83a03dd8550,16633190-45e5-4830-a068-232ac7aea82c and for women uuids: 16633190-45e5-4830-a068-232ac7aea82c,193af413-39b0-4d7e-ae34-558821381d3f,7baf216c-acc6-4452-9e07-39c2ca77ba32.
If you pass these two sets of UUIDs in the query string, you will get the men's and women's results separately, as there is no other parameter that distinguishes them.
Below is the code:
import json
import requests
# common query parameters
queryid = 'filteredProductsWithContext'
anonymousId = '25AFE5BE9BB9BC03DE89DBE170D80669'
language = 'en-GB'
country = 'IN'
channel = 'NIKE'
localizedRangeStr = '%7BlowestPrice%7D%E2%80%94%7BhighestPrice%7D'

# UUIDs
uuids_men = '0f64ecc7-d624-4e91-b171-b83a03dd8550,16633190-45e5-4830-a068-232ac7aea82c'
uuids_women = '16633190-45e5-4830-a068-232ac7aea82c,193af413-39b0-4d7e-ae34-558821381d3f,7baf216c-acc6-4452-9e07-39c2ca77ba32'

def get_men_result():
    url = 'https://api.nike.com/cic/browse/v1?queryid=' + queryid + '&anonymousId=' + anonymousId + '&uuids=' + uuids_men + '&language=' + language + '&country=' + country + '&channel=' + channel + '&localizedRangeStr=' + localizedRangeStr
    data = requests.get(url, verify=False).json()
    print(data)

def get_women_result():
    url = 'https://api.nike.com/cic/browse/v1?queryid=' + queryid + '&anonymousId=' + anonymousId + '&uuids=' + uuids_women + '&language=' + language + '&country=' + country + '&channel=' + channel + '&localizedRangeStr=' + localizedRangeStr
    data = requests.get(url, verify=False).json()
    print(data)

get_men_result()
print('-' * 100)
get_women_result()
If you look at the query strings I have created for men and women, you will notice that there are six common parameters and only the uuids value differs. You can also change country, language, etc. to fetch more data. Please refer to the screenshots as well.
(Postman screenshots: Men and Women results)
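As a side note that is not part of the answer above, the same requests can be built by letting requests encode the query string from a dict; here is a sketch assuming the variables defined in the snippet above:

import requests

def get_result(uuids):
    # same endpoint and values as above, but requests builds the query string
    params = {
        'queryid': queryid,
        'anonymousId': anonymousId,
        'uuids': uuids,  # pass uuids_men or uuids_women here
        'language': language,
        'country': country,
        'channel': channel,
        # decoded form of the localizedRangeStr value above; requests percent-encodes it again
        'localizedRangeStr': '{lowestPrice}\u2014{highestPrice}',
    }
    return requests.get('https://api.nike.com/cic/browse/v1', params=params, verify=False).json()

print(get_result(uuids_men))
print('-' * 100)
print(get_result(uuids_women))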
So I have multiple patients' information stored in database.txt, and I want to retrieve the data from the file into a list.
The system prompts for a patient's ID to search, and then displays the patient's other information, such as Name, Age, Group and Zone.
However, I'm getting an error on line 12, even though the similar syntax on line 17 runs without a problem.
search_keyword = input()  # Asks for patient's name or id (either one)
with open("database.txt", "r") as database:
    for data in database:
        for patients in data.split('|'):
            patient_details = []
            for details in patients.split(','):
                patient_details.append(details)
            print(patient_details)  # test
            print(len(patient_details))  # test
            print(patient_details.index('Patient001'))  # test
            print(patient_details[4])  # test
            if search_keyword == patient_details[0] or search_keyword == patient_details[4]:  # error occurred here: list index out of range
                print("Name: " + patient_details[0])
                print("Age: " + patient_details[1])
                print("Group: " + patient_details[2])
                print("Zone: " + patient_details[3])
                print("ID: " + patient_details[4])  # no error here, patient_details[4] is able to display the patient's id
database.txt
John,18,A,1,Patient001|Nick,20,F,9,Patient002
Test output for lines 8, 9, 10 and 11:
Line 8: [John, 18, A, 1, Patient001]
Line 9: 5
Line 10: 4
Line 11: IndexError: list index out of range
Can someone explain why this is happening, and any solutions regarding this issue without using any imported modules? Thank you for any assistance.
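The IndexError itself means that on some iteration the inner split produced fewer than five fields, so patient_details[4] does not exist for that chunk; a likely suspect (a guess, since only one sample line of database.txt is shown) is an empty trailing line or a stray separator. Printing the offending chunk, or guarding on its length, will pinpoint it. A minimal sketch of such a guard:

search_keyword = input()  # patient's name or id (either one)

with open("database.txt", "r") as database:
    for data in database:
        for patients in data.strip().split('|'):  # strip() drops the trailing newline
            patient_details = patients.split(',')
            if len(patient_details) < 5:  # skip empty or malformed chunks instead of crashing
                continue
            if search_keyword in (patient_details[0], patient_details[4]):
                print("Name: " + patient_details[0])
                print("Age: " + patient_details[1])
                print("Group: " + patient_details[2])
                print("Zone: " + patient_details[3])
                print("ID: " + patient_details[4])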
IMO this is a very good use case for a named tuple:
from collections import namedtuple
text = "John,18,A,1,Patient001|Nick,20,F,9,Patient002"
# build database
Patient = namedtuple('Patient', ['name', 'age', 'group', 'zone', 'id'])
db = [Patient(*patient) for entry in text.split("|") for patient in [entry.split(",")]]
# Asks for patient's id
search_keyword = input("Please give an ID: ")
# query the database
result = [patient for patient in db if patient.id == search_keyword]
# or patient.id.startswith(), etc.
print(result)
Without any imported modules, you could use
text = "John,18,A,1,Patient001|Nick,20,F,9,Patient002"
# build database
db = [entry.split(",") for entry in text.split("|")]
search_keyword = input("Please give an ID: ") # Asks for patient's id
# query the database
result = [patient for patient in db if patient[4] == search_keyword]
print(result)
I see no flaw in the code, but I can point out a few ways to optimise it:
patient_details = dict()

with open("database.txt", "r") as database:
    for data in database:
        for patients in data.split('|'):
            patients = patients.split(',')
            patient_details[patients[4]] = patients[0:4]

search_keyword = input()  # Asks for patient's id

if patient_details.get(search_keyword, None):
    patient_detail = patient_details[search_keyword]
    print("Name: " + patient_detail[0])
    print("Age: " + patient_detail[1])
    print("Group: " + patient_detail[2])
    print("Zone: " + patient_detail[3])
    print("ID: " + search_keyword)
Using a map (a dictionary) instead of a linear search allows you to look patients up in constant time.
I'm using the Google Places API to obtain JSON data for nearby coffee outlets. To do this, I need to encode the latitude and longitude into the URL.
The required URL: https://maps.googleapis.com/maps/api/place/textsearch/json?query=coffee&location=22.303940,114.170372&radius=1000&maxprice=3&key=myAPIKey
The URL i'm obtaining using urlencode: https://maps.googleapis.com/maps/api/place/textsearch/json?query=coffee&location=22.303940%2C114.170372&radius=1000&maxprice=3&key=myAPIKEY
How can I remove the "%2C" in the URL? (I have shown my code below)
serviceurl_placesearch = 'https://maps.googleapis.com/maps/api/place/textsearch/json?'

parameters = dict()
query = input('What are you searching for?')
parameters['query'] = query
parameters['location'] = "22.303940,114.170372"

while True:
    radius = input('Enter radius of search in meters: ')
    try:
        radius = int(radius)
        parameters['radius'] = radius
        break
    except:
        print('Please enter number for radius')

while True:
    maxprice = input('Enter the maximum price level you are looking for(0 to 4): ')
    try:
        maxprice = int(maxprice)
        parameters['maxprice'] = maxprice
        break
    except:
        print('Valid inputs are 0,1,2,3,4')

parameters['key'] = API_key
url = serviceurl_placesearch + urllib.parse.urlencode(parameters)
I added the following piece of code to make the URL work, but I don't think this is a long-term solution; I'm looking for something more robust.
urlparts = url.split('%2C')
url = ','.join(urlparts)
You can add safe=',' to urlencode:
import urllib.parse
parameters = {'location': "22.303940,114.170372"}
urllib.parse.urlencode(parameters, safe=',')
Result
location=22.303940,114.170372
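Applied to the code in the question (a sketch reusing serviceurl_placesearch and parameters as defined there), the final line would become:

# keep the comma in the location value un-encoded when building the URL
url = serviceurl_placesearch + urllib.parse.urlencode(parameters, safe=',')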