How to extract data from table using Beautifulsoup, with no text - python

Im trying to extract the leaderboard on https://fortnitetracker.com/events/epicgames_S11_PlatformCup_PC_NAE.
However the data inside the table is not extractable as 'text' elements. Do not know how to extract player names...
from bs4 import BeautifulSoup
import requests
url = "https://fortnitetracker.com/events/epicgames_S11_PlatformCup_PC_NAE"
response = requests.get(url)
print(response)
soup = BeautifulSoup(response.text, "html.parser")
table_needed = soup.find_all('table',attrs={'class':'trn-table'})
table_needed=table_needed[0]
for i in table_needed:
print(i.find('div'))
This gives me the output. What can i do from here?
'{{ getPlayerNameList(entry.teamAccountIds, 4) }}'
I see theres a getPlayerNameList...

the player names are dynamically injected into the table with javascript(vue) so I don't think that you can scrape it with just beautifulsoup, i would suggest you try tools like selenium.

Related

Using BeautifulSoup to extract table

I would like to extract the table from the following URL: "https://www.nordpoolgroup.com/en/Market-data1/#/nordic/table", and eventually store it in a pandas dataframe.
The code bellow returns:
table day-headers="true" enable-filter="false" nps-data-table="" table-data="ctrl.data[ctrl.selectedTab].table.data"></table
URL = "https://www.nordpoolgroup.com/en/Market-data1/#/nordic/table"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.find('table')[0])
I am not sure how to continue. I have been able to extract the content under the table tag on other sites using this code. Could someone please give me some advice and maybe explain what is happening in this case?
from pprint import pp
import requests
def main(url):
r = requests.get(url)
pp(r.json())
main('https://www.nordpoolgroup.com/api/marketdata/page/11')
from here you can parse it as JSON and create your dataframe!

Extracting link from soup python

I'm trying make an app gets the source links on bandcamp but im kinda stuck. Is there a way to get the source link with beautifulsoup.
The link im trying to get
Bandcamp
The data is within the <script> tags in json format. So use BeautifulSoup to get the 'script'. The data you are after is in the data-tralbum attribute.
Onece you get thet, have json read it in, then just iterate through the json structure:
from bs4 import BeautifulSoup
import requests
import json
url = 'https://vine.bandcamp.com/album/another-light'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
script = str(soup.find_all('script')[4]['data-tralbum'])
jsonData = json.loads(script)
trackinfo = jsonData['trackinfo']
links = []
for each in trackinfo:
links.append(each['file']['mp3-128'])
Output:
print(links)
['https://t4.bcbits.com/stream/efbba461835eff472bd04a2f9e9910a9/mp3-128/1761020287?p=0&ts=1638288735&t=8ae6343808036ab513cd5436ea009e5d0de784e4&token=1638288735_9139d56ec86f2d44b83a11f3eed8caf7075d6039', 'https://t4.bcbits.com/stream/3e5ef92e6d83e853958ed01955c95f5f/mp3-128/1256475880?p=0&ts=1638288735&t=745a6c701cf1c5772489da5467f6cae5d3622818&token=1638288735_7e86a32c635ba92e0b8320ef56a457d988286cff', 'https://t4.bcbits.com/stream/bbb49d4a72cb80feaf759ec7890abbb6/mp3-128/3439518541?p=0&ts=1638288735&t=dcc7ef7d1d7823e227339fb3243385089478ebe7&token=1638288735_5db36a29c58ea038828d7b34b67e13bd80597dd8', 'https://t4.bcbits.com/stream/8c8a69959337f6f4809f6491c2822b45/mp3-128/1330130896?p=0&ts=1638288735&t=d108dac84dfaac901a546c5fcf5064240cca376b&token=1638288735_8d9151aa82e7a00042025f924660dd3a093c2f74', 'https://t4.bcbits.com/stream/4d4253633405f204d7b1c101379a73be/mp3-128/2478242466?p=0&ts=1638288735&t=a8cd539d0ce8ff417f9b69740070870ed9a182a5&token=1638288735_ad8b5e93c8ffef6623615ce82a6754678fa67b67', 'https://t4.bcbits.com/stream/6c4feee38e289aea76080e9ddc997fa5/mp3-128/2243532902?p=0&ts=1638288735&t=83417c3aba0cef0f969f93bac5165e582f24a588&token=1638288735_c1d9d43b4e10cc6d02c822de90eda3a52c382df2', 'https://t4.bcbits.com/stream/a24dc5dad7b619d47b006e26084ff38f/mp3-128/3054008347?p=0&ts=1638288735&t=4563c326a272c9f5b8462fef1d082e46fac7f605&token=1638288735_55978e7edbe0410ff745913224b8740becad59d5', 'https://t4.bcbits.com/stream/6221790d7f55d3b1f006bd5fac5458fe/mp3-128/1500140939?p=0&ts=1638288735&t=9ecc210c53af05f4034ee00cd1a96a043312a4a7&token=1638288735_0f2faba41da8952f841669513d04bdaaae35a629', 'https://t4.bcbits.com/stream/030506909569626a0d2d7d182b61c691/mp3-128/1707615013?p=0&ts=1638288735&t=c8dcbb2c491789928f5cb6ef8b755df999cb58b8&token=1638288735_b278ba825129ae1b5588b47d5cda345ef2db4e58', 'https://t4.bcbits.com/stream/d1ae0cbc281fc81ddd91f3a3e3d80973/mp3-128/2808772965?p=0&ts=1638288735&t=1080ff51fc40bb5b7afb3a2460f3209cbda549e3&token=1638288735_c93249c847acba5cf23521fa745e05b426a5ba05', 'https://t4.bcbits.com/stream/1b9d50f8210bdc3cf4d2e33986f319ae/mp3-128/2751220220?p=0&ts=1638288735&t=9f24f06dfc5c8a06f24f28664438a6f1a75a038c&token=1638288735_f3a98a20b3c344dc5a37a602a41572d5fe8539c1', 'https://t4.bcbits.com/stream/203cd15629ba03e3249f850d5e1ac42e/mp3-128/4188265472?p=0&ts=1638288735&t=4b4bc2f2194c63a1d3b957e3dd6046bd764c272a&token=1638288735_53a70e7d83ce8c2800baeaf92a5c19db4e146e3f', 'https://t4.bcbits.com/stream/c63b5c9ca090b233e675974c7e7ee4b2/mp3-128/258670123?p=0&ts=1638288735&t=a81ae9dc33dea2b2660d13dbbec93dbcb06e6b63&token=1638288735_446d0ae442cbbadbceb342fe4f7b69d0fbab2928', 'https://t4.bcbits.com/stream/2e824d3c643658c8e9e24b548bc8cb0b/mp3-128/2332945345?p=0&ts=1638288735&t=5bdf0264b9ffe4616d920c55f5081744bf0822d4&token=1638288735_872191bb67a3438ef0fd1ce7e8a9e5ca09e6c37e']

Webscraping output []

Hey i just wanted to test Python Webscraping and i have no Idea why this doesn't work.
As output i become [] and nothing else.
Has anyone an Idea? BEcause if i go to the Website and search for the element i find it.
from bs4 import BeautifulSoup
import requests
html_text = requests.get("https://osu.ppy.sh/users/20488254").text
soup = BeautifulSoup(html_text, "lxml")
job = soup.find("div", class_ = "profile-detail__col profile-detail__col--bottom-right")
print(job)
Player info is loaded dynamically with JS. So, you can't scrape dynamic content using plain bs4. Luckily, they provide user info in json format inside script tag. If you open page source and look for json-user you will see there is a tag:
<script id="json-user" type="application/json">
{"avatar_url":"https:\/\/a.ppy.sh\/20488254?1622470835.jpeg","country_code":"AT","default_group":"default","id":20488254,...
</script>
You can grab json inside that tag and get any information about player. Here is how it would look like:
import json
import requests
from bs4 import BeautifulSoup
html_text = requests.get("https://osu.ppy.sh/users/20488254").text
soup = BeautifulSoup(html_text, "lxml")
json_data = json.loads(soup.find('script', {'id':'json-user'}).string)
Now let's say you are looking for player's global rank. All you need to do is to find the correct keys to navigate you there:
player_rank = json_data['statistics']['global_rank']
# -> 199303

Web Scraping problem, getting empty list back with BeautifulSoup Library

I'm very new to web scraping. I am trying to scrape the genres of a list of movies from their mubi website's source code by their urls. Here I found the genre with classname "css-1wuve65 eyplj6810" in source code shown in the picture below:
and with the following code, I am trying to get this genre by 'select':
'''
for i in range(len(movie_url.movie_url)):
url = movie_url.movie_url[i]
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html,'html.parser')
gener_tags = soup.select('div.css-1wuve65 eyplj6810')
print(gener_tags)
however I keep getting empty list back to every single genre_tags. I have checked url, it is correctly retrieved. Can someone help or give me some tips about how to do this? a sample url is
https://mubi.com/films/elementary-particles
You're missing a dot (.) between the div.css-1wuve65 and eyplj6810:
import requests
from bs4 import BeautifulSoup
url = 'https://mubi.com/films/elementary-particles'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
print(soup.select_one('div.css-1wuve65.eyplj6810').text)
Prints:
Comedy, Drama, Romance, Cult

Scraping a table appearing on click with python

I want to scrape information from this page.
Specifically, I want to scrape the table which appears when you click "View all" under the "TOP 10 HOLDINGS" (you have to scroll down on the page a bit).
I am new to webscraping, and have tried using BeautifulSoup to do this. However, there seems to be an issue because the "onclick" function I need to take into account. In other words: The HTML code I scrape directly from the page doesn't include the table I want to obtain.
I am a bit confused about my next step: should I use something like selenium or can I deal with the issue in an easier/more efficient way?
Thanks.
My current code:
from bs4 import BeautifulSoup
import requests
Soup = BeautifulSoup
my_url = 'http://www.etf.com/SHE'
page = requests.get(my_url)
htmltxt = page.text
soup = Soup(htmltxt, "html.parser")
print(soup)
You can get a json response from the api: http://www.etf.com/view_all/holdings/SHE. The table you're looking for is located in 'view_all'.
import requests
from bs4 import BeautifulSoup as Soup
url = 'http://www.etf.com/SHE'
api = "http://www.etf.com/view_all/holdings/SHE"
headers = {'X-Requested-With':'XMLHttpRequest', 'Referer':url}
page = requests.get(api, headers=headers)
htmltxt = page.json()['view_all']
soup = Soup(htmltxt, "html.parser")
data = [[td.text for td in tr.find_all('td')] for tr in soup.find_all('tr')]
print('\n'.join(': '.join(row) for row in data))

Categories