How to ensure a webpage is loaded in selenium python [duplicate] - python

This question already has answers here:
Do we have any generic function to check if page has completely loaded in Selenium
(7 answers)
Is there a way with python-selenium to wait until all elements of a page has loaded?
(3 answers)
Closed 2 years ago.
I am scraping 'href' attributes from a website while feeding in values from a CSV file. The problem is that the web page sometimes doesn't load fully, and I get values from the previously loaded page instead. I want to make sure the page has loaded fully before reading it. I am using Selenium in Python and have tried the following code:
links = driver.find_elements_by_xpath("//*[@class='search-result__image-wrapper']/a")
WebDriverWait(driver, timeout).until(links[0])
Basically, the links variable holds the elements that contain the profile URLs, if any exist. There may be cases where "links" is empty. The above code gives me the following error:
"WebElement object is not callable"
When I tried using,
links = driver.find_elements_by_xpath("//*[@class='search-result__image-wrapper']/a")
WebDriverWait(driver, timeout).until(links)
The error is:
"TypeError: List object is not callable in python"
Can you please help me with this? I am new to python and web scraping. Thanks!
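Both errors come from the same cause: WebDriverWait.until() expects a callable that takes the driver, not an element or a list. A minimal sketch of one way to pass such a callable, assuming Selenium 4 (where find_elements(by, value) replaces find_elements_by_xpath); the names profile_links and wait_for_profile_links are made up for this example:

```python
XPATH = "//*[@class='search-result__image-wrapper']/a"


def profile_links(driver):
    """Wait condition: return the matching links, or False while there are none."""
    # "xpath" is the value of Selenium's By.XPATH constant.
    links = driver.find_elements("xpath", XPATH)
    return links or False


def wait_for_profile_links(driver, timeout=10):
    """Return the profile links, or an empty list if none appear in time."""
    # Imported here so profile_links can be tried out without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        # until() calls profile_links(driver) repeatedly until it returns a truthy value.
        return WebDriverWait(driver, timeout).until(profile_links)
    except TimeoutException:
        return []
```

Note that this condition passes immediately if results from the previous page are still in the DOM. In that case it may help to keep a reference to one of the old elements and first wait on expected_conditions.staleness_of(old_element), which is true once that element has been removed.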

Related

How to scrape a webpage that uses JavaScript to build the HTML? [duplicate]

This question already has answers here:
Screen Scraping a Javascript based webpage in Python
(3 answers)
scrape html generated by javascript with python
(5 answers)
Closed 1 year ago.
I have a scraper written in Python 2 using requests and BeautifulSoup. We do not want to use Selenium. The website we are scraping has changed some of its pages, which now build their elements with JavaScript. I want to scrape a page whose source is generated by JavaScript. How can this be done?

BeautifulSoup not returning children of element [duplicate]

This question already has answers here:
Web scraping program cannot find element which I can see in the browser
(2 answers)
Closed 2 years ago.
I'm new to web scraping and have been using BeautifulSoup to scrape numbers off a gambling website. I'm trying to get the text of a certain element, but the result is None.
Here is my code:
r=requests.get('https://roobet.com/crash')
soup = bs4.BeautifulSoup(r.text,'lxml')
crash = soup.find('div', class_='CrashHistory_2YtPR')
print(crash)
When I copied the contents of my soup into Notepad and used Ctrl+F to search for the element, I could not find it.
The element I'm looking for is inside the <div id="root"> element, and when I looked more closely at the copied soup I saw that there was nothing inside <div id="root">.
I don't understand what is happening.
How can I get the element I'm looking for?
Right-click on the page and choose View Source. This is a sure way of seeing the HTML the server sends before any scripts run. If you do this for https://roobet.com/crash you will notice that the <body> is almost empty apart from some <script> elements.
This is because the body of the web page is built dynamically with JavaScript, most likely by a framework such as React.
This is why BeautifulSoup has trouble finding the element: requests only downloads the initial HTML and never runs the scripts.
Your website seems to be dynamically loaded, meaning it uses JavaScript to fill in its content. You can test this by disabling JavaScript in your browser and reloading the page. To scrape this site, try using Selenium with ChromeDriver; you can also use other browsers with their equivalent drivers.
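The empty <div id="root"> can also be checked from code before deciding whether a browser is needed. A rough sketch, using the standard library's html.parser instead of BeautifulSoup so it runs without extra packages; RootProbe and root_is_empty are names invented for this example, and it assumes the container really is a div with id "root":

```python
from html.parser import HTMLParser


class RootProbe(HTMLParser):
    """Record whether <div id="root"> exists and what it contains."""

    def __init__(self):
        super().__init__()
        self.found = False   # True once <div id="root"> has been seen
        self.depth = 0       # div nesting depth inside root; 0 means outside
        self.children = 0    # number of tags opened inside root
        self.text = []       # non-blank text found inside root

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.children += 1
            if tag == "div":
                self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == "root":
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text.append(data.strip())


def root_is_empty(html):
    """True if the page has a <div id="root"> with no tags or text inside it."""
    probe = RootProbe()
    probe.feed(html)
    return probe.found and probe.children == 0 and not probe.text
```

Calling root_is_empty(r.text) on the response from requests.get would then show whether the content is filled in later by JavaScript.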

Python Download Website HTML containing JS [duplicate]

This question already has answers here:
Using python Requests with javascript pages
(6 answers)
Closed 3 years ago.
I am attempting to download many dot-bracket notations of RNA sequences from a url link with Python.
This is one of the links I am using: https://rnacentral.org/rna/URS00003F07BD/9606. To get to what I want, you have to click the '2D structure' button, and only then does the thing I am looking for (right below the occurrence of this tag)
<h4>Dot-bracket notation</h4>
appear in the Inspect Element tab.
When I use the get function from the requests package, the text and content fields do not contain that tag. Does anyone know how I can get the bracket notation item?
Here is my current code:
import requests
url = 'http://rnacentral.org/rna/URS00003F07BD/9606'
response = requests.get(url)
print(response.text)
The Requests library does not render JavaScript. You need a browser-based solution such as Selenium. The steps, in pseudo-code:
Use Selenium to load the page.
Then click the '2D structure' button using Selenium.
Wait for the content to appear, for example by adding a time.sleep().
Read the page source using Selenium.
You should then get what you want.
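The steps above could be sketched roughly as follows, assuming Selenium 4 with Chrome. Finding the button by its visible text is a guess about the page's markup, not something confirmed here, and source_contains and fetch_structure_page are names made up for this example. The sketch also swaps the fixed time.sleep() for an explicit wait on the heading text:

```python
URL = "https://rnacentral.org/rna/URS00003F07BD/9606"
MARKER = "Dot-bracket notation"


def source_contains(text):
    """Build a wait condition that is true once `text` is in the rendered page."""
    def condition(driver):
        return text in driver.page_source
    return condition


def fetch_structure_page(url=URL, timeout=30):
    """Load the page, click the 2D structure button and return the rendered HTML."""
    # Imported here so source_contains can be tried out without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Assumption: the button can be located by its visible text.
        button = wait.until(EC.element_to_be_clickable(
            (By.XPATH, "//*[normalize-space(text())='2D structure']")))
        button.click()
        # Wait for the heading to show up instead of sleeping for a fixed time.
        wait.until(source_contains(MARKER))
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can then be parsed as usual to pull out whatever follows the heading.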

Not able to locate element, not iframe or page load [duplicate]

This question already has answers here:
Unable to locate element of credit card number using selenium python
(2 answers)
Closed 3 years ago.
I am attempting to automate the checkout process of a Shopify checkout, and Selenium can't seem to find the elements necessary to complete the checkout. NOTE: This checkout is not in an iframe and I have done extensive research to make sure that the page is fully loaded, so this is not a duplicate question.
try:
    elem = driver.find_element_by_id('number')
    elem.send_keys('4342923222931029')
except NoSuchElementException:
    assert 0, "can't find input with number id"
Here is what I am trying to access: screenshot of the source code of the checkout code
The card number field appears to sit inside an iframe, so switch to that frame before looking for the input. Here is the Python code to switch to the correct frame.
ele = driver.find_element_by_xpath("//iframe[contains(@id, 'card-fields-number')]")
driver.switch_to.frame(ele)
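Putting the two snippets together, a sketch of the whole sequence might look like this. It uses the Selenium 4 style find_element(by, value), since the find_element_by_* helpers were removed in later Selenium 4 releases; fill_card_number is a made-up name, and the iframe id fragment and input id are taken from the code above:

```python
FRAME_XPATH = "//iframe[contains(@id, 'card-fields-number')]"


def fill_card_number(driver, number):
    """Switch into the card number iframe, type the number, then switch back."""
    # "xpath" and "id" are the values of Selenium's By.XPATH and By.ID constants.
    frame = driver.find_element("xpath", FRAME_XPATH)
    driver.switch_to.frame(frame)
    try:
        driver.find_element("id", "number").send_keys(number)
    finally:
        # Return to the top-level page so later lookups are not stuck in the frame.
        driver.switch_to.default_content()
```

Switching back with default_content() matters because every lookup after switch_to.frame() is scoped to that frame until the driver is told otherwise.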
For Java solution refer here

getting full content of web page (using Python-requests) [duplicate]

This question already has answers here:
Programmatic Python Browser with JavaScript
(8 answers)
Closed 4 years ago.
I am new to this subject, so my question could prove stupid.. sorry in advance.
My challenge is to do web-scraping, say for this page: link (google)
I am trying to scrape it using Python.
My problem is that when I use requests.get, I don't seem to get the full content of the page. I guess this is because the page has many resources and Python does not fetch them all. (What's more, when I scroll in Chrome, more data is revealed, yet I can see from the source code that no more data is downloaded to be shown.)
How can I get the full content of a web page? what am I missing?
thanks
requests.get will get you the web page, but only what the site decides to serve to a robot. If you want the full page as a human sees it, you need to make the request look like it comes from a browser by changing your headers. If you need to scroll or click buttons to see the whole page, which is what I think you'll need to do, I suggest you take a look at Selenium.
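As a rough illustration of changing the headers, here is a sketch that sends a browser-like User-Agent with requests. The header values are examples only, not values any particular site is known to require, and build_headers and fetch are names invented for this sketch:

```python
DEFAULT_HEADERS = {
    # Example value only; copy the User-Agent string your own browser sends.
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_headers(extra=None):
    """Return a copy of the default headers, with optional overrides merged in."""
    headers = dict(DEFAULT_HEADERS)
    headers.update(extra or {})
    return headers


def fetch(url, extra_headers=None, timeout=10):
    """Download a page while presenting browser-like headers."""
    # Imported here so build_headers can be tried out without requests installed.
    import requests

    response = requests.get(url, headers=build_headers(extra_headers), timeout=timeout)
    response.raise_for_status()
    return response.text
```

Headers only change what the server sends back. They do not make requests run JavaScript, so content that appears on scroll or click still needs a browser such as Selenium.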
