I'm experimenting with Python + Selenium, but I'm having trouble navigating list items when there is no ID provided for the elements.
In short, the page I'm interested in contains a list with two elements (see below): "Pitch View" and "List View". By default, "Pitch View" is selected, but I need the list view.
<div class="sc-bdnxRM ichxnR">
<div>
<ul class="Tabs__TabList-sc-1e6ubpf-0 eSBWKp">
<li class="Tab__Item-sc-19t48gi-0 bsocgQ">
<a class="Tab__Link-sc-19t48gi-1 dDKNAk" href="#pitch">Pitch View</a>
</li>
<li class="Tab__Item-sc-19t48gi-0 bsocgQ">
<a class="Tab__Link-sc-19t48gi-1 xSQCR" href="#list">List View</a>
</li>
</ul>
...
Sorry, a screen shot would have been cleaner, but I don't seem to have permission.
I can load up the page, and I'm able to interact with all the elements. Switching to the "List View" manually is not an issue once the page has loaded, but I can't seem to get Selenium to change the view automatically. I'm getting TimeoutException errors, but the real issue is that I'm not providing the right element locators, so Selenium can't find the link. Most of my attempts have been variations on the code shown below.
element=WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"//a[@class='Tab__Link-sc-19t48gi-1 xSQCR']")))
element.click()
If I can get to the "List View", I'll be fine from there as retrieving data from the tables isn't a problem once they are rendered.
My background isn't in web programming, so apologies if this is a simple question. If anyone is able to assist, that would be very helpful. Thanks!
Update:
It took a bit of tweaking, but I did eventually get it to work after exploring the link provided by Ahmed. Examples below use both the absolute and the relative XPath in case anyone else is stuck on this:
Absolute path
driver.find_element_by_xpath("/html/body/main/div/div[2]/div[2]/div[1]/div[4]/div/div/ul/li[2]/a").click()
Relative path
driver.find_element_by_xpath('//a[contains(@href,"#list")]').click()
Thanks to both Ahmed and Dan.
You can target the href and get the results you want.
[href='#list']
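For example, something like this (a minimal sketch; the driver setup and URL are placeholders, only the selector comes from the page above):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder for the page in question

# Wait for the "List View" tab link and click it once it is clickable.
list_view = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "a[href='#list']"))
)
list_view.click()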
I'm working on a script that will take the src of a website, and take screenshots of relevant parts of the site. More specifically, I'm interested in taking screenshots of posts from a site, including their respective comments and replies.
Currently, I am able to generate all these screenshots as desired, however I am encountering an issue when a given post's content exceeds the length of the Selenium browser window. A sample HTML snippet is below:
<div class="detail word-break">
<p id="contentArea">
Sample text content here. As you can see, the text is inside a p tag
<br>
<br>
...
My issue can be boiled down to wanting to treat each of these text elements as a separate WebElement for the purpose of taking Selenium screenshots
<br>
<br>
Using the XPath selector for "./child::*" on the contentArea element only returns a list of br tags, with no text content inside
...
</p>
</div>
Is it possible to take the WebElement for the contentArea, and subdivide it into smaller WebElements that contain the tagless text so they can be screenshotted individually?
I ended up finding a workaround for my issue. Rather than splitting the post into separate elements, I parsed out the text and collected each paragraph (i.e. sub-element) into a list. Then, using the JS executor, I replaced the text of the entire post with just that of each paragraph in turn, and took the screenshots that way.
Hopefully anyone facing a similar issue will find this workaround useful!
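Roughly, the idea looks like this (a sketch, not my exact code; the URL is a placeholder and the paragraph-splitting rule is an assumption):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/post")  # placeholder URL

post = driver.find_element(By.ID, "contentArea")
paragraphs = [p for p in post.text.split("\n") if p.strip()]

for i, paragraph in enumerate(paragraphs):
    # Temporarily replace the whole post's text with a single paragraph...
    driver.execute_script("arguments[0].textContent = arguments[1];", post, paragraph)
    # ...and screenshot just that element.
    post.screenshot(f"paragraph_{i}.png")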
I've perused SO for quite a while and cannot find the exact or similar solution to my current problem. This is my first post on SO, so I apologize if my formatting is off.
The Problem -
I'm trying to find a button on a webpage to punch me into a timeclock automatically. I am able to sign in and navigate to the correct page (it seems the page is dynamically loaded, as switching between different tabs like "Time Management" or "Pay Period" does not change the URL).
Attempts to solve -
I've tried using direct and indirect XPaths, CSS Selectors, IDs, Classes, Names, and all have failed. Included below are the different code attempts to find the button, and also a snippet of code including the button.
Button - HTML
Full Page HTML Source Code
<td>
<a onclick="return OnEmpPunchClick2(this);" id="btnEMPPUNCH_PUNCH" class="timesheet button icon " href="javascript:__doPostBack('btnEMPPUNCH_PUNCH','')">
<span> Punch</span></a>
<input type="hidden" name="hdfEMPPUNCH_PUNCH" id="hdfEMPPUNCH_PUNCH" value="0">
</td>
Attempts - PYTHON - ALL FAIL TO FIND
#All these return: "Unable to locate element"
self.browser.find_element_by_id("btnEMPPUNCH_PUNCH")
self.browser.find_element_by_xpath("//a[#id='btnEMPPUNCH_PUNCH']")
self.browser.find_element_by_css_selector('#btnEMPPUNCH_PUNCH')
#I attempted a manual wait:
wait=WebDriverWait(self.browser,30)
button = wait.until(expected_conditions.element_to_be_clickable((By.CSS_SELECTOR,'#btnEMPPUNCH_PUNCH')))
#And even manually triggering the script:
self.browser.execute_script("javascript:__doPostBack('btnEMPPUNCH_PUNCH','')")
self.browser.execute_script("__doPostBack('btnEMPPUNCH_PUNCH','')")
#Returns Message: ReferenceError: __doPostBack is not defined
None of these work, and I cannot seem to figure out why that is. Any help will be greatly appreciated!
I'm currently learning Python, and as a project for myself, I'm learning how to use Selenium to interact with websites. I have found the element through its id in the HTML, but I don't know how to reference the heading inside that element. In this case, I just want the string from <h4>.
<div class="estimate-box equal-width" id="estimate-box">
<a href="#worth" id="worthBox" style="text-decoration: none;">
<h5>Worth</h5>
<h4>$5.02</h4>
</a>
</div>
My question is, how do I get python to extract just the text in <h4>? I'm sorry if I formatted this wrong, this is my first post. Thanks in advance!
Use the following XPath.
print(driver.find_element_by_xpath("//a[@id='worthBox']/h4").text)
Or the following CSS selector.
print(driver.find_element_by_css_selector("#worthBox>h4").text)
So I've been going through the online book "Automate the Boring Stuff with Python" and I'm learning about BeautifulSoup. My issue is I can't seem to figure out how to choose the appropriate tag based on what I find using the developer's tools in Chrome.
<div data-hveid=.....>
<div class="rc">
<a href="https://www.python.org/".....>
<h3 class="LC20lb">Welcome to Python.org</h3>
# Using select to grab links to search results.
linkElems = soup.select('.r a')
An example of the inspector results.
In the book the goal was to grab all the links that show up on the search results page of a Google search. To do so the author uses the line soup.select('.r a'). But when I use the inspector I get to the "a href" tag.
On my own I wanted to also grab the title/heading of a link that shows up on the search results page. The inspector highlights the "h3 class" tag. I tried to select that by telling select to look for tags with the class attribute equal to "LC20lb" but I keep getting an empty list as output.
So my question is, once the inspector has helped us narrow our focus, how do we know which tag is the appropriate one to select? Like how did the author know that instead of the "a href" tag, we should go with '.r a'? In general, how far "out", i.e. which ancestor, should I choose once the selector has shown me a particular element?
If you do 'a href' you haven't specified a div class, so it's going to get all instances of a href, which is going to include links to stuff like Maps and Drive, etc. In the HTML you cite, you missed the "r" div class:
<div data-hveid=.....>
<div class="rc">
<div class="r">
<a href="https://www.python.org/".....>
<h3 class="LC20lb">Welcome to Python.org</h3>
So soup.select('.r a') is getting all the a tags in the "r" div class (which is the search results), rather than all instances of a href tags.
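A rough sketch of the difference (assuming the markup above; Google changes these class names often, so treat "r" and "LC20lb" as examples):
import requests, bs4

res = requests.get("https://www.google.com/search?q=python")
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")

# All result links: <a> tags nested inside a <div class="r">.
linkElems = soup.select(".r a")
for link in linkElems:
    print(link.get("href"))

# The headings are the <h3 class="LC20lb"> tags inside those links.
for heading in soup.select(".r a h3"):
    print(heading.getText())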
Hope this answers your question!
If you go to the site, you'll notice that there is an age confirmation window which I want to bypass through Scrapy, but I couldn't manage that, so I had to move on to the Selenium webdriver and now I'm using
driver.find_element_by_xpath('xpath').click()
to bypass that age confirmation window. Honestly, I don't want to go with the Selenium webdriver because of how slow it is. Is there any way to bypass that window?
I searched a lot on Stack Overflow and Google but didn't get any answer that resolves my problem. If you have any link or idea for resolving it with Scrapy, that'd be appreciated. A single helpful comment will be up-voted!
To expand on Chillie's answer.
The age verification is irrelevant here. The data you are looking for is loaded via an AJAX request:
See the related question Can scrapy be used to scrape dynamic content from websites that are using AJAX? to understand how these requests work.
You need to figure out how the https://ns5bwtai8m-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.19.1&x-algolia-application-id=NS5BWTAI8M&x-algolia-api-key=e676b05f3844d3adf54a29732af6e43c url works and how you can retrieve it in Scrapy.
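As a rough, untested sketch, replaying that kind of Algolia call from a spider could look like this; the index name and query parameters are placeholders you'd copy from the real request in the browser's Network tab:
import json
import scrapy

class ProductsSpider(scrapy.Spider):
    name = "products"
    search_url = (
        "https://ns5bwtai8m-dsn.algolia.net/1/indexes/*/queries"
        "?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%203.19.1"
        "&x-algolia-application-id=NS5BWTAI8M"
        "&x-algolia-api-key=e676b05f3844d3adf54a29732af6e43c"
    )

    def start_requests(self):
        # "products", the empty query and hitsPerPage are guesses; copy the
        # real body from the request the page itself sends.
        payload = {"requests": [{"indexName": "products", "params": "query=&hitsPerPage=50"}]}
        yield scrapy.Request(self.search_url, method="POST",
                             body=json.dumps(payload), callback=self.parse)

    def parse(self, response):
        data = json.loads(response.text)
        for hit in data["results"][0]["hits"]:
            yield hit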
But the age verification "window" is just a div that gets hidden when you press the button, not a real separate window:
<div class="age-check-modal" id="age-check-modal">
You can use the browser's Network tab in developer tools to see that no new info is downloaded or sent when you press the button. So everything is already loaded when you request the page. The "popup" is not even a popup, just an element whose display is changed to none when you click the button.
So Scrapy doesn't really care what's meant to be displayed as long as all the HTML is loaded. If the elements are loaded, they are accessible. Or have you seen some information being unavailable without pressing the button?
You should inspect the HTML more to see what each website does; this might make your scraping tasks easier.
Edit: After inspecting the original html you can see the following:
<div class="products-list">
<div class="products-container-block">
<div class="products-container">
<div id="hits" class='row'>
</div>
</div>
</div>
</div>
You can also see a lot of JS script tags.
The browser element inspector shows us the following:
The ::before part gives away that this was manipulated by JS, as you cannot do this with simple CSS. See Granitosaurus' answer for details on this.
What this means is that you need to somehow execute the arbitrary JS code on those pages. So you either need a JavaScript-rendering solution that works with Scrapy, or just use Selenium, as many do, and as you already have.
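A minimal sketch of the Selenium route (the age-gate button locator and the wait condition are guesses based on the snippets above, and the URL is a placeholder):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/products")  # placeholder URL

# Dismiss the age-check overlay; the button selector is an assumption.
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "#age-check-modal button"))
).click()

# Wait until the JS has filled the #hits container, then grab the rendered HTML.
WebDriverWait(driver, 10).until(
    lambda d: d.find_element(By.ID, "hits").find_elements(By.XPATH, "./*")
)
html = driver.page_source  # can now be parsed with parsel / scrapy.Selector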