XPath, nested conditions - python

I have the following HTML code, and I need an XPath expression that finds the table container element.
<div>
<div>Dezember</div>
<div>
<div class="dash-table-container">more divs</div>
</div>
</div>
My current XPath expression:
//div[./div[1]/text() = "Dezember"]/preceding::div[./div[2][@class="dash-table-container"]]
I don't know how to check whether the dash table container is the last one loaded, since I have many of them. So I need to check whether it is under the div with "Dezember" as its text, because the divs before it (for the other months) are loaded faster.
I want the XPath to select the "dash table container" div.
Thanks in advance

To select the div with the text content of "more divs", you can use
//div/div[@class="dash-table-container" and ../preceding-sibling::div[1]="Dezember"]
and to select its parent div element, use
//div[div/@class="dash-table-container"][preceding-sibling::div[1]="Dezember"]/..

I figured it out.
//div[preceding-sibling::div="Dezember"]/div[@class="dash-table-container"]
worked perfectly for me.
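For anyone who wants to check the expression outside the browser, here is a minimal sketch applying it with lxml; the library choice is an assumption, and the markup is taken from the question:
from lxml import html
doc = html.fromstring("""
<div>
<div>Dezember</div>
<div>
<div class="dash-table-container">more divs</div>
</div>
</div>
""")
# Anchor on the wrapper whose preceding sibling div reads "Dezember",
# then descend to the dash-table-container div.
nodes = doc.xpath('//div[preceding-sibling::div="Dezember"]/div[@class="dash-table-container"]')
print([n.text for n in nodes])  # ['more divs']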

Related

How to bring back 1st div child in python using bs4 soup.select within a dynamic table

In the HTML elements below, I have been unsuccessful in using BeautifulSoup's soup.select to obtain only the first child after <div class="wrap-25PNPwRV"> (i.e. -11.94M and 2.30M) in list format:
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>−11.94M</div>
<div class="change-25PNPwRV negative-25PNPwRV">−119.94%</div></div></div>
<div class="value-25PNPwRV additional-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>2.30M</div>
<div class="change-25PNPwRV negative-25PNPwRV">−80.17%</div></div></div>
Above are just two examples from the HTML I'm attempting to scrape; the source sits inside a dynamic, JavaScript-rendered table, and there are many more div elements on the page, and many more with class "wrap-25PNPwRV" inside that table.
I currently have the code below, which allows me to scrape all the contents within div class="wrap-25PNPwRV":
data_list = [elem.get_text() for elem in soup.select("div.wrap-25PNPwRV")]
Output:
['-11.94M', '-119.94%', '2.30M', '-80.17%']
However, I would like to use soup.select to yield the desired output :
['-11.94M', '2.30M']
I tried following this guide https://www.crummy.com/software/BeautifulSoup/bs4/doc/ but have been unable to apply it to my code above.
Please note, if soup.select is not possible to perform the above, I am happy to use an alternative providing it generates the same list format/output
You can use the :nth-of-type CSS selector:
data_list = [elem.get_text() for elem in soup.select(".wrap-25PNPwRV div:nth-of-type(1)")]
I'd suggest not using the .wrap-25PNPwRV class. It looks auto-generated and will almost certainly change in the future.
Instead, select the <div> element that has another element with class="change..." as its immediately following sibling. For example:
print([t.text.strip() for t in soup.select('div:has(+ [class^="change"])')])
Prints:
['−11.94M', '2.30M']
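For reference, the same selector as a self-contained sketch, assuming bs4 with a soupsieve version recent enough to support :has() (it ships with current bs4 releases) and the markup from the question:
from bs4 import BeautifulSoup
html = """
<div class="value-25PNPwRV"><div class="wrap-25PNPwRV">
<div>−11.94M</div>
<div class="change-25PNPwRV negative-25PNPwRV">−119.94%</div></div></div>
<div class="value-25PNPwRV additional-25PNPwRV"><div class="wrap-25PNPwRV">
<div>2.30M</div>
<div class="change-25PNPwRV negative-25PNPwRV">−80.17%</div></div></div>
"""
soup = BeautifulSoup(html, "html.parser")
# Keep only the <div> that is immediately followed by a "change..." sibling.
print([t.text.strip() for t in soup.select('div:has(+ [class^="change"])')])
# ['−11.94M', '2.30M']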

How to extract text using BeautifulSoup

Can you please show me how to extract the title text (Inna) using BeautifulSoup in this situation:
<div class="wallpapers-box-300x180-2 wallpapers-margin-2">
<div class="wallpapers-box-300x180-2-img"><a title="Inna" href="/photo.jpg" alt="Inna" width="300" height="188" /></a></div>
<div class="wallpapers-box-300x180-2-title"><a title="Inna" href="/wallpapers/inna/">Inna</a></div>
Thanks.
There are many ways to locate the element in this case, and it's difficult to tell which one would work best for you, since we don't know the scope of the problem, how unique the element is, or what you know and can rely on.
The most practical approach here I think would be to use the following CSS selector:
for elm in soup.select('div[class^="wallpapers-box"] > a[href*=wallpapers]'):
    print(elm.get_text())
Here we check that the parent div element's class starts with wallpapers-box and select the direct a child element that has wallpapers in its href attribute value.
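Put together as a runnable sketch (the html.parser choice is an assumption, and the malformed inner anchor from the question has been simplified):
from bs4 import BeautifulSoup
html = """
<div class="wallpapers-box-300x180-2 wallpapers-margin-2">
<div class="wallpapers-box-300x180-2-img"><a title="Inna" href="/photo.jpg"></a></div>
<div class="wallpapers-box-300x180-2-title"><a title="Inna" href="/wallpapers/inna/">Inna</a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# Parent div class starts with "wallpapers-box"; the direct <a> child has "wallpapers" in its href.
for elm in soup.select('div[class^="wallpapers-box"] > a[href*=wallpapers]'):
    print(elm.get_text())  # Inna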

Select second tag

How can I select the second <a> tag from the following snippet?
<div class="hovno">
<a href='...'></a>
<a href='...'></a>
</div>
I know that I can find the first <a> tag by using:
driver.find_element_by_css_selector("div.hovno a")
But I don't know how to select the second <a> tag.
You can always find all direct a children and get the second element:
driver.find_elements_by_css_selector("div.hovno > a")[1]
Or, according to the example, the last element would work too:
driver.find_elements_by_css_selector("div.hovno > a")[-1]
nth-of-type pseudo-class is also an option:
driver.find_element_by_css_selector("div.hovno > a:nth-of-type(2)")
You should use nth-of-type
driver.FindElement(By.CssSelector("div.hovno a:nth-of-type(2)"));
I'm not sure, but try this:
driver.find_element_by_css_selector("div.hovno").find_elements_by_tag_name('a')[1]
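Note that the find_element_by_* helpers used above were removed in Selenium 4; on a current version the equivalent calls look roughly like this (a sketch, assuming driver is an already-created WebDriver instance):
from selenium.webdriver.common.by import By
# Second direct <a> child, by index:
second_link = driver.find_elements(By.CSS_SELECTOR, "div.hovno > a")[1]
# Or via the nth-of-type pseudo-class:
second_link = driver.find_element(By.CSS_SELECTOR, "div.hovno > a:nth-of-type(2)")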

scrapy xpath selector without same DIV ID

I want to select, using Scrapy/XPath, all divs whose ID starts with "edit".
For example:
<div id="edit3423432">...</div>
<div id="edit0036594">...</div>
For divs which have the same id I use this code:
hxs.select('.//div[contains(@id, "testid")]')
But now how can I select all divs where the first four characters of the id are "edit"?
XPath has a function called starts-with that is pretty much ideal here. Here's an example of how to use it:
hxs.select('.//div[starts-with(@id, "edit")]')
Hope that helps, let me know if you have any questions.
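The same expression works with the current Scrapy selector API; here is a small sketch using parsel (the selector library behind Scrapy's response.xpath), with the sample divs from the question plus a non-matching div added for illustration:
from parsel import Selector
sel = Selector(text="""
<div id="edit3423432">...</div>
<div id="edit0036594">...</div>
<div id="other">...</div>
""")
# Keep only the divs whose id attribute begins with "edit".
print(sel.xpath('//div[starts-with(@id, "edit")]/@id').getall())
# ['edit3423432', 'edit0036594']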

Select second element that is after first one

I have 2 divs under 1 li in my code. I need to select the second one:
<li>
<div id='1'>Stable Text</div>
<div>Unstable Text</div>
</li>
I can find only the first one by its text, since that text is stable. But I need the second one.
Using XPath with //li/div[2] will not work because the position of this data is not stable.
You can use following-sibling.
//div[text() = 'Stable Text']/following-sibling::div
You can have a look here for more information.
You can try using a CSS selector as well:
li > div#1 + div
Here '+' is used to locate the following sibling. You can refer to this for more info on selectors.
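For completeness, a minimal sketch of the following-sibling expression with lxml; the library and the <ul> wrapper are assumptions, added so the fragment parses cleanly:
from lxml import html
doc = html.fromstring("""
<ul><li>
<div id='1'>Stable Text</div>
<div>Unstable Text</div>
</li></ul>
""")
# Anchor on the div with the stable text, then step to its following div sibling.
print(doc.xpath("//div[text() = 'Stable Text']/following-sibling::div/text()"))
# ['Unstable Text']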
