Python xpath - Exclude content with style="display:none"


I'm using XPath to get some information from a website, and I came across a block of code whose style attribute is either "display:none" or "display:block". I only want to include the elements that have display:block. I looked at some examples but couldn't get them working with my code. I want to use an if statement to run the code only if the element has display:block, but I don't know if that is possible. This is what I have:
if guide_page.xpath(".//div[@class='build-box']/@style/text()") == "display: block;":
    for build_names in guide_page.xpath(".//div[@class='build-gradient']"):
        for title in build_names.xpath("div/h2/text()"):
            print("\n")
            print(title)
And this is the div that has it:
<div class="build-box" style="display: block;">
I'm not sure if I should paste more of the HTML or if that's enough. If not, please tell me, and thanks for any help :)

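You can do this without an if statement. Add a not(...) condition to the predicate to exclude elements that match a certain condition. For example, the following XPath returns div elements with a given class attribute value that don't have the attribute style="display: block;":
.//div[@class='build-box' and not(@style='display: block;')]
To keep only the visible blocks, put the value you want to exclude (for example 'display: none;') inside not(...), or drop the not() and match @style='display: block;' directly.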
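A minimal sketch of the predicate approach with lxml, in case it helps. The sample markup is hypothetical: only the class names come from the question, and nesting build-gradient inside build-box is an assumption. It uses translate() to strip spaces so that "display: none" and "display:none" are treated the same.

```python
from lxml import html

# Hypothetical sample page; only the class names are taken from the question.
SAMPLE = """
<html><body>
  <div class="build-box" style="display: block;">
    <div class="build-gradient"><div><h2>Visible build</h2></div></div>
  </div>
  <div class="build-box" style="display:none">
    <div class="build-gradient"><div><h2>Hidden build</h2></div></div>
  </div>
</body></html>
"""

guide_page = html.fromstring(SAMPLE)

# Keep build-box divs whose style does not contain "display:none".
# translate(@style, ' ', '') removes spaces before the comparison.
visible_boxes = guide_page.xpath(
    ".//div[@class='build-box' and "
    "not(contains(translate(@style, ' ', ''), 'display:none'))]"
)

# Collect the titles from the visible boxes only.
titles = [
    title
    for box in visible_boxes
    for title in box.xpath(".//div[@class='build-gradient']/div/h2/text()")
]
print(titles)
```

Because the filtering happens inside the XPath, no if statement is needed around the loop.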

Related

How to bring back 1st div child in python using bs4 soup.select within a dynamic table

In the HTML below, I have been unable to use Beautiful Soup's soup.select to obtain only the first child of each div class="wrap-25PNPwRV" (i.e. -11.94M and 2.30M) as a list.
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>‪−11.94M‬</div>
<div class="change-25PNPwRV negative-25PNPwRV">−119.94%</div></div></div>
<div class="value-25PNPwRV additional-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>‪2.30M‬</div>
<div class="change-25PNPwRV negative-25PNPwRV">−80.17%</div></div></div>
The above shows just two examples from the HTML I'm trying to scrape. It sits inside a dynamic, JavaScript-rendered table; there are many more div elements on the page and many more div class="wrap-25PNPwRV" elements inside the table.
I currently have the code below, which scrapes all the contents of each div class="wrap-25PNPwRV":
data_list = [elem.get_text() for elem in soup.select("div.wrap-25PNPwRV")]
Output:
['-11.94M', '-119.94%', '2.30M', '-80.17%']
However, I would like to use soup.select to yield the desired output:
['-11.94M', '2.30M']
I tried following the guide at https://www.crummy.com/software/BeautifulSoup/bs4/doc/ but have not managed to apply it to the code above.
Please note: if soup.select cannot do this, I am happy to use an alternative, provided it generates the same list format/output.

You can use the :nth-of-type CSS selector:
data_list = [elem.get_text() for elem in soup.select(".wrap-25PNPwRV div:nth-of-type(1)")]

I'd suggest not using the .wrap-25PNPwRV class. It looks randomly generated and will almost certainly change in the future.
Instead, select the <div> element that has an element with class="change..." as its next sibling. For example:
print([t.text.strip() for t in soup.select('div:has(+ [class^="change"])')])
Prints:
['−11.94M', '2.30M']
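Both selectors from the answers can be compared side by side. The sketch below is only an illustration: it embeds a trimmed copy of the question's markup as a string and uses the built-in "html.parser" backend, which is an assumption about the setup. The real page is rendered by JavaScript, so the HTML would first have to be obtained from a browser-driven session.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
HTML = """
<div class="value-25PNPwRV">
  <div class="wrap-25PNPwRV">
    <div>−11.94M</div>
    <div class="change-25PNPwRV negative-25PNPwRV">−119.94%</div></div></div>
<div class="value-25PNPwRV additional-25PNPwRV">
  <div class="wrap-25PNPwRV">
    <div>2.30M</div>
    <div class="change-25PNPwRV negative-25PNPwRV">−80.17%</div></div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Option 1: the first <div> child of each wrapper.
first_child = [
    d.get_text(strip=True)
    for d in soup.select(".wrap-25PNPwRV > div:nth-of-type(1)")
]

# Option 2: any <div> immediately followed by a sibling whose class
# attribute starts with "change" (does not rely on the wrapper class).
by_sibling = [
    d.get_text(strip=True)
    for d in soup.select('div:has(+ [class^="change"])')
]

print(first_child)
print(by_sibling)
```

The second option keeps working if the hashed suffix on the wrapper class changes, as long as the "change" prefix stays.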

Web Scraping with Selenium questions, no results with any method

When I use Inspect Element on a paragraph, I am directed to a <textarea> tag. If it is empty, where is the text that I see on the screen coming from?
I have attempted:
driver.find_elements_by_css_selector('the class of the textarea')
driver.find_elements_by_id("the id of the textarea")
driver.find_elements_by_class_name("the class of the textarea")
I've also tried the latter methods with the parent divs of the textarea. Nothing worked.
Then I did the following:
I copied the text and used Ctrl+F to find it in the HTML, which directed me to a <script> tag. I tried the following:
everything = driver.find_elements_by_tag_name("script")
for item in everything:
    print(driver.execute_script(item.text))
I get a list of Nones printed out.
I have been stuck at this for days.
This is the tag:
<textarea style="width: 658px; height: 128px; overflow: auto;" autocomplete="off" id="ContentPH_description" name="ContentPH_description" role="textbox" aria-readonly="false" aria-required="false" aria-multiline="fals" class="x-form-textarea x-form-field vms-viewmode-view-set" readonly="" aria-labelledby="ContentPH_description_label" title="" aria-invalid="false" maxlength="10000" oldindex="0" tabindex="-1"></textarea>
I, unfortunately, cannot share a link or the full HTML of the page since it is work-related and you'd need a login to access the data I am trying to scrape.
What can I do to solve this?

I don't know if it will help you, but when I try to access visually hidden text (e.g. text from a collapsed tab), I always replace element.text with element.get_attribute('textContent') or element.get_attribute("innerText").

Selecting dynamic element using selenium and python

Could someone please tell me how I can select a dynamic element using Selenium?
I would like to select the "limit-order" element.
<div class="tab-control" id="uniqName_0_85" widgetid="uniqName_0_85">
<span data-tab="market-order" class="tab-item tab-active">Market</span>
<span data-tab="limit-order" class="tab-item">Limit</span>
<span data-tab="stop-order" class="tab-item">Stop</span>
<span data-tab="stop_limit-order" class="tab-item">Stop Limit</span>
</div>
I tried this but no luck:
btn_limit_name_xpath = '//div[contains(@class,"tab-control")]/span[2]'
btn_limit = browser.find_element_by_xpath(btn_limit_name_xpath)
btn_limit.click()

What sometimes works for me is copying the full XPath instead of the shorter one.
If that doesn't work either, you could try an XPath that finds a specific piece of text and selects the element that way.
So in your case you could try to find it by searching for 'Limit'.
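As a rough illustration of selecting by attribute or by text rather than by position, the sketch below checks two candidate XPath expressions against the question's markup using lxml (chosen here only because it runs without a browser). The same strings could then be passed to Selenium, e.g. find_element_by_xpath(...) as in the question, or find_element(By.XPATH, ...) in Selenium 4. If the element still cannot be found in the browser, the cause may lie elsewhere (timing, iframes), which this sketch does not address.

```python
from lxml import html

# Markup copied from the question.
SAMPLE = """<div class="tab-control" id="uniqName_0_85" widgetid="uniqName_0_85">
<span data-tab="market-order" class="tab-item tab-active">Market</span>
<span data-tab="limit-order" class="tab-item">Limit</span>
<span data-tab="stop-order" class="tab-item">Stop</span>
<span data-tab="stop_limit-order" class="tab-item">Stop Limit</span>
</div>"""

tree = html.fromstring(SAMPLE)

# Select by the stable data-tab attribute instead of the generated id.
by_attribute = '//span[@data-tab="limit-order"]'

# Select by exact visible text; "Stop Limit" does not match.
by_text = '//div[contains(@class, "tab-control")]/span[normalize-space()="Limit"]'

attr_matches = tree.xpath(by_attribute)
text_matches = tree.xpath(by_text)

print([el.get("data-tab") for el in attr_matches])
print([el.get("data-tab") for el in text_matches])
```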

Not able to extract data using scrapy with class names containing spaces and hyphens

I am new to scrapy and I have to extract text from a tag with multiple class names, where the class names contain spaces and hyphens.
Example:
<div class="info">
<span class="price sale">text1</span>
<span class="title ng-binding">some text</span>
</div>
When I use the code:
response.xpath("//span[contains(@class,'price sale')]/text()").extract()
I am able to get text1, but when I use:
response.xpath("//span[contains(@class,'title ng-binding')]/text()").extract()
I get an empty list. Why is this happening, and how do I handle it?

The expression you're looking for is:
//span[contains(@class, 'title') and contains(@class, 'ng-binding')]
I highly suggest XPath visualizer, which can help you debug xpath expressions easily. It can be found here:
http://xpathvisualizer.codeplex.com/
Or, with CSS, try:
response.css("span.title.ng-binding")
There is also a chance that the element with ng-binding is loaded via JavaScript/Ajax and is therefore not included in the initial server response.

You can replace the spaces with "." in your code when using response.css().
In your case you can try:
response.css("span.title.ng-binding::text").extract()
This code should return the text you are looking for.
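One caveat with contains(@class, 'title') is that it is a substring test, so it also matches classes such as "subtitle". A common XPath idiom pads the attribute with spaces and matches whole tokens. The sketch below demonstrates the difference with lxml on a hypothetical extension of the question's markup (the "subtitle" span and the has_class helper are added for illustration); the resulting expression should work unchanged in response.xpath(...).

```python
from lxml import html

# Question markup plus one hypothetical "subtitle" span for contrast.
SAMPLE = """
<div class="info">
  <span class="price sale">text1</span>
  <span class="title ng-binding">some text</span>
  <span class="subtitle ng-binding">other text</span>
</div>
"""

tree = html.fromstring(SAMPLE)


def has_class(name):
    """Build an XPath test that matches `name` as a whole class token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


# Substring test: also matches "subtitle".
loose = tree.xpath(
    "//span[contains(@class, 'title') and contains(@class, 'ng-binding')]/text()"
)

# Token test: matches only elements that really carry both classes.
strict = tree.xpath(
    f"//span[{has_class('title')} and {has_class('ng-binding')}]/text()"
)

print(loose)
print(strict)
```

The CSS form span.title.ng-binding already matches whole class tokens, which is one reason it is often the simpler choice.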

how to extract text using Beautifulsoup

Can you please show me how to extract the title text (Inna) using BeautifulSoup in this situation:
<div class="wallpapers-box-300x180-2 wallpapers-margin-2">
<div class="wallpapers-box-300x180-2-img"><a title="Inna" href="/photo.jpg" alt="Inna" width="300" height="188" /></a></div>
<div class="wallpapers-box-300x180-2-title"><a title="Inna" href="/wallpapers/inna/">Inna</a></div>
Thanks.

There are many ways to locate the element in this case, and it's difficult to tell which would work best for you, since we don't know the scope of the problem, how unique the element is, or what you know and can rely on.
The most practical approach here I think would be to use the following CSS selector:
for elm in soup.select('div[class^="wallpapers-box"] > a[href*=wallpapers]'):
    print(elm.get_text())
Here we require the parent div element's class to start with wallpapers-box, and then find its direct a child element whose href attribute value contains the text wallpapers.
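A small runnable sketch of this selector, assuming a cleaned-up version of the question's snippet (the closing tags are tidied and the "html.parser" backend is an assumption). It also reads the title attribute, in case that is the value wanted rather than the link text.

```python
from bs4 import BeautifulSoup

# Cleaned-up version of the markup from the question.
HTML = """
<div class="wallpapers-box-300x180-2 wallpapers-margin-2">
  <div class="wallpapers-box-300x180-2-img"><a title="Inna" href="/photo.jpg"></a></div>
  <div class="wallpapers-box-300x180-2-title"><a title="Inna" href="/wallpapers/inna/">Inna</a></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Links that are direct children of a "wallpapers-box..." div and whose
# href contains "wallpapers"; the image link (/photo.jpg) is skipped.
links = soup.select('div[class^="wallpapers-box"] > a[href*=wallpapers]')

texts = [a.get_text() for a in links]   # visible link text
titles = [a["title"] for a in links]    # value of the title attribute

print(texts)
print(titles)
```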
