XPath to get text following label - Python

I want to extract items according to their preceding <label> elements, like this:
<div>
  <ul>
    <li class="phone">
      <label>Mobile</label>
      312-999-0000
<div>
  <ul>
    <li class="phone">
      <label>Home</label>
      312-999-0001
I want to put the first number in the "Mobile" column/list and the second in the "Home" list. I currently have code grabbing both of them, but I don't know the proper syntax for selecting by the label text as it appears in the source. This is what I'm using now:
for target in targets:
    item = CrawlerItem()
    item['phonenumbers'] = target.xpath('div/ul/li[@class="phone"]/text()').extract()
How should I rewrite that for item['mobilephone'] and item['homephone'], using the labels?

I found the answer while finishing up the question, and thought I should share it:
item['mobilephone'] = target.xpath('div/ul/li/label[contains(text(), "Mobile")]/following-sibling::text()').extract()
item['officephone'] = target.xpath('div/ul/li/label[contains(text(), "Office")]/following-sibling::text()').extract()
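For reference, here is a minimal, self-contained sketch of the same label-then-following-text idea using plain lxml outside of Scrapy (the sample HTML and variable names are illustrative, not from the original site):

# Sketch: select the text node that follows a matching <label>, using lxml directly.
# The sample HTML and names below are illustrative.
from lxml import html

snippet = """
<div>
  <ul>
    <li class="phone"><label>Mobile</label> 312-999-0000</li>
    <li class="phone"><label>Home</label> 312-999-0001</li>
  </ul>
</div>
"""

doc = html.fromstring(snippet)
mobile = doc.xpath('//li[@class="phone"]/label[contains(text(), "Mobile")]/following-sibling::text()')
home = doc.xpath('//li[@class="phone"]/label[contains(text(), "Home")]/following-sibling::text()')
print([t.strip() for t in mobile])  # ['312-999-0000']
print([t.strip() for t in home])    # ['312-999-0001']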

Related

Recursion for traversing multiple divs in order to find select tag using selenium

I'm looking to solve a problem using recursion, but I'm unfortunately not very good at it. Do you have any idea what this recursion would look like?
This is just a made-up example, because the elements after the main-div class are hidden, and I would like to know how to traverse this entire div tree to search for a select tag using Selenium.
I'm not good with recursion, so I would appreciate any help or ideas.
<div class="main-div">
<div>
<div><p>Some text1</p></div>
<div><p>Some text2</p></div>
<div><p>Some text3</p></div>
</div>
<div>
<div><p>Some text4</p></div>
<div>
<p>Some text5</p>
<div><p>Div containing my select file</p>
<select>
<option></option>
<option></option>
<option></option>
<option></option>
</select>
</div>
</div>
</div>
I know that I have to use
driver.find_elements_by_class_name('main-div')
to search for that specific element. Afterwards I think I should loop over every element, something like:
for record in main_div:
    new_to_search = record.find_elements_by_tag_name('div')
    # calling my recursive function
    myFunction(new_to_search, selectTag)
So here I'm stuck...
Edit:
What do you think about this approach? Will it cover every element?
def myFunc(element):
    print(len(element))
    for record in element:
        print(element)
        checkRecord(record)
        new_div_element = record.find_elements_by_xpath('.//div')
        if len(new_div_element) > 0:
            myFunc(new_div_element)

def checkRecord(element):
    select = element.find_elements_by_tag_name('select')
    if len(select) > 0:
        print("You've found your select tag")

argument = driver.find_elements_by_class_name('main-div')  # the class in the HTML is "main-div", not "main_div"
# calling my recursive function
myFunc(argument)
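No answer is given above, but note that the recursion is not strictly necessary: a relative XPath search from the outer element already covers its entire subtree. A minimal sketch, assuming the Selenium 4 locator API and the HTML shown above (the driver setup and URL are placeholders):

# Sketch: find the nested <select> without explicit recursion.
# Assumes Selenium 4 style locators; the driver setup and URL are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/page-with-the-div-tree")  # placeholder URL

main_div = driver.find_element(By.CLASS_NAME, "main-div")
# ".//select" searches the whole subtree under main_div, however deeply nested.
selects = main_div.find_elements(By.XPATH, ".//select")

if selects:
    print("Found the select tag:", selects[0].get_attribute("outerHTML"))
else:
    print("No select tag under main-div")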

How can I access bulk-edit button "bulkedit_all"? Python / Selenium

I am trying to automate JIRA tasks but am struggling to access the bulk edit option after a JQL filter. After reaching the correct screen I am stuck at this point:
[screenshot of the Bulk Change dropdown omitted]
HTML code:
<div class="aui-list">
<h5>Bulk Change:</h5>
<ul class="aui-list-sectionaui-first aui-last">
<li class="aui-list-item active">
<a class="aui-list-item-link" id="bulkedit_all" href="/secure/views/bulkedit/BulkEdit1!default.jspa?reset=true&tempMax=4">all 4 issue(s)</a>
</li>
</ul>
</div>
My Python code:
bulkDropdown = browser.find_elements_by_xpath("//div[@class='aui-list']//aui-list[@class='aui-list-item.active']").click()
Try the following xpath -
bulkDropdown = browser.find_element_by_xpath("//li/a[@id='bulkedit_all']").click()
The link you want has an ID, you should use that unless you find that it's not unique on the page.
browser.find_element_by_id("bulkedit_all").click()
You will likely need to add a wait for clickable since from the screenshot it looks like a popup or tooltip of some kind. See the docs for more info on the different waits available.
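A minimal sketch of that wait-for-clickable approach, reusing the browser object from the question (the 10-second timeout is an arbitrary choice):

# Sketch: wait until the bulk-edit link is clickable, then click it.
# The timeout value is an arbitrary choice.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)
wait.until(EC.element_to_be_clickable((By.ID, "bulkedit_all"))).click()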

Search <li> list via children text

I'm currently using Python to import data from an Excel sheet and then use that information to fill out a form on a webpage.
The problem I'm having is selecting a profile from the drop-down menu.
I've been using the Selenium library, and I can select the element using find_element_by_xpath, but only if I know the data-value. The data-value is auto-generated for each new profile that's added, so I can't rely on it.
Profile = Browser.find_element_by_xpath("/html/something/something/.....")
Profile.click()
time.sleep(0.75)  # allowing time for link to be clickable
The_Guy = Browser.find_element_by_xpath("/html/something/something/...")
The_Guy.click()
This works only with known paths. I would like to do something like this:
Profile = Browser.find_element_by_xpath("/html/something/something/.....")
Profile.click()
time.sleep(0.75) #allowing time for link to be clickable
The_Guy = Browser.find_element_by_id("Caption.A")
The_Guy.click()
EXAMPLE OF HTML
<ul class ="list">
<li class = "option" data-value= XXXXX-XXXXX-XXXXX-XX-XXX>
::marker
Thor
</li>
<li class = "option" data-value= XXXXX-XXXXX-XXXXX-XX-XXX>
::marker
IronMan
</li>
<li class = "option" data-value= XXXXX-XXXXX-XXXXX-XX-XXX>
::marker
Caption.A
</li>
....
</ul>
What I'd like to be able to do is search by name (like Caption.A) and then step back to select the parent <li>. Thanks in advance.
Try using the following XPath to find the li containing the desired text and then click on it. Sample code:
driver.find_element(By.XPATH, "//li[contains(text(), 'Caption.A')]").click()
Hope it helps :)
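A slightly fuller sketch of the same idea, wrapped in a helper so any profile name from the spreadsheet can be passed in (the helper name and XPath details are illustrative, not from the original answer):

# Sketch: click a drop-down option by its visible text instead of its data-value.
# The helper name is illustrative; driver is assumed to be an existing WebDriver.
from selenium.webdriver.common.by import By

def click_option_by_text(driver, name):
    # normalize-space() ignores the surrounding whitespace inside the <li>.
    xpath = f"//ul[@class='list']/li[@class='option'][normalize-space(.)='{name}']"
    driver.find_element(By.XPATH, xpath).click()

click_option_by_text(driver, "Caption.A")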

Navigating DOM in BeautifulSoup

I'm currently able to find certain elements using the findAll function. Is there a way to navigate to their children?
The code I have is:
data = soup.findAll(id="profile-experience")
print data[0].get_text()
It returns a block of text, but some of the text isn't spaced out properly.
The DOM looks something like this
<div id="profile-experience>
<div class="module-body>
<li class="position">
<li class="position">
<li class="position">
If I just do a findAll on class="position" I get way too much back. Is there a way, using BeautifulSoup, to find only the <li class="position"> elements that are nested underneath <div id="profile-experience">?
I want to do something like this:
data = soup.findAll('li', attrs={'class': 'position'})
(where I'm only getting the nested data)
for d in data:
    print d.get_text()
Sure, you can "chain" the find* calls:
profile_experience = soup.find(id="profile-experience")
for li in profile_experience.find_all("li", class_="position"):
    print(li.get_text())
Or, you can solve it in one go with a CSS selector:
for li in soup.select("#profile-experience li.position"):
    print(li.get_text())
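For completeness, a self-contained sketch that exercises the chained approach against a tiny inline document (the HTML string is made up to mirror the structure described in the question):

# Sketch: scope find_all() to one parent element so unrelated matches are skipped.
# The inline HTML is illustrative only.
from bs4 import BeautifulSoup

html = """
<div id="profile-experience">
  <div class="module-body">
    <li class="position">Engineer at Foo</li>
    <li class="position">Intern at Bar</li>
  </div>
</div>
<li class="position">Unrelated position outside the profile</li>
"""

soup = BeautifulSoup(html, "html.parser")
profile_experience = soup.find(id="profile-experience")
for li in profile_experience.find_all("li", class_="position"):
    print(li.get_text())  # only the two nested positions are printed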

How do I access an inline element inside a loop in lxml?

I am trying to screen scrape values from a website.
import lxml.html

# get the raw HTML
fruitsWebsite = lxml.html.parse("http://pagetoscrape.com/data.html")
# get all divs with class fruit
fruits = fruitsWebsite.xpath('//div[@class="fruit"]')
# Print the name of this fruit (obtained from an <em> in the fruit div)
for fruit in fruits:
    print fruit.xpath('//li[@class="fruit"]/em')[0].text
However, the Python interpreter complains that index 0 is out of bounds. That's interesting because I am sure that the element exists. What is the proper way to access the inner <em> element with lxml?
The following code works for me with my test file.
# test.py
import lxml.html

# get the raw HTML
fruitsWebsite = lxml.html.parse('test.html')
# get all divs with class fruit
fruits = fruitsWebsite.xpath('//div[@class="fruit"]')
# Print the name of this fruit (obtained from an <em> in the fruit div)
for fruit in fruits:
    # Use a relative path so we don't find ALL of the li/em elements several times. Note the .//
    for item in fruit.xpath('.//li[@class="fruit"]/em'):
        print(item.text)
    # Alternatively
    for item in fruit.xpath('//div[@class="fruit"]//li[@class="fruit"]/em'):
        print(item.text)
Here is the html file I used to test against. If this doesn't work for the html you're testing against, you'll need to post a sample file that fails, as I requested in the comments above.
<html>
<body>
Blah blah
<div>Ignore me</div>
<div>Outer stuff
<div class='fruit'>Some <em>FRUITY</em> stuff.
<ol>
<li class='fruit'><em>This</em> should show</li>
<li><em>Super</em> Ignored LI</li>
<li class='fruit'><em>Rawr</em> Hear it roar.</li>
</ol>
</div>
</div>
<div class='fruit'><em>Super</em> fruity website of awesome</div>
</body>
</html>
You definitely will get too many results with the code you originally posted (the inner loop will search the entire tree rather than the subtree for each "fruit"). The error you're describing doesn't make much sense unless your input is different than what I understood.
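A short sketch of the point about relative paths, using an inline document so it runs on its own (the fruit names and markup are made up):

# Demonstrates why './/' matters inside a loop: '//' always searches from the
# document root, while './/' searches only the current element's subtree.
import lxml.html

doc = lxml.html.fromstring("""
<div>
  <div class="fruit"><ol><li class="fruit"><em>Apple</em></li></ol></div>
  <div class="fruit"><ol><li class="fruit"><em>Pear</em></li></ol></div>
</div>
""")

for fruit in doc.xpath('//div[@class="fruit"]'):
    absolute = fruit.xpath('//li[@class="fruit"]/em')   # 2 hits on every iteration
    relative = fruit.xpath('.//li[@class="fruit"]/em')  # only this div's own <em>
    print(len(absolute), len(relative), relative[0].text)
# prints: 2 1 Apple, then 2 1 Pear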
