I know I am doing something wrong here, but I would like your help on this one.
I have this HTML code:
<span id='some text'>some text</span>
<ul> this is what I would like to grab </ul>
<span id='some more text'>some more text</span>
So I tried something, and since this is the first time I am working with XPath, I was quite sure I was doing it wrong:
driver.find_elements_by_xpath('//ul[preceding:://span[@id="some text"] and following:://span[@id="some more text"] ')
Any help is appreciated.
An id attribute is supposed to be unique, so one is enough to select a branch.
To get the <ul> tag following the <span id='some text'>:
driver.find_elements_by_xpath("//span[@id='some text']/following-sibling::ul[1]")
and with a CSS selector:
driver.find_elements_by_css_selector("span[id='some text'] + ul")
With Python Scrapy, I am trying to get contents in a webpage whose nodes look like this:
<div id="title">Title</div>
<ul>
<li>
<span>blahblah</span>
<div>blahblah</div>
<p>CONTENT TO EXTRACT</p>
</li>
<li>
<span>blahblah</span>
<div>blahblah</div>
<p>CONTENT TO EXTRACT</p>
</li>
...
</ul>
I'm a newbie with XPath and couldn't get it to work so far. My last try was something like:
contents = response.xpath('[@id="title"]/following-sibling::ul[1]//li//p.text()')
... but it seems I cannot use /following-sibling after [@id="title"].
Any idea?
Try this XPath:
contents = response.xpath('//div[@id="title"]/following-sibling::ul[1]/li/p/text()')
It selects both "CONTENT TO EXTRACT" text nodes.
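To see why this works without a Scrapy project at hand, the same selection can be sketched with the standard library alone (ElementTree's XPath subset has no `following-sibling` axis, so that step is done by hand on the question's nodes under a dummy root):

```python
import xml.etree.ElementTree as ET

# The nodes from the question, under a dummy <root> so they parse as XML.
html = """<root>
<div id="title">Title</div>
<ul>
<li><span>blahblah</span><div>blahblah</div><p>CONTENT TO EXTRACT</p></li>
<li><span>blahblah</span><div>blahblah</div><p>CONTENT TO EXTRACT</p></li>
</ul>
</root>"""

children = list(ET.fromstring(html))

# //div[@id="title"]/following-sibling::ul[1]: the first <ul> after the title div.
i = next(n for n, el in enumerate(children)
         if el.tag == 'div' and el.get('id') == 'title')
ul = next(el for el in children[i + 1:] if el.tag == 'ul')

# /li/p/text(): the text of each <p> directly under each <li>.
texts = [p.text for p in ul.findall('./li/p')]
print(texts)  # ['CONTENT TO EXTRACT', 'CONTENT TO EXTRACT']
```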
One XPath would be:
response.xpath('//*[@id="title"]/following-sibling::ul[1]//p/text()').getall()
which gets the text from every <p> tag that is a child or grandchild of the <ul> nearest to the node with id="title".
XPath syntax
Try this using a CSS selector:
response.css('#title ::text').extract()
When trying to automate our application, there are two divs with the same parameters.
I'm not able to find a way to tell them apart. Please let me know what other ways there are to identify these elements with Selenium chromedriver in Python.
I tried this, but it doesn't work for me:
driver.find_element_by_xpath("(//div[@class='c3'])[2]/p").send_keys('text')
This is my html code
<div class="c3">
<p> test1 </p>
</div>
<div class="c3">
<p> test2 </p>
</div>
<div class="c3">
<p> test3 </p>
</div>
I want to add my text after test2.
The problem is the extra parentheses, placed in the wrong spot in your XPath.
The correct XPath would be:
driver.find_element_by_xpath("//div[@class='c3'][2]/p").send_keys('text')
OR
driver.find_element_by_xpath("(//div[@class='c3']/p)[2]").send_keys('text')
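The second form is the key idea: the parentheses collect every matching `<p>` first, and `[2]` then indexes into that whole list. Without a browser, the same collect-then-index behavior can be sketched with the standard library's ElementTree on the question's three divs:

```python
import xml.etree.ElementTree as ET

# The three sibling divs from the question, under a dummy <root>.
html = """<root>
<div class="c3"><p> test1 </p></div>
<div class="c3"><p> test2 </p></div>
<div class="c3"><p> test3 </p></div>
</root>"""

root = ET.fromstring(html)

# (//div[@class='c3']/p)[2] first collects every matching <p>,
# then picks the second element of the whole result list:
ps = root.findall(".//div[@class='c3']/p")
second = ps[1]  # XPath positions are 1-based, Python lists are 0-based
print(second.text.strip())  # test2
```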
I am messing around with some web scraping using Splinter, but have this issue. The HTML basically has loads of li elements, only some of which I am interested in. The ones I am interested in have a bid value. Now, I know that for Beautiful Soup I can do
tab = browser.find_by_css('li', {'bid': '18663145091'})
but this doesn't seem to work for splinter. I get an error saying:
find_by_css() takes exactly 2 arguments (3 given)
This is a sample of my html:
<li class="rugby" bid="18663145091">
<span class="info">
<div class="points">
12
</div>
<img alt="Leinster" height="19" src="..Leinster" width="26"/>
</span>
</li>
It looks like you are using find_by_css() method as if it was a BeautifulSoup method. Instead, provide a valid CSS selector checking the value of the bid attribute:
tab = browser.find_by_css('li[bid="18663145091"]')
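Splinter itself needs a live browser, but the attribute match that `li[bid="18663145091"]` performs can be illustrated with the standard library's html.parser (the BidFinder class name is made up for this sketch):

```python
from html.parser import HTMLParser

class BidFinder(HTMLParser):
    """Collects the class attribute of every <li> whose bid attribute matches."""
    def __init__(self, bid):
        super().__init__()
        self.bid = bid
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Same condition the CSS selector li[bid="..."] expresses.
        if tag == 'li' and attrs.get('bid') == self.bid:
            self.matches.append(attrs.get('class'))

finder = BidFinder('18663145091')
finder.feed('<li class="rugby" bid="18663145091"><span class="info"></span></li>')
print(finder.matches)  # ['rugby']
```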
I am trying to use Python Selenium Firefox Webdriver to grab the h2 content 'My Data Title' from this HTML
<div class="box">
<ul class="navigation">
<li class="live">
<span>
Section Details
</span>
</li>
</ul>
</div>
<div class="box">
<h2>
My Data Title
</h2>
</div>
<div class="box">
<ul class="navigation">
<li class="live">
<span>
Another Section
</span>
</li>
</ul>
</div>
<div class="box">
<h2>
Another Title
</h2>
</div>
Each div has a class of box so I can't easily identify the one I want. Is there a way to tell Selenium to grab the h2 in the box class that comes after the one that has the span called 'Section Details'?
If you want to grab the h2 in the box that comes after the one that has the span with text Section Details, try the below XPath using preceding:
(//h2[preceding::span[normalize-space(text()) = 'Section Details']])[1]
or using following:
(//span[normalize-space(text()) = 'Section Details']/following::h2)[1]
and for Another Section just change the span text in the XPath:
(//h2[preceding::span[normalize-space(text()) = 'Another Section']])[1]
or
(//span[normalize-space(text()) = 'Another Section']/following::h2)[1]
Here is an XPath to select the title following the text "Section Details":
//div[@class='box'][normalize-space(.)='Section Details']/following::h2
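As a browser-free sanity check of what these expressions aim at, the "box after the box containing the span" step can be sketched with the standard library's ElementTree on the question's markup (a dummy root is added so it parses as XML):

```python
import xml.etree.ElementTree as ET

# The four "box" divs from the question, under a dummy <root>.
html = """<root>
<div class="box"><ul class="navigation"><li class="live"><span>Section Details</span></li></ul></div>
<div class="box"><h2>My Data Title</h2></div>
<div class="box"><ul class="navigation"><li class="live"><span>Another Section</span></li></ul></div>
<div class="box"><h2>Another Title</h2></div>
</root>"""

boxes = list(ET.fromstring(html))

# Find the box whose descendant span says 'Section Details',
# then read the <h2> of the box that follows it.
i = next(n for n, box in enumerate(boxes)
         if any(s.text == 'Section Details' for s in box.iter('span')))
title = boxes[i + 1].find('h2').text
print(title)  # My Data Title
```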
Yeah, you need to do some complicated XPath searching:
referenceElementList = driver.find_elements_by_xpath("//span")
for eachElement in referenceElementList:
    if eachElement.get_attribute("innerHTML") == 'Section Details':
        elementYouWant = eachElement.find_element_by_xpath("../../../following-sibling::div/h2")
elementYouWant.get_attribute("innerHTML") should give you "My Data Title"
My code reads:
find all span elements regardless of where they are in the HTML and store them in a list called referenceElementList;
iterate over the span elements in referenceElementList one by one, looking for a span whose innerHTML attribute is 'Section Details';
if there is a match, we have found the span, and we navigate three levels back up to locate the enclosing div[@class='box'], then take that div's next sibling, which is the second div element;
lastly, we locate the h2 element inside that sibling.
Can you please tell me if my code works? I might have gone wrong somewhere navigating backwards.
There is a potential difficulty you may encounter: the innerHTML attribute may contain tab, newline, and space characters; in that case, you need a regex to do some filtering first.
I have an html page like this:
<td class="subject windowbg2">
<div>
<span id="msg_152617">
<a href= SOME INFO THAT I WANT </a>
</span>
</div>
<div>
<span id="msg_465412">
<a href= SOME INFO THAT I WANT</a>
</span>
</div>
As you can see, the id="msg_465412" has a variable number, so this is my code:
import urllib.request, http.cookiejar,re
from bs4 import BeautifulSoup
contenturl = "http://megahd.me/peliculas-microhd/"
htmll=urllib.request.urlopen(contenturl).read()
soup = BeautifulSoup(htmll)
print (soup.find('span', attrs=re.compile(r"{'id': 'msg_\d{6}'}")))
In the last line I tried to find all the "span" tags that contain an id of the form msg_###### (with any number), but something is wrong in my code and it doesn't find anything.
P.S.: everything I want is in a table with 6 columns, and I want the third column of all rows, but I thought it was easier to use a regex.
You're a bit mixed up with your attrs argument ... at the moment it's a regex which contains the string representation of a dictionary, when it needs to be a dictionary containing the attribute you're searching for and a regex for its value.
This ought to work:
print (soup.find('span', attrs={'id': re.compile(r"msg_\d{6}")}))
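If bs4 isn't at hand to try this, the behavior the fix relies on — BeautifulSoup applying re.search to each attribute value — can be mimicked with the standard library (SpanIdFinder is a made-up name for this sketch):

```python
import re
from html.parser import HTMLParser

PATTERN = re.compile(r"msg_\d{6}")

class SpanIdFinder(HTMLParser):
    """Records the id of every <span> whose id matches the msg_ pattern,
    the way BeautifulSoup applies re.search to each attribute value."""
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span' and PATTERN.search(attrs.get('id') or ''):
            self.ids.append(attrs['id'])

finder = SpanIdFinder()
finder.feed('<span id="msg_152617"></span><span id="msg_465412"></span>')
print(finder.ids)  # ['msg_152617', 'msg_465412']
```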
Try using the following:
soup.find_all("span", id=re.compile(r"msg_\d{6}"))