How to extract number from HTML Xpath

Please consider this statement:
hxs.select('//span[@class="product-count"]')
It selects the span with the class product-count and returns the correct selector:
HtmlXPathSelector xpath='//span[@class="product-count"]' data='<span class="product-count">2160</span>
I want to extract the number 2160 using a regex or any other method. I treated the result as a string and tried to get the number with a regex, but that didn't work, probably because the result is a selector object rather than a string.
Thanks in advance.

Try this:
number = response.xpath('//span[@class="product-count"]/text()').get()
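The value returned by .get() is a string (or None when nothing matches), so it still has to be converted to get an actual number. A minimal sketch of that step, assuming the text may carry surrounding whitespace or thousands separators; parse_count is a hypothetical helper name, not part of Scrapy:

```python
import re
from typing import Optional


def parse_count(text: Optional[str]) -> Optional[int]:
    """Return the first integer found in text, or None if there is none.

    Hypothetical helper: pass it the result of .get(), which may be None.
    """
    if text is None:
        return None
    # Match a run of digits, allowing commas as thousands separators.
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


if __name__ == "__main__":
    print(parse_count("2160"))
```

In a spider this would be called as parse_count(response.xpath('//span[@class="product-count"]/text()').get()).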

Related

How to find text with Selenium

I need to know how to find text with Selenium, for example this:
Test
I can get the text with the following code:
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[text()="Test"]'))).text
But I need to get a text in the following format:
"Key" "email@gmail.com"
I need to get the text above. The email and password may differ from case to case, but the email always contains .com, so I would like to locate the element by the .com and get back the full value of the string.

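Use contains in XPath to find text that includes .com:
//*[contains(text(),'.com')]
ends-with would also express this, but it is an XPath 2.0 function and browsers only implement XPath 1.0, so it will generally not work through Selenium:
//*[ends-with(text(),'.com')]
An XPath 1.0 equivalent is:
//*[substring(text(), string-length(text()) - 3) = '.com']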
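Once the element has been located, its .text still contains the whole string, so the address has to be cut out of it. A rough sketch of that step with a regular expression, assuming the address always ends in .com as the question states; EMAIL_RE and find_email are hypothetical names, and the pattern is deliberately simple rather than a full email validator:

```python
import re

# Simplified pattern: local part, "@", domain labels, and a final ".com".
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.com\b")


def find_email(text):
    """Return the first .com address found in text, or None."""
    match = EMAIL_RE.search(text)
    return match.group() if match else None


if __name__ == "__main__":
    # The sample string comes from the question above.
    print(find_email('"Key" "email@gmail.com"'))
```

With Selenium, the input would be the .text of the element found by the contains XPath.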

I need to find the value in a dynamic h1 element in selenium python

I am using python 2.7, Firefox and selenium 3.11
I have the following line with a dynamic number as id:
<div class="class1" dojoattachpoint="titleText" dojoattachevent="onmouseover:_mouseOverTitle,onmouseout:_mouseOutTitle,onmousedown:_mouseDownTitle">
<h1 id="_N90399546_title">‪textToFind</h1>
I want to locate textToFind.
I have tried the following:
x = driver.find_element_by_xpath("//h1[@id='dynamicNumber']") -> which I tried while the dynamic number stayed the same, to see if it worked, but it didn't
Result: textToFind not found
x = driver.find_element_by_link_text('textToFind') -> didn't work in a dynamic search
Result: textToFind not found
x = driver.find_element_by_xpath("//*[@class='class1'][contains(text(),'textToFind')]")
Result: textToFind not found
x = driver.find_element_by_xpath("//*[@class='class1'][contains(text(),'textToFind')]").getAttribute("innerHTML")
Result: textToFind not found
x = driver.find_element_by_xpath("//*[@class='class1']")
Result: textToFind not found
I don't need to extract the text or get attributes; I just need to find the "textToFind" string. I really appreciate the help.
I have gone through several Selenium tutorials on dynamic elements, looked at dozens of websites, and read several similar posts on Stack Overflow, but I still can't find the text I need.

You can try the following CSS selector:
print(driver.find_element_by_css_selector("div.class1>h1").text)

If you have a fixed text string (textToFind in your case), use the following XPath:
x = driver.find_element_by_xpath("//div[@class='class1']/h1[contains(text(),'textToFind')]")
If the id is not completely dynamic, that is, only part of it changes (a dynamic number or letters at the start, end or middle, for example <h1 id="some-prefix-132dsad233">), you can use a partial match with a CSS selector:
With a fixed prefix, e.g. id="some-prefix-132dsad233", use x = driver.find_element_by_css_selector("h1[id^='some-prefix']")
With a fixed suffix, e.g. id="132dsad233some-postfix", use x = driver.find_element_by_css_selector("h1[id$='some-postfix']")
To match fixed text anywhere in the id, e.g. id="132dsad233some-postfix", use x = driver.find_element_by_css_selector("h1[id*='some']")
You can also combine these, e.g. for id="hello132dsad233some-postfix" use x = driver.find_element_by_css_selector("h1[id^='hello'][id$='postfix']")
Combine the locator with surrounding elements to make it unique and robust.
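The partial-match selectors above all follow one pattern, so they can be assembled from the fixed parts of the id. A small sketch of that idea; id_selector is a hypothetical helper (not a Selenium function), and it assumes the fragments contain no quote characters because it does no escaping:

```python
def id_selector(tag, prefix=None, suffix=None, contains=None):
    """Build a CSS selector that matches tag by fixed parts of its id.

    Hypothetical helper; the fragments are inserted as-is, without escaping.
    """
    parts = [tag]
    if prefix is not None:
        parts.append("[id^='{}']".format(prefix))    # id starts with prefix
    if suffix is not None:
        parts.append("[id$='{}']".format(suffix))    # id ends with suffix
    if contains is not None:
        parts.append("[id*='{}']".format(contains))  # id contains the text
    return "".join(parts)


if __name__ == "__main__":
    # For the <h1 id="_N90399546_title"> in the question, the fixed part
    # appears to be the "_title" suffix.
    print(id_selector("h1", suffix="_title"))
```

The resulting string can be passed to driver.find_element_by_css_selector.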

Find all elements that start with 'button-'

I am using Selenium to try to get all elements whose ID starts with "button-". So far I have tried using a regex to match the "button-" prefix, but I get the error TypeError: Object of type 'SRE_Pattern' is not JSON serializable. My code so far is:
all_btns = self.driver.find_elements_by_id(re.compile('^button-?'))
But as mentioned that raises an error. What is the appropriate way of getting all elements when you don't know the full ID, class, css selector etc.?

You could use find_elements_by_xpath and starts-with:
find_elements_by_xpath('//*[starts-with(@id, "button-")]')
//* will match any element
[starts-with(@id, "button-")] will keep only the elements whose id attribute starts with button-

Clément's answer works just fine, but there is also a way to do this with CSS selectors:
*[id^='button-']
* matches all tags, just like in XPath, and ^= means 'starts with'.
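Both locators apply the same filter: keep every element whose id begins with the prefix. The sketch below mirrors that logic in plain Python so it can be checked without a browser. It is not Selenium code; it uses the standard library's xml.etree.ElementTree on a made-up, well-formed snippet, and PAGE and ids_with_prefix are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only.
PAGE = """<div>
  <button id="button-save">Save</button>
  <button id="button-cancel">Cancel</button>
  <a id="link-home">Home</a>
</div>"""


def ids_with_prefix(markup, prefix):
    """Return the ids, in document order, that start with prefix."""
    root = ET.fromstring(markup)
    return [
        element.get("id")
        for element in root.iter()                    # every element, like //*
        if element.get("id", "").startswith(prefix)   # like starts-with(@id, ...)
    ]


if __name__ == "__main__":
    print(ids_with_prefix(PAGE, "button-"))
```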

Scrapy SgmlLinkExtractor how to scrape li tags with changing id's

How can I get an element at this specific location:
Check picture
The XPath is:
//*[@id="id316"]/span[2]
I got this path from the Google Chrome browser. I basically want to retrieve the number at this specific location with the following statement:
zimmer = response.xpath('//*[@id="id316"]/span[2]').extract()
However, I'm getting nothing but an empty result. I found out that the id value is different for each element in the list I'm interested in. Is there a way to write this expression so that it works regardless of the id?

Use the corresponding label and get the following sibling element containing the value:
//span[. = 'Zimmer']/following-sibling::span/text()
As a bonus, the locator is also more readable.
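The point of the label-based locator is that it never mentions the changing id. The sketch below imitates it in plain Python to show the effect. It is not Scrapy code: it uses the standard library's xml.etree.ElementTree, which has no following-sibling axis, so the sibling lookup is written by hand. The markup is invented for illustration (only id316 and the Zimmer label come from the question), and values_after_label is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Invented markup: each list item has a different id but the same label.
LISTING = """<ul>
  <li id="id316"><span>Zimmer</span><span>3</span></li>
  <li id="id317"><span>Zimmer</span><span>4.5</span></li>
</ul>"""


def values_after_label(markup, label):
    """Return the text of the first span following each span equal to label."""
    root = ET.fromstring(markup)
    values = []
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag == "span" and (child.text or "").strip() == label:
                # Hand-written stand-in for following-sibling::span
                following = [c for c in children[index + 1:] if c.tag == "span"]
                if following:
                    values.append((following[0].text or "").strip())
    return values


if __name__ == "__main__":
    print(values_after_label(LISTING, "Zimmer"))
```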

Extract information from a webpage in a particular format

I am trying to write a simple Python script to extract certain links from a webpage. I can extract the links successfully, but now I want to extract more information given on that page, such as bitrate, size and duration.
I am using the XPath below to extract this information:
>>> doc = lxml.html.parse('http://mp3skull.com/mp3/linkin_park_faint.html')
>>> info = doc.xpath(".//*[@id='song_html']/div[1]/text()")
>>> info[0:7]
['\n\t\t\t', '\n\t\t\t\t3.71 mb\t\t\t', '\n\t\t\t', '\n\t\t\t\t3.49 mb\t\t\t', '\n\t\t\t', '\n\t\t\t\t192 kbps', '2:41']
What I need is, for each link, the information as a tuple of the form (bitrate, size, duration).
The XPath above returns the required information, but it is so badly formatted that I cannot find any logic to turn it into that form.
So, is there any way to get the output in my format?

I think BeautifulSoup will do the job; it parses even badly formatted HTML:
http://www.crummy.com/software/BeautifulSoup/
Parsing is quite easy with BeautifulSoup, for example:
import bs4
from urllib.request import urlopen
html = urlopen('http://mp3skull.com/mp3/linkin_park_faint.html').read()
soup = bs4.BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))
It also has quite good docs:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/

You can actually strip everything out with XPath:
translate(.//*[@id='song_html']/div[1]/text(), "\n\t,'", '')
Note that translate converts the node set to a string first, so it only processes the first text node.
So for your additional question, either:
info[0:len(info)]
for the whole string, or:
info.rfind(" ")
for the position of the last space, since the translate leaves a space character, but you could replace that with whatever you wanted.
Addl info found here

How are you with regular expressions and Python's re module?
http://docs.python.org/library/re.html may be essential.
To get the data out of the list, re.match(regex, info[n]) should suffice. For the triple, Python's tuple syntax takes care of it: simply match against the members of your info list with re.match.
import re
matching_re = r'.*'  # placeholder: this matches whole strings, rather than what you need
incoming_value_1 = re.match(matching_re, info[1])  # returns a match object, or None
# etc. for incoming_value_2 and incoming_value_3
triple = (incoming_value_1, incoming_value_2, incoming_value_3)
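The regex approach can be taken one step further by classifying each text node by its shape instead of its position. This is only a sketch under assumptions: that the text nodes of one song are processed at a time, that sizes look like "3.49 mb", bitrates like "192 kbps" and durations like "2:41" (as in the sample output in the question), and PATTERNS and to_tuple are hypothetical names:

```python
import re

# One pattern per field, based on the shapes seen in the sample output.
PATTERNS = {
    "bitrate": re.compile(r"^\d+\s*kbps$", re.IGNORECASE),
    "size": re.compile(r"^\d+(?:\.\d+)?\s*mb$", re.IGNORECASE),
    "duration": re.compile(r"^\d+:\d{2}(?::\d{2})?$"),
}


def to_tuple(text_nodes):
    """Return (bitrate, size, duration); a missing field is None."""
    found = {"bitrate": None, "size": None, "duration": None}
    for node in text_nodes:
        token = node.strip()  # drops the surrounding \n and \t padding
        for kind, pattern in PATTERNS.items():
            # Keep only the first match for each field.
            if found[kind] is None and pattern.match(token):
                found[kind] = token
    return (found["bitrate"], found["size"], found["duration"])


if __name__ == "__main__":
    sample = ['\n\t\t\t', '\n\t\t\t\t3.49 mb\t\t\t', '\n\t\t\t',
              '\n\t\t\t\t192 kbps', '2:41']
    print(to_tuple(sample))
```

Matching on shape rather than index means the whitespace-only nodes in the list are skipped automatically.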
