XPath doesn't seem to return the expected span - Python

I am trying to use XPath to download all images from a webpage.
I have managed to find the specific element, which has several spans; the full XPaths look like this:
/html/body/div[2]/div[3]/main/ul/li[4]/article/div[1]/a/span/span[1]
/html/body/div[2]/div[3]/main/ul/li[4]/article/div[1]/a/span/span[2]
/html/body/div[2]/div[3]/main/ul/li[4]/article/div[1]/a/span/span[3]
etc.
Currently I've got the whole element down to the li[4] level and tried the code below to find all the leaf elements of the tree, but the returned value is empty:
->node.xpath('./article/div[@class="flex-box"]/a/span[starts-with(@class,"grid-box")]/span')
->[]
And the parent node count is only 1, instead of the number of leaves, which I expected to be at least 4-5 here:
->len(node.xpath('./article/div[@class="flex-box"]/a/span[starts-with(@class,"grid-box")]'))
->1
->node.xpath('./article/div[@class="flex-box"]/a/span[starts-with(@class,"grid-box")]')[0]
-><Element span at 0x1ac51134040>
Could anyone help me figure out what is going on here?

Related

Finding an element via xpath by index number of an element in a table

I am trying to find an element via index number in Python Selenium. The element has the following XPath:
/html/body/div[4]/div[1]/div[3]/div/div[2]/div/div/div/div/div/div/div[1]/div[2]/div/div/div/div[2]/div/div/div/div[2]/div/div[3]/div[1]/div/div/div[9]
I have also included the below image of the element I want.
Element I want:
I am able to find the element via normal xpath, but I want to do this in a loop 10,000 times.
I was wondering if there is a way to find the element using its index number. Here is the code I have for it, for just index value 5, but it is not working.
Fund_click.append(driver.find_element_by_xpath("//div[@id='app']/div[1]/div[3]/div/div[2]/div/div/div/div/div/div/div[1]/div[2]/div/div/div/div[2]/div/div/div/div[2]/div/div[3]/div[1]/div/div/[div/@index='5']"))
Based on your snapshot, you can try it this way and see if you get a different result. I have considered both the class attribute and the index.
for i in range(1, 10000):
    print(driver.find_element_by_xpath("//div[@class='tg-row tg-level-0 tg-even tg-focused' and @index='" + str(i) + "']"))
If you want to use the index number alone, then just try the code below. But I am not sure you can identify the elements using only the index, so it is better to try the first option above.
for i in range(1, 10000):
    print(driver.find_element_by_xpath("//div[@index='" + str(i) + "']"))
Try the XPath below to confirm whether you can locate all the elements. If the page has loaded fully, you should see all the matches (inspect manually):
//div[@id='app']//div[contains(@class, 'tg-row')]
You can fetch and store all the elements by using the driver.find_elements_by_xpath() method with the above XPath locator, so that you avoid iterating through element indexes one at a time. Try the code below:
# Fetching and storing all the matches
elements = driver.find_elements_by_xpath("//div[@id='app']//div[contains(@class, 'tg-row')]")
for element in elements:
    # Print the index attributes to confirm everything was fetched
    print(element.get_attribute('index'))
Try giving some delay before fetching if the element is present but you still get a NoSuchElementException, and check whether the content sits inside a frame/iframe.
If not all the elements are visible, you may need to perform some scroll operations, as sketched below.
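A minimal sketch of both, assuming the rows live in the main document (not an iframe) and using Selenium's explicit-wait helpers:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 10 seconds for the rows to appear in the DOM.
WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located(
        (By.XPATH, "//div[@id='app']//div[contains(@class, 'tg-row')]")))

# Scroll to the bottom in case rows are lazy-loaded on scroll.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")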
If the above-mentioned method doesn't work, then you can try the XPaths below to identify the element based on its index/matching index number, and proceed with your approach.
(//div[@id='app']//div[contains(@class, 'tg-row')])[matching index]
or
//div[@id='app']//div[contains(@class, 'tg-row') and @index='provide index number here']
or
//div[contains(@class, 'tg-row') and @index='provide index number here']
or
//div[@index='provide index number here']
or
(//div[contains(@class, 'tg-row')])[provide matching index number here]
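As a sketch of how the first variant might be used (the position here is a made-up example; XPath positional indexes are 1-based):

# Hypothetical: pick the nth matching row by wrapping the relative XPath in parentheses.
n = 5  # example position, not a value from the page
row = driver.find_element_by_xpath(
    "(//div[@id='app']//div[contains(@class, 'tg-row')])[" + str(n) + "]")
print(row.get_attribute('index'))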
I hope it helps...

XPath not giving expected result with lxml

I'm sorry if my question isn't formatted right; English isn't my native language.
I'm trying to get the table from the following URL: Bulbapedia, Bulbasaur. But lxml gives me very weird results when I use XPath.
I've tried the following:
for elem in tree.xpath('//*[@id="mw-content-text"]//table[14]//tr[3]//td//table//tr//td'):
    print(etree.tostring(elem, pretty_print=True))
This doesn't give me the data I need; it gives values from a different table's data, seemingly randomized.
I'm at a loss of what to try now, cssselect isn't an option either, since that seems to change depending on which Pokemon I'm searching for.
I'm trying to get the following results:
Other than the first element, *[@id="mw-content-text"], all the rest of the elements in your XPath should be the immediate children of the ones before them. By using // you're selecting elements at any depth within the parent, which is not what you want.
Change all but the first // to / and it should work as intended:
for elem in tree.xpath('//*[@id="mw-content-text"]/table[14]/tr[3]/td/table/tr/td'):
    print(etree.tostring(elem, pretty_print=True))
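To make the distinction concrete, here is a toy fragment (not the Bulbapedia page) showing that / matches immediate children only, while // matches at any depth:

from lxml import html

# A minimal document: one <p> as a direct child, one nested inside a <span>.
root = html.fromstring("<div><p>direct</p><span><p>nested</p></span></div>")
print(root.xpath("p/text()"))     # ['direct'] - immediate children only
print(root.xpath(".//p/text()"))  # ['direct', 'nested'] - any depth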

Scrapy SgmlLinkExtractor how to scrape li tags with changing id's

How can I get an element at this specific location:
Check picture
The XPath is:
//*[@id="id316"]/span[2]
I got this path from the Google Chrome browser. I basically want to retrieve the number at this specific location with the following statement:
zimmer = response.xpath('//*[@id="id316"]/span[2]').extract()
However, I'm not getting anything but an empty string. I found out that the id value is different for each element in the list I'm interested in. Is there a way to write this expression so that it works for generic numbers?
Use the corresponding label and get the following sibling element containing the value:
//span[. = 'Zimmer']/following-sibling::span/text()
And note the bonus to the locator's readability.
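Plugged into the original Scrapy statement, that would be something like this sketch (.extract_first() returns the first match, or None if nothing matches):

zimmer = response.xpath("//span[. = 'Zimmer']/following-sibling::span/text()").extract_first()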

Scan through row elements whose classname is identical

I have a number of links in rows, in a web page, whose class-names are the same. Like this:
I am able to click the first link occurrence using this XPath:
"(//span[@class='odds black'])"
However, I want to scan through the particular row and click on each odds (if it is present).
Any help on how to achieve this?
Note: I cannot find the element using other attributes, as it will change dynamically as per the data.
Image of reference source code:
Instead of using the XPath in this format:
"(//span[@class='odds black'])"
could you use it in this format shown just above your red box:
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[1]/td[5]/a/span[2]/span/span
(You can get this format easily by selecting an element in Firebug, right-clicking its code, and selecting Copy XPath.)
I have found in many instances that I can add a counter to tr[1] or some other path component in order to move down the rows quite accurately. I can't really see your site to compare with the XPaths below, but I imagine it would be something like:
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[1]/td[5]/a/span[2]/span/span
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[2]/td[5]/a/span[2]/span/span
/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[3]/td[5]/a/span[2]/span/span
Then you can add a counter like i, iterate it in the loop, and set the XPath to something along the lines of:
"/html/body/div[2]/div[3]/div[1]/div[1]/div[2]/table/tbody[31]/tr[" + str(i) + "]/td[5]/a/span[2]/span/span"
Assuming that the class name will always be 'odds some color', you can use XPath's contains() function. An XPath like this:
"//span[contains(@class,'odds')]"
will return all spans that contain the string 'odds' in their class name.
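For example, to scan through the row and click each matching odds element, a sketch using the plural find_elements variant:

for odds in driver.find_elements_by_xpath("//span[contains(@class, 'odds')]"):
    odds.click()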
CSS selectors are class-aware, so it would make more sense to me to use:
span.odds
XPath treats class as a simple string and so forces you to use contains(), whereas CSS allows you to treat classes separately.
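In Selenium that would be something like this sketch:

for odds in driver.find_elements_by_css_selector("span.odds"):
    odds.click()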

In Selenium, how do I include a specific node [1] using find_elements_by_css_selector()

In the case that I want the first use of a class, so I don't have to guess with find_elements_by_xpath(), what are my options? The goal is to write less code, so that any changes to the source I am scraping can be fixed easily. Is it possible to essentially do
find_elements_by_css_selector('source[1]')
This code does not work as is though.
I am using selenium with Python and will likely be using phantomJS as the webdriver (Firefox for testing).
In CSS selectors, square brackets select attributes, so your sample code is trying to select a 'source' element with an attribute named 1, e.g.
<source 1="your_element" />
Whereas I gather you're trying to find the first in a list that looks like this:
<source>Blah</source>
<source>Rah</source>
If you just want the first matching element, you can use the singular form:
element = driver.find_element_by_css_selector("source")
The form you were using returns a list, so you can also take element n-1 to find the nth instance on the page (lists index from 0):
element = driver.find_elements_by_css_selector("source")[0]
Finally, if you want your CSS selectors to be completely explicit about which element they're finding, you can use the nth-of-type selector:
element = driver.find_element_by_css_selector("source:nth-of-type(1)")
You might find some other helpful information at this blog post from Sauce Labs to help you write flexible selectors to replace your XPath.
