Scraping a sliding table using Selenium - Python

I'm trying to get data from a sliding table on a website (like those stockmarket prices on some websites).
I'm using this line:
elem=driver.find_elements_by_xpath('/html/body/div[1]/div/div/article/div/div/div/div/div[1]/div/div/aside/div/div/div/ul/li')
It seems to get all the elements into the list just fine.
But once I use any method on the list, let's say:
for i in elem:
    print(i.text)
It actually just returns the values visible at that very moment.
Can somebody help?

So in most cases, try the following in order:
getText(); if that doesn't work, use getAttribute('textContent'); if that too doesn't work, use getAttribute('value').
getAttribute('value') only works if the element actually has a value attribute (just like id, name, etc.).
So in most cases, if getText doesn't work, use getAttribute('textContent').
Use:
i.get_attribute("textContent")
Because getText / text() actually uses innerText, and will not detect text from hidden elements.
Don't get confused by the differences between Node.textContent and
HTMLElement.innerText. Although the names seem similar, there are
important differences:
textContent gets the content of all elements, including <script> and <style>
elements. In contrast, innerText only shows “human-readable” elements.
textContent returns every element in the node. In contrast, innerText
is aware of styling and won’t return the text of “hidden” elements.
Moreover, since innerText takes CSS styles into account, reading the
value of innerText triggers a reflow to ensure up-to-date computed
styles. (Reflows can be computationally expensive, and thus should be
avoided when possible.)
Unlike textContent, altering innerText in Internet Explorer (version
11 and below) removes child nodes from the element and permanently
destroys all descendant text nodes. It is impossible to insert the
nodes again into any other element or the same element after
doing so.
https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent

Try
for i in elem:
    print(i.get_attribute('textContent'))
to get text from hidden elements as well
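
Putting it together, a minimal sketch of the whole approach (the URL and XPath here are placeholders for your own page and locator, and it is written against the newer Selenium 4 By-style API; the older find_elements_by_xpath call from the question works the same way):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL for the page with the sliding table

# Collect every <li> in the list, whether it is currently on screen or not
items = driver.find_elements(By.XPATH, "//aside//ul/li")

for item in items:
    # .text only returns what is rendered right now;
    # textContent also includes items scrolled out of view
    print(item.get_attribute("textContent"))

driver.quit()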

Related

Finding screen elements in Appium (iOS) using contains

I am doing QA and recently started using Appium and Cucumber to automate some tests; I am still new to this.
I succeeded in what I wanted to automate using some functions like this one.
def function(element_name)
  find_element(:xpath,
    '//XCUIElementTypeOther[@name="' + element_name + '"]'
  ).click
end
This works for what I want, but now I am trying to redo the same functions using contains. Something like this:
def function(element_name)
  find_element(:xpath,
    '//*[contains(text(), element_name)]'
  ).click
end
What I'm getting is
An element could not be located on the page using the given search parameters.
I think I am just not using contains the right way but I am really not sure.
XPath is not a good way to search for elements in Appium/XCUITest, as it is often too slow and might not work as you expect (your case with contains).
Instead you should try the XCUITest native locator strategies iOSNsPredicate or iOSClassChain, e.g.
driver.find_element_by_ios_predicate("label contains 'Some text'")
You can check more examples here: python client tests, java client tests
The way you are using contains is correct; the only problem is that it's not finding the element.
If that XPath is not working, why don't you try a relative path, e.g. the one below:
def function(element_name)
  find_element(:xpath, '//tbody//td[contains(text(), element_name)]').click
end
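
As a side note, here is a rough Python equivalent of that helper (hypothetical function name, written with the Appium Python client's XPath lookup, and matching against @name as in the original exact-match version, since text() is often empty under XCUITest). The key point is that the runtime value has to be interpolated into the XPath string; a bare element_name written inside the quotes is parsed as part of the XPath expression, not as the surrounding variable:

def tap_element_containing(driver, element_name):
    # Substitute the runtime value into the XPath; without this,
    # element_name inside the string is treated as an XPath node test
    xpath = "//*[contains(@name, '{}')]".format(element_name)
    driver.find_element_by_xpath(xpath).click()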

Is it more reliable to rely on index of occurrence or keywords when parsing websites?

Just started using XPath, I'm parsing a website with lxml.
Is it preferable to do:
number_I_want = parsed_body.xpath('.//b')[6].text
# or
number_I_want = parsed_body.xpath('.//span[@class="class_name"]')[0].text
I'd rather find this out now, rather than much further down the line. Actually I couldn't get something like the second expression to work for my particular case.
But essentially, the question: is it better to rely on class names (or other keywords) or indices of occurrence (such as 7th occurrence of bolded text)?
I'd say that it is generally better to rely on id attributes, or failing that on class attributes, than on the number and order of appearance of specific tags.
That is more resilient to changes in the page content.
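
To make the trade-off concrete, here is a small sketch with lxml (the HTML snippet and the class name are made up for illustration): the index-based query depends on how many <b> tags happen to precede the one you want, while the attribute-based query keeps working as long as the class name is stable.

from lxml import html

page = """
<div>
  <b>Header</b>
  <span class="price">42.50</span>
</div>
"""

parsed_body = html.fromstring(page)

# Index-based: silently returns a different node if another <b> appears earlier in the page
first_bold = parsed_body.xpath('.//b')[0].text

# Attribute-based: unaffected by extra tags elsewhere in the page
price = parsed_body.xpath('.//span[@class="price"]')[0].text

print(first_bold)  # Header
print(price)       # 42.50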

Troubles with XPath Plugin Python

So I am getting the path id('page-content')/x:div[6]/x:div/x:div/x:div/x:a[1]/x:img
How would I go about clicking the img?
I've tried
lol=find_element_by_xpath("//div[#class='emoji-items nano-content']//a[#title=':heart:']/img")
as well as
lol=find_element_by_xpath("//a[#title=':heart:']/img")
which i believe should work, but it instead gives me an error
What I did is select all link elements (a) that contain 'heart' in the title.
I tried to avoid using @title= because you can sometimes run into issues with the syntax used in the element-finding methods (or in other methods) due to special characters like :.
The resulting path was:
//a[contains(@title, 'heart')]
If more elements are found, you need to add another element in front of this to restrict the section that is searched.
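
For reference, a short sketch of how that locator could be used from Python Selenium (driver setup and page are assumed; find_element_by_xpath is the older-style API used elsewhere in this thread):

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://example.com")  # placeholder for the page with the emoji picker

# Narrow the search to the emoji panel first, then match any link whose
# title contains 'heart', and click the <img> inside it
heart_img = driver.find_element_by_xpath(
    "//div[contains(@class, 'emoji-items')]//a[contains(@title, 'heart')]/img"
)
heart_img.click()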

Python MiniDom not removing elements properly

I'm converting a piece of JS code to Python, and I have been using minidom, but certain things aren't working right. They were working fine when running in JavaScript. I'm converting because I want consistent changes / ordering (i.e. where the class attribute is added), as well as so I can use some of Python's easier features.
My latest issue that I've come across is this:
fonts = doc.getElementsByTagName('font')
while(fonts.length > 0):
    # Create a new span
    span = doc.createElement("span")
    # Give it a class name based on the color (colors is a map)
    span.setAttribute("class", colors[fonts[0].getAttribute("color")])
    # Place all the children inside
    while(fonts[0].firstChild):
        span.appendChild(fonts[0].firstChild)
    # end while
    # Replace the <font> with the <span>
    print(fonts[0].parentNode.toxml())
    fonts[0].parentNode.replaceChild(span, fonts[0])
# end while
The problem is that, unlike in JavaScript, the element isn't removed from fonts like it should be. Is there a better library I should be using that follows the standard (level 3) DOM rules, or am I just going to have to hack around it if I don't want to use XPath (which all the other DOM parsers seem to use)?
Thanks.
You can see in the documentation for Python DOM (very bottom of the page) that it doesn't work like a "real" DOM in the sense that collections like you get from getElementsByTagName are not "live". Using getElementsByTagName here just returns a static snapshot of the matching elements at that moment. This isn't usually a problem with Python, because when you're using xml.dom you're not working with a live-updating page inside a browser; you're just manipulating a static DOM parsed from a file or string, so you know no other code is messing with the DOM while you aren't looking.
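A quick way to see the difference (a tiny throwaway document, just for illustration):

from xml.dom import minidom

doc = minidom.parseString('<p><font>a</font><font>b</font></p>')
fonts = doc.getElementsByTagName('font')
print(len(fonts))                              # 2

# Remove one <font> from the tree; the snapshot we already have is unaffected
doc.documentElement.removeChild(fonts[0])
print(len(fonts))                              # still 2 -- a static snapshot
print(len(doc.getElementsByTagName('font')))   # 1 -- a fresh query sees the change
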
In most cases, you can probably get what you want by changing the structure of your code to reflect this. For this case, you should be able to accomplish your goal with something like this:
fonts = doc.getElementsByTagName('font')
for font in fonts:
    # Create a new span
    span = doc.createElement("span")
    # Give it a class name based on the color (colors is a map)
    span.setAttribute("class", colors[font.getAttribute("color")])
    # Place all the children inside
    while(font.firstChild):
        span.appendChild(font.firstChild)
    # end while
    # Replace the <font> with the <span>
    font.parentNode.replaceChild(span, font)
The idea is that instead of always looking at the first element in fonts, you iterate over each one and replace them one at a time.
Because of these differences, if your JavaScript DOM code makes use of these sorts of on-the-fly DOM updates, you won't be able to port it "verbatim" to Python (using the same DOM calls). However, sometimes doing it in this less dynamic way can be easier, because things change less under your feet.
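
Here is that loop as a self-contained, runnable sketch (the colors map and the input markup are invented for the example):

from xml.dom import minidom

# Hypothetical mapping from <font color="..."> values to CSS class names
colors = {"red": "text-red", "blue": "text-blue"}

doc = minidom.parseString(
    '<p><font color="red">hello</font> <font color="blue">world</font></p>'
)

for font in doc.getElementsByTagName('font'):
    span = doc.createElement("span")
    span.setAttribute("class", colors[font.getAttribute("color")])
    # Move the children across, then swap the <span> in for the <font>
    while font.firstChild:
        span.appendChild(font.firstChild)
    font.parentNode.replaceChild(span, font)

print(doc.toxml())
# -> ...<p><span class="text-red">hello</span> <span class="text-blue">world</span></p>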

Why use a generator object in this particular case?

I was looking at a bit of code I downloaded from the internet. It's for a basic webcrawler. I came across the following for loop:
for link in (links.pop(0) for _ in xrange(len(links))):
    ...
Now, I feel the following code will also work:
for link in links:
    ...
links = []
Researching, I found out that the first version clears links and also creates a generator object (a genexpr). links itself is never used inside the for loop body, so its decreasing length has nothing to do with the rest of the code.
Is there any particular reason for using the xrange, and popping the elements each time? I.e. Is there any advantage to using a generator object over calling elements of the standard list? Additionally, in what cases would a generator be useful; why?
It's hard to see any justification for the code you quoted.
The only thing I can think of is that the objects in links might be large, or otherwise associated with scarce resources, and so it might be important to free them as soon as possible (rather than waiting until the end of the loop to free them all). But (a) if so, it would be better to process each link as you created it (perhaps using a generator to organize the code), instead of building up the whole list of links before starting to process it; and (b) even if you had no choice but to build up the whole list before processing it, it would be cheaper to clear each list entry than to pop the list:
for i, link in enumerate(links):
    links[i] = None
    ...
(Popping the first element off a list with n items takes O(n), although in practice it will be fairly fast since it's implemented using memmove.)
Even if you absolutely insisted on repeatedly popping a list as you iterated across it, it would be better to write the loop like this:
while links:
    link = links.pop(0)
    ...
The purpose of generators is to avoid building large collections of intermediate objects that won't serve any external use.
If all the code is doing is building the set of links on a page, the second code snippet is fine. But perhaps what might be desired is the set of root website names (e.g. google.com rather than google.com/q=some_search_term....). If this is the case, you'd then take the list of links and go through the full list, stripping out just the first part.
It's for this second stripping portion where you'd gain more by using a generator. Rather than having needlessly built a list of links which takes memory and time to build, you can now pass through each link one-by-one, getting the website name without a big intermediary list of all links.
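
A rough sketch of that idea (sample URLs invented; urlparse is the Python 2 module, matching the xrange in the question): instead of building a second full list of site names, a generator expression yields each one as it is consumed.

from urlparse import urlparse  # Python 2; in Python 3 this lives in urllib.parse

links = [
    "http://google.com/q=some_search_term",
    "http://example.com/about",
    "http://example.com/contact",
]

# No intermediate list of site names is ever built
site_names = (urlparse(link).netloc for link in links)

for name in site_names:
    print(name)   # google.com, example.com, example.com -- produced one at a time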
