I've written an XPath to locate a certain element. However, the element I'm after may be available in either childNodes[10] or childNodes[12]. I'd like to combine the two into a single expression that will locate the element regardless of its position. The two expressions are:
First one:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[10].textContent", element).strip()
Second one:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[12].textContent", element).strip()
I tried like this but won't get any result:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[10].textContent|return arguments[0].childNodes[12].textContent", element).strip()
How can I combine the two into a single expression? BTW, I'm using this XPath in a Python + Selenium script.
Here is the link to the html:
"https://www.dropbox.com/s/vl8anp8te48ktl2/For%20SO.txt?dl=0"
If one of the text nodes is an empty string you can try:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[10].textContent", element).strip() or driver.execute_script("return arguments[0].childNodes[12].textContent", element).strip()
This should return a non-empty name (the first occurrence of a non-empty text node).
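The same fallback pattern generalizes if more than two positions are possible. Here is a minimal, Selenium-free sketch of the idea; the candidate strings are hypothetical stand-ins for the textContent values returned by execute_script:

```python
def first_non_empty(*candidates):
    """Return the first candidate that is non-empty after stripping, else None."""
    for text in candidates:
        if text and text.strip():
            return text.strip()
    return None

# Stand-ins for childNodes[10] and childNodes[12] text content
print(first_non_empty("   ", "Some Name  "))  # -> Some Name
```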
How about this? The following will give you an input element at index 10 or 12:
element = driver.find_element_by_xpath("//td[@class='data']/table//th//*[position()=10 or position()=12]/input")
This is my current code:
kwlist = []
for url in soup.find_all('a', href=True):
    text = url.get_text()
    if keywords in text:
        links = (url['href'])
        kwlist.append(links)
'|'.join(kwlist)
colorlist = []
for url in soup.find_all('a', href=True):
    text = url.get_text()
    if color in text:
        colorlinks = (url['href'])
        colorlist.append(colorlinks)
'|'.join(color)
#finallink = any(x in kwlist for x in colorlist)
#print(finallink)
kwset = set(kwlist)
colorset = set(colorlist)
intersection = str(kwset.intersection(colorset))
print(intersection)
driver.get('https://www.supremenewyork.com/' + intersection)
This is the print output:
{'/shop/tops-sweaters/deshwjqp5/i640qb2pu'}
But, this is the website that selenium is navigating to:
https://www.supremenewyork.com/%7B'/shop/tops-sweaters/deshwjqp5/i640qb2pu'%7D
I need selenium to navigate to:
https://www.supremenewyork.com/shop/tops-sweaters/deshwjqp5/i640qb2pu
That is the same item in my list, just without the %7B' and '%7D.
How would I change the list into my desired output?
%7B and %7D are encoded { and } characters from the set after you call str on it.
You need to "unwrap" the value out of the set first. If you only want a single entry from intersection, and you know that it will always contain at least one element, you can convert it to a list, then take an element from it. Something like:
# Convert to list, then take the first element
intersection = str(list(kwset.intersection(colorset))[0])
print(intersection)
driver.get('https://www.supremenewyork.com/' + intersection)
If you can't guarantee kwset.intersection(colorset) will always return at least an element, you'll need to do some error handling, or switch to a different way to "unwrap" the set, like calling a defaulting next on an iterator of the set.
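For example, using next with a default value; the extra '/shop/jackets/abc123' entry below is illustrative only, standing in for whatever kwset actually contains:

```python
kwset = {'/shop/tops-sweaters/deshwjqp5/i640qb2pu', '/shop/jackets/abc123'}
colorset = {'/shop/tops-sweaters/deshwjqp5/i640qb2pu'}

# next() with a default avoids a StopIteration/IndexError when the
# intersection turns out to be empty
intersection = next(iter(kwset.intersection(colorset)), None)
print(intersection)  # -> /shop/tops-sweaters/deshwjqp5/i640qb2pu

if intersection is not None:
    # the set element already starts with '/', so join without a trailing slash
    url = 'https://www.supremenewyork.com' + intersection
```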
I have used this code, but I want to remove the links where the string 'vs' appears three times; it should appear only once:
elems = driver.find_elements_by_xpath("//a[contains(@href, '/president/us/general_election')][not(@href = following::a/@href)]")
for elem in elems:
    print(elem.get_attribute("href"))
Update:
I have realised that some of my code is not working as I expected. I used the predicate [not(@href = following::a/@href)] to remove repeated hrefs, but there is still one href that is repeated. Any help is welcome!
Try this:
elems = driver.find_elements_by_xpath("//a[contains(@href, '/president/us/general_election')][not(@href = following::a/@href)]")
for elem in elems:
    if elem.get_attribute("href").count("vs") == 1:
        print(elem.get_attribute("href"))
To get links with only one occurrence of "_vs_" in @href you can extend your XPath with the predicate
[not(contains(substring-after(@href, "_vs_"), "_vs_"))]
Your final XPath will be
"//a[contains(@href, '/president/us/general_election') and not(contains(substring-after(@href, '_vs_'), '_vs_')) and not(@href = following::a/@href)]"
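The substring-after trick can be sanity-checked outside the browser. This hypothetical helper mirrors the predicate's logic (the candidate names in the sample hrefs are made up):

```python
def occurs_once(href, token="_vs_"):
    # Mirrors not(contains(substring-after(@href, "_vs_"), "_vs_")),
    # tightened so the token must appear exactly once (the XPath predicate
    # alone would also accept hrefs containing no token at all).
    head, sep, tail = href.partition(token)
    return bool(sep) and token not in tail

print(occurs_once("/president/us/general_election/smith_vs_jones"))    # True
print(occurs_once("/president/us/general_election/a_vs_b_vs_c_vs_d"))  # False
```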
I need to get an element from the page and then print it out,
but it always prints this:
[<selenium.webdriver.remote.webelement.WebElement (session="636e500d9db221d6b7b10b8d7849e1b5",
element="4f0ccc3b-44b0-4cf2-abd4-95a70278bf77")>...
My code:
film_player = free_filmy_url + filmPlayerPart
dr = webdriver.Chrome()
dr.get(film_player)
captcha_button = dr.find_element_by_xpath('/html/body/form/div/input[1]')
captcha_items = dr.find_elements_by_class_name('captchaMyImg')
print(captcha_items)
You can iterate through the captcha_items and print each of them.
for captcha_item in captcha_items:
    # you can access each item here as "captcha_item"
    print(captcha_item.text)  # if you are trying to print the text (.text is a property, not a method)
Through your lines of code:
captcha_items = dr.find_elements_by_class_name('captchaMyImg')
print(captcha_items)
You are printing the elements.
Rather you must be looking to print an attribute of the elements and you can use the following solution:
print([my_elem.get_attribute("attribute") for my_elem in dr.find_elements_by_class_name('captchaMyImg')])
Note: You need to replace the attribute placeholder with one of the existing attributes of those elements, e.g. src, href, innerHTML, etc.
I'm trying to scrape text from this page:
http://codingbat.com/prob/p187868
Specifically, I want to scrape two strings from the page to combine as the key in a dictionary, with the problem statement as the value. These are the two parts of the name of the problem (here: 'Warmup-1' and 'sleepIn'). However, the strings are contained at different levels of the parse tree, and this is creating problems.
Abstractly, the problem is this:
I'm trying to scrape text from a parse tree of:
div-->{[a[span'h2'[string1]]], [span'h2'[string2]], some other tags}
Since they are both contained in 'span' tags with the attribute class='h2', I can scrape a list of these and then select from the list.
div_nameparts = name_div.find_all('span', class_='h2')
name1 = div_nameparts[0].string
name2 = div_nameparts[1].string
problem_name = name1+' > '+name2
print(problem_name)
But what if those tags didn't share an attribute like they do here ('h2')?
If I try to navigate the parse tree using div.a.string, I can get the first string (string1). But div.span.string does not return the second value (string2).
name1 = name_div.a.string
name2 = name_div.span.string
Instead it again returns the first (string1), apparently navigating to div.a.span (the child of a child) and stopping before finding its way to div.span (the next child).
And if I try div.a.next_sibling to navigate to div.span and get the string with div.span.string,
name1 = name_div.a.string
name2_div = name_div.a.next_sibling
name2 = name2_div.string
it returns None.
Is there a better or more effective way to navigate the parse tree to get to these span tags?
Thanks in advance!
This'll work as long as the 'greater than' symbol (' > ') with leading and trailing space doesn't appear before the pair of strings you want:
gt = soup.find(text=' > ')
string1 = gt.findPrevious('span').text
string2 = gt.findNext('span').text
print(string1, gt, string2, sep='')
The output:
Warmup-1 > sleepIn
Here is my code:
def textfinder():
    try:
        textfinder1 = driver.find_elements_by_class_name("m-b-none").text
    except NoSuchElementException:
        pass
        print("no such element")
    print(textfinder1)
It works only when I use find_element. When I use find_elements, it gives me the error "'list' object has no attribute 'text'". I understand that it returns a list, but I just don't know how to "read" it. When I remove .text from the command, I don't get any error, but I get some strange data instead; I need the text content of the class.
Actually, when you do
text = driver.find_element_by_class_name("m-b-none").text
You will get the first element that is matched, and this element possesses, thanks to Selenium, an attribute whose name is text. A contrario, when you do
matched_elements = driver.find_elements_by_class_name("m-b-none")
it will match all corresponding elements. Given that matched_elements is a list, and that this list is a native Python one (it is not, for example, a Selenium-modified object which has text as an attribute), you will have to iterate over it and get the text of each element, as follows:
texts = []
for matched_element in matched_elements:
    text = matched_element.text
    texts.append(text)
    print(text)
Or, if you want to leave your code as unchanged as possible, you can do it in one line:
texts = [el.text for el in driver.find_elements_by_class_name("m-b-none")]
You would need to reference each like an element of a list. Something like this:
textfinder1 = driver.find_elements_by_class_name("m-b-none")
for element in textfinder1:
    print(element.text)
The method find_elements_by_class_name returns a list, so you should do:
text = ''
textfinder1 = driver.find_elements_by_class_name("m-b-none")
for i in textfinder1:
    text += i.text + '\n'  # Or whatever way you want to concatenate your text
print(text)
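The accumulation loop above can also be written with str.join, which is the idiomatic way to concatenate many strings in Python. The texts list here is a hypothetical stand-in for the .text values of the matched elements:

```python
# Stand-ins for the .text values from find_elements_by_class_name("m-b-none")
texts = ["first element", "second element", "third element"]
combined = '\n'.join(texts)
print(combined)
```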