Can't get element with Selenium Python

I need to get an element from a page and then print it out, but it always prints this:
[<selenium.webdriver.remote.webelement.WebElement (session="636e500d9db221d6b7b10b8d7849e1b5",
element="4f0ccc3b-44b0-4cf2-abd4-95a70278bf77")>...
My code:
film_player = free_filmy_url + filmPlayerPart
dr = webdriver.Chrome()
dr.get(film_player)
captcha_button = dr.find_element_by_xpath('/html/body/form/div/input[1]')
captcha_items = dr.find_elements_by_class_name('captchaMyImg')
print(captcha_items)

You can iterate through captcha_items and print each of them:
for captcha_item in captcha_items:
    # you can access each item here as "captcha_item"
    print(captcha_item.text)  # .text is a property, not a method

Through your lines of code:
captcha_items = dr.find_elements_by_class_name('captchaMyImg')
print(captcha_items)
You are printing the elements.
More likely you want to print an attribute of the elements, which you can do with the following solution:
print([my_elem.get_attribute("attribute") for my_elem in dr.find_elements_by_class_name('captchaMyImg')])
Note: you need to replace "attribute" with one of the existing attributes of those elements, e.g. src, href, innerHTML, etc.


Trying to isolate URL suffix's from list of href tags

I'm currently working on a simple web crawling program that will crawl the SCP wiki to find links to other articles in each article. So far I have been able to get a list of href tags that go to other articles, but can't navigate to them since the URL I need is embedded in the tag:
[ SCP-1512,
SCP-2756,
SCP-002,
SCP-004 ]
Is there any way I would be able to isolate the "/scp-xxxx" from each item in the list so I can append it to the parent URL?
The code used to get the list looks like this:
import requests
import lxml
from bs4 import BeautifulSoup
import re

def searchSCP(x):
    url = str(SCoutP(x))
    c = requests.get(url)
    crawl = BeautifulSoup(c.content, 'lxml')
    # Searches HTML for text containing "SCP-" and href tags containing "scp-"
    ref = crawl.find_all(text=re.compile("SCP-"), href=re.compile("scp-"))
    param = "SCP-" + str(SkateP(x))  # SkateP takes an int and inserts an appropriate number of 0's
    for i in ref:  # The loop below filters out references to the article being searched
        if str(param) in i:
            ref.remove(i)
    if ref != []:
        print(ref)
The main idea I've tried to use is finding every item that contains items in quotations, but obviously that just returned the same list. What I want to be able to do is select a specific item in the list and take out ONLY the "scp-xxxx" part or, alternatively, change the initial code to only extract the href content in quotations to the list.
Is there any way I would be able to isolate the "/scp-xxxx" from each item in the list so I can append it to the parent URL?
If I understand correctly, you want to extract the href attribute - for that, you can use i.get('href') (or probably even just i['href']).
With .select and list comprehension, you won't even need regex to filter the results:
[a.get('href') for a in crawl.select('*[href*="scp-"]') if 'SCP-' in a.get_text()]
would return
['/scp-1512', '/scp-2756', '/scp-002', '/scp-004']
If you want the parent url attached:
root_url = 'https://PARENT-URL.com' ## replace with the actual parent url
scpLinks = [root_url + l for l, t in list(set([
    (a.get('href'), a.get_text()) for a in crawl.select('*[href*="scp-"]')
])) if 'SCP-' in t]
scpLinks should return
['https://PARENT-URL.com/scp-004', 'https://PARENT-URL.com/scp-002', 'https://PARENT-URL.com/scp-1512', 'https://PARENT-URL.com/scp-2756']
If you want to filter out param, add str(param) not in t to the filter:
scpLinks = [root_url + l for l, t in list(set([
    (a.get('href'), a.get_text()) for a in crawl.select('*[href*="scp-"]')
])) if 'SCP-' in t and str(param) not in t]
if str(param) was 'SCP-002', then scpLinks would be
['https://PARENT-URL.com/scp-004', 'https://PARENT-URL.com/scp-1512', 'https://PARENT-URL.com/scp-2756']
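As an aside, plain string concatenation works here, but urllib.parse.urljoin from the standard library is a safer way to attach the extracted paths to a parent URL, since it never doubles up slashes. A minimal sketch using the paths from the question (root_url is still a placeholder):

```python
from urllib.parse import urljoin

root_url = 'https://PARENT-URL.com'  # placeholder root, replace with the actual parent url
paths = ['/scp-1512', '/scp-2756', '/scp-002', '/scp-004']

# urljoin resolves each absolute path against the root
links = [urljoin(root_url, p) for p in paths]
print(links)
```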

How would I convert an entry in my list into a string?

This is my current code:
kwlist = []
for url in soup.find_all('a', href=True):
    text = url.get_text()
    if keywords in text:
        links = (url['href'])
        kwlist.append(links)
'|'.join(kwlist)

colorlist = []
for url in soup.find_all('a', href=True):
    text = url.get_text()
    if color in text:
        colorlinks = (url['href'])
        colorlist.append(colorlinks)
'|'.join(colorlist)

#finallink = any(x in kwlist for x in colorlist)
#print(finallink)
kwset = set(kwlist)
colorset = set(colorlist)
intersection = str(kwset.intersection(colorset))
print(intersection)
driver.get('https://www.supremenewyork.com/' + intersection)
This is the print output:
{'/shop/tops-sweaters/deshwjqp5/i640qb2pu'}
But, this is the website that selenium is navigating to:
https://www.supremenewyork.com/%7B'/shop/tops-sweaters/deshwjqp5/i640qb2pu'%7D
I need selenium to navigate to:
https://www.supremenewyork.com/shop/tops-sweaters/deshwjqp5/i640qb2pu
That is the same item in my list, just without the %7B' and '%7D.
How would I convert the entry in my list into my desired output?
%7B and %7D are encoded { and } characters from the set after you call str on it.
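To make the encoding visible, here is a minimal sketch reproducing it with the standard library (the path is the one from the question):

```python
from urllib.parse import quote

intersection = {'/shop/tops-sweaters/deshwjqp5/i640qb2pu'}  # a one-element set, as in the question
as_text = str(intersection)
print(as_text)         # the braces and quotes are part of the resulting string
print(quote(as_text))  # '{' and '}' become %7B and %7D when percent-encoded
```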
You need to "unwrap" the value out of the set first. If you only want a single entry from intersection, and you know that it will always contain at least one element, you can convert it to a list, then take an element from it. Something like:
# Convert to list, then take the first element
intersection = str(list(kwset.intersection(colorset))[0])
print(intersection)
driver.get('https://www.supremenewyork.com/' + intersection)
If you can't guarantee that kwset.intersection(colorset) will always return at least one element, you'll need to do some error handling, or switch to a different way to "unwrap" the set, like calling a defaulting next on an iterator of the set.
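For instance, next with a default never raises, even when the intersection is empty. A minimal sketch with sample data standing in for the real sets:

```python
kwset = {'/shop/tops-sweaters/deshwjqp5/i640qb2pu'}  # sample data
colorset = {'/shop/tops-sweaters/deshwjqp5/i640qb2pu'}

# next() with a default "unwraps" the set without raising on an empty intersection
intersection = next(iter(kwset & colorset), '')
print(intersection)

empty = next(iter(kwset & {'/something-else'}), '')  # no common element -> default
print(repr(empty))
```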

find_elements_by_xpath for links that do not contain a specific string 3 times

I have used this code, but I want to remove the links where the string 'vs' appears three times; it should only appear one time:
elems = driver.find_elements_by_xpath("//a[contains(@href, '/president/us/general_election')][not(@href = following::a/@href)]")
for elem in elems:
    print(elem.get_attribute("href"))
Update:
I have realised that some of my code is not working as I expected: I used [not(@href = following::a/@href)] to remove repeated hrefs, but there is still one href that is repeated. Any help is welcome!
Try this:
elems = driver.find_elements_by_xpath("//a[contains(@href, '/president/us/general_election')][not(@href = following::a/@href)]")
for elem in elems:
    if elem.get_attribute("href").count("vs") == 1:
        print(elem.get_attribute("href"))
To get links with only one occurrence of "_vs_" in @href you can extend your XPath with the predicate
[not(contains(substring-after(@href, '_vs_'), '_vs_'))]
Your final XPath will be
"//a[contains(@href, '/president/us/general_election') and not(contains(substring-after(@href, '_vs_'), '_vs_')) and not(@href = following::a/@href)]"
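The logic of that predicate can be checked in plain Python: XPath's substring-after() returns everything after the first match, so a second "_vs_" survives in the remainder and the predicate filters that link out. A minimal sketch with hypothetical hrefs:

```python
def substring_after(s, sub):
    # mirrors XPath substring-after(): everything after the first occurrence of sub
    _, _, after = s.partition(sub)
    return after

# hypothetical hrefs, just for illustration
single = '/president/us/general_election/trump_vs_biden'
double = '/president/us/general_election/trump_vs_biden_vs_west'

print('_vs_' in substring_after(single, '_vs_'))  # False -> the predicate keeps this link
print('_vs_' in substring_after(double, '_vs_'))  # True  -> the predicate filters it out
```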

Can't combine the two xpath expression in a single one

I've written an XPath to locate a certain element. However, the element I'm after may be available in either childNodes[10] or childNodes[12]. I'd like to combine the two into a single expression that still locates the element irrespective of its position. The two expressions are:
First one:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[10].textContent", element).strip()
Second one:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[12].textContent", element).strip()
I tried like this but won't get any result:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[10].textContent|return arguments[0].childNodes[12].textContent", element).strip()
How can I combine the two in a single expression? Btw, I'm using this XPath in a Python + Selenium script.
Here is the link to the html:
"https://www.dropbox.com/s/vl8anp8te48ktl2/For%20SO.txt?dl=0"
If one of the text nodes is an empty string you can try:
element = driver.find_element_by_xpath("//td[@class='data']/table//th")
name = driver.execute_script("return arguments[0].childNodes[10].textContent", element).strip() or driver.execute_script("return arguments[0].childNodes[12].textContent", element).strip()
This should return the name with a non-empty string (the first occurrence of a non-empty text node).
How about this: the following will give you an input element at index 10 or 12:
element = driver.find_element_by_xpath("//td[@class='data']/table//th//*[position()=10 or position()=12]/input")

How can I get text content of multiple elements with Python Selenium?

Here is my code:
def textfinder():
    try:
        textfinder1 = driver.find_elements_by_class_name("m-b-none").text
    except NoSuchElementException:
        pass
        print("no such element")
    print(textfinder1)
It works only when I use find_element. When I use find_elements, it gives me error "list" object has no attribute "text". I understand that it returns a list, but I just don’t know how to "read" it. When I remove .text from the command, I don’t get any error, but some weird data, but I need the text content of the class.
Actually, when you do
text = driver.find_element_by_class_name("m-b-none").text
You will get the first element that matches, and that element exposes, thanks to Selenium, an attribute named text. Conversely, when you do
matched_elements = driver.find_elements_by_class_name("m-b-none")
it will match all corresponding elements. Given that matched_elements is a plain Python list (not, for example, a Selenium-modified object which has text as an attribute), you will have to iterate over it and get the text of each element. As follows:
texts = []
for matched_element in matched_elements:
    text = matched_element.text
    texts.append(text)
print(texts)
Or if you want to leave your code unchanged as possible, you can do it in one line:
texts = [el.text for el in driver.find_elements_by_class_name("m-b-none")]
You would need to reference each element of the list. Something like this:
textfinder1 = driver.find_elements_by_class_name("m-b-none")
for element in textfinder1:
    print(element.text)
The method find_elements_by_class_name returns a list, so you should do:
text = ''
textfinder1 = driver.find_elements_by_class_name("m-b-none")
for i in textfinder1:
    text += i.text + '\n'  # Or whatever way you want to concatenate your text
print(text)
