Here is my code:
def textfinder():
    try:
        textfinder1 = driver.find_elements_by_class_name("m-b-none").text
    except NoSuchElementException:
        pass
        print("no such element")
    print(textfinder1)
It works only when I use find_element. When I use find_elements, it gives me the error "'list' object has no attribute 'text'". I understand that it returns a list, but I just don't know how to "read" it. When I remove .text from the command, I don't get any error, but some weird data; however, I need the text content of the class.
Actually, when you do
text = driver.find_element_by_class_name("m-b-none").text
you will get the first element that is matched, and this element possesses, thanks to Selenium, an attribute named text. By contrast, when you do
matched_elements = driver.find_elements_by_class_name("m-b-none")
it will match all corresponding elements. Given that matched_elements is a plain Python list (it is not, for example, a Selenium object that has a text attribute), you will have to iterate over it and get the text of each element, as follows:
texts = []
for matched_element in matched_elements:
    text = matched_element.text
    texts.append(text)
    print(text)
Or, if you want to leave your code as unchanged as possible, you can do it in one line:
texts = [el.text for el in driver.find_elements_by_class_name("m-b-none")]
You would need to reference each like an element of a list. Something like this:
textfinder1 = driver.find_elements_by_class_name("m-b-none")
for elements in textfinder1:
    print(elements.text)
The method find_elements_by_class_name returns a list, so you should do:
text = ''
textfinder1 = driver.find_elements_by_class_name("m-b-none")
for i in textfinder1:
    text += i.text + '\n'  # or concatenate the text however you prefer
print(text)
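If you want to keep the original function's shape, here is a minimal sketch (assuming driver is an initialized WebDriver and NoSuchElementException is imported from selenium.common.exceptions):
from selenium.common.exceptions import NoSuchElementException

def textfinder():
    try:
        # find_elements_* returns a (possibly empty) list of WebElements,
        # so collect the text of each one instead of calling .text on the list
        elements = driver.find_elements_by_class_name("m-b-none")
        texts = [element.text for element in elements]
        print(texts)
    except NoSuchElementException:
        # note: find_elements_* returns an empty list rather than raising,
        # so this handler only mirrors the original code
        print("no such element")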
I am trying to extract PDF page numbers if the page contains certain strings, and then append the selected page numbers to a list. For example, page 2, 254, 439 and 458 meet the criteria and I'm expecting the output as a list [2,254,439,458]. My code is:
object = PyPDF2.PdfFileReader(file_path)
NumPages = object.getNumPages()
String = 'specific string'
for i in range(0, NumPages):
    PageObj = object.getPage(i)
    Text = PageObj.extractText()
    ReSearch = re.search(String, Text)
    Pagelist = []
    if ReSearch != None:
        Pagelist.append(i)
        print(Pagelist)
I received output as:
[2]
[254]
[439]
[458]
Could someone please take a look and see how I can fix it? Thank you
Right now you are defining a new list in every iteration, so you have to define the list only once, before the loop. Also, print it outside the loop:
Pagelist = []
for i in range(0, NumPages):
    # rest of the loop
print(Pagelist)
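Putting it together with the names from the question, a sketch of the fixed loop (this keeps the PyPDF2 calls exactly as used above):
Pagelist = []
for i in range(0, NumPages):
    PageObj = object.getPage(i)
    Text = PageObj.extractText()
    ReSearch = re.search(String, Text)
    if ReSearch is not None:
        Pagelist.append(i)
# print once, after the loop has collected every matching page number
print(Pagelist)  # e.g. [2, 254, 439, 458]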
This is my current code:
kwlist = []
for url in soup.find_all('a', href=True):
    text = url.get_text()
    if keywords in text:
        links = (url['href'])
        kwlist.append(links)
'|'.join(kwlist)

colorlist = []
for url in soup.find_all('a', href=True):
    text = url.get_text()
    if color in text:
        colorlinks = (url['href'])
        colorlist.append(colorlinks)
'|'.join(color)
#finallink = any(x in kwlist for x in colorlist)
#print(finallink)
kwset = set(kwlist)
colorset = set(colorlist)
intersection = str(kwset.intersection(colorset))
print(intersection)
driver.get('https://www.supremenewyork.com/' + intersection)
This is the print output:
{'/shop/tops-sweaters/deshwjqp5/i640qb2pu'}
But, this is the website that selenium is navigating to:
https://www.supremenewyork.com/%7B'/shop/tops-sweaters/deshwjqp5/i640qb2pu'%7D
I need selenium to navigate to:
https://www.supremenewyork.com/shop/tops-sweaters/deshwjqp5/i640qb2pu
That is the same item in my list, just without the %7B' prefix and '%7D suffix.
How would I change this into my desired output?
%7B and %7D are the URL-encoded { and } characters that appear because you call str() on the whole set.
You need to "unwrap" the value out of the set first. If you only want a single entry from intersection, and you know that it will always contain at least one element, you can convert it to a list, then take an element from it. Something like:
# Convert to list, then take the first element
intersection = str(list(kwset.intersection(colorset))[0])
print(intersection)
driver.get('https://www.supremenewyork.com/' + intersection)
If you can't guarantee kwset.intersection(colorset) will always return at least one element, you'll need to do some error handling, or switch to a different way to "unwrap" the set, like calling next() with a default value on an iterator of the set.
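For example, a small sketch of that next-with-default approach (the None fallback is just an illustration):
common = kwset.intersection(colorset)
# next() returns the first element of the iterator, or the default when the set is empty
intersection = next(iter(common), None)
if intersection is not None:
    driver.get('https://www.supremenewyork.com/' + intersection)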
I need to get an element from a page and then print it out, but it always prints this:
[<selenium.webdriver.remote.webelement.WebElement (session="636e500d9db221d6b7b10b8d7849e1b5",
element="4f0ccc3b-44b0-4cf2-abd4-95a70278bf77")>...
My code:
film_player = free_filmy_url + filmPlayerPart
dr = webdriver.Chrome()
dr.get(film_player)
captcha_button = dr.find_element_by_xpath('/html/body/form/div/input[1]')
captcha_items = dr.find_elements_by_class_name('captchaMyImg')
print(captcha_items)
You can iterate through the captcha_items and print each of them.
for captcha_item in captcha_items:
    # you can access each item here as "captcha_item"
    print(captcha_item.text)  # if you are trying to print the text
With these lines of code:
captcha_items = dr.find_elements_by_class_name('captchaMyImg')
print(captcha_items)
you are printing the WebElement objects themselves, not their contents. You are probably looking to print an attribute of those elements instead, which you can do as follows:
print([my_elem.get_attribute("attribute") for my_elem in dr.find_elements_by_class_name('captchaMyImg')])
Note: you need to replace "attribute" with one of the attributes that actually exist on those elements, e.g. src, href, innerHTML, etc.
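For instance, if those elements are images and the URL is what you're after (assuming src is the relevant attribute here):
# collect the src attribute of every matched element
print([my_elem.get_attribute("src") for my_elem in dr.find_elements_by_class_name('captchaMyImg')])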
I'm trying to scrape text from this page:
http://codingbat.com/prob/p187868
Specifically, I want to scrape two strings from the page and combine them as the key in a dictionary, with the problem statement as the value. These are the two parts of the problem's name (here: 'Warmup-1' and 'sleepIn'). However, the strings are contained at different levels of the parse tree, and this is creating problems.
Abstractly, the problem is this:
I'm trying to scrape text from a parse tree of:
div-->{[a[span'h2'[string1]]], [span'h2'[string2]], some other tags}
Since they are both contained in 'span' tags with the attribute class='h2', I can scrape a list of these and then select from the list.
div_nameparts = name_div.find_all('span', class_='h2')
name1 = div_nameparts[0].string
name2 = div_nameparts[1].string
problem_name = name1+' > '+name2
print(problem_name)
But what if those tags didn't share an attribute like they do here ('h2')?
If I try to navigate the parse tree using div.a.string, I can get the first string (string1), but div.span.string does not return the second value (string2).
name1 = name_div.a.string
name2 = name_div.span.string
Instead it again returns the first (string1), apparently navigating to div.a.span (the child of a child) and stopping before finding its way to div.span (the next child).
And if I try div.a.next_sibling to navigate to div.span and get the string with div.span.string,
name1 = name_div.a.string
name2_div = name_div.a.next_sibling
name2 = name2_div.string
it returns no string at all, just a value of None.
Is there a better, more effective way to navigate the parse tree to get to these span tags?
Thanks in advance!
This will work as long as the 'greater than' text (' > '), with its leading and trailing spaces, doesn't appear earlier in the page than the pair of strings you want:
gt = soup.find(text=' > ')
string1 = gt.findPrevious('span').text
string2 = gt.findNext('span').text
print(string1, gt, string2, sep='')
The output:
Warmup-1 > sleepIn
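If the two spans didn't share the h2 class, you could also navigate siblings explicitly; a sketch assuming the structure described in the question (the first span nested inside the <a>, the second span a later sibling of that <a> inside the same div):
name1 = name_div.a.string                              # string1, inside the <a>'s span
name2 = name_div.a.find_next_sibling('span').string    # string2; skips the ' > ' text node that plain .next_sibling hits
print(name1 + ' > ' + name2)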
#!/usr/bin/env python
import os, sys, os.path
import string
import xml.etree.ElementTree as ET

def sort_strings_file(xmlfile, typee):
    """sort all strings within given strings.xml file"""
    all_strings = {}
    orig_type = typee
    # read original file
    tree = ET.ElementTree()
    tree.parse(xmlfile)
    # iter over all strings, stick them into dictionary
    for element in list(tree.getroot()):
        all_strings[element.attrib['name']] = element.text
    # create new root element and add all strings sorted below
    newroot = ET.Element("resources")
    for key in sorted(all_strings.keys()):
        # Check for IDs
        if typee == "id":
            typee = "item"
        # set main node type
        newstring = ET.SubElement(newroot, typee)
        # add id attrib
        if orig_type == "id":
            newstring.attrib['type'] = "id"
        # continue on
        newstring.attrib['name'] = key
        newstring.text = all_strings[key]
    # write new root element back to xml file
    newtree = ET.ElementTree(newroot)
    newtree.write(xmlfile, encoding="UTF-8")
This works great and all, but if a string starts with something like <b>, it breaks badly.
For example,
<string name="uploading_to"><b>%s</b> Odovzdávanie do</string>
becomes
<string name="uploading_to" />
I've looked into the xml.etree Element class, but it seems to only have the .text attribute. I just need a way to pull everything between the XML tags. No, I can't change the input data; it comes directly from an Android APK ready to be translated, and I cannot predict how the data comes in, other than that it must be valid Android XML.
I think you are looking for the itertext() method instead. .text only returns text directly contained at the start of the element:
>>> test = ET.fromstring('<elem>Sometext <subelem>more text</subelem> middle <subelem>other text</subelem> rest</elem>')
>>> test.text
'Sometext '
>>> ''.join(test.itertext())
'Sometext more text middle other text rest'
The .itertext() iterator, on the other hand, lets you find all text contained in the element, including inside nested elements.
If, however, you only want text directly contained in an element, skipping the contained children, you want the combination of .text and the .tail values of each of the children:
>>> (test.text or '') + ''.join(child.tail or '' for child in test)
'Sometext  middle  rest'
If you need to capture everything contained, then you need to do a little more work: capture the .text, then serialize each child back to text with ET.tostring():
>>> (test.text or '') + ''.join(ET.tostring(child, encoding='unicode') for child in test)
'Sometext <subelem>more text</subelem> middle <subelem>other text</subelem> rest'
ET.tostring() takes the element tail into account. I use (test.text or '') because the .text attribute can be None as well.
You can capture that last method in a function:
def innerxml(elem):
    return (elem.text or '') + ''.join(ET.tostring(child, encoding='unicode') for child in elem)
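Used on the test element from above, it behaves like this:
>>> innerxml(test)
'Sometext <subelem>more text</subelem> middle <subelem>other text</subelem> rest'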