Python Selenium - Find Input Field after Label

I need to find an input field that follows a label. The id and name are dynamic and change with every login.
HTML:
<label for="EveryTimeDifferent">LabelText</label>
<div>
<div>
<input name="EveryTimeDifferent" id="EveryTimeDifferent">
</div>
</div>
Python:
driver.find_element_by_xpath("//label[text()='LabelText']//following::input[1]")

First of all, you'll need to find something that stays the same in each session on the given website. I assume the label text stays the same in the given example. You could use BeautifulSoup and find the label with bs4's HTML parsing functions (you'll need a local string copy of the page's HTML to create the BeautifulSoup() instance).
That would be something like:
from bs4 import BeautifulSoup
soup = BeautifulSoup(HTML_SRC_STR, 'html.parser')
soup.find('label', string='LabelText')  # locate the label by its (stable) text
The docs of bs4 will explain it way better: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
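A minimal sketch of that idea, assuming the label text is the stable part and that the label's for attribute always carries the current id of the input (as in the HTML from the question). The variable names are illustrative only, and on a live page the HTML string would come from the driver's page_source rather than a literal.

```python
from bs4 import BeautifulSoup

# HTML from the question; the id/name value changes on every login.
html = """
<label for="EveryTimeDifferent">LabelText</label>
<div>
<div>
<input name="EveryTimeDifferent" id="EveryTimeDifferent">
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The label text is assumed to be stable, so locate the label by its text.
label = soup.find("label", string="LabelText")

# The label's "for" attribute holds the current (dynamic) id of the input.
input_id = label["for"]
input_tag = soup.find("input", id=input_id)

# Alternative that ignores ids: the first <input> after the label in document order.
next_input = label.find_next("input")

print(input_id, input_tag["name"], next_input["id"])
```

Once input_id is known, it can be handed back to Selenium's id-based lookup to get a live element to type into.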

Related

Is there a specific way of retrieving only the required information from an HTML tree? Example included

I am using Python 3.8 and BeautifulSoup 4 to parse a website. The section I want to read is here:
<h1 class="pr-new-br">
Rotring
<span> 0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368 </span>
</h1>
I find this from the website using this code and get the text from it (soup is the variable for the BeautifulSoup object from the website):
product_name_text = soup.select("h1.pr-new-br")[0].get_text()
However, this of course returns all of the text. I want to separate the text between the <a href> and the text between <span>.
How can I do this? How can I specifically search for a tag or a link in an href?
Thank you very much in advance. I am pretty new to the field, so sorry if this is very basic.
The get_text method takes a separator argument that is placed between the text of different elements.
As an example:
# The class in the HTML is pr-new-br (hyphens); strip=True trims the whitespace around each piece
product_name_text = soup.select("h1.pr-new-br")[0].get_text('|', strip=True)
# You will get -> Rotring|0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368
# Then you can split on the same symbol to get a list of the individual elements' texts
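As a sketch of other ways to pull the pieces apart without picking a separator character, the snippet below uses the HTML from the question (note that the class there is written with hyphens, pr-new-br). The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<h1 class="pr-new-br">
Rotring
<span> 0.7 Imza Uçlu Kurşun Versatil Kalem 37.28.221.368 </span>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
h1 = soup.select_one("h1.pr-new-br")

# Every text piece under <h1>, whitespace-trimmed, in document order.
parts = list(h1.stripped_strings)

# Only the text inside the <span>.
span_text = h1.span.get_text(strip=True)

# Only the text that sits directly in <h1>, skipping child tags.
own_text = "".join(h1.find_all(string=True, recursive=False)).strip()

print(parts)
print(span_text)
print(own_text)
```

The same pattern applies to a link: h1.a would give the first <a> tag inside the heading, and h1.a["href"] its target, provided such a tag is actually present.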

Selenium Missing text value

I am using Python and BeautifulSoup to extract the data, but for some reason it doesn't see the text value.
HTML page has:
<div ng-if="!order.hidePrices" style="white-space: nowrap;" class="ng-binding ng-scope">$ 1,599.00</div>
Python code:
for price in prices:
    price_value = price.find('div', {"class":"ng-binding", "class":"ng-scope"})
    print(price_value)
And python output is missing text value:
<div class="ng-binding ng-scope" ng-if="!order.hidePrices" style="white-space: nowrap;"></div>
The HTML file has no other class with that name. What am I doing wrong?
There are several issues here:
Your locator does not seem to be unique.
price_value here is an element, not its text. To retrieve its text, call .get_text() (or read .text) on it.
You are using BS, not Selenium, here.
See if this works:
for price in prices:
    # A dict cannot hold the key "class" twice, so match both classes with a CSS selector
    price_value = price.select_one('div.ng-binding.ng-scope')
    print(price_value.get_text())
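If you have a running driver instance, then do this:
print(driver.find_element_by_css_selector("div.ng-binding.ng-scope").text)
# In Selenium 4 the equivalent call is driver.find_element(By.CSS_SELECTOR, "div.ng-binding.ng-scope")
Selenium's class-name locator does not support compound class names (values containing spaces), so use a CSS selector instead.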
Update:
for div in soup.select('div.ng-binding.ng-scope'):
    print(div.text)
You can also do:
print(soup.find_all(attrs={'class': 'ng-binding ng-scope'}))  # matches the exact class string, in this order
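To make the class-matching point concrete, here is a small sketch that runs against the static <div> from the question. It shows why the two-key dictionary in the original code does not test for both classes, and how a CSS selector does. The names attrs and prices are illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<div ng-if="!order.hidePrices" style="white-space: nowrap;" '
    'class="ng-binding ng-scope">$ 1,599.00</div>'
)
soup = BeautifulSoup(html, "html.parser")

# A dict literal with a repeated key silently keeps only the last value,
# so this filter only ever asks for "ng-scope".
attrs = {"class": "ng-binding", "class": "ng-scope"}
print(attrs)

# A CSS selector can require both classes on the same element.
prices = [div.get_text(strip=True) for div in soup.select("div.ng-binding.ng-scope")]
print(prices)
```

This only demonstrates the lookup on markup that already contains the text; it does not by itself explain why the text was empty in the asker's output.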
Update 1:
Here is what the official Beautiful Soup documentation says:
html.parser Advantages
Batteries included
Decent speed
Lenient (As of Python 2.7.3 and 3.2.)
html.parser Disadvantages
Not as fast as lxml, less lenient than html5lib.
The Python documentation describes html.parser in more detail:
An HTMLParser instance is fed HTML data and calls handler methods when
start tags, end tags, text, comments, and other markup elements are
encountered. The user should subclass HTMLParser and override its
methods to implement the desired behavior.

Python getting the part after div class

Here is what I have
<div class="investor-item" usrid="75500">
<div class="row">
<div class="col-sm-3">
<div class="number">10,000€</div>
<div class="date">03 December 2018</div>
</div>
I would like to scrape "75500", but have no clue how to do that.
When I use
soup.findAll('div',{"class":"investor-item"})
it does not capture what I want. Do you have any suggestion?
There are a number of ways you could capture this. Your command worked for me. Though since you have a Euro sign in there, you may want to make sure your script is using the right encoding. Also, remember that find_all will return a list, not just the first matching item.
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
html = """
<div class="investor-item" usrid="75500">
<div class="row">
<div class="col-sm-3">
<div class="number">10,000€</div>
<div class="date">03 December 2018</div>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
mytag = soup.find('div', {"class": "investor-item"})
mytag2 = soup.find('div', class_="investor-item")
mytag3 = soup.find_all('div', class_="investor-item")[0]
mytag4 = soup.findAll('div', class_="investor-item")[0]
mytag5 = soup.findAll('div',{"class":"investor-item"})[0]
print(mytag['usrid']) # Returns 75500
print(mytag2['usrid']) # Also returns 75500
print(mytag3['usrid']) # Also returns 75500
print(mytag4['usrid']) # Also returns 75500
print(mytag5['usrid']) # Also returns 75500
EDIT: Here are some more details on the 5 different examples I gave.
The typical naming convention for Python functions is all lowercase with underscores, whereas some other languages use camel case. So find_all() is the official name in BeautifulSoup 4, while findAll is the older camel-case name from BeautifulSoup 3, which bs4 still accepts as an alias for backward compatibility.
As mentioned, find_all returns a list whereas find returns the
first match, so doing a find_all and taking the first element
([0]) gives the same result.
Finally, {"class": "investor-item"} is an example of the general way you can specify attributes beyond just the HTML tag name itself. You just pass in the additional parameters in a dictionary like this. But since class is such a common attribute to look for in a tag, BeautifulSoup gives you the option of not having to use a dictionary and instead typing class_= followed by a string of the class name you're looking for. The reason for that underscore is so that Python doesn't confuse it with class, the reserved keyword Python uses to define a class.
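As a short follow-up sketch, the snippet below collects the usrid of every matching div rather than only the first, and shows the .get() accessor, which returns None instead of raising KeyError when an attribute is absent. It uses the built-in html.parser so no extra parser is needed; the names user_ids, first and missing are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="investor-item" usrid="75500">
<div class="row">
<div class="col-sm-3">
<div class="number">10,000€</div>
<div class="date">03 December 2018</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# One usrid per matching <div>; .get() returns None if the attribute is missing.
user_ids = [div.get("usrid") for div in soup.find_all("div", class_="investor-item")]

first = soup.find("div", class_="investor-item")
missing = first.get("no-such-attribute")

print(user_ids)
print(first.attrs)
print(missing)
```

Note that first.attrs stores class as a list, because an element can carry several classes, while usrid stays a plain string.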

get div from HTML with Python

I want to get a value inside a certain div from an HTML page
<div class="well credit">
<div class="span2">
<h3><span>
$ 5.402
</span></h3>
</div>
</div>
I've done it with regular expressions ( re.search() ), but it takes too long to find the div since it's a huge HTML document.
Is there a way to do this faster but with no external libraries?
Thanks
I would use BeautifulSoup!
To get every <div> tag, just do:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # parse the HTML string into a searchable tree
soup.find_all('div')
To get the value inside the span, you could do:
soup.find('span').get_text()
There are tons of different methods of getting the information you need.
Good luck, hope this helps!
Python has only one HTML parser in the standard library and it's pretty low-level, so you'll have to install some sort of HTML parsing library if you want to work with HTML.
lxml is by far the fastest:
import lxml.html
root = lxml.html.parse(handle)
price = root.xpath('//div[@class="well credit"]//span/text()')[0]
If you want it to be even faster, use root.iter and stop parsing the HTML once you hit the right element.
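Since the question asks for a solution without external libraries, here is a rough sketch using only the standard-library html.parser module. The class name CreditParser, the chunk size, and the assumption that the wanted value is the first <span> inside <div class="well credit"> are all illustrative choices, not part of the original answers.

```python
from html.parser import HTMLParser


class CreditParser(HTMLParser):
    """Collect the text of the first <span> inside <div class="well credit">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while inside the target div (counts nested divs)
        self.in_span = False  # True while inside the span being captured
        self.chunks = []      # text fragments seen inside that span
        self.value = None     # final result, set when the span closes

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or {"well", "credit"} <= set(classes):
                self.div_depth += 1
        elif tag == "span" and self.div_depth and self.value is None:
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "span" and self.in_span:
            self.in_span = False
            self.value = "".join(self.chunks).strip()

    def handle_data(self, data):
        if self.in_span:
            self.chunks.append(data)


html = """
<div class="well credit">
<div class="span2">
<h3><span>
$ 5.402
</span></h3>
</div>
</div>
"""

parser = CreditParser()
# Feed the document in chunks and stop as soon as the value has been found.
for start in range(0, len(html), 4096):
    parser.feed(html[start:start + 4096])
    if parser.value is not None:
        break

price = parser.value
print(price)
```

Feeding in chunks means a huge document does not have to be parsed to the end once the value is known, which is the same early-exit idea as above. This has not been benchmarked against the regular-expression approach from the question.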
Scrapy might also be a solution for this. Please read http://doc.scrapy.org/en/latest/topics/selectors.html
x = sel.xpath('//div[@class="span2"]')
for i in x:
    print(i.extract())
Output:
<div class="span2">
<h3><span>
$ 5.402
</span></h3>
</div>

Beautiful Soup - Find identified tag in the original text

I need to manipulate certain text in an HTML document after I have identified the text in the original document. Let's say I have this HTML code
<div id="identifier">
<a href="link" id="linkid">
</a>
</div>
I want to delete the id attribute in the <a> tag. I can identify a particular tag using BeautifulSoup, but because it changes the formatting of the original document I can't search/replace the string either. I don't want to just write output of BeautifulSoup, instead I want to identify <a href="link" id="linkid"> tag in the original document and replace with just <a href="link">. Any idea how to proceed?
Answering a few questions raised:
This is a huge existing codebase that needs some updating, so it's not just a single search/replace kind of job.
The original formatting is important because the organization follows certain coding standards for formatting code, which I want to retain. Also, BS introduces extra tags for the sake of completeness.
Which version of beautifulsoup are you using?
You can edit HTML nodes like dictionaries in bs4.
See the documentation:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#changing-tag-names-and-attributes
soup = BeautifulSoup('<a href="link" id="linkid"></a>', 'html.parser')
tag = soup.a
del tag['id']  # removes the attribute; raises KeyError if it is not present
print(tag)  # <a href="link"></a>
Also, you seem to have a problem with the way Beautiful Soup outputs the modified HTML code.
If you want to pretty-print your document, or use custom formatting, you can do it easily:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#output
