I am working with a web page that needs some automation, and I am having trouble interacting with certain elements because of their structure. A brief example:
<ul>
<li data-title="Search" data-action="search">
<li class="disabled" data-title="Ticket Grid" data-action="ticket-grid">
<li data-title="Create Ticket" data-action="create">
<li data-title="Settings" data-action="settings">
</ul>
I am aware of all the locator strategies like id and name listed here:
http://selenium-python.readthedocs.org/en/latest/locating-elements.html
However, is there a way to locate an element by a custom attribute, like "data-title" in this example?
You can use a CSS selector to match on any attribute; the general form is:
element[attribute='value']
where = can also be one of the operators *=, ^=, $= or ~= for substring, prefix, suffix and whole-word matches.
Per your example, it would be:
li[data-title='Ticket Grid']
(source http://ddavison.io/css/2014/02/18/effective-css-selectors.html)
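A quick way to sanity-check the pattern: the hypothetical helper below just assembles the selector string, so the formatting can be tested without a browser (the driver call is commented out because it needs a live page):

```python
def attr_selector(tag, attribute, value, op="="):
    """Build a CSS attribute selector, e.g. li[data-title='Ticket Grid'].
    op can be '=', '*=', '^=', '$=' or '~=' for the substring variants."""
    return "{}[{}{}'{}']".format(tag, attribute, op, value)

selector = attr_selector("li", "data-title", "Ticket Grid")
print(selector)  # li[data-title='Ticket Grid']
# With a live driver this would be:
# element = driver.find_element_by_css_selector(selector)
```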
If there are multiple possibilities, it is also worth knowing the following option:
from selenium.webdriver import Firefox

driver = Firefox()
driver.get(<your_html>)
li_list = driver.find_elements_by_tag_name('li')
for li in li_list:
    if li.get_attribute('data-title') == '<wanted_value>':
        <do_your_thing>
You can use:
"//li[#data-title='Ticket Grid']"
I'm fairly new to the web scraping world, but I really need to do some web scraping on the Thesaurus website for a project I'm working on. I have successfully created a program using beautifulsoup4 that asks the user for a word, then returns the most likely synonyms based on Thesaurus. However, I would like to have not only those synonyms but also the synonyms of every sense of the word (which is depicted on Thesaurus by a list of buttons above the synonyms). I noticed that when clicking a button, the class names also change, so I did a little digging and decided to go with Selenium instead of beautifulsoup.
I now have code that writes a word in the search bar and clicks it; however, I'm unable to get the synonyms or said buttons, simply because find_element finds nothing, and being new to this, I'm afraid I'm using the wrong syntax.
This is my code at the moment (it looks for synonyms of "good"):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
PATH = r"C:\Program Files (x86)\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://thesaurus.com")
search = driver.find_element_by_id("searchbar_input")
search.send_keys('good')
search.send_keys(Keys.RETURN)
try:
    headword = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "headword"))
    )
    print(headword.text)
    #buttons = headword.find_element_by_class_name("css-bjn8wh e1br8a1p0")
    #print(buttons.text)
    meanings = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "meanings"))
    )
    print(meanings.text)
    #words = meanings.find_elements_by_class_name("css-1kg1yv8 eh475bn0")
    #print(words.text)
except:
    print('failed')

driver.quit()
For the first part, I want to access the buttons. The headword is simply the element that contains all the buttons I want to press. This is the headword element according to the inspect tool:
<div id="headword" class="css-bjn8wh e1br8a1p0">
<div class="css-vw3jp5 e1ibdjtj4">
*unnecessary stuff*
<div class="css-bjn8wh e1br8a1p0">
<div class="postab-container css-cthfds ew5makj3">
<ul class="css-gap396 ew5makj2">
<li data-test-pos-tab="true" class="active-postab css-kgfkmr ew5makj4">
<a class="css-sc11zf ew5makj1">
<em class="css-1v93s5a ew5makj0">adj.</em>
<strong>pleasant, fine</strong>
</a>
</li>
<li data-test-pos-tab="true" class=" css-1ha4k0a ew5makj4">
*similar stuff*
<li data-test-pos-tab="true" class=" css-1ha4k0a ew5makj4">
...
where each one of these <li data-test-pos-tab="true" class=" css-1ha4k0a ew5makj4"> is a button I want to click. So far I have tried a bunch of things like the one shown in the code, and also things like:
buttons = headword.find_elements_by_class_name("css-1ha4k0a ew5makj4")
buttons = headword.find_elements_by_css_selector("css-1ha4k0a ew5makj4")
buttons = headword.find_elements_by_class_name("postab-container css-cthfds ew5makj3")
buttons = headword.find_elements_by_css_selector("postab-container css-cthfds ew5makj3")
but in no case can Selenium find these elements.
For the second part I want the synonyms. Here is the meaning element:
<div id="meanings" class="css-16lv1yi e1qo4u831">
<div class="css-1f3egm3 efhksxz0">
*unnecessary stuff*
<div data-testid="word-grid-container" class="css-ixatld e1cc71bi0">
<ul class="css-1ngwve3 e1ccqdb60">
<li>
<a font-weight="inherit" href="/browse/acceptable" data-linkid="nn1ov4" class="css-1kg1yv8 eh475bn0">
</a>
</li>
<li>
<a font-weight="inherit" href="/browse/bad" data-linkid="nn1ov4" class="css-1kg1yv8 eh475bn0">
...
where each of these elements is a synonym I want to get. Similarly to the previous case I tried several things such as:
synGrid = meanings.find_element_by_class_name("css-ixatld e1cc71bi0")
synGrid = meanings.find_element_by_css_selector("css-ixatld e1cc71bi0")
words = meanings.find_elements_by_class_name("css-1kg1yv8 eh475bn0")
words = meanings.find_elements_by_css_selector("css-1kg1yv8 eh475bn0")
And again Selenium cannot find these elements...
I would really appreciate some help in order to achieve this, even if it is just a push in the right direction instead of giving a full solution.
Hope I wrote all the needed information, if not, please let me know.
If you use a CSS selector, then you have to use a dot for a class:
css_selector(".css-ixatld.e1cc71bi0")
and a hash for an id:
css_selector("#headword")
just as you would in a .css file.
In a CSS selector you can also use the other methods available in CSS.
See the CSS selectors reference on w3schools.com.
Selenium converts class_name to a CSS selector, but class_name() expects a single name, and Selenium has problems when there are two or more names: when it converts class_name to a css_selector it adds a dot only before the first name, though a dot is also needed before the second and later names. So you have to add the second dot manually:
class_name("css-ixatld.e1cc71bi0")
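To make the conversion explicit, here is a small hypothetical helper that puts a dot before every class name; the class string is taken from the question's markup, and the driver call is commented out because it needs a live page:

```python
def classes_to_css(class_attr):
    """Turn a multi-valued class attribute, e.g. 'css-ixatld e1cc71bi0',
    into a CSS selector with a dot before every name: '.css-ixatld.e1cc71bi0'."""
    return "." + ".".join(class_attr.split())

selector = classes_to_css("css-ixatld e1cc71bi0")
print(selector)  # .css-ixatld.e1cc71bi0
# With a live driver this would be:
# syn_grid = meanings.find_element_by_css_selector(selector)
```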
See if this works:
meanings = driver.find_elements_by_xpath(".//div[@id='meanings']/div[@data-testid='word-grid-container']/ul/li")
for e in meanings:
    e.find_element_by_tag_name("a").click()
    # add an implicit wait here if you need one
    driver.back()
I am trying to select the titles of posts loaded on a webpage by combining multiple CSS selectors. See my process below:
Load relevant libraries
import time
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
Then load the content I wish to analyse
options = Options()
options.set_preference("dom.push.enabled", False)
browser = webdriver.Firefox(options=options)
browser.get("https://medium.com/search")
browser.find_element_by_xpath("//input[@type='search']").send_keys("international development",Keys.ENTER)
time.sleep(5)
scrolls = 2
while True:
    scrolls -= 1
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(5)
    if scrolls < 0:
        break
Then, to get the content for each selector separately, I call css_selector:
titles = browser.find_elements_by_css_selector("h3[class^='graf']")
TitlesList = []
for names in titles:
    TitlesList.append(names.text)

times = browser.find_elements_by_css_selector("time[datetime^='2016']")
Times = []
for names in times:
    Times.append(names.text)
It all works so far... Now I am trying to bring them together with the aim of identifying only the choices from 2016:
choices = browser.find_elements_by_css_selector("time[datetime^='2016'] and h3[class^='graf']")
browser.quit()
On this last snippet, I always get an empty list.
So I wonder: 1) how I can select multiple elements by considering different css_selectors as conditions for selection at the same time; 2) whether the syntax for finding under multiple conditions would be the same when linking elements using different approaches like css_selector or XPath; and 3) whether there is a way to get the text of elements identified by multiple CSS selectors, along the lines of the below:
[pair.text for pair in browser.find_elements_by_css_selector("h3[class^='graf']") if pair.text]
Thanks
Firstly, I think what you're trying to do is to get any title whose posting time is in 2016, right?
You're using the CSS selector "time[datetime^='2016'] and h3[class^='graf']", but this will not work because its syntax is not valid (and is not a CSS operator). Plus, these are two different elements, and a CSS selector can only match one element. In your case, to add a condition from another element, use a common element such as a parent element.
I've checked the site; here's the HTML that you need to take a look at (if you're trying to get the titles published in 2016). This is the minimal HTML part that can help you identify what you need to get.
<div class="postArticle postArticle--short js-postArticle js-trackPostPresentation" data-post-id="d17220aecaa8"
data-source="search_post---------2">
<div class="u-clearfix u-marginBottom15 u-paddingTop5">
<div class="postMetaInline u-floatLeft u-sm-maxWidthFullWidth">
<div class="u-flexCenter">
<div class="postMetaInline postMetaInline-authorLockup ui-captionStrong u-flex1 u-noWrapWithEllipsis">
<div
class="ui-caption u-fontSize12 u-baseColor--textNormal u-textColorNormal js-postMetaInlineSupplemental">
<a class="link link--darken"
href="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action="open-post"
data-action-value="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action-source="preview-listing">
<time datetime="2016-09-05T13:55:05.811Z">Sep 5, 2016</time>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="postArticle-content">
<a href="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action="open-post" data-action-source="search_post---------2"
data-action-value="https://provocations.darkmatterlabs.org/reimagining-international-development-for-the-21st-century-d17220aecaa8?source=search_post---------2"
data-action-index="2" data-post-id="d17220aecaa8">
<section class="section section--body section--first section--last">
<div class="section-divider">
<hr class="section-divider">
</div>
<div class="section-content">
<div class="section-inner sectionLayout--insetColumn">
<h3 name="5910" id="5910" class="graf graf--h3 graf--leading graf--title">Reimagining
International Development for the 21st Century.</h3>
</div>
</div>
</section>
</a>
</div>
</div>
Both time and h3 are inside a big div with a class of postArticle. The article contains the publish time and the title, so it makes sense to get the whole article div that was published in 2016, right?
Using XPATH is much more powerful & easier to write:
This will get all the article divs whose class contains postArticle--short: article_xpath = '//div[contains(@class, "postArticle--short")]'
This will get all the time tags whose datetime attribute contains 2016: //time[contains(@datetime, "2016")]
Let's combine the two. I want the article divs that contain a time tag whose datetime includes 2016:
article_2016_xpath = '//div[contains(@class, "postArticle--short")][.//time[contains(@datetime, "2016")]]'
article_element_list = driver.find_elements_by_xpath(article_2016_xpath)
# now let's get the title
for article in article_element_list:
    title = article.find_element_by_tag_name("h3").text
I haven't tested the code yet, only the xpath. You might need to adapt the code to work on your side.
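For what it's worth, the XPath itself can be checked offline with lxml against a trimmed-down version of the markup above; the 2019 entry below is invented purely to show the filter excluding it:

```python
from lxml import html as lxml_html

# Trimmed version of the article markup, plus an invented 2019 post
doc = lxml_html.fromstring("""
<div>
  <div class="postArticle postArticle--short">
    <time datetime="2016-09-05T13:55:05.811Z">Sep 5, 2016</time>
    <h3 class="graf graf--h3 graf--title">Reimagining International Development for the 21st Century.</h3>
  </div>
  <div class="postArticle postArticle--short">
    <time datetime="2019-01-01T00:00:00.000Z">Jan 1, 2019</time>
    <h3 class="graf graf--h3 graf--title">An invented 2019 post</h3>
  </div>
</div>
""")
article_2016_xpath = '//div[contains(@class, "postArticle--short")][.//time[contains(@datetime, "2016")]]'
articles = doc.xpath(article_2016_xpath)
titles = [a.findtext(".//h3") for a in articles]
print(titles)  # ['Reimagining International Development for the 21st Century.']
```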
By the way, relying on bare find_element... calls is not a good idea; try using explicit waits: https://selenium-python.readthedocs.io/waits.html
This will help you avoid arbitrary time.sleep waits, improve your app's performance, and let you handle errors cleanly.
Only use find_element... when you have already located an element and need to find a child element inside it. For example, in this case, if I want to find the articles, I will find them with an explicit wait; then, once the element is located, I will use find_element... to find the child h3.
I am trying to automate JIRA tasks but am struggling to access the bulk-edit option after a JQL filter. After reaching the correct screen I am stuck at this point (screenshot omitted):
HTML code:
<div class="aui-list">
<h5>Bulk Change:</h5>
<ul class="aui-list-section aui-first aui-last">
<li class="aui-list-item active">
<a class="aui-list-item-link" id="bulkedit_all" href="/secure/views/bulkedit/BulkEdit1!default.jspa?reset=true&tempMax=4">all 4 issue(s)</a>
</li>
</ul>
</div>
My Python code:
bulkDropdown = browser.find_elements_by_xpath("//div[@class='aui-list']//aui-list[@class='aui-list-item.active']").click()
Try the following xpath -
bulkDropdown = browser.find_element_by_xpath("//li/a[@id='bulkedit_all']").click()
The link you want has an ID, you should use that unless you find that it's not unique on the page.
browser.find_element_by_id("bulkedit_all").click()
You will likely need to add a wait for clickable since from the screenshot it looks like a popup or tooltip of some kind. See the docs for more info on the different waits available.
So I'm currently using Python to import data from an Excel sheet and then use that information to fill out a form on a webpage.
The problem I'm having is selecting a profile from the drop-down menu.
I've been using the Selenium library, and I can actually select the element using find_element_by_xpath, but that assumes I know the data value; since the data value is auto-generated for each new profile that's added, I can't use it as a reliable means.
Profile = Browser.find_element_by_xpath("/html/something/something/.....")
Profile.click()
time.sleep(0.75) #allowing time for link to be clickable
The_Guy = Browser.find_element_by_xpath("/html/something/something/...")
The_Guy.click()
This works only on known paths I would like to do something like this
Profile = Browser.find_element_by_xpath("/html/something/something/.....")
Profile.click()
time.sleep(0.75) #allowing time for link to be clickable
The_Guy = Browser.find_element_by_id("Caption.A")
The_Guy.click()
EXAMPLE OF HTML
<ul class="list">
  <li class="option" data-value="XXXXX-XXXXX-XXXXX-XX-XXX">
    Thor
  </li>
  <li class="option" data-value="XXXXX-XXXXX-XXXXX-XX-XXX">
    IronMan
  </li>
  <li class="option" data-value="XXXXX-XXXXX-XXXXX-XX-XXX">
    Caption.A
  </li>
  ....
</ul>
What I'd like to be able to do is search by name (like Caption.A) and then step back to select the parent li. Thanks in advance.
Try using the following xpath to find the li containing the desired text, and then click on it. Sample code:
driver.find_element(By.XPATH, "//li[contains(text(), 'Caption.A')]").click()
Hope it helps :)
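If you want to check the predicate without a browser, lxml evaluates the same XPath; the data-value strings below are invented placeholders standing in for the auto-generated values:

```python
from lxml import html as lxml_html

# Cleaned-up copy of the question's list; data-value strings are placeholders
fragment = lxml_html.fromstring("""
<ul class="list">
  <li class="option" data-value="value-for-thor">Thor</li>
  <li class="option" data-value="value-for-ironman">IronMan</li>
  <li class="option" data-value="value-for-caption-a">Caption.A</li>
</ul>
""")
matches = fragment.xpath("//li[contains(text(), 'Caption.A')]")
print(matches[0].get("data-value"))  # value-for-caption-a
```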
I'm currently able to find certain elements using the findAll function. Is there a way to navigate to their child?
The code I have is:
data = soup.findAll(id="profile-experience")
print data[0].get_text()
And it returns a block of text (for example, some of the text isn't spaced out properly)
The DOM looks something like this
<div id="profile-experience">
<div class="module-body">
<li class="position">
<li class="position">
<li class="position">
If I just do a findAll on class="position" I get way too much crap back. Is there a way, using BeautifulSoup, to find only the <li class="position"> elements that are nested underneath <div id="profile-experience">?
I want to do something like this:
data = soup.findAll('li',attrs={'class':'position'})
(Where I'm only getting the nested data)
for d in data:
    print d.get_text()
Sure, you can "chain" the find* calls:
profile_experience = soup.find(id="profile-experience")
for li in profile_experience.find_all("li", class_="position"):
    print(li.get_text())
Or, you can solve it in one go with a CSS selector:
for li in soup.select("#profile-experience li.position"):
    print(li.get_text())
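Both approaches can be verified against a minimal, well-formed version of the question's markup (the text content below is invented for illustration):

```python
from bs4 import BeautifulSoup

# Minimal version of the question's markup, with invented text content
html = """
<div id="profile-experience">
  <div class="module-body">
    <li class="position">Engineer at Acme</li>
    <li class="position">Analyst at Initech</li>
  </div>
</div>
<li class="position">outside the profile, should be ignored</li>
"""
soup = BeautifulSoup(html, "html.parser")

chained = [li.get_text() for li in
           soup.find(id="profile-experience").find_all("li", class_="position")]
selected = [li.get_text() for li in soup.select("#profile-experience li.position")]
print(chained)   # ['Engineer at Acme', 'Analyst at Initech']
print(selected)  # ['Engineer at Acme', 'Analyst at Initech']
```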