How to find child element while having found the parent div? - python

I have a Google form that I am web scraping (at least trying to). I want to look through it to find the text "Name ", then find the parent div of the whole block that contains that question, and look for an input element where I can use send_keys() to fill out my name. I was able to find "Name " and find the parent div.
src = result.content
driver = webdriver.Chrome()
driver.get('https://exampleform.com')
soup = BeautifulSoup(src, 'html.parser')
name = soup.find(string="Name ")
nameBlock = name.find_parents('div', class_="freebirdFormviewerViewNumberedItemContainer", limit=1)
if soup.find(string="Name "):
    print('yes')
    # prints yes
if name.find_parents('div', class_="freebirdFormviewerViewNumberedItemContainer", limit=1):
    print('okay')
    # prints okay
if nameBlock.find('input'):
    print('yup')
    # gets error
# Also have tried nameBlock.find(tag='input')
# and nameBlock.find_element_by_tag_name('input')
# and name.find_parents('div', class_="freebirdFormviewerViewNumberedItemContainer", limit=1).find('input')
I have verified that I found the parent by printing out the variable "nameBlock": I could see the whole parent div, with the input tag inside it, in the console. But then I get this error when I try to use the final "if" statement to check whether I located the input element:
AttributeError: ResultSet object has no attribute 'find_elements_by_tag_name'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
I think it has selected the parent together with each child within that parent, so
nameBlock.find('input')
is being run against a collection rather than a single element, but I am not really sure how I would go about selecting just the child input element. Any suggestions? Also, here is the HTML to help better understand my issue.

AttributeError: ResultSet object has no attribute 'find_elements_by_tag_name'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
The error says that you are treating a list of elements like a single element: find_parents() (plural) returns a ResultSet, which is a list, not a single Tag, so calling find() on it fails.
Change this to
find_parent() (singular), which returns a single element, and you can then call find('input') on it. Note that send_keys() is a Selenium method that works on a WebElement, not a BeautifulSoup Tag, so the input still has to be located through the driver before you can fill it in.
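A minimal sketch of the combined fix, assuming the class name and placeholder URL from the question: use find_parent() to verify the block in BeautifulSoup, then locate the input through Selenium, since send_keys() only works on a WebElement.
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://exampleform.com')  # placeholder URL from the question

soup = BeautifulSoup(driver.page_source, 'html.parser')
name = soup.find(string="Name ")

# find_parent() returns a single Tag, so .find() works on it
nameBlock = name.find_parent('div', class_="freebirdFormviewerViewNumberedItemContainer")
if nameBlock.find('input'):
    print('yup')

# BeautifulSoup cannot type into the page; grab the first input in the block via Selenium
input_element = driver.find_element_by_xpath(
    "//div[contains(@class, 'freebirdFormviewerViewNumberedItemContainer')]//input")
input_element.send_keys("My Name")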

Python Selenium - XPath should be an element

I have this in my code:
link_tag = "//div[#class= 'yuRUbf']//a/#href"
With this code I get this error
The result of the xpath expression "//div[#class= 'yuRUbf']//a/#href" is: [object Attr]. It should be an element.
I don't know any other way to scrape the URL from that div class. How can I fix this?
This code works fine in the scraper Chrome extension but not in Python.
You have to change your XPath to select an element node (as the error message suggests) - and not an attribute node - and, after that, get its attribute. So use
link_tag = "//div[#class= 'yuRUbf']//a"
links = driver.find_elements_by_xpath(link_tag)
and then extract the attribute with
links[0].get_attribute("href")
to get the @href attribute of the first matching element.
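For completeness, a sketch of the same approach using the newer find_element(By...) style, which also appears in the next answer; the search URL below is hypothetical, and the yuRUbf class is specific to Google's current result markup, so it may change:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.google.com/search?q=example")  # hypothetical query

# select element nodes (the <a> tags), then read the attribute off each one
links = driver.find_elements(By.XPATH, "//div[@class='yuRUbf']//a")
urls = [link.get_attribute("href") for link in links]
print(urls)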
This solution might also work; note that the href is on the <a> inside the div, not on the div itself.
link = driver.find_element(By.CSS_SELECTOR, "div.yuRUbf > a")
url = link.get_attribute("href")

Get child element using xpath selenium python

We can get the parent of a Selenium element using XPath, by
par_element = element.find_element_by_xpath('..')
In a similar fashion, how can we get the child of the element? I tried the following, but it did not work:
child_element = element.find_element_by_xpath('/')
child_element = element.find_element_by_xpath('//')
To get to the child of the WebElement you need to set the context to the current element using the dot character, i.e. ., followed by a node test. So effectively your line of code can be either of the following:
child_element = element.find_element_by_xpath('./*')
or
child_element = element.find_element_by_xpath('.//*')
The first selects the first direct child element; the second selects the first descendant at any depth.
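A small sketch putting the parent and child navigation together; the page and the starting id are hypothetical:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")  # hypothetical page
element = driver.find_element_by_id("content")  # hypothetical starting element

parent = element.find_element_by_xpath('..')  # up one level to the parent
first_child = element.find_element_by_xpath('./*')  # first direct child
first_descendant = element.find_element_by_xpath('.//*')  # first descendant at any depth
print(parent.tag_name, first_child.tag_name, first_descendant.tag_name)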

Nonetype error/ No elements printed using beautifulsoup for python

So I'm trying to compare two lists using Python. One contains around 1000 links I fetched from a website; the other contains a few words that might be contained in a link from the first list. If this is the case, I want to get an output. I printed that first list, and it actually works. For example, if the link is "https://steamcdn-a.swap.gg/apps/730/icons/econ/stickers/eslkatowice2015/counterlogic.f49adabd6052a558bff3fe09f5a09e0675737936.png" and my list contains the word "eslkatowice2015", I want to get an output using the print() function. My code looks like this:
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
Bot_Stickers = soup.find_all('img', class_='csi')
for sticker in Bot_Stickers:
    for i in StickerIDs:
        if i in sticker:
            print("found")
driver.close()
Now the problem is that I don't get an output, which should be impossible, because if I manually compare the lists there are clearly words from the keyword list that appear in the links. When trying to fix it I always got a NoneType error. driver.page_source is defined above by some Selenium code I used to access the site and click some JavaScript elements, so everything is loaded. I hope it's more or less clear what I want to achieve.
Edit: the StickerIDs variable is the second list, containing the words I want checked.
A NoneType error means that you might be getting a None somewhere, so it's probably safer to check the results returned by find_all for None.
It's been a while since I used BeautifulSoup, but if I remember correctly, find_all returns a list of BeautifulSoup tags that match the search criteria, not URLs. You need to get the src attribute from each img tag before checking whether it contains a keyword.
Something like this:
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
Bot_Stickers = soup.find_all('img', class_='csi')
if Bot_Stickers and StickerIDs:
    for sticker in Bot_Stickers:
        for i in StickerIDs:
            if i in sticker.get("src", ""):  # get the src attribute of the img tag
                print("found")
else:
    print("Bot_Stickers:", Bot_Stickers)
    print("StickerIDs:", StickerIDs)
driver.close()
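If it helps, the nested loop can also be folded into a small helper that skips tags without a src entirely, which is another way to avoid the NoneType error; a sketch reusing the variable names from the question:
def find_matches(stickers, keywords):
    # return the src URLs that contain any of the given keywords
    return [tag["src"]
            for tag in stickers
            if tag.has_attr("src") and any(k in tag["src"] for k in keywords)]

matched = find_matches(Bot_Stickers, StickerIDs)
print(matched if matched else "no matches")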

How to grab child attribute using Selenium Webdriver with Python?

I am trying to grab an attribute of a child. I inspect my element in Chrome, and see the following code for it:
<div class="input-wrapper">
<ion-label style="text-overflow: ellipsis; display: block; overflow: hidden; white-space: nowrap;" class="label label-md" id="lbl-80" producttypeid="39553">Item 1</ion-label>
The attribute that I need is "producttypeid".
When I right-click on the first row and choose its selector, the following command gives me nothing:
browser.find_element_by_css_selector("cssFromChrome").get_attribute("producttypeid")
When I right-click on the second row and choose its selector, the selector that I get is #lbl-80, which is not good for me since it is dynamic, and I need something static to grab that element.
What else can I try to grab that attribute?
One of the options is an attribute prefix match, if the id will always uniquely start with lbl-:
browser.find_element_by_css_selector("ion-label[id^='lbl-']").get_attribute("producttypeid")
Another option is to go through the stable parent class and select the child:
browser.find_element_by_css_selector(".input-wrapper > ion-label").get_attribute("producttypeid")
As per the HTML, if you want to extract the producttypeid attribute of the child, you can write a function as follows:
def get_producttypeid(myString):
    print(driver.find_element_by_xpath("//div[@class='input-wrapper']/ion-label[@class='label label-md' and starts-with(@id,'lbl-')][.='" + myString + "']").get_attribute("producttypeid"))
Now you can call the get_producttypeid() function with the required item description, e.g. Item 1, to retrieve the producttypeid attribute as follows:
get_producttypeid("Item 1")
How about:
element = browser.find_element_by_xpath("//*[contains(@id, 'lbl') and contains(text(), 'Item 1')]").get_attribute("producttypeid")
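With the newer find_element(By...) style, anchoring on the stable parent class looks like this; a sketch where the page URL is hypothetical and the selector assumes the HTML snippet above:
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("https://example.com/app")  # hypothetical page containing the snippet

# anchor on the stable parent class instead of the dynamic lbl-NN id
label = browser.find_element(By.CSS_SELECTOR, ".input-wrapper > ion-label")
print(label.get_attribute("producttypeid"))  # "39553" for the snippet above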

soup.find("div", id = "tournamentTable"), None returned - python 2.7 - BS 4.5.1

I'm trying to parse the following page: http://www.oddsportal.com/soccer/france/ligue-1-2015-2016/results/
The part I'm interested in is getting the table along with the scores and odds.
The code I have so far:
url = "http://www.oddsportal.com/soccer/france/ligue-1-2015-2016/results/"
req = requests.get(url, timeout = 9)
soup = BeautifulSoup(req.text)
print soup.find("div", id = "tournamentTable"), soup.find("#tournamentTable")
>>> <div id="tournamentTable"></div> None
Very simple, but I'm weirdly stuck at finding the table in the tree. Although I found already-prepared datasets, I would like to know why the printed results are an empty tag and None.
Any ideas?
Thanks
First, this page uses JavaScript to fetch data. If you disable JS in your browser, you will notice that the div tag exists but has nothing in it, so the first find() prints a single empty tag.
Second, # is a CSS selector; you cannot use it in find():
Any argument that's not recognized will be turned into a filter on one
of a tag's attributes.
So the second find() tries to find some tag with #tournamentTable as one of its attribute values; nothing matches, so it returns None.
It looks like the table gets populated with an Ajax call back to the server. That is why, when you print soup.find("div", id="tournamentTable"), you get only the empty tag. When you print soup.find("#tournamentTable"), you get None because that is trying to find an element with the tag name "#tournamentTable". If you want to use CSS selectors, you should use soup.select(), like this: soup.select('#tournamentTable'), or soup.select('div#tournamentTable') if you want to be even more particular.
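Since the table only fills in after the Ajax call completes, one way to get the populated rows is to load the page in Selenium, wait for them, and then parse; a sketch under those assumptions, with an illustrative wait condition:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("http://www.oddsportal.com/soccer/france/ligue-1-2015-2016/results/")

# wait until the Ajax response has added rows to the table
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#tournamentTable tr")))

soup = BeautifulSoup(driver.page_source, "html.parser")
table = soup.select("#tournamentTable")[0]  # select() accepts CSS selectors
print(table.get_text()[:200])
driver.quit()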
