Getting access to an HTML element in a Twitter iframe without a src property - Python

I have been using Python with BeautifulSoup 4 to scrape data from the UN Global Compact website. Some companies there, like this one: https://www.unglobalcompact.org/what-is-gc/participants/2968-Orsted-A-S
have Twitter accounts. I would like to access the names of the Twitter accounts. The problem is that the name is inside an iframe without a src property. I know that the iframe is loaded by a different request than the rest of the website, but I wonder whether it is even possible to access it without a visible src property.

You can use selenium to do this. Here is the full code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = "https://www.unglobalcompact.org/what-is-gc/participants/2968-Orsted-A-S"
driver = webdriver.Chrome()
driver.get(url)

# Wait for the Twitter widget iframe to appear, then switch into it.
iframe = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="twitter-widget-0"]')))
driver.switch_to.frame(iframe)

names = driver.find_elements_by_xpath('//*[@class="TweetAuthor-name Identity-name customisable-highlight"]')
names = [name.text for name in names]
try:
    # Find the most frequently occurring name. The same author has also retweeted
    # tweets made by others, and those retweets contain other people's names;
    # the most frequent name is the name of the account owner.
    name = max(set(names), key=names.count)
    print(name)
except ValueError:
    print("No Twitter Feed Found!")
driver.close()
Output:
Ørsted

Related

Selenium can't find element by id, yet it exists

So basically my script goes to the product-creation page of my shop, then logs in; after the redirect it should enter the product title.
I wanted to use Selenium for that, as this shop system has no API features that are useful to me.
The code that should do the trick is the following:
from selenium import webdriver
import time

browser = webdriver.Firefox()
url3 = 'https://mywebsite.net/admin#/sw/product/create/base'
browser.get(url3)

swu = 'admin'
swp = 'password'
browser.find_element_by_id('sw-field--username').send_keys(swu)
browser.find_element_by_id('sw-field--password').send_keys(swp)
browser.find_element_by_class_name('sw-button__content').click()
time.sleep(5)
browser.find_element_by_id('sw-field--product-name').send_keys('dsdsdsdssds')
My script recognizes the username and password fields by id without any problem, but after the login and redirect it can't find the product title field.
The shop system is Shopware 6.
As you haven't provided any HTML to work with (or a reprex), I can't give an answer specific to your use case. However, the solution is likely to be that you need to use Selenium's expected conditions:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://mywebsite.net/admin#/sw/product/create/base")
...
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
Another notable cause of an element not being visible to Selenium is when it's contained within an iframe. If that's the case, you'll need to switch to the frame with switch_to.frame, as described in the documentation.
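For completeness, a minimal sketch of the iframe case, assuming the same driver setup as above (the frame locator here is a placeholder, not taken from your page):

frame = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "iframe"))  # placeholder locator
)
driver.switch_to.frame(frame)
element = driver.find_element_by_id("myDynamicElement")
driver.switch_to.default_content()  # switch back to the main document afterwards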

Need help using Selenium-Chromedriver and Python3, browser automation

I would like to print the name of every merchant on this page. I tried this:
browser.get('https://www.trovaprezzi.it/televisori-lcd-plasma/prezzi-scheda-prodotto/lg_oled_cx3?sort=prezzo_totale')
Names = browser.find_elements_by_xpath("//span[@class='merchant_name']")
for span in Names:
    print(span.text)
However, when I run the code, it prints a huge empty space without any words.
1. You need to get the alt attribute to get the seller name.
2. You need to use waits.
3. Check your indentation when you print the list values.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
browser.get('https://www.trovaprezzi.it/televisori-lcd-plasma/prezzi-scheda-prodotto/lg_oled_cx3?sort=prezzo_totale')
wait = WebDriverWait(browser, 10)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".merchant_name_and_logo img")))
names = browser.find_elements_by_css_selector(".merchant_name_and_logo img")
for span in names:
    print(span.get_attribute("alt"))
Prints:
Climaconvenienza
Shopdigit
eBay
ePrice
Onlinestore
Shoppyssimo
Prezzo forte
eBay
eBay
eBay
eBay
eBay
eBay
Yeppon
Showprice
Galagross
Sfera Ufficio
Climaconvenienza
Di Lella Shop
Shopdigit
Instead of span.text, please try getting the "value" attribute there:
Names = browser.find_elements_by_xpath("//span[@class='merchant_name']")
for span in Names:
    print(span.get_attribute("value"))
Also, don't forget to add some wait / delay before
Names = browser.find_elements_by_xpath("//span[@class='merchant_name']")
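For example, an explicit wait along these lines (a minimal sketch, assuming the same imports and browser setup as the answer above) should be enough:

wait = WebDriverWait(browser, 10)
# Wait until at least one merchant element is present before collecting them.
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//span[@class='merchant_name']")))
Names = browser.find_elements_by_xpath("//span[@class='merchant_name']")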

Selenium wrong selectors leading to no output

I'm trying to scrape this website
Best Western Mornington Hotel
for the names of the hotel rooms and the price of each room. I'm using Selenium to try and scrape this data, but I keep getting no output, which I assume is because I'm using the wrong selectors/XPath. Is there any method of identifying the correct XPath/div class/selector? I feel like I have selected the correct ones, but there is no output.
from re import sub
from decimal import Decimal
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

seleniumurl = 'https://www.bestwestern.co.uk/hotels/best-western-mornington-hotel-london-hyde-park-83187/in-2021-06-03/out-2021-06-05/adults-1/children-0/rooms-1'
driver = webdriver.Chrome(executable_path='C:\\Users\\Conor\\Desktop\\diss\\chromedriver.exe')
driver.get(seleniumurl)
time.sleep(5)

working = driver.find_elements_by_class_name('room-type-block')
for work in working:
    name = work.find_elements_by_xpath('.//div/h4').string
    price = work.find_elements_by_xpath('.//div[2]/div[2]/div/div[1]/div/div[3]/div/div[1]/div/div[2]/div[1]/div[2]/div[1]/div[1]/span[2]').string
    print(name, price)
I only work with Selenium in Java, but from what I can see you're trying to get a collection of WebElements and invoke toString() on them...
Shouldn't that be find_element_by_xpath, to get just one WebElement, and then call .text instead of .string?
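In Python that would look roughly like this (a sketch reusing the selectors from the question):

working = driver.find_elements_by_class_name('room-type-block')
for work in working:
    # find_element_by_xpath (singular) returns a single WebElement; .text gives its visible text
    name = work.find_element_by_xpath('.//div/h4').text
    print(name)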
Marek is right: use .text instead of .string, or use .get_attribute("innerHTML"). I also think your XPath may be wrong, unless I'm looking at the wrong page. Here are some XPaths from the page you linked.
# This will get all the room type sections.
roomTypes = driver.find_elements_by_xpath("//div[contains(@class,'room-type-box__content')]")
# This will get the room type titles.
roomTitles = driver.find_elements_by_xpath("//div[contains(@class,'room-type-title')]/h3")
# Print out room type titles.
for r in roomTitles:
    print(r.text)
Please use the selector div#rr_wrp div.room-type-block and the visibility_of_all_elements_located method to get the list of category divs.
Within each of those elements, you can find the title with the XPath .//h2[@class="room-type--title"], the sub-category with .//strong[@class="trimmedTitle rt-item--title"], and the price with .//div[@class="rt-rate-right--row group"]//span[@data-bind="text: priceText"].
And please try the following code, which uses zip to iterate over the parallel lists:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path='C:\\Users\\Conor\\Desktop\\diss\\chromedriver.exe')
driver.get('https://www.bestwestern.co.uk/hotels/best-western-mornington-hotel-london-hyde-park-83187/in-2021-06-03/out-2021-06-05/adults-1/children-0/rooms-1')
wait = WebDriverWait(driver, 20)
elements = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'div#rr_wrp div.room-type-block')))
for element in elements:
    for room_title in element.find_elements_by_xpath('.//h2[@class="room-type--title"]'):
        print("Main Title ==>> " + room_title.text)
    for room_type, room_price in zip(element.find_elements_by_xpath('.//strong[@class="trimmedTitle rt-item--title"]'), element.find_elements_by_xpath('.//div[@class="rt-rate-right--row group"]//span[@data-bind="text: priceText"]')):
        print(room_type.text + " " + room_price.text)
driver.quit()

How can I scrape information from a website that has all the tags inside a <pre> (preformatted) tag?

I am creating a Python crawler that scrapes information from the Interpol website. I was able to scrape information from the first page, like the names of people, date of birth, nationality, etc. To scrape information from the second page, I first got its URL from the tag and navigated to that link with my program. When I went to the URL, I found that all the information (meaning all the tags) was inside a <pre> tag section. I am confused about why that is the case. So my question is: how can I get the information from inside the <pre> tag section, where all the other tags are? I am trying to get the names of the people, their birthdays, their corresponding links, etc. I am using Selenium, by the way. I will put the URL of the website below, along with the URL of the second page that I found in the tag. I hope that helps you understand what I am talking about.
Main Website:
https://www.interpol.int/en/How-we-work/Notices/View-Red-Notices
The second-page link I found in the tag:
https://ws-public.interpol.int/notices/v1/red?resultPerPage=20&page=2
The code I have so far is posted below:
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.interpol.int/en/How-we-work/Notices/View-Red-Notices'
driver = webdriver.Chrome(executable_path="c:\\SeliniumWebDrivers\\chromedriver.exe")
driver.get(url)  # go to the website

url = []          # to collect all the URLs of the people
names = []        # to collect the names of the people
age = []          # to collect the age of the people
nationality = []  # to collect the nationality of the people
newwindow = []    # to collect all the next-page links
y = 0
g = 1

try:
    driver.get(driver.current_url)
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, 'noticesResultsItemList'))
    )
    links = main.find_elements_by_tag_name("a")
    years = main.find_elements_by_class_name("age")
    borns = main.find_elements_by_class_name("nationalities")
    for link in links:
        newurl = link.get_attribute('href')
        url.append(newurl)
        names.append(link.text)  # adding the names
        y += 1
    for year in years:
        age.append(year.text)  # adding the age to the list
    for nation in borns:
        nationality.append(nation.text)  # adding the nationality to the list
    driver.get(driver.current_url)
    driver.refresh()
    next = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.ID, 'paginationPanel'))
    )
    pages = next.find_elements_by_tag_name("a")
    for page in pages:
        newlink = page.get_attribute('href')
        newwindow.append(newlink)
    # to get to the next page
    print(newwindow[2])
    driver.get(newwindow[2])
You can use Selenium to click the next-page button instead of getting the URL. This is just a simple example; you may need to use a loop to extract the data and then click through to the next page. I've used the variable browser instead of main, written a function, and used a for loop to get the data from each page.
from selenium import webdriver
import time
from selenium.common.exceptions import NoSuchElementException, ElementNotInteractableException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

browser = webdriver.Chrome('/home/cam/Downloads/chromedriver')
url = 'https://www.interpol.int/en/How-we-work/Notices/View-Red-Notices'
browser.get(url)

def get_data():
    links = browser.find_elements_by_tag_name("a")
    years = browser.find_elements_by_class_name("age")
    borns = browser.find_elements_by_class_name("nationalities")

time.sleep(5)
try:
    # dismiss the cookie banner if it is shown
    browser.find_element_by_xpath('//*[@id="privacy-cookie-banner__privacy-accept"]').click()
except ElementNotInteractableException:
    pass

for i in range(1, 9):
    print(i)
    get_data()
    print('//*[@id="paginationPanel"]/div/div/ul/li[' + str(i + 2) + ']/a')
    b = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="paginationPanel"]/div/div/ul/li[' + str(i + 2) + ']/a')))
    b.click()
    time.sleep(10)
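As a side note, the second-page URL from the question (https://ws-public.interpol.int/notices/v1/red?resultPerPage=20&page=2) looks like a JSON API endpoint, which would explain why the browser renders its content inside a <pre> tag. If that is the case, you could skip the browser entirely and query it directly; a rough sketch (the exact response field names are an assumption and may differ):

import requests

# Hypothetical sketch: fetch one page of red notices directly from the JSON endpoint.
resp = requests.get("https://ws-public.interpol.int/notices/v1/red",
                    params={"resultPerPage": 20, "page": 2})
data = resp.json()

# The notice list is assumed to live under _embedded -> notices; adjust if the schema differs.
for notice in data.get("_embedded", {}).get("notices", []):
    print(notice.get("name"), notice.get("forename"), notice.get("date_of_birth"))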

Having trouble finding an element on a page. Selenium, python

Scraping this page here. I am trying to get the mail icon next to the author names. I have tried many things but cannot seem to click/find it. Some help, please?
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path='C:/chromedriver.exe')
search_term = input("Enter your search term :")
url = f'https://www.sciencedirect.com/search?qs={search_term}&show=100'
driver.get(url)
driver.maximize_window()

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div/div/div/button/span'))).click()
divs = driver.find_elements_by_class_name('result-item-content')

links = []
for div in divs:
    link = div.find_element_by_tag_name('a')
    links.append(link)
links[0].click()

div = driver.find_element_by_id('author-group')
print(div.text[0:])
name_links = div.find_elements_by_tag_name('a')

spans = []
for name in name_links:
    span = name.find_element_by_tag_name('span')
    spans.append(span)

for span in spans:
    mail = span.find_element_by_class_name('icon icon-envelope')
    mail.click()
    break
It seems that not every author has that icon, but, even taking that into account, you have a couple of mistakes in the current approach:
you are looking inside each span element of the author group - you don't have to do that
find_element_by_class_name would work with a single class value, not multiple (class is a multi-valued attribute with space being a delimiter between values)
Here is how would I go about this:
from selenium.common.exceptions import NoSuchElementException

author_group = driver.find_element_by_id('author-group')
for author in author_group.find_elements_by_css_selector("a.author"):
    try:
        given_name = author.find_element_by_css_selector(".given-name").text
        surname = author.find_element_by_css_selector(".surname").text
    except NoSuchElementException:
        print("Could not extract first or last name")
        continue

    try:
        mail_icon = author.find_element_by_css_selector(".icon-envelope")
        mail_icon_present = True
    except NoSuchElementException:
        mail_icon_present = False

    print(f"Author {given_name} {surname}. Mail icon present: {mail_icon_present}")
Notes:
note how we iterate over the authors, container by container, and then look for specific properties inside each one
note how we check for the presence of the mail icon in a forgiving EAFP manner
the . before a class value in a CSS selector is special syntax to match an element by a single class value
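If you prefer a look-before-you-leap check over try/except, an equivalent sketch is to use find_elements (plural) and test the length of the result:

# find_elements returns an empty list instead of raising, so the length works as a presence check
mail_icon_present = len(author.find_elements_by_css_selector(".icon-envelope")) > 0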