Google search company names and websites in Selenium Python - python

As a starting videographer I am trying to make a list of companies in a specific area.
So far I was able to get to the results with this code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

search_input = "hovenier Ridderkerk"
PATH = "**location chromedriver**"
driver = webdriver.Chrome(PATH)
driver.get("https://www.google.com")
driver.find_element_by_xpath('//*[@id="L2AGLb"]').click()  # accept cookie consent
time.sleep(0.5)
search = driver.find_element_by_xpath('/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input')
time.sleep(0.5)
search.send_keys(search_input)
time.sleep(0.5)
search.send_keys(Keys.RETURN)
time.sleep(0.5)
driver.find_element_by_xpath('//*[@id="rso"]/div[2]/div/div/div/div/div[6]/div/g-more-link/a/div').click()  # "show all" link
time.sleep(0.5)
However, after this I am a bit stuck. It appears that the names and websites of these companies are spread across a lot of classes, and I can't figure out which element I should iterate over to get all the company names and websites into a list.
Edit: I added the manual steps that I would like it to do:
- go to google.com
- search for 'hoveniers Ridderkerk' (garden companies)
- show all results
- make a list with all the company names and website addresses
Can anyone guide me in the right direction? Perhaps also explain how you know to use that specific element, so that next time I can do it myself?
Thank you!
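A general pattern for this kind of extraction: locate one container element per search result, then pull the name and the link out of each container, instead of hunting for a single selector that matches everything at once. In Selenium that means driver.find_elements(...) on a container locator, then .text and .get_attribute('href') inside each hit. Since Google's markup changes often, the sketch below shows only the pattern, on a simplified well-formed snippet with made-up class names, using just the standard library:

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the results markup; Google's real structure
# differs and changes often, so treat these class names as placeholders.
sample = """
<div id="results">
  <div class="result">
    <h3>Hovenier Jansen</h3>
    <a href="https://hovenier-jansen.example">website</a>
  </div>
  <div class="result">
    <h3>Groenbedrijf De Vries</h3>
    <a href="https://devries-groen.example">website</a>
  </div>
</div>
"""

root = ET.fromstring(sample)
companies = []
# One container per result: iterate the containers, then drill into each.
for result in root.findall('.//div[@class="result"]'):
    name = result.find('h3').text
    site = result.find('a').get('href')
    companies.append((name, site))

print(companies)
```

To find the right container on the live page, right-click a company name, choose Inspect, and walk up to the nearest ancestor element that wraps exactly one result; that ancestor's class or tag is what you feed to find_elements.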

Related

How to return the link to the first Youtube video after a search in Selenium Python?

I am trying to create a function that takes a search term from a list and adds the first result's link to a list of links.
formatted_link = link_to_vid.replace(' ', '+')
driver.get('https://www.youtube.com/results?search_query={}'.format(str(formatted_link)))
Then it extracts the first link that YouTube returns. For example, I search 'Never Gonna Give You Up' and it adds the link of the first result to a list.
Then I might want to do the same for ['Never Gonna Give You Up', 'Shrek'].
How would I do this without actually clicking on the link?
I hope I got your question right; here is an example:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.implicitly_wait(5)
# Let's create a list of terms to search for and an empty list for links
search_terms = ['Never Gonna Give You Up', 'Shrek']
links = []
for term in search_terms:
    # Open the YouTube results page for each search term
    driver.get('https://www.youtube.com/results?search_query={}'.format(term))
    # Find the first web element with a video thumbnail on the page
    link_webelement = driver.find_element(By.CSS_SELECTOR, 'div#contents ytd-item-section-renderer>div#contents a#thumbnail')
    # Grab the web element's href (this is your link) and store it in the list of links:
    links += [link_webelement.get_attribute('href')]
print(links)
Hope this helps. Keep in mind that for scraping data like this you don't have to open web pages with Selenium; libraries like Beautiful Soup can help you do this kind of thing faster.
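One detail worth adding: search terms containing characters like & or ? will break a URL built with plain string formatting or a manual .replace(' ', '+'). The standard library's urllib.parse.quote_plus handles both the spaces and the special characters; a small sketch:

```python
from urllib.parse import quote_plus

search_terms = ['Never Gonna Give You Up', 'Shrek & Donkey']
# quote_plus turns spaces into '+' and percent-encodes characters
# such as '&' that would otherwise terminate the query string.
urls = ['https://www.youtube.com/results?search_query={}'.format(quote_plus(term))
        for term in search_terms]
print(urls[0])  # https://www.youtube.com/results?search_query=Never+Gonna+Give+You+Up
```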

Selenium not finding list of sections with classes?

I am attempting to get a list of games on
https://www.xbox.com/en-US/live/gold#gameswithgold
According to Firefox's dev console, it seems that I found the correct class: https://i.imgur.com/M6EpVDg.png
In fact, since there are 3 games, I am supposed to get a list of 3 objects with this code: https://pastebin.com/raw/PEDifvdX (the wait is so Selenium can load the page)
But in fact, Selenium says it does not exist: https://i.imgur.com/DqsIdk9.png
I do not get what I am doing wrong. I even tried css selectors like this
listOfGames = driver.find_element_by_css_selector("section.m-product-placement-item f-size-medium context-game gameDiv")
Still nothing. What am I doing wrong?
You are trying to get three different games, so you need a different element path for each one, or you can use a loop like this:
i = 1
while i < 4:
    link = f"//*[@id='ContentBlockList_11']/div[2]/section[{i}]/a/div/h3"
    listGames = str(driver.find_element_by_xpath(link).text)
    print(listGames)
    i += 1
You can use this kind of loop wherever there is only a slight difference in the XPath, CSS selector, or class between elements: it iterates over the web elements one by one and collects the list of games.
Since you are trying to get the names, note the .text, which returns just the name and nothing else.
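The counter loop above is the right idea; in Python it reads a little better as a for loop over range, and you can build the locator strings up front to see exactly what gets passed to find_element_by_xpath. This sketch only generates the XPath strings (same hard-coded path as in the answer), no browser needed:

```python
# One XPath per game, substituting the 1-based section index.
xpaths = [f"//*[@id='ContentBlockList_11']/div[2]/section[{i}]/a/div/h3"
          for i in range(1, 4)]
for xp in xpaths:
    print(xp)
```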
Another option is a selector that isn't looped over and changed, one that's less dependent on the page structure and a little easier to read:
//a[starts-with(@data-loc-link,'keyLinknowgame')]//h3
Here's sample code:
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
driver = webdriver.Chrome()
url = "https://www.xbox.com/en-US/live/gold#gameswithgold"
driver.get(url)
driver.implicitly_wait(10)
listOfGames = driver.find_elements_by_xpath("//a[starts-with(@data-loc-link,'keyLinknowgame')]//h3")
for game in listOfGames:
    try:
        print(game.text)
    except StaleElementReferenceException:
        pass
If you're after more than just the title, remove the //h3 selection:
//a[starts-with(@data-loc-link,'keyLinknowgame')]
And add whatever additional Xpath you want to narrow things down to the content/elements that you're after.

How to loop through these pages and scrape information from each one?

I am new to programming and need some help with my web-crawler.
At the moment, I have my code opening up every web-page in the list. However, I wish to extract information from each one it loads. This is what I have.
from selenium import webdriver
import csv

driver = webdriver.Firefox()
driver.get("https://www.betexplorer.com/baseball/usa/mlb-2018/results/?stage=KvfZSOKj&month=all")
links_code: list = driver.find_elements_by_xpath('//a[@class="in-match"]')
first_two: list = links_code[0:2]
first_two_links: list = []
for i in first_two:
    link = i.get_attribute("href")
    first_two_links.append(link)
odds: list = []
for i in first_two_links:
    driver.get(i)
    o = driver.find_element_by_xpath('//span[@class="table-main__detail- odds--hasarchive"]')
    odds.append(o)
**Error:** NoSuchElementException: Message: Unable to locate element:
//span[@class="table-main__detail- odds--hasarchive"]
I am just looking to scrape the first two links at the moment so that it is easier to manage. However, I can't seem to figure out a way around this error.
It seems to me that the error indicates it is searching for the XPath on the home page, rather than on the link it follows.
Any help is appreciated.
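One thing worth checking first: the error message shows a space inside the class name (table-main__detail- odds--hasarchive), which suggests whitespace crept into the locator when the long string was wrapped across lines. If a locator is too long for one line, Python's implicit concatenation of adjacent string literals splits it without adding a space. A minimal sketch of just the string fix; if the element genuinely is not on the detail page, you would still need a wait or a different locator:

```python
# Adjacent string literals inside parentheses are concatenated with
# no whitespace added, so the class name stays intact.
xpath = ('//span[@class="table-main__detail-'
         'odds--hasarchive"]')
print(xpath)  # //span[@class="table-main__detail-odds--hasarchive"]
```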

Web scraping in Selenium in Python - find elements via xpath or id return empty list

So I am trying to scrape a list of email addresses from my User Explorer page in Google Analytics.
I obtained the XPath by inspecting the element.
The item's XPath is //*[@id="ID-explorer-table-dataTable-key-0-0"]/div
But no matter whether I do:
driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
or
driver.find_elements_by_xpath('//*[@id="ID-reportContainer"]')
or
driver.find_elements_by_id("ID-explorer-table-dataTable-key-0-0")
it returns an empty list.
Can anyone tell me where I have gone wrong?
I also tried using:
html = driver.page_source
but of course I couldn't find the list of emails there either.
I am also thinking, if this doesn't work: is there a way I can automate Ctrl+A, copy all the displayed text into a string in Python, and then use re.findall() to find the email addresses?
email = driver.find_element_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')
print("email", email.get_attribute("innerHTML"))
Thanks for the help of @Guy!
It was something related to iframes. This detects which frame the item I need belongs to:
iframelist = driver.find_elements_by_tag_name('iframe')
for i in range(len(iframelist)):
    driver.switch_to.frame(iframelist[i])
    if len(driver.find_elements_by_xpath('//*[@id="ID-explorer-table-dataTable-key-0-0"]/div')) != 0:
        print('it is item {}'.format(i))
        break
    else:
        driver.switch_to.default_content()
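As for the fallback idea from the question (select all displayed text, then regex out the addresses): once the correct iframe is active, driver.find_element_by_tag_name('body').text gives you the visible text, and re.findall can pull the addresses from it. A minimal sketch of the regex step on a hypothetical text blob; the pattern is a simplification that covers common addresses, not every RFC-valid one:

```python
import re

# Hypothetical stand-in for the visible text Selenium would return.
page_text = "Users: alice@example.com, bob.smith@example.org, and one anonymous visitor."
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.-]+', page_text)
print(emails)  # ['alice@example.com', 'bob.smith@example.org']
```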

Selenium Visible, Non Visible Elements (Drop Down)

I am trying to select all elements of a dropdown.
The site I am testing on is: http://jenner.com/people
The dropdown(checkbox list) I am trying to access is the "locations" list.
I am using Python. I am getting the following error: Message: u'Element is not currently visible and so may not be interacted with'
The code I am using is:
from selenium import webdriver
url = "http://jenner.com/people"
driver = webdriver.Firefox()
driver.get(url)
page = driver.page_source
element = driver.find_element_by_xpath("//div[@class='filter offices']")
elements = element.find_elements_by_tag_name("input")
counter = 0
while counter <= len(elements) - 1:
    driver.get(url)
    element = driver.find_element_by_xpath("//div[@class='filter offices']")
    elements1 = element.find_elements_by_tag_name("input")
    elements1[counter].click()
    counter = counter + 1
I have tried a few variations, including clicking the initial element before clicking on the dropdown options, but that didn't work. Any ideas on how to make elements visible in Selenium? I have spent the last few hours searching for an answer online. I have seen a few posts about moving the mouse in Selenium, but haven't found a solution that works for me yet.
Thanks a lot.
The input checkboxes are not visible in the initial state; they only become visible after a click on the "filter offices" option. Also, the class name changes from "filter offices" to "filter offices open", as you can observe in Firebug. The code below works for me, but it is in Java; you should be able to figure out the Python equivalent, as it contains only really basic calls.
driver.get("http://jenner.com/people");
driver.findElement(By.xpath("//div[@class='filter offices']/div")).click();
Thread.sleep(2000L);
WebElement element = driver.findElement(By.xpath("//div[@class='filter offices open']"));
Thread.sleep(2000L);
List<WebElement> elements = element.findElements(By.tagName("input"));
for (int i = 0; i <= elements.size() - 1; i++) {
    elements.get(i).click();
    Thread.sleep(2000L);
    elements = element.findElements(By.tagName("input"));
}
I know this is an old question, but I came across it when looking for other information. I don't know if you were doing QA on the site to see if the proper cities were showing in the drop down, or if you were actually interacting with the site to get the list of people who should be at each location. (Side note: selecting a location then un-selecting it returns 0 results if you don't reset the filter - possibly not desired behavior.)
If you were trying to get a list of users at each location on this site, I would think it easier to not use Selenium. Here is a pretty simple solution to pull the people from the first city "Chicago." Of course, you could make a list of the cities that you are supposed to look for and sub them into the "data" variable by looping through the list.
import requests
from bs4 import BeautifulSoup
url = 'http://jenner.com/people/search'
data = 'utf8=%E2%9C%93&authenticity_token=%2BayQ8%2FyDPAtNNlHRn15Fi9w9OgXS12eNe8RZ8saTLmU%3D&search_scope=full_name' \
'&search%5Bfull_name%5D=&search%5Boffices%5D%5B%5D=Chicago'
r = requests.post(url, data=data)
soup = BeautifulSoup(r.content, 'html.parser')
people_results = soup.find_all('div', attrs={'class': 'name'})
for p in people_results:
    print(p.text)
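A small refinement to the POST above: the hand-assembled data string is easy to get wrong, and urllib.parse.urlencode can build it from a dict, handling the percent-encoding of the brackets in search[offices][] and of the UTF-8 check mark for you. The authenticity_token is omitted here on purpose; it is session-specific and would have to be scraped from the search form first. A sketch:

```python
from urllib.parse import urlencode

# authenticity_token deliberately omitted: it is session-specific
# and must be scraped from the form before POSTing.
params = {
    'utf8': '\u2713',              # the check mark Rails forms submit
    'search_scope': 'full_name',
    'search[full_name]': '',
    'search[offices][]': 'Chicago',
}
data = urlencode(params)
print(data)
```

You could also pass the dict directly as data=params to requests.post, which performs the same form encoding.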
