I am very new to Python and I am trying to get all the store locations from the Muthoot Finance website. The following is the code I wrote, but I am not getting any output. Please let me know what is wrong and what I need to correct.
As I understand it, the code never gets the search button clicked, so nothing happens after that. But how do I do that?
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
import pandas as pd
driver = webdriver.Chrome(executable_path=r"D:\Chromedriverpath\chromedriver_win32\chromedriver.exe")
driver.get("https://www.muthootfinance.com/branch-locator")
#Saving this element in a variable
drp=Select(driver.find_element_by_id("statelist"))
slist=drp.options
for ele in slist:
    table = driver.select("table.table")
    columns = table.find("thead").find_all("th")
    column_names = [c.string for c in columns]
    table_rows = table.find("tbody").find_all("tr")
    l = []
    for tr in table_rows:
        td = tr.find_all('td')
        row = [str(tr.get_text()).strip() for tr in td]
        l.append(row)
    df = pd.DataFrame(l, columns=column_names)
    df.head()
I think this will work for you now. I copied your code and it seems to work!
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
import pandas as pd
driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")
driver.get("https://www.muthootfinance.com/branch-locator")
# Saving this element in a variable
html_list = driver.find_element_by_id("state_branch_list")
items = html_list.find_elements_by_tag_name("li")
places = []
for item in items:
    places.append(item.text)
    print(item.text)
df = pd.DataFrame(places)  # build the frame from all locations, not just the last one
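If the end goal is still the original one, the full branch table for every state, something like the sketch below might work. It assumes the statelist dropdown and the table.table element from the question are still present on the page; the search-button selector is a hypothetical placeholder, so inspect the page for the real one:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
driver.get("https://www.muthootfinance.com/branch-locator")
wait = WebDriverWait(driver, 20)

state_dropdown = Select(wait.until(EC.presence_of_element_located((By.ID, "statelist"))))
states = [opt.text for opt in state_dropdown.options]

frames = []
for state in states:
    # Re-find the dropdown each pass in case the page re-renders it
    Select(driver.find_element(By.ID, "statelist")).select_by_visible_text(state)
    # Hypothetical search-button locator -- replace with the real one from the page
    driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
    table_html = wait.until(EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "table.table"))).get_attribute("outerHTML")
    df = pd.read_html(table_html)[0]
    df["State"] = state
    frames.append(df)

print(pd.concat(frames, ignore_index=True).head())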
The goal: I need to select an option in a dropdown menu; when a list gets populated below, I need to click on each entry iteratively and scrape all the given data. Thankfully the classes have proper ID names, so it should be doable, but I am facing some issues, described below.
You can understand it better if you visit the website here: www.psx.com.pk/psx/resources-and-tools/listings/listed-companies
Messy code:
chromedriver = "chromedriver.exe"
driver = webdriver.Chrome(chromedriver)
driver.get("https://www.psx.com.pk/psx/resources-and-tools/listings/listed-companies")
select = Select(driver.find_element_by_id("sector"))
for opt in select.options:  # this will loop through all the dropdown options from the site
    opt.click()  # in source code table class gets populated here
    table = driver.find_elements_by_class_name("addressbook")
    for index in range(len(table)):
        # if index % 2 == 0:
        elem = table[index].text
        print(elem)
        elem.click()
        data = driver.find_elements_by_class_name("addressbookdata")
        print(data)
If you run this code on your end, the output is very erratic. If everything worked correctly I would get index/company names in my table.text variable, so I thought a quick and dirty solution for getting just the IDs would be to % 2 the index instead of populating a DataFrame first and then dropping the duplicates. Once I've gotten all the IDs, I need to click on each of them, then extract and append the data from the addressbookdata class into a DataFrame. I don't think there's any logical problem in my code right now, but I can't make this work. It's my first time using Selenium as well; I am much more comfortable with BeautifulSoup.
I selected the dropdown option by value and pulled the table data with Selenium and pandas:
import pandas as pd
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome(ChromeDriverManager().install())
url = 'https://www.psx.com.pk/psx/resources-and-tools/listings/listed-companies'
driver.get(url)
driver.maximize_window()
wait = WebDriverWait(driver, 30)

# select the sector from the dropdown
Select(wait.until(EC.visibility_of_element_located((By.XPATH, "//select[@id='sector']")))).select_by_value("0801")
dptable = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@class="table-responsive"]'))).get_attribute("outerHTML")
df = pd.read_html(dptable)[0]  # read_html returns a list of tables; take the first
print(df)
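If you need every sector rather than just "0801", a possible extension (an untested sketch reusing the same locators) is to loop over the option values and concatenate the tables:
sector_options = Select(wait.until(EC.visibility_of_element_located(
    (By.XPATH, "//select[@id='sector']")))).options
values = [opt.get_attribute("value") for opt in sector_options if opt.get_attribute("value")]

frames = []
for value in values:
    # Re-find the <select> each pass in case the page re-renders it
    Select(driver.find_element(By.XPATH, "//select[@id='sector']")).select_by_value(value)
    html = wait.until(EC.visibility_of_element_located(
        (By.XPATH, '//*[@class="table-responsive"]'))).get_attribute("outerHTML")
    frames.append(pd.read_html(html)[0])

all_sectors = pd.concat(frames, ignore_index=True)
print(all_sectors)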
I hope everyone is having a good day. I am trying to extract values from a website and have them print out as pairs, but I can't figure out how to do that. I have all the values printing as expected; I just can't figure out how to have them print one after another. I know this is a very basic question, but I can't figure it out. Any advice or information is appreciated. Thank you!
import time
import webbrowser
from os import O_SEQUENTIAL, link
import chromedriver_autoinstaller
from selenium import webdriver as wd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
webdriver = wd.Chrome(executable_path= r"C:\Users\Stephanie\anaconda3\pkgs\python-chromedriver-binary-98.0.4758.48.0-py39hcbf5309_0\Lib\site-packages\chromedriver_binary\chromedriver.exe")
webdriver.implicitly_wait(1)
webdriver.maximize_window()
webdriver.get("https://pcpartpicker.com/user/stephwaters/saved/#view=HgH2xr")
time.sleep(2)
partname = webdriver.find_elements(By.CLASS_NAME, 'td__component')
for part in partname:
    print(part.text + ': ')
prices = webdriver.find_elements(By.CLASS_NAME, 'td__price')
for price in prices:
    print(price.text)
This is the output: all of the part names print first (each followed by a colon), and then all of the prices print afterwards.
I would like it to print:
Case: $168.99
Power Supply: $182.00
and so on.
Instead of getting the part names and prices separately, you can iterate over a list of products, extracting each one's name and price.
Also, it's recommended to use Expected Conditions explicit waits, not hardcoded pauses.
Your code could be something like this:
import time
from selenium import webdriver as wd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

webdriver = wd.Chrome(executable_path=r"C:\Users\Stephanie\anaconda3\pkgs\python-chromedriver-binary-98.0.4758.48.0-py39hcbf5309_0\Lib\site-packages\chromedriver_binary\chromedriver.exe")
wait = WebDriverWait(webdriver, 20)
webdriver.maximize_window()
webdriver.get("https://pcpartpicker.com/user/stephwaters/saved/#view=HgH2xr")

wait.until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='tr__product']")))
time.sleep(0.3)  # short delay added to make sure not only the first product got loaded

products = webdriver.find_elements(By.XPATH, '//tr[@class="tr__product"]')
for product in products:
    name = product.find_element(By.XPATH, './/td[@class="td__component"]')
    price = product.find_element(By.XPATH, './/td[@class="td__price"]//a')
    print(name.text + ': ' + price.text)
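And if you later want the name/price pairs in structured form rather than printed, the same loop can collect rows for a DataFrame (the pandas step here is my addition, not part of the answer above):
import pandas as pd

rows = []
for product in products:
    name = product.find_element(By.XPATH, './/td[@class="td__component"]').text
    price = product.find_element(By.XPATH, './/td[@class="td__price"]//a').text
    rows.append({"Component": name, "Price": price})

df = pd.DataFrame(rows)
print(df)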
I am learning how to use Selenium with Python and have been toying around with a few different things. I keep running into an issue where I cannot locate any elements by class. I am able to locate and print data by XPath, but I cannot locate the classes.
The goal of this script is to gather a number from the table on the website and the current time, then append the items to a CSV file.
Site: https://bitinfocharts.com/top-100-richest-dogecoin-addresses.html
Any advice or guidance would be greatly appreciated as I am new to python. Thank you.
Code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
import time
import pandas as pd
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import csv
from datetime import datetime
from selenium.webdriver.common.by import By
#Open ChromeDriver
PATH = ('/Users/brandon/Desktop/chromedriver')
driver = webdriver.Chrome(PATH)
driver.get("https://bitinfocharts.com/top-100-richest-dogecoin-addresses.html")
driver.implicitly_wait(10)
driver.maximize_window()
#Creates the Time
now = datetime.now()
current_time = now.strftime("%m/%d/%Y, %H:%M:%S")
#####
#Identify the section of page
page = driver.find_element(By.CLASS_NAME, 'table table-condensed bb')
time.sleep(3)
#Gather the data
for page in pages():
    num_of_wallets = page.find_element(
        By.XPATH, "//html/body/div[5]/table[1]/tbody/tr[10]/td[2]").text
    table_dict = {"Time": current_time,
                  "Wallets": num_of_wallets}
file = open('dogedata.csv', 'a')
try:
    file.write(f'{current_time},{num_of_wallets}')
finally:
    file.close()
table table-condensed bb is actually 3 class names, and By.CLASS_NAME accepts only a single class name.
So the best way to locate an element by multiple class names is to use a CSS selector or XPath, like:
page = driver.find_element(By.CSS_SELECTOR,'.table.table-condensed.bb')
or
page = driver.find_element(By.XPATH, "//*[@class='table table-condensed bb']")
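With that locator fixed, the CSV-appending part of the question could look something like this sketch (the row XPath is copied from the question and assumed still valid):
import csv
from datetime import datetime

table = driver.find_element(By.CSS_SELECTOR, '.table.table-condensed.bb')
num_of_wallets = table.find_element(
    By.XPATH, "//html/body/div[5]/table[1]/tbody/tr[10]/td[2]").text
current_time = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")

# 'a' appends a new row on every run; newline='' avoids blank lines on Windows
with open('dogedata.csv', 'a', newline='') as f:
    csv.writer(f).writerow([current_time, num_of_wallets])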
I am trying to scrape a website to get some company information. If the search result is there and matches the search term, I would like to continue; if not, I would like to move on to the next company.
Here is the code:
import pandas as pd
import numpy as np
from tqdm import notebook
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import time, sleep
import datetime
import sys
url = "https://register.fca.org.uk/s/"
search_box_path = '//*[@id="search-form-search-section-main-input"]'
firm_checkbox_path = '//*[@id="search-form-search-options-radio-group"]/span[1]/label/span[1]'
searchterm = 'XXX Company'
driver = webdriver.Chrome(executable_path=r'C:\Users\XXXX\Chrome Webdriver\chromedriver.exe')
driver.get(url)
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, firm_checkbox_path)))
driver.find_element_by_xpath(firm_checkbox_path).click()
driver.find_element_by_xpath(search_box_path).send_keys(searchterm)
driver.find_element_by_xpath(search_box_path).send_keys(Keys.RETURN)
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="maincontent"]/div[4]/div/div[2]/h1/span[2]')))
element = driver.find_element_by_xpath('//*[@id="maincontent"]/div[4]/div/div[2]/h1/span[2]')
check_result()
The issue is with the check_result function. In this function I am just comparing the searchterm against the element.text of the element from the website.
def check_result():
    name = driver.find_element_by_xpath('//*[@id="maincontent"]/div[4]/div/div[2]/h1/span[2]')
    return name.text == searchterm
This logic works fine on its own, but together with the rest of the code it gives me False, even though I know that the text I provide is equal to the element's text.
Any help is much appreciated.
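One common cause of a comparison like this failing is extra whitespace in the rendered text, or reading the element before the result page has fully loaded. A more defensive version of the check, as a sketch (same XPath as in the question):
def check_result():
    name = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located(
            (By.XPATH, '//*[@id="maincontent"]/div[4]/div/div[2]/h1/span[2]')))
    # Strip whitespace and normalize case before comparing; the page may
    # render "XXX Company " with padding even when the match is correct.
    return name.text.strip().casefold() == searchterm.strip().casefold()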
I am trying to put together a web scraper to get locations by a zip code entered by the user. Right now I am able to navigate to the website, but I am not able to click on the drop-down button that lets you enter a zip code. Here is what I have so far:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time
import pandas as pd
from selenium.webdriver.common.by import By
zipcode = input("What zip code would you like to search? ")
out_table = 'Oreilly_autp_parts_addresses_{}.csv'.format(zipcode)
#Using Selenium to navigate to website, search zipcode and get html data
driver = webdriver.Chrome()  # requires chromedriver.exe on PATH
driver.get('https://www.oreillyauto.com/')
time.sleep(2)
driver.maximize_window()
el = driver.find_element_by_class_name("site-store")
time.sleep(2)
driver.execute_script("arguments[0].setAttribute('class','site-store site-nav_item dispatcher-trigger--active')", el)
It seems to be clicking on the correct element, but the drop-down that is supposed to show up isn't there.
Any help is much appreciated!
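For what it's worth, a real .click() through Selenium (after waiting for clickability) usually fires the site's JavaScript handlers, which setAttribute does not. A sketch along those lines, where the zip-input locator is a hypothetical placeholder, so inspect the opened flyout for the real one:
wait = WebDriverWait(driver, 10)
store_toggle = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "site-store")))
store_toggle.click()  # a real click triggers the dropdown's JS handlers, unlike setAttribute

# Hypothetical locator -- replace with the real id/name of the zip field
zip_input = wait.until(EC.visibility_of_element_located((By.NAME, "zipcode")))
zip_input.send_keys(zipcode)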