How to webscrape with Selenium when URL remains static after click

How to webscrape with Selenium when URL remains static after click - python

I am new to Selenium/Firefox. My goal is to go to my URL, fill in basic input, select a few items, let browser change the content and download a PDF from there. Ideally, I would love to do it repeatedly later by looping a number of new items. As a first step, I manage to get the browser to work and change content once. But I am stuck in getting the content out as find_elements_by_tag_name() seem to get me something funny rather than some usual HTML tag like what Beautifulsoup .find_all() would do. Appreciate very much any help here.
Here is my code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
url ='http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx'
browser = webdriver.Firefox(executable_path = 'C:\Program Files\Mozilla
Firefox\geckodriver.exe')
browser.get(url)
StockElem = browser.find_element_by_id('ctl00_txt_stock_code')
StockElem.send_keys('00772')
StockElem.click()
select = Select(browser.find_element_by_id('ctl00_sel_tier_1'))
select.select_by_value('3')
select = Select(browser.find_element_by_id('ctl00_sel_tier_2'))
select.select_by_value('153')
select = Select(browser.find_element_by_id('ctl00_sel_DateOfReleaseFrom_d'))
select.select_by_value('01')
select = Select(browser.find_element_by_id('ctl00_sel_DateOfReleaseFrom_m'))
select.select_by_value('01')
select = Select(browser.find_element_by_id('ctl00_sel_DateOfReleaseFrom_y'))
select.select_by_value('2000')
# select the search button
browser.execute_script("document.forms[0].submit()")
element = browser.find_elements_by_tag_name("a")
print(element)

After clicking on the Search button -- you have 5 links to download PDF files.
You should find those links by CSS selector: .news.
Then go through the list of links by index and click on each link to Download:
elements[0].click() -- by clicking on the first link.

Related

How can I select a search suggestion using selenium? The site prevents me from just clicking on submit, requires a selection

I'm trying to make searching for temporary apartments a bit easier on myself, but a website with listings for these apartments requires me to select a suggestion from their drop down list before I can click on submit. No matter how complete the entry in the search box might be.
The ultimate hope here is that I can get forward to the search results and then extract contact information from each listing. I was able to extract the data I need from a listing using Beautiful soup and Requests, but I had to paste in the URL for that specific listing into my code. I didn't get that far. If anyone has a suggestion on how to perhaps circumvent the landing page to get to the relevant listings, please let me know.
I tried just splicing the town name and the state name into the address bar by looking at how it's written after a successful search but that didn't work.
The site is Mein Monteurzimmer.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
driver = webdriver.Firefox()
webpage = r"https://mein-monteurzimmer.de"
print('Prosim vnesi zeljeno mesto') #Please enter the town to search
searchterm = input()
driver.get(webpage)
sbox = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/div/input")
sbox.send_keys(searchterm)
ddown = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/div")
ddown.select_by_value(1)
webdriver.wait(2)
#select = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/div")
submit = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/button")
submit.click
When I inspect the search box I can't find anything related to the suggestions until I enter a text. Then I can't click on the HTML code because that dismisses the suggestions. It's quite frustrating.
Here's a screenshot:
So I'm blindly trying to select something.
The error here is:
AttributeError: 'FirefoxWebElement' object has no attribute 'select_by_value'
I tried something with select, but that doesn't work with the way I tried this.
I am stumped and the solutions I could find were specific for other sites like Google or Amazon and I couldn't make sense if it.
Does anyone know how I could make this work?
Here's the code for getting information out of a listing, which I'll have to expand on to get the other data:
import bs4, requests
def getMonteurAddress(MonteurUrl):
res = requests.get(MonteurUrl)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
elems = soup.select('section.c:nth-child(4) > div:nth-child(2) > div:nth-child(2) > dl:nth-child(1) > dd:nth-child(2)')
return elems[0].text.strip()
address = getMonteurAddress('https://mein-monteurzimmer.de/105742/monteurzimmer/deggendorf-monteurzimmer-deggendorf-pensionfelix%40googlemailcom')
print('Naslov je ' + address) #print call to see if it gets the right data

As you can see once you type in, there is a list of divs creating. Now you need to get the a valid locator for these divs. To get the locator for these created divs you need to inspect elements in debug pause mode ( F12--> Source Tab --> F8).
Try below code to select first matching address as you typed.
sbox = driver.find_element_by_xpath("//input[#placeholder='Adresse, PLZ oder Ort eingeben']")
sbox.send_keys(searchterm)
addessXpath = "//div[contains(text(),'"+searchterm+"')]"
driver.find_element_by_xpath(addessXpath).click()
Note : If there are more than one matching address , first one will be selected.

Get html of inspect element source with selenium

I'm working in selenium with Chrome.
The webpage I'm accessing updates dynamically.
I need the html that shows the results, I can access it when I do 'inspect element'.
I don't get how I need to access that html from my code. I always get the original html.
I tried this: Get HTML Source of WebElement in Selenium WebDriver using Python
browser.get('http://bijsluiters.fagg-afmps.be/?localeValue=nl')
searchform = browser.find_element_by_class_name('iceInpTxt')
searchform.send_keys('cefuroxim')
button = browser.find_element_by_class_name('iceCmdBtn').click()
element = browser.find_element_by_class_name('contentContainer')
html = element.get_attribute('innerHTML')
browser.close()
print(html)

It seems that it's working after some delay. If I were you I should try to experiment with the delay time.
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get('http://bijsluiters.fagg-afmps.be/?localeValue=nl')
searchform = browser.find_element_by_class_name('iceInpTxt')
searchform.send_keys('cefuroxim')
button = browser.find_element_by_class_name('iceCmdBtn').click()
time.sleep(10)
element = browser.find_element_by_class_name('contentContainer')
html = element.get_attribute('innerHTML')
browser.close()
print(html)
Addition: a nicer way is to let the script proceed when an element is available (because of time it takes with JS (for example) before a specific element has been added to the DOM). The element to look for in your example is table with id iceDatTbl (for what I could find after a quick look).

How to loop though HTML and return id values

I am using selenium to navigate to a webpage and store the page source in a variable.
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://google.com")
html1 = driver.page_source
html1 now contains the page source of http://google.com.
My question is How can I return html selectors such as id="id" or name="name".
EDIT:
For example:
The webpage I navigated to with selenium has a menu bar with 4 tabs. Each tab has an id element; id="tab1", id="tab2", and so on. I would like to return each id value. So I want tab1, tab2, so on.
Edit#2:
Another example:
The homepage on my webpage (http://chrisarroyo.me) have several clickable links with ids. I would like to be able to return/print those ids to my console.
So I would like to return the ids for the Learn More button and the ids for the links in the footer (facebookLnk, githubLnk, etc..)

If you are looking for a list of WebElements that have an ID use:
elements = driver.find_elements_by_xpath("//*[#id]")
You can then iterate over that list and use get_attribute_("id") to pull out each elements specific ID.
For name, its pretty much the same code. Except change id to name and you're set.

Thank you #stewartm you comment helped.
This ended up giving me the results I was looking for:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://chrisarroyo.me")
id_elements = driver.find_elements_by_xpath("//*[#id]")
for eachElement in id_elements:
individual_ids = eachElement.get_attribute("id")
print(individual_ids)
After running the above ^^ the output listed each of the ids on the webpage specified.
output:
navbarNavAltMarkup
learnBtn
githubLnk
facebookLnk
linkedinLnk

web Scraping search Results in Python

I am very new to web scraping with Python. In the web page, which I am trying to scrape, I can enter string 'ABC' in the text box and click search. This gives me the details of 'ABC', but under the same URL. There is no change in url. I am trying to scrape the result details information.
I have worked till the "search" click. But I do not know how to capture the results of the search (details of search string 'ABC'). Please suggest how could I achieve it.
from selenium import webdriver
import webbrowser
new = 2 # open in a new tab, if possible
path_to_chromedriver = 'C:/Tech-stuffs/chromedriver/chromedriver.exe' # change path as needed
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
url = 'https://www.federalreserve.gov/apps/mdrm/data-dictionary'
browser.get(url)
browser.find_element_by_xpath('//*[#id="form0"]/table/tbody/tr[2]/td/label[2]').click()
browser.find_element_by_xpath("//select[#id='SelectedReportForm']/option[#value='1']").click()
browser.find_element_by_xpath('//*[#id="Search"]').click()

Use find_elements_by_xpath() to locate the xpath which entails all of the search results. Then iterate through them using a for loop and print each result's text. That should, at the bare minimum, get what you want.
results = browser.find_elements_by_xpath('//table//tr')
for result in results:
print "%s\n" % result.text

Find table elements to fill forms selenium python

My code so far is:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('http://moodle.tau.ac.il/')
driver.find_element_by_xpath("id('page-content')//form[#id='login']// \
input[#type='submit']").click()
Now I'm trying to fill up the login form and I succeeded to find the division
that follows id= content, easy to see in the image:
The following code line I used:
elem = driver.find_element_by_xpath("id('content'))
but it doesn't recognize anything in it and I cant get any further, what should I do to locate the input element?

It doesn't recognize anything because it is in an iframe. Therefore, you first have to switch to the iframe and then search the login form.
Switch to the iframe:
frame = driver.find_element_by_id('credentials')
driver.switch_to.frame(frame)
Or:
driver.switch_to.frame('credentials')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to webscrape with Selenium when URL remains static after click - python

After clicking on the Search button -- you have 5 links to download PDF files. You should find those links by CSS selector: .news. Then go through the list of links by index and click on each link to Download: elements[0].click() -- by clicking on the first link.

Related

How can I select a search suggestion using selenium? The site prevents me from just clicking on submit, requires a selection

Get html of inspect element source with selenium

How to loop though HTML and return id values

web Scraping search Results in Python

Find table elements to fill forms selenium python

Categories

Resources