Python: How to use re.compile after click()

I've successfully coded Python to open a website and then click a link. After that, I'd like to grab info from the site that is "active". But I'm getting the following error:
Search = Regex.search(res.text)
AttributeError: 'NoneType' object has no attribute 'text'
I think the problem is that I don't know how to "define" the clicked-into webpage as a variable. Here is the code that is relevant:
import re, requests, csv, pyperclip, logging
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://espn.go.com/golf/players')
find_player = browser.find_element_by_partial_link_text("Allenby, Robert")
res = find_player.click()
xRegex = re.compile(r'(1991)')
xSearch = xRegex.search(res.text)
output_player_name = xSearch.group(1)
This is my first Python coding experience and my first post to ask a question. Thanks in advance for any help.
PS I know that 1991 appears in the webpage. It's the year Robert Allenby turned pro.

find_element_by_partial_link_text() returns an anchor element, and click() is a browser event that returns None. To get the page content after the click, use browser.page_source and process it as required.
import re
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://espn.go.com/golf/players')
href_link = browser.find_element_by_partial_link_text("Allenby")
href_link.click()
res = browser.page_source  # HTML of the page the click navigated to
print(res)
xRegex = re.compile(r'(1991)')
xSearch = xRegex.search(res)
output_player_name = xSearch.group(1)
print(output_player_name)
Hope this helps :)

find_player is of type WebElement; res is of type NoneType, because click() returns nothing. The clicked-into webpage is held in the browser variable itself. You can use functions like find_element_by_partial_link_text on it to find what you need.
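For example, here is a minimal sketch (not tested against the live ESPN page) of working with browser after the click; it searches the page's visible text, since the exact element holding the year isn't known here:
browser.find_element_by_partial_link_text("Allenby, Robert").click()
# After the click, `browser` represents the player's page.
bio_text = browser.find_element_by_tag_name("body").text  # visible page text
match = re.search(r'(1991)', bio_text)
if match:
    print(match.group(1))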
By the way, what are you trying to accomplish? It might help me answer your question better.

Related

Is there no way to use Selenium on Bloomberg.com?

I would like to retrieve some financial data from Bloomberg; however, I found that the site can recognize that the session is a bot/automation of some kind.
import ws_functions.config as ws_config
import ws_functions.cust_functions as ws_functions
import time
import sqlite3
from selenium.webdriver.common.keys import Keys
ws_functions.print_runtime()
bloomberg_frontpage_url = 'https://www.bloomberg.com/asia'
browser = ws_functions.get_ChromeDriver((""))
browser.implicitly_wait(10)
try:
    browser.get(bloomberg_frontpage_url)
    time.sleep(5)
    browser.find_element_by_xpath('//*[@id="nav-bar-search-button"]').click()  # Click "Search"
    time.sleep(3)
    browser.find_element_by_xpath('//*[@id="navi-search-input"]').send_keys('HSI')  # Send 'HSI' to the search bar
    time.sleep(2)
    browser.find_element_by_partial_link_text('HSI:IND').click()
May I know how I can resolve this?
The above is all my code so far; I hope someone can advise me, thanks all :)
(Screenshots of the page that pops up and of the data to be scraped are not included here.)
Here is the direct URL to the Hang Seng Index:
https://www.bloomberg.com/quote/HSI:IND
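As a minimal sketch (untested; Bloomberg may still flag the automated session), you can navigate straight to that quote page instead of clicking through the search UI:
browser.get('https://www.bloomberg.com/quote/HSI:IND')
time.sleep(5)  # crude wait for the dynamic content to render
print(browser.page_source)  # scrape the quote data from here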
In the end I found a workaround: I simply call the Bloomberg API to retrieve the data instead of using the solution mentioned above, thanks.
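For reference, a heavily hedged sketch of that kind of API call using Bloomberg's blpapi Python package (this assumes a licensed Bloomberg Terminal running locally on the default port; the asker did not share their actual code):
import blpapi  # Bloomberg's official Python API; requires a Terminal session
options = blpapi.SessionOptions()
options.setServerHost('localhost')
options.setServerPort(8194)  # default Terminal port
session = blpapi.Session(options)
session.start()
session.openService('//blp/refdata')
service = session.getService('//blp/refdata')
request = service.createRequest('ReferenceDataRequest')
request.getElement('securities').appendValue('HSI Index')
request.getElement('fields').appendValue('PX_LAST')  # last price
session.sendRequest(request)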

How can I select a search suggestion using Selenium? The site prevents me from just clicking on submit and requires a selection

I'm trying to make searching for temporary apartments a bit easier on myself, but a website with listings for these apartments requires me to select a suggestion from its drop-down list before I can click on submit, no matter how complete the entry in the search box might be.
The ultimate hope here is that I can get through to the search results and then extract contact information from each listing. I was able to extract the data I need from a single listing using Beautiful Soup and Requests, but I had to paste the URL for that specific listing into my code, so I didn't get very far. If anyone has a suggestion on how to perhaps circumvent the landing page to get to the relevant listings, please let me know.
I tried just splicing the town name and the state name into the address bar, mimicking how the URL looks after a successful search, but that didn't work.
The site is Mein Monteurzimmer.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
driver = webdriver.Firefox()
webpage = r"https://mein-monteurzimmer.de"
print('Prosim vnesi zeljeno mesto') #Please enter the town to search
searchterm = input()
driver.get(webpage)
sbox = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/div/input")
sbox.send_keys(searchterm)
ddown = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/div")
ddown.select_by_value(1)
webdriver.wait(2)
#select = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/div")
submit = driver.find_element_by_xpath("/html/body/main/cpagearea/section/div[2]/div/section[1]/div/div[1]/section/form/button")
submit.click
When I inspect the search box I can't find anything related to the suggestions until I enter text, and then I can't click on the HTML code because that dismisses the suggestions. It's quite frustrating: I'm blindly trying to select something. (A screenshot of the suggestion list was attached to the original question but is not reproduced here.)
The error here is:
AttributeError: 'FirefoxWebElement' object has no attribute 'select_by_value'
I tried something with Select, but that doesn't work with the way I set this up.
I am stumped; the solutions I could find were specific to other sites like Google or Amazon, and I couldn't make sense of them.
Does anyone know how I could make this work?
Here's the code for getting information out of a listing, which I'll have to expand on to get the other data:
import bs4, requests
def getMonteurAddress(MonteurUrl):
    res = requests.get(MonteurUrl)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('section.c:nth-child(4) > div:nth-child(2) > div:nth-child(2) > dl:nth-child(1) > dd:nth-child(2)')
    return elems[0].text.strip()
address = getMonteurAddress('https://mein-monteurzimmer.de/105742/monteurzimmer/deggendorf-monteurzimmer-deggendorf-pensionfelix%40googlemailcom')
print('Naslov je ' + address) #print call to see if it gets the right data ("Naslov je" = "The address is")
As you can see, once you type into the box, a list of suggestion divs is created, so you need a valid locator for those divs. To inspect these dynamically created elements without dismissing them, pause the page in debug mode (F12 → Sources tab → F8).
Try the code below to select the first address matching what you typed:
sbox = driver.find_element_by_xpath("//input[@placeholder='Adresse, PLZ oder Ort eingeben']")
sbox.send_keys(searchterm)
addressXpath = "//div[contains(text(),'" + searchterm + "')]"
driver.find_element_by_xpath(addressXpath).click()
Note: if more than one address matches, the first one will be selected.
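If the suggestion takes a moment to render, a variant of the same idea (a sketch, untested against this site) is to wait explicitly for it to become clickable instead of clicking right away:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
suggestionXpath = "//div[contains(text(),'" + searchterm + "')]"
suggestion = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, suggestionXpath)))
suggestion.click()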

For loops while using Selenium for web scraping in Python

I am attempting to web-scrape info off of the following website: https://www.axial.net/forum/companies/united-states-family-offices/
I am trying to scrape the description for each family office, so "https://www.axial.net/forum/companies/united-states-family-offices/" + insert_company_name are the pages I need to scrape.
So I wrote the following code to test the program for just one page:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('insert_path_here/chromedriver')
driver.get("https://network.axial.net/company/ansaco-llp")
page_source = driver.page_source
soup2 = soup(page_source,"html.parser")
soup2.findAll('axl-teaser-description')[0].text
This works for the single page, as long as the description doesn't have a "show full description" drop down button. I will save that for another question.
I wrote the following loop:
# Note: lst2 has all the names for the companies. I made sure they match the webpage
lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    page_source = driver.page_source
    for handle in driver.window_handles:
        driver.switch_to.window(handle)
    word_soup = soup(page_source, "html.parser")
    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
When I run the loop, all of the values come out as "null", even the ones without "click for full description" buttons.
When I edited the loop to print out word_soup instead, the page was different than when I ran it without a loop, and it did not contain the description text.
I don't understand why a loop would cause that, but apparently it does. Does anyone know how to fix this problem?
Found the solution: pause the program for 3 seconds after driver.get:
import time
lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    time.sleep(3)  # give the JavaScript time to render the description
    page_source = driver.page_source
    word_soup = soup(page_source, "html.parser")
    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
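A more robust alternative to the fixed sleep (a sketch, not tested against this site) is to wait for the description element itself to appear:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    try:
        # Wait up to 10 seconds for the description tag to be rendered
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, 'axl-teaser-description')))
        word_soup = soup(driver.page_source, "html.parser")
        lst3.append(word_soup.findAll('axl-teaser-description')[0].text)
    except TimeoutException:
        lst3.append('null')  # no description rendered for this company
print(lst3)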
I see that the page uses JavaScript to generate the text, meaning it doesn't show up in the initial page source, which is weird but OK. I don't quite understand why you're iterating through and switching to every window handle Selenium has open, but you definitely won't find the description in the raw page source / BeautifulSoup before the script has run.
Honestly, I'd personally look for a better website if you can; otherwise you'll have to do it with Selenium, which is inefficient and horrible.

Need help retrieving the site key from a dynamic website

I am creating an automated account creator in PyCharm, and I am facing a problem to which I have yet to find a good solution. I want to get the site key in order to pass the captcha to a service I have bought. I have used the requests.get method, but it gives back None as a result. I am using Selenium in my program, and after some thought I realized that requests.get, even if it worked, would bring me a different key than the one currently displayed in my Selenium driver. I googled a lot and found only a module named Selenium-Requests, which doesn't support Edge. I am using Edge as it is the only browser everyone has, and it doesn't require the developer version like Chrome and Firefox do.
Generally, I haven't found a fix that can help me retrieve the key from within my driver.
This is the retrieve code:
registerurl = requests.get(url)
registerurlstring = ''.join(str(e) for e in registerurl)
soup = BeautifulSoup(registerurlstring, features="html5lib")
hidden_tags = soup.find({"id":"recaptcha-token"})
sitekey = hidden_tags
try:
    print('Sitekey = ', sitekey)
except:
    print('Sitekey = Not Found')
I am not sure whether this is what you are after. The reCAPTCHA token is inside an iframe, so you have to target that iframe's src value; then, using the Python requests module, you can fetch that URL and read the value of the hidden input.
import requests
from bs4 import BeautifulSoup
url='https://www.google.com/recaptcha/api2/anchor?ar=1&k=6Lc3HAsUAAAAACsN7CgY9MMVxo2M09n_e4heJEiZ&co=aHR0cHM6Ly9zaWdudXAuZXVuZS5sZWFndWVvZmxlZ2VuZHMuY29tOjQ0Mw..&hl=en&v=A1Aard-wURuGsXRGA7JMOqVO&theme=dark&size=invisible&badge=bottomright&cb=ezyy1frci5ms'
registerurl = requests.get(url)
soup = BeautifulSoup(registerurl.text, features="html5lib")
hidden_tags = soup.find('input', attrs={"id": "recaptcha-token"})
print(hidden_tags['value'])
Output:
03AOLTBLQFd9hdHGmOesrT0xDcA8MkI6FGIiM3892Uws3aEWzPxUT8-U8IBEZHYzUEba2Jp9m3s9z_sz_fuij9OXZHABulFrI8YCD95kXV_H6xTO9vOubuZfzscleb6fdkkAE3IwUUSdTzPbXILy6SGLPI3LpPUptC1enZLIkQxQq9T8AEPPvCIsVgGe4jSE_l1jCWIRmBeBXsLgPLABZSq6ah6QWFfAngdC1rQaLMKWzLBmzh6ytEEGNYHmEG7P6UVtYcTI1IRIvq-ba-oGIUS1ELUb-1d3upQ29JWBtQ2t7_VNn237fguztf_FUDEHnAfHppUsrz-ZlkE00sMXFCuQ1XF6Qz7lH2j5g2z5KZQiODhRUBRRyd-ydjetz053bKRcgWpnNoZGNf1GBlW5inL9AtyYTkpruttw5sruAPuVgs5mrniQ5hrHNvfDIZKX905T2E21W2DsW1_07rItFYa-zkylMU83YXRQ
Hope this helps.
Updated code to get the iframe src value using webdriver.
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
driver=webdriver.Chrome()
driver.get("https://signup.eune.leagueoflegends.com/en/signup/index")
url=driver.find_element_by_css_selector("iframe[role='presentation']").get_attribute('src')
registerurl = requests.get(url)
soup = BeautifulSoup(registerurl.text, features="html5lib")
hidden_tags = soup.find('input', attrs={"id": "recaptcha-token"})
print(hidden_tags['value'])
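Since the question specifically uses Edge, the same flow should work there by swapping in the Edge driver (a sketch, assuming the matching msedgedriver binary is installed and on your PATH):
from selenium import webdriver
driver = webdriver.Edge()  # Edge instead of Chrome; needs msedgedriver
driver.get("https://signup.eune.leagueoflegends.com/en/signup/index")
url = driver.find_element_by_css_selector("iframe[role='presentation']").get_attribute('src')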

Selenium Driver - Webscraping

Using the Selenium module to try and web-scrape, but when I print out the element, it seems to return a reference to where the data is stored on the Selenium server rather than the data itself? I'm not exactly sure how this works. Anyway, here's my code. I'm very confused. Can someone tell me what I'm doing wrong?
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://caribeexpress.com.do/') #get method
elem2 = browser.find_elements_by_css_selector('div.plan:nth-child(3) > div:nth-child(2) > span:nth-child(2)')
print(elem2)
elems3 = browser.find_elements_by_class_name('value')
print(elems3)
elem4 = browser.find_element_by_xpath('//*[@id="content-wrapper"]/div[2]/div[3]/div/span[2]')
print(elem4)
For some reason, what displays in my Python IDE doesn't display here, so I included it in my gist:
https://gist.github.com/jtom343
In case you want to extract the text between the span tags, note that find_elements_by_css_selector and find_elements_by_class_name (plural) return lists of elements, so you need to index into the list before reading .text.
Replace this:
print(elem2)
with:
print(elem2[0].text.strip())
and this:
print(elem4)
with:
print(elem4.text.strip())
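Putting it together, a sketch of the corrected script (the selectors are copied from the question and untested against the live site):
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://caribeexpress.com.do/')
elems2 = browser.find_elements_by_css_selector('div.plan:nth-child(3) > div:nth-child(2) > span:nth-child(2)')
if elems2:
    print(elems2[0].text.strip())  # text of the first match, if any
elems3 = browser.find_elements_by_class_name('value')
print([e.text.strip() for e in elems3])  # text of every matched element
elem4 = browser.find_element_by_xpath('//*[@id="content-wrapper"]/div[2]/div[3]/div/span[2]')
print(elem4.text.strip())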
