Retrieve message from DOM with Selenium - python

I am learning Python and experimenting with Selenium. Today I am trying to retrieve a message from the Spectrum chat of Star Citizen: https://robertsspaceindustries.com/spectrum/community/SC/lobby/1
I would like to retrieve the div with class="lobby-motd-message", because it contains useful information.
This is my code, but when I run it, it displays nothing. Can you help me solve this problem? I plan to do more with Selenium (a Discord bot), but I need to retrieve this information first.
#!/usr/bin/python3
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

opts = Options()
opts.headless = True
browser = webdriver.Firefox(options=opts)

url = "https://robertsspaceindustries.com/spectrum/community/SC/lobby/1"
browser.get(url)
browser.implicitly_wait(10)

try:
    info = browser.find_element_by_class_name("lobby-motd-message")
    print(info.text)
except:
    print("not found")

browser.close()
quit()

Depending on which element you want, you may need to target a child element and get its text.
try:
    info = browser.find_element_by_class_name("lobby-motd-message")
    print(info.find_element_by_tag_name('p').text)
except:
    print("not found")
Outputs
Star Citizen Alpha 3.11 is LIVE - Discover more here !
To get all of it:
print(info.get_attribute('innerHTML'))
Outputs
<p>Star Citizen Alpha 3.11 is LIVE - Discover more here !</p>
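As a side note (my addition, not part of the original answer): the find_element_by_* helpers used above were removed in Selenium 4, so on a current install the same lookup needs the By locator API. A minimal sketch, assuming the same page and class name:

from selenium.webdriver.common.by import By

# Same lookup with the Selenium 4 locator API (the old helpers were removed).
info = browser.find_element(By.CLASS_NAME, "lobby-motd-message")
print(info.get_attribute('innerHTML'))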

Open whatsapp on a headless browser

Use case:
Sending out automated WhatsApp messages using PythonAnywhere. Step-by-step logic below:
1. non-coders write the phone numbers we should message on a gsheet
2. the gsheet data is read (using gspread on PythonAnywhere)
3. WhatsApp Web is opened to send out the messages in bulk
I have code using Selenium running on my machine that opens the WhatsApp Web URL, finds the needed elements on the page, and sends the messages to the gsheet phone numbers. Below is a snippet from that part of the code:
# Snippet from a larger script: `waiter` is a WebDriverWait instance and
# PHONE_NUMER_INPUT an XPath constant, both defined elsewhere in the full code.
global driver
driver.get('https://web.whatsapp.com/')
waiter.until(EC.title_is("WhatsApp"))
waitCounter = 0
while 1:
    try:
        waiter.until(EC.presence_of_element_located((By.XPATH, "//canvas[@aria-label='Scan me!']")))
        waitCounter += 1
        if waitCounter % 1000 == 0:
            print("Waiting for user to log in...", 'WARNING')
    except:
        print("Logged in to WhatsApp")
        break
for entry in data:
    driver.find_element_by_xpath(PHONE_NUMER_INPUT).send_keys(str(entry['PhoneNumber']))
    time.sleep(2)
    driver.find_element_by_xpath(PHONE_NUMER_INPUT).send_keys(Keys.ENTER)
    time.sleep(2)
    driver.find_element_by_class_name('p3_M1').send_keys(str(entry['Message']))
    time.sleep(2)
    driver.find_element_by_class_name('_4sWnG').click()
    time.sleep(2)
Doubt:
To make step 3 work on PythonAnywhere I would have to use a headless browser. However, to start WhatsApp Web we always need to do the QR code scan, so I cannot do it that way. Below is the current (useless) headless Selenium part of my code, which fails with: NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id='side']/div[1]/div/label/div/div[2]"}. I am quite stuck here. Any tip or idea to overcome this is more than welcome, and I am happy to discuss solutions using other libraries you might find appropriate.
Thanks in advance.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)

def send_whatsapp_message():
    global driver
    driver.get('https://web.whatsapp.com/')
    print("Done updating, check the spreadsheet now")
    #redirect('https://web.whatsapp.com/', code=302)
    for entry in data:
        driver.find_element_by_xpath("//*[@id='side']/div[1]/div/label/div/div[2]").send_keys(str(entry['PhoneNumber']))
        time.sleep(2)
        driver.find_element_by_xpath("//*[@id='side']/div[1]/div/label/div/div[2]").send_keys(Keys.ENTER)
        time.sleep(2)
        driver.find_element_by_class_name('p3_M1').send_keys(str(entry['Message']))
        time.sleep(2)
        driver.find_element_by_class_name('_4sWnG').click()
        time.sleep(2)
        print("Successfully sent message to {0}, name: {1}".format(str(entry['PhoneNumber']), str(entry['Name'])), 'INFO')

Web Scraping Python - Pubs

I am trying to extract the site name and address data from this website for each card but this doesn't seem to work. Any suggestions?
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://order.marstons.co.uk/")

all_cards = driver.find_elements_by_xpath("//div[@class='h3.body__heading']/div[1]")
for card in all_cards:
    print(card.text)  # do as you will
I'm glad that you are trying to help yourself; since you seem new to this, let me offer some help.
Automating a browser via Selenium to do this is going to take you forever. The Marston's site is pretty straightforward to scrape if you know where to look: if you open your browser's developer tools (F12 on PC), go to the Network tab, filter by Fetch/XHR, and hit refresh while on the Marston's site, you'll see some backend API calls happening. If you click on the one that says "brand" and then the "Preview" tab, you'll see a collapsible list of all sorts of information. That is a JSON response, essentially a collection of Python lists and dictionaries, which makes it easy to get the data you are after. The information in the "venues" list will be helpful when it comes to scraping the menus for each venue.
When you go to a specific pub you'll see an API call with the pub's name; this has all the menu info, which you can inspect the same way, and we can call these venue APIs using the "slug" data from the venues response above.
So by making our own requests to these URLs and stepping through the JSON, we can have everything done in a couple of minutes, far easier than automating a browser! I've written the code below; feel free to ask questions if anything is unclear. You'll need to pip install requests and pandas to make this work. You owe me a pint! :) Cheers
import requests
import pandas as pd

headers = {'origin': 'https://order.marstons.co.uk'}
url = 'https://api-cdn.orderbee.co.uk/brand'
resp = requests.get(url, headers=headers).json()

venues = {}
for venue in resp['venues']:
    venues[venue['slug']] = venue

print(f'{len(venues)} venues to scrape')

output = []
for venue in venues.keys():
    try:
        url = f'https://api-cdn.orderbee.co.uk/venues/{venue}'
        print(f'Scraping: {venues[venue]["name"]}')
        try:
            info = requests.get(url, headers=headers).json()
        except Exception as e:
            print(e)
            print(f'{venues[venue]["name"]} not available')
            continue
        for category in info['menus']['oat']['categories']:  # oat = order at table?
            cat_name = category['name']
            for subcat in category['subCategories']:
                subcat_name = subcat['name']
                for item in subcat['items']:
                    info = {
                        'venue_name': venues[venue]['name'],
                        'venue_city': venues[venue]['address']['city'],
                        'venue_address': venues[venue]['address']['streetAddress'],
                        'venue_postcode': venues[venue]['address']['postCode'],
                        'venue_latlng': venues[venue]['address']['location']['coordinates'],
                        'category': cat_name,
                        'subcat': subcat_name,
                        'item_name': item['name'],
                        'item_price': item['price'],
                        'item_id': item['id'],
                        'item_sku': item['sku'],
                        'item_in_stock': item['inStock'],
                        'item_active': item['isActive'],
                        'item_last_update': item['updatedAt'],
                        'item_diet': item['diet']
                    }
                    output.append(info)
    except Exception as e:
        print(f'Problem scraping {venues[venue]["name"]}, skipping it')  # when there is no menu available for some reason? Closed location?
        continue

df = pd.DataFrame(output)
df.to_csv('marstons_dump.csv', index=False)
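A quick way to sanity-check the dump afterwards (my addition; assumes the CSV written above):

df = pd.read_csv('marstons_dump.csv')
# Count distinct venues and total menu rows scraped.
print(df['venue_name'].nunique(), 'venues,', len(df), 'menu items')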
I use Firefox, but it should also work with Chrome.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# driver = webdriver.Chrome(ChromeDriverManager().install())
driver = webdriver.Firefox()
driver.get("https://order.marstons.co.uk/")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="app"]/div/div/div/div[2]/div'))
    ).find_elements_by_tag_name('a')
    for el in element:
        print("heading", el.find_element_by_tag_name('h3').text)
        print("address", el.find_element_by_tag_name('p').text)
finally:
    driver.quit()

Selenium Driver - Webscraping

I'm using the Selenium module to try to web-scrape, but when I print out the element, it seems to return a reference to where the data is stored on the Selenium server rather than the data itself. I'm not exactly sure how this works. Anyway, here's my code. I'm very confused. Can someone tell me what I'm doing wrong?
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://caribeexpress.com.do/') #get method
elem2 = browser.find_elements_by_css_selector('div.plan:nth-child(3) > div:nth-child(2) > span:nth-child(2)')
print(elem2)
elems3 = browser.find_elements_by_class_name('value')
print(elems3)
elem4 = browser.find_element_by_xpath('//*[#id="content-wrapper"]/div[2]/div[3]/div/span[2]')
print(elem4)
For some reason, what displays in my Python IDE doesn't display here, so I included it in my gist:
https://gist.github.com/jtom343
In case you want to extract the text between the span tags, change this:
print(elem4)
to:
print(elem4.text.strip())
Note that elem2 and elems3 come from find_elements_* calls, which return lists of WebElements, so you have to index into them or loop over them, for example:
print(elem2[0].text.strip())
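For background (my addition, not part of the original answer): printing a WebElement shows Selenium's internal session/element id, which is the "location on the Selenium server" the question describes; the page content only appears once you read .text or an attribute. A minimal sketch using the question's own locator:

# Printing elements shows internal ids; read .text for the visible content.
for el in browser.find_elements_by_class_name('value'):
    print(el.text.strip())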

Log in to website using python and selenium

I'm trying to log in to http://sports.williamhill.com/bet/en-gb using python and selenium.
Here is what I've tried so far:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

session = webdriver.Chrome()
session.get('https://sports.williamhill.com/bet/en-gb')

# REMOVE POP-UP
timezone_popup_ok_button = session.find_element_by_xpath('//a[@id="yesBtn"]')
timezone_popup_ok_button.click()

# FILL OUT FORMS
usr_field = session.find_element_by_xpath('//input[@value="Username"]')
usr_field.clear()
WebDriverWait(session, 10).until(EC.visibility_of(usr_field))
usr_field.send_keys('myUsername')

pwd_field = session.find_element_by_xpath('//input[@value="Password"]')
pwd_field.clear()
pwd_field.send_keys('myPassword')

login_button = session.find_element_by_xpath('//input[@id="signInBtn"]')
login_button.click()
I'm getting the following error.
selenium.common.exceptions.ElementNotVisibleException: Message: element not visible
when trying to execute
usr_field.send_keys('myUsername')
The usr_field element seems to be visible when I view it with the inspector tool; however, I'm not 100% sure here.
I'm using this script (with some modifications) successfully on other sites, but this one is giving me a real headache and I can't seem to find the answer anywhere on the net.
Would appreciate if someone could help me out here!
The following code will resolve the issue.
from selenium import webdriver

session = webdriver.Chrome()
session.get('https://sports.williamhill.com/bet/en-gb')

# REMOVE POP-UP
timezone_popup_ok_button = session.find_element_by_xpath('//a[@id="yesBtn"]')
timezone_popup_ok_button.click()

# FILL OUT FORMS
# The visible "Username"/"Password" boxes appear to be placeholder elements;
# clicking them reveals the real inputs, which is why send_keys failed before.
user_element = session.find_element_by_name("tmp_username")
user_element.click()
actual_user_elm = session.find_element_by_name("username")
actual_user_elm.send_keys("myUsername")

password_element = session.find_element_by_id("tmp_password")
password_element.click()
actual_pass_element = session.find_element_by_name("password")
actual_pass_element.send_keys("myPassword")

login_button = session.find_element_by_xpath('//input[@id="signInBtn"]')
login_button.click()
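If the real input is still not interactable right after the click, a hedged variant (my addition, assuming the same name locator as above) is to wait for it explicitly instead of typing immediately:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait until the revealed "username" input is actually visible before typing.
wait = WebDriverWait(session, 10)
actual_user_elm = wait.until(EC.visibility_of_element_located((By.NAME, "username")))
actual_user_elm.send_keys("myUsername")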

selenium works on local and not on azure server

I am trying to get the video URL from links on this page. The video link can be seen at https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html (open in Chrome).
For that I wrote the Chrome web driver related code below:
import os
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from pyvirtualdisplay import Display

chromedriver = '/usr/local/bin/chromedriver'
os.environ['webdriver.chrome.driver'] = chromedriver

display = Display(visible=0, size=(800, 600))
display.start()
driver = webdriver.Chrome(chromedriver)
driver.get('https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html')

# Snippet from a larger class (Python 2): `item` comes from the surrounding
# loop and `self` from the enclosing object in the full code.
try:
    element = WebDriverWait(driver, 20).until(lambda driver: driver.find_elements_by_class_name('yvp-main'))
    self.yahoo_video_trend = []
    for s in driver.find_elements_by_class_name('yvp-main'):
        print "Processing link - ", item['link']
        trend = item
        print item['description']
        trend['video_link'] = s.find_element_by_tag_name('video').get_attribute('src')
        print
        print s.find_element_by_tag_name('video').get_attribute('src')
        self.yahoo_video_trend.append(trend)
except:
    return
This works fine on my local system, but when I run it on my Azure server it does not give any result at s.find_element_by_tag_name('video').get_attribute('src').
I have installed Chrome on my Azure server.
Update:
Note that I have already tried requests and BeautifulSoup, but since Yahoo loads the HTML content dynamically from JSON, I could not get the video URL with them.
And yes, the Azure server is a plain Linux system with command-line access, not any kind of application service.
I tried to reproduce your issue using your code. However, I found there was no tag named video on that page (https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html), testing with both IE and Chrome.
I used the developer tools to check the HTML code (screenshot omitted).
It seems that this page uses a Flash player to play the video, not an HTML5 video element.
For this reason, I suggest you check whether your code uses the right tag name.
Any concerns, please feel free to let me know.
We tried to reproduce the error on our side. I was not able to get chrome driver to work, but I did try the firefox driver and it worked fine. It was able to load the page and get the link via the URL.
Can you change your code to print the exception and send it to us, so we can see where the script is failing? Change this:
except:
    return
to:
except Exception, e:
    print str(e)
Send us the exception, so we can take a look.
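A generic debugging step (my addition, not from either answer) that often helps when a script behaves differently on a headless server: dump what the browser actually received, then compare it with what you see locally. A sketch in Python 2 to match the snippet above:

# Hypothetical debugging aid: save the rendered page so you can diff
# the server's output against what your local browser renders.
print driver.title
with open('page_dump.html', 'w') as f:
    f.write(driver.page_source.encode('utf-8'))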
