How to handle alerts with Python?

I would like to handle alerts with Python. What I would like to do is:
Open a url
Submit a form or click some links
Check if an alert occurs in the new page
I did this in JavaScript using PhantomJS, but I would like to do the same in Python.
Here is the JavaScript code:
file test.js:
var webPage = require('webpage');
var page = webPage.create();
var url = 'http://localhost:8001/index.html';

page.onConsoleMessage = function (msg) {
    console.log(msg);
};

page.open(url, function (status) {
    page.evaluate(function () {
        document.getElementById('myButton').click();
    });
    page.onConsoleMessage = function (msg) {
        console.log(msg);
    };
    page.onAlert = function (msg) {
        console.log('ALERT: ' + msg);
    };
    setTimeout(function () {
        page.evaluate(function () {
            console.log(document.documentElement.innerHTML);
        });
        phantom.exit();
    }, 1000);
});
file index.html
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title></title>
    <meta charset="utf-8" />
</head>
<body>
    <form>
        <input id="username" name="username" />
        <button id="myButton" type="button" value="Page2">Go to Page2</button>
    </form>
</body>
</html>
<script>
    document.getElementById("myButton").onclick = function () {
        location.href = "page2.html";
    };
</script>
file page2.html
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title></title>
    <meta charset="utf-8" />
</head>
<body onload="alert('hello')">
</body>
</html>
This works; it detects the alert on page2.html.
Now I have written this Python script:
test.py
import requests
from test import BasicTest
from selenium import webdriver
from bs4 import BeautifulSoup

url = 'http://localhost:8001/index.html'

def main():
    #browser = webdriver.Firefox()
    browser = webdriver.PhantomJS()
    browser.get(url)
    html_source = browser.page_source
    #browser.quit()
    soup = BeautifulSoup(html_source, "html.parser")
    soup.prettify()

    request = requests.get('http://localhost:8001/page2.html')
    print request.text
    #Handle Alert

if __name__ == "__main__":
    main()
Now, how can I check with Python whether an alert occurs on page2.html? First I open index.html, then page2.html.
I'm just getting started, so any suggestions will be appreciated.
P.S.
I also tested webdriver.Firefox(), but it is extremely slow.
I also read this question: Check if any alert exists using selenium with python
but it doesn't work (below is the same script as before, plus the solution suggested in that answer).
.....
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
....

def main():
    .....
    #Handle Alert
    try:
        WebDriverWait(browser, 3).until(EC.alert_is_present(),
                                        'Timed out waiting for PA creation ' +
                                        'confirmation popup to appear.')
        alert = browser.switch_to.alert()
        alert.accept()
        print "alert accepted"
    except TimeoutException:
        print "no alert"

if __name__ == "__main__":
    main()
I get the error:
"selenium.common.exceptions.WebDriverException: Message: Invalid Command Method.."

PhantomJS uses GhostDriver to implement the WebDriver Wire Protocol, which is how it works as a headless browser within Selenium.
Unfortunately, GhostDriver does not currently support alerts, although it looks like they would welcome help implementing the feature:
https://github.com/detro/ghostdriver/issues/20
You could switch to the JavaScript version of PhantomJS, or use the Firefox driver within Selenium.
from selenium import webdriver
from selenium.common.exceptions import NoAlertPresentException

if __name__ == '__main__':
    # Switch to this driver and switch_to_alert will fail.
    # driver = webdriver.PhantomJS('<Path to Phantom>')
    driver = webdriver.Firefox()
    driver.set_window_size(1400, 1000)
    driver.get('http://localhost:8001/page2.html')

    try:
        driver.switch_to.alert.accept()
        print('Alarm! ALARM!')
    except NoAlertPresentException:
        print('*crickets*')
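If you do switch to the Firefox driver, the WebDriverWait approach from the question should work as well. Here is a minimal sketch against the same local pages (note that switch_to.alert is a property, not a method, so it is used without parentheses):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Firefox()
driver.get('http://localhost:8001/index.html')
driver.find_element_by_id('myButton').click()   # navigates to page2.html

try:
    # Wait up to 3 seconds for the alert raised by page2.html's onload handler
    WebDriverWait(driver, 3).until(EC.alert_is_present())
    alert = driver.switch_to.alert   # property, not a method call
    print(alert.text)                # 'hello'
    alert.accept()
except TimeoutException:
    print('no alert appeared')
finally:
    driver.quit()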

Related

What's wrong with these html5 and python codes?

I have been working on this code for quite some time (I even tried ChatGPT's help, lol), but I still haven't finished it because of this last problem. The main purpose of the code is to let the user input a song name on the HTML5 page; the name is then sent to my Python code, which generates similar song names and sends them back to the HTML5 page, which displays them to the user. Unfortunately, I have been running into some obstacles. For example, right now, when I input a song name, instead of getting recommended song names I get a listing of my C:/ files. Do you have any ideas how to fix it? Here is my code:
"index.html":
<!DOCTYPE html>
<html>
<head>
    <title>Song4u</title>
    <meta charset="utf-8">
    <link rel="stylesheet" type="text/css" href="style.css">
    <script type="text/javascript" href="script.js"></script>
    <link href="https://fonts.googleapis.com/css2?family=Roboto+Mono:wght@100&display=swap" rel="stylesheet">
</head>
<body>
    <h3><header>Song4u</header></h3><br>
    <br>
    <h1>FIND YOUR NEW FAVOURITE SONG!</h1> <br>
    <h2>Insert your youtube link and we will show you songs that suit your vibe the most!</h2>
    <form action="/" method="post">
        <label for="songname">Songname:</label>
        <input type="text" id="songname" name="songname">
        <input type="submit" value="Submit">
    </form>
</body>
</html>
"gptcode.py":
from flask import Flask, request, render_template
import subprocess
import sys
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

app = Flask(__name__, template_folder='templates')

def install(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

install("selenium")

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        songname = request.form['songname']
        options = Options()
        options.headless = True
        options.add_argument('--disable-extensions')
        options.add_argument('--disable-gpu')
        options.add_argument('--disable-dev-shm-usage')
        options.add_argument('--no-sandbox')
        driver = webdriver.Chrome(options=options)
        driver.get("https://www.chosic.com/playlist-generator/")
        sleep(1)
        driver.find_element(By.XPATH, '//*[text()="AGREE"]').click()
        driver.find_element(By.NAME, 'q').send_keys(songname)
        sleep(3)
        driver.find_element(By.XPATH, '//*[@id="hh1"]').click()
        sleep(2)
        driver.find_element(By.XPATH, '//*[@id="generate-button"]').click()
        sleep(2)
        siul1 = driver.find_element(By.XPATH, '//*[@id="result"]/div/div[4]/div[1]/span').get_attribute('textContent')
        siul2 = driver.find_element(By.XPATH, '//*[@id="result"]/div/div[7]/div[1]/span').get_attribute('textContent')
        siul3 = driver.find_element(By.XPATH, '//*[@id="result"]/div/div[10]/div[1]/span').get_attribute('textContent')
        siul4 = driver.find_element(By.XPATH, '//*[@id="result"]/div/div[13]/div[1]/span').get_attribute('textContent')
        siul5 = driver.find_element(By.XPATH, '//*[@id="result"]/div/div[16]/div[1]/span').get_attribute('textContent')
        return render_template('index.html', siul1=siul1, siul2=siul2, siul3=siul3, siul4=siul4, siul5=siul5)

if __name__ == '__main__':
    app.run(debug=True)
(By the way, this code is solely for a school project.)
The Selenium part of the code was written by a friend, so I don't have much knowledge there; the problem might be hiding in it, I have no idea. From what I have gathered so far, I think the HTML5 page might not understand its purpose, and the Python code fails to send the answers back to it.
I tried playing with the "post" and "get" methods, with the whole <form> part, and getting help from ChatGPT, but the only change I'd get is my Python code being displayed instead of the C:/ files :)
(Edit):
This is what I get when I insert a song name in the tab (screenshot not shown):
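A minimal, self-contained sketch of the pattern the code above seems to be aiming for (all names here are placeholders, not the poster's actual files): the form has to be served and posted back through the Flask route, and the template has to reference the returned variables with Jinja placeholders for them to show up. Seeing a C:/ file listing usually suggests the .html file was opened directly from disk instead of through the running Flask server at http://127.0.0.1:5000/, so the form's action="/" resolves against the local filesystem.

# Minimal sketch (assumed names): form POST handled by Flask, results rendered
# back into the same page via Jinja placeholders.
from flask import Flask, request, render_template_string

app = Flask(__name__)

PAGE = """
<form action="/" method="post">
    <input type="text" name="songname">
    <input type="submit" value="Submit">
</form>
{% if siul1 %}<p>{{ siul1 }}</p>{% endif %}
"""

@app.route('/', methods=['GET', 'POST'])
def index():
    siul1 = None
    if request.method == 'POST':
        # stand-in for the Selenium lookup in the question
        siul1 = "recommendation for " + request.form['songname']
    return render_template_string(PAGE, siul1=siul1)

if __name__ == '__main__':
    app.run(debug=True)  # then open http://127.0.0.1:5000/ in the browser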

Web Scraping Blocked by Robots Meta Directives

I am working on a web scraper to access scheduling data from a website. Our company has full access to this website and its data via login credentials. Since dynamic site navigation is required, I am using Selenium for automated data scraping, with Python and BeautifulSoup to work with the HTML structure. With all variables defined, I have the following code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import lxml.html as lh
opt = Options()
opt.headless = True
driver = webdriver.Chrome(options=opt, executable_path=<path to chromedriver.exe>)
driver = webdriver.Chrome(<path to chromedriver.exe>)
driver.get(<website login page URL>?username=' + username + '&password=' + password)
driver.get(<url of website page with data>?start_date=' + start_date + '&end_date=' + end_date +'&type=Excel')
soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup)
The result of the print(soup) is as follows:
<html style="height:100%">
<head>
<meta content="NOINDEX, NOFOLLOW" name="ROBOTS"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="initial-scale=1.0" name="viewport"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
</head>
<body> ... irrelevant ... </body></html>
Before any questions, I do not have much knowledge regarding robot or HTTP requests. My questions are:
When I run a headless driver as above, the scrape is blocked by robots. When I run a regular, non-headless driver where an automated browser opens, the scrape is successful. Why is this the case?
What is the best method to get around this? The scraping is legal and non-exploitative, as we have practically full access to the data we are scraping (we are a registered client). Will using the requests library solve this problem? Are there other ways of running headless web drivers that won't get blocked? Is there some parameter I can change that prevents the block?
How do I see the robots.txt file of a website?
You can use the following code to hide the webdriver flag:
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
Also, add these to your Chrome options:
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option('useAutomationExtension', False)
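For reference, here is a minimal sketch (the URL is a placeholder) that puts those pieces together in a headless Chrome session; whether it actually avoids the block depends on what the site checks:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
options.add_argument('--disable-blink-features')
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome(options=options)
driver.get('https://example.com/')  # placeholder URL

# Redefine navigator.webdriver so scripts that run after this call see undefined.
# Note: execute_script only affects the currently loaded document, so it has to
# be re-run after each navigation.
driver.execute_script(
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

print(driver.page_source[:500])
driver.quit()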

Is there any way to download the audio from a certain page

I am working on a Selenium script in Python and want to download the audio coming from a certain page.
The page looks like this (screenshot not shown):
The HTML code of the page:
<html>
<head>
    <meta name="viewport" content="width=device-width">
</head>
<body>
    <video controls="" autoplay="" name="media">
        <source src="https://website//id=47c484fc7f8f" type="audio/mp3">
    </video>
</body>
</html>
my code so far:
from seleniumwire import webdriver
import sys
from webdriver_manager.chrome import ChromeDriverManager
import time
import pyaudio
import wave
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
# for linux/Ubuntu only
#chrome_options.add_argument("--no-sandbox")
browser = webdriver.Chrome(ChromeDriverManager().install(), chrome_options=chrome_options)
browser.get("website")
search = browser.find_element_by_id("text-area")
search.clear()
text = input("text here : ")
search.send_keys(text)
#print(data)
time.sleep(2)
browser.find_element_by_id("btn").click()
# Access and print requests via the `requests` attribute
for request in browser.requests:
    if request.response and request.url.__contains__('website//id'):
        browser.get(request.url)
I am open to working with any language to achieve this goal.
You don't need Selenium for this; the requests library is enough. You must provide a unique identifier to your POST request as sessionID, so you can pick up the generated file in the next GET request.
Use the following snippet as an example; it saves the generated file under the provided sessionID name.
import requests

sessionID = '78aa8dd0-9529-11eb-a8b3-0242ac130003'
payload = {'ssmlText': '<prosody pitch=\"default\" rate=\"-0%\">Roses are red, violets are blue</prosody>', 'sessionID': sessionID}

r1 = requests.post("https://www.ibm.com/demos/live/tts-demo/api/tts/store", data = payload)
r1.raise_for_status()
print(r1.status_code, r1.reason)

tts_url = 'https://www.ibm.com/demos/live/tts-demo/api/tts/newSynthesize?voice=en-US_OliviaV3Voice&id=' + sessionID
try:
    r2 = requests.get(tts_url, timeout = 10, cookies = r1.cookies)
    print(r2.status_code, r2.reason)
    try:
        with open(sessionID + '.mp3', "w+b") as f:
            f.write(r2.content)
    except IOError:
        print("IOError: could not write a file")
except requests.exceptions.Timeout as err:
    print("Timeout: could not get response from the server")

Why can't the Requests library read the source code?

I've been writing a Python script for all the Natas challenges. So far, everything has gone smoothly.
In challenge natas22, there is nothing on the page, but it gives you the link to the source code. From the browser, I can reach the source code (which is PHP) and read it. But I cannot do it with my Python script, which is very weird, because I've done it in other challenges...
I also tried setting a user agent (an up-to-date Chrome browser); it did not work.
Here is the small code:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text)
Which returns:
<code><span style="color: #000000">
<br /></span>ml>id="viewsource"><a href="index-source.html">View sourcecode</a></div>nbsp;next level are:<br>";l.js"></script>
</code>
But in fact, it should have returned:
<? session_start();
if(array_key_exists("revelio", $_GET)) {
// only admins can reveal the password
if(!($_SESSION and array_key_exists("admin", $_SESSION) and $_SESSION["admin"] == 1)) {
header("Location: /");
} } ?>
<html> <head> <!-- This stuff in the header has nothing to do with the level --> <link rel="stylesheet" type="text/css" href="http://natas.labs.overthewire.org/css/level.css"> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/jquery-ui.css" /> <link rel="stylesheet" href="http://natas.labs.overthewire.org/css/wechall.css" /> <script src="http://natas.labs.overthewire.org/js/jquery-1.9.1.js"></script> <script src="http://natas.labs.overthewire.org/js/jquery-ui.js"></script> <script src=http://natas.labs.overthewire.org/js/wechall-data.js></script><script src="http://natas.labs.overthewire.org/js/wechall.js"></script> <script>var wechallinfo = { "level": "natas22", "pass": "<censored>" };</script></head> <body> <h1>natas22</h1> <div id="content">
<?
if(array_key_exists("revelio", $_GET)) {
print "You are an admin. The credentials for the next level are:<br>";
print "<pre>Username: natas23\n";
print "Password: <censored></pre>";
} ?>
<div id="viewsource">View sourcecode</div> </div> </body> </html>
Why is it behaving like this? I'm very curious and couldn't figure it out.
If you want to try the URL from the browser:
url: http://natas22.natas.labs.overthewire.org/index-source.html
Username: natas22
Password: chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ
Your code seems to be fine. The source code uses \r instead of \n, so most of it is hidden in a terminal.
You can see this by printing response.content instead of response.text:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content)
Try:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.text.replace('\r', '\n'))
This also works:
import requests
user = 'natas22'
passw = 'chG9fbe1Tq2eWVMgjYYD1MsfIvN461kJ'
url = 'http://%s.natas.labs.overthewire.org/' % user
response = requests.get('http://natas22.natas.labs.overthewire.org/index-source.html', auth=(user, passw))
print(response.content.decode('utf8').replace('\r', '\n'))
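For anyone curious why the terminal output looks mangled, here is a tiny illustration of the carriage-return effect (safe to run anywhere):

text = "line one\rline two\rline three"
print(text)                      # '\r' moves the cursor back to the start, so lines overwrite each other
print(text.replace('\r', '\n'))  # each line now prints on its own row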

Selenium - How to debug timeout for iframe > input field?

I'm stuck writing a Selenium WebDriver script for the Instagram web login. I think I switched to the appropriate iframe, but WebDriver keeps timing out when it should locate the username input field.
Relevant source from Instagram site:
https://instagram.com/accounts/login/
<iframe class="hiFrame" data-reactid=".0.0.0.1.0.1.0.0.$frame" src="https://instagram.com/accounts/login/ajax/?targetOrigin=https%3A%2F%2Finstagram.com" scrolling="no" seamless="">
<!DOCTYPE html>
<html class="hl-en not-logged-in " lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<body class="LoginFormChrome ">
<div class="LoginFormPage" data-reactid=".0">
<form data-reactid=".0.0">
<p class="lfField" data-reactid=".0.0.0">
<label class="lfFieldLabel" data-reactid=".0.0.0.0">
<input class="lfFieldInput" type="text" data-reactid=".0.0.0.1" value="" autocorrect="false" autocapitalize="false" maxlength="30" name="username">
</p>
Source from Selenium script:
login_url = 'https://instagram.com/accounts/login/'
profile_url = '<path_firefix_profile>'
user = '<user_name>'

#login
my_profile = FirefoxProfile(profile_url)
self.driver = webdriver.Firefox(my_profile)
self.driver.get(login_url)
self.driver.implicitly_wait(10)

my_iframe = self.driver.find_element_by_css_selector("iframe.hiFrame")
#my_iframe = self.driver.find_element_by_css_selector("iframe:nth-of-type(1)")
#my_iframe = self.driver.find_element_by_tag_name("iframe")
self.driver.switch_to_frame(my_iframe)

try:
    element = WebDriverWait(self.driver, 30).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input[name='username']")))
    user_input = self.driver.find_element_by_css_selector("input[name='username']")
    user_input.send_keys(user)
finally:
    print('user name input appeared')
Results:
This error results from WebDriver:
File "instagram_firefox.py", line 51, in setUp
element = WebDriverWait(self.driver, 45).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input[name='username']")))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/selenium/webdriver/support/wait.py", line 71, in until
raise TimeoutException(message)
I tried to verify that the css selector for the input field was correct. On the page, https://instagram.com/accounts/login/, FireFox FireFinder does not recognize the css selector that I used. But if I open another tab with the source of the iframe, https://instagram.com/accounts/login/ajax/?targetOrigin=https%3A%2F%2Finstagram.com, then Firefinder recognizes the css selector that I used. Does this mean I need to manually get the url of the iframe source or should that be done automatically when WebDriver switches to the iframe?
We should first wait for the spinner div element to disappear; then we can retrieve the iframe you need:
user = "user"
self.driver.get("https://instagram.com/accounts/login/")
#Wait for spinner to disappear
WebDriverWait(self.driver, 10).until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "div.liSpinnerLayer")))
#Get iframe and switch to it
my_iframe = self.driver.find_element_by_css_selector("iframe.hiFrame")
self.driver.switch_to_frame(my_iframe)
element = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input[name='username']")))
element.send_keys(user)
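One small, hedged addition: after typing into the field inside the iframe, it is usually worth switching back to the top-level document before locating anything else on the page, e.g.:

# Return to the main document once you are done inside the iframe,
# otherwise later find_element calls keep searching inside the frame.
self.driver.switch_to.default_content()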
