I have been trying to scrape some information from a website with Python. I have tried both requests and selenium to get the page's HTML, but I always get the HTML below instead. I guess the website detects that the request does not come from an actual person and therefore denies access. Is there any way to solve this issue and get the real HTML of this website?
<html lang="en"><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Access to this page has been denied.</title>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:300" rel="stylesheet">
<style>
html, body {
margin: 0;
padding: 0;
font-family: 'Open Sans', sans-serif;
color: #000;
}
a {
color: #c5c5c5;
text-decoration: none;
}
.container {
align-items: center;
display: flex;
flex: 1;
justify-content: space-between;
flex-direction: column;
height: 100%;
}
.container > div {
width: 100%;
display: flex;
justify-content: center;
}
.container > div > div {
display: flex;
width: 80%;
}
.customer-logo-wrapper {
padding-top: 2rem;
flex-grow: 0;
background-color: #fff;
visibility: visible;
}
.customer-logo {
border-bottom: 1px solid #000;
}
.customer-logo > img {
padding-bottom: 1rem;
max-height: 50px;
max-width: 100%;
}
.page-title-wrapper {
flex-grow: 2;
}
.page-title {
flex-direction: column-reverse;
}
.content-wrapper {
flex-grow: 5;
}
.content {
flex-direction: column;
}
.page-footer-wrapper {
align-items: center;
flex-grow: 0.2;
background-color: #000;
color: #c5c5c5;
font-size: 70%;
}
@media (min-width: 768px) {
html, body {
height: 100%;
}
}
</style>
<!-- Custom CSS -->
<link rel="stylesheet" type="text/css" href="https://d33a4decm84gsn.cloudfront.net/static/partners/perimeterx/perimeterx.css">
<script type="text/javascript" async="" src="https://www.gstatic.com/recaptcha/releases/zItNOfzbrqVGbb4QFYpPpcrw/recaptcha__es.js"></script><script src="/Z5wgH7n9/captcha/captcha.js?a=c&u=ad14b320-8116-11ea-9d99-a1ff7eeb44e0&v=&m=0"></script><script src="https://www.recaptcha.net/recaptcha/api.js?hl=es-ES"></script><script src="/Z5wgH7n9/init.js"></script><a tabindex="-1" aria-hidden="true" href="/colleges/yale-university/?_pxhc=1587174500133" rel="nofollow" target="_blank" style="width: 0px; height: 0px; font-size: 0px; line-height: 0;"></a></head>
<body>
<section class="container">
<div class="customer-logo-wrapper">
<div class="customer-logo">
<img src="https://www.niche.com/about/wp-content/themes/niche-about/images/about-home/stacked-green.svg" alt="Logo">
</div>
</div>
<div class="page-title-wrapper">
<div class="page-title">
<h1>Please verify you are a human</h1>
</div>
</div>
<div class="content-wrapper">
<div class="content">
<div id="px-captcha"><div class="g-recaptcha" data-sitekey="6Lcj-R8TAAAAABs3FrRPuQhLMbp5QrHsHufzLf7b" data-callback="handleCaptcha" data-theme="dark"><div style="width: 304px; height: 78px;"><div><iframe src="https://www.google.com/recaptcha/api2/anchor?ar=1&k=6Lcj-R8TAAAAABs3FrRPuQhLMbp5QrHsHufzLf7b&co=aHR0cHM6Ly93d3cubmljaGUuY29tOjQ0Mw..&hl=es&v=zItNOfzbrqVGbb4QFYpPpcrw&theme=dark&size=normal&cb=19z4nanjwlu" width="304" height="78" role="presentation" name="a-s7me84fdbal4" frameborder="0" scrolling="no" sandbox="allow-forms allow-popups allow-same-origin allow-scripts allow-top-navigation allow-modals allow-popups-to-escape-sandbox"></iframe></div><textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid rgb(193, 193, 193); margin: 10px 25px; padding: 0px; resize: none; display: none;"></textarea></div><iframe style="display: none;"></iframe></div></div>
<p>
Access to this page has been denied because we believe you are using automation tools to browse the
website.
</p>
<p>
This may happen as a result of the following:
</p>
<ul>
<li>
Javascript is disabled or blocked by an extension (ad blockers for example)
</li>
<li>
Your browser does not support cookies
</li>
</ul>
<p>
Please make sure that Javascript and cookies are enabled on your browser and that you are not blocking
them from loading.
</p>
<p>
Reference ID: #ad14b320-8116-11ea-9d99-a1ff7eeb44e0
</p>
</div>
</div>
<div class="page-footer-wrapper">
<div class="page-footer">
<p>
Powered by
PerimeterX
, Inc.
</p>
</div>
</div>
</section>
<!-- Px -->
<script>
window._pxAppId = 'PXZ5wgH7n9';
window._pxJsClientSrc = '/Z5wgH7n9/init.js';
window._pxFirstPartyEnabled = true;
window._pxVid = '';
window._pxUuid = 'ad14b320-8116-11ea-9d99-a1ff7eeb44e0';
window._pxHostUrl = '/Z5wgH7n9/xhr';
</script>
<script>
var s = document.createElement('script');
s.src = '/Z5wgH7n9/captcha/captcha.js?a=c&u=ad14b320-8116-11ea-9d99-a1ff7eeb44e0&v=&m=0';
var p = document.getElementsByTagName('head')[0];
p.insertBefore(s, null);
if (true) {
s.onerror = function () {
s = document.createElement('script');
var suffixIndex = '/Z5wgH7n9/captcha/captcha.js?a=c&u=ad14b320-8116-11ea-9d99-a1ff7eeb44e0&v=&m=0'.indexOf('captcha.js');
var temperedBlockScript = '/Z5wgH7n9/captcha/captcha.js?a=c&u=ad14b320-8116-11ea-9d99-a1ff7eeb44e0&v=&m=0'.substring(suffixIndex);
s.src = '//captcha.px-cdn.net/PXZ5wgH7n9/' + temperedBlockScript;
p.parentNode.insertBefore(s, p);
};
}
</script>
<!-- Custom Script -->
</body></html>
It's clear the website is able to recognize your bot. Since I don't know which website you are trying to scrape, I can't tell whether this particular method will work.
Try changing the user agent. A browser driven by chromedriver can report a different user agent from a regular Chrome browser; in headless mode, for example, the string contains "HeadlessChrome".
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
options = Options()
options.add_argument("user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36")
# Assumes chromedriver is on your PATH; chrome_options= is deprecated in favour of options=.
driver = webdriver.Chrome(options=options)
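Before tuning the browser further, it can help to check programmatically whether you received the block page or the real content. Here is a minimal sketch using only the standard library. It assumes the block page can be recognised by the title and the px-captcha element seen in the HTML from the question; looks_blocked and build_request are hypothetical helper names, and the user agent string is simply the one used above. A browser-like user agent on its own may not be enough against a bot-detection service.

```python
import urllib.request

# Markers taken from the block page shown in the question; other sites
# (or other versions of this page) may use different text.
BLOCK_MARKERS = (
    "Access to this page has been denied",
    'id="px-captcha"',
)

BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
)


def looks_blocked(html: str) -> bool:
    """Return True if the HTML contains any of the block-page markers."""
    return any(marker in html for marker in BLOCK_MARKERS)


def build_request(url: str) -> urllib.request.Request:
    """Build a request object that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


if __name__ == "__main__":
    sample = "<title>Access to this page has been denied.</title>"
    print(looks_blocked(sample))                 # True
    print(looks_blocked("<title>Yale</title>"))  # False
```

Passing the result of build_request to urllib.request.urlopen performs the actual fetch. With selenium, the same check can be applied to driver.page_source.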
I am trying to build a web app which extracts and inputs information about different city buildings using Python, Flask and HTML. At the moment I want to create a button which, when clicked, gives me a list of all the buildings available in the database. The database is populated and stored in PostgreSQL. The problem is that the button is created and displayed but the list is not. I used the second answer on this link as a reference.
My Python code looks as follows:
import flask
import psycopg2 as pg  # assumption: "pg" is psycopg2; adjust to the driver you actually use

app = flask.Flask(__name__)

@app.route('/')
def home():
    return flask.render_template('interface.html')

@app.route('/GetBuildingsLists', methods = ['GET','POST'])
def GetBuildingsLists():
    print('Connecting to the PostgreSQL database...')
    db = pg.connect(
        host="****",
        database="****",
        user ="****",
        password="*****")
    db_cursor = db.cursor()
    print('PostgreSQL database version:')
    db_cursor.execute('SELECT version()')
    q = ("SELECT building_id FROM table1")
    db_cursor.execute(q)
    buildings = db_cursor.fetchall()
    unique_buildings = list(dict.fromkeys(buildings))
    db_cursor.close()
    #print(unique_buildings)
    return flask.render_template('interfaceLists.html', unique_buildings = unique_buildings)

if __name__ == '__main__':
    app.run()
Meanwhile, in a templates folder I have interfaceLists.html, as below:
<html>
<head>
<title>Results</title>
<style>
.links-unordered {
display: inline-block;
position: relative;
}
.links-unordered {
margin-top: 20px;
min-height: 30px;
}
.links-unordered .toggle-button {
text-decoration: none;
padding: 12px 16px 12px 16px;
transition: 0.2s;
border: 1px solid black;
}
.links-unordered .toggle-button:hover,
.links-unordered .toggle-button:active,
.links-unordered .toggle-button:focus,
.links-unordered .toggle-button:visited {
text-decoration: none;
color: black;
}
.links-unordered .toggle-button:hover {
border-width: 2px;
}
.links-unordered ul {
position: absolute;
top: 10px;
margin-top: 25px;
padding-inline-start: 20px;
}
.links-unordered ul li {
line-height: 25px;
padding-left: 15px;
}
.links-unordered a {
text-decoration: none;
color: black;
}
</style>
</head>
<body>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script src="demo_script_src.js"></script>
<div class="links-unordered">
<a class="toggle-button" href="#">Buildings</a>
{% for building in unique_buildings %}
<ul style="display:none;">
<li>building[0]</li>
</ul>
{% endfor %}
</div>
</body>
</html>
I take the results from PostgreSQL using a query in the Python code and store them in the list called unique_buildings. I then tried to display the results as an unordered list, but the list is not displayed.
The .js file mentioned in the HTML file performs the animation when the button is used, and it looks like this:
$(document).ready(function() {
$(".toggle-button").click(function() {
$(this).parent().find("ul").slideToggle(function() {
// Animation complete.
});
});
})
Can someone please help me by telling what might be wrong with my script and why what I want is not working? Thank you!!
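A sketch of the data side, in case it helps narrow things down: fetchall() returns a list of row tuples, so each entry of unique_buildings is a one-element tuple such as (101,). Flattening the rows in Python lets the template loop over plain values. The sample rows and the helper name unique_building_ids are made up for illustration. Separately, note that in the template above building[0] sits outside {{ ... }}, so Jinja emits it as literal text rather than as the value.

```python
def unique_building_ids(rows):
    """Flatten single-column rows and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(row[0] for row in rows))


if __name__ == "__main__":
    # Hypothetical result of db_cursor.fetchall() for
    # "SELECT building_id FROM table1".
    rows = [(101,), (102,), (101,), (103,)]
    print(unique_building_ids(rows))  # [101, 102, 103]
```

With a flat list, the loop body in the template can be written as <li>{{ building }}</li>, with the <ul> placed outside the {% for %} loop so that one list is generated instead of one list per building.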
I have a Flask app that executes scripts using exec(script_name, globals()) and runs on Google Cloud Run in a Docker container. All my scripts are in Google Cloud Storage, so I use the gcsfs module to read the scripts from GCS and execute them.
For example:
exec(gcs_file_system.open(<script_from_cloud>).read(), globals())
The problem I am facing is that whenever a script imports a new package, I first need to install that package through my Flask app using the exec() function. So far, I have tried:
1. exec("os.system('pip install package_name')", globals())
2. exec("subprocess.check_call([sys.executable, '-m', 'pip', 'install', package_name])", globals())
3. import pip
pip.main(['install', package_name])
4. import pip
exec("pip.main(['install', package_name])", globals())
5. os.system('pip install package_name')
6. subprocess.check_call([sys.executable, '-m', 'pip', 'install', package_name])
All of these were tried in a script, script.py, which I call using
exec(gcs_file_system.open('bucket_name..../script.py').read())
Every time I try any of these, I either get an upstream disconnect error or the script simply fails. I really need some help or suggestions on how to install a package through a Flask app that is running in the cloud (Google Cloud Run).
A package can be installed by defining a separate route function just for installing packages.
It works when the statement below is placed in its own route function, rather than inside another script that is itself run through exec() from a route function.
exec("os.system('pip install " + str(packages) + "')", globals())
Server.py
@app.route('/add_package', methods=['GET', 'POST'])
def add_package():
    return render_template('add_package.html')

@app.route('/add_package_success', methods=['GET', 'POST'])
def add_package_success():
    code_content = request.form.get("code_editor", "").split("\n")
    for empties in range(code_content.count("")):
        code_content.remove("")
    code_content = [packages.strip().replace("\n", "") for packages in code_content]
    for packages in code_content:
        exec("os.system('pip install " + str(packages) + "')", globals())
    return render_template('add_package_success.html')
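Building the shell command by string concatenation means that whatever is typed into the form is handed to the shell. A more defensive sketch of the same idea, assuming the form still posts one package name per line: validate each line, then call pip through subprocess.run with an argument list. The helper names parse_package_names and install_packages and the name pattern are illustrative assumptions (the pattern does not cover version specifiers or extras); they are not part of the original app.

```python
import re
import subprocess
import sys

# Conservative pattern for plain package names (letters, digits, ".", "_", "-").
# This is an assumption for the sketch; it rejects version pins and extras.
NAME_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?$")


def parse_package_names(text):
    """Split textarea content into package names, skipping blank lines."""
    names = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if not NAME_RE.match(name):
            raise ValueError(f"invalid package name: {name!r}")
        names.append(name)
    return names


def install_packages(names):
    """Run 'python -m pip install <name>' for each name, without a shell."""
    for name in names:
        subprocess.run([sys.executable, "-m", "pip", "install", name], check=True)


if __name__ == "__main__":
    print(parse_package_names("requests\r\n\r\n  numpy  \n"))  # ['requests', 'numpy']
```

Inside the route, install_packages(parse_package_names(request.form.get("code_editor", ""))) would take the place of the loop. On Cloud Run, keep in mind that anything installed at runtime exists only in that container instance and is gone when the instance is replaced.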
add_package.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Add Packages</title>
<link rel="stylesheet" href="{{ url_for('static', filename='css/codemirror.css') }}">
<script type="text/javascript"
src="{{ url_for('static', filename='codemirror.js') }}"></script>
<script type="text/javascript"
src="{{ url_for('static', filename='python.js') }}"></script>
<style>
.center {
width: 15%;
border: 2.5px solid red;
}
.header{
font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
font-weight: bold;
position: relative;
left: 10%;
}
.body_text{
font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
}
.info_div{
height: 180px;
width: 50%;
position: absolute;
left: 50px;
display: none;
z-index:100;
}
.info{
height: 15px;
width: 15px;
background-color: yellow;
position: relative;
left: 30px;
border: solid red;
text-align: center;
font-weight: bold;
font-family: Arial, Helvetica, sans-serif;
}
.info:hover{
cursor: help;
}
.info:hover + .info_div{
display: block;
}
.submit_btn_2 {
color: white;
border: solid;
position: relative;
background-color: #003280;
width: 170px;
height: 30px;
}
.submit_btn_2:hover {
background-color: #575558;
cursor: pointer;
}
</style>
</head>
<body>
<form action="/add_package_success" method="post">
<div class="center">
<p class='header'>Add Packages</p><input class='info' value='?' readonly><br/><br/>
</div><br/>
<div>
<a class="body_text" style="border: 2px #020d36 solid;color: #3a18a5;text-align: left;">Enter the Packages name (one in each line): </a><br/><br/>
<textarea name="code_editor" id="code_editor"></textarea><br/><br/>
<button type="submit">Add</button><br/><br/>
<button formaction="/" style="left: 0px; top: 10px; width: 100px; height: 30px;" class = "submit_btn_2" type="submit"> Home </button>
</div>
</form>
</body>
<script>
var editor = CodeMirror.fromTextArea(document.getElementById("code_editor"), {
mode: {name: "python",
version: 3,
singleLineStringErrors: false},
lineNumbers: true,
indentUnit: 4,
matchBrackets: true
});
</script>
</html>
add_package_success.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Code Edited</title>
</head>
<body onload="load_function()">
</body>
<script>
function load_function(){
if(alert("Package Installed.\n\nPress OK to go HOME.")){
window.location.href = "{{ url_for('index') }}";
}
window.location.href = "{{ url_for('index') }}";
}
</script>
</html>
index.html
<form method='post' action='/'>
<button class = "submit_btn_2" formaction="/add_package" id='add_package' style="position: absolute; top: 210px; left:1230px; height: 30px;" name="add_package">Install Packages</button>
</form>
The cards run off the screen horizontally; I want them to continue vertically when they no longer fit.
This is a Django project, so I am looping over a div, but the cards run off the screen horizontally. I want them to continue vertically once a card would go off the screen.
PYTHON CODE [HTML]
`
<div class="container">
{% for project in projects %}
<div class="items">
<h2>{{project}}</h2>
<div class="purl">
<p>{{project.description}}</p>
</div>
<div class="purl1">
<a href='{{project.url}}'>{{project.url}}</a>
</div>
<div class='buttons'>
<a class='btn' href='{% url "update" project.id %}'>Edit</a>
<a style='background-color:#f25042' class='btn' href='{% url "delete" project.id %}'>Delete</a>
</div>
</div>
{% endfor %}
</div>
`
CSS
`
.container{
display:flex;
}
.purl{
text-overflow: hidden;
padding-right:.1em;
}
.purl1{
max-width:20em;
margin-right:14px;
}
.items{
border-radius: 8px;
padding:1em 1.5em .5em 1.4em;
box-shadow: 0px 0px 17px 0px black;
margin:5% 1em 2em .5em;
width:100%;
}
.items h2{
color:#fffffe;
}
.items p{
color:#94a1b2;
}
a{
text-decoration: none;
color:#7f5af0;
}
a:hover{
text-decoration: none;
color:#fffffe;
}
.buttons{
display:flex;
}
.btn{
float:left;
text-align: center;
font-size:17px;
font-weight:bold;
padding:.5em;
margin:2em 1em .5em 1em ;
width:3.5em;
border:none;
border-radius: 8px;
background-color: #2cb67d;
color:#fffffe;
font-family: 'Noto Sans', sans-serif;
}
.btn:hover{
background-color: #3dd696;
cursor: pointer;
}
@media only screen and (max-width: 800px) {
.container{
margin-left:10%;
display:block;
}
.items{
width:60%;
}
}
`
This is the HTML and CSS of the code.
The HTML is generated in a loop by Python (Django),
and I want to make the cards responsive.
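Use flex-wrap: wrap; so that the flex items wrap onto a new row when they no longer fit. Note that .items is set to width:100%, so with wrapping enabled each card will take a full row; give the cards a smaller width or a flex-basis if several should sit side by side.
Add the flex-wrap declaration to your container element in your CSS: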
.container{
display:flex;
flex-wrap: wrap;
}
This has nothing to do with Django or Python; it can be achieved purely in CSS, whatever the backend is.
Use the CSS flexbox layout in your media queries as well, keeping display:flex instead of switching to display:block.
Then set a different flex-direction per media query / device to lay the inner elements out horizontally (row) or vertically (column).
More info: https://css-tricks.com/snippets/css/a-guide-to-flexbox/
I've been looking for answers and got lost among all the different ones here.
I am building a tool where you can enter a funding ID (included in papers); Python code then collects papers from different websites and gives each one a relevance score.
My problem: I built a website with HTML/CSS, and now I want to pass the funding ID entered in one of its forms on to my Python program. I know I can use the action attribute of the form to connect my HTML file to a different file. I have read a lot about CGI, servers, Apache, etc., and others mention Flask. I just want a simple way to exchange information between my HTML file and my Python code. How can I display the information returned by my code on an HTML page?
Thank you!
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body {
font-family: "Verdana", sans-serif;
}
.header {
padding: 60px;
text-align: center;
background: #0C0040;
color: white;
font-size: 30px;
}
.text_header {
margin-left: 160px;
letter-spacing: 6px;
}
.sub_header {
color: #00BFFF;
letter-spacing: 4px;
}
.sidenav {
height: 100%;
width: 160px;
position: fixed;
z-index: 1;
top: 0;
left: 0;
background-color: #474e5d;
overflow-x: hidden;
padding-top: 20px;
}
.sidenav a {
padding: 6px 8px 6px 16px;
text-decoration: none;
font-size: 25px;
color: #00BFFF;
display: block;
}
.sidenav a:hover {
color: #f1f1f1;
}
.main {
margin-left: 160px; /* Same as the width of the sidenav */
font-size: 28px; /* Increased text to enable scrolling */
padding: 0px 10px;
text-align: center;
}
.title{
color: #0C0040;
text-align: center;
}
@media screen and (max-height: 450px) {
.sidenav {padding-top: 15px;}
.sidenav a {font-size: 18px;}
}
</style>
</head>
<body>
<div class="header">
<div class="text_header">
<h1>Looking for more?</h1>
<div class="sub_header">
<p>find equally relevant papers from the same funder</p>
</div>
</div>
</div>
<!--- so far only the About Us page is linked --->
<div class="sidenav">
About Us
Services
Contact
</div>
<div class="main">
<div class="title"> <h2>Insert your funding ID here:</h2></div>
<!--- this is the form where the input is put in--->
<div class="input">
<form name="search" action="../Python/example.py" method="post">
<label for="input">enter in correct format:</label>
<input type="text" name="input" id="input">
<input type="submit" value="Submit">
</form>
</div>
</div>
</body>
</html>
WARNING: This is only for amusement purposes--do not use in production.
Sticking to the Python standard library, here's an example to get you started.
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

class WebServer(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the URL-encoded form body and pull out the "input" field.
        content_length = int(self.headers['Content-Length'])
        form = parse_qs(self.rfile.read(content_length).decode('utf-8'))
        funding_id = form.get('input', [''])[0]
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        # Escape the value before echoing it back into the page.
        self.wfile.write(bytes(f'<html><head><title>Funding ID Form Submit</title></head><body><p><b>Funding ID:</b> {escape(funding_id)}</p></body></html>', "utf-8"))

ws = HTTPServer(('localhost', 8080), WebServer)
ws.serve_forever()
You will need to change the action attribute of your HTML <form> to action="http://localhost:8080/" to see this in action.
Similarly, you can serve normal page requests by implementing the do_GET method for the WebServer class.
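As a sketch of that, the handler below adds a do_GET that serves a small form and a do_POST that echoes the submitted value. It is self-contained rather than wired to your page: the class name FormServer, the run helper, the inline form markup and the port are assumptions for illustration. It decodes the form body with urllib.parse.parse_qs and escapes the value with html.escape.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Minimal stand-in for the real page; only the form matters here.
FORM_PAGE = """<html><body>
<form method="post" action="/">
  <label for="input">Funding ID:</label>
  <input type="text" name="input" id="input">
  <input type="submit" value="Submit">
</form>
</body></html>"""


class FormServer(BaseHTTPRequestHandler):
    def _send_html(self, body):
        """Send a 200 response with an HTML body."""
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Any GET request receives the form page.
        self._send_html(FORM_PAGE)

    def do_POST(self):
        # Decode the URL-encoded form body and read the "input" field.
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        funding_id = form.get("input", [""])[0]
        # Escape the value so it is shown as text, not interpreted as HTML.
        self._send_html(
            "<html><body><p><b>Funding ID:</b> "
            f"{escape(funding_id)}</p></body></html>"
        )


def run(port=8080):
    """Serve on localhost until interrupted."""
    HTTPServer(("localhost", port), FormServer).serve_forever()
```

Call run() and open http://localhost:8080/ in a browser to see the form; submitting it posts back to the same server.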
As already stated, this is for amusement/learning purposes. If you're putting something into production, look into a framework such as Flask or Django.
I've the following code in my webpage (Python/Django framework) to enable a video to play in the background.
HTML
<div class="video-container">
<div class="video-container-bg">
<video playsinline autoplay muted loop poster="{{page.image.url}}" id="bgvid">
<source src="{{page.video.url}}" type="video/mp4">
<source src="{{page.mac_video.url}}" type="video/webm">
</video>
<div class="container">
<div class="row">
<div class="col-sm-12 col-md-8">
<div class="animation-element bounce-up">
<h1 class="page-title">{{page.page_title}}</h1>
<p class="strapeline">{{page.strapline}}</p>
<a class="butt" href="#about-us">Learn More</a>
</div>
</div>
</div>
</div>
</div>
</div>
CSS
video#bgvid {
position: absolute;
top: 50%;
left: 50%;
min-width: 100%;
min-height:100%;
overflow: hidden !important;
z-index: -100;
-ms-transform: translateX(-50%) translateY(-50%);
-moz-transform: translateX(-50%) translateY(-50%);
-webkit-transform: translateX(-50%) translateY(-50%);
transform: translateX(-50%) translateY(-50%);
background: url() no-repeat;
background-size: 100%;
}
.video-container {
min-height: calc(100vh - 75px);
overflow: hidden !important;
position: relative;
}
.video-container-bg {
padding-top: 25vh;
color: #fff;
}
It works fine on everything except Safari, where nothing plays. Why not? Is it something Apple has deliberately blocked? In fact, when I run Safari on Windows, it tells me it cannot play HTML5 video. Is that right?
I solved this by re-saving the mp4 files in a lossless state. It now seems to work, though I have no idea why.