I have just started learning web scraping with Selenium and mechanize, with great results. I was wondering whether it is possible to scrape a Python CGI script from a site as well, so that I can replicate the site's functionality offline.
Here is an example script.
http://www.tutorialspoint.com/cgi-bin/hello.py
When I attempt to scrape this file, I get the HTML output of the script instead of the script itself:
<html>
<head>
<title>Hello Word - First CGI Program</title>
</head>
<body>
<h2>Hello Word! This is my first CGI program</h2>
</body>
</html>
The details of the python script can be found here:
http://www.tutorialspoint.com/python/python_cgi_programming.htm
If you can provide any insight I would be extremely grateful.
Thanks
Each site consists of two parts: the back end and the front end.
"Back end" means the server side, usually written in a language such as PHP, Python, ASP or JSP.
"Front end" means the client side: HTML, JavaScript and CSS.
As a visitor, you see only the front end, and that is what you scrape. You have no access to the back end: the server runs the script and sends you only its output.
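A minimal sketch of why the scraper only ever sees the output, using nothing but the standard library. The handler, the `hello_page` function and the `/hello.py` path are made up for illustration; they are not the code behind the tutorialspoint URL.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def hello_page():
    # Back end: this source code stays on the server.
    return "<html><body><h2>Hello World!</h2></body></html>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The server runs the function and sends only what it returns.
        body = hello_page().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_from_local_server():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/hello.py" % server.server_address[1]
        # An empty ProxyHandler avoids routing the local request through a proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_from_local_server()` plays the role of the scraper: it receives the generated HTML, and nothing in the response contains the Python source of `hello_page`.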
I am trying to get an HTML page to run a Python script I wrote that generates a graph and saves it in a local directory. I would like to click a single button and have the Python code run, generate the graph and save it in a folder, so that the HTML page can then embed it. My issue is that this isn't working: the code is not run and no new graph is generated. Both the HTML and Python file (Create_Chart.py) are in the same folder.
Python code
globals().clear()
import numpy as np
import matplotlib.pyplot as plt
import os
rand=np.random.normal(100,1,size=[100,1])
chart=plt.plot(rand)
plt.savefig(r'C:\Users\...\example_chart.png')
HTML/Javascript code
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<input type="button" id='script' name="scriptbutton" value=" Run Script " onclick="goPython()">
<script src="http://code.jquery.com/jquery-3.3.1.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<script>
function goPython(){
$.ajax({url: "Create_Chart.py",context: document.body}).done(function() {alert('finished python script');});
}
</script>
</body>
</html>
I am trying to follow the example from this other Stack Overflow thread. What am I doing wrong?
How can I execute a python script from an html button?
The JavaScript library is loaded from the following link: 'http://code.jquery.com/jquery-3.3.1.min.js'. Do I need to install jQuery on my local computer?
You'll need web server software (such as nginx) that knows how to run Python scripts (for example as a WSGI application). The script itself then needs to understand HTTP: it receives a request and has to answer with an HTTP response. Requesting Create_Chart.py directly does not execute it.
Since you're asking at this level, that is a lot of concepts at once. I suggest you look into Django or FastAPI (Python web frameworks; google them). Both come with a built-in development web server and will get you started.
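As a rough sketch of what "a script that understands HTTP" can look like, here is a tiny WSGI application served by the standard library's `wsgiref`. The `/run` path, the `create_chart` placeholder and port 8000 are assumptions for illustration; the real plotting code from Create_Chart.py would go inside `create_chart`.

```python
from wsgiref.simple_server import make_server


def create_chart():
    # Hypothetical hook: the matplotlib code from Create_Chart.py would run here.
    return "example_chart.png"


def app(environ, start_response):
    # WSGI entry point: called once per HTTP request.
    if environ.get("PATH_INFO") == "/run":
        body = ("created " + create_chart()).encode("utf-8")
        status = "200 OK"
    else:
        body = b"not found"
        status = "404 Not Found"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def serve(port=8000):
    # Blocks until interrupted; intended for local testing only.
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

With `serve()` running, the button's `$.ajax` call would target `/run` rather than `Create_Chart.py`. The HTML page would normally be served from the same origin as well, since a page opened straight from disk may be blocked from making the request.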
I am used to using BeautifulSoup to scrape websites, but this website is different. When I call soup.prettify() I get back a lot of JavaScript code instead of the content. I want to scrape the data shown on the actual website (company name, telephone number etc.). Is there a way of scraping scripts such as Main.js to retrieve the data that is displayed on the website?
Clear version:
Code is:
<script src="/docs/Main.js" type="text/javascript" language="javascript"></script>
This script holds the text that is on the website. I would like to scrape this text, but it is populated by JavaScript rather than written in the HTML (which is what I have used BeautifulSoup for so far).
You're asking whether you can scrape text generated at runtime by JavaScript. The answer is: sort of.
You'd need to run some kind of headless browser, like PhantomJS, to let the JavaScript execute and populate the page. You'd then feed the HTML that the headless browser produces to BeautifulSoup to parse it.
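A hedged sketch of that two-step flow. It swaps in headless Chrome driven by Selenium instead of PhantomJS, and it assumes the third-party packages `selenium` and `beautifulsoup4` plus a Chrome installation are available; the function names are made up for illustration.

```python
def render_page(url):
    """Return the HTML of `url` after its JavaScript has run."""
    from selenium import webdriver  # third-party: pip install selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # page_source reflects the DOM after scripts have modified it.
        return driver.page_source
    finally:
        driver.quit()


def parse_rendered(html):
    """Parse already-rendered HTML with BeautifulSoup."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    return BeautifulSoup(html, "html.parser")
```

Usage would be along the lines of `soup = parse_rendered(render_page(url))`, followed by the usual `soup.find(...)` calls. Pages that load their data asynchronously may need an explicit wait before `page_source` is read.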
I've created a local HTTP web server that should be able to serve both HTML pages and Python scripts that generate HTML pages, but I keep getting a 501 error because I can't figure out how to get the web server to treat the script as code. Another note: I'm coding on Windows and I'm not using Cygwin (I don't know anything about Cygwin, so if it's the recommended method here, tips on how to get started with it would be appreciated).
Anyway, here's what I've got.
I have created a folder in C:\ called server_test, and inside this folder is where I have been putting all my relevant HTML and python files. I've been editing my .py files in PyDev in eclipse (C:\workspace) and then copying the files over and putting them in C:\server_test.
To get my server running, I navigate to \server_test in cmd, open python, then create a webserver using HTTPServer, CGIRequestHandler, the current directory, address 127.0.0.1, and a port of my choosing (usually 9090). Once I've done this, I can go to my web browser and type in the address for one of my HTML pages and it runs perfectly fine. However, I currently have an HTML page meant to call a python script (also located in \server_test) that will create another HTML page, but I can't get it to work.
My HTML code looks like this:
<html>
<title>Debug Page</title>
<body>
<h1>This is a test file. </h1>
<form method=POST action="my_code_2.py">
<P><input type=submit>
</form>
</body></html>
And then my 'my_code_2.py' looks like this:
#!C:\Python35-32\python.exe
import cgi
import cgitb; cgitb.enable()
print("Content-type: text/html\r\n\r\n")
print('<html>')
print('<head><title>This is a second test.</title></head>')
print('<body>')
print('<h1>This is a second test.</h1>')
print('</body>')
print('</html>')
From what I've read about shebang lines, it appears that native Windows doesn't support them. So how can I make sure my computer knows it's supposed to run the code as Python? At the moment, when I press the button on my first HTML page, the page http://127.0.0.1:port/my_code_2.py is merely a white page with my Python code printed on it.
Try changing the extension of your Python file to .cgi and see if that helps. You'll probably want to use something like bottle.py or Django, though, if you're running Python on a web server. Bottle is easier to learn but has fewer features.
http://bottlepy.org/docs/dev/index.html
https://www.djangoproject.com/
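One likely cause worth checking, assuming the server is built on `http.server.CGIHTTPRequestHandler`: that handler only executes scripts located under its `cgi_directories` (by default `/cgi-bin` and `/htbin`). Anything else is served as a plain file on GET, and a POST to it is answered with 501. The sketch below restates that setting explicitly; the class name `Handler` and the `serve` helper are illustrative, and note that this handler is deprecated in recent Python releases.

```python
from http.server import CGIHTTPRequestHandler, HTTPServer


class Handler(CGIHTTPRequestHandler):
    # Only scripts under these URL prefixes are executed; everything
    # else is treated as a static file.
    cgi_directories = ["/cgi-bin"]


def serve(port=9090):
    # Run from C:\server_test so that C:\server_test\cgi-bin is visible.
    # Blocks until interrupted.
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Under that assumption, my_code_2.py would be moved to C:\server_test\cgi-bin and the form would use `action="/cgi-bin/my_code_2.py"`. For `.py` files the handler starts the script with the running Python interpreter, so the shebang line should not matter on Windows.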
As mentioned here, I can embed Python code inside .html files using <% and %> tags. Just to try it, I wrote the code below in Notepad and saved it as a file named test.html:
<html>
<title>
</title>
<body>
<%print ("Hello")%>
</body>
When I double-click test.html, Chrome opens with the line below at the top:
<%print ("Hello")%>
What must I do to get 'Hello' as the output?
Note: "print" is just an example. What ways are there to import and run Python code in HTML files?
That page is about Karrigell, a Python web framework. You can only mix Python and HTML in one file like this if the file is processed on the server by a Python web framework such as web.py, Pylons, Django, and others.
Browsers can only execute JavaScript natively; other programming languages need special components to be executed by a browser. Opening test.html directly from disk never involves a Python interpreter, so the tag is shown as plain text.
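To illustrate the idea of server-side rendering in general, here is a small sketch that fills a placeholder in an HTML page with a value computed in Python before the page is sent. It uses the standard library's `string.Template` as a stand-in; this is not Karrigell's `<% %>` syntax, and the names `PAGE` and `render` are made up.

```python
from html import escape
from string import Template

# $message is replaced on the server, before the browser sees the page.
PAGE = Template("<html><body>$message</body></html>")


def render(message):
    # Escape the value so that it is shown as text, not parsed as markup.
    return PAGE.substitute(message=escape(message))
```

A framework does essentially this on every request: the Python code runs on the server, and the browser receives only the finished HTML.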
I have a bigger project to handle, so this is what I want to do:
I have a server running a MySQL database and an Apache web server. I save some machine information in the database and want to create a web app to see, e.g., whether a machine is running.
The web app should be responsive, i.e. its layout should adapt to the screen resolution of the device in use. This is important because the app will mainly be used from smartphones and tablets, but it should also work on a normal PC.
I wrote a Python program for my machine to collect the data, and another Python program on my server that receives the information and saves it in the database.
So my job now is to create the "responsive website" for my smartphone etc.
Then I want to broadcast this with my webserver.
Another point is that the web app should be built dynamically.
If I add another machine to my database, it should appear in my web app, be clickable, and then show the related information.
First I thought about doing this in HTML5 and CSS3, with the use of jQueryMobile.
But I never used javascript. I'm just experienced in the "old" HTML and CSS.
Is Django a better choice, since I'm quite experienced in Python?
Or do I need both perhaps?
I haven't worked with any web framework yet, so please help me choose.
Or do I need one at all?
It looks like your server layer, written in Python, is fine for collecting the machine information and storing it in the database.
To summarize, you now need:
a responsive web client
notification features
the ability to display new sets of HTML elements dynamically
Based on this, I doubt you will find a complete, ready-packaged solution. Django should offer these kinds of features, but it is not my preferred approach for such custom requirements.
If I had to do this, I would use:
NodeJS for serverside code managing notifications
AngularJS for clientside managing client (!) and clean dynamic DOM manipulation with directives.
CSS Framework like Foundation or Bootstrap where responsive is native
What I would do is :
Init Phase
install nodejs and yeoman
initialize an angular app
write basic nodeJS server with a basic HTTP service
test your HTTP service with curl & your app with chrome or FF
Integration Phase
write basic angular HTTP call to this service
add communication between Node and Python (see "Combining node.js and Python" or something similar)
Client & Look and feel phase
add CSS framework for responsive and use it (navbar, table...)
look at Angular directives, develop a directive for adding new DOM elements
Finish / Clean your code and rollout
My solution now is as follows:
I will use the bottle microframework for generating serverside dynamic html-pages on request.
This means I will have to reload the page every time I want to see new machine information, but for now that is enough for me.
Later I can add AJAX for live monitoring (I know this is javascript, I think I have to learn it anyway.)
Thanks for your solutions though.
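A minimal sketch of generating such a page on the server, assuming each machine is available as a dict with hypothetical `id`, `name` and `running` keys (for example, rows read from the MySQL database). The `/machine/<id>` link scheme is also an assumption.

```python
from html import escape


def render_machines(machines):
    # One list entry per machine; adding a row to the database adds an entry here.
    items = "\n".join(
        '<li><a href="/machine/%d">%s</a> (%s)</li>'
        % (m["id"], escape(m["name"]), "running" if m["running"] else "stopped")
        for m in machines
    )
    return "<html><body><ul>\n%s\n</ul></body></html>" % items
```

In Bottle, a string like this would typically be returned from a route handler, so every page reload reflects the current database contents.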
You can also use Bootstrap to make the website responsive.
Use something like the code below in the index.html of your Django templates.
<html>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
{% block head %}
:
:
{% endblock %}
{% block body %}
.
.
.
{% endblock %}
<script
src="https://code.jquery.com/jquery-3.2.1.min.js"
integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"
integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa"
crossorigin="anonymous"></script>
</html>