Execute Javascript function on website using Python - python

Is it possible to call a JavaScript function on a website that I'm scraping and save the result of the function?
I'm using Requests to establish a connection and save the pages I need, and BeautifulSoup to parse them and access certain parts.
There is one part that I'm not sure how to call, or whether it's even possible:
<tr class=TRDark>
<td width=100% colspan=3>
<a href="" onclick="OpenPayPlan('payplan.asp?app=******');return false;">
Betalingsplan
</a>
</td>
</tr>
This function will open a new window and calculate some data that I need. Is this possible to do with Python?
I cannot use Selenium or similar programs for this. This must be executed in the terminal and only the terminal.

You probably need a JavaScript interpreter with Python bindings. Once you have found one that fits your needs, its documentation will show you how to use it. One example is PyV8. Python itself does not include a JavaScript interpreter.
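
Before reaching for a full interpreter, it may be worth checking what the handler actually does. In the snippet from the question, the onclick only passes a relative URL to OpenPayPlan. If that function merely opens its argument in a new window (an assumption, since the question does not show its body), the URL can be pulled out of the attribute and requested directly. A minimal sketch; the page URL and the app value in the sample are hypothetical placeholders:

```python
import re
from urllib.parse import urljoin

# Matches the single-quoted argument of OpenPayPlan('...') inside an onclick attribute.
ONCLICK_URL = re.compile(r"OpenPayPlan\('([^']+)'\)")


def extract_popup_url(page_html, page_url):
    """Return the absolute URL passed to OpenPayPlan, or None if it is absent."""
    match = ONCLICK_URL.search(page_html)
    if match is None:
        return None
    # The argument is relative, so resolve it against the page it was found on.
    return urljoin(page_url, match.group(1))


if __name__ == "__main__":
    # Hypothetical sample markup and page URL, for illustration only.
    sample = (
        '<a href="" onclick="OpenPayPlan(\'payplan.asp?app=12345\');return false;">'
        "Betalingsplan</a>"
    )
    print(extract_popup_url(sample, "http://example.com/loans/overview.asp"))
```

The resulting URL could then be fetched with the same Requests session used for the rest of the scrape, so that any login cookies are sent along. If the popup page computes its data with further JavaScript, this shortcut will not be enough.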

Related

Is it possible to download the 'inspect element' data from a website?

I have been trying to access the 'inspect element' data from a certain website (the regular source code won't work for this). At first I tried rendering the JavaScript for the site. I've tried Selenium, pyppeteer, webbot, PhantomJS, and requests_html + BeautifulSoup. None of these worked. Would it be possible to simply copy and paste this data using Python?
The data I need is from https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 and looks like this:
<nav class="feature-list">
<span style="" id="ember683" class="flex-horizontal feature-list-item ember-view">
(all the spans in this nav)

"Clicking" button with requests

I have a small website where I want to fill in a form using the requests library. The problem is that I can't get to the next page after filling in the form data and hitting the button (Enter does not work).
The important thing is that I can't do it with a clicking bot of some kind. It needs to run without graphics.
info = {'name':'JohnJohn',
'message':'XXX',
'sign':"XXX",
'step':'1'}
The first three entries (name, message, sign) are the text areas, and step is, I think, the button.
r = requests.get(url)
r = requests.post(url, data=info)
print(r.text)
The form data looks like this when I send the request manually from Chrome:
name:JohnJohn
message:XXX
sign:XXX
step:1
The button element looks like this:
<td colspan="2" style="text-align: center;">
<input name="step" type="hidden" value="1">
<button id="button" type="button" onclick="myClick();"
style="background-color: #ef4023; width: 80px; font-face: times; font-size: 14pt;">
Wyƛlij
</button>
</td>
If I do this manually, the next page has the same address.
As you can see from the snippet you posted, clicking the button triggers some JavaScript code, namely a function called myClick().
It is not straightforward to click this button using Python's requests library. You might have more luck finding out what happens inside myClick(). My guess is that at some point a POST request is made to an HTTP endpoint. If you can figure that out, you can translate it into your Python code.
If that does not work, another option is to use something like Selenium/PhantomJS, which gives you a real, headless, scriptable browser. With such a tool you can actually fill out forms and click buttons. You can have a look at this SO answer, which shows how to use Selenium + PhantomJS from Python.
Please make sure not to abuse such methods by spamming forums or [insert illegal or otherwise abusive activity here].
In a situation like this, where you need to forge the request made by a scripted button, it may be easier not to guess at the JavaScript logic. Instead, perform a real click and look at the Network tab in Chrome DevTools, which shows the exact request that was made. That request can then easily be reproduced in Python.
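
Once the request has been identified in DevTools, reproducing it could look roughly like the sketch below. It assumes the button ends up sending an ordinary form-encoded POST with the four fields shown in the question; the URL is a hypothetical placeholder, and any extra headers or fields seen in DevTools would need to be added.

```python
import requests


def build_form_post(session, url, name, message, sign):
    """Prepare (without sending) the POST that the browser is assumed to make."""
    payload = {"name": name, "message": message, "sign": sign, "step": "1"}
    request = requests.Request("POST", url, data=payload)
    # prepare_request merges in cookies the session has collected so far.
    return session.prepare_request(request)


if __name__ == "__main__":
    url = "http://example.com/form"  # hypothetical; use the address shown in DevTools
    with requests.Session() as session:
        session.get(url, timeout=10)  # load the form first to pick up cookies
        prepared = build_form_post(session, url, "JohnJohn", "XXX", "XXX")
        response = session.send(prepared, timeout=10)
        print(response.status_code)
```

Preparing the request separately makes it easy to compare its body and headers with what DevTools shows before anything is sent.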

Can't scrape nested html using BeautifulSoup

I am interested in scraping "0.449" from the following source code at http://hdsc.nws.noaa.gov/hdsc/pfds/pfds_map_cont.html?Lat=33.146425&Lon=-87.5805543.
<td class="tblInner" id="0-0">
<div style="font-size:110%">
<b>0.449</b>
</div>
"(0.364-0.545)"
</td>
Using BeautifulSoup, I currently have written:
storm=soup.find("td",{"class":"tblInner","id":"0-0"})
which results in:
<td class="tblInner" id="0-0">-</td>
I am unsure of why everything nested within the td is not showing up. When I search the contents of the td, my result is simply "-". How can I scrape the value that I want from this code?
You are likely scraping a website that uses JavaScript to update the DOM after the initial load.
You have a couple of choices:
Find out where the JavaScript code that fills the HTML page gets its data from and call that instead. The data most likely comes from an API that you can call directly with curl. That's the best method 99% of the time.
Use a headless browser (zombie.js, ...) to retrieve the HTML after the JavaScript has changed it. Convenient and fast, but there are few tools in Python for this (search for "python headless browser").
Use Selenium or Splinter to remote-control a real browser (Chrome, Firefox, ...). It's convenient and works in Python, but slow as hell.
Edit:
I did not see that you posted the url you wanted to scrape.
In your particular case, the data you want comes from an AJAX call to this URL:
http://hdsc.nws.noaa.gov/cgi-bin/hdsc/new/cgi_readH5.py?lat=33.1464&lon=-87.5806&type=pf&data=depth&units=english&series=pds
You now only need to understand what each parameter does, and parse the output of that instead of writing an HTML scraper.
Please excuse the lack of error checking and modularity, but this should get you what you need, based on @Eloims' observation:
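import ast
import re
import requests
url = 'http://hdsc.nws.noaa.gov/cgi-bin/hdsc/new/cgi_readH5.py?lat=33.1464&lon=-87.5806&type=pf&data=depth&units=english&series=pds'
r = requests.get(url)
response = r.text
# The response contains an assignment of the form "quantiles = [...];"
coord_list_text = re.search(r'quantiles = (.*);', response)
# literal_eval parses the list literal without running arbitrary code, unlike eval
coord_list = ast.literal_eval(coord_list_text.group(1))
print(coord_list[0][0])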

omegle lxml scrape not working

So I'm performing a scrape of omegle trying to scrape the users online.
This is the HTML code:
<div id="onlinecount">
<strong>
30,000+
</strong>
</div>
Now I would presume that with lxml the XPath would be //div[@id="onlinecount"] to scrape any text within the div. I want to get the numbers from the tags, but when I try to scrape this, I just end up with an empty list.
Here's my relevant code:
print("\n Grabbing users online now from", self.website)
site = requests.get(self.website)
tree = html.fromstring(site.text)
users = tree.xpath('//div[@id="onlinecount"]')
Note that the self.website variable is just http://www.omegle.com
Any ideas what I'm doing wrong? Note I can scrape other parts just not the number of online users.
I ended up using a different set of code which I learned from a friend.
Here's my full code for anyone interested.
http://pastebin.com/u1kTLZtJ
When you send a GET request to "http://www.omegle.com" using the requests module, there is no "onlinecount" content in site.text. The reason is that this part is rendered by JavaScript. You should use a library that can execute the JavaScript and give you the final HTML as rendered in a browser. One such third-party library is Selenium: http://selenium-python.readthedocs.org/. The only downside is that it opens a real web browser.
Below is working code using Selenium:
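from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get("http://www.omegle.com")
# find_element(By..., ...) replaces the find_element_by_* helpers removed in Selenium 4
element = browser.find_element(By.ID, "onlinecount")
onlinecount = element.find_element(By.TAG_NAME, "strong")
print(onlinecount.text)
browser.quit()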
You can also send a GET request to http://front1.omegle.com/status, which returns the count of online users and other details as JSON.
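
A rough sketch of that approach follows. The key name "count" is an assumption about the JSON layout, not something confirmed in this thread, and the endpoint may no longer respond, so inspect the actual payload before relying on it.

```python
import json

import requests

STATUS_URL = "http://front1.omegle.com/status"


def online_count(status_json, key="count"):
    """Pull the user count out of the status payload; the key name is assumed."""
    data = json.loads(status_json)
    return data.get(key)


if __name__ == "__main__":
    response = requests.get(STATUS_URL, timeout=10)
    response.raise_for_status()
    print(online_count(response.text))
```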
I have looked into this a bit, and that particular part of the page is not static markup but is generated by JavaScript.
Here is the source (this is what the requests library returns in your program):
<div id="onlinecount"></div>
<script>
if (IS_MOBILE) {
$('sharebuttons').dispose();
$('onlinecount').dispose();
}
</script>
</div>
As you can see, as far as lxml is concerned the onlinecount div is empty; there is nothing but a script next to it.
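
The difference can be reproduced offline. The sketch below runs the same XPath against two hypothetical page sources: one shaped like the raw response above and one shaped like the markup a browser shows after the script has run. The helper name and both sample strings are made up for illustration.

```python
from lxml import html


def online_count_text(page_source):
    """Return the text inside div#onlinecount, or None if the div is missing."""
    tree = html.fromstring(page_source)
    matches = tree.xpath('//div[@id="onlinecount"]')  # note @id, not #id
    if not matches:
        return None
    return matches[0].text_content().strip()


# What requests receives: the div exists but has no content yet.
raw = '<html><body><div id="onlinecount"></div><script>var x = 1;</script></body></html>'
# What the browser shows once JavaScript has filled the div in.
rendered = '<html><body><div id="onlinecount"><strong>30,000+</strong></div></body></html>'

print(repr(online_count_text(raw)))       # ''
print(repr(online_count_text(rendered)))  # '30,000+'
```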
I agree with Praveen.
If you want to avoid launching a visible browser, you could use PhantomJS, which also has a Selenium driver:
http://phantomjs.org/
PhantomJS is a headless WebKit scriptable with a JavaScript API
Instead of Selenium scripts, you could also write PhantomJS scripts in JavaScript (but I assume you prefer to stay in a Python environment ;))

Python - Want to change header logo based on url selection

I would like to change the logo of a website based on which menu is currently activated/seen by the user browsing the website.
For instance I have www.urltowebsite.com/menu1 = Header Logo 1
And then I have www.urltowebsite.com/menu2 = Header Logo 2
And on top of this I want to add an else statement stating that: If any other menu is selected, use header logo 3.
How can I make this possible with Python? I can't seem to wrap my head around what to define where, and how to call the different functions from the HTML website.
Oh, and I insist on doing this with Python, preferably without a framework such as Django. But if need be, I can install web.py.
EDIT:
Am I forced to go with php then? I would like to once and for all start utilizing Python on my web projects.
The website is made in simple HTML, as I said at first. The JavaScript functions are only used to serve the HTML menus through AJAX. Again, this does not matter much for what I am trying to do, as the menus have classes and I can define those in PHP and thus change my logo/header.
What I want to do is to use Python in this instance. Here is a code snippet from the site:
<div id="header">
<span class="title"><img src="http://www.url.com/subfolder/images/logo.png"/>
</span>
</div>
And some more relevant to this:
<div id="menu">
<ul>
<li>001</li>
<li>002</li>
<li>003</li>
<li>004</li>
<li>005</li>
<li>006</li>
<li>007</li>
<li>008</li>
</ul>
</div>
So can I use python here?
You're asking to do the wrong thing the wrong way.
In order to change the logo based on the URL in Python, you need Python to generate the page and know what that URL is.
There are two ways to do that in Python:
Use an existing web framework
Write your own web framework
"Python" doesn't know or care what your URL is. The frameworks and support libraries (Django, Pyramid, Bottle, Flask, Tornado, Twisted, etc.) figure out what the URL is through an integration with an underlying web server (though some come with their own web server). Similarly, PHP doesn't really know or care what the URL is; that information comes from an integration with Apache or FCGI/Nginx/etc. PHP tends to ship with most or all of that integration done. It's also worth noting that PHP is not just a language but also a web framework. Python is just a language.
Most Python frameworks are written to the WSGI spec and have a "request" object that holds all the data you want (and many use the WebOb library for that).
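
To make the WSGI point concrete, here is a minimal sketch using only the standard library's wsgiref module, with no third-party framework. The logo file names, the /images/ path, and the port are hypothetical; the menu paths follow the question.

```python
from wsgiref.simple_server import make_server

# Hypothetical mapping from URL path to logo file, following the question.
LOGOS = {"/menu1": "logo1.png", "/menu2": "logo2.png"}
DEFAULT_LOGO = "logo3.png"  # the "else" case: any other menu


def pick_logo(path):
    """Choose the logo for a request path, ignoring a trailing slash."""
    return LOGOS.get(path.rstrip("/"), DEFAULT_LOGO)


def app(environ, start_response):
    """WSGI application: the server passes the requested path in PATH_INFO."""
    logo = pick_logo(environ.get("PATH_INFO", "/"))
    body = (
        '<div id="header"><span class="title">'
        '<img src="/images/{}"/></span></div>'.format(logo)
    )
    payload = body.encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ],
    )
    return [payload]


if __name__ == "__main__":
    # Development server only; a real deployment would sit behind a proper web server.
    with make_server("", 8000, app) as server:
        server.serve_forever()
```

A framework's "request" object is essentially a friendlier wrapper around the environ dictionary used here.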
If you plan on doing everything with static HTML files, then you have a few options:
Have a single static directory. Use JavaScript to read the address bar location and render the corresponding logo, or write the headers and footers.
Have a "template" directory of all your HTML. Use a Python script to build a static version of each website with the custom headers/footers, and configure your web server to serve a different one for each domain.
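
The second option could be sketched as follows, using string.Template from the standard library. The template text, page names, logo paths, and output directory are all hypothetical.

```python
from pathlib import Path
from string import Template

# Hypothetical page template; $logo and $content are filled in per page.
PAGE = Template(
    '<div id="header"><span class="title"><img src="$logo"/></span></div>\n'
    "$content\n"
)


def render_page(logo, content):
    """Return the HTML for one page with the given logo and body content."""
    return PAGE.substitute(logo=logo, content=content)


def build_site(pages, out_dir):
    """Write one static HTML file per entry in pages: {name: (logo, content)}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, (logo, content) in pages.items():
        (out / (name + ".html")).write_text(render_page(logo, content), encoding="utf-8")


if __name__ == "__main__":
    build_site(
        {
            "menu1": ("images/logo1.png", "<p>Menu 1</p>"),
            "menu2": ("images/logo2.png", "<p>Menu 2</p>"),
        },
        "build",
    )
```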
No, Python cannot run inside the HTML web page. If you're really serving plain HTML pages then you must use javascript to execute code in the browser once the page is loaded. However, since you mention using AJAX, it sounds like it's not really true that you're serving plain HTML but rather have some server side code. If so, that server side code is the place to put your HTML-construction logic. To know the best way to do that, you would have to describe what's happening on the server.
Although I haven't used it, I have heard that the pyhp project more or less provides php-like embedded functionality for python.
