Python - Scrape Javascript using bs4 and print out the value


So I have been trying to create a script that reads a countdown given as an epoch time, which I'm later going to convert.
The HTML is the following:
<script type="text/javascript">
new Countdown('countdown_timer', '1547161260', 'https://stackoverflow.com/');
</script>
and I managed to scrape it using:
try:
    # soup is the BeautifulSoup object parsed from the page HTML
    time_countdown_tag = soup.find_all('script', {'type': 'text/javascript'})
except Exception:
    time_countdown_tag = []
for countdown in time_countdown_tag:
    if 'new Countdown' in countdown.text.strip():
        print(countdown)
and my output is:
<script type="text/javascript">
new Countdown('countdown_timer', '1547161260', 'https://stackoverflow.com/');
</script>
However, what I want to print in this case is the number inside the parameters, which is 1547161260. I would appreciate any help so that I can print only the number (the epoch), if that is possible.

You can use a regular expression to match the part of the JS that consists of digits:
import re
output = """<script type="text/javascript">
new Countdown('countdown_timer', '1547161260', 'https://stackoverflow.com/');
</script>"""
# Raw string, so the \d escape reaches the regex engine unchanged
print(re.findall(r"\d+", output))  # ['1547161260']
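re.findall(r"\d+", ...) returns every run of digits in the script, so it would also pick up any other number that happens to appear there. A slightly tighter sketch anchors the match on the Countdown call and captures only its second argument, then converts the epoch to a UTC datetime since that conversion is the next step anyway. It assumes the arguments keep the single-quoted form shown in the question; extract_epoch is an illustrative name, and inside the question's loop you would pass it countdown.text.

```python
import re
from datetime import datetime, timezone

# Anchor on the Countdown call and capture the digits of its second argument:
#   new Countdown('<element id>', '<epoch>', ...)
# Assumption: the arguments are single-quoted, as in the question's HTML.
COUNTDOWN_RE = re.compile(r"new Countdown\(\s*'[^']*'\s*,\s*'(\d+)'")


def extract_epoch(script_text):
    """Return the epoch passed to new Countdown(...) as an int, or None if absent."""
    match = COUNTDOWN_RE.search(script_text)
    return int(match.group(1)) if match else None


script = "new Countdown('countdown_timer', '1547161260', 'https://stackoverflow.com/');"
epoch = extract_epoch(script)
print(epoch)                                            # 1547161260
print(datetime.fromtimestamp(epoch, tz=timezone.utc))   # 2019-01-10 23:01:00+00:00
```

Returning None instead of raising keeps the loop over all script tags simple: tags without a countdown are just skipped.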

Related

Embedding Python Game Into HTML Using Skulpt

I have written a game in Python using the PyGame library that I am trying to embed into an HTML page to allow me to play in a web browser.
I am attempting to do this using the JavaScript library Skulpt. The test script below successfully outputs its print statement.
skulpt.html
<html>
<head>
<script src="assets/skulpt/skulpt.js" type="text/javascript"></script>
</head>
<body>
<textarea id="pythonCode">
print "I am python."
</textarea><br />
<pre id="output"></pre>
<script type="text/javascript">
function outf(text) {
var mypre = document.getElementById("output");
mypre.innerHTML = mypre.innerHTML + text;
}
var code = document.getElementById("pythonCode").value;
Sk.configure({output:outf});
eval(Sk.importMainWithBody("<stdin>",false,code));
</script>
</body>
</html>
Output of skulpt.html:
The issue I am having is that when I use my game code instead of the simple print statement above, it produces the error seen below:
I have put all relevant images in my web server's directory at the correct path. I am unsure why this error is being produced. Any help would be much appreciated, thanks!
Also, here is the attached Python game code (and a live demo of the error):
http://nicolasward.com/portfolio/skulpt.html

Line 1 has a lot of indentation; remember, in Python, indentation always matters. Remove all the spaces/tabs at the start of the first line and it should run.
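The same rule can be reproduced in CPython: source whose first statement is indented fails to compile with "unexpected indent", and textwrap.dedent strips the indentation common to all lines. This is only an illustration of the rule, not Skulpt itself (Skulpt's error message may be worded differently); in the page you would remove the leading whitespace from the code before passing it to Sk.importMainWithBody. The compiles helper is a hypothetical name.

```python
import textwrap

# Source as it might come out of a <textarea>: every line starts with spaces.
indented_source = """
    x = 1
    print(x + 1)
"""


def compiles(source):
    """Return True if CPython compiles the source, False on an IndentationError."""
    try:
        compile(source, "<stdin>", "exec")  # compile only, nothing is executed
        return True
    except IndentationError:
        return False


print(compiles(indented_source))                   # False: "unexpected indent"
print(compiles(textwrap.dedent(indented_source)))  # True: common indent removed
```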

Node.js's python child script outputting on finish, not real time

I am new to node.js and socket.io and I am trying to write a small server that will update a webpage based on python output.
Eventually this will be used for a temperature sensor, so for now I have a dummy script that prints temperature values every few seconds:
Thermostat.py
import random, time
for x in range(10):
    print(str(random.randint(23,28))+" C")
    time.sleep(random.uniform(0.4,5))
Here's a cut down version of the server:
Index.js
var sys = require('sys'),
spawn = require('child_process').spawn,
thermostat = spawn('python', ["thermostat.py"]),
app = require('express')(),
http = require('http').Server(app),
io = require('socket.io')(http);
thermostat.stdout.on('data', function (output) {
    var temp = String(output);
    console.log(temp);
    io.sockets.emit('temp-update', { data: temp});
});
app.get('/', function(req, res){
    res.sendFile(__dirname + '/index.html');
});
And finally the web page:
Index.html
<!doctype html>
<html>
<head>
<title>Live temperature</title>
<link rel="stylesheet" type="text/css" href="styles.css">
</head>
<body>
<div id="liveTemp">Loading...</div>
<script src="http://code.jquery.com/jquery-1.11.1.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
var socket = io();
socket.on('temp-update', function (msg) {
$('#liveTemp').html(msg.data)
});
</script>
</body>
</html>
The problem is that Node.js seems to receive all of the temperature values at once: instead of getting 10 temperature values at random intervals, I get all of them in one long string after the script has finished.

You need to disable output buffering in python. This can be done many different ways, including:
Setting the PYTHONUNBUFFERED environment variable
Passing the -u switch to the python executable
Calling sys.stdout.flush() after each write (or print() in your case) to stdout
For Python 3.3+ you can pass flush=True to print(): print('Hello World!', flush=True)
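To see the effect without Node, the sketch below spawns a child interpreter the same way index.js spawns thermostat.py, but from Python, and records when each line reaches the parent. CHILD is a shortened stand-in for thermostat.py and read_live is a hypothetical helper; PYTHONUNBUFFERED is removed from the child's environment so that only the -u switch decides whether its stdout is buffered.

```python
import os
import subprocess
import sys
import time

# Stand-in for thermostat.py: three readings with a pause after each one.
CHILD = (
    "import time\n"
    "for t in (23, 25, 27):\n"
    "    print(f'{t} C')\n"
    "    time.sleep(0.3)\n"
)


def read_live(unbuffered=True):
    """Run CHILD and return (seconds_since_start, line) for every line received."""
    # Drop PYTHONUNBUFFERED so the child's buffering depends only on -u below.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    cmd = [sys.executable]
    if unbuffered:
        cmd.append("-u")  # same effect as PYTHONUNBUFFERED=1 or print(..., flush=True)
    cmd += ["-c", CHILD]
    start = time.monotonic()
    arrivals = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, env=env) as proc:
        for line in proc.stdout:  # each line is handled as soon as it arrives
            arrivals.append((round(time.monotonic() - start, 2), line.rstrip("\n")))
    return arrivals


print(read_live(unbuffered=True))   # arrival times spread out, roughly 0.3 s apart
print(read_live(unbuffered=False))  # all three lines arrive together at the end
```

Without -u the child's stdout is a pipe, so Python block-buffers it and writes everything when the process exits, which is exactly the symptom described in the question.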
Additionally, in your Node code you really should not assume that the output passed to your 'data' handler for thermostat.stdout is always exactly one line, even though your Python code sleeps between prints and now flushes stdout.
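A common way to handle this is to keep a buffer across 'data' events: append each chunk, split on "\n", handle every complete line, and keep the trailing fragment for the next event. The sketch shows that logic in Python so it can be ported to the handler; LineSplitter is a hypothetical name. In Node itself, the built-in readline module (readline.createInterface({ input: thermostat.stdout })) emits a 'line' event per line and does the same job.

```python
class LineSplitter:
    """Accumulate arbitrary chunks of text and hand back only complete lines."""

    def __init__(self):
        self._pending = ""  # text received after the last newline

    def feed(self, chunk):
        """Add a chunk; return the list of lines that this chunk completed."""
        self._pending += chunk
        *complete, self._pending = self._pending.split("\n")
        return complete

    def flush(self):
        """Return any trailing text that never got a newline (e.g. at EOF)."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


splitter = LineSplitter()
print(splitter.feed("23 C\n2"))       # ['23 C']
print(splitter.feed("5 C\n27 C\n"))   # ['25 C', '27 C']
print(splitter.flush())               # []
```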

JSON Unicode to string so I can use it in a Django HTML page

Can anybody tell me how to decode the Unicode? I just want to print JSON Unicode text in an HTML page I developed. I got the data from an API hosted on Heroku.
As far as I can tell I followed every step correctly, but the output is Unicode and I don't know how to extract the content and display it on my page.
I need to print the content. How do I do that?
My views.py:
template_vars['kural'] = json.dumps(thirukural[x])
t = loader.get_template('index.html')
c = Context(template_vars)
#pprint.pprint(c)
return HttpResponse(t.render(c))
HTML page:
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body>
<p id="p"></p>
<script type="text/javascript">
var t = {{kural|safe}}
var text = eval(t);
var p = document.getElementById("p");
p.innerHTML=t.kural;
</script>
</body></html>
It's currently printed like this:
யாதனின் யாதனின் நீங்கியான் நோதல் அதனின் அதனின் இலன்.
but on the Heroku API page the sample output is printed like this:
{
"id": "213",
"kural": "புத்தே ளுலகத்தும் ஈண்டும் பெறலரிதே\n\nஒப்புரவின் நல்ல பிற."
}
As you can see, my output doesn't have the line breaks (the \n characters). How can I get them?

The kural variable is a dict; if you want to display kural in your view, I think you need JSON:
import json
template_vars['kural'] = json.dumps(thirukural[x])

I believe what you'll want to do is change that first line you show of your views code to:
template_vars['kural'] = thirukural[x].encode('ascii', 'xmlcharrefreplace')
That should change everything into HTML entities and it will end up looking something like this:
'உலகம் தழீஇய தொட்பம் மலர்தலும்\n\nகூம்பலும் இல்ல தறிவு.'
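To see what the xmlcharrefreplace error handler does, here is a small check in Python 3, where str.encode returns bytes. The sample word is written with escapes (\u0b85\u0ba4\u0bc1, the Tamil word "அது") so the exact code points are unambiguous.

```python
# Non-ASCII characters become decimal character references (&#NNNN;);
# ASCII characters, including "\n", pass through unchanged.
line = "\u0b85\u0ba4\u0bc1\nok"
encoded = line.encode("ascii", "xmlcharrefreplace")
print(encoded)                  # b'&#2949;&#2980;&#3009;\nok'
print(encoded.decode("ascii"))  # the same text as a str, ready for a template
```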

Display multiple mpld3 exports on a single HTML page

I've found the mpld3 package to be brilliant for exporting a matplotlib plot to HTML and displaying it via a Flask app.
Each export comes with a lot of JS, which seems like unnecessary duplication if you want to display multiple plots on a single page. However, I'm not well enough versed in JS to extract the relevant components and then loop through them. The fig_to_dict function gives the JSON needed to display each chart, but then I'm left wondering what JS/template work is needed to display each chart in turn.
I considered just stacking the plots into a single big figure, but I'd like to lay out the charts in separate DIVs, so this isn't the right solution.
I think I can see what the JS is doing to display them but I don't have enough knowledge to modify the functions to suit the purpose.
I haven't included code as I'm expecting this only to be relevant to someone with mpld3 experience but can supply some sample code if needed.
Sample HTML output for mpld3.fig_to_html(fig, template_type="simple"):
<script type="text/javascript" src="http://d3js.org/d3.v3.min.js"></script>
<script type="text/javascript" src="http://mpld3.github.io/js/mpld3.v0.1.js"></script>
<style>
</style>
<div id="fig136845463888164451663379"></div>
<script type="text/javascript">
var spec136845463888164451663379 = { <snip JSON code> };
var fig136845463888164451663379 = mpld3.draw_figure("fig136845463888164451663379", spec136845463888164451663379);
</script>
I'd thought it would be as simple as linking the two core scripts from the template header and then creating a new script for each JSON export. But that hasn't worked for me.

You're half-way there with your answer. I think what you want to do is something like this, which will embed three figures on the page:
<script type="text/javascript" src="http://d3js.org/d3.v3.min.js"></script>
<script type="text/javascript" src="http://mpld3.github.io/js/mpld3.v0.1.js"></script>
<style>
</style>
<div id="fig01"></div>
<div id="fig02"></div>
<div id="fig03"></div>
<script type="text/javascript">
var json01 = { <snip JSON code> };
var json02 = { <snip JSON code> };
var json03 = { <snip JSON code> };
mpld3.draw_figure("fig01", json01);
mpld3.draw_figure("fig02", json02);
mpld3.draw_figure("fig03", json03);
</script>
The JSON for each figure can be created in Python by running:
import json
import mpld3
# ... create a matplotlib figure named fig
json01 = json.dumps(mpld3.fig_to_dict(fig))
Embed this string at the appropriate place in the HTML document you're creating, and you should be good to go. I hope that helps!
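If the figures come from a list, the page can be generated in a loop so the libraries are included once and each figure gets its own div and draw_figure call. A minimal sketch, assuming each entry of specs is already a JSON string such as json01 above; build_page and the fig01-style ids are illustrative names, and the placeholder specs in the usage line are not real mpld3 figures. The mpld3 script URL should match the mpld3 version you have installed.

```python
import json

D3_JS = "http://d3js.org/d3.v3.min.js"
# Assumption: pick the mpld3 JS file that matches your installed mpld3 version.
MPLD3_JS = "http://mpld3.github.io/js/mpld3.v0.2.js"


def build_page(specs):
    """Return an HTML fragment that loads the libraries once and draws every spec."""
    parts = [
        f'<script type="text/javascript" src="{D3_JS}"></script>',
        f'<script type="text/javascript" src="{MPLD3_JS}"></script>',
    ]
    ids = [f"fig{i:02d}" for i in range(1, len(specs) + 1)]
    parts += [f'<div id="{div_id}"></div>' for div_id in ids]
    parts.append('<script type="text/javascript">')
    for div_id, spec in zip(ids, specs):
        # A JSON string can be pasted in as a JavaScript object literal.
        parts.append(f'mpld3.draw_figure("{div_id}", {spec});')
    parts.append("</script>")
    return "\n".join(parts)


# Placeholder specs; in practice use json.dumps(mpld3.fig_to_dict(fig)).
page = build_page([json.dumps({"id": "a"}), json.dumps({"id": "b"})])
print(page)
```

In a Flask app the returned string would still need to be marked safe (or rendered with a |safe filter) so the template engine does not escape the tags.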

Note that since jakevdp's answer was posted, mpld3 has had a new release. As of today (September 2014), the mpld3 include has to be:
<script type="text/javascript" src="http://mpld3.github.io/js/mpld3.v0.2.js"></script>

Python HTML - Get element by attribute

There is a music website I read regularly, and it has a section where users post their own fictional music-related stories. There is a 91-part series (written over a period of time and uploaded part by part) that always follows the URL convention:
http://www.ultimate-guitar.com/columns/fiction/riot_band_blues_part_#.html.
I would like to be able to get just the formatted text from every part and put it into one html file.
Conveniently, there is a link to a print version, correctly formatted for my purposes. All I would have to do is write a script to download all of the parts and then dump them into file. Not hard.
Unfortunately, the URL for a print version is as follows:
www.ultimate-guitar.com/print.php?what=article&id=95932
The only way to know what article corresponds to what ID field is to look at the value attribute of a certain input tag in the original article.
What I want to do is this:
Go to each page, incrementing through the part numbers.
Find the <input> tag with the attribute name="rowid" and get the number in its value attribute.
Go to www.ultimate-guitar.com/print.php?what=article&id=<value>.
Append everything (minus <html>, <head> and <body>) to an HTML file.
Rinse and repeat.
Is this possible? And is Python the right language? Also, what DOM/HTML/XML library should I use?
Thanks for any help.

With lxml and urllib2:
import lxml.html
import urllib2
# Implement the logic to download each page, with the HTML strings in a sequence named pages
url = "http://www.ultimate-guitar.com/print.php?what=article&id=%s"
for page in pages:
    html = lxml.html.fromstring(page)
    # Attribute predicates use @ in ElementPath/XPath
    ID = html.find(".//input[@name='rowid']").value
    article = urllib2.urlopen(url % ID).read()
    article_html = lxml.html.fromstring(article)
    with open(ID + ".html", "w") as html_file:
        html_file.write(article_html.find(".//body").text_content())
Edit: On running this, it seems the page may contain some Unicode characters. One way around this is article = article.encode("ascii", "ignore"), or putting the encode call right after .read(), which forces ASCII and drops non-ASCII characters, though this is a lazy fix.
This assumes you just want the text content of everything inside the body tag. It saves files named storyID.html (so "95932.html") in the current working directory. Change the save semantics if you like.
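The same steps can be sketched in Python 3 with only the standard library (urllib.request replaces urllib2, and html.parser stands in for lxml), which also lets the rowid extraction be checked offline. The URL patterns are taken from the question and may no longer match the live site; RowIdFinder, find_rowid, fetch and collect are illustrative names, and decoding the pages as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

ARTICLE_URL = "http://www.ultimate-guitar.com/columns/fiction/riot_band_blues_part_{}.html"
PRINT_URL = "http://www.ultimate-guitar.com/print.php?what=article&id={}"


class RowIdFinder(HTMLParser):
    """Record the value attribute of the first <input name="rowid">."""

    def __init__(self):
        super().__init__()
        self.rowid = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "rowid" and self.rowid is None:
            self.rowid = attributes.get("value")


def find_rowid(page_html):
    """Return the rowid value from an article page, or None if it is missing."""
    finder = RowIdFinder()
    finder.feed(page_html)
    return finder.rowid


def fetch(url):
    # Assumption: pages are UTF-8; adjust if the site declares another charset.
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect(parts=range(1, 92)):
    """Yield the print-version HTML of each part (requires network access)."""
    for number in parts:
        rowid = find_rowid(fetch(ARTICLE_URL.format(number)))
        if rowid is not None:
            yield fetch(PRINT_URL.format(rowid))


print(find_rowid('<form><input type="hidden" name="rowid" value="95932"></form>'))  # 95932
```

collect is a generator, so nothing is downloaded until you iterate over it, for example to write each part into one output file.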

You could actually do this in JavaScript/jQuery without too much trouble. Here is JavaScript-style pseudocode, appending to an empty document:
for (var pageNum = 1; pageNum <= 91; pageNum++) {
    $.ajax({
        url: url + pageNum,
        async: false,
        success: function(data) {
            // Look for the rowid input in the fetched page, not the current document
            var printId = $(data).find('input[name="rowid"]').val();
            $.ajax({
                url: printUrl + printId,
                async: false,
                success: function(data) {
                    $('body').append($(data).find('body').contents());
                }
            });
        }
    });
}
After the loading completes you could save the resultant HTML to a file.
