Responsive Web Design Detection for Crawler

I'm writing a web crawler, but I only care about pages that use responsive web design (RWD). Is there a tell-tale sign that a site is responsive? I am using the mechanize module in Python.
The only thing I can think of is grepping the HTML for something like
href="css/bootstrap.min.css"
or
class="row-fluid"
or something that indicates widths given in percentages instead of pixels.
Any help would be appreciated.

My vote would be to search the page head for
<meta name="viewport" content="width=device-width, initial-scale=1.0" *** wildcard-selector-here *** >
I think it would be easier and more accurate than searching for the presence of CSS media queries.
Good luck!
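A minimal sketch of this check in Python, using only the standard library's html.parser. The names ViewportMetaFinder and has_responsive_viewport are hypothetical, and the HTML text is assumed to be whatever the crawler has already downloaded and decoded. A viewport tag is only a hint that the page was built with small screens in mind, not proof that the layout is responsive.

```python
import re
from html.parser import HTMLParser


class ViewportMetaFinder(HTMLParser):
    """Collects the content attribute of every <meta name="viewport"> tag."""

    def __init__(self):
        super().__init__()
        self.viewport_contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "viewport":
            self.viewport_contents.append(attr_map.get("content", ""))


def has_responsive_viewport(html):
    """Return True if the page declares width=device-width in a viewport meta tag."""
    finder = ViewportMetaFinder()
    finder.feed(html)
    for content in finder.viewport_contents:
        # Directives are usually comma-separated; some pages use semicolons.
        directives = [
            part.replace(" ", "").lower() for part in re.split(r"[,;]", content)
        ]
        if "width=device-width" in directives:
            return True
    return False
```

Because the content value is split into directives, extra values such as maximum-scale or a different attribute order do not affect the result.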

I had a project where I needed to make a website responsive without touching any HTML markup or server-side code; the only things I could modify were a stylesheet and a JavaScript file. I didn't even know all the pages of the website, because the project was new to me.
The goal was to make it responsive so that the Google crawler wouldn't penalize the site.
I knew I could use https://www.google.com/webmasters/tools/mobile-friendly/ manually for the pages I wanted to test. But how could I test the whole site?
What I did was ask for a Webmaster Tools export of the most important links of the site, hundreds of them.
Then I built a small "tool" that does what I think the Google responsive test does, except that it accepts a list of URLs and loops over them, testing whether each page fits on a 320px-wide screen (an iframe).
This is the HTML tool: you just open it, type the URLs in the text box, and hit Start (responsiveChecker.html):
<!DOCTYPE html>
<html>
<head>
<title>Responsive Checker</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="http://www.google.com/jsapi"></script>
<script>
var urls;
var delay=3000;
google.load("jquery", "1");
google.setOnLoadCallback(function() {
// Place init code here instead of $(document).ready()
console.log('jquery loaded');
$('#responsiveFrame').load(function(){
if (urls.length > 0) {
setTimeout(function(){ checkUrl(); }, delay);
}
});
});
function startChecking() {
var textUrls=$('#urls').val();
urls= textUrls.match(/[^\r\n]+/g);
checkUrl();
}
function checkUrl() {
var url;
if (urls.length > 0) {
url=urls[0];
console.log("checking: "+url);
$('#responsiveFrame').attr('src',url+'#rc=1');
urls.splice(0, 1);
} else {
console.log("no more urls");
}
}
</script>
</head>
<body>
<iframe id="responsiveFrame" width="320" height="480" src="about:blank" style="border: 1px solid red;" scrolling="no"></iframe>
<p>
<label for="urls">Enter URLs to check, one per line:</label><br />
<textarea id="urls" rows="30" cols="100"></textarea>
</p>
<p>
<input type="button" value="Start checking!" onclick="startChecking();">
</p>
</body>
</html>
This is the script that must be loaded and run on the website you want to check:
var responsiveChecker = new function () {
this.width = 320;
this.hashParams={};
this.check = function () {
this.hashParams=this.parseHashBangArgs();
// this.log('responsiveChecker');
// this.log(this.hashParams);
if (!this.mustCheck()) {
return;
} else {
this.updateParams();
this.log('must check!');
var that = this;
var counter=0;
var visibleCounter=0;
jQuery("*").each(function() {
if (jQuery(this).width() > that.width) {
if ('SCRIPT' === this.tagName) {
// ignore script tags
} else {
that.log(this.tagName + "#" + this.id);
counter++;
if (jQuery(this).is(":visible")) {
visibleCounter++;
that.log(this.tagName + "#" + this.id);
}
}
}
});
var page=window.location.href;
if (visibleCounter > 0) {
this.log('[ERROR] page not responsive, there are elements bigger than screen size: '+page);
} else {
if (counter > 0) {
this.log('[WARNING] hey check the above list, there are some hidden elements with size bigger than the screen: '+page);
} else {
this.log('[SUCCESS] ¡todo bien! looks like all elements fit on the screen: '+page);
}
}
}
};
this.updateParams = function () {
if (typeof(this.hashParams.width) !== 'undefined') {
this.width=parseInt(this.hashParams.width, 10);
}
};
this.mustCheck = function () {
if (typeof(this.hashParams.rc) !== 'undefined') {
return true;
}
return false;
};
// https://gist.github.com/miohtama/1570295
this.parseHashBangArgs = function() {
var aURL = window.location.href;
var vars = {};
var hashes = aURL.slice(aURL.indexOf('#') + 1).split('&');
for(var i = 0; i < hashes.length; i++) {
var hash = hashes[i].split('=');
if(hash.length > 1) {
vars[hash[0]] = hash[1];
} else {
vars[hash[0]] = null;
}
}
return vars;
};
this.log = function (msg) {
console.log(msg);
};
};
Call this at the end of a jQuery ready handler:
responsiveChecker.check();
So, finally, how does it work?
Add the responsiveChecker JavaScript to the website you want to check.
Open the responsiveChecker.html file, enter the URLs of the site in the textarea, and hit Start.
It loads the URLs in the iframe one by one and, on the Console tab of your browser, logs a "success", "warning", or "error" for each one, meaning the page is responsive, may be responsive, or is not responsive.
Let me know what you think!
By the way, does anybody think this could be useful if we cleaned it up and built a real live tool/service that people could use to test their websites for responsiveness?
One more note: the actual checking is done with jQuery, by testing whether every element of the page has a width smaller than or equal to 320px; any wider visible element is reported as an error, and any wider hidden element as a warning. This is not a 100% guarantee, but I think Google's bot might be doing something like this, though I'm sure it is more sophisticated.

Related

QWebEngineView.page().runJavaScript() does not run a JavaScript code correctly

I am making a Python program using PyQt5 GUI library.
I found out that using runJavaScript() method does not work for executing JavaScript code on my HTML document.
Here is my HTML document - a Mapbox GL JS component. It can also be found here: https://docs.mapbox.com/mapbox-gl-js/example/simple-map/ .
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Display a map on a webpage</title>
<meta name="viewport" content="initial-scale=1,maximum-scale=1,user-scalable=no">
<link href="https://api.mapbox.com/mapbox-gl-js/v2.10.0/mapbox-gl.css" rel="stylesheet">
<script src="https://api.mapbox.com/mapbox-gl-js/v2.10.0/mapbox-gl.js"></script><script src="qrc:///qtwebchannel/qwebchannel.js"></script>
<style>
body { margin: 0; padding: 0; }
#map { position: absolute; top: 0; bottom: 0; width: 100%; }
</style>
</head>
<body>
<div id="map"></div>
<script>
mapboxgl.accessToken = 'pk.eyJ1IjoidmxhZGlra2lyIiwiYSI6ImNsNno2dnN3cjAxamYzbm4xeDhxa2xuY2oifQ.HhDTHZglHlDNte7XwGZ1Xg';
const map = new mapboxgl.Map({
container: 'map', // container ID
// Choose from Mapbox's core styles, or make your own style with Mapbox Studio
style: 'mapbox://styles/mapbox/streets-v11', // style URL
center: [-74.5, 40], // starting position [lng, lat]
zoom: 9, // starting zoom
projection: 'globe' // display the map as a 3D globe
});
map.on('style.load', () => {
map.setFog({}); // Set the default atmosphere style
});
</script>
</body>
</html>
Here is a part of my Python code:
# Creating QWebEngineView widget called "mapView"
self.mapView = QtWebEngineWidgets.a
mapSizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Minimum, QtWidgets.QSizePolicy.Minimum)
mapSizePolicy.setHeightForWidth(self.mapView.sizePolicy().hasHeightForWidth())
self.mapView.setSizePolicy(mapSizePolicy)
self.mapView.setObjectName("mapView")
self.detstartpointMapLayout.addWidget(self.mapView)
# Opening an HTML document and passing the components to QWebEngineView widget
with open('mapboxjs.html', 'r') as file:
    mapHTML = file.read()
self.mapView.setHtml(mapHTML)
# Running a JavaScript code (with no success).
self.mapView.page().runJavaScript("const marker1 = new mapboxgl.Marker().setLngLat([12.554729, 55.70651]).addTo(map);")
Here is an error that my program returned:
js: Uncaught ReferenceError: mapboxgl is not defined .
I suppose this happens because runJavaScript() or QWebEngineView does not see the libraries that I imported earlier in the HEAD section of the HTML document using a script tag. How do I work around that?
The same JavaScript command works with no errors when I open the HTML code in Firefox and send JS code into the console.

My guess was right: it happened because the JavaScript passed to page().runJavaScript() was executed before the .js script in the HEAD section of the HTML file had finished loading and executing.
So, to solve the issue, I delayed the page().runJavaScript() call until the HTML file had finished loading completely (including the .js file in the HEAD section) by replacing
self.widgetname.page().runJavaScript("someJavaScriptFunction")
with
self.widgetname.page().loadFinished.connect(lambda: self.widgetname.page().runJavaScript("someJavaScriptFunction"))
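Don't forget to include lambda: before self.widgetname.page().runJavaScript(). Without it, runJavaScript() is called immediately and its return value, rather than a callable, is passed to connect().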

Access "Browser" location

We all have seen this prompt:
As far as I know, this is not IP-based location. This is device-based location.
I don't want IP-based location because 1) it's not reliable and 2) if the user browses my website through a VPN, the location data is completely wrong.
I've searched PyPI.org and DjangoPackages.org but didn't find anything to implement that in my Django app.
Is there any solution?

To get the browser to ask the user for their location, you must use JavaScript.
This snippet from w3schools demonstrates this.
You can then pass the latitude and longitude on to the backend.
<!DOCTYPE html>
<html>
<body>
<p>Click the button to get your coordinates.</p>
<button onclick="getLocation()">Try It</button>
<p id="demo"></p>
<script>
var x = document.getElementById("demo");
function getLocation() {
if (navigator.geolocation) {
navigator.geolocation.getCurrentPosition(showPosition);
} else {
x.innerHTML = "Geolocation is not supported by this browser.";
}
}
function showPosition(position) {
x.innerHTML = "Latitude: " + position.coords.latitude +
"<br>Longitude: " + position.coords.longitude;
}
</script>
</body>
</html>
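On the backend side, the coordinates should be treated like any other user input. The sketch below is framework-neutral Python and assumes, purely for illustration, that the page sends a JSON body with latitude and longitude keys; neither that payload shape nor the parse_coordinates name comes from the snippet above. In a Django view, the function could be given request.body.

```python
import json


def parse_coordinates(body):
    """Return (latitude, longitude) from a JSON request body, or raise ValueError."""
    try:
        data = json.loads(body)
        latitude = float(data["latitude"])
        longitude = float(data["longitude"])
    except (KeyError, TypeError, ValueError) as exc:
        # Covers malformed JSON, missing keys, and non-numeric values.
        raise ValueError("expected JSON with numeric latitude and longitude") from exc
    # Reject values outside the valid ranges (this also rejects NaN).
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        raise ValueError("coordinates out of range")
    return latitude, longitude
```

Keep in mind that the user can decline the browser prompt, so the backend may never receive coordinates at all.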

Cannot read property 'join' of undefined in Javascript file Electron JS

I have an application in Electron JS that is calling a python function to execute a python script.
When the script executes it should send the data back to the Electron JS GUI and display it.
The issue I am having is that it says join is undefined:
weather.js:9 Uncaught TypeError: Cannot read property 'join' of undefined
at get_weather (weather.js:9)
at HTMLButtonElement.onclick (weather.html:14)
Here is my JavaScript file:
let {PythonShell} = require('python-shell')
var path = require("path")
function get_weather() {
var city = document.getElementById("city").value
var options = {
scriptPath : path.join(__dirname, '/../engine/'),
args : [city]
}
let pyshell = new PythonShell('weatherApp.py', options);
pyshell.on('message', function(message) {
swal(message);
})
document.getElementById("city").value = "";
}
The line "scriptPath : path.join(__dirname, '/../engine/')," seems to be the offending piece of code.
My gui.html file is as follows:
<html>
<head>
<title></title>
<meta charset="UTF-8">
</head>
<body>
<h1>Get your local weather ...</h1>
<br>
<br>
<label>Enter city name here: </label>
<input id="city" type = "text" placeholder="City">
<button type = "button" value="clickme" onclick="get_weather()">Get Weather</button>
<!--- <button class="btn btn-success" onclick="get_weather();">Go!</button> -->
<br>
<br>
<br>
<script src="/home/ironmantis7x/Documents/BSSLLC/projects/node_electron/electronDemoApps/guiApp/gui/linkers/weather.js"></script>
<p><button type="button">Back to Main Page</button>
</body>
</html>
What error(s) do I need to fix to get this working correctly?
Thank you.

The Problem
Since Electron 5, nodeIntegration is disabled by default in the window. The normal browser API does not provide require, so the script fails on its first require call, path is never assigned, and calling path.join later throws the error you see.
Re-enabling nodeIntegration
You could enable nodeIntegration again, but it was disabled for a reason. Be sure you read and understand the Electron security tutorial.
Using a preload script
Another way is to use a preload script. Let's have a look at the BrowserWindow documentation.
When creating a new BrowserWindow you can add several options. For this case we need the webPreferences.preload option:
Specifies a script that will be loaded before other scripts run in the page. This script will always have access to node APIs no matter whether node integration is turned on or off. The value should be the absolute file path to the script. When node integration is turned off, the preload script can reintroduce Node global symbols back to the global scope.
Be aware that the preload script is run in the renderer process.
Example
The following example app opens a window with a button that uses the Electron dialog to select files. This would not work with nodeIntegration disabled, but thanks to the preload script, dialog.showOpenDialog() is made available to the window again.
main.js
const { app, BrowserWindow } = require("electron");
const { join } = require("path");
let win;
app.on("ready", () => {
win = new BrowserWindow({
webPreferences: {
//this is the default since electron 5
nodeIntegration: false,
//here you load your preload script
preload: join(__dirname, "preload.js")
}
});
win.loadFile(join(__dirname, "index.html"));
});
preload.js
const { dialog } = require("electron").remote;
window.mystuff = {
selectFile
};
async function selectFile() {
const files = await dialog.showOpenDialog({
properties: ["openFile", "multiSelections"]
});
return files;
}
index.html
<html>
<body>
<main>
<button onclick="myFunction()">select file</button>
<ul id="foo"></ul>
</main>
<script>
async function myFunction() {
//the function provided by the preload script
const files = await window.mystuff.selectFile();
const list = document.getElementById("foo");
for (const file of files) {
const node = document.createElement("LI");
const textNode = document.createTextNode(file);
node.appendChild(textNode);
list.appendChild(node);
}
}
</script>
</body>
</html>
Sending events via IPC
If you are unsure whether this functionality should be exposed in the window, you can instead send events via ipcRenderer and keep the dialog call in the main process.
preload.js
const { ipcRenderer } = require("electron");
window.mystuff = {
selectFile
};
function selectFile() {
return new Promise(resolve => {
ipcRenderer.once("selected-files", (e, files) => {
resolve(files);
});
ipcRenderer.send("select-files");
});
}
Additional part in main.js
const { ipcMain, dialog } = require("electron");
ipcMain.on("select-files", async () => {
const files = await dialog.showOpenDialog({
properties: ["openFile", "multiSelections"]
});
win.webContents.send("selected-files", files);
});

Simple AJAX example with Python not working

I am trying to implement a simple AJAX example, based on the demo shown on this page:
http://www.degraeve.com/reference/simple-ajax-example.php
I have copied the HTML portion and named it ajax_demo.html. For example:
<html>
<head>
<title>Simple Ajax Example</title>
<script language="Javascript">
function xmlhttpPost(strURL) {
var xmlHttpReq = false;
var self = this;
// Mozilla/Safari
if (window.XMLHttpRequest) {
self.xmlHttpReq = new XMLHttpRequest();
}
// IE
else if (window.ActiveXObject) {
self.xmlHttpReq = new ActiveXObject("Microsoft.XMLHTTP");
}
self.xmlHttpReq.open('POST', strURL, true);
self.xmlHttpReq.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
self.xmlHttpReq.onreadystatechange = function() {
if (self.xmlHttpReq.readyState == 4) {
updatepage(self.xmlHttpReq.responseText);
}
}
self.xmlHttpReq.send(getquerystring());
}
function getquerystring() {
var form = document.forms['f1'];
var word = form.word.value;
qstr = 'w=' + escape(word); // NOTE: no '?' before querystring
return qstr;
}
function updatepage(str){
document.getElementById("result").innerHTML = str;
}
</script>
</head>
<body>
<form name="f1">
<p>word: <input name="word" type="text">
<input value="Go" type="button" onclick='JavaScript:xmlhttpPost("/cgi-bin/simple-ajax-example.py")'></p>
<div id="result"></div>
</form>
</body>
</html>
Not shown above is the full real path to my simple-ajax-example.py here:
<input value="Go" type="button" onclick='JavaScript:xmlhttpPost("/cgi-bin/simple-ajax-example.py")'>
Both files are on my apache server. For example:
http://myserver.com/ajax_demo.html
http://myserver.com/cgi-bin/simple-ajax-example.py
My Python script does work when called directly and looks like this:
import cgi
form = cgi.FieldStorage()
secret_word = form.getvalue('word')
print "Content-type: text/html"
print ""
print "<p>The secret word is", secret_word, "<p>"
Problem is, this simply doesn't work. In the ajax_demo.html text box, when I enter text and click Go, nothing seems to happen.
What am I missing?

You probably need to run a server that serves the cgi-bin directory :)
Try this command (Python 2; the plain SimpleHTTPServer module serves files but does not execute CGI scripts):
python -m CGIHTTPServer 8100
-----EDIT-----
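Your Python script reads the field named word:
secret_word = form.getvalue('word')
but getquerystring() sends it as w. The query string key does not match; build it like this instead:
qstr = 'word=' + escape(word)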
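The mismatch can be reproduced on the Python side without a server. This sketch uses urllib.parse.parse_qs from the Python 3 standard library as a stand-in for what cgi.FieldStorage sees in the POST body; the helper name secret_word_from is made up for illustration.

```python
from urllib.parse import parse_qs


def secret_word_from(query_string):
    """Return the value of the 'word' field, or None if the key is absent."""
    fields = parse_qs(query_string)
    values = fields.get("word")
    return values[0] if values else None


if __name__ == "__main__":
    # What the original page sends: the key is 'w', so the lookup finds nothing.
    print(secret_word_from("w=hello"))
    # What the script expects: the key is 'word'.
    print(secret_word_from("word=hello"))
```

With the original key the script would print "The secret word is None" rather than the text that was entered.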

ASP.NET equivalent to Python's os.system([string])

I have an app made in Python, which accesses a Linux server's command prompt with os.system([string])
Now I'd like to transfer this away from Python, into some language like ASP.NET or something.
Is there a way to access the server's command prompt and run commands with ASP.NET or any technology found in Visual Studio?
This needs to run in a web app, where a user will click a button, and then a server-side command will run, so it's important that the technology suggested is compatible with all that.

Well, it isn't ASP.NET-specific, but in C#:
using System.Diagnostics;
Process.Start([string]);
Or, with more control over how the program runs (such as arguments and output streams):
Process p = new Process();
p.StartInfo.FileName = "cmd.exe";
p.StartInfo.Arguments = "/c dir *.cs";
p.StartInfo.UseShellExecute = false;
p.StartInfo.RedirectStandardOutput = true;
p.Start();
Here is how you could combine this with an ASPX page:
First, Process.aspx:
<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="Process.aspx.cs" Inherits="com.gnld.web.promote.Process" %>
<!DOCTYPE html>
<html>
<head>
<title>Test Process</title>
<style>
textarea { width: 100%; height: 600px }
</style>
</head>
<body>
<form id="form1" runat="server">
<asp:Button ID="RunCommand" runat="server" Text="Run Dir" onclick="RunCommand_Click" />
<h1>Output</h1>
<asp:TextBox ID="CommandOutput" runat="server" ReadOnly="true" TextMode="MultiLine" />
</form>
</body>
</html>
Then the code behind:
using System;
namespace com.gnld.web.promote
{
public partial class Process : System.Web.UI.Page
{
protected void RunCommand_Click(object sender, EventArgs e)
{
using (var cmd = new System.Diagnostics.Process()
{
StartInfo = new System.Diagnostics.ProcessStartInfo()
{
FileName = "cmd.exe",
Arguments = "/c dir *.*",
UseShellExecute = false,
CreateNoWindow = true,
RedirectStandardOutput = true
}
})
{
cmd.Start();
CommandOutput.Text = cmd.StandardOutput.ReadToEnd();
};
}
}
}
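For comparison with the os.system() call in the question, here is a hedged Python sketch of the same idea as the C# code above: start a process without a shell and read its standard output. It assumes Python 3.7 or later, and run_command is a hypothetical helper name.

```python
import subprocess
import sys


def run_command(args):
    """Run a command without a shell and return (exit_code, stdout_text)."""
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Use the current interpreter as a command that exists on every platform.
    code, output = run_command([sys.executable, "-c", "print('hello')"])
    print(code, output.strip())
```

Passing the command as a list, like setting UseShellExecute = false in C#, avoids handing user-supplied text to a shell, which matters when the command is triggered from a web page.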

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
