Checking for dead links locally in a static website (using wget?) - python

A very nice tool to check for dead links (e.g. links pointing to 404 errors) is wget --spider. However, I have a slightly different use-case where I generate a static website, and want to check for broken links before uploading. More precisely, I want to check both:
Relative links like file.pdf
Absolute links, most likely pointing to external sites like http://example.com.
I tried wget --spider --force-html -i file-to-check.html, which reads the local file, treats it as HTML and follows each link. Unfortunately, it can't deal with relative links within the local HTML file (it errors out with Cannot resolve incomplete link some/file.pdf). I tried using file:// URLs, but wget does not support them.
Currently, I have a hack based on running a local web server through Python's http.server module and checking the local files over HTTP:
python3 -m http.server &   # serve the current directory on port 8000
pid=$!
sleep .5                   # give the server time to start (fragile)
error=0
wget --spider -nd -nv -H -r -l 1 http://localhost:8000/index.html || error=$?
kill $pid
wait $pid
exit $error
I'm not really happy with this for several reasons:
I need this sleep .5 to wait for the webserver to be ready. Without it, the script fails, but I can't guarantee that 0.5 seconds will be enough. I'd prefer having a way to start the wget command when the server is ready.
Conversely, this kill $pid feels ugly.
Ideally, python3 -m http.server would have an option to run a command when the server is ready and would shut itself down after the command completes. That sounds doable by writing a bit of Python, but I was wondering whether a cleaner solution exists.
Did I miss anything? Is there a better solution? I'm mentioning wget in my question because it does almost what I want, but using wget is not a requirement for me (nor is python -m http.server). I just need to have something easy to run and automate on Linux.

So I think you are heading in the right direction. I would use wget and Python, as they are two readily available options on many systems. And the good part is that they get the job done for you. Now what you want is to listen for Serving HTTP on 0.0.0.0 in the stdout of that process.
So I would start the process using something like below:
python3 -u -m http.server > ./myserver.log &
Note the -u used here for unbuffered output; this is really important.
The next step is waiting for this text to appear in myserver.log:
timeout 10 awk '/Serving HTTP on 0.0.0.0/{print; exit}' <(tail -f ./myserver.log)
So 10 seconds is your maximum wait time here, and the rest is self-explanatory. Next, about your kill $pid: I don't think it is a problem, but if you want it to be more like what a user would do, then I would change it to
kill -s SIGINT $pid
This is equivalent to pressing CTRL+C after launching the program. Also, I would handle SIGINT in my bash script as well, using something like the approach from
https://unix.stackexchange.com/questions/313644/execute-command-or-function-when-sigint-or-sigterm-is-send-to-the-parent-script/313648
The above basically adds the snippet below to the top of the bash script, to handle the script being killed by CTRL+C or an external kill signal:
#!/bin/bash

exit_script() {
    echo "Printing something special!"
    echo "Maybe executing other commands!"
    trap - SIGINT SIGTERM # clear the trap
    kill -- -$$ # Sends SIGTERM to child/sub processes
}

trap exit_script SIGINT SIGTERM

Tarun Lalwani's answer is correct, and following the advice given there one can write a clean and short shell script (relying on Python and awk). Another solution is to write the script entirely in Python, giving a slightly more verbose but arguably cleaner script. The server can be launched in a thread, then the command to check the website is executed, and finally the server is shut down. We no longer need to parse textual output or send a signal to an external process. The key parts of the script are therefore:
import subprocess
import sys
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

def start_server(port,
                 server_class=HTTPServer,
                 handler_class=SimpleHTTPRequestHandler):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    thread = threading.Thread(target=httpd.serve_forever)
    thread.start()
    return httpd

def main(cmd, port):
    httpd = start_server(port)
    status = subprocess.call(cmd)  # run the link checker while the server is up
    httpd.shutdown()
    sys.exit(status)
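For example, one could call main with the wget command line from the question (a hypothetical direct call; the published script wraps this in option parsing):
# Hypothetical direct call, reusing the wget invocation shown earlier.
main(['wget', '--spider', '-nd', '-nv', '-H', '-r', '-l', '1',
      'http://localhost:8000/index.html'], 8000)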
I wrote a slightly more advanced script (with a bit of command-line option parsing on top of this) and published it as: https://gitlab.com/moy/check-links

Related

How to run a python or bash script interactively on a webpage?

I am building a website and I would like to show a terminal on my webpage which runs a script (Python or bash) interactively.
Something like trinket.io, but I would like to use the Python interpreter or the bash I have on my server, so I could install pip packages and in general control every aspect of the script.
I was thinking of something like an interactive frame which shows the terminal and what's executed in it, obviously with user interaction supported.
A good example is https://create.withcode.uk/; it's exactly what I want, but I would like to host it on my own server with my own modules and ecosystem. It also seems to be pretty good on the security side.
Is there anything like that?
If I understand correctly, you are looking for a mechanism that allows you to display a terminal on a web page.
Then you want to run an interactive Python script in that terminal, right?
So in the end the solution to share a terminal does not necessarily have to be written in Python, right? (Though I must admit that I prefer Python solutions when I find them, but sometimes being pragmatic isn't a bad idea.)
You might google for http and terminal emulators.
Perhaps ttyd fits the bill. https://github.com/tsl0922/ttyd
Building on linux could be done with
sudo apt-get install build-essential cmake git libjson-c-dev libwebsockets-dev
git clone https://github.com/tsl0922/ttyd.git
cd ttyd && mkdir build && cd build
cmake ..
make && make install
Usage would be something like:
ttyd -p 8888 yourpythonscript.py
and then you could connect with a web browser with http://hostip:8888
You might of course 'hide' this URL behind a reverse proxy and add authentication to it,
or add options like --credential username:password to password-protect the URL.
Addendum:
If you want to share multiple scripts with different people, and the sharing is more of an on-the-fly thing, then you might look at tty-share ( https://github.com/elisescu/tty-share ) and tty-server ( https://github.com/elisescu/tty-server ).
tty-server can be run in a docker container.
tty-share can be used to run a script on your machine in one of your terminals. It will output a URL that you can give to the person you want to share the specific session with.
If you think that's interesting, I might elaborate on this one.
>> Insert security disclaimer here <<
The easiest, most hacktastic way to do it is to create a div element where you'll store your output and an input element to enter commands. Then you can AJAX POST the command to a back-end controller.
The controller would take the command and run it while capturing the output, then send the output back to the web page, which renders it in the div.
In Python I use this to capture command output:
from subprocess import Popen, STDOUT, PIPE

proc = Popen(['ls', '-l'], stdout=PIPE, stderr=STDOUT, cwd='/working/directory')
output, _ = proc.communicate()  # wait() followed by read() can deadlock on large output
return output
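For completeness, here is a minimal sketch of the back-end controller described above, using Flask (an assumption; any web framework would do, and the /run route and form field name are made up for illustration):
from subprocess import PIPE, STDOUT, Popen

from flask import Flask, request  # Flask is an assumption here

app = Flask(__name__)

@app.route('/run', methods=['POST'])
def run_command():
    # DANGER: this executes user-supplied input; whitelist or sandbox in practice
    cmd = request.form['command']
    proc = Popen(cmd, shell=True, stdout=PIPE, stderr=STDOUT)
    output, _ = proc.communicate()
    return output  # the page's JavaScript appends this to the div

if __name__ == '__main__':
    app.run()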

Run a series of external commands in a Python script

I'm trying to run external commands (note the plural) from a Python script. I've been reading about the subprocess module and use it. It works for me when I have single or independent commands to run, whether I'm interested in the stdout or not.
What I want to do is a bit different: I want something persistent. Basically, the first command I run logs in to an application; then I can run some other commands which only work if I'm logged in. For some of these commands, I need the stdout.
So when I use subprocess to log in, it does so, but then the process is killed, and the next time I run a command with subprocess I'm not logged in anymore... I just need to run a series of commands, as I would in a terminal.
Any idea how to do that?
You can pass in an arbitrarily complex series of commands with shell=True, though I would generally advise against doing that, not least because you are making your Python script platform-dependent.
import subprocess

result = subprocess.check_output('''
servers=0
for server in one two three four; do
output=$(printf 'echo moo\necho bar\necho baz\n' | ssh "$server")
case $output in *"hello"*) echo "$output";; esac
echo "$output" | grep -q 'ALERT' && echo "$server: Intrusion detected"
servers=$((servers + 1))
done
echo "$servers hosts checked"
''', shell=True)
One of the problems with shell script (or I guess Powershell or cmd batch script if you are in that highly unfortunate predicament) is that doing what you are vaguely describing is often hard to do with a bunch of unconnected processes. E.g. curl has a crude way to maintain a session between separate invocations by keeping a "cookie jar" which allows one curl to pass on login credentials etc to an otherwise independent curl call later on, but there is no good, elegant, general mechanism for this. If at all possible, doing these things from within Python would probably make your script more robust as well as simpler and more coherent.
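For instance, here is a minimal sketch of keeping a single persistent shell session alive from Python, so later commands see the state left by earlier ones (the cd and export lines are placeholders for your real login and work steps):
from subprocess import PIPE, Popen

# One long-lived shell; every command below runs in the same process,
# so state (cwd, variables, an established login) persists between them.
shell = Popen(['bash'], stdin=PIPE, stdout=PIPE, universal_newlines=True)
out, _ = shell.communicate('cd /tmp\nexport TOKEN=secret\npwd\necho "$TOKEN"\n')
print(out)  # prints /tmp and secret: both commands saw the earlier state
Note that communicate is one-shot (it closes stdin and waits); for truly interleaved reads and writes you would use shell.stdin.write plus flush, or do the whole workflow natively in Python.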

bash wait for first python file to start before continue

I have this bash:
#!/bin/sh
# launcher.sh
echo "Remote Control Server is starting up..."
sudo python RControlPanel.py &
wait &
sudo python startup.py &
wait
The first Python file is a Flask server, which is necessary to start up first.
The second file initialises the components on the Raspberry Pi and turns on a couple of LEDs and stuff. The way the script is written requires the Flask application to be running before the components are initialised.
It seems that the Flask application takes longer to start up, and the bash script carries on and runs startup.py too early.
Is it possible to make sure that the Flask app is running and then carry on to the next script? I thought wait at the end would work, but it doesn't. I have even tried with sleep.
Update: I'm not quite sure, but I think when the Flask app runs it enters an endless loop and waits for requests, like a normal web server does. Maybe that's why the solutions below won't work.
I suppose the Flask server opens some HTTP port? Let's say on port 8080, then you could poll the app like so:
while ! curl http://localhost:8080 -m1 -o/dev/null -s ; do
    sleep 0.1
done
Options:
-m1 to allow at most 1 second for the HTTP request. If your firewall is configured to silently drop packets to closed ports, this should make it go faster.
-o/dev/null so the HTTP response body doesn't get printed.
-s to hide any errors, as they are expected.
Add -S if you still want to see the "Connection refused" messages scroll by until the server is up.
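If you would rather do the polling from Python itself (say, from a wrapper script), a rough equivalent of the curl loop above might be (port 8080 assumed, as above):
# Rough Python equivalent of the curl polling loop; adjust the URL to your port.
import time
import urllib.error
import urllib.request

def wait_for_server(url, per_try_timeout=1.0, interval=0.1):
    while True:
        try:
            urllib.request.urlopen(url, timeout=per_try_timeout)
            return  # got a response: server is up
        except urllib.error.HTTPError:
            return  # an HTTP error response (e.g. 404) still means it's up
        except OSError:
            time.sleep(interval)  # refused or timed out: not ready yet, retry

wait_for_server('http://localhost:8080')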
I came up with a solution using Thomas's answer.
I have created a sh file called webserver.sh:
echo "Remote Control Server is starting up..."
sudo python RControlPanel.py
Then a second file which is called components.sh:
while ! curl http://127.0.0.1:80 -m1 -o/dev/null -s ; do
    sleep 0.1
    echo "Web Server still loading" # this line is for testing purposes
done
sudo python startup.py
echo "Startup Initialization done. System Ready!"
And then a third file, launcher.sh:
./webserver.sh &
./components.sh
The first file starts up the web server only. No other code needs to go in there, because the Flask app is an endless loop and everything underneath it would be skipped.
The second file first uses Thomas's code to check whether the web server is running. If it isn't, it keeps looping until the web server (the Flask app) comes alive, and then runs the startup.py script, which initialises the components.
The third file just calls the two scripts above, so I can run my whole project from a single file.
Just use:
sudo python RControlPanel.py && sudo python startup.py &
The double && ensures that the second command runs only after the first returns exit status zero.

Python (2.7) script monitoring and notification system

I've read a lot of other posts about monitoring Python scripts, but haven't been able to find anything like what I am hoping to do. Essentially, I have 2 desktops running Linux. Each computer has multiple Python scripts running non-stop 24/7. Most of them are web scraping, while a few others are scrubbing and processing data. I have built pretty extensive exception handling into them that sends me an email in the event of any error or crash, but there are some situations that I haven't been able to get emailed about (such as if the script itself just freezes, the computer itself crashes, or the computer loses its internet connection).
So, I'm trying to build a sort of check-in service where a Python script checks in to the service multiple times throughout its run, and if it doesn't check in within X amount of time, the service sends me an email. I don't know if this is something that can be done with the signal or asyncore module(s) and/or sockets, or what a good place would be to even start.
Has anyone had any experience in writing anything like this? Or can point me in the right direction?
Take a look at supervision tools like monit or supervisord.
Those tools are built to do what you described.
For example: create a simple init.d script for your python process:
PID_FILE=/var/run/myscript.pid
LOG_FILE=/mnt/logs/myscript.log
SOURCE=/usr/local/src/myscript

case $1 in
    start)
        exec /usr/bin/python $SOURCE/main_tread.py >> $LOG_FILE 2>&1 &
        echo $! > $PID_FILE
        ;;
    stop)
        kill `cat ${PID_FILE}`
        ;;
    *)
        echo "Usage: wrapper {start|stop}"
        ;;
esac
exit 0
Then add this to the monit config:
check process myscript pidfile /var/run/myscript.pid
    start program = "/etc/init.d/myscript start"
    stop program = "/etc/init.d/myscript stop"

check file myscript.pid path /var/run/myscript.pid
    if changed checksum then alert
Also check the documentation; it has pretty good examples of how to set up alerts and send emails.
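If you also want to cover the case from the question where a script silently freezes, a rough DIY heartbeat watchdog in Python might look like this (file paths, limits, and addresses are all made-up placeholders; each monitored script just has to touch its heartbeat file regularly):
import os
import smtplib
import time
from email.mime.text import MIMEText

# heartbeat file -> maximum allowed silence in seconds (example values)
HEARTBEATS = {'/tmp/scraper.hb': 300}

def alert(path):
    msg = MIMEText('No recent check-in from %s' % path)
    msg['Subject'] = 'Watchdog alert'
    msg['From'] = 'watchdog@localhost'
    msg['To'] = 'me@example.com'
    server = smtplib.SMTP('localhost')  # assumes a local mail relay
    server.sendmail(msg['From'], [msg['To']], msg.as_string())
    server.quit()

while True:
    for path, limit in HEARTBEATS.items():
        # a missing heartbeat file counts as a missed check-in too
        if not os.path.exists(path) or time.time() - os.path.getmtime(path) > limit:
            alert(path)
    time.sleep(60)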
Upstart is a good choice, but I'm afraid it is only available for Ubuntu and Red Hat based distros.

GUI not opening

I have written a Python script which scans my Gmail INBOX for a particular mail, and if that mail is present it opens up a GUI. I have tested this script and it works correctly.
I want to run this script whenever the network connection is established. So, I have added a script to the dispatcher.d directory of NetworkManager. My bash script is shown below.
#!/bin/bash
# /etc/NetworkManager/dispatcher.d/90filename.sh
IF=$1
STATUS=$2

if [ "$IF" == "wlan0" ]; # for wireless internet
then
    case "$2" in
        up)
            logger -s "NM Script up triggered"
            python /home/rahul/python/expensesheet/emailReader.py
            logger -s "emailReader completed"
            exitValue=$?
            python3.2 /home/rahul/python/expensesheet/GUI.py &
            logger -s "GUI completed with exit status $exitValue"
            ;;
        down)
            logger -s "NM Script down triggered"
            # place custom here
            ;;
        pre-up)
            logger -s "NM Script pre-up triggered"
            # place custom here
            ;;
        post-down)
            logger -s "NM Script post-down triggered"
            # place custom here
            ;;
        *)
            ;;
    esac
fi
I have used tkinter to design my GUI.
My problem is that emailReader (which has no GUI) gets executed correctly, but GUI.py doesn't get executed; it exits with exit status 1.
Can somebody throw some light on this matter and explain what I'm doing wrong?
NetworkManager is a process running on a virtual terminal, outside of your X server
(e.g. NetworkManager gets started on bootup, before your window manager gets started; they are totally unrelated).
Therefore, any script started by NetworkManager will not (directly) be able to access the GUI. It is very similar to what you get when you switch from your desktop to a virtual terminal (e.g. Ctrl-Alt-1) and then try to run your GUI from there: you will most likely get an error like "Can't open display".
If you want to start a GUI program, you have 2 possibilities:
Tell a notification daemon (a sub-process of your window manager) to start your GUI.
Tell your GUI to start on the correct display (the one where your desktop is running).
I'd go for the first solution (notification daemons are designed for that very purpose), but how to do it heavily depends on the window manager you use.
The 2nd solution is a bit more dirty and involves potential security risks, but basically try something like running DISPLAY=:0.0 myguiapp.py instead of myguiapp.py (this assumes you are running an X server on localhost:0.0).
You can check whether this works by simply launching the command with the DISPLAY setting from a virtual terminal.
To get the display you are actually using, simply run echo $DISPLAY in a terminal within your X server.
Usually, remote connections to your running X server are disabled, as they would allow non-privileged users to take over your desktop (everything from starting new GUI programs, which is what you want, to installing keyloggers); if that's the case, check man xhost (or go for solution #1).
UPDATE
For the 1st solution, you probably want to check out libraries like libnotify (there are Python bindings in python-notify and python-notify2).
If you want more than simple "notification popups", you probably have to dig into D-Bus.
A simple example (which I haven't tested personally) can be found here.
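For instance, a minimal popup with python-notify2 might look like the sketch below (assuming the python-notify2 package and a running notification daemon; note this only displays a notification, and wiring a click to actually launch the GUI requires notification actions over D-Bus):
import notify2

notify2.init('emailReader')  # application name shown by the daemon
note = notify2.Notification('Expense mail received',
                            'The GUI is ready to be opened')
note.show()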
