System Design for Bulk Server Diagnosis - python

The following is skeleton code for a script that handles servers that are down on the network. The script does the job, but I would like it to run faster and more reliably.
The script does the following:
Determines if a machine is reachable via SSH.
If not reachable, installs a recovery image.
If reachable, sends a script that takes the server name as a command-line argument and does a quick diagnosis to determine why the server went down.
Problems:
Some servers that are reachable over the network get stuck when is_reachable() is called. diagnosis_script.py uses Linux commands to look for hardware issues and errors in the logs. The script can hang for up to 30 minutes until the SSH connection is terminated; it then continues to the next reachable server in the for loop, but this is a huge time sink.
Is there a way to put a timer on this, so that the SSH connection is dropped and the loop moves on to the next server if the current one takes too long?
I believe a queue-based multiprocessing approach could also speed this script up. Does anyone have experience with, or an example of, how to implement something like this? (One possible approach is sketched after the skeleton code below.)
Example Skeleton Code:
import os

server_list = [machine1, machine2, machine3, machine4, machine5, machine6, ... , machine100]
reachable = []
unreachable = []

def is_sshable(server_list):
    for server in server_list:
        ssh_tester = 'ssh -o ConnectTimeout=3 -T root@{}'.format(server)
        ssh = os.popen(ssh_tester).read()
        if "0" not in ssh:
            unreachable.append(server)
        else:
            reachable.append(server)

def is_unreachable():
    # "recover server" is an internal linux command
    for server in unreachable:
        os.system('recover server {}'.format(server))

def is_reachable():
    for server in reachable:
        os.system('python3 diagnosis_script.py {}'.format(server))
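One way to address both problems, offered as a sketch rather than a drop-in fix: subprocess.run() accepts a timeout argument, so the per-server diagnosis can be killed if it runs too long, and multiprocessing.Pool can fan the servers out across several worker processes. The helper name check_and_handle, the worker count, and the 10-minute timeout are illustrative assumptions, not part of the original script.

import subprocess
from multiprocessing import Pool

server_list = ['machine1', 'machine2']  # in practice, the full list from above

SSH_OPTS = ['-o', 'ConnectTimeout=3', '-o', 'BatchMode=yes']

def check_and_handle(server):
    # Probe SSH reachability; 'true' exits immediately with status 0 on success.
    probe = subprocess.run(['ssh'] + SSH_OPTS + ['root@{}'.format(server), 'true'])
    if probe.returncode != 0:
        # Not reachable: install the recovery image.
        subprocess.run(['recover', 'server', server])
        return (server, 'recovery started')
    try:
        # Reachable: run the diagnosis, but give up after 10 minutes.
        subprocess.run(['python3', 'diagnosis_script.py', server], timeout=600)
        return (server, 'diagnosed')
    except subprocess.TimeoutExpired:
        return (server, 'diagnosis timed out')

if __name__ == '__main__':
    # A pool of 10 workers handles 10 servers concurrently;
    # imap_unordered yields each result as soon as that server finishes.
    with Pool(processes=10) as pool:
        for server, status in pool.imap_unordered(check_and_handle, server_list):
            print(server, status)

Note that the timeout only kills the local diagnosis process; if diagnosis_script.py opens its own long-lived SSH session, it may also be worth passing ConnectTimeout/ServerAliveInterval options inside that script.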

Related

python server executing DB/Binance connections every time the system is accessed

I am using Python and Flask as part of a server. When the server starts up, it connects to an Oracle database and a Binance crypto-exchange server.
The server starts in either TEST or PRODUCTION mode. To determine which mode to use at startup, I take an input variable and use it to decide whether to connect to the PROD configuration (which would actually execute trades) or the TEST system (which is more like a sandbox).
Whenever I make a call to the server (e.g. http://<myservername.com>:80/), it seems as though the connection code is executed with each call. So, if I hit http://<myservername.com>:80/ 7 times, the code that connects to the database (and the code that connects to the Binance server) is EXECUTED SEVEN times.
Question: Is there a place where one can put the connection code so that it is executed ONCE when the server is started up?
I saw the following:
https://damyan.blog/post/flask-series-structure/
How to execute a block of code only once in flask?
Flask at first run: Do not use the development server in a production environment
and tried using the solution in #2
@app.before_first_request
def do_something_only_once():
The code was changed to the following (the connection to the Binance server is not shown):
@app.before_first_request
def do_something_only_once():
    system_access = input(" Enter the system access to use \n-> ")
    if ( system_access.upper() == "TEST" ) :
        global_STARTUP_DB_SERVER_MODE = t_system_connect.DBSystemConnection.DB_SERVER_MODE_TEST
        print(" connected to TEST database")
    if ( system_access.upper() == "PROD" ) :
        global_STARTUP_DB_SERVER_MODE = t_system_connect.DBSystemConnection.DB_SERVER_MODE_PROD
        print(" connected to PRODUCTION database")
When starting the server up, I never get an opportunity to enter "TEST" ( in order to connect to the "TEST" database). In fact, the code under the area of:
@app.before_first_request
def do_something_only_once():
is never executed at all.
Question: How can one fix the code so that, when the server is started, the code responsible for connecting to the Oracle DB server and to the Binance server is executed only ONCE, and not every time the server is accessed via http://<myservername.com>:80/?
Any help, hints or advice would be greatly appreciated
TIA
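As an illustrative aside (not from the original answers): one common Flask pattern is to do one-time setup at module import time, driven by an environment variable instead of input(), since a production WSGI server never shows an interactive prompt. A minimal sketch; the FLASK_SYSTEM_ACCESS variable name and the placeholder mode strings are assumptions:

import os
from flask import Flask

app = Flask(__name__)

# Runs exactly once, when the module is imported by the server process.
SYSTEM_ACCESS = os.environ.get("FLASK_SYSTEM_ACCESS", "TEST").upper()

if SYSTEM_ACCESS == "PROD":
    db_mode = "PROD"  # e.g. t_system_connect.DBSystemConnection.DB_SERVER_MODE_PROD
else:
    db_mode = "TEST"  # e.g. t_system_connect.DBSystemConnection.DB_SERVER_MODE_TEST
print(" connected to %s database" % db_mode)

@app.route("/")
def index():
    # The connection mode was decided once at startup, not per request.
    return "mode: " + db_mode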
@Christopher Jones
Thanks for the response.
What I was hoping to do was implement this Flask server as a Docker process. The idea is to start several of these processes at one time; the group of Docker processes would then be managed by some kind of dispatcher. When a request to http://myservername.com:80/ came in, the connection information would first go to the dispatcher, which would forward it to a Docker process that was "free" for use. My thought was that Docker Swarm (or something under Kubernetes) might work in this fashion: one process gets one connection to the DB, and the dispatcher would be responsible for distributing work.
I came from an ERP background. The Oracle connection pool was known to exist, but it was decided to move most of the work to the OS process level (so that if one ran "ps -ef | grep <process_name>" one would see all of the processes the "dispatcher" forwards work to). So I was looking for something similar - old habits die hard ...
Most Flask apps will be called by more than one user so a connection pool is important. See How to use Python Flask with Oracle Database.
You can open a connection pool at startup:
if __name__ == '__main__':
    # Start a pool of connections
    pool = start_pool()
    ...
(where start_pool() calls cx_Oracle.SessionPool() - see the link for the full example)
Then your routes borrow a connection as needed from the pool:
    connection = pool.acquire()
    cursor = connection.cursor()
    cursor.execute("select username from demo where id = :idbv", [id])
    r = cursor.fetchone()
    return (r[0] if r else "Unknown user id")
Even if you only need one connection, a pool of one connection can be useful because it gives some Oracle high availability features that holding open a standalone connection for the duration of the application won't give.
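A minimal end-to-end sketch of that pattern, assuming cx_Oracle; the credentials, DSN, and demo table below are placeholders, not values from the original post:

import cx_Oracle
from flask import Flask

app = Flask(__name__)

def start_pool():
    # Created once at startup; every request borrows from this pool.
    return cx_Oracle.SessionPool(user="demo", password="secret",
                                 dsn="dbhost.example.com/orclpdb1",
                                 min=1, max=4, increment=1)

pool = start_pool()

@app.route('/user/<int:id>')
def get_user(id):
    # Borrow a pooled connection instead of opening a new one per request.
    connection = pool.acquire()
    try:
        cursor = connection.cursor()
        cursor.execute("select username from demo where id = :idbv", [id])
        r = cursor.fetchone()
        return (r[0] if r else "Unknown user id")
    finally:
        # Return the connection to the pool so other requests can reuse it.
        pool.release(connection)

if __name__ == '__main__':
    app.run(port=8080)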

Airflow using GCP. Unable to ping external IP Address within Airflow DAG

Background
I have created an Airflow webserver using a Composer environment within Google Cloud Platform: 3 nodes, image version composer-1.10.0-airflow-1.10.6, machine type n1-standard-1.
I have not yet configured any networks for this environment.
Airflow works fine for simple test DAGs.
The problem
I wrote a ping_ip DAG for determining whether a physical machine (i.e. my laptop) is connected to the internet. (Code: https://pastebin.com/FSBPNnkP)
I tested the python used to ping the machine locally (via response = os.system("ping -c 1 " + ip_address)) and it returned 0, aka Active Network.
When I moved this code into an Airflow DAG, the code ran fine, but this time returned 256 for the same IP address.
Here's the DAG code in a pastebin:
https://pastebin.com/FSBPNnkP
Here are the Airflow Logs for the triggered DAG pasted above:
[2020-04-28 07:59:35,671] {base_task_runner.py:115} INFO - Job 2514: Subtask ping_ip 1 packets transmitted, 0 received, 100% packet loss, time 0ms
[2020-04-28 07:59:35,673] {base_task_runner.py:115} INFO - Job 2514: Subtask ping_ip [2020-04-28 07:59:35,672] {logging_mixin.py:112} INFO - Network Error.
[2020-04-28 07:59:35,674] {base_task_runner.py:115} INFO - Job 2514: Subtask ping_ip [2020-04-28 07:59:35,672] {python_operator.py:114} INFO - Done. Returned value was: ('Network Error.', 256)
I suspect I have networking issues reaching external IPs from my environment.
Does anybody know how to ping an external IP from within an Airflow Service managed by GCP?
The end goal is to create a DAG that prompts a physical machine to run a python script. I thought this process should start with a simple sub-DAG that checks to see if the machine is connected to the internet. If I'm going about this the wrong way, please lemme know.
What worked for me is removing the response part. Here's the code:
import os

def ping_ip():
    ip_address = "8.8.8.8"  # My laptop IP
    response = os.system("ping -c 1 " + ip_address)
    if response == 0:
        pingstatus = "Network Active."
    else:
        pingstatus = "Network Error."
    print("\n *** Network status for IP Address=%s is : ***" % ip_address)
    print(pingstatus)
    return pingstatus

print(ping_ip())
Let me give my opinion.
Composer by default uses the default network, which contains a firewall rule that allows the ICMP protocol (ping). So I think any public IP should be reachable; for example, when PyPI packages are installed you usually don't configure anything special, yet the PyPI repositories are accessible.
A machine that has Internet access does not necessarily have a public IP; e.g. it can be behind NAT or some other network configuration (networking is not my expertise). To make sure you are specifying the public address of your Internet connection, you can use tools like https://www.myip.com/, where you will see the public IP (e.g. 189.226.116.31) and your host IP (e.g. 10.0.0.30); if you get something similar, you will need to use the public one.
If you are using the host IP, it is possible that it works locally because that IP is reachable from the same private network you are in, so the traffic never leaves the network. But in the case of Composer, where your DAG was uploaded, the nodes are completely outside of your local network.
I didn't find what the ping code 256 could mean (note that os.system returns the raw wait status, so 256 corresponds to ping exiting with code 1), but if you are using the correct public IP, you can try increasing the response timeout with -W; it may just be taking longer to reach the IP.
The VMs created by Composer are unlikely to have "ping" installed. These are standard images. I think you are basically invoking the Linux "ping" command and it fails because it is not installed in the node/vm. So you need to change your implementation to "ping" the server another way.
You can SSH to the Composer node VMs and install "ping" and then rerun the DAG. But even if it works I would not consider it a clean solution and it will not scale. But it is okay to do this for a pilot.
Lastly, if your goal is to execute a Python script, have you thought of using a PythonOperator from within a DAG? If you want to decouple the execution of the Python script from the DAG itself, an alternative is a Pub/Sub + Cloud Function combination. (A socket-based reachability check inside a PythonOperator is sketched below.)
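For illustration, a hedged sketch of checking reachability without the ping binary, using a plain TCP connection inside a PythonOperator (the import paths match Airflow 1.10; the target IP, port 53, and the DAG/task names are assumptions):

from datetime import datetime
import socket

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def check_host(ip_address="8.8.8.8", port=53, timeout=5):
    # Pure-Python reachability check: open a TCP connection instead of
    # shelling out to "ping", which may not be installed on the workers.
    try:
        with socket.create_connection((ip_address, port), timeout=timeout):
            return "Network Active."
    except OSError:
        return "Network Error."

dag = DAG("ping_ip_socket",
          start_date=datetime(2020, 4, 1),
          schedule_interval=None)

check_task = PythonOperator(task_id="check_host",
                            python_callable=check_host,
                            dag=dag)

A TCP probe only proves the target accepts connections on that port; it is not an ICMP ping, so the firewall rules involved can differ.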
Another probable cause of being unable to reach external IPs is misconfigured firewall rules. To fix this you must:
Define an egress firewall rule to enable ping to your destination IP and attach the firewall rule to a "tag".
Make sure you attach the same "tag" to the VMs/nodes created for Composer.

How to start clients from server itself in python?

I'm developing an automation framework with little manual intervention.
There is one server and 3 client machines.
What the server does is send a command to each client one by one, get the output of that command, and store it in a file.
But to establish the connection I have to manually start the clients on the different machines from the command line. Is there a way, in Python, for the server itself to send a signal or something to start a client, send the command, store the output, and then start the next client, and so on?
Edited.
After the suggestion below, I used the spur module:
import spur

ss = spur.SshShell(hostname="172.16.6.58", username='username', password='some_password',
                   shell_type=spur.ssh.ShellTypes.minimal,
                   missing_host_key=spur.ssh.MissingHostKey.accept)
res = ss.run(['python', 'clientsock.py'])
I'm trying to start the clientsock.py file on one of the client machines (the server is already running on the current machine), but it hangs there and nothing happens. What am I missing here?
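One possible explanation, offered as an assumption rather than a confirmed diagnosis: ss.run() blocks until the remote command exits, so if clientsock.py is a long-running client it will look like a hang. spur also provides spawn(), which returns a process handle immediately; a minimal sketch (hostname and credentials are placeholders):

import spur

shell = spur.SshShell(hostname="172.16.6.58", username="username", password="some_password",
                      missing_host_key=spur.ssh.MissingHostKey.accept)

# spawn() starts the remote command and returns right away,
# so the server script can move on to the next client machine.
process = shell.spawn(['python', 'clientsock.py'])

# ... later, if the client's output is needed, wait for it to finish:
result = process.wait_for_result()
print(result.output)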

Check if socket is in use in python

I am running a script that telnets to a terminal server. Occasionally the script is launched while one instance is already running, which causes the already running script to fail with
EOFError: telnet connection closed
Is there a quick, easy, and Pythonic way to check whether the required socket is already in use on the client computer before I try to open a connection with telnetlib?
SOLUTION:
I wanted to avoid making a subprocess call but since I do not control software on the client computers, and other programs may be using the same socket, the file lock suggestion below (a good idea) wouldn't work for me. I ended up using SSutrave's suggestion. Here is my working code that uses netstat in Windows 7:
# make sure the socket is not already in use
try:
    netstat = subprocess.Popen(['netstat', '-nao'], stdout=subprocess.PIPE)
except:
    raise ValueError("couldn't launch netstat to check sockets. exiting")
ports = netstat.communicate()[0]
if (ip + ':' + port) in ports:
    print 'socket ' + ip + ':' + port + ' in use on this computer already. exiting'
    return
You can check for open ports by running the Linux command netstat | grep 'port number' | wc -l from Python via the subprocess library.
There is no standard way to know whether a server has other open connections before you attempt to connect to it. You must find out either by connecting to another service on the server that checks this, or by asking the other clients, if you know all of them.
That said, telnet servers should be able to handle more than one connection at a time, so it should not matter if there are more clients connected.
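For reference, a cross-platform alternative to parsing netstat output, hedged as a sketch: the third-party psutil package exposes the same connection table programmatically (psutil is an extra dependency, the IP/port below are placeholders, and on some systems listing all connections requires elevated privileges):

import psutil

def connection_in_use(ip, port):
    # Scan the system-wide TCP connection table for an existing
    # connection to the target terminal server.
    for conn in psutil.net_connections(kind='tcp'):
        if conn.raddr and conn.raddr.ip == ip and conn.raddr.port == port:
            return True
    return False

if connection_in_use('192.168.1.50', 23):
    print('socket already in use on this computer, exiting')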

how to get name of the machine which is connected to remote machine using Python

I have 2 remote machines which can be accessed by a group of machines connected through LAN.
If a machine connects to that remote machine using mstsc, how can we get the name of the machine that is connected?
Is there any python package to get this data?
Thanks in advance.
You have to run as admin. The following code prints the machines connected via mstsc.exe along with the connected port number.
import subprocess

f = subprocess.check_output('netstat -b')
prevLine = ""
for line in f.splitlines():
    if line.find("mstsc.exe") != -1:
        # The line naming the executable follows the connection line,
        # so the address/port is in the previous line.
        print prevLine.split()[1]
    else:
        prevLine = line
The idea is to run netstat with the -b option to list all established connections along with the owning executable; from that output, we can pick out the connections made by mstsc.
