How to check if SSH connection was established with AWS instance - python

I'm trying to connect to an Amazon EC2 instance via SSH using boto. I know that an SSH connection can only be established some time after the instance has been created. So my questions are:
Can I somehow check if SSH is up on the instance? (If so, how?)
Or how can I check the output of boto.manage.cmdshell.sshclient_from_instance()? I mean, for example, if the output prints Could not establish SSH connection, then try again.
This is what I've tried so far, with no luck:
if instance.state == 'running':
    retry = True
    while retry:
        try:
            print 'Connecting to ssh'
            key_path = os.path.join(os.path.expanduser('~/.ssh'), 'secret_key.pem')
            cmd = boto.manage.cmdshell.sshclient_from_instance(instance,
                                                               key_path,
                                                               user_name='ec2-user')
            print instance.update()
            if cmd:
                retry = False
        except:
            print 'Going to sleep'
            time.sleep(10)
The output I get is:
SSH Connection refused, will retry in 5 seconds
SSH Connection refused, will retry in 5 seconds
SSH Connection refused, will retry in 5 seconds
SSH Connection refused, will retry in 5 seconds
SSH Connection refused, will retry in 5 seconds
Could not establish SSH connection
And of course everything is working properly; if I run the same code a little later I do get a connection and can use cmd.shell().

The message "SSH Connection refused, will retry in 5 seconds" is coming from boto: http://code.google.com/p/boto/source/browse/trunk/boto/manage/cmdshell.py
Initially, 'running' just indicates that the instance has started booting. As long as sshd is not up, connections to port 22 are refused. Hence, what you observe is exactly what is to be expected if sshd does not come up within the first 25 seconds of the 'running' state.
Since it is not predictable exactly when sshd comes up, and you presumably do not want to waste time with a long fixed waiting period, you could implement your own polling code that checks, e.g. every 1 to 5 seconds, whether port 22 is reachable, and only then invoke boto.manage.cmdshell.sshclient_from_instance().
A simple way to test if a certain TCP port of a certain host is reachable is via the socket module:
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.connect(('hostname', 22))
    print "Port 22 reachable"
except socket.error as e:
    print "Error on connect: %s" % e
s.close()
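Putting the two together, a minimal polling sketch could look like the following. This is only an illustration; it assumes instance.ip_address holds the instance's public IP and reuses key_path and the ec2-user name from the question:
import socket
import time

import boto.manage.cmdshell

def wait_for_port(host, port=22, timeout=300, interval=5):
    # Poll until the TCP port accepts connections or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(interval)
        try:
            s.connect((host, port))
            return True
        except socket.error:
            time.sleep(interval)
        finally:
            s.close()
    return False

# Hypothetical usage with the instance and key_path from the question
if wait_for_port(instance.ip_address, 22):
    cmd = boto.manage.cmdshell.sshclient_from_instance(instance, key_path,
                                                       user_name='ec2-user')
else:
    print 'sshd did not come up within the timeout'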

I have two parts: one to check whether the instance is running, and another to check whether the instance is reachable:
# Get instance status till it is running
status_output=$(aws ec2 describe-instance-status --instance-ids $instance_id)
instance_status=$(jq -n "$status_output" | jq .InstanceStatuses[0] | jq .InstanceState.Name)
echo $instance_status
while [ ${instance_status:1:-1} != running ]
do
    status_output=$(aws ec2 describe-instance-status --instance-ids $instance_id)
    instance_status=$(jq -n "$status_output" | jq .InstanceStatuses[0] | jq .InstanceState.Name)
    echo $instance_status
done

# Get instance reachability till it is ready
status_output=$(aws ec2 describe-instance-status --instance-ids $instance_id)
instance_reachability=$(jq -n "$status_output" | jq .InstanceStatuses[0] | jq .InstanceStatus.Status)
echo $instance_reachability
while [ ${instance_reachability:1:-1} != ok ]
do
    status_output=$(aws ec2 describe-instance-status --instance-ids $instance_id)
    instance_reachability=$(jq -n "$status_output" | jq .InstanceStatuses[0] | jq .InstanceStatus.Status)
    echo $instance_reachability
done
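As a side note, newer versions of the AWS CLI ship built-in waiters that cover these same two checks, so the polling loops above can be replaced (if your CLI version provides them) with:
# Wait until the instance is in the running state
aws ec2 wait instance-running --instance-ids $instance_id
# Wait until the instance status checks report "ok" (reachability passed)
aws ec2 wait instance-status-ok --instance-ids $instance_id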

Related

Connection refused in multi-server TCP socket connection with Python

I have 3 servers that I would like to link so that they can communicate with each other. The goal is to run MapReduce in a distributed way.
I have a problem when setting up a multi-server connection using TCP sockets in Python.
I simplified the code so that it highlights this particular problem.
IMPORTANT INFO: This exact code is sent to and run on every server in the list "computers" using the bash script given at the very bottom of this post.
from _thread import *
import socket
from time import sleep

computers = ["137.194.142.130", "137.194.142.131", "137.194.142.133"]
list_PORT = [3652, 4457, 6735, 9725]
idt = socket.gethostbyname(socket.gethostname())
SIZE = 1024
FORMAT = "utf-8"

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((idt, list_PORT[computers.index(idt)+1]))
server.listen()
sleep(4)  # So that every server have time to listen before
          # the other ones start to connect with the next part of the code

list_socket_rec = []
if computers.index(idt) != 0:
    for server in computers[:computers.index(idt)]:
        socket_nb = 0
        skt = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        list_socket_rec.append(skt)
        list_socket_rec[socket_nb].connect((server, list_PORT[computers.index(idt)]))  # error connection refused
        socket_nb += 1
# In this loop, I'm trying to connect to every server whose index is "smaller"
# in the computers list (in order to not have duplicates)
The error that I get is the following (occurring on the connect() call at the end of the code):
Traceback (most recent call last):
File "file", line 24, in <module>
list_socket_rec[socket_nb].connect((server, list_PORT[computers.index(idt)])) # does not work ==> connection refused
ConnectionRefusedError: [Errno 111] Connection refused
Does someone know how to solve this error please?
Shell script to run the servers:
#!/bin/bash
# A simple variable example
login="login"
remoteFolder="/tmp/$login/"
fileName="server"
fileExtension=".py"
computers=("137.194.142.130" "137.194.142.131" "137.194.142.133")
for c in "${computers[@]}"; do
    command0=("ssh" "$login@$c" "lsof -ti | xargs kill -9")
    command1=("ssh" "$login@$c" "rm -rf $remoteFolder;mkdir $remoteFolder")
    command2=("scp" "$fileName$fileExtension" "$login@$c:$remoteFolder$fileName$fileExtension")
    command3=("ssh" "$login@$c" "cd $remoteFolder; python3 $fileName$fileExtension")
    echo "${command0[*]}"
    "${command0[@]}"
    echo "${command1[*]}"
    "${command1[@]}"
    echo "${command2[*]}"
    "${command2[@]}"
    echo "${command3[*]}"
    "${command3[@]}" &
done
I've tried binding/connecting on a different port for each connection, because I thought the problem could come from multiple connections on the same port, but the error was still there.
I also tried it manually on 2 of the servers: I ran the code above on the first computer of the list and then on the second, and the connection worked.

Python - Continuously Looping a Function

Python newbie here, so excuse the bad coding.
When I run the following...
def bug_check():
    for host in hosts:
        sshInteract.send('ssh ' + host)
        try:
            sshInteract.expect(ep1)
            sshInteract.send(password)
            sshInteract.expect(ep2)
        except:
            print("SSH unsuccessful")
        else:
            sshInteract.send("exit")
            sshInteract.expect(ep3)
            print("SSH successful")

bug_check()
...the print output I get (which is correct) based on the three hosts in my list is:
SSH successful
SSH successful
SSH unsuccessful
What I would like to do is to continuously run this every 60 seconds until I manually stop it. So what I did was the following:
def bug_check():
    for host in hosts:
        sshInteract.send('ssh ' + host)
        try:
            sshInteract.expect(ep1)
            sshInteract.send(password)
            sshInteract.expect(ep2)
        except:
            print("SSH unsuccessful")
        else:
            sshInteract.send("exit")
            sshInteract.expect(ep3)
            print("SSH successful")

while True:
    bug_check()
    time.sleep(60)
When running this, the print output looks like this, which is incorrect:
SSH successful
SSH successful
SSH unsuccessful
SSH unsuccessful
SSH unsuccessful
SSH unsuccessful
SSH unsuccessful
SSH unsuccessful
SSH unsuccessful
What it should look like is:
SSH successful
SSH successful
SSH unsuccessful
SSH successful
SSH successful
SSH unsuccessful
SSH successful
SSH successful
SSH unsuccessful
What have I done wrong here?

thrift timeout for long-running call: thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

I've built an RPC service using Thrift. Each call may run for a long time (minutes to hours), so I've set the Thrift timeout to 2 days.
transport = TSocket.TSocket(self.__host, self.__port)
transport.setTimeout(2 * 24 * 60 * 60 * 1000)
But Thrift always closes the connection after about 600 seconds, with the following exception:
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
Is there any other timeout I should set? (Python; Thrift server on Windows, client on Ubuntu)
The Thrift transport connection is being disconnected. This could be due to network issues, a remote service restart, or timeout issues. Whenever any call is made after a disconnect, it results in a TTransportException. This problem can be solved by reconnecting to the remote service.
Try using this, invoking it before making a remote service call.
import sys

def reopen_transport():
    try:
        if not transport.isOpen():
            transport.open()
    except Exception, msg:
        sys.stderr.write("Error reopening transport: {}\n".format(msg))
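For example (the client object and method names below are hypothetical, not part of the question's service), each remote call could be guarded like this:
# 'client' is assumed to be the generated Thrift client bound to 'transport'
def call_with_reconnect(method, *args):
    reopen_transport()  # make sure the transport is open before the call
    return method(*args)

result = call_with_reconnect(client.long_running_job, job_id)  # hypothetical RPC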

Number of connections to tor via stem library - Controller creates 2 connections but closing the controller removes only one connection

I am working with the stem library and I'm writing a thin wrapper over the Stem Controller class.
I have some questions relating to the number of connections created when a controller is instantiated and how many are removed when the controller is closed.
This is the code I have so far:
import logging
import sys
import subprocess
from optparse import OptionParser

import stem
from stem.control import Controller

SOCKET_ERROR_CODE = 2
MISSING_PWD_ERROR_CODE = 3
PWD_FAIL_ERROR_CODE = 4
AUTH_FAIL_ERROR_CODE = 5
UNKNOWN_ERROR_CODE = -1


class StemController():
    def __init__(self):
        # Added print statements for debugging - Yes, I know, shell=True is bad
        print(
            "[__init__ start] ",
            str(int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip()))
        )
        # Controller connection and credentials
        self.tor_router_ip = "127.0.0.1"
        self.tor_router_port = 9051
        self.tor_password = "controllportpassword"  # Add yours to test
        # Create a controller - This might actually fail
        try:
            # Tried both and one at a time, still two connections
            self.controller = Controller.from_port(
                # address=self.tor_router_ip,
                port=self.tor_router_port
            )
        except stem.SocketError as e:
            logging.info("SocketError when opening controller. Now exiting with code %s." % (
                SOCKET_ERROR_CODE
            ))
            sys.exit(SOCKET_ERROR_CODE)
        except Exception as e:
            logging.exception(e)
            sys.exit(UNKNOWN_ERROR_CODE)
        print(
            "[Controller created] ",
            str(int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip()))
        )
        # Authenticate controller - This might fail as well
        try:
            self.controller.authenticate(password=self.tor_password)
        except stem.connection.MissingPassword:
            logging.info(
                "MissingPassword when authenticating controller. Now exiting with code %s." % MISSING_PWD_ERROR_CODE
            )
            self.controller.close()
            sys.exit(MISSING_PWD_ERROR_CODE)
        except stem.connection.PasswordAuthFailed:
            logging.info(
                "PasswordAuthFailed when authenticating controller. Now exiting with code %s." % PWD_FAIL_ERROR_CODE
            )
            self.controller.close()
            sys.exit(PWD_FAIL_ERROR_CODE)
        except stem.connection.AuthenticationFailure:
            logging.info(
                "AuthenticationFailure when authenticating controller. Now exiting with code %s." % AUTH_FAIL_ERROR_CODE
            )
            self.controller.close()
            sys.exit(AUTH_FAIL_ERROR_CODE)
        except Exception as e:
            logging.exception(e)
            self.controller.close()
            sys.exit(UNKNOWN_ERROR_CODE)
        print(
            "[Controller auth] ",
            str(int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip()))
        )

    def __del__(self):
        init_tor_connections = int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip())
        print(
            "\nDeleting and cleaning up controller. Before cleanup there are %s tor connections." % init_tor_connections
        )
        self.controller.close()
        final_tor_connections = int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip())
        print(
            "After deletion of controller. After cleanup there are %s tor connections." % final_tor_connections
        )


import unittest


class TestStemController(unittest.TestCase):
    def setUp(self):
        # This is a darknet site that is almost always up
        self.up_site = "deepdot35wvmeyd5.onion"

    def test_stem_controller(self):
        # Make sure that the controller opens and closes correctly
        # Count how many connections on port 9051 we have
        pre_stem_controller = int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip())
        print("pre_stem_controller: ", pre_stem_controller)
        test_stem_controller = StemController()
        # Count how many connections on port 9051 we have after initializing the controller
        post_stem_controller = int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip())
        print("post_stem_controller: ", post_stem_controller)
        # We should have one more connection, since we created another one when we initialized the controller
        self.assertEqual(post_stem_controller, pre_stem_controller + 1)
        # Delete the controller
        del test_stem_controller
        # Count how many connections on port 9051 we have after deleting the controller
        post_del_stem_controller = int(subprocess.check_output('netstat -na | grep 9051 | wc -l', shell=True).strip())
        print("post_del_stem_controller: ", post_del_stem_controller)
        # We should have as many as we had in the beginning
        self.assertEqual(post_del_stem_controller, pre_stem_controller)


def suite():
    test_suite = unittest.TestSuite()
    test_suite.addTest(TestStemController('test_stem_controller'))
    return test_suite


if __name__ == '__main__':
    arguments = sys.argv[1:]
    parse = OptionParser("This is the stem controller script.")
    parse.add_option(
        "-u",
        "--unittests",
        help="Action: Run unittests.",
        default=False,
        action='store_true'
    )
    (options, arguments) = parse.parse_args()
    if options.unittests:
        test_suite = suite()
        unittest.TextTestRunner().run(test_suite)
        logging.info("Unittests done. Now exiting.")
        sys.exit()
TL;DR of code: Count number of connections on port 9051, create a controller, count number of connections on port 9051 again and assert the number increased by one. Same thing for deleting except assert connection count decreased by one.
I run my code with python3 stem_code.py -u and get, for example this:
pre_stem_controller: 1
[__init__ start] 1
[Controller created] 3
[Controller auth] 3
post_stem_controller: 3
F
Deleting and cleaning up controller. Before cleanup there are 3 tor connections.
After deletion of controller. After cleanup there are 2 tor connections.
======================================================================
FAIL: test_stem_controller (__main__.TestStemController)
----------------------------------------------------------------------
Traceback (most recent call last):
self.assertEqual(post_stem_controller, pre_stem_controller + 1)
AssertionError: 3 != 2
----------------------------------------------------------------------
Ran 1 test in 0.050s
FAILED (failures=1)
The most relevant part I think is this:
[__init__ start] 1
[Controller created] 3
My first question is: Why are two connections created here?
I have developed a theory why that is, but I am not sure.
Curious to see what those 2 connections are, I did this after instantiating the controller:
netstat -na | grep 9051
The result was this:
tcp 0 0 127.0.0.1:9051 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9051 127.0.0.1:40606 ESTABLISHED
tcp 0 0 127.0.0.1:40606 127.0.0.1:9051 ESTABLISHED
So, I have the tor client listening on port 9051 (first line) and two connections: one from tor to stem (9051 to 40606, second line) and one from stem to tor (40606 to 9051, third line).
Is this dual connection, tor to stem and stem to tor the reason two connections are created?
Following this, I decided to take the fact that 2 connections are created as given, so I changed my unit test from +1 to +2 to get past that particular assertion. The test moved on but then failed the assertion that the number of connections before initialisation equals the number of connections after deletion.
self.assertEqual(post_del_stem_controller, pre_stem_controller)
AssertionError: 2 != 1
Using the same procedure as in the case of the connection, I ran netstat -na and got this:
tcp 0 0 127.0.0.1:9051 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9051 127.0.0.1:40636 TIME_WAIT
The tor to stem connection still appears to be around.
Am I correct in thinking that when I do controller.close() I only close the stem to tor connection but the tor to stem connection remains active ( at least for a while, TIME_WAIT state )?
Now, assuming I was correct so far:
Is there any way to force tor to close its side of the connection?
Is there any reason to try to force tor to close its connection? (My reasoning goes like this: I know there is a maximum of 256 connections to the tor client, and I want to have as many of them free as possible.) Put otherwise, do connections in the TIME_WAIT state count as actual connections? For example, if I have 255 ESTABLISHED connections and one TIME_WAIT, will I still be able to make another connection to tor?
Do you think this is the correct way to test my wrapper class, or are there better ways of making sure the controller opens and closes correctly?
Thanks!
The connection is closed as far as Tor is concerned. That is, it has released that connection and no longer sees that client as connected. When the socket is in TIME_WAIT state, the socket is closed but the OS keeps it around to ensure no late packets arrive from the old connection that might be accepted by a later connection from the same set of ports (e.g. 40606 from your example).
You can decrease the TIME_WAIT period in the OS, but it doesn't really address the problem. See http://www.isi.edu/touch/pubs/infocomm99/infocomm99-web/ and http://www.linuxbrigade.com/reduce-time_wait-socket-connections/ for more info.
No. Once you close the connection to the controller, Tor decreases the number of connected clients so it's freed up to accept more. I wouldn't worry about it.
Probably not. Why use netstat to test your connection when you can just issue a command to the controller and read a response?
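For example, a minimal sketch of that idea (a hypothetical extra test method for TestStemController, assuming the wrapper exposes its stem Controller as self.controller, as in the code above) could be:
def test_controller_responds(self):
    wrapper = StemController()
    # If the control connection is open and authenticated, stem can ask Tor
    # for its version; get_version() raises if the controller is unusable.
    self.assertIsNotNone(wrapper.controller.get_version())
    del wrapper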
The reason you see what looks like 3 connections is that the first line is the listening socket on 9051; then, when you connect, netstat shows you both ends of the same connection (since it's local). So you can see that there's a connection to Tor using port 40636, and then you also (because it's a local connection) see the connection from your control client on port 40636 as well. These two ends really represent a single connection.
So as long as you're connecting to a local Tor instance, the number of lines in netstat will increase by 2 for each connection.
You could eliminate the local client connection from the output by changing your grep to
netstat -ano|grep -E ':9051[[:space:]]+[[:digit:]]'
You could then further filter out the LISTENING connection with:
netstat -ano|grep -E ':9051[[:space:]]+[[:digit:]]'|grep -v 'LISTENING'
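If you prefer to keep counting via netstat inside the unit test, a hedged helper along the same lines (assuming a Linux netstat whose state column reads LISTEN/ESTABLISHED) might be:
import subprocess

def count_control_connections():
    # Count only sockets whose local address is :9051 and that are ESTABLISHED,
    # i.e. one line per live control connection (listener and TIME_WAIT excluded).
    cmd = "netstat -an | grep -E ':9051[[:space:]]+[[:digit:]]' | grep ESTABLISHED | wc -l"
    return int(subprocess.check_output(cmd, shell=True).strip())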
Am I correct in thinking that when I do controller.close() I only
close the stem to tor connection but the tor to stem connection
remains active ( at least for a while, TIME_WAIT state )?
When you do controller.close(), both are closed. Tor no longer sees the connection as open; it's just being held by the OS for the reason described above: to make sure that another connection using the same port combination doesn't get established and potentially accept lingering packets from the previous connection. It's no longer open in Tor's eyes, and Tor has removed the client from its connection list.
Hope that answers your questions.

Why do I need `Ctrl-C` when I `curl` to this tcp proxy

I'm trying to get this tcp proxy to work. I used a simple server based on BaseHTTPServer:
1. run the server on port 12343
2. run the proxy on port 12344
3. run curl against port 12343. It works!
4. run curl against port 12344. Now I need to press Ctrl-C.
Here is the code and how to reproduce the situation:
$ ./server0.py 12343
$ ./relay.py 12344 127.0.0.1 12343
$ curl 'http://localhost:12343' # This works fine
$ curl 'http://localhost:12344' # This needs Ctrl-c
PS: Another question is how can I stop relay.py once it's running. Ctrl-C does not work. Currently, I'm using Ctrl-z with kill `jobs -ps`
It is because you forgot to close the connection between the client (curl) and the relay.
The BaseHTTPServer will close the TCP connection once it finishes writing response, so it works fine to directly connect to the HTTP server.
When connecting to the HTTP server through the relay, however, the connections look like this:
              +---- Relay ----+
              |     pipe1     |
 connection1  | ------------> |  connection2
curl -------->|               |--------------> HTTP Server
     <--------|               |<--------------
              |     pipe2     |
              +---------------+
The connection2 will be closed by the HTTP Server, but connection1 won't, so curl will keep hanging.
To fix the problem, just call socket.shutdown() on connection1. The revised PipeThread.run() looks like the following:
def run(self):
    while True:
        try:
            data = self.source.recv(1024)
            if not data:
                break
            self.sink.send(data)
        except Exception as e:
            logging.error(e)
            break
    logging.info('%s terminating' % self)
    self.sink.shutdown(socket.SHUT_WR)  # close the socket on the write side
    PipeThread.pipes.remove(self)
    logging.info('%s pipes active' % len(PipeThread.pipes))
The full revised code is here.
