I am writing a TCP server in Python on Ubuntu 12.04.
The server is multiprocess: the main process accepts connections, and each accepted socket is reduced and sent to a worker:
<main process>
h = reduce_handle(conn.fileno())
self.queue.put(h)
The worker creates a separate thread for the connection:
<worker process>
t = threading.Thread(target=sock_thread, args=(h, DBSession, Protocol))
t.start()
The reduced socket is rebuilt and used in that separate thread:
<Connection Thread>
fd = rebuild_handle(h)
sock = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
<data transmission>
sock.close()
Everything worked fine until today, when I got this exception:
error: [Errno 24] Too many open files
Restarting the server solved the problem, but the number of unclosed files keeps growing. I monitor it from the command line with:
lsof | grep python | wc -l
What is the problem? I close each socket in its thread, and all threads run and finish normally. Calling sock.shutdown(socket.SHUT_RDWR) before closing the socket raises an exception: bad file descriptor.
Is there a way to close the file associated with the socket?
Thanks.
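For what it's worth, one likely source of such a leak (a known pitfall, though not confirmed from the code above alone): socket.fromfd() duplicates the descriptor, so closing only the socket returned by fromfd() leaves the original fd from rebuild_handle() open. A minimal sketch of the behaviour:

```python
import socket

# A minimal sketch of the pitfall behind the fd leak: socket.fromfd()
# dup()s the descriptor, so closing only the fromfd() socket leaves the
# original descriptor open.
orig = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = orig.fileno()

rebuilt = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
is_duplicate = rebuilt.fileno() != fd
print(is_duplicate)  # True: fromfd() created a brand-new descriptor

rebuilt.close()  # closes only the duplicate
orig.close()     # the original must be closed too (os.close(fd) in the thread)
```

In the connection thread this would mean calling os.close(fd) right after socket.fromfd(), in addition to sock.close().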
Related
I have 3 servers that I would like to link so that they communicate with each other. The goal is to run MapReduce in a distributed way.
I have a problem when establishing multi-server connections using TCP sockets in Python.
I simplified the code to make this particular problem easier to understand.
IMPORTANT INFO: This exact code is sent to and run on every server in the list "computers", using the bash script given at the very bottom of this post.
from _thread import *
import socket
from time import sleep
computers = ["137.194.142.130", "137.194.142.131", "137.194.142.133"]
list_PORT = [3652, 4457, 6735, 9725]
idt = socket.gethostbyname(socket.gethostname())
SIZE = 1024
FORMAT = "utf-8"
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((idt, list_PORT[computers.index(idt)+1]))
server.listen()
sleep(4)  # so that every server has time to listen before the others
          # start to connect with the next part of the code
list_socket_rec = []
if computers.index(idt) != 0:
    socket_nb = 0
    for server in computers[:computers.index(idt)]:
        skt = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        list_socket_rec.append(skt)
        list_socket_rec[socket_nb].connect((server, list_PORT[computers.index(idt)]))  # error: connection refused
        socket_nb += 1
    # In this loop, I'm trying to connect to every server whose index is
    # smaller in the computers list (in order to not have duplicates)
The error that I get is the following (it occurs on the connect() call at the end of the code):
Traceback (most recent call last):
File "file", line 24, in <module>
list_socket_rec[socket_nb].connect((server, list_PORT[computers.index(idt)]))  # doesn't work ==> connection refused
ConnectionRefusedError: [Errno 111] Connection refused
Does anyone know how to solve this error?
Shell script to run the servers:
#!/bin/bash
# A simple variable example
login="login"
remoteFolder="/tmp/$login/"
fileName="server"
fileExtension=".py"
computers=("137.194.142.130" "137.194.142.131" "137.194.142.133")
for c in ${computers[@]}; do
    command0=("ssh" "$login@$c" "lsof -ti | xargs kill -9")
    command1=("ssh" "$login@$c" "rm -rf $remoteFolder;mkdir $remoteFolder")
    command2=("scp" "$fileName$fileExtension" "$login@$c:$remoteFolder$fileName$fileExtension")
    command3=("ssh" "$login@$c" "cd $remoteFolder; python3 $fileName$fileExtension")
    echo ${command0[*]}
    "${command0[@]}"
    echo ${command1[*]}
    "${command1[@]}"
    echo ${command2[*]}
    "${command2[@]}"
    echo ${command3[*]}
    "${command3[@]}" &
done
I've tried to bind/connect on a different port for each connection, because I thought the problem could come from multiple connections on the same port, but the error was still there.
I have also tried it manually on 2 of the servers: I ran the code on the first computer in the list and then on the second, and the connection worked.
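One thing worth double-checking, separate from the shell script: the bind and connect calls index list_PORT differently (the server at index i binds list_PORT[i+1], while peers connect using the index of the connecting machine), so the two only line up for adjacent indices. A minimal sketch of one consistent convention, assuming each server listens on the port at its own index (port_for is a hypothetical helper, not from the original code):

```python
computers = ["137.194.142.130", "137.194.142.131", "137.194.142.133"]
list_PORT = [3652, 4457, 6735]

def port_for(host):
    # hypothetical helper: every machine derives the same well-known port
    # for a given server, so bind() and connect() always agree
    return list_PORT[computers.index(host)]

print(port_for("137.194.142.131"))  # 4457
```

With such a scheme, each server binds port_for(its own address) and every client connects to port_for(target address), so no index arithmetic can drift apart.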
I have a CherryPy script that I frequently run to start a server. Today I had to start and stop it a few times to fix some bugs in a config file, and I guess the socket didn't close all the way, because when I tried to start it up again I got this issue:
[23/Mar/2015:14:08:00] ENGINE Listening for SIGHUP.
[23/Mar/2015:14:08:00] ENGINE Listening for SIGTERM.
[23/Mar/2015:14:08:00] ENGINE Listening for SIGUSR1.
[23/Mar/2015:14:08:00] ENGINE Bus STARTING
CherryPy Checker:
The Application mounted at '' has an empty config.
[23/Mar/2015:14:08:00] ENGINE Started monitor thread 'Autoreloader'.
[23/Mar/2015:14:08:00] ENGINE Started monitor thread '_TimeoutMonitor'.
[23/Mar/2015:14:08:00] ENGINE Error in HTTP server: shutting down
Traceback (most recent call last):
File "/home/andrew/virtualenvs/mikernels/lib/python2.7/site-packages/cherrypy/process/servers.py", line 188, in _start_http_thread
self.httpserver.start()
File "/home/andrew/virtualenvs/mikernels/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 1848, in start
raise socket.error(msg)
error: No socket could be created
I edited CherryPy's wsgiserver2.py to see the details of the socket.error, and error.strerror was:
98 (98, 'Address already in use') Address already in use
Meanwhile my socket is constructed as:
af = 2
socktype = 1
proto = 6
canonname = ''
sa = ('0.0.0.0', 2112)
self.bind(af, socktype, proto)
(that's not exact code but that's what the values are when the error is fired)
I checked netstat and didn't see anything listening on port 2112, what could be causing the problem and how can I go about diagnosing it?
Thanks!
You can try the following:
from socket import *
sock=socket()
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
# then bind
From the docs:
The SO_REUSEADDR flag tells the kernel to reuse a local socket in TIME_WAIT state, without waiting for its natural timeout to expire.
Here's the complete explanation:
Running an example several times with too small a delay between executions can lead to this error:
socket.error: [Errno 98] Address already in use
This is because the previous execution has left the socket in a TIME_WAIT state, and can’t be immediately reused.
There is a socket flag to set, in order to prevent this, socket.SO_REUSEADDR:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST, PORT))
You could find the process and kill it. List the candidates with:
ps aux | grep python
then, after finding the process ID, stop it manually with:
sudo kill -9 PID
replacing PID with your PID.
I often have to do this while testing with Flask/CherryPy. I'd be interested to see if there's an easier way (e.g. to prevent it in the first place).
Much easier to do it by checking the PID (:5000 is the port, since I've been running on 127.0.0.1:5000):
$ lsof -i :5000
Then kill it:
$ sudo kill -9 PID
I have an extremely simple TCP server in Python; the code is below:
#!/usr/bin/env python
import socket

sock = socket.socket()
sock.bind(('', 3912))
sock.listen(100)

num_cons = 10
cons = []
for i in range(num_cons):
    con, addr = sock.accept()
    cons.append(con)

while True:
    for con in cons:
        msg = "a" * 1000
        num_sent = con.send(msg.encode())
        print("sent: {} bytes of msg: {}".format(str(num_sent), msg))
The corresponding client code is:
#!/usr/bin/env python
import socket

sock = socket.socket()
sock.connect(('', 3912))  # in reality here I use the IP of the host where
                          # I run the server, since I launch the clients on a different host
while True:
    data = sock.recv(1000)
    print("received data: {} ".format(str(data)))
Now, if I start the server with
./server.py
and 10 clients in parallel from a different host:
for i in `seq 1 10`; do ./client.py 2>/dev/null 1>/dev/null & done
and then send kill -SIGSTOP %1 to the first client, I expect the server to keep sending data successfully, because it cannot know that the client has been stopped. Instead, the server blocks when it tries to send data to client 1.
I could understand this behaviour if the clients were on the same host as the server: we try to write data, but the kernel buffers are full, so the server blocks; since the client never reads, the buffers are never freed. However, if the clients are on a different machine, the kernel buffers of the server host should only fill temporarily; the kernel should then send the data over the network card and free them. So why is my server blocking on the send call? I have not verified whether the same behaviour occurs with a different language (C, for example).
It is weird, because 1000 characters is a small amount for TCP. I have no Linux machine available, but on a FreeBSD box I could successfully send 130000 bytes on a TCP connection whose peer was stopped before the sender blocked, and more than 1000000 on Windows.
But as TCP is a connection-oriented protocol, a send call will block if it cannot queue its data, i.e. when the internal TCP stack queue is full.
The gist of your problem seems to be that you're creating a SOCK_STREAM socket (i.e. TCP), and then abruptly terminating the client. As discussed in the Python Socket Programming HOWTO, a hang is expected in this situation.
TCP is a reliable protocol, meaning that every transmitted packet has to be acked. If the receiving side is dead, the sender will block waiting for that acknowledgement. Try setting a timeout and see if your send raises a socket.timeout after the expected time.
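To make that suggestion concrete, here is a self-contained sketch (using socket.socketpair() so no real network or stopped client is needed; the buffer chunk size and timeout are arbitrary). A timeout turns the otherwise indefinite block into a catchable socket.timeout once the kernel buffers fill:

```python
import socket

# `receiver` stands in for the SIGSTOPped client: it never calls recv(),
# so the sender's kernel buffers eventually fill up.
sender, receiver = socket.socketpair()
sender.settimeout(0.5)  # bound how long send() may block, in seconds

timed_out = False
try:
    while True:
        sender.send(b"a" * 1000)  # succeeds until the kernel buffers are full
except socket.timeout:
    timed_out = True  # instead of blocking forever, send() raised

print(timed_out)  # True
sender.close()
receiver.close()
```

The same settimeout() call on the server's per-client sockets would bound the blocking send observed in the question.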
I've built an RPC service using Thrift. Each call may run for a long time (minutes to hours), so I've set the Thrift timeout to 2 days:
transport = TSocket.TSocket(self.__host, self.__port)
transport.setTimeout(2 * 24 * 60 * 60 * 1000)
But Thrift always closes the connection after about 600 seconds, with the following exception:
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
Is there any other timeout I should set? (Python; Thrift server on Windows; client on Ubuntu.)
The Thrift transport connection is being disconnected. This could be due to network issues, a remote service restart, or timeout issues. Whenever any call is made after a disconnect, it results in a TTransportException. This problem can be solved by reconnecting to the remote service.
Try using this, invoking it before making a remote service call:
import sys

def reopen_transport():
    try:
        if not transport.isOpen():
            transport.open()
    except Exception as msg:
        sys.stderr.write("Error reopening transport: {}\n".format(msg))
Below is the code I am running within a service. For the most part the script runs fine for days/weeks until it hiccups and crashes. I am not so worried about the crashing part, as I can find the cause in the error logs and patch appropriately. The issue I am facing is that sometimes, when the service restarts and tries to connect to the server again, it gets a (10061, 'Connection refused') error, so the service is unable to start up again. The bizarre part is that no Python processes are running while connections are being refused, i.e. no process with image name "pythonw.exe" or "pythonservice.exe". It should be noted that I am unable to connect to the server from any other machine either, until I reset the computer which runs the client script. The client machine is running Python 2.7 on Windows Server 2003. It should also be noted that the server is implemented on a piece of hardware whose code I do not have access to.
try:
    EthernetConfig = ConfigParser()
    EthernetConfig.read('Ethernet.conf')

    HOST = EthernetConfig.get("TCP_SERVER", "HOST").strip()
    PORT = EthernetConfig.getint("TCP_SERVER", "PORT")

    lp = LineParser()

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))

    reader = s.makefile("rb")
    while self.run == True:
        line = reader.readline()
        if line:
            line = line.strip()
            lp.parse(line)
except:
    servicemanager.LogErrorMsg(traceback.format_exc())  # if error, log it to the event log
    s.shutdown(2)
    s.close()
    os._exit(-1)
Connection refused is an error meaning that the program on the other side of the connection is not accepting your connection attempt. Most probably it hasn't noticed you crashing and hasn't closed its own connection.
What you can do is simply sleep a little while (30-60 seconds) and try again, in a loop, and hope the other end notices that the connection is broken so it can accept new connections again.
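That retry idea can be sketched as follows (the delay, attempt count, and function name are arbitrary choices, not from the original service code):

```python
import socket
import time

def connect_with_retry(host, port, delay=30, max_attempts=10):
    """Retry while the peer refuses connections, e.g. because it has not
    yet noticed that the previous connection is broken."""
    last_error = None
    for attempt in range(max_attempts):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect((host, port))
            return s  # connected; the caller is responsible for closing it
        except socket.error as exc:
            last_error = exc
            s.close()
            time.sleep(delay)
    raise last_error
```

In the service above, this would replace the bare s.connect((HOST, PORT)) call, so a restart shortly after a crash rides out the window in which the hardware still holds the dead connection.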
Turns out that the network admin had closed the port I was trying to connect to. It is open for one IP, which belongs to the server. The problem was that the server has two network cards with two separate IPs. The issue is now resolved.