Spyne rpc server benchmark (Failed to establish a new connection) - python

I have a Python RPC server built on top of Spyne (http://spyne.io) and Twisted. I've run some Multi-Mechanize benchmarks against it and, as you can see in the image below, after about a minute of testing it starts to have problems establishing connections.
111274, 254.989, 1516806285, user_group-1, 0.017, HTTPConnectionPool(host='0.0.0.0' port=4321): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2c78bf2810>: Failed to establish a new connection: [Errno 99] Cannot assign requested address')), {'increment': 0.0179598331451416}
Since this happens like clockwork (after 60 s), I suspect I've run into some implicit/default rate limit in Twisted (though my search for one wasn't successful).
Is this possible? If so, can someone point me to those limits?
Or is the server simply overloaded?
Thanks
[Multi-Mechanize benchmark image]
EDIT:
Thanks to Jean-Paul Calderone's answer I looked at the number of TCP connections with netstat -at | wc -l.
Every time it climbs above 28k I get the Cannot assign requested address error.
I'm happy this isn't a server issue. :)

The error
Cannot assign requested address
is probably telling you that you've run out of IP/port combinations. A particular IP address can be combined with about 65535 different port numbers to form an address. A TCP connection involves two addresses, one for each end of the connection.
Thus, a particular IP address cannot establish more than 65535 TCP connections. And if both ends of the TCP connection fall on the same IP address, the limit drops to half of that.
Furthermore, TCP connection cleanup involves the TIME_WAIT state - a time-bounded interval during which the connection still exists even though it has already been closed. During this interval, the address of its endpoint is not available for re-use. Therefore, in addition to a hard maximum on the number of TCP connections you can have open at any given moment on a given IP address, there is also a hard maximum on the number of TCP connections you can open within a given window of time.
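You can see this in action by breaking the netstat count from the edit above down by TCP state - a quick sketch (Linux, assuming netstat is available):

import subprocess

# Tally sockets per TCP state; the TIME_WAIT count shows how many
# recently closed connections are still holding on to their addresses.
out = subprocess.run(["netstat", "-at"], capture_output=True, text=True).stdout
states = {}
for line in out.splitlines():
    parts = line.split()
    if parts and parts[0].startswith("tcp"):
        states[parts[-1]] = states.get(parts[-1], 0) + 1
print(states)   # hypothetical output: {'LISTEN': 12, 'ESTABLISHED': 130, 'TIME_WAIT': 27900}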
If you are benchmarking on your local system, it seems likely that you've run into these limits. You can push your benchmark beyond them by using more IP addresses, as sketched below.
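A minimal sketch of spreading benchmark connections over several source addresses; the IPs here are hypothetical aliases you would have to configure on the client machine first:

import socket

# Hypothetical extra addresses added to the benchmark machine, e.g. with
#   ip addr add 10.0.0.2/24 dev eth0   (and likewise for .3 and .4)
SOURCE_IPS = ["10.0.0.2", "10.0.0.3", "10.0.0.4"]

def connect_from(source_ip, host="10.0.0.1", port=4321):
    # Binding to port 0 lets the kernel pick an ephemeral port, so each
    # source IP contributes its own independent pool of ephemeral ports.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((source_ip, 0))
    s.connect((host, port))
    return s

# Round-robin over the source addresses to multiply the port budget.
conns = [connect_from(SOURCE_IPS[i % len(SOURCE_IPS)]) for i in range(1000)]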
Of course, if your server can handle enough connections that your benchmark actually runs into these limits, perhaps you've already answered the question of whether the server is fast enough. There's not much to be gained by going from (for example) 1000 connections/second to 10000 connections/second unless your application logic can process requests faster by an order of magnitude or so.
Consider an application which requires 1 ms of processing time. Paired with an RPC server that can service 1000 connections/sec, you'll be able to service 500 requests/second (1 ms app + 1 ms server -> 2 ms -> 1/500th sec -> 500/sec). Now replace the server with one that has one tenth the overhead: 1 ms app + 0.1 ms server -> 1.1 ms -> 909/sec. So your server is 10x faster but you haven't quite doubled the throughput. Now, almost 2x is not bad. But it's rapidly diminishing returns from here - a server ten times faster again only gets you to 990 requests/second. And you'll never beat 1000 requests/second because that's your application logic's limit.
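The arithmetic is easy to replay as a sketch:

# Requests/second as a function of per-request app time and server overhead.
def throughput(app_ms, server_ms):
    return 1000.0 / (app_ms + server_ms)

print(throughput(1.0, 1.0))    # 500.0  -> baseline server
print(throughput(1.0, 0.1))    # 909.09 -> 10x faster server
print(throughput(1.0, 0.01))   # 990.09 -> 100x faster again
# The ceiling is 1000/sec, set by the 1 ms of application logic alone.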

Related

Why is there a discrepancy between python sockets and tcp ping for the same IP:port destination?

My setup:
I am using an IP and port provided by portmap.io to allow me to perform port forwarding.
I have OpenVPN installed (as required by portmap.io), and I run a ready-made config file when I want to operate my project.
My main effort involves sending messages between a client and a server using sockets in Python.
I have installed a software called tcping, which basically allows me to ping an IP:port over a tcp connection.
[Figure summarizing the setup]
Results I'm getting:
When I try to "ping" said IP, the average RTT ends up being around 30ms consistently.
I use the same IP for socket programming in Python: a server script runs on my machine, and a client script on any other machine connects to this IP. When I send a small message like "Hello" over the socket, I find that it takes a significantly greater and inconsistent amount of time to travel across - sometimes 1 second, sometimes 400 ms...
What is the reason for this discrepancy?
What is the reason for this discrepancy?
tcping just measures the time needed to establish the TCP connection. Connection establishment is usually done entirely in the OS kernel, so there is not even a switch to user space involved.
Even a small data exchange at the application level is significantly more expensive. First the TCP handshake must complete; usually only then does the client start sending the payload. The payload then has to be delivered to the other side, placed into the socket's read buffer, the user-space application scheduled to run, the data read from the buffer and processed, a response created and handed to the peer's OS kernel, and the response delivered back to the local system - with plenty of similar steps there too - until the local application finally receives the response and stops the timer.
Given how far that measurement is from the pure RTT, I would assume the server system has either low performance or high load, or that the application is badly written.
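You can measure both numbers yourself with a small sketch; the host and port are placeholders for your portmap.io endpoint, and it assumes the server sends some reply:

import socket, time

HOST, PORT = "203.0.113.10", 9000   # placeholder address and port

# What tcping measures: the time to complete the TCP handshake only.
t0 = time.monotonic()
s = socket.create_connection((HOST, PORT), timeout=5)
connect_ms = (time.monotonic() - t0) * 1000

# What your application measures: a full payload round trip through both
# kernels plus the server process itself.
t0 = time.monotonic()
s.sendall(b"Hello")
reply = s.recv(1024)
exchange_ms = (time.monotonic() - t0) * 1000
s.close()

print(f"connect: {connect_ms:.1f} ms, request/response: {exchange_ms:.1f} ms")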

Telnet server: is it good practice to keep connections open?

I'm working on a NetHack clone that is meant to be played through Telnet, like many NetHack servers. As I said, this is a clone, so it's being written from scratch in Python.
I set up my socket server by reusing code from an SMTP server I wrote a while ago, and all of a sudden my attention jumped to this particular line of code:
s.listen(15)
My server was designed to handle 15 simultaneous clients just in case the data exchange with any of them took too long, but ideally listen(1) or listen(2) would be enough. This case is different, though.
As happens on Alt.org when you telnet into their NetHack servers, people connected to my server should be able to play my roguelike remotely through a single telnet session, so I guess this connection should not be interrupted. Yet I've read here that
[...] if you are really holding more than 128 queued connect requests you are a) taking too long to process them or b) need a heavy-weight distributed server or c) suffering a DDoS attack.
What is the better practice to carry out here? Should I keep every connection open until the connected user disconnects or is there any other way? Should I go for listen(128) (or whatever my system's socket.SOMAXCONN is) or is that a bad practice?
The number in listen(number) limits the number of pending connect requests.
A connect request is pending from the moment the initial SYN is received by the OS until you call the socket's accept method. So number does not limit the number of open (established) connections; it limits the number of connections in the SYN_RECV state.
It is a bad idea not to answer an incoming connection, because:
The client will retransmit SYN requests until an answering SYN is received.
The client cannot distinguish between your server being unavailable and its request merely waiting in the queue.
The better idea is to accept the connection, send the client a message with the rejection reason, and then close it - for example:
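A minimal sketch of that pattern; the port and capacity are made up for illustration, and the game loop that would service accepted players is elided:

import socket

MAX_PLAYERS = 15     # hypothetical capacity
players = []         # connections currently handed to the game loop

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("", 2323))
srv.listen(socket.SOMAXCONN)   # generous backlog; we drain it promptly

while True:
    conn, addr = srv.accept()      # always answer the connection...
    if len(players) >= MAX_PLAYERS:
        # ...but turn away clients above capacity with an explicit reason
        # instead of leaving them hanging in the listen queue.
        conn.sendall(b"Server full, try again later.\r\n")
        conn.close()
    else:
        players.append(conn)       # hand off to the game loop elsewhere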

ETIMEDOUT occurs when client(jmeter) fired more than 1000 parallel HTTP requests

I have a Python application that uses an eventlet green thread pool (1000 green threads) to make HTTP connections. Whenever the client fires more than 1000 parallel requests, ETIMEDOUT occurs. Can anyone help me out with the possible reason?
The most likely reason in this case: DNS server request throttling. You can easily check whether that's the case by eliminating DNS resolution (request http://{ip-address}/path, and don't forget to add the proper Host: header). If you do web crawling, these steps are not optional; you absolutely must:
Control concurrency automatically (without human action) based on aggregate (i.e. average) execution time. This applies at all levels independently: back off concurrent DNS requests if DNS responses get slower; back off TCP concurrency if response speed (body size / time) drops; back off overall request concurrency if your CPU is overloaded - don't request more than you can process.
Retry on temporary failures, increasing the wait-before-retry period each time (search for "backoff algorithm"; see the sketch after this list). How do you decide whether an error is temporary? Mostly research, trial and error.
Run a local DNS server; find and configure many upstreams.
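A sketch of the retry idea, with exponential backoff and full jitter; which exceptions count as temporary is an assumption you must refine for your own workload:

import random
import time

def fetch_with_backoff(fetch, retries=5, base=0.5, cap=30.0):
    # `fetch` is whatever callable performs one request.
    for attempt in range(retries):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            # Sleep a random fraction of a capped, doubling window.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return fetch()   # final attempt; let any exception propagate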
The next popular problem with high concurrency that you'll likely face is the OS limit on the number of open connections and file descriptors. Search for sysctl somaxconn and ulimit nofile to fix those.

Using sniffing with python elasticsearch client to solve dead TCP connection issues

The Python elasticsearch client in my application is having connectivity issues (refused connections) because idle TCP connections time out due to a firewall (which I have no way to prevent).
The easiest fix would be to keep the connection from going idle by sending some data over it periodically. The sniffing options in the elasticsearch client seem ideal for this, but they're not very well documented:
sniff_on_start – flag indicating whether to obtain a list of nodes from the cluster at startup time
sniffer_timeout – number of seconds between automatic sniffs
sniff_on_connection_fail – flag controlling if connection failure triggers a sniff
sniff_timeout – timeout used for the sniff request - it should be a fast api call and we are talking potentially to more nodes so we want to fail quickly. Not used during initial sniffing (if sniff_on_start is on) when the connection still isn't initialized.
What I would like is for the client to sniff every (say) 5 minutes. Should I be using the sniff_timeout or sniffer_timeout option? Also, should the sniff_on_start parameter be set to True?
I used the suggestion from @Val and found that these settings solved my problem:
sniff_on_start=True
sniffer_timeout=60
sniff_on_connection_fail=True
The sniffing puts enough traffic on the TCP connections that they are never idle long enough for our firewall to kill the connection.
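For reference, a sketch of how those settings plug into the client constructor, assuming the official elasticsearch-py package; the node address is a placeholder:

from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["es-node1:9200"],              # placeholder node address
    sniff_on_start=True,            # fetch the node list at startup
    sniffer_timeout=60,             # re-sniff every 60 seconds
    sniff_on_connection_fail=True,  # also re-sniff when a connection dies
)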

What is the maximum simultaneous HTTP connections allowed on one machine (windows server 2008) using python

To be more specific, I'm using Python and making a pool of HTTPConnection (httplib) objects, and I was wondering whether there is a limit on the number of concurrent HTTP connections on a Windows server.
Per the HTTP RFC, a client should not maintain more than 2 simultaneous connections to a webserver or proxy. However, most browsers don't honor that - firefox 3.5 allows 6 per server and 8 per proxy.
In short, you should not be opening 1000 connections to a single server, unless your intent is to impact the performance of the server. Stress testing your server would be a good legitimate example.
[Edit]
If it's a proxy you're talking about, that's a somewhat different story. My suggestion is to use connection pooling: figure out how many simultaneous connections give you the most requests per second and set that as a hard limit; extra requests just have to wait in a queue until the pool frees up. Just be aware that a single process is usually capped at 1024 file descriptors by default.
Take a look through apache's mod_proxy for ideas on how to handle this.
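A minimal sketch of such a hard-capped pool using the stdlib http.client (httplib's Python 3 name); the pool size of 100 is a placeholder to be tuned by measurement, not a recommendation:

import queue
from http.client import HTTPConnection

class ConnectionPool:
    def __init__(self, host, size=100):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(HTTPConnection(host))

    def request(self, method, url):
        conn = self._pool.get()     # blocks while all connections are busy
        try:
            conn.request(method, url)
            resp = conn.getresponse()
            body = resp.read()      # drain so the connection can be reused
            return resp.status, body
        finally:
            self._pool.put(conn)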
AFAIK, the number of internet sockets (necessary to make TCP/IP connections) is naturally limited on every machine, but it's pretty high. 1000 simultaneous connections shouldn't be a problem for the client machine, as each socket uses only a little memory. If you start receiving data through all these channels, though, this might change. I've heard of test setups that created a couple of thousand connections simultaneously from a single client.
The story is usually different for the server, when it does heavy lifting for each incoming connection (like forking off a worker process etc.). 1000 incoming connections will impact its performance, and coming from the same client they can easily be taken for a DoS attack. I hope you're in charge of both the client and the server... or is it the same machine?
