I am working on a client for a web service using pycurl. The client opens a connection to a streaming service and runs it in a separate thread. Here's a stripped-down version of how the connection is set up:
def _setup_connection(self):
    self.conn = pycurl.Curl()
    self.conn.setopt(pycurl.URL, FILTER_URL)
    self.conn.setopt(pycurl.POST, 1)
    .
    .
    .
    self.conn.setopt(pycurl.HTTPHEADER, headers_list)
    self.conn.setopt(pycurl.WRITEFUNCTION, self.local_callback)

def up(self):
    if self.conn is None:
        self._setup_connection()
    self.perform()
Now, when I want to shut the connection down, if I call
self.conn.close()
I get the following exception:
error: cannot invoke close() - perform() is currently running
Which, in some way, makes sense: the connection is constantly open. I've been hunting around and can't seem to find any way to circumvent this problem and close the connection cleanly.
It sounds like you are invoking close() in one thread while another thread is executing perform(). Luckily, the library warns you rather than descending into unknown behavior-ville.
You should only use the curl session from one thread - or have the perform() thread somehow communicate when the call to perform() is complete.
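One way to keep everything in the perform() thread is to signal that thread to stop and let it abort the transfer itself, closing the handle only after perform() has returned. Below is a minimal sketch of that idea - the StreamClient class and its method names are made up for illustration, not the poster's actual wrapper - relying on the fact that libcurl aborts a transfer when the write callback returns a byte count different from what it was handed:
import threading
import pycurl

class StreamClient:
    def __init__(self, url):
        self.url = url
        self.conn = None
        self._stop = threading.Event()

    def _write_callback(self, data):
        if self._stop.is_set():
            return 0              # a return value != len(data) makes perform() abort
        print(data)               # placeholder for real chunk handling

    def _run(self):
        # runs inside the worker thread
        self.conn = pycurl.Curl()
        self.conn.setopt(pycurl.URL, self.url)
        self.conn.setopt(pycurl.WRITEFUNCTION, self._write_callback)
        try:
            self.conn.perform()   # raises pycurl.error once the callback aborts
        except pycurl.error:
            pass                  # expected when we abort on purpose
        finally:
            self.conn.close()     # safe here: perform() has returned

    def up(self):
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def down(self):
        self._stop.set()          # perform() ends on the next received chunk
        self._thread.join()
Since the abort only happens when the next chunk arrives, this works best for streams that send data continuously.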
You showed some methods from a curl wrapper class; what you need to do is let the object handle its own cleanup:
def __del__(self):
    self.conn.close()
and don't call close() explicitly. When the object finishes its job and all references to it are gone, the curl connection will be closed.
Trying to implement a test server in paramiko without having to modify the client for testing,
I have stumbled across the problem of how to close the stdout stream so that `stdout.read()` does not hang forever, without going too low-level on the client's side. So far I have been able to signal that the command (simple text output to stdout) has completed by:
class FakeCluster(paramiko.server.ServerInterface):
    def check_channel_exec_request(self, channel, command):
        writemessage = channel.makefile("w")
        writemessage.write("SOME COMMAND SUBMITTED")
        writemessage.channel.send_exit_status(0)
        return True
but I have not found a method to avoid the middle two lines in
_,stdout,_ = ssh.exec_command("<FILEPATH>")
stdout.channel.recv_exit_status()
stdout.channel.close()
print(stdout.read())
which is already a good workaround for not having to call channel.exec_command directly (found here).
If I do not close the stdout stream, my output will not print and the underlying transport on the server also remains active forever.
Closing the channel with stdout.channel.close() does not really have an effect, and the alternative os.close(writemessage.fileno()) (difference explained here) does not work because the paramiko.channel.ChannelFile object used for the I/O streams "has no attribute 'fileno'" (detailed explanation found here).
Also, closing the channel directly on the server side throws an SSHException on the client.
Solutions proposed here always modify the client side, but I know from using my client script against the actual server that it must be possible without these additional lines!
In check_channel_exec_request, close the channel on the server side once the exit status has been sent, per the protocol specification, which states that a channel lives for the lifetime of the executed command and is closed thereafter.
This causes channel.eof() to be True on the client side, indicating the command has finished, so reading from the channel no longer hangs.
def check_channel_exec_request(self, channel, command):
    writemessage = channel.makefile("w")
    writemessage.write("SOME COMMAND SUBMITTED")
    writemessage.channel.send_exit_status(0)
    channel.close()
    return True
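With that change, the plain client-side code should work without the two workaround lines - roughly like this (host, port and credentials here are placeholders for whatever your test uses):
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('localhost', port=2200, username='user', password='pw')
_, stdout, _ = ssh.exec_command("<FILEPATH>")
print(stdout.read())   # returns once the exit status is sent and the channel is closed
ssh.close()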
For reference, see this embedded server for integration testing based on paramiko, which has been around for some years - it implements exec requests among others. Speaking from experience, I would recommend instead using an embedded OpenSSH based server, an example of which can also be found on the same repository. Paramiko code is not particularly bug-free.
I've experienced a problem that manifested in a similar manner to this. Our issue was that we were closing the whole session as soon as we exited this handler. Apparently our client (libssh2) didn't like that. So we just keep accepting a new channel each time we close one, until transport.is_active() is False.
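A rough sketch of that loop, assuming transport is the server-side paramiko.Transport and serve_channel() stands in for whatever your test server does with each channel:
while transport.is_active():
    channel = transport.accept(timeout=1)   # returns None if nothing arrives in time
    if channel is None:
        continue
    serve_channel(channel)                  # hypothetical: handle one exec request
    channel.close()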
I'm working on time series charts for 300+ clients.
It is beneficial for us to pull each client separately, as the combined data is huge and in some cases a client's data is resampled or manipulated in a slightly different fashion.
My problem is that the function I loop through to get each client's data opens 3 new threads but never closes them (I'm assuming the connections stay open) when the request is complete and the function returns the data.
Once I have the results of a client, I'd like to close that connection. I just can't figure out how to do that and haven't been able to find anything in my searches.
def solr_data_pull(submitterId):
    zookeeper = pysolr.ZooKeeper('ndhhadr1dnp11,ndhhadr1dnp12,ndhhadr1dnp13:2181/solr')
    solr = pysolr.SolrCloud(zookeeper, collection='tran_timings', timeout=60)
    query = ('SubmitterId:' + str(submitterId) + ' AND Tier:' + tier + ' AND Mode:' + mode + ' '
             'AND Timestamp:[' + str(start_period) + ' TO ' + str(end_period) + '] ')
    results = solr.search(rows=50000, q=[query], fl=[fl_list])
    return pd.DataFrame(list(results))
PySolr uses the Session object from requests as its underlying HTTP library (which in turn uses urllib3's connection pooling), so calling solr.get_session().close() should close all connections and drain the pool:
def close(self):
    """Closes all adapters and as such the session"""
(SolrCloud is an extension of Solr, which has the get_session() method.)
For disconnecting from ZooKeeper - which you probably shouldn't do if it's a long-running session, as it'll have to set up watches etc. again - you can use the .zk object directly on your SolrCloud instance; zk is a KazooClient:
stop()
Gracefully stop this Zookeeper session.
close()
Free any resources held by the client.
This method should be called on a stopped client before
it is discarded. Not doing so may result in filehandles
being leaked.
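Applied to the function from the question, that could look roughly like this (a sketch, assuming the kazoo client is exposed as .zk on the pysolr.ZooKeeper wrapper, and that you really do want to tear everything down on every call):
def solr_data_pull(submitterId):
    zookeeper = pysolr.ZooKeeper('ndhhadr1dnp11,ndhhadr1dnp12,ndhhadr1dnp13:2181/solr')
    solr = pysolr.SolrCloud(zookeeper, collection='tran_timings', timeout=60)
    try:
        query = ('SubmitterId:' + str(submitterId) + ' AND Tier:' + tier + ' AND Mode:' + mode + ' '
                 'AND Timestamp:[' + str(start_period) + ' TO ' + str(end_period) + '] ')
        results = solr.search(rows=50000, q=[query], fl=[fl_list])
        return pd.DataFrame(list(results))
    finally:
        solr.get_session().close()   # close the pooled HTTP connections
        zookeeper.zk.stop()          # stop the ZooKeeper session...
        zookeeper.zk.close()         # ...and free its resources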
Suppose I've got a simple Tornado web server, which starts like this:
app = ... # create an Application
srv = tornado.httpserver.HTTPServer(app)
srv.bind(port)
srv.start()
tornado.ioloop.IOLoop.instance().start()
I am writing an "end-to-end" test, which starts the server in a separate process with subprocess.Popen and then calls the server over HTTP. Now I need to make sure the server did not fail to start (e.g. because the port is busy) and then wait till server is ready.
I wrote a function to wait until the server gets ready :
def wait_till_ready(port, n=10, time_out=0.5):
    for i in range(n):
        try:
            requests.get("http://localhost:" + str(port))
            return
        except requests.exceptions.ConnectionError:
            time.sleep(time_out)
    raise Exception("failed to connect to the server")
Is there a better way?
How can the parent process, which forks and execs the server, make sure that the server didn't fail, for example because the server port is busy? (I can change the server code if I need to.)
You could approach it in two ways:
1. Make a pipe / queue before you fork. Then, just before you start the IO loop, notify the parent that everything went fine and you're ready for requests (see the sketch below).
2. Open the port and bind to it before forking (make sure you close that socket on the parent side). Then the only thing that needs to run in the child is the IO loop, and you can handle all the other errors before the fork.
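With subprocess.Popen, the child's stdout already is a pipe, so the first approach can be as simple as printing a marker once the port is bound. A sketch (the "READY" marker and the server.py name are made up; the server part mirrors the snippet from the question):
# server.py (sketch) - bind() raises OSError here if the port is already in use
srv = tornado.httpserver.HTTPServer(app)
srv.bind(port)
srv.start()
print("READY")
sys.stdout.flush()               # make sure the marker reaches the parent immediately
tornado.ioloop.IOLoop.instance().start()

# test side (sketch)
import subprocess
proc = subprocess.Popen(["python", "server.py"], stdout=subprocess.PIPE)
line = proc.stdout.readline()    # blocks until the child prints READY or exits
if not line.startswith(b"READY"):
    raise Exception("server failed to start: %r" % line)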
This is a simple client-server example where the server returns whatever the client sends, but reversed.
Server:
import socketserver

class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.data = self.request.recv(1024)
        print('RECEIVED: ' + str(self.data))
        self.request.sendall(str(self.data)[::-1].encode('utf-8'))

server = socketserver.TCPServer(('localhost', 9999), MyTCPHandler)
server.serve_forever()
Client:
import socket
import threading

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 9999))

def readData():
    while True:
        data = s.recv(1024)
        if data:
            print('Received: ' + data.decode('utf-8'))

t1 = threading.Thread(target=readData)
t1.start()

def sendData():
    while True:
        intxt = input()
        s.send(intxt.encode('utf-8'))

t2 = threading.Thread(target=sendData)
t2.start()
I took the server from an example I found on Google, but the client was written from scratch. The idea was to have a client that can keep sending and receiving data from the server indefinitely.
Sending the first message with the client works. But when I try to send a second message, I get this error:
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine
What am I doing wrong?
For TCPServer, the handle method of the handler gets called once to handle the entire session. This may not be entirely clear from the documentation, but socketserver is, like many libraries in the stdlib, meant to serve as clear sample code as well as to be used directly, which is why the docs link to the source, where you can clearly see that it's only going to call handle once per connection (TCPServer.get_request is defined as just calling accept on the socket).
So, your server receives one buffer, sends back a response, and then quits, closing the connection.
To fix this, you need to use a loop:
def handle(self):
    while True:
        self.data = self.request.recv(1024)
        if not self.data:
            print('DISCONNECTED')
            break
        print('RECEIVED: ' + str(self.data))
        self.request.sendall(str(self.data)[::-1].encode('utf-8'))
A few side notes:
First, using BaseRequestHandler on its own only allows you to handle one client connection at a time. As the introduction in the docs says:
These four classes process requests synchronously; each request must be completed before the next request can be started. This isn’t suitable if each request takes a long time to complete, because it requires a lot of computation, or because it returns a lot of data which the client is slow to process. The solution is to create a separate process or thread to handle each request; the ForkingMixIn and ThreadingMixIn mix-in classes can be used to support asynchronous behaviour.
Those mixin classes are described further in the rest of the introduction, and farther down the page, and at the bottom, with a nice example at the end. The docs don't make it clear, but if you need to do any CPU-intensive work in your handler, you want ForkingMixIn; if you need to share data between handlers, you want ThreadingMixIn; otherwise it doesn't matter much which you choose.
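For example, with the looping handle above, switching to the threaded variant is a one-line change (sketch):
server = socketserver.ThreadingTCPServer(('localhost', 9999), MyTCPHandler)
server.serve_forever()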
Note that if you're trying to handle a large number of simultaneous clients (more than a couple dozen), neither forking nor threading is really appropriate—which means TCPServer isn't really appropriate. For that case, you probably want asyncio, or a third-party library (Twisted, gevent, etc.).
Calling str(self.data) is a bad idea. You're just going to get the source-code-compatible representation of the byte string, like b'spam\n'. What you want is to decode the byte string into the equivalent Unicode string: self.data.decode('utf8').
There's no guarantee that each sendall on one side will match up with a single recv on the other side. TCP is a stream of bytes, not a stream of messages; it's perfectly possible to get half a message in one recv, and two and a half messages in the next one. When testing with a single connection on localhost with the system under light load, it will probably appear to "work", but as soon as you try to deploy any code that assumes that each recv gets exactly one message, your code will break. See Sockets are byte streams, not message streams for more details. Note that if your messages are just lines of text (as they are in your example), using StreamRequestHandler and its rfile attribute, instead of BaseRequestHandler and its request attribute, solves this problem trivially.
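A sketch of that line-based variant (the client would then also need to append a '\n' to each message it sends):
class MyTCPHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:   # iterates once per newline-terminated message
            text = line.decode('utf-8').rstrip('\n')
            print('RECEIVED: ' + text)
            self.wfile.write((text[::-1] + '\n').encode('utf-8'))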
You probably want to set server.allow_reuse_address = True. Otherwise, if you quit the server and re-launch it again too quickly, it'll fail with an error like OSError: [Errno 48] Address already in use.
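Note that this has to take effect before the socket is bound, which happens in the constructor, so set it on the class (or pass bind_and_activate=False and bind afterwards); a sketch:
socketserver.TCPServer.allow_reuse_address = True
server = socketserver.TCPServer(('localhost', 9999), MyTCPHandler)
server.serve_forever()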
from twisted.web.resource import Resource
from twisted.web.server import Site, Session
from twisted.internet import ssl
from twisted.internet import reactor

class Echo(Resource):
    def render_GET(self, request):
        return "GET"

class WebSite(Resource):
    def start(self):
        factory = Site(self, timeout=5)
        factory.sessionFactory = Session
        self.putChild("echo", Echo())
        reactor.listenSSL(443, factory, ssl.DefaultOpenSSLContextFactory('privkey.pem', 'cacert.pem'))
        #reactor.listenTCP(8080, factory)
        self.sessions = factory.sessions

if __name__ == '__main__':
    ws = WebSite()
    ws.start()
    reactor.run()
With the code above, when I enter the URL https://localhost/echo in a web browser, it gets the page. If I try to reload the page more than 5 seconds later, it does not refresh; it gets stuck on the reload operation. On a second reload attempt, it gets the page instantly.
When I run the code above with reactor.listenTCP(8080, factory), no such problem occurs (I can reload the page without the reload getting stuck and get the page instantly).
The problem can be reproduced with Chrome and Firefox, but when I try it with Ubuntu's Epiphany browser, no such problem occurs.
I could not understand why this happens.
Any comment that helps in understanding or solving the problem will be appreciated.
Extra info:
When I use listenSSL, the file descriptor associated with the connection is not closed after the timeout. While the page is reloading it stays open, and on the second reload operation it is closed and a new file descriptor is opened (and I get the page instantly).
When I use listenTCP, the file descriptor is closed after the timeout, and when I reload the page a new file descriptor is opened and the page is returned instantly.
Also, with a Telnet connection, connections are timed out as expected in both cases.
A Twisted client that connects to this server is also timed out as expected.
The class that times out connections is TimeoutMixin, and it uses the transport.loseConnection() method to time out connections.
Somehow, DefaultOpenSSLContextFactory keeps using the connection(?), so the loseConnection method waits for the transport to finish writing, and until then it doesn't accept any further activity on the connection.
According to the Twisted documentation:
In the code above, loseConnection is called immediately after writing to the transport. The loseConnection call will close the connection only when all the data has been written by Twisted out to the operating system, so it is safe to use in this case without worrying about transport writes being lost. If a producer is being used with the transport, loseConnection will only close the connection once the producer is unregistered.
In some cases, waiting until all the data is written out is not what we want. Due to network failures, or bugs or maliciousness in the other side of the connection, data written to the transport may not be deliverable, and so even though loseConnection was called the connection will not be lost. In these cases, abortConnection can be used: it closes the connection immediately, regardless of buffered data that is still unwritten in the transport, or producers that are still registered. Note that abortConnection is only available in Twisted 11.1 and newer.
As a result, when I replace loseConnection() with abortConnection() by overriding it on the TimeoutMixin class, the problem no longer occurs.
Once I figure out why loseConnection is not enough to close the connection in these specific situations, I'll note it here (any comment about it would be appreciated).
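A rough sketch of that override, assuming Twisted >= 11.1 and that the Site factory builds the stock twisted.web.http.HTTPChannel (which mixes in TimeoutMixin):
from twisted.web import http

class AbortingHTTPChannel(http.HTTPChannel):
    def timeoutConnection(self):
        # drop the connection immediately instead of waiting for pending writes
        self.transport.abortConnection()

# then, inside WebSite.start(), after creating the factory:
#     factory.protocol = AbortingHTTPChannel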