I'm trying to figure out how to properly close an asynchronous tweepy stream.
The tweepy streaming module can be found here.
I start the stream like this:
stream = Stream(auth, listener)
stream.filter(track=['keyword'], async=True)
When closing the application, I try to close the stream as simply as:
stream.disconnect()
This method seems to work as intended, but it has one problem: the stream thread is still in the middle of its loop (waiting for or handling tweets) and is not killed until the next iteration. So when the stream receives a tweet even after the app has closed, it still tries to call the listener object (this can be seen with a simple print statement in the listener). I'm not sure if this is a bad thing or if it can simply be ignored.
I have 2 questions:
Is this the best way to close the stream or should I take a different approach?
Shouldn't the async thread be created as a daemon thread?
I had the same problem. I fixed it by restarting the script, since a tweepy Stream doesn't stop until the next incoming tweet.
Example:
import sys
import os
import time

python = sys.executable
time.sleep(10)
print "restart"
os.execl(python, python, *sys.argv)
I didn't find another solution.
I am not positive that it applies to your situation, but in general you can have objects that support the context-manager protocol clean up after themselves by putting them in a with block:
with Stream(auth, listener) as stream:
    stream.filter(track=['keyword'], async=True)
    # ...
# Outside the with-block; stream is automatically disposed of.
What "disposed of" actually means, it that the entities __exit__ function is called.
Presumably tweepy will have overridden that to Do The Right Thing.
As @VooDooNOFX suggests, you can check the source to be sure.
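If your tweepy version turns out not to implement __exit__, a thin wrapper is easy to write yourself. Here is a minimal sketch assuming only that the stream object exposes a disconnect() method; managed_stream is a name invented for this example:

from contextlib import contextmanager

@contextmanager
def managed_stream(stream):
    # Hand the stream to the with-block and always request shutdown on
    # exit, even if the block raises.
    try:
        yield stream
    finally:
        stream.disconnect()

# Usage:
# with managed_stream(Stream(auth, listener)) as stream:
#     stream.filter(track=['keyword'], async=True)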
This is by design. Looking at the source, you will notice that disconnect has no immediate termination option.
def disconnect(self):
    if self.running is False:
        return
    self.running = False
When calling disconnect(), it simply sets self.running = False, which is then checked on the next iteration of the _run method's loop.
You can ignore this side effect.
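If the late callback does bother you, one simple mitigation is a guard flag on the listener that you set just before calling disconnect(). A sketch, assuming the classic tweepy API where listeners subclass tweepy.StreamListener; shutting_down and handle_status are names invented here:

class GuardedListener(tweepy.StreamListener):
    def __init__(self):
        super(GuardedListener, self).__init__()
        self.shutting_down = False  # set to True just before stream.disconnect()

    def on_status(self, status):
        if self.shutting_down:
            return False  # returning False tells tweepy to stop the stream
        handle_status(status)  # hypothetical application handler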
Instead of restarting the script, as @burkay suggests, I finally deleted the Stream object and started a new one. In my example, someone wants to add a new user to be followed, so I update the track list this way:
stream.disconnect() # that should wait until next tweet, so let's delete it
del stream
# now, create a new object
stream = tweepy.Stream( auth=api.auth, listener=listener )
stream.userstream( track=all_users(), async=True )
From the docs:
write(data)
Write data to the stream.
This method is not subject to flow control. Calls to write() should be followed by drain().
coroutine drain()
Wait until it is appropriate to resume writing to the stream. Example:
writer.write(data)
await writer.drain()
From what I understand:
You need to call drain() every time write() is called.
If you don't, I guess write() will block the loop thread.
Then why is write() not a coroutine that calls drain() automatically? Why would one ever call write() without draining? I can think of two cases:
You want to write and close immediately.
You have to buffer some data before the message is complete.
The first one is a special case; I think we could have a different API for it. Buffering should be handled inside the write function, and the application should not care.
Let me put the question differently. What is the drawback of doing this? Does the Python 3.8 version effectively do this?
async def awrite(writer, data):
    writer.write(data)
    await writer.drain()
Note: the drain() doc explicitly states the following:
When there is nothing to wait for, drain() returns immediately.
Reading the answer and links again, I think the functions work like this. Note: check the accepted answer for a more accurate version.
def write(data):
    remaining = socket.try_write(data)
    if remaining:
        _pendingbuffer.append(remaining)  # buffer will keep growing if the other side is slow and we have a lot of data

async def drain():
    if len(_pendingbuffer) < BUF_LIMIT:
        return
    await wait_until_other_side_is_up_to_speed()
    assert len(_pendingbuffer) < BUF_LIMIT

async def awrite(writer, data):
    writer.write(data)
    await writer.drain()
So when to use what:
When the data is not continuous, like responding to an HTTP request: we just need to send some data and don't care when it arrives, and memory is not a concern - just use write().
Same as above, but memory is a concern - use awrite.
When streaming data to a large number of clients (e.g. some live stream or a huge file): if the data is duplicated in each connection's buffer, it will definitely overflow RAM. In this case, write a loop that takes a chunk of data each iteration and calls awrite, as sketched below. In the case of a huge file, loop.sendfile is better if available.
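A sketch of that chunked pattern, assuming an async iterable source of byte chunks and an asyncio StreamWriter; stream_chunks is a name invented here:

async def stream_chunks(source, writer):
    # Send one chunk at a time, letting drain() apply back-pressure so the
    # write buffer never holds much more than one chunk per client.
    async for chunk in source:
        writer.write(chunk)
        await writer.drain()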
From what I understand, (1) You need to call drain every time write is called. (2) If not I guess, write will block the loop thread
Neither is correct, but the confusion is quite understandable. The way write() works is as follows:
A call to write() just stashes the data to a buffer, leaving it to the event loop to actually write it out at a later time, and without further intervention by the program. As far as the application is concerned, the data is written in the background as fast as the other side is capable of receiving it. In other words, each write() will schedule its data to be transferred using as many OS-level writes as it takes, with those writes issued when the corresponding file descriptor is actually writable. All this happens automatically, even without ever awaiting drain().
write() is not a coroutine, and it absolutely never blocks the event loop.
The second property sounds convenient - you can call write() wherever you need to, even from a function that's not async def - but it's actually a major flaw of write(). Writing as exposed by the streams API is completely decoupled from the OS accepting the data, so if you write data faster than your network peer can read it, the internal buffer will keep growing and you'll have a memory leak on your hands. drain() fixes that problem: awaiting it pauses the coroutine if the write buffer has grown too large, and resumes it once the os.write() calls performed in the background succeed and the buffer shrinks.
You don't need to await drain() after every write, but you do need to await it occasionally, typically between iterations of a loop in which write() is invoked. For example:
while True:
    response = await peer1.readline()
    peer2.write(b'<response>')
    peer2.write(response)
    peer2.write(b'</response>')
    await peer2.drain()
drain() returns immediately if the amount of pending unwritten data is small. If the data exceeds a high threshold, drain() will suspend the calling coroutine until the amount of pending unwritten data drops beneath a low threshold. The pause will cause the coroutine to stop reading from peer1, which will in turn cause the peer to slow down the rate at which it sends us data. This kind of feedback is referred to as back-pressure.
Buffering should be handled inside write function and application should not care.
That is pretty much how write() works now - it does handle buffering and it lets the application not care, for better or worse. Also see this answer for additional info.
Addressing the edited part of the question:
Reading the answer and links again, I think the functions work like this.
write() is still a bit smarter than that. It won't try to write only once, it will actually arrange for data to continue to be written until there is no data left to write. This will happen even if you never await drain() - the only thing the application must do is let the event loop run its course for long enough to write everything out.
A more correct pseudo code of write and drain might look like this:
class ToyWriter:
    def __init__(self):
        self._buf = bytearray()
        self._empty = asyncio.Event()
        self._empty.set()  # the buffer starts out empty

    def write(self, data):
        self._buf.extend(data)
        loop.add_writer(self._fd, self._do_write)
        self._empty.clear()

    def _do_write(self):
        # Automatically invoked by the event loop when the
        # file descriptor is writable, regardless of whether
        # anyone calls drain()
        while self._buf:
            try:
                nwritten = os.write(self._fd, self._buf)
            except OSError as e:
                if e.errno == errno.EWOULDBLOCK:
                    return  # continue once we're writable again
                raise
            self._buf = self._buf[nwritten:]
        self._empty.set()
        loop.remove_writer(self._fd, self._do_write)

    async def drain(self):
        if len(self._buf) > 64*1024:
            await self._empty.wait()
The actual implementation is more complicated because:
it's written on top of a Twisted-style transport/protocol layer with its own sophisticated flow control, not on top of os.write;
drain() doesn't really wait until the buffer is empty, but until it reaches a low watermark;
exceptions other than EWOULDBLOCK raised in _do_write are stored and re-raised in drain().
The last point is another good reason to call drain() - to actually notice that the peer is gone by the fact that writing to it is failing.
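To illustrate, a sketch of surfacing write errors through drain(), assuming an asyncio StreamWriter and catching ConnectionResetError as one typical failure; send_payload is a name invented here. Without the await, a dead peer can go unnoticed because write() itself never raises for transport errors:

async def send_payload(writer, payload):
    try:
        writer.write(payload)
        await writer.drain()
    except ConnectionResetError:
        # The peer is gone; writes were already failing in the background.
        writer.close()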
I have a python script I've written which uses atexit.register() to run a function to persist a list of dictionaries when the program exits. However, this code is also running when the script exits due to a crash or runtime error. Usually, this results in the data becoming corrupted.
Is there any way to block it from running when the program exits abnormally?
EDIT: To clarify, this involves a program using flask, and I'm trying to prevent the data persistence code from running on an exit that results from an error being raised.
You don't want to use atexit with Flask. You want to use Flask signals. It sounds like you are specifically looking for the request_finished signal.
from flask import request_finished

def request_finished_handler(sender, response, **extra):
    sender.logger.debug('Request context is about to close down. '
                        'Response: %s', response)
    # do some fancy storage stuff.

request_finished.connect(request_finished_handler, app)
The benefit of request_finished is that it only fires after a successful response. That means that so long as there isn't an error in another signal, you should be good.
One way: at global level in main program:
abnormal_termination = False

def your_cleanup_function():
    # Add the next two lines at the top
    if abnormal_termination:
        return
    # ...

# At the end of the main program:
try:
    pass  # your original code goes here
except Exception:  # replace according to what *you* consider "abnormal"
    abnormal_termination = True  # stop the atexit handler
Not pretty, but straightforward ;-)
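On Python 3 you could also unregister the handler once an error is caught, instead of using a flag. A sketch using atexit.unregister, where persist_data stands in for the function originally passed to atexit.register and run_app is a hypothetical main entry point:

import atexit

def persist_data():
    ...  # write the list of dictionaries to disk

atexit.register(persist_data)

try:
    run_app()  # hypothetical main entry point
except Exception:
    atexit.unregister(persist_data)  # skip persistence on an abnormal exit
    raise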
I have started network programming using Python and am working on a basic peer-to-peer chat client-server application. I got it working in the console, but am facing a problem while developing a GUI.
This is the code for my client script. It is sending data to the server but is unable to receive/display the data sent from the server, and I am at a loss. Please show me the error in my code and the solution.
from socket import *
from tkinter import *

host = "127.0.0.1"
port = 1420
buffer = 1024
server = (host, port)

clientsock = socket(AF_INET, SOCK_STREAM)
clientsock.connect(server)

class ipbcc(Frame):
    def __init__(self, master):
        Frame.__init__(self, master)
        self.grid()
        self.create()
        self.connect()

    def write(self, event):
        msg = self.e.get()
        clientsock.send(msg.encode())

    def create(self):
        self.pic = PhotoImage(file="logo.gif")
        self.label = Label(self, image=self.pic)
        self.label.grid(column=0)
        self.wall = Text(self, width=70, height=20, wrap=WORD)
        self.wall.grid(row=0, column=1, columnspan=2, sticky=W)
        self.e = Entry(self, width=50)
        self.e.grid(row=1, column=1, sticky=W)
        self.e.bind('<Return>', self.write)

    def add(self, data):
        self.wall.insert(END, data)

    def connect(self):
        def xloop():
            while 1:
                data = clientsock.recv(buffer).decode()
                print(data)
                self.add(data)

root = Tk()
root.title("IPBCC v0.1")
app = ipbcc(root)
root.mainloop()
PS: Python Version 3.3 and there is no problem in the server script.
Your connect function defines a function called xloop, but it doesn't call that function, or return it, or store it somewhere for anyone else to call it. You need to call that function for it to do anything.
Of course if you just call it directly inline, it will run forever, meaning you never get back to the event loop, and the UI freezes up and stops responding to the user.
There are two options for this: threading, or polling.
The obvious way to do this is with a background thread. The basic idea is very simple (the snippet assumes import threading at the top of the module):
def connect(self):
    def xloop():
        while 1:
            data = clientsock.recv(buffer).decode()
            print(data)
            self.add(data)
    self.t = threading.Thread(target=xloop)
    self.t.start()
However, there are two problems with this.
First, there's no way to stop the background thread. When you try to exit the program, it will wait for the background thread to stop—which means it will wait forever.
There's an easy solution to that one: if you make it a "daemon thread", it will be summarily killed when the main program exits. This is obviously no good for threads that are doing work that could be corrupted if interrupted in the middle, but in your case that doesn't seem to be a problem. So, just change one line:
self.t = threading.Thread(target=xloop, daemon=True)
Second, that self.add method needs to modify a Tkinter widget. You can't do that from a background thread. Depending on your platform, it may fail silently, raise an exception, or even crash—or, worse, it may work 99% of the time and fail 1%.
So, you need some way to send a message to the main thread, asking it to do the widget modification for you. This is a bit complicated, but Tkinter and Threads explains how to do it.
Alternatively, you could use mtTkinter, which intercepts Tkinter calls in background threads and passes them to the main thread automatically, so you don't have to worry about it.
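For reference, a minimal sketch of the message-passing pattern with Python 3's queue module: the background thread only touches the queue, and poll_queue (added as a method of the ipbcc class) drains it on the main thread with after():

import queue

incoming = queue.Queue()

def xloop():
    # Background thread: never touches any widget, only the queue.
    while True:
        data = clientsock.recv(buffer).decode()
        incoming.put(data)

# Added as a method of the ipbcc class:
def poll_queue(self):
    # Main thread: safe to update widgets here.
    try:
        while True:
            self.add(incoming.get_nowait())
    except queue.Empty:
        pass
    self.after(50, self.poll_queue)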
The other option is to change the blocking xloop function into a nonblocking function that polls for data. The problem is that you want to wait on Tkinter GUI events, but you also want to wait on the socket.
If you could integrate the socket into the main event loop, that would be easy: a new message coming in would be handled just like any other event. Some of the more powerful GUI frameworks like Qt give you ways to do this, but Tkinter does not. A reactor framework like Twisted can tie itself into Tkinter and add it for you (or at least fake nicely). But if you want to stick with your basic design, you have to do it yourself.
So, there are two options:
Give Tkinter full control. Ask it to call your function every, say, 1/20th of a second, and in the function do a non-blocking check. Or maybe loop around non-blocking checks until there's nothing left to read.
Give the socket control. Ask Tkinter to call your function every time it gets a chance, and block for 1/20th of a second checking for data before returning to Tkinter.
Of course 1/20th of a second may not be the right length—for many applications, no answer is really correct. Anyway, here's a simple example (note it needs import select at the top of the module):
def poll_socket(self):
    r, w, x = select.select([clientsock], [], [], 0)
    if r:
        data = clientsock.recv(buffer).decode()
        print(data)
        self.add(data)
    self.after(50, self.poll_socket)

def connect(self):
    self.after(50, self.poll_socket)
You define xloop; however, you never actually call it as far as I can see.
I would suggest you look into using threads - the threading module in the standard library would be one way to go. Then, in your code you will be able to create a thread running the xloop function, without stopping the rest of your code. Alternatively, you could remove the loop from xloop (or indeed just put the code in the function into the connect function) and call it periodically, using widget.after(milliseconds, a_function)
I'd also like to mention that from amodule import * is considered bad practice (although tkinter is one of the exceptions to this rule).
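For instance, the explicit form for the socket module would look something like this (a sketch reusing the host and port from the question):

import socket

clientsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsock.connect((host, port))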
It might help to follow the flow. The "app=ipbcc(root)" step calls "self.connect()", and that has a "def xloop():" containing the step "data=clientsock.recv". But then somebody needs to invoke xloop(). Who does that? By the way, why do you have a function inside a method?
Also, I don't see anybody invoking "clientsock.send(msg.encode())" via the write() method. I am not familiar with the Tkinter part (and what mainloop() does), so please check whether there are callers for the send() and recv() calls.
Edit: The main part of this question before this revision/update was how to terminate a QThread. That has been solved; the question has been revised to: how to kill a requests POST that is in progress.
http://docs.python-requests.org/en/v0.10.4/user/advanced/#asynchronous-requests
It appears that using an asynchronous request still blocks - the user can't cancel the POST while it's in progress.
Basically this is the functionality that is needed:
When the user presses Stop Uploading, the uploading must stop instantly. I can stop the thread using stop(), but the stop flag is only checked once the loop comes around again.
So basically, it should be possible to use an asynchronous request that would let me check whether it should be cancelled during the request; however, I don't know how.
Any suggestions? The previous part of the post is still relevant, so it's below.
Please note that the initial question of how to terminate a QThread has been solved, and as such the code below isn't too important, it's just for context, the thing I need help with now is what I just described.
I've been writing a program, a photo uploader, and I've made a thread in which I upload the files. I can't figure out how to exit the thread. I've tried the suggestions I've read here:
1) I've tried a bool flag, wrapping it both around the method and the
for statement that does the work.
2) I've used a 'with' and then tried to raise an exception.
I want to be able to cancel uploading, preferably quickly. I've read a
lot that it's always recommended to "clean up" the thread before
terminating it, I honestly don't know what 'clean it up' means.
But I think I should be able to just kill the thread, since all it's doing is sending the binary data of an image to the Tumblr API. It shouldn't matter if the request is cancelled early, because the upload will just be cancelled on the API side too.
Anyway, here is my thread:
class WorkThread(QtCore.QThread):
    def __init__(self):
        QtCore.QThread.__init__(self)
        global w
        w = WorkThread

    def __del__(self):
        self.wait()

    def run(self):
        url = 'https://www.tumblr.com/api/write'
        files = os.listdir(directory)
        for file in files:
            file_path = os.path.join(directory + '\\' + file)
            file_path = str(file_path)
            if file_path[-3:] in ['bmp', 'gif', 'jpg', 'png', 'thm', 'tif', 'yuv']:
                self.emit(QtCore.SIGNAL('update(QString)'), "Uploading %s ..." % file_path)
                print smart_str(password)
                data = {'email': email, 'password': password, 'type': 'photo'}
                with open(file_path, 'rb') as photo:
                    r = requests.post(url, data=data, files={'data': photo})
                    print r.content
        self.emit(QtCore.SIGNAL('update(QString)'), 'All finished! Go check out your stuff on tumblr :)\n')
        return
This is how im calling it.
def upload(self):
    self.doneBox.clear()
    self.workThread = WorkThread()
    self.connect(self.workThread, QtCore.SIGNAL("update(QString)"), self.startUploading)
    self.workThread.start()
Can anyone suggest a way that I can terminate the thread quickly and quietly? Or, if you think that's not good, a way to stop it safely.
However, if I do not kill it instantly and it goes through the for loop again in the run() method, it will upload the photo that it was uploading when the user pressed "Stop Uploading". I wouldn't want it to do that; I'd prefer it to stop uploading the current photo the second the user presses "Stop Uploading".
Thanks.
I'm not sure what you are doing with that global w; w = WorkThread but it seems pointless since you aren't doing anything with it.
In your __init__() you want a flag:
def __init__(self):
    ...
    self._isRunning = False
In your run() method you just check for this flag and exit when needed:
def run(self):
    self._isRunning = True
    while True:
        if not self._isRunning:
            # clean up or close anything here
            return
        # do rest of work
And maybe add a simple convenience method:
def stop(self):
    self._isRunning = False
    self.wait()
When documentation refers to cleaning up they are talking about closing down resources that you may have opened, undoing things that might be partially started, or anything else that you might want to signal or handle before just killing the thread object. In the case of your uploader, you want to check if the thread should exit before you start each upload process. This will give you the chance to stop the thread and have it exit before it continues with another bit of work.
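Concretely, a sketch of applying that check to the run() loop from the question; only the flag test is new, the rest mirrors the original code:

def run(self):
    self._isRunning = True
    for file in os.listdir(directory):
        if not self._isRunning:
            return  # user pressed "Stop Uploading"; skip the remaining files
        # ... upload 'file' as in the original run() method ...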
Try terminating it:
self.workThread.terminate()
Just be careful while doing this. Since you are performing a network operation, you can't stop the request itself once it's in flight, but usually you do this:
# Logic...
if self.running:
    # Continue logic...
You set self.running from outside of the thread.
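For example, a sketch of flipping that flag from the GUI side, assuming a "Stop Uploading" button wired to this slot; stop_uploading is a name invented here:

def stop_uploading(self):
    # Runs in the GUI thread; the worker notices the change on its next check.
    self.workThread.running = False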
I am stuck reading a file in /sys/ which contains the light intensity in Lux of the ambient light sensor on my Nokia N900 phone.
See thread on talk.maemo.org here
I tried to use pyinotify to poll the file, but this seems somehow wrong to me, since the only events raised are "process_IN_OPEN", "process_IN_ACCESS" and "process_IN_CLOSE_NOWRITE".
I basically want to get the changes as soon as possible and, if something changed, trigger an event, execute a class...
Here's the code I tried, which works, but not as I expected (I was hoping for process_IN_MODIFY to be triggered):
#!/usr/bin/env python
import os, time, pyinotify

ambient_sensor = '/sys/class/i2c-adapter/i2c-2/2-0029/lux'

wm = pyinotify.WatchManager()  # Watch Manager
mask = pyinotify.ALL_EVENTS

def action(self, the_event):
    value = open(the_event.pathname, 'r').read().strip()
    return value

class EventHandler(pyinotify.ProcessEvent):
    ...
    def process_IN_MODIFY(self, event):
        print "MODIFY event:", action(self, event)
    ...

#log.setLevel(10)
notifier = pyinotify.ThreadedNotifier(wm, EventHandler())
notifier.start()

wdd = wm.add_watch(ambient_sensor, mask)
wdd

time.sleep(5)
notifier.stop()
Update 1:
Mmmh, all I came up with, without having a clue whether there is a special mechanism, is the following:
f = open('/sys/class/i2c-adapter/i2c-2/2-0029/lux')
while True:
    value = f.read()
    print value
    f.seek(0)
This, wrapped in its own thread, could do the trick, but does anyone have a smarter, less CPU-hogging and faster way to get the latest value?
Since the /sys file is a pseudo-file which just presents a view on an underlying, volatile operating system value, it makes sense that there would never be a modify event raised. Since the file is "modified" from below, it doesn't follow regular file-system semantics.
If a modify event is never raised, using a package like pyinotify isn't going to get you anywhere. 'Twould be better to look for a platform-specific mechanism.
Response to Update 1:
Since the N900 maemo runtime supports GFileMonitor, you'd do well to check if it can provide the asynchronous event that you desire.
Busy waiting - as I gather you know - is wasteful. On a phone it can really drain a battery. You should at least sleep in your busy loop.
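If you do stay with polling, here is a sketch of a gentler loop that sleeps between reads and only reacts to actual changes; watch_lux and its callback are names invented here:

import time

def watch_lux(path, callback, interval=0.5):
    last = None
    with open(path) as f:
        while True:
            f.seek(0)
            value = f.read().strip()
            if value != last:  # only fire on an actual change
                callback(value)
                last = value
            time.sleep(interval)  # keeps CPU and battery usage down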