Chatbot using Twisted and wokkel - Python

I am writing a chatbot using Twisted and wokkel, and everything seems to be working except that the bot periodically logs off. To fix that temporarily, I set presence to available on every connection initialized. Does anyone know how to prevent going offline? (I assume that if I keep sending available presence every minute or so the bot won't go offline, but that just seems too wasteful.) Suggestions, anyone? Here is the presence code:
class BotPresenceClientProtocol(PresenceClientProtocol):

    def connectionInitialized(self):
        PresenceClientProtocol.connectionInitialized(self)
        self.available(statuses={None: 'Here'})

    def subscribeReceived(self, entity):
        self.subscribed(entity)
        self.available(statuses={None: 'Here'})

    def unsubscribeReceived(self, entity):
        self.unsubscribed(entity)
Thanks in advance.

If you're using XMPP, as I assume is the case given your mention of wokkel, then per RFC 3921 (the applicable standard) you do need periodic exchanges of presence information. That is indeed a substantial overhead of XMPP, and solutions to it are being researched, but it is the state of the art as of now. Essentially, since total silence from a client is quite likely to mean the client has just gone away, periodic "I'm still here" reassurance appears to be a must. (Perhaps a client could commit to "being there for at least the next 15 minutes", but given that most clients front a fickle human user who can't be stopped from changing their mind and going away at any time, I'm not sure that would be solid enough to be useful.)
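If the server really does drop you for silence, you can at least automate the refresh instead of doing it by hand. Below is a minimal sketch using Twisted's LoopingCall to re-send presence periodically; the 60-second interval is an assumption, tune it against your server's timeout:

from twisted.internet import task
from wokkel.xmppim import PresenceClientProtocol

class BotPresenceClientProtocol(PresenceClientProtocol):

    def connectionInitialized(self):
        PresenceClientProtocol.connectionInitialized(self)
        self.available(statuses={None: 'Here'})
        # Re-announce availability periodically so the server keeps us online.
        self._keepalive = task.LoopingCall(self.available,
                                           statuses={None: 'Here'})
        self._keepalive.start(60, now=False)

    def connectionLost(self, reason):
        # Stop the timer when the stream goes away.
        if getattr(self, '_keepalive', None) and self._keepalive.running:
            self._keepalive.stop()
        PresenceClientProtocol.connectionLost(self, reason)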

Related

How can I measure coverage (in a production system)?

I would like to measure the coverage of my Python code which gets executed in the production system.
I want an answer to this question:
Which lines get executed often (hot spots) and which lines are never used (dead code)?
Of course this must not slow down my production site.
I am not talking about measuring the coverage of tests.
I assume you are not talking about test-suite code coverage, which the other answer refers to. That is indeed a job for CI.
If you want to know which code paths are hit often in your production system, then you're going to have to do some instrumentation/profiling. This has a cost: you cannot add measurements for free. You can do it cheaply, though, and typically you would only run it for short stretches, long enough to collect your data.
Python has cProfile for full deterministic profiling, measuring call counts per function and so on. This will give you the most accurate data, but it will likely have a relatively high impact on performance.
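For completeness, a minimal cProfile sketch (my_entry_point is a hypothetical stand-in for your own code):

import cProfile
import pstats

def my_entry_point():
    # Placeholder workload.
    sum(i * i for i in range(100000))

# Profile the call, dump the stats to a file, then print the hottest functions.
cProfile.run('my_entry_point()', 'prod.prof')
pstats.Stats('prod.prof').sort_stats('cumulative').print_stats(10)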
Alternatively, you can do statistical profiling, which basically means you sample the stack on a timer instead of instrumenting everything. This can be much cheaper, even at a high sampling rate! The downside, of course, is a loss of precision.
Even though it is surprisingly easy to do in Python, this stuff is still a bit much to put into an answer here. There is an excellent blog post by the Nylas team on this exact topic, though.
The sampler below was lifted from the Nylas blog with some tweaks. After you start it, it fires an interrupt every millisecond of CPU time and records the current call stack:
import collections
import signal

class Sampler(object):
    def __init__(self, interval=0.001):
        self.stack_counts = collections.defaultdict(int)
        self.interval = interval

    def start(self):
        # Sample on the virtual (CPU-time) interval timer.
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def _sample(self, signum, frame):
        # Walk the interrupted call stack from the innermost frame outwards.
        stack = []
        while frame is not None:
            formatted_frame = '{}({})'.format(
                frame.f_code.co_name,
                frame.f_globals.get('__name__'))
            stack.append(formatted_frame)
            frame = frame.f_back
        formatted_stack = ';'.join(reversed(stack))
        self.stack_counts[formatted_stack] += 1
        # Re-arm the one-shot timer for the next sample.
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)
You inspect stack_counts to see what your program has been up to. This data can be plotted as a flame graph, which makes it really obvious in which code paths your program is spending the most time.
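A hypothetical usage example: start the sampler, run a workload for a few seconds (the busy loop below is a stand-in for your application), then print the most frequently sampled stacks:

import operator
import time

sampler = Sampler()
sampler.start()

# Stand-in workload; ITIMER_VIRTUAL only ticks while we burn CPU time.
end = time.time() + 5
while time.time() < end:
    sum(i * i for i in range(1000))

# The most common stacks are where the program spends most of its time.
for stack, count in sorted(sampler.stack_counts.items(),
                           key=operator.itemgetter(1), reverse=True)[:10]:
    print(count, stack)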
If I understand it right, you want to learn which parts of your application are used most often by users.
TL;DR:
Use one of the metrics frameworks for Python if you do not want to do it by hand. Some of them are listed below:
DataDog
Prometheus
Prometheus Python Client
Splunk
It is usually done at the function level, and the right approach depends on the application:
If it is a desktop app with internet access:
You can create a simple db and collect how many times your functions are called. To accomplish this, you can write a simple tracking function and call it inside every function that you want to track (see the sketch below). After that you can define an asynchronous task to upload your data.
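A minimal sketch of that idea (the decorator name, the SQLite schema, and the flush cadence are all assumptions):

import collections
import functools
import sqlite3
import threading

_counts = collections.Counter()
_lock = threading.Lock()

def tracked(func):
    # Decorator: tally every call to the wrapped function.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _lock:
            _counts[func.__qualname__] += 1
        return func(*args, **kwargs)
    return wrapper

def flush_counts(db_path='usage.db'):
    # Persist and reset the in-memory tallies; call this from your
    # periodic/asynchronous upload task.
    with _lock:
        snapshot = dict(_counts)
        _counts.clear()
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS usage (name TEXT PRIMARY KEY, n INTEGER)')
    for name, n in snapshot.items():
        conn.execute('INSERT OR IGNORE INTO usage VALUES (?, 0)', (name,))
        conn.execute('UPDATE usage SET n = n + ? WHERE name = ?', (n, name))
    conn.commit()
    conn.close()

Decorate any function you care about with @tracked and schedule flush_counts from the upload task.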
If it is a web application:
You can track which functions are called from JS (mostly preferred for user-behaviour tracking) or from the web API. It is good practice to start from the outside and work inwards. First detect which endpoints are frequently called; if you are using a proxy like nginx, you can analyze the server logs to gather this information, which is the easiest and cleanest way (see the sketch below). After that, insert a logger into every other function that you want to track and simply analyze your logs every week or month.
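For example, a small script to tally endpoints from an nginx access log (the log path and the regex for the default combined log format are assumptions):

import collections
import re

REQUEST = re.compile(r'"(?:GET|POST|PUT|DELETE|PATCH) (?P<path>\S+) HTTP')
counts = collections.Counter()

with open('/var/log/nginx/access.log') as log:
    for line in log:
        match = REQUEST.search(line)
        if match:
            counts[match.group('path')] += 1

# The 20 most frequently hit endpoints.
for path, n in counts.most_common(20):
    print(n, path)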
But if you want to analyze your production code line by line (which is usually a very bad idea for performance), you can start your application under a Python profiler. Python already has one: cProfile.
Maybe make a text file and, in every method of your program, append a line referencing it, like "Method one executed". Run the web application about 10 times, as thoroughly as a viewer would, and afterwards write a Python program that reads the file, counts occurrences of each line (or of a pattern), and outputs the totals; a sketch follows.
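A sketch of that counting step (the log file name and line format are assumptions):

import collections

counts = collections.Counter()
with open('method_log.txt') as log:
    for line in log:
        # Each line is expected to look like "Method one executed".
        counts[line.strip()] += 1

for line, n in counts.most_common():
    print(n, line)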

How to delay a ROS Topic by a certain amount of time

Actual problem:
I have a controller node that subscribes to 2 topics and publishes to 1 topic. Although in simulation everything seems to be working as expected, on the actual HW the performance degrades. I suspect the problem is that one of the two input topics lags behind the other by a significant amount of time.
Question:
I want to re-create this behavior in simulation in order to test the robustness of the controller. Therefore, I need to delay one of the topics by a certain amount of time; ideally this should be a configurable parameter. I could write a node that has a FIFO memory buffer and adjust the delay time by monitoring the frequency of the topic. Before I do that, is there a command-line tool or any other quick-to-implement method that I can use?
P.s. I'm using Ubuntu 16.04 and ROS-Kinetic.
I do not know of any out-of-the-box solutions that would do exactly what you describe here.
For a quick hack, if your topic does not have a timestamp and the node just takes in the messages as they arrive, the easiest thing to do would be to record a bag and play the two topics back from two different instances of rosbag play. Something like this:
In the first terminal:
rosbag play mybag.bag --clock --topics /my/topic
In the second terminal, started some amount of time later:
rosbag play mybag.bag --topics /my/other_topic
Whether you need the --clock flag depends mostly on what you mean by simulation. If you want to control the time difference more precisely than by pressing enter in two different terminals, you could write a small bash script to launch them.
Another option that would still involve bags, but would give you more control over the exact time the messages are delayed by, would be to edit the bag so the messages are already stored with the desired delay. This can be done relatively easily by modifying the first example in the rosbag cookbook:
import rosbag

with rosbag.Bag('output.bag', 'w') as outbag:
    for topic, msg, t in rosbag.Bag('input.bag').read_messages():
        # This also replaces tf timestamps under the assumption
        # that all transforms in the message share the same timestamp
        if topic == "/tf" and msg.transforms:
            outbag.write(topic, msg, msg.transforms[0].header.stamp)
        else:
            outbag.write(topic, msg, msg.header.stamp if msg._has_header else t)
Replacing the if/else with:
...
import rospy
...
if topic == "/my/other_topic":
    outbag.write(topic, msg, t + rospy.Duration.from_sec(0.5))
else:
    outbag.write(topic, msg, t)
That should get you most of the way there.
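Put together, the modified script might look like this (the topic name and the 0.5 s delay are placeholders):

import rosbag
import rospy

DELAY = rospy.Duration.from_sec(0.5)  # configurable delay, placeholder value

with rosbag.Bag('output.bag', 'w') as outbag:
    for topic, msg, t in rosbag.Bag('input.bag').read_messages():
        if topic == "/my/other_topic":
            # Shift only the delayed topic's recorded time forward.
            outbag.write(topic, msg, t + DELAY)
        else:
            outbag.write(topic, msg, t)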
Other than that, if you think the node would be useful in the future or you want it to work on live data as well, then you would need to implement the node you described with some queue; a minimal sketch follows. One thing you could look at for inspiration is topic_tools (see the topic tools source on git).
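A sketch of such a relay node in rospy (the topic names, the std_msgs String message type, and the ~delay parameter are assumptions; swap in your own message type):

#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node('topic_delayer')
    delay = rospy.get_param('~delay', 0.5)  # seconds
    pub = rospy.Publisher('/my/other_topic_delayed', String, queue_size=100)

    def callback(msg):
        # A one-shot timer per message acts as the FIFO delay buffer.
        rospy.Timer(rospy.Duration(delay),
                    lambda event: pub.publish(msg),
                    oneshot=True)

    rospy.Subscriber('/my/other_topic', String, callback)
    rospy.spin()

if __name__ == '__main__':
    main()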

Scraping EDGAR with Python regular expressions

I am working on the initial stage of a personal project: downloading 10-Q statements from EDGAR. Quick disclaimer: I am very new to programming and Python, so the code that I wrote is very basic, not even using custom functions and classes, just a very long script that I'm more comfortable editing. As a result, some solutions are quite rough (i.e. concatenating URLs using CIKs and other search options instead of doing requests with "browser" headers).
I keep running into a problem that those who have scraped EDGAR might be familiar with. Every now and then my script just stops running. It doesn't raise any exceptions (I created some that append txt reports with links that can't be opened, and so forth). I suspect that either the SEC servers have a certain limit of requests from an IP per unit of time (if I wait a while after Ctrl-C'ing the script and run it again, it generates more output compared to rapid re-activation), or alternatively it could be TWC that identifies me as a bot and limits such requests.
If it's the SEC, what could potentially work? I tried learning how to work with Tor to get a new IP every now and then, but I can't really find a basic tutorial that would work for my level of expertise. Maybe someone can recommend something good on the topic?
Maybe timers would work? Like forcing the script to sleep every hour or so (I'm still trying to figure out how to make such timers and reset them if an event occurs). The main challenge with this particular problem is that I can't let it run at night.
Thank you in advance for any advice. I have been fighting with this for days, and at this stage it could take me more than a month to get what I want (before I even start tackling 10-Ks).
It seems like delays are pretty useful: sitting at 3.5k downloads with no interruptions thanks to a simple:
import random
import time

time.sleep(random.randint(0, 1) + abs(random.normalvariate(0, 0.2)))
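If you want something a bit more robust than a fixed delay, a sketch of polite throttling plus exponential backoff might look like this (the URL iterable, timeout, and retry count are assumptions):

import random
import time

import requests

def fetch(url, max_retries=5):
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, timeout=30)
            if resp.status_code == 200:
                return resp.text
        except requests.RequestException:
            pass
        # Back off exponentially (2, 4, 8, ... seconds) with a little jitter.
        time.sleep(2 ** (attempt + 1) + random.random())
    return None

for url in urls_to_fetch:  # placeholder: your iterable of filing URLs
    text = fetch(url)
    # Polite per-request delay, as above.
    time.sleep(random.randint(0, 1) + abs(random.normalvariate(0, 0.2)))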

Detecting serial port settings

From time to time I suddenly have a need to connect to a device's console via its serial port. The problem is, I never remember what port settings (baud rate, data bits, stop bits, etc.) to use with each particular device, and the documentation never seems to be lying around when it's really needed.
I wrote a Python script which uses a simple brute-force method (i.e. it iterates over all possible settings, sends some test input, and displays the response for a human to decide if it makes sense), but:
it takes a long time to complete
does not always work (perhaps port reset/timeout issues)
just does not seem like a proper way to do this :)
So the question is: does anyone know of a procedure to auto-detect what port settings the remote device is using?
Although part 1 is not a direct answer to your question:
There are devices which have an autodetection method (called auto-bauding) built in. That means: send a character using your current settings (9600, 115200, ...) to the device, and chances are high that the device will answer with your (!) settings. I've seen this on HP switches.
Second approach: try to re-order the connection possibilities. E.g. chances are high that the other end uses 9600 baud with no hardware handshake, but much lower that it uses 38400 with software XON/XOFF.
If you break your tries down to just a few likely candidates, the "brute force" method will be much more efficient; a sketch follows.
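A sketch of that prioritized probe using pyserial (the port name, the candidate order, and the probe bytes are assumptions):

import serial

# Most likely settings first: (baud rate, byte size, parity, stop bits).
CANDIDATES = [
    (9600, serial.EIGHTBITS, serial.PARITY_NONE, serial.STOPBITS_ONE),
    (115200, serial.EIGHTBITS, serial.PARITY_NONE, serial.STOPBITS_ONE),
    (19200, serial.EIGHTBITS, serial.PARITY_NONE, serial.STOPBITS_ONE),
    (9600, serial.SEVENBITS, serial.PARITY_EVEN, serial.STOPBITS_ONE),
]

for baud, bytesize, parity, stopbits in CANDIDATES:
    with serial.Serial('/dev/ttyUSB0', baudrate=baud, bytesize=bytesize,
                       parity=parity, stopbits=stopbits, timeout=1) as port:
        port.write(b'\r\n')    # poke the device
        reply = port.read(64)  # whatever arrives within the timeout
        print(baud, bytesize, parity, stopbits, repr(reply))
        # A human still decides which reply looks like readable text.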

Sending instant messages through Python (MSN)

OK, I am well aware there are many other questions about this, but I have been searching and have yet to find a solid, proper answer that doesn't revolve around Jabber or something worse (no offense to Jabber users; I just don't want all the extras that come with it).
I currently have msnp and twisted.words. I simply want to send and receive messages; I have read many examples that failed to work, and msnp is poorly documented.
My preference is msnp as it requires much less code; I'm not looking for something complicated.
Using this code I can log in and view my friends that are online (I can't send them messages, though):
import msnp
import time, threading

msn = msnp.Session()
msn.login('XXXXXXX@hotmail.com', 'XXXXXX')
msn.sync_friend_list()

class MSN_Thread(threading.Thread):
    def run(self):
        msn.start_chat("XXXXXXX@hotmail.com")  # this does not work
        while True:
            msn.process()
            time.sleep(1)

start_msn = MSN_Thread()
start_msn.start()
I hope I have been clear enough; it's pretty late and my head is not in a clear state after all this MSN frustration.
Edit: since it seems msnp is extremely outdated, could anyone recommend, with simple examples, how I could achieve this?
I don't need anything fancy that requires other accounts.
There is also XMPP, which is used for Gmail.
You are using a library abandoned in 2004, so I'm not sure msnp can still be used to talk on MSN.
Anyway, I would try with:
while True:
    msn.process(chats=True)
    time.sleep(1)
using the contact id and not the email address:
contacts = msn.friend_list.get_friends()
contact_id = contacts[0].get_passport_id()  # pick a single contact from the list
Your code just starts the chat without sending anything; you need to add code to send the message.
Have a look at the send_message method in this tutorial.
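Putting those pieces together, a rough sketch might look like this (msnp is unmaintained, so the chat API below, in particular send_message, is an assumption based on the fragments above and the linked tutorial):

import time
import msnp

msn = msnp.Session()
msn.login('XXXXXXX@hotmail.com', 'XXXXXX')
msn.sync_friend_list()

# Chat by passport id rather than by raw email address.
friends = msn.friend_list.get_friends()
chat = msn.start_chat(friends[0].get_passport_id())
chat.send_message('hello')  # assumed API, see the tutorial's send_message

while True:
    msn.process(chats=True)
    time.sleep(1)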
It looks like papyon is a maintained fork of the pymsn library, and it is currently used by telepathy-butterfly and amsn2.
papyon is an MSN client library that tries to abstract away the MSN protocol's gory details. It is a fork of the unmaintained pymsn MSN library. papyon uses the GLib main event loop to process network events asynchronously.
