What's the best way to parse messages received from an IRC server with Python according to the RFC? I simply want some kind of list/whatever, for example:
:test!~test#test.com PRIVMSG #channel :Hi!
becomes this:
{ "sender" : "test!~test#test.com", "target" : "#channel", "message" : "Hi!" }
And so on?
(Edit: I want to parse IRC messages in general, not just PRIVMSG's)
Look at Twisted's implementation http://twistedmatrix.com/
Unfortunately I'm out of time, maybe someone else can paste it here for you.
Edit
Well I'm back, and strangely no one has pasted it yet so here it is:
http://twistedmatrix.com/trac/browser/trunk/twisted/words/protocols/irc.py#54
def parsemsg(s):
    """Breaks a message from an IRC server into its prefix, command, and arguments.
    """
    prefix = ''
    trailing = []
    if not s:
        raise IRCBadMessage("Empty line.")
    if s[0] == ':':
        prefix, s = s[1:].split(' ', 1)
    if s.find(' :') != -1:
        s, trailing = s.split(' :', 1)
        args = s.split()
        args.append(trailing)
    else:
        args = s.split()
    command = args.pop(0)
    return prefix, command, args
parsemsg(":test!~test#test.com PRIVMSG #channel :Hi!")
# ('test!~test#test.com', 'PRIVMSG', ['#channel', 'Hi!'])
This function closely follows the EBNF described in the IRC RFC.
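If you want the dict shape from the question rather than a tuple, here is a minimal sketch of the same splitting rules. The name parse_irc_line and the key names "prefix", "command" and "params" are my own choices for illustration, not part of Twisted or the RFC.

```python
def parse_irc_line(line):
    """Split one raw IRC line into prefix, command and params (hypothetical helper)."""
    line = line.rstrip("\r\n")
    if not line:
        raise ValueError("empty line")
    prefix = ""
    if line.startswith(":"):
        # The optional prefix runs from the leading ':' to the first space.
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is a single trailing parameter
    # that is allowed to contain spaces.
    head, sep, trailing = line.partition(" :")
    params = head.split()
    if sep:
        params.append(trailing)
    if not params:
        raise ValueError("no command in line")
    command = params.pop(0)
    return {"prefix": prefix, "command": command, "params": params}


msg = parse_irc_line(":test!~test#test.com PRIVMSG #channel :Hi there!\r\n")
ping = parse_irc_line("PING :irc.example.net")
```

For a PRIVMSG, params[0] is the target and params[-1] is the message text, so the question's "target" and "message" keys can be filled from there.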
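You can do it with a simple generator expression if the format is always like this. Note that s.split() splits on every space, so only the first word of a multi-word message is kept.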
keys = ['sender', 'type', 'target', 'message']
s = ":test!~test#test.com PRIVMSG #channel :Hi!"
dict((key, value.lstrip(':')) for key, value in zip(keys, s.split()))
Result:
{'message': 'Hi!', 'type': 'PRIVMSG', 'sender': 'test!~test#test.com', 'target': '#channel'}
Do you want to parse IRC messages in general, or just PRIVMSGs? Either way, I have an implementation for that.
def parse_message(s):
    prefix = ''
    trailing = ''
    if s.startswith(':'):
        prefix, s = s[1:].split(' ', 1)
    if ' :' in s:
        s, trailing = s.split(' :', 1)
    args = s.split()
    return prefix, args.pop(0), args, trailing
If you want to stick to low-level hacking, I second the Twisted answer by Unknown, but first I think you should take a look at the very recently announced Yardbird, which is a nice request parsing layer on top of Twisted. It lets you use something similar to Django URL dispatching for handling IRC messages, with the side benefit of having the Django ORM available for generating responses, etc.
I know it's not Python, but for a regular expression-based approach to this problem, you could take a look at POE::Filter::IRCD, which handles IRC server protocol (see POE::Filter::IRC::Compat for the client protocol additions) parsing for Perl's POE::Component::IRC framework.
Related
I'm experimenting with using user inputs and Twilio to create a "new messaging platform". (It sounds ridiculous, I know, but I mainly want to see if this would work.) Anyway, when I am running my python code, it throws the error call() got an unexpected keyword argument 'body'. I don't know if this is my formatting or something else, but it's really annoying because I'm pretty close to being finished. Here's my code:
from twilio.rest import Client

account_sid = 'AC4b7b29794774f13edbaeb19121730dbb'
auth_token = '---'
client = Client(account_sid, auth_token)

def sendText():
    myNum = '+19737848243'
    num = input('Enter your sender\'s phone number here: ')
    text = input('Enter your message here: ')
    myNum = str(myNum)
    text = str(text)
    num = str(num)
    message = client.messages(
        body=text,
        from_=myNum,
        to=num
    )
    print(message.sid)

sendText()
(BTW I'm not showing my auth_token on this post, so that's not the error. Trust me.)
Any help would be gladly appreciated. Thanks!
Just so this shows as answered, converting my comment:
You're supposed to call the create method of messages, not messages itself:
client.messages.create(body=text,
                       from_=myNum,
                       to=num)
messages appears to be callable, but what it's supposed to be used for is not made clear in the documentation, and clearly it doesn't take the args you were passing.
I created a bot which connects to the channel through a socket like this:
socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
socket.connect((network,port))
irc = ssl.wrap_socket(socket)
Then I send some messages when certain actions are triggered. This works quite well, but there is one message which is truncated, and my script doesn't return any error. Here is the code for this message:
def GimmeUrlInfos(channel,message):
    link = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', message)
    response = urllib2.urlopen(link[0])
    html = BeautifulSoup(response.read())
    urlTitle = html.find('title')
    irc.send("PRIVMSG %s Link infos:" % (channel) + urlTitle.contents[0] + "\r\n" )
The script checks whether the message contains a link; if it does, BeautifulSoup gets the title of the HTML page. So it should return something like: Link infos: This is the Title of the Webpage you gave in your message.
But it only returns
Link
in the channel. Is there some limitation or something?
Here's my next guess, now that you've given us a little more information:
Your string looks like this:
PRIVMSG #mychannel Link infos: Title of Page\r\n
In IRC, arguments are split on spaces, except that an argument that starts with a colon can include spaces, and runs to the end of the line. So, your target is #mychannel, your message is Link, and the whole rest of the line is a bunch of extra arguments that are ignored.
To fix this, you want to send:
PRIVMSG #mychannel :Link infos: Title of Page\r\n
So, change your code like this:
irc.send("PRIVMSG %s :Link infos:" % (channel) + urlTitle.contents[0] + "\r\n" )
For more details on how messages are formatted in the RFC, and on the PRIVMSG command, see 2.3.1 Message format in 'pseudo' BNF and 4.4.1 Private messages in RFC 1459.
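To avoid forgetting the colon again, you could build the line in one place. This is only a sketch: privmsg is a hypothetical helper name, and stripping CR/LF from the text is my own precaution so that a page title cannot inject a second IRC command.

```python
def privmsg(target, text):
    """Build the bytes for a PRIVMSG whose text may contain spaces (hypothetical helper)."""
    # Replace line breaks so the text cannot end the command early.
    text = text.replace("\r", " ").replace("\n", " ")
    # The ':' marks the trailing parameter, which runs to the end of the line.
    return ("PRIVMSG %s :%s\r\n" % (target, text)).encode("utf-8")


line = privmsg("#mychannel", "Link infos: Title of Page")
```

You would then pass the result to your socket instead of formatting the string inline.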
It's hard to tell from your question, but I think you wanted to send something like this:
PRIVMSG #mychannel Link infos: Title of Page\r\n
… and actually only sent something like this:
PRIVMSG #mychannel Link
One possible explanation of this is that socket.send and SSLSocket.send don't necessarily send the entire string you give them. That's why they return the number of bytes sent. If you want to block until the whole string has been sent, use sendall instead.
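Here is a small sketch of the sendall variant. The helper name send_line is made up for the example, and a local socket.socketpair() stands in for the real IRC connection so the snippet can run on its own.

```python
import socket


def send_line(sock, line):
    """Send one IRC line in full; sendall loops until every byte is written."""
    sock.sendall(line.encode("utf-8") + b"\r\n")


# A connected pair of local sockets stands in for the IRC server connection.
a, b = socket.socketpair()
send_line(a, "PRIVMSG #mychannel :Link infos: Title of Page")
received = b.recv(4096)
a.close()
b.close()
```

With plain send you would have to check the returned byte count yourself and resend the remainder.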
Update: It seems to be the way untagged responses are handled by Twisted. The only examples I have found seem to iterate through the data received and somehow collect the response to their command, though I am not sure how...
I am trying to implement the IMAP4 quota commands as defined in RFC 2087 ( https://www.rfc-editor.org/rfc/rfc2087 ).
Code - ImapClient
class SimpleIMAP4Client(imap4.IMAP4Client):
    """
    A client with callbacks for greeting messages from an IMAP server.
    """
    greetDeferred = None

    def serverGreeting(self, caps):
        self.serverCapabilities = caps
        if self.greetDeferred is not None:
            d, self.greetDeferred = self.greetDeferred, None
            d.callback(self)

    def lineReceived(self, line):
        print "<" + str(line)
        return imap4.IMAP4Client.lineReceived(self, line)

    def sendLine(self, line):
        print ">" + str(line)
        return imap4.IMAP4Client.sendLine(self, line)
Code - QUOTAROOT Implementation
def cbExamineMbox(result, proto):
    """
    Callback invoked when examine command completes.
    Retrieve the subject header of every message in the mailbox.
    """
    print "Fetching storage space"
    cmd = "GETQUOTAROOT"
    args = _prepareMailboxName("INBOX")
    resp = ("QUOTAROOT", "QUOTA")
    d = proto.sendCommand(Command(cmd, args, wantResponse=resp))
    d.addCallback(cbFetch, proto)
    return d

def cbFetch(result, proto):
    """
    Finally, display headers.
    """
    print "Got Quota"
    print result
Output
Fetching storage space
>0005 GETQUOTAROOT INBOX
<* QUOTAROOT "INBOX" ""
<* QUOTA "" (STORAGE 171609 10584342)
<0005 OK Success
Got Quota
([], 'OK Success')
So I am getting the data, but the result doesn't contain it. I am thinking it is because they are untagged responses?
Since the IMAP4 protocol mixes together lots of different kinds of information as "untagged responses", you probably also need to update some other parts of the parsing code in the IMAP4 client implementation.
Specifically, take a look at twisted.mail.imap4.Command and its finish method. Also look at twisted.mail.imap4.IMAP4Client._extraInfo, which is what is passed as the unusedCallback to Command.finish.
To start, you can check to see if the untagged responses to the QUOTA command are being sent to _extraInfo (and then dropped (well, logged)).
If so, I suspect you want to teach Command to recognize QUOTA and QUOTAROOT untagged responses to the QUOTA command, so that it collects them and sends them as part of the result it fires its Deferred with.
If not, you may need to dig a bit deeper into the logic of Command.finish to see where the data does end up.
You may also want to actually implement the Command.wantResponse feature, which appears to be only partially formed currently (ie, lots of client code tries to send interesting values into Command to initialize that attribute, but as far as I can tell nothing actually uses the value of that attribute).
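Once you have found where the untagged lines end up, you still need to turn them into values. As a rough sketch, independent of Twisted (this is plain string parsing, not a Twisted API, and parse_quota is a hypothetical name), a QUOTA line like the one in the output above could be parsed like this. It assumes the simple form shown there: a quoted or atom quota root followed by a parenthesised list of resource/usage/limit triples.

```python
import re

# Matches e.g.: * QUOTA "" (STORAGE 171609 10584342)
QUOTA_RE = re.compile(r'^\* QUOTA (?:"([^"]*)"|(\S+)) \((.*)\)$')


def parse_quota(line):
    """Return (root, {resource: (usage, limit)}) or None if the line is not a QUOTA response."""
    m = QUOTA_RE.match(line.strip())
    if m is None:
        return None
    root = m.group(1) if m.group(1) is not None else m.group(2)
    fields = m.group(3).split()
    # RFC 2087 lists resources as repeated "name usage limit" triples.
    resources = {
        fields[i]: (int(fields[i + 1]), int(fields[i + 2]))
        for i in range(0, len(fields) - 2, 3)
    }
    return root, resources


quota = parse_quota('* QUOTA "" (STORAGE 171609 10584342)')
```

Per RFC 2087 the STORAGE numbers are in units of 1024 octets, so the usage here is roughly 168 MB out of about 10 GB.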
I am using python sockets to receive web style and soap requests. The code I have is
import socket
svrsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = socket.gethostname();
svrsocket.bind((host,8091))
svrsocket.listen(1)
clientSocket, clientAddress = svrsocket.accept()
message = clientSocket.recv(4096)
Some of the soap requests I receive, however, are huge. 650k huge, and this could become several Mb. Instead of the single recv I tried
message = ''
while True:
    data = clientSocket.recv(4096)
    if len(data) == 0:
        break
    message = message + data
but I never receive a 0-byte data chunk with Firefox or Safari, although the Python Socket HOWTO says I should.
What can I do to get round this?
Unfortunately you can't solve this on the TCP level - HTTP defines its own connection management, see RFC 2616. This basically means you need to parse the stream (at least the headers) to figure out when a connection could be closed.
See related questions here - https://stackoverflow.com/search?q=http+connection
Hiya
Firstly I want to reinforce what the previous answer said
Unfortunately you can't solve this on the TCP level
Which is true, you can't. However you can implement an http parser on top of your tcp sockets. And that's what I want to explore here.
Let's get started
Problem and Desired Outcome
Right now we are struggling to find the end of a data stream. We expected our stream to end with a fixed ending, but now we know that HTTP does not define any message suffix.
And yet, we move forward.
There is one question we can now ask, "Can we ever know the length of the message in advance?" and the answer to that is YES! Sometimes...
You see, HTTP/1.1 defines a header called Content-Length, and as you'd expect it has exactly what we want, the content length; but there is something else in the shadows: Transfer-Encoding: chunked. Unless you really want to learn about it, we'll stay away from it for now.
Solution
Here is a solution. You're not gonna know what some of these functions are at first, but if you stick with me, I'll explain. Alright... Take a deep breath.
Assuming conn is a socket connection to the desired HTTP server
...
rawheaders = recvheaders(conn)
status, headers = dict_headers(io.StringIO(rawheaders))
# header values are strings, so convert before doing arithmetic
l_content = int(headers['Content-Length'])
# okay. we've got content length by magic
buffersize = 4096
message = b''
while True:
    if l_content <= 0: break
    data = conn.recv(buffersize)
    if not data: break  # peer closed the connection early
    message += data
    l_content -= len(data)
...
As you can see, we enter the loop already knowing the Content-Length as l_content.
While we iterate, we keep track of the remaining content by subtracting the length of each chunk returned by recv from l_content.
When we've read at least as much data as l_content, we are done:
if l_content <= 0: break
Frustration
Note: For some of these next bits I'm gonna give pseudocode because the code can be a bit dense.
So now you're asking, what is rawheaders = recvheaders(conn), what is headers = dict_headers(io.StringIO(rawheaders)),
and HOW did we get headers['Content-Length']?!
For starters, recvheaders. The HTTP/1.1 spec doesn't define a message suffix, but it does define something useful: a terminator for the HTTP headers! The header block ends with an empty line, i.e. two CRLFs in a row (\r\n\r\n). That means we know we've received all the headers once we read \r\n\r\n. So we can write a function like
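def recvheaders(sock):
    # read one byte at a time so we never consume part of the body
    rawheaders = b''
    while not rawheaders.endswith(b'\r\n\r\n'):
        byte = sock.recv(1)
        if not byte:
            break  # connection closed before the headers ended
        rawheaders += byte
    return rawheaders.decode('iso-8859-1')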
Next, parsing the headers.
def dict_headers(ioheaders: io.StringIO):
    """
    parses an http response into the status-line and headers
    """
    # here I expect ioheaders to be io.StringIO
    # the status line is always the first line
    status = ioheaders.readline().strip()
    headers = {}
    for line in ioheaders:
        item = line.strip()
        if not item:
            break
        # headers look like this
        # 'Header-Name' : 'Value'
        item = item.split(':', 1)
        if len(item) == 2:
            key, value = item
            headers[key] = value
    return status, headers
Here we read the status line then we continue to iterate over every remaining line
and build [key,value] pairs from Header: Value with
item = line.strip()
item = item.split(':', 1)
# We do split(':',1) to avoid cases like
# 'Header' : 'foo:bar' -> ['Header','foo','bar']
# when we want ---------> ['Header','foo:bar']
then we take that list and add it to the headers dict
#unpacking
#key = item[0], value = item[1]
key, value = item
headers[key] = value
BAM, we've created a map of headers
From there headers['Content-Length'] falls right out.
So,
This structure will work as long as you can guarantee that you will always receive Content-Length.
If you've made it this far WOW, thanks for taking the time and I hope this helped you out!
TL;DR: if you want to know the length of an HTTP message with sockets, write an HTTP parser.
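Putting the pieces together, here is a compact sketch of the same idea in one function. It is not a full HTTP parser: read_http_message is a name I made up, it only handles messages that carry Content-Length (no chunked encoding), and it lowercases header names because they are case-insensitive. A local socket.socketpair() plays the role of the browser connection so the example is self-contained.

```python
import socket


def read_http_message(sock, bufsize=4096):
    """Read one HTTP message framed by Content-Length (sketch, no chunked support)."""
    buf = b""
    # Keep reading until the blank line that ends the header block.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("closed before headers ended")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    # Part of the body may already have arrived together with the headers.
    remaining = int(headers.get("content-length", "0")) - len(body)
    while remaining > 0:
        chunk = sock.recv(min(bufsize, remaining))
        if not chunk:
            raise ConnectionError("closed before body ended")
        body += chunk
        remaining -= len(chunk)
    return start_line, headers, body


client, server = socket.socketpair()
client.sendall(b"POST /soap HTTP/1.1\r\nHost: example\r\n"
               b"Content-Length: 11\r\n\r\nhello world")
start_line, headers, body = read_http_message(server)
client.close()
server.close()
```

Reading the headers in larger chunks is faster than byte by byte, but it means the first recv can already contain body bytes, which is why the remaining count starts from what is left after the header block.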
I'm writing an OAuth script in Python.
For testing this, I use Twitter API. But it is not working well.
def test():
    params = {
        "oauth_consumer_key": TWITTER_OAUTH_CONSUMER_KEY,
        "oauth_nonce": "".join(random.choice(string.digits + string.letters) for i in xrange(7)),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time())),
        "oauth_token": res_dict["oauth_token"],
        "oauth_version": "1.0",
    }
    status = {"status": u"Always_look_on_the_bright_side_of_life".encode("UTF-8")}
    print status
    params.update(status)
    url = "http://twitter.com/statuses/update.xml"
    key = "&".join([TWITTER_OAUTH_CONSUMER_SECRET, res_dict["oauth_token_secret"]])
    msg = "&".join(["POST", urllib.quote(url,""),
                    urllib.quote("&".join([k+"="+params[k] for k in sorted(params)]), "-._~")])
    print msg
    signature = hmac.new(key, msg, hashlib.sha1).digest().encode("base64").strip()
    params["oauth_signature"] = signature
    req = urllib2.Request(url,
                          headers={"Authorization":"OAuth", "Content-type":"application/x-www-form-urlencoded"})
    req.add_data("&".join([k+"="+urllib.quote(params[k], "-._~") for k in params]))
    print req.get_data()
    res = urllib2.urlopen(req).read()
    print res
This script (status="Always_look_on_the_bright_side_of_life") is working.
But when the status is "Always look on the bright side of life" (underscores replaced with spaces), it doesn't work (it returns HTTP Error 401: Unauthorized).
I referenced this question, but failed.
Please give me some advice. Thank you.
I had the same problem with OAuth on Facebook a while ago. The problem is that the signature validation on the server side fails. See your signature generation code here:
msg = "&".join(["POST", urllib.quote(url,""),
urllib.quote("&".join([k+"="+params[k] for k in sorted(params)]), "-._~")])
print msg
signature = hmac.new(key, msg, hashlib.sha1).digest().encode("base64").strip()
It uses the raw (non-encoded) form of the string to generate the signature. However, the server side validates the signature against the URL-quoted string:
req.add_data("&".join([k+"="+urllib.quote(params[k], "-._~") for k in params]))
To fix the code, you need to change this line so that the signature is created from the URL-encoded parameters:
msg = "&".join(["POST", urllib.quote(url,""),
urllib.quote("&".join([k+"="+urllib.quote(params[k], "-._~") for k in sorted(params)]), "-._~")])
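For reference, the same double encoding can be sketched in Python 3 with urllib.parse. The helper names pct, signature_base_string and sign are mine, and the keys and status below are placeholder values, not real credentials.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode everything except unreserved characters."""
    return quote(value, safe="-._~")


def signature_base_string(method, url, params):
    # Each key and value is encoded first, then the pairs are sorted and joined.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(k + "=" + v for k, v in pairs)
    # The joined parameter string is encoded a second time as a whole.
    return "&".join([method.upper(), pct(url), pct(normalized)])


def sign(base_string, consumer_secret, token_secret):
    key = (pct(consumer_secret) + "&" + pct(token_secret)).encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


base = signature_base_string(
    "POST",
    "http://twitter.com/statuses/update.xml",
    {"status": "Always look"},
)
signature = sign(base, "consumer-secret", "token-secret")
```

Note how the space ends up as %2520 in the base string: it is encoded once as %20 inside the parameter, and the % is encoded again when the whole parameter string is quoted.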
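The easiest way to fix this is to URL-quote the status value right after status = {"status": u"Always_look_on_the_bright_side_of_life".encode("UTF-8")}, i.e. status["status"] = urllib.quote(status["status"]), since status is a dict and urllib.quote expects a string. This will escape the spaces and other special characters as required.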