How to obtain substring of big string text in Python?

I have text files in the following format, which are output by an API:
TASK [Do this]
OK: {
"changed":false,
"msg": "check ok"
}
TASK [Do that]
OK
TASK [Do x]
Fatal: "Error message x"
TASK [Do y]
OK
TASK [Do z]
Fatal: "Stopped because of previous error"
The number of lines (or tasks) before and after the "Fatal" error varies, and I am only interested in the "Error message x" part.
Code as of now:
import requests

url = ...  # API URL
r = requests.get(url, verify=False, allow_redirects=True, headers=headers, timeout=10)
output = r.text
I tried using output.split("Fatal", 1)[1], but it raises "list index out of range", and it also seems to mess up the text by adding a lot of \n.

You can use the re module to search for the text you need with a regular expression. There is probably a better regex, but I wrote this one quickly using regex101.com: Fatal: "(.+)"
import re
s = '''TASK [Do this]
OK: {
"changed":false,
"msg": "check ok"
}
TASK [Do that]
OK
TASK [Do x]
Fatal: "Error message x"
TASK [Do y]
OK
TASK [Do z]
Fatal: "Stopped because of previous error"'''
errors = re.findall(r'Fatal: "(.+)"', s)
for x in errors:
    print(x)
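If only the first error matters (the later Fatal lines, such as "Stopped because of previous error", are consequences of it), re.search can be used instead of re.findall, since it stops at the first match. A minimal sketch; the function name first_fatal_message is made up for illustration, and the pattern assumes the message sits in double quotes on the same line as "Fatal:":

```python
import re

def first_fatal_message(output):
    """Return the quoted text of the first 'Fatal: "..."' line, or None."""
    # re.search stops at the first match; [^"\n]* keeps the capture
    # inside one pair of quotes on a single line
    match = re.search(r'Fatal: "([^"\n]*)"', output)
    return match.group(1) if match else None

sample = ('TASK [Do x]\nFatal: "Error message x"\n'
          'TASK [Do z]\nFatal: "Stopped because of previous error"')
print(first_fatal_message(sample))  # Error message x
```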

You should be able to do that fairly easily with regular expressions using the re module. If "Error Message X" can occur more than once, then something along the lines of
someVar = re.findall("Error Message X", output)
should return a list of all substrings of the output text that match. findall can also be used if only one occurrence is possible; it will then just return a one-element list (or an empty list if nothing matches). Note that matching is case-sensitive by default, so the pattern has to match the text exactly.
Here is a helpful introduction to re:
https://www.w3schools.com/python/python_regex.asp
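As for the split() attempt in the question: str.split(sep, 1) returns a one-element list when sep does not occur at all, so [1] raises IndexError, for example if the real response spells it differently ("fatal", "FATAL"). The \n characters are most likely the line breaks already present in the output; they are shown as \n when a string is displayed via its repr, such as inside a list. A sketch of the same idea with str.partition, which never raises (text_after_first_fatal is a hypothetical helper name):

```python
def text_after_first_fatal(output):
    """Return the text following the first 'Fatal:' up to the end of that line."""
    # partition() returns (before, sep, after); sep is '' when 'Fatal:' is absent
    _, sep, rest = output.partition("Fatal:")
    if not sep:
        return None
    # Keep only the remainder of that line, then drop whitespace
    # (including a trailing \r) and the surrounding double quotes
    first_line = rest.split("\n", 1)[0]
    return first_line.strip().strip('"')
```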

Related

AWS Batch Operation - completion reports, no message

I run S3 Batch Operations jobs that invoke my Lambda function on a huge number of CSV files. I want the content of the error/exception to appear in my completion reports. Only 5% of the files produce errors, so the Lambda works fine overall, but it doesn't write the errors to the report.
When I test my Lambda on a file that leads to errors, I see that "ResultMessage" contains the error or exception. I tried adding a string with the exception to the report, but the last column is always Null.
Can you help me?
except ClientError as e:
    # If request timed out, mark as a temp failure
    # and S3 Batch Operations will make the task for retry. If
    # any other exceptions are received, mark as permanent failure.
    errorCode = e.response['Error']['Code']
    errorMessage = e.response['Error']['Message']
    if errorCode == 'RequestTimeout':
        resultCode = 'TemporaryFailure'
        resultString = 'Retry request to Amazon S3 due to timeout.'
    else:
        resultCode = 'PermanentFailure'
        resultString = '{}: {}'.format(errorCode, errorMessage)
except Exception as e:
    # Catch all exceptions to permanently fail the task
    resultCode = 'PermanentFailure'
    resultString = 'Exception: {}'.format(e)
finally:
    results.append({
        'taskId': taskId,
        'resultCode': resultCode,
        'ResultMessage': resultString
    })
return {
    'invocationSchemaVersion': invocationSchemaVersion,
    'invocationId': invocationId,
    'results': results
}
Example rows of my report with failed csv

There's nothing obviously wrong with your code.
I checked the docs:
Response and result codes
There are two levels of codes that S3 Batch Operations expect from Lambda functions. The first is the response code for the entire request, and the second is a per-task result code. The following table contains the response codes.
Succeeded: The task completed normally. If you requested a job completion report, the task's result string is included in the report.
TemporaryFailure: The task suffered a temporary failure and will be redriven before the job completes. The result string is ignored. If this is the final redrive, the error message is included in the final report.
PermanentFailure: The task suffered a permanent failure. If you requested a job-completion report, the task is marked as Failed and includes the error message string.
Sounds to me like you'd need to look into the Job Completion Report to get more details.
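For reference, in the AWS documentation's example of a Lambda response for S3 Batch Operations, the per-task message key is spelled resultString, while the code in the question uses ResultMessage. If the completion report only reads the documented key, that would presumably explain the empty last column, so it is worth checking. A minimal sketch of assembling the response with the documented key names (build_batch_response and its tuple input are hypothetical, for illustration only):

```python
def build_batch_response(invocation_schema_version, invocation_id, task_results):
    """Assemble the response an S3 Batch Operations Lambda returns.

    task_results: iterable of (task_id, result_code, result_string) tuples
    (a hypothetical input shape chosen for this sketch).
    """
    return {
        'invocationSchemaVersion': invocation_schema_version,
        'invocationId': invocation_id,
        'results': [
            {
                'taskId': task_id,
                'resultCode': result_code,
                # Key name as documented by AWS: resultString (not ResultMessage)
                'resultString': result_string,
            }
            for task_id, result_code, result_string in task_results
        ],
    }
```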

Best way to communicate with telnet api from python?

I have a multi-room speaker system from Denon called HEOS, which I want to control with a Python script. To communicate with the multi-room system I have to telnet to port 1255 on the device and send commands like this:
heos://player/set_play_state?pid=player_id&state=play_state
The response back is in json:
{
"heos": {
"command": " player/set_play_state ",
"result": "success",
"message": "pid='player_id'&state='play_state'"
}
}
I have successfully used Python's telnetlib to send simple commands like this:
command = "heos://player/set_play_state?pid=player_id&state=play_state"
telnet.write(command.encode('ASCII') + b'\r\n')
But what is the best way to get the response back in a usable format? Loop with telnet.read_until? I want to get the result and message lines back into clean variables.
Using telnet to communicate with an API feels a bit dirty. Is it possible to use something else, for example a socket?
Thanks in advance
The API/CLI is documented here: http://rn.dmglobal.com/euheos/HEOS_CLI_ProtocolSpecification.pdf

While it may be possible to use read_until() here, it would depend on exactly how the response JSON is formatted, and it would probably be unwise to rely on it.
If the remote device closes the connection after sending the response, the easy way would be a simple
response = json.loads(telnet.read_all().decode())
If it remains open for more commands, then you'll instead need to keep receiving until you have a complete JSON object. Here's a possibility that just keeps trying to parse the JSON until it succeeds:
import json

response = ''
while True:
    response += telnet.read_some().decode()
    try:
        response = json.loads(response)
        break
    except ValueError:
        # Not a complete JSON object yet; keep reading
        pass
Either way, your result and message are response['heos']['result'] and response['heos']['message'].
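Regarding the socket question: telnetlib was deprecated in Python 3.11 and removed in 3.13, so a plain socket is a reasonable choice for new code. The keep-reading-until-it-parses idea above works the same way over a socket. Below is a minimal sketch that also stops when the connection closes (telnetlib's read_some() returns b'' at EOF, which would make the loop above spin forever). read_json_response is a hypothetical name, and HEOS_HOST in the usage comment is a placeholder:

```python
import json

def read_json_response(recv_chunk):
    """Read chunks until the buffer parses as one complete JSON object.

    recv_chunk is any zero-argument callable returning bytes, e.g.
    telnet.read_some or lambda: sock.recv(4096).
    """
    buffer = b''
    while True:
        chunk = recv_chunk()
        if not chunk:
            # b'' means the peer closed the connection
            raise ConnectionError("connection closed before a complete JSON object arrived")
        buffer += chunk
        try:
            return json.loads(buffer.decode('utf-8'))
        except ValueError:
            # Incomplete JSON (JSONDecodeError) or a multi-byte character split
            # across chunks (UnicodeDecodeError): both subclass ValueError, keep reading
            continue

# Usage with a plain socket (HEOS_HOST is a placeholder for the speaker's address):
#   import socket
#   sock = socket.create_connection((HEOS_HOST, 1255))
#   sock.sendall(command.encode('ascii') + b'\r\n')
#   response = read_json_response(lambda: sock.recv(4096))
#   print(response['heos']['result'], response['heos']['message'])
```

One caveat: if the device sends two JSON objects in a single read (for example a "command under process" message immediately followed by the final one), json.loads() fails on the combined buffer with an "Extra data" error and this loop keeps waiting; json.JSONDecoder().raw_decode() can split such a buffer, at the cost of a little more bookkeeping.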

FWIW, here is my GitHub repo (inspired by this repo) for controlling a HEOS speaker with Python. It uses a similar approach to the accepted answer, but additionally waits if the HEOS player is busy.
import json
import logging

def telnet_request(self, command, wait=True):
    """Execute a `command` and return the response(s)."""
    command = self.heosurl + command
    logging.debug("telnet request {}".format(command))
    self.telnet.write(command.encode('ASCII') + b'\r\n')
    response = b''
    while True:
        response += self.telnet.read_some()
        try:
            response = json.loads(response)
            if not wait:
                logging.debug("I accept the first response: {}".format(response))
                break
            # sometimes, I get a response with the message "under
            # process". I might want to wait here
            message = response.get("heos", {}).get("message", "")
            if "command under process" not in message:
                logging.debug("I assume this is the final response: {}".format(response))
                break
            logging.debug("Wait for the final response")
            response = b''  # forget this message
        except ValueError:
            # response is not a complete JSON object
            pass
        except TypeError:
            # response is not a complete JSON object
            pass
    # "result" is nested under the "heos" key of the response
    if response.get("heos", {}).get("result") == "fail":
        logging.warning(response)
        return None
    return response

Python regular expression

I have this HTTP request and I want to display only the Authorization value (the base64 part). Any help?
The request is stored in a variable called hreq.
I have tried this:
reg = re.search(r"Authorization:\sBasic\s(.*)\r", hreq)
print reg.group()
but it doesn't work.
Here is the request :
HTTP Request:
Path: /dynaform/custom.js
Http-Version: HTTP/1.1
Host: 192.168.1.254
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://domain.com/userRpm/StatusRpm.htm
Authorization: Basic YWhtEWa6MDfGcmVlc3R6bGH
I want to display the value YWhtEWa6MDfGcmVlc3R6bGH
Please help, and thanks in advance!

You can get rid of the \r at the end of the regex. On Linux the line ending is \n rather than \r\n, so a pattern that expects \r might never match:
>>> reg = re.search(r"Authorization:\sBasic\s(.*)", a)
>>> reg.groups()
('YWhtEWa6MDfGcmVlc3R6bGH ',)
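A variant that does not depend on the line ending at all: with re.MULTILINE, ^ matches at the start of every line, and \S+ stops before any whitespace, so a trailing space or \r is never captured (unlike the .* above, which picked up a trailing space). Note also that the question's reg.group() returns the whole match; the captured value is reg.group(1). A minimal Python 3 sketch; basic_auth_token is a hypothetical name and the sample request is shortened from the question:

```python
import re

def basic_auth_token(hreq):
    """Return the credentials after 'Authorization: Basic', or None."""
    # re.MULTILINE lets ^ match at the start of every line; \S+ stops at the
    # first whitespace character, so a trailing space, \r or \n is not captured
    match = re.search(r"^Authorization:\s*Basic\s+(\S+)", hreq, re.MULTILINE)
    return match.group(1) if match else None

hreq = ("Host: 192.168.1.254\r\n"
        "Referer: http://domain.com/userRpm/StatusRpm.htm\r\n"
        "Authorization: Basic YWhtEWa6MDfGcmVlc3R6bGH\r\n")
print(basic_auth_token(hreq))  # YWhtEWa6MDfGcmVlc3R6bGH
```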

The \r is probably throwing things off; there probably isn't a carriage return at the end of the request (but it's hard to say from this end). Try removing it or using $ (end-of-input) instead.
You can use this online Python regex tester to try your inputs by hand before putting them in your code.

If that is your whole input text, maybe /(\b[ ].*)$/ could help:
Online Demo
\b a word boundary, followed by
[ ] a literal space character, followed by
.* any characters (except newline), followed by
$ the end of the string

Thanks for your answers :). Actually, I got an error message, but anyway, I'm going to show exactly what I want to do. Here is the script:
#!/usr/bin/python
from scapy.all import *
from scapy.error import Scapy_Exception
from scapy import HTTP

my_iface = "wlan0"
count = 0

def pktTCP(pkt):
    global count
    count = count + 1
    # Test each layer separately: "A or B in pkt" is parsed as
    # "A or (B in pkt)", which is always true
    if HTTP.HTTPRequest in pkt or HTTP.HTTPResponse in pkt:
        src = pkt[IP].src
        srcport = pkt[TCP].sport
        dst = pkt[IP].dst
        dstport = pkt[TCP].dport
        test = pkt[TCP].payload
        if HTTP.HTTPRequest in pkt:
            print "HTTP Request:"
            print test
            print "======================================================================"
        if HTTP.HTTPResponse in pkt:
            print "HTTP Response:"
            print test
            print "======================================================================"

sniff(filter='tcp and port 80', iface=my_iface, prn=pktTCP)

Twisted IMAP4 Client QUOTA family of commands

Update: It seems to come down to the way Twisted handles untagged responses. The only example I have found seems to iterate through the data received and somehow collect the responses to its command, though I am not sure how...
I am trying to implement the IMAP4 quota commands as defined in RFC 2087 ( https://www.rfc-editor.org/rfc/rfc2087 ).
Code - ImapClient
from twisted.mail import imap4

class SimpleIMAP4Client(imap4.IMAP4Client):
    """
    A client with callbacks for greeting messages from an IMAP server.
    """
    greetDeferred = None

    def serverGreeting(self, caps):
        self.serverCapabilities = caps
        if self.greetDeferred is not None:
            d, self.greetDeferred = self.greetDeferred, None
            d.callback(self)

    def lineReceived(self, line):
        print "<" + str(line)
        return imap4.IMAP4Client.lineReceived(self, line)

    def sendLine(self, line):
        print ">" + str(line)
        return imap4.IMAP4Client.sendLine(self, line)
Code - QUOTAROOT Implementation
from twisted.mail.imap4 import Command, _prepareMailboxName

def cbExamineMbox(result, proto):
    """
    Callback invoked when the examine command completes.
    Request the quota root (and quota) of INBOX.
    """
    print "Fetching storage space"
    cmd = "GETQUOTAROOT"
    args = _prepareMailboxName("INBOX")
    resp = ("QUOTAROOT", "QUOTA")
    d = proto.sendCommand(Command(cmd, args, wantResponse=resp))
    d.addCallback(cbFetch, proto)
    return d

def cbFetch(result, proto):
    """
    Finally, display the quota result.
    """
    print "Got Quota"
    print result
Output
Fetching storage space
>0005 GETQUOTAROOT INBOX
<* QUOTAROOT "INBOX" ""
<* QUOTA "" (STORAGE 171609 10584342)
<0005 OK Success
Got Quota
([], 'OK Success')
So I am getting the data, but the result doesn't contain it. Is that because they are untagged responses?

Since the IMAP4 protocol mixes together lots of different kinds of information as "untagged responses", you probably also need to update some other parts of the parsing code in the IMAP4 client implementation.
Specifically, take a look at twisted.mail.imap4.Command and its finish method. Also look at twisted.mail.imap4.IMAP4Client._extraInfo, which is what is passed as the unusedCallback to Command.finish.
To start, you can check whether the untagged responses to the QUOTA command are being sent to _extraInfo (and then dropped, or rather logged).
If so, I suspect you want to teach Command to recognize QUOTA and QUOTAROOT untagged responses to the QUOTA command, so that it collects them and sends them as part of the result it fires its Deferred with.
If not, you may need to dig a bit deeper into the logic of Command.finish to see where the data does end up.
You may also want to actually implement the Command.wantResponse feature, which currently appears to be only partially implemented (i.e., lots of client code passes interesting values into Command to initialize that attribute, but as far as I can tell nothing actually uses the value of that attribute).
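Whichever hook ends up receiving the untagged lines, they still have to be turned into data. RFC 2087 defines them as QUOTAROOT followed by the mailbox name and its quota roots, and QUOTA followed by a root and a parenthesized list of (resource, usage, limit) triples, where STORAGE is counted in units of 1024 octets. Below is a minimal, Twisted-independent sketch for the two lines in the question's output. The function names are hypothetical, it assumes the lines arrive as str, and it only handles atoms and quoted strings without backslash escapes:

```python
import re

# An astring here is either a simple quoted string or an atom
_ASTRING = r'"[^"]*"|[^\s()"]+'

def _unquote(token):
    return token[1:-1] if token.startswith('"') else token

def parse_quota_line(line):
    """Parse '* QUOTA <root> (<resource> <usage> <limit> ...)'."""
    m = re.match(r'\*\s+QUOTA\s+(' + _ASTRING + r')\s+\((.*)\)\s*$', line)
    if m is None:
        return None
    root = _unquote(m.group(1))
    parts = m.group(2).split()
    resources = {}
    # Resources come in triples: name, current usage, limit
    for i in range(0, len(parts) - 2, 3):
        resources[parts[i].upper()] = (int(parts[i + 1]), int(parts[i + 2]))
    return root, resources

def parse_quotaroot_line(line):
    """Parse '* QUOTAROOT <mailbox> <root> ...' into (mailbox, [roots])."""
    m = re.match(r'\*\s+QUOTAROOT\s+(.*)$', line)
    if m is None:
        return None
    tokens = [_unquote(t) for t in re.findall(_ASTRING, m.group(1))]
    if not tokens:
        return None
    return tokens[0], tokens[1:]
```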

Python & parsing IRC messages

What's the best way to parse messages received from an IRC server with Python according to the RFC? I simply want some kind of list/whatever, for example:
:test!~test#test.com PRIVMSG #channel :Hi!
becomes this:
{ "sender" : "test!~test#test.com", "target" : "#channel", "message" : "Hi!" }
And so on?
(Edit: I want to parse IRC messages in general, not just PRIVMSG's)

Look at Twisted's implementation http://twistedmatrix.com/
Unfortunately I'm out of time; maybe someone else can paste it here for you.
Edit
Well, I'm back, and strangely no one has pasted it yet, so here it is:
http://twistedmatrix.com/trac/browser/trunk/twisted/words/protocols/irc.py#54
class IRCBadMessage(Exception):
    """Raised for malformed lines (Twisted defines this in twisted.words.protocols.irc)."""

def parsemsg(s):
    """Breaks a message from an IRC server into its prefix, command, and arguments.
    """
    prefix = ''
    trailing = []
    if not s:
        raise IRCBadMessage("Empty line.")
    if s[0] == ':':
        prefix, s = s[1:].split(' ', 1)
    if s.find(' :') != -1:
        s, trailing = s.split(' :', 1)
        args = s.split()
        args.append(trailing)
    else:
        args = s.split()
    command = args.pop(0)
    return prefix, command, args
parsemsg(":test!~test#test.com PRIVMSG #channel :Hi!")
# ('test!~test#test.com', 'PRIVMSG', ['#channel', 'Hi!'])
This function closely follows the EBNF described in the IRC RFC.

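You can do it with dict() and a generator expression if the format is always like this. Note that this only works when the message is a single word, because split() also splits the message text and zip() drops the extra pieces.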
keys = ['sender', 'type', 'target', 'message']
s = ":test!~test#test.com PRIVMSG #channel :Hi!"
dict((key, value.lstrip(':')) for key, value in zip(keys, s.split()))
Result:
{'message': 'Hi!', 'type': 'PRIVMSG', 'sender': 'test!~test#test.com', 'target': '#channel'}

Do you want to parse IRC messages in general, or just PRIVMSGs? Either way, here is an implementation I have:
def parse_message(s):
    prefix = ''
    trailing = ''
    if s.startswith(':'):
        prefix, s = s[1:].split(' ', 1)
    if ' :' in s:
        s, trailing = s.split(' :', 1)
    args = s.split()
    return prefix, args.pop(0), args, trailing
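To get the dictionary shape asked for in the question, the same splitting rules can fill a dict directly. A minimal sketch: the keys command and params are my own naming, target and message are only filled in for PRIVMSG and NOTICE, and IRCv3 message tags (lines starting with @) are not handled:

```python
def parse_irc_line(line):
    """Split one raw IRC line into a dict (hypothetical helper)."""
    line = line.rstrip('\r\n')
    sender = ''
    if line.startswith(':'):
        sender, line = line[1:].split(' ', 1)
    if ' :' in line:
        # Everything after the first ' :' is the trailing parameter, spaces included
        line, trailing = line.split(' :', 1)
        params = line.split() + [trailing]
    else:
        params = line.split()
    command = params.pop(0)
    msg = {'sender': sender, 'command': command, 'params': params}
    if command in ('PRIVMSG', 'NOTICE') and len(params) == 2:
        msg['target'], msg['message'] = params
    return msg

print(parse_irc_line(":test!~test#test.com PRIVMSG #channel :Hi there!\r\n"))
```

Unlike the zip() approach, a multi-word message such as "Hi there!" stays in one piece.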

If you want to keep to a low-level hacking I second the Twisted answer by Unknown, but first I think you should take a look at the very recently announced Yardbird which is a nice request parsing layer on top of Twisted. It lets you use something similar to Django URL dispatching for handling IRC messages with a side benefit of having the Django ORM available for generating responses, etc.

I know it's not Python, but for a regular expression-based approach to this problem, you could take a look at POE::Filter::IRCD, which handles IRC server protocol (see POE::Filter::IRC::Compat for the client protocol additions) parsing for Perl's POE::Component::IRC framework.
