Python parse log using regex

Hope someone might be able to help. I have a log being sent from a syslog server to Python, which looks like this:
{'Raw': 'Nov 26 00:23:07 TEST 23856434232342 (2016-11-26T00:23:07) http-proxy[2063]: Allow 1-Trusted 0-External tcp 192.168.0.1 2.3.4.5 57405 80 msg="HTTP Request" proxy_act="HTTP-TEST" op="POST" dstname="www.google.com" arg="/" sent_bytes="351" rcvd_bytes="1400" (HTTP-proxy-TEST-00)'}
I need to extract the IP address, dstname=, sent_bytes= and rcvd_bytes= and, if possible, convert them to JSON. I started with the regex (["'])(?:(?=(\\?))\2.)*?\1 to match the double-quoted values, but it's not working correctly.
Any ideas on how I might get the data I need, or how to parse the above into JSON?
Thanks

Assuming the IP, dstname, sent_bytes and rcvd_bytes always appear in that order, use re.findall to get them all:
import re
s = r"""{'Raw': 'Nov 26 00:23:07 TEST 23856434232342 (2016-11-26T00:23:07) http-proxy[2063]: Allow 1-Trusted 0-External tcp 192.168.0.1 2.3.4.5 57405 80 msg="HTTP Request" proxy_act="HTTP-TEST" op="POST" dstname="www.google.com" arg="/" sent_bytes="351" rcvd_bytes="1400" (HTTP-proxy-TEST-00)'}"""
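# Each alternative anchors on a label; the group captures the value up to the next space or quote.
match = re.findall(r'(?:tcp |dstname=|sent_bytes=|rcvd_bytes=)"?([^\s"]+)', s)
# match == ['192.168.0.1', 'www.google.com', '351', '1400']
ip, dstname, sent_bytes, rcvd_bytes = match
# Build a dict from these values and pass it to json.dumps to get JSON.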
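A minimal sketch of the JSON step (Python 3), assuming every log line has the same layout as the sample: the source IP follows "tcp", and the remaining fields are key="value" pairs. The function name parse_log_line and the output key src_ip are illustrative and not from the original post.

```python
import json
import re

# Sample line from the question (the inner 'Raw' value only).
raw = ('Nov 26 00:23:07 TEST 23856434232342 (2016-11-26T00:23:07) '
       'http-proxy[2063]: Allow 1-Trusted 0-External tcp 192.168.0.1 2.3.4.5 '
       '57405 80 msg="HTTP Request" proxy_act="HTTP-TEST" op="POST" '
       'dstname="www.google.com" arg="/" sent_bytes="351" rcvd_bytes="1400" '
       '(HTTP-proxy-TEST-00)')


def parse_log_line(line):
    """Return a dict with the source IP and selected key="value" fields."""
    # Collect every key="value" pair on the line into a dict.
    fields = dict(re.findall(r'(\w+)="([^"]*)"', line))
    # Assumption: the first address after "tcp " is the source IP.
    ip_match = re.search(r'\btcp (\d{1,3}(?:\.\d{1,3}){3})', line)
    return {
        'src_ip': ip_match.group(1) if ip_match else None,
        'dstname': fields.get('dstname'),
        'sent_bytes': fields.get('sent_bytes'),
        'rcvd_bytes': fields.get('rcvd_bytes'),
    }


record = parse_log_line(raw)
as_json = json.dumps(record, sort_keys=True)
print(as_json)
```

Collecting all pairs into a dict first means the fields no longer have to appear in a fixed order, which the re.findall approach above relies on.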

Related

How to obtain packet count of TCP packets?

I am retrieving flow statistics using a _flow_stats_reply_handler as demonstrated in the Ryu Traffic Monitor example.
I print using the following:
file.write("\n{},{},{},{},{},{},{},{},{}"
           .format(ev.msg.datapath.id,
                   stat.match['in_port'], stat.match['eth_src'], stat.match['eth_dst'],
                   stat.instructions[0].actions[0].port,
                   stat.packet_count, stat.byte_count,
                   stat.duration_sec, stat.duration_nsec))
Note the stat.packet_count.
How could I change this to count TCP packets? I understand there is an ip_proto field and a tcp_flags field but I don't know how to code the match/count.
Edit:
I have further investigated this and added a flow match to my request flow stats function:
def _request_stats(self, datapath):
    self.logger.debug('send stats request: %016x', datapath.id)
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser
    cookie = cookie_mask = 0
    match = parser.OFPMatch(eth_type=0x0800)
    req = parser.OFPFlowStatsRequest(datapath, 0, ofp.OFPTT_ALL, ofp.OFPP_ANY, ofp.OFPG_ANY,
                                     cookie, cookie_mask, match)
    datapath.send_msg(req)
Unfortunately this still doesn't work; any ideas as to why would be greatly appreciated.

You need to add more fields to your match, such as ip_proto, in order to match TCP. The IP protocol number of TCP is 6; for more information about IP protocol numbers, see Wikipedia.
Use the code below. You don't need to set tcp_flags in this case.
match = parser.OFPMatch(
eth_type=0x0800,
ip_proto=6,
)
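Once the request is narrowed this way, the reply handler can total packet_count across the returned entries. The sketch below (Python 3) shows only that summing step, using plain stand-in objects instead of real Ryu flow stats entries so that it runs without a switch. count_tcp_packets and FakeStat are hypothetical names, and the code assumes each entry's match supports "in" and [] lookups the way a dict does. A switch keeps counters per flow entry, so this also presumes that flow entries matching on ip_proto=6 are actually installed.

```python
from collections import namedtuple

TCP_PROTO = 6  # IANA protocol number for TCP

# Stand-in for a flow stats entry: only the two attributes used here.
FakeStat = namedtuple('FakeStat', ['match', 'packet_count'])


def count_tcp_packets(stats):
    """Sum packet_count over entries whose match pins ip_proto to TCP."""
    total = 0
    for stat in stats:
        if 'ip_proto' in stat.match and stat.match['ip_proto'] == TCP_PROTO:
            total += stat.packet_count
    return total


# Example reply body: two TCP flows, one UDP flow, one L2-only flow.
body = [
    FakeStat({'eth_type': 0x0800, 'ip_proto': 6}, 10),
    FakeStat({'eth_type': 0x0800, 'ip_proto': 6, 'tcp_dst': 80}, 5),
    FakeStat({'eth_type': 0x0800, 'ip_proto': 17}, 7),
    FakeStat({'in_port': 1}, 100),
]
tcp_total = count_tcp_packets(body)
print(tcp_total)
```

In a real handler, ev.msg.body would take the place of the hand-built body list.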

Scapy: How to manipulate Host in http header?

I wrote this piece of code to get http header and set Host:
http_layer = packet.getlayer(http.HTTPRequest).fields
http_layer['Host'] = "newHostName"
return packet
After running the aforementioned code, the new host name appears to be set correctly, but when I write the packet to a pcap file I still see the previous host in the HTTP fields.
Is there a reliable way to modify http_layer['Host']?
Any help would be appreciated.
Regards.

In the end, I found the answer.
The key is that Scapy first parses the HTTP request and exposes a dict of its fields. Assigning a new value to a parsed field such as Host changes only the parsed copy and does not change the original field value in the raw headers.
So this is the way to modify Host or any other such field:
str_headers = pkt['HTTP']['HTTP Request'].fields['Headers']
str_headers = str_headers.replace('Host: ' + pkt['HTTP']['HTTP Request'].fields['Host'], 'Host: ' + new_val)
pkt['HTTP']['HTTP Request'].fields['Headers'] = str_headers
return pkt
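The same replacement can be tried on a plain header string, which makes it easy to check before touching a packet. This is a sketch (Python 3) that assumes the headers are a single CRLF-separated string like the Headers field above. replace_host is a hypothetical helper, and it uses a regular expression instead of str.replace so that it does not need the old host value.

```python
import re


def replace_host(headers, new_host):
    """Return headers with the value of the Host line swapped for new_host."""
    # (?im): case-insensitive, and ^ matches at the start of every line.
    # [^\r\n]* consumes the old value but leaves the line ending in place.
    pattern = re.compile(r'(?im)^(Host:[ \t]*)[^\r\n]*')
    # A function is used as the replacement so new_host is inserted literally.
    return pattern.sub(lambda m: m.group(1) + new_host, headers, count=1)


sample = 'Host: old.example.com\r\nAccept-Encoding: gzip, deflate\r\n'
updated = replace_host(sample, 'new.example.com')
print(repr(updated))
```

If no Host line is present, the string is returned unchanged.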

Python regular expression

I have this HTTP request and I want to display only the Authorization value (the base64 string). Any help?
The request is stored in a variable called hreq.
I have tried this:
reg = re.search(r"Authorization:\sBasic\s(.*)\r", hreq)
print reg.group()
but it doesn't work.
Here is the request :
HTTP Request:
Path: /dynaform/custom.js
Http-Version: HTTP/1.1
Host: 192.168.1.254
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://domain.com/userRpm/StatusRpm.htm
Authorization: Basic YWhtEWa6MDfGcmVlc3R6bGH
I want to display the value YWhtEWa6MDfGcmVlc3R6bGH
Please I need your help
thanks in advance experts

You can drop the \r at the end of the regex. On Linux the line ending is \n, so a pattern that expects \r instead of \n may fail to match:
>>> reg = re.search(r"Authorization:\sBasic\s(.*)", hreq)
>>> reg.groups()
('YWhtEWa6MDfGcmVlc3R6bGH ',)
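A slightly stricter pattern avoids both the line-ending question and the trailing space in the captured value. This sketch (Python 3) assumes the request text is in a string like hreq; the sample text is trimmed from the question, and \S+ is used on the assumption that the credential itself never contains whitespace.

```python
import re

# Trimmed copy of the request from the question, with a trailing space
# after the credential to mimic the captured output above.
hreq = (
    "Host: 192.168.1.254\r\n"
    "Authorization: Basic YWhtEWa6MDfGcmVlc3R6bGH \r\n"
)

# \S+ stops at the first whitespace, so neither the trailing space
# nor \r / \n ends up in the captured group.
match = re.search(r"Authorization:\s*Basic\s+(\S+)", hreq)
token = match.group(1) if match else None
print(token)
```

Using group(1) rather than group() returns only the captured value instead of the whole matched line.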

The \r is probably throwing things off; there probably isn't a carriage return at the end of the request (but it's hard to say from this end). Try removing it or using $ (end-of-input) instead.
You can use this online Python regex tester to try your inputs by hand before putting them in your code.

If that is your whole input text, maybe /(\b[ ].*)$/ could help:
[ ] matches a space character, followed by
.* any characters (except newline), followed by
$ the end of the string

Thanks for your answers :) I actually got an error message,
but anyway, I'm going to show exactly what I want to do.
Here is the script:
#!/usr/bin/python
from scapy.all import *
from scapy.error import Scapy_Exception
from scapy import HTTP
my_iface = "wlan0"
count = 0
def pktTCP(pkt):
    global count
    count = count + 1
    # Test each layer separately; "A or B in pkt" is always true because A alone is truthy.
    if HTTP.HTTPRequest in pkt or HTTP.HTTPResponse in pkt:
        src = pkt[IP].src
        srcport = pkt[IP].sport
        dst = pkt[IP].dst
        dstport = pkt[IP].dport
        test = pkt[TCP].payload
        if HTTP.HTTPRequest in pkt:
            print "HTTP Request:"
            print test
            print "======================================================================"
        if HTTP.HTTPResponse in pkt:
            print "HTTP Response:"
            print test
            print "======================================================================"
sniff(filter='tcp and port 80', iface=my_iface, prn=pktTCP)

How to parse URLs using urlparse and split() in python?

Could someone explain the purpose of the line host = parsed.netloc.split('@')[-1].split(':')[0] in the following code? I understand that we are trying to get the host name from netloc, but I don't understand why we split on the @ delimiter and then again on the : delimiter.
import urlparse
parsed = urlparse.urlparse('https://www.google.co.uk/search?client=ubuntu&channel=fs')
print parsed
host = parsed.netloc.split('@')[-1].split(':')[0]
print host
Result:
ParseResult(scheme='https', netloc='www.google.co.uk', path='/search', params='', query='client=ubuntu&channel=fs', fragment='')
www.google.co.uk
Surely, if one just needs the domain, we can get it from parsed.netloc directly.

Netloc in its full form can include HTTP authentication credentials and a port number:
login:password@www.google.co.uk:80
See RFC 1808 and RFC 1738.
So we potentially have to split that into ["login:password", "www.google.co.uk:80"], take the last part, split that into ["www.google.co.uk", "80"] and take the first part, which is the hostname.
If the credentials or the port are omitted, splitting on a delimiter that isn't there does no harm: the result is a one-element list, so there is no need to check for them first.
urlparse documentation
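In Python 3 the parsed result exposes these pieces directly, so the manual splitting can be avoided. A short sketch using urllib.parse.urlsplit follows; the URL with credentials and a port is a made-up example, and the last line repeats the manual approach with @ as the credentials delimiter for comparison.

```python
from urllib.parse import urlsplit

# Made-up URL carrying credentials and an explicit port.
parts = urlsplit('https://login:password@www.google.co.uk:80/search?client=ubuntu')

print(parts.netloc)    # full network location, credentials and port included
print(parts.hostname)  # host only
print(parts.port)      # port as an int, or None when absent

# The manual approach gives the same host.
manual_host = parts.netloc.split('@')[-1].split(':')[0]
```

The manual split would not cope with a bracketed IPv6 host such as [::1]:80, which is one reason to prefer the hostname attribute where it is available.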

Python IMAP Search from or to designated email address

I am using this with Gmail's SMTP server, and I would like to search via IMAP for emails either sent to or received from an address.
This is what I have:
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('user', 'pass')
mail.list()
mail.select("[Gmail]/All Mail")
status, email_ids = mail.search(None, 'TO "tech163@fusionswift.com" OR FROM "tech163@fusionswift.com"')
The last line of the error is: imaplib.error: SEARCH command error: BAD ['Could not parse command']
I'm not sure how to write that kind of OR statement with Python's imaplib. If someone can explain what's wrong or point me in the right direction, it'd be greatly appreciated.

The error is generated by the server because it can't parse the search query. To build a valid query, follow RFC 3501, which explains the structure in detail on page 49. In IMAP, OR is a prefix operator that takes two search keys, so it goes before both of them rather than between them.
For example, the correct search string would be:
'(OR (TO "tech163@fusionswift.com") (FROM "tech163@fusionswift.com"))'
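Building the criterion with a small helper keeps the parentheses and quoting in one place. The sketch below (Python 3) only constructs the string. build_or_query is a hypothetical helper, the address is a placeholder, and the commented-out line assumes an already logged-in imaplib.IMAP4_SSL object named mail, as in the question.

```python
def build_or_query(address):
    """Return an IMAP SEARCH criterion matching mail to OR from address."""
    # Assumption: the address contains no double quotes or backslashes,
    # so it can be placed inside an IMAP quoted string unchanged.
    return '(OR (TO "{0}") (FROM "{0}"))'.format(address)


query = build_or_query('user@example.com')
print(query)

# With a logged-in connection the query would be used like this:
# status, email_ids = mail.search(None, query)
```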

Try the IMAP query builder from https://github.com/ikvk/imap_tools:
import datetime
from imap_tools import A, AND, OR, NOT
# AND
A(text='hello', new=True) # '(TEXT "hello" NEW)'
# OR
OR(text='hello', date=datetime.date(2000, 3, 15)) # '(OR TEXT "hello" ON 15-Mar-2000)'
# NOT
NOT(text='hello', new=True) # 'NOT (TEXT "hello" NEW)'
# complex
A(OR(from_='from@ya.ru', text='"the text"'), NOT(OR(A(answered=False), A(new=True))), to='to@ya.ru')
You can of course use the library's other tools as well.
