How do I parse lines from a log file? - python

I need to extract the values from the following output:
Oct 6 17:29:52 FW kernel: [ 5470.058450] ipTables: IN= OUT=eth0 SRC=192.168.1.116 DST=192.168.1.110 LEN=516 TOS=0x10 PREC=0x00 TTL=64 ID=4949 DF PROTO=TCP SPT=22 DPT=46216 WINDOW=446 RES=0x00 ACK PSH URGP=0
For example, I need the value of PROTO stored in a variable. I tried shell scripting, but my approach only works if the fields of the log entry appear in the same order every time.
So this doesn't work:
while read line
do
    in_if=`echo $line | cut -d ' ' -f 10 | cut -d '=' -f 2`;
    out_if=`echo $line | cut -d ' ' -f 11 | cut -d '=' -f 2`;
    src_ip=`echo $line | cut -d ' ' -f 12 | cut -d '=' -f 2`;
    dst_ip=`echo $line | cut -d ' ' -f 13 | cut -d '=' -f 2`;
    pro=`echo $line | cut -d ' ' -f 20 | cut -d '=' -f 2`;
    echo "$in_if,$out_if,$src_ip,$dst_ip,$pro" >> output.csv;
done < $tmp_file

Python does this conveniently. A general solution that gets all the KEY=value pairs is:
import re
import fileinput
pair_re = re.compile(r'([^ ]+)=([^ ]+)')  # Matches a KEY=value pair
for line in fileinput.input():  # Reads from stdin or from the file names given as arguments
    line = line.rstrip()  # Removes trailing whitespace, including the newline
    data = dict(pair_re.findall(line))  # Collects all the KEY=value pairs in a dictionary
    # Example of usage:
    print("PROTO = %s SRC = %s" % (data['PROTO'], data['SRC']))  # Easy access to any value
This is arguably more legible, flexible and convenient than a shell script.
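As a rough Python 3 sketch of the same idea, the loop can also produce the CSV the question asks for. The helper names (parse_line, write_csv, FIELDS) are made up for illustration, and the pattern is deliberately changed to (\w+)=(\S*) on the assumption that empty values such as "IN=" should be kept as empty strings rather than skipped:

```python
import csv
import io
import re

# Hypothetical helper names; the field list mirrors the columns in the question.
PAIR_RE = re.compile(r'(\w+)=(\S*)')  # \S* also accepts an empty value such as "IN="
FIELDS = ['IN', 'OUT', 'SRC', 'DST', 'PROTO']

def parse_line(line):
    """Return a dict of every KEY=value pair found in one log line."""
    return dict(PAIR_RE.findall(line))

def write_csv(lines, out):
    """Write one CSV row per log line, using '' for any missing field."""
    writer = csv.writer(out)
    for line in lines:
        data = parse_line(line)
        writer.writerow([data.get(name, '') for name in FIELDS])

sample = ("Oct 6 17:29:52 FW kernel: [ 5470.058450] ipTables: IN= OUT=eth0 "
          "SRC=192.168.1.116 DST=192.168.1.110 LEN=516 TOS=0x10 PREC=0x00 TTL=64 "
          "ID=4949 DF PROTO=TCP SPT=22 DPT=46216 WINDOW=446 RES=0x00 ACK PSH URGP=0")
buffer = io.StringIO()
write_csv([sample], buffer)
print(buffer.getvalue(), end='')
```

Because the values are looked up by name with data.get, the order of the fields in the log line no longer matters. To write to a real file instead of the in-memory buffer, pass a file opened with newline='' to write_csv.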

You can do this without touching Perl. You were on the right track, but with a regex you can search by name, not position.
Also, you should put quotes around $line so you don't get burned by any pipes or semicolons hanging around.
pro=`echo "$line" | grep -o 'PROTO=\w\+' | cut -d '=' -f 2`;
Of course, if you did want to use Perl, you could make a much slicker solution:
#!/usr/bin/perl
while (<>) {
    /IN=(\S*) .*OUT=(\S*) .*SRC=(\S*) .*DST=(\S*) .*PROTO=(\S*)/
        and print "$1,$2,$3,$4,$5\n";
}
Then call:
./thatScript.pl logFile.txt >>output.csv

You don't even need cut:
grep -Po "(?<=PROTO=)\w+" yourFile
OR
sed -r 's/.*PROTO=(\w+).*/\1/' yourFile
OR
awk -F'PROTO=' '{split($2,a," ");print a[1]}' yourFile
Test:
kent$ echo "Oct 6 17:29:52 FW kernel: [ 5470.058450] ipTables: IN= OUT=eth0 SRC=192.168.1.116 DST=192.168.1.110 LEN=516 TOS=0x10 PREC=0x00 TTL=64 ID=4949 DF PROTO=TCP SPT=22 DPT=46216 WINDOW=446 RES=0x00 ACK PSH URGP=0"|grep -Po "(?<=PROTO=)\w+"
TCP
kent$ echo "Oct 6 17:29:52 FW kernel: [ 5470.058450] ipTables: IN= OUT=eth0 SRC=192.168.1.116 DST=192.168.1.110 LEN=516 TOS=0x10 PREC=0x00 TTL=64 ID=4949 DF PROTO=TCP SPT=22 DPT=46216 WINDOW=446 RES=0x00 ACK PSH URGP=0"|sed -r 's/.*PROTO=(\w+).*/\1/'
TCP
kent$ echo "Oct 6 17:29:52 FW kernel: [ 5470.058450] ipTables: IN= OUT=eth0 SRC=192.168.1.116 DST=192.168.1.110 LEN=516 TOS=0x10 PREC=0x00 TTL=64 ID=4949 DF PROTO=TCP SPT=22 DPT=46216 WINDOW=446 RES=0x00 ACK PSH URGP=0"|awk -F'PROTO=' '{split($2,a," ");print a[1]}'
TCP

A straightforward Perl solution might be the most readable one:
#!/usr/bin/env perl
use strict; use warnings;
my $s = q{Oct 6 17:29:52 FW kernel: [ 5470.058450] ipTables: IN= OUT=eth0
SRC=192.168.1.116 DST=192.168.1.110 LEN=516 TOS=0x10 PREC=0x00 TTL=64
ID=4949 DF PROTO=TCP SPT=22 DPT=46216 WINDOW=446 RES=0x00 ACK PSH URGP=0};
while ($s =~ /(?<k> [A-Z]+) = (?<v> \S*)/xg) {
    print "'$+{k}' = '$+{v}'\n";
}
C:\Temp> z
'IN' = ''
'OUT' = 'eth0'
'SRC' = '192.168.1.116'
'DST' = '192.168.1.110'
'LEN' = '516'
'TOS' = '0x10'
'PREC' = '0x00'
'TTL' = '64'
'ID' = '4949'
'PROTO' = 'TCP'
'SPT' = '22'
'DPT' = '46216'
'WINDOW' = '446'
'RES' = '0x00'
'URGP' = '0'
You can also assign the information in the log line to a hash:
my %entry = ($s =~ /(?<k> [A-Z]+) = (?<v> \S*)/xg);

In Perl, this should do it:
# assume the $a variable holds a line from the log file
my $a = <<log file>>;
my $desired_answer;
# regex
if ($a =~ m/PROTO=(.*?) /i) {
    $desired_answer = $1;
}

Thanks for all the responses!
I chose the shell-scripting route, using egrep and regular expressions:
in_if=`echo "$line" | egrep -Eo 'IN=eth[0-9]*\b' | cut -d '=' -f 2`;
out_if=`echo "$line" | egrep -Eo 'OUT=eth[0-9]*\b' | cut -d '=' -f 2`;
src_ip=`echo "$line" | egrep -Eo 'SRC=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | cut -d '=' -f 2`;
dst_ip=`echo "$line" | egrep -Eo 'DST=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | cut -d '=' -f 2`;
pro=`echo "$line" | grep -o 'PROTO=[A-Z]*\b' | cut -d '=' -f 2`;

Related

Python store value into list and run a grep with if/else statement?

I'm writing code that needs to check a list of IP prefixes held in the npat variable. For each prefix, the loop should run two tasks: (1) a grep and (2) a whois lookup. Each task has two possible outcomes, match or no match, and both results should end up in lists.
Questions:
How do I store the if/else result of each grep/whois check in a list?
What pattern should I use to match "route:" followed by spaces in the whois output? My regex already matches the address, but I'm having trouble matching the word "route:" and the spaces after it.
Some output:
npat list = ['6.120.0.0/18', '6.120.0.0/17', '13.44.61.0/24', '13.44.62.0/24']
Whois possible output:
1.
RADB: % No entries found for the selected source(s).
RADB: route: 6.120.0.0/18
descr: name.com
origin: AS1111
notify: network#email.com
source: RADB
Here's the code:
import re, base64, os, sys
#SAMPLE STRING
teststr = """router#sh ip bgp
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, x best-external
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 6.120.0.0/18 2.2.2.11 0 3111 2000 2485 43754 i
*> 6.120.0.0/17 2.2.2.11 0 3111 2000 2485 43754 i
*> 13.44.61.0/24 2.2.2.11 0 3111 4559 i
*> 13.44.62.0/24 2.2.2.11 0 3111 4559 i"""
##print (teststr,"\n")
#SEARCH NETWORK ENTRY (Working)
npat = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})/\d+',teststr)
print ("List: \n",npat)
for ips in npat:
    ipnet = ips.strip()
    print ("Processing ..... ", ipnet)
    fgen = "grep " +ipnet+ " /mnt/hgfs/IRR/fgen.txt"
    f2pat = re.findall(ipnet,fgen)
    print ("\nCommand: ",fgen)
    os.system(fgen)
    print ("\n NEW NPATH: ",f2pat)
    if ipnet in f2pat:
        flist = "Grep Found"
        print ("Result ", flist)
    else:
        flist = "Grep Not found"
        print ("Result: ",flist)
    f = os.popen('whois -h whois.radb.net ' + ipnet)
    who = f.read()
    radbpat = re.findall(ipnet,who)
    print ("\nRADB: ", who)
    radbpat = re.findall(r'(?<=route: )(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})/\d+',who)
    print ("Radb :",radbpat)
    if ipnet in radbpat:
        rlist = "Found in RADB"
        print ("Result ", rlist)
    else:
        rlist = "Not found in RADB"
        print ("Result: ",rlist)
## OUTPUT
titles = ['RS-SET', 'GREP', 'RADB']
data = [titles] + list(zip(npat, flist, rlist))
for i, d in enumerate(data):
    line = '|'.join(str(x).ljust(15) for x in d)
    print(line)
    if i == 0:
        print('-' * len(line))
My goal is to loop over every IP prefix in npat and show the results of tasks 1 and 2 for each one.
I have created a table, and my target output should look like this:
RS-SET |Grep |RADB
--------------------------------------------
xx.xx.xx.0/yy |not found |Found
My current output looks like this:
RS-SET |GREP |RADB
-----------------------------------------------
27.54.41.0/24 |G |N
223.253.0.0/20 |r |o
27.54.41.0/24 |e |t
27.54.42.0/24 |p |
27.54.43.0/24 | |f
The grep and RADB results have been spread vertically, one character per row, because my flist and rlist each hold only a single string.
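One possible way to get one table row per prefix is to append each result to a list inside the loop instead of overwriting a single string. The sketch below is only an illustration: build_rows, format_table, grep_hit and whois_text are hypothetical names, and the two callables stand in for the real grep and whois calls so the example runs on its own. For the "route:" question it assumes a capture group after route:\s+ is acceptable, since Python's re module does not allow a variable-width look-behind.

```python
import re

# Capture the prefix that follows "route:" and any amount of whitespace.
# A capture group is used because re rejects variable-width look-behind.
ROUTE_RE = re.compile(r'^route:\s+(\d{1,3}(?:\.\d{1,3}){3}/\d+)', re.MULTILINE)

def build_rows(prefixes, grep_hit, whois_text):
    """Return one (prefix, grep result, RADB result) tuple per prefix.

    grep_hit and whois_text are hypothetical callables standing in for the
    grep and whois calls in the question.
    """
    rows = []
    for prefix in prefixes:
        grep_result = "Found" if grep_hit(prefix) else "Not found"
        routes = ROUTE_RE.findall(whois_text(prefix))
        radb_result = "Found" if prefix in routes else "Not found"
        rows.append((prefix, grep_result, radb_result))
    return rows

def format_table(rows, titles=('RS-SET', 'GREP', 'RADB')):
    """Lay the rows out in the pipe-separated table used in the question."""
    lines = []
    for i, row in enumerate([titles] + list(rows)):
        line = '|'.join(str(cell).ljust(15) for cell in row)
        lines.append(line)
        if i == 0:
            lines.append('-' * len(line))
    return '\n'.join(lines)

# Stand-in data so the sketch runs without grep or whois.
known = {'6.120.0.0/18'}
fake_whois = {'6.120.0.0/18': "route:      6.120.0.0/18\ndescr:      name.com\n"}
no_entry = "%  No entries found for the selected source(s).\n"
rows = build_rows(['6.120.0.0/18', '13.44.61.0/24'],
                  grep_hit=lambda p: p in known,
                  whois_text=lambda p: fake_whois.get(p, no_entry))
print(format_table(rows))
```

Since every row is a complete tuple, zip is no longer needed, and nothing gets split into single characters.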

How could I read a Markdown list into a Python OrderedDict?

I have Markdown lists of the following form:
- launchers
    - say hello
        - command: echo "hello" | festival --tts
        - icon: shebang.svg
    - say world
        - command: echo "world" | festival --tts
        - icon: shebang.svg
    - say date
        - command: date | festival --tts
I have a function that can convert this Markdown list to a dictionary, like the following:
{'say world': {'command': 'echo "world" | festival --tts', 'icon': 'shebang.svg'}, 'say hello': {'command': 'echo "hello" | festival --tts', 'icon': 'shebang.svg'}, 'say date': {'command': 'date | festival --tts'}}
When I do this, obviously the ordering is lost. What would be an appropriate way to keep this ordering? Would a plain list be good? Would an OrderedDict be better? How should it be done?
What I have so far is shown below as a minimal working example:
import re
def Markdown_list_to_dictionary(Markdown_list):
    line = re.compile(r"( *)- ([^:\n]+)(?:: ([^\n]*))?\n?")
    depth = 0
    stack = [{}]
    for indent, name, value in line.findall(Markdown_list):
        indent = len(indent)
        if indent > depth:
            assert not stack[-1], "unexpected indent"
        elif indent < depth:
            stack.pop()
        stack[-1][name] = value or {}
        if not value:
            # new branch
            stack.append(stack[-1][name])
        depth = indent
    return(stack[0])
Markdown_list =\
"""
- launchers
    - say hello
        - command: echo "hello" | festival --tts
        - icon: shebang.svg
    - say world
        - command: echo "world" | festival --tts
        - icon: shebang.svg
    - say date
        - command: date | festival --tts
"""
print(Markdown_list_to_dictionary(Markdown_list))
Yes, an OrderedDict looks like it should work in this circumstance. Your code would then look something like this:
import re
from collections import OrderedDict as _OrderedDict
def Markdown_list_to_dictionary(Markdown_list):
    line = re.compile(r"( *)- ([^:\n]+)(?:: ([^\n]*))?\n?")
    depth = 0
    stack = [_OrderedDict()]
    for indent, name, value in line.findall(Markdown_list):
        indent = len(indent)
        if indent > depth:
            assert not stack[-1], "unexpected indent"
        elif indent < depth:
            stack.pop()
        stack[-1][name] = value or _OrderedDict()
        if not value:
            # new branch
            stack.append(stack[-1][name])
        depth = indent
    return(stack[0])
Markdown_list =\
"""
- launchers
    - say hello
        - command: echo "hello" | festival --tts
        - icon: shebang.svg
    - say world
        - command: echo "world" | festival --tts
        - icon: shebang.svg
    - say date
        - command: date | festival --tts
"""
print(Markdown_list_to_dictionary(Markdown_list))
And the output like this:
OrderedDict([('launchers', OrderedDict([('say hello', OrderedDict([('command', 'echo "hello" | festival --tts'), ('icon', 'shebang.svg')])), ('say world', OrderedDict([('command', 'echo "world" | festival --tts'), ('icon', 'shebang.svg')])), ('say date', OrderedDict([('command', 'date | festival --tts')]))]))])
It isn't as nice to look at when printed, but it does function correctly.
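If the printed form matters, one option is to dump the result with the standard json module, which walks mappings in iteration order. This is just a sketch with a hand-built stand-in for the parser's return value (the launchers variable is illustrative, not part of the code above). It also shows that on Python 3.7 and later a plain dict keeps insertion order, so OrderedDict is mainly needed on older versions.

```python
import json
from collections import OrderedDict

# Hand-built stand-in for the parser's return value.
launchers = OrderedDict([
    ('launchers', OrderedDict([
        ('say hello', OrderedDict([('command', 'echo "hello" | festival --tts'),
                                   ('icon', 'shebang.svg')])),
        ('say date', OrderedDict([('command', 'date | festival --tts')])),
    ])),
])

# json.dumps emits keys in iteration order, so the nesting prints in list order.
pretty = json.dumps(launchers, indent=2)
print(pretty)

# On Python 3.7+ a plain dict also keeps insertion order.
plain = {'b': 1, 'a': 2}
print(list(plain))
```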

using ffmpeg in python, shell and stdout

I want to run the following line in python:
ffmpeg -i test.avi -ss 0 -r 25 -vframes 100 ./out/image-%3d.jpg 2>&1 | grep output
which should, if I directly run it in shell, output:
>>Output #0, image2, to './out/image-%3d.jpg':
However, when I do this in python:
command = 'ffmpeg -i '+video_name+' -ss '+str(T) + ' -r '+str(25) + ' -vframes '+str(N)+' '+out_dir+'/image-%3d.jpg 2>&1 | grep output'
argx = shlex.split(command)
print argx
proc = subprocess.Popen(argx,stdout=subprocess.PIPE,shell = True)
(out,err) = proc.communicate()
it outputs this:
['ffmpeg', '-i', 'test.avi', '-ss', '0', '-r', '25', '-vframes', '100', './out/image-%3d.jpg', '2>&1', '|', 'grep', 'output']
ffmpeg version 1.2.6-7:1.2.6-1~trusty1 Copyright (c) 2000-2014 the FFmpeg developers
built on Apr 26 2014 18:52:58 with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
configuration: --arch=amd64 --disable-stripping --enable-avresample --enable-pthreads --enable-runtime-cpudetect --extra-version='7:1.2.6-1~trusty1' --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --enable-bzlib --enable-libdc1394 --enable-libfreetype --enable-frei0r --enable-gnutls --enable-libgsm --enable-libmp3lame --enable-librtmp --enable-libopencv --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-vaapi --enable-vdpau --enable-libvorbis --enable-libvpx --enable-zlib --enable-gpl --enable-postproc --enable-libcdio --enable-x11grab --enable-libx264 --shlibdir=/usr/lib/x86_64-linux-gnu --enable-shared --disable-static
libavutil 52. 18.100 / 52. 18.100
libavcodec 54. 92.100 / 54. 92.100
libavformat 54. 63.104 / 54. 63.104
libavdevice 53. 5.103 / 53. 5.103
libavfilter 3. 42.103 / 3. 42.103
libswscale 2. 2.100 / 2. 2.100
libswresample 0. 17.102 / 0. 17.102
libpostproc 52. 2.100 / 52. 2.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
Use -h to get full help or, even better, run 'man ffmpeg'
Apparently ffmpeg didn't get the proper arguments.
What is wrong? Thanks.
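When shell=True, you should pass the command as a string, not as a list of arguments. With a list and shell=True on POSIX, only the first item ('ffmpeg') is run as the command and the remaining items become arguments to the shell itself, which is why ffmpeg printed its usage message: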
command = 'ffmpeg -i '+video_name+' -ss '+str(T) + ' -r '+str(25) + ' -vframes '+str(N)+' '+out_dir+'/image-%3d.jpg 2>&1 | grep output'
proc = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
out, err = proc.communicate()
Note that using shell=True is a security risk if command depends on user input.
If you wish to use shell=False, then you'll need to replace the shell pipeline with two subprocess.Popen calls, with proc1.stdout connected to proc2.stdin:
import subprocess
PIPE = subprocess.PIPE
filename = out_dir+'/image-%3d.jpg'
# Every argument must be a string, so convert the numbers explicitly.
args = ['ffmpeg', '-i', video_name, '-ss', str(T), '-r', '25', '-vframes', str(N), filename]
# stderr=subprocess.STDOUT plays the role of 2>&1, since ffmpeg writes its log to stderr.
proc1 = subprocess.Popen(args, stdout=PIPE, stderr=subprocess.STDOUT, shell=False)
proc2 = subprocess.Popen(['grep', 'output'], stdin=proc1.stdout, stdout=PIPE, stderr=PIPE)
proc1.stdout.close() # Allow proc1 to receive a SIGPIPE if proc2 exits.
out, err = proc2.communicate()
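Another option, sketched here under the assumption that the filtering does not have to be done by grep, is to run a single process and pick out the matching lines in Python. The function name matching_lines is made up for this example, and a small Python child process stands in for ffmpeg so the snippet runs anywhere:

```python
import subprocess
import sys

def matching_lines(args, needle):
    """Run args without a shell and return the output lines containing needle.

    stderr is merged into stdout, which mirrors the 2>&1 in the shell pipeline.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    out, _ = proc.communicate()
    return [line for line in out.splitlines() if needle in line]

# Stand-in child process so the sketch runs without ffmpeg installed:
# it writes two lines to stderr, as ffmpeg does with its log.
child = [sys.executable, '-c',
         "import sys; sys.stderr.write('Input #0, avi\\nOutput #0, image2\\n')"]
print(matching_lines(child, 'Output'))
```

With the real command, the args list from the answer above would be passed in place of child.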

EOL whilst scanning string literal - Python

I'm new to Python. I'm trying to write code that prints this ASCII art traffic light. Here is the actual ASCII art:
##
_[]_
[____]
.----' '----.
.===| .==. |===.
\ | /####\ | /
/ | \####/ | \
'===| `""` |==='
.===| .==. |===.
\ | /::::\ | /
/ | \::::/ | \
'===| `""` |==='
.===| .==. |===.
\ | /&&&&\ | /
/ | \&&&&/ | \
'===| `""` |==='
jgs '--.______.--'
And the code I'm trying to use is this:
print ("##"),
print (" _[]_"),
print (".----' '----."),
print (" .===| .==. |===."),
print (" \ | /####\ | /"),
print (" / | \####/ | \\"),
print ("'===| `""` |==='"),
print (" .===| .==. |===."),
print ("\ | /::::\ | /"),
print (" / | \::::/ | \"),
print ("'===| `""` |==='"),
print (".===| .==. |===."),
print (" \ | /&&&&\ | /"),
print (" / | \&&&&/ | \"),
print (" '===| `""` |==='"),
print ("'--.______.--'")
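You need to escape the \ characters by doubling them. A single backslash right before the closing quote escapes that quote, so the string never terminates, which is what the EOL error is reporting: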
print (" / | \::::/ | \"),
should be:
print(" / | \\::::/ | \\")
You'll want to get rid of all the trailing commas too.
Note that you can create a multiline string using triple quotes; make it a raw string (using r'') and you don't have to escape anything either:
print(r''' _[]_
[____]
.----' '----.
.===| .==. |===.
\ | /####\ | /
/ | \####/ | \
'===| `""` |==='
.===| .==. |===.
\ | /::::\ | /
/ | \::::/ | \
'===| `""` |==='
.===| .==. |===.
\ | /&&&&\ | /
/ | \&&&&/ | \
'===| `""` |==='
jgs '--.______.--'
''')
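As a small illustration of the two spellings, the snippet below builds the same line once with doubled backslashes and once with a raw string. One caveat it works around: a raw string literal cannot end in a single backslash, so the final one is appended from a normal literal. The variable names are only for the example.

```python
# Doubling each backslash in a normal string literal.
escaped = " / | \\::::/ | \\"

# A raw string needs no doubling, but it cannot end in a single backslash,
# so the final one is appended from a normal literal here.
raw = r" / | \::::/ | " + "\\"

print(escaped)
print(escaped == raw)
```

In the triple-quoted raw string above this is not an issue, because every line ending in a backslash is followed by a newline rather than by the closing quotes.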

custom sorting for find command output

I'm trying to get a sorted directory/file list with the Unix "find" command.
# find . -type f
.
./bin
./data
./data/disks
./inc
./inc/calls
./inc/calls/show
./inc/calls/show/system
./inc/calls/show/cli
./inc/calls/show/network
./inc/calls/show/stats
./inc/calls/services
./inc/calls/services/ntp
./inc/calls/services/tsa
./inc/calls/services/webgui
./inc/calls/services/engine
./inc/calls/system
./inc/calls/change
./inc/calls/change/password
./inc/calls/change/network
./inc/calls/disk
./inc/calls/disk/encr
./inc/etc
I want to sort it like this:
./inc/calls/show/system \
./inc/calls/show/cli \
./inc/calls/show/network \
./inc/calls/show/stats \
./inc/calls/services/ntp \
./inc/calls/services/tsa \
./inc/calls/services/webgui \
./inc/calls/services/engine \
./inc/calls/change/password \
./inc/calls/change/network \
./inc/calls/disk/encr \
./inc/calls/system \
./inc/calls/change \
./inc/calls/services \
./inc/calls/disk \
./inc/calls/show \
./inc/calls \
./data/disks \
./inc/etc \
./bin \
./data \
./inc
Whichever node (directory or file) has more children should come first. I want to do it with bash or Python. What is the best way to do that?
Match lines containing /, prepend the number of fields to each line using / as the separator, sort on the number of fields, and then remove the count:
$ awk -F/ '/\//{print NF,$0}' file | sort -nrk1 | cut -d' ' -f2-
./inc/calls/show/system
./inc/calls/show/stats
./inc/calls/show/network
./inc/calls/show/cli
./inc/calls/services/webgui
./inc/calls/services/tsa
./inc/calls/services/ntp
./inc/calls/services/engine
./inc/calls/disk/encr
./inc/calls/change/password
./inc/calls/change/network
./inc/calls/system
./inc/calls/show
./inc/calls/services
./inc/calls/disk
./inc/calls/change
./inc/etc
./inc/calls
./data/disks
./inc
./data
./bin
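Since the question also allows Python, the same depth-based ordering can be sketched with sorted(). The function name sort_by_depth is made up for this example, and it assumes, like the awk version, that depth is simply the number of / separators in the path:

```python
def sort_by_depth(paths):
    """Return paths with the most deeply nested first.

    sorted() is stable, so paths of equal depth keep their original order.
    """
    return sorted(paths, key=lambda p: p.count('/'), reverse=True)

paths = ['./bin', './inc', './inc/calls', './inc/calls/show',
         './inc/calls/show/system', './inc/calls/show/cli', './inc/etc']
for path in sort_by_depth(paths):
    print(path)
```

Unlike the sort -nrk1 pipeline, this keeps find's original order among entries of the same depth, which is closer to the ordering shown in the question.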
I would use Python and try to convert:
a/b
a/c
b/e/f
b/e/g
into something like:
{'a': {'b': {}, 'c': {}},
'b': {'e': {'f': {}, 'g': {}}},
}
To achieve this:
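def add_list_to_dict(lst, d):
    # Insert the first path component, then recurse into it with the rest.
    key, lst = lst[0], lst[1:]
    if key not in d:
        d[key] = {}
    if lst:
        add_list_to_dict(lst, d[key])
paths = ['a/b', 'a/c', 'b/e/f', 'b/e/g']  # the example paths from above
d = {}
for path in paths:
    add_list_to_dict(path.split('/'), d)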
