Parsing a text file in Python


I am new to Python and I am trying to extract values out of a text file.
Input:
Vlan101, Interface status: protocol-up/link-up/admin-up, iod: 257,
IP address: 1.1.1.1, IP subnet: 1.1.1.0/24
IP broadcast address: 255.255.255.255
Output:
Vlan101,1.1.1.0/24
I have code that runs but does not give me the desired output.
My code:
if 'Vlan' in text:
    vlanArray = text.split(",")
    print(vlanArray[0])
if 'IP subnet' in text:
    ipAddress = text.split(":")
    lenipAdd = len(ipAddress)
    print(ipAddress[lenipAdd-1].strip())
Any help would be appreciated.

It seems you are going a bit too fast. I would suggest first trying an intermediate step:
vlanArray = text.split(",")
for txt in vlanArray:
    print(txt)
This should give you direction about the next steps to take.

You can use a regular expression to extract the information you need:
s = """Vlan101, Interface status: protocol-up/link-up/admin-up, iod: 257,
IP address: 1.1.1.1, IP subnet: 1.1.1.0/24
IP broadcast address: 255.255.255.255"""
import re
m = re.match(r'(\w+).*IP subnet: ([0-9./]+)', s, re.DOTALL)
result = m.groups()
print(result[0], result[-1])
Returns:
Vlan101 1.1.1.0/24

There is no need to split the same text two or more times. Try the following:
1. Split the text and store the pieces in a list.
2. Loop through the list.
3. Check whether each item contains Vlan or IP subnet.
4. If it does, append the value to the output list.
Like below:
vlanArray = text.split(",")
outTxt = []
for subTxt in vlanArray:
    if 'Vlan' in subTxt:
        outTxt.append(subTxt.strip())
    if 'IP subnet' in subTxt:
        # Take the token right after "IP subnet:"; the piece may run on into the next line.
        outTxt.append(subTxt.split(":")[1].split()[0])
outTxt = ','.join(outTxt)
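When the file holds several interface blocks, the same extraction can be sketched with one regular expression. This is only a sketch: it assumes every block starts with a line beginning with Vlan and contains one "IP subnet:" field, and the name parse_vlan_subnets is made up for illustration.

```python
import re

# Assumption: each block starts with "Vlan<number>," at the start of a line and
# has one "IP subnet: a.b.c.d/nn" field before the next block begins.
BLOCK_RE = re.compile(r'^(Vlan\d+),.*?IP subnet:\s*([0-9.]+/\d+)',
                      re.MULTILINE | re.DOTALL)

def parse_vlan_subnets(text):
    """Return a list of 'VlanN,subnet' strings found in text."""
    return [f'{vlan},{subnet}' for vlan, subnet in BLOCK_RE.findall(text)]

sample = """Vlan101, Interface status: protocol-up/link-up/admin-up, iod: 257,
IP address: 1.1.1.1, IP subnet: 1.1.1.0/24
IP broadcast address: 255.255.255.255"""

for row in parse_vlan_subnets(sample):
    print(row)  # Vlan101,1.1.1.0/24
```

The lazy `.*?` stops at the first "IP subnet:" after each Vlan name, so the broadcast address on the following line is never picked up.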

Related

How can I print in an f-string an IP address in the dotted binary notation from the output of inet_pton()?

The following code is supposed to take an IP from the user, convert it to binary and print it to the screen.
#!/usr/bin/env python3
from socket import inet_aton, inet_pton, AF_INET
ip = input("IP?\n")
ip = inet_pton(AF_INET, ip)
print(f"{ip}")
When given 185.254.27.69 it prints
b'\xb9\xfe\x1bE'
Using f"{ip:08b}" does not work, perhaps because of the three dots between the four octets. How can I print the dotted binary form of an IP address on the screen? Are there any useful resources?

Unless I'm missing something, I don't see a reason to use inet_pton here. It converts to packed bytes, when you want a binary representation of the numbers (I assume):
ip = input("IP?\n")
print('.'.join(f'{int(num):08b}' for num in ip.split('.')))
For the input you supplied:
IP?
185.254.27.69
10111001.11111110.00011011.01000101
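If the packed value from inet_pton() is wanted anyway (for example to reject malformed input), it can still be formatted: iterating over a bytes object yields integers, and each integer accepts the 08b format. A minimal sketch, with dotted_binary as a made-up helper name:

```python
from socket import inet_pton, AF_INET

def dotted_binary(text):
    """Return the dotted binary form of an IPv4 address string."""
    # inet_pton() raises OSError for a malformed address.
    packed = inet_pton(AF_INET, text)
    # Iterating over bytes gives one int per octet, which can use the 08b format.
    return '.'.join(f'{octet:08b}' for octet in packed)

print(dotted_binary('185.254.27.69'))  # 10111001.11111110.00011011.01000101
```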

This code prints the IP in binary and keeps the leading zeros:
from socket import inet_pton, AF_INET
ip = input("IP?\n")
inet_pton(AF_INET, ip)  # used only for validation; raises OSError on a malformed address
octets = ip.split(".")
ip3 = ""
for octet in octets:
    zeros = bin(int(octet))[2:].zfill(8)
    if len(ip3) == 0:
        ip3 += zeros
    else:
        ip3 += "." + zeros
print(f"{ip3}")

python3: extract IP address from compiled pattern

I want to process every line in my log file and extract the IP address if the line matches my pattern. There are several different types of messages; in the example below I am using p1 and p2.
I could read the file line by line and match each line against each pattern. But since there can be many more patterns, I would like to do it as efficiently as possible. I was hoping to compile those patterns into one object and do the match only once for each line:
import re
import sys
IP = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
p1 = 'Registration from' + IP + '- Wrong password'
p2 = 'Call from' + IP + 'rejected because extension not found'
c = re.compile(r'(?:' + p1 + '|' + p2 + ')')
for line in sys.stdin:
    match = re.search(c, line)
    if match:
        print(match['ip'])
But the above code does not work; it complains that ip is used twice.
What is the most elegant way to achieve my goal?
EDIT:
I have modified my code based on the answer from @Dev Khadka.
But I am still struggling with how to properly handle the multiple ip matches. The code below prints all IPs that matched p1:
for line in sys.stdin:
    match = c.search(line)
    if match:
        print(match['ip1'])
But some lines don't match p1. They match p2, i.e. I get:
1.2.3.4
None
2.3.4.5
...
How do I print the matching IP when I don't know whether it was p1, p2, ...? All I want is the IP. I don't care which pattern it matched.

You can consider installing the excellent regex module, which supports many advanced regex features, including branch reset groups, designed to solve exactly the problem you outlined in this question. Branch reset groups are denoted by (?|...). Within a branch reset group, capture groups at the same position or with the same name in different alternatives share a single capture group in the output.
Notice that in the example below the matching capture group becomes the named capture group, so that you don't need to iterate over multiple groups searching for a non-empty group:
import sys
import regex
ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
pattern = regex.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))
for line in sys.stdin:
    match = regex.search(pattern, line)
    if match:
        print(match['ip'])
Demo: https://repl.it/#blhsing/RegularEmbellishedBugs

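Why don't you check which group matched? A match object does not support the in operator, so look the names up in match.groupdict(), where groups that did not take part in the match are None:
groups = match.groupdict()
if groups.get('ip1'):
    print(groups['ip1'])
if groups.get('ip2'):
    print(groups['ip2'])
or something like:
names = ['ip1', 'ip2', 'ip3']
for n in names:
    if groups.get(n):
        print(groups[n])
or even
num = 1000 # can easily handle millions of patterns =)
for i in range(num):
    name = 'ip%d' % i
    if groups.get(name):
        print(groups[name])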

That's because you are using the same group name for two groups.
Try this; it gives the groups the names ip1 and ip2:
import re
IP = r'(?P<ip%d>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
p1 = 'Registration from' + IP%1 + '- Wrong password'
p2 = 'Call from' + IP%2 + 'rejected because extension not found'
c = re.compile(r'(?:' + p1 + '|' + p2 + ')')
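With uniquely numbered group names, the match object's lastgroup attribute names the capturing group that matched last, so the IP can be read without knowing which alternative fired. The sketch below assumes each alternative contains exactly one named group and that the log lines have a space on each side of the address; extract_ip is a made-up helper name.

```python
import re

IP = r'(?P<ip%d>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
# Assumption: the log text has spaces around the address.
p1 = 'Registration from ' + IP % 1 + ' - Wrong password'
p2 = 'Call from ' + IP % 2 + ' rejected because extension not found'
c = re.compile('(?:' + p1 + '|' + p2 + ')')

def extract_ip(line):
    """Return the IP captured by whichever alternative matched, or None."""
    match = c.search(line)
    if match is None:
        return None
    # lastgroup is 'ip1' or 'ip2', depending on the alternative that matched.
    return match[match.lastgroup]

print(extract_ip('Call from 2.3.4.5 rejected because extension not found'))  # 2.3.4.5
```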

Named capture groups must have distinct names. Since all of your capture groups are meant to capture the same pattern, it's better not to use named capture groups in this case. Instead, use regular capture groups and iterate through the groups from the match object to print the first group that is not empty:
import re
import sys
ip_pattern = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
pattern = re.compile('|'.join(patterns).format(ip=ip_pattern))
for line in sys.stdin:
    match = re.search(pattern, line)
    if match:
        print(next(filter(None, match.groups())))
Demo: https://repl.it/#blhsing/UnevenCheerfulLight

Adding IP address validation to the already accepted answer.
Although the ipaddress and socket modules would be the ideal tools, this code validates the address by parsing its octets:
import regex as re
from io import StringIO
def valid_ip(address):
    try:
        host_bytes = address.split('.')
        valid = [int(b) for b in host_bytes]
        valid = [b for b in valid if 0 <= b <= 255]
        return len(host_bytes) == 4 and len(valid) == 4
    except ValueError:
        return False
ip_pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
patterns = [
    'Registration from {ip} - Wrong password',
    'Call from {ip} rejected because extension not found'
]
file = StringIO('''
Registration from 259.1.1.1 - Wrong password,
Call from 1.1.2.2 rejected because extension not found
''')
pattern = re.compile('(?|%s)' % '|'.join(patterns).format(ip=ip_pattern))
list1 = []
list2 = []
for line in file:
    match = re.search(pattern, line)
    if match:
        list1.append(match['ip'])  # List of ip address
        list2.append(valid_ip(match['ip']))  # Boolean results of valid_ip
for i in range(len(list1)):
    if not list2[i]:
        print(f'{list1[i]} is invalid IP')
    else:
        print(list1[i])
Output:
259.1.1.1 is invalid IP
1.1.2.2
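For comparison, the check can also be delegated to the standard ipaddress module mentioned above. A minimal sketch that reuses the valid_ip name from this answer:

```python
import ipaddress

def valid_ip(address):
    """Return True if address parses as an IPv4 address."""
    try:
        # IPv4Address raises AddressValueError (a ValueError subclass) on bad input.
        ipaddress.IPv4Address(address)
    except ValueError:
        return False
    return True

for candidate in ('259.1.1.1', '1.1.2.2'):
    print(candidate, valid_ip(candidate))
```

This rejects out-of-range octets such as 259 and also strings with the wrong number of octets.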

Python program to convert wildcard mask to netmask

I need help with writing a python program to achieve this task.
I am trying to convert wildcard mask to netmask.
Input:
192.168.0.1 0.0.0.15
Expected output:
192.168.0.1 255.255.255.240

What have you tried? I think it is just an XOR with 255 on each octet. Please let me know if I'm correct.
My input: 192.168.0.1 0.0.0.15
Expected output: 192.168.0.1 255.255.255.240
def wildcard_to_netmask(line):
    ip, wcmask = line.split()
    netmask = '.'.join(str(255 ^ int(i)) for i in wcmask.split('.'))
    return '{} {}'.format(ip, netmask)

Python 2:
>>> import ipaddress
>>> print ipaddress.ip_network(u'192.168.0.1/0.0.0.15', strict=False).netmask
255.255.255.240
Python 3:
>>> import ipaddress
>>> print(ipaddress.ip_network('192.168.0.1/0.0.0.15', strict=False).netmask)
255.255.255.240
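The inversion can also be done on the whole 32-bit value, which makes it easy to check whether a wildcard is contiguous before treating it as a netmask. A sketch using only the standard ipaddress module; the helper names netmask_from_wildcard and is_contiguous are made up for illustration.

```python
import ipaddress

def netmask_from_wildcard(wildcard):
    """Invert all 32 bits of a wildcard mask and return the dotted result."""
    inverted = int(ipaddress.IPv4Address(wildcard)) ^ 0xFFFFFFFF
    return str(ipaddress.IPv4Address(inverted))

def is_contiguous(wildcard):
    """True if the wildcard is a run of 0 bits followed by a run of 1 bits."""
    value = int(ipaddress.IPv4Address(wildcard))
    # A value of the form 0...01...1 shares no set bits with value + 1.
    return value & (value + 1) == 0

print(netmask_from_wildcard('0.0.0.15'))  # 255.255.255.240
print(is_contiguous('0.0.0.15'))          # True
print(is_contiguous('0.0.3.15'))          # False
```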

Convert wildcard to subnet
from cisco_acl import Address
address = Address("192.168.0.1 0.0.0.15")
subnets = address.subnets()
print(subnets)
# ['192.168.0.0 255.255.255.240']
Convert non-contiguous wildcard to list of subnets
from cisco_acl import Address
address = Address("192.168.0.1 0.0.3.15")
subnets = address.subnets()
print(subnets)
# ['192.168.0.0 255.255.255.240',
# '192.168.1.0 255.255.255.240',
# '192.168.2.0 255.255.255.240',
# '192.168.3.0 255.255.255.240']

IPv4 address substitution in Python script

I'm having trouble getting this to work, and I am hoping for any ideas:
My goal: to take a file, read it line by line, substitute any IP address for a specific substitute, and write the changes to the same file.
I KNOW THIS IS NOT CORRECT SYNTAX
Pseudo-Example:
$ cat foo
10.153.193.0/24 via 10.153.213.1
def swap_ip_inline(line):
    m = re.search('some-regex', line)
    if m:
        for each_ip_it_matched:
            ip2db(original_ip)
            new_line = reconstruct_line_with_new_ip()
            line = new_line
    return line
for l in foo.readlines():
    swap_ip_inline(l)
do some foo to rebuild the file.
I want to take the file 'foo', find each IP in a given line, substitute the ip using the ip2db function, and then output the altered line.
Workflow:
1. Open File
2. Read Lines
3. Swap IP's
4. Save lines (altered/unaltered) into tmp file
5. Overwrite original file with tmp file
*edited to add pseudo-code example

Here you go:
>>> import re
>>> ip_addr_regex = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')
>>> f = open('foo')
>>> for line in f:
...     print(line)
...
10.153.193.0/24 via 10.153.213.1
>>> f.seek(0)
>>> specific_substitute = 'foo'
>>> for line in f:
...     re.sub(ip_addr_regex, specific_substitute, line)
...
...
'foo/24 via foo\n'
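The full workflow from the question (read, substitute, write to a temporary file, overwrite the original) might be sketched as follows. The ip2db function here is a hypothetical stand-in for the asker's own function, and swap_ips_in_file is a made-up name.

```python
import os
import re
import tempfile

IP_RE = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')

def ip2db(ip):
    # Hypothetical stand-in for the question's ip2db(); swap in the real lookup.
    return 'X.X.X.X'

def swap_ips_in_file(path, substitute=ip2db):
    """Rewrite path so every IPv4-looking token is replaced by substitute(ip)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so os.replace() stays on one filesystem.
    with open(path) as src, \
            tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as tmp:
        for line in src:
            # The callable receives each match, so every IP gets its own substitute.
            tmp.write(IP_RE.sub(lambda m: substitute(m.group(0)), line))
    # Both files are closed here; replace the original in one step.
    os.replace(tmp.name, path)
```

Calling swap_ips_in_file('foo') on the sample file would turn its single line into X.X.X.X/24 via X.X.X.X.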

This link gave me the breakthrough I was looking for:
Python - parse IPv4 addresses from string (even when censored)
A simple modification passes initial smoke tests:
def _sub_ip(self, line):
    pattern = r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)([ (\[]?(\.|dot)[ )\]]?(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})"
    # findall returns one tuple per match; element 0 is the whole address
    ips = [each[0] for each in re.findall(pattern, line)]
    for item in ips:
        location = ips.index(item)
        # normalise censored forms such as "1 (dot) 2" back to "1.2"
        ip = re.sub(r"[ ()\[\]]", "", item)
        ip = re.sub("dot", ".", ip)
        ips.remove(item)
        ips.insert(location, ip)
    for ip in ips:
        line = line.replace(ip, self._ip2db(ip))
    return line
I'm sure I'll clean it up down the road, but it's a great start.

How can I generate all possible IPs from a CIDR list in Python?

Let's say I have a text file that contains a bunch of CIDR IP ranges like this:
x.x.x.x/24
x.x.x.x/24
x.x.x.x/23
x.x.x.x/23
x.x.x.x/22
x.x.x.x/22
x.x.x.x/21
and so on...
How can I convert these CIDR notations into a list of all possible IPs in a new text file in Python?

You can use netaddr for this. The code below creates a file on disk and fills it with every IP address in the requested block:
from netaddr import IPNetwork
ip = IPNetwork('10.0.0.0/8')  # a /8 holds 16,777,216 addresses
with open("everyip.txt", "w") as f:
    for addr in ip:
        f.write(str(addr) + '\n')
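The same result is possible with only the standard library. The sketch below assumes one CIDR entry per line in the input file; expand_cidrs and write_all_ips are made-up names, and the file paths are whatever the caller passes in.

```python
import ipaddress

def expand_cidrs(cidr_lines):
    """Yield every address in each CIDR block, one string at a time."""
    for raw in cidr_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        # strict=False accepts entries whose host bits are set, e.g. 10.0.0.5/30.
        network = ipaddress.ip_network(raw, strict=False)
        for addr in network:
            yield str(addr)

def write_all_ips(cidr_path, out_path):
    """Read CIDR blocks from cidr_path and write one IP per line to out_path."""
    with open(cidr_path) as src, open(out_path, 'w') as dst:
        for ip in expand_cidrs(src):
            dst.write(ip + '\n')

print(list(expand_cidrs(['192.0.2.0/30'])))
# ['192.0.2.0', '192.0.2.1', '192.0.2.2', '192.0.2.3']
```

Because expand_cidrs is a generator, large blocks are written out address by address instead of being held in memory.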

If you don't need the satisfaction of writing your script from scratch, you could use the Python cidrize package.

Based on How can I generate all possible IPs from a list of ip ranges in Python?
import struct, socket
def ips(start, end):
    start = struct.unpack('>I', socket.inet_aton(start))[0]
    end = struct.unpack('>I', socket.inet_aton(end))[0]
    # end + 1 so that the last address of the block is included
    return [socket.inet_ntoa(struct.pack('>I', i)) for i in range(start, end + 1)]
# ip/CIDR
ip = '10.123.234.34'  # no leading zeros: inet_aton() can read '012' as octal 10
CIDR = 22  # prefix length
host_bits = 32 - CIDR  # number of bits that vary inside the block
i = struct.unpack('>I', socket.inet_aton(ip))[0] # number
# 175893026
start = (i >> host_bits) << host_bits # shift right and left to zero the host bits
end = i | ((1 << host_bits) - 1) # or with 1s to get the highest address
start = socket.inet_ntoa(struct.pack('>I', start)) # real ip address
end = socket.inet_ntoa(struct.pack('>I', end))
ips(start, end)
