Extract data from a field in a text file in Python

I am new to Python. What is the best way to extract data from a field in a text file?
My text file saves the information of a network. It looks like this:
Name: Machine_1 Status: On IP:10.0.0.1
Name: Machine_2 Status: On IP:10.0.0.2
Network_name: Private Router_name: router1 Router_ID=3568
Subnet: Tenant A
The file is not very structured. It cannot even be expressed as a CSV file because the rows are not homogeneous, i.e. not all of them have the same column identifiers.
What I want to do is to be able to get the value of any field I want e.g. Router_ID.
Please help me find a solution to this.
Thanks.

You could use regular expressions to scan through your file. You'd have to define a regular expression for each field you want to extract. For example:
import re
data = """Name: Machine_1 Status: On IP:10.0.0.1
Name: Machine_2 Status: On IP:10.0.0.2
Network_name: Private Router_name: router1 Router_ID=3568
Subnet: Tenant A"""
for line in data.split('\n'):
    # raw strings keep the backslashes; dots are escaped so they only match a literal '.'
    ip = re.match(r'.*IP:(\d+\.\d+\.\d+\.\d+)', line)
    rname = re.match(r'.*Router_name: (\w+)', line)
    if ip:
        print(ip.group(1))
    if rname:
        print(rname.group(1))
Output:
10.0.0.1
10.0.0.2
router1
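The question asks for any field, including Router_ID, which neither of the patterns above covers. An alternative to writing one regex per field is to split every line into all of its name/value pairs. The sketch below assumes that a field name is a run of word characters immediately followed by ':' or '=' (as in "Status: On", "IP:10.0.0.1", "Router_ID=3568") and that a value runs up to the next field name. A value that itself looks like "word:" (for example a time such as 12:30) would be misread. parse_fields and get_field are hypothetical helper names.

```python
import re

# Assumption: a field name is a run of word characters (letters, digits, '_')
# followed by ':' or '=', as in "Status: On", "IP:10.0.0.1", "Router_ID=3568".
KEY_RE = re.compile(r'(\w+)\s*[:=]\s*')

def parse_fields(line):
    """Turn one line into {field: value}; each value runs up to the next field name."""
    matches = list(KEY_RE.finditer(line))
    fields = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        fields[m.group(1)] = line[m.end():end].strip()
    return fields

def get_field(text, name):
    """Return every value of `name` in the text, in the order it appears."""
    return [fields[name] for fields in map(parse_fields, text.splitlines())
            if name in fields]

data = """Name: Machine_1 Status: On IP:10.0.0.1
Name: Machine_2 Status: On IP:10.0.0.2
Network_name: Private Router_name: router1 Router_ID=3568
Subnet: Tenant A"""

print(get_field(data, 'Router_ID'))  # ['3568']
print(get_field(data, 'Name'))       # ['Machine_1', 'Machine_2']
```

Because "Network_name" is matched as one whole word, asking for 'Name' does not pick up the network name. For a real file, pass the file's contents (e.g. open(path).read()) as text.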

Related

Using Regex to find and replace email addresses

I'm new to Python and would like to use it with regex to work with a list of 5k+ email addresses. I need to wrap each address in quotes. I am using \b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}\b to identify each email address. How would I replace the current entry user#email.com with "user#email.com", adding quotes around each of the 5k email addresses?
You can use the re.sub function with a back-reference, like this:
>>> import re
>>> a = "this is email: someone#mail.com and this one is another email foo#bar.com"
>>> re.sub(r'([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', a)
'this is email: "someone#mail.com" and this one is another email "foo#bar.com"'
UPDATE: If you want to replace the emails on every line of a file, you can use readlines() like this:
import re
with open("email.txt", "r") as file:
    lines = file.readlines()
new_lines = []
for line in lines:
    # wrap each matched address in double quotes (\1 = the captured address)
    new_lines.append(re.sub(r'([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', line))
with open("email-new.txt", "w") as file:
    file.writelines(new_lines)
email.txt:
this is test#something.com and another email here foo#bar.com
another email abc#bcd.com
still remaining someone#something.com
email-new.txt (after running the code):
this is "test#something.com" and another email here "foo#bar.com"
another email "abc#bcd.com"
still remaining "someone#something.com"
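A slightly tidier variant of the same approach compiles the pattern once, uses \g<0> (the whole match) in the replacement so no capture group is needed, and streams the file one line at a time instead of holding every line in a list. The pattern is the answer's, including its '#' separator as in the examples on this page (use '@' for real addresses). Note that the question's uppercase-only [A-Z] pattern would additionally need the re.IGNORECASE flag to match lowercase addresses. quote_emails and quote_file are hypothetical names.

```python
import re

# The answer's pattern, compiled once; like the examples here it expects '#'
# between user and domain (use '@' for real addresses).
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def quote_emails(text):
    """Wrap every address in double quotes (the replacement reuses the whole match)."""
    return EMAIL_RE.sub(r'"\g<0>"', text)

def quote_file(src, dst):
    """Copy src to dst one line at a time, quoting the addresses on each line."""
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            fout.write(quote_emails(line))
```

Calling quote_file("email.txt", "email-new.txt") should produce the same email-new.txt as the answer. Running it a second time on its own output would add a second pair of quotes, so keep the original file.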

Using nslookup to find domain name and only the domain name

I have a text file with multiple IPs, and I am trying to pull only the domain name out of the information that nslookup returns (code below):
with open('test.txt','r') as f:
    for line in f:
        print os.system('nslookup' + " " + line)
This works insofar as it pulls all the information for the first IP, but I can't get it past the first IP. I'm also trying to reduce the information received to only the domain name of the IP. Is there any way to do that, or do I need to use a different module?
Like IgorN, I wouldn't make a system call to use nslookup; I would also use socket. However, the answer shared by IgorN provides the hostname. The requestor asked for the domain name. See below:
import socket
with open('test.txt', 'r') as f:
    for line in f:
        ip = line.strip()  # drop the trailing newline before the lookup
        if not ip:
            continue  # skip blank lines
        fqdn = socket.gethostbyaddr(ip)  # Returns a tuple of the form: ('server.example.com', [], ['127.0.0.1'])
        domain = '.'.join(fqdn[0].split('.')[1:])
        print(domain)
Assuming that test.txt contains the following line, which resolves to a FQDN of server.example.com:
127.0.0.1
this will generate the following output:
example.com
which is what (I believe) the OP desires.
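Both answers assume every address has a reverse (PTR) record. When the reverse lookup fails, socket.gethostbyaddr raises an exception (socket.herror or socket.gaierror, both subclasses of OSError), which would stop the loop at the first unresolvable IP. A minimal sketch that reports those addresses as None and keeps going; domain_from_fqdn, reverse_domain and print_domains are hypothetical names, and "domain" here means everything after the first label, exactly as in the answer above.

```python
import socket

def domain_from_fqdn(fqdn):
    """Drop the first label: 'server.example.com' -> 'example.com'."""
    _host, _dot, domain = fqdn.partition('.')
    return domain  # '' when the name has no dot, e.g. 'localhost'

def reverse_domain(ip):
    """Reverse-resolve one IP; return its domain, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None
    return domain_from_fqdn(hostname)

def print_domains(path):
    """Print each IP in the file next to its domain (or None)."""
    with open(path) as f:
        for line in f:
            ip = line.strip()  # remove the trailing newline
            if ip:             # skip blank lines
                print(ip, reverse_domain(ip))
```

Call it as print_domains('test.txt'). Taking "everything after the first dot" is a simplification; it does not know where registered domains start (e.g. for multi-label suffixes).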
import socket
name = socket.gethostbyaddr('127.0.0.1')
print(name)     # the whole (hostname, aliases, addresses) triple
print(name[0])  # just the hostname

Extract and get value from a text file python

I executed ssh commands on a remote machine using the paramiko library and wrote the output to a text file. Now I want to extract a few values from that file. Its contents look like this:
b'\nMS Administrator\n(C) Copyright 2006-2016 LP\n\n[MODE]> SHOW INFO\n\n\nMode: \nTrusted Certificates\n1 Details\n------------\n\tDeveloper ID: MS-00c1\n\tTester ID: ms-00B1\n\tValid from: 2030-01-29T06:51:15Z\n\tValid until: 2030-01-30T06:51:15Z\n\t
How do I get the values of Developer ID and Tester ID? The file is huge.
As suggested by users I have written the snippet below.
file = open("Output.txt").readlines()
for lines in file:
    word = re.findall('Developer\sID:\s(.*)\n', lines)[0]
    print(word)
I get the error IndexError: list index out of range.
If I remove the index, I get empty output.
developer_id = ""
with open("Output.txt") as f:
    for line in f:
        if 'Developer ID' in line:
            # keep whatever follows the last ':' on that line
            developer_id = line.split(":")[-1].strip()
print(developer_id)
You can use regular expressions:
text = """\nMS Administrator\n(C) Copyright 2006-2016 LP\n\n[MODE]> SHOW INFO\n\n\nMode: \nTrusted Certificates\n1 Details\n------------\n\tDeveloper ID: MS-00c1\n\tTester ID: ms-00B1\n\tValid from: 2030-01-29T06:51:15Z\n\tValid until: 2030-01-30T06:51:15Z\n\t"""
import re
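# re.search scans the whole text (re.match only tries the very start);
# group(1) is the captured value without the "Developer ID:" label
developerID = re.search(r"Developer ID:\s*(.+)", text).group(1)
testerID = re.search(r"Tester ID:\s*(.+)", text).group(1)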
If your output always has the same format, you can use something as simple as str.split():
lines = text.split('\n')  # text holds the whole command output, as above
developer_id = lines[11].split(':', 1)[1].strip()
tester_id = lines[12].split(':', 1)[1].strip()
Again, this assumes that every line is using the same formatting. Otherwise, use regex as suggested above.
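Both symptoms in the question are consistent with the file containing the printed form of a bytes object: it starts with b' and shows \n as two literal characters, so the whole output may sit on one physical line with no real line breaks for the patterns to use. Assuming that is the case, a sketch can turn the text back into real lines with ast.literal_eval and then match each field at the start of a line. If you control the paramiko side, writing stdout.read().decode() instead of the bytes object avoids the problem at the source. load_output, extract_ids and read_ids are hypothetical names, and the file must contain the complete b'...' literal.

```python
import ast
import re

# Matches "Developer ID: <value>" / "Tester ID: <value>" at the start of a line
# (after optional indentation); the value is taken up to the next whitespace.
FIELD_RE = re.compile(r'^[ \t]*(Developer ID|Tester ID):[ \t]*(\S+)', re.MULTILINE)

def load_output(raw):
    """Assumption: the file holds str(bytes), e.g. "b'\\nMS Administrator\\n...'".
    Turn that back into real text; return anything else unchanged."""
    raw = raw.strip()
    if raw.startswith(("b'", 'b"')):
        return ast.literal_eval(raw).decode('utf-8', errors='replace')
    return raw

def extract_ids(text):
    """Return {'Developer ID': ..., 'Tester ID': ...}; the last value wins if a field repeats."""
    return dict(FIELD_RE.findall(text))

def read_ids(path):
    """Read the whole file and extract both IDs."""
    with open(path) as f:
        return extract_ids(load_output(f.read()))
```

read_ids("Output.txt") reads the file in one go; for a very large file, fixing the output at the source as described above is the cheaper option.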

Regex to find consecutive IP Addresses

I finally have to throw in the towel after working on this for quite some time today. I am trying to retrieve all the IP addresses from output that looks like this:
My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:
I need to pull all the IP addresses between 'Explicit Route' and 'Record Route'. I am using textfsm, and I can't seem to get everything I need.
Use regex and string operations:
import re
s = '''My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:'''
ips = re.findall(r'\d+\.\d+\.\d+\.\d+', s[s.find('Explicit Route'):s.find('Record Route')])
import re
with open('file.txt', 'r') as file:
    for line in file:
        # four groups of 1 to 3 digits separated by dots
        found = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)
        for ip in found:
            print(ip)
Edit:
We open the text file and read it line by line, then for each line use a regular expression to find the IPs (1 to 3 digits followed by a dot, repeated four times, with no dot after the last group).
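The line-by-line answer prints every address in the file, including the one after My Address, and \d{1,3} also accepts impossible octets such as 999. A sketch that combines the first answer's slicing between the two labels with validation through the standard ipaddress module; it assumes the section ends at the first "Record Route:" after "Explicit Route:" (or at the end of the text if there is none). explicit_route_ips is a hypothetical name.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not glued to other digits/letters.
CANDIDATE_RE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')

def explicit_route_ips(text):
    """Return the valid IPv4 addresses between 'Explicit Route:' and 'Record Route:'."""
    start = text.find('Explicit Route:')
    if start == -1:
        return []
    end = text.find('Record Route:', start)
    if end == -1:
        end = len(text)  # assumption: no 'Record Route:' means "until the end"
    ips = []
    for candidate in CANDIDATE_RE.findall(text[start:end]):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:  # AddressValueError is a ValueError subclass
            continue
        ips.append(candidate)
    return ips

s = '''My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:'''
print(explicit_route_ips(s))
```

Slicing first means the continuation line (192.168.192.223) is picked up without any special handling, because it still lies before "Record Route:".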

Read from file and store into dict containing dict and list

I've searched all over the web for the answer to my question and I've used aspects of what I've learned to help me get to this point. However, I've been unable to find the solution to get me where I need to be.
In short, I need to create a dictionary of dictionaries of lists from the values read in from a file, and print the output.
I was able to do this using a statically created dictionary, but I seem to be unable to create the dictionary in the same format while reading in from a file.
Here is the code I was able to get working:
routers = {'fre-agg1': {'interface Te0/1/0/0': ["rate-limit input 135", "rate-limit input 136"],
'interface Te0/2/0/0': ["rate-limit input 135", "rate-limit input 136"]},
'fre-agg2': {'interface Te0/3/0/0': ["rate-limit input 135", "rate-limit input 136", "rate-limit input 137"]}}
for rname in routers:
    print rname
    for iname in routers[rname]:
        print iname
        for int_config in routers[rname][iname]:
            print int_config
The output of this prints exactly in the format I need it to be:
fre-agg2
interface Te0/3/0/0
rate-limit input 135
rate-limit input 136
rate-limit input 137
fre-agg1
interface Te0/1/0/0
rate-limit input 135
rate-limit input 136
interface Te0/2/0/0
rate-limit input 135
rate-limit input 136
The file I am trying to read in is in a different format:
ama-coe:interface Loopback0
ama-coe: ip address 10.1.1.1 255.255.255.255
ama-coe:interface GigabitEthernet0/0/0
ama-coe: description EGM to xyz Gi2/0/1
ama-coe: ip address 10.2.1.1 255.255.255.254
ama-coe:interface GigabitEthernet0/0/1
ama-coe: description EGM to abc Gi0/0/1
ama-coe: ip address 10.3.1.1 255.255.255.254
For this file, I'd like the same output format as above, with the interface configuration listed under the interface name, which is listed under the device name:
ama-coe
interface Loopback0
ip address 10.1.1.1 255.255.255.255
interface GigabitEthernet0/0/0
etc etc etc
So far, here is the code I have:
routers = {}
with open('cpe-interfaces-ipaddress.txt') as inputFile:
    inputData = inputFile.read().splitlines()
for rname in inputData:
    device, stuff = rname.split(':')
    if not device in routers:
        routers[device] = None
    elif stuff == "interface":
        routers[device][None] = stuff
I know this code is extremely incomplete but I can't for the life of me figure out the dictionary and list structure as I did when statically creating the dict.
Any help that can be provided would be greatly appreciated.
Thank you.
routers = {}
with open('cpe-interfaces-ipaddress.txt') as inputFile:
    cur_interface = None
    for rname in inputFile:
        if not rname.strip():
            continue  # skip blank lines
        # split only on the first ':' in case the config text itself contains one
        device, stuff = rname.rstrip('\n').split(':', 1)
        if device not in routers:
            routers[device] = {}
        stuff = stuff.strip()
        if stuff.startswith("interface"):
            # keep the whole "interface X" text as the key, matching the desired output
            cur_interface = stuff
            routers[device].setdefault(cur_interface, [])
        else:  # ip address, description, ...
            routers[device][cur_interface].append(stuff)
I'm not sure whether this meets your needs. I assumed that each ip address (or description) line belongs to the interface line before it.
Using a dictionary to organize data like this is a common pattern. It's worth learning built-in methods such as dict.setdefault and list.append.
