I finally have to throw in the towel after working with this for quite some time today. I am trying to retrieve all the IP addresses from output that looks like this:
My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:
I need to pull all the IP addresses between 'Explicit Route' and 'Record Route'. I am using textfsm and I can't seem to get everything I need.
Use regex and string operations:
import re
s = '''My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:'''
ips = re.findall(r'\d+\.\d+\.\d+\.\d+', s[s.find('Explicit Route'):s.find('Record Route')])
import re

with open('file.txt', 'r') as file:
    lines = file.read().splitlines()

for line in lines:
    found = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)
    for ip in found:
        print(ip)
Edit:
We open the file and read it line by line, then for each line use a regular expression to find the IPs (each octet can have 1-3 digits followed by a dot, repeated four times).
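Note that scanning every line this way also picks up 10.10.10.1 from the "My Address" line. Combining this with the slicing idea from the previous answer keeps only the Explicit Route block; a minimal sketch using the sample output from the question:

```python
import re

output = '''My Address: 10.10.10.1
Explicit Route: 192.168.238.90 192.168.252.209 192.168.252.241 192.168.192.209
192.168.192.223
Record Route:'''

# Slice out just the text between the two markers, then collect the IPs
section = output[output.find('Explicit Route'):output.find('Record Route')]
ips = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', section)
print(ips)
```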
New to Python and would like to use it with regex to work with a list of 5k+ email addresses. I need to encapsulate each address in quotes. I am using \b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,}\b to identify each email address. How would I replace the current entry of user#email.com with "user#email.com", adding quotes around each of the 5k email addresses?
You can use re.sub with a back-reference like this:
>>> a = "this is email: someone#mail.com and this one is another email foo#bar.com"
>>> re.sub(r'([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', a)
'this is email: "someone#mail.com" and this one is another email "foo#bar.com"'
UPDATE: If you have a file and want to replace the emails in each line of it, you can use readlines() like this:
import re

with open("email.txt", "r") as file:
    lines = file.readlines()

new_lines = []
for line in lines:
    new_lines.append(re.sub(r'([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,})', r'"\1"', line))

with open("email-new.txt", "w") as file:
    file.writelines(new_lines)
email.txt:
this is test#something.com and another email here foo#bar.com
another email abc#bcd.com
still remaining someone#something.com
email-new.txt (after running the code):
this is "test#something.com" and another email here "foo#bar.com"
another email "abc#bcd.com"
still remaining "someone#something.com"
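Since the substitution never spans a line break, the same thing can also be done in one pass over the whole file contents instead of line by line. A minimal sketch (the quote_emails helper name is mine, using the # separator from the question's data):

```python
import re

EMAIL = re.compile(r'([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,})')

def quote_emails(text):
    """Wrap every matched address in double quotes."""
    return EMAIL.sub(r'"\1"', text)

print(quote_emails("another email abc#bcd.com"))
```

To process a file, read it once and write the transformed text back out: `open("email-new.txt", "w").write(quote_emails(open("email.txt").read()))`.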
I am trying to manage a hosts file with a Python script. I am new to Python and I am having a hard time figuring out how to replace a line if I find a match. For example, if the address for a website gets changed in the hosts file, I want the script to find it and change it back. Thanks for your help.
import os
import time

# location of the hosts file to read and write to
hosts_path = r"C:\Windows\System32\drivers\etc\hosts"
# the address I want for the sites
redirect = "0.0.0.0"
# the websites that I will set the address for
website_list = ["portal.citidirect.com", "www.bcinet.nc", "secure.banque-tahiti.pf", "www.bancatlan.hn", "www.bancentro.com.ni", "www.davivienda.com.sv", "www.davivienda.cr", "cmo.cibc.com", "www.bi.com.gt", "empresas.banistmo.com", "online.belizebank.com", "online.westernunion.com", "archive.clickatell.com"]

# continuous loop
while True:
    with open(hosts_path, 'r+') as file:
        content = file.read()
        # for each of the websites in the list above, make sure it is
        # in the hosts file with the correct address
        for website in website_list:
            site = redirect + " " + website
            # here is where I have an issue: if the website is in the hosts file
            # but with the wrong address, I want to overwrite the line; instead
            # the program is adding it to the end of the file
            if website in content:
                if site in content:
                    pass
                else:
                    file.write(site)
            else:
                file.write("\n" + site)
    time.sleep(300)
    os.system('ipconfig /flushdns')
You need to read the file into a list, change the relevant entries in the list, then write the list back to the file. What you were doing was just writing to the end of the file; you can't change a file in place like that. You need to record the changes in a list, then write the list out. I ended up having to rewrite a lot of the code; here's the full script. I wasn't sure what os.system('ipconfig /flushdns') was accomplishing, so I removed it. You can easily add it back where you want.
#!/usr/bin/env python3.6
import time

hosts_path = r"C:\Windows\System32\drivers\etc\hosts"
redirect = "0.0.0.0"
website_list = [
    "portal.citidirect.com",
    "www.bcinet.nc",
    "secure.banque-tahiti.pf",
    "www.bancatlan.hn",
    "www.bancentro.com.ni",
    "www.davivienda.com.sv",
    "www.davivienda.cr",
    "cmo.cibc.com",
    "www.bi.com.gt",
    "empresas.banistmo.com",
    "online.belizebank.com",
    "online.westernunion.com",
    "archive.clickatell.com"]

def substring_in_list(the_list, substring):
    for s in the_list:
        if substring in s:
            return True
    return False

def write_websites():
    with open(hosts_path, 'r') as file:
        content = file.readlines()
    for website in website_list:
        site = "{} {}\n".format(redirect, website)
        if not substring_in_list(content, website):
            content.append(site)
        else:
            for i, line in enumerate(content):
                if site in line:
                    pass
                elif website in line:
                    # assigning to the bare loop variable would not change
                    # the list, so assign back through the index
                    content[i] = site
    with open(hosts_path, "w") as file:
        file.writelines(content)

while True:
    write_websites()
    time.sleep(300)
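The inner loop is the part that trips people up: rebinding the loop variable does not modify the list, so the change has to go back through the index. A minimal illustration:

```python
lines = ["0.0.0.0 a.example", "1.2.3.4 b.example"]

# Rebinding the loop variable leaves the list untouched
for line in lines:
    line = "0.0.0.0 b.example"
print(lines)  # unchanged

# Assigning through the index actually updates the list
for i, line in enumerate(lines):
    if "b.example" in line:
        lines[i] = "0.0.0.0 b.example"
print(lines)
```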
So, you're going to assign the same IP address to every site that doesn't appear in your websites list?
The following would replace what's inside your outermost while loop:
# Read in all the lines from the hosts file,
# splitting each into IP address, hostname and aliases (if any),
# and trimming leading and trailing whitespace from
# each of these components.
host_lines = [[component.strip() for component in line.split(None, 2)]
              for line in open(hosts_path).readlines()]

# Process each of the original lines.
for line in host_lines:
    # Is the site in our list?
    if line[1] in website_list:
        # Make sure the address is correct ...
        if line[0] != redirect:
            line[0] = redirect
        # We can remove this from the websites list.
        website_list.remove(line[1])

# Whatever sites are left in website_list don't appear
# in the hosts file. Add lines for these to host_lines.
host_lines.extend([[redirect, site] for site in website_list])

# Write the host_lines back out to the hosts file:
open(hosts_path, 'w').write("\n".join([" ".join(line) for line in host_lines]))
The rightmost join glues the components of each line back together into a single string. The join to the left of it glues all of these strings together with newline characters between them, and writes this entire string to the file.
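The nested joins can be seen in isolation on a couple of made-up entries:

```python
host_lines = [["127.0.0.1", "localhost"], ["0.0.0.0", "portal.citidirect.com"]]

# inner join: components -> one hosts line; outer join: lines -> file body
body = "\n".join(" ".join(line) for line in host_lines)
print(body)
```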
I have to say, this looks like a rather complicated and even dangerous way to make sure your hosts file stays up-to-date and accurate. Wouldn't it be better to just have a cron job scp a known-good hosts file from a trusted host every five minutes instead?
I ended up mixing some of the responses to create a new file that replaces the current hosts file, using functions as shown below. In addition to this code I am using PyInstaller to turn it into an exe, then I set up that exe to run as an auto-start service.
#!/usr/bin/env python3.6
import os
import shutil
import time
temp_file = r"c:\temp\Web\hosts"
temp_directory = r"c:\temp\Web"
hosts_path = r"C:\Windows\System32\drivers\etc\hosts"
websites = ('''# Copyright (c) 1993-2009 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host
# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
0.0.0.0 portal.citidirect.com
0.0.0.0 www.bcinet.nc
0.0.0.0 secure.banque-tahiti.pf
0.0.0.0 www.bancatlan.hn
0.0.0.0 www.bancentro.com.ni
0.0.0.0 www.davivienda.com.sv
0.0.0.0 www.davivienda.cr
0.0.0.0 cmo.cibc.com
0.0.0.0 www.bi.com.gt
0.0.0.0 empresas.banistmo.com
0.0.0.0 online.belizebank.com
0.0.0.0 online.westernunion.com
0.0.0.0 archive.clickatell.com''')
def write_websites():
    with open(temp_file, 'w+') as file:
        file.write(websites)

while True:
    if not os.path.exists(temp_directory):
        os.makedirs(temp_directory)
    try:
        os.remove(temp_file)
    except OSError:
        pass
    write_websites()
    try:
        os.remove(hosts_path)
    except OSError:
        pass
    try:
        shutil.move(temp_file, hosts_path)
    except OSError:
        pass
    os.system('ipconfig /flushdns')
    time.sleep(300)
Currently I have a text file with multiple IPs. I am attempting to pull only the domain name from the set of information given by nslookup (code below):
import os

with open('test.txt', 'r') as f:
    for line in f:
        print os.system('nslookup' + " " + line)
This works insofar as it pulls all the information for the first IP. I can't get it past the first IP, and I'm currently attempting to trim the information received down to only the domain name of the IP. Is there any way to do that, or do I need to use a different module?
Like IgorN, I wouldn't make a system call to use nslookup; I would also use socket. However, the answer shared by IgorN provides the hostname. The requestor asked for the domain name. See below:
import socket

with open('test.txt', 'r') as f:
    for ip in f:
        # gethostbyaddr returns a tuple in the form of:
        # ('server.example.com', [], ['127.0.0.1'])
        fqdn = socket.gethostbyaddr(ip.strip())
        domain = '.'.join(fqdn[0].split('.')[1:])
        print(domain)
Assuming that test.txt contains the following line, which resolves to an FQDN of server.example.com:
127.0.0.1
this will generate the following output:
example.com
which is what (I believe) the OP desires.
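One caveat with the loop above: socket.gethostbyaddr raises socket.herror for an address with no reverse record, which would stop the whole loop. A sketch that factors the domain extraction into a helper and skips failures (the function names are mine):

```python
import socket

def domain_of(fqdn):
    """Drop the leftmost label: 'server.example.com' -> 'example.com'."""
    parts = fqdn.split('.')
    return '.'.join(parts[1:]) if len(parts) > 1 else fqdn

def domain_for_ip(ip):
    """Reverse-resolve an IP; return its domain, or None if lookup fails."""
    try:
        fqdn, _, _ = socket.gethostbyaddr(ip.strip())
    except (socket.herror, socket.gaierror, OSError):
        return None
    return domain_of(fqdn)

print(domain_of('server.example.com'))  # example.com
```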
import socket

name = socket.gethostbyaddr('127.0.0.1')
print(name)     # to get the triple
print(name[0])  # to just get the hostname
I am writing a script to analyse the countries of a list of domain names (e.g. third.second.first). The data set is pretty old and many of the fully qualified domain names cannot be found via socket.gethostbyname(domain_str) in Python. Here are some of the alternatives I came up with:
Retrieving the IP of second.first if the IP of third.second.first
cannot be found, and then finding the country of that IP.
This does not seem like a good idea, since a DNS A-record can map a subdomain to an IP different from its primary domain.
Detect the country code from the domain name, e.g. if it ends in .jp, it is from Japan.
My questions are:
Is the first method acceptable?
Are there other methods to retrieve the country information of a domain name?
Thank you.
I would recommend using the geolite2 module:
https://pypi.python.org/pypi/maxminddb-geolite2
So you could do something like this:
#!/usr/bin/python
import socket

from geolite2 import geolite2

def origin(ip, domain_str, result):
    print("{0} [{1}]: {2}".format(domain_str.strip(), ip, result))

def getip(domain_str):
    ip = socket.gethostbyname(domain_str.strip())
    reader = geolite2.reader()
    output = reader.get(ip)
    result = output['country']['iso_code']
    origin(ip, domain_str, result)

with open("/path/to/hostnames.txt", "r") as ins:
    for domain_str in ins:
        try:
            getip(domain_str)
        except socket.error:
            print("{0} [could not resolve]".format(domain_str.strip()))
            # fall back to the parent domain, if there is one
            if len(domain_str) > 2:
                subdomain = domain_str.split('.', 1)[1]
                try:
                    getip(subdomain)
                except socket.error:
                    continue

geolite2.close()
Output:
bing.com [204.79.197.200]: US
dd15-028.compuserve.com [could not resolve]
compuserve.com [149.174.98.149]: US
google.com [172.217.11.78]: US
I'm quite new to Python. I'm trying to parse a file of URLs to leave only the domain name.
Some of the URLs in my log file begin with http://, some begin with www., and some begin with both.
This is the part of my code which strips the http:// part. What do I need to add to it to look for both http:// and www. and remove both?
line = re.findall(r'(https?://\S+)', line)
Currently, when I run the code, only http:// is stripped. If I change the code to the following:
line = re.findall(r'(https?://www.\S+)', line)
Only domains starting with both are affected.
I need the code to be more conditional.
TIA
edit... here is my full code...
import re
import sys
from urlparse import urlparse

f = open(sys.argv[1], "r")
for line in f.readlines():
    line = re.findall(r'(https?://\S+)', line)
    if line:
        parsed = urlparse(line[0])
        print parsed.hostname
f.close()
I mistagged my original post as regex; it is indeed using urlparse.
It might be overkill for this specific situation, but I'd generally use urlparse.urlsplit (Python 2) or urllib.parse.urlsplit (Python 3).
from urllib.parse import urlsplit    # Python 3
# from urlparse import urlsplit     # Python 2
import re

url = 'www.python.org'

# URLs must have a scheme:
# www.python.org is an invalid URL,
# http://www.python.org is valid
if not re.match(r'http(s?):', url):
    url = 'http://' + url
# url is now 'http://www.python.org'

parsed = urlsplit(url)
# parsed.scheme is 'http'
# parsed.netloc is 'www.python.org'
# parsed.path is '' (empty), since no path was given

host = parsed.netloc  # www.python.org

# Removing www.
# This is a bad idea, because www.python.org could
# resolve to something different than python.org
if host.startswith('www.'):
    host = host[4:]
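The same steps can be wrapped into a small helper so every URL, with or without a scheme, goes through urlsplit (the hostname_of name is mine):

```python
import re
from urllib.parse import urlsplit  # Python 3

def hostname_of(url):
    # urlsplit only fills netloc when a scheme is present,
    # so prepend one for scheme-less entries like 'www.python.org'
    if not re.match(r'https?:', url):
        url = 'http://' + url
    return urlsplit(url).netloc

print(hostname_of('www.python.org'))             # www.python.org
print(hostname_of('http://foo.com/index.html'))  # foo.com
```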
You can do without regexes here.
with open("file_path", "r") as f:
    lines = f.read()

lines = lines.replace("http://", "")
lines = lines.replace("www.", "")  # may replace some false positives ('www.com')
urls = [url.split('/')[0] for url in lines.split()]
print '\n'.join(urls)
Example file input:
http://foo.com/index.html
http://www.foobar.com
www.bar.com/?q=res
www.foobar.com
Output:
foo.com
foobar.com
bar.com
foobar.com
Edit:
There could be a tricky URL like foobarwww.com, and the above approach would strip its www. as well. In that case we have to revert to regexes: replace the line lines = lines.replace("www.", "") with lines = re.sub(r'(www\.)(?!com)', '', lines). Of course, every possible TLD should be used in the not-match pattern.
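A quick check of that substitution against the tricky case (with the dot escaped so it only matches a literal dot):

```python
import re

lines = "foobarwww.com\nwww.bar.com/?q=res"
# strip 'www.' only when it is not followed by 'com'
cleaned = re.sub(r'www\.(?!com)', '', lines)
print(cleaned)
```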
I came across the same problem. This is a solution based on regular expressions:
>>> import re
>>> rec = re.compile(r"https?://(www\.)?")
>>> rec.sub('', 'https://domain.com/bla/').strip().strip('/')
'domain.com/bla'
>>> rec.sub('', 'https://domain.com/bla/ ').strip().strip('/')
'domain.com/bla'
>>> rec.sub('', 'http://domain.com/bla/ ').strip().strip('/')
'domain.com/bla'
>>> rec.sub('', 'http://www.domain.com/bla/ ').strip().strip('/')
'domain.com/bla'
Check out the urlparse library, which can do these things for you automatically.
>>> urlparse.urlsplit('http://www.google.com.au/q?test')
SplitResult(scheme='http', netloc='www.google.com.au', path='/q', query='test', fragment='')
You can use urlparse. Also, the solution should be generic enough to remove things other than 'www' before the domain name (i.e., handle cases like server1.domain.com). The following is a quick attempt that should work:
from urlparse import urlparse

url = 'http://www.muneeb.org/files/alan_turing_thesis.jpg'
o = urlparse(url)
domain = o.hostname
temp = domain.rsplit('.')
if len(temp) == 3:
    domain = temp[1] + '.' + temp[2]
print domain
I believe @Muneeb Ali is the nearest to the solution, but the problem appears with something like frontdomain.domain.co.uk.
I suppose:
domain = ""
for i in range(1, len(temp) - 1):
    domain += temp[i] + "."
domain += temp[-1]
Is there a nicer way to do this?
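One common approach is to keep a set of known multi-part suffixes and peel off one extra label when the hostname ends in one of them; for anything thorough, the third-party tldextract library does this against the full Public Suffix List. A rough stdlib sketch (the registered_domain helper is mine, and the suffix set is a tiny illustrative sample, not a complete list):

```python
# A tiny sample of multi-part suffixes; the real list (the Public
# Suffix List) has thousands of entries.
TWO_PART_SUFFIXES = {'co.uk', 'com.au', 'co.jp'}

def registered_domain(hostname):
    """Return the registrable domain, e.g. 'domain.co.uk' or 'muneeb.org'."""
    parts = hostname.split('.')
    # keep three labels for suffixes like 'co.uk', otherwise two
    n = 3 if '.'.join(parts[-2:]) in TWO_PART_SUFFIXES else 2
    return '.'.join(parts[-n:])

print(registered_domain('frontdomain.domain.co.uk'))  # domain.co.uk
print(registered_domain('www.muneeb.org'))            # muneeb.org
```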