Find all IPs on an HTML Page - python

I want to get an HTML page with python and then print out all the IPs from it.
I will define an IP as the following:
x.x.x.x:y
Where:
x = a number between 0 and 256.
y = a number with < 7 digits.
Thanks.

Right. The only part I can't do is the regular expression one. – das
If someone shows me that, I will be fine. – das
import re
ip = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b")
junk = " 1.1.1.1:123 2.2.2.2:321 312.123.1.12:123 "
print ip.findall(junk)
# outputs ['1.1.1.1:123', '2.2.2.2:321']
Here is a complete example:
import re, urllib2
f = urllib2.urlopen("http://www.samair.ru/proxy/ip-address-01.htm")
junk = f.read()
ip = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b")
print ip.findall(junk)
# ['114.30.47.10:80', '118.228.148.83:80', '119.70.40.101:8080', '12.47.164.114:8888', '121.
# 17.161.114:3128', '122.152.183.103:80', '122.224.171.91:3128', '123.234.32.27:8080', '124.
# 107.85.115:80', '124.247.222.66:6588', '125.76.228.201:808', '128.112.139.75:3128', '128.2
# 08.004.197:3128', '128.233.252.11:3124', '128.233.252.12:3124']

The basic approach would be:
Use urllib2 to download the contents of the page
Use a regular expression to extract IPv4-like addresses
Validate each match according to the numeric constraints on each octet
Print out the list of matches
Please provide a clearer indication of what specific part you are having trouble with, along with evidence to show what it is you've tried thus far.
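The extract-then-validate steps above can be sketched roughly as follows. This is only an illustration: the names CANDIDATE and find_ips are made up here, the sample string stands in for the downloaded page, and the octet range 0 to 255 is assumed. On Python 3 the download step would use urllib.request.urlopen instead of urllib2.urlopen.

```python
import re

# Loose candidate pattern: four dot-separated digit runs, a colon, then a port
# of up to six digits. Range checking is done afterwards in plain Python.
CANDIDATE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,6})\b")

def find_ips(text):
    """Return every ip:port candidate whose four octets are all in 0..255."""
    found = []
    for m in CANDIDATE.finditer(text):
        octets = [int(g) for g in m.groups()[:4]]
        if all(0 <= o <= 255 for o in octets):
            found.append(m.group(0))
    return found

# Stand-in for the page contents (assumption: the real text comes from the download step).
sample = " 1.1.1.1:123 2.2.2.2:321 312.123.1.12:123 "
print(find_ips(sample))  # ['1.1.1.1:123', '2.2.2.2:321']
```

Keeping the regex loose and doing the numeric check in Python makes the pattern easier to read, at the cost of a second pass over the matches.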

Not to turn this into a who's-a-better-regex-author-war but...
(\d{1,3}\.){3}\d{1,3}\:\d{1,6}

Try:
re.compile(r"\d?\d?\d\.\d?\d?\d\.\d?\d?\d\.\d?\d?\d:\d+").findall(urllib2.urlopen(url).read())

In action:
\b(?:                 # A.B.C in A.B.C.D:port
    (?:
        25[0-5]
      | 2[0-4][0-9]
      | 1[0-9][0-9]
      | [1-9]?[0-9]
    )\.
){3}
(?:                   # D in A.B.C.D:port
    25[0-5]
  | 2[0-4][0-9]
  | 1[0-9][0-9]
  | [1-9]?[0-9]
)
:[1-9]\d{0,5}         # port number any number in (0,999999]
\b
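A pattern laid out with whitespace and comments like this only works if it is compiled with the re.VERBOSE flag. A minimal sketch of how it might be used (the name ip_port and the sample string are made up for illustration):

```python
import re

# re.VERBOSE lets the pattern span several lines and carry # comments;
# whitespace outside character classes is ignored.
ip_port = re.compile(r"""
    \b(?:                     # A.B.C. in A.B.C.D:port
        (?: 25[0-5] | 2[0-4][0-9] | 1[0-9][0-9] | [1-9]?[0-9] )\.
    ){3}
    (?: 25[0-5] | 2[0-4][0-9] | 1[0-9][0-9] | [1-9]?[0-9] )   # D
    :[1-9]\d{0,5}             # port, 1 to 999999
    \b
""", re.VERBOSE)

sample = " 1.1.1.1:123 2.2.2.2:321 312.123.1.12:123 128.208.004.197:3128 "
print(ip_port.findall(sample))  # ['1.1.1.1:123', '2.2.2.2:321']
```

Note that this version rejects octets with leading zeros such as 004, so 128.208.004.197:3128 is not matched here, whereas the [01]?\d\d? alternative in the earlier pattern accepts it.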

Related

How do I grab specific text in between other text?

I need help grabbing just K334-76A9 from this string:
b'\x0cWelcome, Pepo \r\nToday is Mon 04/29/2019 \r\n\r\n Volume in drive C has no label.\r\n Volume Serial Number is K334-76A9\r\n
Please help, I have tried so many things but none have worked.
Sorry if my question is bad :/
If you want to find the format xxxx-xxxx, no matter what string you have you can do it like this:
import re
b = '\x0cWelcome, Pepo \r\nToday is Mon 04/29/2019 \r\n\r\n Volume in drive C has no label.\r\n Volume Serial Number is K334-76A9\r\n'
splitString = b.split()
r = re.compile('.{4}-.{4}')
for string in splitString:
    if r.match(string):
        print(string)
Output:
K334-76A9
Here's code that grabs everything after "Serial Number is " up to the next whitespace character.
import re
data = b'\x0cWelcome, Pepo \r\nToday is Mon 04/29/2019 \r\n\r\n Volume in drive C has no label.\r\n Volume Serial Number is K334-76A9\r\n'
pat = re.compile(r"Serial Number is ([^\s]+)")
match = pat.search(data.decode("ASCII"))
if match:
    print(match.group(1))
Result:
K334-76A9
You can adjust the regular expression per your needs. Regular expressions are Da Bomb! This one's really simple, but you can do amazingly complex things with them.

Parse Output for Python

My software outputs these two types of output:
-rwx------ Administrators/Domain Users 456220672 0% 2018-04-16 16:04:40 E:\\_WiE10-18.0.100-77.iso
-rwxrwx--- Administrators/unknown 6677 0% 2018-04-17 01:33:23 E:\\program files\\cluster groups\\sql server (mssqlserver)\\logs\\progress-MOD-1523883344023-3001-Windows.log
I would like to get the file names from both outputs:
E:\\_WiE10-18.0.100-77.iso, for the first one
E:\\program files\\cluster groups\\sql server (mssqlserver)\\logs\\progress-MOD-1523883344023-3001-Windows.log, for the second one
If I use something like the code below, it won't work if the second parameter has spaces in it. It works if there aren't any spaces in the Domain Username.
for item in outputs:
    outputs.extend(item.split())
for item2 in [' '.join(outputs[6:])]:
    new_list.append(item2)
How can I get all the parameters individually, including the filenames?
If regex is an option:
text = """-rwx------ Administrators/Domain Users 456220672 0% 2018-04-16 16:04:40 E:\\_WiE10-18.0.100-77.iso
-rwxrwx--- Administrators/unknown 6677 0% 2018-04-17 01:33:23 E:\\program files\\cluster groups\\sql server (mssqlserver)\\logs\\progress-MOD-1523883344023-3001-Windows.log"""
import re
for h in re.findall(r"^.*?\d\d:\d\d:\d\d (.*)",text,flags=re.MULTILINE):
    print(h)
Output:
E:\_WiE10-18.0.100-77.iso
E:\program files\cluster groups\sql server (mssqlserver)\logs\progress-MOD-1523883344023-3001-Windows.log
Pattern explained:
The pattern r"^.*?\d\d:\d\d:\d\d (.*)" matches the line start '^', then as few characters as possible '.*?', then the timestamp '\d\d:\d\d:\d\d' followed by a space, and captures everything after that up to the end of the line in a group.
The re.MULTILINE flag makes '^' match at the start of every line rather than only at the start of the string.
Edit:
Capturing the individual things needs some more capturing groups:
import re
for h in re.findall(r"^([rwexXst-]+) ([^0-9]+) +\d+.+? +(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)",text,flags=re.MULTILINE):
#                      ^^^^^^^^^^^^^ ^^^^^^^^^          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^
#                      flags         grpName            datetime                              filename
    for k in h:
        print(k)
    print("")
Output:
-rwx------
Administrators/Domain Users
2018-04-16 16:04:40
E:\_WiE10-18.0.100-77.iso
-rwxrwx---
Administrators/unknown
2018-04-17 01:33:23
E:\program files\cluster groups\sql server (mssqlserver)\logs\progress-MOD-1523883344023-3001-Windows.log
You could use a regular expression like
\b[A-Z]:\\\\.+
Aside from using regex, you can try something similar to this.
output = '-rwx------ ... 2018-04-16 16:04:40 E:\\\\_WiE10-18.0.100-77.iso'
drive_letter_start = output.find(':\\\\')
filename = output[drive_letter_start - 1:]
It looks for the first occurrence of ':\\' and takes the drive letter just before that substring together with the full file path after it.
EDIT
Patrick Artner's answer is better and completely answers OP's question compared to this answer. This only encompasses capturing the file path. I am leaving this answer here should anyone find it useful.

Python multiple if search in one string

I'm a network engineer with no experience in programming who recently started with Python, but I'm making small improvements every day.
I need some help in getting multiple matches in IF statements like:
if "access-class 30" in output and "exec-timeout 5 5" in output:
    print ('###### ACL VTY OK!!! ######')
Is it possible to check multiple keywords in a single string ?
Thanks for all your time.
Use the all function with a generator expression:
data = ["access-class 30", "exec-timeout 5 5"]
if all(s in output for s in data):
    print('###### ACL VTY OK!!! ######')
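If it also helps to know which keywords are absent, the same membership test can be turned into a list. A small sketch, where missing_keywords, the sample device output, and the third keyword are made up for illustration:

```python
def missing_keywords(output, required):
    """Return the required substrings that do not occur in output."""
    return [s for s in required if s not in output]

# Hypothetical device output, used only for illustration.
output = "line vty 0 4\n access-class 30 in\n exec-timeout 5 5\n"
required = ["access-class 30", "exec-timeout 5 5", "transport input ssh"]

missing = missing_keywords(output, required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("###### ACL VTY OK!!! ######")
```

Keep in mind that `in` is a plain substring test, so "access-class 30" would also be found inside "access-class 300".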
Yes it is possible.
You can use regular expressions (regex).
import re
li = [] # List of all the keywords
for l in li:
    # finditer yields only actual matches, so no None check is needed
    for m in re.finditer(l, output):
        print('match found')

Add letters to string conditionally

Input: 1 10 avenue
Desired Output: 1 10th avenue
As you can see above I have given an example of an input, as well as the desired output that I would like. Essentially I need to look for instances where there is a number followed by a certain pattern (avenue, street, etc). I have a list which contains all of the patterns and it's called patterns.
If that number does not have "th" after it, I would like to add "th". Simply adding "th" is fine, because other portions of my code will correct it to either "st", "nd", "rd" if necessary.
Examples:
1 10th avenue OK
1 10 avenue NOT OK, TH SHOULD BE ADDED!
I have implemented a working solution, which is this:
def Add_Th(address):
    try:
        address = address.split(' ')
    except AttributeError:
        pass
    for pattern in patterns:
        try:
            location = address.index(pattern) - 1
            number_location = address[location]
        except (ValueError, IndexError):
            continue
        if 'th' not in number_location:
            new = number_location + 'th'
            address[location] = new
    address = ' '.join(address)
    return address
I would like to convert this implementation to regex, as this solution seems a bit messy to me, and occasionally causes some issues. I am not the best with regex, so if anyone could steer me in the right direction that would be greatly appreciated!
Here is my current attempt at the regex implementation:
def add_th(address):
    find_num = re.compile(r'(?P<number>[\d]{1,2}(' + "|".join(patterns + ')(?P<following>.*)')
    check_th = find_num.search(address)
    if check_th is not None:
        if re.match(r'(th)', check_th.group('following')):
            return address
        else:
            # this is where I would add th. I know I should use re.sub, i'm just not too sure
            # how I would do it
    else:
        return address
I do not have a lot of experience with regex, so please let me know if any of the work I've done is incorrect, as well as what would be the best way to add "th" to the appropriate spot.
Thanks.
Just one way, finding the positions behind a digit and ahead of one of those pattern words and placing 'th' into them:
>>> address = '1 10 avenue 3 33 street'
>>> patterns = ['avenue', 'street']
>>>
>>> import re
>>> pattern = re.compile(r'(?<=\d)(?= ({}))'.format('|'.join(patterns)))
>>> pattern.sub('th', address)
'1 10th avenue 3 33th street'
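The same lookbehind/lookahead idea can be wrapped in a function. This is a sketch rather than a drop-in replacement: the function signature is made up here, and re.escape plus the trailing \b are additions of mine so that pattern words are matched literally and as whole words.

```python
import re

def add_th(address, patterns):
    """Insert 'th' between a digit and a following street-type word."""
    # Escape each word so characters like '.' in a pattern are taken literally.
    alternation = "|".join(re.escape(p) for p in patterns)
    # Zero-width match: just after a digit, just before " <word>".
    gap = re.compile(r"(?<=\d)(?= (?:{})\b)".format(alternation))
    return gap.sub("th", address)

print(add_th("1 10 avenue", ["avenue", "street"]))    # 1 10th avenue
print(add_th("1 10th avenue", ["avenue", "street"]))  # 1 10th avenue
```

An address that already has "th" is left alone because the character before the space is then a letter, so the lookbehind for a digit fails.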

regular expression search in python

I am trying to parse some data and just started reading up on regular expressions, so I am pretty new to it. This is the code I have so far:
String = "MEASUREMENT 3835 303 Oxygen: 235.78 Saturation: 90.51 Temperature: 24.41 DPhase: 33.07 BPhase: 29.56 RPhase: 0.00 BAmp: 368.57 BPot: 18.00 RAmp: 0.00 RawTem.: 68.21"
String = String.strip('\t\x11\x13')
String = String.split("Oxygen:")
print String[1]
String[1].lstrip
print String[1]
What I am trying to do is pull the oxygen data (235.78) out and put it in its own variable using a regular expression search. I realize that there should be an easy solution, but I am trying to figure out how regular expressions work and they are making my head hurt. Thanks for any help.
Richard
re.search( r"Oxygen: *([\d.]+)", String ).group( 1 )
import re
string = "blabla Oxygen: 10.10 blabla"
regex_oxygen = re.compile(r'''Oxygen:\W+([0-9.]*)''')
result = re.findall(regex_oxygen,string)
print result
What for?
print String.split()[4]
For general parsing of lists like this, one could use:
import re
String = "MEASUREMENT 3835 303 Oxygen: 235.78 Saturation: 90.51"
String = String.replace(':','')
value_list=re.split(r"MEASUREMENT\W+[0-9]+\W+[0-9]+\W",String)[1].rstrip().split()
values = dict(zip(value_list[::2],map(float,value_list[1::2])))
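A variation on the same idea is to let a single findall pull out every "Label: number" pair, so the header does not need to be split off first. This is a sketch under the assumption that every label ends in a colon and every value is a plain decimal number; the shortened sample line is for illustration only.

```python
import re

line = ("MEASUREMENT 3835 303 Oxygen: 235.78 Saturation: 90.51 "
        "Temperature: 24.41 RawTem.: 68.21")

# Each pair is a label (non-space characters) ending in a colon, then a number.
pairs = re.findall(r"(\S+?):\s*(-?\d+(?:\.\d+)?)", line)
values = {name: float(number) for name, number in pairs}

oxygen = values["Oxygen"]
print(oxygen)             # 235.78
print(values["RawTem."])  # 68.21
```

The header fields (MEASUREMENT 3835 303) are skipped automatically because none of them is followed by a colon.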
I believe the answer to your specific problem has been posted. However, I wanted to show you a few resources on regular expressions for Python. The Python documentation on regular expressions is the place to start.
O'Reilly also has many good books on the subject, whether you want to understand regular expressions deep down or just enough to make things work.
Finally, regular-expressions.info is a good resource for regular expressions across mainstream languages. You can even test your regular expressions on the website.
I would like to share my "is this an email?" regex, just to inspire you. :)
import re

emailregex = "^[a-zA-Z.a-zA-Z]+#mycompany.org$"

def validateEmail(email):
    """returns 1 if is an email, 0 if not """
    # len(x.y#mycompany.org) = 17
    if len(email)>=17:
        if re.match(emailregex,email)!= None:
            return 1
    return 0
