Regular expression to check if IP is found on 2 ranges - python

Is it possible to write a regular expression as one expression to check if an IP is found on 2 ranges?
I can do this in 2 steps:
if ($ip =~ /$range1/ and $ip =~ /$range2/ ) {
print "intersection"
}
but I wonder if it's possible to do this in one regex:
if ($ip =~ /$my_regex/ ) {
print "intersection";
}

You can use the Module NetAddr::IP:
use strict;
use warnings;
use NetAddr::IP;
my @addresses = (
NetAddr::IP->new('192.168.172.1/255.255.0.0'),
NetAddr::IP->new('10.1.0.0/255.0.0.0'),
);
my $address_to_check = NetAddr::IP->new($IP_TO_CHECK);
foreach my $address_in_list (@addresses) {
if ($address_to_check->within($address_in_list)) {
# do something
}
}

Below is a solution in Perl.
Why not use NetAddr::IP and let it handle the thing? For example
#!/usr/bin/perl
use strict;
use warnings;
use NetAddr::IP;
my @addresses = (
new NetAddr::IP '216.239.32.0/255.255.32.0',
new NetAddr::IP '64.157.227.255/255.255.252.0'
);
my $banned = 0;
my $visitor_address = NetAddr::IP->new($visitor_ip);
foreach my $banned_address (@addresses) {
if ($visitor_address->within($banned_address)) {
$banned = 1;
last;
}
}
Read the documentation and available methods at: https://metacpan.org/pod/NetAddr::IP
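Since the question title mentions Python, here is a rough equivalent of the NetAddr::IP approach using the standard library ipaddress module. It is only a sketch: the function name matching_ranges and the sample address are made up for illustration, and the ranges reuse the netmask notation from the first answer.

```python
import ipaddress

def matching_ranges(ip, ranges):
    """Return the ranges (as given) that contain the address ip."""
    address = ipaddress.ip_address(ip)
    hits = []
    for text in ranges:
        # strict=False accepts a host address such as 192.168.172.1
        # and treats it as the network it belongs to.
        network = ipaddress.ip_network(text, strict=False)
        if address in network:
            hits.append(text)
    return hits

ranges = ["192.168.172.1/255.255.0.0", "10.1.0.0/255.0.0.0"]
print(matching_ranges("192.168.3.4", ranges))
```

An address is "found on 2 ranges" when the returned list has two entries.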

Yes, it is possible to join two independent subexpressions into a single regex using lookahead assertions:
if ($ip =~ /^(?=.*$range1)(?=.*$range2)/s ) {
print "intersection"
}
However, if you really are dealing with IP addresses, you should use a module like NetAddr::IP.
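The same lookahead trick can be sketched in Python. The helper name matches_both and the two sample patterns are hypothetical; each subexpression is wrapped in a non-capturing group (an addition not present in the Perl one-liner) so that an alternation inside one range pattern cannot leak into the other.

```python
import re

def matches_both(text, pattern_a, pattern_b):
    """True if text matches pattern_a and pattern_b, checked by one regex."""
    combined = re.compile(
        r"^(?=.*(?:%s))(?=.*(?:%s))" % (pattern_a, pattern_b),
        re.DOTALL,  # same role as the /s flag in the Perl version
    )
    return combined.search(text) is not None

# Two toy "range" patterns, assumed purely for illustration.
range1 = r"192\.168\.\d+\.\d+"
range2 = r"\d+\.\d+\.1\.\d+"
print(matches_both("192.168.1.5", range1, range2))
print(matches_both("192.168.2.5", range1, range2))
```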

Related

Regex Expression to extract coefficients of linear eq "2x-3y+0.5"

I want to convert my equation to vector form, so I need to extract the coefficients.
I tried r"[-+]?\d*\.\d+|\d+", but I am not able to get the "-" sign along with the integer.
For 2x-3y+0 I am getting [2, 3, 0], but I need [2, -3, 0].
The expression in this answer is much better, since it does not capture the + for instance.
That being said, my guess is that your expression is also just fine; we could slightly modify it to:
[-+]?\d+\.\d+|[-+]?\d+
and it would likely work, since validation seems to be unnecessary.
Please see the demo and explanation here.
Test
import re
matches = re.finditer(r"[-+]?\d+\.\d+|[-+]?\d+", "-0.2x-0.73y-0.11z-0.2x-0.73y-0.11")
linear_eq_coeff=[]
for match in matches:
linear_eq_coeff.append(match.group())
print(linear_eq_coeff)
Output
['-0.2', '-0.73', '-0.11', '-0.2', '-0.73', '-0.11']
Demo
const regex = /[-+]?\d+\.\d+|[-+]?\d+/gm;
const str = `-0.2x-0.73y-0.11z-0.2x-0.73y-0.11`;
let m;
arr = [];
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
arr.push(match);
});
}
console.log(arr);
Advice
wjandrea's advice is:
Could simplify to [-+]?(\d*\.)?\d+
Just use a better regex.
>>> import re
>>> eqn = '2x-3y+0'
>>> re.findall(r'(-?(?:\d+(?:\.\d*)?|\.\d+))', eqn)
['2', '-3', '0']
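To go from the matched strings to the numeric vector the question asks for, the matches can be converted with float. This is a minimal sketch: the function name coefficients is made up, the pattern also accepts an explicit + sign, and terms with an implicit coefficient (a bare x or -y) are not handled.

```python
import re

# Optional sign, then either digits with an optional fraction or a bare fraction.
COEFF = re.compile(r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)")

def coefficients(equation):
    """Return the signed numeric coefficients found in equation, in order."""
    return [float(token) for token in COEFF.findall(equation)]

print(coefficients("2x-3y+0.5"))
```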

How to parse IPv6 address in a string using Python 3.x [duplicate]

I'm having trouble writing a regular expression that matches valid IPv6 addresses, including those in their compressed form (with :: or leading zeros omitted from each byte pair).
Can someone suggest a regular expression that would fulfill the requirement?
I'm considering expanding each byte pair and matching the result with a simpler regex.
I was unable to get @Factor Mystic's answer to work with POSIX regular expressions, so I wrote one that works with POSIX regular expressions and PERL regular expressions.
It should match:
IPv6 addresses
zero compressed IPv6 addresses (section 2.2 of rfc5952)
link-local IPv6 addresses with zone index (section 11 of rfc4007)
IPv4-Embedded IPv6 Address (section 2 of rfc6052)
IPv4-mapped IPv6 addresses (section 2.1 of rfc2765)
IPv4-translated addresses (section 2.1 of rfc2765)
IPv6 Regular Expression:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
For ease of reading, the following is the above regular expression split at major OR points into separate lines:
# IPv6 RegEx
(
([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}| # 1:2:3:4:5:6:7:8
([0-9a-fA-F]{1,4}:){1,7}:| # 1:: 1:2:3:4:5:6:7::
([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}| # 1::8 1:2:3:4:5:6::8 1:2:3:4:5:6::8
([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}| # 1::7:8 1:2:3:4:5::7:8 1:2:3:4:5::8
([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}| # 1::6:7:8 1:2:3:4::6:7:8 1:2:3:4::8
([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}| # 1::5:6:7:8 1:2:3::5:6:7:8 1:2:3::8
([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}| # 1::4:5:6:7:8 1:2::4:5:6:7:8 1:2::8
[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})| # 1::3:4:5:6:7:8 1::3:4:5:6:7:8 1::8
:((:[0-9a-fA-F]{1,4}){1,7}|:)| # ::2:3:4:5:6:7:8 ::2:3:4:5:6:7:8 ::8 ::
fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}| # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index)
::(ffff(:0{1,4}){0,1}:){0,1}
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])| # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses)
([0-9a-fA-F]{1,4}:){1,4}:
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]) # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address)
)
# IPv4 RegEx
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])
To make the above easier to understand, the following "pseudo" code replicates the above:
IPV4SEG = (25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])
IPV4ADDR = (IPV4SEG\.){3,3}IPV4SEG
IPV6SEG = [0-9a-fA-F]{1,4}
IPV6ADDR = (
(IPV6SEG:){7,7}IPV6SEG| # 1:2:3:4:5:6:7:8
(IPV6SEG:){1,7}:| # 1:: 1:2:3:4:5:6:7::
(IPV6SEG:){1,6}:IPV6SEG| # 1::8 1:2:3:4:5:6::8 1:2:3:4:5:6::8
(IPV6SEG:){1,5}(:IPV6SEG){1,2}| # 1::7:8 1:2:3:4:5::7:8 1:2:3:4:5::8
(IPV6SEG:){1,4}(:IPV6SEG){1,3}| # 1::6:7:8 1:2:3:4::6:7:8 1:2:3:4::8
(IPV6SEG:){1,3}(:IPV6SEG){1,4}| # 1::5:6:7:8 1:2:3::5:6:7:8 1:2:3::8
(IPV6SEG:){1,2}(:IPV6SEG){1,5}| # 1::4:5:6:7:8 1:2::4:5:6:7:8 1:2::8
IPV6SEG:((:IPV6SEG){1,6})| # 1::3:4:5:6:7:8 1::3:4:5:6:7:8 1::8
:((:IPV6SEG){1,7}|:)| # ::2:3:4:5:6:7:8 ::2:3:4:5:6:7:8 ::8 ::
fe80:(:IPV6SEG){0,4}%[0-9a-zA-Z]{1,}| # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index)
::(ffff(:0{1,4}){0,1}:){0,1}IPV4ADDR| # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses)
(IPV6SEG:){1,4}:IPV4ADDR # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address)
)
I posted a script on GitHub which tests the regular expression: https://gist.github.com/syzdek/6086792
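The "pseudo" code above can be turned into working Python almost literally by substituting the named pieces into the template. This is only a sketch under a few assumptions: the helper name is_ipv6 is made up, the alternatives are copied from the pseudo code (so the fe80 branch uses IPV6SEG, as there), and re.fullmatch is used so the whole string has to be an address.

```python
import re

IPV4SEG = r"(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])"
IPV4ADDR = r"(IPV4SEG\.){3,3}IPV4SEG"
IPV6SEG = r"[0-9a-fA-F]{1,4}"

# One entry per line of the IPV6ADDR pseudo code.
IPV6ADDR = "|".join([
    "(IPV6SEG:){7,7}IPV6SEG",
    "(IPV6SEG:){1,7}:",
    "(IPV6SEG:){1,6}:IPV6SEG",
    "(IPV6SEG:){1,5}(:IPV6SEG){1,2}",
    "(IPV6SEG:){1,4}(:IPV6SEG){1,3}",
    "(IPV6SEG:){1,3}(:IPV6SEG){1,4}",
    "(IPV6SEG:){1,2}(:IPV6SEG){1,5}",
    "IPV6SEG:((:IPV6SEG){1,6})",
    ":((:IPV6SEG){1,7}|:)",
    "fe80:(:IPV6SEG){0,4}%[0-9a-zA-Z]{1,}",
    "::(ffff(:0{1,4}){0,1}:){0,1}IPV4ADDR",
    "(IPV6SEG:){1,4}:IPV4ADDR",
])

# Expand the placeholders; IPV4ADDR first because it contains IPV4SEG.
pattern = (IPV6ADDR.replace("IPV4ADDR", IPV4ADDR)
                   .replace("IPV4SEG", IPV4SEG)
                   .replace("IPV6SEG", IPV6SEG))
IPV6_RE = re.compile(pattern)

def is_ipv6(text):
    """True if the whole string matches the assembled IPv6 expression."""
    return IPV6_RE.fullmatch(text) is not None

print(is_ipv6("2001:db8:3:4::192.0.2.33"))
```

Keeping the pieces named makes it easier to change one of them (for example the IPv4 segment) without touching the long expression by hand.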
The following will validate IPv4, IPv6 (full and compressed), and IPv6v4 (full and compressed) addresses:
'/^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD'
It sounds like you may be using Python. If so, you can use something like this:
import socket
def check_ipv6(n):
try:
socket.inet_pton(socket.AF_INET6, n)
return True
except socket.error:
return False
print(check_ipv6('::1')) # True
print(check_ipv6('foo')) # False
print(check_ipv6(5)) # TypeError exception
print(check_ipv6(None)) # TypeError exception
I don't think you have to have IPv6 compiled in to Python to get inet_pton, which can also parse IPv4 addresses if you pass in socket.AF_INET as the first parameter. Note: this may not work on non-Unix systems.
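On Python 3, the standard library ipaddress module is an alternative that is implemented in pure Python, so it should not depend on the platform's inet_pton. The sketch below is illustrative only: the name is_valid_ipv6 is made up, and non-string input is rejected explicitly because IPv6Address also accepts integers.

```python
import ipaddress

def is_valid_ipv6(text):
    """True if text is a string holding a valid IPv6 address."""
    if not isinstance(text, str):
        return False  # IPv6Address(5) would otherwise be read as an integer address
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False

print(is_valid_ipv6("::1"))
print(is_valid_ipv6("foo"))
```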
From "IPv6 regex":
(\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,6}\Z)|
(\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,5}\Z)|
(\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,4}\Z)|
(\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,3}\Z)|
(\A([0-9a-f]{1,4}:){1,5}(:[0-9a-f]{1,4}){1,2}\Z)|
(\A([0-9a-f]{1,4}:){1,6}(:[0-9a-f]{1,4}){1,1}\Z)|
(\A(([0-9a-f]{1,4}:){1,7}|:):\Z)|
(\A:(:[0-9a-f]{1,4}){1,7}\Z)|
(\A((([0-9a-f]{1,4}:){6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)|
(\A(([0-9a-f]{1,4}:){5}[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)|
(\A([0-9a-f]{1,4}:){5}:[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,3}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,2}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,1}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A(([0-9a-f]{1,4}:){1,5}|:):(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A:(:[0-9a-f]{1,4}){1,5}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)
This catches the loopback (::1) as well as other IPv6 addresses.
I changed {} to + and put : inside the first square bracket.
([a-f0-9:]+:+)+[a-f0-9]+
Tested with ifconfig -a output on
http://regexr.com/
In a Unix or Mac OS X terminal, the -o option returns only the matching output (IPv6), including ::1:
ifconfig -a | egrep -o '([a-f0-9:]+:+)+[a-f0-9]+'
Get all IP addresses (IPv4 or IPv6) and print the matches in a Unix or OS X terminal:
ifconfig -a | egrep -o '([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(([a-f0-9:]+:+)+[a-f0-9]+)'
I'd have to strongly second the answer from Frank Krueger.
Whilst you say you need a regular expression to match an IPv6 address, I'm assuming what you really need is to be able to check if a given string is a valid IPv6 address. There is a subtle but important distinction here.
There is more than one way to check if a given string is a valid IPv6 address and regular expression matching is only one solution.
Use an existing library if you can. The library will have fewer bugs and its use will result in less code for you to maintain.
The regular expression suggested by Factor Mystic is long and complex. It most likely works, but you should also consider how you'd cope if it unexpectedly fails. The point I'm trying to make here is that if you can't form a required regular expression yourself you won't be able to easily debug it.
If you have no suitable library it may be better to write your own IPv6 validation routine that doesn't depend on regular expressions. If you write it you understand it and if you understand it you can add comments to explain it so that others can also understand and subsequently maintain it.
Act with caution when using a regular expression whose functionality you can't explain to someone else.
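As an illustration of the "write your own routine" suggestion, here is a small hand-written check in Python that avoids regular expressions. It is a deliberately limited sketch: the name is_plain_ipv6 is made up, and embedded IPv4 suffixes and zone indices are not handled.

```python
import string

HEX_DIGITS = set(string.hexdigits)

def is_plain_ipv6(text):
    """Check a hex-only IPv6 address (no IPv4 suffix, no %zone)."""
    # At most one "::" is allowed, and never three colons in a row.
    if text.count("::") > 1 or ":::" in text:
        return False
    if "::" in text:
        head, tail = text.split("::")
        groups = (head.split(":") if head else []) + (tail.split(":") if tail else [])
        # "::" stands for at least one group of zeros, so at most 7 remain.
        if len(groups) > 7:
            return False
    else:
        groups = text.split(":")
        if len(groups) != 8:
            return False
    # Every written group is 1 to 4 hexadecimal digits.
    return all(1 <= len(group) <= 4 and set(group) <= HEX_DIGITS for group in groups)

print(is_plain_ipv6("2001:db8::1"))
```

Each rule is one commented line, which is the maintainability argument made above.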
I'm not an IPv6 expert but I think you can get a pretty good result more easily with this one:
^([0-9A-Fa-f]{0,4}:){2,7}([0-9A-Fa-f]{1,4}$|((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4})$
To answer "is this a valid IPv6 address?" it looks OK to me. To break it down in parts... forget it. I've omitted the unspecified address (::) since there is no use for an "unspecified address" in my database.
The beginning:
^([0-9A-Fa-f]{0,4}:){2,7} <-- matches the compressible part; we can translate this as: between 2 and 7 colons, which may have hexadecimal numbers between them.
followed by:
[0-9A-Fa-f]{1,4}$ <-- a hexadecimal number (leading 0 omitted)
OR
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4} <-- an IPv4 address
This regular expression will match valid IPv6 and IPv4 addresses in accordance with GNU C++ implementation of regex with REGULAR EXTENDED mode used:
"^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:)))(%.+)?\s*$"
A simple regex that will match, but I wouldn't recommend for validation of any sort is this:
([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4}
Note this matches compression anywhere in the address, though it won't match the loopback address ::1. I find this a reasonable compromise in order to keep the regex simple.
I successfully use this in iTerm2 smart selection rules to quad-click IPv6 addresses.
Beware! In Java, the use of InetAddress and related classes (Inet4Address, Inet6Address, URL) may involve network traffic, e.g. DNS resolution (URL.equals, InetAddress from a string). Such a call may take a long time and is blocking!
For IPv6 I have something like this. This of course does not handle the very subtle details of IPv6 like that zone indices are allowed only on some classes of IPv6 addresses. And this regex is not written for group capturing, it is only a "matches" kind of regexp.
S - IPv6 segment = [0-9a-f]{1,4}
I - IPv4 = (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})
Schematic (the first part matches IPv6 addresses with an IPv4 suffix, the second part matches IPv6 addresses, the last part the zone index):
(
(
::(S:){0,5}|
S::(S:){0,4}|
(S:){2}:(S:){0,3}|
(S:){3}:(S:){0,2}|
(S:){4}:(S:)?|
(S:){5}:|
(S:){6}
)
I
|
:(:|(:S){1,7})|
S:(:|(:S){1,6})|
(S:){2}(:|(:S){1,5})|
(S:){3}(:|(:S){1,4})|
(S:){4}(:|(:S){1,3})|
(S:){5}(:|(:S){1,2})|
(S:){6}(:|(:S))|
(S:){7}:|
(S:){7}S
)
(?:%[0-9a-z]+)?
And here is the mighty regex (case insensitive; surround it with whatever is needed, such as beginning/end of line):
(?:
(?:
::(?:[0-9a-f]{1,4}:){0,5}|
[0-9a-f]{1,4}::(?:[0-9a-f]{1,4}:){0,4}|
(?:[0-9a-f]{1,4}:){2}:(?:[0-9a-f]{1,4}:){0,3}|
(?:[0-9a-f]{1,4}:){3}:(?:[0-9a-f]{1,4}:){0,2}|
(?:[0-9a-f]{1,4}:){4}:(?:[0-9a-f]{1,4}:)?|
(?:[0-9a-f]{1,4}:){5}:|
(?:[0-9a-f]{1,4}:){6}
)
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})|
:(?::|(?::[0-9a-f]{1,4}){1,7})|
[0-9a-f]{1,4}:(?::|(?::[0-9a-f]{1,4}){1,6})|
(?:[0-9a-f]{1,4}:){2}(?::|(?::[0-9a-f]{1,4}){1,5})|
(?:[0-9a-f]{1,4}:){3}(?::|(?::[0-9a-f]{1,4}){1,4})|
(?:[0-9a-f]{1,4}:){4}(?::|(?::[0-9a-f]{1,4}){1,3})|
(?:[0-9a-f]{1,4}:){5}(?::|(?::[0-9a-f]{1,4}){1,2})|
(?:[0-9a-f]{1,4}:){6}(?::|(?::[0-9a-f]{1,4}))|
(?:[0-9a-f]{1,4}:){7}:|
(?:[0-9a-f]{1,4}:){7}[0-9a-f]{1,4}
)
(?:%[0-9a-z]+)?
If you use Perl try Net::IPv6Addr
use Net::IPv6Addr;
if( defined Net::IPv6Addr::is_ipv6($ip_address) ){
print "Looks like an ipv6 address\n";
}
NetAddr::IP
use NetAddr::IP;
my $obj = NetAddr::IP->new6($ip_address);
Validate::IP
use Validate::IP qw'is_ipv6';
if( is_ipv6($ip_address) ){
print "Looks like an ipv6 address\n";
}
Following regex is for IPv6 only. Group 1 matches with the IP.
(([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4})
Regexes for ipv6 can get really tricky when you consider addresses with embedded ipv4 and addresses that are compressed, as you can see from some of these answers.
The open-source IPAddress Java library will validate all standard representations of IPv6 and IPv4 and also supports prefix-length (and validation of such). Disclaimer: I am the project manager of that library.
Code example:
try {
IPAddressString str = new IPAddressString("::1");
IPAddress addr = str.toAddress();
if(addr.isIPv6() || addr.isIPv6Convertible()) {
IPv6Address ipv6Addr = addr.toIPv6();
}
//use address
} catch(AddressStringException e) {
//e.getMessage has validation error
}
In Scala use the well known Apache Commons validators.
http://mvnrepository.com/artifact/commons-validator/commons-validator/1.4.1
libraryDependencies += "commons-validator" % "commons-validator" % "1.4.1"
import org.apache.commons.validator.routines._
/**
* Validates if the passed ip is a valid IPv4 or IPv6 address.
*
* @param ip The IP address to validate.
* @return True if the passed IP address is valid, false otherwise.
*/
def ip(ip: String) = InetAddressValidator.getInstance().isValid(ip)
The following are the tests of the method ip(ip: String):
"The `ip` validator" should {
"return false if the IPv4 is invalid" in {
ip("123") must beFalse
ip("255.255.255.256") must beFalse
ip("127.1") must beFalse
ip("30.168.1.255.1") must beFalse
ip("-1.2.3.4") must beFalse
}
"return true if the IPv4 is valid" in {
ip("255.255.255.255") must beTrue
ip("127.0.0.1") must beTrue
ip("0.0.0.0") must beTrue
}
//IPv6
//@see: http://www.ronnutter.com/ipv6-cheatsheet-on-identifying-valid-ipv6-addresses/
"return false if the IPv6 is invalid" in {
ip("1200::AB00:1234::2552:7777:1313") must beFalse
}
"return true if the IPv6 is valid" in {
ip("1200:0000:AB00:1234:0000:2552:7777:1313") must beTrue
ip("21DA:D3:0:2F3B:2AA:FF:FE28:9C5A") must beTrue
}
}
Looking at the patterns included in the other answers there are a number of good patterns that can be improved by referencing groups and utilizing lookaheads. Here is an example of a pattern that is self referencing that I would utilize in PHP if I had to:
^(?<hgroup>(?<hex>[[:xdigit:]]{0,4}) # grab a sequence of up to 4 hex digits
# and name this pattern for usage later
(?<!:::):{1,2}) # match 1 or 2 ':' characters
# as long as we can't match 3
(?&hgroup){1,6} # match our hex group 1 to 6 more times
(?:(?:
# match an ipv4 address or
(?<dgroup>2[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3}(?&dgroup)
# match our hex group one last time
|(?&hex))$
Note: PHP has a built in filter for this which would be a better solution than this
pattern.
Regex101 Analysis
Depending on your needs, an approximation like:
[0-9a-f:]+
may be enough (as with simple log file grepping, for example.)
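A loose approximation like this pairs well with a strict check afterwards: grab candidate tokens with the simple character class, then let a parser decide. The sketch below does that in Python with the standard library ipaddress module; the function name find_ipv6 and the sample text are made up, and short tokens that happen to parse (such as e::1 inside a longer word) would still be reported.

```python
import ipaddress
import re

# Loose candidate pattern, as in the approximation above (plus upper case).
CANDIDATE = re.compile(r"[0-9A-Fa-f:]+")

def find_ipv6(text):
    """Return the distinct IPv6 addresses found in text, in order of appearance."""
    found = []
    for token in CANDIDATE.findall(text):
        if ":" not in token:
            continue  # plain hex runs or single letters cannot be IPv6
        try:
            ipaddress.IPv6Address(token)
        except ValueError:
            continue  # looked similar, but is not a valid address
        if token not in found:
            found.append(token)
    return found

sample = "lo0: inet6 ::1 prefixlen 128\nen0: inet6 fe80::7ed1:c3ff:feec:dee1%en0"
print(find_ipv6(sample))
```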
I generated the following using python and works with the re module. The look-ahead assertions ensure that the correct number of dots or colons appear in the address. It does not support IPv4 in IPv6 notation.
import re
pattern = r'^(?=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$)(?:(?:25[0-5]|[12][0-4][0-9]|1[5-9][0-9]|[1-9]?[0-9])\.?){4}$|(?=^(?:[0-9a-f]{0,4}:){2,7}[0-9a-f]{0,4}$)(?![^:]*::.+::[^:]*$)(?:(?=.*::.*)|(?=\w+:\w+:\w+:\w+:\w+:\w+:\w+:\w+))(?:(?:^|:)(?:[0-9a-f]{4}|[1-9a-f][0-9a-f]{0,3})){0,8}(?:::(?:[0-9a-f]{1,4}(?:$|:)){0,6})?$'
result = re.match(pattern, ip)
if result: print(result.group(0))
In Java, you can use the library class sun.net.util.IPAddressUtil:
IPAddressUtil.isIPv6LiteralAddress(iPaddress);
It is difficult to find a regular expression which works for all IPv6 cases. They are usually hard to maintain, not easily readable and may cause performance problems. Hence, I want to share an alternative solution which I have developed: Regular Expression (RegEx) for IPv6 Separate from IPv4
Now you may ask that "This method only finds IPv6, how can I find IPv6 in a text or file?" Here are methods for this issue too.
Note: If you do not want to use IPAddress class in .NET, you can also replace it with my method. It also covers mapped IPv4 and special cases too, while IPAddress does not cover.
class IPv6
{
public List<string> FindIPv6InFile(string filePath)
{
Char ch;
StringBuilder sbIPv6 = new StringBuilder();
List<string> listIPv6 = new List<string>();
StreamReader reader = new StreamReader(filePath);
do
{
bool hasColon = false;
int length = 0;
do
{
ch = (char)reader.Read();
if (IsEscapeChar(ch))
break;
//Check the first 5 chars, if it has colon, then continue appending to stringbuilder
if (!hasColon && length < 5)
{
if (ch == ':')
{
hasColon = true;
}
sbIPv6.Append(ch.ToString());
}
else if (hasColon) //if no colon in first 5 chars, then dont append to stringbuilder
{
sbIPv6.Append(ch.ToString());
}
length++;
} while (!reader.EndOfStream);
if (hasColon && !listIPv6.Contains(sbIPv6.ToString()) && IsIPv6(sbIPv6.ToString()))
{
listIPv6.Add(sbIPv6.ToString());
}
sbIPv6.Clear();
} while (!reader.EndOfStream);
reader.Close();
reader.Dispose();
return listIPv6;
}
public List<string> FindIPv6InText(string text)
{
StringBuilder sbIPv6 = new StringBuilder();
List<string> listIPv6 = new List<string>();
for (int i = 0; i < text.Length; i++)
{
bool hasColon = false;
int length = 0;
do
{
if (IsEscapeChar(text[length + i]))
break;
//Check the first 5 chars, if it has colon, then continue appending to stringbuilder
if (!hasColon && length < 5)
{
if (text[length + i] == ':')
{
hasColon = true;
}
sbIPv6.Append(text[length + i].ToString());
}
else if (hasColon) //if no colon in first 5 chars, then dont append to stringbuilder
{
sbIPv6.Append(text[length + i].ToString());
}
length++;
} while (i + length != text.Length);
if (hasColon && !listIPv6.Contains(sbIPv6.ToString()) && IsIPv6(sbIPv6.ToString()))
{
listIPv6.Add(sbIPv6.ToString());
}
i += length;
sbIPv6.Clear();
}
return listIPv6;
}
bool IsEscapeChar(char ch)
{
if (ch != ' ' && ch != '\r' && ch != '\n' && ch!='\t')
{
return false;
}
return true;
}
bool IsIPv6(string maybeIPv6)
{
IPAddress ip;
if (IPAddress.TryParse(maybeIPv6, out ip))
{
return ip.AddressFamily == AddressFamily.InterNetworkV6;
}
else
{
return false;
}
}
}
InetAddressUtils has all the patterns defined. I ended-up using their pattern directly, and am pasting it here for reference:
private static final String IPV4_BASIC_PATTERN_STRING =
"(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){3}" + // initial 3 fields, 0-255 followed by .
"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"; // final field, 0-255
private static final Pattern IPV4_PATTERN =
Pattern.compile("^" + IPV4_BASIC_PATTERN_STRING + "$");
private static final Pattern IPV4_MAPPED_IPV6_PATTERN = // TODO does not allow for redundant leading zeros
Pattern.compile("^::[fF]{4}:" + IPV4_BASIC_PATTERN_STRING + "$");
private static final Pattern IPV6_STD_PATTERN =
Pattern.compile(
"^[0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4}){7}$");
private static final Pattern IPV6_HEX_COMPRESSED_PATTERN =
Pattern.compile(
"^(([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,5})?)" + // 0-6 hex fields
"::" +
"(([0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,5})?)$"); // 0-6 hex fields
Using Ruby? Try this:
/^(((?=.*(::))(?!.*\3.+\3))\3?|[\dA-F]{1,4}:)([\dA-F]{1,4}(\3|:\b)|\2){5}(([\dA-F]{1,4}(\3|:\b|$)|\2){2}|(((2[0-4]|1\d|[1-9])?\d|25[0-5])\.?\b){4})\z/i
For PHP 5.2+ users filter_var works great.
I know this doesn't answer the original question (specifically a regex solution), but I post this in the hope it may help someone else in the future.
$is_ip4address = (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== FALSE);
$is_ip6address = (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== FALSE);
Here's what I came up with, using a bit of lookahead and named groups. This is of course just IPv6, but it shouldn't interfere with additional patterns if you want to add IPv4:
(?=([0-9a-f]+(:[0-9a-f])*)?(?P<wild>::)(?!([0-9a-f]+:)*:))(::)?([0-9a-f]{1,4}:{1,2}){0,6}(?(wild)[0-9a-f]{0,4}|[0-9a-f]{1,4}:[0-9a-f]{1,4})
This just matches local addresses from an origin, square brackets included. I know it's not as comprehensive, but in JavaScript the other patterns had hard-to-trace issues, primarily that they didn't work, so this gets me what I need for now. The extra capitals A-F aren't needed either.
^\[([0-9a-fA-F]{1,4})(\:{1,2})([0-9a-fA-F]{1,4})(\:{1,2})([0-9a-fA-F]{1,4})(\:{1,2})([0-9a-fA-F]{1,4})(\:{1,2})([0-9a-fA-F]{1,4})\]
Jinnko's version is simplified and better I see.
As stated above, another way to get an IPv6 textual representation validating parser is to use programming. Here is one that is fully compliant with RFC-4291 and RFC-5952. I've written this code in ANSI C (works with GCC, passed tests on Linux - works with clang, passed tests on FreeBSD). Thus, it does only rely on the ANSI C standard library, so it can be compiled everywhere (I've used it for IPv6 parsing inside a kernel module with FreeBSD).
// IPv6 textual representation validating parser fully compliant with RFC-4291 and RFC-5952
// BSD-licensed / Copyright 2015-2017 Alexandre Fenyo
#include <string.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
typedef enum { false, true } bool;
static const char hexdigits[] = "0123456789abcdef";
static int digit2int(const char digit) {
return strchr(hexdigits, digit) - hexdigits;
}
// This IPv6 address parser handles any valid textual representation according to RFC-4291 and RFC-5952.
// Other representations will return -1.
//
// note that str input parameter has been modified when the function call returns
//
// parse_ipv6(char *str, struct in6_addr *retaddr)
// parse textual representation of IPv6 addresses
// str: input arg
// retaddr: output arg
int parse_ipv6(char *str, struct in6_addr *retaddr) {
bool compressed_field_found = false;
unsigned char *_retaddr = (unsigned char *) retaddr;
char *_str = str;
char *delim;
bzero((void *) retaddr, sizeof(struct in6_addr));
if (!strlen(str) || strchr(str, ':') == NULL || (str[0] == ':' && str[1] != ':') ||
(strlen(str) >= 2 && str[strlen(str) - 1] == ':' && str[strlen(str) - 2] != ':')) return -1;
// convert transitional to standard textual representation
if (strchr(str, '.')) {
int ipv4bytes[4];
char *curp = strrchr(str, ':');
if (curp == NULL) return -1;
char *_curp = ++curp;
int i;
for (i = 0; i < 4; i++) {
char *nextsep = strchr(_curp, '.');
if (_curp[0] == '0' || (i < 3 && nextsep == NULL) || (i == 3 && nextsep != NULL)) return -1;
if (nextsep != NULL) *nextsep = 0;
int j;
for (j = 0; j < strlen(_curp); j++) if (_curp[j] < '0' || _curp[j] > '9') return -1;
if (strlen(_curp) > 3) return -1;
const long val = strtol(_curp, NULL, 10);
if (val < 0 || val > 255) return -1;
ipv4bytes[i] = val;
_curp = nextsep + 1;
}
sprintf(curp, "%x%02x:%x%02x", ipv4bytes[0], ipv4bytes[1], ipv4bytes[2], ipv4bytes[3]);
}
// parse standard textual representation
do {
if ((delim = strchr(_str, ':')) == _str || (delim == NULL && !strlen(_str))) {
if (delim == str) _str++;
else if (delim == NULL) return 0;
else {
if (compressed_field_found == true) return -1;
if (delim == str + strlen(str) - 1 && _retaddr != (unsigned char *) (retaddr + 1)) return 0;
compressed_field_found = true;
_str++;
int cnt = 0;
char *__str;
for (__str = _str; *__str; ) if (*(__str++) == ':') cnt++;
unsigned char *__retaddr = - 2 * ++cnt + (unsigned char *) (retaddr + 1);
if (__retaddr <= _retaddr) return -1;
_retaddr = __retaddr;
}
} else {
char hexnum[4] = "0000";
if (delim == NULL) delim = str + strlen(str);
if (delim - _str > 4) return -1;
int i;
for (i = 0; i < delim - _str; i++)
if (!isxdigit(_str[i])) return -1;
else hexnum[4 - (delim - _str) + i] = tolower(_str[i]);
_str = delim + 1;
*(_retaddr++) = (digit2int(hexnum[0]) << 4) + digit2int(hexnum[1]);
*(_retaddr++) = (digit2int(hexnum[2]) << 4) + digit2int(hexnum[3]);
}
} while (_str < str + strlen(str));
return 0;
}
The regex allows the use of leading zeros in the IPv4 parts.
Some Unix and Mac distros convert those segments into octals.
I suggest using 25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d as an IPv4 segment.
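To see the effect of the suggested segment, here is a small Python sketch that builds a full IPv4 pattern from it and rejects octets with leading zeros. The names OCTET and is_strict_ipv4 are made up; re.ASCII is added so that \d only means 0-9.

```python
import re

# The IPv4 segment suggested above: 0-255 without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(r"(?:%s\.){3}%s" % (OCTET, OCTET), re.ASCII)

def is_strict_ipv4(text):
    """True if text is a dotted quad with no leading zeros in any octet."""
    return IPV4_RE.fullmatch(text) is not None

print(is_strict_ipv4("192.168.1.1"))
print(is_strict_ipv4("010.1.1.1"))
```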
This will work for IPv4 and IPv6:
^(([0-9a-f]{0,4}:){1,7}[0-9a-f]{1,4}|([0-9]{1,3}\.){3}[0-9]{1,3})$
You can use the ipextract shell tools I made for this purpose. They are based on regexp and grep.
Usage:
$ ifconfig | ipextract6
fe80::1%lo0
::1
fe80::7ed1:c3ff:feec:dee1%en0
Try this small one-liner. It should only match valid uncompressed/compressed IPv6 addresses (no IPv4 hybrids)
/(?!.*::.*::)(?!.*:::.*)(?!:[a-f0-9])((([a-f0-9]{1,4})?[:](?!:)){7}|(?=(.*:[:a-f0-9]{1,4}::|^([:a-f0-9]{1,4})?::))(([a-f0-9]{1,4})?[:]{1,2}){1,6})[a-f0-9]{1,4}/
If you want only normal IP-s (no slashes), here:
^(?:[0-9a-f]{1,4}(?:::)?){0,7}::[0-9a-f]+$
I use it for my syntax highlighter in a hosts file editor application. Works like a charm.

How can I report any regexp which never matched in the whole script?

I often loop over lines in a file and apply several regexp substitutions, where I sometimes make mistakes so that one of these expressions never matches on any line.
How can I find out which regexp didn't match without cluttering my code with checks? Does any scripting language provide metaprogramming facilities or debugging facilities for that?
Example input:
foo
bar
baz
Example script (pseudocode):
for each line of the file:
s/foo/lorem/
s/bazzz/ipsum/ # this never matches on any line and should get reported
Edit: I prefer Mark Thomas' solution, because I want the file to be read line by line and stops applying substitutions after the first match. Next time I should make my requirements clearer. A metaprogramming solution would have additional benefits, because I often do more complex case-specific processing line by line, although I think given the inspirations from the answers I can probably come up with a ruby extension method myself so that I can replace gsub! with gsub_debug! for debugging and get a report of all non-matching regular expressions when the program finished running.
In Ruby, gsub! modifies the string in place and returns nil if the pattern hasn't been found:
text = "foo
bar
baz"
replacements = [['foo', 'lorem'], ['bazzz', 'ipsum']]
# or with regexen:
replacements = [[/foo/, 'lorem'], [/bazzz/, 'ipsum']]
replacements.each do |pattern, replacement|
unless text.gsub!(pattern, replacement)
puts "WARNING: #{pattern} wasn't found"
end
end
puts text
It outputs:
WARNING: bazzz wasn't found
lorem
bar
baz
Note that applying replacements one after the other can lead to bugs.
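To illustrate the kind of bug meant here, the Python sketch below compares chained replacements with a single-pass replacement. The function names and the sample mapping are made up; the point is only that an earlier replacement's output can be picked up by a later pattern.

```python
import re

def replace_sequentially(text, mapping):
    """Apply each replacement in turn; later rules see earlier results."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

def replace_single_pass(text, mapping):
    """Replace every key in one scan, so results are never re-replaced."""
    pattern = re.compile("|".join(re.escape(key) for key in mapping))
    return pattern.sub(lambda match: mapping[match.group(0)], text)

mapping = {"foo": "bar", "bar": "baz"}
print(replace_sequentially("foo bar", mapping))  # the new "bar" is replaced again
print(replace_single_pass("foo bar", mapping))
```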
Here's a Ruby script that:
Reads the substitutions from a simple delimited file
Reads the file to process from the command line
Stops applying substitutions to a line once one matches
Reports which patterns did not match
Currently, it prints the output but it can be changed to write to a file.
substitutions.txt
foo lorem
bazzz ipsum
qux notfound
example.txt
The foo and bazzz
The foo
The bazzz
and the ugly
subs.rb, invocation: ruby subs.rb example.txt
filename = ARGV[0]
substitutions = File.readlines("substitutions.txt").map(&:split)
used = {}
IO.foreach(filename) do |line|
substitutions.each do |pattern, replacement|
if line.gsub!(pattern, replacement)
used[pattern] = true
break #no more substitutions for this line
end
end
puts line
end
unused = substitutions.map(&:first) - used.keys
unless unused.empty?
puts "Unused patterns:"
puts unused
end
Output:
The lorem and bazzz
The lorem
The ipsum
and the ugly
Unused patterns:
qux
Not really metaprogramming, but here's a Perl version which counts how many lines each pattern matches. It doesn't modify the input data or the patterns and only keeps one line of input in memory at a time:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
my @patterns = qw( foo bazzz );
my %matches;
while (my $line = <DATA>) {
for my $pat (@patterns) {
if ($line =~ /$pat/) {
$matches{$pat}++;
}
}
}
for my $pat (sort @patterns) {
say "$pat matched no lines" unless $matches{$pat};
}
__DATA__
foo
bar
baz
Output:
bazzz matched no lines
Edit: How careless of me. You want to do substitutions, not matches! That actually makes it a little simpler, since the Perl regex substitution operator returns the number of substitutions performed. Here's a modified version which does that:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
my %patterns = ( foo => 'lorem', bazzz => 'ipsum' );
my %matches;
for my $line (<DATA>) {
for my $from (keys %patterns) {
my $to = $patterns{$from};
$matches{$from} += $line =~ s/$from/$to/g;
}
}
for my $pat (sort keys %patterns) {
say "$pat matched no lines" unless $matches{$pat};
}
__DATA__
foo
bar
baz
Output:
bazzz matched no lines
All you need is:
awk '
BEGIN {
map["foo"] = "lorem"
map["bazzz"] = "ipsum"
}
{
for (re in map) {
cnt[re] += gsub(re,map[re])
}
print
}
END {
for (re in map) {
print re, cnt[re]+0 | "cat>&2"
}
}
' file
The above prints to stderr how many times each substitution was made. Massage to suit, e.g.:
END {
for (re in map) {
if ( cnt[re] == 0 ) {
print "WARNING: never matched", re | "cat>&2"
}
}
}
It only keeps one line of the file at a time in memory.
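For comparison, the same counting idea can be sketched in Python with re.subn, which returns both the new string and the number of substitutions made. The names rules and apply_rules are illustrative, and the sample lines are the foo/bar/baz text used in the answers above:

```python
import re
import sys

# Same example rules as in the answers above.
rules = [(re.compile("foo"), "lorem"), (re.compile("bazzz"), "ipsum")]


def apply_rules(lines, rules):
    """Apply every rule to every line and count substitutions per rule."""
    counts = {pattern.pattern: 0 for pattern, _ in rules}
    result = []
    for line in lines:
        for pattern, replacement in rules:
            # subn returns (new_string, number_of_substitutions_made).
            line, n = pattern.subn(replacement, line)
            counts[pattern.pattern] += n
        result.append(line)
    return result, counts


new_lines, counts = apply_rules(["foo\n", "bar\n", "baz\n"], rules)
sys.stdout.writelines(new_lines)
for name, n in counts.items():
    if n == 0:
        print(f"WARNING: never matched {name}", file=sys.stderr)
```

This sketch collects the result in a list for simplicity. To keep only one line in memory, iterate over an open file object and write each line out as soon as it has been processed.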

Regular expression in python replace

Is there any way, using a regular expression in Python, to remove every , (comma) that follows a closing curly brace }?
The data is in the following format, in a file named abc.json:
{
"Key1":"value1",
"Key2":"value2"
},
{
"Key1":"value3",
"Key2":"value4"
},
{
"Key1":"value5",
"Key2":"value6"
}
This should result in the following:
{
"Key1":"value1",
"Key2":"value2"
}
{
"Key1":"value3",
"Key2":"value4"
}
{
"Key1":"value5",
"Key2":"value6"
}
As you can see, the , (comma) has been removed after every closing brace }.
It would be helpful if this could also be achieved with jq, apart from a Python regex.
Test Source: https://regex101.com/r/wT6uU2/1
import re
p = re.compile(r'},')
test_str = u"{\n\"Key1\":\"value1\",\n\"Key2\":\"value2\"\n},\n\n{\n\"Key1\":\"value3\",\n\"Key2\":\"value4\"\n},\n\n{\n\"Key1\":\"value5\",\n\"Key2\":\"value6\"\n}"
re.findall(p, test_str)
findall only locates the matches. To remove the commas, substitute instead: replace }, with }, for example with re.sub(p, '}', test_str).
This works:
import re
s="""{
"Key1":"value1",
"Key2":"value2"
},
{
"Key1":"value3",
"Key2":"value4"
},
{
"Key1":"value5",
"Key2":"value6"
}"""
pattern=re.compile(r'(?P<data>{.*?}),', re.S)
print(pattern.findall(s))
s1 = pattern.sub(r'\g<data>', s)
print(s1)
If you intend to process the resulting JSON in jq, it's probably easier to wrap it in brackets [{...}, {...}] to make it a JSON array. Then, you can use .[] in jq to unwrap the array.
Before you even consider other options, you really should go back to the source that generated that file and make sure it actually outputs valid json.
That said, you could use jq to manipulate the contents as a raw string to add brackets, then parse the result as an array and spit out its contents.
$ jq -Rs '"[\(.)]" | fromjson[]' abc.json
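The same bracket-wrapping idea can be sketched in Python with the standard json module instead of a regex. The inline sample stands in for the contents of abc.json, and the variable names are illustrative:

```python
import json

# Sample text in the question's format: objects separated by commas,
# with no enclosing brackets (in practice, read this from abc.json).
raw = """{
"Key1":"value1",
"Key2":"value2"
},
{
"Key1":"value3",
"Key2":"value4"
}"""

# Wrapping the text in [ ] turns it into a valid JSON array.
records = json.loads("[" + raw + "]")

# Re-serialise each object on its own, with no comma between objects.
# indent=0 inserts newlines only; separators drops the space after ":".
output = "\n".join(
    json.dumps(record, indent=0, separators=(",", ":")) for record in records
)
print(output)
```

Parsing rather than pattern matching means a }, sequence inside a string value cannot be altered by accident, at the cost of requiring the wrapped text to be valid JSON.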

how do i do a range regex in ruby like awk /start/,/stop/

I want to do an AWK-style range regex like this:
awk ' /hoststatus/,/\}/' file
In AWK this would print all the lines between the two patterns in a file:
hoststatus {
host_name=myhost
modified_attributes=0
check_command=check-host-alive
check_period=24x7
notification_period=workhours
check_interval=5.000000
retry_interval=1.000000
event_handler=
}
How do I do that in Ruby?
Bonus: How would you do it in Python?
This is really powerful in AWK, but I'm new to Ruby, and not sure how you'd do it. In Python I was also unable to find a solution.
Ruby:
str =
"drdxrdx
hoststatus {
host_name=myhost
modified_attributes=0
check_command=check-host-alive
check_period=24x7
notification_period=workhours
check_interval=5.000000
retry_interval=1.000000
event_handler=
}"
str.each_line do |line|
print line if line =~ /hoststatus/..line =~ /\}/
end
This is the infamous flip-flop.
With Python, pass the MULTILINE and DOTALL flags to re. The ? following the * makes the match non-greedy:
>>> import re
>>> with open('test.x') as f:
... print(re.findall(r'^hoststatus.*?\n\}$', f.read(), re.DOTALL | re.MULTILINE))
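If the file should be processed line by line, as awk and the Ruby flip-flop do, a small generator can track whether the current line is inside a range. This is a sketch under that assumption; awk_range and the sample text are hypothetical names and data, not part of any library:

```python
import re


def awk_range(lines, start, stop):
    """Yield lines from one matching `start` through the next one matching
    `stop`, inclusive, like awk's /start/,/stop/ range pattern."""
    start_re, stop_re = re.compile(start), re.compile(stop)
    inside = False
    for line in lines:
        if not inside and start_re.search(line):
            inside = True
        if inside:
            yield line
            # As in awk, the stop pattern is also tested on the line
            # that opened the range.
            if stop_re.search(line):
                inside = False


sample = """drdxrdx
hoststatus {
host_name=myhost
}
trailing
""".splitlines(keepends=True)

print("".join(awk_range(sample, r"hoststatus", r"\}")), end="")
```

Because it is a generator over any iterable of lines, it can be fed an open file object directly and never holds more than one line at a time.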
