Python remove common word from list - python

I need to remove a common word from a list. The word that needs to be removed is IPNetwork.
IP_list = [IPNetwork('10.60.252.0/23'),
IPNetwork('10.60.254.0/23'),
IPNetwork('10.208.0.0/15'),
IPNetwork('10.208.64.80/28'),
IPNetwork('10.208.152.0/24'),
IPNetwork('10.208.153.0/24'),
IPNetwork('10.208.154.0/24'),
IPNetwork('10.208.155.128/25'),
IPNetwork('10.208.156.0/24')]
expected result:
['10.60.252.0/23',
'10.60.254.0/23',
'10.208.0.0/15',
'10.208.64.80/28',
'10.208.152.0/24',
'10.208.153.0/24',
'10.208.154.0/24',
'10.208.155.128/25',
'10.208.156.0/24']

IPNetwork is a class, and the list holds instances of it. If you just want the IP addresses in string format, convert them to strings explicitly.
Using map:
>>> list(map(str, IP_list))
['10.60.252.0/23', '10.60.254.0/23', '10.208.0.0/15', '10.208.64.80/28', '10.208.152.0/24', '10.208.153.0/24', '10.208.154.0/24', '10.208.155.128/25', '10.208.156.0/24']
Or using list comprehension
>>> [str(ip) for ip in IP_list]
['10.60.252.0/23', '10.60.254.0/23', '10.208.0.0/15', '10.208.64.80/28', '10.208.152.0/24', '10.208.153.0/24', '10.208.154.0/24', '10.208.155.128/25', '10.208.156.0/24']
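The same idea can be tried without the original library. The sketch below assumes the standard library's ipaddress.ip_network as a stand-in for the IPNetwork class in the question (which presumably comes from a third-party package); the names ip_list and as_strings are only illustrative. The point is that the class name shows up only in each object's repr(), while str() yields the plain CIDR text.

```python
from ipaddress import ip_network

# Stand-in objects: stdlib networks instead of the question's IPNetwork instances.
ip_list = [ip_network('10.60.252.0/23'),
           ip_network('10.208.0.0/15'),
           ip_network('10.208.155.128/25')]

# Printing the list shows each element's repr(), which includes the class name.
# str() returns only the CIDR notation.
as_strings = [str(net) for net in ip_list]
print(as_strings)
```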

Related

How to generate substrings of a dot-separated string?

I am struggling to generate a list of substrings from a given list of strings.
I have a list of domains -
domains = ["abc.pqr.com", "pqr.yum.abc.com"]
Now, for each domain in the list I want to generate subdomains.
For example the subdomains of domain "abc.pqr.com" would be
["pqr.com", "abc.pqr.com"]
Also, for domain "pqr.yum.abc.com" the subdomains would be
["yum.abc.com", "pqr.yum.abc.com", "abc.com"]
So the output of the method would be -
["yum.abc.com", "pqr.yum.abc.com", "abc.com", "pqr.com", "abc.pqr.com"]
First, iterate over the domains and split each one on the '.' separator. To keep the final 'com' element from appearing on its own, iterate only over range(len(liste) - 1). Each slice from index i to the end is one subdomain. After creating every slice, join it back together with the '.' separator.
domains = ["abc.pqr.com", "pqr.yum.abc.com"]
domains_new = []
for d in domains:
    liste = d.split(".")
    for i in range(len(liste)-1):
        domains_new.append(liste[i:])
domains_new = [".".join(ele) for ele in domains_new]
domains_new
output:
['abc.pqr.com', 'pqr.com', 'pqr.yum.abc.com', 'yum.abc.com', 'abc.com']
Assuming the domains only contain simple TLDs like .com and no multi-part suffixes like .co.uk, you can use a Python list comprehension.
[domain.split(".", x)[-1] for domain in domains for x in range(domain.count("."))]
domains = ["abc.pqr.com", "pqr.yum.abc.com"]
lst = []
for i in domains:
    splits = i.split('.')
    for j in range(len(splits),1,-1):
        lst.append('.'.join(splits[-j:]))
I did something similar to https://stackoverflow.com/users/12959241/alphabetagamma
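The expected output in the question lists each subdomain once, so overlapping inputs may need de-duplication. A minimal sketch, assuming the same simple-TLD restriction as the answers above; the function name subdomains and the extra sample domain "x.abc.com" are made up for illustration. A dict is used as an insertion-ordered set so the first-seen order is kept.

```python
def subdomains(domains):
    """Return every dotted suffix with at least two labels, without duplicates."""
    seen = {}  # dict keys keep insertion order, so this acts as an ordered set
    for domain in domains:
        labels = domain.split(".")
        # Stop one short of the end so the bare TLD ("com") is never emitted.
        for start in range(len(labels) - 1):
            seen.setdefault(".".join(labels[start:]), None)
    return list(seen)

result = subdomains(["abc.pqr.com", "pqr.yum.abc.com", "x.abc.com"])
print(result)
```

Here "abc.com" is produced by both the second and third input but appears only once in the result.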

Split List Elements in byte format to separate bytes in python

I have a list with byte elements like this:
list = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
Now I want to separate all bytes like following:
list_new =[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']
I am assuming here that you want to split the data wherever a '\x' escape appears, since that matches your desired output. Let me know otherwise. I am also not sure why you have this type of string; it's a little awkward to work with, and more context on the question would be helpful. Nevertheless, I got your desired output in the following way (maybe not efficient, but it gets the job done).
import re
from codecs import encode
lists = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
split = [re.split(r'(?=\\x)', str(item)) for item in lists] ## splitting on \x using a lookahead, so the \x stays with its chunk
output = [] ## container to save the final items
for item in split: ## split is a list of lists, hence two for loops
    for nitem in item:
        if nitem != "b'": ## skip anything which is only "b'"
            output.append(nitem.replace('\\n','').replace("'",'').encode()) ## finally append every item
## Note here that each item in output contains two backslashes; to remove them we use the encode function from the codecs module
## like below
[encode(itm.decode('unicode_escape'), 'raw_unicode_escape') for itm in output] ## Final output
Output:
[b'\x00', b'\xcc', b'\x14I', b'\x8dy_', b'\xeb', b'\xbc1C']
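The same result can also be reached without going through str() at all. This is a sketch under an assumption, not the answerer's method: it treats "a byte shown as \x.." as "a byte outside printable ASCII (0x20 to 0x7e)", starts a new chunk at each such byte, and drops newlines first, as the answer above does. The helper name split_on_nonprintable is hypothetical, and any printable bytes before the first non-printable byte would be discarded.

```python
import re

def split_on_nonprintable(data):
    """Split bytes into chunks that each start at a non-printable byte."""
    data = data.replace(b'\n', b'')  # newlines are dropped, as in the desired output
    # One non-printable byte followed by any run of printable ASCII bytes.
    return re.findall(rb'[^\x20-\x7e][\x20-\x7e]*', data)

chunks = [b'\x00\xcc\n', b'\x14I\x8dy_\xeb\xbc1C']
list_new = [piece for item in chunks for piece in split_on_nonprintable(item)]
print(list_new)
```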

Function call to convert a list of alpha characters to numeric

I am trying a manual implementation of the Soundex Algorithm and this requires converting alpha text characters to numeric text characters. I have defined the following function:
import re
def sub_pattern(text):
    sub = [str(i) for i in range(1,4)]
    string = text
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    encode_iter = iter(encode)
    alpha_search = re.compile('[a-zA-Z]')
    for i in sub:
        if alpha_search.search(string):
            pattern = next(encode_iter)
            string = pattern.sub(i, string)
        else:
            return(string)
This function will encode abc characters to 1 and xyz characters to 2. However, it only works for a single string and I need to pass a list of strings to the function. I've gotten the results I want using:
list(map(sub_pattern, ['aab', 'axy', 'bzz']))
But I want to be able to pass the list to the function directly. I've tried this with no success, as it ends up returning only the first string from the list.
def sub_pattern(text_list):
    all_encoded = []
    sub = [str(i) for i in range(1,4)]
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    encode_iter = iter(encode)
    alpha_search = re.compile('[a-zA-Z]')
    for string in text_list:
        for i in sub:
            if alpha_search.search(string):
                pattern = next(encode_iter)
                string = pattern.sub(i, string)
            else:
                all_encoded.append(string)
A couple things to note:
Because I am implementing the Soundex Algorithm, the order of the text when I encode it matters. I would prefer to update each character at its original index to avoid having to reorganize the string afterwards. In other words, you can't do any sorting on the string. I've created the iterator to incrementally update the string, and it only grabs the next regex pattern if not all the characters have been converted yet.
This function will be a part of two custom classes that I am creating. Both will call the __iter__ method so that I can create the iterable. That's why I use the iter() function to create an iterable, because it will create a new instance of the iterator automatically.
I know this may seem like a trivial issue relative to what I'm doing, but I'm stuck.
Thank you in advance.
How about using your own function recursively? You get to keep the original exactly as it is, in case you needed it:
import re
def sub_pattern(text):
    if isinstance(text, str):
        sub = [str(i) for i in range(1,4)]
        string = text
        abc = re.compile('[abc]')
        xyz = re.compile('[xyz]')
        encode = [abc, xyz]
        encode_iter = iter(encode)
        alpha_search = re.compile('[a-zA-Z]')
        for i in sub:
            if alpha_search.search(string):
                pattern = next(encode_iter)
                string = pattern.sub(i, string)
            else:
                return(string)
    else:
        return([sub_pattern(t) for t in text])
print(list(map(sub_pattern, ['aab', 'axy', 'bzz']))) # old version still works
print(sub_pattern(['aab', 'axy', 'bzz'])) # new version yields the same result
In case a reader doesn't know what "recursively" means: it is calling a function from within itself.
It is allowed because each function call creates its own scope.
It can be useful when you can solve a problem by performing a simple operation multiple times, or when you can't predict in advance how many times you need to perform it to reach your solution, e.g. when you need to unpack nested structures.
It is defined by choosing a base case (the solution) and calling the function in all other cases until you reach the base case.
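The points above about base cases and nested structures can be illustrated with a small, separate example. This is only a sketch; the function name flatten and the sample data are made up for illustration and are not part of the answer's code.

```python
def flatten(items):
    """Recursively unpack nested lists into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recursive case: the element is itself a list, so unpack it too.
            flat.extend(flatten(item))
        else:
            # Base case: a plain element is kept as it is.
            flat.append(item)
    return flat

nested = ['aab', ['axy', ['bzz']]]
print(flatten(nested))
```

The depth of nesting does not need to be known in advance, because each call handles one level and delegates the rest.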
I assume the issue with your example was, that once you traversed the iterator, you ran into StopIteration for the next string.
I'm not sure this is what you want, but I would create a new iterator for each string, since you have to be able to traverse over all of it for every new item. I tweaked some variable names that may cause confusion, too (string and sub). See comments for changes:
def sub_pattern(text_list):
    all_encoded = []
    digits = [str(i) for i in range(1,4)]
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    alpha_search = re.compile('[a-zA-Z]')
    for item in text_list:
        # Create new iterator for each string.
        encode_iter = iter(encode)
        for i in digits:
            if alpha_search.search(item):
                pattern = next(encode_iter)
                item = pattern.sub(i, item)
            else:
                all_encoded.append(item)
                # You likely want appending to end once no more letters can be found.
                break
    # Return encoded texts.
    return all_encoded
Test:
print(sub_pattern(['aab', 'axy', 'bzz'])) # Output: ['111', '122', '122']
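For the two letter groups used in this thread, a translation table is a possible alternative to the iterator of regex patterns. This swaps in a different technique (str.translate) rather than repairing the original loop; the names TABLE and encode_all are hypothetical. Each character is replaced at its original position, so the order is preserved. Unlike the versions above, letters outside both groups are simply left unchanged.

```python
# Map each letter of a group to that group's digit: abc -> 1, xyz -> 2.
TABLE = str.maketrans({**dict.fromkeys("abc", "1"), **dict.fromkeys("xyz", "2")})

def encode_all(text_list):
    """Encode every string in the list, replacing characters in place."""
    return [text.translate(TABLE) for text in text_list]

encoded = encode_all(['aab', 'axy', 'bzz'])
print(encoded)
```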

How to match if a string is inside another string

Basically, I need to find out whether a string (actually a path) is contained in a similar but longer string.
I have these strings in a list:
/aa/bb/cc
/aa/bb/cc/11
/aa/bb/cc/22
/aa/bb/dd
/aa/bb/dd/33
/aa/bb/dd/44
I want the resulting list to contain only strings like:
/aa/bb/cc/11
/aa/bb/cc/22
/aa/bb/dd/33
/aa/bb/dd/44
I need a new list without /aa/bb/cc and /aa/bb/dd, because /aa/bb/cc/11 and /aa/bb/cc/22 exist; the same goes for /aa/bb/dd, since /aa/bb/dd/33 and /aa/bb/dd/44 exist. So I do not want the base forms /aa/bb/cc and /aa/bb/dd.
I hope I was clear :-D
How can I do that in Python 3?
Regards
Use regular expressions.
import re
list_1 = ["/aa/bb/cc",
          "/aa/bb/cc/11",
          "/aa/bb/cc/22",
          "/aa/bb/dd",
          "/aa/bb/dd/33",
          "/aa/bb/dd/44"]
regex = re.compile(r'/aa/bb/cc/+.')
obj = filter(regex.search, list_1)
regex2 = re.compile(r'/aa/bb/dd/+.')
obj2 = filter(regex2.search, list_1)
print(list(obj))
print(list(obj2))
Output:
['/aa/bb/cc/11', '/aa/bb/cc/22']
['/aa/bb/dd/33', '/aa/bb/dd/44']
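The regex answer hard-codes the two base paths. If the base paths are not known in advance, one possible approach is to drop every path that is a parent directory of some other path in the list. This is a sketch with a hypothetical function name, leaf_paths; it compares every pair of paths, which is fine for short lists but quadratic for long ones.

```python
def leaf_paths(paths):
    """Keep only the paths that are not a parent directory of another path."""
    return [
        p for p in paths
        # Appending "/" makes sure "/aa/bb/cc" does not count as a parent of "/aa/bb/ccc".
        if not any(other.startswith(p.rstrip("/") + "/") for other in paths)
    ]

list_1 = ["/aa/bb/cc", "/aa/bb/cc/11", "/aa/bb/cc/22",
          "/aa/bb/dd", "/aa/bb/dd/33", "/aa/bb/dd/44"]
print(leaf_paths(list_1))
```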

regex to find match in element of list

I'm new to Python and have compiled a list of items from a file. Each item holds an element that appeared in the file and its frequency in the file, like this:
('95.108.240.252', 9)
It's mostly IP addresses I'm gathering. I'd like to output the address and frequency like this instead:
IP Frequency
95.108.240.252 9
I'm trying to do this by running a regex on the list item and printing the match, but when I try, it raises the following error: TypeError: expected string or bytes-like object
This is the code I'm using to do all this now:
ips = [] # IP address list
for line in f:
    match = re.search("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line) # Get all IPs line by line
    if match:
        ips.append(match.group()) # if found add to list
from collections import defaultdict
freq = defaultdict( int )
for i in ips:
    freq[i] += 1 # get frequency of IPs
print("IP\t\t Frequency") # Print header
freqsort = sorted(freq.items(), reverse = True, key=lambda item: item[1]) # sort in descending frequency
for c in range(0,4): # print the 4 most frequent IPs
    # print(freqsort[c]) # This line prints the item like ('95.108.240.252', 9)
    m1 = re.search("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", freqsort[c]) # This is the line returning errors - trying to parse IP on its own from the list
    print(m1.group()) # Then print it
I'm not even trying to parse the frequency yet; I just wanted the IPs as a starting point.
The second argument to re.search() should be a string, but you are passing a tuple. That is why it raises an error saying it expected a string or bytes-like object.
NOTE: Also make sure that freqsort has at least 4 elements; otherwise freqsort[c] will raise an IndexError.
Delete the last two lines and use this instead:
print(freqsort[c][0])
If you want to stick to your approach you can use the following, although the regex is redundant because freqsort[c][0] is already the IP string:
m1 = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", freqsort[c][0]) # Pass the IP string (first tuple element), not the whole tuple
print(m1.group())
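To get the two-column layout asked for in the question, each (ip, count) tuple can be unpacked directly and the list sliced instead of indexed. A minimal sketch: the helper name format_rows and the sample data are made up for illustration. Slicing with [:limit] never raises IndexError, even when fewer than four addresses were found.

```python
def format_rows(freqsort, limit=4):
    """Build the header plus one tab-separated line per (ip, count) tuple."""
    rows = ["IP\t\tFrequency"]
    # Slicing returns at most `limit` items and is safe on shorter lists.
    for ip, count in freqsort[:limit]:
        rows.append(f"{ip}\t{count}")
    return rows

sample = [('95.108.240.252', 9), ('10.0.0.1', 3)]
print("\n".join(format_rows(sample)))
```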
If the lines are bytes (for example, because the file was opened in binary mode), use a bytes pattern instead:
# notice the `rb` before the quotes.
match = re.search(rb'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)
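Try a regex with lookbehind and lookahead, applied to the string form of the tuple, e.g. str(freqsort[c]).
(?<=\(\')(.*)(?=\').*?(\d+)
The first captured group will be your IP and the second the frequency. The lazy .*? lets \d+ capture every digit of the frequency rather than only the last one.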
You can use ipaddress and Counter from the standard library to assist with this:
from collections import Counter
from ipaddress import ip_address
with open('somefile.log') as fin:
    ips = Counter()
    for line in fin:
        ip, rest_of_line = line.partition(' ')[::2]
        try:
            ips[ip_address(ip)] += 1
        except ValueError:
            pass
print(ips.most_common(4))
This will also handle both IPv4 and IPv6 addresses and makes sure they are technically valid rather than just "looking" right. Using a collections.Counter also gives you a .most_common() method that sorts by frequency and limits the result to the n most common entries.
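The behaviour described above can be checked without a log file. The sketch below runs the same Counter and ip_address approach on a few in-memory lines; the function name count_ips and the sample lines are invented for illustration, and it assumes, like the answer, that the address is the first space-separated field of each line.

```python
from collections import Counter
from ipaddress import ip_address

def count_ips(lines):
    """Count valid IPv4/IPv6 addresses found in the first field of each line."""
    ips = Counter()
    for line in lines:
        first_field = line.partition(' ')[0]
        try:
            ips[ip_address(first_field)] += 1
        except ValueError:
            # The first field is not a valid address, so skip this line.
            continue
    return ips

sample_lines = [
    "95.108.240.252 GET /index.html",
    "not-an-ip GET /robots.txt",
    "95.108.240.252 GET /about.html",
    "::1 GET /",
]
counts = count_ips(sample_lines)
top = counts.most_common(1)
print(top)
```

Note that the Counter keys are address objects, not strings; call str() on a key to print it as plain text.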
