How to match if a string is inside another string - python

Basically, I need to find out whether a string (actually a path) is contained in a similar but longer string.
I have these strings in a list:
/aa/bb/cc
/aa/bb/cc/11
/aa/bb/cc/22
/aa/bb/dd
/aa/bb/dd/33
/aa/bb/dd/44
I want to end up with a list containing only strings like:
/aa/bb/cc/11
/aa/bb/cc/22
/aa/bb/dd/33
/aa/bb/dd/44
I need a new list without /aa/bb/cc and /aa/bb/dd, because /aa/bb/cc/11 and /aa/bb/cc/22 exist (and likewise /aa/bb/dd/33 and /aa/bb/dd/44 for /aa/bb/dd), so I do not want the base forms /aa/bb/cc and /aa/bb/dd.
I hope I was clear :-D
How can I do that in Python 3?
Regards

Use regular expressions.
import re
list_1 = ["/aa/bb/cc",
"/aa/bb/cc/11",
"/aa/bb/cc/22",
"/aa/bb/dd",
"/aa/bb/dd/33",
"/aa/bb/dd/44"]
regex = re.compile(r'/aa/bb/cc/+.')
obj = filter(regex.search, list_1)
regex2 = re.compile(r'/aa/bb/dd/+.')
obj2 = filter(regex2.search, list_1)
print(list(obj))
print(list(obj2))
Output:
['/aa/bb/cc/11', '/aa/bb/cc/22']
['/aa/bb/dd/33', '/aa/bb/dd/44']
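The regex answer above hard-codes the two base paths. If the base paths are not known in advance, one possible approach is to drop every path that is a parent of another path in the list. This is only a sketch: the helper name leaf_paths is made up for illustration, and it assumes "/" is the separator and that the list is small enough for a quadratic scan.

```python
paths = ["/aa/bb/cc",
         "/aa/bb/cc/11",
         "/aa/bb/cc/22",
         "/aa/bb/dd",
         "/aa/bb/dd/33",
         "/aa/bb/dd/44"]

def leaf_paths(paths):
    """Keep only the paths that are not a parent of another path in the list."""
    result = []
    for p in paths:
        # Append "/" so that "/aa/bb/c" is not treated as a parent of "/aa/bb/cc".
        prefix = p.rstrip("/") + "/"
        if not any(other.startswith(prefix) for other in paths):
            result.append(p)
    return result

print(leaf_paths(paths))
# ['/aa/bb/cc/11', '/aa/bb/cc/22', '/aa/bb/dd/33', '/aa/bb/dd/44']
```

The original order of the list is preserved, and a base path with no children is kept.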

Related

Break the string (udacity: Intro to python ) [duplicate]

I wrote a quick script to remove the 'http://' substring from a list of website addresses saved in an Excel column. The replace function doesn't work, though, and I don't understand why.
from openpyxl import load_workbook
def rem(string):
    print string.startswith("http://") #it yields "True"
    string.replace("http://","")
    print string, type(string) #checking if it works (it doesn't though, the output is the same as the input)
wb = load_workbook("prova.xlsx")
ws = wb["Sheet"]
for n in xrange(2,698):
    c = "B"+str(n)
    print ws[c].value, type(ws[c].value) #just to check value and type (unicode)
    rem(str(ws[c].value)) #transformed to string in order to make replace() work
wb.save("prova.xlsx") #nothing has changed
str.replace(old, new) does not modify the string in place; it returns a new string. Change it to:
string = string.replace("http://","")
string.replace(old, new[, max]) only returns a value—it does not modify string. For example,
>>> a = "123"
>>> a.replace("1", "4")
'423'
>>> a
'123'
You must re-assign the string to its modified value, like so:
>>> a = a.replace("1", "4")
>>> a
'423'
So in your case, you would want to instead write
string = string.replace("http://", "")
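To make the point concrete, here is a small Python 3 sketch of a helper that returns the cleaned string instead of trying to change its argument. The name strip_scheme and the sample URLs are invented for illustration. In the spreadsheet loop from the question, the returned value would presumably also have to be assigned back to the cell (something like ws[c].value = strip_scheme(ws[c].value)) before saving, otherwise the workbook stays unchanged.

```python
def strip_scheme(url, scheme="http://"):
    """Return url without a leading scheme; the input string itself is never modified."""
    if url.startswith(scheme):
        return url[len(scheme):]
    return url

urls = ["http://example.com", "example.org", "http://a.b/http://c"]
cleaned = [strip_scheme(u) for u in urls]  # keep the returned values

print(cleaned)  # ['example.com', 'example.org', 'a.b/http://c']
print(urls[0])  # http://example.com (strings are immutable)
```

Unlike replace("http://", ""), this only removes the scheme at the start of the string.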

Remove Prefixes From a String

What's a cute way to do this in python?
Say we have a list of strings:
clean_be
clean_be_al
clean_fish_po
clean_po
and we want the output to be:
be
be_al
fish_po
po
Another approach which will work for all scenarios:
import re
data = ['clean_be',
'clean_be_al',
'clean_fish_po',
'clean_po', 'clean_a', 'clean_clean', 'clean_clean_1']
for item in data:
    item = re.sub('^clean_', '', item)
    print(item)
Output:
be
be_al
fish_po
po
a
clean
clean_1
Here is a possible solution that works with any prefix:
prefix = 'clean_'
result = [s[len(prefix):] if s.startswith(prefix) else s for s in lst]
You've merely provided minimal information on what you're trying to achieve, but the desired output for the 4 given inputs can be created via the following function:
def func(string):
    return "_".join(string.split("_")[1:])
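You can do this:
strlist = ['clean_be','clean_be_al','clean_fish_po','clean_po']
def func(myList: list, start: str):
    ret = []
    for element in myList:
        # Slice the prefix off only when it is really there; lstrip(start)
        # would instead strip any leading run of the characters in start.
        ret.append(element[len(start):] if element.startswith(start) else element)
    return ret
print(func(strlist, 'clean_'))
I hope it was useful, Nohab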
There are many ways to do this, based on what you have provided.
Apart from the answers above, you can also do it this way:
string = 'clean_be_al'
string = string.replace('clean_','',1)
This would remove the first occurrence of clean_ in the string.
Also, if the first word is guaranteed to be 'clean', you can simply slice off the first six characters:
string = 'clean_be_al'
print(string[6:])
You can use lstrip to remove a prefix and rstrip to remove a suffix
line = "clean_be"
print(line.lstrip("clean_"))
Drawback: the [chars] argument of lstrip([chars]) is not a prefix; rather, all combinations of its characters are stripped.
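A short sketch of that drawback, with an input chosen (for illustration only) so that the text after the prefix starts with one of the characters in "clean_". The helper remove_prefix is a made-up name; on Python 3.9 and later the built-in str.removeprefix does the same job.

```python
def remove_prefix(text, prefix):
    """Remove prefix only if text really starts with it."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

# lstrip treats "clean_" as a set of characters, so the leading "a" of "apple" goes too.
print("clean_apple".lstrip("clean_"))          # pple
print(remove_prefix("clean_apple", "clean_"))  # apple
print(remove_prefix("clean_be_al", "clean_"))  # be_al
```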

Function call to convert a list of alpha characters to numeric

I am trying a manual implementation of the Soundex Algorithm and this requires converting alpha text characters to numeric text characters. I have defined the following function:
import re
def sub_pattern(text):
    sub = [str(i) for i in range(1,4)]
    string = text
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    encode_iter = iter(encode)
    alpha_search = re.compile('[a-zA-Z]')
    for i in sub:
        if alpha_search.search(string):
            pattern = next(encode_iter)
            string = pattern.sub(i, string)
        else:
            return(string)
This function will encode abc characters to 1 and xyz characters to 2. However, it only works for a single string and I need to pass a list of strings to the function. I've gotten the results I want using:
list(map(sub_pattern, ['aab', 'axy', 'bzz']))
But I want to be able to pass the list to the function directly. I've tried this with no success, as it ends up returning only the first string from the list.
def sub_pattern(text_list):
    all_encoded = []
    sub = [str(i) for i in range(1,4)]
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    encode_iter = iter(encode)
    alpha_search = re.compile('[a-zA-Z]')
    for string in text_list:
        for i in sub:
            if alpha_search.search(string):
                pattern = next(encode_iter)
                string = pattern.sub(i, string)
            else:
                all_encoded.append(string)
A couple things to note:
Because I am implementing the Soundex Algorithm, the order of the text when I encode it matters. I would prefer to update each character at its original index to avoid having to reorganize the string afterwards. In other words, you can't do any sorting on the string... I've created the iterator to incrementally update the string, and it only grabs the next regex pattern if not all the characters have been converted yet.
This function will be a part of two custom classes that I am creating. Both will call the __iter__ method so that I can create the iterable. That's why I use the iter() function: it creates a new instance of the iterator automatically.
I know this may seem like a trivial issue relative to what I'm doing, but I'm stuck.
Thank you in advance.
How about using your own function recursively? You get to keep the original exactly as it is, in case you needed it:
import re
def sub_pattern(text):
    if isinstance(text, str):
        sub = [str(i) for i in range(1,4)]
        string = text
        abc = re.compile('[abc]')
        xyz = re.compile('[xyz]')
        encode = [abc, xyz]
        encode_iter = iter(encode)
        alpha_search = re.compile('[a-zA-Z]')
        for i in sub:
            if alpha_search.search(string):
                pattern = next(encode_iter)
                string = pattern.sub(i, string)
            else:
                return(string)
    else:
        return([sub_pattern(t) for t in text])
print(list(map(sub_pattern, ['aab', 'axy', 'bzz']))) # old version still works
print(sub_pattern(['aab', 'axy', 'bzz'])) # new version yields the same result
In case a reader doesn't know what recursively means: calling a function from within itself.
It is allowed because each function call creates its own scope.
It can be useful when you can solve a problem by performing a simple operation multiple times, or can't predict in advance how many times you need to perform it to reach your solution, e.g. when you need to unpack nested structures.
It is defined by choosing a base case (the solution) and calling the function in all other cases until you reach your base case.
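As a small illustration of the "unpack nested structures" case, here is a sketch of a recursive function. The name flatten and the nested sample list are made up for this example and are not part of the Soundex code.

```python
def flatten(items):
    """Return a flat list of the non-list elements found in a nested list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recursive case: the function calls itself on the inner list.
            flat.extend(flatten(item))
        else:
            # Base case: a plain element needs no further unpacking.
            flat.append(item)
    return flat

print(flatten(['aab', ['axy', ['bzz']]]))  # ['aab', 'axy', 'bzz']
```

Each call has its own flat list, which is why the nested calls do not interfere with one another.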
I assume the issue with your example was that, once you had traversed the iterator, you ran into StopIteration for the next string.
I'm not sure this is what you want, but I would create a new iterator for each string, since you have to be able to traverse over all of it for every new item. I tweaked some variable names that may cause confusion, too (string and sub). See comments for changes:
import re
def sub_pattern(text_list):
    all_encoded = []
    digits = [str(i) for i in range(1,4)]
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    alpha_search = re.compile('[a-zA-Z]')
    for item in text_list:
        # Create new iterator for each string.
        encode_iter = iter(encode)
        for i in digits:
            if alpha_search.search(item):
                pattern = next(encode_iter)
                item = pattern.sub(i, item)
            else:
                all_encoded.append(item)
                # You likely want appending to end once no more letters can be found.
                break
    # Return encoded texts.
    return all_encoded
Test:
print(sub_pattern(['aab', 'axy', 'bzz'])) # Output: ['111', '122', '122']

Python remove common word from list

I need to remove a common word from the list. The word that needs to be removed is IPNetwork.
IP_list = [IPNetwork('10.60.252.0/23'),
IPNetwork('10.60.254.0/23'),
IPNetwork('10.208.0.0/15'),
IPNetwork('10.208.64.80/28'),
IPNetwork('10.208.152.0/24'),
IPNetwork('10.208.153.0/24'),
IPNetwork('10.208.154.0/24'),
IPNetwork('10.208.155.128/25'),
IPNetwork('10.208.156.0/24')]
expected result:
['10.60.252.0/23',
'10.60.254.0/23',
'10.208.0.0/15',
'10.208.64.80/28',
'10.208.152.0/24',
'10.208.153.0/24',
'10.208.154.0/24',
'10.208.155.128/25',
'10.208.156.0/24']
IPNetwork is a class and you are instantiating objects of it. If you just want the IP addresses in string format, convert them to strings explicitly.
Using map:
>>> list(map(str, IP_list))
['10.60.252.0/23', '10.60.254.0/23', '10.208.0.0/15', '10.208.64.80/28', '10.208.152.0/24', '10.208.153.0/24', '10.208.154.0/24', '10.208.155.128/25', '10.208.156.0/24']
Or using a list comprehension:
>>> [str(ip) for ip in IP_list]
['10.60.252.0/23', '10.60.254.0/23', '10.208.0.0/15', '10.208.64.80/28', '10.208.152.0/24', '10.208.153.0/24', '10.208.154.0/24', '10.208.155.128/25', '10.208.156.0/24']
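The same repr-versus-str distinction can be tried without installing anything. The sketch below uses the standard-library ipaddress module purely as a stand-in for the IPNetwork class from the question (whose import is not shown there), so the class name in the printed repr differs.

```python
import ipaddress

# Stand-in objects; the question's list holds IPNetwork instances instead.
networks = [ipaddress.ip_network("10.60.252.0/23"),
            ipaddress.ip_network("10.208.64.80/28")]

# Printing the list shows each object's repr, which includes the class name.
print(networks)    # [IPv4Network('10.60.252.0/23'), IPv4Network('10.208.64.80/28')]

# str() gives just the address text.
as_strings = [str(net) for net in networks]
print(as_strings)  # ['10.60.252.0/23', '10.208.64.80/28']
```

So there is no "word" to remove from the list; the class name only appears because of how the objects are displayed.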

Remove comma and change string to float

I want to find "money" in a file and change the string to a float. For example, I use a regular expression to find "$33,326" and would like to change it to [33326.0, "$"] (i.e., remove the comma and the $ sign and convert to float). I wrote the following function, but it gives me an error.
import locale,re
def currencyToFloat(x):
    empty = []
    reNum = re.compile(r"""(?P<prefix>\$)?(?P<number>[.,0-9]+)(?P<suffix>\s+[a-zA-Z]+)?""")
    new = reNum.findall(x)
    for i in new:
        i[1].replace(",", "")
        float(i[1])
        empty.append(i[1])
        empty.append(i[0])
    return empty
print currencyToFloat("$33,326")
Can you help me debug my code?
money = "$33,326"
money_list = [float("".join(money[1:].split(","))), "$"]
print(money_list)
OUTPUT
[33326.0, '$']
When you do
float(i[1])
you are not modifying anything. You should store the result in some variable, like:
temp = ...
But float() cannot parse a number that contains a thousands-separator comma, so remove the comma first:
temp = i[1].replace(",", "")
and then cast it to float and append it to the list:
empty.append(float(temp))
Note:
Something important you should know is that when you loop through a list, like
for i in new:
i refers to each element in turn, so rebinding it (or calling a method such as replace() that returns a new string) makes no change to the list new. To modify the list you can iterate over the indices:
for i in range(len(new)):
    new[i] = ...
In Python 2 you can use str.translate():
>>> money = "$33,326"
>>> float(money.translate(None, ",$"))
33326.0
With Python 3 you can use str.maketrans with str.translate:
money = "$33,326"
print('money: {}'.format(float(money.translate(str.maketrans('', '', ",$")))))
Output: money: 33326.0
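Putting the answers together, a Python 3 version of the function from the question might look roughly like this. It is a sketch only: the name currency_to_float is invented here, the regex is the one from the question, and it assumes that every comma is a thousands separator and that each match contains a well-formed number.

```python
import re

# Pattern from the question: optional "$", digits with "," or ".", optional unit word.
MONEY_RE = re.compile(r"(?P<prefix>\$)?(?P<number>[.,0-9]+)(?P<suffix>\s+[a-zA-Z]+)?")

def currency_to_float(text):
    """Return a flat list [amount, prefix, amount, prefix, ...] for each match in text."""
    result = []
    for match in MONEY_RE.finditer(text):
        # replace() returns a new string, so the result has to be kept.
        number = match.group("number").replace(",", "")
        result.append(float(number))
        # The prefix group is None when there is no "$"; store "" in that case.
        result.append(match.group("prefix") or "")
    return result

print(currency_to_float("$33,326"))  # [33326.0, '$']
```

A match that consists only of punctuation (for example a lone ".") would still make float() raise ValueError, so real input may need an extra check.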
