python string split slice and into a list - python

I have a string, for example "streemlocalbbv",
and a function my_function that takes this string and a substring that I want to find ("loc") in the original string. What I want to get returned is this:
my_function("streemlocalbbv", "loc")
output = ["streem","loc","albbv"]
What I did so far is:
def find_split(string, find_word):
    length = len(string)
    find_word_start_index = string.find(find_word)
    find_word_end_index = find_word_start_index + len(find_word)
    a = string[0:find_word_start_index]
    b = string[find_word_start_index:find_word_end_index]
    c = string[find_word_end_index:length]
    return [a, b, c]
I find the index of the substring I am looking for in the original string and then slice the original string around it, but I am not sure this is how I should do it.

You can use str.partition which does exactly what you want:
>>> "streemlocalbbv".partition("loc")
('streem', 'loc', 'albbv')
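Note that partition returns a tuple, while the question asks for a list. A minimal sketch of a wrapper, assuming that returning the whole string as a single-element list is acceptable when the substring is missing (that fallback is an assumption, not something the question specifies):

```python
def find_split(text, needle):
    # partition returns (head, separator, tail); the separator is ''
    # when needle does not occur in text.
    head, sep, tail = text.partition(needle)
    if not sep:
        # Assumed fallback: no match, so return the text unsplit.
        return [text]
    return [head, sep, tail]


print(find_split("streemlocalbbv", "loc"))  # ['streem', 'loc', 'albbv']
```

Only the first occurrence of the substring is used as the split point, which matches the behaviour of str.find in the question's own attempt.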

Use the split function:
def find_split(string, find_word):
    # maxsplit=1 keeps everything after the first match in ends[1]
    ends = string.split(find_word, 1)
    return [ends[0], find_word, ends[1]]

Use the split, index and insert functions to solve this:
def my_function(word, split_by):
    l = word.split(split_by)
    l.insert(l.index(word[:word.find(split_by)]) + 1, split_by)
    return l
print(my_function("streemlocalbbv", "loc"))
#['streem', 'loc', 'albbv']

Related

Remove Prefixes From a String

What's a cute way to do this in python?
Say we have a list of strings:
clean_be
clean_be_al
clean_fish_po
clean_po
and we want the output to be:
be
be_al
fish_po
po
Another approach which will work for all scenarios:
import re
data = ['clean_be',
        'clean_be_al',
        'clean_fish_po',
        'clean_po', 'clean_a', 'clean_clean', 'clean_clean_1']
for item in data:
    item = re.sub('^clean_', '', item)
    print(item)
Output:
be
be_al
fish_po
po
a
clean
clean_1
Here is a possible solution that works with any prefix:
lst = ['clean_be', 'clean_be_al', 'clean_fish_po', 'clean_po']
prefix = 'clean_'
result = [s[len(prefix):] if s.startswith(prefix) else s for s in lst]
You've provided only minimal information on what you're trying to achieve, but the desired output for the 4 given inputs can be created via the following function, which drops everything up to and including the first underscore:
def func(string):
    return "_".join(string.split("_")[1:])
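You can do this:
strlist = ['clean_be','clean_be_al','clean_fish_po','clean_po']
def func(myList: list, start: str):
    ret = []
    for element in myList:
        # Slice off the prefix only when it is present; str.lstrip(start)
        # would strip any leading characters found in start, not the prefix.
        ret.append(element[len(start):] if element.startswith(start) else element)
    return ret
print(func(strlist, 'clean_'))
I hope it was useful, Nohab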
There are many ways to do this based on what you have provided.
Apart from the above answers, you can also do it this way:
string = 'clean_be_al'
string = string.replace('clean_','',1)
This would remove the first occurrence of clean_ in the string.
Also, if the string is guaranteed to start with 'clean_' (6 characters), you can simply slice it off:
string = 'clean_be_al'
print(string[6:])
You can use lstrip to remove a prefix and rstrip to remove a suffix
line = "clean_be"
print(line.lstrip("clean_"))
Drawback:
lstrip([chars])
The [chars] argument is not a prefix; rather, all combinations of its values are stripped.
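The drawback is easy to miss because lstrip happens to give the right answer for the four inputs in the question. A short sketch of where it goes wrong, next to a prefix-aware alternative; remove_prefix is a hypothetical helper written for this illustration (on Python 3.9 and later the built-in str.removeprefix does the same job):

```python
def remove_prefix(text, prefix):
    """Return text without prefix if text starts with it, else text unchanged."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


names = ["clean_be", "clean_a", "clean_clean_1"]

# lstrip treats "clean_" as the character set {c, l, e, a, n, _}
stripped = [name.lstrip("clean_") for name in names]
print(stripped)  # ['be', '', '1']

# Prefix-aware removal keeps the rest of the string intact
cleaned = [remove_prefix(name, "clean_") for name in names]
print(cleaned)  # ['be', 'a', 'clean_1']
```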

Function call to convert a list of alpha characters to numeric

I am trying a manual implementation of the Soundex Algorithm and this requires converting alpha text characters to numeric text characters. I have defined the following function:
import re
def sub_pattern(text):
    sub = [str(i) for i in range(1,4)]
    string = text
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    encode_iter = iter(encode)
    alpha_search = re.compile('[a-zA-Z]')
    for i in sub:
        if alpha_search.search(string):
            pattern = next(encode_iter)
            string = pattern.sub(i, string)
        else:
            return(string)
This function will encode abc characters to 1 and xyz characters to 2. However, it only works for a single string and I need to pass a list of strings to the function. I've gotten the results I want using:
list(map(sub_pattern, ['aab', 'axy', 'bzz']))
But I want to be able to pass the list to the function directly. I've tried this with no success, as it ends up returning only the first string from the list.
def sub_pattern(text_list):
    all_encoded = []
    sub = [str(i) for i in range(1,4)]
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    encode_iter = iter(encode)
    alpha_search = re.compile('[a-zA-Z]')
    for string in text_list:
        for i in sub:
            if alpha_search.search(string):
                pattern = next(encode_iter)
                string = pattern.sub(i, string)
            else:
                all_encoded.append(string)
A couple of things to note:
Because I am implementing the Soundex algorithm, the order of the text when I encode it matters. I would prefer to update each character at its original index to avoid having to reorganize the string afterwards. In other words, you can't do any sorting on the string. I've created the iterator to incrementally update the string, and it only grabs the next regex pattern if not all of the characters have been converted yet.
This function will be part of two custom classes that I am creating. Both will call the __iter__ method so that I can create the iterable. That's why I use the iter() function: it creates a new instance of the iterator automatically.
I know this may seem like a trivial issue relative to what I'm doing, but I'm stuck.
Thank you in advance.
How about using your own function recursively? You get to keep the original exactly as it is, in case you needed it:
import re
def sub_pattern(text):
    if isinstance(text, str):
        sub = [str(i) for i in range(1,4)]
        string = text
        abc = re.compile('[abc]')
        xyz = re.compile('[xyz]')
        encode = [abc, xyz]
        encode_iter = iter(encode)
        alpha_search = re.compile('[a-zA-Z]')
        for i in sub:
            if alpha_search.search(string):
                pattern = next(encode_iter)
                string = pattern.sub(i, string)
            else:
                return(string)
    else:
        return([sub_pattern(t) for t in text])
print(list(map(sub_pattern, ['aab', 'axy', 'bzz']))) # old version still works
print(sub_pattern(['aab', 'axy', 'bzz'])) # new version yields the same result
In case a reader doesn't know what recursion means: it is calling a function from within itself.
It is allowed because each function call creates its own scope.
It can be useful when you can solve a problem by performing a simple operation multiple times, or can't predict in advance how many times you need to perform it to reach your solution, e.g. when you need to unpack nested structures.
It is defined by choosing a base case (the solution) and calling the function in all other cases until you reach your base case.
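To make the base case and the recursive case concrete, here is a small sketch that unpacks nested lists. The flatten function is a hypothetical helper for illustration only and is not part of the Soundex code above:

```python
def flatten(items):
    """Return a flat list of all non-list elements, in their original order."""
    flat = []
    for item in items:
        if isinstance(item, list):
            # Recursive case: the element is itself a list, so unpack it
            # with another call to the same function.
            flat.extend(flatten(item))
        else:
            # Base case: a plain element needs no further unpacking.
            flat.append(item)
    return flat


print(flatten(['aab', ['axy', ['bzz']]]))  # ['aab', 'axy', 'bzz']
```

The nesting depth does not have to be known in advance; each nested list simply triggers one more call.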
I assume the issue with your example was that the iterator was created only once: after it had been traversed for the first string, the next call to next() for the following string ran into StopIteration.
I'm not sure this is what you want, but I would create a new iterator for each string, since you have to be able to traverse over all of it for every new item. I tweaked some variable names that may cause confusion, too (string and sub). See comments for changes:
import re
def sub_pattern(text_list):
    all_encoded = []
    digits = [str(i) for i in range(1,4)]
    abc = re.compile('[abc]')
    xyz = re.compile('[xyz]')
    encode = [abc, xyz]
    alpha_search = re.compile('[a-zA-Z]')
    for item in text_list:
        # Create new iterator for each string.
        encode_iter = iter(encode)
        for i in digits:
            if alpha_search.search(item):
                pattern = next(encode_iter)
                item = pattern.sub(i, item)
            else:
                all_encoded.append(item)
                # You likely want appending to end once no more letters can be found.
                break
    # Return encoded texts.
    return all_encoded
Test:
print(sub_pattern(['aab', 'axy', 'bzz'])) # Output: ['111', '122', '122']
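If the per-character mapping is fixed, a translation table is another way to encode each string in place without any iterator bookkeeping. This is only a sketch of an alternative, not the approach from the answers above; the table covers just the toy mapping from the question (abc to 1, xyz to 2), not the full Soundex table, and encode_all and SOUND_TABLE are hypothetical names:

```python
# Map each letter to its digit; characters not in the table are left as-is.
SOUND_TABLE = str.maketrans("abcxyz", "111222")


def encode_all(texts):
    """Encode every string in texts, keeping each character's position."""
    return [text.translate(SOUND_TABLE) for text in texts]


print(encode_all(['aab', 'axy', 'bzz']))  # ['111', '122', '122']
```

Because translate replaces each character where it stands, the order of the text is preserved, which is the constraint the question stresses.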

Update last character of string with a value

I have two strings:
input = "12.34.45.362"
output = "2"
I want to be able to replace the 362 in input by 2 from output.
Thus the final result should be 12.34.45.2. I am unsure on how to do it. Any help is appreciated.
You can use a simple regex for this:
import re
input_ = "12.34.45.362"
output = "2"
input_ = re.sub(r"\.\d+$", f".{output}", input_)
print(input_)
Output:
12.34.45.2
Notice that I also changed input to input_, so we're not shadowing the built-in input() function.
You can also use a simpler but slightly less robust pattern, which doesn't take the period into account at all and just replaces the run of digits at the end:
import re
input_ = "12.34.45.362"
output = "2"
input_ = re.sub(r"\d+$", output, input_)
print(input_)
Output:
12.34.45.2
Just in case you want to do this for any string of form X.Y.Z.W where X, Y, Z, and W may be of non-constant length:
new_result = ".".join(your_input.split(".")[:-1]) + "." + output
s.join joins a collection into a single string, placing the string s between each element. s.split(".") turns a string into a list of the pieces found between each occurrence of the given character (here "."). Slicing the list (l[:-1]) gives you all but the last element, and finally string concatenation (if you are sure output is a str) gives you your result.
Breaking it down step-by-step:
your_input = "12.34.45.362"
your_input.split(".") # == ["12", "34", "45", "362"]
your_input.split(".")[:-1] # == ["12", "34", "45"]
".".join(your_input.split(".")[:-1]) # == "12.34.45"
".".join(your_input.split(".")[:-1]) + "." + output # == "12.34.45.2"
If you only want to split at the last ".", do a right split, keep everything before it, and use string formatting:
i = "12.34.45.362"
r = "{}.2".format(i.rsplit(".", 1)[0])
Output:
'12.34.45.2'
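The same idea can be wrapped in a small function so the replacement value is not hard-coded. A minimal sketch using str.rpartition, which splits at the last occurrence of the separator; replace_last_field is a hypothetical name, and returning just the new value when no separator is present is an assumption:

```python
def replace_last_field(value, new_last, sep="."):
    # rpartition returns (head, separator, tail), splitting at the LAST sep;
    # the separator is '' when sep does not occur in value.
    head, found_sep, _old_last = value.rpartition(sep)
    if not found_sep:
        # Assumed behaviour: a value with a single field is replaced entirely.
        return new_last
    return head + found_sep + new_last


print(replace_last_field("12.34.45.362", "2"))  # 12.34.45.2
```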

Python Joining List and adding and removing characters

I have a list that I need to .join into a string, with extra characters added at both ends:
my_list = ['3.3.3.3', '2.2.2.3', '2.2.2.2']
my_list.append(')"')
my_list.insert(0,'"(')
hostman = '|'.join('{0}'.format(w) for w in my_list)
#my_list.pop()
print(hostman)
print(my_list)
My output is "(|3.3.3.3|2.2.2.3|2.2.2.2|)"
I need the output to be "(3.3.3.3|2.2.2.3|2.2.2.2)"
How can I strip the first and last | from the string?
You are making it harder than it needs to be. You can just use join() directly with the list:
my_list = ['3.3.3.3', '2.2.2.3', '2.2.2.2']
s = '"(' + '|'.join(my_list) + ')"'
# s is "(3.3.3.3|2.2.2.3|2.2.2.2)"
# with quotes as part of the string
or if you prefer format:
s = '"({})"'.format('|'.join(my_list))
Try this:
hostman = "("+"|".join(my_list)+")"
Output:
'(3.3.3.3|2.2.2.3|2.2.2.2)'
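The extra pipes in the question appear because join puts the separator between every pair of elements, and the brackets had been added to the list as elements. A small sketch that keeps the wrapping characters out of the list; wrap_joined and its parameter names are hypothetical, and the defaults reproduce the quoted form the question asks for:

```python
def wrap_joined(items, sep="|", prefix='"(', suffix=')"'):
    """Join items with sep, then add prefix and suffix around the result."""
    # str() lets the helper accept non-string items as well.
    return prefix + sep.join(str(item) for item in items) + suffix


my_list = ['3.3.3.3', '2.2.2.3', '2.2.2.2']
print(wrap_joined(my_list))  # "(3.3.3.3|2.2.2.3|2.2.2.2)"
```

The original list is left untouched, so no pop() is needed afterwards.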

Remove comma and change string to float

I want to find "money" in a file and convert the string to a float. For example, I use a regular expression to find "$33,326" and would like to turn it into [33326.0, "$"] (i.e., remove the comma and the $ sign and convert to float). I wrote the following function, but it gives me an error:
import locale, re
def currencyToFloat(x):
    empty = []
    reNum = re.compile(r"""(?P<prefix>\$)?(?P<number>[.,0-9]+)(?P<suffix>\s+[a-zA-Z]+)?""")
    new = reNum.findall(x)
    for i in new:
        i[1].replace(",", "")
        float(i[1])
        empty.append(i[1])
        empty.append(i[0])
    return empty
print currencyToFloat("$33,326")
Can you help me debug my code?
money = "$33,326"
money_list = [float("".join(money[1:].split(","))), "$"]
print(money_list)
OUTPUT
[33326.0, '$']
When you do
float(i[1])
you are not modifying anything. You should store the result in some variable, like:
temp = ...
But float() cannot parse a number that contains a thousands separator, so remove the comma first:
temp = i[1].replace(",", "")
and then cast it to float and append to the list:
empty.append(float(temp))
Note:
Something important you should know is that when you loop through a list, like
for i in new:
i is just a name bound to each element in turn, so assigning a new value to i does not change the list new. To modify the list you can iterate over the indices:
for i in range(len(new)):
    new[i] = ...
You can use str.translate() (Python 2 only, where the first argument may be None):
>>> money = "$33,326"
>>> float(money.translate(None, ",$"))
33326.0
With Python 3 you can use str.maketrans with str.translate:
money = "$33,326"
print('money: {}'.format(float(money.translate(str.maketrans('', '', ",$")))))
Output: money: 33326.0
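Putting the fixes from the answers together, the function from the question could look roughly like this in Python 3. This is a sketch under assumptions: the name currency_to_float, the stricter number pattern, and returning one [value, prefix] pair per match are choices made here, not something the question specifies:

```python
import re

# Optional "$" prefix, then digits with optional ",ddd" groups and decimals.
MONEY_RE = re.compile(r"(?P<prefix>\$)?(?P<number>[0-9]+(?:,[0-9]{3})*(?:\.[0-9]+)?)")


def currency_to_float(text):
    """Return a [value, prefix] pair for every amount found in text."""
    result = []
    for match in MONEY_RE.finditer(text):
        # str.replace returns a new string, so the result must be kept.
        number = float(match.group("number").replace(",", ""))
        # The prefix group is None when there is no "$"; use '' instead.
        result.append([number, match.group("prefix") or ""])
    return result


print(currency_to_float("$33,326"))  # [[33326.0, '$']]
```

Using finditer with named groups avoids indexing into the tuples that findall returns, which are immutable and cannot be updated in place.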
