Replace only the ending of a string - python

It irks me not to be able to do the following in a single line. I've a feeling that it can be done through list comprehension, but how?
given_string = "first.second.third.None"
string_splitted = given_string.split('.')
string_splitted[-1] = "fourth"
given_string = ".".join(string_splitted)
Please note that the number of dots (.) in the given_string is constant (3), so I always want to replace the fourth fragment of the string.

It seems like you should be able to do this without splitting into an array. Find the last . and slice to there:
> given_string = "first.second.third.None"
> given_string[:given_string.rfind('.')] + '.fourth'
'first.second.third.fourth'

You could try this:
given_string = "first.second.third.None"
given_string = ".".join(given_string.split('.')[:-1] + ["fourth"])
print(given_string)
Output:
first.second.third.fourth

Try this one-liner:
print (".".join(given_string.split(".")[:-1]+["Fourth"]))
Output:
first.second.third.Fourth

You could use rsplit. This works no matter how many dots precede the last one:
given_string = "first.second.third.None"
string_splitted = given_string.rsplit('.', 1)[0] + '.fourth'
print(string_splitted)
first.second.third.fourth
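The rsplit idea above can also be wrapped in a small helper. This is only a sketch: the function name `replace_last_field` and its `sep` parameter are made up for illustration, and it uses `str.rpartition`, which splits on the last separator in one call.

```python
def replace_last_field(text, new_value, sep="."):
    """Replace everything after the last `sep` in `text` with `new_value`."""
    head, found_sep, _tail = text.rpartition(sep)
    if not found_sep:
        # No separator at all: treat the whole string as the last field.
        return new_value
    return head + found_sep + new_value


print(replace_last_field("first.second.third.None", "fourth"))
```

Unlike slicing with rfind, this does not silently chop a character when the separator is missing.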

import re
my_string = "first.second.third.None"
my_sub = re.sub(r'((\w+\.){3})(\w+)', r'\1fourth', my_string)
print(my_sub)
first.second.third.fourth
A good explanation of this style is here: How to find and replace nth occurence of word in a sentence using python regular expression?

Related

How to extract a substring from a string in Python 3

I am trying to pull a substring out of a function result, but I'm having trouble figuring out the best way to strip the necessary string out using Python.
Output Example:
[<THIS STRING-STRING-STRING THAT THESE THOSE>]
In this example, I would like to grab "STRING-STRING-STRING" and throw away the rest of the output. The parts "[<THIS " and " THAT THESE THOSE>]" are static.
There are many ways to solve this. Here are two examples:
First one is a simple replacement of your unwanted characters.
targetstring = '[<THIS STRING-STRING-STRING THAT THESE THOSE>]'
#ALTERNATIVE 1
newstring = targetstring.replace(r" THAT THESE THOSE>]", '').replace(r"[<THIS ", '')
print(newstring)
and this drops everything except your target pattern:
#ALTERNATIVE 2
match = "STRING-STRING-STRING"
start = targetstring.find(match)
stop = len(match)
targetstring[start:start+stop]
These can be shortened but thought it might be useful for OP to have them written out.
I found this extremely useful, might be of help to you as well: https://www.computerhope.com/issues/ch001721.htm
If by '"[<THIS " &" THAT THESE THOSE>]" are static' you mean that they are always the exact same string, then:
s = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
before = len("[<THIS ")
after = len(" THAT THESE THOSE>]")
s[before:-after]
# 'STRING-STRING-STRING'
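If you want the slicing approach to fail loudly when the static parts are not actually there, a small check helps. The following is a sketch only; `strip_affixes` is a hypothetical helper name, not something from the question.

```python
def strip_affixes(text, prefix, suffix):
    """Return `text` without a known fixed prefix and suffix."""
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("text does not have the expected prefix/suffix")
    # Compute the end index explicitly so an empty suffix also works
    # (text[start:-0] would return an empty string).
    return text[len(prefix):len(text) - len(suffix)]


s = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
print(strip_affixes(s, "[<THIS ", " THAT THESE THOSE>]"))
```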
Like so (as long as the position of the characters in the string doesn't change):
myString = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
myString = myString[7:27]
Another alternative method:
import re
my_str = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
string_pos = [(m.start(), m.end()) for m in re.finditer('STRING-STRING-STRING', my_str)]
start, end = string_pos[0]
# m.end() is already exclusive, so slice up to end (not end + 1)
print(my_str[start:end])
STRING-STRING-STRING
If STRING-STRING-STRING occurs multiple times in the string, the start and end indexes of each occurrence will be given as tuples in string_pos.
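To illustrate the multiple-occurrence case, here is a short sketch. The input string with two occurrences is made up for the example, and `re.escape` is used so the target is treated as literal text.

```python
import re

target = "STRING-STRING-STRING"
# Hypothetical input containing the target twice.
my_str = "[<THIS STRING-STRING-STRING THAT STRING-STRING-STRING>]"

# Match.span() returns (start, end) with an exclusive end index.
spans = [m.span() for m in re.finditer(re.escape(target), my_str)]
print(spans)

for start, end in spans:
    print(my_str[start:end])
```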

How to remove substring after a specific character in a list of strings in Python

I have a list of string labels. I want to keep the substring of every element before the second "." and remove all characters after the second ".".
I found posts that show how to do this with a single text string using the split function. However, the list datatype does not have a split function. The actual data type is a pandas.core.indexes.base.Index, which looks like a list to me.
For the first element in the list, I want to keep L1.Energy and remove everything after the second ".".
current_list = ['L1.Energy.Energy', 'L1.Utility.Energy', 'L1.Technology.Utility', 'L1.Financial.Utility']
desired_list = ['L1.Energy', 'L1.Utility', 'L1.Technology', 'L1.Financial']
Here it is as a one-liner:
desired_list = [ s[:s.find(".",s.find(".")+1)] for s in current_list]
current_list = ['L1.Energy.Energy', 'L1.Utility.Energy', 'L1.Technology.Utility', 'L1.Financial.Utility']
desired_list = [ '.'.join(x.split('.')[:2]) for x in current_list ]
BTW, this will work also if your labels have more than two dots (like 'L1.Utility.Energy.Electric')
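A quick sketch of that claim with labels of varying length. The helper name `keep_first_fields` and the extra sample labels are invented for illustration.

```python
def keep_first_fields(label, count=2, sep="."):
    """Keep only the first `count` separator-delimited fields of `label`."""
    return sep.join(label.split(sep)[:count])


labels = ["L1.Energy.Energy", "L1.Utility.Energy.Electric", "L1"]
print([keep_first_fields(x) for x in labels])
```

Labels with fewer than two fields are returned unchanged, because slicing a short list does not raise.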
Here, it's ugly but it works:
bob = ['L1.Energy.Energy', 'L1.Utility.Energy',
       'L1.Technology.Utility', 'L1.Financial.Utility']
result = []
for i in bob:
    temp = i.split(".")
    result.append(temp[0] + "." + temp[1])
print(result)
A solution with regex:
import re
desired_list = [re.sub(r'(\..*)(\..*)', r'\1', s) for s in current_list]
Output:
['L1.Energy', 'L1.Utility', 'L1.Technology', 'L1.Financial']

Converting regex whitespace characters from list into string

I want to convert regex whitespace characters in a list into a string. For example:
list1 = ["Hello","\s","my","\s","name","\s","is"]
And I want to convert it to a string like
"Hello my name is"
Can anyone please help.
But what if there were also characters such as
"\t"
How would I do this?
list = ["Hello","\s","my","\s","name","\s","is"]
str1 = ''.join(list).replace("\s"," ")
Output :
>>> str1
'Hello my name is'
Update :
If you have something like this list1 = ["Hello","\s","my","\s","name","\t","is"] then you can use multiple replace
>>> str1 = ''.join(list).replace("\s"," ").replace("\t"," ")
>>> str1
'Hello my name is'
or if it's only \t
str1 = ''.join(list).replace("\t","anystring")
I would highly recommend using the join string function mentioned in one of the earlier answers, as it is less verbose. However, if you absolutely needed to use regex in order to complete the task, here's the answer:
import re
list1 = ["Hello","\s","my","\s","name","\s","is"]
list_str = ''.join(list1)
updated_str = re.split('\\\s', list_str)
updated_str = ' '.join(updated_str)
print(updated_str)
Output is:
'Hello my name is'
In order to use raw string notation, replace the re.split line with the one below:
updated_str = re.split(r'\\s', list_str)
Both will have the same output result.
You don't even need regular expressions for that:
s = ' '.join([item for item in list if item != '\s'])
Please note that list is a poor name for a variable in Python, because it shadows the built-in list type.
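If more than one kind of token can appear, a lookup table may scale better than chaining replace calls. This is a sketch under an assumption: the tokens are stored as literal backslash sequences (raw strings such as r"\s" and r"\t"), which differs from writing "\t" in the list, since that is already a real tab. The names `TOKEN_TO_TEXT` and `join_tokens` are hypothetical.

```python
# Map literal regex-style tokens to the text they should become.
TOKEN_TO_TEXT = {
    r"\s": " ",   # backslash + s -> a single space
    r"\t": "\t",  # backslash + t -> a real tab character
}


def join_tokens(tokens):
    """Join tokens, translating known whitespace tokens on the way."""
    return "".join(TOKEN_TO_TEXT.get(tok, tok) for tok in tokens)


words = ["Hello", r"\s", "my", r"\s", "name", r"\t", "is"]
print(repr(join_tokens(words)))
```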

Extract substrings from logical expressions

Let's say I have a string that looks like this:
myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
What I would like to obtain in the end would be:
myStr_l1 = '(Txt_l1) or (Txt2_l1)'
and
myStr_l2 = '(Txt_l2) or (Txt2_l2)'
Some properties:
all "Txt_"-elements of the string start with an uppercase letter
the string can contain much more elements (so there could also be Txt3, Txt4,...)
the suffixes '_l1' and '_l2' look different in reality; they cannot be used for matching (I chose them for demonstration purposes)
I found a way to get the first part done by using:
myStr_l1 = re.sub('\(\w+\)','',myStr)
which gives me
'(Txt_l1 ) or (Txt2_l1 )'
However, I don't know how to obtain myStr_l2. My idea was to remove everything between two open parentheses. But when I do something like this:
re.sub('\(w+\(', '', myStr)
the entire string is returned.
re.sub('\(.*\(', '', myStr)
removes - of course - far too much and gives me
'Txt2_l2))'
Does anyone have an idea how to get myStr_l2?
When there is an "and" instead of an "or", the strings look slightly different:
myStr2 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2))'
Then I can still use the command from above:
re.sub('\(\w+\)','',myStr2)
which gives:
'(Txt_l1 and Txt2_l1 )'
but I again fail to get myStr2_l2. How would I do this for these kind of strings?
And how would one then do this for mixed expressions with "and" and "or" e.g. like this:
myStr3 = '(Txt_l1 (Txt_l2) and Txt2_l1 (Txt2_l2)) or (Txt3_l1 (Txt3_l2) and Txt4_l1 (Txt2_l2))'
re.sub('\(\w+\)','',myStr3)
gives me
'(Txt_l1 and Txt2_l1 ) or (Txt3_l1 and Txt4_l1 )'
but again: How would I obtain myStr3_l2?
Regexp is not powerful enough for nested expressions (in your case: nested elements in parentheses). You will have to write a parser. Look at https://pyparsing.wikispaces.com/
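For the simple shapes shown in the question, a full parser library may not be needed; tracking the parenthesis depth by hand can be enough. The sketch below assumes names consist of word characters and that the only operators are "and" and "or". The function name `words_by_depth` is made up, and it only groups the names by nesting level; it does not rebuild the logical expression.

```python
import re
from collections import defaultdict


def words_by_depth(expr):
    """Group the names in `expr` by how deeply they are nested in parentheses."""
    depth = 0
    found = defaultdict(list)
    # Tokenize into "(", ")" and runs of word characters.
    for token in re.findall(r"\(|\)|\w+", expr):
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif token not in ("and", "or"):
            found[depth].append(token)
    return dict(found)


myStr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
print(words_by_depth(myStr))
```

From the per-level lists you could then assemble strings such as myStr_l1 and myStr_l2 with whatever operator handling your real data requires.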
I'm not entirely sure what you want, but I wrote this to strip everything between the parentheses.
import re
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
sets = mystr.split(' or ')
noParens = []
for line in sets:
    mat = re.match(r'\((.* )\((.*\)\))', line, re.M)
    if mat:
        noParens.append(mat.group(1))
        noParens.append(mat.group(2).replace(')', ''))
print(noParens)
This takes all the parentheses away and puts your elements in a list. Here's an alternate way of doing it without using regular expressions.
mystr = '(Txt_l1 (Txt_l2)) or (Txt2_l1 (Txt2_l2))'
noParens = []
mystr = mystr.replace(' or ', ' ')
mystr = mystr.replace(')','')
mystr = mystr.replace('(','')
noParens = mystr.split()
print(noParens)

Strip in Python

I have a question regarding strip() in Python. I am trying to strip a semicolon from a string. I know how to do this when the semicolon is at the end of the string, but how would I do it if it is not the last character but, say, the second-to-last one?
eg:
1;2;3;4;\n
I would like to strip that last semi-colon.
Strip the other characters as well.
>>> '1;2;3;4;\n'.strip('\n;')
'1;2;3;4'
>>> "".join("1;2;3;4;\n".rpartition(";")[::2])
'1;2;3;4\n'
How about replace?
string1='1;2;3;4;\n'
string2=string1.replace(";\n","\n")
>>> string = "1;2;3;4;\n"
>>> string.strip().strip(";")
'1;2;3;4'
This will first strip any leading or trailing white space, and then remove any leading or trailing semicolon.
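If you need to keep the trailing newline and leave any leading semicolon alone, a variation on the same idea is to peel off the trailing whitespace, drop one semicolon, and put the whitespace back. This is a sketch; `drop_trailing_semicolon` is a hypothetical name.

```python
def drop_trailing_semicolon(line):
    """Remove one semicolon that sits just before the trailing whitespace."""
    body = line.rstrip()             # text without trailing whitespace
    trailing_ws = line[len(body):]   # the whitespace that was cut off
    if body.endswith(";"):
        body = body[:-1]
    return body + trailing_ws


print(repr(drop_trailing_semicolon("1;2;3;4;\n")))
```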
Try this:
def remove_last(string):
    index = string.rfind(';')
    if index == -1:
        # Semi-colon doesn't exist
        return string
    return string[:index] + string[index+1:]
This should be able to remove the last semicolon of the line, regardless of what characters come after it.
>>> remove_last('Test')
'Test'
>>> remove_last('Test;abc')
'Testabc'
>>> remove_last(';test;abc;foobar;\n')
';test;abc;foobar\n'
>>> remove_last(';asdf;asdf;asdf;asdf')
';asdf;asdf;asdfasdf'
The other answers provided are probably faster since they're tailored to your specific example, but this one is a bit more flexible.
You could split the string on semicolons and then join the non-empty parts back together using ; as the separator:
parts = '1;2;3;4;\n'.split(';')
non_empty_parts = []
for s in parts:
    if s.strip() != "":
        non_empty_parts.append(s.strip())
print(";".join(non_empty_parts))
If you only want to use the strip function this is one method:
Using slice notation, you can limit the strip() function's scope to one part of the string and append the "\n" on at the end:
# create a var for later
str = "1;2;3;4;\n"
# format and assign to newstr
newstr = str[:8].strip(';') + str[8:]
Using the rfind() method (similar to Micheal0x2a's solution), you can make the statement applicable to many strings:
# create a var for later
str = "1;2;3;4;\n"
# format and assign to newstr
newstr = str[:str.rfind(';') + 1 ].strip(';') + str[str.rfind(';') + 1:]
re.sub(r';(\W*$)', r'\1', '1;2;3;4;\n') -> '1;2;3;4\n'
