How to add a space every time X appears? - python

For example, how do I make this:
file1.exefile2.exefile3.exe
Into this:
file1.exe file2.exe file3.exe
In this example ".exe" is X.

You can first split the given text on x (.exe in this case), then join the split items with x + ' ', and finally strip the trailing space to get the desired result.
text='file1.exefile2.exefile3.exe'
x = '.exe'
f'{x} '.join(text.split(x)).strip()
#output: 'file1.exe file2.exe file3.exe'

This should do the job.
text = ".exe ".join(text.split(".exe")).rstrip()

Using a RegEx:
import re
string = 'file1.exefile2.exefile3.exe'
print(re.sub(r'\.exe(?!$)', '.exe ', string))
The first argument is the RegEx to match, the second is the replacement and the third is the string.
The \. inside the RegEx just means a dot (".") and the (?!$) is a negative lookahead onto the end of the string. This way the last ".exe" is not matched. Plainly, it checks whether the next symbol is the end of the string and if it is, the pattern does not match.
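To generalize this to any X, a minimal sketch could escape X before building the pattern. The helper name add_space_after is hypothetical, and combining re.escape with a function replacement is an assumption added here so that regex metacharacters in X are treated literally.

```python
import re

def add_space_after(text, x):
    # Hypothetical helper: insert a space after each x that is not at the end.
    # re.escape makes metacharacters in x (such as the dot in ".exe") literal.
    pattern = re.escape(x) + r"(?!$)"
    # A function replacement returns the match plus a space, so backslashes
    # in x are never interpreted as replacement escapes.
    return re.sub(pattern, lambda m: m.group(0) + " ", text)

print(add_space_after("file1.exefile2.exefile3.exe", ".exe"))
# file1.exe file2.exe file3.exe
```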

I would do something like this (if you want to do this manually)
string="file1.exefile2.exefile3.exe"
X="exe"
lenX=len(X)
temp=""
start=0
for i in range(string.count(X)):
    string=string[start:]
    if(X in string):
        temp+=string[:string.index(X)+lenX]+" "
        start=string.index(X)+lenX
print(temp.strip())

Related

How to get value of "string"?

Given strings like:
"hello"
'hello'
I want to remove only first and last char if:
They are the same
They are " or '
I.e., given 'hello' I'm expecting hello. Given 'hello" I'm not expecting it to change.
I was able to do this by reading the first and last characters, validating that they are the same, that they are ' or ", and that they are not at the same index (because I don't want a lone ' to end up as the empty string). With all the edge-case checking I ended up with tens of lines.
What's your approach to solve this?
In simple words: given a string in Python format, I want to return its data, and if it's not valid, keep it as is.
Sounds like a job for regular expressions with groups:
import re
re.sub(r'^([\'"])(.*)(\1)$', r'\2', s)
Which reads as:
^ - match the beginning of the string
(['"]) - either single or double quote (group 1)
(.*) any (possibly, empty) sequence of characters in between (group 2)
(\1) - the same character as in group 1
$ - end of the string
If the string matches the pattern above, replace it with the content of the group 2.
For example:
>>> s = re.sub(r'^([\'"])(.*)(\1)$', r'\2', "'hello'")
>>> print(s)
hello
An alternative way could be with ast.literal_eval(), but it won't handle non-matching quotes.
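As a rough sketch, the same pattern can be wrapped in a helper that returns the input unchanged when it does not match. The name unquote and the use of fullmatch with re.DOTALL (so that quoted text containing newlines is also handled) are assumptions made for this illustration.

```python
import re

# Same idea as the pattern above: a quote, anything, then the same quote.
# fullmatch anchors both ends, so ^ and $ are not needed.
_QUOTED = re.compile(r"""(['"])(.*)\1""", re.DOTALL)

def unquote(s):
    # Hypothetical helper: strip one pair of matching outer quotes, if any.
    m = _QUOTED.fullmatch(s)
    return m.group(2) if m else s

print(unquote("'hello'"))   # hello
print(unquote("'hello\""))  # 'hello"
print(unquote("'"))         # '
```

A lone quote character is left alone because the pattern needs at least two characters to match.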
I would use str.endswith and str.startswith, although it still gets a bit long:
def readstring(string):
    if len(string)>1 and (string.startswith('"') and string.endswith('"') or string.startswith("'") and string.endswith("'")):
        return string[1:-1]
    return string

How to remove everything before certain character in Python

I'm new to python and struggle with a certain task:
I have a String that could have anything in it, but it always "ends" the same.
It can be just a Filename, a complete path, or just a random string, ending with a Version Number.
Example:
C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1
string-anotherstring-15.1R7-S8.1
string-anotherstring.andanother-15.1R7-S8.1
What always is the same (looking from the end) is that if you reach the second dot and go 2 characters in front of it, you always match the part that I'm interested in.
Cutting everything after a certain string was "easy," and I solved it myself - that's why the string ends with the version now :)
Is there a way to tell Python, "look for the second dot from the end of the string, go 2 characters in front of it, and delete everything before that", so that I get the version as a string?
Happy for any pointers in the right direction.
Thanks
If you want the version number, can you use the hyphen (-) to split the string? Or do you need to depend on the dots only?
Please see below use of rsplit and join which can help you.
>>> a = 'string-anotherstring.andanother-15.1R7-S8.1'
>>> a.rsplit('-')
['string', 'anotherstring.andanother', '15.1R7', 'S8.1']
>>> a.rsplit('-')[-2:] #Get everything from second last to the end
['15.1R7', 'S8.1']
>>> '-'.join(a.rsplit('-')[-2:]) #Get everything from second last to the end, and join them with a hyphen
'15.1R7-S8.1'
>>>
For using dots, use the same way
>>> a
'string-anotherstring.andanother-15.1R7-S8.1'
>>> data = a.rsplit('.')
>>> [data[-3][-2:]]
['15']
>>> [data[-3][-2:]] + data[-2:]
['15', '1R7-S8', '1']
>>> '.'.join([data[-3][-2:]] + data[-2:])
'15.1R7-S8.1'
>>>
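The description in the question ("second dot from the end, then two characters back") can also be followed literally with str.rfind. This is only a sketch; the function name version_suffix and the choice to return the input unchanged when fewer than two dots are present are assumptions.

```python
def version_suffix(s):
    # Hypothetical helper: keep everything from two characters before the
    # second-to-last dot up to the end of the string.
    last = s.rfind(".")
    if last == -1:
        return s  # no dot at all
    second_last = s.rfind(".", 0, last)
    if second_last == -1:
        return s  # only one dot
    return s[max(second_last - 2, 0):]

print(version_suffix("string-anotherstring.andanother-15.1R7-S8.1"))
# 15.1R7-S8.1
```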
You can build a regex from the end mark of a line using the anchor $.
Using your own description, use the regex:
(\d\d\.[^.]*)\.[^.]*$
If you want the last characters from the end included, just move the capturing parenthesis:
(\d\d\.[^.]*\.[^.]*)$
Explanation:
(\d\d\.[^.]*\.[^.]*)$
 ^ ^                   # digits
     ^                 # a literal '.'
       ^               # anything OTHER THAN a '.'
            ^          # literal '.'
              ^        # anything OTHER THAN a '.'
                    ^  # end of line
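For illustration, here is how that second pattern might be applied in Python to the three sample strings from the question. The names VERSION_RE and extract_version are hypothetical, and returning None when nothing matches is an assumption.

```python
import re

# The pattern from above: two digits, a dot, non-dots, a dot, non-dots, end.
VERSION_RE = re.compile(r"(\d\d\.[^.]*\.[^.]*)$")

def extract_version(s):
    # Hypothetical helper: return the captured version, or None if no match.
    m = VERSION_RE.search(s)
    return m.group(1) if m else None

samples = [
    r"C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1",
    "string-anotherstring-15.1R7-S8.1",
    "string-anotherstring.andanother-15.1R7-S8.1",
]
for s in samples:
    print(extract_version(s))  # 15.1R7-S8.1 for each sample
```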
Assuming I understand this correctly, there are two ways to do this that come to mind:
Including both, since I might not understand this correctly, and for completeness reasons. I think the split/parts solution is cleaner, particularly when the 'certain character' is a dot.
>>> msg = r'C:\Users\abc\Desktop\string-anotherstring-15.1R7-S8.1'
>>> re.search(r'.*(..\..*)', msg).group(1)
'S8.1'
>>> parts = msg.split('.')
>>> ".".join((parts[-2][-2:], parts[-1]))
'S8.1'
For your example, you can split the string on the separator '-' and then join the last two items with the same separator. Like so:
txt = "string-anotherstring-15.1R7-S8.1"
x = txt.split("-")
y = "-".join(x[-2:])
print(y) # outputs 15.1R7-S8.1

How to remove text before a particular character or string in multi-line text?

I want to remove all the text before and including */ in a string.
For example, consider:
string = ''' something
other things
etc. */ extra text.
'''
Here I want extra text. as the output.
I tried:
string = re.sub("^(.*)(?=*/)", "", string)
I also tried:
string = re.sub(re.compile(r"^.\*/", re.DOTALL), "", string)
But when I print string, it did not perform the operation I wanted and the whole string is printing.
I suppose you're fine without regular expressions:
string[string.index("*/ ")+3:]
And if you want to strip that newline:
string[string.index("*/ ")+3:].rstrip()
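One caveat with index is that it raises ValueError when the marker is missing. A possible variant, sketched here with str.partition, returns the text unchanged in that case. The function name text_after and that fallback behaviour are assumptions.

```python
def text_after(text, marker="*/"):
    # Hypothetical helper: partition splits at the first occurrence of marker
    # and returns (before, marker, after); sep is empty if marker is absent.
    before, sep, after = text.partition(marker)
    return after if sep else text

string = ''' something
other things
etc. */ extra text.
'''
print(text_after(string).strip())
# extra text.
```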
The problem with your first regex is that . does not match newlines as you noticed. With your second one, you were closer but forgot the * that time. This would work:
string = re.sub(re.compile(r"^.*\*/", re.DOTALL), "", string)
You can also just get the part of the string that comes after your "*/":
string = re.search(r"(\*/)(.*)", string, re.DOTALL).group(2)
Update: After doing some research, I found that the pattern (\n|.) to match everything including newlines is inefficient. I've updated the answer to use [\s\S] instead as shown on the answer I linked.
The problem is that . in python regex matches everything except newlines. For a regex solution, you can do the following:
import re
strng = ''' something
other things
etc. */ extra text.
'''
print(re.sub(r"[\s\S]+\*/", "", strng))
# extra text.
Add in a .strip() if you want to remove that remaining leading whitespace.
To keep the text up to that symbol instead, you can do:
split_str = string.split(' ')
boundary = split_str.index('*/')
new = ' '.join(split_str[0:boundary])
print(new)
which gives you:
something
other things
etc.
string_list = string.split('*/')[1:]
string = '*/'.join(string_list)
print(string)
gives output as
' extra text. \n'

Remove variable parts of a string that start and end the same

I have a string as the following:
'1:CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD\n&:TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG\n&:TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE\n2:ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG\n&:AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS\n&:CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y\n&:9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L\n&:9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99\n'
As you can see, I would like to get the '\n(number or & here):' replaced by ','
Since they all start with '\n' and end with ':' I believe that there should be a way to replace them all at once.
The output would be something like:
'CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD,TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG,TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE,ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG,AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS,CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y,9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L,9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99'
What could work is a for loop over the numbers, plus a separate replacement for &:
string = string.replace('\n&:', ',')
for i in range(1, 20):
    string = string.replace(f'\n{i}:', ',')
But I believe there must be a better way.
You can use regex to get the job done:
Input:
import re
text = '1:CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD\n&:TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG\n&:TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE\n2:ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG\n&:AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS\n&:CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y\n&:9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L\n&:9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99\n'
text = re.sub(r'\n&*(\d*:)*',',', text[2:]).rstrip(',')
Output:
'CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD,TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG,TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE,ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG,AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS,CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y,9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L,9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99'
You can use a regular expression replace:
s = '1:CH,AG,ME,GS,AP,CH,HE,AC,AC,AG,CA,HE,AT,AT,AC,AT,OG,NE,AG,AC,CS,OD\n&:TA,EB,PA,AC,BR,TH,PO,AC,2I,AC,TH,PE,TH,AZ,AZ,ZE,CS,OD,CH,EO,ZE,OG\n&:TH,ZE,ZE,HE,HE,HP,HP,OG,HP,ZE\n2:ZE,FD,FD,AG,EO,OG,AG,NE,RU,GS,HP,ZE,ZE,HM,HM,PC,PC,AS,AS,TY,TY,AG\n&:AG,GS,NO,EU,ZF,HE,AT,AT,OD,OD,EB,OD,GS,TR,OD,AC,TR,GS,OD,TR,OD,AT,GS\n&:CA,GS,NE,GS,AG,PS,HL,AG,NE,ID,AJ,AX,DI,OD,ME,AT,GS,MU,HO,PB,LT,9Z,PT,9Y\n&:9W,9X,AR,9V,9U,9T,AX,9S,9R,AT,AJ,DI,ST,EA,AG,ME,NE,MU,9Q,9P,9O,9N,9M,9L\n&:9K,ID,MG,OD,FY,AU,AU,HR,HR,9J,TL,9I,9H,9G,9F,AC,BR,AC,9E,9D,9C,9B,99\n'
s = re.sub(r"(\n\d*?:)|(\n&:)", ",", s).strip() # replaces the middle bits with commas and strips trailing \n
s = re.sub(r"^(\d*?:)|(&:)", "", s) # removes the initial 1: or similar
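A regex-free alternative, offered only as a sketch: since every line has the form prefix:values, each line can be split once at the first colon and the value parts joined with commas. The name flatten_records is hypothetical, and skipping empty lines is an assumption.

```python
def flatten_records(text):
    # Hypothetical helper: drop the "N:" or "&:" prefix of every line and
    # join the remaining comma-separated parts into one string.
    parts = []
    for line in text.splitlines():
        if not line:
            continue  # ignore blank lines
        _, sep, rest = line.partition(":")
        # Keep the whole line if it has no colon at all.
        parts.append(rest if sep else line)
    return ",".join(parts)

print(flatten_records("1:CH,AG\n&:TA,EB\n2:ZE,FD\n"))
# CH,AG,TA,EB,ZE,FD
```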

Change pa$$word to pa\$\$word in Python

I have a string pa$$word. I want to change this string to pa\$\$word. This must be changed to 2 or more such characters only and not for pa$word. The replacement must happen n number of times where n is the number of "$" symbols. For example, pa$$$$word becomes pa\$\$\$\$word and pa$$$word becomes pa\$\$\$word.
How can I do it?
import re
def replacer(matchobj):
    mat = matchobj.group()
    # Interleave one backslash with each matched character
    return "".join(item for items in zip("\\" * len(mat), mat) for item in items)
print(re.sub(r"((\$)\2+)", replacer, "pa$$$$word"))
# pa\$\$\$\$word
print(re.sub(r"((\$)\2+)", replacer, "pa$$$word"))
# pa\$\$\$word
print(re.sub(r"((\$)\2+)", replacer, "pa$$word"))
# pa\$\$word
print(re.sub(r"((\$)\2+)", replacer, "pa$word"))
# pa$word
((\$)\2+) - We create two capturing groups here. The first is the entire match, which can be referred to later as \1. The second is a nested group that captures a single $ and is referred to as \2. So we first match $ once and then, with \2+, require it to repeat at least once more consecutively.
When such a run is found, re.sub calls the replacer function with the match object. In replacer, we get the entire matched string with matchobj.group() and then simply interleave it with \.
I believe the regex you're after is:
[$]{2,}
which will match 2 or more of the character $
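As a sketch of how this pattern might be used for the replacement itself, a function can be passed to re.sub that emits one escaped $ per matched character. The name escape_dollar_runs is hypothetical.

```python
import re

def escape_dollar_runs(s):
    # Hypothetical helper: replace each run of two or more "$" with the same
    # number of "\$". The return value of a function replacement is inserted
    # literally, so the backslash needs no extra escaping.
    return re.sub(r"[$]{2,}", lambda m: r"\$" * len(m.group(0)), s)

print(escape_dollar_runs("pa$$word"))   # pa\$\$word
print(escape_dollar_runs("pa$$$word"))  # pa\$\$\$word
print(escape_dollar_runs("pa$word"))    # pa$word
```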
This should help, although note that it escapes every $, including a single one:
import re
result = re.sub(r"\$", r"\\$", yourString)
Or you can try:
yourString.replace("$", "\\$")
