Changing only one letter when there are many similar ones in a string - python

Suppose I have the following strings:
I.like.football
sky.is.blue
I need to write a loop that changes the last '.' to '_' so that they look like this:
I.like_football
sky.is_blue
They all follow the same style (three words separated by two dots).
How can I do that in a loop?

s = 'I.like.football'
parts = s.rsplit('.', 1)  # split from the right, at most once, so only at the last '.'
print('_'.join(parts))  # then join the two parts with '_'
# output: I.like_football
Or in a single line:
s = '_'.join(s.rsplit('.', 1))
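To apply this to several strings in a loop, as the question asks, a minimal sketch might look like the following. The list name `lines` and the result name `converted` are made up for illustration.

```python
# Hypothetical input list; the names here are illustrative only.
lines = ['I.like.football', 'sky.is.blue']

converted = []
for line in lines:
    # rsplit with maxsplit=1 splits only at the last '.'
    converted.append('_'.join(line.rsplit('.', 1)))

print(converted)  # ['I.like_football', 'sky.is_blue']
```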

str.replace lets you specify the number of replacements. Unfortunately there is no str.rreplace, so you need to reverse the string before and after the replacement, e.g.:
>>> def f(s):
...     return s[::-1].replace(".", "_", 1)[::-1]
...
>>> f('I.like.football')
'I.like_football'
>>> f('sky.is.blue')
'sky.is_blue'
Alternatively, you could use one of str.rpartition, str.rsplit, or str.rfind.
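As a rough sketch of two of those alternatives, the helpers below use str.rpartition and str.rfind. The function names and the `old`/`new` parameters are hypothetical, chosen only for this example.

```python
def replace_last_rpartition(s, old='.', new='_'):
    # rpartition splits around the last occurrence of old
    head, sep, tail = s.rpartition(old)
    # sep is empty when old does not occur; return s unchanged in that case
    return head + new + tail if sep else s


def replace_last_rfind(s, old='.', new='_'):
    # rfind returns the index of the last occurrence, or -1 if there is none
    i = s.rfind(old)
    if i == -1:
        return s
    return s[:i] + new + s[i + len(old):]


print(replace_last_rpartition('I.like.football'))  # I.like_football
print(replace_last_rfind('sky.is.blue'))           # sky.is_blue
```

Both versions leave a string without any '.' untouched, which the reverse-and-replace approach also does.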

This doesn't even need to run in a loop:
import re
# match a '.' followed by one or more non-dot characters and then the end of a line
p = re.compile(r'\.(?=[^.]+$)', re.MULTILINE)
test_str = "I.like.football\nsky.is.blue"
subst = "_"
result = p.sub(subst, test_str)
print(result)

Related

extract data from nested parenthesis?

I have a string:
test_string = 'RGBA(30(25VARGHK_65FVDFKDGV_10FVDSSLBA)_10UJN(85VOEZSR_5VAVUSR_10SQMCFE)_20BBLRG(SSLCN)_10UDSCT(80EDYFIH_10VAP_10SNE)_30EDU(50EDFva_50VAP)_10EDP(50EDFva_50SNE))'
I need to extract the data from the string and the final result should look like that:
RGBA,
30TCH:25VARGHK, 65FVDFKDGV, 10FVDSSLBA,
10UJN:85VOEZSR, 5VAVUSR, 5SQMCFE
....
and so on..
I thought of using a regex, but it does not seem to be a good solution here.

Regex will work fine. After you remove the outer RGBA(), you have many sets of a "prefix" followed by a (group_of_data).
If you don't want trailing commas, try this:
import re
regex = r"[^(]+\([^)]+\)"
s = 'RGBA(30(25VARGHK_65FVDFKDGV_10FVDSSLBA)_10UJN(85VOEZSR_5VAVUSR_10SQMCFE)_20BBLRG(SSLCN)_10UDSCT(80EDYFIH_10VAP_10SNE)_30EDU(50EDFva_50VAP)_10EDP(50EDFva_50SNE))'
first_start = s.index('(')
print(s[:first_start])
# search only inside the outer parentheses
matches = re.finditer(regex, s[first_start+1:-1])
for match in matches:
    g = match.group().lstrip('_')
    data_start = g.index('(')
    prefix = g[:data_start]
    data = ', '.join(g[data_start + 1:-1].split('_'))
    print(f'{prefix}:{data}')
Output
RGBA
30:25VARGHK, 65FVDFKDGV, 10FVDSSLBA
10UJN:85VOEZSR, 5VAVUSR, 10SQMCFE
20BBLRG:SSLCN
10UDSCT:80EDYFIH, 10VAP, 10SNE
30EDU:50EDFva, 50VAP
10EDP:50EDFva, 50SNE
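For comparison, here is a sketch that avoids regular expressions by tracking parenthesis depth, so it only splits on underscores that sit outside any inner group. The helper names `split_top_level` and `format_nested` are hypothetical, and the input is a shortened version of the string from the question.

```python
def split_top_level(text, sep="_"):
    """Split text on sep, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            # a separator outside all parentheses ends the current part
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def format_nested(s):
    # name is the text before the first '(', body is everything after it
    name, _, body = s.partition("(")
    body = body[:-1]  # drop the final ')' that closes the outer group
    lines = [name]
    for group in split_top_level(body):
        prefix, _, data = group.partition("(")
        items = data.rstrip(")").split("_")
        lines.append(f"{prefix}:{', '.join(items)}")
    return lines


s = 'RGBA(30(25VARGHK_65FVDFKDGV_10FVDSSLBA)_10UJN(85VOEZSR_5VAVUSR_10SQMCFE))'
print("\n".join(format_nested(s)))
# RGBA
# 30:25VARGHK, 65FVDFKDGV, 10FVDSSLBA
# 10UJN:85VOEZSR, 5VAVUSR, 10SQMCFE
```

This assumes the input has exactly one outer group with one level of nesting inside it, as in the question.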

This seems to get you (almost) there -
import re
[part.replace("(", ": ").replace("_", ", ") for part in re.split(r"\)_", test_string)]
Output
['RGBA: 30: 25VARGHK, 65FVDFKDGV, 10FVDSSLBA',
'10UJN: 85VOEZSR, 5VAVUSR, 10SQMCFE',
'20BBLRG: SSLCN',
'10UDSCT: 80EDYFIH, 10VAP, 10SNE',
'30EDU: 50EDFva, 50VAP',
'10EDP: 50EDFva, 50SNE))']

I think we may need a little more clarification on the logic. It looks like ( should translate into a :, but not every time. Here is my crack at it using regexes. This might not be exactly what you are looking for, but should be pretty close:
import re

def main():
    test_string = 'RGBA(30(25VARGHK_65FVDFKDGV_10FVDSSLBA)_10UJN(85VOEZSR_5VAVUSR_10SQMCFE)_20BBLRG(SSLCN)_10UDSCT(80EDYFIH_10VAP_10SNE)_30EDURKA(50EDFL_50VAP)_10EDPJ(50EDFV_50SNOL))'
    test_string = re.sub(r"\)_", ",\n", test_string)  # end of a group: start a new line
    test_string = re.sub(r"_", ",", test_string)  # remaining underscores separate items
    test_string = re.sub(r"\(", ":", test_string)  # opening parenthesis follows a prefix
    test_string = re.sub(r"\)\)", "", test_string)  # drop the final closing parentheses
    print(test_string)

if __name__ == "__main__":
    main()
results:
RGBA:30:25VARGHK,65FVDFKDGV,10FVDSSLBA,
10UJN:85VOEZSR,5VAVUSR,10SQMCFE,
20BBLRG:SSLCN,
10UDSCT:80EDYFIH,10VAP,10SNE,
30EDURKA:50EDFL,50VAP,
10EDPJ:50EDFV,50SNOL
This is pretty much just a series of regex substitutions. Note that by applying re.sub in this order, you clean the string as you go. You could certainly adjust the beginning of the string to change the first : to a ,\n, but I'm not sure whether the data you are getting is always in the same format.
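If the input always starts with a single outer name such as RGBA, that adjustment could be sketched as one extra str.replace limited to the first occurrence. The function name `format_groups` and the shortened sample string are assumptions made for this example.

```python
import re


def format_groups(text):
    # end of an inner group: put the next group on a new line
    text = re.sub(r"\)_", ",\n", text)
    # remaining underscores separate items; '(' follows a prefix
    text = text.replace("_", ",").replace("(", ":")
    # drop the two closing parentheses at the very end
    text = text.replace("))", "")
    # only the first ':' belongs to the outer name, so split the line there
    return text.replace(":", ",\n", 1)


print(format_groups('RGBA(30(25A_65B)_10UJN(85C_5D))'))
# RGBA,
# 30:25A,65B,
# 10UJN:85C,5D
```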

String.split() after n characters

I can split a string like this:
string = 'ABC_elTE00001'
string = string.split('_elTE')[1]
print(string)
How do I automate this, so I don't have to pass '_elTE' to the function? Something like this:
string = 'ABC_elTE00001'
string = string.split('_' + 4 characters)[1]
print(string)

Use a regex. The re module has re.split, which works like str.split except that it splits on a regex pattern. It's worth a look at the docs:
>>> import re
>>> string = 'ABC_elTE00001'
>>> re.split(r'_\w{4}', string)
['ABC', '00001']
Here the pattern _\w{4} matches the underscore followed by any four word characters.

split() on _ and take everything after the first four characters.
s = 'ABC_elTE00001'
# s.split('_')[1] gives elTE00001
# To get the string after 4 chars, we'd slice it [4:]
print(s.split('_')[1][4:])
OUTPUT:
00001

You can use a regular expression to automate the extraction:
import re
string = 'ABC_elTE00001'
data = re.findall(r'.([0-9]*$)', string)  # capture the run of digits at the end
print(data)  # ['00001']

This is a quite horrible version that exactly "translates" string.split('_' + 4 characters)[1]:
>>> s = 'ABC_elTE00001'
>>> s.split(s[s.find("_"):(s.find("_")+1)+4])[1]
'00001'
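The same idea can be written with plain index arithmetic, which may be easier to read. This is only a sketch; the function name `after_marker` and its `marker` and `skip` parameters are hypothetical.

```python
def after_marker(s, marker='_', skip=4):
    # Find the marker, then skip past it and the next `skip` characters.
    # str.index raises ValueError if the marker is missing.
    start = s.index(marker) + len(marker) + skip
    return s[start:]


print(after_marker('ABC_elTE00001'))  # 00001
```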

Slice string at last digit in Python

So I have strings with a date somewhere in the middle, like 111_Joe_Smith_2010_Assessment and I want to truncate them such that they become something like 111_Joe_Smith_2010. The code that I thought would work is
reverseString = currentString[::-1]
stripper = re.search('\d', reverseString)
But for some reason this doesn't always give me the right result. Most of the time it does, but every now and then, it will output a string that looks like 111_Joe_Smith_2010_A.
If anyone knows what's wrong with this, it would be super helpful!

You can use re.sub and $ to match and substitute alphabetical characters
and underscores until the end of the string:
import re
d = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
new_s = [re.sub('[a-zA-Z_]+$', '', i) for i in d]
Output:
['111_Joe_Smith_2010', '111_Bob_Smith_2010']

You could strip non-digit characters from the end of the string using re.sub like this:
>>> import re
>>> re.sub(r'\D+$', '', '111_Joe_Smith_2010_Assessment')
'111_Joe_Smith_2010'
For your input format you could also do it with a simple loop:
>>> s = '111_Joe_Smith_2010_Assessment'
>>> i = len(s) - 1
>>> while not s[i].isdigit():
...     i -= 1
...
>>> s[:i+1]
'111_Joe_Smith_2010'

You can use the following approach:
def clean_names():
    names = ['111_Joe_Smith_2010_Assessment', '111_Bob_Smith_2010_Test_assessment']
    for name in names:
        # drop trailing characters until the name ends with a digit
        while not name[-1].isdigit():
            name = name[:-1]
        print(name)

clean_names()

Here is another solution using rstrip() to remove trailing letters and underscores, which I consider a pretty smart alternative to re.sub() as used in other answers:
import string
s = '111_Joe_Smith_2010_Assessment'
new_s = s.rstrip(f'{string.ascii_letters}_') # For Python 3.6+
new_s = s.rstrip(string.ascii_letters+'_') # For other Python versions
print(new_s) # 111_Joe_Smith_2010
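One difference worth keeping in mind is that rstrip removes only the characters you list, while \D+$ removes every trailing non-digit. The sketch below illustrates this with a made-up input that has a space and a hyphen after the year; that input is not from the question.

```python
import re
import string

# Hypothetical input with characters other than letters and underscores after the year
s = '111_Joe_Smith_2010 - draft'

by_rstrip = s.rstrip(string.ascii_letters + '_')  # stops at the space before 'draft'
by_regex = re.sub(r'\D+$', '', s)                 # removes all trailing non-digits

print(repr(by_rstrip))  # '111_Joe_Smith_2010 - '
print(repr(by_regex))   # '111_Joe_Smith_2010'
```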

Regex: Replace one pattern with another

I am trying to replace one regex pattern with another regex pattern.
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile('\d+x\d+') # for st_srt
re.sub(pattern, 'S\1E\2',st_srt)
I know the use of S\1E\2 is wrong here. The reason I am using \1 and \2 is to capture the values 01 and 02 and use them in S\1E\2.
My desired output is:
st_srt = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
So, what is the correct way to achieve this?

You need to capture what you're trying to preserve. Try this:
pattern = re.compile(r'(\d+)x(\d+)') # for st_srt
st_srt = re.sub(pattern, r'S\1E\2', st_srt)

Well, it looks like you already accepted an answer, but I think this is what you said you're trying to do, which is get the replace string from 'st_mkv', then use it in 'st_srt':
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'Awake\.([^.]+)\.')
m = replace_pattern.match(st_mkv)
replace_string = m.group(1)
new_srt = re.sub(r'^Awake\.[^.]+\.', 'Awake.{0}.'.format(replace_string), st_srt)
print(new_srt)

Try using this regex:
([\w+\.]+){5}\-\w+
Copy the strings into http://www.gskinner.com/RegExr/
and paste the regex at the top.
It captures the names of each string, leaving out the extension.
You can then go ahead and append the extension you want, to the string you want.
EDIT:
Here's what I used to do what you're after:
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'  # not actually needed here
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'([\w+\.]+){5}\-\w+')
m = replace_pattern.match(st_mkv)
new_string = m.group(0)
new_string += '.srt'
>>> new_string
'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'

import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile(r'(\d+)x(\d+)')
st_srt_new = re.sub(pattern, r'S\1E\2', st_srt)
print(st_srt_new)
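re.sub also accepts a function as the replacement, which helps if the numbers are not already zero-padded. The following is a sketch under that assumption; the function name `to_season_episode` and the `1x2` file name are invented for illustration.

```python
import re


def to_season_episode(match):
    # Convert both captured numbers to int so they can be zero-padded to two digits
    season, episode = int(match.group(1)), int(match.group(2))
    return f'S{season:02d}E{episode:02d}'


pattern = re.compile(r'(\d+)x(\d+)')
print(pattern.sub(to_season_episode, 'Awake.1x2.iNTERNAL.WEBRiP.XViD-GeT.srt'))
# Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt
```

The f-string formatting requires Python 3.6 or later.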

operating over strings, python

How do I define a function that takes a string (a sentence) and inserts an extra space after a period if the period is directly followed by a letter?
sent = "This is a test.Start testing!"
def normal(sent):
    list_of_words = sent.split()
    ...
This should print out
"This is a test. Start testing!"
I suppose I should use split() to break the string into a list, but what next?
P.S. The solution has to be as simple as possible.

Use re.sub. Your regular expression will match a period (\.) followed by a letter ([a-zA-Z]). Your replacement string will contain a reference to the first (and only) group (\1), which was the letter matched in the regular expression.
>>> import re
>>> re.sub(r'\.([a-zA-Z])', r'. \1', 'This is a test.This is a test. 4.5 balloons.')
'This is a test. This is a test. 4.5 balloons.'
Note the choice of [a-zA-Z] for the regular expression. This matches just letters. We do not use \w because it would insert spaces into a decimal number.
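A quick side-by-side sketch of that difference, using the same sample sentence; the variable names are chosen only for this example.

```python
import re

text = 'This is a test.This is a test. 4.5 balloons.'

# Letters only: the decimal number is left alone
letters_only = re.sub(r'\.([a-zA-Z])', r'. \1', text)

# \w also matches digits, so the '.' inside 4.5 gets a space after it
word_chars = re.sub(r'\.(\w)', r'. \1', text)

print(letters_only)  # This is a test. This is a test. 4.5 balloons.
print(word_chars)    # This is a test. This is a test. 4. 5 balloons.
```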

One-liner non-regex answer:
def normal(sent):
    return ".".join(" " + s if i > 0 and s[:1].isalpha() else s for i, s in enumerate(sent.split(".")))
Here is a multi-line version using a similar approach. You may find it more readable.
def normal(sent):
    parts = sent.split(".")
    result = parts[:1]
    for item in parts[1:]:
        # item[:1] is '' for an empty piece, so a trailing period does not raise IndexError
        if item[:1].isalpha():
            item = " " + item
        result.append(item)
    return ".".join(result)
Using a regex is probably the better way, though.

Brute force without any checks:
>>> sent = "This is a test.Start testing!"
>>> k = sent.split('.')
>>> ". ".join(k)
'This is a test. Start testing!'
For removing spaces:
>>> sent = "This is a test. Start testing!"
>>> k = sent.split('.')
>>> l = [x.lstrip(' ') for x in k]
>>> ". ".join(l)
'This is a test. Start testing!'

Another regex-based solution, might be a tiny bit faster than Steven's (only one pattern match, and a blacklist instead of a whitelist):
import re
re.sub(r'\.([^\s])', r'. \1', some_string)

Improving pyfunc's answer:
sent = "This is a test.Start testing!"
k = sent.split('.')
k = '. '.join(k)
k = k.replace('.  ', '. ')  # collapse the double space left where a space already followed the period
print(k)  # This is a test. Start testing!
