How can I find a pattern in regex? - python

I want to find a pattern and replace it with another.
Suppose I have:
"Name":"hello"
and want to turn it into:
Name= "hello"
using Python regex.
The strings inside the double quotes could be anything, so I need to find the pattern "...":"..." and replace it with ...= "...".

This expression,
^"\s*([^"]+?)\s*"\s*:\s*"?([^"]+)"?$
has two capturing groups:
([^"]+?)
and
([^"]+)
for collecting our desired data (the key and the value). Then, we would simply use re.sub.
Test
import re
result = re.sub(r'^"\s*([^"]+?)\s*"\s*:\s*"?([^"]+)"?$', r'\1= "\2"', '" Name ":" hello "')
print(result)
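To see what the two capturing groups actually hold, the match can be inspected before substituting. This is only a minimal sketch: the names PAIR, key and value are made up here for illustration, and the input is the sample string from the test above.

```python
import re

# Same expression as in the answer, compiled once; PAIR is an illustrative name.
PAIR = re.compile(r'^"\s*([^"]+?)\s*"\s*:\s*"?([^"]+)"?$')

sample = '" Name ":" hello "'
match = PAIR.match(sample)
key, value = match.groups()

print(repr(key))    # group 1: surrounding whitespace is consumed by \s*
print(repr(value))  # group 2: whitespace inside the quotes is kept
result = PAIR.sub(r'\1= "\2"', sample)
print(result)
```

Note that the second group keeps the spaces around hello, so the value may need a .strip() if that padding is not wanted.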

Why not use this regex:
import re
s = '"Name":"hello"'
print(re.sub(r'"(.*)":"(.*)"', r'\1= "\2"', s))
Output:
Name= "hello"
For strings containing more than one such pair, make the quantifiers non-greedy (.*? instead of .*) so that each match stops at the nearest closing quote:
import re
s = '"Name":"hello", "Name2":"hello2"'
print(re.sub(r'"(.*?)":"(.*?)"', r'\1= "\2"', s))
Output:
Name= "hello", Name2= "hello2"

Using pure Python, this is as simple as:
s = '"Name":"hello"'
print(s.replace(':', '= ').replace('"', '', 2))
# Name= "hello"
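One caveat with the chained replace calls is that every colon in the string is rewritten, so a value such as a URL would be mangled. A hedged alternative, assuming the key itself never contains a colon, is to split on the first colon only with str.partition. The helper name to_assignment is hypothetical.

```python
def to_assignment(pair):
    """Turn '"key":"value"' into 'key= "value"', splitting on the first colon only."""
    key, sep, value = pair.partition(':')
    if not sep:
        # No colon at all: return the input unchanged.
        return pair
    # Drop whitespace and the quotes around the key; keep the value quoted.
    return '{}= {}'.format(key.strip().strip('"'), value.strip())


print(to_assignment('"Name":"hello"'))
print(to_assignment('"url":"http://host"'))
```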

Related

Remove String between two characters for all occurrences

I am looking for help on string manipulation in Python 3.
Input String
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
Desired Output
s = "ID,FIRST_NM,LAST_NM,FILLER1"
Basically, the objective is to remove anything between space and comma at all occurrences in the input string.
Any help is much appreciated
Using a simple regex:
import re
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
res = re.sub(r'\s\w+', '', s)
print(res)
# Output: ID,FIRST_NM,LAST_NM,FILLER1
You can use a regex with re.findall and a lookahead that keeps only the words followed by a space and another word:
import re
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
s = ','.join(re.findall(r'\w+(?= \w+)', s))
print(s)
Output:
ID,FIRST_NM,LAST_NM,FILLER1
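The same result can be had without regex. The sketch below assumes every field has the form "name type" and that no type contains a comma of its own; the variable names are illustrative.

```python
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"

# Split into fields on commas, then keep only the first
# whitespace-separated token (the column name) of each field.
names = [field.split()[0] for field in s.split(',') if field.strip()]
result = ','.join(names)
print(result)
```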

How to start at a specific letter and end when it hits a digit?

I have some sample strings:
s = 'neg(able-23, never-21) s2-1/3'
i = 'amod(Market-8, magical-5) s1'
I can already figure out whether the string ends in 's1' or 's3' using:
word = re.search(r's\d$', s)
But if I want to know whether the string contains 's2-1/3', it won't work.
Is there a regex that works for both cases, 's#' and 's#+'?
Thanks!
You can allow the characters "-" and "/" to be captured as well, in addition to just digits. It's hard to tell the exact pattern you're going for here, but something like this would capture "s2-1/3" from your example:
import re
s = "neg(able-23, never-21) s2-1/3"
word = re.search(r"s\d[-/\d]*$", s)
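As a quick check, the pattern can be run against both sample strings from the question. This is a small sketch; TAG and found are names chosen here for illustration.

```python
import re

# Same pattern as above: "s", one digit, then any run of "-", "/" or digits
# up to the end of the string.
TAG = re.compile(r"s\d[-/\d]*$")

samples = [
    "neg(able-23, never-21) s2-1/3",
    "amod(Market-8, magical-5) s1",
]

found = []
for text in samples:
    m = TAG.search(text)
    found.append(m.group() if m else None)

print(found)
```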
I'm guessing that you might want to extract the parts with an expression such as:
(s\d+)-?(.*)$
or:
(s\d+)-?([0-9]+)?\/?([0-9]+)?$
Test
import re
expression = r"(s\d+)-?(.*)$"
string = """
neg(able-23, never-21) s211-12/31
neg(able-23, never-21) s2-1/3
amod(Market-8, magical-5) s1
"""
print(re.findall(expression, string, re.M))
Output
[('s211', '12/31'), ('s2', '1/3'), ('s1', '')]

Getting word from string

How can I get the word example from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this:
print(str[10:str.rfind(':')])
but it doesn't work right if the string looks like
"http://tests-example:123/wd/hub"
You can use this regex, which uses lookarounds to capture the value preceded by - and followed by :
(?<=-).+(?=:)
Python code:
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Outputs,
example
A non-regex way to get the same result is to use these two splits:
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints,
example
You can use the following non-regex approach, because you know example is a 7-letter word:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]
There are many ways.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
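If the input is always a URL, a different technique from the ones above is to let the standard library isolate the host name first and only then split on the hyphen. The sketch below assumes the wanted word is whatever follows the last hyphen in the host name; word_after_hyphen is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def word_after_hyphen(url):
    """Return the part of the host name after its last hyphen, or None."""
    # hostname excludes the scheme, port and path; it is None for non-URLs.
    host = urlsplit(url).hostname or ''
    before, sep, after = host.rpartition('-')
    return after if sep else None


print(word_after_hyphen("http://test-example:123/wd/hub"))
print(word_after_hyphen("http://tests-example:123/wd/hub"))
```

Because the port and path are stripped before splitting, extra colons or hyphens later in the URL do not affect the result.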
I am not sure why you want to get a particular word from a string. I guess you want to check whether this word occurs in the given string.
If that is the case, the code below can be used.
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on the -, and then on :
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
    print(m.group())
Python strings have a built-in method, find:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
will return:
12
-1
This is the index of the found substring. If it equals -1, the substring was not found in the string. You can also use the in keyword:
'example' in 'http://test-example:123/wd/hub'
True

How to get string following some specific letters?

How can I get string from some specific characters? (more specifically, get "test" from "A8 test")
In this case, "A8" is following a pattern like "[A-Z]+[0-9]+".
So it can also be "C6 test","X90 test" and etc.
I've tried in Python using "(?<=[A-Z]+[0-9]).+", which throws an Exception:
"sre_constants.error: look-behind requires fixed-width pattern."
It means I should use fixed-width pattern such as "(?<=[A-Z]{1}[0-9]{1})".
But actually it's not fixed-width. What can I do?
If you mean getting the rest of the string after the pattern "[A-Z]+[0-9]+", you can try this:
import re
s1 = 'A8 test'
s2 = 'C6 123'
s3 = 'X90 test32'
# the parenthesized group captures the part you want
p = re.compile(r"[A-Z]+[0-9]+ (\w+)")
print(p.findall(s1))
print(p.findall(s2))
print(p.findall(s3))
output:
['test']
['123']
['test32']
Hope that will help you, and comment if you have further questions. : )
You can use a capture group to get what you need.
>>> import re
>>> regexp = r"[A-Z]+[0-9]+ (.+)"
>>> re.search(regexp, "C6 test")[1]
'test'
>>> re.search(regexp, "X90 test")[1]
'test'
>>> re.search(regexp, "CBF58456 test")[1]
'test'
Note that the current pattern you show would pick up on any number of uppercase letters followed by any number of digits, as long as there's at least one of each. Also note that my example above would require a blank between the first part and the test string to capture.
You could also use re.sub to drop the part of the string you do not need, simply by passing an empty string as the replacement (the second argument):
import re
text = "X90 test"
t = re.sub("[A-Z]+[0-9]+ ","",text)
print(t) #test
import re
ex = r"[A-Z]+[0-9]+ (.+)"
print(re.search(ex , "X90 test")[1])
print(re.search(ex , "C6 test")[1])
print(re.search(ex , "CBF58456 test")[1])
Output
test
test
test
You can split the string, then get your string.
>>> re.split(r'([A-Z]+[0-9]+ )(test)', 'A8 test')
['', 'A8 ', 'test', '']
Or you can write a simple function that finds your string within the whole string without using regex.
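A minimal sketch of such a function follows. It assumes the prefix and the rest are separated by a single space, as in the question's examples; rest_after_prefix is a hypothetical name, and isalpha/isupper also accept non-ASCII uppercase letters, which is slightly looser than [A-Z].

```python
def rest_after_prefix(text):
    """Return the text after a leading 'LETTERS+DIGITS ' prefix, else None."""
    prefix, sep, rest = text.partition(' ')
    # Peel the trailing digits off the prefix; what remains must be letters.
    letters = prefix.rstrip('0123456789')
    digits = prefix[len(letters):]
    if sep and digits and letters.isalpha() and letters.isupper():
        return rest
    return None


print(rest_after_prefix('A8 test'))
print(rest_after_prefix('X90 test32'))
print(rest_after_prefix('hello world'))
```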

Regex: Replace one pattern with another

I am trying to replace one regex pattern with another regex pattern.
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile('\d+x\d+') # for st_srt
re.sub(pattern, 'S\1E\2',st_srt)
I know the use of S\1E\2 is wrong here. The reason I am using \1 and \2 is to catch the values 01 and 02 and use them in S\1E\2.
My desired output is:
st_srt = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
So, what is the correct way to achieve this?
You need to capture what you're trying to preserve. Try this:
pattern = re.compile(r'(\d+)x(\d+)') # for st_srt
st_srt = re.sub(pattern, r'S\1E\2', st_srt)
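Two details matter here: the groups must be captured, and the replacement must be a raw string, because in a normal string literal '\1' is the control character U+0001 rather than a backreference. As an optional variation, named groups make the replacement easier to read. The sketch below is only an illustration; the group names season and episode and the variable names are chosen here, not taken from the question.

```python
import re

st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'

# Named groups instead of numbered ones; \g<name> refers back to them.
EPISODE = re.compile(r'(?P<season>\d+)x(?P<episode>\d+)')
new_name = EPISODE.sub(r'S\g<season>E\g<episode>', st_srt)
print(new_name)

# Without the r prefix, '\1' is a single control character, not a backreference.
print('\1' == chr(1))
```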
Well, it looks like you already accepted an answer, but I think this is what you said you're trying to do, which is get the replace string from 'st_mkv', then use it in 'st_srt':
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'Awake\.([^.]+)\.')
m = replace_pattern.match(st_mkv)
replace_string = m.group(1)
new_srt = re.sub(r'^Awake\.[^.]+\.', 'Awake.{0}.'.format(replace_string), st_srt)
print(new_srt)
Try using this regex:
([\w+\.]+){5}\-\w+
Copy the strings into http://www.gskinner.com/RegExr/
and paste the regex at the top.
It captures the names of each string, leaving out the extension.
You can then go ahead and append the extension you want, to the string you want.
EDIT:
Here's what I used to do what you're after:
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'  # don't actually need this one
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'([\w+\.]+){5}\-\w+')
m = replace_pattern.match(st_mkv)
new_string = m.group(0)
new_string += '.srt'
>>> new_string
'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile(r'(\d+)x(\d+)')
st_srt_new = re.sub(pattern, r'S\1E\2', st_srt)
print(st_srt_new)
