How to match sequential string in text using regular expression? - python

I am using python's re module to match sequential string in text, for example:
s = 'habcabcabcj', I try the following code:
import re
re.findall(r'(abc)+', s)
And the result is: ["abc"]
If I want the match result to be ["abcabcabc"], how can I do this?

Use a non-capturing group (?:...):
>>> import re
>>> s = 'habcabcabcj'
>>> re.findall(r'(?:abc)+', s)
['abcabcabc']
>>>

Related

I want to extract data using regular expression in python

I have a string = "ProductId%3D967164%26Colour%3Dbright-royal" and i want to extract data using regex so output will be 967164bright-royal.
I have tried with this (?:ProductId%3D|Colour%3D)(.*) in python with regex, but getting output as 967164%26Colour%3Dbright-royal.
Can anyone please help me to find out regex for it.
You don't need a regex here, use urllib.parse module:
from urllib.parse import parse_qs, unquote
qs = "ProductId%3D967164%26Colour%3Dbright-royal"
d = parse_qs(unquote(qs))
print(d)
# Output:
{'ProductId': ['967164'], 'Colour': ['bright-royal']}
Final output:
>>> ''.join(i[0] for i in d.values())
'967164bright-royal'
Update
>>> ''.join(re.findall(r'%3D(\S*?)(?=%26|$)', qs))
'967164bright-royal'
The alternative matches on the first part, you can not get a single match for 2 separate parts in the string.
If you want to capture both values using a regex in a capture group:
(?:ProductId|Colour)%3D(\S*?)(?=%26|$)
Regex demo
import re
pattern = r"(?:ProductId|Colour)%3D(\S*?)(?=%26|$)"
s = "ProductId%3D967164%26Colour%3Dbright-royal"
print(''.join(re.findall(pattern, s)))
Output
967164bright-royal
If you must use a regular expression and you can guarantee that the string will always be formatted the way you expect, you could try this.
import re
pattern = r"ProductId%3D(\d+)%26Colour%3D(.*)"
string = "ProductId%3D967164%26Colour%3Dbright-royal"
matches = re.match(pattern, string)
print(f"{matches[1]}{matches[2]}")

Regex to add quotes around hyphenated strings

I want to add quotes around all hyphenated words in a string.
With an example string, the desired function add_quotes() should perform like this:
>>> s = '{name = first-name}'
>>> add_quotes(s)
{name = "first-name"}
I know how to find all occurances of hyphenated works using this Regex selector, but don't know how to add quotes around each of those occurances in the original string.
>>> import re
>>> s = '{name = first-name}'
>>> re.findall(r'\w+(?:-\w+)+', s)
['first-name']
Regex can be used to do this with Python Module re from the standard library.
import re
def add_quotes(s):
return re.sub(r'\w+(?:-\w+)+', r'"\g<0>"', s)
s = '{name = first-name}'
add_quotes(s) # returns '{name = "first-name"}'
where the occurances of hyphenated words are found using this selector.

python regex find characters from and end of the string

svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz
from the following string I need to fetch Rev1233. So i was wondering if we can have any regexpression to do that. I like to do following string.search ("Rev" uptill next /)
so far I split this using split array
s1,s2,s3,s4,s5 = string ("/",4)
You don't need a regex to do this. It is as simple as:
str = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
str.split('/')[-2]
Here is a quick python example
>>> impot re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> p = re.compile('.*/(Rev\d+)/.*')
>>> p.match(s).groups()[0]
'Rev1223'
Find second part from the end using regex, if preferred:
/(Rev\d+)/[^/]+$
http://regex101.com/r/cC6fO3/1
>>> import re
>>> m = re.search('/(Rev\d+)/[^/]+$', 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz')
>>> m.groups()[0]
'Rev1223'

Replace space character between two numbers

I need to replace space with comma between two numbers
15.30 396.90 => 15.30,396.90
In PHP this is used:
'/(?<=\d)\s+(?=\d)/', ','
How to do it in Python?
There are several ways to do it (sorry, Zen of Python). Which one to use depends on your input:
>>> s = "15.30 396.90"
>>> ",".join(s.split())
'15.30,396.90'
>>> s.replace(" ", ",")
'15.30,396.90'
or, using re, for example, this way:
>>> import re
>>> re.sub("(\d+)\s+(\d+)", r"\1,\2", s)
'15.30,396.90'
You can use the same regex with the re module in Python:
import re
s = '15.30 396.90'
s = re.sub(r'(?<=\d)\s+(?=\d)', ',', s)

how to Ignore characters other than [a-z][A-Z]

How can I ignore characters other than [a-z][A-Z] in input string in python, and after applying method what will the string look like?
Do I need to use regular expressions?
If you need to use a regex, use a negative character class ([^...]):
re.sub(r'[^a-zA-Z]', '', inputtext)
A negative character class matches anything not named in the class.
Demo:
>>> import re
>>> inputtext = 'The quick brown fox!'
>>> re.sub(r'[^a-zA-Z]', '', inputtext)
'Thequickbrownfox'
But using str.translate() is way faster:
import string
ascii_letters = set(map(ord, string.ascii_letters))
non_letters = ''.join(chr(i) for i in range(256) if i not in ascii_letters)
inputtext.translate(None, non_letters)
Using str.translate() is more than 10 times faster than a regular expression:
>>> import timeit, partial, re
>>> ascii_only = partial(re.compile(r'[^a-zA-Z]').sub, '')
>>> timeit.timeit('f(t)', 'from __main__ import ascii_only as f, inputtext as t')
7.903045892715454
>>> timeit.timeit('t.translate(None, m)', 'from __main__ import inputtext as t, non_letters as m')
0.5990171432495117
Using Jakub's method is slower still:
>>> timeit.timeit("''.join(c for c in t if c not in l)", 'from __main__ import inputtext as t; import string; l = set(string.letters)')
9.960685968399048
You can use regex:
re.compile(r'[^a-zA-Z]').sub('', your_string)
You could also manage without regular expressions (e.g, if you had regex phobia):
import string
new_string = ''.join(c for c in old_string
if c not in set(string.letters))
Although I would use regex, this example has additional educational values: set, comprehension and string library. Note that set is not strictly needed here

Categories