How to match sequential string in text using regular expression?


I am using Python's re module to match a repeated substring in text. For example, given
s = 'habcabcabcj', I tried the following code:
import re
re.findall(r'(abc)+', s)
And the result is: ['abc']
If I want the result to be ['abcabcabc'], how can I do this?

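When the pattern contains a capturing group, re.findall returns the text of that group rather than the whole match, and a repeated group keeps only its last repetition. Use a non-capturing group (?:...) so that the whole match is returned:
>>> import re
>>> s = 'habcabcabcj'
>>> re.findall(r'(?:abc)+', s)
['abcabcabc']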
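To make the difference between the two group styles concrete, here is a minimal sketch that runs both patterns on the string from the question. It also shows one alternative, re.finditer with group(0), for cases where the capturing group has to stay. The variable names are only illustrative.

```python
import re

s = 'habcabcabcj'

# With a capturing group, findall returns what the group captured,
# and a repeated group keeps only its last repetition.
captured = re.findall(r'(abc)+', s)

# With a non-capturing group, findall returns the whole match.
whole = re.findall(r'(?:abc)+', s)

# Alternative: keep the capturing group and read the full match
# from each match object instead.
via_finditer = [m.group(0) for m in re.finditer(r'(abc)+', s)]

print(captured)      # ['abc']
print(whole)         # ['abcabcabc']
print(via_finditer)  # ['abcabcabc']
```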

Related

I want to extract data using regular expression in python

I have a string "ProductId%3D967164%26Colour%3Dbright-royal" and I want to extract data from it using a regex, so that the output is 967164bright-royal.
I have tried (?:ProductId%3D|Colour%3D)(.*) with Python's re module, but the output is 967164%26Colour%3Dbright-royal.
Can anyone please help me find a regex for this?

You don't need a regex here; use the urllib.parse module:
from urllib.parse import parse_qs, unquote
qs = "ProductId%3D967164%26Colour%3Dbright-royal"
d = parse_qs(unquote(qs))
print(d)
# Output:
{'ProductId': ['967164'], 'Colour': ['bright-royal']}
Final output:
>>> ''.join(i[0] for i in d.values())
'967164bright-royal'
Update: if you still want a regex:
>>> import re
>>> ''.join(re.findall(r'%3D(\S*?)(?=%26|$)', qs))
'967164bright-royal'

The alternation matches its first alternative (ProductId%3D), and (.*) then consumes the rest of the string, so you cannot get a single match that covers two separate parts of the string.
If you want to capture both values with a capture group, match each one separately:
(?:ProductId|Colour)%3D(\S*?)(?=%26|$)
import re
pattern = r"(?:ProductId|Colour)%3D(\S*?)(?=%26|$)"
s = "ProductId%3D967164%26Colour%3Dbright-royal"
print(''.join(re.findall(pattern, s)))
Output
967164bright-royal

If you must use a regular expression and you can guarantee that the string will always be formatted the way you expect, you could try this.
import re
pattern = r"ProductId%3D(\d+)%26Colour%3D(.*)"
string = "ProductId%3D967164%26Colour%3Dbright-royal"
matches = re.match(pattern, string)
print(f"{matches[1]}{matches[2]}")
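One thing to watch with this approach: re.match returns None when the string is not in the expected format, so indexing the result directly would raise a TypeError. A minimal sketch with an explicit check follows; the function name join_product_and_colour is hypothetical and used only for illustration.

```python
import re

PATTERN = re.compile(r"ProductId%3D(\d+)%26Colour%3D(.*)")

def join_product_and_colour(text):
    """Return product id + colour, or None if text is not in the expected format."""
    m = PATTERN.match(text)
    if m is None:
        # re.match found nothing, so there are no groups to read.
        return None
    return m.group(1) + m.group(2)

print(join_product_and_colour("ProductId%3D967164%26Colour%3Dbright-royal"))  # 967164bright-royal
print(join_product_and_colour("Colour%3Dbright-royal"))                       # None
```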

Regex to add quotes around hyphenated strings

I want to add quotes around all hyphenated words in a string.
With an example string, the desired function add_quotes() should perform like this:
>>> s = '{name = first-name}'
>>> add_quotes(s)
{name = "first-name"}
I know how to find all occurrences of hyphenated words using this regex, but I don't know how to add quotes around each of those occurrences in the original string.
>>> import re
>>> s = '{name = first-name}'
>>> re.findall(r'\w+(?:-\w+)+', s)
['first-name']

This can be done with re.sub from the standard library's re module.
import re
def add_quotes(s):
    return re.sub(r'\w+(?:-\w+)+', r'"\g<0>"', s)
s = '{name = first-name}'
add_quotes(s) # returns '{name = "first-name"}'
Here \w+(?:-\w+)+ finds each hyphenated word, and \g<0> in the replacement refers to the whole match.
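As a quick check that the substitution handles several hyphenated words in one string and leaves plain words alone, here is a small sketch. The second function, add_quotes_callable, is a hypothetical variant that passes a function to re.sub instead of a replacement template; the sample string is made up for illustration.

```python
import re

HYPHENATED = re.compile(r'\w+(?:-\w+)+')

def add_quotes(s):
    # \g<0> refers to the entire match, so each hyphenated word is wrapped in quotes.
    return HYPHENATED.sub(r'"\g<0>"', s)

def add_quotes_callable(s):
    # Same result, using a function that receives each match object.
    return HYPHENATED.sub(lambda m: '"' + m.group(0) + '"', s)

sample = '{name = first-name, id = a-b-c, plain = word}'
print(add_quotes(sample))           # {name = "first-name", id = "a-b-c", plain = word}
print(add_quotes_callable(sample))  # {name = "first-name", id = "a-b-c", plain = word}
```

Note that neither version checks whether a word is already quoted, so applying the function twice would add a second pair of quotes.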

python regex find characters from and end of the string

svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz
From the string above I need to fetch Rev1223, so I was wondering whether a regular expression can do that. I would like to do something like string.search("Rev" up to the next /).
So far I have split the string using split:
s1, s2, s3, s4, s5 = string.split("/", 4)

You don't need a regex to do this. It is as simple as:
s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
s.split('/')[-2]

Here is a quick python example
>>> import re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> p = re.compile(r'.*/(Rev\d+)/.*')
>>> p.match(s).groups()[0]
'Rev1223'

To match the second-to-last path component with a regex, if preferred:
/(Rev\d+)/[^/]+$
http://regex101.com/r/cC6fO3/1
>>> import re
>>> m = re.search(r'/(Rev\d+)/[^/]+$', 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz')
>>> m.groups()[0]
'Rev1223'
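The split and regex approaches above agree on this input but behave differently when the path has another shape. The sketch below compares them; the helper find_revision and the second path (with the Rev component left out) are hypothetical, added only to show the difference.

```python
import re

def find_revision(path):
    """Return the first 'Rev<digits>' path component, or None if absent."""
    m = re.search(r'/(Rev\d+)/', path)
    return m.group(1) if m else None

path = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
print(path.split('/')[-2])   # Rev1223
print(find_revision(path))   # Rev1223

# When the layout differs, split() still returns whatever sits in that
# position, while the regex reports that nothing matched.
other = 'svn-backup-test,2014/09/24/18/FullSvnCheckout.tgz'
print(other.split('/')[-2])  # 18
print(find_revision(other))  # None
```

Which behavior is preferable depends on whether the position or the Rev prefix is the more reliable part of the input.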

Replace space character between two numbers

I need to replace the space between two numbers with a comma:
15.30 396.90 => 15.30,396.90
In PHP this is used:
'/(?<=\d)\s+(?=\d)/', ','
How can I do this in Python?

There are several ways to do it (sorry, Zen of Python). Which one to use depends on your input:
>>> s = "15.30 396.90"
>>> ",".join(s.split())
'15.30,396.90'
>>> s.replace(" ", ",")
'15.30,396.90'
or, using re, for example, this way:
>>> import re
>>> re.sub(r"(\d+)\s+(\d+)", r"\1,\2", s)
'15.30,396.90'

You can use the same regex with the re module in Python:
import re
s = '15.30 396.90'
s = re.sub(r'(?<=\d)\s+(?=\d)', ',', s)
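The capture-group and lookaround patterns shown in the answers give the same result for two numbers, but they can differ when there are three or more. A small sketch, using a made-up input with three numbers:

```python
import re

s = "1 2 3"

# The capture-group pattern consumes the digits on both sides of the gap,
# so the "2" cannot also serve as the left side of the next match.
with_groups = re.sub(r"(\d+)\s+(\d+)", r"\1,\2", s)

# Lookarounds only assert that digits are adjacent without consuming them,
# so every gap between digits is replaced.
with_lookarounds = re.sub(r"(?<=\d)\s+(?=\d)", ",", s)

print(with_groups)       # 1,2 3
print(with_lookarounds)  # 1,2,3
```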

How to ignore characters other than [a-z][A-Z]

How can I ignore characters other than [a-z][A-Z] in an input string in Python, and what will the string look like after applying the method?
Do I need to use regular expressions?

If you need to use a regex, use a negative character class ([^...]):
re.sub(r'[^a-zA-Z]', '', inputtext)
A negative character class matches anything not named in the class.
Demo:
>>> import re
>>> inputtext = 'The quick brown fox!'
>>> re.sub(r'[^a-zA-Z]', '', inputtext)
'Thequickbrownfox'
But using str.translate() is way faster (the two-argument form shown here is Python 2 only):
import string
ascii_letters = set(map(ord, string.ascii_letters))
non_letters = ''.join(chr(i) for i in range(256) if i not in ascii_letters)
inputtext.translate(None, non_letters)
Using str.translate() is more than 10 times faster than a regular expression:
>>> import timeit, re
>>> from functools import partial
>>> ascii_only = partial(re.compile(r'[^a-zA-Z]').sub, '')
>>> timeit.timeit('f(t)', 'from __main__ import ascii_only as f, inputtext as t')
7.903045892715454
>>> timeit.timeit('t.translate(None, m)', 'from __main__ import inputtext as t, non_letters as m')
0.5990171432495117
Using Jakub's method is slower still:
>>> timeit.timeit("''.join(c for c in t if c not in l)", 'from __main__ import inputtext as t; import string; l = set(string.letters)')
9.960685968399048
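The translate(None, ...) call and string.letters used above are Python 2 spellings. Under Python 3, str.translate takes a single mapping from code points to replacements, where None means delete. A minimal sketch of a rough equivalent follows; like the original, it only covers code points below 256, and it makes no claim about relative timings on Python 3. The name delete_table is illustrative.

```python
import re
import string

# Map every code point below 256 that is not an ASCII letter to None (delete).
# Characters at or above 256 are not in the table and pass through unchanged.
letters = set(string.ascii_letters)
delete_table = {i: None for i in range(256) if chr(i) not in letters}

inputtext = 'The quick brown fox!'
print(inputtext.translate(delete_table))    # Thequickbrownfox
print(re.sub(r'[^a-zA-Z]', '', inputtext))  # Thequickbrownfox
```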

You can use regex:
re.compile(r'[^a-zA-Z]').sub('', your_string)
You could also manage without regular expressions (e.g., if you had regex phobia):
import string
new_string = ''.join(c for c in old_string
                     if c not in set(string.ascii_letters))
Although I would use a regex, this example has additional educational value: it shows a set, a comprehension, and the string library. Note that the set is not strictly needed here.
