Match a string and replace remaining characters - python

I would like to extract the server type from the hostname and replace the remaining characters with underscores, so that I can use the result with the LIKE pattern match in SQLite.
My initial approach was something like this (this is the expected output):
>>> host = 'webus01'
>>> location = 'us'
>>> parts = list(host.partition(location))
>>> parts
['web', 'us', '01']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'web____.%'
But this won't work if the server type itself contains the location string, because partition splits at the first occurrence:
>>> host = 'utilityit01'
>>> location = 'it'
>>> parts = list(host.partition(location))
>>> parts
['util', 'it', 'yit01']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'util_______.%'
Then I thought it would be better to use a regex for this, so that the location is matched only when it is followed by digits.
The re.sub function seems to be a good candidate for this task, but I'm not sure how to replace every character of the match instead of the match as a whole:
>>> import re
>>> re.sub(r'it\d+.*', '_', 'utilityit01a')
'utility_'
The output in this case should be: utility_____.%.

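It looks like you need to pass a function as the replacement, so that each match is replaced by a run of underscores of the same length: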
import re
print(re.sub(r'(it\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'utilityit01a'))
print(re.sub(r'(us\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'webus01'))
Output:
utility_____.%
web____.%
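The same idea can be wrapped in a small helper that takes the location as a parameter. This is only a sketch: the name like_pattern is made up for illustration, and it assumes the location code is always followed directly by digits, as in the examples above.

```python
import re

def like_pattern(host, location):
    """Mask the location, its digits and anything after them with '_'."""
    # re.escape guards against regex metacharacters in the location code.
    regex = re.escape(location) + r'\d+.*'
    # The replacement function returns one underscore per matched character.
    masked = re.sub(regex, lambda m: '_' * len(m.group(0)), host, count=1)
    return masked + '.%'

print(like_pattern('webus01', 'us'))       # web____.%
print(like_pattern('utilityit01a', 'it'))  # utility_____.%
```

If the location is not found before digits, the hostname is returned unmasked with '.%' appended, which may or may not be what you want.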

You can find the substring that you want to replace first, and then replace it based on its length:
>>> import re
>>> host = 'utilityit01a'
>>> substr = re.findall(r'it\d+.*', host)
>>> substr
['it01a']
>>> host.replace(substr[0], len(substr[0])*'_')
'utility_____'
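To see how such a pattern behaves in SQLite, here is a minimal sketch against an in-memory database. The hosts table, its fqdn column and the example.com names are assumptions made up for the example. In LIKE, _ matches exactly one character and % matches any run of characters.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE hosts (fqdn TEXT)')  # hypothetical table
conn.executemany(
    'INSERT INTO hosts VALUES (?)',
    [('webus01.example.com',), ('webit02.example.com',), ('dbus01.example.com',)],
)

pattern = 'web____.%'  # server type "web", any location and number
rows = conn.execute(
    'SELECT fqdn FROM hosts WHERE fqdn LIKE ? ORDER BY fqdn', (pattern,)
).fetchall()
matches = [row[0] for row in rows]
print(matches)  # ['webit02.example.com', 'webus01.example.com']
conn.close()
```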

python distinguish number and string solution

I'm new to Python and trying to separate the digits from the letters in a string.
For example :
Input: 111aa111aa
Output : Number: 111111 , String : aaaa
Here is your answer.
For the numbers:
import re
x = '111aa111aa'
num = ''.join(re.findall(r'\d+', x))  # '111111'
For the letters:
import re
x = '111aa111aa'
alphabets = ''.join(re.findall(r'[a-zA-Z]', x))  # 'aaaa'
You can use the built-in string methods isdigit() and isalpha():
>>> x = '111aa111aa'
>>> number = ''.join([i for i in x if i.isdigit()])
>>> number
'111111'
>>> string = ''.join([i for i in x if i.isalpha()])
>>> string
'aaaa'
Or you can use a regex:
>>> x = '111aa111aa'
>>> import re
>>> numbers = ''.join(re.findall(r'\d+', x))
>>> numbers
'111111'
>>> string = ''.join(re.findall(r'[a-zA-Z]', x))
>>> string
'aaaa'
>>> my_string = '111aa111aa'
>>> ''.join(filter(str.isdigit, my_string))
'111111'
>>> ''.join(filter(str.isalpha, my_string))
'aaaa'
Try isalpha for the letters and isdigit for the digits:
In [45]: a = '111aa111aa'
In [47]: ''.join([i for i in a if i.isalpha()])
Out[47]: 'aaaa'
In [48]: ''.join([i for i in a if i.isdigit()])
Out[48]: '111111'
OR
In [18]: strings, numbers = ''.join(filter(str.isalpha, a)), ''.join(filter(str.isdigit, a))
In [19]: print(strings, numbers)
aaaa 111111
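The answers above scan the string once per category. If both results are wanted together, a single loop is enough. The following is a minimal sketch; the function name split_digits_letters is hypothetical, and characters that are neither digits nor letters are simply dropped.

```python
def split_digits_letters(s):
    """Return (digits, letters) collected from s in a single pass."""
    digits, letters = [], []
    for ch in s:
        if ch.isdigit():
            digits.append(ch)
        elif ch.isalpha():
            letters.append(ch)
        # anything else (spaces, punctuation) is skipped
    return ''.join(digits), ''.join(letters)

number, string = split_digits_letters('111aa111aa')
print('Number:', number, ', String:', string)
```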
Since you mentioned that you are new to Python: most of the approaches presented here, using str.join with list comprehensions or a functional style, are quite sufficient. As an alternative, here are some options using dictionaries, which can help organize the data, going from a basic example to an intermediate one.
Basic Alternative
s = '111aa111aa'
# Dictionary
d = {"Number": "", "String": ""}
for char in s:
    if char.isdigit():
        d["Number"] += char
    elif char.isalpha():
        d["String"] += char
d
# {'Number': '111111', 'String': 'aaaa'}
d["Number"] # access by key
# '111111'
import collections as ct
# Default Dictionary: missing keys start out as an empty string
dd = ct.defaultdict(str)
for char in s:
    if char.isdigit():
        dd["Number"] += char
    elif char.isalpha():
        dd["String"] += char
dd

Find a regular extension of domains on a string python Regex

I would like to find the "COM" extension of a domain in a sentence using a regex in Python.
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\s.\s\:]+\.[^\.\s\:]*)')
>>>
Result:
domain : 'domain.com' ### notes: not domain.coms
url : 'http://domain.coms/index/page/2'
Maybe you are looking for this:
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\/]*\.(?:com|en|org))')
>>> m = pattern.search(str)
>>> print(m.group(1))
domain.com
((?:https?:\/\/)?(?:([^\s.\s\:]+\.[^\/]*)(?:\/|$)[^\.\s\:]*))
Try this. Group 1 will be the URL. Group 2 will be the domain.
See demo.
http://regex101.com/r/sK8oK9/1
You could try the below.
>>> s = "finding exstention from string on http://domain.coms/index/page/2"
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(1)
>>> m
'http://domain.coms/index/page/2'
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(2)
>>> m
'domain.coms'
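A different technique from the pure-regex answers above is to let the standard library's urllib.parse split the URL and only use a regex to cut the host after a known extension. This is a sketch under the assumption that the sentence contains one http or https URL; the extension list is the one used in the earlier answer.

```python
import re
from urllib.parse import urlparse

text = 'finding exstention from string on http://domain.coms/index/page/2'

# Grab the first http(s) URL in the sentence.
url = re.search(r'https?://\S+', text).group(0)

# Let the standard library isolate the host part.
host = urlparse(url).hostname

# Cut the host right after a known extension.
m = re.match(r'.*?\.(?:com|en|org)', host)
domain = m.group(0) if m else None

print(url)     # http://domain.coms/index/page/2
print(host)    # domain.coms
print(domain)  # domain.com
```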

python find digits with leading '_v'

Is there a better way of finding digits in a string which starts with '_v' which stands for version number? What I want is just '001'
import re
filename = 'greatv02_v001_jam.mb'
parts = re.split(r'_v|_', filename)
# ['greatv02', '001', 'jam.mb']
b = re.findall(r'\d+', filename)
# ['02', '001']
Is there a way to split a string with something along these lines?
parts = re.split('_v###_',filename)
or
parts = re.split('_v*_',filename)
You could use lookarounds:
>>> filename = 'greatv02_v001_jam.mb'
>>> import re
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>>
>>> filename = 'greatv02_v001_av456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>> filename = 'greatv02_v001_v456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001', '456']
>>>
Ugly, but you could partition the file name twice
>>> filename.partition('_v')[2].partition('_')[0]
'001'
Use regex's grouping like this:
.*_v(\d+).*
Demo:
>>> filename = 'greatv02_v001_jam.mb'
>>> pattern = re.compile(r'.*_v(\d+).*')
>>> re.search(pattern, filename).group(1)
'001'
How about the regex _v(?P<version>\d+).*:
>>> import re
>>> string = 'greatv02_v001_jam.mb'
>>> regex = re.compile(r"_v(?P<version>\d+).*")
>>> r = regex.search(string)
# List the groups found
>>> r.groups()
('001',)
# List the named dictionary objects found
>>> r.groupdict()
{'version': '001'}
# Run findall
>>> regex.findall(string)
['001']
# Run timeit test
>>> setup = ur"import re; regex =re.compile("_v(?P<version>\d+).*");string="""greatv02_v00 ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
0.005126953125
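Whichever pattern you pick, it may help to wrap it in a function that copes with filenames that carry no version at all. A minimal sketch, with the hypothetical names VERSION_RE and extract_version, assuming only the first _v followed by digits matters:

```python
import re

VERSION_RE = re.compile(r'_v(\d+)')

def extract_version(filename):
    """Return the digits that follow the first '_v', or None if absent."""
    m = VERSION_RE.search(filename)
    return m.group(1) if m else None

print(extract_version('greatv02_v001_jam.mb'))  # 001
print(extract_version('greatv02_jam.mb'))       # None
```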

How to search and export value with python and re?

I'm trying to export some values from a text to a txt file.
My text has this form:
"a='one' b='2' c='3' a='two' b='8' c='3'"
I want to export all the values of the key "a".
The result must look like this:
one
two
The other answers are correct for your particular case, but I think a regex with lookbehind/lookahead is a more general solution, i.e.:
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
expr = r"(?<=a=')[^']*(?=')"
matches = re.findall(expr,text)
for m in matches:
    print(m)  # or whatever
This matches whatever sits between single quotes preceded by a=, so a='xyz', a='my#1.abcd' and a='a=5%' will all match.
This regex is very easy to understand:
pattern = r"a='(.*?)'"
It doesn't use lookarounds (like (?<=a=')[^']*(?=')), so it's very simple.
Whole program:
#!/usr/bin/python
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
pattern = r"a='(.*?)'"
for m in re.findall(pattern, text):
    print(m)
You can use something like this:
import re
# matches any quoted lowercase word, whatever the key
r = re.compile(r"'([a-z]+)'")
with open('input') as f:
    text = f.read()
for mm in r.finditer(text):
    print(mm.group(1))
I thought I would give a solution without re:
>>> text = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> step1 = text.split(" ")
>>> step1
["a='one'", "b='2'", "c='3'", "a='two'", "b='8'", "c='3'"]
>>> step2 = []
>>> for pair in step1:
...     split_pair = pair.split("=")
...     step2.append([split_pair[0], split_pair[1]])
...
>>> print(step2)
[['a', "'one'"], ['b', "'2'"], ['c', "'3'"], ['a', "'two'"], ['b', "'8'"], ['c', "'3'"]]
>>> results = []
>>> for split_pair in step2:
...     if split_pair[0] == "a":
...         results.append(split_pair[1])
...
>>> results
["'one'", "'two'"]
Not the most elegant method, and the values keep their quotes, but it works.
Another non-regex solution: you could use the shlex module and the .partition method (or .split() with maxsplit=1):
>>> import shlex
>>> s = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> shlex.split(s)
['a=one', 'b=2', 'c=3', 'a=two', 'b=8', 'c=3']
>>> shlex.split(s)[0].partition("=")
('a', '=', 'one')
and so it's simply
>>> for group in shlex.split(s):
...     key, eq, val = group.partition("=")
...     if key == 'a':
...         print(val)
...
one
two
with lots of variations of the same.
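The answers above print the values, while the question asks for a txt file. A minimal sketch of that last step follows; the function name export_values and the output file name a_values.txt are made up for illustration, and the values are assumed to contain no single quotes.

```python
import os
import re
import tempfile

def export_values(text, key, path):
    """Write every value of key='...' found in text to path, one per line."""
    # \b keeps a key such as "a" from matching inside a longer key like "ba".
    values = re.findall(r"\b" + re.escape(key) + r"='([^']*)'", text)
    with open(path, 'w') as out:
        for value in values:
            out.write(value + '\n')
    return values

text = "a='one' b='2' c='3' a='two' b='8' c='3'"
out_path = os.path.join(tempfile.gettempdir(), 'a_values.txt')  # hypothetical output file
print(export_values(text, 'a', out_path))  # ['one', 'two']
```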

Python - using variable in regular expression

I'm trying to change the width value in a string using a regex.
Width can have multiple formats, such as width="500px" or width:500px or width=500px (without quotes), etc.
Currently, I'm searching and replacing each format individually, like this:
# width="555
p = re.compile(r'width="\w{3}')
embed_url = p.sub('width="555', embed_url)
# width:"555
p = re.compile(r'width:"\w{3}')
embed_url = p.sub('width:"555', embed_url)
Is there a way to use a single regular expression and keep whichever of : or = was matched?
EDIT
Changed the code above so that : and = are each kept as they were, instead of all being replaced with "=".
Try this regex:
width(:|=)"?\w{3}
it matches:
width="300px
width=300px
width:"300px
width:300px
You can use | and grouping like this:
>>> import re
>>> p = re.compile(r'width(:|=)"\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
If you include quotes / apostrophes in a group, you can do this:
>>> p = re.compile(r'width(:|=)("|\')\w{3}')
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:\'500px"'))
width="555px"
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=\'500px"'))
width="555px"
Adding a ? will make the previous element optional:
>>> p = re.compile(r'width(:|=)("|\')?\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=500px'))
width="555px
Hope this helps.
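The substitutions above always write width=" back, which changes the separator and the quote. To keep whatever was matched, as the EDIT in the question asks, the prefix can be captured and re-inserted with a backreference. A minimal sketch, with the hypothetical names WIDTH_RE and set_width, assuming the old width is exactly three word characters as in the patterns above:

```python
import re

# Group 1 captures "width", the separator, and an optional opening quote.
WIDTH_RE = re.compile(r'''(width[:=]["']?)\w{3}''')

def set_width(s, new_width='555'):
    # \g<1> re-inserts group 1; the explicit form avoids the ambiguity
    # of writing \1 directly in front of more digits.
    return WIDTH_RE.sub(r'\g<1>' + new_width, s)

print(set_width('width="500px"'))  # width="555px"
print(set_width('width:500px'))    # width:555px
print(set_width("width:'500px'"))  # width:'555px'
```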
