I would like to extract the server type from the hostname and replace the remaining characters with underscores so I can then use it with the LIKE pattern match in SQLite
My initial approach was something like this (this is the excepted output):
>>> host = 'webus01'
>>> location = 'us'
>>> parts = list(host.partition(location))
>>> parts
['web', 'us', '01']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'web____.%'
But this won't work if the hostname starts with or contains the location name:
>>> host = 'utilityit01'
>>> pod = 'it'
>>> parts = list(host.partition(location))
>>> parts
['utilityit01', '', '']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'utilityit01.%'
Then I though it will be better to use RegEx for this to match only the location before digits.
The re.sub function seems to be a good candidate for this task but I'm not sure how to replace all characters instead of the match group as a whole:
>>> import re
>>> re.sub(r'it\d+.*', '_', 'utilityit01a')
'utility_'
The output in this case should be: utility_____.%.
Looks like you need.
import re
print(re.sub(r'(it\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'utilityit01a'))
print(re.sub(r'(us\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'webus01'))
Output:
utility_____.%
web____.%
You can find the substring that you want to replace first, and then replace it based on its length:
>>> import re
>>> host = 'utilityit01'
>>> substr = re.findall(r'it\d+.*', host)
>>> substr
['it01a']
>>> host.replace(substr[0], len(substr)*'_')
'utility_____'
Related
I'm new to python and trying to solve the distinguish between number and string
For example :
Input: 111aa111aa
Output : Number: 111111 , String : aaaa
Here is your answer
for numbers
import re
x = '111aa111aa'
num = ''.join(re.findall(r'[\d]+',x))
for alphabets
import re
x = '111aa111aa'
alphabets = ''.join(re.findall(r'[a-zA-Z]', x))
You can use in-built functions as isdigit() and isalpha()
>>> x = '111aa111aa'
>>> number = ''.join([i for i in x if i.isdigit()])
'111111'
>>> string = ''.join([i for i in x if i.isalpha()])
'aaaa'
Or You can use regex here :
>>> x = '111aa111aa'
>>> import re
>>> numbers = ''.join(re.findall(r'\d+', x))
'111111'
>>> string = ''.join(re.findall(r'[a-zA-Z]', x))
'aaaa'
>>> my_string = '111aa111aa'
>>> ''.join(filter(str.isdigit, my_string))
'111111'
>>> ''.join(filter(str.isalpha, my_string))
'aaaa'
Try with isalpha for strings and isdigit for numbers,
In [45]: a = '111aa111aa'
In [47]: ''.join([i for i in a if i.isalpha()])
Out[47]: 'aaaa'
In [48]: ''.join([i for i in a if i.isdigit()])
Out[48]: '111111'
OR
In [18]: strings,numbers = filter(str.isalpha,a),filter(str.isdigit,a)
In [19]: print strings,numbers
aaaa 111111
As you mentioned you are new to Python, most of the presented approaches using str.join with list comprehensions or functional styles are quite sufficient. Alternatively, I present some options using dictionaries that can help organize data, starting from basic to intermediate examples with arguably increasing intricacy.
Basic Alternative
# Dictionary
d = {"Number":"", "String":""}
for char in s:
if char.isdigit():
d["Number"] += char
elif char.isalpha():
d["String"] += char
d
# {'Number': '111111', 'String': 'aaaa'}
d["Number"] # access by key
# '111111'
import collections as ct
# Default Dictionary
dd = ct.defaultdict(str)
for char in s:
if char.isdigit():
dd["Number"] += char
elif char.isalpha():
dd["String"] += char
dd
I would like to find extension "COM" from a sentence using regex in python.
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\s.\s\:]+\.[^\.\s\:]*)')
>>>
Result:
domain : 'domain.com' ### notes: not domain.coms
url : 'http://domain.coms/index/page/2'
may be you are looking for this:
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\/]*\.(?:com|en|org))')
>>> m = pattern.search(str)
>>> print m.group(1)
domain.com
((?:https?:\/\/)?(?:([^\s.\s\:]+\.[^\/]*)(?:\/|$)[^\.\s\:]*))
Try this.Group 1 will be the url.Group 2 will be the domain.
See demo.
http://regex101.com/r/sK8oK9/1
You could try the below.
>>> s = "finding exstention from string on http://domain.coms/index/page/2"
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(1)
>>> m
'http://domain.coms/index/page/2'
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(2)
>>> m
'domain.coms'
Is there a better way of finding digits in a string which starts with '_v' which stands for version number? What I want is just '001'
filename = 'greatv02_v001_jam.mb'
parts = re.split('_v|\_',filename)
>>['greatv02', '001', 'jam.mb']
b = re.findall(r'\d+', filename)
>>['02', '001']
Is there a way to split a string with something along these lines?
parts = re.split('_v###_',filename)
or
parts = re.split('_v*_',filename)
You could use lookarounds:
>>> filename = 'greatv02_v001_jam.mb'
>>> import re
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>>
>>> filename = 'greatv02_v001_av456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>> filename = 'greatv02_v001_v456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001', '456']
>>>
Ugly, but you could partition the file name twice
>>> filename.partition('_v')[2].partition('_')[0]
'001'
Use regex's grouping like this:
.*_v(\d+).*
Demo:
>>> filename = 'greatv02_v001_jam.mb'
>>> pattern = re.compile(r'.*_v(\d+).*')
>>> re.search(pattern, filename).group(1)
'001'
How about the regex _v(?P<version>\d+).*:
>>> regex = re.compile("_v(?P<version>\d+).*")
>>> r = regex.search(string)
# List the groups found
>>> r.groups()
(u'001',)
# List the named dictionary objects found
>>> r.groupdict()
{u'version': u'001'}
# Run findall
>>> regex.findall(string)
[u'001']
# Run timeit test
>>> setup = ur"import re; regex =re.compile("_v(?P<version>\d+).*");string="""greatv02_v00 ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
0.005126953125
I'm trying to export some value from the text to a txt file.
my text has this form:
"a='one' b='2' c='3' a='two' b='8' c='3'"
I want to export all the value of the key "a"
The result must be like
one
two
The other answers are correct for your particular case, but I think a regex with lookbehind/lookahead is a more general solution, i.e.:
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
expr = r"(?<=a=')[^']*(?=')"
matches = re.findall(expr,text)
for m in matches:
print m ##or whatever
This will match for any expression between single quotes preceded by a=, i.e. a='xyz', a='my#1.abcd' and a='a=5%' will all match
This regex is very easy to understand:
pattern = r"a='(.*?)'"
It doesn't use lookarounds (like (?<=a=')[^']*(?=') ) - so it's very simple ..
Whole program:
#!/usr/bin/python
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
pattern = r"a='(.*?)'"
for m in re.findall( pattern, text ):
print m
you can use something like this:
import re
r = re.compile(r"'([a-z]+)'")
f = open('input')
text = f.read()
m = r.finditer(text)
for mm in m:
print mm.group(1)
thought i would give a solution without re:
>>> text = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> step1 = text.split(" ")
>>> step1
["a='one'", "b='2'", "c='3'", "a='two'", "b='8'", "c='3'"]
>>> step2 = []
>>> for pair in step1:
split_pair = pair.split("=")
step2.append([split_pair[0],split_pair[1]])
>>> print step2
[['a', "'one'"], ['b', "'2'"], ['c', "'3'"], ['a', "'two'"], ['b', "'8'"], ['c', "'3'"]]
>>> results = []
>>> for split_pair in step2:
if split_pair[0] == "a":
results.append(split_pair[1])
>>> results
["'one'", "'two'"]
not the most elegant method, but it works.
Another non-regex solution: you could use the shlex module and the .partition method (or .split() with maxsplit=1):
>>> import shlex
>>> s = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> shlex.split(s)
['a=one', 'b=2', 'c=3', 'a=two', 'b=8', 'c=3']
>>> shlex.split(s)[0].partition("=")
('a', '=', 'one')
and so it's simply
>>> for group in shlex.split(s):
... key, eq, val = group.partition("=")
... if key == 'a':
... print val
...
one
two
with lots of variations of the same.
I'm trying to change width value in the string using regex.
Width can have multiple formats, such as width="500px" or width:500px or width=500px (without quotes), etc..
Currently, I'm searching and replacing individually like this
p = re.compile('width="\w{3}')
embed_url = p.sub('width="555', embed_url)
# width:"555
p = re.compile('width:"\w{3}')
embed_url = p.sub('width:"555', embed_url)
Is there any way to use one regular expression and replace strings : or = accordingly?
EDIT
Changed the above code, so : and = is changed accordingly, instead of replacing all of them with "="
Try this regex:
width(:|=)"?\w{3}
it matches:
width="300px
width=300px
width:"300px
width:300px
You can use | and grouping like this:
>>> p = re.compile('width(:|=)"\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
If you include quotes / apostrophes in a group, you can do this:
>>> p = re.compile('width(:|=)("|\')\w{3}')
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:\'500px"'))
width="555px"
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=\'500px"'))
width="555px"
Adding a ? will make the previous element optional:
>>> p = re.compile('width(:|=)("|\')?\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=500px'))
width="555px
Hope this helps.