I would like to find the "com" extension in a sentence using a regex in Python.
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\s.\s\:]+\.[^\.\s\:]*)')
>>>
Expected result:
domain : 'domain.com' ### note: not 'domain.coms'
url : 'http://domain.coms/index/page/2'
Maybe you are looking for this:
>>> import re
>>> str = 'finding exstention from string on http://domain.coms/index/page/2'
>>> pattern = re.compile(r'([^\/]*\.(?:com|en|org))')
>>> m = pattern.search(str)
>>> print(m.group(1))
domain.com
((?:https?:\/\/)?(?:([^\s.\s\:]+\.[^\/]*)(?:\/|$)[^\.\s\:]*))
Try this. Group 1 will be the URL and group 2 will be the domain.
See demo.
http://regex101.com/r/sK8oK9/1
You could try the below.
>>> s = "finding exstention from string on http://domain.coms/index/page/2"
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(1)
>>> m
'http://domain.coms/index/page/2'
>>> m = re.search(r'(\S+?([^/.]+\.[^/]+)\S+)', s).group(2)
>>> m
'domain.coms'
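If the goal is to pull the URL and its host out of free text, one option is to let the regex find only the URL and leave the splitting to the standard library. This is a minimal sketch, assuming Python 3 and that URLs in the text start with http:// or https://; the variable names are illustrative.

```python
import re
from urllib.parse import urlsplit

text = 'finding exstention from string on http://domain.coms/index/page/2'

# Find the first whitespace-delimited token that starts with http:// or https://
url_match = re.search(r'https?://\S+', text)
url = url_match.group(0) if url_match else None

# urlsplit parses the URL; hostname is the part between '//' and the next '/'
host = urlsplit(url).hostname if url else None

print(url)
print(host)
```

Note that this yields the host exactly as written ('domain.coms'); trimming it to a known extension such as "com" would still need a list of accepted extensions, as in the answer that uses (?:com|en|org).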
I would like to extract the server type from the hostname and replace the remaining characters with underscores, so that I can then use it with the LIKE pattern match in SQLite.
My initial approach was something like this (this is the expected output):
>>> host = 'webus01'
>>> location = 'us'
>>> parts = list(host.partition(location))
>>> parts
['web', 'us', '01']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'web____.%'
But this won't work if the hostname starts with or contains the location name:
>>> host = 'utilityit01'
>>> location = 'it'
>>> parts = list(host.partition(location))
>>> parts
['util', 'it', 'yit01']
>>> parts[1] = "_" * len(parts[1])
>>> parts[2] = "_" * len(parts[2])
>>> "".join(parts) + ".%"
'util_______.%'
Then I thought it would be better to use a regex for this, so that the location is matched only when it comes right before the digits.
The re.sub function seems to be a good candidate for this task, but I'm not sure how to replace every character of the match individually instead of replacing the match as a whole:
>>> import re
>>> re.sub(r'it\d+.*', '_', 'utilityit01a')
'utility_'
The output in this case should be: utility_____.%.
It looks like you need a replacement function that returns one underscore per matched character:
import re
print(re.sub(r'(it\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'utilityit01a'))
print(re.sub(r'(us\d+.*)', lambda x: '_'*len(x.group(1))+".%" , 'webus01'))
Output:
utility_____.%
web____.%
You can find the substring that you want to replace first, and then replace it based on its length:
>>> import re
>>> host = 'utilityit01a'
>>> substr = re.findall(r'it\d+.*', host)
>>> substr
['it01a']
>>> host.replace(substr[0], len(substr[0]) * '_')
'utility_____'
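The two answers above hard-code the location in the pattern. A small sketch of how they could be generalized follows; the function name like_pattern and its parameters are hypothetical, and it assumes the location is always followed by at least one digit.

```python
import re

def like_pattern(host, location):
    """Mask the location, the digits after it, and anything that follows."""
    # re.escape guards against regex metacharacters in the location string
    regex = re.escape(location) + r'\d+.*$'
    # The replacement function returns one underscore per matched character
    masked = re.sub(regex, lambda m: '_' * len(m.group(0)), host, count=1)
    return masked + '.%'

print(like_pattern('webus01', 'us'))
print(like_pattern('utilityit01a', 'it'))
```

Because the pattern requires digits right after the location, the "it" inside "utility" is skipped and only the trailing "it01a" is masked.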
I have the following input and am trying to write a regular expression that yields the output below. Can anyone provide input on how to do this?
INPUT:-
refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2
OUTPUT:-
1025744/3
1025962/5
913745/2
If that is the actual input format, a regex is not needed:
>>> source = """\
... refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2
... """
>>> output = [line.split('/', 3)[-1] for line in source.splitlines()]
>>> output[0]
'1025744/3'
>>> output[1]
'1025962/5'
You can also have them all in one string, like this:
>>> ' '.join(line.split('/', 3)[-1] for line in source.splitlines())
'1025744/3 1025962/5 913745/2'
If you are feeding the input line by line, you could do this:
import re
def get_match(instr):
    # Group 1 is the 'refs/changes/NN/' prefix; group 2 is the part to keep
    match = re.match("^(refs/changes/[0-9]*/)([0-9/]*)", instr)
    if match:
        return match.group(2)
instr = 'refs/changes/44/1025744/3'
print(get_match(instr))
>>> import re
>>> text = """refs/changes/44/1025744/3
... refs/changes/62/1025962/5
... refs/changes/45/913745/2"""
>>> res = re.findall(r'.*/.*/.*/(.*/.*)', text)
>>> for i in res:
...     print(i)
...
1025744/3
1025962/5
913745/2
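A tighter pattern is also possible if the wanted part is always "digits/digits" at the end of a line. A minimal sketch, assuming exactly that layout (the variable names are illustrative):

```python
import re

source = """refs/changes/44/1025744/3
refs/changes/62/1025962/5
refs/changes/45/913745/2"""

# With re.MULTILINE, $ matches at the end of every line,
# not only at the end of the whole string
tails = re.findall(r'\d+/\d+$', source, flags=re.MULTILINE)
print(tails)
```

Anchoring at the end of the line avoids counting slashes from the left, so it keeps working if the prefix gains or loses a path segment.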
Is there a better way of finding the digits in a string that follow '_v', which stands for the version number? What I want is just '001'.
filename = 'greatv02_v001_jam.mb'
parts = re.split('_v|\_',filename)
>>['greatv02', '001', 'jam.mb']
b = re.findall(r'\d+', filename)
>>['02', '001']
Is there a way to split a string with something along these lines?
parts = re.split('_v###_',filename)
or
parts = re.split('_v*_',filename)
You could use lookarounds:
>>> filename = 'greatv02_v001_jam.mb'
>>> import re
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>>
>>> filename = 'greatv02_v001_av456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001']
>>> filename = 'greatv02_v001_v456jam.mb'
>>> re.findall(r'(?<=_v)\d+', filename)
['001', '456']
>>>
It's ugly, but you could partition the file name twice:
>>> filename.partition('_v')[2].partition('_')[0]
'001'
Use regex's grouping like this:
.*_v(\d+).*
Demo:
>>> filename = 'greatv02_v001_jam.mb'
>>> pattern = re.compile(r'.*_v(\d+).*')
>>> re.search(pattern, filename).group(1)
'001'
How about the regex _v(?P<version>\d+).*:
>>> import re
>>> string = u'greatv02_v001_jam.mb'  # assumed: the filename from the question
>>> regex = re.compile(r"_v(?P<version>\d+).*")
>>> r = regex.search(string)
# List the groups found
>>> r.groups()
(u'001',)
# List the named dictionary objects found
>>> r.groupdict()
{u'version': u'001'}
# Run findall
>>> regex.findall(string)
[u'001']
# Run timeit test
>>> setup = ur"import re; regex =re.compile("_v(?P<version>\d+).*");string="""greatv02_v00 ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
0.005126953125
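The named-group approach can be wrapped in a small helper that also copes with file names that carry no version. This is only a sketch; get_version and VERSION_RE are hypothetical names, and it assumes the version is the first run of digits directly after '_v'.

```python
import re

VERSION_RE = re.compile(r'_v(?P<version>\d+)')

def get_version(filename):
    """Return the digits after the first '_v' as a string, or None if absent."""
    match = VERSION_RE.search(filename)
    return match.group('version') if match else None

print(get_version('greatv02_v001_jam.mb'))
print(get_version('no_version_here.mb'))
```

The 'v02' in 'greatv02' is ignored because it is not preceded by an underscore, and '_version' does not match because no digit follows '_v'.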
I'm trying to export some values from a text to a txt file.
My text has this form:
"a='one' b='2' c='3' a='two' b='8' c='3'"
I want to export all the values of the key "a".
The result must look like this:
one
two
The other answers are correct for your particular case, but I think a regex with lookbehind/lookahead is a more general solution, i.e.:
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
expr = r"(?<=a=')[^']*(?=')"
matches = re.findall(expr,text)
for m in matches:
    print(m)  # or whatever
This will match for any expression between single quotes preceded by a=, i.e. a='xyz', a='my#1.abcd' and a='a=5%' will all match
This regex is very easy to understand:
pattern = r"a='(.*?)'"
It doesn't use lookarounds (like (?<=a=')[^']*(?=')), so it's very simple.
Whole program:
#!/usr/bin/python
import re
text = "a='one' b='2' c='3' a='two' b='8' c='3'"
pattern = r"a='(.*?)'"
for m in re.findall(pattern, text):
    print(m)
You can use something like this:
import re
r = re.compile(r"'([a-z]+)'")
with open('input') as f:
    text = f.read()
for mm in r.finditer(text):
    print(mm.group(1))
I thought I would give a solution without re:
>>> text = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> step1 = text.split(" ")
>>> step1
["a='one'", "b='2'", "c='3'", "a='two'", "b='8'", "c='3'"]
>>> step2 = []
>>> for pair in step1:
...     split_pair = pair.split("=")
...     step2.append([split_pair[0], split_pair[1]])
...
>>> print(step2)
[['a', "'one'"], ['b', "'2'"], ['c', "'3'"], ['a', "'two'"], ['b', "'8'"], ['c', "'3'"]]
>>> results = []
>>> for split_pair in step2:
...     if split_pair[0] == "a":
...         results.append(split_pair[1])
...
>>> results
["'one'", "'two'"]
Not the most elegant method, but it works. Note that the values keep their surrounding quotes.
Another non-regex solution: you could use the shlex module and the .partition method (or .split() with maxsplit=1):
>>> import shlex
>>> s = "a='one' b='2' c='3' a='two' b='8' c='3'"
>>> shlex.split(s)
['a=one', 'b=2', 'c=3', 'a=two', 'b=8', 'c=3']
>>> shlex.split(s)[0].partition("=")
('a', '=', 'one')
and so it's simply:
>>> for group in shlex.split(s):
...     key, eq, val = group.partition("=")
...     if key == 'a':
...         print(val)
...
one
two
with lots of variations of the same.
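None of the answers above covers the last step of the question, writing the values to a txt file. A minimal sketch of that step follows, assuming Python 3; the file name a_values.txt is hypothetical, and a temporary directory is used only so the example cleans up after itself.

```python
import os
import re
import tempfile

text = "a='one' b='2' c='3' a='two' b='8' c='3'"

# (?<!\w) keeps the pattern from matching a longer key that ends in "a", such as "ba="
values = re.findall(r"(?<!\w)a='([^']*)'", text)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'a_values.txt')  # hypothetical file name
    # Write one value per line
    with open(out_path, 'w') as out_file:
        out_file.write('\n'.join(values) + '\n')
    # Read the file back to show what was written
    with open(out_path) as in_file:
        written = in_file.read()

print(written)
```

In a real script the temporary directory would be replaced by whatever output path is wanted.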
I'm trying to change the width value in a string using a regex.
Width can have multiple formats, such as width="500px", width:500px, or width=500px (without quotes), etc.
Currently, I'm searching and replacing each format individually, like this:
p = re.compile('width="\w{3}')
embed_url = p.sub('width="555', embed_url)
# width:"555
p = re.compile('width:"\w{3}')
embed_url = p.sub('width:"555', embed_url)
Is there any way to use a single regular expression and keep the : or = separator as it was in the original string?
EDIT
Changed the above code so that : and = are each replaced with the same character, instead of replacing all of them with "=".
Try this regex:
width(:|=)"?\w{3}
it matches:
width="300px
width=300px
width:"300px
width:300px
You can use | and grouping like this:
>>> p = re.compile('width(:|=)"\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
If you include quotes / apostrophes in a group, you can do this:
>>> p = re.compile('width(:|=)("|\')\w{3}')
>>> print(p.sub('width="555', 'width:"500px"'))
width="555px"
>>> print(p.sub('width="555', 'width:\'500px"'))
width="555px"
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=\'500px"'))
width="555px"
Adding a ? will make the previous element optional:
>>> p = re.compile('width(:|=)("|\')?\w{3}')
>>> print(p.sub('width="555', 'width="500px"'))
width="555px"
>>> print(p.sub('width="555', 'width=500px'))
width="555px
Hope this helps.
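The examples above always write = in the replacement, so width:"500px" becomes width="555px". To keep whichever separator and quote were matched, the prefix can be captured and put back. This is a sketch under the assumption that the width value starts with digits; WIDTH_RE and set_width are hypothetical names, and \d+ is used here in place of the \w{3} from the answers.

```python
import re

# Group 1 captures "width", the separator, and an optional opening quote,
# so the replacement can put back exactly what was matched
WIDTH_RE = re.compile(r'''(width[:=]["']?)\d+''')

def set_width(text, new_width):
    """Replace the numeric width while preserving the separator and quote style."""
    return WIDTH_RE.sub(lambda m: m.group(1) + str(new_width), text)

for sample in ['width="500px"', 'width:500px', 'width=500px', "width:'500px'"]:
    print(set_width(sample, 555))
```

A replacement function is used rather than a template such as r'\g<1>555' so that the new width can be passed in as a number without worrying about how digits after a backreference are parsed.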