Extract substring using regular expressions - python

I need to extract only the part 0000520621 from the string nmg-22373-0000520621-001-010000520621.
I would like to use regular expressions in python for this task.
Can you help me in doing so?

You don't need a regexp to get the third member of a list. Just split your string on the hyphen and pick its third member.
test = 'nmg-22373-0000520621-001-010000520621'
test.split('-')[2]

import re
my_string = 'nmg-22373-0000520621-001-010000520621'
# Capture whatever sits between '22373-' and '-001'
expected = re.search(r'22373-(.+?)-001', my_string)
if expected:
    print(expected.group(1))
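The two answers above can be put side by side. This is only a sketch: the names by_split and by_regex are made up for illustration, and the regex variant assumes the wanted part is always the third hyphen-separated field rather than hardcoding the neighbouring values 22373 and 001.

```python
import re

my_string = 'nmg-22373-0000520621-001-010000520621'

# Approach 1: split on '-' and take the third field (index 2).
by_split = my_string.split('-')[2]

# Approach 2: an anchored regex that skips two hyphen-terminated fields
# and captures the third one.
match = re.match(r'(?:[^-]*-){2}([^-]*)', my_string)
by_regex = match.group(1) if match else None

print(by_split)
print(by_regex)
```

Both give the same field for this input. The split version raises IndexError on strings with fewer than three fields, while the regex version yields None.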

Related

how can I substitute a matched string in python

I have a string ="/One/Two/Three/Four"
I want to convert it to ="Four"
I can do this in one line in perl
string =~ s/.*+\///g
How Can I do this in python?
str_name="/One/Two/Three/Four"
str_name.split('/')[-1]
In general, split is a simple way to convert a string into a list based on a literal separator (str.split does not take a regex; re.split does). Then we can take the last element of that list, which happens to be "Four" in this case.
Hope this helps.
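A small sketch of the split-based variants, for comparison. The variable names and the mixed-separator string are made up for illustration and are not from the question.

```python
import re

path = "/One/Two/Three/Four"

# str.split treats its argument as a literal separator, not a pattern.
parts = path.split('/')
last = parts[-1]

# rsplit with maxsplit=1 splits only once, starting from the right.
last_rsplit = path.rsplit('/', 1)[-1]

# re.split is the variant that accepts a regular expression,
# here a character class matching either kind of slash.
mixed = "One/Two\\Three/Four"
last_re = re.split(r'[/\\]', mixed)[-1]

print(last, last_rsplit, last_re)
```

Note that the leading slash produces an empty first element in parts, which does not matter when only the last element is needed.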
Python's re module can handle regular expressions. For this case, you'd do
import re
my_str = "/One/Two/Three/Four"
new_str = re.sub(".*/", "", my_str)
# 'Four'
re.sub() is the regex replacement function. Like your Perl regex, we simply look for any number of characters followed by a slash, and replace that with the empty string. Because .* is greedy, the match runs up to the last slash, so what's left is whatever comes after it, which is 'Four'.
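To see why the pattern stops at the last slash rather than the first, it may help to compare the greedy and non-greedy forms. This is a sketch; the names greedy and lazy_once are just for illustration.

```python
import re

my_str = "/One/Two/Three/Four"

# Greedy: ".*" grabs as much as it can, so ".*/" ends at the LAST slash.
greedy = re.sub(r".*/", "", my_str)

# Non-greedy: ".*?" grabs as little as it can, so ".*?/" ends at the
# FIRST slash. With count=1 only that leading "/" is removed.
lazy_once = re.sub(r".*?/", "", my_str, count=1)

print(greedy)
print(lazy_once)
```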
There are a lot of ways to solve this. One is to index into the string, although this requires already knowing the substring you are looking for. Other string methods are described in the Python documentation.
string ="/One/Two/Three/Four"
string[string.index('Four'):]
Additionally you could split the string by the slash with .split('/')
print(string.split('/')[-1])
Another option would be regular expressions, via the re module.

replacing a sub string using regular expression in python

I have a string that contains sub strings like
RTDEFINITION(55,4) RTDEFINITION(45,2)
I need to replace every occurrence of this kind of string with another string:
DEFRTE
using Python and regular expressions. Any ideas?
thx
This should work
import re
re.sub(r'RTDEFINITION\(\d+,\d+\)', 'DEFRTE', mystring)
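Here is the same substitution as a complete script. The text around the two RTDEFINITION calls is invented for illustration, and subn is used instead of sub only to show how many replacements were made.

```python
import re

# Hypothetical input; the letters around the matches are made up.
mystring = "A RTDEFINITION(55,4) B RTDEFINITION(45,2) C"

# \( and \) match literal parentheses; \d+ matches one or more digits.
pattern = re.compile(r'RTDEFINITION\(\d+,\d+\)')

# subn returns the new string together with the number of substitutions.
result, n_replaced = pattern.subn('DEFRTE', mystring)

print(result)
print(n_replaced)
```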

get substring between character and whitespace

I am trying to get a specific substring from a text file that is always located between the word "in" and an open parenthesis, e.g. in TEXT (blah). I am trying to get at TEXT.
Currently I am using this:
m = text[text.find("in")+1:text.find("(")]
This isn't working because other sections of the larger string sometimes contain the letters i and n. So I am thinking I should change it so that it specifically looks for instances of "in" followed by whitespace.
I cannot figure out how to incorporate \s to accomplish this. How would I do this?
Use a regular expression for this:
import re
preg = re.compile(r'(?<=in\s)(.*?)(?=\s\()')
for match in preg.finditer(text):
    print(match.group(0))
I am using positive lookbehinds and lookaheads to check for "in " and " (".
Take a look here, it might help understanding the regular expression better.
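One caveat: a lookbehind for "in" plus whitespace is also satisfied by words that merely end in "in". A possible alternative, sketched below with made-up sample text, uses a word boundary and a capture group instead of lookarounds.

```python
import re

# Hypothetical sample text for illustration.
text = "Plugin loaded in TEXT (blah), again in OTHER (more)"

# \b keeps "in" from matching at the end of words such as "Plugin";
# the capture group holds whatever sits between "in " and " (".
pattern = re.compile(r'\bin\s+(.*?)\s*\(')

# With exactly one group, findall returns the captured strings.
found = pattern.findall(text)
print(found)
```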
Try this:
if text.find("in ") != -1:
    m = text[text.find("in ")+3:text.find("(")]

Extract string using regex in Python

I'm struggling a bit on how to extract (i.e. assign to variable) a string based on a regex. I have the regex worked out -- I tested on regexpal. But I'm lost on how I actually implement that in Python. My regex string is:
http://jenkins.mycompany.com/job/[^\s]+
What I want to do is take string and if there's a pattern in there that matches the regex, put that entire "pattern" into a variable. So for example, given the following string:
There is a problem with http://jenkins.mycompany.com/job/app/4567. We should fix this.
I want to extract http://jenkins.mycompany.com/job/app/4567 and assign it to a variable. I know I'm supposed to use re but I'm not sure if I want re.match or re.search and how to get what I want. Any help or pointers would be greatly appreciated.
import re
p = re.compile(r'http://jenkins.mycompany.com/job/[^\s]+')
line = 'There is a problem with http://jenkins.mycompany.com/job/app/4567. We should fix this.'
result = p.search(line)
print(result.group(0))
Output:
http://jenkins.mycompany.com/job/app/4567.
Alternatively, use re.findall and take the first match from the list it returns (this raises IndexError if nothing matches):
re.findall(pattern_string, input_string)[0] # pick the first match that is found
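On the re.match versus re.search part of the question, a short sketch may help. It reuses the string from the question; the escaped dots, the \S shorthand and the rstrip step are my own tightening, and the rstrip assumes trailing sentence punctuation is never part of the URL.

```python
import re

line = ('There is a problem with '
        'http://jenkins.mycompany.com/job/app/4567. We should fix this.')
pattern = re.compile(r'http://jenkins\.mycompany\.com/job/\S+')

# match() only tries at the start of the string, so it finds nothing here.
at_start = pattern.match(line)

# search() scans the whole string and returns the first match.
anywhere = pattern.search(line)

# search() returns None when nothing matches, so guard before .group().
url = anywhere.group(0) if anywhere else None

# Assumption: a trailing '.', ',' or ';' belongs to the sentence, not the URL.
clean_url = url.rstrip('.,;') if url else None

print(clean_url)
```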

python and regex

#!/usr/bin/python
import re
# raw_input is Python 2; on Python 3 use input() instead
str = raw_input("String containing email...\t")
match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
    print(match.group())
It's not the most complicated code, and I'm looking for a way to get ALL of the matches, if that's possible.
It sounds like you want re.findall():
findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.
As far as the actual regular expression for identifying email addresses goes... See this question.
Also, be careful using str as a variable name. This will hide the str built-in.
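The note about groups in that docstring is easy to miss, so here is a small sketch of both cases. The sample text and addresses are placeholders, and the pattern is the simple one from the question, not a full email validator.

```python
import re

# Hypothetical sample text; the addresses are placeholders.
text = "Contact alice@example.com or bob@example.org for details."

# No groups: findall returns the full matches as strings.
whole = re.findall(r'[\w.-]+@[\w.-]+', text)

# Two groups: findall returns a list of (user, domain) tuples instead.
parts = re.findall(r'([\w.-]+)@([\w.-]+)', text)

print(whole)
print(parts)
```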
I guess that re.findall is what you're looking for.
You should give findall() a try:
findall() matches all occurrences of a pattern, not just the first one as search() does. For example, if one was a writer and wanted to find all of the adverbs in some text, he or she might use findall().
http://docs.python.org/library/re.html#finding-all-adverbs
You don't need raw_input the way you used it; use it only to read input from the console.
Don't override built-ins such as str. Use a meaningful name and assign it the whole string value.
It is also often a good idea to compile your pattern into a regex object and match strings against that (illustrated in the code).
I just realized that a complete regex to match an email address exactly as per RFC 822 could fill a page; short of that, this snippet should be useful.
import re
inputstr = "something@exmaple.com, 121@airtelnet.com, ra@g.net, etc etc\t"
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
matches = mailsrch.findall(inputstr)
print(matches)
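If you also need to know where each match sits in the string, finditer on the compiled pattern is an option. This is a sketch with made-up input and the simpler pattern from the question; hits is a hypothetical name.

```python
import re

# Hypothetical input for illustration.
inputstr = "first@example.com, second@example.org"
mailsrch = re.compile(r'[\w.-]+@[\w.-]+')

# finditer yields match objects, which carry positions as well as text.
hits = [(m.group(), m.start(), m.end()) for m in mailsrch.finditer(inputstr)]

for matched, start, end in hits:
    print(matched, start, end)
```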
