How can I get the string that follows some specific characters? (More specifically, how do I get "test" from "A8 test"?)
In this case, "A8" follows a pattern like "[A-Z]+[0-9]+",
so it could also be "C6 test", "X90 test", etc.
I've tried this in Python using "(?<=[A-Z]+[0-9]).+", which throws an exception:
"sre_constants.error: look-behind requires fixed-width pattern."
It means I should use a fixed-width pattern such as "(?<=[A-Z]{1}[0-9]{1})".
But my prefix isn't actually fixed-width. What can I do?
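Since the built-in re module rejects variable-width lookbehind, one way to sidestep it is to match the variable-width prefix directly and slice the string from where that match ends. This is a minimal sketch, assuming the prefix is always at the start of the string and is followed by whitespace; the helper name text_after_prefix is hypothetical.

```python
import re

# The variable-width prefix from the question: uppercase letters, digits, then whitespace.
PREFIX = re.compile(r"[A-Z]+[0-9]+\s+")

def text_after_prefix(s):
    """Hypothetical helper: return the text after the prefix, or None if s doesn't start with it."""
    m = PREFIX.match(s)  # match() only tries position 0 of the string
    if m is None:
        return None
    return s[m.end():]   # slice from where the prefix match ended

for sample in ["A8 test", "C6 test", "X90 test"]:
    print(text_after_prefix(sample))
```

Because no lookbehind is involved, the prefix can be as long as it likes.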
If you mean getting the rest of the string after the pattern "[A-Z]+[0-9]+", you can try this:
import re
s1 = 'A8 test'
s2 = 'C6 123'
s3 = 'X90 test32'
# the parenthesized group is what you want
p = re.compile(r"[A-Z]+[0-9]+ (\w+)")
print(p.findall(s1))
print(p.findall(s2))
print(p.findall(s3))
output:
['test']
['123']
['test32']
Hope that will help you, and comment if you have further questions. : )
You can use a capture group to get what you need.
>>> import re
>>> regexp = r"[A-Z]+[0-9]+ (.+)"
>>> re.search(regexp, "C6 test")[1]
'test'
>>> re.search(regexp, "X90 test")[1]
'test'
>>> re.search(regexp, "CBF58456 test")[1]
'test'
Note that the pattern you show matches any number of uppercase letters followed by any number of digits, as long as there's at least one of each. Also note that my example above requires a space between the first part and the test string in order to capture it.
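If the space between the prefix and the rest is not guaranteed, the separator can be made optional with \s*. This is a small variation on the pattern above, assuming inputs such as "A8test" (a hypothetical example) should also yield "test".

```python
import re

# \s* makes the separator optional, so "A8test" works as well as "A8 test".
regexp = r"[A-Z]+[0-9]+\s*(.+)"

for sample in ["C6 test", "X90 test", "A8test"]:
    print(re.search(regexp, sample)[1])
```

Because \s* is greedy, it consumes the space when one is present, so the captured group does not start with whitespace.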
You could also use re.sub to discard the part of the string you don't need, by passing an empty string as the replacement (second argument):
import re
text = "X90 test"
t = re.sub(r"[A-Z]+[0-9]+ ", "", text)
print(t)  # test
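One thing to watch with re.sub: without an anchor it removes every occurrence of the pattern, not just the leading one. This sketch uses a hypothetical input containing a second code-like token to show the difference that a ^ anchor makes.

```python
import re

text = "X90 test A1 other"  # hypothetical input with a second code-like token

# Unanchored: every "[A-Z]+[0-9]+ " occurrence is removed.
print(re.sub(r"[A-Z]+[0-9]+ ", "", text))   # test other
# Anchored with ^: only the prefix at the start of the string is removed.
print(re.sub(r"^[A-Z]+[0-9]+ ", "", text))  # test A1 other
```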
import re
ex = r"[A-Z]+[0-9]+ (.+)"
print(re.search(ex, "X90 test")[1])
print(re.search(ex, "C6 test")[1])
print(re.search(ex, "CBF58456 test")[1])
Output
test
test
test
You can split the string on the pattern, then pick out your string.
>>> re.split(r'([A-Z]+[0-9]+ )(test)', 'A8 test')
['', 'A8 ', 'test', '']
Or you can write a simple function that finds your string in the whole string without using regex.
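A non-regex version might look like the sketch below. It assumes the prefix is ASCII uppercase letters, then ASCII digits, then exactly one space; the function name strip_code_prefix is hypothetical.

```python
def strip_code_prefix(s):
    """Hypothetical helper: drop leading uppercase letters, then digits, then one space.

    Returns None if s doesn't start with at least one uppercase letter,
    at least one digit, and a space.
    """
    i = 0
    while i < len(s) and "A" <= s[i] <= "Z":  # ASCII uppercase letters
        i += 1
    if i == 0:
        return None
    j = i
    while j < len(s) and "0" <= s[j] <= "9":  # ASCII digits
        j += 1
    if j == i or j >= len(s) or s[j] != " ":
        return None
    return s[j + 1:]

print(strip_code_prefix("A8 test"))   # test
print(strip_code_prefix("X90 test"))  # test
print(strip_code_prefix("test"))      # None
```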
Related
I have some sample strings:
s = 'neg(able-23, never-21) s2-1/3'
i = 'amod(Market-8, magical-5) s1'
I can figure out whether the string ends with 's1' or 's3' using:
word = re.search(r's\d$', s)
But if I want to know whether the string contains 's2-1/3', it doesn't work.
Is there a regex that works for both cases, 's#' and 's#+'?
Thanks!
You can allow the characters "-" and "/" to be captured as well, in addition to just digits. It's hard to tell the exact pattern you're going for here, but something like this would capture "s2-1/3" from your example:
import re
s = "neg(able-23, never-21) s2-1/3"
word = re.search(r"s\d[-/\d]*$", s)
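To see what that pattern actually returns for both sample strings from the question, call .group() on the match (guarding against None in case a string has no such suffix):

```python
import re

pattern = r"s\d[-/\d]*$"
for sample in ["neg(able-23, never-21) s2-1/3", "amod(Market-8, magical-5) s1"]:
    m = re.search(pattern, sample)
    print(m.group() if m else None)
```

This prints s2-1/3 and then s1.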
I'm guessing that you might want to extract those parts with an expression such as:
(s\d+)-?(.*)$
or:
(s\d+)-?([0-9]+)?\/?([0-9]+)?$
Test
import re
expression = r"(s\d+)-?(.*)$"
string = """
neg(able-23, never-21) s211-12/31
neg(able-23, never-21) s2-1/3
amod(Market-8, magical-5) s1
"""
print(re.findall(expression, string, re.M))
Output
[('s211', '12/31'), ('s2', '1/3'), ('s1', '')]
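The second expression, (s\d+)-?([0-9]+)?\/?([0-9]+)?$, puts each number of the suffix in its own group. Running it over the same test string gives one tuple per line; optional groups that don't participate come back as empty strings from findall.

```python
import re

# Separate groups for the "s" label and each number after it.
expression = r"(s\d+)-?([0-9]+)?\/?([0-9]+)?$"
string = """
neg(able-23, never-21) s211-12/31
neg(able-23, never-21) s2-1/3
amod(Market-8, magical-5) s1
"""
print(re.findall(expression, string, re.M))
# [('s211', '12', '31'), ('s2', '1', '3'), ('s1', '', '')]
```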
How can I get the word "example" from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this:
print(str[10:str.rfind(':')])
but it doesn't work correctly if the string is like:
"http://tests-example:123/wd/hub"
You can use this regex to capture the value preceded by "-" and followed by ":" using lookarounds:
(?<=-).+(?=:)
Python code:
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Output:
example
A non-regex way to get the same result is to use two splits:
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints:
example
If you know that "example" is a 7-letter word, you can use the following non-regex approach:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]
There are many ways.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
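One way to make the regex less dependent on the exact format is to replace the greedy .* with a character class that cannot cross a ":" or "/". This is a sketch under that assumption; extract_word is a hypothetical helper, and the URL with a second colon is a made-up input used only to show the difference.

```python
import re

def extract_word(url):
    """Hypothetical helper: return the text after a '-' and before the next ':'."""
    # [^:/]+ stops at the first ':' or '/', unlike greedy .* which backtracks
    # from the end of the string to the LAST ':'.
    m = re.search(r"-([^:/]+):", url)
    return m.group(1) if m else None

print(extract_word("http://test-example:123/wd/hub"))   # example
print(extract_word("http://tests-example:123/wd/hub"))  # example

# With a second colon later in the string, the greedy version overshoots.
print(re.search(r"-(.*):", "http://a-b:1/x:2").group(1))  # b:1/x
print(extract_word("http://a-b:1/x:2"))                   # b
```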
I'm not sure why you want to get a particular word from a string. I'm guessing you want to check whether this word appears in the given string.
If that is the case, the code below can be used:
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on "-", and then on ":":
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search(r'(?<=-).+(?=:)', text)
if m:
    print(m.group())
Python strings have a built-in method, find:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
prints:
12
-1
The result is the index of the found substring; -1 means the substring was not found in the string. You can also use the in keyword:
'example' in 'http://test-example:123/wd/hub'
True
First off, I know this may seem like a duplicate question; however, I couldn't find a working solution to my problem.
I have a string that looks like the following:
string = "api('randomkey123xyz987', 'key', 'text')"
I need to extract randomkey123xyz987, which will always be between api(' and ',.
I was planning on using regex for this, but I seem to be having some trouble.
This is the only progress that I have made:
import re
string = "api('randomkey123xyz987', 'key', 'text')"
match = re.findall(r"\((.*?)\)", string)[0]
print(match)
This code returns 'randomkey123xyz987', 'key', 'text'
I have tried to use [^'], but my guess is that I am not properly inserting it into the re.findall function.
Everything that I am trying is failing.
Update: My current workaround is using [2:-4], but I would still like to avoid using match[2:-4].
If the string contains only one instance, use re.search() instead:
>>> import re
>>> s = "api('randomkey123xyz987', 'key', 'text')"
>>> match = re.search(r"api\('([^']*)'", s).group(1)
>>> print(match)
randomkey123xyz987
You want the string between the ( and the comma, but you are catching everything between the parentheses:
match = re.findall(r"api\((.*?),", string)
print(match)
["'randomkey123xyz987'"]
Or match between the quotes:
match = re.findall(r"api\('(.*?)'", string)
print(match)
['randomkey123xyz987']
If that is how your strings actually look, you can split:
string = "api('randomkey123xyz987', 'key', 'text')"
print(string.split(",", 1)[0][5:-1])  # slice off "api('" and the closing quote
You should use the following regex:
api\('(.*?)'
This assumes that api( is a fixed prefix.
It matches api(', then captures what follows, up to the next ' character.
>>> re.findall(r"api\('(.*?)'", "api('randomkey123xyz987', 'key', 'text')")
['randomkey123xyz987']
If you are certain that randomkey123xyz987 will always be between "api('" and "',", then the split() method can get it done in one line, without any regex matching. It extracts the text between the starting and ending delimiters, "api('" and "',".
>>> string = "api('randomkey123xyz987', 'key', 'text')"
>>> value = (string.split("api('")[1]).split("',")[0]
>>> print(value)
randomkey123xyz987
Regex to retrieve the last portion of a string:
https://play.google.com/store/apps/details?id=com.lima.doodlejump
I'm looking to retrieve the string that follows id=.
The following regex didn't seem to work in Python:
sampleURL = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
re.search("id=(.*?)", sampleURL).group(1)
The above should give me the output:
com.lima.doodlejump
Is my search group right?
Your regular expression
(.*?)
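will not work because it matches between zero and unlimited times, as few times as possible (because of the ?). Since nothing follows it in the pattern, the fewest possible is zero characters, so group(1) is an empty string. So you have the following regex choices: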
(.*) # Matches the rest of the string
(.*?)$ # Matches till the end of the string
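The difference between the three variants is easy to check directly on the URL from the question:

```python
import re

sampleURL = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"

# Lazy (.*?) with nothing after it is satisfied by zero characters.
print(repr(re.search("id=(.*?)", sampleURL).group(1)))  # ''
# Greedy (.*) takes the rest of the line.
print(re.search("id=(.*)", sampleURL).group(1))         # com.lima.doodlejump
# Lazy (.*?) followed by $ has to expand until it reaches the end.
print(re.search("id=(.*?)$", sampleURL).group(1))       # com.lima.doodlejump
```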
But you don't need regex at all here; simply split the string like this:
data = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
print(data.split("id=", 1)[-1])
Output
com.lima.doodlejump
If you really have to use regex, you can do it like this:
data = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
import re
print(re.search("id=(.*)", data).group(1))
Output
com.lima.doodlejump
I'm surprised that nobody has mentioned urlparse yet...
>>> from urllib import parse as urlparse  # Python 3; in Python 2 this was: import urlparse
>>> s = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
>>> urlparse.urlparse(s)
ParseResult(scheme='https', netloc='play.google.com', path='/store/apps/details', params='', query='id=com.lima.doodlejump', fragment='')
>>> urlparse.parse_qs(urlparse.urlparse(s).query)
{'id': ['com.lima.doodlejump']}
>>> urlparse.parse_qs(urlparse.urlparse(s).query)['id']
['com.lima.doodlejump']
>>> urlparse.parse_qs(urlparse.urlparse(s).query)['id'][0]
'com.lima.doodlejump'
The big advantage here is that if the URL's query string gains more components, that could easily break the other solutions that rely on a simple str.split. It won't confuse urlparse, however :).
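To illustrate that point, here is a sketch using Python 3's urllib.parse with a hypothetical URL that has an extra parameter (hl=en) after id=. The plain split keeps everything after id=, while parse_qs separates the parameters.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL with an extra query parameter after id=.
url = "https://play.google.com/store/apps/details?id=com.lima.doodlejump&hl=en"

print(url.split("id=", 1)[-1])                 # com.lima.doodlejump&hl=en
print(parse_qs(urlparse(url).query)["id"][0])  # com.lima.doodlejump
```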
Just split it in the place you want:
id = url.split('id=')[1]
If you print id, you'll get:
com.lima.doodlejump
Regex isn't needed here :)
However, if there are multiple id= occurrences in your string and you only want the last one:
id = url.split('id=')[-1]
Hope this helps!
This works:
>>> import re
>>> sampleURL = "https://play.google.com/store/apps/details?id=com.lima.doodlejump"
>>> re.search("id=(.+)", sampleURL).group(1)
'com.lima.doodlejump'
Instead of capturing non-greedily for zero or more characters, this code captures greedily for one or more.
In Perl it is possible to do something like this (I hope the syntax is right...):
$string =~ m/lalala(I want this part)lalala/;
$whatIWant = $1;
I want to do the same in Python and get the text inside the parentheses into a string, like $1.
If you want to get parts by name you can also do this:
>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcom Reynolds")
>>> m.groupdict()
{'first_name': 'Malcom', 'last_name': 'Reynolds'}
The example was taken from the re docs
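Besides groupdict(), a named group can be read individually by name, and it still has a number, so the Perl-style $1 maps to group(1):

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcom Reynolds")
if m:
    print(m.group("first_name"))  # Malcom
    print(m["last_name"])         # Reynolds (indexing a match needs Python 3.6+)
    print(m.group(1))             # Malcom: named groups are numbered too
```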
See: Python regex match objects
>>> import re
>>> p = re.compile("lalala(I want this part)lalala")
>>> p.match("lalalaI want this partlalala").group(1)
'I want this part'
import re
astr = 'lalalabeeplalala'
match = re.search('lalala(.*)lalala', astr)
whatIWant = match.group(1) if match else None
print(whatIWant)
A small note: in Perl, when you write
$string =~ m/lalala(.*)lalala/;
the regexp can match anywhere in the string. The equivalent is accomplished with the re.search() function, not the re.match() function, which requires that the pattern match starting at the beginning of the string.
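A quick comparison on a hypothetical string that has other text before the pattern shows the difference between the two functions:

```python
import re

string = "xx lalalaI want this partlalala yy"  # hypothetical input with text before the pattern

print(re.match("lalala(.*)lalala", string))            # None: match() only tries position 0
print(re.search("lalala(.*)lalala", string).group(1))  # I want this part
```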
import re
data = "some input data"
m = re.search("some (input) data", data)
if m:  # "if match was successful" / "if matched"
    print(m.group(1))
Check the docs for more.
There's no need for regex. Think simple.
>>> "lalala(I want this part)lalala".split("lalala")
['', '(I want this part)', '']
>>> "lalala(I want this part)lalala".split("lalala")[1]
'(I want this part)'
import re
match = re.match('lalala(I want this part)lalala', 'lalalaI want this partlalala')
print(match.group(1))
import re
string_to_check = "other_text...lalalaI want this partlalala...other_text"
p = re.compile("lalala(I want this part)lalala") # regex pattern
m = p.search(string_to_check)  # use p.match if what you want is always at the beginning of the string
if m:
    print(m.group(1))
While converting a Perl program that parses function names out of modules to Python, I ran into this problem: I received an error saying "group" was undefined. I soon realized that the exception was being thrown because p.match / p.search returns None when there is no matching string.
Thus, the group method can't be called on the result. So, to avoid an exception, check whether a match was found before calling group.
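On Python 3.8 and later, an assignment expression can bind the match and test it in one step. This is a sketch over a few hypothetical input lines rather than a real file:

```python
import re

p = re.compile(r"def (\w*)")
lines = ["def first():", "x = 1", "def second(a, b):"]  # hypothetical input lines

for line in lines:
    # p.match() returns None, not 0, when there is no match, so test against None.
    if (m := p.match(line)) is not None:
        print(m.group(1))
```

This prints first and then second; the line without a def is skipped.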
import re

filename = './file_to_parse.py'
p = re.compile(r'def (\w*)')  # \w* greedily matches word characters ([a-zA-Z0-9_] for ASCII)
with open(filename, 'r') as f:
    for each_line in f:
        m = p.match(each_line)  # tries to match the regex at the start of the line
        if m:
            print(m.group(1))
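Because p.match only looks at the start of each line, indented definitions such as methods are skipped. If those are wanted too, one option is to search the whole text at once with re.MULTILINE and allow leading spaces or tabs. This is a sketch that uses a hypothetical source string in place of the file's contents.

```python
import re

# Hypothetical source text standing in for the file's contents.
source = """\
def top_level():
    pass

class Widget:
    def method(self):
        pass
"""

# With re.MULTILINE, ^ matches at the start of every line; [ \t]* allows
# indentation without letting the match run across line breaks.
names = re.findall(r"^[ \t]*def (\w+)", source, re.MULTILINE)
print(names)  # ['top_level', 'method']
```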