if 'bl1' in open('/tmp/ch.py').read():
    print 'OK'
I have to search for the particular string "bl1".
Any way to get it?
I tried using ^bl1$ , it didn't work.
If you simply want to verify that "bl1" appears somewhere in the text of the file, I think your statement is fine. I have tested it, and the if statement evaluated to True (given that ch.py contains words with 'bl1' inside):
if 'bl1' in open('/tmp/ch.py').read():
    print 'OK'
>>> if 'bl1' in open('/tmp/ch.py').read():
...     print("ok")
...
ok
>>> if 'bl1' in "dfdsaflj hjjfadsfbl1dafdsfd bl1llll bdasbl1aa":
...     print("ok")
...
ok
Since you mention '^bl1$', I assume you are trying to use a regular expression, but that pattern does not express what you are after.
If you want to extract the words containing the three consecutive characters 'bl1', you can use re.findall from the re module.
>>> import re
>>> match = re.findall(r'\w*bl1\w*', open('/tmp/ch.py').read())  # finds all occurrences
>>> match
['asfdbl1', 'bl123']  # all words containing 'bl1', returned as a list
In the form of your if statement, it should look like this:
>>> if re.findall(r'\w*bl1\w*', open('/tmp/ch.py').read()):
...     print('OK')
...
OK
>>>
In a regular expression, '^bl1$' anchors the pattern to both the start and the end of the string, which means the whole string has to be exactly 'bl1' for a match (apart from a single trailing newline, which $ tolerates).
>>> match = re.findall('^bl1$', open('/tmp/ch.py').read())
>>> match
[]
>>> match = re.findall('^bl1$', 'bl1')  # exactly "bl1"
>>> match
['bl1']
>>> match = re.findall('^bl1$', 'bl12')  # not exactly "bl1"
>>> match
[]
If you are interested in regular expressions, the documentation of the re module in the Python standard library is a good place to start: https://docs.python.org/3.4/library/re.html
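To make the difference between the three kinds of test concrete, here is a minimal sketch. It assumes the goal might be to find "bl1" as a standalone word or as a whole line; the sample text is made up and stands in for the contents of /tmp/ch.py.

```python
import re

# Hypothetical file contents, used here instead of reading /tmp/ch.py.
text = "x = asfdbl1\nbl1\ny = bl123\n"

# Plain substring test: true if "bl1" occurs anywhere in the text.
has_substring = "bl1" in text

# Whole-word test: \b requires a word boundary on each side of bl1.
whole_words = re.findall(r"\bbl1\b", text)

# Whole-line test: with re.MULTILINE, ^ and $ also match at every line break.
whole_lines = re.findall(r"^bl1$", text, flags=re.MULTILINE)

print(has_substring, whole_words, whole_lines)
```

Without re.MULTILINE, '^bl1$' is tested against the file contents as a whole, which is probably why it found nothing.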
Related
>>> import re
>>> s = 'this is a test'
>>> reg1 = re.compile('test$')
>>> match1 = reg1.match(s)
>>> print match1
None
In Kiki, that pattern matches the "test" at the end of s. What am I missing? (I tried re.compile(r'test$') as well.)
Use
match1 = reg1.search(s)
instead. The match function only matches at the start of the string ... see the documentation here:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
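A small sketch of the difference, using the string from the question. re.fullmatch is included for comparison and is available in Python 3.4 and later.

```python
import re

s = "this is a test"
pattern = re.compile(r"test$")

# match() only tries position 0, so "test$" cannot succeed here.
at_start = pattern.match(s)

# search() scans the string and finds "test" at the end.
anywhere = pattern.search(s)

# fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch(r"this is a test", s)

print(at_start)
print(anywhere.group(), anywhere.start())
print(whole.group())
```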
Your regex does not match the full string. You can use search instead as Useless mentioned, or you can change your regex to match the full string:
'^this is a test$'
Or somewhat harder to read but somewhat less useless:
'^t[^t]*test$'
It depends on what you're trying to do.
This is because the match method returns None if it cannot find the expected pattern; if it does find the pattern, it returns a match object (of type _sre.SRE_Match).
So, if you want a Boolean (True or False) result from match, you must check whether the result is None.
You can test whether a text matches like this:
import re

string_to_evaluate = "Your text that needs to be examined"
expected_pattern = "pattern"
# re.match returns None when the pattern does not match at the start of the string
if re.match(expected_pattern, string_to_evaluate) is not None:
    print("The text is as you expected!")
else:
    print("The text is not as you expected!")
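The same check can be wrapped in a small function. This is only a sketch; is_match is a hypothetical helper name, not part of the re module.

```python
import re

def is_match(pattern, text):
    """Return True if pattern matches at the start of text (hypothetical helper)."""
    return re.match(pattern, text) is not None

text = "Your text that needs to be examined"
print(is_match(r"Your", text))     # pattern occurs at the start
print(is_match(r"pattern", text))  # pattern does not occur at the start
```

Because a match object is always truthy and None is falsy, writing `if re.match(pattern, text):` behaves the same way as the explicit None check.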
I am using Pythex to test out two regexes, and I get the result I'm hoping for in Pythex, however, when I run these regexes against test strings in the console or while running the program, I don't get the match I'm expecting.
The first regex is supposed to check that the string has a pair of letters which occur at least twice in the string (but this pair does not overlap). So, "xyxy" and "aabcdefgaa" are valid, while "aaa" is not, since the a's overlap. Here is a link to the Pythex regex, where it's working: http://pythex.org/?regex=(.)%7B1%7D.(.)%7B1%7D.%5C1.%5C2.&test_string=qjhvhtzxzqqjkmpb&ignorecase=0&multiline=0&dotall=0&verbose=0. and here is the console output of the same regex & string in the python console (2.7):
>>> import re
>>> pair_of_letters = re.compile('(.){1}.*(.){1}.*\1.*\2.*')
>>> string = "qjhvhtzxzqqjkmpb"
>>> match = pair_of_letters.match(string); print match
None
The second regex is supposed to check that the string has a pair of letters with exactly one character between them, e.g, "xyx", "abcdefeghi", or "aaa". Again, here's a link to Pythex: http://pythex.org/?regex=(.)%7B1%7D.%7B1%7D%5C1&test_string=qjhvhtzxzqqjkmpb&ignorecase=0&multiline=0&dotall=0&verbose=0 and below I've pasted the Python console output:
>>> repeated_letter_with_one_between = re.compile('(.){1}.{1}\1')
>>> string = "qjhvhtzxzqqjkmpb"
>>> match = repeated_letter_with_one_between.match(string); print match
None
Does anyone know what might account for the discrepancy? Thanks in advance.
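Use raw strings to define a regex; otherwise "\1" in an ordinary string literal is interpreted as the character with code 1 ('\x01') instead of a backreference.
pair_of_letters = re.compile(r'(.).*(.).*\1.*\2.*')
repeated_letter_with_one_between = re.compile(r'(.).\1')
Note that match() only matches at the start of the string, so the second pattern also needs search() to find "hvh" in the sample string.
To illustrate the difference between the two literals: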
>>> "\1"
'\x01'
>>> r"\1"
'\\1'
>>> print("\1")  # prints the non-printable control character \x01
>>> print(r"\1")
\1
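As a quick check of the effect on matching, here is a sketch that compiles the second pattern from the question both ways and runs it against the same test string. The variable names not_raw and raw are only for illustration.

```python
import re

s = "qjhvhtzxzqqjkmpb"

# Ordinary literal: "\1" is turned into the character \x01 before re sees it.
not_raw = re.compile("(.).\1")

# Raw literal: \1 reaches the regex engine as a backreference to group 1.
raw = re.compile(r"(.).\1")

print(not_raw.search(s))      # no \x01 in the string, so no match
print(raw.search(s).group())  # first letter repeated with one character between
print(raw.match(s))           # match() only tries position 0
```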
I'm tackling a python challenge problem to find a block of text in the format xXXXxXXXx (lower vs upper case, not all X's) in a chunk like this:
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
I have tested the following RegEx and found it correctly matches what I am looking for from this site (http://www.regexr.com/):
'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])'
However, when I try to match this expression to the block of text, it just returns the entire string:
In [1]: import re
In [2]: example = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
In [3]: expression = re.compile(r'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])')
In [4]: found = expression.search(example)
In [5]: print found.string
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
Any ideas? Is my expression incorrect? Also, if there is a simpler way to represent that expression, feel free to let me know. I'm fairly new to RegEx.
You need to return the match group instead of the string attribute.
>>> import re
>>> s = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
>>> rgx = re.compile(r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]')
>>> found = rgx.search(s).group()
>>> print found
nJDKoJIWh
The string attribute always returns the string passed as input to the match. This is clearly documented:
string
The string passed to match() or search().
The problem has nothing to do with the matching, you're just grabbing the wrong thing from the match object. Use match.group(0) (or match.group()).
Based on xXXXxXXXx, if you want runs of three uppercase letters separated by single lowercase letters, this is the pattern you want:
([a-z])(([A-Z]){3}([a-z]))+
You can also get the matched text from search() with group():
print expression.search(example).group(0)
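For reference, a short sketch contrasting the attributes and methods of a match object mentioned in these answers, using the string from the question.

```python
import re

s = "jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn"
found = re.search(r"[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]", s)

print(found.string)                   # the whole input passed to search()
print(found.group())                  # only the text that matched
print(found.span())                   # (start, end) offsets of the match
print(s[found.start():found.end()])   # same text as found.group()
```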
I have a regex match object in Python. I want to get the text it matched. Say if the pattern is '1.3', and the search string is 'abc123xyz', I want to get '123'. How can I do that?
I know I can use match.string[match.start():match.end()], but I find that to be quite cumbersome (and in some cases wasteful) for such a basic query.
Is there a simpler way?
You can simply use the match object's group function, like:
match = re.search(r"1.3", "abc123xyz")
if match:
    doSomethingWith(match.group(0))
to get the entire match. EDIT: as thg435 points out, you can also omit the 0 and just call match.group().
Additional note: if your pattern contains parentheses, you can also get these submatches by passing 1, 2 and so on to group().
You need to put the regex inside "()" to be able to get that part
>>> var = 'abc123xyz'
>>> exp = re.compile(".*(1.3).*")
>>> exp.match(var)
<_sre.SRE_Match object at 0x691738>
>>> exp.match(var).groups()
('123',)
>>> exp.match(var).group(0)
'abc123xyz'
>>> exp.match(var).group(1)
'123'
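Without the surrounding .* the pattern no longer matches here, because match() only matches at the start of the string (search() would still find it):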
>>> var = 'abc123xyz'
>>> exp = re.compile("1.3")
>>> print exp.match(var)
None
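A minimal sketch of the alternative with search(), which needs neither the surrounding .* nor parentheses to return the matched text. The second pattern is made up to show how numbered groups behave.

```python
import re

var = "abc123xyz"

# search() finds the pattern anywhere; group() returns the whole match.
m = re.search(r"1.3", var)
print(m.group())

# With parentheses, group(1), group(2), ... return the submatches.
m2 = re.search(r"([a-z]+)(1.3)", var)
print(m2.group(0), m2.group(1), m2.group(2))
print(m2.groups())
```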
I need to find the value of "taxid" in a large number of strings similar to the one given below. For this particular string, the 'taxid' value is '9606'. I need to discard everything else. The "taxid" may appear anywhere in the text, but will always be followed by a ":" and then a number.
score:0.86|taxid:9606(Human)|intact:EBI-999900
How do I write a regular expression for this in Python?
>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'
If there are multiple taxids, use re.findall, which returns a list of all matches:
>>> re.findall(r'taxid:(\d+)', s)
['9606']
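import re

# lines is assumed to be an iterable of strings like the one in the question
for line in lines:
    # group 1 captures everything up to the next "|", e.g. '9606(Human)'
    match = re.match(r".*\|taxid:([^|]+)\|.*", line)
    if match:  # match is None for lines without a "|taxid:...|" field
        print(match.groups())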
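Putting the pieces together, here is a sketch that extracts only the digits and tolerates lines without a taxid. extract_taxid is a hypothetical helper name, and the second sample line is made up to show the no-match case.

```python
import re

TAXID_RE = re.compile(r"taxid:(\d+)")

def extract_taxid(line):
    """Return the taxid digits as a string, or None if absent (hypothetical helper)."""
    match = TAXID_RE.search(line)
    return match.group(1) if match else None

lines = [
    "score:0.86|taxid:9606(Human)|intact:EBI-999900",
    "score:0.50|intact:EBI-000000",  # made-up line with no taxid field
]
print([extract_taxid(line) for line in lines])
```

Using search() instead of match() means the field is found wherever it appears in the line, including at the very start.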